* [PATCH v2 0/5] Adding blk-mq and DMA support to pmem block driver
@ 2017-08-02 18:40 Dave Jiang
  2017-08-02 18:41 ` [PATCH v2 1/5] dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels Dave Jiang
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Dave Jiang @ 2017-08-02 18:40 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, linux-nvdimm

v2:
- Merged the dma_prep_memcpy_* variants into a single function, per Dan.
- Addressed various code formatting and other comments from Ross.
- Replaced open-coded offset calculation with the offset_in_page() macro, per Johannes.

The following series adds blk-mq support to the pmem block driver and
adds infrastructure code to ioatdma and dmaengine to support copying to
and from a scatterlist, which is needed to process block requests
provided by blk-mq. Using the DMA engines available on certain platforms
allows us to drastically reduce CPU utilization while maintaining
adequate performance. Experiments on a DRAM-backed pmem block device
showed that using the DMA engine is beneficial. Users can revert to the
original behavior by passing queue_mode=0 to the nd_pmem kernel module
if desired.

---

Dave Jiang (5):
      dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels
      dmaengine: ioatdma: dma_prep_memcpy_sg support
      dmaengine: add SG support to dmaengine_unmap
      libnvdimm: Adding blk-mq support to the pmem driver
      libnvdimm: add DMA support for pmem blk-mq


 drivers/dma/dmaengine.c   |   45 +++++-
 drivers/dma/ioat/dma.h    |    4 +
 drivers/dma/ioat/init.c   |    4 -
 drivers/dma/ioat/prep.c   |   57 ++++++++
 drivers/nvdimm/pmem.c     |  331 ++++++++++++++++++++++++++++++++++++++++++---
 drivers/nvdimm/pmem.h     |    3 
 include/linux/dmaengine.h |    9 +
 7 files changed, 420 insertions(+), 33 deletions(-)

--
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

* [PATCH v2 1/5] dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels
  2017-08-02 18:40 [PATCH v2 0/5] Adding blk-mq and DMA support to pmem block driver Dave Jiang
@ 2017-08-02 18:41 ` Dave Jiang
  2017-08-02 18:41 ` [PATCH v2 2/5] dmaengine: ioatdma: dma_prep_memcpy_sg support Dave Jiang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Dave Jiang @ 2017-08-02 18:41 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, linux-nvdimm

Commit 7618d0359c16 ("dmaengine: ioatdma: Set non RAID channels to be
private capable") marks all non-RAID ioatdma channels as private so that
they can be requested by dma_request_channel(). With PQ capability
support going away for ioatdma, this would make all channels private. To
support the use of ioatdma in the blk-mq implementation of pmem, we need
as many shareable channels as possible for high performance. Thus,
revert the patch.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/ioat/init.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index 6ad4384..e437112 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -1163,9 +1163,6 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca)
 		}
 	}
 
-	if (!(ioat_dma->cap & (IOAT_CAP_XOR | IOAT_CAP_PQ)))
-		dma_cap_set(DMA_PRIVATE, dma->cap_mask);
-
 	err = ioat_probe(ioat_dma);
 	if (err)
 		return err;


* [PATCH v2 2/5] dmaengine: ioatdma: dma_prep_memcpy_sg support
  2017-08-02 18:40 [PATCH v2 0/5] Adding blk-mq and DMA support to pmem block driver Dave Jiang
  2017-08-02 18:41 ` [PATCH v2 1/5] dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels Dave Jiang
@ 2017-08-02 18:41 ` Dave Jiang
  2017-08-02 18:41 ` [PATCH v2 3/5] dmaengine: add SG support to dmaengine_unmap Dave Jiang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 28+ messages in thread
From: Dave Jiang @ 2017-08-02 18:41 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, linux-nvdimm

Add ioatdma support for copying between a physically contiguous buffer
and a provided scatterlist, in either direction. This is used to support
reading/writing persistent memory in the pmem driver.
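
The to_sg flag selects the copy direction against a single contiguous
DMA address. A rough userspace sketch of that direction logic (plain
arrays stand in for scatterlist entries; struct seg and copy_sg are
invented for illustration, not the kernel API):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for one scatterlist entry: a segment of memory. */
struct seg {
	void *buf;
	size_t len;
};

/*
 * Copy between a contiguous buffer and a list of segments. to_sg mirrors
 * the patch's flag: nonzero copies buffer -> segments (a pmem read),
 * zero copies segments -> buffer (a pmem write). The contiguous side
 * advances by each segment's length, like dma_off in the patch.
 */
static size_t copy_sg(struct seg *sg, int nents, char *buf, int to_sg)
{
	size_t off = 0;
	int i;

	for (i = 0; i < nents; i++) {
		if (to_sg)
			memcpy(sg[i].buf, buf + off, sg[i].len);
		else
			memcpy(buf + off, sg[i].buf, sg[i].len);
		off += sg[i].len;
	}
	return off;	/* total bytes moved, like total_len */
}
```

The descriptor-based implementation in the patch programs one hardware
descriptor per segment rather than performing the copy itself.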

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/ioat/dma.h    |    4 +++
 drivers/dma/ioat/init.c   |    1 +
 drivers/dma/ioat/prep.c   |   57 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dmaengine.h |    5 ++++
 4 files changed, 67 insertions(+)

diff --git a/drivers/dma/ioat/dma.h b/drivers/dma/ioat/dma.h
index 56200ee..6c08b06 100644
--- a/drivers/dma/ioat/dma.h
+++ b/drivers/dma/ioat/dma.h
@@ -370,6 +370,10 @@ struct dma_async_tx_descriptor *
 ioat_dma_prep_memcpy_lock(struct dma_chan *c, dma_addr_t dma_dest,
 			   dma_addr_t dma_src, size_t len, unsigned long flags);
 struct dma_async_tx_descriptor *
+ioat_dma_prep_memcpy_sg_lock(struct dma_chan *c,
+		struct scatterlist *sg, unsigned int sg_nents,
+		dma_addr_t dma_addr, bool to_sg, unsigned long flags);
+struct dma_async_tx_descriptor *
 ioat_prep_interrupt_lock(struct dma_chan *c, unsigned long flags);
 struct dma_async_tx_descriptor *
 ioat_prep_xor(struct dma_chan *chan, dma_addr_t dest, dma_addr_t *src,
diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index e437112..f82d3bb 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -1091,6 +1091,7 @@ static int ioat3_dma_probe(struct ioatdma_device *ioat_dma, int dca)
 
 	dma = &ioat_dma->dma_dev;
 	dma->device_prep_dma_memcpy = ioat_dma_prep_memcpy_lock;
+	dma->device_prep_dma_memcpy_sg = ioat_dma_prep_memcpy_sg_lock;
 	dma->device_issue_pending = ioat_issue_pending;
 	dma->device_alloc_chan_resources = ioat_alloc_chan_resources;
 	dma->device_free_chan_resources = ioat_free_chan_resources;
diff --git a/drivers/dma/ioat/prep.c b/drivers/dma/ioat/prep.c
index 243421a..d8219af 100644
--- a/drivers/dma/ioat/prep.c
+++ b/drivers/dma/ioat/prep.c
@@ -159,6 +159,63 @@ ioat_dma_prep_memcpy_lock(struct dma_chan *c, dma_addr_t dma_dest,
 	return &desc->txd;
 }
 
+struct dma_async_tx_descriptor *
+ioat_dma_prep_memcpy_sg_lock(struct dma_chan *c,
+		struct scatterlist *sg, unsigned int sg_nents,
+		dma_addr_t dma_addr, bool to_sg, unsigned long flags)
+{
+	struct ioatdma_chan *ioat_chan = to_ioat_chan(c);
+	struct ioat_dma_descriptor *hw = NULL;
+	struct ioat_ring_ent *desc = NULL;
+	dma_addr_t dma_off = dma_addr;
+	int num_descs, idx, i;
+	struct scatterlist *s;
+	size_t total_len = 0, len;
+
+
+	if (test_bit(IOAT_CHAN_DOWN, &ioat_chan->state))
+		return NULL;
+
+	/*
+	 * The upper layer will guarantee that each entry does not exceed
+	 * xfercap.
+	 */
+	num_descs = sg_nents;
+
+	if (likely(num_descs) &&
+	    ioat_check_space_lock(ioat_chan, num_descs) == 0)
+		idx = ioat_chan->head;
+	else
+		return NULL;
+
+	for_each_sg(sg, s, sg_nents, i) {
+		desc = ioat_get_ring_ent(ioat_chan, idx + i);
+		hw = desc->hw;
+		len = sg_dma_len(s);
+		hw->size = len;
+		hw->ctl = 0;
+		if (to_sg) {
+			hw->src_addr = dma_off;
+			hw->dst_addr = sg_dma_address(s);
+		} else {
+			hw->src_addr = sg_dma_address(s);
+			hw->dst_addr = dma_off;
+		}
+		dma_off += len;
+		total_len += len;
+		dump_desc_dbg(ioat_chan, desc);
+	}
+
+	desc->txd.flags = flags;
+	desc->len = total_len;
+	hw->ctl_f.int_en = !!(flags & DMA_PREP_INTERRUPT);
+	hw->ctl_f.fence = !!(flags & DMA_PREP_FENCE);
+	hw->ctl_f.compl_write = 1;
+	dump_desc_dbg(ioat_chan, desc);
+	/* we leave the channel locked to ensure in order submission */
+
+	return &desc->txd;
+}
 
 static struct dma_async_tx_descriptor *
 __ioat_prep_xor_lock(struct dma_chan *c, enum sum_check_flags *result,
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 5336808..060f152 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -694,6 +694,7 @@ struct dma_filter {
  * @device_prep_dma_memset: prepares a memset operation
  * @device_prep_dma_memset_sg: prepares a memset operation over a scatter list
  * @device_prep_dma_interrupt: prepares an end of chain interrupt operation
+ * @device_prep_dma_memcpy_sg: prepares memcpy between scatterlist and buffer
  * @device_prep_slave_sg: prepares a slave dma operation
  * @device_prep_dma_cyclic: prepare a cyclic dma operation suitable for audio.
  *	The function takes a buffer of size buf_len. The callback function will
@@ -776,6 +777,10 @@ struct dma_device {
 		struct scatterlist *dst_sg, unsigned int dst_nents,
 		struct scatterlist *src_sg, unsigned int src_nents,
 		unsigned long flags);
+	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy_sg)(
+		struct dma_chan *chan,
+		struct scatterlist *dst_sg, unsigned int dst_nents,
+		dma_addr_t src, bool to_sg, unsigned long flags);
 
 	struct dma_async_tx_descriptor *(*device_prep_slave_sg)(
 		struct dma_chan *chan, struct scatterlist *sgl,


* [PATCH v2 3/5] dmaengine: add SG support to dmaengine_unmap
  2017-08-02 18:40 [PATCH v2 0/5] Adding blk-mq and DMA support to pmem block driver Dave Jiang
  2017-08-02 18:41 ` [PATCH v2 1/5] dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels Dave Jiang
  2017-08-02 18:41 ` [PATCH v2 2/5] dmaengine: ioatdma: dma_prep_memcpy_sg support Dave Jiang
@ 2017-08-02 18:41 ` Dave Jiang
  2017-08-02 18:41 ` [PATCH v2 4/5] libnvdimm: Adding blk-mq support to the pmem driver Dave Jiang
  2017-08-02 18:41 ` [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq Dave Jiang
  4 siblings, 0 replies; 28+ messages in thread
From: Dave Jiang @ 2017-08-02 18:41 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, linux-nvdimm

Add support for unmapping scatterlists through dmaengine_unmap_data.
Only one scatterlist per direction is supported. The DMA address array
is overloaded for unmap structures with two or fewer entries in order
to store the SG pointer(s) and entry counts.
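
The overloading can be modeled in userspace as follows (the struct,
macros, and helper are invented for this sketch, not the kernel's
dmaengine_unmap_data): the address array is sized with two extra slots
past map_cnt, and those slots carry the to-device and from-device SG
entry counts.

```c
#include <stdint.h>
#include <stdlib.h>

#define SG_EXTRA_SLOTS 2

/*
 * Userspace model of the overloaded unmap array (illustrative only):
 * slots [0..map_cnt) hold addresses or a cast scatterlist pointer, and
 * the two slots past map_cnt hold the SG entry counts, as in the
 * patch's dma_unmap_data_sg_to_nents()/..._from_nents() macros.
 */
struct unmap_model {
	int map_cnt;
	uintptr_t addr[];	/* map_cnt entries + SG_EXTRA_SLOTS */
};

#define sg_to_nents(u)   ((u)->addr[(u)->map_cnt])
#define sg_from_nents(u) ((u)->addr[(u)->map_cnt + 1])

static struct unmap_model *unmap_alloc(int map_cnt)
{
	/* zeroed allocation clears the overloaded slots, like the patch */
	struct unmap_model *u = calloc(1, sizeof(*u) +
			sizeof(uintptr_t) * (map_cnt + SG_EXTRA_SLOTS));

	if (u)
		u->map_cnt = map_cnt;
	return u;
}
```

A zero in an overloaded slot means "no scatterlist in this direction",
which is why the patch must clear those slots on allocation.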

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/dma/dmaengine.c   |   45 ++++++++++++++++++++++++++++++++++++---------
 include/linux/dmaengine.h |    4 ++++
 2 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index d9118ec..81d10c97 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -1126,16 +1126,35 @@ static void dmaengine_unmap(struct kref *kref)
 {
 	struct dmaengine_unmap_data *unmap = container_of(kref, typeof(*unmap), kref);
 	struct device *dev = unmap->dev;
-	int cnt, i;
+	int cnt, i, sg_nents;
+	struct scatterlist *sg;
+
+	sg_nents = dma_unmap_data_sg_to_nents(unmap, unmap->map_cnt);
+	if (sg_nents) {
+		i = 0;
+		cnt = 1;
+		sg = (struct scatterlist *)unmap->addr[i];
+		dma_unmap_sg(dev, sg, sg_nents, DMA_TO_DEVICE);
+	} else {
+		cnt = unmap->to_cnt;
+		for (i = 0; i < cnt; i++)
+			dma_unmap_page(dev, unmap->addr[i], unmap->len,
+					DMA_TO_DEVICE);
+	}
+
+	sg_nents = dma_unmap_data_sg_from_nents(unmap, unmap->map_cnt);
+	if (sg_nents) {
+		sg = (struct scatterlist *)unmap->addr[i];
+		dma_unmap_sg(dev, sg, sg_nents, DMA_FROM_DEVICE);
+		cnt++;
+		i++;
+	} else {
+		cnt += unmap->from_cnt;
+		for (; i < cnt; i++)
+			dma_unmap_page(dev, unmap->addr[i], unmap->len,
+					DMA_FROM_DEVICE);
+	}
 
-	cnt = unmap->to_cnt;
-	for (i = 0; i < cnt; i++)
-		dma_unmap_page(dev, unmap->addr[i], unmap->len,
-			       DMA_TO_DEVICE);
-	cnt += unmap->from_cnt;
-	for (; i < cnt; i++)
-		dma_unmap_page(dev, unmap->addr[i], unmap->len,
-			       DMA_FROM_DEVICE);
 	cnt += unmap->bidi_cnt;
 	for (; i < cnt; i++) {
 		if (unmap->addr[i] == 0)
@@ -1179,6 +1198,10 @@ static int __init dmaengine_init_unmap_pool(void)
 		size = sizeof(struct dmaengine_unmap_data) +
 		       sizeof(dma_addr_t) * p->size;
 
+		/* add 2 more entries for SG nents overload */
+		if (i == 0)
+			size += sizeof(dma_addr_t) * 2;
+
 		p->cache = kmem_cache_create(p->name, size, 0,
 					     SLAB_HWCACHE_ALIGN, NULL);
 		if (!p->cache)
@@ -1205,6 +1228,10 @@ dmaengine_get_unmap_data(struct device *dev, int nr, gfp_t flags)
 		return NULL;
 
 	memset(unmap, 0, sizeof(*unmap));
+	/* clear the overloaded sg nents entries */
+	if (nr < 3)
+		memset(&unmap->addr[nr], 0,
+				DMA_UNMAP_SG_ENTS * sizeof(dma_addr_t));
 	kref_init(&unmap->kref);
 	unmap->dev = dev;
 	unmap->map_cnt = nr;
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index 060f152..ba978dd 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -475,6 +475,10 @@ struct dmaengine_unmap_data {
 	dma_addr_t addr[0];
 };
 
+#define DMA_UNMAP_SG_ENTS	2
+#define dma_unmap_data_sg_to_nents(x, n) x->addr[n]
+#define dma_unmap_data_sg_from_nents(x, n) x->addr[n+1]
+
 /**
  * struct dma_async_tx_descriptor - async transaction descriptor
  * ---dma generic offload fields---


* [PATCH v2 4/5] libnvdimm: Adding blk-mq support to the pmem driver
  2017-08-02 18:40 [PATCH v2 0/5] Adding blk-mq and DMA support to pmem block driver Dave Jiang
                   ` (2 preceding siblings ...)
  2017-08-02 18:41 ` [PATCH v2 3/5] dmaengine: add SG support to dmaengine_unmap Dave Jiang
@ 2017-08-02 18:41 ` Dave Jiang
  2017-08-03 20:04   ` Ross Zwisler
  2017-08-02 18:41 ` [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq Dave Jiang
  4 siblings, 1 reply; 28+ messages in thread
From: Dave Jiang @ 2017-08-02 18:41 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, linux-nvdimm

Add blk-mq support to the pmem driver in addition to the existing direct
bio support. This allows for hardware offloading via DMA engines. By
default the bio method is enabled; blk-mq support can be turned on via
the module parameter queue_mode=1.
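
The blk-mq path handles a request by walking its segments while
preserving the flush semantics of the bio path: flush before the data
copies for a FLUSH request, flush after them for FUA. A minimal
userspace sketch of that ordering (the flag values, names, and flush
counter are made up for illustration):

```c
#include <string.h>

/* Illustrative flags mirroring REQ_FLUSH/REQ_FUA (values invented). */
#define MODEL_FLUSH (1u << 0)
#define MODEL_FUA   (1u << 1)

/*
 * Sketch of the ordering in pmem_handle_cmd(): flush before the copies
 * for FLUSH, copy the data (standing in for the per-bvec loop), flush
 * after the copies for FUA. The counter makes the ordering observable.
 */
static int handle_request(unsigned int flags, char *dst, const char *src,
			  size_t len, int *flushes)
{
	if (flags & MODEL_FLUSH)
		(*flushes)++;		/* nvdimm_flush() before the copy */
	memcpy(dst, src, len);		/* stands in for rq_for_each_segment */
	if (flags & MODEL_FUA)
		(*flushes)++;		/* nvdimm_flush() after the copy */
	return 0;
}
```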

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/nvdimm/pmem.c |  137 +++++++++++++++++++++++++++++++++++++++++--------
 drivers/nvdimm/pmem.h |    3 +
 2 files changed, 119 insertions(+), 21 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index c544d46..98e752f 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -31,10 +31,24 @@
 #include <linux/pmem.h>
 #include <linux/dax.h>
 #include <linux/nd.h>
+#include <linux/blk-mq.h>
 #include "pmem.h"
 #include "pfn.h"
 #include "nd.h"
 
+enum {
+	PMEM_Q_BIO = 0,
+	PMEM_Q_MQ = 1,
+};
+
+static int queue_mode = PMEM_Q_BIO;
+module_param(queue_mode, int, 0444);
+MODULE_PARM_DESC(queue_mode, "Pmem Queue Mode (0=BIO, 1=BLK-MQ)");
+
+struct pmem_cmd {
+	struct request *rq;
+};
+
 static struct device *to_dev(struct pmem_device *pmem)
 {
 	/*
@@ -239,9 +253,13 @@ static const struct dax_operations pmem_dax_ops = {
 	.direct_access = pmem_dax_direct_access,
 };
 
-static void pmem_release_queue(void *q)
+static void pmem_release_queue(void *data)
 {
-	blk_cleanup_queue(q);
+	struct pmem_device *pmem = (struct pmem_device *)data;
+
+	blk_cleanup_queue(pmem->q);
+	if (queue_mode == PMEM_Q_MQ)
+		blk_mq_free_tag_set(&pmem->tag_set);
 }
 
 static void pmem_freeze_queue(void *q)
@@ -259,6 +277,54 @@ static void pmem_release_disk(void *__pmem)
 	put_disk(pmem->disk);
 }
 
+static int pmem_handle_cmd(struct pmem_cmd *cmd)
+{
+	struct request *req = cmd->rq;
+	struct request_queue *q = req->q;
+	struct pmem_device *pmem = q->queuedata;
+	struct nd_region *nd_region = to_region(pmem);
+	struct bio_vec bvec;
+	struct req_iterator iter;
+	int rc = 0;
+
+	if (req->cmd_flags & REQ_FLUSH)
+		nvdimm_flush(nd_region);
+
+	rq_for_each_segment(bvec, req, iter) {
+		rc = pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len,
+				bvec.bv_offset, op_is_write(req_op(req)),
+				iter.iter.bi_sector);
+		if (rc < 0)
+			break;
+	}
+
+	if (req->cmd_flags & REQ_FUA)
+		nvdimm_flush(nd_region);
+
+	blk_mq_end_request(cmd->rq, rc);
+
+	return rc;
+}
+
+static int pmem_queue_rq(struct blk_mq_hw_ctx *hctx,
+		const struct blk_mq_queue_data *bd)
+{
+	struct pmem_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
+
+	cmd->rq = bd->rq;
+
+	blk_mq_start_request(bd->rq);
+
+	if (pmem_handle_cmd(cmd) < 0)
+		return BLK_MQ_RQ_QUEUE_ERROR;
+	else
+		return BLK_MQ_RQ_QUEUE_OK;
+}
+
+static const struct blk_mq_ops pmem_mq_ops = {
+	.queue_rq	= pmem_queue_rq,
+};
+
 static int pmem_attach_disk(struct device *dev,
 		struct nd_namespace_common *ndns)
 {
@@ -272,9 +338,9 @@ static int pmem_attach_disk(struct device *dev,
 	struct nd_pfn_sb *pfn_sb;
 	struct pmem_device *pmem;
 	struct resource pfn_res;
-	struct request_queue *q;
 	struct gendisk *disk;
 	void *addr;
+	int rc;
 
 	/* while nsio_rw_bytes is active, parse a pfn info block if present */
 	if (is_nd_pfn(dev)) {
@@ -303,17 +369,47 @@ static int pmem_attach_disk(struct device *dev,
 		return -EBUSY;
 	}
 
-	q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
-	if (!q)
-		return -ENOMEM;
+	if (queue_mode == PMEM_Q_MQ) {
+		pmem->tag_set.ops = &pmem_mq_ops;
+		pmem->tag_set.nr_hw_queues = nr_online_nodes;
+		pmem->tag_set.queue_depth = 64;
+		pmem->tag_set.numa_node = dev_to_node(dev);
+		pmem->tag_set.cmd_size = sizeof(struct pmem_cmd);
+		pmem->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+		pmem->tag_set.driver_data = pmem;
+
+		rc = blk_mq_alloc_tag_set(&pmem->tag_set);
+		if (rc < 0)
+			return rc;
+
+		pmem->q = blk_mq_init_queue(&pmem->tag_set);
+		if (IS_ERR(pmem->q)) {
+			blk_mq_free_tag_set(&pmem->tag_set);
+			return -ENOMEM;
+		}
 
-	if (devm_add_action_or_reset(dev, pmem_release_queue, q))
-		return -ENOMEM;
+		if (devm_add_action_or_reset(dev, pmem_release_queue, pmem)) {
+			blk_mq_free_tag_set(&pmem->tag_set);
+			return -ENOMEM;
+		}
+	} else if (queue_mode == PMEM_Q_BIO) {
+		pmem->q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
+		if (!pmem->q)
+			return -ENOMEM;
+
+		if (devm_add_action_or_reset(dev, pmem_release_queue, pmem))
+			return -ENOMEM;
+
+		blk_queue_make_request(pmem->q, pmem_make_request);
+	} else {
+		dev_warn(dev, "Invalid queue mode: %d\n", queue_mode);
+		return -EINVAL;
+	}
 
 	pmem->pfn_flags = PFN_DEV;
 	if (is_nd_pfn(dev)) {
-		addr = devm_memremap_pages(dev, &pfn_res, &q->q_usage_counter,
-				altmap);
+		addr = devm_memremap_pages(dev, &pfn_res,
+				&pmem->q->q_usage_counter, altmap);
 		pfn_sb = nd_pfn->pfn_sb;
 		pmem->data_offset = le64_to_cpu(pfn_sb->dataoff);
 		pmem->pfn_pad = resource_size(res) - resource_size(&pfn_res);
@@ -322,7 +418,7 @@ static int pmem_attach_disk(struct device *dev,
 		res->start += pmem->data_offset;
 	} else if (pmem_should_map_pages(dev)) {
 		addr = devm_memremap_pages(dev, &nsio->res,
-				&q->q_usage_counter, NULL);
+				&pmem->q->q_usage_counter, NULL);
 		pmem->pfn_flags |= PFN_MAP;
 	} else
 		addr = devm_memremap(dev, pmem->phys_addr,
@@ -332,21 +428,20 @@ static int pmem_attach_disk(struct device *dev,
 	 * At release time the queue must be frozen before
 	 * devm_memremap_pages is unwound
 	 */
-	if (devm_add_action_or_reset(dev, pmem_freeze_queue, q))
+	if (devm_add_action_or_reset(dev, pmem_freeze_queue, pmem->q))
 		return -ENOMEM;
 
 	if (IS_ERR(addr))
 		return PTR_ERR(addr);
 	pmem->virt_addr = addr;
 
-	blk_queue_write_cache(q, true, true);
-	blk_queue_make_request(q, pmem_make_request);
-	blk_queue_physical_block_size(q, PAGE_SIZE);
-	blk_queue_max_hw_sectors(q, UINT_MAX);
-	blk_queue_bounce_limit(q, BLK_BOUNCE_ANY);
-	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
-	queue_flag_set_unlocked(QUEUE_FLAG_DAX, q);
-	q->queuedata = pmem;
+	blk_queue_write_cache(pmem->q, true, true);
+	blk_queue_physical_block_size(pmem->q, PAGE_SIZE);
+	blk_queue_max_hw_sectors(pmem->q, UINT_MAX);
+	blk_queue_bounce_limit(pmem->q, BLK_BOUNCE_ANY);
+	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->q);
+	queue_flag_set_unlocked(QUEUE_FLAG_DAX, pmem->q);
+	pmem->q->queuedata = pmem;
 
 	disk = alloc_disk_node(0, nid);
 	if (!disk)
@@ -354,7 +449,7 @@ static int pmem_attach_disk(struct device *dev,
 	pmem->disk = disk;
 
 	disk->fops		= &pmem_fops;
-	disk->queue		= q;
+	disk->queue		= pmem->q;
 	disk->flags		= GENHD_FL_EXT_DEVT;
 	nvdimm_namespace_disk_name(ndns, disk->disk_name);
 	set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
diff --git a/drivers/nvdimm/pmem.h b/drivers/nvdimm/pmem.h
index 7f4dbd7..1836c7b 100644
--- a/drivers/nvdimm/pmem.h
+++ b/drivers/nvdimm/pmem.h
@@ -4,6 +4,7 @@
 #include <linux/types.h>
 #include <linux/pfn_t.h>
 #include <linux/fs.h>
+#include <linux/blk-mq.h>
 
 /* this definition is in it's own header for tools/testing/nvdimm to consume */
 struct pmem_device {
@@ -20,6 +21,8 @@ struct pmem_device {
 	struct badblocks	bb;
 	struct dax_device	*dax_dev;
 	struct gendisk		*disk;
+	struct blk_mq_tag_set	tag_set;
+	struct request_queue	*q;
 };
 
 long __pmem_direct_access(struct pmem_device *pmem, pgoff_t pgoff,


* [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-02 18:40 [PATCH v2 0/5] Adding blk-mq and DMA support to pmem block driver Dave Jiang
                   ` (3 preceding siblings ...)
  2017-08-02 18:41 ` [PATCH v2 4/5] libnvdimm: Adding blk-mq support to the pmem driver Dave Jiang
@ 2017-08-02 18:41 ` Dave Jiang
  2017-08-02 19:22   ` Sinan Kaya
  2017-08-03 20:20   ` Ross Zwisler
  4 siblings, 2 replies; 28+ messages in thread
From: Dave Jiang @ 2017-08-02 18:41 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams; +Cc: dmaengine, linux-nvdimm

Add DMA support for pmem blk-mq reads and writes. This provides a
significant reduction in CPU utilization for large transfers while
maintaining good performance. DMA is triggered by a test against
bio_multiple_segments(), so small I/Os (4k or less) are still performed
by the CPU in order to reduce latency. By default the pmem driver now
uses blk-mq with DMA.

Numbers below were measured against pmem simulated via DRAM using
memmap=NN!SS. The DMA engine used is the ioatdma on an Intel Skylake
Xeon platform. Keep in mind that performance on actual persistent
memory will differ.
Fio 2.21 was used.

64k: 1 task queuedepth=1
CPU Read:  7631 MB/s  99.7% CPU    DMA Read:  2415 MB/s  54% CPU
CPU Write: 3552 MB/s  100% CPU     DMA Write: 2173 MB/s  54% CPU

64k: 16 tasks queuedepth=16
CPU Read:  36800 MB/s  1593% CPU   DMA Read:  29100 MB/s  607% CPU
CPU Write: 20900 MB/s  1589% CPU   DMA Write: 23400 MB/s  585% CPU

2M: 1 task queuedepth=1
CPU Read:  6013 MB/s  99.3% CPU    DMA Read:  7986 MB/s  59.3% CPU
CPU Write: 3579 MB/s  100% CPU     DMA Write: 5211 MB/s  58.3% CPU

2M: 16 tasks queuedepth=16
CPU Read:  18100 MB/s 1588% CPU    DMA Read:  21300 MB/s 180.9% CPU
CPU Write: 14100 MB/s 1594% CPU    DMA Write: 20400 MB/s 446.9% CPU
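
The split between the CPU and DMA paths above can be sketched as a
simple dispatch decision (userspace model; choose_path and the enum are
invented here, while the real check lives in pmem_queue_rq()):

```c
#include <stdbool.h>

enum path { PATH_CPU, PATH_DMA };

/*
 * Offload to the DMA engine only when a channel is available and the
 * request spans more than one segment; otherwise take the CPU path so
 * small I/Os keep their low latency.
 */
static enum path choose_path(bool have_chan, int nr_segments)
{
	if (have_chan && nr_segments > 1)	/* like bio_multiple_segments() */
		return PATH_DMA;
	return PATH_CPU;
}
```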

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
---
 drivers/nvdimm/pmem.c |  214 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 204 insertions(+), 10 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 98e752f..c8f7a2f 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -32,6 +32,8 @@
 #include <linux/dax.h>
 #include <linux/nd.h>
 #include <linux/blk-mq.h>
+#include <linux/dmaengine.h>
+#include <linux/dma-mapping.h>
 #include "pmem.h"
 #include "pfn.h"
 #include "nd.h"
@@ -41,12 +43,24 @@ enum {
 	PMEM_Q_MQ = 1,
 };
 
-static int queue_mode = PMEM_Q_BIO;
+static int queue_mode = PMEM_Q_MQ;
 module_param(queue_mode, int, 0444);
-MODULE_PARM_DESC(queue_mode, "Pmem Queue Mode (0=BIO, 1=BLK-MQ)");
+MODULE_PARM_DESC(queue_mode, "Pmem Queue Mode (0=BIO, 1=BLK-MQ & DMA)");
+
+static int queue_depth = 128;
+module_param(queue_depth, int, 0444);
+MODULE_PARM_DESC(queue_depth, "I/O Queue Depth for multi queue mode");
+
+/* typically maps to number of DMA channels/devices per socket */
+static int q_per_node = 8;
+module_param(q_per_node, int, 0444);
+MODULE_PARM_DESC(q_per_node, "Hardware queues per node");
 
 struct pmem_cmd {
 	struct request *rq;
+	struct dma_chan *chan;
+	int sg_nents;
+	struct scatterlist sg[];
 };
 
 static struct device *to_dev(struct pmem_device *pmem)
@@ -277,6 +291,159 @@ static void pmem_release_disk(void *__pmem)
 	put_disk(pmem->disk);
 }
 
+static void nd_pmem_dma_callback(void *data,
+		const struct dmaengine_result *res)
+{
+	struct pmem_cmd *cmd = data;
+	struct request *req = cmd->rq;
+	struct request_queue *q = req->q;
+	struct pmem_device *pmem = q->queuedata;
+	struct nd_region *nd_region = to_region(pmem);
+	struct device *dev = to_dev(pmem);
+	int rc = 0;
+
+	if (res) {
+		enum dmaengine_tx_result dma_err = res->result;
+
+		switch (dma_err) {
+		case DMA_TRANS_READ_FAILED:
+		case DMA_TRANS_WRITE_FAILED:
+		case DMA_TRANS_ABORTED:
+			dev_dbg(dev, "bio failed\n");
+			rc = -ENXIO;
+			break;
+		case DMA_TRANS_NOERROR:
+		default:
+			break;
+		}
+	}
+
+	if (req->cmd_flags & REQ_FUA)
+		nvdimm_flush(nd_region);
+
+	blk_mq_end_request(cmd->rq, rc);
+}
+
+static int pmem_handle_cmd_dma(struct pmem_cmd *cmd, bool is_write)
+{
+	struct request *req = cmd->rq;
+	struct request_queue *q = req->q;
+	struct pmem_device *pmem = q->queuedata;
+	struct device *dev = to_dev(pmem);
+	phys_addr_t pmem_off = blk_rq_pos(req) * 512 + pmem->data_offset;
+	void *pmem_addr = pmem->virt_addr + pmem_off;
+	struct nd_region *nd_region = to_region(pmem);
+	size_t len;
+	struct dma_device *dma = cmd->chan->device;
+	struct dmaengine_unmap_data *unmap;
+	dma_cookie_t cookie;
+	struct dma_async_tx_descriptor *txd;
+	struct page *page;
+	unsigned int off;
+	int rc;
+	enum dma_data_direction dir;
+	dma_addr_t dma_addr;
+
+	if (req->cmd_flags & REQ_FLUSH)
+		nvdimm_flush(nd_region);
+
+	unmap = dmaengine_get_unmap_data(dma->dev, 2, GFP_NOWAIT);
+	if (!unmap) {
+		dev_dbg(dev, "failed to get dma unmap data\n");
+		rc = -ENOMEM;
+		goto err;
+	}
+
+	/*
+	 * When reading from pmem we write to the scatterlist, and when
+	 * writing to pmem we read from the scatterlist.
+	 */
+	dir = is_write ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+	cmd->sg_nents = blk_rq_map_sg(req->q, req, cmd->sg);
+	if (cmd->sg_nents < 1) {
+		rc = -EINVAL;
+		goto err;
+	}
+
+	if (cmd->sg_nents > 128) {
+		rc = -ENOMEM;
+		dev_warn(dev, "Number of sg greater than allocated\n");
+		goto err;
+	}
+
+	rc = dma_map_sg(dma->dev, cmd->sg, cmd->sg_nents, dir);
+	if (rc < 1) {
+		rc = -ENXIO;
+		goto err;
+	}
+
+	len = blk_rq_payload_bytes(req);
+	page = virt_to_page(pmem_addr);
+	off = offset_in_page(pmem_addr);
+	dir = is_write ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+	dma_addr = dma_map_page(dma->dev, page, off, len, dir);
+	if (dma_mapping_error(dma->dev, dma_addr)) {
+		dev_dbg(dma->dev, "src DMA mapping error\n");
+		rc = -ENXIO;
+		goto err_unmap_sg;
+	}
+
+	unmap->len = len;
+
+	if (is_write) {
+		unmap->addr[0] = dma_addr;
+		unmap->addr[1] = (dma_addr_t)cmd->sg;
+		unmap->to_cnt = 1;
+		unmap->from_cnt = 0;
+		dma_unmap_data_sg_from_nents(unmap, 2) = cmd->sg_nents;
+	} else {
+		unmap->addr[0] = (dma_addr_t)cmd->sg;
+		unmap->addr[1] = dma_addr;
+		unmap->from_cnt = 1;
+		unmap->to_cnt = 0;
+		dma_unmap_data_sg_to_nents(unmap, 2) = cmd->sg_nents;
+	}
+
+	txd = dma->device_prep_dma_memcpy_sg(cmd->chan,
+				cmd->sg, cmd->sg_nents, dma_addr,
+				!is_write, DMA_PREP_INTERRUPT);
+	if (!txd) {
+		dev_dbg(dma->dev, "dma prep failed\n");
+		rc = -ENXIO;
+		goto err_unmap_buffer;
+	}
+
+	txd->callback_result = nd_pmem_dma_callback;
+	txd->callback_param = cmd;
+	dma_set_unmap(txd, unmap);
+	cookie = dmaengine_submit(txd);
+	if (dma_submit_error(cookie)) {
+		dev_dbg(dma->dev, "dma submit error\n");
+		rc = -ENXIO;
+		goto err_set_unmap;
+	}
+
+	dmaengine_unmap_put(unmap);
+	dma_async_issue_pending(cmd->chan);
+
+	return 0;
+
+err_set_unmap:
+	dmaengine_unmap_put(unmap);
+err_unmap_buffer:
+	dma_unmap_page(dma->dev, dma_addr, len, dir);
+err_unmap_sg:
+	if (dir == DMA_TO_DEVICE)
+		dir = DMA_FROM_DEVICE;
+	else
+		dir = DMA_TO_DEVICE;
+	dma_unmap_sg(dma->dev, cmd->sg, cmd->sg_nents, dir);
+	dmaengine_unmap_put(unmap);
+err:
+	blk_mq_end_request(cmd->rq, -ENXIO);
+	return rc;
+}
+
 static int pmem_handle_cmd(struct pmem_cmd *cmd)
 {
 	struct request *req = cmd->rq;
@@ -310,12 +477,18 @@ static int pmem_queue_rq(struct blk_mq_hw_ctx *hctx,
 		const struct blk_mq_queue_data *bd)
 {
 	struct pmem_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
+	struct request *req;
+	int rc;
 
-	cmd->rq = bd->rq;
-
-	blk_mq_start_request(bd->rq);
+	req = cmd->rq = bd->rq;
+	cmd->chan = dma_find_channel(DMA_MEMCPY);
+	blk_mq_start_request(req);
 
-	if (pmem_handle_cmd(cmd) < 0)
+	if (cmd->chan && bio_multiple_segments(req->bio))
+		rc = pmem_handle_cmd_dma(cmd, op_is_write(req_op(req)));
+	else
+		rc = pmem_handle_cmd(cmd);
+	if (rc < 0)
 		return BLK_MQ_RQ_QUEUE_ERROR;
 	else
 		return BLK_MQ_RQ_QUEUE_OK;
@@ -341,6 +514,7 @@ static int pmem_attach_disk(struct device *dev,
 	struct gendisk *disk;
 	void *addr;
 	int rc;
+	struct dma_chan *chan;
 
 	/* while nsio_rw_bytes is active, parse a pfn info block if present */
 	if (is_nd_pfn(dev)) {
@@ -370,11 +544,20 @@ static int pmem_attach_disk(struct device *dev,
 	}
 
 	if (queue_mode == PMEM_Q_MQ) {
+		chan = dma_find_channel(DMA_MEMCPY);
+		if (!chan) {
+			queue_mode = PMEM_Q_BIO;
+			dev_warn(dev, "Forced back to PMEM_Q_BIO, no DMA\n");
+		}
+	}
+
+	if (queue_mode == PMEM_Q_MQ) {
 		pmem->tag_set.ops = &pmem_mq_ops;
-		pmem->tag_set.nr_hw_queues = nr_online_nodes;
-		pmem->tag_set.queue_depth = 64;
+		pmem->tag_set.nr_hw_queues = nr_online_nodes * q_per_node;
+		pmem->tag_set.queue_depth = queue_depth;
 		pmem->tag_set.numa_node = dev_to_node(dev);
-		pmem->tag_set.cmd_size = sizeof(struct pmem_cmd);
+		pmem->tag_set.cmd_size = sizeof(struct pmem_cmd) +
+			sizeof(struct scatterlist) * 128;
 		pmem->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
 		pmem->tag_set.driver_data = pmem;
 
@@ -437,7 +620,11 @@ static int pmem_attach_disk(struct device *dev,
 
 	blk_queue_write_cache(pmem->q, true, true);
 	blk_queue_physical_block_size(pmem->q, PAGE_SIZE);
-	blk_queue_max_hw_sectors(pmem->q, UINT_MAX);
+	if (queue_mode == PMEM_Q_MQ) {
+		blk_queue_max_hw_sectors(pmem->q, 0x200000);
+		blk_queue_max_segments(pmem->q, 128);
+	} else
+		blk_queue_max_hw_sectors(pmem->q, UINT_MAX);
 	blk_queue_bounce_limit(pmem->q, BLK_BOUNCE_ANY);
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, pmem->q);
 	queue_flag_set_unlocked(QUEUE_FLAG_DAX, pmem->q);
@@ -576,15 +763,22 @@ static struct nd_device_driver nd_pmem_driver = {
 
 static int __init pmem_init(void)
 {
+	if (queue_mode == PMEM_Q_MQ)
+		dmaengine_get();
+
 	return nd_driver_register(&nd_pmem_driver);
 }
 module_init(pmem_init);
 
 static void pmem_exit(void)
 {
+	if (queue_mode == PMEM_Q_MQ)
+		dmaengine_put();
+
 	driver_unregister(&nd_pmem_driver.drv);
 }
 module_exit(pmem_exit);
 
+MODULE_SOFTDEP("pre: dmaengine");
 MODULE_AUTHOR("Ross Zwisler <ross.zwisler@linux.intel.com>");
 MODULE_LICENSE("GPL v2");


* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-02 18:41 ` [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq Dave Jiang
@ 2017-08-02 19:22   ` Sinan Kaya
  2017-08-02 20:52     ` Dave Jiang
  2017-08-03 20:20   ` Ross Zwisler
  1 sibling, 1 reply; 28+ messages in thread
From: Sinan Kaya @ 2017-08-02 19:22 UTC (permalink / raw)
  To: Dave Jiang, vinod.koul, dan.j.williams; +Cc: dmaengine, linux-nvdimm

On 8/2/2017 2:41 PM, Dave Jiang wrote:
>  	if (queue_mode == PMEM_Q_MQ) {
> +		chan = dma_find_channel(DMA_MEMCPY);
> +		if (!chan) {
> +			queue_mode = PMEM_Q_BIO;
> +			dev_warn(dev, "Forced back to PMEM_Q_BIO, no DMA\n");
> +		}

We can't expect all MEMCPY capable hardware to support this feature, right?

Do we need a new API / new function, or new capability?

> +	}
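
One hedged way to act on Sinan's point (a sketch only — DMA_MEMCPY_SG is a hypothetical capability bit that does not exist in dmaengine at this point in the thread, and this fragment assumes kernel context, so it is not compilable standalone) would be for the driver to request exactly the operation it needs, instead of assuming that every DMA_MEMCPY-capable channel can also do buffer-to-scatterlist copies:

```c
	/* Sketch: DMA_MEMCPY_SG is hypothetical here.  With a dedicated
	 * capability bit, dma_find_channel() would only return channels
	 * whose drivers actually implement the sg memcpy operation, and
	 * the fallback below would trigger for everything else. */
	if (queue_mode == PMEM_Q_MQ) {
		chan = dma_find_channel(DMA_MEMCPY_SG);
		if (!chan) {
			queue_mode = PMEM_Q_BIO;
			dev_warn(dev,
				 "no MEMCPY_SG-capable DMA channel, using PMEM_Q_BIO\n");
		}
	}
```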


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-02 19:22   ` Sinan Kaya
@ 2017-08-02 20:52     ` Dave Jiang
  2017-08-02 21:10       ` Sinan Kaya
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Jiang @ 2017-08-02 20:52 UTC (permalink / raw)
  To: Sinan Kaya, Koul, Vinod, Williams, Dan J; +Cc: dmaengine, linux-nvdimm



On 08/02/2017 12:22 PM, Sinan Kaya wrote:
> On 8/2/2017 2:41 PM, Dave Jiang wrote:
>>  	if (queue_mode == PMEM_Q_MQ) {
>> +		chan = dma_find_channel(DMA_MEMCPY);
>> +		if (!chan) {
>> +			queue_mode = PMEM_Q_BIO;
>> +			dev_warn(dev, "Forced back to PMEM_Q_BIO, no DMA\n");
>> +		}
> 
> We can't expect all MEMCPY capable hardware to support this feature, right?
> 
> Do we need a new API / new function, or new capability?

Hmmm...you are right. I wonder if we need something like DMA_SG cap....


> 
>> +	}
> 
> 

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-02 20:52     ` Dave Jiang
@ 2017-08-02 21:10       ` Sinan Kaya
  2017-08-02 21:13         ` Dave Jiang
  0 siblings, 1 reply; 28+ messages in thread
From: Sinan Kaya @ 2017-08-02 21:10 UTC (permalink / raw)
  To: Dave Jiang, Koul, Vinod, Williams, Dan J; +Cc: dmaengine, linux-nvdimm

On 8/2/2017 4:52 PM, Dave Jiang wrote:
>> Do we need a new API / new function, or new capability?
> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
> 
> 

Unfortunately, DMA_SG means something else. Maybe we need DMA_MEMCPY_SG,
similar to DMA_MEMSET_SG.

enum dma_transaction_type {
	DMA_MEMCPY,
	DMA_XOR,
	DMA_PQ,
	DMA_XOR_VAL,
	DMA_PQ_VAL,
	DMA_MEMSET,
	DMA_MEMSET_SG,
	DMA_INTERRUPT,
	DMA_SG,
	DMA_PRIVATE,
	DMA_ASYNC_TX,
	DMA_SLAVE,
	DMA_CYCLIC,
	DMA_INTERLEAVE,
/* last transaction type for creation of the capabilities mask */
	DMA_TX_TYPE_END,
};
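
A minimal sketch of what the proposed addition might look like (hypothetical: neither the enum value nor the callback below exists upstream at this point in the thread; the callback shape simply mirrors the existing device_prep_dma_memset_sg()):

```c
enum dma_transaction_type {
	/* ... existing entries ... */
	DMA_MEMSET_SG,
	DMA_MEMCPY_SG,	/* proposed: flat buffer <-> scatterlist copy */
	/* ... */
	DMA_TX_TYPE_END,
};

/* A matching prep callback in struct dma_device might then look like
 * (assumed shape, modeled on device_prep_dma_memset_sg()): */
struct dma_async_tx_descriptor *(*device_prep_dma_memcpy_sg)(
		struct dma_chan *chan, struct scatterlist *sg,
		unsigned int sg_nents, dma_addr_t dma_addr,
		bool to_sg, unsigned long flags);
```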

-- 
Sinan Kaya

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-02 21:10       ` Sinan Kaya
@ 2017-08-02 21:13         ` Dave Jiang
  2017-08-03  5:01           ` Vinod Koul
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Jiang @ 2017-08-02 21:13 UTC (permalink / raw)
  To: Sinan Kaya, Koul, Vinod, Williams, Dan J; +Cc: dmaengine, linux-nvdimm



On 08/02/2017 02:10 PM, Sinan Kaya wrote:
> On 8/2/2017 4:52 PM, Dave Jiang wrote:
>>> Do we need a new API / new function, or new capability?
>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>>
>>
> 
> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> to be similar with DMA_MEMSET_SG.

I'm ok with that if Vinod is.

> 
> enum dma_transaction_type {
> 	DMA_MEMCPY,
> 	DMA_XOR,
> 	DMA_PQ,
> 	DMA_XOR_VAL,
> 	DMA_PQ_VAL,
> 	DMA_MEMSET,
> 	DMA_MEMSET_SG,
> 	DMA_INTERRUPT,
> 	DMA_SG,
> 	DMA_PRIVATE,
> 	DMA_ASYNC_TX,
> 	DMA_SLAVE,
> 	DMA_CYCLIC,
> 	DMA_INTERLEAVE,
> /* last transaction type for creation of the capabilities mask */
> 	DMA_TX_TYPE_END,
> };
> 

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-02 21:13         ` Dave Jiang
@ 2017-08-03  5:01           ` Vinod Koul
  2017-08-03  5:11             ` Jiang, Dave
  0 siblings, 1 reply; 28+ messages in thread
From: Vinod Koul @ 2017-08-03  5:01 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Sinan Kaya, dmaengine, linux-nvdimm

On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
> 
> 
> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
> > On 8/2/2017 4:52 PM, Dave Jiang wrote:
> >>> Do we need a new API / new function, or new capability?
> >> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
> >>
> >>
> > 
> > Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> > to be similar with DMA_MEMSET_SG.
> 
> I'm ok with that if Vinod is.

So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
or all :). We should have done bitfields for this though...

> 
> > 
> > enum dma_transaction_type {
> > 	DMA_MEMCPY,
> > 	DMA_XOR,
> > 	DMA_PQ,
> > 	DMA_XOR_VAL,
> > 	DMA_PQ_VAL,
> > 	DMA_MEMSET,
> > 	DMA_MEMSET_SG,
> > 	DMA_INTERRUPT,
> > 	DMA_SG,
> > 	DMA_PRIVATE,
> > 	DMA_ASYNC_TX,
> > 	DMA_SLAVE,
> > 	DMA_CYCLIC,
> > 	DMA_INTERLEAVE,
> > /* last transaction type for creation of the capabilities mask */
> > 	DMA_TX_TYPE_END,
> > };
> > 

-- 
~Vinod

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03  5:01           ` Vinod Koul
@ 2017-08-03  5:11             ` Jiang, Dave
  2017-08-03  5:28               ` Vinod Koul
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang, Dave @ 2017-08-03  5:11 UTC (permalink / raw)
  To: Koul, Vinod; +Cc: Sinan Kaya, dmaengine, linux-nvdimm



> On Aug 2, 2017, at 9:58 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> 
>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
>> 
>> 
>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
>>>>> Do we need a new API / new function, or new capability?
>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>>>> 
>>>> 
>>> 
>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>>> to be similar with DMA_MEMSET_SG.
>> 
>> I'm ok with that if Vinod is.
> 
> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
> or all :). We should have done bitfields for this though...

Add DMA_MEMCPY_SG to transaction type. 

> 
>> 
>>> 
>>> enum dma_transaction_type {
>>>    DMA_MEMCPY,
>>>    DMA_XOR,
>>>    DMA_PQ,
>>>    DMA_XOR_VAL,
>>>    DMA_PQ_VAL,
>>>    DMA_MEMSET,
>>>    DMA_MEMSET_SG,
>>>    DMA_INTERRUPT,
>>>    DMA_SG,
>>>    DMA_PRIVATE,
>>>    DMA_ASYNC_TX,
>>>    DMA_SLAVE,
>>>    DMA_CYCLIC,
>>>    DMA_INTERLEAVE,
>>> /* last transaction type for creation of the capabilities mask */
>>>    DMA_TX_TYPE_END,
>>> };
>>> 
> 
> -- 
> ~Vinod

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03  5:11             ` Jiang, Dave
@ 2017-08-03  5:28               ` Vinod Koul
  2017-08-03  5:36                 ` Jiang, Dave
  0 siblings, 1 reply; 28+ messages in thread
From: Vinod Koul @ 2017-08-03  5:28 UTC (permalink / raw)
  To: Jiang, Dave; +Cc: Sinan Kaya, dmaengine, linux-nvdimm

On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
> 
> 
> > On Aug 2, 2017, at 9:58 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> > 
> >> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
> >> 
> >> 
> >>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
> >>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
> >>>>> Do we need a new API / new function, or new capability?
> >>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
> >>>> 
> >>>> 
> >>> 
> >>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> >>> to be similar with DMA_MEMSET_SG.
> >> 
> >> I'm ok with that if Vinod is.
> > 
> > So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
> > or all :). We should have done bitfields for this though...
> 
> Add DMA_MEMCPY_SG to transaction type. 

Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
scatterlist to scatterlist copy which is used to check for
device_prep_dma_sg() calls

-- 
~Vinod

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03  5:28               ` Vinod Koul
@ 2017-08-03  5:36                 ` Jiang, Dave
  2017-08-03  8:59                   ` Vinod Koul
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang, Dave @ 2017-08-03  5:36 UTC (permalink / raw)
  To: Koul, Vinod; +Cc: Sinan Kaya, dmaengine, linux-nvdimm



> On Aug 2, 2017, at 10:25 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> 
>> On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
>> 
>> 
>>>> On Aug 2, 2017, at 9:58 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
>>>> 
>>>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
>>>> 
>>>> 
>>>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
>>>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
>>>>>>> Do we need a new API / new function, or new capability?
>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>>>>>> 
>>>>>> 
>>>>> 
>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>>>>> to be similar with DMA_MEMSET_SG.
>>>> 
>>>> I'm ok with that if Vinod is.
>>> 
>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
>>> or all :). We should have done bitfields for this though...
>> 
>> Add DMA_MEMCPY_SG to transaction type. 
> 
> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
> scatterlist to scatterlist copy which is used to check for
> device_prep_dma_sg() calls
> 
Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So we need something separate from what DMA_SG is used for.


> -- 
> ~Vinod

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03  5:36                 ` Jiang, Dave
@ 2017-08-03  8:59                   ` Vinod Koul
  2017-08-03 14:36                     ` Jiang, Dave
  0 siblings, 1 reply; 28+ messages in thread
From: Vinod Koul @ 2017-08-03  8:59 UTC (permalink / raw)
  To: Jiang, Dave; +Cc: Sinan Kaya, dmaengine, linux-nvdimm

On Thu, Aug 03, 2017 at 11:06:13AM +0530, Jiang, Dave wrote:
> 
> 
> > On Aug 2, 2017, at 10:25 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> > 
> >> On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
> >> 
> >> 
> >>>> On Aug 2, 2017, at 9:58 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> >>>> 
> >>>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
> >>>> 
> >>>> 
> >>>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
> >>>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
> >>>>>>> Do we need a new API / new function, or new capability?
> >>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
> >>>>>> 
> >>>>>> 
> >>>>> 
> >>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> >>>>> to be similar with DMA_MEMSET_SG.
> >>>> 
> >>>> I'm ok with that if Vinod is.
> >>> 
> >>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
> >>> or all :). We should have done bitfields for this though...
> >> 
> >> Add DMA_MEMCPY_SG to transaction type. 
> > 
> > Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
> > scatterlist to scatterlist copy which is used to check for
> > device_prep_dma_sg() calls
> > 
> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
> we need something separate than what DMA_SG is used for. 

Hmm, it's SG-to-buffer and it's memcpy, so should we call it DMA_SG_BUFFER?
Since it is not memset (or is it?) I would not call it memset. Or maybe we
should also change DMA_SG to DMA_SG_SG to make it terribly clear :D

-- 
~Vinod

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03  8:59                   ` Vinod Koul
@ 2017-08-03 14:36                     ` Jiang, Dave
  2017-08-03 15:55                       ` Vinod Koul
  0 siblings, 1 reply; 28+ messages in thread
From: Jiang, Dave @ 2017-08-03 14:36 UTC (permalink / raw)
  To: Koul, Vinod; +Cc: Sinan Kaya, dmaengine, linux-nvdimm



> On Aug 3, 2017, at 1:56 AM, Koul, Vinod <vinod.koul@intel.com> wrote:
> 
>> On Thu, Aug 03, 2017 at 11:06:13AM +0530, Jiang, Dave wrote:
>> 
>> 
>>>> On Aug 2, 2017, at 10:25 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
>>>> 
>>>> On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
>>>> 
>>>> 
>>>>>> On Aug 2, 2017, at 9:58 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
>>>>>> 
>>>>>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
>>>>>> 
>>>>>> 
>>>>>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
>>>>>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
>>>>>>>>> Do we need a new API / new function, or new capability?
>>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>>>>>>> to be similar with DMA_MEMSET_SG.
>>>>>> 
>>>>>> I'm ok with that if Vinod is.
>>>>> 
>>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
>>>>> or all :). We should have done bitfields for this though...
>>>> 
>>>> Add DMA_MEMCPY_SG to transaction type. 
>>> 
>>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
>>> scatterlist to scatterlist copy which is used to check for
>>> device_prep_dma_sg() calls
>>> 
>> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
>> we need something separate than what DMA_SG is used for. 
> 
> Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
> since it is not memset (or is it) I would not call it memset, or maybe we
> should also change DMA_SG to DMA_SG_SG to make it terribly clear :D

I can create patches for both. 

> 
> -- 
> ~Vinod

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03 14:36                     ` Jiang, Dave
@ 2017-08-03 15:55                       ` Vinod Koul
  2017-08-03 16:14                         ` Dan Williams
  0 siblings, 1 reply; 28+ messages in thread
From: Vinod Koul @ 2017-08-03 15:55 UTC (permalink / raw)
  To: Jiang, Dave; +Cc: Sinan Kaya, dmaengine, linux-nvdimm

On Thu, Aug 03, 2017 at 08:06:07PM +0530, Jiang, Dave wrote:
> 
> 
> > On Aug 3, 2017, at 1:56 AM, Koul, Vinod <vinod.koul@intel.com> wrote:
> > 
> >> On Thu, Aug 03, 2017 at 11:06:13AM +0530, Jiang, Dave wrote:
> >> 
> >> 
> >>>> On Aug 2, 2017, at 10:25 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> >>>> 
> >>>> On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
> >>>> 
> >>>> 
> >>>>>> On Aug 2, 2017, at 9:58 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> >>>>>> 
> >>>>>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
> >>>>>> 
> >>>>>> 
> >>>>>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
> >>>>>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
> >>>>>>>>> Do we need a new API / new function, or new capability?
> >>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>> 
> >>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> >>>>>>> to be similar with DMA_MEMSET_SG.
> >>>>>> 
> >>>>>> I'm ok with that if Vinod is.
> >>>>> 
> >>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
> >>>>> or all :). We should have done bitfields for this though...
> >>>> 
> >>>> Add DMA_MEMCPY_SG to transaction type. 
> >>> 
> >>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
> >>> scatterlist to scatterlist copy which is used to check for
> >>> device_prep_dma_sg() calls
> >>> 
> >> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
> >> we need something separate than what DMA_SG is used for. 
> > 
> > Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
> > since it is not memset (or is it) I would not call it memset, or maybe we
> > should also change DMA_SG to DMA_SG_SG to make it terribly clear :D
> 
> I can create patches for both. 

Great, anyone who disagrees or can give better names :)

-- 
~Vinod

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03 15:55                       ` Vinod Koul
@ 2017-08-03 16:14                         ` Dan Williams
  2017-08-03 17:07                           ` Dave Jiang
  2017-08-16 16:50                           ` Vinod Koul
  0 siblings, 2 replies; 28+ messages in thread
From: Dan Williams @ 2017-08-03 16:14 UTC (permalink / raw)
  To: Vinod Koul; +Cc: Sinan Kaya, dmaengine, linux-nvdimm

On Thu, Aug 3, 2017 at 8:55 AM, Vinod Koul <vinod.koul@intel.com> wrote:
> On Thu, Aug 03, 2017 at 08:06:07PM +0530, Jiang, Dave wrote:
>>
>>
>> > On Aug 3, 2017, at 1:56 AM, Koul, Vinod <vinod.koul@intel.com> wrote:
>> >
>> >> On Thu, Aug 03, 2017 at 11:06:13AM +0530, Jiang, Dave wrote:
>> >>
>> >>
>> >>>> On Aug 2, 2017, at 10:25 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
>> >>>>
>> >>>> On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
>> >>>>
>> >>>>
>> >>>>>> On Aug 2, 2017, at 9:58 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
>> >>>>>>
>> >>>>>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
>> >>>>>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
>> >>>>>>>>> Do we need a new API / new function, or new capability?
>> >>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>> >>>>>>> to be similar with DMA_MEMSET_SG.
>> >>>>>>
>> >>>>>> I'm ok with that if Vinod is.
>> >>>>>
>> >>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
>> >>>>> or all :). We should have done bitfields for this though...
>> >>>>
>> >>>> Add DMA_MEMCPY_SG to transaction type.
>> >>>
>> >>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
>> >>> scatterlist to scatterlist copy which is used to check for
>> >>> device_prep_dma_sg() calls
>> >>>
>> >> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
>> >> we need something separate than what DMA_SG is used for.
>> >
>> > Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
>> > since it is not memset (or is it) I would not call it memset, or maybe we
>> > should also change DMA_SG to DMA_SG_SG to make it terribly clear :D
>>
>> I can create patches for both.
>
> Great, anyone who disagrees or can give better names :)
>

All my suggestions would involve a lot more work. If we had infinite
time we'd stop with the per-operation-type entry points and make this
look like a typical driver sub-system that takes commands like
block-devices or usb, but perhaps that ship has sailed.

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03 16:14                         ` Dan Williams
@ 2017-08-03 17:07                           ` Dave Jiang
  2017-08-03 18:35                             ` Allen Hubbe
  2017-08-16 16:50                           ` Vinod Koul
  1 sibling, 1 reply; 28+ messages in thread
From: Dave Jiang @ 2017-08-03 17:07 UTC (permalink / raw)
  To: Dan Williams, Koul, Vinod
  Cc: Sinan Kaya, dmaengine, Hubbe, Allen  <Allen.Hubbe@dell.com>,
	linux-nvdimm@lists.01.org



On 08/03/2017 09:14 AM, Dan Williams wrote:
> On Thu, Aug 3, 2017 at 8:55 AM, Vinod Koul <vinod.koul@intel.com> wrote:
>> On Thu, Aug 03, 2017 at 08:06:07PM +0530, Jiang, Dave wrote:
>>>
>>>
>>>> On Aug 3, 2017, at 1:56 AM, Koul, Vinod <vinod.koul@intel.com> wrote:
>>>>
>>>>> On Thu, Aug 03, 2017 at 11:06:13AM +0530, Jiang, Dave wrote:
>>>>>
>>>>>
>>>>>>> On Aug 2, 2017, at 10:25 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
>>>>>>>
>>>>>>> On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
>>>>>>>
>>>>>>>
>>>>>>>>> On Aug 2, 2017, at 9:58 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
>>>>>>>>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
>>>>>>>>>>>> Do we need a new API / new function, or new capability?
>>>>>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>>>>>>>>>> to be similar with DMA_MEMSET_SG.
>>>>>>>>>
>>>>>>>>> I'm ok with that if Vinod is.
>>>>>>>>
>>>>>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
>>>>>>>> or all :). We should have done bitfields for this though...
>>>>>>>
>>>>>>> Add DMA_MEMCPY_SG to transaction type.
>>>>>>
>>>>>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
>>>>>> scatterlist to scatterlist copy which is used to check for
>>>>>> device_prep_dma_sg() calls
>>>>>>
>>>>> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
>>>>> we need something separate than what DMA_SG is used for.
>>>>
>>>> Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
>>>> since it is not memset (or is it) I would not call it memset, or maybe we
>>>> should also change DMA_SG to DMA_SG_SG to make it terribly clear :D
>>>
>>> I can create patches for both.
>>
>> Great, anyone who disagrees or can give better names :)
>>
> 
> All my suggestions would involve a lot more work. If we had infinite
> time we'd stop with the per-operation-type entry points and make this
> look like a typical driver sub-system that takes commands like
> block-devices or usb, but perhaps that ship has sailed.
> 

Allen, isn't this what we were just talking about on IRC yesterday?

<allenbh> I dislike prep_tx grabbing a device-specific descriptor.  The
device really only needs as many device-specific descriptors as will keep
the hw busy, and just let the logical layer queue up logical descriptors
until hw descriptors become available.  Let the client allocate
descriptors to submit, and hw driver can translate to a
hardware-specific descriptor at the last moment before notifying the hw.

* RE: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03 17:07                           ` Dave Jiang
@ 2017-08-03 18:35                             ` Allen Hubbe
  0 siblings, 0 replies; 28+ messages in thread
From: Allen Hubbe @ 2017-08-03 18:35 UTC (permalink / raw)
  To: 'Dave Jiang', 'Dan Williams', 'Koul, Vinod'
  Cc: 'Sinan Kaya', dmaengine, linux-nvdimm

From: Dave Jiang
> On 08/03/2017 09:14 AM, Dan Williams wrote:
> > On Thu, Aug 3, 2017 at 8:55 AM, Vinod Koul <vinod.koul@intel.com> wrote:
> >> On Thu, Aug 03, 2017 at 08:06:07PM +0530, Jiang, Dave wrote:
> >>>> On Aug 3, 2017, at 1:56 AM, Koul, Vinod <vinod.koul@intel.com> wrote:
> >>>>> On Thu, Aug 03, 2017 at 11:06:13AM +0530, Jiang, Dave wrote:
> >>>>>>> On Aug 2, 2017, at 10:25 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> >>>>>>> On Thu, Aug 03, 2017 at 10:41:51AM +0530, Jiang, Dave wrote:
> >>>>>>>>> On Aug 2, 2017, at 9:58 PM, Koul, Vinod <vinod.koul@intel.com> wrote:
> >>>>>>>>> On Wed, Aug 02, 2017 at 02:13:56PM -0700, Dave Jiang wrote:
> >>>>>>>>>> On 08/02/2017 02:10 PM, Sinan Kaya wrote:
> >>>>>>>>>> On 8/2/2017 4:52 PM, Dave Jiang wrote:
> >>>>>>>>>>>> Do we need a new API / new function, or new capability?
> >>>>>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
> >>>>>>>>>>
> >>>>>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> >>>>>>>>>> to be similar with DMA_MEMSET_SG.
> >>>>>>>>>
> >>>>>>>>> I'm ok with that if Vinod is.
> >>>>>>>>
> >>>>>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
> >>>>>>>> or all :). We should have done bitfields for this though...
> >>>>>>>
> >>>>>>> Add DMA_MEMCPY_SG to transaction type.
> >>>>>>
> >>>>>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
> >>>>>> scatterlist to scatterlist copy which is used to check for
> >>>>>> device_prep_dma_sg() calls
> >>>>>>
> >>>>> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
> >>>>> we need something separate than what DMA_SG is used for.
> >>>>
> >>>> Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
> >>>> since it is not memset (or is it) I would not call it memset, or maybe we
> >>>> should also change DMA_SG to DMA_SG_SG to make it terribly clear :D
> >>>
> >>> I can create patches for both.
> >>
> >> Great, anyone who disagrees or can give better names :)
> >
> > All my suggestions would involve a lot more work. If we had infinite
> > time we'd stop with the per-operation-type entry points and make this
> > look like a typical driver sub-system that takes commands like
> > block-devices or usb, but perhaps that ship has sailed.
> 
> Allen, isn't this what we were just talking about on IRC yesterday?
> 
> <allenbh> I dislike prep_tx grabbing a device-specific descriptor.  The
> device really only needs as many device-specific descriptors as will keep
> the hw busy, and just let the logical layer queue up logical descriptors
> until hw descriptors become available.  Let the client allocate
> descriptors to submit, and hw driver can translate to a
> hardware-specific descriptor at the last moment before notifying the hw.

Yeah, that last part, "like a typical driver sub-system that takes commands" +1!

Especially if the client can submit a list of commands.  I have an rdma driver
using the dmaengine api, but it would be more natural to build up a list of
dma operations, just using my driver's own memory and the struct definition of
some abstract dma descriptor, not calling any functions of the dmaengine api.
After building the list of descriptors, submit the entire list all at once to
the hardware driver, or not at all.  Instead, in perf I see a lot of heat on
that spinlock for each individual dma prep in ioat (sorry for picking on you,
Dave, this is the only dma engine hw and driver that I am familiar with).

Also, I think we could glean some more efficiency if the dma completion path
took a hint from napi_schedule() and napi->poll(), instead of scheduling a
tasklet directly in the dmaengine hw driver.  For one thing, it could reduce
the number of interrupts.  Also, coordinating the servicing of dma completions
with the posting of new work, with the schedule determined from the top of the
stack down, could further reduce contention on that lock between dma prep and
cleanup/complete.

http://elixir.free-electrons.com/linux/v4.12.4/source/drivers/dma/ioat/dma.c#L448
http://elixir.free-electrons.com/linux/v4.12.4/source/drivers/dma/ioat/dma.c#L652

Also, I apologize for not offering to do all this work.  If I was ready to
jump in and do it, I would have spoken up earlier.


* Re: [PATCH v2 4/5] libnvdimm: Adding blk-mq support to the pmem driver
  2017-08-02 18:41 ` [PATCH v2 4/5] libnvdimm: Adding blk-mq support to the pmem driver Dave Jiang
@ 2017-08-03 20:04   ` Ross Zwisler
  0 siblings, 0 replies; 28+ messages in thread
From: Ross Zwisler @ 2017-08-03 20:04 UTC (permalink / raw)
  To: Dave Jiang; +Cc: vinod.koul, dmaengine, linux-nvdimm

On Wed, Aug 02, 2017 at 11:41:19AM -0700, Dave Jiang wrote:
> Adding blk-mq support to the pmem driver in addition to the direct bio
> support. This allows for hardware offloading via DMA engines. By default
> the bio method will be enabled. The blk-mq support can be turned on via
> module parameter queue_mode=1.
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

One small nit with error handling.  With that addressed you can add:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

> @@ -303,17 +369,47 @@ static int pmem_attach_disk(struct device *dev,
>  		return -EBUSY;
>  	}
>  
> -	q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
> -	if (!q)
> -		return -ENOMEM;
> +	if (queue_mode == PMEM_Q_MQ) {
> +		pmem->tag_set.ops = &pmem_mq_ops;
> +		pmem->tag_set.nr_hw_queues = nr_online_nodes;
> +		pmem->tag_set.queue_depth = 64;
> +		pmem->tag_set.numa_node = dev_to_node(dev);
> +		pmem->tag_set.cmd_size = sizeof(struct pmem_cmd);
> +		pmem->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
> +		pmem->tag_set.driver_data = pmem;
> +
> +		rc = blk_mq_alloc_tag_set(&pmem->tag_set);
> +		if (rc < 0)
> +			return rc;
> +
> +		pmem->q = blk_mq_init_queue(&pmem->tag_set);
> +		if (IS_ERR(pmem->q)) {
> +			blk_mq_free_tag_set(&pmem->tag_set);
> +			return -ENOMEM;
> +		}
>  
> -	if (devm_add_action_or_reset(dev, pmem_release_queue, q))
> -		return -ENOMEM;
> +		if (devm_add_action_or_reset(dev, pmem_release_queue, pmem)) {

In the failure case for both this devm_add_action_or_reset() and the one a few
lines down in the PMEM_Q_BIO case, I think you should manually call
pmem_release_queue() instead of calling blk_mq_free_tag_set().  This will free
the tag set and it will clean up the queue you just set up with
blk_mq_init_queue() or blk_alloc_queue_node().

> +			blk_mq_free_tag_set(&pmem->tag_set);
> +			return -ENOMEM;
> +		}
> +	} else if (queue_mode == PMEM_Q_BIO) {
> +		pmem->q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
> +		if (!pmem->q)
> +			return -ENOMEM;
> +
> +		if (devm_add_action_or_reset(dev, pmem_release_queue, pmem))

The 2nd case.  This one was like this in the previous code, but should be
fixed unless I'm missing something.

> +			return -ENOMEM;
> +
> +		blk_queue_make_request(pmem->q, pmem_make_request);
> +	} else {
> +		dev_warn(dev, "Invalid queue mode: %d\n", queue_mode);
> +		return -EINVAL;
> +	}
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-02 18:41 ` [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq Dave Jiang
  2017-08-02 19:22   ` Sinan Kaya
@ 2017-08-03 20:20   ` Ross Zwisler
  1 sibling, 0 replies; 28+ messages in thread
From: Ross Zwisler @ 2017-08-03 20:20 UTC (permalink / raw)
  To: Dave Jiang; +Cc: vinod.koul, dmaengine, linux-nvdimm

On Wed, Aug 02, 2017 at 11:41:25AM -0700, Dave Jiang wrote:
> Adding DMA support for pmem blk reads. This provides a significant CPU
> utilization reduction on large memory reads while maintaining good
> performance. DMAs are triggered based on a test against
> bio_multiple_segment(), so small I/Os (4k or less?) are still performed
> by the CPU in order to reduce latency. By default the pmem driver will
> use blk-mq with DMA.
> 
> Numbers below are measured against pmem simulated via DRAM using
> memmap=NN!SS.  DMA engine used is the ioatdma on Intel Skylake Xeon
> platform.  Keep in mind the performance for actual persistent memory
> will differ.
> Fio 2.21 was used.
> 
> 64k: 1 task queuedepth=1
> CPU Read:  7631 MB/s  99.7% CPU    DMA Read: 2415 MB/s  54% CPU
> CPU Write: 3552 MB/s  100% CPU     DMA Write 2173 MB/s  54% CPU
> 
> 64k: 16 tasks queuedepth=16
> CPU Read: 36800 MB/s  1593% CPU    DMA Read:  29100 MB/s  607% CPU
> CPU Write 20900 MB/s  1589% CPU    DMA Write: 23400 MB/s  585% CPU
> 
> 2M: 1 task queuedepth=1
> CPU Read:  6013 MB/s  99.3% CPU    DMA Read:  7986 MB/s  59.3% CPU
> CPU Write: 3579 MB/s  100% CPU     DMA Write: 5211 MB/s  58.3% CPU
> 
> 2M: 16 tasks queuedepth=16
> CPU Read:  18100 MB/s 1588% CPU    DMA Read:  21300 MB/s 180.9% CPU
> CPU Write: 14100 MB/s 1594% CPU    DMA Write: 20400 MB/s 446.9% CPU
> 
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

<>

> +static int pmem_handle_cmd_dma(struct pmem_cmd *cmd, bool is_write)
> +{
...
> +err_set_unmap:
> +	dmaengine_unmap_put(unmap);
> +err_unmap_buffer:
> +	dma_unmap_page(dev, dma_addr, len, dir);
> +err_unmap_sg:
> +	if (dir == DMA_TO_DEVICE)
> +		dir = DMA_FROM_DEVICE;
> +	else
> +		dir = DMA_TO_DEVICE;
> +	dma_unmap_sg(dev, cmd->sg, cmd->sg_nents, dir);
> +	dmaengine_unmap_put(unmap);
> +err:
> +	blk_mq_end_request(cmd->rq, -ENXIO);

Should this be:

	blk_mq_end_request(cmd->rq, rc);

?

Otherwise:

Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>

> +	return rc;
> +}
> +

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-03 16:14                         ` Dan Williams
  2017-08-03 17:07                           ` Dave Jiang
@ 2017-08-16 16:50                           ` Vinod Koul
  2017-08-16 17:06                             ` Dan Williams
  1 sibling, 1 reply; 28+ messages in thread
From: Vinod Koul @ 2017-08-16 16:50 UTC (permalink / raw)
  To: Dan Williams; +Cc: Sinan Kaya, dmaengine, linux-nvdimm

On Thu, Aug 03, 2017 at 09:14:13AM -0700, Dan Williams wrote:
> >> >>>>>>>>> Do we need a new API / new function, or new capability?
> >> >>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
> >> >>>>>>> to be similar with DMA_MEMSET_SG.
> >> >>>>>>
> >> >>>>>> I'm ok with that if Vinod is.
> >> >>>>>
> >> >>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
> >> >>>>> or all :). We should have done bitfields for this though...
> >> >>>>
> >> >>>> Add DMA_MEMCPY_SG to transaction type.
> >> >>>
> >> >>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
> >> >>> scatterlist to scatterlist copy which is used to check for
> >> >>> device_prep_dma_sg() calls
> >> >>>
> >> >> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
> >> >> we need something separate than what DMA_SG is used for.
> >> >
> >> > Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
> >> > since it is not memset (or is it) I would not call it memset, or maybe we
> >> > should also change DMA_SG to DMA_SG_SG to make it terribly clear :D
> >>
> >> I can create patches for both.
> >
> > Great, anyone who disagrees or can give better names :)
> 
> All my suggestions would involve a lot more work. If we had infinite
> time we'd stop with the per-operation-type entry points and make this
> look like a typical driver sub-system that takes commands like
> block-devices or usb, but perhaps that ship has sailed.

Can you elaborate on this :)

I have been thinking about the need to redo the API. So lets discuss :)

-- 
~Vinod

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-16 16:50                           ` Vinod Koul
@ 2017-08-16 17:06                             ` Dan Williams
  2017-08-16 17:16                               ` Dave Jiang
  0 siblings, 1 reply; 28+ messages in thread
From: Dan Williams @ 2017-08-16 17:06 UTC (permalink / raw)
  To: Vinod Koul; +Cc: Sinan Kaya, dmaengine, linux-nvdimm

On Wed, Aug 16, 2017 at 9:50 AM, Vinod Koul <vinod.koul@intel.com> wrote:
> On Thu, Aug 03, 2017 at 09:14:13AM -0700, Dan Williams wrote:
>> >> >>>>>>>>> Do we need a new API / new function, or new capability?
>> >> >>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>> >> >>>>>>> to be similar with DMA_MEMSET_SG.
>> >> >>>>>>
>> >> >>>>>> I'm ok with that if Vinod is.
>> >> >>>>>
>> >> >>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
>> >> >>>>> or all :). We should have done bitfields for this though...
>> >> >>>>
>> >> >>>> Add DMA_MEMCPY_SG to transaction type.
>> >> >>>
>> >> >>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
>> >> >>> scatterlist to scatterlist copy which is used to check for
>> >> >>> device_prep_dma_sg() calls
>> >> >>>
>> >> >> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
>> >> >> we need something separate than what DMA_SG is used for.
>> >> >
>> >> > Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
>> >> > since it is not memset (or is it) I would not call it memset, or maybe we
>> >> > should also change DMA_SG to DMA_SG_SG to make it terribly clear :D
>> >>
>> >> I can create patches for both.
>> >
>> > Great, anyone who disagrees or can give better names :)
>>
>> All my suggestions would involve a lot more work. If we had infinite
>> time we'd stop with the per-operation-type entry points and make this
>> look like a typical driver sub-system that takes commands like
>> block-devices or usb, but perhaps that ship has sailed.
>
> Can you elaborate on this :)
>
> I have been thinking about the need to redo the API. So lets discuss :)

The high level is straightforward; the devil is in the details. Define
a generic dma command object, perhaps 'struct dma_io' (certainly not
'struct dma_async_tx_descriptor'), and have just one entry point
per driver. That 'struct dma_io' would carry a generic command number,
a target address, and a scatterlist. The driver entry point would then
convert the command into the hardware command format and build the
submission queue entry. The basic driving design principle is to turn
all the current function-pointer complexity of the prep_* routines into
data-structure complexity in the common command format.

This trades off some efficiency because now you need to write the
generic command and write the descriptor, but I think if the operation
is worth offloading, those conversion costs must already be in the
noise.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-16 17:06                             ` Dan Williams
@ 2017-08-16 17:16                               ` Dave Jiang
  2017-08-16 17:20                                 ` Dan Williams
  2017-08-18  5:35                                 ` Vinod Koul
  0 siblings, 2 replies; 28+ messages in thread
From: Dave Jiang @ 2017-08-16 17:16 UTC (permalink / raw)
  To: Dan Williams, Koul, Vinod; +Cc: Sinan Kaya, dmaengine, linux-nvdimm



On 08/16/2017 10:06 AM, Dan Williams wrote:
> On Wed, Aug 16, 2017 at 9:50 AM, Vinod Koul <vinod.koul@intel.com> wrote:
>> On Thu, Aug 03, 2017 at 09:14:13AM -0700, Dan Williams wrote:
>>>>>>>>>>>>>> Do we need a new API / new function, or new capability?
>>>>>>>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>>>>>>>>>>>> to be similar with DMA_MEMSET_SG.
>>>>>>>>>>>
>>>>>>>>>>> I'm ok with that if Vinod is.
>>>>>>>>>>
>>>>>>>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
>>>>>>>>>> or all :). We should have done bitfields for this though...
>>>>>>>>>
>>>>>>>>> Add DMA_MEMCPY_SG to transaction type.
>>>>>>>>
>>>>>>>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
>>>>>>>> scatterlist to scatterlist copy which is used to check for
>>>>>>>> device_prep_dma_sg() calls
>>>>>>>>
>>>>>>> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
>>>>>>> we need something separate than what DMA_SG is used for.
>>>>>>
>>>>>> Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
>>>>>> since it is not memset (or is it) I would not call it memset, or maybe we
>>>>>> should also change DMA_SG to DMA_SG_SG to make it terribly clear :D
>>>>>
>>>>> I can create patches for both.
>>>>
>>>> Great, anyone who disagrees or can give better names :)
>>>
>>> All my suggestions would involve a lot more work. If we had infinite
>>> time we'd stop with the per-operation-type entry points and make this
>>> look like a typical driver sub-system that takes commands like
>>> block-devices or usb, but perhaps that ship has sailed.
>>
>> Can you elaborate on this :)
>>
>> I have been thinking about the need to redo the API. So lets discuss :)
> 
> The high level is straightforward, the devil is in the details. Define
> a generic dma command object, perhaps 'struct dma_io' certainly not
> 'struct dma_async_tx_descriptor', and have just one entry point
> per-driver. That 'struct dma_io' would carry a generic command number
> a target address and a scatterlist. The driver entry point would then
> convert and build the command to the hardware command format plus
> submission queue. The basic driving design principle is convert all
> the current function pointer complexity with the prep_* routines into
> data structure complexity in the common command format.
> 
> This trades off some efficiency because now you need to write the
> generic command and write the descriptor, but I think if the operation
> is worth offloading those conversion costs must already be in the
> noise.

Vinod, if you want to look at existing examples, take a look at the
block layer request queue, or even better blk-mq; I think that is
pretty close to what Dan is envisioning. Also, it's probably time we
looked into supporting hotplug for DMA engines? Maybe this will make
that easier to do. I'm willing to help, and I'm hoping it will make
things easier for me for the next-gen hardware.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-16 17:16                               ` Dave Jiang
@ 2017-08-16 17:20                                 ` Dan Williams
  2017-08-16 17:27                                   ` Dave Jiang
  2017-08-18  5:35                                 ` Vinod Koul
  1 sibling, 1 reply; 28+ messages in thread
From: Dan Williams @ 2017-08-16 17:20 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Koul, Vinod, Sinan Kaya, dmaengine, linux-nvdimm

On Wed, Aug 16, 2017 at 10:16 AM, Dave Jiang <dave.jiang@intel.com> wrote:
>
>
> On 08/16/2017 10:06 AM, Dan Williams wrote:
>> On Wed, Aug 16, 2017 at 9:50 AM, Vinod Koul <vinod.koul@intel.com> wrote:
>>> On Thu, Aug 03, 2017 at 09:14:13AM -0700, Dan Williams wrote:
>>>>>>>>>>>>>>> Do we need a new API / new function, or new capability?
>>>>>>>>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>>>>>>>>>>>>> to be similar with DMA_MEMSET_SG.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm ok with that if Vinod is.
>>>>>>>>>>>
>>>>>>>>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
>>>>>>>>>>> or all :). We should have done bitfields for this though...
>>>>>>>>>>
>>>>>>>>>> Add DMA_MEMCPY_SG to transaction type.
>>>>>>>>>
>>>>>>>>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
>>>>>>>>> scatterlist to scatterlist copy which is used to check for
>>>>>>>>> device_prep_dma_sg() calls
>>>>>>>>>
>>>>>>>> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
>>>>>>>> we need something separate than what DMA_SG is used for.
>>>>>>>
>>>>>>> Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
>>>>>>> since it is not memset (or is it) I would not call it memset, or maybe we
>>>>>>> should also change DMA_SG to DMA_SG_SG to make it terribly clear :D
>>>>>>
>>>>>> I can create patches for both.
>>>>>
>>>>> Great, anyone who disagrees or can give better names :)
>>>>
>>>> All my suggestions would involve a lot more work. If we had infinite
>>>> time we'd stop with the per-operation-type entry points and make this
>>>> look like a typical driver sub-system that takes commands like
>>>> block-devices or usb, but perhaps that ship has sailed.
>>>
>>> Can you elaborate on this :)
>>>
>>> I have been thinking about the need to redo the API. So lets discuss :)
>>
>> The high level is straightforward, the devil is in the details. Define
>> a generic dma command object, perhaps 'struct dma_io' certainly not
>> 'struct dma_async_tx_descriptor', and have just one entry point
>> per-driver. That 'struct dma_io' would carry a generic command number
>> a target address and a scatterlist. The driver entry point would then
>> convert and build the command to the hardware command format plus
>> submission queue. The basic driving design principle is convert all
>> the current function pointer complexity with the prep_* routines into
>> data structure complexity in the common command format.
>>
>> This trades off some efficiency because now you need to write the
>> generic command and write the descriptor, but I think if the operation
>> is worth offloading those conversion costs must already be in the
>> noise.
>
> Vinod, I think if you want to look at existing examples take a look at
> the block layer request queue. Or even better blk-mq. I think this is
> pretty close to what Dan is envisioning? Also, it's probably time we
> looking into supporting hotplugging for DMA engines? Maybe this will
> make it easier to do so. I'm willing to help and hoping that it will
> make things easier for me for the next gen hardware.

Yes, device hotplug is a good one to add to the list. We didn't have
'struct percpu_ref' when dmaengine started; that would make hotplug
support easier to handle without coarse locking.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-16 17:20                                 ` Dan Williams
@ 2017-08-16 17:27                                   ` Dave Jiang
  0 siblings, 0 replies; 28+ messages in thread
From: Dave Jiang @ 2017-08-16 17:27 UTC (permalink / raw)
  To: Dan Williams; +Cc: Koul, Vinod, Sinan Kaya, dmaengine, linux-nvdimm



On 08/16/2017 10:20 AM, Dan Williams wrote:
> On Wed, Aug 16, 2017 at 10:16 AM, Dave Jiang <dave.jiang@intel.com> wrote:
>>
>>
>> On 08/16/2017 10:06 AM, Dan Williams wrote:
>>> On Wed, Aug 16, 2017 at 9:50 AM, Vinod Koul <vinod.koul@intel.com> wrote:
>>>> On Thu, Aug 03, 2017 at 09:14:13AM -0700, Dan Williams wrote:
>>>>>>>>>>>>>>>> Do we need a new API / new function, or new capability?
>>>>>>>>>>>>>>> Hmmm...you are right. I wonder if we need something like DMA_SG cap....
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Unfortunately, DMA_SG means something else. Maybe, we need DMA_MEMCPY_SG
>>>>>>>>>>>>>> to be similar with DMA_MEMSET_SG.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm ok with that if Vinod is.
>>>>>>>>>>>>
>>>>>>>>>>>> So what exactly is the ask here, are you trying to do MEMCPY or SG or MEMSET
>>>>>>>>>>>> or all :). We should have done bitfields for this though...
>>>>>>>>>>>
>>>>>>>>>>> Add DMA_MEMCPY_SG to transaction type.
>>>>>>>>>>
>>>>>>>>>> Not MEMSET right, then why not use DMA_SG, DMA_SG is supposed for
>>>>>>>>>> scatterlist to scatterlist copy which is used to check for
>>>>>>>>>> device_prep_dma_sg() calls
>>>>>>>>>>
>>>>>>>>> Right. But we are doing flat buffer to/from scatterlist, not sg to sg. So
>>>>>>>>> we need something separate than what DMA_SG is used for.
>>>>>>>>
>>>>>>>> Hmm, its SG-buffer  and its memcpy, so should we call it DMA_SG_BUFFER,
>>>>>>>> since it is not memset (or is it) I would not call it memset, or maybe we
>>>>>>>> should also change DMA_SG to DMA_SG_SG to make it terribly clear :D
>>>>>>>
>>>>>>> I can create patches for both.
>>>>>>
>>>>>> Great, anyone who disagrees or can give better names :)
>>>>>
>>>>> All my suggestions would involve a lot more work. If we had infinite
>>>>> time we'd stop with the per-operation-type entry points and make this
>>>>> look like a typical driver sub-system that takes commands like
>>>>> block-devices or usb, but perhaps that ship has sailed.
>>>>
>>>> Can you elaborate on this :)
>>>>
>>>> I have been thinking about the need to redo the API. So lets discuss :)
>>>
>>> The high level is straightforward, the devil is in the details. Define
>>> a generic dma command object, perhaps 'struct dma_io' certainly not
>>> 'struct dma_async_tx_descriptor', and have just one entry point
>>> per-driver. That 'struct dma_io' would carry a generic command number
>>> a target address and a scatterlist. The driver entry point would then
>>> convert and build the command to the hardware command format plus
>>> submission queue. The basic driving design principle is convert all
>>> the current function pointer complexity with the prep_* routines into
>>> data structure complexity in the common command format.
>>>
>>> This trades off some efficiency because now you need to write the
>>> generic command and write the descriptor, but I think if the operation
>>> is worth offloading those conversion costs must already be in the
>>> noise.
>>
>> Vinod, I think if you want to look at existing examples take a look at
>> the block layer request queue. Or even better blk-mq. I think this is
>> pretty close to what Dan is envisioning? Also, it's probably time we
>> looking into supporting hotplugging for DMA engines? Maybe this will
>> make it easier to do so. I'm willing to help and hoping that it will
>> make things easier for me for the next gen hardware.
> 
> Yes, device hotplug is a good one to add to the list. We didn't have
> 'struct percpu_ref' when dmaengine started, that would make hotplug
> support easier to handle without coarse locking.
> 

And perhaps also hotplug of hardware queues / channels. Future hardware
may have reconfigurable queues whose count can change dynamically.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq
  2017-08-16 17:16                               ` Dave Jiang
  2017-08-16 17:20                                 ` Dan Williams
@ 2017-08-18  5:35                                 ` Vinod Koul
  1 sibling, 0 replies; 28+ messages in thread
From: Vinod Koul @ 2017-08-18  5:35 UTC (permalink / raw)
  To: Dave Jiang; +Cc: Sinan Kaya, dmaengine, linux-nvdimm

On Wed, Aug 16, 2017 at 10:16:31AM -0700, Dave Jiang wrote:
> 
> 
> On 08/16/2017 10:06 AM, Dan Williams wrote:
> > On Wed, Aug 16, 2017 at 9:50 AM, Vinod Koul <vinod.koul@intel.com> wrote:
> >> On Thu, Aug 03, 2017 at 09:14:13AM -0700, Dan Williams wrote:

> >>> All my suggestions would involve a lot more work. If we had infinite
> >>> time we'd stop with the per-operation-type entry points and make this
> >>> look like a typical driver sub-system that takes commands like
> >>> block-devices or usb, but perhaps that ship has sailed.
> >>
> >> Can you elaborate on this :)
> >>
> >> I have been thinking about the need to redo the API. So lets discuss :)
> > 
> > The high level is straightforward, the devil is in the details. Define
> > a generic dma command object, perhaps 'struct dma_io' certainly not
> > 'struct dma_async_tx_descriptor', and have just one entry point
> > per-driver. That 'struct dma_io' would carry a generic command number
> > a target address and a scatterlist. The driver entry point would then
> > convert and build the command to the hardware command format plus
> > submission queue. The basic driving design principle is convert all
> > the current function pointer complexity with the prep_* routines into
> > data structure complexity in the common command format.
> > 
> > This trades off some efficiency because now you need to write the
> > generic command and write the descriptor, but I think if the operation
> > is worth offloading those conversion costs must already be in the
> > noise.

Yes, the trade-off IMO seems a good one actually. In hindsight, the
proliferation of prep_* wasn't a good idea. Also, for request processing
I feel a lot more of the burden should be undertaken by the dmaengine
core rather than the drivers; we should just hand requests to the
driver. The descriptors should be managed in the core...

To ease the change, we can create the new API, stop accepting drivers
to the old API, and start moving drivers off one bit at a time. Yes,
that will take a lot of time, but the conversion will hopefully be less
painful...

> Vinod, I think if you want to look at existing examples take a look at
> the block layer request queue. Or even better blk-mq. I think this is
> pretty close to what Dan is envisioning? Also, it's probably time we
> looking into supporting hotplugging for DMA engines? Maybe this will
> make it easier to do so. I'm willing to help and hoping that it will
> make things easier for me for the next gen hardware.

Okay, I will look it up. Thanks for the help, we need all the help we can get here :)


-- 
~Vinod

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-08-18  5:29 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-02 18:40 [PATCH v2 0/5] Adding blk-mq and DMA support to pmem block driver Dave Jiang
2017-08-02 18:41 ` [PATCH v2 1/5] dmaengine: ioatdma: revert 7618d035 to allow sharing of DMA channels Dave Jiang
2017-08-02 18:41 ` [PATCH v2 2/5] dmaengine: ioatdma: dma_prep_memcpy_sg support Dave Jiang
2017-08-02 18:41 ` [PATCH v2 3/5] dmaengine: add SG support to dmaengine_unmap Dave Jiang
2017-08-02 18:41 ` [PATCH v2 4/5] libnvdimm: Adding blk-mq support to the pmem driver Dave Jiang
2017-08-03 20:04   ` Ross Zwisler
2017-08-02 18:41 ` [PATCH v2 5/5] libnvdimm: add DMA support for pmem blk-mq Dave Jiang
2017-08-02 19:22   ` Sinan Kaya
2017-08-02 20:52     ` Dave Jiang
2017-08-02 21:10       ` Sinan Kaya
2017-08-02 21:13         ` Dave Jiang
2017-08-03  5:01           ` Vinod Koul
2017-08-03  5:11             ` Jiang, Dave
2017-08-03  5:28               ` Vinod Koul
2017-08-03  5:36                 ` Jiang, Dave
2017-08-03  8:59                   ` Vinod Koul
2017-08-03 14:36                     ` Jiang, Dave
2017-08-03 15:55                       ` Vinod Koul
2017-08-03 16:14                         ` Dan Williams
2017-08-03 17:07                           ` Dave Jiang
2017-08-03 18:35                             ` Allen Hubbe
2017-08-16 16:50                           ` Vinod Koul
2017-08-16 17:06                             ` Dan Williams
2017-08-16 17:16                               ` Dave Jiang
2017-08-16 17:20                                 ` Dan Williams
2017-08-16 17:27                                   ` Dave Jiang
2017-08-18  5:35                                 ` Vinod Koul
2017-08-03 20:20   ` Ross Zwisler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.