linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg
@ 2016-07-20  8:50 Peter Ujfalusi
  2016-07-20  8:50 ` [PATCH v2 1/6] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list Peter Ujfalusi
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Peter Ujfalusi @ 2016-07-20  8:50 UTC
  To: vinod.koul, linux
  Cc: linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

Hi,

Changes since v1:
- dropped the patch changing the sequence of vchan_cookie_complete and
  omap_dma_start_sg in omap_dma_callback
- Use appropriate macros to find omap_chan and omap_desc in patch 6
- Use per-device pool instead of per-channel pools.

The following series, with the final patch, adds support for sDMA Linked List
transfers.
Linked List is supported by sDMA in OMAP3630+ (OMAP4/5, dra7 family).
If the descriptor load feature is present, we can create the descriptors for each
SG beforehand and let sDMA walk through them.
This way the number of sDMA interrupts the kernel needs to handle drops
dramatically.
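To make the effect concrete, here is a small userspace model (not driver code; all names are hypothetical) of how many completion interrupts a batch of slave_sg transfers raises in each mode. It assumes, as in patch 6, that linked-list mode is only used for two or more SG entries and that the block interrupt stays enabled only on the last descriptor of the chain:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative model only: total completion interrupts for ndesc
 * slave_sg transfers, where sglens[i] is the SG count of transfer i.
 */
static unsigned long count_dma_irqs(const unsigned *sglens, size_t ndesc,
				    bool ll_supported)
{
	unsigned long irqs = 0;
	size_t i;

	for (i = 0; i < ndesc; i++) {
		/* patch 6 only sets using_ll when sglen >= 2 */
		bool using_ll = ll_supported && sglens[i] >= 2;

		/*
		 * Per-SG mode: one interrupt per entry, after which the
		 * driver reprograms the channel. LL mode: only the last
		 * descriptor keeps its block interrupt.
		 */
		irqs += using_ll ? 1 : sglens[i];
	}
	return irqs;
}
```

For three transfers with 1, 4 and 8 SG entries this model gives 13 interrupts in per-SG mode and 3 with linked lists. Real workloads will not follow a fixed ratio, since single-entry transfers complete the same way in either mode.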

I have gathered some numbers to show the difference.

For example, booting up the board with the filesystem on an SD card:
# cat /proc/interrupts | grep dma
W/o LinkedList support:
 27:       4436          0     WUGEN  13 Level     omap-dma-engine

Same board/filesystem with this patch:
 27:       1027          0     WUGEN  13 Level     omap-dma-engine

Or copying files from the SD card to eMMC:
# du -h /usr
2.1G    /usr/
# find /usr/ -type f | wc -l
232001

# cp -r /usr/* /mnt/emmc/tmp/

W/o LinkedList we see ~761069 DMA interrupts.
With LinkedList support it is down to ~269314 DMA interrupts.

With fewer DMA interrupts, the CPU load drops significantly as well.

The series depends on the interleaved transfer support patch I sent a couple
of days ago:
https://lkml.org/lkml/2016/7/12/216

Regards,
Peter
---
Peter Ujfalusi (6):
  dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list
  dmaengine: omap-dma: Simplify omap_dma_callback
  dmaengine: omap-dma: Dynamically allocate memory for lch_map
  dmaengine: omap-dma: Add more debug information when freeing channel
  dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop
  dmaengine: omap-dma: Support for LinkedList transfer of slave_sg

 drivers/dma/omap-dma.c | 234 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 207 insertions(+), 27 deletions(-)

--
2.9.2

* [PATCH v2 1/6] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list
  2016-07-20  8:50 [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
@ 2016-07-20  8:50 ` Peter Ujfalusi
  2016-07-20  8:50 ` [PATCH v2 2/6] dmaengine: omap-dma: Simplify omap_dma_callback Peter Ujfalusi
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Peter Ujfalusi @ 2016-07-20  8:50 UTC
  To: vinod.koul, linux
  Cc: linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

We can drop the (sg)idx parameter of omap_dma_start_sg() and increment
c->sgidx inside the function itself.
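A minimal sketch of the resulting calling convention, using hypothetical stripped-down structs rather than the driver's real ones. It assumes that starting a new descriptor resets the channel's cursor to 0, which the new argument-less call in omap_dma_start_desc() relies on:

```c
#include <assert.h>

/* Hypothetical stand-ins for struct omap_chan / struct omap_desc. */
struct chan { unsigned sgidx; unsigned programmed[8]; unsigned nprog; };
struct desc { unsigned sglen; };

/* The channel's own cursor selects the SG entry; the function advances it. */
static void start_sg(struct chan *c, const struct desc *d)
{
	assert(c->sgidx < d->sglen);
	c->programmed[c->nprog++] = c->sgidx;	/* "program" entry sgidx */
	c->sgidx++;
}

/* Assumption: starting a descriptor resets the cursor before the first SG. */
static void start_desc(struct chan *c, const struct desc *d)
{
	assert(d->sglen <= 8);	/* size of programmed[] in this sketch */
	c->sgidx = 0;
	c->nprog = 0;
	start_sg(c, d);
}

/* Completion path: start the next entry if any remain; 1 when all done. */
static int on_complete(struct chan *c, const struct desc *d)
{
	if (c->sgidx < d->sglen) {
		start_sg(c, d);
		return 0;
	}
	return 1;
}
```

Because the increment lives next to the hardware programming, the callback only has to compare the cursor against sglen.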

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 2e0d49bcfd8a..7d56cd88c9a5 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -365,10 +365,9 @@ static void omap_dma_stop(struct omap_chan *c)
 	c->running = false;
 }
 
-static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d,
-	unsigned idx)
+static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d)
 {
-	struct omap_sg *sg = d->sg + idx;
+	struct omap_sg *sg = d->sg + c->sgidx;
 	unsigned cxsa, cxei, cxfi;
 
 	if (d->dir == DMA_DEV_TO_MEM || d->dir == DMA_MEM_TO_MEM) {
@@ -388,6 +387,7 @@ static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d,
 	omap_dma_chan_write(c, CFN, sg->fn);
 
 	omap_dma_start(c, d);
+	c->sgidx++;
 }
 
 static void omap_dma_start_desc(struct omap_chan *c)
@@ -433,7 +433,7 @@ static void omap_dma_start_desc(struct omap_chan *c)
 	omap_dma_chan_write(c, CSDP, d->csdp);
 	omap_dma_chan_write(c, CLNK_CTRL, d->clnk_ctrl);
 
-	omap_dma_start_sg(c, d, 0);
+	omap_dma_start_sg(c, d);
 }
 
 static void omap_dma_callback(int ch, u16 status, void *data)
@@ -446,8 +446,8 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 	d = c->desc;
 	if (d) {
 		if (!c->cyclic) {
-			if (++c->sgidx < d->sglen) {
-				omap_dma_start_sg(c, d, c->sgidx);
+			if (c->sgidx < d->sglen) {
+				omap_dma_start_sg(c, d);
 			} else {
 				omap_dma_start_desc(c);
 				vchan_cookie_complete(&d->vd);
-- 
2.9.2

* [PATCH v2 2/6] dmaengine: omap-dma: Simplify omap_dma_callback
  2016-07-20  8:50 [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
  2016-07-20  8:50 ` [PATCH v2 1/6] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list Peter Ujfalusi
@ 2016-07-20  8:50 ` Peter Ujfalusi
  2016-07-20  8:50 ` [PATCH v2 3/6] dmaengine: omap-dma: Dynamically allocate memory for lch_map Peter Ujfalusi
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Peter Ujfalusi @ 2016-07-20  8:50 UTC
  To: vinod.koul, linux
  Cc: linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

Flatten the indentation of the function, which gives a clearer view of
the cases handled here.
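The flattened if/else-if chain can be read as a pure decision over three inputs. A sketch with hypothetical names; the comments map each outcome to the driver call made in that branch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three cases the callback handles. */
enum cb_action { CB_CYCLIC_PERIOD, CB_DESC_DONE, CB_NEXT_SG };

static enum cb_action callback_action(bool cyclic, unsigned sgidx,
				      unsigned sglen)
{
	if (cyclic)
		return CB_CYCLIC_PERIOD;	/* vchan_cyclic_callback() */
	else if (sgidx == sglen)
		return CB_DESC_DONE;	/* start next desc, complete cookie */
	else
		return CB_NEXT_SG;	/* omap_dma_start_sg() */
}
```

Checking the cyclic case first keeps the two non-cyclic cases at the same level, which is what makes the order of the checks easy to see.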

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 7d56cd88c9a5..d469f9b820e0 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -445,15 +445,13 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 	spin_lock_irqsave(&c->vc.lock, flags);
 	d = c->desc;
 	if (d) {
-		if (!c->cyclic) {
-			if (c->sgidx < d->sglen) {
-				omap_dma_start_sg(c, d);
-			} else {
-				omap_dma_start_desc(c);
-				vchan_cookie_complete(&d->vd);
-			}
-		} else {
+		if (c->cyclic) {
 			vchan_cyclic_callback(&d->vd);
+		} else if (c->sgidx == d->sglen) {
+			omap_dma_start_desc(c);
+			vchan_cookie_complete(&d->vd);
+		} else {
+			omap_dma_start_sg(c, d);
 		}
 	}
 	spin_unlock_irqrestore(&c->vc.lock, flags);
-- 
2.9.2

* [PATCH v2 3/6] dmaengine: omap-dma: Dynamically allocate memory for lch_map
  2016-07-20  8:50 [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
  2016-07-20  8:50 ` [PATCH v2 1/6] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list Peter Ujfalusi
  2016-07-20  8:50 ` [PATCH v2 2/6] dmaengine: omap-dma: Simplify omap_dma_callback Peter Ujfalusi
@ 2016-07-20  8:50 ` Peter Ujfalusi
  2016-07-20  8:50 ` [PATCH v2 4/6] dmaengine: omap-dma: Add more debug information when freeing channel Peter Ujfalusi
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Peter Ujfalusi @ 2016-07-20  8:50 UTC
  To: vinod.koul, linux
  Cc: linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

On OMAP1 platforms we do not have 32 channels available. Allocate the
lch_map based on the available channels. This way we will not expose more
channels than are available on the platform.
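For the non-DT case, the sizing logic boils down to "use the platform's count, else fall back to a default", followed by a zeroed allocation of exactly that many channel pointers. A userspace sketch under those assumptions: calloc() stands in for devm_kcalloc(), and DEFAULT_REQUESTS is a hypothetical placeholder for OMAP_SDMA_REQUESTS:

```c
#include <assert.h>
#include <stdlib.h>

#define DEFAULT_REQUESTS 127	/* hypothetical placeholder value */

struct chan;	/* opaque in this sketch */

/*
 * Choose the table size from a platform-provided count (0 = unknown),
 * then allocate a zeroed table of that many channel pointers.
 */
static struct chan **alloc_lch_map(unsigned platform_count, unsigned *count)
{
	*count = platform_count ? platform_count : DEFAULT_REQUESTS;
	return calloc(*count, sizeof(struct chan *));
}
```

Sizing the table at probe time, rather than with a compile-time array, means the lookup table and the channel init loop can use the same bound.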

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index d469f9b820e0..6a97350ea76d 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -35,7 +35,7 @@ struct omap_dmadev {
 	unsigned dma_requests;
 	spinlock_t irq_lock;
 	uint32_t irq_enable_mask;
-	struct omap_chan *lch_map[OMAP_SDMA_CHANNELS];
+	struct omap_chan **lch_map;
 };
 
 struct omap_chan {
@@ -1223,16 +1223,24 @@ static int omap_dma_probe(struct platform_device *pdev)
 	spin_lock_init(&od->lock);
 	spin_lock_init(&od->irq_lock);
 
-	od->dma_requests = OMAP_SDMA_REQUESTS;
-	if (pdev->dev.of_node && of_property_read_u32(pdev->dev.of_node,
-						      "dma-requests",
-						      &od->dma_requests)) {
+	if (!pdev->dev.of_node) {
+		od->dma_requests = od->plat->dma_attr->lch_count;
+		if (unlikely(!od->dma_requests))
+			od->dma_requests = OMAP_SDMA_REQUESTS;
+	} else if (of_property_read_u32(pdev->dev.of_node, "dma-requests",
+					&od->dma_requests)) {
 		dev_info(&pdev->dev,
 			 "Missing dma-requests property, using %u.\n",
 			 OMAP_SDMA_REQUESTS);
+		od->dma_requests = OMAP_SDMA_REQUESTS;
 	}
 
-	for (i = 0; i < OMAP_SDMA_CHANNELS; i++) {
+	od->lch_map = devm_kcalloc(&pdev->dev, od->dma_requests,
+				   sizeof(*od->lch_map), GFP_KERNEL);
+	if (!od->lch_map)
+		return -ENOMEM;
+
+	for (i = 0; i < od->dma_requests; i++) {
 		rc = omap_dma_chan_init(od);
 		if (rc) {
 			omap_dma_free(od);
-- 
2.9.2

* [PATCH v2 4/6] dmaengine: omap-dma: Add more debug information when freeing channel
  2016-07-20  8:50 [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
                   ` (2 preceding siblings ...)
  2016-07-20  8:50 ` [PATCH v2 3/6] dmaengine: omap-dma: Dynamically allocate memory for lch_map Peter Ujfalusi
@ 2016-07-20  8:50 ` Peter Ujfalusi
  2016-07-20  8:50 ` [PATCH v2 5/6] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop Peter Ujfalusi
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Peter Ujfalusi @ 2016-07-20  8:50 UTC
  To: vinod.koul, linux
  Cc: linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

When freeing the channel, print the same sDMA channel information the
driver prints when allocating the channel resources.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 6a97350ea76d..072fff7164dd 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -568,7 +568,8 @@ static void omap_dma_free_chan_resources(struct dma_chan *chan)
 	vchan_free_chan_resources(&c->vc);
 	omap_free_dma(c->dma_ch);
 
-	dev_dbg(od->ddev.dev, "freeing channel for %u\n", c->dma_sig);
+	dev_dbg(od->ddev.dev, "freeing channel %u used for %u\n", c->dma_ch,
+		c->dma_sig);
 	c->dma_sig = 0;
 }
 
-- 
2.9.2

* [PATCH v2 5/6] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop
  2016-07-20  8:50 [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
                   ` (3 preceding siblings ...)
  2016-07-20  8:50 ` [PATCH v2 4/6] dmaengine: omap-dma: Add more debug information when freeing channel Peter Ujfalusi
@ 2016-07-20  8:50 ` Peter Ujfalusi
  2016-07-20  8:50 ` [PATCH v2 6/6] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi
  2016-08-10 17:32 ` [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Vinod Koul
  6 siblings, 0 replies; 10+ messages in thread
From: Peter Ujfalusi @ 2016-07-20  8:50 UTC
  To: vinod.koul, linux
  Cc: linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

Instead of indexing the array on every access, take a pointer to the
element first and use it to set up the omap_sg struct.
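For reference, the per-entry values set up in that loop follow from the burst size: en is the burst (elements per frame) and fn is the SG length divided by the frame size in bytes. A self-contained sketch with hypothetical stand-in structs for the scatterlist entry and struct omap_sg:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a mapped scatterlist entry and omap_sg. */
struct sg_entry { uint32_t dma_addr; uint32_t len; };
struct osg { uint32_t addr; uint32_t en; uint32_t fn; };

static void fill_sg(struct osg *out, const struct sg_entry *in, size_t n,
		    unsigned elem_bytes, unsigned burst)
{
	unsigned frame_bytes = elem_bytes * burst;
	size_t i;

	for (i = 0; i < n; i++) {
		/* take the element pointer once, as the patch does */
		struct osg *osg = &out[i];

		osg->addr = in[i].dma_addr;
		osg->en = burst;			/* elements per frame */
		osg->fn = in[i].len / frame_bytes;	/* frames per SG */
	}
}
```

With 4-byte elements and a burst of 8, a 4096-byte entry becomes 128 frames of 8 elements.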

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 072fff7164dd..9f1cfac16f92 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -819,9 +819,11 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
 	en = burst;
 	frame_bytes = es_bytes[es] * en;
 	for_each_sg(sgl, sgent, sglen, i) {
-		d->sg[i].addr = sg_dma_address(sgent);
-		d->sg[i].en = en;
-		d->sg[i].fn = sg_dma_len(sgent) / frame_bytes;
+		struct omap_sg *osg = &d->sg[i];
+
+		osg->addr = sg_dma_address(sgent);
+		osg->en = en;
+		osg->fn = sg_dma_len(sgent) / frame_bytes;
 	}
 
 	d->sglen = sglen;
-- 
2.9.2

* [PATCH v2 6/6] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg
  2016-07-20  8:50 [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
                   ` (4 preceding siblings ...)
  2016-07-20  8:50 ` [PATCH v2 5/6] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop Peter Ujfalusi
@ 2016-07-20  8:50 ` Peter Ujfalusi
  2016-08-08  5:42   ` Vinod Koul
  2016-08-10 17:32 ` [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Vinod Koul
  6 siblings, 1 reply; 10+ messages in thread
From: Peter Ujfalusi @ 2016-07-20  8:50 UTC
  To: vinod.koul, linux
  Cc: linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

sDMA in OMAP3630 and newer SoCs supports LinkedList transfers. When the
LinkedList (Descriptor load) feature is present, we can create a descriptor
for each SG and program sDMA to walk through the list of descriptors, instead
of the current approach of stopping, reconfiguring and restarting sDMA after
each SG transfer.
By using LinkedList transfers in sDMA, the number of DMA interrupts decreases
dramatically.
For example, booting up the board with the filesystem on an SD card:
# cat /proc/interrupts | grep dma
W/o LinkedList support:
 27:       4436          0     WUGEN  13 Level     omap-dma-engine

Same board/filesystem with this patch:
 27:       1027          0     WUGEN  13 Level     omap-dma-engine

Or copying files from the SD card to eMMC:
# du -h /usr
2.1G    /usr/
# find /usr/ -type f | wc -l
232001

# cp -r /usr/* /mnt/emmc/tmp/

W/o LinkedList we see ~761069 DMA interrupts.
With LinkedList support it is down to ~269314 DMA interrupts.

With fewer DMA interrupts, the CPU load drops significantly as well.
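The descriptor chain built by omap_dma_fill_type2_desc() in the diff can be modelled in userspace as follows. This is a reduced sketch (only the link, count/flags and address words), not the driver's struct omap_type2_desc; the bit values are the DESC_* definitions from this patch, and fake bus addresses stand in for dma_pool allocations. Unlike the driver, which patches the previous descriptor's next_desc as it goes, this version links forward in one pass:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in this patch. */
#define DESC_NXT_SV_REFRESH	(0x1u << 24)
#define DESC_NXT_SV_REUSE	(0x2u << 24)
#define DESC_NXT_DV_REFRESH	(0x1u << 26)
#define DESC_NXT_DV_REUSE	(0x2u << 26)
#define DESC_NTYPE_TYPE2	(0x2u << 29)
#define DESC_LAST		0xfffffffcu	/* end-of-list value used by the patch */

/* Reduced model of a type 2 descriptor. */
struct t2_desc { uint32_t next_desc; uint32_t en; uint32_t addr; };

/*
 * Chain n descriptors: each points at the bus address of the next and the
 * last carries the end marker. For DEV_TO_MEM the memory (destination)
 * address is refreshed from every descriptor while the device (source)
 * address is reused; MEM_TO_DEV is the mirror image.
 */
static void build_chain(struct t2_desc *d, const uint32_t *bus_addr,
			const uint32_t *buf_addr, uint32_t en, int n,
			bool dev_to_mem)
{
	int i;

	for (i = 0; i < n; i++) {
		d[i].next_desc = (i == n - 1) ? DESC_LAST : bus_addr[i + 1];
		d[i].addr = buf_addr[i];
		d[i].en = en | DESC_NTYPE_TYPE2;
		if (dev_to_mem)
			d[i].en |= DESC_NXT_DV_REFRESH | DESC_NXT_SV_REUSE;
		else
			d[i].en |= DESC_NXT_SV_REFRESH | DESC_NXT_DV_REUSE;
	}
}
```

Only the memory-side address changes between descriptors, which is why a type 2 descriptor carries a single address field.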

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 183 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 177 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 9f1cfac16f92..48578d820a63 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -8,6 +8,7 @@
 #include <linux/delay.h>
 #include <linux/dmaengine.h>
 #include <linux/dma-mapping.h>
+#include <linux/dmapool.h>
 #include <linux/err.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
@@ -32,6 +33,8 @@ struct omap_dmadev {
 	const struct omap_dma_reg *reg_map;
 	struct omap_system_dma_plat_info *plat;
 	bool legacy;
+	bool ll123_supported;
+	struct dma_pool *desc_pool;
 	unsigned dma_requests;
 	spinlock_t irq_lock;
 	uint32_t irq_enable_mask;
@@ -55,16 +58,40 @@ struct omap_chan {
 	unsigned sgidx;
 };
 
+#define DESC_NXT_SV_REFRESH	(0x1 << 24)
+#define DESC_NXT_SV_REUSE	(0x2 << 24)
+#define DESC_NXT_DV_REFRESH	(0x1 << 26)
+#define DESC_NXT_DV_REUSE	(0x2 << 26)
+#define DESC_NTYPE_TYPE2	(0x2 << 29)
+
+/* Type 2 descriptor with Source or Destination address update */
+struct omap_type2_desc {
+	uint32_t next_desc;
+	uint32_t en;
+	uint32_t addr; /* src or dst */
+	uint16_t fn;
+	uint16_t cicr;
+	uint16_t cdei;
+	uint16_t csei;
+	uint32_t cdfi;
+	uint32_t csfi;
+} __packed;
+
 struct omap_sg {
 	dma_addr_t addr;
 	uint32_t en;		/* number of elements (24-bit) */
 	uint32_t fn;		/* number of frames (16-bit) */
 	int32_t fi;		/* for double indexing */
 	int16_t ei;		/* for double indexing */
+
+	/* Linked list */
+	struct omap_type2_desc *t2_desc;
+	dma_addr_t t2_desc_paddr;
 };
 
 struct omap_desc {
 	struct virt_dma_desc vd;
+	bool using_ll;
 	enum dma_transfer_direction dir;
 	dma_addr_t dev_addr;
 
@@ -81,6 +108,9 @@ struct omap_desc {
 };
 
 enum {
+	CAPS_0_SUPPORT_LL123	= BIT(20),	/* Linked List type1/2/3 */
+	CAPS_0_SUPPORT_LL4	= BIT(21),	/* Linked List type4 */
+
 	CCR_FS			= BIT(5),
 	CCR_READ_PRIORITY	= BIT(6),
 	CCR_ENABLE		= BIT(7),
@@ -151,6 +181,19 @@ enum {
 	CICR_SUPER_BLOCK_IE	= BIT(14),	/* OMAP2+ only */
 
 	CLNK_CTRL_ENABLE_LNK	= BIT(15),
+
+	CDP_DST_VALID_INC	= 0 << 0,
+	CDP_DST_VALID_RELOAD	= 1 << 0,
+	CDP_DST_VALID_REUSE	= 2 << 0,
+	CDP_SRC_VALID_INC	= 0 << 2,
+	CDP_SRC_VALID_RELOAD	= 1 << 2,
+	CDP_SRC_VALID_REUSE	= 2 << 2,
+	CDP_NTYPE_TYPE1		= 1 << 4,
+	CDP_NTYPE_TYPE2		= 2 << 4,
+	CDP_NTYPE_TYPE3		= 3 << 4,
+	CDP_TMODE_NORMAL	= 0 << 8,
+	CDP_TMODE_LLIST		= 1 << 8,
+	CDP_FAST		= BIT(10),
 };
 
 static const unsigned es_bytes[] = {
@@ -180,7 +223,64 @@ static inline struct omap_desc *to_omap_dma_desc(struct dma_async_tx_descriptor
 
 static void omap_dma_desc_free(struct virt_dma_desc *vd)
 {
-	kfree(container_of(vd, struct omap_desc, vd));
+	struct omap_desc *d = to_omap_dma_desc(&vd->tx);
+
+	if (d->using_ll) {
+		struct omap_dmadev *od = to_omap_dma_dev(vd->tx.chan->device);
+		int i;
+
+		for (i = 0; i < d->sglen; i++) {
+			if (d->sg[i].t2_desc)
+				dma_pool_free(od->desc_pool, d->sg[i].t2_desc,
+					      d->sg[i].t2_desc_paddr);
+		}
+	}
+
+	kfree(d);
+}
+
+static void omap_dma_fill_type2_desc(struct omap_desc *d, int idx,
+				     enum dma_transfer_direction dir, bool last)
+{
+	struct omap_sg *sg = &d->sg[idx];
+	struct omap_type2_desc *t2_desc = sg->t2_desc;
+
+	if (idx)
+		d->sg[idx - 1].t2_desc->next_desc = sg->t2_desc_paddr;
+	if (last)
+		t2_desc->next_desc = 0xfffffffc;
+
+	t2_desc->en = sg->en;
+	t2_desc->addr = sg->addr;
+	t2_desc->fn = sg->fn & 0xffff;
+	t2_desc->cicr = d->cicr;
+	if (!last)
+		t2_desc->cicr &= ~CICR_BLOCK_IE;
+
+	switch (dir) {
+	case DMA_DEV_TO_MEM:
+		t2_desc->cdei = sg->ei;
+		t2_desc->csei = d->ei;
+		t2_desc->cdfi = sg->fi;
+		t2_desc->csfi = d->fi;
+
+		t2_desc->en |= DESC_NXT_DV_REFRESH;
+		t2_desc->en |= DESC_NXT_SV_REUSE;
+		break;
+	case DMA_MEM_TO_DEV:
+		t2_desc->cdei = d->ei;
+		t2_desc->csei = sg->ei;
+		t2_desc->cdfi = d->fi;
+		t2_desc->csfi = sg->fi;
+
+		t2_desc->en |= DESC_NXT_SV_REFRESH;
+		t2_desc->en |= DESC_NXT_DV_REUSE;
+		break;
+	default:
+		return;
+	}
+
+	t2_desc->en |= DESC_NTYPE_TYPE2;
 }
 
 static void omap_dma_write(uint32_t val, unsigned type, void __iomem *addr)
@@ -285,6 +385,7 @@ static void omap_dma_assign(struct omap_dmadev *od, struct omap_chan *c,
 static void omap_dma_start(struct omap_chan *c, struct omap_desc *d)
 {
 	struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device);
+	uint16_t cicr = d->cicr;
 
 	if (__dma_omap15xx(od->plat->dma_attr))
 		omap_dma_chan_write(c, CPC, 0);
@@ -293,8 +394,27 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d)
 
 	omap_dma_clear_csr(c);
 
+	if (d->using_ll) {
+		uint32_t cdp = CDP_TMODE_LLIST | CDP_NTYPE_TYPE2 | CDP_FAST;
+
+		if (d->dir == DMA_DEV_TO_MEM)
+			cdp |= (CDP_DST_VALID_RELOAD | CDP_SRC_VALID_REUSE);
+		else
+			cdp |= (CDP_DST_VALID_REUSE | CDP_SRC_VALID_RELOAD);
+		omap_dma_chan_write(c, CDP, cdp);
+
+		omap_dma_chan_write(c, CNDP, d->sg[0].t2_desc_paddr);
+		omap_dma_chan_write(c, CCDN, 0);
+		omap_dma_chan_write(c, CCFN, 0xffff);
+		omap_dma_chan_write(c, CCEN, 0xffffff);
+
+		cicr &= ~CICR_BLOCK_IE;
+	} else if (od->ll123_supported) {
+		omap_dma_chan_write(c, CDP, 0);
+	}
+
 	/* Enable interrupts */
-	omap_dma_chan_write(c, CICR, d->cicr);
+	omap_dma_chan_write(c, CICR, cicr);
 
 	/* Enable channel */
 	omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE);
@@ -447,7 +567,7 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 	if (d) {
 		if (c->cyclic) {
 			vchan_cyclic_callback(&d->vd);
-		} else if (c->sgidx == d->sglen) {
+		} else if (d->using_ll || c->sgidx == d->sglen) {
 			omap_dma_start_desc(c);
 			vchan_cookie_complete(&d->vd);
 		} else {
@@ -501,6 +621,7 @@ static int omap_dma_alloc_chan_resources(struct dma_chan *chan)
 {
 	struct omap_dmadev *od = to_omap_dma_dev(chan->device);
 	struct omap_chan *c = to_omap_dma_chan(chan);
+	struct device *dev = od->ddev.dev;
 	int ret;
 
 	if (od->legacy) {
@@ -511,8 +632,7 @@ static int omap_dma_alloc_chan_resources(struct dma_chan *chan)
 				       &c->dma_ch);
 	}
 
-	dev_dbg(od->ddev.dev, "allocating channel %u for %u\n",
-		c->dma_ch, c->dma_sig);
+	dev_dbg(dev, "allocating channel %u for %u\n", c->dma_ch, c->dma_sig);
 
 	if (ret >= 0) {
 		omap_dma_assign(od, c, c->dma_ch);
@@ -743,6 +863,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
 	struct omap_desc *d;
 	dma_addr_t dev_addr;
 	unsigned i, es, en, frame_bytes;
+	bool ll_failed = false;
 	u32 burst;
 
 	if (dir == DMA_DEV_TO_MEM) {
@@ -818,16 +939,47 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
 	 */
 	en = burst;
 	frame_bytes = es_bytes[es] * en;
+
+	if (sglen >= 2)
+		d->using_ll = od->ll123_supported;
+
 	for_each_sg(sgl, sgent, sglen, i) {
 		struct omap_sg *osg = &d->sg[i];
 
 		osg->addr = sg_dma_address(sgent);
 		osg->en = en;
 		osg->fn = sg_dma_len(sgent) / frame_bytes;
+
+		if (d->using_ll) {
+			osg->t2_desc = dma_pool_alloc(od->desc_pool, GFP_ATOMIC,
+						      &osg->t2_desc_paddr);
+			if (!osg->t2_desc) {
+				dev_err(chan->device->dev,
+					"t2_desc[%d] allocation failed\n", i);
+				ll_failed = true;
+				d->using_ll = false;
+				continue;
+			}
+
+			omap_dma_fill_type2_desc(d, i, dir, (i == sglen - 1));
+		}
 	}
 
 	d->sglen = sglen;
 
+	/* Release the dma_pool entries if one allocation failed */
+	if (ll_failed) {
+		for (i = 0; i < d->sglen; i++) {
+			struct omap_sg *osg = &d->sg[i];
+
+			if (osg->t2_desc) {
+				dma_pool_free(od->desc_pool, osg->t2_desc,
+					      osg->t2_desc_paddr);
+				osg->t2_desc = NULL;
+			}
+		}
+	}
+
 	return vchan_tx_prep(&c->vc, &d->vd, tx_flags);
 }
 
@@ -1266,10 +1418,25 @@ static int omap_dma_probe(struct platform_device *pdev)
 			return rc;
 	}
 
+	if (omap_dma_glbl_read(od, CAPS_0) & CAPS_0_SUPPORT_LL123)
+		od->ll123_supported = true;
+
 	od->ddev.filter.map = od->plat->slave_map;
 	od->ddev.filter.mapcnt = od->plat->slavecnt;
 	od->ddev.filter.fn = omap_dma_filter_fn;
 
+	if (od->ll123_supported) {
+		od->desc_pool = dma_pool_create(dev_name(&pdev->dev),
+						&pdev->dev,
+						sizeof(struct omap_type2_desc),
+						4, 0);
+		if (!od->desc_pool) {
+			dev_err(&pdev->dev,
+				"unable to allocate descriptor pool\n");
+			od->ll123_supported = false;
+		}
+	}
+
 	rc = dma_async_device_register(&od->ddev);
 	if (rc) {
 		pr_warn("OMAP-DMA: failed to register slave DMA engine device: %d\n",
@@ -1293,7 +1460,8 @@ static int omap_dma_probe(struct platform_device *pdev)
 		}
 	}
 
-	dev_info(&pdev->dev, "OMAP DMA engine driver\n");
+	dev_info(&pdev->dev, "OMAP DMA engine driver%s\n",
+		 od->ll123_supported ? " (LinkedList1/2/3 supported)" : "");
 
 	return rc;
 }
@@ -1312,6 +1480,9 @@ static int omap_dma_remove(struct platform_device *pdev)
 		omap_dma_glbl_write(od, IRQENABLE_L0, 0);
 	}
 
+	if (od->ll123_supported)
+		dma_pool_destroy(od->desc_pool);
+
 	omap_dma_free(od);
 
 	return 0;
-- 
2.9.2

* Re: [PATCH v2 6/6] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg
  2016-07-20  8:50 ` [PATCH v2 6/6] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi
@ 2016-08-08  5:42   ` Vinod Koul
  2016-08-08 13:58     ` Peter Ujfalusi
  0 siblings, 1 reply; 10+ messages in thread
From: Vinod Koul @ 2016-08-08  5:42 UTC
  To: Peter Ujfalusi
  Cc: linux, linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

On Wed, Jul 20, 2016 at 11:50:32AM +0300, Peter Ujfalusi wrote:
> sDMA in OMAP3630 and newer SoCs supports LinkedList transfers. When the
> LinkedList (Descriptor load) feature is present, we can create a descriptor
> for each SG and program sDMA to walk through the list of descriptors, instead
> of the current approach of stopping, reconfiguring and restarting sDMA after
> each SG transfer.
> By using LinkedList transfers in sDMA, the number of DMA interrupts decreases
> dramatically.
> For example, booting up the board with the filesystem on an SD card:
> W/o LinkedList support:
>  27:       4436          0     WUGEN  13 Level     omap-dma-engine
> 
> Same board/filesystem with this patch:
>  27:       1027          0     WUGEN  13 Level     omap-dma-engine
> 
> Or copying files from the SD card to eMMC:
> 2.1G    /usr/
> 232001
> 
> W/o LinkedList we see ~761069 DMA interrupts.
> With LinkedList support it is down to ~269314 DMA interrupts.
> 
> With fewer DMA interrupts, the CPU load drops significantly as well.

Interesting. I would have measured DMA throughput by timing the transfer
rather than by counting interrupts and CPU load. With LL mode you get a big
performance boost because the hardware starts the next transaction without
waiting for CPU intervention, and yes, the side effect is fewer interrupts
and less load :)

> @@ -743,6 +863,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
>  	struct omap_desc *d;
>  	dma_addr_t dev_addr;
>  	unsigned i, es, en, frame_bytes;
> +	bool ll_failed = false;
>  	u32 burst;
>  
>  	if (dir == DMA_DEV_TO_MEM) {
> @@ -818,16 +939,47 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
>  	 */
>  	en = burst;
>  	frame_bytes = es_bytes[es] * en;
> +
> +	if (sglen >= 2)
> +		d->using_ll = od->ll123_supported;

No upper bound on the length? Does the hardware support any length?



-- 
~Vinod

* Re: [PATCH v2 6/6] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg
  2016-08-08  5:42   ` Vinod Koul
@ 2016-08-08 13:58     ` Peter Ujfalusi
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Ujfalusi @ 2016-08-08 13:58 UTC
  To: Vinod Koul
  Cc: linux, linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

On 08/08/16 08:42, Vinod Koul wrote:
> On Wed, Jul 20, 2016 at 11:50:32AM +0300, Peter Ujfalusi wrote:
>> sDMA in OMAP3630 and newer SoCs supports LinkedList transfers. When the
>> LinkedList (Descriptor load) feature is present, we can create a descriptor
>> for each SG and program sDMA to walk through the list of descriptors, instead
>> of the current approach of stopping, reconfiguring and restarting sDMA after
>> each SG transfer.
>> By using LinkedList transfers in sDMA, the number of DMA interrupts decreases
>> dramatically.
>> For example, booting up the board with the filesystem on an SD card:
>> W/o LinkedList support:
>>  27:       4436          0     WUGEN  13 Level     omap-dma-engine
>>
>> Same board/filesystem with this patch:
>>  27:       1027          0     WUGEN  13 Level     omap-dma-engine
>>
>> Or copying files from the SD card to eMMC:
>> 2.1G    /usr/
>> 232001
>>
>> W/o LinkedList we see ~761069 DMA interrupts.
>> With LinkedList support it is down to ~269314 DMA interrupts.
>>
>> With fewer DMA interrupts, the CPU load drops significantly as well.
> 
> Interesting. I would have measured DMA throughput by timing the transfer
> rather than by counting interrupts and CPU load. With LL mode you get a big
> performance boost because the hardware starts the next transaction without
> waiting for CPU intervention, and yes, the side effect is fewer interrupts
> and less load :)

I did a throughput test as well; it was slightly faster, but not the boost I
was hoping for.
Copying /usr (2.1G), average of 5 runs:
w/o linked list: 7:30 mins
with this patch: 7:23 mins
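For scale: 7:30 is 450 s and 7:23 is 443 s, so the linked-list run saves 7 s, roughly 1.6% of the copy time. The same arithmetic as a tiny helper (names are hypothetical):

```c
#include <assert.h>

/* Convert a "minutes:seconds" run time to seconds. */
static unsigned mmss_to_s(unsigned min, unsigned sec)
{
	assert(sec < 60);
	return min * 60 + sec;
}

/* Relative time saved, in percent, going from old_s to new_s. */
static double pct_time_saved(unsigned old_s, unsigned new_s)
{
	assert(old_s > 0);
	return 100.0 * ((double)old_s - (double)new_s) / (double)old_s;
}
```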

The limiting factor here is the SD card I used. But the board was much more
responsive during heavy I/O tasks: while running 'emerge --sync' I can still
use the board.

>> @@ -743,6 +863,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
>>  	struct omap_desc *d;
>>  	dma_addr_t dev_addr;
>>  	unsigned i, es, en, frame_bytes;
>> +	bool ll_failed = false;
>>  	u32 burst;
>>  
>>  	if (dir == DMA_DEV_TO_MEM) {
>> @@ -818,16 +939,47 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
>>  	 */
>>  	en = burst;
>>  	frame_bytes = es_bytes[es] * en;
>> +
>> +	if (sglen >= 2)
>> +		d->using_ll = od->ll123_supported;
> 
> No upper bound on the length? Does the hardware support any length?

No, there is no upper limit; we can link as many SGs as we can allocate
descriptors for from the pool.
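A toy model of that behaviour, with a fixed-capacity array standing in for the dma_pool (all names hypothetical). Each SG entry takes one descriptor and, as in omap_dma_prep_slave_sg(), the first failed allocation disables linked-list mode for that transfer and releases every descriptor already taken, so the transfer falls back to per-SG mode instead of failing:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy fixed-capacity pool standing in for a dma_pool. */
#define POOL_CAP 4
struct pool { bool used[POOL_CAP]; int slot[POOL_CAP]; };

static int *pool_alloc(struct pool *p)
{
	int i;

	for (i = 0; i < POOL_CAP; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			return &p->slot[i];
		}
	}
	return NULL;	/* pool exhausted */
}

static void pool_free(struct pool *p, int *obj)
{
	p->used[obj - p->slot] = false;
}

/* Returns true if every SG entry got a descriptor (LL mode usable). */
static bool alloc_chain(struct pool *p, int **descs, size_t sglen)
{
	bool ok = true;
	size_t i;

	for (i = 0; i < sglen; i++) {
		/* after the first failure, stop allocating */
		descs[i] = ok ? pool_alloc(p) : NULL;
		if (!descs[i])
			ok = false;
	}
	if (!ok) {
		/* release partial allocations: fall back to per-SG mode */
		for (i = 0; i < sglen; i++) {
			if (descs[i]) {
				pool_free(p, descs[i]);
				descs[i] = NULL;
			}
		}
	}
	return ok;
}
```

The practical limit on chain length is therefore pool memory, and running out only costs the linked-list optimisation for that one transfer.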

-- 
Péter

* Re: [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg
  2016-07-20  8:50 [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
                   ` (5 preceding siblings ...)
  2016-07-20  8:50 ` [PATCH v2 6/6] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi
@ 2016-08-10 17:32 ` Vinod Koul
  6 siblings, 0 replies; 10+ messages in thread
From: Vinod Koul @ 2016-08-10 17:32 UTC
  To: Peter Ujfalusi
  Cc: linux, linux-kernel, dmaengine, linux-arm-kernel, linux-omap, tony

On Wed, Jul 20, 2016 at 11:50:26AM +0300, Peter Ujfalusi wrote:
> Hi,
> 
> Changes since v1:
> - dropped the patch changing the sequence of vchan_cookie_complete and
>   omap_dma_start_sg in omap_dma_callback
> - Use appropriate macros to find omap_chan and omap_desc in patch 6
> - Use per-device pool instead of per-channel pools.
> 
> The following series, with the final patch, adds support for sDMA Linked List
> transfers.
> Linked List is supported by sDMA in OMAP3630+ (OMAP4/5, dra7 family).
> If the descriptor load feature is present, we can create the descriptors for each
> SG beforehand and let sDMA walk through them.
> This way the number of sDMA interrupts the kernel needs to handle drops
> dramatically.

Applied all, thanks

-- 
~Vinod

Thread overview: 10+ messages
2016-07-20  8:50 [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
2016-07-20  8:50 ` [PATCH v2 1/6] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list Peter Ujfalusi
2016-07-20  8:50 ` [PATCH v2 2/6] dmaengine: omap-dma: Simplify omap_dma_callback Peter Ujfalusi
2016-07-20  8:50 ` [PATCH v2 3/6] dmaengine: omap-dma: Dynamically allocate memory for lch_map Peter Ujfalusi
2016-07-20  8:50 ` [PATCH v2 4/6] dmaengine: omap-dma: Add more debug information when freeing channel Peter Ujfalusi
2016-07-20  8:50 ` [PATCH v2 5/6] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop Peter Ujfalusi
2016-07-20  8:50 ` [PATCH v2 6/6] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi
2016-08-08  5:42   ` Vinod Koul
2016-08-08 13:58     ` Peter Ujfalusi
2016-08-10 17:32 ` [PATCH v2 0/6] dmaengine:omap-dma: Linked List transfer for slave_sg Vinod Koul