linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg
@ 2016-07-14 12:42 Peter Ujfalusi
  2016-07-14 12:42 ` [PATCH 1/7] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list Peter Ujfalusi
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

The final patch in this series adds support for sDMA Linked List transfers.
Linked List mode is supported by the sDMA in OMAP3630 and newer SoCs (the
OMAP4/5 and dra7 families).
If the descriptor load feature is present we can create the descriptors for
each SG element beforehand and let sDMA walk through them.
This way the number of sDMA interrupts the kernel needs to handle drops
dramatically.
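As a rough illustration of where the saving comes from (a plain C sketch, not driver code; the function is illustrative, not measured): in the legacy path the CPU takes one interrupt per SG segment to reprogram and restart the channel, while in linked-list mode the controller walks the pre-built chain and raises a single completion interrupt.

```c
#include <stddef.h>

/* Illustrative only: interrupts the CPU services for one slave_sg
 * transfer of 'nseg' segments. */
static unsigned irqs_for_transfer(unsigned nseg, int linked_list)
{
	if (linked_list)
		return nseg ? 1 : 0; /* one block-complete IRQ for the chain */
	return nseg;                 /* one IRQ per segment to start the next */
}
```

For a 16-segment transfer this is 16 interrupts versus 1, which is the effect visible in the /proc/interrupts numbers below.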

I have gathered some numbers to show the difference.

Booting up the board with filesystem on SD card for example:
# cat /proc/interrupts | grep dma
W/o LinkedList support:
 27:       4436          0     WUGEN  13 Level     omap-dma-engine

Same board/filesystem with this patch:
 27:       1027          0     WUGEN  13 Level     omap-dma-engine

Or copying files from the SD card to eMMC:
# du -h /usr
2.1G    /usr/
# find /usr/ -type f | wc -l
232001

# cp -r /usr/* /mnt/emmc/tmp/

W/o LinkedList we see ~761069 DMA interrupts.
With LinkedList support it is down to ~269314 DMA interrupts.

With the decreased DMA interrupt number the CPU load is dropping
significantly as well.

The series depends on the interleaved transfer support patch I sent a couple
of days ago:
https://lkml.org/lkml/2016/7/12/216

Regards,
Peter
---
Peter Ujfalusi (7):
  dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list
  dmaengine: omap-dma: Complete the cookie first on transfer completion
  dmaengine: omap-dma: Simplify omap_dma_callback
  dmaengine: omap-dma: Dynamically allocate memory for lch_map
  dmaengine: omap-dma: Add more debug information when freeing channel
  dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop
  dmaengine: omap-dma: Support for LinkedList transfer of slave_sg

 drivers/dma/omap-dma.c | 234 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 207 insertions(+), 27 deletions(-)

--
2.9.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/7] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list
  2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
@ 2016-07-14 12:42 ` Peter Ujfalusi
  2016-07-14 12:42 ` [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion Peter Ujfalusi
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

We can drop the (sg)idx parameter of omap_dma_start_sg() and increment
c->sgidx inside the function itself.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 2e0d49bcfd8a..7d56cd88c9a5 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -365,10 +365,9 @@ static void omap_dma_stop(struct omap_chan *c)
 	c->running = false;
 }
 
-static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d,
-	unsigned idx)
+static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d)
 {
-	struct omap_sg *sg = d->sg + idx;
+	struct omap_sg *sg = d->sg + c->sgidx;
 	unsigned cxsa, cxei, cxfi;
 
 	if (d->dir == DMA_DEV_TO_MEM || d->dir == DMA_MEM_TO_MEM) {
@@ -388,6 +387,7 @@ static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d,
 	omap_dma_chan_write(c, CFN, sg->fn);
 
 	omap_dma_start(c, d);
+	c->sgidx++;
 }
 
 static void omap_dma_start_desc(struct omap_chan *c)
@@ -433,7 +433,7 @@ static void omap_dma_start_desc(struct omap_chan *c)
 	omap_dma_chan_write(c, CSDP, d->csdp);
 	omap_dma_chan_write(c, CLNK_CTRL, d->clnk_ctrl);
 
-	omap_dma_start_sg(c, d, 0);
+	omap_dma_start_sg(c, d);
 }
 
 static void omap_dma_callback(int ch, u16 status, void *data)
@@ -446,8 +446,8 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 	d = c->desc;
 	if (d) {
 		if (!c->cyclic) {
-			if (++c->sgidx < d->sglen) {
-				omap_dma_start_sg(c, d, c->sgidx);
+			if (c->sgidx < d->sglen) {
+				omap_dma_start_sg(c, d);
 			} else {
 				omap_dma_start_desc(c);
 				vchan_cookie_complete(&d->vd);
-- 
2.9.1

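The refactor above can be mimicked in plain C (illustrative stand-in types only, not the driver structures): start_sg() now consumes and advances the channel's own index, so callers no longer pass one, and the completion path only compares the index against the segment count.

```c
#include <stddef.h>

/* Illustrative stand-ins for omap_chan / omap_desc. */
struct chan { size_t sgidx; };
struct desc { size_t sglen; };

/* After the refactor, start_sg() picks the segment itself ... */
static size_t start_sg(struct chan *c, const struct desc *d)
{
	size_t started = c->sgidx; /* segment being programmed */

	(void)d;                   /* real code reads d->sg[c->sgidx] */
	c->sgidx++;                /* ... and advances the index */
	return started;
}

/* The callback then only needs the sgidx vs sglen comparison. */
static int transfer_done(const struct chan *c, const struct desc *d)
{
	return c->sgidx >= d->sglen;
}
```

This is why the `++c->sgidx < d->sglen` test in the callback becomes a plain `c->sgidx < d->sglen` in the diff above.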

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
  2016-07-14 12:42 ` [PATCH 1/7] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list Peter Ujfalusi
@ 2016-07-14 12:42 ` Peter Ujfalusi
  2016-07-18 10:34   ` Russell King - ARM Linux
  2016-07-14 12:42 ` [PATCH 3/7] dmaengine: omap-dma: Simplify omap_dma_callback Peter Ujfalusi
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

Before looking for the next descriptor to start, complete the just finished
cookie.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 7d56cd88c9a5..f7b0b0c668fb 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -449,8 +449,8 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 			if (c->sgidx < d->sglen) {
 				omap_dma_start_sg(c, d);
 			} else {
-				omap_dma_start_desc(c);
 				vchan_cookie_complete(&d->vd);
+				omap_dma_start_desc(c);
 			}
 		} else {
 			vchan_cyclic_callback(&d->vd);
-- 
2.9.1


* [PATCH 3/7] dmaengine: omap-dma: Simplify omap_dma_callback
  2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
  2016-07-14 12:42 ` [PATCH 1/7] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list Peter Ujfalusi
  2016-07-14 12:42 ` [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion Peter Ujfalusi
@ 2016-07-14 12:42 ` Peter Ujfalusi
  2016-07-14 12:42 ` [PATCH 4/7] dmaengine: omap-dma: Dynamically allocate memory for lch_map Peter Ujfalusi
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

Flatten the indentation of the function, which gives a better view of the
cases handled here.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index f7b0b0c668fb..6d134252ed61 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -445,15 +445,13 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 	spin_lock_irqsave(&c->vc.lock, flags);
 	d = c->desc;
 	if (d) {
-		if (!c->cyclic) {
-			if (c->sgidx < d->sglen) {
-				omap_dma_start_sg(c, d);
-			} else {
-				vchan_cookie_complete(&d->vd);
-				omap_dma_start_desc(c);
-			}
-		} else {
+		if (c->cyclic) {
 			vchan_cyclic_callback(&d->vd);
+		} else if (c->sgidx == d->sglen) {
+			vchan_cookie_complete(&d->vd);
+			omap_dma_start_desc(c);
+		} else {
+			omap_dma_start_sg(c, d);
 		}
 	}
 	spin_unlock_irqrestore(&c->vc.lock, flags);
-- 
2.9.1


* [PATCH 4/7] dmaengine: omap-dma: Dynamically allocate memory for lch_map
  2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
                   ` (2 preceding siblings ...)
  2016-07-14 12:42 ` [PATCH 3/7] dmaengine: omap-dma: Simplify omap_dma_callback Peter Ujfalusi
@ 2016-07-14 12:42 ` Peter Ujfalusi
  2016-07-14 12:42 ` [PATCH 5/7] dmaengine: omap-dma: Add more debug information when freeing channel Peter Ujfalusi
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

On OMAP1 platforms we do not have 32 channels available. Allocate the
lch_map based on the number of available channels. This way we will not
expose more channels than are available on the platform.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 6d134252ed61..c026642fc66a 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -35,7 +35,7 @@ struct omap_dmadev {
 	unsigned dma_requests;
 	spinlock_t irq_lock;
 	uint32_t irq_enable_mask;
-	struct omap_chan *lch_map[OMAP_SDMA_CHANNELS];
+	struct omap_chan **lch_map;
 };
 
 struct omap_chan {
@@ -1223,16 +1223,24 @@ static int omap_dma_probe(struct platform_device *pdev)
 	spin_lock_init(&od->lock);
 	spin_lock_init(&od->irq_lock);
 
-	od->dma_requests = OMAP_SDMA_REQUESTS;
-	if (pdev->dev.of_node && of_property_read_u32(pdev->dev.of_node,
-						      "dma-requests",
-						      &od->dma_requests)) {
+	if (!pdev->dev.of_node) {
+		od->dma_requests = od->plat->dma_attr->lch_count;
+		if (unlikely(!od->dma_requests))
+			od->dma_requests = OMAP_SDMA_REQUESTS;
+	} else if (of_property_read_u32(pdev->dev.of_node, "dma-requests",
+					&od->dma_requests)) {
 		dev_info(&pdev->dev,
 			 "Missing dma-requests property, using %u.\n",
 			 OMAP_SDMA_REQUESTS);
+		od->dma_requests = OMAP_SDMA_REQUESTS;
 	}
 
-	for (i = 0; i < OMAP_SDMA_CHANNELS; i++) {
+	od->lch_map = devm_kcalloc(&pdev->dev, od->dma_requests,
+				   sizeof(*od->lch_map), GFP_KERNEL);
+	if (!od->lch_map)
+		return -ENOMEM;
+
+	for (i = 0; i < od->dma_requests; i++) {
 		rc = omap_dma_chan_init(od);
 		if (rc) {
 			omap_dma_free(od);
-- 
2.9.1


* [PATCH 5/7] dmaengine: omap-dma: Add more debug information when freeing channel
  2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
                   ` (3 preceding siblings ...)
  2016-07-14 12:42 ` [PATCH 4/7] dmaengine: omap-dma: Dynamically allocate memory for lch_map Peter Ujfalusi
@ 2016-07-14 12:42 ` Peter Ujfalusi
  2016-07-14 12:42 ` [PATCH 6/7] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop Peter Ujfalusi
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

Print the same information about the sDMA channel that the driver prints
when allocating the channel resources.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index c026642fc66a..bbad82985083 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -568,7 +568,8 @@ static void omap_dma_free_chan_resources(struct dma_chan *chan)
 	vchan_free_chan_resources(&c->vc);
 	omap_free_dma(c->dma_ch);
 
-	dev_dbg(od->ddev.dev, "freeing channel for %u\n", c->dma_sig);
+	dev_dbg(od->ddev.dev, "freeing channel %u used for %u\n", c->dma_ch,
+		c->dma_sig);
 	c->dma_sig = 0;
 }
 
-- 
2.9.1


* [PATCH 6/7] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop
  2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
                   ` (4 preceding siblings ...)
  2016-07-14 12:42 ` [PATCH 5/7] dmaengine: omap-dma: Add more debug information when freeing channel Peter Ujfalusi
@ 2016-07-14 12:42 ` Peter Ujfalusi
  2016-07-14 12:42 ` [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi
  2016-07-18 10:31 ` [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Russell King - ARM Linux
  7 siblings, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

Instead of accessing the array via index, take a pointer to the element
first and use it to set up the omap_sg struct.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index bbad82985083..8497750fa44a 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -819,9 +819,11 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
 	en = burst;
 	frame_bytes = es_bytes[es] * en;
 	for_each_sg(sgl, sgent, sglen, i) {
-		d->sg[i].addr = sg_dma_address(sgent);
-		d->sg[i].en = en;
-		d->sg[i].fn = sg_dma_len(sgent) / frame_bytes;
+		struct omap_sg *osg = &d->sg[i];
+
+		osg->addr = sg_dma_address(sgent);
+		osg->en = en;
+		osg->fn = sg_dma_len(sgent) / frame_bytes;
 	}
 
 	d->sglen = sglen;
-- 
2.9.1


* [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg
  2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
                   ` (5 preceding siblings ...)
  2016-07-14 12:42 ` [PATCH 6/7] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop Peter Ujfalusi
@ 2016-07-14 12:42 ` Peter Ujfalusi
  2016-07-18 10:42   ` Russell King - ARM Linux
  2016-07-18 10:31 ` [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Russell King - ARM Linux
  7 siblings, 1 reply; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
  To: linux-arm-kernel

The sDMA in OMAP3630 or newer SoCs has support for LinkedList transfers.
When the LinkedList (descriptor load) feature is present we can create a
descriptor for each SG element and program sDMA to walk through the list of
descriptors, instead of the current sequence of sDMA stop, sDMA
reconfiguration and sDMA start after each SG transfer.
By using LinkedList transfers in sDMA the number of DMA interrupts
decreases dramatically.
Booting up the board with filesystem on SD card for example:
# cat /proc/interrupts | grep dma
W/o LinkedList support:
 27:       4436          0     WUGEN  13 Level     omap-dma-engine

Same board/filesystem with this patch:
 27:       1027          0     WUGEN  13 Level     omap-dma-engine

Or copying files from the SD card to eMMC:
# du -h /usr
2.1G    /usr/
# find /usr/ -type f | wc -l
232001

# cp -r /usr/* /mnt/emmc/tmp/

W/o LinkedList we see ~761069 DMA interrupts.
With LinkedList support it is down to ~269314 DMA interrupts.

With the decreased DMA interrupt number the CPU load is dropping
significantly as well.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 183 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 177 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 8497750fa44a..22b3e1a5425d 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -8,6 +8,7 @@
 #include <linux/delay.h>
 #include <linux/dmaengine.h>
 #include <linux/dma-mapping.h>
+#include <linux/dmapool.h>
 #include <linux/err.h>
 #include <linux/init.h>
 #include <linux/interrupt.h>
@@ -32,6 +33,7 @@ struct omap_dmadev {
 	const struct omap_dma_reg *reg_map;
 	struct omap_system_dma_plat_info *plat;
 	bool legacy;
+	bool ll123_supported;
 	unsigned dma_requests;
 	spinlock_t irq_lock;
 	uint32_t irq_enable_mask;
@@ -41,6 +43,7 @@ struct omap_dmadev {
 struct omap_chan {
 	struct virt_dma_chan vc;
 	void __iomem *channel_base;
+	struct dma_pool *desc_pool;
 	const struct omap_dma_reg *reg_map;
 	uint32_t ccr;
 
@@ -55,16 +58,41 @@ struct omap_chan {
 	unsigned sgidx;
 };
 
+#define DESC_NXT_SV_REFRESH	(0x1 << 24)
+#define DESC_NXT_SV_REUSE	(0x2 << 24)
+#define DESC_NXT_DV_REFRESH	(0x1 << 26)
+#define DESC_NXT_DV_REUSE	(0x2 << 26)
+#define DESC_NTYPE_TYPE2	(0x2 << 29)
+
+/* Type 2 descriptor with Source or Destination address update */
+struct omap_type2_desc {
+	uint32_t next_desc;
+	uint32_t en;
+	uint32_t addr; /* src or dst */
+	uint16_t fn;
+	uint16_t cicr;
+	uint16_t cdei;
+	uint16_t csei;
+	uint32_t cdfi;
+	uint32_t csfi;
+} __packed;
+
 struct omap_sg {
 	dma_addr_t addr;
 	uint32_t en;		/* number of elements (24-bit) */
 	uint32_t fn;		/* number of frames (16-bit) */
 	int32_t fi;		/* for double indexing */
 	int16_t ei;		/* for double indexing */
+
+	/* Linked list */
+	struct omap_type2_desc *t2_desc;
+	dma_addr_t t2_desc_paddr;
 };
 
 struct omap_desc {
+	struct omap_chan *c;
 	struct virt_dma_desc vd;
+	bool using_ll;
 	enum dma_transfer_direction dir;
 	dma_addr_t dev_addr;
 
@@ -81,6 +109,9 @@ struct omap_desc {
 };
 
 enum {
+	CAPS_0_SUPPORT_LL123	= BIT(20),	/* Linked List type1/2/3 */
+	CAPS_0_SUPPORT_LL4	= BIT(21),	/* Linked List type4 */
+
 	CCR_FS			= BIT(5),
 	CCR_READ_PRIORITY	= BIT(6),
 	CCR_ENABLE		= BIT(7),
@@ -151,6 +182,19 @@ enum {
 	CICR_SUPER_BLOCK_IE	= BIT(14),	/* OMAP2+ only */
 
 	CLNK_CTRL_ENABLE_LNK	= BIT(15),
+
+	CDP_DST_VALID_INC	= 0 << 0,
+	CDP_DST_VALID_RELOAD	= 1 << 0,
+	CDP_DST_VALID_REUSE	= 2 << 0,
+	CDP_SRC_VALID_INC	= 0 << 2,
+	CDP_SRC_VALID_RELOAD	= 1 << 2,
+	CDP_SRC_VALID_REUSE	= 2 << 2,
+	CDP_NTYPE_TYPE1		= 1 << 4,
+	CDP_NTYPE_TYPE2		= 2 << 4,
+	CDP_NTYPE_TYPE3		= 3 << 4,
+	CDP_TMODE_NORMAL	= 0 << 8,
+	CDP_TMODE_LLIST		= 1 << 8,
+	CDP_FAST		= BIT(10),
 };
 
 static const unsigned es_bytes[] = {
@@ -180,7 +224,64 @@ static inline struct omap_desc *to_omap_dma_desc(struct dma_async_tx_descriptor
 
 static void omap_dma_desc_free(struct virt_dma_desc *vd)
 {
-	kfree(container_of(vd, struct omap_desc, vd));
+	struct omap_desc *d = container_of(vd, struct omap_desc, vd);
+
+	if (d->using_ll) {
+		struct omap_chan *c = d->c;
+		int i;
+
+		for (i = 0; i < d->sglen; i++) {
+			if (d->sg[i].t2_desc)
+				dma_pool_free(c->desc_pool, d->sg[i].t2_desc,
+					      d->sg[i].t2_desc_paddr);
+		}
+	}
+
+	kfree(d);
+}
+
+static void omap_dma_fill_type2_desc(struct omap_desc *d, int idx,
+				     enum dma_transfer_direction dir, bool last)
+{
+	struct omap_sg *sg = &d->sg[idx];
+	struct omap_type2_desc *t2_desc = sg->t2_desc;
+
+	if (idx)
+		d->sg[idx - 1].t2_desc->next_desc = sg->t2_desc_paddr;
+	if (last)
+		t2_desc->next_desc = 0xfffffffc;
+
+	t2_desc->en = sg->en;
+	t2_desc->addr = sg->addr;
+	t2_desc->fn = sg->fn & 0xffff;
+	t2_desc->cicr = d->cicr;
+	if (!last)
+		t2_desc->cicr &= ~CICR_BLOCK_IE;
+
+	switch (dir) {
+	case DMA_DEV_TO_MEM:
+		t2_desc->cdei = sg->ei;
+		t2_desc->csei = d->ei;
+		t2_desc->cdfi = sg->fi;
+		t2_desc->csfi = d->fi;
+
+		t2_desc->en |= DESC_NXT_DV_REFRESH;
+		t2_desc->en |= DESC_NXT_SV_REUSE;
+		break;
+	case DMA_MEM_TO_DEV:
+		t2_desc->cdei = d->ei;
+		t2_desc->csei = sg->ei;
+		t2_desc->cdfi = d->fi;
+		t2_desc->csfi = sg->fi;
+
+		t2_desc->en |= DESC_NXT_SV_REFRESH;
+		t2_desc->en |= DESC_NXT_DV_REUSE;
+		break;
+	default:
+		return;
+	}
+
+	t2_desc->en |= DESC_NTYPE_TYPE2;
 }
 
 static void omap_dma_write(uint32_t val, unsigned type, void __iomem *addr)
@@ -285,6 +386,7 @@ static void omap_dma_assign(struct omap_dmadev *od, struct omap_chan *c,
 static void omap_dma_start(struct omap_chan *c, struct omap_desc *d)
 {
 	struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device);
+	uint16_t cicr = d->cicr;
 
 	if (__dma_omap15xx(od->plat->dma_attr))
 		omap_dma_chan_write(c, CPC, 0);
@@ -293,8 +395,27 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d)
 
 	omap_dma_clear_csr(c);
 
+	if (d->using_ll) {
+		uint32_t cdp = CDP_TMODE_LLIST | CDP_NTYPE_TYPE2 | CDP_FAST;
+
+		if (d->dir == DMA_DEV_TO_MEM)
+			cdp |= (CDP_DST_VALID_RELOAD | CDP_SRC_VALID_REUSE);
+		else
+			cdp |= (CDP_DST_VALID_REUSE | CDP_SRC_VALID_RELOAD);
+		omap_dma_chan_write(c, CDP, cdp);
+
+		omap_dma_chan_write(c, CNDP, d->sg[0].t2_desc_paddr);
+		omap_dma_chan_write(c, CCDN, 0);
+		omap_dma_chan_write(c, CCFN, 0xffff);
+		omap_dma_chan_write(c, CCEN, 0xffffff);
+
+		cicr &= ~CICR_BLOCK_IE;
+	} else if (od->ll123_supported) {
+		omap_dma_chan_write(c, CDP, 0);
+	}
+
 	/* Enable interrupts */
-	omap_dma_chan_write(c, CICR, d->cicr);
+	omap_dma_chan_write(c, CICR, cicr);
 
 	/* Enable channel */
 	omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE);
@@ -447,7 +568,7 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 	if (d) {
 		if (c->cyclic) {
 			vchan_cyclic_callback(&d->vd);
-		} else if (c->sgidx == d->sglen) {
+		} else if (d->using_ll || c->sgidx == d->sglen) {
 			vchan_cookie_complete(&d->vd);
 			omap_dma_start_desc(c);
 		} else {
@@ -501,8 +622,19 @@ static int omap_dma_alloc_chan_resources(struct dma_chan *chan)
 {
 	struct omap_dmadev *od = to_omap_dma_dev(chan->device);
 	struct omap_chan *c = to_omap_dma_chan(chan);
+	struct device *dev = od->ddev.dev;
 	int ret;
 
+	if (od->ll123_supported) {
+		c->desc_pool = dma_pool_create(dev_name(dev), dev,
+					       sizeof(struct omap_type2_desc),
+					       4, 0);
+		if (!c->desc_pool) {
+			dev_err(dev, "unable to allocate descriptor pool\n");
+			return -ENOMEM;
+		}
+	}
+
 	if (od->legacy) {
 		ret = omap_request_dma(c->dma_sig, "DMA engine",
 				       omap_dma_callback, c, &c->dma_ch);
@@ -511,8 +643,7 @@ static int omap_dma_alloc_chan_resources(struct dma_chan *chan)
 				       &c->dma_ch);
 	}
 
-	dev_dbg(od->ddev.dev, "allocating channel %u for %u\n",
-		c->dma_ch, c->dma_sig);
+	dev_dbg(dev, "allocating channel %u for %u\n", c->dma_ch, c->dma_sig);
 
 	if (ret >= 0) {
 		omap_dma_assign(od, c, c->dma_ch);
@@ -567,6 +698,8 @@ static void omap_dma_free_chan_resources(struct dma_chan *chan)
 	od->lch_map[c->dma_ch] = NULL;
 	vchan_free_chan_resources(&c->vc);
 	omap_free_dma(c->dma_ch);
+	if (od->ll123_supported)
+		dma_pool_destroy(c->desc_pool);
 
 	dev_dbg(od->ddev.dev, "freeing channel %u used for %u\n", c->dma_ch,
 		c->dma_sig);
@@ -743,6 +876,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
 	struct omap_desc *d;
 	dma_addr_t dev_addr;
 	unsigned i, es, en, frame_bytes;
+	bool ll_failed = false;
 	u32 burst;
 
 	if (dir == DMA_DEV_TO_MEM) {
@@ -778,6 +912,8 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
 	if (!d)
 		return NULL;
 
+	d->c = c;
+
 	d->dir = dir;
 	d->dev_addr = dev_addr;
 	d->es = es;
@@ -818,16 +954,47 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg(
 	 */
 	en = burst;
 	frame_bytes = es_bytes[es] * en;
+
+	if (sglen >= 2)
+		d->using_ll = od->ll123_supported;
+
 	for_each_sg(sgl, sgent, sglen, i) {
 		struct omap_sg *osg = &d->sg[i];
 
 		osg->addr = sg_dma_address(sgent);
 		osg->en = en;
 		osg->fn = sg_dma_len(sgent) / frame_bytes;
+
+		if (d->using_ll) {
+			osg->t2_desc = dma_pool_alloc(c->desc_pool, GFP_ATOMIC,
+						      &osg->t2_desc_paddr);
+			if (!osg->t2_desc) {
+				dev_err(chan->device->dev,
+					"t2_desc[%d] allocation failed\n", i);
+				ll_failed = true;
+				d->using_ll = false;
+				continue;
+			}
+
+			omap_dma_fill_type2_desc(d, i, dir, (i == sglen - 1));
+		}
 	}
 
 	d->sglen = sglen;
 
+	/* Release the dma_pool entries if one allocation failed */
+	if (ll_failed) {
+		for (i = 0; i < d->sglen; i++) {
+			struct omap_sg *osg = &d->sg[i];
+
+			if (osg->t2_desc) {
+				dma_pool_free(c->desc_pool, osg->t2_desc,
+					      osg->t2_desc_paddr);
+				osg->t2_desc = NULL;
+			}
+		}
+	}
+
 	return vchan_tx_prep(&c->vc, &d->vd, tx_flags);
 }
 
@@ -1266,6 +1433,9 @@ static int omap_dma_probe(struct platform_device *pdev)
 			return rc;
 	}
 
+	if (omap_dma_glbl_read(od, CAPS_0) & CAPS_0_SUPPORT_LL123)
+		od->ll123_supported = true;
+
 	od->ddev.filter.map = od->plat->slave_map;
 	od->ddev.filter.mapcnt = od->plat->slavecnt;
 	od->ddev.filter.fn = omap_dma_filter_fn;
@@ -1293,7 +1463,8 @@ static int omap_dma_probe(struct platform_device *pdev)
 		}
 	}
 
-	dev_info(&pdev->dev, "OMAP DMA engine driver\n");
+	dev_info(&pdev->dev, "OMAP DMA engine driver%s\n",
+		 od->ll123_supported ? " (LinkedList1/2/3 supported)" : "");
 
 	return rc;
 }
-- 
2.9.1

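The descriptor chaining done by omap_dma_fill_type2_desc() in the patch above can be sketched stand-alone (user-space C; the field layout, DESC_* encodings and the 0xfffffffc stop marker mirror the patch, while the helper name and the bus-address array are illustrative). Shown for the MEM_TO_DEV case, where the source address is refreshed from each descriptor and the destination is reused:

```c
#include <stdint.h>
#include <stddef.h>

#define DESC_NXT_SV_REFRESH	(0x1u << 24)	/* reload src from descriptor */
#define DESC_NXT_DV_REUSE	(0x2u << 26)	/* keep current dst */
#define DESC_NTYPE_TYPE2	(0x2u << 29)

/* Abbreviated type 2 descriptor; the real one carries fn/cicr/index fields. */
struct type2_desc {
	uint32_t next_desc;	/* bus address of the next descriptor */
	uint32_t en;		/* element count plus NXT/NTYPE control bits */
	uint32_t addr;
};

/* Link descs[0..n-1]; paddr[i] stands in for each descriptor's bus address. */
static void link_chain(struct type2_desc *descs, const uint32_t *paddr, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		descs[i].en |= DESC_NTYPE_TYPE2 | DESC_NXT_SV_REFRESH |
			       DESC_NXT_DV_REUSE;
		/* Last descriptor gets the end-of-list marker. */
		descs[i].next_desc = (i + 1 < n) ? paddr[i + 1] : 0xfffffffc;
	}
}
```

The controller follows next_desc from CNDP until it reads the end-of-list marker, which is why only the last descriptor keeps CICR_BLOCK_IE set in the patch.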

* [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg
  2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
                   ` (6 preceding siblings ...)
  2016-07-14 12:42 ` [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi
@ 2016-07-18 10:31 ` Russell King - ARM Linux
  2016-07-18 12:07   ` Peter Ujfalusi
  7 siblings, 1 reply; 24+ messages in thread
From: Russell King - ARM Linux @ 2016-07-18 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 14, 2016 at 03:42:35PM +0300, Peter Ujfalusi wrote:
> Hi,
> 
> The final patch in this series adds support for sDMA Linked List transfers.
> Linked List mode is supported by the sDMA in OMAP3630 and newer SoCs (the
> OMAP4/5 and dra7 families).
> If the descriptor load feature is present we can create the descriptors for
> each SG element beforehand and let sDMA walk through them.
> This way the number of sDMA interrupts the kernel needs to handle drops
> dramatically.

I suggested this a few years ago, and I was told by TI that there was
no interest to implement this feature as it had very little performance
effect.  Do I take it that TI have changed their position on this
feature?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-14 12:42 ` [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion Peter Ujfalusi
@ 2016-07-18 10:34   ` Russell King - ARM Linux
  2016-07-19 12:35     ` Peter Ujfalusi
  0 siblings, 1 reply; 24+ messages in thread
From: Russell King - ARM Linux @ 2016-07-18 10:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
> Before looking for the next descriptor to start, complete the just finished
> cookie.

This change will reduce performance as we no longer have an overlap
between the next request starting to be dealt with in the hardware
vs the previous request being completed.  Your commit log doesn't
say _why_ the change is being made, it merely tells us what the
patch is doing, which we can see already.

Please describe changes a little better.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg
  2016-07-14 12:42 ` [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi
@ 2016-07-18 10:42   ` Russell King - ARM Linux
  2016-07-18 11:12     ` Peter Ujfalusi
  0 siblings, 1 reply; 24+ messages in thread
From: Russell King - ARM Linux @ 2016-07-18 10:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 14, 2016 at 03:42:42PM +0300, Peter Ujfalusi wrote:
>  struct omap_desc {
> +	struct omap_chan *c;
>  	struct virt_dma_desc vd;

No need for this.  to_omap_dma_chan(foo->vd.tx.chan) will give you the
omap_chan for the descriptor.  In any case, I question whether you
actually need this (see below.)

> +	bool using_ll;
>  	enum dma_transfer_direction dir;
>  	dma_addr_t dev_addr;
>  
> @@ -81,6 +109,9 @@ struct omap_desc {
>  };
>  
>  enum {
> +	CAPS_0_SUPPORT_LL123	= BIT(20),	/* Linked List type1/2/3 */
> +	CAPS_0_SUPPORT_LL4	= BIT(21),	/* Linked List type4 */
> +
>  	CCR_FS			= BIT(5),
>  	CCR_READ_PRIORITY	= BIT(6),
>  	CCR_ENABLE		= BIT(7),
> @@ -151,6 +182,19 @@ enum {
>  	CICR_SUPER_BLOCK_IE	= BIT(14),	/* OMAP2+ only */
>  
>  	CLNK_CTRL_ENABLE_LNK	= BIT(15),
> +
> +	CDP_DST_VALID_INC	= 0 << 0,
> +	CDP_DST_VALID_RELOAD	= 1 << 0,
> +	CDP_DST_VALID_REUSE	= 2 << 0,
> +	CDP_SRC_VALID_INC	= 0 << 2,
> +	CDP_SRC_VALID_RELOAD	= 1 << 2,
> +	CDP_SRC_VALID_REUSE	= 2 << 2,
> +	CDP_NTYPE_TYPE1		= 1 << 4,
> +	CDP_NTYPE_TYPE2		= 2 << 4,
> +	CDP_NTYPE_TYPE3		= 3 << 4,
> +	CDP_TMODE_NORMAL	= 0 << 8,
> +	CDP_TMODE_LLIST		= 1 << 8,
> +	CDP_FAST		= BIT(10),
>  };
>  
>  static const unsigned es_bytes[] = {
> @@ -180,7 +224,64 @@ static inline struct omap_desc *to_omap_dma_desc(struct dma_async_tx_descriptor
>  
>  static void omap_dma_desc_free(struct virt_dma_desc *vd)
>  {
> -	kfree(container_of(vd, struct omap_desc, vd));
> +	struct omap_desc *d = container_of(vd, struct omap_desc, vd);

	struct omap_desc *d = to_omap_dma_desc(&vd->tx);

works just as well, and looks much nicer, and follows the existing code
pattern.

> +
> +	if (d->using_ll) {
> +		struct omap_chan *c = d->c;
> +		int i;
> +
> +		for (i = 0; i < d->sglen; i++) {
> +			if (d->sg[i].t2_desc)
> +				dma_pool_free(c->desc_pool, d->sg[i].t2_desc,
> +					      d->sg[i].t2_desc_paddr);

Why do you need a per-channel pool of descriptors?  Won't a per-device
descriptor pool be much better, and simplify the code here?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


* [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg
  2016-07-18 10:42   ` Russell King - ARM Linux
@ 2016-07-18 11:12     ` Peter Ujfalusi
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-18 11:12 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/16 13:42, Russell King - ARM Linux wrote:
> On Thu, Jul 14, 2016 at 03:42:42PM +0300, Peter Ujfalusi wrote:
>>  struct omap_desc {
>> +	struct omap_chan *c;
>>  	struct virt_dma_desc vd;
> 
> No need for this.  to_omap_dma_chan(foo->vd.tx.chan) will give you the
> omap_chan for the descriptor.  In any case, I question whether you
> actually need this (see below.)

I don't know how I missed that. Works and looks better!


>> +	bool using_ll;
>>  	enum dma_transfer_direction dir;
>>  	dma_addr_t dev_addr;
>>  
>> @@ -81,6 +109,9 @@ struct omap_desc {
>>  };
>>  
>>  enum {
>> +	CAPS_0_SUPPORT_LL123	= BIT(20),	/* Linked List type1/2/3 */
>> +	CAPS_0_SUPPORT_LL4	= BIT(21),	/* Linked List type4 */
>> +
>>  	CCR_FS			= BIT(5),
>>  	CCR_READ_PRIORITY	= BIT(6),
>>  	CCR_ENABLE		= BIT(7),
>> @@ -151,6 +182,19 @@ enum {
>>  	CICR_SUPER_BLOCK_IE	= BIT(14),	/* OMAP2+ only */
>>  
>>  	CLNK_CTRL_ENABLE_LNK	= BIT(15),
>> +
>> +	CDP_DST_VALID_INC	= 0 << 0,
>> +	CDP_DST_VALID_RELOAD	= 1 << 0,
>> +	CDP_DST_VALID_REUSE	= 2 << 0,
>> +	CDP_SRC_VALID_INC	= 0 << 2,
>> +	CDP_SRC_VALID_RELOAD	= 1 << 2,
>> +	CDP_SRC_VALID_REUSE	= 2 << 2,
>> +	CDP_NTYPE_TYPE1		= 1 << 4,
>> +	CDP_NTYPE_TYPE2		= 2 << 4,
>> +	CDP_NTYPE_TYPE3		= 3 << 4,
>> +	CDP_TMODE_NORMAL	= 0 << 8,
>> +	CDP_TMODE_LLIST		= 1 << 8,
>> +	CDP_FAST		= BIT(10),
>>  };
>>  
>>  static const unsigned es_bytes[] = {
>> @@ -180,7 +224,64 @@ static inline struct omap_desc *to_omap_dma_desc(struct dma_async_tx_descriptor
>>  
>>  static void omap_dma_desc_free(struct virt_dma_desc *vd)
>>  {
>> -	kfree(container_of(vd, struct omap_desc, vd));
>> +	struct omap_desc *d = container_of(vd, struct omap_desc, vd);
> 
> 	struct omap_desc *d = to_omap_dma_desc(&vd->tx);
> 
> works just as well, and looks much nicer, and follows the existing code
> pattern.

Yes, I missed this as well.

>> +
>> +	if (d->using_ll) {
>> +		struct omap_chan *c = d->c;
>> +		int i;
>> +
>> +		for (i = 0; i < d->sglen; i++) {
>> +			if (d->sg[i].t2_desc)
>> +				dma_pool_free(c->desc_pool, d->sg[i].t2_desc,
>> +					      d->sg[i].t2_desc_paddr);
> 
> Why do you need a per-channel pool of descriptors?  Won't a per-device
> descriptor pool be much better, and simplify the code here?

I was planning to try a per-device pool after this series. I think I went with
a per-channel pool because, for example, bcm2835-dma does the same.
Code-wise I don't think it is going to simplify much, as we still need to
free here what we have allocated. I can test this out.

-- 
Péter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg
  2016-07-18 10:31 ` [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Russell King - ARM Linux
@ 2016-07-18 12:07   ` Peter Ujfalusi
  2016-07-18 12:21     ` Russell King - ARM Linux
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-18 12:07 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/16 13:31, Russell King - ARM Linux wrote:
> On Thu, Jul 14, 2016 at 03:42:35PM +0300, Peter Ujfalusi wrote:
>> Hi,
>>
>> The following series with the final patch will add support for sDMA Linked List
>> transfer support.
>> Linked List is supported by sDMA in OMAP3630+ (OMAP4/5, dra7 family).
>> If the descriptor load feature is present we can create the descriptors for each
>> SG beforehand and let sDMA to walk them through.
>> This way the number of sDMA interrupts the kernel need to handle will drop
>> dramatically.
> 
> I suggested this a few years ago, and I was told by TI that there was
> no interest to implement this feature as it had very little performance
> effect.

I cannot comment on this... A few years ago I was not involved with the DMA
drivers, so I have no idea why anyone would object to using the linked
list (or descriptor load) mode whenever it is possible.
I was not even aware of sDMA's linked list mode three weeks ago, but while
reading the TRM - mainly for the interleaved mode - it sounded like a good idea
to implement it.
I am not really sure about the raw performance impact, but it does help
interactivity. I remember that running 'emerge --sync' on a BeagleBoard was
painful, as it took hours and the board was mostly unusable during that time.
With linked list mode the same takes a reasonable time and I can still poke
around on the board.

> Do I take it that TI have changed their position on this feature?

I was not aware of any position on this from TI - as I mentioned, I was not
involved with DMA. It could be that the position from 'TI' is still what it
was. Or it may have changed. But I have been asked to look after TI's DMA
drivers upstream, and I believe that linked list mode is a good thing to
have - which is backed by my experience. My position is that linked list
support is cool.

-- 
Péter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg
  2016-07-18 12:07   ` Peter Ujfalusi
@ 2016-07-18 12:21     ` Russell King - ARM Linux
  2016-07-18 12:30       ` Peter Ujfalusi
  0 siblings, 1 reply; 24+ messages in thread
From: Russell King - ARM Linux @ 2016-07-18 12:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jul 18, 2016 at 03:07:57PM +0300, Peter Ujfalusi wrote:
> I was not aware of any position on this from TI - as I mentioned, I was not
> involved with DMA. It could be that the position from 'TI' is still what it
> was. Or it may have changed. But I have been asked to look after TI's DMA
> drivers upstream, and I believe that linked list mode is a good thing to
> have - which is backed by my experience. My position is that linked list
> support is cool.

That's really nice news.  Nothing like asking the author first whether
he'd like to pass over maintainership of the driver.  I guess you won't
mind if at some point in the future, I decide to just take it back...

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg
  2016-07-18 12:21     ` Russell King - ARM Linux
@ 2016-07-18 12:30       ` Peter Ujfalusi
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-18 12:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/16 15:21, Russell King - ARM Linux wrote:
> On Mon, Jul 18, 2016 at 03:07:57PM +0300, Peter Ujfalusi wrote:
>> I was not aware of any position on this from TI - as I mentioned, I was not
>> involved with DMA. It could be that the position from 'TI' is still what it
>> was. Or it may have changed. But I have been asked to look after TI's DMA
>> drivers upstream, and I believe that linked list mode is a good thing to
>> have - which is backed by my experience. My position is that linked list
>> support is cool.
> 
> That's really nice news.  Nothing like asking the author first whether
> he'd like to pass over maintainership of the driver.  I guess you won't
> mind if at some point in the future, I decide to just take it back...

I work on the DMA drivers on behalf of TI. Within TI, DMA-related queries are
directed to me. This does not change who maintains the drivers upstream.

-- 
Péter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-18 10:34   ` Russell King - ARM Linux
@ 2016-07-19 12:35     ` Peter Ujfalusi
  2016-07-19 16:20       ` Russell King - ARM Linux
  2016-07-20  6:26       ` Robert Jarzmik
  0 siblings, 2 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-19 12:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/18/16 13:34, Russell King - ARM Linux wrote:
> On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
>> Before looking for the next descriptor to start, complete the just finished
>> cookie.
> 
> This change will reduce performance as we no longer have an overlap
> between the next request starting to be dealt with in the hardware
> vs the previous request being completed.

vchan_cookie_complete() will only mark the cookie completed, add the vd to
the desc_completed list (it was deleted from desc_issued list when it was
started by omap_dma_start_desc) and schedule the tasklet to deal with the real
completion later.
Marking the just finished descriptor/cookie done first then looking for
possible descriptors in the queue to start feels like a better sequence.

After a quick grep in the kernel source: only omap-dma.c was starting the next
transfer before marking the current completed descriptor/cookie done.

> Your commit log doesn't
> say _why_ the change is being made, it merely tells us what the
> patch is doing, which we can see already.
> 
> Please describe changes a little better.
> 


-- 
Péter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-19 12:35     ` Peter Ujfalusi
@ 2016-07-19 16:20       ` Russell King - ARM Linux
  2016-07-19 19:23         ` Peter Ujfalusi
  2016-07-24  7:39         ` Vinod Koul
  2016-07-20  6:26       ` Robert Jarzmik
  1 sibling, 2 replies; 24+ messages in thread
From: Russell King - ARM Linux @ 2016-07-19 16:20 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 03:35:18PM +0300, Peter Ujfalusi wrote:
> On 07/18/16 13:34, Russell King - ARM Linux wrote:
> > On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
> >> Before looking for the next descriptor to start, complete the just finished
> >> cookie.
> > 
> > This change will reduce performance as we no longer have an overlap
> > between the next request starting to be dealt with in the hardware
> > vs the previous request being completed.
> 
> > vchan_cookie_complete() will only mark the cookie completed, add the vd to
> the desc_completed list (it was deleted from desc_issued list when it was
> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
> completion later.
> Marking the just finished descriptor/cookie done first then looking for
> possible descriptors in the queue to start feels like a better sequence.

I deliberately arranged the code in the original order so that the next
transfer was started on the hardware with the least amount of work by
the CPU.  Yes, there may not be much in it, but everything you mention
above adds to the number of CPU cycles that need to be executed before
the next transfer can be started.

More CPU cycles wasted means higher latency between transfers, which
means lower performance.

> After a quick grep in the kernel source: only omap-dma.c was starting the
> next transfer before marking the current completed descriptor/cookie done.

Right, because I've thought about the issue, having been the author of
both virt-dma and omap-dma.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-19 16:20       ` Russell King - ARM Linux
@ 2016-07-19 19:23         ` Peter Ujfalusi
  2016-07-24  7:39         ` Vinod Koul
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-19 19:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/19/2016 07:20 PM, Russell King - ARM Linux wrote:
>> vchan_cookie_complete() will only mark the cookie completed, add the vd to
>> the desc_completed list (it was deleted from desc_issued list when it was
>> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
>> completion later.
>> Marking the just finished descriptor/cookie done first then looking for
>> possible descriptors in the queue to start feels like a better sequence.
> 
> I deliberately arranged the code in the original order so that the next
> transfer was started on the hardware with the least amount of work by
> the CPU.  Yes, there may not be much in it, but everything you mention
> above adds to the number of CPU cycles that need to be executed before
> the next transfer can be started.
> 
> More CPU cycles wasted means higher latency between transfers, which
> means lower performance.

OK. I will drop this patch in v2.

>> After a quick grep in the kernel source: only omap-dma.c was starting the
>> next transfer before marking the current completed descriptor/cookie done.
> 
> Right, because I've thought about the issue, having been the author of
> both virt-dma and omap-dma.
> 


-- 
Péter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-19 12:35     ` Peter Ujfalusi
  2016-07-19 16:20       ` Russell King - ARM Linux
@ 2016-07-20  6:26       ` Robert Jarzmik
  2016-07-21  9:33         ` Peter Ujfalusi
  1 sibling, 1 reply; 24+ messages in thread
From: Robert Jarzmik @ 2016-07-20  6:26 UTC (permalink / raw)
  To: linux-arm-kernel

Peter Ujfalusi <peter.ujfalusi@ti.com> writes:

> On 07/18/16 13:34, Russell King - ARM Linux wrote:
>> On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
>>> Before looking for the next descriptor to start, complete the just finished
>>> cookie.
>> 
>> This change will reduce performance as we no longer have an overlap
>> between the next request starting to be dealt with in the hardware
>> vs the previous request being completed.
>
> vchan_cookie_complete() will only mark the cookie completed, add the vd to
> the desc_completed list (it was deleted from desc_issued list when it was
> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
> completion later.
> Marking the just finished descriptor/cookie done first then looking for
> possible descriptors in the queue to start feels like a better sequence.
>
> After a quick grep in the kernel source: only omap-dma.c was starting the next
> transfer before marking the current completed descriptor/cookie done.

Euh, actually I think it's done in other drivers as well:
 - Documentation/dmaengine/pxa_dma.txt (chapter "Transfers hot-chaining")
 - drivers/dma/pxa_dma.c
   => look for pxad_try_hotchain() and its impact on pxad_chan_handler(), which
   will mark the completion while the next transfer is already pumped by the
   hardware.

Speaking of which, from a purely design point of view, as long as you think
beforehand what is your sequence, ie. what is the sequence of your link
chaining, completion handling, etc ..., both marking before or after next tx
start should be fine IMHO.

So in your quest for the "better sequence" the pxa driver's one might give you
some perspective :)

Cheers.

-- 
Robert

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-20  6:26       ` Robert Jarzmik
@ 2016-07-21  9:33         ` Peter Ujfalusi
  2016-07-21  9:35           ` Peter Ujfalusi
  2016-07-21  9:47           ` Russell King - ARM Linux
  0 siblings, 2 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-21  9:33 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/20/16 09:26, Robert Jarzmik wrote:
> Peter Ujfalusi <peter.ujfalusi@ti.com> writes:
> 
>> On 07/18/16 13:34, Russell King - ARM Linux wrote:
>>> On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
>>>> Before looking for the next descriptor to start, complete the just finished
>>>> cookie.
>>>
>>> This change will reduce performance as we no longer have an overlap
>>> between the next request starting to be dealt with in the hardware
>>> vs the previous request being completed.
>>
>> vchan_cookie_complete() will only mark the cookie completed, add the vd to
>> the desc_completed list (it was deleted from desc_issued list when it was
>> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
>> completion later.
>> Marking the just finished descriptor/cookie done first then looking for
>> possible descriptors in the queue to start feels like a better sequence.
>>
>> After a quick grep in the kernel source: only omap-dma.c was starting the next
>> transfer before marking the current completed descriptor/cookie done.
> 
> Euh, actually I think it's done in other drivers as well:
>  - Documentation/dmaengine/pxa_dma.txt (chapter "Transfers hot-chaining")
>  - drivers/dma/pxa_dma.c
>    => look for pxad_try_hotchain() and its impact on pxad_chan_handler(), which
>    will mark the completion while the next transfer is already pumped by the
>    hardware.

The 'hot-chaining' is a bit different then what omap-dma is doing. If I got it
right. When the DMA is running and a new request comes the driver will append
the new transfer to the list used by the HW. This way there will be no stop
and restart needed, the DMA is running w/o interruption.

> Speaking of which, from a purely design point of view, as long as you think
> beforehand what is your sequence, ie. what is the sequence of your link
> chaining, completion handling, etc ..., both marking before or after next tx
> start should be fine IMHO.

Yes, it might be a bit better from a performance point of view if we first start
the pending descriptor (if there is one) and then do the vchan_cookie_complete().
On the other hand, if we care more about latency and accuracy we should
complete the transfer first and then look for pending descriptors. But since
virt_dma is using a tasklet for the real completion, the latency is always
going to depend on when the tasklet gets the chance to execute.

> So in your quest for the "better sequence" the pxa driver's one might give you
> some perspective :)

I did think about a similar 'hot-chaining' for TI's eDMA and sDMA. eDMA
especially would benefit from it, but so far I see too many race conditions to
overcome to be brave enough to write something to test it. And I don't have
time for it atm ;)

-- 
Péter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-21  9:33         ` Peter Ujfalusi
@ 2016-07-21  9:35           ` Peter Ujfalusi
  2016-07-21  9:47           ` Russell King - ARM Linux
  1 sibling, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-21  9:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/21/16 12:33, Peter Ujfalusi wrote:
> On 07/20/16 09:26, Robert Jarzmik wrote:
>> Peter Ujfalusi <peter.ujfalusi@ti.com> writes:
>>
>>> On 07/18/16 13:34, Russell King - ARM Linux wrote:
>>>> On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
>>>>> Before looking for the next descriptor to start, complete the just finished
>>>>> cookie.
>>>>
>>>> This change will reduce performance as we no longer have an overlap
>>>> between the next request starting to be dealt with in the hardware
>>>> vs the previous request being completed.
>>>
>>> vchan_cookie_complete() will only mark the cookie completed, add the vd to
>>> the desc_completed list (it was deleted from desc_issued list when it was
>>> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
>>> completion later.
>>> Marking the just finished descriptor/cookie done first then looking for
>>> possible descriptors in the queue to start feels like a better sequence.
>>>
>>> After a quick grep in the kernel source: only omap-dma.c was starting the next
>>> transfer before marking the current completed descriptor/cookie done.
>>
>> Euh, actually I think it's done in other drivers as well:
>>  - Documentation/dmaengine/pxa_dma.txt (chapter "Transfers hot-chaining")
>>  - drivers/dma/pxa_dma.c
>>    => look for pxad_try_hotchain() and its impact on pxad_chan_handler(), which
>>    will mark the completion while the next transfer is already pumped by the
>>    hardware.
> 
> The 'hot-chaining' is a bit different then what omap-dma is doing.

s/then/than

> If I got it
> right. When the DMA is running and a new request comes the driver will append
> the new transfer to the list used by the HW. This way there will be no stop
> and restart needed, the DMA is running w/o interruption.
> 
>> Speaking of which, from a purely design point of view, as long as you think
>> beforehand what is your sequence, ie. what is the sequence of your link
>> chaining, completion handling, etc ..., both marking before or after next tx
>> start should be fine IMHO.
> 
> Yes, it might be a bit better from a performance point of view if we first start
> the pending descriptor (if there is one) and then do the vchan_cookie_complete().
> On the other hand, if we care more about latency and accuracy we should
> complete the transfer first and then look for pending descriptors. But since
> virt_dma is using a tasklet for the real completion, the latency is always
> going to depend on when the tasklet gets the chance to execute.
> 
>> So in your quest for the "better sequence" the pxa driver's one might give you
>> some perspective :)
> 
> I did think about a similar 'hot-chaining' for TI's eDMA and sDMA. eDMA
> especially would benefit from it, but so far I see too many race conditions to
> overcome to be brave enough to write something to test it. And I don't have
> time for it atm ;)
> 


-- 
Péter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-21  9:33         ` Peter Ujfalusi
  2016-07-21  9:35           ` Peter Ujfalusi
@ 2016-07-21  9:47           ` Russell King - ARM Linux
  2016-07-22 11:00             ` Peter Ujfalusi
  1 sibling, 1 reply; 24+ messages in thread
From: Russell King - ARM Linux @ 2016-07-21  9:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 21, 2016 at 12:33:12PM +0300, Peter Ujfalusi wrote:
> On 07/20/16 09:26, Robert Jarzmik wrote:
> > Speaking of which, from a purely design point of view, as long as you think
> > beforehand what is your sequence, ie. what is the sequence of your link
> > chaining, completion handling, etc ..., both marking before or after next tx
> > start should be fine IMHO.
> 
> Yes, it might be a bit better from a performance point of view if we first start
> the pending descriptor (if there is one) and then do the vchan_cookie_complete().
> On the other hand, if we care more about latency and accuracy we should
> complete the transfer first and then look for pending descriptors. But since
> virt_dma is using a tasklet for the real completion, the latency is always
> going to depend on when the tasklet gets the chance to execute.

I think this shows a slight misunderstanding of the DMA engine API.  The
DMA completion is defined by the API to always happen in tasklet context,
which is why the virt-dma stuff does it that way - and all other DMA
engine drivers.  It's one of the fundamentals of the API.

As it happens in tasklet context, tasklets can be scheduled to run with
variable latency, so any use of the DMA engine API which has a predictable
latency around the completion handling is going to be unreliable.

Remember also that with circular buffers, there's no guarantee of getting
period-based completion callbacks - several periods can complete and you
are only guaranteed to get one completion callback.

So, the idea that completion callbacks can have anything to do with low
latency or accuracy is totally incorrect.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-21  9:47           ` Russell King - ARM Linux
@ 2016-07-22 11:00             ` Peter Ujfalusi
  0 siblings, 0 replies; 24+ messages in thread
From: Peter Ujfalusi @ 2016-07-22 11:00 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/21/16 12:47, Russell King - ARM Linux wrote:
> On Thu, Jul 21, 2016 at 12:33:12PM +0300, Peter Ujfalusi wrote:
>> On 07/20/16 09:26, Robert Jarzmik wrote:
>>> Speaking of which, from a purely design point of view, as long as you think
>>> beforehand what is your sequence, ie. what is the sequence of your link
>>> chaining, completion handling, etc ..., both marking before or after next tx
>>> start should be fine IMHO.
>>
>> Yes, it might be a bit better from a performance point of view if we first start
>> the pending descriptor (if there is one) and then do the vchan_cookie_complete().
>> On the other hand, if we care more about latency and accuracy we should
>> complete the transfer first and then look for pending descriptors. But since
>> virt_dma is using a tasklet for the real completion, the latency is always
>> going to depend on when the tasklet gets the chance to execute.
> 
> I think this shows a slight misunderstanding of the DMA engine API.  The
> DMA completion is defined by the API to always happen in tasklet context,
> which is why the virt-dma stuff does it that way - and all other DMA
> engine drivers.  It's one of the fundamentals of the API.
> 
> As it happens in tasklet context, tasklets can be scheduled to run with
> variable latency, so any use of the DMA engine API which has a predictable
> latency around the completion handling is going to be unreliable.
> 
> Remember also that with circular buffers, there's no guarantee of getting
> period-based completion callbacks - several periods can complete and you
> are only guaranteed to get one completion callback.
> 
> So, the idea that completion callbacks can have anything to do with low
> latency or accuracy is totally incorrect.

Thanks for refreshing my memory, you are absolutely right.

-- 
Péter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
  2016-07-19 16:20       ` Russell King - ARM Linux
  2016-07-19 19:23         ` Peter Ujfalusi
@ 2016-07-24  7:39         ` Vinod Koul
  1 sibling, 0 replies; 24+ messages in thread
From: Vinod Koul @ 2016-07-24  7:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jul 19, 2016 at 05:20:04PM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 19, 2016 at 03:35:18PM +0300, Peter Ujfalusi wrote:
> > On 07/18/16 13:34, Russell King - ARM Linux wrote:
> > > On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
> > >> Before looking for the next descriptor to start, complete the just finished
> > >> cookie.
> > > 
> > > This change will reduce performance as we no longer have an overlap
> > > between the next request starting to be dealt with in the hardware
> > > vs the previous request being completed.
> > 
> > vchan_cookie_complete() will only mark the cookie completed, add the vd to
> > the desc_completed list (it was deleted from desc_issued list when it was
> > started by omap_dma_start_desc) and schedule the tasklet to deal with the real
> > completion later.
> > Marking the just finished descriptor/cookie done first then looking for
> > possible descriptors in the queue to start feels like a better sequence.
> 
> I deliberately arranged the code in the original order so that the next
> transfer was started on the hardware with the least amount of work by
> the CPU.  Yes, there may not be much in it, but everything you mention
> above adds to the number of CPU cycles that need to be executed before
> the next transfer can be started.

Yes, that is really the right thing to do. Ideally people would want to
minimize the delay and submit the next one as soon as possible, but people
have been lazy on this and a few other aspects :)

-- 
~Vinod

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-07-24  7:39 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi
2016-07-14 12:42 ` [PATCH 1/7] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list Peter Ujfalusi
2016-07-14 12:42 ` [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion Peter Ujfalusi
2016-07-18 10:34   ` Russell King - ARM Linux
2016-07-19 12:35     ` Peter Ujfalusi
2016-07-19 16:20       ` Russell King - ARM Linux
2016-07-19 19:23         ` Peter Ujfalusi
2016-07-24  7:39         ` Vinod Koul
2016-07-20  6:26       ` Robert Jarzmik
2016-07-21  9:33         ` Peter Ujfalusi
2016-07-21  9:35           ` Peter Ujfalusi
2016-07-21  9:47           ` Russell King - ARM Linux
2016-07-22 11:00             ` Peter Ujfalusi
2016-07-14 12:42 ` [PATCH 3/7] dmaengine: omap-dma: Simplify omap_dma_callback Peter Ujfalusi
2016-07-14 12:42 ` [PATCH 4/7] dmaengine: omap-dma: Dynamically allocate memory for lch_map Peter Ujfalusi
2016-07-14 12:42 ` [PATCH 5/7] dmaengine: omap-dma: Add more debug information when freeing channel Peter Ujfalusi
2016-07-14 12:42 ` [PATCH 6/7] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop Peter Ujfalusi
2016-07-14 12:42 ` [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi
2016-07-18 10:42   ` Russell King - ARM Linux
2016-07-18 11:12     ` Peter Ujfalusi
2016-07-18 10:31 ` [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Russell King - ARM Linux
2016-07-18 12:07   ` Peter Ujfalusi
2016-07-18 12:21     ` Russell King - ARM Linux
2016-07-18 12:30       ` Peter Ujfalusi
