* [PATCH 0/9] dma: edma: Support scatter-lists of any length
From: Joel Fernandes @ 2013-07-29 13:29 UTC
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List

The following series adds support to the EDMA driver for DMA of
scatter-gather lists of arbitrary length while using at most a fixed
maximum number of slots (MAX_NR_SG) at a time for a given channel, thus
freeing up the rest of the slots for other slaves/channels. With this,
slave drivers no longer need to query the EDMA driver for the maximum
number of SG entries it can handle at a time, as done in [1]. Drivers can
send SG lists with any number of entries to DMA. Reference discussion at [2].

Tested the omap-aes and omap_hsmmc drivers with different values of the
maximum number of slots, even just 1. When it is 1, a single slot is used
to DMA an entire scatter list of arbitrary length.
Since this series touches EDMA private API code that is also shared with
davinci-pcm, playback of a 16-bit 44.1 kHz audio file with davinci-pcm has
also been tested.

Sample test run with 1 vs 16 (MAX number of slots/SG) in omap-aes driver:
MAX slots = 1:
 (128 bit key, 8192 byte blocks): 1266 operations in 1 seconds (10371072 bytes)
MAX slots = 16:
 (128 bit key, 8192 byte blocks): 1601 operations in 1 seconds (13115392 bytes)

Note: For the above test, the 8K buffer is mapped into an SG list of size 2,
so only 2 slots are required. Beyond 2 slots there will therefore not be any
noticeable performance improvement. Even with just 1 slot, the transfer still
works by DMA'ing 1 SG entry at a time.
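
As a rough illustration of the numbers above (not part of the series), the
slot and batch arithmetic can be sketched in C. The helper names below are
hypothetical, and max_nr_sg stands in for the driver's MAX_NR_SG limit. The
sketch assumes each batch ends with exactly one completion interrupt.

```c
#include <assert.h>

/* Hypothetical helper, not part of the series: number of PaRAM slots
 * allocated for a list of sg_len entries. Never more than the list
 * needs, and never more than the per-channel limit. */
static int sketch_slots_needed(int sg_len, int max_nr_sg)
{
	assert(sg_len > 0 && max_nr_sg > 0);
	return sg_len > max_nr_sg ? max_nr_sg : sg_len;
}

/* Hypothetical helper: number of batches needed to DMA the whole
 * list, i.e. sg_len / max_nr_sg rounded up. */
static int sketch_num_batches(int sg_len, int max_nr_sg)
{
	assert(sg_len > 0 && max_nr_sg > 0);
	return (sg_len + max_nr_sg - 1) / max_nr_sg;
}
```

With sg_len = 2, as in the omap-aes test, this gives 2 batches for a limit
of 1 and a single batch for a limit of 16, which is consistent with there
being no gain beyond 2 slots.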

[1] https://lkml.org/lkml/2013/7/18/432
[2] http://marc.info/?l=linux-omap&m=137416733628831&w=2

Joel Fernandes (9):
  dma: edma: Setup parameters to DMA MAX_NR_SG at a time
  dma: edma: Write out and handle MAX_NR_SG at a given time
  ARM: edma: Add function to manually trigger an EDMA channel
  dma: edma: Find missed events and issue them
  dma: edma: Leave linked to Null slot instead of DUMMY slot
  dma: edma: Detect null slot errors and handle them correctly
  ARM: edma: Don't clear EMR of channel in edma_stop
  dma: edma: Link to dummy slot only for last SG list split
  dma: edma: remove limits on number of slots

 arch/arm/common/edma.c             |   22 ++++-
 drivers/dma/edma.c                 |  157 +++++++++++++++++++++++++++---------
 include/linux/platform_data/edma.h |    2 +
 3 files changed, 142 insertions(+), 39 deletions(-)

-- 
1.7.9.5



* [PATCH 1/9] dma: edma: Setup parameters to DMA MAX_NR_SG at a time
From: Joel Fernandes @ 2013-07-29 13:29 UTC
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List, Joel Fernandes

Configure the existing PaRAM parameter sets so that they can be DMA'ed
out in batches as needed.

Also allocate as many slots as the SG list needs, but no more than
MAX_NR_SG. These slots are then reused for each batch. For example, if
MAX_NR_SG=10 and the number of SG entries is 40, still only 10 slots
are allocated to DMA the entire SG list of size 40.

Also enable TC interrupts for the slot that is last in the current
iteration, i.e. one that falls on a MAX_NR_SG boundary or is the final
SG entry.

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 drivers/dma/edma.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 5f3e532..0b68f94 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -222,9 +222,9 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 	enum dma_slave_buswidth dev_width;
 	u32 burst;
 	struct scatterlist *sg;
-	int i;
 	int acnt, bcnt, ccnt, src, dst, cidx;
 	int src_bidx, dst_bidx, src_cidx, dst_cidx;
+	int i, num_slots_needed;
 
 	if (unlikely(!echan || !sgl || !sg_len))
 		return NULL;
@@ -262,8 +262,11 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 
 	edesc->pset_nr = sg_len;
 
-	for_each_sg(sgl, sg, sg_len, i) {
-		/* Allocate a PaRAM slot, if needed */
+	/* Allocate a PaRAM slot, if needed */
+
+	num_slots_needed = sg_len > MAX_NR_SG ? MAX_NR_SG : sg_len;
+
+	for (i = 0; i < num_slots_needed; i++) {
 		if (echan->slot[i] < 0) {
 			echan->slot[i] =
 				edma_alloc_slot(EDMA_CTLR(echan->ch_num),
@@ -273,6 +276,10 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 				return NULL;
 			}
 		}
+	}
+
+	/* Configure PaRAM sets for each SG */
+	for_each_sg(sgl, sg, sg_len, i) {
 
 		acnt = dev_width;
 
@@ -330,6 +337,12 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 		/* Configure A or AB synchronized transfers */
 		if (edesc->absync)
 			edesc->pset[i].opt |= SYNCDIM;
+
+		/* If this is the last in a current SG set of transactions,
+		   enable interrupts so that next set is processed */
+		if (!((i+1) % MAX_NR_SG))
+			edesc->pset[i].opt |= TCINTEN;
+
 		/* If this is the last set, enable completion interrupt flag */
 		if (i == sg_len - 1)
 			edesc->pset[i].opt |= TCINTEN;
-- 
1.7.9.5
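
The two TCINTEN checks in the patch above can be modelled outside the
driver. The following is only a sketch: the function names are hypothetical,
and max_nr_sg is passed as a parameter in place of the driver's MAX_NR_SG
constant so that different limits can be tried.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-alone model of the interrupt-enable decision made
 * per SG entry in edma_prep_slave_sg(). Returns true when the PaRAM set
 * for entry i would get TCINTEN set. */
static bool sketch_needs_tcinten(int i, int sg_len, int max_nr_sg)
{
	assert(max_nr_sg > 0 && i >= 0 && i < sg_len);

	/* Last entry of a full batch: falls on a MAX_NR_SG boundary */
	if (!((i + 1) % max_nr_sg))
		return true;

	/* Last entry of the whole list */
	return i == sg_len - 1;
}

/* Count how many completion interrupts one SG list would raise. */
static int sketch_count_interrupts(int sg_len, int max_nr_sg)
{
	int i, n = 0;

	for (i = 0; i < sg_len; i++)
		if (sketch_needs_tcinten(i, sg_len, max_nr_sg))
			n++;
	return n;
}
```

For the commit message's example (40 entries, limit 10) the flag lands on
entries 9, 19, 29 and 39, so four interrupts are raised. An entry that is
both on a boundary and last in the list is only counted once.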



* [PATCH 2/9] dma: edma: Write out and handle MAX_NR_SG at a given time
From: Joel Fernandes @ 2013-07-29 13:29 UTC
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List, Joel Fernandes

Process SG elements in batches of MAX_NR_SG if there are more than
MAX_NR_SG of them. As a result, at any given time at most that many
slots are used in the given channel, no matter how long the scatter
list is. We keep track of how many elements have been written in order
to process the next batch of elements in the scatter list and to
detect completion.

For intermediate transfer completions (one batch of MAX_NR_SG), use the
pause and resume functions instead of start and stop while such an
intermediate transfer is in progress or has completed, as we do not
want to clear any pending events.

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 drivers/dma/edma.c |   79 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 54 insertions(+), 25 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 0b68f94..d9a151b 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -56,6 +56,7 @@ struct edma_desc {
 	struct list_head		node;
 	int				absync;
 	int				pset_nr;
+	int				total_processed;
 	struct edmacc_param		pset[0];
 };
 
@@ -104,22 +105,36 @@ static void edma_desc_free(struct virt_dma_desc *vdesc)
 /* Dispatch a queued descriptor to the controller (caller holds lock) */
 static void edma_execute(struct edma_chan *echan)
 {
-	struct virt_dma_desc *vdesc = vchan_next_desc(&echan->vchan);
+	struct virt_dma_desc *vdesc;
 	struct edma_desc *edesc;
-	int i;
+	struct device *dev = echan->vchan.chan.device->dev;
 
-	if (!vdesc) {
-		echan->edesc = NULL;
-		return;
+	int i, j, total_left, total_process;
+
+	/* If either we processed all psets or we're still not started */
+	if (!echan->edesc ||
+	    echan->edesc->pset_nr == echan->edesc->total_processed) {
+		/* Get next vdesc */
+		vdesc = vchan_next_desc(&echan->vchan);
+		if (!vdesc) {
+			echan->edesc = NULL;
+			return;
+		}
+		list_del(&vdesc->node);
+		echan->edesc = to_edma_desc(&vdesc->tx);
 	}
 
-	list_del(&vdesc->node);
+	edesc = echan->edesc;
+
+	/* Find out how many left */
+	total_left = edesc->pset_nr - edesc->total_processed;
+	total_process = total_left > MAX_NR_SG ? MAX_NR_SG : total_left;
 
-	echan->edesc = edesc = to_edma_desc(&vdesc->tx);
 
 	/* Write descriptor PaRAM set(s) */
-	for (i = 0; i < edesc->pset_nr; i++) {
-		edma_write_slot(echan->slot[i], &edesc->pset[i]);
+	for (i = 0; i < total_process; i++) {
+		j = i + edesc->total_processed;
+		edma_write_slot(echan->slot[i], &edesc->pset[j]);
 		dev_dbg(echan->vchan.chan.device->dev,
 			"\n pset[%d]:\n"
 			"  chnum\t%d\n"
@@ -132,24 +147,31 @@ static void edma_execute(struct edma_chan *echan)
 			"  bidx\t%08x\n"
 			"  cidx\t%08x\n"
 			"  lkrld\t%08x\n",
-			i, echan->ch_num, echan->slot[i],
-			edesc->pset[i].opt,
-			edesc->pset[i].src,
-			edesc->pset[i].dst,
-			edesc->pset[i].a_b_cnt,
-			edesc->pset[i].ccnt,
-			edesc->pset[i].src_dst_bidx,
-			edesc->pset[i].src_dst_cidx,
-			edesc->pset[i].link_bcntrld);
+			j, echan->ch_num, echan->slot[i],
+			edesc->pset[j].opt,
+			edesc->pset[j].src,
+			edesc->pset[j].dst,
+			edesc->pset[j].a_b_cnt,
+			edesc->pset[j].ccnt,
+			edesc->pset[j].src_dst_bidx,
+			edesc->pset[j].src_dst_cidx,
+			edesc->pset[j].link_bcntrld);
 		/* Link to the previous slot if not the last set */
-		if (i != (edesc->pset_nr - 1))
+		if (i != (total_process - 1))
 			edma_link(echan->slot[i], echan->slot[i+1]);
 		/* Final pset links to the dummy pset */
 		else
 			edma_link(echan->slot[i], echan->ecc->dummy_slot);
 	}
 
-	edma_start(echan->ch_num);
+	edesc->total_processed += total_process;
+
+	edma_resume(echan->ch_num);
+
+	if (edesc->total_processed <= MAX_NR_SG) {
+		dev_dbg(dev, "first transfer starting %d\n", echan->ch_num);
+		edma_start(echan->ch_num);
+	}
 }
 
 static int edma_terminate_all(struct edma_chan *echan)
@@ -369,19 +391,26 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data)
 	struct edma_desc *edesc;
 	unsigned long flags;
 
-	/* Stop the channel */
-	edma_stop(echan->ch_num);
+	/* Pause the channel */
+	edma_pause(echan->ch_num);
 
 	switch (ch_status) {
 	case DMA_COMPLETE:
-		dev_dbg(dev, "transfer complete on channel %d\n", ch_num);
-
 		spin_lock_irqsave(&echan->vchan.lock, flags);
 
 		edesc = echan->edesc;
 		if (edesc) {
+			if (edesc->total_processed == edesc->pset_nr) {
+				dev_dbg(dev, "transfer complete." \
+					" stopping channel %d\n", ch_num);
+				edma_stop(echan->ch_num);
+				vchan_cookie_complete(&edesc->vdesc);
+			} else {
+				dev_dbg(dev, "Intermediate transfer complete" \
+					" on channel %d\n", ch_num);
+			}
+
 			edma_execute(echan);
-			vchan_cookie_complete(&edesc->vdesc);
 		}
 
 		spin_unlock_irqrestore(&echan->vchan.lock, flags);
-- 
1.7.9.5
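
The bookkeeping that the patch above adds to edma_execute() can be followed
in isolation with a small model. This is a sketch only: struct sketch_desc
is a hypothetical, reduced stand-in for struct edma_desc, and
SKETCH_MAX_NR_SG is an assumed value rather than the driver's actual
MAX_NR_SG. No hardware access is modelled.

```c
#include <assert.h>

#define SKETCH_MAX_NR_SG 10	/* assumed limit, for illustration only */

/* Hypothetical, reduced stand-in for struct edma_desc */
struct sketch_desc {
	int pset_nr;		/* total PaRAM sets, one per SG entry */
	int total_processed;	/* sets already written out to slots */
};

/* Models one pass of edma_execute() over an active descriptor.
 * Returns how many sets are written in this batch and stores in
 * *first_pset the pset index that lands in slot 0, i.e. the value of
 * j = i + total_processed for i = 0. */
static int sketch_execute_batch(struct sketch_desc *d, int *first_pset)
{
	int total_left = d->pset_nr - d->total_processed;
	int total_process = total_left > SKETCH_MAX_NR_SG ?
				SKETCH_MAX_NR_SG : total_left;

	assert(total_left >= 0);

	*first_pset = d->total_processed;
	d->total_processed += total_process;
	return total_process;
}
```

For a descriptor with 25 sets, three passes write 10, 10 and 5 sets starting
at pset 0, 10 and 20, always into slots 0 onwards. After the third pass
total_processed equals pset_nr, which is the condition the callback uses to
stop the channel and complete the cookie.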



* [PATCH 3/9] ARM: edma: Add function to manually trigger an EDMA channel
@ 2013-07-29 13:29   ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-29 13:29 UTC (permalink / raw)
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List, Joel Fernandes

Events can be missed as a result of splitting a scatter-gather list
and DMA'ing it in batches. Add a helper function to manually trigger
a channel in case any such events are missed.
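
As a rough illustration of the register arithmetic the helper relies
on, the sketch below splits a channel number into the index of the
32-bit ESR word and the bit within that word. The names esr_index(),
esr_mask() and TOY_NUM_CHANNELS are made up for this example and are
not part of the EDMA API, and the 64-channel limit is an assumption.
The controller decoding done by EDMA_CTLR()/EDMA_CHAN_SLOT() is left
out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 64 DMA channels per controller. */
#define TOY_NUM_CHANNELS 64

/* Index of the 32-bit ESR word that holds the bit for this channel. */
static inline unsigned esr_index(unsigned channel)
{
	assert(channel < TOY_NUM_CHANNELS);
	return channel >> 5;		/* 32 channels per register */
}

/* Bit mask for this channel within that ESR word. */
static inline uint32_t esr_mask(unsigned channel)
{
	assert(channel < TOY_NUM_CHANNELS);
	return UINT32_C(1) << (channel & 0x1f);
}
```

With these definitions, channel 37 maps to ESR word 1 with mask
0x00000020, which matches the "channel >> 5" and "BIT(channel & 0x1f)"
steps in the patch.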

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 arch/arm/common/edma.c             |   21 +++++++++++++++++++++
 include/linux/platform_data/edma.h |    2 ++
 2 files changed, 23 insertions(+)

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 3567ba1..10995b2 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -1236,6 +1236,27 @@ void edma_resume(unsigned channel)
 }
 EXPORT_SYMBOL(edma_resume);
 
+int edma_manual_trigger(unsigned channel)
+{
+	unsigned ctlr;
+	int j;
+	unsigned int mask;
+
+	ctlr = EDMA_CTLR(channel);
+	channel = EDMA_CHAN_SLOT(channel);
+	mask = BIT(channel & 0x1f);
+
+	j = channel >> 5;
+
+	/* EDMA channels without event association */
+	edma_shadow0_write_array(ctlr, SH_ESR, j, mask);
+
+	pr_debug("EDMA: ESR%d %08x\n", j,
+		 edma_shadow0_read_array(ctlr, SH_ESR, j));
+	return 0;
+}
+EXPORT_SYMBOL(edma_manual_trigger);
+
 /**
  * edma_start - start dma on a channel
  * @channel: channel being activated
diff --git a/include/linux/platform_data/edma.h b/include/linux/platform_data/edma.h
index 57300fd..0e11eca 100644
--- a/include/linux/platform_data/edma.h
+++ b/include/linux/platform_data/edma.h
@@ -180,4 +180,6 @@ struct edma_soc_info {
 	const s16	(*xbar_chans)[2];
 };
 
+int edma_manual_trigger(unsigned);
+
 #endif
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 4/9] dma: edma: Find missed events and issue them
  2013-07-29 13:29 ` Joel Fernandes
  (?)
@ 2013-07-29 13:29   ` Joel Fernandes
  -1 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-29 13:29 UTC (permalink / raw)
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List, Joel Fernandes

In an effort to support scatter-gather lists of any size with EDMA,
as discussed at [1], instead of placing limitations on the driver,
we work around the limitations of the EDMAC hardware by finding
missed events and issuing them.

The sequence of events that requires this is:

For the scenario where MAX slots for an EDMA channel is 3:

SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null

The above SG list will have to be DMA'd in 2 sets:

(1) SG1 -> SG2 -> SG3 -> Null
(2) SG4 -> SG5 -> SG6 -> Null

After (1) is successfully transferred, the events from the MMC
controller do not stop coming, and they are missed by the time we have
set up the transfer for (2). So here, we catch the missed events as an
error condition and issue them manually.
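
The batching described above can be sketched roughly as follows. This
is only a model of the bookkeeping, not driver code: MAX_NR_SG is set
to 3 to match the example, next_batch() and count_batches() are names
invented here, and computing total_left as pset_nr minus
total_processed is an assumption about how the driver derives it.

```c
#include <assert.h>

#define MAX_NR_SG 3	/* assumed per-channel slot budget, as in the example */

/* Number of SG entries to program into slots for the next batch. */
static inline unsigned next_batch(unsigned pset_nr, unsigned total_processed)
{
	unsigned total_left;

	assert(total_processed <= pset_nr);
	total_left = pset_nr - total_processed;	/* assumption, see above */
	return total_left > MAX_NR_SG ? MAX_NR_SG : total_left;
}

/* Number of batches needed to transfer a list of pset_nr entries. */
static inline unsigned count_batches(unsigned pset_nr)
{
	unsigned done = 0, batches = 0;

	while (done < pset_nr) {
		done += next_batch(pset_nr, done);
		batches++;
	}
	return batches;
}
```

For the six-entry list above this gives two batches of three. Each
batch boundary is a window in which the slave can raise an event
before the next set of slots has been written.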

[1] http://marc.info/?l=linux-omap&m=137416733628831&w=2

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 drivers/dma/edma.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index d9a151b..aa4989f 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -417,7 +417,15 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data)
 
 		break;
 	case DMA_CC_ERROR:
-		dev_dbg(dev, "transfer error on channel %d\n", ch_num);
+		if (echan->edesc) {
+			dev_dbg(dev, "Missed event on %d, retrying\n",
+				ch_num);
+			edma_clean_channel(echan->ch_num);
+			edma_stop(echan->ch_num);
+			edma_start(echan->ch_num);
+			edma_manual_trigger(echan->ch_num);
+		}
+		dev_dbg(dev, "handled error on channel %d\n", ch_num);
 		break;
 	default:
 		break;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 5/9] dma: edma: Leave linked to Null slot instead of DUMMY slot
  2013-07-29 13:29 ` Joel Fernandes
  (?)
@ 2013-07-29 13:29   ` Joel Fernandes
  -1 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-29 13:29 UTC (permalink / raw)
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List, Joel Fernandes

The dummy slot has been used as a way to keep missed events from being
reported as missed. This has been particularly troublesome for cases
where we might want to temporarily pause all incoming events.

For the EDMA DMAC, there is no way to pause events like this, as the
occurrence of the "next" event is not software controlled. Using
"edma_pause" in IRQ handlers doesn't help, as by then the event in
question from the slave is already missed.

Linking to a dummy slot is seen to absorb these events, which we didn't
want to miss. So we don't link to the dummy slot, but instead leave the
last slot linked to the NULL set, allow an error condition, and detect
the channel that missed the event.
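
The difference between the two link targets can be pictured with a toy
model. Everything in it (toy_channel, toy_late_event(), the enum
names) is invented for illustration and does not touch real hardware.
It only encodes the behaviour described in this message: an event that
lands on a dummy set is consumed silently, while an event that lands
on a NULL set is flagged as missed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the last programmed slot of a batch links to. */
enum toy_link { TOY_LINK_NULL, TOY_LINK_DUMMY };

struct toy_channel {
	enum toy_link tail;	/* link target of the final pset */
	bool emr_bit;		/* models the channel's missed-event bit */
	unsigned absorbed;	/* events swallowed without any report */
};

/* An event arrives after the last real pset has been consumed. */
static inline void toy_late_event(struct toy_channel *ch)
{
	assert(ch != NULL);
	if (ch->tail == TOY_LINK_NULL)
		ch->emr_bit = true;	/* reported, can be re-issued later */
	else
		ch->absorbed++;		/* lost: nothing records the event */
}
```

In the model, only the NULL-linked channel keeps a record that an
event arrived, which is the property the rest of the series depends
on.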

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 drivers/dma/edma.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index aa4989f..1eda5cc 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -159,9 +159,6 @@ static void edma_execute(struct edma_chan *echan)
 		/* Link to the previous slot if not the last set */
 		if (i != (total_process - 1))
 			edma_link(echan->slot[i], echan->slot[i+1]);
-		/* Final pset links to the dummy pset */
-		else
-			edma_link(echan->slot[i], echan->ecc->dummy_slot);
 	}
 
 	edesc->total_processed += total_process;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 6/9] dma: edma: Detect null slot errors and handle them correctly
  2013-07-29 13:29 ` Joel Fernandes
  (?)
@ 2013-07-29 13:29   ` Joel Fernandes
  -1 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-29 13:29 UTC (permalink / raw)
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List, Joel Fernandes

For crypto IP, we continue to receive events continuously even in the
NULL slot, and request lines don't get de-asserted, unlike omap_hsmmc.
Due to this, we continuously receive error interrupts when we manually
trigger an event.

We fix this by first detecting whether the channel is currently
transferring from a NULL slot or not; that's where the edma_read_slot
in the error callback from the interrupt handler comes in.

Second, if we detect that we are on a NULL slot, we don't forcefully
trigger, as this will only result in more error conditions. Instead we
set a missed flag and allow the manual triggering to happen in
edma_execute, which will eventually be called. This fixes the issue
where we are on a NULL slot and continue to receive events from
modules like crypto that don't stop their request events after a
transfer is completed.
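
The two-step handling above can be sketched as a small state machine.
This is a simplified model and not the driver: the toy_* names are
made up here, "triggers" merely counts how often a manual trigger
would be issued, and the NULL test mirrors the a_b_cnt/ccnt check used
in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The two PaRAM fields that the NULL check looks at. */
struct toy_pset {
	uint32_t a_b_cnt;
	uint32_t ccnt;
};

struct toy_chan {
	bool missed;		/* deferred-trigger flag */
	unsigned triggers;	/* manual triggers issued so far */
};

static inline bool toy_pset_is_null(const struct toy_pset *p)
{
	return p->a_b_cnt == 0 && p->ccnt == 0;
}

/* Error interrupt: trigger now only if the slot holds real work. */
static inline void toy_error_irq(struct toy_chan *c,
				 const struct toy_pset *slot0)
{
	assert(c != NULL && slot0 != NULL);
	if (toy_pset_is_null(slot0))
		c->missed = true;	/* defer, triggering here would error again */
	else
		c->triggers++;
}

/* Called once the next batch of slots has been written. */
static inline void toy_execute(struct toy_chan *c)
{
	assert(c != NULL);
	if (c->missed) {
		c->triggers++;		/* slots are valid now, safe to trigger */
		c->missed = false;
	}
}
```

An error seen on a NULL slot therefore produces exactly one trigger,
issued from the execute step rather than from the interrupt path.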

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 drivers/dma/edma.c |   42 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 1eda5cc..c72e8c9 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -70,6 +70,7 @@ struct edma_chan {
 	int				ch_num;
 	bool				alloced;
 	int				slot[EDMA_MAX_SLOTS];
+	int				missed;
 	struct dma_slave_config		cfg;
 };
 
@@ -169,6 +170,18 @@ static void edma_execute(struct edma_chan *echan)
 		dev_dbg(dev, "first transfer starting %d\n", echan->ch_num);
 		edma_start(echan->ch_num);
 	}
+
+	/* This happens due to setup times between intermediate transfers
+	   in long SG lists which have to be broken up into transfers of
+	   MAX_NR_SG */
+	if (echan->missed) {
+		dev_dbg(dev, "missed event in execute detected\n");
+		edma_clean_channel(echan->ch_num);
+		edma_stop(echan->ch_num);
+		edma_start(echan->ch_num);
+		edma_manual_trigger(echan->ch_num);
+		echan->missed = 0;
+	}
 }
 
 static int edma_terminate_all(struct edma_chan *echan)
@@ -387,6 +400,7 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data)
 	struct device *dev = echan->vchan.chan.device->dev;
 	struct edma_desc *edesc;
 	unsigned long flags;
+	struct edmacc_param p;
 
 	/* Pause the channel */
 	edma_pause(echan->ch_num);
@@ -414,15 +428,35 @@ static void edma_callback(unsigned ch_num, u16 ch_status, void *data)
 
 		break;
 	case DMA_CC_ERROR:
-		if (echan->edesc) {
-			dev_dbg(dev, "Missed event on %d, retrying\n",
-				ch_num);
+		spin_lock_irqsave(&echan->vchan.lock, flags);
+
+		edma_read_slot(EDMA_CHAN_SLOT(echan->slot[0]), &p);
+
+		if (p.a_b_cnt == 0 && p.ccnt == 0) {
+			dev_dbg(dev, "Error occurred, looks like slot is null, just setting miss\n");
+			/*
+			   Issue later based on missed flag which will be sure
+			   to happen as:
+			   (1) we finished transmitting an intermediate slot and
+			       edma_execute is coming up.
+			   (2) or we finished current transfer and issue will
+			       call edma_execute.
+
+			   Important note: issuing can be dangerous here and
+			   lead to some nasty recursion as this is a NULL slot
+			   at this point.
+			*/
+			echan->missed = 1;
+		} else {
+			dev_dbg(dev, "Error occurred but slot is non-null, TRIGGERING\n");
 			edma_clean_channel(echan->ch_num);
 			edma_stop(echan->ch_num);
 			edma_start(echan->ch_num);
 			edma_manual_trigger(echan->ch_num);
 		}
-		dev_dbg(dev, "handled error on channel %d\n", ch_num);
+
+		spin_unlock_irqrestore(&echan->vchan.lock, flags);
+
 		break;
 	default:
 		break;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 7/9] ARM: edma: Don't clear EMR of channel in edma_stop
  2013-07-29 13:29 ` Joel Fernandes
  (?)
@ 2013-07-29 13:29   ` Joel Fernandes
  -1 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-29 13:29 UTC (permalink / raw)
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List, Joel Fernandes

We certainly don't want error conditions to be cleared anywhere
as this will make us 'forget' about missed events. We depend on
knowing which events were missed in order to be able to reissue them.

This fixes a race condition where the EMR was being cleared
by the transfer completion interrupt handler.

Basically, what was happening was:

            Missed event
             |
             |
             V
SG1-SG2-SG3-Null
         \
          \__TC Interrupt (At almost the same time as the ARM is executing
the TC interrupt handler, an event got missed and was then forgotten
when the EMR was cleared).

The EMR is ultimately being cleared by the Error interrupt
handler once it is handled so we don't have to do it in edma_stop.

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 arch/arm/common/edma.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
index 10995b2..dec772e 100644
--- a/arch/arm/common/edma.c
+++ b/arch/arm/common/edma.c
@@ -1328,7 +1328,6 @@ void edma_stop(unsigned channel)
 		edma_shadow0_write_array(ctlr, SH_EECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_ECR, j, mask);
 		edma_shadow0_write_array(ctlr, SH_SECR, j, mask);
-		edma_write_array(ctlr, EDMA_EMCR, j, mask);
 
 		pr_debug("EDMA: EER%d %08x\n", j,
 				edma_shadow0_read_array(ctlr, SH_EER, j));
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 8/9] dma: edma: Link to dummy slot only for last SG list split
  2013-07-29 13:29 ` Joel Fernandes
  (?)
@ 2013-07-29 13:29   ` Joel Fernandes
  -1 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-29 13:29 UTC (permalink / raw)
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List, Joel Fernandes

Consider the case where we have a scatter-list like:
SG1->SG2->SG3->SG4->SG5->SG6->Null

For example, with a MAX_NR_SG of 2, we earlier split this as:
SG1->SG2->Null
SG3->SG4->Null
SG5->SG6->Null

Now we split it as:
SG1->SG2->Null
SG3->SG4->Null
SG5->SG6->Dummy

This approach results in fewer unwanted interrupts for the last
list split. Unlike the Null slot, the Dummy slot does not raise
an error condition when events are missed. We are OK with this
as we are done processing the whole list once we reach Dummy.

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 drivers/dma/edma.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index c72e8c9..0d3ebde 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -164,6 +164,12 @@ static void edma_execute(struct edma_chan *echan)
 
 	edesc->total_processed += total_process;
 
+	/* If this is the last set in a set of SG-list transactions,
+	   then set up a link to the dummy slot. This results in all future
+	   events being absorbed, and that's OK because we're done. */
+	if (edesc->total_processed == edesc->pset_nr)
+		edma_link(echan->slot[total_process-1], echan->ecc->dummy_slot);
+
 	edma_resume(echan->ch_num);
 
 	if (edesc->total_processed <= MAX_NR_SG) {
-- 
1.7.9.5
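
The split described in the commit message can be sketched as a small standalone helper. This is an illustration only: `struct batch`, `enum terminator` and `next_batch()` are hypothetical names, and the helper just computes which SG entries go into the next batch and whether that batch should end in Null or Dummy.

```c
#include <stddef.h>

/* What the last slot of a batch links to (hypothetical names). */
enum terminator { LINK_NULL, LINK_DUMMY };

struct batch {
	size_t first;		/* index of the first SG entry in this batch */
	size_t count;		/* number of SG entries in this batch */
	enum terminator end;	/* link target of the batch's last slot */
};

/* Describe the batch that starts once 'processed' entries are done.
 * Returns 1 and fills 'out', or 0 when nothing is left to transfer. */
static int next_batch(size_t total, size_t processed, size_t max_nr_sg,
		      struct batch *out)
{
	size_t left;

	if (processed >= total || max_nr_sg == 0)
		return 0;

	left = total - processed;
	out->first = processed;
	out->count = left < max_nr_sg ? left : max_nr_sg;
	/* Only the batch that completes the list links to Dummy. */
	out->end = (processed + out->count == total) ? LINK_DUMMY : LINK_NULL;
	return 1;
}
```

For six entries and a maximum of two per batch, this yields two batches ending in Null followed by one ending in Dummy, matching the SG1..SG6 example in the commit message.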


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH 9/9] dma: edma: remove limits on number of slots
  2013-07-29 13:29 ` Joel Fernandes
  (?)
@ 2013-07-29 13:29   ` Joel Fernandes
  -1 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-29 13:29 UTC (permalink / raw)
  To: Tony Lindgren, Sekhar Nori, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner
  Cc: Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List, Joel Fernandes

With this series, this check is no longer required and
we shouldn't need to reject drivers DMA'ing more than the
MAX number of slots.

Signed-off-by: Joel Fernandes <joelf@ti.com>
---
 drivers/dma/edma.c |    6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
index 0d3ebde..abf2e87 100644
--- a/drivers/dma/edma.c
+++ b/drivers/dma/edma.c
@@ -285,12 +285,6 @@ static struct dma_async_tx_descriptor *edma_prep_slave_sg(
 		return NULL;
 	}
 
-	if (sg_len > MAX_NR_SG) {
-		dev_err(dev, "Exceeded max SG segments %d > %d\n",
-			sg_len, MAX_NR_SG);
-		return NULL;
-	}
-
 	edesc = kzalloc(sizeof(*edesc) + sg_len *
 		sizeof(edesc->pset[0]), GFP_ATOMIC);
 	if (!edesc) {
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* Re: [PATCH 3/9] ARM: edma: Add function to manually trigger an EDMA channel
@ 2013-07-30  5:18     ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-07-30  5:18 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
> Manual trigger for events missed as a result of splitting a
> scatter gather list and DMA'ing it in batches. Add a helper
> function to trigger a channel in case any such events are missed.
> 
> Signed-off-by: Joel Fernandes <joelf@ti.com>
> ---
>  arch/arm/common/edma.c             |   21 +++++++++++++++++++++
>  include/linux/platform_data/edma.h |    2 ++
>  2 files changed, 23 insertions(+)
> 
> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
> index 3567ba1..10995b2 100644
> --- a/arch/arm/common/edma.c
> +++ b/arch/arm/common/edma.c
> @@ -1236,6 +1236,27 @@ void edma_resume(unsigned channel)
>  }
>  EXPORT_SYMBOL(edma_resume);
>  
> +int edma_manual_trigger(unsigned channel)

edma_trigger_channel() maybe? That brings consistency with
edma_alloc_channel(), edma_free_channel() etc.

> +{
> +	unsigned ctlr;
> +	int j;
> +	unsigned int mask;
> +
> +	ctlr = EDMA_CTLR(channel);
> +	channel = EDMA_CHAN_SLOT(channel);
> +	mask = BIT(channel & 0x1f);
> +
> +	j = channel >> 5;
> +
> +	/* EDMA channels without event association */

Maybe actually check for no-event association before you trigger in
software? You can do that by looking at the unused channel list, no?

> +	edma_shadow0_write_array(ctlr, SH_ESR, j, mask);

edma_shadow0_write_array(ctlr, SH_ESR, channel >> 5, mask) is no less
readable, but I leave it to you.

Thanks,
Sekhar
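
For readers following the `j` versus `channel >> 5` remark, the arithmetic in the quoted function can be sketched on its own. The helper names `chan_reg_index()` and `chan_reg_mask()` are hypothetical; the expressions are the ones in the quoted patch, assuming 32 channels per register.

```c
#include <stdint.h>

/* Index of the 32-bit register in the array that holds this
 * channel's bit (32 channels per register). */
static unsigned int chan_reg_index(unsigned int channel)
{
	return channel >> 5;
}

/* Bit mask for the channel within that register. */
static uint32_t chan_reg_mask(unsigned int channel)
{
	return UINT32_C(1) << (channel & 0x1f);
}
```

Channel 36, for instance, lives in register 1 at bit 4.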


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-07-30  7:05     ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-07-30  7:05 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
> In an effort to move to using Scatter gather lists of any size with
> EDMA as discussed at [1] instead of placing limitations on the driver,
> we work through the limitations of the EDMAC hardware to find missed
> events and issue them.
> 
> The sequence of events that require this are:
> 
> For the scenario where MAX slots for an EDMA channel is 3:
> 
> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
> 
> The above SG list will have to be DMA'd in 2 sets:
> 
> (1) SG1 -> SG2 -> SG3 -> Null
> (2) SG4 -> SG5 -> SG6 -> Null
> 
> After (1) is successfully transferred, the events from the MMC controller
> do not stop coming and are missed by the time we have set up the transfer
> for (2). So here, we catch the events missed as an error condition and
> issue them manually.

Are you sure there won't be any effect of these missed events on the
peripheral side? For example, won't McASP get into an underrun condition
when it encounters a null PaRAM set? Even UART has to transmit at a
particular baud rate, so I guess it cannot wait the way MMC/SD can.

Also, won't this lead to under-utilization of the peripheral bandwidth?
Meaning, MMC/SD is ready with data but cannot transfer because the DMA
is waiting to be set up.

Did you consider a ping-pong scheme with say three PaRAM sets per
channel? That way you can keep a continuous transfer going on from the
peripheral over the complete SG list.

Thanks,
Sekhar
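
One way to picture the ping-pong suggestion is a small ring of PaRAM sets that is refilled on every completion. The following is only a bookkeeping sketch under loud assumptions: `struct ring`, `ring_init()` and `ring_complete_one()` are hypothetical, no hardware is touched, and three sets per channel is taken from the suggestion above rather than from any existing code.

```c
#include <stddef.h>

#define RING_SLOTS 3	/* assumption: three PaRAM sets per channel */

struct ring {
	size_t slot_entry[RING_SLOTS];	/* SG index loaded in each slot */
	size_t next_to_load;		/* next SG index not yet in a slot */
	size_t total;			/* entries in the whole SG list */
	size_t active;			/* slot the transfer is currently on */
	size_t in_flight;		/* slots that still hold work */
};

/* Preload as many slots as the list (or the ring) allows. */
static void ring_init(struct ring *r, size_t total)
{
	r->total = total;
	r->next_to_load = 0;
	r->active = 0;
	r->in_flight = 0;
	while (r->in_flight < RING_SLOTS && r->next_to_load < total)
		r->slot_entry[r->in_flight++] = r->next_to_load++;
}

/* Called once per completion while in_flight > 0. Returns the SG
 * index that just finished and reloads the freed slot, so the chain
 * ahead of the transfer stays populated until the list runs out. */
static size_t ring_complete_one(struct ring *r)
{
	size_t done = r->slot_entry[r->active];

	if (r->next_to_load < r->total)
		r->slot_entry[r->active] = r->next_to_load++;
	else
		r->in_flight--;
	r->active = (r->active + 1) % RING_SLOTS;
	return done;
}
```

In this model the entries complete in list order and a slot is always queued behind the active one until the list is exhausted, which is the property that would avoid stalling on a Null slot between batches.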


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 7/9] ARM: edma: Don't clear EMR of channel in edma_stop
@ 2013-07-30  8:29     ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-07-30  8:29 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
> We certainly don't want error conditions to be cleared anywhere

'anywhere' is a really loaded term.

> as this will make us 'forget' about missed events. We depend on
> knowing which events were missed in order to be able to reissue them.

> This fixes a race condition where the EMR was being cleared
> by the transfer completion interrupt handler.
> 
> Basically, what was happening was:
> 
>             Missed event
>              |
>              |
>              V
> SG1-SG2-SG3-Null
>          \
>           \__TC Interrupt (At almost the same time as the ARM is executing
> the TC interrupt handler, an event got missed and was then forgotten
> when the EMR was cleared).

Sorry, but I don't see how edma_stop() comes into the picture in the
race you describe.

> The EMR is ultimately being cleared by the Error interrupt
> handler once it is handled so we don't have to do it in edma_stop.

This, I agree with. edma_clean_channel() is also there to re-initialize
the channel, so doing it in edma_stop() certainly seems superfluous.

Thanks,
Sekhar

> 
> Signed-off-by: Joel Fernandes <joelf@ti.com>
> ---
>  arch/arm/common/edma.c |    1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
> index 10995b2..dec772e 100644
> --- a/arch/arm/common/edma.c
> +++ b/arch/arm/common/edma.c
> @@ -1328,7 +1328,6 @@ void edma_stop(unsigned channel)
>  		edma_shadow0_write_array(ctlr, SH_EECR, j, mask);
>  		edma_shadow0_write_array(ctlr, SH_ECR, j, mask);
>  		edma_shadow0_write_array(ctlr, SH_SECR, j, mask);
> -		edma_write_array(ctlr, EDMA_EMCR, j, mask);
>  
>  		pr_debug("EDMA: EER%d %08x\n", j,
>  				edma_shadow0_read_array(ctlr, SH_EER, j));
> 


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 3/9] ARM: edma: Add function to manually trigger an EDMA channel
@ 2013-07-31  4:30       ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-31  4:30 UTC (permalink / raw)
  To: Sekhar Nori
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On 07/30/2013 12:18 AM, Sekhar Nori wrote:
> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>> Manual trigger for events missed as a result of splitting a
>> scatter gather list and DMA'ing it in batches. Add a helper
>> function to trigger a channel incase any such events are missed.
>>
>> Signed-off-by: Joel Fernandes <joelf@ti.com>
>> ---
>>  arch/arm/common/edma.c             |   21 +++++++++++++++++++++
>>  include/linux/platform_data/edma.h |    2 ++
>>  2 files changed, 23 insertions(+)
>>
>> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
>> index 3567ba1..10995b2 100644
>> --- a/arch/arm/common/edma.c
>> +++ b/arch/arm/common/edma.c
>> @@ -1236,6 +1236,27 @@ void edma_resume(unsigned channel)
>>  }
>>  EXPORT_SYMBOL(edma_resume);
>>  
>> +int edma_manual_trigger(unsigned channel)
> 
> edma_trigger_channel() maybe? Brings consistency with
> edma_alloc_channel() edma_free_channel() etc.

Ok, sure.

> 
>> +{
>> +	unsigned ctlr;
>> +	int j;
>> +	unsigned int mask;
>> +
>> +	ctlr = EDMA_CTLR(channel);
>> +	channel = EDMA_CHAN_SLOT(channel);
>> +	mask = BIT(channel & 0x1f);
>> +
>> +	j = channel >> 5;
>> +
>> +	/* EDMA channels without event association */
> 
> May be actually check for no-event association before you trigger in
> software? You can do that by looking at unused channel list, no?

But we want this function to trigger the channel whether or not there is
an event association. For example, MMC has an event associated, but this
function is still used to trigger an event for it.

> 
>> +	edma_shadow0_write_array(ctlr, SH_ESR, j, mask);
> 
> edma_shadow0_write_array(ctlr, SH_ESR, channel >> 5, mask) is no less
> readable, but I leave it to you.

Sure, that's more readable; will change it to that.
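
For reference, the index arithmetic in the quoted function can be sketched
in plain C outside the kernel. This is only an illustrative model: the
helper names chan_word(), chan_mask() and model_trigger() are hypothetical,
the 64-channel limit is an assumption of the sketch, and the array stands
in for a set-by-writing-1 register such as ESR rather than real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the quoted patch does this arithmetic inline. */

/* Index of the 32-bit register word that holds this channel's bit. */
static unsigned int chan_word(unsigned int channel)
{
	return channel >> 5;			/* channel / 32 */
}

/* Bit mask for this channel within that word. */
static uint32_t chan_mask(unsigned int channel)
{
	return UINT32_C(1) << (channel & 0x1f);	/* channel % 32 */
}

/*
 * Model of a register array where writing 1 requests the event for a
 * channel. Assumes at most 64 channels (two words); that limit is an
 * assumption of this sketch, not something stated in the thread.
 */
static void model_trigger(uint32_t esr[2], unsigned int channel)
{
	assert(channel < 64);
	esr[chan_word(channel)] |= chan_mask(channel);
}
```

With this model, channel 35 lands in word 1, bit 3, which is the same
split that "channel >> 5" and "BIT(channel & 0x1f)" produce in the patch.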

Thanks,

-Joel


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-07-31  4:49       ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-31  4:49 UTC (permalink / raw)
  To: Sekhar Nori
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

Hi Sekhar,

On 07/30/2013 02:05 AM, Sekhar Nori wrote:
> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>> In an effort to move to using Scatter gather lists of any size with
>> EDMA as discussed at [1] instead of placing limitations on the driver,
>> we work through the limitations of the EDMAC hardware to find missed
>> events and issue them.
>>
>> The sequence of events that require this are:
>>
>> For the scenario where MAX slots for an EDMA channel is 3:
>>
>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>
>> The above SG list will have to be DMA'd in 2 sets:
>>
>> (1) SG1 -> SG2 -> SG3 -> Null
>> (2) SG4 -> SG5 -> SG6 -> Null
>>
>> After (1) is succesfully transferred, the events from the MMC controller
>> donot stop coming and are missed by the time we have setup the transfer
>> for (2). So here, we catch the events missed as an error condition and
>> issue them manually.
> 
> Are you sure there wont be any effect of these missed events on the
> peripheral side. For example, wont McASP get into an underrun condition
> when it encounters a null PaRAM set? Even UART has to transmit to a

But it will not encounter a null PaRAM set, because McASP uses contiguous
buffers for transfer, which are not scattered across physical memory.
This can be accomplished with an SG list of size 1. For such lists, this
patch series leaves it linked to Dummy and does not link to the Null set.
The Null set is only used for SG lists that are larger than MAX_NR_SG,
such as those created by MMC and Crypto.

> particular baud so I guess it cannot wait like the way MMC/SD can.

Existing drivers have to wait anyway if they hit the MAX SG limit today.
If they don't want to wait, they would have allocated a contiguous block
of memory and DMA'd that in one stretch so they don't lose any events, and
in such cases we are not linking to Null.

> Also, wont this lead to under-utilization of the peripheral bandwith?
> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
> is waiting to be set-up.

But it is waiting anyway even today. Currently, based on max_segs, the MMC
driver/subsystem builds SG lists of size max_segs. If the MMC controller
sends events between the sessions that create these smaller SG lists,
those events are lost anyway.

With this patch series, we simply accept a bigger list than this and
handle all the max_segs splitting within the EDMA driver itself, without
the outside world knowing. This is actually more efficient because, for
long transfers, we are not going back and forth as much between the
client and the EDMA driver.
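
To make the splitting concrete, here is a minimal sketch of the batch
arithmetic only, written as standalone C rather than driver code. The
names sg_batches() and sg_batch_len() are hypothetical and do not appear
in the series; the sketch assumes each batch uses up to max_slots entries
and that the last batch takes whatever remains.

```c
#include <assert.h>
#include <stddef.h>

/* Number of batches needed to DMA nents SG entries, max_slots at a time. */
static size_t sg_batches(size_t nents, size_t max_slots)
{
	assert(max_slots > 0);
	return (nents + max_slots - 1) / max_slots;	/* ceiling division */
}

/* Number of SG entries in batch idx (0-based); 0 if idx is past the end. */
static size_t sg_batch_len(size_t nents, size_t max_slots, size_t idx)
{
	size_t start;

	assert(max_slots > 0);
	start = idx * max_slots;
	if (start >= nents)
		return 0;
	return (nents - start < max_slots) ? nents - start : max_slots;
}
```

For the example from the patch description (6 entries, MAX slots of 3),
this gives 2 batches of 3 entries each. With MAX slots of 1, every entry
becomes its own batch, which matches the 1-slot test mentioned in the
cover letter.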

> Did you consider a ping-pong scheme with say three PaRAM sets per
> channel? That way you can keep a continuous transfer going on from the
> peripheral over the complete SG list.

Do you mean the ping-pong scheme used in the davinci-pcm driver today?
That can be used only for buffers that are contiguous in memory, not for
those that are scattered across memory.

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 7/9] ARM: edma: Don't clear EMR of channel in edma_stop
@ 2013-07-31  5:05       ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-07-31  5:05 UTC (permalink / raw)
  To: Sekhar Nori
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On 07/30/2013 03:29 AM, Sekhar Nori wrote:
> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>> We certainly don't want error conditions to be cleared anywhere
> 
> 'anywhere' is a really loaded term.
> 
>> as this will make us 'forget' about missed events. We depend on
>> knowing which events were missed in order to be able to reissue them.
> 
>> This fixes a race condition where the EMR was being cleared
>> by the transfer completion interrupt handler.
>>
>> Basically, what was happening was:
>>
>>             Missed event
>>              |
>>              |
>>              V
>> SG1-SG2-SG3-Null
>>          \
>>           \__TC Interrupt (Almost same time as ARM is executing
>> TC interrupt handler, an event got missed and also forgotten
>> by clearing the EMR).
> 
> Sorry, but I dont see how edma_stop() is coming into picture in the race
> you describe?

In the edma_callback function, for the DMA_COMPLETE case (transfer
completion interrupt), edma_stop() is called when all sets have been
processed. This had the effect of clearing the EMR.

This has 2 problems:

1.
The error interrupt may also be pending when the TC interrupt handler
clears the EMR.

The ARM then executes the error interrupt handler even though the EMR
is clear. As a result, the following if condition in dma_ccerr_handler
is true and IRQ_NONE is returned.

        if ((edma_read_array(ctlr, EDMA_EMR, 0) == 0) &&
            (edma_read_array(ctlr, EDMA_EMR, 1) == 0) &&
            (edma_read(ctlr, EDMA_QEMR) == 0) &&
            (edma_read(ctlr, EDMA_CCERR) == 0))
                return IRQ_NONE;

If this happens enough times, the IRQ subsystem disables the interrupt,
thinking it's spurious, which creates serious problems.

2.
Even if the above if condition were removed, the EMR is 0, so the
callback function would not be called in dma_ccerr_handler. The event is
thus forgotten: it is never triggered manually and the missed flag of the
channel is never set.

So, about the race: the TC interrupt handler executing before the error
interrupt handler can clear the EMR, which creates these problems.
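
The ordering problem can be modelled in a few lines of standalone C. This
is a simplified sketch, not the driver: all model_* names are
hypothetical, the EMR is reduced to a single word, and the QEMR and CCERR
checks of the real handler are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum model_irqreturn { MODEL_IRQ_NONE, MODEL_IRQ_HANDLED };

struct model_edma {
	uint32_t emr;		/* event missed register (one word only) */
	uint32_t missed_seen;	/* missed events the error path observed */
};

/* TC completion path; stop_clears_emr selects the old edma_stop(). */
static void model_tc_handler(struct model_edma *e, uint32_t mask,
			     bool stop_clears_emr)
{
	if (stop_clears_emr)
		e->emr &= ~mask;	/* models the removed EMCR write */
}

/* Error path: bails out as spurious when nothing is flagged. */
static enum model_irqreturn model_err_handler(struct model_edma *e)
{
	if (e->emr == 0)
		return MODEL_IRQ_NONE;
	e->missed_seen |= e->emr;	/* report missed events */
	e->emr = 0;			/* cleared only after handling */
	return MODEL_IRQ_HANDLED;
}

/* One missed event on mask, with the TC handler running first. */
static enum model_irqreturn model_race(struct model_edma *e, uint32_t mask,
				       bool stop_clears_emr)
{
	e->emr |= mask;			/* hardware flags a missed event */
	model_tc_handler(e, mask, stop_clears_emr);
	return model_err_handler(e);
}
```

With stop_clears_emr set, the error path returns MODEL_IRQ_NONE and never
records the missed event, which corresponds to problems 1 and 2. With it
cleared, the error path sees the event and clears the EMR itself.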

>> The EMR is ultimately being cleared by the Error interrupt
>> handler once it is handled so we don't have to do it in edma_stop.
> 
> This, I agree with. edma_clean_channel() also there to re-initialize the
> channel so doing it in edma_stop() certainly seems superfluous.

Sure.

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 3/9] ARM: edma: Add function to manually trigger an EDMA channel
@ 2013-07-31  5:23         ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-07-31  5:23 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On Wednesday 31 July 2013 10:00 AM, Joel Fernandes wrote:
> On 07/30/2013 12:18 AM, Sekhar Nori wrote:
>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>> Manual trigger for events missed as a result of splitting a
>>> scatter gather list and DMA'ing it in batches. Add a helper
>>> function to trigger a channel incase any such events are missed.
>>>
>>> Signed-off-by: Joel Fernandes <joelf@ti.com>
>>> ---
>>>  arch/arm/common/edma.c             |   21 +++++++++++++++++++++
>>>  include/linux/platform_data/edma.h |    2 ++
>>>  2 files changed, 23 insertions(+)
>>>
>>> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
>>> index 3567ba1..10995b2 100644
>>> --- a/arch/arm/common/edma.c
>>> +++ b/arch/arm/common/edma.c
>>> @@ -1236,6 +1236,27 @@ void edma_resume(unsigned channel)
>>>  }
>>>  EXPORT_SYMBOL(edma_resume);
>>>  
>>> +int edma_manual_trigger(unsigned channel)
>>
>> edma_trigger_channel() maybe? Brings consistency with
>> edma_alloc_channel() edma_free_channel() etc.
> 
> Ok, sure.
> 
>>
>>> +{
>>> +	unsigned ctlr;
>>> +	int j;
>>> +	unsigned int mask;
>>> +
>>> +	ctlr = EDMA_CTLR(channel);
>>> +	channel = EDMA_CHAN_SLOT(channel);
>>> +	mask = BIT(channel & 0x1f);
>>> +
>>> +	j = channel >> 5;
>>> +
>>> +	/* EDMA channels without event association */
>>
>> May be actually check for no-event association before you trigger in
>> software? You can do that by looking at unused channel list, no?
> 
> But, we want to trigger whether there is event association or not in
> this function. For ex, MMC has event associated but still this function
> is used to trigger event for it.

Okay, just drop the misleading comment then.

Regards,
Sekhar

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 3/9] ARM: edma: Add function to manually trigger an EDMA channel
@ 2013-07-31  5:23         ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-07-31  5:23 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Mark Brown, Tony Lindgren, Grant Likely, Sricharan R,
	Russell King, Vinod Koul, Lokesh Vutla, Chris Ball,
	Arnd Bergmann, Rajendra Nayak, Rob Herring, Jason Kridner,
	Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Balaji TK, Linux MMC List,
	Linux Kernel Mailing List, Santosh Shilimkar

On Wednesday 31 July 2013 10:00 AM, Joel Fernandes wrote:
> On 07/30/2013 12:18 AM, Sekhar Nori wrote:
>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>> Manual trigger for events missed as a result of splitting a
>>> scatter gather list and DMA'ing it in batches. Add a helper
>>> function to trigger a channel incase any such events are missed.
>>>
>>> Signed-off-by: Joel Fernandes <joelf-l0cyMroinI0@public.gmane.org>
>>> ---
>>>  arch/arm/common/edma.c             |   21 +++++++++++++++++++++
>>>  include/linux/platform_data/edma.h |    2 ++
>>>  2 files changed, 23 insertions(+)
>>>
>>> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
>>> index 3567ba1..10995b2 100644
>>> --- a/arch/arm/common/edma.c
>>> +++ b/arch/arm/common/edma.c
>>> @@ -1236,6 +1236,27 @@ void edma_resume(unsigned channel)
>>>  }
>>>  EXPORT_SYMBOL(edma_resume);
>>>  
>>> +int edma_manual_trigger(unsigned channel)
>>
>> edma_trigger_channel() maybe? Brings consistency with
>> edma_alloc_channel() edma_free_channel() etc.
> 
> Ok, sure.
> 
>>
>>> +{
>>> +	unsigned ctlr;
>>> +	int j;
>>> +	unsigned int mask;
>>> +
>>> +	ctlr = EDMA_CTLR(channel);
>>> +	channel = EDMA_CHAN_SLOT(channel);
>>> +	mask = BIT(channel & 0x1f);
>>> +
>>> +	j = channel >> 5;
>>> +
>>> +	/* EDMA channels without event association */
>>
>> May be actually check for no-event association before you trigger in
>> software? You can do that by looking at unused channel list, no?
> 
> But, we want to trigger whether there is event association or not in
> this function. For ex, MMC has event associated but still this function
> is used to trigger event for it.

Okay, just drop the misleading comment then.
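
For reference, the index arithmetic in the quoted hunk can be looked at in
isolation. This is only a standalone sketch, not the driver code: the
helper names and ASSUMED_NR_CHANNELS are hypothetical, and it assumes
32-bit event registers, so the low five bits of the channel number pick
the bit and the remaining bits pick the register.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch only: two 32-bit registers cover all channels. */
#define ASSUMED_NR_CHANNELS 64u

/* Which 32-bit register holds this channel's bit (j = channel >> 5). */
static unsigned int shadow_reg_index(unsigned int channel)
{
	assert(channel < ASSUMED_NR_CHANNELS);
	return channel >> 5;
}

/* Bit mask inside that register (mask = BIT(channel & 0x1f)). */
static uint32_t shadow_reg_mask(unsigned int channel)
{
	return UINT32_C(1) << (channel & 0x1f);
}
```

Nothing in this arithmetic depends on whether the channel has an event
associated with it, which is why the comment in the hunk is misleading.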

Regards,
Sekhar

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 3/9] ARM: edma: Add function to manually trigger an EDMA channel
  2013-07-31  5:23         ` Sekhar Nori
@ 2013-07-31  5:34             ` Fernandes, Joel
  -1 siblings, 0 replies; 89+ messages in thread
From: Fernandes, Joel @ 2013-07-31  5:34 UTC (permalink / raw)
  To: Nori, Sekhar
  Cc: Mark Brown, Tony Lindgren, Grant Likely, R,  Sricharan,
	Russell King, Vinod Koul, Vutla,  Lokesh, Chris Ball,
	Arnd Bergmann, Nayak,  Rajendra, Rob Herring, Jason Kridner,
	Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Krishnamoorthy, Balaji T,
	Linux MMC List, Linux Kernel Mailing List

On Jul 31, 2013, at 12:23 AM, "Nori, Sekhar" <nsekhar-l0cyMroinI0@public.gmane.org> wrote:

> On Wednesday 31 July 2013 10:00 AM, Joel Fernandes wrote:
>> On 07/30/2013 12:18 AM, Sekhar Nori wrote:
>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>> Manual trigger for events missed as a result of splitting a
>>>> scatter gather list and DMA'ing it in batches. Add a helper
>>>> function to trigger a channel in case any such events are missed.
>>>> 
>>>> Signed-off-by: Joel Fernandes <joelf-l0cyMroinI0@public.gmane.org>
>>>> ---
>>>> arch/arm/common/edma.c             |   21 +++++++++++++++++++++
>>>> include/linux/platform_data/edma.h |    2 ++
>>>> 2 files changed, 23 insertions(+)
>>>> 
>>>> diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
>>>> index 3567ba1..10995b2 100644
>>>> --- a/arch/arm/common/edma.c
>>>> +++ b/arch/arm/common/edma.c
>>>> @@ -1236,6 +1236,27 @@ void edma_resume(unsigned channel)
>>>> }
>>>> EXPORT_SYMBOL(edma_resume);
>>>> 
>>>> +int edma_manual_trigger(unsigned channel)
>>> 
>>> edma_trigger_channel() maybe? Brings consistency with
>>> edma_alloc_channel() edma_free_channel() etc.
>> 
>> Ok, sure.
>> 
>>> 
>>>> +{
>>>> +    unsigned ctlr;
>>>> +    int j;
>>>> +    unsigned int mask;
>>>> +
>>>> +    ctlr = EDMA_CTLR(channel);
>>>> +    channel = EDMA_CHAN_SLOT(channel);
>>>> +    mask = BIT(channel & 0x1f);
>>>> +
>>>> +    j = channel >> 5;
>>>> +
>>>> +    /* EDMA channels without event association */
>>> 
>>> Maybe actually check for no-event association before you trigger in
>>> software? You can do that by looking at unused channel list, no?
>> 
>> But, we want to trigger whether there is event association or not in
>> this function. For ex, MMC has event associated but still this function
>> is used to trigger event for it.
> 
> Okay, just drop the misleading comment then.

Ok, will do.

Thanks,

-Joel

> 
> Regards,
> Sekhar

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-07-31  9:18         ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-07-31  9:18 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
> Hi Sekhar,
> 
> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>> In an effort to move to using Scatter gather lists of any size with
>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>> we work through the limitations of the EDMAC hardware to find missed
>>> events and issue them.
>>>
>>> The sequence of events that require this are:
>>>
>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>
>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>
>>> The above SG list will have to be DMA'd in 2 sets:
>>>
>>> (1) SG1 -> SG2 -> SG3 -> Null
>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>
>>> After (1) is successfully transferred, the events from the MMC controller
>>> do not stop coming and are missed by the time we have set up the transfer
>>> for (2). So here, we catch the events missed as an error condition and
>>> issue them manually.
>>
>> Are you sure there won't be any effect of these missed events on the
>> peripheral side? For example, won't McASP get into an underrun condition
>> when it encounters a null PaRAM set? Even UART has to transmit to a
> 
> But it will not encounter a null PaRAM set because McASP uses contiguous
> buffers for transfer which are not scattered across physical memory.
> This can be accomplished with an SG of size 1. For such SGs, this patch
> series leaves it linked to Dummy and does not link to the Null set. The
> Null set is only used for SG lists that are > MAX_NR_SG in size such as
> those created for example by MMC and Crypto.
> 
>> particular baud so I guess it cannot wait like the way MMC/SD can.
> 
> Existing drivers have to wait anyway if they hit the MAX SG limit today. If
> they don't want to wait, they would have allocated a contiguous block of
> memory and DMA that in one stretch so they don't lose any events, and in
> such cases we are not linking to Null.

As long as the DMA driver can advertise its MAX SG limit, peripherals can
always work around that by limiting the number of sync events they
generate so that none of the events are missed. With this series, I am
worried that the EDMA driver is advertising that it can handle SG lists of
any length while not making sure that no events are missed while doing so.
This will break the assumptions that driver writers make.

> 
>> Also, won't this lead to under-utilization of the peripheral bandwidth?
>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>> is waiting to be set-up.
> 
> But it is waiting anyway even today. Currently based on MAX segs, MMC
> driver/subsystem will make SG list of size max_segs. Between these
> sessions of creating such smaller SG-lists, if for some reason the MMC
> controller is sending events, these will be lost anyway.

But the MMC/SD driver knows how many events it should generate if it
knows the MAX SG limit. So there should not be any missed events in
current code. And I am not claiming that your solution is making matters
worse. But it's not making it much better either.

> 
> What will happen now with this patch series is we are simply accepting a
> bigger list than this, and handling all the max_segs stuff within the
> EDMA driver itself without outside world knowing. This is actually more
> efficient as for long transfers, we are not going back and forth much
> between the client and EDMA driver.

Agreed, I am not debating that we need to handle SG lists of any length.
The hardware is capable of handling them, and there is no reason the
kernel should not.
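
For concreteness, the splitting being discussed amounts to something like
the following. This is only a sketch of the arithmetic, not code from the
series: sg_nr_batches() and sg_batch_len() are hypothetical names, and
max_slots stands for the per-channel MAX number of slots.

```c
#include <assert.h>

/* Number of batches needed to cover nents SG entries (rounds up). */
static unsigned int sg_nr_batches(unsigned int nents, unsigned int max_slots)
{
	assert(max_slots > 0);
	return (nents + max_slots - 1) / max_slots;
}

/*
 * Number of SG entries programmed in batch 'batch' (0-based).
 * Only the last batch can be shorter than max_slots; batches past
 * the end of the list are empty.
 */
static unsigned int sg_batch_len(unsigned int nents, unsigned int max_slots,
				 unsigned int batch)
{
	unsigned int done;

	assert(max_slots > 0);
	done = batch * max_slots;
	if (done >= nents)
		return 0;
	return (nents - done < max_slots) ? nents - done : max_slots;
}
```

With the 6-entry list and MAX slots of 3 from the patch description this
gives two batches of three entries each. The window between two batches
is where the sync events get missed.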

> 
>> Did you consider a ping-pong scheme with say three PaRAM sets per
>> channel? That way you can keep a continuous transfer going on from the
>> peripheral over the complete SG list.
> 
> Do you mean ping-pong scheme as used in the davinci-pcm driver today?

No. AFAIR, that's a ping-pong between internal RAM and DDR for earlier
audio ports which did not come with a FIFO.

> This can be used only for buffers that are contiguous in memory, not
> those that are scattered across memory.

I was hinting at using the linking facility of EDMA to achieve this.
Each PaRAM set has full 32-bit source and destination pointers so I see
no reason why non-contiguous case cannot be handled.

Let's say you need to transfer SG[0..6] on channel C. Now, the number of
PaRAM sets is typically 4 times the number of channels. In this case we
use one DMA PaRAM set and two Link PaRAM sets per channel. P0 is the DMA
PaRAM set and P1 and P2 are the Link sets.

Initial setup:

SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
 ^      ^      ^
 |      |      |
P0  -> P1  -> P2  -> NULL

P[0..2].TCINTEN = 1, so we get an interrupt after each SG element
completes. On each completion, hardware automatically copies
the linked PaRAM set into the DMA PaRAM set, so after SG0 is transferred
out, the state of hardware is:

SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
 ^       ^
 |       |
P0,1    P2  -> NULL
 |       ^
 |       |
 ---------

SG1 transfer has already started by the time the TC interrupt is
handled. As you can see P1 is now redundant and ready to be recycled. So
in the interrupt handler, software recycles P1. Thus:

SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
 ^      ^      ^
 |      |      |
P0  -> P2  -> P1  -> NULL

Now, on next interrupt, P2 gets copied and thus can get recycled.
Hardware state:

SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
 ^       ^
 |       |
P0,2    P1  -> NULL
 |       ^
 |       |
 ---------

As part of TC completion interrupt handling:

SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
 ^      ^      ^
 |      |      |
P0  -> P1  -> P2  -> NULL

This goes on until the SG list is exhausted. If you use more PaRAM sets,
the interrupt handler gets more time to recycle the PaRAM set. At no point
do we touch P0 as it is always under active transfer. Thus the peripheral
is always kept busy.
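
To make the recycling concrete, here is a rough software model of the
scheme. It is only a sketch under assumptions, not driver code: all names
(chan_sim, hw_link_update, isr_recycle, simulate) are made up, PaRAM sets
are reduced to an SG index plus a link, and the interrupt handler is
assumed to run before the next completion.

```c
#include <assert.h>

#define NR_LINK_SETS 2		/* P1 and P2 in the diagrams */
#define LINK_NULL    (-1)	/* stands in for a link to the NULL set */

struct param_set {
	int sg;		/* SG entry this set describes, -1 if none */
	int link;	/* index of the next set, or LINK_NULL */
};

struct chan_sim {
	struct param_set sets[1 + NR_LINK_SETS];	/* sets[0] is P0 */
	int tail;	/* set currently at the end of the link chain */
	int next_sg;	/* first SG entry not yet loaded into a set */
	int nents;
};

/* Initial setup: load P0, P1, P2 with the first SG entries and chain them. */
static void chan_setup(struct chan_sim *c, int nents)
{
	int i;

	c->nents = nents;
	c->next_sg = 0;
	c->tail = 0;
	for (i = 0; i <= NR_LINK_SETS; i++) {
		c->sets[i].sg = -1;
		c->sets[i].link = LINK_NULL;
	}
	for (i = 0; i <= NR_LINK_SETS && c->next_sg < nents; i++) {
		c->sets[i].sg = c->next_sg++;
		if (i > 0)
			c->sets[i - 1].link = i;
		c->tail = i;
	}
}

/*
 * Models the hardware link update when P0 completes: the linked set is
 * copied into P0. Returns the index of the set that is now redundant,
 * or -1 if P0 was linked to NULL and the channel has gone idle.
 */
static int hw_link_update(struct chan_sim *c)
{
	int l = c->sets[0].link;

	if (l == LINK_NULL) {
		c->sets[0].sg = -1;
		return -1;
	}
	c->sets[0] = c->sets[l];
	return l;
}

/* Interrupt handler: reload the freed set and append it to the chain. */
static void isr_recycle(struct chan_sim *c, int freed)
{
	if (c->next_sg >= c->nents)
		return;		/* nothing left to queue */
	/* P0 is never rewritten, and the freed set is never the tail. */
	assert(freed != 0 && c->tail != 0 && c->tail != freed);
	c->sets[freed].sg = c->next_sg++;
	c->sets[freed].link = LINK_NULL;
	c->sets[c->tail].link = freed;
	c->tail = freed;
}

/* Runs a whole transfer; order[] receives SG indices as P0 transfers them. */
static int simulate(int nents, int *order)
{
	struct chan_sim c;
	int n = 0;

	chan_setup(&c, nents);
	while (c.sets[0].sg >= 0) {
		int freed;

		order[n++] = c.sets[0].sg;
		freed = hw_link_update(&c);
		if (freed >= 0)
			isr_recycle(&c, freed);
	}
	return n;
}
```

In this model the handler only ever writes to a Link set that is not the
one hardware will copy next, which is the property that lets the transfer
continue without a gap.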

Do you see any reason why such a mechanism cannot be implemented?

Thanks,
Sekhar

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 7/9] ARM: edma: Don't clear EMR of channel in edma_stop
@ 2013-07-31  9:35         ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-07-31  9:35 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On Wednesday 31 July 2013 10:35 AM, Joel Fernandes wrote:
> On 07/30/2013 03:29 AM, Sekhar Nori wrote:
>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>> We certainly don't want error conditions to be cleared anywhere
>>
>> 'anywhere' is a really loaded term.
>>
>>> as this will make us 'forget' about missed events. We depend on
>>> knowing which events were missed in order to be able to reissue them.
>>
>>> This fixes a race condition where the EMR was being cleared
>>> by the transfer completion interrupt handler.
>>>
>>> Basically, what was happening was:
>>>
>>>             Missed event
>>>              |
>>>              |
>>>              V
>>> SG1-SG2-SG3-Null
>>>          \
>>>           \__TC Interrupt (Almost same time as ARM is executing
>>> TC interrupt handler, an event got missed and also forgotten
>>> by clearing the EMR).
>>
>> Sorry, but I don't see how edma_stop() comes into the picture in the race
>> you describe?
> 
> In edma_callback function, for the case of DMA_COMPLETE (Transfer
> completion interrupt), edma_stop() is called when all sets have been
> processed. This had the effect of clearing the EMR.

Ah, thanks. I was missing the fact that the race comes into the picture
only when using the DMA engine driver. I guess that should be mentioned
somewhere since it is not immediately obvious.
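
The effect can be shown with a toy model. This is a sketch only and does
not touch real registers: struct toy_edma and the toy_*() helpers are
made-up names, and a single 32-bit word stands in for the EMR.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for one 32-bit Event Missed Register. */
struct toy_edma {
	uint32_t emr;
};

/* Records that an event on channel ch arrived and was missed. */
static void toy_miss_event(struct toy_edma *e, unsigned int ch)
{
	assert(ch < 32);
	e->emr |= UINT32_C(1) << ch;
}

/* Stop as described before the patch: the channel's EMR bit is cleared too. */
static void toy_stop_clearing_emr(struct toy_edma *e, unsigned int ch)
{
	assert(ch < 32);
	e->emr &= ~(UINT32_C(1) << ch);
}

/* Stop as described after the patch: the EMR is left alone. */
static void toy_stop_keeping_emr(struct toy_edma *e, unsigned int ch)
{
	assert(ch < 32);
	(void)e;
}

/* Nonzero if a missed event on ch is still visible and can be reissued. */
static int toy_event_was_missed(const struct toy_edma *e, unsigned int ch)
{
	assert(ch < 32);
	return (int)((e->emr >> ch) & 1u);
}
```

If the event is missed just before the completion handler calls the
clearing variant, toy_event_was_missed() reports 0 afterwards and the
event can no longer be reissued.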

The patch looks good to me. So if you respin just this one with some
updated explanation based on what you wrote below, I will take it.

Thanks,
Sekhar


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 7/9] ARM: edma: Don't clear EMR of channel in edma_stop
@ 2013-08-01  1:59           ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-01  1:59 UTC (permalink / raw)
  To: Sekhar Nori
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On 07/31/2013 04:35 AM, Sekhar Nori wrote:
> On Wednesday 31 July 2013 10:35 AM, Joel Fernandes wrote:
>> On 07/30/2013 03:29 AM, Sekhar Nori wrote:
>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>> We certainly don't want error conditions to be cleared anywhere
>>>
>>> 'anywhere' is a really loaded term.
>>>
>>>> as this will make us 'forget' about missed events. We depend on
>>>> knowing which events were missed in order to be able to reissue them.
>>>
>>>> This fixes a race condition where the EMR was being cleared
>>>> by the transfer completion interrupt handler.
>>>>
>>>> Basically, what was happening was:
>>>>
>>>>             Missed event
>>>>              |
>>>>              |
>>>>              V
>>>> SG1-SG2-SG3-Null
>>>>          \
>>>>           \__TC Interrupt (Almost same time as ARM is executing
>>>> TC interrupt handler, an event got missed and also forgotten
>>>> by clearing the EMR).
>>>
>>> Sorry, but I dont see how edma_stop() is coming into picture in the race
>>> you describe?
>>
>> In edma_callback function, for the case of DMA_COMPLETE (Transfer
>> completion interrupt), edma_stop() is called when all sets have been
>> processed. This had the effect of clearing the EMR.
> 
> Ah, thanks. I was missing the fact that the race comes into picture only
> when using the DMA engine driver. I guess that should be mentioned
> somewhere since it is not immediately obvious.
> 
> The patch looks good to me. So if you respin just this one with some
> updated explanation based on what you wrote below, I will take it.

Sure, I'll do that. Will you also be taking the trigger_channel patch?
I can send these 2 as a series, since they both touch
arch/arm/common/edma.c.
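
For the updated explanation, the race can be modelled roughly as below.
This is only a sketch under stated assumptions, not the code in
arch/arm/common/edma.c: struct emr_model and the model_* helpers are
hypothetical names, and the EMR is reduced to a single 32-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not the code in arch/arm/common/edma.c. */
struct emr_model {
	uint32_t emr;	/* bit n set: an event on channel n was missed */
};

/* Hardware side: an event arrives while the channel cannot service it. */
static void model_miss_event(struct emr_model *m, unsigned int ch)
{
	assert(ch < 32);
	m->emr |= UINT32_C(1) << ch;
}

/*
 * Software side: stop the channel from the completion handler.
 * clear_emr == true models the behaviour before this patch.
 */
static void model_stop(struct emr_model *m, unsigned int ch, bool clear_emr)
{
	assert(ch < 32);
	if (clear_emr)
		m->emr &= ~(UINT32_C(1) << ch);
}

/* Error handling can only reissue an event it can still see. */
static bool model_can_reissue(const struct emr_model *m, unsigned int ch)
{
	assert(ch < 32);
	return (m->emr >> ch) & 1u;
}
```

If an event is missed and the stop path then clears the bit, the missed
event can no longer be found; if the stop path leaves the bit alone, it
is still there to be reissued.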

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-01  2:27           ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-01  2:27 UTC (permalink / raw)
  To: Sekhar Nori
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On 07/31/2013 04:18 AM, Sekhar Nori wrote:
> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>> Hi Sekhar,
>>
>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>> In an effort to move to using Scatter gather lists of any size with
>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>> we work through the limitations of the EDMAC hardware to find missed
>>>> events and issue them.
>>>>
>>>> The sequence of events that require this are:
>>>>
>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>
>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>
>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>
>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>
>>>> After (1) is successfully transferred, the events from the MMC controller
>>>> do not stop coming and are missed by the time we have set up the transfer
>>>> for (2). So here, we catch the events missed as an error condition and
>>>> issue them manually.
>>>
>>> Are you sure there won't be any effect of these missed events on the
>>> peripheral side? For example, won't McASP get into an underrun condition
>>> when it encounters a null PaRAM set? Even UART has to transmit to a
>>
>> But it will not encounter null PaRAM set because McASP uses contiguous
>> buffers for transfer which are not scattered across physical memory.
>> This can be accomplished with an SG of size 1. For such SGs, this patch
>> series leaves it linked Dummy and does not link to Null set. Null set is
>> only used for SG lists that are > MAX_NR_SG in size such as those
>> created for example by MMC and Crypto.
>>
>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>
>> Existing drivers have to wait anyway if they hit the MAX SG limit today. If
>> they don't want to wait, they would have allocated a contiguous block of
>> memory and DMA that in one stretch so they don't lose any events, and in
>> such cases we are not linking to Null.
> 
> As long as the DMA driver can advertise its MAX SG limit, peripherals can
> always work around that by limiting the number of sync events they
> generate so that none of the events get missed. With this
> series, I am worried that the EDMA driver is advertising that it can handle
> any length SG list while not taking care to avoid missing events while
> doing so. This will break the assumptions that driver writers make.

This is already being done by some other DMA engine drivers ;). We can
advertise more than we can handle at a time; that's the basis of this
whole idea.

I understand what you're saying, but events are not something that have
to be serviced immediately. They can be queued, and the actual
transfer from the DMA controller can be delayed. As long as we don't
miss the event we are fine, which my series takes care of.
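
As a rough illustration of the idea (a sketch only, not the driver
code), the list is handed to hardware in sets of at most MAX_NR_SG
entries, and an event that was flagged as missed while the next set was
being set up is issued manually. struct chan_model and
model_program_next_set() are hypothetical names, and MAX_NR_SG is
assumed to be 3 as in the quoted example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NR_SG 3	/* slot limit per channel, as in the quoted example */

/* Hypothetical bookkeeping for one channel; not the driver's structure. */
struct chan_model {
	unsigned int sg_len;		/* entries in the whole SG list */
	unsigned int processed;		/* entries already handed to hardware */
	bool missed;			/* event arrived while parked on Null */
	unsigned int manual_triggers;	/* events reissued by software */
};

/*
 * Program the next set of at most MAX_NR_SG entries.
 * Returns the number of entries programmed, 0 once the list is done.
 */
static unsigned int model_program_next_set(struct chan_model *c)
{
	unsigned int left, n;

	assert(c->processed <= c->sg_len);
	left = c->sg_len - c->processed;
	n = left < MAX_NR_SG ? left : MAX_NR_SG;
	c->processed += n;

	/* An event that was flagged as missed is issued by hand, once. */
	if (n > 0 && c->missed) {
		c->missed = false;
		c->manual_triggers++;
	}
	return n;
}
```

With sg_len = 6 this programs two sets of 3; if the missed flag was set
between them, the second call clears it and counts one manual trigger.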

So far I have tested this series on following modules in various
configurations and have seen no issues:
- Crypto AES
- MMC/SD
- SPI (128x160 display)

>>> Also, won't this lead to under-utilization of the peripheral bandwidth?
>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>>> is waiting to be set up.
>>
>> But it is waiting anyway even today. Currently based on MAX segs, MMC
>> driver/subsystem will make SG list of size max_segs. Between these
>> sessions of creating such smaller SG-lists, if for some reason the MMC
>> controller is sending events, these will be lost anyway.
> 
> But the MMC/SD driver knows how many events it should generate if it
> knows the MAX SG limit. So there should not be any missed events in
> current code. And I am not claiming that your solution is making matters
> worse. But it's not making it much better either.

This is not true for crypto: the events are not deasserted and crypto
continues to send events. This is what led to the "don't trigger in
Null" patch, where I'm setting the missed flag to avoid recursion.

>> This can be used only for buffers that are contiguous in memory, not
>> those that are scattered across memory.
> 
> I was hinting at using the linking facility of EDMA to achieve this.
> Each PaRAM set has full 32-bit source and destination pointers so I see
> no reason why non-contiguous case cannot be handled.
> 
> Let's say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
> typically 4 times the number of channels. In this case we use one DMA
> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
> and P1 and P2 are the Link sets.
> 
> Initial setup:
> 
> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^      ^      ^
>  |      |      |
> P0  -> P1  -> P2  -> NULL
> 
> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
> completion. On each completion interrupt, hardware automatically copies
> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
> out, the state of hardware is:
> 
> SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^       ^
>  |       |
> P0,1    P2  -> NULL
>  |       ^
>  |       |
>  ---------
> 
> SG1 transfer has already started by the time the TC interrupt is
> handled. As you can see P1 is now redundant and ready to be recycled. So
> in the interrupt handler, software recycles P1. Thus:
> 
> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^      ^      ^
>  |      |      |
> P0  -> P2  -> P1  -> NULL
> 
> Now, on next interrupt, P2 gets copied and thus can get recycled.
> Hardware state:
> 
> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^       ^
>  |       |
> P0,2    P1  -> NULL
>  |       ^
>  |       |
>  ---------
> 
> As part of TC completion interrupt handling:
> 
> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>  ^      ^      ^
>  |      |      |
> P0  -> P1  -> P2  -> NULL
> 
> This goes on until the SG list is exhausted. If you use more PaRAM sets,
> interrupt handler gets more time to recycle the PaRAM set. At no point
> we touch P0 as it is always under active transfer. Thus the peripheral
> is always kept busy.
> 
> Do you see any reason why such a mechanism cannot be implemented?
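
The scheme quoted above can be modelled roughly as below. This is only a
sketch to make the steps concrete, not an implementation for the EDMA
driver: model_link_recycling() and NR_LINK are hypothetical names, each
PaRAM set is reduced to the SG index it holds, and the interrupt handler
is assumed to always run before the next completion.

```c
#include <assert.h>

#define NR_LINK 2	/* link PaRAM sets per channel, as in the example */

/*
 * Simulate the recycling scheme for an SG list of sg_len entries.
 * order[] receives the SG indices in the order hardware transfers them.
 * Returns the number of transfer-completion interrupts taken.
 */
static unsigned int model_link_recycling(unsigned int sg_len,
					 unsigned int *order)
{
	int link[NR_LINK];	/* SG index held by each link set, -1 = NULL */
	unsigned int head = 0;	/* link set that P0 will load next */
	unsigned int next_sg = 1;	/* next SG entry not yet programmed */
	unsigned int irqs = 0;
	unsigned int i;
	int active = 0;		/* SG index currently in P0 */

	assert(sg_len > 0);

	/* Initial setup: P0 = SG0, link sets hold the following entries. */
	for (i = 0; i < NR_LINK; i++)
		link[i] = next_sg < sg_len ? (int)next_sg++ : -1;

	while (active >= 0) {
		/* Transfer of the active entry completes: one interrupt. */
		order[irqs++] = (unsigned int)active;

		/* Hardware: copy the linked set into P0. */
		active = link[head];

		/* Handler: recycle the set that was just copied. */
		link[head] = next_sg < sg_len ? (int)next_sg++ : -1;
		head = (head + 1) % NR_LINK;
	}
	return irqs;
}
```

For SG[0..6] the model transfers the entries in order and takes one
interrupt per entry.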

This is possible and looks like another way to do it, but there are 2
problems I can see with it.

1. It's inefficient because of too many interrupts:

Imagine a case where we have an SG list of size 30 and MAX_NR_SG is
10. This method will always trigger 30 interrupts, whereas with my
patch series you'd get only 3 interrupts. If you increase MAX_NR_SG,
you'd get even fewer interrupts.

2. If the interrupt handler for some reason doesn't complete or get
serviced in time, we will end up DMA'ing incorrect data, as events
wouldn't stop coming in even if the interrupt is not yet handled (in your
example, linked sets P1 or P2 would be old ones being repeated). Whereas
with my method, we are not doing any DMA once we finish the current
MAX_NR_SG set, even if events continue to come.
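
The interrupt counts in point 1 work out as follows. This is just the
arithmetic, assuming one interrupt per SG entry for the recycling scheme
and one interrupt per set of MAX_NR_SG entries for this series;
irqs_per_entry() and irqs_per_set() are hypothetical helper names.

```c
#include <assert.h>

/* One completion interrupt for every SG entry (recycling scheme). */
static unsigned int irqs_per_entry(unsigned int sg_len)
{
	return sg_len;
}

/* One completion interrupt per set of at most max_nr_sg entries. */
static unsigned int irqs_per_set(unsigned int sg_len, unsigned int max_nr_sg)
{
	assert(max_nr_sg > 0);
	return (sg_len + max_nr_sg - 1) / max_nr_sg;	/* round up */
}
```

For sg_len = 30 and max_nr_sg = 10 this gives 30 versus 3; raising
max_nr_sg to 16 gives 2.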

I feel my patch series is efficient, has fewer LOC because of code reuse,
and has passed all the tests I've run on it.

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-01  3:43             ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-01  3:43 UTC (permalink / raw)
  To: joelf
  Cc: Sekhar Nori, Tony Lindgren, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner,
	Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List

On 07/31/2013 09:27 PM, Joel Fernandes wrote:
> On 07/31/2013 04:18 AM, Sekhar Nori wrote:
>> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>>> Hi Sekhar,
>>>
>>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>>> In an effort to move to using Scatter gather lists of any size with
>>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>>> we work through the limitations of the EDMAC hardware to find missed
>>>>> events and issue them.
>>>>>
>>>>> The sequence of events that require this are:
>>>>>
>>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>>
>>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>>
>>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>>
>>>>> After (1) is succesfully transferred, the events from the MMC controller
>>>>> donot stop coming and are missed by the time we have setup the transfer
>>>>> for (2). So here, we catch the events missed as an error condition and
>>>>> issue them manually.
>>>>
>>>> Are you sure there wont be any effect of these missed events on the
>>>> peripheral side. For example, wont McASP get into an underrun condition
>>>> when it encounters a null PaRAM set? Even UART has to transmit to a
>>>
>>> But it will not encounter null PaRAM set because McASP uses contiguous
>>> buffers for transfer which are not scattered across physical memory.
>>> This can be accomplished with an SG of size 1. For such SGs, this patch
>>> series leaves it linked Dummy and does not link to Null set. Null set is
>>> only used for SG lists that are > MAX_NR_SG in size such as those
>>> created for example by MMC and Crypto.
>>>
>>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>>
>>> Existing driver have to wait anyway if they hit MAX SG limit today. If
>>> they don't want to wait, they would have allocated a contiguous block of
>>> memory and DMA that in one stretch so they don't lose any events, and in
>>> such cases we are not linking to Null.
>>
>> As long as DMA driver can advertize its MAX SG limit, peripherals can
>> always work around that by limiting the number of sync events they
>> generate so as to not having any of the events getting missed. With this
>> series, I am worried that EDMA drivers is advertizing that it can handle
>> any length SG list while not taking care of missing any events while
>> doing so. This will break the assumptions that driver writers make.

Sorry, I forgot to respond to "not taking care of missing any events
while doing so". Can you clarify this? The DMA engine driver is taking
care of missed events.

Also, a missed event doesn't result in any feedback to the peripheral.
The peripheral sends an event to the DMA controller and the event is
missed. The peripheral doesn't know anything about what happened and
keeps waiting for the transfer from the DMA controller.
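
As a rough illustration of what "taking care of missed events" means,
here is a toy model in Python (not driver code). It assumes, as a
simplification, that one peripheral event moves one SG entry, and that
an event arriving while the channel sits on the Null set is latched as
missed and replayed by software once the next batch is programmed.
ToyChannel and its methods are hypothetical names for this sketch only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChannel:
    max_slots: int
    pending: list = field(default_factory=list)      # not yet programmed
    slots: list = field(default_factory=list)        # programmed batch
    transferred: list = field(default_factory=list)  # completed entries
    missed: bool = False

    def submit(self, sg_list):
        self.pending = list(sg_list)
        self._program_next_batch()

    def _program_next_batch(self):
        self.slots = self.pending[:self.max_slots]
        self.pending = self.pending[self.max_slots:]

    def event(self):
        """Peripheral raises a sync event (simplified: one SG entry each)."""
        if self.slots:
            self.transferred.append(self.slots.pop(0))
        else:
            # Channel is on the Null set: nothing moves, event is latched.
            self.missed = True

    def batch_done_irq(self):
        """Completion handler: program next batch, replay a missed event."""
        self._program_next_batch()
        if self.missed and self.slots:
            self.missed = False
            self.event()  # manual trigger standing in for the lost event


def run_demo():
    ch = ToyChannel(max_slots=3)
    ch.submit(["SG1", "SG2", "SG3", "SG4", "SG5", "SG6"])
    for _ in range(3):
        ch.event()        # batch (1) completes
    ch.event()            # arrives before batch (2) is set up: missed
    ch.batch_done_irq()   # batch (2) programmed, missed event replayed
    ch.event()
    ch.event()
    return ch.transferred
```

In this model the peripheral raises six events in total and all six
entries are transferred in order, even though one event arrived while
no batch was programmed.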

Thanks,

-Joel


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-01  4:39             ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-01  4:39 UTC (permalink / raw)
  To: joelf
  Cc: Sekhar Nori, Tony Lindgren, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner,
	Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List

On 07/31/2013 09:27 PM, Joel Fernandes wrote:
> On 07/31/2013 04:18 AM, Sekhar Nori wrote:
>> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>>> Hi Sekhar,
>>>
>>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>>> In an effort to move to using Scatter gather lists of any size with
>>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>>> we work through the limitations of the EDMAC hardware to find missed
>>>>> events and issue them.
>>>>>
>>>>> The sequence of events that require this are:
>>>>>
>>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>>
>>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>>
>>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>>
>>>>> After (1) is succesfully transferred, the events from the MMC controller
>>>>> donot stop coming and are missed by the time we have setup the transfer
>>>>> for (2). So here, we catch the events missed as an error condition and
>>>>> issue them manually.
>>>>
>>>> Are you sure there wont be any effect of these missed events on the
>>>> peripheral side. For example, wont McASP get into an underrun condition
>>>> when it encounters a null PaRAM set? Even UART has to transmit to a
>>>
>>> But it will not encounter null PaRAM set because McASP uses contiguous
>>> buffers for transfer which are not scattered across physical memory.
>>> This can be accomplished with an SG of size 1. For such SGs, this patch
>>> series leaves it linked Dummy and does not link to Null set. Null set is
>>> only used for SG lists that are > MAX_NR_SG in size such as those
>>> created for example by MMC and Crypto.
>>>
>>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>>
>>> Existing driver have to wait anyway if they hit MAX SG limit today. If
>>> they don't want to wait, they would have allocated a contiguous block of
>>> memory and DMA that in one stretch so they don't lose any events, and in
>>> such cases we are not linking to Null.
>>
>> As long as DMA driver can advertize its MAX SG limit, peripherals can
>> always work around that by limiting the number of sync events they
>> generate so as to not having any of the events getting missed. With this
>> series, I am worried that EDMA drivers is advertizing that it can handle
>> any length SG list while not taking care of missing any events while
>> doing so. This will break the assumptions that driver writers make.
> 
> This is already being done by some other DMA engine drivers ;). We can
> advertise more than we can handle at a time, that's the basis of this
> whole idea.
> 
> I understand what you're saying but events are not something that have
> be serviced immediately, they can be queued etc and the actually
> transfer from the DMA controller can be delayed. As long as we don't
> miss the event we are fine which my series takes care off.
> 
> So far I have tested this series on following modules in various
> configurations and have seen no issues:
> - Crypto AES
> - MMC/SD
> - SPI (128x160 display)
> 
>>>> Also, wont this lead to under-utilization of the peripheral bandwith?
>>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>>>> is waiting to be set-up.
>>>
>>> But it is waiting anyway even today. Currently based on MAX segs, MMC
>>> driver/subsystem will make SG list of size max_segs. Between these
>>> sessions of creating such smaller SG-lists, if for some reason the MMC
>>> controller is sending events, these will be lost anyway.
>>
>> But if MMC/SD driver knows how many events it should generate if it
>> knows the MAX SG limit. So there should not be any missed events in
>> current code. And I am not claiming that your solution is making matters
>> worse. But its not making it much better as well.
> 
> This is not true for crypto, the events are not deasserted and crypto
> continues to send events. This is what led to the "don't trigger in
> Null" patch where I'm setting the missed flag to avoid recursion.
> 
>>> This can be used only for buffers that are contiguous in memory, not
>>> those that are scattered across memory.
>>
>> I was hinting at using the linking facility of EDMA to achieve this.
>> Each PaRAM set has full 32-bit source and destination pointers so I see
>> no reason why non-contiguous case cannot be handled.
>>
>> Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
>> typically 4 times the number of channels. In this case we use one DMA
>> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
>> and P1 and P2 are the Link sets.
>>
>> Initial setup:
>>
>> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^      ^      ^
>>  |      |      |
>> P0  -> P1  -> P2  -> NULL
>>
>> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
>> completion. On each completion interrupt, hardware automatically copies
>> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
>> out, the state of hardware is:
>>
>> SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^       ^
>>  |       |
>> P0,1    P2  -> NULL
>>  |       ^
>>  |       |
>>  ---------
>>
>> SG1 transfer has already started by the time the TC interrupt is
>> handled. As you can see P1 is now redundant and ready to be recycled. So
>> in the interrupt handler, software recycles P1. Thus:
>>
>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^      ^      ^
>>  |      |      |
>> P0  -> P2  -> P1  -> NULL
>>
>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>> Hardware state:
>>
>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^       ^
>>  |       |
>> P0,2    P1  -> NULL
>>  |       ^
>>  |       |
>>  ---------
>>
>> As part of TC completion interrupt handling:
>>
>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^      ^      ^
>>  |      |      |
>> P0  -> P1  -> P2  -> NULL
>>
>> This goes on until the SG list in exhausted. If you use more PaRAM sets,
>> interrupt handler gets more time to recycle the PaRAM set. At no point
>> we touch P0 as it is always under active transfer. Thus the peripheral
>> is always kept busy.
>>
>> Do you see any reason why such a mechanism cannot be implemented?
> 
> This is possible and looks like another way to do it, but there are 2
> problems I can see with it.
> 
> 1. Its inefficient because of too many interrupts:
> 
> Imagine case where we have an SG list of size 30 and MAX_NR_SG size is
> 10. This method will trigger 30 interrupts always, where as with my
> patch series, you'd get only 3 interrupts. If you increase MAX_SG_NR ,
> you'd get even fewer interrupts.
> 
> 2. If the interrupt handler for some reason doesn't complete or get
> service in time, we will end up DMA'ing incorrect data as events
> wouldn't stop coming in even if interrupt is not yet handled (in your
> example linked sets P1 or P2 would be old ones being repeated). Where as
> with my method, we are not doing any DMA once we finish the current
> MAX_NR_SG set even if events continue to come.
> 

Actually, on second thought, 1. can be tackled by having a list of PaRAM
sets instead of just 1 set for P1, and another list for P2, and
ping-ponging between the P1 and P2 lists, only interrupting in between to
set up one or the other. However, 2. is still a concern.
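
A rough sketch of that ping-pong idea, again in Python and only as a
model: it assumes two banks of PaRAM sets (called P1 and P2 here) of
equal, hypothetical size bank_size, with one interrupt taken each time a
bank has been consumed. The function name and the tuple layout are
invented for illustration.

```python
def ping_pong_schedule(sg_len: int, bank_size: int):
    """Return (bank, first_entry, last_entry) for each bank fill."""
    if bank_size < 1:
        raise ValueError("bank_size must be at least 1")
    schedule = []
    bank = 0
    for start in range(0, sg_len, bank_size):
        end = min(start + bank_size, sg_len) - 1
        schedule.append(("P1" if bank == 0 else "P2", start, end))
        bank ^= 1  # alternate between the two banks
    return schedule


if __name__ == "__main__":
    fills = ping_pong_schedule(30, 10)
    for bank, first, last in fills:
        print(f"{bank}: SG{first}..SG{last}")
    print("interrupts in this model:", len(fills))
```

Under these assumptions the interrupt count drops to one per bank rather
than one per SG entry, which is why only point 2 remains open.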

Still, what you're asking for is a rewrite of quite a bit of the driver,
which I feel is unnecessary at this point, as my patch series is an
alternate method that's been tested and is working.

The only point of concern I think you have with the series is how
peripherals will react if their events are not handled right away. I am
certain that the peripheral doesn't go into an error state, because it
doesn't know that its event was missed and it'd just be waiting. I
haven't dealt with EDMA queuing, but wouldn't this kind of wait be
happening even with such queues? Another note: the waiting is happening
even in today's state of the driver, where we limit by MAX_NR_SG. It's
probably not the same kind of wait as in this series (send an event and
wait), but rather the peripheral just not doing anything.

Let me know if you have a specific case of a peripheral where this could
be a problem and I'll be happy to test it. Thanks.

Regards,

-Joel


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-01  4:39             ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-01  4:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 07/31/2013 09:27 PM, Joel Fernandes wrote:
> On 07/31/2013 04:18 AM, Sekhar Nori wrote:
>> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>>> Hi Sekhar,
>>>
>>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>>> In an effort to move to using Scatter gather lists of any size with
>>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>>> we work through the limitations of the EDMAC hardware to find missed
>>>>> events and issue them.
>>>>>
>>>>> The sequence of events that require this are:
>>>>>
>>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>>
>>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>>
>>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>>
>>>>> After (1) is succesfully transferred, the events from the MMC controller
>>>>> donot stop coming and are missed by the time we have setup the transfer
>>>>> for (2). So here, we catch the events missed as an error condition and
>>>>> issue them manually.
>>>>
>>>> Are you sure there wont be any effect of these missed events on the
>>>> peripheral side. For example, wont McASP get into an underrun condition
>>>> when it encounters a null PaRAM set? Even UART has to transmit to a
>>>
>>> But it will not encounter null PaRAM set because McASP uses contiguous
>>> buffers for transfer which are not scattered across physical memory.
>>> This can be accomplished with an SG of size 1. For such SGs, this patch
>>> series leaves it linked Dummy and does not link to Null set. Null set is
>>> only used for SG lists that are > MAX_NR_SG in size such as those
>>> created for example by MMC and Crypto.
>>>
>>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>>
>>> Existing driver have to wait anyway if they hit MAX SG limit today. If
>>> they don't want to wait, they would have allocated a contiguous block of
>>> memory and DMA that in one stretch so they don't lose any events, and in
>>> such cases we are not linking to Null.
>>
>> As long as DMA driver can advertize its MAX SG limit, peripherals can
>> always work around that by limiting the number of sync events they
>> generate so as to not having any of the events getting missed. With this
>> series, I am worried that EDMA drivers is advertizing that it can handle
>> any length SG list while not taking care of missing any events while
>> doing so. This will break the assumptions that driver writers make.
> 
> This is already being done by some other DMA engine drivers ;). We can
> advertise more than we can handle at a time, that's the basis of this
> whole idea.
> 
> I understand what you're saying but events are not something that have
> be serviced immediately, they can be queued etc and the actually
> transfer from the DMA controller can be delayed. As long as we don't
> miss the event we are fine which my series takes care off.
> 
> So far I have tested this series on following modules in various
> configurations and have seen no issues:
> - Crypto AES
> - MMC/SD
> - SPI (128x160 display)
> 
>>>> Also, wont this lead to under-utilization of the peripheral bandwith?
>>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>>>> is waiting to be set-up.
>>>
>>> But it is waiting anyway even today. Currently based on MAX segs, MMC
>>> driver/subsystem will make SG list of size max_segs. Between these
>>> sessions of creating such smaller SG-lists, if for some reason the MMC
>>> controller is sending events, these will be lost anyway.
>>
>> But if MMC/SD driver knows how many events it should generate if it
>> knows the MAX SG limit. So there should not be any missed events in
>> current code. And I am not claiming that your solution is making matters
>> worse. But its not making it much better as well.
> 
> This is not true for crypto, the events are not deasserted and crypto
> continues to send events. This is what led to the "don't trigger in
> Null" patch where I'm setting the missed flag to avoid recursion.
> 
>>> This can be used only for buffers that are contiguous in memory, not
>>> those that are scattered across memory.
>>
>> I was hinting at using the linking facility of EDMA to achieve this.
>> Each PaRAM set has full 32-bit source and destination pointers so I see
>> no reason why non-contiguous case cannot be handled.
>>
>> Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
>> typically 4 times the number of channels. In this case we use one DMA
>> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
>> and P1 and P2 are the Link sets.
>>
>> Initial setup:
>>
>> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^      ^      ^
>>  |      |      |
>> P0  -> P1  -> P2  -> NULL
>>
>> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
>> completion. On each completion interrupt, hardware automatically copies
>> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
>> out, the state of hardware is:
>>
>> SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^       ^
>>  |       |
>> P0,1    P2  -> NULL
>>  |       ^
>>  |       |
>>  ---------
>>
>> SG1 transfer has already started by the time the TC interrupt is
>> handled. As you can see P1 is now redundant and ready to be recycled. So
>> in the interrupt handler, software recycles P1. Thus:
>>
>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^      ^      ^
>>  |      |      |
>> P0  -> P2  -> P1  -> NULL
>>
>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>> Hardware state:
>>
>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^       ^
>>  |       |
>> P0,2    P1  -> NULL
>>  |       ^
>>  |       |
>>  ---------
>>
>> As part of TC completion interrupt handling:
>>
>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^      ^      ^
>>  |      |      |
>> P0  -> P1  -> P2  -> NULL
>>
>> This goes on until the SG list in exhausted. If you use more PaRAM sets,
>> interrupt handler gets more time to recycle the PaRAM set. At no point
>> we touch P0 as it is always under active transfer. Thus the peripheral
>> is always kept busy.
>>
>> Do you see any reason why such a mechanism cannot be implemented?
> 
> This is possible and looks like another way to do it, but there are 2
> problems I can see with it.
> 
> 1. Its inefficient because of too many interrupts:
> 
> Imagine case where we have an SG list of size 30 and MAX_NR_SG size is
> 10. This method will trigger 30 interrupts always, where as with my
> patch series, you'd get only 3 interrupts. If you increase MAX_SG_NR ,
> you'd get even fewer interrupts.
> 
> 2. If the interrupt handler for some reason doesn't complete or get
> service in time, we will end up DMA'ing incorrect data as events
> wouldn't stop coming in even if interrupt is not yet handled (in your
> example linked sets P1 or P2 would be old ones being repeated). Where as
> with my method, we are not doing any DMA once we finish the current
> MAX_NR_SG set even if events continue to come.
> 

Actually, on second thought, 1. can be tackled by having a list of PaRAM
sets instead of just one set for P1, and another list for P2, and then
ping-ponging between the P1 and P2 lists, interrupting only in between to
set up one or the other. However, 2. is still a concern.
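
To put numbers on point 1, here is a small sketch of the interrupt
arithmetic. It is only a model, not driver code: the function names are
made up for illustration, and it assumes one completion interrupt per SG
element in the per-element scheme and one per batch of at most max_slots
elements in the batched scheme.

```c
#include <stddef.h>

/* Hypothetical helper: one completion interrupt per SG element
 * (TCINTEN set on every PaRAM set). */
size_t irqs_per_element(size_t nr_sg)
{
	return nr_sg;
}

/* Hypothetical helper: one completion interrupt per batch of up to
 * max_slots SG elements, i.e. ceil(nr_sg / max_slots). */
size_t irqs_per_batch(size_t nr_sg, size_t max_slots)
{
	if (max_slots == 0)
		return 0;	/* no slots, nothing can be programmed */
	return (nr_sg + max_slots - 1) / max_slots;
}
```

With nr_sg = 30 and max_slots = 10 this gives 30 versus 3 interrupts. A
ping-pong variant that interrupts once per list would presumably follow
the second formula, with max_slots set to the list length.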

Still, what you're asking for is a rewrite of quite a bit of the driver,
which I feel is unnecessary at this point as my patch series is an
alternate method that's been tested and is working.

The only point of concern I think you have with the series is how
peripherals will react if their events are not handled right away. I am
certain that the peripheral doesn't go into an error state, because it
doesn't know that its event was missed and would just be waiting. I
haven't dealt with EDMA queuing, but wouldn't this kind of wait happen
even with such queues? Another note: the waiting happens even in today's
state of the driver, where we limit by MAX_NR_SG. It is probably not the
same kind of wait as in this series (send event and wait), but rather the
peripheral just not doing anything.

Let me know if you have a specific case of a peripheral where this could
be a problem and I'll be happy to test it. Thanks.

Regards,

-Joel

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-01  6:13             ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-08-01  6:13 UTC (permalink / raw)
  To: joelf
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On Thursday 01 August 2013 07:57 AM, Joel Fernandes wrote:
> On 07/31/2013 04:18 AM, Sekhar Nori wrote:
>> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>>> Hi Sekhar,
>>>
>>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>>> In an effort to move to using Scatter gather lists of any size with
>>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>>> we work through the limitations of the EDMAC hardware to find missed
>>>>> events and issue them.
>>>>>
>>>>> The sequence of events that require this are:
>>>>>
>>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>>
>>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>>
>>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>>
>>>>> After (1) is succesfully transferred, the events from the MMC controller
>>>>> donot stop coming and are missed by the time we have setup the transfer
>>>>> for (2). So here, we catch the events missed as an error condition and
>>>>> issue them manually.
>>>>
>>>> Are you sure there wont be any effect of these missed events on the
>>>> peripheral side. For example, wont McASP get into an underrun condition
>>>> when it encounters a null PaRAM set? Even UART has to transmit to a
>>>
>>> But it will not encounter null PaRAM set because McASP uses contiguous
>>> buffers for transfer which are not scattered across physical memory.
>>> This can be accomplished with an SG of size 1. For such SGs, this patch
>>> series leaves it linked Dummy and does not link to Null set. Null set is
>>> only used for SG lists that are > MAX_NR_SG in size such as those
>>> created for example by MMC and Crypto.
>>>
>>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>>
>>> Existing driver have to wait anyway if they hit MAX SG limit today. If
>>> they don't want to wait, they would have allocated a contiguous block of
>>> memory and DMA that in one stretch so they don't lose any events, and in
>>> such cases we are not linking to Null.
>>
>> As long as DMA driver can advertize its MAX SG limit, peripherals can
>> always work around that by limiting the number of sync events they
>> generate so as to not having any of the events getting missed. With this
>> series, I am worried that EDMA drivers is advertizing that it can handle
>> any length SG list while not taking care of missing any events while
>> doing so. This will break the assumptions that driver writers make.
> 
> This is already being done by some other DMA engine drivers ;). We can
> advertise more than we can handle at a time, that's the basis of this
> whole idea.
> 
> I understand what you're saying but events are not something that have
> be serviced immediately, they can be queued etc and the actually
> transfer from the DMA controller can be delayed. As long as we don't
> miss the event we are fine which my series takes care off.
> 
> So far I have tested this series on following modules in various
> configurations and have seen no issues:
> - Crypto AES
> - MMC/SD
> - SPI (128x160 display)

Notice how in each of these cases the peripheral is in control of when
data is driven out? Please test with McASP in a configuration where the
codec drives the frame-sync/bit-clock, or with UART at a high baud rate.

> 
>>>> Also, wont this lead to under-utilization of the peripheral bandwith?
>>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>>>> is waiting to be set-up.
>>>
>>> But it is waiting anyway even today. Currently based on MAX segs, MMC
>>> driver/subsystem will make SG list of size max_segs. Between these
>>> sessions of creating such smaller SG-lists, if for some reason the MMC
>>> controller is sending events, these will be lost anyway.
>>
>> But if MMC/SD driver knows how many events it should generate if it
>> knows the MAX SG limit. So there should not be any missed events in
>> current code. And I am not claiming that your solution is making matters
>> worse. But its not making it much better as well.
> 
> This is not true for crypto, the events are not deasserted and crypto
> continues to send events. This is what led to the "don't trigger in
> Null" patch where I'm setting the missed flag to avoid recursion.

Sorry, I am not sure which patch you are talking about here. Can you
provide the full subject line to avoid confusion?

>>> This can be used only for buffers that are contiguous in memory, not
>>> those that are scattered across memory.
>>
>> I was hinting at using the linking facility of EDMA to achieve this.
>> Each PaRAM set has full 32-bit source and destination pointers so I see
>> no reason why non-contiguous case cannot be handled.
>>
>> Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
>> typically 4 times the number of channels. In this case we use one DMA
>> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
>> and P1 and P2 are the Link sets.
>>
>> Initial setup:
>>
>> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^      ^      ^
>>  |      |      |
>> P0  -> P1  -> P2  -> NULL
>>
>> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
>> completion. On each completion interrupt, hardware automatically copies
>> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
>> out, the state of hardware is:
>>
>> SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^       ^
>>  |       |
>> P0,1    P2  -> NULL
>>  |       ^
>>  |       |
>>  ---------
>>
>> SG1 transfer has already started by the time the TC interrupt is
>> handled. As you can see P1 is now redundant and ready to be recycled. So
>> in the interrupt handler, software recycles P1. Thus:
>>
>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^      ^      ^
>>  |      |      |
>> P0  -> P2  -> P1  -> NULL
>>
>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>> Hardware state:
>>
>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^       ^
>>  |       |
>> P0,2    P1  -> NULL
>>  |       ^
>>  |       |
>>  ---------
>>
>> As part of TC completion interrupt handling:
>>
>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>  ^      ^      ^
>>  |      |      |
>> P0  -> P1  -> P2  -> NULL
>>
>> This goes on until the SG list in exhausted. If you use more PaRAM sets,
>> interrupt handler gets more time to recycle the PaRAM set. At no point
>> we touch P0 as it is always under active transfer. Thus the peripheral
>> is always kept busy.
>>
>> Do you see any reason why such a mechanism cannot be implemented?
> 
> This is possible and looks like another way to do it, but there are 2
> problems I can see with it.
> 
> 1. Its inefficient because of too many interrupts:
> 
> Imagine case where we have an SG list of size 30 and MAX_NR_SG size is
> 10. This method will trigger 30 interrupts always, where as with my
> patch series, you'd get only 3 interrupts. If you increase MAX_SG_NR ,
> you'd get even fewer interrupts.

Yes, but you are seeing only one side of the inefficiency. In your design
DMA *always* stalls waiting for the CPU to intervene. The whole point of
DMA is to keep it going while the CPU does bookkeeping in the background.
This is simply not going to scale with fast peripherals.

Besides, missed events are error conditions as far as EDMA and the
peripheral are concerned. You are handling an error interrupt to support a
successful transaction. Think about why EDMA considers missed events an
error condition.

> 
> 2. If the interrupt handler for some reason doesn't complete or get
> service in time, we will end up DMA'ing incorrect data as events
> wouldn't stop coming in even if interrupt is not yet handled (in your
> example linked sets P1 or P2 would be old ones being repeated). Where as
> with my method, we are not doing any DMA once we finish the current
> MAX_NR_SG set even if events continue to come.

Where is the repetition and the possibility of wrong data being
transferred? We have a linear list of PaRAM sets, not a loop. You would
link the end of the PaRAM set chain to a dummy PaRAM set, which BTW will
not cause missed events. The more PaRAM sets you add to the chain, the
more time the CPU gets to intervene before DMA eventually stalls. This is
a tradeoff system designers can manage.
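
The recycling scheme quoted above can be modelled in a few lines. This is
only a sketch under assumptions: slot 0 stands for the DMA PaRAM set P0,
slots 1..nr_link stand for the link sets, the helper names are invented
here, and none of this is existing driver code.

```c
#include <stddef.h>

/*
 * Hypothetical model: P0 (slot 0) is programmed with SG[0]; the link sets
 * P1..P<nr_link> carry SG[1], SG[2], ... and are reused round-robin as the
 * completion handler recycles them. Returns the slot that carries SG[i],
 * or -1 if there are no link sets to carry it.
 */
int slot_for_sg(size_t i, size_t nr_link)
{
	if (i == 0)
		return 0;
	if (nr_link == 0)
		return -1;
	return (int)(1 + (i - 1) % nr_link);
}

/*
 * Number of SG elements already queued behind the one under transfer when
 * every link set is programmed. That many more transfers can complete
 * before the chain runs dry, so it bounds how late the handler may be.
 */
size_t queued_behind_active(size_t nr_sg_left, size_t nr_link)
{
	return nr_sg_left < nr_link ? nr_sg_left : nr_link;
}
```

With two link sets, SG3 lands in P1 and SG4 in P2, matching the diagrams,
and two elements are queued behind the active one. Adding link sets raises
that number, which is the tradeoff referred to above.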

Thanks,
Sekhar

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-01 20:28               ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-01 20:28 UTC (permalink / raw)
  To: Sekhar Nori
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On 08/01/2013 01:13 AM, Sekhar Nori wrote:
> On Thursday 01 August 2013 07:57 AM, Joel Fernandes wrote:
>> On 07/31/2013 04:18 AM, Sekhar Nori wrote:
>>> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>>>> Hi Sekhar,
>>>>
>>>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>>>> In an effort to move to using Scatter gather lists of any size with
>>>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>>>> we work through the limitations of the EDMAC hardware to find missed
>>>>>> events and issue them.
>>>>>>
>>>>>> The sequence of events that require this are:
>>>>>>
>>>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>>>
>>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>>>
>>>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>>>
>>>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>>>
>>>>>> After (1) is successfully transferred, the events from the MMC controller
>>>>>> do not stop coming and are missed by the time we have set up the transfer
>>>>>> for (2). So here, we catch the events missed as an error condition and
>>>>>> issue them manually.
>>>>>
>>>>> Are you sure there won't be any effect of these missed events on the
>>>>> peripheral side? For example, won't McASP get into an underrun condition
>>>>> when it encounters a null PaRAM set? Even UART has to transmit to a
>>>>
>>>> But it will not encounter a null PaRAM set because McASP uses contiguous
>>>> buffers for transfer which are not scattered across physical memory.
>>>> This can be accomplished with an SG of size 1. For such SGs, this patch
>>>> series leaves it linked to Dummy and does not link to the Null set. The Null
>>>> set is only used for SG lists that are > MAX_NR_SG in size such as those
>>>> created for example by MMC and Crypto.
>>>>
>>>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>>>
>>>> Existing drivers have to wait anyway if they hit the MAX SG limit today. If
>>>> they don't want to wait, they would have allocated a contiguous block of
>>>> memory and DMA'd that in one stretch so they don't lose any events, and in
>>>> such cases we are not linking to Null.
>>>
>>> As long as the DMA driver can advertise its MAX SG limit, peripherals can
>>> always work around that by limiting the number of sync events they
>>> generate so as not to have any of the events getting missed. With this
>>> series, I am worried that the EDMA driver is advertising that it can handle
>>> any length SG list while not taking care of missing any events while
>>> doing so. This will break the assumptions that driver writers make.
>>
>> This is already being done by some other DMA engine drivers ;). We can
>> advertise more than we can handle at a time, that's the basis of this
>> whole idea.
>>
>> I understand what you're saying, but events are not something that have
>> to be serviced immediately; they can be queued etc. and the actual
>> transfer from the DMA controller can be delayed. As long as we don't
>> miss the event we are fine, which my series takes care of.
>>
>> So far I have tested this series on following modules in various
>> configurations and have seen no issues:
>> - Crypto AES
>> - MMC/SD
>> - SPI (128x160 display)
> 
> Notice how in each of these cases the peripheral is in control of when
> data is driven out? Please test with McASP in a configuration where
> codec drives the frame-sync/bit-clock or with UART under high baud rate.

McASP allocates a contiguous buffer. For this case there is always an SG
of size 1 and this patch series doesn't affect it at all; there is no
stalling. Further, the McASP audio driver is still awaiting conversion to
use the DMA engine, so there's no way to test it yet.

>>>>> Also, won't this lead to under-utilization of the peripheral bandwidth?
>>>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>>>>> is waiting to be set up.
>>>>
>>>> But it is waiting anyway even today. Currently based on MAX segs, MMC
>>>> driver/subsystem will make SG list of size max_segs. Between these
>>>> sessions of creating such smaller SG-lists, if for some reason the MMC
>>>> controller is sending events, these will be lost anyway.
>>>
>>> But the MMC/SD driver knows how many events it should generate if it
>>> knows the MAX SG limit. So there should not be any missed events in
>>> current code. And I am not claiming that your solution is making matters
>>> worse. But it's not making it much better either.
>>
>> This is not true for crypto, the events are not deasserted and crypto
>> continues to send events. This is what led to the "don't trigger in
>> Null" patch where I'm setting the missed flag to avoid recursion.
> 
> Sorry, I am not sure which patch you are talking about here. Can you
> provide the full subject line to avoid confusion?

Sure, "dma: edma: Detect null slot errors and handle them correctly".

>>>> This can be used only for buffers that are contiguous in memory, not
>>>> those that are scattered across memory.
>>>
>>> I was hinting at using the linking facility of EDMA to achieve this.
>>> Each PaRAM set has full 32-bit source and destination pointers so I see
>>> no reason why non-contiguous case cannot be handled.
>>>
>>> Let's say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
>>> typically 4 times the number of channels. In this case we use one DMA
>>> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
>>> and P1 and P2 are the Link sets.
>>>
>>> Initial setup:
>>>
>>> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>  ^      ^      ^
>>>  |      |      |
>>> P0  -> P1  -> P2  -> NULL
>>>
>>> P[0..2].TCINTEN = 1, so we get an interrupt after each SG element
>>> completion. On each completion interrupt, hardware automatically copies
>>> the linked PaRAM set into the DMA PaRAM set, so after SG0 is transferred
>>> out, the state of hardware is:
>>>
>>> SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>  ^       ^
>>>  |       |
>>> P0,1    P2  -> NULL
>>>  |       ^
>>>  |       |
>>>  ---------
>>>
>>> SG1 transfer has already started by the time the TC interrupt is
>>> handled. As you can see P1 is now redundant and ready to be recycled. So
>>> in the interrupt handler, software recycles P1. Thus:
>>>
>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>  ^      ^      ^
>>>  |      |      |
>>> P0  -> P2  -> P1  -> NULL
>>>
>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>> Hardware state:
>>>
>>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>  ^       ^
>>>  |       |
>>> P0,2    P1  -> NULL
>>>  |       ^
>>>  |       |
>>>  ---------
>>>
>>> As part of TC completion interrupt handling:
>>>
>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>  ^      ^      ^
>>>  |      |      |
>>> P0  -> P1  -> P2  -> NULL
>>>
>>> This goes on until the SG list is exhausted. If you use more PaRAM sets,
>>> the interrupt handler gets more time to recycle the PaRAM set. At no point
>>> do we touch P0 as it is always under active transfer. Thus the peripheral
>>> is always kept busy.
>>>
>>> Do you see any reason why such a mechanism cannot be implemented?
>>
>> This is possible and looks like another way to do it, but there are 2
>> problems I can see with it.
>>
>> 1. It's inefficient because of too many interrupts:
>>
>> Imagine a case where we have an SG list of size 30 and MAX_NR_SG is
>> 10. This method will always trigger 30 interrupts, whereas with my
>> patch series, you'd get only 3 interrupts. If you increase MAX_NR_SG,
>> you'd get even fewer interrupts.
> 
> Yes, but you are seeing only one side of the inefficiency. In your design,
> DMA *always* stalls waiting for the CPU to intervene. The whole point of DMA
> is to keep it going while the CPU does bookkeeping in the background. This is
> simply not going to scale with fast peripherals.

Agreed. So far though, I've no way to reproduce a fast peripheral that
scatters data across physical memory and suffers from any stall.

> Besides, missed events are error conditions as far as EDMA and the
> peripheral are concerned. You are handling an error interrupt to support a
> successful transaction. Think about why EDMA considers missed events an
> error condition.

I agree with this, it's not the best way to do it. I have been working on
a different approach.

However, in support of the series:
1. It doesn't break any existing code
2. It works for all current DMA users (performance and correctness)
3. It removes the SG limitations on DMA users.

So what you suggested would be more of a feature addition than a
limitation of this series. It is at least better than what's being done
now - forcing the limit to the total number of SGs - so it is a step in
the right direction.

>> 2. If the interrupt handler for some reason doesn't complete or get
>> serviced in time, we will end up DMA'ing incorrect data as events
>> wouldn't stop coming in even if the interrupt is not yet handled (in your
>> example linked sets P1 or P2 would be old ones being repeated). Whereas
>> with my method, we are not doing any DMA once we finish the current
>> MAX_NR_SG set even if events continue to come.
> 
> Where is the repetition, and the possibility of wrong data being
> transferred? We have a linear list of PaRAM sets - not a loop. You would
> link the end of the PaRAM set chain to a dummy PaRAM set, which BTW will not
> cause missed events. The more PaRAM sets you add to the chain, the more

There would have to be a loop; how else would you ensure continuity and
uninterrupted DMA?

Consider if you have 2 sets of linked sets:
L1 is the first set of Linked sets and L2 is the second.

When L1 is done, EDMA continues with L2 (due to the link) while the
interrupt handler prepares L1. The continuity depends on L1 being linked
to L2. Only the very last chunk of up to MAX_NR_SG linked sets will
be linked to Dummy.

So consider MAX_NR_SG=10, and sg_len = 35

L1 - L2 - L1 - L1 - Dummy

The split would be in number of slots,
10 - 10 - 10 -  5 - Dummy
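
The split above is plain ceiling division. A minimal sketch in C, assuming
hypothetical helper names (nr_chunks, chunk_len) that are not part of the
EDMA driver:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy helper, not part of the EDMA driver: number of linked sets needed
 * to cover sg_len entries when at most max_nr_sg slots are used at once.
 */
static size_t nr_chunks(size_t sg_len, size_t max_nr_sg)
{
	assert(max_nr_sg > 0);
	return (sg_len + max_nr_sg - 1) / max_nr_sg;	/* ceiling division */
}

/* Number of SG entries covered by chunk number idx (0-based). */
static size_t chunk_len(size_t sg_len, size_t max_nr_sg, size_t idx)
{
	size_t done = idx * max_nr_sg;	/* entries covered by earlier chunks */

	assert(max_nr_sg > 0 && done < sg_len);
	return (sg_len - done < max_nr_sg) ? sg_len - done : max_nr_sg;
}
```

For sg_len = 35 and MAX_NR_SG = 10 this gives 4 chunks of 10, 10, 10 and 5
slots. For the earlier example of sg_len = 30 it gives 3 chunks, which
matches the 3 interrupts quoted above for the chunked scheme.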

> time CPU gets to intervene before DMA eventually stalls. This is a
> tradeoff system designers can manage.

Consider what happens in the case where MAX_NR_SG=1 or 2. In that case,
there's a chance we might not get enough time for the interrupt handler
to set up the next series of linked sets.

Somehow this limitation has to be overcome by advising in comments that
MAX_NR_SG should always be greater than a certain number to ensure
proper operation.

Thanks,

-Joel


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-01 20:28               ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-01 20:28 UTC (permalink / raw)
  To: linux-arm-kernel

On 08/01/2013 01:13 AM, Sekhar Nori wrote:
> On Thursday 01 August 2013 07:57 AM, Joel Fernandes wrote:
>> On 07/31/2013 04:18 AM, Sekhar Nori wrote:
>>> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>>>> Hi Sekhar,
>>>>
>>>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>>>> In an effort to move to using Scatter gather lists of any size with
>>>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>>>> we work through the limitations of the EDMAC hardware to find missed
>>>>>> events and issue them.
>>>>>>
>>>>>> The sequence of events that require this are:
>>>>>>
>>>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>>>
>>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>>>
>>>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>>>
>>>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>>>
>>>>>> After (1) is succesfully transferred, the events from the MMC controller
>>>>>> donot stop coming and are missed by the time we have setup the transfer
>>>>>> for (2). So here, we catch the events missed as an error condition and
>>>>>> issue them manually.
>>>>>
>>>>> Are you sure there wont be any effect of these missed events on the
>>>>> peripheral side. For example, wont McASP get into an underrun condition
>>>>> when it encounters a null PaRAM set? Even UART has to transmit to a
>>>>
>>>> But it will not encounter null PaRAM set because McASP uses contiguous
>>>> buffers for transfer which are not scattered across physical memory.
>>>> This can be accomplished with an SG of size 1. For such SGs, this patch
>>>> series leaves it linked Dummy and does not link to Null set. Null set is
>>>> only used for SG lists that are > MAX_NR_SG in size such as those
>>>> created for example by MMC and Crypto.
>>>>
>>>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>>>
>>>> Existing driver have to wait anyway if they hit MAX SG limit today. If
>>>> they don't want to wait, they would have allocated a contiguous block of
>>>> memory and DMA that in one stretch so they don't lose any events, and in
>>>> such cases we are not linking to Null.
>>>
>>> As long as DMA driver can advertize its MAX SG limit, peripherals can
>>> always work around that by limiting the number of sync events they
>>> generate so as to not having any of the events getting missed. With this
>>> series, I am worried that EDMA drivers is advertizing that it can handle
>>> any length SG list while not taking care of missing any events while
>>> doing so. This will break the assumptions that driver writers make.
>>
>> This is already being done by some other DMA engine drivers ;). We can
>> advertise more than we can handle at a time, that's the basis of this
>> whole idea.
>>
>> I understand what you're saying but events are not something that have
>> be serviced immediately, they can be queued etc and the actually
>> transfer from the DMA controller can be delayed. As long as we don't
>> miss the event we are fine which my series takes care off.
>>
>> So far I have tested this series on following modules in various
>> configurations and have seen no issues:
>> - Crypto AES
>> - MMC/SD
>> - SPI (128x160 display)
> 
> Notice how in each of these cases the peripheral is in control of when
> data is driven out? Please test with McASP in a configuration where
> codec drives the frame-sync/bit-clock or with UART under high baud rate.

McASP allocates a contiguous buffer. For this case there is always an SG
of size 1 and this patch series doesn't effect it at all, there is not
stalling. Further McASP audio driver is still awaiting conversion to use
DMA engine so there's no way yet to test it.

>>>>> Also, wont this lead to under-utilization of the peripheral bandwith?
>>>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>>>>> is waiting to be set-up.
>>>>
>>>> But it is waiting anyway even today. Currently based on MAX segs, MMC
>>>> driver/subsystem will make SG list of size max_segs. Between these
>>>> sessions of creating such smaller SG-lists, if for some reason the MMC
>>>> controller is sending events, these will be lost anyway.
>>>
>>> But if MMC/SD driver knows how many events it should generate if it
>>> knows the MAX SG limit. So there should not be any missed events in
>>> current code. And I am not claiming that your solution is making matters
>>> worse. But its not making it much better as well.
>>
>> This is not true for crypto, the events are not deasserted and crypto
>> continues to send events. This is what led to the "don't trigger in
>> Null" patch where I'm setting the missed flag to avoid recursion.
> 
> Sorry, I am not sure which patch you are talking about here. Can you
> provide the full subject line to avoid confusion?

Sure, "dma: edma: Detect null slot errors and handle them correctly".

>>>> This can be used only for buffers that are contiguous in memory, not
>>>> those that are scattered across memory.
>>>
>>> I was hinting at using the linking facility of EDMA to achieve this.
>>> Each PaRAM set has full 32-bit source and destination pointers so I see
>>> no reason why non-contiguous case cannot be handled.
>>>
>>> Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
>>> typically 4 times the number of channels. In this case we use one DMA
>>> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
>>> and P1 and P2 are the Link sets.
>>>
>>> Initial setup:
>>>
>>> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>  ^      ^      ^
>>>  |      |      |
>>> P0  -> P1  -> P2  -> NULL
>>>
>>> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
>>> completion. On each completion interrupt, hardware automatically copies
>>> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
>>> out, the state of hardware is:
>>>
>>> SG1  -> SG2 -> SG3 -> SG3 -> SG6 -> NULL
>>>  ^       ^
>>>  |       |
>>> P0,1    P2  -> NULL
>>>  |       ^
>>>  |       |
>>>  ---------
>>>
>>> SG1 transfer has already started by the time the TC interrupt is
>>> handled. As you can see P1 is now redundant and ready to be recycled. So
>>> in the interrupt handler, software recycles P1. Thus:
>>>
>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>  ^      ^      ^
>>>  |      |      |
>>> P0  -> P2  -> P1  -> NULL
>>>
>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>> Hardware state:
>>>
>>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>  ^       ^
>>>  |       |
>>> P0,2    P1  -> NULL
>>>  |       ^
>>>  |       |
>>>  ---------
>>>
>>> As part of TC completion interrupt handling:
>>>
>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>  ^      ^      ^
>>>  |      |      |
>>> P0  -> P1  -> P2  -> NULL
>>>
>>> This goes on until the SG list in exhausted. If you use more PaRAM sets,
>>> interrupt handler gets more time to recycle the PaRAM set. At no point
>>> we touch P0 as it is always under active transfer. Thus the peripheral
>>> is always kept busy.
>>>
>>> Do you see any reason why such a mechanism cannot be implemented?
>>
>> This is possible and looks like another way to do it, but there are 2
>> problems I can see with it.
>>
>> 1. Its inefficient because of too many interrupts:
>>
>> Imagine case where we have an SG list of size 30 and MAX_NR_SG size is
>> 10. This method will trigger 30 interrupts always, where as with my
>> patch series, you'd get only 3 interrupts. If you increase MAX_SG_NR ,
>> you'd get even fewer interrupts.
> 
> Yes, but you are seeing only one side of inefficiency. In your design
> DMA *always* stalls waiting for CPU to intervene. The whole point to DMA
> is to keep it going while CPU does bookeeping in background. This is
> simply not going to scale with fast peripherals.

Agreed. So far though, I've no way to reproduce a fast peripheral that
scatters data across physical memory and suffers from any stall.

> Besides, missed events are error conditions as far as EDMA and the
> peripheral is considered. You are handling error interrupt to support a
> successful transaction. Think about why EDMA considers missed events as
> error condition.

I agree with this, its not the best way to do it. I have been working on
a different approach.

However, in support of the series:
1. It doesn't break any existing code
2. It works for all current DMA users (performance and correctness)
3. It removes the SG limitations on DMA users.

So what you suggested, would be more of a feature addition than a
limitation of this series. It is atleast better than what's being done
now - forcing the limit to the total number of SGs, so it is a step in
the right direction.

>> 2. If the interrupt handler for some reason doesn't complete or get
>> service in time, we will end up DMA'ing incorrect data as events
>> wouldn't stop coming in even if interrupt is not yet handled (in your
>> example linked sets P1 or P2 would be old ones being repeated). Where as
>> with my method, we are not doing any DMA once we finish the current
>> MAX_NR_SG set even if events continue to come.
> 
> Where is repetition and possibility of wrong data being transferred? We
> have a linear list of PaRAM sets - not a loop. You would link the end to
> PaRAM set chain to dummy PaRAM set which BTW will not cause missed
> events. The more number of PaRAM sets you add to the chain, the more

There would have to be a loop, how else would you ensure continuity and
uninterrupted DMA?

Consider if you have 2 sets of linked sets:
L1 is the first set of Linked sets and L2 is the second.

When L1 is done, EDMA continues with L2 (due to the link) while
interrupt handler prepares L1. The continuity depends on L1 being linked
to L2. Only the absolute last break up of the MAX_NR_SG linked set will
be linked to Dummy.

So consider MAX_NR_SG=10, and sg_len = 35

L1 - L2 - L1 - L1 - Dummy

The split would be in number of slots,
10 - 10 - 10 -  5 - Dummy

> time CPU gets to intervene before DMA eventually stalls. This is a
> tradeoff system designers can manage.

Consider what happens in the case where MAX_NR_SG=1 or 2. In that case,
there's a chance we might not get enough time for the interrupt handler
to set up the next series of linked sets.

Somehow this limitation has to be overcome by advising in comments that
MAX_NR_SG should always be greater than a certain number to ensure
proper operation.

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-01 20:48                 ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-01 20:48 UTC (permalink / raw)
  To: joelf
  Cc: Sekhar Nori, Tony Lindgren, Santosh Shilimkar, Sricharan R,
	Rajendra Nayak, Lokesh Vutla, Matt Porter, Grant Likely,
	Rob Herring, Vinod Koul, Dan Williams, Mark Brown,
	Benoit Cousson, Russell King, Arnd Bergmann, Olof Johansson,
	Balaji TK, Gururaja Hebbar, Chris Ball, Jason Kridner,
	Linux OMAP List, Linux ARM Kernel List,
	Linux DaVinci Kernel List, Linux Kernel Mailing List,
	Linux MMC List

Just some corrections here..

On 08/01/2013 03:28 PM, Joel Fernandes wrote:

>>> 2. If the interrupt handler for some reason doesn't complete or get
>>> service in time, we will end up DMA'ing incorrect data as events
>>> wouldn't stop coming in even if interrupt is not yet handled (in your
>>> example linked sets P1 or P2 would be old ones being repeated). Where as
>>> with my method, we are not doing any DMA once we finish the current
>>> MAX_NR_SG set even if events continue to come.
>>
>> Where is repetition and possibility of wrong data being transferred? We
>> have a linear list of PaRAM sets - not a loop. You would link the end to
>> PaRAM set chain to dummy PaRAM set which BTW will not cause missed
>> events. The more number of PaRAM sets you add to the chain, the more
> 
> There would have to be a loop, how else would you ensure continuity and
> uninterrupted DMA?
> 
> Consider if you have 2 sets of linked sets:
> L1 is the first set of Linked sets and L2 is the second.
> 
> When L1 is done, EDMA continues with L2 (due to the link) while
> interrupt handler prepares L1. The continuity depends on L1 being linked
> to L2. Only the absolute last break up of the MAX_NR_SG linked set will
> be linked to Dummy.
> 
> So consider MAX_NR_SG=10, and sg_len = 35
> 
> L1 - L2 - L1 - L1 - Dummy

Should be,
L1 - L2 - L1 - L2 - Dummy

> 
> The split would be in number of slots,
> 10 - 10 - 10 -  5 - Dummy
> 
>> time CPU gets to intervene before DMA eventually stalls. This is a
>> tradeoff system designers can manage.
> 
> Consider what happens in the case where MAX_SG_NR=1 or 2. In that case,
> there's a change we might not get enough time for the interrupt handler
> to setup next series of linked set.
> 
> Some how this limitation has to be overcome by advising in comments than
> MAX_SG_NR should always be greater than a certain number to ensure
> proper operation.

s/than/that/


Thanks,

-Joel


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-02 13:26                 ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-08-02 13:26 UTC (permalink / raw)
  To: joelf
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

On 8/2/2013 1:58 AM, Joel Fernandes wrote:
> On 08/01/2013 01:13 AM, Sekhar Nori wrote:
>> On Thursday 01 August 2013 07:57 AM, Joel Fernandes wrote:
>>> On 07/31/2013 04:18 AM, Sekhar Nori wrote:
>>>> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>>>>> Hi Sekhar,
>>>>>
>>>>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>>>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>>>>> In an effort to move to using Scatter gather lists of any size with
>>>>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>>>>> we work through the limitations of the EDMAC hardware to find missed
>>>>>>> events and issue them.
>>>>>>>
>>>>>>> The sequence of events that require this are:
>>>>>>>
>>>>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>>>>
>>>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>>>>
>>>>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>>>>
>>>>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>>>>
>>>>>>> After (1) is successfully transferred, the events from the MMC controller
>>>>>>> do not stop coming and are missed by the time we have set up the transfer
>>>>>>> for (2). So here, we catch the missed events as an error condition and
>>>>>>> issue them manually.
>>>>>>
>>>>>> Are you sure there won't be any effect of these missed events on the
>>>>>> peripheral side? For example, won't McASP get into an underrun condition
>>>>>> when it encounters a null PaRAM set? Even UART has to transmit at a
>>>>>
>>>>> But it will not encounter null PaRAM set because McASP uses contiguous
>>>>> buffers for transfer which are not scattered across physical memory.
>>>>> This can be accomplished with an SG of size 1. For such SGs, this patch
>>>>> series leaves it linked Dummy and does not link to Null set. Null set is
>>>>> only used for SG lists that are > MAX_NR_SG in size such as those
>>>>> created for example by MMC and Crypto.
>>>>>
>>>>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>>>>
>>>>> Existing driver have to wait anyway if they hit MAX SG limit today. If
>>>>> they don't want to wait, they would have allocated a contiguous block of
>>>>> memory and DMA that in one stretch so they don't lose any events, and in
>>>>> such cases we are not linking to Null.
>>>>
>>>> As long as the DMA driver can advertise its MAX SG limit, peripherals can
>>>> always work around that by limiting the number of sync events they
>>>> generate so that none of the events get missed. With this series, I am
>>>> worried that the EDMA driver is advertising that it can handle SG lists
>>>> of any length while not taking care to avoid missing events while
>>>> doing so. This will break the assumptions that driver writers make.
>>>
>>> This is already being done by some other DMA engine drivers ;). We can
>>> advertise more than we can handle at a time, that's the basis of this
>>> whole idea.
>>>
>>> I understand what you're saying, but events are not something that has
>>> to be serviced immediately; they can be queued etc. and the actual
>>> transfer from the DMA controller can be delayed. As long as we don't
>>> miss the event we are fine, which my series takes care of.
>>>
>>> So far I have tested this series on following modules in various
>>> configurations and have seen no issues:
>>> - Crypto AES
>>> - MMC/SD
>>> - SPI (128x160 display)
>>
>> Notice how in each of these cases the peripheral is in control of when
>> data is driven out? Please test with McASP in a configuration where
>> codec drives the frame-sync/bit-clock or with UART under high baud rate.
> 
> McASP allocates a contiguous buffer. For this case there is always an SG
> of size 1 and this patch series doesn't affect it at all; there is no
> stalling. Further, the McASP audio driver is still awaiting conversion to
> use DMA engine, so there's no way yet to test it.

Okay, looks like omap-serial does not use DMA either, so you cannot use
that. Anyway, my point goes beyond what the McASP driver does currently.
Once you expose the "handle any number of SGs" feature from the EDMA
driver, any client is free to use it. So we need to think ahead to see if
we break any use cases.

> 
>>>>>> Also, won't this lead to under-utilization of the peripheral bandwidth?
>>>>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>>>>>> is waiting to be set up.
>>>>>
>>>>> But it is waiting anyway even today. Currently based on MAX segs, MMC
>>>>> driver/subsystem will make SG list of size max_segs. Between these
>>>>> sessions of creating such smaller SG-lists, if for some reason the MMC
>>>>> controller is sending events, these will be lost anyway.
>>>>
>>>> But the MMC/SD driver knows how many events it should generate if it
>>>> knows the MAX SG limit. So there should not be any missed events in
>>>> current code. And I am not claiming that your solution is making matters
>>>> worse. But it's not making it much better either.
>>>
>>> This is not true for crypto, the events are not deasserted and crypto
>>> continues to send events. This is what led to the "don't trigger in
>>> Null" patch where I'm setting the missed flag to avoid recursion.
>>
>> Sorry, I am not sure which patch you are talking about here. Can you
>> provide the full subject line to avoid confusion?
> 
> Sure, "dma: edma: Detect null slot errors and handle them correctly".
> 
>>>>> This can be used only for buffers that are contiguous in memory, not
>>>>> those that are scattered across memory.
>>>>
>>>> I was hinting at using the linking facility of EDMA to achieve this.
>>>> Each PaRAM set has full 32-bit source and destination pointers so I see
>>>> no reason why non-contiguous case cannot be handled.
>>>>
>>>> Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
>>>> typically 4 times the number of channels. In this case we use one DMA
>>>> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
>>>> and P1 and P2 are the Link sets.
>>>>
>>>> Initial setup:
>>>>
>>>> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^      ^      ^
>>>>  |      |      |
>>>> P0  -> P1  -> P2  -> NULL
>>>>
>>>> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
>>>> completion. On each completion interrupt, hardware automatically copies
>>>> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
>>>> out, the state of hardware is:
>>>>
>>>> SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^       ^
>>>>  |       |
>>>> P0,1    P2  -> NULL
>>>>  |       ^
>>>>  |       |
>>>>  ---------
>>>>
>>>> SG1 transfer has already started by the time the TC interrupt is
>>>> handled. As you can see P1 is now redundant and ready to be recycled. So
>>>> in the interrupt handler, software recycles P1. Thus:
>>>>
>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^      ^      ^
>>>>  |      |      |
>>>> P0  -> P2  -> P1  -> NULL
>>>>
>>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>>> Hardware state:
>>>>
>>>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^       ^
>>>>  |       |
>>>> P0,2    P1  -> NULL
>>>>  |       ^
>>>>  |       |
>>>>  ---------
>>>>
>>>> As part of TC completion interrupt handling:
>>>>
>>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^      ^      ^
>>>>  |      |      |
>>>> P0  -> P1  -> P2  -> NULL
>>>>
>>>> This goes on until the SG list is exhausted. If you use more PaRAM sets,
>>>> interrupt handler gets more time to recycle the PaRAM set. At no point
>>>> we touch P0 as it is always under active transfer. Thus the peripheral
>>>> is always kept busy.
>>>>
>>>> Do you see any reason why such a mechanism cannot be implemented?
>>>
>>> This is possible and looks like another way to do it, but there are 2
>>> problems I can see with it.
>>>
>>> 1. It's inefficient because of too many interrupts:
>>>
>>> Imagine a case where we have an SG list of size 30 and MAX_NR_SG is
>>> 10. This method will always trigger 30 interrupts, whereas with my
>>> patch series, you'd get only 3 interrupts. If you increase MAX_NR_SG,
>>> you'd get even fewer interrupts.
>>
>> Yes, but you are seeing only one side of inefficiency. In your design
>> DMA *always* stalls waiting for CPU to intervene. The whole point of DMA
>> is to keep it going while CPU does bookkeeping in background. This is
>> simply not going to scale with fast peripherals.
> 
> Agreed. So far though, I've no way to reproduce a fast peripheral that
> scatters data across physical memory and suffers from any stall.
> 
>> Besides, missed events are error conditions as far as EDMA and the
>> peripheral are concerned. You are handling an error interrupt to support
>> a successful transaction. Think about why EDMA considers missed events an
>> error condition.
> 
> I agree with this, its not the best way to do it. I have been working on
> a different approach.
> 
> However, in support of the series:
> 1. It doesn't break any existing code
> 2. It works for all current DMA users (performance and correctness)
> 3. It removes the SG limitations on DMA users.

Right, all of this should be true even with the approach I am suggesting.

> So what you suggested, would be more of a feature addition than a
> limitation of this series. It is atleast better than what's being done
> now - forcing the limit to the total number of SGs, so it is a step in
> the right direction.

No, I do not see my approach as a feature addition to what you are
doing. The two are very different approaches. For example, you would not
need the manual (re)trigger on the CC error condition in what I am proposing.

> 
>>> 2. If the interrupt handler for some reason doesn't complete or get
>>> serviced in time, we will end up DMA'ing incorrect data as events
>>> wouldn't stop coming in even if the interrupt is not yet handled (in your
>>> example linked sets P1 or P2 would be old ones being repeated). Whereas
>>> with my method, we are not doing any DMA once we finish the current
>>> MAX_NR_SG set even if events continue to come.
>>
>> Where is repetition and possibility of wrong data being transferred? We
>> have a linear list of PaRAM sets - not a loop. You would link the end to
>> PaRAM set chain to dummy PaRAM set which BTW will not cause missed
>> events. The more number of PaRAM sets you add to the chain, the more
> 
> There would have to be a loop, how else would you ensure continuity and
> uninterrupted DMA?

Uninterrupted DMA comes from PaRAM set recycling. In my diagrams
above, hardware is *always* using P0 for transfer while software always
updates the tail of the PaRAM linked list.

> 
> Consider if you have 2 sets of linked sets:
> L1 is the first set of Linked sets and L2 is the second.

I think this is where the confusion is. I am using only one linked set
of PaRAM entries (P0->P1->P2->DUMMY). If you need more time to service
the interrupt before the DMA hits the dummy PaRAM, you allocate more link
PaRAM sets for the channel (P0->P1->...Pn->DUMMY). At no point was I
suggesting having two sets of linked PaRAM sets. Why would you need
something like that?
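
To make the recycling concrete, here is a minimal user-space model of
what I am describing. It is only a sketch: struct chan, the chan_*()
helpers and MAX_LINK are made-up names, it is not EDMA driver code, and
it assumes the interrupt handler always gets to run before the chain
drains.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LINK 8	/* hypothetical cap on link sets per channel */
#define DUMMY	(-1)	/* stands in for the dummy PaRAM set */

struct chan {
	int active;		/* SG index loaded in P0, or DUMMY */
	int link[MAX_LINK];	/* ring of SG indices held in link sets */
	int head, count;	/* oldest queued link set, number queued */
	int nlink;		/* link sets allocated to the channel */
	int next_sg, sg_len;	/* next SG not yet programmed, total */
};

/* Software: program the next SG entry into a free link set at the tail. */
int chan_refill(struct chan *c)
{
	if (c->count == c->nlink || c->next_sg == c->sg_len)
		return 0;
	c->link[(c->head + c->count) % c->nlink] = c->next_sg++;
	c->count++;
	return 1;
}

/* Initial setup: P0 gets SG0, the link sets get the entries after it. */
void chan_start(struct chan *c, int nlink, int sg_len)
{
	assert(nlink > 0 && nlink <= MAX_LINK && sg_len > 0);
	c->nlink = nlink;
	c->sg_len = sg_len;
	c->head = 0;
	c->count = 0;
	c->active = 0;
	c->next_sg = 1;
	while (chan_refill(c))
		;
}

/* Hardware: when P0 completes, the next linked set is copied into P0. */
void chan_hw_complete(struct chan *c)
{
	if (c->count == 0) {
		c->active = DUMMY;	/* end of chain reached */
		return;
	}
	c->active = c->link[c->head];
	c->head = (c->head + 1) % c->nlink;
	c->count--;
}

/* Run a whole transfer; returns the number of completion interrupts. */
int chan_run(int nlink, int sg_len)
{
	struct chan c;
	int irqs = 0, expect = 0;

	chan_start(&c, nlink, sg_len);
	while (c.active != DUMMY) {
		assert(c.active == expect);	/* entries go out in order */
		expect++;
		chan_hw_complete(&c);	/* P0 reloaded from the chain */
		irqs++;
		chan_refill(&c);	/* handler recycles the freed set */
	}
	return irqs;
}
```

In this model chan_run(2, 7) walks SG0..SG6 in order with the two link
sets from my diagrams and takes 7 completion interrupts. Software only
ever writes to link sets, never to P0, and allocating more link sets
only changes how long the handler has before the chain reaches DUMMY.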

^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-02 13:26                 ` Sekhar Nori
  0 siblings, 0 replies; 89+ messages in thread
From: Sekhar Nori @ 2013-08-02 13:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 8/2/2013 1:58 AM, Joel Fernandes wrote:
> On 08/01/2013 01:13 AM, Sekhar Nori wrote:
>> On Thursday 01 August 2013 07:57 AM, Joel Fernandes wrote:
>>> On 07/31/2013 04:18 AM, Sekhar Nori wrote:
>>>> On Wednesday 31 July 2013 10:19 AM, Joel Fernandes wrote:
>>>>> Hi Sekhar,
>>>>>
>>>>> On 07/30/2013 02:05 AM, Sekhar Nori wrote:
>>>>>> On Monday 29 July 2013 06:59 PM, Joel Fernandes wrote:
>>>>>>> In an effort to move to using Scatter gather lists of any size with
>>>>>>> EDMA as discussed at [1] instead of placing limitations on the driver,
>>>>>>> we work through the limitations of the EDMAC hardware to find missed
>>>>>>> events and issue them.
>>>>>>>
>>>>>>> The sequence of events that require this are:
>>>>>>>
>>>>>>> For the scenario where MAX slots for an EDMA channel is 3:
>>>>>>>
>>>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> Null
>>>>>>>
>>>>>>> The above SG list will have to be DMA'd in 2 sets:
>>>>>>>
>>>>>>> (1) SG1 -> SG2 -> SG3 -> Null
>>>>>>> (2) SG4 -> SG5 -> SG6 -> Null
>>>>>>>
>>>>>>> After (1) is successfully transferred, the events from the MMC controller
>>>>>>> do not stop coming and are missed by the time we have set up the transfer
>>>>>>> for (2). So here, we catch the events missed as an error condition and
>>>>>>> issue them manually.
>>>>>>
>>>>>> Are you sure there won't be any effect of these missed events on the
>>>>>> peripheral side? For example, won't McASP get into an underrun condition
>>>>>> when it encounters a null PaRAM set? Even UART has to transmit to a
>>>>>
>>>>> But it will not encounter a null PaRAM set because McASP uses contiguous
>>>>> buffers for transfer which are not scattered across physical memory.
>>>>> This can be accomplished with an SG of size 1. For such SGs, this patch
>>>>> series leaves it linked to Dummy and does not link to the Null set. The
>>>>> Null set is only used for SG lists that are > MAX_NR_SG in size such as
>>>>> those created for example by MMC and Crypto.
>>>>>
>>>>>> particular baud so I guess it cannot wait like the way MMC/SD can.
>>>>>
>>>>> Existing drivers have to wait anyway if they hit the MAX SG limit today. If
>>>>> they don't want to wait, they would have allocated a contiguous block of
>>>>> memory and DMA that in one stretch so they don't lose any events, and in
>>>>> such cases we are not linking to Null.
>>>>
>>>> As long as the DMA driver can advertise its MAX SG limit, peripherals can
>>>> always work around that by limiting the number of sync events they
>>>> generate so that none of the events get missed. With this
>>>> series, I am worried that the EDMA driver is advertising that it can handle
>>>> any length SG list while not making sure that no events are missed while
>>>> doing so. This will break the assumptions that driver writers make.
>>>
>>> This is already being done by some other DMA engine drivers ;). We can
>>> advertise more than we can handle at a time, that's the basis of this
>>> whole idea.
>>>
>>> I understand what you're saying but events are not something that have
>>> to be serviced immediately; they can be queued etc. and the actual
>>> transfer from the DMA controller can be delayed. As long as we don't
>>> miss the event we are fine, which my series takes care of.
>>>
>>> So far I have tested this series on following modules in various
>>> configurations and have seen no issues:
>>> - Crypto AES
>>> - MMC/SD
>>> - SPI (128x160 display)
>>
>> Notice how in each of these cases the peripheral is in control of when
>> data is driven out? Please test with McASP in a configuration where
>> codec drives the frame-sync/bit-clock or with UART under high baud rate.
> 
> McASP allocates a contiguous buffer. For this case there is always an SG
> of size 1 and this patch series doesn't affect it at all; there is no
> stalling. Further, the McASP audio driver is still awaiting conversion to
> use DMA engine so there's no way yet to test it.

Okay, looks like omap-serial does not use DMA either so you cannot use
that. Anyway, my point is beyond what the McASP driver does currently.
Once you expose the "handle any number of SGs" feature from the EDMA driver,
any client is free to use it. So we need to think ahead to see if we
break any use cases.

> 
>>>>>> Also, won't this lead to under-utilization of the peripheral bandwidth?
>>>>>> Meaning, MMC/SD is ready with data but cannot transfer because the DMA
>>>>>> is waiting to be set up.
>>>>>
>>>>> But it is waiting anyway even today. Currently based on MAX segs, MMC
>>>>> driver/subsystem will make SG list of size max_segs. Between these
>>>>> sessions of creating such smaller SG-lists, if for some reason the MMC
>>>>> controller is sending events, these will be lost anyway.
>>>>
>>>> But the MMC/SD driver knows how many events it should generate if it
>>>> knows the MAX SG limit. So there should not be any missed events in
>>>> current code. And I am not claiming that your solution is making matters
>>>> worse. But it's not making it much better either.
>>>
>>> This is not true for crypto, the events are not deasserted and crypto
>>> continues to send events. This is what led to the "don't trigger in
>>> Null" patch where I'm setting the missed flag to avoid recursion.
>>
>> Sorry, I am not sure which patch you are talking about here. Can you
>> provide the full subject line to avoid confusion?
> 
> Sure, "dma: edma: Detect null slot errors and handle them correctly".
> 
>>>>> This can be used only for buffers that are contiguous in memory, not
>>>>> those that are scattered across memory.
>>>>
>>>> I was hinting at using the linking facility of EDMA to achieve this.
>>>> Each PaRAM set has full 32-bit source and destination pointers so I see
>>>> no reason why non-contiguous case cannot be handled.
>>>>
>>>> Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
>>>> typically 4 times the number of channels. In this case we use one DMA
>>>> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
>>>> and P1 and P2 are the Link sets.
>>>>
>>>> Initial setup:
>>>>
>>>> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^      ^      ^
>>>>  |      |      |
>>>> P0  -> P1  -> P2  -> NULL
>>>>
>>>> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
>>>> completion. On each completion interrupt, hardware automatically copies
>>>> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
>>>> out, the state of hardware is:
>>>>
>>>> SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^       ^
>>>>  |       |
>>>> P0,1    P2  -> NULL
>>>>  |       ^
>>>>  |       |
>>>>  ---------
>>>>
>>>> SG1 transfer has already started by the time the TC interrupt is
>>>> handled. As you can see P1 is now redundant and ready to be recycled. So
>>>> in the interrupt handler, software recycles P1. Thus:
>>>>
>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^      ^      ^
>>>>  |      |      |
>>>> P0  -> P2  -> P1  -> NULL
>>>>
>>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>>> Hardware state:
>>>>
>>>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^       ^
>>>>  |       |
>>>> P0,2    P1  -> NULL
>>>>  |       ^
>>>>  |       |
>>>>  ---------
>>>>
>>>> As part of TC completion interrupt handling:
>>>>
>>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>  ^      ^      ^
>>>>  |      |      |
>>>> P0  -> P1  -> P2  -> NULL
>>>>
>>>> This goes on until the SG list is exhausted. If you use more PaRAM sets,
>>>> interrupt handler gets more time to recycle the PaRAM set. At no point
>>>> we touch P0 as it is always under active transfer. Thus the peripheral
>>>> is always kept busy.
>>>>
>>>> Do you see any reason why such a mechanism cannot be implemented?
>>>
>>> This is possible and looks like another way to do it, but there are 2
>>> problems I can see with it.
>>>
>>> 1. It's inefficient because of too many interrupts:
>>>
>>> Imagine a case where we have an SG list of size 30 and MAX_NR_SG is
>>> 10. This method will always trigger 30 interrupts, whereas with my
>>> patch series, you'd get only 3 interrupts. If you increase MAX_NR_SG,
>>> you'd get even fewer interrupts.
>>
>> Yes, but you are seeing only one side of inefficiency. In your design
>> DMA *always* stalls waiting for CPU to intervene. The whole point to DMA
>> is to keep it going while CPU does bookkeeping in background. This is
>> simply not going to scale with fast peripherals.
> 
> Agreed. So far though, I've no way to reproduce a fast peripheral that
> scatters data across physical memory and suffers from any stall.
> 
>> Besides, missed events are error conditions as far as EDMA and the
>> peripheral is considered. You are handling error interrupt to support a
>> successful transaction. Think about why EDMA considers missed events as
>> error condition.
> 
> I agree with this, its not the best way to do it. I have been working on
> a different approach.
> 
> However, in support of the series:
> 1. It doesn't break any existing code
> 2. It works for all current DMA users (performance and correctness)
> 3. It removes the SG limitations on DMA users.

Right, all of this should be true even with the approach I am suggesting.

> So what you suggested would be more of a feature addition than a
> limitation of this series. It is at least better than what's being done
> now - forcing the limit to the total number of SGs, so it is a step in
> the right direction.

No, I do not see my approach as a feature addition to what you are
doing. The two are contrasting approaches. For example, you would not
need the manual (re)trigger in the CC error condition in what I am proposing.

> 
>>> 2. If the interrupt handler for some reason doesn't complete or get
>>> serviced in time, we will end up DMA'ing incorrect data as events
>>> wouldn't stop coming in even if the interrupt is not yet handled (in your
>>> example linked sets P1 or P2 would be old ones being repeated). Whereas
>>> with my method, we are not doing any DMA once we finish the current
>>> MAX_NR_SG set even if events continue to come.
>>
>> Where is repetition and possibility of wrong data being transferred? We
>> have a linear list of PaRAM sets - not a loop. You would link the end to
>> PaRAM set chain to dummy PaRAM set which BTW will not cause missed
>> events. The more number of PaRAM sets you add to the chain, the more
> 
> There would have to be a loop, how else would you ensure continuity and
> uninterrupted DMA?

Uninterrupted DMA comes because of PaRAM set recycling. In my diagrams
above, hardware is *always* using P0 for transfer while software always
updates the tail of PaRAM linked list.

> 
> Consider if you have 2 sets of linked sets:
> L1 is the first set of Linked sets and L2 is the second.

I think this is where there is confusion. I am using only one linked set
of PaRAM entries (P0->P1->P2->DUMMY). If you need more time to service
the interrupt before the DMA hits the dummy PaRAM you allocate more link
PaRAM sets for the channel (P0->P1->...Pn->DUMMY). At no point was I
suggesting having two sets of linked PaRAM sets. Why would you need
something like that?
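
To make the recycling scheme concrete, here is a minimal sketch that models
it in plain C. It is a simulation only, not EDMA driver code:
`struct param_set`, `simulate_recycling()`, `NR_LINK` and `LINK_DUMMY` are
hypothetical names, and it assumes the completion handler always runs before
the next link update happens.

```c
#include <assert.h>

#define NR_LINK     2     /* link sets P1..P2 (hypothetical); needs >= 2 */
#define LINK_DUMMY  (-1)  /* stands in for the dummy PaRAM set */

struct param_set {
	int sg;    /* SG entry this set describes */
	int link;  /* slot index of the next set, or LINK_DUMMY */
};

/*
 * Model of the recycling scheme: slot[0] is P0 (the DMA PaRAM set) and
 * slot[1..NR_LINK] are the link sets. The order in which SG entries are
 * transferred is written to order[]; the return value is how many were sent.
 * Assumption: the handler always runs before the next completion.
 */
int simulate_recycling(int nr_sg, int *order)
{
	struct param_set slot[NR_LINK + 1];
	int filled = nr_sg < NR_LINK + 1 ? nr_sg : NR_LINK + 1;
	int next_sg = filled;   /* first SG entry not yet in any slot */
	int tail = filled - 1;  /* slot currently at the end of the chain */
	int done = 0;
	int i;

	if (nr_sg <= 0)
		return 0;

	/* Initial setup: P0 -> P1 -> ... -> dummy. */
	for (i = 0; i < filled; i++) {
		slot[i].sg = i;
		slot[i].link = (i < filled - 1) ? i + 1 : LINK_DUMMY;
	}

	for (;;) {
		int consumed;

		order[done++] = slot[0].sg;   /* hardware transfers from P0 */

		consumed = slot[0].link;
		if (consumed == LINK_DUMMY)
			break;                /* chain exhausted, P0 hits dummy */
		slot[0] = slot[consumed];     /* hardware link update into P0 */

		/* Completion handler: recycle the set that was just copied. */
		if (next_sg < nr_sg) {
			/* the tail must still be a link set, not the copy in P0 */
			assert(tail != consumed);
			slot[consumed].sg = next_sg++;
			slot[consumed].link = LINK_DUMMY;
			slot[tail].link = consumed;
			tail = consumed;
		}
	}
	return done;
}
```

Calling `simulate_recycling(7, order)` for SG0..SG6 fills `order` with 0
through 6. In this model the chain is always terminated by the dummy link,
and the handler writes only to link sets, never to P0.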

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-02 18:15                   ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-02 18:15 UTC (permalink / raw)
  To: Sekhar Nori
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

Hi Sekhar,

Thanks for your detailed illustrations.

On 08/02/2013 08:26 AM, Sekhar Nori wrote:
[..]
>>>>>> This can be used only for buffers that are contiguous in memory, not
>>>>>> those that are scattered across memory.
>>>>>
>>>>> I was hinting at using the linking facility of EDMA to achieve this.
>>>>> Each PaRAM set has full 32-bit source and destination pointers so I see
>>>>> no reason why non-contiguous case cannot be handled.
>>>>>
>>>>> Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
>>>>> typically 4 times the number of channels. In this case we use one DMA
>>>>> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
>>>>> and P1 and P2 are the Link sets.
>>>>>
>>>>> Initial setup:
>>>>>
>>>>> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P1  -> P2  -> NULL
>>>>>
>>>>> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
>>>>> completion. On each completion interrupt, hardware automatically copies
>>>>> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
>>>>> out, the state of hardware is:
>>>>>
>>>>> SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^       ^
>>>>>  |       |
>>>>> P0,1    P2  -> NULL
>>>>>  |       ^
>>>>>  |       |
>>>>>  ---------
>>>>>
>>>>> SG1 transfer has already started by the time the TC interrupt is
>>>>> handled. As you can see P1 is now redundant and ready to be recycled. So
>>>>> in the interrupt handler, software recycles P1. Thus:
>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P2  -> P1  -> NULL
>>>>>
>>>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>>>> Hardware state:
>>>>>
>>>>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^       ^
>>>>>  |       |
>>>>> P0,2    P1  -> NULL
>>>>>  |       ^
>>>>>  |       |
>>>>>  ---------
>>>>>
>>>>> As part of TC completion interrupt handling:
>>>>>
>>>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P1  -> P2  -> NULL
>>>>>
>>>>> This goes on until the SG list is exhausted. If you use more PaRAM sets,
>>>>> interrupt handler gets more time to recycle the PaRAM set. At no point
>>>>> we touch P0 as it is always under active transfer. Thus the peripheral
>>>>> is always kept busy.
>>>>>
>>>>> Do you see any reason why such a mechanism cannot be implemented?
>>>>
>>>> This is possible and looks like another way to do it, but there are 2
>>>> problems I can see with it.
>>>>
>>>> 1. It's inefficient because of too many interrupts:
>>>>
>>>> Imagine a case where we have an SG list of size 30 and MAX_NR_SG is
>>>> 10. This method will always trigger 30 interrupts, whereas with my
>>>> patch series, you'd get only 3 interrupts. If you increase MAX_NR_SG,
>>>> you'd get even fewer interrupts.
>>>
>>> Yes, but you are seeing only one side of inefficiency. In your design
>>> DMA *always* stalls waiting for CPU to intervene. The whole point to DMA
>>> is to keep it going while CPU does bookkeeping in background. This is
>>> simply not going to scale with fast peripherals.
>>
>> Agreed. So far though, I've no way to reproduce a fast peripheral that
>> scatters data across physical memory and suffers from any stall.
>>
>>> Besides, missed events are error conditions as far as EDMA and the
>>> peripheral is considered. You are handling error interrupt to support a
>>> successful transaction. Think about why EDMA considers missed events as
>>> error condition.
>>
>> I agree with this, its not the best way to do it. I have been working on
>> a different approach.
>>
>> However, in support of the series:
>> 1. It doesn't break any existing code
>> 2. It works for all current DMA users (performance and correctness)
>> 3. It removes the SG limitations on DMA users.
> 
> Right, all of this should be true even with the approach I am suggesting.
> 
>> So what you suggested would be more of a feature addition than a
>> limitation of this series. It is at least better than what's being done
>> now - forcing the limit to the total number of SGs, so it is a step in
>> the right direction.
> 
> No, I do not see my approach as a feature addition to what you are
> doing. The two are contrasting approaches. For example, you would not
> need the manual (re)trigger in the CC error condition in what I am proposing.
> 
>>
>>>> 2. If the interrupt handler for some reason doesn't complete or get
>>>> serviced in time, we will end up DMA'ing incorrect data as events
>>>> wouldn't stop coming in even if the interrupt is not yet handled (in your
>>>> example linked sets P1 or P2 would be old ones being repeated). Whereas
>>>> with my method, we are not doing any DMA once we finish the current
>>>> MAX_NR_SG set even if events continue to come.
>>>
>>> Where is repetition and possibility of wrong data being transferred? We
>>> have a linear list of PaRAM sets - not a loop. You would link the end to
>>> PaRAM set chain to dummy PaRAM set which BTW will not cause missed
>>> events. The more number of PaRAM sets you add to the chain, the more
>>
>> There would have to be a loop, how else would you ensure continuity and
>> uninterrupted DMA?
> 
> Uninterrupted DMA comes because of PaRAM set recycling. In my diagrams
> above, hardware is *always* using P0 for transfer while software always
> updates the tail of PaRAM linked list.
> 
>>
>> Consider if you have 2 sets of linked sets:
>> L1 is the first set of Linked sets and L2 is the second.
> 
> I think this is where there is confusion. I am using only one linked set
> of PaRAM entries (P0->P1->P2->DUMMY). If you need more time to service
> the interrupt before the DMA hits the dummy PaRAM you allocate more link
> PaRAM sets for the channel (P0->P1->...Pn->DUMMY). At no point was I
> suggesting having two sets of linked PaRAM sets. Why would you need
> something like that?
> 

I think we are talking about the same thing. Let's for now discuss
having just 1 linked set to avoid confusion, that's fine.

I think where our understanding differs is that the dummy link comes
into the picture only when we are transferring the *last* SG.
For all others there is a cyclic link between P1 and P2. Would you agree?

Even in your diagrams you are actually showing such a cyclic link:


>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P2  -> P1  -> NULL

Comparing this..

>>>>>
>>>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>>>> Hardware state:
>>>>>
>>>>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^       ^
>>>>>  |       |
>>>>> P0,2    P1  -> NULL
>>>>>  |       ^
>>>>>  |       |
>>>>>  ---------
>>>>>
>>>>> As part of TC completion interrupt handling:
>>>>>
>>>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P1  -> P2  -> NULL

.. with this. Notice that P2 -> P1 became P1 -> P2

The next logical diagram would look like:

>>>>>
>>>>> Now, on next interrupt, P1 gets copied and thus can get recycled.
>>>>> Hardware state:
>>>>>
>>>>> SG3  -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^       ^
>>>>>  |       |
>>>>> P0,1    P2  -> NULL
>>>>>  |       ^
>>>>>  |       |
>>>>>  ---------
>>>>>
>>>>> As part of TC completion interrupt handling:
>>>>>
>>>>> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P2  -> P1  -> NULL


"P1 gets copied" happens only because of the cyclic link from P2 to P1,
it wouldn't have happened if P2 was linked to Dummy as you described.

Now coming to 2 linked sets vs 1: I meant the same thing, namely that to
give the interrupt handler more time, we could have something like:

>>>>> As part of TC completion interrupt handling:
>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> NULL
>>>>>  ^      ^             ^
>>>>>  |      |             |
>>>>> P0  -> P1  -> P2  -> P3  -> P4  ->  Null

So what I was describing as 2 sets of linked sets is P1 and P2 being 1
set, and P3 and P4 being another set. We would then recycle a complete
set at the same time. That way the interrupt handler could do more at once
and get more time to recycle. So we would set up TC interrupts only for
P2 and P4 in the above diagrams.
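
As a rough sketch of the interrupt-count trade-off (hypothetical helper
names, plain C arithmetic rather than driver code): if only the last PaRAM
set of each group raises a TC interrupt, the number of interrupts is the SG
count divided by the group size, rounded up.

```c
#include <assert.h>

/* One completion interrupt per SG entry (TCINTEN set on every PaRAM set). */
unsigned int irqs_per_sg(unsigned int nr_sg)
{
	return nr_sg;
}

/* One completion interrupt per group of `group` PaRAM sets. */
unsigned int irqs_per_group(unsigned int nr_sg, unsigned int group)
{
	assert(group > 0);
	return (nr_sg + group - 1) / group;  /* round up for a partial group */
}
```

For the 30-entry SG list quoted earlier, `irqs_per_sg(30)` is 30, while
`irqs_per_group(30, 10)` is 3 and `irqs_per_group(30, 2)` is 15.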

Thanks,

-Joel


^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-02 18:15                   ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-02 18:15 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sekhar,

Thanks for your detailed illustrations.

On 08/02/2013 08:26 AM, Sekhar Nori wrote:
[..]
>>>>>> This can be used only for buffers that are contiguous in memory, not
>>>>>> those that are scattered across memory.
>>>>>
>>>>> I was hinting at using the linking facility of EDMA to achieve this.
>>>>> Each PaRAM set has full 32-bit source and destination pointers so I see
>>>>> no reason why non-contiguous case cannot be handled.
>>>>>
>>>>> Lets say you need to transfer SG[0..6] on channel C. Now, PaRAM sets are
>>>>> typically 4 times the number of channels. In this case we use one DMA
>>>>> PaRAM set and two Link PaRAM sets per channel. P0 is the DMA PaRAM set
>>>>> and P1 and P2 are the Link sets.
>>>>>
>>>>> Initial setup:
>>>>>
>>>>> SG0 -> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P1  -> P2  -> NULL
>>>>>
>>>>> P[0..2].TCINTEN = 1, so get an interrupt after each SG element
>>>>> completion. On each completion interrupt, hardware automatically copies
>>>>> the linked PaRAM set into the DMA PaRAM set so after SG0 is transferred
>>>>> out, the state of hardware is:
>>>>>
>>>>> SG1  -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^       ^
>>>>>  |       |
>>>>> P0,1    P2  -> NULL
>>>>>  |       ^
>>>>>  |       |
>>>>>  ---------
>>>>>
>>>>> SG1 transfer has already started by the time the TC interrupt is
>>>>> handled. As you can see P1 is now redundant and ready to be recycled. So
>>>>> in the interrupt handler, software recycles P1. Thus:
>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P2  -> P1  -> NULL
>>>>>
>>>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>>>> Hardware state:
>>>>>
>>>>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^       ^
>>>>>  |       |
>>>>> P0,2    P1  -> NULL
>>>>>  |       ^
>>>>>  |       |
>>>>>  ---------
>>>>>
>>>>> As part of TC completion interrupt handling:
>>>>>
>>>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P1  -> P2  -> NULL
>>>>>
>>>>> This goes on until the SG list is exhausted. If you use more PaRAM sets,
>>>>> interrupt handler gets more time to recycle the PaRAM set. At no point
>>>>> we touch P0 as it is always under active transfer. Thus the peripheral
>>>>> is always kept busy.
>>>>>
>>>>> Do you see any reason why such a mechanism cannot be implemented?
>>>>
>>>> This is possible and looks like another way to do it, but there are 2
>>>> problems I can see with it.
>>>>
>>>> 1. It's inefficient because of too many interrupts:
>>>>
>>>> Imagine a case where we have an SG list of size 30 and MAX_NR_SG is
>>>> 10. This method will always trigger 30 interrupts, whereas with my
>>>> patch series, you'd get only 3 interrupts. If you increase MAX_NR_SG,
>>>> you'd get even fewer interrupts.
>>>
>>> Yes, but you are seeing only one side of inefficiency. In your design
>>> DMA *always* stalls waiting for CPU to intervene. The whole point of DMA
>>> is to keep it going while CPU does bookkeeping in background. This is
>>> simply not going to scale with fast peripherals.
>>
>> Agreed. So far though, I've no way to reproduce a fast peripheral that
>> scatters data across physical memory and suffers from any stall.
>>
>>> Besides, missed events are error conditions as far as EDMA and the
>>> peripheral are concerned. You are handling an error interrupt to support a
>>> successful transaction. Think about why EDMA considers missed events an
>>> error condition.
>>
>> I agree with this, it's not the best way to do it. I have been working on
>> a different approach.
>>
>> However, in support of the series:
>> 1. It doesn't break any existing code
>> 2. It works for all current DMA users (performance and correctness)
>> 3. It removes the SG limitations on DMA users.
> 
> Right, all of this should be true even with the approach I am suggesting.
> 
>> So what you suggested would be more of a feature addition than a
>> limitation of this series. It is at least better than what's being done
>> now - forcing the limit to the total number of SGs, so it is a step in
>> the right direction.
> 
> No, I do not see my approach as a feature addition to what you are
> doing. They are two contrasting approaches. For example, you would not
> need the manual (re)trigger in CC error condition in what I am proposing.
> 
>>
>>>> 2. If the interrupt handler for some reason doesn't complete or get
>>>> serviced in time, we will end up DMA'ing incorrect data as events
>>>> wouldn't stop coming in even if the interrupt is not yet handled (in your
>>>> example linked sets P1 or P2 would be old ones being repeated). Whereas
>>>> with my method, we are not doing any DMA once we finish the current
>>>> MAX_NR_SG set even if events continue to come.
>>>
>>> Where is repetition and possibility of wrong data being transferred? We
>>> have a linear list of PaRAM sets - not a loop. You would link the end of the
>>> PaRAM set chain to a dummy PaRAM set which BTW will not cause missed
>>> events. The more number of PaRAM sets you add to the chain, the more
>>
>> There would have to be a loop, how else would you ensure continuity and
>> uninterrupted DMA?
> 
> Uninterrupted DMA comes because of PaRAM set recycling. In my diagrams
> above, hardware is *always* using P0 for transfer while software always
> updates the tail of PaRAM linked list.
> 
>>
>> Consider if you have 2 sets of linked sets:
>> L1 is the first set of Linked sets and L2 is the second.
> 
> I think this is where there is confusion. I am using only one linked set
> of PaRAM entries (P0->P1->P2->DUMMY). If you need more time to service
> the interrupt before the DMA hits the dummy PaRAM you allocate more link
> PaRAM sets for the channel (P0->P1->...Pn->DUMMY). At no point was I
> suggesting having two sets of linked PaRAM sets. Why would you need
> something like that?
> 

I think we are talking about the same thing. Let's for now discuss
having just 1 linked set to avoid confusion, that's fine.

I think where we differ in our understanding is that the dummy link
comes into the picture only when we are transferring the *last* SG.
For all others there is a cyclic link between P1 and P2. Would you agree?

Even in your diagrams you are actually showing such a cyclic link:


>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P2  -> P1  -> NULL

Comparing this..

>>>>>
>>>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>>>> Hardware state:
>>>>>
>>>>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^       ^
>>>>>  |       |
>>>>> P0,2    P1  -> NULL
>>>>>  |       ^
>>>>>  |       |
>>>>>  ---------
>>>>>
>>>>> As part of TC completion interrupt handling:
>>>>>
>>>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P1  -> P2  -> NULL

.. with this. Notice that P2 -> P1 became P1 -> P2

The next logical diagram would look like:

>>>>>
>>>>> Now, on next interrupt, P1 gets copied and thus can get recycled.
>>>>> Hardware state:
>>>>>
>>>>> SG3  -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^       ^
>>>>>  |       |
>>>>> P0,1    P2  -> NULL
>>>>>  |       ^
>>>>>  |       |
>>>>>  ---------
>>>>>
>>>>> As part of TC completion interrupt handling:
>>>>>
>>>>> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>  ^      ^      ^
>>>>>  |      |      |
>>>>> P0  -> P2  -> P1  -> NULL


"P1 gets copied" happens only because of the cyclic link from P2 to P1,
it wouldn't have happened if P2 was linked to Dummy as you described.
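
The recycling traced in the diagrams above can be modelled in a few lines
of user-space C. This is only a sketch under stated assumptions, not EDMA
driver code: struct slot, run_chain(), NSLOTS and LINK_END are hypothetical
names, the hardware link update is modelled as a plain struct copy into P0,
and the ISR is assumed to always run before the chain drains.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS   3     /* P0 plus two link slots, as in the diagrams */
#define LINK_END (-1)  /* stands in for the NULL/dummy link */

struct slot {
	int sg;    /* index of the SG entry this slot describes */
	int link;  /* slot to be copied into P0 next, or LINK_END */
};

/*
 * Walk an SG list of nsg entries through the slots. order[] receives the
 * SG index of each transfer; the return value is the number of transfers.
 */
static int run_chain(int nsg, int *order)
{
	struct slot p[NSLOTS];
	int next_sg = 0, done = 0, tail, i;

	assert(order != NULL);
	if (nsg <= 0)
		return 0;

	/* Initial setup: P0 -> P1 -> ... -> LINK_END, one SG entry each. */
	for (i = 0; i < NSLOTS && next_sg < nsg; i++) {
		p[i].sg = next_sg++;
		p[i].link = LINK_END;
		if (i > 0)
			p[i - 1].link = i;
	}
	tail = i - 1;

	for (;;) {
		int freed = p[0].link;

		order[done++] = p[0].sg;  /* the transfer always runs from P0 */
		if (freed == LINK_END)
			break;            /* chain exhausted */

		p[0] = p[freed];          /* modelled link update: copy into P0 */
		if (freed == tail)
			tail = 0;         /* the tail's contents now live in P0 */

		/* Modelled ISR: refill the freed slot and append it at the tail. */
		if (next_sg < nsg) {
			p[freed].sg = next_sg++;
			p[freed].link = LINK_END;
			p[tail].link = freed;
			tail = freed;
		}
	}
	return done;
}
```

With NSLOTS = 3 and 7 SG entries, the link between P1 and P2 flips
direction on every completion (P2 -> P1, then P1 -> P2, and so on) while
every entry is still transferred in order. The sketch does not model what
happens when the ISR is late, which is the case under debate here.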

Now, coming to 2 linked sets vs. 1: I meant the same thing, namely that to
give the interrupt handler more time, we could have something like:

>>>>> As part of TC completion interrupt handling:
>>>>>
>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> NULL
>>>>>  ^      ^             ^
>>>>>  |      |             |
>>>>> P0  -> P1  -> P2  -> P3  -> P4  ->  Null

So what I was describing as 2 sets of linked sets is P1 and P2 being 1
set, and P3 and P4 being another set. We would then recycle a complete
set at the same time. That way the interrupt handler could do more at once
and get more time to recycle. So we would set up TC interrupts only for
P2 and P4 in the above diagrams.
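
To make the grouping concrete, here is a small sketch of which link slots
would raise a TC interrupt and how many interrupts an SG list would cost.
The helper names slot_raises_irq() and tc_irq_count() are hypothetical, and
the sketch assumes that a final, partially filled group still raises one
interrupt.

```c
#include <assert.h>

/*
 * Nonzero if the link slot at 1-based position pos in the chain
 * (P1 = 1, P2 = 2, ...) is the last slot of its group and should
 * therefore have its TC interrupt enabled.
 */
static int slot_raises_irq(unsigned int pos, unsigned int group_size)
{
	assert(group_size > 0);
	return pos > 0 && pos % group_size == 0;
}

/*
 * Completion interrupts needed for nsg SG entries when one interrupt
 * is raised per group of group_size entries (ceiling division).
 */
static unsigned int tc_irq_count(unsigned int nsg, unsigned int group_size)
{
	assert(group_size > 0);
	return (nsg + group_size - 1) / group_size;
}
```

With a group size of 2, only P2 and P4 raise interrupts, as in the diagram
above. For the earlier example of 30 SG entries, one interrupt per entry
costs 30 interrupts while groups of 10 cost 3.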

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH 4/9] dma: edma: Find missed events and issue them
@ 2013-08-02 23:00                     ` Joel Fernandes
  0 siblings, 0 replies; 89+ messages in thread
From: Joel Fernandes @ 2013-08-02 23:00 UTC (permalink / raw)
  To: Sekhar Nori
  Cc: Tony Lindgren, Santosh Shilimkar, Sricharan R, Rajendra Nayak,
	Lokesh Vutla, Matt Porter, Grant Likely, Rob Herring, Vinod Koul,
	Dan Williams, Mark Brown, Benoit Cousson, Russell King,
	Arnd Bergmann, Olof Johansson, Balaji TK, Gururaja Hebbar,
	Chris Ball, Jason Kridner, Linux OMAP List,
	Linux ARM Kernel List, Linux DaVinci Kernel List,
	Linux Kernel Mailing List, Linux MMC List

Hi Sekhar,

Assuming you agree with my understanding of the approach you proposed,
I worked on some code to quickly try the different approach (ping-pong
between sets). Here is a hack patch:

https://github.com/joelagnel/linux-kernel/commits/dma/edma-no-sg-limits-interleaved

As I suspected, it also has problems with missing interrupts, coming back
to my other point about getting errors if the ISR doesn't get enough time
to set up for the next transfer. If you use < 5 MAX_NR slots you start
seeing EDMA errors.

For > 5 slots, I don't see errors, but there is stalling because of
missed interrupts.

I observe that for an SG-list of size 10, it takes at least 7 ms before
the interrupt handler (ISR) gets a chance to execute. This, I feel, is
quite long. What is your opinion about this?

Describing my approach here:

If MAX slots is 10 for example, we split it into 2 cyclically linked
sets of size 5 each. Interrupts are set up to trigger for every 5 PaRAM
set transfers. After the first 5 transfers, the ISR recycles them for the
next 5 entries in the SG-list. This happens in parallel with the transfer
of the second set of 5.
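
A rough user-space model of this ping-pong scheme is sketched below. It is
not the code in the hack patch: struct pingpong, load_set() and
pingpong_run() are hypothetical names, and the model assumes the ISR always
finishes refilling one set before the other set drains, so it does not
reproduce the errors or stalls described above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 10
#define SET_SIZE  (MAX_SLOTS / 2)

struct pingpong {
	int slot_sg[MAX_SLOTS]; /* SG index held by each slot, -1 if unused */
	int next_sg;            /* next SG entry not yet loaded into a slot */
	int nsg;                /* total number of SG entries */
};

/* Fill one set (0 or 1) with the next pending SG entries. */
static void load_set(struct pingpong *pp, int set)
{
	int i;

	for (i = 0; i < SET_SIZE; i++) {
		int slot = set * SET_SIZE + i;

		pp->slot_sg[slot] = (pp->next_sg < pp->nsg) ? pp->next_sg++ : -1;
	}
}

/*
 * Transfer nsg entries, alternating between the two sets. order[] gets the
 * SG indices in transfer order, *ntransferred the number of transfers.
 * Returns the number of interrupts taken (one per completed set).
 */
static int pingpong_run(int nsg, int *order, int *ntransferred)
{
	struct pingpong pp = { .next_sg = 0, .nsg = nsg };
	int set = 0, irqs = 0, done = 0, i;

	assert(nsg >= 0 && order != NULL && ntransferred != NULL);

	load_set(&pp, 0);
	load_set(&pp, 1);

	while (pp.slot_sg[set * SET_SIZE] != -1) {
		for (i = 0; i < SET_SIZE; i++) {
			int sg = pp.slot_sg[set * SET_SIZE + i];

			if (sg == -1)
				break;
			order[done++] = sg;
		}
		irqs++;              /* one interrupt per completed set */
		load_set(&pp, set);  /* ISR refills the finished set ...     */
		set ^= 1;            /* ... while the other set would be busy */
	}

	*ntransferred = done;
	return irqs;
}
```

For 12 SG entries the model takes 3 interrupts (5 + 5 + 2 transfers), and
for exactly 10 entries it takes 2. In the model the refill and the next
transfer are sequential; on hardware they would overlap, which is where the
ISR latency matters.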

Thanks,

-Joel

On 08/02/2013 01:15 PM, Joel Fernandes wrote:[..]
> Even in your diagrams you are actually showing such a cyclic link
>
>
>>>>>>
>>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^      ^      ^
>>>>>>  |      |      |
>>>>>> P0  -> P2  -> P1  -> NULL
>
> Comparing this..
>
>>>>>>
>>>>>> Now, on next interrupt, P2 gets copied and thus can get recycled.
>>>>>> Hardware state:
>>>>>>
>>>>>> SG2  -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^       ^
>>>>>>  |       |
>>>>>> P0,2    P1  -> NULL
>>>>>>  |       ^
>>>>>>  |       |
>>>>>>  ---------
>>>>>>
>>>>>> As part of TC completion interrupt handling:
>>>>>>
>>>>>> SG2 -> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^      ^      ^
>>>>>>  |      |      |
>>>>>> P0  -> P1  -> P2  -> NULL
>
> .. with this. Notice that P2 -> P1 became P1 -> P2
>
> The next logical diagram would look like:
>
>>>>>>
>>>>>> Now, on next interrupt, P1 gets copied and thus can get recycled.
>>>>>> Hardware state:
>>>>>>
>>>>>> SG3  -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^       ^
>>>>>>  |       |
>>>>>> P0,1    P2  -> NULL
>>>>>>  |       ^
>>>>>>  |       |
>>>>>>  ---------
>>>>>>
>>>>>> As part of TC completion interrupt handling:
>>>>>>
>>>>>> SG3 -> SG4 -> SG5 -> SG6 -> NULL
>>>>>>  ^      ^      ^
>>>>>>  |      |      |
>>>>>> P0  -> P2  -> P1  -> NULL
>
>
> "P1 gets copied" happens only because of the cyclic link from P2 to P1,
> it wouldn't have happened if P2 was linked to Dummy as you described.
>
> Now coming to 2 linked sets vs 1, I meant the same thing that to give
> interrupt handler more time, we could have something like:
>
>>>>>> As part of TC completion interrupt handling:
>>>>>>
>>>>>> SG1 -> SG2 -> SG3 -> SG4 -> SG5 -> NULL
>>>>>>  ^      ^             ^
>>>>>>  |      |             |
>>>>>> P0  -> P1  -> P2  -> P3  -> P4  ->  Null
>
> So what I was describing as 2 sets of linked sets is P1 and P2 being 1
> set, and P3 and P4 being another set. We would then recycle a complete
> set at the same time. That way interrupt handler could do more at once
> and get more time to recycle. So we would setup TC interrupts only for
> P2 and P4 in the above diagrams.
>
> Thanks,
>
> -Joel
>



^ permalink raw reply	[flat|nested] 89+ messages in thread

end of thread, other threads:[~2013-08-02 23:00 UTC | newest]

Thread overview: 89+ messages
2013-07-29 13:29 [PATCH 0/9] dma: edma: Support scatter-lists of any length Joel Fernandes
2013-07-29 13:29 ` Joel Fernandes
2013-07-29 13:29 ` Joel Fernandes
2013-07-29 13:29 ` [PATCH 1/9] dma: edma: Setup parameters to DMA MAX_NR_SG at a time Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29 ` [PATCH 2/9] dma: edma: Write out and handle MAX_NR_SG at a given time Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29 ` [PATCH 3/9] ARM: edma: Add function to manually trigger an EDMA channel Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-30  5:18   ` Sekhar Nori
2013-07-30  5:18     ` Sekhar Nori
2013-07-30  5:18     ` Sekhar Nori
2013-07-31  4:30     ` Joel Fernandes
2013-07-31  4:30       ` Joel Fernandes
2013-07-31  4:30       ` Joel Fernandes
2013-07-31  5:23       ` Sekhar Nori
2013-07-31  5:23         ` Sekhar Nori
2013-07-31  5:23         ` Sekhar Nori
     [not found]         ` <51F89F5E.2050605-l0cyMroinI0@public.gmane.org>
2013-07-31  5:34           ` Fernandes, Joel
2013-07-31  5:34             ` Fernandes, Joel
2013-07-29 13:29 ` [PATCH 4/9] dma: edma: Find missed events and issue them Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-30  7:05   ` Sekhar Nori
2013-07-30  7:05     ` Sekhar Nori
2013-07-30  7:05     ` Sekhar Nori
2013-07-31  4:49     ` Joel Fernandes
2013-07-31  4:49       ` Joel Fernandes
2013-07-31  4:49       ` Joel Fernandes
2013-07-31  9:18       ` Sekhar Nori
2013-07-31  9:18         ` Sekhar Nori
2013-07-31  9:18         ` Sekhar Nori
2013-08-01  2:27         ` Joel Fernandes
2013-08-01  2:27           ` Joel Fernandes
2013-08-01  2:27           ` Joel Fernandes
2013-08-01  3:43           ` Joel Fernandes
2013-08-01  3:43             ` Joel Fernandes
2013-08-01  3:43             ` Joel Fernandes
2013-08-01  4:39           ` Joel Fernandes
2013-08-01  4:39             ` Joel Fernandes
2013-08-01  4:39             ` Joel Fernandes
2013-08-01  6:13           ` Sekhar Nori
2013-08-01  6:13             ` Sekhar Nori
2013-08-01  6:13             ` Sekhar Nori
2013-08-01 20:28             ` Joel Fernandes
2013-08-01 20:28               ` Joel Fernandes
2013-08-01 20:28               ` Joel Fernandes
2013-08-01 20:48               ` Joel Fernandes
2013-08-01 20:48                 ` Joel Fernandes
2013-08-01 20:48                 ` Joel Fernandes
2013-08-02 13:26               ` Sekhar Nori
2013-08-02 13:26                 ` Sekhar Nori
2013-08-02 13:26                 ` Sekhar Nori
2013-08-02 18:15                 ` Joel Fernandes
2013-08-02 18:15                   ` Joel Fernandes
2013-08-02 18:15                   ` Joel Fernandes
2013-08-02 23:00                   ` Joel Fernandes
2013-08-02 23:00                     ` Joel Fernandes
2013-08-02 23:00                     ` Joel Fernandes
2013-07-29 13:29 ` [PATCH 5/9] dma: edma: Leave linked to Null slot instead of DUMMY slot Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29 ` [PATCH 6/9] dma: edma: Detect null slot errors and handle them correctly Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29 ` [PATCH 7/9] ARM: edma: Don't clear EMR of channel in edma_stop Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-30  8:29   ` Sekhar Nori
2013-07-30  8:29     ` Sekhar Nori
2013-07-30  8:29     ` Sekhar Nori
2013-07-31  5:05     ` Joel Fernandes
2013-07-31  5:05       ` Joel Fernandes
2013-07-31  5:05       ` Joel Fernandes
2013-07-31  9:35       ` Sekhar Nori
2013-07-31  9:35         ` Sekhar Nori
2013-07-31  9:35         ` Sekhar Nori
2013-08-01  1:59         ` Joel Fernandes
2013-08-01  1:59           ` Joel Fernandes
2013-08-01  1:59           ` Joel Fernandes
2013-07-29 13:29 ` [PATCH 8/9] dma: edma: Link to dummy slot only for last SG list split Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29 ` [PATCH 9/9] dma: edma: remove limits on number of slots Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes
2013-07-29 13:29   ` Joel Fernandes