* [PATCH 0/7] dmaengine: omap-dma: Linked List transfer for slave_sg
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
To: linux-arm-kernel

Hi,

The following series, with the final patch, will add support for sDMA Linked
List transfers. Linked List is supported by sDMA in OMAP3630+ (OMAP4/5, dra7
family). If the descriptor load feature is present we can create the
descriptors for each SG beforehand and let sDMA walk through them. This way
the number of sDMA interrupts the kernel needs to handle will drop
dramatically.

I have gathered some numbers to show the difference.

Booting up the board with the filesystem on SD card, for example:
# cat /proc/interrupts | grep dma
W/o LinkedList support:
 27:       4436          0     WUGEN  13  Level     omap-dma-engine
Same board/filesystem with this patch:
 27:       1027          0     WUGEN  13  Level     omap-dma-engine

Or copying files from SD card to eMMC:
# du -h /usr
2.1G    /usr/
# find /usr/ -type f | wc -l
232001
# cp -r /usr/* /mnt/emmc/tmp/
W/o LinkedList we see ~761069 DMA interrupts.
With LinkedList support it is down to ~269314 DMA interrupts.

With the decreased DMA interrupt number the CPU load is dropping
significantly as well.
The series depends on the interleaved transfer support patch I sent a couple
of days ago: https://lkml.org/lkml/2016/7/12/216

Regards,
Peter

---
Peter Ujfalusi (7):
  dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list
  dmaengine: omap-dma: Complete the cookie first on transfer completion
  dmaengine: omap-dma: Simplify omap_dma_callback
  dmaengine: omap-dma: Dynamically allocate memory for lch_map
  dmaengine: omap-dma: Add more debug information when freeing channel
  dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop
  dmaengine: omap-dma: Support for LinkedList transfer of slave_sg

 drivers/dma/omap-dma.c | 234 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 207 insertions(+), 27 deletions(-)

--
2.9.1

^ permalink raw reply	[flat|nested] 24+ messages in thread
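The interrupt reduction described in the cover letter can be modelled in a few
lines of user-space C. This is only a sketch of the idea, not the driver's
code: the struct and function names below are hypothetical, and the sDMA
descriptor format is reduced to a bare linked list. In per-SG mode the CPU
takes one interrupt per segment to reprogram the channel; in linked-list mode
the hardware walks the chain itself and raises a single end-of-chain interrupt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one scatter-gather segment (not the real omap_sg). */
struct sg_desc {
	unsigned int frames;
	struct sg_desc *next;	/* linked-list mode: hardware follows this */
};

/* Per-SG mode: one completion interrupt per segment, CPU reprograms each time. */
static unsigned int run_per_sg(const struct sg_desc *sg)
{
	unsigned int irqs = 0;

	for (; sg; sg = sg->next)
		irqs++;		/* interrupt fires, CPU sets up the next SG */
	return irqs;
}

/* Linked-list mode: the engine loads each descriptor itself; one interrupt. */
static unsigned int run_linked_list(const struct sg_desc *sg)
{
	if (!sg)
		return 0;
	while (sg->next)
		sg = sg->next;	/* "descriptor load" with no CPU involvement */
	return 1;		/* single end-of-chain interrupt */
}
```

For a 3-segment list, `run_per_sg()` models three interrupts where
`run_linked_list()` models one, which is the ratio the cover letter's
`/proc/interrupts` numbers illustrate at scale.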
* [PATCH 1/7] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
To: linux-arm-kernel

We can drop the (sg)idx parameter of the omap_dma_start_sg() function and
increment the sgidx inside the same function.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 2e0d49bcfd8a..7d56cd88c9a5 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -365,10 +365,9 @@ static void omap_dma_stop(struct omap_chan *c)
 	c->running = false;
 }

-static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d,
-			      unsigned idx)
+static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d)
 {
-	struct omap_sg *sg = d->sg + idx;
+	struct omap_sg *sg = d->sg + c->sgidx;
 	unsigned cxsa, cxei, cxfi;

 	if (d->dir == DMA_DEV_TO_MEM || d->dir == DMA_MEM_TO_MEM) {
@@ -388,6 +387,7 @@ static void omap_dma_start_sg(struct omap_chan *c, struct omap_desc *d,
 	omap_dma_chan_write(c, CFN, sg->fn);

 	omap_dma_start(c, d);
+	c->sgidx++;
 }

 static void omap_dma_start_desc(struct omap_chan *c)
@@ -433,7 +433,7 @@ static void omap_dma_start_desc(struct omap_chan *c)
 	omap_dma_chan_write(c, CSDP, d->csdp);
 	omap_dma_chan_write(c, CLNK_CTRL, d->clnk_ctrl);

-	omap_dma_start_sg(c, d, 0);
+	omap_dma_start_sg(c, d);
 }

 static void omap_dma_callback(int ch, u16 status, void *data)
@@ -446,8 +446,8 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 	d = c->desc;
 	if (d) {
 		if (!c->cyclic) {
-			if (++c->sgidx < d->sglen) {
-				omap_dma_start_sg(c, d, c->sgidx);
+			if (c->sgidx < d->sglen) {
+				omap_dma_start_sg(c, d);
 			} else {
 				omap_dma_start_desc(c);
 				vchan_cookie_complete(&d->vd);
--
2.9.1
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
To: linux-arm-kernel

Before looking for the next descriptor to start, complete the just finished
cookie.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 7d56cd88c9a5..f7b0b0c668fb 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -449,8 +449,8 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 			if (c->sgidx < d->sglen) {
 				omap_dma_start_sg(c, d);
 			} else {
-				omap_dma_start_desc(c);
 				vchan_cookie_complete(&d->vd);
+				omap_dma_start_desc(c);
 			}
 		} else {
 			vchan_cyclic_callback(&d->vd);
--
2.9.1
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Russell King - ARM Linux @ 2016-07-18 10:34 UTC (permalink / raw)
To: linux-arm-kernel

On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
> Before looking for the next descriptor to start, complete the just finished
> cookie.

This change will reduce performance as we no longer have an overlap between
the next request starting to be dealt with in the hardware vs the previous
request being completed. Your commit log doesn't say _why_ the change is
being made, it merely tells us what the patch is doing, which we can see
already.

Please describe changes a little better.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Peter Ujfalusi @ 2016-07-19 12:35 UTC (permalink / raw)
To: linux-arm-kernel

On 07/18/16 13:34, Russell King - ARM Linux wrote:
> On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
>> Before looking for the next descriptor to start, complete the just finished
>> cookie.
>
> This change will reduce performance as we no longer have an overlap
> between the next request starting to be dealt with in the hardware
> vs the previous request being completed.

vchan_cookie_complete() will only mark the cookie completed, add the vd to
the desc_completed list (it was deleted from the desc_issued list when it
was started by omap_dma_start_desc) and schedule the tasklet to deal with
the real completion later.
Marking the just finished descriptor/cookie done first, then looking for
possible descriptors in the queue to start, feels like a better sequence.

After a quick grep in the kernel source: only omap-dma.c was starting the
next transfer before marking the current completed descriptor/cookie done.

> Your commit log doesn't
> say _why_ the change is being made, it merely tells us what the
> patch is doing, which we can see already.
>
> Please describe changes a little better.

--
Péter
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Russell King - ARM Linux @ 2016-07-19 16:20 UTC (permalink / raw)
To: linux-arm-kernel

On Tue, Jul 19, 2016 at 03:35:18PM +0300, Peter Ujfalusi wrote:
> On 07/18/16 13:34, Russell King - ARM Linux wrote:
> > On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
> >> Before looking for the next descriptor to start, complete the just finished
> >> cookie.
> >
> > This change will reduce performance as we no longer have an overlap
> > between the next request starting to be dealt with in the hardware
> > vs the previous request being completed.
>
> vchan_cookie_complete() will only mark the cookie completed, adds the vd to
> the desc_completed list (it was deleted from desc_issued list when it was
> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
> completion later.
> Marking the just finished descriptor/cookie done first then looking for
> possible descriptors in the queue to start feels like a better sequence.

I deliberately arranged the code in the original order so that the next
transfer was started on the hardware with the least amount of work by the
CPU. Yes, there may not be much in it, but everything you mention above
adds to the number of CPU cycles that need to be executed before the next
transfer can be started.

More CPU cycles wasted means higher latency between transfers, which means
lower performance.

> After a quick grep in the kernel source: only omap-dma.c was starting the
> next transfer before marking the current completed descriptor/cookie done.

Right, because I've thought about the issue, having been the author of both
virt-dma and omap-dma.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Peter Ujfalusi @ 2016-07-19 19:23 UTC (permalink / raw)
To: linux-arm-kernel

On 07/19/2016 07:20 PM, Russell King - ARM Linux wrote:
>> vchan_cookie_complete() will only mark the cookie completed, adds the vd to
>> the desc_completed list (it was deleted from desc_issued list when it was
>> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
>> completion later.
>> Marking the just finished descriptor/cookie done first then looking for
>> possible descriptors in the queue to start feels like a better sequence.
>
> I deliberately arranged the code in the original order so that the next
> transfer was started on the hardware with the least amount of work by
> the CPU. Yes, there may not be much in it, but everything you mention
> above adds to the number of CPU cycles that need to be executed before
> the next transfer can be started.
>
> More CPU cycles wasted means higher latency between transfers, which
> means lower performance.

OK. I will drop this patch in v2.

>> After a quick grep in the kernel source: only omap-dma.c was starting the
>> next transfer before marking the current completed descriptor/cookie done.
>
> Right, because I've thought about the issue, having been the author of
> both virt-dma and omap-dma.

--
Péter
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Vinod Koul @ 2016-07-24 7:39 UTC (permalink / raw)
To: linux-arm-kernel

On Tue, Jul 19, 2016 at 05:20:04PM +0100, Russell King - ARM Linux wrote:
> On Tue, Jul 19, 2016 at 03:35:18PM +0300, Peter Ujfalusi wrote:
> > On 07/18/16 13:34, Russell King - ARM Linux wrote:
> > > On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
> > >> Before looking for the next descriptor to start, complete the just finished
> > >> cookie.
> > >
> > > This change will reduce performance as we no longer have an overlap
> > > between the next request starting to be dealt with in the hardware
> > > vs the previous request being completed.
> >
> > vchan_cookie_complete() will only mark the cookie completed, adds the vd to
> > the desc_completed list (it was deleted from desc_issued list when it was
> > started by omap_dma_start_desc) and schedule the tasklet to deal with the real
> > completion later.
> > Marking the just finished descriptor/cookie done first then looking for
> > possible descriptors in the queue to start feels like a better sequence.
>
> I deliberately arranged the code in the original order so that the next
> transfer was started on the hardware with the least amount of work by
> the CPU. Yes, there may not be much in it, but everything you mention
> above adds to the number of CPU cycles that need to be executed before
> the next transfer can be started.

Yes, that is really the right thing to do. Ideally people would want to
minimize the delay and submit the next one as soon as possible, but people
have been lazy on this and a few other aspects :)

--
~Vinod
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Robert Jarzmik @ 2016-07-20 6:26 UTC (permalink / raw)
To: linux-arm-kernel

Peter Ujfalusi <peter.ujfalusi@ti.com> writes:

> On 07/18/16 13:34, Russell King - ARM Linux wrote:
>> On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
>>> Before looking for the next descriptor to start, complete the just finished
>>> cookie.
>>
>> This change will reduce performance as we no longer have an overlap
>> between the next request starting to be dealt with in the hardware
>> vs the previous request being completed.
>
> vchan_cookie_complete() will only mark the cookie completed, adds the vd to
> the desc_completed list (it was deleted from desc_issued list when it was
> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
> completion later.
> Marking the just finished descriptor/cookie done first then looking for
> possible descriptors in the queue to start feels like a better sequence.
>
> After a quick grep in the kernel source: only omap-dma.c was starting the next
> transfer before marking the current completed descriptor/cookie done.

Euh actually I think it's done in other drivers as well:
 - Documentation/dmaengine/pxa_dma.txt (chapter "Transfers hot-chaining")
 - drivers/dma/pxa_dma.c
   => look for pxad_try_hotchain() and its impact on pxad_chan_handler(),
      which will mark the completion while the next transfer is already
      pumped by the hardware.

Speaking of which, from a purely design point of view, as long as you think
beforehand what your sequence is, ie. what is the sequence of your link
chaining, completion handling, etc ..., both marking before or after the
next tx start should be fine IMHO.

So in your quest for the "better sequence" the pxa driver's one might give
you some perspective :)

Cheers.

--
Robert
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Peter Ujfalusi @ 2016-07-21 9:33 UTC (permalink / raw)
To: linux-arm-kernel

On 07/20/16 09:26, Robert Jarzmik wrote:
> Peter Ujfalusi <peter.ujfalusi@ti.com> writes:
>
>> On 07/18/16 13:34, Russell King - ARM Linux wrote:
>>> On Thu, Jul 14, 2016 at 03:42:37PM +0300, Peter Ujfalusi wrote:
>>>> Before looking for the next descriptor to start, complete the just finished
>>>> cookie.
>>>
>>> This change will reduce performance as we no longer have an overlap
>>> between the next request starting to be dealt with in the hardware
>>> vs the previous request being completed.
>>
>> vchan_cookie_complete() will only mark the cookie completed, adds the vd to
>> the desc_completed list (it was deleted from desc_issued list when it was
>> started by omap_dma_start_desc) and schedule the tasklet to deal with the real
>> completion later.
>> Marking the just finished descriptor/cookie done first then looking for
>> possible descriptors in the queue to start feels like a better sequence.
>>
>> After a quick grep in the kernel source: only omap-dma.c was starting the next
>> transfer before marking the current completed descriptor/cookie done.
>
> Euh actually I think it's done in other drivers as well :
> - Documentation/dmaengine/pxa_dma.txt (chapter "Transfers hot-chaining)
> - drivers/dma/pxa_dma.c
> => look for pxad_try_hotchain() and it's impact on pxad_chan_handler() which
> will mark the completion while the next transfer is already pumped by the
> hardware.

The 'hot-chaining' is a bit different then what omap-dma is doing. If I got
it right: when the DMA is running and a new request comes, the driver will
append the new transfer to the list used by the HW. This way there will be
no stop and restart needed, the DMA is running w/o interruption.

> Speaking of which, from a purely design point of view, as long as you think
> beforehand what is your sequence, ie. what is the sequence of your link
> chaining, completion handling, etc ..., both marking before or after next tx
> start should be fine IMHO.

Yes, it might be a bit better from a performance point of view if we first
start the pending descriptor (if there is one) and then do the
vchan_cookie_complete(). On the other hand, if we care more about latency
and accuracy we should complete the transfer first and then look for pending
descriptors. But since virt_dma is using a tasklet for the real completion,
the latency is always going to depend on when the tasklet is given the
chance to execute.

> So in your quest for the "better sequence" the pxa driver's one might give you
> some perspective :)

I did think about similar 'hot-chaining' for TI's eDMA and sDMA. Especially
eDMA would benefit from it, but so far I see too many race conditions to
overcome to be brave enough to write something to test it. And I don't have
time for it atm ;)

--
Péter
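The hot-chaining idea being discussed, appending to a running channel's
hardware list instead of stopping and restarting it, can be sketched as a
small software model. This is a conceptual sketch only, not pxa_dma.c's
actual code: all struct and function names here are hypothetical, and the
hardware is reduced to a counter of cold starts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hardware descriptor: just the chain link matters here. */
struct hw_desc {
	struct hw_desc *next;
};

struct chan_model {
	int running;		/* channel currently executing a chain */
	struct hw_desc *tail;	/* last descriptor the hardware will reach */
	int restarts;		/* stop + reprogram + start cycles taken */
};

/* Submit a new transfer: hot-chain onto a running channel when possible. */
static void submit(struct chan_model *c, struct hw_desc *d)
{
	d->next = NULL;
	if (c->running && c->tail) {
		/* hot-chain: the hardware picks the descriptor up in flight */
		c->tail->next = d;
	} else {
		/* cold start: full channel reprogram is unavoidable */
		c->restarts++;
		c->running = 1;
	}
	c->tail = d;
}
```

Submitting three transfers back to back against this model costs a single
cold start; without hot-chaining each submission would pay one. The races
mentioned above live in the window between the hardware consuming the old
tail and the CPU updating `tail->next`, which this single-threaded model
deliberately ignores.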
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Peter Ujfalusi @ 2016-07-21 9:35 UTC (permalink / raw)
To: linux-arm-kernel

On 07/21/16 12:33, Peter Ujfalusi wrote:
> On 07/20/16 09:26, Robert Jarzmik wrote:
>> Euh actually I think it's done in other drivers as well :
>> - Documentation/dmaengine/pxa_dma.txt (chapter "Transfers hot-chaining)
>> - drivers/dma/pxa_dma.c
>> => look for pxad_try_hotchain() and it's impact on pxad_chan_handler() which
>> will mark the completion while the next transfer is already pumped by the
>> hardware.
>
> The 'hot-chaining' is a bit different then what omap-dma is doing.

s/then/than

> If I got it right: when the DMA is running and a new request comes, the
> driver will append the new transfer to the list used by the HW. This way
> there will be no stop and restart needed, the DMA is running w/o
> interruption.
>
>> Speaking of which, from a purely design point of view, as long as you think
>> beforehand what is your sequence, ie. what is the sequence of your link
>> chaining, completion handling, etc ..., both marking before or after next tx
>> start should be fine IMHO.
>
> Yes, it might be a bit better from a performance point of view if we first
> start the pending descriptor (if there is one) and then do the
> vchan_cookie_complete(). On the other hand, if we care more about latency
> and accuracy we should complete the transfer first and then look for
> pending descriptors. But since virt_dma is using a tasklet for the real
> completion, the latency is always going to depend on when the tasklet is
> given the chance to execute.
>
>> So in your quest for the "better sequence" the pxa driver's one might give you
>> some perspective :)
>
> I did think about similar 'hot-chaining' for TI's eDMA and sDMA. Especially
> eDMA would benefit from it, but so far I see too many race conditions to
> overcome to be brave enough to write something to test it. And I don't have
> time for it atm ;)

--
Péter
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Russell King - ARM Linux @ 2016-07-21 9:47 UTC (permalink / raw)
To: linux-arm-kernel

On Thu, Jul 21, 2016 at 12:33:12PM +0300, Peter Ujfalusi wrote:
> On 07/20/16 09:26, Robert Jarzmik wrote:
> > Speaking of which, from a purely design point of view, as long as you think
> > beforehand what is your sequence, ie. what is the sequence of your link
> > chaining, completion handling, etc ..., both marking before or after next tx
> > start should be fine IMHO.
>
> Yes, it might be a bit better from performance point of view if we first start
> the pending descriptor (if there is one) then do the vchan_cookie_complete().
> On the other hand if we care more about latency and accuracy we should
> complete the transfer first then look for pending descriptors. But since
> virt_dma is using a tasklet for the real completion, the latency is always
> going to be when the tasklet is given the chance to execute.

I think this shows a slight misunderstanding of the DMA engine API. The
DMA completion is defined by the API to always happen in tasklet context,
which is why the virt-dma stuff does it that way - and all other DMA
engine drivers. It's one of the fundamentals of the API.

As it happens in tasklet context, tasklets can be scheduled to run with
variable latency, so any use of the DMA engine API which expects a
predictable latency around the completion handling is going to be
unreliable.

Remember also that with circular buffers, there's no guarantee of getting
period-based completion callbacks - several periods can complete and you
are only guaranteed to get one completion callback.

So, the idea that completion callbacks can have anything to do with low
latency or accuracy is totally incorrect.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
* [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion
From: Peter Ujfalusi @ 2016-07-22 11:00 UTC (permalink / raw)
To: linux-arm-kernel

On 07/21/16 12:47, Russell King - ARM Linux wrote:
> I think this shows a slight misunderstanding of the DMA engine API. The
> DMA completion is defined by the API to always happen in tasklet context,
> which is why the virt-dma stuff does it that way - and all other DMA
> engine drivers. It's one of the fundamentals of the API.
>
> As it happens in tasklet context, tasklets can be scheduled to run with
> variable latency, so any use of the DMA engine API which expects a
> predictable latency around the completion handling is going to be
> unreliable.
>
> Remember also that with circular buffers, there's no guarantee of getting
> period-based completion callbacks - several periods can complete and you
> are only guaranteed to get one completion callback.
>
> So, the idea that completion callbacks can have anything to do with low
> latency or accuracy is totally incorrect.

Thanks for refreshing my memory, you are absolutely right.

--
Péter
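The deferred-completion behaviour described in the exchange above can be
shown with a toy model. This is not virt-dma's code (that lives in
drivers/dma/virt-dma.c); the names below are hypothetical, and the tasklet
is reduced to a pending flag plus a deferred function. The point it
demonstrates: the IRQ path only marks work done and schedules the tasklet,
and because scheduling an already-pending tasklet is a no-op, several
completions may collapse into a single client callback.

```c
#include <assert.h>

static int completed_cookies;	/* work the IRQ handler has marked done */
static int tasklet_pending;	/* models TASKLET_STATE_SCHED */
static int callbacks_run;	/* client callbacks actually delivered */

/* IRQ path: mark the cookie complete and defer the rest to the tasklet. */
static void irq_handler(void)
{
	completed_cookies++;	/* like vchan_cookie_complete(): mark done */
	tasklet_pending = 1;	/* re-scheduling while pending is a no-op */
}

/* Tasklet: runs later, with variable latency, once per scheduling. */
static void tasklet_run(void)
{
	if (!tasklet_pending)
		return;
	tasklet_pending = 0;
	callbacks_run++;	/* one callback, however many cookies queued */
}
```

Three interrupts followed by one tasklet run leave three completed cookies
but only one callback, which is why callbacks cannot be used for
low-latency or per-period accounting.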
* [PATCH 3/7] dmaengine: omap-dma: Simplify omap_dma_callback
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
To: linux-arm-kernel

Flatten the indentation level of the function, which gives a better view of
the cases we handle here.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index f7b0b0c668fb..6d134252ed61 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -445,15 +445,13 @@ static void omap_dma_callback(int ch, u16 status, void *data)
 	spin_lock_irqsave(&c->vc.lock, flags);
 	d = c->desc;
 	if (d) {
-		if (!c->cyclic) {
-			if (c->sgidx < d->sglen) {
-				omap_dma_start_sg(c, d);
-			} else {
-				vchan_cookie_complete(&d->vd);
-				omap_dma_start_desc(c);
-			}
-		} else {
+		if (c->cyclic) {
 			vchan_cyclic_callback(&d->vd);
+		} else if (c->sgidx == d->sglen) {
+			vchan_cookie_complete(&d->vd);
+			omap_dma_start_desc(c);
+		} else {
+			omap_dma_start_sg(c, d);
 		}
 	}
 	spin_unlock_irqrestore(&c->vc.lock, flags);
--
2.9.1
* [PATCH 4/7] dmaengine: omap-dma: Dynamically allocate memory for lch_map
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
To: linux-arm-kernel

On OMAP1 platforms we do not have 32 channels available. Allocate the
lch_map based on the available channels. This way we are not going to have
more visible channels than are available on the platform.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index 6d134252ed61..c026642fc66a 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -35,7 +35,7 @@ struct omap_dmadev {
 	unsigned dma_requests;
 	spinlock_t irq_lock;
 	uint32_t irq_enable_mask;
-	struct omap_chan *lch_map[OMAP_SDMA_CHANNELS];
+	struct omap_chan **lch_map;
 };

 struct omap_chan {
@@ -1223,16 +1223,24 @@ static int omap_dma_probe(struct platform_device *pdev)
 	spin_lock_init(&od->lock);
 	spin_lock_init(&od->irq_lock);

-	od->dma_requests = OMAP_SDMA_REQUESTS;
-	if (pdev->dev.of_node && of_property_read_u32(pdev->dev.of_node,
-						      "dma-requests",
-						      &od->dma_requests)) {
+	if (!pdev->dev.of_node) {
+		od->dma_requests = od->plat->dma_attr->lch_count;
+		if (unlikely(!od->dma_requests))
+			od->dma_requests = OMAP_SDMA_REQUESTS;
+	} else if (of_property_read_u32(pdev->dev.of_node, "dma-requests",
+					&od->dma_requests)) {
 		dev_info(&pdev->dev,
 			 "Missing dma-requests property, using %u.\n",
 			 OMAP_SDMA_REQUESTS);
+		od->dma_requests = OMAP_SDMA_REQUESTS;
 	}

-	for (i = 0; i < OMAP_SDMA_CHANNELS; i++) {
+	od->lch_map = devm_kcalloc(&pdev->dev, od->dma_requests,
+				   sizeof(*od->lch_map), GFP_KERNEL);
+	if (!od->lch_map)
+		return -ENOMEM;
+
+	for (i = 0; i < od->dma_requests; i++) {
 		rc = omap_dma_chan_init(od);
 		if (rc) {
 			omap_dma_free(od);
--
2.9.1
* [PATCH 5/7] dmaengine: omap-dma: Add more debug information when freeing channel
From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw)
To: linux-arm-kernel

Print the same information regarding the sDMA channel that the driver prints
when allocating the channel resources.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/dma/omap-dma.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
index c026642fc66a..bbad82985083 100644
--- a/drivers/dma/omap-dma.c
+++ b/drivers/dma/omap-dma.c
@@ -568,7 +568,8 @@ static void omap_dma_free_chan_resources(struct dma_chan *chan)
 	vchan_free_chan_resources(&c->vc);
 	omap_free_dma(c->dma_ch);

-	dev_dbg(od->ddev.dev, "freeing channel for %u\n", c->dma_sig);
+	dev_dbg(od->ddev.dev, "freeing channel %u used for %u\n", c->dma_ch,
+		c->dma_sig);
 	c->dma_sig = 0;
 }
--
2.9.1
* [PATCH 6/7] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop 2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi ` (4 preceding siblings ...) 2016-07-14 12:42 ` [PATCH 5/7] dmaengine: omap-dma: Add more debug information when freeing channel Peter Ujfalusi @ 2016-07-14 12:42 ` Peter Ujfalusi 2016-07-14 12:42 ` [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi 2016-07-18 10:31 ` [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Russell King - ARM Linux 7 siblings, 0 replies; 24+ messages in thread From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw) To: linux-arm-kernel Instead of accessing the array via index, take the pointer first and use it to set up the omap_sg struct. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> --- drivers/dma/omap-dma.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c index bbad82985083..8497750fa44a 100644 --- a/drivers/dma/omap-dma.c +++ b/drivers/dma/omap-dma.c @@ -819,9 +819,11 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg( en = burst; frame_bytes = es_bytes[es] * en; for_each_sg(sgl, sgent, sglen, i) { - d->sg[i].addr = sg_dma_address(sgent); - d->sg[i].en = en; - d->sg[i].fn = sg_dma_len(sgent) / frame_bytes; + struct omap_sg *osg = &d->sg[i]; + + osg->addr = sg_dma_address(sgent); + osg->en = en; + osg->fn = sg_dma_len(sgent) / frame_bytes; } d->sglen = sglen; -- 2.9.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg 2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi ` (5 preceding siblings ...) 2016-07-14 12:42 ` [PATCH 6/7] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop Peter Ujfalusi @ 2016-07-14 12:42 ` Peter Ujfalusi 2016-07-18 10:42 ` Russell King - ARM Linux 2016-07-18 10:31 ` [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Russell King - ARM Linux 7 siblings, 1 reply; 24+ messages in thread From: Peter Ujfalusi @ 2016-07-14 12:42 UTC (permalink / raw) To: linux-arm-kernel sDMA in OMAP3630 or newer SoCs has support for LinkedList transfer. When the LinkedList (descriptor load) feature is present we can create the descriptors for each SG and program sDMA to walk through the list of descriptors instead of the current way of sDMA stop, sDMA reconfiguration and sDMA start after each SG transfer. By using LinkedList transfer in sDMA the number of DMA interrupts will decrease dramatically. Booting up the board with filesystem on SD card for example: # cat /proc/interrupts | grep dma W/o LinkedList support: 27: 4436 0 WUGEN 13 Level omap-dma-engine Same board/filesystem with this patch: 27: 1027 0 WUGEN 13 Level omap-dma-engine Or copying files from SD card to eMMC: # du -h /usr 2.1G /usr/ # find /usr/ -type f | wc -l 232001 # cp -r /usr/* /mnt/emmc/tmp/ W/o LinkedList we see ~761069 DMA interrupts. With LinkedList support it is down to ~269314 DMA interrupts. With the decreased DMA interrupt number the CPU load is dropping significantly as well. 
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> --- drivers/dma/omap-dma.c | 183 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 177 insertions(+), 6 deletions(-) diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c index 8497750fa44a..22b3e1a5425d 100644 --- a/drivers/dma/omap-dma.c +++ b/drivers/dma/omap-dma.c @@ -8,6 +8,7 @@ #include <linux/delay.h> #include <linux/dmaengine.h> #include <linux/dma-mapping.h> +#include <linux/dmapool.h> #include <linux/err.h> #include <linux/init.h> #include <linux/interrupt.h> @@ -32,6 +33,7 @@ struct omap_dmadev { const struct omap_dma_reg *reg_map; struct omap_system_dma_plat_info *plat; bool legacy; + bool ll123_supported; unsigned dma_requests; spinlock_t irq_lock; uint32_t irq_enable_mask; @@ -41,6 +43,7 @@ struct omap_dmadev { struct omap_chan { struct virt_dma_chan vc; void __iomem *channel_base; + struct dma_pool *desc_pool; const struct omap_dma_reg *reg_map; uint32_t ccr; @@ -55,16 +58,41 @@ struct omap_chan { unsigned sgidx; }; +#define DESC_NXT_SV_REFRESH (0x1 << 24) +#define DESC_NXT_SV_REUSE (0x2 << 24) +#define DESC_NXT_DV_REFRESH (0x1 << 26) +#define DESC_NXT_DV_REUSE (0x2 << 26) +#define DESC_NTYPE_TYPE2 (0x2 << 29) + +/* Type 2 descriptor with Source or Destination address update */ +struct omap_type2_desc { + uint32_t next_desc; + uint32_t en; + uint32_t addr; /* src or dst */ + uint16_t fn; + uint16_t cicr; + uint16_t cdei; + uint16_t csei; + uint32_t cdfi; + uint32_t csfi; +} __packed; + struct omap_sg { dma_addr_t addr; uint32_t en; /* number of elements (24-bit) */ uint32_t fn; /* number of frames (16-bit) */ int32_t fi; /* for double indexing */ int16_t ei; /* for double indexing */ + + /* Linked list */ + struct omap_type2_desc *t2_desc; + dma_addr_t t2_desc_paddr; }; struct omap_desc { + struct omap_chan *c; struct virt_dma_desc vd; + bool using_ll; enum dma_transfer_direction dir; dma_addr_t dev_addr; @@ -81,6 +109,9 @@ struct omap_desc { }; enum { + CAPS_0_SUPPORT_LL123 
= BIT(20), /* Linked List type1/2/3 */ + CAPS_0_SUPPORT_LL4 = BIT(21), /* Linked List type4 */ + CCR_FS = BIT(5), CCR_READ_PRIORITY = BIT(6), CCR_ENABLE = BIT(7), @@ -151,6 +182,19 @@ enum { CICR_SUPER_BLOCK_IE = BIT(14), /* OMAP2+ only */ CLNK_CTRL_ENABLE_LNK = BIT(15), + + CDP_DST_VALID_INC = 0 << 0, + CDP_DST_VALID_RELOAD = 1 << 0, + CDP_DST_VALID_REUSE = 2 << 0, + CDP_SRC_VALID_INC = 0 << 2, + CDP_SRC_VALID_RELOAD = 1 << 2, + CDP_SRC_VALID_REUSE = 2 << 2, + CDP_NTYPE_TYPE1 = 1 << 4, + CDP_NTYPE_TYPE2 = 2 << 4, + CDP_NTYPE_TYPE3 = 3 << 4, + CDP_TMODE_NORMAL = 0 << 8, + CDP_TMODE_LLIST = 1 << 8, + CDP_FAST = BIT(10), }; static const unsigned es_bytes[] = { @@ -180,7 +224,64 @@ static inline struct omap_desc *to_omap_dma_desc(struct dma_async_tx_descriptor static void omap_dma_desc_free(struct virt_dma_desc *vd) { - kfree(container_of(vd, struct omap_desc, vd)); + struct omap_desc *d = container_of(vd, struct omap_desc, vd); + + if (d->using_ll) { + struct omap_chan *c = d->c; + int i; + + for (i = 0; i < d->sglen; i++) { + if (d->sg[i].t2_desc) + dma_pool_free(c->desc_pool, d->sg[i].t2_desc, + d->sg[i].t2_desc_paddr); + } + } + + kfree(d); +} + +static void omap_dma_fill_type2_desc(struct omap_desc *d, int idx, + enum dma_transfer_direction dir, bool last) +{ + struct omap_sg *sg = &d->sg[idx]; + struct omap_type2_desc *t2_desc = sg->t2_desc; + + if (idx) + d->sg[idx - 1].t2_desc->next_desc = sg->t2_desc_paddr; + if (last) + t2_desc->next_desc = 0xfffffffc; + + t2_desc->en = sg->en; + t2_desc->addr = sg->addr; + t2_desc->fn = sg->fn & 0xffff; + t2_desc->cicr = d->cicr; + if (!last) + t2_desc->cicr &= ~CICR_BLOCK_IE; + + switch (dir) { + case DMA_DEV_TO_MEM: + t2_desc->cdei = sg->ei; + t2_desc->csei = d->ei; + t2_desc->cdfi = sg->fi; + t2_desc->csfi = d->fi; + + t2_desc->en |= DESC_NXT_DV_REFRESH; + t2_desc->en |= DESC_NXT_SV_REUSE; + break; + case DMA_MEM_TO_DEV: + t2_desc->cdei = d->ei; + t2_desc->csei = sg->ei; + t2_desc->cdfi = d->fi; + t2_desc->csfi = sg->fi; 
+ + t2_desc->en |= DESC_NXT_SV_REFRESH; + t2_desc->en |= DESC_NXT_DV_REUSE; + break; + default: + return; + } + + t2_desc->en |= DESC_NTYPE_TYPE2; } static void omap_dma_write(uint32_t val, unsigned type, void __iomem *addr) @@ -285,6 +386,7 @@ static void omap_dma_assign(struct omap_dmadev *od, struct omap_chan *c, static void omap_dma_start(struct omap_chan *c, struct omap_desc *d) { struct omap_dmadev *od = to_omap_dma_dev(c->vc.chan.device); + uint16_t cicr = d->cicr; if (__dma_omap15xx(od->plat->dma_attr)) omap_dma_chan_write(c, CPC, 0); @@ -293,8 +395,27 @@ static void omap_dma_start(struct omap_chan *c, struct omap_desc *d) omap_dma_clear_csr(c); + if (d->using_ll) { + uint32_t cdp = CDP_TMODE_LLIST | CDP_NTYPE_TYPE2 | CDP_FAST; + + if (d->dir == DMA_DEV_TO_MEM) + cdp |= (CDP_DST_VALID_RELOAD | CDP_SRC_VALID_REUSE); + else + cdp |= (CDP_DST_VALID_REUSE | CDP_SRC_VALID_RELOAD); + omap_dma_chan_write(c, CDP, cdp); + + omap_dma_chan_write(c, CNDP, d->sg[0].t2_desc_paddr); + omap_dma_chan_write(c, CCDN, 0); + omap_dma_chan_write(c, CCFN, 0xffff); + omap_dma_chan_write(c, CCEN, 0xffffff); + + cicr &= ~CICR_BLOCK_IE; + } else if (od->ll123_supported) { + omap_dma_chan_write(c, CDP, 0); + } + /* Enable interrupts */ - omap_dma_chan_write(c, CICR, d->cicr); + omap_dma_chan_write(c, CICR, cicr); /* Enable channel */ omap_dma_chan_write(c, CCR, d->ccr | CCR_ENABLE); @@ -447,7 +568,7 @@ static void omap_dma_callback(int ch, u16 status, void *data) if (d) { if (c->cyclic) { vchan_cyclic_callback(&d->vd); - } else if (c->sgidx == d->sglen) { + } else if (d->using_ll || c->sgidx == d->sglen) { vchan_cookie_complete(&d->vd); omap_dma_start_desc(c); } else { @@ -501,8 +622,19 @@ static int omap_dma_alloc_chan_resources(struct dma_chan *chan) { struct omap_dmadev *od = to_omap_dma_dev(chan->device); struct omap_chan *c = to_omap_dma_chan(chan); + struct device *dev = od->ddev.dev; int ret; + if (od->ll123_supported) { + c->desc_pool = dma_pool_create(dev_name(dev), dev, + 
sizeof(struct omap_type2_desc), + 4, 0); + if (!c->desc_pool) { + dev_err(dev, "unable to allocate descriptor pool\n"); + return -ENOMEM; + } + } + if (od->legacy) { ret = omap_request_dma(c->dma_sig, "DMA engine", omap_dma_callback, c, &c->dma_ch); @@ -511,8 +643,7 @@ static int omap_dma_alloc_chan_resources(struct dma_chan *chan) &c->dma_ch); } - dev_dbg(od->ddev.dev, "allocating channel %u for %u\n", - c->dma_ch, c->dma_sig); + dev_dbg(dev, "allocating channel %u for %u\n", c->dma_ch, c->dma_sig); if (ret >= 0) { omap_dma_assign(od, c, c->dma_ch); @@ -567,6 +698,8 @@ static void omap_dma_free_chan_resources(struct dma_chan *chan) od->lch_map[c->dma_ch] = NULL; vchan_free_chan_resources(&c->vc); omap_free_dma(c->dma_ch); + if (od->ll123_supported) + dma_pool_destroy(c->desc_pool); dev_dbg(od->ddev.dev, "freeing channel %u used for %u\n", c->dma_ch, c->dma_sig); @@ -743,6 +876,7 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg( struct omap_desc *d; dma_addr_t dev_addr; unsigned i, es, en, frame_bytes; + bool ll_failed = false; u32 burst; if (dir == DMA_DEV_TO_MEM) { @@ -778,6 +912,8 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg( if (!d) return NULL; + d->c = c; + d->dir = dir; d->dev_addr = dev_addr; d->es = es; @@ -818,16 +954,47 @@ static struct dma_async_tx_descriptor *omap_dma_prep_slave_sg( */ en = burst; frame_bytes = es_bytes[es] * en; + + if (sglen >= 2) + d->using_ll = od->ll123_supported; + for_each_sg(sgl, sgent, sglen, i) { struct omap_sg *osg = &d->sg[i]; osg->addr = sg_dma_address(sgent); osg->en = en; osg->fn = sg_dma_len(sgent) / frame_bytes; + + if (d->using_ll) { + osg->t2_desc = dma_pool_alloc(c->desc_pool, GFP_ATOMIC, + &osg->t2_desc_paddr); + if (!osg->t2_desc) { + dev_err(chan->device->dev, + "t2_desc[%d] allocation failed\n", i); + ll_failed = true; + d->using_ll = false; + continue; + } + + omap_dma_fill_type2_desc(d, i, dir, (i == sglen - 1)); + } } d->sglen = sglen; + /* Release the dma_pool entries if 
one allocation failed */ + if (ll_failed) { + for (i = 0; i < d->sglen; i++) { + struct omap_sg *osg = &d->sg[i]; + + if (osg->t2_desc) { + dma_pool_free(c->desc_pool, osg->t2_desc, + osg->t2_desc_paddr); + osg->t2_desc = NULL; + } + } + } + return vchan_tx_prep(&c->vc, &d->vd, tx_flags); } @@ -1266,6 +1433,9 @@ static int omap_dma_probe(struct platform_device *pdev) return rc; } + if (omap_dma_glbl_read(od, CAPS_0) & CAPS_0_SUPPORT_LL123) + od->ll123_supported = true; + od->ddev.filter.map = od->plat->slave_map; od->ddev.filter.mapcnt = od->plat->slavecnt; od->ddev.filter.fn = omap_dma_filter_fn; @@ -1293,7 +1463,8 @@ static int omap_dma_probe(struct platform_device *pdev) } } - dev_info(&pdev->dev, "OMAP DMA engine driver\n"); + dev_info(&pdev->dev, "OMAP DMA engine driver%s\n", + od->ll123_supported ? " (LinkedList1/2/3 supported)" : ""); return rc; } -- 2.9.1 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg 2016-07-14 12:42 ` [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi @ 2016-07-18 10:42 ` Russell King - ARM Linux 2016-07-18 11:12 ` Peter Ujfalusi 0 siblings, 1 reply; 24+ messages in thread From: Russell King - ARM Linux @ 2016-07-18 10:42 UTC (permalink / raw) To: linux-arm-kernel On Thu, Jul 14, 2016 at 03:42:42PM +0300, Peter Ujfalusi wrote: > struct omap_desc { > + struct omap_chan *c; > struct virt_dma_desc vd; No need for this. to_omap_dma_chan(foo->vd.tx.chan) will give you the omap_chan for the descriptor. In any case, I question whether you actually need this (see below.) > + bool using_ll; > enum dma_transfer_direction dir; > dma_addr_t dev_addr; > > @@ -81,6 +109,9 @@ struct omap_desc { > }; > > enum { > + CAPS_0_SUPPORT_LL123 = BIT(20), /* Linked List type1/2/3 */ > + CAPS_0_SUPPORT_LL4 = BIT(21), /* Linked List type4 */ > + > CCR_FS = BIT(5), > CCR_READ_PRIORITY = BIT(6), > CCR_ENABLE = BIT(7), > @@ -151,6 +182,19 @@ enum { > CICR_SUPER_BLOCK_IE = BIT(14), /* OMAP2+ only */ > > CLNK_CTRL_ENABLE_LNK = BIT(15), > + > + CDP_DST_VALID_INC = 0 << 0, > + CDP_DST_VALID_RELOAD = 1 << 0, > + CDP_DST_VALID_REUSE = 2 << 0, > + CDP_SRC_VALID_INC = 0 << 2, > + CDP_SRC_VALID_RELOAD = 1 << 2, > + CDP_SRC_VALID_REUSE = 2 << 2, > + CDP_NTYPE_TYPE1 = 1 << 4, > + CDP_NTYPE_TYPE2 = 2 << 4, > + CDP_NTYPE_TYPE3 = 3 << 4, > + CDP_TMODE_NORMAL = 0 << 8, > + CDP_TMODE_LLIST = 1 << 8, > + CDP_FAST = BIT(10), > }; > > static const unsigned es_bytes[] = { > @@ -180,7 +224,64 @@ static inline struct omap_desc *to_omap_dma_desc(struct dma_async_tx_descriptor > > static void omap_dma_desc_free(struct virt_dma_desc *vd) > { > - kfree(container_of(vd, struct omap_desc, vd)); > + struct omap_desc *d = container_of(vd, struct omap_desc, vd); struct omap_desc *d = to_omap_dma_desc(&vd->tx); works just as well, and looks much nicer, and follows the existing code 
pattern. > + > + if (d->using_ll) { > + struct omap_chan *c = d->c; > + int i; > + > + for (i = 0; i < d->sglen; i++) { > + if (d->sg[i].t2_desc) > + dma_pool_free(c->desc_pool, d->sg[i].t2_desc, > + d->sg[i].t2_desc_paddr); Why do you need a per-channel pool of descriptors? Won't a per-device descriptor pool be much better, and simplify the code here? -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg 2016-07-18 10:42 ` Russell King - ARM Linux @ 2016-07-18 11:12 ` Peter Ujfalusi 0 siblings, 0 replies; 24+ messages in thread From: Peter Ujfalusi @ 2016-07-18 11:12 UTC (permalink / raw) To: linux-arm-kernel On 07/18/16 13:42, Russell King - ARM Linux wrote: > On Thu, Jul 14, 2016 at 03:42:42PM +0300, Peter Ujfalusi wrote: >> struct omap_desc { >> + struct omap_chan *c; >> struct virt_dma_desc vd; > > No need for this. to_omap_dma_chan(foo->vd.tx.chan) will give you the > omap_chan for the descriptor. In any case, I question whether you > actually need this (see below.) I don't know how I missed that. Works and looks better! >> + bool using_ll; >> enum dma_transfer_direction dir; >> dma_addr_t dev_addr; >> >> @@ -81,6 +109,9 @@ struct omap_desc { >> }; >> >> enum { >> + CAPS_0_SUPPORT_LL123 = BIT(20), /* Linked List type1/2/3 */ >> + CAPS_0_SUPPORT_LL4 = BIT(21), /* Linked List type4 */ >> + >> CCR_FS = BIT(5), >> CCR_READ_PRIORITY = BIT(6), >> CCR_ENABLE = BIT(7), >> @@ -151,6 +182,19 @@ enum { >> CICR_SUPER_BLOCK_IE = BIT(14), /* OMAP2+ only */ >> >> CLNK_CTRL_ENABLE_LNK = BIT(15), >> + >> + CDP_DST_VALID_INC = 0 << 0, >> + CDP_DST_VALID_RELOAD = 1 << 0, >> + CDP_DST_VALID_REUSE = 2 << 0, >> + CDP_SRC_VALID_INC = 0 << 2, >> + CDP_SRC_VALID_RELOAD = 1 << 2, >> + CDP_SRC_VALID_REUSE = 2 << 2, >> + CDP_NTYPE_TYPE1 = 1 << 4, >> + CDP_NTYPE_TYPE2 = 2 << 4, >> + CDP_NTYPE_TYPE3 = 3 << 4, >> + CDP_TMODE_NORMAL = 0 << 8, >> + CDP_TMODE_LLIST = 1 << 8, >> + CDP_FAST = BIT(10), >> }; >> >> static const unsigned es_bytes[] = { >> @@ -180,7 +224,64 @@ static inline struct omap_desc *to_omap_dma_desc(struct dma_async_tx_descriptor >> >> static void omap_dma_desc_free(struct virt_dma_desc *vd) >> { >> - kfree(container_of(vd, struct omap_desc, vd)); >> + struct omap_desc *d = container_of(vd, struct omap_desc, vd); > > struct omap_desc *d = to_omap_dma_desc(&vd->tx); > > works just as well, and 
looks much nicer, and follows the existing code > pattern. Yes, I missed this as well. >> + >> + if (d->using_ll) { >> + struct omap_chan *c = d->c; >> + int i; >> + >> + for (i = 0; i < d->sglen; i++) { >> + if (d->sg[i].t2_desc) >> + dma_pool_free(c->desc_pool, d->sg[i].t2_desc, >> + d->sg[i].t2_desc_paddr); > > Why do you need a per-channel pool of descriptors? Won't a per-device > descriptor pool be much better, and simplify the code here? I was planning to try a per-device pool after this series. I think I went with a per-channel pool because, for example, bcm2835-dma was doing the same. Code-wise I don't think it is going to simplify much, as we still need to free here what we have allocated. I can test this out. -- Péter ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg 2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi ` (6 preceding siblings ...) 2016-07-14 12:42 ` [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi @ 2016-07-18 10:31 ` Russell King - ARM Linux 2016-07-18 12:07 ` Peter Ujfalusi 7 siblings, 1 reply; 24+ messages in thread From: Russell King - ARM Linux @ 2016-07-18 10:31 UTC (permalink / raw) To: linux-arm-kernel On Thu, Jul 14, 2016 at 03:42:35PM +0300, Peter Ujfalusi wrote: > Hi, > > The following series with the final patch will add support for sDMA Linked List > transfer support. > Linked List is supported by sDMA in OMAP3630+ (OMAP4/5, dra7 family). > If the descriptor load feature is present we can create the descriptors for each > SG beforehand and let sDMA to walk them through. > This way the number of sDMA interrupts the kernel need to handle will drop > dramatically. I suggested this a few years ago, and I was told by TI that there was no interest to implement this feature as it had very little performance effect. Do I take it that TI have changed their position on this feature? -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg 2016-07-18 10:31 ` [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Russell King - ARM Linux @ 2016-07-18 12:07 ` Peter Ujfalusi 2016-07-18 12:21 ` Russell King - ARM Linux 0 siblings, 1 reply; 24+ messages in thread From: Peter Ujfalusi @ 2016-07-18 12:07 UTC (permalink / raw) To: linux-arm-kernel On 07/18/16 13:31, Russell King - ARM Linux wrote: > On Thu, Jul 14, 2016 at 03:42:35PM +0300, Peter Ujfalusi wrote: >> Hi, >> >> The following series with the final patch will add support for sDMA Linked List >> transfer support. >> Linked List is supported by sDMA in OMAP3630+ (OMAP4/5, dra7 family). >> If the descriptor load feature is present we can create the descriptors for each >> SG beforehand and let sDMA to walk them through. >> This way the number of sDMA interrupts the kernel need to handle will drop >> dramatically. > > I suggested this a few years ago, and I was told by TI that there was > no interest to implement this feature as it had very little performance > effect. I cannot comment on this... A few years ago I was not involved with the DMA drivers, so I don't have any idea why anyone would object to having the linked list (or descriptor load) mode in use whenever it is possible. I was not even aware of the linked list mode of sDMA 3 weeks back, but while reading the TRM - mainly for the interleaved mode - it sounded like a good idea to implement this. Not really sure about the raw performance impact, but for interactivity it does help. I remember running 'emerge --sync' on BeagleBoard was painful as it took hours and the board was mostly unusable during that time. With the linked list mode the same takes reasonable time and I can still poke around in the board. > Do I take it that TI have changed their position on this feature? I was not aware of any position on this from TI - as I mentioned I was not involved with DMA. 
It could be that the position from 'TI' is still what it was. Or changed. But I have been asked to look after TI DMA drivers upstream, and I believe that the linked list mode is a good thing to have - which is backed by my experience. My position is that linked list support is cool. -- Péter ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg 2016-07-18 12:07 ` Peter Ujfalusi @ 2016-07-18 12:21 ` Russell King - ARM Linux 2016-07-18 12:30 ` Peter Ujfalusi 0 siblings, 1 reply; 24+ messages in thread From: Russell King - ARM Linux @ 2016-07-18 12:21 UTC (permalink / raw) To: linux-arm-kernel On Mon, Jul 18, 2016 at 03:07:57PM +0300, Peter Ujfalusi wrote: > I was not aware of any position on this from TI - as I mentioned I was not > involved with DMA. It could be that the position from 'TI' is still what it > was. Or changed. But as I have been asked to look after TI DMA drivers > upstream and I believe that the linked list mode is a good thing to have - > which is backed by my experiences. My position is that linked list support is > cool. That's really nice news. Nothing like asking the author first whether he'd like to pass over maintainership of the driver. I guess you won't mind if at some point in the future, I decide to just take it back... -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net. ^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg 2016-07-18 12:21 ` Russell King - ARM Linux @ 2016-07-18 12:30 ` Peter Ujfalusi 0 siblings, 0 replies; 24+ messages in thread From: Peter Ujfalusi @ 2016-07-18 12:30 UTC (permalink / raw) To: linux-arm-kernel On 07/18/16 15:21, Russell King - ARM Linux wrote: > On Mon, Jul 18, 2016 at 03:07:57PM +0300, Peter Ujfalusi wrote: >> I was not aware of any position on this from TI - as I mentioned I was not >> involved with DMA. It could be that the position from 'TI' is still what it >> was. Or changed. But as I have been asked to look after TI DMA drivers >> upstream and I believe that the linked list mode is a good thing to have - >> which is backed by my experiences. My position is that linked list support is >> cool. > > That's really nice news. Nothing like asking the author first whether > he'd like to pass over maintainership of the driver. I guess you won't > mind if at some point in the future, I decide to just take it back... I work with the DMA drivers on behalf of TI. Inside TI the DMA related queries are targeted at me. This does not change the maintainer of the drivers upstream. -- Péter ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2016-07-24 7:39 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-07-14 12:42 [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Peter Ujfalusi 2016-07-14 12:42 ` [PATCH 1/7] dmaengine: omap-dma: Simplify omap_dma_start_sg parameter list Peter Ujfalusi 2016-07-14 12:42 ` [PATCH 2/7] dmaengine: omap-dma: Complete the cookie first on transfer completion Peter Ujfalusi 2016-07-18 10:34 ` Russell King - ARM Linux 2016-07-19 12:35 ` Peter Ujfalusi 2016-07-19 16:20 ` Russell King - ARM Linux 2016-07-19 19:23 ` Peter Ujfalusi 2016-07-24 7:39 ` Vinod Koul 2016-07-20 6:26 ` Robert Jarzmik 2016-07-21 9:33 ` Peter Ujfalusi 2016-07-21 9:35 ` Peter Ujfalusi 2016-07-21 9:47 ` Russell King - ARM Linux 2016-07-22 11:00 ` Peter Ujfalusi 2016-07-14 12:42 ` [PATCH 3/7] dmaengine: omap-dma: Simplify omap_dma_callback Peter Ujfalusi 2016-07-14 12:42 ` [PATCH 4/7] dmaengine: omap-dma: Dynamically allocate memory for lch_map Peter Ujfalusi 2016-07-14 12:42 ` [PATCH 5/7] dmaengine: omap-dma: Add more debug information when freeing channel Peter Ujfalusi 2016-07-14 12:42 ` [PATCH 6/7] dmaengine: omap-dma: Use pointer to omap_sg in slave_sg setup's loop Peter Ujfalusi 2016-07-14 12:42 ` [PATCH 7/7] dmaengine: omap-dma: Support for LinkedList transfer of slave_sg Peter Ujfalusi 2016-07-18 10:42 ` Russell King - ARM Linux 2016-07-18 11:12 ` Peter Ujfalusi 2016-07-18 10:31 ` [PATCH 0/7] dmaengine:omap-dma: Linked List transfer for slave_sg Russell King - ARM Linux 2016-07-18 12:07 ` Peter Ujfalusi 2016-07-18 12:21 ` Russell King - ARM Linux 2016-07-18 12:30 ` Peter Ujfalusi