* Re: Query on Audio DMA using DMAEngine
       [not found] <083BC63EECB6FD41B8E81CF7FD87CC0F2E4F1488@DLEE08.ent.ti.com>
@ 2013-06-30 12:06 ` Lars-Peter Clausen
  2013-07-01  6:10   ` Mike Looijmans
  2013-07-02  1:04   ` Joel Fernandes
  0 siblings, 2 replies; 30+ messages in thread
From: Lars-Peter Clausen @ 2013-06-30 12:06 UTC (permalink / raw)
  To: Fernandes, Joel; +Cc: alsa-devel

Added alsa-devel to Cc.

On 06/28/2013 05:27 AM, Fernandes, Joel wrote:
> Hi Lars,
> 
> Hope you are doing well.
> 
> I am implementing Cyclic DMA support in the EDMA driver that is used by Davinci and now newer TI SoCs.
> I am thinking once I am done I can plug it into the snd_dmaengine framework.
> 
> Currently, however, the davinci-pcm code programs the EDMA directly. That is what I am working to replace with a single driver and adapt to the snd dmaengine framework. However, the current code in davinci-pcm uses internal RAM (IRAM) as an intermediate step in the whole DMA process (data is first transferred from DRAM to IRAM, and then from IRAM to the audio device).
> 
> Do you have any ideas on how we can adapt to the framework such that we can still use the IRAM? Are there any existing implementations out there that do something similar?

Hm, I guess using the snd_dmaengine_pcm helper functions here shouldn't be too
hard. Using the generic snd_dmaengine_pcm driver will require some extensions
to it though. The mmp platform (pxa/mmp-pcm.c) is also using some kind of
on-chip memory, so having support for this in the generic driver certainly
makes sense. For the chaining you'd probably have to extend the dmaengine
framework, since this kind of interleaved mem-to-mem and mem-to-dev cyclic
transfer is currently not possible.

I'm wondering though why do you need to copy the data to RAM first, is it not
possible to map the IRAM to userspace?

- Lars

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-06-30 12:06 ` Query on Audio DMA using DMAEngine Lars-Peter Clausen
@ 2013-07-01  6:10   ` Mike Looijmans
  2013-07-02  1:28     ` Joel Fernandes
  2013-07-02  3:33     ` Joel Fernandes
  2013-07-02  1:04   ` Joel Fernandes
  1 sibling, 2 replies; 30+ messages in thread
From: Mike Looijmans @ 2013-07-01  6:10 UTC (permalink / raw)
  To: alsa-devel; +Cc: joelf, lars

On 06/30/2013 02:06 PM, Lars-Peter Clausen wrote:
> Added alsa-devel to Cc.
>
> On 06/28/2013 05:27 AM, Fernandes, Joel wrote:
>> Hi Lars,
>>
>> Hope you are doing well.
>>
>> I am implementing Cyclic DMA support in the EDMA driver that is used by Davinci and now newer TI SoCs.
>> I am thinking once I am done I can plug it into the snd_dmaengine framework.
>>
>> Currently, however, the davinci-pcm code programs the EDMA directly. That is what I am working to replace with a single driver and adapt to the snd dmaengine framework. However, the current code in davinci-pcm uses internal RAM (IRAM) as an intermediate step in the whole DMA process (data is first transferred from DRAM to IRAM, and then from IRAM to the audio device).
>>
>> Do you have any ideas on how we can adapt to the framework such that we can still use the IRAM? Are there any existing implementations out there that do something similar?
>
> Hm, I guess using the snd_dmaengine_pcm helper functions here shouldn't be too
> hard. Using the generic snd_dmaengine_pcm driver will require some extensions
> to it though. The mmp platform (pxa/mmp-pcm.c) is also using some kind of
> on-chip memory, so having support for this in the generic driver certainly
> makes sense. For the chaining you'd probably have to extend the dmaengine
> framework, since this kind of interleaved mem-to-mem and mem-to-dev cyclic
> transfer is currently not possible.
>
> I'm wondering though why do you need to copy the data to RAM first, is it not
> possible to map the IRAM to userspace?


I've already built a cyclic DMA implementation into the EDMA driver for 
Davinci, without using the internal RAM. But that was for a 2.6.37 kernel.

For capture, the internal RAM ping pong only made things worse, not 
better. I really have no idea what problem it was supposed to solve.

The trouble with the current davinci driver is that the IRQ handler has 
a real-time requirement, it must finish before the next DMA block 
completes. This causes most of the buffer overruns on heavily loaded 
systems.
It's easy to set up a cyclic chain of DMA transfers with the EDMA 
controller that continuously transfers data to the audio buffer. Once 
that is done, the completion IRQ can be used to periodically "trigger" 
user space, but it isn't time critical any more.
The McASP has enough internal buffering to take care of any DDR latency 
issues.

With the cyclic DMA, I can capture 16 channels of 32-bit audio at 51kHz, 
simultaneously playback 2 channels and write the audio data to an SD 
card on the OMAP-L138. Before that change, it wasn't even possible to 
capture 4 channels without overruns.

I can mail you the 2.6.37 code, it isn't worthy for direct inclusion but 
may save you some time to figure things out.

Kind regards,
Mike.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-06-30 12:06 ` Query on Audio DMA using DMAEngine Lars-Peter Clausen
  2013-07-01  6:10   ` Mike Looijmans
@ 2013-07-02  1:04   ` Joel Fernandes
  2013-07-03  9:07     ` Lars-Peter Clausen
  1 sibling, 1 reply; 30+ messages in thread
From: Joel Fernandes @ 2013-07-02  1:04 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: alsa-devel

On 06/30/2013 07:06 AM, Lars-Peter Clausen wrote:
> Added alsa-devel to Cc.
> 
> On 06/28/2013 05:27 AM, Fernandes, Joel wrote:
>> Hi Lars,
>>
>> Hope you are doing well.
>>
>> I am implementing Cyclic DMA support in the EDMA driver that is used by Davinci and now newer TI SoCs.
>> I am thinking once I am done I can plug it into the snd_dmaengine framework.
>>
>> Currently, however, the davinci-pcm code programs the EDMA directly. That is what I am working to replace with a single driver and adapt to the snd dmaengine framework. However, the current code in davinci-pcm uses internal RAM (IRAM) as an intermediate step in the whole DMA process (data is first transferred from DRAM to IRAM, and then from IRAM to the audio device).
>>
>> Do you have any ideas on how we can adapt to the framework such that we can still use the IRAM? Are there any existing implementations out there that do something similar?
> 
> Hm, I guess using the snd_dmaengine_pcm helper functions here shouldn't be too
> hard. Using the generic snd_dmaengine_pcm driver will require some extensions
> to it though. The mmp platform (pxa/mmp-pcm.c) is also using some kind of
> on-chip memory, so having support for this in the generic driver certainly

I quickly looked at the implementation there. That's neat the way IRAM is used
to allocate the DMA buffer.

> makes sense. For the chaining you'd probably have to extend the dmaengine
> framework, since this kind of interleaved mem-to-mem and mem-to-dev cyclic
> transfer is currently not possible.

I was wondering whether it makes sense to make this kind of intermediate IRAM step
purely a DMA controller driver specific implementation. Basically, what I mean
is that the use of IRAM would be unknown to all of the other DMA layers and
implemented purely in the DMA controller driver, making the interleaving with IRAM
transparent to the DMAEngine framework and the other drivers. Using device tree
or some other method, one could indicate that IRAM is present and should be used
for the specific DMA channel.

> I'm wondering though why do you need to copy the data to RAM first, is it not
> possible to map the IRAM to userspace?

Yes, certainly it should be possible to map the IRAM directly. I don't know the
exact reasons why it was done that way, but I do know that not using the IRAM was
causing underruns. I will run some experiments mapping IRAM directly to see if
we still see underruns.

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-01  6:10   ` Mike Looijmans
@ 2013-07-02  1:28     ` Joel Fernandes
  2013-07-02  6:02       ` Mike Looijmans
  2013-07-04 11:00       ` Clemens Ladisch
  2013-07-02  3:33     ` Joel Fernandes
  1 sibling, 2 replies; 30+ messages in thread
From: Joel Fernandes @ 2013-07-02  1:28 UTC (permalink / raw)
  To: Mike Looijmans; +Cc: alsa-devel, lars

On 07/01/2013 01:10 AM, Mike Looijmans wrote:
> On 06/30/2013 02:06 PM, Lars-Peter Clausen wrote:
>> Added alsa-devel to Cc.
>>
>> On 06/28/2013 05:27 AM, Fernandes, Joel wrote:
>>> Hi Lars,
>>>
>>> Hope you are doing well.
>>>
>>> I am implementing Cyclic DMA support in the EDMA driver that is used by
>>> Davinci and now newer TI SoCs.
>>> I am thinking once I am done I can plug it into the snd_dmaengine framework.
>>>
>>> Currently, however, the davinci-pcm code programs the EDMA directly. That is
>>> what I am working to replace with a single driver and adapt to the snd
>>> dmaengine framework. However, the current code in davinci-pcm uses internal
>>> RAM (IRAM) as an intermediate step in the whole DMA process (data is first
>>> transferred from DRAM to IRAM, and then from IRAM to the audio device).
>>>
>>> Do you have any ideas on how we can adapt to the framework such that we can
>>> still use the IRAM? Are there any existing implementations out there that do
>>> something similar?
>>
>> Hm, I guess using the snd_dmaengine_pcm helper functions here shouldn't be too
>> hard. Using the generic snd_dmaengine_pcm driver will require some extensions
>> to it though. The mmp platform (pxa/mmp-pcm.c) is also using some kind of
>> on-chip memory, so having support for this in the generic driver certainly
>> makes sense. For the chaining you'd probably have to extend the dmaengine
>> framework, since this kind of interleaved mem-to-mem and mem-to-dev cyclic
>> transfer is currently not possible.
>>
>> I'm wondering though why do you need to copy the data to RAM first, is it not
>> possible to map the IRAM to userspace?
> 
> 
> I've already built a cyclic DMA implementation into the EDMA driver for Davinci,
> without using the internal RAM. But that was for a 2.6.37 kernel.

Great!

> For capture, the internal RAM ping pong only made things worse, not better. I
> really have no idea what problem it was supposed to solve.

Interesting.

> The trouble with the current davinci driver is that the IRQ handler has a
> real-time requirement, it must finish before the next DMA block completes. This
> causes most of the buffer overruns on heavily loaded systems.

But how do you get away with not calling snd_pcm_period_elapsed in a time-sensitive
fashion? Isn't it always time sensitive? Or maybe you mean the timing is a bit
more relaxed (still sensitive, though), as the interrupt handler can now take its
own time to finish as long as it finishes before the next interrupt comes.

If that's what you mean, then what you said is actually not true for the
ping-pong implementation, because the DMA controller is programmed only *once*
at the beginning for the ping-pong or IRAM case. That is just the way
ping-pong works: there is no need to program the DMA controller again and again
on every interrupt. On the other hand, I fully agree that for the regular case the
DMA controller has to be programmed for every period, and this is what I guess
makes it time sensitive. Could you confirm?

> It's easy to set up a cyclic chain of DMA transfers with the EDMA controller
> that continuously transfers data to the audio buffer. Once that is done, the
> completion IRQ can be used to periodically "trigger" user space, but it isn't
> time critical any more.

That makes a lot of sense.

> The McASP has enough internal buffering to take care of any DDR latency issues.

Sure.

> With the cyclic DMA, I can capture 16 channels of 32-bit audio at 51kHz,
> simultaneously playback 2 channels and write the audio data to an SD card on the
> OMAP-L138. Before that change, it wasn't even possible to capture 4 channels
> without overruns.

Sweet! Any particular reason why it wasn't merged in vs the existing ping-pong code?

> I can mail you the 2.6.37 code, it isn't worthy for direct inclusion but may
> save you some time to figure things out.

Certainly could take a look. Could you share it? Thank you.

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-01  6:10   ` Mike Looijmans
  2013-07-02  1:28     ` Joel Fernandes
@ 2013-07-02  3:33     ` Joel Fernandes
  2013-07-02  5:50       ` Mike Looijmans
  1 sibling, 1 reply; 30+ messages in thread
From: Joel Fernandes @ 2013-07-02  3:33 UTC (permalink / raw)
  To: Mike Looijmans; +Cc: alsa-devel, lars

Hi Mike,

On 07/01/2013 01:10 AM, Mike Looijmans wrote:
[..]
> The trouble with the current davinci driver is that the IRQ handler has a
> real-time requirement, it must finish before the next DMA block completes. This

I looked into this a little more.

I think you are picturing the following:

DMA transfer -> IRQ has to complete -> DMA transfer -> IRQ has to complete.. etc.

This is not really true in the davinci-pcm driver, the normal case without IRAM
works more like..

DMA ----> DMA ---> DMA
 \        \        \
  \__ IRQ  \__ IRQ  \__ IRQ

The only hard requirement is that the IRQ handler must finish updating before the
next DMA transfer, or we're in trouble. Is this what you mean by real-time
requirement, or did you mean something else?

Either way I'm sure your multi-slot approach is superior, but I don't see how
you can get away with not updating the DMA addresses on every IRQ with the
current davinci-pcm or EDMA controller (unless you use a complicated mechanism
like ping-pong where the address updates take care of themselves). If you are using
a set of chained slots, you only have so many slots, so you have to keep
changing the addresses of the slots at some point or other for a large transfer.

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02  3:33     ` Joel Fernandes
@ 2013-07-02  5:50       ` Mike Looijmans
  2013-07-02 12:13         ` Mark Brown
  2013-08-14  4:30         ` Joel Fernandes
  0 siblings, 2 replies; 30+ messages in thread
From: Mike Looijmans @ 2013-07-02  5:50 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: alsa-devel, lars

On 07/02/2013 05:33 AM, Joel Fernandes wrote:
> Hi Mike,
>
> On 07/01/2013 01:10 AM, Mike Looijmans wrote:
> [..]
>> The trouble with the current davinci driver is that the IRQ handler has a
>> real-time requirement, it must finish before the next DMA block completes. This
>
> I looked into this a little more.
>
> I think you are picturing the following:
>
> DMA transfer -> IRQ has to complete -> DMA transfer -> IRQ has to complete.. etc.
>
> This is not really true in the davinci-pcm driver, the normal case without IRAM
> works more like..
>
> DMA ----> DMA ---> DMA
>   \        \        \
>    \__ IRQ  \__ IRQ  \__ IRQ
>
> The only hard requirement is that the IRQ handler must finish updating before the
> next DMA transfer, or we're in trouble. Is this what you mean by real-time
> requirement, or did you mean something else?


Yep, that's what I meant. Because I was capturing 16 channels of 32-bit 
data, the DMA buffer would drain in mere milliseconds. Interrupt latency 
on the L138 is pretty bad, I've seen it take over 10ms to handle an IRQ 
on occasion.
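
To put rough numbers on this, here is a minimal C sketch of the arithmetic. The
helper names are made up for illustration, and it assumes that each 32-bit
sample occupies 4 bytes with no padding:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second produced by a capture stream (hypothetical helper). */
static inline uint64_t stream_bytes_per_sec(unsigned channels,
                                            unsigned bytes_per_sample,
                                            unsigned rate_hz)
{
    assert(channels > 0 && bytes_per_sample > 0);
    return (uint64_t)channels * bytes_per_sample * rate_hz;
}

/* Bytes that arrive while an interrupt is delayed by latency_us microseconds. */
static inline uint64_t bytes_during_latency(uint64_t bytes_per_sec,
                                            unsigned latency_us)
{
    return bytes_per_sec * latency_us / 1000000u;
}
```

With 16 channels, 4 bytes per sample and 51 kHz this gives 3,264,000 bytes per
second, so roughly 32 KB of capture data arrives during a 10 ms interrupt delay.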

But even with much lower loads, I got underruns when recording to SD 
card that I couldn't really explain. I noticed that the SD transfers 
took up a lot of DMA params (about 40), so maybe that was just causing 
too much work for the IRQ or DMA handler routines.

> Either way I'm sure your multi-slot approach is superior, but I don't see how
> you can get away with not updating the DMA addresses on every IRQ with the
> current davinci-pcm or EDMA controller (unless you use a complicated mechanism
> like ping-pong where the address updates take care of themselves). If you are using
> a set of chained slots, you only have so many slots, so you have to keep
> changing the addresses of the slots at some point or other for a large transfer.

I use a chain like this:

DMA1 -> DMA2 -> DMA... -> DMA1

This meant I had to use a DMA PARAM slot for every "period". The OMAP 
L138 has 128 of those slots, so it's no problem to use a bunch of them. 
Because the chain is cyclic, there is no need to update any DMA 
parameter while running. All that ALSA needs to do is empty the buffer 
before the cycle completes and the current position gets overwritten.

The IRQ handler is called after each DMA completion, but it's no problem 
if it isn't handled in time. It is only used to give the ALSA framework 
a gentle push that a period has been transferred. Completely missing a 
bunch of interrupts has no effect whatsoever.
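
As a rough illustration of the chain above, here is a small, self-contained C
model. It is only a sketch: struct slot, build_cyclic_chain() and slot_after()
are hypothetical names, and the struct is not the real EDMA PaRAM layout. It
only shows that once every slot links to the next and the last links back to
the first, nothing has to be reprogrammed while the stream runs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PERIODS 128 /* assumed upper bound: one slot per period */

/* Hypothetical model of one transfer descriptor ("slot"). */
struct slot {
    uint32_t dst_offset; /* where this period lands in the audio buffer */
    uint32_t len;        /* period size in bytes */
    size_t next;         /* slot that is loaded when this one completes */
};

/* Build a ring: slot i covers period i and links to slot (i + 1) % periods.
 * Returns 0 on success, -1 on invalid arguments. */
static inline int build_cyclic_chain(struct slot *slots, size_t periods,
                                     uint32_t period_bytes)
{
    if (!slots || periods == 0 || periods > MAX_PERIODS || period_bytes == 0)
        return -1;
    for (size_t i = 0; i < periods; i++) {
        slots[i].dst_offset = (uint32_t)i * period_bytes;
        slots[i].len = period_bytes;
        slots[i].next = (i + 1) % periods;
    }
    return 0;
}

/* Follow the links for `completions` finished transfers, starting at slot 0,
 * and return the index of the slot that is active afterwards. */
static inline size_t slot_after(const struct slot *slots, size_t completions)
{
    size_t cur = 0;

    assert(slots != NULL);
    while (completions--)
        cur = slots[cur].next;
    return cur;
}
```

In this model the interrupt handler never touches the slots; it would only
report that a period has elapsed.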

>
> Thanks,
>

You're welcome.

Mike.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02  1:28     ` Joel Fernandes
@ 2013-07-02  6:02       ` Mike Looijmans
  2013-07-02 12:16         ` Mark Brown
  2013-07-04 11:00       ` Clemens Ladisch
  1 sibling, 1 reply; 30+ messages in thread
From: Mike Looijmans @ 2013-07-02  6:02 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: alsa-devel, lars

[-- Attachment #1: Type: text/plain, Size: 1578 bytes --]

On 07/02/2013 03:28 AM, Joel Fernandes wrote:
> On 07/01/2013 01:10 AM, Mike Looijmans wrote:
>> With the cyclic DMA, I can capture 16 channels of 32-bit audio at 51kHz,
>> simultaneously playback 2 channels and write the audio data to an SD card on the
>> OMAP-L138. Before that change, it wasn't even possible to capture 4 channels
>> without overruns.
>
> Sweet! Any particular reason why it wasn't merged in vs the existing ping-pong code?

I've posted questions and other stuff concerning the McASP/OMAP1, but 
there was very little interest, so I supposed the chipset was on its way 
out and there wasn't any point in maintaining it.


>> I can mail you the 2.6.37 code, it isn't worthy for direct inclusion but may
>> save you some time to figure things out.
>
> Certainly could take a look. Could you share it? Thank you.

I attached the source files.

There are a lot of changes in the files that are product specific hacks 
to get my things working in the cheapest way possible (cheap meaning 
spending little effort).

I also included the "davinci-mcasp.c" code. Because the customer wanted 
a bigger buffer, I set up the McASP FIFO to transfer larger blocks. The 
DMA is limited to 64k words, so increasing the size of a "word" is a 
sure way to transfer more data. It's also good for bursting transfers, 
which I'm told the DDR memory likes much better. I was unable to measure 
any effect (positive or negative) of that, though. The larger blocks 
aren't needed if you're satisfied with much smaller buffers, which for 
normal purposes should be just fine.
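
As a back-of-the-envelope sketch of why a larger word helps: assuming that
"64k words" means a 16-bit count field (so at most 65535 words per block), the
number of bytes moved per block scales linearly with the word size. The
function name below is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the word count is a 16-bit field, so 65535 words at most. */
#define MAX_WORDS_PER_BLOCK 65535u

/* Largest block, in bytes, that one transfer can describe for a given word size. */
static inline uint64_t max_block_bytes(uint32_t word_bytes)
{
    assert(word_bytes > 0);
    return (uint64_t)MAX_WORDS_PER_BLOCK * word_bytes;
}
```

Under that assumption a 4-byte word allows 262,140 bytes per block, while a
32-byte word allows 2,097,120 bytes.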

Mike.

[-- Attachment #2: davinci-2.6.37-cyclic-dma.tar.gz --]
[-- Type: application/x-gzip, Size: 10498 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02  5:50       ` Mike Looijmans
@ 2013-07-02 12:13         ` Mark Brown
  2013-07-02 13:40           ` Mike Looijmans
  2013-07-03  9:09           ` Lars-Peter Clausen
  2013-08-14  4:30         ` Joel Fernandes
  1 sibling, 2 replies; 30+ messages in thread
From: Mark Brown @ 2013-07-02 12:13 UTC (permalink / raw)
  To: Mike Looijmans; +Cc: Joel Fernandes, alsa-devel, lars


[-- Attachment #1.1: Type: text/plain, Size: 1169 bytes --]

On Tue, Jul 02, 2013 at 07:50:16AM +0200, Mike Looijmans wrote:
> On 07/02/2013 05:33 AM, Joel Fernandes wrote:

> But even with much lower loads, I got underruns when recording to SD
> card that I couldn't really explain. I noticed that the SD transfers
> took up a lot of DMA params (about 40), so maybe that was just
> causing too much work for the IRQ or DMA handler routines.

SD cards are generally just slow, it's possible it's just not able to
keep up with the data you're throwing at it.  Things like batching the
writes up into large chunks can help here but you may just be hitting a
genuine limit if you need to record for too long and don't have enough
fast storage (like RAM) to buffer.

> This meant I had to use a DMA PARAM slot for every "period". The
> OMAP L138 has 128 of those slots, so it's no problem to use a bunch
> of them. Because the chain is cyclic, there is no need to update any
> DMA parameter while running. All that ALSA needs to do is empty the
> buffer before the cycle completes and the current position gets
> overwritten.

This sort of cyclic thing tends to be best, ideally you don't need
interrupts at all (other than a timer).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02  6:02       ` Mike Looijmans
@ 2013-07-02 12:16         ` Mark Brown
  2013-07-02 13:30           ` Mike Looijmans
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2013-07-02 12:16 UTC (permalink / raw)
  To: Mike Looijmans; +Cc: Joel Fernandes, alsa-devel, lars


[-- Attachment #1.1: Type: text/plain, Size: 601 bytes --]

On Tue, Jul 02, 2013 at 08:02:00AM +0200, Mike Looijmans wrote:
> On 07/02/2013 03:28 AM, Joel Fernandes wrote:

> >Sweet! Any particular reason why it wasn't merged in vs the existing ping-pong code?

> I've posted questions and other stuff concerning the McASP/OMAP1,
> but there was very little interest, so I supposed the chipset was on
> its way out and there wasn't any point in maintaining it.

Did you CC the relevant maintainers and other people working on the
code?  You've not done so on this thread...  if you only post to the
list it's very likely that people won't see what you've sent.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02 12:16         ` Mark Brown
@ 2013-07-02 13:30           ` Mike Looijmans
  2013-07-02 14:58             ` Mark Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Looijmans @ 2013-07-02 13:30 UTC (permalink / raw)
  To: Mark Brown; +Cc: Joel Fernandes, alsa-devel, lars

On 07/02/2013 02:16 PM, Mark Brown wrote:
> On Tue, Jul 02, 2013 at 08:02:00AM +0200, Mike Looijmans wrote:
>> On 07/02/2013 03:28 AM, Joel Fernandes wrote:
>
>>> Sweet! Any particular reason why it wasn't merged in vs the existing ping-pong code?
>
>> I've posted questions and other stuff concerning the McASP/OMAP1,
>> but there was very little interest, so I supposed the chipset was on
>> its way out and there wasn't any point in maintaining it.
>
> Did you CC the relevant maintainers and other people working on the
> code?  You've not done so on this thread...  if you only post to the
> list it's very likely that people won't see what you've sent.

I'm relatively new to Linux kernel programming. The key problem for me 
was - and still is - that there is so overwhelmingly much information 
available, that it's virtually impossible to find out things that would 
be obvious for long-time developers, like finding out who the maintainer 
for a piece of code is. I still don't know that, by the way. How do I 
find the "CC" list that I'm supposed to send bugs/suggestions/patches to 
for a given piece of code?

I guess that a document on 
kernel-driver-development-for-people-who-used-to-work-with-a-centrally-organized-OS-and-used-to-get-all-their-answers-from-them 
would help, but then again finding that particular document - or 
realizing that it even exists (it doesn't, does it?) - would be the next 
problem.

It's quite easy to find out how one goes about writing a driver, but the 
process surrounding it - such as finding whether such a driver already 
exists, where to go for technical advice and where to post the git patch 
for inclusion in mainline is something that no one seems to want to 
dwell on.

Sorry if I'm ranting here. Maybe in time I'll learn to behave...

Mike

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02 12:13         ` Mark Brown
@ 2013-07-02 13:40           ` Mike Looijmans
  2013-07-03  9:09           ` Lars-Peter Clausen
  1 sibling, 0 replies; 30+ messages in thread
From: Mike Looijmans @ 2013-07-02 13:40 UTC (permalink / raw)
  To: Mark Brown; +Cc: Joel Fernandes, alsa-devel, lars

On 07/02/2013 02:13 PM, Mark Brown wrote:
> On Tue, Jul 02, 2013 at 07:50:16AM +0200, Mike Looijmans wrote:
>> On 07/02/2013 05:33 AM, Joel Fernandes wrote:
>
>> But even with much lower loads, I got underruns when recording to SD
>> card that I couldn't really explain. I noticed that the SD transfers
>> took up a lot of DMA params (about 40), so maybe that was just
>> causing too much work for the IRQ or DMA handler routines.
>
> SD cards are generally just slow, it's possible it's just not able to
> keep up with the data you're throwing at it.  Things like batching the
> writes up into large chunks can help here but you may just be hitting a
> genuine limit if you need to record for too long and don't have enough
> fast storage (like RAM) to buffer.

What I meant by "couldn't really explain" is that I monitored CPU, 
memory and IO, and could clearly conclude that the card (or network, or 
even /dev/null on occasion) wasn't the bottleneck in itself. It's not so 
much the medium, but more the load it causes on the system that 
triggered the overruns.

>> This meant I had to use a DMA PARAM slot for every "period". The
>> OMAP L138 has 128 of those slots, so it's no problem to use a bunch
>> of them. Because the chain is cyclic, there is no need to update any
>> DMA parameter while running. All that ALSA needs to do is empty the
>> buffer before the cycle completes and the current position gets
>> overwritten.
>
> This sort of cyclic thing tends to be best, ideally you don't need
> interrupts at all (other than a timer).

Indeed. In this case, the DMA completion IRQ is still useful because the 
DMA controller issues it after the "period" data has been transferred 
completely. Just monitoring the DMA registers will tell which data is 
currently being transferred, but you can't be sure that it has actually 
been committed. So if the DMA pointer is now at 0x0100, you cannot be 
sure whether the data at 0x0100 is already valid, or even the data at 
0x0098, because that might still be in the controller's queue. The 
pointer increments when the request is sent to the queue, but there's no 
guarantee as to when the queue will actually be executed, because other 
transactions may have higher priority.
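
One way to picture this in code (a sketch with hypothetical helper names, not
driver code): the position derived from counted completion interrupts is the
only one known to be in memory, and whatever lies between it and the live
hardware pointer may still be sitting in the controller's queue.

```c
#include <assert.h>
#include <stdint.h>

/* Buffer offset up to which data is known to be in memory, derived only
 * from the number of completion interrupts seen so far. */
static inline uint32_t committed_pos(uint32_t completed_periods,
                                     uint32_t period_bytes, uint32_t periods)
{
    assert(periods > 0);
    return (completed_periods % periods) * period_bytes;
}

/* Bytes between the committed position and the live hardware pointer.
 * These may have been queued but not yet written, so they must not be
 * treated as valid. Handles wrap-around at the end of the ring buffer. */
static inline uint32_t uncommitted_bytes(uint32_t hw_pos, uint32_t committed,
                                         uint32_t buffer_bytes)
{
    assert(buffer_bytes > 0);
    assert(hw_pos < buffer_bytes && committed < buffer_bytes);
    return (hw_pos + buffer_bytes - committed) % buffer_bytes;
}
```

For a 4096-byte buffer with 1024-byte periods, five completions put the
committed position at offset 1024; a live pointer at 1280 then leaves 256
bytes whose state is uncertain.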

Mike.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02 13:30           ` Mike Looijmans
@ 2013-07-02 14:58             ` Mark Brown
  0 siblings, 0 replies; 30+ messages in thread
From: Mark Brown @ 2013-07-02 14:58 UTC (permalink / raw)
  To: Mike Looijmans; +Cc: Joel Fernandes, alsa-devel, lars


[-- Attachment #1.1: Type: text/plain, Size: 1306 bytes --]

On Tue, Jul 02, 2013 at 03:30:51PM +0200, Mike Looijmans wrote:

> things that would be obvious for long-time developers, like finding
> out who the maintainer for a piece of code is. I still don't know
> that, by the way. How do I find the "CC" list that I'm supposed to
> send bugs/suggestions/patches to for a given piece of code?

MAINTAINERS and git log should give you a good guide - there's a script
called get_maintainer.pl in the kernel which will help but shouldn't be
100% relied on.  Basically just look at revision control history and
see who's been working on the code.

> I guess that a document on kernel-driver-development-for-people-who-used-to-work-with-a-centrally-organized-OS-and-used-to-get-all-their-answers-from-them
> would help, but then again finding that particular document - or
> realizing that it even exists (it doesn't, does it?) - would be the
> next problem.

That's what MAINTAINERS is there for.

> It's quite easy to find out how one goes about writing a driver, but
> the process surrounding it - such as finding whether such a driver
> already exists, where to go for technical advice and where to post
> the git patch for inclusion in mainline is something that no one
> seems to want to dwell on.

Documentation/SubmittingPatches is a pretty good starting point.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02  1:04   ` Joel Fernandes
@ 2013-07-03  9:07     ` Lars-Peter Clausen
  0 siblings, 0 replies; 30+ messages in thread
From: Lars-Peter Clausen @ 2013-07-03  9:07 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: alsa-devel

On 07/02/2013 03:04 AM, Joel Fernandes wrote:
> On 06/30/2013 07:06 AM, Lars-Peter Clausen wrote:
>> Added alsa-devel to Cc.
>>
>> On 06/28/2013 05:27 AM, Fernandes, Joel wrote:
>>> Hi Lars,
>>>
>>> Hope you are doing well.
>>>
>>> I am implementing Cyclic DMA support in the EDMA driver that is used by Davinci and now newer TI SoCs.
>>> I am thinking once I am done I can plug it into the snd_dmaengine framework.
>>>
>>> Currently, however, the davinci-pcm code programs the EDMA directly. That is what I am working to replace with a single driver and adapt to the snd dmaengine framework. However, the current code in davinci-pcm uses internal RAM (IRAM) as an intermediate step in the whole DMA process (data is first transferred from DRAM to IRAM, and then from IRAM to the audio device).
>>>
>>> Do you have any ideas on how we can adapt to the framework such that we can still use the IRAM? Are there any existing implementations out there that do something similar?
>>
>> Hm, I guess using the snd_dmaengine_pcm helper functions here shouldn't be too
>> hard. Using the generic snd_dmaengine_pcm driver will require some extensions
>> to it though. The mmp platform (pxa/mmp-pcm.c) is also using some kind of
>> on-chip memory, so having support for this in the generic driver certainly
> 
> I quickly looked at the implementation there. That's neat the way IRAM is used
> to allocate the DMA buffer.
> 
>> makes sense. For the chaining you'd probably have to extend the dmaengine
>> framework, since this kind of interleaved mem-to-mem and mem-to-dev cyclic
>> transfer is currently not possible.
> 
> I was wondering whether it makes sense to make this kind of intermediate IRAM step
> purely a DMA controller driver specific implementation. Basically, what I mean
> is that the use of IRAM would be unknown to all of the other DMA layers and
> implemented purely in the DMA controller driver, making the interleaving with IRAM
> transparent to the DMAEngine framework and the other drivers. Using device tree
> or some other method, one could indicate that IRAM is present and should be used
> for the specific DMA channel.

Putting the ping-pong buffer handling into the DMA driver would allow you to
re-implement the current functionality with the current dmaengine API. So
this sounds like an option. And maybe there are also other usecases besides
audio for this.

- Lars

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02 12:13         ` Mark Brown
  2013-07-02 13:40           ` Mike Looijmans
@ 2013-07-03  9:09           ` Lars-Peter Clausen
  2013-07-03  9:43             ` Mark Brown
  1 sibling, 1 reply; 30+ messages in thread
From: Lars-Peter Clausen @ 2013-07-03  9:09 UTC (permalink / raw)
  To: Mark Brown; +Cc: Mike Looijmans, Joel Fernandes, alsa-devel

On 07/02/2013 02:13 PM, Mark Brown wrote:
> On Tue, Jul 02, 2013 at 07:50:16AM +0200, Mike Looijmans wrote:
>> On 07/02/2013 05:33 AM, Joel Fernandes wrote:
[...]
>> This meant I had to use a DMA PARAM slot for every "period". The
>> OMAP L138 has 128 of those slots, so it's no problem to use a bunch
>> of them. Because the chain is cyclic, there is no need to update any
>> DMA parameter while running. All that ALSA needs to do is empty the
>> buffer before the cycle completes and the current position gets
>> overwritten.
> 
> This sort of cyclic thing tends to be best, ideally you don't need
> interrupts at all (other than a timer).

Yes, this is usually how it is done. But I wonder whether the EDMA
controller has only a small total number of slots available.

- Lars

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-03  9:09           ` Lars-Peter Clausen
@ 2013-07-03  9:43             ` Mark Brown
  2013-07-03 13:17               ` Mike Looijmans
       [not found]               ` <20130703094307.GE27646-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
  0 siblings, 2 replies; 30+ messages in thread
From: Mark Brown @ 2013-07-03  9:43 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: Mike Looijmans, Joel Fernandes, alsa-devel


On Wed, Jul 03, 2013 at 11:09:22AM +0200, Lars-Peter Clausen wrote:
> On 07/02/2013 02:13 PM, Mark Brown wrote:

> > This sort of cyclic thing tends to be best, ideally you don't need
> > interrupts at all (other than a timer).

> Yes, this is usually how it is done. But I'm wondering maybe the EDMA
> controller only has a small total amount of slots available.

Well, you don't need particularly many slots so long as you can cope
with a large period size.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-03  9:43             ` Mark Brown
@ 2013-07-03 13:17               ` Mike Looijmans
       [not found]                 ` <51D4245F.8070307-Oq418RWZeHk@public.gmane.org>
       [not found]               ` <20130703094307.GE27646-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
  1 sibling, 1 reply; 30+ messages in thread
From: Mike Looijmans @ 2013-07-03 13:17 UTC (permalink / raw)
  To: Mark Brown; +Cc: Joel Fernandes, alsa-devel, Lars-Peter Clausen

On 07/03/2013 11:43 AM, Mark Brown wrote:
> On Wed, Jul 03, 2013 at 11:09:22AM +0200, Lars-Peter Clausen wrote:
>> On 07/02/2013 02:13 PM, Mark Brown wrote:
>
>>> This sort of cyclic thing tends to be best, ideally you don't need
>>> interrupts at all (other than a timer).
>
>> Yes, this is usually how it is done. But I'm wondering maybe the EDMA
>> controller only has a small total amount of slots available.
>
> Well, you don't need particularly many slots so long as you can cope
> with a large period size.

On the OMAP L138, there are 128 PARAM slots. 32 of those are tied to 
hardware events (though you can use them if you aren't using the related 
hardware, for example the UART drivers don't use DMA so you can freely 
use those slots if you want), leaving (at least) 96 PARAM slots free. 
Both audio events are on the same controller, so you can't use the 128 
of the other one (the OMAP has 2 EDMA controllers). Only a few dozen of 
those are being used by various drivers, the SD card driver being the 
most hungry.
For the system to work, you can even get away with using only one slot,
and hence one period, but then you'll have to use mmap and a timer to
fill it.
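
To make the slot budget concrete, here is a minimal Python sketch. The totals (128 PARAM slots, 32 tied to hardware events) are taken from the paragraph above; the helper names, the one-slot-per-period rule applied to two streams, and the `used_by_others` figure are assumptions for illustration only, not values read from any driver.

```python
# Hypothetical model of the PARAM slot budget on one EDMA controller.
# TOTAL_SLOTS and EVENT_SLOTS come from the message; the rest is assumed.

TOTAL_SLOTS = 128   # PARAM slots on the controller
EVENT_SLOTS = 32    # slots tied to hardware events


def free_slots(used_by_others=0):
    """Slots left over once event-tied slots and other drivers are subtracted."""
    return TOTAL_SLOTS - EVENT_SLOTS - used_by_others


def slots_needed(periods_per_stream, streams=2):
    """Assume one PARAM slot per period, for each stream (playback + capture)."""
    return periods_per_stream * streams


def fits(periods_per_stream, used_by_others=0, streams=2):
    """True if the cyclic chains fit into the remaining slots."""
    return slots_needed(periods_per_stream, streams) <= free_slots(used_by_others)


if __name__ == "__main__":
    # 36 slots used by other drivers is an assumed stand-in for "a few dozen".
    print(free_slots(), slots_needed(8), fits(8, used_by_others=36))
```

With 8 periods per stream the two chains would need 16 slots, which fits comfortably under these assumptions.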

I experimented with various memory layouts. For large transfers, using 2 
big periods was quite enough. I also tested with very small period 
sizes. Using the original code, I was unable to reliably capture (to 
/dev/null) at period sizes below 80 samples. With the cyclic DMA, I 
could set a period size of only 40 samples and still be able to record 
audio reliably, when using only 8 periods. The same for playback, 
basically. So that's how I arrived at the MAX_PERIODS define of "8". It 
will only claim channels when you use them, so setting it to say "100" 
will not crash the system.

The period size is limited by the EDMA parameter set. It can only 
transfer 64k-1 "words" per slot. You can (and should!) use the McASP 
FIFO buffer to increase the word size, thus allowing for period sizes in 
the megabyte range.
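
The period-size limit above can be checked with a little arithmetic. This is a sketch only: the 64k-1 word limit is from the message, while the word sizes used in the example (4 bytes for a single 32-bit sample, 64 bytes for a FIFO-sized burst) are assumed values, not settings taken from the driver.

```python
# Hypothetical calculation of the largest period one PARAM slot can move.

MAX_WORDS_PER_SLOT = 64 * 1024 - 1  # "64k-1 words per slot", from the message


def max_period_bytes(word_bytes):
    """Largest period in bytes when each transferred word is word_bytes wide."""
    return MAX_WORDS_PER_SLOT * word_bytes


if __name__ == "__main__":
    print(max_period_bytes(4))    # one 32-bit sample per word (assumed)
    print(max_period_bytes(64))   # a 64-byte FIFO burst per word (assumed)
```

Under these assumptions a 4-byte word caps the period at roughly 256 KiB, while a 64-byte word pushes it to about 4 MiB, which matches the "megabyte range" remark.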

M.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [alsa-devel] Query on Audio DMA using DMAEngine
       [not found]               ` <20130703094307.GE27646-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
@ 2013-07-03 17:55                 ` Joel Fernandes
       [not found]                   ` <51D46598.6070005-l0cyMroinI0@public.gmane.org>
                                     ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Joel Fernandes @ 2013-07-03 17:55 UTC (permalink / raw)
  To: Mark Brown
  Cc: Mike Looijmans, alsa-devel-K7yf7f+aM1XWsZ/bQMPhNw,
	Lars-Peter Clausen,
	davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
	linux-omap-u79uwXL29TY76Z2rM5mHXA

Copying some more lists as we're also discussing the DMA controller in the
SoCs. Thanks.

On 07/03/2013 04:43 AM, Mark Brown wrote:
> On Wed, Jul 03, 2013 at 11:09:22AM +0200, Lars-Peter Clausen wrote:
>> On 07/02/2013 02:13 PM, Mark Brown wrote:
> 
>>> This sort of cyclic thing tends to be best, ideally you don't need
>>> interrupts at all (other than a timer).
> 
>> Yes, this is usually how it is done. But I'm wondering maybe the EDMA
>> controller only has a small total amount of slots available.
> 
> Well, you don't need particularly many slots so long as you can cope
> with a large period size.

Hi Mark,

When would it not be possible to cope with a large period size? Are there any
guidelines on what to consider when fixing a period size?

I see tegra and aux1x go up to .period_bytes_min = 1024

About slots, here are the numbers of slots on some SoCs with EDMA:

am1808 - 96 slots available + 32 taken up for channels, which can be reused with
some changes.
am335x - 172 slots available + 64 taken up for channels

On a slightly different note, about buffer_bytes_max, is there any drawback to
setting it to a smaller value? Currently about 128K is used in davinci-pcm.
My idea is to map the buffer directly to IRAM if the IRAM transfers are
really what is preventing the underruns, but 128K would be too much for the
buffer as we don't have that much IRAM; in fact, it is right at the limit on
am33xx (128K).


Thanks,

-Joel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [alsa-devel] Query on Audio DMA using DMAEngine
       [not found]                   ` <51D46598.6070005-l0cyMroinI0@public.gmane.org>
@ 2013-07-03 18:12                     ` Mark Brown
  2013-07-04  5:56                       ` Mike Looijmans
  0 siblings, 1 reply; 30+ messages in thread
From: Mark Brown @ 2013-07-03 18:12 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Mike Looijmans, alsa-devel-K7yf7f+aM1XWsZ/bQMPhNw,
	Lars-Peter Clausen,
	davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
	linux-omap-u79uwXL29TY76Z2rM5mHXA


On Wed, Jul 03, 2013 at 12:55:36PM -0500, Joel Fernandes wrote:

> When would it not be possible to cope with a large period size? Are there any
> guidelines on what to consider when fixing a period size?

This is an application issue not a driver issue.  An application that
wants low latency may need high resolution information about what
exactly the hardware is doing.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [alsa-devel] Query on Audio DMA using DMAEngine
  2013-07-03 17:55                 ` [alsa-devel] " Joel Fernandes
       [not found]                   ` <51D46598.6070005-l0cyMroinI0@public.gmane.org>
@ 2013-07-03 18:18                   ` Joel Fernandes
  2013-07-04  6:06                   ` Mike Looijmans
  2 siblings, 0 replies; 30+ messages in thread
From: Joel Fernandes @ 2013-07-03 18:18 UTC (permalink / raw)
  To: Darren Etheridge; +Cc: linux-omap, davinci-linux-open-source

On 07/03/2013 01:12 PM, Mark Brown wrote:
> On Wed, Jul 03, 2013 at 12:55:36PM -0500, Joel Fernandes wrote:
>
>> When would it not be possible to cope with a large period size? Are there any
>> guidelines on what to consider when fixing a period size?
>
> This is an application issue not a driver issue.  An application that
> wants low latency may need high resolution information about what
> exactly the hardware is doing.
>

Hi Darren,

  Can anyone on your team share some insight about the min/max buffer size and
min/max period size in the davinci-pcm driver? This will help make better
decisions about the cyclic DMA. Below is my original email to the mailing list.

Also could you copy Peter? I couldn't find his email address.

Thanks,
Joel


Quoting my original email to the mailing list:


Copying some more lists as we're also discussing the DMA controller in the
SoCs. Thanks.

On 07/03/2013 04:43 AM, Mark Brown wrote:
> On Wed, Jul 03, 2013 at 11:09:22AM +0200, Lars-Peter Clausen wrote:
>> On 07/02/2013 02:13 PM, Mark Brown wrote:
>
>>> This sort of cyclic thing tends to be best, ideally you don't need
>>> interrupts at all (other than a timer).
>
>> Yes, this is usually how it is done. But I'm wondering maybe the EDMA
>> controller only has a small total amount of slots available.
>
> Well, you don't need particularly many slots so long as you can cope
> with a large period size.

Hi Mark,

When would it not be possible to cope with a large period size? Are there any
guidelines on what to consider when fixing a period size?

I see tegra and aux1x go up to .period_bytes_min = 1024

About slots, here are the numbers of slots on some SoCs with EDMA:

am1808 - 96 slots available + 32 taken up for channels, which can be reused with
some changes.
am335x - 172 slots available + 64 taken up for channels

On a slightly different note, about buffer_bytes_max, is there any drawback to
setting it to a smaller value? Currently about 128K is used in davinci-pcm.
My idea is to map the buffer directly to IRAM if the IRAM transfers are
really what is preventing the underruns, but 128K would be too much for the
buffer as we don't have that much IRAM; in fact, it is right at the limit on
am33xx (128K).


Thanks,

-Joel



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
       [not found]                 ` <51D4245F.8070307-Oq418RWZeHk@public.gmane.org>
@ 2013-07-03 19:56                   ` Joel Fernandes
  0 siblings, 0 replies; 30+ messages in thread
From: Joel Fernandes @ 2013-07-03 19:56 UTC (permalink / raw)
  To: Mike Looijmans
  Cc: davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
	alsa-devel-K7yf7f+aM1XWsZ/bQMPhNw, Mark Brown,
	Lars-Peter Clausen, linux-omap-u79uwXL29TY76Z2rM5mHXA

Hi Mike,

On 07/03/2013 08:17 AM, Mike Looijmans wrote:
> On 07/03/2013 11:43 AM, Mark Brown wrote:
>> On Wed, Jul 03, 2013 at 11:09:22AM +0200, Lars-Peter Clausen wrote:
>>> On 07/02/2013 02:13 PM, Mark Brown wrote:
>>
>>>> This sort of cyclic thing tends to be best, ideally you don't need
>>>> interrupts at all (other than a timer).
>>
>>> Yes, this is usually how it is done. But I'm wondering maybe the EDMA
>>> controller only has a small total amount of slots available.
>>
>> Well, you don't need particularly many slots so long as you can cope
>> with a large period size.
> 
> On the OMAP L138, there are 128 PARAM slots. 32 of those are tied to hardware
> events (though you can use them if you aren't using the related hardware, for
> example the UART drivers don't use DMA so you can freely use those slots if you
> want), leaving (at least) 96 PARAM slots free. Both audio events are on the same
> controller, so you can't use the 128 of the other one (the OMAP has 2 EDMA
> controllers). Only a few dozen of those are being used by various drivers, the
> SD card driver being the most hungry.
> For the system to work, you can even get away with only using one slot, and
> hence one period, but then you'll have to use a mmap and a timer to fill it.
> 
> I experimented with various memory layouts. For large transfers, using 2 big
> periods was quite enough. I also tested with very small period sizes. Using the

Wouldn't very small periods take up too many interrupts, and also occupy lots of
slots?

> original code, I was unable to reliably capture (to /dev/null) at period sizes
> below 80 samples. With the cyclic DMA, I could set a period size of only 40
> samples and still be able to record audio reliably, when using only 8 periods.
> The same for playback, basically. So that's how I arrived at the MAX_PERIODS
> define of "8". It will only claim channels when you use them, so setting it to
> say "100" will not crash the system.

Thanks for your post Mike. It makes more sense to me now.
Correct me if I'm wrong but:
- The more periods, the finer the granularity, but the drawback is that you need
more slots and take more interrupts, so we want only as many periods as we need.
I still don't know, though, how we arrive at an acceptable number that userspace
expects.
- The periods also determine the buffer size. If in future we want to use IRAM
as the buffer, which is limited on some users of davinci-pcm, there might not
be enough buffer space.
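
As a rough way to weigh these two points, the following Python sketch computes the interrupt rate and buffer size for a given layout. It assumes one interrupt per period, a 48 kHz stream with 4-byte frames (both made-up example values), and uses the 128K am33xx IRAM figure mentioned earlier in the thread; the function name is hypothetical.

```python
# Hypothetical model of the period-count trade-off: interrupts vs. buffer size.

IRAM_BYTES = 128 * 1024  # am33xx IRAM size as quoted earlier in the thread


def period_tradeoffs(rate_hz, period_frames, periods, frame_bytes):
    """Return (interrupts per second, total buffer size in bytes).

    Assumes exactly one interrupt is raised per completed period.
    """
    irq_per_sec = rate_hz / period_frames
    buffer_bytes = period_frames * periods * frame_bytes
    return irq_per_sec, buffer_bytes


if __name__ == "__main__":
    for period_frames in (40, 1024, 4096):
        irqs, size = period_tradeoffs(48000, period_frames, 8, 4)
        print(period_frames, irqs, size, size <= IRAM_BYTES)
```

In this model, 8 periods of 40 frames cost 1200 interrupts per second but only 1280 bytes of buffer, while 8 periods of 4096 frames fill the whole 128K.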

So too many periods is certainly not a good thing. I wonder how we can arrive at
what would constitute an acceptable number? As Linus said, "we never break
userspace :P" so I'd rather not change anything that breaks someone's audio
application.

I will soon post some RFC notes that capture our discussion and the other ideas
I had put together for EDMA, to summarize them and get everyone's opinion. I
will copy you on that as well. Thanks.

-Joel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-03 18:12                     ` Mark Brown
@ 2013-07-04  5:56                       ` Mike Looijmans
  2013-07-04 10:49                         ` Mark Brown
  0 siblings, 1 reply; 30+ messages in thread
From: Mike Looijmans @ 2013-07-04  5:56 UTC (permalink / raw)
  To: Mark Brown
  Cc: Joel Fernandes, alsa-devel, Lars-Peter Clausen,
	davinci-linux-open-source, linux-omap

On 07/03/2013 08:12 PM, Mark Brown wrote:
> On Wed, Jul 03, 2013 at 12:55:36PM -0500, Joel Fernandes wrote:
>
>> When would it not be possible to cope with a large period size? Are there any
>> guidelines on what to consider when fixing a period size?
>
> This is an application issue not a driver issue.  An application that
> wants low latency may need high resolution information about what
> exactly the hardware is doing.

To get low latency, the best thing from userspace is to mmap the audio 
buffer, and monitor the position of the DMA transfers. If the driver 
reports the DMA position accurately, you can get latencies of only a few 
samples. I must admit that I know next to nothing about how ALSA works 
in userspace, but that's how DirectSound works, for example. And from 
what I've seen, this is also possible with ALSA.

Even without that - I tried with small periods of only 40 samples; this
invariably fails on the current driver, with or without the ping-pong.
Using the cyclic DMA I had no problem using such small periods.

Mike.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [alsa-devel] Query on Audio DMA using DMAEngine
  2013-07-03 17:55                 ` [alsa-devel] " Joel Fernandes
       [not found]                   ` <51D46598.6070005-l0cyMroinI0@public.gmane.org>
  2013-07-03 18:18                   ` [alsa-devel] " Joel Fernandes
@ 2013-07-04  6:06                   ` Mike Looijmans
  2013-07-04 10:53                     ` Mark Brown
       [not found]                     ` <51D510EA.1030809-Oq418RWZeHk@public.gmane.org>
  2 siblings, 2 replies; 30+ messages in thread
From: Mike Looijmans @ 2013-07-04  6:06 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Mark Brown, Lars-Peter Clausen, alsa-devel, linux-omap,
	davinci-linux-open-source

On 07/03/2013 07:55 PM, Joel Fernandes wrote:
> Copying some more lists are we're also discussing the DMA controller in the
> SoCs. Thanks.
>
> On 07/03/2013 04:43 AM, Mark Brown wrote:
>> On Wed, Jul 03, 2013 at 11:09:22AM +0200, Lars-Peter Clausen wrote:
>>> On 07/02/2013 02:13 PM, Mark Brown wrote:
>>
>>>> This sort of cyclic thing tends to be best, ideally you don't need
>>>> interrupts at all (other than a timer).
>>
>>> Yes, this is usually how it is done. But I'm wondering maybe the EDMA
>>> controller only has a small total amount of slots available.
>>
>> Well, you don't need particularly many slots so long as you can cope
>> with a large period size.
>
> Hi Mark,
>
> When would it not be possible to cope with a large period size? Are there any
> guidelines on what to consider when fixing a period size?
>
> I see tegra and aux1x go upto .period_bytes_min = 1024
>
> About slots, following are no.of slots on some SoCs with EDMA:
>
> am1808 - 96 slots available + 32 taken up for channel but can be reused with
> some changes.
> am335x - 172 slots available + 64 taken up for channels
>
> On a slightly different note, about buffer_bytes_max, is there any drawback to
> setting it to a smaller value? Currently 128K is about what is used on davinci-pcm.
> My idea is to map to do the direct mapping to IRAM if the IRAM transfers are
> really what are preventing the under runs, but 128K will be too much for the
> buffer as we don't have that much IRAM infact it is just the boundary on am33xx
> (128K)

In any case, using the IRAM directly might have some use, because you
don't have to compete for the DDRRAM with other devices. But I never
understood what the ping-pong via IRAM was supposed to accomplish; I
don't see why McASP -> IRAM -> DDRRAM (or the other way around) would be
better than just McASP -> DDRRAM, especially since the McASP has a
built-in 256 byte FIFO buffer on both channels. In all my measurements,
using the IRAM ping-pong only made things worse in terms of overruns and
underruns, not better.

Does anyone know why the ping-pong was implemented and what kind of usage
it was intended for?

Mike.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-04  5:56                       ` Mike Looijmans
@ 2013-07-04 10:49                         ` Mark Brown
  0 siblings, 0 replies; 30+ messages in thread
From: Mark Brown @ 2013-07-04 10:49 UTC (permalink / raw)
  To: Mike Looijmans
  Cc: Joel Fernandes, alsa-devel, Lars-Peter Clausen,
	davinci-linux-open-source, linux-omap


On Thu, Jul 04, 2013 at 07:56:25AM +0200, Mike Looijmans wrote:
> On 07/03/2013 08:12 PM, Mark Brown wrote:

> >This is an application issue not a driver issue.  An application that
> >wants low latency may need high resolution information about what
> >exactly the hardware is doing.

> To get low-latency, the best thing from userspace is to mmap the
> audio buffer, and monitor the position of the DMA transfers. If the
> driver reports the DMA position accurately, you can get latencies of
> only a few samples. I must admit that I know next to nothing about
> how ALSA works in userspace, but that's how DirectSound works, for
> example. And from what I've seen, this is also possible with ALSA.

There are often hardware limitations that mean that it is not possible
to know the actual position of the DMA with anything less than period
accuracy - either the hardware just doesn't report the current status
during a transfer or it reports something that's not quite what's needed
to usefully interact with it.  The former is depressingly common.  The
APIs can support peering at the current position but it's not something
that a portable application should be relying on.
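
A small Python model may help show what period accuracy means for an application. It is a simplification under stated assumptions: the function names are hypothetical, and the "period-only" case simply rounds the true position down to the last completed period boundary, which is one plausible behaviour and not a description of any particular DMA controller.

```python
# Hypothetical model of DMA position reporting granularity.


def reported_position(true_frames, buffer_frames, period_frames=None):
    """Position inside the ring buffer as the application would see it.

    period_frames=None models hardware that reports the exact position;
    otherwise the position is rounded down to the last period boundary.
    """
    pos = true_frames % buffer_frames
    if period_frames is None:
        return pos
    return (pos // period_frames) * period_frames


def worst_case_error(period_frames):
    """Largest gap, in frames, between the true and the reported position."""
    return period_frames - 1


if __name__ == "__main__":
    # 8 periods of 40 frames, i.e. a 320-frame ring buffer.
    print(reported_position(1035, 320))       # exact hardware
    print(reported_position(1035, 320, 40))   # period-accurate hardware
    print(worst_case_error(40))
```

In this model, an application relying on the reported position has to budget for up to one period of uncertainty when the hardware is only period accurate.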

> Even without that - I tried with small periods of only 40 samples,
> this invariably fails on the current driver, with or without the
> ping-ping. Using the cyclic DMA I had no problem using such small
> periods.

The period size is generally orthogonal to decisions about using cyclic
DMA.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-04  6:06                   ` Mike Looijmans
@ 2013-07-04 10:53                     ` Mark Brown
       [not found]                     ` <51D510EA.1030809-Oq418RWZeHk@public.gmane.org>
  1 sibling, 0 replies; 30+ messages in thread
From: Mark Brown @ 2013-07-04 10:53 UTC (permalink / raw)
  To: Mike Looijmans
  Cc: Joel Fernandes, alsa-devel, Lars-Peter Clausen,
	davinci-linux-open-source, linux-omap


On Thu, Jul 04, 2013 at 08:06:34AM +0200, Mike Looijmans wrote:

> In any case, using the IRAM directly might have some use, because
> you don't have to compete for the DDRRAM with other devices. But I
> never understood what the ping-ping via IRAM was supposed to
> accomplish, I don't see why McASP -> IRAM -> DDRRAM (or the other
> way around) would be better than just McASP -> DDRRAM. Especially
> since the McASP has a built-in 256 byte FIFO buffer on both
> channels. In all my measurements, using the IRAM ping-pong only made
> things worse in terms of overruns and underruns, not better.

> Anyone who know why the ping-pong was implemented and what kind of
> usage it was intended for?

Pushing the audio through some static RAM is normally implemented in
order to save power - when doing this you can put the dynamic RAM into a
lower power state for more of the time, only waking it up to burst data
to or from the static RAM (assuming an otherwise idle system).  This is
more normally used for playback than for capture but the same idea
applies in both cases.
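
The power argument can be put into numbers with a short Python sketch. All inputs here are assumptions for illustration: a 64 KiB static RAM region (half of the 128K IRAM mentioned earlier in the thread) and a 48 kHz stream with 4-byte frames. The helper name is hypothetical.

```python
# Hypothetical estimate of how often dynamic RAM must wake up to refill
# (or drain) a static RAM staging buffer during audio streaming.


def dram_wake_interval_s(sram_bytes, rate_hz, frame_bytes):
    """Seconds of audio held in the static RAM, i.e. time between bursts."""
    return sram_bytes / (rate_hz * frame_bytes)


if __name__ == "__main__":
    interval = dram_wake_interval_s(64 * 1024, 48000, 4)
    print(f"DRAM needs to wake roughly every {interval:.3f} s")
```

Under these assumptions the dynamic RAM would only need to be active for a short burst about three times per second, which is where the potential saving comes from on an otherwise idle system.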

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [alsa-devel] Query on Audio DMA using DMAEngine
       [not found]                     ` <51D510EA.1030809-Oq418RWZeHk@public.gmane.org>
@ 2013-07-04 10:59                       ` Sekhar Nori
  0 siblings, 0 replies; 30+ messages in thread
From: Sekhar Nori @ 2013-07-04 10:59 UTC (permalink / raw)
  To: Mike Looijmans
  Cc: davinci-linux-open-source-VycZQUHpC/PFrsHnngEfi1aTQe2KTcn/,
	Lars-Peter Clausen, Joel Fernandes,
	alsa-devel-K7yf7f+aM1XWsZ/bQMPhNw, Mark Brown,
	linux-omap-u79uwXL29TY76Z2rM5mHXA

On 7/4/2013 11:36 AM, Mike Looijmans wrote:
> On 07/03/2013 07:55 PM, Joel Fernandes wrote:
>> Copying some more lists are we're also discussing the DMA controller
>> in the
>> SoCs. Thanks.
>>
>> On 07/03/2013 04:43 AM, Mark Brown wrote:
>>> On Wed, Jul 03, 2013 at 11:09:22AM +0200, Lars-Peter Clausen wrote:
>>>> On 07/02/2013 02:13 PM, Mark Brown wrote:
>>>
>>>>> This sort of cyclic thing tends to be best, ideally you don't need
>>>>> interrupts at all (other than a timer).
>>>
>>>> Yes, this is usually how it is done. But I'm wondering maybe the EDMA
>>>> controller only has a small total amount of slots available.
>>>
>>> Well, you don't need particularly many slots so long as you can cope
>>> with a large period size.
>>
>> Hi Mark,
>>
>> When would it not be possible to cope with a large period size? Are
>> there any
>> guidelines on what to consider when fixing a period size?
>>
>> I see tegra and aux1x go upto .period_bytes_min = 1024
>>
>> About slots, following are no.of slots on some SoCs with EDMA:
>>
>> am1808 - 96 slots available + 32 taken up for channel but can be
>> reused with
>> some changes.
>> am335x - 172 slots available + 64 taken up for channels
>>
>> On a slightly different note, about buffer_bytes_max, is there any
>> drawback to
>> setting it to a smaller value? Currently 128K is about what is used on
>> davinci-pcm.
>> My idea is to map to do the direct mapping to IRAM if the IRAM
>> transfers are
>> really what are preventing the under runs, but 128K will be too much
>> for the
>> buffer as we don't have that much IRAM infact it is just the boundary
>> on am33xx
>> (128K)
> 
> In any case, using the IRAM directly might have some use, because you
> don't have to compete for the DDRRAM with other devices. But I never
> understood what the ping-ping via IRAM was supposed to accomplish, I
> don't see why McASP -> IRAM -> DDRRAM (or the other way around) would be
> better than just McASP -> DDRRAM. Especially since the McASP has a
> built-in 256 byte FIFO buffer on both channels. In all my measurements,
> using the IRAM ping-pong only made things worse in terms of overruns and
> underruns, not better.
> 
> Anyone who know why the ping-pong was implemented and what kind of usage
> it was intended for?

The McBSP peripheral that was included in DaVinci devices like the DM644x
did not come with a FIFO. Due to the latency of DDR accesses, channel swaps
caused by lost samples were observed on these devices, and the IRAM
implementation helped there.
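
To illustrate why a missing FIFO makes DDR latency so critical, here is a minimal Python sketch of the service deadline. The 256-byte FIFO size is the McASP figure quoted above; the stream format (48 kHz, 2 channels, 2 bytes per sample) and the function name are assumptions for the example.

```python
# Hypothetical estimate of how long the DMA may be delayed before a
# serial audio port loses a sample.


def service_deadline_us(rate_hz, channels, sample_bytes, fifo_bytes=0):
    """Microseconds available to service the port before data is lost.

    fifo_bytes=0 models a port without a FIFO, where every word must be
    moved before the next one arrives.
    """
    word_rate = rate_hz * channels
    if fifo_bytes == 0:
        return 1e6 / word_rate
    return 1e6 * fifo_bytes / (word_rate * sample_bytes)


if __name__ == "__main__":
    print(service_deadline_us(48000, 2, 2))        # no FIFO
    print(service_deadline_us(48000, 2, 2, 256))   # 256-byte FIFO
```

With these example numbers the deadline grows from about 10 microseconds without a FIFO to over a millisecond with one, so a slow DDR access is far less likely to drop a sample.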

Thanks,
Sekhar

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02  1:28     ` Joel Fernandes
  2013-07-02  6:02       ` Mike Looijmans
@ 2013-07-04 11:00       ` Clemens Ladisch
  1 sibling, 0 replies; 30+ messages in thread
From: Clemens Ladisch @ 2013-07-04 11:00 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: Mike Looijmans, alsa-devel, lars

Joel Fernandes wrote:
> On 07/01/2013 01:10 AM, Mike Looijmans wrote:
>> The trouble with the current davinci driver is that the IRQ handler has a
>> real-time requirement, it must finish before the next DMA block completes. This
>> causes most of the buffer overruns on heavily loaded systems.
>
> But how do you get around not calling snd_pcm_period_elapsed in a time-sensitive
> fashion?

To ensure that other stuff is completed first, snd_pcm_period_elapsed()
could be called later from a tasklet.  (snd_pcm_period_elapsed() calls
the .pointer callback, which could be another source of delays depending
on how many hardware accesses it performs.)


Regards,
Clemens

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-07-02  5:50       ` Mike Looijmans
  2013-07-02 12:13         ` Mark Brown
@ 2013-08-14  4:30         ` Joel Fernandes
  2013-08-14  4:53           ` Joel Fernandes
  2013-08-14 12:06           ` Mark Brown
  1 sibling, 2 replies; 30+ messages in thread
From: Joel Fernandes @ 2013-08-14  4:30 UTC (permalink / raw)
  To: Mike Looijmans; +Cc: alsa-devel, lars

Hi Mike,

On 07/02/2013 12:50 AM, Mike Looijmans wrote:
[..]
> 
>> Either way I'm sure your multi-slot approach is superior, but I don't
>> see how
>> you can get away with not updating the DMA addresses on every IRQ with
>> the
>> current davinci-pcm or EDMA controller (Unless you use a complicated
>> mechanism
>> like ping-pong where the address updates take care of itself). If you
>> are using
>> a set of chained slots, you only have so many slots so you have to
>> continuously
>> change addresses of the slots at some point or the other for a large
>> transfer.
> 
> I use a chain like this:
> 
> DMA1 -> DMA2 -> DMA... -> DMA1
> 
> This meant I had to use a DMA PARAM slot for every "period". The OMAP
> L138 has 128 of those slots, so it's no problem to use a bunch of them.
> Because the chain is cyclic, there is no need to update any DMA
> parameter while running. All that ALSA needs to do is empty the buffer
> before the cycle completes and the current position gets overwritten.

Replying to this thread after a long time but just wondering, how do you
guarantee in your implementation that DMA will not empty the buffer
faster than it is filled?

Thanks,

-Joel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-08-14  4:30         ` Joel Fernandes
@ 2013-08-14  4:53           ` Joel Fernandes
  2013-08-14 14:10             ` Mike Looijmans
  2013-08-14 12:06           ` Mark Brown
  1 sibling, 1 reply; 30+ messages in thread
From: Joel Fernandes @ 2013-08-14  4:53 UTC (permalink / raw)
  To: Mike Looijmans; +Cc: alsa-devel, lars

On 08/13/2013 11:30 PM, Joel Fernandes wrote:
> Hi Mike,
> 
> On 07/02/2013 12:50 AM, Mike Looijmans wrote:
> [..]
>>
>>> Either way I'm sure your multi-slot approach is superior, but I don't
>>> see how
>>> you can get away with not updating the DMA addresses on every IRQ with
>>> the
>>> current davinci-pcm or EDMA controller (Unless you use a complicated
>>> mechanism
>>> like ping-pong where the address updates take care of itself). If you
>>> are using
>>> a set of chained slots, you only have so many slots so you have to
>>> continuously
>>> change addresses of the slots at some point or the other for a large
>>> transfer.
>>
>> I use a chain like this:
>>
>> DMA1 -> DMA2 -> DMA... -> DMA1
>>
>> This meant I had to use a DMA PARAM slot for every "period". The OMAP
>> L138 has 128 of those slots, so it's no problem to use a bunch of them.
>> Because the chain is cyclic, there is no need to update any DMA
>> parameter while running. All that ALSA needs to do is empty the buffer
>> before the cycle completes and the current position gets overwritten.
> 
> [Joel] Replying to this thread after a long time but just wondering, how do you
> guarantee in your implementation that DMA will not empty the buffer
> faster than it is filled?


I guess this is also what you've called the overrun condition in some
threads.

-Joel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-08-14  4:30         ` Joel Fernandes
  2013-08-14  4:53           ` Joel Fernandes
@ 2013-08-14 12:06           ` Mark Brown
  1 sibling, 0 replies; 30+ messages in thread
From: Mark Brown @ 2013-08-14 12:06 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: Mike Looijmans, alsa-devel, lars


On Tue, Aug 13, 2013 at 11:30:54PM -0500, Joel Fernandes wrote:

> Replying to this thread after a long time but just wondering, how do you
> guarantee in your implementation that DMA will not empty the buffer
> faster than it is filled?

Userspace is ultimately responsible for supplying data - if the
application can't keep up then the application should get an underrun
reported and restart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Query on Audio DMA using DMAEngine
  2013-08-14  4:53           ` Joel Fernandes
@ 2013-08-14 14:10             ` Mike Looijmans
  0 siblings, 0 replies; 30+ messages in thread
From: Mike Looijmans @ 2013-08-14 14:10 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: alsa-devel, lars

On 08/14/2013 06:53 AM, Joel Fernandes wrote:
> On 08/13/2013 11:30 PM, Joel Fernandes wrote:
>> Hi Mike,
>>
>> On 07/02/2013 12:50 AM, Mike Looijmans wrote:
>> [..]
>>>
>>>> Either way I'm sure your multi-slot approach is superior, but I don't
>>>> see how
>>>> you can get away with not updating the DMA addresses on every IRQ with
>>>> the
>>>> current davinci-pcm or EDMA controller (Unless you use a complicated
>>>> mechanism
>>>> like ping-pong where the address updates take care of itself). If you
>>>> are using
>>>> a set of chained slots, you only have so many slots so you have to
>>>> continuously
>>>> change addresses of the slots at some point or the other for a large
>>>> transfer.
>>>
>>> I use a chain like this:
>>>
>>> DMA1 -> DMA2 -> DMA... -> DMA1
>>>
>>> This meant I had to use a DMA PARAM slot for every "period". The OMAP
>>> L138 has 128 of those slots, so it's no problem to use a bunch of them.
>>> Because the chain is cyclic, there is no need to update any DMA
>>> parameter while running. All that ALSA needs to do is empty the buffer
>>> before the cycle completes and the current position gets overwritten.
>>
>> [Joel] Replying to this thread after a long time but just wondering, how do you
>> guarantee in your implementation that DMA will not empty the buffer
>> faster than it is filled?
>
> I guess this is also what you've called the overrun condition in some
> threads.

Indeed. ALSA monitors the "position" of the ring, and when the DMA
passes the application's "cursor", it reports an underrun (playback) or
an overrun (capture).
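
A minimal sketch of that position check, in C. This is not the ALSA core code: the struct and function names are hypothetical, both pointers are modelled as ever-growing frame counters (the real implementation wraps them at a boundary value), and the threshold is assumed to be the full buffer size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the ring; counters are total frames since start. */
struct ring_state {
    uint64_t hw_ptr;      /* frames the DMA has transferred so far */
    uint64_t appl_ptr;    /* frames the application has read or written */
    uint64_t buffer_size; /* ring size in frames */
};

/* Capture: frames filled by the DMA that the application has not read yet. */
static uint64_t capture_avail(const struct ring_state *r)
{
    return r->hw_ptr - r->appl_ptr;
}

/* Playback: free frames the application may still write. */
static uint64_t playback_avail(const struct ring_state *r)
{
    return r->hw_ptr + r->buffer_size - r->appl_ptr;
}

/* Overrun: the ring is full, so the next DMA write hits unread data. */
static bool capture_overrun(const struct ring_state *r)
{
    return capture_avail(r) >= r->buffer_size;
}

/* Underrun: the ring is empty, so the DMA has nothing new to play.
 * Only meaningful once the stream is running. */
static bool playback_underrun(const struct ring_state *r)
{
    return playback_avail(r) >= r->buffer_size;
}
```

The check only detects that the DMA has caught up with the application. It does nothing to prevent it, which is why the application has to keep pace.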

There is no guarantee - only verification. The user's application must
keep up, or suffer the consequences. My customer has been using the
modified driver to capture 16 channels of 32-bit data at 50 kHz for quite
a while now. Before the modification, it wasn't even possible to
reliably capture more than 4 channels.
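
To put a number on "keep up", here is a rough sketch of the arithmetic in C. The channel count, sample width and rate are the figures above, assuming each 32-bit sample occupies 4 bytes in memory. The period count and period size passed to ring_cycle_us() are purely hypothetical, and both function names are made up for illustration.

```c
#include <stdint.h>

/* Sustained data rate the application has to drain, in bytes per second. */
static uint64_t bytes_per_second(unsigned channels,
                                 unsigned bytes_per_sample,
                                 unsigned rate_hz)
{
    return (uint64_t)channels * bytes_per_sample * rate_hz;
}

/* Time the DMA needs to travel once around the cyclic chain, in
 * microseconds. The application must never fall this far behind. */
static uint64_t ring_cycle_us(unsigned periods,
                              unsigned period_frames,
                              unsigned rate_hz)
{
    return (uint64_t)periods * period_frames * 1000000u / rate_hz;
}
```

bytes_per_second(16, 4, 50000) gives 3200000, i.e. 3.2 MB/s. With a hypothetical chain of 8 periods of 1000 frames each, ring_cycle_us(8, 1000, 50000) gives 160000, so the application would have at most 160 ms before unread data gets overwritten.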

Mike.



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread (newest message: 2013-08-14 14:10 UTC)

Thread overview: 30+ messages
     [not found] <083BC63EECB6FD41B8E81CF7FD87CC0F2E4F1488@DLEE08.ent.ti.com>
2013-06-30 12:06 ` Query on Audio DMA using DMAEngine Lars-Peter Clausen
2013-07-01  6:10   ` Mike Looijmans
2013-07-02  1:28     ` Joel Fernandes
2013-07-02  6:02       ` Mike Looijmans
2013-07-02 12:16         ` Mark Brown
2013-07-02 13:30           ` Mike Looijmans
2013-07-02 14:58             ` Mark Brown
2013-07-04 11:00       ` Clemens Ladisch
2013-07-02  3:33     ` Joel Fernandes
2013-07-02  5:50       ` Mike Looijmans
2013-07-02 12:13         ` Mark Brown
2013-07-02 13:40           ` Mike Looijmans
2013-07-03  9:09           ` Lars-Peter Clausen
2013-07-03  9:43             ` Mark Brown
2013-07-03 13:17               ` Mike Looijmans
     [not found]                 ` <51D4245F.8070307-Oq418RWZeHk@public.gmane.org>
2013-07-03 19:56                   ` Joel Fernandes
     [not found]               ` <20130703094307.GE27646-GFdadSzt00ze9xe1eoZjHA@public.gmane.org>
2013-07-03 17:55                 ` [alsa-devel] " Joel Fernandes
     [not found]                   ` <51D46598.6070005-l0cyMroinI0@public.gmane.org>
2013-07-03 18:12                     ` Mark Brown
2013-07-04  5:56                       ` Mike Looijmans
2013-07-04 10:49                         ` Mark Brown
2013-07-03 18:18                   ` [alsa-devel] " Joel Fernandes
2013-07-04  6:06                   ` Mike Looijmans
2013-07-04 10:53                     ` Mark Brown
     [not found]                     ` <51D510EA.1030809-Oq418RWZeHk@public.gmane.org>
2013-07-04 10:59                       ` [alsa-devel] " Sekhar Nori
2013-08-14  4:30         ` Joel Fernandes
2013-08-14  4:53           ` Joel Fernandes
2013-08-14 14:10             ` Mike Looijmans
2013-08-14 12:06           ` Mark Brown
2013-07-02  1:04   ` Joel Fernandes
2013-07-03  9:07     ` Lars-Peter Clausen
