All of lore.kernel.org
 help / color / mirror / Atom feed
* Fwd: question about alsa tracepoints and alsa debugging
       [not found] <CALd3UbSrAqYFe5Z-S6NMKGaVMvOZx7C4zV5O7CwtTKPmV+UeVQ@mail.gmail.com>
@ 2020-12-22  5:23 ` Bert Schiettecatte
  2020-12-23  4:31   ` Bert Schiettecatte
  0 siblings, 1 reply; 7+ messages in thread
From: Bert Schiettecatte @ 2020-12-22  5:23 UTC (permalink / raw)
  To: alsa-devel

Hi all,

I have a rk3288 board I am working with, it has two ADCs and one DAC.
The ADCs are connected via an FPGA to the rk3288. The DAC is directly
connected to the I2S port of the rk3288. The rk3288 generates the
master clock. I am using buildroot and busybox and kernel 5.9.12
(stable).

I have a driver which receives the ADC data from the FPGA via SDIO.
When I run speaker-test in the background and use arecord to record
simultaneously in the foreground, both arecord and speaker-test seem
to lock up. When I run speaker-test alone I do not have a problem. The
playback device uses the rockchip_i2s.c driver while the capture
device uses my SDIO driver.

My driver based on dw_mmc.c calls snd_pcm_period_elapsed() for each
buffer of frames that is ready. I can see that this is being called
for a while when I start arecord in the foreground, until the lockup
occurs. The moment I start arecord, the console output from
speaker-test stops, so it's locked up by starting arecord. Previously
this was working fine on kernel 4.11. However, since I updated to
kernel 5.9.12 this no longer works. Has anything changed between 4.11
and 5.9.12 that could be the cause of the problem?

commands used for starting playback / trace capture and for recording -
./perf record -F 99 -g -e snd_pcm:* -e sched:sched_switch -e
sched:sched_wakeup -e irq:* speaker-test -D
hw:CARD=ak4458ak5558aud,DEV=0 -c8 -r192000 -FS32_LE -tsine &
arecord -Dhw:CARD=ak4458ak5558aud,DEV=1 -fS32_LE -c16 -r192000
/mnt/ramdisk/rec.wav

I enabled the kernel options as described at
https://www.kernel.org/doc/html/v5.9/sound/designs/tracepoints.html
and have used perf to record a trace, which can be access here -
https://kernel-debugging.s3-us-west-1.amazonaws.com/perf.data.script.211220202054.txt
(exported using "perf script")

The trace file shows "lockdep_recursion" at some point so I am
wondering if my problem has to do with a deadlock somehow being
caused. However, I also see handle_mm_fault, and do_page_fault, which
has to do with memory mapping? So I'm not sure what the problem
exactly is and what to do next to find out what's wrong.

Thanks in advance for your help and interest and patience,
Bert

^ permalink raw reply	[flat|nested] 7+ messages in thread

* question about alsa tracepoints and alsa debugging
  2020-12-22  5:23 ` Fwd: question about alsa tracepoints and alsa debugging Bert Schiettecatte
@ 2020-12-23  4:31   ` Bert Schiettecatte
  2020-12-23  9:56     ` Lars-Peter Clausen
  0 siblings, 1 reply; 7+ messages in thread
From: Bert Schiettecatte @ 2020-12-23  4:31 UTC (permalink / raw)
  To: alsa-devel

Hi all,

I have a rk3288 board I am working with, it has two ADCs and one DAC.
The ADCs are connected via an FPGA to the rk3288. The DAC is directly
connected to the I2S port of the rk3288. The rk3288 generates the
master clock. I am using buildroot and busybox and kernel 5.9.12
(stable).

I have a driver which receives the ADC data from the FPGA via SDIO.
When I run speaker-test in the background and use arecord to record
simultaneously in the foreground, both arecord and speaker-test seem
to lock up. When I run speaker-test alone I do not have a problem. The
playback device uses the rockchip_i2s.c driver while the capture
device uses my SDIO driver.

My driver based on dw_mmc.c calls snd_pcm_period_elapsed() for each
buffer of frames that is ready. I can see that this is being called
for a while when I start arecord in the foreground, until the lockup
occurs. The moment I start arecord, the console output from
speaker-test stops, so it's locked up by starting arecord. Previously
this was working fine on kernel 4.11. However, since I updated to
kernel 5.9.12 this no longer works. Has anything changed between 4.11
and 5.9.12 that could be the cause of the problem?

commands used for starting playback / trace capture and for recording -
./perf record -F 99 -g -e snd_pcm:* -e sched:sched_switch -e
sched:sched_wakeup -e irq:* speaker-test -D
hw:CARD=ak4458ak5558aud,DEV=0 -c8 -r192000 -FS32_LE -tsine &
arecord -Dhw:CARD=ak4458ak5558aud,DEV=1 -fS32_LE -c16 -r192000
/mnt/ramdisk/rec.wav

I enabled the kernel options as described at
https://www.kernel.org/doc/html/v5.9/sound/designs/tracepoints.html
and have used perf to record a trace, which can be access here -
https://kernel-debugging.s3-us-west-1.amazonaws.com/perf.data.script.211220202054.txt
(exported using "perf script")

The trace file shows "lockdep_recursion" at some point so I am
wondering if my problem has to do with a deadlock somehow being
caused. However, I also see handle_mm_fault, and do_page_fault, which
has to do with memory mapping? So I'm not sure what the problem
exactly is and what to do next to find out what's wrong.

Thanks in advance for your help and interest and patience,
Bert

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about alsa tracepoints and alsa debugging
  2020-12-23  4:31   ` Bert Schiettecatte
@ 2020-12-23  9:56     ` Lars-Peter Clausen
  2020-12-24 20:40       ` Bert Schiettecatte
  0 siblings, 1 reply; 7+ messages in thread
From: Lars-Peter Clausen @ 2020-12-23  9:56 UTC (permalink / raw)
  To: Bert Schiettecatte, alsa-devel

On 12/23/20 5:31 AM, Bert Schiettecatte wrote:
> Hi all,
>
> I have a rk3288 board I am working with, it has two ADCs and one DAC.
> The ADCs are connected via an FPGA to the rk3288. The DAC is directly
> connected to the I2S port of the rk3288. The rk3288 generates the
> master clock. I am using buildroot and busybox and kernel 5.9.12
> (stable).
>
> I have a driver which receives the ADC data from the FPGA via SDIO.
> When I run speaker-test in the background and use arecord to record
> simultaneously in the foreground, both arecord and speaker-test seem
> to lock up. When I run speaker-test alone I do not have a problem. The
> playback device uses the rockchip_i2s.c driver while the capture
> device uses my SDIO driver.
>
> My driver based on dw_mmc.c calls snd_pcm_period_elapsed() for each
> buffer of frames that is ready. I can see that this is being called
> for a while when I start arecord in the foreground, until the lockup
> occurs. The moment I start arecord, the console output from
> speaker-test stops, so it's locked up by starting arecord.
Are these implemented as two separate sound cards? If so I don't see how 
they could interfere with each other. There should not be any shared 
resources or locks between them at least on the ALSA side.
> Previously this was working fine on kernel 4.11. However, since I updated to
> kernel 5.9.12 this no longer works. Has anything changed between 4.11
> and 5.9.12 that could be the cause of the problem?

There are almost 300k commits between v4.11 and v5.9.12 one of them 
surely could have added a new issue :)

If possible try to narrow this down a bit more, this will help to track 
this down. If this is a more general issue it is quite likely that this 
is a more recently introduced problem, otherwise others would have 
already run into it. Maybe try one of the stable kernels like v5.4.

>
> commands used for starting playback / trace capture and for recording -
> ./perf record -F 99 -g -e snd_pcm:* -e sched:sched_switch -e
> sched:sched_wakeup -e irq:* speaker-test -D
> hw:CARD=ak4458ak5558aud,DEV=0 -c8 -r192000 -FS32_LE -tsine &
> arecord -Dhw:CARD=ak4458ak5558aud,DEV=1 -fS32_LE -c16 -r192000
> /mnt/ramdisk/rec.wav
>
> I enabled the kernel options as described at
> https://www.kernel.org/doc/html/v5.9/sound/designs/tracepoints.html
> and have used perf to record a trace, which can be access here -
> https://kernel-debugging.s3-us-west-1.amazonaws.com/perf.data.script.211220202054.txt
> (exported using "perf script")
>
> The trace file shows "lockdep_recursion" at some point so I am
> wondering if my problem has to do with a deadlock somehow being
> caused.

lockdep_recursion is not even a function. I think this is just an 
incorrectly resolved stack trace.

To check if there is a locking issue somewhere compile your kernel with 
CONFIG_PROVE_LOCKING=y

Do both devices I2S and MMC use the same DMA? If so I could see things 
going wrong there since that would be shared resources in the DMA 
driver. If possible try to run the MMC driver without DMA and see what 
happens.

- Lars


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about alsa tracepoints and alsa debugging
  2020-12-23  9:56     ` Lars-Peter Clausen
@ 2020-12-24 20:40       ` Bert Schiettecatte
  2020-12-25  0:59         ` Takashi Sakamoto
  0 siblings, 1 reply; 7+ messages in thread
From: Bert Schiettecatte @ 2020-12-24 20:40 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: alsa-devel

Hi Lars-Peter

> > occurs. The moment I start arecord, the console output from
> > speaker-test stops, so it's locked up by starting arecord.
> Are these implemented as two separate sound cards? If so I don't see how
> they could interfere with each other. There should not be any shared
> resources or locks between them at least on the ALSA side.

here are the definitions I've used in my driver - I had to translate
my old code from kernel 4.11 to using the new macros so not sure if
something got broken while doing that. I want my DAC and ADCs to be
one sound card device.

SND_SOC_DAILINK_DEFS(ak4458,
                     DAILINK_COMP_ARRAY(COMP_CPU("ff890000.i2s")),
                     DAILINK_COMP_ARRAY(COMP_CODEC(NULL, "ak4458")),
                     DAILINK_COMP_ARRAY(COMP_PLATFORM("ff890000.i2s")));

SND_SOC_DAILINK_DEFS(ak5558,
                     DAILINK_COMP_ARRAY(COMP_CPU("audio-cpld")),
                     DAILINK_COMP_ARRAY(COMP_CODEC(NULL, "ak5558")),
                     DAILINK_COMP_ARRAY(COMP_PLATFORM("audio-cpld")));

enum {
        DAILINK_AK4458,
        DAILINK_AK5558,
};

static struct snd_soc_dai_link rockchip_ak4458_ak5558_dai[] = {
        [DAILINK_AK4458] = {
                        .name = "Audio out",
                        .ops = &rockchip_ak4458_dai_ops,
                        SND_SOC_DAILINK_REG(ak4458),
        },
        [DAILINK_AK5558] = {
                        .name = "Audio in",
                        .ops = &rockchip_ak5558_dai_ops,
                        SND_SOC_DAILINK_REG(ak5558),
        },
};

static struct snd_soc_card rockchip_ak4458_ak5558 = {
        .name           = "ak4458-ak5558-audio",
        .dai_link       = rockchip_ak4458_ak5558_dai,
        .num_links      = ARRAY_SIZE(rockchip_ak4458_ak5558_dai),
};

> If possible try to narrow this down a bit more, this will help to track
> this down. If this is a more general issue it is quite likely that this
> is a more recently introduced problem, otherwise others would have
> already run into it. Maybe try one of the stable kernels like v5.4.

I will play around with a few different versions to see what happens.

> To check if there is a locking issue somewhere compile your kernel with
> CONFIG_PROVE_LOCKING=y

do I need PROVE_RAW_LOCK_NESTING as well? what about PROVE_RCU and
PROVE_RCU_LIST ?

> Do both devices I2S and MMC use the same DMA? If so I could see things
> going wrong there since that would be shared resources in the DMA
> driver. If possible try to run the MMC driver without DMA and see what
> happens.

I think they both use DMA but I don't think the resources are shared,
from what I can tell. My version of dw_mmc.c is 99% the same code but
I just directly call my own callbacks in dw_mci_interrupt
instead of dealing with the tasklet or work stuff.

Thanks for helping!
Bert

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about alsa tracepoints and alsa debugging
  2020-12-24 20:40       ` Bert Schiettecatte
@ 2020-12-25  0:59         ` Takashi Sakamoto
  2020-12-25  3:06           ` Bert Schiettecatte
  0 siblings, 1 reply; 7+ messages in thread
From: Takashi Sakamoto @ 2020-12-25  0:59 UTC (permalink / raw)
  To: Bert Schiettecatte, Lars-Peter Clausen; +Cc: alsa-devel

Hi,

> commands used for starting playback / trace capture and for recording -
> ./perf record -F 99 -g -e snd_pcm:* -e sched:sched_switch -e
> sched:sched_wakeup -e irq:* speaker-test -D
> hw:CARD=ak4458ak5558aud,DEV=0 -c8 -r192000 -FS32_LE -tsine &
> arecord -Dhw:CARD=ak4458ak5558aud,DEV=1 -fS32_LE -c16 -r192000
> /mnt/ramdisk/rec.wav
>
> I enabled the kernel options as described at
> https://www.kernel.org/doc/html/v5.9/sound/designs/tracepoints.html
> and have used perf to record a trace, which can be access here -
> https://kernel-debugging.s3-us-west-1.amazonaws.com/perf.data.script.211220202054.txt
> (exported using "perf script")
>
> The trace file shows "lockdep_recursion" at some point so I am
> wondering if my problem has to do with a deadlock somehow being
> caused.

According to the log, speaker-test starts PCM substream for playback
direction with below parameters:

 - bits per sample: 32 bit
 - samples per frame: 8
 - frames per second: 192000
 - frames per period: 4096
 - frames per buffer: 16384
 - ...

By calculation with the above parameters, the hardware of I2S interface
should be configured to generate harware interrupt every 21.3 msec.

I wrote a Python 3 script to check the period of interrupt with a focus
on the record including call of 'snd_pcm_period_elapsed' in its backtrace:
https://gist.github.com/takaswie/af2b93b58a1154903d6fbe1a6bee3c69

```
$ ./parse.py /tmp/perf.data.script.211220202054.txt | \
  grep -B 3 -A 3 unexpected
73.2000 0.02132900 expected
73.2213 0.02133500 expected
73.2426 0.02135300 expected
110.0207 36.77807000 unexpected
110.0420 0.02133000 expected
110.0634 0.02133400 expected
110.0847 0.02132400 expected
```

I can find around 37 second gap in timestamp 110.0207. It means that the
I2S driver failed to control RK3288 I2S interface for periodical hardware
interrupt.

Furthermore, I can see following records after the gap. It means that
speaker-test process successfully detected XRUN of the PCM substream,
then recoverd it to continue processing PCM frames for playback
expectedly. Thus the I2S driver seems to have few problems about
dead-lock issue.

With records of tracepoints events for capture PCM substream
which Bert's SDIO driver handles as well as for the playback PCM
substream which speaker-test starts, we can get more details of
runtimes.

P.S. diagrams for the case:

```
+-----------+       +----+          +------+         +-------+
|AK5558 x 2 |<-(?)->|FPGA|          |RK3288|         | AK4458|
| (ADC)     |<-(?)->|    |<-(SDIO)->|      |<-(I2S)->| (DAC) |
+-----------+       +----+          +------+         +-------+
```


Regards

Takashi Sakamoto

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about alsa tracepoints and alsa debugging
  2020-12-25  0:59         ` Takashi Sakamoto
@ 2020-12-25  3:06           ` Bert Schiettecatte
  2020-12-25  4:46             ` Bert Schiettecatte
  0 siblings, 1 reply; 7+ messages in thread
From: Bert Schiettecatte @ 2020-12-25  3:06 UTC (permalink / raw)
  To: Bert Schiettecatte, Lars-Peter Clausen, alsa-devel

Hi Takashi-san

On Thu, Dec 24, 2020 at 5:00 PM Takashi Sakamoto
<o-takashi@sakamocchi.jp> wrote:
> With records of tracepoints events for capture PCM substream
> which Bert's SDIO driver handles as well as for the playback PCM
> substream which speaker-test starts, we can get more details of
> runtimes.

Thanks so much for your detailed explanation. I tried capturing a new
trace for both speaker-test and arecord by running perf from a
different console and keeping an eye on /var/log/messages and it's
clear there is a deadlock, please see below. In my code I can see that
spin_lock_irqsave() is being called in the dw_mci_interrupt() which
then calls some other code which again calls spin_lock_irqsave() so I
guess that is what the problem is, and I have to restructure my code
to avoid that. What I don't understand, is why it says speaker-test is
trying to acquire the lock, while it's arecord that is using the
capture device to record (which uses my sdio driver). I guess it
doesn't matter because playback and capture are a single device (?).

Thanks!
Bert

[  187.653313]
[  187.656519] ============================================
[  187.663958] WARNING: possible recursive locking detected
[  187.671392] 5.9.12-00015-g3d36682d8cdf-dirty #74 Not tainted
[  187.679214] --------------------------------------------
[  187.686613] speaker-test/193 is trying to acquire lock:
[  187.693886] ed337cbc (&host->irq_lock#2){-...}-{2:2}, at:
dw_mci_set_drto+0xa0/0xf8
[  187.703898]
[  187.703898] but task is already holding lock:
[  187.713189] ed337cbc (&host->irq_lock#2){-...}-{2:2}, at:
dw_mci_interrupt+0x468/0x4e4
[  187.723463]
[  187.723463] other info that might help us debug this:
[  187.733531]  Possible unsafe locking scenario:
[  187.733531]
[  187.742727]        CPU0
[  187.746692]        ----
[  187.750624]   lock(&host->irq_lock#2);
[  187.755998]   lock(&host->irq_lock#2);
[  187.761326]
[  187.761326]  *** DEADLOCK ***
[  187.761326]
[  187.771203]  May be due to missing lock nesting notation
[  187.771203]
[  187.780964] 1 lock held by speaker-test/193:
[  187.786799]  #0: ed337cbc (&host->irq_lock#2){-...}-{2:2}, at:
dw_mci_interrupt+0x468/0x4e4
[  187.797239]
[  187.797239] stack backtrace:
[  187.804121] CPU: 0 PID: 193 Comm: speaker-test Not tainted
5.9.12-00015-g3d36682d8cdf-dirty #74
[  187.814905] Hardware name: Rockchip (Device Tree)
[  187.821204] [<c0311930>] (unwind_backtrace) from [<c030c200>]
(show_stack+0x10/0x14)
[  187.830964] [<c030c200>] (show_stack) from [<c0f4b9b0>]
(dump_stack+0xd0/0xf0)
[  187.840147] [<c0f4b9b0>] (dump_stack) from [<c03adc34>]
(__lock_acquire+0xff0/0x3394)
[  187.850018] [<c03adc34>] (__lock_acquire) from [<c03b0c74>]
(lock_acquire+0x134/0x56c)
[  187.859997] [<c03b0c74>] (lock_acquire) from [<c0f62900>]
(_raw_spin_lock_irqsave+0x4c/0x94)
[  187.870578] [<c0f62900>] (_raw_spin_lock_irqsave) from [<c0d0fca8>]
(dw_mci_set_drto+0xa0/0xf8)
[  187.881474] [<c0d0fca8>] (dw_mci_set_drto) from [<c0d10208>]
(dw_mci_interrupt+0x484/0x4e4)
[  187.891976] [<c0d10208>] (dw_mci_interrupt) from [<c03bf90c>]
(__handle_irq_event_percpu+0x70/0x484)
[  187.903363] [<c03bf90c>] (__handle_irq_event_percpu) from
[<c03bfd50>] (handle_irq_event_percpu+0x30/0x8c)
[  187.915376] [<c03bfd50>] (handle_irq_event_percpu) from
[<c03bfde4>] (handle_irq_event+0x38/0x5c)
[  187.926515] [<c03bfde4>] (handle_irq_event) from [<c03c46fc>]
(handle_fasteoi_irq+0xcc/0x124)
[  187.937255] [<c03c46fc>] (handle_fasteoi_irq) from [<c03be990>]
(generic_handle_irq+0x34/0x44)
[  187.948107] [<c03be990>] (generic_handle_irq) from [<c03befe4>]
(__handle_domain_irq+0x7c/0xe8)
[  187.959073] [<c03befe4>] (__handle_domain_irq) from [<c07a1340>]
(gic_handle_irq+0x58/0x9c)
[  187.969677] [<c07a1340>] (gic_handle_irq) from [<c0300f74>]
(__irq_usr+0x54/0x80)
[  187.979345] Exception stack(0xec40ffb0 to 0xec40fff8)
[  187.986316] ffa0:                                     bff0a126
b6ef7ef0 00000044 b6ef7670
[  187.996843] ffc0: 00000000 00000000 b6bfea54 b6bfb008 00000003
b6bfb008 000252a0 00025140
[  188.007419] ffe0: 000250b0 beb2484c 00012f44 b6ebc26c 60030010 ffffffff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: question about alsa tracepoints and alsa debugging
  2020-12-25  3:06           ` Bert Schiettecatte
@ 2020-12-25  4:46             ` Bert Schiettecatte
  0 siblings, 0 replies; 7+ messages in thread
From: Bert Schiettecatte @ 2020-12-25  4:46 UTC (permalink / raw)
  To: Bert Schiettecatte, Lars-Peter Clausen, alsa-devel

Hi Takashi-san, Lars-Peter

> clear there is a deadlock, please see below. In my code I can see that
> spin_lock_irqsave() is being called in the dw_mci_interrupt() which
> then calls some other code which again calls spin_lock_irqsave() so I
> guess that is what the problem is, and I have to restructure my code
> to avoid that.

I managed to solve my problem by removing the accidentally introduced
extra locking call in dw_mci_set_drto(). So now everything seems to
work smoothly.

Thanks again for your help!
Bert

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-12-25  4:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CALd3UbSrAqYFe5Z-S6NMKGaVMvOZx7C4zV5O7CwtTKPmV+UeVQ@mail.gmail.com>
2020-12-22  5:23 ` Fwd: question about alsa tracepoints and alsa debugging Bert Schiettecatte
2020-12-23  4:31   ` Bert Schiettecatte
2020-12-23  9:56     ` Lars-Peter Clausen
2020-12-24 20:40       ` Bert Schiettecatte
2020-12-25  0:59         ` Takashi Sakamoto
2020-12-25  3:06           ` Bert Schiettecatte
2020-12-25  4:46             ` Bert Schiettecatte

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.