On Thu, Oct 22, 2020 at 11:50:41AM +0200, Maxime Ripard wrote:

> This is caused by the HDMI driver polling some status bit that reports
> that the infoframes have been properly sent, and calling usleep_range
> between each iteration[1], and that is done in our trigger callback that
> seems to be run with a spinlock taken and the interrupt disabled
> (snd_pcm_action_lock_irq) as part of snd_pcm_start_lock_irq. This is the
> entire stack trace:

That doesn't sound like something I would expect you to be doing in the
trigger callback TBH - it feels like if this is something that could
block then the setup should have been done during parameter
configuration or something rather than in trigger.

> It looks like the snd_soc_dai_link structure has a nonatomic flag that
> seems to be made to address more or less that issue, taking a mutex
> instead of a spinlock. However setting that flag results in another
> lockdep issue, since the dmaengine controller doing the DMA transfer
> would call snd_pcm_period_elapsed on completion, in a tasklet, this time
> taking a mutex in an atomic context which is just as bad as the initial
> issue. This is the stacktrace this time:

Like Jaroslav says you could punt to a workqueue here.  I'd be more
inclined to move the sleeping stuff out of the trigger operations, but
the workqueue would avoid the issue too.  There are some drivers doing
this already IIRC.

> So, I'm not really sure what I'm supposed to do here. The drivers
> involved don't appear to be doing anything extraordinary, but the issues
> lockdep report are definitely valid too. What are the expectations in
> terms of context from ALSA when running the callbacks, and how can we
> fix it?

To me having something in the trigger that needs waiting for is the bit
that feels like the most awkward fit here; trigger is supposed to run
very quickly.