alsa-devel.alsa-project.org archive mirror
 help / color / mirror / Atom feed
From: Robin Gong <yibin.gong@nxp.com>
To: Richard Leitner <richard.leitner@skidata.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"alsa-devel@alsa-project.org" <alsa-devel@alsa-project.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "timur@kernel.org" <timur@kernel.org>,
	"nicoleotsuka@gmail.com" <nicoleotsuka@gmail.com>,
	"vkoul@kernel.org" <vkoul@kernel.org>,
	dl-linux-imx <linux-imx@nxp.com>,
	"kernel@pengutronix.de" <kernel@pengutronix.de>,
	"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
	"shawnguo@kernel.org" <shawnguo@kernel.org>,
	Benjamin Bara <benjamin.bara@skidata.com>
Subject: RE: pcm|dmaengine|imx-sdma race condition on i.MX6
Date: Fri, 14 Aug 2020 08:45:17 +0000	[thread overview]
Message-ID: <VE1PR04MB6638EE5BDBE2C65FF50B7DB889400@VE1PR04MB6638.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <20200813112258.GA327172@pcleri>

On 2020/08/13 19:23: Richard Leitner <richard.leitner@skidata.com> wrote:
> Hi,
> we've found a race condition with the PCM on the i.MX6 which results in an
> -EIO for the SNDRV_PCM_IOCTL_READI_FRAMES ioctl after an -EPIPE (XRUN).
> 
> A possible reproduction may look like the following reduced call graph during a
> PCM capture:
> 
> us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES)
>       - wait_for_avail()
>         - schedule_timeout()
>    -> snd_pcm_update_hw_ptr0()
>       - snd_pcm_update_state: EPIPE (XRUN)
>       - sdma_disable_channel_async() # get's scheduled away due to sleep us
> <- ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns -EPIPE us ->
> ioctl(SNDRV_PCM_IOCTL_PREPARE) # as reaction to the EPIPE (XRUN) us ->
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) # next try to capture frames
>       - sdma_prep_dma_cyclic()
>         - sdma_load_context() # not loaded as context_loaded is 1
>       - wait_for_avail()
>         - schedule_timeout()
> # now the sdma_channel_terminate_work() comes back and sets #
> context_loaded = false and frees in vchan_dma_desc_free_list().
> us <- ioctl returns -EIO (capture write error (DMA or IRQ trouble?))
Seems the write error caused by context_loaded not set to false before
next transfer start? If yes, please have a try with the 03/04 of the below
patch set, anyway, could you post your failure log?
https://lkml.org/lkml/2020/8/11/111

> 
> 
> What we have found out, based on our understanding:
> The dmaengine docu states that a dmaengine_terminate_async() must be
> followed by a dmaengine_synchronize().
> However, in the pcm_dmaengine.c, only dmaengine_terminate_async() is
> called (for performance reasons and because it might be called from an
> interrupt handler).
> 
> In our tests, we saw that the user-space immediately calls
> ioctl(SNDRV_PCM_IOCTL_PREPARE) as a handler for the happened xrun
> (previous ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns with -EPIPE). In
> our case (imx-sdma.c), the terminate really happens asynchronously with a
> worker thread which is not awaited/synchronized by the
> ioctl(SNDRV_PCM_IOCTL_PREPARE) call.
> 
> Since the syscall immediately enters an atomic context
> (snd_pcm_stream_lock_irq()), we are not able to flush the work of the
> termination worker from within the DMA context. This leads to an
> unterminated DMA getting re-initialized and then terminated.
> 
> On the i.MX6 platform the problem is (if I got it correctly) that the
> sdma_channel_terminate_work() called after the -EPIPE gets scheduled away
> (for the 1-2ms sleep [1]). During that time the userspace already sends in the
> ioctl(SNDRV_PCM_IOCTL_PREPARE) and
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).
> As none of them are anyhow synchronized to the terminate_worker the
> vchan_dma_desc_free_list() [2] and "sdmac->context_loaded = false;" [3] are
> executed during the wait_for_avail() [4] of the
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).
> 
> To make sure we identified the problem correctly we've tested to add a
> "dmaengine_synchronize()" before the snd_pcm_prepare() in [5]. This fixed the
> race condition in all our tests. (Before we were able to reproduce it in 100% of
> the test runs).
> 
> Based on our understanding, there are two different points to ensure the
> termination:
> Either ensure that the termination is finished within the previous
> SNDRV_PCM_IOCTL_READI_FRAMES call (inside the DMA context) or finishing
> it in the SNDRV_PCM_IOCTL_PREPARE call (and all other applicable ioclts)
> before entering the atomic context (from the PCM context).
> 
> We initially thought about implementing the first approach, basically splitting
> up the dma_device terminate_all operation into a sync
> (busy-wait) and a async one. This would align the operations with the
> DMAengine interface and would enable a sync termination variant from atomic
> contexts.
> However, we saw that the dma_free_attrs() function has a WARN_ON on irqs
> disabled, which would be the case for the sync variant. 
> Side note: We found this issue on the current v5.4.y LTS branch, but it also
> affects v5.8.y.
> 
> Any feedback or pointers how we may fix the problem are warmly welcome!
> If anything is unclear please just ask :-)
> 
> regards;
> Richard Leitner
> Benjamin Bara
> 
> [1]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
> L1066&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
> e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
> 329145824068928&amp;sdata=D9F%2FRUG27xv9nv8J1KtrLtld2eaI6gsXiWIAIgk
> Avjw%3D&amp;reserved=0
> [2]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
> L1071&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
> e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
> 329145824068928&amp;sdata=0EKDVgzOZzL7TpX4ykhqjvpz5ryUHUpWw7frRe
> cksBU%3D&amp;reserved=0
> [3]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fdrivers%2Fdma%2Fimx-sdma.c%23
> L1072&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7
> e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637
> 329145824068928&amp;sdata=aIhatvb1ocQqyYCVFEg71LgJlRBoVusbDFPIxnte
> PuY%3D&amp;reserved=0
> [4]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fsound%2Fcore%2Fpcm_lib.c%23L1
> 825&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f7e
> 7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6373
> 29145824073919&amp;sdata=y0Udbd%2FKGaVgqLrcp6fNOlMlFCGHCMfojkpp
> B4HzUuE%3D&amp;reserved=0
> [5]https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.
> bootlin.com%2Flinux%2Fv5.8%2Fsource%2Fsound%2Fcore%2Fpcm_native.c%2
> 3L3226&amp;data=02%7C01%7Cyibin.gong%40nxp.com%7C79ad115b01ef453f
> 7e7408d83f7b3c4d%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C63
> 7329145824073919&amp;sdata=ch3BQ5DDGU5HWXqIZSvUeFnBoRoP%2BMM
> HEpnk8mIfWj8%3D&amp;reserved=0

  reply	other threads:[~2020-08-16  7:54 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-13 11:22 pcm|dmaengine|imx-sdma race condition on i.MX6 Richard Leitner
2020-08-14  8:45 ` Robin Gong [this message]
     [not found]   ` <7f98cd6d30404e4d9d621f57f45ae441@skidata.com>
2020-08-17  5:38     ` Richard Leitner
2020-08-17  7:28   ` Benjamin Bara - SKIDATA
2020-08-17  9:22     ` Robin Gong
2020-08-17 11:38       ` Benjamin Bara - SKIDATA
2020-08-18 10:41         ` Robin Gong
2020-08-19 11:08     ` Lars-Peter Clausen
2020-08-19 11:16       ` Lars-Peter Clausen
2020-08-19 14:15       ` Benjamin Bara - SKIDATA
2020-08-19 14:25       ` Benjamin Bara - SKIDATA
2020-08-20 15:01         ` Robin Gong
2020-08-21  4:34           ` Richard Leitner
2020-08-21  9:21             ` Robin Gong
2020-08-21  9:54               ` Richard Leitner
2020-08-20  6:52       ` Sascha Hauer
2020-08-21  9:52         ` Robin Gong
2020-08-25  6:12           ` Sascha Hauer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=VE1PR04MB6638EE5BDBE2C65FF50B7DB889400@VE1PR04MB6638.eurprd04.prod.outlook.com \
    --to=yibin.gong@nxp.com \
    --cc=alsa-devel@alsa-project.org \
    --cc=benjamin.bara@skidata.com \
    --cc=dan.j.williams@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=kernel@pengutronix.de \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-imx@nxp.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nicoleotsuka@gmail.com \
    --cc=richard.leitner@skidata.com \
    --cc=shawnguo@kernel.org \
    --cc=timur@kernel.org \
    --cc=vkoul@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).