From: Robin Gong <yibin.gong@nxp.com>
To: Richard Leitner <richard.leitner@skidata.com>,
"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
"alsa-devel@alsa-project.org" <alsa-devel@alsa-project.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: "timur@kernel.org" <timur@kernel.org>,
"nicoleotsuka@gmail.com" <nicoleotsuka@gmail.com>,
"vkoul@kernel.org" <vkoul@kernel.org>,
dl-linux-imx <linux-imx@nxp.com>,
"kernel@pengutronix.de" <kernel@pengutronix.de>,
"dan.j.williams@intel.com" <dan.j.williams@intel.com>,
"shawnguo@kernel.org" <shawnguo@kernel.org>,
Benjamin Bara <benjamin.bara@skidata.com>
Subject: RE: pcm|dmaengine|imx-sdma race condition on i.MX6
Date: Fri, 14 Aug 2020 08:45:17 +0000
Message-ID: <VE1PR04MB6638EE5BDBE2C65FF50B7DB889400@VE1PR04MB6638.eurprd04.prod.outlook.com>
In-Reply-To: <20200813112258.GA327172@pcleri>
On 2020/08/13 19:23: Richard Leitner <richard.leitner@skidata.com> wrote:
> Hi,
> we've found a race condition with the PCM on the i.MX6 which results in an
> -EIO for the SNDRV_PCM_IOCTL_READI_FRAMES ioctl after an -EPIPE (XRUN).
>
> A possible reproduction may look like the following reduced call graph during a
> PCM capture:
>
> us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES)
> - wait_for_avail()
> - schedule_timeout()
> -> snd_pcm_update_hw_ptr0()
> - snd_pcm_update_state: EPIPE (XRUN)
> - sdma_disable_channel_async() # gets scheduled away due to sleep
> us <- ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returns -EPIPE
> us -> ioctl(SNDRV_PCM_IOCTL_PREPARE) # as reaction to the EPIPE (XRUN)
> us -> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) # next try to capture frames
> - sdma_prep_dma_cyclic()
> - sdma_load_context() # not loaded as context_loaded is 1
> - wait_for_avail()
> - schedule_timeout()
> # now the sdma_channel_terminate_work() comes back, sets
> # context_loaded = false and frees in vchan_dma_desc_free_list().
> us <- ioctl returns -EIO (capture write error (DMA or IRQ trouble?))
It seems the write error is caused by context_loaded not being set to false
before the next transfer starts? If so, please give patch 03/04 of the patch
set below a try. In any case, could you post your failure log?
https://lkml.org/lkml/2020/8/11/111
>
>
> What we have found out, based on our understanding:
> The dmaengine documentation states that a dmaengine_terminate_async() must
> be followed by a dmaengine_synchronize().
> However, in pcm_dmaengine.c, only dmaengine_terminate_async() is
> called (for performance reasons and because it might be called from an
> interrupt handler).
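The terminate_async/synchronize contract quoted above can be pictured with a small userspace model. This is a sketch under stated assumptions, not kernel code: every model_* name is hypothetical, a POSIX thread stands in for the imx-sdma terminate worker, and nanosleep() stands in for its short sleep. What it shows is that only the synchronize step guarantees the deferred work (freeing descriptors, clearing context_loaded) has finished before the channel is reused.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model of a DMA channel; not the dmaengine API. */
struct model_chan {
	pthread_t worker;
	bool worker_pending; /* terminate worker started but not yet waited for */
	bool context_loaded; /* mirrors sdmac->context_loaded */
	int queued_descs;    /* descriptors still owned by the channel */
};

/* Deferred part of the termination, run on its own thread. */
static void *model_terminate_work(void *arg)
{
	struct model_chan *c = arg;
	/* Assumed stand-in for the short sleep in the real worker. */
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 2000000L };

	nanosleep(&delay, NULL);
	c->queued_descs = 0;       /* like vchan_dma_desc_free_list() */
	c->context_loaded = false; /* like "sdmac->context_loaded = false;" */
	return NULL;
}

/* Like dmaengine_terminate_async(): starts termination, returns at once. */
static int model_terminate_async(struct model_chan *c)
{
	if (pthread_create(&c->worker, NULL, model_terminate_work, c) != 0)
		return -1;
	c->worker_pending = true;
	return 0;
}

/* Like dmaengine_synchronize(): blocks until the deferred work is done. */
static void model_synchronize(struct model_chan *c)
{
	if (c->worker_pending) {
		pthread_join(c->worker, NULL);
		c->worker_pending = false;
	}
}

/* Reusing the channel is only valid after termination was synchronized. */
static void model_prep_cyclic(struct model_chan *c)
{
	assert(!c->worker_pending);
	c->context_loaded = true;
	c->queued_descs = 1;
}
```

Intended order: model_terminate_async(), then model_synchronize(), then model_prep_cyclic(). Skipping the synchronize step trips the assert, which is the model's version of the rule from the dmaengine documentation.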
>
> In our tests, we saw that user space immediately calls
> ioctl(SNDRV_PCM_IOCTL_PREPARE) to handle the xrun (the
> previous ioctl(SNDRV_PCM_IOCTL_READI_FRAMES) returned -EPIPE). In
> our case (imx-sdma.c), the termination really happens asynchronously in a
> worker thread, which is not awaited/synchronized by the
> ioctl(SNDRV_PCM_IOCTL_PREPARE) call.
>
> Since the syscall immediately enters an atomic context
> (snd_pcm_stream_lock_irq()), we are not able to flush the work of the
> termination worker from within the DMA context. This leads to an
> unterminated DMA getting re-initialized and then terminated.
>
> On the i.MX6 platform the problem is (if I got it correctly) that the
> sdma_channel_terminate_work() called after the -EPIPE gets scheduled away
> (for the 1-2ms sleep [1]). During that time the userspace already sends in the
> ioctl(SNDRV_PCM_IOCTL_PREPARE) and
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).
> As neither of them is synchronized with the terminate worker, the
> vchan_dma_desc_free_list() [2] and "sdmac->context_loaded = false;" [3] are
> executed during the wait_for_avail() [4] of the
> ioctl(SNDRV_PCM_IOCTL_READI_FRAMES).
>
> To make sure we identified the problem correctly, we tested adding a
> dmaengine_synchronize() before the snd_pcm_prepare() in [5]. This fixed the
> race condition in all our tests (before, we were able to reproduce it in 100% of
> the test runs).
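The interleaving in the quoted call graph, and the effect of the dmaengine_synchronize() experiment, can be replayed deterministically. A minimal single-threaded sketch, assuming a simplified channel in which the deferred terminate work both frees every queued descriptor and clears context_loaded, as the report describes for sdma_channel_terminate_work(). All model_* names are hypothetical, and -EIO stands in for the wait_for_avail() timeout that ALSA reports to user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical single-threaded channel model; names are illustrative only. */
struct model_sdma_chan {
	bool context_loaded; /* mirrors sdmac->context_loaded */
	int descs;           /* descriptors queued on the virtual channel */
	bool work_pending;   /* terminate work scheduled but not yet run */
};

/* Deferred terminate work: free all descriptors, forget the loaded context. */
static void model_terminate_work(struct model_sdma_chan *c)
{
	assert(c->work_pending);
	c->descs = 0;
	c->context_loaded = false;
	c->work_pending = false;
}

/* XRUN path: the async terminate only schedules the work. */
static void model_disable_channel_async(struct model_sdma_chan *c)
{
	c->work_pending = true;
}

/* Like dmaengine_synchronize(): make sure pending terminate work has run. */
static void model_synchronize(struct model_sdma_chan *c)
{
	if (c->work_pending)
		model_terminate_work(c);
}

/* Next transfer: the context is only (re)loaded if context_loaded is clear. */
static void model_prep_cyclic(struct model_sdma_chan *c)
{
	if (!c->context_loaded)
		c->context_loaded = true; /* models sdma_load_context() */
	c->descs = 1;
}

/* Replays the reported sequence; returns 0 or -EIO. */
static int model_replay(bool sync_before_prepare)
{
	struct model_sdma_chan c = { .context_loaded = true, .descs = 1, .work_pending = false };

	model_disable_channel_async(&c); /* READI_FRAMES hits XRUN, -EPIPE */
	if (sync_before_prepare)
		model_synchronize(&c);   /* the experiment: sync before prepare */
	model_prep_cyclic(&c);           /* PREPARE + next READI_FRAMES */
	if (c.work_pending)              /* worker runs during wait_for_avail() */
		model_terminate_work(&c);

	return (c.context_loaded && c.descs > 0) ? 0 : -EIO;
}
```

Without the synchronize step, the pending work runs after the new transfer was prepared and wipes it, matching the -EIO in the report; with it, the work completes first and the new transfer keeps its context.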
>
> Based on our understanding, there are two different places to ensure the
> termination:
> Either ensure that the termination is finished within the previous
> SNDRV_PCM_IOCTL_READI_FRAMES call (inside the DMA context), or finish
> it in the SNDRV_PCM_IOCTL_PREPARE call (and all other applicable ioctls)
> before entering the atomic context (from the PCM context).
>
> We initially thought about implementing the first approach, basically splitting
> up the dma_device terminate_all operation into a sync
> (busy-wait) and an async one. This would align the operations with the
> DMAengine interface and would enable a sync termination variant from atomic
> contexts.
> However, we saw that the dma_free_attrs() function has a
> WARN_ON(irqs_disabled()), and IRQs would be disabled in the sync variant.
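The proposed sync (busy-wait) half of terminate_all, and the dma_free_attrs() obstacle, can be sketched as an illustrative model under assumptions; it is not a proposed driver change. The hardware is a hypothetical counter that reports idle after a number of status reads, the poll loop is bounded by a caller-supplied budget, and free_warnings only counts where the real WARN_ON(irqs_disabled()) in dma_free_attrs() would fire. All model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical engine model: idle only after a stop request plus N polls. */
struct model_hw {
	bool stop_requested;
	int polls_until_idle;
};

struct model_chan {
	struct model_hw hw;
	int descs;         /* descriptors waiting to be freed */
	int free_warnings; /* where WARN_ON(irqs_disabled()) would fire */
};

/* One status read; the engine counts down once a stop was requested. */
static bool model_hw_busy(struct model_hw *hw)
{
	if (!hw->stop_requested)
		return true;
	if (hw->polls_until_idle == 0)
		return false;
	hw->polls_until_idle--;
	return true;
}

/* Sync half: stop the engine and busy-wait (bounded); never sleeps or frees. */
static int model_terminate_sync(struct model_chan *c, int max_polls)
{
	assert(max_polls > 0);
	c->hw.stop_requested = true;
	for (int i = 0; i < max_polls; i++) {
		if (!model_hw_busy(&c->hw))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Freeing step; with IRQs disabled the real dma_free_attrs() would WARN. */
static void model_free_descs(struct model_chan *c, bool irqs_disabled)
{
	if (irqs_disabled)
		c->free_warnings++;
	c->descs = 0;
}
```

The sketch suggests why the split is awkward: the poll loop itself is safe in atomic context, but the descriptors would still have to be freed later, outside the IRQs-off region.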
> Side note: We found this issue on the current v5.4.y LTS branch, but it also
> affects v5.8.y.
>
> Any feedback or pointers how we may fix the problem are warmly welcome!
> If anything is unclear please just ask :-)
>
> regards;
> Richard Leitner
> Benjamin Bara
>
> [1] https://elixir.bootlin.com/linux/v5.8/source/drivers/dma/imx-sdma.c#L1066
> [2] https://elixir.bootlin.com/linux/v5.8/source/drivers/dma/imx-sdma.c#L1071
> [3] https://elixir.bootlin.com/linux/v5.8/source/drivers/dma/imx-sdma.c#L1072
> [4] https://elixir.bootlin.com/linux/v5.8/source/sound/core/pcm_lib.c#L1825
> [5] https://elixir.bootlin.com/linux/v5.8/source/sound/core/pcm_native.c#L3226