linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
       [not found] <1664210154-11552-1-git-send-email-erosca@de.adit-jv.com>
@ 2022-09-27  7:34 ` Amadeusz Sławiński
  2022-09-27 11:56   ` Eugeniu Rosca
  2022-09-27  7:50 ` Cezary Rojewski
  2022-09-27  7:51 ` Pierre-Louis Bossart
  2 siblings, 1 reply; 10+ messages in thread
From: Amadeusz Sławiński @ 2022-09-27  7:34 UTC (permalink / raw)
  To: Eugeniu Rosca, Liam Girdwood, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, alsa-devel, linux-kernel
  Cc: Yanmin Zhang, Eugeniu Rosca, Jiada Wang, Zhang Yanmin,
	Ramesh Babu, Dean Jenkins, Ramesh Babu B, xiao jin,
	Cezary Rojewski

On 9/26/2022 6:35 PM, Eugeniu Rosca wrote:
> From: xiao jin <jin.xiao@intel.com>
> 
> After start of fe and be, fe might go to close without triggering
> STOP, and substream->runtime is freed. However, be is still at
> START state and its substream->runtime still points to the
> freed runtime.
> 

Well if it is being freed, maybe pointer should be set to NULL instead 
in place that happens?

> Later on, FE is opened/started again, and triggers STOP.
> snd_pcm_do_stop => dpcm_fe_dai_trigger
>                  => dpcm_fe_dai_do_trigger
>                  => dpcm_be_dai_trigger
>                  => dpcm_do_trigger
>                  => soc_pcm_trigger
>                  => skl_platform_pcm_trigger
> skl_platform_pcm_trigger accesses the freed old runtime data and
> kernel panic.
> 
> The patch fixes it by assigning be_substream->runtime in
> dpcm_be_dai_startup when be's state is START.
> 
> Signed-off-by: xiao jin <jin.xiao@intel.com>
> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
> Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
> ---
>   sound/soc/soc-pcm.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
> index 4f60c0a83311..6ca1d02065ce 100644
> --- a/sound/soc/soc-pcm.c
> +++ b/sound/soc/soc-pcm.c
> @@ -1608,6 +1608,8 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
>   		if (be->dpcm[stream].users++ != 0)
>   			continue;
>   
> +		be_substream->runtime = be->dpcm[stream].runtime;
> +
>   		if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_NEW) &&
>   		    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_CLOSE))
>   			continue;
> @@ -1615,7 +1617,6 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
>   		dev_dbg(be->dev, "ASoC: open %s BE %s\n",
>   			stream ? "capture" : "playback", be->dai_link->name);
>   
> -		be_substream->runtime = be->dpcm[stream].runtime;
>   		err = __soc_pcm_open(be, be_substream);
>   		if (err < 0) {
>   			be->dpcm[stream].users--;


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
       [not found] <1664210154-11552-1-git-send-email-erosca@de.adit-jv.com>
  2022-09-27  7:34 ` [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime Amadeusz Sławiński
@ 2022-09-27  7:50 ` Cezary Rojewski
  2022-09-27 11:00   ` Eugeniu Rosca
  2022-09-27  7:51 ` Pierre-Louis Bossart
  2 siblings, 1 reply; 10+ messages in thread
From: Cezary Rojewski @ 2022-09-27  7:50 UTC (permalink / raw)
  To: Eugeniu Rosca, Liam Girdwood, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, alsa-devel, linux-kernel
  Cc: Yanmin Zhang, Eugeniu Rosca, Jiada Wang, Zhang Yanmin,
	Ramesh Babu, Dean Jenkins, Ramesh Babu B, xiao jin,
	Pierre-Louis Bossart, Amadeusz Sławiński

On 2022-09-26 6:35 PM, Eugeniu Rosca wrote:
> From: xiao jin <jin.xiao@intel.com>
> 
> After start of fe and be, fe might go to close without triggering
> STOP, and substream->runtime is freed. However, be is still at
> START state and its substream->runtime still points to the
> freed runtime.
> 
> Later on, FE is opened/started again, and triggers STOP.
> snd_pcm_do_stop => dpcm_fe_dai_trigger
>                  => dpcm_fe_dai_do_trigger
>                  => dpcm_be_dai_trigger
>                  => dpcm_do_trigger
>                  => soc_pcm_trigger
>                  => skl_platform_pcm_trigger
> skl_platform_pcm_trigger accesses the freed old runtime data and
> kernel panic.
> 
> The patch fixes it by assigning be_substream->runtime in
> dpcm_be_dai_startup when be's state is START.
> 
> Signed-off-by: xiao jin <jin.xiao@intel.com>
> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
> Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>


Hello,

The change seems to be driven by the skylake-driver problem. With all 
due respect, why not ping owners of the driver first? There are some 
crucial CCs missing.

I'd like to know more about the scenario you guys reproduced the problem 
in. Configuration details and kernel base would be good to know too. 
Since our CI did not detect problem of such sort, if the problem 
actually exists, we would like to append a test or two to cover it later on.


Regards,
Czarek

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
       [not found] <1664210154-11552-1-git-send-email-erosca@de.adit-jv.com>
  2022-09-27  7:34 ` [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime Amadeusz Sławiński
  2022-09-27  7:50 ` Cezary Rojewski
@ 2022-09-27  7:51 ` Pierre-Louis Bossart
  2022-09-27 12:30   ` Eugeniu Rosca
  2 siblings, 1 reply; 10+ messages in thread
From: Pierre-Louis Bossart @ 2022-09-27  7:51 UTC (permalink / raw)
  To: Eugeniu Rosca, Liam Girdwood, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, alsa-devel, linux-kernel
  Cc: Yanmin Zhang, Eugeniu Rosca, Jiada Wang, Zhang Yanmin,
	Ramesh Babu, Dean Jenkins, Ramesh Babu B, xiao jin,
	Cezary Rojewski



On 9/26/22 18:35, Eugeniu Rosca wrote:
> From: xiao jin <jin.xiao@intel.com>
> 
> After start of fe and be, fe might go to close without triggering
> STOP, and substream->runtime is freed. However, be is still at
> START state and its substream->runtime still points to the
> freed runtime.
> 
> Later on, FE is opened/started again, and triggers STOP.
> snd_pcm_do_stop => dpcm_fe_dai_trigger
>                 => dpcm_fe_dai_do_trigger
>                 => dpcm_be_dai_trigger
>                 => dpcm_do_trigger
>                 => soc_pcm_trigger
>                 => skl_platform_pcm_trigger
> skl_platform_pcm_trigger accesses the freed old runtime data and
> kernel panic.
> 
> The patch fixes it by assigning be_substream->runtime in
> dpcm_be_dai_startup when be's state is START.

Can I ask on which kernel this patch was validated and on what platform?

We've done a lot of work since last year on DPCM states, and I wonder
the problem mentioned above actually exists on recent kernels.

Specifically, if the FE is closed, I don't get how the BE is not closed
as well. And if this problem is found on a recent kernel, then it should
be seen in the AVS driver as well, no?


> Signed-off-by: xiao jin <jin.xiao@intel.com>
> Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
> Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
> ---
>  sound/soc/soc-pcm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
> index 4f60c0a83311..6ca1d02065ce 100644
> --- a/sound/soc/soc-pcm.c
> +++ b/sound/soc/soc-pcm.c
> @@ -1608,6 +1608,8 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
>  		if (be->dpcm[stream].users++ != 0)
>  			continue;
>  
> +		be_substream->runtime = be->dpcm[stream].runtime;
> +
>  		if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_NEW) &&
>  		    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_CLOSE))
>  			continue;
> @@ -1615,7 +1617,6 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
>  		dev_dbg(be->dev, "ASoC: open %s BE %s\n",
>  			stream ? "capture" : "playback", be->dai_link->name);
>  
> -		be_substream->runtime = be->dpcm[stream].runtime;
>  		err = __soc_pcm_open(be, be_substream);
>  		if (err < 0) {
>  			be->dpcm[stream].users--;

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
  2022-09-27  7:50 ` Cezary Rojewski
@ 2022-09-27 11:00   ` Eugeniu Rosca
  2022-09-28 14:24     ` Cezary Rojewski
  0 siblings, 1 reply; 10+ messages in thread
From: Eugeniu Rosca @ 2022-09-27 11:00 UTC (permalink / raw)
  To: Cezary Rojewski
  Cc: Eugeniu Rosca, Liam Girdwood, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, alsa-devel, linux-kernel, Yanmin Zhang,
	Eugeniu Rosca, Jiada Wang, Zhang Yanmin, Ramesh Babu,
	Dean Jenkins, Ramesh Babu B, xiao jin, Pierre-Louis Bossart,
	Amadeusz Sławiński, Andy Shevchenko, Ranjani Sridharan,
	Bard Liao, Kai Vehmanen, Peter Ujfalusi, Guennadi Liakhovetski,
	Rander Wang

Hello Czarek,

On Di, Sep 27, 2022 at 09:50:05 +0200, Cezary Rojewski wrote:
> On 2022-09-26 6:35 PM, Eugeniu Rosca wrote:
> >From: xiao jin <jin.xiao@intel.com>
> >
> >After start of fe and be, fe might go to close without triggering
> >STOP, and substream->runtime is freed. However, be is still at
> >START state and its substream->runtime still points to the
> >freed runtime.
> >
> >Later on, FE is opened/started again, and triggers STOP.
> >snd_pcm_do_stop => dpcm_fe_dai_trigger
> >                 => dpcm_fe_dai_do_trigger
> >                 => dpcm_be_dai_trigger
> >                 => dpcm_do_trigger
> >                 => soc_pcm_trigger
> >                 => skl_platform_pcm_trigger
> >skl_platform_pcm_trigger accesses the freed old runtime data and
> >kernel panic.
> >
> >The patch fixes it by assigning be_substream->runtime in
> >dpcm_be_dai_startup when be's state is START.
> >
> >Signed-off-by: xiao jin <jin.xiao@intel.com>
> >Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
> >Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
> 
> Hello,
> 
> The change seems to be driven by the skylake-driver problem. 

Agreed, based on the author/co-signer's e-mail and the call stack.

> With all due
> respect, why not ping owners of the driver first? There are some crucial CCs
> missing.

Some feedback already provided by Pierre-Louis Bossart (many thanks).
Cc-ing more Intel contributors in the sound subsystem.

> 
> I'd like to know more about the scenario you guys reproduced the problem in.

This patch was originally identified in the Intel Apollo Lake v4.1 KNLs.
Given that the change itself is in the core sound subsystem, our internal
assessment was that the patch might potentially be relevant/helpful
on other HW platforms.

Our intention is to confirm or invalidate this assumption with the
original developers of the patch, as well as with the audio maintainers
and the members of the alsa-devel ML.

> Configuration details and kernel base would be good to know too. Since our
> CI did not detect problem of such sort, if the problem actually exists, we
> would like to append a test or two to cover it later on.

If there is no evidence that the patch is fixing a real-life issue
occurring in the latest vanilla, I agree to drop the patch.

So far, I do not possess this evidence myself.

> Regards,
> Czarek

Best regards,
Eugeniu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
  2022-09-27  7:34 ` [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime Amadeusz Sławiński
@ 2022-09-27 11:56   ` Eugeniu Rosca
  0 siblings, 0 replies; 10+ messages in thread
From: Eugeniu Rosca @ 2022-09-27 11:56 UTC (permalink / raw)
  To: Amadeusz Sławiński
  Cc: Eugeniu Rosca, Liam Girdwood, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, alsa-devel, linux-kernel, Yanmin Zhang,
	Eugeniu Rosca, Jiada Wang, Zhang Yanmin, Ramesh Babu,
	Dean Jenkins, Ramesh Babu B, xiao jin, Cezary Rojewski

Hello Amadeusz,

On Di, Sep 27, 2022 at 09:34:49 +0200, Amadeusz Sławiński wrote:
> On 9/26/2022 6:35 PM, Eugeniu Rosca wrote:
> >From: xiao jin <jin.xiao@intel.com>
> >
> >After start of fe and be, fe might go to close without triggering
> >STOP, and substream->runtime is freed. However, be is still at
> >START state and its substream->runtime still points to the
> >freed runtime.
> >
> 
> Well if it is being freed, maybe pointer should be set to NULL instead in
> place that happens?
> 

As soon as the agreement is reached that the patch is still
relevant/useful on the latest vanilla, I think this review
finding will receive proper attention.

Thanks,
Eugeniu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
  2022-09-27  7:51 ` Pierre-Louis Bossart
@ 2022-09-27 12:30   ` Eugeniu Rosca
  2022-09-28  8:36     ` Pierre-Louis Bossart
  0 siblings, 1 reply; 10+ messages in thread
From: Eugeniu Rosca @ 2022-09-27 12:30 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Eugeniu Rosca, Liam Girdwood, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, alsa-devel, linux-kernel, Yanmin Zhang,
	Eugeniu Rosca, Jiada Wang, Zhang Yanmin, Ramesh Babu,
	Dean Jenkins, Ramesh Babu B, xiao jin, Cezary Rojewski

Hi Pierre,

On Di, Sep 27, 2022 at 09:51:46 +0200, Pierre-Louis Bossart wrote:
> On 9/26/22 18:35, Eugeniu Rosca wrote:
> > From: xiao jin <jin.xiao@intel.com>
> > 
> > After start of fe and be, fe might go to close without triggering
> > STOP, and substream->runtime is freed. However, be is still at
> > START state and its substream->runtime still points to the
> > freed runtime.
> > 
> > Later on, FE is opened/started again, and triggers STOP.
> > snd_pcm_do_stop => dpcm_fe_dai_trigger
> >                 => dpcm_fe_dai_do_trigger
> >                 => dpcm_be_dai_trigger
> >                 => dpcm_do_trigger
> >                 => soc_pcm_trigger
> >                 => skl_platform_pcm_trigger
> > skl_platform_pcm_trigger accesses the freed old runtime data and
> > kernel panic.
> > 
> > The patch fixes it by assigning be_substream->runtime in
> > dpcm_be_dai_startup when be's state is START.
> 
> Can I ask on which kernel this patch was validated and on what platform?

As shared with Czarek in https://lore.kernel.org/alsa-devel/20220927110022.GA3802@lxhi-065/ ,
this patch was originally extracted from the out-of-tree Intel Apollo
Lake v4.1 KNL releases, hence it was validated on Intel ref.boards.

No re-testing/re-validation has been performed on the latest vanilla.

One of the goals behind submitting the patch is getting in touch
with the original authors, as well as the members of alsa-devel,
to assess if the patch is still relevant.

> 
> We've done a lot of work since last year on DPCM states, 

Could you please feedback if the work on the DPCM states is
pre- or post-v5.10?


> and I wonder
> the problem mentioned above actually exists on recent kernels.
> 
> Specifically, if the FE is closed, I don't get how the BE is not closed
> as well. And if this problem is found on a recent kernel, then it should
> be seen in the AVS driver as well, no?

It is totally conceivable (if not very likely) that the mainline
advancements in the sound subsystem make this patch obsolete.

I would be happy if that's the final outcome of our discussion
(since this will allow dropping the patch in our downstream kernel).

Best Regards,
Eugeniu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
  2022-09-27 12:30   ` Eugeniu Rosca
@ 2022-09-28  8:36     ` Pierre-Louis Bossart
  2022-09-29 16:36       ` Eugeniu Rosca
  0 siblings, 1 reply; 10+ messages in thread
From: Pierre-Louis Bossart @ 2022-09-28  8:36 UTC (permalink / raw)
  To: Eugeniu Rosca
  Cc: Yanmin Zhang, alsa-devel, Eugeniu Rosca, Jiada Wang,
	linux-kernel, Cezary Rojewski, Zhang Yanmin, Takashi Iwai,
	Ramesh Babu, Liam Girdwood, Dean Jenkins, Mark Brown,
	Ramesh Babu B, xiao jin



On 9/27/22 14:30, Eugeniu Rosca wrote:
> Hi Pierre,
> 
> On Di, Sep 27, 2022 at 09:51:46 +0200, Pierre-Louis Bossart wrote:
>> On 9/26/22 18:35, Eugeniu Rosca wrote:
>>> From: xiao jin <jin.xiao@intel.com>
>>>
>>> After start of fe and be, fe might go to close without triggering
>>> STOP, and substream->runtime is freed. However, be is still at
>>> START state and its substream->runtime still points to the
>>> freed runtime.
>>>
>>> Later on, FE is opened/started again, and triggers STOP.
>>> snd_pcm_do_stop => dpcm_fe_dai_trigger
>>>                 => dpcm_fe_dai_do_trigger
>>>                 => dpcm_be_dai_trigger
>>>                 => dpcm_do_trigger
>>>                 => soc_pcm_trigger
>>>                 => skl_platform_pcm_trigger
>>> skl_platform_pcm_trigger accesses the freed old runtime data and
>>> kernel panic.
>>>
>>> The patch fixes it by assigning be_substream->runtime in
>>> dpcm_be_dai_startup when be's state is START.
>>
>> Can I ask on which kernel this patch was validated and on what platform?
> 
> As shared with Czarek in https://lore.kernel.org/alsa-devel/20220927110022.GA3802@lxhi-065/ ,
> this patch was originally extracted from the out-of-tree Intel Apollo
> Lake v4.1 KNL releases, hence it was validated on Intel ref.boards.
> 
> No re-testing/re-validation has been performed on the latest vanilla.

There's no way to predict how a patch for a kernel 4.1 - released 7
years ago - would behave with a new kernel. If it's not tested it cannot
be merged.

> One of the goals behind submitting the patch is getting in touch
> with the original authors, as well as the members of alsa-devel,
> to assess if the patch is still relevant.

The only thing we could do is have more clarity on the test case and try
to reproduce it.

>>
>> We've done a lot of work since last year on DPCM states, 
> 
> Could you please feedback if the work on the DPCM states is
> pre- or post-v5.10?

It doesn't matter for this discussion on the upstream kernel. But yes
it's post v5.10.

> 
>> and I wonder
>> the problem mentioned above actually exists on recent kernels.
>>
>> Specifically, if the FE is closed, I don't get how the BE is not closed
>> as well. And if this problem is found on a recent kernel, then it should
>> be seen in the AVS driver as well, no?
> 
> It is totally conceivable (if not very likely) that the mainline
> advancements in the sound subsystem make this patch obsolete.
> 
> I would be happy if that's the final outcome of our discussion
> (since this will allow dropping the patch in our downstream kernel).
> 
> Best Regards,
> Eugeniu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
  2022-09-27 11:00   ` Eugeniu Rosca
@ 2022-09-28 14:24     ` Cezary Rojewski
  2022-09-29 16:25       ` Eugeniu Rosca
  0 siblings, 1 reply; 10+ messages in thread
From: Cezary Rojewski @ 2022-09-28 14:24 UTC (permalink / raw)
  To: Eugeniu Rosca
  Cc: Liam Girdwood, Mark Brown, Jaroslav Kysela, Takashi Iwai,
	alsa-devel, linux-kernel, Yanmin Zhang, Eugeniu Rosca,
	Jiada Wang, Zhang Yanmin, Ramesh Babu, Dean Jenkins,
	Ramesh Babu B, xiao jin, Pierre-Louis Bossart,
	Amadeusz Sławiński, Andy Shevchenko, Ranjani Sridharan,
	Bard Liao, Kai Vehmanen, Peter Ujfalusi, Guennadi Liakhovetski,
	Rander Wang

On 2022-09-27 1:00 PM, Eugeniu Rosca wrote:
> Hello Czarek,

...

>> I'd like to know more about the scenario you guys reproduced the problem in.
> 
> This patch was originally identified in the Intel Apollo Lake v4.1 KNLs.
> Given that the change itself is in the core sound subsystem, our internal
> assessment was that the patch might potentially be relevant/helpful
> on other HW platforms.
> 
> Our intention is to confirm or invalidate this assumption with the
> original developers of the patch, as well as with the audio maintainers
> and the members of the alsa-devel ML.
> 
>> Configuration details and kernel base would be good to know too. Since our
>> CI did not detect problem of such sort, if the problem actually exists, we
>> would like to append a test or two to cover it later on.
> 
> If there is no evidence that the patch is fixing a real-life issue
> occurring in the latest vanilla, I agree to drop the patch.
> 
> So far, I do not possess this evidence myself.


I've spent some time to locate the change. Found it and it seems 
obsolete. Some tags are missing in the revision of yours and the 
original date does not match either - it's Apr 2018 for the original. 
Won't be mentioning the tags as some engineers no longer bear @intel.com.

soc-pcm and skylake-driver valuable bits from those trees are already 
part of the upstream. Most of what is left was later proven obsolete or 
redundant by my or Pierre's engineers. There seems to be no patch 
missing except for few fixes from the recent SKL/KBL up-revs for our 
clients. Nothing APL specific.

Following kernels related to APL are maintained by the IPG team from 
software perspective:
	4.1.42, 4.1.49, 4.4, 4.9, 4.14, 4.19

Multiple OSes. And then there are flavors for kernels/OS both. It's 
quite likely kernel base of yours fits into one of these buckets or at 
least have had changes ported from one of them.

TLDR: I agree here with my colleagues - if you believe the change is 
necessary, a proof e.g.: in form of reproduction steps, is needed. 
Otherwise it's no-go. Happy to hop on a call should you need any 
additional information.


Regards,
Czarek

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
  2022-09-28 14:24     ` Cezary Rojewski
@ 2022-09-29 16:25       ` Eugeniu Rosca
  0 siblings, 0 replies; 10+ messages in thread
From: Eugeniu Rosca @ 2022-09-29 16:25 UTC (permalink / raw)
  To: Cezary Rojewski
  Cc: Eugeniu Rosca, Liam Girdwood, Mark Brown, Jaroslav Kysela,
	Takashi Iwai, alsa-devel, linux-kernel, Yanmin Zhang,
	Eugeniu Rosca, Jiada Wang, Zhang Yanmin, Ramesh Babu,
	Dean Jenkins, Ramesh Babu B, xiao jin, Pierre-Louis Bossart,
	Amadeusz Sławiński, Andy Shevchenko, Ranjani Sridharan,
	Bard Liao, Kai Vehmanen, Peter Ujfalusi, Guennadi Liakhovetski,
	Rander Wang

Hello Czarek,

Thank you for your friendly feedback.

On Mi, Sep 28, 2022 at 04:24:43 +0200, Cezary Rojewski wrote:
> On 2022-09-27 1:00 PM, Eugeniu Rosca wrote:
> >Hello Czarek,
> 
> ...
> 
> >>I'd like to know more about the scenario you guys reproduced the problem in.
> >
> >This patch was originally identified in the Intel Apollo Lake v4.1 KNLs.
> >Given that the change itself is in the core sound subsystem, our internal
> >assessment was that the patch might potentially be relevant/helpful
> >on other HW platforms.
> >
> >Our intention is to confirm or invalidate this assumption with the
> >original developers of the patch, as well as with the audio maintainers
> >and the members of the alsa-devel ML.
> >
> >>Configuration details and kernel base would be good to know too. Since our
> >>CI did not detect problem of such sort, if the problem actually exists, we
> >>would like to append a test or two to cover it later on.
> >
> >If there is no evidence that the patch is fixing a real-life issue
> >occurring in the latest vanilla, I agree to drop the patch.
> >
> >So far, I do not possess this evidence myself.
> 
> 
> I've spent some time to locate the change. Found it and it seems obsolete.
> Some tags are missing in the revision of yours and the original date does
> not match either - it's Apr 2018 for the original. Won't be mentioning the
> tags as some engineers no longer bear @intel.com.
> 
> soc-pcm and skylake-driver valuable bits from those trees are already part
> of the upstream. Most of what is left was later proven obsolete or redundant
> by my or Pierre's engineers. There seems to be no patch missing except for
> few fixes from the recent SKL/KBL up-revs for our clients. Nothing APL
> specific.

Thanks for this thorough check. That also gives us enough confidence
to drop the patch in some of our downstream kernels.

> 
> Following kernels related to APL are maintained by the IPG team from
> software perspective:
> 	4.1.42, 4.1.49, 4.4, 4.9, 4.14, 4.19
> 
> Multiple OSes. And then there are flavors for kernels/OS both. It's quite
> likely kernel base of yours fits into one of these buckets or at least have
> had changes ported from one of them.

Good to know and yes, you are right w.r.t. the origin of the patch.

> 
> TLDR: I agree here with my colleagues - if you believe the change is
> necessary, a proof e.g.: in form of reproduction steps, is needed. Otherwise
> it's no-go. Happy to hop on a call should you need any additional
> information.

That's a very kind attitude and we will definitely share any empirical
evidence if it turns out the patch is really contributing with healing
of any future runtime issues.

> 
> Regards,
> Czarek

Best Regards
Eugeniu

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime
  2022-09-28  8:36     ` Pierre-Louis Bossart
@ 2022-09-29 16:36       ` Eugeniu Rosca
  0 siblings, 0 replies; 10+ messages in thread
From: Eugeniu Rosca @ 2022-09-29 16:36 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Eugeniu Rosca, Yanmin Zhang, alsa-devel, Eugeniu Rosca,
	Jiada Wang, linux-kernel, Cezary Rojewski, Zhang Yanmin,
	Takashi Iwai, Ramesh Babu, Liam Girdwood, Dean Jenkins,
	Mark Brown, Ramesh Babu B, xiao jin

Hi Pierre,

On Mi, Sep 28, 2022 at 10:36:32 +0200, Pierre-Louis Bossart wrote:
> 
> 
> On 9/27/22 14:30, Eugeniu Rosca wrote:
> > Hi Pierre,
> > 
> > On Di, Sep 27, 2022 at 09:51:46 +0200, Pierre-Louis Bossart wrote:
> >> On 9/26/22 18:35, Eugeniu Rosca wrote:
> >>> From: xiao jin <jin.xiao@intel.com>
> >>>
> >>> After start of fe and be, fe might go to close without triggering
> >>> STOP, and substream->runtime is freed. However, be is still at
> >>> START state and its substream->runtime still points to the
> >>> freed runtime.
> >>>
> >>> Later on, FE is opened/started again, and triggers STOP.
> >>> snd_pcm_do_stop => dpcm_fe_dai_trigger
> >>>                 => dpcm_fe_dai_do_trigger
> >>>                 => dpcm_be_dai_trigger
> >>>                 => dpcm_do_trigger
> >>>                 => soc_pcm_trigger
> >>>                 => skl_platform_pcm_trigger
> >>> skl_platform_pcm_trigger accesses the freed old runtime data and
> >>> kernel panic.
> >>>
> >>> The patch fixes it by assigning be_substream->runtime in
> >>> dpcm_be_dai_startup when be's state is START.
> >>
> >> Can I ask on which kernel this patch was validated and on what platform?
> > 
> > As shared with Czarek, 
> > this patch was originally extracted from the out-of-tree Intel Apollo
> > Lake v4.1 KNL releases, hence it was validated on Intel ref.boards.
> > 
> > No re-testing/re-validation has been performed on the latest vanilla.
> 
> There's no way to predict how a patch for a kernel 4.1 - released 7
> years ago - would behave with a new kernel. If it's not tested it cannot
> be merged.

No disagreement here :)

> 
> > One of the goals behind submitting the patch is getting in touch
> > with the original authors, as well as the members of alsa-devel,
> > to assess if the patch is still relevant.
> 
> The only thing we could do is have more clarity on the test case and try
> to reproduce it.

Agreed. As soon as a test-case pops up where the patch makes a
difference in the runtime behavior, you will hear back from us.

> 
> >>
> >> We've done a lot of work since last year on DPCM states, 
> > 
> > Could you please feedback if the work on the DPCM states is
> > pre- or post-v5.10?
> 
> It doesn't matter for this discussion on the upstream kernel. But yes
> it's post v5.10.

Thanks. This is helpful in the downstream context.

> 
> > 
> >> and I wonder
> >> the problem mentioned above actually exists on recent kernels.
> >>
> >> Specifically, if the FE is closed, I don't get how the BE is not closed
> >> as well. And if this problem is found on a recent kernel, then it should
> >> be seen in the AVS driver as well, no?
> > 
> > It is totally conceivable (if not very likely) that the mainline
> > advancements in the sound subsystem make this patch obsolete.
> > 
> > I would be happy if that's the final outcome of our discussion
> > (since this will allow dropping the patch in our downstream kernel).

Best Regards
Eugeniu

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-09-29 16:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1664210154-11552-1-git-send-email-erosca@de.adit-jv.com>
2022-09-27  7:34 ` [PATCH] ASoC: soc-pcm: fix fe and be race when accessing substream->runtime Amadeusz Sławiński
2022-09-27 11:56   ` Eugeniu Rosca
2022-09-27  7:50 ` Cezary Rojewski
2022-09-27 11:00   ` Eugeniu Rosca
2022-09-28 14:24     ` Cezary Rojewski
2022-09-29 16:25       ` Eugeniu Rosca
2022-09-27  7:51 ` Pierre-Louis Bossart
2022-09-27 12:30   ` Eugeniu Rosca
2022-09-28  8:36     ` Pierre-Louis Bossart
2022-09-29 16:36       ` Eugeniu Rosca

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).