* [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
@ 2021-10-04 22:54 Pierre-Louis Bossart
  2021-10-04 22:54   ` Pierre-Louis Bossart
                   ` (5 more replies)
  0 siblings, 6 replies; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-04 22:54 UTC (permalink / raw)
  To: alsa-devel
  Cc: Kuninori Morimoto, tiwai, Sameer Pujar, Pierre-Louis Bossart,
	vkoul, broonie, Gyeongtaek Lee, Peter Ujfalusi

We've been adding a 'deep buffer' PCM device to several SOF topologies
in order to reduce power consumption. The typical use-case would be
music playback over a headset: this additional PCM device provides
more buffering and longer latencies, letting the rest of the system
sleep for longer periods. Notifications and 'regular' low-latency
audio playback would still use the 'normal' PCM device and be mixed
with the 'deep buffer' before rendering on the headphone endpoint. The
tentative direction would be to expose this alternate device to
PulseAudio/PipeWire/CRAS via the UCM SectionModifier definitions.

That seemed a straightforward topology change until our automated
validation stress tests started reporting issues on SoundWire
platforms, where e.g. two START triggers might be sent or, conversely,
the STOP trigger is never sent. The SoundWire stream state management
flagged inconsistent states when the two 'normal' and 'deep buffer'
devices are used concurrently with rapid play/stop/pause monkey
testing.

Looking at the soc-pcm.c code, it seems that the BE state
management needs a lot of love.

a) there is no consistent protection for the BE state. In some parts
of the code, the state updates are protected by a spinlock but in the
trigger they are not. When we open/play/close the two PCM devices in
stress tests, we end up testing a state that is being modified. That
can't be good.

b) there is a conceptual deadlock: on stop we check the FE states to
see if a shared BE can be stopped, but since we trigger the BE first
the FE states have not been modified yet, so the TRIGGER_STOP is never
sent.
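
To make this concrete, here is a minimal userspace sketch of the
problem (function and variable names are illustrative, not the actual
kernel API): because the BE is triggered before the corresponding FE
state is updated, the 'can this BE stop?' check always sees at least
one FE as still running, and the TRIGGER_STOP is never forwarded.

#include <stdio.h>

enum fe_state { FE_STOPPED, FE_RUNNING };

/* toy model: two FEs sharing one BE */
static enum fe_state fe[2] = { FE_RUNNING, FE_RUNNING };

/* same idea as snd_soc_dpcm_can_be_free_stop(): the BE may only be
 * stopped once no connected FE is running anymore */
static int be_can_stop(void)
{
	int i;

	for (i = 0; i < 2; i++)
		if (fe[i] == FE_RUNNING)
			return 0;
	return 1;
}

static void fe_stop(int i)
{
	/* the BE is triggered first... */
	if (be_can_stop())
		printf("FE%d: BE TRIGGER_STOP sent\n", i);
	else
		printf("FE%d: BE stop skipped, an FE still looks active\n", i);

	/* ...and the FE state is only updated afterwards */
	fe[i] = FE_STOPPED;
}

int main(void)
{
	fe_stop(0);	/* skipped: FE0 and FE1 still marked running */
	fe_stop(1);	/* skipped: FE1 still marked running         */
	return 0;	/* both FEs stopped, yet STOP was never sent */
}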

This patchset suggests a transition from a spinlock to a mutex, an
extended protection when walking through the BE list, and the use of a
refcount to decide when to trigger the BE. With these patches I am
able to run our entire validation suite without any issues with this
new 'deep buffer' topology, and no regressions on existing solutions [1].
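
As a rough sketch of the refcount idea (again a userspace toy model,
not the actual implementation; in the patches the counter lives in
be->dpcm[stream].be_start and is protected by the card's dpcm_mutex):
START is only forwarded when the count goes 0 -> 1 and STOP when it
goes back to 0, so the decision no longer depends on FE states that
have not been updated yet.

#include <stdio.h>

/* toy model of the 'be_start' refcount, locking omitted */
static int be_start;

static void fe_start(int i)
{
	if (++be_start != 1) {
		printf("FE%d: BE already started, no trigger\n", i);
		return;
	}
	printf("FE%d: BE TRIGGER_START sent\n", i);
}

static void fe_stop(int i)
{
	if (--be_start != 0) {
		printf("FE%d: BE still in use, no trigger\n", i);
		return;
	}
	printf("FE%d: BE TRIGGER_STOP sent\n", i);
}

int main(void)
{
	fe_start(0);	/* refcount 0 -> 1: START sent */
	fe_start(1);	/* refcount 1 -> 2: no trigger */
	fe_stop(0);	/* refcount 2 -> 1: no trigger */
	fe_stop(1);	/* refcount 1 -> 0: STOP sent  */
	return 0;
}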

One might ask: 'how come we didn't see this earlier?' The answer is
probably that the .trigger callbacks in most implementations seem to
perform DAPM operations, and sending the triggers multiple times is
not an issue. In the case of SoundWire, we do use the .trigger
callback to reconfigure the bus using the 'bank switch' mechanism. It
might be acceptable to tolerate a trigger sent multiple times, but the
deadlock on stop cannot be fixed at the SoundWire layer alone.

I chose to send this patchset as an RFC v2 to gather more feedback and
make sure others know about DPCM issues. We're going to spend more time
on this but if others can provide feedback/test results it would be
greatly appreciated.

Opens:

1) is this the right solution? The DPCM code is far from simple and has
notions such as SND_SOC_DPCM_UPDATE_NO and 'trigger_pending' that I
have no background on.

2) There are other reports of kernel oopses [2] that seem related to
the lack of protection. It'd be good to confirm whether this patchset
solves these problems as well.

[1] https://github.com/thesofproject/linux/pull/3146
[2] https://lore.kernel.org/alsa-devel/002f01d7b4f5$c030f4a0$4092dde0$@samsung.com/

changes since RFC v1:
Removed unused function
Removed exported symbols only used in soc-pcm.c, used static instead
Use a mutex instead of a spinlock
Protect all for_each_dpcm_be() loops
Fix bugs introduced in the refcount

Pierre-Louis Bossart (5):
  ASoC: soc-pcm: remove snd_soc_dpcm_fe_can_update()
  ASoC: soc-pcm: don't export local functions, use static
  ASoC: soc-pcm: replace dpcm_lock with dpcm_mutex
  ASoC: soc-pcm: protect for_each_dpcm_be() loops with dpcm_mutex
  ASoC: soc-pcm: test refcount before triggering

 include/sound/soc-dpcm.h |  17 +----
 include/sound/soc.h      |   2 +-
 sound/soc/soc-core.c     |   2 +-
 sound/soc/soc-pcm.c      | 153 ++++++++++++++++++++++++++-------------
 4 files changed, 108 insertions(+), 66 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH v2 1/5] ASoC: soc-pcm: remove snd_soc_dpcm_fe_can_update()
  2021-10-04 22:54 [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE Pierre-Louis Bossart
@ 2021-10-04 22:54   ` Pierre-Louis Bossart
  2021-10-04 22:54   ` Pierre-Louis Bossart
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-04 22:54 UTC (permalink / raw)
  To: alsa-devel
  Cc: tiwai, broonie, vkoul, Sameer Pujar, Gyeongtaek Lee,
	Peter Ujfalusi, Kuninori Morimoto, Pierre-Louis Bossart,
	Liam Girdwood, Jaroslav Kysela, Takashi Iwai, open list

This function is not used anywhere, including in soc-pcm.c.

Remove dead code.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 include/sound/soc-dpcm.h | 3 ---
 sound/soc/soc-pcm.c      | 9 ---------
 2 files changed, 12 deletions(-)

diff --git a/include/sound/soc-dpcm.h b/include/sound/soc-dpcm.h
index bc7af90099a8..72d45ad47ee3 100644
--- a/include/sound/soc-dpcm.h
+++ b/include/sound/soc-dpcm.h
@@ -121,9 +121,6 @@ int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
 int snd_soc_dpcm_can_be_params(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream);
 
-/* is the current PCM operation for this FE ? */
-int snd_soc_dpcm_fe_can_update(struct snd_soc_pcm_runtime *fe, int stream);
-
 /* is the current PCM operation for this BE ? */
 int snd_soc_dpcm_be_can_update(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream);
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index fc1854e3e43f..17476e3219ea 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -2805,15 +2805,6 @@ int soc_new_pcm(struct snd_soc_pcm_runtime *rtd, int num)
 	return ret;
 }
 
-/* is the current PCM operation for this FE ? */
-int snd_soc_dpcm_fe_can_update(struct snd_soc_pcm_runtime *fe, int stream)
-{
-	if (fe->dpcm[stream].runtime_update == SND_SOC_DPCM_UPDATE_FE)
-		return 1;
-	return 0;
-}
-EXPORT_SYMBOL_GPL(snd_soc_dpcm_fe_can_update);
-
 /* is the current PCM operation for this BE ? */
 int snd_soc_dpcm_be_can_update(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH v2 2/5] ASoC: soc-pcm: don't export local functions, use static
  2021-10-04 22:54 [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE Pierre-Louis Bossart
@ 2021-10-04 22:54   ` Pierre-Louis Bossart
  2021-10-04 22:54   ` Pierre-Louis Bossart
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-04 22:54 UTC (permalink / raw)
  To: alsa-devel
  Cc: tiwai, broonie, vkoul, Sameer Pujar, Gyeongtaek Lee,
	Peter Ujfalusi, Kuninori Morimoto, Pierre-Louis Bossart,
	Liam Girdwood, Jaroslav Kysela, Takashi Iwai, open list

No one uses the following functions outside of soc-pcm.c:

snd_soc_dpcm_can_be_free_stop()
snd_soc_dpcm_can_be_params()
snd_soc_dpcm_be_can_update()

In preparation for locking changes, make these functions static and
remove the EXPORT_SYMBOL_GPL() exports.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 include/sound/soc-dpcm.h | 12 ------------
 sound/soc/soc-pcm.c      | 21 +++++++++++++++------
 2 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/include/sound/soc-dpcm.h b/include/sound/soc-dpcm.h
index 72d45ad47ee3..9c00118603e7 100644
--- a/include/sound/soc-dpcm.h
+++ b/include/sound/soc-dpcm.h
@@ -113,18 +113,6 @@ struct snd_soc_dpcm_runtime {
 #define for_each_dpcm_be_rollback(fe, stream, _dpcm)			\
 	list_for_each_entry_continue_reverse(_dpcm, &(fe)->dpcm[stream].be_clients, list_be)
 
-/* can this BE stop and free */
-int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
-		struct snd_soc_pcm_runtime *be, int stream);
-
-/* can this BE perform a hw_params() */
-int snd_soc_dpcm_can_be_params(struct snd_soc_pcm_runtime *fe,
-		struct snd_soc_pcm_runtime *be, int stream);
-
-/* is the current PCM operation for this BE ? */
-int snd_soc_dpcm_be_can_update(struct snd_soc_pcm_runtime *fe,
-		struct snd_soc_pcm_runtime *be, int stream);
-
 /* get the substream for this BE */
 struct snd_pcm_substream *
 	snd_soc_dpcm_get_substream(struct snd_soc_pcm_runtime *be, int stream);
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 17476e3219ea..360811f8a18c 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -29,6 +29,18 @@
 
 #define DPCM_MAX_BE_USERS	8
 
+/* can this BE stop and free */
+static int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
+					 struct snd_soc_pcm_runtime *be, int stream);
+
+/* can this BE perform a hw_params() */
+static int snd_soc_dpcm_can_be_params(struct snd_soc_pcm_runtime *fe,
+				      struct snd_soc_pcm_runtime *be, int stream);
+
+/* is the current PCM operation for this BE ? */
+static int snd_soc_dpcm_be_can_update(struct snd_soc_pcm_runtime *fe,
+				      struct snd_soc_pcm_runtime *be, int stream);
+
 static inline const char *soc_cpu_dai_name(struct snd_soc_pcm_runtime *rtd)
 {
 	return (rtd)->num_cpus == 1 ? asoc_rtd_to_cpu(rtd, 0)->name : "multicpu";
@@ -2806,7 +2818,7 @@ int soc_new_pcm(struct snd_soc_pcm_runtime *rtd, int num)
 }
 
 /* is the current PCM operation for this BE ? */
-int snd_soc_dpcm_be_can_update(struct snd_soc_pcm_runtime *fe,
+static int snd_soc_dpcm_be_can_update(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
 {
 	if ((fe->dpcm[stream].runtime_update == SND_SOC_DPCM_UPDATE_FE) ||
@@ -2815,7 +2827,6 @@ int snd_soc_dpcm_be_can_update(struct snd_soc_pcm_runtime *fe,
 		return 1;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(snd_soc_dpcm_be_can_update);
 
 /* get the substream for this BE */
 struct snd_pcm_substream *
@@ -2861,7 +2872,7 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
  * We can only hw_free, stop, pause or suspend a BE DAI if any of it's FE
  * are not running, paused or suspended for the specified stream direction.
  */
-int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
+static int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
 {
 	const enum snd_soc_dpcm_state state[] = {
@@ -2872,13 +2883,12 @@ int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
 
 	return snd_soc_dpcm_check_state(fe, be, stream, state, ARRAY_SIZE(state));
 }
-EXPORT_SYMBOL_GPL(snd_soc_dpcm_can_be_free_stop);
 
 /*
  * We can only change hw params a BE DAI if any of it's FE are not prepared,
  * running, paused or suspended for the specified stream direction.
  */
-int snd_soc_dpcm_can_be_params(struct snd_soc_pcm_runtime *fe,
+static int snd_soc_dpcm_can_be_params(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
 {
 	const enum snd_soc_dpcm_state state[] = {
@@ -2890,4 +2900,3 @@ int snd_soc_dpcm_can_be_params(struct snd_soc_pcm_runtime *fe,
 
 	return snd_soc_dpcm_check_state(fe, be, stream, state, ARRAY_SIZE(state));
 }
-EXPORT_SYMBOL_GPL(snd_soc_dpcm_can_be_params);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH v2 3/5] ASoC: soc-pcm: replace dpcm_lock with dpcm_mutex
  2021-10-04 22:54 [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE Pierre-Louis Bossart
@ 2021-10-04 22:54   ` Pierre-Louis Bossart
  2021-10-04 22:54   ` Pierre-Louis Bossart
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-04 22:54 UTC (permalink / raw)
  To: alsa-devel
  Cc: tiwai, broonie, vkoul, Sameer Pujar, Gyeongtaek Lee,
	Peter Ujfalusi, Kuninori Morimoto, Pierre-Louis Bossart,
	Liam Girdwood, Jaroslav Kysela, Takashi Iwai, open list

It's not clear why a spinlock was used, and even less clear why
'dpcm_lock' is taken with irqsave/irqrestore: DPCM functions are
typically not used in interrupt context.

Move to a mutex which will allow us to sleep for longer periods of
time and handle non-atomic triggers.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 include/sound/soc.h  |  2 +-
 sound/soc/soc-core.c |  2 +-
 sound/soc/soc-pcm.c  | 30 ++++++++++++------------------
 3 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/include/sound/soc.h b/include/sound/soc.h
index 8e6dd8a257c5..c13d8ec72d62 100644
--- a/include/sound/soc.h
+++ b/include/sound/soc.h
@@ -893,7 +893,7 @@ struct snd_soc_card {
 	struct mutex pcm_mutex;
 	enum snd_soc_pcm_subclass pcm_subclass;
 
-	spinlock_t dpcm_lock;
+	struct mutex dpcm_mutex; /* protect all dpcm states and updates */
 
 	int (*probe)(struct snd_soc_card *card);
 	int (*late_probe)(struct snd_soc_card *card);
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index c830e96afba2..e94d43c392e0 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -2339,7 +2339,7 @@ int snd_soc_register_card(struct snd_soc_card *card)
 	mutex_init(&card->mutex);
 	mutex_init(&card->dapm_mutex);
 	mutex_init(&card->pcm_mutex);
-	spin_lock_init(&card->dpcm_lock);
+	mutex_init(&card->dpcm_mutex);
 
 	return snd_soc_bind_card(card);
 }
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 360811f8a18c..af12593b033a 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -85,7 +85,6 @@ static ssize_t dpcm_show_state(struct snd_soc_pcm_runtime *fe,
 	struct snd_pcm_hw_params *params = &fe->dpcm[stream].hw_params;
 	struct snd_soc_dpcm *dpcm;
 	ssize_t offset = 0;
-	unsigned long flags;
 
 	/* FE state */
 	offset += scnprintf(buf + offset, size - offset,
@@ -113,7 +112,7 @@ static ssize_t dpcm_show_state(struct snd_soc_pcm_runtime *fe,
 		goto out;
 	}
 
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 		params = &dpcm->hw_params;
@@ -134,7 +133,7 @@ static ssize_t dpcm_show_state(struct snd_soc_pcm_runtime *fe,
 					   params_channels(params),
 					   params_rate(params));
 	}
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	mutex_unlock(&fe->card->dpcm_mutex);
 out:
 	return offset;
 }
@@ -1141,7 +1140,6 @@ static int dpcm_be_connect(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
 {
 	struct snd_soc_dpcm *dpcm;
-	unsigned long flags;
 
 	/* only add new dpcms */
 	for_each_dpcm_be(fe, stream, dpcm) {
@@ -1157,10 +1155,10 @@ static int dpcm_be_connect(struct snd_soc_pcm_runtime *fe,
 	dpcm->fe = fe;
 	be->dpcm[stream].runtime = fe->dpcm[stream].runtime;
 	dpcm->state = SND_SOC_DPCM_LINK_STATE_NEW;
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	mutex_lock(&fe->card->dpcm_mutex);
 	list_add(&dpcm->list_be, &fe->dpcm[stream].be_clients);
 	list_add(&dpcm->list_fe, &be->dpcm[stream].fe_clients);
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	mutex_unlock(&fe->card->dpcm_mutex);
 
 	dev_dbg(fe->dev, "connected new DPCM %s path %s %s %s\n",
 			stream ? "capture" : "playback",  fe->dai_link->name,
@@ -1203,7 +1201,6 @@ static void dpcm_be_reparent(struct snd_soc_pcm_runtime *fe,
 void dpcm_be_disconnect(struct snd_soc_pcm_runtime *fe, int stream)
 {
 	struct snd_soc_dpcm *dpcm, *d;
-	unsigned long flags;
 
 	for_each_dpcm_be_safe(fe, stream, dpcm, d) {
 		dev_dbg(fe->dev, "ASoC: BE %s disconnect check for %s\n",
@@ -1222,10 +1219,10 @@ void dpcm_be_disconnect(struct snd_soc_pcm_runtime *fe, int stream)
 
 		dpcm_remove_debugfs_state(dpcm);
 
-		spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+		mutex_lock(&fe->card->dpcm_mutex);
 		list_del(&dpcm->list_be);
 		list_del(&dpcm->list_fe);
-		spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+		mutex_unlock(&fe->card->dpcm_mutex);
 		kfree(dpcm);
 	}
 }
@@ -1441,12 +1438,11 @@ int dpcm_process_paths(struct snd_soc_pcm_runtime *fe,
 void dpcm_clear_pending_state(struct snd_soc_pcm_runtime *fe, int stream)
 {
 	struct snd_soc_dpcm *dpcm;
-	unsigned long flags;
 
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm)
 		dpcm_set_be_update_state(dpcm->be, stream, SND_SOC_DPCM_UPDATE_NO);
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	mutex_unlock(&fe->card->dpcm_mutex);
 }
 
 void dpcm_be_dai_stop(struct snd_soc_pcm_runtime *fe, int stream,
@@ -2364,7 +2360,6 @@ static int dpcm_run_update_startup(struct snd_soc_pcm_runtime *fe, int stream)
 	struct snd_soc_dpcm *dpcm;
 	enum snd_soc_dpcm_trigger trigger = fe->dai_link->trigger[stream];
 	int ret = 0;
-	unsigned long flags;
 
 	dev_dbg(fe->dev, "ASoC: runtime %s open on FE %s\n",
 			stream ? "capture" : "playback", fe->dai_link->name);
@@ -2433,7 +2428,7 @@ static int dpcm_run_update_startup(struct snd_soc_pcm_runtime *fe, int stream)
 	dpcm_be_dai_shutdown(fe, stream);
 disconnect:
 	/* disconnect any pending BEs */
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 
@@ -2445,7 +2440,7 @@ static int dpcm_run_update_startup(struct snd_soc_pcm_runtime *fe, int stream)
 			be->dpcm[stream].state == SND_SOC_DPCM_STATE_NEW)
 				dpcm->state = SND_SOC_DPCM_LINK_STATE_FREE;
 	}
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	mutex_unlock(&fe->card->dpcm_mutex);
 
 	if (ret < 0)
 		dev_err(fe->dev, "ASoC: %s() failed (%d)\n", __func__, ret);
@@ -2845,10 +2840,9 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 	struct snd_soc_dpcm *dpcm;
 	int state;
 	int ret = 1;
-	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_fe(be, stream, dpcm) {
 
 		if (dpcm->fe == fe)
@@ -2862,7 +2856,7 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 			}
 		}
 	}
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	mutex_unlock(&fe->card->dpcm_mutex);
 
 	/* it's safe to do this BE DAI */
 	return ret;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH v2 4/5] ASoC: soc-pcm: protect for_each_dpcm_be() loops with dpcm_mutex
  2021-10-04 22:54 [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE Pierre-Louis Bossart
@ 2021-10-04 22:54   ` Pierre-Louis Bossart
  2021-10-04 22:54   ` Pierre-Louis Bossart
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-04 22:54 UTC (permalink / raw)
  To: alsa-devel
  Cc: tiwai, broonie, vkoul, Sameer Pujar, Gyeongtaek Lee,
	Peter Ujfalusi, Kuninori Morimoto, Pierre-Louis Bossart,
	Liam Girdwood, Jaroslav Kysela, Takashi Iwai, open list

When more than one FE is connected to a BE, e.g. in a mixing use case,
the BE can be triggered multiple times when the FEs are opened/started
concurrently. This race condition is problematic in the case of
SoundWire BE dailinks, and is not desirable in the general
case. The code carefully checks when the BE can be stopped or
hw_free'ed, but the trigger code does not use any mutual exclusion.

This can be fixed by using the dpcm_mutex.

As discussed on the alsa-devel mailing list [1], there is an additional
issue that the BEs can be disconnected dynamically, leading to
potential accesses to freed memory.

This patch suggests a general protection of the for_each_dpcm_be()
loops with the dpcm_mutex, to solve both the problem of multiple
triggers and BE dynamic addition/removal.

[1] https://lore.kernel.org/alsa-devel/002f01d7b4f5$c030f4a0$4092dde0$@samsung.com/

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 sound/soc/soc-pcm.c | 48 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index af12593b033a..6089e0c1bf9f 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -320,6 +320,7 @@ int dpcm_dapm_stream_event(struct snd_soc_pcm_runtime *fe, int dir,
 {
 	struct snd_soc_dpcm *dpcm;
 
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, dir, dpcm) {
 
 		struct snd_soc_pcm_runtime *be = dpcm->be;
@@ -333,6 +334,8 @@ int dpcm_dapm_stream_event(struct snd_soc_pcm_runtime *fe, int dir,
 
 		snd_soc_dapm_stream_event(be, dir, event);
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
+
 
 	snd_soc_dapm_stream_event(fe, dir, event);
 
@@ -1142,10 +1145,14 @@ static int dpcm_be_connect(struct snd_soc_pcm_runtime *fe,
 	struct snd_soc_dpcm *dpcm;
 
 	/* only add new dpcms */
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
-		if (dpcm->be == be && dpcm->fe == fe)
+		if (dpcm->be == be && dpcm->fe == fe) {
+			mutex_unlock(&fe->card->dpcm_mutex);
 			return 0;
+		}
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 
 	dpcm = kzalloc(sizeof(struct snd_soc_dpcm), GFP_KERNEL);
 	if (!dpcm)
@@ -1202,6 +1209,7 @@ void dpcm_be_disconnect(struct snd_soc_pcm_runtime *fe, int stream)
 {
 	struct snd_soc_dpcm *dpcm, *d;
 
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be_safe(fe, stream, dpcm, d) {
 		dev_dbg(fe->dev, "ASoC: BE %s disconnect check for %s\n",
 				stream ? "capture" : "playback",
@@ -1219,12 +1227,11 @@ void dpcm_be_disconnect(struct snd_soc_pcm_runtime *fe, int stream)
 
 		dpcm_remove_debugfs_state(dpcm);
 
-		mutex_lock(&fe->card->dpcm_mutex);
 		list_del(&dpcm->list_be);
 		list_del(&dpcm->list_fe);
-		mutex_unlock(&fe->card->dpcm_mutex);
 		kfree(dpcm);
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 }
 
 /* get BE for DAI widget and stream */
@@ -1351,6 +1358,7 @@ static int dpcm_prune_paths(struct snd_soc_pcm_runtime *fe, int stream,
 	int prune = 0;
 
 	/* Destroy any old FE <--> BE connections */
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		if (dpcm_be_is_active(dpcm, stream, *list_))
 			continue;
@@ -1362,6 +1370,7 @@ static int dpcm_prune_paths(struct snd_soc_pcm_runtime *fe, int stream,
 		dpcm_set_be_update_state(dpcm->be, stream, SND_SOC_DPCM_UPDATE_BE);
 		prune++;
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 
 	dev_dbg(fe->dev, "ASoC: found %d old BE paths for pruning\n", prune);
 	return prune;
@@ -1451,13 +1460,16 @@ void dpcm_be_dai_stop(struct snd_soc_pcm_runtime *fe, int stream,
 	struct snd_soc_dpcm *dpcm;
 
 	/* disable any enabled and non active backends */
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 		struct snd_pcm_substream *be_substream =
 			snd_soc_dpcm_get_substream(be, stream);
 
-		if (dpcm == last)
+		if (dpcm == last) {
+			mutex_unlock(&fe->card->dpcm_mutex);
 			return;
+		}
 
 		/* is this op for this BE ? */
 		if (!snd_soc_dpcm_be_can_update(fe, be, stream))
@@ -1487,6 +1499,7 @@ void dpcm_be_dai_stop(struct snd_soc_pcm_runtime *fe, int stream,
 		be_substream->runtime = NULL;
 		be->dpcm[stream].state = SND_SOC_DPCM_STATE_CLOSE;
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 }
 
 int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
@@ -1496,6 +1509,7 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
 	int err, count = 0;
 
 	/* only startup BE DAIs that are either sinks or sources to this FE DAI */
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_pcm_substream *be_substream;
 
@@ -1546,11 +1560,13 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
 		be->dpcm[stream].state = SND_SOC_DPCM_STATE_OPEN;
 		count++;
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 
 	return count;
 
 unwind:
 	dpcm_be_dai_startup_rollback(fe, stream, dpcm);
+	mutex_unlock(&fe->card->dpcm_mutex);
 
 	dev_err(fe->dev, "ASoC: %s() failed at %s (%d)\n",
 		__func__, be->dai_link->name, err);
@@ -1605,6 +1621,7 @@ static void dpcm_runtime_setup_be_format(struct snd_pcm_substream *substream)
 	 * if FE want to use it (= dpcm_merged_format)
 	 */
 
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 		struct snd_soc_pcm_stream *codec_stream;
@@ -1623,6 +1640,7 @@ static void dpcm_runtime_setup_be_format(struct snd_pcm_substream *substream)
 			soc_pcm_hw_update_format(hw, codec_stream);
 		}
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 }
 
 static void dpcm_runtime_setup_be_chan(struct snd_pcm_substream *substream)
@@ -1640,7 +1658,7 @@ static void dpcm_runtime_setup_be_chan(struct snd_pcm_substream *substream)
 	 * It returns merged BE codec channel;
 	 * if FE want to use it (= dpcm_merged_chan)
 	 */
-
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 		struct snd_soc_pcm_stream *cpu_stream;
@@ -1671,6 +1689,7 @@ static void dpcm_runtime_setup_be_chan(struct snd_pcm_substream *substream)
 			soc_pcm_hw_update_chan(hw, codec_stream);
 		}
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 }
 
 static void dpcm_runtime_setup_be_rate(struct snd_pcm_substream *substream)
@@ -1688,7 +1707,7 @@ static void dpcm_runtime_setup_be_rate(struct snd_pcm_substream *substream)
 	 * It returns merged BE codec channel;
 	 * if FE want to use it (= dpcm_merged_chan)
 	 */
-
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 		struct snd_soc_pcm_stream *pcm;
@@ -1708,6 +1727,7 @@ static void dpcm_runtime_setup_be_rate(struct snd_pcm_substream *substream)
 			soc_pcm_hw_update_rate(hw, pcm);
 		}
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 }
 
 static int dpcm_apply_symmetry(struct snd_pcm_substream *fe_substream,
@@ -1730,6 +1750,7 @@ static int dpcm_apply_symmetry(struct snd_pcm_substream *fe_substream,
 	}
 
 	/* apply symmetry for BE */
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 		struct snd_pcm_substream *be_substream =
@@ -1755,6 +1776,7 @@ static int dpcm_apply_symmetry(struct snd_pcm_substream *fe_substream,
 		}
 	}
 error:
+	mutex_unlock(&fe->card->dpcm_mutex);
 	if (err < 0)
 		dev_err(fe->dev, "ASoC: %s failed (%d)\n", __func__, err);
 
@@ -1830,6 +1852,7 @@ void dpcm_be_dai_hw_free(struct snd_soc_pcm_runtime *fe, int stream)
 
 	/* only hw_params backends that are either sinks or sources
 	 * to this frontend DAI */
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 
 		struct snd_soc_pcm_runtime *be = dpcm->be;
@@ -1842,7 +1865,7 @@ void dpcm_be_dai_hw_free(struct snd_soc_pcm_runtime *fe, int stream)
 
 		/* only free hw when no longer used - check all FEs */
 		if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
-				continue;
+			continue;
 
 		/* do not free hw if this BE is used by other FE */
 		if (be->dpcm[stream].users > 1)
@@ -1863,6 +1886,7 @@ void dpcm_be_dai_hw_free(struct snd_soc_pcm_runtime *fe, int stream)
 
 		be->dpcm[stream].state = SND_SOC_DPCM_STATE_HW_FREE;
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 }
 
 static int dpcm_fe_dai_hw_free(struct snd_pcm_substream *substream)
@@ -1896,6 +1920,7 @@ int dpcm_be_dai_hw_params(struct snd_soc_pcm_runtime *fe, int stream)
 	struct snd_soc_dpcm *dpcm;
 	int ret;
 
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		be = dpcm->be;
 		be_substream = snd_soc_dpcm_get_substream(be, stream);
@@ -1935,6 +1960,7 @@ int dpcm_be_dai_hw_params(struct snd_soc_pcm_runtime *fe, int stream)
 
 		be->dpcm[stream].state = SND_SOC_DPCM_STATE_HW_PARAMS;
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 	return 0;
 
 unwind:
@@ -1961,6 +1987,7 @@ int dpcm_be_dai_hw_params(struct snd_soc_pcm_runtime *fe, int stream)
 
 		soc_pcm_hw_free(be_substream);
 	}
+	mutex_unlock(&fe->card->dpcm_mutex);
 
 	return ret;
 }
@@ -2008,6 +2035,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 	struct snd_soc_dpcm *dpcm;
 	int ret = 0;
 
+	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_pcm_substream *be_substream;
 
@@ -2097,6 +2125,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 		}
 	}
 end:
+	mutex_unlock(&fe->card->dpcm_mutex);
 	if (ret < 0)
 		dev_err(fe->dev, "ASoC: %s() failed at %s (%d)\n",
 			__func__, be->dai_link->name, ret);
@@ -2831,6 +2860,7 @@ struct snd_pcm_substream *
 }
 EXPORT_SYMBOL_GPL(snd_soc_dpcm_get_substream);
 
+/* This must be called with fe->card->dpcm_mutex held */
 static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 				    struct snd_soc_pcm_runtime *be,
 				    int stream,
@@ -2842,7 +2872,6 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 	int ret = 1;
 	int i;
 
-	mutex_lock(&fe->card->dpcm_mutex);
 	for_each_dpcm_fe(be, stream, dpcm) {
 
 		if (dpcm->fe == fe)
@@ -2856,7 +2885,6 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 			}
 		}
 	}
-	mutex_unlock(&fe->card->dpcm_mutex);
 
 	/* it's safe to do this BE DAI */
 	return ret;
@@ -2865,6 +2893,7 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 /*
  * We can only hw_free, stop, pause or suspend a BE DAI if any of it's FE
  * are not running, paused or suspended for the specified stream direction.
+ * This must be called with fe->card->dpcm_mutex held.
  */
 static int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
@@ -2881,6 +2910,7 @@ static int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
 /*
  * We can only change hw params a BE DAI if any of it's FE are not prepared,
  * running, paused or suspended for the specified stream direction.
+ * This must be called with fe->card->dpcm_mutex held.
  */
 static int snd_soc_dpcm_can_be_params(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH v2 5/5] ASoC: soc-pcm: test refcount before triggering
  2021-10-04 22:54 [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE Pierre-Louis Bossart
@ 2021-10-04 22:54   ` Pierre-Louis Bossart
  2021-10-04 22:54   ` Pierre-Louis Bossart
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-04 22:54 UTC (permalink / raw)
  To: alsa-devel
  Cc: tiwai, broonie, vkoul, Sameer Pujar, Gyeongtaek Lee,
	Peter Ujfalusi, Kuninori Morimoto, Pierre-Louis Bossart,
	Liam Girdwood, Jaroslav Kysela, Takashi Iwai, open list

On start/pause_release/resume, when more than one FE is connected to
the same BE, it's possible that the trigger is sent more than
once. This is not desirable: we only want to trigger a BE once, which
is straightforward to implement with a refcount.

For stop/pause/suspend, the problem is more complicated: the check
implemented in snd_soc_dpcm_can_be_free_stop() may fail due to a
conceptual deadlock when we trigger the BE before the FE. In this
case, the FE states have not yet changed, so there are corner cases
where the TRIGGER_STOP is never sent (the dual of the start case,
where multiple triggers might be sent).

This patch suggests an unconditional trigger in all cases, without
checking the FE states, using a refcount protected by a mutex.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 include/sound/soc-dpcm.h |  2 ++
 sound/soc/soc-pcm.c      | 53 +++++++++++++++++++++++++++++++---------
 2 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/include/sound/soc-dpcm.h b/include/sound/soc-dpcm.h
index 9c00118603e7..6d84e2c60ca8 100644
--- a/include/sound/soc-dpcm.h
+++ b/include/sound/soc-dpcm.h
@@ -101,6 +101,8 @@ struct snd_soc_dpcm_runtime {
 	enum snd_soc_dpcm_state state;
 
 	int trigger_pending; /* trigger cmd + 1 if pending, 0 if not */
+
+	int be_start; /* refcount protected by card's dpcm_mutex */
 };
 
 #define for_each_dpcm_fe(be, stream, _dpcm)				\
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 6089e0c1bf9f..ed898d83fe17 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -1556,7 +1556,7 @@ int dpcm_be_dai_startup(struct snd_soc_pcm_runtime *fe, int stream)
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_CLOSE;
 			goto unwind;
 		}
-
+		be->dpcm[stream].be_start = 0;
 		be->dpcm[stream].state = SND_SOC_DPCM_STATE_OPEN;
 		count++;
 	}
@@ -2051,14 +2051,21 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 
 		switch (cmd) {
 		case SNDRV_PCM_TRIGGER_START:
-			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
+			if (!be->dpcm[stream].be_start &&
+			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
 			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
 			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
 				continue;
 
+			be->dpcm[stream].be_start++;
+			if (be->dpcm[stream].be_start != 1)
+				continue;
+
 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				be->dpcm[stream].be_start--;
 				goto end;
+			}
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			break;
@@ -2066,9 +2073,15 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_SUSPEND))
 				continue;
 
+			be->dpcm[stream].be_start++;
+			if (be->dpcm[stream].be_start != 1)
+				continue;
+
 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				be->dpcm[stream].be_start--;
 				goto end;
+			}
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			break;
@@ -2076,9 +2089,15 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
 				continue;
 
+			be->dpcm[stream].be_start++;
+			if (be->dpcm[stream].be_start != 1)
+				continue;
+
 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				be->dpcm[stream].be_start--;
 				goto end;
+			}
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			break;
@@ -2087,12 +2106,18 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
 				continue;
 
-			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
+			if (be->dpcm[stream].state == SND_SOC_DPCM_STATE_START)
+				be->dpcm[stream].be_start--;
+
+			if (be->dpcm[stream].be_start != 0)
 				continue;
 
 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				if (be->dpcm[stream].state == SND_SOC_DPCM_STATE_START)
+					be->dpcm[stream].be_start++;
 				goto end;
+			}
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_STOP;
 			break;
@@ -2100,12 +2125,15 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
 				continue;
 
-			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
+			be->dpcm[stream].be_start--;
+			if (be->dpcm[stream].be_start != 0)
 				continue;
 
 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				be->dpcm[stream].be_start++;
 				goto end;
+			}
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_SUSPEND;
 			break;
@@ -2113,12 +2141,15 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
 				continue;
 
-			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
+			be->dpcm[stream].be_start--;
+			if (be->dpcm[stream].be_start != 0)
 				continue;
 
 			ret = soc_pcm_trigger(be_substream, cmd);
-			if (ret)
+			if (ret) {
+				be->dpcm[stream].be_start++;
 				goto end;
+			}
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_PAUSED;
 			break;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-04 22:54 [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE Pierre-Louis Bossart
                   ` (4 preceding siblings ...)
  2021-10-04 22:54   ` Pierre-Louis Bossart
@ 2021-10-05  6:36 ` Sameer Pujar
  2021-10-05 13:17   ` Pierre-Louis Bossart
  5 siblings, 1 reply; 33+ messages in thread
From: Sameer Pujar @ 2021-10-05  6:36 UTC (permalink / raw)
  To: Pierre-Louis Bossart, alsa-devel
  Cc: Kuninori Morimoto, tiwai, vkoul, broonie, Gyeongtaek Lee, Peter Ujfalusi



On 10/5/2021 4:24 AM, Pierre-Louis Bossart wrote:
>
> We've been adding a 'deep buffer' PCM device to several SOF topologies
> in order to reduce power consumption. The typical use-case would be
> music playback over a headset: this additional PCM device provides
> more buffering and longer latencies, leaving the rest of the system
> sleep for longer periods. Notifications and 'regular' low-latency
> audio playback would still use the 'normal' PCM device and be mixed
> with the 'deep buffer' before rendering on the headphone endpoint. The
> tentative direction would be to expose this alternate device to
> PulseAudio/PipeWire/CRAS via the UCM SectionModifier definitions.
>
> That seemed a straightforward topology change until our automated
> validation stress tests started reporting issues on SoundWire
> platforms, when e.g. two START triggers might be send and conversely
> the STOP trigger is never sent. The SoundWire stream state management
> flagged inconsistent states when the two 'normal' and 'deep buffer'
> devices are used concurrently with rapid play/stop/pause monkey
> testing.
>
> Looking at the soc-pcm.c code, it seems that the BE state
> management needs a lot of love.
>
> a) there is no consistent protection for the BE state. In some parts
> of the code, the state updates are protected by a spinlock but in the
> trigger they are not. When we open/play/close the two PCM devices in
> stress tests, we end-up testing a state that is being modified. That
> can't be good.
>
> b) there is a conceptual deadlock: on stop we check the FE states to
> see if a shared BE can be stopped, but since we trigger the BE first
> the FE states have not been modified yet, so the TRIGGER_STOP is never
> sent.
>
> This patchset suggests a transition from a spinlock to a mutex, an
> extended protection when walking through the BE list, and the use of a
> refcount to decide when to trigger the BE. With these patches I am
> able to run our entire validation suite without any issues with this
> new 'deep buffer' topology, and no regressions on existing solutions [1]
>
> One might ask 'how come we didn't see this earlier'? The answer is
> probably that the .trigger callbacks in most implementations seems to
> perform DAPM operations, and sending the triggers multiple times is
> not an issue. In the case of SoundWire, we do use the .trigger
> callback to reconfigure the bus using the 'bank switch' mechanism. It
> could be acceptable to tolerate a trigger multiple times, but the
> deadlock on stop cannot be fixed at the SoundWire layer alone.
>
> I chose to send this patchset as an RFCv2 to gather more feedback and
> make use others know about DPCM issues. We're going to spend more time
> on this but if others can provide feedback/test results it would be
> greatly appreciated.
>
> Opens:
>
> 1) is this the right solution? The DPCM code is far from simple, has
> notions such as SND_SOC_DPCM_UPDATE_NO and 'trigger_pending' that I
> have no background on.
>
> 2) There are other reports of kernel oopses [2] that seem related to
> the lack of protection. I'd be good to confirm if this patchset solve
> these problems as well.
>
> [1] https://github.com/thesofproject/linux/pull/3146
> [2] https://lore.kernel.org/alsa-devel/002f01d7b4f5$c030f4a0$4092dde0$@samsung.com/
>
> changes since RFC v1:
> Removed unused function
> Removed exported symbols only used in soc-pcm.c, used static instead
> Use a mutex instead of a spinlock
> Protect all for_each_dpcm_be() loops
> Fix bugs introduced in the refcount


> Pierre-Louis Bossart (5):
>    ASoC: soc-pcm: remove snd_soc_dpcm_fe_can_update()
>    ASoC: soc-pcm: don't export local functions, use static
>    ASoC: soc-pcm: replace dpcm_lock with dpcm_mutex
>    ASoC: soc-pcm: protect for_each_dpcm_be() loops with dpcm_mutex
>    ASoC: soc-pcm: test refcount before triggering

Thank you Pierre-Louis for adding me to this thread.

I did a quick test of your patches on my Tegra board and am seeing issues
with multiple streams. For instance, I ran it with a 2x1 mixer
configuration and am hitting the below:

[  277.661886] BUG: scheduling while atomic: aplay/1306/0x00000100
[  287.713193] BUG: spinlock cpu recursion on CPU#0, aplay/1307
[  287.719138]  lock: 0xffff00008cc820f0, .magic: dead4ead, .owner: 
aplay/1306, .owner_cpu: 0
[  287.727319] CPU: 0 PID: 1307 Comm: aplay Tainted: G W         
5.15.0-rc3-next-20210927-00026-gffdabce987b1 #12
[  287.737783] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
[  287.744228] Call trace:
[  287.746656]  dump_backtrace+0x0/0x1c0
[  287.750300]  show_stack+0x18/0x28
[  287.753604]  dump_stack_lvl+0x7c/0xa8
[  287.757236]  dump_stack+0x18/0x34
[  287.760536]  spin_dump+0x70/0x90
[  287.763732]  do_raw_spin_lock+0xd8/0x120
[  287.767615]  _raw_spin_lock_irq+0x60/0x80
[  287.771581]  snd_pcm_stream_lock_irq+0x20/0x48 [snd_pcm]
[  287.776853]  snd_pcm_drain+0x1ec/0x348 [snd_pcm]
[  287.781421]  snd_pcm_common_ioctl+0xacc/0x1938 [snd_pcm]
[  287.786685]  snd_pcm_ioctl+0x2c/0x48 [snd_pcm]
[  287.791101]  __arm64_sys_ioctl+0xb0/0xf0
[  287.794982]  invoke_syscall+0x44/0x108
[  287.798683]  el0_svc_common.constprop.3+0x74/0x100
[  287.803416]  do_el0_svc+0x24/0x90
[  287.806687]  el0_svc+0x20/0x60
[  287.809705]  el0t_64_sync_handler+0x94/0xb8
[  287.813839]  el0t_64_sync+0x180/0x184


And in some cases just the following:

[ 1074.212276] BUG: scheduling while atomic: aplay/12327/0x00000100
[ 1095.227509] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1095.233443] rcu:     0-...0: (1 GPs behind) 
idle=4af/1/0x4000000000000004 softirq=19902/19902 fqs=2626
[ 1095.242528] rcu:     2-...0: (1 GPs behind) 
idle=9d5/1/0x4000000000000000 softirq=22707/22707 fqs=262



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-05  6:36 ` [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE Sameer Pujar
@ 2021-10-05 13:17   ` Pierre-Louis Bossart
  2021-10-06 14:22     ` Sameer Pujar
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-05 13:17 UTC (permalink / raw)
  To: Sameer Pujar, alsa-devel
  Cc: Kuninori Morimoto, tiwai, vkoul, broonie, Gyeongtaek Lee, Peter Ujfalusi


> I did a quick test of your patches on my Tegra board and seeing issues
> with multiple streams. For instance, I ran it for a 2x1 mixer
> configuration and hitting below:
> 
> [  277.661886] BUG: scheduling while atomic: aplay/1306/0x00000100
> [  287.713193] BUG: spinlock cpu recursion on CPU#0, aplay/1307
> [  287.719138]  lock: 0xffff00008cc820f0, .magic: dead4ead, .owner:
> aplay/1306, .owner_cpu: 0
> [  287.727319] CPU: 0 PID: 1307 Comm: aplay Tainted: G W        
> 5.15.0-rc3-next-20210927-00026-gffdabce987b1 #12
> [  287.737783] Hardware name: NVIDIA Jetson AGX Xavier Developer Kit (DT)
> [  287.744228] Call trace:
> [  287.746656]  dump_backtrace+0x0/0x1c0
> [  287.750300]  show_stack+0x18/0x28
> [  287.753604]  dump_stack_lvl+0x7c/0xa8
> [  287.757236]  dump_stack+0x18/0x34
> [  287.760536]  spin_dump+0x70/0x90
> [  287.763732]  do_raw_spin_lock+0xd8/0x120
> [  287.767615]  _raw_spin_lock_irq+0x60/0x80
> [  287.771581]  snd_pcm_stream_lock_irq+0x20/0x48 [snd_pcm]
> [  287.776853]  snd_pcm_drain+0x1ec/0x348 [snd_pcm]
> [  287.781421]  snd_pcm_common_ioctl+0xacc/0x1938 [snd_pcm]
> [  287.786685]  snd_pcm_ioctl+0x2c/0x48 [snd_pcm]
> [  287.791101]  __arm64_sys_ioctl+0xb0/0xf0
> [  287.794982]  invoke_syscall+0x44/0x108
> [  287.798683]  el0_svc_common.constprop.3+0x74/0x100
> [  287.803416]  do_el0_svc+0x24/0x90
> [  287.806687]  el0_svc+0x20/0x60
> [  287.809705]  el0t_64_sync_handler+0x94/0xb8
> [  287.813839]  el0t_64_sync+0x180/0x184
> 
> 
> And in some case just below:
> 
> [ 1074.212276] BUG: scheduling while atomic: aplay/12327/0x00000100
> [ 1095.227509] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 1095.233443] rcu:     0-...0: (1 GPs behind)
> idle=4af/1/0x4000000000000004 softirq=19902/19902 fqs=2626
> [ 1095.242528] rcu:     2-...0: (1 GPs behind)
> idle=9d5/1/0x4000000000000000 softirq=22707/22707 fqs=262

Thanks Sameer for the overnight tests, much appreciated.

My patches don't change anything related to a spinlock or pcm stream
management, so I am not sure what could cause this rather spectacular
failure. That hints at a fundamental configuration difference, possibly
caused by your component chaining?

Since in your case you have a 1:1 mapping between FE and BE, would you
mind testing by backtracking, one patch at a time, to see which of the
last three patches causes the problem on your board?

Thanks again for your time!
-Pierre


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-05 13:17   ` Pierre-Louis Bossart
@ 2021-10-06 14:22     ` Sameer Pujar
  2021-10-06 19:47       ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Sameer Pujar @ 2021-10-06 14:22 UTC (permalink / raw)
  To: Pierre-Louis Bossart, alsa-devel
  Cc: Kuninori Morimoto, tiwai, vkoul, broonie, Gyeongtaek Lee, Peter Ujfalusi

Hi Pierre,

On 10/5/2021 6:47 PM, Pierre-Louis Bossart wrote:
>
> My patches don't change anything related to a spinlock or pcm stream
> management, so not sure what could cause this rather spectacular
> failure. That hints at a fundamental configuration difference, possibly
> caused by your component chaining?
>
> Since in your case you have a 1:1 mapping between FE and BE, would you
> mind testing by backtracking, one patch at a time to see which one of
> the three last patches could cause a problem on your board?

I tested this further. It appears that things work fine up to your
'patch 3/5'. After I take 'patch 4/5', the print "BUG: scheduling while
atomic: aplay" starts showing up and I see a hang. This failure was
seen with the 2x1 mixer test itself.

The 'patch 4/5' introduces mutex_lock/unlock() in dpcm_be_dai_trigger(). 
This seems to be the problem, since trigger() runs in atomic context 
depending on the PCM 'nonatomic' flag. I am not sure if your design sets 
'nonatomic' flag by default and that is why the issue is not seen at 
your end?

With the below (just for testing purposes), tests ran well. I was able to run
2x1, 3x1 ... 10x1 mixer tests.

diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index e5df898..2ce30d1 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -2045,7 +2045,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
         struct snd_soc_dpcm *dpcm;
         int ret = 0;

-       mutex_lock(&fe->card->dpcm_mutex);
+       //mutex_lock(&fe->card->dpcm_mutex);
         for_each_dpcm_be(fe, stream, dpcm) {
                 struct snd_pcm_substream *be_substream;

@@ -2166,7 +2166,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
                 }
         }
  end:
-       mutex_unlock(&fe->card->dpcm_mutex);
+       //mutex_unlock(&fe->card->dpcm_mutex);
         if (ret < 0)
                 dev_err(fe->dev, "ASoC: %s() failed at %s (%d)\n",
                         __func__, be->dai_link->name, ret);


In fact, I picked up all of your patches plus the above test patch, and it worked fine.


Thanks,
Sameer.

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-06 14:22     ` Sameer Pujar
@ 2021-10-06 19:47       ` Pierre-Louis Bossart
  2021-10-07 11:06         ` Takashi Iwai
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-06 19:47 UTC (permalink / raw)
  To: Sameer Pujar, alsa-devel
  Cc: Kuninori Morimoto, tiwai, vkoul, broonie, Gyeongtaek Lee, Peter Ujfalusi


> I tested this further. It appears that things work fine till 'patch 3/5'
> of yours. After I take 'patch 4/5', the print "BUG: scheduling while
> atomic: aplay" started showing up and I see a hang. This failure was
> seen for 2x1 mixer test itself.
> 
> The 'patch 4/5' introduces mutex_lock/unlock() in dpcm_be_dai_trigger().
> This seems to be the problem, since trigger() runs in atomic context
> depending on the PCM 'nonatomic' flag. I am not sure if your design sets
> 'nonatomic' flag by default and that is why the issue is not seen at
> your end?
> 
> With below (just for testing purpose), tests ran well. I was able to run
> 2x1, 3x1 ... 10x1 mixer tests.

This is useful information, thanks a lot Sameer for your time.

Unfortunately, removing those lines will not work: that's
precisely what's needed to prevent multiple triggers from being sent to
the same shared BE.

I don't have a clear idea why we see different results, and now I have
even less understanding of the ALSA/ASoC/DPCM locking model. We use
card->mutex, card->pcm_mutex, card->dpcm_lock,
substream->self_group.mutex or lock, client_mutex, dapm_mutex....

I don't think any of the DPCM code is ever called from an interrupt
context, so the errors you reported seem more like a card configuration
issue; specifically, the interaction with 'aplay'/userspace only happens
for FEs. BEs are mostly hidden from userspace.

One possible difference is that all our DPCM solutions are based on
non-atomic FEs (since they all involve an IPC with a DSP), so we would
always use the top branch of this test:

	if (nonatomic) \
		mutex_ ## mutex_action(&group->mutex); \
	else \
		spin_ ## action(&group->lock);

I don't see this nonatomic flag set anywhere in the sound/soc/tegra
code, so it's possible that in your case the spin_lock/spin_lock_irq is
used before triggering the FE, and taking a mutex before the BE trigger
throws errors? I think Takashi mentioned that the BEs inherit the
properties of the FE to some extent.
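
For reference, an FE link opts into non-atomic trigger handling through the
dai_link flags; the fragment below is illustrative only (names are made up,
DAI/component references omitted) and not taken from any existing machine
driver:

/* illustrative DPCM front-end link: 'dynamic' marks it as an FE and
 * 'nonatomic' makes the PCM core use a mutex (not a spinlock) around
 * trigger, so the trigger path may sleep (e.g. for a DSP IPC).
 */
static struct snd_soc_dai_link media_fe_link = {
	.name		= "Media Playback",	/* hypothetical */
	.stream_name	= "Media Playback",
	.dynamic	= 1,
	.nonatomic	= 1,
	.trigger	= { SND_SOC_DPCM_TRIGGER_POST,
			    SND_SOC_DPCM_TRIGGER_POST },
	.dpcm_playback	= 1,
};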

Looking at the code, I see that only the Intel legacy, SOF and Qualcomm
drivers set nonatomic=1, so maybe we can assume that the FEs of a card
are either all atomic or all non-atomic, and we could then pick either a
spinlock or a mutex accordingly. But if the FEs can differ, then I am not
sure what locking model might work: we can't have a BE protected by a
spinlock or a mutex depending on the property of the FE. We need a single
method to guarantee mutual exclusion.

Takashi, Mark, do you think that an all/none assumption on FE nonatomic
properties would make sense?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-06 19:47       ` Pierre-Louis Bossart
@ 2021-10-07 11:06         ` Takashi Iwai
  2021-10-07 13:31           ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Takashi Iwai @ 2021-10-07 11:06 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

On Wed, 06 Oct 2021 21:47:33 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> > I tested this further. It appears that things work fine till 'patch 3/5'
> > of yours. After I take 'patch 4/5', the print "BUG: scheduling while
> > atomic: aplay" started showing up and I see a hang. This failure was
> > seen for 2x1 mixer test itself.
> > 
> > The 'patch 4/5' introduces mutex_lock/unlock() in dpcm_be_dai_trigger().
> > This seems to be the problem, since trigger() runs in atomic context
> > depending on the PCM 'nonatomic' flag. I am not sure if your design sets
> > 'nonatomic' flag by default and that is why the issue is not seen at
> > your end?
> > 
> > With below (just for testing purpose), tests ran well. I was able to run
> > 2x1, 3x1 ... 10x1 mixer tests.
> 
> This is useful information, thanks a lot Sameer for your time.
> 
> Unfortunately removing the lines below will not work, that's
> precisely what's needed to prevent multiple triggers from being sent to
> the same shared BE.
> 
> I don't have a clear idea why we see different results, and now I have
> even less understanding of the ALSA/ASoC/DPCM locking model. We use
> card->mutex, card->pcm_mutex, card->dpcm_lock,
> substream->self_group.mutex or lock, client_mutex, dapm_mutex....
> 
> I don't think any of the DPCM code is ever called from an interrupt
> context, so the errors you reported seem more like a card configuration,
> specifically the interaction with 'aplay'/userspace will happen for FEs.
> BEs are mostly hidden to userspace.
> 
> One possible difference is that all our DPCM solutions are based on
> non-atomic FEs (since they all involve an IPC with a DSP), so we would
> always use the top banch of these tests:
> 
> 	if (nonatomic) \
> 		mutex_ ## mutex_action(&group->mutex); \
> 	else \
> 		spin_ ## action(&group->lock);
> 
> I don't see this nonatomic flag set anywhere in the sound/soc/tegra
> code, so it's possible that in your case the spin_lock/spin_lock_irq is
> used before triggering the FE, and using a mutex before the BE trigger
> throws errors? I think Takashi mentioned that the BEs inherit the
> properties of the FE to some extent.
> 
> Looking at the code, I see that only Intel legacy, SOF and Qualcomm
> drivers set nonatomic=1, so maybe we can assume that that FEs for a card
> are either all atomic or all non-atomic, maybe we could then use a
> spinlock or a mutex. But if the FEs can be different then I am not sure
> what locking model might work, we can't have a BE protected by a
> spinlock or a mutex depending on the property of the FE. We need one
> method only to guarantee a mutual exclusion.
> 
> Takashi, Mark, do you think that an all/none assumption on FE nonatomic
> properties would make sense?

As long as BE's are updated from FE's PCM callback, BE has to follow
the atomicity of its FE, so we can't assume all or none globally.

How is the expected lock granularity and the protection context?  Do
we need a card global lock/mutex, or is it specific to each BE
substream?


thanks,

Takashi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-07 11:06         ` Takashi Iwai
@ 2021-10-07 13:31           ` Pierre-Louis Bossart
  2021-10-07 14:59             ` Takashi Iwai
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-07 13:31 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi


>> Takashi, Mark, do you think that an all/none assumption on FE nonatomic
>> properties would make sense?
> 
> As long as BE's are updated from FE's PCM callback, BE has to follow
> the atomicity of its FE, so we can't assume all or none globally.

A BE may have more than one FE. That's precisely the point of
mixers/demux, and if the BE has FEs with different 'atomicity' then I
don't see how locking can work: the BE operations are run in the context
of each of its FEs, typically using the following loop:

for_each_dpcm_be(fe, stream, dpcm) {
   do_something();
}

Applications will view multiple FEs as completely independent. They may
be opened/prepared/started/stopped/paused as needed. When the BE is
shared, there is a need for consistency, such as starting the BE
when the first FE becomes active and stopping it when the last FE stops.
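
To make the refcount idea concrete, here is a simplified sketch of the
trigger loop body, reusing the be_start counter proposed in patch 5/5 (a
sketch of the intent, not the literal patch code):

	case SNDRV_PCM_TRIGGER_START:
		/* only the first active FE actually starts the BE */
		if (be->dpcm[stream].be_start++ != 0)
			continue;
		ret = soc_pcm_trigger(be_substream, cmd);
		if (ret)
			be->dpcm[stream].be_start--;	/* roll back on error */
		break;
	case SNDRV_PCM_TRIGGER_STOP:
		/* only the last FE to stop actually stops the BE */
		if (--be->dpcm[stream].be_start != 0)
			continue;
		ret = soc_pcm_trigger(be_substream, cmd);
		if (ret)
			be->dpcm[stream].be_start++;	/* roll back on error */
		break;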

> How is the expected lock granularity and the protection context?  Do
> we need a card global lock/mutex, or is it specific to each BE
> substream?

We already have a dpcm_lock at the card level, which protects the
addition of BEs and BE states. That spin_lock is fine for most cases.

The only real problem is the trigger, which is currently completely
unprotected: we have to serialize the BE triggers, otherwise you could
get a STOP before a START due to scheduling, or hit the other problems I
saw in my SoundWire tests: two START triggers sent, or the STOP never sent.

But how to do this serialization is unclear...

A lateral-thinking approach would be to decouple the BEs entirely and
have the FEs 'signal' their change of state. A BE 'thread' running in the
BE context would then serialize the requests and perform all the BE
operations, and the same approach could be chained. I am afraid that
would be a complete rewrite of DPCM, but maybe we have to do so anyway
if we can't support the basic case of a mixer with 2 streams :-)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-07 13:31           ` Pierre-Louis Bossart
@ 2021-10-07 14:59             ` Takashi Iwai
  2021-10-07 15:24               ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Takashi Iwai @ 2021-10-07 14:59 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

On Thu, 07 Oct 2021 15:31:49 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> >> Takashi, Mark, do you think that an all/none assumption on FE nonatomic
> >> properties would make sense?
> > 
> > As long as BE's are updated from FE's PCM callback, BE has to follow
> > the atomicity of its FE, so we can't assume all or none globally.
> 
> A BE may have more than one FEs. That's precisely the point of
> mixers/demux, and if the BE has FEs with different 'atomicity' then I
> don't see how locking can work: the BE operations are run in the context
> of each of its FE, typically using the following loop:
> 
> for_each_dpcm_be(fe, stream, dpcm) {
>    do_something();
> }

Do we really have the cases where FEs have different atomicity?
I don't think it's a valid configuration, and we should catch it via
WARN_ON() or such.
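
As a sketch of such a check (soc_check_fe_atomicity() is a made-up helper;
the dai_link 'dynamic' and 'nonatomic' flags are the existing ones),
something like this could run at card bind time:

static void soc_check_fe_atomicity(struct snd_soc_card *card)
{
	struct snd_soc_pcm_runtime *rtd;
	bool found = false, nonatomic = false;

	for_each_card_rtds(card, rtd) {
		if (!rtd->dai_link->dynamic)	/* only FEs matter here */
			continue;
		if (!found) {
			nonatomic = rtd->dai_link->nonatomic;
			found = true;
		}
		/* all FEs of one card are expected to share atomicity */
		WARN_ON(rtd->dai_link->nonatomic != nonatomic);
	}
}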

> Applications will view multiple FEs as completely independent. They may
> be opened/prepared/started/stopped/paused as needed. When the BE is
> shared, then there is a need for consistency, such as starting the BE
> when the first FE becomes active and stopping it when the last FE stops.
> 
> > How is the expected lock granularity and the protection context?  Do
> > we need a card global lock/mutex, or is it specific to each BE
> > substream?
> 
> We already have a dpcm_lock at the card level, which protects the
> addition of BEs and BE states. That spin_lock is fine for most cases.
> 
> The only real problem is the trigger, which is currently completely
> unprotected: we have to serialize the BE triggers, otherwise you could
> STOP before you START due to scheduling, or other problems that I saw in
> my SoundWire tests with two START triggers, or the STOP never sent.

So it's about calling triggers to the same BE stream concurrently?
If that's the case, can't we simply protect the trigger handling of
each BE like below?

> But how to do this serialization is unclear...
> 
> A lateral thinking approach would be to decouple the BEs entirely, and
> have the FEs 'signal' their change of state. The BE 'thread' run in the
> BE context would then serialize the requests and perform all the BE
> operations, and the same approach could be chained. I am afraid that
> would be a complete rewrite of DPCM, but maybe we have to do so anyways
> if we can't support a basic case of a mixer with 2 streams :-)

Well, let's see whether we can get some improvements by a simple
change at first.


Takashi


--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -1998,6 +1998,7 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 {
 	struct snd_soc_pcm_runtime *be;
 	struct snd_soc_dpcm *dpcm;
+	unsigned long flags;
 	int ret = 0;
 
 	for_each_dpcm_be(fe, stream, dpcm) {
@@ -2006,9 +2007,11 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 		be = dpcm->be;
 		be_substream = snd_soc_dpcm_get_substream(be, stream);
 
+		snd_pcm_stream_lock_irqsave(be_substream, flags);
+
 		/* is this op for this BE ? */
 		if (!snd_soc_dpcm_be_can_update(fe, be, stream))
-			continue;
+			goto unlock;
 
 		dev_dbg(be->dev, "ASoC: trigger BE %s cmd %d\n",
 			be->dai_link->name, cmd);
@@ -2018,77 +2021,81 @@ int dpcm_be_dai_trigger(struct snd_soc_pcm_runtime *fe, int stream,
 			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
 			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
 			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
-				continue;
+				goto unlock;
 
 			ret = soc_pcm_trigger(be_substream, cmd);
 			if (ret)
-				goto end;
+				goto unlock;
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			break;
 		case SNDRV_PCM_TRIGGER_RESUME:
 			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_SUSPEND))
-				continue;
+				goto unlock;
 
 			ret = soc_pcm_trigger(be_substream, cmd);
 			if (ret)
-				goto end;
+				goto unlock;
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			break;
 		case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
 			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
-				continue;
+				goto unlock;
 
 			ret = soc_pcm_trigger(be_substream, cmd);
 			if (ret)
-				goto end;
+				goto unlock;
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_START;
 			break;
 		case SNDRV_PCM_TRIGGER_STOP:
 			if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) &&
 			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
-				continue;
+				goto unlock;
 
 			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
-				continue;
+				goto unlock;
 
 			ret = soc_pcm_trigger(be_substream, cmd);
 			if (ret)
-				goto end;
+				goto unlock;
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_STOP;
 			break;
 		case SNDRV_PCM_TRIGGER_SUSPEND:
 			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
-				continue;
+				goto unlock;
 
 			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
-				continue;
+				goto unlock;
 
 			ret = soc_pcm_trigger(be_substream, cmd);
 			if (ret)
-				goto end;
+				goto unlock;
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_SUSPEND;
 			break;
 		case SNDRV_PCM_TRIGGER_PAUSE_PUSH:
 			if (be->dpcm[stream].state != SND_SOC_DPCM_STATE_START)
-				continue;
+				goto unlock;
 
 			if (!snd_soc_dpcm_can_be_free_stop(fe, be, stream))
-				continue;
+				goto unlock;
 
 			ret = soc_pcm_trigger(be_substream, cmd);
 			if (ret)
-				goto end;
+				goto unlock;
 
 			be->dpcm[stream].state = SND_SOC_DPCM_STATE_PAUSED;
 			break;
 		}
+	unlock:
+		snd_pcm_stream_unlock_irqrestore(be_substream, flags);
+		if (ret < 0)
+			break;
 	}
-end:
+
 	if (ret < 0)
 		dev_err(fe->dev, "ASoC: %s() failed at %s (%d)\n",
 			__func__, be->dai_link->name, ret);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-07 14:59             ` Takashi Iwai
@ 2021-10-07 15:24               ` Pierre-Louis Bossart
  2021-10-07 15:44                 ` Takashi Iwai
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-07 15:24 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi



>>>> Takashi, Mark, do you think that an all/none assumption on FE nonatomic
>>>> properties would make sense?
>>>
>>> As long as BE's are updated from FE's PCM callback, BE has to follow
>>> the atomicity of its FE, so we can't assume all or none globally.
>>
>> A BE may have more than one FEs. That's precisely the point of
>> mixers/demux, and if the BE has FEs with different 'atomicity' then I
>> don't see how locking can work: the BE operations are run in the context
>> of each of its FE, typically using the following loop:
>>
>> for_each_dpcm_be(fe, stream, dpcm) {
>>    do_something();
>> }
> 
> Do we really have the cases where FEs have different atomicity?
> I don't think it's a valid configuration, and we should catch it via
> WARN_ON() or such.

I don't think we have this configuration today; that's why I suggested
making the assumption that it's an unsupported configuration.

That would allow us to use the relevant locking mechanism, as done for
PCM streams.

>> Applications will view multiple FEs as completely independent. They may
>> be opened/prepared/started/stopped/paused as needed. When the BE is
>> shared, then there is a need for consistency, such as starting the BE
>> when the first FE becomes active and stopping it when the last FE stops.
>>
>>> How is the expected lock granularity and the protection context?  Do
>>> we need a card global lock/mutex, or is it specific to each BE
>>> substream?
>>
>> We already have a dpcm_lock at the card level, which protects the
>> addition of BEs and BE states. That spin_lock is fine for most cases.
>>
>> The only real problem is the trigger, which is currently completely
>> unprotected: we have to serialize the BE triggers, otherwise you could
>> STOP before you START due to scheduling, or other problems that I saw in
>> my SoundWire tests with two START triggers, or the STOP never sent.
> 
> So it's about calling triggers to the same BE stream concurrently?
> If that's the case, can't we simply protect the trigger handling of
> each BE like below?

Using snd_pcm_stream_lock_irqsave(be_substream, flags) will indeed prevent
multiple triggers, but the state management is handled by dpcm_lock, so I
think we have to use dpcm_lock/mutex in all BE transitions.

if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
 			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))

if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
 			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))

The position of the lock also matters: we may have to take the lock
before walking through the list, since that list can be modified. That's
what Gyeongtaek Lee reported with a separate patch; I was hoping that we
could fix all BE state handling in a consistent manner.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-07 15:24               ` Pierre-Louis Bossart
@ 2021-10-07 15:44                 ` Takashi Iwai
  2021-10-07 18:13                   ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Takashi Iwai @ 2021-10-07 15:44 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

On Thu, 07 Oct 2021 17:24:45 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> 
> >>>> Takashi, Mark, do you think that an all/none assumption on FE nonatomic
> >>>> properties would make sense?
> >>>
> >>> As long as BE's are updated from FE's PCM callback, BE has to follow
> >>> the atomicity of its FE, so we can't assume all or none globally.
> >>
> >> A BE may have more than one FEs. That's precisely the point of
> >> mixers/demux, and if the BE has FEs with different 'atomicity' then I
> >> don't see how locking can work: the BE operations are run in the context
> >> of each of its FE, typically using the following loop:
> >>
> >> for_each_dpcm_be(fe, stream, dpcm) {
> >>    do_something();
> >> }
> > 
> > Do we really have the cases where FEs have different atomicity?
> > I don't think it's a valid configuration, and we should catch it via
> > WARN_ON() or such.
> 
> I don't think we have this configuration today, that's why I suggested
> making the assumption it's an unsupported configuration.
> 
> That would allow us to use the relevant locking mechanism, as done for
> PCM streams.
> 
> >> Applications will view multiple FEs as completely independent. They may
> >> be opened/prepared/started/stopped/paused as needed. When the BE is
> >> shared, then there is a need for consistency, such as starting the BE
> >> when the first FE becomes active and stopping it when the last FE stops.
> >>
> >>> How is the expected lock granularity and the protection context?  Do
> >>> we need a card global lock/mutex, or is it specific to each BE
> >>> substream?
> >>
> >> We already have a dpcm_lock at the card level, which protects the
> >> addition of BEs and BE states. That spin_lock is fine for most cases.
> >>
> >> The only real problem is the trigger, which is currently completely
> >> unprotected: we have to serialize the BE triggers, otherwise you could
> >> STOP before you START due to scheduling, or other problems that I saw in
> >> my SoundWire tests with two START triggers, or the STOP never sent.
> > 
> > So it's about calling triggers to the same BE stream concurrently?
> > If that's the case, can't we simply protect the trigger handling of
> > each BE like below?
> 
> Using snd_pcm_stream_lock_irqsave(be_substream, flags); will prevent
> multiple triggers indeed, but the state management is handled by
> dpcm_lock, so I think we have to use dpcm_lock/mutex in all BE transitions.
> 
> if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
>     (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
>  			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
> 
> if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
>     (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
>  			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))

The stream lock can be put around the appropriate places, I suppose?


> The position of the lock also matters, we may have to take the lock
> before walking through the list, since that list can be modified. that's
> what Gyeongtaek Lee reported with a separate patch, I was hoping that we
> can fix all BE state handling in a consistent manner.

The list belongs to the FE, so possibly we can take the FE's stream lock
while manipulating the list, to protect against the racy trigger access.

But beware that the stream lock would be needed only if nonatomic is
false.  In the nonatomic PCM, the stream lock is a mutex and it's used
for all PCM ops.  OTOH, in normal PCM mode, the spinlock is used for
trigger and a few others while mutex is used for prepare and co, hence
an extra stream lock is needed there.


Takashi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-07 15:44                 ` Takashi Iwai
@ 2021-10-07 18:13                   ` Pierre-Louis Bossart
  2021-10-07 21:11                     ` Takashi Iwai
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-07 18:13 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi


>> Using snd_pcm_stream_lock_irqsave(be_substream, flags); will prevent
>> multiple triggers indeed, but the state management is handled by
>> dpcm_lock, so I think we have to use dpcm_lock/mutex in all BE transitions.
>>
>> if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
>>     (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
>>  			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
>>
>> if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
>>     (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
>>  			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
> 
> The stream lock can be put around those appropriate places, I suppose?

I double-checked the code a bit more, and all functions using
be->dpcm[stream].state and be->dpcm[stream].users are protected by the
card->mutex.

The exceptions are dpcm_be_dai_trigger() and dpcm_show_state(), so we
probably don't need to worry too much about these fields.

I am more nervous about what the dpcm_lock was supposed to protect. It
was added in "ASoC: dpcm: prevent snd_soc_dpcm use after free" to solve,
according to the commit log, a race condition between

void dpcm_be_disconnect(
    	...
    	list_del(&dpcm->list_be);
    	list_del(&dpcm->list_fe);
    	kfree(dpcm);
    	...

and
    	for_each_dpcm_fe()
    	for_each_dpcm_be*()

That would suggest that every one of those loops should be protected,
but that's not the case at all. In some cases the spinlock is taken
*inside* of the loops.
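
To be concrete, the protection implied by that commit message would look
roughly like the sketch below, taking the existing card-level spinlock
around the whole walk; do_something() stands for whatever the loop body
does and 'flags' is a local unsigned long:

	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
	for_each_dpcm_be(fe, stream, dpcm)
		do_something(dpcm);	/* the list cannot change under us here */
	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);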

I think this is what explains the problem reported by Gyeongtaek Lee in

https://lore.kernel.org/alsa-devel/002f01d7b4f5$c030f4a0$4092dde0$@samsung.com/

the for_each_dpcm_be() loop in dpcm_be_dai_trigger() is NOT protected.

But if we add a spin-lock in there, the atomicity remains a problem.

I think the only solution is to follow the example of the PCM case,
where the type of lock depends on the FE types, with the assumption that
there are no mixed atomic/non-atomic FE configurations.
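
A possible sketch of what 'following the PCM example' could look like at
the DPCM level is below. Note the assumptions: the card would carry both a
dpcm_mutex and the dpcm_lock spinlock (which neither mainline nor this
patchset does today), and all FEs of a card share the same 'nonatomic'
setting:

static void dpcm_card_lock(struct snd_soc_pcm_runtime *fe)
{
	if (fe->dai_link->nonatomic)
		mutex_lock(&fe->card->dpcm_mutex);	/* may sleep */
	else
		spin_lock(&fe->card->dpcm_lock);	/* atomic trigger path */
}

static void dpcm_card_unlock(struct snd_soc_pcm_runtime *fe)
{
	if (fe->dai_link->nonatomic)
		mutex_unlock(&fe->card->dpcm_mutex);
	else
		spin_unlock(&fe->card->dpcm_lock);
}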

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-07 18:13                   ` Pierre-Louis Bossart
@ 2021-10-07 21:11                     ` Takashi Iwai
  2021-10-07 21:27                       ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Takashi Iwai @ 2021-10-07 21:11 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

On Thu, 07 Oct 2021 20:13:22 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> >> Using snd_pcm_stream_lock_irqsave(be_substream, flags); will prevent
> >> multiple triggers indeed, but the state management is handled by
> >> dpcm_lock, so I think we have to use dpcm_lock/mutex in all BE transitions.
> >>
> >> if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
> >>     (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
> >>  			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
> >>
> >> if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_PREPARE) &&
> >>     (be->dpcm[stream].state != SND_SOC_DPCM_STATE_STOP) &&
> >>  			    (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
> > 
> > The stream lock can be put around those appropriate places, I suppose?
> 
> I doubled checked the code a bit more, and all functions using
> be->dpcm[stream].state and be->dpcm[stream].users are protected by the
> card->mutex.
> 
> The exceptions are dpcm_be_dai_trigger() and dpcm_show_state() so we
> probably don't need to worry too much about these fields.
> 
> I am more nervous about that the dpcm_lock was supposed to protect. It
> was added in "ASoC: dpcm: prevent snd_soc_dpcm use after free" to solve
> a race condition, according to the commit log between
> 
> void dpcm_be_disconnect(
>     	...
>     	list_del(&dpcm->list_be);
>     	list_del(&dpcm->list_fe);
>     	kfree(dpcm);
>     	...
> 
> and
>     	for_each_dpcm_fe()
>     	for_each_dpcm_be*()
> 
> That would suggest that every one of those loops should be protected,
> but that's not the case at all. In some cases the spinlock is taken
> *inside* of the loops.
> 
> I think this is what explains the problem reported by Gyeongtaek Lee in
> 
> https://lore.kernel.org/alsa-devel/002f01d7b4f5$c030f4a0$4092dde0$@samsung.com/
> 
> the for_each_dpcm_be() loop in dpcm_be_dai_trigger() is NOT protected.
> 
> But if we add a spin-lock in there, the atomicity remains a problem.
> 
> I think the only solution is to follow the example of the PCM case,
> where the type of lock depends on the FE types, with the assumption that
> there are no mixed atomic/non-atomic FE configurations.

Yes, and I guess we can simply replace all those card->dpcm_lock uses with
the FE's stream lock.  It'll solve the atomicity problem, too, and the FE
stream lock can be applied outside the loop of dpcm_be_disconnect()
gracefully.

And, this should solve the race with dpcm_be_dai_trigger() as well,
because it's called from dpcm_fe_dai_trigger(), which is already called
inside the FE's stream lock held by the PCM core.  A PoC is something like
below.  (I replaced the superfluous *_irqsave with *_irq there.)

The scenario above (using the FE stream lock) has one missing piece,
though: the compress API seems to be using DPCM too, and this might not
work there.  It needs more verification.


BTW, looking through the current code, I wonder how the
snd_pcm_stream_lock_irq() call in dpcm_set_fe_update_state() doesn't
deadlock when nonatomic=true.  The function may be called from
dpcm_fe_dai_prepare(), which is a PCM prepare callback for the FE.  And a
PCM prepare callback is always called inside the stream mutex, which
is *the* stream lock in the case of nonatomic mode.
Maybe I'm overlooking something obvious...


thanks,

Takashi

---
diff --git a/include/sound/soc.h b/include/sound/soc.h
index 8e6dd8a257c5..5872a8864f3b 100644
--- a/include/sound/soc.h
+++ b/include/sound/soc.h
@@ -893,8 +893,6 @@ struct snd_soc_card {
 	struct mutex pcm_mutex;
 	enum snd_soc_pcm_subclass pcm_subclass;
 
-	spinlock_t dpcm_lock;
-
 	int (*probe)(struct snd_soc_card *card);
 	int (*late_probe)(struct snd_soc_card *card);
 	int (*remove)(struct snd_soc_card *card);
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index c830e96afba2..51ef9917ca98 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -2339,7 +2339,6 @@ int snd_soc_register_card(struct snd_soc_card *card)
 	mutex_init(&card->mutex);
 	mutex_init(&card->dapm_mutex);
 	mutex_init(&card->pcm_mutex);
-	spin_lock_init(&card->dpcm_lock);
 
 	return snd_soc_bind_card(card);
 }
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 48f71bb81a2f..c171e5f431b9 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -38,6 +38,15 @@ static inline const char *soc_codec_dai_name(struct snd_soc_pcm_runtime *rtd)
 	return (rtd)->num_codecs == 1 ? asoc_rtd_to_codec(rtd, 0)->name : "multicodec";
 }
 
+#define dpcm_fe_stream_lock_irq(fe, stream) \
+	snd_pcm_stream_lock_irq(snd_soc_dpcm_get_substream(fe, stream))
+#define dpcm_fe_stream_unlock_irq(fe, stream) \
+	snd_pcm_stream_unlock_irq(snd_soc_dpcm_get_substream(fe, stream))
+#define dpcm_fe_stream_lock_irqsave(fe, stream, flags)			\
+	snd_pcm_stream_lock_irq(snd_soc_dpcm_get_substream(fe, stream), flags)
+#define dpcm_fe_stream_unlock_irqrestore(fe, stream, flags)		\
+	snd_pcm_stream_unlock_irq(snd_soc_dpcm_get_substream(fe, stream), flags)
+
 #ifdef CONFIG_DEBUG_FS
 static const char *dpcm_state_string(enum snd_soc_dpcm_state state)
 {
@@ -73,7 +82,6 @@ static ssize_t dpcm_show_state(struct snd_soc_pcm_runtime *fe,
 	struct snd_pcm_hw_params *params = &fe->dpcm[stream].hw_params;
 	struct snd_soc_dpcm *dpcm;
 	ssize_t offset = 0;
-	unsigned long flags;
 
 	/* FE state */
 	offset += scnprintf(buf + offset, size - offset,
@@ -101,7 +109,7 @@ static ssize_t dpcm_show_state(struct snd_soc_pcm_runtime *fe,
 		goto out;
 	}
 
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	dpcm_fe_stream_lock_irq(fe, stream);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 		params = &dpcm->hw_params;
@@ -122,7 +130,7 @@ static ssize_t dpcm_show_state(struct snd_soc_pcm_runtime *fe,
 					   params_channels(params),
 					   params_rate(params));
 	}
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	dpcm_fe_stream_unlock_irq(fe, stream);
 out:
 	return offset;
 }
@@ -1129,7 +1137,6 @@ static int dpcm_be_connect(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
 {
 	struct snd_soc_dpcm *dpcm;
-	unsigned long flags;
 
 	/* only add new dpcms */
 	for_each_dpcm_be(fe, stream, dpcm) {
@@ -1141,14 +1148,14 @@ static int dpcm_be_connect(struct snd_soc_pcm_runtime *fe,
 	if (!dpcm)
 		return -ENOMEM;
 
+	dpcm_fe_stream_lock_irq(fe, stream);
 	dpcm->be = be;
 	dpcm->fe = fe;
 	be->dpcm[stream].runtime = fe->dpcm[stream].runtime;
 	dpcm->state = SND_SOC_DPCM_LINK_STATE_NEW;
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 	list_add(&dpcm->list_be, &fe->dpcm[stream].be_clients);
 	list_add(&dpcm->list_fe, &be->dpcm[stream].fe_clients);
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	dpcm_fe_stream_unlock_irq(fe, stream);
 
 	dev_dbg(fe->dev, "connected new DPCM %s path %s %s %s\n",
 			stream ? "capture" : "playback",  fe->dai_link->name,
@@ -1191,8 +1198,8 @@ static void dpcm_be_reparent(struct snd_soc_pcm_runtime *fe,
 void dpcm_be_disconnect(struct snd_soc_pcm_runtime *fe, int stream)
 {
 	struct snd_soc_dpcm *dpcm, *d;
-	unsigned long flags;
 
+	dpcm_fe_stream_lock_irq(fe, stream);
 	for_each_dpcm_be_safe(fe, stream, dpcm, d) {
 		dev_dbg(fe->dev, "ASoC: BE %s disconnect check for %s\n",
 				stream ? "capture" : "playback",
@@ -1210,12 +1217,11 @@ void dpcm_be_disconnect(struct snd_soc_pcm_runtime *fe, int stream)
 
 		dpcm_remove_debugfs_state(dpcm);
 
-		spin_lock_irqsave(&fe->card->dpcm_lock, flags);
 		list_del(&dpcm->list_be);
 		list_del(&dpcm->list_fe);
-		spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
 		kfree(dpcm);
 	}
+	dpcm_fe_stream_unlock_irq(fe, stream);
 }
 
 /* get BE for DAI widget and stream */
@@ -1429,12 +1435,11 @@ int dpcm_process_paths(struct snd_soc_pcm_runtime *fe,
 void dpcm_clear_pending_state(struct snd_soc_pcm_runtime *fe, int stream)
 {
 	struct snd_soc_dpcm *dpcm;
-	unsigned long flags;
 
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	dpcm_fe_stream_lock_irq(fe, stream);
 	for_each_dpcm_be(fe, stream, dpcm)
 		dpcm_set_be_update_state(dpcm->be, stream, SND_SOC_DPCM_UPDATE_NO);
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	dpcm_fe_stream_unlock_irq(fe, stream);
 }
 
 void dpcm_be_dai_stop(struct snd_soc_pcm_runtime *fe, int stream,
@@ -2352,7 +2357,6 @@ static int dpcm_run_update_startup(struct snd_soc_pcm_runtime *fe, int stream)
 	struct snd_soc_dpcm *dpcm;
 	enum snd_soc_dpcm_trigger trigger = fe->dai_link->trigger[stream];
 	int ret = 0;
-	unsigned long flags;
 
 	dev_dbg(fe->dev, "ASoC: runtime %s open on FE %s\n",
 			stream ? "capture" : "playback", fe->dai_link->name);
@@ -2421,7 +2425,7 @@ static int dpcm_run_update_startup(struct snd_soc_pcm_runtime *fe, int stream)
 	dpcm_be_dai_shutdown(fe, stream);
 disconnect:
 	/* disconnect any pending BEs */
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	dpcm_fe_stream_lock_irq(fe, stream);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 
@@ -2433,7 +2437,7 @@ static int dpcm_run_update_startup(struct snd_soc_pcm_runtime *fe, int stream)
 			be->dpcm[stream].state == SND_SOC_DPCM_STATE_NEW)
 				dpcm->state = SND_SOC_DPCM_LINK_STATE_FREE;
 	}
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	dpcm_fe_stream_unlock_irq(fe, stream);
 
 	if (ret < 0)
 		dev_err(fe->dev, "ASoC: %s() failed (%d)\n", __func__, ret);
@@ -2843,10 +2847,9 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 	struct snd_soc_dpcm *dpcm;
 	int state;
 	int ret = 1;
-	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	dpcm_fe_stream_lock_irq(fe, stream);
 	for_each_dpcm_fe(be, stream, dpcm) {
 
 		if (dpcm->fe == fe)
@@ -2860,7 +2863,7 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 			}
 		}
 	}
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	dpcm_fe_stream_unlock_irq(fe, stream);
 
 	/* it's safe to do this BE DAI */
 	return ret;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-07 21:11                     ` Takashi Iwai
@ 2021-10-07 21:27                       ` Pierre-Louis Bossart
  2021-10-08  6:13                         ` Takashi Iwai
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-07 21:27 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi



>> I think the only solution is to follow the example of the PCM case,
>> where the type of lock depends on the FE types, with the assumption that
>> there are no mixed atomic/non-atomic FE configurations.
> 
> Yes, and I guess we can simply replace those all card->dpcm_lock with
> FE's stream lock.  It'll solve the atomicity problem, too, and the FE
> stream lock can be applied outside the loop of dpcm_be_disconnect()
> gracefully.
> 
> And, this should solve the race with dpcm_be_dai_trigger() as well,
> because it's called from dpcm_fe_dai_trigger() that is called already
> inside the FE's stream lock held by PCM core.  A PoC is something like
> below.  (I replaced the superfluous *_irqsave with *_irq there)

No, I don't think so. The code starts from an FE and loops over all the
BEs connected to that FE, but we want to serialize at the BE level! We
really need a dpcm lock at the card level, not at the FE/stream level.

I am testing a quick set of changes at
https://github.com/thesofproject/linux/pull/3201, where I also replaced
irqsave with irq, btw.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-07 21:27                       ` Pierre-Louis Bossart
@ 2021-10-08  6:13                         ` Takashi Iwai
  2021-10-08 14:41                           ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Takashi Iwai @ 2021-10-08  6:13 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

On Thu, 07 Oct 2021 23:27:50 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> 
> >> I think the only solution is to follow the example of the PCM case,
> >> where the type of lock depends on the FE types, with the assumption that
> >> there are no mixed atomic/non-atomic FE configurations.
> > 
> > Yes, and I guess we can simply replace those all card->dpcm_lock with
> > FE's stream lock.  It'll solve the atomicity problem, too, and the FE
> > stream lock can be applied outside the loop of dpcm_be_disconnect()
> > gracefully.
> > 
> > And, this should solve the race with dpcm_be_dai_trigger() as well,
> > because it's called from dpcm_fe_dai_trigger() that is called already
> > inside the FE's stream lock held by PCM core.  A PoC is something like
> > below.  (I replaced the superfluous *_irqsave with *_irq there)
> 
> No I don't think so. The code starts from an FE and loops for all the
> BEs connected to that FE, but we want to serialize at the BE level! we
> really need a dpcm lock at the card level, not the FE/stream level.

The FE lock prevents the race between dpcm_be_dai_trigger() and
dpcm_be_disconnect(), i.e. the problem Gyeongtaek showed.

The race among concurrent dpcm_be_dai_trigger() calls itself can be
addressed by BE stream locks as suggested earlier.


Takashi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-08  6:13                         ` Takashi Iwai
@ 2021-10-08 14:41                           ` Pierre-Louis Bossart
  2021-10-08 14:51                             ` Takashi Iwai
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-08 14:41 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi



>>>> I think the only solution is to follow the example of the PCM case,
>>>> where the type of lock depends on the FE types, with the assumption that
>>>> there are no mixed atomic/non-atomic FE configurations.
>>>
>>> Yes, and I guess we can simply replace those all card->dpcm_lock with
>>> FE's stream lock.  It'll solve the atomicity problem, too, and the FE
>>> stream lock can be applied outside the loop of dpcm_be_disconnect()
>>> gracefully.
>>>
>>> And, this should solve the race with dpcm_be_dai_trigger() as well,
>>> because it's called from dpcm_fe_dai_trigger() that is called already
>>> inside the FE's stream lock held by PCM core.  A PoC is something like
>>> below.  (I replaced the superfluous *_irqsave with *_irq there)
>>
>> No I don't think so. The code starts from an FE and loops for all the
>> BEs connected to that FE, but we want to serialize at the BE level! we
>> really need a dpcm lock at the card level, not the FE/stream level.
> 
> The FE lock prevents the race between dpcm_be_dai_trigger() and
> dpcm_be_disconnect(), i.e. the problem Gyeongtaek showed.
> 
> The race among concurrent dpcm_be_dai_trigger() calls itself can be
> addressed by BE stream locks as suggested earlier.

I am not following your line of thought, Takashi.

dpcm_be_disconnect already uses a spin_lock around

list_del(&dpcm->list_be);
list_del(&dpcm->list_fe);

and in some other places, are you suggesting we change those to the FE lock?

Otherwise, I understood your proposal as using three locks (existing
spinlock, FE lock, BE lock) to deal with DPCM. If the existing spinlock
and FE lock are combined, we'd still have two locks.

I was suggesting we use only one ;-)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-08 14:41                           ` Pierre-Louis Bossart
@ 2021-10-08 14:51                             ` Takashi Iwai
  2021-10-08 15:41                               ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Takashi Iwai @ 2021-10-08 14:51 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

On Fri, 08 Oct 2021 16:41:59 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> 
> >>>> I think the only solution is to follow the example of the PCM case,
> >>>> where the type of lock depends on the FE types, with the assumption that
> >>>> there are no mixed atomic/non-atomic FE configurations.
> >>>
> >>> Yes, and I guess we can simply replace those all card->dpcm_lock with
> >>> FE's stream lock.  It'll solve the atomicity problem, too, and the FE
> >>> stream lock can be applied outside the loop of dpcm_be_disconnect()
> >>> gracefully.
> >>>
> >>> And, this should solve the race with dpcm_be_dai_trigger() as well,
> >>> because it's called from dpcm_fe_dai_trigger() that is called already
> >>> inside the FE's stream lock held by PCM core.  A PoC is something like
> >>> below.  (I replaced the superfluous *_irqsave with *_irq there)
> >>
> >> No I don't think so. The code starts from an FE and loops for all the
> >> BEs connected to that FE, but we want to serialize at the BE level! we
> >> really need a dpcm lock at the card level, not the FE/stream level.
> > 
> > The FE lock prevents the race between dpcm_be_dai_trigger() and
> > dpcm_be_disconnect(), i.e. the problem Gyeongtaek showed.
> > 
> > The race among concurrent dpcm_be_dai_trigger() calls itself can be
> > addressed by BE stream locks as suggested earlier.
> 
> I am not following your line of thought Takashi.
> 
> dpcm_be_disconnect already uses a spin_lock around
> 
> list_del(&dpcm->list_be);
> list_del(&dpcm->list_fe);
> 
> and in some other places, are you suggesting we change those to the FE lock?

Basically yes.

> Otherwise, I understood your proposal as using three locks (existing
> spinlock, FE lock, BE lock) to deal with DPCM. If the existing spinlock
> and FE lock are combined, we'd still have two locks.

Stream locks are more fine-grained, hence more efficient :)
The card-level spinlock is superfluous and it can go away.

> I was suggesting we use only one ;-)

Basically we need to protect two things:
- The BE links
- The concurrent accesses to BEs
The former belongs to each FE that holds the links, and the FE stream
lock would cover it.  The latter is rather a per-BE business.

An oft-seen risk of multiple locks is deadlocking, but in this case,
as long as we keep the lock order FE->BE, nothing wrong can happen.
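
As a rough sketch of that FE -> BE nesting, assuming atomic
(spinlock-based) streams and using only existing helpers from soc-pcm.c;
the function name and its placement are hypothetical, this is not part of
any posted patch:

static void dpcm_walk_bes_locked(struct snd_soc_pcm_runtime *fe, int stream)
{
        struct snd_soc_dpcm *dpcm;

        /* FE stream lock protects this FE's list of BE links */
        snd_pcm_stream_lock_irq(snd_soc_dpcm_get_substream(fe, stream));
        for_each_dpcm_be(fe, stream, dpcm) {
                struct snd_pcm_substream *be_substream =
                        snd_soc_dpcm_get_substream(dpcm->be, stream);

                /* BE stream lock serializes concurrent users of this BE */
                snd_pcm_stream_lock(be_substream);
                /* ... per-BE state updates would go here ... */
                snd_pcm_stream_unlock(be_substream);
        }
        snd_pcm_stream_unlock_irq(snd_soc_dpcm_get_substream(fe, stream));
}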


Takashi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-08 14:51                             ` Takashi Iwai
@ 2021-10-08 15:41                               ` Pierre-Louis Bossart
  2021-10-08 19:09                                 ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-08 15:41 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi


>> dpcm_be_disconnect already uses a spin_lock around
>>
>> list_del(&dpcm->list_be);
>> list_del(&dpcm->list_fe);
>>
>> and in some other places, are you suggesting we change those to the FE lock?
> 
> Basically yes.
> 
>> Otherwise, I understood your proposal as using three locks (existing
>> spinlock, FE lock, BE lock) to deal with DPCM. If the existing spinlock
>> and FE lock are combined, we'd still have two locks.
> 
> Stream locks are more fine-grained, hence more efficient :)
> The card-level spinlock is superfluous and it can go away.
> 
>> I was suggesting we use only one ;-)
> 
> Basically we need to protect two things:
> - The BE links
> - The concurrent accesses to BEs
> The former belongs to each FE that holds the links, and the FE stream
> lock would cover.  The latter is rather a per-BE business.
> 
> An oft-seen risk of multiple locks is deadlocking, but in this case,
> as long as we keep the lock order FE->BE, nothing wrong can happen.

famous last words  "nothing wrong can happen." :-)

I already added a helper to do this FE lock, so I can easily replace the
implementation to remove the spin_lock and use the FE PCM lock instead.
We might even add the lock in the definition of for_each_dpcm_be() to
avoid misses.

Let me try this out today, thanks for the suggestions.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-08 15:41                               ` Pierre-Louis Bossart
@ 2021-10-08 19:09                                 ` Pierre-Louis Bossart
  2021-10-11 20:06                                   ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-08 19:09 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

[-- Attachment #1: Type: text/plain, Size: 1591 bytes --]


>> Basically we need to protect two things:
>> - The BE links
>> - The concurrent accesses to BEs
>> The former belongs to each FE that holds the links, and the FE stream
>> lock would cover.  The latter is rather a per-BE business.
>>
>> An oft-seen risk of multiple locks is deadlocking, but in this case,
>> as long as we keep the lock order FE->BE, nothing wrong can happen.
> 
> famous last words  "nothing wrong can happen." :-)
> 
> I already added a helper to do this FE lock, I can easily replace the
> implementation to remove the spin_lock and use the FE PCM lock.
> we might even add the lock in the definition of for_each_dpcm_be() to
> avoid misses.
> 
> Let me try this out today, thanks for the suggestions.

well, it's not successful at all...

When I replace the existing dpcm_lock with the FE PCM lock as you
suggested, without any additional changes, speaker-test produces valid
audio on the endpoints, but if I try a Ctrl-C or limit the number of
loops with the '-l' option, I hear an endless loop on the same buffer
and I have to power cycle my test device to stop the sound.

See the 2 patches attached: the first patch, with the introduction of the
helper, works fine; the second, with the use of the FE PCM lock, doesn't.
In hindsight I am glad I worked on minimal patches, one after the other,
to identify problems.

And when I add the BE lock, then nothing happens. Device stuck and no
audio...

There must be something we're missing related to the locking...

My work version is at
https://github.com/plbossart/sound/tree/fix/dpcm-lock5 if anyone wants
to take a look.


[-- Attachment #2: 0002-ASoC-soc-pcm-remove-dpcm-spin_lock-use-PCM-stream-lo.patch --]
[-- Type: text/x-patch, Size: 2147 bytes --]

From f0a7068b50057eb918821dbcda6d448f49f5f1c4 Mon Sep 17 00:00:00 2001
From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Date: Thu, 7 Oct 2021 15:35:23 -0500
Subject: [PATCH 2/2] ASoC: soc-pcm: remove dpcm spin_lock, use PCM stream lock

There is no need for a DPCM-specific lock at the card level; we can
use the FE-specific PCM lock instead. In addition, this PCM lock will
rely on either a spinlock or a mutex depending on atomicity.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 include/sound/soc.h  | 2 --
 sound/soc/soc-core.c | 1 -
 sound/soc/soc-pcm.c  | 4 ++--
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/sound/soc.h b/include/sound/soc.h
index 8e6dd8a257c5..5872a8864f3b 100644
--- a/include/sound/soc.h
+++ b/include/sound/soc.h
@@ -893,8 +893,6 @@ struct snd_soc_card {
 	struct mutex pcm_mutex;
 	enum snd_soc_pcm_subclass pcm_subclass;
 
-	spinlock_t dpcm_lock;
-
 	int (*probe)(struct snd_soc_card *card);
 	int (*late_probe)(struct snd_soc_card *card);
 	int (*remove)(struct snd_soc_card *card);
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index c830e96afba2..51ef9917ca98 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -2339,7 +2339,6 @@ int snd_soc_register_card(struct snd_soc_card *card)
 	mutex_init(&card->mutex);
 	mutex_init(&card->dapm_mutex);
 	mutex_init(&card->pcm_mutex);
-	spin_lock_init(&card->dpcm_lock);
 
 	return snd_soc_bind_card(card);
 }
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 52851827d53f..c1dbd8633587 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -31,13 +31,13 @@
 
 void snd_soc_dpcm_fe_lock_irq(struct snd_soc_pcm_runtime *fe, int stream)
 {
-	spin_lock_irq(&fe->card->dpcm_lock);
+	snd_pcm_stream_lock_irq(snd_soc_dpcm_get_substream(fe, stream));
 }
 EXPORT_SYMBOL_GPL(snd_soc_dpcm_fe_lock_irq);
 
 void snd_soc_dpcm_fe_unlock_irq(struct snd_soc_pcm_runtime *fe, int stream)
 {
-	spin_unlock_irq(&fe->card->dpcm_lock);
+	snd_pcm_stream_unlock_irq(snd_soc_dpcm_get_substream(fe, stream));
 }
 EXPORT_SYMBOL_GPL(snd_soc_dpcm_fe_unlock_irq);
 
-- 
2.25.1


[-- Attachment #3: 0001-ASoC-soc-pcm-introduce-snd_soc_dpcm_fe_lock_irq-unlo.patch --]
[-- Type: text/x-patch, Size: 7038 bytes --]

From c35ccac18235f6f50e2d0e9fb6167612dcc753f7 Mon Sep 17 00:00:00 2001
From: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Date: Wed, 29 Sep 2021 11:30:19 -0500
Subject: [PATCH 1/2] ASoC: soc-pcm: introduce
 snd_soc_dpcm_fe_lock_irq/unlock_irq()

In preparation for more changes, add two new helpers to gradually
modify the DPCM locks.

Since DPCM functions are not used from interrupt handlers, the plain
lock_irq variant is sufficient.

While most of the uses of DPCM are internal to soc-pcm.c, some drivers
in soc/fsl and soc/sh do make use of DPCM-related loops that will
require protection; adding EXPORT_SYMBOL_GPL() is needed for those
drivers.

The stream argument is unused in this patch but will be used in
follow-up patches.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
---
 include/sound/soc-dpcm.h |  3 +++
 sound/soc/soc-pcm.c      | 42 +++++++++++++++++++++++-----------------
 2 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/include/sound/soc-dpcm.h b/include/sound/soc-dpcm.h
index 9c00118603e7..8ed40b8f3da8 100644
--- a/include/sound/soc-dpcm.h
+++ b/include/sound/soc-dpcm.h
@@ -151,4 +151,7 @@ bool dpcm_end_walk_at_be(struct snd_soc_dapm_widget *widget, enum snd_soc_dapm_d
 #define dpcm_be_dai_startup_unwind(fe, stream)	dpcm_be_dai_stop(fe, stream, 0, NULL)
 #define dpcm_be_dai_shutdown(fe, stream)	dpcm_be_dai_stop(fe, stream, 1, NULL)
 
+void snd_soc_dpcm_fe_lock_irq(struct snd_soc_pcm_runtime *fe, int stream);
+void snd_soc_dpcm_fe_unlock_irq(struct snd_soc_pcm_runtime *fe, int stream);
+
 #endif
diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c
index 19539300d94d..52851827d53f 100644
--- a/sound/soc/soc-pcm.c
+++ b/sound/soc/soc-pcm.c
@@ -29,6 +29,18 @@
 
 #define DPCM_MAX_BE_USERS	8
 
+void snd_soc_dpcm_fe_lock_irq(struct snd_soc_pcm_runtime *fe, int stream)
+{
+	spin_lock_irq(&fe->card->dpcm_lock);
+}
+EXPORT_SYMBOL_GPL(snd_soc_dpcm_fe_lock_irq);
+
+void snd_soc_dpcm_fe_unlock_irq(struct snd_soc_pcm_runtime *fe, int stream)
+{
+	spin_unlock_irq(&fe->card->dpcm_lock);
+}
+EXPORT_SYMBOL_GPL(snd_soc_dpcm_fe_unlock_irq);
+
 /* can this BE stop and free */
 static int snd_soc_dpcm_can_be_free_stop(struct snd_soc_pcm_runtime *fe,
 					 struct snd_soc_pcm_runtime *be, int stream);
@@ -85,7 +97,6 @@ static ssize_t dpcm_show_state(struct snd_soc_pcm_runtime *fe,
 	struct snd_pcm_hw_params *params = &fe->dpcm[stream].hw_params;
 	struct snd_soc_dpcm *dpcm;
 	ssize_t offset = 0;
-	unsigned long flags;
 
 	/* FE state */
 	offset += scnprintf(buf + offset, size - offset,
@@ -113,7 +124,7 @@ static ssize_t dpcm_show_state(struct snd_soc_pcm_runtime *fe,
 		goto out;
 	}
 
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_lock_irq(fe, stream);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 		params = &dpcm->hw_params;
@@ -134,7 +145,7 @@ static ssize_t dpcm_show_state(struct snd_soc_pcm_runtime *fe,
 					   params_channels(params),
 					   params_rate(params));
 	}
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_unlock_irq(fe, stream);
 out:
 	return offset;
 }
@@ -1141,7 +1152,6 @@ static int dpcm_be_connect(struct snd_soc_pcm_runtime *fe,
 		struct snd_soc_pcm_runtime *be, int stream)
 {
 	struct snd_soc_dpcm *dpcm;
-	unsigned long flags;
 
 	/* only add new dpcms */
 	for_each_dpcm_be(fe, stream, dpcm) {
@@ -1157,10 +1167,10 @@ static int dpcm_be_connect(struct snd_soc_pcm_runtime *fe,
 	dpcm->fe = fe;
 	be->dpcm[stream].runtime = fe->dpcm[stream].runtime;
 	dpcm->state = SND_SOC_DPCM_LINK_STATE_NEW;
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_lock_irq(fe, stream);
 	list_add(&dpcm->list_be, &fe->dpcm[stream].be_clients);
 	list_add(&dpcm->list_fe, &be->dpcm[stream].fe_clients);
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_unlock_irq(fe, stream);
 
 	dev_dbg(fe->dev, "connected new DPCM %s path %s %s %s\n",
 			stream ? "capture" : "playback",  fe->dai_link->name,
@@ -1203,7 +1213,6 @@ static void dpcm_be_reparent(struct snd_soc_pcm_runtime *fe,
 void dpcm_be_disconnect(struct snd_soc_pcm_runtime *fe, int stream)
 {
 	struct snd_soc_dpcm *dpcm, *d;
-	unsigned long flags;
 
 	for_each_dpcm_be_safe(fe, stream, dpcm, d) {
 		dev_dbg(fe->dev, "ASoC: BE %s disconnect check for %s\n",
@@ -1222,10 +1231,10 @@ void dpcm_be_disconnect(struct snd_soc_pcm_runtime *fe, int stream)
 
 		dpcm_remove_debugfs_state(dpcm);
 
-		spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+		snd_soc_dpcm_fe_lock_irq(fe, stream);
 		list_del(&dpcm->list_be);
 		list_del(&dpcm->list_fe);
-		spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+		snd_soc_dpcm_fe_unlock_irq(fe, stream);
 		kfree(dpcm);
 	}
 }
@@ -1451,12 +1460,11 @@ int dpcm_process_paths(struct snd_soc_pcm_runtime *fe,
 void dpcm_clear_pending_state(struct snd_soc_pcm_runtime *fe, int stream)
 {
 	struct snd_soc_dpcm *dpcm;
-	unsigned long flags;
 
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_lock_irq(fe, stream);
 	for_each_dpcm_be(fe, stream, dpcm)
 		dpcm_set_be_update_state(dpcm->be, stream, SND_SOC_DPCM_UPDATE_NO);
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_unlock_irq(fe, stream);
 }
 
 void dpcm_be_dai_stop(struct snd_soc_pcm_runtime *fe, int stream,
@@ -2374,7 +2382,6 @@ static int dpcm_run_update_startup(struct snd_soc_pcm_runtime *fe, int stream)
 	struct snd_soc_dpcm *dpcm;
 	enum snd_soc_dpcm_trigger trigger = fe->dai_link->trigger[stream];
 	int ret = 0;
-	unsigned long flags;
 
 	dev_dbg(fe->dev, "ASoC: runtime %s open on FE %s\n",
 			stream ? "capture" : "playback", fe->dai_link->name);
@@ -2443,7 +2450,7 @@ static int dpcm_run_update_startup(struct snd_soc_pcm_runtime *fe, int stream)
 	dpcm_be_dai_shutdown(fe, stream);
 disconnect:
 	/* disconnect any pending BEs */
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_lock_irq(fe, stream);
 	for_each_dpcm_be(fe, stream, dpcm) {
 		struct snd_soc_pcm_runtime *be = dpcm->be;
 
@@ -2455,7 +2462,7 @@ static int dpcm_run_update_startup(struct snd_soc_pcm_runtime *fe, int stream)
 			be->dpcm[stream].state == SND_SOC_DPCM_STATE_NEW)
 				dpcm->state = SND_SOC_DPCM_LINK_STATE_FREE;
 	}
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_unlock_irq(fe, stream);
 
 	if (ret < 0)
 		dev_err(fe->dev, "ASoC: %s() failed (%d)\n", __func__, ret);
@@ -2855,10 +2862,9 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 	struct snd_soc_dpcm *dpcm;
 	int state;
 	int ret = 1;
-	unsigned long flags;
 	int i;
 
-	spin_lock_irqsave(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_lock_irq(fe, stream);
 	for_each_dpcm_fe(be, stream, dpcm) {
 
 		if (dpcm->fe == fe)
@@ -2872,7 +2878,7 @@ static int snd_soc_dpcm_check_state(struct snd_soc_pcm_runtime *fe,
 			}
 		}
 	}
-	spin_unlock_irqrestore(&fe->card->dpcm_lock, flags);
+	snd_soc_dpcm_fe_unlock_irq(fe, stream);
 
 	/* it's safe to do this BE DAI */
 	return ret;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-08 19:09                                 ` Pierre-Louis Bossart
@ 2021-10-11 20:06                                   ` Pierre-Louis Bossart
  2021-10-12  6:34                                     ` Takashi Iwai
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-11 20:06 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi


>>> Basically we need to protect two things:
>>> - The BE links
>>> - The concurrent accesses to BEs
>>> The former belongs to each FE that holds the links, and the FE stream
>>> lock would cover.  The latter is rather a per-BE business.
>>>
>>> An oft-seen risk of multiple locks is deadlocking, but in this case,
>>> as long as we keep the lock order FE->BE, nothing wrong can happen.
>>
>> famous last words  "nothing wrong can happen." :-)
>>
>> I already added a helper to do this FE lock, I can easily replace the
>> implementation to remove the spin_lock and use the FE PCM lock.
>> we might even add the lock in the definition of for_each_dpcm_be() to
>> avoid misses.
>>
>> Let me try this out today, thanks for the suggestions.
> 
> well, it's not successful at all...
> 
> When I replace the existing dpcm_lock with the FE PCM lock as you
> suggested, without any additional changes, speaker-test produces valid
> audio on the endpoints, but if I try a Ctrl-C or limit the number of
> loops with the '-l' option, I hear an endless loop on the same buffer
> and I have to power cycle my test device to stop the sound.
> 
> See 2 patches attached, the first patch with the introduction of the
> helper works fine, the second with the use of the FE PCM lock doesn't.
> In hindsight I am glad I worked on minimal patches, one after the other,
> to identify problems.
> 
> And when I add the BE lock, then nothing happens. Device stuck and no
> audio...
> 
> There must be something we're missing related to the locking...

And indeed there's a deadlock!

snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call
snd_pcm_trigger. So if we also take the pcm stream lock in the BE
trigger, there's a conceptual deadlock: we painted ourselves in a corner
by using the same lock twice.

Takashi, are you really sure we should protect these for_each_dpcm_be()
loops with the same pcm lock? It seems like asking for trouble to
revisit the ALSA core just to walk through a list of BEs. Would you
object to changing dpcm_lock as I suggested, but not interfering with
stream handling?

See the traces below based on the hack in
https://github.com/plbossart/sound/tree/test/dpcm-lock-loop-on-stop

[   67.892082] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_sof_pcm_period_elapsed_work: plb taking lock
[   67.892088] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_sof_pcm_period_elapsed_work: plb taking lock - done
[   67.892092] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_period_elapsed_under_stream_lock: start
[   67.892096] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_period_elapsed_under_stream_lock: check running
[   67.892099] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_period_elapsed_under_stream_lock: check update_hw_ptr0
[   67.892103] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: start
[   67.892110] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: delta
[   67.892113] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: check1
[   67.892116] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: no_delta_check
[   67.892119] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: playback silence
[   67.892123] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: checks 3
[   67.892126] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_hw_ptr0: done
[   67.892129] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_state: start
[   67.892132] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_state: before draining
[   67.892136] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_state: before draining2
[   67.892139] skl_hda_dsp_generic skl_hda_dsp_generic:
snd_pcm_update_state: before snd_pcm_drain_done
[   67.892143] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_do_stop:
start
[   67.892147] skl_hda_dsp_generic skl_hda_dsp_generic: snd_pcm_do_stop:
before TRIGGER_STOP

<<< we never reach the end of this TRIGGER_STOP

[   67.892153] sof-audio-pci-intel-cnl 0000:00:1f.3: pcm: trigger stream
0 dir 0 cmd 0
[   67.892166] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60050000:
GLB_STREAM_MSG: TRIG_STOP
[   67.892396] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded:
0x60050000: GLB_STREAM_MSG: TRIG_STOP
[   67.892408] sof-audio-pci-intel-cnl 0000:00:1f.3: FW Poll Status:
reg[0x160]=0x140000 successful
[   67.892418] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx: 0x60030000:
GLB_STREAM_MSG: PCM_FREE
[   67.892564] sof-audio-pci-intel-cnl 0000:00:1f.3: ipc tx succeeded:
0x60030000: GLB_STREAM_MSG: PCM_FREE
[   67.892571]  HDA Analog: dpcm_be_dai_trigger: BE TRIGGER_STOP start
[   67.892575]  HDA Analog: dpcm_be_dai_trigger: BE TRIGGER_STOP check
can_be_free_stop
[   67.892579]  HDA Analog: snd_soc_dpcm_check_state: plb: taking fe
lock_irqsave from snd_soc_dpcm_check_state

<<< checkmate at this point, we're trying to take the same lock twice.
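
Condensing the trace above, the recursion is (intermediate frames elided):

  snd_sof_pcm_period_elapsed_work()
    -> FE stream lock taken
    -> snd_pcm_period_elapsed_under_stream_lock()
       -> snd_pcm_update_hw_ptr0() -> snd_pcm_update_state()
          -> snd_pcm_drain_done() -> snd_pcm_do_stop()      (TRIGGER_STOP)
             -> ... -> dpcm_be_dai_trigger()
                -> snd_soc_dpcm_check_state()
                   -> FE lock taken again -> deadlock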


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-11 20:06                                   ` Pierre-Louis Bossart
@ 2021-10-12  6:34                                     ` Takashi Iwai
  2021-10-12 10:42                                       ` Takashi Iwai
  0 siblings, 1 reply; 33+ messages in thread
From: Takashi Iwai @ 2021-10-12  6:34 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

On Mon, 11 Oct 2021 22:06:51 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> >>> Basically we need to protect two things:
> >>> - The BE links
> >>> - The concurrent accesses to BEs
> >>> The former belongs to each FE that holds the links, and the FE stream
> >>> lock would cover.  The latter is rather a per-BE business.
> >>>
> >>> An oft-seen risk of multiple locks is deadlocking, but in this case,
> >>> as long as we keep the lock order FE->BE, nothing wrong can happen.
> >>
> >> famous last words  "nothing wrong can happen." :-)
> >>
> >> I already added a helper to do this FE lock, I can easily replace the
> >> implementation to remove the spin_lock and use the FE PCM lock.
> >> we might even add the lock in the definition of for_each_dpcm_be() to
> >> avoid misses.
> >>
> >> Let me try this out today, thanks for the suggestions.
> > 
> > well, it's not successful at all...
> > 
> > When I replace the existing dpcm_lock with the FE PCM lock as you
> > suggested, without any additional changes, speaker-test produces valid
> > audio on the endpoints, but if I try a Ctrl-C or limit the number of
> > loops with the '-l' option, I hear an endless loop on the same buffer
> > and I have to power cycle my test device to stop the sound.
> > 
> > See 2 patches attached, the first patch with the introduction of the
> > helper works fine, the second with the use of the FE PCM lock doesn't.
> > In hindsight I am glad I worked on minimal patches, one after the other,
> > to identify problems.
> > 
> > And when I add the BE lock, then nothing happens. Device stuck and no
> > audio...
> > 
> > There must be something we're missing related to the locking...
> 
> And indeed there's a deadlock!
> 
> snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call
> snd_pcm_trigger.

Indeed, this would deadlock.

> So if we also take the pcm stream lock in the BE
> trigger, there's a conceptual deadlock: we painted ourselves in a corner
> by using the same lock twice.
> 
> Takashi, are you really sure we should protect these for_each_dpcm_be()
> loops with the same pcm lock?

The call within the FE lock is done only in dpcm_dai_trigger_fe_be(),
and this should call dpcm_be_dai_trigger() as is.  In other places,
the calls are without FE lock, hence they can take the lock,
e.g. create a variant dpcm_dai_trigger_fe_be_lock().
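
A minimal sketch of such a variant, reusing the existing
dpcm_dai_trigger_fe_be() from soc-pcm.c (the wrapper itself is
hypothetical, only the name follows the suggestion above):

static int dpcm_dai_trigger_fe_be_lock(struct snd_pcm_substream *substream,
                                       int cmd, bool fe_first)
{
        int ret;

        /* callers outside the trigger path do not hold the FE stream lock */
        snd_pcm_stream_lock_irq(substream);
        ret = dpcm_dai_trigger_fe_be(substream, cmd, fe_first);
        snd_pcm_stream_unlock_irq(substream);

        return ret;
}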

> it seems like asking for trouble to
> revisit the ALSA core just to walking through a list of BEs? Would you
> object to changing dpcm_lock as I suggested, but not interfering with
> stream handling?

That would work, too, it's just a pity to degrade the fine-grained
locks that have been already taken into global locks...


Takashi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-12  6:34                                     ` Takashi Iwai
@ 2021-10-12 10:42                                       ` Takashi Iwai
  2021-10-12 13:45                                         ` Pierre-Louis Bossart
  0 siblings, 1 reply; 33+ messages in thread
From: Takashi Iwai @ 2021-10-12 10:42 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

On Tue, 12 Oct 2021 08:34:07 +0200,
Takashi Iwai wrote:
> 
> On Mon, 11 Oct 2021 22:06:51 +0200,
> Pierre-Louis Bossart wrote:
> > 
> > 
> > >>> Basically we need to protect two things:
> > >>> - The BE links
> > >>> - The concurrent accesses to BEs
> > >>> The former belongs to each FE that holds the links, and the FE stream
> > >>> lock would cover.  The latter is rather a per-BE business.
> > >>>
> > >>> An oft-seen risk of multiple locks is deadlocking, but in this case,
> > >>> as long as we keep the lock order FE->BE, nothing wrong can happen.
> > >>
> > >> famous last words  "nothing wrong can happen." :-)
> > >>
> > >> I already added a helper to do this FE lock, I can easily replace the
> > >> implementation to remove the spin_lock and use the FE PCM lock.
> > >> we might even add the lock in the definition of for_each_dpcm_be() to
> > >> avoid misses.
> > >>
> > >> Let me try this out today, thanks for the suggestions.
> > > 
> > > well, it's not successful at all...
> > > 
> > > When I replace the existing dpcm_lock with the FE PCM lock as you
> > > suggested, without any additional changes, speaker-test produces valid
> > > audio on the endpoints, but if I try a Ctrl-C or limit the number of
> > > loops with the '-l' option, I hear an endless loop on the same buffer
> > > and I have to power cycle my test device to stop the sound.
> > > 
> > > See 2 patches attached, the first patch with the introduction of the
> > > helper works fine, the second with the use of the FE PCM lock doesn't.
> > > In hindsight I am glad I worked on minimal patches, one after the other,
> > > to identify problems.
> > > 
> > > And when I add the BE lock, then nothing happens. Device stuck and no
> > > audio...
> > > 
> > > There must be something we're missing related to the locking...
> > 
> > And indeed there's a deadlock!
> > 
> > snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call
> > snd_pcm_trigger.
> 
> Indeed, this would deadlock.
> 
> > So if we also take the pcm stream lock in the BE
> > trigger, there's a conceptual deadlock: we painted ourselves in a corner
> > by using the same lock twice.
> > 
> > Takashi, are you really sure we should protect these for_each_dpcm_be()
> > loops with the same pcm lock?
> 
> The call within the FE lock is done only in dpcm_dai_trigger_fe_be(),
> and this should call dpcm_be_dai_trigger() as is.  In other places,
> the calls are without FE lock, hence they can take the lock,
> e.g. create a variant dpcm_dai_trigger_fe_be_lock().

Or rather it was the code path of snd_soc_dpcm_check_state()?
The function could be called from dpcm_be_dai_trigger().
Currently dpcm_lock seems to have been added at places casually, covering
some of the for_each_dpcm_be() loops or whatever...  The whole lock scheme
has to be revisited.

> > it seems like asking for trouble to
> > revisit the ALSA core just to walking through a list of BEs? Would you
> > object to changing dpcm_lock as I suggested, but not interfering with
> > stream handling?
> 
> That would work, too, it's just a pity to degrade the fine-grained
> locks that have been already taken into global locks...

For the performance problem, at least, you can make it rwlock and
rwsem types (depending on nonatomic) so that concurrent accesses
would work.  The only exclusive case is adding and removing the
entries, which should be done with the write lock / sem down, while
other link traversals can be executed under the read lock / sem.
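
As an illustration only, a global rwlock for the atomic case (an
rw_semaphore would be the nonatomic counterpart, and the lock could just
as well live in the card); names and placement are hypothetical:

static DEFINE_RWLOCK(dpcm_link_lock);

/* exclusive: the BE link lists are modified */
static void dpcm_link_add(struct snd_soc_pcm_runtime *fe,
                          struct snd_soc_pcm_runtime *be,
                          struct snd_soc_dpcm *dpcm, int stream)
{
        write_lock_irq(&dpcm_link_lock);
        list_add(&dpcm->list_be, &fe->dpcm[stream].be_clients);
        list_add(&dpcm->list_fe, &be->dpcm[stream].fe_clients);
        write_unlock_irq(&dpcm_link_lock);
}

/* shared: traversal only, concurrent readers are fine */
static void dpcm_link_walk(struct snd_soc_pcm_runtime *fe, int stream)
{
        struct snd_soc_dpcm *dpcm;

        read_lock_irq(&dpcm_link_lock);
        for_each_dpcm_be(fe, stream, dpcm)
                ; /* read-only inspection of dpcm->be */
        read_unlock_irq(&dpcm_link_lock);
}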


Takashi

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-12 10:42                                       ` Takashi Iwai
@ 2021-10-12 13:45                                         ` Pierre-Louis Bossart
  2021-10-12 15:07                                           ` Takashi Iwai
  0 siblings, 1 reply; 33+ messages in thread
From: Pierre-Louis Bossart @ 2021-10-12 13:45 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi


>>> And indeed there's a deadlock!
>>>
>>> snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call
>>> snd_pcm_trigger.
>>
>> Indeed, this would deadlock.
>>
>>> So if we also take the pcm stream lock in the BE
>>> trigger, there's a conceptual deadlock: we painted ourselves in a corner
>>> by using the same lock twice.
>>>
>>> Takashi, are you really sure we should protect these for_each_dpcm_be()
>>> loops with the same pcm lock?
>>
>> The call within the FE lock is done only in dpcm_dai_trigger_fe_be(),
>> and this should call dpcm_be_dai_trigger() as is.  In other places,
>> the calls are without FE lock, hence they can take the lock,
>> e.g. create a variant dpcm_dai_trigger_fe_be_lock().
> 
> Or rather it was the code path of snd_soc_dpcm_check_state()?
> The function could be called from dpcm_be_dai_trigger().
> Currently dpcm_lock seems to be added at places casually covering some
> of the for_each_dpcm_be() or whatever...  The whole lock scheme has to
> be revisited.
> 
>>> it seems like asking for trouble to
>>> revisit the ALSA core just to walking through a list of BEs? Would you
>>> object to changing dpcm_lock as I suggested, but not interfering with
>>> stream handling?
>>
>> That would work, too, it's just a pity to degrade the fine-grained
>> locks that have been already taken into global locks...
> 
> For the performance problem, at least, you can make it rwlock and
> rwsem types (depending on nonatomic) so that the concurrent accesses
> would work.  The only exclusive lock is the case for adding and
> removing the entries, which should be done with write lock / sem down,
> while other link traversals can be executed in the read lock / sem.

Thanks Takashi for your feedback.

Let's first get the locking right. We can optimize performance later.

I will continue with the idea of using existing fine-grained locks a bit
more; I am with you that this dpcm_lock was not added in a consistent
way, and reusing the concept is really building on sand.

We can remove the lock in snd_soc_dpcm_check_state(), I already did the
change in other versions. But what I'll need to check further is if
indeed dpcm_be_dai_trigger() is called with the FE lock already taken.
Adding a lockdep_assert_held() or something would help, I guess.
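
A minimal sketch of that assertion, assuming an atomic FE whose stream
lock is the self_group spinlock and ignoring substream linking (nonatomic
FEs use the self_group mutex instead); the helper name is hypothetical:

static void dpcm_assert_fe_lock_held(struct snd_soc_pcm_runtime *fe, int stream)
{
        struct snd_pcm_substream *substream =
                snd_soc_dpcm_get_substream(fe, stream);

        /* complains, with lockdep enabled, if the caller forgot the FE lock */
        lockdep_assert_held(&substream->self_group.lock);
}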

The part where I am still not clear is that snd_pcm_period_elapsed() uses
the irqsave/irqrestore version, but in the initial code you suggested
the vanilla irq version is fine. I am not sure I get the nuance here,
and btw in the case of SOF, snd_pcm_period_elapsed() is called from a
workqueue, not an interrupt handler, as a work-around to avoid an IPC
deadlock, so we probably don't need the irqsave/irqrestore version anyway.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE
  2021-10-12 13:45                                         ` Pierre-Louis Bossart
@ 2021-10-12 15:07                                           ` Takashi Iwai
  0 siblings, 0 replies; 33+ messages in thread
From: Takashi Iwai @ 2021-10-12 15:07 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: alsa-devel, Kuninori Morimoto, Sameer Pujar, vkoul, broonie,
	Gyeongtaek Lee, Peter Ujfalusi

On Tue, 12 Oct 2021 15:45:41 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> >>> And indeed there's a deadlock!
> >>>
> >>> snd_pcm_period_elapsed() takes the FE pcm stream lock, and will call
> >>> snd_pcm_trigger.
> >>
> >> Indeed, this would deadlock.
> >>
> >>> So if we also take the pcm stream lock in the BE
> >>> trigger, there's a conceptual deadlock: we painted ourselves in a corner
> >>> by using the same lock twice.
> >>>
> >>> Takashi, are you really sure we should protect these for_each_dpcm_be()
> >>> loops with the same pcm lock?
> >>
> >> The call within the FE lock is done only in dpcm_dai_trigger_fe_be(),
> >> and this should call dpcm_be_dai_trigger() as is.  In other places,
> >> the calls are without FE lock, hence they can take the lock,
> >> e.g. create a variant dpcm_dai_trigger_fe_be_lock().
> > 
> > Or rather it was the code path of snd_soc_dpcm_check_state()?
> > The function could be called from dpcm_be_dai_trigger().
> > Currently dpcm_lock seems to be added at places casually covering some
> > of the for_each_dpcm_be() or whatever...  The whole lock scheme has to
> > be revisited.
> > 
> >>> it seems like asking for trouble to
> >>> revisit the ALSA core just to walking through a list of BEs? Would you
> >>> object to changing dpcm_lock as I suggested, but not interfering with
> >>> stream handling?
> >>
> >> That would work, too, it's just a pity to degrade the fine-grained
> >> locks that have been already taken into global locks...
> > 
> > For the performance problem, at least, you can make it rwlock and
> > rwsem types (depending on nonatomic) so that the concurrent accesses
> > would work.  The only exclusive lock is the case for adding and
> > removing the entries, which should be done with write lock / sem down,
> > while other link traversals can be executed in the read lock / sem.
> 
> Thanks Takashi for your feedback.
> 
> Let's first get the locking right. We can optimize performance later.
> 
> I will continue with the idea of using existing fine-grained locks a bit
> more, I am with you that this dpcm_lock was not added in a consistent
> way and reusing the concept is really building on sand.
> 
> We can remove the lock in snd_soc_dpcm_check_state(), I already did the
> change in other versions. But what I'll need to check further is if
> indeed dpcm_be_dai_trigger() is called with the FE lock already taken.
> Adding a lockdep_assert_held() or something would help, I guess.
> 
> The part where I am still not clear is that snd_pcm_period_elapsed uses
> the irqsave/irqrestore version, but in the initial code you suggested
> the vanilla irq version is fine. I am not sure I get the nuance here,
> and btw in the case of SOF the snd_pcm_period_elapsed is called from a
> workqueue, not an interrupt handler, as a work-around to avoid an IPC
> deadlock, so we probably don't need the irqsave/irqrestore version anyways.

In a nutshell:
* In the code paths that already hold the stream lock
  (i.e. trigger, pointer and ack PCM callbacks), you need no extra
  lock for the stream itself.  But if you want additional locks
  (e.g. for BE), use either *_spin_lock() or *_spin_lock_irqsave(),
  but not *_spin_lock_irq().

* In other code paths, *_spin_lock_irq().

If in doubt, you can always use *_irqsave(), of course.
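
A minimal sketch of the two cases, using the PCM stream-lock helpers as
the concrete example (function names are hypothetical):

/* already under the FE stream lock (trigger/pointer/ack): interrupts may
 * be disabled, irqsave/irqrestore leaves their state untouched */
static void be_update_from_trigger(struct snd_pcm_substream *be_substream)
{
        unsigned long flags;

        snd_pcm_stream_lock_irqsave(be_substream, flags);
        /* ... update BE state ... */
        snd_pcm_stream_unlock_irqrestore(be_substream, flags);
}

/* ordinary process context, no stream lock held yet */
static void be_update_elsewhere(struct snd_pcm_substream *be_substream)
{
        snd_pcm_stream_lock_irq(be_substream);
        /* ... update BE state ... */
        snd_pcm_stream_unlock_irq(be_substream);
}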


Takashi

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-10-12 15:08 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-04 22:54 [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE Pierre-Louis Bossart
2021-10-04 22:54 ` [RFC PATCH v2 1/5] ASoC: soc-pcm: remove snd_soc_dpcm_fe_can_update() Pierre-Louis Bossart
2021-10-04 22:54   ` Pierre-Louis Bossart
2021-10-04 22:54 ` [RFC PATCH v2 2/5] ASoC: soc-pcm: don't export local functions, use static Pierre-Louis Bossart
2021-10-04 22:54   ` Pierre-Louis Bossart
2021-10-04 22:54 ` [RFC PATCH v2 3/5] ASoC: soc-pcm: replace dpcm_lock with dpcm_mutex Pierre-Louis Bossart
2021-10-04 22:54   ` Pierre-Louis Bossart
2021-10-04 22:54 ` [RFC PATCH v2 4/5] ASoC: soc-pcm: protect for_each_dpcm_be() loops " Pierre-Louis Bossart
2021-10-04 22:54   ` Pierre-Louis Bossart
2021-10-04 22:54 ` [RFC PATCH v2 5/5] ASoC: soc-pcm: test refcount before triggering Pierre-Louis Bossart
2021-10-04 22:54   ` Pierre-Louis Bossart
2021-10-05  6:36 ` [RFC PATCH v2 0/5] ASoC: soc-pcm: fix trigger race conditions with shared BE Sameer Pujar
2021-10-05 13:17   ` Pierre-Louis Bossart
2021-10-06 14:22     ` Sameer Pujar
2021-10-06 19:47       ` Pierre-Louis Bossart
2021-10-07 11:06         ` Takashi Iwai
2021-10-07 13:31           ` Pierre-Louis Bossart
2021-10-07 14:59             ` Takashi Iwai
2021-10-07 15:24               ` Pierre-Louis Bossart
2021-10-07 15:44                 ` Takashi Iwai
2021-10-07 18:13                   ` Pierre-Louis Bossart
2021-10-07 21:11                     ` Takashi Iwai
2021-10-07 21:27                       ` Pierre-Louis Bossart
2021-10-08  6:13                         ` Takashi Iwai
2021-10-08 14:41                           ` Pierre-Louis Bossart
2021-10-08 14:51                             ` Takashi Iwai
2021-10-08 15:41                               ` Pierre-Louis Bossart
2021-10-08 19:09                                 ` Pierre-Louis Bossart
2021-10-11 20:06                                   ` Pierre-Louis Bossart
2021-10-12  6:34                                     ` Takashi Iwai
2021-10-12 10:42                                       ` Takashi Iwai
2021-10-12 13:45                                         ` Pierre-Louis Bossart
2021-10-12 15:07                                           ` Takashi Iwai
