linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
@ 2018-11-26  5:36 Chanho Min
  2018-11-26  8:36 ` Takashi Iwai
  0 siblings, 1 reply; 11+ messages in thread
From: Chanho Min @ 2018-11-26  5:36 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai, Vinod Koul, Daniel Mentz
  Cc: linux-kernel, alsa-devel, Seungho Park, Jongsung Kim,
	Wonmin Jung, Jaehyun Kim, Hyonwoo Park, Chanho Min

Commit 67ec1072b053 ("ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream")
fixes deadlock for non-atomic PCM stream. But, This patch causes antother stuck.
If writer is RT thread and reader is a normal thread, the reader thread will
be difficult to get scheduled. It may not give chance to release readlocks
and writer gets stuck for a long time if they are pinned to single cpu.

The deadlock described in the previous commit is because the linux rwsem
queues like a FIFO. So, we might need non-FIFO writelock, not non-block one.

My suggestion is that the writer gives reader a chance to be scheduled by using
the minimum msleep() instaed of spinning without blocking by writer. Also,
The *_nonblock may be changed to *_nonfifo appropriately to this concept.
In terms of performance, when trylock is failed, this minimum periodic msleep
will have the same performance as the tick-based schedule()/wake_up_q().

Suggested-by: Wonmin Jung <wonmin.jung@lge.com>
Signed-off-by: Chanho Min <chanho.min@lge.com>
---
 sound/core/pcm_native.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 66c90f4..bdca0e1 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -36,6 +36,7 @@
 #include <sound/timer.h>
 #include <sound/minors.h>
 #include <linux/uio.h>
+#include <linux/delay.h>
 
 #include "pcm_local.h"
 
@@ -91,12 +92,12 @@ static DECLARE_RWSEM(snd_pcm_link_rwsem);
  * and this may lead to a deadlock when the code path takes read sem
  * twice (e.g. one in snd_pcm_action_nonatomic() and another in
  * snd_pcm_stream_lock()).  As a (suboptimal) workaround, let writer to
- * spin until it gets the lock.
+ * sleep until all the readers are completed without blocking by writer.
  */
-static inline void down_write_nonblock(struct rw_semaphore *lock)
+static inline void down_write_nonfifo(struct rw_semaphore *lock)
 {
 	while (!down_write_trylock(lock))
-		cond_resched();
+		msleep(1);
 }
 
 #define PCM_LOCK_DEFAULT	0
@@ -1967,7 +1968,7 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
 		res = -ENOMEM;
 		goto _nolock;
 	}
-	down_write_nonblock(&snd_pcm_link_rwsem);
+	down_write_nonfifo(&snd_pcm_link_rwsem);
 	write_lock_irq(&snd_pcm_link_rwlock);
 	if (substream->runtime->status->state == SNDRV_PCM_STATE_OPEN ||
 	    substream->runtime->status->state != substream1->runtime->status->state ||
@@ -2014,7 +2015,7 @@ static int snd_pcm_unlink(struct snd_pcm_substream *substream)
 	struct snd_pcm_substream *s;
 	int res = 0;
 
-	down_write_nonblock(&snd_pcm_link_rwsem);
+	down_write_nonfifo(&snd_pcm_link_rwsem);
 	write_lock_irq(&snd_pcm_link_rwlock);
 	if (!snd_pcm_stream_linked(substream)) {
 		res = -EALREADY;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
  2018-11-26  5:36 [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock() Chanho Min
@ 2018-11-26  8:36 ` Takashi Iwai
  2018-11-27 23:47   ` Chanho Min
  2018-11-28  6:26   ` Chanho Min
  0 siblings, 2 replies; 11+ messages in thread
From: Takashi Iwai @ 2018-11-26  8:36 UTC (permalink / raw)
  To: Chanho Min
  Cc: Jaroslav Kysela, Takashi Iwai, Vinod Koul, Daniel Mentz,
	linux-kernel, alsa-devel, Seungho Park, Jongsung Kim,
	Wonmin Jung, Jaehyun Kim, Hyonwoo Park

On Mon, 26 Nov 2018 06:36:37 +0100,
Chanho Min wrote:
> 
> Commit 67ec1072b053 ("ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream")
> fixes deadlock for non-atomic PCM stream. But, This patch causes antother stuck.
> If writer is RT thread and reader is a normal thread, the reader thread will
> be difficult to get scheduled. It may not give chance to release readlocks
> and writer gets stuck for a long time if they are pinned to single cpu.
> 
> The deadlock described in the previous commit is because the linux rwsem
> queues like a FIFO. So, we might need non-FIFO writelock, not non-block one.
> 
> My suggestion is that the writer gives reader a chance to be scheduled by using
> the minimum msleep() instaed of spinning without blocking by writer. Also,
> The *_nonblock may be changed to *_nonfifo appropriately to this concept.
> In terms of performance, when trylock is failed, this minimum periodic msleep
> will have the same performance as the tick-based schedule()/wake_up_q().
> 
> Suggested-by: Wonmin Jung <wonmin.jung@lge.com>
> Signed-off-by: Chanho Min <chanho.min@lge.com>

Hrm, converting unconditionally with msleep() looks too drastic.

I guess you've hit this while not explicitly using the linked PCM
streams, i.e. in the call of snd_pcm_unlink() at close, right?

Then this can be worked around by checking the link before calling
it.  Could you check the patch below?


thanks,

Takashi

--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2369,7 +2369,8 @@ int snd_pcm_hw_constraints_complete(struct snd_pcm_substream *substream)
 
 static void pcm_release_private(struct snd_pcm_substream *substream)
 {
-	snd_pcm_unlink(substream);
+	if (snd_pcm_stream_linked(substream))
+		snd_pcm_unlink(substream);
 }
 
 void snd_pcm_release_substream(struct snd_pcm_substream *substream)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
  2018-11-26  8:36 ` Takashi Iwai
@ 2018-11-27 23:47   ` Chanho Min
  2018-11-28  8:37     ` Takashi Iwai
  2018-11-28  6:26   ` Chanho Min
  1 sibling, 1 reply; 11+ messages in thread
From: Chanho Min @ 2018-11-27 23:47 UTC (permalink / raw)
  To: 'Takashi Iwai'
  Cc: 'Jaroslav Kysela', 'Takashi Iwai',
	'Vinod Koul', 'Daniel Mentz',
	linux-kernel, alsa-devel, 'Seungho Park',
	'Jongsung Kim', 'Wonmin Jung',
	'Jaehyun Kim', 'Hyonwoo Park'

> On Mon, 26 Nov 2018 06:36:37 +0100,
> Chanho Min wrote:
> >
> > Commit 67ec1072b053 ("ALSA: pcm: Fix rwsem deadlock for non-atomic PCM
> > stream") fixes deadlock for non-atomic PCM stream. But, This patch
> causes antother stuck.
> > If writer is RT thread and reader is a normal thread, the reader
> > thread will be difficult to get scheduled. It may not give chance to
> > release readlocks and writer gets stuck for a long time if they are
> pinned to single cpu.
> >
> > The deadlock described in the previous commit is because the linux
> > rwsem queues like a FIFO. So, we might need non-FIFO writelock, not non-
> block one.
> >
> > My suggestion is that the writer gives reader a chance to be scheduled
> > by using the minimum msleep() instaed of spinning without blocking by
> > writer. Also, The *_nonblock may be changed to *_nonfifo appropriately
> to this concept.
> > In terms of performance, when trylock is failed, this minimum periodic
> > msleep will have the same performance as the tick-based
> schedule()/wake_up_q().
> >
> > Suggested-by: Wonmin Jung <wonmin.jung@lge.com>
> > Signed-off-by: Chanho Min <chanho.min@lge.com>
> 
> Hrm, converting unconditionally with msleep() looks too drastic.

Yes, it looks drastic. But, IMHO, I can't say busy-spin is not non-drastic.
To fix the root cause, We may need another rwsem that does not work as a
FIFO.

> 
> I guess you've hit this while not explicitly using the linked PCM streams,
> i.e. in the call of snd_pcm_unlink() at close, right?
> 
> Then this can be worked around by checking the link before calling it.
> Could you check the patch below?

More testing is needed, but it seems to be fixed by your patch.
We may not use the linked PCM.
But, If the linked PCM is enabled,  Can snd_pcm_unlink() be called?
This also seems to be a workaround.

> 
> 
> thanks,
> 
> Takashi
> 
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -2369,7 +2369,8 @@ int snd_pcm_hw_constraints_complete(struct
> snd_pcm_substream *substream)
> 
>  static void pcm_release_private(struct snd_pcm_substream *substream)  {
> -	snd_pcm_unlink(substream);
> +	if (snd_pcm_stream_linked(substream))
> +		snd_pcm_unlink(substream);
>  }
> 
>  void snd_pcm_release_substream(struct snd_pcm_substream *substream)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
  2018-11-26  8:36 ` Takashi Iwai
  2018-11-27 23:47   ` Chanho Min
@ 2018-11-28  6:26   ` Chanho Min
  1 sibling, 0 replies; 11+ messages in thread
From: Chanho Min @ 2018-11-28  6:26 UTC (permalink / raw)
  To: 'Takashi Iwai'
  Cc: 'Jaroslav Kysela', 'Takashi Iwai',
	'Vinod Koul', 'Daniel Mentz',
	linux-kernel, alsa-devel, 'Seungho Park',
	'Jongsung Kim', 'Wonmin Jung',
	'Jaehyun Kim', 'Hyonwoo Park'

> >
> > Hrm, converting unconditionally with msleep() looks too drastic.
> 
> Yes, it looks drastic. But, IMHO, I can't say busy-spin is not
non-drastic.
Fix typo in this comment:
I can't say busy-spin is not drastic.

Thanks
Chanho


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
  2018-11-27 23:47   ` Chanho Min
@ 2018-11-28  8:37     ` Takashi Iwai
  2018-11-28 23:48       ` Chanho Min
  0 siblings, 1 reply; 11+ messages in thread
From: Takashi Iwai @ 2018-11-28  8:37 UTC (permalink / raw)
  To: Chanho Min
  Cc: 'Jaroslav Kysela', 'Takashi Iwai',
	'Vinod Koul', 'Daniel Mentz',
	linux-kernel, alsa-devel, 'Seungho Park',
	'Jongsung Kim', 'Wonmin Jung',
	'Jaehyun Kim', 'Hyonwoo Park'

On Wed, 28 Nov 2018 00:47:03 +0100,
Chanho Min wrote:
> 
> > On Mon, 26 Nov 2018 06:36:37 +0100,
> > Chanho Min wrote:
> > >
> > > Commit 67ec1072b053 ("ALSA: pcm: Fix rwsem deadlock for non-atomic PCM
> > > stream") fixes deadlock for non-atomic PCM stream. But, This patch
> > causes antother stuck.
> > > If writer is RT thread and reader is a normal thread, the reader
> > > thread will be difficult to get scheduled. It may not give chance to
> > > release readlocks and writer gets stuck for a long time if they are
> > pinned to single cpu.
> > >
> > > The deadlock described in the previous commit is because the linux
> > > rwsem queues like a FIFO. So, we might need non-FIFO writelock, not non-
> > block one.
> > >
> > > My suggestion is that the writer gives reader a chance to be scheduled
> > > by using the minimum msleep() instaed of spinning without blocking by
> > > writer. Also, The *_nonblock may be changed to *_nonfifo appropriately
> > to this concept.
> > > In terms of performance, when trylock is failed, this minimum periodic
> > > msleep will have the same performance as the tick-based
> > schedule()/wake_up_q().
> > >
> > > Suggested-by: Wonmin Jung <wonmin.jung@lge.com>
> > > Signed-off-by: Chanho Min <chanho.min@lge.com>
> > 
> > Hrm, converting unconditionally with msleep() looks too drastic.
> 
> Yes, it looks drastic. But, IMHO, I can't say busy-spin is not non-drastic.
> To fix the root cause, We may need another rwsem that does not work as a
> FIFO.

Right, but applying msleep(1) unconditionally is really bad.

> > I guess you've hit this while not explicitly using the linked PCM streams,
> > i.e. in the call of snd_pcm_unlink() at close, right?
> > 
> > Then this can be worked around by checking the link before calling it.
> > Could you check the patch below?
> 
> More testing is needed, but it seems to be fixed by your patch.
> We may not use the linked PCM.

Then I'm sure that my patch papers over.

> But, If the linked PCM is enabled,  Can snd_pcm_unlink() be called?
> This also seems to be a workaround.

Yes, for the linked streams, something else is needed *in addition*.

The original fix with busy loop also assumed that this code path (via
snd_pcm_link() and snd_pcm_unlink()) is the rare occasion, and it
didn't consider that it were called for regular use cases.  So the fix
to make things just works for regular use cases without any artifact
must be implemented in the first place.  The fix for the linked
streams comes at next.  It might be like your msleep() change as a
workaround, but in anyway it's far less urgency.


Takashi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
  2018-11-28  8:37     ` Takashi Iwai
@ 2018-11-28 23:48       ` Chanho Min
  2018-11-29  7:19         ` Takashi Iwai
  0 siblings, 1 reply; 11+ messages in thread
From: Chanho Min @ 2018-11-28 23:48 UTC (permalink / raw)
  To: 'Takashi Iwai'
  Cc: 'Jaroslav Kysela', 'Takashi Iwai',
	'Vinod Koul', 'Daniel Mentz',
	linux-kernel, alsa-devel, 'Seungho Park',
	'Jongsung Kim', 'Wonmin Jung',
	'Jaehyun Kim', 'Hyonwoo Park'

> > > On Mon, 26 Nov 2018 06:36:37 +0100,
> > > Chanho Min wrote:
> > > >
> > > > Commit 67ec1072b053 ("ALSA: pcm: Fix rwsem deadlock for non-atomic
> > > > PCM
> > > > stream") fixes deadlock for non-atomic PCM stream. But, This patch
> > > causes antother stuck.
> > > > If writer is RT thread and reader is a normal thread, the reader
> > > > thread will be difficult to get scheduled. It may not give chance
> > > > to release readlocks and writer gets stuck for a long time if they
> > > > are
> > > pinned to single cpu.
> > > >
> > > > The deadlock described in the previous commit is because the linux
> > > > rwsem queues like a FIFO. So, we might need non-FIFO writelock,
> > > > not non-
> > > block one.
> > > >
> > > > My suggestion is that the writer gives reader a chance to be
> > > > scheduled by using the minimum msleep() instaed of spinning
> > > > without blocking by writer. Also, The *_nonblock may be changed to
> > > > *_nonfifo appropriately
> > > to this concept.
> > > > In terms of performance, when trylock is failed, this minimum
> > > > periodic msleep will have the same performance as the tick-based
> > > schedule()/wake_up_q().
> > > >
> > > > Suggested-by: Wonmin Jung <wonmin.jung@lge.com>
> > > > Signed-off-by: Chanho Min <chanho.min@lge.com>
> > >
> > > Hrm, converting unconditionally with msleep() looks too drastic.
> >
> > Yes, it looks drastic. But, IMHO, I can't say busy-spin is not non-
> drastic.
> > To fix the root cause, We may need another rwsem that does not work as
> > a FIFO.
> 
> Right, but applying msleep(1) unconditionally is really bad.
> 
> > > I guess you've hit this while not explicitly using the linked PCM
> > > streams, i.e. in the call of snd_pcm_unlink() at close, right?
> > >
> > > Then this can be worked around by checking the link before calling it.
> > > Could you check the patch below?
> >
> > More testing is needed, but it seems to be fixed by your patch.
> > We may not use the linked PCM.
> 
> Then I'm sure that my patch papers over.
Thanks, Plz let me know when the patch is merged.

> 
> > But, If the linked PCM is enabled,  Can snd_pcm_unlink() be called?
> > This also seems to be a workaround.
> 
> Yes, for the linked streams, something else is needed *in addition*.
> 
> The original fix with busy loop also assumed that this code path (via
> snd_pcm_link() and snd_pcm_unlink()) is the rare occasion, and it didn't
> consider that it were called for regular use cases.  So the fix to make
> things just works for regular use cases without any artifact must be
> implemented in the first place.  The fix for the linked streams comes at
> next.  It might be like your msleep() change as a workaround, but in
> anyway it's far less urgency.

msleep is worst, but If it is harmless, can I apply my patch to the private
tree
temporarily until your next fix comes?
We may use the linked streams in the near future. It makes our product
unstable.
It's urgency for us. How is your opinion?

Thanks
Chanho


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
  2018-11-28 23:48       ` Chanho Min
@ 2018-11-29  7:19         ` Takashi Iwai
  2018-11-29 23:03           ` Chanho Min
  0 siblings, 1 reply; 11+ messages in thread
From: Takashi Iwai @ 2018-11-29  7:19 UTC (permalink / raw)
  To: Chanho Min
  Cc: 'Jaroslav Kysela', 'Takashi Iwai',
	'Vinod Koul', 'Daniel Mentz',
	linux-kernel, alsa-devel, 'Seungho Park',
	'Jongsung Kim', 'Wonmin Jung',
	'Jaehyun Kim', 'Hyonwoo Park'

On Thu, 29 Nov 2018 00:48:20 +0100,
Chanho Min wrote:
> 
> > > > On Mon, 26 Nov 2018 06:36:37 +0100,
> > > > Chanho Min wrote:
> > > > >
> > > > > Commit 67ec1072b053 ("ALSA: pcm: Fix rwsem deadlock for non-atomic
> > > > > PCM
> > > > > stream") fixes deadlock for non-atomic PCM stream. But, This patch
> > > > causes antother stuck.
> > > > > If writer is RT thread and reader is a normal thread, the reader
> > > > > thread will be difficult to get scheduled. It may not give chance
> > > > > to release readlocks and writer gets stuck for a long time if they
> > > > > are
> > > > pinned to single cpu.
> > > > >
> > > > > The deadlock described in the previous commit is because the linux
> > > > > rwsem queues like a FIFO. So, we might need non-FIFO writelock,
> > > > > not non-
> > > > block one.
> > > > >
> > > > > My suggestion is that the writer gives reader a chance to be
> > > > > scheduled by using the minimum msleep() instaed of spinning
> > > > > without blocking by writer. Also, The *_nonblock may be changed to
> > > > > *_nonfifo appropriately
> > > > to this concept.
> > > > > In terms of performance, when trylock is failed, this minimum
> > > > > periodic msleep will have the same performance as the tick-based
> > > > schedule()/wake_up_q().
> > > > >
> > > > > Suggested-by: Wonmin Jung <wonmin.jung@lge.com>
> > > > > Signed-off-by: Chanho Min <chanho.min@lge.com>
> > > >
> > > > Hrm, converting unconditionally with msleep() looks too drastic.
> > >
> > > Yes, it looks drastic. But, IMHO, I can't say busy-spin is not non-
> > drastic.
> > > To fix the root cause, We may need another rwsem that does not work as
> > > a FIFO.
> > 
> > Right, but applying msleep(1) unconditionally is really bad.
> > 
> > > > I guess you've hit this while not explicitly using the linked PCM
> > > > streams, i.e. in the call of snd_pcm_unlink() at close, right?
> > > >
> > > > Then this can be worked around by checking the link before calling it.
> > > > Could you check the patch below?
> > >
> > > More testing is needed, but it seems to be fixed by your patch.
> > > We may not use the linked PCM.
> > 
> > Then I'm sure that my patch papers over.
> Thanks, Plz let me know when the patch is merged.

I'm going to merge the patch below now.
It slips from the pull request to Linus in today, but will be the next
one for 4.20-rc6.

> > > But, If the linked PCM is enabled,  Can snd_pcm_unlink() be called?
> > > This also seems to be a workaround.
> > 
> > Yes, for the linked streams, something else is needed *in addition*.
> > 
> > The original fix with busy loop also assumed that this code path (via
> > snd_pcm_link() and snd_pcm_unlink()) is the rare occasion, and it didn't
> > consider that it were called for regular use cases.  So the fix to make
> > things just works for regular use cases without any artifact must be
> > implemented in the first place.  The fix for the linked streams comes at
> > next.  It might be like your msleep() change as a workaround, but in
> > anyway it's far less urgency.
> 
> msleep is worst, but If it is harmless, can I apply my patch to the private
> tree
> temporarily until your next fix comes?
> We may use the linked streams in the near future. It makes our product
> unstable.
> It's urgency for us. How is your opinion?

I'll add your fix on top of mine for now.  The msleep() is applied
only for linked streams, so it's not that bad any longer.


thanks,

Takashi

-- 8< --
From: Takashi Iwai <tiwai@suse.de>
Subject: [PATCH] ALSA: pcm: Call snd_pcm_unlink() conditionally at closing

Currently the PCM core calls snd_pcm_unlink() always unconditionally
at closing a stream.  However, since snd_pcm_unlink() invokes the
global rwsem down, the lock can be easily contended.  More badly, when
a thread runs in a high priority RT-FIFO, it may stall at spinning.

Basically the call of snd_pcm_unlink() is required only for the linked
streams that are already rare occasion.  For normal use cases, this
code path is fairly superfluous.

As an optimization (and also as a workaround for the RT problem
above in normal situations without linked streams), this patch adds a
check before calling snd_pcm_unlink() and calls it only when needed.

Reported-by: Chanho Min <chanho.min@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/core/pcm_native.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 66c90f486af9..6afcc393113a 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2369,7 +2369,8 @@ int snd_pcm_hw_constraints_complete(struct snd_pcm_substream *substream)
 
 static void pcm_release_private(struct snd_pcm_substream *substream)
 {
-	snd_pcm_unlink(substream);
+	if (snd_pcm_stream_linked(substream))
+		snd_pcm_unlink(substream);
 }
 
 void snd_pcm_release_substream(struct snd_pcm_substream *substream)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
  2018-11-29  7:19         ` Takashi Iwai
@ 2018-11-29 23:03           ` Chanho Min
  0 siblings, 0 replies; 11+ messages in thread
From: Chanho Min @ 2018-11-29 23:03 UTC (permalink / raw)
  To: 'Takashi Iwai'
  Cc: 'Jaroslav Kysela', 'Takashi Iwai',
	'Vinod Koul', 'Daniel Mentz',
	linux-kernel, alsa-devel, 'Seungho Park',
	'Jongsung Kim', 'Wonmin Jung',
	'Jaehyun Kim', 'Hyonwoo Park'

> Chanho Min wrote:
> >
> > > > > On Mon, 26 Nov 2018 06:36:37 +0100,
> > > > > Chanho Min wrote:
> > > > > >
> > > > > > Commit 67ec1072b053 ("ALSA: pcm: Fix rwsem deadlock for non-
> atomic
> > > > > > PCM
> > > > > > stream") fixes deadlock for non-atomic PCM stream. But, This
> patch
> > > > > causes antother stuck.
> > > > > > If writer is RT thread and reader is a normal thread, the reader
> > > > > > thread will be difficult to get scheduled. It may not give
> chance
> > > > > > to release readlocks and writer gets stuck for a long time if
> they
> > > > > > are
> > > > > pinned to single cpu.
> > > > > >
> > > > > > The deadlock described in the previous commit is because the
> linux
> > > > > > rwsem queues like a FIFO. So, we might need non-FIFO writelock,
> > > > > > not non-
> > > > > block one.
> > > > > >
> > > > > > My suggestion is that the writer gives reader a chance to be
> > > > > > scheduled by using the minimum msleep() instaed of spinning
> > > > > > without blocking by writer. Also, The *_nonblock may be changed
> to
> > > > > > *_nonfifo appropriately
> > > > > to this concept.
> > > > > > In terms of performance, when trylock is failed, this minimum
> > > > > > periodic msleep will have the same performance as the tick-based
> > > > > schedule()/wake_up_q().
> > > > > >
> > > > > > Suggested-by: Wonmin Jung <wonmin.jung@lge.com>
> > > > > > Signed-off-by: Chanho Min <chanho.min@lge.com>
> > > > >
> > > > > Hrm, converting unconditionally with msleep() looks too drastic.
> > > >
> > > > Yes, it looks drastic. But, IMHO, I can't say busy-spin is not non-
> > > drastic.
> > > > To fix the root cause, We may need another rwsem that does not work
> as
> > > > a FIFO.
> > >
> > > Right, but applying msleep(1) unconditionally is really bad.
> > >
> > > > > I guess you've hit this while not explicitly using the linked PCM
> > > > > streams, i.e. in the call of snd_pcm_unlink() at close, right?
> > > > >
> > > > > Then this can be worked around by checking the link before calling
> it.
> > > > > Could you check the patch below?
> > > >
> > > > More testing is needed, but it seems to be fixed by your patch.
> > > > We may not use the linked PCM.
> > >
> > > Then I'm sure that my patch papers over.
> > Thanks, Plz let me know when the patch is merged.
> 
> I'm going to merge the patch below now.
> It slips from the pull request to Linus in today, but will be the next
> one for 4.20-rc6.
> 
> > > > But, If the linked PCM is enabled,  Can snd_pcm_unlink() be called?
> > > > This also seems to be a workaround.
> > >
> > > Yes, for the linked streams, something else is needed *in addition*.
> > >
> > > The original fix with busy loop also assumed that this code path (via
> > > snd_pcm_link() and snd_pcm_unlink()) is the rare occasion, and it
> didn't
> > > consider that it were called for regular use cases.  So the fix to
> make
> > > things just works for regular use cases without any artifact must be
> > > implemented in the first place.  The fix for the linked streams comes
> at
> > > next.  It might be like your msleep() change as a workaround, but in
> > > anyway it's far less urgency.
> >
> > msleep is worst, but If it is harmless, can I apply my patch to the
> private
> > tree
> > temporarily until your next fix comes?
> > We may use the linked streams in the near future. It makes our product
> > unstable.
> > It's urgency for us. How is your opinion?
> 
> I'll add your fix on top of mine for now.  The msleep() is applied
> only for linked streams, so it's not that bad any longer.
> 
> 
> thanks,
> 
> Takashi
> 
> -- 8< --
> From: Takashi Iwai <tiwai@suse.de>
> Subject: [PATCH] ALSA: pcm: Call snd_pcm_unlink() conditionally at closing
> 
> Currently the PCM core calls snd_pcm_unlink() always unconditionally
> at closing a stream.  However, since snd_pcm_unlink() invokes the
> global rwsem down, the lock can be easily contended.  More badly, when
> a thread runs in a high priority RT-FIFO, it may stall at spinning.
> 
> Basically the call of snd_pcm_unlink() is required only for the linked
> streams that are already rare occasion.  For normal use cases, this
> code path is fairly superfluous.
> 
> As an optimization (and also as a workaround for the RT problem
> above in normal situations without linked streams), this patch adds a
> check before calling snd_pcm_unlink() and calls it only when needed.
> 
> Reported-by: Chanho Min <chanho.min@lge.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  sound/core/pcm_native.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> index 66c90f486af9..6afcc393113a 100644
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -2369,7 +2369,8 @@ int snd_pcm_hw_constraints_complete(struct
> snd_pcm_substream *substream)
> 
>  static void pcm_release_private(struct snd_pcm_substream *substream)
>  {
> -	snd_pcm_unlink(substream);
> +	if (snd_pcm_stream_linked(substream))
> +		snd_pcm_unlink(substream);
>  }
> 
>  void snd_pcm_release_substream(struct snd_pcm_substream *substream)
> --
> 2.19.1
Great, Many thanks for the fast response.

Chanho


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
  2018-11-24  7:32 Chanho Min
  2018-11-24 10:56 ` kbuild test robot
@ 2018-11-25 11:30 ` kbuild test robot
  1 sibling, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2018-11-25 11:30 UTC (permalink / raw)
  To: Chanho Min
  Cc: kbuild-all, Jaroslav Kysela, Takashi Iwai, Vinod Koul,
	Daniel Mentz, linux-kernel, alsa-devel, Seungho Park,
	Jongsung Kim, Wonmin Jung, Jaehyun Kim, Hyonwoo Park, Chanho Min

[-- Attachment #1: Type: text/plain, Size: 10047 bytes --]

Hi Chanho,

I love your patch! Yet something to improve:

[auto build test ERROR on sound/for-next]
[also build test ERROR on v4.20-rc3 next-20181123]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Chanho-Min/ALSA-pcm-Fix-starvation-on-down_write_nonblock/20181124-182630
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-next
config: x86_64-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   sound/core/pcm_native.c:590:51: warning: incorrect type in assignment (different base types)
   sound/core/pcm_native.c:590:51:    expected restricted snd_pcm_state_t [usertype] state
   sound/core/pcm_native.c:590:51:    got int [signed] state
   sound/core/pcm_native.c:755:38: warning: incorrect type in argument 2 (different base types)
   sound/core/pcm_native.c:755:38:    expected int [signed] state
   sound/core/pcm_native.c:755:38:    got restricted snd_pcm_state_t [usertype] <noident>
   sound/core/pcm_native.c:767:38: warning: incorrect type in argument 2 (different base types)
   sound/core/pcm_native.c:767:38:    expected int [signed] state
   sound/core/pcm_native.c:767:38:    got restricted snd_pcm_state_t [usertype] <noident>
   sound/core/pcm_native.c:816:38: warning: incorrect type in argument 2 (different base types)
   sound/core/pcm_native.c:816:38:    expected int [signed] state
   sound/core/pcm_native.c:816:38:    got restricted snd_pcm_state_t [usertype] <noident>
   sound/core/pcm_native.c:1226:32: warning: incorrect type in assignment (different base types)
   sound/core/pcm_native.c:1226:32:    expected restricted snd_pcm_state_t [usertype] state
   sound/core/pcm_native.c:1226:32:    got int [signed] state
   sound/core/pcm_native.c:1250:31: warning: incorrect type in argument 3 (different base types)
   sound/core/pcm_native.c:1250:31:    expected int [signed] state
   sound/core/pcm_native.c:1250:31:    got restricted snd_pcm_state_t [usertype] <noident>
   sound/core/pcm_native.c:1257:40: warning: incorrect type in argument 3 (different base types)
   sound/core/pcm_native.c:1257:40:    expected int [signed] state
   sound/core/pcm_native.c:1257:40:    got restricted snd_pcm_state_t [usertype] <noident>
   sound/core/pcm_native.c:1283:28: warning: restricted snd_pcm_state_t degrades to integer
   sound/core/pcm_native.c:1285:40: warning: incorrect type in assignment (different base types)
   sound/core/pcm_native.c:1285:40:    expected restricted snd_pcm_state_t [usertype] state
   sound/core/pcm_native.c:1285:40:    got int [signed] state
   sound/core/pcm_native.c:1309:64: warning: incorrect type in argument 3 (different base types)
   sound/core/pcm_native.c:1309:64:    expected int [signed] state
   sound/core/pcm_native.c:1309:64:    got restricted snd_pcm_state_t [usertype] state
   sound/core/pcm_native.c:1325:38: warning: incorrect type in argument 3 (different base types)
   sound/core/pcm_native.c:1325:38:    expected int [signed] state
   sound/core/pcm_native.c:1325:38:    got restricted snd_pcm_state_t [usertype] <noident>
   sound/core/pcm_native.c:1684:38: warning: incorrect type in argument 2 (different base types)
   sound/core/pcm_native.c:1684:38:    expected int [signed] state
   sound/core/pcm_native.c:1684:38:    got restricted snd_pcm_state_t [usertype] <noident>
   sound/core/pcm_native.c:1750:61: warning: incorrect type in argument 2 (different base types)
   sound/core/pcm_native.c:1750:61:    expected int [signed] state
   sound/core/pcm_native.c:1750:61:    got restricted snd_pcm_state_t [usertype] <noident>
   sound/core/pcm_native.c:1751:63: warning: incorrect type in argument 2 (different base types)
   sound/core/pcm_native.c:1751:63:    expected int [signed] state
   sound/core/pcm_native.c:1751:63:    got restricted snd_pcm_state_t [usertype] <noident>
   sound/core/pcm_native.c:1768:76: warning: incorrect type in initializer (different base types)
   sound/core/pcm_native.c:1768:76:    expected int [signed] new_state
   sound/core/pcm_native.c:1768:76:    got restricted snd_pcm_state_t
   include/linux/slab.h:332:43: warning: dubious: x & !y
>> sound/core/pcm_native.c:99:17: error: undefined identifier 'msleep'
>> sound/core/pcm_native.c:99:23: error: not a function <noident>
   sound/core/pcm_native.c:2089:26: warning: restricted snd_pcm_format_t degrades to integer
   sound/core/pcm_native.c:2093:54: warning: incorrect type in argument 1 (different base types)
   sound/core/pcm_native.c:2093:54:    expected restricted snd_pcm_format_t [usertype] format
   sound/core/pcm_native.c:2093:54:    got unsigned int [unsigned] [assigned] k
   sound/core/pcm_native.c:2111:26: warning: restricted snd_pcm_format_t degrades to integer
   sound/core/pcm_native.c:2115:54: warning: incorrect type in argument 1 (different base types)
   sound/core/pcm_native.c:2115:54:    expected restricted snd_pcm_format_t [usertype] format
   sound/core/pcm_native.c:2115:54:    got unsigned int [unsigned] [assigned] k
   sound/core/pcm_native.c:2295:30: warning: restricted snd_pcm_access_t degrades to integer
   sound/core/pcm_native.c:2297:30: warning: restricted snd_pcm_access_t degrades to integer
   sound/core/pcm_native.c:2300:38: warning: restricted snd_pcm_access_t degrades to integer
   sound/core/pcm_native.c:2302:38: warning: restricted snd_pcm_access_t degrades to integer
   sound/core/pcm_native.c:2304:38: warning: restricted snd_pcm_access_t degrades to integer
   sound/core/pcm_native.c:2314:86: warning: restricted snd_pcm_subformat_t degrades to integer
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   sound/core/pcm_compat.c:226:13: warning: incorrect type in assignment (different base types)
   sound/core/pcm_compat.c:226:13:    expected signed int [signed] [explicitly-signed] __pu_val
   sound/core/pcm_compat.c:226:13:    got restricted snd_pcm_state_t [addressable] [assigned] [usertype] state
   sound/core/pcm_compat.c:235:13: warning: incorrect type in assignment (different base types)
   sound/core/pcm_compat.c:235:13:    expected signed int [signed] [explicitly-signed] __pu_val
   sound/core/pcm_compat.c:235:13:    got restricted snd_pcm_state_t [addressable] [assigned] [usertype] suspended_state
   sound/core/pcm_compat.c:290:13: warning: incorrect type in assignment (different base types)
   sound/core/pcm_compat.c:290:13:    expected signed int [signed] [explicitly-signed] __pu_val
   sound/core/pcm_compat.c:290:13:    got restricted snd_pcm_state_t [addressable] [assigned] [usertype] state
   sound/core/pcm_compat.c:299:13: warning: incorrect type in assignment (different base types)
   sound/core/pcm_compat.c:299:13:    expected signed int [signed] [explicitly-signed] __pu_val
   sound/core/pcm_compat.c:299:13:    got restricted snd_pcm_state_t [addressable] [assigned] [usertype] suspended_state
   include/linux/slab.h:332:43: warning: dubious: x & !y
   include/linux/slab.h:332:43: warning: dubious: x & !y
   sound/core/pcm_compat.c:525:13: warning: incorrect type in assignment (different base types)
   sound/core/pcm_compat.c:525:13:    expected signed int [signed] [explicitly-signed] __pu_val
   sound/core/pcm_compat.c:525:13:    got restricted snd_pcm_state_t [assigned] [usertype] state
   sound/core/pcm_compat.c:528:13: warning: incorrect type in assignment (different base types)
   sound/core/pcm_compat.c:528:13:    expected signed int [signed] [explicitly-signed] __pu_val
   sound/core/pcm_compat.c:528:13:    got restricted snd_pcm_state_t [assigned] [usertype] suspended_state
   sound/core/pcm_compat.c:614:13: warning: incorrect type in assignment (different base types)
   sound/core/pcm_compat.c:614:13:    expected signed int [signed] [explicitly-signed] __pu_val
   sound/core/pcm_compat.c:614:13:    got restricted snd_pcm_state_t [assigned] [usertype] state
   sound/core/pcm_compat.c:617:13: warning: incorrect type in assignment (different base types)
   sound/core/pcm_compat.c:617:13:    expected signed int [signed] [explicitly-signed] __pu_val
   sound/core/pcm_compat.c:617:13:    got restricted snd_pcm_state_t [assigned] [usertype] suspended_state
   sound/core/pcm_native.c:127:9: warning: context imbalance in '__snd_pcm_stream_lock_mode' - different lock contexts for basic block
   sound/core/pcm_native.c:137:28: warning: context imbalance in '__snd_pcm_stream_unlock_mode' - unexpected unlock
   sound/core/pcm_native.c:1097:52: warning: context imbalance in 'snd_pcm_action_group' - unexpected unlock
   sound/core/pcm_native.c: In function 'down_write_nonblock':
   sound/core/pcm_native.c:99:3: error: implicit declaration of function 'msleep' [-Werror=implicit-function-declaration]
      msleep(1);
      ^~~~~~
   cc1: some warnings being treated as errors

vim +/msleep +99 sound/core/pcm_native.c

    89	
    90	/* Writer in rwsem may block readers even during its waiting in queue,
    91	 * and this may lead to a deadlock when the code path takes read sem
    92	 * twice (e.g. one in snd_pcm_action_nonatomic() and another in
    93	 * snd_pcm_stream_lock()).  As a (suboptimal) workaround, let writer to
    94	 * spin until it gets the lock.
    95	 */
    96	static inline void down_write_nonblock(struct rw_semaphore *lock)
    97	{
    98		while (!down_write_trylock(lock))
  > 99			msleep(1);
   100	}
   101	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 66605 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
  2018-11-24  7:32 Chanho Min
@ 2018-11-24 10:56 ` kbuild test robot
  2018-11-25 11:30 ` kbuild test robot
  1 sibling, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2018-11-24 10:56 UTC (permalink / raw)
  To: Chanho Min
  Cc: kbuild-all, Jaroslav Kysela, Takashi Iwai, Vinod Koul,
	Daniel Mentz, linux-kernel, alsa-devel, Seungho Park,
	Jongsung Kim, Wonmin Jung, Jaehyun Kim, Hyonwoo Park, Chanho Min

[-- Attachment #1: Type: text/plain, Size: 1675 bytes --]

Hi Chanho,

I love your patch! Yet something to improve:

[auto build test ERROR on sound/for-next]
[also build test ERROR on v4.20-rc3 next-20181123]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Chanho-Min/ALSA-pcm-Fix-starvation-on-down_write_nonblock/20181124-182630
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git for-next
config: x86_64-randconfig-x001-201846 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   sound/core/pcm_native.c: In function 'down_write_nonblock':
>> sound/core/pcm_native.c:99:3: error: implicit declaration of function 'msleep' [-Werror=implicit-function-declaration]
      msleep(1);
      ^~~~~~
   cc1: some warnings being treated as errors

vim +/msleep +99 sound/core/pcm_native.c

    89	
    90	/* Writer in rwsem may block readers even during its waiting in queue,
    91	 * and this may lead to a deadlock when the code path takes read sem
    92	 * twice (e.g. one in snd_pcm_action_nonatomic() and another in
    93	 * snd_pcm_stream_lock()).  As a (suboptimal) workaround, let writer to
    94	 * spin until it gets the lock.
    95	 */
    96	static inline void down_write_nonblock(struct rw_semaphore *lock)
    97	{
    98		while (!down_write_trylock(lock))
  > 99			msleep(1);
   100	}
   101	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26749 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock()
@ 2018-11-24  7:32 Chanho Min
  2018-11-24 10:56 ` kbuild test robot
  2018-11-25 11:30 ` kbuild test robot
  0 siblings, 2 replies; 11+ messages in thread
From: Chanho Min @ 2018-11-24  7:32 UTC (permalink / raw)
  To: Jaroslav Kysela, Takashi Iwai, Vinod Koul, Daniel Mentz
  Cc: linux-kernel, alsa-devel, Seungho Park, Jongsung Kim,
	Wonmin Jung, Jaehyun Kim, Hyonwoo Park, Chanho Min

Commit 67ec1072b053 (ALSA: pcm: Fix rwsem deadlock for non-atomic PCM stream)
fixes deadlock for non-atomic PCM stream. But, This patch causes antother stuck.
If writer is RT thread and reader is a normal thread, the reader thread will
be difficult to get scheduled. It may not give chance to release read locks
and writer gets stuck for a long time or forever if they are pinned to single
cpu.

To fix this, The writer gives reader a chance to be scheduled by using the
minimum msleep() instaed of spinning. This is for concept, We may need to
change the function name and comments or suggest another approach.

Signed-off-by: Chanho Min <chanho.min@lge.com>
---
 sound/core/pcm_native.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 66c90f4..88d4aab 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -96,7 +96,7 @@ static DECLARE_RWSEM(snd_pcm_link_rwsem);
 static inline void down_write_nonblock(struct rw_semaphore *lock)
 {
 	while (!down_write_trylock(lock))
-		cond_resched();
+		msleep(1);
 }
 
 #define PCM_LOCK_DEFAULT	0
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-11-29 23:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-26  5:36 [PATCH] ALSA: pcm: Fix starvation on down_write_nonblock() Chanho Min
2018-11-26  8:36 ` Takashi Iwai
2018-11-27 23:47   ` Chanho Min
2018-11-28  8:37     ` Takashi Iwai
2018-11-28 23:48       ` Chanho Min
2018-11-29  7:19         ` Takashi Iwai
2018-11-29 23:03           ` Chanho Min
2018-11-28  6:26   ` Chanho Min
  -- strict thread matches above, loose matches on Subject: below --
2018-11-24  7:32 Chanho Min
2018-11-24 10:56 ` kbuild test robot
2018-11-25 11:30 ` kbuild test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).