linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
@ 2019-11-12 17:17 paulhsia
  2019-11-12 17:17 ` [PATCH 1/2] ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed() paulhsia
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: paulhsia @ 2019-11-12 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mark Brown, Takashi Iwai, alsa-devel, paulhsia

Since
- snd_pcm_detach_substream sets runtime to null without stream lock and
- snd_pcm_period_elapsed checks the nullity of the runtime outside of
  stream lock.

This will trigger null memory access in snd_pcm_running() call in
snd_pcm_period_elapsed.

paulhsia (2):
  ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed()
  ALSA: pcm: Use stream lock in snd_pcm_detach_substream()

 sound/core/pcm.c     | 8 +++++++-
 sound/core/pcm_lib.c | 8 ++++++--
 2 files changed, 13 insertions(+), 3 deletions(-)

-- 
2.24.0.rc1.363.gb1bccd3e3d-goog


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed()
  2019-11-12 17:17 [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access paulhsia
@ 2019-11-12 17:17 ` paulhsia
  2019-11-12 17:17 ` [PATCH 2/2] ALSA: pcm: Use stream lock in snd_pcm_detach_substream() paulhsia
  2019-11-12 18:16 ` [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access Takashi Iwai
  2 siblings, 0 replies; 14+ messages in thread
From: paulhsia @ 2019-11-12 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mark Brown, Takashi Iwai, alsa-devel, paulhsia

If the nullity check for `substream->runtime` is outside of the lock
region, it is possible to have a null runtime in the critical section
if snd_pcm_detach_substream is called right before the lock.

Signed-off-by: paulhsia <paulhsia@chromium.org>
---
 sound/core/pcm_lib.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/sound/core/pcm_lib.c b/sound/core/pcm_lib.c
index d80041ea4e01..2236b5e0c1f2 100644
--- a/sound/core/pcm_lib.c
+++ b/sound/core/pcm_lib.c
@@ -1782,11 +1782,14 @@ void snd_pcm_period_elapsed(struct snd_pcm_substream *substream)
 	struct snd_pcm_runtime *runtime;
 	unsigned long flags;
 
-	if (PCM_RUNTIME_CHECK(substream))
+	if (snd_BUG_ON(!substream))
 		return;
-	runtime = substream->runtime;
 
 	snd_pcm_stream_lock_irqsave(substream, flags);
+	if (PCM_RUNTIME_CHECK(substream))
+		goto _unlock;
+	runtime = substream->runtime;
+
 	if (!snd_pcm_running(substream) ||
 	    snd_pcm_update_hw_ptr0(substream, 1) < 0)
 		goto _end;
@@ -1797,6 +1800,7 @@ void snd_pcm_period_elapsed(struct snd_pcm_substream *substream)
 #endif
  _end:
 	kill_fasync(&runtime->fasync, SIGIO, POLL_IN);
+ _unlock:
 	snd_pcm_stream_unlock_irqrestore(substream, flags);
 }
 EXPORT_SYMBOL(snd_pcm_period_elapsed);
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] ALSA: pcm: Use stream lock in snd_pcm_detach_substream()
  2019-11-12 17:17 [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access paulhsia
  2019-11-12 17:17 ` [PATCH 1/2] ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed() paulhsia
@ 2019-11-12 17:17 ` paulhsia
  2019-11-12 18:16 ` [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access Takashi Iwai
  2 siblings, 0 replies; 14+ messages in thread
From: paulhsia @ 2019-11-12 17:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mark Brown, Takashi Iwai, alsa-devel, paulhsia

Since snd_pcm_detach_substream modifies the input substream, the
function should use stream lock to protect the modification section.

Signed-off-by: paulhsia <paulhsia@chromium.org>
---
 sound/core/pcm.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/sound/core/pcm.c b/sound/core/pcm.c
index 9a72d641743d..06da9cb8984a 100644
--- a/sound/core/pcm.c
+++ b/sound/core/pcm.c
@@ -980,8 +980,12 @@ void snd_pcm_detach_substream(struct snd_pcm_substream *substream)
 {
 	struct snd_pcm_runtime *runtime;
 
-	if (PCM_RUNTIME_CHECK(substream))
+	if (snd_BUG_ON(!substream))
 		return;
+
+	snd_pcm_stream_lock_irq(substream);
+	if (PCM_RUNTIME_CHECK(substream))
+		goto unlock;
 	runtime = substream->runtime;
 	if (runtime->private_free != NULL)
 		runtime->private_free(runtime);
@@ -1000,6 +1004,8 @@ void snd_pcm_detach_substream(struct snd_pcm_substream *substream)
 	put_pid(substream->pid);
 	substream->pid = NULL;
 	substream->pstr->substream_opened--;
+ unlock:
+	snd_pcm_stream_unlock_irq(substream);
 }
 
 static ssize_t show_pcm_class(struct device *dev,
-- 
2.24.0.rc1.363.gb1bccd3e3d-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-12 17:17 [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access paulhsia
  2019-11-12 17:17 ` [PATCH 1/2] ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed() paulhsia
  2019-11-12 17:17 ` [PATCH 2/2] ALSA: pcm: Use stream lock in snd_pcm_detach_substream() paulhsia
@ 2019-11-12 18:16 ` Takashi Iwai
  2019-11-13  7:24   ` Chih-Yang Hsia
  2 siblings, 1 reply; 14+ messages in thread
From: Takashi Iwai @ 2019-11-12 18:16 UTC (permalink / raw)
  To: paulhsia; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Tue, 12 Nov 2019 18:17:13 +0100,
paulhsia wrote:
> 
> Since
> - snd_pcm_detach_substream sets runtime to null without stream lock and
> - snd_pcm_period_elapsed checks the nullity of the runtime outside of
>   stream lock.
> 
> This will trigger null memory access in snd_pcm_running() call in
> snd_pcm_period_elapsed.

Well, if a stream is detached, it means that the stream must have been
already closed; i.e. it's already a clear bug in the driver that
snd_pcm_period_elapsed() is called against such a stream.

Or am I missing other possible case?


thanks,

Takashi

> 
> paulhsia (2):
>   ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed()
>   ALSA: pcm: Use stream lock in snd_pcm_detach_substream()
> 
>  sound/core/pcm.c     | 8 +++++++-
>  sound/core/pcm_lib.c | 8 ++++++--
>  2 files changed, 13 insertions(+), 3 deletions(-)
> 
> -- 
> 2.24.0.rc1.363.gb1bccd3e3d-goog
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-12 18:16 ` [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access Takashi Iwai
@ 2019-11-13  7:24   ` Chih-Yang Hsia
  2019-11-13  9:47     ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Chih-Yang Hsia @ 2019-11-13  7:24 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
>
> On Tue, 12 Nov 2019 18:17:13 +0100,
> paulhsia wrote:
> >
> > Since
> > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> >   stream lock.
> >
> > This will trigger null memory access in snd_pcm_running() call in
> > snd_pcm_period_elapsed.
>
> Well, if a stream is detached, it means that the stream must have been
> already closed; i.e. it's already a clear bug in the driver that
> snd_pcm_period_elapsed() is called against such a stream.
>
> Or am I missing other possible case?
>
>
> thanks,
>
> Takashi
>

In multithreaded environment, it is possible to have to access both
`interrupt_handler` (from irq) and `substream close` (from
snd_pcm_release) at the same time.
Therefore, in driver implementation, if "substream close function" and
the "code section where snd_pcm_period_elapsed() in" do not hold the
same lock, then the following things can happen:

1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
sustream pointer
2. snd_pcm_release_substream: call close without blocking
3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
substream->runtime to NULL
4. interrupt_handler -> call snd_pcm_runtime() and crash while
accessing fields in `substream->runtime`

e.g. In intel8x0.c driver for ac97 device,
In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
checking `ichdev->substream` in `snd_intel8x0_update`.
And if a `snd_pcm_release` call from alsa-lib and pass through close()
and run to snd_pcm_detach_substream() in another thread, it's possible
to trigger a crash.
I can reproduce the issue within a multithread VM easily.

My patches are trying to provide a basic protection for this situation
(and internal pcm lock between detach and elapsed), since
- the usage of `snd_pcm_period_elapsed` does not warn callers about
the possible race if the driver does not  force the order for `calling
snd_pcm_period_elapsed` and `close` by lock and
- lots of drivers already have this hidden issue and I can't fix them
one by one (You can check the "snd_pcm_period_elapsed usage" and the
"close implementation" within all the drivers). The most common
mistake is that
  - Checking if the substream is null and call into snd_pcm_period_elapsed
  - But `close` can happen anytime, pass without block and
snd_pcm_detach_substream will be trigger right after it

Best,
Paul

> >
> > paulhsia (2):
> >   ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed()
> >   ALSA: pcm: Use stream lock in snd_pcm_detach_substream()
> >
> >  sound/core/pcm.c     | 8 +++++++-
> >  sound/core/pcm_lib.c | 8 ++++++--
> >  2 files changed, 13 insertions(+), 3 deletions(-)
> >
> > --
> > 2.24.0.rc1.363.gb1bccd3e3d-goog
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-13  7:24   ` Chih-Yang Hsia
@ 2019-11-13  9:47     ` Takashi Iwai
  2019-11-13 11:36       ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Takashi Iwai @ 2019-11-13  9:47 UTC (permalink / raw)
  To: Chih-Yang Hsia; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Wed, 13 Nov 2019 08:24:41 +0100,
Chih-Yang Hsia wrote:
> 
> On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
> >
> > On Tue, 12 Nov 2019 18:17:13 +0100,
> > paulhsia wrote:
> > >
> > > Since
> > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > >   stream lock.
> > >
> > > This will trigger null memory access in snd_pcm_running() call in
> > > snd_pcm_period_elapsed.
> >
> > Well, if a stream is detached, it means that the stream must have been
> > already closed; i.e. it's already a clear bug in the driver that
> > snd_pcm_period_elapsed() is called against such a stream.
> >
> > Or am I missing other possible case?
> >
> >
> > thanks,
> >
> > Takashi
> >
> 
> In multithreaded environment, it is possible to have to access both
> `interrupt_handler` (from irq) and `substream close` (from
> snd_pcm_release) at the same time.
> Therefore, in driver implementation, if "substream close function" and
> the "code section where snd_pcm_period_elapsed() in" do not hold the
> same lock, then the following things can happen:
> 
> 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> sustream pointer
> 2. snd_pcm_release_substream: call close without blocking
> 3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
> substream->runtime to NULL
> 4. interrupt_handler -> call snd_pcm_runtime() and crash while
> accessing fields in `substream->runtime`
> 
> e.g. In intel8x0.c driver for ac97 device,
> In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
> checking `ichdev->substream` in `snd_intel8x0_update`.
> And if a `snd_pcm_release` call from alsa-lib and pass through close()
> and run to snd_pcm_detach_substream() in another thread, it's possible
> to trigger a crash.
> I can reproduce the issue within a multithread VM easily.
> 
> My patches are trying to provide a basic protection for this situation
> (and internal pcm lock between detach and elapsed), since
> - the usage of `snd_pcm_period_elapsed` does not warn callers about
> the possible race if the driver does not  force the order for `calling
> snd_pcm_period_elapsed` and `close` by lock and
> - lots of drivers already have this hidden issue and I can't fix them
> one by one (You can check the "snd_pcm_period_elapsed usage" and the
> "close implementation" within all the drivers). The most common
> mistake is that
>   - Checking if the substream is null and call into snd_pcm_period_elapsed
>   - But `close` can happen anytime, pass without block and
> snd_pcm_detach_substream will be trigger right after it

Thanks, point taken.  While this argument is valid and it's good to
harden the PCM core side, the concurrent calls are basically a bug,
and we'd need another fix in anyway.  Also, the patch 2 makes little
sense; there can't be multiple close calls racing with each other.  So
I'll go for taking your fix but only the first patch.

Back to this race: the surfaced issue is, as you pointed out, the race
between snd_pcm_period_elapsed() vs close call.  However, the
fundamental problem is the pending action after the PCM trigger-stop
call.  Since the PCM trigger doesn't block nor wait until the hardware
actually stops the things, the driver may go to the other step even
after this "supposed-to-be-stopped" point.  In your case, it goes up
to close, and crashes.  If we had a sync-stop operation, the interrupt
handler should have finished before moving to the close stage, hence
such a race could be avoided.

It's been a long known problem, and some drivers have the own
implementation for stop-sync.  I think it's time to investigate and
start implementing the fundamental solution.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-13  9:47     ` Takashi Iwai
@ 2019-11-13 11:36       ` Takashi Iwai
  2019-11-14 14:16         ` Chih-Yang Hsia
  0 siblings, 1 reply; 14+ messages in thread
From: Takashi Iwai @ 2019-11-13 11:36 UTC (permalink / raw)
  To: Chih-Yang Hsia; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Wed, 13 Nov 2019 10:47:51 +0100,
Takashi Iwai wrote:
> 
> On Wed, 13 Nov 2019 08:24:41 +0100,
> Chih-Yang Hsia wrote:
> > 
> > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
> > >
> > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > paulhsia wrote:
> > > >
> > > > Since
> > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > >   stream lock.
> > > >
> > > > This will trigger null memory access in snd_pcm_running() call in
> > > > snd_pcm_period_elapsed.
> > >
> > > Well, if a stream is detached, it means that the stream must have been
> > > already closed; i.e. it's already a clear bug in the driver that
> > > snd_pcm_period_elapsed() is called against such a stream.
> > >
> > > Or am I missing other possible case?
> > >
> > >
> > > thanks,
> > >
> > > Takashi
> > >
> > 
> > In multithreaded environment, it is possible to have to access both
> > `interrupt_handler` (from irq) and `substream close` (from
> > snd_pcm_release) at the same time.
> > Therefore, in driver implementation, if "substream close function" and
> > the "code section where snd_pcm_period_elapsed() in" do not hold the
> > same lock, then the following things can happen:
> > 
> > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > sustream pointer
> > 2. snd_pcm_release_substream: call close without blocking
> > 3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
> > substream->runtime to NULL
> > 4. interrupt_handler -> call snd_pcm_runtime() and crash while
> > accessing fields in `substream->runtime`
> > 
> > e.g. In intel8x0.c driver for ac97 device,
> > In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
> > checking `ichdev->substream` in `snd_intel8x0_update`.
> > And if a `snd_pcm_release` call from alsa-lib and pass through close()
> > and run to snd_pcm_detach_substream() in another thread, it's possible
> > to trigger a crash.
> > I can reproduce the issue within a multithread VM easily.
> > 
> > My patches are trying to provide a basic protection for this situation
> > (and internal pcm lock between detach and elapsed), since
> > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > the possible race if the driver does not  force the order for `calling
> > snd_pcm_period_elapsed` and `close` by lock and
> > - lots of drivers already have this hidden issue and I can't fix them
> > one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > "close implementation" within all the drivers). The most common
> > mistake is that
> >   - Checking if the substream is null and call into snd_pcm_period_elapsed
> >   - But `close` can happen anytime, pass without block and
> > snd_pcm_detach_substream will be trigger right after it
> 
> Thanks, point taken.  While this argument is valid and it's good to
> harden the PCM core side, the concurrent calls are basically a bug,
> and we'd need another fix in anyway.  Also, the patch 2 makes little
> sense; there can't be multiple close calls racing with each other.  So
> I'll go for taking your fix but only the first patch.
> 
> Back to this race: the surfaced issue is, as you pointed out, the race
> between snd_pcm_period_elapsed() vs close call.  However, the
> fundamental problem is the pending action after the PCM trigger-stop
> call.  Since the PCM trigger doesn't block nor wait until the hardware
> actually stops the things, the driver may go to the other step even
> after this "supposed-to-be-stopped" point.  In your case, it goes up
> to close, and crashes.  If we had a sync-stop operation, the interrupt
> handler should have finished before moving to the close stage, hence
> such a race could be avoided.
> 
> It's been a long known problem, and some drivers have the own
> implementation for stop-sync.  I think it's time to investigate and
> start implementing the fundamental solution.

BTW, what we need essentially for intel8x0 is to just call
synchronize_irq() before closing, at best in hw_free procedure:

--- a/sound/pci/intel8x0.c
+++ b/sound/pci/intel8x0.c
@@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
 
 static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
 {
+	struct intel8x0 *chip = snd_pcm_substream_chip(substream);
 	struct ichdev *ichdev = get_ichdev(substream);
 
+	synchronize_irq(chip->irq);
 	if (ichdev->pcm_open_flag) {
 		snd_ac97_pcm_close(ichdev->pcm);
 		ichdev->pcm_open_flag = 0;


The same would be needed also at the beginning of the prepare, as the
application may restart the stream without release.

My idea is to add sync_stop PCM ops and call it from PCM core at
snd_pcm_prepare() and snd_pcm_hw_free().


thanks,

Takashi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-13 11:36       ` Takashi Iwai
@ 2019-11-14 14:16         ` Chih-Yang Hsia
  2019-11-14 14:20           ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Chih-Yang Hsia @ 2019-11-14 14:16 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Wed, Nov 13, 2019 at 7:36 PM Takashi Iwai <tiwai@suse.de> wrote:
>
> On Wed, 13 Nov 2019 10:47:51 +0100,
> Takashi Iwai wrote:
> >
> > On Wed, 13 Nov 2019 08:24:41 +0100,
> > Chih-Yang Hsia wrote:
> > >
> > > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
> > > >
> > > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > > paulhsia wrote:
> > > > >
> > > > > Since
> > > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > > >   stream lock.
> > > > >
> > > > > This will trigger null memory access in snd_pcm_running() call in
> > > > > snd_pcm_period_elapsed.
> > > >
> > > > Well, if a stream is detached, it means that the stream must have been
> > > > already closed; i.e. it's already a clear bug in the driver that
> > > > snd_pcm_period_elapsed() is called against such a stream.
> > > >
> > > > Or am I missing other possible case?
> > > >
> > > >
> > > > thanks,
> > > >
> > > > Takashi
> > > >
> > >
> > > In multithreaded environment, it is possible to have to access both
> > > `interrupt_handler` (from irq) and `substream close` (from
> > > snd_pcm_release) at the same time.
> > > Therefore, in driver implementation, if "substream close function" and
> > > the "code section where snd_pcm_period_elapsed() in" do not hold the
> > > same lock, then the following things can happen:
> > >
> > > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > > sustream pointer
> > > 2. snd_pcm_release_substream: call close without blocking
> > > 3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
> > > substream->runtime to NULL
> > > 4. interrupt_handler -> call snd_pcm_runtime() and crash while
> > > accessing fields in `substream->runtime`
> > >
> > > e.g. In intel8x0.c driver for ac97 device,
> > > In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
> > > checking `ichdev->substream` in `snd_intel8x0_update`.
> > > And if a `snd_pcm_release` call from alsa-lib and pass through close()
> > > and run to snd_pcm_detach_substream() in another thread, it's possible
> > > to trigger a crash.
> > > I can reproduce the issue within a multithread VM easily.
> > >
> > > My patches are trying to provide a basic protection for this situation
> > > (and internal pcm lock between detach and elapsed), since
> > > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > > the possible race if the driver does not  force the order for `calling
> > > snd_pcm_period_elapsed` and `close` by lock and
> > > - lots of drivers already have this hidden issue and I can't fix them
> > > one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > > "close implementation" within all the drivers). The most common
> > > mistake is that
> > >   - Checking if the substream is null and call into snd_pcm_period_elapsed
> > >   - But `close` can happen anytime, pass without block and
> > > snd_pcm_detach_substream will be trigger right after it
> >
> > Thanks, point taken.  While this argument is valid and it's good to
> > harden the PCM core side, the concurrent calls are basically a bug,
> > and we'd need another fix in anyway.  Also, the patch 2 makes little
> > sense; there can't be multiple close calls racing with each other.  So
> > I'll go for taking your fix but only the first patch.
> >
> > Back to this race: the surfaced issue is, as you pointed out, the race
> > between snd_pcm_period_elapsed() vs close call.  However, the
> > fundamental problem is the pending action after the PCM trigger-stop
> > call.  Since the PCM trigger doesn't block nor wait until the hardware
> > actually stops the things, the driver may go to the other step even
> > after this "supposed-to-be-stopped" point.  In your case, it goes up
> > to close, and crashes.  If we had a sync-stop operation, the interrupt
> > handler should have finished before moving to the close stage, hence
> > such a race could be avoided.
> >
> > It's been a long known problem, and some drivers have the own
> > implementation for stop-sync.  I think it's time to investigate and
> > start implementing the fundamental solution.
>
> BTW, what we need essentially for intel8x0 is to just call
> synchronize_irq() before closing, at best in hw_free procedure:
>
> --- a/sound/pci/intel8x0.c
> +++ b/sound/pci/intel8x0.c
> @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
>
>  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
>  {
> +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
>         struct ichdev *ichdev = get_ichdev(substream);
>
> +       synchronize_irq(chip->irq);
>         if (ichdev->pcm_open_flag) {
>                 snd_ac97_pcm_close(ichdev->pcm);
>                 ichdev->pcm_open_flag = 0;
>
>
> The same would be needed also at the beginning of the prepare, as the
> application may restart the stream without release.
>
> My idea is to add sync_stop PCM ops and call it from PCM core at
> snd_pcm_prepare() and snd_pcm_hw_free().
>
Will adding synchronize_irq() in snd_pcm_hw_free there fix the race issue?
Is it possible to have sequence like the following steps ?
- [Thread 1] snd_pcm_hw_free: just pass synchronize_irq()
- [Thread 2] another interrupt come -> snd_intel8x0_update() -> goes
into the lock region of snd_pcm_period_elapsed() and passes the
PCM_RUNTIME_CHECK (right before snd_pcm_running())
- [Thread 1] snd_pcm_hw_free finished() -> snd_pcm_detach_substream()
-> runtime=NULL
- [Thread 2] Execute snd_pcm_running and crash

I can't trigger the issue after adding the synchronize_irq(), but
maybe it's just luck. Correct my if I miss something.

Thanks,
Paul




>
> thanks,
>
> Takashi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-14 14:16         ` Chih-Yang Hsia
@ 2019-11-14 14:20           ` Takashi Iwai
  2019-11-14 16:37             ` Chih-Yang Hsia
  0 siblings, 1 reply; 14+ messages in thread
From: Takashi Iwai @ 2019-11-14 14:20 UTC (permalink / raw)
  To: Chih-Yang Hsia; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Thu, 14 Nov 2019 15:16:04 +0100,
Chih-Yang Hsia wrote:
> 
> On Wed, Nov 13, 2019 at 7:36 PM Takashi Iwai <tiwai@suse.de> wrote:
> >
> > On Wed, 13 Nov 2019 10:47:51 +0100,
> > Takashi Iwai wrote:
> > >
> > > On Wed, 13 Nov 2019 08:24:41 +0100,
> > > Chih-Yang Hsia wrote:
> > > >
> > > > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
> > > > >
> > > > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > > > paulhsia wrote:
> > > > > >
> > > > > > Since
> > > > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > > > >   stream lock.
> > > > > >
> > > > > > This will trigger null memory access in snd_pcm_running() call in
> > > > > > snd_pcm_period_elapsed.
> > > > >
> > > > > Well, if a stream is detached, it means that the stream must have been
> > > > > already closed; i.e. it's already a clear bug in the driver that
> > > > > snd_pcm_period_elapsed() is called against such a stream.
> > > > >
> > > > > Or am I missing other possible case?
> > > > >
> > > > >
> > > > > thanks,
> > > > >
> > > > > Takashi
> > > > >
> > > >
> > > > In multithreaded environment, it is possible to have to access both
> > > > `interrupt_handler` (from irq) and `substream close` (from
> > > > snd_pcm_release) at the same time.
> > > > Therefore, in driver implementation, if "substream close function" and
> > > > the "code section where snd_pcm_period_elapsed() in" do not hold the
> > > > same lock, then the following things can happen:
> > > >
> > > > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > > > sustream pointer
> > > > 2. snd_pcm_release_substream: call close without blocking
> > > > 3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
> > > > substream->runtime to NULL
> > > > 4. interrupt_handler -> call snd_pcm_runtime() and crash while
> > > > accessing fields in `substream->runtime`
> > > >
> > > > e.g. In intel8x0.c driver for ac97 device,
> > > > In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
> > > > checking `ichdev->substream` in `snd_intel8x0_update`.
> > > > And if a `snd_pcm_release` call from alsa-lib and pass through close()
> > > > and run to snd_pcm_detach_substream() in another thread, it's possible
> > > > to trigger a crash.
> > > > I can reproduce the issue within a multithread VM easily.
> > > >
> > > > My patches are trying to provide a basic protection for this situation
> > > > (and internal pcm lock between detach and elapsed), since
> > > > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > > > the possible race if the driver does not  force the order for `calling
> > > > snd_pcm_period_elapsed` and `close` by lock and
> > > > - lots of drivers already have this hidden issue and I can't fix them
> > > > one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > > > "close implementation" within all the drivers). The most common
> > > > mistake is that
> > > >   - Checking if the substream is null and call into snd_pcm_period_elapsed
> > > >   - But `close` can happen anytime, pass without block and
> > > > snd_pcm_detach_substream will be trigger right after it
> > >
> > > Thanks, point taken.  While this argument is valid and it's good to
> > > harden the PCM core side, the concurrent calls are basically a bug,
> > > and we'd need another fix in anyway.  Also, the patch 2 makes little
> > > sense; there can't be multiple close calls racing with each other.  So
> > > I'll go for taking your fix but only the first patch.
> > >
> > > Back to this race: the surfaced issue is, as you pointed out, the race
> > > between snd_pcm_period_elapsed() vs close call.  However, the
> > > fundamental problem is the pending action after the PCM trigger-stop
> > > call.  Since the PCM trigger doesn't block nor wait until the hardware
> > > actually stops the things, the driver may go to the other step even
> > > after this "supposed-to-be-stopped" point.  In your case, it goes up
> > > to close, and crashes.  If we had a sync-stop operation, the interrupt
> > > handler should have finished before moving to the close stage, hence
> > > such a race could be avoided.
> > >
> > > It's been a long known problem, and some drivers have the own
> > > implementation for stop-sync.  I think it's time to investigate and
> > > start implementing the fundamental solution.
> >
> > BTW, what we need essentially for intel8x0 is to just call
> > synchronize_irq() before closing, at best in hw_free procedure:
> >
> > --- a/sound/pci/intel8x0.c
> > +++ b/sound/pci/intel8x0.c
> > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
> >
> >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> >  {
> > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> >         struct ichdev *ichdev = get_ichdev(substream);
> >
> > +       synchronize_irq(chip->irq);
> >         if (ichdev->pcm_open_flag) {
> >                 snd_ac97_pcm_close(ichdev->pcm);
> >                 ichdev->pcm_open_flag = 0;
> >
> >
> > The same would be needed also at the beginning of the prepare, as the
> > application may restart the stream without release.
> >
> > My idea is to add sync_stop PCM ops and call it from PCM core at
> > snd_pcm_prepare() and snd_pcm_hw_free().
> >
> Will adding synchronize_irq() in snd_pcm_hw_free there fix the race issue?
> Is it possible to have sequence like the following steps ?
> - [Thread 1] snd_pcm_hw_free: just pass synchronize_irq()
> - [Thread 2] another interrupt come -> snd_intel8x0_update() -> goes
> into the lock region of snd_pcm_period_elapsed() and passes the
> PCM_RUNTIME_CHECK (right before snd_pcm_running())

This shouldn't happen because at the point snd_pcm_hw_free() the
stream has been already in the SETUP state, i.e. with trigger PCM
callback, the hardware has been programmed not to generate the PCM
stream IRQ.


Takashi


> - [Thread 1] snd_pcm_hw_free finished() -> snd_pcm_detach_substream()
> -> runtime=NULL
> - [Thread 2] Execute snd_pcm_running and crash
> 
> I can't trigger the issue after adding the synchronize_irq(), but
> maybe it's just luck. Correct my if I miss something.
> 
> Thanks,
> Paul
> 
> 
> 
> 
> >
> > thanks,
> >
> > Takashi
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-14 14:20           ` Takashi Iwai
@ 2019-11-14 16:37             ` Chih-Yang Hsia
  2019-11-14 17:00               ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Chih-Yang Hsia @ 2019-11-14 16:37 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Thu, Nov 14, 2019 at 10:20 PM Takashi Iwai <tiwai@suse.de> wrote:
>
> On Thu, 14 Nov 2019 15:16:04 +0100,
> Chih-Yang Hsia wrote:
> >
> > On Wed, Nov 13, 2019 at 7:36 PM Takashi Iwai <tiwai@suse.de> wrote:
> > >
> > > On Wed, 13 Nov 2019 10:47:51 +0100,
> > > Takashi Iwai wrote:
> > > >
> > > > On Wed, 13 Nov 2019 08:24:41 +0100,
> > > > Chih-Yang Hsia wrote:
> > > > >
> > > > > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
> > > > > >
> > > > > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > > > > paulhsia wrote:
> > > > > > >
> > > > > > > Since
> > > > > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > > > > >   stream lock.
> > > > > > >
> > > > > > > This will trigger null memory access in snd_pcm_running() call in
> > > > > > > snd_pcm_period_elapsed.
> > > > > >
> > > > > > Well, if a stream is detached, it means that the stream must have been
> > > > > > already closed; i.e. it's already a clear bug in the driver that
> > > > > > snd_pcm_period_elapsed() is called against such a stream.
> > > > > >
> > > > > > Or am I missing other possible case?
> > > > > >
> > > > > >
> > > > > > thanks,
> > > > > >
> > > > > > Takashi
> > > > > >
> > > > >
> > > > > In multithreaded environment, it is possible to have to access both
> > > > > `interrupt_handler` (from irq) and `substream close` (from
> > > > > snd_pcm_release) at the same time.
> > > > > Therefore, in driver implementation, if "substream close function" and
> > > > > the "code section where snd_pcm_period_elapsed() in" do not hold the
> > > > > same lock, then the following things can happen:
> > > > >
> > > > > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > > > > sustream pointer
> > > > > 2. snd_pcm_release_substream: call close without blocking
> > > > > 3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
> > > > > substream->runtime to NULL
> > > > > 4. interrupt_handler -> call snd_pcm_runtime() and crash while
> > > > > accessing fields in `substream->runtime`
> > > > >
> > > > > e.g. In intel8x0.c driver for ac97 device,
> > > > > In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
> > > > > checking `ichdev->substream` in `snd_intel8x0_update`.
> > > > > And if a `snd_pcm_release` call from alsa-lib and pass through close()
> > > > > and run to snd_pcm_detach_substream() in another thread, it's possible
> > > > > to trigger a crash.
> > > > > I can reproduce the issue within a multithread VM easily.
> > > > >
> > > > > My patches are trying to provide a basic protection for this situation
> > > > > (and internal pcm lock between detach and elapsed), since
> > > > > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > > > > the possible race if the driver does not  force the order for `calling
> > > > > snd_pcm_period_elapsed` and `close` by lock and
> > > > > - lots of drivers already have this hidden issue and I can't fix them
> > > > > one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > > > > "close implementation" within all the drivers). The most common
> > > > > mistake is that
> > > > >   - Checking if the substream is null and call into snd_pcm_period_elapsed
> > > > >   - But `close` can happen anytime, pass without block and
> > > > > snd_pcm_detach_substream will be trigger right after it
> > > >
> > > > Thanks, point taken.  While this argument is valid and it's good to
> > > > harden the PCM core side, the concurrent calls are basically a bug,
> > > > and we'd need another fix in anyway.  Also, the patch 2 makes little
> > > > sense; there can't be multiple close calls racing with each other.  So
> > > > I'll go for taking your fix but only the first patch.
> > > >
> > > > Back to this race: the surfaced issue is, as you pointed out, the race
> > > > between snd_pcm_period_elapsed() vs close call.  However, the
> > > > fundamental problem is the pending action after the PCM trigger-stop
> > > > call.  Since the PCM trigger doesn't block nor wait until the hardware
> > > > actually stops the things, the driver may go to the other step even
> > > > after this "supposed-to-be-stopped" point.  In your case, it goes up
> > > > to close, and crashes.  If we had a sync-stop operation, the interrupt
> > > > handler should have finished before moving to the close stage, hence
> > > > such a race could be avoided.
> > > >
> > > > It's been a long known problem, and some drivers have the own
> > > > implementation for stop-sync.  I think it's time to investigate and
> > > > start implementing the fundamental solution.
> > >
> > > BTW, what we need essentially for intel8x0 is to just call
> > > synchronize_irq() before closing, at best in hw_free procedure:
> > >
> > > --- a/sound/pci/intel8x0.c
> > > +++ b/sound/pci/intel8x0.c
> > > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
> > >
> > >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> > >  {
> > > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> > >         struct ichdev *ichdev = get_ichdev(substream);
> > >
> > > +       synchronize_irq(chip->irq);
> > >         if (ichdev->pcm_open_flag) {
> > >                 snd_ac97_pcm_close(ichdev->pcm);
> > >                 ichdev->pcm_open_flag = 0;
> > >
> > >
> > > The same would be needed also at the beginning of the prepare, as the
> > > application may restart the stream without release.
> > >
> > > My idea is to add sync_stop PCM ops and call it from PCM core at
> > > snd_pcm_prepare() and snd_pcm_hw_free().
> > >
> > Will adding synchronize_irq() in snd_pcm_hw_free there fix the race issue?
> > Is it possible to have sequence like the following steps ?
> > - [Thread 1] snd_pcm_hw_free: just pass synchronize_irq()
> > - [Thread 2] another interrupt come -> snd_intel8x0_update() -> goes
> > into the lock region of snd_pcm_period_elapsed() and passes the
> > PCM_RUNTIME_CHECK (right before snd_pcm_running())
>
> This shouldn't happen because at the point snd_pcm_hw_free() the
> stream has been already in the SETUP state, i.e. with trigger PCM
> callback, the hardware has been programmed not to generate the PCM
> stream IRQ.
>
Thanks for pointing that out.
snd_pcm_drop() will be called right before accessing `opts->hw_free`
and device dma will be stopped by SNDRV_PCM_TRIGGER_STOP.
And snd_pcm_prepare() will be called when the device is not running.
So synchronize_irq() should be enough for both of them.

I have a patch like this now in intel8x0:

diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c
index 6ff94d8ad86e..728588937673 100644
--- a/sound/pci/intel8x0.c
+++ b/sound/pci/intel8x0.c
@@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct
snd_pcm_substream *substream,

 static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
 {
+       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
        struct ichdev *ichdev = get_ichdev(substream);

+       synchronize_irq(chip->irq);
        if (ichdev->pcm_open_flag) {
                snd_ac97_pcm_close(ichdev->pcm);
                ichdev->pcm_open_flag = 0;
@@ -993,6 +995,7 @@ static int snd_intel8x0_pcm_prepare(struct
snd_pcm_substream *substream)
        struct snd_pcm_runtime *runtime = substream->runtime;
        struct ichdev *ichdev = get_ichdev(substream);

+       synchronize_irq(chip->irq);
        ichdev->physbuf = runtime->dma_addr;
        ichdev->size = snd_pcm_lib_buffer_bytes(substream);
        ichdev->fragsize = snd_pcm_lib_period_bytes(substream);

If that looks good to you, I can upload the patch to pw as well.
Then we can upstream the intel8x0 patch and the first change I made in
this series (the elapse one).
Does that sound good to you?

Thanks,
Paul

>
> Takashi
>
>
> > - [Thread 1] snd_pcm_hw_free finished() -> snd_pcm_detach_substream()
> > -> runtime=NULL
> > - [Thread 2] Execute snd_pcm_running and crash
> >
> > I can't trigger the issue after adding the synchronize_irq(), but
> > maybe it's just luck. Correct my if I miss something.
> >
> > Thanks,
> > Paul
> >
> >
> >
> >
> > >
> > > thanks,
> > >
> > > Takashi
> >

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-14 16:37             ` Chih-Yang Hsia
@ 2019-11-14 17:00               ` Takashi Iwai
  2019-11-15 15:36                 ` Chih-Yang Hsia
       [not found]                 ` <CAJaf1Ta1tqYMCTaWxeL82gfY8Fg6hidLjHO3FFiqU7yyn5oVPg@mail.gmail.com>
  0 siblings, 2 replies; 14+ messages in thread
From: Takashi Iwai @ 2019-11-14 17:00 UTC (permalink / raw)
  To: Chih-Yang Hsia; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

[-- Attachment #1: Type: text/plain, Size: 9001 bytes --]

On Thu, 14 Nov 2019 17:37:54 +0100,
Chih-Yang Hsia wrote:
> 
> On Thu, Nov 14, 2019 at 10:20 PM Takashi Iwai <tiwai@suse.de> wrote:
> >
> > On Thu, 14 Nov 2019 15:16:04 +0100,
> > Chih-Yang Hsia wrote:
> > >
> > > On Wed, Nov 13, 2019 at 7:36 PM Takashi Iwai <tiwai@suse.de> wrote:
> > > >
> > > > On Wed, 13 Nov 2019 10:47:51 +0100,
> > > > Takashi Iwai wrote:
> > > > >
> > > > > On Wed, 13 Nov 2019 08:24:41 +0100,
> > > > > Chih-Yang Hsia wrote:
> > > > > >
> > > > > > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
> > > > > > >
> > > > > > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > > > > > paulhsia wrote:
> > > > > > > >
> > > > > > > > Since
> > > > > > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > > > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > > > > > >   stream lock.
> > > > > > > >
> > > > > > > > This will trigger null memory access in snd_pcm_running() call in
> > > > > > > > snd_pcm_period_elapsed.
> > > > > > >
> > > > > > > Well, if a stream is detached, it means that the stream must have been
> > > > > > > already closed; i.e. it's already a clear bug in the driver that
> > > > > > > snd_pcm_period_elapsed() is called against such a stream.
> > > > > > >
> > > > > > > Or am I missing other possible case?
> > > > > > >
> > > > > > >
> > > > > > > thanks,
> > > > > > >
> > > > > > > Takashi
> > > > > > >
> > > > > >
> > > > > > In multithreaded environment, it is possible to have to access both
> > > > > > `interrupt_handler` (from irq) and `substream close` (from
> > > > > > snd_pcm_release) at the same time.
> > > > > > Therefore, in driver implementation, if "substream close function" and
> > > > > > the "code section where snd_pcm_period_elapsed() in" do not hold the
> > > > > > same lock, then the following things can happen:
> > > > > >
> > > > > > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > > > > > sustream pointer
> > > > > > 2. snd_pcm_release_substream: call close without blocking
> > > > > > 3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
> > > > > > substream->runtime to NULL
> > > > > > 4. interrupt_handler -> call snd_pcm_runtime() and crash while
> > > > > > accessing fields in `substream->runtime`
> > > > > >
> > > > > > e.g. In intel8x0.c driver for ac97 device,
> > > > > > In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
> > > > > > checking `ichdev->substream` in `snd_intel8x0_update`.
> > > > > > And if a `snd_pcm_release` call from alsa-lib and pass through close()
> > > > > > and run to snd_pcm_detach_substream() in another thread, it's possible
> > > > > > to trigger a crash.
> > > > > > I can reproduce the issue within a multithread VM easily.
> > > > > >
> > > > > > My patches are trying to provide a basic protection for this situation
> > > > > > (and internal pcm lock between detach and elapsed), since
> > > > > > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > > > > > the possible race if the driver does not  force the order for `calling
> > > > > > snd_pcm_period_elapsed` and `close` by lock and
> > > > > > - lots of drivers already have this hidden issue and I can't fix them
> > > > > > one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > > > > > "close implementation" within all the drivers). The most common
> > > > > > mistake is that
> > > > > >   - Checking if the substream is null and call into snd_pcm_period_elapsed
> > > > > >   - But `close` can happen anytime, pass without block and
> > > > > > snd_pcm_detach_substream will be trigger right after it
> > > > >
> > > > > Thanks, point taken.  While this argument is valid and it's good to
> > > > > harden the PCM core side, the concurrent calls are basically a bug,
> > > > > and we'd need another fix in anyway.  Also, the patch 2 makes little
> > > > > sense; there can't be multiple close calls racing with each other.  So
> > > > > I'll go for taking your fix but only the first patch.
> > > > >
> > > > > Back to this race: the surfaced issue is, as you pointed out, the race
> > > > > between snd_pcm_period_elapsed() vs close call.  However, the
> > > > > fundamental problem is the pending action after the PCM trigger-stop
> > > > > call.  Since the PCM trigger doesn't block nor wait until the hardware
> > > > > actually stops the things, the driver may go to the other step even
> > > > > after this "supposed-to-be-stopped" point.  In your case, it goes up
> > > > > to close, and crashes.  If we had a sync-stop operation, the interrupt
> > > > > handler should have finished before moving to the close stage, hence
> > > > > such a race could be avoided.
> > > > >
> > > > > It's been a long known problem, and some drivers have the own
> > > > > implementation for stop-sync.  I think it's time to investigate and
> > > > > start implementing the fundamental solution.
> > > >
> > > > BTW, what we need essentially for intel8x0 is to just call
> > > > synchronize_irq() before closing, at best in hw_free procedure:
> > > >
> > > > --- a/sound/pci/intel8x0.c
> > > > +++ b/sound/pci/intel8x0.c
> > > > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
> > > >
> > > >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> > > >  {
> > > > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> > > >         struct ichdev *ichdev = get_ichdev(substream);
> > > >
> > > > +       synchronize_irq(chip->irq);
> > > >         if (ichdev->pcm_open_flag) {
> > > >                 snd_ac97_pcm_close(ichdev->pcm);
> > > >                 ichdev->pcm_open_flag = 0;
> > > >
> > > >
> > > > The same would be needed also at the beginning of the prepare, as the
> > > > application may restart the stream without release.
> > > >
> > > > My idea is to add sync_stop PCM ops and call it from PCM core at
> > > > snd_pcm_prepare() and snd_pcm_hw_free().
> > > >
> > > Will adding synchronize_irq() in snd_pcm_hw_free there fix the race issue?
> > > Is it possible to have sequence like the following steps ?
> > > - [Thread 1] snd_pcm_hw_free: just pass synchronize_irq()
> > > - [Thread 2] another interrupt come -> snd_intel8x0_update() -> goes
> > > into the lock region of snd_pcm_period_elapsed() and passes the
> > > PCM_RUNTIME_CHECK (right before snd_pcm_running())
> >
> > This shouldn't happen because at the point snd_pcm_hw_free() the
> > stream has been already in the SETUP state, i.e. with trigger PCM
> > callback, the hardware has been programmed not to generate the PCM
> > stream IRQ.
> >
> Thanks for pointing that out.
> snd_pcm_drop() will be called right before accessing `opts->hw_free`
> and device dma will be stopped by SNDRV_PCM_TRIGGER_STOP.
> And snd_pcm_prepare() will be called when the device is not running.
> So synchronize_irq() should be enough for both of them.
> 
> I have a patch like this now in intel8x0:
> 
> diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c
> index 6ff94d8ad86e..728588937673 100644
> --- a/sound/pci/intel8x0.c
> +++ b/sound/pci/intel8x0.c
> @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct
> snd_pcm_substream *substream,
> 
>  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
>  {
> +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
>         struct ichdev *ichdev = get_ichdev(substream);
> 
> +       synchronize_irq(chip->irq);
>         if (ichdev->pcm_open_flag) {
>                 snd_ac97_pcm_close(ichdev->pcm);
>                 ichdev->pcm_open_flag = 0;
> @@ -993,6 +995,7 @@ static int snd_intel8x0_pcm_prepare(struct
> snd_pcm_substream *substream)
>         struct snd_pcm_runtime *runtime = substream->runtime;
>         struct ichdev *ichdev = get_ichdev(substream);
> 
> +       synchronize_irq(chip->irq);
>         ichdev->physbuf = runtime->dma_addr;
>         ichdev->size = snd_pcm_lib_buffer_bytes(substream);
>         ichdev->fragsize = snd_pcm_lib_period_bytes(substream);
> 
> If that looks good to you, I can upload the patch to pw as well.
> Then we can upstream the intel8x0 patch and the first change I made in
> this series (the elapse one).
> Does that sound good to you?

I already have a patch set that adds the irq-sync commonly, as this
problem is seen on various drivers as you already pointed.

Below two patches add the support in PCM core side, and the rest need in
intel8x0.c is something like:

--- a/sound/pci/intel8x0.c
+++ b/sound/pci/intel8x0.c
@@ -3092,6 +3092,7 @@ static int snd_intel8x0_create(struct snd_card *card,
 		return -EBUSY;
 	}
 	chip->irq = pci->irq;
+	card->sync_irq = chip->irq;
 
 	if ((err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops)) < 0) {
 		snd_intel8x0_free(chip);


(The intel8x0 does re-acquire IRQ, so it'll need a bit more lines, but
 you get the idea.)

My plan is to merge the whole changes after 5.5-rc1, destined for
5.6.


thanks,

Takashi


[-- Attachment #2: 0001-ALSA-pcm-Add-the-support-for-sync-stop-operation.patch --]
[-- Type: application/octet-stream, Size: 4424 bytes --]

From a2ba786d9dcbd92b9d0f2fcc408fa80168484c0a Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai@suse.de>
Date: Wed, 13 Nov 2019 15:43:41 +0100
Subject: [PATCH] ALSA: pcm: Add the support for sync-stop operation

The standard programming model of a PCM sound driver is to process
snd_pcm_period_elapsed() from an interrupt handler.  When a running
stream is stopped, PCM core calls the trigger-STOP PCM ops, sets the
stream state to SETUP, and moves on to the next step.  This is
performed in an atomic manner -- this could be called from the interrupt
context, after all.

The problem is that, if the stream goes further and reaches to the
CLOSE state immediately, the stream might be still being processed in
snd_pcm_period_elapsed() in the interrupt context, and hits a NULL
dereference.  Such a crash happens because of the atomic operation,
and we can't wait until the stream-stop finishes.

For addressing such a problem, this commit adds a new PCM ops,
sync_stop.  This gets called at the appropriate places that need a
sync with the stream-stop, i.e. at hw_params, prepare and hw_free.

Some drivers already have a similar mechanism implemented locally, and
we'll refactor the code later.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 include/sound/pcm.h     |  2 ++
 sound/core/pcm_native.c | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/sound/pcm.h b/include/sound/pcm.h
index 9b1accfd9413..76053d9deb07 100644
--- a/include/sound/pcm.h
+++ b/include/sound/pcm.h
@@ -59,6 +59,7 @@ struct snd_pcm_ops {
 	int (*hw_free)(struct snd_pcm_substream *substream);
 	int (*prepare)(struct snd_pcm_substream *substream);
 	int (*trigger)(struct snd_pcm_substream *substream, int cmd);
+	int (*sync_stop)(struct snd_pcm_substream *substream);
 	snd_pcm_uframes_t (*pointer)(struct snd_pcm_substream *substream);
 	int (*get_time_info)(struct snd_pcm_substream *substream,
 			struct timespec *system_ts, struct timespec *audio_ts,
@@ -395,6 +396,7 @@ struct snd_pcm_runtime {
 	wait_queue_head_t sleep;	/* poll sleep */
 	wait_queue_head_t tsleep;	/* transfer sleep */
 	struct fasync_struct *fasync;
+	bool stop_operating;		/* sync_stop will be called */
 
 	/* -- private section -- */
 	void *private_data;
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 704ea75199e4..163d621ff238 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -568,6 +568,15 @@ static inline void snd_pcm_timer_notify(struct snd_pcm_substream *substream,
 #endif
 }
 
+static void snd_pcm_sync_stop(struct snd_pcm_substream *substream)
+{
+	if (substream->runtime->stop_operating) {
+		substream->runtime->stop_operating = false;
+		if (substream->ops->sync_stop)
+			substream->ops->sync_stop(substream);
+	}
+}
+
 /**
  * snd_pcm_hw_param_choose - choose a configuration defined by @params
  * @pcm: PCM instance
@@ -660,6 +669,8 @@ static int snd_pcm_hw_params(struct snd_pcm_substream *substream,
 		if (atomic_read(&substream->mmap_count))
 			return -EBADFD;
 
+	snd_pcm_sync_stop(substream);
+
 	params->rmask = ~0U;
 	err = snd_pcm_hw_refine(substream, params);
 	if (err < 0)
@@ -788,6 +799,7 @@ static int snd_pcm_hw_free(struct snd_pcm_substream *substream)
 	snd_pcm_stream_unlock_irq(substream);
 	if (atomic_read(&substream->mmap_count))
 		return -EBADFD;
+	snd_pcm_sync_stop(substream);
 	if (substream->ops->hw_free)
 		result = substream->ops->hw_free(substream);
 	if (substream->managed_buffer_alloc)
@@ -1313,6 +1325,7 @@ static void snd_pcm_post_stop(struct snd_pcm_substream *substream, int state)
 		runtime->status->state = state;
 		snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MSTOP);
 	}
+	runtime->stop_operating = true;
 	wake_up(&runtime->sleep);
 	wake_up(&runtime->tsleep);
 }
@@ -1589,6 +1602,7 @@ static void snd_pcm_post_resume(struct snd_pcm_substream *substream, int state)
 	snd_pcm_trigger_tstamp(substream);
 	runtime->status->state = runtime->status->suspended_state;
 	snd_pcm_timer_notify(substream, SNDRV_TIMER_EVENT_MRESUME);
+	snd_pcm_sync_stop(substream);
 }
 
 static const struct action_ops snd_pcm_action_resume = {
@@ -1709,6 +1723,7 @@ static int snd_pcm_pre_prepare(struct snd_pcm_substream *substream,
 static int snd_pcm_do_prepare(struct snd_pcm_substream *substream, int state)
 {
 	int err;
+	snd_pcm_sync_stop(substream);
 	err = substream->ops->prepare(substream);
 	if (err < 0)
 		return err;
-- 
2.16.4


[-- Attachment #3: 0002-ALSA-pcm-Add-card-sync_irq-field.patch --]
[-- Type: application/octet-stream, Size: 2192 bytes --]

From 66e5ee3821e3072ce6c308e0ad12a3667ee3bcc9 Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai@suse.de>
Date: Thu, 14 Nov 2019 12:47:27 +0100
Subject: [PATCH] ALSA: pcm: Add card sync_irq field

Many PCI and other drivers performs snd_pcm_period_elapsed() simply in
its interrupt handler, so the sync_stop operation is just to call
synchronize_irq().  Instead of putting this call multiple times,
introduce the common card->sync_irq field.  When this field is set,
PCM core performs synchronize_irq() for sync-stop operation.  Each
driver just needs to copy its local IRQ number to card->sync_irq, and
that's all we need.

Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 include/sound/core.h    | 1 +
 sound/core/init.c       | 1 +
 sound/core/pcm_native.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/include/sound/core.h b/include/sound/core.h
index ee238f100f73..af3dce956c17 100644
--- a/include/sound/core.h
+++ b/include/sound/core.h
@@ -117,6 +117,7 @@ struct snd_card {
 	struct device card_dev;		/* cardX object for sysfs */
 	const struct attribute_group *dev_groups[4]; /* assigned sysfs attr */
 	bool registered;		/* card_dev is registered? */
+	int sync_irq;			/* assigned irq, used for PCM sync */
 	wait_queue_head_t remove_sleep;
 
 #ifdef CONFIG_PM
diff --git a/sound/core/init.c b/sound/core/init.c
index db99b7fad6ad..faa9f03c01ca 100644
--- a/sound/core/init.c
+++ b/sound/core/init.c
@@ -215,6 +215,7 @@ int snd_card_new(struct device *parent, int idx, const char *xid,
 	init_waitqueue_head(&card->power_sleep);
 #endif
 	init_waitqueue_head(&card->remove_sleep);
+	card->sync_irq = -1;
 
 	device_initialize(&card->card_dev);
 	card->card_dev.parent = parent;
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 163d621ff238..1fe581167b7b 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -574,6 +574,8 @@ static void snd_pcm_sync_stop(struct snd_pcm_substream *substream)
 		substream->runtime->stop_operating = false;
 		if (substream->ops->sync_stop)
 			substream->ops->sync_stop(substream);
+		else if (substream->pcm->card->sync_irq > 0)
+			synchronize_irq(substream->pcm->card->sync_irq);
 	}
 }
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-14 17:00               ` Takashi Iwai
@ 2019-11-15 15:36                 ` Chih-Yang Hsia
       [not found]                 ` <CAJaf1Ta1tqYMCTaWxeL82gfY8Fg6hidLjHO3FFiqU7yyn5oVPg@mail.gmail.com>
  1 sibling, 0 replies; 14+ messages in thread
From: Chih-Yang Hsia @ 2019-11-15 15:36 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Fri, Nov 15, 2019 at 1:00 AM Takashi Iwai <tiwai@suse.de> wrote:
>
> On Thu, 14 Nov 2019 17:37:54 +0100,
> Chih-Yang Hsia wrote:
> >
> > On Thu, Nov 14, 2019 at 10:20 PM Takashi Iwai <tiwai@suse.de> wrote:
> > >
> > > On Thu, 14 Nov 2019 15:16:04 +0100,
> > > Chih-Yang Hsia wrote:
> > > >
> > > > On Wed, Nov 13, 2019 at 7:36 PM Takashi Iwai <tiwai@suse.de> wrote:
> > > > >
> > > > > On Wed, 13 Nov 2019 10:47:51 +0100,
> > > > > Takashi Iwai wrote:
> > > > > >
> > > > > > On Wed, 13 Nov 2019 08:24:41 +0100,
> > > > > > Chih-Yang Hsia wrote:
> > > > > > >
> > > > > > > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > > > > > > paulhsia wrote:
> > > > > > > > >
> > > > > > > > > Since
> > > > > > > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > > > > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > > > > > > >   stream lock.
> > > > > > > > >
> > > > > > > > > This will trigger null memory access in snd_pcm_running() call in
> > > > > > > > > snd_pcm_period_elapsed.
> > > > > > > >
> > > > > > > > Well, if a stream is detached, it means that the stream must have been
> > > > > > > > already closed; i.e. it's already a clear bug in the driver that
> > > > > > > > snd_pcm_period_elapsed() is called against such a stream.
> > > > > > > >
> > > > > > > > Or am I missing other possible case?
> > > > > > > >
> > > > > > > >
> > > > > > > > thanks,
> > > > > > > >
> > > > > > > > Takashi
> > > > > > > >
> > > > > > >
> > > > > > > In multithreaded environment, it is possible to have to access both
> > > > > > > `interrupt_handler` (from irq) and `substream close` (from
> > > > > > > snd_pcm_release) at the same time.
> > > > > > > Therefore, in driver implementation, if "substream close function" and
> > > > > > > the "code section where snd_pcm_period_elapsed() in" do not hold the
> > > > > > > same lock, then the following things can happen:
> > > > > > >
> > > > > > > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > > > > > > sustream pointer
> > > > > > > 2. snd_pcm_release_substream: call close without blocking
> > > > > > > 3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
> > > > > > > substream->runtime to NULL
> > > > > > > 4. interrupt_handler -> call snd_pcm_runtime() and crash while
> > > > > > > accessing fields in `substream->runtime`
> > > > > > >
> > > > > > > e.g. In intel8x0.c driver for ac97 device,
> > > > > > > In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
> > > > > > > checking `ichdev->substream` in `snd_intel8x0_update`.
> > > > > > > And if a `snd_pcm_release` call from alsa-lib and pass through close()
> > > > > > > and run to snd_pcm_detach_substream() in another thread, it's possible
> > > > > > > to trigger a crash.
> > > > > > > I can reproduce the issue within a multithread VM easily.
> > > > > > >
> > > > > > > My patches are trying to provide a basic protection for this situation
> > > > > > > (and internal pcm lock between detach and elapsed), since
> > > > > > > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > > > > > > the possible race if the driver does not  force the order for `calling
> > > > > > > snd_pcm_period_elapsed` and `close` by lock and
> > > > > > > - lots of drivers already have this hidden issue and I can't fix them
> > > > > > > one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > > > > > > "close implementation" within all the drivers). The most common
> > > > > > > mistake is that
> > > > > > >   - Checking if the substream is null and call into snd_pcm_period_elapsed
> > > > > > >   - But `close` can happen anytime, pass without block and
> > > > > > > snd_pcm_detach_substream will be trigger right after it
> > > > > >
> > > > > > Thanks, point taken.  While this argument is valid and it's good to
> > > > > > harden the PCM core side, the concurrent calls are basically a bug,
> > > > > > and we'd need another fix in anyway.  Also, the patch 2 makes little
> > > > > > sense; there can't be multiple close calls racing with each other.  So
> > > > > > I'll go for taking your fix but only the first patch.
> > > > > >
> > > > > > Back to this race: the surfaced issue is, as you pointed out, the race
> > > > > > between snd_pcm_period_elapsed() vs close call.  However, the
> > > > > > fundamental problem is the pending action after the PCM trigger-stop
> > > > > > call.  Since the PCM trigger doesn't block nor wait until the hardware
> > > > > > actually stops the things, the driver may go to the other step even
> > > > > > after this "supposed-to-be-stopped" point.  In your case, it goes up
> > > > > > to close, and crashes.  If we had a sync-stop operation, the interrupt
> > > > > > handler should have finished before moving to the close stage, hence
> > > > > > such a race could be avoided.
> > > > > >
> > > > > > It's been a long known problem, and some drivers have the own
> > > > > > implementation for stop-sync.  I think it's time to investigate and
> > > > > > start implementing the fundamental solution.
> > > > >
> > > > > BTW, what we need essentially for intel8x0 is to just call
> > > > > synchronize_irq() before closing, at best in hw_free procedure:
> > > > >
> > > > > --- a/sound/pci/intel8x0.c
> > > > > +++ b/sound/pci/intel8x0.c
> > > > > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
> > > > >
> > > > >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> > > > >  {
> > > > > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> > > > >         struct ichdev *ichdev = get_ichdev(substream);
> > > > >
> > > > > +       synchronize_irq(chip->irq);
> > > > >         if (ichdev->pcm_open_flag) {
> > > > >                 snd_ac97_pcm_close(ichdev->pcm);
> > > > >                 ichdev->pcm_open_flag = 0;
> > > > >
> > > > >
> > > > > The same would be needed also at the beginning of the prepare, as the
> > > > > application may restart the stream without release.
> > > > >
> > > > > My idea is to add sync_stop PCM ops and call it from PCM core at
> > > > > snd_pcm_prepare() and snd_pcm_hw_free().
> > > > >
> > > > Will adding synchronize_irq() in snd_pcm_hw_free there fix the race issue?
> > > > Is it possible to have sequence like the following steps ?
> > > > - [Thread 1] snd_pcm_hw_free: just pass synchronize_irq()
> > > > - [Thread 2] another interrupt come -> snd_intel8x0_update() -> goes
> > > > into the lock region of snd_pcm_period_elapsed() and passes the
> > > > PCM_RUNTIME_CHECK (right before snd_pcm_running())
> > >
> > > This shouldn't happen because at the point snd_pcm_hw_free() the
> > > stream has been already in the SETUP state, i.e. with trigger PCM
> > > callback, the hardware has been programmed not to generate the PCM
> > > stream IRQ.
> > >
> > Thanks for pointing that out.
> > snd_pcm_drop() will be called right before accessing `opts->hw_free`
> > and device dma will be stopped by SNDRV_PCM_TRIGGER_STOP.
> > And snd_pcm_prepare() will be called when the device is not running.
> > So synchronize_irq() should be enough for both of them.
> >
> > I have a patch like this now in intel8x0:
> >
> > diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c
> > index 6ff94d8ad86e..728588937673 100644
> > --- a/sound/pci/intel8x0.c
> > +++ b/sound/pci/intel8x0.c
> > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct
> > snd_pcm_substream *substream,
> >
> >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> >  {
> > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> >         struct ichdev *ichdev = get_ichdev(substream);
> >
> > +       synchronize_irq(chip->irq);
> >         if (ichdev->pcm_open_flag) {
> >                 snd_ac97_pcm_close(ichdev->pcm);
> >                 ichdev->pcm_open_flag = 0;
> > @@ -993,6 +995,7 @@ static int snd_intel8x0_pcm_prepare(struct
> > snd_pcm_substream *substream)
> >         struct snd_pcm_runtime *runtime = substream->runtime;
> >         struct ichdev *ichdev = get_ichdev(substream);
> >
> > +       synchronize_irq(chip->irq);
> >         ichdev->physbuf = runtime->dma_addr;
> >         ichdev->size = snd_pcm_lib_buffer_bytes(substream);
> >         ichdev->fragsize = snd_pcm_lib_period_bytes(substream);
> >
> > If that looks good to you, I can upload the patch to pw as well.
> > Then we can upstream the intel8x0 patch and the first change I made in
> > this series (the elapse one).
> > Does that sound good to you?
>
> I already have a patch set that adds the irq-sync commonly, as this
> problem is seen on various drivers as you already pointed.
>
> Below two patches add the support in PCM core side, and the rest need in
> intel8x0.c is something like:
>
> --- a/sound/pci/intel8x0.c
> +++ b/sound/pci/intel8x0.c
> @@ -3092,6 +3092,7 @@ static int snd_intel8x0_create(struct snd_card *card,
>                 return -EBUSY;
>         }
>         chip->irq = pci->irq;
> +       card->sync_irq = chip->irq;

Will this assignment or removement cause possible race if the driver
is careless?
Maybe providing some helper functions or teaching driver writers when
is the right time to change or remove the sync_irq will help.

Best,
Paul

>         if ((err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops)) < 0) {
>                 snd_intel8x0_free(chip);
>
>
> (The intel8x0 does re-acquire IRQ, so it'll need a bit more lines, but
>  you get the idea.)
>
> My plan is to merge the whole changes after 5.5-rc1, destined for
> 5.6.
>
>
> thanks,
>
> Takashi
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
       [not found]                   ` <s5hy2whi1gw.wl-tiwai@suse.de>
@ 2019-11-15 17:04                     ` Chih-Yang Hsia
  2019-11-15 17:07                       ` Takashi Iwai
  0 siblings, 1 reply; 14+ messages in thread
From: Chih-Yang Hsia @ 2019-11-15 17:04 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Fri, Nov 15, 2019 at 11:49 PM Takashi Iwai <tiwai@suse.de> wrote:
>
> On Fri, 15 Nov 2019 16:35:19 +0100,
> Chih-Yang Hsia wrote:
> >
> > On Fri, Nov 15, 2019 at 1:00 AM Takashi Iwai <tiwai@suse.de> wrote:
> > >
> > > On Thu, 14 Nov 2019 17:37:54 +0100,
> > > Chih-Yang Hsia wrote:
> > > >
> > > > On Thu, Nov 14, 2019 at 10:20 PM Takashi Iwai <tiwai@suse.de> wrote:
> > > > >
> > > > > On Thu, 14 Nov 2019 15:16:04 +0100,
> > > > > Chih-Yang Hsia wrote:
> > > > > >
> > > > > > On Wed, Nov 13, 2019 at 7:36 PM Takashi Iwai <tiwai@suse.de> wrote:
> > > > > > >
> > > > > > > On Wed, 13 Nov 2019 10:47:51 +0100,
> > > > > > > Takashi Iwai wrote:
> > > > > > > >
> > > > > > > > On Wed, 13 Nov 2019 08:24:41 +0100,
> > > > > > > > Chih-Yang Hsia wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > > > > > > > > paulhsia wrote:
> > > > > > > > > > >
> > > > > > > > > > > Since
> > > > > > > > > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > > > > > > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > > > > > > > > >   stream lock.
> > > > > > > > > > >
> > > > > > > > > > > This will trigger null memory access in snd_pcm_running() call in
> > > > > > > > > > > snd_pcm_period_elapsed.
> > > > > > > > > >
> > > > > > > > > > Well, if a stream is detached, it means that the stream must have been
> > > > > > > > > > already closed; i.e. it's already a clear bug in the driver that
> > > > > > > > > > snd_pcm_period_elapsed() is called against such a stream.
> > > > > > > > > >
> > > > > > > > > > Or am I missing other possible case?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > thanks,
> > > > > > > > > >
> > > > > > > > > > Takashi
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > In multithreaded environment, it is possible to have to access both
> > > > > > > > > `interrupt_handler` (from irq) and `substream close` (from
> > > > > > > > > snd_pcm_release) at the same time.
> > > > > > > > > Therefore, in driver implementation, if "substream close function" and
> > > > > > > > > the "code section where snd_pcm_period_elapsed() in" do not hold the
> > > > > > > > > same lock, then the following things can happen:
> > > > > > > > >
> > > > > > > > > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > > > > > > > > sustream pointer
> > > > > > > > > 2. snd_pcm_release_substream: call close without blocking
> > > > > > > > > 3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
> > > > > > > > > substream->runtime to NULL
> > > > > > > > > 4. interrupt_handler -> call snd_pcm_runtime() and crash while
> > > > > > > > > accessing fields in `substream->runtime`
> > > > > > > > >
> > > > > > > > > e.g. In intel8x0.c driver for ac97 device,
> > > > > > > > > In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
> > > > > > > > > checking `ichdev->substream` in `snd_intel8x0_update`.
> > > > > > > > > And if a `snd_pcm_release` call from alsa-lib and pass through close()
> > > > > > > > > and run to snd_pcm_detach_substream() in another thread, it's possible
> > > > > > > > > to trigger a crash.
> > > > > > > > > I can reproduce the issue within a multithread VM easily.
> > > > > > > > >
> > > > > > > > > My patches are trying to provide a basic protection for this situation
> > > > > > > > > (and internal pcm lock between detach and elapsed), since
> > > > > > > > > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > > > > > > > > the possible race if the driver does not  force the order for `calling
> > > > > > > > > snd_pcm_period_elapsed` and `close` by lock and
> > > > > > > > > - lots of drivers already have this hidden issue and I can't fix them
> > > > > > > > > one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > > > > > > > > "close implementation" within all the drivers). The most common
> > > > > > > > > mistake is that
> > > > > > > > >   - Checking if the substream is null and call into snd_pcm_period_elapsed
> > > > > > > > >   - But `close` can happen anytime, pass without block and
> > > > > > > > > snd_pcm_detach_substream will be trigger right after it
> > > > > > > >
> > > > > > > > Thanks, point taken.  While this argument is valid and it's good to
> > > > > > > > harden the PCM core side, the concurrent calls are basically a bug,
> > > > > > > > and we'd need another fix in anyway.  Also, the patch 2 makes little
> > > > > > > > sense; there can't be multiple close calls racing with each other.  So
> > > > > > > > I'll go for taking your fix but only the first patch.
> > > > > > > >
> > > > > > > > Back to this race: the surfaced issue is, as you pointed out, the race
> > > > > > > > between snd_pcm_period_elapsed() vs close call.  However, the
> > > > > > > > fundamental problem is the pending action after the PCM trigger-stop
> > > > > > > > call.  Since the PCM trigger doesn't block nor wait until the hardware
> > > > > > > > actually stops the things, the driver may go to the other step even
> > > > > > > > after this "supposed-to-be-stopped" point.  In your case, it goes up
> > > > > > > > to close, and crashes.  If we had a sync-stop operation, the interrupt
> > > > > > > > handler should have finished before moving to the close stage, hence
> > > > > > > > such a race could be avoided.
> > > > > > > >
> > > > > > > > It's been a long known problem, and some drivers have the own
> > > > > > > > implementation for stop-sync.  I think it's time to investigate and
> > > > > > > > start implementing the fundamental solution.
> > > > > > >
> > > > > > > BTW, what we need essentially for intel8x0 is to just call
> > > > > > > synchronize_irq() before closing, at best in hw_free procedure:
> > > > > > >
> > > > > > > --- a/sound/pci/intel8x0.c
> > > > > > > +++ b/sound/pci/intel8x0.c
> > > > > > > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
> > > > > > >
> > > > > > >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> > > > > > >  {
> > > > > > > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> > > > > > >         struct ichdev *ichdev = get_ichdev(substream);
> > > > > > >
> > > > > > > +       synchronize_irq(chip->irq);
> > > > > > >         if (ichdev->pcm_open_flag) {
> > > > > > >                 snd_ac97_pcm_close(ichdev->pcm);
> > > > > > >                 ichdev->pcm_open_flag = 0;
> > > > > > >
> > > > > > >
> > > > > > > The same would be needed also at the beginning of the prepare, as the
> > > > > > > application may restart the stream without release.
> > > > > > >
> > > > > > > My idea is to add sync_stop PCM ops and call it from PCM core at
> > > > > > > snd_pcm_prepare() and snd_pcm_hw_free().
> > > > > > >
> > > > > > Will adding synchronize_irq() in snd_pcm_hw_free there fix the race issue?
> > > > > > Is it possible to have sequence like the following steps ?
> > > > > > - [Thread 1] snd_pcm_hw_free: just pass synchronize_irq()
> > > > > > - [Thread 2] another interrupt come -> snd_intel8x0_update() -> goes
> > > > > > into the lock region of snd_pcm_period_elapsed() and passes the
> > > > > > PCM_RUNTIME_CHECK (right before snd_pcm_running())
> > > > >
> > > > > This shouldn't happen because at the point snd_pcm_hw_free() the
> > > > > stream has been already in the SETUP state, i.e. with trigger PCM
> > > > > callback, the hardware has been programmed not to generate the PCM
> > > > > stream IRQ.
> > > > >
> > > > Thanks for pointing that out.
> > > > snd_pcm_drop() will be called right before accessing `opts->hw_free`
> > > > and device dma will be stopped by SNDRV_PCM_TRIGGER_STOP.
> > > > And snd_pcm_prepare() will be called when the device is not running.
> > > > So synchronize_irq() should be enough for both of them.
> > > >
> > > > I have a patch like this now in intel8x0:
> > > >
> > > > diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c
> > > > index 6ff94d8ad86e..728588937673 100644
> > > > --- a/sound/pci/intel8x0.c
> > > > +++ b/sound/pci/intel8x0.c
> > > > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct
> > > > snd_pcm_substream *substream,
> > > >
> > > >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> > > >  {
> > > > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> > > >         struct ichdev *ichdev = get_ichdev(substream);
> > > >
> > > > +       synchronize_irq(chip->irq);
> > > >         if (ichdev->pcm_open_flag) {
> > > >                 snd_ac97_pcm_close(ichdev->pcm);
> > > >                 ichdev->pcm_open_flag = 0;
> > > > @@ -993,6 +995,7 @@ static int snd_intel8x0_pcm_prepare(struct
> > > > snd_pcm_substream *substream)
> > > >         struct snd_pcm_runtime *runtime = substream->runtime;
> > > >         struct ichdev *ichdev = get_ichdev(substream);
> > > >
> > > > +       synchronize_irq(chip->irq);
> > > >         ichdev->physbuf = runtime->dma_addr;
> > > >         ichdev->size = snd_pcm_lib_buffer_bytes(substream);
> > > >         ichdev->fragsize = snd_pcm_lib_period_bytes(substream);
> > > >
> > > > If that looks good to you, I can upload the patch to pw as well.
> > > > Then we can upstream the intel8x0 patch and the first change I made in
> > > > this series (the elapse one).
> > > > Does that sound good to you?
> > >
> > > I already have a patch set that adds the irq-sync commonly, as this
> > > problem is seen on various drivers as you already pointed.
> > >
> > > Below two patches add the support in PCM core side, and the rest need in
> > > intel8x0.c is something like:
> > >
> > > --- a/sound/pci/intel8x0.c
> > > +++ b/sound/pci/intel8x0.c
> > > @@ -3092,6 +3092,7 @@ static int snd_intel8x0_create(struct snd_card *card,
> > >                 return -EBUSY;
> > >         }
> > >         chip->irq = pci->irq;
> > > +       card->sync_irq = chip->irq;
> > >
> > Will this assignment or removement cause possible race if the driver
> > is careless?
>
> Not really, it just influences on the possible synchronize_irq() call,
> and the call itself can't be so racy.  So it's basically safe to set
> and clear at any time.
Got it. I'm not that familiar with that function.
>
> > Maybe providing some helper functions or teaching driver writers when
> > is the right time to change or remove the sync_irq will help.
>
> The assumption is to set this whenever an irq handler is requested or
> freed.  I don't mind introducing an API function
> (e.g. snd_card_set_sync_irq(card, irq)), but OTOH I don't see much
> benefit by that, either.  This is no mandatory thing, you can
> implement in the driver side in different ways, too...
>
Thanks for your clarification. I think both ways would be fine.

Let me wait for your patches and add the fix for intel8x0 based on it later?
CC me anytime when you're ready.

Thanks!
Paul
>
> thanks,
>
> Takashi
>
> >
> > Best,
> > Paul
> >
> > >         if ((err = snd_device_new(card, SNDRV_DEV_LOWLEVEL, chip, &ops)) < 0) {
> > >                 snd_intel8x0_free(chip);
> > >
> > >
> > > (The intel8x0 does re-acquire IRQ, so it'll need a bit more lines, but
> > >  you get the idea.)
> > >
> > > My plan is to merge the whole changes after 5.5-rc1, destined for
> > > 5.6.
> > >
> > >
> > > thanks,
> > >
> > > Takashi
> > >
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access
  2019-11-15 17:04                     ` Chih-Yang Hsia
@ 2019-11-15 17:07                       ` Takashi Iwai
  0 siblings, 0 replies; 14+ messages in thread
From: Takashi Iwai @ 2019-11-15 17:07 UTC (permalink / raw)
  To: Chih-Yang Hsia; +Cc: linux-kernel, Mark Brown, Takashi Iwai, alsa-devel

On Fri, 15 Nov 2019 18:04:09 +0100,
Chih-Yang Hsia wrote:
> 
> On Fri, Nov 15, 2019 at 11:49 PM Takashi Iwai <tiwai@suse.de> wrote:
> >
> > On Fri, 15 Nov 2019 16:35:19 +0100,
> > Chih-Yang Hsia wrote:
> > >
> > > On Fri, Nov 15, 2019 at 1:00 AM Takashi Iwai <tiwai@suse.de> wrote:
> > > >
> > > > On Thu, 14 Nov 2019 17:37:54 +0100,
> > > > Chih-Yang Hsia wrote:
> > > > >
> > > > > On Thu, Nov 14, 2019 at 10:20 PM Takashi Iwai <tiwai@suse.de> wrote:
> > > > > >
> > > > > > On Thu, 14 Nov 2019 15:16:04 +0100,
> > > > > > Chih-Yang Hsia wrote:
> > > > > > >
> > > > > > > On Wed, Nov 13, 2019 at 7:36 PM Takashi Iwai <tiwai@suse.de> wrote:
> > > > > > > >
> > > > > > > > On Wed, 13 Nov 2019 10:47:51 +0100,
> > > > > > > > Takashi Iwai wrote:
> > > > > > > > >
> > > > > > > > > On Wed, 13 Nov 2019 08:24:41 +0100,
> > > > > > > > > Chih-Yang Hsia wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Nov 13, 2019 at 2:16 AM Takashi Iwai <tiwai@suse.de> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 12 Nov 2019 18:17:13 +0100,
> > > > > > > > > > > paulhsia wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Since
> > > > > > > > > > > > - snd_pcm_detach_substream sets runtime to null without stream lock and
> > > > > > > > > > > > - snd_pcm_period_elapsed checks the nullity of the runtime outside of
> > > > > > > > > > > >   stream lock.
> > > > > > > > > > > >
> > > > > > > > > > > > This will trigger null memory access in snd_pcm_running() call in
> > > > > > > > > > > > snd_pcm_period_elapsed.
> > > > > > > > > > >
> > > > > > > > > > > Well, if a stream is detached, it means that the stream must have been
> > > > > > > > > > > already closed; i.e. it's already a clear bug in the driver that
> > > > > > > > > > > snd_pcm_period_elapsed() is called against such a stream.
> > > > > > > > > > >
> > > > > > > > > > > Or am I missing other possible case?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > thanks,
> > > > > > > > > > >
> > > > > > > > > > > Takashi
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > In multithreaded environment, it is possible to have to access both
> > > > > > > > > > `interrupt_handler` (from irq) and `substream close` (from
> > > > > > > > > > snd_pcm_release) at the same time.
> > > > > > > > > > Therefore, in driver implementation, if "substream close function" and
> > > > > > > > > > the "code section where snd_pcm_period_elapsed() in" do not hold the
> > > > > > > > > > same lock, then the following things can happen:
> > > > > > > > > >
> > > > > > > > > > 1. interrupt_handler -> goes into snd_pcm_period_elapsed with a valid
> > > > > > > > > > sustream pointer
> > > > > > > > > > 2. snd_pcm_release_substream: call close without blocking
> > > > > > > > > > 3. snd_pcm_release_substream: call snd_pcm_detache_substream and set
> > > > > > > > > > substream->runtime to NULL
> > > > > > > > > > 4. interrupt_handler -> call snd_pcm_runtime() and crash while
> > > > > > > > > > accessing fields in `substream->runtime`
> > > > > > > > > >
> > > > > > > > > > e.g. In intel8x0.c driver for ac97 device,
> > > > > > > > > > In driver intel8x0.c, `snd_pcm_period_elapsed` is called after
> > > > > > > > > > checking `ichdev->substream` in `snd_intel8x0_update`.
> > > > > > > > > > And if a `snd_pcm_release` call from alsa-lib and pass through close()
> > > > > > > > > > and run to snd_pcm_detach_substream() in another thread, it's possible
> > > > > > > > > > to trigger a crash.
> > > > > > > > > > I can reproduce the issue within a multithread VM easily.
> > > > > > > > > >
> > > > > > > > > > My patches are trying to provide a basic protection for this situation
> > > > > > > > > > (and internal pcm lock between detach and elapsed), since
> > > > > > > > > > - the usage of `snd_pcm_period_elapsed` does not warn callers about
> > > > > > > > > > the possible race if the driver does not  force the order for `calling
> > > > > > > > > > snd_pcm_period_elapsed` and `close` by lock and
> > > > > > > > > > - lots of drivers already have this hidden issue and I can't fix them
> > > > > > > > > > one by one (You can check the "snd_pcm_period_elapsed usage" and the
> > > > > > > > > > "close implementation" within all the drivers). The most common
> > > > > > > > > > mistake is that
> > > > > > > > > >   - Checking if the substream is null and call into snd_pcm_period_elapsed
> > > > > > > > > >   - But `close` can happen anytime, pass without block and
> > > > > > > > > > snd_pcm_detach_substream will be trigger right after it
> > > > > > > > >
> > > > > > > > > Thanks, point taken.  While this argument is valid and it's good to
> > > > > > > > > harden the PCM core side, the concurrent calls are basically a bug,
> > > > > > > > > and we'd need another fix in anyway.  Also, the patch 2 makes little
> > > > > > > > > sense; there can't be multiple close calls racing with each other.  So
> > > > > > > > > I'll go for taking your fix but only the first patch.
> > > > > > > > >
> > > > > > > > > Back to this race: the surfaced issue is, as you pointed out, the race
> > > > > > > > > between snd_pcm_period_elapsed() vs close call.  However, the
> > > > > > > > > fundamental problem is the pending action after the PCM trigger-stop
> > > > > > > > > call.  Since the PCM trigger doesn't block nor wait until the hardware
> > > > > > > > > actually stops the things, the driver may go to the other step even
> > > > > > > > > after this "supposed-to-be-stopped" point.  In your case, it goes up
> > > > > > > > > to close, and crashes.  If we had a sync-stop operation, the interrupt
> > > > > > > > > handler should have finished before moving to the close stage, hence
> > > > > > > > > such a race could be avoided.
> > > > > > > > >
> > > > > > > > > It's been a long known problem, and some drivers have the own
> > > > > > > > > implementation for stop-sync.  I think it's time to investigate and
> > > > > > > > > start implementing the fundamental solution.
> > > > > > > >
> > > > > > > > BTW, what we need essentially for intel8x0 is to just call
> > > > > > > > synchronize_irq() before closing, at best in hw_free procedure:
> > > > > > > >
> > > > > > > > --- a/sound/pci/intel8x0.c
> > > > > > > > +++ b/sound/pci/intel8x0.c
> > > > > > > > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct snd_pcm_substream *substream,
> > > > > > > >
> > > > > > > >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> > > > > > > >  {
> > > > > > > > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> > > > > > > >         struct ichdev *ichdev = get_ichdev(substream);
> > > > > > > >
> > > > > > > > +       synchronize_irq(chip->irq);
> > > > > > > >         if (ichdev->pcm_open_flag) {
> > > > > > > >                 snd_ac97_pcm_close(ichdev->pcm);
> > > > > > > >                 ichdev->pcm_open_flag = 0;
> > > > > > > >
> > > > > > > >
> > > > > > > > The same would be needed also at the beginning of the prepare, as the
> > > > > > > > application may restart the stream without release.
> > > > > > > >
> > > > > > > > My idea is to add sync_stop PCM ops and call it from PCM core at
> > > > > > > > snd_pcm_prepare() and snd_pcm_hw_free().
> > > > > > > >
> > > > > > > Will adding synchronize_irq() in snd_pcm_hw_free there fix the race issue?
> > > > > > > Is it possible to have sequence like the following steps ?
> > > > > > > - [Thread 1] snd_pcm_hw_free: just pass synchronize_irq()
> > > > > > > - [Thread 2] another interrupt come -> snd_intel8x0_update() -> goes
> > > > > > > into the lock region of snd_pcm_period_elapsed() and passes the
> > > > > > > PCM_RUNTIME_CHECK (right before snd_pcm_running())
> > > > > >
> > > > > > This shouldn't happen because at the point snd_pcm_hw_free() the
> > > > > > stream has been already in the SETUP state, i.e. with trigger PCM
> > > > > > callback, the hardware has been programmed not to generate the PCM
> > > > > > stream IRQ.
> > > > > >
> > > > > Thanks for pointing that out.
> > > > > snd_pcm_drop() will be called right before accessing `opts->hw_free`
> > > > > and device dma will be stopped by SNDRV_PCM_TRIGGER_STOP.
> > > > > And snd_pcm_prepare() will be called when the device is not running.
> > > > > So synchronize_irq() should be enough for both of them.
> > > > >
> > > > > I have a patch like this now in intel8x0:
> > > > >
> > > > > diff --git a/sound/pci/intel8x0.c b/sound/pci/intel8x0.c
> > > > > index 6ff94d8ad86e..728588937673 100644
> > > > > --- a/sound/pci/intel8x0.c
> > > > > +++ b/sound/pci/intel8x0.c
> > > > > @@ -923,8 +923,10 @@ static int snd_intel8x0_hw_params(struct
> > > > > snd_pcm_substream *substream,
> > > > >
> > > > >  static int snd_intel8x0_hw_free(struct snd_pcm_substream *substream)
> > > > >  {
> > > > > +       struct intel8x0 *chip = snd_pcm_substream_chip(substream);
> > > > >         struct ichdev *ichdev = get_ichdev(substream);
> > > > >
> > > > > +       synchronize_irq(chip->irq);
> > > > >         if (ichdev->pcm_open_flag) {
> > > > >                 snd_ac97_pcm_close(ichdev->pcm);
> > > > >                 ichdev->pcm_open_flag = 0;
> > > > > @@ -993,6 +995,7 @@ static int snd_intel8x0_pcm_prepare(struct
> > > > > snd_pcm_substream *substream)
> > > > >         struct snd_pcm_runtime *runtime = substream->runtime;
> > > > >         struct ichdev *ichdev = get_ichdev(substream);
> > > > >
> > > > > +       synchronize_irq(chip->irq);
> > > > >         ichdev->physbuf = runtime->dma_addr;
> > > > >         ichdev->size = snd_pcm_lib_buffer_bytes(substream);
> > > > >         ichdev->fragsize = snd_pcm_lib_period_bytes(substream);
> > > > >
> > > > > If that looks good to you, I can upload the patch to pw as well.
> > > > > Then we can upstream the intel8x0 patch and the first change I made in
> > > > > this series (the elapse one).
> > > > > Does that sound good to you?
> > > >
> > > > I already have a patch set that adds the irq-sync commonly, as this
> > > > problem is seen on various drivers as you already pointed.
> > > >
> > > > Below two patches add the support in PCM core side, and the rest need in
> > > > intel8x0.c is something like:
> > > >
> > > > --- a/sound/pci/intel8x0.c
> > > > +++ b/sound/pci/intel8x0.c
> > > > @@ -3092,6 +3092,7 @@ static int snd_intel8x0_create(struct snd_card *card,
> > > >                 return -EBUSY;
> > > >         }
> > > >         chip->irq = pci->irq;
> > > > +       card->sync_irq = chip->irq;
> > > >
> > > Will this assignment or removement cause possible race if the driver
> > > is careless?
> >
> > Not really, it just influences on the possible synchronize_irq() call,
> > and the call itself can't be so racy.  So it's basically safe to set
> > and clear at any time.
> Got it. I'm not that familiar with that function.
> >
> > > Maybe providing some helper functions or teaching driver writers when
> > > is the right time to change or remove the sync_irq will help.
> >
> > The assumption is to set this whenever an irq handler is requested or
> > freed.  I don't mind introducing an API function
> > (e.g. snd_card_set_sync_irq(card, irq)), but OTOH I don't see much
> > benefit by that, either.  This is no mandatory thing, you can
> > implement in the driver side in different ways, too...
> >
> Thanks for your clarification. I think both ways would be fine.
> 
> Let me wait for your patches and add the fix for intel8x0 based on it later?
> CC me anytime when you're ready.

I have already queued the fixes for all PCI and ISA drivers on my
local tree.  They need a bit more cooking to adjust with other tons of
patches (there are a lot of other PCM enhancement / cleanup patches at
this time), but they'll be ready after 5.5-rc1.  So stay tuned :)


thanks,

Takashi

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-11-15 17:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 17:17 [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access paulhsia
2019-11-12 17:17 ` [PATCH 1/2] ALSA: pcm: Fix stream lock usage in snd_pcm_period_elapsed() paulhsia
2019-11-12 17:17 ` [PATCH 2/2] ALSA: pcm: Use stream lock in snd_pcm_detach_substream() paulhsia
2019-11-12 18:16 ` [PATCH 0/2] ALSA: pcm: Fix race condition in runtime access Takashi Iwai
2019-11-13  7:24   ` Chih-Yang Hsia
2019-11-13  9:47     ` Takashi Iwai
2019-11-13 11:36       ` Takashi Iwai
2019-11-14 14:16         ` Chih-Yang Hsia
2019-11-14 14:20           ` Takashi Iwai
2019-11-14 16:37             ` Chih-Yang Hsia
2019-11-14 17:00               ` Takashi Iwai
2019-11-15 15:36                 ` Chih-Yang Hsia
     [not found]                 ` <CAJaf1Ta1tqYMCTaWxeL82gfY8Fg6hidLjHO3FFiqU7yyn5oVPg@mail.gmail.com>
     [not found]                   ` <s5hy2whi1gw.wl-tiwai@suse.de>
2019-11-15 17:04                     ` Chih-Yang Hsia
2019-11-15 17:07                       ` Takashi Iwai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).