All of lore.kernel.org
 help / color / mirror / Atom feed
* Query about xrun on usb/pcm
@ 2022-11-21 21:14 Carl Hetherington
  2022-11-22  7:25 ` Takashi Iwai
  0 siblings, 1 reply; 13+ messages in thread
From: Carl Hetherington @ 2022-11-21 21:14 UTC (permalink / raw)
  To: alsa-devel

Hi all,

I wonder if anybody has any clues/suggestions about problem I'm seeing
with an XMOS-based USB sound card.

As far as I can see, the card has endpoint 0x82 set up for capture data,
0x2 for playback data, and 0x82 is also used as the sync endpoint for
playback.  I'm assuming that's a fairly common arrangement?

I am testing it using simultaneous playback and capture, and simulating
a high-CPU-load case by sleeping for long enough to cause a lot of
xruns.  After some random time, I see a failure case that I'm struggling
to explain.  It goes like this:

There's an XRUN on the playback PCM, so __snd_pcm_xrun happens, then
  stop_endpoints() happens, and:
      it decides not to stop 0x82 because its running is > 1
      it stops 0x2, so its state goes to EP_STATE_STOPPING

Then the ALSA userspace code calls prepare on the playback PCM to get it
going again.

This ends up in wait_clear_urbs(), which does nothing with 0x82 as it is
still running.

At this point, the prepare thread is interrupted by an XRUN on
the capture PCM. With this PCM, there is no sync endpoint, and 0x82 is the data
endpoint.
In the xrun handler:
  stop_urbs() sets 0x82 to EP_STATE_STOPPING.
  ... and the xrun handler finishes.

Then we end up back in the prepare for the playback PCM.
wait_clear_urbs() then sets 0x2 to STOPPED, and the prepare is finished.

Now, snd_usb_endpoint_start() is called on 0x2 and that is fine.  Next,
snd_usb_endpoint_start() is called on 0x82 and that fails because its
state is still STOPPING.

At this point things seem broken.

Does anyone have a hint about where in this sequence things are going
wrong, and maybe even why?

I'm more than happy to clarify anything I can, or provide more debugging
information.

Thanks in advance!
Best,
Carl



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-11-21 21:14 Query about xrun on usb/pcm Carl Hetherington
@ 2022-11-22  7:25 ` Takashi Iwai
  2022-11-22 11:16   ` Carl Hetherington
  0 siblings, 1 reply; 13+ messages in thread
From: Takashi Iwai @ 2022-11-22  7:25 UTC (permalink / raw)
  To: Carl Hetherington; +Cc: alsa-devel

On Mon, 21 Nov 2022 22:14:39 +0100,
Carl Hetherington wrote:
> 
> Hi all,
> 
> I wonder if anybody has any clues/suggestions about problem I'm seeing
> with an XMOS-based USB sound card.
> 
> As far as I can see, the card has endpoint 0x82 set up for capture data,
> 0x2 for playback data, and 0x82 is also used as the sync endpoint for
> playback.  I'm assuming that's a fairly common arrangement?

Yes, that's an oft-seen implicit feedback mode.
You seem hitting a corner case of dealing with that mode...

> I am testing it using simultaneous playback and capture, and simulating
> a high-CPU-load case by sleeping for long enough to cause a lot of
> xruns.  After some random time, I see a failure case that I'm struggling
> to explain.  It goes like this:
> 
> There's an XRUN on the playback PCM, so __snd_pcm_xrun happens, then
>   stop_endpoints() happens, and:
>       it decides not to stop 0x82 because its running is > 1
>       it stops 0x2, so its state goes to EP_STATE_STOPPING
> 
> Then the ALSA userspace code calls prepare on the playback PCM to get it
> going again.
> 
> This ends up in wait_clear_urbs(), which does nothing with 0x82 as it is
> still running.

So far, so good.

> At this point, the prepare thread is interrupted by an XRUN on
> the capture PCM. With this PCM, there is no sync endpoint, and 0x82 is the data
> endpoint.
> In the xrun handler:
>   stop_urbs() sets 0x82 to EP_STATE_STOPPING.
>   ... and the xrun handler finishes.
> 
> Then we end up back in the prepare for the playback PCM.
> wait_clear_urbs() then sets 0x2 to STOPPED, and the prepare is finished.
> 
> Now, snd_usb_endpoint_start() is called on 0x2 and that is fine.  Next,
> snd_usb_endpoint_start() is called on 0x82 and that fails because its
> state is still STOPPING.
> 
> At this point things seem broken.
>
> Does anyone have a hint about where in this sequence things are going
> wrong, and maybe even why?

The problem is that it's treating XRUNs on the both streams
individually.  It's correct to recover only the PCM stream when an
XRUN is reported to the PCM stream.  However, for an XRUN on the
capture stream that serves as a sync source, it should stop and
recover not only the capture PCM stream but also the playback stream
as a sync sink as well.

Below is a possible test fix (totally untested!).
This may give XRUNs twice eventually, which is a bit confusing, but it
aligns with the actual hardware behavior, at least.


thanks,

Takashi

-- 8< --
--- a/sound/usb/endpoint.c
+++ b/sound/usb/endpoint.c
@@ -403,10 +403,15 @@ static int prepare_inbound_urb(struct snd_usb_endpoint *ep,
 static void notify_xrun(struct snd_usb_endpoint *ep)
 {
 	struct snd_usb_substream *data_subs;
+	struct snd_usb_endpoint *ep_sink;
 
 	data_subs = READ_ONCE(ep->data_subs);
-	if (data_subs && data_subs->pcm_substream)
+	if (data_subs && data_subs->pcm_substream) {
 		snd_pcm_stop_xrun(data_subs->pcm_substream);
+		ep_sink = READ_ONCE(ep->sync_sink);
+		if (ep_sink)
+			notify_xrun(ep_sink);
+	}
 }
 
 static struct snd_usb_packet_info *

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-11-22  7:25 ` Takashi Iwai
@ 2022-11-22 11:16   ` Carl Hetherington
  2022-11-22 14:19     ` Takashi Iwai
  0 siblings, 1 reply; 13+ messages in thread
From: Carl Hetherington @ 2022-11-22 11:16 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel

Hi Takashi,

Thank you for getting back to me!

On Tue, 22 Nov 2022, Takashi Iwai wrote:

[snip]

> > Now, snd_usb_endpoint_start() is called on 0x2 and that is fine.  Next,
> > snd_usb_endpoint_start() is called on 0x82 and that fails because its
> > state is still STOPPING.
> >
> > At this point things seem broken.
> >
> > Does anyone have a hint about where in this sequence things are going
> > wrong, and maybe even why?
>
> The problem is that it's treating XRUNs on the both streams
> individually.  It's correct to recover only the PCM stream when an
> XRUN is reported to the PCM stream.  However, for an XRUN on the
> capture stream that serves as a sync source, it should stop and
> recover not only the capture PCM stream but also the playback stream
> as a sync sink as well.
>
> Below is a possible test fix (totally untested!).
> This may give XRUNs twice eventually, which is a bit confusing, but it
> aligns with the actual hardware behavior, at least.

[snip fix]

Makes sense, thank you!  Sadly, the fix doesn't seem to work because (I
think) the xruns I'm seeing come via a different path (not though
notify_xrun()).  Mine arrive via this trace:

__snd_pcm_xrun
snd_pcm_update_state
snd_pcm_update_hw_ptr
usb_hcd_giveback_urb
snd_pcm_period_elapsed_under_stream_lock
snd_pcm_period_elapsed
retire_capture_urb
snd_complete_urb

I'll see if can apply a similar fix to this case, though to my naive
eyes it looks a little trickier as the xrun is found in the snd_pcm
code rather than the USB code.  Any suggestions most welcome!

Kind regards,
Carl



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-11-22 11:16   ` Carl Hetherington
@ 2022-11-22 14:19     ` Takashi Iwai
  2022-11-22 14:22       ` Takashi Iwai
  2022-11-28 22:51       ` Carl Hetherington
  0 siblings, 2 replies; 13+ messages in thread
From: Takashi Iwai @ 2022-11-22 14:19 UTC (permalink / raw)
  To: Carl Hetherington; +Cc: alsa-devel

On Tue, 22 Nov 2022 12:16:47 +0100,
Carl Hetherington wrote:
> 
> Hi Takashi,
> 
> Thank you for getting back to me!
> 
> On Tue, 22 Nov 2022, Takashi Iwai wrote:
> 
> [snip]
> 
> > > Now, snd_usb_endpoint_start() is called on 0x2 and that is fine.  Next,
> > > snd_usb_endpoint_start() is called on 0x82 and that fails because its
> > > state is still STOPPING.
> > >
> > > At this point things seem broken.
> > >
> > > Does anyone have a hint about where in this sequence things are going
> > > wrong, and maybe even why?
> >
> > The problem is that it's treating XRUNs on the both streams
> > individually.  It's correct to recover only the PCM stream when an
> > XRUN is reported to the PCM stream.  However, for an XRUN on the
> > capture stream that serves as a sync source, it should stop and
> > recover not only the capture PCM stream but also the playback stream
> > as a sync sink as well.
> >
> > Below is a possible test fix (totally untested!).
> > This may give XRUNs twice eventually, which is a bit confusing, but it
> > aligns with the actual hardware behavior, at least.
> 
> [snip fix]
> 
> Makes sense, thank you!  Sadly, the fix doesn't seem to work because (I
> think) the xruns I'm seeing come via a different path (not though
> notify_xrun()).  Mine arrive via this trace:
> 
> __snd_pcm_xrun
> snd_pcm_update_state
> snd_pcm_update_hw_ptr
> usb_hcd_giveback_urb
> snd_pcm_period_elapsed_under_stream_lock
> snd_pcm_period_elapsed
> retire_capture_urb
> snd_complete_urb
> 
> I'll see if can apply a similar fix to this case, though to my naive
> eyes it looks a little trickier as the xrun is found in the snd_pcm
> code rather than the USB code.  Any suggestions most welcome!

OK, then it's a bit different problem, and not so trivial to fix in
the kernel side alone, I'm afraid.  Basically it's a race between
start and stop of two streams.  The key point is that, for stopping a
(USB) stream, a sync-stop operation is needed, and this can't be
performed at the PCM trigger itself (which is an atomic operation).
So, the kernel trigger may at most return an error there.

I assume that it's from snd_usb_endpoint_start() and it returning
-EPIPE error.  If so, we may change the PCM core code to set the PCM
state again XRUN in such an error case, so that application may repeat
the standard recovery process.  Something like below.


Takashi

-- 8< --
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -1424,9 +1424,14 @@ static int snd_pcm_pre_start(struct snd_pcm_substream *substream,
 static int snd_pcm_do_start(struct snd_pcm_substream *substream,
 			    snd_pcm_state_t state)
 {
+	int err;
+
 	if (substream->runtime->trigger_master != substream)
 		return 0;
-	return substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START);
+	err = substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START);
+	if (err == -EPIPE)
+		__snd_pcm_set_state(runtime, SNDRV_PCM_STATE_XRUN);
+	return err;
 }
 
 static void snd_pcm_undo_start(struct snd_pcm_substream *substream,

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-11-22 14:19     ` Takashi Iwai
@ 2022-11-22 14:22       ` Takashi Iwai
  2022-11-28 22:51       ` Carl Hetherington
  1 sibling, 0 replies; 13+ messages in thread
From: Takashi Iwai @ 2022-11-22 14:22 UTC (permalink / raw)
  To: Carl Hetherington; +Cc: alsa-devel

On Tue, 22 Nov 2022 15:19:13 +0100,
Takashi Iwai wrote:
> 
> On Tue, 22 Nov 2022 12:16:47 +0100,
> Carl Hetherington wrote:
> > 
> > Hi Takashi,
> > 
> > Thank you for getting back to me!
> > 
> > On Tue, 22 Nov 2022, Takashi Iwai wrote:
> > 
> > [snip]
> > 
> > > > Now, snd_usb_endpoint_start() is called on 0x2 and that is fine.  Next,
> > > > snd_usb_endpoint_start() is called on 0x82 and that fails because its
> > > > state is still STOPPING.
> > > >
> > > > At this point things seem broken.
> > > >
> > > > Does anyone have a hint about where in this sequence things are going
> > > > wrong, and maybe even why?
> > >
> > > The problem is that it's treating XRUNs on the both streams
> > > individually.  It's correct to recover only the PCM stream when an
> > > XRUN is reported to the PCM stream.  However, for an XRUN on the
> > > capture stream that serves as a sync source, it should stop and
> > > recover not only the capture PCM stream but also the playback stream
> > > as a sync sink as well.
> > >
> > > Below is a possible test fix (totally untested!).
> > > This may give XRUNs twice eventually, which is a bit confusing, but it
> > > aligns with the actual hardware behavior, at least.
> > 
> > [snip fix]
> > 
> > Makes sense, thank you!  Sadly, the fix doesn't seem to work because (I
> > think) the xruns I'm seeing come via a different path (not though
> > notify_xrun()).  Mine arrive via this trace:
> > 
> > __snd_pcm_xrun
> > snd_pcm_update_state
> > snd_pcm_update_hw_ptr
> > usb_hcd_giveback_urb
> > snd_pcm_period_elapsed_under_stream_lock
> > snd_pcm_period_elapsed
> > retire_capture_urb
> > snd_complete_urb
> > 
> > I'll see if can apply a similar fix to this case, though to my naive
> > eyes it looks a little trickier as the xrun is found in the snd_pcm
> > code rather than the USB code.  Any suggestions most welcome!
> 
> OK, then it's a bit different problem, and not so trivial to fix in
> the kernel side alone, I'm afraid.  Basically it's a race between
> start and stop of two streams.  The key point is that, for stopping a
> (USB) stream, a sync-stop operation is needed, and this can't be
> performed at the PCM trigger itself (which is an atomic operation).
> So, the kernel trigger may at most return an error there.
> 
> I assume that it's from snd_usb_endpoint_start() and it returning
> -EPIPE error.  If so, we may change the PCM core code to set the PCM
> state again XRUN in such an error case, so that application may repeat
> the standard recovery process.  Something like below.

Also, it might be slightly better if we swap the starting order of two
streams: sync at first, then data.  A race can still happen, though.


Takashi

-- 8< --
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -241,19 +241,19 @@ static int start_endpoints(struct snd_usb_substream *subs)
 	if (!subs->data_endpoint)
 		return -EINVAL;
 
-	if (!test_and_set_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags)) {
-		err = snd_usb_endpoint_start(subs->data_endpoint);
+	if (subs->sync_endpoint &&
+	    !test_and_set_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags)) {
+		err = snd_usb_endpoint_start(subs->sync_endpoint);
 		if (err < 0) {
-			clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags);
+			clear_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags);
 			goto error;
 		}
 	}
 
-	if (subs->sync_endpoint &&
-	    !test_and_set_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags)) {
-		err = snd_usb_endpoint_start(subs->sync_endpoint);
+	if (!test_and_set_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags)) {
+		err = snd_usb_endpoint_start(subs->data_endpoint);
 		if (err < 0) {
-			clear_bit(SUBSTREAM_FLAG_SYNC_EP_STARTED, &subs->flags);
+			clear_bit(SUBSTREAM_FLAG_DATA_EP_STARTED, &subs->flags);
 			goto error;
 		}
 	}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-11-22 14:19     ` Takashi Iwai
  2022-11-22 14:22       ` Takashi Iwai
@ 2022-11-28 22:51       ` Carl Hetherington
  2022-11-29  7:45         ` Takashi Iwai
  1 sibling, 1 reply; 13+ messages in thread
From: Carl Hetherington @ 2022-11-28 22:51 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel

Hi Takashi,

Thank you for your continued attention with this!

[snip]

> > I'll see if can apply a similar fix to this case, though to my naive
> > eyes it looks a little trickier as the xrun is found in the snd_pcm
> > code rather than the USB code.  Any suggestions most welcome!
>
> OK, then it's a bit different problem, and not so trivial to fix in
> the kernel side alone, I'm afraid.  Basically it's a race between
> start and stop of two streams.  The key point is that, for stopping a
> (USB) stream, a sync-stop operation is needed, and this can't be
> performed at the PCM trigger itself (which is an atomic operation).
> So, the kernel trigger may at most return an error there.
>
> I assume that it's from snd_usb_endpoint_start() and it returning
> -EPIPE error.  If so, we may change the PCM core code to set the PCM
> state again XRUN in such an error case, so that application may repeat
> the standard recovery process.  Something like below.

Thanks for the suggestion.  I experimented a little with this, but I
think the problem I'm seeing is that (even if the application knows it
should retry the snd_pcm_prepare() step) we still end up with an endpoint
in EP_STATE_STOPPING while the corresponding stop_operating flag is 0.

This means that snd_pcm_sync_stop will never call the USB sync_stop
handler, which AFAICS is the only way (?) the endpoint can get back to
EP_STATE_STOPPED.

In my error case, the code in snd_pcm_sync_stop sets stop_operating to
false (perhaps assuming that substream->ops->sync_stop will "succeed"
in setting any STOPPING endpoints to STOPPED) but then this doesn't
happen because of this xrun that arrives halfway through the sync_stop
operation.

I experimented with removing the check at the top of snd_pcm_sync_stop,
so that we enter the if body regardless of
substream->runtime->stop_operating, and making my application retry
snd_pcm_prepare() if it fails with -EPIPE, and this seems to "fix" my
problem.  Obviously this causes more (unnecessary) calls to the
sync_stop() entry point...

I'd be grateful of any thoughts you have about that.

Kind regards,
Carl

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-11-28 22:51       ` Carl Hetherington
@ 2022-11-29  7:45         ` Takashi Iwai
  2022-11-30 22:37           ` Carl Hetherington
  0 siblings, 1 reply; 13+ messages in thread
From: Takashi Iwai @ 2022-11-29  7:45 UTC (permalink / raw)
  To: Carl Hetherington; +Cc: alsa-devel

On Mon, 28 Nov 2022 23:51:55 +0100,
Carl Hetherington wrote:
> 
> Hi Takashi,
> 
> Thank you for your continued attention with this!
> 
> [snip]
> 
> > > I'll see if can apply a similar fix to this case, though to my naive
> > > eyes it looks a little trickier as the xrun is found in the snd_pcm
> > > code rather than the USB code.  Any suggestions most welcome!
> >
> > OK, then it's a bit different problem, and not so trivial to fix in
> > the kernel side alone, I'm afraid.  Basically it's a race between
> > start and stop of two streams.  The key point is that, for stopping a
> > (USB) stream, a sync-stop operation is needed, and this can't be
> > performed at the PCM trigger itself (which is an atomic operation).
> > So, the kernel trigger may at most return an error there.
> >
> > I assume that it's from snd_usb_endpoint_start() and it returning
> > -EPIPE error.  If so, we may change the PCM core code to set the PCM
> > state again XRUN in such an error case, so that application may repeat
> > the standard recovery process.  Something like below.
> 
> Thanks for the suggestion.  I experimented a little with this, but I
> think the problem I'm seeing is that (even if the application knows it
> should retry the snd_pcm_prepare() step) we still end up with an endpoint
> in EP_STATE_STOPPING while the corresponding stop_operating flag is 0.

Ah, I guess that's a fallout in the logic.  When XRUN happens at start
-- receiving an -EPIPE error at snd_pcm_do_start() -- then the patch
sets the XRUN state.  This assumed that the stream gets stopped the
following snd_pcm_undo_start() call.  Indeed it does stop but there we
forgot setting stop_operating flag unlike what snd_pcm_stop() does.

> This means that snd_pcm_sync_stop will never call the USB sync_stop
> handler, which AFAICS is the only way (?) the endpoint can get back to
> EP_STATE_STOPPED.
> 
> In my error case, the code in snd_pcm_sync_stop sets stop_operating to
> false (perhaps assuming that substream->ops->sync_stop will "succeed"
> in setting any STOPPING endpoints to STOPPED) but then this doesn't
> happen because of this xrun that arrives halfway through the sync_stop
> operation.
> 
> I experimented with removing the check at the top of snd_pcm_sync_stop,
> so that we enter the if body regardless of
> substream->runtime->stop_operating, and making my application retry
> snd_pcm_prepare() if it fails with -EPIPE, and this seems to "fix" my
> problem.  Obviously this causes more (unnecessary) calls to the
> sync_stop() entry point...
> 
> I'd be grateful of any thoughts you have about that.

How about the revised patch below?


Takashi

-- 8< --
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -1424,16 +1424,28 @@ static int snd_pcm_pre_start(struct snd_pcm_substream *substream,
 static int snd_pcm_do_start(struct snd_pcm_substream *substream,
 			    snd_pcm_state_t state)
 {
+	int err;
+
 	if (substream->runtime->trigger_master != substream)
 		return 0;
-	return substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START);
+	err = substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_START);
+	/* XRUN happened during the start; usually we do trigger(STOP)
+	 * then set the PCM state to XRUN, but in this case, the stream
+	 * is stopped in snd_pcm_undo_start() right after this point without
+	 * knowing the reason -- so set the PCM state beforehand as exception.
+	 */
+	if (err == -EPIPE)
+		__snd_pcm_set_state(substream->runtime, SNDRV_PCM_STATE_XRUN);
+	return err;
 }
 
 static void snd_pcm_undo_start(struct snd_pcm_substream *substream,
 			       snd_pcm_state_t state)
 {
-	if (substream->runtime->trigger_master == substream)
+	if (substream->runtime->trigger_master == substream) {
 		substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
+		substream->runtime->stop_operating = true;
+	}
 }
 
 static void snd_pcm_post_start(struct snd_pcm_substream *substream,

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-11-29  7:45         ` Takashi Iwai
@ 2022-11-30 22:37           ` Carl Hetherington
  2022-12-01  7:47             ` Takashi Iwai
  0 siblings, 1 reply; 13+ messages in thread
From: Carl Hetherington @ 2022-11-30 22:37 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel

Hi Takashi,

> > Thanks for the suggestion.  I experimented a little with this, but I
> > think the problem I'm seeing is that (even if the application knows it
> > should retry the snd_pcm_prepare() step) we still end up with an endpoint
> > in EP_STATE_STOPPING while the corresponding stop_operating flag is 0.
>
> Ah, I guess that's a fallout in the logic.  When XRUN happens at start
> -- receiving an -EPIPE error at snd_pcm_do_start() -- then the patch
> sets the XRUN state.  This assumed that the stream gets stopped the
> following snd_pcm_undo_start() call.  Indeed it does stop but there we
> forgot setting stop_operating flag unlike what snd_pcm_stop() does.

Thanks for the hint.  I checked it out again, and in fact I'm seeing the
-EPIPE come back from snd_pcm_do_prepare().  It starts its sync-stop,
another xrun comes in (as we talked about before), it tries to
start_endpoints() and that fails.

A fairly similar thing to what you suggested seems to work for me:

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index f38c2e5e9a29..0b61943cca98 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -1948,9 +1948,17 @@ static void snd_pcm_post_prepare(struct snd_pcm_substream *substream,
 	snd_pcm_set_state(substream, SNDRV_PCM_STATE_PREPARED);
 }

+
+static void snd_pcm_undo_prepare(struct snd_pcm_substream *substream,
+				 snd_pcm_state_t state)
+{
+	substream->runtime->stop_operating = true;
+}
+
 static const struct action_ops snd_pcm_action_prepare = {
 	.pre_action = snd_pcm_pre_prepare,
 	.do_action = snd_pcm_do_prepare,
+	.undo_action = snd_pcm_undo_prepare,
 	.post_action = snd_pcm_post_prepare
 };


Can you see any problems with that?  In the application code I do need
to re-try the snd_pcm_prepare() if one fails with -EPIPE, but with this
undo step the second snd_pcm_prepare() is able to recover the endpoint
states, instead of hitting this problem where it tries to start things
that are STOPPING, but also won't set things to STOPPED because
stop_operating is false.

Thanks and best regards,
Carl



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-11-30 22:37           ` Carl Hetherington
@ 2022-12-01  7:47             ` Takashi Iwai
  2022-12-05 11:59               ` Carl Hetherington
  0 siblings, 1 reply; 13+ messages in thread
From: Takashi Iwai @ 2022-12-01  7:47 UTC (permalink / raw)
  To: Carl Hetherington; +Cc: alsa-devel

[-- Attachment #1: Type: text/plain, Size: 3170 bytes --]

On Wed, 30 Nov 2022 23:37:39 +0100,
Carl Hetherington wrote:
> 
> Hi Takashi,
> 
> > > Thanks for the suggestion.  I experimented a little with this, but I
> > > think the problem I'm seeing is that (even if the application knows it
> > > should retry the snd_pcm_prepare() step) we still end up with an endpoint
> > > in EP_STATE_STOPPING while the corresponding stop_operating flag is 0.
> >
> > Ah, I guess that's a fallout in the logic.  When XRUN happens at start
> > -- receiving an -EPIPE error at snd_pcm_do_start() -- then the patch
> > sets the XRUN state.  This assumed that the stream gets stopped the
> > following snd_pcm_undo_start() call.  Indeed it does stop but there we
> > forgot setting stop_operating flag unlike what snd_pcm_stop() does.
> 
> Thanks for the hint.  I checked it out again, and in fact I'm seeing the
> -EPIPE come back from snd_pcm_do_prepare().  It starts its sync-stop,
> another xrun comes in (as we talked about before), it tries to
> start_endpoints() and that fails.

Ahh, now I see the another missing piece; it's starting a stream at
prepare, not explicitly via the start call...

> A fairly similar thing to what you suggested seems to work for me:
> 
> diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
> index f38c2e5e9a29..0b61943cca98 100644
> --- a/sound/core/pcm_native.c
> +++ b/sound/core/pcm_native.c
> @@ -1948,9 +1948,17 @@ static void snd_pcm_post_prepare(struct snd_pcm_substream *substream,
>  	snd_pcm_set_state(substream, SNDRV_PCM_STATE_PREPARED);
>  }
> 
> +
> +static void snd_pcm_undo_prepare(struct snd_pcm_substream *substream,
> +				 snd_pcm_state_t state)
> +{
> +	substream->runtime->stop_operating = true;
> +}
> +
>  static const struct action_ops snd_pcm_action_prepare = {
>  	.pre_action = snd_pcm_pre_prepare,
>  	.do_action = snd_pcm_do_prepare,
> +	.undo_action = snd_pcm_undo_prepare,
>  	.post_action = snd_pcm_post_prepare
>  };
> 
> 
> Can you see any problems with that?  In the application code I do need
> to re-try the snd_pcm_prepare() if one fails with -EPIPE, but with this
> undo step the second snd_pcm_prepare() is able to recover the endpoint
> states, instead of hitting this problem where it tries to start things
> that are STOPPING, but also won't set things to STOPPED because
> stop_operating is false.

Setting the stop_operating unconditionally there doesn't look right,
as there may be other error types not only the pending XRUN.

The problem is rather specific to USB audio driver that tries to start
the stream at PCM prepare, so it's better to handle in USB audio
driver itself.  That is, when -EPIPE is returned from
start_endpoints() at prepare, the driver does some action.

I can see two options:
- Issue snd_pcm_stop_xrun() when start_endpoints() returns -EPIPE
- Repeat the prepare after the sync at snd_usb_pcm_prepare()

The former would require a bit of change in snd_pcm_stop_xrun(), and
it relies on the application retrying the prepare.  The latter would
be more self-contained.  I attached two patches (totally untested) for
both scenarios.

My gut feeling is for the latter solution, but this needs
verification.


thanks,

Takashi


[-- Attachment #2: usb-xrun-workaround.diff --]
[-- Type: application/octet-stream, Size: 1243 bytes --]

diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index ba6e44d02faa..dbc145aabf52 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -1565,8 +1565,12 @@ int snd_pcm_stop_xrun(struct snd_pcm_substream *substream)
 	unsigned long flags;
 
 	snd_pcm_stream_lock_irqsave(substream, flags);
-	if (substream->runtime && snd_pcm_running(substream))
-		__snd_pcm_xrun(substream);
+	if (substream->runtime) {
+		if (snd_pcm_running(substream))
+			__snd_pcm_xrun(substream);
+		else if (substream->runtime->state == SNDRV_PCM_STATE_XRUN)
+			substream->runtime->stop_operating = true;
+	}
 	snd_pcm_stream_unlock_irqrestore(substream, flags);
 	return 0;
 }
diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
index 8ed165f036a0..dc8f034a768a 100644
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -638,8 +638,11 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
 
 	subs->lowlatency_playback = lowlatency_playback_available(runtime, subs);
 	if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK &&
-	    !subs->lowlatency_playback)
+	    !subs->lowlatency_playback) {
 		ret = start_endpoints(subs);
+		if (ret == -EPIPE)
+			snd_pcm_stop_xrun(substream);
+	}
 
  unlock:
 	snd_usb_unlock_shutdown(chip);

[-- Attachment #3: usb-xrun-workaround-v2.diff --]
[-- Type: application/octet-stream, Size: 1275 bytes --]

diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
index 8ed165f036a0..9557bd4d1bbc 100644
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -604,6 +604,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
 	struct snd_pcm_runtime *runtime = substream->runtime;
 	struct snd_usb_substream *subs = runtime->private_data;
 	struct snd_usb_audio *chip = subs->stream->chip;
+	int retry = 0;
 	int ret;
 
 	ret = snd_usb_lock_shutdown(chip);
@@ -614,6 +615,7 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
 		goto unlock;
 	}
 
+ again:
 	if (subs->sync_endpoint) {
 		ret = snd_usb_endpoint_prepare(chip, subs->sync_endpoint);
 		if (ret < 0)
@@ -638,9 +640,16 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
 
 	subs->lowlatency_playback = lowlatency_playback_available(runtime, subs);
 	if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK &&
-	    !subs->lowlatency_playback)
+	    !subs->lowlatency_playback) {
 		ret = start_endpoints(subs);
-
+		/* if XRUN happens at starting streams (possibly with implicit
+		 * fb case), restart again, but only try once.
+		 */
+		if (ret == -EPIPE && !retry++) {
+			sync_pending_stops(subs);
+			goto again;
+		}
+	}
  unlock:
 	snd_usb_unlock_shutdown(chip);
 	return ret;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-12-01  7:47             ` Takashi Iwai
@ 2022-12-05 11:59               ` Carl Hetherington
  2022-12-05 12:33                 ` Takashi Iwai
  0 siblings, 1 reply; 13+ messages in thread
From: Carl Hetherington @ 2022-12-05 11:59 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel

Hi Takashi,

> > Can you see any problems with that?  In the application code I do need
> > to re-try the snd_pcm_prepare() if one fails with -EPIPE, but with this
> > undo step the second snd_pcm_prepare() is able to recover the endpoint
> > states, instead of hitting this problem where it tries to start things
> > that are STOPPING, but also won't set things to STOPPED because
> > stop_operating is false.
>
> Setting the stop_operating unconditionally there doesn't look right,
> as there may be other error types not only the pending XRUN.
>
> The problem is rather specific to USB audio driver that tries to start
> the stream at PCM prepare, so it's better to handle in USB audio
> driver itself.  That is, when -EPIPE is returned from
> start_endpoints() at prepare, the driver does some action.
>
> I can see two options:
> - Issue snd_pcm_stop_xrun() when start_endpoints() returns -EPIPE
> - Repeat the prepare after the sync at snd_usb_pcm_prepare()
>
> The former would require a bit of change in snd_pcm_stop_xrun(), and
> it relies on the application retrying the prepare.  The latter would
> be more self-contained.  I attached two patches (totally untested) for
> both scenarios.
>
> My gut feeling is for the latter solution, but this needs
> verification.

The latter solution seems to fix our problem perfectly!  Thank you so
much!

Is there anything I can/should do to help get the change merged?

Kind regards,
Carl

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-12-05 11:59               ` Carl Hetherington
@ 2022-12-05 12:33                 ` Takashi Iwai
  2022-12-05 18:53                   ` Carl Hetherington
  0 siblings, 1 reply; 13+ messages in thread
From: Takashi Iwai @ 2022-12-05 12:33 UTC (permalink / raw)
  To: Carl Hetherington; +Cc: alsa-devel

On Mon, 05 Dec 2022 12:59:54 +0100,
Carl Hetherington wrote:
> 
> Hi Takashi,
> 
> > > Can you see any problems with that?  In the application code I do need
> > > to re-try the snd_pcm_prepare() if one fails with -EPIPE, but with this
> > > undo step the second snd_pcm_prepare() is able to recover the endpoint
> > > states, instead of hitting this problem where it tries to start things
> > > that are STOPPING, but also won't set things to STOPPED because
> > > stop_operating is false.
> >
> > Setting the stop_operating unconditionally there doesn't look right,
> > as there may be other error types not only the pending XRUN.
> >
> > The problem is rather specific to USB audio driver that tries to start
> > the stream at PCM prepare, so it's better to handle in USB audio
> > driver itself.  That is, when -EPIPE is returned from
> > start_endpoints() at prepare, the driver does some action.
> >
> > I can see two options:
> > - Issue snd_pcm_stop_xrun() when start_endpoints() returns -EPIPE
> > - Repeat the prepare after the sync at snd_usb_pcm_prepare()
> >
> > The former would require a bit of change in snd_pcm_stop_xrun(), and
> > it relies on the application retrying the prepare.  The latter would
> > be more self-contained.  I attached two patches (totally untested) for
> > both scenarios.
> >
> > My gut feeling is for the latter solution, but this needs
> > verification.
> 
> The latter solution seems to fix our problem perfectly!  Thank you so
> much!
> 
> Is there anything I can/should do to help get the change merged?

I'm going to submit fix patches and put you to Cc.  I believe that the
former patches are also valid, although it doesn't influence in your
case, so they'll be included.

The fixes will be likely included in 6.2-rc1.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-12-05 12:33                 ` Takashi Iwai
@ 2022-12-05 18:53                   ` Carl Hetherington
  2022-12-06  7:51                     ` Takashi Iwai
  0 siblings, 1 reply; 13+ messages in thread
From: Carl Hetherington @ 2022-12-05 18:53 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: alsa-devel

Hi Takashi,

> I'm going to submit fix patches and put you to Cc.  I believe that the
> former patches are also valid, although it doesn't influence in your
> case, so they'll be included.
>
> The fixes will be likely included in 6.2-rc1.

Thank you, that is great!

I should have mentioned before that I actually tested the patch against
5.15.48 (where it doesn't quite apply cleanly).  Sorry for the
confusion.

If it would help to test against a newer kernel, let me know and I can
give it a try.

All the best,
Carl


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Query about xrun on usb/pcm
  2022-12-05 18:53                   ` Carl Hetherington
@ 2022-12-06  7:51                     ` Takashi Iwai
  0 siblings, 0 replies; 13+ messages in thread
From: Takashi Iwai @ 2022-12-06  7:51 UTC (permalink / raw)
  To: Carl Hetherington; +Cc: alsa-devel

On Mon, 05 Dec 2022 19:53:26 +0100,
Carl Hetherington wrote:
> 
> Hi Takashi,
> 
> > I'm going to submit fix patches and put you to Cc.  I believe that the
> > former patches are also valid, although it doesn't influence in your
> > case, so they'll be included.
> >
> > The fixes will be likely included in 6.2-rc1.
> 
> Thank you, that is great!
> 
> I should have mentioned before that I actually tested the patch against
> 5.15.48 (where it doesn't quite apply cleanly).  Sorry for the
> confusion.
> 
> If it would help to test against a newer kernel, let me know and I can
> give it a try.

It's OK with 5.15.x, as the endpoint management hasn't been changed
much since then.


thanks,

Takashi

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-12-06  7:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-21 21:14 Query about xrun on usb/pcm Carl Hetherington
2022-11-22  7:25 ` Takashi Iwai
2022-11-22 11:16   ` Carl Hetherington
2022-11-22 14:19     ` Takashi Iwai
2022-11-22 14:22       ` Takashi Iwai
2022-11-28 22:51       ` Carl Hetherington
2022-11-29  7:45         ` Takashi Iwai
2022-11-30 22:37           ` Carl Hetherington
2022-12-01  7:47             ` Takashi Iwai
2022-12-05 11:59               ` Carl Hetherington
2022-12-05 12:33                 ` Takashi Iwai
2022-12-05 18:53                   ` Carl Hetherington
2022-12-06  7:51                     ` Takashi Iwai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.