All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
@ 2021-09-10 14:10 Takashi Iwai
  2021-09-10 17:04 ` Pierre-Louis Bossart
  2021-09-13  2:36 ` Jens Axboe
  0 siblings, 2 replies; 8+ messages in thread
From: Takashi Iwai @ 2021-09-10 14:10 UTC (permalink / raw)
  To: alsa-devel; +Cc: Jens Axboe

The position reporting on Intel Skylake and later chips via
azx_get_pos_skl() contains a udelay(20) call for the capture streams.
A call for this alone doesn't sound too harmful.  However, as the
pointer PCM ops is one of the hottest path in the PCM operations --
especially for the timer-scheduled operations like PulseAudio -- such
a delay hogs CPU usage significantly in the total performance.

The function there was taken from the original code in ASoC SST
Skylake driver blindly.  The udelay() is a workaround for the case
where the reported position is behind the period boundary at the
timing triggered from interrupts; applications often expect that the
full data is available for the whole period when returned (and also
that's the definition of the ALSA PCM period).

OTOH, HD-audio (legacy) driver has already some workarounds for the
delayed position reporting due to its relatively large FIFO, such as
the BDL position adjustment and the delayed period-elapsed call in the
work.  That said, the udelay() is almost superfluous for HD-audio
driver unlike SST, and we can drop the udelay().

Though, the current code doesn't guarantee the full period readiness
as mentioned in the above, but rather it checks the wallclock and
detects the unexpected jump.  That's one missing piece, and the drop
of udelay() needs a bit more sanity checks for the delayed handling.

This patch implements those: the drop of udelay() call in
azx_get_pos_skl() and the more proper check of hwptr in
azx_position_ok().  The latter change is applied only for the case
where the stream is running in the normal mode without
no_period_wakeup flag.  When no_period_wakeup is set, it essentially
ignores the period handling and rather concentrates only on the
current position; which implies that we don't need to care about the
period boundary at all.

Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 3aa432d814a2..faeeeb923d5e 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
  * the update-IRQ timing.  The IRQ is issued before actually the
  * data is processed.  So, we need to process it afterwords in a
  * workqueue.
+ *
+ * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
  */
 static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
 {
 	struct snd_pcm_substream *substream = azx_dev->core.substream;
+	struct snd_pcm_runtime *runtime = substream->runtime;
 	int stream = substream->stream;
 	u32 wallclk;
 	unsigned int pos;
+	snd_pcm_uframes_t hwptr, target;
 
 	wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
 	if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
@@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
 		/* NG - it's below the first next period boundary */
 		return chip->bdl_pos_adj ? 0 : -1;
 	azx_dev->core.start_wallclk += wallclk;
+
+	if (azx_dev->core.no_period_wakeup)
+		return 1; /* OK, no need to check period boundary */
+
+	if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
+		return 1; /* OK, already in hwptr updating process */
+
+	/* check whether the period gets really elapsed */
+	pos = bytes_to_frames(runtime, pos);
+	hwptr = runtime->hw_ptr_base + pos;
+	if (hwptr < runtime->status->hw_ptr)
+		hwptr += runtime->buffer_size;
+	target = runtime->hw_ptr_interrupt + runtime->period_size;
+	if (hwptr < target) {
+		/* too early wakeup, process it later */
+		return chip->bdl_pos_adj ? 0 : -1;
+	}
+
 	return 1; /* OK, it's fine */
 }
 
@@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
 	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
 		return azx_skl_get_dpib_pos(chip, azx_dev);
 
-	/* For capture, we need to read posbuf, but it requires a delay
-	 * for the possible boundary overlap; the read of DPIB fetches the
-	 * actual posbuf
-	 */
-	udelay(20);
+	/* read of DPIB fetches the actual posbuf */
 	azx_skl_get_dpib_pos(chip, azx_dev);
 	return azx_get_pos_posbuf(chip, azx_dev);
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-10 14:10 [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting Takashi Iwai
@ 2021-09-10 17:04 ` Pierre-Louis Bossart
  2021-09-13  5:48   ` Takashi Iwai
  2021-09-13  2:36 ` Jens Axboe
  1 sibling, 1 reply; 8+ messages in thread
From: Pierre-Louis Bossart @ 2021-09-10 17:04 UTC (permalink / raw)
  To: Takashi Iwai, alsa-devel; +Cc: Jens Axboe



On 9/10/21 9:10 AM, Takashi Iwai wrote:
> The position reporting on Intel Skylake and later chips via
> azx_get_pos_skl() contains a udelay(20) call for the capture streams.
> A call for this alone doesn't sound too harmful.  However, as the
> pointer PCM ops is one of the hottest path in the PCM operations --
> especially for the timer-scheduled operations like PulseAudio -- such
> a delay hogs CPU usage significantly in the total performance.
> 
> The function there was taken from the original code in ASoC SST
> Skylake driver blindly.  The udelay() is a workaround for the case
> where the reported position is behind the period boundary at the
> timing triggered from interrupts; applications often expect that the
> full data is available for the whole period when returned (and also
> that's the definition of the ALSA PCM period).

that initial work-around from the Intel SST driver does not seem to be
legit in the first place, when I checked with hardware folks recently no
one understands why there are delays and special cases for capture. The
only requirement wrt. DPIB is that the DDR update is valid across VC0
and VC1, while the DPIB registers are only valid with VC0. For SOF we
don't know of any VC1 use so will default to the DPIB vendor-specific
registers.

See https://github.com/thesofproject/linux/pull/3143 for my WIP fixes
for SOF.

I don't have the time to look at this specific patch today but wanted to
make sure you are aware of my on-going fixes.

Note that the use of DPIB works best if you don't use the IOC interrupt.
when the interrupt is thrown, there is likely a delay for the DPIB
information to be refreshed.

> 
> OTOH, HD-audio (legacy) driver has already some workarounds for the
> delayed position reporting due to its relatively large FIFO, such as
> the BDL position adjustment and the delayed period-elapsed call in the
> work.  That said, the udelay() is almost superfluous for HD-audio
> driver unlike SST, and we can drop the udelay().
> 
> Though, the current code doesn't guarantee the full period readiness
> as mentioned in the above, but rather it checks the wallclock and
> detects the unexpected jump.  That's one missing piece, and the drop
> of udelay() needs a bit more sanity checks for the delayed handling.
> 
> This patch implements those: the drop of udelay() call in
> azx_get_pos_skl() and the more proper check of hwptr in
> azx_position_ok().  The latter change is applied only for the case
> where the stream is running in the normal mode without
> no_period_wakeup flag.  When no_period_wakeup is set, it essentially
> ignores the period handling and rather concentrates only on the
> current position; which implies that we don't need to care about the
> period boundary at all.
> 
> Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
> Reported-by: Jens Axboe <axboe@kernel.dk>
> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> ---
>  sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)
> 
> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> index 3aa432d814a2..faeeeb923d5e 100644
> --- a/sound/pci/hda/hda_intel.c
> +++ b/sound/pci/hda/hda_intel.c
> @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
>   * the update-IRQ timing.  The IRQ is issued before actually the
>   * data is processed.  So, we need to process it afterwords in a
>   * workqueue.
> + *
> + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
>   */
>  static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
>  {
>  	struct snd_pcm_substream *substream = azx_dev->core.substream;
> +	struct snd_pcm_runtime *runtime = substream->runtime;
>  	int stream = substream->stream;
>  	u32 wallclk;
>  	unsigned int pos;
> +	snd_pcm_uframes_t hwptr, target;
>  
>  	wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
>  	if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
> @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
>  		/* NG - it's below the first next period boundary */
>  		return chip->bdl_pos_adj ? 0 : -1;
>  	azx_dev->core.start_wallclk += wallclk;
> +
> +	if (azx_dev->core.no_period_wakeup)
> +		return 1; /* OK, no need to check period boundary */
> +
> +	if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
> +		return 1; /* OK, already in hwptr updating process */
> +
> +	/* check whether the period gets really elapsed */
> +	pos = bytes_to_frames(runtime, pos);
> +	hwptr = runtime->hw_ptr_base + pos;
> +	if (hwptr < runtime->status->hw_ptr)
> +		hwptr += runtime->buffer_size;
> +	target = runtime->hw_ptr_interrupt + runtime->period_size;
> +	if (hwptr < target) {
> +		/* too early wakeup, process it later */
> +		return chip->bdl_pos_adj ? 0 : -1;
> +	}
> +
>  	return 1; /* OK, it's fine */
>  }
>  
> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
>  	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
>  		return azx_skl_get_dpib_pos(chip, azx_dev);
>  
> -	/* For capture, we need to read posbuf, but it requires a delay
> -	 * for the possible boundary overlap; the read of DPIB fetches the
> -	 * actual posbuf
> -	 */
> -	udelay(20);
> +	/* read of DPIB fetches the actual posbuf */
>  	azx_skl_get_dpib_pos(chip, azx_dev);
>  	return azx_get_pos_posbuf(chip, azx_dev);
>  }
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-10 14:10 [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting Takashi Iwai
  2021-09-10 17:04 ` Pierre-Louis Bossart
@ 2021-09-13  2:36 ` Jens Axboe
  1 sibling, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2021-09-13  2:36 UTC (permalink / raw)
  To: Takashi Iwai, alsa-devel

On 9/10/21 8:10 AM, Takashi Iwai wrote:
> The position reporting on Intel Skylake and later chips via
> azx_get_pos_skl() contains a udelay(20) call for the capture streams.
> A call for this alone doesn't sound too harmful.  However, as the
> pointer PCM ops is one of the hottest path in the PCM operations --
> especially for the timer-scheduled operations like PulseAudio -- such
> a delay hogs CPU usage significantly in the total performance.
> 
> The function there was taken from the original code in ASoC SST
> Skylake driver blindly.  The udelay() is a workaround for the case
> where the reported position is behind the period boundary at the
> timing triggered from interrupts; applications often expect that the
> full data is available for the whole period when returned (and also
> that's the definition of the ALSA PCM period).
> 
> OTOH, HD-audio (legacy) driver has already some workarounds for the
> delayed position reporting due to its relatively large FIFO, such as
> the BDL position adjustment and the delayed period-elapsed call in the
> work.  That said, the udelay() is almost superfluous for HD-audio
> driver unlike SST, and we can drop the udelay().
> 
> Though, the current code doesn't guarantee the full period readiness
> as mentioned in the above, but rather it checks the wallclock and
> detects the unexpected jump.  That's one missing piece, and the drop
> of udelay() needs a bit more sanity checks for the delayed handling.
> 
> This patch implements those: the drop of udelay() call in
> azx_get_pos_skl() and the more proper check of hwptr in
> azx_position_ok().  The latter change is applied only for the case
> where the stream is running in the normal mode without
> no_period_wakeup flag.  When no_period_wakeup is set, it essentially
> ignores the period handling and rather concentrates only on the
> current position; which implies that we don't need to care about the
> period boundary at all.

I tested this on top of 5.15-rc1, and it seems like a massive
improvement. No longer have stalled or slowed down nestopia, seems
smooth all the time. I'll report back in a few days, just in case.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-10 17:04 ` Pierre-Louis Bossart
@ 2021-09-13  5:48   ` Takashi Iwai
  2021-09-21 22:18     ` Pierre-Louis Bossart
  0 siblings, 1 reply; 8+ messages in thread
From: Takashi Iwai @ 2021-09-13  5:48 UTC (permalink / raw)
  To: Pierre-Louis Bossart; +Cc: Jens Axboe, alsa-devel

On Fri, 10 Sep 2021 19:04:37 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> 
> On 9/10/21 9:10 AM, Takashi Iwai wrote:
> > The position reporting on Intel Skylake and later chips via
> > azx_get_pos_skl() contains a udelay(20) call for the capture streams.
> > A call for this alone doesn't sound too harmful.  However, as the
> > pointer PCM ops is one of the hottest path in the PCM operations --
> > especially for the timer-scheduled operations like PulseAudio -- such
> > a delay hogs CPU usage significantly in the total performance.
> > 
> > The function there was taken from the original code in ASoC SST
> > Skylake driver blindly.  The udelay() is a workaround for the case
> > where the reported position is behind the period boundary at the
> > timing triggered from interrupts; applications often expect that the
> > full data is available for the whole period when returned (and also
> > that's the definition of the ALSA PCM period).
> 
> that initial work-around from the Intel SST driver does not seem to be
> legit in the first place, when I checked with hardware folks recently no
> one understands why there are delays and special cases for capture. The
> only requirement wrt. DPIB is that the DDR update is valid across VC0
> and VC1, while the DPIB registers are only valid with VC0. For SOF we
> don't know of any VC1 use so will default to the DPIB vendor-specific
> registers.

What are those VC0 and VC1 registers?  I can't find the definitions in
the code, so I assume that none of ALSA/ASoC drivers use VC1.

> See https://github.com/thesofproject/linux/pull/3143 for my WIP fixes
> for SOF.
> 
> I don't have the time to look at this specific patch today but wanted to
> make sure you are aware of my on-going fixes.
> 
> Note that the use of DPIB works best if you don't use the IOC interrupt.
> when the interrupt is thrown, there is likely a delay for the DPIB
> information to be refreshed.

Thanks for the information!  The delay could be the reason of the
udelay(), and that's superfluous as mentioned in the commit.

So the remaining question seems to be which method is a better
approach for the capture stream: DPIB or posbuf.  I kept the posbuf
just for conservatism, but judging from your comment, we may use DPIB
for both directions safely?

In anyway, the additional mechanism to check the delayed position
report in this patch can be kept no matter which way (DPIB or posbuf)
is used.


Takashi

> 
> > 
> > OTOH, HD-audio (legacy) driver has already some workarounds for the
> > delayed position reporting due to its relatively large FIFO, such as
> > the BDL position adjustment and the delayed period-elapsed call in the
> > work.  That said, the udelay() is almost superfluous for HD-audio
> > driver unlike SST, and we can drop the udelay().
> > 
> > Though, the current code doesn't guarantee the full period readiness
> > as mentioned in the above, but rather it checks the wallclock and
> > detects the unexpected jump.  That's one missing piece, and the drop
> > of udelay() needs a bit more sanity checks for the delayed handling.
> > 
> > This patch implements those: the drop of udelay() call in
> > azx_get_pos_skl() and the more proper check of hwptr in
> > azx_position_ok().  The latter change is applied only for the case
> > where the stream is running in the normal mode without
> > no_period_wakeup flag.  When no_period_wakeup is set, it essentially
> > ignores the period handling and rather concentrates only on the
> > current position; which implies that we don't need to care about the
> > period boundary at all.
> > 
> > Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
> > Reported-by: Jens Axboe <axboe@kernel.dk>
> > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > ---
> >  sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
> >  1 file changed, 23 insertions(+), 5 deletions(-)
> > 
> > diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> > index 3aa432d814a2..faeeeb923d5e 100644
> > --- a/sound/pci/hda/hda_intel.c
> > +++ b/sound/pci/hda/hda_intel.c
> > @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
> >   * the update-IRQ timing.  The IRQ is issued before actually the
> >   * data is processed.  So, we need to process it afterwords in a
> >   * workqueue.
> > + *
> > + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
> >   */
> >  static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >  {
> >  	struct snd_pcm_substream *substream = azx_dev->core.substream;
> > +	struct snd_pcm_runtime *runtime = substream->runtime;
> >  	int stream = substream->stream;
> >  	u32 wallclk;
> >  	unsigned int pos;
> > +	snd_pcm_uframes_t hwptr, target;
> >  
> >  	wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
> >  	if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
> > @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >  		/* NG - it's below the first next period boundary */
> >  		return chip->bdl_pos_adj ? 0 : -1;
> >  	azx_dev->core.start_wallclk += wallclk;
> > +
> > +	if (azx_dev->core.no_period_wakeup)
> > +		return 1; /* OK, no need to check period boundary */
> > +
> > +	if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
> > +		return 1; /* OK, already in hwptr updating process */
> > +
> > +	/* check whether the period gets really elapsed */
> > +	pos = bytes_to_frames(runtime, pos);
> > +	hwptr = runtime->hw_ptr_base + pos;
> > +	if (hwptr < runtime->status->hw_ptr)
> > +		hwptr += runtime->buffer_size;
> > +	target = runtime->hw_ptr_interrupt + runtime->period_size;
> > +	if (hwptr < target) {
> > +		/* too early wakeup, process it later */
> > +		return chip->bdl_pos_adj ? 0 : -1;
> > +	}
> > +
> >  	return 1; /* OK, it's fine */
> >  }
> >  
> > @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
> >  	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
> >  		return azx_skl_get_dpib_pos(chip, azx_dev);
> >  
> > -	/* For capture, we need to read posbuf, but it requires a delay
> > -	 * for the possible boundary overlap; the read of DPIB fetches the
> > -	 * actual posbuf
> > -	 */
> > -	udelay(20);
> > +	/* read of DPIB fetches the actual posbuf */
> >  	azx_skl_get_dpib_pos(chip, azx_dev);
> >  	return azx_get_pos_posbuf(chip, azx_dev);
> >  }
> > 
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-13  5:48   ` Takashi Iwai
@ 2021-09-21 22:18     ` Pierre-Louis Bossart
  2021-09-28  9:01       ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: Pierre-Louis Bossart @ 2021-09-21 22:18 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Jens Axboe, alsa-devel

Sorry Takashi for the delay, I missed your reply.

>>> The position reporting on Intel Skylake and later chips via
>>> azx_get_pos_skl() contains a udelay(20) call for the capture streams.
>>> A call for this alone doesn't sound too harmful.  However, as the
>>> pointer PCM ops is one of the hottest path in the PCM operations --
>>> especially for the timer-scheduled operations like PulseAudio -- such
>>> a delay hogs CPU usage significantly in the total performance.
>>>
>>> The function there was taken from the original code in ASoC SST
>>> Skylake driver blindly.  The udelay() is a workaround for the case
>>> where the reported position is behind the period boundary at the
>>> timing triggered from interrupts; applications often expect that the
>>> full data is available for the whole period when returned (and also
>>> that's the definition of the ALSA PCM period).
>>
>> that initial work-around from the Intel SST driver does not seem to be
>> legit in the first place, when I checked with hardware folks recently no
>> one understands why there are delays and special cases for capture. The
>> only requirement wrt. DPIB is that the DDR update is valid across VC0
>> and VC1, while the DPIB registers are only valid with VC0. For SOF we
>> don't know of any VC1 use so will default to the DPIB vendor-specific
>> registers.
> 
> What are those VC0 and VC1 registers?  I can't find the definitions in
> the code, so I assume that none of ALSA/ASoC drivers use VC1.

These are PCI concepts/capabilities. VC stands for "Virtual Channel",
which are mapped to Traffic Class (TC). These registers are typically
set by the BIOS AFAIK. The point is that if VC1 is enabled only the DPIB
updates are valid, the vendor-specific will report information can be
misleading.

The recommendation from hardware folks is to use DPIB updates in DDR,
which are known to work across both VC0 and VC1.

For SOF, we do know VC1 is never used so we'll use the vendor-specific
registers.

>> See https://github.com/thesofproject/linux/pull/3143 for my WIP fixes
>> for SOF.
>>
>> I don't have the time to look at this specific patch today but wanted to
>> make sure you are aware of my on-going fixes.
>>
>> Note that the use of DPIB works best if you don't use the IOC interrupt.
>> when the interrupt is thrown, there is likely a delay for the DPIB
>> information to be refreshed.
> 
> Thanks for the information!  The delay could be the reason of the
> udelay(), and that's superfluous as mentioned in the commit.
> 
> So the remaining question seems to be which method is a better
> approach for the capture stream: DPIB or posbuf.  I kept the posbuf
> just for conservatism, but judging from your comment, we may use DPIB
> for both directions safely?

sorry you lost me. Isn't DPIB updates in DDR precisely what posbuf reports?

I think what you need to use is use azx_get_pos_posbuf(chip, azx_dev);
unconditionally, both for capture and playback, and remove the use of
the skylake specific stuff and the delay.

> In anyway, the additional mechanism to check the delayed position
> report in this patch can be kept no matter which way (DPIB or posbuf)
> is used.

Agree!

> 
> 
> Takashi
> 
>>
>>>
>>> OTOH, HD-audio (legacy) driver has already some workarounds for the
>>> delayed position reporting due to its relatively large FIFO, such as
>>> the BDL position adjustment and the delayed period-elapsed call in the
>>> work.  That said, the udelay() is almost superfluous for HD-audio
>>> driver unlike SST, and we can drop the udelay().
>>>
>>> Though, the current code doesn't guarantee the full period readiness
>>> as mentioned in the above, but rather it checks the wallclock and
>>> detects the unexpected jump.  That's one missing piece, and the drop
>>> of udelay() needs a bit more sanity checks for the delayed handling.
>>>
>>> This patch implements those: the drop of udelay() call in
>>> azx_get_pos_skl() and the more proper check of hwptr in
>>> azx_position_ok().  The latter change is applied only for the case
>>> where the stream is running in the normal mode without
>>> no_period_wakeup flag.  When no_period_wakeup is set, it essentially
>>> ignores the period handling and rather concentrates only on the
>>> current position; which implies that we don't need to care about the
>>> period boundary at all.
>>>
>>> Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
>>> Reported-by: Jens Axboe <axboe@kernel.dk>
>>> Signed-off-by: Takashi Iwai <tiwai@suse.de>
>>> ---
>>>  sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
>>>  1 file changed, 23 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
>>> index 3aa432d814a2..faeeeb923d5e 100644
>>> --- a/sound/pci/hda/hda_intel.c
>>> +++ b/sound/pci/hda/hda_intel.c
>>> @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
>>>   * the update-IRQ timing.  The IRQ is issued before actually the
>>>   * data is processed.  So, we need to process it afterwords in a
>>>   * workqueue.
>>> + *
>>> + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
>>>   */
>>>  static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
>>>  {
>>>  	struct snd_pcm_substream *substream = azx_dev->core.substream;
>>> +	struct snd_pcm_runtime *runtime = substream->runtime;
>>>  	int stream = substream->stream;
>>>  	u32 wallclk;
>>>  	unsigned int pos;
>>> +	snd_pcm_uframes_t hwptr, target;
>>>  
>>>  	wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
>>>  	if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
>>> @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
>>>  		/* NG - it's below the first next period boundary */
>>>  		return chip->bdl_pos_adj ? 0 : -1;
>>>  	azx_dev->core.start_wallclk += wallclk;
>>> +
>>> +	if (azx_dev->core.no_period_wakeup)
>>> +		return 1; /* OK, no need to check period boundary */
>>> +
>>> +	if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
>>> +		return 1; /* OK, already in hwptr updating process */
>>> +
>>> +	/* check whether the period gets really elapsed */
>>> +	pos = bytes_to_frames(runtime, pos);
>>> +	hwptr = runtime->hw_ptr_base + pos;
>>> +	if (hwptr < runtime->status->hw_ptr)
>>> +		hwptr += runtime->buffer_size;
>>> +	target = runtime->hw_ptr_interrupt + runtime->period_size;
>>> +	if (hwptr < target) {
>>> +		/* too early wakeup, process it later */
>>> +		return chip->bdl_pos_adj ? 0 : -1;
>>> +	}
>>> +
>>>  	return 1; /* OK, it's fine */
>>>  }
>>>  
>>> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
>>>  	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
>>>  		return azx_skl_get_dpib_pos(chip, azx_dev);
>>>  
>>> -	/* For capture, we need to read posbuf, but it requires a delay
>>> -	 * for the possible boundary overlap; the read of DPIB fetches the
>>> -	 * actual posbuf
>>> -	 */
>>> -	udelay(20);
>>> +	/* read of DPIB fetches the actual posbuf */
>>>  	azx_skl_get_dpib_pos(chip, azx_dev);
>>>  	return azx_get_pos_posbuf(chip, azx_dev);
>>>  }
>>>
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-21 22:18     ` Pierre-Louis Bossart
@ 2021-09-28  9:01       ` Takashi Iwai
  2021-09-28 15:00         ` Pierre-Louis Bossart
  0 siblings, 1 reply; 8+ messages in thread
From: Takashi Iwai @ 2021-09-28  9:01 UTC (permalink / raw)
  To: Pierre-Louis Bossart; +Cc: Jens Axboe, alsa-devel

On Wed, 22 Sep 2021 00:18:14 +0200,
Pierre-Louis Bossart wrote:
> 
> Sorry Takashi for the delay, I missed your reply.
> 
> >>> The position reporting on Intel Skylake and later chips via
> >>> azx_get_pos_skl() contains a udelay(20) call for the capture streams.
> >>> A call for this alone doesn't sound too harmful.  However, as the
> >>> pointer PCM ops is one of the hottest path in the PCM operations --
> >>> especially for the timer-scheduled operations like PulseAudio -- such
> >>> a delay hogs CPU usage significantly in the total performance.
> >>>
> >>> The function there was taken from the original code in ASoC SST
> >>> Skylake driver blindly.  The udelay() is a workaround for the case
> >>> where the reported position is behind the period boundary at the
> >>> timing triggered from interrupts; applications often expect that the
> >>> full data is available for the whole period when returned (and also
> >>> that's the definition of the ALSA PCM period).
> >>
> >> that initial work-around from the Intel SST driver does not seem to be
> >> legit in the first place, when I checked with hardware folks recently no
> >> one understands why there are delays and special cases for capture. The
> >> only requirement wrt. DPIB is that the DDR update is valid across VC0
> >> and VC1, while the DPIB registers are only valid with VC0. For SOF we
> >> don't know of any VC1 use so will default to the DPIB vendor-specific
> >> registers.
> > 
> > What are those VC0 and VC1 registers?  I can't find the definitions in
> > the code, so I assume that none of ALSA/ASoC drivers use VC1.
> 
> These are PCI concepts/capabilities. VC stands for "Virtual Channel",
> which are mapped to Traffic Class (TC). These registers are typically
> set by the BIOS AFAIK. The point is that if VC1 is enabled only the DPIB
> updates are valid, the vendor-specific will report information can be
> misleading.
> 
> The recommendation from hardware folks is to use DPIB updates in DDR,
> which are known to work across both VC0 and VC1.
> 
> For SOF, we do know VC1 is never used so we'll use the vendor-specific
> registers.

OK, thanks for clarification.

> >> See https://github.com/thesofproject/linux/pull/3143 for my WIP fixes
> >> for SOF.
> >>
> >> I don't have the time to look at this specific patch today but wanted to
> >> make sure you are aware of my on-going fixes.
> >>
> >> Note that the use of DPIB works best if you don't use the IOC interrupt.
> >> when the interrupt is thrown, there is likely a delay for the DPIB
> >> information to be refreshed.
> > 
> > Thanks for the information!  The delay could be the reason of the
> > udelay(), and that's superfluous as mentioned in the commit.
> > 
> > So the remaining question seems to be which method is a better
> > approach for the capture stream: DPIB or posbuf.  I kept the posbuf
> > just for conservatism, but judging from your comment, we may use DPIB
> > for both directions safely?
> 
> sorry you lost me. Isn't DPIB updates in DDR precisely what posbuf reports?
> 
> I think what you need to use is use azx_get_pos_posbuf(chip, azx_dev);
> unconditionally, both for capture and playback, and remove the use of
> the skylake specific stuff and the delay.

When I measured, I saw a slight difference between values in DPIB and
posbuf, so I wonder which is actually more accurate.  The latter is
sometimes a bit behind the former.

If the posbuf is more correct in the sense for the PCM pointer, we can
simply move back to the posbuf like other platforms.


thanks,

Takashi


> > In anyway, the additional mechanism to check the delayed position
> > report in this patch can be kept no matter which way (DPIB or posbuf)
> > is used.
> 
> Agree!
> 
> > 
> > 
> > Takashi
> > 
> >>
> >>>
> >>> OTOH, HD-audio (legacy) driver has already some workarounds for the
> >>> delayed position reporting due to its relatively large FIFO, such as
> >>> the BDL position adjustment and the delayed period-elapsed call in the
> >>> work.  That said, the udelay() is almost superfluous for HD-audio
> >>> driver unlike SST, and we can drop the udelay().
> >>>
> >>> Though, the current code doesn't guarantee the full period readiness
> >>> as mentioned in the above, but rather it checks the wallclock and
> >>> detects the unexpected jump.  That's one missing piece, and the drop
> >>> of udelay() needs a bit more sanity checks for the delayed handling.
> >>>
> >>> This patch implements those: the drop of udelay() call in
> >>> azx_get_pos_skl() and the more proper check of hwptr in
> >>> azx_position_ok().  The latter change is applied only for the case
> >>> where the stream is running in the normal mode without
> >>> no_period_wakeup flag.  When no_period_wakeup is set, it essentially
> >>> ignores the period handling and rather concentrates only on the
> >>> current position; which implies that we don't need to care about the
> >>> period boundary at all.
> >>>
> >>> Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
> >>> Reported-by: Jens Axboe <axboe@kernel.dk>
> >>> Signed-off-by: Takashi Iwai <tiwai@suse.de>
> >>> ---
> >>>  sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
> >>>  1 file changed, 23 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
> >>> index 3aa432d814a2..faeeeb923d5e 100644
> >>> --- a/sound/pci/hda/hda_intel.c
> >>> +++ b/sound/pci/hda/hda_intel.c
> >>> @@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
> >>>   * the update-IRQ timing.  The IRQ is issued before actually the
> >>>   * data is processed.  So, we need to process it afterwords in a
> >>>   * workqueue.
> >>> + *
> >>> + * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
> >>>   */
> >>>  static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>>  {
> >>>  	struct snd_pcm_substream *substream = azx_dev->core.substream;
> >>> +	struct snd_pcm_runtime *runtime = substream->runtime;
> >>>  	int stream = substream->stream;
> >>>  	u32 wallclk;
> >>>  	unsigned int pos;
> >>> +	snd_pcm_uframes_t hwptr, target;
> >>>  
> >>>  	wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
> >>>  	if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
> >>> @@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
> >>>  		/* NG - it's below the first next period boundary */
> >>>  		return chip->bdl_pos_adj ? 0 : -1;
> >>>  	azx_dev->core.start_wallclk += wallclk;
> >>> +
> >>> +	if (azx_dev->core.no_period_wakeup)
> >>> +		return 1; /* OK, no need to check period boundary */
> >>> +
> >>> +	if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
> >>> +		return 1; /* OK, already in hwptr updating process */
> >>> +
> >>> +	/* check whether the period gets really elapsed */
> >>> +	pos = bytes_to_frames(runtime, pos);
> >>> +	hwptr = runtime->hw_ptr_base + pos;
> >>> +	if (hwptr < runtime->status->hw_ptr)
> >>> +		hwptr += runtime->buffer_size;
> >>> +	target = runtime->hw_ptr_interrupt + runtime->period_size;
> >>> +	if (hwptr < target) {
> >>> +		/* too early wakeup, process it later */
> >>> +		return chip->bdl_pos_adj ? 0 : -1;
> >>> +	}
> >>> +
> >>>  	return 1; /* OK, it's fine */
> >>>  }
> >>>  
> >>> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
> >>>  	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
> >>>  		return azx_skl_get_dpib_pos(chip, azx_dev);
> >>>  
> >>> -	/* For capture, we need to read posbuf, but it requires a delay
> >>> -	 * for the possible boundary overlap; the read of DPIB fetches the
> >>> -	 * actual posbuf
> >>> -	 */
> >>> -	udelay(20);
> >>> +	/* read of DPIB fetches the actual posbuf */
> >>>  	azx_skl_get_dpib_pos(chip, azx_dev);
> >>>  	return azx_get_pos_posbuf(chip, azx_dev);
> >>>  }
> >>>
> >>
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-28  9:01       ` Takashi Iwai
@ 2021-09-28 15:00         ` Pierre-Louis Bossart
  2021-09-29  7:14           ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: Pierre-Louis Bossart @ 2021-09-28 15:00 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Jens Axboe, alsa-devel


>> I think what you need to use is use azx_get_pos_posbuf(chip, azx_dev);
>> unconditionally, both for capture and playback, and remove the use of
>> the skylake specific stuff and the delay.
> 
> When I measured, I saw a slight difference between values in DPIB and
> posbuf, so I wonder which is actually more accurate.  The latter is
> sometimes a bit behind the former.
> 
> If the posbuf is more correct in the sense for the PCM pointer, we can
> simply move back to the posbuf like other platforms.

DPIB registers are known to be updated 'too early' in the case of VC1
traffic, the recommendation is to use posbuf as a 'safer' alternative.
And it's certainly 'more correct' that the current work-arounds which
apply different solutions on capture and playback for no apparent reason.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-28 15:00         ` Pierre-Louis Bossart
@ 2021-09-29  7:14           ` Takashi Iwai
  0 siblings, 0 replies; 8+ messages in thread
From: Takashi Iwai @ 2021-09-29  7:14 UTC (permalink / raw)
  To: Pierre-Louis Bossart; +Cc: Jens Axboe, alsa-devel

On Tue, 28 Sep 2021 17:00:32 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> >> I think what you need to use is use azx_get_pos_posbuf(chip, azx_dev);
> >> unconditionally, both for capture and playback, and remove the use of
> >> the skylake specific stuff and the delay.
> > 
> > When I measured, I saw a slight difference between values in DPIB and
> > posbuf, so I wonder which is actually more accurate.  The latter is
> > sometimes a bit behind the former.
> > 
> > If the posbuf is more correct in the sense for the PCM pointer, we can
> > simply move back to the posbuf like other platforms.
> 
> DPIB registers are known to be updated 'too early' in the case of VC1
> traffic, the recommendation is to use posbuf as a 'safer' alternative.
> And it's certainly 'more correct' that the current work-arounds which
> apply different solutions on capture and playback for no apparent reason.

OK, thanks for clarification.  I'll send a v2 patch set to reflect
that.


Takashi

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-09-29  7:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-10 14:10 [PATCH] ALSA: hda: Reduce udelay() at SKL+ position reporting Takashi Iwai
2021-09-10 17:04 ` Pierre-Louis Bossart
2021-09-13  5:48   ` Takashi Iwai
2021-09-21 22:18     ` Pierre-Louis Bossart
2021-09-28  9:01       ` Takashi Iwai
2021-09-28 15:00         ` Pierre-Louis Bossart
2021-09-29  7:14           ` Takashi Iwai
2021-09-13  2:36 ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.