All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting
@ 2021-09-29  7:29 Takashi Iwai
  2021-09-29  7:29 ` [PATCH v2 1/2] ALSA: hda: Reduce udelay() at " Takashi Iwai
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Takashi Iwai @ 2021-09-29  7:29 UTC (permalink / raw)
  To: alsa-devel; +Cc: Jens Axboe, Pierre-Louis Bossart

Hi,

this is a v2 patch set for reducing the CPU hog with the HD-audio PCM
position reporting.  The first patch is almost same, while the second
patch is added to take back to the position buffer as suggested by
Pierre.


Takashi

v1: https://lore.kernel.org/r/20210910141002.32749-1-tiwai@suse.de

===

Takashi Iwai (2):
  ALSA: hda: Reduce udelay() at SKL+ position reporting
  ALSA: hda: Use position buffer for SKL+ again

 sound/pci/hda/hda_intel.c | 49 ++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 26 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-29  7:29 [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting Takashi Iwai
@ 2021-09-29  7:29 ` Takashi Iwai
  2021-09-29 14:15   ` Pierre-Louis Bossart
  2021-09-29  7:29 ` [PATCH v2 2/2] ALSA: hda: Use position buffer for SKL+ again Takashi Iwai
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Takashi Iwai @ 2021-09-29  7:29 UTC (permalink / raw)
  To: alsa-devel; +Cc: Jens Axboe, Pierre-Louis Bossart

The position reporting on Intel Skylake and later chips via
azx_get_pos_skl() contains a udelay(20) call for the capture streams.
A call for this alone doesn't sound too harmful.  However, as the
pointer PCM ops is one of the hottest path in the PCM operations --
especially for the timer-scheduled operations like PulseAudio -- such
a delay hogs CPU usage significantly in the total performance.

The code there was taken from the original code in ASoC SST Skylake
driver blindly.  The udelay() is a workaround for the case where the
reported position is behind the period boundary at the timing
triggered from interrupts; applications often expect that the full
data is available for the whole period when returned (and also that's
the definition of the ALSA PCM period).

OTOH, HD-audio (legacy) driver has already some workarounds for the
delayed position reporting due to its relatively large FIFO, such as
the BDL position adjustment and the delayed period-elapsed call in the
work.  That said, the udelay() is almost superfluous for HD-audio
driver unlike SST, and we can drop the udelay().

Though, the current code doesn't guarantee the full period readiness
as mentioned in the above, but rather it checks the wallclock and
detects the unexpected jump.  That's one missing piece, and the drop
of udelay() needs a bit more sanity checks for the delayed handling.

This patch implements those: the drop of udelay() call in
azx_get_pos_skl() and the more proper check of hwptr in
azx_position_ok().  The latter change is applied only for the case
where the stream is running in the normal mode without
no_period_wakeup flag.  When no_period_wakeup is set, it essentially
ignores the period handling and rather concentrates only on the
current position; which implies that we don't need to care about the
period boundary at all.

Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_intel.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 47777439961c..9989ec4dc324 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -637,13 +637,17 @@ static int azx_position_check(struct azx *chip, struct azx_dev *azx_dev)
  * the update-IRQ timing.  The IRQ is issued before actually the
  * data is processed.  So, we need to process it afterwords in a
  * workqueue.
+ *
+ * Returns 1 if OK to proceed, 0 for delay handling, -1 for skipping update
  */
 static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
 {
 	struct snd_pcm_substream *substream = azx_dev->core.substream;
+	struct snd_pcm_runtime *runtime = substream->runtime;
 	int stream = substream->stream;
 	u32 wallclk;
 	unsigned int pos;
+	snd_pcm_uframes_t hwptr, target;
 
 	wallclk = azx_readl(chip, WALLCLK) - azx_dev->core.start_wallclk;
 	if (wallclk < (azx_dev->core.period_wallclk * 2) / 3)
@@ -680,6 +684,24 @@ static int azx_position_ok(struct azx *chip, struct azx_dev *azx_dev)
 		/* NG - it's below the first next period boundary */
 		return chip->bdl_pos_adj ? 0 : -1;
 	azx_dev->core.start_wallclk += wallclk;
+
+	if (azx_dev->core.no_period_wakeup)
+		return 1; /* OK, no need to check period boundary */
+
+	if (runtime->hw_ptr_base != runtime->hw_ptr_interrupt)
+		return 1; /* OK, already in hwptr updating process */
+
+	/* check whether the period gets really elapsed */
+	pos = bytes_to_frames(runtime, pos);
+	hwptr = runtime->hw_ptr_base + pos;
+	if (hwptr < runtime->status->hw_ptr)
+		hwptr += runtime->buffer_size;
+	target = runtime->hw_ptr_interrupt + runtime->period_size;
+	if (hwptr < target) {
+		/* too early wakeup, process it later */
+		return chip->bdl_pos_adj ? 0 : -1;
+	}
+
 	return 1; /* OK, it's fine */
 }
 
@@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
 	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
 		return azx_skl_get_dpib_pos(chip, azx_dev);
 
-	/* For capture, we need to read posbuf, but it requires a delay
-	 * for the possible boundary overlap; the read of DPIB fetches the
-	 * actual posbuf
-	 */
-	udelay(20);
+	/* read of DPIB fetches the actual posbuf */
 	azx_skl_get_dpib_pos(chip, azx_dev);
 	return azx_get_pos_posbuf(chip, azx_dev);
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] ALSA: hda: Use position buffer for SKL+ again
  2021-09-29  7:29 [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting Takashi Iwai
  2021-09-29  7:29 ` [PATCH v2 1/2] ALSA: hda: Reduce udelay() at " Takashi Iwai
@ 2021-09-29  7:29 ` Takashi Iwai
  2021-09-29 14:54 ` [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting Pierre-Louis Bossart
  2021-09-30 11:50 ` Takashi Iwai
  3 siblings, 0 replies; 8+ messages in thread
From: Takashi Iwai @ 2021-09-29  7:29 UTC (permalink / raw)
  To: alsa-devel; +Cc: Jens Axboe, Pierre-Louis Bossart

The commit f87e7f25893d ("ALSA: hda - Improved position reporting on
SKL+") changed the PCM position report for SKL+ chips to use DPIB, but
according to Pierre, DPIB is no best choice for the accurate position
reports and it often reports too early.  The recommended method is
rather the classical position buffer.

This patch makes the PCM position reporting on SKL+ back to the
position buffer again.

Fixes: f87e7f25893d ("ALSA: hda - Improved position reporting on SKL+")
Suggested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
 sound/pci/hda/hda_intel.c | 23 +----------------------
 1 file changed, 1 insertion(+), 22 deletions(-)

diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index 9989ec4dc324..14298f015fba 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -880,27 +880,6 @@ static int azx_get_delay_from_fifo(struct azx *chip, struct azx_dev *azx_dev,
 	return substream->runtime->delay;
 }
 
-static unsigned int azx_skl_get_dpib_pos(struct azx *chip,
-					 struct azx_dev *azx_dev)
-{
-	return _snd_hdac_chip_readl(azx_bus(chip),
-				    AZX_REG_VS_SDXDPIB_XBASE +
-				    (AZX_REG_VS_SDXDPIB_XINTERVAL *
-				     azx_dev->core.index));
-}
-
-/* get the current DMA position with correction on SKL+ chips */
-static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
-{
-	/* DPIB register gives a more accurate position for playback */
-	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
-		return azx_skl_get_dpib_pos(chip, azx_dev);
-
-	/* read of DPIB fetches the actual posbuf */
-	azx_skl_get_dpib_pos(chip, azx_dev);
-	return azx_get_pos_posbuf(chip, azx_dev);
-}
-
 static void __azx_shutdown_chip(struct azx *chip, bool skip_link_reset)
 {
 	azx_stop_chip(chip);
@@ -1590,7 +1569,7 @@ static void assign_position_fix(struct azx *chip, int fix)
 		[POS_FIX_POSBUF] = azx_get_pos_posbuf,
 		[POS_FIX_VIACOMBO] = azx_via_get_position,
 		[POS_FIX_COMBO] = azx_get_pos_lpib,
-		[POS_FIX_SKL] = azx_get_pos_skl,
+		[POS_FIX_SKL] = azx_get_pos_posbuf,
 		[POS_FIX_FIFO] = azx_get_pos_fifo,
 	};
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-29  7:29 ` [PATCH v2 1/2] ALSA: hda: Reduce udelay() at " Takashi Iwai
@ 2021-09-29 14:15   ` Pierre-Louis Bossart
  2021-09-29 14:39     ` Takashi Iwai
  0 siblings, 1 reply; 8+ messages in thread
From: Pierre-Louis Bossart @ 2021-09-29 14:15 UTC (permalink / raw)
  To: Takashi Iwai, alsa-devel; +Cc: Jens Axboe


> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
>  	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
>  		return azx_skl_get_dpib_pos(chip, azx_dev);
>  
> -	/* For capture, we need to read posbuf, but it requires a delay
> -	 * for the possible boundary overlap; the read of DPIB fetches the
> -	 * actual posbuf
> -	 */
> -	udelay(20);
> +	/* read of DPIB fetches the actual posbuf */
>  	azx_skl_get_dpib_pos(chip, azx_dev);

I don't think extra read has any effect, it could be removed....

>  	return azx_get_pos_posbuf(chip, azx_dev);
>  }

The suggestion was to further simplify with

static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev
*azx_dev)
{
	return azx_get_pos_posbuf(chip, azx_dev);
}

i.e. same behavior on playback and capture. that's the recommendation
from Intel hw folks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-29 14:15   ` Pierre-Louis Bossart
@ 2021-09-29 14:39     ` Takashi Iwai
  2021-09-29 14:53       ` Pierre-Louis Bossart
  0 siblings, 1 reply; 8+ messages in thread
From: Takashi Iwai @ 2021-09-29 14:39 UTC (permalink / raw)
  To: Pierre-Louis Bossart; +Cc: Jens Axboe, alsa-devel

On Wed, 29 Sep 2021 16:15:44 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> > @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
> >  	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
> >  		return azx_skl_get_dpib_pos(chip, azx_dev);
> >  
> > -	/* For capture, we need to read posbuf, but it requires a delay
> > -	 * for the possible boundary overlap; the read of DPIB fetches the
> > -	 * actual posbuf
> > -	 */
> > -	udelay(20);
> > +	/* read of DPIB fetches the actual posbuf */
> >  	azx_skl_get_dpib_pos(chip, azx_dev);
> 
> I don't think extra read has any effect, it could be removed....
> 
> >  	return azx_get_pos_posbuf(chip, azx_dev);
> >  }
> 
> The suggestion was to further simplify with
> 
> static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev
> *azx_dev)
> {
> 	return azx_get_pos_posbuf(chip, azx_dev);
> }
> 
> i.e. same behavior on playback and capture. that's the recommendation
> from Intel hw folks.

It's achieved in the second patch.


Takashi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 1/2] ALSA: hda: Reduce udelay() at SKL+ position reporting
  2021-09-29 14:39     ` Takashi Iwai
@ 2021-09-29 14:53       ` Pierre-Louis Bossart
  0 siblings, 0 replies; 8+ messages in thread
From: Pierre-Louis Bossart @ 2021-09-29 14:53 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Jens Axboe, alsa-devel



On 9/29/21 9:39 AM, Takashi Iwai wrote:
> On Wed, 29 Sep 2021 16:15:44 +0200,
> Pierre-Louis Bossart wrote:
>>
>>
>>> @@ -874,11 +896,7 @@ static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev *azx_dev)
>>>  	if (azx_dev->core.substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
>>>  		return azx_skl_get_dpib_pos(chip, azx_dev);
>>>  
>>> -	/* For capture, we need to read posbuf, but it requires a delay
>>> -	 * for the possible boundary overlap; the read of DPIB fetches the
>>> -	 * actual posbuf
>>> -	 */
>>> -	udelay(20);
>>> +	/* read of DPIB fetches the actual posbuf */
>>>  	azx_skl_get_dpib_pos(chip, azx_dev);
>>
>> I don't think extra read has any effect, it could be removed....
>>
>>>  	return azx_get_pos_posbuf(chip, azx_dev);
>>>  }
>>
>> The suggestion was to further simplify with
>>
>> static unsigned int azx_get_pos_skl(struct azx *chip, struct azx_dev
>> *azx_dev)
>> {
>> 	return azx_get_pos_posbuf(chip, azx_dev);
>> }
>>
>> i.e. same behavior on playback and capture. that's the recommendation
>> from Intel hw folks.
> 
> It's achieved in the second patch.

Ah ok. the additional comment made me think it was there to stay.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting
  2021-09-29  7:29 [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting Takashi Iwai
  2021-09-29  7:29 ` [PATCH v2 1/2] ALSA: hda: Reduce udelay() at " Takashi Iwai
  2021-09-29  7:29 ` [PATCH v2 2/2] ALSA: hda: Use position buffer for SKL+ again Takashi Iwai
@ 2021-09-29 14:54 ` Pierre-Louis Bossart
  2021-09-30 11:50 ` Takashi Iwai
  3 siblings, 0 replies; 8+ messages in thread
From: Pierre-Louis Bossart @ 2021-09-29 14:54 UTC (permalink / raw)
  To: Takashi Iwai, alsa-devel; +Cc: Jens Axboe



On 9/29/21 2:29 AM, Takashi Iwai wrote:
> Hi,
> 
> this is a v2 patch set for reducing the CPU hog with the HD-audio PCM
> position reporting.  The first patch is almost same, while the second
> patch is added to take back to the position buffer as suggested by
> Pierre.
> 
> 
> Takashi
> 
> v1: https://lore.kernel.org/r/20210910141002.32749-1-tiwai@suse.de
> 
> ===
> 
> Takashi Iwai (2):
>   ALSA: hda: Reduce udelay() at SKL+ position reporting
>   ALSA: hda: Use position buffer for SKL+ again

Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting
  2021-09-29  7:29 [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting Takashi Iwai
                   ` (2 preceding siblings ...)
  2021-09-29 14:54 ` [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting Pierre-Louis Bossart
@ 2021-09-30 11:50 ` Takashi Iwai
  3 siblings, 0 replies; 8+ messages in thread
From: Takashi Iwai @ 2021-09-30 11:50 UTC (permalink / raw)
  To: alsa-devel; +Cc: Jens Axboe, Pierre-Louis Bossart

On Wed, 29 Sep 2021 09:29:32 +0200,
Takashi Iwai wrote:
> 
> Hi,
> 
> this is a v2 patch set for reducing the CPU hog with the HD-audio PCM
> position reporting.  The first patch is almost same, while the second
> patch is added to take back to the position buffer as suggested by
> Pierre.
> 
> 
> Takashi
> 
> v1: https://lore.kernel.org/r/20210910141002.32749-1-tiwai@suse.de
> 
> ===
> 
> Takashi Iwai (2):
>   ALSA: hda: Reduce udelay() at SKL+ position reporting
>   ALSA: hda: Use position buffer for SKL+ again

Now applied both patches to for-next branch.


Takashi

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-09-30 11:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-29  7:29 [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting Takashi Iwai
2021-09-29  7:29 ` [PATCH v2 1/2] ALSA: hda: Reduce udelay() at " Takashi Iwai
2021-09-29 14:15   ` Pierre-Louis Bossart
2021-09-29 14:39     ` Takashi Iwai
2021-09-29 14:53       ` Pierre-Louis Bossart
2021-09-29  7:29 ` [PATCH v2 2/2] ALSA: hda: Use position buffer for SKL+ again Takashi Iwai
2021-09-29 14:54 ` [PATCH v2 0/2] ALSA: hda: Reduce CPU hog with SKL+ position reporting Pierre-Louis Bossart
2021-09-30 11:50 ` Takashi Iwai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.