linux-kernel.vger.kernel.org archive mirror
* Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Alex Xu (Hello71) @ 2020-04-21 19:08 UTC (permalink / raw)
  To: alsa-devel, Takashi Iwai; +Cc: Roy Spliet, linux-kernel, linux-pci

With 5.7-rc2, after resuming from suspend to RAM, I get:

[   55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
[   55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[   55.679410] pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00100000/04400000
[   55.679414] pcieport 0000:00:03.1: AER:    [20] UnsupReq               (First)
[   55.679417] pcieport 0000:00:03.1: AER:   TLP Header: 40000004 0a0000ff fffc0e80 00000000
[   55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
[   55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
[   55.679455] pcieport 0000:00:03.1: AER: device recovery failed

Then the display freezes and the system basically falls apart (can't 
even sudo reboot -f, need to use magic sysrq).

I bisected this to "ALSA: hda: Skip controller resume if not needed". 
Setting snd_hda_intel.power_save=0 resolves the issue.

I am using an ASRock B450 Pro4 with Realtek HDA codec:

[    1.009400] snd_hda_intel 0000:0a:00.1: enabling device (0000 -> 0002)
[    1.009425] snd_hda_intel 0000:0a:00.1: Force to non-snoop mode
[    1.009653] snd_hda_intel 0000:0c:00.3: enabling device (0000 -> 0002)
[    1.021452] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x7, too many assigned pins
[    1.021461] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x9, too many assigned pins
[    1.021471] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xb, too many assigned pins
[    1.021480] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xd, too many assigned pins
[    1.021482] snd_hda_codec_generic hdaudioC0D0: autoconfig for Generic: line_outs=0 (0x0/0x0/0x0/0x0/0x0) type:line
[    1.021482] snd_hda_codec_generic hdaudioC0D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[    1.021483] snd_hda_codec_generic hdaudioC0D0:    hp_outs=0 (0x0/0x0/0x0/0x0/0x0)
[    1.021484] snd_hda_codec_generic hdaudioC0D0:    mono: mono_out=0x0
[    1.021484] snd_hda_codec_generic hdaudioC0D0:    dig-out=0x3/0x5
[    1.021485] snd_hda_codec_generic hdaudioC0D0:    inputs:
[    1.046053] snd_hda_codec_realtek hdaudioC1D0: autoconfig for ALC892: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line
[    1.046054] snd_hda_codec_realtek hdaudioC1D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[    1.046055] snd_hda_codec_realtek hdaudioC1D0:    hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
[    1.046055] snd_hda_codec_realtek hdaudioC1D0:    mono: mono_out=0x0
[    1.046056] snd_hda_codec_realtek hdaudioC1D0:    inputs:
[    1.046057] snd_hda_codec_realtek hdaudioC1D0:      Front Mic=0x19
[    1.046058] snd_hda_codec_realtek hdaudioC1D0:      Rear Mic=0x18
[    1.046058] snd_hda_codec_realtek hdaudioC1D0:      Line=0x1a

I also have an ASUS RX 480 graphics card with HDMI audio output.


* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Takashi Iwai @ 2020-04-21 19:40 UTC (permalink / raw)
  To: Alex Xu (Hello71); +Cc: alsa-devel, Roy Spliet, linux-kernel, linux-pci

On Tue, 21 Apr 2020 21:08:44 +0200,
Alex Xu (Hello71) wrote:
> 
> With 5.7-rc2, after resuming from suspend to RAM, I get:
> 
> [   55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
> [   55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [   55.679410] pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00100000/04400000
> [   55.679414] pcieport 0000:00:03.1: AER:    [20] UnsupReq               (First)
> [   55.679417] pcieport 0000:00:03.1: AER:   TLP Header: 40000004 0a0000ff fffc0e80 00000000
> [   55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
> [   55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
> [   55.679455] pcieport 0000:00:03.1: AER: device recovery failed
> 
> Then the display freezes and the system basically falls apart (can't 
> even sudo reboot -f, need to use magic sysrq).
> 
> I bisected this to "ALSA: hda: Skip controller resume if not needed". 
> Setting snd_hda_intel.power_save=0 resolves the issue.

Hrm, that means the condition for skipping the controller resume isn't
quite right.  Does the patch below help?

But looking at the dmesg output:
> [    1.021452] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x7, too many assigned pins
> [    1.021461] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x9, too many assigned pins
> [    1.021471] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xb, too many assigned pins
> [    1.021480] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xd, too many assigned pins
> [    1.021482] snd_hda_codec_generic hdaudioC0D0: autoconfig for Generic: line_outs=0 (0x0/0x0/0x0/0x0/0x0) type:line
> [    1.021482] snd_hda_codec_generic hdaudioC0D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [    1.021483] snd_hda_codec_generic hdaudioC0D0:    hp_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [    1.021484] snd_hda_codec_generic hdaudioC0D0:    mono: mono_out=0x0
> [    1.021484] snd_hda_codec_generic hdaudioC0D0:    dig-out=0x3/0x5
> [    1.021485] snd_hda_codec_generic hdaudioC0D0:    inputs:

... it looks like snd-hda-codec-generic is being used for the HDMI/DP
codec.  That can't work well.  Did you enable CONFIG_SND_HDA_HDMI?

In any case, please provide the alsa-info.sh output.  Run the script
with the --no-upload option and attach the output.


thanks,

Takashi

---
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1060,7 +1060,7 @@ static int azx_resume(struct device *dev)
 
 	/* check for the forced resume */
 	list_for_each_codec(codec, &chip->bus) {
-		if (hda_codec_need_resume(codec)) {
+		if (!codec->relaxed_resume) {
 			forced_resume = true;
 			break;
 		}


* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Bjorn Helgaas @ 2020-04-22 20:50 UTC (permalink / raw)
  To: Alex Xu (Hello71)
  Cc: alsa-devel, Takashi Iwai, Roy Spliet, linux-kernel, linux-pci,
	Rafael J. Wysocki, linux-pm

[+cc Rafael, linux-pm]

On Tue, Apr 21, 2020 at 03:08:44PM -0400, Alex Xu (Hello71) wrote:
> With 5.7-rc2, after resuming from suspend to RAM, I get:
> 
> [   55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
> [   55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [   55.679410] pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00100000/04400000
> [   55.679414] pcieport 0000:00:03.1: AER:    [20] UnsupReq               (First)
> [   55.679417] pcieport 0000:00:03.1: AER:   TLP Header: 40000004 0a0000ff fffc0e80 00000000
> [   55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
> [   55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
> [   55.679455] pcieport 0000:00:03.1: AER: device recovery failed

I'm not at all confident in my decoding skills, but I *think* the TLP
header decodes to:

  Fmt           010b         3 DW header with data (32-bit address)
  Type          00000b       MWr
  Length        0x4          4 DW = 16 bytes
  Requester ID  0x0a00       0a:00.0
  Byte enables  0xff
  Address       0xfffc0e80

which would mean the 0a:00.0 GPU did a 16-byte write to 0xfffc0e80,
and the 00:03.1 Root Port reported that as an Unsupported Request.
I don't know why that would be unless the address is invalid.

Maybe that's supposed to be an MSI address?  Maybe a complete dmesg or
/proc/iomem would have a clue?

I feel like this UR issue could be a PCI core issue or maybe some sort
of misuse of PCI power management, but I can't seem to get traction on
it.
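
[Annotation, not part of the original message.]  The decoding above can
be reproduced with a small sketch like the one below.  It assumes the
standard 3 DW memory request header layout from the PCIe specification
(Fmt/Type/Length in DW0, Requester ID/Tag/byte enables in DW1, address
in DW2).  The struct and function names are hypothetical and are not
kernel APIs.

```c
#include <stdint.h>

/* Fields of a 3 DW PCIe memory request header (32-bit addressing). */
struct tlp_mem_req32 {
	unsigned int fmt;       /* DW0 bits 31:29 */
	unsigned int type;      /* DW0 bits 28:24 */
	unsigned int length_dw; /* DW0 bits 9:0, payload length in DWs */
	unsigned int bus;       /* DW1 bits 31:24 (Requester ID bus) */
	unsigned int dev;       /* DW1 bits 23:19 (Requester ID device) */
	unsigned int fn;        /* DW1 bits 18:16 (Requester ID function) */
	unsigned int tag;       /* DW1 bits 15:8 */
	unsigned int last_be;   /* DW1 bits 7:4, Last DW byte enables */
	unsigned int first_be;  /* DW1 bits 3:0, First DW byte enables */
	uint32_t addr;          /* DW2 with the low two bits cleared */
};

/*
 * Hypothetical helper: split the first three DWs of the "TLP Header:"
 * line printed by AER into fields.  Only meaningful when Fmt/Type
 * identify a memory request with a 3 DW header.
 */
struct tlp_mem_req32 tlp_decode_mem_req32(const uint32_t dw[3])
{
	struct tlp_mem_req32 t;
	unsigned int raw_len = dw[0] & 0x3ff;

	t.fmt       = (dw[0] >> 29) & 0x7;
	t.type      = (dw[0] >> 24) & 0x1f;
	/* A raw Length of 0 encodes 1024 DWs. */
	t.length_dw = raw_len ? raw_len : 1024;
	t.bus       = (dw[1] >> 24) & 0xff;
	t.dev       = (dw[1] >> 19) & 0x1f;
	t.fn        = (dw[1] >> 16) & 0x7;
	t.tag       = (dw[1] >> 8) & 0xff;
	t.last_be   = (dw[1] >> 4) & 0xf;
	t.first_be  = dw[1] & 0xf;
	t.addr      = dw[2] & ~(uint32_t)0x3;
	return t;
}
```

Feeding it the header from the log (40000004 0a0000ff fffc0e80) gives
Fmt 010b, Type 00000b, Length 4 DW, Requester 0a:00.0, both byte enable
nibbles 0xf and address 0xfffc0e80, which matches the table above.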

> Then the display freezes and the system basically falls apart (can't 
> even sudo reboot -f, need to use magic sysrq).
> 
> I bisected this to "ALSA: hda: Skip controller resume if not needed". 
> Setting snd_hda_intel.power_save=0 resolves the issue.

FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
controller resume if not needed"),
https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
v5.7-rc2.

> I am using an ASRock B450 Pro4 with Realtek HDA codec:
> 
> [    1.009400] snd_hda_intel 0000:0a:00.1: enabling device (0000 -> 0002)
> [    1.009425] snd_hda_intel 0000:0a:00.1: Force to non-snoop mode
> [    1.009653] snd_hda_intel 0000:0c:00.3: enabling device (0000 -> 0002)
> [    1.021452] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x7, too many assigned pins
> [    1.021461] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x9, too many assigned pins
> [    1.021471] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xb, too many assigned pins
> [    1.021480] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xd, too many assigned pins
> [    1.021482] snd_hda_codec_generic hdaudioC0D0: autoconfig for Generic: line_outs=0 (0x0/0x0/0x0/0x0/0x0) type:line
> [    1.021482] snd_hda_codec_generic hdaudioC0D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [    1.021483] snd_hda_codec_generic hdaudioC0D0:    hp_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [    1.021484] snd_hda_codec_generic hdaudioC0D0:    mono: mono_out=0x0
> [    1.021484] snd_hda_codec_generic hdaudioC0D0:    dig-out=0x3/0x5
> [    1.021485] snd_hda_codec_generic hdaudioC0D0:    inputs:
> [    1.046053] snd_hda_codec_realtek hdaudioC1D0: autoconfig for ALC892: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line
> [    1.046054] snd_hda_codec_realtek hdaudioC1D0:    speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [    1.046055] snd_hda_codec_realtek hdaudioC1D0:    hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
> [    1.046055] snd_hda_codec_realtek hdaudioC1D0:    mono: mono_out=0x0
> [    1.046056] snd_hda_codec_realtek hdaudioC1D0:    inputs:
> [    1.046057] snd_hda_codec_realtek hdaudioC1D0:      Front Mic=0x19
> [    1.046058] snd_hda_codec_realtek hdaudioC1D0:      Rear Mic=0x18
> [    1.046058] snd_hda_codec_realtek hdaudioC1D0:      Line=0x1a
> 
> I also have an ASUS RX 480 graphics card with HDMI audio output.


* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Takashi Iwai @ 2020-04-22 21:25 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Alex Xu (Hello71),
	alsa-devel, Roy Spliet, linux-kernel, linux-pci,
	Rafael J. Wysocki, linux-pm

On Wed, 22 Apr 2020 22:50:28 +0200,
Bjorn Helgaas wrote:
> 
> [+cc Rafael, linux-pm]
> 
> On Tue, Apr 21, 2020 at 03:08:44PM -0400, Alex Xu (Hello71) wrote:
> > With 5.7-rc2, after resuming from suspend to RAM, I get:
> > 
> > [   55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
> > [   55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > [   55.679410] pcieport 0000:00:03.1: AER:   device [1022:1453] error status/mask=00100000/04400000
> > [   55.679414] pcieport 0000:00:03.1: AER:    [20] UnsupReq               (First)
> > [   55.679417] pcieport 0000:00:03.1: AER:   TLP Header: 40000004 0a0000ff fffc0e80 00000000
> > [   55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
> > [   55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
> > [   55.679455] pcieport 0000:00:03.1: AER: device recovery failed
> 
> I'm not at all confident in my decoding skills, but I *think* the TLP
> header decodes to:
> 
>   Fmt           010b         3 DW header with data (32-bit address)
>   Type          00000b       MWr
>   Length        0x4          4 DW = 16 bytes
>   Requester ID  0x0a00       0a:00.0
>   Byte enables  0xff
>   Address       0xfffc0e80
> 
> which would mean the 0a:00.0 GPU did a 16-byte write to 0xfffc0e80,
> and the 00:03.1 Root Port reported that as an Unsupported Request.
> I don't know why that would be unless the address is invalid.
> 
> Maybe that's supposed to be an MSI address?  Maybe a complete dmesg or
> /proc/iomem would have a clue?
> 
> I feel like this UR issue could be a PCI core issue or maybe some sort
> of misuse of PCI power management, but I can't seem to get traction on
> it.
> 
> > Then the display freezes and the system basically falls apart (can't 
> > even sudo reboot -f, need to use magic sysrq).
> > 
> > I bisected this to "ALSA: hda: Skip controller resume if not needed". 
> > Setting snd_hda_intel.power_save=0 resolves the issue.
> 
> FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
> controller resume if not needed"),
> https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
> v5.7-rc2.

Yes, and I just posted a fix patch:
  https://lore.kernel.org/r/20200422203744.26299-1-tiwai@suse.de

The likely cause is the tricky resume code used by both the HD-audio
controller (the parent PCI device) and the codec devices.

At least the patch above seems to work on the reporter's machine.
It still needs a bit more testing before merging, but it looks
promising so far.


thanks,

Takashi


* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Bjorn Helgaas @ 2020-04-22 23:21 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Alex Xu (Hello71),
	alsa-devel, Roy Spliet, linux-kernel, linux-pci,
	Rafael J. Wysocki, linux-pm

On Wed, Apr 22, 2020 at 11:25:04PM +0200, Takashi Iwai wrote:
> On Wed, 22 Apr 2020 22:50:28 +0200,
> Bjorn Helgaas wrote:
> > ...
> > I feel like this UR issue could be a PCI core issue or maybe some sort
> > of misuse of PCI power management, but I can't seem to get traction on
> > it.
> > 
> > > Then the display freezes and the system basically falls apart (can't 
> > > even sudo reboot -f, need to use magic sysrq).
> > > 
> > > I bisected this to "ALSA: hda: Skip controller resume if not needed". 
> > > Setting snd_hda_intel.power_save=0 resolves the issue.
> > 
> > FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
> > controller resume if not needed"),
> > https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
> > v5.7-rc2.
> 
> Yes, and I just posted a fix patch:
>   https://lore.kernel.org/r/20200422203744.26299-1-tiwai@suse.de
> 
> The likely cause is the tricky resume code used by both the HD-audio
> controller (the parent PCI device) and the codec devices.
> 
> At least the patch above seems to work on the reporter's machine.
> It still needs a bit more testing before merging, but it looks
> promising so far.

Great, I'm glad you figured something out because I sure wasn't
getting anywhere!

Maybe this is a tangent, but I can't figure out what
snd_power_change_state() is doing.  It *looks* like it's supposed to
change the PCI power state, but I gave up trying to figure out where
it actually touches the device.

It seems like sound has more magic in power management than other
device types, which makes me wonder if we're not providing the right
interfaces or something.

Bjorn


* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Takashi Iwai @ 2020-04-23  7:05 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Alex Xu (Hello71),
	alsa-devel, Roy Spliet, linux-kernel, linux-pci,
	Rafael J. Wysocki, linux-pm

On Thu, 23 Apr 2020 01:21:27 +0200,
Bjorn Helgaas wrote:
> 
> On Wed, Apr 22, 2020 at 11:25:04PM +0200, Takashi Iwai wrote:
> > On Wed, 22 Apr 2020 22:50:28 +0200,
> > Bjorn Helgaas wrote:
> > > ...
> > > I feel like this UR issue could be a PCI core issue or maybe some sort
> > > of misuse of PCI power management, but I can't seem to get traction on
> > > it.
> > > 
> > > > Then the display freezes and the system basically falls apart (can't 
> > > > even sudo reboot -f, need to use magic sysrq).
> > > > 
> > > > I bisected this to "ALSA: hda: Skip controller resume if not needed". 
> > > > Setting snd_hda_intel.power_save=0 resolves the issue.
> > > 
> > > FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
> > > controller resume if not needed"),
> > > https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
> > > v5.7-rc2.
> > 
> > Yes, and I just posted a fix patch:
> >   https://lore.kernel.org/r/20200422203744.26299-1-tiwai@suse.de
> > 
> > The likely cause is the tricky resume code used by both the HD-audio
> > controller (the parent PCI device) and the codec devices.
> > 
> > At least the patch above seems to work on the reporter's machine.
> > It still needs a bit more testing before merging, but it looks
> > promising so far.
> 
> Great, I'm glad you figured something out because I sure wasn't
> getting anywhere!
> 
> Maybe this is a tangent, but I can't figure out what
> snd_power_change_state() is doing.  It *looks* like it's supposed to
> change the PCI power state, but I gave up trying to figure out where
> it actually touches the device.

Not really, it merely updates the internal state field stored in the
sound card object, see in include/sound/core.h:

static inline void snd_power_change_state(struct snd_card *card, unsigned int state)
{
	card->power_state = state;
	wake_up(&card->power_sleep);
}

The sound API explicitly blocks operations during suspend/resume using
this card-level state and wait queue.


thanks,

Takashi

