* Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Alex Xu (Hello71) @ 2020-04-21 19:08 UTC
To: alsa-devel, Takashi Iwai; +Cc: Roy Spliet, linux-kernel, linux-pci

With 5.7-rc2, after resuming from suspend to RAM, I get:

[ 55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
[ 55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ 55.679410] pcieport 0000:00:03.1: AER: device [1022:1453] error status/mask=00100000/04400000
[ 55.679414] pcieport 0000:00:03.1: AER: [20] UnsupReq (First)
[ 55.679417] pcieport 0000:00:03.1: AER: TLP Header: 40000004 0a0000ff fffc0e80 00000000
[ 55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
[ 55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
[ 55.679455] pcieport 0000:00:03.1: AER: device recovery failed

Then the display freezes and the system basically falls apart (can't
even sudo reboot -f, need to use magic sysrq).

I bisected this to "ALSA: hda: Skip controller resume if not needed".
Setting snd_hda_intel.power_save=0 resolves the issue.

I am using an ASRock B450 Pro4 with Realtek HDA codec:

[ 1.009400] snd_hda_intel 0000:0a:00.1: enabling device (0000 -> 0002)
[ 1.009425] snd_hda_intel 0000:0a:00.1: Force to non-snoop mode
[ 1.009653] snd_hda_intel 0000:0c:00.3: enabling device (0000 -> 0002)
[ 1.021452] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x7, too many assigned pins
[ 1.021461] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x9, too many assigned pins
[ 1.021471] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xb, too many assigned pins
[ 1.021480] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xd, too many assigned pins
[ 1.021482] snd_hda_codec_generic hdaudioC0D0: autoconfig for Generic: line_outs=0 (0x0/0x0/0x0/0x0/0x0) type:line
[ 1.021482] snd_hda_codec_generic hdaudioC0D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[ 1.021483] snd_hda_codec_generic hdaudioC0D0: hp_outs=0 (0x0/0x0/0x0/0x0/0x0)
[ 1.021484] snd_hda_codec_generic hdaudioC0D0: mono: mono_out=0x0
[ 1.021484] snd_hda_codec_generic hdaudioC0D0: dig-out=0x3/0x5
[ 1.021485] snd_hda_codec_generic hdaudioC0D0: inputs:
[ 1.046053] snd_hda_codec_realtek hdaudioC1D0: autoconfig for ALC892: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line
[ 1.046054] snd_hda_codec_realtek hdaudioC1D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[ 1.046055] snd_hda_codec_realtek hdaudioC1D0: hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
[ 1.046055] snd_hda_codec_realtek hdaudioC1D0: mono: mono_out=0x0
[ 1.046056] snd_hda_codec_realtek hdaudioC1D0: inputs:
[ 1.046057] snd_hda_codec_realtek hdaudioC1D0: Front Mic=0x19
[ 1.046058] snd_hda_codec_realtek hdaudioC1D0: Rear Mic=0x18
[ 1.046058] snd_hda_codec_realtek hdaudioC1D0: Line=0x1a

I also have an ASUS RX 480 graphics card with HDMI audio output.

* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Takashi Iwai @ 2020-04-21 19:40 UTC
To: Alex Xu (Hello71); +Cc: alsa-devel, Roy Spliet, linux-kernel, linux-pci

On Tue, 21 Apr 2020 21:08:44 +0200,
Alex Xu (Hello71) wrote:
>
> With 5.7-rc2, after resuming from suspend to RAM, I get:
>
> [ 55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
> [ 55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [ 55.679410] pcieport 0000:00:03.1: AER: device [1022:1453] error status/mask=00100000/04400000
> [ 55.679414] pcieport 0000:00:03.1: AER: [20] UnsupReq (First)
> [ 55.679417] pcieport 0000:00:03.1: AER: TLP Header: 40000004 0a0000ff fffc0e80 00000000
> [ 55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
> [ 55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
> [ 55.679455] pcieport 0000:00:03.1: AER: device recovery failed
>
> Then the display freezes and the system basically falls apart (can't
> even sudo reboot -f, need to use magic sysrq).
>
> I bisected this to "ALSA: hda: Skip controller resume if not needed".
> Setting snd_hda_intel.power_save=0 resolves the issue.

Hrm, that means the condition for skipping the controller resume
doesn't quite fit.  Does the patch below help?

But looking at the dmesg output:

> [ 1.021452] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x7, too many assigned pins
> [ 1.021461] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x9, too many assigned pins
> [ 1.021471] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xb, too many assigned pins
> [ 1.021480] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xd, too many assigned pins
> [ 1.021482] snd_hda_codec_generic hdaudioC0D0: autoconfig for Generic: line_outs=0 (0x0/0x0/0x0/0x0/0x0) type:line
> [ 1.021482] snd_hda_codec_generic hdaudioC0D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [ 1.021483] snd_hda_codec_generic hdaudioC0D0: hp_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [ 1.021484] snd_hda_codec_generic hdaudioC0D0: mono: mono_out=0x0
> [ 1.021484] snd_hda_codec_generic hdaudioC0D0: dig-out=0x3/0x5
> [ 1.021485] snd_hda_codec_generic hdaudioC0D0: inputs:

... it looks like snd-hda-codec-generic is used for the HDMI/DP codec.
This can't work well.  Did you enable CONFIG_SND_HDA_HDMI?

In any case, please provide the alsa-info.sh output.  Run the script
with the --no-upload option and attach the output.


thanks,

Takashi

---
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -1060,7 +1060,7 @@ static int azx_resume(struct device *dev)
 	/* check for the forced resume */
 	list_for_each_codec(codec, &chip->bus) {
-		if (hda_codec_need_resume(codec)) {
+		if (!codec->relaxed_resume) {
 			forced_resume = true;
 			break;
 		}

* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Bjorn Helgaas @ 2020-04-22 20:50 UTC
To: Alex Xu (Hello71)
Cc: alsa-devel, Takashi Iwai, Roy Spliet, linux-kernel, linux-pci,
	Rafael J. Wysocki, linux-pm

[+cc Rafael, linux-pm]

On Tue, Apr 21, 2020 at 03:08:44PM -0400, Alex Xu (Hello71) wrote:
> With 5.7-rc2, after resuming from suspend to RAM, I get:
>
> [ 55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
> [ 55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> [ 55.679410] pcieport 0000:00:03.1: AER: device [1022:1453] error status/mask=00100000/04400000
> [ 55.679414] pcieport 0000:00:03.1: AER: [20] UnsupReq (First)
> [ 55.679417] pcieport 0000:00:03.1: AER: TLP Header: 40000004 0a0000ff fffc0e80 00000000
> [ 55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
> [ 55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
> [ 55.679455] pcieport 0000:00:03.1: AER: device recovery failed

I'm not at all confident in my decoding skills, but I *think* the TLP
header decodes to:

  Fmt           010b        3 DW header with data (32-bit address)
  Type          00000b      MWr
  Length        0x4         4 DW = 16 bytes
  Requester ID  0x0a00      0a:00.0
  Byte enables  0xff
  Address       0xfffc0e80

which would mean the 0a:00.0 GPU did a 16-byte write to 0xfffc0e80,
and the 00:03.1 Root Port reported that as an Unsupported Request.
I don't know why that would be unless the address is invalid.

Maybe that's supposed to be an MSI address?  Maybe a complete dmesg or
/proc/iomem would have a clue?

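For anyone who wants to check the decode above by hand, here is a
minimal sketch in C.  It is not taken from the kernel.  It assumes the
four DWs are exactly as printed in the "TLP Header:" log line (header
byte 0 in bits 31:24 of the first DW) and that the TLP is a memory
request, so that the second DW carries Requester ID, Tag and byte
enables.  The names struct tlp_hdr, tlp_decode() and tlp_print() are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Decoded fields of a memory-request TLP header (illustrative only). */
struct tlp_hdr {
	unsigned int fmt;	/* DW0 bits 31:29 */
	unsigned int type;	/* DW0 bits 28:24 */
	unsigned int length_dw;	/* DW0 bits 9:0, payload length in DWs */
	unsigned int req_bus;	/* DW1 bits 31:24 */
	unsigned int req_dev;	/* DW1 bits 23:19 */
	unsigned int req_fn;	/* DW1 bits 18:16 */
	unsigned int tag;	/* DW1 bits 15:8 */
	unsigned int last_be;	/* DW1 bits 7:4 */
	unsigned int first_be;	/* DW1 bits 3:0 */
	uint64_t addr;		/* DW2, plus DW3 for a 4 DW header */
};

/* Split the four logged header DWs into fields. */
static struct tlp_hdr tlp_decode(const uint32_t dw[4])
{
	struct tlp_hdr h;

	assert(dw != NULL);

	h.fmt       = (dw[0] >> 29) & 0x7;
	h.type      = (dw[0] >> 24) & 0x1f;
	h.length_dw = dw[0] & 0x3ff;	/* 0 would encode 1024 DW; not handled */

	h.req_bus   = (dw[1] >> 24) & 0xff;
	h.req_dev   = (dw[1] >> 19) & 0x1f;
	h.req_fn    = (dw[1] >> 16) & 0x7;
	h.tag       = (dw[1] >> 8) & 0xff;
	h.last_be   = (dw[1] >> 4) & 0xf;
	h.first_be  = dw[1] & 0xf;

	/* The low two address bits are not part of the DW-aligned address. */
	if (h.fmt & 0x1)	/* 4 DW header: DW2 is the upper 32 address bits */
		h.addr = ((uint64_t)dw[2] << 32) | (dw[3] & ~(uint32_t)0x3);
	else			/* 3 DW header: 32-bit address in DW2 */
		h.addr = dw[2] & ~(uint32_t)0x3;

	return h;
}

/* Print the decoded fields in roughly the layout used in the mail. */
static void tlp_print(const struct tlp_hdr *h)
{
	printf("Fmt %u, Type 0x%02x, Length %u DW (%u bytes)\n",
	       h->fmt, h->type, h->length_dw, h->length_dw * 4);
	printf("Requester %02x:%02x.%u, Tag 0x%02x, BE last/first 0x%x/0x%x\n",
	       h->req_bus, h->req_dev, h->req_fn, h->tag,
	       h->last_be, h->first_be);
	printf("Address 0x%llx\n", (unsigned long long)h->addr);
}
```

Passing { 0x40000004, 0x0a0000ff, 0xfffc0e80, 0x00000000 } to
tlp_decode() should give the same fields as listed above.  The 4 DW
branch is only there so that a 64-bit address is not silently
truncated; it is not exercised by this log.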
I feel like this UR issue could be a PCI core issue or maybe some sort
of misuse of PCI power management, but I can't seem to get traction on
it.

> Then the display freezes and the system basically falls apart (can't
> even sudo reboot -f, need to use magic sysrq).
>
> I bisected this to "ALSA: hda: Skip controller resume if not needed".
> Setting snd_hda_intel.power_save=0 resolves the issue.

FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
controller resume if not needed"),
https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
v5.7-rc2.

> I am using an ASRock B450 Pro4 with Realtek HDA codec:
>
> [ 1.009400] snd_hda_intel 0000:0a:00.1: enabling device (0000 -> 0002)
> [ 1.009425] snd_hda_intel 0000:0a:00.1: Force to non-snoop mode
> [ 1.009653] snd_hda_intel 0000:0c:00.3: enabling device (0000 -> 0002)
> [ 1.021452] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x7, too many assigned pins
> [ 1.021461] snd_hda_codec_generic hdaudioC0D0: ignore pin 0x9, too many assigned pins
> [ 1.021471] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xb, too many assigned pins
> [ 1.021480] snd_hda_codec_generic hdaudioC0D0: ignore pin 0xd, too many assigned pins
> [ 1.021482] snd_hda_codec_generic hdaudioC0D0: autoconfig for Generic: line_outs=0 (0x0/0x0/0x0/0x0/0x0) type:line
> [ 1.021482] snd_hda_codec_generic hdaudioC0D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [ 1.021483] snd_hda_codec_generic hdaudioC0D0: hp_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [ 1.021484] snd_hda_codec_generic hdaudioC0D0: mono: mono_out=0x0
> [ 1.021484] snd_hda_codec_generic hdaudioC0D0: dig-out=0x3/0x5
> [ 1.021485] snd_hda_codec_generic hdaudioC0D0: inputs:
> [ 1.046053] snd_hda_codec_realtek hdaudioC1D0: autoconfig for ALC892: line_outs=1 (0x14/0x0/0x0/0x0/0x0) type:line
> [ 1.046054] snd_hda_codec_realtek hdaudioC1D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
> [ 1.046055] snd_hda_codec_realtek hdaudioC1D0: hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
> [ 1.046055] snd_hda_codec_realtek hdaudioC1D0: mono: mono_out=0x0
> [ 1.046056] snd_hda_codec_realtek hdaudioC1D0: inputs:
> [ 1.046057] snd_hda_codec_realtek hdaudioC1D0: Front Mic=0x19
> [ 1.046058] snd_hda_codec_realtek hdaudioC1D0: Rear Mic=0x18
> [ 1.046058] snd_hda_codec_realtek hdaudioC1D0: Line=0x1a
>
> I also have an ASUS RX 480 graphics card with HDMI audio output.

* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Takashi Iwai @ 2020-04-22 21:25 UTC
To: Bjorn Helgaas
Cc: Alex Xu (Hello71), alsa-devel, Roy Spliet, linux-kernel, linux-pci,
	Rafael J. Wysocki, linux-pm

On Wed, 22 Apr 2020 22:50:28 +0200,
Bjorn Helgaas wrote:
>
> [+cc Rafael, linux-pm]
>
> On Tue, Apr 21, 2020 at 03:08:44PM -0400, Alex Xu (Hello71) wrote:
> > With 5.7-rc2, after resuming from suspend to RAM, I get:
> >
> > [ 55.679382] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
> > [ 55.679405] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
> > [ 55.679410] pcieport 0000:00:03.1: AER: device [1022:1453] error status/mask=00100000/04400000
> > [ 55.679414] pcieport 0000:00:03.1: AER: [20] UnsupReq (First)
> > [ 55.679417] pcieport 0000:00:03.1: AER: TLP Header: 40000004 0a0000ff fffc0e80 00000000
> > [ 55.679423] amdgpu 0000:0a:00.0: AER: can't recover (no error_detected callback)
> > [ 55.679425] snd_hda_intel 0000:0a:00.1: AER: can't recover (no error_detected callback)
> > [ 55.679455] pcieport 0000:00:03.1: AER: device recovery failed
>
> I'm not at all confident in my decoding skills, but I *think* the TLP
> header decodes to:
>
>   Fmt           010b        3 DW header with data (32-bit address)
>   Type          00000b      MWr
>   Length        0x4         4 DW = 16 bytes
>   Requester ID  0x0a00      0a:00.0
>   Byte enables  0xff
>   Address       0xfffc0e80
>
> which would mean the 0a:00.0 GPU did a 16-byte write to 0xfffc0e80,
> and the 00:03.1 Root Port reported that as an Unsupported Request.
> I don't know why that would be unless the address is invalid.
>
> Maybe that's supposed to be an MSI address?  Maybe a complete dmesg or
> /proc/iomem would have a clue?
>
> I feel like this UR issue could be a PCI core issue or maybe some sort
> of misuse of PCI power management, but I can't seem to get traction on
> it.
>
> > Then the display freezes and the system basically falls apart (can't
> > even sudo reboot -f, need to use magic sysrq).
> >
> > I bisected this to "ALSA: hda: Skip controller resume if not needed".
> > Setting snd_hda_intel.power_save=0 resolves the issue.
>
> FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
> controller resume if not needed"),
> https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
> v5.7-rc2.

Yes, and I posted the fix patch just now:
  https://lore.kernel.org/r/20200422203744.26299-1-tiwai@suse.de

The possible cause was the tricky resume code that both HD-audio
controller (the parent PCI device) and the codec devices used.

At least the patch above seems to be working for the reporter's machine.
Now we need a bit more testing before merging, but it looks promising,
so far.


thanks,

Takashi

* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Bjorn Helgaas @ 2020-04-22 23:21 UTC
To: Takashi Iwai
Cc: Alex Xu (Hello71), alsa-devel, Roy Spliet, linux-kernel, linux-pci,
	Rafael J. Wysocki, linux-pm

On Wed, Apr 22, 2020 at 11:25:04PM +0200, Takashi Iwai wrote:
> On Wed, 22 Apr 2020 22:50:28 +0200,
> Bjorn Helgaas wrote:
> > ...
> > I feel like this UR issue could be a PCI core issue or maybe some sort
> > of misuse of PCI power management, but I can't seem to get traction on
> > it.
> >
> > > Then the display freezes and the system basically falls apart (can't
> > > even sudo reboot -f, need to use magic sysrq).
> > >
> > > I bisected this to "ALSA: hda: Skip controller resume if not needed".
> > > Setting snd_hda_intel.power_save=0 resolves the issue.
> >
> > FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
> > controller resume if not needed"),
> > https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
> > v5.7-rc2.
>
> Yes, and I posted the fix patch just now:
>   https://lore.kernel.org/r/20200422203744.26299-1-tiwai@suse.de
>
> The possible cause was the tricky resume code that both HD-audio
> controller (the parent PCI device) and the codec devices used.
>
> At least the patch above seems to be working for the reporter's machine.
> Now we need a bit more testing before merging, but it looks promising,
> so far.

Great, I'm glad you figured something out because I sure wasn't
getting anywhere!

Maybe this is a tangent, but I can't figure out what
snd_power_change_state() is doing.  It *looks* like it's supposed to
change the PCI power state, but I gave up trying to figure out where
it actually touches the device.

It seems like sound has more magic in power management than other
device types, which makes me wonder if we're not providing the right
interfaces or something.

Bjorn

* Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)
From: Takashi Iwai @ 2020-04-23 7:05 UTC
To: Bjorn Helgaas
Cc: Alex Xu (Hello71), alsa-devel, Roy Spliet, linux-kernel, linux-pci,
	Rafael J. Wysocki, linux-pm

On Thu, 23 Apr 2020 01:21:27 +0200,
Bjorn Helgaas wrote:
>
> On Wed, Apr 22, 2020 at 11:25:04PM +0200, Takashi Iwai wrote:
> > On Wed, 22 Apr 2020 22:50:28 +0200,
> > Bjorn Helgaas wrote:
> > > ...
> > > I feel like this UR issue could be a PCI core issue or maybe some sort
> > > of misuse of PCI power management, but I can't seem to get traction on
> > > it.
> > >
> > > > Then the display freezes and the system basically falls apart (can't
> > > > even sudo reboot -f, need to use magic sysrq).
> > > >
> > > > I bisected this to "ALSA: hda: Skip controller resume if not needed".
> > > > Setting snd_hda_intel.power_save=0 resolves the issue.
> > >
> > > FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
> > > controller resume if not needed"),
> > > https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
> > > v5.7-rc2.
> >
> > Yes, and I posted the fix patch just now:
> >   https://lore.kernel.org/r/20200422203744.26299-1-tiwai@suse.de
> >
> > The possible cause was the tricky resume code that both HD-audio
> > controller (the parent PCI device) and the codec devices used.
> >
> > At least the patch above seems to be working for the reporter's machine.
> > Now we need a bit more testing before merging, but it looks promising,
> > so far.
>
> Great, I'm glad you figured something out because I sure wasn't
> getting anywhere!
>
> Maybe this is a tangent, but I can't figure out what
> snd_power_change_state() is doing.  It *looks* like it's supposed to
> change the PCI power state, but I gave up trying to figure out where
> it actually touches the device.

Not really; it merely updates the internal state field stored in the
sound card object.  See include/sound/core.h:

  static inline void snd_power_change_state(struct snd_card *card, unsigned int state)
  {
  	card->power_state = state;
  	wake_up(&card->power_sleep);
  }

The sound API explicitly blocks operations during suspend/resume
using this card-level signal.


thanks,

Takashi
