* Ryzen7 3700U xhci fails on resume from sleep @ 2019-08-26 9:10 Daniel Drake 2019-08-26 9:29 ` Rafael J. Wysocki 0 siblings, 1 reply; 11+ messages in thread From: Daniel Drake @ 2019-08-26 9:10 UTC (permalink / raw) To: Linux USB Mailing List, Linux PCI, Linux PM Cc: Endless Linux Upstreaming Team Hi, Working with a new consumer laptop based on AMD Ryzen 7 3700U, all USB functionality goes dead upon suspend/resume (s2idle). Reproduced on linus master from today. <<<suspend>>> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. <<<wake up happens here>>> xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3 xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 WARNING: CPU: 0 PID: 1980 at kernel/irq/chip.c:210 irq_startup+0xda/0xe0 Modules linked in: CPU: 0 PID: 1980 Comm: bash Not tainted 5.3.0-rc6+ #265 Hardware name: ASUSTeK COMPUTER INC. ZenBook UX434DA_UX434DA/UX434DA, BIOS UX434DA_UX434DA.301-C03 08/20/2019 RIP: 0010:irq_startup+0xda/0xe0 Code: ef e8 fa 2b 00 00 85 c0 0f 85 04 09 00 00 48 89 ee 31 d2 4c 89 ef e8 d5 d3 ff ff 48 89 df e8 cd fe ff ff 89 c5 e9 53 ff ff ff <0f> 0b eb b5 66 90 55 48 89 fd 53 48 8b 47 38 89 f3 8b 00 a9 00 00 RSP: 0018:ffffa045407cfd68 EFLAGS: 00010002 RAX: 0000000000000040 RBX: ffff98058e968800 RCX: 0000000000000040 RDX: 0000000000000000 RSI: ffffffffa2b6d1f8 RDI: ffff98058e968818 RBP: ffff98058e968818 R08: 0000000000000000 R09: ffff98058ec00650 R10: 0000000000000000 R11: ffffffffa2a49568 R12: 0000000000000001 R13: 0000000000000001 R14: 0000000000000246 R15: ffffa045407cfde0 FS: 00007f2d54054740(0000) GS:ffff980590c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000562df2f3c976 CR3: 000000040ea00000 CR4: 00000000003406f0 Call Trace: resume_irqs+0x9e/0xd0 dpm_noirq_end+0x5/0x10 suspend_devices_and_enter+0x587/0x780 pm_suspend.cold.7+0x309/0x35f state_store+0x7b/0xe0 kernfs_fop_write+0x100/0x180 vfs_write+0xa0/0x1a0 ksys_write+0x54/0xd0 do_syscall_64+0x3d/0x110 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f2d54141804 Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 5e 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53 RSP: 002b:00007ffce602cb28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f2d54141804 RDX: 0000000000000004 RSI: 000055764b26faa0 RDI: 0000000000000001 RBP: 000055764b26faa0 R08: 000000000000000a R09: 0000000000000077 R10: 000000000000000a R11: 0000000000000246 R12: 00007f2d54213760 R13: 0000000000000004 R14: 00007f2d5420e760 R15: 0000000000000004 ---[ end trace 68323bdeb91ed863 ]--- xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002) xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout xhci_hcd 0000:03:00.4: PCI post-resume error -110! xhci_hcd 0000:03:00.4: HC died; cleaning up xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 xhci_hcd 0000:03:00.3: PCI post-resume error -110! PM: Device 0000:03:00.4 failed to resume async: error -110 xhci_hcd 0000:03:00.3: HC died; cleaning up PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 PM: Device 0000:03:00.3 failed to resume async: error -110 Restarting tasks ... usb 1-3: USB disconnect, device number 2 usb 3-1: USB disconnect, device number 2 asix 1-3:1.0 enx001c490105e9: unregister 'asix' usb-0000:03:00.3-3, ASIX AX88772 USB 2.0 Ethernet done. PM: suspend exit xhci_hcd 0000:03:00.4: xHCI host controller not responding, assume dead xhci_hcd 0000:03:00.4: HC died; cleaning up xhci_hcd 0000:03:00.4: Timeout while waiting for configure endpoint command xhci_hcd 0000:03:00.3: xHCI host controller not responding, assume dead xhci_hcd 0000:03:00.3: HC died; cleaning up xhci_hcd 0000:03:00.3: Timeout while waiting for configure endpoint command usb 3-2: USB disconnect, device number 3 usb 1-4: USB disconnect, device number 3 I think the irq_startup() warning is unrelated - anyway the logs already start complaining about xhci_hcd above that: xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3 xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 These messages indicate that Linux tried to power on the device again, but the PCI power management registers indicate that it ignored the request and remains in D3. I tried a few things like making it try D3hot instead of D3cold (which is what it's aiming for even though it's not mentioned in the logs above), and disabling the suspend/resume actions taken by drivers/pci/pci-acpi.c without any improvement. Trying to sanity check other basic details I observe that this simple routine (to put it in D3 then D0) also fails: # cd /sys/bus/pci/drivers/xhci_hcd # echo -n 0000:00:03.0 > unbind # setpci -s 00:00.3 CAP_PM+4.b=3 # setpci -s 00:00.3 CAP_PM+4.b=0 # echo -n 0000:00:03.0 > bind bind then fails with: xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002) xhci_hcd 0000:03:00.3: xHCI Host Controller xhci_hcd 0000:03:00.3: new USB bus registered, assigned bus number 1 xhci_hcd 0000:03:00.3: Host halt failed, -19 xhci_hcd 0000:03:00.3: can't setup: -19 xhci_hcd 0000:03:00.3: USB bus 1 deregistered xhci_hcd 0000:03:00.3: init 0000:03:00.3 fail, -19 As another test I was wondering if I could get Linux to put it into D3 and then go back into D0 without having to go through the whole suspend procedure, but even when I unbind it from xhci_hcd and set power/control to "auto" in /sys/bus/pci/devices/0000:03:00.3, runtime_status is "suspended" but Linux still leaves the device in D0 - is that expected? Any debugging pointers much appreciated. acpidump: https://gist.github.com/dsd/ff3dfc0f63cdd9eba4a0fbd9e776e8be lspci: https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5 Thanks Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-08-26 9:10 Ryzen7 3700U xhci fails on resume from sleep Daniel Drake @ 2019-08-26 9:29 ` Rafael J. Wysocki 2019-08-26 13:34 ` Mathias Nyman 0 siblings, 1 reply; 11+ messages in thread From: Rafael J. Wysocki @ 2019-08-26 9:29 UTC (permalink / raw) To: Daniel Drake Cc: Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team On Mon, Aug 26, 2019 at 11:11 AM Daniel Drake <drake@endlessm.com> wrote: > > Hi, > > Working with a new consumer laptop based on AMD Ryzen 7 3700U, all USB > functionality goes dead upon suspend/resume (s2idle). Reproduced on > linus master from today. I wonder if you can reproduce this with the pm-s2idle-rework branch from linux-pm.git merged in. > <<<suspend>>> > Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. > <<<wake up happens here>>> > xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3 > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 > WARNING: CPU: 0 PID: 1980 at kernel/irq/chip.c:210 irq_startup+0xda/0xe0 > Modules linked in: > CPU: 0 PID: 1980 Comm: bash Not tainted 5.3.0-rc6+ #265 > Hardware name: ASUSTeK COMPUTER INC. ZenBook UX434DA_UX434DA/UX434DA, > BIOS UX434DA_UX434DA.301-C03 08/20/2019 > RIP: 0010:irq_startup+0xda/0xe0 > Code: ef e8 fa 2b 00 00 85 c0 0f 85 04 09 00 00 48 89 ee 31 d2 4c 89 > ef e8 d5 d3 ff ff 48 89 df e8 cd fe ff ff 89 c5 e9 53 ff ff ff <0f> 0b > eb b5 66 90 55 48 89 fd 53 48 8b 47 38 89 f3 8b 00 a9 00 00 > RSP: 0018:ffffa045407cfd68 EFLAGS: 00010002 > RAX: 0000000000000040 RBX: ffff98058e968800 RCX: 0000000000000040 > RDX: 0000000000000000 RSI: ffffffffa2b6d1f8 RDI: ffff98058e968818 > RBP: ffff98058e968818 R08: 0000000000000000 R09: ffff98058ec00650 > R10: 0000000000000000 R11: ffffffffa2a49568 R12: 0000000000000001 > R13: 0000000000000001 R14: 0000000000000246 R15: ffffa045407cfde0 > FS: 00007f2d54054740(0000) GS:ffff980590c00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 0000562df2f3c976 CR3: 000000040ea00000 CR4: 00000000003406f0 > Call Trace: > resume_irqs+0x9e/0xd0 > dpm_noirq_end+0x5/0x10 > suspend_devices_and_enter+0x587/0x780 > pm_suspend.cold.7+0x309/0x35f > state_store+0x7b/0xe0 > kernfs_fop_write+0x100/0x180 > vfs_write+0xa0/0x1a0 > ksys_write+0x54/0xd0 > do_syscall_64+0x3d/0x110 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x7f2d54141804 > Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 > 00 48 8d 05 f9 5e 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d > 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53 > RSP: 002b:00007ffce602cb28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f2d54141804 > RDX: 0000000000000004 RSI: 000055764b26faa0 RDI: 0000000000000001 > RBP: 000055764b26faa0 R08: 000000000000000a R09: 0000000000000077 > R10: 000000000000000a R11: 0000000000000246 R12: 00007f2d54213760 > R13: 0000000000000004 R14: 00007f2d5420e760 R15: 0000000000000004 > ---[ end trace 68323bdeb91ed863 ]--- > xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) > xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002) > xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout > xhci_hcd 0000:03:00.4: PCI post-resume error -110! > xhci_hcd 0000:03:00.4: HC died; cleaning up > xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 > xhci_hcd 0000:03:00.3: PCI post-resume error -110! > PM: Device 0000:03:00.4 failed to resume async: error -110 > xhci_hcd 0000:03:00.3: HC died; cleaning up > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 > PM: Device 0000:03:00.3 failed to resume async: error -110 > Restarting tasks ... > usb 1-3: USB disconnect, device number 2 > usb 3-1: USB disconnect, device number 2 > asix 1-3:1.0 enx001c490105e9: unregister 'asix' usb-0000:03:00.3-3, > ASIX AX88772 USB 2.0 Ethernet > done. > PM: suspend exit > xhci_hcd 0000:03:00.4: xHCI host controller not responding, assume dead > xhci_hcd 0000:03:00.4: HC died; cleaning up > xhci_hcd 0000:03:00.4: Timeout while waiting for configure endpoint command > xhci_hcd 0000:03:00.3: xHCI host controller not responding, assume dead > xhci_hcd 0000:03:00.3: HC died; cleaning up > xhci_hcd 0000:03:00.3: Timeout while waiting for configure endpoint command > usb 3-2: USB disconnect, device number 3 > usb 1-4: USB disconnect, device number 3 > > I think the irq_startup() warning is unrelated - anyway the logs > already start complaining about xhci_hcd above that: > > xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3 > xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 > > These messages indicate that Linux tried to power on the device again, > but the PCI power management registers indicate that it ignored the > request and remains in D3. > > I tried a few things like making it try D3hot instead of D3cold (which > is what it's aiming for even though it's not mentioned in the logs > above), and disabling the suspend/resume actions taken by > drivers/pci/pci-acpi.c without any improvement. > > Trying to sanity check other basic details I observe that this simple > routine (to put it in D3 then D0) also fails: > > # cd /sys/bus/pci/drivers/xhci_hcd > # echo -n 0000:00:03.0 > unbind > # setpci -s 00:00.3 CAP_PM+4.b=3 > # setpci -s 00:00.3 CAP_PM+4.b=0 > # echo -n 0000:00:03.0 > bind > > bind then fails with: > xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002) > xhci_hcd 0000:03:00.3: xHCI Host Controller > xhci_hcd 0000:03:00.3: new USB bus registered, assigned bus number 1 > xhci_hcd 0000:03:00.3: Host halt failed, -19 > xhci_hcd 0000:03:00.3: can't setup: -19 > xhci_hcd 0000:03:00.3: USB bus 1 deregistered > xhci_hcd 0000:03:00.3: init 0000:03:00.3 fail, -19 > > As another test I was wondering if I could get Linux to put it into D3 > and then go back into D0 without having to go through the whole > suspend procedure, but even when I unbind it from xhci_hcd and set > power/control to "auto" in /sys/bus/pci/devices/0000:03:00.3, > runtime_status is "suspended" but Linux still leaves the device in D0 > - is that expected? > > Any debugging pointers much appreciated. > > acpidump: > https://gist.github.com/dsd/ff3dfc0f63cdd9eba4a0fbd9e776e8be > > lspci: > https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5 > > Thanks > Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-08-26 9:29 ` Rafael J. Wysocki @ 2019-08-26 13:34 ` Mathias Nyman 2019-08-27 2:49 ` Daniel Drake 0 siblings, 1 reply; 11+ messages in thread From: Mathias Nyman @ 2019-08-26 13:34 UTC (permalink / raw) To: Rafael J. Wysocki, Daniel Drake Cc: Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team On 26.8.2019 12.29, Rafael J. Wysocki wrote: > On Mon, Aug 26, 2019 at 11:11 AM Daniel Drake <drake@endlessm.com> wrote: >> >> Hi, >> >> Working with a new consumer laptop based on AMD Ryzen 7 3700U, all USB >> functionality goes dead upon suspend/resume (s2idle). Reproduced on >> linus master from today. > > I wonder if you can reproduce this with the pm-s2idle-rework branch > from linux-pm.git merged in. > Root cause looks similar to: https://bugzilla.kernel.org/show_bug.cgi?id=203885 Mika wrote a fix for that: https://lore.kernel.org/linux-pci/20190821124519.71594-1-mika.westerberg@linux.intel.com/ -Mathias >> <<<suspend>>> >> Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. >> <<<wake up happens here>>> >> xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3 >> xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 >> WARNING: CPU: 0 PID: 1980 at kernel/irq/chip.c:210 irq_startup+0xda/0xe0 >> Modules linked in: >> CPU: 0 PID: 1980 Comm: bash Not tainted 5.3.0-rc6+ #265 >> Hardware name: ASUSTeK COMPUTER INC. ZenBook UX434DA_UX434DA/UX434DA, >> BIOS UX434DA_UX434DA.301-C03 08/20/2019 >> RIP: 0010:irq_startup+0xda/0xe0 >> Code: ef e8 fa 2b 00 00 85 c0 0f 85 04 09 00 00 48 89 ee 31 d2 4c 89 >> ef e8 d5 d3 ff ff 48 89 df e8 cd fe ff ff 89 c5 e9 53 ff ff ff <0f> 0b >> eb b5 66 90 55 48 89 fd 53 48 8b 47 38 89 f3 8b 00 a9 00 00 >> RSP: 0018:ffffa045407cfd68 EFLAGS: 00010002 >> RAX: 0000000000000040 RBX: ffff98058e968800 RCX: 0000000000000040 >> RDX: 0000000000000000 RSI: ffffffffa2b6d1f8 RDI: ffff98058e968818 >> RBP: ffff98058e968818 R08: 0000000000000000 R09: ffff98058ec00650 >> R10: 0000000000000000 R11: ffffffffa2a49568 R12: 0000000000000001 >> R13: 0000000000000001 R14: 0000000000000246 R15: ffffa045407cfde0 >> FS: 00007f2d54054740(0000) GS:ffff980590c00000(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> CR2: 0000562df2f3c976 CR3: 000000040ea00000 CR4: 00000000003406f0 >> Call Trace: >> resume_irqs+0x9e/0xd0 >> dpm_noirq_end+0x5/0x10 >> suspend_devices_and_enter+0x587/0x780 >> pm_suspend.cold.7+0x309/0x35f >> state_store+0x7b/0xe0 >> kernfs_fop_write+0x100/0x180 >> vfs_write+0xa0/0x1a0 >> ksys_write+0x54/0xd0 >> do_syscall_64+0x3d/0x110 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> RIP: 0033:0x7f2d54141804 >> Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 >> 00 48 8d 05 f9 5e 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d >> 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53 >> RSP: 002b:00007ffce602cb28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 >> RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f2d54141804 >> RDX: 0000000000000004 RSI: 000055764b26faa0 RDI: 0000000000000001 >> RBP: 000055764b26faa0 R08: 000000000000000a R09: 0000000000000077 >> R10: 000000000000000a R11: 0000000000000246 R12: 00007f2d54213760 >> R13: 0000000000000004 R14: 00007f2d5420e760 R15: 0000000000000004 >> ---[ end trace 68323bdeb91ed863 ]--- >> xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002) >> xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002) >> xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout >> xhci_hcd 0000:03:00.4: PCI post-resume error -110! >> xhci_hcd 0000:03:00.4: HC died; cleaning up >> xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout >> PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 >> xhci_hcd 0000:03:00.3: PCI post-resume error -110! >> PM: Device 0000:03:00.4 failed to resume async: error -110 >> xhci_hcd 0000:03:00.3: HC died; cleaning up >> PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 >> PM: Device 0000:03:00.3 failed to resume async: error -110 >> Restarting tasks ... >> usb 1-3: USB disconnect, device number 2 >> usb 3-1: USB disconnect, device number 2 >> asix 1-3:1.0 enx001c490105e9: unregister 'asix' usb-0000:03:00.3-3, >> ASIX AX88772 USB 2.0 Ethernet >> done. >> PM: suspend exit >> xhci_hcd 0000:03:00.4: xHCI host controller not responding, assume dead >> xhci_hcd 0000:03:00.4: HC died; cleaning up >> xhci_hcd 0000:03:00.4: Timeout while waiting for configure endpoint command >> xhci_hcd 0000:03:00.3: xHCI host controller not responding, assume dead >> xhci_hcd 0000:03:00.3: HC died; cleaning up >> xhci_hcd 0000:03:00.3: Timeout while waiting for configure endpoint command >> usb 3-2: USB disconnect, device number 3 >> usb 1-4: USB disconnect, device number 3 >> >> I think the irq_startup() warning is unrelated - anyway the logs >> already start complaining about xhci_hcd above that: >> >> xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3 >> xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3 >> >> These messages indicate that Linux tried to power on the device again, >> but the PCI power management registers indicate that it ignored the >> request and remains in D3. >> >> I tried a few things like making it try D3hot instead of D3cold (which >> is what it's aiming for even though it's not mentioned in the logs >> above), and disabling the suspend/resume actions taken by >> drivers/pci/pci-acpi.c without any improvement. >> >> Trying to sanity check other basic details I observe that this simple >> routine (to put it in D3 then D0) also fails: >> >> # cd /sys/bus/pci/drivers/xhci_hcd >> # echo -n 0000:00:03.0 > unbind >> # setpci -s 00:00.3 CAP_PM+4.b=3 >> # setpci -s 00:00.3 CAP_PM+4.b=0 >> # echo -n 0000:00:03.0 > bind >> >> bind then fails with: >> xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002) >> xhci_hcd 0000:03:00.3: xHCI Host Controller >> xhci_hcd 0000:03:00.3: new USB bus registered, assigned bus number 1 >> xhci_hcd 0000:03:00.3: Host halt failed, -19 >> xhci_hcd 0000:03:00.3: can't setup: -19 >> xhci_hcd 0000:03:00.3: USB bus 1 deregistered >> xhci_hcd 0000:03:00.3: init 0000:03:00.3 fail, -19 >> >> As another test I was wondering if I could get Linux to put it into D3 >> and then go back into D0 without having to go through the whole >> suspend procedure, but even when I unbind it from xhci_hcd and set >> power/control to "auto" in /sys/bus/pci/devices/0000:03:00.3, >> runtime_status is "suspended" but Linux still leaves the device in D0 >> - is that expected? >> >> Any debugging pointers much appreciated. >> >> acpidump: >> https://gist.github.com/dsd/ff3dfc0f63cdd9eba4a0fbd9e776e8be >> >> lspci: >> https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5 >> >> Thanks >> Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-08-26 13:34 ` Mathias Nyman @ 2019-08-27 2:49 ` Daniel Drake 2019-08-27 7:48 ` Rafael J. Wysocki 0 siblings, 1 reply; 11+ messages in thread From: Daniel Drake @ 2019-08-27 2:49 UTC (permalink / raw) To: Mathias Nyman Cc: Rafael J. Wysocki, Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team On Mon, Aug 26, 2019 at 9:32 PM Mathias Nyman <mathias.nyman@linux.intel.com> wrote: > On 26.8.2019 12.29, Rafael J. Wysocki wrote: > > I wonder if you can reproduce this with the pm-s2idle-rework branch > > from linux-pm.git merged in. > > Root cause looks similar to: > https://bugzilla.kernel.org/show_bug.cgi?id=203885 > > Mika wrote a fix for that: > https://lore.kernel.org/linux-pci/20190821124519.71594-1-mika.westerberg@linux.intel.com/ Thanks for the suggestions. Mika's patch was already applied then reverted, I applied it again but there's no change. Also merging in pm-s2idle-rework doesn't make any difference. Any other ideas? Or comments on my findings so far? Given that I can't shift D0-D3-D0 reliably directly with setpci before loading the driver, is that indicative of a fundamental problem with the platform, or is my test invalid? Or in terms of other ways of testing the power transition outside of the suspend path, if a PCI dev is runtime suspended with no driver loaded, should Linux not be attempting to put it into D3? Thanks Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-08-27 2:49 ` Daniel Drake @ 2019-08-27 7:48 ` Rafael J. Wysocki 2019-08-28 8:34 ` Daniel Drake 0 siblings, 1 reply; 11+ messages in thread From: Rafael J. Wysocki @ 2019-08-27 7:48 UTC (permalink / raw) To: Daniel Drake Cc: Mathias Nyman, Rafael J. Wysocki, Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team On Tue, Aug 27, 2019 at 4:50 AM Daniel Drake <drake@endlessm.com> wrote: > > On Mon, Aug 26, 2019 at 9:32 PM Mathias Nyman > <mathias.nyman@linux.intel.com> wrote: > > On 26.8.2019 12.29, Rafael J. Wysocki wrote: > > > I wonder if you can reproduce this with the pm-s2idle-rework branch > > > from linux-pm.git merged in. > > > > Root cause looks similar to: > > https://bugzilla.kernel.org/show_bug.cgi?id=203885 > > > > Mika wrote a fix for that: > > https://lore.kernel.org/linux-pci/20190821124519.71594-1-mika.westerberg@linux.intel.com/ > > Thanks for the suggestions. Mika's patch was already applied then > reverted, I applied it again but there's no change. > Also merging in pm-s2idle-rework doesn't make any difference. > > Any other ideas? Or comments on my findings so far? > Given that I can't shift D0-D3-D0 reliably directly with setpci before > loading the driver, is that indicative of a fundamental problem with > the platform, or is my test invalid? That depends on what exactly happens when you try to do the D0-D3-D0 with setpci. If the device becomes unreachable (or worse) after that, it indicates a platform issue. It should not do any harm at the least. However, in principle D0-D3-D0 at the PCI level alone may not be sufficient, because ACPI may need to be involved. I think that PM-runtime should suspend XHCI controllers without anything on the bus under them, so I wonder what happens if ".../power/control" is set to "on" and then to "auto" for that device, with the driver loaded. > Or in terms of other ways of testing the power transition outside of > the suspend path, if a PCI dev is runtime suspended with no driver > loaded, should Linux not be attempting to put it into D3? PCI devices without drivers cannot be runtime-suspended at all. Cheers, Rafael ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-08-27 7:48 ` Rafael J. Wysocki @ 2019-08-28 8:34 ` Daniel Drake 2019-08-28 8:43 ` Rafael J. Wysocki 0 siblings, 1 reply; 11+ messages in thread From: Daniel Drake @ 2019-08-28 8:34 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team On Tue, Aug 27, 2019 at 3:48 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > That depends on what exactly happens when you try to do the D0-D3-D0 > with setpci. If the device becomes unreachable (or worse) after that, > it indicates a platform issue. It should not do any harm at the > least. > > However, in principle D0-D3-D0 at the PCI level alone may not be > sufficient, because ACPI may need to be involved. After using setpci to do D0-D3-D0 transitions, the xhci module fails to probe. xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout xhci_hcd 0000:03:00.3: PCI post-resume error -110! But maybe it's not a great test; as you say I'm not involving ACPI, and also if Linux has a reason for not runtime suspending PCI devices without drivers present then maybe I should also not be doing this. > I think that PM-runtime should suspend XHCI controllers without > anything on the bus under them, so I wonder what happens if > ".../power/control" is set to "on" and then to "auto" for that device, > with the driver loaded. Good hint. # echo on > /sys/bus/pci/devices/0000\:03\:00.3/power/control # echo auto > /sys/bus/pci/devices/0000\:03\:00.3/power/control # echo 1 > /sys/bus/usb/devices/1-4/remove # cat /sys/bus/pci/devices/0000\:03\:00.3/power/runtime_status suspended # echo on > /sys/bus/pci/devices/0000\:03\:00.3/power/control The final command there triggers these messages (including a printk I added in pci_raw_set_power_state): xhci_hcd 0000:03:00.3: pci_raw_set_power_state from 3 to 0 xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3 xhci_hcd 0000:03:00.3: pci_raw_set_power_state from 3 to 0 xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002) xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout xhci_hcd 0000:03:00.3: PCI post-resume error -110! xhci_hcd 0000:03:00.3: HC died; cleaning up So we just reproduced the same issue using runtime PM, without having to go through the whole suspend path. I guess that points towards a platform issue, although the weird thing is that Windows presumably does the D3-D0-D3 transition during suspend/resume and that appears to work fine. I'll report it to the vendor, but if you have any other debug ideas they would be much appreciated. Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-08-28 8:34 ` Daniel Drake @ 2019-08-28 8:43 ` Rafael J. Wysocki 2019-08-29 7:33 ` Daniel Drake 2019-09-23 9:10 ` Daniel Drake 0 siblings, 2 replies; 11+ messages in thread From: Rafael J. Wysocki @ 2019-08-28 8:43 UTC (permalink / raw) To: Daniel Drake Cc: Rafael J. Wysocki, Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team On Wed, Aug 28, 2019 at 10:34 AM Daniel Drake <drake@endlessm.com> wrote: > > On Tue, Aug 27, 2019 at 3:48 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > That depends on what exactly happens when you try to do the D0-D3-D0 > > with setpci. If the device becomes unreachable (or worse) after that, > > it indicates a platform issue. It should not do any harm at the > > least. > > > > However, in principle D0-D3-D0 at the PCI level alone may not be > > sufficient, because ACPI may need to be involved. > > After using setpci to do D0-D3-D0 transitions, the xhci module fails to probe. > > xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout > xhci_hcd 0000:03:00.3: PCI post-resume error -110! > > But maybe it's not a great test; as you say I'm not involving ACPI, > and also if Linux has a reason for not runtime suspending PCI devices > without drivers present then maybe I should also not be doing this. > > > I think that PM-runtime should suspend XHCI controllers without > > anything on the bus under them, so I wonder what happens if > > ".../power/control" is set to "on" and then to "auto" for that device, > > with the driver loaded. > > Good hint. > > # echo on > /sys/bus/pci/devices/0000\:03\:00.3/power/control > # echo auto > /sys/bus/pci/devices/0000\:03\:00.3/power/control > # echo 1 > /sys/bus/usb/devices/1-4/remove > # cat /sys/bus/pci/devices/0000\:03\:00.3/power/runtime_status > suspended > # echo on > /sys/bus/pci/devices/0000\:03\:00.3/power/control > > The final command there triggers these messages (including a printk I > added in pci_raw_set_power_state): > xhci_hcd 0000:03:00.3: pci_raw_set_power_state from 3 to 0 > xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3 > xhci_hcd 0000:03:00.3: pci_raw_set_power_state from 3 to 0 > xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002) > xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout > xhci_hcd 0000:03:00.3: PCI post-resume error -110! > xhci_hcd 0000:03:00.3: HC died; cleaning up > > So we just reproduced the same issue using runtime PM, without having > to go through the whole suspend path. > > I guess that points towards a platform issue, although the weird thing > is that Windows presumably does the D3-D0-D3 transition during > suspend/resume and that appears to work fine. It looks like the platform expects the OS to do something that our generic XHCI driver and the PCI/ACPI layer don't do. A quirk or similar may be needed to address that. > I'll report it to the vendor, Yes, please. At least try to get the information on what the exact platform expectations with respect to the OS are. Quite evidently, they aren't just "do the usual thing". > but if you have any other debug ideas they would be much appreciated. With the git branch mentioned previously merged in, you can enable dynamic debug in device_pm.c, repeat the PM-runtime test and collect the log. There should be some additional messages from the ACPI layer in it. > > Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-08-28 8:43 ` Rafael J. Wysocki @ 2019-08-29 7:33 ` Daniel Drake 2019-09-23 9:10 ` Daniel Drake 1 sibling, 0 replies; 11+ messages in thread From: Daniel Drake @ 2019-08-29 7:33 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team On Wed, Aug 28, 2019 at 4:43 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > With the git branch mentioned previously merged in, you can enable > dynamic debug in device_pm.c, repeat the PM-runtime test and collect > the log. There should be some additional messages from the ACPI layer > in it. That's useful, thanks. Runtime suspend: usb 1-4: USB disconnect, device number 2 power-0419 __acpi_power_off : Power resource [P0U0] turned off device_pm-0278 device_set_power : Device [XHC0] transitioned to D3hot Runtime resume: power-0363 __acpi_power_on : Power resource [P0U0] turned on device_pm-0278 device_set_power : Device [XHC0] transitioned to D0 xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3 xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002) xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout xhci_hcd 0000:03:00.3: PCI post-resume error -110! xhci_hcd 0000:03:00.3: HC died; cleaning up ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-08-28 8:43 ` Rafael J. Wysocki 2019-08-29 7:33 ` Daniel Drake @ 2019-09-23 9:10 ` Daniel Drake 2019-09-25 5:36 ` Daniel Drake 1 sibling, 1 reply; 11+ messages in thread From: Daniel Drake @ 2019-09-23 9:10 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team On Wed, Aug 28, 2019 at 4:43 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > I'll report it to the vendor, > > Yes, please. At least try to get the information on what the exact > platform expectations with respect to the OS are. Quite evidently, > they aren't just "do the usual thing". AMD's response was: > It’s modern stanby system, we don’t have any resource to debug Linux issue on it. I imagine there are people in AMD that do care, but we don't have the right contacts here, not sure if you happen to have anyone you could forward this thread to just in case. Anyway, I looked again and found some more interesting points and a likely workaround, if you can help me nail down the details. I checked again the experiment of runtime-suspending and resuming the USB controller. As before, the problematic step is waking up. In pci_raw_set_power_state() the pmcsr is first read as value 103, then written as 0, then it msleeps for 10ms (in pci_dev_d3_sleep()), then reads back value 3. What I hadn't spotted before is that even though it failed to change the power state bits, bit 8 did get successfully unset, indicating that the device is not completely dead. I then increased the msleep delay to 20ms and now it resumes fine & USB devices work. Unfortunately it's not quite as simple as quirking d3_delay though, because the problem still happens upon resume from s2idle. In that case, pci_dev_d3_sleep() is not called at all. if (state == PCI_D3hot || dev->current_state == PCI_D3hot) pci_dev_d3_sleep(dev); In the runtime resume case, dev->current_state == PCI_D3hot here (even though pci_set_power_state had been called to put it into D3cold), so the msleep happens. But in the system sleep (s2idle) case, dev->current_state == PCI_D3cold here, so no sleep happens. That is strangely inconsistent - is it a bug? I also noticed that there is a 100ms d3cold_delay, but that seems to happen before pcmsr is accessed at all, and doesn't have take any effect here. However, I did also notice that there is no d3cold_delay done during wakeup from s2idle, it only happens on wakeup from runtime suspend. The code does seem to be written that way (runtime_d3cold flag) but I wonder if that is correct. From the standpoint of the ACPI PM specs, is there a difference between runtime suspend and s2idle suspend? Since there is no firmware-based system suspend happening I wonder if the d3cold_delay should apply in both cases. I compared behaviour to another system with Amd Ryzen5 3500U. It's not quite the same SoC but the XHCI controllers have the same PCI IDs. On that platform, I was able to reproduce the failure to runtime resume, but then it succeeds with a d3_delay of 20ms. On system suspend/resume, this other platform uses S3, and the XHCI controller is already in D0 upon resume (looks like the firmware turns on the USB controllers for us, so Linux avoids any difficulties there). That seems to agree that quirking these XHCI controllers (based on PCI ID) to have a d3_delay of 20ms seems sane, but first we need to nail down why that delay is not applied at all during resume from s2idle. Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-09-23 9:10 ` Daniel Drake @ 2019-09-25 5:36 ` Daniel Drake 2019-10-08 5:42 ` Daniel Drake 0 siblings, 1 reply; 11+ messages in thread From: Daniel Drake @ 2019-09-25 5:36 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team On Mon, Sep 23, 2019 at 5:10 PM Daniel Drake <drake@endlessm.com> wrote: > Unfortunately it's not quite as simple as quirking d3_delay though, > because the problem still happens upon resume from s2idle. In that > case, pci_dev_d3_sleep() is not called at all. > > if (state == PCI_D3hot || dev->current_state == PCI_D3hot) > pci_dev_d3_sleep(dev); > > In the runtime resume case, dev->current_state == PCI_D3hot here (even > though pci_set_power_state had been called to put it into D3cold), so > the msleep happens. > But in the system sleep (s2idle) case, dev->current_state == > PCI_D3cold here, so no sleep happens. > That is strangely inconsistent - is it a bug? In more detail: During runtime suspend: pci_set_power_state() is called with state=D3cold - This calls pci_raw_set_power_state() with state=D3hot -- After transitioning to D3hot, dev->current_state is updated to D3hot based on pmcsr readback acpi_device_set_power() is called with state=D3cold - acpi_power_transition() is called with state=D3cold, updates adev->power.state to D3cold - adev->power.state is updated (again) to D0 pci_update_current_state() is called - platform_pci_get_power_state() returns D3cold - dev->current_state is updated to D3cold Observations: everything seems to be in order During runtime resume: pci_update_current_state() is called - platform_pci_get_power_state() returns D3cold - dev->current_state is updated to D3cold pci_set_power_state() is called with state=D0 - Calls pci_platform_power_transition -- Calls acpi_pci_set_power_state -> acpi_device_set_power with state=D0 : --- acpi_power_transition() updates adev->power.state to D0 --- adev->power.state is updated (again) to D0 -- Calls pci_update_current_state --- platform_pci_get_power_state() returns D0 (so this is ignored) --- dev->current_state is updated to D3 via pmcsr read - D3cold delay happens (good) - Calls pci_raw_set_power_state() with state=D0 -- current_state is D3 so the D3 delay happens (good) (I quirked this delay to 20ms) -- device is transitioned to D0 and dev->current_state is updated to D0 from pmcsr read Observations: everything seems to be in order, USB is working after resume During s2idle suspend: Exactly the same as runtime suspend above. Device transitions into D3cold and dev->current_state ends up as D3cold. Everything seems to be in order. During s2idle resume: acpi_device_set_power() is called with state=D0 - acpi_power_transition() is called with state=D0, updates adev->power.state to D0 - adev->power.state is updated (again) to D0 pci_raw_set_power_state() is calld with state=D0 -- dev->current_state is D3cold so no D3 delay happens *** -- device fails to transitioned to D0, pmcsr read indicates it's still in D3. Observations: that's a pretty clear difference between the s2idle resume and runtime resume paths. The s2idle resume path is arrived at via pci_pm_default_resume_early() --> pci_power_up(). Should the s2idle resume path be modified to call into pci_update_current_state() to change the current_state to D3hot based on pmcsr (like the runtime resume path does)? Or should pci_raw_set_power_state() be modified to also apply the d3_delay when transitioning from D3cold to D0? Thanks Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Ryzen7 3700U xhci fails on resume from sleep 2019-09-25 5:36 ` Daniel Drake @ 2019-10-08 5:42 ` Daniel Drake 0 siblings, 0 replies; 11+ messages in thread From: Daniel Drake @ 2019-10-08 5:42 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM, Endless Linux Upstreaming Team Hi Rafael, On Wed, Sep 25, 2019 at 1:36 PM Daniel Drake <drake@endlessm.com> wrote: > Should the s2idle resume path be modified to call into > pci_update_current_state() to change the current_state to D3hot based > on pmcsr (like the runtime resume path does)? > Or should pci_raw_set_power_state() be modified to also apply the > d3_delay when transitioning from D3cold to D0? Any thoughts here? I also sent a patch implementing the 2nd point above: https://patchwork.kernel.org/patch/11164089/ but no response yet. Thanks Daniel ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2019-10-08 5:42 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-08-26 9:10 Ryzen7 3700U xhci fails on resume from sleep Daniel Drake 2019-08-26 9:29 ` Rafael J. Wysocki 2019-08-26 13:34 ` Mathias Nyman 2019-08-27 2:49 ` Daniel Drake 2019-08-27 7:48 ` Rafael J. Wysocki 2019-08-28 8:34 ` Daniel Drake 2019-08-28 8:43 ` Rafael J. Wysocki 2019-08-29 7:33 ` Daniel Drake 2019-09-23 9:10 ` Daniel Drake 2019-09-25 5:36 ` Daniel Drake 2019-10-08 5:42 ` Daniel Drake
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).