linux-pm.vger.kernel.org archive mirror
* Ryzen7 3700U xhci fails on resume from sleep
@ 2019-08-26  9:10 Daniel Drake
  2019-08-26  9:29 ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Drake @ 2019-08-26  9:10 UTC (permalink / raw)
  To: Linux USB Mailing List, Linux PCI, Linux PM
  Cc: Endless Linux Upstreaming Team

Hi,

Working with a new consumer laptop based on AMD Ryzen 7 3700U, all USB
functionality goes dead upon suspend/resume (s2idle). Reproduced on
linus master from today.

<<<suspend>>>
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
<<<wake up happens here>>>
xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3
xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3
WARNING: CPU: 0 PID: 1980 at kernel/irq/chip.c:210 irq_startup+0xda/0xe0
Modules linked in:
CPU: 0 PID: 1980 Comm: bash Not tainted 5.3.0-rc6+ #265
Hardware name: ASUSTeK COMPUTER INC. ZenBook UX434DA_UX434DA/UX434DA,
BIOS UX434DA_UX434DA.301-C03 08/20/2019
RIP: 0010:irq_startup+0xda/0xe0
Code: ef e8 fa 2b 00 00 85 c0 0f 85 04 09 00 00 48 89 ee 31 d2 4c 89
ef e8 d5 d3 ff ff 48 89 df e8 cd fe ff ff 89 c5 e9 53 ff ff ff <0f> 0b
eb b5 66 90 55 48 89 fd 53 48 8b 47 38 89 f3 8b 00 a9 00 00
RSP: 0018:ffffa045407cfd68 EFLAGS: 00010002
RAX: 0000000000000040 RBX: ffff98058e968800 RCX: 0000000000000040
RDX: 0000000000000000 RSI: ffffffffa2b6d1f8 RDI: ffff98058e968818
RBP: ffff98058e968818 R08: 0000000000000000 R09: ffff98058ec00650
R10: 0000000000000000 R11: ffffffffa2a49568 R12: 0000000000000001
R13: 0000000000000001 R14: 0000000000000246 R15: ffffa045407cfde0
FS:  00007f2d54054740(0000) GS:ffff980590c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000562df2f3c976 CR3: 000000040ea00000 CR4: 00000000003406f0
Call Trace:
 resume_irqs+0x9e/0xd0
 dpm_noirq_end+0x5/0x10
 suspend_devices_and_enter+0x587/0x780
 pm_suspend.cold.7+0x309/0x35f
 state_store+0x7b/0xe0
 kernfs_fop_write+0x100/0x180
 vfs_write+0xa0/0x1a0
 ksys_write+0x54/0xd0
 do_syscall_64+0x3d/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f2d54141804
Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00
00 48 8d 05 f9 5e 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d
00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
RSP: 002b:00007ffce602cb28 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f2d54141804
RDX: 0000000000000004 RSI: 000055764b26faa0 RDI: 0000000000000001
RBP: 000055764b26faa0 R08: 000000000000000a R09: 0000000000000077
R10: 000000000000000a R11: 0000000000000246 R12: 00007f2d54213760
R13: 0000000000000004 R14: 00007f2d5420e760 R15: 0000000000000004
---[ end trace 68323bdeb91ed863 ]---
xhci_hcd 0000:03:00.4: enabling device (0000 -> 0002)
xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002)
xhci_hcd 0000:03:00.4: WARN: xHC restore state timeout
xhci_hcd 0000:03:00.4: PCI post-resume error -110!
xhci_hcd 0000:03:00.4: HC died; cleaning up
xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
xhci_hcd 0000:03:00.3: PCI post-resume error -110!
PM: Device 0000:03:00.4 failed to resume async: error -110
xhci_hcd 0000:03:00.3: HC died; cleaning up
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
PM: Device 0000:03:00.3 failed to resume async: error -110
Restarting tasks ...
usb 1-3: USB disconnect, device number 2
usb 3-1: USB disconnect, device number 2
asix 1-3:1.0 enx001c490105e9: unregister 'asix' usb-0000:03:00.3-3,
ASIX AX88772 USB 2.0 Ethernet
done.
PM: suspend exit
xhci_hcd 0000:03:00.4: xHCI host controller not responding, assume dead
xhci_hcd 0000:03:00.4: HC died; cleaning up
xhci_hcd 0000:03:00.4: Timeout while waiting for configure endpoint command
xhci_hcd 0000:03:00.3: xHCI host controller not responding, assume dead
xhci_hcd 0000:03:00.3: HC died; cleaning up
xhci_hcd 0000:03:00.3: Timeout while waiting for configure endpoint command
usb 3-2: USB disconnect, device number 3
usb 1-4: USB disconnect, device number 3

I think the irq_startup() warning is unrelated - anyway the logs
already start complaining about xhci_hcd above that:

xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3
xhci_hcd 0000:03:00.4: Refused to change power state, currently in D3

These messages indicate that Linux tried to power on the device again,
but the PCI power management registers indicate that it ignored the
request and remains in D3.

I tried a few things like making it try D3hot instead of D3cold (which
is what it's aiming for even though it's not mentioned in the logs
above), and disabling the suspend/resume actions taken by
drivers/pci/pci-acpi.c without any improvement.

Trying to sanity check other basic details I observe that this simple
routine (to put it in D3 then D0) also fails:

# cd /sys/bus/pci/drivers/xhci_hcd
# echo -n 0000:03:00.3 > unbind
# setpci -s 03:00.3 CAP_PM+4.b=3
# setpci -s 03:00.3 CAP_PM+4.b=0
# echo -n 0000:03:00.3 > bind

bind then fails with:
  xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002)
  xhci_hcd 0000:03:00.3: xHCI Host Controller
  xhci_hcd 0000:03:00.3: new USB bus registered, assigned bus number 1
  xhci_hcd 0000:03:00.3: Host halt failed, -19
  xhci_hcd 0000:03:00.3: can't setup: -19
  xhci_hcd 0000:03:00.3: USB bus 1 deregistered
  xhci_hcd 0000:03:00.3: init 0000:03:00.3 fail, -19
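
For readers unfamiliar with the `CAP_PM+4` notation in the setpci commands above: it addresses the control/status register (PMCSR) at offset 4 inside the PCI Power Management capability, which is located by walking the capability list in config space. The following is a minimal Python sketch of that lookup, not the setpci or kernel implementation. The register offsets and IDs are the standard PCI ones; `make_demo_config()` and its capability layout (offsets 0x48 and 0x50) are hypothetical and exist only so the sketch runs without hardware.

```python
# Standard PCI config-space offsets and IDs.
PCI_STATUS = 0x06               # 16-bit status register
PCI_STATUS_CAP_LIST = 0x10      # status bit: capability list present
PCI_CAPABILITY_LIST = 0x34      # pointer to first capability
PCI_CAP_ID_PM = 0x01            # Power Management capability ID
PCI_PM_CTRL = 4                 # PMCSR offset inside the PM capability
PCI_PM_CTRL_STATE_MASK = 0x0003 # power state bits (0 = D0 ... 3 = D3hot)


def find_capability(cfg, cap_id):
    """Walk the capability linked list; return its offset or None."""
    status = int.from_bytes(cfg[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guard against malformed, looping lists
    while pos and pos not in seen and pos + 1 < len(cfg):
        seen.add(pos)
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC  # next-capability pointer
    return None


def read_pmcsr(cfg):
    """Return the 16-bit PMCSR value, or None if there is no PM capability."""
    pm = find_capability(cfg, PCI_CAP_ID_PM)
    if pm is None:
        return None
    return int.from_bytes(cfg[pm + PCI_PM_CTRL:pm + PCI_PM_CTRL + 2], "little")


def make_demo_config():
    """Hypothetical 256-byte config space: MSI cap at 0x48, PM cap at 0x50."""
    cfg = bytearray(256)
    cfg[PCI_STATUS] = PCI_STATUS_CAP_LIST
    cfg[PCI_CAPABILITY_LIST] = 0x48
    cfg[0x48], cfg[0x49] = 0x05, 0x50           # MSI capability, next -> 0x50
    cfg[0x50], cfg[0x51] = PCI_CAP_ID_PM, 0x00  # PM capability, end of list
    cfg[0x54], cfg[0x55] = 0x03, 0x01           # PMCSR = 0x0103
    return bytes(cfg)


if __name__ == "__main__":
    pmcsr = read_pmcsr(make_demo_config())
    print(f"PMCSR = {pmcsr:#06x}, state bits = {pmcsr & PCI_PM_CTRL_STATE_MASK}")
```

On real hardware the same functions could be fed the bytes of /sys/bus/pci/devices/0000:03:00.3/config, read as root (unprivileged reads are typically limited to the first 64 bytes, which do not include the capability list contents).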

As another test I was wondering if I could get Linux to put it into D3
and then go back into D0 without having to go through the whole
suspend procedure, but even when I unbind it from xhci_hcd and set
power/control to "auto" in /sys/bus/pci/devices/0000:03:00.3,
runtime_status is "suspended" but Linux still leaves the device in D0
- is that expected?

Any debugging pointers much appreciated.

acpidump:
https://gist.github.com/dsd/ff3dfc0f63cdd9eba4a0fbd9e776e8be

lspci:
https://gist.github.com/dsd/bd9370b35defdf43680b81ecb34381d5

Thanks
Daniel


* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-08-26  9:10 Ryzen7 3700U xhci fails on resume from sleep Daniel Drake
@ 2019-08-26  9:29 ` Rafael J. Wysocki
  2019-08-26 13:34   ` Mathias Nyman
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2019-08-26  9:29 UTC (permalink / raw)
  To: Daniel Drake
  Cc: Linux USB Mailing List, Linux PCI, Linux PM,
	Endless Linux Upstreaming Team

On Mon, Aug 26, 2019 at 11:11 AM Daniel Drake <drake@endlessm.com> wrote:
>
> Hi,
>
> Working with a new consumer laptop based on AMD Ryzen 7 3700U, all USB
> functionality goes dead upon suspend/resume (s2idle). Reproduced on
> linus master from today.

I wonder if you can reproduce this with the pm-s2idle-rework branch
from linux-pm.git merged in.



* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-08-26  9:29 ` Rafael J. Wysocki
@ 2019-08-26 13:34   ` Mathias Nyman
  2019-08-27  2:49     ` Daniel Drake
  0 siblings, 1 reply; 11+ messages in thread
From: Mathias Nyman @ 2019-08-26 13:34 UTC (permalink / raw)
  To: Rafael J. Wysocki, Daniel Drake
  Cc: Linux USB Mailing List, Linux PCI, Linux PM,
	Endless Linux Upstreaming Team

On 26.8.2019 12.29, Rafael J. Wysocki wrote:
> On Mon, Aug 26, 2019 at 11:11 AM Daniel Drake <drake@endlessm.com> wrote:
>>
>> Hi,
>>
>> Working with a new consumer laptop based on AMD Ryzen 7 3700U, all USB
>> functionality goes dead upon suspend/resume (s2idle). Reproduced on
>> linus master from today.
> 
> I wonder if you can reproduce this with the pm-s2idle-rework branch
> from linux-pm.git merged in.
> 

Root cause looks similar to:
https://bugzilla.kernel.org/show_bug.cgi?id=203885

Mika wrote a fix for that:
https://lore.kernel.org/linux-pci/20190821124519.71594-1-mika.westerberg@linux.intel.com/

-Mathias
  


* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-08-26 13:34   ` Mathias Nyman
@ 2019-08-27  2:49     ` Daniel Drake
  2019-08-27  7:48       ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Drake @ 2019-08-27  2:49 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Rafael J. Wysocki, Linux USB Mailing List, Linux PCI, Linux PM,
	Endless Linux Upstreaming Team

On Mon, Aug 26, 2019 at 9:32 PM Mathias Nyman
<mathias.nyman@linux.intel.com> wrote:
> On 26.8.2019 12.29, Rafael J. Wysocki wrote:
> > I wonder if you can reproduce this with the pm-s2idle-rework branch
> > from linux-pm.git merged in.
>
> Root cause looks similar to:
> https://bugzilla.kernel.org/show_bug.cgi?id=203885
>
> Mika wrote a fix for that:
> https://lore.kernel.org/linux-pci/20190821124519.71594-1-mika.westerberg@linux.intel.com/

Thanks for the suggestions. Mika's patch was already applied then
reverted, I applied it again but there's no change.
Also merging in pm-s2idle-rework doesn't make any difference.

Any other ideas? Or comments on my findings so far?
Given that I can't shift D0-D3-D0 reliably directly with setpci before
loading the driver, is that indicative of a fundamental problem with
the platform, or is my test invalid?
Or in terms of other ways of testing the power transition outside of
the suspend path, if a PCI dev is runtime suspended with no driver
loaded, should Linux not be attempting to put it into D3?

Thanks
Daniel


* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-08-27  2:49     ` Daniel Drake
@ 2019-08-27  7:48       ` Rafael J. Wysocki
  2019-08-28  8:34         ` Daniel Drake
  0 siblings, 1 reply; 11+ messages in thread
From: Rafael J. Wysocki @ 2019-08-27  7:48 UTC (permalink / raw)
  To: Daniel Drake
  Cc: Mathias Nyman, Rafael J. Wysocki, Linux USB Mailing List,
	Linux PCI, Linux PM, Endless Linux Upstreaming Team

On Tue, Aug 27, 2019 at 4:50 AM Daniel Drake <drake@endlessm.com> wrote:
>
> On Mon, Aug 26, 2019 at 9:32 PM Mathias Nyman
> <mathias.nyman@linux.intel.com> wrote:
> > On 26.8.2019 12.29, Rafael J. Wysocki wrote:
> > > I wonder if you can reproduce this with the pm-s2idle-rework branch
> > > from linux-pm.git merged in.
> >
> > Root cause looks similar to:
> > https://bugzilla.kernel.org/show_bug.cgi?id=203885
> >
> > Mika wrote a fix for that:
> > https://lore.kernel.org/linux-pci/20190821124519.71594-1-mika.westerberg@linux.intel.com/
>
> Thanks for the suggestions. Mika's patch was already applied then
> reverted, I applied it again but there's no change.
> Also merging in pm-s2idle-rework doesn't make any difference.
>
> Any other ideas? Or comments on my findings so far?
> Given that I can't shift D0-D3-D0 reliably directly with setpci before
> loading the driver, is that indicative of a fundamental problem with
> the platform, or is my test invalid?

That depends on what exactly happens when you try to do the D0-D3-D0
with setpci.  If the device becomes unreachable (or worse) after that,
it indicates a platform issue.  It should not do any harm at the
least.

However, in principle D0-D3-D0 at the PCI level alone may not be
sufficient, because ACPI may need to be involved.

I think that PM-runtime should suspend XHCI controllers without
anything on the bus under them, so I wonder what happens if
".../power/control" is set to "on" and then to "auto" for that device,
with the driver loaded.

> Or in terms of other ways of testing the power transition outside of
> the suspend path, if a PCI dev is runtime suspended with no driver
> loaded, should Linux not be attempting to put it into D3?

PCI devices without drivers cannot be runtime-suspended at all.

Cheers,
Rafael


* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-08-27  7:48       ` Rafael J. Wysocki
@ 2019-08-28  8:34         ` Daniel Drake
  2019-08-28  8:43           ` Rafael J. Wysocki
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Drake @ 2019-08-28  8:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM,
	Endless Linux Upstreaming Team

On Tue, Aug 27, 2019 at 3:48 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> That depends on what exactly happens when you try to do the D0-D3-D0
> with setpci.  If the device becomes unreachable (or worse) after that,
> it indicates a platform issue.  It should not do any harm at the
> least.
>
> However, in principle D0-D3-D0 at the PCI level alone may not be
> sufficient, because ACPI may need to be involved.

After using setpci to do D0-D3-D0 transitions, the xhci module fails to probe.

  xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout
  xhci_hcd 0000:03:00.3: PCI post-resume error -110!

But maybe it's not a great test; as you say I'm not involving ACPI,
and also if Linux has a reason for not runtime suspending PCI devices
without drivers present then maybe I should also not be doing this.

> I think that PM-runtime should suspend XHCI controllers without
> anything on the bus under them, so I wonder what happens if
> ".../power/control" is set to "on" and then to "auto" for that device,
> with the driver loaded.

Good hint.

# echo on > /sys/bus/pci/devices/0000\:03\:00.3/power/control
# echo auto > /sys/bus/pci/devices/0000\:03\:00.3/power/control
# echo 1 > /sys/bus/usb/devices/1-4/remove
# cat /sys/bus/pci/devices/0000\:03\:00.3/power/runtime_status
suspended
# echo on > /sys/bus/pci/devices/0000\:03\:00.3/power/control

The final command there triggers these messages (including a printk I
added in pci_raw_set_power_state):
 xhci_hcd 0000:03:00.3: pci_raw_set_power_state from 3 to 0
 xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3
 xhci_hcd 0000:03:00.3: pci_raw_set_power_state from 3 to 0
 xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002)
 xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout
 xhci_hcd 0000:03:00.3: PCI post-resume error -110!
 xhci_hcd 0000:03:00.3: HC died; cleaning up

So we just reproduced the same issue using runtime PM, without having
to go through the whole suspend path.
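
The shell sequence above can also be scripted when the test needs to be repeated many times. This is a rough Python sketch under stated assumptions: it relies only on the `power/control` and `power/runtime_status` sysfs attributes used above, while the helper names, the polling approach, and the default timeout are illustrative choices, not anything taken from this thread. Polling is used because runtime suspend happens asynchronously after "auto" is written.

```python
import time
from pathlib import Path


def set_power_control(dev_dir, mode):
    """Write "on" or "auto" to <dev_dir>/power/control."""
    if mode not in ("on", "auto"):
        raise ValueError(f"unsupported power/control value: {mode!r}")
    (Path(dev_dir) / "power" / "control").write_text(mode)


def runtime_status(dev_dir):
    """Return the content of <dev_dir>/power/runtime_status, stripped."""
    return (Path(dev_dir) / "power" / "runtime_status").read_text().strip()


def wait_for_status(dev_dir, wanted, timeout=5.0, interval=0.1):
    """Poll runtime_status until it equals `wanted`; False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if runtime_status(dev_dir) == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Needs root and the real device; the path matches the one used above.
    dev = "/sys/bus/pci/devices/0000:03:00.3"
    set_power_control(dev, "on")
    set_power_control(dev, "auto")
    if wait_for_status(dev, "suspended"):
        set_power_control(dev, "on")  # forces the runtime resume that fails
        print("runtime_status after resume:", runtime_status(dev))
    else:
        print("device did not runtime-suspend; is something still on the bus?")
```

As in the manual test, any USB device still attached under the controller will keep it from suspending, so it has to be removed first.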

I guess that points towards a platform issue, although the weird thing
is that Windows presumably does the D0-D3-D0 transition during
suspend/resume and that appears to work fine.

I'll report it to the vendor, but if you have any other debug ideas
they would be much appreciated.

Daniel


* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-08-28  8:34         ` Daniel Drake
@ 2019-08-28  8:43           ` Rafael J. Wysocki
  2019-08-29  7:33             ` Daniel Drake
  2019-09-23  9:10             ` Daniel Drake
  0 siblings, 2 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2019-08-28  8:43 UTC (permalink / raw)
  To: Daniel Drake
  Cc: Rafael J. Wysocki, Mathias Nyman, Linux USB Mailing List,
	Linux PCI, Linux PM, Endless Linux Upstreaming Team

On Wed, Aug 28, 2019 at 10:34 AM Daniel Drake <drake@endlessm.com> wrote:
>
> On Tue, Aug 27, 2019 at 3:48 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > That depends on what exactly happens when you try to do the D0-D3-D0
> > with setpci.  If the device becomes unreachable (or worse) after that,
> > it indicates a platform issue.  It should not do any harm at the
> > least.
> >
> > However, in principle D0-D3-D0 at the PCI level alone may not be
> > sufficient, because ACPI may need to be involved.
>
> After using setpci to do D0-D3-D0 transitions, the xhci module fails to probe.
>
>   xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout
>   xhci_hcd 0000:03:00.3: PCI post-resume error -110!
>
> But maybe it's not a great test; as you say I'm not involving ACPI,
> and also if Linux has a reason for not runtime suspending PCI devices
> without drivers present then maybe I should also not be doing this.
>
> > I think that PM-runtime should suspend XHCI controllers without
> > anything on the bus under them, so I wonder what happens if
> > ".../power/control" is set to "on" and then to "auto" for that device,
> > with the driver loaded.
>
> Good hint.
>
> # echo on > /sys/bus/pci/devices/0000\:03\:00.3/power/control
> # echo auto > /sys/bus/pci/devices/0000\:03\:00.3/power/control
> # echo 1 > /sys/bus/usb/devices/1-4/remove
> # cat /sys/bus/pci/devices/0000\:03\:00.3/power/runtime_status
> suspended
> # echo on > /sys/bus/pci/devices/0000\:03\:00.3/power/control
>
> The final command there triggers these messages (including a printk I
> added in pci_raw_set_power_state):
>  xhci_hcd 0000:03:00.3: pci_raw_set_power_state from 3 to 0
>  xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3
>  xhci_hcd 0000:03:00.3: pci_raw_set_power_state from 3 to 0
>  xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002)
>  xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout
>  xhci_hcd 0000:03:00.3: PCI post-resume error -110!
>  xhci_hcd 0000:03:00.3: HC died; cleaning up
>
> So we just reproduced the same issue using runtime PM, without having
> to go through the whole suspend path.
>
> I guess that points towards a platform issue, although the weird thing
> is that Windows presumably does the D0-D3-D0 transition during
> suspend/resume and that appears to work fine.

It looks like the platform expects the OS to do something that our
generic XHCI driver and the PCI/ACPI layer don't do.

A quirk or similar may be needed to address that.

> I'll report it to the vendor,

Yes, please.  At least try to get the information on what the exact
platform expectations with respect to the OS are.  Quite evidently,
they aren't just "do the usual thing".

> but if you have any other debug ideas they would be much appreciated.

With the git branch mentioned previously merged in, you can enable
dynamic debug in device_pm.c, repeat the PM-runtime test and collect
the log.  There should be some additional messages from the ACPI layer
in it.


>
> Daniel


* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-08-28  8:43           ` Rafael J. Wysocki
@ 2019-08-29  7:33             ` Daniel Drake
  2019-09-23  9:10             ` Daniel Drake
  1 sibling, 0 replies; 11+ messages in thread
From: Daniel Drake @ 2019-08-29  7:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM,
	Endless Linux Upstreaming Team

On Wed, Aug 28, 2019 at 4:43 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> With the git branch mentioned previously merged in, you can enable
> dynamic debug in device_pm.c, repeat the PM-runtime test and collect
> the log.  There should be some additional messages from the ACPI layer
> in it.

That's useful, thanks. Runtime suspend:

usb 1-4: USB disconnect, device number 2
    power-0419 __acpi_power_off      : Power resource [P0U0] turned off
device_pm-0278 device_set_power      : Device [XHC0] transitioned to D3hot

Runtime resume:
    power-0363 __acpi_power_on       : Power resource [P0U0] turned on
device_pm-0278 device_set_power      : Device [XHC0] transitioned to D0
xhci_hcd 0000:03:00.3: Refused to change power state, currently in D3
xhci_hcd 0000:03:00.3: enabling device (0000 -> 0002)
xhci_hcd 0000:03:00.3: WARN: xHC restore state timeout
xhci_hcd 0000:03:00.3: PCI post-resume error -110!
xhci_hcd 0000:03:00.3: HC died; cleaning up


* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-08-28  8:43           ` Rafael J. Wysocki
  2019-08-29  7:33             ` Daniel Drake
@ 2019-09-23  9:10             ` Daniel Drake
  2019-09-25  5:36               ` Daniel Drake
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Drake @ 2019-09-23  9:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM,
	Endless Linux Upstreaming Team

On Wed, Aug 28, 2019 at 4:43 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > I'll report it to the vendor,
>
> Yes, please.  At least try to get the information on what the exact
> platform expectations with respect to the OS are.  Quite evidently,
> they aren't just "do the usual thing".

AMD's response was:
 > It’s modern standby system, we don’t have any resource to debug
 > Linux issue on it.

I imagine there are people in AMD that do care, but we don't have the
right contacts here, not sure if you happen to have anyone you could
forward this thread to just in case.

Anyway, I looked again and found some more interesting points and a
likely workaround, if you can help me nail down the details.

I checked again the experiment of runtime-suspending and resuming the
USB controller. As before, the problematic step is waking up. In
pci_raw_set_power_state() the pmcsr is first read as value 0x103, then
written as 0, then it msleeps for 10ms (in pci_dev_d3_sleep()), then
reads back value 0x3.

What I hadn't spotted before is that even though it failed to change
the power state bits, bit 8 did get successfully unset, indicating
that the device is not completely dead.

I then increased the msleep delay to 20ms and now it resumes fine &
USB devices work.

Unfortunately it's not quite as simple as quirking d3_delay though,
because the problem still happens upon resume from s2idle. In that
case, pci_dev_d3_sleep() is not called at all.

    if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
        pci_dev_d3_sleep(dev);

In the runtime resume case, dev->current_state == PCI_D3hot here (even
though pci_set_power_state had been called to put it into D3cold), so
the msleep happens.
But in the system sleep (s2idle) case, dev->current_state ==
PCI_D3cold here, so no sleep happens.
That is strangely inconsistent - is it a bug?

I also noticed that there is a 100ms d3cold_delay, but that seems to
happen before pmcsr is accessed at all, and doesn't have any
effect here. However, I did also notice that there is no d3cold_delay
done during wakeup from s2idle, it only happens on wakeup from runtime
suspend. The code does seem to be written that way (runtime_d3cold
flag) but I wonder if that is correct. From the standpoint of the ACPI
PM specs, is there a difference between runtime suspend and s2idle
suspend? Since there is no firmware-based system suspend happening I
wonder if the d3cold_delay should apply in both cases.

I compared behaviour to another system with AMD Ryzen 5 3500U. It's not
quite the same SoC but the XHCI controllers have the same PCI IDs. On
that platform, I was able to reproduce the failure to runtime resume,
but then it succeeds with a d3_delay of 20ms. On system
suspend/resume, this other platform uses S3, and the XHCI controller
is already in D0 upon resume (looks like the firmware turns on the USB
controllers for us, so Linux avoids any difficulties there).

That supports quirking these XHCI controllers (based on PCI ID) to
have a d3_delay of 20ms, but first we need to nail down why that
delay is not applied at all during resume from s2idle.

Daniel


* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-09-23  9:10             ` Daniel Drake
@ 2019-09-25  5:36               ` Daniel Drake
  2019-10-08  5:42                 ` Daniel Drake
  0 siblings, 1 reply; 11+ messages in thread
From: Daniel Drake @ 2019-09-25  5:36 UTC
  To: Rafael J. Wysocki
  Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM,
	Endless Linux Upstreaming Team

On Mon, Sep 23, 2019 at 5:10 PM Daniel Drake <drake@endlessm.com> wrote:
> Unfortunately it's not quite as simple as quirking d3_delay though,
> because the problem still happens upon resume from s2idle. In that
> case, pci_dev_d3_sleep() is not called at all.
>
>     if (state == PCI_D3hot || dev->current_state == PCI_D3hot)
>         pci_dev_d3_sleep(dev);
>
> In the runtime resume case, dev->current_state == PCI_D3hot here (even
> though pci_set_power_state had been called to put it into D3cold), so
> the msleep happens.
> But in the system sleep (s2idle) case, dev->current_state ==
> PCI_D3cold here, so no sleep happens.
> That is strangely inconsistent - is it a bug?

In more detail:

During runtime suspend:

pci_set_power_state() is called with state=D3cold
 - This calls pci_raw_set_power_state() with state=D3hot
 -- After transitioning to D3hot, dev->current_state is updated to
D3hot based on pmcsr readback

acpi_device_set_power() is called with state=D3cold
 - acpi_power_transition() is called with state=D3cold, updates
adev->power.state to D3cold
 - adev->power.state is updated (again) to D0

pci_update_current_state() is called
 - platform_pci_get_power_state() returns D3cold
 - dev->current_state is updated to D3cold

Observations: everything seems to be in order


During runtime resume:

pci_update_current_state() is called
 - platform_pci_get_power_state() returns D3cold
 - dev->current_state is updated to D3cold

pci_set_power_state() is called with state=D0
 - Calls pci_platform_power_transition
 -- Calls acpi_pci_set_power_state -> acpi_device_set_power with state=D0 :
 --- acpi_power_transition() updates adev->power.state to D0
 --- adev->power.state is updated (again) to D0
 -- Calls pci_update_current_state
 --- platform_pci_get_power_state() returns D0 (so this is ignored)
 --- dev->current_state is updated to D3 via pmcsr read
 - D3cold delay happens (good)
 - Calls pci_raw_set_power_state() with state=D0
 -- current_state is D3 so the D3 delay happens (good) (I quirked this
delay to 20ms)
 -- device is transitioned to D0 and dev->current_state is updated to
D0 from pmcsr read

Observations: everything seems to be in order, USB is working after resume


During s2idle suspend:

Exactly the same as runtime suspend above. Device transitions into
D3cold and dev->current_state ends up as D3cold. Everything seems to
be in order.


During s2idle resume:

acpi_device_set_power() is called with state=D0
 - acpi_power_transition() is called with state=D0, updates
adev->power.state to D0
 - adev->power.state is updated (again) to D0

pci_raw_set_power_state() is called with state=D0
 -- dev->current_state is D3cold so no D3 delay happens ***
 -- device fails to transition to D0, pmcsr read indicates it's still in D3.

Observations: that's a pretty clear difference between the s2idle
resume and runtime resume paths.
The s2idle resume path is arrived at via pci_pm_default_resume_early()
--> pci_power_up().


Should the s2idle resume path be modified to call into
pci_update_current_state() to change the current_state to D3hot based
on pmcsr (like the runtime resume path does)?
Or should pci_raw_set_power_state() be modified to also apply the
d3_delay when transitioning from D3cold to D0?

Thanks
Daniel


* Re: Ryzen7 3700U xhci fails on resume from sleep
  2019-09-25  5:36               ` Daniel Drake
@ 2019-10-08  5:42                 ` Daniel Drake
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Drake @ 2019-10-08  5:42 UTC
  To: Rafael J. Wysocki
  Cc: Mathias Nyman, Linux USB Mailing List, Linux PCI, Linux PM,
	Endless Linux Upstreaming Team

Hi Rafael,

On Wed, Sep 25, 2019 at 1:36 PM Daniel Drake <drake@endlessm.com> wrote:
> Should the s2idle resume path be modified to call into
> pci_update_current_state() to change the current_state to D3hot based
> on pmcsr (like the runtime resume path does)?
> Or should pci_raw_set_power_state() be modified to also apply the
> d3_delay when transitioning from D3cold to D0?

Any thoughts here?
I also sent a patch implementing the 2nd point above:
https://patchwork.kernel.org/patch/11164089/
but no response yet.

Thanks
Daniel


Thread overview: 11+ messages
2019-08-26  9:10 Ryzen7 3700U xhci fails on resume from sleep Daniel Drake
2019-08-26  9:29 ` Rafael J. Wysocki
2019-08-26 13:34   ` Mathias Nyman
2019-08-27  2:49     ` Daniel Drake
2019-08-27  7:48       ` Rafael J. Wysocki
2019-08-28  8:34         ` Daniel Drake
2019-08-28  8:43           ` Rafael J. Wysocki
2019-08-29  7:33             ` Daniel Drake
2019-09-23  9:10             ` Daniel Drake
2019-09-25  5:36               ` Daniel Drake
2019-10-08  5:42                 ` Daniel Drake
