Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855

From: Mathias Nyman @ 2018-05-02 14:47 UTC (permalink / raw)
  To: Alan Stern; +Cc: russianneuromancer, linux-usb

On 24.04.2018 16:50, Alan Stern wrote:
> On Tue, 24 Apr 2018, Mathias Nyman wrote:
> 
>>>>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
>>>>> hcd-pci.c:suspend_common() should prevent the controller from going
>>>>> back into D3.  The WAKEUP_PENDING bit gets set in
>>>>> usb_hcd_resume_root_hub() and it doesn't get cleared until
>>>>> hcd_bus_resume() runs.
>>>>>
>>>>
>>>> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
>>>> specific failing case
>>>>
>>>> xhci_resume() has a check:
>>>> /* Resume root hubs only when have pending events. */
>>>>      status = readl(&xhci->op_regs->status);
>>>>        if (status & STS_EINT) {
>>>>          usb_hcd_resume_root_hub(xhci->shared_hcd);
>>>>          usb_hcd_resume_root_hub(hcd);
>>>>        }
>>>>
>>>> If the check fails, then the WAKEUP_PENDING bit is not set, and runtime PM
>>>> can suspend the host controller again. When the xhci driver finally gets to
>>>> handle the interrupt, the controller may already be in D3.
>>>>
>>>> This should only happen if xhci_resume() is called before the xhci driver sees
>>>> a pending interrupt, which could be possible as xhci has interrupt moderation enabled.
>>>
>>> Then maybe that test should be removed.  Calling
>>> usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
>>> because there probably are not very many times when the controller gets
>>> resumed without the root hub also being resumed.
>>>
>>
>> The check was added to fix system suspend issue on a runtime suspended host:
>>
>> commit d6236f6d1d885aa19d1cd7317346fe795227a3cc
>>
>>       xhci: Fix runtime suspended xhci from blocking system suspend.
>>       
>>       The system suspend flow as following:
>>       1, Freeze all user processes and kernel threads.
>>       
>>       2, Try to suspend all devices.
>>       
>>       2.1, If pci device is in RPM suspended state, then pci driver will try
>>       to resume it to RPM active state in the prepare stage.
>>       
>>       2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
>>       workqueue items to resume usb2&usb3 roothub devices.
>>       
>>       2.3, Call suspend callbacks of devices.
>>       
>>       2.3.1, All suspend callbacks of all hcd's children, including
>>       roothub devices are called.
>>       
>>       2.3.2, Finally, hcd_pci_suspend callback is called.
>>       
>>       Due to workqueue threads were already frozen in step 1, the workqueue
>>       items can't be scheduled, and the roothub devices can't be resumed in
>>       this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
>>       usb_hcd_resume_root_hub won't be cleared. Finally,
>>       hcd_pci_suspend will return -EBUSY, and system suspend fails.
> 
> Hmmm.  I don't recall seeing this problem occur with ehci-hcd.  But
> then, I haven't tested it very much recently.
> 
> We could change to a different work queue, one that doesn't get
> frozen.  But there's no guarantee that the work items would run before
> your step 2.3.2.
> 
> Maybe we can avoid step 2.1.  I think there have been some recent
> changes to the PM code in this area.  There may be a flag you can set
> that will prevent the PCI core from resuming the host controller.
> 
> Or maybe we can change step 2.3.1, so that the root hub's suspend
> callback will first do a resume if the WAKEUP_PENDING flag is set.
> That might be the most reliable approach.
> 

I'm not sure I understand the last suggestion. Could you explain how it
would work?

I started approaching this from another direction: making sure we don't
immediately runtime suspend the host controller after resume. This means adding a
next_statechange minimum time between resume and suspend, and checking for pending
events in xhci_suspend().

I'll have some patches for russianneuromancer@ya.ru to test soon.

The checks mentioned above are generic ones that ehci_suspend() also does.
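
As a rough illustration of the two checks (a minimum time between resume and
suspend, and a pending-event check at suspend time), here is a small user-space
C model. It is not the actual patch. All names in it (model_xhci, model_resume,
model_suspend, MODEL_MIN_RESUME_TO_SUSPEND_MS) and the 10 ms value are made up
for the sketch, and reporting a refused suspend with -EAGAIN or -EBUSY is an
assumption. Only the EINT bit position (bit 3 of USBSTS) follows the xHCI
register layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel API. */
#define MODEL_STS_EINT                  (1u << 3) /* event interrupt pending */
#define MODEL_MIN_RESUME_TO_SUSPEND_MS  10u       /* arbitrary example value */

struct model_xhci {
	uint32_t status;              /* stand-in for the USBSTS register */
	uint64_t next_statechange_ms; /* earliest time a suspend is allowed */
};

/* On resume, record the earliest time a new suspend may start. */
void model_resume(struct model_xhci *xhci, uint64_t now_ms)
{
	assert(xhci != NULL);
	xhci->next_statechange_ms = now_ms + MODEL_MIN_RESUME_TO_SUSPEND_MS;
}

/*
 * Refuse to suspend if we resumed too recently, or if the controller
 * still reports a pending event that the driver has not handled yet.
 * Returns 0 when suspending is allowed, a negative errno otherwise.
 */
int model_suspend(const struct model_xhci *xhci, uint64_t now_ms)
{
	assert(xhci != NULL);

	if (now_ms < xhci->next_statechange_ms)
		return -EAGAIN; /* too soon after the last resume */

	if (xhci->status & MODEL_STS_EINT)
		return -EBUSY;  /* pending event, stay awake */

	return 0;
}
```

With this model, a suspend attempted 5 ms after model_resume() is refused, one
attempted after the minimum time with no pending event succeeds, and one with
the EINT bit set is refused regardless of the elapsed time.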

-Mathias