All of lore.kernel.org
 help / color / mirror / Atom feed
* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-05-02 14:47 Mathias Nyman
  0 siblings, 0 replies; 14+ messages in thread
From: Mathias Nyman @ 2018-05-02 14:47 UTC (permalink / raw)
  To: Alan Stern; +Cc: russianneuromancer, linux-usb

On 24.04.2018 16:50, Alan Stern wrote:
> On Tue, 24 Apr 2018, Mathias Nyman wrote:
> 
>>>>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
>>>>> hcd-pci.c:suspend_common() should prevent the controller from going
>>>>> back into D3.  The WAKEUP_PENDING bit gets set in
>>>>> usb_hcd_resume_root_hub() and it doesn't get cleared until
>>>>> hcd_bus_resume() runs.
>>>>>
>>>>
>>>> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
>>>> specific failing case
>>>>
>>>> xhci_resume() has a check:
>>>> /* Resume root hubs only when have pending events. */
>>>>      status = readl(&xhci->op_regs->status);
>>>>        if (status & STS_EINT) {
>>>>          usb_hcd_resume_root_hub(xhci->shared_hcd);
>>>>          usb_hcd_resume_root_hub(hcd);
>>>>        }
>>>>
>>>> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
>>>> can suspend host controller again. when xhci driver finally gets to handle the interrupt
>>>> the controller may be in D3 already
>>>>
>>>> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
>>>> could be possible as xhci has interrupt moderation enabled.
>>>
>>> Then maybe that test should be removed.  Calling
>>> usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
>>> because there probably are not very many times when the controller gets
>>> resumed without the root hub also being resumed.
>>>
>>
>> The check was added to fix system suspend issue on a runtime suspended host:
>>
>> commit d6236f6d1d885aa19d1cd7317346fe795227a3cc
>>
>>       xhci: Fix runtime suspended xhci from blocking system suspend.
>>       
>>       The system suspend flow as following:
>>       1, Freeze all user processes and kenrel threads.
>>       
>>       2, Try to suspend all devices.
>>       
>>       2.1, If pci device is in RPM suspended state, then pci driver will try
>>       to resume it to RPM active state in the prepare stage.
>>       
>>       2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
>>       workqueue items to resume usb2&usb3 roothub devices.
>>       
>>       2.3, Call suspend callbacks of devices.
>>       
>>       2.3.1, All suspend callbacks of all hcd's children, including
>>       roothub devices are called.
>>       
>>       2.3.2, Finally, hcd_pci_suspend callback is called.
>>       
>>       Due to workqueue threads were already frozen in step 1, the workqueue
>>       items can't be scheduled, and the roothub devices can't be resumed in
>>       this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
>>       usb_hcd_resume_root_hub won't be cleared. Finally,
>>       hcd_pci_suspend will return -EBUSY, and system suspend fails.
> 
> Hmmm.  I don't recall seeing this problem occur with ehci-hcd.  But
> then, I haven't tested it very much recently.
> 
> We could change to a different work queue, one that doesn't get
> frozen.  But there's no guarantee that the work items would run before
> your step 2.3.2.
> 
> Maybe we can avoid step 2.1.  I think there have been some recent
> changes to the PM code in this area.  There may be a flag you can set
> that will prevent the PCI core from resuming the host controller.
> 
> Or maybe we can change step 2.3.1, so that the root hub's suspend
> callback will first do a resume if the WAKEUP_PENDING flag is set.
> That might be the most reliable approach.
> 

I'm not sure I understand the last suggestion, could you open up how it
would work?

I started approaching this from another direction, mainly making sure we don't
immediately runtime suspend the host controller after resume. Adding a next_statechange
minimal time between resume and suspend, and checking for pending events in xhci_suspend().

I'll have some patches to test for russianneuromancer@ya.ru soon

These are generic checks that ehci_suspend() also does

-Mathias
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-05-04 11:53 Mathias Nyman
  0 siblings, 0 replies; 14+ messages in thread
From: Mathias Nyman @ 2018-05-04 11:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: russianneuromancer, USB list

On 03.05.2018 21:56, Alan Stern wrote:
> On Thu, 3 May 2018, Mathias Nyman wrote:
> 
>> When everything is runtime suspended and start system suspend, we first suspend all
>> the usb devices, including the roothubs calling choose_wakeup() for the roothubs.
>> No flags are set yet. When pm continues suspending, and tries to suspend the xhci PCI
>> controller the PCI suspend code notices the device is runtime suspended,
>> it resumes it -> xhci_resume() -> usb_hcd_resume_root_hub() and WAKEUP_PENDING flag is set.
>> When PCI code then continues and tries to suspend the pci device it fails because the flag is set.
> 
> Okay, I get the picture.  And I just spent some time going over the
> core code and some of the other drivers.
> 
> So yes, what I said earlier was wrong.  The existing code in
> xhci_resume() is more or less correct; it should call
> usb_hcd_resume_root_hub() _only_ when there is a pending wakeup request
> from the root hub or a downstream device.
> 
> Earlier you wrote:
> 
>> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
>> can suspend host controller again. when xhci driver finally gets to handle the interrupt
>> the controller may be in D3 already
>>
>> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
>> could be possible as xhci has interrupt moderation enabled.
> 
> This is the real problem.  You need to make sure that even with
> interrupt moderation, if there is a pending wakeup request then you can
> detect it properly.  In other words, xhci_resume() may need to
> explicitly check the root-hub port statuses, because it can't rely on
> the interrupt handler to inform it that a wakeup request has been
> received.
> 
> Does that make sense?
> 

It does, thanks
I'll write a testpatch that checks changes for ports in xhci_resume()

-Mathias
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-05-03 18:56 Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2018-05-03 18:56 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: russianneuromancer, USB list

On Thu, 3 May 2018, Mathias Nyman wrote:

> When everything is runtime suspended and start system suspend, we first suspend all
> the usb devices, including the roothubs calling choose_wakeup() for the roothubs.
> No flags are set yet. When pm continues suspending, and tries to suspend the xhci PCI
> controller the PCI suspend code notices the device is runtime suspended,
> it resumes it -> xhci_resume() -> usb_hcd_resume_root_hub() and WAKEUP_PENDING flag is set.
> When PCI code then continues and tries to suspend the pci device it fails because the flag is set.

Okay, I get the picture.  And I just spent some time going over the 
core code and some of the other drivers.

So yes, what I said earlier was wrong.  The existing code in
xhci_resume() is more or less correct; it should call
usb_hcd_resume_root_hub() _only_ when there is a pending wakeup request
from the root hub or a downstream device.

Earlier you wrote:

> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
> can suspend host controller again. when xhci driver finally gets to handle the interrupt
> the controller may be in D3 already
> 
> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
> could be possible as xhci has interrupt moderation enabled.

This is the real problem.  You need to make sure that even with
interrupt moderation, if there is a pending wakeup request then you can
detect it properly.  In other words, xhci_resume() may need to
explicitly check the root-hub port statuses, because it can't rely on
the interrupt handler to inform it that a wakeup request has been
received.

Does that make sense?

Alan Stern
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-05-03 11:53 Mathias Nyman
  0 siblings, 0 replies; 14+ messages in thread
From: Mathias Nyman @ 2018-05-03 11:53 UTC (permalink / raw)
  To: Alan Stern; +Cc: russianneuromancer, linux-usb

On 03.05.2018 14:37, Mathias Nyman wrote:
> On 02.05.2018 20:52, Alan Stern wrote:
>> On Wed, 2 May 2018, Mathias Nyman wrote:
>>
>>> On 24.04.2018 16:50, Alan Stern wrote:
>>>> On Tue, 24 Apr 2018, Mathias Nyman wrote:
>>>>
>>>>>>>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
>>>>>>>> hcd-pci.c:suspend_common() should prevent the controller from going
>>>>>>>> back into D3.  The WAKEUP_PENDING bit gets set in
>>>>>>>> usb_hcd_resume_root_hub() and it doesn't get cleared until
>>>>>>>> hcd_bus_resume() runs.
>>>>>>>>
>>>>>>>
>>>>>>> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
>>>>>>> specific failing case
>>>>>>>
>>>>>>> xhci_resume() has a check:
>>>>>>> /* Resume root hubs only when have pending events. */
>>>>>>>       status = readl(&xhci->op_regs->status);
>>>>>>>         if (status & STS_EINT) {
>>>>>>>           usb_hcd_resume_root_hub(xhci->shared_hcd);
>>>>>>>           usb_hcd_resume_root_hub(hcd);
>>>>>>>         }
>>>>>>>
>>>>>>> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
>>>>>>> can suspend host controller again. when xhci driver finally gets to handle the interrupt
>>>>>>> the controller may be in D3 already
>>>>>>>
>>>>>>> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
>>>>>>> could be possible as xhci has interrupt moderation enabled.
>>>>>>
>>>>>> Then maybe that test should be removed.  Calling
>>>>>> usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
>>>>>> because there probably are not very many times when the controller gets
>>>>>> resumed without the root hub also being resumed.
>>>>>>
>>>>>
>>>>> The check was added to fix system suspend issue on a runtime suspended host:
>>>>>
>>>>> commit d6236f6d1d885aa19d1cd7317346fe795227a3cc
>>>>>
>>>>>        xhci: Fix runtime suspended xhci from blocking system suspend.
>>>>>        The system suspend flow as following:
>>>>>        1, Freeze all user processes and kenrel threads.
>>>>>        2, Try to suspend all devices.
>>>>>        2.1, If pci device is in RPM suspended state, then pci driver will try
>>>>>        to resume it to RPM active state in the prepare stage.
>>>>>        2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
>>>>>        workqueue items to resume usb2&usb3 roothub devices.
>>>>>        2.3, Call suspend callbacks of devices.
>>>>>        2.3.1, All suspend callbacks of all hcd's children, including
>>>>>        roothub devices are called.
>>>>>        2.3.2, Finally, hcd_pci_suspend callback is called.
>>>>>        Due to workqueue threads were already frozen in step 1, the workqueue
>>>>>        items can't be scheduled, and the roothub devices can't be resumed in
>>>>>        this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
>>>>>        usb_hcd_resume_root_hub won't be cleared. Finally,
>>>>>        hcd_pci_suspend will return -EBUSY, and system suspend fails.
>>>>
>>>> Hmmm.  I don't recall seeing this problem occur with ehci-hcd.  But
>>>> then, I haven't tested it very much recently.
>>>>
>>>> We could change to a different work queue, one that doesn't get
>>>> frozen.  But there's no guarantee that the work items would run before
>>>> your step 2.3.2.
>>>>
>>>> Maybe we can avoid step 2.1.  I think there have been some recent
>>>> changes to the PM code in this area.  There may be a flag you can set
>>>> that will prevent the PCI core from resuming the host controller.
>>>>
>>>> Or maybe we can change step 2.3.1, so that the root hub's suspend
>>>> callback will first do a resume if the WAKEUP_PENDING flag is set.
>>>> That might be the most reliable approach.
>>>>
>>>
>>> I'm not sure I understand the last suggestion, could you open up how it
>>> would work?
>>
>> Here's what I had in mind.  See if you think this would work.
>>
>> Consider choose_wakeup() in core/driver.c.  That subroutine gets called
>> by usb_suspend() when step 2.3.1 wants to suspend a USB device.  We
>> could patch it as follows:
> 
> Thanks, now I see.
> 
> I was able to reproduce system suspend failure of a runtime
> suspended host by removing the event check in xhci_ring, and making sure
> pm_runtime_resume(&udev->dev) wasn't called in choose_wakeup().
> 
>>
>> --- usb-4.x.orig/drivers/usb/core/driver.c
>> +++ usb-4.x/drivers/usb/core/driver.c
>> @@ -1449,11 +1449,21 @@ static void choose_wakeup(struct usb_dev
>>        */
>>       w = device_may_wakeup(&udev->dev);
>> -    /* If the device is autosuspended with the wrong wakeup setting,
>> +    /*
>> +     * If the device is autosuspended with the wrong wakeup setting,
>>        * autoresume now so the setting can be changed.
>> +     *
>> +     * Likewise, if the device is an autosuspended root hub and the
>> +     * hcd needs to wake it up before the controller can be suspended,
>> +     * resume it now to clear the WAKEUP_PENDING flag.
>>        */
>> -    if (udev->state == USB_STATE_SUSPENDED && w != udev->do_remote_wakeup)
>> -        pm_runtime_resume(&udev->dev);
>> +    if (udev->state == USB_STATE_SUSPENDED) {
>> +        struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
>> +
>> +        if (w != udev->do_remote_wakeup ||
>> +                (!udev->parent && HCD_WAKEUP_PENDING(hcd)))
>> +            pm_runtime_resume(&udev->dev);
>> +    }
>>       udev->do_remote_wakeup = w;
>>   }
>>
> 
> If I only add the:
> if (!udev->parent && HCD_WAKEUP_PENDING(hcd)))
>      pm_runtime_resume(&udev->dev);
> to choose_wakeup() It still doesn't work.
> The HCD_WAKEUP_PENDING(hcd) check is false.
> 
> Turns out that the xhci_resume() that ends up setting HCD_FLAG_WAKEUP_PENDING is called
> after choose_wakeup() for the roothub.
> 
> There's something in the pm functions order that I don't follow here

Actually I do get the ordering.
When everything is runtime suspended and start system suspend, we first suspend all
the usb devices, including the roothubs calling choose_wakeup() for the roothubs.
No flags are set yet. When pm continues suspending, and tries to suspend the xhci PCI
controller the PCI suspend code notices the device is runtime suspended,
it resumes it -> xhci_resume() -> usb_hcd_resume_root_hub() and WAKEUP_PENDING flag is set.
When PCI code then continues and tries to suspend the pci device it fails because the flag is set.

So checking HCD_WAKEUP_PENDING(hcd) for roothub in choose_wakeup() won't help, it's too early

-Mathias
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-05-03 11:37 Mathias Nyman
  0 siblings, 0 replies; 14+ messages in thread
From: Mathias Nyman @ 2018-05-03 11:37 UTC (permalink / raw)
  To: Alan Stern; +Cc: russianneuromancer, linux-usb

On 02.05.2018 20:52, Alan Stern wrote:
> On Wed, 2 May 2018, Mathias Nyman wrote:
> 
>> On 24.04.2018 16:50, Alan Stern wrote:
>>> On Tue, 24 Apr 2018, Mathias Nyman wrote:
>>>
>>>>>>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
>>>>>>> hcd-pci.c:suspend_common() should prevent the controller from going
>>>>>>> back into D3.  The WAKEUP_PENDING bit gets set in
>>>>>>> usb_hcd_resume_root_hub() and it doesn't get cleared until
>>>>>>> hcd_bus_resume() runs.
>>>>>>>
>>>>>>
>>>>>> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
>>>>>> specific failing case
>>>>>>
>>>>>> xhci_resume() has a check:
>>>>>> /* Resume root hubs only when have pending events. */
>>>>>>       status = readl(&xhci->op_regs->status);
>>>>>>         if (status & STS_EINT) {
>>>>>>           usb_hcd_resume_root_hub(xhci->shared_hcd);
>>>>>>           usb_hcd_resume_root_hub(hcd);
>>>>>>         }
>>>>>>
>>>>>> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
>>>>>> can suspend host controller again. when xhci driver finally gets to handle the interrupt
>>>>>> the controller may be in D3 already
>>>>>>
>>>>>> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
>>>>>> could be possible as xhci has interrupt moderation enabled.
>>>>>
>>>>> Then maybe that test should be removed.  Calling
>>>>> usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
>>>>> because there probably are not very many times when the controller gets
>>>>> resumed without the root hub also being resumed.
>>>>>
>>>>
>>>> The check was added to fix system suspend issue on a runtime suspended host:
>>>>
>>>> commit d6236f6d1d885aa19d1cd7317346fe795227a3cc
>>>>
>>>>        xhci: Fix runtime suspended xhci from blocking system suspend.
>>>>        
>>>>        The system suspend flow as following:
>>>>        1, Freeze all user processes and kenrel threads.
>>>>        
>>>>        2, Try to suspend all devices.
>>>>        
>>>>        2.1, If pci device is in RPM suspended state, then pci driver will try
>>>>        to resume it to RPM active state in the prepare stage.
>>>>        
>>>>        2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
>>>>        workqueue items to resume usb2&usb3 roothub devices.
>>>>        
>>>>        2.3, Call suspend callbacks of devices.
>>>>        
>>>>        2.3.1, All suspend callbacks of all hcd's children, including
>>>>        roothub devices are called.
>>>>        
>>>>        2.3.2, Finally, hcd_pci_suspend callback is called.
>>>>        
>>>>        Due to workqueue threads were already frozen in step 1, the workqueue
>>>>        items can't be scheduled, and the roothub devices can't be resumed in
>>>>        this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
>>>>        usb_hcd_resume_root_hub won't be cleared. Finally,
>>>>        hcd_pci_suspend will return -EBUSY, and system suspend fails.
>>>
>>> Hmmm.  I don't recall seeing this problem occur with ehci-hcd.  But
>>> then, I haven't tested it very much recently.
>>>
>>> We could change to a different work queue, one that doesn't get
>>> frozen.  But there's no guarantee that the work items would run before
>>> your step 2.3.2.
>>>
>>> Maybe we can avoid step 2.1.  I think there have been some recent
>>> changes to the PM code in this area.  There may be a flag you can set
>>> that will prevent the PCI core from resuming the host controller.
>>>
>>> Or maybe we can change step 2.3.1, so that the root hub's suspend
>>> callback will first do a resume if the WAKEUP_PENDING flag is set.
>>> That might be the most reliable approach.
>>>
>>
>> I'm not sure I understand the last suggestion, could you open up how it
>> would work?
> 
> Here's what I had in mind.  See if you think this would work.
> 
> Consider choose_wakeup() in core/driver.c.  That subroutine gets called
> by usb_suspend() when step 2.3.1 wants to suspend a USB device.  We
> could patch it as follows:

Thanks, now I see.

I was able to reproduce system suspend failure of a runtime
suspended host by removing the event check in xhci_ring, and making sure
pm_runtime_resume(&udev->dev) wasn't called in choose_wakeup().

> 
> --- usb-4.x.orig/drivers/usb/core/driver.c
> +++ usb-4.x/drivers/usb/core/driver.c
> @@ -1449,11 +1449,21 @@ static void choose_wakeup(struct usb_dev
>   	 */
>   	w = device_may_wakeup(&udev->dev);
>   
> -	/* If the device is autosuspended with the wrong wakeup setting,
> +	/*
> +	 * If the device is autosuspended with the wrong wakeup setting,
>   	 * autoresume now so the setting can be changed.
> +	 *
> +	 * Likewise, if the device is an autosuspended root hub and the
> +	 * hcd needs to wake it up before the controller can be suspended,
> +	 * resume it now to clear the WAKEUP_PENDING flag.
>   	 */
> -	if (udev->state == USB_STATE_SUSPENDED && w != udev->do_remote_wakeup)
> -		pm_runtime_resume(&udev->dev);
> +	if (udev->state == USB_STATE_SUSPENDED) {
> +		struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
> +
> +		if (w != udev->do_remote_wakeup ||
> +				(!udev->parent && HCD_WAKEUP_PENDING(hcd)))
> +			pm_runtime_resume(&udev->dev);
> +	}
>   	udev->do_remote_wakeup = w;
>   }
>   
> 

If I only add the:
if (!udev->parent && HCD_WAKEUP_PENDING(hcd)))
	pm_runtime_resume(&udev->dev);
to choose_wakeup() It still doesn't work.
The HCD_WAKEUP_PENDING(hcd) check is false.

Turns out that the xhci_resume() that ends up setting HCD_FLAG_WAKEUP_PENDING is called
after choose_wakeup() for the roothub.

There's something in the pm functions order that I don't follow here

-Mathias
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-05-02 17:52 Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2018-05-02 17:52 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: russianneuromancer, linux-usb

On Wed, 2 May 2018, Mathias Nyman wrote:

> On 24.04.2018 16:50, Alan Stern wrote:
> > On Tue, 24 Apr 2018, Mathias Nyman wrote:
> > 
> >>>>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
> >>>>> hcd-pci.c:suspend_common() should prevent the controller from going
> >>>>> back into D3.  The WAKEUP_PENDING bit gets set in
> >>>>> usb_hcd_resume_root_hub() and it doesn't get cleared until
> >>>>> hcd_bus_resume() runs.
> >>>>>
> >>>>
> >>>> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
> >>>> specific failing case
> >>>>
> >>>> xhci_resume() has a check:
> >>>> /* Resume root hubs only when have pending events. */
> >>>>      status = readl(&xhci->op_regs->status);
> >>>>        if (status & STS_EINT) {
> >>>>          usb_hcd_resume_root_hub(xhci->shared_hcd);
> >>>>          usb_hcd_resume_root_hub(hcd);
> >>>>        }
> >>>>
> >>>> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
> >>>> can suspend host controller again. when xhci driver finally gets to handle the interrupt
> >>>> the controller may be in D3 already
> >>>>
> >>>> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
> >>>> could be possible as xhci has interrupt moderation enabled.
> >>>
> >>> Then maybe that test should be removed.  Calling
> >>> usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
> >>> because there probably are not very many times when the controller gets
> >>> resumed without the root hub also being resumed.
> >>>
> >>
> >> The check was added to fix system suspend issue on a runtime suspended host:
> >>
> >> commit d6236f6d1d885aa19d1cd7317346fe795227a3cc
> >>
> >>       xhci: Fix runtime suspended xhci from blocking system suspend.
> >>       
> >>       The system suspend flow as following:
> >>       1, Freeze all user processes and kenrel threads.
> >>       
> >>       2, Try to suspend all devices.
> >>       
> >>       2.1, If pci device is in RPM suspended state, then pci driver will try
> >>       to resume it to RPM active state in the prepare stage.
> >>       
> >>       2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
> >>       workqueue items to resume usb2&usb3 roothub devices.
> >>       
> >>       2.3, Call suspend callbacks of devices.
> >>       
> >>       2.3.1, All suspend callbacks of all hcd's children, including
> >>       roothub devices are called.
> >>       
> >>       2.3.2, Finally, hcd_pci_suspend callback is called.
> >>       
> >>       Due to workqueue threads were already frozen in step 1, the workqueue
> >>       items can't be scheduled, and the roothub devices can't be resumed in
> >>       this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
> >>       usb_hcd_resume_root_hub won't be cleared. Finally,
> >>       hcd_pci_suspend will return -EBUSY, and system suspend fails.
> > 
> > Hmmm.  I don't recall seeing this problem occur with ehci-hcd.  But
> > then, I haven't tested it very much recently.
> > 
> > We could change to a different work queue, one that doesn't get
> > frozen.  But there's no guarantee that the work items would run before
> > your step 2.3.2.
> > 
> > Maybe we can avoid step 2.1.  I think there have been some recent
> > changes to the PM code in this area.  There may be a flag you can set
> > that will prevent the PCI core from resuming the host controller.
> > 
> > Or maybe we can change step 2.3.1, so that the root hub's suspend
> > callback will first do a resume if the WAKEUP_PENDING flag is set.
> > That might be the most reliable approach.
> > 
> 
> I'm not sure I understand the last suggestion, could you open up how it
> would work?

Here's what I had in mind.  See if you think this would work.

Consider choose_wakeup() in core/driver.c.  That subroutine gets called
by usb_suspend() when step 2.3.1 wants to suspend a USB device.  We
could patch it as follows:


> I started approaching this from another direction, mainly making sure we don't
> immediately runtime suspend the host controller after resume. Adding a next_statechange
> minimal time between resume and suspend, and checking for pending events in xhci_suspend().
> 
> I'll have some patches to test for russianneuromancer@ya.ru soon
> 
> These are generic checks that ehci_suspend() also does

To tell the truth, I'm not sure how necessary those next_statechange 
tests really are.

Alan Stern
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--- usb-4.x.orig/drivers/usb/core/driver.c
+++ usb-4.x/drivers/usb/core/driver.c
@@ -1449,11 +1449,21 @@ static void choose_wakeup(struct usb_dev
 	 */
 	w = device_may_wakeup(&udev->dev);
 
-	/* If the device is autosuspended with the wrong wakeup setting,
+	/*
+	 * If the device is autosuspended with the wrong wakeup setting,
 	 * autoresume now so the setting can be changed.
+	 *
+	 * Likewise, if the device is an autosuspended root hub and the
+	 * hcd needs to wake it up before the controller can be suspended,
+	 * resume it now to clear the WAKEUP_PENDING flag.
 	 */
-	if (udev->state == USB_STATE_SUSPENDED && w != udev->do_remote_wakeup)
-		pm_runtime_resume(&udev->dev);
+	if (udev->state == USB_STATE_SUSPENDED) {
+		struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
+
+		if (w != udev->do_remote_wakeup ||
+				(!udev->parent && HCD_WAKEUP_PENDING(hcd)))
+			pm_runtime_resume(&udev->dev);
+	}
 	udev->do_remote_wakeup = w;
 }
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-04-24 13:50 Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2018-04-24 13:50 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: russianneuromancer, linux-usb

On Tue, 24 Apr 2018, Mathias Nyman wrote:

> >>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
> >>> hcd-pci.c:suspend_common() should prevent the controller from going
> >>> back into D3.  The WAKEUP_PENDING bit gets set in
> >>> usb_hcd_resume_root_hub() and it doesn't get cleared until
> >>> hcd_bus_resume() runs.
> >>>
> >>
> >> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
> >> specific failing case
> >>
> >> xhci_resume() has a check:
> >> /* Resume root hubs only when have pending events. */
> >>     status = readl(&xhci->op_regs->status);
> >>       if (status & STS_EINT) {
> >>         usb_hcd_resume_root_hub(xhci->shared_hcd);
> >>         usb_hcd_resume_root_hub(hcd);
> >>       }
> >>
> >> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
> >> can suspend host controller again. when xhci driver finally gets to handle the interrupt
> >> the controller may be in D3 already
> >>
> >> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
> >> could be possible as xhci has interrupt moderation enabled.
> > 
> > Then maybe that test should be removed.  Calling
> > usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
> > because there probably are not very many times when the controller gets
> > resumed without the root hub also being resumed.
> > 
> 
> The check was added to fix system suspend issue on a runtime suspended host:
> 
> commit d6236f6d1d885aa19d1cd7317346fe795227a3cc
> 
>      xhci: Fix runtime suspended xhci from blocking system suspend.
>      
>      The system suspend flow as following:
>      1, Freeze all user processes and kenrel threads.
>      
>      2, Try to suspend all devices.
>      
>      2.1, If pci device is in RPM suspended state, then pci driver will try
>      to resume it to RPM active state in the prepare stage.
>      
>      2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
>      workqueue items to resume usb2&usb3 roothub devices.
>      
>      2.3, Call suspend callbacks of devices.
>      
>      2.3.1, All suspend callbacks of all hcd's children, including
>      roothub devices are called.
>      
>      2.3.2, Finally, hcd_pci_suspend callback is called.
>      
>      Due to workqueue threads were already frozen in step 1, the workqueue
>      items can't be scheduled, and the roothub devices can't be resumed in
>      this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
>      usb_hcd_resume_root_hub won't be cleared. Finally,
>      hcd_pci_suspend will return -EBUSY, and system suspend fails.

Hmmm.  I don't recall seeing this problem occur with ehci-hcd.  But 
then, I haven't tested it very much recently.

We could change to a different work queue, one that doesn't get 
frozen.  But there's no guarantee that the work items would run before 
your step 2.3.2.

Maybe we can avoid step 2.1.  I think there have been some recent
changes to the PM code in this area.  There may be a flag you can set
that will prevent the PCI core from resuming the host controller.

Or maybe we can change step 2.3.1, so that the root hub's suspend 
callback will first do a resume if the WAKEUP_PENDING flag is set.  
That might be the most reliable approach.

Alan Stern
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-04-24 13:32 Mathias Nyman
  0 siblings, 0 replies; 14+ messages in thread
From: Mathias Nyman @ 2018-04-24 13:32 UTC (permalink / raw)
  To: Alan Stern; +Cc: russianneuromancer, linux-usb

On 24.04.2018 16:24, Alan Stern wrote:
> On Tue, 24 Apr 2018, Mathias Nyman wrote:
> 
>> On 23.04.2018 18:11, Alan Stern wrote:
>>> On Mon, 23 Apr 2018, Mathias Nyman wrote:
>>>
>>>> On 22.04.2018 09:29, russianneuromancer@ya.ru wrote:
>>>>> Hello!
>>>>>
>>>>> So far I tested attached patch but didn't tried to revert commit yet,
>>>>> will do next week.
>>>>>
>>>>> Result of running patched kernel with recommended debug options:
>>>>> https://paste.fedoraproject.org/paste/UpezexD~tDmQthoxV2BFbg
>>>>>
>>>>
>>>> Logs show there is a race, controller is suspended, then resumed,
>>>> but no interrupt is pending in xhci_resume so roothubs are not resumed,
>>>> and host starts to suspend again.
>>>>
>>>> We get the interrupt only after we already started suspending xhci
>>>> controller again.
>>>>
>>>> My guess is that when we handle the interrupt we queue work to resume the roothub,
>>>> but controller is probably put to D3 suspended state by then,
>>>> returning 0xffffffff from some register reads, which driver understands as a dead host.
>>>>
>>>> I need to look into this a bit more.
>>>
>>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
>>> hcd-pci.c:suspend_common() should prevent the controller from going
>>> back into D3.  The WAKEUP_PENDING bit gets set in
>>> usb_hcd_resume_root_hub() and it doesn't get cleared until
>>> hcd_bus_resume() runs.
>>>
>>
>> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
>> specific failing case
>>
>> xhci_resume() has a check:
>> /* Resume root hubs only when have pending events. */
>>     status = readl(&xhci->op_regs->status);
>>       if (status & STS_EINT) {
>>         usb_hcd_resume_root_hub(xhci->shared_hcd);
>>         usb_hcd_resume_root_hub(hcd);
>>       }
>>
>> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
>> can suspend host controller again. when xhci driver finally gets to handle the interrupt
>> the controller may be in D3 already
>>
>> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
>> could be possible as xhci has interrupt moderation enabled.
> 
> Then maybe that test should be removed.  Calling
> usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
> because there probably are not very many times when the controller gets
> resumed without the root hub also being resumed.
> 

The check was added to fix system suspend issue on a runtime suspended host:

commit d6236f6d1d885aa19d1cd7317346fe795227a3cc

     xhci: Fix runtime suspended xhci from blocking system suspend.
     
     The system suspend flow as following:
     1, Freeze all user processes and kenrel threads.
     
     2, Try to suspend all devices.
     
     2.1, If pci device is in RPM suspended state, then pci driver will try
     to resume it to RPM active state in the prepare stage.
     
     2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
     workqueue items to resume usb2&usb3 roothub devices.
     
     2.3, Call suspend callbacks of devices.
     
     2.3.1, All suspend callbacks of all hcd's children, including
     roothub devices are called.
     
     2.3.2, Finally, hcd_pci_suspend callback is called.
     
     Due to workqueue threads were already frozen in step 1, the workqueue
     items can't be scheduled, and the roothub devices can't be resumed in
     this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
     usb_hcd_resume_root_hub won't be cleared. Finally,
     hcd_pci_suspend will return -EBUSY, and system suspend fails.

-Mathias
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-04-24 13:24 Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2018-04-24 13:24 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: russianneuromancer, linux-usb

On Tue, 24 Apr 2018, Mathias Nyman wrote:

> On 23.04.2018 18:11, Alan Stern wrote:
> > On Mon, 23 Apr 2018, Mathias Nyman wrote:
> > 
> >> On 22.04.2018 09:29, russianneuromancer@ya.ru wrote:
> >>> Hello!
> >>>
> >>> So far I tested attached patch but didn't tried to revert commit yet,
> >>> will do next week.
> >>>
> >>> Result of running patched kernel with recommended debug options:
> >>> https://paste.fedoraproject.org/paste/UpezexD~tDmQthoxV2BFbg
> >>>
> >>
> >> Logs show there is a race, controller is suspended, then resumed,
> >> but no interrupt is pending in xhci_resume so roothubs are not resumed,
> >> and host starts to suspend again.
> >>
> >> We get the interrupt only after we already started suspending xhci
> >> controller again.
> >>
> >> My guess is that when we handle the interrupt we queue work to resume the roothub,
> >> but controller is probably put to D3 suspended state by then,
> >> returning 0xffffffff from some register reads, which driver understands as a dead host.
> >>
> >> I need to look into this a bit more.
> > 
> > In this situation, the HCD_WAKEUP_PENDING(hcd) test in
> > hcd-pci.c:suspend_common() should prevent the controller from going
> > back into D3.  The WAKEUP_PENDING bit gets set in
> > usb_hcd_resume_root_hub() and it doesn't get cleared until
> > hcd_bus_resume() runs.
> > 
> 
> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
> specific failing case
> 
> xhci_resume() has a check:
> /* Resume root hubs only when have pending events. */
>    status = readl(&xhci->op_regs->status);
>      if (status & STS_EINT) {
>        usb_hcd_resume_root_hub(xhci->shared_hcd);
>        usb_hcd_resume_root_hub(hcd);
>      }
> 
> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
> can suspend host controller again. when xhci driver finally gets to handle the interrupt
> the controller may be in D3 already
> 
> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
> could be possible as xhci has interrupt moderation enabled.

Then maybe that test should be removed.  Calling 
usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad, 
because there probably are not very many times when the controller gets 
resumed without the root hub also being resumed.

Alan Stern
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-04-24 13:15 Mathias Nyman
  0 siblings, 0 replies; 14+ messages in thread
From: Mathias Nyman @ 2018-04-24 13:15 UTC (permalink / raw)
  To: Alan Stern; +Cc: russianneuromancer, linux-usb

On 23.04.2018 18:11, Alan Stern wrote:
> On Mon, 23 Apr 2018, Mathias Nyman wrote:
> 
>> On 22.04.2018 09:29, russianneuromancer@ya.ru wrote:
>>> Hello!
>>>
>>> So far I tested attached patch but didn't tried to revert commit yet,
>>> will do next week.
>>>
>>> Result of running patched kernel with recommended debug options:
>>> https://paste.fedoraproject.org/paste/UpezexD~tDmQthoxV2BFbg
>>>
>>
>> Logs show there is a race, controller is suspended, then resumed,
>> but no interrupt is pending in xhci_resume so roothubs are not resumed,
>> and host starts to suspend again.
>>
>> We get the interrupt only after we already started suspending xhci
>> controller again.
>>
>> My guess is that when we handle the interrupt we queue work to resume the roothub,
>> but controller is probably put to D3 suspended state by then,
>> returning 0xffffffff from some register reads, which driver understands as a dead host.
>>
>> I need to look into this a bit more.
> 
> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
> hcd-pci.c:suspend_common() should prevent the controller from going
> back into D3.  The WAKEUP_PENDING bit gets set in
> usb_hcd_resume_root_hub() and it doesn't get cleared until
> hcd_bus_resume() runs.
> 

I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
specific failing case

xhci_resume() has a check:
/* Resume root hubs only when have pending events. */
   status = readl(&xhci->op_regs->status);
     if (status & STS_EINT) {
       usb_hcd_resume_root_hub(xhci->shared_hcd);
       usb_hcd_resume_root_hub(hcd);
     }

If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
can suspend host controller again. when xhci driver finally gets to handle the interrupt
the controller may be in D3 already

This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
could be possible as xhci has interrupt moderation enabled.

-Mathias
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-04-23 15:11 Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2018-04-23 15:11 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: russianneuromancer, linux-usb

On Mon, 23 Apr 2018, Mathias Nyman wrote:

> On 22.04.2018 09:29, russianneuromancer@ya.ru wrote:
> > Hello!
> > 
> > So far I tested attached patch but didn't tried to revert commit yet,
> > will do next week.
> > 
> > Result of running patched kernel with recommended debug options:
> > https://paste.fedoraproject.org/paste/UpezexD~tDmQthoxV2BFbg
> > 
> 
> Logs show there is a race, controller is suspended, then resumed,
> but no interrupt is pending in xhci_resume so roothubs are not resumed,
> and host starts to suspend again.
> 
> We get the interrupt only after we already started suspending xhci
> controller again.
> 
> My guess is that when we handle the interrupt we queue work to resume the roothub,
> but controller is probably put to D3 suspended state by then,
> returning 0xffffffff from some register reads, which driver understands as a dead host.
> 
> I need to look into this a bit more.

In this situation, the HCD_WAKEUP_PENDING(hcd) test in
hcd-pci.c:suspend_common() should prevent the controller from going
back into D3.  The WAKEUP_PENDING bit gets set in
usb_hcd_resume_root_hub() and it doesn't get cleared until
hcd_bus_resume() runs.

Alan Stern
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-04-23 14:52 Mathias Nyman
  0 siblings, 0 replies; 14+ messages in thread
From: Mathias Nyman @ 2018-04-23 14:52 UTC (permalink / raw)
  To: russianneuromancer, linux-usb

On 22.04.2018 09:29, russianneuromancer@ya.ru wrote:
> Hello!
> 
> So far I tested attached patch but didn't tried to revert commit yet,
> will do next week.
> 
> Result of running patched kernel with recommended debug options:
> https://paste.fedoraproject.org/paste/UpezexD~tDmQthoxV2BFbg
> 

Logs show there is a race, controller is suspended, then resumed,
but no interrupt is pending in xhci_resume so roothubs are not resumed,
and host starts to suspend again.

We get the interrupt only after we already started suspending xhci
controller again.

My guess is that when we handle the interrupt we queue work to resume the roothub,
but controller is probably put to D3 suspended state by then,
returning 0xffffffff from some register reads, which driver understands as a dead host.

I need to look into this a bit more.

[  268.144527] xhci_hcd 0000:00:14.0: xhci_suspend: stopping port polling.
[  268.144543] xhci_hcd 0000:00:14.0: // Setting command ring address to 0x349bd001
[  268.520802] xhci_hcd 0000:00:14.0: // Setting command ring address to 0x349bd001
[  268.520969] xhci_hcd 0000:00:14.0: xhci_resume: starting port polling.
[  268.520985] xhci_hcd 0000:00:14.0: xhci_hub_status_data: stopping port polling.
[  268.521030] xhci_hcd 0000:00:14.0: xhci_suspend: stopping port polling.
[  268.521040] xhci_hcd 0000:00:14.0: // Setting command ring address to 0x349bd001
[  268.521139] xhci_hcd 0000:00:14.0: Port Status Change Event for port 3
[  268.521149] xhci_hcd 0000:00:14.0: resume root hub
[  268.521163] xhci_hcd 0000:00:14.0: port resume event for port 3
[  268.521168] xhci_hcd 0000:00:14.0: xHC is not running.
[  268.521174] xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[  268.596322] xhci_hcd 0000:00:14.0: xhci_hc_died: xHCI host controller not responding, assume dead
[  268.596340] xhci_hcd 0000:00:14.0: Killing URBs for slot ID 1, ep index 0

-Mathias

> 16/04/2018 14:55 +0300, Mathias Nyman:
>> On 10.04.2018 12:15, russianneuromancer@ya.ru wrote:
>>> Hello!
>>>
>>> On Dell Venue 8 Pro 5855 tablet installing tlp or running "powertop
>>> --
>>> auto-tune" cause "xHCI host controller not responding, assume dead"
>>> error, when error happen two integrated USB devices (Bluetooth
>>> adapter
>>> and LTE modem) disappear until reboot. First time this issue was
>>> observer in Linux 4.13 and still present in Linux 4.16.
>>> Blacklisting
>>> both "Linux Foundation 3.0 root hub" from autosuspend in tlp
>>> configuration is workaround for this issue, however on other
>>> devices
>>> tlp works fine without blacklisting usb hub autosuspend, and on
>>> this
>>> tablet there was no such issue before (at least in Linux ~4.8-4.12
>>> range) so I assume there is regression somewhere.
>>>
>>> Is there any related commits between 4.12 and 4.13 that I could try
>>> to revert?
>>>
>>
>> In 4.12 there was a added sensitivity to react to hotplug removed
>> xhc controllers, i.e. if we read 0xffffffff from a xhci register
>> we assume host is removed and start cleaning up.
>>
>> commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
>>       xhci: Rework how we handle unresponsive or hoptlug removed hosts
>>
>> You can try to revert that, but as a final solution we should
>> find the real rootcause
>>
>>> How issue looks like in logs:
>>>
>>> [  227.258385] xhci_hcd 0000:00:14.0: xHC is not running.
>>> [  329.671544] xhci_hcd 0000:00:14.0: xHC is not running.
>>> [  416.695796] xhci_hcd 0000:00:14.0: xHC is not running.
>>
>> The "xHC is not running" is the xhci driver handing a port event
>> interrupt for a resuming port, but whole host controller is not
>> running.
>> We stop the host controller in xhci_suspend(), and start it in
>> xhci_resume()
>>
>> Attaching a patch that improves preventing xhci host suspend during
>> USB2 resume signaling.
>> Could help, worth a shot.
>>
>>> [  416.695862] xhci_hcd 0000:00:14.0: xHCI host controller not
>>> responding, assume dead
>>
>> This means xhci_hc_died() was called, many possible places.
>> Adding the code below could give a hint:
>>
>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-
>> ring.c
>> index daa94c3..51fb3d0 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -900,7 +900,8 @@ void xhci_hc_died(struct xhci_hcd *xhci)
>>           if (xhci->xhc_state & XHCI_STATE_DYING)
>>                   return;
>>    
>> -       xhci_err(xhci, "xHCI host controller not responding, assume
>> dead\n");
>> +       xhci_err(xhci, "%ps: xHCI host controller not responding,
>> assume dead\n",
>> +                __builtin_return_address(0));
>>           xhci->xhc_state |= XHCI_STATE_DYING;
>>    
>>           xhci_cleanup_command_queue(xhci);
>>
>>> [  416.695900] xhci_hcd 0000:00:14.0: HC died; cleaning up
>>> [  416.696052] usb 1-3: USB disconnect, device number 2
>>> [  416.815610] cdc_mbim 1-3:1.12 wwp0s20u3i12: unregister
>>> 'cdc_mbim'
>>> usb-0000:00:14.0-3, CDC MBIM
>>> [  416.847934] usb 1-4: USB disconnect, device number 3
>>>
>>> After that Bluetooth adapter and LTE modem disappear from lsusb
>>> output,
>>> while xHCI controller itself remain visible.
>>
>> we stop the host activity in xhci_hc_died(), no usb devices under
>> this host will work.
>>
>>> Complete dmesg: https://paste.fedoraproject.org/paste/7aMpVGLfZ82zp
>>> pdGs
>>> 56Oqg
>>> lsusb -v: https://paste.fedoraproject.org/paste/c7y8GisC13YdzcYE9B-
>>> JIw
>>> dsdt.dsl: https://paste.fedoraproject.org/paste/8g6mp2dafypUkFT4sa4
>>> 3iA
>>
>> xhci traces and dynamic debug could help:
>>
>> mount -t debugfs none /sys/kernel/debug
>> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
>> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
>>
>> echo -n 'module xhci_hcd =p' >
>> /sys/kernel/debug/dynamic_debug/control
>>
>> -Mathias
> --
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-04-22  6:29 russianneuromancer
  0 siblings, 0 replies; 14+ messages in thread
From: russianneuromancer @ 2018-04-22  6:29 UTC (permalink / raw)
  To: Mathias Nyman, linux-usb

Hello!

So far I tested attached patch but didn't tried to revert commit yet,
will do next week.

Result of running patched kernel with recommended debug options: 
https://paste.fedoraproject.org/paste/UpezexD~tDmQthoxV2BFbg

16/04/2018 14:55 +0300, Mathias Nyman:
> On 10.04.2018 12:15, russianneuromancer@ya.ru wrote:
> > Hello!
> > 
> > On Dell Venue 8 Pro 5855 tablet installing tlp or running "powertop
> > --
> > auto-tune" cause "xHCI host controller not responding, assume dead"
> > error, when error happen two integrated USB devices (Bluetooth
> > adapter
> > and LTE modem) disappear until reboot. First time this issue was
> > observer in Linux 4.13 and still present in Linux 4.16.
> > Blacklisting
> > both "Linux Foundation 3.0 root hub" from autosuspend in tlp
> > configuration is workaround for this issue, however on other
> > devices
> > tlp works fine without blacklisting usb hub autosuspend, and on
> > this
> > tablet there was no such issue before (at least in Linux ~4.8-4.12
> > range) so I assume there is regression somewhere.
> > 
> > Is there any related commits between 4.12 and 4.13 that I could try
> > to revert?
> > 
> 
> In 4.12 there was a added sensitivity to react to hotplug removed
> xhc controllers, i.e. if we read 0xffffffff from a xhci register
> we assume host is removed and start cleaning up.
> 
> commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
>      xhci: Rework how we handle unresponsive or hoptlug removed hosts
> 
> You can try to revert that, but as a final solution we should
> find the real rootcause
> 
> > How issue looks like in logs:
> > 
> > [  227.258385] xhci_hcd 0000:00:14.0: xHC is not running.
> > [  329.671544] xhci_hcd 0000:00:14.0: xHC is not running.
> > [  416.695796] xhci_hcd 0000:00:14.0: xHC is not running.
> 
> The "xHC is not running" is the xhci driver handing a port event
> interrupt for a resuming port, but whole host controller is not
> running.
> We stop the host controller in xhci_suspend(), and start it in
> xhci_resume()
> 
> Attaching a patch that improves preventing xhci host suspend during
> USB2 resume signaling.
> Could help, worth a shot.
> 
> > [  416.695862] xhci_hcd 0000:00:14.0: xHCI host controller not
> > responding, assume dead
> 
> This means xhci_hc_died() was called, many possible places.
> Adding the code below could give a hint:
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-
> ring.c
> index daa94c3..51fb3d0 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -900,7 +900,8 @@ void xhci_hc_died(struct xhci_hcd *xhci)
>          if (xhci->xhc_state & XHCI_STATE_DYING)
>                  return;
>   
> -       xhci_err(xhci, "xHCI host controller not responding, assume
> dead\n");
> +       xhci_err(xhci, "%ps: xHCI host controller not responding,
> assume dead\n",
> +                __builtin_return_address(0));
>          xhci->xhc_state |= XHCI_STATE_DYING;
>   
>          xhci_cleanup_command_queue(xhci);
> 
> > [  416.695900] xhci_hcd 0000:00:14.0: HC died; cleaning up
> > [  416.696052] usb 1-3: USB disconnect, device number 2
> > [  416.815610] cdc_mbim 1-3:1.12 wwp0s20u3i12: unregister
> > 'cdc_mbim'
> > usb-0000:00:14.0-3, CDC MBIM
> > [  416.847934] usb 1-4: USB disconnect, device number 3
> > 
> > After that Bluetooth adapter and LTE modem disappear from lsusb
> > output,
> > while xHCI controller itself remain visible.
> 
> we stop the host activity in xhci_hc_died(), no usb devices under
> this host will work.
> 
> > Complete dmesg: https://paste.fedoraproject.org/paste/7aMpVGLfZ82zp
> > pdGs
> > 56Oqg
> > lsusb -v: https://paste.fedoraproject.org/paste/c7y8GisC13YdzcYE9B-
> > JIw
> > dsdt.dsl: https://paste.fedoraproject.org/paste/8g6mp2dafypUkFT4sa4
> > 3iA
> 
> xhci traces and dynamic debug could help:
> 
> mount -t debugfs none /sys/kernel/debug
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> 
> echo -n 'module xhci_hcd =p' >
> /sys/kernel/debug/dynamic_debug/control
> 
> -Mathias
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855
@ 2018-04-16 11:55 Mathias Nyman
  0 siblings, 0 replies; 14+ messages in thread
From: Mathias Nyman @ 2018-04-16 11:55 UTC (permalink / raw)
  To: russianneuromancer, linux-usb

On 10.04.2018 12:15, russianneuromancer@ya.ru wrote:
> Hello!
> 
> On Dell Venue 8 Pro 5855 tablet installing tlp or running "powertop --
> auto-tune" cause "xHCI host controller not responding, assume dead"
> error, when error happen two integrated USB devices (Bluetooth adapter
> and LTE modem) disappear until reboot. First time this issue was
> observer in Linux 4.13 and still present in Linux 4.16. Blacklisting
> both "Linux Foundation 3.0 root hub" from autosuspend in tlp
> configuration is workaround for this issue, however on other devices
> tlp works fine without blacklisting usb hub autosuspend, and on this
> tablet there was no such issue before (at least in Linux ~4.8-4.12
> range) so I assume there is regression somewhere.
> 
> Is there any related commits between 4.12 and 4.13 that I could try to
> revert?
> 

In 4.12 there was a added sensitivity to react to hotplug removed
xhc controllers, i.e. if we read 0xffffffff from a xhci register
we assume host is removed and start cleaning up.

commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
     xhci: Rework how we handle unresponsive or hoptlug removed hosts

You can try to revert that, but as a final solution we should
find the real rootcause

> How issue looks like in logs:
> 
> [  227.258385] xhci_hcd 0000:00:14.0: xHC is not running.
> [  329.671544] xhci_hcd 0000:00:14.0: xHC is not running.
> [  416.695796] xhci_hcd 0000:00:14.0: xHC is not running.

The "xHC is not running" is the xhci driver handing a port event
interrupt for a resuming port, but whole host controller is not running.
We stop the host controller in xhci_suspend(), and start it in xhci_resume()

Attaching a patch that improves preventing xhci host suspend during
USB2 resume signaling.
Could help, worth a shot.

> [  416.695862] xhci_hcd 0000:00:14.0: xHCI host controller not
> responding, assume dead

This means xhci_hc_died() was called, many possible places.
Adding the code below could give a hint:


> [  416.695900] xhci_hcd 0000:00:14.0: HC died; cleaning up
> [  416.696052] usb 1-3: USB disconnect, device number 2
> [  416.815610] cdc_mbim 1-3:1.12 wwp0s20u3i12: unregister 'cdc_mbim'
> usb-0000:00:14.0-3, CDC MBIM
> [  416.847934] usb 1-4: USB disconnect, device number 3
> 
> After that Bluetooth adapter and LTE modem disappear from lsusb output,
> while xHCI controller itself remain visible.

we stop the host activity in xhci_hc_died(), no usb devices under this host will work.

> Complete dmesg: https://paste.fedoraproject.org/paste/7aMpVGLfZ82zppdGs
> 56Oqg
> lsusb -v: https://paste.fedoraproject.org/paste/c7y8GisC13YdzcYE9B-JIw
> dsdt.dsl: https://paste.fedoraproject.org/paste/8g6mp2dafypUkFT4sa43iA

xhci traces and dynamic debug could help:

mount -t debugfs none /sys/kernel/debug
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable

echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control

-Mathias

From 090b13a6df3f489a9781223dd959e03c2f81347b Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman@linux.intel.com>
Date: Thu, 1 Mar 2018 18:48:32 +0200
Subject: [PATCH] xhci: prevent USB 2 roothub autosuspend during port resume
 signaling

xhci USB 2 roothub tries to autosuspended itself again immediately after
being resumed by a remote wake. This can be avoided by calling the
usb_hcd_start_port_resume() and usb_hcd_end_port_resume() implemented
especially for this purpose.

Use them, and prevent roothub autosuspend during resume signaling.

Suggested-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-hub.c  | 3 +++
 drivers/usb/host/xhci-ring.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 72ebbc9..671a336 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -905,6 +905,7 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
 
 				set_bit(wIndex, &bus_state->resuming_ports);
 				bus_state->resume_done[wIndex] = timeout;
+				usb_hcd_start_port_resume(&hcd->self, wIndex);
 				mod_timer(&hcd->rh_timer, timeout);
 			}
 		/* Has resume been signalled for USB_RESUME_TIME yet? */
@@ -930,6 +931,7 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
 					msecs_to_jiffies(
 						XHCI_MAX_REXIT_TIMEOUT));
 			spin_lock_irqsave(&xhci->lock, flags);
+			usb_hcd_end_port_resume(&hcd->self, wIndex);
 
 			if (time_left) {
 				slot_id = xhci_find_slot_id_by_port(hcd,
@@ -970,6 +972,7 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
 	    (raw_port_status & PORT_PLS_MASK) != XDEV_RESUME) {
 		bus_state->resume_done[wIndex] = 0;
 		clear_bit(wIndex, &bus_state->resuming_ports);
+		usb_hcd_end_port_resume(&hcd->self, wIndex);
 	}
 
 
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index daa94c3..a1cffe9 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1666,6 +1666,8 @@ static void handle_port_status(struct xhci_hcd *xhci,
 			bus_state->resume_done[faked_port_index] = jiffies +
 				msecs_to_jiffies(USB_RESUME_TIMEOUT);
 			set_bit(faked_port_index, &bus_state->resuming_ports);
+			usb_hcd_start_port_resume(&hcd->self, faked_port_index);
+
 			/* Do the rest in GetPortStatus after resume time delay.
 			 * Avoid polling roothub status before that so that a
 			 * usb device auto-resume latency around ~40ms.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-05-04 11:53 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-02 14:47 Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855 Mathias Nyman
  -- strict thread matches above, loose matches on Subject: below --
2018-05-04 11:53 Mathias Nyman
2018-05-03 18:56 Alan Stern
2018-05-03 11:53 Mathias Nyman
2018-05-03 11:37 Mathias Nyman
2018-05-02 17:52 Alan Stern
2018-04-24 13:50 Alan Stern
2018-04-24 13:32 Mathias Nyman
2018-04-24 13:24 Alan Stern
2018-04-24 13:15 Mathias Nyman
2018-04-23 15:11 Alan Stern
2018-04-23 14:52 Mathias Nyman
2018-04-22  6:29 russianneuromancer
2018-04-16 11:55 Mathias Nyman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.