linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xhci: clear port_remote_wakeup after resume failure
@ 2019-05-24 14:52 Nicolas Saenz Julienne
  2019-05-27 11:16 ` Mathias Nyman
  0 siblings, 1 reply; 5+ messages in thread
From: Nicolas Saenz Julienne @ 2019-05-24 14:52 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: oneukum, Nicolas Saenz Julienne, Greg Kroah-Hartman, linux-usb,
	linux-kernel

This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
Ethernet device interfaces with the laptop through one of it's USB3
ports. While idle, the Ethernet device and HCD are suspended by runtime
PM, being the only device connected on the bus. Then, both are resumed on
behalf of the Ethernet device, which has remote wake-up capabilities.

The Ethernet device was observed to randomly disconnect from the USB
port shortly after submitting it's remote wake-up request. Probably a
weird timing issue yet to be investigated. This causes runtime PM to
busyloop causing some tangible CPU load. The reason is the port gets
stuck in the middle of a remote wake-up operation, waiting for the
device to switch to U0. This never happens, leaving "port_remote_wakeup"
enabled, and automatically triggering a failure on any further suspend
operation.

This patch clears "port_remote_wakeup" upon detecting a device with a
wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
above mentioned situation doesn't trigger a PM busyloop.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
---
 drivers/usb/host/xhci-hub.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 3abe70ff1b1e..53f5ee50ef8c 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1047,8 +1047,8 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
 		xhci_get_usb2_port_status(port, &status, raw_port_status,
 					  flags);
 	/*
-	 * Clear stale usb2 resume signalling variables in case port changed
-	 * state during resume signalling. For example on error
+	 * Clear stale resume signalling variables in case port changed
+	 * state during resume signalling. For example on error.
 	 */
 	if ((bus_state->resume_done[wIndex] ||
 	     test_bit(wIndex, &bus_state->resuming_ports)) &&
@@ -1057,6 +1057,12 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
 		bus_state->resume_done[wIndex] = 0;
 		clear_bit(wIndex, &bus_state->resuming_ports);
 		usb_hcd_end_port_resume(&hcd->self, wIndex);
+	} else if (bus_state->port_remote_wakeup & (1 << port->hcd_portnum) &&
+		   ((raw_port_status & PORT_PLS_MASK) != XDEV_RESUME ||
+		   !(raw_port_status & PORT_CONNECT) ||
+		   !(raw_port_status & PORT_PE) ||
+		   raw_port_status & PORT_OC)) {
+		bus_state->port_remote_wakeup &= ~(1 << port->hcd_portnum);
 	}
 
 	if (bus_state->port_c_suspend & (1 << wIndex))
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] xhci: clear port_remote_wakeup after resume failure
  2019-05-24 14:52 [PATCH] xhci: clear port_remote_wakeup after resume failure Nicolas Saenz Julienne
@ 2019-05-27 11:16 ` Mathias Nyman
  2019-05-27 11:28   ` Nicolas Saenz Julienne
  0 siblings, 1 reply; 5+ messages in thread
From: Mathias Nyman @ 2019-05-27 11:16 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, Mathias Nyman
  Cc: oneukum, Greg Kroah-Hartman, linux-usb, linux-kernel

On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
> This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
> Ethernet device interfaces with the laptop through one of it's USB3
> ports. While idle, the Ethernet device and HCD are suspended by runtime
> PM, being the only device connected on the bus. Then, both are resumed on
> behalf of the Ethernet device, which has remote wake-up capabilities.
> 
> The Ethernet device was observed to randomly disconnect from the USB
> port shortly after submitting it's remote wake-up request. Probably a
> weird timing issue yet to be investigated. This causes runtime PM to
> busyloop causing some tangible CPU load. The reason is the port gets
> stuck in the middle of a remote wake-up operation, waiting for the
> device to switch to U0. This never happens, leaving "port_remote_wakeup"
> enabled, and automatically triggering a failure on any further suspend
> operation.
> 
> This patch clears "port_remote_wakeup" upon detecting a device with a
> wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
> above mentioned situation doesn't trigger a PM busyloop.
> 

There was a similar case where the USB3 link had transitioned to a
lower power U1 or U2 state after resume, before driver read the state,
leaving port_remote_wakeup flag uncleared.

This was fixed in 5.1 kernel by commit:

6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable

Can you check if you have it?
It should be in recent stable releases as well.

-Mathias

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xhci: clear port_remote_wakeup after resume failure
  2019-05-27 11:16 ` Mathias Nyman
@ 2019-05-27 11:28   ` Nicolas Saenz Julienne
  2019-06-04 13:53     ` Mathias Nyman
  0 siblings, 1 reply; 5+ messages in thread
From: Nicolas Saenz Julienne @ 2019-05-27 11:28 UTC (permalink / raw)
  To: Mathias Nyman, Mathias Nyman
  Cc: oneukum, Greg Kroah-Hartman, linux-usb, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2033 bytes --]

Hi Matthias,
thanks for the review.

On Mon, 2019-05-27 at 14:16 +0300, Mathias Nyman wrote:
> On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
> > This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
> > Ethernet device interfaces with the laptop through one of it's USB3
> > ports. While idle, the Ethernet device and HCD are suspended by runtime
> > PM, being the only device connected on the bus. Then, both are resumed on
> > behalf of the Ethernet device, which has remote wake-up capabilities.
> > 
> > The Ethernet device was observed to randomly disconnect from the USB
> > port shortly after submitting it's remote wake-up request. Probably a
> > weird timing issue yet to be investigated. This causes runtime PM to
> > busyloop causing some tangible CPU load. The reason is the port gets
> > stuck in the middle of a remote wake-up operation, waiting for the
> > device to switch to U0. This never happens, leaving "port_remote_wakeup"
> > enabled, and automatically triggering a failure on any further suspend
> > operation.
> > 
> > This patch clears "port_remote_wakeup" upon detecting a device with a
> > wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
> > above mentioned situation doesn't trigger a PM busyloop.
> > 
> 
> There was a similar case where the USB3 link had transitioned to a
> lower power U1 or U2 state after resume, before driver read the state,
> leaving port_remote_wakeup flag uncleared.
> 
> This was fixed in 5.1 kernel by commit:
> 
> 6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable
> 
> Can you check if you have it?
> It should be in recent stable releases as well.

I was aware of that patch, unfortunately it doesn't address the same issue. In
my case I never get a second port status event (so no PLC == 1 or any state
change seen in PLS). The device simply disconnects from the bus.

I did test both the issue and fix on top of last week's master branch.

Regards,
Nicolas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xhci: clear port_remote_wakeup after resume failure
  2019-05-27 11:28   ` Nicolas Saenz Julienne
@ 2019-06-04 13:53     ` Mathias Nyman
  2019-06-08 13:33       ` Nicolas Saenz Julienne
  0 siblings, 1 reply; 5+ messages in thread
From: Mathias Nyman @ 2019-06-04 13:53 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, Mathias Nyman
  Cc: oneukum, Greg Kroah-Hartman, linux-usb, linux-kernel

On 27.5.2019 14.28, Nicolas Saenz Julienne wrote:
> Hi Matthias,
> thanks for the review.
> 
> On Mon, 2019-05-27 at 14:16 +0300, Mathias Nyman wrote:
>> On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
>>> This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
>>> Ethernet device interfaces with the laptop through one of it's USB3
>>> ports. While idle, the Ethernet device and HCD are suspended by runtime
>>> PM, being the only device connected on the bus. Then, both are resumed on
>>> behalf of the Ethernet device, which has remote wake-up capabilities.
>>>
>>> The Ethernet device was observed to randomly disconnect from the USB
>>> port shortly after submitting it's remote wake-up request. Probably a
>>> weird timing issue yet to be investigated. This causes runtime PM to
>>> busyloop causing some tangible CPU load. The reason is the port gets
>>> stuck in the middle of a remote wake-up operation, waiting for the
>>> device to switch to U0. This never happens, leaving "port_remote_wakeup"
>>> enabled, and automatically triggering a failure on any further suspend
>>> operation.
>>>
>>> This patch clears "port_remote_wakeup" upon detecting a device with a
>>> wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
>>> above mentioned situation doesn't trigger a PM busyloop.
>>>
>>
>> There was a similar case where the USB3 link had transitioned to a
>> lower power U1 or U2 state after resume, before driver read the state,
>> leaving port_remote_wakeup flag uncleared.
>>
>> This was fixed in 5.1 kernel by commit:
>>
>> 6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable
>>
>> Can you check if you have it?
>> It should be in recent stable releases as well.
> 
> I was aware of that patch, unfortunately it doesn't address the same issue. In
> my case I never get a second port status event (so no PLC == 1 or any state
> change seen in PLS). The device simply disconnects from the bus.
> 

I see, ok, then we need to clear the flag in the hub thread.

But to me it looks like this patch could cause a small race risk in the successful
device initiated resume cases.

If the hub thread, i.e. the get_port_status() function, notices the U0 state before
the interrupt handler, i.e. handle_port_status() function, then port_remote_wakeup
flag is cleared in the hub thread and the wakeup notification is never called from
handle_port_status().

Would it be enough to just check for (port_remote_wakeup flag && !PORT_CONNECT) in the hub thread?
USB3 PORT_CONNECT bit is lost in most error cases.

-Mathias

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] xhci: clear port_remote_wakeup after resume failure
  2019-06-04 13:53     ` Mathias Nyman
@ 2019-06-08 13:33       ` Nicolas Saenz Julienne
  0 siblings, 0 replies; 5+ messages in thread
From: Nicolas Saenz Julienne @ 2019-06-08 13:33 UTC (permalink / raw)
  To: Mathias Nyman, Mathias Nyman
  Cc: oneukum, Greg Kroah-Hartman, linux-usb, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3795 bytes --]

On Tue, 2019-06-04 at 16:53 +0300, Mathias Nyman wrote:
> On 27.5.2019 14.28, Nicolas Saenz Julienne wrote:
> > Hi Matthias,
> > thanks for the review.
> > 
> > On Mon, 2019-05-27 at 14:16 +0300, Mathias Nyman wrote:
> > > On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
> > > > This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
> > > > Ethernet device interfaces with the laptop through one of it's USB3
> > > > ports. While idle, the Ethernet device and HCD are suspended by runtime
> > > > PM, being the only device connected on the bus. Then, both are resumed
> > > > on
> > > > behalf of the Ethernet device, which has remote wake-up capabilities.
> > > > 
> > > > The Ethernet device was observed to randomly disconnect from the USB
> > > > port shortly after submitting it's remote wake-up request. Probably a
> > > > weird timing issue yet to be investigated. This causes runtime PM to
> > > > busyloop causing some tangible CPU load. The reason is the port gets
> > > > stuck in the middle of a remote wake-up operation, waiting for the
> > > > device to switch to U0. This never happens, leaving "port_remote_wakeup"
> > > > enabled, and automatically triggering a failure on any further suspend
> > > > operation.
> > > > 
> > > > This patch clears "port_remote_wakeup" upon detecting a device with a
> > > > wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
> > > > above mentioned situation doesn't trigger a PM busyloop.
> > > > 
> > > 
> > > There was a similar case where the USB3 link had transitioned to a
> > > lower power U1 or U2 state after resume, before driver read the state,
> > > leaving port_remote_wakeup flag uncleared.
> > > 
> > > This was fixed in 5.1 kernel by commit:
> > > 
> > > 6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable
> > > 
> > > Can you check if you have it?
> > > It should be in recent stable releases as well.
> > 
> > I was aware of that patch, unfortunately it doesn't address the same issue.
> > In
> > my case I never get a second port status event (so no PLC == 1 or any state
> > change seen in PLS). The device simply disconnects from the bus.
> > 
> 
> I see, ok, then we need to clear the flag in the hub thread.
> 
> But to me it looks like this patch could cause a small race risk in the
> successful
> device initiated resume cases.
> 
> If the hub thread, i.e. the get_port_status() function, notices the U0 state
> before
> the interrupt handler, i.e. handle_port_status() function, then
> port_remote_wakeup
> flag is cleared in the hub thread and the wakeup notification is never called
> from
> handle_port_status().
> 
> Would it be enough to just check for (port_remote_wakeup flag &&
> !PORT_CONNECT) in the hub thread?
> USB3 PORT_CONNECT bit is lost in most error cases.

I get your concerns, there is a race indeed. On top of that, checking
PORT_CONNECT works fine for me.

So if I undertood your suggestion right, would this be fine? 

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 3abe70ff1b1e..253957dc62de 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1057,6 +1057,9 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
                bus_state->resume_done[wIndex] = 0;
                clear_bit(wIndex, &bus_state->resuming_ports);
                usb_hcd_end_port_resume(&hcd->self, wIndex);
+       } else if (bus_state->port_remote_wakeup & (1 << port->hcd_portnum) &&
+                  !(raw_port_status & PORT_CONNECT)) {
+               bus_state->port_remote_wakeup &= ~(1 << port->hcd_portnum);
        }
 
        if (bus_state->port_c_suspend & (1 << wIndex))

Regards,
Nicolas


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-06-08 13:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-24 14:52 [PATCH] xhci: clear port_remote_wakeup after resume failure Nicolas Saenz Julienne
2019-05-27 11:16 ` Mathias Nyman
2019-05-27 11:28   ` Nicolas Saenz Julienne
2019-06-04 13:53     ` Mathias Nyman
2019-06-08 13:33       ` Nicolas Saenz Julienne

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).