[PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI command timeout

* [PATCH] usb: xhci: Fix incomplete PM resume operation due to XHCI command timeout
@ 2016-03-18  7:01 Rajesh Bhagat
  2016-03-18 11:20 ` Mathias Nyman
  2016-03-18 14:21 ` Alan Stern
  0 siblings, 2 replies; 18+ messages in thread
From: Rajesh Bhagat @ 2016-03-18  7:01 UTC
  To: linux-usb, linux-kernel; +Cc: gregkh, mathias.nyman, sriram.dash, Rajesh Bhagat

We are seeing an issue when resuming the system from STR
(suspend-to-RAM): XHCI hangs indefinitely because wait_for_completion(),
called in xhci_alloc_dev() for the TRB_ENABLE_SLOT command, never
returns.

xhci_handle_command_timeout() is then called and prints the
"Command timeout" message, but it never calls complete() for the above
TRB_ENABLE_SLOT command, because xhci_abort_cmd_ring() succeeds and the
cleanup path is only taken when the abort fails.
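
The control flow described above can be illustrated with a small
user-space toy model. This is only a sketch, not kernel code: every
toy_* name is hypothetical, and a boolean "completed" flag stands in
for the completion that the waiter in xhci_alloc_dev() blocks on. It
contrasts cleaning up the queue only when the abort fails with cleaning
it up unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names are hypothetical, nothing here is xHCI API. */
enum toy_abort_result { TOY_ABORT_OK = 0, TOY_ABORT_HOST_DEAD = -1 };

struct toy_cmd {
	bool completed;	/* stands in for the completion being signalled */
	int status;	/* 0 = success, negative = aborted */
};

/* Complete every still-pending command with an error status. */
static void toy_cleanup_queue(struct toy_cmd *cmds, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!cmds[i].completed) {
			cmds[i].status = -1;
			cmds[i].completed = true;
		}
	}
}

/* Count commands whose waiter would still be blocked. */
static size_t toy_pending(const struct toy_cmd *cmds, size_t n)
{
	size_t pending = 0;

	for (size_t i = 0; i < n; i++)
		if (!cmds[i].completed)
			pending++;
	return pending;
}

/* Clean up only when the abort failed. Returns true if the host is dead. */
static bool toy_timeout_cleanup_on_failure(struct toy_cmd *cmds, size_t n,
					   enum toy_abort_result ret)
{
	if (ret == TOY_ABORT_HOST_DEAD) {
		toy_cleanup_queue(cmds, n);
		return true;
	}
	return false;
}

/* Clean up regardless of the abort result. Returns true if the host is dead. */
static bool toy_timeout_cleanup_always(struct toy_cmd *cmds, size_t n,
				       enum toy_abort_result ret)
{
	toy_cleanup_queue(cmds, n);
	return ret == TOY_ABORT_HOST_DEAD;
}

/* Usage: a successful abort leaves the waiter blocked only in the first variant. */
static void toy_demo(void)
{
	struct toy_cmd a[1] = { { false, 0 } };
	struct toy_cmd b[1] = { { false, 0 } };

	toy_timeout_cleanup_on_failure(a, 1, TOY_ABORT_OK);
	assert(toy_pending(a, 1) == 1);	/* waiter would hang */

	toy_timeout_cleanup_always(b, 1, TOY_ABORT_OK);
	assert(toy_pending(b, 1) == 0);	/* waiter is released with an error */
}
```

The model deliberately ignores locking and the real command ring; it
only shows why a successful abort can leave a waiter without a wake-up.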

The fix is to:
1. call xhci_cleanup_command_queue() whether or not
   xhci_abort_cmd_ring() succeeds.
2. check the status returned by reset_device in the USB core code.
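
The second point can be sketched in isolation as follows. This is a
user-space illustration under stated assumptions, not the hub.c code:
the toy_* types and functions are hypothetical stand-ins for the
optional reset_device callback and the device state. The device is
treated as usable only if the callback is absent or reports success.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: hypothetical names, not the USB core definitions. */
enum toy_state { TOY_STATE_NOTATTACHED, TOY_STATE_DEFAULT };

struct toy_dev {
	enum toy_state state;
};

/* Optional host-controller callback; may be NULL. */
typedef int (*toy_reset_fn)(struct toy_dev *dev);

/*
 * Pick the device state from the callback's return value instead of
 * ignoring it. Returns the callback status (0 if there is no callback).
 */
static int toy_finish_reset(toy_reset_fn reset, struct toy_dev *dev)
{
	int status = 0;

	if (reset)
		status = reset(dev);

	dev->state = (status == 0) ? TOY_STATE_DEFAULT
				   : TOY_STATE_NOTATTACHED;
	return status;
}

static int toy_reset_ok(struct toy_dev *dev)
{
	(void)dev;
	return 0;
}

static int toy_reset_fail(struct toy_dev *dev)
{
	(void)dev;
	return -1;
}

/* Usage: the three cases the status check distinguishes. */
static void toy_reset_demo(void)
{
	struct toy_dev dev = { TOY_STATE_NOTATTACHED };

	toy_finish_reset(NULL, &dev);		/* no callback */
	assert(dev.state == TOY_STATE_DEFAULT);

	toy_finish_reset(toy_reset_ok, &dev);	/* callback succeeds */
	assert(dev.state == TOY_STATE_DEFAULT);

	toy_finish_reset(toy_reset_fail, &dev);	/* callback fails */
	assert(dev.state == TOY_STATE_NOTATTACHED);
}
```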

Before Fix:
root@phoenix:~# echo mem > /sys/power/state
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.001 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
PM: suspend of devices complete after 103.144 msecs
PM: late suspend of devices complete after 1.503 msecs
PM: noirq suspend of devices complete after 1.220 msecs
Disabling non-boot CPUs ...
CPU1: shutdown
Retrying again to check for CPU kill
CPU1 killed.
Enabling non-boot CPUs ...
CPU1 is up
PM: noirq resume of devices complete after 1.996 msecs
PM: early resume of devices complete after 1.152 msecs
usb usb1: root hub lost power or was reset
usb usb2: root hub lost power or was reset
----- <<hangs indefinitely>> --------------

After Fix:
root@phoenix:~#  echo mem > /sys/power/state
PM: Syncing filesystems ... done.
Freezing user space processes ... (elapsed 0.001 seconds) done.
Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
PM: suspend of devices complete after 103.086 msecs
PM: late suspend of devices complete after 1.517 msecs
PM: noirq suspend of devices complete after 1.217 msecs
Disabling non-boot CPUs ...
CPU1: shutdown
Retrying again to check for CPU kill
CPU1 killed.
Enabling non-boot CPUs ...
CPU1 is up
PM: noirq resume of devices complete after 1.991 msecs
PM: early resume of devices complete after 1.239 msecs
usb usb1: root hub lost power or was reset
usb usb2: root hub lost power or was reset
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
xhci-hcd xhci-hcd.0.auto: Error while assigning device slot ID
xhci-hcd xhci-hcd.0.auto: Max number of devices this xHCI host supports is 127.
PM: resume of devices complete after 75567.769 msecs
Restarting tasks ...
usb 1-1: USB disconnect, device number 2
usb 2-1: USB disconnect, device number 2
usb 2-1.1: USB disconnect, device number 3
done.
root@phoenix:~#

Signed-off-by: Sriram Dash <sriram.dash@nxp.com>
Signed-off-by: Rajesh Bhagat <rajesh.bhagat@nxp.com>
---
 drivers/usb/core/hub.c       |   12 ++++++++----
 drivers/usb/host/xhci-ring.c |    2 +-
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 38cc4ba..c906018 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2897,10 +2897,14 @@ done:
 			/* The xHC may think the device is already reset,
 			 * so ignore the status.
 			 */
-			if (hcd->driver->reset_device)
-				hcd->driver->reset_device(hcd, udev);
-
-			usb_set_device_state(udev, USB_STATE_DEFAULT);
+			if (hcd->driver->reset_device) {
+				status = hcd->driver->reset_device(hcd, udev);
+				if (status == 0)
+					usb_set_device_state(udev, USB_STATE_DEFAULT);
+				else
+					usb_set_device_state(udev, USB_STATE_NOTATTACHED);
+			} else
+				usb_set_device_state(udev, USB_STATE_DEFAULT);
 		}
 	} else {
 		if (udev)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 7cf6621..be8fd61 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1272,9 +1272,9 @@ void xhci_handle_command_timeout(unsigned long data)
 		spin_unlock_irqrestore(&xhci->lock, flags);
 		xhci_dbg(xhci, "Command timeout\n");
 		ret = xhci_abort_cmd_ring(xhci);
+		xhci_cleanup_command_queue(xhci);
 		if (unlikely(ret == -ESHUTDOWN)) {
 			xhci_err(xhci, "Abort command ring failed\n");
-			xhci_cleanup_command_queue(xhci);
 			usb_hc_died(xhci_to_hcd(xhci)->primary_hcd);
 			xhci_dbg(xhci, "xHCI host controller is dead.\n");
 		}
-- 
1.7.7.4
