All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v2] usb: hub: Fix error loop seen after hub communication errors
@ 2017-03-15 18:19 Guenter Roeck
  2017-03-15 19:59 ` Alan Stern
  0 siblings, 1 reply; 2+ messages in thread
From: Guenter Roeck @ 2017-03-15 18:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Alan Stern, linux-usb, linux-kernel, Douglas Anderson,
	Brian Norris, Guenter Roeck

While stress testing a usb controller using a bind/unbind looop, the
following error loop was observed.

usb 7-1.2: new low-speed USB device number 3 using xhci-hcd
usb 7-1.2: hub failed to enable device, error -108
usb 7-1-port2: cannot disable (err = -22)
usb 7-1-port2: couldn't allocate usb_device
usb 7-1-port2: cannot disable (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: activate --> -22
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
hub 7-1:1.0: hub_ext_port_status failed (err = -22)
** 57 printk messages dropped ** hub 7-1:1.0: activate --> -22
** 82 printk messages dropped ** hub 7-1:1.0: hub_ext_port_status failed (err = -22)

This continues forever. After adding tracebacks into the code,
the call sequence leading to this is found to be as follows.

[<ffffffc0007fc8e0>] hub_activate+0x368/0x7b8
[<ffffffc0007fceb4>] hub_resume+0x2c/0x3c
[<ffffffc00080b3b8>] usb_resume_interface.isra.6+0x128/0x158
[<ffffffc00080b5d0>] usb_suspend_both+0x1e8/0x288
[<ffffffc00080c9c4>] usb_runtime_suspend+0x3c/0x98
[<ffffffc0007820a0>] __rpm_callback+0x48/0x7c
[<ffffffc00078217c>] rpm_callback+0xa8/0xd4
[<ffffffc000786234>] rpm_suspend+0x84/0x758
[<ffffffc000786ca4>] rpm_idle+0x2c8/0x498
[<ffffffc000786ed4>] __pm_runtime_idle+0x60/0xac
[<ffffffc00080eba8>] usb_autopm_put_interface+0x6c/0x7c
[<ffffffc000803798>] hub_event+0x10ac/0x12ac
[<ffffffc000249bb8>] process_one_work+0x390/0x6b8
[<ffffffc00024abcc>] worker_thread+0x480/0x610
[<ffffffc000251a80>] kthread+0x164/0x178
[<ffffffc0002045d0>] ret_from_fork+0x10/0x40

kick_hub_wq() is called from hub_activate() even after failures to
communicate with the hub. This results in an endless sequence of
hub event -> hub activate -> wq trigger -> hub event -> ...

Provide two solutions for the problem.

- Only trigger the hub event queue if communication with the hub
  is successful.
- After a suspend failure, only resume already suspended interfaces
  if the communication with the device is still possible.

Each of the changes fixes the observed problem.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
v2: Instead of not triggering the hub wq after an error to submit an urb,
    implement a more complex error detection and handling. Do it in two
    places. Marked as RFC to determine if one (or both) of those solutions
    are viable.

 drivers/usb/core/driver.c | 14 ++++++++++++++
 drivers/usb/core/hub.c    | 15 ++++++++++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
index cdee5130638b..40a0ab395d82 100644
--- a/drivers/usb/core/driver.c
+++ b/drivers/usb/core/driver.c
@@ -1331,6 +1331,20 @@ static int usb_suspend_both(struct usb_device *udev, pm_message_t msg)
 		 */
 		if (udev->parent && !PMSG_IS_AUTO(msg))
 			status = 0;
+
+		/*
+		 * If the device is inaccessible, don't try to resume
+		 * suspended interfaces and just return the error.
+		 */
+		if (status) {
+			int err;
+			u16 devstat;
+
+			err = usb_get_status(udev, USB_RECIP_DEVICE, 0,
+					     &devstat);
+			if (err)
+				goto done;
+		}
 	}
 
 	/* If the suspend failed, resume interfaces that did get suspended */
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 5a420657f9f7..8a74f483cfef 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1066,6 +1066,19 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
 
 		portstatus = portchange = 0;
 		status = hub_port_status(hub, port1, &portstatus, &portchange);
+		if (status) {
+			u16 hubstatus;
+
+			ret = usb_get_status(hdev, USB_RECIP_DEVICE, 0,
+					     &hubstatus);
+			if (ret < 0) {
+				dev_err(&hdev->dev,
+					"Failed to read hub status (%d)\n",
+					ret);
+				goto abort;
+			}
+		}
+
 		if (udev || (portstatus & USB_PORT_STAT_CONNECTION))
 			dev_dbg(&port_dev->dev, "status %04x change %04x\n",
 					portstatus, portchange);
@@ -1198,7 +1211,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
 
 	/* Scan all ports that need attention */
 	kick_hub_wq(hub);
-
+abort:
 	if (type == HUB_INIT2 || type == HUB_INIT3) {
 		/* Allow autosuspend if it was suppressed */
  disconnected:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [RFC PATCH v2] usb: hub: Fix error loop seen after hub communication errors
  2017-03-15 18:19 [RFC PATCH v2] usb: hub: Fix error loop seen after hub communication errors Guenter Roeck
@ 2017-03-15 19:59 ` Alan Stern
  0 siblings, 0 replies; 2+ messages in thread
From: Alan Stern @ 2017-03-15 19:59 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Greg Kroah-Hartman, linux-usb, linux-kernel, Douglas Anderson,
	Brian Norris

On Wed, 15 Mar 2017, Guenter Roeck wrote:

> While stress testing a usb controller using a bind/unbind looop, the
> following error loop was observed.
> 
> usb 7-1.2: new low-speed USB device number 3 using xhci-hcd
> usb 7-1.2: hub failed to enable device, error -108
> usb 7-1-port2: cannot disable (err = -22)
> usb 7-1-port2: couldn't allocate usb_device
> usb 7-1-port2: cannot disable (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: activate --> -22
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> ** 57 printk messages dropped ** hub 7-1:1.0: activate --> -22
> ** 82 printk messages dropped ** hub 7-1:1.0: hub_ext_port_status failed (err = -22)
> 
> This continues forever. After adding tracebacks into the code,
> the call sequence leading to this is found to be as follows.
> 
> [<ffffffc0007fc8e0>] hub_activate+0x368/0x7b8
> [<ffffffc0007fceb4>] hub_resume+0x2c/0x3c
> [<ffffffc00080b3b8>] usb_resume_interface.isra.6+0x128/0x158
> [<ffffffc00080b5d0>] usb_suspend_both+0x1e8/0x288
> [<ffffffc00080c9c4>] usb_runtime_suspend+0x3c/0x98
> [<ffffffc0007820a0>] __rpm_callback+0x48/0x7c
> [<ffffffc00078217c>] rpm_callback+0xa8/0xd4
> [<ffffffc000786234>] rpm_suspend+0x84/0x758
> [<ffffffc000786ca4>] rpm_idle+0x2c8/0x498
> [<ffffffc000786ed4>] __pm_runtime_idle+0x60/0xac
> [<ffffffc00080eba8>] usb_autopm_put_interface+0x6c/0x7c
> [<ffffffc000803798>] hub_event+0x10ac/0x12ac
> [<ffffffc000249bb8>] process_one_work+0x390/0x6b8
> [<ffffffc00024abcc>] worker_thread+0x480/0x610
> [<ffffffc000251a80>] kthread+0x164/0x178
> [<ffffffc0002045d0>] ret_from_fork+0x10/0x40
> 
> kick_hub_wq() is called from hub_activate() even after failures to
> communicate with the hub. This results in an endless sequence of
> hub event -> hub activate -> wq trigger -> hub event -> ...
> 
> Provide two solutions for the problem.
> 
> - Only trigger the hub event queue if communication with the hub
>   is successful.
> - After a suspend failure, only resume already suspended interfaces
>   if the communication with the device is still possible.
> 
> Each of the changes fixes the observed problem.
> 
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
> v2: Instead of not triggering the hub wq after an error to submit an urb,
>     implement a more complex error detection and handling. Do it in two
>     places. Marked as RFC to determine if one (or both) of those solutions
>     are viable.

This seems like a good approach, and adding both checks should provide 
a valuable extra layer of security.

>  drivers/usb/core/driver.c | 14 ++++++++++++++
>  drivers/usb/core/hub.c    | 15 ++++++++++++++-
>  2 files changed, 28 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/core/driver.c b/drivers/usb/core/driver.c
> index cdee5130638b..40a0ab395d82 100644
> --- a/drivers/usb/core/driver.c
> +++ b/drivers/usb/core/driver.c
> @@ -1331,6 +1331,20 @@ static int usb_suspend_both(struct usb_device *udev, pm_message_t msg)
>  		 */
>  		if (udev->parent && !PMSG_IS_AUTO(msg))
>  			status = 0;
> +
> +		/*
> +		 * If the device is inaccessible, don't try to resume
> +		 * suspended interfaces and just return the error.
> +		 */
> +		if (status) {
> +			int err;
> +			u16 devstat;
> +
> +			err = usb_get_status(udev, USB_RECIP_DEVICE, 0,
> +					     &devstat);
> +			if (err)

Putting a log message here would be a good idea.

For example, suppose a device spontaneously stops working.  The driver,
not receiving any data from the device, might decide to suspend it.  
The only clue we would have that something has gone wrong would be
right here.

> +				goto done;
> +		}
>  	}
>  
>  	/* If the suspend failed, resume interfaces that did get suspended */
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 5a420657f9f7..8a74f483cfef 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -1066,6 +1066,19 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
>  
>  		portstatus = portchange = 0;
>  		status = hub_port_status(hub, port1, &portstatus, &portchange);
> +		if (status) {
> +			u16 hubstatus;
> +
> +			ret = usb_get_status(hdev, USB_RECIP_DEVICE, 0,
> +					     &hubstatus);

In this case, since this code is in the middle of a loop that reads all
the port statuses, I wouldn't bother trying to read the hub status.  
Just log the error message and abort.

> +			if (ret < 0) {
> +				dev_err(&hdev->dev,
> +					"Failed to read hub status (%d)\n",
> +					ret);
> +				goto abort;
> +			}
> +		}
> +
>  		if (udev || (portstatus & USB_PORT_STAT_CONNECTION))
>  			dev_dbg(&port_dev->dev, "status %04x change %04x\n",
>  					portstatus, portchange);
> @@ -1198,7 +1211,7 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
>  
>  	/* Scan all ports that need attention */
>  	kick_hub_wq(hub);
> -
> +abort:
>  	if (type == HUB_INIT2 || type == HUB_INIT3) {
>  		/* Allow autosuspend if it was suppressed */
>   disconnected:

On the whole, this looks very good.

Alan Stern

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2017-03-15 19:59 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-15 18:19 [RFC PATCH v2] usb: hub: Fix error loop seen after hub communication errors Guenter Roeck
2017-03-15 19:59 ` Alan Stern

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.