* [PATCH] usb: hub: Fix error loop seen after hub communication errors @ 2017-03-08 21:08 Guenter Roeck 2017-03-10 21:14 ` Alan Stern 0 siblings, 1 reply; 3+ messages in thread From: Guenter Roeck @ 2017-03-08 21:08 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Douglas Anderson, Brian Norris, linux-usb, linux-kernel, Guenter Roeck While stress testing a usb controller using a bind/unbind looop, the following error loop was observed. usb 7-1.2: new low-speed USB device number 3 using xhci-hcd usb 7-1.2: hub failed to enable device, error -108 usb 7-1-port2: cannot disable (err = -22) usb 7-1-port2: couldn't allocate usb_device usb 7-1-port2: cannot disable (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: activate --> -22 hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: activate --> -22 hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: activate --> -22 hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: activate --> -22 hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: activate --> -22 hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: activate --> -22 hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: activate --> -22 hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: activate --> -22 hub 7-1:1.0: hub_ext_port_status failed (err = -22) hub 7-1:1.0: hub_ext_port_status failed (err = -22) ** 57 printk messages dropped ** hub 7-1:1.0: activate --> -22 ** 82 printk messages dropped ** hub 7-1:1.0: hub_ext_port_status failed (err = -22) This continues forever. After adding tracebacks into the code, the call sequence leading to this is found to be as follows. [<ffffffc0007fc8e0>] hub_activate+0x368/0x7b8 [<ffffffc0007fceb4>] hub_resume+0x2c/0x3c [<ffffffc00080b3b8>] usb_resume_interface.isra.6+0x128/0x158 [<ffffffc00080b5d0>] usb_suspend_both+0x1e8/0x288 [<ffffffc00080c9c4>] usb_runtime_suspend+0x3c/0x98 [<ffffffc0007820a0>] __rpm_callback+0x48/0x7c [<ffffffc00078217c>] rpm_callback+0xa8/0xd4 [<ffffffc000786234>] rpm_suspend+0x84/0x758 [<ffffffc000786ca4>] rpm_idle+0x2c8/0x498 [<ffffffc000786ed4>] __pm_runtime_idle+0x60/0xac [<ffffffc00080eba8>] usb_autopm_put_interface+0x6c/0x7c [<ffffffc000803798>] hub_event+0x10ac/0x12ac [<ffffffc000249bb8>] process_one_work+0x390/0x6b8 [<ffffffc00024abcc>] worker_thread+0x480/0x610 [<ffffffc000251a80>] kthread+0x164/0x178 [<ffffffc0002045d0>] ret_from_fork+0x10/0x40 kick_hub_wq() is called from hub_activate() even after failures to communicate with the hub. This results in an endless sequence of hub event -> hub activate -> wq trigger -> hub event -> ... To break the loop, only trigger the hub event queue if communication with the hub is successful. Signed-off-by: Guenter Roeck <linux@roeck-us.net> --- drivers/usb/core/hub.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 5286bf67869a..f22ab428b310 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -1197,7 +1197,8 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) &hub->leds, LED_CYCLE_PERIOD); /* Scan all ports that need attention */ - kick_hub_wq(hub); + if (!status) + kick_hub_wq(hub); if (type == HUB_INIT2 || type == HUB_INIT3) { /* Allow autosuspend if it was suppressed */ -- 2.7.4 ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] usb: hub: Fix error loop seen after hub communication errors 2017-03-08 21:08 [PATCH] usb: hub: Fix error loop seen after hub communication errors Guenter Roeck @ 2017-03-10 21:14 ` Alan Stern 2017-03-10 23:37 ` Guenter Roeck 0 siblings, 1 reply; 3+ messages in thread From: Alan Stern @ 2017-03-10 21:14 UTC (permalink / raw) To: Guenter Roeck Cc: Greg Kroah-Hartman, Douglas Anderson, Brian Norris, linux-usb, linux-kernel On Wed, 8 Mar 2017, Guenter Roeck wrote: > While stress testing a usb controller using a bind/unbind looop, the > following error loop was observed. > > usb 7-1.2: new low-speed USB device number 3 using xhci-hcd > usb 7-1.2: hub failed to enable device, error -108 > usb 7-1-port2: cannot disable (err = -22) > usb 7-1-port2: couldn't allocate usb_device > usb 7-1-port2: cannot disable (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: hub_ext_port_status failed (err = -22) > hub 7-1:1.0: activate --> -22 ... > This continues forever. After adding tracebacks into the code, > the call sequence leading to this is found to be as follows. > > [<ffffffc0007fc8e0>] hub_activate+0x368/0x7b8 > [<ffffffc0007fceb4>] hub_resume+0x2c/0x3c > [<ffffffc00080b3b8>] usb_resume_interface.isra.6+0x128/0x158 > [<ffffffc00080b5d0>] usb_suspend_both+0x1e8/0x288 > [<ffffffc00080c9c4>] usb_runtime_suspend+0x3c/0x98 > [<ffffffc0007820a0>] __rpm_callback+0x48/0x7c > [<ffffffc00078217c>] rpm_callback+0xa8/0xd4 > [<ffffffc000786234>] rpm_suspend+0x84/0x758 > [<ffffffc000786ca4>] rpm_idle+0x2c8/0x498 > [<ffffffc000786ed4>] __pm_runtime_idle+0x60/0xac > [<ffffffc00080eba8>] usb_autopm_put_interface+0x6c/0x7c > [<ffffffc000803798>] hub_event+0x10ac/0x12ac > [<ffffffc000249bb8>] process_one_work+0x390/0x6b8 > [<ffffffc00024abcc>] worker_thread+0x480/0x610 > [<ffffffc000251a80>] kthread+0x164/0x178 > [<ffffffc0002045d0>] ret_from_fork+0x10/0x40 > > kick_hub_wq() is called from hub_activate() even after failures to > communicate with the hub. This results in an endless sequence of > hub event -> hub activate -> wq trigger -> hub event -> ... > To break the loop, only trigger the hub event queue if communication > with the hub is successful. > > Signed-off-by: Guenter Roeck <linux@roeck-us.net> > --- > drivers/usb/core/hub.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > index 5286bf67869a..f22ab428b310 100644 > --- a/drivers/usb/core/hub.c > +++ b/drivers/usb/core/hub.c > @@ -1197,7 +1197,8 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) > &hub->leds, LED_CYCLE_PERIOD); > > /* Scan all ports that need attention */ > - kick_hub_wq(hub); > + if (!status) > + kick_hub_wq(hub); > > if (type == HUB_INIT2 || type == HUB_INIT3) { > /* Allow autosuspend if it was suppressed */ I'm not at all sure this is the best solution. status == 0 means that an URB was successfully _submitted_; it does not mean that future communication with the hub will be successful. The problem arises here because the system is unable to runtime-suspend the hub (you didn't say why, but probably because of the same communication problem). When a suspend fails, drivers are informed that the device is back at full power, and of course the hub driver tries to determine the hub's current state at that time. When this fails, there's no reason to keep the hub at full power any longer, so the system tries to do another runtime suspend. This suspend fails just like the first one, and there's your loop. This same kind of thing could happen with any driver. In your case, the problem could be solved by making sure that when the hub fails to suspend, it is still at least minimally functional. We don't do this now; when usb_port_suspend fails there is no checking to see whether the device is still working. (It would be interesting to know exactly at which point usb_port_suspend fails during your test. And why didn't your system mark the hub as disconnected when the host controller was unbound?) An even worse problem is what to do when the device is working okay but runtime suspend still fails (for example, if remote wakeup cannot be enabled). How do we prevent the system from going into a similar loop then? Alan Stern ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] usb: hub: Fix error loop seen after hub communication errors 2017-03-10 21:14 ` Alan Stern @ 2017-03-10 23:37 ` Guenter Roeck 0 siblings, 0 replies; 3+ messages in thread From: Guenter Roeck @ 2017-03-10 23:37 UTC (permalink / raw) To: Alan Stern Cc: Greg Kroah-Hartman, Douglas Anderson, Brian Norris, linux-usb, linux-kernel Hi Alan, On 03/10/2017 01:14 PM, Alan Stern wrote: > On Wed, 8 Mar 2017, Guenter Roeck wrote: > >> While stress testing a usb controller using a bind/unbind looop, the >> following error loop was observed. >> >> usb 7-1.2: new low-speed USB device number 3 using xhci-hcd >> usb 7-1.2: hub failed to enable device, error -108 >> usb 7-1-port2: cannot disable (err = -22) >> usb 7-1-port2: couldn't allocate usb_device >> usb 7-1-port2: cannot disable (err = -22) >> hub 7-1:1.0: hub_ext_port_status failed (err = -22) >> hub 7-1:1.0: hub_ext_port_status failed (err = -22) >> hub 7-1:1.0: activate --> -22 >> hub 7-1:1.0: hub_ext_port_status failed (err = -22) >> hub 7-1:1.0: hub_ext_port_status failed (err = -22) >> hub 7-1:1.0: activate --> -22 > ... > >> This continues forever. After adding tracebacks into the code, >> the call sequence leading to this is found to be as follows. >> >> [<ffffffc0007fc8e0>] hub_activate+0x368/0x7b8 >> [<ffffffc0007fceb4>] hub_resume+0x2c/0x3c >> [<ffffffc00080b3b8>] usb_resume_interface.isra.6+0x128/0x158 >> [<ffffffc00080b5d0>] usb_suspend_both+0x1e8/0x288 >> [<ffffffc00080c9c4>] usb_runtime_suspend+0x3c/0x98 >> [<ffffffc0007820a0>] __rpm_callback+0x48/0x7c >> [<ffffffc00078217c>] rpm_callback+0xa8/0xd4 >> [<ffffffc000786234>] rpm_suspend+0x84/0x758 >> [<ffffffc000786ca4>] rpm_idle+0x2c8/0x498 >> [<ffffffc000786ed4>] __pm_runtime_idle+0x60/0xac >> [<ffffffc00080eba8>] usb_autopm_put_interface+0x6c/0x7c >> [<ffffffc000803798>] hub_event+0x10ac/0x12ac >> [<ffffffc000249bb8>] process_one_work+0x390/0x6b8 >> [<ffffffc00024abcc>] worker_thread+0x480/0x610 >> [<ffffffc000251a80>] kthread+0x164/0x178 >> [<ffffffc0002045d0>] ret_from_fork+0x10/0x40 >> >> kick_hub_wq() is called from hub_activate() even after failures to >> communicate with the hub. This results in an endless sequence of >> hub event -> hub activate -> wq trigger -> hub event -> ... >> To break the loop, only trigger the hub event queue if communication >> with the hub is successful. >> >> Signed-off-by: Guenter Roeck <linux@roeck-us.net> >> --- >> drivers/usb/core/hub.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c >> index 5286bf67869a..f22ab428b310 100644 >> --- a/drivers/usb/core/hub.c >> +++ b/drivers/usb/core/hub.c >> @@ -1197,7 +1197,8 @@ static void hub_activate(struct usb_hub *hub, enum hub_activation_type type) >> &hub->leds, LED_CYCLE_PERIOD); >> >> /* Scan all ports that need attention */ >> - kick_hub_wq(hub); >> + if (!status) >> + kick_hub_wq(hub); >> >> if (type == HUB_INIT2 || type == HUB_INIT3) { >> /* Allow autosuspend if it was suppressed */ > > I'm not at all sure this is the best solution. status == 0 means that > an URB was successfully _submitted_; it does not mean that future > communication with the hub will be successful. > > The problem arises here because the system is unable to runtime-suspend > the hub (you didn't say why, but probably because of the same > communication problem). When a suspend fails, drivers are informed Yes, exactly. > that the device is back at full power, and of course the hub driver > tries to determine the hub's current state at that time. When this > fails, there's no reason to keep the hub at full power any longer, so > the system tries to do another runtime suspend. This suspend fails > just like the first one, and there's your loop. > If we don't cut the loop in hub_activate(), where else would be a good place to cut it ? > This same kind of thing could happen with any driver. In your case, > the problem could be solved by making sure that when the hub fails to > suspend, it is still at least minimally functional. We don't do this > now; when usb_port_suspend fails there is no checking to see whether > the device is still working. > > (It would be interesting to know exactly at which point > usb_port_suspend fails during your test. And why didn't your system > mark the hub as disconnected when the host controller was unbound?) > It does, but that happens a bit later. If I disable console logging (for example by using dev_err_ratelimited() instead of dev_err() for the observed errors), the hub is marked as disabled after a few hundred or a few thousand loops. Only that code never gets to run runs if console logging is enabled, and to restrict logging didn't seem to be a good solution for the problem. The test script I am running is: while true do for i in /sys/bus/platform/drivers/rockchip-dwc3/usb*; do basename $i > $(dirname $i)/unbind basename $i > $(dirname $i)/bind done sleep $(awk "BEGIN {printf \"%.2f\", $((RANDOM % 150)) / 100 + 0.5}") done where rockchip-dwc3 is a slightly more complex version of dwc3-of-simple.c, with pretty much the same functionality, only that it ties into extcon to be able to handle cable states. The hub disconnect is a result of a call to of_platform_depopulate(), similar to the same call in dwc3-of-simple.c. The sleep in the above script is essential; it results in all kinds of race conditions. The one I am trying to fix here is just one of them. The system under test has two Type-C ports; one is connected to an Ethernet interface and the other to an Apple Type-C adapter (it probably doesn't matter much what is connected, as long as there is a hub). > An even worse problem is what to do when the device is working okay but > runtime suspend still fails (for example, if remote wakeup cannot be > enabled). How do we prevent the system from going into a similar loop > then? > Good question. I am open to suggestions. Thanks, Guenter ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2017-03-10 23:38 UTC | newest] Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-03-08 21:08 [PATCH] usb: hub: Fix error loop seen after hub communication errors Guenter Roeck 2017-03-10 21:14 ` Alan Stern 2017-03-10 23:37 ` Guenter Roeck
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.