All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] usb: xhci: Use non-interruptible sleep for XHCI commands
@ 2013-05-25  1:39 Julius Werner
  2013-05-29 18:06 ` Sarah Sharp
  0 siblings, 1 reply; 4+ messages in thread
From: Julius Werner @ 2013-05-25  1:39 UTC (permalink / raw)
  To: Sarah Sharp
  Cc: Greg Kroah-Hartman, Frank Lömker, linux-usb, linux-kernel,
	Vincent Palatin, Julius Werner

The XHCI stack usually uses wait_for_completion_interruptible_timeout()
to wait for the completion of an XHCI command, and treats both timeouts
and interruptions as errors. This is a bad idea, since these commands
are often essential for the correct operation of the USB stack, and
their failure can leave devices in a misconfigured and/or unusable
state. While a timeout probably means a real problem that needs to be
dealt with, a random signal to the controlling user-space process should
not cause such harsh consequences.

This patch changes that behavior to use uninterruptible waits instead.
It fixes an easy to reproduce bug that occurs when you kill -9 a
process that reads from a UVC camera. The UVC driver's release code
tries to set the camera's alternate setting back to 0, but the lingering
SIGKILL instantly aborts any Stop Endpoint or Configure Endpoint command
sent to the xHC. The code dies halfway through the bandwidth allocation
process, leaving the device in an active configuration and preventing
future access to it due to the now out-of-sync bandwidth calculation.

This patch should be backported to kernels as far 2.6.31 that contain
the commit 3ffbba9511b4148cbe1f6b6238686adaeaca8feb "USB: xhci: Allocate
and address USB devices".

Signed-off-by: Julius Werner <jwerner@chromium.org>
---
 drivers/usb/host/xhci-hub.c |  7 +++----
 drivers/usb/host/xhci.c     | 26 +++++++++++---------------
 2 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 187a3ec..c1bcfa8 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -293,12 +293,11 @@ static int xhci_stop_device(struct xhci_hcd *xhci, int slot_id, int suspend)
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
 	/* Wait for last stop endpoint command to finish */
-	timeleft = wait_for_completion_interruptible_timeout(
+	timeleft = wait_for_completion_timeout(
 			cmd->completion,
-			USB_CTRL_SET_TIMEOUT);
+			XHCI_CMD_DEFAULT_TIMEOUT);
 	if (timeleft <= 0) {
-		xhci_warn(xhci, "%s while waiting for stop endpoint command\n",
-				timeleft == 0 ? "Timeout" : "Signal");
+		xhci_warn(xhci, "Timeout while waiting for Stop Endpoint\n");
 		spin_lock_irqsave(&xhci->lock, flags);
 		/* The timeout might have raced with the event ring handler, so
 		 * only delete from the list if the item isn't poisoned.
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index b4aa79d..302f992 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -2613,15 +2613,14 @@ static int xhci_configure_endpoint(struct xhci_hcd *xhci,
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
 	/* Wait for the configure endpoint command to complete */
-	timeleft = wait_for_completion_interruptible_timeout(
+	timeleft = wait_for_completion_timeout(
 			cmd_completion,
 			XHCI_CMD_DEFAULT_TIMEOUT);
 	if (timeleft <= 0) {
-		xhci_warn(xhci, "%s while waiting for %s command\n",
-				timeleft == 0 ? "Timeout" : "Signal",
+		xhci_warn(xhci, "Timeout while waiting for %s\n",
 				ctx_change == 0 ?
-					"configure endpoint" :
-					"evaluate context");
+					"Configure Endpoint" :
+					"Evaluate Context");
 		/* cancel the configure endpoint command */
 		ret = xhci_cancel_cmd(xhci, command, cmd_trb);
 		if (ret < 0)
@@ -3399,12 +3398,11 @@ int xhci_discover_or_reset_device(struct usb_hcd *hcd, struct usb_device *udev)
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
 	/* Wait for the Reset Device command to finish */
-	timeleft = wait_for_completion_interruptible_timeout(
+	timeleft = wait_for_completion_timeout(
 			reset_device_cmd->completion,
-			USB_CTRL_SET_TIMEOUT);
+			XHCI_CMD_DEFAULT_TIMEOUT);
 	if (timeleft <= 0) {
-		xhci_warn(xhci, "%s while waiting for reset device command\n",
-				timeleft == 0 ? "Timeout" : "Signal");
+		xhci_warn(xhci, "Timeout while waiting for Reset Device\n");
 		spin_lock_irqsave(&xhci->lock, flags);
 		/* The timeout might have raced with the event ring handler, so
 		 * only delete from the list if the item isn't poisoned.
@@ -3588,11 +3586,10 @@ int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev)
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
 	/* XXX: how much time for xHC slot assignment? */
-	timeleft = wait_for_completion_interruptible_timeout(&xhci->addr_dev,
+	timeleft = wait_for_completion_timeout(&xhci->addr_dev,
 			XHCI_CMD_DEFAULT_TIMEOUT);
 	if (timeleft <= 0) {
-		xhci_warn(xhci, "%s while waiting for a slot\n",
-				timeleft == 0 ? "Timeout" : "Signal");
+		xhci_warn(xhci, "Timeout while waiting for a slot\n");
 		/* cancel the enable slot request */
 		return xhci_cancel_cmd(xhci, NULL, cmd_trb);
 	}
@@ -3706,15 +3703,14 @@ int xhci_address_device(struct usb_hcd *hcd, struct usb_device *udev)
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
 	/* ctrl tx can take up to 5 sec; XXX: need more time for xHC? */
-	timeleft = wait_for_completion_interruptible_timeout(&xhci->addr_dev,
+	timeleft = wait_for_completion_timeout(&xhci->addr_dev,
 			XHCI_CMD_DEFAULT_TIMEOUT);
 	/* FIXME: From section 4.3.4: "Software shall be responsible for timing
 	 * the SetAddress() "recovery interval" required by USB and aborting the
 	 * command on a timeout.
 	 */
 	if (timeleft <= 0) {
-		xhci_warn(xhci, "%s while waiting for address device command\n",
-				timeleft == 0 ? "Timeout" : "Signal");
+		xhci_warn(xhci, "Timeout while waiting for Address Device\n");
 		/* cancel the address device command */
 		ret = xhci_cancel_cmd(xhci, NULL, cmd_trb);
 		if (ret < 0)
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] usb: xhci: Use non-interruptible sleep for XHCI commands
  2013-05-25  1:39 [PATCH] usb: xhci: Use non-interruptible sleep for XHCI commands Julius Werner
@ 2013-05-29 18:06 ` Sarah Sharp
  2013-05-29 20:36   ` Julius Werner
  0 siblings, 1 reply; 4+ messages in thread
From: Sarah Sharp @ 2013-05-29 18:06 UTC (permalink / raw)
  To: Julius Werner
  Cc: Greg Kroah-Hartman, Frank Lömker, linux-usb, linux-kernel,
	Vincent Palatin

I really wish you had contacted me about this issue before writing code
to fix it.  Now I don't know the backstory behind this issue, which
makes it hard to evaluate whether this fix is correct.

On Fri, May 24, 2013 at 06:39:37PM -0700, Julius Werner wrote:
> The XHCI stack usually uses wait_for_completion_interruptible_timeout()
> to wait for the completion of an XHCI command, and treats both timeouts
> and interruptions as errors. This is a bad idea, since these commands
> are often essential for the correct operation of the USB stack, and
> their failure can leave devices in a misconfigured and/or unusable
> state.

What causes the devices to be "unusable"?  If a Configure Endpoint
command fails, the USB core is supposed to try to put the device back
into its original state.  Is there a mismatch caused by the command
being interrupted, and if so, how can we fix it, rather than allowing
the kernel to go into an uninterruptible sleep?

> While a timeout probably means a real problem that needs to be
> dealt with, a random signal to the controlling user-space process should
> not cause such harsh consequences.

We should deal with the canceled command gracefully, rather than
papering over the issue.

> This patch changes that behavior to use uninterruptible waits instead.
> It fixes an easy to reproduce bug that occurs when you kill -9 a
> process that reads from a UVC camera. The UVC driver's release code
> tries to set the camera's alternate setting back to 0, but the lingering
> SIGKILL instantly aborts any Stop Endpoint or Configure Endpoint command
> sent to the xHC. The code dies halfway through the bandwidth allocation
> process, leaving the device in an active configuration and preventing
> future access to it due to the now out-of-sync bandwidth calculation.

Obviously the command handling needs to be re-worked so this bandwidth
mismatch doesn't happen.  What I would like to see instead (but have not
had time to implement) is a global command queue manager.  It would have
a queue of commands, similar to what we do for the TD lists, but only
one list per xhci_hcd.  The command queue manager would handle
cancellation requests, and properly keep track of whether each command
completed.

Functions submitting commands would all have a unique completion, and
wait (interruptibly) on that completion.  Commands that are interrupted
with a signal should attempt to cancel the command, and wait on the
completion to see if their command was canceled or successfully
completed.  If it was successfully completed, it should return success,
rather than -ETIMEOUT.

I think that change will fix the case where the bandwidth allocation
gets out-of-sync with the USB core, but since you haven't shared the
details of how the code handles being interrupted, I can't tell whether
this is actually a good solution.

Sarah Sharp

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] usb: xhci: Use non-interruptible sleep for XHCI commands
  2013-05-29 18:06 ` Sarah Sharp
@ 2013-05-29 20:36   ` Julius Werner
  2013-08-20 20:48     ` Julius Werner
  0 siblings, 1 reply; 4+ messages in thread
From: Julius Werner @ 2013-05-29 20:36 UTC (permalink / raw)
  To: Sarah Sharp
  Cc: Julius Werner, Greg Kroah-Hartman, Frank Lömker, linux-usb,
	LKML, Vincent Palatin

On Wed, May 29, 2013 at 11:06 AM, Sarah Sharp
<sarah.a.sharp@linux.intel.com> wrote:
> I really wish you had contacted me about this issue before writing code
> to fix it.  Now I don't know the backstory behind this issue, which
> makes it hard to evaluate whether this fix is correct.

No worries... this was a very trivial patch and didn't take long. I
won't mind if we settle on a different solution here eventually (even
though I think this is the way that makes sense). The issue is very
easy to reproduce: just read of a UVC camera (I use Python with
"import cv2; dev = cv2.VideoCapture(0); dev.read()") and kill -9 the
process. However, you will have to pull in my other patch first ("usb:
xhci: Fix Command Ring Stopped Event handling"), or you may run into a
follow-up problem with cancelled commands that kills your whole HCD.

> On Fri, May 24, 2013 at 06:39:37PM -0700, Julius Werner wrote:
>> The XHCI stack usually uses wait_for_completion_interruptible_timeout()
>> to wait for the completion of an XHCI command, and treats both timeouts
>> and interruptions as errors. This is a bad idea, since these commands
>> are often essential for the correct operation of the USB stack, and
>> their failure can leave devices in a misconfigured and/or unusable
>> state.
>
> What causes the devices to be "unusable"?  If a Configure Endpoint
> command fails, the USB core is supposed to try to put the device back
> into its original state.  Is there a mismatch caused by the command
> being interrupted, and if so, how can we fix it, rather than allowing
> the kernel to go into an uninterruptible sleep?

The problem is that this happens while the USB core is already trying
to put the device back into its default/inactive state: the process
gets killed, its /dev/video0 file descriptor gets closed, the
uvc_v4l2_release() handler runs and eventually calls
usb_set_interface() to return the device to alternate setting 0 (back
from the "active" alternate setting that it was in while transmitting
video). But that alternate setting change requires an XHCI command,
which gets immediately cancelled because the SIGKILL signal keeps
lingering on the process until it is dead. (To be honest I am not
quite sure why the device stays unusable after that... I just get
bandwidth exceeded errors when I try to read it again from a new
process. Maybe there is another error in the bandwidth management code
there, but the problem remains that the UVC driver should be allowed
to correctly return the device to its default state even during a
SIGKILL.)

The other thing to note is that we already use uninterruptible sleeps
in many other places -- the USB core does it all the time: most
prominently in usb_start_wait_urb(), which is used for device
communication (SET_ADDRESS, SET_CONFIGURATION, etc.) during
enumeration (presumably to avoid similar problems as this patch). This
function is even used for the actual SET_INTERFACE message that is
sent to the device when changing alternate settings... so does it make
sense to make the (usually very fast and reliable, unless the hardware
is completely broken) communication with the xHC interruptible, when
for the same operation the much more unreliable communication with the
device is not?

>> This patch changes that behavior to use uninterruptible waits instead.
>> It fixes an easy to reproduce bug that occurs when you kill -9 a
>> process that reads from a UVC camera. The UVC driver's release code
>> tries to set the camera's alternate setting back to 0, but the lingering
>> SIGKILL instantly aborts any Stop Endpoint or Configure Endpoint command
>> sent to the xHC. The code dies halfway through the bandwidth allocation
>> process, leaving the device in an active configuration and preventing
>> future access to it due to the now out-of-sync bandwidth calculation.
>
> Obviously the command handling needs to be re-worked so this bandwidth
> mismatch doesn't happen.  What I would like to see instead (but have not
> had time to implement) is a global command queue manager.  It would have
> a queue of commands, similar to what we do for the TD lists, but only
> one list per xhci_hcd.  The command queue manager would handle
> cancellation requests, and properly keep track of whether each command
> completed.
>
> Functions submitting commands would all have a unique completion, and
> wait (interruptibly) on that completion.  Commands that are interrupted
> with a signal should attempt to cancel the command, and wait on the
> completion to see if their command was canceled or successfully
> completed.  If it was successfully completed, it should return success,
> rather than -ETIMEOUT.
>
> I think that change will fix the case where the bandwidth allocation
> gets out-of-sync with the USB core, but since you haven't shared the
> details of how the code handles being interrupted, I can't tell whether
> this is actually a good solution.

This sounds like a good idea in general, but I don't think it will fix
this problem. The issue is not that command cancellation doesn't work
(at least with my other patch, it does), it's just that the USB core
cannot do its job when every command gets cancelled immediately due to
a lingering signal. The only thing it can do when an add_endpoint() or
drop_endpoint() fails is to assume the device/slot is really hosed and
leave it in whatever state it is (and even if we solved the bandwidth
code so that it could recover from there later, I think leaving it
there in the first place is not good behavior). Therefore I think
cancelling commands (and thus failing those low-level operations) due
to something as trivial as a signal is not a good idea in general (and
the rest of the USB core seems to be implemented according to that
philosophy as well).

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] usb: xhci: Use non-interruptible sleep for XHCI commands
  2013-05-29 20:36   ` Julius Werner
@ 2013-08-20 20:48     ` Julius Werner
  0 siblings, 0 replies; 4+ messages in thread
From: Julius Werner @ 2013-08-20 20:48 UTC (permalink / raw)
  To: Sarah Sharp
  Cc: Greg Kroah-Hartman, Frank Lömker, Julius Werner, linux-usb,
	LKML, Vincent Palatin

Hi Sarah,

I know you are probably swamped with patches, but this
(https://patchwork.kernel.org/patch/2612831/) has been hanging a few
months and I just wanted to double-check if it slipped through
somewhere. I still think this is necessary (regardless of any command
queue reengineering in the future) as explained in my last mail.
Please let me know if (and why) you disagree, and whether you have
intentionally decided to not pick this up.

Same goes (essentially) for:

usb: xhci: Fix Command Ring Stopped Event handling
(https://patchwork.kernel.org/patch/2612821/)
usb: xhci: Consolidate XHCI TD Size calculation
(https://patchwork.kernel.org/patch/2548321/)

Thanks!

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-08-20 20:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-25  1:39 [PATCH] usb: xhci: Use non-interruptible sleep for XHCI commands Julius Werner
2013-05-29 18:06 ` Sarah Sharp
2013-05-29 20:36   ` Julius Werner
2013-08-20 20:48     ` Julius Werner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.