linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Prevent race condition between USB authorisation and USB discovery events
@ 2018-12-17  6:10 Andrew Worsley
  2018-12-17 16:21 ` Alan Stern
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Worsley @ 2018-12-17  6:10 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Alan Stern, Mathias Nyman, Nicolas Boichat,
	Jon Flatley, Kai-Heng Feng, Bin Liu, Benson Leung,
	open list:USB SUBSYSTEM, open list
  Cc: Andrew Worsley

A sysfs driven USB authorisation change can trigger a usb_set_configuration
while a hub_event worker thread is running. This can result in a USB device
being disabled just after it was configured and bringing down all the
devices and impacting hardware and user processes that were established on
top of this these interfaces. In some cases the USB disable never completed
and the whole system hung.

At my work I had an occasional hang due to this race condition. Roughly 1
in 50 boots had the race occurrence and 1 in 4 of those resulted in a hang.
This patch fixed the problem and I had no problems (spurious disables
or hangs) in 750+ boots.

Signed-off-by: Andrew Worsley <amworsley@gmail.com>
---
 drivers/usb/core/hub.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index f76b2e0aba9d..dabd07aa8602 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -51,6 +51,9 @@ static DEFINE_SPINLOCK(device_state_lock);
 static struct workqueue_struct *hub_wq;
 static void hub_event(struct work_struct *work);
 
+/* synchronize hub_event and authorize_device operations */
+DEFINE_MUTEX(usb_authorize_mutex);
+
 /* synchronize hub-port add/remove and peering operations */
 DEFINE_MUTEX(usb_port_peer_mutex);
 
@@ -2538,6 +2541,7 @@ int usb_new_device(struct usb_device *udev)
  */
 int usb_deauthorize_device(struct usb_device *usb_dev)
 {
+	mutex_lock(&usb_authorize_mutex);
 	usb_lock_device(usb_dev);
 	if (usb_dev->authorized == 0)
 		goto out_unauthorized;
@@ -2547,6 +2551,7 @@ int usb_deauthorize_device(struct usb_device *usb_dev)
 
 out_unauthorized:
 	usb_unlock_device(usb_dev);
+	mutex_unlock(&usb_authorize_mutex);
 	return 0;
 }
 
@@ -2555,6 +2560,7 @@ int usb_authorize_device(struct usb_device *usb_dev)
 {
 	int result = 0, c;
 
+	mutex_lock(&usb_authorize_mutex);
 	usb_lock_device(usb_dev);
 	if (usb_dev->authorized == 1)
 		goto out_authorized;
@@ -2596,6 +2602,7 @@ int usb_authorize_device(struct usb_device *usb_dev)
 error_autoresume:
 out_authorized:
 	usb_unlock_device(usb_dev);	/* complements locktree */
+	mutex_unlock(&usb_authorize_mutex);
 	return result;
 }
 
@@ -5320,6 +5327,7 @@ static void hub_event(struct work_struct *work)
 	hub_dev = hub->intfdev;
 	intf = to_usb_interface(hub_dev);
 
+	mutex_lock(&usb_authorize_mutex);
 	dev_dbg(hub_dev, "state %d ports %d chg %04x evt %04x\n",
 			hdev->state, hdev->maxchild,
 			/* NOTE: expects max 15 ports... */
@@ -5422,6 +5430,7 @@ static void hub_event(struct work_struct *work)
 	usb_autopm_put_interface_no_suspend(intf);
 out_hdev_lock:
 	usb_unlock_device(hdev);
+	mutex_unlock(&usb_authorize_mutex);
 
 	/* Balance the stuff in kick_hub_wq() and allow autosuspend */
 	usb_autopm_put_interface(intf);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] Prevent race condition between USB authorisation and USB discovery events
  2018-12-17  6:10 [PATCH] Prevent race condition between USB authorisation and USB discovery events Andrew Worsley
@ 2018-12-17 16:21 ` Alan Stern
  2018-12-18  0:34   ` Andrew Worsley
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Stern @ 2018-12-17 16:21 UTC (permalink / raw)
  To: Andrew Worsley
  Cc: Greg Kroah-Hartman, Mathias Nyman, Nicolas Boichat, Jon Flatley,
	Kai-Heng Feng, Bin Liu, Benson Leung, open list:USB SUBSYSTEM,
	open list

On Mon, 17 Dec 2018, Andrew Worsley wrote:

> A sysfs driven USB authorisation change can trigger a usb_set_configuration
> while a hub_event worker thread is running. This can result in a USB device
> being disabled just after it was configured and bringing down all the
> devices and impacting hardware and user processes that were established on
> top of this these interfaces. In some cases the USB disable never completed
> and the whole system hung.

Can you be more specific about this?  Disabling a USB device shouldn't
cause these kinds of problems, regardless of whether or not the device
was just configured.

> At my work I had an occasional hang due to this race condition. Roughly 1
> in 50 boots had the race occurrence and 1 in 4 of those resulted in a hang.
> This patch fixed the problem and I had no problems (spurious disables
> or hangs) in 750+ boots.

usb_authorize_device, usb_deauthorize_device, and hub_event all acquire 
the device mutex.  Why should adding another mutex make any difference?

In fact there's an actual disadvantage: Making hub_event acquire a
global mutex will prevent us from handling multiple hubs concurrently.  
Although we don't do this now, we might want to in the future.

Alan Stern


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Prevent race condition between USB authorisation and USB discovery events
  2018-12-17 16:21 ` Alan Stern
@ 2018-12-18  0:34   ` Andrew Worsley
  2018-12-18 15:03     ` Alan Stern
  0 siblings, 1 reply; 4+ messages in thread
From: Andrew Worsley @ 2018-12-18  0:34 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg Kroah-Hartman, Mathias Nyman, Nicolas Boichat, Jon Flatley,
	Kai-Heng Feng, Bin Liu, Benson Leung, open list:USB SUBSYSTEM,
	open list

On Tue, 18 Dec 2018 at 03:21, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Mon, 17 Dec 2018, Andrew Worsley wrote:
>
> > A sysfs driven USB authorisation change can trigger a usb_set_configuration
> > while a hub_event worker thread is running. This can result in a USB device
> > being disabled just after it was configured and bringing down all the
> > devices and impacting hardware and user processes that were established on
> > top of this these interfaces. In some cases the USB disable never completed
> > and the whole system hung.
>
> Can you be more specific about this?  Disabling a USB device shouldn't
> cause these kinds of problems, regardless of whether or not the device
> was just configured.

I can't say too much about the hardware - but there was an SPI bus and
an i2c bus built
on top of USB devices on a bus. It appears the SPI bus shutdown was
the item that was prone
to problems when being torn down.

I understand it would be better if the dependant systems shutdown
cleanly, which it
did about 75% of the time but then it was rather messy sequence. So
the current system
is far from ideal. The crux of the issue is the usb_disable_device()
(line 1874 of drivers/usb/core/message.c)
call in usb_set_configuration() is applied to a configured device and
triggers the
tear down of all the devices built on that USB device and is very disruptive.

> > At my work I had an occasional hang due to this race condition. Roughly 1
> > in 50 boots had the race occurrence and 1 in 4 of those resulted in a hang.
> > This patch fixed the problem and I had no problems (spurious disables
> > or hangs) in 750+ boots.
>
> usb_authorize_device, usb_deauthorize_device, and hub_event all acquire
> the device mutex.  Why should adding another mutex make any difference?

I believe the new usb_authorize_mutex prevents the cascade of USB
device bus scans,
discoveries and probes from colliding with those caused a change in
USB bus authorisation.
The individual device / hub locks are not across the whole system,
only an individual
component hub/device so any logic that gives an orderly USB device
scanning and probing is
undermined. The idea of the mutex is to keep the USB device discovery
work when a device is
added/removed/powered up is separated from those that are caused by
authorisation changes.

I don't pretend to understand all the logic and sequencing of the
current code which works works
faultlessly when these threads are serialised by the mutex (750+
times) in boot ups which
build up the system and shutdowns which bring down the system.
Likewise if I endlessly run
the authorisation off / on by itself it is faultless over hundreds of
iterations.

> In fact there's an actual disadvantage: Making hub_event acquire a
> global mutex will prevent us from handling multiple hubs concurrently.
> Although we don't do this now, we might want to in the future.
>
> Alan Stern

This is true - I am not trying to serialise the hub_event() threads -
but to prevent the authorisation changes
from happening during the hub_event threads. So perhaps there would be
a way of allowing multiple
concurrent hub_events from happening but only one authorisation thread
at a time. Assuming multiple
hub_event()s are deemed ok. Something like a lock which allows
multiple readers but only a single exclusive
writer lock?

Andrew

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Prevent race condition between USB authorisation and USB discovery events
  2018-12-18  0:34   ` Andrew Worsley
@ 2018-12-18 15:03     ` Alan Stern
  0 siblings, 0 replies; 4+ messages in thread
From: Alan Stern @ 2018-12-18 15:03 UTC (permalink / raw)
  To: Andrew Worsley
  Cc: Greg Kroah-Hartman, Mathias Nyman, Nicolas Boichat, Jon Flatley,
	Kai-Heng Feng, Bin Liu, Benson Leung, open list:USB SUBSYSTEM,
	open list

On Tue, 18 Dec 2018, Andrew Worsley wrote:

> On Tue, 18 Dec 2018 at 03:21, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > On Mon, 17 Dec 2018, Andrew Worsley wrote:
> >
> > > A sysfs driven USB authorisation change can trigger a usb_set_configuration
> > > while a hub_event worker thread is running. This can result in a USB device
> > > being disabled just after it was configured and bringing down all the
> > > devices and impacting hardware and user processes that were established on
> > > top of this these interfaces. In some cases the USB disable never completed
> > > and the whole system hung.
> >
> > Can you be more specific about this?  Disabling a USB device shouldn't
> > cause these kinds of problems, regardless of whether or not the device
> > was just configured.
> 
> I can't say too much about the hardware - but there was an SPI bus and
> an i2c bus built
> on top of USB devices on a bus. It appears the SPI bus shutdown was
> the item that was prone
> to problems when being torn down.
> 
> I understand it would be better if the dependant systems shutdown
> cleanly, which it
> did about 75% of the time but then it was rather messy sequence. So
> the current system
> is far from ideal. The crux of the issue is the usb_disable_device()
> (line 1874 of drivers/usb/core/message.c)
> call in usb_set_configuration() is applied to a configured device and
> triggers the
> tear down of all the devices built on that USB device and is very disruptive.

This is intentional.  When a USB device is disabled, nothing that
depends on it is supposed to survive.  However in spite of all the
disruption, nothing should hang.

> > > At my work I had an occasional hang due to this race condition. Roughly 1
> > > in 50 boots had the race occurrence and 1 in 4 of those resulted in a hang.
> > > This patch fixed the problem and I had no problems (spurious disables
> > > or hangs) in 750+ boots.
> >
> > usb_authorize_device, usb_deauthorize_device, and hub_event all acquire
> > the device mutex.  Why should adding another mutex make any difference?
> 
> I believe the new usb_authorize_mutex prevents the cascade of USB
> device bus scans,
> discoveries and probes from colliding with those caused a change in
> USB bus authorisation.
> The individual device / hub locks are not across the whole system,
> only an individual
> component hub/device so any logic that gives an orderly USB device
> scanning and probing is
> undermined. The idea of the mutex is to keep the USB device discovery
> work when a device is
> added/removed/powered up is separated from those that are caused by
> authorisation changes.
> 
> I don't pretend to understand all the logic and sequencing of the
> current code which works works
> faultlessly when these threads are serialised by the mutex (750+
> times) in boot ups which
> build up the system and shutdowns which bring down the system.
> Likewise if I endlessly run
> the authorisation off / on by itself it is faultless over hundreds of
> iterations.

Can you post a dmesg log that illustrates the problem with dynamic
debugging enabled for the usbcore module?

> > In fact there's an actual disadvantage: Making hub_event acquire a
> > global mutex will prevent us from handling multiple hubs concurrently.
> > Although we don't do this now, we might want to in the future.
> >
> > Alan Stern
> 
> This is true - I am not trying to serialise the hub_event() threads -
> but to prevent the authorisation changes
> from happening during the hub_event threads. So perhaps there would be
> a way of allowing multiple
> concurrent hub_events from happening but only one authorisation thread
> at a time. Assuming multiple
> hub_event()s are deemed ok. Something like a lock which allows
> multiple readers but only a single exclusive
> writer lock?

It's not that simple.  There are other ways a USB device can be
disabled that don't involve authorization.  For example, the user can
do:

	echo 0 >/sys/bus/usb/devices/.../bConfigurationValue

(the ... gets filled in with the device's path).  This should cause
just as much disruption.

The locking we already have is supposed to prevent the sort of problem 
you are seeing.  To find out why it doesn't, we need more information.

Alan Stern


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-12-18 15:04 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-17  6:10 [PATCH] Prevent race condition between USB authorisation and USB discovery events Andrew Worsley
2018-12-17 16:21 ` Alan Stern
2018-12-18  0:34   ` Andrew Worsley
2018-12-18 15:03     ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).