All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] usb: hub: Disable autosuspend before disabling usb device
@ 2017-03-17 16:45 Guenter Roeck
  2017-03-17 20:07 ` Alan Stern
  0 siblings, 1 reply; 3+ messages in thread
From: Guenter Roeck @ 2017-03-17 16:45 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Alan Stern, linux-usb, linux-kernel, Douglas Anderson,
	Brian Norris, Guenter Roeck

While running a bind/unbind stress test with the dwc3 usb driver on rk3399,
the following crash was observed.

Unable to handle kernel NULL pointer dereference at virtual address 00000218
pgd = ffffffc00165f000
[00000218] *pgd=000000000174f003, *pud=000000000174f003,
				*pmd=0000000001750003, *pte=00e8000001751713
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm
xt_mark fuse bridge stp llc zram btusb btrtl btbcm btintel bluetooth
ip6table_filter mwifiex_pcie mwifiex cfg80211 cdc_ether usbnet r8152 mii joydev
snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async
ppp_generic slhc tun
CPU: 1 PID: 29814 Comm: kworker/1:1 Not tainted 4.4.52 #507
Hardware name: Google Kevin (DT)
Workqueue: pm pm_runtime_work
task: ffffffc0ac540000 ti: ffffffc0af4d4000 task.ti: ffffffc0af4d4000
PC is at autosuspend_check+0x74/0x174
LR is at autosuspend_check+0x70/0x174
...
Call trace:
[<ffffffc00080dcc0>] autosuspend_check+0x74/0x174
[<ffffffc000810500>] usb_runtime_idle+0x20/0x40
[<ffffffc000785ae0>] __rpm_callback+0x48/0x7c
[<ffffffc000786af0>] rpm_idle+0x1e8/0x498
[<ffffffc000787cdc>] pm_runtime_work+0x88/0xcc
[<ffffffc000249bb8>] process_one_work+0x390/0x6b8
[<ffffffc00024abcc>] worker_thread+0x480/0x610
[<ffffffc000251a80>] kthread+0x164/0x178
[<ffffffc0002045d0>] ret_from_fork+0x10/0x40

Source:

(gdb) l *0xffffffc00080dcc0
0xffffffc00080dcc0 is in autosuspend_check
(drivers/usb/core/driver.c:1778).
1773		/* We don't need to check interfaces that are
1774		 * disabled for runtime PM.  Either they are unbound
1775		 * or else their drivers don't support autosuspend
1776		 * and so they are permanently active.
1777		 */
1778		if (intf->dev.power.disable_depth)
1779			continue;
1780		if (atomic_read(&intf->dev.power.usage_count) > 0)
1781			return -EBUSY;
1782		w |= intf->needs_remote_wakeup;

Code analysis shows that intf is set to NULL in usb_disable_device() prior
to setting actconfig to NULL. At the same time, usb_runtime_idle() does not
lock the usb device, and neither does any of the functions in the
traceback. This means that there is no protection against a race condition
where usb_disable_device() is removing dev->actconfig->interface[] pointers
while those are being accessed from autosuspend_check() and possibly by
other callers.

Explicitly disable autosuspend in usb_disconnect() before calling
usb_disable_device(). This doesn't fix the race for good, but it ensures
that the pm runtime worker doesn't call usb_runtime_idle() on the interface
that is being removed, and thus avoids the race in the affected code path.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
This is another interesting situation. As mentioned above, the patch doesn't
really fix the race problem. On the other side, fixing it for good would
(probably) be much more complex. I still see the race after applying this
patch, but it happens maybe once a day vs. several times per hour.

Marked as RFC in the hope that someone has an idea for a better fix.
I tried clearing udev->actconfig prior to removing the interfaces
in usb_disable_device(), but that alone didn't help; it does not
resolve the race condition either, and still results in the crash.

The only clean solution I can think of would be to protect accesses
to dev->actconfig with a spinlock or mutex, and to make sure that the
lock is held during read accesses and that dev->actconfig is cleared
before releasing the lock on write accesses. I'll be happy to do that
if it is the way to go, but I would like some feedback before I give it
a try.

 drivers/usb/core/hub.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 5286bf67869a..5a420657f9f7 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2093,6 +2093,15 @@ void usb_disconnect(struct usb_device **pdev)
 	 * so that the hardware is now fully quiesced.
 	 */
 	dev_dbg(&udev->dev, "unregistering device\n");
+
+	/*
+	 * Disable autosuspend before disabling the device, and make sure
+	 * that autosuspend doesn't touch it while it is in the process
+	 * of being deleted.
+	 */
+	usb_disable_autosuspend(udev);
+	pm_runtime_barrier(&udev->dev);
+
 	usb_disable_device(udev, 0);
 	usb_hcd_synchronize_unlinks(udev);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH] usb: hub: Disable autosuspend before disabling usb device
  2017-03-17 16:45 [RFC PATCH] usb: hub: Disable autosuspend before disabling usb device Guenter Roeck
@ 2017-03-17 20:07 ` Alan Stern
  2017-03-17 21:57   ` Guenter Roeck
  0 siblings, 1 reply; 3+ messages in thread
From: Alan Stern @ 2017-03-17 20:07 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Greg Kroah-Hartman, linux-usb, linux-kernel, Douglas Anderson,
	Brian Norris

On Fri, 17 Mar 2017, Guenter Roeck wrote:

> While running a bind/unbind stress test with the dwc3 usb driver on rk3399,
> the following crash was observed.
> 
> Unable to handle kernel NULL pointer dereference at virtual address 00000218
> pgd = ffffffc00165f000
> [00000218] *pgd=000000000174f003, *pud=000000000174f003,
> 				*pmd=0000000001750003, *pte=00e8000001751713
> Internal error: Oops: 96000005 [#1] PREEMPT SMP
> Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac
> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm
> xt_mark fuse bridge stp llc zram btusb btrtl btbcm btintel bluetooth
> ip6table_filter mwifiex_pcie mwifiex cfg80211 cdc_ether usbnet r8152 mii joydev
> snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async
> ppp_generic slhc tun
> CPU: 1 PID: 29814 Comm: kworker/1:1 Not tainted 4.4.52 #507
> Hardware name: Google Kevin (DT)
> Workqueue: pm pm_runtime_work
> task: ffffffc0ac540000 ti: ffffffc0af4d4000 task.ti: ffffffc0af4d4000
> PC is at autosuspend_check+0x74/0x174
> LR is at autosuspend_check+0x70/0x174
> ...
> Call trace:
> [<ffffffc00080dcc0>] autosuspend_check+0x74/0x174
> [<ffffffc000810500>] usb_runtime_idle+0x20/0x40
> [<ffffffc000785ae0>] __rpm_callback+0x48/0x7c
> [<ffffffc000786af0>] rpm_idle+0x1e8/0x498
> [<ffffffc000787cdc>] pm_runtime_work+0x88/0xcc
> [<ffffffc000249bb8>] process_one_work+0x390/0x6b8
> [<ffffffc00024abcc>] worker_thread+0x480/0x610
> [<ffffffc000251a80>] kthread+0x164/0x178
> [<ffffffc0002045d0>] ret_from_fork+0x10/0x40
> 
> Source:
> 
> (gdb) l *0xffffffc00080dcc0
> 0xffffffc00080dcc0 is in autosuspend_check
> (drivers/usb/core/driver.c:1778).
> 1773		/* We don't need to check interfaces that are
> 1774		 * disabled for runtime PM.  Either they are unbound
> 1775		 * or else their drivers don't support autosuspend
> 1776		 * and so they are permanently active.
> 1777		 */
> 1778		if (intf->dev.power.disable_depth)
> 1779			continue;
> 1780		if (atomic_read(&intf->dev.power.usage_count) > 0)
> 1781			return -EBUSY;
> 1782		w |= intf->needs_remote_wakeup;
> 
> Code analysis shows that intf is set to NULL in usb_disable_device() prior
> to setting actconfig to NULL. At the same time, usb_runtime_idle() does not
> lock the usb device, and neither does any of the functions in the
> traceback. This means that there is no protection against a race condition
> where usb_disable_device() is removing dev->actconfig->interface[] pointers
> while those are being accessed from autosuspend_check() and possibly by
> other callers.

That is an interesting race, one that had previously escaped my notice.

> Explicitly disable autosuspend in usb_disconnect() before calling
> usb_disable_device(). This doesn't fix the race for good, but it ensures
> that the pm runtime worker doesn't call usb_runtime_idle() on the interface
> that is being removed, and thus avoids the race in the affected code path.
> 
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
> This is another interesting situation. As mentioned above, the patch doesn't
> really fix the race problem. On the other side, fixing it for good would
> (probably) be much more complex. I still see the race after applying this
> patch, but it happens maybe once a day vs. several times per hour.
> 
> Marked as RFC in the hope that someone has an idea for a better fix.
> I tried clearing udev->actconfig prior to removing the interfaces
> in usb_disable_device(), but that alone didn't help; it does not
> resolve the race condition either, and still results in the crash.
> 
> The only clean solution I can think of would be to protect accesses
> to dev->actconfig with a spinlock or mutex, and to make sure that the
> lock is held during read accesses and that dev->actconfig is cleared
> before releasing the lock on write accesses. I'll be happy to do that
> if it is the way to go, but I would like some feedback before I give it
> a try.

I think the right thing to do is test udev->state at the start of 
autosuspend_check(), much like the test at the start of 
usb_suspend_both().  udev->state gets set to USB_STATE_NOTATTACHED in 
usb_disconnect() before usb_disable_device() is called.

Then there will be no need for usb_disable_autosuspend(), although 
pm_runtime_barrier() will still be necessary.

Alan Stern


>  drivers/usb/core/hub.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 5286bf67869a..5a420657f9f7 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -2093,6 +2093,15 @@ void usb_disconnect(struct usb_device **pdev)
>  	 * so that the hardware is now fully quiesced.
>  	 */
>  	dev_dbg(&udev->dev, "unregistering device\n");
> +
> +	/*
> +	 * Disable autosuspend before disabling the device, and make sure
> +	 * that autosuspend doesn't touch it while it is in the process
> +	 * of being deleted.
> +	 */
> +	usb_disable_autosuspend(udev);
> +	pm_runtime_barrier(&udev->dev);
> +
>  	usb_disable_device(udev, 0);
>  	usb_hcd_synchronize_unlinks(udev);
>  
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC PATCH] usb: hub: Disable autosuspend before disabling usb device
  2017-03-17 20:07 ` Alan Stern
@ 2017-03-17 21:57   ` Guenter Roeck
  0 siblings, 0 replies; 3+ messages in thread
From: Guenter Roeck @ 2017-03-17 21:57 UTC (permalink / raw)
  To: Alan Stern
  Cc: Greg Kroah-Hartman, linux-usb, linux-kernel, Douglas Anderson,
	Brian Norris

On Fri, Mar 17, 2017 at 04:07:14PM -0400, Alan Stern wrote:
> On Fri, 17 Mar 2017, Guenter Roeck wrote:
> 
> > While running a bind/unbind stress test with the dwc3 usb driver on rk3399,
> > the following crash was observed.
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address 00000218
> > pgd = ffffffc00165f000
> > [00000218] *pgd=000000000174f003, *pud=000000000174f003,
> > 				*pmd=0000000001750003, *pte=00e8000001751713
> > Internal error: Oops: 96000005 [#1] PREEMPT SMP
> > Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac
> > ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm
> > xt_mark fuse bridge stp llc zram btusb btrtl btbcm btintel bluetooth
> > ip6table_filter mwifiex_pcie mwifiex cfg80211 cdc_ether usbnet r8152 mii joydev
> > snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async
> > ppp_generic slhc tun
> > CPU: 1 PID: 29814 Comm: kworker/1:1 Not tainted 4.4.52 #507
> > Hardware name: Google Kevin (DT)
> > Workqueue: pm pm_runtime_work
> > task: ffffffc0ac540000 ti: ffffffc0af4d4000 task.ti: ffffffc0af4d4000
> > PC is at autosuspend_check+0x74/0x174
> > LR is at autosuspend_check+0x70/0x174
> > ...
> > Call trace:
> > [<ffffffc00080dcc0>] autosuspend_check+0x74/0x174
> > [<ffffffc000810500>] usb_runtime_idle+0x20/0x40
> > [<ffffffc000785ae0>] __rpm_callback+0x48/0x7c
> > [<ffffffc000786af0>] rpm_idle+0x1e8/0x498
> > [<ffffffc000787cdc>] pm_runtime_work+0x88/0xcc
> > [<ffffffc000249bb8>] process_one_work+0x390/0x6b8
> > [<ffffffc00024abcc>] worker_thread+0x480/0x610
> > [<ffffffc000251a80>] kthread+0x164/0x178
> > [<ffffffc0002045d0>] ret_from_fork+0x10/0x40
> > 
> > Source:
> > 
> > (gdb) l *0xffffffc00080dcc0
> > 0xffffffc00080dcc0 is in autosuspend_check
> > (drivers/usb/core/driver.c:1778).
> > 1773		/* We don't need to check interfaces that are
> > 1774		 * disabled for runtime PM.  Either they are unbound
> > 1775		 * or else their drivers don't support autosuspend
> > 1776		 * and so they are permanently active.
> > 1777		 */
> > 1778		if (intf->dev.power.disable_depth)
> > 1779			continue;
> > 1780		if (atomic_read(&intf->dev.power.usage_count) > 0)
> > 1781			return -EBUSY;
> > 1782		w |= intf->needs_remote_wakeup;
> > 
> > Code analysis shows that intf is set to NULL in usb_disable_device() prior
> > to setting actconfig to NULL. At the same time, usb_runtime_idle() does not
> > lock the usb device, and neither does any of the functions in the
> > traceback. This means that there is no protection against a race condition
> > where usb_disable_device() is removing dev->actconfig->interface[] pointers
> > while those are being accessed from autosuspend_check() and possibly by
> > other callers.
> 
> That is an interesting race, one that had previously escaped my notice.
> 
> > Explicitly disable autosuspend in usb_disconnect() before calling
> > usb_disable_device(). This doesn't fix the race for good, but it ensures
> > that the pm runtime worker doesn't call usb_runtime_idle() on the interface
> > that is being removed, and thus avoids the race in the affected code path.
> > 
> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> > ---
> > This is another interesting situation. As mentioned above, the patch doesn't
> > really fix the race problem. On the other side, fixing it for good would
> > (probably) be much more complex. I still see the race after applying this
> > patch, but it happens maybe once a day vs. several times per hour.
> > 
> > Marked as RFC in the hope that someone has an idea for a better fix.
> > I tried clearing udev->actconfig prior to removing the interfaces
> > in usb_disable_device(), but that alone didn't help; it does not
> > resolve the race condition either, and still results in the crash.
> > 
> > The only clean solution I can think of would be to protect accesses
> > to dev->actconfig with a spinlock or mutex, and to make sure that the
> > lock is held during read accesses and that dev->actconfig is cleared
> > before releasing the lock on write accesses. I'll be happy to do that
> > if it is the way to go, but I would like some feedback before I give it
> > a try.
> 
> I think the right thing to do is test udev->state at the start of 
> autosuspend_check(), much like the test at the start of 
> usb_suspend_both().  udev->state gets set to USB_STATE_NOTATTACHED in 
> usb_disconnect() before usb_disable_device() is called.
> 
> Then there will be no need for usb_disable_autosuspend(), although 
> pm_runtime_barrier() will still be necessary.
> 
Makes sense. I'll give it a try.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-03-17 22:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-17 16:45 [RFC PATCH] usb: hub: Disable autosuspend before disabling usb device Guenter Roeck
2017-03-17 20:07 ` Alan Stern
2017-03-17 21:57   ` Guenter Roeck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.