Date: Fri, 17 Mar 2017 14:57:15 -0700
From: Guenter Roeck
To: Alan Stern
Cc: Greg Kroah-Hartman, linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org, Douglas Anderson, Brian Norris
Subject: Re: [RFC PATCH] usb: hub: Disable autosuspend before disabling usb device
Message-ID: <20170317215715.GA29612@roeck-us.net>
References: <1489769155-9823-1-git-send-email-linux@roeck-us.net>

On Fri, Mar 17, 2017 at 04:07:14PM -0400, Alan Stern wrote:
> On Fri, 17 Mar 2017, Guenter Roeck wrote:
>
> > While running a bind/unbind stress test with the dwc3 usb driver on rk3399,
> > the following crash was observed.
> >
> > Unable to handle kernel NULL pointer dereference at virtual address 00000218
> > pgd = ffffffc00165f000
> > [00000218] *pgd=000000000174f003, *pud=000000000174f003,
> > *pmd=0000000001750003, *pte=00e8000001751713
> > Internal error: Oops: 96000005 [#1] PREEMPT SMP
> > Modules linked in: uinput uvcvideo videobuf2_vmalloc cmac
> > ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat rfcomm
> > xt_mark fuse bridge stp llc zram btusb btrtl btbcm btintel bluetooth
> > ip6table_filter mwifiex_pcie mwifiex cfg80211 cdc_ether usbnet r8152 mii joydev
> > snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async
> > ppp_generic slhc tun
> > CPU: 1 PID: 29814 Comm: kworker/1:1 Not tainted 4.4.52 #507
> > Hardware name: Google Kevin (DT)
> > Workqueue: pm pm_runtime_work
> > task: ffffffc0ac540000 ti: ffffffc0af4d4000 task.ti: ffffffc0af4d4000
> > PC is at autosuspend_check+0x74/0x174
> > LR is at autosuspend_check+0x70/0x174
> > ...
> > Call trace:
> > [] autosuspend_check+0x74/0x174
> > [] usb_runtime_idle+0x20/0x40
> > [] __rpm_callback+0x48/0x7c
> > [] rpm_idle+0x1e8/0x498
> > [] pm_runtime_work+0x88/0xcc
> > [] process_one_work+0x390/0x6b8
> > [] worker_thread+0x480/0x610
> > [] kthread+0x164/0x178
> > [] ret_from_fork+0x10/0x40
> >
> > Source:
> >
> > (gdb) l *0xffffffc00080dcc0
> > 0xffffffc00080dcc0 is in autosuspend_check
> > (drivers/usb/core/driver.c:1778).
> > 1773		/* We don't need to check interfaces that are
> > 1774		 * disabled for runtime PM. Either they are unbound
> > 1775		 * or else their drivers don't support autosuspend
> > 1776		 * and so they are permanently active.
> > 1777		 */
> > 1778		if (intf->dev.power.disable_depth)
> > 1779			continue;
> > 1780		if (atomic_read(&intf->dev.power.usage_count) > 0)
> > 1781			return -EBUSY;
> > 1782		w |= intf->needs_remote_wakeup;
> >
> > Code analysis shows that intf is set to NULL in usb_disable_device() prior
> > to setting actconfig to NULL.
> > At the same time, usb_runtime_idle() does not
> > lock the usb device, and neither does any of the functions in the
> > traceback. This means that there is no protection against a race condition
> > where usb_disable_device() is removing dev->actconfig->interface[] pointers
> > while those are being accessed from autosuspend_check() and possibly by
> > other callers.
>
> That is an interesting race, one that had previously escaped my notice.
>
> > Explicitly disable autosuspend in usb_disconnect() before calling
> > usb_disable_device(). This doesn't fix the race for good, but it ensures
> > that the pm runtime worker doesn't call usb_runtime_idle() on the interface
> > that is being removed, and thus avoids the race in the affected code path.
> >
> > Signed-off-by: Guenter Roeck
> > ---
> > This is another interesting situation. As mentioned above, the patch doesn't
> > really fix the race problem. On the other side, fixing it for good would
> > (probably) be much more complex. I still see the race after applying this
> > patch, but it happens maybe once a day vs. several times per hour.
> >
> > Marked as RFC in the hope that someone has an idea for a better fix.
> > I tried clearing udev->actconfig prior to removing the interfaces
> > in usb_disable_device(), but that alone didn't help; it does not
> > resolve the race condition either, and still results in the crash.
> >
> > The only clean solution I can think of would be to protect accesses
> > to dev->actconfig with a spinlock or mutex, and to make sure that the
> > lock is held during read accesses and that dev->actconfig is cleared
> > before releasing the lock on write accesses. I'll be happy to do that
> > if it is the way to go, but I would like some feedback before I give it
> > a try.
>
> I think the right thing to do is test udev->state at the start of
> autosuspend_check(), much like the test at the start of
> usb_suspend_both().
> udev->state gets set to USB_STATE_NOTATTACHED in
> usb_disconnect() before usb_disable_device() is called.
>
> Then there will be no need for usb_disable_autosuspend(), although
> pm_runtime_barrier() will still be necessary.
>

Makes sense. I'll give it a try.

Thanks,
Guenter
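
For reference, a minimal userspace sketch of the check suggested above.
It is not kernel code: the model_* types and MODEL_STATE_* constants are
hypothetical stand-ins for struct usb_device, struct usb_interface and the
USB_STATE_* values, and returning -ENODEV for a detached device is an
assumption, not something stated in this thread. The only point is that
the state test comes before any actconfig->interface[] pointer is touched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's USB device states. */
enum model_state {
	MODEL_STATE_NOTATTACHED = 0,
	MODEL_STATE_CONFIGURED,
};

/* Hypothetical, heavily reduced model of struct usb_interface. */
struct model_intf {
	int disable_depth;	/* models intf->dev.power.disable_depth */
	int usage_count;	/* models intf->dev.power.usage_count */
};

#define MODEL_MAX_INTF 4

/* Hypothetical model of the active configuration. */
struct model_config {
	int num_interfaces;
	struct model_intf *interface[MODEL_MAX_INTF];
};

/* Hypothetical model of struct usb_device. */
struct model_udev {
	enum model_state state;
	struct model_config *actconfig;
};

/*
 * Model of autosuspend_check() with the suggested early state test.
 * Returns 0 if autosuspend may proceed, -EBUSY if an interface is in
 * use, and (by assumption) -ENODEV if the device is already detached.
 */
static int model_autosuspend_check(struct model_udev *udev)
{
	int i;

	assert(udev != NULL);

	/* Bail out before looking at any interface pointer. */
	if (udev->state == MODEL_STATE_NOTATTACHED)
		return -ENODEV;

	if (udev->actconfig) {
		for (i = 0; i < udev->actconfig->num_interfaces; i++) {
			struct model_intf *intf = udev->actconfig->interface[i];

			/* Same tests as the listing quoted in the report. */
			if (intf->disable_depth)
				continue;
			if (intf->usage_count > 0)
				return -EBUSY;
		}
	}
	return 0;
}
```

A single-threaded model like this cannot reproduce the race itself. It
only shows that once the state has been set to "not attached", the loop
over the interface pointers is never entered, even if those pointers have
already been cleared. In the kernel the ordering against a runtime PM
work item that is already running still has to be guaranteed, which is
presumably why pm_runtime_barrier() is still needed.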