diff mbox series

Message ID 1512638774-6837-1-git-send-email-agk@godking.net
State New, archived
Headers show
Series
  • Untitled series #334877
Related show

Commit Message

Alexander Kappner Dec. 7, 2017, 9:26 a.m. UTC
Date: Wed, 6 Dec 2017 15:28:37 -0800
Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

My kernel crashed just after resuming from hibernate and starting usbmuxd
(a user-space daemon for iOS device pairing) with several USB devices
connected (dmesg attached).

Backtrace leads to:

0xffffffff8170465d is in xhci_debugfs_create_endpoint
(drivers/usb/host/xhci-debugfs.c:381).
376                                       int ep_index)
377     {
378             struct xhci_ep_priv     *epriv;
379             struct xhci_slot_priv   *spriv = dev->debugfs_private;
380
381             if (spriv->eps[ep_index])
382                     return;
383
384             epriv = kzalloc(sizeof(*epriv), GFP_KERNEL);
385             if (!epriv)

The read violation happens at address 0x40 and sizeof(struct
xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here.

spriv gets allocated in xhci_debugfs_create_slot:

...
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
    return;
...

There's no separate error path if this allocation fails, so we might be
left with NULL in priv. Subsequent users of priv thus need to check for
this NULL - so this is what the patch does.

There might be other ways of triggering this null pointer dereference,
including when xhci_resume frees the device structures (e.g. after
returning from a hibernate), but I wasn't able to find or reproduce it. 

[63953.758083] BUG: unable to handle kernel NULL pointer dereference at
0000000000000040
[63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0
[63953.758093] Oops: 0000 [#1] PREEMPT SMP
[63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm
mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211
qmi_wwan ecdh_generic thinkpad_acpi rfkill
[63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P           O
4.14.0.1-12769-g1deab8c #1
[63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W
(1.35 ) 11/10/2016
[63953.758105] task: ffff8810527ba0c0 task.stack: ffffc9000a8ec000
[63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758108] RSP: 0018:ffffc9000a8efc80 EFLAGS: 00010206
[63953.758109] RAX: 0000000000000000 RBX: ffff88105a71c000 RCX:
0000000000030000
[63953.758110] RDX: 0000000000000003 RSI: ffff880c0b57e000 RDI:
ffff88105a71c238
[63953.758110] RBP: 0000000000000003 R08: ffff881063800600 R09:
0000000000000003
[63953.758111] R10: ffff88105a71c238 R11: 0000000000000001 R12:
0000000000000011
[63953.758112] R13: 0000000000000018 R14: 0000000000000000 R15:
0000000000000000
[63953.758113] FS:  00007f0a77715700(0000) GS:ffff8810a3d00000(0000)
knlGS:0000000000000000
[63953.758114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63953.758115] CR2: 0000000000000040 CR3: 00000003f91a8006 CR4:
00000000003606e0
[63953.758115] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[63953.758116] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[63953.758117] Call Trace:
[63953.758120]  xhci_add_endpoint+0x127/0x2b0
[63953.758123]  usb_hcd_alloc_bandwidth+0x1ad/0x300
[63953.758125]  usb_set_configuration+0x1c8/0x880
[63953.758128]  usbdev_do_ioctl+0xc41/0x1120
[63953.758130]  usbdev_ioctl+0xa/0x10
[63953.758151]  do_vfs_ioctl+0x8b/0x5c0
[63953.758153]  ? __fget+0x6c/0xb0
[63953.758155]  SyS_ioctl+0x76/0x90
[63953.758157]  do_syscall_64+0x6b/0x290
[63953.758173]  entry_SYSCALL64_slow_path+0x25/0x25
[63953.758175] RIP: 0033:0x7f0a76a151c7
[63953.758175] RSP: 002b:00007ffd1431b0c8 EFLAGS: 00000202 ORIG_RAX:
0000000000000010
[63953.758177] RAX: ffffffffffffffda RBX: 00000000023239a0 RCX:
00007f0a76a151c7
[63953.758178] RDX: 00007ffd1431b0dc RSI: 0000000080045505 RDI:
000000000000000e
[63953.758178] RBP: 00000000023240c0 R08: 00007ffd1431b008 R09:
0000000000000004
[63953.758179] R10: 00007ffd1431aec0 R11: 0000000000000202 R12:
00000000023240c0
[63953.758180] R13: 0000000000000001 R14: 0000000000000056 R15:
0000000000000038
[63953.758182] Code: e9 39 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 41 57 41 56 41 55 41 54 55 48 63 ea 53 4c 8b b6 88 15 00 00 4d 8d 2c ee
<49> 83 7d 28 00 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 3d
[63953.758202] RIP: xhci_debugfs_create_endpoint+0x1d/0xa0 RSP:
ffffc9000a8efc80
[63953.758203] CR2: 0000000000000040
[63953.758204] ---[ end trace 1f7ea9a959f02054 ]---

Signed-off-by: Alexander Kappner <agk@godking.net>
---
 drivers/usb/host/xhci-debugfs.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Greg Kroah-Hartman Dec. 7, 2017, 10:38 a.m. UTC | #1
On Thu, Dec 07, 2017 at 01:26:14AM -0800, Alexander Kappner wrote:
> Date: Wed, 6 Dec 2017 15:28:37 -0800
> Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

Something went wrong here, resulting in an email with no subject.

Can you fix this up and resend?

thanks,

greg k-h
Mathias Nyman Dec. 7, 2017, 12:47 p.m. UTC | #2
On 07.12.2017 11:26, Alexander Kappner wrote:
> Date: Wed, 6 Dec 2017 15:28:37 -0800
> Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

(Added this as reply subject, please remember to send mail with proper subject)

> 
> My kernel crashed just after resuming from hibernate and starting usbmuxd
> (a user-space daemon for iOS device pairing) with several USB devices
> connected (dmesg attached).
> 
> Backtrace leads to:
> 
> 0xffffffff8170465d is in xhci_debugfs_create_endpoint
> (drivers/usb/host/xhci-debugfs.c:381).
> 376                                       int ep_index)
> 377     {
> 378             struct xhci_ep_priv     *epriv;
> 379             struct xhci_slot_priv   *spriv = dev->debugfs_private;
> 380
> 381             if (spriv->eps[ep_index])
> 382                     return;
> 383
> 384             epriv = kzalloc(sizeof(*epriv), GFP_KERNEL);
> 385             if (!epriv)
> 
> The read violation happens at address 0x40 and sizeof(struct
> xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here.
> 

Thanks for reporting and debugging this, much appreciated.
  
> spriv gets allocated in xhci_debugfs_create_slot:
> 
> ...
> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> if (!priv)
>      return;
> ...
> 
> There's no separate error path if this allocation fails, so we might be
> left with NULL in priv. Subsequent users of priv thus need to check for
> this NULL - so this is what the patch does.
> 

I think we need to dig a bit deeper. It's good to check if spriv is valid
but there are probably other reasons than kzalloc failing.

Resume from hibernation will reset the xhci controller, reinitializing a lot.
It could be related.

-Mathias

(keeping rest of message as reference)

> There might be other ways of triggering this null pointer dereference,
> including when xhci_resume frees the device structures (e.g. after
> returning from a hibernate), but I wasn't able to find or reproduce it.
> 
> [63953.758083] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000040
> [63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0
> [63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0
> [63953.758093] Oops: 0000 [#1] PREEMPT SMP
> [63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm
> mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211
> qmi_wwan ecdh_generic thinkpad_acpi rfkill
> [63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P           O
> 4.14.0.1-12769-g1deab8c #1
> [63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W
> (1.35 ) 11/10/2016
> [63953.758105] task: ffff8810527ba0c0 task.stack: ffffc9000a8ec000
> [63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0
> [63953.758108] RSP: 0018:ffffc9000a8efc80 EFLAGS: 00010206
> [63953.758109] RAX: 0000000000000000 RBX: ffff88105a71c000 RCX:
> 0000000000030000
> [63953.758110] RDX: 0000000000000003 RSI: ffff880c0b57e000 RDI:
> ffff88105a71c238
> [63953.758110] RBP: 0000000000000003 R08: ffff881063800600 R09:
> 0000000000000003
> [63953.758111] R10: ffff88105a71c238 R11: 0000000000000001 R12:
> 0000000000000011
> [63953.758112] R13: 0000000000000018 R14: 0000000000000000 R15:
> 0000000000000000
> [63953.758113] FS:  00007f0a77715700(0000) GS:ffff8810a3d00000(0000)
> knlGS:0000000000000000
> [63953.758114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [63953.758115] CR2: 0000000000000040 CR3: 00000003f91a8006 CR4:
> 00000000003606e0
> [63953.758115] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [63953.758116] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [63953.758117] Call Trace:
> [63953.758120]  xhci_add_endpoint+0x127/0x2b0
> [63953.758123]  usb_hcd_alloc_bandwidth+0x1ad/0x300
> [63953.758125]  usb_set_configuration+0x1c8/0x880
> [63953.758128]  usbdev_do_ioctl+0xc41/0x1120
> [63953.758130]  usbdev_ioctl+0xa/0x10
> [63953.758151]  do_vfs_ioctl+0x8b/0x5c0
> [63953.758153]  ? __fget+0x6c/0xb0
> [63953.758155]  SyS_ioctl+0x76/0x90
> [63953.758157]  do_syscall_64+0x6b/0x290
> [63953.758173]  entry_SYSCALL64_slow_path+0x25/0x25
> [63953.758175] RIP: 0033:0x7f0a76a151c7
> [63953.758175] RSP: 002b:00007ffd1431b0c8 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000010
> [63953.758177] RAX: ffffffffffffffda RBX: 00000000023239a0 RCX:
> 00007f0a76a151c7
> [63953.758178] RDX: 00007ffd1431b0dc RSI: 0000000080045505 RDI:
> 000000000000000e
> [63953.758178] RBP: 00000000023240c0 R08: 00007ffd1431b008 R09:
> 0000000000000004
> [63953.758179] R10: 00007ffd1431aec0 R11: 0000000000000202 R12:
> 00000000023240c0
> [63953.758180] R13: 0000000000000001 R14: 0000000000000056 R15:
> 0000000000000038
> [63953.758182] Code: e9 39 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> 00 41 57 41 56 41 55 41 54 55 48 63 ea 53 4c 8b b6 88 15 00 00 4d 8d 2c ee
> <49> 83 7d 28 00 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 3d
> [63953.758202] RIP: xhci_debugfs_create_endpoint+0x1d/0xa0 RSP:
> ffffc9000a8efc80
> [63953.758203] CR2: 0000000000000040
> [63953.758204] ---[ end trace 1f7ea9a959f02054 ]---
> 
> Signed-off-by: Alexander Kappner <agk@godking.net>
> ---
>   drivers/usb/host/xhci-debugfs.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/usb/host/xhci-debugfs.c b/drivers/usb/host/xhci-debugfs.c
> index 4f7895d..1cea59c 100644
> --- a/drivers/usb/host/xhci-debugfs.c
> +++ b/drivers/usb/host/xhci-debugfs.c
> @@ -378,6 +378,9 @@ void xhci_debugfs_create_endpoint(struct xhci_hcd *xhci,
>   	struct xhci_ep_priv	*epriv;
>   	struct xhci_slot_priv	*spriv = dev->debugfs_private;
>   
> +	if (!spriv)
> +		return;
> +
>   	if (spriv->eps[ep_index])
>   		return;
>   
>
Alexander Kappner Dec. 8, 2017, 11:06 a.m. UTC | #3
Hi,

 > I think we need to dig a bit deeper. It's good to check if spriv is 
valid
but there are probably other reasons than kzalloc failing.

I agree -- this small allocation is  unlikely to fail in practice. Also, 
while my patch prevents the kernel oops, it also prevents the debugfs 
entries from being created.

I've been debugging this more trying to come up with a better solution, 
but I might need some guidance as I'm not too familiar with the USB 
subsystem. The immediate cause of the crash was usbmuxd sending a 
USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if it fails_ 
calls usb_hcd_alloc_bandwidth to try and reset the device, which in turn 
calls xhci_debugfs_create_endpoint. The ioctl handler acquires a 
device-specific lock via usb_lock_device.

When the system resumed from hibernate, xhci_resume was called. This in 
turn called xhci_mem_cleanup to deallocate the device structures, which 
include setting the debugfs_private pointer to NULL  (via 
xhci_free_virt_devices_depth_first). It thus seems likely that the ioctl 
is somehow racing with the hibernate. The call to xhci_resume is 
protected by a host-controller specific lock (xhci->lock) but it doesn't 
attempt to take the usb_lock_device device-specific lock.

Now my suspicion is that xhci_resume freed and zeroed the device 
structures while racing with the ioctl handler. I can't seem to find any 
exclusion mechanism that would prevent xhci_resume from racing with the 
USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for that matter). 
Am I missing something? If not, is there any reason why an ioctl might 
need to execute in parallel with the xhci_resume? If not, can we just do 
a busy wait in xhci_resume until all pending ioctls have returned?

 > (Added this as reply subject, please remember to send mail with 
proper subject)
My apologies, thanks for fixing.


On 12/07/2017 04:47 AM, Mathias Nyman wrote:
> On 07.12.2017 11:26, Alexander Kappner wrote:
>> Date: Wed, 6 Dec 2017 15:28:37 -0800
>> Subject: [PATCH] usb-core: Fix potential null pointer dereference in 
>> xhci-debugfs.c
>
> (Added this as reply subject, please remember to send mail with proper 
> subject)
>
>>
>> My kernel crashed just after resuming from hibernate and starting 
>> usbmuxd
>> (a user-space daemon for iOS device pairing) with several USB devices
>> connected (dmesg attached).
>>
>> Backtrace leads to:
>>
>> 0xffffffff8170465d is in xhci_debugfs_create_endpoint
>> (drivers/usb/host/xhci-debugfs.c:381).
>> 376                                       int ep_index)
>> 377     {
>> 378             struct xhci_ep_priv     *epriv;
>> 379             struct xhci_slot_priv   *spriv = dev->debugfs_private;
>> 380
>> 381             if (spriv->eps[ep_index])
>> 382                     return;
>> 383
>> 384             epriv = kzalloc(sizeof(*epriv), GFP_KERNEL);
>> 385             if (!epriv)
>>
>> The read violation happens at address 0x40 and sizeof(struct
>> xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here.
>>
>
> Thanks for reporting and debugging this, much appreciated.
>
>> spriv gets allocated in xhci_debugfs_create_slot:
>>
>> ...
>> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>> if (!priv)
>>      return;
>> ...
>>
>> There's no separate error path if this allocation fails, so we might be
>> left with NULL in priv. Subsequent users of priv thus need to check for
>> this NULL - so this is what the patch does.
>>
>
> I think we need to dig a bit deeper. It's good to check if spriv is valid
> but there are probably other reasons than kzalloc failing.
>
> Resume from hibernation will reset the xhci controller, reinitializing 
> a lot.
> It could be related.
>
> -Mathias
>
> (keeping rest of message as reference)
>
>> There might be other ways of triggering this null pointer dereference,
>> including when xhci_resume frees the device structures (e.g. after
>> returning from a hibernate), but I wasn't able to find or reproduce it.
>>
>> [63953.758083] BUG: unable to handle kernel NULL pointer dereference at
>> 0000000000000040
>> [63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0
>> [63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0
>> [63953.758093] Oops: 0000 [#1] PREEMPT SMP
>> [63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm
>> mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211
>> qmi_wwan ecdh_generic thinkpad_acpi rfkill
>> [63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P           O
>> 4.14.0.1-12769-g1deab8c #1
>> [63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS 
>> N1EET62W
>> (1.35 ) 11/10/2016
>> [63953.758105] task: ffff8810527ba0c0 task.stack: ffffc9000a8ec000
>> [63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0
>> [63953.758108] RSP: 0018:ffffc9000a8efc80 EFLAGS: 00010206
>> [63953.758109] RAX: 0000000000000000 RBX: ffff88105a71c000 RCX:
>> 0000000000030000
>> [63953.758110] RDX: 0000000000000003 RSI: ffff880c0b57e000 RDI:
>> ffff88105a71c238
>> [63953.758110] RBP: 0000000000000003 R08: ffff881063800600 R09:
>> 0000000000000003
>> [63953.758111] R10: ffff88105a71c238 R11: 0000000000000001 R12:
>> 0000000000000011
>> [63953.758112] R13: 0000000000000018 R14: 0000000000000000 R15:
>> 0000000000000000
>> [63953.758113] FS:  00007f0a77715700(0000) GS:ffff8810a3d00000(0000)
>> knlGS:0000000000000000
>> [63953.758114] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [63953.758115] CR2: 0000000000000040 CR3: 00000003f91a8006 CR4:
>> 00000000003606e0
>> [63953.758115] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [63953.758116] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>> 0000000000000400
>> [63953.758117] Call Trace:
>> [63953.758120]  xhci_add_endpoint+0x127/0x2b0
>> [63953.758123]  usb_hcd_alloc_bandwidth+0x1ad/0x300
>> [63953.758125]  usb_set_configuration+0x1c8/0x880
>> [63953.758128]  usbdev_do_ioctl+0xc41/0x1120
>> [63953.758130]  usbdev_ioctl+0xa/0x10
>> [63953.758151]  do_vfs_ioctl+0x8b/0x5c0
>> [63953.758153]  ? __fget+0x6c/0xb0
>> [63953.758155]  SyS_ioctl+0x76/0x90
>> [63953.758157]  do_syscall_64+0x6b/0x290
>> [63953.758173]  entry_SYSCALL64_slow_path+0x25/0x25
>> [63953.758175] RIP: 0033:0x7f0a76a151c7
>> [63953.758175] RSP: 002b:00007ffd1431b0c8 EFLAGS: 00000202 ORIG_RAX:
>> 0000000000000010
>> [63953.758177] RAX: ffffffffffffffda RBX: 00000000023239a0 RCX:
>> 00007f0a76a151c7
>> [63953.758178] RDX: 00007ffd1431b0dc RSI: 0000000080045505 RDI:
>> 000000000000000e
>> [63953.758178] RBP: 00000000023240c0 R08: 00007ffd1431b008 R09:
>> 0000000000000004
>> [63953.758179] R10: 00007ffd1431aec0 R11: 0000000000000202 R12:
>> 00000000023240c0
>> [63953.758180] R13: 0000000000000001 R14: 0000000000000056 R15:
>> 0000000000000038
>> [63953.758182] Code: e9 39 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 
>> 44 00
>> 00 41 57 41 56 41 55 41 54 55 48 63 ea 53 4c 8b b6 88 15 00 00 4d 8d 
>> 2c ee
>> <49> 83 7d 28 00 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 3d
>> [63953.758202] RIP: xhci_debugfs_create_endpoint+0x1d/0xa0 RSP:
>> ffffc9000a8efc80
>> [63953.758203] CR2: 0000000000000040
>> [63953.758204] ---[ end trace 1f7ea9a959f02054 ]---
>>
>> Signed-off-by: Alexander Kappner <agk@godking.net>
>> ---
>>   drivers/usb/host/xhci-debugfs.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/usb/host/xhci-debugfs.c 
>> b/drivers/usb/host/xhci-debugfs.c
>> index 4f7895d..1cea59c 100644
>> --- a/drivers/usb/host/xhci-debugfs.c
>> +++ b/drivers/usb/host/xhci-debugfs.c
>> @@ -378,6 +378,9 @@ void xhci_debugfs_create_endpoint(struct xhci_hcd 
>> *xhci,
>>       struct xhci_ep_priv    *epriv;
>>       struct xhci_slot_priv    *spriv = dev->debugfs_private;
>>   +    if (!spriv)
>> +        return;
>> +
>>       if (spriv->eps[ep_index])
>>           return;
>>
>
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-usb" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mathias Nyman Dec. 8, 2017, 5:01 p.m. UTC | #4
On 08.12.2017 13:06, Alexander Kappner wrote:
> Hi,
> 
>> I think we need to dig a bit deeper. It's good to check if spriv is
>> valid
>> but there are probably other reasons than kzalloc failing.
> 
> I agree -- this small allocation is  unlikely to fail in practice.
> Also, while my patch prevents the kernel oops, it also prevents the
> debugfs entries from being created.
> 
> I've been debugging this more trying to come up with a better
> solution, but I might need some guidance as I'm not too familiar with
> the USB subsystem. The immediate cause of the crash was usbmuxd
> sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if
> it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device,
> which in turn calls xhci_debugfs_create_endpoint. The ioctl handler
> acquires a device-specific lock via usb_lock_device.
> 
> When the system resumed from hibernate, xhci_resume was called. This
> in turn called xhci_mem_cleanup to deallocate the device structures,
> which include setting the debugfs_private pointer to NULL  (via
> xhci_free_virt_devices_depth_first). It thus seems likely that the
> ioctl is somehow racing with the hibernate. The call to xhci_resume
> is protected by a host-controller specific lock (xhci->lock) but it
> doesn't attempt to take the usb_lock_device device-specific lock.
> 
> Now my suspicion is that xhci_resume freed and zeroed the device
> structures while racing with the ioctl handler. I can't seem to find
> any exclusion mechanism that would prevent xhci_resume from racing
> with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for
> that matter). Am I missing something? If not, is there any reason why
> an ioctl might need to execute in parallel with the xhci_resume? If
> not, can we just do a busy wait in xhci_resume until all pending
> ioctls have returned?

I'm not sure, but if I recall correctly then power management is supposed
to make sure a driver doesn't access usb devices while the host controller
is still resuming.

The odd thing here is that
xhci_debugfs_remove_slot(xhci, slot_id), and
xhci_free_virt_device(xhci, slot_id) are called together when
xhci_mem_cleanup() calls xhci_free_virt_devices_depth_first()

That means both the xhci_virt_device *dev and dev->debugfs_private
should both be freed and xhci->devs[slot_id] set NULL for that virt_device.

so xhci_add_endpoint() should fail a lot earlier because the xhci->devs[slot_id]
should be a null pointer as well.

Allocation is also done together in xhci_alloc_dev()

Looking at it more closely there is actually the .free_dev callback that
first frees the dev->debugs_private but the virt_dev is only freed
conditionally later

Attached a patch that frees them together, can you try it out?

If it doesn't help we need to add some elaborate tracing

Thanks
-Mathias
>From 811aad86287e78879b7e7f28e0c266bcd2a2f73e Mon Sep 17 00:00:00 2001
From: Mathias Nyman <mathias.nyman@linux.intel.com>
Date: Fri, 8 Dec 2017 18:50:18 +0200
Subject: [PATCH] xhci: Only free xhci_virt_device->debugfs_private if device
 is freed

freeing device is conditional, we otherwise might end up with a
xhci virt_device that doesnt have a valid debugs_private pointer.
For testing only

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 2424d30..da6dbe3 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3525,8 +3525,6 @@ static void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev)
 	struct xhci_slot_ctx *slot_ctx;
 	int i, ret;
 
-	xhci_debugfs_remove_slot(xhci, udev->slot_id);
-
 #ifndef CONFIG_USB_DEFAULT_PERSIST
 	/*
 	 * We called pm_runtime_get_noresume when the device was attached.
@@ -3555,8 +3553,10 @@ static void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev)
 	}
 
 	ret = xhci_disable_slot(xhci, udev->slot_id);
-	if (ret)
+	if (ret) {
+		xhci_debugfs_remove_slot(xhci, udev->slot_id);
 		xhci_free_virt_device(xhci, udev->slot_id);
+	}
 }
 
 int xhci_disable_slot(struct xhci_hcd *xhci, u32 slot_id)
Alexander Kappner Dec. 9, 2017, 10:38 p.m. UTC | #5
Hi Mathias,

thanks for the patch! The system now resumes cleanly from hibernate even 
with usbmuxd doing its thing.

Tested-by: Alexander Kappner <agk@godking.net>

While testing this I hit some other issues with xhci-debugfs.c but I'll 
write these up in a separate bug.

On 12/08/2017 09:01 AM, Mathias Nyman wrote:
> On 08.12.2017 13:06, Alexander Kappner wrote:
>> Hi,
>>
>>> I think we need to dig a bit deeper. It's good to check if spriv is
>>> valid
>>> but there are probably other reasons than kzalloc failing.
>>
>> I agree -- this small allocation is  unlikely to fail in practice.
>> Also, while my patch prevents the kernel oops, it also prevents the
>> debugfs entries from being created.
>>
>> I've been debugging this more trying to come up with a better
>> solution, but I might need some guidance as I'm not too familiar with
>> the USB subsystem. The immediate cause of the crash was usbmuxd
>> sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if
>> it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device,
>> which in turn calls xhci_debugfs_create_endpoint. The ioctl handler
>> acquires a device-specific lock via usb_lock_device.
>>
>> When the system resumed from hibernate, xhci_resume was called. This
>> in turn called xhci_mem_cleanup to deallocate the device structures,
>> which include setting the debugfs_private pointer to NULL  (via
>> xhci_free_virt_devices_depth_first). It thus seems likely that the
>> ioctl is somehow racing with the hibernate. The call to xhci_resume
>> is protected by a host-controller specific lock (xhci->lock) but it
>> doesn't attempt to take the usb_lock_device device-specific lock.
>>
>> Now my suspicion is that xhci_resume freed and zeroed the device
>> structures while racing with the ioctl handler. I can't seem to find
>> any exclusion mechanism that would prevent xhci_resume from racing
>> with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for
>> that matter). Am I missing something? If not, is there any reason why
>> an ioctl might need to execute in parallel with the xhci_resume? If
>> not, can we just do a busy wait in xhci_resume until all pending
>> ioctls have returned?
>
> I'm not sure, but if I recall correctly then power management is supposed
> to make sure a driver doesn't access usb devices while the host 
> controller
> is still resuming.
>
> The odd thing here is that
> xhci_debugfs_remove_slot(xhci, slot_id), and
> xhci_free_virt_device(xhci, slot_id) are called together when
> xhci_mem_cleanup() calls xhci_free_virt_devices_depth_first()
>
> That means both the xhci_virt_device *dev and dev->debugfs_private
> should both be freed and xhci->devs[slot_id] set NULL for that 
> virt_device.
>
> so xhci_add_endpoint() should fail a lot earlier because the 
> xhci->devs[slot_id]
> should be a null pointer as well.
>
> Allocation is also done together in xhci_alloc_dev()
>
> Looking at it more closely there is actually the .free_dev callback that
> first frees the dev->debugs_private but the virt_dev is only freed
> conditionally later
>
> Attached a patch that frees them together, can you try it out?
>
> If it doesn't help we need to add some elaborate tracing
>
> Thanks
> -Mathias
>
>
Mathias Nyman Dec. 11, 2017, 10:39 a.m. UTC | #6
On 10.12.2017 00:38, Alexander Kappner wrote:
> Hi Mathias,
> 
> thanks for the patch! The system now resumes cleanly from hibernate even with usbmuxd doing its thing.
> 
> Tested-by: Alexander Kappner <agk@godking.net>
> 
> While testing this I hit some other issues with xhci-debugfs.c but I'll write these up in a separate bug.
> 

Thanks,
Adding reported-by and tested-by tags, and pushing patch to my for-usb-linus branch

-Mathias

Patch
diff mbox series

diff --git a/drivers/usb/host/xhci-debugfs.c b/drivers/usb/host/xhci-debugfs.c
index 4f7895d..1cea59c 100644
--- a/drivers/usb/host/xhci-debugfs.c
+++ b/drivers/usb/host/xhci-debugfs.c
@@ -378,6 +378,9 @@  void xhci_debugfs_create_endpoint(struct xhci_hcd *xhci,
 	struct xhci_ep_priv	*epriv;
 	struct xhci_slot_priv	*spriv = dev->debugfs_private;
 
+	if (!spriv)
+		return;
+
 	if (spriv->eps[ep_index])
 		return;