* usb-storage URB use-after-free
@ 2015-01-28 22:04 Joe Lawrence
2015-01-28 22:07 ` Joe Lawrence
0 siblings, 1 reply; 4+ messages in thread
From: Joe Lawrence @ 2015-01-28 22:04 UTC (permalink / raw)
To: linux-scsi; +Cc: Joe Lawrence
Hello linux-usb,
We've hit a USB use-after-free on Stratus HW during device removal tests.
We're running fio disk I/O to a scsi disk hanging off USB when the USB
controller is hotplug removed. This crash is very consistent (usually the
first device pull during testing). Without I/O, it may take days to
encounter.
general protection fault: 0000 [#1] SMP
...
CPU: 35 PID: 19626 Comm: kworker/u97:0 Tainted: PF W O-------------- 3.10.0-210.el7.dev02.x86_64 #1
Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.0:30 11/12/2014
Workqueue: scsi_tmf_872 scmd_eh_abort_handler
task: ffff88031bd91960 ti: ffff880981318000 task.ti: ffff880981318000
RIP: 0010:[<ffffffff812d5e2d>] [<ffffffff812d5e2d>] kobject_put+0xd/0x60
RSP: 0018:ffff88098131bd28 EFLAGS: 00010002
RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6c0b RCX: 00000001002f0009
RDX: 000000000000002f RSI: ffffea0015719800 RDI: 6b6b6b6b6b6b6c0b
RBP: ffff88098131bd30 R08: ffff88055c6622f0 R09: 00000001002f0008
R10: ffff88085f407a80 R11: ffffffff81450503 R12: ffff8804bab6d248
R13: 00000000ffffff98 R14: 0000000000000086 R15: 0000000000000c20
FS: 0000000000000000(0000) GS:ffff88085fce0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2ebb5d6008 CR3: 0000001038dc5000 CR4: 00000000001407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
0000000000000000 ffff88098131bd40 ffffffff813cd327 ffff88098131bd50
ffffffff8140a65a ffff88098131bd78 ffffffff81416795 0000000000000000
ffff880414791968 ffff880414791978 ffff88098131bd88 ffffffff814175f6
Call Trace:
[<ffffffff813cd327>] put_device+0x17/0x20
[<ffffffff8140a65a>] usb_put_dev+0x1a/0x20
[<ffffffff81416795>] usb_hcd_unlink_urb+0x65/0xe0
[<ffffffff814175f6>] usb_unlink_urb+0x26/0x50
[<ffffffff81418e92>] usb_sg_cancel+0x82/0xe0
[<ffffffffa0074e71>] usb_stor_stop_transport+0x31/0x50 [usb_storage]
[<ffffffffa0073b8d>] command_abort+0xcd/0xe0 [usb_storage]
[<ffffffff813f51ef>] scmd_eh_abort_handler+0xbf/0x480
[<ffffffff8108f06b>] process_one_work+0x17b/0x470
[<ffffffff8108fe4b>] worker_thread+0x11b/0x400
[<ffffffff8108fd30>] ? rescuer_thread+0x400/0x400
[<ffffffff8109722f>] kthread+0xcf/0xe0
[<ffffffff81097160>] ? kthread_create_on_node+0x140/0x140
[<ffffffff8161387c>] ret_from_fork+0x7c/0xb0
[<ffffffff81097160>] ? kthread_create_on_node+0x140/0x140
from another crash, we know that the URB itself has been freed:
crash> struct -o urb | grep SIZE
SIZE: 0xc0
crash> p 0xc0/8
$2 = 0x18
crash> rd ffff880846df9248 0x18
ffff880846df9248: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df9258: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df9268: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df9278: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df9288: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df9298: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df92a8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df92b8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df92c8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df92d8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df92e8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff880846df92f8: 6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b kkkkkkkkkkkkkkk.
but the underlying usb_device->device is still present:
crash> struct usb_device ffff88084fd22e68
...
dev = {
parent = 0xffff881050c3d418,
p = 0xffff88084ff27ae8,
kobj = {
name = 0xffff88084ff071e0 "6-1",
entry = {
next = 0xffff88084ff09568,
prev = 0xffff88084ff09000
},
parent = 0xffff881050c3d428,
kset = 0xffff8808540dc9f0,
ktype = 0xffffffff819cc220 <device_ktype>,
sd = 0xffff88084ff1b8b8,
kref = {
refcount = {
counter = 0x8
}
},
state_initialized = 0x1,
state_in_sysfs = 0x1,
state_add_uevent_sent = 0x1,
state_remove_uevent_sent = 0x0,
uevent_suppress = 0x0
},
...
I added trace printing to usb_{alloc_urb,get_urb,free_urb} and urb_destroy
showing the URB and the calling function, then crashed again:
... snip ...
usb-storage-687 [000] 350.013212: bprint: usb_alloc_urb : JL: usb_sg_init+0xe1 -> usb_alloc_urb(0xffff8808489b0a28)
usb-storage-687 [000] 350.013212: bprint: usb_get_urb : JL: usb_hcd_submit_urb+0x3b -> usb_get_urb(0xffff8808489b0a28)
kworker/u97:2-703 [006] 380.656791: bprint: usb_free_urb : JL: __usb_hcd_giveback_urb -> usb_free_urb(0xffff8808489b0a28)
usb-storage-687 [000] 380.656793: bprint: usb_free_urb : JL: sg_clean+0x33 -> usb_free_urb(0xffff8808489b0a28)
usb-storage-687 [000] 380.656794: bprint: urb_destroy : JL: usb_free_urb+0x40 -> urb_destroy(0xffff8808489b0a28)
crash> bt 703
PID: 703 TASK: ffff8808497f8000 CPU: 6 COMMAND: "kworker/u97:2"
#0 [ffff8808497f7c10] die at ffffffff8101736b
#1 [ffff8808497f7c40] do_general_protection at ffffffff8160be4e
#2 [ffff8808497f7c70] general_protection at ffffffff8160b768
[exception RIP: kobject_put+0xd]
RIP: ffffffff812d69dd RSP: ffff8808497f7d28 RFLAGS: 00010002
RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6c0b RCX: dead000000200200
RDX: ffff88085f400390 RSI: 0000000000000047 RDI: 6b6b6b6b6b6b6c0b
RBP: ffff8808497f7d30 R8: ffff88085f400390 R9: 00000001002f002e
R10: ffff88085f407a80 R11: ffffffff81450e03 R12: ffff8808489b0a28
R13: 00000000ffffff98 R14: 0000000000000086 R15: 0000000000000c20
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#3 [ffff8808497f7d38] put_device at ffffffff813cdc57
#4 [ffff8808497f7d48] usb_put_dev at ffffffff8140afda
#5 [ffff8808497f7d58] usb_hcd_unlink_urb at ffffffff81417115
#6 [ffff8808497f7d80] usb_unlink_urb at ffffffff814180d6
#7 [ffff8808497f7d90] usb_sg_cancel at ffffffff81419792
#8 [ffff8808497f7dc0] usb_stor_stop_transport at ffffffffa00a9e71 [usb_storage]
#9 [ffff8808497f7dd8] command_abort at ffffffffa00a8b8d [usb_storage]
#10 [ffff8808497f7df8] scmd_eh_abort_handler at ffffffff813f5b5f
#11 [ffff8808497f7e20] process_one_work at ffffffff8108f0ab
#12 [ffff8808497f7e68] worker_thread at ffffffff8108fe8b
#13 [ffff8808497f7ec8] kthread at ffffffff8109726f
#14 [ffff8808497f7f50] ret_from_fork at ffffffff81613cbc
PID: 687 TASK: ffff88084c92a610 CPU: 0 COMMAND: "usb-storage"
[exception RIP: _raw_spin_lock_irq+0x38]
RIP: ffffffff8160b1d8 RSP: ffff88084cc23da0 RFLAGS: 00000002
RAX: 0000000000001d66 RBX: ffff88104cea1888 RCX: 0000000000008160
RDX: 0000000000008162 RSI: 0000000000008162 RDI: ffff88104cea1198
RBP: ffff88084cc23da0 R8: 0000000000001000 R9: 0000000100300023
R10: ffff88085f404240 R11: ffffffff81418e39 R12: ffff88084a878900
R13: ffff88104cea19f8 R14: 0000000000000003 R15: 0000000000000000
CS: 0010 SS: 0018
#0 [ffff88084cc23da8] usb_stor_invoke_transport at ffffffffa00a9fe2 [usb_storage]
#1 [ffff88084cc23e58] usb_stor_transparent_scsi_command at ffffffffa00a8c6e [usb_storage]
#2 [ffff88084cc23e68] usb_stor_control_thread at ffffffffa00ab6d8 [usb_storage]
#3 [ffff88084cc23ec8] kthread at ffffffff8109726f
#4 [ffff88084cc23f50] ret_from_fork at ffffffff81613cbc
It looks like usb_hcd_unlink_urb takes a reference out on the underlying
device but not the URB, which in our test case consistently gets freed
just before usb_hcd_unlink_urb tries to return a reference on urb->device.
Maybe the URB is a reference count short when usb_hcd_unlink_urb calls
unlink1? Somewhere usb-storage missed taking out a ref?
A tourniquet hack-patch follows (probably inside out, as the URB should stay
intact while usb_hcd_unlink_urb does its thing) and has been running a
little over 30 surprise device removals under I/O without incident.
Any better suggestions?
-- Joe
-->8-- -->8-- -->8--
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 11cee55ae397..3a4ccc13f6a8 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1624,8 +1624,10 @@ int usb_hcd_unlink_urb (struct urb *urb, int status)
/* Prevent the device and bus from going away while
* the unlink is carried out. If they are already gone
* then urb->use_count must be 0, since disconnected
- * devices can't have any active URBs.
+ * devices can't have any active URBs. Nail down the
+ * URB while we're at it, too.
*/
+ usb_get_urb(urb);
spin_lock_irqsave(&hcd_urb_unlink_lock, flags);
if (atomic_read(&urb->use_count) > 0) {
retval = 0;
@@ -1643,6 +1645,7 @@ int usb_hcd_unlink_urb (struct urb *urb, int status)
else if (retval != -EIDRM && retval != -EBUSY)
dev_dbg(&urb->dev->dev, "hcd_unlink_urb %p fail %d\n",
urb, retval);
+ usb_put_urb(urb);
return retval;
}
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: usb-storage URB use-after-free
2015-01-28 22:04 usb-storage URB use-after-free Joe Lawrence
@ 2015-01-28 22:07 ` Joe Lawrence
[not found] ` <54C95DA9.2050306-7+ureL1bLXNBDgjK7y7TUQ@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: Joe Lawrence @ 2015-01-28 22:07 UTC (permalink / raw)
To: linux-scsi; +Cc: linux-usb
This one should have gone over to linux-usb.
-- Joe
On 01/28/2015 05:04 PM, Joe Lawrence wrote:
> Hello linux-usb,
>
> We've hit a USB use-after-free on Stratus HW during device removal tests.
> We're running fio disk I/O to a scsi disk hanging off USB when the USB
> controller is hotplug removed. This crash is very consistent (usually the
> first device pull during testing). Without I/O, it may take days to
> encounter.
>
> general protection fault: 0000 [#1] SMP
> ...
> CPU: 35 PID: 19626 Comm: kworker/u97:0 Tainted: PF W O-------------- 3.10.0-210.el7.dev02.x86_64 #1
> Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.0:30 11/12/2014
> Workqueue: scsi_tmf_872 scmd_eh_abort_handler
> task: ffff88031bd91960 ti: ffff880981318000 task.ti: ffff880981318000
> RIP: 0010:[<ffffffff812d5e2d>] [<ffffffff812d5e2d>] kobject_put+0xd/0x60
> RSP: 0018:ffff88098131bd28 EFLAGS: 00010002
> RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6c0b RCX: 00000001002f0009
> RDX: 000000000000002f RSI: ffffea0015719800 RDI: 6b6b6b6b6b6b6c0b
> RBP: ffff88098131bd30 R08: ffff88055c6622f0 R09: 00000001002f0008
> R10: ffff88085f407a80 R11: ffffffff81450503 R12: ffff8804bab6d248
> R13: 00000000ffffff98 R14: 0000000000000086 R15: 0000000000000c20
> FS: 0000000000000000(0000) GS:ffff88085fce0000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2ebb5d6008 CR3: 0000001038dc5000 CR4: 00000000001407e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Stack:
> 0000000000000000 ffff88098131bd40 ffffffff813cd327 ffff88098131bd50
> ffffffff8140a65a ffff88098131bd78 ffffffff81416795 0000000000000000
> ffff880414791968 ffff880414791978 ffff88098131bd88 ffffffff814175f6
> Call Trace:
> [<ffffffff813cd327>] put_device+0x17/0x20
> [<ffffffff8140a65a>] usb_put_dev+0x1a/0x20
> [<ffffffff81416795>] usb_hcd_unlink_urb+0x65/0xe0
> [<ffffffff814175f6>] usb_unlink_urb+0x26/0x50
> [<ffffffff81418e92>] usb_sg_cancel+0x82/0xe0
> [<ffffffffa0074e71>] usb_stor_stop_transport+0x31/0x50 [usb_storage]
> [<ffffffffa0073b8d>] command_abort+0xcd/0xe0 [usb_storage]
> [<ffffffff813f51ef>] scmd_eh_abort_handler+0xbf/0x480
> [<ffffffff8108f06b>] process_one_work+0x17b/0x470
> [<ffffffff8108fe4b>] worker_thread+0x11b/0x400
> [<ffffffff8108fd30>] ? rescuer_thread+0x400/0x400
> [<ffffffff8109722f>] kthread+0xcf/0xe0
> [<ffffffff81097160>] ? kthread_create_on_node+0x140/0x140
> [<ffffffff8161387c>] ret_from_fork+0x7c/0xb0
> [<ffffffff81097160>] ? kthread_create_on_node+0x140/0x140
>
> from another crash, we know that the URB itself has been freed:
>
> crash> struct -o urb | grep SIZE
> SIZE: 0xc0
> crash> p 0xc0/8
> $2 = 0x18
> crash> rd ffff880846df9248 0x18
> ffff880846df9248: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df9258: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df9268: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df9278: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df9288: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df9298: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df92a8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df92b8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df92c8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df92d8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df92e8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
> ffff880846df92f8: 6b6b6b6b6b6b6b6b a56b6b6b6b6b6b6b kkkkkkkkkkkkkkk.
>
> but the underlying usb_device->device is still present:
>
> crash> struct usb_device ffff88084fd22e68
> ...
> dev = {
> parent = 0xffff881050c3d418,
> p = 0xffff88084ff27ae8,
> kobj = {
> name = 0xffff88084ff071e0 "6-1",
> entry = {
> next = 0xffff88084ff09568,
> prev = 0xffff88084ff09000
> },
> parent = 0xffff881050c3d428,
> kset = 0xffff8808540dc9f0,
> ktype = 0xffffffff819cc220 <device_ktype>,
> sd = 0xffff88084ff1b8b8,
> kref = {
> refcount = {
> counter = 0x8
> }
> },
> state_initialized = 0x1,
> state_in_sysfs = 0x1,
> state_add_uevent_sent = 0x1,
> state_remove_uevent_sent = 0x0,
> uevent_suppress = 0x0
> },
> ...
>
> I added trace printing to usb_{alloc_urb,get_urb,free_urb} and urb_destroy
> showing the URB and the calling function, then crashed again:
>
> ... snip ...
> usb-storage-687 [000] 350.013212: bprint: usb_alloc_urb : JL: usb_sg_init+0xe1 -> usb_alloc_urb(0xffff8808489b0a28)
> usb-storage-687 [000] 350.013212: bprint: usb_get_urb : JL: usb_hcd_submit_urb+0x3b -> usb_get_urb(0xffff8808489b0a28)
> kworker/u97:2-703 [006] 380.656791: bprint: usb_free_urb : JL: __usb_hcd_giveback_urb -> usb_free_urb(0xffff8808489b0a28)
> usb-storage-687 [000] 380.656793: bprint: usb_free_urb : JL: sg_clean+0x33 -> usb_free_urb(0xffff8808489b0a28)
> usb-storage-687 [000] 380.656794: bprint: urb_destroy : JL: usb_free_urb+0x40 -> urb_destroy(0xffff8808489b0a28)
>
> crash> bt 703
> PID: 703 TASK: ffff8808497f8000 CPU: 6 COMMAND: "kworker/u97:2"
> #0 [ffff8808497f7c10] die at ffffffff8101736b
> #1 [ffff8808497f7c40] do_general_protection at ffffffff8160be4e
> #2 [ffff8808497f7c70] general_protection at ffffffff8160b768
> [exception RIP: kobject_put+0xd]
> RIP: ffffffff812d69dd RSP: ffff8808497f7d28 RFLAGS: 00010002
> RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6c0b RCX: dead000000200200
> RDX: ffff88085f400390 RSI: 0000000000000047 RDI: 6b6b6b6b6b6b6c0b
> RBP: ffff8808497f7d30 R8: ffff88085f400390 R9: 00000001002f002e
> R10: ffff88085f407a80 R11: ffffffff81450e03 R12: ffff8808489b0a28
> R13: 00000000ffffff98 R14: 0000000000000086 R15: 0000000000000c20
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #3 [ffff8808497f7d38] put_device at ffffffff813cdc57
> #4 [ffff8808497f7d48] usb_put_dev at ffffffff8140afda
> #5 [ffff8808497f7d58] usb_hcd_unlink_urb at ffffffff81417115
> #6 [ffff8808497f7d80] usb_unlink_urb at ffffffff814180d6
> #7 [ffff8808497f7d90] usb_sg_cancel at ffffffff81419792
> #8 [ffff8808497f7dc0] usb_stor_stop_transport at ffffffffa00a9e71 [usb_storage]
> #9 [ffff8808497f7dd8] command_abort at ffffffffa00a8b8d [usb_storage]
> #10 [ffff8808497f7df8] scmd_eh_abort_handler at ffffffff813f5b5f
> #11 [ffff8808497f7e20] process_one_work at ffffffff8108f0ab
> #12 [ffff8808497f7e68] worker_thread at ffffffff8108fe8b
> #13 [ffff8808497f7ec8] kthread at ffffffff8109726f
> #14 [ffff8808497f7f50] ret_from_fork at ffffffff81613cbc
>
> PID: 687 TASK: ffff88084c92a610 CPU: 0 COMMAND: "usb-storage"
> [exception RIP: _raw_spin_lock_irq+0x38]
> RIP: ffffffff8160b1d8 RSP: ffff88084cc23da0 RFLAGS: 00000002
> RAX: 0000000000001d66 RBX: ffff88104cea1888 RCX: 0000000000008160
> RDX: 0000000000008162 RSI: 0000000000008162 RDI: ffff88104cea1198
> RBP: ffff88084cc23da0 R8: 0000000000001000 R9: 0000000100300023
> R10: ffff88085f404240 R11: ffffffff81418e39 R12: ffff88084a878900
> R13: ffff88104cea19f8 R14: 0000000000000003 R15: 0000000000000000
> CS: 0010 SS: 0018
> #0 [ffff88084cc23da8] usb_stor_invoke_transport at ffffffffa00a9fe2 [usb_storage]
> #1 [ffff88084cc23e58] usb_stor_transparent_scsi_command at ffffffffa00a8c6e [usb_storage]
> #2 [ffff88084cc23e68] usb_stor_control_thread at ffffffffa00ab6d8 [usb_storage]
> #3 [ffff88084cc23ec8] kthread at ffffffff8109726f
> #4 [ffff88084cc23f50] ret_from_fork at ffffffff81613cbc
>
> It looks like usb_hcd_unlink_urb takes a reference out on the underlying
> device but not the URB, which in our test case consistently gets freed
> just before usb_hcd_unlink_urb tries to return a reference on urb->device.
>
> Maybe the URB is a reference count short when usb_hcd_unlink_urb calls
> unlink1? Somewhere usb-storage missed taking out a ref?
>
> A tourniquet hack-patch follows (probably inside out, as the URB should stay
> intact while usb_hcd_unlink_urb does its thing) and has been running a
> little over 30 surprise device removals under I/O without incident.
>
> Any better suggestions?
>
> -- Joe
>
> -->8-- -->8-- -->8--
>
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index 11cee55ae397..3a4ccc13f6a8 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -1624,8 +1624,10 @@ int usb_hcd_unlink_urb (struct urb *urb, int status)
> /* Prevent the device and bus from going away while
> * the unlink is carried out. If they are already gone
> * then urb->use_count must be 0, since disconnected
> - * devices can't have any active URBs.
> + * devices can't have any active URBs. Nail down the
> + * URB while we're at it, too.
> */
> + usb_get_urb(urb);
> spin_lock_irqsave(&hcd_urb_unlink_lock, flags);
> if (atomic_read(&urb->use_count) > 0) {
> retval = 0;
> @@ -1643,6 +1645,7 @@ int usb_hcd_unlink_urb (struct urb *urb, int status)
> else if (retval != -EIDRM && retval != -EBUSY)
> dev_dbg(&urb->dev->dev, "hcd_unlink_urb %p fail %d\n",
> urb, retval);
> + usb_put_urb(urb);
> return retval;
> }
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: usb-storage URB use-after-free
[not found] ` <54C95DA9.2050306-7+ureL1bLXNBDgjK7y7TUQ@public.gmane.org>
@ 2015-01-29 16:42 ` Alan Stern
2015-01-30 13:40 ` Joe Lawrence
0 siblings, 1 reply; 4+ messages in thread
From: Alan Stern @ 2015-01-29 16:42 UTC (permalink / raw)
To: Joe Lawrence
Cc: linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA
On Wed, 28 Jan 2015, Joe Lawrence wrote:
> This one should have gone over to linux-usb.
>
> -- Joe
>
> On 01/28/2015 05:04 PM, Joe Lawrence wrote:
> > Hello linux-usb,
> >
> > We've hit a USB use-after-free on Stratus HW during device removal tests.
> > We're running fio disk I/O to a scsi disk hanging off USB when the USB
> > controller is hotplug removed. This crash is very consistent (usually the
> > first device pull during testing). Without I/O, it may take days to
> > encounter.
> >
> > general protection fault: 0000 [#1] SMP
> > ...
> > CPU: 35 PID: 19626 Comm: kworker/u97:0 Tainted: PF W O-------------- 3.10.0-210.el7.dev02.x86_64 #1
> > Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.0:30 11/12/2014
> > Workqueue: scsi_tmf_872 scmd_eh_abort_handler
> > task: ffff88031bd91960 ti: ffff880981318000 task.ti: ffff880981318000
> > RIP: 0010:[<ffffffff812d5e2d>] [<ffffffff812d5e2d>] kobject_put+0xd/0x60
> > RSP: 0018:ffff88098131bd28 EFLAGS: 00010002
> > RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6c0b RCX: 00000001002f0009
> > RDX: 000000000000002f RSI: ffffea0015719800 RDI: 6b6b6b6b6b6b6c0b
> > RBP: ffff88098131bd30 R08: ffff88055c6622f0 R09: 00000001002f0008
> > R10: ffff88085f407a80 R11: ffffffff81450503 R12: ffff8804bab6d248
> > R13: 00000000ffffff98 R14: 0000000000000086 R15: 0000000000000c20
> > FS: 0000000000000000(0000) GS:ffff88085fce0000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f2ebb5d6008 CR3: 0000001038dc5000 CR4: 00000000001407e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Stack:
> > 0000000000000000 ffff88098131bd40 ffffffff813cd327 ffff88098131bd50
> > ffffffff8140a65a ffff88098131bd78 ffffffff81416795 0000000000000000
> > ffff880414791968 ffff880414791978 ffff88098131bd88 ffffffff814175f6
> > Call Trace:
> > [<ffffffff813cd327>] put_device+0x17/0x20
> > [<ffffffff8140a65a>] usb_put_dev+0x1a/0x20
> > [<ffffffff81416795>] usb_hcd_unlink_urb+0x65/0xe0
> > [<ffffffff814175f6>] usb_unlink_urb+0x26/0x50
> > [<ffffffff81418e92>] usb_sg_cancel+0x82/0xe0
> > [<ffffffffa0074e71>] usb_stor_stop_transport+0x31/0x50 [usb_storage]
> > [<ffffffffa0073b8d>] command_abort+0xcd/0xe0 [usb_storage]
> > [<ffffffff813f51ef>] scmd_eh_abort_handler+0xbf/0x480
> > [<ffffffff8108f06b>] process_one_work+0x17b/0x470
> > [<ffffffff8108fe4b>] worker_thread+0x11b/0x400
> > [<ffffffff8108fd30>] ? rescuer_thread+0x400/0x400
> > [<ffffffff8109722f>] kthread+0xcf/0xe0
> > [<ffffffff81097160>] ? kthread_create_on_node+0x140/0x140
> > [<ffffffff8161387c>] ret_from_fork+0x7c/0xb0
> > [<ffffffff81097160>] ? kthread_create_on_node+0x140/0x140
> >
> > from another crash, we know that the URB itself has been freed:
> > but the underlying usb_device->device is still present:
> > It looks like usb_hcd_unlink_urb takes a reference out on the underlying
> > device but not the URB, which in our test case consistently gets freed
> > just before usb_hcd_unlink_urb tries to return a reference on urb->device.
The analysis looks correct.
> > Maybe the URB is a reference count short when usb_hcd_unlink_urb calls
> > unlink1? Somewhere usb-storage missed taking out a ref?
No; it's a simple use-after-free error.
> > A tourniquet hack-patch follows (probably inside out, as the URB should stay
> > intact while usb_hcd_unlink_urb does its thing) and has been running a
> > little over 30 surprise device removals under I/O without incident.
> >
> > Any better suggestions?
Please try this instead.
Alan Stern
Index: usb-3.19/drivers/usb/core/hcd.c
===================================================================
--- usb-3.19.orig/drivers/usb/core/hcd.c
+++ usb-3.19/drivers/usb/core/hcd.c
@@ -1618,6 +1618,7 @@ static int unlink1(struct usb_hcd *hcd,
int usb_hcd_unlink_urb (struct urb *urb, int status)
{
struct usb_hcd *hcd;
+ struct usb_device *udev = urb->dev;
int retval = -EIDRM;
unsigned long flags;
@@ -1629,20 +1630,19 @@ int usb_hcd_unlink_urb (struct urb *urb,
spin_lock_irqsave(&hcd_urb_unlink_lock, flags);
if (atomic_read(&urb->use_count) > 0) {
retval = 0;
- usb_get_dev(urb->dev);
+ usb_get_dev(udev);
}
spin_unlock_irqrestore(&hcd_urb_unlink_lock, flags);
if (retval == 0) {
hcd = bus_to_hcd(urb->dev->bus);
retval = unlink1(hcd, urb, status);
- usb_put_dev(urb->dev);
+ if (retval == 0)
+ retval = -EINPROGRESS;
+ else if (retval != -EIDRM && retval != -EBUSY)
+ dev_dbg(&udev->dev, "hcd_unlink_urb %p fail %d\n",
+ urb, retval);
+ usb_put_dev(udev);
}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: usb-storage URB use-after-free
2015-01-29 16:42 ` Alan Stern
@ 2015-01-30 13:40 ` Joe Lawrence
0 siblings, 0 replies; 4+ messages in thread
From: Joe Lawrence @ 2015-01-30 13:40 UTC (permalink / raw)
To: Alan Stern; +Cc: linux-scsi, linux-usb
On Thu, 29 Jan 2015 11:42:18 -0500
Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 28 Jan 2015, Joe Lawrence wrote:
>
> > This one should have gone over to linux-usb.
> >
> > -- Joe
> >
> > On 01/28/2015 05:04 PM, Joe Lawrence wrote:
> > > Hello linux-usb,
> > >
> > > We've hit a USB use-after-free on Stratus HW during device removal tests.
> > > We're running fio disk I/O to a scsi disk hanging off USB when the USB
> > > controller is hotplug removed. This crash is very consistent (usually the
> > > first device pull during testing). Without I/O, it may take days to
> > > encounter.
> > >
> > > general protection fault: 0000 [#1] SMP
> > > ...
> > > CPU: 35 PID: 19626 Comm: kworker/u97:0 Tainted: PF W O-------------- 3.10.0-210.el7.dev02.x86_64 #1
> > > Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.0:30 11/12/2014
> > > Workqueue: scsi_tmf_872 scmd_eh_abort_handler
> > > task: ffff88031bd91960 ti: ffff880981318000 task.ti: ffff880981318000
> > > RIP: 0010:[<ffffffff812d5e2d>] [<ffffffff812d5e2d>] kobject_put+0xd/0x60
> > > RSP: 0018:ffff88098131bd28 EFLAGS: 00010002
> > > RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6c0b RCX: 00000001002f0009
> > > RDX: 000000000000002f RSI: ffffea0015719800 RDI: 6b6b6b6b6b6b6c0b
> > > RBP: ffff88098131bd30 R08: ffff88055c6622f0 R09: 00000001002f0008
> > > R10: ffff88085f407a80 R11: ffffffff81450503 R12: ffff8804bab6d248
> > > R13: 00000000ffffff98 R14: 0000000000000086 R15: 0000000000000c20
> > > FS: 0000000000000000(0000) GS:ffff88085fce0000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007f2ebb5d6008 CR3: 0000001038dc5000 CR4: 00000000001407e0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > Stack:
> > > 0000000000000000 ffff88098131bd40 ffffffff813cd327 ffff88098131bd50
> > > ffffffff8140a65a ffff88098131bd78 ffffffff81416795 0000000000000000
> > > ffff880414791968 ffff880414791978 ffff88098131bd88 ffffffff814175f6
> > > Call Trace:
> > > [<ffffffff813cd327>] put_device+0x17/0x20
> > > [<ffffffff8140a65a>] usb_put_dev+0x1a/0x20
> > > [<ffffffff81416795>] usb_hcd_unlink_urb+0x65/0xe0
> > > [<ffffffff814175f6>] usb_unlink_urb+0x26/0x50
> > > [<ffffffff81418e92>] usb_sg_cancel+0x82/0xe0
> > > [<ffffffffa0074e71>] usb_stor_stop_transport+0x31/0x50 [usb_storage]
> > > [<ffffffffa0073b8d>] command_abort+0xcd/0xe0 [usb_storage]
> > > [<ffffffff813f51ef>] scmd_eh_abort_handler+0xbf/0x480
> > > [<ffffffff8108f06b>] process_one_work+0x17b/0x470
> > > [<ffffffff8108fe4b>] worker_thread+0x11b/0x400
> > > [<ffffffff8108fd30>] ? rescuer_thread+0x400/0x400
> > > [<ffffffff8109722f>] kthread+0xcf/0xe0
> > > [<ffffffff81097160>] ? kthread_create_on_node+0x140/0x140
> > > [<ffffffff8161387c>] ret_from_fork+0x7c/0xb0
> > > [<ffffffff81097160>] ? kthread_create_on_node+0x140/0x140
> > >
> > > from another crash, we know that the URB itself has been freed:
>
> > > but the underlying usb_device->device is still present:
>
> > > It looks like usb_hcd_unlink_urb takes a reference out on the underlying
> > > device but not the URB, which in our test case consistently gets freed
> > > just before usb_hcd_unlink_urb tries to return a reference on urb->device.
>
> The analysis looks correct.
>
> > > Maybe the URB is a reference count short when usb_hcd_unlink_urb calls
> > > unlink1? Somewhere usb-storage missed taking out a ref?
>
> No; it's a simple use-after-free error.
>
> > > A tourniquet hack-patch follows (probably inside out, as the URB should stay
> > > intact while usb_hcd_unlink_urb does its thing) and has been running a
> > > little over 30 surprise device removals under I/O without incident.
> > >
> > > Any better suggestions?
>
> Please try this instead.
>
> Alan Stern
>
>
>
> Index: usb-3.19/drivers/usb/core/hcd.c
> ===================================================================
> --- usb-3.19.orig/drivers/usb/core/hcd.c
> +++ usb-3.19/drivers/usb/core/hcd.c
> @@ -1618,6 +1618,7 @@ static int unlink1(struct usb_hcd *hcd,
> int usb_hcd_unlink_urb (struct urb *urb, int status)
> {
> struct usb_hcd *hcd;
> + struct usb_device *udev = urb->dev;
> int retval = -EIDRM;
> unsigned long flags;
>
> @@ -1629,20 +1630,19 @@ int usb_hcd_unlink_urb (struct urb *urb,
> spin_lock_irqsave(&hcd_urb_unlink_lock, flags);
> if (atomic_read(&urb->use_count) > 0) {
> retval = 0;
> - usb_get_dev(urb->dev);
> + usb_get_dev(udev);
> }
> spin_unlock_irqrestore(&hcd_urb_unlink_lock, flags);
> if (retval == 0) {
> hcd = bus_to_hcd(urb->dev->bus);
> retval = unlink1(hcd, urb, status);
> - usb_put_dev(urb->dev);
> + if (retval == 0)
> + retval = -EINPROGRESS;
> + else if (retval != -EIDRM && retval != -EBUSY)
> + dev_dbg(&udev->dev, "hcd_unlink_urb %p fail %d\n",
> + urb, retval);
> + usb_put_dev(udev);
> }
> -
> - if (retval == 0)
> - retval = -EINPROGRESS;
> - else if (retval != -EIDRM && retval != -EBUSY)
> - dev_dbg(&urb->dev->dev, "hcd_unlink_urb %p fail %d\n",
> - urb, retval);
> return retval;
> }
Hi Alan,
Thanks for the confirmation and refined patch. Caching the usb_device
pointer and moving the unlink1 return checking up looks good to me.
Your patch ran overnight tests with no problems.
Tested-off-by: Joe Lawrence <joe.lawrence@stratus.com>
-- Joe
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-01-30 13:40 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-28 22:04 usb-storage URB use-after-free Joe Lawrence
2015-01-28 22:07 ` Joe Lawrence
[not found] ` <54C95DA9.2050306-7+ureL1bLXNBDgjK7y7TUQ@public.gmane.org>
2015-01-29 16:42 ` Alan Stern
2015-01-30 13:40 ` Joe Lawrence
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.