All of lore.kernel.org
 help / color / mirror / Atom feed
* use-after-free in usbnet
@ 2012-03-19 15:12 Dave Jones
  2012-03-20  9:40 ` Ming Lei
  0 siblings, 1 reply; 40+ messages in thread
From: Dave Jones @ 2012-03-19 15:12 UTC (permalink / raw)
  To: netdev; +Cc: linux-usb, Fedora Kernel Team

We've had two reports of this use after free in Fedora now recently..



general protection fault: 0000 [#1] SMP 
CPU 0 
Modules linked in: cdc_ether usbnet mii cdc_wdm cdc_acm fuse ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM lockd iptable_mangle sunrpc bridge stp llc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi arc4 uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev v4l2_compat_ioctl32 media microcode iwlwifi i2c_i801 snd_hda_codec_conexant snd_hda_intel mac80211 snd_hda_codec snd_hwdep iTCO_wdt iTCO_vendor_support snd_pcm cfg80211 snd_page_alloc snd_timer e1000e thinkpad_acpi snd soundcore rfkill vhost_net tun macvtap macvlan kvm_intel kvm uin
 put xts gf128mul dm_crypt sdhci_pci sdhci mmc_core wmi i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
Pid: 632, comm: NetworkManager Not tainted 3.3.0-0.rc7.git0.3.fc17.x86_64 #1 LENOVO 2808D9G/2808D9G
RIP: 0010:[<ffffffff8131f671>]  [<ffffffff8131f671>] kobject_put+0x11/0x60
RSP: 0018:ffff88021424d4e8  EFLAGS: 00010206
RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6c03 RCX: 0000000000000006
RDX: 0000000000000990 RSI: ffff8802165108d0 RDI: 6b6b6b6b6b6b6c03
RBP: ffff88021424d4f8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8801be9eb0c0
R13: 00000000ffffff98 R14: ffff8801d88e4010 R15: 000000000000002f
FS:  00007fde3f65c840(0000) GS:ffff880232e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe21c41000 CR3: 00000002167ed000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process NetworkManager (pid: 632, threadinfo ffff88021424c000, task ffff880216510000)
Stack:
 ffff88021424d4f8 0000000000000000 ffff88021424d508 ffffffff8140d9f7
 ffff88021424d518 ffffffff8147e78a ffff88021424d548 ffffffff81487ef0
 ffff88021424d538 ffff8801bbfbf3c0 ffff8801d88e4028 ffff8801be9eb0c0
Call Trace:
 [<ffffffff8140d9f7>] put_device+0x17/0x20
 [<ffffffff8147e78a>] usb_put_dev+0x1a/0x20
 [<ffffffff81487ef0>] usb_hcd_unlink_urb+0x70/0x100
 [<ffffffff814889f6>] usb_unlink_urb+0x26/0x50
 [<ffffffffa06c68bb>] unlink_urbs.isra.15+0x7b/0xd0 [usbnet]
 [<ffffffffa06c6a82>] usbnet_terminate_urbs+0x172/0x290 [usbnet]
 [<ffffffff8169aedd>] ? __mutex_unlock_slowpath+0xdd/0x180
 [<ffffffff8109c2b0>] ? try_to_wake_up+0x340/0x340
 [<ffffffff8108b0c7>] ? add_wait_queue+0x27/0x60
 [<ffffffffa06c89f8>] usbnet_stop+0x108/0x1a0 [usbnet]
 [<ffffffff81559f1d>] __dev_close_many+0x9d/0xf0
 [<ffffffff81559fab>] __dev_close+0x3b/0x60
 [<ffffffff81563af1>] __dev_change_flags+0xa1/0x180
 [<ffffffff81563c88>] dev_change_flags+0x28/0x70
 [<ffffffff8157261a>] do_setlink+0x34a/0x9d0
 [<ffffffff8133e881>] ? nla_parse+0x31/0xe0
 [<ffffffff812bfe5e>] ? avc_has_perm_noaudit+0x4e/0x530
 [<ffffffff815736ae>] rtnl_newlink+0x37e/0x560
 [<ffffffff812c0d19>] ? selinux_capable+0x39/0x50
 [<ffffffff8169a100>] ? mutex_lock_nested+0x270/0x3a0
 [<ffffffff812bd158>] ? security_capable+0x18/0x20
 [<ffffffff81572f04>] rtnetlink_rcv_msg+0x114/0x2e0
 [<ffffffff815707d7>] ? rtnl_lock+0x17/0x20
 [<ffffffff81572df0>] ? __rtnl_unlock+0x20/0x20
 [<ffffffff8158f291>] netlink_rcv_skb+0xa1/0xb0
 [<ffffffff81570805>] rtnetlink_rcv+0x25/0x40
 [<ffffffff8158ebdd>] netlink_unicast+0x19d/0x1f0
 [<ffffffff8158ef0b>] netlink_sendmsg+0x2db/0x360
 [<ffffffff8154b6fd>] ? sock_update_netprioidx+0x6d/0x250
 [<ffffffff81544d58>] sock_sendmsg+0xf8/0x130
 [<ffffffff810cc45f>] ? lock_release_non_nested+0x2ef/0x330
 [<ffffffff81174105>] ? might_fault+0xa5/0xb0
 [<ffffffff811740bc>] ? might_fault+0x5c/0xb0
 [<ffffffff815542c6>] ? verify_iovec+0x56/0xd0
 [<ffffffff8154521e>] __sys_sendmsg+0x42e/0x440
 [<ffffffff810210f9>] ? sched_clock+0x9/0x10
 [<ffffffff810a2a35>] ? sched_clock_local+0x25/0xa0
 [<ffffffff810a2be8>] ? sched_clock_cpu+0xa8/0x120
 [<ffffffff810c706d>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff810a2ccf>] ? local_clock+0x6f/0x80
 [<ffffffff811bd2e0>] ? fget_light+0xf0/0x4a0
 [<ffffffff811bd252>] ? fget_light+0x62/0x4a0
 [<ffffffff811bde14>] ? fput+0x1e4/0x280
 [<ffffffff81547ce9>] sys_sendmsg+0x49/0x90
 [<ffffffff816a6b69>] system_call_fastpath+0x16/0x1b
Code: 01 00 e9 10 ff ff ff 0f 1f 00 55 48 83 ef 38 48 89 e5 e8 43 fe ff ff 5d c3 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 1a <f6> 47 3c 01 74 21 f0 83 6b 38 01 0f 94 c0 84 c0 74 08 48 89 df 
RIP  [<ffffffff8131f671>] kobject_put+0x11/0x60
 RSP <ffff88021424d4e8>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-03-19 15:12 use-after-free in usbnet Dave Jones
@ 2012-03-20  9:40 ` Ming Lei
       [not found]   ` <CACVXFVN5O2S0hsnzHoi=LX+KAnccHc_F0SXq9-hMOXnaoUOdkg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-03-20  9:40 UTC (permalink / raw)
  To: Dave Jones; +Cc: netdev, linux-usb, Fedora Kernel Team

Hi,

On Mon, Mar 19, 2012 at 11:12 PM, Dave Jones <davej@redhat.com> wrote:
> We've had two reports of this use after free in Fedora now recently..

Could you provide output of 'dmesg' and 'lsusb -v' from the reported machine?



Thanks,
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]   ` <CACVXFVN5O2S0hsnzHoi=LX+KAnccHc_F0SXq9-hMOXnaoUOdkg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-03-21  1:04     ` Ming Lei
  2012-03-21  1:34       ` Ming Lei
  2012-03-21 14:44       ` Greg KH
  0 siblings, 2 replies; 40+ messages in thread
From: Ming Lei @ 2012-03-21  1:04 UTC (permalink / raw)
  To: Dave Jones
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Tue, Mar 20, 2012 at 5:40 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi,
>
> On Mon, Mar 19, 2012 at 11:12 PM, Dave Jones <davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>> We've had two reports of this use after free in Fedora now recently..
>
> Could you provide output of 'dmesg' and 'lsusb -v' from the reported machine?

Looks I have figured out why your problem is triggered.

If the URB being unlinked is freed before usb_put_dev
inside usb_hcd_unlink_urb, the use-after-free will be triggered.
And the below patch[1] should fix the problem.

Also there is another bug in tx_complete() of usbnet, the line below

                urb->dev = NULL;

should be removed to avoid possible oops or memory leak in unlink path.

Please test the patch if you can reproduce the problem.

[1],
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 59681f0..4f4e028 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -592,7 +592,9 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
 		spin_unlock_irqrestore(&q->lock, flags);
 		// during some PM-driven resume scenarios,
 		// these (async) unlinks complete immediately
+		local_bh_disable();
 		retval = usb_unlink_urb (urb);
+		local_bh_enable();
 		if (retval != -EINPROGRESS && retval != 0)
 			netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
 		else
@@ -1028,7 +1030,6 @@ static void tx_complete (struct urb *urb)
 	}

 	usb_autopm_put_interface_async(dev->intf);
-	urb->dev = NULL;
 	entry->state = tx_done;
 	defer_bh(dev, skb, &dev->txq);
 }


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-03-21  1:04     ` Ming Lei
@ 2012-03-21  1:34       ` Ming Lei
  2012-03-21 14:35         ` Alan Stern
  2012-03-21 14:44       ` Greg KH
  1 sibling, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-03-21  1:34 UTC (permalink / raw)
  To: Alan Stern; +Cc: netdev, linux-usb, Fedora Kernel Team, Dave Jones

On Wed, Mar 21, 2012 at 9:04 AM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Tue, Mar 20, 2012 at 5:40 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> Hi,
>>
>> On Mon, Mar 19, 2012 at 11:12 PM, Dave Jones <davej@redhat.com> wrote:
>>> We've had two reports of this use after free in Fedora now recently..
>>
>> Could you provide output of 'dmesg' and 'lsusb -v' from the reported machine?
>
> Looks I have figured out why your problem is triggered.
>
> If the URB being unlinked is freed before usb_put_dev
> inside usb_hcd_unlink_urb, the use-after-free will be triggered.
> And the below patch[1] should fix the problem.
>
> Also there is another bug in tx_complete() of usbnet, the line below
>
>                urb->dev = NULL;
>
> should be removed to avoid possible oops or memory leak in unlink path.
>
> Please test the patch if you can reproduce the problem.
>
> [1],
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 59681f0..4f4e028 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -592,7 +592,9 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>                spin_unlock_irqrestore(&q->lock, flags);
>                // during some PM-driven resume scenarios,
>                // these (async) unlinks complete immediately
> +               local_bh_disable();
>                retval = usb_unlink_urb (urb);
> +               local_bh_enable();

Looks it is a general issue about usb_hcd_unlink_urb.

Alan, how about increasing URB reference count before calling unlink1
inside usb_hcd_unlink_urb to fix this kind of problem?

Thanks,
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-03-21  1:34       ` Ming Lei
@ 2012-03-21 14:35         ` Alan Stern
       [not found]           ` <Pine.LNX.4.44L0.1203211031490.1369-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Stern @ 2012-03-21 14:35 UTC (permalink / raw)
  To: Ming Lei; +Cc: netdev, linux-usb, Fedora Kernel Team, Dave Jones

On Wed, 21 Mar 2012, Ming Lei wrote:

> Looks it is a general issue about usb_hcd_unlink_urb.
> 
> Alan, how about increasing URB reference count before calling unlink1
> inside usb_hcd_unlink_urb to fix this kind of problem?

No, that won't fix the problem.  The URB could complete and be
deallocated even before usb_hcd_unlink_urb() is called, so nothing that
function can do will prevent the error.

It is the caller's responsibility to make sure that the URB does not 
get freed before usb_unlink_urb() or usb_kill_urb() returns.  We 
probably should mention this in the kerneldoc...

Alan Stern

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-03-21  1:04     ` Ming Lei
  2012-03-21  1:34       ` Ming Lei
@ 2012-03-21 14:44       ` Greg KH
  2012-03-21 15:07         ` Ming Lei
  1 sibling, 1 reply; 40+ messages in thread
From: Greg KH @ 2012-03-21 14:44 UTC (permalink / raw)
  To: Ming Lei; +Cc: Dave Jones, netdev, linux-usb, Fedora Kernel Team

On Wed, Mar 21, 2012 at 09:04:15AM +0800, Ming Lei wrote:
> On Tue, Mar 20, 2012 at 5:40 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> > Hi,
> >
> > On Mon, Mar 19, 2012 at 11:12 PM, Dave Jones <davej@redhat.com> wrote:
> >> We've had two reports of this use after free in Fedora now recently..
> >
> > Could you provide output of 'dmesg' and 'lsusb -v' from the reported machine?
> 
> Looks I have figured out why your problem is triggered.
> 
> If the URB being unlinked is freed before usb_put_dev
> inside usb_hcd_unlink_urb, the use-after-free will be triggered.
> And the below patch[1] should fix the problem.

With the reference counting we have, how can the urb be freed at this
point in time?  Is the driver doing wierd things with the urb reference
counts?

> Also there is another bug in tx_complete() of usbnet, the line below
> 
>                 urb->dev = NULL;
> 
> should be removed to avoid possible oops or memory leak in unlink path.
> 
> Please test the patch if you can reproduce the problem.
> 
> [1],
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 59681f0..4f4e028 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -592,7 +592,9 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>  		spin_unlock_irqrestore(&q->lock, flags);
>  		// during some PM-driven resume scenarios,
>  		// these (async) unlinks complete immediately
> +		local_bh_disable();
>  		retval = usb_unlink_urb (urb);
> +		local_bh_enable();

That doesn't seem right, as you point out in your follow-up message.
This shouldn't be needed, unless you are doing some really wierd things
with the urb :(

greg k-h

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]           ` <Pine.LNX.4.44L0.1203211031490.1369-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
@ 2012-03-21 15:02             ` Ming Lei
       [not found]               ` <CACVXFVP+U1k7JFTmbabF-k8F3bO9zc58c3tLG6=1nQPcrR9p1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-03-21 15:02 UTC (permalink / raw)
  To: Alan Stern
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team, Dave Jones

On Wed, Mar 21, 2012 at 10:35 PM, Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz@public.gmane.org> wrote:
> On Wed, 21 Mar 2012, Ming Lei wrote:
>
>> Looks it is a general issue about usb_hcd_unlink_urb.
>>
>> Alan, how about increasing URB reference count before calling unlink1
>> inside usb_hcd_unlink_urb to fix this kind of problem?
>
> No, that won't fix the problem.  The URB could complete and be
> deallocated even before usb_hcd_unlink_urb() is called, so nothing that
> function can do will prevent the error.

IMO, driver should not call usb_hcd_unlink_urb after urb is freed from
the driver,
but this problem is that URB may be freed during usb_hcd_unlink_urb.

In fact, it is allowed that usb_free_urb is called inside .complete handler,
at least as said in Documentation/URB.txt:

         "You may free an urb that you've submitted,..."

So looks reasonable to increase the URB reference count before calling
unlink1(), just like that done inside usb_hcd_flush_endpoint().  And I
think it is a general solution for avoiding this kind of problem.

>
> It is the caller's responsibility to make sure that the URB does not
> get freed before usb_unlink_urb() or usb_kill_urb() returns.  We
> probably should mention this in the kerneldoc...

If so, looks it is a bit contrary with Documentation/URB.txt, also
this may add extra constraint(maybe unnecessary) to the driver.


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-03-21 14:44       ` Greg KH
@ 2012-03-21 15:07         ` Ming Lei
  0 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2012-03-21 15:07 UTC (permalink / raw)
  To: Greg KH; +Cc: Dave Jones, netdev, linux-usb, Fedora Kernel Team

On Wed, Mar 21, 2012 at 10:44 PM, Greg KH <greg@kroah.com> wrote:
> On Wed, Mar 21, 2012 at 09:04:15AM +0800, Ming Lei wrote:
>> On Tue, Mar 20, 2012 at 5:40 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> > Hi,
>> >
>> > On Mon, Mar 19, 2012 at 11:12 PM, Dave Jones <davej@redhat.com> wrote:
>> >> We've had two reports of this use after free in Fedora now recently..
>> >
>> > Could you provide output of 'dmesg' and 'lsusb -v' from the reported machine?
>>
>> Looks I have figured out why your problem is triggered.
>>
>> If the URB being unlinked is freed before usb_put_dev
>> inside usb_hcd_unlink_urb, the use-after-free will be triggered.
>> And the below patch[1] should fix the problem.
>
> With the reference counting we have, how can the urb be freed at this
> point in time?  Is the driver doing wierd things with the urb reference
> counts?

The problem is that the .complete may schedule a tasklet to
free the completed URB. And the .complete may be run inside
unlink path, so the use-after-free will be triggered if the
tasklet is excuted before usb_put_dev inside usb_hcd_unlink_urb.

>
>> Also there is another bug in tx_complete() of usbnet, the line below
>>
>>                 urb->dev = NULL;
>>
>> should be removed to avoid possible oops or memory leak in unlink path.
>>
>> Please test the patch if you can reproduce the problem.
>>
>> [1],
>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> index 59681f0..4f4e028 100644
>> --- a/drivers/net/usb/usbnet.c
>> +++ b/drivers/net/usb/usbnet.c
>> @@ -592,7 +592,9 @@ static int unlink_urbs (struct usbnet *dev, struct
>> sk_buff_head *q)
>>               spin_unlock_irqrestore(&q->lock, flags);
>>               // during some PM-driven resume scenarios,
>>               // these (async) unlinks complete immediately
>> +             local_bh_disable();
>>               retval = usb_unlink_urb (urb);
>> +             local_bh_enable();
>
> That doesn't seem right, as you point out in your follow-up message.
> This shouldn't be needed, unless you are doing some really wierd things
> with the urb :(

Looks the driver doesn't do any wierd things, as said above.


Thanks,
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]               ` <CACVXFVP+U1k7JFTmbabF-k8F3bO9zc58c3tLG6=1nQPcrR9p1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-03-21 16:12                 ` Alan Stern
       [not found]                   ` <Pine.LNX.4.44L0.1203211201560.1369-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Alan Stern @ 2012-03-21 16:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team, Dave Jones

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: TEXT/PLAIN; charset=UTF-8, Size: 2679 bytes --]

On Wed, 21 Mar 2012, Ming Lei wrote:

> On Wed, Mar 21, 2012 at 10:35 PM, Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLrNAH6kLmebB@public.gmane.orgdu> wrote:
> > On Wed, 21 Mar 2012, Ming Lei wrote:
> >
> >> Looks it is a general issue about usb_hcd_unlink_urb.
> >>
> >> Alan, how about increasing URB reference count before calling unlink1
> >> inside usb_hcd_unlink_urb to fix this kind of problem?
> >
> > No, that won't fix the problem.  The URB could complete and be
> > deallocated even before usb_hcd_unlink_urb() is called, so nothing that
> > function can do will prevent the error.
> 
> IMO, driver should not call usb_hcd_unlink_urb after urb is freed from
> the driver,
> but this problem is that URB may be freed during usb_hcd_unlink_urb.

Drivers don't call usb_hcd_unlink_urb; they call usb_unlink_urb.  This 
sort of thing can happen:

	Driver			Interrupt handler
	------			-----------------
	call usb_unlink_urb
				URB completion interrupt occurs
				 call usb_hcd_giveback_urb
				  completion routine calls usb_free_urb
				  URB is deallocated
	 call usb_hcd_unlink_urb
	  try to increment URB's refcount
	  oops because URB is gone

> In fact, it is allowed that usb_free_urb is called inside .complete handler,
> at least as said in Documentation/URB.txt:
> 
>          "You may free an urb that you've submitted,..."
> 
> So looks reasonable to increase the URB reference count before calling
> unlink1(), just like that done inside usb_hcd_flush_endpoint().  And I
> think it is a general solution for avoiding this kind of problem.

It will not solve the problem illustrated above.  The driver must avoid
freeing the URB before usb_unlink_urb returns.  In this case,
increasing the refcount around the unlink call would work.

> > It is the caller's responsibility to make sure that the URB does not
> > get freed before usb_unlink_urb() or usb_kill_urb() returns.  We
> > probably should mention this in the kerneldoc...
> 
> If so, looks it is a bit contrary with Documentation/URB.txt, also
> this may add extra constraint(maybe unnecessary) to the driver.

It's not contradictory.  You may indeed free an URB that you have 
submitted, so long as you don't free it while usb_unlink_urb (or 
related routines like usb_kill_urb) is running.

The extra constraint on the driver is indeed necessary.  However the 
driver can avoid complications by using anchors.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                   ` <Pine.LNX.4.44L0.1203211201560.1369-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
@ 2012-03-21 16:22                     ` Ming Lei
       [not found]                       ` <CACVXFVOVjnWjqpKxbU98DAyUC_OSb8ZL-3WcyYuFXgPJn5UyuA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-03-21 16:22 UTC (permalink / raw)
  To: Alan Stern
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team, Dave Jones

On Thu, Mar 22, 2012 at 12:12 AM, Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz@public.gmane.org> wrote:
> On Wed, 21 Mar 2012, Ming Lei wrote:
>
>> On Wed, Mar 21, 2012 at 10:35 PM, Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLg@public.gmane.orgedu> wrote:
>> > On Wed, 21 Mar 2012, Ming Lei wrote:
>> >
>> >> Looks it is a general issue about usb_hcd_unlink_urb.
>> >>
>> >> Alan, how about increasing URB reference count before calling unlink1
>> >> inside usb_hcd_unlink_urb to fix this kind of problem?
>> >
>> > No, that won't fix the problem.  The URB could complete and be
>> > deallocated even before usb_hcd_unlink_urb() is called, so nothing that
>> > function can do will prevent the error.
>>
>> IMO, driver should not call usb_hcd_unlink_urb after urb is freed from
>> the driver,
>> but this problem is that URB may be freed during usb_hcd_unlink_urb.
>
> Drivers don't call usb_hcd_unlink_urb; they call usb_unlink_urb.  This
> sort of thing can happen:
>
>        Driver                  Interrupt handler
>        ------                  -----------------
>        call usb_unlink_urb
>                                URB completion interrupt occurs
>                                 call usb_hcd_giveback_urb
>                                  completion routine calls usb_free_urb
>                                  URB is deallocated
>         call usb_hcd_unlink_urb
>          try to increment URB's refcount
>          oops because URB is gone

Got it, thanks for your detailed explanation.

>> In fact, it is allowed that usb_free_urb is called inside .complete handler,
>> at least as said in Documentation/URB.txt:
>>
>>          "You may free an urb that you've submitted,..."
>>
>> So looks reasonable to increase the URB reference count before calling
>> unlink1(), just like that done inside usb_hcd_flush_endpoint().  And I
>> think it is a general solution for avoiding this kind of problem.
>
> It will not solve the problem illustrated above.  The driver must avoid
> freeing the URB before usb_unlink_urb returns.  In this case,
> increasing the refcount around the unlink call would work.

Yes, it is the right fix.

>> > It is the caller's responsibility to make sure that the URB does not
>> > get freed before usb_unlink_urb() or usb_kill_urb() returns.  We
>> > probably should mention this in the kerneldoc...
>>
>> If so, looks it is a bit contrary with Documentation/URB.txt, also
>> this may add extra constraint(maybe unnecessary) to the driver.
>
> It's not contradictory.  You may indeed free an URB that you have
> submitted, so long as you don't free it while usb_unlink_urb (or
> related routines like usb_kill_urb) is running.

Maybe it should be documented.

So looks the correct fix should be below:

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 4b8b52c..e36a821 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -588,7 +588,7 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)

 		entry = (struct skb_data *) skb->cb;
 		urb = entry->urb;
-
+		usb_get_urb(urb);
 		spin_unlock_irqrestore(&q->lock, flags);
 		// during some PM-driven resume scenarios,
 		// these (async) unlinks complete immediately
@@ -597,6 +597,7 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
 			netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
 		else
 			count++;
+		usb_put_urb(urb);
 		spin_lock_irqsave(&q->lock, flags);
 	}
 	spin_unlock_irqrestore (&q->lock, flags);
@@ -1028,7 +1029,6 @@ static void tx_complete (struct urb *urb)
 	}

 	usb_autopm_put_interface_async(dev->intf);
-	urb->dev = NULL;
 	entry->state = tx_done;
 	defer_bh(dev, skb, &dev->txq);
 }



Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                       ` <CACVXFVOVjnWjqpKxbU98DAyUC_OSb8ZL-3WcyYuFXgPJn5UyuA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-03-21 17:30                         ` Alan Stern
  2012-03-22  9:08                         ` Oliver Neukum
  1 sibling, 0 replies; 40+ messages in thread
From: Alan Stern @ 2012-03-21 17:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team, Dave Jones

On Thu, 22 Mar 2012, Ming Lei wrote:

> So looks the correct fix should be below:
> 
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 4b8b52c..e36a821 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -588,7 +588,7 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
> 
>  		entry = (struct skb_data *) skb->cb;
>  		urb = entry->urb;
> -
> +		usb_get_urb(urb);
>  		spin_unlock_irqrestore(&q->lock, flags);
>  		// during some PM-driven resume scenarios,
>  		// these (async) unlinks complete immediately
> @@ -597,6 +597,7 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>  			netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>  		else
>  			count++;
> +		usb_put_urb(urb);
>  		spin_lock_irqsave(&q->lock, flags);
>  	}
>  	spin_unlock_irqrestore (&q->lock, flags);
> @@ -1028,7 +1029,6 @@ static void tx_complete (struct urb *urb)
>  	}
> 
>  	usb_autopm_put_interface_async(dev->intf);
> -	urb->dev = NULL;
>  	entry->state = tx_done;
>  	defer_bh(dev, skb, &dev->txq);
>  }

Yes, that looks about right.  But I'm not familiar with the details of 
usbnet.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                       ` <CACVXFVOVjnWjqpKxbU98DAyUC_OSb8ZL-3WcyYuFXgPJn5UyuA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-03-21 17:30                         ` Alan Stern
@ 2012-03-22  9:08                         ` Oliver Neukum
       [not found]                           ` <201203221008.46882.oneukum-l3A5Bk7waGM@public.gmane.org>
  1 sibling, 1 reply; 40+ messages in thread
From: Oliver Neukum @ 2012-03-22  9:08 UTC (permalink / raw)
  To: Ming Lei
  Cc: Alan Stern, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Fedora Kernel Team, Dave Jones

Am Mittwoch, 21. März 2012, 17:22:59 schrieb Ming Lei:

> -
> +               usb_get_urb(urb);
>                 spin_unlock_irqrestore(&q->lock, flags);
>                 // during some PM-driven resume scenarios,
>                 // these (async) unlinks complete immediately
> @@ -597,6 +597,7 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>                         netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>                 else
>                         count++;
> +               usb_put_urb(urb);


Hi,

this looks good, but could you add a comment explaining the reason for
taking a reference?

	Regards
		Oliver
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                           ` <201203221008.46882.oneukum-l3A5Bk7waGM@public.gmane.org>
@ 2012-03-22  9:30                             ` Ming Lei
  2012-03-22  9:57                               ` Oliver Neukum
  2012-04-20 13:37                               ` Huajun Li
  0 siblings, 2 replies; 40+ messages in thread
From: Ming Lei @ 2012-03-22  9:30 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Alan Stern, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, Fedora Kernel Team, Dave Jones

On Thu, Mar 22, 2012 at 5:08 PM, Oliver Neukum <oneukum-l3A5Bk7waGM@public.gmane.org> wrote:
>
> this looks good, but could you add a comment explaining the reason for
> taking a reference?

OK, I will post a formal one if you have no objection on the below.

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 4b8b52c..febfdce 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -589,6 +589,14 @@ static int unlink_urbs (struct usbnet *dev,
struct sk_buff_head *q)
 		entry = (struct skb_data *) skb->cb;
 		urb = entry->urb;

+		/*
+		 * Get a reference count of the URB to avoid it to be
+		 * freed during usb_unlink_urb, which may trigger
+		 * use-after-free problem inside usb_unlink_urb since
+		 * usb_unlink_urb is always racing with .complete
+		 * handler(include defer_bh).
+		 */
+		usb_get_urb(urb);
 		spin_unlock_irqrestore(&q->lock, flags);
 		// during some PM-driven resume scenarios,
 		// these (async) unlinks complete immediately
@@ -597,6 +605,7 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
 			netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
 		else
 			count++;
+		usb_put_urb(urb);
 		spin_lock_irqsave(&q->lock, flags);
 	}
 	spin_unlock_irqrestore (&q->lock, flags);


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-03-22  9:30                             ` Ming Lei
@ 2012-03-22  9:57                               ` Oliver Neukum
  2012-04-20 13:37                               ` Huajun Li
  1 sibling, 0 replies; 40+ messages in thread
From: Oliver Neukum @ 2012-03-22  9:57 UTC (permalink / raw)
  To: Ming Lei; +Cc: Alan Stern, netdev, linux-usb, Fedora Kernel Team, Dave Jones

Am Donnerstag, 22. März 2012, 10:30:36 schrieb Ming Lei:
> On Thu, Mar 22, 2012 at 5:08 PM, Oliver Neukum <oneukum@suse.de> wrote:
> >
> > this looks good, but could you add a comment explaining the reason for
> > taking a reference?
> 
> OK, I will post a formal one if you have no objection on the below.

Good patch :-)

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-03-22  9:30                             ` Ming Lei
  2012-03-22  9:57                               ` Oliver Neukum
@ 2012-04-20 13:37                               ` Huajun Li
  2012-04-20 14:22                                 ` Ming Lei
       [not found]                                 ` <CA+v9cxawVwKakF6c_RpAw2XUGWcbqd8M+ZJqyq76Au9rmNosmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 2 replies; 40+ messages in thread
From: Huajun Li @ 2012-04-20 13:37 UTC (permalink / raw)
  To: Ming Lei, Oliver Neukum, Alan Stern, Dave Jones
  Cc: netdev, linux-usb, Fedora Kernel Team, Huajun Li

On Thu, Mar 22, 2012 at 5:30 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Thu, Mar 22, 2012 at 5:08 PM, Oliver Neukum <oneukum@suse.de> wrote:
>>
>> this looks good, but could you add a comment explaining the reason for
>> taking a reference?
>
> OK, I will post a formal one if you have no objection on the below.
>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index 4b8b52c..febfdce 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -589,6 +589,14 @@ static int unlink_urbs (struct usbnet *dev,
> struct sk_buff_head *q)
>                entry = (struct skb_data *) skb->cb;
>                urb = entry->urb;
>
> +               /*
> +                * Get a reference count of the URB to avoid it to be
> +                * freed during usb_unlink_urb, which may trigger
> +                * use-after-free problem inside usb_unlink_urb since
> +                * usb_unlink_urb is always racing with .complete
> +                * handler(include defer_bh).
> +                */
> +               usb_get_urb(urb);
>                spin_unlock_irqrestore(&q->lock, flags);
>                // during some PM-driven resume scenarios,
>                // these (async) unlinks complete immediately
> @@ -597,6 +605,7 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>                        netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>                else
>                        count++;
> +               usb_put_urb(urb);
>                spin_lock_irqsave(&q->lock, flags);
>        }
>        spin_unlock_irqrestore (&q->lock, flags);
>
>


Above patch has already been integrated to mainline. However, maybe
there still exists another potentail use-after-free issue, here is a
case:
      After release the lock in unlink_urbs(), defer_bh() may move
current skb from rxq/txq to dev->done queue, even cause the skb be
released. Then in next loop cycle, it can't refer to expected skb, and
may Oops again.

To easily reproduce it, in unlink_urbs(), you can delay a short time
after usb_put_urb(urb), then disconnect your device while transferring
data, and repeat it times you will find errors on your screen.

Following is a draft patch to guarantee the queue consistent, and
refer to expected skb in each loop cycle:

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index b7b3f5b..6da0141 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -578,16 +578,28 @@ EXPORT_SYMBOL_GPL(usbnet_purge_paused_rxq);
 static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
 {
 	unsigned long		flags;
-	struct sk_buff		*skb, *skbnext;
+	struct sk_buff		*skb;
 	int			count = 0;

 	spin_lock_irqsave (&q->lock, flags);
-	skb_queue_walk_safe(q, skb, skbnext) {
+	while (!skb_queue_empty(q)) {
 		struct skb_data		*entry;
 		struct urb		*urb;
 		int			retval;

-		entry = (struct skb_data *) skb->cb;
+		skb_queue_walk(q, skb) {
+			entry = (struct skb_data *)skb->cb;
+			if (entry->state == rx_done ||
+				entry->state == tx_done ||
+				entry->state == rx_cleanup)
+				continue;
+			else
+				break;
+		}
+
+		if (skb == (struct sk_buff *)(q))
+			break;
+
 		urb = entry->urb;


Thanks,
--Huajun Li

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-20 13:37                               ` Huajun Li
@ 2012-04-20 14:22                                 ` Ming Lei
       [not found]                                   ` <CACVXFVM5YBJkDZddeGi1_MPY7EqEV_wtoFy-NtBHYA6rxez0jg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
       [not found]                                 ` <CA+v9cxawVwKakF6c_RpAw2XUGWcbqd8M+ZJqyq76Au9rmNosmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-04-20 14:22 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones, netdev, linux-usb,
	Fedora Kernel Team

On Fri, Apr 20, 2012 at 9:37 PM, Huajun Li <huajun.li.lee@gmail.com> wrote:
>
> Above patch has already been integrated to mainline. However, maybe
> there still exists another potentail use-after-free issue, here is a
> case:
>      After release the lock in unlink_urbs(), defer_bh() may move
> current skb from rxq/txq to dev->done queue, even cause the skb be
> released. Then in next loop cycle, it can't refer to expected skb, and
> may Oops again.

Could you explain in a bit detail? Why can't the expected skb be refered
to in next loop?

>
> To easily reproduce it, in unlink_urbs(), you can delay a short time
> after usb_put_urb(urb), then disconnect your device while transferring
> data, and repeat it times you will find errors on your screen.

Could you post out the error log?

>
> Following is a draft patch to guarantee the queue consistent, and
> refer to expected skb in each loop cycle:
>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index b7b3f5b..6da0141 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -578,16 +578,28 @@ EXPORT_SYMBOL_GPL(usbnet_purge_paused_rxq);
>  static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
>  {
>        unsigned long           flags;
> -       struct sk_buff          *skb, *skbnext;
> +       struct sk_buff          *skb;
>        int                     count = 0;
>
>        spin_lock_irqsave (&q->lock, flags);
> -       skb_queue_walk_safe(q, skb, skbnext) {
> +       while (!skb_queue_empty(q)) {
>                struct skb_data         *entry;
>                struct urb              *urb;
>                int                     retval;
>
> -               entry = (struct skb_data *) skb->cb;
> +               skb_queue_walk(q, skb) {
> +                       entry = (struct skb_data *)skb->cb;
> +                       if (entry->state == rx_done ||
> +                               entry->state == tx_done ||
> +                               entry->state == rx_cleanup)

Maybe it is not necessary, if the state has been changed to  rx_done
or tx_done or rx_cleanup, it means the URB referenced to by the skb
has been completed, and the coming usb_unlink_urb can handle the
case correctly.

> +                               continue;
> +                       else
> +                               break;
> +               }
> +
> +               if (skb == (struct sk_buff *)(q))
> +                       break;
> +
>                urb = entry->urb;
>
>
> Thanks,
> --Huajun Li


Thanks,
-- 
Ming Lei

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                   ` <CACVXFVM5YBJkDZddeGi1_MPY7EqEV_wtoFy-NtBHYA6rxez0jg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-20 14:56                                     ` Huajun Li
       [not found]                                       ` <CA+v9cxZWtGbz6uCSysVbAc1hT27rCiuJXzcvSiTxH-zuQYnrZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-04-21  1:49                                       ` Ming Lei
  0 siblings, 2 replies; 40+ messages in thread
From: Huajun Li @ 2012-04-20 14:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Fri, Apr 20, 2012 at 10:22 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Fri, Apr 20, 2012 at 9:37 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>
>> Above patch has already been integrated to mainline. However, maybe
>> there still exists another potentail use-after-free issue, here is a
>> case:
>>      After release the lock in unlink_urbs(), defer_bh() may move
>> current skb from rxq/txq to dev->done queue, even cause the skb be
>> released. Then in next loop cycle, it can't refer to expected skb, and
>> may Oops again.
>
> Could you explain in a bit detail? Why can't the expected skb be refered
> to in next loop?


      unlink_urbs()                                           complete handler
--------------------------------------
-------------------------------------------------
     spin_unlock_irqrestore()
                                                                  rx_complete()
                                                                  derver_bh()

 __skb_unlink()

 __skb_queue_tail(&dev->done, skb)   =======> skb is moved to
dev->done, and can be freed by usbnet_bh()
      skb_queue_walk_safe()
                      tmp = skb->next   ===> refer to freed skb

>
>>
>> To easily reproduce it, in unlink_urbs(), you can delay a short time
>> after usb_put_urb(urb), then disconnect your device while transferring
>> data, and repeat it times you will find errors on your screen.
>
> Could you post out the error log?

The log like this:
===============================================================
[45219.230127] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[45219.230192] CPU 1
[45219.230208] Modules linked in: cdc_ether usbnet(O) bluetooth
dm_crypt binfmt_misc snd_hda_codec_analog snd_hda_intel snd_hda_codec
snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event hp_wmi
ppdev sparse_keymap i915 snd_seq snd_timer coretemp microcode
snd_seq_device psmouse tpm_infineon serio_raw snd soundcore
drm_kms_helper snd_page_alloc tpm_tis tpm parport_pc tpm_bios drm
i2c_algo_bit video lp parport sg ehci_hcd uhci_hcd sr_mod sd_mod
usbcore e1000e usb_common floppy
[45219.230658]
[45219.230673] Pid: 200, comm: khubd Tainted: G          IO 3.4.0-rc3
#56 Hewlett-Packard HP Compaq dc7800p Convertible Minitower/0AACh
[45219.230761] RIP: 0010:[<ffffffffa0094e7d>]  [<ffffffffa0094e7d>]
usb_get_urb+0x1d/0x70 [usbcore]
[45219.230837] RSP: 0018:ffff8800554df8e0  EFLAGS: 00010002
[45219.230874] RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000008760
[45219.230920] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000202 RDI: 6b6b6b6b6b6b6b6b
[45219.230966] RBP: ffff8800554df8f0 R08: 0000000304b1e1be R09: 0000000000000001
[45219.231012] R10: 0000000000000001 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b
[45219.231058] R13: ffff88000650e330 R14: ffff88000650e318 R15: 0000000000000001
[45219.231105] FS:  0000000000000000(0000) GS:ffff88007a400000(0000)
knlGS:0000000000000000
[45219.231158] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[45219.231196] CR2: 00007f1bd331c3e0 CR3: 000000000656b000 CR4: 00000000000007e0
[45219.231243] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[45219.231289] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[45219.231335] Process khubd (pid: 200, threadinfo ffff8800554de000,
task ffff880055b4a410)
[45219.231387] Stack:
[45219.231405]  ffff8800567daf40 ffff88000650e330 ffff8800554df940
ffffffffa053ee1a
[45219.231469]  0000000000000202 ffff88000650e2a0 ffff8800554df940
ffff88000650e140
[45219.231534]  ffff880055b4a410 ffff88000650e318 ffff88000650e378
ffff88000650e2a0
[45219.231597] Call Trace:
[45219.231622]  [<ffffffffa053ee1a>] unlink_urbs.isra.14+0xca/0x170 [usbnet]
[45219.231670]  [<ffffffffa053f05b>] usbnet_terminate_urbs+0x19b/0x350 [usbnet]
[45219.231721]  [<ffffffff810d1700>] ? try_to_wake_up+0x580/0x580
[45219.231762]  [<ffffffff810c24ba>] ? __wake_up+0x3a/0x90
[45219.231801]  [<ffffffffa053ff90>] usbnet_stop+0x190/0x270 [usbnet]
[45219.231846]  [<ffffffff817b1226>] __dev_close_many+0xd6/0x160
[45219.231886]  [<ffffffff817b1368>] dev_close_many+0xb8/0x160
[45219.231926]  [<ffffffff817b1518>] rollback_registered_many+0x108/0x390
[45219.231972]  [<ffffffff817b1874>] rollback_registered+0x44/0x70
[45219.232013]  [<ffffffff817b1920>] unregister_netdevice_queue+0x80/0xf0
[45219.232058]  [<ffffffff817b1ae0>] unregister_netdev+0x30/0x50
[45219.232100]  [<ffffffffa053e61a>] usbnet_disconnect+0x8a/0x170 [usbnet]
[45219.232155]  [<ffffffffa009adc5>] usb_unbind_interface+0x65/0x230 [usbcore]
[45219.232205]  [<ffffffff8163cda7>] __device_release_driver+0x157/0x170
[45219.232249]  [<ffffffff8163cdfe>] device_release_driver+0x3e/0x60
[45219.232291]  [<ffffffff8163c27f>] bus_remove_device+0x15f/0x210
[45219.232331]  [<ffffffff81637e6d>] device_del+0x1ad/0x2b0
[45219.232379]  [<ffffffffa0097970>] usb_disable_device+0x100/0x340 [usbcore]
[45219.232435]  [<ffffffffa008a417>] usb_disconnect+0xf7/0x220 [usbcore]
[45219.232487]  [<ffffffffa008e411>] hub_thread+0x1321/0x21e0 [usbcore]
[45219.232535]  [<ffffffff810b3780>] ? wake_up_bit+0x50/0x50
[45219.232581]  [<ffffffffa008d0f0>] ? usb_remote_wakeup+0xb0/0xb0 [usbcore]
[45219.232628]  [<ffffffff810b2b9f>] kthread+0xdf/0xf0
[45219.232664]  [<ffffffff819a5574>] kernel_thread_helper+0x4/0x10
[45219.232706]  [<ffffffff819970b0>] ? retint_restore_args+0x13/0x13
[45219.232749]  [<ffffffff810b2ac0>] ? __init_kthread_worker+0x80/0x80
[45219.232791]  [<ffffffff819a5570>] ? gs_change+0x13/0x13
[45219.232826] Code: ff ff e9 d4 fb ff ff 0f 1f 80 00 00 00 00 55 48
89 e5 48 83 ec 10 66 66 66 66 90 48 83 05 2b aa 02 00 01 48 85 ff 48
89 f8 74 19 <8b> 17 48 83 05 21 aa 02 00 01 85 d2 74 0d f0 ff 00 48 83
05 2a
[45219.233184] RIP  [<ffffffffa0094e7d>] usb_get_urb+0x1d/0x70 [usbcore]
[45219.233240]  RSP <ffff8800554df8e0>
[45219.233268] ---[ end trace a7919e7f17c0a727 ]---
[45222.870031] usb0: no IPv6 routers present

>
>>
>> Following is a draft patch to guarantee the queue consistent, and
>> refer to expected skb in each loop cycle:
>>
>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> index b7b3f5b..6da0141 100644
>> --- a/drivers/net/usb/usbnet.c
>> +++ b/drivers/net/usb/usbnet.c
>> @@ -578,16 +578,28 @@ EXPORT_SYMBOL_GPL(usbnet_purge_paused_rxq);
>>  static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
>>  {
>>        unsigned long           flags;
>> -       struct sk_buff          *skb, *skbnext;
>> +       struct sk_buff          *skb;
>>        int                     count = 0;
>>
>>        spin_lock_irqsave (&q->lock, flags);
>> -       skb_queue_walk_safe(q, skb, skbnext) {
>> +       while (!skb_queue_empty(q)) {
>>                struct skb_data         *entry;
>>                struct urb              *urb;
>>                int                     retval;
>>
>> -               entry = (struct skb_data *) skb->cb;
>> +               skb_queue_walk(q, skb) {
>> +                       entry = (struct skb_data *)skb->cb;
>> +                       if (entry->state == rx_done ||
>> +                               entry->state == tx_done ||
>> +                               entry->state == rx_cleanup)
>
> Maybe it is not necessary, if the state has been changed to  rx_done
> or tx_done or rx_cleanup, it means the URB referenced to by the skb
> has been completed, and the coming usb_unlink_urb can handle the
> case correctly.
>

If its state is x_done/tx_done/rx_cleanup, that means the the skb will
be released soon, right? If so, it should avoid calling
usb_unlink_urb().

>> +                               continue;
>> +                       else
>> +                               break;
>> +               }
>> +
>> +               if (skb == (struct sk_buff *)(q))
>> +                       break;
>> +
>>                urb = entry->urb;
>>
>>
>> Thanks,
>> --Huajun Li
>
>
> Thanks,
> --
> Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                       ` <CA+v9cxZWtGbz6uCSysVbAc1hT27rCiuJXzcvSiTxH-zuQYnrZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-20 15:07                                         ` Huajun Li
  0 siblings, 0 replies; 40+ messages in thread
From: Huajun Li @ 2012-04-20 15:07 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Fri, Apr 20, 2012 at 10:56 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Fri, Apr 20, 2012 at 10:22 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Fri, Apr 20, 2012 at 9:37 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>
>>> Above patch has already been integrated to mainline. However, maybe
>>> there still exists another potentail use-after-free issue, here is a
>>> case:
>>>      After release the lock in unlink_urbs(), defer_bh() may move
>>> current skb from rxq/txq to dev->done queue, even cause the skb be
>>> released. Then in next loop cycle, it can't refer to expected skb, and
>>> may Oops again.
>>
>> Could you explain in a bit detail? Why can't the expected skb be refered
>> to in next loop?
>
>
>      unlink_urbs()                                           complete handler
> --------------------------------------
> -------------------------------------------------
>     spin_unlock_irqrestore()
>                                                                  rx_complete()
>                                                                  derver_bh()
>
>  __skb_unlink()
>
>  __skb_queue_tail(&dev->done, skb)   =======> skb is moved to
> dev->done, and can be freed by usbnet_bh()
>      skb_queue_walk_safe()
>                      tmp = skb->next   ===> refer to freed skb
>

Sorry, email client messed up these lines, resend it:

  unlink_urbs()                    complete handler
------------------------               ------------------------------
spin_unlock_irqrestore()
                                       rx_complete()
                                       derver_bh()
                                          __skb_unlink()
                                          __skb_queue_tail(&dev->done, skb)
                                          =======> skb is moved to dev->done,
                                        and can be freed by usbnet_bh()

skb_queue_walk_safe()
        tmp = skb->next   ===> refer to freed skb
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-20 14:56                                     ` Huajun Li
       [not found]                                       ` <CA+v9cxZWtGbz6uCSysVbAc1hT27rCiuJXzcvSiTxH-zuQYnrZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21  1:49                                       ` Ming Lei
  2012-04-21  2:02                                         ` Ming Lei
                                                           ` (2 more replies)
  1 sibling, 3 replies; 40+ messages in thread
From: Ming Lei @ 2012-04-21  1:49 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones, netdev, linux-usb,
	Fedora Kernel Team

On Fri, Apr 20, 2012 at 10:56 PM, Huajun Li <huajun.li.lee@gmail.com> wrote:
> On Fri, Apr 20, 2012 at 10:22 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> On Fri, Apr 20, 2012 at 9:37 PM, Huajun Li <huajun.li.lee@gmail.com> wrote:
>>>
>>> Above patch has already been integrated to mainline. However, maybe
>>> there still exists another potentail use-after-free issue, here is a
>>> case:
>>>      After release the lock in unlink_urbs(), defer_bh() may move
>>> current skb from rxq/txq to dev->done queue, even cause the skb be
>>> released. Then in next loop cycle, it can't refer to expected skb, and
>>> may Oops again.
>>
>> Could you explain in a bit detail? Why can't the expected skb be refered
>> to in next loop?
>
>
>      unlink_urbs()                                           complete handler
> --------------------------------------
> -------------------------------------------------
>     spin_unlock_irqrestore()
>                                                                  rx_complete()
>                                                                  derver_bh()
>
>  __skb_unlink()
>
>  __skb_queue_tail(&dev->done, skb)   =======> skb is moved to
> dev->done, and can be freed by usbnet_bh()
>      skb_queue_walk_safe()
>                      tmp = skb->next   ===> refer to freed skb

I see the problem, so looks skb_queue_walk_safe is not safe.
I don' know why the 2nd ' tmp = skb->next' in  skb_queue_walk_safe
is needed and it may become unsafe if skb is freed during current loop.

But removing the 2nd 'tmp = skb->next' doesn't help the problem, because
tmp still may become freed after releasing lock.

> If its state is x_done/tx_done/rx_cleanup, that means the the skb will
> be released soon, right? If so, it should avoid calling
> usb_unlink_urb().

Even though you can avoid calling unlink for completed URBs, the skbs
still may be freed in unlink path because complete handler will be triggered
by unlink and the referenced skb may be freed before next loop, so your
patch can't fix the oops.

As far as I can think of, we can hold lock of done queue to forbid skb free
during unlinking. The below patch may fix the problem, are you OK
with it?

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index db99536..a9809d4 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -581,7 +581,8 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
 	struct sk_buff		*skb, *skbnext;
 	int			count = 0;

-	spin_lock_irqsave (&q->lock, flags);
+	spin_lock_irqsave(&dev->done.lock, flags);
+	spin_lock(&q->lock);
 	skb_queue_walk_safe(q, skb, skbnext) {
 		struct skb_data		*entry;
 		struct urb		*urb;
@@ -598,7 +599,7 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
 		 * handler(include defer_bh).
 		 */
 		usb_get_urb(urb);
-		spin_unlock_irqrestore(&q->lock, flags);
+		spin_unlock(&q->lock);
 		// during some PM-driven resume scenarios,
 		// these (async) unlinks complete immediately
 		retval = usb_unlink_urb (urb);
@@ -607,9 +608,10 @@ static int unlink_urbs (struct usbnet *dev,
struct sk_buff_head *q)
 		else
 			count++;
 		usb_put_urb(urb);
-		spin_lock_irqsave(&q->lock, flags);
+		spin_lock(&q->lock);
 	}
-	spin_unlock_irqrestore (&q->lock, flags);
+	spin_unlock(&q->lock);
+	spin_unlock_irqrestore(&dev->done.lock, flags);
 	return count;
 }



Thanks,
-- 
Ming Lei

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-21  1:49                                       ` Ming Lei
@ 2012-04-21  2:02                                         ` Ming Lei
       [not found]                                         ` <CACVXFVOc0XZ+eLHGiVwKuiUResRk8Cj9MS4EPMx7k57a0tEJhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-04-21 19:23                                         ` David Miller
  2 siblings, 0 replies; 40+ messages in thread
From: Ming Lei @ 2012-04-21  2:02 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones, netdev, linux-usb,
	Fedora Kernel Team

On Sat, Apr 21, 2012 at 9:49 AM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Fri, Apr 20, 2012 at 10:56 PM, Huajun Li <huajun.li.lee@gmail.com> wrote:
>> On Fri, Apr 20, 2012 at 10:22 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>> On Fri, Apr 20, 2012 at 9:37 PM, Huajun Li <huajun.li.lee@gmail.com> wrote:
>>>>
>>>> Above patch has already been integrated to mainline. However, maybe
>>>> there still exists another potentail use-after-free issue, here is a
>>>> case:
>>>>      After release the lock in unlink_urbs(), defer_bh() may move
>>>> current skb from rxq/txq to dev->done queue, even cause the skb be
>>>> released. Then in next loop cycle, it can't refer to expected skb, and
>>>> may Oops again.
>>>
>>> Could you explain in a bit detail? Why can't the expected skb be refered
>>> to in next loop?
>>
>>
>>      unlink_urbs()                                           complete handler
>> --------------------------------------
>> -------------------------------------------------
>>     spin_unlock_irqrestore()
>>                                                                  rx_complete()
>>                                                                  derver_bh()
>>
>>  __skb_unlink()
>>
>>  __skb_queue_tail(&dev->done, skb)   =======> skb is moved to
>> dev->done, and can be freed by usbnet_bh()
>>      skb_queue_walk_safe()
>>                      tmp = skb->next   ===> refer to freed skb
>
> I see the problem, so looks skb_queue_walk_safe is not safe.
> I don' know why the 2nd ' tmp = skb->next' in  skb_queue_walk_safe
> is needed and it may become unsafe if skb is freed during current loop.
>
> But removing the 2nd 'tmp = skb->next' doesn't help the problem, because
> tmp still may become freed after releasing lock.
>
>> If its state is x_done/tx_done/rx_cleanup, that means the the skb will
>> be released soon, right? If so, it should avoid calling
>> usb_unlink_urb().
>
> Even though you can avoid calling unlink for completed URBs, the skbs
> still may be freed in unlink path because complete handler will be triggered
> by unlink and the referenced skb may be freed before next loop, so your
> patch can't fix the oops.
>
> As far as I can think of, we can hold lock of done queue to forbid skb free
> during unlinking. The below patch may fix the problem, are you OK
> with it?

Sorry, the patch still doesn't work since done.lock can't be held
before calling unlinking.

One simple solution is just always holding q->lock during the whole loop,
like before commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d
(net/usbnet: avoid recursive locking in usbnet_stop()), and take
the per cpu flag trick to avoid holding q->lock in complete handler
called by unlink, see idea in below link:

http://marc.info/?l=linux-usb&m=133491718707499&w=2

In theory, it should be simple, just take avoiding policy.

>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index db99536..a9809d4 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -581,7 +581,8 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>        struct sk_buff          *skb, *skbnext;
>        int                     count = 0;
>
> -       spin_lock_irqsave (&q->lock, flags);
> +       spin_lock_irqsave(&dev->done.lock, flags);
> +       spin_lock(&q->lock);
>        skb_queue_walk_safe(q, skb, skbnext) {
>                struct skb_data         *entry;
>                struct urb              *urb;
> @@ -598,7 +599,7 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>                 * handler(include defer_bh).
>                 */
>                usb_get_urb(urb);
> -               spin_unlock_irqrestore(&q->lock, flags);
> +               spin_unlock(&q->lock);
>                // during some PM-driven resume scenarios,
>                // these (async) unlinks complete immediately
>                retval = usb_unlink_urb (urb);
> @@ -607,9 +608,10 @@ static int unlink_urbs (struct usbnet *dev,
> struct sk_buff_head *q)
>                else
>                        count++;
>                usb_put_urb(urb);
> -               spin_lock_irqsave(&q->lock, flags);
> +               spin_lock(&q->lock);
>        }
> -       spin_unlock_irqrestore (&q->lock, flags);
> +       spin_unlock(&q->lock);
> +       spin_unlock_irqrestore(&dev->done.lock, flags);
>        return count;
>  }
>
>
>
> Thanks,
> --
> Ming Lei



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                         ` <CACVXFVOc0XZ+eLHGiVwKuiUResRk8Cj9MS4EPMx7k57a0tEJhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21  6:39                                           ` Huajun Li
       [not found]                                             ` <CA+v9cxZZpL5soSga=MX_bD45KNve-Lnr2Qb6+gr7Mv6Txyh-fA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Huajun Li @ 2012-04-21  6:39 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Sat, Apr 21, 2012 at 9:49 AM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Fri, Apr 20, 2012 at 10:56 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Fri, Apr 20, 2012 at 10:22 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> On Fri, Apr 20, 2012 at 9:37 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>
>>>> Above patch has already been integrated to mainline. However, maybe
>>>> there still exists another potentail use-after-free issue, here is a
>>>> case:
>>>>      After release the lock in unlink_urbs(), defer_bh() may move
>>>> current skb from rxq/txq to dev->done queue, even cause the skb be
>>>> released. Then in next loop cycle, it can't refer to expected skb, and
>>>> may Oops again.
>>>
>>> Could you explain in a bit detail? Why can't the expected skb be refered
>>> to in next loop?
>>
>>
>>      unlink_urbs()                                           complete handler
>> --------------------------------------
>> -------------------------------------------------
>>     spin_unlock_irqrestore()
>>                                                                  rx_complete()
>>                                                                  derver_bh()
>>
>>  __skb_unlink()
>>
>>  __skb_queue_tail(&dev->done, skb)   =======> skb is moved to
>> dev->done, and can be freed by usbnet_bh()
>>      skb_queue_walk_safe()
>>                      tmp = skb->next   ===> refer to freed skb
>
> I see the problem, so looks skb_queue_walk_safe is not safe.
> I don' know why the 2nd ' tmp = skb->next' in  skb_queue_walk_safe
> is needed and it may become unsafe if skb is freed during current loop.
>
> But removing the 2nd 'tmp = skb->next' doesn't help the problem, because
> tmp still may become freed after releasing lock.
>
>> If its state is x_done/tx_done/rx_cleanup, that means the the skb will
>> be released soon, right? If so, it should avoid calling
>> usb_unlink_urb().
>
> Even though you can avoid calling unlink for completed URBs, the skbs
> still may be freed in unlink path because complete handler will be triggered
> by unlink and the referenced skb may be freed before next loop, so your
> patch can't fix the oops.
>

Hi Ming,
     That's why my patch uses skb_queue_walk() to traverse the queue,
it guarantees the skb available in each loop.  Is this what you
expected?

     The main idea of my patch(it is based on current mainline: 3.4.0-rc3) is:
     1. If the skb in txq/rxq, then it must be available,
unlink_urbs() can refer to it safely while it holds  q->lock;
     2. If the skb in txq/rxq and its state is
rx_done/tx_done/rx_cleanup, that means the skb's URB is complete, then
don't need to unlink it again;
     3. Before releasing  q->lock in unlink_urbs(), it will increase
the URB's refercount, so even the related skb is freed in future, the
URB is still available.

Thanks,
--Huajun

> As far as I can think of, we can hold lock of done queue to forbid skb free
> during unlinking. The below patch may fix the problem, are you OK
> with it?

Just skip trying this per your following email's comments.

>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index db99536..a9809d4 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -581,7 +581,8 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>        struct sk_buff          *skb, *skbnext;
>        int                     count = 0;
>
> -       spin_lock_irqsave (&q->lock, flags);
> +       spin_lock_irqsave(&dev->done.lock, flags);
> +       spin_lock(&q->lock);
>        skb_queue_walk_safe(q, skb, skbnext) {
>                struct skb_data         *entry;
>                struct urb              *urb;
> @@ -598,7 +599,7 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>                 * handler(include defer_bh).
>                 */
>                usb_get_urb(urb);
> -               spin_unlock_irqrestore(&q->lock, flags);
> +               spin_unlock(&q->lock);
>                // during some PM-driven resume scenarios,
>                // these (async) unlinks complete immediately
>                retval = usb_unlink_urb (urb);
> @@ -607,9 +608,10 @@ static int unlink_urbs (struct usbnet *dev,
> struct sk_buff_head *q)
>                else
>                        count++;
>                usb_put_urb(urb);
> -               spin_lock_irqsave(&q->lock, flags);
> +               spin_lock(&q->lock);
>        }
> -       spin_unlock_irqrestore (&q->lock, flags);
> +       spin_unlock(&q->lock);
> +       spin_unlock_irqrestore(&dev->done.lock, flags);
>        return count;
>  }
>
>
>
> Thanks,
> --
> Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                             ` <CA+v9cxZZpL5soSga=MX_bD45KNve-Lnr2Qb6+gr7Mv6Txyh-fA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21  7:06                                               ` Ming Lei
       [not found]                                                 ` <CACVXFVN6kvXk7s4Sc0d_-yKSM=rV3qcuPPBHVZYzoQjnwkGX+Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-04-21  7:50                                                 ` Huajun Li
  0 siblings, 2 replies; 40+ messages in thread
From: Ming Lei @ 2012-04-21  7:06 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

Hi Huajun,

On Sat, Apr 21, 2012 at 2:39 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Hi Ming,
>     That's why my patch uses skb_queue_walk() to traverse the queue,
> it guarantees the skb available in each loop.  Is this what you
> expected?
>
>     The main idea of my patch(it is based on current mainline: 3.4.0-rc3) is:
>     1. If the skb in txq/rxq, then it must be available,
> unlink_urbs() can refer to it safely while it holds  q->lock;
>     2. If the skb in txq/rxq and its state is
> rx_done/tx_done/rx_cleanup, that means the skb's URB is complete, then
> don't need to unlink it again;

As I said before, at least current code is not mistaken, since unlink
can handle the case correctly.

>     3. Before releasing  q->lock in unlink_urbs(), it will increase
> the URB's refercount, so even the related skb is freed in future, the
> URB is still available.

No, increasing URB's reference count does not prevent the referenced skb
from being freed, see usbnet_bh(), so 'tmp = skb->next' in skb_queue_walk_safe
still may reference a freed pointer.

>> As far as I can think of, we can hold lock of done queue to forbid skb free
>> during unlinking. The below patch may fix the problem, are you OK
>> with it?
>
> Just skip trying this per your following email's comments.

I will prepare a new patch later, if you'd like to try it.


Thanks
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                                 ` <CACVXFVN6kvXk7s4Sc0d_-yKSM=rV3qcuPPBHVZYzoQjnwkGX+Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21  7:45                                                   ` Ming Lei
       [not found]                                                     ` <CACVXFVNc_S8pTaBqMzQZx6Dt-tSP_9iXepxJzv=iR9BFu=Tj8g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-04-21  7:45 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

Hi Huajun,

On Sat, Apr 21, 2012 at 3:06 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Just skip trying this per your following email's comments.
>
> I will prepare a new patch later, if you'd like to try it.

The below patch reverts the below commits:

       0956a8c20b23d429e79ff86d4325583fc06f9eb4
      (usbnet: increase URB reference count before usb_unlink_urb)

      4231d47e6fe69f061f96c98c30eaf9fb4c14b96d
      (net/usbnet: avoid recursive locking in usbnet_stop())

and keep holding tx/rx queue lock during unlinking, but avoid
to acquire the same queue lock inside complete handler triggered by
usb_unlink_urb.

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index db99536..effb34e 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -291,9 +291,11 @@ static void defer_bh(struct usbnet *dev, struct
sk_buff *skb, struct sk_buff_hea
 {
 	unsigned long		flags;

-	spin_lock_irqsave(&list->lock, flags);
+	if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
+		spin_lock_irqsave(&list->lock, flags);
 	__skb_unlink(skb, list);
-	spin_unlock(&list->lock);
+	if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
+		spin_unlock(&list->lock);
 	spin_lock(&dev->done.lock);
 	__skb_queue_tail(&dev->done, skb);
 	if (dev->done.qlen == 1)
@@ -345,7 +347,8 @@ static int rx_submit (struct usbnet *dev, struct
urb *urb, gfp_t flags)
 	usb_fill_bulk_urb (urb, dev->udev, dev->in,
 		skb->data, size, rx_complete, skb);

-	spin_lock_irqsave (&dev->rxq.lock, lockflags);
+	if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
+		spin_lock_irqsave (&dev->rxq.lock, lockflags);

 	if (netif_running (dev->net) &&
 	    netif_device_present (dev->net) &&
@@ -377,7 +380,8 @@ static int rx_submit (struct usbnet *dev, struct
urb *urb, gfp_t flags)
 		netif_dbg(dev, ifdown, dev->net, "rx: stopped\n");
 		retval = -ENOLINK;
 	}
-	spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
+	if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
+		spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
 	if (retval) {
 		dev_kfree_skb_any (skb);
 		usb_free_urb (urb);
@@ -582,6 +586,7 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
 	int			count = 0;

 	spin_lock_irqsave (&q->lock, flags);
+	set_cpu_bit(URB_UNLINKING, dev->cpuflags);
 	skb_queue_walk_safe(q, skb, skbnext) {
 		struct skb_data		*entry;
 		struct urb		*urb;
@@ -590,15 +595,6 @@ static int unlink_urbs (struct usbnet *dev,
struct sk_buff_head *q)
 		entry = (struct skb_data *) skb->cb;
 		urb = entry->urb;

-		/*
-		 * Get reference count of the URB to avoid it to be
-		 * freed during usb_unlink_urb, which may trigger
-		 * use-after-free problem inside usb_unlink_urb since
-		 * usb_unlink_urb is always racing with .complete
-		 * handler(include defer_bh).
-		 */
-		usb_get_urb(urb);
-		spin_unlock_irqrestore(&q->lock, flags);
 		// during some PM-driven resume scenarios,
 		// these (async) unlinks complete immediately
 		retval = usb_unlink_urb (urb);
@@ -606,9 +602,8 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
 			netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
 		else
 			count++;
-		usb_put_urb(urb);
-		spin_lock_irqsave(&q->lock, flags);
 	}
+	clear_cpu_bit(URB_UNLINKING, dev->cpuflags);
 	spin_unlock_irqrestore (&q->lock, flags);
 	return count;
 }
@@ -1283,6 +1278,7 @@ void usbnet_disconnect (struct usb_interface *intf)
 	usb_kill_urb(dev->interrupt);
 	usb_free_urb(dev->interrupt);

+	free_percpu(dev->cpuflags);
 	free_netdev(net);
 	usb_put_dev (xdev);
 }
@@ -1353,6 +1349,13 @@ usbnet_probe (struct usb_interface *udev, const
struct usb_device_id *prod)
 	SET_NETDEV_DEV(net, &udev->dev);

 	dev = netdev_priv(net);
+
+	dev->cpuflags = alloc_percpu(unsigned long);
+	if (!dev->cpuflags) {
+		status = -ENOMEM;
+		goto out1;
+	}
+
 	dev->udev = xdev;
 	dev->intf = udev;
 	dev->driver_info = info;
@@ -1396,7 +1399,7 @@ usbnet_probe (struct usb_interface *udev, const
struct usb_device_id *prod)
 	if (info->bind) {
 		status = info->bind (dev, udev);
 		if (status < 0)
-			goto out1;
+			goto out2;

 		// heuristic:  "usb%d" for links we know are two-host,
 		// else "eth%d" when there's reasonable doubt.  userspace
@@ -1465,6 +1468,8 @@ usbnet_probe (struct usb_interface *udev, const
struct usb_device_id *prod)
 out3:
 	if (info->unbind)
 		info->unbind (dev, udev);
+out2:
+	free_percpu(dev->cpuflags);
 out1:
 	free_netdev(net);
 out:
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index 605b0aa..2dc46f5 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -69,8 +69,28 @@ struct usbnet {
 #		define EVENT_DEV_WAKING 6
 #		define EVENT_DEV_ASLEEP 7
 #		define EVENT_DEV_OPEN	8
+	unsigned long __percpu *cpuflags;
+#		define URB_UNLINKING	0
 };

+static inline void set_cpu_bit(int nr, unsigned long __percpu *addr)
+{
+	unsigned long *fl = __this_cpu_ptr(addr);
+	set_bit(nr, fl);
+}
+
+static inline void clear_cpu_bit(int nr, unsigned long __percpu *addr)
+{
+	unsigned long *fl = __this_cpu_ptr(addr);
+	clear_bit(nr, fl);
+}
+
+static inline int test_cpu_bit(int nr, unsigned long __percpu *addr)
+{
+	unsigned long *fl = __this_cpu_ptr(addr);
+	return test_bit(nr, fl);
+}
+
 static inline struct usb_driver *driver_of(struct usb_interface *intf)
 {
 	return to_usb_driver(intf->dev.driver);


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-21  7:06                                               ` Ming Lei
       [not found]                                                 ` <CACVXFVN6kvXk7s4Sc0d_-yKSM=rV3qcuPPBHVZYzoQjnwkGX+Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21  7:50                                                 ` Huajun Li
       [not found]                                                   ` <CA+v9cxa+fyzMOD-=oLLczpu1FDtTwcok+y2FkC=mHzDH3JYJ2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 40+ messages in thread
From: Huajun Li @ 2012-04-21  7:50 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones, netdev, linux-usb,
	Fedora Kernel Team

On Sat, Apr 21, 2012 at 3:06 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> Hi Huajun,
>
> On Sat, Apr 21, 2012 at 2:39 PM, Huajun Li <huajun.li.lee@gmail.com> wrote:
>
>> Hi Ming,
>>     That's why my patch uses skb_queue_walk() to traverse the queue,
>> it guarantees the skb available in each loop.  Is this what you
>> expected?
>>
>>     The main idea of my patch(it is based on current mainline: 3.4.0-rc3) is:
>>     1. If the skb in txq/rxq, then it must be available,
>> unlink_urbs() can refer to it safely while it holds  q->lock;
>>     2. If the skb in txq/rxq and its state is
>> rx_done/tx_done/rx_cleanup, that means the skb's URB is complete, then
>> don't need to unlink it again;
>
> As I said before, at least current code is not mistaken, since unlink
> can handle the case correctly.
>
>>     3. Before releasing  q->lock in unlink_urbs(), it will increase
>> the URB's refercount, so even the related skb is freed in future, the
>> URB is still available.
>
> No, increasing URB's reference count does not prevent the referenced skb
> from being freed, see usbnet_bh(), so 'tmp = skb->next' in skb_queue_walk_safe
> still may reference a freed pointer.
>

Did we on the same page, could you please review my patch again?

My draft patch was based on current mainline( 3.4.0-rc3)  which had
already integrated your previous patch. And in my patch, it replaced
skb_queue_walk_safe() with skb_queue_walk(), so you will not see  'tmp
= skb->next'  any more.

>>> As far as I can think of, we can hold lock of done queue to forbid skb free
>>> during unlinking. The below patch may fix the problem, are you OK
>>> with it?
>>
>> Just skip trying this per your following email's comments.
>
> I will prepare a new patch later, if you'd like to try it.
>
>
> Thanks
> --
> Ming Lei

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                                   ` <CA+v9cxa+fyzMOD-=oLLczpu1FDtTwcok+y2FkC=mHzDH3JYJ2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21  7:56                                                     ` Ming Lei
       [not found]                                                       ` <CACVXFVOMTbq0zbmsZAt-Pyc=3oqQ=UcWV5HgNryu7s6oMhKpQg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-04-21  7:56 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

Hi Huajun,

On Sat, Apr 21, 2012 at 3:50 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:

> Did we on the same page, could you please review my patch again?
>
> My draft patch was based on current mainline( 3.4.0-rc3)  which had
> already integrated your previous patch. And in my patch, it replaced
> skb_queue_walk_safe() with skb_queue_walk(), so you will not see  'tmp
> = skb->next'  any more.

Replace skb_queue_walk_safe with skb_queue_walk doesn't improve
the problem, since 'skb = skb->next' in skb_queue_walk still may trigger
the oops, does it?


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                                     ` <CACVXFVNc_S8pTaBqMzQZx6Dt-tSP_9iXepxJzv=iR9BFu=Tj8g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21  8:00                                                       ` Huajun Li
  2012-04-22  2:19                                                       ` Huajun Li
  1 sibling, 0 replies; 40+ messages in thread
From: Huajun Li @ 2012-04-21  8:00 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Sat, Apr 21, 2012 at 3:45 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Huajun,
>
> On Sat, Apr 21, 2012 at 3:06 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Just skip trying this per your following email's comments.
>>
>> I will prepare a new patch later, if you'd like to try it.
>
> The below patch reverts the below commits:
>
>       0956a8c20b23d429e79ff86d4325583fc06f9eb4
>      (usbnet: increase URB reference count before usb_unlink_urb)
>
>      4231d47e6fe69f061f96c98c30eaf9fb4c14b96d
>      (net/usbnet: avoid recursive locking in usbnet_stop())
>
> and keep holding tx/rx queue lock during unlinking, but avoid
> to acquire the same queue lock inside complete handler triggered by
> usb_unlink_urb.
>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index db99536..effb34e 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -291,9 +291,11 @@ static void defer_bh(struct usbnet *dev, struct
> sk_buff *skb, struct sk_buff_hea
>  {
>        unsigned long           flags;
>
> -       spin_lock_irqsave(&list->lock, flags);
> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
> +               spin_lock_irqsave(&list->lock, flags);
>        __skb_unlink(skb, list);
> -       spin_unlock(&list->lock);
> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
> +               spin_unlock(&list->lock);
>        spin_lock(&dev->done.lock);
>        __skb_queue_tail(&dev->done, skb);
>        if (dev->done.qlen == 1)
> @@ -345,7 +347,8 @@ static int rx_submit (struct usbnet *dev, struct
> urb *urb, gfp_t flags)
>        usb_fill_bulk_urb (urb, dev->udev, dev->in,
>                skb->data, size, rx_complete, skb);
>
> -       spin_lock_irqsave (&dev->rxq.lock, lockflags);
> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
> +               spin_lock_irqsave (&dev->rxq.lock, lockflags);
>
>        if (netif_running (dev->net) &&
>            netif_device_present (dev->net) &&
> @@ -377,7 +380,8 @@ static int rx_submit (struct usbnet *dev, struct
> urb *urb, gfp_t flags)
>                netif_dbg(dev, ifdown, dev->net, "rx: stopped\n");
>                retval = -ENOLINK;
>        }
> -       spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
> +               spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>        if (retval) {
>                dev_kfree_skb_any (skb);
>                usb_free_urb (urb);
> @@ -582,6 +586,7 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>        int                     count = 0;
>
>        spin_lock_irqsave (&q->lock, flags);
> +       set_cpu_bit(URB_UNLINKING, dev->cpuflags);
>        skb_queue_walk_safe(q, skb, skbnext) {
>                struct skb_data         *entry;
>                struct urb              *urb;
> @@ -590,15 +595,6 @@ static int unlink_urbs (struct usbnet *dev,
> struct sk_buff_head *q)
>                entry = (struct skb_data *) skb->cb;
>                urb = entry->urb;
>
> -               /*
> -                * Get reference count of the URB to avoid it to be
> -                * freed during usb_unlink_urb, which may trigger
> -                * use-after-free problem inside usb_unlink_urb since
> -                * usb_unlink_urb is always racing with .complete
> -                * handler(include defer_bh).
> -                */
> -               usb_get_urb(urb);
> -               spin_unlock_irqrestore(&q->lock, flags);
>                // during some PM-driven resume scenarios,
>                // these (async) unlinks complete immediately
>                retval = usb_unlink_urb (urb);
> @@ -606,9 +602,8 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>                        netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>                else
>                        count++;
> -               usb_put_urb(urb);
> -               spin_lock_irqsave(&q->lock, flags);
>        }
> +       clear_cpu_bit(URB_UNLINKING, dev->cpuflags);
>        spin_unlock_irqrestore (&q->lock, flags);
>        return count;
>  }
> @@ -1283,6 +1278,7 @@ void usbnet_disconnect (struct usb_interface *intf)
>        usb_kill_urb(dev->interrupt);
>        usb_free_urb(dev->interrupt);
>
> +       free_percpu(dev->cpuflags);
>        free_netdev(net);
>        usb_put_dev (xdev);
>  }
> @@ -1353,6 +1349,13 @@ usbnet_probe (struct usb_interface *udev, const
> struct usb_device_id *prod)
>        SET_NETDEV_DEV(net, &udev->dev);
>
>        dev = netdev_priv(net);
> +
> +       dev->cpuflags = alloc_percpu(unsigned long);
> +       if (!dev->cpuflags) {
> +               status = -ENOMEM;
> +               goto out1;
> +       }
> +
>        dev->udev = xdev;
>        dev->intf = udev;
>        dev->driver_info = info;
> @@ -1396,7 +1399,7 @@ usbnet_probe (struct usb_interface *udev, const
> struct usb_device_id *prod)
>        if (info->bind) {
>                status = info->bind (dev, udev);
>                if (status < 0)
> -                       goto out1;
> +                       goto out2;
>
>                // heuristic:  "usb%d" for links we know are two-host,
>                // else "eth%d" when there's reasonable doubt.  userspace
> @@ -1465,6 +1468,8 @@ usbnet_probe (struct usb_interface *udev, const
> struct usb_device_id *prod)
>  out3:
>        if (info->unbind)
>                info->unbind (dev, udev);
> +out2:
> +       free_percpu(dev->cpuflags);
>  out1:
>        free_netdev(net);
>  out:
> diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
> index 605b0aa..2dc46f5 100644
> --- a/include/linux/usb/usbnet.h
> +++ b/include/linux/usb/usbnet.h
> @@ -69,8 +69,28 @@ struct usbnet {
>  #              define EVENT_DEV_WAKING 6
>  #              define EVENT_DEV_ASLEEP 7
>  #              define EVENT_DEV_OPEN   8
> +       unsigned long __percpu *cpuflags;
> +#              define URB_UNLINKING    0
>  };
>
> +static inline void set_cpu_bit(int nr, unsigned long __percpu *addr)
> +{
> +       unsigned long *fl = __this_cpu_ptr(addr);
> +       set_bit(nr, fl);
> +}
> +
> +static inline void clear_cpu_bit(int nr, unsigned long __percpu *addr)
> +{
> +       unsigned long *fl = __this_cpu_ptr(addr);
> +       clear_bit(nr, fl);
> +}
> +
> +static inline int test_cpu_bit(int nr, unsigned long __percpu *addr)
> +{
> +       unsigned long *fl = __this_cpu_ptr(addr);
> +       return test_bit(nr, fl);
> +}
> +
>  static inline struct usb_driver *driver_of(struct usb_interface *intf)
>  {
>        return to_usb_driver(intf->dev.driver);
>
>
> Thanks,
> --
> Ming Lei

Hi Ming,
   Many changes.
   Anyway, I will try it when I back to home.  :)
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                                       ` <CACVXFVOMTbq0zbmsZAt-Pyc=3oqQ=UcWV5HgNryu7s6oMhKpQg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21  8:23                                                         ` Huajun Li
       [not found]                                                           ` <CA+v9cxZRpthaZ=64tLzKf=AGOqaRVwe5o0UMadiXGuiuXiA7uA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Huajun Li @ 2012-04-21  8:23 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Sat, Apr 21, 2012 at 3:56 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Huajun,
>
> On Sat, Apr 21, 2012 at 3:50 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>> Did we on the same page, could you please review my patch again?
>>
>> My draft patch was based on current mainline( 3.4.0-rc3)  which had
>> already integrated your previous patch. And in my patch, it replaced
>> skb_queue_walk_safe() with skb_queue_walk(), so you will not see  'tmp
>> = skb->next'  any more.
>
> Replace skb_queue_walk_safe with skb_queue_walk doesn't improve
> the problem, since 'skb = skb->next' in skb_queue_walk still may trigger
> the oops, does it?
>

No.
In each loop, my patch traverse the queue from its head, and it always
holds  q->lock when it need refer "skb->next", this can make sure the
right skb is not moved out of rxq/txq.

Can this fix what you concern? If so, IMO, there is no need to revert
your previous patch.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                                           ` <CA+v9cxZRpthaZ=64tLzKf=AGOqaRVwe5o0UMadiXGuiuXiA7uA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21  9:40                                                             ` Ming Lei
       [not found]                                                               ` <CACVXFVOemJqfT9OPRer3qzbVEsGyUOupoOUNCBzC4deNRsksQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-04-21  9:40 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

Hi Huajun,

On Sat, Apr 21, 2012 at 4:23 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Sat, Apr 21, 2012 at 3:56 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hi Huajun,
>>
>> On Sat, Apr 21, 2012 at 3:50 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>
>>> Did we on the same page, could you please review my patch again?
>>>
>>> My draft patch was based on current mainline( 3.4.0-rc3)  which had
>>> already integrated your previous patch. And in my patch, it replaced
>>> skb_queue_walk_safe() with skb_queue_walk(), so you will not see  'tmp
>>> = skb->next'  any more.
>>
>> Replace skb_queue_walk_safe with skb_queue_walk doesn't improve
>> the problem, since 'skb = skb->next' in skb_queue_walk still may trigger
>> the oops, does it?
>>
>
> No.
> In each loop, my patch traverse the queue from its head, and it always
> holds  q->lock when it need refer "skb->next", this can make sure the
> right skb is not moved out of rxq/txq.

OK, your patch can avoid the oops, sorry for miss the point.

>
> Can this fix what you concern? If so, IMO, there is no need to revert
> your previous patch.

But your patch may introduce another problem, in fact, what your patch does
is basically same with the below change[1]:

So we can find easily that one same URB may be unlinked more than one
time with your patch because usb_unlink_urb is asynchronous even though
it behaves synchronously sometimes.

I remembered that is not allowed, at least usb_unlink_urb's comment says so:

          URBs complete only once per submission, and may be canceled only
          once per submission.


[1], against 3.4.0-rc3
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index db99536..aadf009 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -578,15 +578,19 @@ EXPORT_SYMBOL_GPL(usbnet_purge_paused_rxq);
 static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
 {
 	unsigned long		flags;
-	struct sk_buff		*skb, *skbnext;
+	struct sk_buff		*skb;
 	int			count = 0;

 	spin_lock_irqsave (&q->lock, flags);
-	skb_queue_walk_safe(q, skb, skbnext) {
+	while (1) {
 		struct skb_data		*entry;
 		struct urb		*urb;
 		int			retval;

+		skb = q->next;
+		if (skb == (struct sk_buff *)q)
+			break;
+
 		entry = (struct skb_data *) skb->cb;
 		urb = entry->urb;


Thanks,
-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-21  1:49                                       ` Ming Lei
  2012-04-21  2:02                                         ` Ming Lei
       [not found]                                         ` <CACVXFVOc0XZ+eLHGiVwKuiUResRk8Cj9MS4EPMx7k57a0tEJhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-21 19:23                                         ` David Miller
       [not found]                                           ` <20120421.152345.290988116097275353.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  2 siblings, 1 reply; 40+ messages in thread
From: David Miller @ 2012-04-21 19:23 UTC (permalink / raw)
  To: tom.leiming
  Cc: huajun.li.lee, oneukum, stern, davej, netdev, linux-usb, kernel-team

From: Ming Lei <tom.leiming@gmail.com>
Date: Sat, 21 Apr 2012 09:49:51 +0800

> I see the problem, so looks skb_queue_walk_safe is not safe.
> I don' know why the 2nd ' tmp = skb->next' in  skb_queue_walk_safe
> is needed and it may become unsafe if skb is freed during current loop.

I can't see what the problem is, skb_queue_walk_safe() is perfect
and does exactly what it advertises to do.

If 'skb' is unlinked inside of an skb_queue_walk_safe() loop, that's
fine, because we won't touch 'skb' in the loop iteration tail code.

Instead, before the loop contents, we pre-fetch skb->next into 'tmp'
and then at the end we move 'skb' forward by simply assigning 'tmp'.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                           ` <20120421.152345.290988116097275353.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2012-04-22  1:45                                             ` Huajun Li
  2012-04-22  2:05                                               ` David Miller
  0 siblings, 1 reply; 40+ messages in thread
From: Huajun Li @ 2012-04-22  1:45 UTC (permalink / raw)
  To: David Miller
  Cc: tom.leiming-Re5JQEeQqe8AvxtiuMwx3w, oneukum-l3A5Bk7waGM,
	stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz,
	davej-H+wXaHxf7aLQT0dZR+AlfA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	kernel-team-rxtnV0ftBwyoClj4AeEUq9i2O/JbrIOy

On Sun, Apr 22, 2012 at 3:23 AM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
> From: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Date: Sat, 21 Apr 2012 09:49:51 +0800
>
>> I see the problem, so looks skb_queue_walk_safe is not safe.
>> I don' know why the 2nd ' tmp = skb->next' in  skb_queue_walk_safe
>> is needed and it may become unsafe if skb is freed during current loop.
>
> I can't see what the problem is, skb_queue_walk_safe() is perfect
> and does exactly what it advertises to do.
>
> If 'skb' is unlinked inside of an skb_queue_walk_safe() loop, that's
> fine, because we won't touch 'skb' in the loop iteration tail code.
>
> Instead, before the loop contents, we pre-fetch skb->next into 'tmp'
> and then at the end we move 'skb' forward by simply assigning 'tmp'.

In this case, the problem is, 'tmp = skb->next' can be moved out of
rxq/txq, and even be freed. Then in next loop cycle, 'skb = tmp' will
refer to a freed skb.  You know, in current code stack, unlink_urbs()
releases q->lock in each loop, this gives chance to urb complete
handler to call defer_bh() and cause the problem.
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                                               ` <CACVXFVOemJqfT9OPRer3qzbVEsGyUOupoOUNCBzC4deNRsksQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-22  2:02                                                                 ` Huajun Li
  0 siblings, 0 replies; 40+ messages in thread
From: Huajun Li @ 2012-04-22  2:02 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Sat, Apr 21, 2012 at 5:40 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Huajun,
>
> On Sat, Apr 21, 2012 at 4:23 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Sat, Apr 21, 2012 at 3:56 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Hi Huajun,
>>>
>>> On Sat, Apr 21, 2012 at 3:50 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>
>>>> Did we on the same page, could you please review my patch again?
>>>>
>>>> My draft patch was based on current mainline( 3.4.0-rc3)  which had
>>>> already integrated your previous patch. And in my patch, it replaced
>>>> skb_queue_walk_safe() with skb_queue_walk(), so you will not see  'tmp
>>>> = skb->next'  any more.
>>>
>>> Replace skb_queue_walk_safe with skb_queue_walk doesn't improve
>>> the problem, since 'skb = skb->next' in skb_queue_walk still may trigger
>>> the oops, does it?
>>>
>>
>> No.
>> In each loop, my patch traverse the queue from its head, and it always
>> holds  q->lock when it need refer "skb->next", this can make sure the
>> right skb is not moved out of rxq/txq.
>
> OK, your patch can avoid the oops, sorry for miss the point.
>
>>
>> Can this fix what you concern? If so, IMO, there is no need to revert
>> your previous patch.
>
> But your patch may introduce another problem, in fact, what your patch does
> is basically same with the below change[1]:
>
> So we can find easily that one same URB may be unlinked more than one
> time with your patch because usb_unlink_urb is asynchronous even though
> it behaves synchronously sometimes.
>
> I remembered that is not allowed, at least usb_unlink_urb's comment says so:
>
>          URBs complete only once per submission, and may be canceled only
>          once per submission.
>

Yes, this is a problem should be avoided.

> [1], against 3.4.0-rc3
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index db99536..aadf009 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -578,15 +578,19 @@ EXPORT_SYMBOL_GPL(usbnet_purge_paused_rxq);
>  static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
>  {
>        unsigned long           flags;
> -       struct sk_buff          *skb, *skbnext;
> +       struct sk_buff          *skb;
>        int                     count = 0;
>
>        spin_lock_irqsave (&q->lock, flags);
> -       skb_queue_walk_safe(q, skb, skbnext) {
> +       while (1) {
>                struct skb_data         *entry;
>                struct urb              *urb;
>                int                     retval;
>
> +               skb = q->next;
> +               if (skb == (struct sk_buff *)q)
> +                       break;
> +
>                entry = (struct skb_data *) skb->cb;
>                urb = entry->urb;
>
>
> Thanks,
> --
> Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-22  1:45                                             ` Huajun Li
@ 2012-04-22  2:05                                               ` David Miller
  0 siblings, 0 replies; 40+ messages in thread
From: David Miller @ 2012-04-22  2:05 UTC (permalink / raw)
  To: huajun.li.lee
  Cc: tom.leiming, oneukum, stern, davej, netdev, linux-usb, kernel-team

From: Huajun Li <huajun.li.lee@gmail.com>
Date: Sun, 22 Apr 2012 09:45:55 +0800

> On Sun, Apr 22, 2012 at 3:23 AM, David Miller <davem@davemloft.net> wrote:
>> From: Ming Lei <tom.leiming@gmail.com>
>> Date: Sat, 21 Apr 2012 09:49:51 +0800
>>
>>> I see the problem, so looks skb_queue_walk_safe is not safe.
>>> I don' know why the 2nd ' tmp = skb->next' in  skb_queue_walk_safe
>>> is needed and it may become unsafe if skb is freed during current loop.
>>
>> I can't see what the problem is, skb_queue_walk_safe() is perfect
>> and does exactly what it advertises to do.
>>
>> If 'skb' is unlinked inside of an skb_queue_walk_safe() loop, that's
>> fine, because we won't touch 'skb' in the loop iteration tail code.
>>
>> Instead, before the loop contents, we pre-fetch skb->next into 'tmp'
>> and then at the end we move 'skb' forward by simply assigning 'tmp'.
> 
> In this case, the problem is, 'tmp = skb->next' can be moved out of
> rxq/txq, and even be freed. Then in next loop cycle, 'skb = tmp' will
> refer to a freed skb.  You know, in current code stack, unlink_urbs()
> releases q->lock in each loop, this gives chance to urb complete
> handler to call defer_bh() and cause the problem.

Right, just like interfaces such as list_for_each_entry_safe(), this
macro isn't designed to handle cases where you unlink more than one
entry in the list.  Specifically, it's designed only to handle the
case when you unlink the entry being processed in the current loop
iteration.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                                     ` <CACVXFVNc_S8pTaBqMzQZx6Dt-tSP_9iXepxJzv=iR9BFu=Tj8g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-04-21  8:00                                                       ` Huajun Li
@ 2012-04-22  2:19                                                       ` Huajun Li
  2012-04-22 12:05                                                         ` Ming Lei
  1 sibling, 1 reply; 40+ messages in thread
From: Huajun Li @ 2012-04-22  2:19 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Sat, Apr 21, 2012 at 3:45 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Hi Huajun,
>
> On Sat, Apr 21, 2012 at 3:06 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> Just skip trying this per your following email's comments.
>>
>> I will prepare a new patch later, if you'd like to try it.
>
> The below patch reverts the below commits:
>
>       0956a8c20b23d429e79ff86d4325583fc06f9eb4
>      (usbnet: increase URB reference count before usb_unlink_urb)
>
>      4231d47e6fe69f061f96c98c30eaf9fb4c14b96d
>      (net/usbnet: avoid recursive locking in usbnet_stop())
>
> and keep holding tx/rx queue lock during unlinking, but avoid
> to acquire the same queue lock inside complete handler triggered by
> usb_unlink_urb.
>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index db99536..effb34e 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -291,9 +291,11 @@ static void defer_bh(struct usbnet *dev, struct
> sk_buff *skb, struct sk_buff_hea
>  {
>        unsigned long           flags;
>
> -       spin_lock_irqsave(&list->lock, flags);
> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
> +               spin_lock_irqsave(&list->lock, flags);
>        __skb_unlink(skb, list);
> -       spin_unlock(&list->lock);
> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
> +               spin_unlock(&list->lock);
>        spin_lock(&dev->done.lock);
>        __skb_queue_tail(&dev->done, skb);
>        if (dev->done.qlen == 1)


Then 'flags' may not be initialized, and this will cause problem while
calling spin_unlock_irqrestore(&dev->done.lock, flags), right?


> @@ -345,7 +347,8 @@ static int rx_submit (struct usbnet *dev, struct
> urb *urb, gfp_t flags)
>        usb_fill_bulk_urb (urb, dev->udev, dev->in,
>                skb->data, size, rx_complete, skb);
>
> -       spin_lock_irqsave (&dev->rxq.lock, lockflags);
> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
> +               spin_lock_irqsave (&dev->rxq.lock, lockflags);
>
>        if (netif_running (dev->net) &&
>            netif_device_present (dev->net) &&
> @@ -377,7 +380,8 @@ static int rx_submit (struct usbnet *dev, struct
> urb *urb, gfp_t flags)
>                netif_dbg(dev, ifdown, dev->net, "rx: stopped\n");
>                retval = -ENOLINK;
>        }
> -       spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
> +               spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>        if (retval) {
>                dev_kfree_skb_any (skb);
>                usb_free_urb (urb);
> @@ -582,6 +586,7 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>        int                     count = 0;
>
>        spin_lock_irqsave (&q->lock, flags);
> +       set_cpu_bit(URB_UNLINKING, dev->cpuflags);
>        skb_queue_walk_safe(q, skb, skbnext) {
>                struct skb_data         *entry;
>                struct urb              *urb;
> @@ -590,15 +595,6 @@ static int unlink_urbs (struct usbnet *dev,
> struct sk_buff_head *q)
>                entry = (struct skb_data *) skb->cb;
>                urb = entry->urb;
>
> -               /*
> -                * Get reference count of the URB to avoid it to be
> -                * freed during usb_unlink_urb, which may trigger
> -                * use-after-free problem inside usb_unlink_urb since
> -                * usb_unlink_urb is always racing with .complete
> -                * handler(include defer_bh).
> -                */
> -               usb_get_urb(urb);
> -               spin_unlock_irqrestore(&q->lock, flags);
>                // during some PM-driven resume scenarios,
>                // these (async) unlinks complete immediately
>                retval = usb_unlink_urb (urb);
> @@ -606,9 +602,8 @@ static int unlink_urbs (struct usbnet *dev, struct
> sk_buff_head *q)
>                        netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>                else
>                        count++;
> -               usb_put_urb(urb);
> -               spin_lock_irqsave(&q->lock, flags);
>        }
> +       clear_cpu_bit(URB_UNLINKING, dev->cpuflags);
>        spin_unlock_irqrestore (&q->lock, flags);
>        return count;
>  }
> @@ -1283,6 +1278,7 @@ void usbnet_disconnect (struct usb_interface *intf)
>        usb_kill_urb(dev->interrupt);
>        usb_free_urb(dev->interrupt);
>
> +       free_percpu(dev->cpuflags);
>        free_netdev(net);
>        usb_put_dev (xdev);
>  }
> @@ -1353,6 +1349,13 @@ usbnet_probe (struct usb_interface *udev, const
> struct usb_device_id *prod)
>        SET_NETDEV_DEV(net, &udev->dev);
>
>        dev = netdev_priv(net);
> +
> +       dev->cpuflags = alloc_percpu(unsigned long);
> +       if (!dev->cpuflags) {
> +               status = -ENOMEM;
> +               goto out1;
> +       }
> +
>        dev->udev = xdev;
>        dev->intf = udev;
>        dev->driver_info = info;
> @@ -1396,7 +1399,7 @@ usbnet_probe (struct usb_interface *udev, const
> struct usb_device_id *prod)
>        if (info->bind) {
>                status = info->bind (dev, udev);
>                if (status < 0)
> -                       goto out1;
> +                       goto out2;
>
>                // heuristic:  "usb%d" for links we know are two-host,
>                // else "eth%d" when there's reasonable doubt.  userspace
> @@ -1465,6 +1468,8 @@ usbnet_probe (struct usb_interface *udev, const
> struct usb_device_id *prod)
>  out3:
>        if (info->unbind)
>                info->unbind (dev, udev);
> +out2:
> +       free_percpu(dev->cpuflags);
>  out1:
>        free_netdev(net);
>  out:
> diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
> index 605b0aa..2dc46f5 100644
> --- a/include/linux/usb/usbnet.h
> +++ b/include/linux/usb/usbnet.h
> @@ -69,8 +69,28 @@ struct usbnet {
>  #              define EVENT_DEV_WAKING 6
>  #              define EVENT_DEV_ASLEEP 7
>  #              define EVENT_DEV_OPEN   8
> +       unsigned long __percpu *cpuflags;
> +#              define URB_UNLINKING    0

Is it possible using a simple bool variable to track whether q->lock
is hold by unlink_urb() ?  If yes, it can avoid introducing following
new codes into current code stack.

>  };
>
> +static inline void set_cpu_bit(int nr, unsigned long __percpu *addr)
> +{
> +       unsigned long *fl = __this_cpu_ptr(addr);
> +       set_bit(nr, fl);
> +}
> +
> +static inline void clear_cpu_bit(int nr, unsigned long __percpu *addr)
> +{
> +       unsigned long *fl = __this_cpu_ptr(addr);
> +       clear_bit(nr, fl);
> +}
> +
> +static inline int test_cpu_bit(int nr, unsigned long __percpu *addr)
> +{
> +       unsigned long *fl = __this_cpu_ptr(addr);
> +       return test_bit(nr, fl);
> +}
> +
>  static inline struct usb_driver *driver_of(struct usb_interface *intf)
>  {
>        return to_usb_driver(intf->dev.driver);
>
>
> Thanks,
> --
> Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-22  2:19                                                       ` Huajun Li
@ 2012-04-22 12:05                                                         ` Ming Lei
  2012-04-22 13:15                                                           ` Huajun Li
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-04-22 12:05 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones, netdev, linux-usb,
	Fedora Kernel Team

On Sun, Apr 22, 2012 at 10:19 AM, Huajun Li <huajun.li.lee@gmail.com> wrote:
> On Sat, Apr 21, 2012 at 3:45 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> Hi Huajun,
>>
>> On Sat, Apr 21, 2012 at 3:06 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>>> Just skip trying this per your following email's comments.
>>>
>>> I will prepare a new patch later, if you'd like to try it.
>>
>> The below patch reverts the below commits:
>>
>>       0956a8c20b23d429e79ff86d4325583fc06f9eb4
>>      (usbnet: increase URB reference count before usb_unlink_urb)
>>
>>      4231d47e6fe69f061f96c98c30eaf9fb4c14b96d
>>      (net/usbnet: avoid recursive locking in usbnet_stop())
>>
>> and keep holding tx/rx queue lock during unlinking, but avoid
>> to acquire the same queue lock inside complete handler triggered by
>> usb_unlink_urb.
>>
>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> index db99536..effb34e 100644
>> --- a/drivers/net/usb/usbnet.c
>> +++ b/drivers/net/usb/usbnet.c
>> @@ -291,9 +291,11 @@ static void defer_bh(struct usbnet *dev, struct
>> sk_buff *skb, struct sk_buff_hea
>>  {
>>        unsigned long           flags;
>>
>> -       spin_lock_irqsave(&list->lock, flags);
>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>> +               spin_lock_irqsave(&list->lock, flags);
>>        __skb_unlink(skb, list);
>> -       spin_unlock(&list->lock);
>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>> +               spin_unlock(&list->lock);
>>        spin_lock(&dev->done.lock);
>>        __skb_queue_tail(&dev->done, skb);
>>        if (dev->done.qlen == 1)
>
>
> Then 'flags' may not be initialized, and this will cause problem while
> calling spin_unlock_irqrestore(&dev->done.lock, flags), right?

The flag is a percpu variable, so it can't change during the
above code piece.

>
>
>> @@ -345,7 +347,8 @@ static int rx_submit (struct usbnet *dev, struct
>> urb *urb, gfp_t flags)
>>        usb_fill_bulk_urb (urb, dev->udev, dev->in,
>>                skb->data, size, rx_complete, skb);
>>
>> -       spin_lock_irqsave (&dev->rxq.lock, lockflags);
>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>> +               spin_lock_irqsave (&dev->rxq.lock, lockflags);
>>
>>        if (netif_running (dev->net) &&
>>            netif_device_present (dev->net) &&
>> @@ -377,7 +380,8 @@ static int rx_submit (struct usbnet *dev, struct
>> urb *urb, gfp_t flags)
>>                netif_dbg(dev, ifdown, dev->net, "rx: stopped\n");
>>                retval = -ENOLINK;
>>        }
>> -       spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>> +               spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>>        if (retval) {
>>                dev_kfree_skb_any (skb);
>>                usb_free_urb (urb);
>> @@ -582,6 +586,7 @@ static int unlink_urbs (struct usbnet *dev, struct
>> sk_buff_head *q)
>>        int                     count = 0;
>>
>>        spin_lock_irqsave (&q->lock, flags);
>> +       set_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>        skb_queue_walk_safe(q, skb, skbnext) {
>>                struct skb_data         *entry;
>>                struct urb              *urb;
>> @@ -590,15 +595,6 @@ static int unlink_urbs (struct usbnet *dev,
>> struct sk_buff_head *q)
>>                entry = (struct skb_data *) skb->cb;
>>                urb = entry->urb;
>>
>> -               /*
>> -                * Get reference count of the URB to avoid it to be
>> -                * freed during usb_unlink_urb, which may trigger
>> -                * use-after-free problem inside usb_unlink_urb since
>> -                * usb_unlink_urb is always racing with .complete
>> -                * handler(include defer_bh).
>> -                */
>> -               usb_get_urb(urb);
>> -               spin_unlock_irqrestore(&q->lock, flags);
>>                // during some PM-driven resume scenarios,
>>                // these (async) unlinks complete immediately
>>                retval = usb_unlink_urb (urb);
>> @@ -606,9 +602,8 @@ static int unlink_urbs (struct usbnet *dev, struct
>> sk_buff_head *q)
>>                        netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>>                else
>>                        count++;
>> -               usb_put_urb(urb);
>> -               spin_lock_irqsave(&q->lock, flags);
>>        }
>> +       clear_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>        spin_unlock_irqrestore (&q->lock, flags);
>>        return count;
>>  }
>> @@ -1283,6 +1278,7 @@ void usbnet_disconnect (struct usb_interface *intf)
>>        usb_kill_urb(dev->interrupt);
>>        usb_free_urb(dev->interrupt);
>>
>> +       free_percpu(dev->cpuflags);
>>        free_netdev(net);
>>        usb_put_dev (xdev);
>>  }
>> @@ -1353,6 +1349,13 @@ usbnet_probe (struct usb_interface *udev, const
>> struct usb_device_id *prod)
>>        SET_NETDEV_DEV(net, &udev->dev);
>>
>>        dev = netdev_priv(net);
>> +
>> +       dev->cpuflags = alloc_percpu(unsigned long);
>> +       if (!dev->cpuflags) {
>> +               status = -ENOMEM;
>> +               goto out1;
>> +       }
>> +
>>        dev->udev = xdev;
>>        dev->intf = udev;
>>        dev->driver_info = info;
>> @@ -1396,7 +1399,7 @@ usbnet_probe (struct usb_interface *udev, const
>> struct usb_device_id *prod)
>>        if (info->bind) {
>>                status = info->bind (dev, udev);
>>                if (status < 0)
>> -                       goto out1;
>> +                       goto out2;
>>
>>                // heuristic:  "usb%d" for links we know are two-host,
>>                // else "eth%d" when there's reasonable doubt.  userspace
>> @@ -1465,6 +1468,8 @@ usbnet_probe (struct usb_interface *udev, const
>> struct usb_device_id *prod)
>>  out3:
>>        if (info->unbind)
>>                info->unbind (dev, udev);
>> +out2:
>> +       free_percpu(dev->cpuflags);
>>  out1:
>>        free_netdev(net);
>>  out:
>> diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
>> index 605b0aa..2dc46f5 100644
>> --- a/include/linux/usb/usbnet.h
>> +++ b/include/linux/usb/usbnet.h
>> @@ -69,8 +69,28 @@ struct usbnet {
>>  #              define EVENT_DEV_WAKING 6
>>  #              define EVENT_DEV_ASLEEP 7
>>  #              define EVENT_DEV_OPEN   8
>> +       unsigned long __percpu *cpuflags;
>> +#              define URB_UNLINKING    0
>
> Is it possible using a simple bool variable to track whether q->lock
> is hold by unlink_urb() ?  If yes, it can avoid introducing following
> new codes into current code stack.

It should be defined as percpu variable. The URB complete handler
may be triggered inside unlink path, or it can be triggered in hardirq path
from other CPUs at the same time.

>
>>  };
>>
>> +static inline void set_cpu_bit(int nr, unsigned long __percpu *addr)
>> +{
>> +       unsigned long *fl = __this_cpu_ptr(addr);
>> +       set_bit(nr, fl);
>> +}
>> +
>> +static inline void clear_cpu_bit(int nr, unsigned long __percpu *addr)
>> +{
>> +       unsigned long *fl = __this_cpu_ptr(addr);
>> +       clear_bit(nr, fl);
>> +}
>> +
>> +static inline int test_cpu_bit(int nr, unsigned long __percpu *addr)
>> +{
>> +       unsigned long *fl = __this_cpu_ptr(addr);
>> +       return test_bit(nr, fl);
>> +}
>> +
>>  static inline struct usb_driver *driver_of(struct usb_interface *intf)
>>  {
>>        return to_usb_driver(intf->dev.driver);
>>
>>
>> Thanks,
>> --
>> Ming Lei



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-22 12:05                                                         ` Ming Lei
@ 2012-04-22 13:15                                                           ` Huajun Li
       [not found]                                                             ` <CA+v9cxZfzATWynXGtEe6gvcD2aRsLqAF3ZN_HM_dyU4W_rQSpg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-04-22 14:19                                                             ` Ming Lei
  0 siblings, 2 replies; 40+ messages in thread
From: Huajun Li @ 2012-04-22 13:15 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones, netdev, linux-usb,
	Fedora Kernel Team

On Sun, Apr 22, 2012 at 8:05 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Sun, Apr 22, 2012 at 10:19 AM, Huajun Li <huajun.li.lee@gmail.com> wrote:
>> On Sat, Apr 21, 2012 at 3:45 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>> Hi Huajun,
>>>
>>> On Sat, Apr 21, 2012 at 3:06 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>>>> Just skip trying this per your following email's comments.
>>>>
>>>> I will prepare a new patch later, if you'd like to try it.
>>>
>>> The below patch reverts the below commits:
>>>
>>>       0956a8c20b23d429e79ff86d4325583fc06f9eb4
>>>      (usbnet: increase URB reference count before usb_unlink_urb)
>>>
>>>      4231d47e6fe69f061f96c98c30eaf9fb4c14b96d
>>>      (net/usbnet: avoid recursive locking in usbnet_stop())
>>>
>>> and keep holding tx/rx queue lock during unlinking, but avoid
>>> to acquire the same queue lock inside complete handler triggered by
>>> usb_unlink_urb.
>>>
>>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>>> index db99536..effb34e 100644
>>> --- a/drivers/net/usb/usbnet.c
>>> +++ b/drivers/net/usb/usbnet.c
>>> @@ -291,9 +291,11 @@ static void defer_bh(struct usbnet *dev, struct
>>> sk_buff *skb, struct sk_buff_hea
>>>  {
>>>        unsigned long           flags;
>>>
>>> -       spin_lock_irqsave(&list->lock, flags);
>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>> +               spin_lock_irqsave(&list->lock, flags);
>>>        __skb_unlink(skb, list);
>>> -       spin_unlock(&list->lock);
>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>> +               spin_unlock(&list->lock);
>>>        spin_lock(&dev->done.lock);
>>>        __skb_queue_tail(&dev->done, skb);
>>>        if (dev->done.qlen == 1)
>>
>>
>> Then 'flags' may not be initialized, and this will cause problem while
>> calling spin_unlock_irqrestore(&dev->done.lock, flags), right?
>
> The flag is a percpu variable, so it can't change during the
> above code piece.
>

No, maybe you misunderstood it.
Legacy code stack has 'flags' variable to save irq information, and at
the end defer_bh(), spin_unlock_irqrestore(&dev->done.lock, flags)
will use it. However, the variable may not be initialized in your
patch.

>>
>>
>>> @@ -345,7 +347,8 @@ static int rx_submit (struct usbnet *dev, struct
>>> urb *urb, gfp_t flags)
>>>        usb_fill_bulk_urb (urb, dev->udev, dev->in,
>>>                skb->data, size, rx_complete, skb);
>>>
>>> -       spin_lock_irqsave (&dev->rxq.lock, lockflags);
>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>> +               spin_lock_irqsave (&dev->rxq.lock, lockflags);
>>>
>>>        if (netif_running (dev->net) &&
>>>            netif_device_present (dev->net) &&
>>> @@ -377,7 +380,8 @@ static int rx_submit (struct usbnet *dev, struct
>>> urb *urb, gfp_t flags)
>>>                netif_dbg(dev, ifdown, dev->net, "rx: stopped\n");
>>>                retval = -ENOLINK;
>>>        }
>>> -       spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>> +               spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>>>        if (retval) {
>>>                dev_kfree_skb_any (skb);
>>>                usb_free_urb (urb);
>>> @@ -582,6 +586,7 @@ static int unlink_urbs (struct usbnet *dev, struct
>>> sk_buff_head *q)
>>>        int                     count = 0;
>>>
>>>        spin_lock_irqsave (&q->lock, flags);
>>> +       set_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>>        skb_queue_walk_safe(q, skb, skbnext) {
>>>                struct skb_data         *entry;
>>>                struct urb              *urb;
>>> @@ -590,15 +595,6 @@ static int unlink_urbs (struct usbnet *dev,
>>> struct sk_buff_head *q)
>>>                entry = (struct skb_data *) skb->cb;
>>>                urb = entry->urb;
>>>
>>> -               /*
>>> -                * Get reference count of the URB to avoid it to be
>>> -                * freed during usb_unlink_urb, which may trigger
>>> -                * use-after-free problem inside usb_unlink_urb since
>>> -                * usb_unlink_urb is always racing with .complete
>>> -                * handler(include defer_bh).
>>> -                */
>>> -               usb_get_urb(urb);
>>> -               spin_unlock_irqrestore(&q->lock, flags);
>>>                // during some PM-driven resume scenarios,
>>>                // these (async) unlinks complete immediately
>>>                retval = usb_unlink_urb (urb);
>>> @@ -606,9 +602,8 @@ static int unlink_urbs (struct usbnet *dev, struct
>>> sk_buff_head *q)
>>>                        netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>>>                else
>>>                        count++;
>>> -               usb_put_urb(urb);
>>> -               spin_lock_irqsave(&q->lock, flags);
>>>        }
>>> +       clear_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>>        spin_unlock_irqrestore (&q->lock, flags);
>>>        return count;
>>>  }
>>> @@ -1283,6 +1278,7 @@ void usbnet_disconnect (struct usb_interface *intf)
>>>        usb_kill_urb(dev->interrupt);
>>>        usb_free_urb(dev->interrupt);
>>>
>>> +       free_percpu(dev->cpuflags);
>>>        free_netdev(net);
>>>        usb_put_dev (xdev);
>>>  }
>>> @@ -1353,6 +1349,13 @@ usbnet_probe (struct usb_interface *udev, const
>>> struct usb_device_id *prod)
>>>        SET_NETDEV_DEV(net, &udev->dev);
>>>
>>>        dev = netdev_priv(net);
>>> +
>>> +       dev->cpuflags = alloc_percpu(unsigned long);
>>> +       if (!dev->cpuflags) {
>>> +               status = -ENOMEM;
>>> +               goto out1;
>>> +       }
>>> +
>>>        dev->udev = xdev;
>>>        dev->intf = udev;
>>>        dev->driver_info = info;
>>> @@ -1396,7 +1399,7 @@ usbnet_probe (struct usb_interface *udev, const
>>> struct usb_device_id *prod)
>>>        if (info->bind) {
>>>                status = info->bind (dev, udev);
>>>                if (status < 0)
>>> -                       goto out1;
>>> +                       goto out2;
>>>
>>>                // heuristic:  "usb%d" for links we know are two-host,
>>>                // else "eth%d" when there's reasonable doubt.  userspace
>>> @@ -1465,6 +1468,8 @@ usbnet_probe (struct usb_interface *udev, const
>>> struct usb_device_id *prod)
>>>  out3:
>>>        if (info->unbind)
>>>                info->unbind (dev, udev);
>>> +out2:
>>> +       free_percpu(dev->cpuflags);
>>>  out1:
>>>        free_netdev(net);
>>>  out:
>>> diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
>>> index 605b0aa..2dc46f5 100644
>>> --- a/include/linux/usb/usbnet.h
>>> +++ b/include/linux/usb/usbnet.h
>>> @@ -69,8 +69,28 @@ struct usbnet {
>>>  #              define EVENT_DEV_WAKING 6
>>>  #              define EVENT_DEV_ASLEEP 7
>>>  #              define EVENT_DEV_OPEN   8
>>> +       unsigned long __percpu *cpuflags;
>>> +#              define URB_UNLINKING    0
>>
>> Is it possible using a simple bool variable to track whether q->lock
>> is hold by unlink_urb() ?  If yes, it can avoid introducing following
>> new codes into current code stack.
>
> It should be defined as percpu variable. The URB complete handler
> may be triggered inside unlink path, or it can be triggered in hardirq path
> from other CPUs at the same time.
>

Then the potential issue is,  in defer_bh(),  __skb_unlink() may be
called without holding the lock. This can cause current issue too,
because __skb_unlink() may move 'skb->next' out of txq/rxq, and even
free it.

>>
>>>  };
>>>
>>> +static inline void set_cpu_bit(int nr, unsigned long __percpu *addr)
>>> +{
>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>> +       set_bit(nr, fl);
>>> +}
>>> +
>>> +static inline void clear_cpu_bit(int nr, unsigned long __percpu *addr)
>>> +{
>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>> +       clear_bit(nr, fl);
>>> +}
>>> +
>>> +static inline int test_cpu_bit(int nr, unsigned long __percpu *addr)
>>> +{
>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>> +       return test_bit(nr, fl);
>>> +}
>>> +
>>>  static inline struct usb_driver *driver_of(struct usb_interface *intf)
>>>  {
>>>        return to_usb_driver(intf->dev.driver);
>>>
>>>
>>> Thanks,
>>> --
>>> Ming Lei
>
>
>
> --
> Ming Lei

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                                             ` <CA+v9cxZfzATWynXGtEe6gvcD2aRsLqAF3ZN_HM_dyU4W_rQSpg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-22 14:10                                                               ` Ming Lei
  2012-04-22 14:15                                                               ` Ming Lei
  1 sibling, 0 replies; 40+ messages in thread
From: Ming Lei @ 2012-04-22 14:10 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Sun, Apr 22, 2012 at 9:15 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Sun, Apr 22, 2012 at 8:05 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Sun, Apr 22, 2012 at 10:19 AM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> On Sat, Apr 21, 2012 at 3:45 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> Hi Huajun,
>>>>
>>>> On Sat, Apr 21, 2012 at 3:06 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>> Just skip trying this per your following email's comments.
>>>>>
>>>>> I will prepare a new patch later, if you'd like to try it.
>>>>
>>>> The below patch reverts the below commits:
>>>>
>>>>       0956a8c20b23d429e79ff86d4325583fc06f9eb4
>>>>      (usbnet: increase URB reference count before usb_unlink_urb)
>>>>
>>>>      4231d47e6fe69f061f96c98c30eaf9fb4c14b96d
>>>>      (net/usbnet: avoid recursive locking in usbnet_stop())
>>>>
>>>> and keep holding tx/rx queue lock during unlinking, but avoid
>>>> to acquire the same queue lock inside complete handler triggered by
>>>> usb_unlink_urb.
>>>>
>>>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>>>> index db99536..effb34e 100644
>>>> --- a/drivers/net/usb/usbnet.c
>>>> +++ b/drivers/net/usb/usbnet.c
>>>> @@ -291,9 +291,11 @@ static void defer_bh(struct usbnet *dev, struct
>>>> sk_buff *skb, struct sk_buff_hea
>>>>  {
>>>>        unsigned long           flags;
>>>>
>>>> -       spin_lock_irqsave(&list->lock, flags);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_lock_irqsave(&list->lock, flags);
>>>>        __skb_unlink(skb, list);
>>>> -       spin_unlock(&list->lock);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_unlock(&list->lock);
>>>>        spin_lock(&dev->done.lock);
>>>>        __skb_queue_tail(&dev->done, skb);
>>>>        if (dev->done.qlen == 1)
>>>
>>>
>>> Then 'flags' may not be initialized, and this will cause problem while
>>> calling spin_unlock_irqrestore(&dev->done.lock, flags), right?
>>
>> The flag is a percpu variable, so it can't change during the
>> above code piece.
>>
>
> No, maybe you misunderstood it.
> Legacy code stack has 'flags' variable to save irq information, and at
> the end defer_bh(), spin_unlock_irqrestore(&dev->done.lock, flags)
> will use it. However, the variable may not be initialized in your
> patch.

Got it, it is easy to fix it by replace the below line:

                 spin_unlock_irqrestore(&dev->done.lock, flags);

with these:

                 if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
                       spin_unlock_irqrestore(&dev->done.lock, flags);
                 else
                       spin_unlock(&dev->done.lock);

>
>>>
>>>
>>>> @@ -345,7 +347,8 @@ static int rx_submit (struct usbnet *dev, struct
>>>> urb *urb, gfp_t flags)
>>>>        usb_fill_bulk_urb (urb, dev->udev, dev->in,
>>>>                skb->data, size, rx_complete, skb);
>>>>
>>>> -       spin_lock_irqsave (&dev->rxq.lock, lockflags);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_lock_irqsave (&dev->rxq.lock, lockflags);
>>>>
>>>>        if (netif_running (dev->net) &&
>>>>            netif_device_present (dev->net) &&
>>>> @@ -377,7 +380,8 @@ static int rx_submit (struct usbnet *dev, struct
>>>> urb *urb, gfp_t flags)
>>>>                netif_dbg(dev, ifdown, dev->net, "rx: stopped\n");
>>>>                retval = -ENOLINK;
>>>>        }
>>>> -       spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>>>>        if (retval) {
>>>>                dev_kfree_skb_any (skb);
>>>>                usb_free_urb (urb);
>>>> @@ -582,6 +586,7 @@ static int unlink_urbs (struct usbnet *dev, struct
>>>> sk_buff_head *q)
>>>>        int                     count = 0;
>>>>
>>>>        spin_lock_irqsave (&q->lock, flags);
>>>> +       set_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>>>        skb_queue_walk_safe(q, skb, skbnext) {
>>>>                struct skb_data         *entry;
>>>>                struct urb              *urb;
>>>> @@ -590,15 +595,6 @@ static int unlink_urbs (struct usbnet *dev,
>>>> struct sk_buff_head *q)
>>>>                entry = (struct skb_data *) skb->cb;
>>>>                urb = entry->urb;
>>>>
>>>> -               /*
>>>> -                * Get reference count of the URB to avoid it to be
>>>> -                * freed during usb_unlink_urb, which may trigger
>>>> -                * use-after-free problem inside usb_unlink_urb since
>>>> -                * usb_unlink_urb is always racing with .complete
>>>> -                * handler(include defer_bh).
>>>> -                */
>>>> -               usb_get_urb(urb);
>>>> -               spin_unlock_irqrestore(&q->lock, flags);
>>>>                // during some PM-driven resume scenarios,
>>>>                // these (async) unlinks complete immediately
>>>>                retval = usb_unlink_urb (urb);
>>>> @@ -606,9 +602,8 @@ static int unlink_urbs (struct usbnet *dev, struct
>>>> sk_buff_head *q)
>>>>                        netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>>>>                else
>>>>                        count++;
>>>> -               usb_put_urb(urb);
>>>> -               spin_lock_irqsave(&q->lock, flags);
>>>>        }
>>>> +       clear_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>>>        spin_unlock_irqrestore (&q->lock, flags);
>>>>        return count;
>>>>  }
>>>> @@ -1283,6 +1278,7 @@ void usbnet_disconnect (struct usb_interface *intf)
>>>>        usb_kill_urb(dev->interrupt);
>>>>        usb_free_urb(dev->interrupt);
>>>>
>>>> +       free_percpu(dev->cpuflags);
>>>>        free_netdev(net);
>>>>        usb_put_dev (xdev);
>>>>  }
>>>> @@ -1353,6 +1349,13 @@ usbnet_probe (struct usb_interface *udev, const
>>>> struct usb_device_id *prod)
>>>>        SET_NETDEV_DEV(net, &udev->dev);
>>>>
>>>>        dev = netdev_priv(net);
>>>> +
>>>> +       dev->cpuflags = alloc_percpu(unsigned long);
>>>> +       if (!dev->cpuflags) {
>>>> +               status = -ENOMEM;
>>>> +               goto out1;
>>>> +       }
>>>> +
>>>>        dev->udev = xdev;
>>>>        dev->intf = udev;
>>>>        dev->driver_info = info;
>>>> @@ -1396,7 +1399,7 @@ usbnet_probe (struct usb_interface *udev, const
>>>> struct usb_device_id *prod)
>>>>        if (info->bind) {
>>>>                status = info->bind (dev, udev);
>>>>                if (status < 0)
>>>> -                       goto out1;
>>>> +                       goto out2;
>>>>
>>>>                // heuristic:  "usb%d" for links we know are two-host,
>>>>                // else "eth%d" when there's reasonable doubt.  userspace
>>>> @@ -1465,6 +1468,8 @@ usbnet_probe (struct usb_interface *udev, const
>>>> struct usb_device_id *prod)
>>>>  out3:
>>>>        if (info->unbind)
>>>>                info->unbind (dev, udev);
>>>> +out2:
>>>> +       free_percpu(dev->cpuflags);
>>>>  out1:
>>>>        free_netdev(net);
>>>>  out:
>>>> diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
>>>> index 605b0aa..2dc46f5 100644
>>>> --- a/include/linux/usb/usbnet.h
>>>> +++ b/include/linux/usb/usbnet.h
>>>> @@ -69,8 +69,28 @@ struct usbnet {
>>>>  #              define EVENT_DEV_WAKING 6
>>>>  #              define EVENT_DEV_ASLEEP 7
>>>>  #              define EVENT_DEV_OPEN   8
>>>> +       unsigned long __percpu *cpuflags;
>>>> +#              define URB_UNLINKING    0
>>>
>>> Is it possible using a simple bool variable to track whether q->lock
>>> is hold by unlink_urb() ?  If yes, it can avoid introducing following
>>> new codes into current code stack.
>>
>> It should be defined as percpu variable. The URB complete handler
>> may be triggered inside unlink path, or it can be triggered in hardirq path
>> from other CPUs at the same time.
>>
>
> Then the potential issue is,  in defer_bh(),  __skb_unlink() may be
> called without holding the lock. This can cause current issue too,

No, URB_UNLINKING being set means the lock has been held during unlink,
see unlink_urbs related changes in the percpu flag patch.

> because __skb_unlink() may move 'skb->next' out of txq/rxq, and even
> free it.
>
>>>
>>>>  };
>>>>
>>>> +static inline void set_cpu_bit(int nr, unsigned long __percpu *addr)
>>>> +{
>>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>>> +       set_bit(nr, fl);
>>>> +}
>>>> +
>>>> +static inline void clear_cpu_bit(int nr, unsigned long __percpu *addr)
>>>> +{
>>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>>> +       clear_bit(nr, fl);
>>>> +}
>>>> +
>>>> +static inline int test_cpu_bit(int nr, unsigned long __percpu *addr)
>>>> +{
>>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>>> +       return test_bit(nr, fl);
>>>> +}
>>>> +
>>>>  static inline struct usb_driver *driver_of(struct usb_interface *intf)
>>>>  {
>>>>        return to_usb_driver(intf->dev.driver);
>>>>
>>>>
>>>> Thanks,
>>>> --
>>>> Ming Lei
>>
>>
>>
>> --
>> Ming Lei



-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                                             ` <CA+v9cxZfzATWynXGtEe6gvcD2aRsLqAF3ZN_HM_dyU4W_rQSpg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-04-22 14:10                                                               ` Ming Lei
@ 2012-04-22 14:15                                                               ` Ming Lei
  1 sibling, 0 replies; 40+ messages in thread
From: Ming Lei @ 2012-04-22 14:15 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

On Sun, Apr 22, 2012 at 9:15 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Sun, Apr 22, 2012 at 8:05 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On Sun, Apr 22, 2012 at 10:19 AM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> On Sat, Apr 21, 2012 at 3:45 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>> Hi Huajun,
>>>>
>>>> On Sat, Apr 21, 2012 at 3:06 PM, Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>>>>> Just skip trying this per your following email's comments.
>>>>>
>>>>> I will prepare a new patch later, if you'd like to try it.
>>>>
>>>> The below patch reverts the below commits:
>>>>
>>>>       0956a8c20b23d429e79ff86d4325583fc06f9eb4
>>>>      (usbnet: increase URB reference count before usb_unlink_urb)
>>>>
>>>>      4231d47e6fe69f061f96c98c30eaf9fb4c14b96d
>>>>      (net/usbnet: avoid recursive locking in usbnet_stop())
>>>>
>>>> and keep holding tx/rx queue lock during unlinking, but avoid
>>>> to acquire the same queue lock inside complete handler triggered by
>>>> usb_unlink_urb.
>>>>
>>>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>>>> index db99536..effb34e 100644
>>>> --- a/drivers/net/usb/usbnet.c
>>>> +++ b/drivers/net/usb/usbnet.c
>>>> @@ -291,9 +291,11 @@ static void defer_bh(struct usbnet *dev, struct
>>>> sk_buff *skb, struct sk_buff_hea
>>>>  {
>>>>        unsigned long           flags;
>>>>
>>>> -       spin_lock_irqsave(&list->lock, flags);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_lock_irqsave(&list->lock, flags);
>>>>        __skb_unlink(skb, list);
>>>> -       spin_unlock(&list->lock);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_unlock(&list->lock);
>>>>        spin_lock(&dev->done.lock);
>>>>        __skb_queue_tail(&dev->done, skb);
>>>>        if (dev->done.qlen == 1)
>>>
>>>
>>> Then 'flags' may not be initialized, and this will cause problem while
>>> calling spin_unlock_irqrestore(&dev->done.lock, flags), right?
>>
>> The flag is a percpu variable, so it can't change during the
>> above code piece.
>>
>
> No, maybe you misunderstood it.
> Legacy code stack has 'flags' variable to save irq information, and at
> the end defer_bh(), spin_unlock_irqrestore(&dev->done.lock, flags)
> will use it. However, the variable may not be initialized in your
> patch.

Got it, it is easy to fix it by replace the below line:

                 spin_unlock_irqrestore(&dev->done.lock, flags);

with these:

                 if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
                       spin_unlock_irqrestore(&dev->done.lock, flags);
                 else
                       spin_unlock(&dev->done.lock);

>
>>>
>>>
>>>> @@ -345,7 +347,8 @@ static int rx_submit (struct usbnet *dev, struct
>>>> urb *urb, gfp_t flags)
>>>>        usb_fill_bulk_urb (urb, dev->udev, dev->in,
>>>>                skb->data, size, rx_complete, skb);
>>>>
>>>> -       spin_lock_irqsave (&dev->rxq.lock, lockflags);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_lock_irqsave (&dev->rxq.lock, lockflags);
>>>>
>>>>        if (netif_running (dev->net) &&
>>>>            netif_device_present (dev->net) &&
>>>> @@ -377,7 +380,8 @@ static int rx_submit (struct usbnet *dev, struct
>>>> urb *urb, gfp_t flags)
>>>>                netif_dbg(dev, ifdown, dev->net, "rx: stopped\n");
>>>>                retval = -ENOLINK;
>>>>        }
>>>> -       spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>>>>        if (retval) {
>>>>                dev_kfree_skb_any (skb);
>>>>                usb_free_urb (urb);
>>>> @@ -582,6 +586,7 @@ static int unlink_urbs (struct usbnet *dev, struct
>>>> sk_buff_head *q)
>>>>        int                     count = 0;
>>>>
>>>>        spin_lock_irqsave (&q->lock, flags);
>>>> +       set_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>>>        skb_queue_walk_safe(q, skb, skbnext) {
>>>>                struct skb_data         *entry;
>>>>                struct urb              *urb;
>>>> @@ -590,15 +595,6 @@ static int unlink_urbs (struct usbnet *dev,
>>>> struct sk_buff_head *q)
>>>>                entry = (struct skb_data *) skb->cb;
>>>>                urb = entry->urb;
>>>>
>>>> -               /*
>>>> -                * Get reference count of the URB to avoid it to be
>>>> -                * freed during usb_unlink_urb, which may trigger
>>>> -                * use-after-free problem inside usb_unlink_urb since
>>>> -                * usb_unlink_urb is always racing with .complete
>>>> -                * handler(include defer_bh).
>>>> -                */
>>>> -               usb_get_urb(urb);
>>>> -               spin_unlock_irqrestore(&q->lock, flags);
>>>>                // during some PM-driven resume scenarios,
>>>>                // these (async) unlinks complete immediately
>>>>                retval = usb_unlink_urb (urb);
>>>> @@ -606,9 +602,8 @@ static int unlink_urbs (struct usbnet *dev, struct
>>>> sk_buff_head *q)
>>>>                        netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>>>>                else
>>>>                        count++;
>>>> -               usb_put_urb(urb);
>>>> -               spin_lock_irqsave(&q->lock, flags);
>>>>        }
>>>> +       clear_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>>>        spin_unlock_irqrestore (&q->lock, flags);
>>>>        return count;
>>>>  }
>>>> @@ -1283,6 +1278,7 @@ void usbnet_disconnect (struct usb_interface *intf)
>>>>        usb_kill_urb(dev->interrupt);
>>>>        usb_free_urb(dev->interrupt);
>>>>
>>>> +       free_percpu(dev->cpuflags);
>>>>        free_netdev(net);
>>>>        usb_put_dev (xdev);
>>>>  }
>>>> @@ -1353,6 +1349,13 @@ usbnet_probe (struct usb_interface *udev, const
>>>> struct usb_device_id *prod)
>>>>        SET_NETDEV_DEV(net, &udev->dev);
>>>>
>>>>        dev = netdev_priv(net);
>>>> +
>>>> +       dev->cpuflags = alloc_percpu(unsigned long);
>>>> +       if (!dev->cpuflags) {
>>>> +               status = -ENOMEM;
>>>> +               goto out1;
>>>> +       }
>>>> +
>>>>        dev->udev = xdev;
>>>>        dev->intf = udev;
>>>>        dev->driver_info = info;
>>>> @@ -1396,7 +1399,7 @@ usbnet_probe (struct usb_interface *udev, const
>>>> struct usb_device_id *prod)
>>>>        if (info->bind) {
>>>>                status = info->bind (dev, udev);
>>>>                if (status < 0)
>>>> -                       goto out1;
>>>> +                       goto out2;
>>>>
>>>>                // heuristic:  "usb%d" for links we know are two-host,
>>>>                // else "eth%d" when there's reasonable doubt.  userspace
>>>> @@ -1465,6 +1468,8 @@ usbnet_probe (struct usb_interface *udev, const
>>>> struct usb_device_id *prod)
>>>>  out3:
>>>>        if (info->unbind)
>>>>                info->unbind (dev, udev);
>>>> +out2:
>>>> +       free_percpu(dev->cpuflags);
>>>>  out1:
>>>>        free_netdev(net);
>>>>  out:
>>>> diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
>>>> index 605b0aa..2dc46f5 100644
>>>> --- a/include/linux/usb/usbnet.h
>>>> +++ b/include/linux/usb/usbnet.h
>>>> @@ -69,8 +69,28 @@ struct usbnet {
>>>>  #              define EVENT_DEV_WAKING 6
>>>>  #              define EVENT_DEV_ASLEEP 7
>>>>  #              define EVENT_DEV_OPEN   8
>>>> +       unsigned long __percpu *cpuflags;
>>>> +#              define URB_UNLINKING    0
>>>
>>> Is it possible using a simple bool variable to track whether q->lock
>>> is hold by unlink_urb() ?  If yes, it can avoid introducing following
>>> new codes into current code stack.
>>
>> It should be defined as percpu variable. The URB complete handler
>> may be triggered inside unlink path, or it can be triggered in hardirq path
>> from other CPUs at the same time.
>>
>
> Then the potential issue is,  in defer_bh(),  __skb_unlink() may be
> called without holding the lock. This can cause current issue too,

No, URB_UNLINKING being set means the lock has been held during unlink,
see unlink_urbs related changes in the percpu flag patch.

> because __skb_unlink() may move 'skb->next' out of txq/rxq, and even
> free it.
>
>>>
>>>>  };
>>>>
>>>> +static inline void set_cpu_bit(int nr, unsigned long __percpu *addr)
>>>> +{
>>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>>> +       set_bit(nr, fl);
>>>> +}
>>>> +
>>>> +static inline void clear_cpu_bit(int nr, unsigned long __percpu *addr)
>>>> +{
>>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>>> +       clear_bit(nr, fl);
>>>> +}
>>>> +
>>>> +static inline int test_cpu_bit(int nr, unsigned long __percpu *addr)
>>>> +{
>>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>>> +       return test_bit(nr, fl);
>>>> +}
>>>> +
>>>>  static inline struct usb_driver *driver_of(struct usb_interface *intf)
>>>>  {
>>>>        return to_usb_driver(intf->dev.driver);
>>>>
>>>>
>>>> Thanks,
>>>> --
>>>> Ming Lei
>>
>>
>>
>> --
>> Ming Lei



-- 
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-22 13:15                                                           ` Huajun Li
       [not found]                                                             ` <CA+v9cxZfzATWynXGtEe6gvcD2aRsLqAF3ZN_HM_dyU4W_rQSpg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-22 14:19                                                             ` Ming Lei
  1 sibling, 0 replies; 40+ messages in thread
From: Ming Lei @ 2012-04-22 14:19 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones, netdev, linux-usb,
	Fedora Kernel Team

On Sun, Apr 22, 2012 at 9:15 PM, Huajun Li <huajun.li.lee@gmail.com> wrote:
> On Sun, Apr 22, 2012 at 8:05 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> On Sun, Apr 22, 2012 at 10:19 AM, Huajun Li <huajun.li.lee@gmail.com> wrote:
>>> On Sat, Apr 21, 2012 at 3:45 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>>> Hi Huajun,
>>>>
>>>> On Sat, Apr 21, 2012 at 3:06 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>>>>> Just skip trying this per your following email's comments.
>>>>>
>>>>> I will prepare a new patch later, if you'd like to try it.
>>>>
>>>> The below patch reverts the below commits:
>>>>
>>>>       0956a8c20b23d429e79ff86d4325583fc06f9eb4
>>>>      (usbnet: increase URB reference count before usb_unlink_urb)
>>>>
>>>>      4231d47e6fe69f061f96c98c30eaf9fb4c14b96d
>>>>      (net/usbnet: avoid recursive locking in usbnet_stop())
>>>>
>>>> and keep holding tx/rx queue lock during unlinking, but avoid
>>>> to acquire the same queue lock inside complete handler triggered by
>>>> usb_unlink_urb.
>>>>
>>>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>>>> index db99536..effb34e 100644
>>>> --- a/drivers/net/usb/usbnet.c
>>>> +++ b/drivers/net/usb/usbnet.c
>>>> @@ -291,9 +291,11 @@ static void defer_bh(struct usbnet *dev, struct
>>>> sk_buff *skb, struct sk_buff_hea
>>>>  {
>>>>        unsigned long           flags;
>>>>
>>>> -       spin_lock_irqsave(&list->lock, flags);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_lock_irqsave(&list->lock, flags);
>>>>        __skb_unlink(skb, list);
>>>> -       spin_unlock(&list->lock);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_unlock(&list->lock);
>>>>        spin_lock(&dev->done.lock);
>>>>        __skb_queue_tail(&dev->done, skb);
>>>>        if (dev->done.qlen == 1)
>>>
>>>
>>> Then 'flags' may not be initialized, and this will cause problem while
>>> calling spin_unlock_irqrestore(&dev->done.lock, flags), right?
>>
>> The flag is a percpu variable, so it can't change during the
>> above code piece.
>>
>
> No, maybe you misunderstood it.
> Legacy code stack has 'flags' variable to save irq information, and at
> the end defer_bh(), spin_unlock_irqrestore(&dev->done.lock, flags)
> will use it. However, the variable may not be initialized in your
> patch.

Got it, it is easy to fix it by replace the below line:

                 spin_unlock_irqrestore(&dev->done.lock, flags);

with these:

                 if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
                       spin_unlock_irqrestore(&dev->done.lock, flags);
                 else
                       spin_unlock(&dev->done.lock);

>
>>>
>>>
>>>> @@ -345,7 +347,8 @@ static int rx_submit (struct usbnet *dev, struct
>>>> urb *urb, gfp_t flags)
>>>>        usb_fill_bulk_urb (urb, dev->udev, dev->in,
>>>>                skb->data, size, rx_complete, skb);
>>>>
>>>> -       spin_lock_irqsave (&dev->rxq.lock, lockflags);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_lock_irqsave (&dev->rxq.lock, lockflags);
>>>>
>>>>        if (netif_running (dev->net) &&
>>>>            netif_device_present (dev->net) &&
>>>> @@ -377,7 +380,8 @@ static int rx_submit (struct usbnet *dev, struct
>>>> urb *urb, gfp_t flags)
>>>>                netif_dbg(dev, ifdown, dev->net, "rx: stopped\n");
>>>>                retval = -ENOLINK;
>>>>        }
>>>> -       spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>>>> +       if (!test_cpu_bit(URB_UNLINKING, dev->cpuflags))
>>>> +               spin_unlock_irqrestore (&dev->rxq.lock, lockflags);
>>>>        if (retval) {
>>>>                dev_kfree_skb_any (skb);
>>>>                usb_free_urb (urb);
>>>> @@ -582,6 +586,7 @@ static int unlink_urbs (struct usbnet *dev, struct
>>>> sk_buff_head *q)
>>>>        int                     count = 0;
>>>>
>>>>        spin_lock_irqsave (&q->lock, flags);
>>>> +       set_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>>>        skb_queue_walk_safe(q, skb, skbnext) {
>>>>                struct skb_data         *entry;
>>>>                struct urb              *urb;
>>>> @@ -590,15 +595,6 @@ static int unlink_urbs (struct usbnet *dev,
>>>> struct sk_buff_head *q)
>>>>                entry = (struct skb_data *) skb->cb;
>>>>                urb = entry->urb;
>>>>
>>>> -               /*
>>>> -                * Get reference count of the URB to avoid it to be
>>>> -                * freed during usb_unlink_urb, which may trigger
>>>> -                * use-after-free problem inside usb_unlink_urb since
>>>> -                * usb_unlink_urb is always racing with .complete
>>>> -                * handler(include defer_bh).
>>>> -                */
>>>> -               usb_get_urb(urb);
>>>> -               spin_unlock_irqrestore(&q->lock, flags);
>>>>                // during some PM-driven resume scenarios,
>>>>                // these (async) unlinks complete immediately
>>>>                retval = usb_unlink_urb (urb);
>>>> @@ -606,9 +602,8 @@ static int unlink_urbs (struct usbnet *dev, struct
>>>> sk_buff_head *q)
>>>>                        netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
>>>>                else
>>>>                        count++;
>>>> -               usb_put_urb(urb);
>>>> -               spin_lock_irqsave(&q->lock, flags);
>>>>        }
>>>> +       clear_cpu_bit(URB_UNLINKING, dev->cpuflags);
>>>>        spin_unlock_irqrestore (&q->lock, flags);
>>>>        return count;
>>>>  }
>>>> @@ -1283,6 +1278,7 @@ void usbnet_disconnect (struct usb_interface *intf)
>>>>        usb_kill_urb(dev->interrupt);
>>>>        usb_free_urb(dev->interrupt);
>>>>
>>>> +       free_percpu(dev->cpuflags);
>>>>        free_netdev(net);
>>>>        usb_put_dev (xdev);
>>>>  }
>>>> @@ -1353,6 +1349,13 @@ usbnet_probe (struct usb_interface *udev, const
>>>> struct usb_device_id *prod)
>>>>        SET_NETDEV_DEV(net, &udev->dev);
>>>>
>>>>        dev = netdev_priv(net);
>>>> +
>>>> +       dev->cpuflags = alloc_percpu(unsigned long);
>>>> +       if (!dev->cpuflags) {
>>>> +               status = -ENOMEM;
>>>> +               goto out1;
>>>> +       }
>>>> +
>>>>        dev->udev = xdev;
>>>>        dev->intf = udev;
>>>>        dev->driver_info = info;
>>>> @@ -1396,7 +1399,7 @@ usbnet_probe (struct usb_interface *udev, const
>>>> struct usb_device_id *prod)
>>>>        if (info->bind) {
>>>>                status = info->bind (dev, udev);
>>>>                if (status < 0)
>>>> -                       goto out1;
>>>> +                       goto out2;
>>>>
>>>>                // heuristic:  "usb%d" for links we know are two-host,
>>>>                // else "eth%d" when there's reasonable doubt.  userspace
>>>> @@ -1465,6 +1468,8 @@ usbnet_probe (struct usb_interface *udev, const
>>>> struct usb_device_id *prod)
>>>>  out3:
>>>>        if (info->unbind)
>>>>                info->unbind (dev, udev);
>>>> +out2:
>>>> +       free_percpu(dev->cpuflags);
>>>>  out1:
>>>>        free_netdev(net);
>>>>  out:
>>>> diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
>>>> index 605b0aa..2dc46f5 100644
>>>> --- a/include/linux/usb/usbnet.h
>>>> +++ b/include/linux/usb/usbnet.h
>>>> @@ -69,8 +69,28 @@ struct usbnet {
>>>>  #              define EVENT_DEV_WAKING 6
>>>>  #              define EVENT_DEV_ASLEEP 7
>>>>  #              define EVENT_DEV_OPEN   8
>>>> +       unsigned long __percpu *cpuflags;
>>>> +#              define URB_UNLINKING    0
>>>
>>> Is it possible using a simple bool variable to track whether q->lock
>>> is hold by unlink_urb() ?  If yes, it can avoid introducing following
>>> new codes into current code stack.
>>
>> It should be defined as percpu variable. The URB complete handler
>> may be triggered inside unlink path, or it can be triggered in hardirq path
>> from other CPUs at the same time.
>>
>
> Then the potential issue is,  in defer_bh(),  __skb_unlink() may be
> called without holding the lock. This can cause current issue too,

No, URB_UNLINKING being set means the lock has been held during unlink,
see unlink_urbs related changes in the percpu flag patch.

> because __skb_unlink() may move 'skb->next' out of txq/rxq, and even
> free it.
>
>>>
>>>>  };
>>>>
>>>> +static inline void set_cpu_bit(int nr, unsigned long __percpu *addr)
>>>> +{
>>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>>> +       set_bit(nr, fl);
>>>> +}
>>>> +
>>>> +static inline void clear_cpu_bit(int nr, unsigned long __percpu *addr)
>>>> +{
>>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>>> +       clear_bit(nr, fl);
>>>> +}
>>>> +
>>>> +static inline int test_cpu_bit(int nr, unsigned long __percpu *addr)
>>>> +{
>>>> +       unsigned long *fl = __this_cpu_ptr(addr);
>>>> +       return test_bit(nr, fl);
>>>> +}
>>>> +
>>>>  static inline struct usb_driver *driver_of(struct usb_interface *intf)
>>>>  {
>>>>        return to_usb_driver(intf->dev.driver);
>>>>
>>>>
>>>> Thanks,
>>>> --
>>>> Ming Lei
>>
>>
>>
>> --
>> Ming Lei



-- 
Ming Lei

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
       [not found]                                 ` <CA+v9cxawVwKakF6c_RpAw2XUGWcbqd8M+ZJqyq76Au9rmNosmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-04-26  5:02                                   ` Ming Lei
  2012-04-26 16:30                                     ` Huajun Li
  0 siblings, 1 reply; 40+ messages in thread
From: Ming Lei @ 2012-04-26  5:02 UTC (permalink / raw)
  To: Huajun Li
  Cc: Oliver Neukum, Alan Stern, Dave Jones,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	Fedora Kernel Team

Hi Huajun,

On Fri, Apr 20, 2012 at 9:37 PM, Huajun Li <huajun.li.lee-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>
> Above patch has already been integrated to mainline. However, maybe
> there still exists another potentail use-after-free issue, here is a
> case:
>      After release the lock in unlink_urbs(), defer_bh() may move
> current skb from rxq/txq to dev->done queue, even cause the skb be
> released. Then in next loop cycle, it can't refer to expected skb, and
> may Oops again.
>
> To easily reproduce it, in unlink_urbs(), you can delay a short time
> after usb_put_urb(urb), then disconnect your device while transferring
> data, and repeat it times you will find errors on your screen.
>
> Following is a draft patch to guarantee the queue consistent, and
> refer to expected skb in each loop cycle:
>
> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
> index b7b3f5b..6da0141 100644
> --- a/drivers/net/usb/usbnet.c
> +++ b/drivers/net/usb/usbnet.c
> @@ -578,16 +578,28 @@ EXPORT_SYMBOL_GPL(usbnet_purge_paused_rxq);
>  static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
>  {
>        unsigned long           flags;
> -       struct sk_buff          *skb, *skbnext;
> +       struct sk_buff          *skb;
>        int                     count = 0;
>
>        spin_lock_irqsave (&q->lock, flags);
> -       skb_queue_walk_safe(q, skb, skbnext) {
> +       while (!skb_queue_empty(q)) {
>                struct skb_data         *entry;
>                struct urb              *urb;
>                int                     retval;
>
> -               entry = (struct skb_data *) skb->cb;
> +               skb_queue_walk(q, skb) {
> +                       entry = (struct skb_data *)skb->cb;
> +                       if (entry->state == rx_done ||
> +                               entry->state == tx_done ||
> +                               entry->state == rx_cleanup)
> +                               continue;
> +                       else
> +                               break;
> +               }
> +
> +               if (skb == (struct sk_buff *)(q))
> +                       break;
> +
>                urb = entry->urb;
>
>

After thinking about the issue further, the basic idea of your patch is good,
but not safe(double unlink, also races on accessing entry->state), so I write
a new one based on your patch.

Are you OK this new one?

Thanks,
-- 
Ming Lei

From 89eb41968caa9523c90c1cf7b7e2259db00e04b8 Mon Sep 17 00:00:00 2001
From: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Date: Thu, 26 Apr 2012 11:33:46 +0800
Subject: [PATCH] usbnet: fix skb traversing races during unlink

Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d(net/usbnet: avoid
recursive locking in usbnet_stop()) fixes the recursive locking
problem by releasing the skb queue lock before unlink, but may
cause skb traversing races:
	- after URB is unlinked and the queue lock is released,
	the refered skb and skb->next may be moved to done queue,
	even be released
	- in skb_queue_walk_safe, the next skb is still obtained
	by next pointer of the last skb
	- so maybe trigger oops or other problems

This patch extends the usage of entry->state to describe 'start_unlink'
state, so always holding the queue(rx/tx) lock to change the state if
the referd skb is in rx or tx queue because we need to know if the
refered urb has been started unlinking in unlink_urbs.

Also the patch uses usb_block_urb introduced by Oliver to block
URB resubmit in complete handler if the URB will be unlinked.

The other part of this patch is based on Hugjun's patch:
always traverse from head of the tx/rx queue to get skb which is
to be unlinked but not been started unlinking.
---
 drivers/net/usb/usbnet.c   |   54 ++++++++++++++++++++++++++++++++------------
 include/linux/usb/usbnet.h |    3 ++-
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index db99536..dff5e1b 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -281,17 +281,30 @@ int usbnet_change_mtu (struct net_device *net,
int new_mtu)
 }
 EXPORT_SYMBOL_GPL(usbnet_change_mtu);

+/*The caller must hold list->lock*/
+static void __usbnet_queue_skb(struct sk_buff_head *list,
+			struct sk_buff *newsk, enum skb_state state)
+{
+	struct skb_data *entry = (struct skb_data *) newsk->cb;
+
+	__skb_queue_tail(list, newsk);
+	entry->state = state;
+}
+
 /*-------------------------------------------------------------------------*/

 /* some LK 2.4 HCDs oopsed if we freed or resubmitted urbs from
  * completion callbacks.  2.5 should have fixed those bugs...
  */

-static void defer_bh(struct usbnet *dev, struct sk_buff *skb, struct
sk_buff_head *list)
+static void defer_bh(struct usbnet *dev, struct sk_buff *skb,
+		struct sk_buff_head *list, enum skb_state state)
 {
 	unsigned long		flags;
+	struct skb_data *entry = (struct skb_data *) skb->cb;

 	spin_lock_irqsave(&list->lock, flags);
+	entry->state = state;
 	__skb_unlink(skb, list);
 	spin_unlock(&list->lock);
 	spin_lock(&dev->done.lock);
@@ -339,7 +352,6 @@ static int rx_submit (struct usbnet *dev, struct
urb *urb, gfp_t flags)
 	entry = (struct skb_data *) skb->cb;
 	entry->urb = urb;
 	entry->dev = dev;
-	entry->state = rx_start;
 	entry->length = 0;

 	usb_fill_bulk_urb (urb, dev->udev, dev->in,
@@ -371,7 +383,7 @@ static int rx_submit (struct usbnet *dev, struct
urb *urb, gfp_t flags)
 			tasklet_schedule (&dev->bh);
 			break;
 		case 0:
-			__skb_queue_tail (&dev->rxq, skb);
+			__usbnet_queue_skb(&dev->rxq, skb, rx_start);
 		}
 	} else {
 		netif_dbg(dev, ifdown, dev->net, "rx: stopped\n");
@@ -422,16 +434,17 @@ static void rx_complete (struct urb *urb)
 	struct skb_data		*entry = (struct skb_data *) skb->cb;
 	struct usbnet		*dev = entry->dev;
 	int			urb_status = urb->status;
+	enum skb_state		state;

 	skb_put (skb, urb->actual_length);
-	entry->state = rx_done;
+	state = rx_done;
 	entry->urb = NULL;

 	switch (urb_status) {
 	/* success */
 	case 0:
 		if (skb->len < dev->net->hard_header_len) {
-			entry->state = rx_cleanup;
+			state = rx_cleanup;
 			dev->net->stats.rx_errors++;
 			dev->net->stats.rx_length_errors++;
 			netif_dbg(dev, rx_err, dev->net,
@@ -470,7 +483,7 @@ static void rx_complete (struct urb *urb)
 				  "rx throttle %d\n", urb_status);
 		}
 block:
-		entry->state = rx_cleanup;
+		state = rx_cleanup;
 		entry->urb = urb;
 		urb = NULL;
 		break;
@@ -481,13 +494,13 @@ block:
 		// FALLTHROUGH

 	default:
-		entry->state = rx_cleanup;
+		state = rx_cleanup;
 		dev->net->stats.rx_errors++;
 		netif_dbg(dev, rx_err, dev->net, "rx status %d\n", urb_status);
 		break;
 	}

-	defer_bh(dev, skb, &dev->rxq);
+	defer_bh(dev, skb, &dev->rxq, state);

 	if (urb) {
 		if (netif_running (dev->net) &&
@@ -578,16 +591,24 @@ EXPORT_SYMBOL_GPL(usbnet_purge_paused_rxq);
 static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
 {
 	unsigned long		flags;
-	struct sk_buff		*skb, *skbnext;
+	struct sk_buff		*skb;
 	int			count = 0;

 	spin_lock_irqsave (&q->lock, flags);
-	skb_queue_walk_safe(q, skb, skbnext) {
+	while (!skb_queue_empty(q)) {
 		struct skb_data		*entry;
 		struct urb		*urb;
 		int			retval;

-		entry = (struct skb_data *) skb->cb;
+		skb_queue_walk(q, skb) {
+			entry = (struct skb_data *) skb->cb;
+			if (entry->state != unlink_start)
+				break;
+		}
+		if (skb == (struct sk_buff *)q)
+			break;
+
+		entry->state = unlink_start;
 		urb = entry->urb;

 		/*
@@ -598,6 +619,10 @@ static int unlink_urbs (struct usbnet *dev,
struct sk_buff_head *q)
 		 * handler(include defer_bh).
 		 */
 		usb_get_urb(urb);
+
+		/*speedup unlink by blocking resubmit*/
+		usb_block_urb(urb);
+
 		spin_unlock_irqrestore(&q->lock, flags);
 		// during some PM-driven resume scenarios,
 		// these (async) unlinks complete immediately
@@ -606,6 +631,7 @@ static int unlink_urbs (struct usbnet *dev, struct
sk_buff_head *q)
 			netdev_dbg(dev->net, "unlink urb err, %d\n", retval);
 		else
 			count++;
+		usb_unblock_urb(urb);
 		usb_put_urb(urb);
 		spin_lock_irqsave(&q->lock, flags);
 	}
@@ -1039,8 +1065,7 @@ static void tx_complete (struct urb *urb)
 	}

 	usb_autopm_put_interface_async(dev->intf);
-	entry->state = tx_done;
-	defer_bh(dev, skb, &dev->txq);
+	defer_bh(dev, skb, &dev->txq, tx_done);
 }

 /*-------------------------------------------------------------------------*/
@@ -1096,7 +1121,6 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 	entry = (struct skb_data *) skb->cb;
 	entry->urb = urb;
 	entry->dev = dev;
-	entry->state = tx_start;
 	entry->length = length;

 	usb_fill_bulk_urb (urb, dev->udev, dev->out,
@@ -1155,7 +1179,7 @@ netdev_tx_t usbnet_start_xmit (struct sk_buff *skb,
 		break;
 	case 0:
 		net->trans_start = jiffies;
-		__skb_queue_tail (&dev->txq, skb);
+		__usbnet_queue_skb(&dev->txq, skb, tx_start);
 		if (dev->txq.qlen >= TX_QLEN (dev))
 			netif_stop_queue (net);
 	}
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index 605b0aa..76f4396 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -191,7 +191,8 @@ extern void usbnet_cdc_status(struct usbnet *,
struct urb *);
 enum skb_state {
 	illegal = 0,
 	tx_start, tx_done,
-	rx_start, rx_done, rx_cleanup
+	rx_start, rx_done, rx_cleanup,
+	unlink_start
 };

 struct skb_data {	/* skb->cb is one of these */
-- 
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: use-after-free in usbnet
  2012-04-26  5:02                                   ` Ming Lei
@ 2012-04-26 16:30                                     ` Huajun Li
  0 siblings, 0 replies; 40+ messages in thread
From: Huajun Li @ 2012-04-26 16:30 UTC (permalink / raw)
  To: Ming Lei
  Cc: Oliver Neukum, Alan Stern, Dave Jones, netdev, linux-usb,
	Fedora Kernel Team

On Thu, Apr 26, 2012 at 1:02 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> Hi Huajun,
>
> On Fri, Apr 20, 2012 at 9:37 PM, Huajun Li <huajun.li.lee@gmail.com> wrote:
>>
>>
>> Above patch has already been integrated to mainline. However, maybe
>> there still exists another potentail use-after-free issue, here is a
>> case:
>>      After release the lock in unlink_urbs(), defer_bh() may move
>> current skb from rxq/txq to dev->done queue, even cause the skb be
>> released. Then in next loop cycle, it can't refer to expected skb, and
>> may Oops again.
>>
>> To easily reproduce it, in unlink_urbs(), you can delay a short time
>> after usb_put_urb(urb), then disconnect your device while transferring
>> data, and repeat it times you will find errors on your screen.
>>
>> Following is a draft patch to guarantee the queue consistent, and
>> refer to expected skb in each loop cycle:
>>
>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> index b7b3f5b..6da0141 100644
>> --- a/drivers/net/usb/usbnet.c
>> +++ b/drivers/net/usb/usbnet.c
>> @@ -578,16 +578,28 @@ EXPORT_SYMBOL_GPL(usbnet_purge_paused_rxq);
>>  static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
>>  {
>>        unsigned long           flags;
>> -       struct sk_buff          *skb, *skbnext;
>> +       struct sk_buff          *skb;
>>        int                     count = 0;
>>
>>        spin_lock_irqsave (&q->lock, flags);
>> -       skb_queue_walk_safe(q, skb, skbnext) {
>> +       while (!skb_queue_empty(q)) {
>>                struct skb_data         *entry;
>>                struct urb              *urb;
>>                int                     retval;
>>
>> -               entry = (struct skb_data *) skb->cb;
>> +               skb_queue_walk(q, skb) {
>> +                       entry = (struct skb_data *)skb->cb;
>> +                       if (entry->state == rx_done ||
>> +                               entry->state == tx_done ||
>> +                               entry->state == rx_cleanup)
>> +                               continue;
>> +                       else
>> +                               break;
>> +               }
>> +
>> +               if (skb == (struct sk_buff *)(q))
>> +                       break;
>> +
>>                urb = entry->urb;
>>
>>
>
> After thinking about the issue further, the basic idea of your patch is good,
> but not safe(double unlink, also races on accessing entry->state), so I write
> a new one based on your patch.
>
> Are you OK this new one?
>

Yes,  no more issue that I can see by now, thanks.

> Thanks,
> --
> Ming Lei
>
> From 89eb41968caa9523c90c1cf7b7e2259db00e04b8 Mon Sep 17 00:00:00 2001
> From: Ming Lei <tom.leiming@gmail.com>
> Date: Thu, 26 Apr 2012 11:33:46 +0800
> Subject: [PATCH] usbnet: fix skb traversing races during unlink
>
> Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d(net/usbnet: avoid
> recursive locking in usbnet_stop()) fixes the recursive locking
> problem by releasing the skb queue lock before unlink, but may
> cause skb traversing races:
>        - after URB is unlinked and the queue lock is released,
>        the refered skb and skb->next may be moved to done queue,
>        even be released
>        - in skb_queue_walk_safe, the next skb is still obtained
>        by next pointer of the last skb
>        - so maybe trigger oops or other problems
>
> This patch extends the usage of entry->state to describe 'start_unlink'
> state, so always holding the queue(rx/tx) lock to change the state if
> the referd skb is in rx or tx queue because we need to know if the
> refered urb has been started unlinking in unlink_urbs.
>
> Also the patch uses usb_block_urb introduced by Oliver to block
> URB resubmit in complete handler if the URB will be unlinked.
>
> The other part of this patch is based on Hugjun's patch:

s/Hugjun/Huajun

> always traverse from head of the tx/rx queue to get skb which is
> to be unlinked but not been started unlinking.
> ---

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2012-04-26 16:30 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-19 15:12 use-after-free in usbnet Dave Jones
2012-03-20  9:40 ` Ming Lei
     [not found]   ` <CACVXFVN5O2S0hsnzHoi=LX+KAnccHc_F0SXq9-hMOXnaoUOdkg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-03-21  1:04     ` Ming Lei
2012-03-21  1:34       ` Ming Lei
2012-03-21 14:35         ` Alan Stern
     [not found]           ` <Pine.LNX.4.44L0.1203211031490.1369-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
2012-03-21 15:02             ` Ming Lei
     [not found]               ` <CACVXFVP+U1k7JFTmbabF-k8F3bO9zc58c3tLG6=1nQPcrR9p1g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-03-21 16:12                 ` Alan Stern
     [not found]                   ` <Pine.LNX.4.44L0.1203211201560.1369-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>
2012-03-21 16:22                     ` Ming Lei
     [not found]                       ` <CACVXFVOVjnWjqpKxbU98DAyUC_OSb8ZL-3WcyYuFXgPJn5UyuA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-03-21 17:30                         ` Alan Stern
2012-03-22  9:08                         ` Oliver Neukum
     [not found]                           ` <201203221008.46882.oneukum-l3A5Bk7waGM@public.gmane.org>
2012-03-22  9:30                             ` Ming Lei
2012-03-22  9:57                               ` Oliver Neukum
2012-04-20 13:37                               ` Huajun Li
2012-04-20 14:22                                 ` Ming Lei
     [not found]                                   ` <CACVXFVM5YBJkDZddeGi1_MPY7EqEV_wtoFy-NtBHYA6rxez0jg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-20 14:56                                     ` Huajun Li
     [not found]                                       ` <CA+v9cxZWtGbz6uCSysVbAc1hT27rCiuJXzcvSiTxH-zuQYnrZw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-20 15:07                                         ` Huajun Li
2012-04-21  1:49                                       ` Ming Lei
2012-04-21  2:02                                         ` Ming Lei
     [not found]                                         ` <CACVXFVOc0XZ+eLHGiVwKuiUResRk8Cj9MS4EPMx7k57a0tEJhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-21  6:39                                           ` Huajun Li
     [not found]                                             ` <CA+v9cxZZpL5soSga=MX_bD45KNve-Lnr2Qb6+gr7Mv6Txyh-fA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-21  7:06                                               ` Ming Lei
     [not found]                                                 ` <CACVXFVN6kvXk7s4Sc0d_-yKSM=rV3qcuPPBHVZYzoQjnwkGX+Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-21  7:45                                                   ` Ming Lei
     [not found]                                                     ` <CACVXFVNc_S8pTaBqMzQZx6Dt-tSP_9iXepxJzv=iR9BFu=Tj8g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-21  8:00                                                       ` Huajun Li
2012-04-22  2:19                                                       ` Huajun Li
2012-04-22 12:05                                                         ` Ming Lei
2012-04-22 13:15                                                           ` Huajun Li
     [not found]                                                             ` <CA+v9cxZfzATWynXGtEe6gvcD2aRsLqAF3ZN_HM_dyU4W_rQSpg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-22 14:10                                                               ` Ming Lei
2012-04-22 14:15                                                               ` Ming Lei
2012-04-22 14:19                                                             ` Ming Lei
2012-04-21  7:50                                                 ` Huajun Li
     [not found]                                                   ` <CA+v9cxa+fyzMOD-=oLLczpu1FDtTwcok+y2FkC=mHzDH3JYJ2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-21  7:56                                                     ` Ming Lei
     [not found]                                                       ` <CACVXFVOMTbq0zbmsZAt-Pyc=3oqQ=UcWV5HgNryu7s6oMhKpQg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-21  8:23                                                         ` Huajun Li
     [not found]                                                           ` <CA+v9cxZRpthaZ=64tLzKf=AGOqaRVwe5o0UMadiXGuiuXiA7uA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-21  9:40                                                             ` Ming Lei
     [not found]                                                               ` <CACVXFVOemJqfT9OPRer3qzbVEsGyUOupoOUNCBzC4deNRsksQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-22  2:02                                                                 ` Huajun Li
2012-04-21 19:23                                         ` David Miller
     [not found]                                           ` <20120421.152345.290988116097275353.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2012-04-22  1:45                                             ` Huajun Li
2012-04-22  2:05                                               ` David Miller
     [not found]                                 ` <CA+v9cxawVwKakF6c_RpAw2XUGWcbqd8M+ZJqyq76Au9rmNosmQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-04-26  5:02                                   ` Ming Lei
2012-04-26 16:30                                     ` Huajun Li
2012-03-21 14:44       ` Greg KH
2012-03-21 15:07         ` Ming Lei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.