* [PATCH] hv_netvsc: fix schedule in RCU context
@ 2018-09-13 15:03 Stephen Hemminger
2018-09-13 15:33 ` Haiyang Zhang
2018-09-13 17:30 ` David Miller
0 siblings, 2 replies; 3+ messages in thread
From: Stephen Hemminger @ 2018-09-13 15:03 UTC
To: kys, haiyangz; +Cc: netdev, Stephen Hemminger
When a netvsc device is removed, it can reschedule while inside an RCU
read-side critical section. This happens because canceling the subchannel
setup work could (in theory) cause a reschedule when manipulating the timer.

To reproduce, run a lockdep-enabled kernel and unbind
a network device from hv_netvsc (via sysfs).
[ 160.682011] WARNING: suspicious RCU usage
[ 160.707466] 4.19.0-rc3-uio+ #2 Not tainted
[ 160.709937] -----------------------------
[ 160.712352] ./include/linux/rcupdate.h:302 Illegal context switch in RCU read-side critical section!
[ 160.723691]
[ 160.723691] other info that might help us debug this:
[ 160.723691]
[ 160.730955]
[ 160.730955] rcu_scheduler_active = 2, debug_locks = 1
[ 160.762813] 5 locks held by rebind-eth.sh/1812:
[ 160.766851] #0: 000000008befa37a (sb_writers#6){.+.+}, at: vfs_write+0x184/0x1b0
[ 160.773416] #1: 00000000b097f236 (&of->mutex){+.+.}, at: kernfs_fop_write+0xe2/0x1a0
[ 160.783766] #2: 0000000041ee6889 (kn->count#3){++++}, at: kernfs_fop_write+0xeb/0x1a0
[ 160.787465] #3: 0000000056d92a74 (&dev->mutex){....}, at: device_release_driver_internal+0x39/0x250
[ 160.816987] #4: 0000000030f6031e (rcu_read_lock){....}, at: netvsc_remove+0x1e/0x250 [hv_netvsc]
[ 160.828629]
[ 160.828629] stack backtrace:
[ 160.831966] CPU: 1 PID: 1812 Comm: rebind-eth.sh Not tainted 4.19.0-rc3-uio+ #2
[ 160.832952] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v1.0 11/26/2012
[ 160.832952] Call Trace:
[ 160.832952] dump_stack+0x85/0xcb
[ 160.832952] ___might_sleep+0x1a3/0x240
[ 160.832952] __flush_work+0x57/0x2e0
[ 160.832952] ? __mutex_lock+0x83/0x990
[ 160.832952] ? __kernfs_remove+0x24f/0x2e0
[ 160.832952] ? __kernfs_remove+0x1b2/0x2e0
[ 160.832952] ? mark_held_locks+0x50/0x80
[ 160.832952] ? get_work_pool+0x90/0x90
[ 160.832952] __cancel_work_timer+0x13c/0x1e0
[ 160.832952] ? netvsc_remove+0x1e/0x250 [hv_netvsc]
[ 160.832952] ? __lock_is_held+0x55/0x90
[ 160.832952] netvsc_remove+0x9a/0x250 [hv_netvsc]
[ 160.832952] vmbus_remove+0x26/0x30
[ 160.832952] device_release_driver_internal+0x18a/0x250
[ 160.832952] unbind_store+0xb4/0x180
[ 160.832952] kernfs_fop_write+0x113/0x1a0
[ 160.832952] __vfs_write+0x36/0x1a0
[ 160.832952] ? rcu_read_lock_sched_held+0x6b/0x80
[ 160.832952] ? rcu_sync_lockdep_assert+0x2e/0x60
[ 160.832952] ? __sb_start_write+0x141/0x1a0
[ 160.832952] ? vfs_write+0x184/0x1b0
[ 160.832952] vfs_write+0xbe/0x1b0
[ 160.832952] ksys_write+0x55/0xc0
[ 160.832952] do_syscall_64+0x60/0x1b0
[ 160.832952] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 160.832952] RIP: 0033:0x7fe48f4c8154
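As a rough illustration of why the check in the trace above fires, here is a toy userspace model in C. It is not kernel code: every `toy_` name is hypothetical, the "read-side section" is only a counter, and the blocking call is a stand-in for `cancel_work_sync()`. It only shows the ordering problem, assuming the old code made the blocking call between `rcu_read_lock()` and `rcu_read_unlock()` as the removed diff lines indicate.

```c
#include <assert.h>

/* Toy model only: hypothetical names, no real RCU or workqueue involved. */
struct toy_ctx {
	int rcu_depth;   /* nesting depth of read-side critical sections */
	int violations;  /* blocking calls made while rcu_depth > 0 */
};

static void toy_rcu_read_lock(struct toy_ctx *c)
{
	c->rcu_depth++;
}

static void toy_rcu_read_unlock(struct toy_ctx *c)
{
	assert(c->rcu_depth > 0);
	c->rcu_depth--;
}

/* Stand-in for a call that may sleep while waiting for work to finish. */
static void toy_cancel_work_sync(struct toy_ctx *c)
{
	if (c->rcu_depth > 0)
		c->violations++;  /* here the kernel would report an illegal context switch */
}

/* Ordering before the patch: the blocking call sits inside the read-side section. */
static int toy_remove_old(void)
{
	struct toy_ctx c = { 0, 0 };

	toy_rcu_read_lock(&c);
	toy_cancel_work_sync(&c);
	toy_rcu_read_unlock(&c);
	return c.violations;
}

/* Ordering after the patch: no read-side section around the blocking call.
 * In the driver a sleepable lock (RTNL) is held instead; that is not modeled here. */
static int toy_remove_new(void)
{
	struct toy_ctx c = { 0, 0 };

	toy_cancel_work_sync(&c);
	return c.violations;
}
```

In this model `toy_remove_old()` records one violation and `toy_remove_new()` records none, which mirrors the single lockdep splat seen before the change.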

Resolve this by taking RTNL earlier and dropping the RCU read-side
section; nvdev is then read with rtnl_dereference(). This is safe because
the subchannel work does a trylock on RTNL and will detect the race.

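The trylock argument can be sketched with a toy userspace model in C. This is an assumption-laden illustration, not the driver's code: the lock is a plain `atomic_flag`, all `toy_` names are hypothetical, and what the real worker does after backing off is deliberately left out. The point is only that a worker which never blocks on the lock always returns, so a remover that holds the lock and waits for the worker cannot deadlock with it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for RTNL: a single flag, hypothetical names throughout. */
static atomic_flag toy_rtnl = ATOMIC_FLAG_INIT;

static bool toy_rtnl_trylock(void)
{
	/* test_and_set returns the previous value; false means we took the lock. */
	return !atomic_flag_test_and_set(&toy_rtnl);
}

static void toy_rtnl_unlock(void)
{
	atomic_flag_clear(&toy_rtnl);
}

enum toy_result { TOY_DONE, TOY_BACKED_OFF };

/* Worker body: tries the lock once and gives up instead of waiting. */
static enum toy_result toy_subchan_work(int *channels_opened)
{
	assert(channels_opened != NULL);

	if (!toy_rtnl_trylock())
		return TOY_BACKED_OFF;  /* lock is held elsewhere, e.g. by a remover */

	(*channels_opened)++;           /* placeholder for the real setup work */
	toy_rtnl_unlock();
	return TOY_DONE;
}
```

With the lock free, `toy_subchan_work()` does its work and returns `TOY_DONE`; with the lock already held, it returns `TOY_BACKED_OFF` immediately and leaves the counter unchanged.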
Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
drivers/net/hyperv/netvsc_drv.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 70921bbe0e28..915fbd66a02b 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2272,17 +2272,15 @@ static int netvsc_remove(struct hv_device *dev)
 	cancel_delayed_work_sync(&ndev_ctx->dwork);
-	rcu_read_lock();
-	nvdev = rcu_dereference(ndev_ctx->nvdev);
-
-	if (nvdev)
+	rtnl_lock();
+	nvdev = rtnl_dereference(ndev_ctx->nvdev);
+	if (nvdev)
 		cancel_work_sync(&nvdev->subchan_work);
 	/*
 	 * Call to the vsc driver to let it know that the device is being
 	 * removed. Also blocks mtu and channel changes.
 	 */
-	rtnl_lock();
 	vf_netdev = rtnl_dereference(ndev_ctx->vf_netdev);
 	if (vf_netdev)
 		netvsc_unregister_vf(vf_netdev);
@@ -2294,7 +2292,6 @@ static int netvsc_remove(struct hv_device *dev)
 	list_del(&ndev_ctx->list);
 	rtnl_unlock();
-	rcu_read_unlock();
 	hv_set_drvdata(dev, NULL);
--
2.18.0
* RE: [PATCH] hv_netvsc: fix schedule in RCU context
From: Haiyang Zhang @ 2018-09-13 15:33 UTC
To: Stephen Hemminger, KY Srinivasan; +Cc: netdev, Stephen Hemminger
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Thursday, September 13, 2018 11:04 AM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>
> Cc: netdev@vger.kernel.org; Stephen Hemminger <sthemmin@microsoft.com>
> Subject: [PATCH] hv_netvsc: fix schedule in RCU context
>
> When netvsc device is removed it can call reschedule in RCU context.
> This happens because canceling the subchannel setup work could (in theory)
> cause a reschedule when manipulating the timer.
>
> To reproduce, run with lockdep enabled kernel and unbind
> a network device from hv_netvsc (via sysfs).
...
> Resolve this by getting RTNL earlier. This is safe because the subchannel
> work queue does trylock on RTNL and will detect the race.
>
> Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Thank you!
* Re: [PATCH] hv_netvsc: fix schedule in RCU context
From: David Miller @ 2018-09-13 17:30 UTC
To: stephen; +Cc: kys, haiyangz, netdev, sthemmin
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Thu, 13 Sep 2018 08:03:43 -0700
> When netvsc device is removed it can call reschedule in RCU context.
> This happens because canceling the subchannel setup work could (in theory)
> cause a reschedule when manipulating the timer.
>
> To reproduce, run with lockdep enabled kernel and unbind
> a network device from hv_netvsc (via sysfs).
...
> Resolve this by getting RTNL earlier. This is safe because the subchannel
> work queue does trylock on RTNL and will detect the race.
>
> Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Applied and queued up for -stable.