linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RESEND][PATCH] net: called rtnl_unlock() before runpm resumes devices
@ 2021-08-09  3:28 AceLan Kao
  2021-08-09  5:51 ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: AceLan Kao @ 2021-08-09  3:28 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Eric Dumazet, Daniel Borkmann,
	Antoine Tenart, Alexander Lobakin, Wei Wang, Taehee Yoo,
	Björn Töpel, Heiner Kallweit, netdev, linux-kernel

From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>

The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in
__dev_open() it calls pm_runtime_resume() to resume devices, and in
some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock()
again. That leads to a recursive lock.

It should leave the devices' resume function to decide if they need to
call rtnl_lock()/rtnl_unlock(), so call rtnl_unlock() before calling
pm_runtime_resume() and then call rtnl_lock() after it in __dev_open().

[  967.723577] INFO: task ip:6024 blocked for more than 120 seconds.
[  967.723588]       Not tainted 5.12.0-rc3+ #1
[  967.723592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  967.723594] task:ip              state:D stack:    0 pid: 6024 ppid:  5957 flags:0x00004000
[  967.723603] Call Trace:
[  967.723610]  __schedule+0x2de/0x890
[  967.723623]  schedule+0x4f/0xc0
[  967.723629]  schedule_preempt_disabled+0xe/0x10
[  967.723636]  __mutex_lock.isra.0+0x190/0x510
[  967.723644]  __mutex_lock_slowpath+0x13/0x20
[  967.723651]  mutex_lock+0x32/0x40
[  967.723657]  rtnl_lock+0x15/0x20
[  967.723665]  igb_resume+0xee/0x1d0 [igb]
[  967.723687]  ? pci_pm_default_resume+0x30/0x30
[  967.723696]  igb_runtime_resume+0xe/0x10 [igb]
[  967.723713]  pci_pm_runtime_resume+0x74/0x90
[  967.723718]  __rpm_callback+0x53/0x1c0
[  967.723725]  rpm_callback+0x57/0x80
[  967.723730]  ? pci_pm_default_resume+0x30/0x30
[  967.723735]  rpm_resume+0x547/0x760
[  967.723740]  __pm_runtime_resume+0x52/0x80
[  967.723745]  __dev_open+0x56/0x160
[  967.723753]  ? _raw_spin_unlock_bh+0x1e/0x20
[  967.723758]  __dev_change_flags+0x188/0x1e0
[  967.723766]  dev_change_flags+0x26/0x60
[  967.723773]  do_setlink+0x723/0x10b0
[  967.723782]  ? __nla_validate_parse+0x5b/0xb80
[  967.723792]  __rtnl_newlink+0x594/0xa00
[  967.723800]  ? nla_put_ifalias+0x38/0xa0
[  967.723807]  ? __nla_reserve+0x41/0x50
[  967.723813]  ? __nla_reserve+0x41/0x50
[  967.723818]  ? __kmalloc_node_track_caller+0x49b/0x4d0
[  967.723824]  ? pskb_expand_head+0x75/0x310
[  967.723830]  ? nla_reserve+0x28/0x30
[  967.723835]  ? skb_free_head+0x25/0x30
[  967.723843]  ? security_sock_rcv_skb+0x2f/0x50
[  967.723850]  ? netlink_deliver_tap+0x3d/0x210
[  967.723859]  ? sk_filter_trim_cap+0xc1/0x230
[  967.723863]  ? skb_queue_tail+0x43/0x50
[  967.723870]  ? sock_def_readable+0x4b/0x80
[  967.723876]  ? __netlink_sendskb+0x42/0x50
[  967.723888]  ? security_capable+0x3d/0x60
[  967.723894]  ? __cond_resched+0x19/0x30
[  967.723900]  ? kmem_cache_alloc_trace+0x390/0x440
[  967.723906]  rtnl_newlink+0x49/0x70
[  967.723913]  rtnetlink_rcv_msg+0x13c/0x370
[  967.723920]  ? _copy_to_iter+0xa0/0x460
[  967.723927]  ? rtnl_calcit.isra.0+0x130/0x130
[  967.723934]  netlink_rcv_skb+0x55/0x100
[  967.723939]  rtnetlink_rcv+0x15/0x20
[  967.723944]  netlink_unicast+0x1a8/0x250
[  967.723949]  netlink_sendmsg+0x233/0x460
[  967.723954]  sock_sendmsg+0x65/0x70
[  967.723958]  ____sys_sendmsg+0x218/0x290
[  967.723961]  ? copy_msghdr_from_user+0x5c/0x90
[  967.723966]  ? lru_cache_add_inactive_or_unevictable+0x27/0xb0
[  967.723974]  ___sys_sendmsg+0x81/0xc0
[  967.723980]  ? __mod_memcg_lruvec_state+0x22/0xe0
[  967.723987]  ? kmem_cache_free+0x244/0x420
[  967.723991]  ? dentry_free+0x37/0x70
[  967.723996]  ? mntput_no_expire+0x4c/0x260
[  967.724001]  ? __cond_resched+0x19/0x30
[  967.724007]  ? security_file_free+0x54/0x60
[  967.724013]  ? call_rcu+0xa4/0x250
[  967.724021]  __sys_sendmsg+0x62/0xb0
[  967.724026]  ? exit_to_user_mode_prepare+0x3d/0x1a0
[  967.724032]  __x64_sys_sendmsg+0x1f/0x30
[  967.724037]  do_syscall_64+0x38/0x90
[  967.724044]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
---
 net/core/dev.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8f1a47ad6781..dd43a29419fd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1585,8 +1585,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
 
 	if (!netif_device_present(dev)) {
 		/* may be detached because parent is runtime-suspended */
-		if (dev->dev.parent)
+		if (dev->dev.parent) {
+			rtnl_unlock();
 			pm_runtime_resume(dev->dev.parent);
+			rtnl_lock();
+		}
 		if (!netif_device_present(dev))
 			return -ENODEV;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RESEND][PATCH] net: called rtnl_unlock() before runpm resumes devices
  2021-08-09  3:28 [RESEND][PATCH] net: called rtnl_unlock() before runpm resumes devices AceLan Kao
@ 2021-08-09  5:51 ` Leon Romanovsky
  2021-08-10  1:57   ` AceLan Kao
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2021-08-09  5:51 UTC (permalink / raw)
  To: AceLan Kao
  Cc: David S. Miller, Jakub Kicinski, Eric Dumazet, Daniel Borkmann,
	Antoine Tenart, Alexander Lobakin, Wei Wang, Taehee Yoo,
	Björn Töpel, Heiner Kallweit, netdev, linux-kernel

On Mon, Aug 09, 2021 at 11:28:09AM +0800, AceLan Kao wrote:
> From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
> 
> The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in
> __dev_open() it calls pm_runtime_resume() to resume devices, and in
> some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock()
> again. That leads to a recursive lock.
> 
> It should leave the devices' resume function to decide if they need to
> call rtnl_lock()/rtnl_unlock(), 

Why? It doesn't sound right that drivers internally decide if to take or
release some external to them lock without seeing full picture.

Most of the time, device driver authors do it wrong. I afraid that igs
is one of such drivers that did it wrong.

Thanks

> so call rtnl_unlock() before calling pm_runtime_resume() and then call
> rtnl_lock() after it in __dev_open().
> 
> [  967.723577] INFO: task ip:6024 blocked for more than 120 seconds.
> [  967.723588]       Not tainted 5.12.0-rc3+ #1
> [  967.723592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  967.723594] task:ip              state:D stack:    0 pid: 6024 ppid:  5957 flags:0x00004000
> [  967.723603] Call Trace:
> [  967.723610]  __schedule+0x2de/0x890
> [  967.723623]  schedule+0x4f/0xc0
> [  967.723629]  schedule_preempt_disabled+0xe/0x10
> [  967.723636]  __mutex_lock.isra.0+0x190/0x510
> [  967.723644]  __mutex_lock_slowpath+0x13/0x20
> [  967.723651]  mutex_lock+0x32/0x40
> [  967.723657]  rtnl_lock+0x15/0x20
> [  967.723665]  igb_resume+0xee/0x1d0 [igb]
> [  967.723687]  ? pci_pm_default_resume+0x30/0x30
> [  967.723696]  igb_runtime_resume+0xe/0x10 [igb]
> [  967.723713]  pci_pm_runtime_resume+0x74/0x90
> [  967.723718]  __rpm_callback+0x53/0x1c0
> [  967.723725]  rpm_callback+0x57/0x80
> [  967.723730]  ? pci_pm_default_resume+0x30/0x30
> [  967.723735]  rpm_resume+0x547/0x760
> [  967.723740]  __pm_runtime_resume+0x52/0x80
> [  967.723745]  __dev_open+0x56/0x160
> [  967.723753]  ? _raw_spin_unlock_bh+0x1e/0x20
> [  967.723758]  __dev_change_flags+0x188/0x1e0
> [  967.723766]  dev_change_flags+0x26/0x60
> [  967.723773]  do_setlink+0x723/0x10b0
> [  967.723782]  ? __nla_validate_parse+0x5b/0xb80
> [  967.723792]  __rtnl_newlink+0x594/0xa00
> [  967.723800]  ? nla_put_ifalias+0x38/0xa0
> [  967.723807]  ? __nla_reserve+0x41/0x50
> [  967.723813]  ? __nla_reserve+0x41/0x50
> [  967.723818]  ? __kmalloc_node_track_caller+0x49b/0x4d0
> [  967.723824]  ? pskb_expand_head+0x75/0x310
> [  967.723830]  ? nla_reserve+0x28/0x30
> [  967.723835]  ? skb_free_head+0x25/0x30
> [  967.723843]  ? security_sock_rcv_skb+0x2f/0x50
> [  967.723850]  ? netlink_deliver_tap+0x3d/0x210
> [  967.723859]  ? sk_filter_trim_cap+0xc1/0x230
> [  967.723863]  ? skb_queue_tail+0x43/0x50
> [  967.723870]  ? sock_def_readable+0x4b/0x80
> [  967.723876]  ? __netlink_sendskb+0x42/0x50
> [  967.723888]  ? security_capable+0x3d/0x60
> [  967.723894]  ? __cond_resched+0x19/0x30
> [  967.723900]  ? kmem_cache_alloc_trace+0x390/0x440
> [  967.723906]  rtnl_newlink+0x49/0x70
> [  967.723913]  rtnetlink_rcv_msg+0x13c/0x370
> [  967.723920]  ? _copy_to_iter+0xa0/0x460
> [  967.723927]  ? rtnl_calcit.isra.0+0x130/0x130
> [  967.723934]  netlink_rcv_skb+0x55/0x100
> [  967.723939]  rtnetlink_rcv+0x15/0x20
> [  967.723944]  netlink_unicast+0x1a8/0x250
> [  967.723949]  netlink_sendmsg+0x233/0x460
> [  967.723954]  sock_sendmsg+0x65/0x70
> [  967.723958]  ____sys_sendmsg+0x218/0x290
> [  967.723961]  ? copy_msghdr_from_user+0x5c/0x90
> [  967.723966]  ? lru_cache_add_inactive_or_unevictable+0x27/0xb0
> [  967.723974]  ___sys_sendmsg+0x81/0xc0
> [  967.723980]  ? __mod_memcg_lruvec_state+0x22/0xe0
> [  967.723987]  ? kmem_cache_free+0x244/0x420
> [  967.723991]  ? dentry_free+0x37/0x70
> [  967.723996]  ? mntput_no_expire+0x4c/0x260
> [  967.724001]  ? __cond_resched+0x19/0x30
> [  967.724007]  ? security_file_free+0x54/0x60
> [  967.724013]  ? call_rcu+0xa4/0x250
> [  967.724021]  __sys_sendmsg+0x62/0xb0
> [  967.724026]  ? exit_to_user_mode_prepare+0x3d/0x1a0
> [  967.724032]  __x64_sys_sendmsg+0x1f/0x30
> [  967.724037]  do_syscall_64+0x38/0x90
> [  967.724044]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
> Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
> ---
>  net/core/dev.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 8f1a47ad6781..dd43a29419fd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1585,8 +1585,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
>  
>  	if (!netif_device_present(dev)) {
>  		/* may be detached because parent is runtime-suspended */
> -		if (dev->dev.parent)
> +		if (dev->dev.parent) {
> +			rtnl_unlock();
>  			pm_runtime_resume(dev->dev.parent);
> +			rtnl_lock();
> +		}
>  		if (!netif_device_present(dev))
>  			return -ENODEV;
>  	}
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RESEND][PATCH] net: called rtnl_unlock() before runpm resumes devices
  2021-08-09  5:51 ` Leon Romanovsky
@ 2021-08-10  1:57   ` AceLan Kao
  2021-08-10  7:08     ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: AceLan Kao @ 2021-08-10  1:57 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: David S. Miller, Jakub Kicinski, Eric Dumazet, Daniel Borkmann,
	Antoine Tenart, Alexander Lobakin, Wei Wang, Taehee Yoo,
	Björn Töpel, Heiner Kallweit, netdev, linux-kernel

Leon Romanovsky <leon@kernel.org> 於 2021年8月9日 週一 下午1:51寫道:
>
> On Mon, Aug 09, 2021 at 11:28:09AM +0800, AceLan Kao wrote:
> > From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
> >
> > The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in
> > __dev_open() it calls pm_runtime_resume() to resume devices, and in
> > some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock()
> > again. That leads to a recursive lock.
> >
> > It should leave the devices' resume function to decide if they need to
> > call rtnl_lock()/rtnl_unlock(),
>
> Why? It doesn't sound right that drivers internally decide if to take or
> release some external to them lock without seeing full picture.
From what I observed, this is the only calling path that acquired
rtnl_lock() before calling drivers' resume function.
So, it encounters recursive lock while driver is going to cal rtnl_lock() again.

>
> Most of the time, device driver authors do it wrong. I afraid that igs
> is one of such drivers that did it wrong.
The issues could be if we remove rtnl_lock in device drivers, then in
other calling path, it won't be protected by the rtnl lock,
and maybe we shouldn't call pm_runtime_resume() here(within rtnl
lock), for device drivers don't know if they are protected by the rtnl
lock while their resume() got called.

>
> Thanks
>
> > so call rtnl_unlock() before calling pm_runtime_resume() and then call
> > rtnl_lock() after it in __dev_open().
> >
> > [  967.723577] INFO: task ip:6024 blocked for more than 120 seconds.
> > [  967.723588]       Not tainted 5.12.0-rc3+ #1
> > [  967.723592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  967.723594] task:ip              state:D stack:    0 pid: 6024 ppid:  5957 flags:0x00004000
> > [  967.723603] Call Trace:
> > [  967.723610]  __schedule+0x2de/0x890
> > [  967.723623]  schedule+0x4f/0xc0
> > [  967.723629]  schedule_preempt_disabled+0xe/0x10
> > [  967.723636]  __mutex_lock.isra.0+0x190/0x510
> > [  967.723644]  __mutex_lock_slowpath+0x13/0x20
> > [  967.723651]  mutex_lock+0x32/0x40
> > [  967.723657]  rtnl_lock+0x15/0x20
> > [  967.723665]  igb_resume+0xee/0x1d0 [igb]
> > [  967.723687]  ? pci_pm_default_resume+0x30/0x30
> > [  967.723696]  igb_runtime_resume+0xe/0x10 [igb]
> > [  967.723713]  pci_pm_runtime_resume+0x74/0x90
> > [  967.723718]  __rpm_callback+0x53/0x1c0
> > [  967.723725]  rpm_callback+0x57/0x80
> > [  967.723730]  ? pci_pm_default_resume+0x30/0x30
> > [  967.723735]  rpm_resume+0x547/0x760
> > [  967.723740]  __pm_runtime_resume+0x52/0x80
> > [  967.723745]  __dev_open+0x56/0x160
> > [  967.723753]  ? _raw_spin_unlock_bh+0x1e/0x20
> > [  967.723758]  __dev_change_flags+0x188/0x1e0
> > [  967.723766]  dev_change_flags+0x26/0x60
> > [  967.723773]  do_setlink+0x723/0x10b0
> > [  967.723782]  ? __nla_validate_parse+0x5b/0xb80
> > [  967.723792]  __rtnl_newlink+0x594/0xa00
> > [  967.723800]  ? nla_put_ifalias+0x38/0xa0
> > [  967.723807]  ? __nla_reserve+0x41/0x50
> > [  967.723813]  ? __nla_reserve+0x41/0x50
> > [  967.723818]  ? __kmalloc_node_track_caller+0x49b/0x4d0
> > [  967.723824]  ? pskb_expand_head+0x75/0x310
> > [  967.723830]  ? nla_reserve+0x28/0x30
> > [  967.723835]  ? skb_free_head+0x25/0x30
> > [  967.723843]  ? security_sock_rcv_skb+0x2f/0x50
> > [  967.723850]  ? netlink_deliver_tap+0x3d/0x210
> > [  967.723859]  ? sk_filter_trim_cap+0xc1/0x230
> > [  967.723863]  ? skb_queue_tail+0x43/0x50
> > [  967.723870]  ? sock_def_readable+0x4b/0x80
> > [  967.723876]  ? __netlink_sendskb+0x42/0x50
> > [  967.723888]  ? security_capable+0x3d/0x60
> > [  967.723894]  ? __cond_resched+0x19/0x30
> > [  967.723900]  ? kmem_cache_alloc_trace+0x390/0x440
> > [  967.723906]  rtnl_newlink+0x49/0x70
> > [  967.723913]  rtnetlink_rcv_msg+0x13c/0x370
> > [  967.723920]  ? _copy_to_iter+0xa0/0x460
> > [  967.723927]  ? rtnl_calcit.isra.0+0x130/0x130
> > [  967.723934]  netlink_rcv_skb+0x55/0x100
> > [  967.723939]  rtnetlink_rcv+0x15/0x20
> > [  967.723944]  netlink_unicast+0x1a8/0x250
> > [  967.723949]  netlink_sendmsg+0x233/0x460
> > [  967.723954]  sock_sendmsg+0x65/0x70
> > [  967.723958]  ____sys_sendmsg+0x218/0x290
> > [  967.723961]  ? copy_msghdr_from_user+0x5c/0x90
> > [  967.723966]  ? lru_cache_add_inactive_or_unevictable+0x27/0xb0
> > [  967.723974]  ___sys_sendmsg+0x81/0xc0
> > [  967.723980]  ? __mod_memcg_lruvec_state+0x22/0xe0
> > [  967.723987]  ? kmem_cache_free+0x244/0x420
> > [  967.723991]  ? dentry_free+0x37/0x70
> > [  967.723996]  ? mntput_no_expire+0x4c/0x260
> > [  967.724001]  ? __cond_resched+0x19/0x30
> > [  967.724007]  ? security_file_free+0x54/0x60
> > [  967.724013]  ? call_rcu+0xa4/0x250
> > [  967.724021]  __sys_sendmsg+0x62/0xb0
> > [  967.724026]  ? exit_to_user_mode_prepare+0x3d/0x1a0
> > [  967.724032]  __x64_sys_sendmsg+0x1f/0x30
> > [  967.724037]  do_syscall_64+0x38/0x90
> > [  967.724044]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> >
> > Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
> > Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
> > ---
> >  net/core/dev.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 8f1a47ad6781..dd43a29419fd 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1585,8 +1585,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
> >
> >       if (!netif_device_present(dev)) {
> >               /* may be detached because parent is runtime-suspended */
> > -             if (dev->dev.parent)
> > +             if (dev->dev.parent) {
> > +                     rtnl_unlock();
> >                       pm_runtime_resume(dev->dev.parent);
> > +                     rtnl_lock();
> > +             }
> >               if (!netif_device_present(dev))
> >                       return -ENODEV;
> >       }
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RESEND][PATCH] net: called rtnl_unlock() before runpm resumes devices
  2021-08-10  1:57   ` AceLan Kao
@ 2021-08-10  7:08     ` Leon Romanovsky
  2021-08-23  3:26       ` AceLan Kao
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2021-08-10  7:08 UTC (permalink / raw)
  To: AceLan Kao
  Cc: David S. Miller, Jakub Kicinski, Eric Dumazet, Daniel Borkmann,
	Antoine Tenart, Alexander Lobakin, Wei Wang, Taehee Yoo,
	Björn Töpel, Heiner Kallweit, netdev, linux-kernel

On Tue, Aug 10, 2021 at 09:57:57AM +0800, AceLan Kao wrote:
> Leon Romanovsky <leon@kernel.org> 於 2021年8月9日 週一 下午1:51寫道:
> >
> > On Mon, Aug 09, 2021 at 11:28:09AM +0800, AceLan Kao wrote:
> > > From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
> > >
> > > The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in
> > > __dev_open() it calls pm_runtime_resume() to resume devices, and in
> > > some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock()
> > > again. That leads to a recursive lock.
> > >
> > > It should leave the devices' resume function to decide if they need to
> > > call rtnl_lock()/rtnl_unlock(),
> >
> > Why? It doesn't sound right that drivers internally decide if to take or
> > release some external to them lock without seeing full picture.
> From what I observed, this is the only calling path that acquired
> rtnl_lock() before calling drivers' resume function.
> So, it encounters recursive lock while driver is going to cal rtnl_lock() again.

I clearly see the problem, but don't agree with a solution.

> 
> >
> > Most of the time, device driver authors do it wrong. I afraid that igs
> > is one of such drivers that did it wrong.
> The issues could be if we remove rtnl_lock in device drivers, then in
> other calling path, it won't be protected by the rtnl lock,
> and maybe we shouldn't call pm_runtime_resume() here(within rtnl
> lock), for device drivers don't know if they are protected by the rtnl
> lock while their resume() got called.

This is exactly the problem, every driver guesses if rtnl_lock is needed
or not in specific path. It is wrong by design. You should ensure that
all paths that are triggered through rtnl should hold rtnl_lock.

You dropped rtnl_lock() without any protection, it is 100% bug.

Thanks

> 
> >
> > Thanks
> >
> > > so call rtnl_unlock() before calling pm_runtime_resume() and then call
> > > rtnl_lock() after it in __dev_open().
> > >
> > > [  967.723577] INFO: task ip:6024 blocked for more than 120 seconds.
> > > [  967.723588]       Not tainted 5.12.0-rc3+ #1
> > > [  967.723592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [  967.723594] task:ip              state:D stack:    0 pid: 6024 ppid:  5957 flags:0x00004000
> > > [  967.723603] Call Trace:
> > > [  967.723610]  __schedule+0x2de/0x890
> > > [  967.723623]  schedule+0x4f/0xc0
> > > [  967.723629]  schedule_preempt_disabled+0xe/0x10
> > > [  967.723636]  __mutex_lock.isra.0+0x190/0x510
> > > [  967.723644]  __mutex_lock_slowpath+0x13/0x20
> > > [  967.723651]  mutex_lock+0x32/0x40
> > > [  967.723657]  rtnl_lock+0x15/0x20
> > > [  967.723665]  igb_resume+0xee/0x1d0 [igb]
> > > [  967.723687]  ? pci_pm_default_resume+0x30/0x30
> > > [  967.723696]  igb_runtime_resume+0xe/0x10 [igb]
> > > [  967.723713]  pci_pm_runtime_resume+0x74/0x90
> > > [  967.723718]  __rpm_callback+0x53/0x1c0
> > > [  967.723725]  rpm_callback+0x57/0x80
> > > [  967.723730]  ? pci_pm_default_resume+0x30/0x30
> > > [  967.723735]  rpm_resume+0x547/0x760
> > > [  967.723740]  __pm_runtime_resume+0x52/0x80
> > > [  967.723745]  __dev_open+0x56/0x160
> > > [  967.723753]  ? _raw_spin_unlock_bh+0x1e/0x20
> > > [  967.723758]  __dev_change_flags+0x188/0x1e0
> > > [  967.723766]  dev_change_flags+0x26/0x60
> > > [  967.723773]  do_setlink+0x723/0x10b0
> > > [  967.723782]  ? __nla_validate_parse+0x5b/0xb80
> > > [  967.723792]  __rtnl_newlink+0x594/0xa00
> > > [  967.723800]  ? nla_put_ifalias+0x38/0xa0
> > > [  967.723807]  ? __nla_reserve+0x41/0x50
> > > [  967.723813]  ? __nla_reserve+0x41/0x50
> > > [  967.723818]  ? __kmalloc_node_track_caller+0x49b/0x4d0
> > > [  967.723824]  ? pskb_expand_head+0x75/0x310
> > > [  967.723830]  ? nla_reserve+0x28/0x30
> > > [  967.723835]  ? skb_free_head+0x25/0x30
> > > [  967.723843]  ? security_sock_rcv_skb+0x2f/0x50
> > > [  967.723850]  ? netlink_deliver_tap+0x3d/0x210
> > > [  967.723859]  ? sk_filter_trim_cap+0xc1/0x230
> > > [  967.723863]  ? skb_queue_tail+0x43/0x50
> > > [  967.723870]  ? sock_def_readable+0x4b/0x80
> > > [  967.723876]  ? __netlink_sendskb+0x42/0x50
> > > [  967.723888]  ? security_capable+0x3d/0x60
> > > [  967.723894]  ? __cond_resched+0x19/0x30
> > > [  967.723900]  ? kmem_cache_alloc_trace+0x390/0x440
> > > [  967.723906]  rtnl_newlink+0x49/0x70
> > > [  967.723913]  rtnetlink_rcv_msg+0x13c/0x370
> > > [  967.723920]  ? _copy_to_iter+0xa0/0x460
> > > [  967.723927]  ? rtnl_calcit.isra.0+0x130/0x130
> > > [  967.723934]  netlink_rcv_skb+0x55/0x100
> > > [  967.723939]  rtnetlink_rcv+0x15/0x20
> > > [  967.723944]  netlink_unicast+0x1a8/0x250
> > > [  967.723949]  netlink_sendmsg+0x233/0x460
> > > [  967.723954]  sock_sendmsg+0x65/0x70
> > > [  967.723958]  ____sys_sendmsg+0x218/0x290
> > > [  967.723961]  ? copy_msghdr_from_user+0x5c/0x90
> > > [  967.723966]  ? lru_cache_add_inactive_or_unevictable+0x27/0xb0
> > > [  967.723974]  ___sys_sendmsg+0x81/0xc0
> > > [  967.723980]  ? __mod_memcg_lruvec_state+0x22/0xe0
> > > [  967.723987]  ? kmem_cache_free+0x244/0x420
> > > [  967.723991]  ? dentry_free+0x37/0x70
> > > [  967.723996]  ? mntput_no_expire+0x4c/0x260
> > > [  967.724001]  ? __cond_resched+0x19/0x30
> > > [  967.724007]  ? security_file_free+0x54/0x60
> > > [  967.724013]  ? call_rcu+0xa4/0x250
> > > [  967.724021]  __sys_sendmsg+0x62/0xb0
> > > [  967.724026]  ? exit_to_user_mode_prepare+0x3d/0x1a0
> > > [  967.724032]  __x64_sys_sendmsg+0x1f/0x30
> > > [  967.724037]  do_syscall_64+0x38/0x90
> > > [  967.724044]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > >
> > > Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
> > > Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
> > > ---
> > >  net/core/dev.c | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 8f1a47ad6781..dd43a29419fd 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -1585,8 +1585,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
> > >
> > >       if (!netif_device_present(dev)) {
> > >               /* may be detached because parent is runtime-suspended */
> > > -             if (dev->dev.parent)
> > > +             if (dev->dev.parent) {
> > > +                     rtnl_unlock();
> > >                       pm_runtime_resume(dev->dev.parent);
> > > +                     rtnl_lock();
> > > +             }
> > >               if (!netif_device_present(dev))
> > >                       return -ENODEV;
> > >       }
> > > --
> > > 2.25.1
> > >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RESEND][PATCH] net: called rtnl_unlock() before runpm resumes devices
  2021-08-10  7:08     ` Leon Romanovsky
@ 2021-08-23  3:26       ` AceLan Kao
  2021-08-24  2:45         ` Nguyen, Anthony L
  0 siblings, 1 reply; 6+ messages in thread
From: AceLan Kao @ 2021-08-23  3:26 UTC (permalink / raw)
  To: Jesse Brandeburg, Tony Nguyen
  Cc: Leon Romanovsky, David S. Miller, Jakub Kicinski, Eric Dumazet,
	Daniel Borkmann, Antoine Tenart, Alexander Lobakin, Wei Wang,
	Taehee Yoo, Björn Töpel, Heiner Kallweit, netdev,
	Linux-Kernel@Vger. Kernel. Org

Leon Romanovsky <leon@kernel.org> 於 2021年8月10日 週二 下午3:08寫道:
>
> On Tue, Aug 10, 2021 at 09:57:57AM +0800, AceLan Kao wrote:
> > Leon Romanovsky <leon@kernel.org> 於 2021年8月9日 週一 下午1:51寫道:
> > >
> > > On Mon, Aug 09, 2021 at 11:28:09AM +0800, AceLan Kao wrote:
> > > > From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
> > > >
> > > > The rtnl_lock() has been called in rtnetlink_rcv_msg(), and then in
> > > > __dev_open() it calls pm_runtime_resume() to resume devices, and in
> > > > some devices' resume function(igb_resum,igc_resume) they calls rtnl_lock()
> > > > again. That leads to a recursive lock.
> > > >
> > > > It should leave the devices' resume function to decide if they need to
> > > > call rtnl_lock()/rtnl_unlock(),
> > >
> > > Why? It doesn't sound right that drivers internally decide if to take or
> > > release some external to them lock without seeing full picture.
> > From what I observed, this is the only calling path that acquired
> > rtnl_lock() before calling drivers' resume function.
> > So, it encounters recursive lock while driver is going to cal rtnl_lock() again.
>
> I clearly see the problem, but don't agree with a solution.
>
> >
> > >
> > > Most of the time, device driver authors do it wrong. I afraid that igs
> > > is one of such drivers that did it wrong.
> > The issues could be if we remove rtnl_lock in device drivers, then in
> > other calling path, it won't be protected by the rtnl lock,
> > and maybe we shouldn't call pm_runtime_resume() here(within rtnl
> > lock), for device drivers don't know if they are protected by the rtnl
> > lock while their resume() got called.
>
> This is exactly the problem, every driver guesses if rtnl_lock is needed
> or not in specific path. It is wrong by design. You should ensure that
> all paths that are triggered through rtnl should hold rtnl_lock.
Hi Jesse, Tony,

As you are the Intel Ethernet drivers' maintainers, do you have any
idea about this?
We can reproduce this issue on the machine with PCI Ethernet card
using igb or igc driver.

>
> You dropped rtnl_lock() without any protection, it is 100% bug.
>
> Thanks
>
> >
> > >
> > > Thanks
> > >
> > > > so call rtnl_unlock() before calling pm_runtime_resume() and then call
> > > > rtnl_lock() after it in __dev_open().
> > > >
> > > > [  967.723577] INFO: task ip:6024 blocked for more than 120 seconds.
> > > > [  967.723588]       Not tainted 5.12.0-rc3+ #1
> > > > [  967.723592] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > [  967.723594] task:ip              state:D stack:    0 pid: 6024 ppid:  5957 flags:0x00004000
> > > > [  967.723603] Call Trace:
> > > > [  967.723610]  __schedule+0x2de/0x890
> > > > [  967.723623]  schedule+0x4f/0xc0
> > > > [  967.723629]  schedule_preempt_disabled+0xe/0x10
> > > > [  967.723636]  __mutex_lock.isra.0+0x190/0x510
> > > > [  967.723644]  __mutex_lock_slowpath+0x13/0x20
> > > > [  967.723651]  mutex_lock+0x32/0x40
> > > > [  967.723657]  rtnl_lock+0x15/0x20
> > > > [  967.723665]  igb_resume+0xee/0x1d0 [igb]
> > > > [  967.723687]  ? pci_pm_default_resume+0x30/0x30
> > > > [  967.723696]  igb_runtime_resume+0xe/0x10 [igb]
> > > > [  967.723713]  pci_pm_runtime_resume+0x74/0x90
> > > > [  967.723718]  __rpm_callback+0x53/0x1c0
> > > > [  967.723725]  rpm_callback+0x57/0x80
> > > > [  967.723730]  ? pci_pm_default_resume+0x30/0x30
> > > > [  967.723735]  rpm_resume+0x547/0x760
> > > > [  967.723740]  __pm_runtime_resume+0x52/0x80
> > > > [  967.723745]  __dev_open+0x56/0x160
> > > > [  967.723753]  ? _raw_spin_unlock_bh+0x1e/0x20
> > > > [  967.723758]  __dev_change_flags+0x188/0x1e0
> > > > [  967.723766]  dev_change_flags+0x26/0x60
> > > > [  967.723773]  do_setlink+0x723/0x10b0
> > > > [  967.723782]  ? __nla_validate_parse+0x5b/0xb80
> > > > [  967.723792]  __rtnl_newlink+0x594/0xa00
> > > > [  967.723800]  ? nla_put_ifalias+0x38/0xa0
> > > > [  967.723807]  ? __nla_reserve+0x41/0x50
> > > > [  967.723813]  ? __nla_reserve+0x41/0x50
> > > > [  967.723818]  ? __kmalloc_node_track_caller+0x49b/0x4d0
> > > > [  967.723824]  ? pskb_expand_head+0x75/0x310
> > > > [  967.723830]  ? nla_reserve+0x28/0x30
> > > > [  967.723835]  ? skb_free_head+0x25/0x30
> > > > [  967.723843]  ? security_sock_rcv_skb+0x2f/0x50
> > > > [  967.723850]  ? netlink_deliver_tap+0x3d/0x210
> > > > [  967.723859]  ? sk_filter_trim_cap+0xc1/0x230
> > > > [  967.723863]  ? skb_queue_tail+0x43/0x50
> > > > [  967.723870]  ? sock_def_readable+0x4b/0x80
> > > > [  967.723876]  ? __netlink_sendskb+0x42/0x50
> > > > [  967.723888]  ? security_capable+0x3d/0x60
> > > > [  967.723894]  ? __cond_resched+0x19/0x30
> > > > [  967.723900]  ? kmem_cache_alloc_trace+0x390/0x440
> > > > [  967.723906]  rtnl_newlink+0x49/0x70
> > > > [  967.723913]  rtnetlink_rcv_msg+0x13c/0x370
> > > > [  967.723920]  ? _copy_to_iter+0xa0/0x460
> > > > [  967.723927]  ? rtnl_calcit.isra.0+0x130/0x130
> > > > [  967.723934]  netlink_rcv_skb+0x55/0x100
> > > > [  967.723939]  rtnetlink_rcv+0x15/0x20
> > > > [  967.723944]  netlink_unicast+0x1a8/0x250
> > > > [  967.723949]  netlink_sendmsg+0x233/0x460
> > > > [  967.723954]  sock_sendmsg+0x65/0x70
> > > > [  967.723958]  ____sys_sendmsg+0x218/0x290
> > > > [  967.723961]  ? copy_msghdr_from_user+0x5c/0x90
> > > > [  967.723966]  ? lru_cache_add_inactive_or_unevictable+0x27/0xb0
> > > > [  967.723974]  ___sys_sendmsg+0x81/0xc0
> > > > [  967.723980]  ? __mod_memcg_lruvec_state+0x22/0xe0
> > > > [  967.723987]  ? kmem_cache_free+0x244/0x420
> > > > [  967.723991]  ? dentry_free+0x37/0x70
> > > > [  967.723996]  ? mntput_no_expire+0x4c/0x260
> > > > [  967.724001]  ? __cond_resched+0x19/0x30
> > > > [  967.724007]  ? security_file_free+0x54/0x60
> > > > [  967.724013]  ? call_rcu+0xa4/0x250
> > > > [  967.724021]  __sys_sendmsg+0x62/0xb0
> > > > [  967.724026]  ? exit_to_user_mode_prepare+0x3d/0x1a0
> > > > [  967.724032]  __x64_sys_sendmsg+0x1f/0x30
> > > > [  967.724037]  do_syscall_64+0x38/0x90
> > > > [  967.724044]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > >
> > > > Fixes: bd869245a3dc ("net: core: try to runtime-resume detached device in __dev_open")
> > > > Signed-off-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
> > > > ---
> > > >  net/core/dev.c | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > index 8f1a47ad6781..dd43a29419fd 100644
> > > > --- a/net/core/dev.c
> > > > +++ b/net/core/dev.c
> > > > @@ -1585,8 +1585,11 @@ static int __dev_open(struct net_device *dev, struct netlink_ext_ack *extack)
> > > >
> > > >       if (!netif_device_present(dev)) {
> > > >               /* may be detached because parent is runtime-suspended */
> > > > -             if (dev->dev.parent)
> > > > +             if (dev->dev.parent) {
> > > > +                     rtnl_unlock();
> > > >                       pm_runtime_resume(dev->dev.parent);
> > > > +                     rtnl_lock();
> > > > +             }
> > > >               if (!netif_device_present(dev))
> > > >                       return -ENODEV;
> > > >       }
> > > > --
> > > > 2.25.1
> > > >

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RESEND][PATCH] net: called rtnl_unlock() before runpm resumes devices
  2021-08-23  3:26       ` AceLan Kao
@ 2021-08-24  2:45         ` Nguyen, Anthony L
  0 siblings, 0 replies; 6+ messages in thread
From: Nguyen, Anthony L @ 2021-08-24  2:45 UTC (permalink / raw)
  To: acelan.kao, Brandeburg, Jesse
  Cc: alobakin, davem, ap420073, hkallweit1, Loktionov, Aleksandr,
	daniel, bjorn, atenart, kuba, netdev, leon, weiwan, linux-kernel,
	edumazet, Neftin, Sasha

On Mon, 2021-08-23 at 11:26 +0800, AceLan Kao wrote:
> Leon Romanovsky <leon@kernel.org> 於 2021年8月10日 週二 下午3:08寫道:
> > On Tue, Aug 10, 2021 at 09:57:57AM +0800, AceLan Kao wrote:
> > > Leon Romanovsky <leon@kernel.org> 於 2021年8月9日 週一 下午1:51寫道:
> > > > On Mon, Aug 09, 2021 at 11:28:09AM +0800, AceLan Kao wrote:
> > > > > From: "Chia-Lin Kao (AceLan)" <acelan.kao@canonical.com>
> > > > > 
> > > > > The rtnl_lock() has been called in rtnetlink_rcv_msg(), and
> > > > > then in
> > > > > __dev_open() it calls pm_runtime_resume() to resume devices,
> > > > > and in
> > > > > some devices' resume function(igb_resum,igc_resume) they
> > > > > calls rtnl_lock()
> > > > > again. That leads to a recursive lock.
> > > > > 
> > > > > It should leave the devices' resume function to decide if
> > > > > they need to
> > > > > call rtnl_lock()/rtnl_unlock(),
> > > > 
> > > > Why? It doesn't sound right that drivers internally decide if
> > > > to take or
> > > > release some external to them lock without seeing full picture.
> > > From what I observed, this is the only calling path that acquired
> > > rtnl_lock() before calling drivers' resume function.
> > > So, it encounters recursive lock while driver is going to cal
> > > rtnl_lock() again.
> > 
> > I clearly see the problem, but don't agree with a solution.
> > 
> > > > Most of the time, device driver authors do it wrong. I afraid
> > > > that igs
> > > > is one of such drivers that did it wrong.
> > > The issues could be if we remove rtnl_lock in device drivers,
> > > then in
> > > other calling path, it won't be protected by the rtnl lock,
> > > and maybe we shouldn't call pm_runtime_resume() here(within rtnl
> > > lock), for device drivers don't know if they are protected by the
> > > rtnl
> > > lock while their resume() got called.
> > 
> > This is exactly the problem, every driver guesses if rtnl_lock is
> > needed
> > or not in specific path. It is wrong by design. You should ensure
> > that
> > all paths that are triggered through rtnl should hold rtnl_lock.
> Hi Jesse, Tony,
> 
> As you are the Intel Ethernet drivers' maintainers, do you have any
> idea about this?
> We can reproduce this issue on the machine with PCI Ethernet card
> using igb or igc driver.

Adding Aleks and Sasha for their input as they are the leads for the
two drivers.

> > You dropped rtnl_lock() without any protection, it is 100% bug.
> > 
> > Thanks
> > 
> > > > Thanks
> > > > 
> > > > > so call rtnl_unlock() before calling pm_runtime_resume() and
> > > > > then call
> > > > > rtnl_lock() after it in __dev_open().
> > > > > 
> > > > > [  967.723577] INFO: task ip:6024 blocked for more than 120
> > > > > seconds.
> > > > > [  967.723588]       Not tainted 5.12.0-rc3+ #1
> > > > > [  967.723592] "echo 0 >
> > > > > /proc/sys/kernel/hung_task_timeout_secs" disables this
> > > > > message.
> > > > > [  967.723594] task:ip              state:D stack:    0 pid:
> > > > > 6024 ppid:  5957 flags:0x00004000
> > > > > [  967.723603] Call Trace:
> > > > > [  967.723610]  __schedule+0x2de/0x890
> > > > > [  967.723623]  schedule+0x4f/0xc0
> > > > > [  967.723629]  schedule_preempt_disabled+0xe/0x10
> > > > > [  967.723636]  __mutex_lock.isra.0+0x190/0x510
> > > > > [  967.723644]  __mutex_lock_slowpath+0x13/0x20
> > > > > [  967.723651]  mutex_lock+0x32/0x40
> > > > > [  967.723657]  rtnl_lock+0x15/0x20
> > > > > [  967.723665]  igb_resume+0xee/0x1d0 [igb]
> > > > > [  967.723687]  ? pci_pm_default_resume+0x30/0x30
> > > > > [  967.723696]  igb_runtime_resume+0xe/0x10 [igb]
> > > > > [  967.723713]  pci_pm_runtime_resume+0x74/0x90
> > > > > [  967.723718]  __rpm_callback+0x53/0x1c0
> > > > > [  967.723725]  rpm_callback+0x57/0x80
> > > > > [  967.723730]  ? pci_pm_default_resume+0x30/0x30
> > > > > [  967.723735]  rpm_resume+0x547/0x760
> > > > > [  967.723740]  __pm_runtime_resume+0x52/0x80
> > > > > [  967.723745]  __dev_open+0x56/0x160
> > > > > [  967.723753]  ? _raw_spin_unlock_bh+0x1e/0x20
> > > > > [  967.723758]  __dev_change_flags+0x188/0x1e0
> > > > > [  967.723766]  dev_change_flags+0x26/0x60
> > > > > [  967.723773]  do_setlink+0x723/0x10b0
> > > > > [  967.723782]  ? __nla_validate_parse+0x5b/0xb80
> > > > > [  967.723792]  __rtnl_newlink+0x594/0xa00
> > > > > [  967.723800]  ? nla_put_ifalias+0x38/0xa0
> > > > > [  967.723807]  ? __nla_reserve+0x41/0x50
> > > > > [  967.723813]  ? __nla_reserve+0x41/0x50
> > > > > [  967.723818]  ? __kmalloc_node_track_caller+0x49b/0x4d0
> > > > > [  967.723824]  ? pskb_expand_head+0x75/0x310
> > > > > [  967.723830]  ? nla_reserve+0x28/0x30
> > > > > [  967.723835]  ? skb_free_head+0x25/0x30
> > > > > [  967.723843]  ? security_sock_rcv_skb+0x2f/0x50
> > > > > [  967.723850]  ? netlink_deliver_tap+0x3d/0x210
> > > > > [  967.723859]  ? sk_filter_trim_cap+0xc1/0x230
> > > > > [  967.723863]  ? skb_queue_tail+0x43/0x50
> > > > > [  967.723870]  ? sock_def_readable+0x4b/0x80
> > > > > [  967.723876]  ? __netlink_sendskb+0x42/0x50
> > > > > [  967.723888]  ? security_capable+0x3d/0x60
> > > > > [  967.723894]  ? __cond_resched+0x19/0x30
> > > > > [  967.723900]  ? kmem_cache_alloc_trace+0x390/0x440
> > > > > [  967.723906]  rtnl_newlink+0x49/0x70
> > > > > [  967.723913]  rtnetlink_rcv_msg+0x13c/0x370
> > > > > [  967.723920]  ? _copy_to_iter+0xa0/0x460
> > > > > [  967.723927]  ? rtnl_calcit.isra.0+0x130/0x130
> > > > > [  967.723934]  netlink_rcv_skb+0x55/0x100
> > > > > [  967.723939]  rtnetlink_rcv+0x15/0x20
> > > > > [  967.723944]  netlink_unicast+0x1a8/0x250
> > > > > [  967.723949]  netlink_sendmsg+0x233/0x460
> > > > > [  967.723954]  sock_sendmsg+0x65/0x70
> > > > > [  967.723958]  ____sys_sendmsg+0x218/0x290
> > > > > [  967.723961]  ? copy_msghdr_from_user+0x5c/0x90
> > > > > [  967.723966]  ?
> > > > > lru_cache_add_inactive_or_unevictable+0x27/0xb0
> > > > > [  967.723974]  ___sys_sendmsg+0x81/0xc0
> > > > > [  967.723980]  ? __mod_memcg_lruvec_state+0x22/0xe0
> > > > > [  967.723987]  ? kmem_cache_free+0x244/0x420
> > > > > [  967.723991]  ? dentry_free+0x37/0x70
> > > > > [  967.723996]  ? mntput_no_expire+0x4c/0x260
> > > > > [  967.724001]  ? __cond_resched+0x19/0x30
> > > > > [  967.724007]  ? security_file_free+0x54/0x60
> > > > > [  967.724013]  ? call_rcu+0xa4/0x250
> > > > > [  967.724021]  __sys_sendmsg+0x62/0xb0
> > > > > [  967.724026]  ? exit_to_user_mode_prepare+0x3d/0x1a0
> > > > > [  967.724032]  __x64_sys_sendmsg+0x1f/0x30
> > > > > [  967.724037]  do_syscall_64+0x38/0x90
> > > > > [  967.724044]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > > 
> > > > > Fixes: bd869245a3dc ("net: core: try to runtime-resume
> > > > > detached device in __dev_open")
> > > > > Signed-off-by: Chia-Lin Kao (AceLan) <
> > > > > acelan.kao@canonical.com>
> > > > > ---
> > > > >  net/core/dev.c | 5 ++++-
> > > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > > index 8f1a47ad6781..dd43a29419fd 100644
> > > > > --- a/net/core/dev.c
> > > > > +++ b/net/core/dev.c
> > > > > @@ -1585,8 +1585,11 @@ static int __dev_open(struct
> > > > > net_device *dev, struct netlink_ext_ack *extack)
> > > > > 
> > > > >       if (!netif_device_present(dev)) {
> > > > >               /* may be detached because parent is runtime-
> > > > > suspended */
> > > > > -             if (dev->dev.parent)
> > > > > +             if (dev->dev.parent) {
> > > > > +                     rtnl_unlock();
> > > > >                       pm_runtime_resume(dev->dev.parent);
> > > > > +                     rtnl_lock();
> > > > > +             }
> > > > >               if (!netif_device_present(dev))
> > > > >                       return -ENODEV;
> > > > >       }
> > > > > --
> > > > > 2.25.1
> > > > > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-24  2:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-09  3:28 [RESEND][PATCH] net: called rtnl_unlock() before runpm resumes devices AceLan Kao
2021-08-09  5:51 ` Leon Romanovsky
2021-08-10  1:57   ` AceLan Kao
2021-08-10  7:08     ` Leon Romanovsky
2021-08-23  3:26       ` AceLan Kao
2021-08-24  2:45         ` Nguyen, Anthony L

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).