* [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
@ 2021-10-05 1:51 Luis Claudio R. Goncalves
2021-10-11 13:56 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 12+ messages in thread
From: Luis Claudio R. Goncalves @ 2021-10-05 1:51 UTC (permalink / raw)
To: linux-rt-users, Steven Rostedt, stable-rt
Cc: Sebastian Andrzej Siewior, Nitesh Narayan Lal, Pei Zhang
Avoid a possible circular locking dependency by taking the softirq_ctrl.lock
before taking dev_base_lock in netstat_show(), keeping locking order
consistent.
Lockdep splat:
[ 106.529733] ======================================================
[ 106.529734] WARNING: possible circular locking dependency detected
[ 106.529734] 5.10.65-rt53+ #1 Tainted: G W I
[ 106.529736] ------------------------------------------------------
[ 106.529736] tuned/2760 is trying to acquire lock:
[ 106.529737] ffffa0695ea18220 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0x116/0x2f0
[ 106.529746]
but task is already holding lock:
[ 106.529746] ffffffff844af760 (dev_base_lock){++.+}-{0:0}, at: netstat_show.isra.17+0x4a/0xb0
[ 106.529752]
which lock already depends on the new lock.
...
[ 106.529825]
other info that might help us debug this:
[ 106.529826] Possible unsafe locking scenario:
[ 106.529826] CPU0 CPU1
[ 106.529827] ---- ----
[ 106.529827] lock(dev_base_lock);
[ 106.529828] lock((softirq_ctrl.lock).lock);
[ 106.529829] lock(dev_base_lock);
[ 106.529829] lock((softirq_ctrl.lock).lock);
[ 106.529830]
*** DEADLOCK ***
[ 106.529831] 6 locks held by tuned/2760:
[ 106.529832] #0: ffffa062296a6670 (&f->f_pos_lock){+.+.}-{0:0}, at: __fdget_pos+0x4b/0x60
[ 106.529836] #1: ffffa062223ae310 (&p->lock){+.+.}-{0:0}, at: seq_read_iter+0x56/0x420
[ 106.529839] #2: ffffa06242ea1ab8 (&of->mutex){+.+.}-{0:0}, at: kernfs_seq_start+0x28/0x90
[ 106.529842] #3: ffffa06268f52008 (kn->active#285){++++}-{0:0}, at: kernfs_seq_start+0x30/0x90
[ 106.529844] #4: ffffffff844af760 (dev_base_lock){++.+}-{0:0}, at: netstat_show.isra.17+0x4a/0xb0
[ 106.529848] #5: ffffffff840c8fa0 (rcu_read_lock){....}-{1:2}, at: rt_read_lock+0x7d/0x1e0
Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---
net/core/net-sysfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index f6197774048b..7291ca052632 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -625,14 +625,14 @@ static ssize_t netstat_show(const struct device *d,
WARN_ON(offset > sizeof(struct rtnl_link_stats64) ||
offset % sizeof(u64) != 0);
- read_lock(&dev_base_lock);
+ read_lock_bh(&dev_base_lock);
if (dev_isalive(dev)) {
struct rtnl_link_stats64 temp;
const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
ret = sprintf(buf, fmt_u64, *(u64 *)(((u8 *)stats) + offset));
}
- read_unlock(&dev_base_lock);
+ read_unlock_bh(&dev_base_lock);
return ret;
}
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
2021-10-05 1:51 [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() Luis Claudio R. Goncalves
@ 2021-10-11 13:56 ` Sebastian Andrzej Siewior
2021-10-11 16:36 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-11 13:56 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang
On 2021-10-04 22:51:05 [-0300], Luis Claudio R. Goncalves wrote:
> Avoid a possible circular locking dependency by taking the softirq_ctrl.lock
> before taking dev_base_lock in netstat_show(), keeping locking order
> consistent.
…
> [ 106.529734] 5.10.65-rt53+ #1 Tainted: G W I
So looking at 5.15 there are multiple instances of that. Either _all_ of
them need to disable bh while acquiring dev_base_lock or none of them.
I've been looking at a few call chains and didn't find one which
acquired that lock from softirq. This does not mean there isn't one.
Tracing this back, I stumbled upon v2.3.6 where a few of those got their
_bh removed while new ones were added with _bh. So _maybe_ this _bh
isn't required and nobody noticed. Lockdep would yell if it would
observe dev_base_lock in softirq and noticed one instance like this one
in netstat_show() where it is missing. But it didn't happen.
So my guess is that the _bh isn't needed.
Sebastian
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
2021-10-11 13:56 ` Sebastian Andrzej Siewior
@ 2021-10-11 16:36 ` Sebastian Andrzej Siewior
2021-10-18 15:42 ` Luis Goncalves
0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-11 16:36 UTC (permalink / raw)
To: Luis Claudio R. Goncalves
Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang
On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote:
> So my guess is that the _bh isn't needed.
Correction: If there is at least _one_ reader in softirq all, writer
need to disable BH which _might_ be the case here.
The splat is probably because on read_lock holder disables BH later on
which I don't see yet.
Sebastian
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
2021-10-11 16:36 ` Sebastian Andrzej Siewior
@ 2021-10-18 15:42 ` Luis Goncalves
2021-11-04 18:58 ` Luis Goncalves
0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-10-18 15:42 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang
On Mon, Oct 11, 2021 at 1:36 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote:
> > So my guess is that the _bh isn't needed.
>
> Correction: If there is at least _one_ reader in softirq all, writer
> need to disable BH which _might_ be the case here.
> The splat is probably because on read_lock holder disables BH later on
> which I don't see yet.
>
> Sebastian
The way I was able to reproduce the problem in 5.10 (and in 5.14 before -rt11)
was:
1. systemctl disable tuned.service
2. Reboot to a kernel-rt with lockdep support
3. systemctl start tuned.service
I used a few tuned profiles, but stuck with 'realtime' for obvious reasons.
If that helps in the process I can investigate which specific
action/configuration
performed by tuned is triggering the lockdep splat.
Luis
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
2021-10-18 15:42 ` Luis Goncalves
@ 2021-11-04 18:58 ` Luis Goncalves
2021-11-25 1:39 ` Luis Goncalves
0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-11-04 18:58 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang
On Mon, Oct 18, 2021 at 12:42 PM Luis Goncalves <lgoncalv@redhat.com> wrote:
>
> On Mon, Oct 11, 2021 at 1:36 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote:
> > > So my guess is that the _bh isn't needed.
> >
> > Correction: If there is at least _one_ reader in softirq all, writer
> > need to disable BH which _might_ be the case here.
> > The splat is probably because on read_lock holder disables BH later on
> > which I don't see yet.
> >
> > Sebastian
>
> The way I was able to reproduce the problem in 5.10 (and in 5.14 before -rt11)
> was:
>
> 1. systemctl disable tuned.service
> 2. Reboot to a kernel-rt with lockdep support
> 3. systemctl start tuned.service
>
> I used a few tuned profiles, but stuck with 'realtime' for obvious reasons.
>
> If that helps in the process I can investigate which specific
> action/configuration
> performed by tuned is triggering the lockdep splat.
Just for the sake of completeness, this is the lockdep splat for v5.10.73-rt54:
[ 41.917124] hrtimer: interrupt took 19800197 ns
[ 45.105231] bnxt_en 0000:19:00.0 eno1np0: NIC Link is Up, 10000
Mbps full duplex, Flow control: ON - receive & transmit
[ 45.105235] bnxt_en 0000:19:00.0 eno1np0: FEC autoneg off encoding: None
[ 46.370088] ======================================================
[ 46.370089] WARNING: possible circular locking dependency detected
[ 46.370089] 5.10.73-rt54.lockdep+ #2 Tainted: G I
[ 46.370091] ------------------------------------------------------
[ 46.370091] tuned/2248 is trying to acquire lock:
[ 46.370093] ffff954d9e818220
((softirq_ctrl.lock).lock){+.+.}-{2:2}, at:
__local_bh_disable_ip+0x116/0x2f0
[ 46.370104]
but task is already holding lock:
[ 46.370104] ffffffff9ccafa60 (dev_base_lock){++.+}-{0:0}, at:
netstat_show.isra.17+0x4a/0xb0
[ 46.370111]
which lock already depends on the new lock.
[ 46.370111]
the existing dependency chain (in reverse order) is:
[ 46.370111]
-> #1 (dev_base_lock){++.+}-{0:0}:
[ 46.370113] lock_acquire+0xde/0x380
[ 46.370117] rt_write_lock+0x3a/0x1b0
[ 46.370122] list_netdevice+0x4c/0x140
[ 46.370125] register_netdevice+0x59b/0x810
[ 46.370127] register_netdev+0x1a/0x30
[ 46.370128] loopback_net_init+0x48/0xa0
[ 46.370130] ops_init+0x3a/0x180
[ 46.370133] register_pernet_operations+0x126/0x220
[ 46.370135] register_pernet_device+0x25/0x60
[ 46.370138] net_dev_init+0x31b/0x391
[ 46.370141] do_one_initcall+0x74/0x440
[ 46.370145] kernel_init_freeable+0x39d/0x408
[ 46.370148] kernel_init+0xb/0x12e
[ 46.370150] ret_from_fork+0x1f/0x30
[ 46.370152]
-> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}:
[ 46.370153] check_prevs_add+0x1bb/0xe50
[ 46.370155] __lock_acquire+0x1187/0x1640
[ 46.370157] lock_acquire+0xde/0x380
[ 46.370158] rt_spin_lock+0x2b/0xd0
[ 46.370160] __local_bh_disable_ip+0x116/0x2f0
[ 46.370162] tg3_get_stats64+0x29/0xb0 [tg3]
[ 46.370167] dev_get_stats+0x5b/0xc0
[ 46.370168] netstat_show.isra.17+0x5b/0xb0
[ 46.370170] dev_attr_show+0x18/0x50
[ 46.370173] sysfs_kf_seq_show+0xa1/0x130
[ 46.370176] seq_read_iter+0x19e/0x420
[ 46.370178] new_sync_read+0x121/0x1c0
[ 46.370180] vfs_read+0x185/0x1e0
[ 46.370182] ksys_read+0x5f/0xe0
[ 46.370184] do_syscall_64+0x33/0x40
[ 46.370186] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 46.370188]
other info that might help us debug this:
[ 46.370188] Possible unsafe locking scenario:
[ 46.370189] CPU0 CPU1
[ 46.370189] ---- ----
[ 46.370189] lock(dev_base_lock);
[ 46.370190] lock((softirq_ctrl.lock).lock);
[ 46.370191] lock(dev_base_lock);
[ 46.370192] lock((softirq_ctrl.lock).lock);
[ 46.370193]
*** DEADLOCK ***
[ 46.370193] 6 locks held by tuned/2248:
[ 46.370194] #0: ffff954606ef7d30 (&f->f_pos_lock){+.+.}-{0:0}, at:
__fdget_pos+0x4b/0x60
[ 46.370199] #1: ffff9545d307a1f0 (&p->lock){+.+.}-{0:0}, at:
seq_read_iter+0x56/0x420
[ 46.370202] #2: ffff9545fcde6cb8 (&of->mutex){+.+.}-{0:0}, at:
kernfs_seq_start+0x28/0x90
[ 46.370205] #3: ffff953e5c12dd98 (kn->active#246){.+.+}-{0:0}, at:
kernfs_seq_start+0x30/0x90
[ 46.370208] #4: ffffffff9ccafa60 (dev_base_lock){++.+}-{0:0}, at:
netstat_show.isra.17+0x4a/0xb0
[ 46.370211] #5: ffffffff9c8c8fe0 (rcu_read_lock){....}-{1:2}, at:
rt_read_lock+0x7d/0x1e0
[ 46.370214]
stack backtrace:
[ 46.370214] CPU: 27 PID: 2248 Comm: tuned Tainted: G I
5.10.73-rt54.lockdep+ #2
[ 46.370216] Hardware name: Dell Inc. PowerEdge R740/07X9K0, BIOS
2.8.2 08/27/2020
[ 46.370217] Call Trace:
[ 46.370218] dump_stack+0x77/0x97
[ 46.370224] check_noncircular+0xff/0x120
[ 46.370228] check_prevs_add+0x1bb/0xe50
[ 46.370232] __lock_acquire+0x1187/0x1640
[ 46.370236] lock_acquire+0xde/0x380
[ 46.370237] ? __local_bh_disable_ip+0x116/0x2f0
[ 46.370242] ? tg3_get_stats64+0x5/0xb0 [tg3]
[ 46.370245] rt_spin_lock+0x2b/0xd0
[ 46.370246] ? __local_bh_disable_ip+0x116/0x2f0
[ 46.370249] __local_bh_disable_ip+0x116/0x2f0
[ 46.370251] tg3_get_stats64+0x29/0xb0 [tg3]
[ 46.370255] dev_get_stats+0x5b/0xc0
[ 46.370257] netstat_show.isra.17+0x5b/0xb0
[ 46.370263] dev_attr_show+0x18/0x50
[ 46.370264] sysfs_kf_seq_show+0xa1/0x130
[ 46.370266] seq_read_iter+0x19e/0x420
[ 46.370269] new_sync_read+0x121/0x1c0
[ 46.370272] vfs_read+0x185/0x1e0
[ 46.370274] ksys_read+0x5f/0xe0
[ 46.370276] do_syscall_64+0x33/0x40
[ 46.370278] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 46.370280] RIP: 0033:0x7f4ee932b9e4
[ 46.370282] Code: c3 0f 1f 44 00 00 41 54 49 89 d4 55 48 89 f5 53
89 fb 48 83 ec 10 e8 7b fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 31
c0 0f 05 <48> 3d 00 f0 ff ff 77 38 44 89 c7 48 89 44 24 08 e8 b7 fc ff
ff 48
[ 46.370283] RSP: 002b:00007f4ede603be0 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 46.370285] RAX: ffffffffffffffda RBX: 0000000000000039 RCX: 00007f4ee932b9e4
[ 46.370286] RDX: 0000000000001001 RSI: 00007f4ed801b670 RDI: 0000000000000039
[ 46.370287] RBP: 00007f4ed801b670 R08: 0000000000000000 R09: 0000000000000003
[ 46.370288] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000001001
[ 46.370289] R13: 0000000000000039 R14: 00007f4ed801b670 R15: 000055cd6476e060
[ 47.606432] DMA-API: dma_debug_entry pool grown to 131072 (200%)
[ 49.474083] DMA-API: dma_debug_entry pool grown to 196608 (300%)
[ 49.653708] DMA-API: dma_debug_entry pool grown to 262144 (400%)
Luis
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
2021-11-04 18:58 ` Luis Goncalves
@ 2021-11-25 1:39 ` Luis Goncalves
2021-11-25 10:01 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-11-25 1:39 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang
On Thu, Nov 4, 2021 at 3:58 PM Luis Goncalves <lgoncalv@redhat.com> wrote:
>
> On Mon, Oct 18, 2021 at 12:42 PM Luis Goncalves <lgoncalv@redhat.com> wrote:
> >
> > On Mon, Oct 11, 2021 at 1:36 PM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > >
> > > On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote:
> > > > So my guess is that the _bh isn't needed.
> > >
> > > Correction: If there is at least _one_ reader in softirq all, writer
> > > need to disable BH which _might_ be the case here.
> > > The splat is probably because on read_lock holder disables BH later on
> > > which I don't see yet.
> > >
> > > Sebastian
> >
> > The way I was able to reproduce the problem in 5.10 (and in 5.14 before -rt11)
> > was:
> >
> > 1. systemctl disable tuned.service
> > 2. Reboot to a kernel-rt with lockdep support
> > 3. systemctl start tuned.service
> >
> > I used a few tuned profiles, but stuck with 'realtime' for obvious reasons.
> >
> > If that helps in the process I can investigate which specific
> > action/configuration
> > performed by tuned is triggering the lockdep splat.
>
> Just for the sake of completeness, this is the lockdep splat for v5.10.73-rt54:
Sebastian, I just stumbled over the lockdep splat being discussed here
in v5.15.3-rt21. Well, slightly different, but related to the very
same situation:
(btw, sorry if the formatting is off, I am using the gmail webui)
[35429.000894] ======================================================
[35429.000895] WARNING: possible circular locking dependency detected
[35429.000896] 5.15.3-rt21 #2 Not tainted
[35429.000897] ------------------------------------------------------
[35429.000898] tuned/6697 is trying to acquire lock:
[35429.000899] ffff8dbfd0cebf50 (&adapter->stats_lock){+.+.}-{2:2},
at: cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.000925]
but task is already holding lock:
[35429.000925] ffffffff9e0ed570 (dev_base_lock){++.+}-{2:2}, at:
netstat_show.constprop.0+0x41/0xb0
[35429.000930]
which lock already depends on the new lock.
[35429.000931]
the existing dependency chain (in reverse order) is:
[35429.000931]
-> #2 (dev_base_lock){++.+}-{2:2}:
[35429.000933] __lock_acquire+0x4d2/0x930
[35429.000937] lock_acquire.part.0+0x9c/0x210
[35429.000938] rt_write_lock+0x48/0xc0
[35429.000941] list_netdevice+0x4c/0x130
[35429.000943] register_netdevice+0x4fd/0x670
[35429.000945] register_netdev+0x1c/0x40
[35429.000959] loopback_net_init+0x48/0x90
[35429.000962] ops_init+0x3a/0x160
[35429.000964] register_pernet_operations+0x127/0x220
[35429.000965] register_pernet_device+0x26/0x60
[35429.000967] net_dev_init+0x23c/0x2b0
[35429.000971] do_one_initcall+0x47/0x170
[35429.000974] do_initcalls+0xc6/0xdf
[35429.000977] kernel_init_freeable+0xfc/0x12c
[35429.000979] kernel_init+0x16/0x120
[35429.000982] ret_from_fork+0x22/0x30
[35429.000984]
-> #1 ((softirq_ctrl.lock)){+.+.}-{2:2}:
[35429.000986] __lock_acquire+0x4d2/0x930
[35429.000988] lock_acquire.part.0+0x9c/0x210
[35429.000989] rt_spin_lock+0x27/0xe0
[35429.000992] __local_bh_disable_ip+0xb8/0x190
[35429.000994] t4_wr_mbox_meat_timeout+0xa7/0x720 [cxgb4]
[35429.001004] t4_get_mps_bg_map+0xba/0x1b0 [cxgb4]
[35429.001028] t4_get_port_stats+0x16/0xc50 [cxgb4]
[35429.001039] t4_get_port_stats_offset+0x12/0x30 [cxgb4]
[35429.001050] cxgb_get_stats+0x66/0x1d0 [cxgb4]
[35429.001059] dev_get_stats+0x5c/0xc0
[35429.001061] rtnl_fill_stats+0x3b/0x130
[35429.001063] rtnl_fill_ifinfo+0x706/0x1140
[35429.001066] rtmsg_ifinfo_build_skb+0x88/0xe0
[35429.001068] rtmsg_ifinfo+0x25/0x60
[35429.001070] register_netdevice+0x5f9/0x670
[35429.001072] register_netdev+0x1c/0x40
[35429.001074] init_one+0x8af/0xda0 [cxgb4]
[35429.001084] local_pci_probe+0x45/0x80
[35429.001088] work_for_cpu_fn+0x16/0x20
[35429.001090] process_one_work+0x27a/0x450
[35429.001092] worker_thread+0x1ea/0x3c0
[35429.001093] kthread+0x188/0x1a0
[35429.001095] ret_from_fork+0x22/0x30
[35429.001098]
-> #0 (&adapter->stats_lock){+.+.}-{2:2}:
[35429.001099] check_prev_add+0x91/0xc10
[35429.001101] validate_chain+0x480/0x510
[35429.001102] __lock_acquire+0x4d2/0x930
[35429.001104] lock_acquire.part.0+0x9c/0x210
[35429.001105] rt_spin_lock+0x27/0xe0
[35429.001107] cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001116] dev_get_stats+0x5c/0xc0
[35429.001118] netstat_show.constprop.0+0x52/0xb0
[35429.001120] dev_attr_show+0x19/0x40
[35429.001122] sysfs_kf_seq_show+0xa9/0x100
[35429.001125] seq_read_iter+0x128/0x4c0
[35429.001128] new_sync_read+0x121/0x1b0
[35429.001130] vfs_read+0x133/0x1d0
[35429.001131] ksys_read+0x68/0xe0
[35429.001133] do_syscall_64+0x3b/0x90
[35429.001134] entry_SYSCALL_64_after_hwframe+0x44/0xae
[35429.001137]
other info that might help us debug this:
[35429.001137] Chain exists of:
&adapter->stats_lock --> (softirq_ctrl.lock) --> dev_base_lock
[35429.001139] Possible unsafe locking scenario:
[35429.001140] CPU0 CPU1
[35429.001140] ---- ----
[35429.001140] lock(dev_base_lock);
[35429.001141] lock((softirq_ctrl.lock));
[35429.001142] lock(dev_base_lock);
[35429.001143] lock(&adapter->stats_lock);
[35429.001144]
*** DEADLOCK ***
[35429.001144] 6 locks held by tuned/6697:
[35429.001145] #0: ffff8dba839df838 (&f->f_pos_lock){+.+.}-{3:3}, at:
__fdget_pos+0x4d/0x60
[35429.001149] #1: ffff8dba44ef3d98 (&p->lock){+.+.}-{3:3}, at:
seq_read_iter+0x51/0x4c0
[35429.001153] #2: ffff8dba49960a80 (&of->mutex){+.+.}-{3:3}, at:
kernfs_seq_start+0x2a/0xb0
[35429.001156] #3: ffff8dbff8ea31e8 (kn->active#223){++++}-{0:0}, at:
kernfs_seq_start+0x32/0xb0
[35429.001159] #4: ffffffff9e0ed570 (dev_base_lock){++.+}-{2:2}, at:
netstat_show.constprop.0+0x41/0xb0
[35429.001163] #5: ffffffff9de61200 (rcu_read_lock){....}-{1:2}, at:
rt_read_lock+0x65/0x130
[35429.001166]
stack backtrace:
[35429.001167] CPU: 16 PID: 6697 Comm: tuned Kdump: loaded Not tainted
5.15.3-rt21 #2
[35429.001169] Hardware name: Dell Inc. PowerEdge R520/08DM12, BIOS
2.1.2 01/20/2014
[35429.001171] Call Trace:
[35429.001172] <TASK>
[35429.001174] dump_stack_lvl+0x57/0x7d
[35429.001177] check_noncircular+0xff/0x110
[35429.001182] check_prev_add+0x91/0xc10
[35429.001184] ? add_chain_cache+0x112/0x2d0
[35429.001186] validate_chain+0x480/0x510
[35429.001190] __lock_acquire+0x4d2/0x930
[35429.001193] lock_acquire.part.0+0x9c/0x210
[35429.001195] ? cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001205] ? rcu_read_lock_sched_held+0x3f/0x70
[35429.001209] ? trace_lock_acquire+0x38/0x140
[35429.001210] ? lock_acquire+0x30/0x80
[35429.001212] ? cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001222] rt_spin_lock+0x27/0xe0
[35429.001225] ? cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001234] cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001245] ? check_prev_add+0xa3/0xc10
[35429.001247] ? check_prev_add+0xa3/0xc10
[35429.001248] ? add_chain_cache+0x112/0x2d0
[35429.001251] ? lockdep_unlock+0x5c/0xc0
[35429.001252] ? validate_chain+0x29a/0x510
[35429.001255] ? __lock_acquire+0x4d2/0x930
[35429.001258] ? lock_acquire.part.0+0x9c/0x210
[35429.001261] ? rcu_read_lock_sched_held+0x3f/0x70
[35429.001263] ? trace_lock_acquire+0x38/0x140
[35429.001265] ? lock_acquire+0x30/0x80
[35429.001266] ? rt_read_lock+0x65/0x130
[35429.001269] dev_get_stats+0x5c/0xc0
[35429.001272] netstat_show.constprop.0+0x52/0xb0
[35429.001279] dev_attr_show+0x19/0x40
[35429.001280] sysfs_kf_seq_show+0xa9/0x100
[35429.001283] seq_read_iter+0x128/0x4c0
[35429.001288] new_sync_read+0x121/0x1b0
[35429.001292] vfs_read+0x133/0x1d0
[35429.001294] ksys_read+0x68/0xe0
[35429.001297] do_syscall_64+0x3b/0x90
[35429.001299] entry_SYSCALL_64_after_hwframe+0x44/0xae
[35429.001301] RIP: 0033:0x7f03e459a9ec
[35429.001304] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08
e8 09 87 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31
c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 4f 87 f8
ff 48
[35429.001306] RSP: 002b:00007f03e23896a0 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[35429.001308] RAX: ffffffffffffffda RBX: 00007f03e238b5c0 RCX: 00007f03e459a9ec
[35429.001309] RDX: 0000000000001001 RSI: 00007f03dc03b5b0 RDI: 0000000000000008
[35429.001310] RBP: 0000000000001001 R08: 0000000000000000 R09: 0000000000000000
[35429.001311] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f03e021b540
[35429.001312] R13: 00007f03dc03b5b0 R14: 0000000000000008 R15: 00007f03d8001c00
[35429.001328] </TASK>
[35430.549435] bnx2x 0000:08:00.1 enp8s0f1: using MSI-X IRQs: sp 97
fp[0] 99 ... fp[19] 332
[35430.695688] bnx2x 0000:08:00.1 enp8s0f1: NIC Link is Up, 1000 Mbps
full duplex, Flow control: ON - receive & transmit
[35432.120806] bnx2x 0000:08:00.0 enp8s0f0: using MSI-X IRQs: sp 61
fp[0] 63 ... fp[19] 344
[35432.425838] bnx2x 0000:08:00.0 enp8s0f0: NIC Link is Up, 1000 Mbps
full duplex, Flow control: ON - receive & transmit
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
2021-11-25 1:39 ` Luis Goncalves
@ 2021-11-25 10:01 ` Sebastian Andrzej Siewior
2021-11-25 11:11 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-11-25 10:01 UTC (permalink / raw)
To: Luis Goncalves
Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang
On 2021-11-24 22:39:34 [-0300], Luis Goncalves wrote:
> Sebastian, I just stumbled over the lockdep splat being discussed here
> in v5.15.3-rt21. Well, slightly different, but related to the very
> same situation:
…
so we have write_lock_bh(&dev_base_lock) and read_lock(&dev_base_lock)
followed by local_bh_disable() (due to spin_lock_bh(&adap->mbox_lock)).
It makes sense to only change that one invocation. Let me form a patch.
Sebastian
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
2021-11-25 10:01 ` Sebastian Andrzej Siewior
@ 2021-11-25 11:11 ` Sebastian Andrzej Siewior
2021-11-26 12:09 ` Luis Goncalves
0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-11-25 11:11 UTC (permalink / raw)
To: Luis Goncalves
Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang
On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote:
>
> It makes sense to only change that one invocation. Let me form a patch.
So I made dis:
| From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001
| From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
| Date: Thu, 25 Nov 2021 11:05:17 +0100
| Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in
| netstat_show()
|
| The writer acquires dev_base_lock with disabled bottom halves.
| The reader can acquire dev_base_lock without disabling bottom halves
| because there is no writer in softirq context. Also readers can acquire
| the lock in task and softirq context and multiple readers are allowed.
|
| On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
| as a lock to ensure that resources, that are protected by disabling
| bottom halves, remain protected.
| This leads to a circular locking dependency if the lock acquired with
| disabled bottom halves (as in write_lock_bh()) and somewhere else with
| enabled bottom halves (as by read_lock() in netstat_show()) followed by
| disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
| -> spin_lock_bh()). This is the reverse locking order.
|
| In order to avoid the reverse locking order, acquire the dev_base_lock
| with disabled bottom halves.
|
| Reported-by: Pei Zhang <pezhang@redhat.com>
| Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
| Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
and then I started locking for in-softirq reader or writer. I can't find
any. Therefore I made this for testing:
diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
ASSERT_RTNL();
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
netdev_name_node_add(net, dev->name_node);
hlist_add_head_rcu(&dev->index_hlist,
dev_index_hash(net, dev->ifindex));
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
dev_base_seq_inc(net);
}
@@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
ASSERT_RTNL();
/* Unlink dev from the device chain */
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
list_del_rcu(&dev->dev_list);
netdev_name_node_del(dev->name_node);
hlist_del_rcu(&dev->index_hlist);
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
dev_base_seq_inc(dev_net(dev));
}
@@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
netdev_adjacent_rename_links(dev, oldname);
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
netdev_name_node_del(dev->name_node);
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
synchronize_rcu();
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
netdev_name_node_add(net, dev->name_node);
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
ret = notifier_to_errno(ret);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
if (operstate == dev->operstate)
return;
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
switch(dev->link_mode) {
case IF_LINK_MODE_TESTING:
@@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
dev->operstate = operstate;
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
}
if (dev->operstate != operstate) {
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
dev->operstate = operstate;
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
netdev_state_change(dev);
}
}
@@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
if (tb[IFLA_LINKMODE]) {
unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
if (dev->link_mode ^ value)
status |= DO_SETLINK_NOTIFY;
dev->link_mode = value;
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
}
if (tb[IFLA_VFINFO_LIST]) {
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
static void __hsr_set_operstate(struct net_device *dev, int transition)
{
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
if (dev->operstate != transition) {
dev->operstate = transition;
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
netdev_state_change(dev);
} else {
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
}
}
If this does not create any warnings (with or without RT) then I would
suggest to remove that _bh suffix since there is no need for it. This
should also fix the RT warning you see now.
Agreed?
Sebastian
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
2021-11-25 11:11 ` Sebastian Andrzej Siewior
@ 2021-11-26 12:09 ` Luis Goncalves
2021-11-26 13:24 ` Luis Goncalves
0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-11-26 12:09 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang
On Thu, Nov 25, 2021 at 8:11 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote:
> >
> > It makes sense to only change that one invocation. Let me form a patch.
>
> So I made dis:
>
> | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001
> | From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
> | Date: Thu, 25 Nov 2021 11:05:17 +0100
> | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in
> | netstat_show()
> |
> | The writer acquires dev_base_lock with disabled bottom halves.
> | The reader can acquire dev_base_lock without disabling bottom halves
> | because there is no writer in softirq context. Also readers can acquire
> | the lock in task and softirq context and multiple readers are allowed.
> |
> | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> | as a lock to ensure that resources, that are protected by disabling
> | bottom halves, remain protected.
> | This leads to a circular locking dependency if the lock acquired with
> | disabled bottom halves (as in write_lock_bh()) and somewhere else with
> | enabled bottom halves (as by read_lock() in netstat_show()) followed by
> | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
> | -> spin_lock_bh()). This is the reverse locking order.
> |
> | In order to avoid the reverse locking order, acquire the dev_base_lock
> | with disabled bottom halves.
> |
> | Reported-by: Pei Zhang <pezhang@redhat.com>
> | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> and then I started locking for in-softirq reader or writer. I can't find
> any. Therefore I made this for testing:
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
>
> ASSERT_RTNL();
>
> - write_lock_bh(&dev_base_lock);
> + write_lock(&dev_base_lock);
> list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
> netdev_name_node_add(net, dev->name_node);
> hlist_add_head_rcu(&dev->index_hlist,
> dev_index_hash(net, dev->ifindex));
> - write_unlock_bh(&dev_base_lock);
> + write_unlock(&dev_base_lock);
>
> dev_base_seq_inc(net);
> }
> @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
> ASSERT_RTNL();
>
> /* Unlink dev from the device chain */
> - write_lock_bh(&dev_base_lock);
> + write_lock(&dev_base_lock);
> list_del_rcu(&dev->dev_list);
> netdev_name_node_del(dev->name_node);
> hlist_del_rcu(&dev->index_hlist);
> - write_unlock_bh(&dev_base_lock);
> + write_unlock(&dev_base_lock);
>
> dev_base_seq_inc(dev_net(dev));
> }
> @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
>
> netdev_adjacent_rename_links(dev, oldname);
>
> - write_lock_bh(&dev_base_lock);
> + write_lock(&dev_base_lock);
> netdev_name_node_del(dev->name_node);
> - write_unlock_bh(&dev_base_lock);
> + write_unlock(&dev_base_lock);
>
> synchronize_rcu();
>
> - write_lock_bh(&dev_base_lock);
> + write_lock(&dev_base_lock);
> netdev_name_node_add(net, dev->name_node);
> - write_unlock_bh(&dev_base_lock);
> + write_unlock(&dev_base_lock);
>
> ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
> ret = notifier_to_errno(ret);
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
> if (operstate == dev->operstate)
> return;
>
> - write_lock_bh(&dev_base_lock);
> + write_lock(&dev_base_lock);
>
> switch(dev->link_mode) {
> case IF_LINK_MODE_TESTING:
> @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
>
> dev->operstate = operstate;
>
> - write_unlock_bh(&dev_base_lock);
> + write_unlock(&dev_base_lock);
> }
>
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
> }
>
> if (dev->operstate != operstate) {
> - write_lock_bh(&dev_base_lock);
> + write_lock(&dev_base_lock);
> dev->operstate = operstate;
> - write_unlock_bh(&dev_base_lock);
> + write_unlock(&dev_base_lock);
> netdev_state_change(dev);
> }
> }
> @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
> if (tb[IFLA_LINKMODE]) {
> unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
>
> - write_lock_bh(&dev_base_lock);
> + write_lock(&dev_base_lock);
> if (dev->link_mode ^ value)
> status |= DO_SETLINK_NOTIFY;
> dev->link_mode = value;
> - write_unlock_bh(&dev_base_lock);
> + write_unlock(&dev_base_lock);
> }
>
> if (tb[IFLA_VFINFO_LIST]) {
> diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
> --- a/net/hsr/hsr_device.c
> +++ b/net/hsr/hsr_device.c
> @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
>
> static void __hsr_set_operstate(struct net_device *dev, int transition)
> {
> - write_lock_bh(&dev_base_lock);
> + write_lock(&dev_base_lock);
> if (dev->operstate != transition) {
> dev->operstate = transition;
> - write_unlock_bh(&dev_base_lock);
> + write_unlock(&dev_base_lock);
> netdev_state_change(dev);
> } else {
> - write_unlock_bh(&dev_base_lock);
> + write_unlock(&dev_base_lock);
> }
> }
>
>
> If this does not create any warnings (with or without RT) then I would
> suggest to remove that _bh suffix since there is no need for it. This
> should also fix the RT warning you see now.
> Agreed?
Sounds like a plan!
I will start testing within the hour and report soon.
Thanks,
Luis
> Sebastian
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
2021-11-26 12:09 ` Luis Goncalves
@ 2021-11-26 13:24 ` Luis Goncalves
2021-11-26 16:15 ` [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves Sebastian Andrzej Siewior
0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-11-26 13:24 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang
On Fri, Nov 26, 2021 at 9:09 AM Luis Goncalves <lgoncalv@redhat.com> wrote:
>
> On Thu, Nov 25, 2021 at 8:11 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote:
> > >
> > > It makes sense to only change that one invocation. Let me form a patch.
> >
> > So I made dis:
> >
> > | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001
> > | From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
> > | Date: Thu, 25 Nov 2021 11:05:17 +0100
> > | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in
> > | netstat_show()
> > |
> > | The writer acquires dev_base_lock with disabled bottom halves.
> > | The reader can acquire dev_base_lock without disabling bottom halves
> > | because there is no writer in softirq context. Also readers can acquire
> > | the lock in task and softirq context and multiple readers are allowed.
> > |
> > | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> > | as a lock to ensure that resources, that are protected by disabling
> > | bottom halves, remain protected.
> > | This leads to a circular locking dependency if the lock acquired with
> > | disabled bottom halves (as in write_lock_bh()) and somewhere else with
> > | enabled bottom halves (as by read_lock() in netstat_show()) followed by
> > | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
> > | -> spin_lock_bh()). This is the reverse locking order.
> > |
> > | In order to avoid the reverse locking order, acquire the dev_base_lock
> > | with disabled bottom halves.
> > |
> > | Reported-by: Pei Zhang <pezhang@redhat.com>
> > | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> > | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >
> > and then I started locking for in-softirq reader or writer. I can't find
> > any. Therefore I made this for testing:
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
> >
> > ASSERT_RTNL();
> >
> > - write_lock_bh(&dev_base_lock);
> > + write_lock(&dev_base_lock);
> > list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
> > netdev_name_node_add(net, dev->name_node);
> > hlist_add_head_rcu(&dev->index_hlist,
> > dev_index_hash(net, dev->ifindex));
> > - write_unlock_bh(&dev_base_lock);
> > + write_unlock(&dev_base_lock);
> >
> > dev_base_seq_inc(net);
> > }
> > @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
> > ASSERT_RTNL();
> >
> > /* Unlink dev from the device chain */
> > - write_lock_bh(&dev_base_lock);
> > + write_lock(&dev_base_lock);
> > list_del_rcu(&dev->dev_list);
> > netdev_name_node_del(dev->name_node);
> > hlist_del_rcu(&dev->index_hlist);
> > - write_unlock_bh(&dev_base_lock);
> > + write_unlock(&dev_base_lock);
> >
> > dev_base_seq_inc(dev_net(dev));
> > }
> > @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
> >
> > netdev_adjacent_rename_links(dev, oldname);
> >
> > - write_lock_bh(&dev_base_lock);
> > + write_lock(&dev_base_lock);
> > netdev_name_node_del(dev->name_node);
> > - write_unlock_bh(&dev_base_lock);
> > + write_unlock(&dev_base_lock);
> >
> > synchronize_rcu();
> >
> > - write_lock_bh(&dev_base_lock);
> > + write_lock(&dev_base_lock);
> > netdev_name_node_add(net, dev->name_node);
> > - write_unlock_bh(&dev_base_lock);
> > + write_unlock(&dev_base_lock);
> >
> > ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
> > ret = notifier_to_errno(ret);
> > diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> > --- a/net/core/link_watch.c
> > +++ b/net/core/link_watch.c
> > @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
> > if (operstate == dev->operstate)
> > return;
> >
> > - write_lock_bh(&dev_base_lock);
> > + write_lock(&dev_base_lock);
> >
> > switch(dev->link_mode) {
> > case IF_LINK_MODE_TESTING:
> > @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
> >
> > dev->operstate = operstate;
> >
> > - write_unlock_bh(&dev_base_lock);
> > + write_unlock(&dev_base_lock);
> > }
> >
> >
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
> > }
> >
> > if (dev->operstate != operstate) {
> > - write_lock_bh(&dev_base_lock);
> > + write_lock(&dev_base_lock);
> > dev->operstate = operstate;
> > - write_unlock_bh(&dev_base_lock);
> > + write_unlock(&dev_base_lock);
> > netdev_state_change(dev);
> > }
> > }
> > @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
> > if (tb[IFLA_LINKMODE]) {
> > unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
> >
> > - write_lock_bh(&dev_base_lock);
> > + write_lock(&dev_base_lock);
> > if (dev->link_mode ^ value)
> > status |= DO_SETLINK_NOTIFY;
> > dev->link_mode = value;
> > - write_unlock_bh(&dev_base_lock);
> > + write_unlock(&dev_base_lock);
> > }
> >
> > if (tb[IFLA_VFINFO_LIST]) {
> > diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
> > --- a/net/hsr/hsr_device.c
> > +++ b/net/hsr/hsr_device.c
> > @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
> >
> > static void __hsr_set_operstate(struct net_device *dev, int transition)
> > {
> > - write_lock_bh(&dev_base_lock);
> > + write_lock(&dev_base_lock);
> > if (dev->operstate != transition) {
> > dev->operstate = transition;
> > - write_unlock_bh(&dev_base_lock);
> > + write_unlock(&dev_base_lock);
> > netdev_state_change(dev);
> > } else {
> > - write_unlock_bh(&dev_base_lock);
> > + write_unlock(&dev_base_lock);
> > }
> > }
> >
> >
> > If this does not create any warnings (with or without RT) then I would
> > suggest to remove that _bh suffix since there is no need for it. This
> > should also fix the RT warning you see now.
> > Agreed?
>
> Sounds like a plan!
>
> I will start testing within the hour and report soon.
Using v5.15.3-rt21 + your patch, I rebooted multiple times the two
boxes where I had observed the netstat_show() lockdep complaint
before. The kernel booted fine every single time, no netstat_show()
lockdep complaints at all.
I only tested a kernel with PREEMPT_RT enabled. Do you want me to test
the vanilla kernel with your patch?
Luis
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves.
2021-11-26 13:24 ` Luis Goncalves
@ 2021-11-26 16:15 ` Sebastian Andrzej Siewior
2021-11-29 12:20 ` patchwork-bot+netdevbpf
0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-11-26 16:15 UTC (permalink / raw)
To: netdev
Cc: Luis Goncalves, Steven Rostedt, Nitesh Narayan Lal, Pei Zhang,
David S. Miller, Jakub Kicinski, Thomas Gleixner
The writer acquires dev_base_lock with disabled bottom halves.
The reader can acquire dev_base_lock without disabling bottom halves
because there is no writer in softirq context.
On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves (as in write_lock_bh()) and somewhere else with
enabled bottom halves (as by read_lock() in netstat_show()) followed by
disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
-> spin_lock_bh()). This is the reverse locking order.
All read_lock() invocation are from sysfs callback which are not invoked
from softirq context. Therefore there is no need to disable bottom
halves while acquiring a write lock.
Acquire the write lock of dev_base_lock without disabling bottom halves.
Reported-by: Pei Zhang <pezhang@redhat.com>
Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
net/core/dev.c | 16 ++++++++--------
net/core/link_watch.c | 4 ++--
net/core/rtnetlink.c | 8 ++++----
net/hsr/hsr_device.c | 6 +++---
4 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 8b5cf8ad859b5..ad3cccbfa573b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
ASSERT_RTNL();
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
netdev_name_node_add(net, dev->name_node);
hlist_add_head_rcu(&dev->index_hlist,
dev_index_hash(net, dev->ifindex));
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
dev_base_seq_inc(net);
}
@@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
ASSERT_RTNL();
/* Unlink dev from the device chain */
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
list_del_rcu(&dev->dev_list);
netdev_name_node_del(dev->name_node);
hlist_del_rcu(&dev->index_hlist);
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
dev_base_seq_inc(dev_net(dev));
}
@@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
netdev_adjacent_rename_links(dev, oldname);
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
netdev_name_node_del(dev->name_node);
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
synchronize_rcu();
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
netdev_name_node_add(net, dev->name_node);
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
ret = notifier_to_errno(ret);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 1a455847da54f..9599afd0862da 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
if (operstate == dev->operstate)
return;
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
switch(dev->link_mode) {
case IF_LINK_MODE_TESTING:
@@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
dev->operstate = operstate;
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2af8aeeadadf0..716be2f88cd75 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
}
if (dev->operstate != operstate) {
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
dev->operstate = operstate;
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
netdev_state_change(dev);
}
}
@@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
if (tb[IFLA_LINKMODE]) {
unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
if (dev->link_mode ^ value)
status |= DO_SETLINK_NOTIFY;
dev->link_mode = value;
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
}
if (tb[IFLA_VFINFO_LIST]) {
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index 737e4f17e1c6d..e57fdad9ef942 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
static void __hsr_set_operstate(struct net_device *dev, int transition)
{
- write_lock_bh(&dev_base_lock);
+ write_lock(&dev_base_lock);
if (dev->operstate != transition) {
dev->operstate = transition;
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
netdev_state_change(dev);
} else {
- write_unlock_bh(&dev_base_lock);
+ write_unlock(&dev_base_lock);
}
}
--
2.34.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves.
2021-11-26 16:15 ` [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves Sebastian Andrzej Siewior
@ 2021-11-29 12:20 ` patchwork-bot+netdevbpf
0 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-11-29 12:20 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: netdev, lgoncalv, rostedt, nilal, pezhang, davem, kuba, tglx
Hello:
This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:
On Fri, 26 Nov 2021 17:15:29 +0100 you wrote:
> The writer acquires dev_base_lock with disabled bottom halves.
> The reader can acquire dev_base_lock without disabling bottom halves
> because there is no writer in softirq context.
>
> On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> as a lock to ensure that resources, that are protected by disabling
> bottom halves, remain protected.
> This leads to a circular locking dependency if the lock acquired with
> disabled bottom halves (as in write_lock_bh()) and somewhere else with
> enabled bottom halves (as by read_lock() in netstat_show()) followed by
> disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
> -> spin_lock_bh()). This is the reverse locking order.
>
> [...]
Here is the summary with links:
- [net-next] net: Write lock dev_base_lock without disabling bottom halves.
https://git.kernel.org/netdev/net-next/c/fd888e85fe6b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2021-11-29 13:38 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-05 1:51 [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() Luis Claudio R. Goncalves
2021-10-11 13:56 ` Sebastian Andrzej Siewior
2021-10-11 16:36 ` Sebastian Andrzej Siewior
2021-10-18 15:42 ` Luis Goncalves
2021-11-04 18:58 ` Luis Goncalves
2021-11-25 1:39 ` Luis Goncalves
2021-11-25 10:01 ` Sebastian Andrzej Siewior
2021-11-25 11:11 ` Sebastian Andrzej Siewior
2021-11-26 12:09 ` Luis Goncalves
2021-11-26 13:24 ` Luis Goncalves
2021-11-26 16:15 ` [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves Sebastian Andrzej Siewior
2021-11-29 12:20 ` patchwork-bot+netdevbpf
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.