* [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() @ 2021-10-05 1:51 Luis Claudio R. Goncalves 2021-10-11 13:56 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 12+ messages in thread From: Luis Claudio R. Goncalves @ 2021-10-05 1:51 UTC (permalink / raw) To: linux-rt-users, Steven Rostedt, stable-rt Cc: Sebastian Andrzej Siewior, Nitesh Narayan Lal, Pei Zhang Avoid a possible circular locking dependency by taking the softirq_ctrl.lock before taking dev_base_lock in netstat_show(), keeping locking order consistent. Lockdep splat: [ 106.529733] ====================================================== [ 106.529734] WARNING: possible circular locking dependency detected [ 106.529734] 5.10.65-rt53+ #1 Tainted: G W I [ 106.529736] ------------------------------------------------------ [ 106.529736] tuned/2760 is trying to acquire lock: [ 106.529737] ffffa0695ea18220 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0x116/0x2f0 [ 106.529746] but task is already holding lock: [ 106.529746] ffffffff844af760 (dev_base_lock){++.+}-{0:0}, at: netstat_show.isra.17+0x4a/0xb0 [ 106.529752] which lock already depends on the new lock. ... [ 106.529825] other info that might help us debug this: [ 106.529826] Possible unsafe locking scenario: [ 106.529826] CPU0 CPU1 [ 106.529827] ---- ---- [ 106.529827] lock(dev_base_lock); [ 106.529828] lock((softirq_ctrl.lock).lock); [ 106.529829] lock(dev_base_lock); [ 106.529829] lock((softirq_ctrl.lock).lock); [ 106.529830] *** DEADLOCK *** [ 106.529831] 6 locks held by tuned/2760: [ 106.529832] #0: ffffa062296a6670 (&f->f_pos_lock){+.+.}-{0:0}, at: __fdget_pos+0x4b/0x60 [ 106.529836] #1: ffffa062223ae310 (&p->lock){+.+.}-{0:0}, at: seq_read_iter+0x56/0x420 [ 106.529839] #2: ffffa06242ea1ab8 (&of->mutex){+.+.}-{0:0}, at: kernfs_seq_start+0x28/0x90 [ 106.529842] #3: ffffa06268f52008 (kn->active#285){++++}-{0:0}, at: kernfs_seq_start+0x30/0x90 [ 106.529844] #4: ffffffff844af760 (dev_base_lock){++.+}-{0:0}, at: netstat_show.isra.17+0x4a/0xb0 [ 106.529848] #5: ffffffff840c8fa0 (rcu_read_lock){....}-{1:2}, at: rt_read_lock+0x7d/0x1e0 Reported-by: Pei Zhang <pezhang@redhat.com> Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> --- net/core/net-sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c index f6197774048b..7291ca052632 100644 --- a/net/core/net-sysfs.c +++ b/net/core/net-sysfs.c @@ -625,14 +625,14 @@ static ssize_t netstat_show(const struct device *d, WARN_ON(offset > sizeof(struct rtnl_link_stats64) || offset % sizeof(u64) != 0); - read_lock(&dev_base_lock); + read_lock_bh(&dev_base_lock); if (dev_isalive(dev)) { struct rtnl_link_stats64 temp; const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp); ret = sprintf(buf, fmt_u64, *(u64 *)(((u8 *)stats) + offset)); } - read_unlock(&dev_base_lock); + read_unlock_bh(&dev_base_lock); return ret; } ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() 2021-10-05 1:51 [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() Luis Claudio R. Goncalves @ 2021-10-11 13:56 ` Sebastian Andrzej Siewior 2021-10-11 16:36 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 12+ messages in thread From: Sebastian Andrzej Siewior @ 2021-10-11 13:56 UTC (permalink / raw) To: Luis Claudio R. Goncalves Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang On 2021-10-04 22:51:05 [-0300], Luis Claudio R. Goncalves wrote: > Avoid a possible circular locking dependency by taking the softirq_ctrl.lock > before taking dev_base_lock in netstat_show(), keeping locking order > consistent. … > [ 106.529734] 5.10.65-rt53+ #1 Tainted: G W I So looking at 5.15 there are multiple instances of that. Either _all_ of them need to disable bh while acquiring dev_base_lock or none of them. I've been looking at a few call chains and didn't find one which acquired that lock from softirq. This does not mean there isn't one. Tracing this back, I stumbled upon v2.3.6 where a few of those got their _bh removed while new ones were added with _bh. So _maybe_ this _bh isn't required and nobody noticed. Lockdep would yell if it would observe dev_base_lock in softirq and noticed one instance like this one in netstat_show() where it is missing. But it didn't happen. So my guess is that the _bh isn't needed. Sebastian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() 2021-10-11 13:56 ` Sebastian Andrzej Siewior @ 2021-10-11 16:36 ` Sebastian Andrzej Siewior 2021-10-18 15:42 ` Luis Goncalves 0 siblings, 1 reply; 12+ messages in thread From: Sebastian Andrzej Siewior @ 2021-10-11 16:36 UTC (permalink / raw) To: Luis Claudio R. Goncalves Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote: > So my guess is that the _bh isn't needed. Correction: If there is at least _one_ reader in softirq all, writer need to disable BH which _might_ be the case here. The splat is probably because on read_lock holder disables BH later on which I don't see yet. Sebastian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() 2021-10-11 16:36 ` Sebastian Andrzej Siewior @ 2021-10-18 15:42 ` Luis Goncalves 2021-11-04 18:58 ` Luis Goncalves 0 siblings, 1 reply; 12+ messages in thread From: Luis Goncalves @ 2021-10-18 15:42 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang On Mon, Oct 11, 2021 at 1:36 PM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > > On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote: > > So my guess is that the _bh isn't needed. > > Correction: If there is at least _one_ reader in softirq all, writer > need to disable BH which _might_ be the case here. > The splat is probably because on read_lock holder disables BH later on > which I don't see yet. > > Sebastian The way I was able to reproduce the problem in 5.10 (and in 5.14 before -rt11) was: 1. systemctl disable tuned.service 2. Reboot to a kernel-rt with lockdep support 3. systemctl start tuned.service I used a few tuned profiles, but stuck with 'realtime' for obvious reasons. If that helps in the process I can investigate which specific action/configuration performed by tuned is triggering the lockdep splat. Luis ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() 2021-10-18 15:42 ` Luis Goncalves @ 2021-11-04 18:58 ` Luis Goncalves 2021-11-25 1:39 ` Luis Goncalves 0 siblings, 1 reply; 12+ messages in thread From: Luis Goncalves @ 2021-11-04 18:58 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang On Mon, Oct 18, 2021 at 12:42 PM Luis Goncalves <lgoncalv@redhat.com> wrote: > > On Mon, Oct 11, 2021 at 1:36 PM Sebastian Andrzej Siewior > <bigeasy@linutronix.de> wrote: > > > > On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote: > > > So my guess is that the _bh isn't needed. > > > > Correction: If there is at least _one_ reader in softirq all, writer > > need to disable BH which _might_ be the case here. > > The splat is probably because on read_lock holder disables BH later on > > which I don't see yet. > > > > Sebastian > > The way I was able to reproduce the problem in 5.10 (and in 5.14 before -rt11) > was: > > 1. systemctl disable tuned.service > 2. Reboot to a kernel-rt with lockdep support > 3. systemctl start tuned.service > > I used a few tuned profiles, but stuck with 'realtime' for obvious reasons. > > If that helps in the process I can investigate which specific > action/configuration > performed by tuned is triggering the lockdep splat. Just for the sake of completeness, this is the lockdep splat for v5.10.73-rt54: [ 41.917124] hrtimer: interrupt took 19800197 ns [ 45.105231] bnxt_en 0000:19:00.0 eno1np0: NIC Link is Up, 10000 Mbps full duplex, Flow control: ON - receive & transmit [ 45.105235] bnxt_en 0000:19:00.0 eno1np0: FEC autoneg off encoding: None [ 46.370088] ====================================================== [ 46.370089] WARNING: possible circular locking dependency detected [ 46.370089] 5.10.73-rt54.lockdep+ #2 Tainted: G I [ 46.370091] ------------------------------------------------------ [ 46.370091] tuned/2248 is trying to acquire lock: [ 46.370093] ffff954d9e818220 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0x116/0x2f0 [ 46.370104] but task is already holding lock: [ 46.370104] ffffffff9ccafa60 (dev_base_lock){++.+}-{0:0}, at: netstat_show.isra.17+0x4a/0xb0 [ 46.370111] which lock already depends on the new lock. [ 46.370111] the existing dependency chain (in reverse order) is: [ 46.370111] -> #1 (dev_base_lock){++.+}-{0:0}: [ 46.370113] lock_acquire+0xde/0x380 [ 46.370117] rt_write_lock+0x3a/0x1b0 [ 46.370122] list_netdevice+0x4c/0x140 [ 46.370125] register_netdevice+0x59b/0x810 [ 46.370127] register_netdev+0x1a/0x30 [ 46.370128] loopback_net_init+0x48/0xa0 [ 46.370130] ops_init+0x3a/0x180 [ 46.370133] register_pernet_operations+0x126/0x220 [ 46.370135] register_pernet_device+0x25/0x60 [ 46.370138] net_dev_init+0x31b/0x391 [ 46.370141] do_one_initcall+0x74/0x440 [ 46.370145] kernel_init_freeable+0x39d/0x408 [ 46.370148] kernel_init+0xb/0x12e [ 46.370150] ret_from_fork+0x1f/0x30 [ 46.370152] -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}: [ 46.370153] check_prevs_add+0x1bb/0xe50 [ 46.370155] __lock_acquire+0x1187/0x1640 [ 46.370157] lock_acquire+0xde/0x380 [ 46.370158] rt_spin_lock+0x2b/0xd0 [ 46.370160] __local_bh_disable_ip+0x116/0x2f0 [ 46.370162] tg3_get_stats64+0x29/0xb0 [tg3] [ 46.370167] dev_get_stats+0x5b/0xc0 [ 46.370168] netstat_show.isra.17+0x5b/0xb0 [ 46.370170] dev_attr_show+0x18/0x50 [ 46.370173] sysfs_kf_seq_show+0xa1/0x130 [ 46.370176] seq_read_iter+0x19e/0x420 [ 46.370178] new_sync_read+0x121/0x1c0 [ 46.370180] vfs_read+0x185/0x1e0 [ 46.370182] ksys_read+0x5f/0xe0 [ 46.370184] do_syscall_64+0x33/0x40 [ 46.370186] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 46.370188] other info that might help us debug this: [ 46.370188] Possible unsafe locking scenario: [ 46.370189] CPU0 CPU1 [ 46.370189] ---- ---- [ 46.370189] lock(dev_base_lock); [ 46.370190] lock((softirq_ctrl.lock).lock); [ 46.370191] lock(dev_base_lock); [ 46.370192] lock((softirq_ctrl.lock).lock); [ 46.370193] *** DEADLOCK *** [ 46.370193] 6 locks held by tuned/2248: [ 46.370194] #0: ffff954606ef7d30 (&f->f_pos_lock){+.+.}-{0:0}, at: __fdget_pos+0x4b/0x60 [ 46.370199] #1: ffff9545d307a1f0 (&p->lock){+.+.}-{0:0}, at: seq_read_iter+0x56/0x420 [ 46.370202] #2: ffff9545fcde6cb8 (&of->mutex){+.+.}-{0:0}, at: kernfs_seq_start+0x28/0x90 [ 46.370205] #3: ffff953e5c12dd98 (kn->active#246){.+.+}-{0:0}, at: kernfs_seq_start+0x30/0x90 [ 46.370208] #4: ffffffff9ccafa60 (dev_base_lock){++.+}-{0:0}, at: netstat_show.isra.17+0x4a/0xb0 [ 46.370211] #5: ffffffff9c8c8fe0 (rcu_read_lock){....}-{1:2}, at: rt_read_lock+0x7d/0x1e0 [ 46.370214] stack backtrace: [ 46.370214] CPU: 27 PID: 2248 Comm: tuned Tainted: G I 5.10.73-rt54.lockdep+ #2 [ 46.370216] Hardware name: Dell Inc. PowerEdge R740/07X9K0, BIOS 2.8.2 08/27/2020 [ 46.370217] Call Trace: [ 46.370218] dump_stack+0x77/0x97 [ 46.370224] check_noncircular+0xff/0x120 [ 46.370228] check_prevs_add+0x1bb/0xe50 [ 46.370232] __lock_acquire+0x1187/0x1640 [ 46.370236] lock_acquire+0xde/0x380 [ 46.370237] ? __local_bh_disable_ip+0x116/0x2f0 [ 46.370242] ? tg3_get_stats64+0x5/0xb0 [tg3] [ 46.370245] rt_spin_lock+0x2b/0xd0 [ 46.370246] ? __local_bh_disable_ip+0x116/0x2f0 [ 46.370249] __local_bh_disable_ip+0x116/0x2f0 [ 46.370251] tg3_get_stats64+0x29/0xb0 [tg3] [ 46.370255] dev_get_stats+0x5b/0xc0 [ 46.370257] netstat_show.isra.17+0x5b/0xb0 [ 46.370263] dev_attr_show+0x18/0x50 [ 46.370264] sysfs_kf_seq_show+0xa1/0x130 [ 46.370266] seq_read_iter+0x19e/0x420 [ 46.370269] new_sync_read+0x121/0x1c0 [ 46.370272] vfs_read+0x185/0x1e0 [ 46.370274] ksys_read+0x5f/0xe0 [ 46.370276] do_syscall_64+0x33/0x40 [ 46.370278] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 46.370280] RIP: 0033:0x7f4ee932b9e4 [ 46.370282] Code: c3 0f 1f 44 00 00 41 54 49 89 d4 55 48 89 f5 53 89 fb 48 83 ec 10 e8 7b fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 38 44 89 c7 48 89 44 24 08 e8 b7 fc ff ff 48 [ 46.370283] RSP: 002b:00007f4ede603be0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 46.370285] RAX: ffffffffffffffda RBX: 0000000000000039 RCX: 00007f4ee932b9e4 [ 46.370286] RDX: 0000000000001001 RSI: 00007f4ed801b670 RDI: 0000000000000039 [ 46.370287] RBP: 00007f4ed801b670 R08: 0000000000000000 R09: 0000000000000003 [ 46.370288] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000001001 [ 46.370289] R13: 0000000000000039 R14: 00007f4ed801b670 R15: 000055cd6476e060 [ 47.606432] DMA-API: dma_debug_entry pool grown to 131072 (200%) [ 49.474083] DMA-API: dma_debug_entry pool grown to 196608 (300%) [ 49.653708] DMA-API: dma_debug_entry pool grown to 262144 (400%) Luis ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() 2021-11-04 18:58 ` Luis Goncalves @ 2021-11-25 1:39 ` Luis Goncalves 2021-11-25 10:01 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 12+ messages in thread From: Luis Goncalves @ 2021-11-25 1:39 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang On Thu, Nov 4, 2021 at 3:58 PM Luis Goncalves <lgoncalv@redhat.com> wrote: > > On Mon, Oct 18, 2021 at 12:42 PM Luis Goncalves <lgoncalv@redhat.com> wrote: > > > > On Mon, Oct 11, 2021 at 1:36 PM Sebastian Andrzej Siewior > > <bigeasy@linutronix.de> wrote: > > > > > > On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote: > > > > So my guess is that the _bh isn't needed. > > > > > > Correction: If there is at least _one_ reader in softirq all, writer > > > need to disable BH which _might_ be the case here. > > > The splat is probably because on read_lock holder disables BH later on > > > which I don't see yet. > > > > > > Sebastian > > > > The way I was able to reproduce the problem in 5.10 (and in 5.14 before -rt11) > > was: > > > > 1. systemctl disable tuned.service > > 2. Reboot to a kernel-rt with lockdep support > > 3. systemctl start tuned.service > > > > I used a few tuned profiles, but stuck with 'realtime' for obvious reasons. > > > > If that helps in the process I can investigate which specific > > action/configuration > > performed by tuned is triggering the lockdep splat. > > Just for the sake of completeness, this is the lockdep splat for v5.10.73-rt54: Sebastian, I just stumbled over the lockdep splat being discussed here in v5.15.3-rt21. Well, slightly different, but related to the very same situation: (btw, sorry if the formatting is off, I am using the gmail webui) [35429.000894] ====================================================== [35429.000895] WARNING: possible circular locking dependency detected [35429.000896] 5.15.3-rt21 #2 Not tainted [35429.000897] ------------------------------------------------------ [35429.000898] tuned/6697 is trying to acquire lock: [35429.000899] ffff8dbfd0cebf50 (&adapter->stats_lock){+.+.}-{2:2}, at: cxgb_get_stats+0x41/0x1d0 [cxgb4] [35429.000925] but task is already holding lock: [35429.000925] ffffffff9e0ed570 (dev_base_lock){++.+}-{2:2}, at: netstat_show.constprop.0+0x41/0xb0 [35429.000930] which lock already depends on the new lock. [35429.000931] the existing dependency chain (in reverse order) is: [35429.000931] -> #2 (dev_base_lock){++.+}-{2:2}: [35429.000933] __lock_acquire+0x4d2/0x930 [35429.000937] lock_acquire.part.0+0x9c/0x210 [35429.000938] rt_write_lock+0x48/0xc0 [35429.000941] list_netdevice+0x4c/0x130 [35429.000943] register_netdevice+0x4fd/0x670 [35429.000945] register_netdev+0x1c/0x40 [35429.000959] loopback_net_init+0x48/0x90 [35429.000962] ops_init+0x3a/0x160 [35429.000964] register_pernet_operations+0x127/0x220 [35429.000965] register_pernet_device+0x26/0x60 [35429.000967] net_dev_init+0x23c/0x2b0 [35429.000971] do_one_initcall+0x47/0x170 [35429.000974] do_initcalls+0xc6/0xdf [35429.000977] kernel_init_freeable+0xfc/0x12c [35429.000979] kernel_init+0x16/0x120 [35429.000982] ret_from_fork+0x22/0x30 [35429.000984] -> #1 ((softirq_ctrl.lock)){+.+.}-{2:2}: [35429.000986] __lock_acquire+0x4d2/0x930 [35429.000988] lock_acquire.part.0+0x9c/0x210 [35429.000989] rt_spin_lock+0x27/0xe0 [35429.000992] __local_bh_disable_ip+0xb8/0x190 [35429.000994] t4_wr_mbox_meat_timeout+0xa7/0x720 [cxgb4] [35429.001004] t4_get_mps_bg_map+0xba/0x1b0 [cxgb4] [35429.001028] t4_get_port_stats+0x16/0xc50 [cxgb4] [35429.001039] t4_get_port_stats_offset+0x12/0x30 [cxgb4] [35429.001050] cxgb_get_stats+0x66/0x1d0 [cxgb4] [35429.001059] dev_get_stats+0x5c/0xc0 [35429.001061] rtnl_fill_stats+0x3b/0x130 [35429.001063] rtnl_fill_ifinfo+0x706/0x1140 [35429.001066] rtmsg_ifinfo_build_skb+0x88/0xe0 [35429.001068] rtmsg_ifinfo+0x25/0x60 [35429.001070] register_netdevice+0x5f9/0x670 [35429.001072] register_netdev+0x1c/0x40 [35429.001074] init_one+0x8af/0xda0 [cxgb4] [35429.001084] local_pci_probe+0x45/0x80 [35429.001088] work_for_cpu_fn+0x16/0x20 [35429.001090] process_one_work+0x27a/0x450 [35429.001092] worker_thread+0x1ea/0x3c0 [35429.001093] kthread+0x188/0x1a0 [35429.001095] ret_from_fork+0x22/0x30 [35429.001098] -> #0 (&adapter->stats_lock){+.+.}-{2:2}: [35429.001099] check_prev_add+0x91/0xc10 [35429.001101] validate_chain+0x480/0x510 [35429.001102] __lock_acquire+0x4d2/0x930 [35429.001104] lock_acquire.part.0+0x9c/0x210 [35429.001105] rt_spin_lock+0x27/0xe0 [35429.001107] cxgb_get_stats+0x41/0x1d0 [cxgb4] [35429.001116] dev_get_stats+0x5c/0xc0 [35429.001118] netstat_show.constprop.0+0x52/0xb0 [35429.001120] dev_attr_show+0x19/0x40 [35429.001122] sysfs_kf_seq_show+0xa9/0x100 [35429.001125] seq_read_iter+0x128/0x4c0 [35429.001128] new_sync_read+0x121/0x1b0 [35429.001130] vfs_read+0x133/0x1d0 [35429.001131] ksys_read+0x68/0xe0 [35429.001133] do_syscall_64+0x3b/0x90 [35429.001134] entry_SYSCALL_64_after_hwframe+0x44/0xae [35429.001137] other info that might help us debug this: [35429.001137] Chain exists of: &adapter->stats_lock --> (softirq_ctrl.lock) --> dev_base_lock [35429.001139] Possible unsafe locking scenario: [35429.001140] CPU0 CPU1 [35429.001140] ---- ---- [35429.001140] lock(dev_base_lock); [35429.001141] lock((softirq_ctrl.lock)); [35429.001142] lock(dev_base_lock); [35429.001143] lock(&adapter->stats_lock); [35429.001144] *** DEADLOCK *** [35429.001144] 6 locks held by tuned/6697: [35429.001145] #0: ffff8dba839df838 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60 [35429.001149] #1: ffff8dba44ef3d98 (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0x51/0x4c0 [35429.001153] #2: ffff8dba49960a80 (&of->mutex){+.+.}-{3:3}, at: kernfs_seq_start+0x2a/0xb0 [35429.001156] #3: ffff8dbff8ea31e8 (kn->active#223){++++}-{0:0}, at: kernfs_seq_start+0x32/0xb0 [35429.001159] #4: ffffffff9e0ed570 (dev_base_lock){++.+}-{2:2}, at: netstat_show.constprop.0+0x41/0xb0 [35429.001163] #5: ffffffff9de61200 (rcu_read_lock){....}-{1:2}, at: rt_read_lock+0x65/0x130 [35429.001166] stack backtrace: [35429.001167] CPU: 16 PID: 6697 Comm: tuned Kdump: loaded Not tainted 5.15.3-rt21 #2 [35429.001169] Hardware name: Dell Inc. PowerEdge R520/08DM12, BIOS 2.1.2 01/20/2014 [35429.001171] Call Trace: [35429.001172] <TASK> [35429.001174] dump_stack_lvl+0x57/0x7d [35429.001177] check_noncircular+0xff/0x110 [35429.001182] check_prev_add+0x91/0xc10 [35429.001184] ? add_chain_cache+0x112/0x2d0 [35429.001186] validate_chain+0x480/0x510 [35429.001190] __lock_acquire+0x4d2/0x930 [35429.001193] lock_acquire.part.0+0x9c/0x210 [35429.001195] ? cxgb_get_stats+0x41/0x1d0 [cxgb4] [35429.001205] ? rcu_read_lock_sched_held+0x3f/0x70 [35429.001209] ? trace_lock_acquire+0x38/0x140 [35429.001210] ? lock_acquire+0x30/0x80 [35429.001212] ? cxgb_get_stats+0x41/0x1d0 [cxgb4] [35429.001222] rt_spin_lock+0x27/0xe0 [35429.001225] ? cxgb_get_stats+0x41/0x1d0 [cxgb4] [35429.001234] cxgb_get_stats+0x41/0x1d0 [cxgb4] [35429.001245] ? check_prev_add+0xa3/0xc10 [35429.001247] ? check_prev_add+0xa3/0xc10 [35429.001248] ? add_chain_cache+0x112/0x2d0 [35429.001251] ? lockdep_unlock+0x5c/0xc0 [35429.001252] ? validate_chain+0x29a/0x510 [35429.001255] ? __lock_acquire+0x4d2/0x930 [35429.001258] ? lock_acquire.part.0+0x9c/0x210 [35429.001261] ? rcu_read_lock_sched_held+0x3f/0x70 [35429.001263] ? trace_lock_acquire+0x38/0x140 [35429.001265] ? lock_acquire+0x30/0x80 [35429.001266] ? rt_read_lock+0x65/0x130 [35429.001269] dev_get_stats+0x5c/0xc0 [35429.001272] netstat_show.constprop.0+0x52/0xb0 [35429.001279] dev_attr_show+0x19/0x40 [35429.001280] sysfs_kf_seq_show+0xa9/0x100 [35429.001283] seq_read_iter+0x128/0x4c0 [35429.001288] new_sync_read+0x121/0x1b0 [35429.001292] vfs_read+0x133/0x1d0 [35429.001294] ksys_read+0x68/0xe0 [35429.001297] do_syscall_64+0x3b/0x90 [35429.001299] entry_SYSCALL_64_after_hwframe+0x44/0xae [35429.001301] RIP: 0033:0x7f03e459a9ec [35429.001304] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 09 87 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 4f 87 f8 ff 48 [35429.001306] RSP: 002b:00007f03e23896a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [35429.001308] RAX: ffffffffffffffda RBX: 00007f03e238b5c0 RCX: 00007f03e459a9ec [35429.001309] RDX: 0000000000001001 RSI: 00007f03dc03b5b0 RDI: 0000000000000008 [35429.001310] RBP: 0000000000001001 R08: 0000000000000000 R09: 0000000000000000 [35429.001311] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f03e021b540 [35429.001312] R13: 00007f03dc03b5b0 R14: 0000000000000008 R15: 00007f03d8001c00 [35429.001328] </TASK> [35430.549435] bnx2x 0000:08:00.1 enp8s0f1: using MSI-X IRQs: sp 97 fp[0] 99 ... fp[19] 332 [35430.695688] bnx2x 0000:08:00.1 enp8s0f1: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive & transmit [35432.120806] bnx2x 0000:08:00.0 enp8s0f0: using MSI-X IRQs: sp 61 fp[0] 63 ... fp[19] 344 [35432.425838] bnx2x 0000:08:00.0 enp8s0f0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive & transmit ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() 2021-11-25 1:39 ` Luis Goncalves @ 2021-11-25 10:01 ` Sebastian Andrzej Siewior 2021-11-25 11:11 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 12+ messages in thread From: Sebastian Andrzej Siewior @ 2021-11-25 10:01 UTC (permalink / raw) To: Luis Goncalves Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang On 2021-11-24 22:39:34 [-0300], Luis Goncalves wrote: > Sebastian, I just stumbled over the lockdep splat being discussed here > in v5.15.3-rt21. Well, slightly different, but related to the very > same situation: … so we have write_lock_bh(&dev_base_lock) and read_lock(&dev_base_lock) followed by local_bh_disable() (due to spin_lock_bh(&adap->mbox_lock)). It makes sense to only change that one invocation. Let me form a patch. Sebastian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() 2021-11-25 10:01 ` Sebastian Andrzej Siewior @ 2021-11-25 11:11 ` Sebastian Andrzej Siewior 2021-11-26 12:09 ` Luis Goncalves 0 siblings, 1 reply; 12+ messages in thread From: Sebastian Andrzej Siewior @ 2021-11-25 11:11 UTC (permalink / raw) To: Luis Goncalves Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote: > > It makes sense to only change that one invocation. Let me form a patch. So I made dis: | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001 | From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> | Date: Thu, 25 Nov 2021 11:05:17 +0100 | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in | netstat_show() | | The writer acquires dev_base_lock with disabled bottom halves. | The reader can acquire dev_base_lock without disabling bottom halves | because there is no writer in softirq context. Also readers can acquire | the lock in task and softirq context and multiple readers are allowed. | | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts | as a lock to ensure that resources, that are protected by disabling | bottom halves, remain protected. | This leads to a circular locking dependency if the lock acquired with | disabled bottom halves (as in write_lock_bh()) and somewhere else with | enabled bottom halves (as by read_lock() in netstat_show()) followed by | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout() | -> spin_lock_bh()). This is the reverse locking order. | | In order to avoid the reverse locking order, acquire the dev_base_lock | with disabled bottom halves. | | Reported-by: Pei Zhang <pezhang@redhat.com> | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> and then I started locking for in-softirq reader or writer. I can't find any. Therefore I made this for testing: diff --git a/net/core/dev.c b/net/core/dev.c --- a/net/core/dev.c +++ b/net/core/dev.c @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev) ASSERT_RTNL(); - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); list_add_tail_rcu(&dev->dev_list, &net->dev_base_head); netdev_name_node_add(net, dev->name_node); hlist_add_head_rcu(&dev->index_hlist, dev_index_hash(net, dev->ifindex)); - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); dev_base_seq_inc(net); } @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev) ASSERT_RTNL(); /* Unlink dev from the device chain */ - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); list_del_rcu(&dev->dev_list); netdev_name_node_del(dev->name_node); hlist_del_rcu(&dev->index_hlist); - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); dev_base_seq_inc(dev_net(dev)); } @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname) netdev_adjacent_rename_links(dev, oldname); - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); netdev_name_node_del(dev->name_node); - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); synchronize_rcu(); - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); netdev_name_node_add(net, dev->name_node); - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev); ret = notifier_to_errno(ret); diff --git a/net/core/link_watch.c b/net/core/link_watch.c --- a/net/core/link_watch.c +++ b/net/core/link_watch.c @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev) if (operstate == dev->operstate) return; - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); switch(dev->link_mode) { case IF_LINK_MODE_TESTING: @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev) dev->operstate = operstate; - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); } diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition) } if (dev->operstate != operstate) { - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); dev->operstate = operstate; - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); netdev_state_change(dev); } } @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb, if (tb[IFLA_LINKMODE]) { unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]); - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); if (dev->link_mode ^ value) status |= DO_SETLINK_NOTIFY; dev->link_mode = value; - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); } if (tb[IFLA_VFINFO_LIST]) { diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c --- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev) static void __hsr_set_operstate(struct net_device *dev, int transition) { - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); if (dev->operstate != transition) { dev->operstate = transition; - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); netdev_state_change(dev); } else { - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); } } If this does not create any warnings (with or without RT) then I would suggest to remove that _bh suffix since there is no need for it. This should also fix the RT warning you see now. Agreed? Sebastian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() 2021-11-25 11:11 ` Sebastian Andrzej Siewior @ 2021-11-26 12:09 ` Luis Goncalves 2021-11-26 13:24 ` Luis Goncalves 0 siblings, 1 reply; 12+ messages in thread From: Luis Goncalves @ 2021-11-26 12:09 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang On Thu, Nov 25, 2021 at 8:11 AM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > > On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote: > > > > It makes sense to only change that one invocation. Let me form a patch. > > So I made dis: > > | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001 > | From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> > | Date: Thu, 25 Nov 2021 11:05:17 +0100 > | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in > | netstat_show() > | > | The writer acquires dev_base_lock with disabled bottom halves. > | The reader can acquire dev_base_lock without disabling bottom halves > | because there is no writer in softirq context. Also readers can acquire > | the lock in task and softirq context and multiple readers are allowed. > | > | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts > | as a lock to ensure that resources, that are protected by disabling > | bottom halves, remain protected. > | This leads to a circular locking dependency if the lock acquired with > | disabled bottom halves (as in write_lock_bh()) and somewhere else with > | enabled bottom halves (as by read_lock() in netstat_show()) followed by > | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout() > | -> spin_lock_bh()). This is the reverse locking order. > | > | In order to avoid the reverse locking order, acquire the dev_base_lock > | with disabled bottom halves. > | > | Reported-by: Pei Zhang <pezhang@redhat.com> > | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> > | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > and then I started locking for in-softirq reader or writer. I can't find > any. Therefore I made this for testing: > > diff --git a/net/core/dev.c b/net/core/dev.c > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev) > > ASSERT_RTNL(); > > - write_lock_bh(&dev_base_lock); > + write_lock(&dev_base_lock); > list_add_tail_rcu(&dev->dev_list, &net->dev_base_head); > netdev_name_node_add(net, dev->name_node); > hlist_add_head_rcu(&dev->index_hlist, > dev_index_hash(net, dev->ifindex)); > - write_unlock_bh(&dev_base_lock); > + write_unlock(&dev_base_lock); > > dev_base_seq_inc(net); > } > @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev) > ASSERT_RTNL(); > > /* Unlink dev from the device chain */ > - write_lock_bh(&dev_base_lock); > + write_lock(&dev_base_lock); > list_del_rcu(&dev->dev_list); > netdev_name_node_del(dev->name_node); > hlist_del_rcu(&dev->index_hlist); > - write_unlock_bh(&dev_base_lock); > + write_unlock(&dev_base_lock); > > dev_base_seq_inc(dev_net(dev)); > } > @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname) > > netdev_adjacent_rename_links(dev, oldname); > > - write_lock_bh(&dev_base_lock); > + write_lock(&dev_base_lock); > netdev_name_node_del(dev->name_node); > - write_unlock_bh(&dev_base_lock); > + write_unlock(&dev_base_lock); > > synchronize_rcu(); > > - write_lock_bh(&dev_base_lock); > + write_lock(&dev_base_lock); > netdev_name_node_add(net, dev->name_node); > - write_unlock_bh(&dev_base_lock); > + write_unlock(&dev_base_lock); > > ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev); > ret = notifier_to_errno(ret); > diff --git a/net/core/link_watch.c b/net/core/link_watch.c > --- a/net/core/link_watch.c > +++ b/net/core/link_watch.c > @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev) > if (operstate == dev->operstate) > return; > > - write_lock_bh(&dev_base_lock); > + write_lock(&dev_base_lock); > > switch(dev->link_mode) { > case IF_LINK_MODE_TESTING: > @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev) > > dev->operstate = operstate; > > - write_unlock_bh(&dev_base_lock); > + write_unlock(&dev_base_lock); > } > > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > --- a/net/core/rtnetlink.c > +++ b/net/core/rtnetlink.c > @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition) > } > > if (dev->operstate != operstate) { > - write_lock_bh(&dev_base_lock); > + write_lock(&dev_base_lock); > dev->operstate = operstate; > - write_unlock_bh(&dev_base_lock); > + write_unlock(&dev_base_lock); > netdev_state_change(dev); > } > } > @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb, > if (tb[IFLA_LINKMODE]) { > unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]); > > - write_lock_bh(&dev_base_lock); > + write_lock(&dev_base_lock); > if (dev->link_mode ^ value) > status |= DO_SETLINK_NOTIFY; > dev->link_mode = value; > - write_unlock_bh(&dev_base_lock); > + write_unlock(&dev_base_lock); > } > > if (tb[IFLA_VFINFO_LIST]) { > diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c > --- a/net/hsr/hsr_device.c > +++ b/net/hsr/hsr_device.c > @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev) > > static void __hsr_set_operstate(struct net_device *dev, int transition) > { > - write_lock_bh(&dev_base_lock); > + write_lock(&dev_base_lock); > if (dev->operstate != transition) { > dev->operstate = transition; > - write_unlock_bh(&dev_base_lock); > + write_unlock(&dev_base_lock); > netdev_state_change(dev); > } else { > - write_unlock_bh(&dev_base_lock); > + write_unlock(&dev_base_lock); > } > } > > > If this does not create any warnings (with or without RT) then I would > suggest to remove that _bh suffix since there is no need for it. This > should also fix the RT warning you see now. > Agreed? Sounds like a plan! I will start testing within the hour and report soon. Thanks, Luis > Sebastian > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() 2021-11-26 12:09 ` Luis Goncalves @ 2021-11-26 13:24 ` Luis Goncalves 2021-11-26 16:15 ` [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves Sebastian Andrzej Siewior 0 siblings, 1 reply; 12+ messages in thread From: Luis Goncalves @ 2021-11-26 13:24 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang On Fri, Nov 26, 2021 at 9:09 AM Luis Goncalves <lgoncalv@redhat.com> wrote: > > On Thu, Nov 25, 2021 at 8:11 AM Sebastian Andrzej Siewior > <bigeasy@linutronix.de> wrote: > > > > On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote: > > > > > > It makes sense to only change that one invocation. Let me form a patch. > > > > So I made dis: > > > > | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001 > > | From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> > > | Date: Thu, 25 Nov 2021 11:05:17 +0100 > > | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in > > | netstat_show() > > | > > | The writer acquires dev_base_lock with disabled bottom halves. > > | The reader can acquire dev_base_lock without disabling bottom halves > > | because there is no writer in softirq context. Also readers can acquire > > | the lock in task and softirq context and multiple readers are allowed. > > | > > | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts > > | as a lock to ensure that resources, that are protected by disabling > > | bottom halves, remain protected. > > | This leads to a circular locking dependency if the lock acquired with > > | disabled bottom halves (as in write_lock_bh()) and somewhere else with > > | enabled bottom halves (as by read_lock() in netstat_show()) followed by > > | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout() > > | -> spin_lock_bh()). This is the reverse locking order. > > | > > | In order to avoid the reverse locking order, acquire the dev_base_lock > > | with disabled bottom halves. > > | > > | Reported-by: Pei Zhang <pezhang@redhat.com> > > | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> > > | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > > > and then I started locking for in-softirq reader or writer. I can't find > > any. Therefore I made this for testing: > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev) > > > > ASSERT_RTNL(); > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > list_add_tail_rcu(&dev->dev_list, &net->dev_base_head); > > netdev_name_node_add(net, dev->name_node); > > hlist_add_head_rcu(&dev->index_hlist, > > dev_index_hash(net, dev->ifindex)); > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > > > dev_base_seq_inc(net); > > } > > @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev) > > ASSERT_RTNL(); > > > > /* Unlink dev from the device chain */ > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > list_del_rcu(&dev->dev_list); > > netdev_name_node_del(dev->name_node); > > hlist_del_rcu(&dev->index_hlist); > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > > > dev_base_seq_inc(dev_net(dev)); > > } > > @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname) > > > > netdev_adjacent_rename_links(dev, oldname); > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > netdev_name_node_del(dev->name_node); > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > > > synchronize_rcu(); > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > netdev_name_node_add(net, dev->name_node); > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > > > ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev); > > ret = notifier_to_errno(ret); > > diff --git a/net/core/link_watch.c b/net/core/link_watch.c > > --- a/net/core/link_watch.c > > +++ b/net/core/link_watch.c > > @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev) > > if (operstate == dev->operstate) > > return; > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > > > switch(dev->link_mode) { > > case IF_LINK_MODE_TESTING: > > @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev) > > > > dev->operstate = operstate; > > > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > } > > > > > > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c > > --- a/net/core/rtnetlink.c > > +++ b/net/core/rtnetlink.c > > @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition) > > } > > > > if (dev->operstate != operstate) { > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > dev->operstate = operstate; > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > netdev_state_change(dev); > > } > > } > > @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb, > > if (tb[IFLA_LINKMODE]) { > > unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]); > > > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > if (dev->link_mode ^ value) > > status |= DO_SETLINK_NOTIFY; > > dev->link_mode = value; > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > } > > > > if (tb[IFLA_VFINFO_LIST]) { > > diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c > > --- a/net/hsr/hsr_device.c > > +++ b/net/hsr/hsr_device.c > > @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev) > > > > static void __hsr_set_operstate(struct net_device *dev, int transition) > > { > > - write_lock_bh(&dev_base_lock); > > + write_lock(&dev_base_lock); > > if (dev->operstate != transition) { > > dev->operstate = transition; > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > netdev_state_change(dev); > > } else { > > - write_unlock_bh(&dev_base_lock); > > + write_unlock(&dev_base_lock); > > } > > } > > > > > > If this does not create any warnings (with or without RT) then I would > > suggest to remove that _bh suffix since there is no need for it. This > > should also fix the RT warning you see now. > > Agreed? > > Sounds like a plan! > > I will start testing within the hour and report soon. Using v5.15.3-rt21 + your patch, I rebooted multiple times the two boxes where I had observed the netstat_show() lockdep complaint before. The kernel booted fine every single time, no netstat_show() lockdep complaints at all. I only tested a kernel with PREEMPT_RT enabled. Do you want me to test the vanilla kernel with your patch? Luis ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves. 2021-11-26 13:24 ` Luis Goncalves @ 2021-11-26 16:15 ` Sebastian Andrzej Siewior 2021-11-29 12:20 ` patchwork-bot+netdevbpf 0 siblings, 1 reply; 12+ messages in thread From: Sebastian Andrzej Siewior @ 2021-11-26 16:15 UTC (permalink / raw) To: netdev Cc: Luis Goncalves, Steven Rostedt, Nitesh Narayan Lal, Pei Zhang, David S. Miller, Jakub Kicinski, Thomas Gleixner The writer acquires dev_base_lock with disabled bottom halves. The reader can acquire dev_base_lock without disabling bottom halves because there is no writer in softirq context. On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts as a lock to ensure that resources, that are protected by disabling bottom halves, remain protected. This leads to a circular locking dependency if the lock acquired with disabled bottom halves (as in write_lock_bh()) and somewhere else with enabled bottom halves (as by read_lock() in netstat_show()) followed by disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout() -> spin_lock_bh()). This is the reverse locking order. All read_lock() invocation are from sysfs callback which are not invoked from softirq context. Therefore there is no need to disable bottom halves while acquiring a write lock. Acquire the write lock of dev_base_lock without disabling bottom halves. Reported-by: Pei Zhang <pezhang@redhat.com> Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- net/core/dev.c | 16 ++++++++-------- net/core/link_watch.c | 4 ++-- net/core/rtnetlink.c | 8 ++++---- net/hsr/hsr_device.c | 6 +++--- 4 files changed, 17 insertions(+), 17 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 8b5cf8ad859b5..ad3cccbfa573b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev) ASSERT_RTNL(); - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); list_add_tail_rcu(&dev->dev_list, &net->dev_base_head); netdev_name_node_add(net, dev->name_node); hlist_add_head_rcu(&dev->index_hlist, dev_index_hash(net, dev->ifindex)); - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); dev_base_seq_inc(net); } @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev) ASSERT_RTNL(); /* Unlink dev from the device chain */ - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); list_del_rcu(&dev->dev_list); netdev_name_node_del(dev->name_node); hlist_del_rcu(&dev->index_hlist); - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); dev_base_seq_inc(dev_net(dev)); } @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname) netdev_adjacent_rename_links(dev, oldname); - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); netdev_name_node_del(dev->name_node); - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); synchronize_rcu(); - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); netdev_name_node_add(net, dev->name_node); - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev); ret = notifier_to_errno(ret); diff --git a/net/core/link_watch.c b/net/core/link_watch.c index 1a455847da54f..9599afd0862da 100644 --- a/net/core/link_watch.c +++ b/net/core/link_watch.c @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev) if (operstate == dev->operstate) return; - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); switch(dev->link_mode) { case IF_LINK_MODE_TESTING: @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev) dev->operstate = operstate; - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); } diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 2af8aeeadadf0..716be2f88cd75 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition) } if (dev->operstate != operstate) { - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); dev->operstate = operstate; - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); netdev_state_change(dev); } } @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb, if (tb[IFLA_LINKMODE]) { unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]); - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); if (dev->link_mode ^ value) status |= DO_SETLINK_NOTIFY; dev->link_mode = value; - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); } if (tb[IFLA_VFINFO_LIST]) { diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c index 737e4f17e1c6d..e57fdad9ef942 100644 --- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev) static void __hsr_set_operstate(struct net_device *dev, int transition) { - write_lock_bh(&dev_base_lock); + write_lock(&dev_base_lock); if (dev->operstate != transition) { dev->operstate = transition; - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); netdev_state_change(dev); } else { - write_unlock_bh(&dev_base_lock); + write_unlock(&dev_base_lock); } } -- 2.34.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves. 2021-11-26 16:15 ` [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves Sebastian Andrzej Siewior @ 2021-11-29 12:20 ` patchwork-bot+netdevbpf 0 siblings, 0 replies; 12+ messages in thread From: patchwork-bot+netdevbpf @ 2021-11-29 12:20 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: netdev, lgoncalv, rostedt, nilal, pezhang, davem, kuba, tglx Hello: This patch was applied to netdev/net-next.git (master) by David S. Miller <davem@davemloft.net>: On Fri, 26 Nov 2021 17:15:29 +0100 you wrote: > The writer acquires dev_base_lock with disabled bottom halves. > The reader can acquire dev_base_lock without disabling bottom halves > because there is no writer in softirq context. > > On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts > as a lock to ensure that resources, that are protected by disabling > bottom halves, remain protected. > This leads to a circular locking dependency if the lock acquired with > disabled bottom halves (as in write_lock_bh()) and somewhere else with > enabled bottom halves (as by read_lock() in netstat_show()) followed by > disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout() > -> spin_lock_bh()). This is the reverse locking order. > > [...] Here is the summary with links: - [net-next] net: Write lock dev_base_lock without disabling bottom halves. https://git.kernel.org/netdev/net-next/c/fd888e85fe6b You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2021-11-29 13:38 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-10-05 1:51 [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() Luis Claudio R. Goncalves 2021-10-11 13:56 ` Sebastian Andrzej Siewior 2021-10-11 16:36 ` Sebastian Andrzej Siewior 2021-10-18 15:42 ` Luis Goncalves 2021-11-04 18:58 ` Luis Goncalves 2021-11-25 1:39 ` Luis Goncalves 2021-11-25 10:01 ` Sebastian Andrzej Siewior 2021-11-25 11:11 ` Sebastian Andrzej Siewior 2021-11-26 12:09 ` Luis Goncalves 2021-11-26 13:24 ` Luis Goncalves 2021-11-26 16:15 ` [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves Sebastian Andrzej Siewior 2021-11-29 12:20 ` patchwork-bot+netdevbpf
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.