[RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
@ 2021-10-05  1:51 Luis Claudio R. Goncalves
  2021-10-11 13:56 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Claudio R. Goncalves @ 2021-10-05  1:51 UTC (permalink / raw)
  To: linux-rt-users, Steven Rostedt, stable-rt
  Cc: Sebastian Andrzej Siewior, Nitesh Narayan Lal, Pei Zhang

Avoid a possible circular locking dependency by taking the softirq_ctrl.lock
before taking dev_base_lock in netstat_show(), keeping locking order
consistent.

Lockdep splat:

[  106.529733] ======================================================
[  106.529734] WARNING: possible circular locking dependency detected
[  106.529734] 5.10.65-rt53+ #1 Tainted: G        W I      
[  106.529736] ------------------------------------------------------
[  106.529736] tuned/2760 is trying to acquire lock:
[  106.529737] ffffa0695ea18220 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0x116/0x2f0
[  106.529746] 
               but task is already holding lock:
[  106.529746] ffffffff844af760 (dev_base_lock){++.+}-{0:0}, at: netstat_show.isra.17+0x4a/0xb0
[  106.529752] 
               which lock already depends on the new lock.
...
[  106.529825] 
               other info that might help us debug this:

[  106.529826]  Possible unsafe locking scenario:

[  106.529826]        CPU0                    CPU1
[  106.529827]        ----                    ----
[  106.529827]   lock(dev_base_lock);
[  106.529828]                                lock((softirq_ctrl.lock).lock);
[  106.529829]                                lock(dev_base_lock);
[  106.529829]   lock((softirq_ctrl.lock).lock);
[  106.529830] 
                *** DEADLOCK ***

[  106.529831] 6 locks held by tuned/2760:
[  106.529832]  #0: ffffa062296a6670 (&f->f_pos_lock){+.+.}-{0:0}, at: __fdget_pos+0x4b/0x60
[  106.529836]  #1: ffffa062223ae310 (&p->lock){+.+.}-{0:0}, at: seq_read_iter+0x56/0x420
[  106.529839]  #2: ffffa06242ea1ab8 (&of->mutex){+.+.}-{0:0}, at: kernfs_seq_start+0x28/0x90
[  106.529842]  #3: ffffa06268f52008 (kn->active#285){++++}-{0:0}, at: kernfs_seq_start+0x30/0x90
[  106.529844]  #4: ffffffff844af760 (dev_base_lock){++.+}-{0:0}, at: netstat_show.isra.17+0x4a/0xb0
[  106.529848]  #5: ffffffff840c8fa0 (rcu_read_lock){....}-{1:2}, at: rt_read_lock+0x7d/0x1e0


Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Nitesh Narayan Lal <nilal@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
---
 net/core/net-sysfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index f6197774048b..7291ca052632 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -625,14 +625,14 @@ static ssize_t netstat_show(const struct device *d,
 	WARN_ON(offset > sizeof(struct rtnl_link_stats64) ||
 		offset % sizeof(u64) != 0);
 
-	read_lock(&dev_base_lock);
+	read_lock_bh(&dev_base_lock);
 	if (dev_isalive(dev)) {
 		struct rtnl_link_stats64 temp;
 		const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
 
 		ret = sprintf(buf, fmt_u64, *(u64 *)(((u8 *)stats) + offset));
 	}
-	read_unlock(&dev_base_lock);
+	read_unlock_bh(&dev_base_lock);
 	return ret;
 }
 


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
  2021-10-05  1:51 [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() Luis Claudio R. Goncalves
@ 2021-10-11 13:56 ` Sebastian Andrzej Siewior
  2021-10-11 16:36   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-11 13:56 UTC (permalink / raw)
  To: Luis Claudio R. Goncalves
  Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang

On 2021-10-04 22:51:05 [-0300], Luis Claudio R. Goncalves wrote:
> Avoid a possible circular locking dependency by taking the softirq_ctrl.lock
> before taking dev_base_lock in netstat_show(), keeping locking order
> consistent.
…
> [  106.529734] 5.10.65-rt53+ #1 Tainted: G        W I      

So looking at 5.15 there are multiple instances of that. Either _all_ of
them need to disable bh while acquiring dev_base_lock or none of them.
I've been looking at a few call chains and didn't find one which
acquired that lock from softirq. This does not mean there isn't one.

Tracing this back, I stumbled upon v2.3.6 where a few of those got their
_bh removed while new ones were added with _bh. So _maybe_ this _bh
isn't required and nobody noticed. Lockdep would yell if it would
observe dev_base_lock in softirq and noticed one instance like this one
in netstat_show() where it is missing. But it didn't happen.

So my guess is that the _bh isn't needed.

Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
  2021-10-11 13:56 ` Sebastian Andrzej Siewior
@ 2021-10-11 16:36   ` Sebastian Andrzej Siewior
  2021-10-18 15:42     ` Luis Goncalves
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-10-11 16:36 UTC (permalink / raw)
  To: Luis Claudio R. Goncalves
  Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang

On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote:
> So my guess is that the _bh isn't needed.

Correction: If there is at least _one_ reader in softirq all, writer
need to disable BH which _might_ be the case here.
The splat is probably because on read_lock holder disables BH later on
which I don't see yet.

Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
  2021-10-11 16:36   ` Sebastian Andrzej Siewior
@ 2021-10-18 15:42     ` Luis Goncalves
  2021-11-04 18:58       ` Luis Goncalves
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-10-18 15:42 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang

On Mon, Oct 11, 2021 at 1:36 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote:
> > So my guess is that the _bh isn't needed.
>
> Correction: If there is at least _one_ reader in softirq all, writer
> need to disable BH which _might_ be the case here.
> The splat is probably because on read_lock holder disables BH later on
> which I don't see yet.
>
> Sebastian

The way I was able to reproduce the problem in 5.10 (and in 5.14 before -rt11)
was:

    1. systemctl disable tuned.service
    2. Reboot to a kernel-rt with lockdep support
    3. systemctl start tuned.service

I used a few tuned profiles, but stuck with 'realtime' for obvious reasons.

If that helps in the process I can investigate which specific
action/configuration
performed by tuned is triggering the lockdep splat.

Luis


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
  2021-10-18 15:42     ` Luis Goncalves
@ 2021-11-04 18:58       ` Luis Goncalves
  2021-11-25  1:39         ` Luis Goncalves
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-11-04 18:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang

On Mon, Oct 18, 2021 at 12:42 PM Luis Goncalves <lgoncalv@redhat.com> wrote:
>
> On Mon, Oct 11, 2021 at 1:36 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote:
> > > So my guess is that the _bh isn't needed.
> >
> > Correction: If there is at least _one_ reader in softirq all, writer
> > need to disable BH which _might_ be the case here.
> > The splat is probably because on read_lock holder disables BH later on
> > which I don't see yet.
> >
> > Sebastian
>
> The way I was able to reproduce the problem in 5.10 (and in 5.14 before -rt11)
> was:
>
>     1. systemctl disable tuned.service
>     2. Reboot to a kernel-rt with lockdep support
>     3. systemctl start tuned.service
>
> I used a few tuned profiles, but stuck with 'realtime' for obvious reasons.
>
> If that helps in the process I can investigate which specific
> action/configuration
> performed by tuned is triggering the lockdep splat.

Just for the sake of completeness, this is the lockdep splat for v5.10.73-rt54:

[   41.917124] hrtimer: interrupt took 19800197 ns
[   45.105231] bnxt_en 0000:19:00.0 eno1np0: NIC Link is Up, 10000
Mbps full duplex, Flow control: ON - receive & transmit
[   45.105235] bnxt_en 0000:19:00.0 eno1np0: FEC autoneg off encoding: None

[   46.370088] ======================================================
[   46.370089] WARNING: possible circular locking dependency detected
[   46.370089] 5.10.73-rt54.lockdep+ #2 Tainted: G          I
[   46.370091] ------------------------------------------------------
[   46.370091] tuned/2248 is trying to acquire lock:
[   46.370093] ffff954d9e818220
((softirq_ctrl.lock).lock){+.+.}-{2:2}, at:
__local_bh_disable_ip+0x116/0x2f0
[   46.370104]
               but task is already holding lock:
[   46.370104] ffffffff9ccafa60 (dev_base_lock){++.+}-{0:0}, at:
netstat_show.isra.17+0x4a/0xb0
[   46.370111]
               which lock already depends on the new lock.

[   46.370111]
               the existing dependency chain (in reverse order) is:
[   46.370111]
               -> #1 (dev_base_lock){++.+}-{0:0}:
[   46.370113]        lock_acquire+0xde/0x380
[   46.370117]        rt_write_lock+0x3a/0x1b0
[   46.370122]        list_netdevice+0x4c/0x140
[   46.370125]        register_netdevice+0x59b/0x810
[   46.370127]        register_netdev+0x1a/0x30
[   46.370128]        loopback_net_init+0x48/0xa0
[   46.370130]        ops_init+0x3a/0x180
[   46.370133]        register_pernet_operations+0x126/0x220
[   46.370135]        register_pernet_device+0x25/0x60
[   46.370138]        net_dev_init+0x31b/0x391
[   46.370141]        do_one_initcall+0x74/0x440
[   46.370145]        kernel_init_freeable+0x39d/0x408
[   46.370148]        kernel_init+0xb/0x12e
[   46.370150]        ret_from_fork+0x1f/0x30
[   46.370152]
               -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}:
[   46.370153]        check_prevs_add+0x1bb/0xe50
[   46.370155]        __lock_acquire+0x1187/0x1640
[   46.370157]        lock_acquire+0xde/0x380
[   46.370158]        rt_spin_lock+0x2b/0xd0
[   46.370160]        __local_bh_disable_ip+0x116/0x2f0
[   46.370162]        tg3_get_stats64+0x29/0xb0 [tg3]
[   46.370167]        dev_get_stats+0x5b/0xc0
[   46.370168]        netstat_show.isra.17+0x5b/0xb0
[   46.370170]        dev_attr_show+0x18/0x50
[   46.370173]        sysfs_kf_seq_show+0xa1/0x130
[   46.370176]        seq_read_iter+0x19e/0x420
[   46.370178]        new_sync_read+0x121/0x1c0
[   46.370180]        vfs_read+0x185/0x1e0
[   46.370182]        ksys_read+0x5f/0xe0
[   46.370184]        do_syscall_64+0x33/0x40
[   46.370186]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   46.370188]
               other info that might help us debug this:

[   46.370188]  Possible unsafe locking scenario:

[   46.370189]        CPU0                    CPU1
[   46.370189]        ----                    ----
[   46.370189]   lock(dev_base_lock);
[   46.370190]                                lock((softirq_ctrl.lock).lock);
[   46.370191]                                lock(dev_base_lock);
[   46.370192]   lock((softirq_ctrl.lock).lock);
[   46.370193]
                *** DEADLOCK ***

[   46.370193] 6 locks held by tuned/2248:
[   46.370194]  #0: ffff954606ef7d30 (&f->f_pos_lock){+.+.}-{0:0}, at:
__fdget_pos+0x4b/0x60
[   46.370199]  #1: ffff9545d307a1f0 (&p->lock){+.+.}-{0:0}, at:
seq_read_iter+0x56/0x420
[   46.370202]  #2: ffff9545fcde6cb8 (&of->mutex){+.+.}-{0:0}, at:
kernfs_seq_start+0x28/0x90
[   46.370205]  #3: ffff953e5c12dd98 (kn->active#246){.+.+}-{0:0}, at:
kernfs_seq_start+0x30/0x90
[   46.370208]  #4: ffffffff9ccafa60 (dev_base_lock){++.+}-{0:0}, at:
netstat_show.isra.17+0x4a/0xb0
[   46.370211]  #5: ffffffff9c8c8fe0 (rcu_read_lock){....}-{1:2}, at:
rt_read_lock+0x7d/0x1e0
[   46.370214]
               stack backtrace:
[   46.370214] CPU: 27 PID: 2248 Comm: tuned Tainted: G          I
  5.10.73-rt54.lockdep+ #2
[   46.370216] Hardware name: Dell Inc. PowerEdge R740/07X9K0, BIOS
2.8.2 08/27/2020
[   46.370217] Call Trace:
[   46.370218]  dump_stack+0x77/0x97
[   46.370224]  check_noncircular+0xff/0x120
[   46.370228]  check_prevs_add+0x1bb/0xe50
[   46.370232]  __lock_acquire+0x1187/0x1640
[   46.370236]  lock_acquire+0xde/0x380
[   46.370237]  ? __local_bh_disable_ip+0x116/0x2f0
[   46.370242]  ? tg3_get_stats64+0x5/0xb0 [tg3]
[   46.370245]  rt_spin_lock+0x2b/0xd0
[   46.370246]  ? __local_bh_disable_ip+0x116/0x2f0
[   46.370249]  __local_bh_disable_ip+0x116/0x2f0
[   46.370251]  tg3_get_stats64+0x29/0xb0 [tg3]
[   46.370255]  dev_get_stats+0x5b/0xc0
[   46.370257]  netstat_show.isra.17+0x5b/0xb0
[   46.370263]  dev_attr_show+0x18/0x50
[   46.370264]  sysfs_kf_seq_show+0xa1/0x130
[   46.370266]  seq_read_iter+0x19e/0x420
[   46.370269]  new_sync_read+0x121/0x1c0
[   46.370272]  vfs_read+0x185/0x1e0
[   46.370274]  ksys_read+0x5f/0xe0
[   46.370276]  do_syscall_64+0x33/0x40
[   46.370278]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   46.370280] RIP: 0033:0x7f4ee932b9e4
[   46.370282] Code: c3 0f 1f 44 00 00 41 54 49 89 d4 55 48 89 f5 53
89 fb 48 83 ec 10 e8 7b fc ff ff 4c 89 e2 48 89 ee 89 df 41 89 c0 31
c0 0f 05 <48> 3d 00 f0 ff ff 77 38 44 89 c7 48 89 44 24 08 e8 b7 fc ff
ff 48
[   46.370283] RSP: 002b:00007f4ede603be0 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[   46.370285] RAX: ffffffffffffffda RBX: 0000000000000039 RCX: 00007f4ee932b9e4
[   46.370286] RDX: 0000000000001001 RSI: 00007f4ed801b670 RDI: 0000000000000039
[   46.370287] RBP: 00007f4ed801b670 R08: 0000000000000000 R09: 0000000000000003
[   46.370288] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000001001
[   46.370289] R13: 0000000000000039 R14: 00007f4ed801b670 R15: 000055cd6476e060
[   47.606432] DMA-API: dma_debug_entry pool grown to 131072 (200%)
[   49.474083] DMA-API: dma_debug_entry pool grown to 196608 (300%)
[   49.653708] DMA-API: dma_debug_entry pool grown to 262144 (400%)

Luis


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
  2021-11-04 18:58       ` Luis Goncalves
@ 2021-11-25  1:39         ` Luis Goncalves
  2021-11-25 10:01           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-11-25  1:39 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang

On Thu, Nov 4, 2021 at 3:58 PM Luis Goncalves <lgoncalv@redhat.com> wrote:
>
> On Mon, Oct 18, 2021 at 12:42 PM Luis Goncalves <lgoncalv@redhat.com> wrote:
> >
> > On Mon, Oct 11, 2021 at 1:36 PM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > >
> > > On 2021-10-11 15:56:09 [+0200], To Luis Claudio R. Goncalves wrote:
> > > > So my guess is that the _bh isn't needed.
> > >
> > > Correction: If there is at least _one_ reader in softirq all, writer
> > > need to disable BH which _might_ be the case here.
> > > The splat is probably because on read_lock holder disables BH later on
> > > which I don't see yet.
> > >
> > > Sebastian
> >
> > The way I was able to reproduce the problem in 5.10 (and in 5.14 before -rt11)
> > was:
> >
> >     1. systemctl disable tuned.service
> >     2. Reboot to a kernel-rt with lockdep support
> >     3. systemctl start tuned.service
> >
> > I used a few tuned profiles, but stuck with 'realtime' for obvious reasons.
> >
> > If that helps in the process I can investigate which specific
> > action/configuration
> > performed by tuned is triggering the lockdep splat.
>
> Just for the sake of completeness, this is the lockdep splat for v5.10.73-rt54:

Sebastian, I just stumbled over the lockdep splat being discussed here
in v5.15.3-rt21. Well, slightly different, but related to the very
same situation:

(btw, sorry if the formatting is off, I am using the gmail webui)


[35429.000894] ======================================================
[35429.000895] WARNING: possible circular locking dependency detected
[35429.000896] 5.15.3-rt21 #2 Not tainted
[35429.000897] ------------------------------------------------------
[35429.000898] tuned/6697 is trying to acquire lock:
[35429.000899] ffff8dbfd0cebf50 (&adapter->stats_lock){+.+.}-{2:2},
at: cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.000925]
               but task is already holding lock:
[35429.000925] ffffffff9e0ed570 (dev_base_lock){++.+}-{2:2}, at:
netstat_show.constprop.0+0x41/0xb0
[35429.000930]
               which lock already depends on the new lock.

[35429.000931]
               the existing dependency chain (in reverse order) is:
[35429.000931]
               -> #2 (dev_base_lock){++.+}-{2:2}:
[35429.000933]        __lock_acquire+0x4d2/0x930
[35429.000937]        lock_acquire.part.0+0x9c/0x210
[35429.000938]        rt_write_lock+0x48/0xc0
[35429.000941]        list_netdevice+0x4c/0x130
[35429.000943]        register_netdevice+0x4fd/0x670
[35429.000945]        register_netdev+0x1c/0x40
[35429.000959]        loopback_net_init+0x48/0x90
[35429.000962]        ops_init+0x3a/0x160
[35429.000964]        register_pernet_operations+0x127/0x220
[35429.000965]        register_pernet_device+0x26/0x60
[35429.000967]        net_dev_init+0x23c/0x2b0
[35429.000971]        do_one_initcall+0x47/0x170
[35429.000974]        do_initcalls+0xc6/0xdf
[35429.000977]        kernel_init_freeable+0xfc/0x12c
[35429.000979]        kernel_init+0x16/0x120
[35429.000982]        ret_from_fork+0x22/0x30
[35429.000984]
               -> #1 ((softirq_ctrl.lock)){+.+.}-{2:2}:
[35429.000986]        __lock_acquire+0x4d2/0x930
[35429.000988]        lock_acquire.part.0+0x9c/0x210
[35429.000989]        rt_spin_lock+0x27/0xe0
[35429.000992]        __local_bh_disable_ip+0xb8/0x190
[35429.000994]        t4_wr_mbox_meat_timeout+0xa7/0x720 [cxgb4]
[35429.001004]        t4_get_mps_bg_map+0xba/0x1b0 [cxgb4]
[35429.001028]        t4_get_port_stats+0x16/0xc50 [cxgb4]
[35429.001039]        t4_get_port_stats_offset+0x12/0x30 [cxgb4]
[35429.001050]        cxgb_get_stats+0x66/0x1d0 [cxgb4]
[35429.001059]        dev_get_stats+0x5c/0xc0
[35429.001061]        rtnl_fill_stats+0x3b/0x130
[35429.001063]        rtnl_fill_ifinfo+0x706/0x1140
[35429.001066]        rtmsg_ifinfo_build_skb+0x88/0xe0
[35429.001068]        rtmsg_ifinfo+0x25/0x60
[35429.001070]        register_netdevice+0x5f9/0x670
[35429.001072]        register_netdev+0x1c/0x40
[35429.001074]        init_one+0x8af/0xda0 [cxgb4]
[35429.001084]        local_pci_probe+0x45/0x80
[35429.001088]        work_for_cpu_fn+0x16/0x20
[35429.001090]        process_one_work+0x27a/0x450
[35429.001092]        worker_thread+0x1ea/0x3c0
[35429.001093]        kthread+0x188/0x1a0
[35429.001095]        ret_from_fork+0x22/0x30
[35429.001098]
               -> #0 (&adapter->stats_lock){+.+.}-{2:2}:
[35429.001099]        check_prev_add+0x91/0xc10
[35429.001101]        validate_chain+0x480/0x510
[35429.001102]        __lock_acquire+0x4d2/0x930
[35429.001104]        lock_acquire.part.0+0x9c/0x210
[35429.001105]        rt_spin_lock+0x27/0xe0
[35429.001107]        cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001116]        dev_get_stats+0x5c/0xc0
[35429.001118]        netstat_show.constprop.0+0x52/0xb0
[35429.001120]        dev_attr_show+0x19/0x40
[35429.001122]        sysfs_kf_seq_show+0xa9/0x100
[35429.001125]        seq_read_iter+0x128/0x4c0
[35429.001128]        new_sync_read+0x121/0x1b0
[35429.001130]        vfs_read+0x133/0x1d0
[35429.001131]        ksys_read+0x68/0xe0
[35429.001133]        do_syscall_64+0x3b/0x90
[35429.001134]        entry_SYSCALL_64_after_hwframe+0x44/0xae
[35429.001137]
               other info that might help us debug this:

[35429.001137] Chain exists of:
                 &adapter->stats_lock --> (softirq_ctrl.lock) --> dev_base_lock

[35429.001139]  Possible unsafe locking scenario:

[35429.001140]        CPU0                    CPU1
[35429.001140]        ----                    ----
[35429.001140]   lock(dev_base_lock);
[35429.001141]                                lock((softirq_ctrl.lock));
[35429.001142]                                lock(dev_base_lock);
[35429.001143]   lock(&adapter->stats_lock);
[35429.001144]
                *** DEADLOCK ***

[35429.001144] 6 locks held by tuned/6697:
[35429.001145]  #0: ffff8dba839df838 (&f->f_pos_lock){+.+.}-{3:3}, at:
__fdget_pos+0x4d/0x60
[35429.001149]  #1: ffff8dba44ef3d98 (&p->lock){+.+.}-{3:3}, at:
seq_read_iter+0x51/0x4c0
[35429.001153]  #2: ffff8dba49960a80 (&of->mutex){+.+.}-{3:3}, at:
kernfs_seq_start+0x2a/0xb0
[35429.001156]  #3: ffff8dbff8ea31e8 (kn->active#223){++++}-{0:0}, at:
kernfs_seq_start+0x32/0xb0
[35429.001159]  #4: ffffffff9e0ed570 (dev_base_lock){++.+}-{2:2}, at:
netstat_show.constprop.0+0x41/0xb0
[35429.001163]  #5: ffffffff9de61200 (rcu_read_lock){....}-{1:2}, at:
rt_read_lock+0x65/0x130
[35429.001166]
               stack backtrace:
[35429.001167] CPU: 16 PID: 6697 Comm: tuned Kdump: loaded Not tainted
5.15.3-rt21 #2
[35429.001169] Hardware name: Dell Inc. PowerEdge R520/08DM12, BIOS
2.1.2 01/20/2014
[35429.001171] Call Trace:
[35429.001172]  <TASK>
[35429.001174]  dump_stack_lvl+0x57/0x7d
[35429.001177]  check_noncircular+0xff/0x110
[35429.001182]  check_prev_add+0x91/0xc10
[35429.001184]  ? add_chain_cache+0x112/0x2d0
[35429.001186]  validate_chain+0x480/0x510
[35429.001190]  __lock_acquire+0x4d2/0x930
[35429.001193]  lock_acquire.part.0+0x9c/0x210
[35429.001195]  ? cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001205]  ? rcu_read_lock_sched_held+0x3f/0x70
[35429.001209]  ? trace_lock_acquire+0x38/0x140
[35429.001210]  ? lock_acquire+0x30/0x80
[35429.001212]  ? cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001222]  rt_spin_lock+0x27/0xe0
[35429.001225]  ? cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001234]  cxgb_get_stats+0x41/0x1d0 [cxgb4]
[35429.001245]  ? check_prev_add+0xa3/0xc10
[35429.001247]  ? check_prev_add+0xa3/0xc10
[35429.001248]  ? add_chain_cache+0x112/0x2d0
[35429.001251]  ? lockdep_unlock+0x5c/0xc0
[35429.001252]  ? validate_chain+0x29a/0x510
[35429.001255]  ? __lock_acquire+0x4d2/0x930
[35429.001258]  ? lock_acquire.part.0+0x9c/0x210
[35429.001261]  ? rcu_read_lock_sched_held+0x3f/0x70
[35429.001263]  ? trace_lock_acquire+0x38/0x140
[35429.001265]  ? lock_acquire+0x30/0x80
[35429.001266]  ? rt_read_lock+0x65/0x130
[35429.001269]  dev_get_stats+0x5c/0xc0
[35429.001272]  netstat_show.constprop.0+0x52/0xb0
[35429.001279]  dev_attr_show+0x19/0x40
[35429.001280]  sysfs_kf_seq_show+0xa9/0x100
[35429.001283]  seq_read_iter+0x128/0x4c0
[35429.001288]  new_sync_read+0x121/0x1b0
[35429.001292]  vfs_read+0x133/0x1d0
[35429.001294]  ksys_read+0x68/0xe0
[35429.001297]  do_syscall_64+0x3b/0x90
[35429.001299]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[35429.001301] RIP: 0033:0x7f03e459a9ec
[35429.001304] Code: ec 28 48 89 54 24 18 48 89 74 24 10 89 7c 24 08
e8 09 87 f8 ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 31
c0 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 48 89 44 24 08 e8 4f 87 f8
ff 48
[35429.001306] RSP: 002b:00007f03e23896a0 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[35429.001308] RAX: ffffffffffffffda RBX: 00007f03e238b5c0 RCX: 00007f03e459a9ec
[35429.001309] RDX: 0000000000001001 RSI: 00007f03dc03b5b0 RDI: 0000000000000008
[35429.001310] RBP: 0000000000001001 R08: 0000000000000000 R09: 0000000000000000
[35429.001311] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f03e021b540
[35429.001312] R13: 00007f03dc03b5b0 R14: 0000000000000008 R15: 00007f03d8001c00
[35429.001328]  </TASK>
[35430.549435] bnx2x 0000:08:00.1 enp8s0f1: using MSI-X  IRQs: sp 97
fp[0] 99 ... fp[19] 332
[35430.695688] bnx2x 0000:08:00.1 enp8s0f1: NIC Link is Up, 1000 Mbps
full duplex, Flow control: ON - receive & transmit
[35432.120806] bnx2x 0000:08:00.0 enp8s0f0: using MSI-X  IRQs: sp 61
fp[0] 63 ... fp[19] 344
[35432.425838] bnx2x 0000:08:00.0 enp8s0f0: NIC Link is Up, 1000 Mbps
full duplex, Flow control: ON - receive & transmit


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
  2021-11-25  1:39         ` Luis Goncalves
@ 2021-11-25 10:01           ` Sebastian Andrzej Siewior
  2021-11-25 11:11             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-11-25 10:01 UTC (permalink / raw)
  To: Luis Goncalves
  Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang

On 2021-11-24 22:39:34 [-0300], Luis Goncalves wrote:
> Sebastian, I just stumbled over the lockdep splat being discussed here
> in v5.15.3-rt21. Well, slightly different, but related to the very
> same situation:
…

so we have write_lock_bh(&dev_base_lock) and read_lock(&dev_base_lock)
followed by local_bh_disable() (due to spin_lock_bh(&adap->mbox_lock)).

It makes sense to only change that one invocation. Let me form a patch.

Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
  2021-11-25 10:01           ` Sebastian Andrzej Siewior
@ 2021-11-25 11:11             ` Sebastian Andrzej Siewior
  2021-11-26 12:09               ` Luis Goncalves
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-11-25 11:11 UTC (permalink / raw)
  To: Luis Goncalves
  Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang

On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote:
> 
> It makes sense to only change that one invocation. Let me form a patch.

So I made dis:

| From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001
| From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
| Date: Thu, 25 Nov 2021 11:05:17 +0100
| Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in
|  netstat_show()
| 
| The writer acquires dev_base_lock with disabled bottom halves.
| The reader can acquire dev_base_lock without disabling bottom halves
| because there is no writer in softirq context. Also readers can acquire
| the lock in task and softirq context and multiple readers are allowed.
| 
| On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
| as a lock to ensure that resources, that are protected by disabling
| bottom halves, remain protected.
| This leads to a circular locking dependency if the lock acquired with
| disabled bottom halves (as in write_lock_bh()) and somewhere else with
| enabled bottom halves (as by read_lock() in netstat_show()) followed by
| disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
| -> spin_lock_bh()). This is the reverse locking order.
| 
| In order to avoid the reverse locking order, acquire the dev_base_lock
| with disabled bottom halves.
| 
| Reported-by: Pei Zhang <pezhang@redhat.com>
| Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
| Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

and then I started locking for in-softirq reader or writer. I can't find
any. Therefore I made this for testing:

diff --git a/net/core/dev.c b/net/core/dev.c
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
 
 	ASSERT_RTNL();
 
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
 	netdev_name_node_add(net, dev->name_node);
 	hlist_add_head_rcu(&dev->index_hlist,
 			   dev_index_hash(net, dev->ifindex));
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 
 	dev_base_seq_inc(net);
 }
@@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
 	ASSERT_RTNL();
 
 	/* Unlink dev from the device chain */
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	list_del_rcu(&dev->dev_list);
 	netdev_name_node_del(dev->name_node);
 	hlist_del_rcu(&dev->index_hlist);
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 
 	dev_base_seq_inc(dev_net(dev));
 }
@@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
 
 	netdev_adjacent_rename_links(dev, oldname);
 
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	netdev_name_node_del(dev->name_node);
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 
 	synchronize_rcu();
 
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	netdev_name_node_add(net, dev->name_node);
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 
 	ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
 	ret = notifier_to_errno(ret);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
 	if (operstate == dev->operstate)
 		return;
 
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 
 	switch(dev->link_mode) {
 	case IF_LINK_MODE_TESTING:
@@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
 
 	dev->operstate = operstate;
 
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 }
 
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
 	}
 
 	if (dev->operstate != operstate) {
-		write_lock_bh(&dev_base_lock);
+		write_lock(&dev_base_lock);
 		dev->operstate = operstate;
-		write_unlock_bh(&dev_base_lock);
+		write_unlock(&dev_base_lock);
 		netdev_state_change(dev);
 	}
 }
@@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
 	if (tb[IFLA_LINKMODE]) {
 		unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
 
-		write_lock_bh(&dev_base_lock);
+		write_lock(&dev_base_lock);
 		if (dev->link_mode ^ value)
 			status |= DO_SETLINK_NOTIFY;
 		dev->link_mode = value;
-		write_unlock_bh(&dev_base_lock);
+		write_unlock(&dev_base_lock);
 	}
 
 	if (tb[IFLA_VFINFO_LIST]) {
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
 
 static void __hsr_set_operstate(struct net_device *dev, int transition)
 {
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	if (dev->operstate != transition) {
 		dev->operstate = transition;
-		write_unlock_bh(&dev_base_lock);
+		write_unlock(&dev_base_lock);
 		netdev_state_change(dev);
 	} else {
-		write_unlock_bh(&dev_base_lock);
+		write_unlock(&dev_base_lock);
 	}
 }
 

If this does not create any warnings (with or without RT) then I would
suggest to remove that _bh suffix since there is no need for it. This
should also fix the RT warning you see now.
Agreed?
 
Sebastian

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
  2021-11-25 11:11             ` Sebastian Andrzej Siewior
@ 2021-11-26 12:09               ` Luis Goncalves
  2021-11-26 13:24                 ` Luis Goncalves
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-11-26 12:09 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang

On Thu, Nov 25, 2021 at 8:11 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote:
> >
> > It makes sense to only change that one invocation. Let me form a patch.
>
> So I made dis:
>
> | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001
> | From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
> | Date: Thu, 25 Nov 2021 11:05:17 +0100
> | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in
> |  netstat_show()
> |
> | The writer acquires dev_base_lock with disabled bottom halves.
> | The reader can acquire dev_base_lock without disabling bottom halves
> | because there is no writer in softirq context. Also readers can acquire
> | the lock in task and softirq context and multiple readers are allowed.
> |
> | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> | as a lock to ensure that resources, that are protected by disabling
> | bottom halves, remain protected.
> | This leads to a circular locking dependency if the lock acquired with
> | disabled bottom halves (as in write_lock_bh()) and somewhere else with
> | enabled bottom halves (as by read_lock() in netstat_show()) followed by
> | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
> | -> spin_lock_bh()). This is the reverse locking order.
> |
> | In order to avoid the reverse locking order, acquire the dev_base_lock
> | with disabled bottom halves.
> |
> | Reported-by: Pei Zhang <pezhang@redhat.com>
> | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> and then I started locking for in-softirq reader or writer. I can't find
> any. Therefore I made this for testing:
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
>
>         ASSERT_RTNL();
>
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
>         netdev_name_node_add(net, dev->name_node);
>         hlist_add_head_rcu(&dev->index_hlist,
>                            dev_index_hash(net, dev->ifindex));
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>
>         dev_base_seq_inc(net);
>  }
> @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
>         ASSERT_RTNL();
>
>         /* Unlink dev from the device chain */
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         list_del_rcu(&dev->dev_list);
>         netdev_name_node_del(dev->name_node);
>         hlist_del_rcu(&dev->index_hlist);
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>
>         dev_base_seq_inc(dev_net(dev));
>  }
> @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
>
>         netdev_adjacent_rename_links(dev, oldname);
>
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         netdev_name_node_del(dev->name_node);
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>
>         synchronize_rcu();
>
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         netdev_name_node_add(net, dev->name_node);
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>
>         ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
>         ret = notifier_to_errno(ret);
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
>         if (operstate == dev->operstate)
>                 return;
>
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>
>         switch(dev->link_mode) {
>         case IF_LINK_MODE_TESTING:
> @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
>
>         dev->operstate = operstate;
>
> -       write_unlock_bh(&dev_base_lock);
> +       write_unlock(&dev_base_lock);
>  }
>
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
>         }
>
>         if (dev->operstate != operstate) {
> -               write_lock_bh(&dev_base_lock);
> +               write_lock(&dev_base_lock);
>                 dev->operstate = operstate;
> -               write_unlock_bh(&dev_base_lock);
> +               write_unlock(&dev_base_lock);
>                 netdev_state_change(dev);
>         }
>  }
> @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
>         if (tb[IFLA_LINKMODE]) {
>                 unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
>
> -               write_lock_bh(&dev_base_lock);
> +               write_lock(&dev_base_lock);
>                 if (dev->link_mode ^ value)
>                         status |= DO_SETLINK_NOTIFY;
>                 dev->link_mode = value;
> -               write_unlock_bh(&dev_base_lock);
> +               write_unlock(&dev_base_lock);
>         }
>
>         if (tb[IFLA_VFINFO_LIST]) {
> diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
> --- a/net/hsr/hsr_device.c
> +++ b/net/hsr/hsr_device.c
> @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
>
>  static void __hsr_set_operstate(struct net_device *dev, int transition)
>  {
> -       write_lock_bh(&dev_base_lock);
> +       write_lock(&dev_base_lock);
>         if (dev->operstate != transition) {
>                 dev->operstate = transition;
> -               write_unlock_bh(&dev_base_lock);
> +               write_unlock(&dev_base_lock);
>                 netdev_state_change(dev);
>         } else {
> -               write_unlock_bh(&dev_base_lock);
> +               write_unlock(&dev_base_lock);
>         }
>  }
>
>
> If this does not create any warnings (with or without RT) then I would
> suggest to remove that _bh suffix since there is no need for it. This
> should also fix the RT warning you see now.
> Agreed?

Sounds like a plan!

I will start testing within the hour and report soon.

Thanks,
Luis

> Sebastian
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show()
  2021-11-26 12:09               ` Luis Goncalves
@ 2021-11-26 13:24                 ` Luis Goncalves
  2021-11-26 16:15                   ` [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves Sebastian Andrzej Siewior
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Goncalves @ 2021-11-26 13:24 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-rt-users, Steven Rostedt, stable-rt, Nitesh Narayan Lal, Pei Zhang

On Fri, Nov 26, 2021 at 9:09 AM Luis Goncalves <lgoncalv@redhat.com> wrote:
>
> On Thu, Nov 25, 2021 at 8:11 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2021-11-25 11:01:10 [+0100], To Luis Goncalves wrote:
> > >
> > > It makes sense to only change that one invocation. Let me form a patch.
> >
> > So I made dis:
> >
> > | From aaef7596167b3b38a81567af81aaa2f9559c91df Mon Sep 17 00:00:00 2001
> > | From: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
> > | Date: Thu, 25 Nov 2021 11:05:17 +0100
> > | Subject: [PATCH] net-sysfs: Acquire dev_base_lock disabled BH in
> > |  netstat_show()
> > |
> > | The writer acquires dev_base_lock with disabled bottom halves.
> > | The reader can acquire dev_base_lock without disabling bottom halves
> > | because there is no writer in softirq context. Also readers can acquire
> > | the lock in task and softirq context and multiple readers are allowed.
> > |
> > | On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> > | as a lock to ensure that resources, that are protected by disabling
> > | bottom halves, remain protected.
> > | This leads to a circular locking dependency if the lock acquired with
> > | disabled bottom halves (as in write_lock_bh()) and somewhere else with
> > | enabled bottom halves (as by read_lock() in netstat_show()) followed by
> > | disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
> > | -> spin_lock_bh()). This is the reverse locking order.
> > |
> > | In order to avoid the reverse locking order, acquire the dev_base_lock
> > | with disabled bottom halves.
> > |
> > | Reported-by: Pei Zhang <pezhang@redhat.com>
> > | Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
> > | Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> >
> > and then I started locking for in-softirq reader or writer. I can't find
> > any. Therefore I made this for testing:
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
> >
> >         ASSERT_RTNL();
> >
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
> >         netdev_name_node_add(net, dev->name_node);
> >         hlist_add_head_rcu(&dev->index_hlist,
> >                            dev_index_hash(net, dev->ifindex));
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >
> >         dev_base_seq_inc(net);
> >  }
> > @@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
> >         ASSERT_RTNL();
> >
> >         /* Unlink dev from the device chain */
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         list_del_rcu(&dev->dev_list);
> >         netdev_name_node_del(dev->name_node);
> >         hlist_del_rcu(&dev->index_hlist);
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >
> >         dev_base_seq_inc(dev_net(dev));
> >  }
> > @@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
> >
> >         netdev_adjacent_rename_links(dev, oldname);
> >
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         netdev_name_node_del(dev->name_node);
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >
> >         synchronize_rcu();
> >
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         netdev_name_node_add(net, dev->name_node);
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >
> >         ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
> >         ret = notifier_to_errno(ret);
> > diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> > --- a/net/core/link_watch.c
> > +++ b/net/core/link_watch.c
> > @@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
> >         if (operstate == dev->operstate)
> >                 return;
> >
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >
> >         switch(dev->link_mode) {
> >         case IF_LINK_MODE_TESTING:
> > @@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
> >
> >         dev->operstate = operstate;
> >
> > -       write_unlock_bh(&dev_base_lock);
> > +       write_unlock(&dev_base_lock);
> >  }
> >
> >
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
> >         }
> >
> >         if (dev->operstate != operstate) {
> > -               write_lock_bh(&dev_base_lock);
> > +               write_lock(&dev_base_lock);
> >                 dev->operstate = operstate;
> > -               write_unlock_bh(&dev_base_lock);
> > +               write_unlock(&dev_base_lock);
> >                 netdev_state_change(dev);
> >         }
> >  }
> > @@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
> >         if (tb[IFLA_LINKMODE]) {
> >                 unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
> >
> > -               write_lock_bh(&dev_base_lock);
> > +               write_lock(&dev_base_lock);
> >                 if (dev->link_mode ^ value)
> >                         status |= DO_SETLINK_NOTIFY;
> >                 dev->link_mode = value;
> > -               write_unlock_bh(&dev_base_lock);
> > +               write_unlock(&dev_base_lock);
> >         }
> >
> >         if (tb[IFLA_VFINFO_LIST]) {
> > diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
> > --- a/net/hsr/hsr_device.c
> > +++ b/net/hsr/hsr_device.c
> > @@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
> >
> >  static void __hsr_set_operstate(struct net_device *dev, int transition)
> >  {
> > -       write_lock_bh(&dev_base_lock);
> > +       write_lock(&dev_base_lock);
> >         if (dev->operstate != transition) {
> >                 dev->operstate = transition;
> > -               write_unlock_bh(&dev_base_lock);
> > +               write_unlock(&dev_base_lock);
> >                 netdev_state_change(dev);
> >         } else {
> > -               write_unlock_bh(&dev_base_lock);
> > +               write_unlock(&dev_base_lock);
> >         }
> >  }
> >
> >
> > If this does not create any warnings (with or without RT) then I would
> > suggest to remove that _bh suffix since there is no need for it. This
> > should also fix the RT warning you see now.
> > Agreed?
>
> Sounds like a plan!
>
> I will start testing within the hour and report soon.

Using v5.15.3-rt21 + your patch, I rebooted multiple times the two
boxes where I had observed the netstat_show() lockdep complaint
before. The kernel booted fine every single time,  no netstat_show()
lockdep complaints at all.

I only tested a kernel with PREEMPT_RT enabled. Do you want me to test
the vanilla kernel with your patch?

Luis


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves.
  2021-11-26 13:24                 ` Luis Goncalves
@ 2021-11-26 16:15                   ` Sebastian Andrzej Siewior
  2021-11-29 12:20                     ` patchwork-bot+netdevbpf
  0 siblings, 1 reply; 12+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-11-26 16:15 UTC (permalink / raw)
  To: netdev
  Cc: Luis Goncalves, Steven Rostedt, Nitesh Narayan Lal, Pei Zhang,
	David S. Miller, Jakub Kicinski, Thomas Gleixner

The writer acquires dev_base_lock with disabled bottom halves.
The reader can acquire dev_base_lock without disabling bottom halves
because there is no writer in softirq context.

On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves (as in write_lock_bh()) and somewhere else with
enabled bottom halves (as by read_lock() in netstat_show()) followed by
disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
-> spin_lock_bh()). This is the reverse locking order.

All read_lock() invocation are from sysfs callback which are not invoked
from softirq context. Therefore there is no need to disable bottom
halves while acquiring a write lock.

Acquire the write lock of dev_base_lock without disabling bottom halves.

Reported-by: Pei Zhang <pezhang@redhat.com>
Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 net/core/dev.c        | 16 ++++++++--------
 net/core/link_watch.c |  4 ++--
 net/core/rtnetlink.c  |  8 ++++----
 net/hsr/hsr_device.c  |  6 +++---
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8b5cf8ad859b5..ad3cccbfa573b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -371,12 +371,12 @@ static void list_netdevice(struct net_device *dev)
 
 	ASSERT_RTNL();
 
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	list_add_tail_rcu(&dev->dev_list, &net->dev_base_head);
 	netdev_name_node_add(net, dev->name_node);
 	hlist_add_head_rcu(&dev->index_hlist,
 			   dev_index_hash(net, dev->ifindex));
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 
 	dev_base_seq_inc(net);
 }
@@ -389,11 +389,11 @@ static void unlist_netdevice(struct net_device *dev)
 	ASSERT_RTNL();
 
 	/* Unlink dev from the device chain */
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	list_del_rcu(&dev->dev_list);
 	netdev_name_node_del(dev->name_node);
 	hlist_del_rcu(&dev->index_hlist);
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 
 	dev_base_seq_inc(dev_net(dev));
 }
@@ -1272,15 +1272,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
 
 	netdev_adjacent_rename_links(dev, oldname);
 
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	netdev_name_node_del(dev->name_node);
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 
 	synchronize_rcu();
 
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	netdev_name_node_add(net, dev->name_node);
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 
 	ret = call_netdevice_notifiers(NETDEV_CHANGENAME, dev);
 	ret = notifier_to_errno(ret);
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 1a455847da54f..9599afd0862da 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -55,7 +55,7 @@ static void rfc2863_policy(struct net_device *dev)
 	if (operstate == dev->operstate)
 		return;
 
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 
 	switch(dev->link_mode) {
 	case IF_LINK_MODE_TESTING:
@@ -74,7 +74,7 @@ static void rfc2863_policy(struct net_device *dev)
 
 	dev->operstate = operstate;
 
-	write_unlock_bh(&dev_base_lock);
+	write_unlock(&dev_base_lock);
 }
 
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2af8aeeadadf0..716be2f88cd75 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -842,9 +842,9 @@ static void set_operstate(struct net_device *dev, unsigned char transition)
 	}
 
 	if (dev->operstate != operstate) {
-		write_lock_bh(&dev_base_lock);
+		write_lock(&dev_base_lock);
 		dev->operstate = operstate;
-		write_unlock_bh(&dev_base_lock);
+		write_unlock(&dev_base_lock);
 		netdev_state_change(dev);
 	}
 }
@@ -2779,11 +2779,11 @@ static int do_setlink(const struct sk_buff *skb,
 	if (tb[IFLA_LINKMODE]) {
 		unsigned char value = nla_get_u8(tb[IFLA_LINKMODE]);
 
-		write_lock_bh(&dev_base_lock);
+		write_lock(&dev_base_lock);
 		if (dev->link_mode ^ value)
 			status |= DO_SETLINK_NOTIFY;
 		dev->link_mode = value;
-		write_unlock_bh(&dev_base_lock);
+		write_unlock(&dev_base_lock);
 	}
 
 	if (tb[IFLA_VFINFO_LIST]) {
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
index 737e4f17e1c6d..e57fdad9ef942 100644
--- a/net/hsr/hsr_device.c
+++ b/net/hsr/hsr_device.c
@@ -30,13 +30,13 @@ static bool is_slave_up(struct net_device *dev)
 
 static void __hsr_set_operstate(struct net_device *dev, int transition)
 {
-	write_lock_bh(&dev_base_lock);
+	write_lock(&dev_base_lock);
 	if (dev->operstate != transition) {
 		dev->operstate = transition;
-		write_unlock_bh(&dev_base_lock);
+		write_unlock(&dev_base_lock);
 		netdev_state_change(dev);
 	} else {
-		write_unlock_bh(&dev_base_lock);
+		write_unlock(&dev_base_lock);
 	}
 }
 
-- 
2.34.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves.
  2021-11-26 16:15                   ` [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves Sebastian Andrzej Siewior
@ 2021-11-29 12:20                     ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 12+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-11-29 12:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, lgoncalv, rostedt, nilal, pezhang, davem, kuba, tglx

Hello:

This patch was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Fri, 26 Nov 2021 17:15:29 +0100 you wrote:
> The writer acquires dev_base_lock with disabled bottom halves.
> The reader can acquire dev_base_lock without disabling bottom halves
> because there is no writer in softirq context.
> 
> On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
> as a lock to ensure that resources, that are protected by disabling
> bottom halves, remain protected.
> This leads to a circular locking dependency if the lock acquired with
> disabled bottom halves (as in write_lock_bh()) and somewhere else with
> enabled bottom halves (as by read_lock() in netstat_show()) followed by
> disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
> -> spin_lock_bh()). This is the reverse locking order.
> 
> [...]

Here is the summary with links:
  - [net-next] net: Write lock dev_base_lock without disabling bottom halves.
    https://git.kernel.org/netdev/net-next/c/fd888e85fe6b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-11-29 13:38 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-05  1:51 [RFC RT v5.10] [rt] repair usage of dev_base_lock in netstat_show() Luis Claudio R. Goncalves
2021-10-11 13:56 ` Sebastian Andrzej Siewior
2021-10-11 16:36   ` Sebastian Andrzej Siewior
2021-10-18 15:42     ` Luis Goncalves
2021-11-04 18:58       ` Luis Goncalves
2021-11-25  1:39         ` Luis Goncalves
2021-11-25 10:01           ` Sebastian Andrzej Siewior
2021-11-25 11:11             ` Sebastian Andrzej Siewior
2021-11-26 12:09               ` Luis Goncalves
2021-11-26 13:24                 ` Luis Goncalves
2021-11-26 16:15                   ` [PATCH net-next] net: Write lock dev_base_lock without disabling bottom halves Sebastian Andrzej Siewior
2021-11-29 12:20                     ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.