On Wed, 2019-07-31 at 11:38 +0300, Leon Romanovsky wrote: > From: Leon Romanovsky > > The below kernel panic was observed when created bond mode LACP > with GRE tunnel on top. The reason to it was not released spinlock > during mlx5 notify unregsiter sequence. > > [ 234.562007] BUG: scheduling while atomic: sh/10900/0x00000002 > [ 234.563005] Preemption disabled at: > [ 234.566864] ------------[ cut here ]------------ > [ 234.567120] DEBUG_LOCKS_WARN_ON(val > preempt_count()) > [ 234.567139] WARNING: CPU: 16 PID: 10900 at kernel/sched/core.c:3203 > preempt_count_sub+0xca/0x170 > [ 234.569550] CPU: 16 PID: 10900 Comm: sh Tainted: G W 5.2.0- > rc1-for-linust-dbg-2019-05-25_04-57-33-60 #1 > [ 234.569886] Hardware name: Dell Inc. PowerEdge R720/0X3D66, BIOS > 2.6.1 02/12/2018 > [ 234.570183] RIP: 0010:preempt_count_sub+0xca/0x170 > [ 234.570404] Code: 03 38 > d0 7c 08 84 d2 0f 85 b0 00 00 00 8b 15 dd 02 03 04 85 d2 75 ba 48 c7 > c6 > 00 e1 88 83 48 c7 c7 40 e1 88 83 e8 76 11 f7 ff <0f> 0b 5b c3 65 8b 05 > d3 1f d8 7e 84 c0 75 82 e8 62 c3 c3 00 85 c0 > [ 234.570911] RSP: 0018:ffff888b94477b08 EFLAGS: 00010286 > [ 234.571133] RAX: 0000000000000000 RBX: 0000000000000001 RCX: > 0000000000000000 > [ 234.571391] RDX: 0000000000000000 RSI: 0000000000000004 RDI: > 0000000000000246 > [ 234.571648] RBP: ffff888ba5560000 R08: fffffbfff08962d5 R09: > fffffbfff08962d5 > [ 234.571902] R10: 0000000000000001 R11: fffffbfff08962d4 R12: > ffff888bac6e9548 > [ 234.572157] R13: ffff888babfaf728 R14: ffff888bac6e9568 R15: > ffff888babfaf750 > [ 234.572412] FS: 00007fcafa59b740(0000) GS:ffff888bed200000(0000) > knlGS:0000000000000000 > [ 234.572686] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 234.572914] CR2: 00007f984f16b140 CR3: 0000000b2bf0a001 CR4: > 00000000001606e0 > [ 234.573172] Call Trace: > [ 234.573336] _raw_spin_unlock+0x2e/0x50 > [ 234.573542] mlx5_ib_unbind_slave_port+0x1bc/0x690 [mlx5_ib] > [ 234.573793] mlx5_ib_cleanup_multiport_master+0x1d3/0x660 [mlx5_ib] > [ 234.574039] mlx5_ib_stage_init_cleanup+0x4c/0x360 [mlx5_ib] > [ 234.574271] ? kfree+0xf5/0x2f0 > [ 234.574465] __mlx5_ib_remove+0x61/0xd0 [mlx5_ib] > [ 234.574688] ? __mlx5_ib_remove+0xd0/0xd0 [mlx5_ib] > [ 234.574951] mlx5_remove_device+0x234/0x300 [mlx5_core] > [ 234.575224] mlx5_unregister_device+0x4d/0x1e0 [mlx5_core] > [ 234.575493] remove_one+0x4f/0x160 [mlx5_core] > [ 234.575704] pci_device_remove+0xef/0x2a0 > [ 234.581407] ? pcibios_free_irq+0x10/0x10 > [ 234.587143] ? up_read+0xc1/0x260 > [ 234.592785] device_release_driver_internal+0x1ab/0x430 > [ 234.598442] unbind_store+0x152/0x200 > [ 234.604064] ? sysfs_kf_write+0x3b/0x180 > [ 234.609441] ? sysfs_file_ops+0x160/0x160 > [ 234.615021] kernfs_fop_write+0x277/0x440 > [ 234.620288] ? __sb_start_write+0x1ef/0x2c0 > [ 234.625512] vfs_write+0x15e/0x460 > [ 234.630786] ksys_write+0x156/0x1e0 > [ 234.635988] ? __ia32_sys_read+0xb0/0xb0 > [ 234.641120] ? trace_hardirqs_off_thunk+0x1a/0x1c > [ 234.646163] do_syscall_64+0x95/0x470 > [ 234.651106] entry_SYSCALL_64_after_hwframe+0x49/0xbe > [ 234.656004] RIP: 0033:0x7fcaf9c9cfd0 > [ 234.660686] Code: 73 01 > c3 48 8b 0d c0 6e 2d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 > 00 > 83 3d cd cf 2d 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 > 31 c3 48 83 ec 08 e8 ee cb 01 00 48 89 04 24 > [ 234.670128] RSP: 002b:00007ffd3b01ddd8 EFLAGS: 00000246 ORIG_RAX: > 0000000000000001 > [ 234.674811] RAX: ffffffffffffffda RBX: 000000000000000d RCX: > 00007fcaf9c9cfd0 > [ 234.679387] RDX: 000000000000000d RSI: 00007fcafa5c1000 RDI: > 0000000000000001 > [ 234.683848] RBP: 00007fcafa5c1000 R08: 000000000000000a R09: > 00007fcafa59b740 > [ 234.688167] R10: 00007ffd3b01d8e0 R11: 0000000000000246 R12: > 00007fcaf9f75400 > [ 234.692386] R13: 000000000000000d R14: 0000000000000001 R15: > 0000000000000000 > [ 234.696495] irq event stamp: 153067 > [ 234.700525] hardirqs last enabled at (153067): [] > _raw_spin_unlock_irqrestore+0x59/0x70 > [ 234.704665] hardirqs last disabled at (153066): > [] _raw_spin_lock_irqsave+0x22/0x90 > [ 234.708722] softirqs last enabled at (153058): [] > __do_softirq+0x6c5/0xb4e > [ 234.712673] softirqs last disabled at (153051): > [] irq_exit+0x17d/0x1d0 > [ 234.716601] ---[ end trace 5dbf096843ee9ce6 ]--- > > Fixes: df097a278c75 ("IB/mlx5: Use the new mlx5 core notifier API") > Signed-off-by: Leon Romanovsky > --- > drivers/infiniband/hw/mlx5/main.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/drivers/infiniband/hw/mlx5/main.c > b/drivers/infiniband/hw/mlx5/main.c > index c2a5780cb394..e12a4404096b 100644 > --- a/drivers/infiniband/hw/mlx5/main.c > +++ b/drivers/infiniband/hw/mlx5/main.c > @@ -5802,13 +5802,12 @@ static void mlx5_ib_unbind_slave_port(struct > mlx5_ib_dev *ibdev, > return; > } > > - if (mpi->mdev_events.notifier_call) > - mlx5_notifier_unregister(mpi->mdev, &mpi->mdev_events); > - mpi->mdev_events.notifier_call = NULL; > - > mpi->ibdev = NULL; > > spin_unlock(&port->mp.mpi_lock); > + if (mpi->mdev_events.notifier_call) > + mlx5_notifier_unregister(mpi->mdev, &mpi->mdev_events); > + mpi->mdev_events.notifier_call = NULL; I can see where this fixes the problem at hand, but this gives the appearance of creating a new race. Doing a check/unregister/set-null series outside of any locks is a red flag to someone investigating the code. You should at least make note of the fact that calling unregister more than once is safe. If you're fine with it, I can add a comment and take the patch, or you can resubmit. > mlx5_remove_netdev_notifier(ibdev, port_num); > spin_lock(&port->mp.mpi_lock); > -- Doug Ledford GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD