* PROVE_RCU_LIST + /proc/lockdep warning @ 2020-03-06 4:06 Qian Cai 2020-03-07 3:33 ` Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" Qian Cai 2020-03-07 17:16 ` PROVE_RCU_LIST + /proc/lockdep warning Joel Fernandes 0 siblings, 2 replies; 13+ messages in thread From: Qian Cai @ 2020-03-06 4:06 UTC (permalink / raw) To: Madhuparna Bhowmik Cc: Joel Fernandes (Google), Paul E. McKenney, rcu, LKML, Peter Zijlstra, Will Deacon, Waiman Long Since the linux-next commit c9af03c14bfd (“Default enable RCU list lockdep debugging with PROVE_RCU”), reading /proc/lockdep triggers the warning below with this config. Reverting the commit fixed the issue right away. https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config [26405.676199][ T3548] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) [26405.676239][ T3548] WARNING: CPU: 11 PID: 3548 at kernel/locking/lockdep.c:4637 check_flags.part.28+0x218/0x220 [26405.756287][ T3548] Modules linked in: kvm_intel nls_iso8859_1 nls_cp437 kvm vfat fat irqbypass intel_cstate intel_uncore intel_rapl_perf dax_pmem dax_pmem_core efivars ip_tables x_tables xfs sd_mod bnx2x hpsa mdio scsi_transport_sas firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs [26405.881899][ T3548] CPU: 11 PID: 3548 Comm: cat Not tainted 5.6.0-rc4-next-20200305+ #8 [26405.920091][ T3548] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 [26405.955370][ T3548] RIP: 0010:check_flags.part.28+0x218/0x220 [26405.983016][ T3548] Code: 13 8a e8 2b 3f 29 00 44 8b 15 84 df ba 01 45 85 d2 0f 85 c7 94 00 00 48 c7 c6 40 2b 47 89 48 c7 c7 40 04 47 89 e8 49 e3 f3 ff <0f> 0b e9 ad 94 00 00 90 55 48 89 e5 41 57 4d 89 cf 41 56 45 89 c6 [26406.076147][ T3548] RSP: 0018:ffffc9000695f848 EFLAGS: 00010086 [26406.104215][ T3548] RAX: 0000000000000000 RBX: ffff888fe6184040 RCX: ffffffff8858cecf [26406.140856][ T3548] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: 0000000000000000 [26406.178465][ T3548] RBP: ffffc9000695f850 R08: 
fffffbfff1377355 R09: fffffbfff1377355 [26406.217995][ T3548] R10: ffffffff89bb9aa3 R11: fffffbfff1377354 R12: 0000000000000000 [26406.256760][ T3548] R13: ffffffff8aa55ee0 R14: 0000000000000046 R15: ffffffff8aa55ec0 [26406.293708][ T3548] FS: 00007f58cf3a3540(0000) GS:ffff88905fa80000(0000) knlGS:0000000000000000 [26406.335252][ T3548] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [26406.366331][ T3548] CR2: 00007f58cf326000 CR3: 0000000f1ba38006 CR4: 00000000001606e0 [26406.402408][ T3548] Call Trace: [26406.416739][ T3548] lock_is_held_type+0x5d/0x150 [26406.438262][ T3548] ? rcu_lockdep_current_cpu_online+0x64/0x80 [26406.466463][ T3548] rcu_read_lock_any_held+0xac/0x100 [26406.490105][ T3548] ? rcu_read_lock_held+0xc0/0xc0 [26406.513258][ T3548] ? __slab_free+0x421/0x540 [26406.535012][ T3548] ? kasan_kmalloc+0x9/0x10 [26406.555901][ T3548] ? __kmalloc_node+0x1d7/0x320 [26406.578668][ T3548] ? kvmalloc_node+0x6f/0x80 [26406.599872][ T3548] __bfs+0x28a/0x3c0 [26406.617075][ T3548] ? class_equal+0x30/0x30 [26406.637524][ T3548] lockdep_count_forward_deps+0x11a/0x1a0 [26406.664134][ T3548] ? check_noncircular+0x2e0/0x2e0 [26406.688191][ T3548] ? __kasan_check_read+0x11/0x20 [26406.713581][ T3548] ? check_chain_key+0x1df/0x2e0 [26406.738044][ T3548] ? seq_vprintf+0x4e/0xb0 [26406.758241][ T3548] ? seq_printf+0x9b/0xd0 [26406.778169][ T3548] ? seq_vprintf+0xb0/0xb0 [26406.798172][ T3548] l_show+0x1c4/0x380 [26406.816474][ T3548] ? print_name+0xb0/0xb0 [26406.836393][ T3548] seq_read+0x56b/0x750 [26406.855346][ T3548] proc_reg_read+0x1b4/0x200 [26406.876737][ T3548] ? proc_reg_unlocked_ioctl+0x1e0/0x1e0 [26406.903030][ T3548] ? check_chain_key+0x1df/0x2e0 [26406.926531][ T3548] ? find_held_lock+0xca/0xf0 [26406.948291][ T3548] __vfs_read+0x50/0xa0 [26406.967391][ T3548] vfs_read+0xcb/0x1e0 [26406.986102][ T3548] ksys_read+0xc6/0x160 [26407.005405][ T3548] ? kernel_write+0xc0/0xc0 [26407.026076][ T3548] ? do_syscall_64+0x79/0xaec [26407.047448][ T3548] ? 
do_syscall_64+0x79/0xaec [26407.068650][ T3548] __x64_sys_read+0x43/0x50 [26407.089132][ T3548] do_syscall_64+0xcc/0xaec [26407.109939][ T3548] ? trace_hardirqs_on_thunk+0x1a/0x1c [26407.134924][ T3548] ? syscall_return_slowpath+0x580/0x580 [26407.160854][ T3548] ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe [26407.188943][ T3548] ? trace_hardirqs_off_caller+0x3a/0x150 [26407.216692][ T3548] ? trace_hardirqs_off_thunk+0x1a/0x1c [26407.243534][ T3548] entry_SYSCALL_64_after_hwframe+0x49/0xbe [26407.272720][ T3548] RIP: 0033:0x7f58ceeafd75 [26407.293162][ T3548] Code: fe ff ff 50 48 8d 3d 4a dc 09 00 e8 25 0e 02 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 59 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 [26407.386043][ T3548] RSP: 002b:00007ffc115111a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [26407.425283][ T3548] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f58ceeafd75 [26407.462717][ T3548] RDX: 0000000000020000 RSI: 00007f58cf327000 RDI: 0000000000000003 [26407.500428][ T3548] RBP: 00007f58cf327000 R08: 00000000ffffffff R09: 0000000000000000 [26407.538473][ T3548] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f58cf327000 [26407.575743][ T3548] R13: 0000000000000003 R14: 0000000000000fff R15: 0000000000020000 [26407.613112][ T3548] irq event stamp: 7161 [26407.632089][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 [26407.680094][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 [26407.727273][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 [26407.774000][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 [26407.814407][ T3548] ---[ end trace 1026d00df66af83e ]--- [26407.839742][ T3548] possible reason: unannotated irqs-off. 
[26407.866243][ T3548] irq event stamp: 7161 [26407.885407][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 [26407.933602][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 [26407.980432][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 [26408.022826][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 On a side note, it is likely hitting another bug in next-20200305 (no such problem on 0304) where the system gets stuck during boot, but the reverting does not help there. Rebooting a few times can get past it. [ 0.013514][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 [ 0.013514][ C0] Modules linked in: [ 0.013514][ C0] irq event stamp: 64186318 [ 0.013514][ C0] hardirqs last enabled at (64186317): [<ffffffff84c9b107>] _raw_spin_unlock_irq+0x27/0x40 [ 0.013514][ C0] hardirqs last disabled at (64186318): [<ffffffff84c8f384>] __schedule+0x214/0x1070 [ 0.013514][ C0] softirqs last enabled at (267904): [<ffffffff85000447>] __do_softirq+0x447/0x766 [ 0.013514][ C0] softirqs last disabled at (267897): [<ffffffff842d1f16>] irq_exit+0xd6/0xf0 [ 0.013514][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305+ #6 [ 0.013514][ C0] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 [ 0.013514][ C0] RIP: 0010:lock_is_held_type+0x12a/0x150 [ 0.013514][ C0] Code: 41 0f 94 c4 65 48 8b 1c 25 40 0f 02 00 48 8d bb 74 08 00 00 e8 77 c0 28 00 c7 83 74 08 00 00 00 00 00 00 41 56 9d 48 83 c4 18 <44> 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 45 31 e4 eb c7 41 bc 01 [ 0.013514][ C0] RSP: 0000:ffffc9000628f9f8 EFLAGS: 00000082 [ 0.013514][ C0] RAX: 0000000000000000 RBX: ffff889880efc040 RCX: ffffffff8438b449 [ 0.013514][ C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff889880efc8b4 [ 0.013514][ C0] RBP: ffffc9000628fa20 R08: ffffed1108588a24 R09: ffffed1108588a24 [ 0.013514][ C0] R10: ffff888842c4511b 
R11: 0000000000000000 R12: 0000000000000000 [ 0.013514][ C0] R13: ffff889880efc908 R14: 0000000000000046 R15: 0000000000000003 [ 0.013514][ C0] FS: 0000000000000000(0000) GS:ffff888842c00000(0000) knlGS:0000000000000000 [ 0.013514][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.013514][ C0] CR2: ffff88a0707ff000 CR3: 0000000b72012001 CR4: 00000000001606f0 [ 0.013514][ C0] Call Trace: [ 0.013514][ C0] rcu_read_lock_sched_held+0xac/0xe0 lock_is_held at include/linux/lockdep.h:361 (inlined by) rcu_read_lock_sched_held at kernel/rcu/update.c:121 [ 0.013514][ C0] ? rcu_read_lock_bh_held+0xc0/0xc0 [ 0.013514][ C0] rcu_note_context_switch+0x186/0x3b0 [ 0.013514][ C0] __schedule+0x21f/0x1070 [ 0.013514][ C0] ? __sched_text_start+0x8/0x8 [ 0.013514][ C0] schedule+0x95/0x160 [ 0.013514][ C0] do_boot_cpu+0x58c/0xaf0 [ 0.013514][ C0] native_cpu_up+0x298/0x430 [ 0.013514][ C0] ? common_cpu_up+0x150/0x150 [ 0.013514][ C0] bringup_cpu+0x44/0x310 [ 0.013514][ C0] ? timers_prepare_cpu+0x114/0x190 [ 0.013514][ C0] ? takedown_cpu+0x2e0/0x2e0 [ 0.013514][ C0] cpuhp_invoke_callback+0x197/0x1120 [ 0.013514][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 [ 0.013514][ C0] _cpu_up+0x171/0x280 [ 0.013514][ C0] do_cpu_up+0xb1/0x120 [ 0.013514][ C0] cpu_up+0x13/0x20 [ 0.013514][ C0] smp_init+0x91/0x118 [ 0.013514][ C0] kernel_init_freeable+0x221/0x4f8 [ 0.013514][ C0] ? mark_held_locks+0x34/0xb0 [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 [ 0.013514][ C0] ? start_kernel+0x876/0x876 [ 0.013514][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 [ 0.013514][ C0] ? rest_init+0x307/0x307 [ 0.013514][ C0] kernel_init+0x [ 0.013514][ C0] ? rest_init+0x307/0x307 [ 0.013514][ C0] ret_from_fork+0x3a/0x50 ^ permalink raw reply [flat|nested] 13+ messages in thread
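[For context on what the commit under discussion enables: with CONFIG_PROVE_RCU_LIST, RCU list traversals such as list_for_each_entry_rcu() accept an optional lockdep condition and are flagged unless either that condition holds or the caller is inside some flavor of RCU read-side critical section. The following is a minimal userspace sketch of that decision only; the function and struct names are illustrative, not the kernel's actual implementation.]

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the CONFIG_PROVE_RCU_LIST legality check.
 * In the kernel, list_for_each_entry_rcu(pos, head, member, cond)
 * is legal when `cond` (a caller-supplied lockdep expression such
 * as lockdep_is_held(&mylock)) is true, or when the caller is in
 * an RCU, RCU-bh, or RCU-sched read-side critical section.
 * All names below are illustrative stand-ins. */

struct rcu_ctx {
    bool rcu_read_lock_held;       /* inside rcu_read_lock()?      */
    bool rcu_read_lock_bh_held;    /* inside rcu_read_lock_bh()?   */
    bool rcu_read_lock_sched_held; /* preemption/irqs disabled?    */
};

/* Mirrors the spirit of rcu_read_lock_any_held(): any flavor counts. */
static bool rcu_read_lock_any_held(const struct rcu_ctx *ctx)
{
    return ctx->rcu_read_lock_held ||
           ctx->rcu_read_lock_bh_held ||
           ctx->rcu_read_lock_sched_held;
}

/* True when lockdep would complain about this RCU list traversal. */
bool rcu_list_would_warn(const struct rcu_ctx *ctx, bool lockdep_cond)
{
    return !lockdep_cond && !rcu_read_lock_any_held(ctx);
}
```

[Note the call chain in the splat above: reading /proc/lockdep ends up in rcu_read_lock_any_held(), which queries lockdep's own state, so enabling this check by default exercises lockdep's IRQ-state bookkeeping on paths that did not hit it before; that is plausibly how the DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) in check_flags() gets reached.]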
* Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" 2020-03-06 4:06 PROVE_RCU_LIST + /proc/lockdep warning Qian Cai @ 2020-03-07 3:33 ` Qian Cai 2020-03-08 21:18 ` Waiman Long 2020-03-19 13:58 ` Qian Cai 2020-03-07 17:16 ` PROVE_RCU_LIST + /proc/lockdep warning Joel Fernandes 1 sibling, 2 replies; 13+ messages in thread From: Qian Cai @ 2020-03-07 3:33 UTC (permalink / raw) To: Waiman Long, Thomas Gleixner Cc: Joel Fernandes (Google), Paul E. McKenney, rcu, LKML, Peter Zijlstra, Will Deacon, Ingo Molnar, Frederic Weisbecker > On Mar 5, 2020, at 11:06 PM, Qian Cai <cai@lca.pw> wrote: > Using this config, > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config Reverting the linux-next commit d441dceb5dce (“tick/common: Make tick_periodic() check for missing ticks”) fixed the lockup that could easily happen during boot. > > [ 0.013514][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 > [ 0.013514][ C0] Modules linked in: > [ 0.013514][ C0] irq event stamp: 64186318 > [ 0.013514][ C0] hardirqs last enabled at (64186317): [<ffffffff84c9b107>] _raw_spin_unlock_irq+0x27/0x40 > [ 0.013514][ C0] hardirqs last disabled at (64186318): [<ffffffff84c8f384>] __schedule+0x214/0x1070 > [ 0.013514][ C0] softirqs last enabled at (267904): [<ffffffff85000447>] __do_softirq+0x447/0x766 > [ 0.013514][ C0] softirqs last disabled at (267897): [<ffffffff842d1f16>] irq_exit+0xd6/0xf0 > [ 0.013514][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305+ #6 > [ 0.013514][ C0] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > [ 0.013514][ C0] RIP: 0010:lock_is_held_type+0x12a/0x150 > [ 0.013514][ C0] Code: 41 0f 94 c4 65 48 8b 1c 25 40 0f 02 00 48 8d bb 74 08 00 00 e8 77 c0 28 00 c7 83 74 08 00 00 00 00 00 00 41 56 9d 48 83 c4 18 <44> 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 45 31 e4 eb c7 41 bc 01 > [ 0.013514][ C0] RSP: 0000:ffffc9000628f9f8 EFLAGS: 00000082 > [ 0.013514][ C0] RAX: 0000000000000000 RBX: 
ffff889880efc040 RCX: ffffffff8438b449 > [ 0.013514][ C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff889880efc8b4 > [ 0.013514][ C0] RBP: ffffc9000628fa20 R08: ffffed1108588a24 R09: ffffed1108588a24 > [ 0.013514][ C0] R10: ffff888842c4511b R11: 0000000000000000 R12: 0000000000000000 > [ 0.013514][ C0] R13: ffff889880efc908 R14: 0000000000000046 R15: 0000000000000003 > [ 0.013514][ C0] FS: 0000000000000000(0000) GS:ffff888842c00000(0000) knlGS:0000000000000000 > [ 0.013514][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.013514][ C0] CR2: ffff88a0707ff000 CR3: 0000000b72012001 CR4: 00000000001606f0 > [ 0.013514][ C0] Call Trace: > [ 0.013514][ C0] rcu_read_lock_sched_held+0xac/0xe0 > lock_is_held at include/linux/lockdep.h:361 > (inlined by) rcu_read_lock_sched_held at kernel/rcu/update.c:121 > [ 0.013514][ C0] ? rcu_read_lock_bh_held+0xc0/0xc0 > [ 0.013514][ C0] rcu_note_context_switcx186/0x3b0 > [ 0.013514][ C0] __schedule+0x21f/0x1070 > [ 0.013514][ C0] ? __sched_text_start+0x8/0x8 > [ 0.013514][ C0] schedule+0x95/0x160 > [ 0.013514][ C0] do_boot_cpu+0x58c/0xaf0 > [ 0.013514][ C0] native_cpu_up+0x298/0x430 > [ 0.013514][ C0] ? common_cpu_up+0x150/0x150 > [ 0.013514][ C0] bringup_cpu+0x44/0x310 > [ 0.013514][ C0] ? timers_prepare_cpu+0x114/0x190 > [ 0.013514][ C0] ? takedown_cpu+0x2e0/0x2e0 > [ 0.013514][ C0] cpuhp_invoke_callback+0x197/0x1120 > [ 0.013514][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 > [ 0.013514][ C0] _cpu_up+0x171/0x280 > [ 0.013514][ C0] do_cpu_up+0xb1/0x120 > [ 0.013514][ C0] cpu_up+0x13/0x20 > [ 0.013514][ C0] smp_init+0x91/0x118 > [ 0.013514][ C0] kernel_init_freeable+0x221/0x4f8 > [ 0.013514][ C0] ? mark_held_locks+0x34/0xb0 > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > [ 0.013514][ C0] ? start_kernel+0x876/0x876 > [ 0.013514][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > [ 0.013514][ C0] ? rest_init+0x307/0x307 > [ 0.013514][ C0] kernel_init+0x 0.013514][ C0] ? 
rest_init+0x307/0x307 > [ 0.013514][ C0] ret_from_fork+0x3a/0x50 > We could have many slightly different traces, [ 0.000000][ T0] smpboot: CPU 8 Converting physical 0 to logical die 1 [ 0.021496][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 [ 0.021496][ C0] Modules linked in: [ 0.021496][ C0] irq event stamp: 53241496 [ 0.021496][ C0] hardirqs last enabled at (53241495): [<ffffffffa9c8c037>] _raw_spin_unlock_irq+0x27/0x40 [ 0.021496][ C0] hardirqs last disabled at (53241496): [<ffffffffa9c80244>] __schedule+0x214/0x1070 [ 0.021496][ C0] softirqs last enabled at (88160): [<ffffffffaa000447>] __do_softirq+0x447/0x766 [ 0.021496][ C0] softirqs last disabled at (88153): [<ffffffffa92d0c66>] irq_exit+0xd6/0xf0 [ 0.021496][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4+ #25 [ 0.021496][ C0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018 [ 0.021496][ C0] RIP: 0010:__asan_load8+0x0/0xa0 [ 0.021496][ C0] Code: e8 03 0f b6 04 30 84 c0 74 c2 38 d0 0f 9e c0 84 c0 74 b9 ba 01 00 00 00 be 04 00 00 00 e8 c8 e3 ff ff 5d c3 66 0f 1f 44 00 00 <55> 48 89 e5 48 8b 4d 08 eb 3a 0f 1f 00 48 b8 00 00 00 00 00 00 00 [ 0.021496][ C0] RSP: 0018:ffffc900031779d8 EFLAGS: 00000082 [ 0.021496][ C0] RAX: 000000000000000f RBX: ffff88820f118040 RCX: ffffffffa9386e1c [ 0.021496][ C0] RDX: ffff88820f1188b0 RSI: ffff8884534442d8 RDI: ffff88820f118938 [ 0.021496][ C0] RBP: ffffc90003177a10 R08: ffffed1041e23009 R09: ffffed1041e23009 [ 0.021496][ C0] R10: ffffed1041e23008 R11: 0000000000000000 R12: ffff88820f118928 [ 0.021496][ C0] R13: ffff8884534442d8 R14: 0000000000000003 R15: ffff88820f118928 [ 0.021496][ C0] FS: 0000000000000000(0000) GS:ffff888453400000(0000) knlGS:0000000000000000 [ 0.021496][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.021496][ C0] CR2: ffff88887b9ff000 CR3: 0000000257c12000 CR4: 00000000003406f0 [ 0.021496][ C0] Call Trace: [ 0.021496][ C0] ? 
match_held_lock+0x20/0x250 [ 0.021496][ C0] lock_unpin_lock+0x16a/0x260 [ 0.021496][ C0] ? lock_repin_lock+0x210/0x210 [ 0.021496][ C0] ? __kasan_check_read+0x11/0x20 [ 0.021496][ C0] ? pick_next_task_fair+0x3a6/0x6b0 [ 0.021496][ C0] __schedule+0xd4f/0x1070 [ 0.021496][ C0] ? firmware_map_remove+0xee/0xee [ 0.021496][ C0] ? schedule+0xc9/0x160 [ 0.021496][ C0] schedule+0x95/0x160 [ 0.021496][ C0] do_boot_cpu+0x58c/0xaf0 [ 0.021496][ C0] native_cpu_up+0x298/0x430 [ 0.021496][ C0] ? common_cpu_up+0x150/0x150 [ 0.021496][ C0] bringup_cpu+0x44/0x310 [ 0.021496][ C0] ? timers_prepare_cpu+0x114/0x190 [ 0.021496][ C0] ? takedown_cpu+0x2e0/0x2e0 [ 0.021496][ C0] cpuhp_invoke_callback+0x197/0x1120 [ 0.021496][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 [ 0.021496][ C0] _cpu_up+0x171/0x280 [ 0.021496][ C0] do_cpu_up+0xb1/0x120 [ 0.021496][ C0] cpu_up+0x13/0x20 [ 0.021496][ C0] smp_init+0x91/0x118 [ 0.021496][ C0] kernel_init_freeable+0x221/0x4f8 [ 0.021496][ C0] ? mark_held_locks+0x34/0xb0 [ 0.021496][ C0] ? _raw_spin_unlock_irq+0x27/0x40 [ 0.021496][ C0] ? start_kernel+0x857/0x857 [ 0.021496][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 [ 0.021496][ C0] ? _raw_spin_unlock_irq+0x27/0x40 [ 0.021496][ C0] ? rest_init+0x307/0x307 [ 0.021496][ C0] kernel_init+0x11/0x139 [ 0.021496][ C0] ? 
rest_init+0x307/0x307 [ 0.021496][ C0] ret_from_fork+0x27/0x50 [ 0.021458][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 [ 0.021458][ C0] Modules linked in: [ 0.021458][ C0] irq event stamp: 55574034 [ 0.021458][ C0] hardirqs last enabled at (55574033): [<ffffffffa549d4d7>] _raw_spin_unlock_irq+0x27/0x40 [ 0.021458][ C0] hardirqs last disabled at (55574034): [<ffffffffa5491754>] __schedule+0x214/0x1070 [ 0.021458][ C0] softirqs last enabled at (83640): [<ffffffffa5800447>] __do_softirq+0x447/0x766 [ 0.021458][ C0] softirqs last disabled at (83623): [<ffffffffa4ad2196>] irq_exit+0xd6/0xf0 [ 0.021458][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4+ #13 [ 0.021458][ C0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018 [ 0.021458][ C0] RIP: 0010:check_memory_region+0x136/0x200 [ 0.021458][ C0] Code: 00 eb 0c 49 83 c0 01 48 89 d8 49 39 d8 74 10 41 80 38 00 74 ee 4b 8d 44 0d 00 4d 85 c0 75 4d 4c 89 e3 48 29 c3 e9 3e ff ff ff <48> 85 db 74 2e 41 80 39 00 75 34 48 b8 01 00 00 00 00 fc ff df 49 [ 0.021458][ C0] RSP: 0018:ffffc900031779e8 EFLAGS: 00000083 [ 0.021458][ C0] RAX: fffff5200062ef48 RBX: 0000000000000001 RCX: ffffffffa4b94e16 [ 0.021458][ C0] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffc90003177a40 [ 0.021458][ C0] RBP: ffffc90003177a00 R08: 1ffff9200062ef48 R09: fffff5200062ef48 [ 0.021458][ C0] R10: fffff5200062ef48 R11: ffffc90003177a43 R12: fffff5200062ef49 [ 0.021458][ C0] R13: ffffc90003177a80 R14: ffff888453444310 R15: ffff888453444308 [ 0.021458][ C0] FS: 0000000000000000(0000) GS:ffff888453400000(0000) knlGS:0000000000000000 [ 0.021458][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.021458][ C0] CR2: ffff88887b9ff000 CR3: 00000006e6c12000 CR4: 00000000003406f0 [ 0.021458][ C0] Call Trace: [ 0.021458][ C0] __kasan_check_write+0x14/0x20 [ 0.021458][ C0] do_raw_spin_lock+0xe6/0x1e0 [ 0.021458][ C0] ? rwlock_bug.part.1+0x60/0x60 [ 0.021458][ C0] ? 
__schedule+0x227/0x1070 [ 0.021458][ C0] _raw_spin_lock+0x37/0x40 [ 0.021458][ C0] ? __schedule+0x227/0x1070 [ 0.021458][ C0] __schedule+0x227/0x1070 [ 0.021458][ C0] ? __sched_text_start+0x8/0x8 [ 0.021458][ C0] schedule+0x95/0x160 [ 0.021458][ C0] do_boot_cpu+0x58c/0xaf0 [ 0.021458][ C0] native_cpu_up+0x298/0x430 [ 0.021458][ C0] ? common_cpu_up+0x150/0x150 [ 0.021458][ C0] bringup_cpu+0x44/0x310 [ 0.021458][ C0] ? timers_prepare_cpu+0x114/0x190 [ 0.021458][ C0] ? takedown_cpu+0x2e0/0x2e0 [ 0.021458][ C0] cpuhp_invoke_callback+0x197/0x1120 [ 0.021458][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 [ 0.021458][ C0] _cpu_up+0x171/0x280 [ 0.021458][ C0] do_cpu_up+0xb1/0x120 [ 0.021458][ C0] cpu_up+0x13/0x20 [ 0.021458][ C0] smp_init+0x91/0x118 [ 0.021458][ C0] kernel_init_freeable+0x221/0x4f8 [ 0.021458][ C0] ? mark_held_locks+0x34/0xb0 [ 0.021458][ C0] ? _raw_spin_unlock_irq+0x27/0x40 [ 0.021458][ C0] ? start_kernel+0x876/0x876 [ 0.021458][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 [ 0.021458][ C0] ? _raw_spin_unlock_irq+0x27/0x40 [ 0.021458][ C0] ? rest_init+0x307/0x307 [ 0.021458][ C0] kernel_init+0x11/0x139 [ 0.021458][ C0] ? rest_init+0x307/0x307 [ 0.021458][ C0] ret_from_fork+0x27/0x50 ^ permalink raw reply [flat|nested] 13+ messages in thread
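[As a sketch of the kind of check the reverted commit's subject line describes (inferred from the title only, not from the actual patch): in periodic mode each tick should advance the tick counter by exactly one, so the handler can compare against the last count it processed and flag any gap. Warning from that deep inside the tick path, with interrupts disabled, is exactly the sort of thing that can escalate into the hard lockups reported above, e.g. if the slow warning output itself causes further ticks to be missed.]

```c
#include <assert.h>

/* Illustrative model of a "missing ticks" sanity check in a periodic
 * tick handler: each invocation should see the counter advance by
 * exactly one, and a larger jump means ticks were lost.  This is a
 * guess at the mechanism from the commit subject, not the real
 * tick_periodic() code. */
unsigned long missed_ticks(unsigned long now, unsigned long last_seen)
{
    unsigned long delta = now - last_seen; /* unsigned, so wrap is safe */
    return delta > 1 ? delta - 1 : 0;      /* a delta of 1 is on time */
}
```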
* Re: Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" 2020-03-07 3:33 ` Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" Qian Cai @ 2020-03-08 21:18 ` Waiman Long 2020-03-19 13:58 ` Qian Cai 1 sibling, 0 replies; 13+ messages in thread From: Waiman Long @ 2020-03-08 21:18 UTC (permalink / raw) To: Qian Cai, Thomas Gleixner Cc: Joel Fernandes (Google), Paul E. McKenney, rcu, LKML, Peter Zijlstra, Will Deacon, Ingo Molnar, Frederic Weisbecker On 3/6/20 10:33 PM, Qian Cai wrote: > >> On Mar 5, 2020, at 11:06 PM, Qian Cai <cai@lca.pw> wrote: >> > Using this config, > >> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > Reverted the linux-next commit d441dceb5dce (“tick/common: Make tick_periodic() check for missing ticks”) > fixed the lockup that could easily happen during boot. > I will take a look at that and get back to you ASAP. Thanks, Longman ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" 2020-03-07 3:33 ` Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" Qian Cai 2020-03-08 21:18 ` Waiman Long @ 2020-03-19 13:58 ` Qian Cai 2020-03-19 19:00 ` Waiman Long 1 sibling, 1 reply; 13+ messages in thread From: Qian Cai @ 2020-03-19 13:58 UTC (permalink / raw) To: Waiman Long, Thomas Gleixner, Stephen Rothwell Cc: Joel Fernandes (Google), Paul E. McKenney, rcu, LKML, Peter Zijlstra, Will Deacon, Ingo Molnar, Frederic Weisbecker > On Mar 6, 2020, at 10:33 PM, Qian Cai <cai@lca.pw> wrote: > > > >> On Mar 5, 2020, at 11:06 PM, Qian Cai <cai@lca.pw> wrote: >> > > Using this config, > >> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > Reverted the linux-next commit d441dceb5dce (“tick/common: Make tick_periodic() check for missing ticks”) > fixed the lockup that could easily happen during boot. Thomas or Stephen, can you back out this patch from linux-next while Waiman is getting to the bottom of it? I can still reproduce it as of today. [ 0.021353][ T1] ... fixed-purpose events: 0 [ 0.021353][ T1] ... event mask: 000000000000003f [ 0.021353][ T1] rcu: Hierarchical SRCU implementation. [ 0.021353][ T5] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter. [ 0.021353][ T1] smp: Bringing up secondary CPUs ... [ 0.021353][ T1] x86: Booting SMP configuration: [ 0.021353][ T1] .... node #0, CPUs: #1 [ 0.021353][ T1] .... node #1, CPUs: #2 [ 0.021353][ T1] #3 [ 0.021353][ T1] .... node #2, CPUs: #4 [ 0.021353][ T1] #5 [ 0.021353][ T1] .... node #3, CPUs: #6 [ 0.021353][ T1] #7 [ 0.021353][ T1] .... 
node #4, CPUs: #8 [ 0.000000][ T0] smpboot: CPU 8 Converting physical 0 to logical die 1 [ 0.021353][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 [ 0.021353][ C0] Modules linked in: [ 0.021353][ C0] irq event stamp: 51754775 [ 0.021353][ C0] hardirqs last enabled at (51754775): [<ffffffff936a8117>] _raw_spin_unlock_irq+0x27/0x40 [ 0.021353][ C0] hardirqs last disabled at (51754774): [<ffffffff9369c394>] __schedule+0x214/0x1070 [ 0.021353][ C0] softirqs last enabled at (78480): [<ffffffff93a00447>] __do_softirq+0x447/0x766 [ 0.021353][ C0] softirqs last disabled at (78463): [<ffffffff92cd2226>] irq_exit+0xd6/0xf0 [ 0.021353][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc6-next-20200319 #1 [ 0.021353][ C0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018 [ 0.021353][ C0] RIP: 0010:lock_is_held_type+0x3e/0x150 [ 0.021353][ C0] Code: 02 00 48 83 ec 18 48 89 7d d0 48 8d bb 74 08 00 00 89 75 c4 e8 23 fe 28 00 8b 8b 74 08 00 00 85 c9 0f 85 02 01 00 00 9c 41 5e <fa> 48 c7 c7 b4 8c 93 94 e8 05 fe 28 00 8b 15 23 b8 ba 01 85 d2 74 [ 0.021353][ C0] RSP: 0018:ffffc90003177a88 EFLAGS: 00000246 [ 0.021353][ C0] RAX: 0000000000000000 RBX: ffff888480e0c040 RCX: 0000000000000000 [ 0.021353][ C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff888480e0c8b4 [ 0.021353][ C0] RBP: ffffc90003177ac8 R08: ffffed110f700fb9 R09: 0000000000000001 [ 0.021353][ C0] R10: ffff88887b807dc7 R11: ffffed110f700fb8 R12: ffff888480e0c040 [ 0.021353][ C0] R13: 0000000000000000 R14: 0000000000000246 R15: ffff888453444340 [ 0.021353][ C0] FS: 0000000000000000(0000) GS:ffff888453400000(0000) knlGS:0000000000000000 [ 0.021353][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.021353][ C0] CR2: ffff88887b9ff000 CR3: 00000003a7e12000 CR4: 00000000003406f0 [ 0.021353][ C0] Call Trace: [ 0.021353][ C0] __schedule+0x821/0x1070 [ 0.021353][ C0] ? 
firmware_map_remove+0xe9/0xe9 [ 0.021353][ C0] schedule+0x95/0x160 [ 0.021353][ C0] do_boot_cpu+0x58c/0xaf0 [ 0.021353][ C0] native_cpu_up+0x298/0x430 [ 0.021353][ C0] ? common_cpu_up+0x150/0x150 [ 0.021353][ C0] bringup_cpu+0x44/0x310 [ 0.021353][ C0] ? timers_prepare_cpu+0x114/0x190 [ 0.021353][ C0] ? takedown_cpu+0x2e0/0x2e0 [ 0.021353][ C0] cpuhp_invoke_callback+0x197/0x1120 [ 0.021353][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 [ 0.021353][ C0] _cpu_up+0x171/0x280 [ 0.021353][ C0] do_cpu_up+0xb1/0x120 [ 0.021353][ C0] cpu_up+0x13/0x20 [ 0.021353][ C0] smp_init+0x91/0x118 [ 0.021353][ C0] kernel_init_freeable+0x221/0x4f8 [ 0.021353][ C0] ? mark_held_locks+0x34/0xb0 [ 0.021353][ C0] ? _raw_spin_unlock_irq+0x27/0x40 [ 0.021353][ C0] ? start_kernel+0x8c0/0x8c0 [ 0.021353][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 [ 0.021353][ C0] ? _raw_spin_unlock_irq+0x27/0x40 [ 0.021353][ C0] ? rest_init+0x307/0x307 [ 0.021353][ C0] kernel_init+0x11/0x139 [ 0.021353][ C0] ? rest_init+0x307/0x307 [ 0.021353][ C0] ret_from_fork+0x27/0x50 > >> >> [ 0.013514][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 >> [ 0.013514][ C0] Modules linked in: >> [ 0.013514][ C0] irq event stamp: 64186318 >> [ 0.013514][ C0] hardirqs last enabled at (64186317): [<ffffffff84c9b107>] _raw_spin_unlock_irq+0x27/0x40 >> [ 0.013514][ C0] hardirqs last disabled at (64186318): [<ffffffff84c8f384>] __schedule+0x214/0x1070 >> [ 0.013514][ C0] softirqs last enabled at (267904): [<ffffffff85000447>] __do_softirq+0x447/0x766 >> [ 0.013514][ C0] softirqs last disabled at (267897): [<ffffffff842d1f16>] irq_exit+0xd6/0xf0 >> [ 0.013514][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305+ #6 >> [ 0.013514][ C0] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 >> [ 0.013514][ C0] RIP: 0010:lock_is_held_type+0x12a/0x150 >> [ 0.013514][ C0] Code: 41 0f 94 c4 65 48 8b 1c 25 40 0f 02 00 48 8d bb 74 08 00 00 e8 77 c0 28 00 c7 83 74 08 00 00 00 00 00 00 41 56 9d 48 83 c4 18 <44> 
89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 45 31 e4 eb c7 41 bc 01 >> [ 0.013514][ C0] RSP: 0000:ffffc9000628f9f8 EFLAGS: 00000082 >> [ 0.013514][ C0] RAX: 0000000000000000 RBX: ffff889880efc040 RCX: ffffffff8438b449 >> [ 0.013514][ C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff889880efc8b4 >> [ 0.013514][ C0] RBP: ffffc9000628fa20 R08: ffffed1108588a24 R09: ffffed1108588a24 >> [ 0.013514][ C0] R10: ffff888842c4511b R11: 0000000000000000 R12: 0000000000000000 >> [ 0.013514][ C0] R13: ffff889880efc908 R14: 0000000000000046 R15: 0000000000000003 >> [ 0.013514][ C0] FS: 0000000000000000(0000) GS:ffff888842c00000(0000) knlGS:0000000000000000 >> [ 0.013514][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 0.013514][ C0] CR2: ffff88a0707ff000 CR3: 0000000b72012001 CR4: 00000000001606f0 >> [ 0.013514][ C0] Call Trace: >> [ 0.013514][ C0] rcu_read_lock_sched_held+0xac/0xe0 >> lock_is_held at include/linux/lockdep.h:361 >> (inlined by) rcu_read_lock_sched_held at kernel/rcu/update.c:121 >> [ 0.013514][ C0] ? rcu_read_lock_bh_held+0xc0/0xc0 >> [ 0.013514][ C0] rcu_note_context_switcx186/0x3b0 >> [ 0.013514][ C0] __schedule+0x21f/0x1070 >> [ 0.013514][ C0] ? __sched_text_start+0x8/0x8 >> [ 0.013514][ C0] schedule+0x95/0x160 >> [ 0.013514][ C0] do_boot_cpu+0x58c/0xaf0 >> [ 0.013514][ C0] native_cpu_up+0x298/0x430 >> [ 0.013514][ C0] ? common_cpu_up+0x150/0x150 >> [ 0.013514][ C0] bringup_cpu+0x44/0x310 >> [ 0.013514][ C0] ? timers_prepare_cpu+0x114/0x190 >> [ 0.013514][ C0] ? takedown_cpu+0x2e0/0x2e0 >> [ 0.013514][ C0] cpuhp_invoke_callback+0x197/0x1120 >> [ 0.013514][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 >> [ 0.013514][ C0] _cpu_up+0x171/0x280 >> [ 0.013514][ C0] do_cpu_up+0xb1/0x120 >> [ 0.013514][ C0] cpu_up+0x13/0x20 >> [ 0.013514][ C0] smp_init+0x91/0x118 >> [ 0.013514][ C0] kernel_init_freeable+0x221/0x4f8 >> [ 0.013514][ C0] ? mark_held_locks+0x34/0xb0 >> [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 >> [ 0.013514][ C0] ? 
start_kernel+0x876/0x876 >> [ 0.013514][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 >> [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 >> [ 0.013514][ C0] ? rest_init+0x307/0x307 >> [ 0.013514][ C0] kernel_init+0x 0.013514][ C0] ? rest_init+0x307/0x307 >> [ 0.013514][ C0] ret_from_fork+0x3a/0x50 >> > > We could have many slightly different traces, > > [ 0.000000][ T0] smpboot: CPU 8 Converting physical 0 to logical die 1 > [ 0.021496][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 > [ 0.021496][ C0] Modules linked in: > [ 0.021496][ C0] irq event stamp: 53241496 > [ 0.021496][ C0] hardirqs last enabled at (53241495): [<ffffffffa9c8c037>] _raw_spin_unlock_irq+0x27/0x40 > [ 0.021496][ C0] hardirqs last disabled at (53241496): [<ffffffffa9c80244>] __schedule+0x214/0x1070 > [ 0.021496][ C0] softirqs last enabled at (88160): [<ffffffffaa000447>] __do_softirq+0x447/0x766 > [ 0.021496][ C0] softirqs last disabled at (88153): [<ffffffffa92d0c66>] irq_exit+0xd6/0xf0 > [ 0.021496][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4+ #25 > [ 0.021496][ C0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018 > [ 0.021496][ C0] RIP: 0010:__asan_load8+0x0/0xa0 > [ 0.021496][ C0] Code: e8 03 0f b6 04 30 84 c0 74 c2 38 d0 0f 9e c0 84 c0 74 b9 ba 01 00 00 00 be 04 00 00 00 e8 c8 e3 ff ff 5d c3 66 0f 1f 44 00 00 <55> 48 89 e5 48 8b 4d 08 eb 3a 0f 1f 00 48 b8 00 00 00 00 00 00 00 > [ 0.021496][ C0] RSP: 0018:ffffc900031779d8 EFLAGS: 00000082 > [ 0.021496][ C0] RAX: 000000000000000f RBX: ffff88820f118040 RCX: ffffffffa9386e1c > [ 0.021496][ C0] RDX: ffff88820f1188b0 RSI: ffff8884534442d8 RDI: ffff88820f118938 > [ 0.021496][ C0] RBP: ffffc90003177a10 R08: ffffed1041e23009 R09: ffffed1041e23009 > [ 0.021496][ C0] R10: ffffed1041e23008 R11: 0000000000000000 R12: ffff88820f118928 > [ 0.021496][ C0] R13: ffff8884534442d8 R14: 0000000000000003 R15: ffff88820f118928 > [ 0.021496][ C0] FS: 0000000000000000(0000) GS:ffff888453400000(0000) 
knlGS:0000000000000000 > [ 0.021496][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.021496][ C0] CR2: ffff88887b9ff000 CR3: 0000000257c12000 CR4: 00000000003406f0 > [ 0.021496][ C0] Call Trace: > [ 0.021496][ C0] ? match_held_lock+0x20/0x250 > [ 0.021496][ C0] lock_unpin_lock+0x16a/0x260 > [ 0.021496][ C0] ? lock_repin_lock+0x210/0x210 > [ 0.021496][ C0] ? __kasan_check_read+0x11/0x20 > [ 0.021496][ C0] ? pick_next_task_fair+0x3a6/0x6b0 > [ 0.021496][ C0] __schedule+0xd4f/0x1070 > [ 0.021496][ C0] ? firmware_map_remove+0xee/0xee > [ 0.021496][ C0] ? schedule+0xc9/0x160 > [ 0.021496][ C0] schedule+0x95/0x160 > [ 0.021496][ C0] do_boot_cpu+0x58c/0xaf0 > [ 0.021496][ C0] native_cpu_up+0x298/0x430 > [ 0.021496][ C0] ? common_cpu_up+0x150/0x150 > [ 0.021496][ C0] bringup_cpu+0x44/0x310 > [ 0.021496][ C0] ? timers_prepare_cpu+0x114/0x190 > [ 0.021496][ C0] ? takedown_cpu+0x2e0/0x2e0 > [ 0.021496][ C0] cpuhp_invoke_callback+0x197/0x1120 > [ 0.021496][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 > [ 0.021496][ C0] _cpu_up+0x171/0x280 > [ 0.021496][ C0] do_cpu_up+0xb1/0x120 > [ 0.021496][ C0] cpu_up+0x13/0x20 > [ 0.021496][ C0] smp_init+0x91/0x118 > [ 0.021496][ C0] kernel_init_freeable+0x221/0x4f8 > [ 0.021496][ C0] ? mark_held_locks+0x34/0xb0 > [ 0.021496][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > [ 0.021496][ C0] ? start_kernel+0x857/0x857 > [ 0.021496][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 > [ 0.021496][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > [ 0.021496][ C0] ? rest_init+0x307/0x307 > [ 0.021496][ C0] kernel_init+0x11/0x139 > [ 0.021496][ C0] ? 
rest_init+0x307/0x307 > [ 0.021496][ C0] ret_from_fork+0x27/0x50 > > [ 0.021458][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 > [ 0.021458][ C0] Modules linked in: > [ 0.021458][ C0] irq event stamp: 55574034 > [ 0.021458][ C0] hardirqs last enabled at (55574033): [<ffffffffa549d4d7>] _raw_spin_unlock_irq+0x27/0x40 > [ 0.021458][ C0] hardirqs last disabled at (55574034): [<ffffffffa5491754>] __schedule+0x214/0x1070 > [ 0.021458][ C0] softirqs last enabled at (83640): [<ffffffffa5800447>] __do_softirq+0x447/0x766 > [ 0.021458][ C0] softirqs last disabled at (83623): [<ffffffffa4ad2196>] irq_exit+0xd6/0xf0 > [ 0.021458][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4+ #13 > [ 0.021458][ C0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018 > [ 0.021458][ C0] RIP: 0010:check_memory_region+0x136/0x200 > [ 0.021458][ C0] Code: 00 eb 0c 49 83 c0 01 48 89 d8 49 39 d8 74 10 41 80 38 00 74 ee 4b 8d 44 0d 00 4d 85 c0 75 4d 4c 89 e3 48 29 c3 e9 3e ff ff ff <48> 85 db 74 2e 41 80 39 00 75 34 48 b8 01 00 00 00 00 fc ff df 49 > [ 0.021458][ C0] RSP: 0018:ffffc900031779e8 EFLAGS: 00000083 > [ 0.021458][ C0] RAX: fffff5200062ef48 RBX: 0000000000000001 RCX: ffffffffa4b94e16 > [ 0.021458][ C0] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffc90003177a40 > [ 0.021458][ C0] RBP: ffffc90003177a00 R08: 1ffff9200062ef48 R09: fffff5200062ef48 > [ 0.021458][ C0] R10: fffff5200062ef48 R11: ffffc90003177a43 R12: fffff5200062ef49 > [ 0.021458][ C0] R13: ffffc90003177a80 R14: ffff888453444310 R15: ffff888453444308 > [ 0.021458][ C0] FS: 0000000000000000(0000) GS:ffff888453400000(0000) knlGS:0000000000000000 > [ 0.021458][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.021458][ C0] CR2: ffff88887b9ff000 CR3: 00000006e6c12000 CR4: 00000000003406f0 > [ 0.021458][ C0] Call Trace: > [ 0.021458][ C0] __kasan_check_write+0x14/0x20 > [ 0.021458][ C0] do_raw_spin_lock+0xe6/0x1e0 > [ 0.021458][ C0] ? 
rwlock_bug.part.1+0x60/0x60 > [ 0.021458][ C0] ? __schedule+0x227/0x1070 > [ 0.021458][ C0] _raw_spin_lock+0x37/0x40 > [ 0.021458][ C0] ? __schedule+0x227/0x1070 > [ 0.021458][ C0] __schedule+0x227/0x1070 > [ 0.021458][ C0] ? __sched_text_start+0x8/0x8 > [ 0.021458][ C0] schedule+0x95/0x160 > [ 0.021458][ C0] do_boot_cpu+0x58c/0xaf0 > [ 0.021458][ C0] native_cpu_up+0x298/0x430 > [ 0.021458][ C0] ? common_cpu_up+0x150/0x150 > [ 0.021458][ C0] bringup_cpu+0x44/0x310 > [ 0.021458][ C0] ? timers_prepare_cpu+0x114/0x190 > [ 0.021458][ C0] ? takedown_cpu+0x2e0/0x2e0 > [ 0.021458][ C0] cpuhp_invoke_callback+0x197/0x1120 > [ 0.021458][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 > [ 0.021458][ C0] _cpu_up+0x171/0x280 > [ 0.021458][ C0] do_cpu_up+0xb1/0x120 > [ 0.021458][ C0] cpu_up+0x13/0x20 > [ 0.021458][ C0] smp_init+0x91/0x118 > [ 0.021458][ C0] kernel_init_freeable+0x221/0x4f8 > [ 0.021458][ C0] ? mark_held_locks+0x34/0xb0 > [ 0.021458][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > [ 0.021458][ C0] ? start_kernel+0x876/0x876 > [ 0.021458][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 > [ 0.021458][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > [ 0.021458][ C0] ? rest_init+0x307/0x307 > [ 0.021458][ C0] kernel_init+0x11/0x139 > [ 0.021458][ C0] ? rest_init+0x307/0x307 > [ 0.021458][ C0] ret_from_fork+0x27/0x50 > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" 2020-03-19 13:58 ` Qian Cai @ 2020-03-19 19:00 ` Waiman Long 2020-03-20 2:00 ` Stephen Rothwell 0 siblings, 1 reply; 13+ messages in thread From: Waiman Long @ 2020-03-19 19:00 UTC (permalink / raw) To: Qian Cai, Thomas Gleixner, Stephen Rothwell Cc: Joel Fernandes (Google), Paul E. McKenney, rcu, LKML, Peter Zijlstra, Will Deacon, Ingo Molnar, Frederic Weisbecker On 3/19/20 9:58 AM, Qian Cai wrote: > >> On Mar 6, 2020, at 10:33 PM, Qian Cai <cai@lca.pw> wrote: >> >> >> >>> On Mar 5, 2020, at 11:06 PM, Qian Cai <cai@lca.pw> wrote: >>> >> Using this config, >> >>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config >> Reverted the linux-next commit d441dceb5dce (“tick/common: Make tick_periodic() check for missing ticks”) >> fixed the lockup that could easily happen during boot. > > Thomas or Stephen, can you back out this patch from linux-next while Waiman is getting to the bottom of it? > I can still reproduce it as today. I am fine for reverting the patch for now. I am still having some trouble reproducing it and I have other tasks to work on right now. I will get to the bottom of it and resubmit a new patch later on. Cheers, Longman ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" 2020-03-19 19:00 ` Waiman Long @ 2020-03-20 2:00 ` Stephen Rothwell 2020-03-20 11:50 ` Thomas Gleixner 0 siblings, 1 reply; 13+ messages in thread From: Stephen Rothwell @ 2020-03-20 2:00 UTC (permalink / raw) To: Waiman Long Cc: Qian Cai, Thomas Gleixner, Joel Fernandes (Google), Paul E. McKenney, rcu, LKML, Peter Zijlstra, Will Deacon, Ingo Molnar, Frederic Weisbecker [-- Attachment #1: Type: text/plain, Size: 1044 bytes --] Hi all, On Thu, 19 Mar 2020 15:00:35 -0400 Waiman Long <longman@redhat.com> wrote: > > On 3/19/20 9:58 AM, Qian Cai wrote: > > > >> On Mar 6, 2020, at 10:33 PM, Qian Cai <cai@lca.pw> wrote: > >> > >> > >> > >>> On Mar 5, 2020, at 11:06 PM, Qian Cai <cai@lca.pw> wrote: > >>> > >> Using this config, > >> > >>> https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > >> Reverted the linux-next commit d441dceb5dce (“tick/common: Make tick_periodic() check for missing ticks”) > >> fixed the lockup that could easily happen during boot. > > > > Thomas or Stephen, can you back out this patch from linux-next while Waiman is getting to the bottom of it? > > I can still reproduce it as today. > > I am fine for reverting the patch for now. I am still having some > trouble reproducing it and I have other tasks to work on right now. I > will get to the bottom of it and resubmit a new patch later on. I have reverted that commit from linux-next today. -- Cheers, Stephen Rothwell [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" 2020-03-20 2:00 ` Stephen Rothwell @ 2020-03-20 11:50 ` Thomas Gleixner 0 siblings, 0 replies; 13+ messages in thread From: Thomas Gleixner @ 2020-03-20 11:50 UTC (permalink / raw) To: Stephen Rothwell, Waiman Long Cc: Qian Cai, Joel Fernandes (Google), Paul E. McKenney, rcu, LKML, Peter Zijlstra, Will Deacon, Ingo Molnar, Frederic Weisbecker Stephen Rothwell <sfr@canb.auug.org.au> writes: Thu, 19 Mar 2020 15:00:35 -0400 Waiman Long <longman@redhat.com> wrote: >> I am fine for reverting the patch for now. I am still having some >> trouble reproducing it and I have other tasks to work on right now. I >> will get to the bottom of it and resubmit a new patch later on. > > I have reverted that commit from linux-next today. I did the same in the tip tree.
* Re: PROVE_RCU_LIST + /proc/lockdep warning 2020-03-06 4:06 PROVE_RCU_LIST + /proc/lockdep warning Qian Cai 2020-03-07 3:33 ` Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" Qian Cai @ 2020-03-07 17:16 ` Joel Fernandes 2020-03-09 1:40 ` Boqun Feng 1 sibling, 1 reply; 13+ messages in thread From: Joel Fernandes @ 2020-03-07 17:16 UTC (permalink / raw) To: Qian Cai, Peter Zijlstra Cc: Madhuparna Bhowmik, Paul E. McKenney, rcu, LKML, Will Deacon, Waiman Long On Thu, Mar 05, 2020 at 11:06:10PM -0500, Qian Cai wrote: > Since the linux-next commit c9af03c14bfd (“Default enable RCU list lockdep debugging with PROVE_RCU”), > read /proc/lockdep will trigger a warning with this config below. Reverted the commit fixed the issue > right away. > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config Hmm, since Peter hates the list-RCU checking patches and rejected a patch by Amol to fix this (;-)), the easiest way to resolve it would be to just bypass the check in lockdep code: Peter, this should be the last of the list-RCU changes and thank you for the patience. Should I or Amol send a patch for this? 
---8<----------------------- diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 32406ef0d6a2d..d47643d8081b2 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -1493,7 +1493,7 @@ static int __bfs(struct lock_list *source_entry, DEBUG_LOCKS_WARN_ON(!irqs_disabled()); - list_for_each_entry_rcu(entry, head, entry) { + list_for_each_entry_rcu(entry, head, entry, true) { if (!lock_accessed(entry)) { unsigned int cq_depth; mark_lock_accessed(entry, lock); thanks, - Joel > > [26405.676199][ T3548] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) > [26405.676239][ T3548] WARNING: CPU: 11 PID: 3548 at kernel/locking/lockdep.c:4637 check_flags.part.28+0x218/0x220 > [26405.756287][ T3548] Modules linked in: kvm_intel nls_iso8859_1 nls_cp437 kvm vfat fat irqbypass intel_cstate intel_uncore intel_rapl_perf dax_pmem dax_pmem_core efivars ip_tables x_tables xfs sd_mod bnx2x hpsa mdio scsi_transport_sas firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs > [26405.881899][ T3548] CPU: 11 PID: 3548 Comm: cat Not tainted 5.6.0-rc4-next-20200305+ #8 > [26405.920091][ T3548] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > [26405.955370][ T3548] RIP: 0010:check_flags.part.28+0x218/0x220 > [26405.983016][ T3548] Code: 13 8a e8 2b 3f 29 00 44 8b 15 84 df ba 01 45 85 d2 0f 85 c7 94 00 00 48 c7 c6 40 2b 47 89 48 c7 c7 40 04 47 89 e8 49 e3 f3 ff <0f> 0b e9 ad 94 00 00 90 55 48 89 e5 41 57 4d 89 cf 41 56 45 89 c6 > [26406.076147][ T3548] RSP: 0018:ffffc9000695f848 EFLAGS: 00010086 > [26406.104215][ T3548] RAX: 0000000000000000 RBX: ffff888fe6184040 RCX: ffffffff8858cecf > [26406.140856][ T3548] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: 0000000000000000 > [26406.178465][ T3548] RBP: ffffc9000695f850 R08: fffffbfff1377355 R09: fffffbfff1377355 > [26406.217995][ T3548] R10: ffffffff89bb9aa3 R11: fffffbfff1377354 R12: 0000000000000000 > [26406.256760][ T3548] R13: ffffffff8aa55ee0 R14: 0000000000000046 R15: ffffffff8aa55ec0 > 
[26406.293708][ T3548] FS: 00007f58cf3a3540(0000) GS:ffff88905fa80000(0000) knlGS:0000000000000000 > [26406.335252][ T3548] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [26406.366331][ T3548] CR2: 00007f58cf326000 CR3: 0000000f1ba38006 CR4: 00000000001606e0 > [26406.402408][ T3548] Call Trace: > [26406.416739][ T3548] lock_is_held_type+0x5d/0x150 > [26406.438262][ T3548] ? rcu_lockdep_current_cpu_online+0x64/0x80 > [26406.466463][ T3548] rcu_read_lock_any_held+0xac/0x100 > [26406.490105][ T3548] ? rcu_read_lock_held+0xc0/0xc0 > [26406.513258][ T3548] ? __slab_free+0x421/0x540 > [26406.535012][ T3548] ? kasan_kmalloc+0x9/0x10 > [26406.555901][ T3548] ? __kmalloc_node+0x1d7/0x320 > [26406.578668][ T3548] ? kvmalloc_node+0x6f/0x80 > [26406.599872][ T3548] __bfs+0x28a/0x3c0 > [26406.617075][ T3548] ? class_equal+0x30/0x30 > [26406.637524][ T3548] lockdep_count_forward_deps+0x11a/0x1a0 > [26406.664134][ T3548] ? check_noncircular+0x2e0/0x2e0 > [26406.688191][ T3548] ? __kasan_check_read+0x11/0x20 > [26406.713581][ T3548] ? check_chain_key+0x1df/0x2e0 > [26406.738044][ T3548] ? seq_vprintf+0x4e/0xb0 > [26406.758241][ T3548] ? seq_printf+0x9b/0xd0 > [26406.778169][ T3548] ? seq_vprintf+0xb0/0xb0 > [26406.798172][ T3548] l_show+0x1c4/0x380 > [26406.816474][ T3548] ? print_name+0xb0/0xb0 > [26406.836393][ T3548] seq_read+0x56b/0x750 > [26406.855346][ T3548] proc_reg_read+0x1b4/0x200 > [26406.876737][ T3548] ? proc_reg_unlocked_ioctl+0x1e0/0x1e0 > [26406.903030][ T3548] ? check_chain_key+0x1df/0x2e0 > [26406.926531][ T3548] ? find_held_lock+0xca/0xf0 > [26406.948291][ T3548] __vfs_read+0x50/0xa0 > [26406.967391][ T3548] vfs_read+0xcb/0x1e0 > [26406.986102][ T3548] ksys_read+0xc6/0x160 > [26407.005405][ T3548] ? kernel_write+0xc0/0xc0 > [26407.026076][ T3548] ? do_syscall_64+0x79/0xaec > [26407.047448][ T3548] ? do_syscall_64+0x79/0xaec > [26407.068650][ T3548] __x64_sys_read+0x43/0x50 > [26407.089132][ T3548] do_syscall_64+0xcc/0xaec > [26407.109939][ T3548] ? 
trace_hardirqs_on_thunk+0x1a/0x1c > [26407.134924][ T3548] ? syscall_return_slowpath+0x580/0x580 > [26407.160854][ T3548] ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe > [26407.188943][ T3548] ? trace_hardirqs_off_caller+0x3a/0x150 > [26407.216692][ T3548] ? trace_hardirqs_off_thunk+0x1a/0x1c > [26407.243534][ T3548] entry_SYSCALL_64_after_hwframe+0x49/0xbe > [26407.272720][ T3548] RIP: 0033:0x7f58ceeafd75 > [26407.293162][ T3548] Code: fe ff ff 50 48 8d 3d 4a dc 09 00 e8 25 0e 02 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 59 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 > [26407.386043][ T3548] RSP: 002b:00007ffc115111a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > [26407.425283][ T3548] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f58ceeafd75 > [26407.462717][ T3548] RDX: 0000000000020000 RSI: 00007f58cf327000 RDI: 0000000000000003 > [26407.500428][ T3548] RBP: 00007f58cf327000 R08: 00000000ffffffff R09: 0000000000000000 > [26407.538473][ T3548] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f58cf327000 > [26407.575743][ T3548] R13: 0000000000000003 R14: 0000000000000fff R15: 0000000000020000 > [26407.613112][ T3548] irq event stamp: 7161 > [26407.632089][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > [26407.680094][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > [26407.727273][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > [26407.774000][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > [26407.814407][ T3548] ---[ end trace 1026d00df66af83e ]--- > [26407.839742][ T3548] possible reason: unannotated irqs-off. 
> [26407.866243][ T3548] irq event stamp: 7161 > [26407.885407][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > [26407.933602][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > [26407.980432][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > [26408.022826][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > > On a side note, it likely to hit another bug in next-20200305 (not such problem on 0304) where it > will stuck during boot, but the reverting does not help there. Rebooting a few times could pass. > > [ 0.013514][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 > [ 0.013514][ C0] Modules linked in: > [ 0.013514][ C0] irq event stamp: 64186318 > [ 0.013514][ C0] hardirqs last enabled at (64186317): [<ffffffff84c9b107>] _raw_spin_unlock_irq+0x27/0x40 > [ 0.013514][ C0] hardirqs last disabled at (64186318): [<ffffffff84c8f384>] __schedule+0x214/0x1070 > [ 0.013514][ C0] softirqs last enabled at (267904): [<ffffffff85000447>] __do_softirq+0x447/0x766 > [ 0.013514][ C0] softirqs last disabled at (267897): [<ffffffff842d1f16>] irq_exit+0xd6/0xf0 > [ 0.013514][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305+ #6 > [ 0.013514][ C0] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > [ 0.013514][ C0] RIP: 0010:lock_is_held_type+0x12a/0x150 > [ 0.013514][ C0] Code: 41 0f 94 c4 65 48 8b 1c 25 40 0f 02 00 48 8d bb 74 08 00 00 e8 77 c0 28 00 c7 83 74 08 00 00 00 00 00 00 41 56 9d 48 83 c4 18 <44> 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 45 31 e4 eb c7 41 bc 01 > [ 0.013514][ C0] RSP: 0000:ffffc9000628f9f8 EFLAGS: 00000082 > [ 0.013514][ C0] RAX: 0000000000000000 RBX: ffff889880efc040 RCX: ffffffff8438b449 > [ 0.013514][ C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff889880efc8b4 > [ 0.013514][ C0] RBP: ffffc9000628fa20 R08: ffffed1108588a24 R09: 
ffffed1108588a24 > [ 0.013514][ C0] R10: ffff888842c4511b R11: 0000000000000000 R12: 0000000000000000 > [ 0.013514][ C0] R13: ffff889880efc908 R14: 0000000000000046 R15: 0000000000000003 > [ 0.013514][ C0] FS: 0000000000000000(0000) GS:ffff888842c00000(0000) knlGS:0000000000000000 > [ 0.013514][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.013514][ C0] CR2: ffff88a0707ff000 CR3: 0000000b72012001 CR4: 00000000001606f0 > [ 0.013514][ C0] Call Trace: > [ 0.013514][ C0] rcu_read_lock_sched_held+0xac/0xe0 > lock_is_held at include/linux/lockdep.h:361 > (inlined by) rcu_read_lock_sched_held at kernel/rcu/update.c:121 > [ 0.013514][ C0] ? rcu_read_lock_bh_held+0xc0/0xc0 > [ 0.013514][ C0] rcu_note_context_switcx186/0x3b0 > [ 0.013514][ C0] __schedule+0x21f/0x1070 > [ 0.013514][ C0] ? __sched_text_start+0x8/0x8 > [ 0.013514][ C0] schedule+0x95/0x160 > [ 0.013514][ C0] do_boot_cpu+0x58c/0xaf0 > [ 0.013514][ C0] native_cpu_up+0x298/0x430 > [ 0.013514][ C0] ? common_cpu_up+0x150/0x150 > [ 0.013514][ C0] bringup_cpu+0x44/0x310 > [ 0.013514][ C0] ? timers_prepare_cpu+0x114/0x190 > [ 0.013514][ C0] ? takedown_cpu+0x2e0/0x2e0 > [ 0.013514][ C0] cpuhp_invoke_callback+0x197/0x1120 > [ 0.013514][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 > [ 0.013514][ C0] _cpu_up+0x171/0x280 > [ 0.013514][ C0] do_cpu_up+0xb1/0x120 > [ 0.013514][ C0] cpu_up+0x13/0x20 > [ 0.013514][ C0] smp_init+0x91/0x118 > [ 0.013514][ C0] kernel_init_freeable+0x221/0x4f8 > [ 0.013514][ C0] ? mark_held_locks+0x34/0xb0 > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > [ 0.013514][ C0] ? start_kernel+0x876/0x876 > [ 0.013514][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > [ 0.013514][ C0] ? rest_init+0x307/0x307 > [ 0.013514][ C0] kernel_init+0x 0.013514][ C0] ? rest_init+0x307/0x307 > [ 0.013514][ C0] ret_from_fork+0x3a/0x50 > > ^ permalink raw reply related [flat|nested] 13+ messages in thread
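Joel's one-line change above relies on the optional fourth "cond" argument that list_for_each_entry_rcu() accepts: when the caller passes a true condition, the macro's "is RCU (or an equivalent lock) held?" lockdep query is skipped, on the theory that __bfs() already runs under lockdep_lock with interrupts disabled. The sketch below is a plain userspace model of that control flow, not kernel code; every name in it is invented for illustration.

```c
#include <assert.h>

/*
 * Illustrative sketch only -- NOT kernel code.  All names below are
 * invented.  The real list_for_each_entry_rcu() takes an optional
 * fourth "cond" argument; a true condition suppresses the lockdep
 * check.  This models that control flow with plain ints.
 */

static int in_rcu_read_section;	/* stand-in for an "RCU held?" query */
static int lockdep_warnings;	/* counts RCU_LOCKDEP_WARN-style splats */

/* Models the traversal check: warn unless cond holds or RCU is held. */
static void list_check_rcu(int cond)
{
	if (!cond && !in_rcu_read_section)
		lockdep_warnings++;
}

/* Default traversal: the check fires when called outside RCU. */
static void traverse_checked(void)
{
	list_check_rcu(0);
	/* ... walk the list ... */
}

/* Joel's approach: the caller vouches for protection (lockdep_lock is
 * held with IRQs off), so the check is bypassed by passing true. */
static void traverse_bypassed(void)
{
	list_check_rcu(1);
	/* ... walk the list ... */
}
```

In the real macro the condition feeds the list-check helper's warning predicate, so passing `true` effectively disables the check for that one call site while leaving every other list-RCU traversal checked.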
* Re: PROVE_RCU_LIST + /proc/lockdep warning 2020-03-07 17:16 ` PROVE_RCU_LIST + /proc/lockdep warning Joel Fernandes @ 2020-03-09 1:40 ` Boqun Feng 2020-03-10 18:42 ` Qian Cai 0 siblings, 1 reply; 13+ messages in thread From: Boqun Feng @ 2020-03-09 1:40 UTC (permalink / raw) To: Joel Fernandes Cc: Qian Cai, Peter Zijlstra, Madhuparna Bhowmik, Paul E. McKenney, rcu, LKML, Will Deacon, Waiman Long On Sat, Mar 07, 2020 at 12:16:18PM -0500, Joel Fernandes wrote: > On Thu, Mar 05, 2020 at 11:06:10PM -0500, Qian Cai wrote: > > Since the linux-next commit c9af03c14bfd (“Default enable RCU list lockdep debugging with PROVE_RCU”), > > read /proc/lockdep will trigger a warning with this config below. Reverted the commit fixed the issue > > right away. > > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > Hmm, since Peter hates the list-RCU checking patches and rejected a patch by > Amol to fix this (;-)), the easiest way to resolve it would be to just bypass > the check in lockdep code: > > Peter, this should be the last of the list-RCU changes and thank you for the > patience. > > Should I or Amol send a patch for this? > Hmm.. IIUC, the warning got triggered here is because lockdep_count_forward_deps() didn't set up the ->lockdep_recursion, as a result, __bfs() was called without ->lockdep_recursion being 1, which introduced the inconsistency of lockdep status. So how about the following (untested) fix?: Thoughts? 
Regards, Boqun ------------------------- diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 32406ef0d6a2..a258640edace 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -1720,7 +1720,9 @@ unsigned long lockdep_count_forward_deps(struct lock_class *class) raw_local_irq_save(flags); arch_spin_lock(&lockdep_lock); + current->lockdep_recursion = 1; ret = __lockdep_count_forward_deps(&this); + current->lockdep_recursion = 0; arch_spin_unlock(&lockdep_lock); raw_local_irq_restore(flags); @@ -1747,7 +1749,9 @@ unsigned long lockdep_count_backward_deps(struct lock_class *class) raw_local_irq_save(flags); arch_spin_lock(&lockdep_lock); + current->lockdep_recursion = 1; ret = __lockdep_count_backward_deps(&this); + current->lockdep_recursion = 0; arch_spin_unlock(&lockdep_lock); raw_local_irq_restore(flags); > ---8<----------------------- > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > index 32406ef0d6a2d..d47643d8081b2 100644 > --- a/kernel/locking/lockdep.c > +++ b/kernel/locking/lockdep.c > @@ -1493,7 +1493,7 @@ static int __bfs(struct lock_list *source_entry, > > DEBUG_LOCKS_WARN_ON(!irqs_disabled()); > > - list_for_each_entry_rcu(entry, head, entry) { > + list_for_each_entry_rcu(entry, head, entry, true) { > if (!lock_accessed(entry)) { > unsigned int cq_depth; > mark_lock_accessed(entry, lock); > > thanks, > > - Joel > > > > > [26405.676199][ T3548] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) > > [26405.676239][ T3548] WARNING: CPU: 11 PID: 3548 at kernel/locking/lockdep.c:4637 check_flags.part.28+0x218/0x220 > > [26405.756287][ T3548] Modules linked in: kvm_intel nls_iso8859_1 nls_cp437 kvm vfat fat irqbypass intel_cstate intel_uncore intel_rapl_perf dax_pmem dax_pmem_core efivars ip_tables x_tables xfs sd_mod bnx2x hpsa mdio scsi_transport_sas firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs > > [26405.881899][ T3548] CPU: 11 PID: 3548 Comm: cat Not tainted 5.6.0-rc4-next-20200305+ #8 > 
> [26405.920091][ T3548] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > > [26405.955370][ T3548] RIP: 0010:check_flags.part.28+0x218/0x220 > > [26405.983016][ T3548] Code: 13 8a e8 2b 3f 29 00 44 8b 15 84 df ba 01 45 85 d2 0f 85 c7 94 00 00 48 c7 c6 40 2b 47 89 48 c7 c7 40 04 47 89 e8 49 e3 f3 ff <0f> 0b e9 ad 94 00 00 90 55 48 89 e5 41 57 4d 89 cf 41 56 45 89 c6 > > [26406.076147][ T3548] RSP: 0018:ffffc9000695f848 EFLAGS: 00010086 > > [26406.104215][ T3548] RAX: 0000000000000000 RBX: ffff888fe6184040 RCX: ffffffff8858cecf > > [26406.140856][ T3548] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: 0000000000000000 > > [26406.178465][ T3548] RBP: ffffc9000695f850 R08: fffffbfff1377355 R09: fffffbfff1377355 > > [26406.217995][ T3548] R10: ffffffff89bb9aa3 R11: fffffbfff1377354 R12: 0000000000000000 > > [26406.256760][ T3548] R13: ffffffff8aa55ee0 R14: 0000000000000046 R15: ffffffff8aa55ec0 > > [26406.293708][ T3548] FS: 00007f58cf3a3540(0000) GS:ffff88905fa80000(0000) knlGS:0000000000000000 > > [26406.335252][ T3548] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [26406.366331][ T3548] CR2: 00007f58cf326000 CR3: 0000000f1ba38006 CR4: 00000000001606e0 > > [26406.402408][ T3548] Call Trace: > > [26406.416739][ T3548] lock_is_held_type+0x5d/0x150 > > [26406.438262][ T3548] ? rcu_lockdep_current_cpu_online+0x64/0x80 > > [26406.466463][ T3548] rcu_read_lock_any_held+0xac/0x100 > > [26406.490105][ T3548] ? rcu_read_lock_held+0xc0/0xc0 > > [26406.513258][ T3548] ? __slab_free+0x421/0x540 > > [26406.535012][ T3548] ? kasan_kmalloc+0x9/0x10 > > [26406.555901][ T3548] ? __kmalloc_node+0x1d7/0x320 > > [26406.578668][ T3548] ? kvmalloc_node+0x6f/0x80 > > [26406.599872][ T3548] __bfs+0x28a/0x3c0 > > [26406.617075][ T3548] ? class_equal+0x30/0x30 > > [26406.637524][ T3548] lockdep_count_forward_deps+0x11a/0x1a0 > > [26406.664134][ T3548] ? check_noncircular+0x2e0/0x2e0 > > [26406.688191][ T3548] ? __kasan_check_read+0x11/0x20 > > [26406.713581][ T3548] ? 
check_chain_key+0x1df/0x2e0 > > [26406.738044][ T3548] ? seq_vprintf+0x4e/0xb0 > > [26406.758241][ T3548] ? seq_printf+0x9b/0xd0 > > [26406.778169][ T3548] ? seq_vprintf+0xb0/0xb0 > > [26406.798172][ T3548] l_show+0x1c4/0x380 > > [26406.816474][ T3548] ? print_name+0xb0/0xb0 > > [26406.836393][ T3548] seq_read+0x56b/0x750 > > [26406.855346][ T3548] proc_reg_read+0x1b4/0x200 > > [26406.876737][ T3548] ? proc_reg_unlocked_ioctl+0x1e0/0x1e0 > > [26406.903030][ T3548] ? check_chain_key+0x1df/0x2e0 > > [26406.926531][ T3548] ? find_held_lock+0xca/0xf0 > > [26406.948291][ T3548] __vfs_read+0x50/0xa0 > > [26406.967391][ T3548] vfs_read+0xcb/0x1e0 > > [26406.986102][ T3548] ksys_read+0xc6/0x160 > > [26407.005405][ T3548] ? kernel_write+0xc0/0xc0 > > [26407.026076][ T3548] ? do_syscall_64+0x79/0xaec > > [26407.047448][ T3548] ? do_syscall_64+0x79/0xaec > > [26407.068650][ T3548] __x64_sys_read+0x43/0x50 > > [26407.089132][ T3548] do_syscall_64+0xcc/0xaec > > [26407.109939][ T3548] ? trace_hardirqs_on_thunk+0x1a/0x1c > > [26407.134924][ T3548] ? syscall_return_slowpath+0x580/0x580 > > [26407.160854][ T3548] ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe > > [26407.188943][ T3548] ? trace_hardirqs_off_caller+0x3a/0x150 > > [26407.216692][ T3548] ? 
trace_hardirqs_off_thunk+0x1a/0x1c > > [26407.243534][ T3548] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > [26407.272720][ T3548] RIP: 0033:0x7f58ceeafd75 > > [26407.293162][ T3548] Code: fe ff ff 50 48 8d 3d 4a dc 09 00 e8 25 0e 02 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 59 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 > > [26407.386043][ T3548] RSP: 002b:00007ffc115111a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > > [26407.425283][ T3548] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f58ceeafd75 > > [26407.462717][ T3548] RDX: 0000000000020000 RSI: 00007f58cf327000 RDI: 0000000000000003 > > [26407.500428][ T3548] RBP: 00007f58cf327000 R08: 00000000ffffffff R09: 0000000000000000 > > [26407.538473][ T3548] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f58cf327000 > > [26407.575743][ T3548] R13: 0000000000000003 R14: 0000000000000fff R15: 0000000000020000 > > [26407.613112][ T3548] irq event stamp: 7161 > > [26407.632089][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > > [26407.680094][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > > [26407.727273][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > > [26407.774000][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > > [26407.814407][ T3548] ---[ end trace 1026d00df66af83e ]--- > > [26407.839742][ T3548] possible reason: unannotated irqs-off. 
> > [26407.866243][ T3548] irq event stamp: 7161 > > [26407.885407][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > > [26407.933602][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > > [26407.980432][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > > [26408.022826][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > > > > On a side note, it likely to hit another bug in next-20200305 (not such problem on 0304) where it > > will stuck during boot, but the reverting does not help there. Rebooting a few times could pass. > > > > [ 0.013514][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 > > [ 0.013514][ C0] Modules linked in: > > [ 0.013514][ C0] irq event stamp: 64186318 > > [ 0.013514][ C0] hardirqs last enabled at (64186317): [<ffffffff84c9b107>] _raw_spin_unlock_irq+0x27/0x40 > > [ 0.013514][ C0] hardirqs last disabled at (64186318): [<ffffffff84c8f384>] __schedule+0x214/0x1070 > > [ 0.013514][ C0] softirqs last enabled at (267904): [<ffffffff85000447>] __do_softirq+0x447/0x766 > > [ 0.013514][ C0] softirqs last disabled at (267897): [<ffffffff842d1f16>] irq_exit+0xd6/0xf0 > > [ 0.013514][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305+ #6 > > [ 0.013514][ C0] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > > [ 0.013514][ C0] RIP: 0010:lock_is_held_type+0x12a/0x150 > > [ 0.013514][ C0] Code: 41 0f 94 c4 65 48 8b 1c 25 40 0f 02 00 48 8d bb 74 08 00 00 e8 77 c0 28 00 c7 83 74 08 00 00 00 00 00 00 41 56 9d 48 83 c4 18 <44> 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 45 31 e4 eb c7 41 bc 01 > > [ 0.013514][ C0] RSP: 0000:ffffc9000628f9f8 EFLAGS: 00000082 > > [ 0.013514][ C0] RAX: 0000000000000000 RBX: ffff889880efc040 RCX: ffffffff8438b449 > > [ 0.013514][ C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff889880efc8b4 > > [ 0.013514][ C0] RBP: 
ffffc9000628fa20 R08: ffffed1108588a24 R09: ffffed1108588a24 > > [ 0.013514][ C0] R10: ffff888842c4511b R11: 0000000000000000 R12: 0000000000000000 > > [ 0.013514][ C0] R13: ffff889880efc908 R14: 0000000000000046 R15: 0000000000000003 > > [ 0.013514][ C0] FS: 0000000000000000(0000) GS:ffff888842c00000(0000) knlGS:0000000000000000 > > [ 0.013514][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 0.013514][ C0] CR2: ffff88a0707ff000 CR3: 0000000b72012001 CR4: 00000000001606f0 > > [ 0.013514][ C0] Call Trace: > > [ 0.013514][ C0] rcu_read_lock_sched_held+0xac/0xe0 > > lock_is_held at include/linux/lockdep.h:361 > > (inlined by) rcu_read_lock_sched_held at kernel/rcu/update.c:121 > > [ 0.013514][ C0] ? rcu_read_lock_bh_held+0xc0/0xc0 > > [ 0.013514][ C0] rcu_note_context_switcx186/0x3b0 > > [ 0.013514][ C0] __schedule+0x21f/0x1070 > > [ 0.013514][ C0] ? __sched_text_start+0x8/0x8 > > [ 0.013514][ C0] schedule+0x95/0x160 > > [ 0.013514][ C0] do_boot_cpu+0x58c/0xaf0 > > [ 0.013514][ C0] native_cpu_up+0x298/0x430 > > [ 0.013514][ C0] ? common_cpu_up+0x150/0x150 > > [ 0.013514][ C0] bringup_cpu+0x44/0x310 > > [ 0.013514][ C0] ? timers_prepare_cpu+0x114/0x190 > > [ 0.013514][ C0] ? takedown_cpu+0x2e0/0x2e0 > > [ 0.013514][ C0] cpuhp_invoke_callback+0x197/0x1120 > > [ 0.013514][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 > > [ 0.013514][ C0] _cpu_up+0x171/0x280 > > [ 0.013514][ C0] do_cpu_up+0xb1/0x120 > > [ 0.013514][ C0] cpu_up+0x13/0x20 > > [ 0.013514][ C0] smp_init+0x91/0x118 > > [ 0.013514][ C0] kernel_init_freeable+0x221/0x4f8 > > [ 0.013514][ C0] ? mark_held_locks+0x34/0xb0 > > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > > [ 0.013514][ C0] ? start_kernel+0x876/0x876 > > [ 0.013514][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 > > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > > [ 0.013514][ C0] ? rest_init+0x307/0x307 > > [ 0.013514][ C0] kernel_init+0x 0.013514][ C0] ? 
rest_init+0x307/0x307 > > [ 0.013514][ C0] ret_from_fork+0x3a/0x50
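Boqun's patch above follows the guard pattern lockdep uses elsewhere: setting current->lockdep_recursion around an internal operation makes lockdep's own self-checks return early, so an internal graph walk like __bfs() cannot trip the checker's consistency tests. The following is a minimal userspace model of that guard pattern; the names are hypothetical stand-ins, not the actual kernel implementation.

```c
#include <assert.h>

/*
 * Minimal model of the lockdep_recursion guard pattern -- hypothetical
 * names, not real kernel code.  While the per-task flag is set,
 * lockdep's self-checks (check_flags() and friends in the real code)
 * return early, so internal graph walks do not re-enter the checker.
 */

static int lockdep_recursion;	/* models current->lockdep_recursion */
static int checks_performed;	/* counts self-checks that actually ran */

static void check_flags(void)
{
	if (lockdep_recursion)	/* guard active: skip consistency checks */
		return;
	checks_performed++;
}

/* Models __bfs(): an internal walk whose helpers may check flags. */
static void bfs_walk(void)
{
	check_flags();
}

/* Models lockdep_count_forward_deps() with Boqun's proposed bracket. */
static void count_forward_deps(void)
{
	lockdep_recursion = 1;
	bfs_walk();
	lockdep_recursion = 0;
}
```

Note that, as Qian reports further down the thread, this particular bracket did not make the splat go away in testing, so the model shows only the intended mechanism, not a confirmed fix.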
* Re: PROVE_RCU_LIST + /proc/lockdep warning 2020-03-09 1:40 ` Boqun Feng @ 2020-03-10 18:42 ` Qian Cai 2020-03-11 3:30 ` Boqun Feng 0 siblings, 1 reply; 13+ messages in thread From: Qian Cai @ 2020-03-10 18:42 UTC (permalink / raw) To: Boqun Feng, Joel Fernandes Cc: Peter Zijlstra, Madhuparna Bhowmik, Paul E. McKenney, rcu, LKML, Will Deacon, Waiman Long On Mon, 2020-03-09 at 09:40 +0800, Boqun Feng wrote: > On Sat, Mar 07, 2020 at 12:16:18PM -0500, Joel Fernandes wrote: > > On Thu, Mar 05, 2020 at 11:06:10PM -0500, Qian Cai wrote: > > > Since the linux-next commit c9af03c14bfd (“Default enable RCU list lockdep debugging with PROVE_RCU”), > > > read /proc/lockdep will trigger a warning with this config below. Reverted the commit fixed the issue > > > right away. > > > > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > > > Hmm, since Peter hates the list-RCU checking patches and rejected a patch by > > Amol to fix this (;-)), the easiest way to resolve it would be to just bypass > > the check in lockdep code: > > > > Peter, this should be the last of the list-RCU changes and thank you for the > > patience. > > > > Should I or Amol send a patch for this? > > > > Hmm.. IIUC, the warning got triggered here is because > lockdep_count_forward_deps() didn't set up the ->lockdep_recursion, as a > result, __bfs() was called without ->lockdep_recursion being 1, which > introduced the inconsistency of lockdep status. So how about the > following (untested) fix?: > > Thoughts? 
> > Regards, > Boqun > > ------------------------- > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > index 32406ef0d6a2..a258640edace 100644 > --- a/kernel/locking/lockdep.c > +++ b/kernel/locking/lockdep.c > @@ -1720,7 +1720,9 @@ unsigned long lockdep_count_forward_deps(struct lock_class *class) > > raw_local_irq_save(flags); > arch_spin_lock(&lockdep_lock); > + current->lockdep_recursion = 1; > ret = __lockdep_count_forward_deps(&this); > + current->lockdep_recursion = 0; > arch_spin_unlock(&lockdep_lock); > raw_local_irq_restore(flags); This does not work. Still the same splat. > > > ---8<----------------------- > > > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > index 32406ef0d6a2d..d47643d8081b2 100644 > > --- a/kernel/locking/lockdep.c > > +++ b/kernel/locking/lockdep.c > > @@ -1493,7 +1493,7 @@ static int __bfs(struct lock_list *source_entry, > > > > DEBUG_LOCKS_WARN_ON(!irqs_disabled()); > > > > - list_for_each_entry_rcu(entry, head, entry) { > > + list_for_each_entry_rcu(entry, head, entry, true) { > > if (!lock_accessed(entry)) { > > unsigned int cq_depth; > > mark_lock_accessed(entry, lock); > > > > thanks, > > > > - Joel > > > > > > > > [26405.676199][ T3548] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) > > > [26405.676239][ T3548] WARNING: CPU: 11 PID: 3548 at kernel/locking/lockdep.c:4637 check_flags.part.28+0x218/0x220 > > > [26405.756287][ T3548] Modules linked in: kvm_intel nls_iso8859_1 nls_cp437 kvm vfat fat irqbypass intel_cstate intel_uncore intel_rapl_perf dax_pmem dax_pmem_core efivars ip_tables x_tables xfs sd_mod bnx2x hpsa mdio scsi_transport_sas firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs > > > [26405.881899][ T3548] CPU: 11 PID: 3548 Comm: cat Not tainted 5.6.0-rc4-next-20200305+ #8 > > > [26405.920091][ T3548] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > > > [26405.955370][ T3548] RIP: 0010:check_flags.part.28+0x218/0x220 > > > [26405.983016][ T3548] Code: 
13 8a e8 2b 3f 29 00 44 8b 15 84 df ba 01 45 85 d2 0f 85 c7 94 00 00 48 c7 c6 40 2b 47 89 48 c7 c7 40 04 47 89 e8 49 e3 f3 ff <0f> 0b e9 ad 94 00 00 90 55 48 89 e5 41 57 4d 89 cf 41 56 45 89 c6 > > > [26406.076147][ T3548] RSP: 0018:ffffc9000695f848 EFLAGS: 00010086 > > > [26406.104215][ T3548] RAX: 0000000000000000 RBX: ffff888fe6184040 RCX: ffffffff8858cecf > > > [26406.140856][ T3548] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: 0000000000000000 > > > [26406.178465][ T3548] RBP: ffffc9000695f850 R08: fffffbfff1377355 R09: fffffbfff1377355 > > > [26406.217995][ T3548] R10: ffffffff89bb9aa3 R11: fffffbfff1377354 R12: 0000000000000000 > > > [26406.256760][ T3548] R13: ffffffff8aa55ee0 R14: 0000000000000046 R15: ffffffff8aa55ec0 > > > [26406.293708][ T3548] FS: 00007f58cf3a3540(0000) GS:ffff88905fa80000(0000) knlGS:0000000000000000 > > > [26406.335252][ T3548] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [26406.366331][ T3548] CR2: 00007f58cf326000 CR3: 0000000f1ba38006 CR4: 00000000001606e0 > > > [26406.402408][ T3548] Call Trace: > > > [26406.416739][ T3548] lock_is_held_type+0x5d/0x150 > > > [26406.438262][ T3548] ? rcu_lockdep_current_cpu_online+0x64/0x80 > > > [26406.466463][ T3548] rcu_read_lock_any_held+0xac/0x100 > > > [26406.490105][ T3548] ? rcu_read_lock_held+0xc0/0xc0 > > > [26406.513258][ T3548] ? __slab_free+0x421/0x540 > > > [26406.535012][ T3548] ? kasan_kmalloc+0x9/0x10 > > > [26406.555901][ T3548] ? __kmalloc_node+0x1d7/0x320 > > > [26406.578668][ T3548] ? kvmalloc_node+0x6f/0x80 > > > [26406.599872][ T3548] __bfs+0x28a/0x3c0 > > > [26406.617075][ T3548] ? class_equal+0x30/0x30 > > > [26406.637524][ T3548] lockdep_count_forward_deps+0x11a/0x1a0 > > > [26406.664134][ T3548] ? check_noncircular+0x2e0/0x2e0 > > > [26406.688191][ T3548] ? __kasan_check_read+0x11/0x20 > > > [26406.713581][ T3548] ? check_chain_key+0x1df/0x2e0 > > > [26406.738044][ T3548] ? seq_vprintf+0x4e/0xb0 > > > [26406.758241][ T3548] ? 
seq_printf+0x9b/0xd0 > > > [26406.778169][ T3548] ? seq_vprintf+0xb0/0xb0 > > > [26406.798172][ T3548] l_show+0x1c4/0x380 > > > [26406.816474][ T3548] ? print_name+0xb0/0xb0 > > > [26406.836393][ T3548] seq_read+0x56b/0x750 > > > [26406.855346][ T3548] proc_reg_read+0x1b4/0x200 > > > [26406.876737][ T3548] ? proc_reg_unlocked_ioctl+0x1e0/0x1e0 > > > [26406.903030][ T3548] ? check_chain_key+0x1df/0x2e0 > > > [26406.926531][ T3548] ? find_held_lock+0xca/0xf0 > > > [26406.948291][ T3548] __vfs_read+0x50/0xa0 > > > [26406.967391][ T3548] vfs_read+0xcb/0x1e0 > > > [26406.986102][ T3548] ksys_read+0xc6/0x160 > > > [26407.005405][ T3548] ? kernel_write+0xc0/0xc0 > > > [26407.026076][ T3548] ? do_syscall_64+0x79/0xaec > > > [26407.047448][ T3548] ? do_syscall_64+0x79/0xaec > > > [26407.068650][ T3548] __x64_sys_read+0x43/0x50 > > > [26407.089132][ T3548] do_syscall_64+0xcc/0xaec > > > [26407.109939][ T3548] ? trace_hardirqs_on_thunk+0x1a/0x1c > > > [26407.134924][ T3548] ? syscall_return_slowpath+0x580/0x580 > > > [26407.160854][ T3548] ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe > > > [26407.188943][ T3548] ? trace_hardirqs_off_caller+0x3a/0x150 > > > [26407.216692][ T3548] ? 
trace_hardirqs_off_thunk+0x1a/0x1c > > > [26407.243534][ T3548] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > [26407.272720][ T3548] RIP: 0033:0x7f58ceeafd75 > > > [26407.293162][ T3548] Code: fe ff ff 50 48 8d 3d 4a dc 09 00 e8 25 0e 02 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 59 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 > > > [26407.386043][ T3548] RSP: 002b:00007ffc115111a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > > > [26407.425283][ T3548] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f58ceeafd75 > > > [26407.462717][ T3548] RDX: 0000000000020000 RSI: 00007f58cf327000 RDI: 0000000000000003 > > > [26407.500428][ T3548] RBP: 00007f58cf327000 R08: 00000000ffffffff R09: 0000000000000000 > > > [26407.538473][ T3548] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f58cf327000 > > > [26407.575743][ T3548] R13: 0000000000000003 R14: 0000000000000fff R15: 0000000000020000 > > > [26407.613112][ T3548] irq event stamp: 7161 > > > [26407.632089][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > > > [26407.680094][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > > > [26407.727273][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > > > [26407.774000][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > > > [26407.814407][ T3548] ---[ end trace 1026d00df66af83e ]--- > > > [26407.839742][ T3548] possible reason: unannotated irqs-off. 
> > > [26407.866243][ T3548] irq event stamp: 7161 > > > [26407.885407][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > > > [26407.933602][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > > > [26407.980432][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > > > [26408.022826][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > > > > > > On a side note, it likely to hit another bug in next-20200305 (not such problem on 0304) where it > > > will stuck during boot, but the reverting does not help there. Rebooting a few times could pass. > > > > > > [ 0.013514][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 > > > [ 0.013514][ C0] Modules linked in: > > > [ 0.013514][ C0] irq event stamp: 64186318 > > > [ 0.013514][ C0] hardirqs last enabled at (64186317): [<ffffffff84c9b107>] _raw_spin_unlock_irq+0x27/0x40 > > > [ 0.013514][ C0] hardirqs last disabled at (64186318): [<ffffffff84c8f384>] __schedule+0x214/0x1070 > > > [ 0.013514][ C0] softirqs last enabled at (267904): [<ffffffff85000447>] __do_softirq+0x447/0x766 > > > [ 0.013514][ C0] softirqs last disabled at (267897): [<ffffffff842d1f16>] irq_exit+0xd6/0xf0 > > > [ 0.013514][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305+ #6 > > > [ 0.013514][ C0] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > > > [ 0.013514][ C0] RIP: 0010:lock_is_held_type+0x12a/0x150 > > > [ 0.013514][ C0] Code: 41 0f 94 c4 65 48 8b 1c 25 40 0f 02 00 48 8d bb 74 08 00 00 e8 77 c0 28 00 c7 83 74 08 00 00 00 00 00 00 41 56 9d 48 83 c4 18 <44> 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 45 31 e4 eb c7 41 bc 01 > > > [ 0.013514][ C0] RSP: 0000:ffffc9000628f9f8 EFLAGS: 00000082 > > > [ 0.013514][ C0] RAX: 0000000000000000 RBX: ffff889880efc040 RCX: ffffffff8438b449 > > > [ 0.013514][ C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: 
ffff889880efc8b4 > > > [ 0.013514][ C0] RBP: ffffc9000628fa20 R08: ffffed1108588a24 R09: ffffed1108588a24 > > > [ 0.013514][ C0] R10: ffff888842c4511b R11: 0000000000000000 R12: 0000000000000000 > > > [ 0.013514][ C0] R13: ffff889880efc908 R14: 0000000000000046 R15: 0000000000000003 > > > [ 0.013514][ C0] FS: 0000000000000000(0000) GS:ffff888842c00000(0000) knlGS:0000000000000000 > > > [ 0.013514][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 0.013514][ C0] CR2: ffff88a0707ff000 CR3: 0000000b72012001 CR4: 00000000001606f0 > > > [ 0.013514][ C0] Call Trace: > > > [ 0.013514][ C0] rcu_read_lock_sched_held+0xac/0xe0 > > > lock_is_held at include/linux/lockdep.h:361 > > > (inlined by) rcu_read_lock_sched_held at kernel/rcu/update.c:121 > > > [ 0.013514][ C0] ? rcu_read_lock_bh_held+0xc0/0xc0 > > > [ 0.013514][ C0] rcu_note_context_switcx186/0x3b0 > > > [ 0.013514][ C0] __schedule+0x21f/0x1070 > > > [ 0.013514][ C0] ? __sched_text_start+0x8/0x8 > > > [ 0.013514][ C0] schedule+0x95/0x160 > > > [ 0.013514][ C0] do_boot_cpu+0x58c/0xaf0 > > > [ 0.013514][ C0] native_cpu_up+0x298/0x430 > > > [ 0.013514][ C0] ? common_cpu_up+0x150/0x150 > > > [ 0.013514][ C0] bringup_cpu+0x44/0x310 > > > [ 0.013514][ C0] ? timers_prepare_cpu+0x114/0x190 > > > [ 0.013514][ C0] ? takedown_cpu+0x2e0/0x2e0 > > > [ 0.013514][ C0] cpuhp_invoke_callback+0x197/0x1120 > > > [ 0.013514][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 > > > [ 0.013514][ C0] _cpu_up+0x171/0x280 > > > [ 0.013514][ C0] do_cpu_up+0xb1/0x120 > > > [ 0.013514][ C0] cpu_up+0x13/0x20 > > > [ 0.013514][ C0] smp_init+0x91/0x118 > > > [ 0.013514][ C0] kernel_init_freeable+0x221/0x4f8 > > > [ 0.013514][ C0] ? mark_held_locks+0x34/0xb0 > > > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > > > [ 0.013514][ C0] ? start_kernel+0x876/0x876 > > > [ 0.013514][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 > > > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > > > [ 0.013514][ C0] ? 
rest_init+0x307/0x307 > > > [ 0.013514][ C0] kernel_init+0x > > > [ 0.013514][ C0] ? rest_init+0x307/0x307 > > > [ 0.013514][ C0] ret_from_fork+0x3a/0x50 > > > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: PROVE_RCU_LIST + /proc/lockdep warning 2020-03-10 18:42 ` Qian Cai @ 2020-03-11 3:30 ` Boqun Feng 2020-03-11 13:24 ` Qian Cai 0 siblings, 1 reply; 13+ messages in thread From: Boqun Feng @ 2020-03-11 3:30 UTC (permalink / raw) To: Qian Cai Cc: Joel Fernandes, Peter Zijlstra, Madhuparna Bhowmik, Paul E. McKenney, rcu, LKML, Will Deacon, Waiman Long On Tue, Mar 10, 2020 at 02:42:24PM -0400, Qian Cai wrote: > On Mon, 2020-03-09 at 09:40 +0800, Boqun Feng wrote: > > On Sat, Mar 07, 2020 at 12:16:18PM -0500, Joel Fernandes wrote: > > > On Thu, Mar 05, 2020 at 11:06:10PM -0500, Qian Cai wrote: > > > > Since the linux-next commit c9af03c14bfd (“Default enable RCU list lockdep debugging with PROVE_RCU”), > > > > reading /proc/lockdep will trigger a warning with this config below. Reverting the commit fixed the issue > > > > right away. > > > > > > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > > > > > Hmm, since Peter hates the list-RCU checking patches and rejected a patch by > > > Amol to fix this (;-)), the easiest way to resolve it would be to just bypass > > > the check in lockdep code: > > > > > > Peter, this should be the last of the list-RCU changes and thank you for the > > > patience. > > > > > > Should I or Amol send a patch for this? > > > > > > > Hmm.. IIUC, the warning got triggered here because > > lockdep_count_forward_deps() didn't set up ->lockdep_recursion; as a > > result, __bfs() was called without ->lockdep_recursion being 1, which > > introduced an inconsistency in lockdep state. So how about the > > following (untested) fix?: > > > > Thoughts? 
> > > > Regards, > > Boqun > > > > ------------------------- > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > index 32406ef0d6a2..a258640edace 100644 > > --- a/kernel/locking/lockdep.c > > +++ b/kernel/locking/lockdep.c > > @@ -1720,7 +1720,9 @@ unsigned long lockdep_count_forward_deps(struct lock_class *class) > > > > raw_local_irq_save(flags); > > arch_spin_lock(&lockdep_lock); > > + current->lockdep_recursion = 1; > > ret = __lockdep_count_forward_deps(&this); > > + current->lockdep_recursion = 0; > > arch_spin_unlock(&lockdep_lock); > > raw_local_irq_restore(flags); > > This does not work. Still the same splat. > Hmm.. interesting, I tried this myself. I could trigger the "DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)" warning after a "cat /proc/lockdep" with the linux-next tree, and after I applied the changes I proposed above, "cat /proc/lockdep" did not trigger the same warning. Is this the typical "works on my machine" situation? ;-) Just to be clear, you mean after you added the ->lockdep_recursion pair for both lockdep_count_forward_deps() and lockdep_count_backward_deps(), the same splat ("DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)") still got triggered after you did a read on /proc/lockdep? Thanks! 
Regards, Boqun > > > > > ---8<----------------------- > > > > > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > > index 32406ef0d6a2d..d47643d8081b2 100644 > > > --- a/kernel/locking/lockdep.c > > > +++ b/kernel/locking/lockdep.c > > > @@ -1493,7 +1493,7 @@ static int __bfs(struct lock_list *source_entry, > > > > > > DEBUG_LOCKS_WARN_ON(!irqs_disabled()); > > > > > > - list_for_each_entry_rcu(entry, head, entry) { > > > + list_for_each_entry_rcu(entry, head, entry, true) { > > > if (!lock_accessed(entry)) { > > > unsigned int cq_depth; > > > mark_lock_accessed(entry, lock); > > > > > > thanks, > > > > > > - Joel > > > > > > > > > > > [26405.676199][ T3548] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) > > > > [26405.676239][ T3548] WARNING: CPU: 11 PID: 3548 at kernel/locking/lockdep.c:4637 check_flags.part.28+0x218/0x220 > > > > [26405.756287][ T3548] Modules linked in: kvm_intel nls_iso8859_1 nls_cp437 kvm vfat fat irqbypass intel_cstate intel_uncore intel_rapl_perf dax_pmem dax_pmem_core efivars ip_tables x_tables xfs sd_mod bnx2x hpsa mdio scsi_transport_sas firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs > > > > [26405.881899][ T3548] CPU: 11 PID: 3548 Comm: cat Not tainted 5.6.0-rc4-next-20200305+ #8 > > > > [26405.920091][ T3548] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > > > > [26405.955370][ T3548] RIP: 0010:check_flags.part.28+0x218/0x220 > > > > [26405.983016][ T3548] Code: 13 8a e8 2b 3f 29 00 44 8b 15 84 df ba 01 45 85 d2 0f 85 c7 94 00 00 48 c7 c6 40 2b 47 89 48 c7 c7 40 04 47 89 e8 49 e3 f3 ff <0f> 0b e9 ad 94 00 00 90 55 48 89 e5 41 57 4d 89 cf 41 56 45 89 c6 > > > > [26406.076147][ T3548] RSP: 0018:ffffc9000695f848 EFLAGS: 00010086 > > > > [26406.104215][ T3548] RAX: 0000000000000000 RBX: ffff888fe6184040 RCX: ffffffff8858cecf > > > > [26406.140856][ T3548] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: 0000000000000000 > > > > [26406.178465][ T3548] RBP: ffffc9000695f850 R08: 
fffffbfff1377355 R09: fffffbfff1377355 > > > > [26406.217995][ T3548] R10: ffffffff89bb9aa3 R11: fffffbfff1377354 R12: 0000000000000000 > > > > [26406.256760][ T3548] R13: ffffffff8aa55ee0 R14: 0000000000000046 R15: ffffffff8aa55ec0 > > > > [26406.293708][ T3548] FS: 00007f58cf3a3540(0000) GS:ffff88905fa80000(0000) knlGS:0000000000000000 > > > > [26406.335252][ T3548] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [26406.366331][ T3548] CR2: 00007f58cf326000 CR3: 0000000f1ba38006 CR4: 00000000001606e0 > > > > [26406.402408][ T3548] Call Trace: > > > > [26406.416739][ T3548] lock_is_held_type+0x5d/0x150 > > > > [26406.438262][ T3548] ? rcu_lockdep_current_cpu_online+0x64/0x80 > > > > [26406.466463][ T3548] rcu_read_lock_any_held+0xac/0x100 > > > > [26406.490105][ T3548] ? rcu_read_lock_held+0xc0/0xc0 > > > > [26406.513258][ T3548] ? __slab_free+0x421/0x540 > > > > [26406.535012][ T3548] ? kasan_kmalloc+0x9/0x10 > > > > [26406.555901][ T3548] ? __kmalloc_node+0x1d7/0x320 > > > > [26406.578668][ T3548] ? kvmalloc_node+0x6f/0x80 > > > > [26406.599872][ T3548] __bfs+0x28a/0x3c0 > > > > [26406.617075][ T3548] ? class_equal+0x30/0x30 > > > > [26406.637524][ T3548] lockdep_count_forward_deps+0x11a/0x1a0 > > > > [26406.664134][ T3548] ? check_noncircular+0x2e0/0x2e0 > > > > [26406.688191][ T3548] ? __kasan_check_read+0x11/0x20 > > > > [26406.713581][ T3548] ? check_chain_key+0x1df/0x2e0 > > > > [26406.738044][ T3548] ? seq_vprintf+0x4e/0xb0 > > > > [26406.758241][ T3548] ? seq_printf+0x9b/0xd0 > > > > [26406.778169][ T3548] ? seq_vprintf+0xb0/0xb0 > > > > [26406.798172][ T3548] l_show+0x1c4/0x380 > > > > [26406.816474][ T3548] ? print_name+0xb0/0xb0 > > > > [26406.836393][ T3548] seq_read+0x56b/0x750 > > > > [26406.855346][ T3548] proc_reg_read+0x1b4/0x200 > > > > [26406.876737][ T3548] ? proc_reg_unlocked_ioctl+0x1e0/0x1e0 > > > > [26406.903030][ T3548] ? check_chain_key+0x1df/0x2e0 > > > > [26406.926531][ T3548] ? 
find_held_lock+0xca/0xf0 > > > > [26406.948291][ T3548] __vfs_read+0x50/0xa0 > > > > [26406.967391][ T3548] vfs_read+0xcb/0x1e0 > > > > [26406.986102][ T3548] ksys_read+0xc6/0x160 > > > > [26407.005405][ T3548] ? kernel_write+0xc0/0xc0 > > > > [26407.026076][ T3548] ? do_syscall_64+0x79/0xaec > > > > [26407.047448][ T3548] ? do_syscall_64+0x79/0xaec > > > > [26407.068650][ T3548] __x64_sys_read+0x43/0x50 > > > > [26407.089132][ T3548] do_syscall_64+0xcc/0xaec > > > > [26407.109939][ T3548] ? trace_hardirqs_on_thunk+0x1a/0x1c > > > > [26407.134924][ T3548] ? syscall_return_slowpath+0x580/0x580 > > > > [26407.160854][ T3548] ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe > > > > [26407.188943][ T3548] ? trace_hardirqs_off_caller+0x3a/0x150 > > > > [26407.216692][ T3548] ? trace_hardirqs_off_thunk+0x1a/0x1c > > > > [26407.243534][ T3548] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > [26407.272720][ T3548] RIP: 0033:0x7f58ceeafd75 > > > > [26407.293162][ T3548] Code: fe ff ff 50 48 8d 3d 4a dc 09 00 e8 25 0e 02 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 59 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 > > > > [26407.386043][ T3548] RSP: 002b:00007ffc115111a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > > > > [26407.425283][ T3548] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f58ceeafd75 > > > > [26407.462717][ T3548] RDX: 0000000000020000 RSI: 00007f58cf327000 RDI: 0000000000000003 > > > > [26407.500428][ T3548] RBP: 00007f58cf327000 R08: 00000000ffffffff R09: 0000000000000000 > > > > [26407.538473][ T3548] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f58cf327000 > > > > [26407.575743][ T3548] R13: 0000000000000003 R14: 0000000000000fff R15: 0000000000020000 > > > > [26407.613112][ T3548] irq event stamp: 7161 > > > > [26407.632089][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > > > > [26407.680094][ T3548] hardirqs last disabled at (7160): 
[<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > > > > [26407.727273][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > > > > [26407.774000][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > > > > [26407.814407][ T3548] ---[ end trace 1026d00df66af83e ]--- > > > > [26407.839742][ T3548] possible reason: unannotated irqs-off. > > > > [26407.866243][ T3548] irq event stamp: 7161 > > > > [26407.885407][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > > > > [26407.933602][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > > > > [26407.980432][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > > > > [26408.022826][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > > > > > > > > On a side note, it likely to hit another bug in next-20200305 (not such problem on 0304) where it > > > > will stuck during boot, but the reverting does not help there. Rebooting a few times could pass. 
> > > > > > > > [ 0.013514][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 > > > > [ 0.013514][ C0] Modules linked in: > > > > [ 0.013514][ C0] irq event stamp: 64186318 > > > > [ 0.013514][ C0] hardirqs last enabled at (64186317): [<ffffffff84c9b107>] _raw_spin_unlock_irq+0x27/0x40 > > > > [ 0.013514][ C0] hardirqs last disabled at (64186318): [<ffffffff84c8f384>] __schedule+0x214/0x1070 > > > > [ 0.013514][ C0] softirqs last enabled at (267904): [<ffffffff85000447>] __do_softirq+0x447/0x766 > > > > [ 0.013514][ C0] softirqs last disabled at (267897): [<ffffffff842d1f16>] irq_exit+0xd6/0xf0 > > > > [ 0.013514][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305+ #6 > > > > [ 0.013514][ C0] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > > > > [ 0.013514][ C0] RIP: 0010:lock_is_held_type+0x12a/0x150 > > > > [ 0.013514][ C0] Code: 41 0f 94 c4 65 48 8b 1c 25 40 0f 02 00 48 8d bb 74 08 00 00 e8 77 c0 28 00 c7 83 74 08 00 00 00 00 00 00 41 56 9d 48 83 c4 18 <44> 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 45 31 e4 eb c7 41 bc 01 > > > > [ 0.013514][ C0] RSP: 0000:ffffc9000628f9f8 EFLAGS: 00000082 > > > > [ 0.013514][ C0] RAX: 0000000000000000 RBX: ffff889880efc040 RCX: ffffffff8438b449 > > > > [ 0.013514][ C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff889880efc8b4 > > > > [ 0.013514][ C0] RBP: ffffc9000628fa20 R08: ffffed1108588a24 R09: ffffed1108588a24 > > > > [ 0.013514][ C0] R10: ffff888842c4511b R11: 0000000000000000 R12: 0000000000000000 > > > > [ 0.013514][ C0] R13: ffff889880efc908 R14: 0000000000000046 R15: 0000000000000003 > > > > [ 0.013514][ C0] FS: 0000000000000000(0000) GS:ffff888842c00000(0000) knlGS:0000000000000000 > > > > [ 0.013514][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 0.013514][ C0] CR2: ffff88a0707ff000 CR3: 0000000b72012001 CR4: 00000000001606f0 > > > > [ 0.013514][ C0] Call Trace: > > > > [ 0.013514][ C0] rcu_read_lock_sched_held+0xac/0xe0 > > > > lock_is_held at 
include/linux/lockdep.h:361 > > > > (inlined by) rcu_read_lock_sched_held at kernel/rcu/update.c:121 > > > > [ 0.013514][ C0] ? rcu_read_lock_bh_held+0xc0/0xc0 > > > > [ 0.013514][ C0] rcu_note_context_switcx186/0x3b0 > > > > [ 0.013514][ C0] __schedule+0x21f/0x1070 > > > > [ 0.013514][ C0] ? __sched_text_start+0x8/0x8 > > > > [ 0.013514][ C0] schedule+0x95/0x160 > > > > [ 0.013514][ C0] do_boot_cpu+0x58c/0xaf0 > > > > [ 0.013514][ C0] native_cpu_up+0x298/0x430 > > > > [ 0.013514][ C0] ? common_cpu_up+0x150/0x150 > > > > [ 0.013514][ C0] bringup_cpu+0x44/0x310 > > > > [ 0.013514][ C0] ? timers_prepare_cpu+0x114/0x190 > > > > [ 0.013514][ C0] ? takedown_cpu+0x2e0/0x2e0 > > > > [ 0.013514][ C0] cpuhp_invoke_callback+0x197/0x1120 > > > > [ 0.013514][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 > > > > [ 0.013514][ C0] _cpu_up+0x171/0x280 > > > > [ 0.013514][ C0] do_cpu_up+0xb1/0x120 > > > > [ 0.013514][ C0] cpu_up+0x13/0x20 > > > > [ 0.013514][ C0] smp_init+0x91/0x118 > > > > [ 0.013514][ C0] kernel_init_freeable+0x221/0x4f8 > > > > [ 0.013514][ C0] ? mark_held_locks+0x34/0xb0 > > > > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > > > > [ 0.013514][ C0] ? start_kernel+0x876/0x876 > > > > [ 0.013514][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 > > > > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > > > > [ 0.013514][ C0] ? rest_init+0x307/0x307 > > > > [ 0.013514][ C0] kernel_init+0x 0.013514][ C0] ? rest_init+0x307/0x307 > > > > [ 0.013514][ C0] ret_from_fork+0x3a/0x50 > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: PROVE_RCU_LIST + /proc/lockdep warning 2020-03-11 3:30 ` Boqun Feng @ 2020-03-11 13:24 ` Qian Cai 2020-03-12 0:48 ` Boqun Feng 0 siblings, 1 reply; 13+ messages in thread From: Qian Cai @ 2020-03-11 13:24 UTC (permalink / raw) To: Boqun Feng Cc: Joel Fernandes, Peter Zijlstra, Madhuparna Bhowmik, Paul E. McKenney, rcu, LKML, Will Deacon, Waiman Long On Wed, 2020-03-11 at 11:30 +0800, Boqun Feng wrote: > On Tue, Mar 10, 2020 at 02:42:24PM -0400, Qian Cai wrote: > > On Mon, 2020-03-09 at 09:40 +0800, Boqun Feng wrote: > > > On Sat, Mar 07, 2020 at 12:16:18PM -0500, Joel Fernandes wrote: > > > > On Thu, Mar 05, 2020 at 11:06:10PM -0500, Qian Cai wrote: > > > > > Since the linux-next commit c9af03c14bfd (“Default enable RCU list lockdep debugging with PROVE_RCU”), > > > > > read /proc/lockdep will trigger a warning with this config below. Reverted the commit fixed the issue > > > > > right away. > > > > > > > > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > > > > > > > Hmm, since Peter hates the list-RCU checking patches and rejected a patch by > > > > Amol to fix this (;-)), the easiest way to resolve it would be to just bypass > > > > the check in lockdep code: > > > > > > > > Peter, this should be the last of the list-RCU changes and thank you for the > > > > patience. > > > > > > > > Should I or Amol send a patch for this? > > > > > > > > > > Hmm.. IIUC, the warning got triggered here is because > > > lockdep_count_forward_deps() didn't set up the ->lockdep_recursion, as a > > > result, __bfs() was called without ->lockdep_recursion being 1, which > > > introduced the inconsistency of lockdep status. So how about the > > > following (untested) fix?: > > > > > > Thoughts? 
> > > > > > Regards, > > > Boqun > > > > > > ------------------------- > > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > > index 32406ef0d6a2..a258640edace 100644 > > > --- a/kernel/locking/lockdep.c > > > +++ b/kernel/locking/lockdep.c > > > @@ -1720,7 +1720,9 @@ unsigned long lockdep_count_forward_deps(struct lock_class *class) > > > > > > raw_local_irq_save(flags); > > > arch_spin_lock(&lockdep_lock); > > > + current->lockdep_recursion = 1; > > > ret = __lockdep_count_forward_deps(&this); > > > + current->lockdep_recursion = 0; > > > arch_spin_unlock(&lockdep_lock); > > > raw_local_irq_restore(flags); > > > > This does not work. Still the same splat. > > > > Hmm.. interesting, I tried this myself. I could trigger the > "DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)" warning after a "cat > /proc/lockdep" with the linux-next tree, and after I applied the changes > I proposed above, "cat /proc/lockdep" did not trigger the same warning. > > Is this the typical "works on my machine" situation? ;-) > > Just to be clear, you mean after you added the ->lockdep_recursion pair > for both lockdep_count_forward_deps() and lockdep_count_backward_deps(), > the same splat ("DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)" still > got trigger, after you did a read on /proc/lockdep? Thanks! Ah, my fault. I missed the part for lockdep_count_backward_deps(). Anyway, it works fine here. Care to post the patch? 
> > Regards, > Boqun > > > > > > > > ---8<----------------------- > > > > > > > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > > > index 32406ef0d6a2d..d47643d8081b2 100644 > > > > --- a/kernel/locking/lockdep.c > > > > +++ b/kernel/locking/lockdep.c > > > > @@ -1493,7 +1493,7 @@ static int __bfs(struct lock_list *source_entry, > > > > > > > > DEBUG_LOCKS_WARN_ON(!irqs_disabled()); > > > > > > > > - list_for_each_entry_rcu(entry, head, entry) { > > > > + list_for_each_entry_rcu(entry, head, entry, true) { > > > > if (!lock_accessed(entry)) { > > > > unsigned int cq_depth; > > > > mark_lock_accessed(entry, lock); > > > > > > > > thanks, > > > > > > > > - Joel > > > > > > > > > > > > > > [26405.676199][ T3548] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled) > > > > > [26405.676239][ T3548] WARNING: CPU: 11 PID: 3548 at kernel/locking/lockdep.c:4637 check_flags.part.28+0x218/0x220 > > > > > [26405.756287][ T3548] Modules linked in: kvm_intel nls_iso8859_1 nls_cp437 kvm vfat fat irqbypass intel_cstate intel_uncore intel_rapl_perf dax_pmem dax_pmem_core efivars ip_tables x_tables xfs sd_mod bnx2x hpsa mdio scsi_transport_sas firmware_class dm_mirror dm_region_hash dm_log dm_mod efivarfs > > > > > [26405.881899][ T3548] CPU: 11 PID: 3548 Comm: cat Not tainted 5.6.0-rc4-next-20200305+ #8 > > > > > [26405.920091][ T3548] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > > > > > [26405.955370][ T3548] RIP: 0010:check_flags.part.28+0x218/0x220 > > > > > [26405.983016][ T3548] Code: 13 8a e8 2b 3f 29 00 44 8b 15 84 df ba 01 45 85 d2 0f 85 c7 94 00 00 48 c7 c6 40 2b 47 89 48 c7 c7 40 04 47 89 e8 49 e3 f3 ff <0f> 0b e9 ad 94 00 00 90 55 48 89 e5 41 57 4d 89 cf 41 56 45 89 c6 > > > > > [26406.076147][ T3548] RSP: 0018:ffffc9000695f848 EFLAGS: 00010086 > > > > > [26406.104215][ T3548] RAX: 0000000000000000 RBX: ffff888fe6184040 RCX: ffffffff8858cecf > > > > > [26406.140856][ T3548] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: 
0000000000000000 > > > > > [26406.178465][ T3548] RBP: ffffc9000695f850 R08: fffffbfff1377355 R09: fffffbfff1377355 > > > > > [26406.217995][ T3548] R10: ffffffff89bb9aa3 R11: fffffbfff1377354 R12: 0000000000000000 > > > > > [26406.256760][ T3548] R13: ffffffff8aa55ee0 R14: 0000000000000046 R15: ffffffff8aa55ec0 > > > > > [26406.293708][ T3548] FS: 00007f58cf3a3540(0000) GS:ffff88905fa80000(0000) knlGS:0000000000000000 > > > > > [26406.335252][ T3548] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > [26406.366331][ T3548] CR2: 00007f58cf326000 CR3: 0000000f1ba38006 CR4: 00000000001606e0 > > > > > [26406.402408][ T3548] Call Trace: > > > > > [26406.416739][ T3548] lock_is_held_type+0x5d/0x150 > > > > > [26406.438262][ T3548] ? rcu_lockdep_current_cpu_online+0x64/0x80 > > > > > [26406.466463][ T3548] rcu_read_lock_any_held+0xac/0x100 > > > > > [26406.490105][ T3548] ? rcu_read_lock_held+0xc0/0xc0 > > > > > [26406.513258][ T3548] ? __slab_free+0x421/0x540 > > > > > [26406.535012][ T3548] ? kasan_kmalloc+0x9/0x10 > > > > > [26406.555901][ T3548] ? __kmalloc_node+0x1d7/0x320 > > > > > [26406.578668][ T3548] ? kvmalloc_node+0x6f/0x80 > > > > > [26406.599872][ T3548] __bfs+0x28a/0x3c0 > > > > > [26406.617075][ T3548] ? class_equal+0x30/0x30 > > > > > [26406.637524][ T3548] lockdep_count_forward_deps+0x11a/0x1a0 > > > > > [26406.664134][ T3548] ? check_noncircular+0x2e0/0x2e0 > > > > > [26406.688191][ T3548] ? __kasan_check_read+0x11/0x20 > > > > > [26406.713581][ T3548] ? check_chain_key+0x1df/0x2e0 > > > > > [26406.738044][ T3548] ? seq_vprintf+0x4e/0xb0 > > > > > [26406.758241][ T3548] ? seq_printf+0x9b/0xd0 > > > > > [26406.778169][ T3548] ? seq_vprintf+0xb0/0xb0 > > > > > [26406.798172][ T3548] l_show+0x1c4/0x380 > > > > > [26406.816474][ T3548] ? print_name+0xb0/0xb0 > > > > > [26406.836393][ T3548] seq_read+0x56b/0x750 > > > > > [26406.855346][ T3548] proc_reg_read+0x1b4/0x200 > > > > > [26406.876737][ T3548] ? 
proc_reg_unlocked_ioctl+0x1e0/0x1e0 > > > > > [26406.903030][ T3548] ? check_chain_key+0x1df/0x2e0 > > > > > [26406.926531][ T3548] ? find_held_lock+0xca/0xf0 > > > > > [26406.948291][ T3548] __vfs_read+0x50/0xa0 > > > > > [26406.967391][ T3548] vfs_read+0xcb/0x1e0 > > > > > [26406.986102][ T3548] ksys_read+0xc6/0x160 > > > > > [26407.005405][ T3548] ? kernel_write+0xc0/0xc0 > > > > > [26407.026076][ T3548] ? do_syscall_64+0x79/0xaec > > > > > [26407.047448][ T3548] ? do_syscall_64+0x79/0xaec > > > > > [26407.068650][ T3548] __x64_sys_read+0x43/0x50 > > > > > [26407.089132][ T3548] do_syscall_64+0xcc/0xaec > > > > > [26407.109939][ T3548] ? trace_hardirqs_on_thunk+0x1a/0x1c > > > > > [26407.134924][ T3548] ? syscall_return_slowpath+0x580/0x580 > > > > > [26407.160854][ T3548] ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe > > > > > [26407.188943][ T3548] ? trace_hardirqs_off_caller+0x3a/0x150 > > > > > [26407.216692][ T3548] ? trace_hardirqs_off_thunk+0x1a/0x1c > > > > > [26407.243534][ T3548] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > > > [26407.272720][ T3548] RIP: 0033:0x7f58ceeafd75 > > > > > [26407.293162][ T3548] Code: fe ff ff 50 48 8d 3d 4a dc 09 00 e8 25 0e 02 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 a5 59 2d 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89 > > > > > [26407.386043][ T3548] RSP: 002b:00007ffc115111a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 > > > > > [26407.425283][ T3548] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f58ceeafd75 > > > > > [26407.462717][ T3548] RDX: 0000000000020000 RSI: 00007f58cf327000 RDI: 0000000000000003 > > > > > [26407.500428][ T3548] RBP: 00007f58cf327000 R08: 00000000ffffffff R09: 0000000000000000 > > > > > [26407.538473][ T3548] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f58cf327000 > > > > > [26407.575743][ T3548] R13: 0000000000000003 R14: 0000000000000fff R15: 0000000000020000 > > > > > [26407.613112][ T3548] irq event stamp: 7161 > > > > 
> [26407.632089][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > > > > > [26407.680094][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > > > > > [26407.727273][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > > > > > [26407.774000][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > > > > > [26407.814407][ T3548] ---[ end trace 1026d00df66af83e ]--- > > > > > [26407.839742][ T3548] possible reason: unannotated irqs-off. > > > > > [26407.866243][ T3548] irq event stamp: 7161 > > > > > [26407.885407][ T3548] hardirqs last enabled at (7161): [<ffffffff88ea2684>] _raw_spin_unlock_irqrestore+0x44/0x50 > > > > > [26407.933602][ T3548] hardirqs last disabled at (7160): [<ffffffff88ea2418>] _raw_spin_lock_irqsave+0x18/0x50 > > > > > [26407.980432][ T3548] softirqs last enabled at (5898): [<ffffffff89200447>] __do_softirq+0x447/0x766 > > > > > [26408.022826][ T3548] softirqs last disabled at (5889): [<ffffffff884d20e6>] irq_exit+0xd6/0xf0 > > > > > > > > > > On a side note, it likely to hit another bug in next-20200305 (not such problem on 0304) where it > > > > > will stuck during boot, but the reverting does not help there. Rebooting a few times could pass. 
> > > > > > > > > > [ 0.013514][ C0] NMI watchdog: Watchdog detected hard LOCKUP on cpu 0 > > > > > [ 0.013514][ C0] Modules linked in: > > > > > [ 0.013514][ C0] irq event stamp: 64186318 > > > > > [ 0.013514][ C0] hardirqs last enabled at (64186317): [<ffffffff84c9b107>] _raw_spin_unlock_irq+0x27/0x40 > > > > > [ 0.013514][ C0] hardirqs last disabled at (64186318): [<ffffffff84c8f384>] __schedule+0x214/0x1070 > > > > > [ 0.013514][ C0] softirqs last enabled at (267904): [<ffffffff85000447>] __do_softirq+0x447/0x766 > > > > > [ 0.013514][ C0] softirqs last disabled at (267897): [<ffffffff842d1f16>] irq_exit+0xd6/0xf0 > > > > > [ 0.013514][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc4-next-20200305+ #6 > > > > > [ 0.013514][ C0] Hardware name: HP ProLiant BL660c Gen9, BIOS I38 10/17/2018 > > > > > [ 0.013514][ C0] RIP: 0010:lock_is_held_type+0x12a/0x150 > > > > > [ 0.013514][ C0] Code: 41 0f 94 c4 65 48 8b 1c 25 40 0f 02 00 48 8d bb 74 08 00 00 e8 77 c0 28 00 c7 83 74 08 00 00 00 00 00 00 41 56 9d 48 83 c4 18 <44> 89 e0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 45 31 e4 eb c7 41 bc 01 > > > > > [ 0.013514][ C0] RSP: 0000:ffffc9000628f9f8 EFLAGS: 00000082 > > > > > [ 0.013514][ C0] RAX: 0000000000000000 RBX: ffff889880efc040 RCX: ffffffff8438b449 > > > > > [ 0.013514][ C0] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff889880efc8b4 > > > > > [ 0.013514][ C0] RBP: ffffc9000628fa20 R08: ffffed1108588a24 R09: ffffed1108588a24 > > > > > [ 0.013514][ C0] R10: ffff888842c4511b R11: 0000000000000000 R12: 0000000000000000 > > > > > [ 0.013514][ C0] R13: ffff889880efc908 R14: 0000000000000046 R15: 0000000000000003 > > > > > [ 0.013514][ C0] FS: 0000000000000000(0000) GS:ffff888842c00000(0000) knlGS:0000000000000000 > > > > > [ 0.013514][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > [ 0.013514][ C0] CR2: ffff88a0707ff000 CR3: 0000000b72012001 CR4: 00000000001606f0 > > > > > [ 0.013514][ C0] Call Trace: > > > > > [ 0.013514][ C0] 
rcu_read_lock_sched_held+0xac/0xe0 > > > > > lock_is_held at include/linux/lockdep.h:361 > > > > > (inlined by) rcu_read_lock_sched_held at kernel/rcu/update.c:121 > > > > > [ 0.013514][ C0] ? rcu_read_lock_bh_held+0xc0/0xc0 > > > > > [ 0.013514][ C0] rcu_note_context_switch+0x186/0x3b0 > > > > > [ 0.013514][ C0] __schedule+0x21f/0x1070 > > > > > [ 0.013514][ C0] ? __sched_text_start+0x8/0x8 > > > > > [ 0.013514][ C0] schedule+0x95/0x160 > > > > > [ 0.013514][ C0] do_boot_cpu+0x58c/0xaf0 > > > > > [ 0.013514][ C0] native_cpu_up+0x298/0x430 > > > > > [ 0.013514][ C0] ? common_cpu_up+0x150/0x150 > > > > > [ 0.013514][ C0] bringup_cpu+0x44/0x310 > > > > > [ 0.013514][ C0] ? timers_prepare_cpu+0x114/0x190 > > > > > [ 0.013514][ C0] ? takedown_cpu+0x2e0/0x2e0 > > > > > [ 0.013514][ C0] cpuhp_invoke_callback+0x197/0x1120 > > > > > [ 0.013514][ C0] ? ring_buffer_record_is_set_on+0x40/0x40 > > > > > [ 0.013514][ C0] _cpu_up+0x171/0x280 > > > > > [ 0.013514][ C0] do_cpu_up+0xb1/0x120 > > > > > [ 0.013514][ C0] cpu_up+0x13/0x20 > > > > > [ 0.013514][ C0] smp_init+0x91/0x118 > > > > > [ 0.013514][ C0] kernel_init_freeable+0x221/0x4f8 > > > > > [ 0.013514][ C0] ? mark_held_locks+0x34/0xb0 > > > > > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > > > > > [ 0.013514][ C0] ? start_kernel+0x876/0x876 > > > > > [ 0.013514][ C0] ? lockdep_hardirqs_on+0x1b0/0x2a0 > > > > > [ 0.013514][ C0] ? _raw_spin_unlock_irq+0x27/0x40 > > > > > [ 0.013514][ C0] ? rest_init+0x307/0x307 > > > > > [ 0.013514][ C0] kernel_init+0x > > > > > [ 0.013514][ C0] ? rest_init+0x307/0x307 > > > > > [ 0.013514][ C0] ret_from_fork+0x3a/0x50
* Re: PROVE_RCU_LIST + /proc/lockdep warning 2020-03-11 13:24 ` Qian Cai @ 2020-03-12 0:48 ` Boqun Feng 0 siblings, 0 replies; 13+ messages in thread From: Boqun Feng @ 2020-03-12 0:48 UTC (permalink / raw) To: Qian Cai Cc: Joel Fernandes, Peter Zijlstra, Madhuparna Bhowmik, Paul E. McKenney, rcu, LKML, Will Deacon, Waiman Long On Wed, Mar 11, 2020 at 09:24:50AM -0400, Qian Cai wrote: > On Wed, 2020-03-11 at 11:30 +0800, Boqun Feng wrote: > > On Tue, Mar 10, 2020 at 02:42:24PM -0400, Qian Cai wrote: > > > On Mon, 2020-03-09 at 09:40 +0800, Boqun Feng wrote: > > > > On Sat, Mar 07, 2020 at 12:16:18PM -0500, Joel Fernandes wrote: > > > > > On Thu, Mar 05, 2020 at 11:06:10PM -0500, Qian Cai wrote: > > > > > > Since the linux-next commit c9af03c14bfd (“Default enable RCU list lockdep debugging with PROVE_RCU”), > > > > > > reading /proc/lockdep will trigger a warning with this config below. Reverting the commit fixed the issue > > > > > > right away. > > > > > > > > > > > > https://raw.githubusercontent.com/cailca/linux-mm/master/x86.config > > > > > > > > > > Hmm, since Peter hates the list-RCU checking patches and rejected a patch by > > > > > Amol to fix this (;-)), the easiest way to resolve it would be to just bypass > > > > > the check in lockdep code: > > > > > > > > > > Peter, this should be the last of the list-RCU changes and thank you for the > > > > > patience. > > > > > > > > > > Should I or Amol send a patch for this? > > > > > > > > > > > > > Hmm.. IIUC, the warning got triggered here because > > > > lockdep_count_forward_deps() didn't set up ->lockdep_recursion; as a > > > > result, __bfs() was called without ->lockdep_recursion being 1, which > > > > introduced an inconsistency in lockdep's state. So how about the > > > > following (untested) fix?: > > > > > > > > Thoughts? 
> > > > > > > > Regards, > > > > Boqun > > > > > > > > ------------------------- > > > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > > > index 32406ef0d6a2..a258640edace 100644 > > > > --- a/kernel/locking/lockdep.c > > > > +++ b/kernel/locking/lockdep.c > > > > @@ -1720,7 +1720,9 @@ unsigned long lockdep_count_forward_deps(struct lock_class *class) > > > > > > > > raw_local_irq_save(flags); > > > > arch_spin_lock(&lockdep_lock); > > > > + current->lockdep_recursion = 1; > > > > ret = __lockdep_count_forward_deps(&this); > > > > + current->lockdep_recursion = 0; > > > > arch_spin_unlock(&lockdep_lock); > > > > raw_local_irq_restore(flags); > > > > > > This does not work. Still the same splat. > > > > > > > Hmm.. interesting, I tried this myself. I could trigger the > > "DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)" warning after a "cat > > /proc/lockdep" with the linux-next tree, and after I applied the changes > > I proposed above, "cat /proc/lockdep" did not trigger the same warning. > > > > Is this the typical "works on my machine" situation? ;-) > > > > Just to be clear, you mean after you added the ->lockdep_recursion pair > > for both lockdep_count_forward_deps() and lockdep_count_backward_deps(), > > the same splat ("DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)") still > > got triggered after you did a read on /proc/lockdep? Thanks! > > Ah, my fault. I missed the part for lockdep_count_backward_deps(). Anyway, it > works fine here. Care to post the patch? > OK, I will post a patch shortly. 
Regards, Boqun > > > > Regards, > > Boqun > > > > > > > > > ---8<----------------------- > > > > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c > > > > > index 32406ef0d6a2d..d47643d8081b2 100644 > > > > > --- a/kernel/locking/lockdep.c > > > > > +++ b/kernel/locking/lockdep.c > > > > > @@ -1493,7 +1493,7 @@ static int __bfs(struct lock_list *source_entry, > > > > > > > > > > DEBUG_LOCKS_WARN_ON(!irqs_disabled()); > > > > > > > > > > - list_for_each_entry_rcu(entry, head, entry) { > > > > > + list_for_each_entry_rcu(entry, head, entry, true) { > > > > > if (!lock_accessed(entry)) { > > > > > unsigned int cq_depth; > > > > > mark_lock_accessed(entry, lock); > > > > > > > > > > thanks, > > > > > - Joel
Thread overview: 13+ messages
2020-03-06  4:06 PROVE_RCU_LIST + /proc/lockdep warning Qian Cai
2020-03-07  3:33 ` Hard lockups due to "tick/common: Make tick_periodic() check for missing ticks" Qian Cai
2020-03-08 21:18 ` Waiman Long
2020-03-19 13:58 ` Qian Cai
2020-03-19 19:00 ` Waiman Long
2020-03-20  2:00 ` Stephen Rothwell
2020-03-20 11:50 ` Thomas Gleixner
2020-03-07 17:16 ` PROVE_RCU_LIST + /proc/lockdep warning Joel Fernandes
2020-03-09  1:40 ` Boqun Feng
2020-03-10 18:42 ` Qian Cai
2020-03-11  3:30 ` Boqun Feng
2020-03-11 13:24 ` Qian Cai
2020-03-12  0:48 ` Boqun Feng