On Tue, Jun 23, 2020 at 08:39PM +0200, Marco Elver wrote: > On Tue, 23 Jun 2020 at 20:13, Peter Zijlstra wrote: > > > > On Tue, Jun 23, 2020 at 07:59:57PM +0200, Marco Elver wrote: > > > On Tue, Jun 23, 2020 at 06:37PM +0200, Peter Zijlstra wrote: > > > > On Tue, Jun 23, 2020 at 06:13:21PM +0200, Ahmed S. Darwish wrote: > > > > > Well, freshly merged code is using it. For example, KCSAN: > > > > > > > > > > => f1bc96210c6a ("kcsan: Make KCSAN compatible with lockdep") > > > > > => kernel/kcsan/report.c: > > > > > > > > > > void kcsan_report(...) > > > > > { > > > > > ... > > > > > /* > > > > > * With TRACE_IRQFLAGS, lockdep's IRQ trace state becomes corrupted if > > > > > * we do not turn off lockdep here; this could happen due to recursion > > > > > * into lockdep via KCSAN if we detect a race in utilities used by > > > > > * lockdep. > > > > > */ > > > > > lockdep_off(); > > > > > ... > > > > > } > > > > > > > > Marco, do you remember what exactly happened there? Because I'm about to > > > > wreck that. That is, I'm going to make TRACE_IRQFLAGS ignore > > > > lockdep_off(). > > > > > > Yeah, I was trying to squash any kind of recursion: > > > > > > lockdep -> other libs -> > > > -> KCSAN > > > -> print report > > > -> dump stack, printk and friends > > > -> lockdep -> other libs > > > -> KCSAN ... > > > > > > Some history: > > > > > > * Initial patch to fix: > > > https://lore.kernel.org/lkml/20200115162512.70807-1-elver@google.com/ > > > > That patch is weird; just :=n on lockdep.c should've cured that, the > > rest is massive overkill. > > > > > * KCSAN+lockdep+ftrace: > > > https://lore.kernel.org/lkml/20200214211035.209972-1-elver@google.com/ > > > > That doesn't really have anything useful.. > > > > > lockdep now has KCSAN_SANITIZE := n, but we still need to ensure that > > > there are no paths out of lockdep, or the IRQ flags tracing code, that > > > might lead through other libs, through KCSAN, libs used to generate a > > > report, and back to lockdep. > > > > > > I never quite figured out the exact trace that led to corruption, but > > > avoiding any kind of potential for recursion was the only thing that > > > would avoid the check_flags() warnings. > > > > Fair enough; I'll rip it all up and boot a KCSAN kernel, see what if > > anything happens. > > Thanks! > > This was happening with Qian Cai's (Cc'd) test cases. If the kernel or > this patch changed things around so this doesn't happen anymore > regardless, then I don't see a problem. I see the below report when I boot with your branch + KCSAN and PROVE_LOCKING. config attached. Trying to make sense of what's happening. Thanks, -- Marco ------ >8 ------ [ 10.182354] ------------[ cut here ]------------ [ 10.183058] WARNING: CPU: 7 PID: 136 at kernel/locking/lockdep.c:398 lockdep_hardirqs_on_prepare+0x1c6/0x270 [ 10.184347] Modules linked in: [ 10.184771] CPU: 7 PID: 136 Comm: systemd-journal Not tainted 5.8.0-rc1+ #3 [ 10.185706] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 [ 10.186821] RIP: 0010:lockdep_hardirqs_on_prepare+0x1c6/0x270 [ 10.187594] Code: 75 28 65 48 8b 04 25 28 00 00 00 48 3b 44 24 08 0f 85 b9 00 00 00 48 83 c4 10 5b 41 5e 41 5f c3 65 48 ff 05 d4 24 4e 75 eb d8 <0f> 0b 90 41 c7 86 c4 08 00 00 00 00 00 00 eb c8 e8 65 09 71 01 85 [ 10.190203] RSP: 0018:ffffa7ee802b7848 EFLAGS: 00010017 [ 10.190989] RAX: 0000000000000001 RBX: ffff955e92a34ab0 RCX: 0000000000000001 [ 10.192053] RDX: 0000000000000006 RSI: ffff955e92a34a88 RDI: ffff955e92a341c0 [ 10.193117] RBP: ffffa7ee802b7be8 R08: 0000000000000000 R09: 0000ffffffffffff [ 10.194186] R10: 0000ffffffffffff R11: 0000ffff8d07e268 R12: 0000000000000001 [ 10.195249] R13: ffffffff8e41bb10 R14: ffff955e92a341c0 R15: 0000000000000001 [ 10.196312] FS: 00007fd6862aa8c0(0000) GS:ffff955e9fd80000(0000) knlGS:0000000000000000 [ 10.197513] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 10.198373] CR2: 00007fd6837dd000 CR3: 0000000812acc001 CR4: 0000000000760ee0 [ 10.199436] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 10.200494] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 10.201554] PKRU: 55555554 [ 10.201967] Call Trace: [ 10.202348] ? _raw_spin_unlock_irqrestore+0x40/0x70 [ 10.203093] trace_hardirqs_on+0x56/0x60 <----- enter IRQ flags tracing code? [ 10.203686] _raw_spin_unlock_irqrestore+0x40/0x70 <----- take report_lock [ 10.204406] prepare_report+0x11f/0x150 [ 10.204986] kcsan_report+0xca/0x6c0 <----- generating a KCSAN report [ 10.205529] ? __this_cpu_preempt_check+0x18/0x20 [ 10.206236] ? validate_chain+0x104/0x3320 [ 10.206853] ? __lock_acquire+0x9b2/0x10b0 [ 10.207472] ? __lock_acquire+0x9b2/0x10b0 [ 10.208091] ? lock_acquire+0xd3/0x2c0 [ 10.208665] ? kernfs_iop_permission+0x3f/0x170 [ 10.209347] ? lockref_get_not_dead+0x14/0x60 [ 10.210012] ? kernfs_iop_permission+0x3f/0x170 [ 10.210698] ? __mutex_trylock_or_owner+0x9a/0xf0 [ 10.211404] ? kernfs_iop_permission+0x3f/0x170 [ 10.212083] ? __mutex_lock+0x226/0x710 [ 10.212669] kcsan_found_watchpoint+0xe5/0x110 [ 10.213334] kernfs_iop_permission+0x72/0x170 [ 10.214000] ? kernfs_evict_inode+0x40/0x40 [ 10.214632] link_path_walk+0x667/0x840 [ 10.215215] path_lookupat+0x7b/0x4b0 [ 10.215774] filename_lookup+0xe9/0x330 [ 10.216355] ? getname_flags+0x7a/0x370 [ 10.216939] ? __might_fault+0x5/0x10 [ 10.217495] ? strncpy_from_user+0x190/0x240 [ 10.218149] user_path_at_empty+0x3b/0x50 [ 10.218757] do_faccessat+0x1e8/0x430 [ 10.219313] __x64_sys_access+0x33/0x40 [ 10.219896] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 10.220679] do_syscall_64+0x65/0xa0 [ 10.221221] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 10.221987] RIP: 0033:0x7fd6855679c7 [ 10.222528] Code: Bad RIP value. [ 10.223010] RSP: 002b:00007ffcca601df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000015 [ 10.224123] RAX: ffffffffffffffda RBX: 00007ffcca601e80 RCX: 00007fd6855679c7 [ 10.225173] RDX: 00302e33303a3030 RSI: 0000000000000000 RDI: 00007ffcca601e00 [ 10.226229] RBP: 00007ffcca602f20 R08: 000000000000c0c0 R09: 0000000000000120 [ 10.227283] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcca602f40 [ 10.228339] R13: 000055878dd137c8 R14: 0000000000000018 R15: 00007ffcca601e00 [ 10.229400] irq event stamp: 63012 [ 10.229922] hardirqs last enabled at (63011): [] step_into+0x328/0xd20 [ 10.231130] hardirqs last disabled at (63012): [] _raw_spin_lock_irqsave+0x2e/0x80 [ 10.232415] softirqs last enabled at (62514): [] asm_call_on_stack+0x12/0x20 [ 10.233564] softirqs last disabled at (62327): [] asm_call_on_stack+0x12/0x20 [ 10.234718] ---[ end trace 696b39497a6233dc ]---