* [syzbot] WARNING: suspicious RCU usage in get_timespec64
@ 2021-04-04 19:05 syzbot
2021-04-04 20:38 ` Thomas Gleixner
0 siblings, 1 reply; 11+ messages in thread
From: syzbot @ 2021-04-04 19:05 UTC (permalink / raw)
To: john.stultz, linux-kernel, sboyd, syzkaller-bugs, tglx
Hello,
syzbot found the following issue on:
HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
Unfortunately, I don't have any reproducer for this issue yet.
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
=============================
WARNING: suspicious RCU usage
5.12.0-rc5-syzkaller #0 Not tainted
-----------------------------
kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
3 locks held by syz-executor.4/8418:
#0:
ffff8880751d2b28
(
&p->pi_lock
){-.-.}-{2:2}
, at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
#1:
ffff8880b9d35258
(
&rq->lock
){-.-.}-{2:2}
, at: rq_lock kernel/sched/sched.h:1321 [inline]
, at: ttwu_queue kernel/sched/core.c:3184 [inline]
, at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
#2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
stack backtrace:
CPU: 0 PID: 8418 Comm: syz-executor.4 Not tainted 5.12.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
___might_sleep+0x266/0x2c0 kernel/sched/core.c:8294
__might_fault+0x6e/0x180 mm/memory.c:5018
_copy_from_user+0x27/0x180 lib/usercopy.c:13
copy_from_user include/linux/uaccess.h:192 [inline]
get_timespec64+0x75/0x220 kernel/time/time.c:787
__do_sys_clock_nanosleep kernel/time/posix-timers.c:1257 [inline]
__se_sys_clock_nanosleep kernel/time/posix-timers.c:1245 [inline]
__x64_sys_clock_nanosleep+0x1bb/0x430 kernel/time/posix-timers.c:1245
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x48a621
Code: 24 0c 89 3c 24 48 89 4c 24 18 e8 aa e7 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 3c 24 b8 e6 00 00 00 0f 05 <44> 89 c7 48 89 04 24 e8 e3 e7 ff ff 48 8b 04 24 eb 97 66 2e 0f 1f
RSP: 002b:00007fffe59fbd50 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
RAX: ffffffffffffffda RBX: 0000000000000294 RCX: 000000000048a621
RDX: 00007fffe59fbd90 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00007fffe59fbe2c R08: 0000000000000000 R09: 00007fffe5b8a090
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000032
R13: 000000000005717a R14: 0000000000000003 R15: 00007fffe59fbe90
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-04 19:05 [syzbot] WARNING: suspicious RCU usage in get_timespec64 syzbot
@ 2021-04-04 20:38 ` Thomas Gleixner
2021-04-04 21:40 ` Paul E. McKenney
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2021-04-04 20:38 UTC (permalink / raw)
To: syzbot
Cc: john.stultz, linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra,
Paul E. McKenney
On Sun, Apr 04 2021 at 12:05, syzbot wrote:
Cc + ...
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
>
> =============================
> WARNING: suspicious RCU usage
> 5.12.0-rc5-syzkaller #0 Not tainted
> -----------------------------
> kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 0
> 3 locks held by syz-executor.4/8418:
> #0:
> ffff8880751d2b28
> (
> &p->pi_lock
> ){-.-.}-{2:2}
> , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> #1:
> ffff8880b9d35258
> (
> &rq->lock
> ){-.-.}-{2:2}
> , at: rq_lock kernel/sched/sched.h:1321 [inline]
> , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
>
> stack backtrace:
> CPU: 0 PID: 8418 Comm: syz-executor.4 Not tainted 5.12.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:79 [inline]
> dump_stack+0x141/0x1d7 lib/dump_stack.c:120
> ___might_sleep+0x266/0x2c0 kernel/sched/core.c:8294
> __might_fault+0x6e/0x180 mm/memory.c:5018
> _copy_from_user+0x27/0x180 lib/usercopy.c:13
> copy_from_user include/linux/uaccess.h:192 [inline]
> get_timespec64+0x75/0x220 kernel/time/time.c:787
> __do_sys_clock_nanosleep kernel/time/posix-timers.c:1257 [inline]
> __se_sys_clock_nanosleep kernel/time/posix-timers.c:1245 [inline]
> __x64_sys_clock_nanosleep+0x1bb/0x430 kernel/time/posix-timers.c:1245
> do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x48a621
> Code: 24 0c 89 3c 24 48 89 4c 24 18 e8 aa e7 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 3c 24 b8 e6 00 00 00 0f 05 <44> 89 c7 48 89 04 24 e8 e3 e7 ff ff 48 8b 04 24 eb 97 66 2e 0f 1f
> RSP: 002b:00007fffe59fbd50 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
> RAX: ffffffffffffffda RBX: 0000000000000294 RCX: 000000000048a621
> RDX: 00007fffe59fbd90 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 00007fffe59fbe2c R08: 0000000000000000 R09: 00007fffe5b8a090
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000032
> R13: 000000000005717a R14: 0000000000000003 R15: 00007fffe59fbe90
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-04 20:38 ` Thomas Gleixner
@ 2021-04-04 21:40 ` Paul E. McKenney
2021-04-05 3:08 ` Matthew Wilcox
0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2021-04-04 21:40 UTC (permalink / raw)
To: Thomas Gleixner
Cc: syzbot, john.stultz, linux-kernel, sboyd, syzkaller-bugs,
Peter Zijlstra, boqun.feng, willy
On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> On Sun, Apr 04 2021 at 12:05, syzbot wrote:
>
> Cc + ...
And a couple more...
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> >
> > =============================
> > WARNING: suspicious RCU usage
> > 5.12.0-rc5-syzkaller #0 Not tainted
> > -----------------------------
> > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> >
> > other info that might help us debug this:
> >
> >
> > rcu_scheduler_active = 2, debug_locks = 0
> > 3 locks held by syz-executor.4/8418:
> > #0:
> > ffff8880751d2b28
> > (
> > &p->pi_lock
> > ){-.-.}-{2:2}
> > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > #1:
> > ffff8880b9d35258
> > (
> > &rq->lock
> > ){-.-.}-{2:2}
> > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
in that rcu_sleep_check() sees an RCU lock held, but the later call to
lockdep_print_held_locks() does not. Did something change recently that
could let the ->lockdep_depth counter get out of sync with the actual
number of locks held?
Thanx, Paul
> > stack backtrace:
> > CPU: 0 PID: 8418 Comm: syz-executor.4 Not tainted 5.12.0-rc5-syzkaller #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> > __dump_stack lib/dump_stack.c:79 [inline]
> > dump_stack+0x141/0x1d7 lib/dump_stack.c:120
> > ___might_sleep+0x266/0x2c0 kernel/sched/core.c:8294
> > __might_fault+0x6e/0x180 mm/memory.c:5018
> > _copy_from_user+0x27/0x180 lib/usercopy.c:13
> > copy_from_user include/linux/uaccess.h:192 [inline]
> > get_timespec64+0x75/0x220 kernel/time/time.c:787
> > __do_sys_clock_nanosleep kernel/time/posix-timers.c:1257 [inline]
> > __se_sys_clock_nanosleep kernel/time/posix-timers.c:1245 [inline]
> > __x64_sys_clock_nanosleep+0x1bb/0x430 kernel/time/posix-timers.c:1245
> > do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
> > entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x48a621
> > Code: 24 0c 89 3c 24 48 89 4c 24 18 e8 aa e7 ff ff 4c 8b 54 24 18 48 8b 54 24 10 41 89 c0 8b 74 24 0c 8b 3c 24 b8 e6 00 00 00 0f 05 <44> 89 c7 48 89 04 24 e8 e3 e7 ff ff 48 8b 04 24 eb 97 66 2e 0f 1f
> > RSP: 002b:00007fffe59fbd50 EFLAGS: 00000293 ORIG_RAX: 00000000000000e6
> > RAX: ffffffffffffffda RBX: 0000000000000294 RCX: 000000000048a621
> > RDX: 00007fffe59fbd90 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: 00007fffe59fbe2c R08: 0000000000000000 R09: 00007fffe5b8a090
> > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000032
> > R13: 000000000005717a R14: 0000000000000003 R15: 00007fffe59fbe90
> >
> >
> > ---
> > This report is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> >
> > syzbot will keep track of this issue. See:
> > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-04 21:40 ` Paul E. McKenney
@ 2021-04-05 3:08 ` Matthew Wilcox
2021-04-05 4:01 ` Paul E. McKenney
0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2021-04-05 3:08 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd,
syzkaller-bugs, Peter Zijlstra, boqun.feng
On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote:
> On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> > On Sun, Apr 04 2021 at 12:05, syzbot wrote:
> >
> > Cc + ...
>
> And a couple more...
>
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> > >
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.12.0-rc5-syzkaller #0 Not tainted
> > > -----------------------------
> > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> > >
> > > other info that might help us debug this:
> > >
> > >
> > > rcu_scheduler_active = 2, debug_locks = 0
> > > 3 locks held by syz-executor.4/8418:
> > > #0:
> > > ffff8880751d2b28
> > > (
> > > &p->pi_lock
> > > ){-.-.}-{2:2}
> > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > > #1:
> > > ffff8880b9d35258
> > > (
> > > &rq->lock
> > > ){-.-.}-{2:2}
> > > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
>
> This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
> in that rcu_sleep_check() sees an RCU lock held, but the later call to
> lockdep_print_held_locks() does not. Did something change recently that
> could let the ->lockdep_depth counter get out of sync with the actual
> number of locks held?
Dmitri had a different theory here:
https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-05 3:08 ` Matthew Wilcox
@ 2021-04-05 4:01 ` Paul E. McKenney
2021-04-05 4:30 ` Paul E. McKenney
0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2021-04-05 4:01 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd,
syzkaller-bugs, Peter Zijlstra, boqun.feng
On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote:
> On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote:
> > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> > > On Sun, Apr 04 2021 at 12:05, syzbot wrote:
> > >
> > > Cc + ...
> >
> > And a couple more...
> >
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > > > git tree: upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> > > >
> > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> > > >
> > > > =============================
> > > > WARNING: suspicious RCU usage
> > > > 5.12.0-rc5-syzkaller #0 Not tainted
> > > > -----------------------------
> > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> > > >
> > > > other info that might help us debug this:
> > > >
> > > >
> > > > rcu_scheduler_active = 2, debug_locks = 0
> > > > 3 locks held by syz-executor.4/8418:
> > > > #0:
> > > > ffff8880751d2b28
> > > > (
> > > > &p->pi_lock
> > > > ){-.-.}-{2:2}
> > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > > > #1:
> > > > ffff8880b9d35258
> > > > (
> > > > &rq->lock
> > > > ){-.-.}-{2:2}
> > > > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
> >
> > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
> > in that rcu_sleep_check() sees an RCU lock held, but the later call to
> > lockdep_print_held_locks() does not. Did something change recently that
> > could let the ->lockdep_depth counter get out of sync with the actual
> > number of locks held?
>
> Dmitri had a different theory here:
>
> https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ
There is always room for more than one bug. ;-)
He says "one-off false positives". I was afraid of that...
Thanx, Paul
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-05 4:01 ` Paul E. McKenney
@ 2021-04-05 4:30 ` Paul E. McKenney
2021-04-05 5:23 ` Boqun Feng
0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2021-04-05 4:30 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Thomas Gleixner, syzbot, john.stultz, linux-kernel, sboyd,
syzkaller-bugs, Peter Zijlstra, boqun.feng
On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote:
> On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote:
> > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote:
> > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote:
> > > >
> > > > Cc + ...
> > >
> > > And a couple more...
> > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > > > > git tree: upstream
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> > > > >
> > > > > =============================
> > > > > WARNING: suspicious RCU usage
> > > > > 5.12.0-rc5-syzkaller #0 Not tainted
> > > > > -----------------------------
> > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> > > > >
> > > > > other info that might help us debug this:
> > > > >
> > > > >
> > > > > rcu_scheduler_active = 2, debug_locks = 0
> > > > > 3 locks held by syz-executor.4/8418:
> > > > > #0:
> > > > > ffff8880751d2b28
> > > > > (
> > > > > &p->pi_lock
> > > > > ){-.-.}-{2:2}
> > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > > > > #1:
> > > > > ffff8880b9d35258
> > > > > (
> > > > > &rq->lock
> > > > > ){-.-.}-{2:2}
> > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
> > >
> > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
> > > in that rcu_sleep_check() sees an RCU lock held, but the later call to
> > > lockdep_print_held_locks() does not. Did something change recently that
> > > could let the ->lockdep_depth counter get out of sync with the actual
> > > number of locks held?
> >
> > Dmitri had a different theory here:
> >
> > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ
>
> There is always room for more than one bug. ;-)
>
> He says "one-off false positives". I was afraid of that...
And both the examples I have been copied on today are consistent with
debug_locks getting zeroed (e.g., via a call to __debug_locks_off())
in the midst of a call to rcu_sleep_check(). But I would expect to see
a panic or another splat if that were to happen.
Dmitry's example did have an additional splat, but I would expect the
RCU-related one to come second. Again, there is always room for more
than one bug.
On the other hand, there are a lot more callers to debug_locks_off()
than there were last I looked into this. And both of these splats
are consistent with an interrupt in the middle of rcu_sleep_check(),
and that interrupt's handler invoking debug_locks_off(), but without
printing anything to the console. Does that sequence of events ring a
bell for anyone?
If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck
debug_lockdep_rcu_enabled() after evaluating the condition, but with
a memory barrier immediately before the recheck. But I am not at all
excited by doing this on speculation. Especially given that doing
so might be covering up some other bug.
Thanx, Paul
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-05 4:30 ` Paul E. McKenney
@ 2021-04-05 5:23 ` Boqun Feng
2021-04-05 17:27 ` Paul E. McKenney
0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2021-04-05 5:23 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz,
linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra
On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote:
> On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote:
> > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote:
> > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote:
> > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote:
> > > > >
> > > > > Cc + ...
> > > >
> > > > And a couple more...
> > > >
> > > > > > Hello,
> > > > > >
> > > > > > syzbot found the following issue on:
> > > > > >
> > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > > > > > git tree: upstream
> > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> > > > > >
> > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > >
> > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> > > > > >
> > > > > > =============================
> > > > > > WARNING: suspicious RCU usage
> > > > > > 5.12.0-rc5-syzkaller #0 Not tainted
> > > > > > -----------------------------
> > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> > > > > >
> > > > > > other info that might help us debug this:
> > > > > >
> > > > > >
> > > > > > rcu_scheduler_active = 2, debug_locks = 0
> > > > > > 3 locks held by syz-executor.4/8418:
> > > > > > #0:
> > > > > > ffff8880751d2b28
> > > > > > (
> > > > > > &p->pi_lock
> > > > > > ){-.-.}-{2:2}
> > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > > > > > #1:
> > > > > > ffff8880b9d35258
> > > > > > (
> > > > > > &rq->lock
> > > > > > ){-.-.}-{2:2}
> > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
> > > >
> > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
> > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to
> > > > lockdep_print_held_locks() does not. Did something change recently that
> > > > could let the ->lockdep_depth counter get out of sync with the actual
> > > > number of locks held?
> > >
> > > Dmitri had a different theory here:
> > >
> > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ
> >
> > There is always room for more than one bug. ;-)
> >
> > He says "one-off false positives". I was afraid of that...
>
> And both the examples I have been copied on today are consistent with
> debug_locks getting zeroed (e.g., via a call to __debug_locks_off())
> in the midst of a call to rcu_sleep_check(). But I would expect to see
> a panic or another splat if that were to happen.
>
> Dmitry's example did have an additional splat, but I would expect the
> RCU-related one to come second. Again, there is always room for more
> than one bug.
>
> On the other hand, there are a lot more callers to debug_locks_off()
> than there were last I looked into this. And both of these splats
> are consistent with an interrupt in the middle of rcu_sleep_check(),
> and that interrupt's handler invoking debug_locks_off(), but without
> printing anything to the console. Does that sequence of events ring a
> bell for anyone?
>
> If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck
> debug_lockdep_rcu_enabled() after evaluating the condition, but with
> a memory barrier immediately before the recheck. But I am not at all
> excited by doing this on speculation. Especially given that doing
> so might be covering up some other bug.
>
Just check the original console log and find:
[ 356.696686][ T8418] =============================
[ 356.696692][ T8418] WARNING: suspicious RCU usage
[ 356.700193][T14782] ====================================
[ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted
[ 356.729981][ T8418] -----------------------------
[ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held!
, so there are two warnnings here, one is from lockdep_rcu_suspisous()
and the other is from print_held_locks_bug(). I think this is what
happened:
in RCU_LOCKDEP_WARN():
if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1
<interrupted>
// lockdep detects a lock bug, set debug_locks = 0
<swicth back>
&& !__warned // true
&& (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0!
the cause of the problem is that RCU_LOCKDEP_WARN() in fact read
debug_locks twice and get different values.
But if you change the ordering of two reads, probably can avoid the
problem:
First read:
lock_is_held(); // true if 1) lock is really held or 2) lockdep is off
Second read:
debug_lockdep_rcu_enabled(); // if lockdep is not off, we know
// that the first read got correct
// value, otherwise we just ignore
// the first read, because either
// there is a bug reported between
// two reads, or lockdep is already
// off when the first read happens.
So maybe something below:
Regards,
Boqun
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index bd04f722714f..d11bee5d9347 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
#define RCU_LOCKDEP_WARN(c, s) \
do { \
static bool __section(".data.unlikely") __warned; \
- if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
+ if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
__warned = true; \
lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
} \
> Thanx, Paul
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-05 5:23 ` Boqun Feng
@ 2021-04-05 17:27 ` Paul E. McKenney
2021-04-05 23:25 ` Boqun Feng
0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2021-04-05 17:27 UTC (permalink / raw)
To: Boqun Feng
Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz,
linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra
On Mon, Apr 05, 2021 at 01:23:30PM +0800, Boqun Feng wrote:
> On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote:
> > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote:
> > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote:
> > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote:
> > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote:
> > > > > >
> > > > > > Cc + ...
> > > > >
> > > > > And a couple more...
> > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > syzbot found the following issue on:
> > > > > > >
> > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > > > > > > git tree: upstream
> > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> > > > > > >
> > > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > > >
> > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> > > > > > >
> > > > > > > =============================
> > > > > > > WARNING: suspicious RCU usage
> > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted
> > > > > > > -----------------------------
> > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> > > > > > >
> > > > > > > other info that might help us debug this:
> > > > > > >
> > > > > > >
> > > > > > > rcu_scheduler_active = 2, debug_locks = 0
> > > > > > > 3 locks held by syz-executor.4/8418:
> > > > > > > #0:
> > > > > > > ffff8880751d2b28
> > > > > > > (
> > > > > > > &p->pi_lock
> > > > > > > ){-.-.}-{2:2}
> > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > > > > > > #1:
> > > > > > > ffff8880b9d35258
> > > > > > > (
> > > > > > > &rq->lock
> > > > > > > ){-.-.}-{2:2}
> > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
> > > > >
> > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
> > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to
> > > > > lockdep_print_held_locks() does not. Did something change recently that
> > > > > could let the ->lockdep_depth counter get out of sync with the actual
> > > > > number of locks held?
> > > >
> > > > Dmitri had a different theory here:
> > > >
> > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ
> > >
> > > There is always room for more than one bug. ;-)
> > >
> > > He says "one-off false positives". I was afraid of that...
> >
> > And both the examples I have been copied on today are consistent with
> > debug_locks getting zeroed (e.g., via a call to __debug_locks_off())
> > in the midst of a call to rcu_sleep_check(). But I would expect to see
> > a panic or another splat if that were to happen.
> >
> > Dmitry's example did have an additional splat, but I would expect the
> > RCU-related one to come second. Again, there is always room for more
> > than one bug.
> >
> > On the other hand, there are a lot more callers to debug_locks_off()
> > than there were last I looked into this. And both of these splats
> > are consistent with an interrupt in the middle of rcu_sleep_check(),
> > and that interrupt's handler invoking debug_locks_off(), but without
> > printing anything to the console. Does that sequence of events ring a
> > bell for anyone?
> >
> > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck
> > debug_lockdep_rcu_enabled() after evaluating the condition, but with
> > a memory barrier immediately before the recheck. But I am not at all
> > excited by doing this on speculation. Especially given that doing
> > so might be covering up some other bug.
> >
>
> Just check the original console log and find:
>
> [ 356.696686][ T8418] =============================
> [ 356.696692][ T8418] WARNING: suspicious RCU usage
> [ 356.700193][T14782] ====================================
> [ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted
> [ 356.729981][ T8418] -----------------------------
> [ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held!
>
> , so there are two warnnings here, one is from lockdep_rcu_suspisous()
> and the other is from print_held_locks_bug(). I think this is what
> happened:
>
> in RCU_LOCKDEP_WARN():
>
> if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1
> <interrupted>
> // lockdep detects a lock bug, set debug_locks = 0
> <swicth back>
> && !__warned // true
> && (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0!
>
> the cause of the problem is that RCU_LOCKDEP_WARN() in fact read
> debug_locks twice and get different values.
>
> But if you change the ordering of two reads, probably can avoid the
> problem:
>
> First read:
> lock_is_held(); // true if 1) lock is really held or 2) lockdep is off
>
> Second read:
> debug_lockdep_rcu_enabled(); // if lockdep is not off, we know
> // that the first read got correct
> // value, otherwise we just ignore
> // the first read, because either
> // there is a bug reported between
> // two reads, or lockdep is already
> // off when the first read happens.
>
> So maybe something below:
>
> Regards,
> Boqun
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index bd04f722714f..d11bee5d9347 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
> #define RCU_LOCKDEP_WARN(c, s) \
> do { \
> static bool __section(".data.unlikely") __warned; \
> - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
> + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
> __warned = true; \
> lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
> } \
Good point -- if we check debug_lockdep_rcu_enabled() after the condition,
then we will reject false positives in cases where debug_locks was switched
to zero out from under us.
However, we do need ordering. The "c" usually contains lock_is_held(),
which also checks debug_locks, but from some other translation unit.
Back in the day, the translation-unit boundaries would provide the needed
ordering, but LTO...
In addition, the "debug_locks = 0" was originally supposed to be a hint
that the report might be a false positive. It is clear that this needs
to be made explicit.
Taking all this together, how about the following? (The intent is
that the changes to lockdep_rcu_suspicious() will be in a separate
commit.)
Thanx, Paul
------------------------------------------------------------------------
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9455476..1199ffd 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
#define RCU_LOCKDEP_WARN(c, s) \
do { \
static bool __section(".data.unlikely") __warned; \
- if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
+ if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
__warned = true; \
lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
} \
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c6d0c1dc..80065cd 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -6387,6 +6387,7 @@ asmlinkage __visible void lockdep_sys_exit(void)
void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
{
struct task_struct *curr = current;
+ int dl = READ_ONCE(debug_locks);
/* Note: the following can be executed concurrently, so be careful. */
pr_warn("\n");
@@ -6396,11 +6397,12 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
pr_warn("-----------------------------\n");
pr_warn("%s:%d %s!\n", file, line, s);
pr_warn("\nother info that might help us debug this:\n\n");
- pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
+ pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s",
!rcu_lockdep_current_cpu_online()
? "RCU used illegally from offline CPU!\n"
: "",
- rcu_scheduler_active, debug_locks);
+ rcu_scheduler_active, dl,
+ dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n");
/*
* If a CPU is in the RCU-free window in idle (ie: in the section
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index b95ae86..dd94a60 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -277,7 +277,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map);
noinstr int notrace debug_lockdep_rcu_enabled(void)
{
- return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
+ return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) &&
current->lockdep_recursion == 0;
}
EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-05 17:27 ` Paul E. McKenney
@ 2021-04-05 23:25 ` Boqun Feng
2021-04-05 23:38 ` Paul E. McKenney
0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2021-04-05 23:25 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz,
linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra
On Mon, Apr 05, 2021 at 10:27:52AM -0700, Paul E. McKenney wrote:
> On Mon, Apr 05, 2021 at 01:23:30PM +0800, Boqun Feng wrote:
> > On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote:
> > > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote:
> > > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote:
> > > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote:
> > > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> > > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote:
> > > > > > >
> > > > > > > Cc + ...
> > > > > >
> > > > > > And a couple more...
> > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > syzbot found the following issue on:
> > > > > > > >
> > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > > > > > > > git tree: upstream
> > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> > > > > > > >
> > > > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > > > >
> > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> > > > > > > >
> > > > > > > > =============================
> > > > > > > > WARNING: suspicious RCU usage
> > > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted
> > > > > > > > -----------------------------
> > > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> > > > > > > >
> > > > > > > > other info that might help us debug this:
> > > > > > > >
> > > > > > > >
> > > > > > > > rcu_scheduler_active = 2, debug_locks = 0
> > > > > > > > 3 locks held by syz-executor.4/8418:
> > > > > > > > #0:
> > > > > > > > ffff8880751d2b28
> > > > > > > > (
> > > > > > > > &p->pi_lock
> > > > > > > > ){-.-.}-{2:2}
> > > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > > > > > > > #1:
> > > > > > > > ffff8880b9d35258
> > > > > > > > (
> > > > > > > > &rq->lock
> > > > > > > > ){-.-.}-{2:2}
> > > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > > > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
> > > > > >
> > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
> > > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to
> > > > > > lockdep_print_held_locks() does not. Did something change recently that
> > > > > > could let the ->lockdep_depth counter get out of sync with the actual
> > > > > > number of locks held?
> > > > >
> > > > > Dmitri had a different theory here:
> > > > >
> > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ
> > > >
> > > > There is always room for more than one bug. ;-)
> > > >
> > > > He says "one-off false positives". I was afraid of that...
> > >
> > > And both the examples I have been copied on today are consistent with
> > > debug_locks getting zeroed (e.g., via a call to __debug_locks_off())
> > > in the midst of a call to rcu_sleep_check(). But I would expect to see
> > > a panic or another splat if that were to happen.
> > >
> > > Dmitry's example did have an additional splat, but I would expect the
> > > RCU-related one to come second. Again, there is always room for more
> > > than one bug.
> > >
> > > On the other hand, there are a lot more callers to debug_locks_off()
> > > than there were last I looked into this. And both of these splats
> > > are consistent with an interrupt in the middle of rcu_sleep_check(),
> > > and that interrupt's handler invoking debug_locks_off(), but without
> > > printing anything to the console. Does that sequence of events ring a
> > > bell for anyone?
> > >
> > > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck
> > > debug_lockdep_rcu_enabled() after evaluating the condition, but with
> > > a memory barrier immediately before the recheck. But I am not at all
> > > excited by doing this on speculation. Especially given that doing
> > > so might be covering up some other bug.
> > >
> >
> > Just check the original console log and find:
> >
> > [ 356.696686][ T8418] =============================
> > [ 356.696692][ T8418] WARNING: suspicious RCU usage
> > [ 356.700193][T14782] ====================================
> > [ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted
> > [ 356.729981][ T8418] -----------------------------
> > [ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held!
> >
> > , so there are two warnnings here, one is from lockdep_rcu_suspisous()
> > and the other is from print_held_locks_bug(). I think this is what
> > happened:
> >
> > in RCU_LOCKDEP_WARN():
> >
> > if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1
> > <interrupted>
> > // lockdep detects a lock bug, set debug_locks = 0
> > <swicth back>
> > && !__warned // true
> > && (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0!
> >
> > the cause of the problem is that RCU_LOCKDEP_WARN() in fact read
> > debug_locks twice and get different values.
> >
> > But if you change the ordering of two reads, probably can avoid the
> > problem:
> >
> > First read:
> > lock_is_held(); // true if 1) lock is really held or 2) lockdep is off
> >
> > Second read:
> > debug_lockdep_rcu_enabled(); // if lockdep is not off, we know
> > // that the first read got correct
> > // value, otherwise we just ignore
> > // the first read, because either
> > // there is a bug reported between
> > // two reads, or lockdep is already
> > // off when the first read happens.
> >
> > So maybe something below:
> >
> > Regards,
> > Boqun
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index bd04f722714f..d11bee5d9347 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
> > #define RCU_LOCKDEP_WARN(c, s) \
> > do { \
> > static bool __section(".data.unlikely") __warned; \
> > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
> > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
> > __warned = true; \
> > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
> > } \
>
> Good point -- if we check debug_lockdep_rcu_enabled() after the condition,
> then we will reject false positives in cases where debug_locks was switched
> to zero out from under us.
>
> However, we do need ordering. The "c" usually contains lock_is_held(),
> which also checks debug_locks, but from some other translation unit.
> Back in the day, the translation-unit boundaries would provide the needed
> ordering, but LTO...
>
> In addition, the "debug_locks = 0" was originally supposed to be a hint
> that the report might be a false positive. It is clear that this needs
> to be made explicit.
>
> Taking all this together, how about the following? (The intent is
> that the changes to lockdep_rcu_suspicious() will be in a separate
> commit.)
>
Looks good to me ;-)
Regards,
Boqun
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 9455476..1199ffd 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
> #define RCU_LOCKDEP_WARN(c, s) \
> do { \
> static bool __section(".data.unlikely") __warned; \
> - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
> + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
> __warned = true; \
> lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
> } \
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index c6d0c1dc..80065cd 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -6387,6 +6387,7 @@ asmlinkage __visible void lockdep_sys_exit(void)
> void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> {
> struct task_struct *curr = current;
> + int dl = READ_ONCE(debug_locks);
>
> /* Note: the following can be executed concurrently, so be careful. */
> pr_warn("\n");
> @@ -6396,11 +6397,12 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> pr_warn("-----------------------------\n");
> pr_warn("%s:%d %s!\n", file, line, s);
> pr_warn("\nother info that might help us debug this:\n\n");
> - pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
> + pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s",
> !rcu_lockdep_current_cpu_online()
> ? "RCU used illegally from offline CPU!\n"
> : "",
> - rcu_scheduler_active, debug_locks);
> + rcu_scheduler_active, dl,
> + dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n");
>
> /*
> * If a CPU is in the RCU-free window in idle (ie: in the section
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index b95ae86..dd94a60 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -277,7 +277,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map);
>
> noinstr int notrace debug_lockdep_rcu_enabled(void)
> {
> - return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
> + return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) &&
> current->lockdep_recursion == 0;
> }
> EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-05 23:25 ` Boqun Feng
@ 2021-04-05 23:38 ` Paul E. McKenney
2021-04-05 23:42 ` Boqun Feng
0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2021-04-05 23:38 UTC (permalink / raw)
To: Boqun Feng
Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz,
linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra
On Tue, Apr 06, 2021 at 07:25:44AM +0800, Boqun Feng wrote:
> On Mon, Apr 05, 2021 at 10:27:52AM -0700, Paul E. McKenney wrote:
> > On Mon, Apr 05, 2021 at 01:23:30PM +0800, Boqun Feng wrote:
> > > On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote:
> > > > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote:
> > > > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote:
> > > > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote:
> > > > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> > > > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote:
> > > > > > > >
> > > > > > > > Cc + ...
> > > > > > >
> > > > > > > And a couple more...
> > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > syzbot found the following issue on:
> > > > > > > > >
> > > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > > > > > > > > git tree: upstream
> > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> > > > > > > > >
> > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > > > > >
> > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> > > > > > > > >
> > > > > > > > > =============================
> > > > > > > > > WARNING: suspicious RCU usage
> > > > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted
> > > > > > > > > -----------------------------
> > > > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> > > > > > > > >
> > > > > > > > > other info that might help us debug this:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0
> > > > > > > > > 3 locks held by syz-executor.4/8418:
> > > > > > > > > #0:
> > > > > > > > > ffff8880751d2b28
> > > > > > > > > (
> > > > > > > > > &p->pi_lock
> > > > > > > > > ){-.-.}-{2:2}
> > > > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > > > > > > > > #1:
> > > > > > > > > ffff8880b9d35258
> > > > > > > > > (
> > > > > > > > > &rq->lock
> > > > > > > > > ){-.-.}-{2:2}
> > > > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > > > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > > > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > > > > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
> > > > > > >
> > > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
> > > > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to
> > > > > > > lockdep_print_held_locks() does not. Did something change recently that
> > > > > > > could let the ->lockdep_depth counter get out of sync with the actual
> > > > > > > number of locks held?
> > > > > >
> > > > > > Dmitri had a different theory here:
> > > > > >
> > > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ
> > > > >
> > > > > There is always room for more than one bug. ;-)
> > > > >
> > > > > He says "one-off false positives". I was afraid of that...
> > > >
> > > > And both the examples I have been copied on today are consistent with
> > > > debug_locks getting zeroed (e.g., via a call to __debug_locks_off())
> > > > in the midst of a call to rcu_sleep_check(). But I would expect to see
> > > > a panic or another splat if that were to happen.
> > > >
> > > > Dmitry's example did have an additional splat, but I would expect the
> > > > RCU-related one to come second. Again, there is always room for more
> > > > than one bug.
> > > >
> > > > On the other hand, there are a lot more callers to debug_locks_off()
> > > > than there were last I looked into this. And both of these splats
> > > > are consistent with an interrupt in the middle of rcu_sleep_check(),
> > > > and that interrupt's handler invoking debug_locks_off(), but without
> > > > printing anything to the console. Does that sequence of events ring a
> > > > bell for anyone?
> > > >
> > > > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck
> > > > debug_lockdep_rcu_enabled() after evaluating the condition, but with
> > > > a memory barrier immediately before the recheck. But I am not at all
> > > > excited by doing this on speculation. Especially given that doing
> > > > so might be covering up some other bug.
> > > >
> > >
> > > Just check the original console log and find:
> > >
> > > [ 356.696686][ T8418] =============================
> > > [ 356.696692][ T8418] WARNING: suspicious RCU usage
> > > [ 356.700193][T14782] ====================================
> > > [ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted
> > > [ 356.729981][ T8418] -----------------------------
> > > [ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held!
> > >
> > > , so there are two warnnings here, one is from lockdep_rcu_suspisous()
> > > and the other is from print_held_locks_bug(). I think this is what
> > > happened:
> > >
> > > in RCU_LOCKDEP_WARN():
> > >
> > > if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1
> > > <interrupted>
> > > // lockdep detects a lock bug, set debug_locks = 0
> > > <swicth back>
> > > && !__warned // true
> > > && (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0!
> > >
> > > the cause of the problem is that RCU_LOCKDEP_WARN() in fact read
> > > debug_locks twice and get different values.
> > >
> > > But if you change the ordering of two reads, probably can avoid the
> > > problem:
> > >
> > > First read:
> > > lock_is_held(); // true if 1) lock is really held or 2) lockdep is off
> > >
> > > Second read:
> > > debug_lockdep_rcu_enabled(); // if lockdep is not off, we know
> > > // that the first read got correct
> > > // value, otherwise we just ignore
> > > // the first read, because either
> > > // there is a bug reported between
> > > // two reads, or lockdep is already
> > > // off when the first read happens.
> > >
> > > So maybe something below:
> > >
> > > Regards,
> > > Boqun
> > >
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index bd04f722714f..d11bee5d9347 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
> > > #define RCU_LOCKDEP_WARN(c, s) \
> > > do { \
> > > static bool __section(".data.unlikely") __warned; \
> > > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
> > > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
> > > __warned = true; \
> > > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
> > > } \
> >
> > Good point -- if we check debug_lockdep_rcu_enabled() after the condition,
> > then we will reject false positives in cases where debug_locks was switched
> > to zero out from under us.
> >
> > However, we do need ordering. The "c" usually contains lock_is_held(),
> > which also checks debug_locks, but from some other translation unit.
> > Back in the day, the translation-unit boundaries would provide the needed
> > ordering, but LTO...
> >
> > In addition, the "debug_locks = 0" was originally supposed to be a hint
> > that the report might be a false positive. It is clear that this needs
> > to be made explicit.
> >
> > Taking all this together, how about the following? (The intent is
> > that the changes to lockdep_rcu_suspicious() will be in a separate
> > commit.)
>
> Looks good to me ;-)
Whew! May I add your Reviewed-by?
Thanx, Paul
> Regards,
> Boqun
>
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 9455476..1199ffd 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
> > #define RCU_LOCKDEP_WARN(c, s) \
> > do { \
> > static bool __section(".data.unlikely") __warned; \
> > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
> > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
> > __warned = true; \
> > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
> > } \
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index c6d0c1dc..80065cd 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -6387,6 +6387,7 @@ asmlinkage __visible void lockdep_sys_exit(void)
> > void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> > {
> > struct task_struct *curr = current;
> > + int dl = READ_ONCE(debug_locks);
> >
> > /* Note: the following can be executed concurrently, so be careful. */
> > pr_warn("\n");
> > @@ -6396,11 +6397,12 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> > pr_warn("-----------------------------\n");
> > pr_warn("%s:%d %s!\n", file, line, s);
> > pr_warn("\nother info that might help us debug this:\n\n");
> > - pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
> > + pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s",
> > !rcu_lockdep_current_cpu_online()
> > ? "RCU used illegally from offline CPU!\n"
> > : "",
> > - rcu_scheduler_active, debug_locks);
> > + rcu_scheduler_active, dl,
> > + dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n");
> >
> > /*
> > * If a CPU is in the RCU-free window in idle (ie: in the section
> > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > index b95ae86..dd94a60 100644
> > --- a/kernel/rcu/update.c
> > +++ b/kernel/rcu/update.c
> > @@ -277,7 +277,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map);
> >
> > noinstr int notrace debug_lockdep_rcu_enabled(void)
> > {
> > - return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
> > + return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) &&
> > current->lockdep_recursion == 0;
> > }
> > EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] WARNING: suspicious RCU usage in get_timespec64
2021-04-05 23:38 ` Paul E. McKenney
@ 2021-04-05 23:42 ` Boqun Feng
0 siblings, 0 replies; 11+ messages in thread
From: Boqun Feng @ 2021-04-05 23:42 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Matthew Wilcox, Thomas Gleixner, syzbot, john.stultz,
linux-kernel, sboyd, syzkaller-bugs, Peter Zijlstra
On Mon, Apr 05, 2021 at 04:38:07PM -0700, Paul E. McKenney wrote:
> On Tue, Apr 06, 2021 at 07:25:44AM +0800, Boqun Feng wrote:
> > On Mon, Apr 05, 2021 at 10:27:52AM -0700, Paul E. McKenney wrote:
> > > On Mon, Apr 05, 2021 at 01:23:30PM +0800, Boqun Feng wrote:
> > > > On Sun, Apr 04, 2021 at 09:30:38PM -0700, Paul E. McKenney wrote:
> > > > > On Sun, Apr 04, 2021 at 09:01:25PM -0700, Paul E. McKenney wrote:
> > > > > > On Mon, Apr 05, 2021 at 04:08:55AM +0100, Matthew Wilcox wrote:
> > > > > > > On Sun, Apr 04, 2021 at 02:40:30PM -0700, Paul E. McKenney wrote:
> > > > > > > > On Sun, Apr 04, 2021 at 10:38:41PM +0200, Thomas Gleixner wrote:
> > > > > > > > > On Sun, Apr 04 2021 at 12:05, syzbot wrote:
> > > > > > > > >
> > > > > > > > > Cc + ...
> > > > > > > >
> > > > > > > > And a couple more...
> > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > syzbot found the following issue on:
> > > > > > > > > >
> > > > > > > > > > HEAD commit: 5e46d1b7 reiserfs: update reiserfs_xattrs_initialized() co..
> > > > > > > > > > git tree: upstream
> > > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1125f831d00000
> > > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=78ef1d159159890
> > > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=88e4f02896967fe1ab0d
> > > > > > > > > >
> > > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > > > > > >
> > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > > > > > Reported-by: syzbot+88e4f02896967fe1ab0d@syzkaller.appspotmail.com
> > > > > > > > > >
> > > > > > > > > > =============================
> > > > > > > > > > WARNING: suspicious RCU usage
> > > > > > > > > > 5.12.0-rc5-syzkaller #0 Not tainted
> > > > > > > > > > -----------------------------
> > > > > > > > > > kernel/sched/core.c:8294 Illegal context switch in RCU-sched read-side critical section!
> > > > > > > > > >
> > > > > > > > > > other info that might help us debug this:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > rcu_scheduler_active = 2, debug_locks = 0
> > > > > > > > > > 3 locks held by syz-executor.4/8418:
> > > > > > > > > > #0:
> > > > > > > > > > ffff8880751d2b28
> > > > > > > > > > (
> > > > > > > > > > &p->pi_lock
> > > > > > > > > > ){-.-.}-{2:2}
> > > > > > > > > > , at: try_to_wake_up+0x98/0x14a0 kernel/sched/core.c:3345
> > > > > > > > > > #1:
> > > > > > > > > > ffff8880b9d35258
> > > > > > > > > > (
> > > > > > > > > > &rq->lock
> > > > > > > > > > ){-.-.}-{2:2}
> > > > > > > > > > , at: rq_lock kernel/sched/sched.h:1321 [inline]
> > > > > > > > > > , at: ttwu_queue kernel/sched/core.c:3184 [inline]
> > > > > > > > > > , at: try_to_wake_up+0x5e6/0x14a0 kernel/sched/core.c:3464
> > > > > > > > > > #2: ffff8880b9d1f948 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_change+0x142/0x220 kernel/sched/psi.c:807
> > > > > > > >
> > > > > > > > This looks similar to syzbot+dde0cc33951735441301@syzkaller.appspotmail.com
> > > > > > > > in that rcu_sleep_check() sees an RCU lock held, but the later call to
> > > > > > > > lockdep_print_held_locks() does not. Did something change recently that
> > > > > > > > could let the ->lockdep_depth counter get out of sync with the actual
> > > > > > > > number of locks held?
> > > > > > >
> > > > > > > Dmitri had a different theory here:
> > > > > > >
> > > > > > > https://groups.google.com/g/syzkaller-bugs/c/FmYvfZCZzqA/m/nc2CXUgsAgAJ
> > > > > >
> > > > > > There is always room for more than one bug. ;-)
> > > > > >
> > > > > > He says "one-off false positives". I was afraid of that...
> > > > >
> > > > > And both the examples I have been copied on today are consistent with
> > > > > debug_locks getting zeroed (e.g., via a call to __debug_locks_off())
> > > > > in the midst of a call to rcu_sleep_check(). But I would expect to see
> > > > > a panic or another splat if that were to happen.
> > > > >
> > > > > Dmitry's example did have an additional splat, but I would expect the
> > > > > RCU-related one to come second. Again, there is always room for more
> > > > > than one bug.
> > > > >
> > > > > On the other hand, there are a lot more callers to debug_locks_off()
> > > > > than there were last I looked into this. And both of these splats
> > > > > are consistent with an interrupt in the middle of rcu_sleep_check(),
> > > > > and that interrupt's handler invoking debug_locks_off(), but without
> > > > > printing anything to the console. Does that sequence of events ring a
> > > > > bell for anyone?
> > > > >
> > > > > If this is the new normal, I could make RCU_LOCKDEP_WARN() recheck
> > > > > debug_lockdep_rcu_enabled() after evaluating the condition, but with
> > > > > a memory barrier immediately before the recheck. But I am not at all
> > > > > excited by doing this on speculation. Especially given that doing
> > > > > so might be covering up some other bug.
> > > > >
> > > >
> > > > Just check the original console log and find:
> > > >
> > > > [ 356.696686][ T8418] =============================
> > > > [ 356.696692][ T8418] WARNING: suspicious RCU usage
> > > > [ 356.700193][T14782] ====================================
> > > > [ 356.704548][ T8418] 5.12.0-rc5-syzkaller #0 Not tainted
> > > > [ 356.729981][ T8418] -----------------------------
> > > > [ 356.732473][T14782] WARNING: iou-sqp-14780/14782 still has locks held!
> > > >
> > > > , so there are two warnnings here, one is from lockdep_rcu_suspisous()
> > > > and the other is from print_held_locks_bug(). I think this is what
> > > > happened:
> > > >
> > > > in RCU_LOCKDEP_WARN():
> > > >
> > > > if (debug_lockdep_rcu_enabled() // this is true and at this time debug_locks = 1
> > > > <interrupted>
> > > > // lockdep detects a lock bug, set debug_locks = 0
> > > > <swicth back>
> > > > && !__warned // true
> > > > && (c)) // "c" is a lock_is_held(), which will always returns true if debug_locks == 0!
> > > >
> > > > the cause of the problem is that RCU_LOCKDEP_WARN() in fact read
> > > > debug_locks twice and get different values.
> > > >
> > > > But if you change the ordering of two reads, probably can avoid the
> > > > problem:
> > > >
> > > > First read:
> > > > lock_is_held(); // true if 1) lock is really held or 2) lockdep is off
> > > >
> > > > Second read:
> > > > debug_lockdep_rcu_enabled(); // if lockdep is not off, we know
> > > > // that the first read got correct
> > > > // value, otherwise we just ignore
> > > > // the first read, because either
> > > > // there is a bug reported between
> > > > // two reads, or lockdep is already
> > > > // off when the first read happens.
> > > >
> > > > So maybe something below:
> > > >
> > > > Regards,
> > > > Boqun
> > > >
> > > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > > index bd04f722714f..d11bee5d9347 100644
> > > > --- a/include/linux/rcupdate.h
> > > > +++ b/include/linux/rcupdate.h
> > > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
> > > > #define RCU_LOCKDEP_WARN(c, s) \
> > > > do { \
> > > > static bool __section(".data.unlikely") __warned; \
> > > > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
> > > > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
> > > > __warned = true; \
> > > > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
> > > > } \
> > >
> > > Good point -- if we check debug_lockdep_rcu_enabled() after the condition,
> > > then we will reject false positives in cases where debug_locks was switched
> > > to zero out from under us.
> > >
> > > However, we do need ordering. The "c" usually contains lock_is_held(),
> > > which also checks debug_locks, but from some other translation unit.
> > > Back in the day, the translation-unit boundaries would provide the needed
> > > ordering, but LTO...
> > >
> > > In addition, the "debug_locks = 0" was originally supposed to be a hint
> > > that the report might be a false positive. It is clear that this needs
> > > to be made explicit.
> > >
> > > Taking all this together, how about the following? (The intent is
> > > that the changes to lockdep_rcu_suspicious() will be in a separate
> > > commit.)
> >
> > Looks good to me ;-)
>
> Whew! May I add your Reviewed-by?
>
Of course ;-)
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Regards,
Boqun
> Thanx, Paul
>
> > Regards,
> > Boqun
> >
> > > Thanx, Paul
> > >
> > > ------------------------------------------------------------------------
> > >
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 9455476..1199ffd 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -315,7 +315,7 @@ static inline int rcu_read_lock_any_held(void)
> > > #define RCU_LOCKDEP_WARN(c, s) \
> > > do { \
> > > static bool __section(".data.unlikely") __warned; \
> > > - if (debug_lockdep_rcu_enabled() && !__warned && (c)) { \
> > > + if ((c) && debug_lockdep_rcu_enabled() && !__warned) { \
> > > __warned = true; \
> > > lockdep_rcu_suspicious(__FILE__, __LINE__, s); \
> > > } \
> > > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > > index c6d0c1dc..80065cd 100644
> > > --- a/kernel/locking/lockdep.c
> > > +++ b/kernel/locking/lockdep.c
> > > @@ -6387,6 +6387,7 @@ asmlinkage __visible void lockdep_sys_exit(void)
> > > void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> > > {
> > > struct task_struct *curr = current;
> > > + int dl = READ_ONCE(debug_locks);
> > >
> > > /* Note: the following can be executed concurrently, so be careful. */
> > > pr_warn("\n");
> > > @@ -6396,11 +6397,12 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
> > > pr_warn("-----------------------------\n");
> > > pr_warn("%s:%d %s!\n", file, line, s);
> > > pr_warn("\nother info that might help us debug this:\n\n");
> > > - pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n",
> > > + pr_warn("\n%srcu_scheduler_active = %d, debug_locks = %d\n%s",
> > > !rcu_lockdep_current_cpu_online()
> > > ? "RCU used illegally from offline CPU!\n"
> > > : "",
> > > - rcu_scheduler_active, debug_locks);
> > > + rcu_scheduler_active, dl,
> > > + dl ? "" : "Possible false positive due to lockdep disabling via debug_locks = 0\n");
> > >
> > > /*
> > > * If a CPU is in the RCU-free window in idle (ie: in the section
> > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > index b95ae86..dd94a60 100644
> > > --- a/kernel/rcu/update.c
> > > +++ b/kernel/rcu/update.c
> > > @@ -277,7 +277,7 @@ EXPORT_SYMBOL_GPL(rcu_callback_map);
> > >
> > > noinstr int notrace debug_lockdep_rcu_enabled(void)
> > > {
> > > - return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
> > > + return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && READ_ONCE(debug_locks) &&
> > > current->lockdep_recursion == 0;
> > > }
> > > EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-04-05 23:43 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-04 19:05 [syzbot] WARNING: suspicious RCU usage in get_timespec64 syzbot
2021-04-04 20:38 ` Thomas Gleixner
2021-04-04 21:40 ` Paul E. McKenney
2021-04-05 3:08 ` Matthew Wilcox
2021-04-05 4:01 ` Paul E. McKenney
2021-04-05 4:30 ` Paul E. McKenney
2021-04-05 5:23 ` Boqun Feng
2021-04-05 17:27 ` Paul E. McKenney
2021-04-05 23:25 ` Boqun Feng
2021-04-05 23:38 ` Paul E. McKenney
2021-04-05 23:42 ` Boqun Feng
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).