* INFO: rcu detected stall in corrupted (3)
@ 2019-03-29 22:34 syzbot
  2019-03-30  0:13 ` Tetsuo Handa
  0 siblings, 1 reply; 10+ messages in thread
From: syzbot @ 2019-03-29 22:34 UTC
  To: bp, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86

Hello,

syzbot found the following crash on:

HEAD commit:    8c2ffd91 Linux 5.1-rc2
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15099d2b200000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8dcdce25ea72bedf
dashboard link: https://syzkaller.appspot.com/bug?extid=65cecdd27b726c261799
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17d3c67d200000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11d4f317200000

Bisection is inconclusive: the bug happens on the oldest tested release.

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=111a358b200000
console output: https://syzkaller.appspot.com/x/log.txt?x=151a358b200000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+65cecdd27b726c261799@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-1): P8340
rcu: 	(detected by 1, t=10502 jiffies, g=5905, q=81)
syz-executor586 R  running task    27832  8340   8338 0x00000000
Call Trace:
  context_switch kernel/sched/core.c:2877 [inline]
  __schedule+0x817/0x1cc0 kernel/sched/core.c:3518
  preempt_schedule_common+0x4f/0xe0 kernel/sched/core.c:3642
  preempt_schedule+0x4b/0x60 kernel/sched/core.c:3668
  ___preempt_schedule+0x16/0x18
  __sched_setscheduler+0x12fb/0x1e70 kernel/sched/core.c:4398
  sched_setattr kernel/sched/core.c:4440 [inline]
  __do_sys_sched_setattr kernel/sched/core.c:4616 [inline]
  __se_sys_sched_setattr kernel/sched/core.c:4595 [inline]
  __x64_sys_sched_setattr+0x184/0x2b0 kernel/sched/core.c:4595
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4403c9
Code: 76 77 78 5d 20 5b 2d 6c 3c 68 6f 73 74 6c 69 73 74 3e 5d 20 5b 2d 73  
3c 64 6f 6d 61 69 6e 6c 69 73 74 3e 5d 0a 20 20 20 20 20 <20> 20 20 20 20  
20 20 20 20 20 20 5b 2d 66 3c 63 6f 6e 66 66 69 6c
RSP: 002b:00007ffea1a49298 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
RAX: ffffffffffffffda RBX: 00007ffea1a49340 RCX: 00000000004403c9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000002b80 R09: 0000000000400d10
R10: 000000000000f8f8 R11: 0000000000000246 R12: 0000000000401c90
R13: 0000000000401d20 R14: 0000000000000000 R15: 0000000000000000
syz-executor586 R  running task    27832  8340   8338 0x00000000
Call Trace:
  context_switch kernel/sched/core.c:2877 [inline]
  __schedule+0x817/0x1cc0 kernel/sched/core.c:3518
  preempt_schedule_common+0x4f/0xe0 kernel/sched/core.c:3642
  preempt_schedule+0x4b/0x60 kernel/sched/core.c:3668
  ___preempt_schedule+0x16/0x18
  __sched_setscheduler+0x12fb/0x1e70 kernel/sched/core.c:4398
  sched_setattr kernel/sched/core.c:4440 [inline]
  __do_sys_sched_setattr kernel/sched/core.c:4616 [inline]
  __se_sys_sched_setattr kernel/sched/core.c:4595 [inline]
  __x64_sys_sched_setattr+0x184/0x2b0 kernel/sched/core.c:4595
  do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4403c9
Code: 76 77 78 5d 20 5b 2d 6c 3c 68 6f 73 74 6c 69 73 74 3e 5d 20 5b 2d 73  
3c 64 6f 6d 61 69 6e 6c 69 73 74 3e 5d 0a 20 20 20 20 20 <20> 20 20 20 20  
20 20 20 20 20 20 5b 2d 66 3c 63 6f 6e 66 66 69 6c
RSP: 002b:00007ffea1a49298 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
RAX: ffffffffffffffda RBX: 00007ffea1a49340 RCX: 00000000004403c9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000002b80 R09: 0000000000400d10
R10: 000000000000f8f8 R11: 0000000000000246 R12: 0000000000401c90
R13: 0000000000401d20 R14: 0000000000000000 R15: 0000000000000000


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


* Re: INFO: rcu detected stall in corrupted (3)
  2019-03-29 22:34 INFO: rcu detected stall in corrupted (3) syzbot
@ 2019-03-30  0:13 ` Tetsuo Handa
  2019-03-30  7:46   ` Thomas Gleixner
  0 siblings, 1 reply; 10+ messages in thread
From: Tetsuo Handa @ 2019-03-30  0:13 UTC
  To: syzbot, syzkaller-bugs; +Cc: bp, hpa, linux-kernel, luto, mingo, tglx, x86

On 2019/03/30 7:34, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    8c2ffd91 Linux 5.1-rc2
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=15099d2b200000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=8dcdce25ea72bedf
> dashboard link: https://syzkaller.appspot.com/bug?extid=65cecdd27b726c261799
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17d3c67d200000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11d4f317200000
> 
> Bisection is inconclusive: the bug happens on the oldest tested release.

This reproducer calls sched_setattr(SCHED_DEADLINE) with a bogus value, just as
the reproducer for "INFO: rcu detected stall in sys_sendfile64" did.

  sched_setattr(0, {size=0, sched_policy=0x6 /* SCHED_DEADLINE */, sched_flags=0, sched_nice=0, sched_priority=0, sched_runtime=65535, sched_deadline=4611686018427453437, sched_period=0}, 0) = 0

#syz invalid
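
To put the reproducer's parameters in perspective: the sched_attr time fields are expressed in nanoseconds, so the values in the strace line can be converted to everyday units. The following is a minimal sketch in plain userspace C. The macro and helper names (REPRO_RUNTIME_NS, REPRO_DEADLINE_NS, ns_to_years, msb_set) are made up for illustration and are not kernel identifiers. It only does arithmetic and does not call sched_setattr().

```c
#include <stdint.h>

/* Values copied from the strace line above; both are in nanoseconds. */
#define REPRO_RUNTIME_NS  65535ULL
#define REPRO_DEADLINE_NS 4611686018427453437ULL

#define NSEC_PER_SEC 1000000000ULL
/* Non-leap year; good enough for a rough figure. */
#define SEC_PER_YEAR (365ULL * 24 * 60 * 60)

/* Number of whole years covered by ns nanoseconds. */
static uint64_t ns_to_years(uint64_t ns)
{
    return ns / NSEC_PER_SEC / SEC_PER_YEAR;
}

/* Nonzero if the most significant bit (bit 63) of v is set. */
static int msb_set(uint64_t v)
{
    return (v >> 63) != 0;
}
```

With these values, ns_to_years(REPRO_DEADLINE_NS) comes out at 146, against a runtime of roughly 65 microseconds. msb_set(REPRO_DEADLINE_NS) is false, so a sanity check that only rejects values with bit 63 set would let this deadline through.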


* Re: INFO: rcu detected stall in corrupted (3)
  2019-03-30  0:13 ` Tetsuo Handa
@ 2019-03-30  7:46   ` Thomas Gleixner
  2019-03-30 10:40     ` Tetsuo Handa
  0 siblings, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2019-03-30  7:46 UTC
  To: Tetsuo Handa
  Cc: syzbot, syzkaller-bugs, Borislav Petkov, H. Peter Anvin, LKML,
	Andy Lutomirski, mingo, x86, Peter Zijlstra, Juri Lelli

On Sat, 30 Mar 2019, Tetsuo Handa wrote:

> On 2019/03/30 7:34, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following crash on:
> > 
> > HEAD commit:    8c2ffd91 Linux 5.1-rc2
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15099d2b200000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=8dcdce25ea72bedf
> > dashboard link: https://syzkaller.appspot.com/bug?extid=65cecdd27b726c261799
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17d3c67d200000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11d4f317200000
> > 
> > Bisection is inconclusive: the bug happens on the oldest tested release.
> 
> This reproducer calls sched_setattr(SCHED_DEADLINE) with a bogus value, just as
> the reproducer for "INFO: rcu detected stall in sys_sendfile64" did.
> 
> sched_setattr(0, {size=0, sched_policy=0x6 /* SCHED_DEADLINE */,
> sched_flags=0, sched_nice=0, sched_priority=0, sched_runtime=65535,
> sched_deadline=4611686018427453437, sched_period=0}, 0) = 0
>
> #syz invalid

Marking this invalid is not really the right thing to do. Bogus deadline
parameters should not cause RCU stalls. They either need to be rejected or
handled gracefully.

Thanks,

	tglx


* Re: INFO: rcu detected stall in corrupted (3)
  2019-03-30  7:46   ` Thomas Gleixner
@ 2019-03-30 10:40     ` Tetsuo Handa
  2019-03-30 10:45       ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Tetsuo Handa @ 2019-03-30 10:40 UTC
  To: Thomas Gleixner
  Cc: syzbot, syzkaller-bugs, Borislav Petkov, H. Peter Anvin, LKML,
	Andy Lutomirski, mingo, x86, Peter Zijlstra, Juri Lelli

On 2019/03/30 16:46, Thomas Gleixner wrote:
> On Sat, 30 Mar 2019, Tetsuo Handa wrote:
>> This reproducer calls sched_setattr(SCHED_DEADLINE) with a bogus value, just as
>> the reproducer for "INFO: rcu detected stall in sys_sendfile64" did.
>>
>> sched_setattr(0, {size=0, sched_policy=0x6 /* SCHED_DEADLINE */,
>> sched_flags=0, sched_nice=0, sched_priority=0, sched_runtime=65535,
>> sched_deadline=4611686018427453437, sched_period=0}, 0) = 0
>>
>> #syz invalid
> 
> Marking this invalid is not really the right thing to do. Bogus deadline
> parameters should not cause RCU stalls. They either need to be rejected or
> handled gracefully.

But how can the scheduler be aware of various watchdogs' thresholds?

Should the scheduler behave differently depending on a watchdog's remaining
grace period? That sounds quite strange. If an administrator tunes watchdog
thresholds in a way the scheduler can't survive (or vice versa), that must be
the administrator's fault.

Since this kind of stall can occur with any such combination, not closing this
kind of report will result in a flood of stall reports...


* Re: INFO: rcu detected stall in corrupted (3)
  2019-03-30 10:40     ` Tetsuo Handa
@ 2019-03-30 10:45       ` Borislav Petkov
  2019-03-30 10:57         ` Tetsuo Handa
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2019-03-30 10:45 UTC
  To: Tetsuo Handa
  Cc: Thomas Gleixner, syzbot, syzkaller-bugs, H. Peter Anvin, LKML,
	Andy Lutomirski, mingo, x86, Peter Zijlstra, Juri Lelli

On Sat, Mar 30, 2019 at 07:40:11PM +0900, Tetsuo Handa wrote:
> But how can the scheduler be aware of various watchdogs' thresholds?

I think what tglx means is sched_setattr() should be fixed to fail due
to the bogus value.
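
One way to picture "fail due to the bogus value" is a sanity check that runs before the parameters are accepted. The sketch below is userspace C, not kernel code: struct dl_params, dl_params_plausible() and HYPOTHETICAL_MAX_PERIOD_NS are names invented here, and the 4-second cap is an arbitrary placeholder, since the thread does not agree on a threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the three time fields discussed in the thread, in nanoseconds.
 * This is an illustrative struct, not the kernel's struct sched_attr. */
struct dl_params {
    uint64_t runtime_ns;
    uint64_t deadline_ns;
    uint64_t period_ns;   /* 0 means "use the deadline as the period" */
};

/* Hypothetical upper bound on the period; the value is a placeholder. */
#define HYPOTHETICAL_MAX_PERIOD_NS (4ULL * 1000000000ULL)

/* Returns true if the parameters look sane under the assumed rules. */
static bool dl_params_plausible(const struct dl_params *p)
{
    uint64_t period = p->period_ns ? p->period_ns : p->deadline_ns;

    /* A task needs some runtime and a real deadline. */
    if (p->runtime_ns == 0 || p->deadline_ns == 0)
        return false;

    /* Ordering: runtime <= deadline <= period. */
    if (p->runtime_ns > p->deadline_ns || p->deadline_ns > period)
        return false;

    /* Assumed extra rule: reject absurdly long periods. */
    if (period > HYPOTHETICAL_MAX_PERIOD_NS)
        return false;

    return true;
}
```

Under these assumptions the reproducer's values (runtime 65535, deadline 4611686018427453437, period 0) satisfy the ordering rule and are rejected only by the period cap, which is why the choice of threshold matters.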

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: INFO: rcu detected stall in corrupted (3)
  2019-03-30 10:45       ` Borislav Petkov
@ 2019-03-30 10:57         ` Tetsuo Handa
  2019-03-30 11:09           ` Borislav Petkov
  0 siblings, 1 reply; 10+ messages in thread
From: Tetsuo Handa @ 2019-03-30 10:57 UTC
  To: Borislav Petkov
  Cc: Thomas Gleixner, syzbot, syzkaller-bugs, H. Peter Anvin, LKML,
	Andy Lutomirski, mingo, x86, Peter Zijlstra, Juri Lelli

On 2019/03/30 19:45, Borislav Petkov wrote:
> On Sat, Mar 30, 2019 at 07:40:11PM +0900, Tetsuo Handa wrote:
>> But how can the scheduler be aware of various watchdogs' thresholds?
> 
> I think what tglx means is sched_setattr() should be fixed to fail due
> to the bogus value.
> 

Yes. But what should such a threshold be? 0.1 second? 1 second? 10 seconds?
Can we find a threshold that everyone can agree on?


* Re: INFO: rcu detected stall in corrupted (3)
  2019-03-30 10:57         ` Tetsuo Handa
@ 2019-03-30 11:09           ` Borislav Petkov
  2019-03-30 14:07             ` Tetsuo Handa
  2019-04-01  6:42             ` Juri Lelli
  0 siblings, 2 replies; 10+ messages in thread
From: Borislav Petkov @ 2019-03-30 11:09 UTC
  To: Tetsuo Handa
  Cc: Thomas Gleixner, syzbot, syzkaller-bugs, H. Peter Anvin, LKML,
	Andy Lutomirski, mingo, x86, Peter Zijlstra, Juri Lelli

On Sat, Mar 30, 2019 at 07:57:50PM +0900, Tetsuo Handa wrote:
> Yes. But what should such a threshold be? 0.1 second? 1 second? 10 seconds?
> Can we find a threshold that everyone can agree on?

This is what we do all day on lkml: discussing changes so that (almost)
everyone is happy with them.

:-)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: INFO: rcu detected stall in corrupted (3)
  2019-03-30 11:09           ` Borislav Petkov
@ 2019-03-30 14:07             ` Tetsuo Handa
  2019-03-30 19:05               ` Borislav Petkov
  2019-04-01  6:42             ` Juri Lelli
  1 sibling, 1 reply; 10+ messages in thread
From: Tetsuo Handa @ 2019-03-30 14:07 UTC
  To: Borislav Petkov
  Cc: Thomas Gleixner, syzbot, syzkaller-bugs, H. Peter Anvin, LKML,
	Andy Lutomirski, mingo, x86, Peter Zijlstra, Juri Lelli

On 2019/03/30 20:09, Borislav Petkov wrote:
> On Sat, Mar 30, 2019 at 07:57:50PM +0900, Tetsuo Handa wrote:
>> Yes. But what should such a threshold be? 0.1 second? 1 second? 10 seconds?
>> Can we find a threshold that everyone can agree on?
> 
> This is what we do all day on lkml: discussing changes so that (almost)
> everyone is happy with them.
> 
> :-)
> 

I think that syzbot should for now refrain from testing syscalls that change
scheduling-related attributes, because mixing stall reports caused by changes
to scheduling-related attributes with unrelated stall reports caused by, e.g.,
an (almost) infinite loop due to a race condition is annoying.


* Re: INFO: rcu detected stall in corrupted (3)
  2019-03-30 14:07             ` Tetsuo Handa
@ 2019-03-30 19:05               ` Borislav Petkov
  0 siblings, 0 replies; 10+ messages in thread
From: Borislav Petkov @ 2019-03-30 19:05 UTC
  To: Tetsuo Handa
  Cc: Thomas Gleixner, syzbot, syzkaller-bugs, H. Peter Anvin, LKML,
	Andy Lutomirski, mingo, x86, Peter Zijlstra, Juri Lelli

On Sat, Mar 30, 2019 at 11:07:40PM +0900, Tetsuo Handa wrote:
> I think that syzbot should for now refrain from testing syscalls that change
> scheduling-related attributes,

And how would we know about problems there, otherwise?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: INFO: rcu detected stall in corrupted (3)
  2019-03-30 11:09           ` Borislav Petkov
  2019-03-30 14:07             ` Tetsuo Handa
@ 2019-04-01  6:42             ` Juri Lelli
  1 sibling, 0 replies; 10+ messages in thread
From: Juri Lelli @ 2019-04-01  6:42 UTC
  To: Borislav Petkov
  Cc: Tetsuo Handa, Thomas Gleixner, syzbot, syzkaller-bugs,
	H. Peter Anvin, LKML, Andy Lutomirski, mingo, x86,
	Peter Zijlstra

Hi,

On 30/03/19 12:09, Borislav Petkov wrote:
> On Sat, Mar 30, 2019 at 07:57:50PM +0900, Tetsuo Handa wrote:
> > Yes. But what should such a threshold be? 0.1 second? 1 second? 10 seconds?
> > Can we find a threshold that everyone can agree on?
> 
> This is what we do all day on lkml: discussing changes so that (almost)
> everyone is happy with them.

This looks like the same problem we discussed a while ago [1], when we
couldn't reach an agreement on what's best to do about it.

I'll need to go back and refresh my memory.

Thanks,

- Juri

1 - https://lore.kernel.org/lkml/000000000000a4ee200578172fde@google.com/
