All of lore.kernel.org
 help / color / mirror / Atom feed
* WARNING in enqueue_task_dl
@ 2018-11-18 18:49 syzbot
  2018-11-19  8:23 ` Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: syzbot @ 2018-11-18 18:49 UTC (permalink / raw)
  To: bp, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86

Hello,

syzbot found the following crash on:

HEAD commit:    1ce80e0fe98e Merge tag 'fsnotify_for_v4.20-rc3' of git://g..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14ddbb0b400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d86f24333880b605
dashboard link: https://syzkaller.appspot.com/bug?extid=119ba87189432ead09b4
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13e9e015400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
8021q: adding VLAN 0 to HW filter on device team0
hrtimer: interrupt took 33411 ns
sched: DL replenish lagged too much
WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628  
enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
PM: Basic memory bitmaps freed
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x39d lib/dump_stack.c:113
  panic+0x2ad/0x55c kernel/panic.c:188
  __warn.cold.8+0x20/0x45 kernel/panic.c:540
  report_bug+0x254/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
  do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe  
ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1  
ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
  enqueue_task+0x184/0x390 kernel/sched/core.c:730
  __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
  sched_setattr kernel/sched/core.c:4394 [inline]
  __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
  __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
  __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff

======================================================
WARNING: possible circular locking dependency detected
4.20.0-rc2+ #338 Not tainted
------------------------------------------------------
syz-executor0/6351 is trying to acquire lock:
00000000b2b97155 ((console_sem).lock){-.-.}, at: down_trylock+0x13/0x70  
kernel/locking/semaphore.c:136

but task is already holding lock:
000000004cd5557e (&rq->lock){-.-.}, at: task_rq_lock+0xc5/0x2a0  
kernel/sched/core.c:99

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&rq->lock){-.-.}:
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x2d/0x40 kernel/locking/spinlock.c:144
        rq_lock kernel/sched/sched.h:1126 [inline]
        task_fork_fair+0xb0/0x6d0 kernel/sched/fair.c:9768
        sched_fork+0x443/0xba0 kernel/sched/core.c:2359
        copy_process+0x25b8/0x87a0 kernel/fork.c:1887
        _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
        kernel_thread+0x34/0x40 kernel/fork.c:2275
        rest_init+0x28/0x372 init/main.c:409
        arch_call_rest_init+0xe/0x1b
        start_kernel+0x9f0/0xa2b init/main.c:745
        x86_64_start_reservations+0x2e/0x30 arch/x86/kernel/head64.c:472
        x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:451
        secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

-> #1 (&p->pi_lock){-.-.}:
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
        try_to_wake_up+0xdc/0x1490 kernel/sched/core.c:1965
        wake_up_process+0x10/0x20 kernel/sched/core.c:2129
        __up.isra.1+0x1c0/0x2a0 kernel/locking/semaphore.c:262
        up+0x13c/0x1c0 kernel/locking/semaphore.c:187
        __up_console_sem+0xbe/0x1b0 kernel/printk/printk.c:236
        console_unlock+0x811/0x1190 kernel/printk/printk.c:2432
        do_con_write+0x1356/0x23b0 drivers/tty/vt/vt.c:2767
        con_write+0x25/0xc0 drivers/tty/vt/vt.c:3116
        process_output_block drivers/tty/n_tty.c:593 [inline]
        n_tty_write+0x6c1/0x11a0 drivers/tty/n_tty.c:2331
        do_tty_write drivers/tty/tty_io.c:959 [inline]
        tty_write+0x3f1/0x880 drivers/tty/tty_io.c:1043
        __vfs_write+0x119/0x9f0 fs/read_write.c:485
        vfs_write+0x1fc/0x560 fs/read_write.c:549
        ksys_write+0x101/0x260 fs/read_write.c:598
        __do_sys_write fs/read_write.c:610 [inline]
        __se_sys_write fs/read_write.c:607 [inline]
        __x64_sys_write+0x73/0xb0 fs/read_write.c:607
        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 ((console_sem).lock){-.-.}:
        lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
        down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
        __down_trylock_console_sem+0xae/0x1f0 kernel/printk/printk.c:219
        console_trylock+0x15/0xa0 kernel/printk/printk.c:2247
        console_trylock_spinning kernel/printk/printk.c:1653 [inline]
        vprintk_emit+0x372/0x990 kernel/printk/printk.c:1921
        vprintk_default+0x28/0x30 kernel/printk/printk.c:1964
        vprintk_func+0x7e/0x181 kernel/printk/printk_safe.c:398
        printk+0xa7/0xcf kernel/printk/printk.c:1997
        __warn+0x9e/0x1d0 kernel/panic.c:522
        report_bug+0x254/0x2d0 lib/bug.c:186
        fixup_bug arch/x86/kernel/traps.c:178 [inline]
        do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
        do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
        invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
        enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
        enqueue_task+0x184/0x390 kernel/sched/core.c:730
        __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
        sched_setattr kernel/sched/core.c:4394 [inline]
        __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
        __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
        __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
        do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
   (console_sem).lock --> &p->pi_lock --> &rq->lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&rq->lock);
                                lock(&p->pi_lock);
                                lock(&rq->lock);
   lock((console_sem).lock);

  *** DEADLOCK ***

3 locks held by syz-executor0/6351:
  #0: 000000001a0356c1 (rcu_read_lock){....}, at: __do_sys_sched_setattr  
kernel/sched/core.c:4563 [inline]
  #0: 000000001a0356c1 (rcu_read_lock){....}, at: __se_sys_sched_setattr  
kernel/sched/core.c:4549 [inline]
  #0: 000000001a0356c1 (rcu_read_lock){....}, at:  
__x64_sys_sched_setattr+0x146/0x2f0 kernel/sched/core.c:4549
  #1: 000000000b71b478 (&p->pi_lock){-.-.}, at: task_rq_lock+0x62/0x2a0  
kernel/sched/core.c:97
  #2: 000000004cd5557e (&rq->lock){-.-.}, at: task_rq_lock+0xc5/0x2a0  
kernel/sched/core.c:99

stack backtrace:
CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x39d lib/dump_stack.c:113
  print_circular_bug.isra.35.cold.54+0x1bd/0x27d  
kernel/locking/lockdep.c:1221
  check_prev_add kernel/locking/lockdep.c:1863 [inline]
  check_prevs_add kernel/locking/lockdep.c:1976 [inline]
  validate_chain kernel/locking/lockdep.c:2347 [inline]
  __lock_acquire+0x3399/0x4c20 kernel/locking/lockdep.c:3341
  lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
  down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
  __down_trylock_console_sem+0xae/0x1f0 kernel/printk/printk.c:219
  console_trylock+0x15/0xa0 kernel/printk/printk.c:2247
  console_trylock_spinning kernel/printk/printk.c:1653 [inline]
  vprintk_emit+0x372/0x990 kernel/printk/printk.c:1921
  vprintk_default+0x28/0x30 kernel/printk/printk.c:1964
  vprintk_func+0x7e/0x181 kernel/printk/printk_safe.c:398
  printk+0xa7/0xcf kernel/printk/printk.c:1997
  __warn+0x9e/0x1d0 kernel/panic.c:522
  report_bug+0x254/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
  do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe  
ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1  
ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
  enqueue_task+0x184/0x390 kernel/sched/core.c:730
  __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
  sched_setattr kernel/sched/core.c:4394 [inline]
  __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
  __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
  __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff
Shutting down cpus with NMI
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-11-18 18:49 WARNING in enqueue_task_dl syzbot
@ 2018-11-19  8:23 ` Thomas Gleixner
  2018-11-19 10:34   ` Peter Zijlstra
  2018-11-19 12:07   ` luca abeni
  2018-12-31 15:02 ` WARNING in enqueue_task_dl syzbot
  2019-03-20 17:08 ` syzbot
  2 siblings, 2 replies; 18+ messages in thread
From: Thomas Gleixner @ 2018-11-19  8:23 UTC (permalink / raw)
  To: syzbot
  Cc: Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski, mingo,
	syzkaller-bugs, x86, Peter Zijlstra, Juri Lelli, Luca Abeni,
	Daniel Bristot de Oliveira

Adding scheduler folks

On Sun, 18 Nov 2018, syzbot wrote:

> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    1ce80e0fe98e Merge tag 'fsnotify_for_v4.20-rc3' of git://g..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=14ddbb0b400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d86f24333880b605
> dashboard link: https://syzkaller.appspot.com/bug?extid=119ba87189432ead09b4
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13e9e015400000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
> 
> IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> 8021q: adding VLAN 0 to HW filter on device team0
> hrtimer: interrupt took 33411 ns
> sched: DL replenish lagged too much
> WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
> enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> PM: Basic memory bitmaps freed
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x244/0x39d lib/dump_stack.c:113
> panic+0x2ad/0x55c kernel/panic.c:188
> __warn.cold.8+0x20/0x45 kernel/panic.c:540
> report_bug+0x254/0x2d0 lib/bug.c:186
> fixup_bug arch/x86/kernel/traps.c:178 [inline]
> do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe ff
> ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1 ff ff
> 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
> RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
> RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
> RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
> RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
> R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
> R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
> enqueue_task+0x184/0x390 kernel/sched/core.c:730
> __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> sched_setattr kernel/sched/core.c:4394 [inline]
> __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x457569
> Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f
> 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
> RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
> RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
> R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.20.0-rc2+ #338 Not tainted
> ------------------------------------------------------
> syz-executor0/6351 is trying to acquire lock:
> 00000000b2b97155 ((console_sem).lock){-.-.}, at: down_trylock+0x13/0x70
> kernel/locking/semaphore.c:136
> 
> but task is already holding lock:
> 000000004cd5557e (&rq->lock){-.-.}, at: task_rq_lock+0xc5/0x2a0
> kernel/sched/core.c:99
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&rq->lock){-.-.}:
>       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>       _raw_spin_lock+0x2d/0x40 kernel/locking/spinlock.c:144
>       rq_lock kernel/sched/sched.h:1126 [inline]
>       task_fork_fair+0xb0/0x6d0 kernel/sched/fair.c:9768
>       sched_fork+0x443/0xba0 kernel/sched/core.c:2359
>       copy_process+0x25b8/0x87a0 kernel/fork.c:1887
>       _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
>       kernel_thread+0x34/0x40 kernel/fork.c:2275
>       rest_init+0x28/0x372 init/main.c:409
>       arch_call_rest_init+0xe/0x1b
>       start_kernel+0x9f0/0xa2b init/main.c:745
>       x86_64_start_reservations+0x2e/0x30 arch/x86/kernel/head64.c:472
>       x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:451
>       secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
> 
> -> #1 (&p->pi_lock){-.-.}:
>       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>       _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
>       try_to_wake_up+0xdc/0x1490 kernel/sched/core.c:1965
>       wake_up_process+0x10/0x20 kernel/sched/core.c:2129
>       __up.isra.1+0x1c0/0x2a0 kernel/locking/semaphore.c:262
>       up+0x13c/0x1c0 kernel/locking/semaphore.c:187
>       __up_console_sem+0xbe/0x1b0 kernel/printk/printk.c:236
>       console_unlock+0x811/0x1190 kernel/printk/printk.c:2432
>       do_con_write+0x1356/0x23b0 drivers/tty/vt/vt.c:2767
>       con_write+0x25/0xc0 drivers/tty/vt/vt.c:3116
>       process_output_block drivers/tty/n_tty.c:593 [inline]
>       n_tty_write+0x6c1/0x11a0 drivers/tty/n_tty.c:2331
>       do_tty_write drivers/tty/tty_io.c:959 [inline]
>       tty_write+0x3f1/0x880 drivers/tty/tty_io.c:1043
>       __vfs_write+0x119/0x9f0 fs/read_write.c:485
>       vfs_write+0x1fc/0x560 fs/read_write.c:549
>       ksys_write+0x101/0x260 fs/read_write.c:598
>       __do_sys_write fs/read_write.c:610 [inline]
>       __se_sys_write fs/read_write.c:607 [inline]
>       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
>       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>       entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> -> #0 ((console_sem).lock){-.-.}:
>       lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
>       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>       _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
>       down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
>       __down_trylock_console_sem+0xae/0x1f0 kernel/printk/printk.c:219
>       console_trylock+0x15/0xa0 kernel/printk/printk.c:2247
>       console_trylock_spinning kernel/printk/printk.c:1653 [inline]
>       vprintk_emit+0x372/0x990 kernel/printk/printk.c:1921
>       vprintk_default+0x28/0x30 kernel/printk/printk.c:1964
>       vprintk_func+0x7e/0x181 kernel/printk/printk_safe.c:398
>       printk+0xa7/0xcf kernel/printk/printk.c:1997
>       __warn+0x9e/0x1d0 kernel/panic.c:522
>       report_bug+0x254/0x2d0 lib/bug.c:186
>       fixup_bug arch/x86/kernel/traps.c:178 [inline]
>       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
>       do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
>       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
>       enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
>       enqueue_task+0x184/0x390 kernel/sched/core.c:730
>       __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
>       sched_setattr kernel/sched/core.c:4394 [inline]
>       __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
>       __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
>       __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
>       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>       entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> other info that might help us debug this:
> 
> Chain exists of:
>  (console_sem).lock --> &p->pi_lock --> &rq->lock
> 
> Possible unsafe locking scenario:
> 
>       CPU0                    CPU1
>       ----                    ----
>  lock(&rq->lock);
>                               lock(&p->pi_lock);
>                               lock(&rq->lock);
>  lock((console_sem).lock);
> 
> *** DEADLOCK ***
> 
> 3 locks held by syz-executor0/6351:
> #0: 000000001a0356c1 (rcu_read_lock){....}, at: __do_sys_sched_setattr
> kernel/sched/core.c:4563 [inline]
> #0: 000000001a0356c1 (rcu_read_lock){....}, at: __se_sys_sched_setattr
> kernel/sched/core.c:4549 [inline]
> #0: 000000001a0356c1 (rcu_read_lock){....}, at:
> __x64_sys_sched_setattr+0x146/0x2f0 kernel/sched/core.c:4549
> #1: 000000000b71b478 (&p->pi_lock){-.-.}, at: task_rq_lock+0x62/0x2a0
> kernel/sched/core.c:97
> #2: 000000004cd5557e (&rq->lock){-.-.}, at: task_rq_lock+0xc5/0x2a0
> kernel/sched/core.c:99
> 
> stack backtrace:
> CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x244/0x39d lib/dump_stack.c:113
> print_circular_bug.isra.35.cold.54+0x1bd/0x27d kernel/locking/lockdep.c:1221
> check_prev_add kernel/locking/lockdep.c:1863 [inline]
> check_prevs_add kernel/locking/lockdep.c:1976 [inline]
> validate_chain kernel/locking/lockdep.c:2347 [inline]
> __lock_acquire+0x3399/0x4c20 kernel/locking/lockdep.c:3341
> lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
> down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
> __down_trylock_console_sem+0xae/0x1f0 kernel/printk/printk.c:219
> console_trylock+0x15/0xa0 kernel/printk/printk.c:2247
> console_trylock_spinning kernel/printk/printk.c:1653 [inline]
> vprintk_emit+0x372/0x990 kernel/printk/printk.c:1921
> vprintk_default+0x28/0x30 kernel/printk/printk.c:1964
> vprintk_func+0x7e/0x181 kernel/printk/printk_safe.c:398
> printk+0xa7/0xcf kernel/printk/printk.c:1997
> __warn+0x9e/0x1d0 kernel/panic.c:522
> report_bug+0x254/0x2d0 lib/bug.c:186
> fixup_bug arch/x86/kernel/traps.c:178 [inline]
> do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe ff
> ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1 ff ff
> 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
> RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
> RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
> RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
> RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
> R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
> R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
> enqueue_task+0x184/0x390 kernel/sched/core.c:730
> __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> sched_setattr kernel/sched/core.c:4394 [inline]
> __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x457569
> Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48
> 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f
> 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
> RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
> RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
> R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff
> Shutting down cpus with NMI
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-11-19  8:23 ` Thomas Gleixner
@ 2018-11-19 10:34   ` Peter Zijlstra
  2018-11-19 12:07   ` luca abeni
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2018-11-19 10:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: syzbot, Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski,
	mingo, syzkaller-bugs, x86, Juri Lelli, Luca Abeni,
	Daniel Bristot de Oliveira

On Mon, Nov 19, 2018 at 09:23:03AM +0100, Thomas Gleixner wrote:
> Adding scheduler folks

Nothing new there, printk is crap.

> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.20.0-rc2+ #338 Not tainted
> > ------------------------------------------------------
> > syz-executor0/6351 is trying to acquire lock:
> > 00000000b2b97155 ((console_sem).lock){-.-.}, at: down_trylock+0x13/0x70
> > kernel/locking/semaphore.c:136
> > 
> > but task is already holding lock:
> > 000000004cd5557e (&rq->lock){-.-.}, at: task_rq_lock+0xc5/0x2a0
> > kernel/sched/core.c:99
> > 
> > which lock already depends on the new lock.
> > 
> > 
> > the existing dependency chain (in reverse order) is:
> > 
> > -> #2 (&rq->lock){-.-.}:
> >       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
> >       _raw_spin_lock+0x2d/0x40 kernel/locking/spinlock.c:144
> >       rq_lock kernel/sched/sched.h:1126 [inline]
> >       task_fork_fair+0xb0/0x6d0 kernel/sched/fair.c:9768
> >       sched_fork+0x443/0xba0 kernel/sched/core.c:2359
> >       copy_process+0x25b8/0x87a0 kernel/fork.c:1887
> >       _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
> >       kernel_thread+0x34/0x40 kernel/fork.c:2275
> >       rest_init+0x28/0x372 init/main.c:409
> >       arch_call_rest_init+0xe/0x1b
> >       start_kernel+0x9f0/0xa2b init/main.c:745
> >       x86_64_start_reservations+0x2e/0x30 arch/x86/kernel/head64.c:472
> >       x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:451
> >       secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
> > 
> > -> #1 (&p->pi_lock){-.-.}:
> >       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> >       _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
> >       try_to_wake_up+0xdc/0x1490 kernel/sched/core.c:1965
> >       wake_up_process+0x10/0x20 kernel/sched/core.c:2129
> >       __up.isra.1+0x1c0/0x2a0 kernel/locking/semaphore.c:262
> >       up+0x13c/0x1c0 kernel/locking/semaphore.c:187
> >       __up_console_sem+0xbe/0x1b0 kernel/printk/printk.c:236
> >       console_unlock+0x811/0x1190 kernel/printk/printk.c:2432
> >       do_con_write+0x1356/0x23b0 drivers/tty/vt/vt.c:2767
> >       con_write+0x25/0xc0 drivers/tty/vt/vt.c:3116
> >       process_output_block drivers/tty/n_tty.c:593 [inline]
> >       n_tty_write+0x6c1/0x11a0 drivers/tty/n_tty.c:2331
> >       do_tty_write drivers/tty/tty_io.c:959 [inline]
> >       tty_write+0x3f1/0x880 drivers/tty/tty_io.c:1043
> >       __vfs_write+0x119/0x9f0 fs/read_write.c:485
> >       vfs_write+0x1fc/0x560 fs/read_write.c:549
> >       ksys_write+0x101/0x260 fs/read_write.c:598
> >       __do_sys_write fs/read_write.c:610 [inline]
> >       __se_sys_write fs/read_write.c:607 [inline]
> >       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
> >       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >       entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > -> #0 ((console_sem).lock){-.-.}:
> >       lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
> >       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> >       _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
> >       down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
> >       __down_trylock_console_sem+0xae/0x1f0 kernel/printk/printk.c:219
> >       console_trylock+0x15/0xa0 kernel/printk/printk.c:2247
> >       console_trylock_spinning kernel/printk/printk.c:1653 [inline]
> >       vprintk_emit+0x372/0x990 kernel/printk/printk.c:1921
> >       vprintk_default+0x28/0x30 kernel/printk/printk.c:1964
> >       vprintk_func+0x7e/0x181 kernel/printk/printk_safe.c:398
> >       printk+0xa7/0xcf kernel/printk/printk.c:1997
> >       __warn+0x9e/0x1d0 kernel/panic.c:522
> >       report_bug+0x254/0x2d0 lib/bug.c:186
> >       fixup_bug arch/x86/kernel/traps.c:178 [inline]
> >       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> >       do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> >       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> >       enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> >       enqueue_task+0x184/0x390 kernel/sched/core.c:730
> >       __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> >       sched_setattr kernel/sched/core.c:4394 [inline]
> >       __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> >       __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> >       __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> >       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >       entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > other info that might help us debug this:
> > 
> > Chain exists of:
> >  (console_sem).lock --> &p->pi_lock --> &rq->lock
> > 
> > Possible unsafe locking scenario:
> > 
> >       CPU0                    CPU1
> >       ----                    ----
> >  lock(&rq->lock);
> >                               lock(&p->pi_lock);
> >                               lock(&rq->lock);
> >  lock((console_sem).lock);
> > 
> > *** DEADLOCK ***
> > 
> > 3 locks held by syz-executor0/6351:
> > #0: 000000001a0356c1 (rcu_read_lock){....}, at: __do_sys_sched_setattr
> > kernel/sched/core.c:4563 [inline]
> > #0: 000000001a0356c1 (rcu_read_lock){....}, at: __se_sys_sched_setattr
> > kernel/sched/core.c:4549 [inline]
> > #0: 000000001a0356c1 (rcu_read_lock){....}, at:
> > __x64_sys_sched_setattr+0x146/0x2f0 kernel/sched/core.c:4549
> > #1: 000000000b71b478 (&p->pi_lock){-.-.}, at: task_rq_lock+0x62/0x2a0
> > kernel/sched/core.c:97
> > #2: 000000004cd5557e (&rq->lock){-.-.}, at: task_rq_lock+0xc5/0x2a0
> > kernel/sched/core.c:99
> > 
> > stack backtrace:
> > CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> > 01/01/2011
> > Call Trace:
> > __dump_stack lib/dump_stack.c:77 [inline]
> > dump_stack+0x244/0x39d lib/dump_stack.c:113
> > print_circular_bug.isra.35.cold.54+0x1bd/0x27d kernel/locking/lockdep.c:1221
> > check_prev_add kernel/locking/lockdep.c:1863 [inline]
> > check_prevs_add kernel/locking/lockdep.c:1976 [inline]
> > validate_chain kernel/locking/lockdep.c:2347 [inline]
> > __lock_acquire+0x3399/0x4c20 kernel/locking/lockdep.c:3341
> > lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
> > __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> > _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
> > down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
> > __down_trylock_console_sem+0xae/0x1f0 kernel/printk/printk.c:219
> > console_trylock+0x15/0xa0 kernel/printk/printk.c:2247
> > console_trylock_spinning kernel/printk/printk.c:1653 [inline]
> > vprintk_emit+0x372/0x990 kernel/printk/printk.c:1921
> > vprintk_default+0x28/0x30 kernel/printk/printk.c:1964
> > vprintk_func+0x7e/0x181 kernel/printk/printk_safe.c:398
> > printk+0xa7/0xcf kernel/printk/printk.c:1997
> > __warn+0x9e/0x1d0 kernel/panic.c:522
> > report_bug+0x254/0x2d0 lib/bug.c:186
> > fixup_bug arch/x86/kernel/traps.c:178 [inline]
> > do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> > do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> > invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> > RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> > Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe ff
> > ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1 ff ff
> > 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
> > RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
> > RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
> > RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
> > RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
> > R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
> > R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
> > enqueue_task+0x184/0x390 kernel/sched/core.c:730
> > __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> > sched_setattr kernel/sched/core.c:4394 [inline]
> > __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> > __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> > __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x457569
> > Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48
> > 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f
> > 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
> > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
> > RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
> > RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
> > R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff
> > Shutting down cpus with NMI
> > Kernel Offset: disabled
> > Rebooting in 86400 seconds..
> > 
> > 
> > ---
> > This bug is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> > 
> > syzbot will keep track of this bug report. See:
> > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
> > syzbot can test patches for this bug, for details see:
> > https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-11-19  8:23 ` Thomas Gleixner
  2018-11-19 10:34   ` Peter Zijlstra
@ 2018-11-19 12:07   ` luca abeni
  2018-11-19 12:52     ` Peter Zijlstra
  1 sibling, 1 reply; 18+ messages in thread
From: luca abeni @ 2018-11-19 12:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: syzbot, Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski,
	mingo, syzkaller-bugs, x86, Peter Zijlstra, Juri Lelli,
	Daniel Bristot de Oliveira

Hi all,

On Mon, 19 Nov 2018 09:23:03 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> Adding scheduler folks
> 
> On Sun, 18 Nov 2018, syzbot wrote:
> 
> > Hello,
> > 
> > syzbot found the following crash on:
> > 
> > HEAD commit:    1ce80e0fe98e Merge tag 'fsnotify_for_v4.20-rc3' of
> > git://g.. git tree:       upstream
> > console output:
> > https://syzkaller.appspot.com/x/log.txt?x=14ddbb0b400000 kernel
> > config:  https://syzkaller.appspot.com/x/.config?x=d86f24333880b605
> > dashboard link:
> > https://syzkaller.appspot.com/bug?extid=119ba87189432ead09b4
> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental) syz
> > repro:
> > https://syzkaller.appspot.com/x/repro.syz?x=13e9e015400000
> > 
> > IMPORTANT: if you fix the bug, please add the following tag to the
> > commit: Reported-by:
> > syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
> > 
> > IPv6: ADDRCONF(NETDEV_CHANGE): veth1: link becomes ready
> > IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> > 8021q: adding VLAN 0 to HW filter on device team0
> > hrtimer: interrupt took 33411 ns
> > sched: DL replenish lagged too much
> > WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
> > enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504

Here, it looks like a task is invoking sched_setattr() to become
SCHED_DEADLINE when dl_boosted is set...

Is this possible / correct?

If this (sched_setattr() with dl_boosted set) should not be possible,
then we have a bug that we need to investigate...

Otherwise, I suspect we can just remove the WARN_ON at line 628 of
deadline.c


			Luca




> > PM: Basic memory bitmaps freed
> > Kernel panic - not syncing: panic_on_warn set ...
> > CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS Google 01/01/2011
> > Call Trace:
> > __dump_stack lib/dump_stack.c:77 [inline]
> > dump_stack+0x244/0x39d lib/dump_stack.c:113
> > panic+0x2ad/0x55c kernel/panic.c:188
> > __warn.cold.8+0x20/0x45 kernel/panic.c:540
> > report_bug+0x254/0x2d0 lib/bug.c:186
> > fixup_bug arch/x86/kernel/traps.c:178 [inline]
> > do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> > do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> > invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> > RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> > Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b
> > 95 d8 fe ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff
> > <0f> 0b e9 17 f1 ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
> > RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
> > RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
> > RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
> > RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
> > R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
> > R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
> > enqueue_task+0x184/0x390 kernel/sched/core.c:730
> > __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> > sched_setattr kernel/sched/core.c:4394 [inline]
> > __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> > __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> > __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x457569
> > Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8
> > 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05
> > <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000013a RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
> > 0000000000457569 RDX: 0000000000000000 RSI: 0000000020000000 RDI:
> > 0000000000000000 RBP: 000000000072bfa0 R08: 0000000000000000 R09:
> > 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12:
> > 00007f05ce0a36d4 R13: 00000000004c369f R14: 00000000004d5730 R15:
> > 00000000ffffffff
> > 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.20.0-rc2+ #338 Not tainted
> > ------------------------------------------------------
> > syz-executor0/6351 is trying to acquire lock:
> > 00000000b2b97155 ((console_sem).lock){-.-.}, at:
> > down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
> > 
> > but task is already holding lock:
> > 000000004cd5557e (&rq->lock){-.-.}, at: task_rq_lock+0xc5/0x2a0
> > kernel/sched/core.c:99
> > 
> > which lock already depends on the new lock.
> > 
> > 
> > the existing dependency chain (in reverse order) is:
> >   
> > -> #2 (&rq->lock){-.-.}:  
> >       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
> >       _raw_spin_lock+0x2d/0x40 kernel/locking/spinlock.c:144
> >       rq_lock kernel/sched/sched.h:1126 [inline]
> >       task_fork_fair+0xb0/0x6d0 kernel/sched/fair.c:9768
> >       sched_fork+0x443/0xba0 kernel/sched/core.c:2359
> >       copy_process+0x25b8/0x87a0 kernel/fork.c:1887
> >       _do_fork+0x1cb/0x11d0 kernel/fork.c:2216
> >       kernel_thread+0x34/0x40 kernel/fork.c:2275
> >       rest_init+0x28/0x372 init/main.c:409
> >       arch_call_rest_init+0xe/0x1b
> >       start_kernel+0x9f0/0xa2b init/main.c:745
> >       x86_64_start_reservations+0x2e/0x30
> > arch/x86/kernel/head64.c:472 x86_64_start_kernel+0x76/0x79
> > arch/x86/kernel/head64.c:451 secondary_startup_64+0xa4/0xb0
> > arch/x86/kernel/head_64.S:243 
> > -> #1 (&p->pi_lock){-.-.}:  
> >       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110
> > [inline] _raw_spin_lock_irqsave+0x99/0xd0
> > kernel/locking/spinlock.c:152 try_to_wake_up+0xdc/0x1490
> > kernel/sched/core.c:1965 wake_up_process+0x10/0x20
> > kernel/sched/core.c:2129 __up.isra.1+0x1c0/0x2a0
> > kernel/locking/semaphore.c:262 up+0x13c/0x1c0
> > kernel/locking/semaphore.c:187 __up_console_sem+0xbe/0x1b0
> > kernel/printk/printk.c:236 console_unlock+0x811/0x1190
> > kernel/printk/printk.c:2432 do_con_write+0x1356/0x23b0
> > drivers/tty/vt/vt.c:2767 con_write+0x25/0xc0
> > drivers/tty/vt/vt.c:3116 process_output_block
> > drivers/tty/n_tty.c:593 [inline] n_tty_write+0x6c1/0x11a0
> > drivers/tty/n_tty.c:2331 do_tty_write drivers/tty/tty_io.c:959
> > [inline] tty_write+0x3f1/0x880 drivers/tty/tty_io.c:1043
> >       __vfs_write+0x119/0x9f0 fs/read_write.c:485
> >       vfs_write+0x1fc/0x560 fs/read_write.c:549
> >       ksys_write+0x101/0x260 fs/read_write.c:598
> >       __do_sys_write fs/read_write.c:610 [inline]
> >       __se_sys_write fs/read_write.c:607 [inline]
> >       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
> >       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >       entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >   
> > -> #0 ((console_sem).lock){-.-.}:  
> >       lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
> >       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110
> > [inline] _raw_spin_lock_irqsave+0x99/0xd0
> > kernel/locking/spinlock.c:152 down_trylock+0x13/0x70
> > kernel/locking/semaphore.c:136
> > __down_trylock_console_sem+0xae/0x1f0 kernel/printk/printk.c:219
> > console_trylock+0x15/0xa0 kernel/printk/printk.c:2247
> > console_trylock_spinning kernel/printk/printk.c:1653 [inline]
> > vprintk_emit+0x372/0x990 kernel/printk/printk.c:1921
> > vprintk_default+0x28/0x30 kernel/printk/printk.c:1964
> > vprintk_func+0x7e/0x181 kernel/printk/printk_safe.c:398
> > printk+0xa7/0xcf kernel/printk/printk.c:1997 __warn+0x9e/0x1d0
> > kernel/panic.c:522 report_bug+0x254/0x2d0 lib/bug.c:186
> >       fixup_bug arch/x86/kernel/traps.c:178 [inline]
> >       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> >       do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> >       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> >       enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> >       enqueue_task+0x184/0x390 kernel/sched/core.c:730
> >       __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> >       sched_setattr kernel/sched/core.c:4394 [inline]
> >       __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> >       __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> >       __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> >       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >       entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > other info that might help us debug this:
> > 
> > Chain exists of:
> >  (console_sem).lock --> &p->pi_lock --> &rq->lock
> > 
> > Possible unsafe locking scenario:
> > 
> >       CPU0                    CPU1
> >       ----                    ----
> >  lock(&rq->lock);
> >                               lock(&p->pi_lock);
> >                               lock(&rq->lock);
> >  lock((console_sem).lock);
> > 
> > *** DEADLOCK ***
> > 
> > 3 locks held by syz-executor0/6351:
> > #0: 000000001a0356c1 (rcu_read_lock){....}, at:
> > __do_sys_sched_setattr kernel/sched/core.c:4563 [inline]
> > #0: 000000001a0356c1 (rcu_read_lock){....}, at:
> > __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> > #0: 000000001a0356c1 (rcu_read_lock){....}, at:
> > __x64_sys_sched_setattr+0x146/0x2f0 kernel/sched/core.c:4549
> > #1: 000000000b71b478 (&p->pi_lock){-.-.}, at:
> > task_rq_lock+0x62/0x2a0 kernel/sched/core.c:97
> > #2: 000000004cd5557e (&rq->lock){-.-.}, at: task_rq_lock+0xc5/0x2a0
> > kernel/sched/core.c:99
> > 
> > stack backtrace:
> > CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
> > Hardware name: Google Google Compute Engine/Google Compute Engine,
> > BIOS Google 01/01/2011
> > Call Trace:
> > __dump_stack lib/dump_stack.c:77 [inline]
> > dump_stack+0x244/0x39d lib/dump_stack.c:113
> > print_circular_bug.isra.35.cold.54+0x1bd/0x27d
> > kernel/locking/lockdep.c:1221 check_prev_add
> > kernel/locking/lockdep.c:1863 [inline] check_prevs_add
> > kernel/locking/lockdep.c:1976 [inline] validate_chain
> > kernel/locking/lockdep.c:2347 [inline] __lock_acquire+0x3399/0x4c20
> > kernel/locking/lockdep.c:3341 lock_acquire+0x1ed/0x520
> > kernel/locking/lockdep.c:3844 __raw_spin_lock_irqsave
> > include/linux/spinlock_api_smp.h:110 [inline]
> > _raw_spin_lock_irqsave+0x99/0xd0 kernel/locking/spinlock.c:152
> > down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
> > __down_trylock_console_sem+0xae/0x1f0 kernel/printk/printk.c:219
> > console_trylock+0x15/0xa0 kernel/printk/printk.c:2247
> > console_trylock_spinning kernel/printk/printk.c:1653 [inline]
> > vprintk_emit+0x372/0x990 kernel/printk/printk.c:1921
> > vprintk_default+0x28/0x30 kernel/printk/printk.c:1964
> > vprintk_func+0x7e/0x181 kernel/printk/printk_safe.c:398
> > printk+0xa7/0xcf kernel/printk/printk.c:1997 __warn+0x9e/0x1d0
> > kernel/panic.c:522 report_bug+0x254/0x2d0 lib/bug.c:186
> > fixup_bug arch/x86/kernel/traps.c:178 [inline]
> > do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> > do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> > invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> > RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> > Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b
> > 95 d8 fe ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff
> > <0f> 0b e9 17 f1 ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
> > RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
> > RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
> > RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
> > RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
> > R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
> > R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
> > enqueue_task+0x184/0x390 kernel/sched/core.c:730
> > __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> > sched_setattr kernel/sched/core.c:4394 [inline]
> > __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> > __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> > __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> > do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x457569
> > Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8
> > 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05
> > <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000013a RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
> > 0000000000457569 RDX: 0000000000000000 RSI: 0000000020000000 RDI:
> > 0000000000000000 RBP: 000000000072bfa0 R08: 0000000000000000 R09:
> > 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12:
> > 00007f05ce0a36d4 R13: 00000000004c369f R14: 00000000004d5730 R15:
> > 00000000ffffffff Shutting down cpus with NMI
> > Kernel Offset: disabled
> > Rebooting in 86400 seconds..
> > 
> > 
> > ---
> > This bug is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> > 
> > syzbot will keep track of this bug report. See:
> > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate
> > with syzbot. syzbot can test patches for this bug, for details see:
> > https://goo.gl/tpsmEJ#testing-patches  


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-11-19 12:07   ` luca abeni
@ 2018-11-19 12:52     ` Peter Zijlstra
  2018-11-19 13:43       ` Juri Lelli
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2018-11-19 12:52 UTC (permalink / raw)
  To: luca abeni
  Cc: Thomas Gleixner, syzbot, Borislav Petkov, H. Peter Anvin, LKML,
	Andy Lutomirski, mingo, syzkaller-bugs, x86, Juri Lelli,
	Daniel Bristot de Oliveira

On Mon, Nov 19, 2018 at 01:07:18PM +0100, luca abeni wrote:

> > On Sun, 18 Nov 2018, syzbot wrote:

> > > WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
> > > enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> 
> Here, it looks like a task is invoking sched_setattr() to become
> SCHED_DEADLINE when dl_boosted is set...
> 
> Is this possible / correct?

Possible, clearly. Correct, only in so far as that it is not a malformed
program, but it is very poor design to actually trigger this (of course
the fuzzer doesn't care about that).

> If this (sched_setattr() with dl_boosted set) should not be possible,
> then we have a bug that we need to investigate...
> 
> Otherwise, I suspect we can just remove the WARN_ON at line 628 of
> deadline.c

I wonder why we put that WARN in there to begin with... git-blame gives
us:

  98b0a8578050 ("sched/deadline: Remove useless parameter from setup_new_dl_entity()")

So the problem seems to be that if we're boosted, we should maybe not be
using our own (newly set) parameters, but those of the donor task.

Specifically, our 'suboptimal' deadline inheritance scheme 'requires' us
to use the inherited deadline, not our own. So in that respect I think
the WARN is valid, although I'm not sure what, apart from actually
finishing that PE patch-set we can do about it just now.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-11-19 12:52     ` Peter Zijlstra
@ 2018-11-19 13:43       ` Juri Lelli
  2018-11-19 15:32         ` Juri Lelli
  0 siblings, 1 reply; 18+ messages in thread
From: Juri Lelli @ 2018-11-19 13:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: luca abeni, Thomas Gleixner, syzbot, Borislav Petkov,
	H. Peter Anvin, LKML, Andy Lutomirski, mingo, syzkaller-bugs,
	x86, Daniel Bristot de Oliveira

On 19/11/18 13:52, Peter Zijlstra wrote:
> On Mon, Nov 19, 2018 at 01:07:18PM +0100, luca abeni wrote:
> 
> > > On Sun, 18 Nov 2018, syzbot wrote:
> 
> > > > WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
> > > > enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> > 
> > Here, it looks like a task is invoking sched_setattr() to become
> > SCHED_DEADLINE when dl_boosted is set...
> > 
> > Is this possible / correct?
> 
> Possible, clearly. Correct, only in so far as that it is not a malformed
> program, but it is very poor design to actually trigger this (of course
> the fuzzer doesn't care about that).
> 
> > If this (sched_setattr() with dl_boosted set) should not be possible,
> > then we have a bug that we need to investigate...
> > 
> > Otherwise, I suspect we can just remove the WARN_ON at line 628 of
> > deadline.c
> 
> I wonder why we put that WARN in there to begin with... git-blame gives
> us:
> 
>   98b0a8578050 ("sched/deadline: Remove useless parameter from setup_new_dl_entity()")
> 
> So the problem seems to be that if we're boosted, we should maybe not be
> using our own (newly set) parameters, but those of the donor task.
> 
> Specifically, our 'suboptimal' deadline inheritance scheme 'requires' us
> to use the inherited deadline, not our own. So in that respect I think
> the WARN is valid, although I'm not sure what, apart from actually
> finishing that PE patch-set we can do about it just now.

Mmm, but, as it was written in the comment that was removed by 295d6d5
("sched/deadline: Fix switching to -deadline"), I was still expecting
that for a boosted task setup_new_dl_entity() shouldn't be called.
Wonder if this is another manifestation of the problems we have with
clocks. Need to think more about it.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-11-19 13:43       ` Juri Lelli
@ 2018-11-19 15:32         ` Juri Lelli
  2019-01-07 16:19           ` Daniel Bristot de Oliveira
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Juri Lelli @ 2018-11-19 15:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: luca abeni, Thomas Gleixner, syzbot, Borislav Petkov,
	H. Peter Anvin, LKML, Andy Lutomirski, mingo, syzkaller-bugs,
	x86, Daniel Bristot de Oliveira

On 19/11/18 14:43, Juri Lelli wrote:
> On 19/11/18 13:52, Peter Zijlstra wrote:
> > On Mon, Nov 19, 2018 at 01:07:18PM +0100, luca abeni wrote:
> > 
> > > > On Sun, 18 Nov 2018, syzbot wrote:
> > 
> > > > > WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
> > > > > enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> > > 
> > > Here, it looks like a task is invoking sched_setattr() to become
> > > SCHED_DEADLINE when dl_boosted is set...
> > > 
> > > Is this possible / correct?
> > 
> > Possible, clearly. Correct, only in so far as that it is not a malformed
> > program, but it is very poor design to actually trigger this (of course
> > the fuzzer doesn't care about that).
> > 
> > > If this (sched_setattr() with dl_boosted set) should not be possible,
> > > then we have a bug that we need to investigate...
> > > 
> > > Otherwise, I suspect we can just remove the WARN_ON at line 628 of
> > > deadline.c
> > 
> > I wonder why we put that WARN in there to begin with... git-blame gives
> > us:
> > 
> >   98b0a8578050 ("sched/deadline: Remove useless parameter from setup_new_dl_entity()")
> > 
> > So the problem seems to be that if we're boosted, we should maybe not be
> > using our own (newly set) parameters, but those of the donor task.
> > 
> > Specifically, our 'suboptimal' deadline inheritance scheme 'requires' us
> > to use the inherited deadline, not our own. So in that respect I think
> > the WARN is valid, although I'm not sure what, apart from actually
> > finishing that PE patch-set we can do about it just now.
> 
> Mmm, but, as it was written in the comment that was removed by 295d6d5
> ("sched/deadline: Fix switching to -deadline"), I was still expecting
> that for a boosted task setup_new_dl_entity() shouldn't be called.
> Wonder if this is another manifestation of the problems we have with
> clocks. Need to think more about it.

So, while this looks like nothing more than a stop-gap solution until we
get PE in place, would the following make any sense? It seems I can't
reproduce the warning anymore with it (w/o it usually takes a few secs
to reproduce).

--->8---

From 9326fd2b20269cffef7290bdc5b8173460d3c870 Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli@redhat.com>
Date: Mon, 19 Nov 2018 16:04:42 +0100
Subject: [PATCH] sched/core: Fix PI boosting between RT and DEADLINE

syzbot reported the following warning:

 WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
 enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
 PM: Basic memory bitmaps freed
 Kernel panic - not syncing: panic_on_warn set ...
 CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x244/0x39d lib/dump_stack.c:113
   panic+0x2ad/0x55c kernel/panic.c:188
   __warn.cold.8+0x20/0x45 kernel/panic.c:540
   report_bug+0x254/0x2d0 lib/bug.c:186
   fixup_bug arch/x86/kernel/traps.c:178 [inline]
   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
   do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
 RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
 Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe
 ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1
 ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
 RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
 RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
 RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
 RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
 R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
 R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
   enqueue_task+0x184/0x390 kernel/sched/core.c:730
   __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
   sched_setattr kernel/sched/core.c:4394 [inline]
   __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
   __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
   __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x457569
 Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
 ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
 RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
 RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
 R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff

At deadline.c:628 we have:

 623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
 624 {
 625 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 626 	struct rq *rq = rq_of_dl_rq(dl_rq);
 627
 628 	WARN_ON(dl_se->dl_boosted);
 629 	WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
        [...]
     }

Which means that setup_new_dl_entity() has been called on a task
currently boosted. This shouldn't happen though, as setup_new_
dl_entity() is only called when the 'dynamic' deadline of the new entity
is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
condition.

Digging through PI code I noticed that what above might in fact happen
if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
first branch of boosting conditions we check only if a pi_task 'dynamic'
deadline is earlier than mutex holder's and in this case we set mutex
holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
initialized if such tasks get boosted at some point (or if they become
DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
to 0 and this verifies the aforementioned condition.

Fix it by checking that the potential donor task is actually (even if
temporary because in turn boosted) running at DEADLINE priority before
using its 'dynamic' deadline value.

Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5afb868f7339..d17257613f10 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3827,7 +3827,8 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 	 */
 	if (dl_prio(prio)) {
 		if (!dl_prio(p->normal_prio) ||
-		    (pi_task && dl_entity_preempt(&pi_task->dl, &p->dl))) {
+		    (pi_task && dl_prio(pi_task->prio) &&
+		     dl_entity_preempt(&pi_task->dl, &p->dl))) {
 			p->dl.dl_boosted = 1;
 			queue_flag |= ENQUEUE_REPLENISH;
 		} else
-- 
2.17.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-11-18 18:49 WARNING in enqueue_task_dl syzbot
  2018-11-19  8:23 ` Thomas Gleixner
@ 2018-12-31 15:02 ` syzbot
  2019-01-02  9:15   ` luca abeni
  2019-03-20 17:08 ` syzbot
  2 siblings, 1 reply; 18+ messages in thread
From: syzbot @ 2018-12-31 15:02 UTC (permalink / raw)
  To: bp, bristot, hpa, juri.lelli, linux-kernel, luca.abeni, luto,
	mingo, peterz, syzkaller-bugs, tglx, x86

syzbot has found a reproducer for the following crash on:

HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=118af84b400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=76d28549be7c27cf
dashboard link: https://syzkaller.appspot.com/bug?extid=119ba87189432ead09b4
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10eb7ebf400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14156d77400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com

WARNING: CPU: 0 PID: 9019 at kernel/sched/deadline.c:628  
setup_new_dl_entity kernel/sched/deadline.c:629 [inline]
WARNING: CPU: 0 PID: 9019 at kernel/sched/deadline.c:628 enqueue_dl_entity  
kernel/sched/deadline.c:1429 [inline]
WARNING: CPU: 0 PID: 9019 at kernel/sched/deadline.c:628  
enqueue_task_dl+0x2355/0x3dc0 kernel/sched/deadline.c:1500
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9019 Comm: syz-executor280 Not tainted 4.20.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
  panic+0x2cb/0x589 kernel/panic.c:189
  __warn.cold+0x20/0x4b kernel/panic.c:544
  report_bug+0x263/0x2b0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  fixup_bug arch/x86/kernel/traps.c:173 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
  do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:setup_new_dl_entity kernel/sched/deadline.c:628 [inline]
RIP: 0010:enqueue_dl_entity kernel/sched/deadline.c:1429 [inline]
RIP: 0010:enqueue_task_dl+0x2355/0x3dc0 kernel/sched/deadline.c:1500
Code: 3c 02 00 0f 85 ba 05 00 00 49 8b b5 50 0a 00 00 e9 53 fa ff ff e8 fb  
f2 64 00 48 8d 4d d8 e9 48 dd ff ff 0f 0b e9 92 f1 ff ff <0f> 0b e9 18 f1  
ff ff 4c 89 ef 4c 89 95 28 ff ff ff 4c 89 85 30 ff
RSP: 0018:ffff88809eebfaf8 EFLAGS: 00010002
RAX: 0000000000000002 RBX: 1ffff11013dd7f6a RCX: dffffc0000000000
RDX: 000000333cf09f75 RSI: 0000000000000004 RDI: ffff8880ae62d850
RBP: ffff88809eebfbf8 R08: ffff88807fb0a538 R09: ffff88807fb0a2fc
R10: ffff88807fb0a580 R11: ffff8880ae62dc7b R12: ffff88807fb0a2c0
R13: ffff8880ae62ce00 R14: ffff8880ae62ce00 R15: ffff88807fb0a58c
  enqueue_task+0xb9/0x380 kernel/sched/core.c:730
  __sched_setscheduler+0xe32/0x1fe0 kernel/sched/core.c:4336
  sched_setattr kernel/sched/core.c:4394 [inline]
  __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
  __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
  __x64_sys_sched_setattr+0x1af/0x2f0 kernel/sched/core.c:4549
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44c829
Code: e8 8c d8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 eb c9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f28685e8ce8 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
RAX: ffffffffffffffda RBX: 00000000006e49f8 RCX: 000000000044c829
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
RBP: 00000000006e49f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e49fc
R13: 00007ffd1981c8af R14: 00007f28685e99c0 R15: 0000000000000001

======================================================
WARNING: possible circular locking dependency detected
4.20.0+ #1 Not tainted
------------------------------------------------------
syz-executor280/9019 is trying to acquire lock:
000000001aef527c ((console_sem).lock){-.-.}, at: down_trylock+0x13/0x70  
kernel/locking/semaphore.c:136

but task is already holding lock:
000000000ba17b09 (&rq->lock){-.-.}, at: task_rq_lock+0xc8/0x290  
kernel/sched/core.c:99

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&rq->lock){-.-.}:
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
        rq_lock kernel/sched/sched.h:1149 [inline]
        task_fork_fair+0xb5/0x7a0 kernel/sched/fair.c:10083
        sched_fork+0x437/0xb90 kernel/sched/core.c:2359
        copy_process+0x1ff6/0x8730 kernel/fork.c:1893
        _do_fork+0x1a9/0x1170 kernel/fork.c:2222
        kernel_thread+0x34/0x40 kernel/fork.c:2281
        rest_init+0x28/0x37b init/main.c:409
        arch_call_rest_init+0xe/0x1b
        start_kernel+0x882/0x8bd init/main.c:741
        x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:470
        x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:451
        secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243

-> #1 (&p->pi_lock){-.-.}:
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
        try_to_wake_up+0xb9/0x1480 kernel/sched/core.c:1965
        wake_up_process+0x10/0x20 kernel/sched/core.c:2129
        __up.isra.0+0x1c0/0x2a0 kernel/locking/semaphore.c:262
        up+0x13e/0x1c0 kernel/locking/semaphore.c:187
        __up_console_sem+0xb7/0x1c0 kernel/printk/printk.c:236
        console_unlock+0x778/0x11e0 kernel/printk/printk.c:2426
        con_flush_chars drivers/tty/vt/vt.c:3197 [inline]
        con_flush_chars drivers/tty/vt/vt.c:3185 [inline]
        con_write+0xa2/0xb0 drivers/tty/vt/vt.c:3117
        process_output_block drivers/tty/n_tty.c:593 [inline]
        n_tty_write+0x497/0x1220 drivers/tty/n_tty.c:2331
        do_tty_write drivers/tty/tty_io.c:959 [inline]
        tty_write+0x45b/0x7a0 drivers/tty/tty_io.c:1043
        __vfs_write+0x116/0xb40 fs/read_write.c:485
        vfs_write+0x20c/0x580 fs/read_write.c:549
        ksys_write+0x105/0x260 fs/read_write.c:598
        __do_sys_write fs/read_write.c:610 [inline]
        __se_sys_write fs/read_write.c:607 [inline]
        __x64_sys_write+0x73/0xb0 fs/read_write.c:607
        do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 ((console_sem).lock){-.-.}:
        lock_acquire+0x1db/0x570 kernel/locking/lockdep.c:3841
        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
        _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
        down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
        __down_trylock_console_sem+0xa8/0x210 kernel/printk/printk.c:219
        console_trylock+0x15/0xa0 kernel/printk/printk.c:2242
        console_trylock_spinning kernel/printk/printk.c:1662 [inline]
        vprintk_emit+0x351/0x960 kernel/printk/printk.c:1930
        vprintk_default+0x28/0x30 kernel/printk/printk.c:1958
        vprintk_func+0x7e/0x189 kernel/printk/printk_safe.c:398
        printk+0xba/0xed kernel/printk/printk.c:1991
        __warn+0x9e/0x1d0 kernel/panic.c:526
        report_bug+0x263/0x2b0 lib/bug.c:186
        fixup_bug arch/x86/kernel/traps.c:178 [inline]
        fixup_bug arch/x86/kernel/traps.c:173 [inline]
        do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
        do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
        invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
        setup_new_dl_entity kernel/sched/deadline.c:629 [inline]
        enqueue_dl_entity kernel/sched/deadline.c:1429 [inline]
        enqueue_task_dl+0x2355/0x3dc0 kernel/sched/deadline.c:1500
        enqueue_task+0xb9/0x380 kernel/sched/core.c:730
        __sched_setscheduler+0xe32/0x1fe0 kernel/sched/core.c:4336
        sched_setattr kernel/sched/core.c:4394 [inline]
        __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
        __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
        __x64_sys_sched_setattr+0x1af/0x2f0 kernel/sched/core.c:4549
        do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
   (console_sem).lock --> &p->pi_lock --> &rq->lock

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&rq->lock);
                                lock(&p->pi_lock);
                                lock(&rq->lock);
   lock((console_sem).lock);

  *** DEADLOCK ***

3 locks held by syz-executor280/9019:
  #0: 0000000014b8e16d (rcu_read_lock){....}, at: __do_sys_sched_setattr  
kernel/sched/core.c:4563 [inline]
  #0: 0000000014b8e16d (rcu_read_lock){....}, at: __se_sys_sched_setattr  
kernel/sched/core.c:4549 [inline]
  #0: 0000000014b8e16d (rcu_read_lock){....}, at:  
__x64_sys_sched_setattr+0x144/0x2f0 kernel/sched/core.c:4549
  #1: 00000000b31ff59d (&p->pi_lock){-.-.}, at: task_rq_lock+0x6a/0x290  
kernel/sched/core.c:97
  #2: 000000000ba17b09 (&rq->lock){-.-.}, at: task_rq_lock+0xc8/0x290  
kernel/sched/core.c:99

stack backtrace:
CPU: 0 PID: 9019 Comm: syz-executor280 Not tainted 4.20.0+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
  print_circular_bug.isra.0.cold+0x1cc/0x28f kernel/locking/lockdep.c:1224
  check_prev_add kernel/locking/lockdep.c:1866 [inline]
  check_prevs_add kernel/locking/lockdep.c:1979 [inline]
  validate_chain kernel/locking/lockdep.c:2350 [inline]
  __lock_acquire+0x3014/0x4a30 kernel/locking/lockdep.c:3338
  lock_acquire+0x1db/0x570 kernel/locking/lockdep.c:3841
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
  down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
  __down_trylock_console_sem+0xa8/0x210 kernel/printk/printk.c:219
  console_trylock+0x15/0xa0 kernel/printk/printk.c:2242
  console_trylock_spinning kernel/printk/printk.c:1662 [inline]
  vprintk_emit+0x351/0x960 kernel/printk/printk.c:1930
  vprintk_default+0x28/0x30 kernel/printk/printk.c:1958
  vprintk_func+0x7e/0x189 kernel/printk/printk_safe.c:398
  printk+0xba/0xed kernel/printk/printk.c:1991
  __warn+0x9e/0x1d0 kernel/panic.c:526
  report_bug+0x263/0x2b0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  fixup_bug arch/x86/kernel/traps.c:173 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
  do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:setup_new_dl_entity kernel/sched/deadline.c:628 [inline]
RIP: 0010:enqueue_dl_entity kernel/sched/deadline.c:1429 [inline]
RIP: 0010:enqueue_task_dl+0x2355/0x3dc0 kernel/sched/deadline.c:1500
Code: 3c 02 00 0f 85 ba 05 00 00 49 8b b5 50 0a 00 00 e9 53 fa ff ff e8 fb  
f2 64 00 48 8d 4d d8 e9 48 dd ff ff 0f 0b e9 92 f1 ff ff <0f> 0b e9 18 f1  
ff ff 4c 89 ef 4c 89 95 28 ff ff ff 4c 89 85 30 ff
RSP: 0018:ffff88809eebfaf8 EFLAGS: 00010002
RAX: 0000000000000002 RBX: 1ffff11013dd7f6a RCX: dffffc0000000000
RDX: 000000333cf09f75 RSI: 0000000000000004 RDI: ffff8880ae62d850
RBP: ffff88809eebfbf8 R08: ffff88807fb0a538 R09: ffff88807fb0a2fc
R10: ffff88807fb0a580 R11: ffff8880ae62dc7b R12: ffff88807fb0a2c0
R13: ffff8880ae62ce00 R14: ffff8880ae62ce00 R15: ffff88807fb0a58c
  enqueue_task+0xb9/0x380 kernel/sched/core.c:730
  __sched_setscheduler+0xe32/0x1fe0 kernel/sched/core.c:4336
  sched_setattr kernel/sched/core.c:4394 [inline]
  __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
  __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
  __x64_sys_sched_setattr+0x1af/0x2f0 kernel/sched/core.c:4549
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x44c829
Code: e8 8c d8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 eb c9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f28685e8ce8 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
RAX: ffffffffffffffda RBX: 00000000006e49f8 RCX: 000000000044c829
RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
RBP: 00000000006e49f0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e49fc
R13: 00007ffd1981c8af R14: 00007f28685e99c0 R15: 0000000000000001
Shutting down cpus with NMI
Kernel Offset: disabled
Rebooting in 86400 seconds..


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-12-31 15:02 ` WARNING in enqueue_task_dl syzbot
@ 2019-01-02  9:15   ` luca abeni
  2019-01-07  7:46     ` Juri Lelli
  0 siblings, 1 reply; 18+ messages in thread
From: luca abeni @ 2019-01-02  9:15 UTC (permalink / raw)
  To: juri.lelli
  Cc: syzbot, bp, bristot, hpa, linux-kernel, luto, mingo, peterz,
	syzkaller-bugs, tglx, x86

Hi all,
(and, happy new year to everyone!)

this looks similar to a bug we have seen some time ago (a task
switching from SCHED_OTHER to SCHED_DEADLINE while inheriting a
deadline from a SCHED_DEADLINE task triggers the warning)...

Juri, I think you found a fix for such a bug; has it been committed?



			Thanks,
				Luca

On Mon, 31 Dec 2018 07:02:04 -0800
syzbot <syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com> wrote:

> syzbot has found a reproducer for the following crash on:
> 
> HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of
> git://git.kern.. git tree:       upstream
> console output:
> https://syzkaller.appspot.com/x/log.txt?x=118af84b400000 kernel
> config:  https://syzkaller.appspot.com/x/.config?x=76d28549be7c27cf
> dashboard link:
> https://syzkaller.appspot.com/bug?extid=119ba87189432ead09b4
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental) syz
> repro:
> https://syzkaller.appspot.com/x/repro.syz?x=10eb7ebf400000 C
> reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14156d77400000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the
> commit: Reported-by:
> syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
> 
> WARNING: CPU: 0 PID: 9019 at kernel/sched/deadline.c:628  
> setup_new_dl_entity kernel/sched/deadline.c:629 [inline]
> WARNING: CPU: 0 PID: 9019 at kernel/sched/deadline.c:628
> enqueue_dl_entity kernel/sched/deadline.c:1429 [inline]
> WARNING: CPU: 0 PID: 9019 at kernel/sched/deadline.c:628  
> enqueue_task_dl+0x2355/0x3dc0 kernel/sched/deadline.c:1500
> Kernel panic - not syncing: panic_on_warn set ...
> CPU: 0 PID: 9019 Comm: syz-executor280 Not tainted 4.20.0+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
>   panic+0x2cb/0x589 kernel/panic.c:189
>   __warn.cold+0x20/0x4b kernel/panic.c:544
>   report_bug+0x263/0x2b0 lib/bug.c:186
>   fixup_bug arch/x86/kernel/traps.c:178 [inline]
>   fixup_bug arch/x86/kernel/traps.c:173 [inline]
>   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
>   do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
>   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
> RIP: 0010:setup_new_dl_entity kernel/sched/deadline.c:628 [inline]
> RIP: 0010:enqueue_dl_entity kernel/sched/deadline.c:1429 [inline]
> RIP: 0010:enqueue_task_dl+0x2355/0x3dc0 kernel/sched/deadline.c:1500
> Code: 3c 02 00 0f 85 ba 05 00 00 49 8b b5 50 0a 00 00 e9 53 fa ff ff
> e8 fb f2 64 00 48 8d 4d d8 e9 48 dd ff ff 0f 0b e9 92 f1 ff ff <0f>
> 0b e9 18 f1 ff ff 4c 89 ef 4c 89 95 28 ff ff ff 4c 89 85 30 ff
> RSP: 0018:ffff88809eebfaf8 EFLAGS: 00010002
> RAX: 0000000000000002 RBX: 1ffff11013dd7f6a RCX: dffffc0000000000
> RDX: 000000333cf09f75 RSI: 0000000000000004 RDI: ffff8880ae62d850
> RBP: ffff88809eebfbf8 R08: ffff88807fb0a538 R09: ffff88807fb0a2fc
> R10: ffff88807fb0a580 R11: ffff8880ae62dc7b R12: ffff88807fb0a2c0
> R13: ffff8880ae62ce00 R14: ffff8880ae62ce00 R15: ffff88807fb0a58c
>   enqueue_task+0xb9/0x380 kernel/sched/core.c:730
>   __sched_setscheduler+0xe32/0x1fe0 kernel/sched/core.c:4336
>   sched_setattr kernel/sched/core.c:4394 [inline]
>   __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
>   __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
>   __x64_sys_sched_setattr+0x1af/0x2f0 kernel/sched/core.c:4549
>   do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x44c829
> Code: e8 8c d8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> 3d 01 f0 ff ff 0f 83 eb c9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007f28685e8ce8 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
> RAX: ffffffffffffffda RBX: 00000000006e49f8 RCX: 000000000044c829
> RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
> RBP: 00000000006e49f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e49fc
> R13: 00007ffd1981c8af R14: 00007f28685e99c0 R15: 0000000000000001
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.20.0+ #1 Not tainted
> ------------------------------------------------------
> syz-executor280/9019 is trying to acquire lock:
> 000000001aef527c ((console_sem).lock){-.-.}, at:
> down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
> 
> but task is already holding lock:
> 000000000ba17b09 (&rq->lock){-.-.}, at: task_rq_lock+0xc8/0x290  
> kernel/sched/core.c:99
> 
> which lock already depends on the new lock.
> 
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&rq->lock){-.-.}:  
>         __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>         _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:144
>         rq_lock kernel/sched/sched.h:1149 [inline]
>         task_fork_fair+0xb5/0x7a0 kernel/sched/fair.c:10083
>         sched_fork+0x437/0xb90 kernel/sched/core.c:2359
>         copy_process+0x1ff6/0x8730 kernel/fork.c:1893
>         _do_fork+0x1a9/0x1170 kernel/fork.c:2222
>         kernel_thread+0x34/0x40 kernel/fork.c:2281
>         rest_init+0x28/0x37b init/main.c:409
>         arch_call_rest_init+0xe/0x1b
>         start_kernel+0x882/0x8bd init/main.c:741
>         x86_64_start_reservations+0x29/0x2b
> arch/x86/kernel/head64.c:470 x86_64_start_kernel+0x77/0x7b
> arch/x86/kernel/head64.c:451 secondary_startup_64+0xa4/0xb0
> arch/x86/kernel/head_64.S:243
> 
> -> #1 (&p->pi_lock){-.-.}:  
>         __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110
> [inline] _raw_spin_lock_irqsave+0x95/0xcd
> kernel/locking/spinlock.c:152 try_to_wake_up+0xb9/0x1480
> kernel/sched/core.c:1965 wake_up_process+0x10/0x20
> kernel/sched/core.c:2129 __up.isra.0+0x1c0/0x2a0
> kernel/locking/semaphore.c:262 up+0x13e/0x1c0
> kernel/locking/semaphore.c:187 __up_console_sem+0xb7/0x1c0
> kernel/printk/printk.c:236 console_unlock+0x778/0x11e0
> kernel/printk/printk.c:2426 con_flush_chars drivers/tty/vt/vt.c:3197
> [inline] con_flush_chars drivers/tty/vt/vt.c:3185 [inline]
>         con_write+0xa2/0xb0 drivers/tty/vt/vt.c:3117
>         process_output_block drivers/tty/n_tty.c:593 [inline]
>         n_tty_write+0x497/0x1220 drivers/tty/n_tty.c:2331
>         do_tty_write drivers/tty/tty_io.c:959 [inline]
>         tty_write+0x45b/0x7a0 drivers/tty/tty_io.c:1043
>         __vfs_write+0x116/0xb40 fs/read_write.c:485
>         vfs_write+0x20c/0x580 fs/read_write.c:549
>         ksys_write+0x105/0x260 fs/read_write.c:598
>         __do_sys_write fs/read_write.c:610 [inline]
>         __se_sys_write fs/read_write.c:607 [inline]
>         __x64_sys_write+0x73/0xb0 fs/read_write.c:607
>         do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
>         entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> -> #0 ((console_sem).lock){-.-.}:  
>         lock_acquire+0x1db/0x570 kernel/locking/lockdep.c:3841
>         __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110
> [inline] _raw_spin_lock_irqsave+0x95/0xcd
> kernel/locking/spinlock.c:152 down_trylock+0x13/0x70
> kernel/locking/semaphore.c:136 __down_trylock_console_sem+0xa8/0x210
> kernel/printk/printk.c:219 console_trylock+0x15/0xa0
> kernel/printk/printk.c:2242 console_trylock_spinning
> kernel/printk/printk.c:1662 [inline] vprintk_emit+0x351/0x960
> kernel/printk/printk.c:1930 vprintk_default+0x28/0x30
> kernel/printk/printk.c:1958 vprintk_func+0x7e/0x189
> kernel/printk/printk_safe.c:398 printk+0xba/0xed
> kernel/printk/printk.c:1991 __warn+0x9e/0x1d0 kernel/panic.c:526
>         report_bug+0x263/0x2b0 lib/bug.c:186
>         fixup_bug arch/x86/kernel/traps.c:178 [inline]
>         fixup_bug arch/x86/kernel/traps.c:173 [inline]
>         do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
>         do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
>         invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
>         setup_new_dl_entity kernel/sched/deadline.c:629 [inline]
>         enqueue_dl_entity kernel/sched/deadline.c:1429 [inline]
>         enqueue_task_dl+0x2355/0x3dc0 kernel/sched/deadline.c:1500
>         enqueue_task+0xb9/0x380 kernel/sched/core.c:730
>         __sched_setscheduler+0xe32/0x1fe0 kernel/sched/core.c:4336
>         sched_setattr kernel/sched/core.c:4394 [inline]
>         __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
>         __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
>         __x64_sys_sched_setattr+0x1af/0x2f0 kernel/sched/core.c:4549
>         do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
>         entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    (console_sem).lock --> &p->pi_lock --> &rq->lock
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&rq->lock);
>                                 lock(&p->pi_lock);
>                                 lock(&rq->lock);
>    lock((console_sem).lock);
> 
>   *** DEADLOCK ***
> 
> 3 locks held by syz-executor280/9019:
>   #0: 0000000014b8e16d (rcu_read_lock){....}, at:
> __do_sys_sched_setattr kernel/sched/core.c:4563 [inline]
>   #0: 0000000014b8e16d (rcu_read_lock){....}, at:
> __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
>   #0: 0000000014b8e16d (rcu_read_lock){....}, at:  
> __x64_sys_sched_setattr+0x144/0x2f0 kernel/sched/core.c:4549
>   #1: 00000000b31ff59d (&p->pi_lock){-.-.}, at:
> task_rq_lock+0x6a/0x290 kernel/sched/core.c:97
>   #2: 000000000ba17b09 (&rq->lock){-.-.}, at:
> task_rq_lock+0xc8/0x290 kernel/sched/core.c:99
> 
> stack backtrace:
> CPU: 0 PID: 9019 Comm: syz-executor280 Not tainted 4.20.0+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
>   print_circular_bug.isra.0.cold+0x1cc/0x28f
> kernel/locking/lockdep.c:1224 check_prev_add
> kernel/locking/lockdep.c:1866 [inline] check_prevs_add
> kernel/locking/lockdep.c:1979 [inline] validate_chain
> kernel/locking/lockdep.c:2350 [inline] __lock_acquire+0x3014/0x4a30
> kernel/locking/lockdep.c:3338 lock_acquire+0x1db/0x570
> kernel/locking/lockdep.c:3841 __raw_spin_lock_irqsave
> include/linux/spinlock_api_smp.h:110 [inline]
> _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
> down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
> __down_trylock_console_sem+0xa8/0x210 kernel/printk/printk.c:219
> console_trylock+0x15/0xa0 kernel/printk/printk.c:2242
> console_trylock_spinning kernel/printk/printk.c:1662 [inline]
> vprintk_emit+0x351/0x960 kernel/printk/printk.c:1930
> vprintk_default+0x28/0x30 kernel/printk/printk.c:1958
> vprintk_func+0x7e/0x189 kernel/printk/printk_safe.c:398
> printk+0xba/0xed kernel/printk/printk.c:1991 __warn+0x9e/0x1d0
> kernel/panic.c:526 report_bug+0x263/0x2b0 lib/bug.c:186
>   fixup_bug arch/x86/kernel/traps.c:178 [inline]
>   fixup_bug arch/x86/kernel/traps.c:173 [inline]
>   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
>   do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:290
>   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
> RIP: 0010:setup_new_dl_entity kernel/sched/deadline.c:628 [inline]
> RIP: 0010:enqueue_dl_entity kernel/sched/deadline.c:1429 [inline]
> RIP: 0010:enqueue_task_dl+0x2355/0x3dc0 kernel/sched/deadline.c:1500
> Code: 3c 02 00 0f 85 ba 05 00 00 49 8b b5 50 0a 00 00 e9 53 fa ff ff
> e8 fb f2 64 00 48 8d 4d d8 e9 48 dd ff ff 0f 0b e9 92 f1 ff ff <0f>
> 0b e9 18 f1 ff ff 4c 89 ef 4c 89 95 28 ff ff ff 4c 89 85 30 ff
> RSP: 0018:ffff88809eebfaf8 EFLAGS: 00010002
> RAX: 0000000000000002 RBX: 1ffff11013dd7f6a RCX: dffffc0000000000
> RDX: 000000333cf09f75 RSI: 0000000000000004 RDI: ffff8880ae62d850
> RBP: ffff88809eebfbf8 R08: ffff88807fb0a538 R09: ffff88807fb0a2fc
> R10: ffff88807fb0a580 R11: ffff8880ae62dc7b R12: ffff88807fb0a2c0
> R13: ffff8880ae62ce00 R14: ffff8880ae62ce00 R15: ffff88807fb0a58c
>   enqueue_task+0xb9/0x380 kernel/sched/core.c:730
>   __sched_setscheduler+0xe32/0x1fe0 kernel/sched/core.c:4336
>   sched_setattr kernel/sched/core.c:4394 [inline]
>   __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
>   __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
>   __x64_sys_sched_setattr+0x1af/0x2f0 kernel/sched/core.c:4549
>   do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x44c829
> Code: e8 8c d8 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48>
> 3d 01 f0 ff ff 0f 83 eb c9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007f28685e8ce8 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
> RAX: ffffffffffffffda RBX: 00000000006e49f8 RCX: 000000000044c829
> RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
> RBP: 00000000006e49f0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e49fc
> R13: 00007ffd1981c8af R14: 00007f28685e99c0 R15: 0000000000000001
> Shutting down cpus with NMI
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2019-01-02  9:15   ` luca abeni
@ 2019-01-07  7:46     ` Juri Lelli
  0 siblings, 0 replies; 18+ messages in thread
From: Juri Lelli @ 2019-01-07  7:46 UTC (permalink / raw)
  To: luca abeni
  Cc: syzbot, bp, bristot, hpa, linux-kernel, luto, mingo, peterz,
	syzkaller-bugs, tglx, x86

Hi,

On 02/01/19 10:15, luca abeni wrote:
> Hi all,
> (and, happy new year to everyone!)
> 
> this looks similar to a bug we have seen some time ago (a task
> switching from SCHED_OTHER to SCHED_DEADLINE while inheriting a
> deadline from a SCHED_DEADLINE task triggers the warning)...
> 
> Juri, I think you found a fix for such a bug; has it been committed?

Looks similar to the one I proposed this fix for:

https://lore.kernel.org/lkml/20181119153201.GB2119@localhost.localdomain/

AFAIK it hasn't been reviewed/tested yet. Could you (or someone) please
have a look?

Best,

- Juri

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-11-19 15:32         ` Juri Lelli
@ 2019-01-07 16:19           ` Daniel Bristot de Oliveira
  2019-02-07  9:35             ` Dmitry Vyukov
  2020-06-23  7:19           ` [tip: sched/urgent] sched/core: Fix PI boosting between RT and DEADLINE tip-bot2 for Juri Lelli
  2020-06-23  8:48           ` [tip: sched/urgent] sched/core: Fix PI boosting between RT and DEADLINE tasks tip-bot2 for Juri Lelli
  2 siblings, 1 reply; 18+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-01-07 16:19 UTC (permalink / raw)
  To: Juri Lelli, Peter Zijlstra
  Cc: luca abeni, Thomas Gleixner, syzbot, Borislav Petkov,
	H. Peter Anvin, LKML, Andy Lutomirski, mingo, syzkaller-bugs,
	x86

On 11/19/18 4:32 PM, Juri Lelli wrote:
> From 9326fd2b20269cffef7290bdc5b8173460d3c870 Mon Sep 17 00:00:00 2001
> From: Juri Lelli <juri.lelli@redhat.com>
> Date: Mon, 19 Nov 2018 16:04:42 +0100
> Subject: [PATCH] sched/core: Fix PI boosting between RT and DEADLINE
> 
> syzbot reported the following warning:
> 
>  WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
>  enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
>  PM: Basic memory bitmaps freed
>  Kernel panic - not syncing: panic_on_warn set ...
>  CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>  Google 01/01/2011
>  Call Trace:
>    __dump_stack lib/dump_stack.c:77 [inline]
>    dump_stack+0x244/0x39d lib/dump_stack.c:113
>    panic+0x2ad/0x55c kernel/panic.c:188
>    __warn.cold.8+0x20/0x45 kernel/panic.c:540
>    report_bug+0x254/0x2d0 lib/bug.c:186
>    fixup_bug arch/x86/kernel/traps.c:178 [inline]
>    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
>    do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
>    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
>  RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
>  Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe
>  ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1
>  ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
>  RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
>  RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
>  RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
>  RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
>  R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
>  R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
>    enqueue_task+0x184/0x390 kernel/sched/core.c:730
>    __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
>    sched_setattr kernel/sched/core.c:4394 [inline]
>    __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
>    __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
>    __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
>    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>  RIP: 0033:0x457569
>  Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
>  48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
>  ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
>  RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
>  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
>  RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
>  RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
>  R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff
> 
> At deadline.c:628 we have:
> 
>  623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
>  624 {
>  625 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
>  626 	struct rq *rq = rq_of_dl_rq(dl_rq);
>  627
>  628 	WARN_ON(dl_se->dl_boosted);
>  629 	WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
>         [...]
>      }
> 
> Which means that setup_new_dl_entity() has been called on a task
> currently boosted. This shouldn't happen though, as setup_new_
> dl_entity() is only called when the 'dynamic' deadline of the new entity
> is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
> condition.
> 
> Digging through PI code I noticed that what above might in fact happen
> if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
> first branch of boosting conditions we check only if a pi_task 'dynamic'
> deadline is earlier than mutex holder's and in this case we set mutex
> holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
> initialized if such tasks get boosted at some point (or if they become
> DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
> to 0 and this verifies the aforementioned condition.
> 
> Fix it by checking that the potential donor task is actually (even if
> temporary because in turn boosted) running at DEADLINE priority before
> using its 'dynamic' deadline value.
> 
> Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>

Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>

Thanks!
-- Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2019-01-07 16:19           ` Daniel Bristot de Oliveira
@ 2019-02-07  9:35             ` Dmitry Vyukov
  2019-07-24  4:45               ` Eric Biggers
  0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2019-02-07  9:35 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Juri Lelli, Peter Zijlstra, luca abeni, Thomas Gleixner, syzbot,
	Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers

On Mon, Jan 7, 2019 at 5:19 PM Daniel Bristot de Oliveira
<bristot@redhat.com> wrote:
>
> On 11/19/18 4:32 PM, Juri Lelli wrote:
> > From 9326fd2b20269cffef7290bdc5b8173460d3c870 Mon Sep 17 00:00:00 2001
> > From: Juri Lelli <juri.lelli@redhat.com>
> > Date: Mon, 19 Nov 2018 16:04:42 +0100
> > Subject: [PATCH] sched/core: Fix PI boosting between RT and DEADLINE
> >
> > syzbot reported the following warning:
> >
> >  WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
> >  enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> >  PM: Basic memory bitmaps freed
> >  Kernel panic - not syncing: panic_on_warn set ...
> >  CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
> >  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >  Google 01/01/2011
> >  Call Trace:
> >    __dump_stack lib/dump_stack.c:77 [inline]
> >    dump_stack+0x244/0x39d lib/dump_stack.c:113
> >    panic+0x2ad/0x55c kernel/panic.c:188
> >    __warn.cold.8+0x20/0x45 kernel/panic.c:540
> >    report_bug+0x254/0x2d0 lib/bug.c:186
> >    fixup_bug arch/x86/kernel/traps.c:178 [inline]
> >    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> >    do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> >    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> >  RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> >  Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe
> >  ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1
> >  ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
> >  RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
> >  RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
> >  RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
> >  RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
> >  R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
> >  R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
> >    enqueue_task+0x184/0x390 kernel/sched/core.c:730
> >    __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> >    sched_setattr kernel/sched/core.c:4394 [inline]
> >    __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> >    __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> >    __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> >    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> >    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >  RIP: 0033:0x457569
> >  Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> >  48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> >  ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> >  RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
> >  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
> >  RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
> >  RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
> >  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
> >  R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff
> >
> > At deadline.c:628 we have:
> >
> >  623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
> >  624 {
> >  625  struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> >  626  struct rq *rq = rq_of_dl_rq(dl_rq);
> >  627
> >  628  WARN_ON(dl_se->dl_boosted);
> >  629  WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
> >         [...]
> >      }
> >
> > Which means that setup_new_dl_entity() has been called on a task
> > currently boosted. This shouldn't happen though, as setup_new_
> > dl_entity() is only called when the 'dynamic' deadline of the new entity
> > is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
> > condition.
> >
> > Digging through PI code I noticed that what above might in fact happen
> > if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
> > first branch of boosting conditions we check only if a pi_task 'dynamic'
> > deadline is earlier than mutex holder's and in this case we set mutex
> > holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
> > initialized if such tasks get boosted at some point (or if they become
> > DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
> > to 0 and this verifies the aforementioned condition.
> >
> > Fix it by checking that the potential donor task is actually (even if
> > temporary because in turn boosted) running at DEADLINE priority before
> > using its 'dynamic' deadline value.
> >
> > Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
> > Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
>
> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>

What happened with this patch? I still don't see it in linux-next.

There is a number of reproducers that involve sched_setattr and lead
to dead machines on syzbot:
https://syzkaller.appspot.com/bug?id=0b210638616bb68109e9642158d4c0072770ae1c

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2018-11-18 18:49 WARNING in enqueue_task_dl syzbot
  2018-11-19  8:23 ` Thomas Gleixner
  2018-12-31 15:02 ` WARNING in enqueue_task_dl syzbot
@ 2019-03-20 17:08 ` syzbot
  2 siblings, 0 replies; 18+ messages in thread
From: syzbot @ 2019-03-20 17:08 UTC (permalink / raw)
  To: bp, bristot, dvhart, dvyukov, hpa, juri.lelli, linux-kernel,
	luca.abeni, luto, mingo, mingo, peterz, syzkaller-bugs, tglx,
	torvalds, x86

syzbot has bisected this bug to:

commit 7c80cfc99b7bfdc92cee26f8008859f326f4a37f
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sat May 6 14:03:17 2017 +0000

     sched/fair: Clean up calc_cfs_shares()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=169d614d200000
start commit:   7c80cfc9 sched/fair: Clean up calc_cfs_shares()
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=159d614d200000
console output: https://syzkaller.appspot.com/x/log.txt?x=119d614d200000
kernel config:  https://syzkaller.appspot.com/x/.config?x=76d28549be7c27cf
dashboard link: https://syzkaller.appspot.com/bug?extid=119ba87189432ead09b4
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=10eb7ebf400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14156d77400000

Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
Fixes: 7c80cfc9 ("sched/fair: Clean up calc_cfs_shares()")

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2019-02-07  9:35             ` Dmitry Vyukov
@ 2019-07-24  4:45               ` Eric Biggers
  2020-06-16  6:53                 ` Daniel Wagner
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Biggers @ 2019-07-24  4:45 UTC (permalink / raw)
  To: Peter Zijlstra, Juri Lelli
  Cc: Daniel Bristot de Oliveira, Dmitry Vyukov, luca abeni,
	Thomas Gleixner, syzbot, Borislav Petkov, H. Peter Anvin, LKML,
	Andy Lutomirski, Ingo Molnar, syzkaller-bugs,
	the arch/x86 maintainers

On Thu, Feb 07, 2019 at 10:35:04AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> On Mon, Jan 7, 2019 at 5:19 PM Daniel Bristot de Oliveira
> <bristot@redhat.com> wrote:
> >
> > On 11/19/18 4:32 PM, Juri Lelli wrote:
> > > From 9326fd2b20269cffef7290bdc5b8173460d3c870 Mon Sep 17 00:00:00 2001
> > > From: Juri Lelli <juri.lelli@redhat.com>
> > > Date: Mon, 19 Nov 2018 16:04:42 +0100
> > > Subject: [PATCH] sched/core: Fix PI boosting between RT and DEADLINE
> > >
> > > syzbot reported the following warning:
> > >
> > >  WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
> > >  enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> > >  PM: Basic memory bitmaps freed
> > >  Kernel panic - not syncing: panic_on_warn set ...
> > >  CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
> > >  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > >  Google 01/01/2011
> > >  Call Trace:
> > >    __dump_stack lib/dump_stack.c:77 [inline]
> > >    dump_stack+0x244/0x39d lib/dump_stack.c:113
> > >    panic+0x2ad/0x55c kernel/panic.c:188
> > >    __warn.cold.8+0x20/0x45 kernel/panic.c:540
> > >    report_bug+0x254/0x2d0 lib/bug.c:186
> > >    fixup_bug arch/x86/kernel/traps.c:178 [inline]
> > >    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> > >    do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> > >    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> > >  RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> > >  Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe
> > >  ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1
> > >  ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
> > >  RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
> > >  RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
> > >  RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
> > >  RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
> > >  R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
> > >  R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
> > >    enqueue_task+0x184/0x390 kernel/sched/core.c:730
> > >    __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> > >    sched_setattr kernel/sched/core.c:4394 [inline]
> > >    __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> > >    __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> > >    __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> > >    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> > >    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >  RIP: 0033:0x457569
> > >  Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> > >  48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> > >  ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > >  RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
> > >  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
> > >  RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
> > >  RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
> > >  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
> > >  R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff
> > >
> > > At deadline.c:628 we have:
> > >
> > >  623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
> > >  624 {
> > >  625  struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> > >  626  struct rq *rq = rq_of_dl_rq(dl_rq);
> > >  627
> > >  628  WARN_ON(dl_se->dl_boosted);
> > >  629  WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
> > >         [...]
> > >      }
> > >
> > > Which means that setup_new_dl_entity() has been called on a task
> > > currently boosted. This shouldn't happen though, as setup_new_
> > > dl_entity() is only called when the 'dynamic' deadline of the new entity
> > > is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
> > > condition.
> > >
> > > Digging through PI code I noticed that what above might in fact happen
> > > if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
> > > first branch of boosting conditions we check only if a pi_task 'dynamic'
> > > deadline is earlier than mutex holder's and in this case we set mutex
> > > holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
> > > initialized if such tasks get boosted at some point (or if they become
> > > DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
> > > to 0 and this verifies the aforementioned condition.
> > >
> > > Fix it by checking that the potential donor task is actually (even if
> > > temporary because in turn boosted) running at DEADLINE priority before
> > > using its 'dynamic' deadline value.
> > >
> > > Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
> > > Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> >
> > Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> 
> What happened with this patch? I still don't see it in linux-next.
> 
> There is a number of reproducers that involve sched_setattr and lead
> to dead machines on syzbot:
> https://syzkaller.appspot.com/bug?id=0b210638616bb68109e9642158d4c0072770ae1c
> 

Ping.  Patch is not applied, and this WARNING is still being hit.

Also note the bisection result:

	commit 7c80cfc99b7bfdc92cee26f8008859f326f4a37f
	Author: Peter Zijlstra <peterz@infradead.org>
	Date: Sat May 6 14:03:17 2017 +0000

	  sched/fair: Clean up calc_cfs_shares()

- Eric

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2019-07-24  4:45               ` Eric Biggers
@ 2020-06-16  6:53                 ` Daniel Wagner
  2020-06-16  8:20                   ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Wagner @ 2020-06-16  6:53 UTC (permalink / raw)
  To: Peter Zijlstra, Juri Lelli, Daniel Bristot de Oliveira,
	Dmitry Vyukov, luca abeni, Thomas Gleixner, syzbot,
	Borislav Petkov, H. Peter Anvin, LKML, Andy Lutomirski,
	Ingo Molnar, syzkaller-bugs, the arch/x86 maintainers

On Tue, Jul 23, 2019 at 09:45:16PM -0700, Eric Biggers wrote:
> On Thu, Feb 07, 2019 at 10:35:04AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > On Mon, Jan 7, 2019 at 5:19 PM Daniel Bristot de Oliveira
> > <bristot@redhat.com> wrote:
> > >
> > > On 11/19/18 4:32 PM, Juri Lelli wrote:
> > > > From 9326fd2b20269cffef7290bdc5b8173460d3c870 Mon Sep 17 00:00:00 2001
> > > > From: Juri Lelli <juri.lelli@redhat.com>
> > > > Date: Mon, 19 Nov 2018 16:04:42 +0100
> > > > Subject: [PATCH] sched/core: Fix PI boosting between RT and DEADLINE
> > > >
> > > > syzbot reported the following warning:
> > > >
> > > >  WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
> > > >  enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> > > >  PM: Basic memory bitmaps freed
> > > >  Kernel panic - not syncing: panic_on_warn set ...
> > > >  CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
> > > >  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > >  Google 01/01/2011
> > > >  Call Trace:
> > > >    __dump_stack lib/dump_stack.c:77 [inline]
> > > >    dump_stack+0x244/0x39d lib/dump_stack.c:113
> > > >    panic+0x2ad/0x55c kernel/panic.c:188
> > > >    __warn.cold.8+0x20/0x45 kernel/panic.c:540
> > > >    report_bug+0x254/0x2d0 lib/bug.c:186
> > > >    fixup_bug arch/x86/kernel/traps.c:178 [inline]
> > > >    do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
> > > >    do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
> > > >    invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
> > > >  RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
> > > >  Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe
> > > >  ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1
> > > >  ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
> > > >  RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
> > > >  RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
> > > >  RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
> > > >  RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
> > > >  R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
> > > >  R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
> > > >    enqueue_task+0x184/0x390 kernel/sched/core.c:730
> > > >    __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
> > > >    sched_setattr kernel/sched/core.c:4394 [inline]
> > > >    __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
> > > >    __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
> > > >    __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
> > > >    do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
> > > >    entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > >  RIP: 0033:0x457569
> > > >  Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> > > >  48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> > > >  ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> > > >  RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
> > > >  RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
> > > >  RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
> > > >  RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
> > > >  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
> > > >  R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff
> > > >
> > > > At deadline.c:628 we have:
> > > >
> > > >  623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
> > > >  624 {
> > > >  625  struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> > > >  626  struct rq *rq = rq_of_dl_rq(dl_rq);
> > > >  627
> > > >  628  WARN_ON(dl_se->dl_boosted);
> > > >  629  WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
> > > >         [...]
> > > >      }
> > > >
> > > > Which means that setup_new_dl_entity() has been called on a task
> > > > currently boosted. This shouldn't happen though, as setup_new_
> > > > dl_entity() is only called when the 'dynamic' deadline of the new entity
> > > > is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
> > > > condition.
> > > >
> > > > Digging through PI code I noticed that what above might in fact happen
> > > > if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
> > > > first branch of boosting conditions we check only if a pi_task 'dynamic'
> > > > deadline is earlier than mutex holder's and in this case we set mutex
> > > > holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
> > > > initialized if such tasks get boosted at some point (or if they become
> > > > DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
> > > > to 0 and this verifies the aforementioned condition.
> > > >
> > > > Fix it by checking that the potential donor task is actually (even if
> > > > temporary because in turn boosted) running at DEADLINE priority before
> > > > using its 'dynamic' deadline value.
> > > >
> > > > Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
> > > > Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
> > >
> > > Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> > 
> > What happened with this patch? I still don't see it in linux-next.
> > 
> > There is a number of reproducers that involve sched_setattr and lead
> > to dead machines on syzbot:
> > https://syzkaller.appspot.com/bug?id=0b210638616bb68109e9642158d4c0072770ae1c
> > 
> 
> Ping.  Patch is not applied, and this WARNING is still being hit.
> 
> Also note the bisection result:
> 
> 	commit 7c80cfc99b7bfdc92cee26f8008859f326f4a37f
> 	Author: Peter Zijlstra <peterz@infradead.org>
> 	Date: Sat May 6 14:03:17 2017 +0000
> 
> 	  sched/fair: Clean up calc_cfs_shares()

I've tested this patch against 5.8-rc1. Without the fix, after around 2 hours
the warning was triggered by the reproducer. With the patch, it survived
roughly 12 hours without the warning.

Tested-by: Daniel Wagner <dwagner@suse.de>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: WARNING in enqueue_task_dl
  2020-06-16  6:53                 ` Daniel Wagner
@ 2020-06-16  8:20                   ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2020-06-16  8:20 UTC (permalink / raw)
  To: Daniel Wagner
  Cc: Juri Lelli, Daniel Bristot de Oliveira, Dmitry Vyukov,
	luca abeni, Thomas Gleixner, syzbot, Borislav Petkov,
	H. Peter Anvin, LKML, Andy Lutomirski, Ingo Molnar,
	syzkaller-bugs, the arch/x86 maintainers

On Tue, Jun 16, 2020 at 08:53:59AM +0200, Daniel Wagner wrote:
> I've tested this patch against 5.8-rc1. Without the fix, after around 2 hours
> the warning was triggered by the reproducer. With the patch, it survived
> roughly 12 hours without the warning.
> 
> Tested-by: Daniel Wagner <dwagner@suse.de>

Thanks, have it now.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip: sched/urgent] sched/core: Fix PI boosting between RT and DEADLINE
  2018-11-19 15:32         ` Juri Lelli
  2019-01-07 16:19           ` Daniel Bristot de Oliveira
@ 2020-06-23  7:19           ` tip-bot2 for Juri Lelli
  2020-06-23  8:48           ` [tip: sched/urgent] sched/core: Fix PI boosting between RT and DEADLINE tasks tip-bot2 for Juri Lelli
  2 siblings, 0 replies; 18+ messages in thread
From: tip-bot2 for Juri Lelli @ 2020-06-23  7:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: syzbot+119ba87189432ead09b4, Juri Lelli, Peter Zijlstra (Intel),
	Daniel Bristot de Oliveira, Daniel Wagner, x86, LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     195819207674143c790809f97f96102c7fada077
Gitweb:        https://git.kernel.org/tip/195819207674143c790809f97f96102c7fada077
Author:        Juri Lelli <juri.lelli@redhat.com>
AuthorDate:    Mon, 19 Nov 2018 16:32:01 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 22 Jun 2020 20:51:06 +02:00

sched/core: Fix PI boosting between RT and DEADLINE

syzbot reported the following warning:

 WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
 enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
 PM: Basic memory bitmaps freed
 Kernel panic - not syncing: panic_on_warn set ...
 CPU: 1 PID: 6351 Comm: syz-executor0 Not tainted 4.20.0-rc2+ #338
 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
 Google 01/01/2011
 Call Trace:
   __dump_stack lib/dump_stack.c:77 [inline]
   dump_stack+0x244/0x39d lib/dump_stack.c:113
   panic+0x2ad/0x55c kernel/panic.c:188
   __warn.cold.8+0x20/0x45 kernel/panic.c:540
   report_bug+0x254/0x2d0 lib/bug.c:186
   fixup_bug arch/x86/kernel/traps.c:178 [inline]
   do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
   do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
   invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
 RIP: 0010:enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504
 Code: ff 48 8b 8d c8 fe ff ff 48 c1 e6 2a 4c 8b 9d d0 fe ff ff 8b 95 d8 fe
 ff ff 48 8b 85 e0 fe ff ff e9 16 e4 ff ff e8 16 d0 ea ff <0f> 0b e9 17 f1
 ff ff 48 8b bd e8 fe ff ff 4c 89 95 c8 fe ff ff 48
 RSP: 0018:ffff8881ba39fa18 EFLAGS: 00010002
 RAX: 0000000000000000 RBX: ffff8881b9d6c000 RCX: ffff8881b9d6c278
 RDX: ffff8881b9d6c03c RSI: 0000000000000002 RDI: ffff8881daf2d710
 RBP: ffff8881ba39fb78 R08: 0000000000000001 R09: ffff8881daf00000
 R10: 0000001a4d4f1987 R11: ffff8881daf2db3b R12: 1ffff11037473f4e
 R13: ffff8881b9d6c2cc R14: ffff8881daf2ccc0 R15: ffff8881daf2ccc0
   enqueue_task+0x184/0x390 kernel/sched/core.c:730
   __sched_setscheduler+0xe99/0x2190 kernel/sched/core.c:4336
   sched_setattr kernel/sched/core.c:4394 [inline]
   __do_sys_sched_setattr kernel/sched/core.c:4570 [inline]
   __se_sys_sched_setattr kernel/sched/core.c:4549 [inline]
   __x64_sys_sched_setattr+0x1b2/0x2f0 kernel/sched/core.c:4549
   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x457569
 Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
 ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
 RSP: 002b:00007f05ce0a2c78 EFLAGS: 00000246 ORIG_RAX: 000000000000013a
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457569
 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000000
 RBP: 000000000072bfa0 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f05ce0a36d4
 R13: 00000000004c369f R14: 00000000004d5730 R15: 00000000ffffffff

At deadline.c:628 we have:

 623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
 624 {
 625 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 626 	struct rq *rq = rq_of_dl_rq(dl_rq);
 627
 628 	WARN_ON(dl_se->dl_boosted);
 629 	WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
        [...]
     }

Which means that setup_new_dl_entity() has been called on a task
currently boosted. This shouldn't happen though, as setup_new_
dl_entity() is only called when the 'dynamic' deadline of the new entity
is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
condition.

Digging through PI code I noticed that what above might in fact happen
if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
first branch of boosting conditions we check only if a pi_task 'dynamic'
deadline is earlier than mutex holder's and in this case we set mutex
holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
initialized if such tasks get boosted at some point (or if they become
DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
to 0 and this verifies the aforementioned condition.

Fix it by checking that the potential donor task is actually (even if
temporary because in turn boosted) running at DEADLINE priority before
using its 'dynamic' deadline value.

Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9f8aa34..1f79d76 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4534,7 +4534,8 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 	 */
 	if (dl_prio(prio)) {
 		if (!dl_prio(p->normal_prio) ||
-		    (pi_task && dl_entity_preempt(&pi_task->dl, &p->dl))) {
+		    (pi_task && dl_prio(pi_task->prio) &&
+		     dl_entity_preempt(&pi_task->dl, &p->dl))) {
 			p->dl.dl_boosted = 1;
 			queue_flag |= ENQUEUE_REPLENISH;
 		} else

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip: sched/urgent] sched/core: Fix PI boosting between RT and DEADLINE tasks
  2018-11-19 15:32         ` Juri Lelli
  2019-01-07 16:19           ` Daniel Bristot de Oliveira
  2020-06-23  7:19           ` [tip: sched/urgent] sched/core: Fix PI boosting between RT and DEADLINE tip-bot2 for Juri Lelli
@ 2020-06-23  8:48           ` tip-bot2 for Juri Lelli
  2 siblings, 0 replies; 18+ messages in thread
From: tip-bot2 for Juri Lelli @ 2020-06-23  8:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: syzbot+119ba87189432ead09b4, Juri Lelli, Peter Zijlstra (Intel),
	Ingo Molnar, Daniel Bristot de Oliveira, Daniel Wagner, x86,
	LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     93a952a81bf31bffaf21eca1b530245acce12597
Gitweb:        https://git.kernel.org/tip/93a952a81bf31bffaf21eca1b530245acce12597
Author:        Juri Lelli <juri.lelli@redhat.com>
AuthorDate:    Mon, 19 Nov 2018 16:32:01 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 23 Jun 2020 10:42:39 +02:00

sched/core: Fix PI boosting between RT and DEADLINE tasks

syzbot reported the following warning:

 WARNING: CPU: 1 PID: 6351 at kernel/sched/deadline.c:628
 enqueue_task_dl+0x22da/0x38a0 kernel/sched/deadline.c:1504

At deadline.c:628 we have:

 623 static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
 624 {
 625 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
 626 	struct rq *rq = rq_of_dl_rq(dl_rq);
 627
 628 	WARN_ON(dl_se->dl_boosted);
 629 	WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
        [...]
     }

Which means that setup_new_dl_entity() has been called on a task
currently boosted. This shouldn't happen though, as setup_new_dl_entity()
is only called when the 'dynamic' deadline of the new entity
is in the past w.r.t. rq_clock and boosted tasks shouldn't verify this
condition.

Digging through the PI code I noticed that what above might in fact happen
if an RT tasks blocks on an rt_mutex hold by a DEADLINE task. In the
first branch of boosting conditions we check only if a pi_task 'dynamic'
deadline is earlier than mutex holder's and in this case we set mutex
holder to be dl_boosted. However, since RT 'dynamic' deadlines are only
initialized if such tasks get boosted at some point (or if they become
DEADLINE of course), in general RT 'dynamic' deadlines are usually equal
to 0 and this verifies the aforementioned condition.

Fix it by checking that the potential donor task is actually (even if
temporary because in turn boosted) running at DEADLINE priority before
using its 'dynamic' deadline value.

Fixes: 2d3d891d3344 ("sched/deadline: Add SCHED_DEADLINE inheritance logic")
Reported-by: syzbot+119ba87189432ead09b4@syzkaller.appspotmail.com
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Link: https://lkml.kernel.org/r/20181119153201.GB2119@localhost.localdomain
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9eeac94..c1ba2e5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4533,7 +4533,8 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
 	 */
 	if (dl_prio(prio)) {
 		if (!dl_prio(p->normal_prio) ||
-		    (pi_task && dl_entity_preempt(&pi_task->dl, &p->dl))) {
+		    (pi_task && dl_prio(pi_task->prio) &&
+		     dl_entity_preempt(&pi_task->dl, &p->dl))) {
 			p->dl.dl_boosted = 1;
 			queue_flag |= ENQUEUE_REPLENISH;
 		} else

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-06-23  8:48 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-18 18:49 WARNING in enqueue_task_dl syzbot
2018-11-19  8:23 ` Thomas Gleixner
2018-11-19 10:34   ` Peter Zijlstra
2018-11-19 12:07   ` luca abeni
2018-11-19 12:52     ` Peter Zijlstra
2018-11-19 13:43       ` Juri Lelli
2018-11-19 15:32         ` Juri Lelli
2019-01-07 16:19           ` Daniel Bristot de Oliveira
2019-02-07  9:35             ` Dmitry Vyukov
2019-07-24  4:45               ` Eric Biggers
2020-06-16  6:53                 ` Daniel Wagner
2020-06-16  8:20                   ` Peter Zijlstra
2020-06-23  7:19           ` [tip: sched/urgent] sched/core: Fix PI boosting between RT and DEADLINE tip-bot2 for Juri Lelli
2020-06-23  8:48           ` [tip: sched/urgent] sched/core: Fix PI boosting between RT and DEADLINE tasks tip-bot2 for Juri Lelli
2018-12-31 15:02 ` WARNING in enqueue_task_dl syzbot
2019-01-02  9:15   ` luca abeni
2019-01-07  7:46     ` Juri Lelli
2019-03-20 17:08 ` syzbot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.