All of lore.kernel.org
 help / color / mirror / Atom feed
* [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
@ 2021-12-18 19:50 syzbot
  2021-12-18 21:22 ` Matthew Wilcox
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: syzbot @ 2021-12-18 19:50 UTC (permalink / raw)
  To: linux-fsdevel, linux-kernel, syzkaller-bugs, viro

Hello,

syzbot found the following issue on:

HEAD commit:    5472f14a3742 Merge tag 'for_linus' of git://git.kernel.org..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11132113b00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e3bdfd29b408d1b6
dashboard link: https://syzkaller.appspot.com/bug?extid=9c3fb12e9128b6e1d7eb
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14559113b00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9c3fb12e9128b6e1d7eb@syzkaller.appspotmail.com

INFO: task jbd2/sda1-8:2937 blocked for more than 143 seconds.
      Not tainted 5.16.0-rc5-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:jbd2/sda1-8     state:D stack:27112 pid: 2937 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:4972 [inline]
 __schedule+0xa9a/0x4940 kernel/sched/core.c:6253
 schedule+0xd2/0x260 kernel/sched/core.c:6326
 io_schedule+0xee/0x170 kernel/sched/core.c:8371
 bit_wait_io+0x12/0xd0 kernel/sched/wait_bit.c:209
 __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
 out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
 wait_on_bit_io include/linux/wait_bit.h:101 [inline]
 __wait_on_buffer+0x7a/0x90 fs/buffer.c:122
 wait_on_buffer include/linux/buffer_head.h:356 [inline]
 journal_wait_on_commit_record fs/jbd2/commit.c:175 [inline]
 jbd2_journal_commit_transaction+0x4e3c/0x6be0 fs/jbd2/commit.c:931
 kjournald2+0x1d0/0x930 fs/jbd2/journal.c:213
 kthread+0x405/0x4f0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>

Showing all locks held in the system:
1 lock held by khungtaskd/27:
 #0: ffffffff8bb812e0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6458
1 lock held by udevd/2974:
2 locks held by getty/3287:
 #0: ffff88807ec56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
 #1: ffffc90002b8e2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xcf0/0x1230 drivers/tty/n_tty.c:2113
2 locks held by kworker/1:0/3663:
 #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
 #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:635 [inline]
 #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:662 [inline]
 #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2269
 #1: ffffc90001b27db0 ((work_completion)(&rew.rew_work)){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2273
2 locks held by syz-executor.2/13278:
 #0: ffffffff8bc4c6c8 (perf_sched_mutex){+.+.}-{3:3}, at: account_event kernel/events/core.c:11445 [inline]
 #0: ffffffff8bc4c6c8 (perf_sched_mutex){+.+.}-{3:3}, at: perf_event_alloc.part.0+0x31f9/0x3b10 kernel/events/core.c:11678
 #1: ffffffff8bb8a668 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:290 [inline]
 #1: ffffffff8bb8a668 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x4fa/0x620 kernel/rcu/tree_exp.h:836

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 27 Comm: khungtaskd Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xc1d/0xf50 kernel/hung_task.c:295
 kthread+0x405/0x4f0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 10 Comm: kworker/u4:1 Not tainted 5.16.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound toggle_allocation_gate
RIP: 0010:__kasan_check_read+0x4/0x10 mm/kasan/shadow.c:31
Code: 44 07 48 85 db 0f 85 d0 9f 45 07 48 83 c4 60 5b 5d 41 5c 41 5d c3 c3 e9 d0 a0 45 07 cc cc cc cc cc cc cc cc cc cc 48 8b 0c 24 <89> f6 31 d2 e9 f3 f9 ff ff 0f 1f 00 48 8b 0c 24 89 f6 ba 01 00 00
RSP: 0018:ffffc90000f0f630 EFLAGS: 00000002
RAX: 0000000000000002 RBX: 1ffff920001e1ece RCX: ffffffff815b774f
RDX: 0000000000000092 RSI: 0000000000000008 RDI: ffffffff8ff77a10
RBP: 0000000000000100 R08: 0000000000000000 R09: ffffffff8ff77a17
R10: 0000000000000001 R11: 000000000000003f R12: 0000000000000008
R13: ffff888011c7eda8 R14: 0000000000000092 R15: ffff888011c7edc8
FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555557b2708 CR3: 000000000b88e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 test_bit include/asm-generic/bitops/instrumented-non-atomic.h:134 [inline]
 hlock_class kernel/locking/lockdep.c:199 [inline]
 mark_lock+0xef/0x17b0 kernel/locking/lockdep.c:4583
 mark_usage kernel/locking/lockdep.c:4526 [inline]
 __lock_acquire+0x8a7/0x54a0 kernel/locking/lockdep.c:4981
 lock_acquire kernel/locking/lockdep.c:5637 [inline]
 lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602
 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
 spin_lock include/linux/spinlock.h:349 [inline]
 __get_locked_pte+0x2b6/0x4d0 mm/memory.c:1722
 get_locked_pte include/linux/mm.h:2160 [inline]
 __text_poke+0x1ae/0x8c0 arch/x86/kernel/alternative.c:1000
 text_poke_bp_batch+0x3d7/0x560 arch/x86/kernel/alternative.c:1361
 text_poke_flush arch/x86/kernel/alternative.c:1451 [inline]
 text_poke_flush arch/x86/kernel/alternative.c:1448 [inline]
 text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1458
 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146
 jump_label_update+0x1d5/0x430 kernel/jump_label.c:830
 static_key_enable_cpuslocked+0x1b1/0x260 kernel/jump_label.c:177
 static_key_enable+0x16/0x20 kernel/jump_label.c:190
 toggle_allocation_gate mm/kfence/core.c:732 [inline]
 toggle_allocation_gate+0x100/0x390 mm/kfence/core.c:724
 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
 kthread+0x405/0x4f0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
sd 0:0:1:0: tag#6693 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
sd 0:0:1:0: tag#6693 CDB: opcode=0xe5 (vendor)
sd 0:0:1:0: tag#6693 CDB[00]: e5 f4 32 73 2f 4e 09 6d 26 e2 c7 35 d1 35 12 1c
sd 0:0:1:0: tag#6693 CDB[10]: 92 1b da 40 b8 58 5b a8 d4 7d 34 f3 90 4c f1 2d
sd 0:0:1:0: tag#6693 CDB[20]: ba
----------------
Code disassembly (best guess):
   0:	44 07                	rex.R (bad)
   2:	48 85 db             	test   %rbx,%rbx
   5:	0f 85 d0 9f 45 07    	jne    0x7459fdb
   b:	48 83 c4 60          	add    $0x60,%rsp
   f:	5b                   	pop    %rbx
  10:	5d                   	pop    %rbp
  11:	41 5c                	pop    %r12
  13:	41 5d                	pop    %r13
  15:	c3                   	retq
  16:	c3                   	retq
  17:	e9 d0 a0 45 07       	jmpq   0x745a0ec
  1c:	cc                   	int3
  1d:	cc                   	int3
  1e:	cc                   	int3
  1f:	cc                   	int3
  20:	cc                   	int3
  21:	cc                   	int3
  22:	cc                   	int3
  23:	cc                   	int3
  24:	cc                   	int3
  25:	cc                   	int3
  26:	48 8b 0c 24          	mov    (%rsp),%rcx
* 2a:	89 f6                	mov    %esi,%esi <-- trapping instruction
  2c:	31 d2                	xor    %edx,%edx
  2e:	e9 f3 f9 ff ff       	jmpq   0xfffffa26
  33:	0f 1f 00             	nopl   (%rax)
  36:	48 8b 0c 24          	mov    (%rsp),%rcx
  3a:	89 f6                	mov    %esi,%esi
  3c:	ba                   	.byte 0xba
  3d:	01 00                	add    %eax,(%rax)


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
  2021-12-18 19:50 [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3) syzbot
@ 2021-12-18 21:22 ` Matthew Wilcox
       [not found] ` <20211219023540.1638-1-hdanton@sina.com>
  2021-12-23  5:32 ` syzbot
  2 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox @ 2021-12-18 21:22 UTC (permalink / raw)
  To: syzbot; +Cc: linux-fsdevel, linux-kernel, syzkaller-bugs, viro

On Sat, Dec 18, 2021 at 11:50:20AM -0800, syzbot wrote:
> INFO: task jbd2/sda1-8:2937 blocked for more than 143 seconds.
>       Not tainted 5.16.0-rc5-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

sched_setattr(0x0, &(0x7f0000000080)={0x38, 0x1, 0x0, 0x0, 0x1}, 0x0)

so you've set a SCHED_FIFO priority and then are surprised that some
tasks are getting starved?

> task:jbd2/sda1-8     state:D stack:27112 pid: 2937 ppid:     2 flags:0x00004000
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:4972 [inline]
>  __schedule+0xa9a/0x4940 kernel/sched/core.c:6253
>  schedule+0xd2/0x260 kernel/sched/core.c:6326
>  io_schedule+0xee/0x170 kernel/sched/core.c:8371
>  bit_wait_io+0x12/0xd0 kernel/sched/wait_bit.c:209
>  __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
>  out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
>  wait_on_bit_io include/linux/wait_bit.h:101 [inline]
>  __wait_on_buffer+0x7a/0x90 fs/buffer.c:122
>  wait_on_buffer include/linux/buffer_head.h:356 [inline]
>  journal_wait_on_commit_record fs/jbd2/commit.c:175 [inline]
>  jbd2_journal_commit_transaction+0x4e3c/0x6be0 fs/jbd2/commit.c:931
>  kjournald2+0x1d0/0x930 fs/jbd2/journal.c:213
>  kthread+0x405/0x4f0 kernel/kthread.c:327
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>  </TASK>
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/27:
>  #0: ffffffff8bb812e0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6458
> 1 lock held by udevd/2974:
> 2 locks held by getty/3287:
>  #0: ffff88807ec56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
>  #1: ffffc90002b8e2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xcf0/0x1230 drivers/tty/n_tty.c:2113
> 2 locks held by kworker/1:0/3663:
>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:635 [inline]
>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:662 [inline]
>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2269
>  #1: ffffc90001b27db0 ((work_completion)(&rew.rew_work)){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2273
> 2 locks held by syz-executor.2/13278:
>  #0: ffffffff8bc4c6c8 (perf_sched_mutex){+.+.}-{3:3}, at: account_event kernel/events/core.c:11445 [inline]
>  #0: ffffffff8bc4c6c8 (perf_sched_mutex){+.+.}-{3:3}, at: perf_event_alloc.part.0+0x31f9/0x3b10 kernel/events/core.c:11678
>  #1: ffffffff8bb8a668 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:290 [inline]
>  #1: ffffffff8bb8a668 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x4fa/0x620 kernel/rcu/tree_exp.h:836
> 
> =============================================
> 
> NMI backtrace for cpu 0
> CPU: 0 PID: 27 Comm: khungtaskd Not tainted 5.16.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  <TASK>
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
>  nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
>  nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
>  watchdog+0xc1d/0xf50 kernel/hung_task.c:295
>  kthread+0x405/0x4f0 kernel/kthread.c:327
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>  </TASK>
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 PID: 10 Comm: kworker/u4:1 Not tainted 5.16.0-rc5-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: events_unbound toggle_allocation_gate
> RIP: 0010:__kasan_check_read+0x4/0x10 mm/kasan/shadow.c:31
> Code: 44 07 48 85 db 0f 85 d0 9f 45 07 48 83 c4 60 5b 5d 41 5c 41 5d c3 c3 e9 d0 a0 45 07 cc cc cc cc cc cc cc cc cc cc 48 8b 0c 24 <89> f6 31 d2 e9 f3 f9 ff ff 0f 1f 00 48 8b 0c 24 89 f6 ba 01 00 00
> RSP: 0018:ffffc90000f0f630 EFLAGS: 00000002
> RAX: 0000000000000002 RBX: 1ffff920001e1ece RCX: ffffffff815b774f
> RDX: 0000000000000092 RSI: 0000000000000008 RDI: ffffffff8ff77a10
> RBP: 0000000000000100 R08: 0000000000000000 R09: ffffffff8ff77a17
> R10: 0000000000000001 R11: 000000000000003f R12: 0000000000000008
> R13: ffff888011c7eda8 R14: 0000000000000092 R15: ffff888011c7edc8
> FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00005555557b2708 CR3: 000000000b88e000 CR4: 00000000003506e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  instrument_atomic_read include/linux/instrumented.h:71 [inline]
>  test_bit include/asm-generic/bitops/instrumented-non-atomic.h:134 [inline]
>  hlock_class kernel/locking/lockdep.c:199 [inline]
>  mark_lock+0xef/0x17b0 kernel/locking/lockdep.c:4583
>  mark_usage kernel/locking/lockdep.c:4526 [inline]
>  __lock_acquire+0x8a7/0x54a0 kernel/locking/lockdep.c:4981
>  lock_acquire kernel/locking/lockdep.c:5637 [inline]
>  lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602
>  __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
>  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
>  spin_lock include/linux/spinlock.h:349 [inline]
>  __get_locked_pte+0x2b6/0x4d0 mm/memory.c:1722
>  get_locked_pte include/linux/mm.h:2160 [inline]
>  __text_poke+0x1ae/0x8c0 arch/x86/kernel/alternative.c:1000
>  text_poke_bp_batch+0x3d7/0x560 arch/x86/kernel/alternative.c:1361
>  text_poke_flush arch/x86/kernel/alternative.c:1451 [inline]
>  text_poke_flush arch/x86/kernel/alternative.c:1448 [inline]
>  text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1458
>  arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146
>  jump_label_update+0x1d5/0x430 kernel/jump_label.c:830
>  static_key_enable_cpuslocked+0x1b1/0x260 kernel/jump_label.c:177
>  static_key_enable+0x16/0x20 kernel/jump_label.c:190
>  toggle_allocation_gate mm/kfence/core.c:732 [inline]
>  toggle_allocation_gate+0x100/0x390 mm/kfence/core.c:724
>  process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
>  kthread+0x405/0x4f0 kernel/kthread.c:327
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>  </TASK>
> sd 0:0:1:0: tag#6693 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
> sd 0:0:1:0: tag#6693 CDB: opcode=0xe5 (vendor)
> sd 0:0:1:0: tag#6693 CDB[00]: e5 f4 32 73 2f 4e 09 6d 26 e2 c7 35 d1 35 12 1c
> sd 0:0:1:0: tag#6693 CDB[10]: 92 1b da 40 b8 58 5b a8 d4 7d 34 f3 90 4c f1 2d
> sd 0:0:1:0: tag#6693 CDB[20]: ba
> ----------------
> Code disassembly (best guess):
>    0:	44 07                	rex.R (bad)
>    2:	48 85 db             	test   %rbx,%rbx
>    5:	0f 85 d0 9f 45 07    	jne    0x7459fdb
>    b:	48 83 c4 60          	add    $0x60,%rsp
>    f:	5b                   	pop    %rbx
>   10:	5d                   	pop    %rbp
>   11:	41 5c                	pop    %r12
>   13:	41 5d                	pop    %r13
>   15:	c3                   	retq
>   16:	c3                   	retq
>   17:	e9 d0 a0 45 07       	jmpq   0x745a0ec
>   1c:	cc                   	int3
>   1d:	cc                   	int3
>   1e:	cc                   	int3
>   1f:	cc                   	int3
>   20:	cc                   	int3
>   21:	cc                   	int3
>   22:	cc                   	int3
>   23:	cc                   	int3
>   24:	cc                   	int3
>   25:	cc                   	int3
>   26:	48 8b 0c 24          	mov    (%rsp),%rcx
> * 2a:	89 f6                	mov    %esi,%esi <-- trapping instruction
>   2c:	31 d2                	xor    %edx,%edx
>   2e:	e9 f3 f9 ff ff       	jmpq   0xfffffa26
>   33:	0f 1f 00             	nopl   (%rax)
>   36:	48 8b 0c 24          	mov    (%rsp),%rcx
>   3a:	89 f6                	mov    %esi,%esi
>   3c:	ba                   	.byte 0xba
>   3d:	01 00                	add    %eax,(%rax)
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
       [not found] ` <20211219023540.1638-1-hdanton@sina.com>
@ 2021-12-19  4:20   ` Matthew Wilcox
  2021-12-20 21:24     ` Theodore Ts'o
       [not found]     ` <20211221090804.1810-1-hdanton@sina.com>
  0 siblings, 2 replies; 11+ messages in thread
From: Matthew Wilcox @ 2021-12-19  4:20 UTC (permalink / raw)
  To: Hillf Danton; +Cc: syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs

On Sun, Dec 19, 2021 at 10:35:40AM +0800, Hillf Danton wrote:
> On Sat, 18 Dec 2021 21:22:57 +0000 Matthew Wilcox wrote:
> >On Sat, Dec 18, 2021 at 11:50:20AM -0800, syzbot wrote:
> >> INFO: task jbd2/sda1-8:2937 blocked for more than 143 seconds.
> >>       Not tainted 5.16.0-rc5-syzkaller #0
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 
> Hey Willy
> >
> >sched_setattr(0x0, &(0x7f0000000080)={0x38, 0x1, 0x0, 0x0, 0x1}, 0x0)
> >
> >so you've set a SCHED_FIFO priority and then are surprised that some
> >tasks are getting starved?
> 
> Can you speficy a bit more on how fifo could block journald waiting for
> IO completion more than 120 seconds?

Sure!  You can see from the trace below that jbd2/sda1-8 is in D state,
so we know nobody's called unlock_buffer() yet, which would have woken
it.  That would happen in journal_end_buffer_io_sync(), which is
the b_end_io for the buffer.

Learning more detail than that would require knowing the I/O path
for this particular test system.  I suspect that the I/O was submitted
and has even completed, but there's a kernel thread waiting to run which
will call the ->b_end_io that hasn't been scheduled yet, because it's
at a lower priority than all the threads which are running at
SCHED_FIFO.

I'm disinclined to look at this report much further because syzbot is
stumbling around trying things which are definitely in the category of
"if you do this and things break, you get to keep both pieces".  You
can learn some interesting things by playing with the various RT
scheduling classes, but mostly what you can learn is that you need to
choose your priorities carefully to have a functioning system.

> Hillf
> >
> >> task:jbd2/sda1-8     state:D stack:27112 pid: 2937 ppid:     2 flags:0x00004000
> >> Call Trace:
> >>  <TASK>
> >>  context_switch kernel/sched/core.c:4972 [inline]
> >>  __schedule+0xa9a/0x4940 kernel/sched/core.c:6253
> >>  schedule+0xd2/0x260 kernel/sched/core.c:6326
> >>  io_schedule+0xee/0x170 kernel/sched/core.c:8371
> >>  bit_wait_io+0x12/0xd0 kernel/sched/wait_bit.c:209
> >>  __wait_on_bit+0x60/0x190 kernel/sched/wait_bit.c:49
> >>  out_of_line_wait_on_bit+0xd5/0x110 kernel/sched/wait_bit.c:64
> >>  wait_on_bit_io include/linux/wait_bit.h:101 [inline]
> >>  __wait_on_buffer+0x7a/0x90 fs/buffer.c:122
> >>  wait_on_buffer include/linux/buffer_head.h:356 [inline]
> >>  journal_wait_on_commit_record fs/jbd2/commit.c:175 [inline]
> >>  jbd2_journal_commit_transaction+0x4e3c/0x6be0 fs/jbd2/commit.c:931
> >>  kjournald2+0x1d0/0x930 fs/jbd2/journal.c:213
> >>  kthread+0x405/0x4f0 kernel/kthread.c:327
> >>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> >>  </TASK>
> >> 
> >> Showing all locks held in the system:
> >> 1 lock held by khungtaskd/27:
> >>  #0: ffffffff8bb812e0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6458
> >> 1 lock held by udevd/2974:
> >> 2 locks held by getty/3287:
> >>  #0: ffff88807ec56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:252
> >>  #1: ffffc90002b8e2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xcf0/0x1230 drivers/tty/n_tty.c:2113
> >> 2 locks held by kworker/1:0/3663:
> >>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
> >>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
> >>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
> >>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:635 [inline]
> >>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:662 [inline]
> >>  #0: ffff888010c76538 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x896/0x1690 kernel/workqueue.c:2269
> >>  #1: ffffc90001b27db0 ((work_completion)(&rew.rew_work)){+.+.}-{0:0}, at: process_one_work+0x8ca/0x1690 kernel/workqueue.c:2273
> >> 2 locks held by syz-executor.2/13278:
> >>  #0: ffffffff8bc4c6c8 (perf_sched_mutex){+.+.}-{3:3}, at: account_event kernel/events/core.c:11445 [inline]
> >>  #0: ffffffff8bc4c6c8 (perf_sched_mutex){+.+.}-{3:3}, at: perf_event_alloc.part.0+0x31f9/0x3b10 kernel/events/core.c:11678
> >>  #1: ffffffff8bb8a668 (rcu_state.exp_mutex){+.+.}-{3:3}, at: exp_funnel_lock kernel/rcu/tree_exp.h:290 [inline]
> >>  #1: ffffffff8bb8a668 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x4fa/0x620 kernel/rcu/tree_exp.h:836
> >> 
> >> =============================================
> >> 
> >> NMI backtrace for cpu 0
> >> CPU: 0 PID: 27 Comm: khungtaskd Not tainted 5.16.0-rc5-syzkaller #0
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> >> Call Trace:
> >>  <TASK>
> >>  __dump_stack lib/dump_stack.c:88 [inline]
> >>  dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> >>  nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
> >>  nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
> >>  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
> >>  check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
> >>  watchdog+0xc1d/0xf50 kernel/hung_task.c:295
> >>  kthread+0x405/0x4f0 kernel/kthread.c:327
> >>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> >>  </TASK>
> >> Sending NMI from CPU 0 to CPUs 1:
> >> NMI backtrace for cpu 1
> >> CPU: 1 PID: 10 Comm: kworker/u4:1 Not tainted 5.16.0-rc5-syzkaller #0
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> >> Workqueue: events_unbound toggle_allocation_gate
> >> RIP: 0010:__kasan_check_read+0x4/0x10 mm/kasan/shadow.c:31
> >> Code: 44 07 48 85 db 0f 85 d0 9f 45 07 48 83 c4 60 5b 5d 41 5c 41 5d c3 c3 e9 d0 a0 45 07 cc cc cc cc cc cc cc cc cc cc 48 8b 0c 24 <89> f6 31 d2 e9 f3 f9 ff ff 0f 1f 00 48 8b 0c 24 89 f6 ba 01 00 00
> >> RSP: 0018:ffffc90000f0f630 EFLAGS: 00000002
> >> RAX: 0000000000000002 RBX: 1ffff920001e1ece RCX: ffffffff815b774f
> >> RDX: 0000000000000092 RSI: 0000000000000008 RDI: ffffffff8ff77a10
> >> RBP: 0000000000000100 R08: 0000000000000000 R09: ffffffff8ff77a17
> >> R10: 0000000000000001 R11: 000000000000003f R12: 0000000000000008
> >> R13: ffff888011c7eda8 R14: 0000000000000092 R15: ffff888011c7edc8
> >> FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
> >> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: 00005555557b2708 CR3: 000000000b88e000 CR4: 00000000003506e0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> Call Trace:
> >>  <TASK>
> >>  instrument_atomic_read include/linux/instrumented.h:71 [inline]
> >>  test_bit include/asm-generic/bitops/instrumented-non-atomic.h:134 [inline]
> >>  hlock_class kernel/locking/lockdep.c:199 [inline]
> >>  mark_lock+0xef/0x17b0 kernel/locking/lockdep.c:4583
> >>  mark_usage kernel/locking/lockdep.c:4526 [inline]
> >>  __lock_acquire+0x8a7/0x54a0 kernel/locking/lockdep.c:4981
> >>  lock_acquire kernel/locking/lockdep.c:5637 [inline]
> >>  lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602
> >>  __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
> >>  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
> >>  spin_lock include/linux/spinlock.h:349 [inline]
> >>  __get_locked_pte+0x2b6/0x4d0 mm/memory.c:1722
> >>  get_locked_pte include/linux/mm.h:2160 [inline]
> >>  __text_poke+0x1ae/0x8c0 arch/x86/kernel/alternative.c:1000
> >>  text_poke_bp_batch+0x3d7/0x560 arch/x86/kernel/alternative.c:1361
> >>  text_poke_flush arch/x86/kernel/alternative.c:1451 [inline]
> >>  text_poke_flush arch/x86/kernel/alternative.c:1448 [inline]
> >>  text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1458
> >>  arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146
> >>  jump_label_update+0x1d5/0x430 kernel/jump_label.c:830
> >>  static_key_enable_cpuslocked+0x1b1/0x260 kernel/jump_label.c:177
> >>  static_key_enable+0x16/0x20 kernel/jump_label.c:190
> >>  toggle_allocation_gate mm/kfence/core.c:732 [inline]
> >>  toggle_allocation_gate+0x100/0x390 mm/kfence/core.c:724
> >>  process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
> >>  worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
> >>  kthread+0x405/0x4f0 kernel/kthread.c:327
> >>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> >>  </TASK>
> >> sd 0:0:1:0: tag#6693 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
> >> sd 0:0:1:0: tag#6693 CDB: opcode=0xe5 (vendor)
> >> sd 0:0:1:0: tag#6693 CDB[00]: e5 f4 32 73 2f 4e 09 6d 26 e2 c7 35 d1 35 12 1c
> >> sd 0:0:1:0: tag#6693 CDB[10]: 92 1b da 40 b8 58 5b a8 d4 7d 34 f3 90 4c f1 2d
> >> sd 0:0:1:0: tag#6693 CDB[20]: ba
> >> ----------------
> >> Code disassembly (best guess):
> >>    0:	44 07                	rex.R (bad)
> >>    2:	48 85 db             	test   %rbx,%rbx
> >>    5:	0f 85 d0 9f 45 07    	jne    0x7459fdb
> >>    b:	48 83 c4 60          	add    $0x60,%rsp
> >>    f:	5b                   	pop    %rbx
> >>   10:	5d                   	pop    %rbp
> >>   11:	41 5c                	pop    %r12
> >>   13:	41 5d                	pop    %r13
> >>   15:	c3                   	retq
> >>   16:	c3                   	retq
> >>   17:	e9 d0 a0 45 07       	jmpq   0x745a0ec
> >>   1c:	cc                   	int3
> >>   1d:	cc                   	int3
> >>   1e:	cc                   	int3
> >>   1f:	cc                   	int3
> >>   20:	cc                   	int3
> >>   21:	cc                   	int3
> >>   22:	cc                   	int3
> >>   23:	cc                   	int3
> >>   24:	cc                   	int3
> >>   25:	cc                   	int3
> >>   26:	48 8b 0c 24          	mov    (%rsp),%rcx
> >> * 2a:	89 f6                	mov    %esi,%esi <-- trapping instruction
> >>   2c:	31 d2                	xor    %edx,%edx
> >>   2e:	e9 f3 f9 ff ff       	jmpq   0xfffffa26
> >>   33:	0f 1f 00             	nopl   (%rax)
> >>   36:	48 8b 0c 24          	mov    (%rsp),%rcx
> >>   3a:	89 f6                	mov    %esi,%esi
> >>   3c:	ba                   	.byte 0xba
> >>   3d:	01 00                	add    %eax,(%rax)
> >> 
> >> 
> >> ---
> >> This report is generated by a bot. It may contain errors.
> >> See https://goo.gl/tpsmEJ for more information about syzbot.
> >> syzbot engineers can be reached at syzkaller@googlegroups.com.
> >> 
> >> syzbot will keep track of this issue. See:
> >> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> >> syzbot can test patches for this issue, for details see:
> >> https://goo.gl/tpsmEJ#testing-patches
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
  2021-12-19  4:20   ` Matthew Wilcox
@ 2021-12-20 21:24     ` Theodore Ts'o
       [not found]     ` <20211221090804.1810-1-hdanton@sina.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2021-12-20 21:24 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Hillf Danton, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs

On Sun, Dec 19, 2021 at 04:20:57AM +0000, Matthew Wilcox wrote:
> > Hey Willy
> > >
> > >sched_setattr(0x0, &(0x7f0000000080)={0x38, 0x1, 0x0, 0x0, 0x1}, 0x0)
> > >
> > >so you've set a SCHED_FIFO priority and then are surprised that some
> > >tasks are getting starved?
> > 
> > Can you speficy a bit more on how fifo could block journald waiting for
> > IO completion more than 120 seconds?
> 
> Sure!  You can see from the trace below that jbd2/sda1-8 is in D state,
> so we know nobody's called unlock_buffer() yet, which would have woken
> it.  That would happen in journal_end_buffer_io_sync(), which is
> the b_end_io for the buffer.
> 
> Learning more detail than that would require knowing the I/O path
> for this particular test system.  I suspect that the I/O was submitted
> and has even completed, but there's a kernel thread waiting to run which
> will call the ->b_end_io that hasn't been scheduled yet, because it's
> at a lower priority than all the threads which are running at
> SCHED_FIFO.
> 
> I'm disinclined to look at this report much further because syzbot is
> stumbling around trying things which are definitely in the category of
> "if you do this and things break, you get to keep both pieces".  You
> can learn some interesting things by playing with the various RT
> scheduling classes, but mostly what you can learn is that you need to
> choose your priorities carefully to have a functioning system.

In general, real-time threads (anything scheduled with SCHED_FIFO or
SCHED_RT) should never, *ever* try to do any kind of I/O.  After all,
I/O can block, and if a real-time thread blocks, so much for any kind
of real-time guarantee that you might have.

If you must use do I/O from soft real-time thread, one trick you *can*
do is to some number of CPU's which are reserved for real-time
threads, and a subset of threads which are reserved for non-real-time
threads, enforced using CPU pinning.  It's still not prefect, since
there are still priority inheritance issues, and while this protects
against a non-real-time thread holding some lock which is needed by a
real-time (SCHED_FIFO) thread, if there are two SCHED_FIFO running at
different priorities it's still possible to deadlock the entire
kernel.

Can it be done?  Sure; I was part of an effort to make it work for the
US Navy's DDG-1000 Zumwalt-class destroyer[1].  But it's tricky, and
it's why IBM got paid the big bucks. :-)  Certainly it's going to be
problematic for syzkaller if it's just going to be randomly trying to
set some threads to be real-time without doing any kind of formal
planning.

[1] https://dl.acm.org/doi/10.1147/sj.472.0207

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
       [not found]     ` <20211221090804.1810-1-hdanton@sina.com>
@ 2021-12-21 22:32       ` Theodore Ts'o
       [not found]       ` <20211222022527.1880-1-hdanton@sina.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Theodore Ts'o @ 2021-12-21 22:32 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Matthew Wilcox, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs

On Tue, Dec 21, 2021 at 05:08:04PM +0800, Hillf Danton wrote:
> 
> I am trying to find the cause of same jbd2 journal thread blocked for
> more than 120 seconds on a customer's system of linux-4.18 with RT
> turned on and 12 CPUs in total bootup.

So here's the tricky bit with trying to use ext4 (or any file system,
really; the details will be different but the fundamental issues will
remain the same).  When a thread either calls fsync(2) or tries to
initiate a flie system mutation by creating a jbd2 handle and there
isn't enough space, the process will wake up the jbd2 thread and block
until a new transaction has been started.

In the jbd2 thread, the first thing it does is wait for all currently
open handles to close, since it can only commit the current
transaction when all handles attached to the current transaction have
been closed.  If some non-real-time process happens to have an open
handle, but it can't make forward progress for some reason, then this
will prevent the commit from completing, and this in turn will cause
any other process which needs to make changes to the file system from
making forward progress, since they will be blocked by jbd2 commit
thread, which in turn is blocked waiting low-priority process to make
forward progress --- and if that process is blocked behind some high
priority process, then that's the classic definition of "priority
inversion".

> Without both access to it and
> clue of what RT apps running in the system, what I proposed is to
> launch one more FIFO task of priority MAX_RT_PRIO-1 in the system like
> 
> 	for (;;) {
> 		unsigned long i;
> 
> 		for (;;) /* spin for 150 seconds */
> 			i++;
> 		sleep a second;
> 	}
> 
> in bid to observe the changes in behavior of underlying hardware using
> the diff below.

I'm not sure what you hope to learn by doing something like that.
That will certainly perturb the system, but every 150 seconds, the
task is going to let other tasks/threads run --- but it will be
whatever is the next highest priority thread. 

What you want to do is to figure out which thread is still holding a
handle open, and why it can't run --- is it because there are
sufficient higher priority threads that are running that it can't get
a time slice to run, so it can complete its file system operation and
release its handle?  Is it blocked behind a memory allocation (perhaps
because it is in a memory-constrained cgroup)?  Is it blocked waiting
on some mutex perhaps because it's doing something crazy like
sendfile()?  Or some kind of I/O Uring system call?  Etc, Etc., Etc.

What would probably make sense is to use "ps -eLcl" before the system
hangs so you can see what processes and threads are running with which
real-time or non-real-time priorities.  Or if the system has hung,
uses the magic sysrq key to find out what threads are running on each
CPU, and grab a stack trace from all of the running processes so you
can figure out where some task might be blocked, and figure out which
task might be blocked inside a codepath where it would be holding a
handle.

If that level of access means you have to get a government security
clearance, or get permission from a finance company's vice president
for that kind of access --- get that clearance ahead of time, even if
it takes months and involves background investigations and polygraph
tests.  Because you *will* need that kind of access to debug these
sorts of real-time locking issues.  There is an extremely high (99.9%)
probability that the bug is in the system configuration or application
logic, so you will need full source code access to the workload to
understand what might have gone wrong.  It's almost never a kernel
bug, but rather a fundamental application design or system
configuration problem.

> Is it a well-designed system in general if it would take more than
> three seconds for the IO to complete with hardware glitch ruled out?

Well, it depends on your definition of "well-designed" and "the I/O",
doesn't it?  If you are using a cost-optimized cheap-sh*t flash device
from Shenzhen, it can minutes for I/O to complete.  Just try copying
DVD's worth of data using buffered writes to said slow USB device, and
run "sync" or "umount /mnt", and watch the process hang for a long,
long time.

Or if you are using a cloud environment, and you are using virtual
block device which is provisioned for a small number of IOPS, whose
fault is it?  The cloud system, for throttling I/O the IOPS that was
provisioned for the device?  The person who created the VM, for not
appropriately provisioning enough IOPS?  Or the application
programmer?  Or the kernel programmer?  (And if you've ever worked at
a Linux distro or a cloud provider, you can be sure that at least one
platinum customer will try to blame the kernel programmer.  :-)

Or if you are using a storage area network, and you have a real time
process which is logging to a file, and battle damage takes out part
of the storage area network, and now the real-time process (which
might be responsible for ship navigation or missle defense) hangs
because file write is hanging, is that a "well-designed system"?  This
is why you never, never, *NEVER* write to a file from a mission or
life-critical real-time thread.  Instead you log to a ring buffer
which is shared by non-real-time process, and that non-realtime
process will save the log information to a file.  And if the
non-real-time process can't keep up with writing the log, which is
worse?  Missing log information?  Or a laggy or deadlocked missile
defense system?

The bottom line is especially if you are trying to use real-time
threads, the entire system configuration, including choice of
hardware, as well as the overall system architecture, all needs to be
part of a holistic design.  You can't just be "the OS person" but
instead you need to be part of the overall system architecture team.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
       [not found]       ` <20211222022527.1880-1-hdanton@sina.com>
@ 2021-12-22  4:35         ` Theodore Ts'o
  2022-05-20 11:57           ` Dmitry Vyukov
  0 siblings, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2021-12-22  4:35 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Matthew Wilcox, syzbot, linux-fsdevel, linux-kernel, syzkaller-bugs

On Wed, Dec 22, 2021 at 10:25:27AM +0800, Hillf Danton wrote:
> > I'm not sure what you hope to learn by doing something like that.
> > That will certainly perturb the system, but every 150 seconds, the
> > task is going to let other tasks/threads run --- but it will be
> > whatever is the next highest priority thread. 
> 
> Without reproducer, I am trying to reproduce the issue using a FIFO CPU hog
> which is supposed to beat the watchdog to show me the victims like various
> kthreads, workqueue workers and user apps, despite I know zero about how the
> watchdog is configured except the report was down to watchdog bite.

It's really trivial to reproduce an issue that has the same symptom as
what has been reported to you.  Mount the file system using a
non-real-time (SCHED_OTHER) thread, such that the jbd2 and ext4 worker
threads are running SCHED_OTHER.  Then run some file system workload
(fsstress or fsmark) as SCHED_FIFO.  Then on an N CPU system, run N
processes as SCHED_FIFO at any priority (doesn't matter whether it's
MAX_PRI-1 or MIN_PRI; SCHED_FIFO will have priority over SCHED_OTHER
processes, so this will effectively starve the ext4 and jbd2 worker
threads from ever getting to run.  Once the ext4 journal fills up, any
SCHED_FIFO process which tries to write to the file system will hang.

The problem is that's *one* potential stupid configuration of the
real-time system.  It's not necessarily the *only* potentially stupid
way that you can get yourself into a system hang.  It appears the
syzkaller "repro" is another such "stupid way".  And the number of
ways you can screw up with a real-time system is practically
unbounded...

So getting back to syzkaller, Willy had the right approach, which is a
Syzcaller "repro" happens to use SCHED_FIFO or SCHED_RR, and the
symptom is a system hang, it's probably worth ignoring the report,
since it's going to be a waste of time to debug userspace bug.  If you
have anything that uses kernel threads, and SCHED_FIFO or SCHED_RR is
in play, it's probably a userspace bug.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
  2021-12-18 19:50 [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3) syzbot
  2021-12-18 21:22 ` Matthew Wilcox
       [not found] ` <20211219023540.1638-1-hdanton@sina.com>
@ 2021-12-23  5:32 ` syzbot
  2 siblings, 0 replies; 11+ messages in thread
From: syzbot @ 2021-12-23  5:32 UTC (permalink / raw)
  To: hdanton, jack, linux-ext4, linux-fsdevel, linux-kernel,
	syzkaller-bugs, tytso, viro, willy

syzbot has found a reproducer for the following issue on:

HEAD commit:    3f066e882bf1 Merge tag 'for-5.16/parisc-7' of git://git.ke..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13aeaedbb00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6104739ac5f067ea
dashboard link: https://syzkaller.appspot.com/bug?extid=9c3fb12e9128b6e1d7eb
compiler:       Debian clang version 11.0.1-2, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=130e8ea5b00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15b3620db00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+9c3fb12e9128b6e1d7eb@syzkaller.appspotmail.com

INFO: task jbd2/sda1-8:2935 blocked for more than 143 seconds.
      Not tainted 5.16.0-rc6-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:jbd2/sda1-8     state:D stack:24688 pid: 2935 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:4972 [inline]
 __schedule+0xb72/0x1460 kernel/sched/core.c:6253
 schedule+0x12b/0x1f0 kernel/sched/core.c:6326
 jbd2_journal_commit_transaction+0xc24/0x5c00 fs/jbd2/commit.c:496
 kjournald2+0x4b4/0x940 fs/jbd2/journal.c:213
 kthread+0x468/0x490 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30
 </TASK>

Showing all locks held in the system:
1 lock held by khungtaskd/27:
 #0: ffffffff8cb1de00 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x30
2 locks held by getty/3281:
 #0: ffff88814b4a6098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x21/0x70 drivers/tty/tty_ldisc.c:252
 #1: ffffc90002b962e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6c5/0x1c60 drivers/tty/n_tty.c:2113
2 locks held by syz-executor272/3690:
 #0: ffff88806d05dda8 (&type->i_mutex_dir_key#4){++++}-{3:3}, at: iterate_dir+0x124/0x640 fs/readdir.c:55
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: file_accessed include/linux/fs.h:2505 [inline]
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: iterate_dir+0x552/0x640 fs/readdir.c:70
2 locks held by syz-executor272/3689:
 #0: ffff88806d0349b8 (&type->i_mutex_dir_key#4){++++}-{3:3}, at: iterate_dir+0x124/0x640 fs/readdir.c:55
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: file_accessed include/linux/fs.h:2505 [inline]
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: iterate_dir+0x552/0x640 fs/readdir.c:70
3 locks held by syz-executor272/3691:
 #0: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: mnt_want_write+0x3b/0x80 fs/namespace.c:376
 #1: ffff888074dcf198 (&type->i_mutex_dir_key#4/1){+.+.}-{3:3}, at: inode_lock_nested include/linux/fs.h:818 [inline]
 #1: ffff888074dcf198 (&type->i_mutex_dir_key#4/1){+.+.}-{3:3}, at: filename_create+0x1da/0x4e0 fs/namei.c:3654
 #2: ffff88814b6d4990 (jbd2_handle){++++}-{0:0}, at: start_this_handle+0x136d/0x1630 fs/jbd2/transaction.c:466
2 locks held by syz-executor272/3693:
 #0: ffff88806d0317e0 (&type->i_mutex_dir_key#4){++++}-{3:3}, at: iterate_dir+0x124/0x640 fs/readdir.c:55
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: file_accessed include/linux/fs.h:2505 [inline]
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: iterate_dir+0x552/0x640 fs/readdir.c:70
2 locks held by syz-executor272/3694:
 #0: ffff88806d0617e0 (&type->i_mutex_dir_key#4){++++}-{3:3}, at: iterate_dir+0x124/0x640 fs/readdir.c:55
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: file_accessed include/linux/fs.h:2505 [inline]
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: iterate_dir+0x552/0x640 fs/readdir.c:70
2 locks held by syz-executor272/3695:
 #0: ffff88806d035da8 (&type->i_mutex_dir_key#4){++++}-{3:3}, at: iterate_dir+0x124/0x640 fs/readdir.c:55
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: file_accessed include/linux/fs.h:2505 [inline]
 #1: ffff88814b6d0460 (sb_writers#5){.+.+}-{0:0}, at: iterate_dir+0x552/0x640 fs/readdir.c:70
2 locks held by kworker/u4:1/3719:
4 locks held by kworker/u4:4/3847:
 #0: ffff888013bf3938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_one_work+0x7ca/0x1140
 #1: ffffc90003167d20 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_one_work+0x808/0x1140 kernel/workqueue.c:2273
 #2: ffff88814b6d00e0 (&type->s_umount_key#31){++++}-{3:3}, at: trylock_super+0x1b/0xf0 fs/super.c:418
 #3: ffff88814b6d2bd8 (&sbi->s_writepages_rwsem){.+.+}-{0:0}, at: ext4_writepages+0x1dd/0x4080 fs/ext4/inode.c:2655

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 27 Comm: khungtaskd Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1dc/0x2d8 lib/dump_stack.c:106
 nmi_cpu_backtrace+0x45f/0x490 lib/nmi_backtrace.c:111
 nmi_trigger_cpumask_backtrace+0x16a/0x280 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xc82/0xcd0 kernel/hung_task.c:295
 kthread+0x468/0x490 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30
 </TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 3719 Comm: kworker/u4:1 Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: bat_events batadv_nc_worker
RIP: 0010:mark_lock+0x6/0x1e00 kernel/locking/lockdep.c:4566
Code: 07 80 c1 03 38 c1 0f 8c 6f ff ff ff 4c 89 ff e8 80 fb 69 00 e9 62 ff ff ff e8 26 e7 ac 08 66 0f 1f 44 00 00 55 48 89 e5 41 57 <41> 56 41 55 41 54 53 48 83 e4 e0 48 81 ec c0 01 00 00 65 48 8b 04
RSP: 0018:ffffc900029bf8e8 EFLAGS: 00000046
RAX: 0000000000000002 RBX: ffff8880217761d0 RCX: 00000000ffffffff
RDX: 0000000000000008 RSI: ffff8880217761b0 RDI: ffff888021775700
RBP: ffffc900029bf8f0 R08: dffffc0000000000 R09: fffffbfff1ff3ff8
R10: fffffbfff1ff3ff8 R11: 0000000000000000 R12: 000000000000000a
R13: ffff8880217761b0 R14: ffff888021775700 R15: ffff8880217761d0
FS:  0000000000000000(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005621d6c67680 CR3: 000000000c88e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 mark_usage kernel/locking/lockdep.c:4526 [inline]
 __lock_acquire+0xd38/0x2b00 kernel/locking/lockdep.c:4981
 lock_acquire+0x19f/0x4d0 kernel/locking/lockdep.c:5637
 rcu_lock_acquire+0x2a/0x30 include/linux/rcupdate.h:268
 rcu_read_lock include/linux/rcupdate.h:688 [inline]
 batadv_nc_purge_orig_hash net/batman-adv/network-coding.c:412 [inline]
 batadv_nc_worker+0xc8/0x5b0 net/batman-adv/network-coding.c:723
 process_one_work+0x853/0x1140 kernel/workqueue.c:2298
 worker_thread+0xac1/0x1320 kernel/workqueue.c:2445
 kthread+0x468/0x490 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30
 </TASK>
----------------
Code disassembly (best guess), 1 bytes skipped:
   0:	80 c1 03             	add    $0x3,%cl
   3:	38 c1                	cmp    %al,%cl
   5:	0f 8c 6f ff ff ff    	jl     0xffffff7a
   b:	4c 89 ff             	mov    %r15,%rdi
   e:	e8 80 fb 69 00       	callq  0x69fb93
  13:	e9 62 ff ff ff       	jmpq   0xffffff7a
  18:	e8 26 e7 ac 08       	callq  0x8ace743
  1d:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
  23:	55                   	push   %rbp
  24:	48 89 e5             	mov    %rsp,%rbp
  27:	41 57                	push   %r15
* 29:	41 56                	push   %r14 <-- trapping instruction
  2b:	41 55                	push   %r13
  2d:	41 54                	push   %r12
  2f:	53                   	push   %rbx
  30:	48 83 e4 e0          	and    $0xffffffffffffffe0,%rsp
  34:	48 81 ec c0 01 00 00 	sub    $0x1c0,%rsp
  3b:	65                   	gs
  3c:	48                   	rex.W
  3d:	8b                   	.byte 0x8b
  3e:	04                   	.byte 0x4


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
  2021-12-22  4:35         ` Theodore Ts'o
@ 2022-05-20 11:57           ` Dmitry Vyukov
  2022-05-20 21:45             ` Theodore Ts'o
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Vyukov @ 2022-05-20 11:57 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Hillf Danton, Matthew Wilcox, syzbot, linux-fsdevel,
	linux-kernel, syzkaller-bugs, syzkaller

On Wed, 22 Dec 2021 at 05:35, Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Dec 22, 2021 at 10:25:27AM +0800, Hillf Danton wrote:
> > > I'm not sure what you hope to learn by doing something like that.
> > > That will certainly perturb the system, but every 150 seconds, the
> > > task is going to let other tasks/threads run --- but it will be
> > > whatever is the next highest priority thread.
> >
> > Without reproducer, I am trying to reproduce the issue using a FIFO CPU hog
> > which is supposed to beat the watchdog to show me the victims like various
> > kthreads, workqueue workers and user apps, despite I know zero about how the
> > watchdog is configured except the report was down to watchdog bite.
>
> It's really trivial to reproduce an issue that has the same symptom as
> what has been reported to you.  Mount the file system using a
> non-real-time (SCHED_OTHER) thread, such that the jbd2 and ext4 worker
> threads are running SCHED_OTHER.  Then run some file system workload
> (fsstress or fsmark) as SCHED_FIFO.  Then on an N CPU system, run N
> processes as SCHED_FIFO at any priority (doesn't matter whether it's
> MAX_PRI-1 or MIN_PRI; SCHED_FIFO will have priority over SCHED_OTHER
> processes, so this will effectively starve the ext4 and jbd2 worker
> threads from ever getting to run.  Once the ext4 journal fills up, any
> SCHED_FIFO process which tries to write to the file system will hang.
>
> The problem is that's *one* potential stupid configuration of the
> real-time system.  It's not necessarily the *only* potentially stupid
> way that you can get yourself into a system hang.  It appears the
> syzkaller "repro" is another such "stupid way".  And the number of
> ways you can screw up with a real-time system is practically
> unbounded...
>
> So getting back to syzkaller, Willy had the right approach, which is a
> Syzcaller "repro" happens to use SCHED_FIFO or SCHED_RR, and the
> symptom is a system hang, it's probably worth ignoring the report,
> since it's going to be a waste of time to debug userspace bug.  If you
> have anything that uses kernel threads, and SCHED_FIFO or SCHED_RR is
> in play, it's probably a userspace bug.
>
> Cheers,

Hi Ted,

Reviving this old thread re syzkaller using SCHED_FIFO.

It's a bit hard to restrict what the fuzzer can do if we give it
sched_setattr() and friends syscalls. We could remove them from the
fuzzer entirely, but it's probably suboptimal as well.

I see that setting up SCHED_FIFO is guarded by CAP_SYS_NICE:
https://elixir.bootlin.com/linux/v5.18-rc7/source/kernel/sched/core.c#L7264

And I see we drop CAP_SYS_NICE from the fuzzer process since 2019
(after a similar discussion):
https://github.com/google/syzkaller/commit/f3ad68446455a

The latest C reproducer contains:

static void drop_caps(void)
{
  struct __user_cap_header_struct cap_hdr = {};
  struct __user_cap_data_struct cap_data[2] = {};
  cap_hdr.version = _LINUX_CAPABILITY_VERSION_3;
  cap_hdr.pid = getpid();
  if (syscall(SYS_capget, &cap_hdr, &cap_data))
    exit(1);
  const int drop = (1 << CAP_SYS_PTRACE) | (1 << CAP_SYS_NICE);
  cap_data[0].effective &= ~drop;
  cap_data[0].permitted &= ~drop;
  cap_data[0].inheritable &= ~drop;
  if (syscall(SYS_capset, &cap_hdr, &cap_data))
    exit(1);
}

Are we holding it wrong? How can the process manage to set any bad
scheduling policies if it dropped CAP_SYS_NICE?...
The process still has CAP_SYS_ADMIN, but I assume it should not allow
it using something that requires dropped CAP_SYS_NICE.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
  2022-05-20 11:57           ` Dmitry Vyukov
@ 2022-05-20 21:45             ` Theodore Ts'o
  2022-05-23 11:34               ` Dmitry Vyukov
  0 siblings, 1 reply; 11+ messages in thread
From: Theodore Ts'o @ 2022-05-20 21:45 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Hillf Danton, Matthew Wilcox, syzbot, linux-fsdevel,
	linux-kernel, syzkaller-bugs, syzkaller

On Fri, May 20, 2022 at 01:57:07PM +0200, Dmitry Vyukov wrote:
> 
> Hi Ted,
> 
> Reviving this old thread re syzkaller using SCHED_FIFO.
> 
> It's a bit hard to restrict what the fuzzer can do if we give it
> sched_setattr() and friends syscalls. We could remove them from the
> fuzzer entirely, but it's probably suboptimal as well.
> 
> I see that setting up SCHED_FIFO is guarded by CAP_SYS_NICE:
> https://elixir.bootlin.com/linux/v5.18-rc7/source/kernel/sched/core.c#L7264
> 
> And I see we drop CAP_SYS_NICE from the fuzzer process since 2019
> (after a similar discussion):
> https://github.com/google/syzkaller/commit/f3ad68446455a
>
> The latest C reproducer contains: ....

For this particular report, there *was* no C reproducer.  There was
only a syz reproducer:

> syzbot found the following issue on:
> 
> HEAD commit:    5472f14a3742 Merge tag 'for_linus' of git://git.kernel.org..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=11132113b00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e3bdfd29b408d1b6
> dashboard link: https://syzkaller.appspot.com/bug?extid=9c3fb12e9128b6e1d7eb
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14559113b00000

So let me ask a counter question.  I thought syzbot tries to create a
minimal reproducer?  So if the sched_setattr is a no-op, and is
returning EPERM, why wasn't the sched_setattr line removed from the
syz repro?

As a side note, since in many cases running a reproducer can be
painful, would it be possible for the syzkiller dashboard to provide
the output of running "strace -f" while the reproducer is running?

That would also especially help since even when there is a C
reproducer, trying to understand what it is doing from reading the
syzbot-generated C source code is often non-trivial, and strace does a
much better job decoding what the !@#?@ the reproducer.  Another
advantage of using strace is that it will also show us the return code
from the system call, which would very quickly confirm whether the
sched_setattr() was actually returning EPERM or not --- and it will
decode the system call arguments in a way that I often wished would be
included as comments in the syzbot-generated reproducer.

Providing the strace output could significantly reduce the amount of
upstream developer toil, and might therefore improve upstream
developer engagement with syzkaller.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
  2022-05-20 21:45             ` Theodore Ts'o
@ 2022-05-23 11:34               ` Dmitry Vyukov
  2022-05-24 10:59                 ` Jan Kara
  0 siblings, 1 reply; 11+ messages in thread
From: Dmitry Vyukov @ 2022-05-23 11:34 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Hillf Danton, Matthew Wilcox, syzbot, linux-fsdevel,
	linux-kernel, syzkaller-bugs, syzkaller, Aleksandr Nogikh

On Fri, 20 May 2022 at 23:45, Theodore Ts'o <tytso@mit.edu> wrote:
> > Hi Ted,
> >
> > Reviving this old thread re syzkaller using SCHED_FIFO.
> >
> > It's a bit hard to restrict what the fuzzer can do if we give it
> > sched_setattr() and friends syscalls. We could remove them from the
> > fuzzer entirely, but it's probably suboptimal as well.
> >
> > I see that setting up SCHED_FIFO is guarded by CAP_SYS_NICE:
> > https://elixir.bootlin.com/linux/v5.18-rc7/source/kernel/sched/core.c#L7264
> >
> > And I see we drop CAP_SYS_NICE from the fuzzer process since 2019
> > (after a similar discussion):
> > https://github.com/google/syzkaller/commit/f3ad68446455a
> >
> > The latest C reproducer contains: ....
>
> For this particular report, there *was* no C reproducer.  There was
> only a syz reproducer:

Yes, but the same logic should have been used. I just showed the
excerpt from the C reproducer b/c maybe there is something wrong with
that CAP_SYS_NICE logic.

> > syzbot found the following issue on:
> >
> > HEAD commit:    5472f14a3742 Merge tag 'for_linus' of git://git.kernel.org..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=11132113b00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=e3bdfd29b408d1b6
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9c3fb12e9128b6e1d7eb
> > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14559113b00000
>
> So let me ask a counter question.  I thought syzbot tries to create a
> minimal reproducer?  So if the sched_setattr is a no-op, and is
> returning EPERM, why wasn't the sched_setattr line removed from the
> syz repro?

Two most common reasons would be (1) flaky reproduction, then each
minimization step can falsely decide that something is needed to
reproduce the bug b/c when we remove it, the crash does not reproduce;
(2) previously order/number of syscalls affected what syscalls are
executed concurrently to provoke races, so removing one syscall could
affect how subsequent syscalls are collided (this is not the case
anymore, but was the case when this bug was reported).


> As a side note, since in many cases running a reproducer can be
> painful, would it be possible for the syzkiller dashboard to provide
> the output of running "strace -f" while the reproducer is running?
>
> That would also especially help since even when there is a C
> reproducer, trying to understand what it is doing from reading the
> syzbot-generated C source code is often non-trivial, and strace does a
> much better job decoding what the !@#?@ the reproducer.  Another
> advantage of using strace is that it will also show us the return code
> from the system call, which would very quickly confirm whether the
> sched_setattr() was actually returning EPERM or not --- and it will
> decode the system call arguments in a way that I often wished would be
> included as comments in the syzbot-generated reproducer.
>
> Providing the strace output could significantly reduce the amount of
> upstream developer toil, and might therefore improve upstream
> developer engagement with syzkaller.

+Aleksandr added this feature recently.

Console output will contain strace output for reproducers (when the
run under strace reproduced the same kernel crash as w/o strace).

Here is a recently reported bug:
https://syzkaller.appspot.com/bug?id=53c9bd2ca0e16936e45ff1333a22b838d91da0a2

"log" link for the reproducer crash shows:
https://syzkaller.appspot.com/text?tag=CrashLog&x=14f791aef00000
...

[   26.757008][ T3179] 8021q: adding VLAN 0 to HW filter on device bond0
[   26.766878][ T3179] eql: remember to turn off Van-Jacobson
compression on your slave devices
Starting sshd: OK
Warning: Permanently added '10.128.0.110' (ECDSA) to the list of known hosts.
execve("./syz-executor1865045535", ["./syz-executor1865045535"],
0x7ffdc91edf40 /* 10 vars */) = 0
brk(NULL)                               = 0x555557248000
brk(0x555557248c40)                     = 0x555557248c40
arch_prctl(ARCH_SET_FS, 0x555557248300) = 0
uname({sysname="Linux", nodename="syzkaller", ...}) = 0
readlink("/proc/self/exe", "/root/syz-executor1865045535", 4096) = 28
brk(0x555557269c40)                     = 0x555557269c40
brk(0x55555726a000)                     = 0x55555726a000
mprotect(0x7f37f8ecc000, 16384, PROT_READ) = 0
mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS,
-1, 0) = 0x1ffff000
mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS,
-1, 0) = 0x21000000
socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER) = 3
syzkaller login: [   58.834018][ T3600] ------------[ cut here ]------------
[   58.839772][ T3600] WARNING: CPU: 0 PID: 3600 at
net/netfilter/nfnetlink.c:703 nfnetlink_unbind+0x357/0x3b0
[   58.849856][ T3600] Modules linked in:
...


The same is available in the report emails, e.g.:
https://lore.kernel.org/all/000000000000ff239c05df391402@google.com/
...
console+strace: https://syzkaller.appspot.com/x/log.txt?x=14f791aef00000
...

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3)
  2022-05-23 11:34               ` Dmitry Vyukov
@ 2022-05-24 10:59                 ` Jan Kara
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kara @ 2022-05-24 10:59 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Theodore Ts'o, Hillf Danton, Matthew Wilcox, syzbot,
	linux-fsdevel, linux-kernel, syzkaller-bugs, syzkaller,
	Aleksandr Nogikh

On Mon 23-05-22 13:34:23, Dmitry Vyukov wrote:
> +Aleksandr added this feature recently.
> 
> Console output will contain strace output for reproducers (when the
> run under strace reproduced the same kernel crash as w/o strace).
> 
> Here is a recently reported bug:
> https://syzkaller.appspot.com/bug?id=53c9bd2ca0e16936e45ff1333a22b838d91da0a2
> 
> "log" link for the reproducer crash shows:
> https://syzkaller.appspot.com/text?tag=CrashLog&x=14f791aef00000
> ...
> 
> [   26.757008][ T3179] 8021q: adding VLAN 0 to HW filter on device bond0
> [   26.766878][ T3179] eql: remember to turn off Van-Jacobson
> compression on your slave devices
> Starting sshd: OK
> Warning: Permanently added '10.128.0.110' (ECDSA) to the list of known hosts.
> execve("./syz-executor1865045535", ["./syz-executor1865045535"],
> 0x7ffdc91edf40 /* 10 vars */) = 0
> brk(NULL)                               = 0x555557248000
> brk(0x555557248c40)                     = 0x555557248c40
> arch_prctl(ARCH_SET_FS, 0x555557248300) = 0
> uname({sysname="Linux", nodename="syzkaller", ...}) = 0
> readlink("/proc/self/exe", "/root/syz-executor1865045535", 4096) = 28
> brk(0x555557269c40)                     = 0x555557269c40
> brk(0x55555726a000)                     = 0x55555726a000
> mprotect(0x7f37f8ecc000, 16384, PROT_READ) = 0
> mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS,
> -1, 0) = 0x1ffff000
> mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
> mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS,
> -1, 0) = 0x21000000
> socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER) = 3
> syzkaller login: [   58.834018][ T3600] ------------[ cut here ]------------
> [   58.839772][ T3600] WARNING: CPU: 0 PID: 3600 at
> net/netfilter/nfnetlink.c:703 nfnetlink_unbind+0x357/0x3b0
> [   58.849856][ T3600] Modules linked in:
> ...
> 
> 
> The same is available in the report emails, e.g.:
> https://lore.kernel.org/all/000000000000ff239c05df391402@google.com/
> ...
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=14f791aef00000
> ...

Wow, cool! Thanks for adding that! This was often one of the first steps I
did when looking into what the problem could be so it saves me some manual
work :)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-05-24 10:59 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-18 19:50 [syzbot] INFO: task hung in jbd2_journal_commit_transaction (3) syzbot
2021-12-18 21:22 ` Matthew Wilcox
     [not found] ` <20211219023540.1638-1-hdanton@sina.com>
2021-12-19  4:20   ` Matthew Wilcox
2021-12-20 21:24     ` Theodore Ts'o
     [not found]     ` <20211221090804.1810-1-hdanton@sina.com>
2021-12-21 22:32       ` Theodore Ts'o
     [not found]       ` <20211222022527.1880-1-hdanton@sina.com>
2021-12-22  4:35         ` Theodore Ts'o
2022-05-20 11:57           ` Dmitry Vyukov
2022-05-20 21:45             ` Theodore Ts'o
2022-05-23 11:34               ` Dmitry Vyukov
2022-05-24 10:59                 ` Jan Kara
2021-12-23  5:32 ` syzbot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.