linux-kernel.vger.kernel.org archive mirror
* INFO: rcu detected stall in sys_sendfile64 (2)
@ 2019-01-19 11:32 syzbot
  2019-01-19 11:41 ` Dmitry Vyukov
  2019-03-12  3:59 ` syzbot
  0 siblings, 2 replies; 17+ messages in thread
From: syzbot @ 2019-01-19 11:32 UTC (permalink / raw)
  To: akpm, amir73il, darrick.wong, david, hannes, hughd, jrdr.linux,
	linux-kernel, linux-mm, syzkaller-bugs, willy

Hello,

syzbot found the following crash on:

HEAD commit:    2339e91d0e66 Merge tag 'media/v5.0-1' of git://git.kernel...
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=175f2638c00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1505c80c74256c6118a5@syzkaller.appspotmail.com

audit: type=1400 audit(1547895693.874:36): avc:  denied  { map } for   
pid=8427 comm="syz-executor786" path="/root/syz-executor786610373"  
dev="sda1" ino=1426 scontext=unconfined_u:system_r:insmod_t:s0-s0:c0.c1023  
tcontext=unconfined_u:object_r:user_home_t:s0 tclass=file permissive=1
hrtimer: interrupt took 42996 ns
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	(detected by 0, t=10502 jiffies, g=6469, q=3)
rcu: All QSes seen, last rcu_preempt kthread activity 10502  
(4295051508-4295041006), jiffies_till_next_fqs=1, root ->qsmask 0x0
syz-executor786 R  running task    21544  8437   8433 0x00000000
Call Trace:
  <IRQ>
  sched_show_task kernel/sched/core.c:5293 [inline]
  sched_show_task.cold+0x273/0x2d5 kernel/sched/core.c:5268
  print_other_cpu_stall.cold+0x7f2/0x8bb kernel/rcu/tree.c:1301
  check_cpu_stall kernel/rcu/tree.c:1429 [inline]
  rcu_pending kernel/rcu/tree.c:3018 [inline]
  rcu_check_callbacks+0xf36/0x1380 kernel/rcu/tree.c:2521
  update_process_times+0x32/0x80 kernel/time/timer.c:1635
  tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:161
  tick_sched_timer+0x47/0x130 kernel/time/tick-sched.c:1271
  __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
  __hrtimer_run_queues+0x3a7/0x1050 kernel/time/hrtimer.c:1451
  hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1035 [inline]
  smp_apic_timer_interrupt+0x18d/0x760 arch/x86/kernel/apic/apic.c:1060
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:807
  </IRQ>
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:26 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline]
RIP: 0010:write_comp_data+0x9/0x70 kernel/kcov.c:122
Code: 12 00 00 8b 80 dc 12 00 00 48 8b 11 48 83 c2 01 48 39 d0 76 07 48 89  
34 d1 48 89 11 5d c3 0f 1f 00 65 4c 8b 04 25 40 ee 01 00 <65> 8b 05 80 ee  
7f 7e a9 00 01 1f 00 75 51 41 8b 80 d8 12 00 00 83
RSP: 0018:ffff888080466f58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 000000000000226d RBX: ffff888080467140 RCX: ffffffff87f08bdf
RDX: 0000000000000002 RSI: 0000000000000002 RDI: 0000000000000007
RBP: ffff888080466f60 R08: ffff888096338480 R09: ffffed1015cc5b90
R10: ffffed1015cc5b8f R11: ffff8880ae62dc7b R12: 1ffff1101008cdf0
R13: ffff888092b64102 R14: ffff888092b64102 R15: ffff888080467158
  xa_is_node include/linux/xarray.h:946 [inline]
  xas_start+0x1cf/0x720 lib/xarray.c:183
  xas_load+0x21/0x160 lib/xarray.c:227
  find_get_entry+0x350/0x10a0 mm/filemap.c:1476
  pagecache_get_page+0xe6/0x1020 mm/filemap.c:1579
  find_get_page include/linux/pagemap.h:272 [inline]
  generic_file_buffered_read mm/filemap.c:2076 [inline]
  generic_file_read_iter+0x7b2/0x2d40 mm/filemap.c:2350
  ext4_file_read_iter+0x180/0x3c0 fs/ext4/file.c:77
  call_read_iter include/linux/fs.h:1856 [inline]
  generic_file_splice_read+0x5c4/0xa90 fs/splice.c:308
  do_splice_to+0x12a/0x190 fs/splice.c:880
  splice_direct_to_actor+0x31b/0x9d0 fs/splice.c:957
  do_splice_direct+0x2c7/0x420 fs/splice.c:1066
  do_sendfile+0x61a/0xe60 fs/read_write.c:1436
  __do_sys_sendfile64 fs/read_write.c:1491 [inline]
  __se_sys_sendfile64 fs/read_write.c:1483 [inline]
  __x64_sys_sendfile64+0x15a/0x240 fs/read_write.c:1483
  do_syscall_64+0x1a3/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446a19
Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 4b 07 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f53e33cfda8 EFLAGS: 00000297 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 0000000000446a19
RDX: 0000000020000380 RSI: 0000000000000003 RDI: 0000000000000003
RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000
R10: 00008080fffffffe R11: 0000000000000297 R12: 00000000006dcc2c
R13: 00008080fffffffe R14: 00007f53e33d09c0 R15: 00000000006dcd2c
rcu: rcu_preempt kthread starved for 10502 jiffies! g6469 f0x2  
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: RCU grace-period kthread stack dump:
rcu_preempt     R  running task    26200    10      2 0x80000000
Call Trace:
  context_switch kernel/sched/core.c:2831 [inline]
  __schedule+0x897/0x1e60 kernel/sched/core.c:3472
  schedule+0xfe/0x350 kernel/sched/core.c:3516
  schedule_timeout+0x14a/0x250 kernel/time/timer.c:1803
  rcu_gp_fqs_loop+0x6ba/0x970 kernel/rcu/tree.c:1948
  rcu_gp_kthread+0x2bb/0xc10 kernel/rcu/tree.c:2105
  kthread+0x357/0x430 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
sched: RT throttling activated
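
The parenthesised pair in the "All QSes seen" line above is the jiffies
counter at report time minus its value at the last rcu_preempt kthread
activity, which gives the reported 10502. A minimal Python sketch of that
arithmetic follows. The function names are illustrative, and HZ=100 is an
assumption: the kernel config is only linked from this report, not quoted,
so the seconds figure holds only if that value is right.

```python
def stall_jiffies(now: int, last_activity: int) -> int:
    """Jiffies elapsed since the last grace-period kthread activity."""
    return now - last_activity


def jiffies_to_seconds(jiffies: int, hz: int) -> float:
    """Convert jiffies to seconds for a given CONFIG_HZ (caller-supplied)."""
    return jiffies / hz


# Values copied from the stall report: (4295051508-4295041006)
elapsed = stall_jiffies(4295051508, 4295041006)
print(elapsed)                              # 10502
# hz=100 is assumed, not taken from the report.
print(jiffies_to_seconds(elapsed, hz=100))  # 105.02
```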


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-01-19 11:32 INFO: rcu detected stall in sys_sendfile64 (2) syzbot
@ 2019-01-19 11:41 ` Dmitry Vyukov
  2019-01-19 13:00   ` Tetsuo Handa
  2019-03-12  3:59 ` syzbot
  1 sibling, 1 reply; 17+ messages in thread
From: Dmitry Vyukov @ 2019-01-19 11:41 UTC (permalink / raw)
  To: syzbot, Ingo Molnar, Peter Zijlstra
  Cc: Andrew Morton, Amir Goldstein, Darrick J. Wong, Dave Chinner,
	Johannes Weiner, Hugh Dickins, Souptick Joarder, LKML, Linux-MM,
	syzkaller-bugs, Matthew Wilcox

On Sat, Jan 19, 2019 at 12:32 PM syzbot
<syzbot+1505c80c74256c6118a5@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    2339e91d0e66 Merge tag 'media/v5.0-1' of git://git.kernel...
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=175f2638c00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
> dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c00000

Looking at the reproducer, this looks like a scheduler issue, since it
involves perf_event_open and sched_setattr. So +Peter and Mingo.
Is it the same root cause as the other stalls involving sched_setattr?



* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-01-19 11:41 ` Dmitry Vyukov
@ 2019-01-19 13:00   ` Tetsuo Handa
  0 siblings, 0 replies; 17+ messages in thread
From: Tetsuo Handa @ 2019-01-19 13:00 UTC (permalink / raw)
  To: Dmitry Vyukov, syzbot, Ingo Molnar, Peter Zijlstra
  Cc: Andrew Morton, Amir Goldstein, Darrick J. Wong, Dave Chinner,
	Johannes Weiner, Hugh Dickins, Souptick Joarder, LKML, Linux-MM,
	syzkaller-bugs, Matthew Wilcox

On 2019/01/19 20:41, Dmitry Vyukov wrote:
> On Sat, Jan 19, 2019 at 12:32 PM syzbot
> <syzbot+1505c80c74256c6118a5@syzkaller.appspotmail.com> wrote:
>>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit:    2339e91d0e66 Merge tag 'media/v5.0-1' of git://git.kernel...
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=175f2638c00000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
>> dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
>> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c00000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c00000
> 
> Looking at the reproducer it looks like something with scheduler as it
> involves perf_event_open and sched_setattr. So +Peter and Mingo.
> Is it the same root cause as the other stalls involving sched_setattr?

Yes. I think sched_setattr() is involved in this problem.

Reproducers from "BUG: workqueue lockup (4)" use it too: the one at
https://syzkaller.appspot.com/text?tag=ReproC&x=13ec31a5400000
involves sched_setattr(SCHED_DEADLINE), and the one at
https://syzkaller.appspot.com/text?tag=ReproC&x=104da690c00000
also involves sched_setattr().



* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-01-19 11:32 INFO: rcu detected stall in sys_sendfile64 (2) syzbot
  2019-01-19 11:41 ` Dmitry Vyukov
@ 2019-03-12  3:59 ` syzbot
  2019-03-12  4:08   ` Al Viro
  1 sibling, 1 reply; 17+ messages in thread
From: syzbot @ 2019-03-12  3:59 UTC (permalink / raw)
  To: airlied, akpm, amir73il, chris, darrick.wong, david, dri-devel,
	dvyukov, eparis, hannes, hughd, intel-gfx, jack, jani.nikula,
	joonas.lahtinen, jrdr.linux, linux-kernel, linux-mm, mingo,
	mszeredi, penguin-kernel, peterz, rodrigo.vivi, syzkaller-bugs,
	viro, willy

syzbot has bisected this bug to:

commit 34e07e42c55aeaa78e93b057a6664e2ecde3fadb
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Feb 8 10:54:48 2018 +0000

     drm/i915: Add missing kerneldoc for 'ent' in i915_driver_init_early

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13220283200000
start commit:   34e07e42 drm/i915: Add missing kerneldoc for 'ent' in i915..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=10a20283200000
console output: https://syzkaller.appspot.com/x/log.txt?x=17220283200000
kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
userspace arch: amd64
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c00000

Reported-by: syzbot+1505c80c74256c6118a5@syzkaller.appspotmail.com
Fixes: 34e07e42 ("drm/i915: Add missing kerneldoc for 'ent' in  
i915_driver_init_early")


* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-12  3:59 ` syzbot
@ 2019-03-12  4:08   ` Al Viro
  2019-03-12  8:00     ` Jani Nikula
                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Al Viro @ 2019-03-12  4:08 UTC (permalink / raw)
  To: syzbot
  Cc: airlied, akpm, amir73il, chris, darrick.wong, david, dri-devel,
	dvyukov, eparis, hannes, hughd, intel-gfx, jack, jani.nikula,
	joonas.lahtinen, jrdr.linux, linux-kernel, linux-mm, mingo,
	mszeredi, penguin-kernel, peterz, rodrigo.vivi, syzkaller-bugs,
	willy

On Mon, Mar 11, 2019 at 08:59:00PM -0700, syzbot wrote:
> syzbot has bisected this bug to:
> 
> commit 34e07e42c55aeaa78e93b057a6664e2ecde3fadb
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Feb 8 10:54:48 2018 +0000
> 
>     drm/i915: Add missing kerneldoc for 'ent' in i915_driver_init_early
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13220283200000
> start commit:   34e07e42 drm/i915: Add missing kerneldoc for 'ent' in i915..
> git tree:       upstream
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=10a20283200000
> console output: https://syzkaller.appspot.com/x/log.txt?x=17220283200000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
> dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
> userspace arch: amd64
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c00000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c00000
> 
> Reported-by: syzbot+1505c80c74256c6118a5@syzkaller.appspotmail.com
> Fixes: 34e07e42 ("drm/i915: Add missing kerneldoc for 'ent' in
> i915_driver_init_early")

Umm...  Might be a good idea to add some plausibility filters - it is,
in theory, possible that adding a line in a comment changes behaviour
(without compiler bugs, even - playing with __LINE__ is all it would
take), but the odds that it's _not_ a false positive are very low.


* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-12  4:08   ` Al Viro
@ 2019-03-12  8:00     ` Jani Nikula
  2019-03-12 14:29     ` Tetsuo Handa
  2019-03-12 17:10     ` Dmitry Vyukov
  2 siblings, 0 replies; 17+ messages in thread
From: Jani Nikula @ 2019-03-12  8:00 UTC (permalink / raw)
  To: Al Viro, syzbot
  Cc: airlied, akpm, amir73il, chris, darrick.wong, david, dri-devel,
	dvyukov, eparis, hannes, hughd, intel-gfx, jack, joonas.lahtinen,
	jrdr.linux, linux-kernel, linux-mm, mingo, mszeredi,
	penguin-kernel, peterz, rodrigo.vivi, syzkaller-bugs, willy

On Tue, 12 Mar 2019, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Mon, Mar 11, 2019 at 08:59:00PM -0700, syzbot wrote:
>> syzbot has bisected this bug to:
>> 
>> commit 34e07e42c55aeaa78e93b057a6664e2ecde3fadb
>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>> Date:   Thu Feb 8 10:54:48 2018 +0000
>> 
>>     drm/i915: Add missing kerneldoc for 'ent' in i915_driver_init_early
>> 
>> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13220283200000
>> start commit:   34e07e42 drm/i915: Add missing kerneldoc for 'ent' in i915..
>> git tree:       upstream
>> final crash:    https://syzkaller.appspot.com/x/report.txt?x=10a20283200000
>> console output: https://syzkaller.appspot.com/x/log.txt?x=17220283200000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
>> dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
>> userspace arch: amd64
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c00000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c00000
>> 
>> Reported-by: syzbot+1505c80c74256c6118a5@syzkaller.appspotmail.com
>> Fixes: 34e07e42 ("drm/i915: Add missing kerneldoc for 'ent' in
>> i915_driver_init_early")
>
> Umm...  Might be a good idea to add some plausibility filters - it is,
> in theory, possible that adding a line in a comment changes behaviour
> (without compiler bugs, even - playing with __LINE__ is all it would
> take), but the odds that it's _not_ a false positive are very low.

If it's not a false positive, it's bound to be good source material for
IOCCC.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center


* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-12  4:08   ` Al Viro
  2019-03-12  8:00     ` Jani Nikula
@ 2019-03-12 14:29     ` Tetsuo Handa
  2019-03-12 17:15       ` Dmitry Vyukov
  2019-03-12 17:10     ` Dmitry Vyukov
  2 siblings, 1 reply; 17+ messages in thread
From: Tetsuo Handa @ 2019-03-12 14:29 UTC (permalink / raw)
  To: syzbot, dvyukov, syzkaller-bugs; +Cc: Al Viro, linux-kernel

(Moving most recipients to bcc: in order to avoid flooding.)

On 2019/03/12 13:08, Al Viro wrote:
> Umm...  Might be a good idea to add some plausibility filters - it is,
> in theory, possible that adding a line in a comment changes behaviour
> (without compiler bugs, even - playing with __LINE__ is all it would
> take), but the odds that it's _not_ a false positive are very low.

Well, 108 of the 168 test runs done during this bisection failed to run
at all. With such a high failure ratio, it is possible that, by chance,
no crash happened during the few runs that did complete for a specific
commit, causing a wrong bisection result. I think that before concluding
"git bisect good" for a specific commit, the tests should be repeated
until 8 runs have completed successfully without a crash.
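
The rule proposed above can be sketched in a few lines of Python. This is
an illustration under stated assumptions, not syzbot's actual logic: the
Outcome names and the verdict() helper are hypothetical, and "infra"
stands for runs that never executed the test (for example, "Connection
timed out"). Only the threshold of 8 comes from the text above.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    OK = "ok"            # test ran to completion without a crash
    CRASHED = "crashed"  # kernel crashed while the test was running
    INFRA = "infra"      # test never ran (e.g. connection timed out)


def verdict(outcomes: Iterable[Outcome], required_ok: int = 8) -> str:
    """Return "bad", "good", or "retry" for one commit.

    A single crash marks the commit bad. The commit is only called good
    after `required_ok` runs completed cleanly; infra failures do not
    count either way, so too few clean runs means "retry".
    """
    clean_runs = 0
    for outcome in outcomes:
        if outcome is Outcome.CRASHED:
            return "bad"
        if outcome is Outcome.OK:
            clean_runs += 1
            if clean_runs >= required_ok:
                return "good"
    return "retry"


# Seven infra failures and one clean run is not enough evidence.
print(verdict([Outcome.INFRA] * 7 + [Outcome.OK]))  # retry
```

With this rule, a batch dominated by infrastructure failures yields
"retry" instead of silently counting as "good".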

Also, this bisection is finding multiple different crash patterns, which
suggests that the crashed tests are not giving correct feedback to syzbot.

$ grep -F 'run #' bisect.txt\?x\=13220283200000 | wc -l
168
$ grep -F 'Connection timed out' bisect.txt\?x\=13220283200000 | wc -l
108
$ grep -F 'crashed' bisect.txt\?x\=13220283200000
run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in sys_sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #4: crashed: INFO: rcu detected stall in sys_sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in sendfile64
run #0: crashed: INFO: rcu detected stall in ext4_file_write_iter
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in corrupted
run #3: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in do_iter_write
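
To see how the crash titles above are distributed, the "crashed" lines
can be tallied per title. The Python sketch below assumes only the
"run #N: crashed: TITLE" line shape visible in the grep output; the
function name and the five-line sample (an excerpt of the output above)
are illustrative.

```python
import re
from collections import Counter

# Matches lines such as "run #0: crashed: INFO: rcu detected stall in ..."
RUN_LINE = re.compile(r"^run #\d+: crashed: (?P<title>.+)$")


def tally_crash_titles(log_text: str) -> Counter:
    """Count how often each crash title appears in a bisection log."""
    titles = Counter()
    for line in log_text.splitlines():
        match = RUN_LINE.match(line.strip())
        if match:
            titles[match.group("title")] += 1
    return titles


sample = """\
run #0: crashed: WARNING: ODEBUG bug in netdev_freemem
run #1: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in sys_sendfile64
run #0: crashed: INFO: rcu detected stall in corrupted
run #0: crashed: INFO: rcu detected stall in do_iter_write
"""

for title, count in tally_crash_titles(sample).most_common():
    print(count, title)
```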


* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-12  4:08   ` Al Viro
  2019-03-12  8:00     ` Jani Nikula
  2019-03-12 14:29     ` Tetsuo Handa
@ 2019-03-12 17:10     ` Dmitry Vyukov
  2 siblings, 0 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-12 17:10 UTC (permalink / raw)
  To: Al Viro
  Cc: syzbot, David Airlie, Andrew Morton, Amir Goldstein,
	Chris Wilson, Darrick J. Wong, Dave Chinner, DRI, eparis,
	Johannes Weiner, Hugh Dickins, intel-gfx, Jan Kara, Jani Nikula,
	Joonas Lahtinen, Souptick Joarder, LKML, Linux-MM, Ingo Molnar,
	mszeredi, Tetsuo Handa, Peter Zijlstra, Rodrigo Vivi,
	syzkaller-bugs, Matthew Wilcox

On Tue, Mar 12, 2019 at 5:08 AM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Mar 11, 2019 at 08:59:00PM -0700, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit 34e07e42c55aeaa78e93b057a6664e2ecde3fadb
> > Author: Chris Wilson <chris@chris-wilson.co.uk>
> > Date:   Thu Feb 8 10:54:48 2018 +0000
> >
> >     drm/i915: Add missing kerneldoc for 'ent' in i915_driver_init_early
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=13220283200000
> > start commit:   34e07e42 drm/i915: Add missing kerneldoc for 'ent' in i915..
> > git tree:       upstream
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=10a20283200000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17220283200000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=abc3dc9b7a900258
> > dashboard link: https://syzkaller.appspot.com/bug?extid=1505c80c74256c6118a5
> > userspace arch: amd64
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12c4dc28c00000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=15df4108c00000
> >
> > Reported-by: syzbot+1505c80c74256c6118a5@syzkaller.appspotmail.com
> > Fixes: 34e07e42 ("drm/i915: Add missing kerneldoc for 'ent' in
> > i915_driver_init_early")
>
> Umm...  Might be a good idea to add some plausibility filters - it is,
> in theory, possible that adding a line in a comment changes behaviour
> (without compiler bugs, even - playing with __LINE__ is all it would
> take), but the odds that it's _not_ a false positive are very low.

Thanks for pointing this out.

I've started collecting all such cases, so that we are able to draw
broader conclusions later:
https://github.com/google/syzkaller/issues/1051

added for this one:
=========
A mix of problems: unrelated bug triggered by the same repro
("WARNING: ODEBUG bug in netdev_freemem"); lots of infrastructure
failures ("failed to copy test binary to VM"); also the original
failure seems to be flaky. All this contributed to pointing to a
random commit.
Al Viro points out that the commit only touches comments, so we could
mark the end result as suspicious.
=========

The infrastructure problems are definitely something we need to fix
("failed to copy test binary to VM"). Currently the machine hangs
periodically with lots of time consumed by dmcrypt, but I don't know
yet whether that is related.

Re the comment-only changes, I would like to see more cases where it
would help before we start creating new universes for this. We could
parse sources with clang to understand that a change was comment-only,
but I guess the kernel has mostly been broken with clang throughout history....
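
A much cruder plausibility check than a clang parse is to strip C
comments from the old and new versions of a file and compare what is
left. The Python sketch below does that with a regular expression; the
function names are hypothetical, it does not run the preprocessor, and
it is conservative about whitespace inside code. Because it ignores line
numbers, it would also report the __LINE__ corner case mentioned earlier
in the thread as comment-only.

```python
import re
from typing import List

# Comments are tried first so "//" and "/*" are never read as division.
# String and character literals are kept whole so that comment-like text
# inside them is not stripped.
_LEXEME = re.compile(
    r"(?P<comment>//[^\n]*|/\*.*?\*/)"
    r"|(?P<string>\"(?:\\.|[^\"\\])*\"|'(?:\\.|[^'\\])*')"
    r"|(?P<space>\s+)"
    r"|(?P<code>[^\s\"'/]+|.)",
    re.DOTALL,
)


def code_lexemes(source: str) -> List[str]:
    """Return the non-comment, non-whitespace pieces of C source text."""
    return [
        m.group(0)
        for m in _LEXEME.finditer(source)
        if m.lastgroup in ("string", "code")
    ]


def is_comment_only_change(before: str, after: str) -> bool:
    """True if both versions are identical once comments are removed."""
    return code_lexemes(before) == code_lexemes(after)


before = "int f(void)\n{\n\treturn 1; /* old */\n}\n"
after = "/**\n * f - return one\n */\nint f(void)\n{\n\treturn 1; // new\n}\n"
print(is_comment_only_change(before, after))  # True
```

A bisection result that lands on a commit for which every touched file
passes this check could then be flagged as suspicious rather than
reported as the culprit.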


* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-12 14:29     ` Tetsuo Handa
@ 2019-03-12 17:15       ` Dmitry Vyukov
  2019-03-12 21:11         ` Tetsuo Handa
  0 siblings, 1 reply; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-12 17:15 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: syzbot, syzkaller-bugs, Al Viro, LKML

On Tue, Mar 12, 2019 at 3:30 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> (Moving most recipients to bcc: in order to avoid flooding.)
>
> On 2019/03/12 13:08, Al Viro wrote:
> > Umm...  Might be a good idea to add some plausibility filters - it is,
> > in theory, possible that adding a line in a comment changes behaviour
> > (without compiler bugs, even - playing with __LINE__ is all it would
> > take), but the odds that it's _not_ a false positive are very low.
>
> Well, 108 out of 168 tests done during this bisection failed to test.
> With such high failure ratio, it is possible that by chance no crash
> happened during few tests for specific commit; causing a wrong bisection
> result. I expect that when trying to conclude "git bisect good" for
> specific commit, the tests should be repeated until no crash happened
> during 8 successful tests.

Added to https://github.com/google/syzkaller/issues/1051:

Tetsuo points out that if lots of tests (say, 7/8) fail with infra
problems, then we should retry/skip or something. Such failures zero out
the effect of having multiple independent tests.
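
A small worked example shows why this matters. Assume, purely for
illustration, that a flaky bug crashes a given test run with probability
p, independently of other runs. A commit that contains the bug is then
wrongly marked good with probability (1 - p)^n, where n is the number of
runs that actually completed. The value p = 0.3 and the function name
below are hypothetical.

```python
def miss_probability(p_crash: float, completed_runs: int) -> float:
    """Chance that no completed run crashes although the bug is present.

    Assumes each completed run crashes independently with p_crash.
    """
    if not 0.0 <= p_crash <= 1.0:
        raise ValueError("p_crash must be within [0, 1]")
    if completed_runs < 0:
        raise ValueError("completed_runs must be non-negative")
    return (1.0 - p_crash) ** completed_runs


# 8 completed runs versus 1 completed run (7 of 8 lost to infra failures).
for completed in (8, 1):
    print(completed, round(miss_probability(0.3, completed), 4))
```

Under these assumptions the miss probability is about 0.06 with eight
completed runs and 0.7 with one, so losing 7 of 8 runs to infrastructure
failures removes nearly all of the protection that repetition provides.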

Thanks.

> Also, this bisection is finding multiple different crash patterns, which
> suggests that the crashed tests are not giving correct feedback to syzbot.

Treating different crashes as just "crash" is intended. Kernel bugs
can manifest in very different ways.
For fun, search for "bpf: sockhash, disallow bpf_tcp_close and update
in parallel" in https://syzkaller.appspot.com/?fixed=upstream
It led to 50+ different failure modes.



* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-12 17:15       ` Dmitry Vyukov
@ 2019-03-12 21:11         ` Tetsuo Handa
  2019-03-13  6:43           ` Dmitry Vyukov
  0 siblings, 1 reply; 17+ messages in thread
From: Tetsuo Handa @ 2019-03-12 21:11 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: syzbot, syzkaller-bugs, Al Viro, LKML

On 2019/03/13 2:15, Dmitry Vyukov wrote:
>> Also, this bisection is finding multiple different crash patterns, which
>> suggests that the crashed tests are not giving correct feedback to syzbot.
> 
> Treating different crashes as just "crash" is intended. Kernel bugs
> can manifest in very different ways.
> Want fun, search for "bpf: sockhash, disallow bpf_tcp_close and update
> in parallel" in  https://syzkaller.appspot.com/?fixed=upstream
> It lead to 50+ different failure modes.
> 

But syzbot already found a rather simple C reproducer
( https://syzkaller.appspot.com/text?tag=ReproC&x=116fc7a8c00000 ) for this bug.
Was this reproducer used for bisection? I guess that if this reproducer was used,
syzbot did not hit "WARNING: ODEBUG bug in netdev_freemem" cases.

Also, humans can sometimes derive simpler C reproducers from syzbot-provided
reproducers. It would be nice if syzbot could accept and use a user-defined C
reproducer for testing.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-12 21:11         ` Tetsuo Handa
@ 2019-03-13  6:43           ` Dmitry Vyukov
  2019-03-13 16:37             ` Theodore Ts'o
  2019-03-13 23:40             ` Eric Biggers
  0 siblings, 2 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-13  6:43 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: syzbot, syzkaller-bugs, Al Viro, LKML

On Tue, Mar 12, 2019 at 10:11 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/13 2:15, Dmitry Vyukov wrote:
> >> Also, this bisection is finding multiple different crash patterns, which
> >> suggests that the crashed tests are not giving correct feedback to syzbot.
> >
> > Treating different crashes as just "crash" is intended. Kernel bugs
> > can manifest in very different ways.
> > Want fun, search for "bpf: sockhash, disallow bpf_tcp_close and update
> > in parallel" in  https://syzkaller.appspot.com/?fixed=upstream
> > It led to 50+ different failure modes.
> >
>
> But syzbot already found a rather simple C reproducer
> ( https://syzkaller.appspot.com/text?tag=ReproC&x=116fc7a8c00000 ) for this bug.
> Was this reproducer used for bisection?

The C reproducer used for bisection is provided as "C reproducer" in
the bisection report.

> I guess that if this reproducer was used,
> syzbot did not hit "WARNING: ODEBUG bug in netdev_freemem" cases.

Maybe. But we won't have more than 1 reproducer in the future. Currently syzbot
bisects over a backlog of crashes, some of which have accumulated multiple
reproducers over weeks/months/years. Once it bisects newly reported bugs as
they are found, there will be only 1 reproducer. E.g. the two reproducers
for this bug were found within a month.

> Also, humans can sometimes derive simpler C reproducers from syzbot-provided
> reproducers. It would be nice if syzbot could accept and use a user-defined C
> reproducer for testing.

It would be more useful to accept patches from these people that make
syzkaller create better reproducers. Manual work is not scalable. We
would need 10 reproducers per day for a dozen OSes (incl. some
private kernels/branches). Anybody is free to run syzkaller manually
and do full manual (perfect) reporting. But for us it became clear
very early that it won't work. Also, see above: while that human is
sleeping/on a weekend/on vacation, syzbot will already have bisected its
own reproducer. Adding a manual reproducer later won't help in any way.
syzkaller already does lots of smart work for reproducers. Let's not
give up on the last mile and switch back to all manual work.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-13  6:43           ` Dmitry Vyukov
@ 2019-03-13 16:37             ` Theodore Ts'o
  2019-03-13 16:56               ` Dmitry Vyukov
  2019-03-13 23:40             ` Eric Biggers
  1 sibling, 1 reply; 17+ messages in thread
From: Theodore Ts'o @ 2019-03-13 16:37 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Tetsuo Handa, syzbot, syzkaller-bugs, Al Viro, LKML

On Wed, Mar 13, 2019 at 07:43:38AM +0100, Dmitry Vyukov wrote:
> It would be more useful to accept patches from these people that make
> syzkaller create better reproducers. Manual work is not scalable. We
> would need 10 reproducers per day for a dozen OSes (incl. some
> private kernels/branches). Anybody is free to run syzkaller manually
> and do full manual (perfect) reporting. But for us it became clear
> very early that it won't work. Also, see above: while that human is
> sleeping/on a weekend/on vacation, syzbot will already have bisected its
> own reproducer. Adding a manual reproducer later won't help in any way.
> syzkaller already does lots of smart work for reproducers. Let's not
> give up on the last mile and switch back to all manual work.

I suspect a scalable solution that would significantly improve things
is one where Syzbot tries N times for a "good" result to make sure
it's not a flaky pass.  N could either be hard-coded to some value
like 8 or 10, or Syzbot could experimentally try to figure out how
reliable the reproducer happens to be, and figure out what an ideal
"N" value should be for a particular reproducer.
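
To put rough numbers on the idea above (a back-of-the-envelope sketch, not
something syzbot is known to implement): if a reproducer triggers the crash
with probability p on each run, and runs are assumed independent, then the
chance that N runs in a row all pass on a bad kernel is (1 - p)^N. The
helper below, tries_needed(), is a hypothetical name; it picks the smallest
N that pushes that chance down to a chosen threshold alpha.

```c
/*
 * Hypothetical helper (not part of syzkaller): smallest number of runs N
 * such that a reproducer that fires with probability p per run has at
 * most an alpha chance of passing all N runs on a bad kernel, i.e.
 * (1 - p)^N <= alpha.  Assumes the runs are independent.
 * Returns -1 for inputs outside 0 < p <= 1, 0 < alpha < 1.
 */
int tries_needed(double p, double alpha)
{
	double miss = 1.0;	/* probability that every run so far passed */
	int n = 0;

	if (p <= 0.0 || p > 1.0 || alpha <= 0.0 || alpha >= 1.0)
		return -1;

	while (miss > alpha) {
		miss *= 1.0 - p;
		n++;
	}
	return n;
}
```

For example, tries_needed(0.5, 0.01) is 7, while a reproducer that fires on
only a quarter of runs needs tries_needed(0.25, 0.01) == 17. With a fixed N
of 8, such a reproducer would still give a flaky pass about 10% of the time
(0.75^8 is roughly 0.10).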

					- Ted

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-13 16:37             ` Theodore Ts'o
@ 2019-03-13 16:56               ` Dmitry Vyukov
  0 siblings, 0 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-13 16:56 UTC (permalink / raw)
  To: Theodore Ts'o, Dmitry Vyukov, Tetsuo Handa, syzbot,
	syzkaller-bugs, Al Viro, LKML

On Wed, Mar 13, 2019 at 5:37 PM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Mar 13, 2019 at 07:43:38AM +0100, Dmitry Vyukov wrote:
> > It would be more useful to accept patches from these people that make
> > syzkaller create better reproducers. Manual work is not scalable. We
> > would need 10 reproducers per day for a dozen OSes (incl. some
> > private kernels/branches). Anybody is free to run syzkaller manually
> > and do full manual (perfect) reporting. But for us it became clear
> > very early that it won't work. Also, see above: while that human is
> > sleeping/on a weekend/on vacation, syzbot will already have bisected its
> > own reproducer. Adding a manual reproducer later won't help in any way.
> > syzkaller already does lots of smart work for reproducers. Let's not
> > give up on the last mile and switch back to all manual work.
>
> I suspect a scalable solution that would significantly improve things
> is one where Syzbot tries N times for a "good" result to make sure
> it's not a flaky pass.  N could either be hard-coded to some value
> like 8 or 10, or Syzbot could experimentally try to figure out how
> reliable the reproducer happens to be, and figure out what an ideal
> "N" value should be for a particular reproducer.

It currently tries 8 times, see e.g.:
https://syzkaller.appspot.com/text?tag=Log&x=13354d9d200000

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-13  6:43           ` Dmitry Vyukov
  2019-03-13 16:37             ` Theodore Ts'o
@ 2019-03-13 23:40             ` Eric Biggers
  2019-03-14 10:52               ` Tetsuo Handa
  1 sibling, 1 reply; 17+ messages in thread
From: Eric Biggers @ 2019-03-13 23:40 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Tetsuo Handa, syzbot, syzkaller-bugs, Al Viro, LKML

On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> > Also, humans can sometimes derive simpler C reproducers from syzbot-provided
> > reproducers. It would be nice if syzbot could accept and use a user-defined C
> > reproducer for testing.
>
> It would be more useful to accept patches from these people that make
> syzkaller create better reproducers. Manual work is not scalable. We
> would need 10 reproducers per day for a dozen OSes (incl. some
> private kernels/branches). Anybody is free to run syzkaller manually
> and do full manual (perfect) reporting. But for us it became clear
> very early that it won't work. Also, see above: while that human is
> sleeping/on a weekend/on vacation, syzbot will already have bisected its
> own reproducer. Adding a manual reproducer later won't help in any way.
> syzkaller already does lots of smart work for reproducers. Let's not
> give up on the last mile and switch back to all manual work.
> 

Well, it's very tough and not many people are familiar with the syzkaller
codebase, let alone have time to contribute.  But having simplified a lot of
the syzkaller reproducers manually, the main things I do are:

- Replace bare system calls with proper C library calls.  For example:

	#include <sys/syscall.h>

	syscall(__NR_socket, 0xa, 6, 0);

    becomes:

	#include <sys/socket.h>

	socket(AF_INET6, SOCK_DCCP, 0);

- Do the same for structs.  Use the appropriate C header rather than filling in
  each struct manually.  For example:

	*(uint16_t*)0x20000000 = 0xa;
	*(uint16_t*)0x20000002 = htobe16(0x4e20);
	*(uint32_t*)0x20000004 = 0;
	*(uint8_t*)0x20000008 = 0;
	*(uint8_t*)0x20000009 = 0;
	*(uint8_t*)0x2000000a = 0;
	*(uint8_t*)0x2000000b = 0;
	*(uint8_t*)0x2000000c = 0;
	*(uint8_t*)0x2000000d = 0;
	*(uint8_t*)0x2000000e = 0;
	*(uint8_t*)0x2000000f = 0;
	*(uint8_t*)0x20000010 = 0;
	*(uint8_t*)0x20000011 = 0;
	*(uint8_t*)0x20000012 = 0;
	*(uint8_t*)0x20000013 = 0;
	*(uint8_t*)0x20000014 = 0;
	*(uint8_t*)0x20000015 = 0;
	*(uint8_t*)0x20000016 = 0;
	*(uint8_t*)0x20000017 = 0;
	*(uint32_t*)0x20000018 = 0;

    becomes:

	struct sockaddr_in6 addr = { .sin6_family = AF_INET6, .sin6_port = htobe16(0x4e20) };

- Put arguments on the stack rather than in a mmap'd region, if possible.

- Simplify any calls to the helper functions that syzkaller emits, e.g.
  syz_open_dev(), syz_kvm_setup_vcpu(), or the networking setup stuff.  Usually
  the reproducer needs a small subset of the functionality to work.

- For multithreaded reproducers, try to incrementally simplify the threading
  strategy.  For example, reduce the number of threads by combining operations.
  Also try running the operations in loops.  Also, using fork() can often result
  in a simpler reproducer than pthreads.

- Instead of using the 'r[]' array to hold all integer return values, give them
  appropriate names.

- Remove duplicate #includes.

- Considering the actual kernel code and the bug, if possible find a different
  way to trigger the same bug that's simpler or more reliable.  If the problem
  is obvious it may be possible to jump right to this step from the beginning.
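
As a quick sanity check of the struct example above, the following sketch
compares the initializer form with the bytes that the raw stores would
produce. make_addr() and addr_matches_raw_layout() are names made up for
illustration, and htons() is used in place of htobe16() (same result for a
16-bit port) only to stay within POSIX headers. It assumes the Linux layout
of struct sockaddr_in6 (0x1c bytes, no padding).

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: the address from the example above, built on the
 * stack with a designated initializer.  Members that are not named
 * (flowinfo, address, scope id) are zero-initialized.
 */
struct sockaddr_in6 make_addr(void)
{
	struct sockaddr_in6 addr = {
		.sin6_family = AF_INET6,	/* 0xa on Linux */
		.sin6_port = htons(0x4e20),	/* port 20000, network byte order */
	};
	return addr;
}

/* Returns 1 if the struct matches the byte layout of the raw stores. */
int addr_matches_raw_layout(void)
{
	unsigned char raw[0x1c] = { 0 };	/* 0x20000000..0x2000001b */
	uint16_t family = 0xa;
	uint16_t port = htons(0x4e20);
	struct sockaddr_in6 addr = make_addr();

	/* The raw stores cover exactly one struct sockaddr_in6. */
	assert(sizeof(addr) == sizeof(raw));

	memcpy(raw + 0, &family, sizeof(family));	/* offset 0x0: family */
	memcpy(raw + 2, &port, sizeof(port));		/* offset 0x2: port */
	return memcmp(&addr, raw, sizeof(raw)) == 0;
}
```

If addr_matches_raw_layout() returns 1, the one-line initializer and the
twenty raw stores hand the kernel the same bytes, so the rewrite should not
change what the reproducer does.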

Some gotchas:

- fault-nth injections are fragile, since the number of memory allocations in a
  particular system call varies by kernel config and kernel version.
  Incrementing n starting from 1 is more reliable.

- Some of the perf_event_open() reproducers are fragile because they hardcode a
  trace event ID, which can change in every kernel version.  Reading the trace
  event ID from /sys/kernel/debug/tracing/events/ is more reliable.

- Reproducers using the KVM API sometimes only work on certain processors (e.g.
  Intel but not AMD) or even depend on the host kernel.

- Reproducers that access the local filesystem sometimes assume that it's ext4.
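
For the perf_event_open() gotcha above, a minimal sketch of reading the
trace event ID at run time instead of hardcoding it. The helper name
read_trace_event_id() is made up for illustration; it assumes the usual
per-event "id" file under /sys/kernel/debug/tracing/events/ and that
debugfs is mounted and readable.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a trace event ID from an "id" file such as
 * /sys/kernel/debug/tracing/events/<subsystem>/<event>/id.
 * Returns the ID, or -1 if the file cannot be opened or parsed.
 */
long read_trace_event_id(const char *path)
{
	FILE *f = fopen(path, "r");
	long id;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &id) != 1)
		id = -1;
	fclose(f);
	return id;
}
```

The returned value would then go into the config field of struct
perf_event_attr (with type set to PERF_TYPE_TRACEPOINT) in place of the
hardcoded number that the generated reproducer carries.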

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-13 23:40             ` Eric Biggers
@ 2019-03-14 10:52               ` Tetsuo Handa
  2019-03-20 12:49                 ` Dmitry Vyukov
  2019-03-20 13:45                 ` Dmitry Vyukov
  0 siblings, 2 replies; 17+ messages in thread
From: Tetsuo Handa @ 2019-03-14 10:52 UTC (permalink / raw)
  To: Eric Biggers, Dmitry Vyukov; +Cc: syzbot, syzkaller-bugs, Al Viro, LKML

On 2019/03/14 8:40, Eric Biggers wrote:
> On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
>>> Also, humans can sometimes derive simpler C reproducers from syzbot-provided
>>> reproducers. It would be nice if syzbot could accept and use a user-defined C
>>> reproducer for testing.
>>
>> It would be more useful to accept patches from these people that make
>> syzkaller create better reproducers. Manual work is not scalable. We
>> would need 10 reproducers per day for a dozen OSes (incl. some
>> private kernels/branches). Anybody is free to run syzkaller manually
>> and do full manual (perfect) reporting. But for us it became clear
>> very early that it won't work. Also, see above: while that human is
>> sleeping/on a weekend/on vacation, syzbot will already have bisected its
>> own reproducer. Adding a manual reproducer later won't help in any way.
>> syzkaller already does lots of smart work for reproducers. Let's not
>> give up on the last mile and switch back to all manual work.
>>
> 
> Well, it's very tough and not many people are familiar with the syzkaller
> codebase, let alone have time to contribute.

Right. I don't read/write Go programs. I don't have access to environments
for running syzbot. But instead I try to write kernel patches.

Also, although anybody is free to do full manual (perfect) reporting,
I can't afford to check such reports posted to e.g. LKML. I can afford
to check only https://syzkaller.appspot.com/ .

I have seen a Japanese article which explains how to run syzbot. But I felt that
the article does not explain what to do when syzbot finds a bug. If people find
a crash by running syzbot in their own environments, it would be nice if they
could export the report and import it into https://syzkaller.appspot.com/ (i.e.
the dashboard would act like a bugzilla).

>                                               But having simplified a lot of
> the syzkaller reproducers manually, the main things I do are:

Yes. I'm doing similar things. Other things not listed here are:

  Try to remove any syscall() which passes EOF (i.e. -1) as the fd argument, for
  it should be unrelated to the problem unless such a call affects subtle timing.

  Try to remove code for testing fuse / tun etc. if the problem seems to be
  unrelated to fuse / tun etc.

syzbot is satisfied once it finds one C reproducer, but I wish that syzbot
would continue trying to find smaller C reproducers by e.g. eliminating
unrelated calls.
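
On the point above about calls that pass -1 as the fd, a small sketch of why
they are usually safe to drop: the descriptor lookup fails with EBADF before
any file-specific code runs. The helper name errno_of_write_to_bad_fd() is
made up for illustration.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue a write() on fd -1 and report the errno.
 * Returns 0 if the call unexpectedly succeeds.
 */
int errno_of_write_to_bad_fd(void)
{
	char byte = 0;

	errno = 0;
	if (write(-1, &byte, 1) == -1)
		return errno;
	return 0;
}
```

This only covers calls where the fd is actually looked up. Some calls give
-1 or other negative values a meaning of their own (e.g. mmap() with
MAP_ANONYMOUS, or the AT_FDCWD constant for the *at() calls), so those need
a closer look before removal.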

> 
> - Replace bare system calls with proper C library calls.  For example:
> 
> 	#include <sys/syscall.h>
> 
> 	syscall(__NR_socket, 0xa, 6, 0);
> 
>     becomes:
> 
> 	#include <sys/socket.h>
> 
>         socket(AF_INET6, SOCK_DCCP, 0);

Yes. It would be nice if C reproducers were provided using symbolic names. I run
syzbot-provided C reproducers under strace because strace gives me more hints
about symbols and structures.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-14 10:52               ` Tetsuo Handa
@ 2019-03-20 12:49                 ` Dmitry Vyukov
  2019-03-20 13:45                 ` Dmitry Vyukov
  1 sibling, 0 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 12:49 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Eric Biggers, syzbot, syzkaller-bugs, Al Viro, LKML

On Thu, Mar 14, 2019 at 11:52 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/14 8:40, Eric Biggers wrote:
> > On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> >>> Also, humans can sometimes derive simpler C reproducers from syzbot-provided
> >>> reproducers. It would be nice if syzbot could accept and use a user-defined C
> >>> reproducer for testing.
> >>
> >> It would be more useful to accept patches from these people that make
> >> syzkaller create better reproducers. Manual work is not scalable. We
> >> would need 10 reproducers per day for a dozen OSes (incl. some
> >> private kernels/branches). Anybody is free to run syzkaller manually
> >> and do full manual (perfect) reporting. But for us it became clear
> >> very early that it won't work. Also, see above: while that human is
> >> sleeping/on a weekend/on vacation, syzbot will already have bisected its
> >> own reproducer. Adding a manual reproducer later won't help in any way.
> >> syzkaller already does lots of smart work for reproducers. Let's not
> >> give up on the last mile and switch back to all manual work.
> >>
> >
> > Well, it's very tough and not many people are familiar with the syzkaller
> > codebase, let alone have time to contribute.
>
> Right. I don't read/write Go programs. I don't have access to environments
> for running syzbot. But instead I try to write kernel patches.
>
> Also, although anybody is free to do full manual (perfect) reporting,
> I can't afford to check such reports posted to e.g. LKML. I can afford
> to check only https://syzkaller.appspot.com/ .
>
> I have seen a Japanese article which explains how to run syzbot. But I felt that
> the article does not explain what to do when syzbot finds a bug. If people find
> a crash by running syzbot in their own environments, it would be nice if they
> could export the report and import it into https://syzkaller.appspot.com/ (i.e.
> the dashboard would act like a bugzilla).
>
> >                                               But having simplified a lot of
> > the syzkaller reproducers manually, the main things I do are:
>
> Yes. I'm doing similar things. Other things not listed here are:
>
>   Try to remove any syscall() which passes EOF (i.e. -1) as the fd argument, for
>   it should be unrelated to the problem unless such a call affects subtle timing.
>
>   Try to remove code for testing fuse / tun etc. if the problem seems to be
>   unrelated to fuse / tun etc.
>
> syzbot is satisfied once it finds one C reproducer, but I wish that syzbot
> would continue trying to find smaller C reproducers by e.g. eliminating
> unrelated calls.
>
> >
> > - Replace bare system calls with proper C library calls.  For example:
> >
> >       #include <sys/syscall.h>
> >
> >       syscall(__NR_socket, 0xa, 6, 0);
> >
> >     becomes:
> >
> >       #include <sys/socket.h>
> >
> >         socket(AF_INET6, SOCK_DCCP, 0);
>
> Yes. It would be nice if C reproducers were provided using symbolic names. I run
> syzbot-provided C reproducers under strace because strace gives me more hints
> about symbols and structures.


I will answer re reproducers first.

Thanks for the suggestions, I've filed
https://github.com/google/syzkaller/issues/1070 for this. Lots of them
are implementable within the current framework. Things on kernel
mailing lists (suggestions, bug reports, patches) get lost very
quickly.

As far as I can see, most of them are cosmetic (not saying that
they are not useful, they just won't affect bisection results).
From my experience, the most powerful simplifications are possible only
once I have already root-caused the bug and understand its mechanics.
Then it's possible to reorder syscalls, remove all/most of the threading,
etc. But by that point bisection is no longer that useful.

Some of them can alter the load on the kernel (e.g. using libc structs and
syscall wrappers), which can lead to triggering a different bug...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: INFO: rcu detected stall in sys_sendfile64 (2)
  2019-03-14 10:52               ` Tetsuo Handa
  2019-03-20 12:49                 ` Dmitry Vyukov
@ 2019-03-20 13:45                 ` Dmitry Vyukov
  1 sibling, 0 replies; 17+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 13:45 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Eric Biggers, syzbot, syzkaller-bugs, Al Viro, LKML

On Thu, Mar 14, 2019 at 11:52 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/14 8:40, Eric Biggers wrote:
> > On Wed, Mar 13, 2019 at 07:43:38AM +0100, 'Dmitry Vyukov' via syzkaller-bugs wrote:
> >>> Also, humans can sometimes derive simpler C reproducers from syzbot-provided
> >>> reproducers. It would be nice if syzbot could accept and use a user-defined C
> >>> reproducer for testing.
> >>
> >> It would be more useful to accept patches from these people that make
> >> syzkaller create better reproducers. Manual work is not scalable. We
> >> would need 10 reproducers per day for a dozen OSes (incl. some
> >> private kernels/branches). Anybody is free to run syzkaller manually
> >> and do full manual (perfect) reporting. But for us it became clear
> >> very early that it won't work. Also, see above: while that human is
> >> sleeping/on a weekend/on vacation, syzbot will already have bisected its
> >> own reproducer. Adding a manual reproducer later won't help in any way.
> >> syzkaller already does lots of smart work for reproducers. Let's not
> >> give up on the last mile and switch back to all manual work.
> >>
> >
> > Well, it's very tough and not many people are familiar with the syzkaller
> > codebase, let alone have time to contribute.
>
> Right. I don't read/write Go programs. I don't have access to environments
> for running syzbot. But instead I try to write kernel patches.
>
> Also, although anybody is free to do full manual (perfect) reporting,
> I can't afford to check such reports posted to e.g. LKML. I can afford
> to check only https://syzkaller.appspot.com/ .
>
> I have seen a Japanese article which explains how to run syzbot. But I felt that
> the article does not explain what to do when syzbot finds a bug. If people find
> a crash by running syzbot in their own environments, it would be nice if they
> could export the report and import it into https://syzkaller.appspot.com/ (i.e.
> the dashboard would act like a bugzilla).


Problem 1 (smaller). Neither providing a custom program nor manually
specifying a bisection range (as you suggested in another thread
https://groups.google.com/d/msg/syzkaller-bugs/nFeC8-UG1gg/1OTVIuzBAgAJ)
will make kernel bug bisection reliable. The problems with kernel
bisection are deeper. Consider a bug that is inherently hard to
trigger: even if one provides one's own reproducer, it's still hard to
trigger and bisection can diverge. What happened in the other bug:
bisection diverged because the reproducer triggered another bug. Now
consider that this happens within the bisection range. Even if you
give your own range, it won't help. And there are lots of other problems
like, say, large ranges where the kernel build is broken.
And this will introduce its own problems: e.g. it's very easy to give
syzbot a reproducer that actually does not trigger the bug for it
(because you can't match its environment precisely).
Also: if you can't bisect locally and can't test, how do you know the
right range in general? Again, that one bug was a single corner case.
Also: a semi-manual process will also lead to some suboptimal results,
and then other kernel developers will come and ask questions and
somebody will need to answer these questions. But in this case syzbot
is not even accountable for what happened.
I don't think there is a simple substitute for a qualified engineer
doing their job (guiding each step of bisection manually).
It's possible to imagine a very complex workflow (super hard to
implement, test and maintain too) that will allow doing that. And it
becomes mostly offloading build/boot/test of a given configuration to
the cloud. And this brings us to the second problem.
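
A rough way to see why a hard-to-trigger bug makes bisection diverge,
whoever supplies the reproducer (a simplified model, not how syzbot scores
anything): a single wrong good/bad verdict sends the search into the wrong
half, so every step has to be right. If each step independently gives a
wrong verdict with probability q, the whole bisection is right with
probability (1 - q)^steps. The function name bisect_success() is made up
for illustration.

```c
/*
 * Hypothetical model: probability that a bisection with `steps`
 * build/test steps gets every verdict right, if each step independently
 * gives a wrong verdict with probability q.
 */
double bisect_success(double q, int steps)
{
	double ok = 1.0;

	for (int i = 0; i < steps; i++)
		ok *= 1.0 - q;
	return ok;
}
```

With q = 0.1 and 14 steps (about what a range of some 16000 commits needs,
since 2^14 = 16384), bisect_success(0.1, 14) comes out at roughly 0.23, i.e.
such a bisection would land on the wrong commit more often than not.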

Problem 2. What you are proposing effectively looks like some kind of
custom workload offloading service for kernel developers. Just instead
of console commands (raw cloud VMs) it has a somewhat higher-level
interface (e.g. here is kernel config, compiler, command line,
sysctls, machine configuration and test case, go build and test it).
I don't think this should be bolted on top of syzbot.
Developing and running syzbot is already a _huge_ amount of work
(frequently thankless). I simply cannot take on developing, testing,
deploying, maintaining and operating another service. And that service
will involve much more complex human interactions, so will be much
more complex overall.
If such a service is provided, I think it needs to run on Linux
Foundation infrastructure that runs CI and other testing. Yes, I know,
it does not exist. But that would be the right place. It would benefit
work on all other kernel bugs too. Lots of things people attribute to
syzbot are really not specific to syzbot in any way. For example, that
service would help with bisection of all other bugs too. And it seems
that a much simpler solution would be just to provide free VMs for
developers, because your main point seems to be "I would like to do
something custom, but I don't have resources for that". This is out of
scope for syzbot.
The current syzbot scope is: automating as much as possible, solving
common cases at scale (including other OSes and kernel branches),
bringing developers enough information to pick up the bug from there
and do any custom work necessary to debug and fix the bug (there
always will be custom work! even perfect bisection can get you nowhere
re root causing and there are still bugs without reproducers). We can
solve some surrounding problems too _iff_ they are common enough, have
high bang for the buck, are reasonably easy to implement and don't cause
a long-term maintenance toll. This one does not look like such a problem.
Sorry.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-03-20 13:45 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-19 11:32 INFO: rcu detected stall in sys_sendfile64 (2) syzbot
2019-01-19 11:41 ` Dmitry Vyukov
2019-01-19 13:00   ` Tetsuo Handa
2019-03-12  3:59 ` syzbot
2019-03-12  4:08   ` Al Viro
2019-03-12  8:00     ` Jani Nikula
2019-03-12 14:29     ` Tetsuo Handa
2019-03-12 17:15       ` Dmitry Vyukov
2019-03-12 21:11         ` Tetsuo Handa
2019-03-13  6:43           ` Dmitry Vyukov
2019-03-13 16:37             ` Theodore Ts'o
2019-03-13 16:56               ` Dmitry Vyukov
2019-03-13 23:40             ` Eric Biggers
2019-03-14 10:52               ` Tetsuo Handa
2019-03-20 12:49                 ` Dmitry Vyukov
2019-03-20 13:45                 ` Dmitry Vyukov
2019-03-12 17:10     ` Dmitry Vyukov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).