* kernel panic: corrupted stack end in wb_workfn
@ 2018-12-31  3:41 ` syzbot
  0 siblings, 0 replies; 42+ messages in thread
From: syzbot @ 2018-12-31  3:41 UTC (permalink / raw)
  To: akpm, aryabinin, guro, hannes, jbacik, ktkhai, linux-kernel,
	linux-mm, mgorman, mhocko, shakeelb, syzkaller-bugs, willy

Hello,

syzbot found the following crash on:

HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=176c0ebf400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com

Kernel panic - not syncing: corrupted stack end detected inside scheduler
CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.20.0+ #396
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: writeback wb_workfn (flush-8:0)
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
  panic+0x2ad/0x55f kernel/panic.c:189
  schedule_debug kernel/sched/core.c:3285 [inline]
  __schedule+0x1ec6/0x1ed0 kernel/sched/core.c:3394
  preempt_schedule_common+0x1f/0xe0 kernel/sched/core.c:3596
  preempt_schedule+0x4d/0x60 kernel/sched/core.c:3622
  ___preempt_schedule+0x16/0x18
  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
  _raw_spin_unlock_irqrestore+0xbb/0xd0 kernel/locking/spinlock.c:184
  spin_unlock_irqrestore include/linux/spinlock.h:384 [inline]
  __remove_mapping+0x932/0x1af0 mm/vmscan.c:967
  shrink_page_list+0x6610/0xc2e0 mm/vmscan.c:1461
  shrink_inactive_list+0x77b/0x1c60 mm/vmscan.c:1961
  shrink_list mm/vmscan.c:2273 [inline]
  shrink_node_memcg+0x7a8/0x19a0 mm/vmscan.c:2538
  shrink_node+0x3e1/0x17f0 mm/vmscan.c:2753
  shrink_zones mm/vmscan.c:2987 [inline]
  do_try_to_free_pages+0x3df/0x12a0 mm/vmscan.c:3049
  try_to_free_pages+0x4d0/0xb90 mm/vmscan.c:3265
  __perform_reclaim mm/page_alloc.c:3920 [inline]
  __alloc_pages_direct_reclaim mm/page_alloc.c:3942 [inline]
  __alloc_pages_slowpath+0xa5a/0x2db0 mm/page_alloc.c:4335
  __alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
  alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
  alloc_pages include/linux/gfp.h:509 [inline]
  __page_cache_alloc+0x38c/0x5b0 mm/filemap.c:924
  pagecache_get_page+0x396/0xf00 mm/filemap.c:1615
  find_or_create_page include/linux/pagemap.h:322 [inline]
  ext4_mb_load_buddy_gfp+0xddf/0x1e70 fs/ext4/mballoc.c:1158
  ext4_mb_load_buddy fs/ext4/mballoc.c:1241 [inline]
  ext4_mb_regular_allocator+0x634/0x1590 fs/ext4/mballoc.c:2190
  ext4_mb_new_blocks+0x1de3/0x4840 fs/ext4/mballoc.c:4538
  ext4_ext_map_blocks+0x2eef/0x6180 fs/ext4/extents.c:4404
  ext4_map_blocks+0x8f7/0x1b60 fs/ext4/inode.c:636
  mpage_map_one_extent fs/ext4/inode.c:2480 [inline]
  mpage_map_and_submit_extent fs/ext4/inode.c:2533 [inline]
  ext4_writepages+0x2564/0x4170 fs/ext4/inode.c:2884
  do_writepages+0x9a/0x1a0 mm/page-writeback.c:2335
  __writeback_single_inode+0x20a/0x1660 fs/fs-writeback.c:1316
  writeback_sb_inodes+0x71f/0x1210 fs/fs-writeback.c:1580
  __writeback_inodes_wb+0x1b9/0x340 fs/fs-writeback.c:1649
  wb_writeback+0xa73/0xfc0 fs/fs-writeback.c:1758
oom_reaper: reaped process 7963 (syz-executor189), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
rsyslogd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
  wb_check_start_all fs/fs-writeback.c:1882 [inline]
  wb_do_writeback fs/fs-writeback.c:1908 [inline]
  wb_workfn+0xee9/0x1790 fs/fs-writeback.c:1942
  process_one_work+0xc90/0x1c40 kernel/workqueue.c:2153
  worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
  kthread+0x35a/0x440 kernel/kthread.c:246
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
CPU: 1 PID: 7840 Comm: rsyslogd Not tainted 4.20.0+ #396
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
  dump_header+0x253/0x1239 mm/oom_kill.c:451
  oom_kill_process.cold.27+0x10/0x903 mm/oom_kill.c:966
  out_of_memory+0x8ba/0x1480 mm/oom_kill.c:1133
  __alloc_pages_may_oom mm/page_alloc.c:3666 [inline]
  __alloc_pages_slowpath+0x230c/0x2db0 mm/page_alloc.c:4379
  __alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
  alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
  alloc_pages include/linux/gfp.h:509 [inline]
  __page_cache_alloc+0x38c/0x5b0 mm/filemap.c:924
  page_cache_read mm/filemap.c:2373 [inline]
  filemap_fault+0x1595/0x25f0 mm/filemap.c:2557
  ext4_filemap_fault+0x82/0xad fs/ext4/inode.c:6317
  __do_fault+0x100/0x6b0 mm/memory.c:2997
  do_read_fault mm/memory.c:3409 [inline]
  do_fault mm/memory.c:3535 [inline]
  handle_pte_fault mm/memory.c:3766 [inline]
  __handle_mm_fault+0x392f/0x5630 mm/memory.c:3890
  handle_mm_fault+0x54f/0xc70 mm/memory.c:3927
  do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
  __do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
  do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
  page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
RIP: 0033:0x7f00f990e1fd
Code: Bad RIP value.
RSP: 002b:00007f00f6eade30 EFLAGS: 00010293
RAX: 0000000000000fd2 RBX: 000000000111f170 RCX: 00007f00f990e1fd
RDX: 0000000000000fff RSI: 00007f00f86e25a0 RDI: 0000000000000004
RBP: 0000000000000000 R08: 000000000110a260 R09: 0000000000000000
R10: 74616c7567657227 R11: 0000000000000293 R12: 000000000065e420
R13: 00007f00f6eae9c0 R14: 00007f00f9f53040 R15: 0000000000000003
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2018-12-31  3:41 ` syzbot
  (?)
@ 2018-12-31  3:47 ` Qian Cai
  2018-12-31  6:31     ` Dmitry Vyukov
  -1 siblings, 1 reply; 42+ messages in thread
From: Qian Cai @ 2018-12-31  3:47 UTC (permalink / raw)
  To: syzbot, akpm, aryabinin, guro, hannes, jbacik, ktkhai,
	linux-kernel, linux-mm, mgorman, mhocko, shakeelb,
	syzkaller-bugs, willy

Ah, it has KASAN_EXTRA. Need this patch then.

https://lore.kernel.org/lkml/20181228020639.80425-1-cai@lca.pw/

Or use GCC from HEAD, which is supposed to cut the stack size in half.

shrink_page_list
shrink_inactive_list

Those two functions use 7k of stack each, so the 32k stack would soon be gone.
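
The arithmetic behind that estimate can be sketched as follows. This is only a rough tally, not a measurement: the 7k frame sizes and the 32k total are the figures quoted above, and the helper name `remaining_stack` is made up for illustration.

```python
# Rough stack-budget tally for the reclaim path discussed above.
# The 7k per-function figures and the 32k total come from the message;
# nothing here is measured from a real kernel build.

STACK_BUDGET = 32 * 1024  # "32k" kernel stack, as quoted in the message

# Estimated stack use in bytes for the two functions named above.
frames = {
    "shrink_page_list": 7 * 1024,
    "shrink_inactive_list": 7 * 1024,
}


def remaining_stack(budget, frame_sizes):
    """Return the bytes left after subtracting every frame from the budget."""
    return budget - sum(frame_sizes.values())


used = sum(frames.values())
left = remaining_stack(STACK_BUDGET, frames)
print(f"two frames use {used} bytes, leaving {left} of {STACK_BUDGET} bytes")
```

Under these assumptions the two reclaim functions alone take 14k, which leaves 18k for every other frame in the writeback, ext4, and page allocator call chain shown in the trace.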

On 12/30/18 10:41 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=176c0ebf400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
> 
> Kernel panic - not syncing: corrupted stack end detected inside scheduler
> CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.20.0+ #396
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> Workqueue: writeback wb_workfn (flush-8:0)
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
>  panic+0x2ad/0x55f kernel/panic.c:189
>  schedule_debug kernel/sched/core.c:3285 [inline]
>  __schedule+0x1ec6/0x1ed0 kernel/sched/core.c:3394
>  preempt_schedule_common+0x1f/0xe0 kernel/sched/core.c:3596
>  preempt_schedule+0x4d/0x60 kernel/sched/core.c:3622
>  ___preempt_schedule+0x16/0x18
>  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
>  _raw_spin_unlock_irqrestore+0xbb/0xd0 kernel/locking/spinlock.c:184
>  spin_unlock_irqrestore include/linux/spinlock.h:384 [inline]
>  __remove_mapping+0x932/0x1af0 mm/vmscan.c:967
>  shrink_page_list+0x6610/0xc2e0 mm/vmscan.c:1461
>  shrink_inactive_list+0x77b/0x1c60 mm/vmscan.c:1961
>  shrink_list mm/vmscan.c:2273 [inline]
>  shrink_node_memcg+0x7a8/0x19a0 mm/vmscan.c:2538
>  shrink_node+0x3e1/0x17f0 mm/vmscan.c:2753
>  shrink_zones mm/vmscan.c:2987 [inline]
>  do_try_to_free_pages+0x3df/0x12a0 mm/vmscan.c:3049
>  try_to_free_pages+0x4d0/0xb90 mm/vmscan.c:3265
>  __perform_reclaim mm/page_alloc.c:3920 [inline]
>  __alloc_pages_direct_reclaim mm/page_alloc.c:3942 [inline]
>  __alloc_pages_slowpath+0xa5a/0x2db0 mm/page_alloc.c:4335
>  __alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
>  alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
>  alloc_pages include/linux/gfp.h:509 [inline]
>  __page_cache_alloc+0x38c/0x5b0 mm/filemap.c:924
>  pagecache_get_page+0x396/0xf00 mm/filemap.c:1615
>  find_or_create_page include/linux/pagemap.h:322 [inline]
>  ext4_mb_load_buddy_gfp+0xddf/0x1e70 fs/ext4/mballoc.c:1158
>  ext4_mb_load_buddy fs/ext4/mballoc.c:1241 [inline]
>  ext4_mb_regular_allocator+0x634/0x1590 fs/ext4/mballoc.c:2190
>  ext4_mb_new_blocks+0x1de3/0x4840 fs/ext4/mballoc.c:4538
>  ext4_ext_map_blocks+0x2eef/0x6180 fs/ext4/extents.c:4404
>  ext4_map_blocks+0x8f7/0x1b60 fs/ext4/inode.c:636
>  mpage_map_one_extent fs/ext4/inode.c:2480 [inline]
>  mpage_map_and_submit_extent fs/ext4/inode.c:2533 [inline]
>  ext4_writepages+0x2564/0x4170 fs/ext4/inode.c:2884
>  do_writepages+0x9a/0x1a0 mm/page-writeback.c:2335
>  __writeback_single_inode+0x20a/0x1660 fs/fs-writeback.c:1316
>  writeback_sb_inodes+0x71f/0x1210 fs/fs-writeback.c:1580
>  __writeback_inodes_wb+0x1b9/0x340 fs/fs-writeback.c:1649
>  wb_writeback+0xa73/0xfc0 fs/fs-writeback.c:1758
> oom_reaper: reaped process 7963 (syz-executor189), now anon-rss:0kB,
> file-rss:0kB, shmem-rss:0kB
> rsyslogd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0,
> oom_score_adj=0
>  wb_check_start_all fs/fs-writeback.c:1882 [inline]
>  wb_do_writeback fs/fs-writeback.c:1908 [inline]
>  wb_workfn+0xee9/0x1790 fs/fs-writeback.c:1942
>  process_one_work+0xc90/0x1c40 kernel/workqueue.c:2153
>  worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
>  kthread+0x35a/0x440 kernel/kthread.c:246
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
> CPU: 1 PID: 7840 Comm: rsyslogd Not tainted 4.20.0+ #396
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
>  dump_header+0x253/0x1239 mm/oom_kill.c:451
>  oom_kill_process.cold.27+0x10/0x903 mm/oom_kill.c:966
>  out_of_memory+0x8ba/0x1480 mm/oom_kill.c:1133
>  __alloc_pages_may_oom mm/page_alloc.c:3666 [inline]
>  __alloc_pages_slowpath+0x230c/0x2db0 mm/page_alloc.c:4379
>  __alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
>  alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
>  alloc_pages include/linux/gfp.h:509 [inline]
>  __page_cache_alloc+0x38c/0x5b0 mm/filemap.c:924
>  page_cache_read mm/filemap.c:2373 [inline]
>  filemap_fault+0x1595/0x25f0 mm/filemap.c:2557
>  ext4_filemap_fault+0x82/0xad fs/ext4/inode.c:6317
>  __do_fault+0x100/0x6b0 mm/memory.c:2997
>  do_read_fault mm/memory.c:3409 [inline]
>  do_fault mm/memory.c:3535 [inline]
>  handle_pte_fault mm/memory.c:3766 [inline]
>  __handle_mm_fault+0x392f/0x5630 mm/memory.c:3890
>  handle_mm_fault+0x54f/0xc70 mm/memory.c:3927
>  do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
>  __do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
>  do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
>  page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
> RIP: 0033:0x7f00f990e1fd
> Code: Bad RIP value.
> RSP: 002b:00007f00f6eade30 EFLAGS: 00010293
> RAX: 0000000000000fd2 RBX: 000000000111f170 RCX: 00007f00f990e1fd
> RDX: 0000000000000fff RSI: 00007f00f86e25a0 RDI: 0000000000000004
> RBP: 0000000000000000 R08: 000000000110a260 R09: 0000000000000000
> R10: 74616c7567657227 R11: 0000000000000293 R12: 000000000065e420
> R13: 00007f00f6eae9c0 R14: 00007f00f9f53040 R15: 0000000000000003
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
@ 2018-12-31  6:31     ` Dmitry Vyukov
  0 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2018-12-31  6:31 UTC (permalink / raw)
  To: Qian Cai
  Cc: syzbot, Andrew Morton, Andrey Ryabinin, guro, Johannes Weiner,
	Josef Bacik, Kirill Tkhai, LKML, Linux-MM, Mel Gorman,
	Michal Hocko, Shakeel Butt, syzkaller-bugs, Matthew Wilcox

On Mon, Dec 31, 2018 at 4:47 AM Qian Cai <cai@lca.pw> wrote:
>
> Ah, it has KASAN_EXTRA. Need this patch then.
>
> https://lore.kernel.org/lkml/20181228020639.80425-1-cai@lca.pw/
>
> Or use GCC from HEAD, which is supposed to cut the stack size in half.
>
> shrink_page_list
> shrink_inactive_list
>
> Those two functions use 7k of stack each, so the 32k stack would soon be gone.

I am not sure it's just KASAN. I reproduced a stack overflow on this
same stack without KASAN:
https://groups.google.com/forum/#!msg/syzkaller-bugs/ZaBzAJbn6i8/Py9FVlAqDQAJ

Note: this was originally reported 5 months ago:
https://groups.google.com/forum/#!msg/syzkaller-bugs/C7d0Hm6YcDM/nQeciKgtCgAJ
so it has now been present in at least 2 releases, and it causes a stream
of induced crashes that people have spent time debugging:
https://groups.google.com/forum/#!msg/syzkaller-bugs/ZaBzAJbn6i8/Py9FVlAqDQAJ
https://groups.google.com/forum/#!msg/syzkaller-bugs/GIpnqHiIEQg/5jzwQqqfCwAJ
https://syzkaller.appspot.com/bug?id=26c906d472ea470c2cb58c77f08f964f347cbc68
https://groups.google.com/forum/#!msg/syzkaller-bugs/Ovkbsq5qd84/FHsTYlsfDAAJ
most likely more of these:
https://syzkaller.appspot.com#upstream



> On 12/30/18 10:41 PM, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    195303136f19 Merge tag 'kconfig-v4.21-2' of git://git.kern..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=176c0ebf400000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
> >
> > Kernel panic - not syncing: corrupted stack end detected inside scheduler
> > CPU: 0 PID: 7 Comm: kworker/u4:0 Not tainted 4.20.0+ #396
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> > 01/01/2011
> > Workqueue: writeback wb_workfn (flush-8:0)
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
> >  panic+0x2ad/0x55f kernel/panic.c:189
> >  schedule_debug kernel/sched/core.c:3285 [inline]
> >  __schedule+0x1ec6/0x1ed0 kernel/sched/core.c:3394
> >  preempt_schedule_common+0x1f/0xe0 kernel/sched/core.c:3596
> >  preempt_schedule+0x4d/0x60 kernel/sched/core.c:3622
> >  ___preempt_schedule+0x16/0x18
> >  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
> >  _raw_spin_unlock_irqrestore+0xbb/0xd0 kernel/locking/spinlock.c:184
> >  spin_unlock_irqrestore include/linux/spinlock.h:384 [inline]
> >  __remove_mapping+0x932/0x1af0 mm/vmscan.c:967
> >  shrink_page_list+0x6610/0xc2e0 mm/vmscan.c:1461
> >  shrink_inactive_list+0x77b/0x1c60 mm/vmscan.c:1961
> >  shrink_list mm/vmscan.c:2273 [inline]
> >  shrink_node_memcg+0x7a8/0x19a0 mm/vmscan.c:2538
> >  shrink_node+0x3e1/0x17f0 mm/vmscan.c:2753
> >  shrink_zones mm/vmscan.c:2987 [inline]
> >  do_try_to_free_pages+0x3df/0x12a0 mm/vmscan.c:3049
> >  try_to_free_pages+0x4d0/0xb90 mm/vmscan.c:3265
> >  __perform_reclaim mm/page_alloc.c:3920 [inline]
> >  __alloc_pages_direct_reclaim mm/page_alloc.c:3942 [inline]
> >  __alloc_pages_slowpath+0xa5a/0x2db0 mm/page_alloc.c:4335
> >  __alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
> >  alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
> >  alloc_pages include/linux/gfp.h:509 [inline]
> >  __page_cache_alloc+0x38c/0x5b0 mm/filemap.c:924
> >  pagecache_get_page+0x396/0xf00 mm/filemap.c:1615
> >  find_or_create_page include/linux/pagemap.h:322 [inline]
> >  ext4_mb_load_buddy_gfp+0xddf/0x1e70 fs/ext4/mballoc.c:1158
> >  ext4_mb_load_buddy fs/ext4/mballoc.c:1241 [inline]
> >  ext4_mb_regular_allocator+0x634/0x1590 fs/ext4/mballoc.c:2190
> >  ext4_mb_new_blocks+0x1de3/0x4840 fs/ext4/mballoc.c:4538
> >  ext4_ext_map_blocks+0x2eef/0x6180 fs/ext4/extents.c:4404
> >  ext4_map_blocks+0x8f7/0x1b60 fs/ext4/inode.c:636
> >  mpage_map_one_extent fs/ext4/inode.c:2480 [inline]
> >  mpage_map_and_submit_extent fs/ext4/inode.c:2533 [inline]
> >  ext4_writepages+0x2564/0x4170 fs/ext4/inode.c:2884
> >  do_writepages+0x9a/0x1a0 mm/page-writeback.c:2335
> >  __writeback_single_inode+0x20a/0x1660 fs/fs-writeback.c:1316
> >  writeback_sb_inodes+0x71f/0x1210 fs/fs-writeback.c:1580
> >  __writeback_inodes_wb+0x1b9/0x340 fs/fs-writeback.c:1649
> >  wb_writeback+0xa73/0xfc0 fs/fs-writeback.c:1758
> > oom_reaper: reaped process 7963 (syz-executor189), now anon-rss:0kB,
> > file-rss:0kB, shmem-rss:0kB
> > rsyslogd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0,
> > oom_score_adj=0
> >  wb_check_start_all fs/fs-writeback.c:1882 [inline]
> >  wb_do_writeback fs/fs-writeback.c:1908 [inline]
> >  wb_workfn+0xee9/0x1790 fs/fs-writeback.c:1942
> >  process_one_work+0xc90/0x1c40 kernel/workqueue.c:2153
> >  worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
> >  kthread+0x35a/0x440 kernel/kthread.c:246
> >  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
> > CPU: 1 PID: 7840 Comm: rsyslogd Not tainted 4.20.0+ #396
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> > 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
> >  dump_header+0x253/0x1239 mm/oom_kill.c:451
> >  oom_kill_process.cold.27+0x10/0x903 mm/oom_kill.c:966
> >  out_of_memory+0x8ba/0x1480 mm/oom_kill.c:1133
> >  __alloc_pages_may_oom mm/page_alloc.c:3666 [inline]
> >  __alloc_pages_slowpath+0x230c/0x2db0 mm/page_alloc.c:4379
> >  __alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
> >  alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
> >  alloc_pages include/linux/gfp.h:509 [inline]
> >  __page_cache_alloc+0x38c/0x5b0 mm/filemap.c:924
> >  page_cache_read mm/filemap.c:2373 [inline]
> >  filemap_fault+0x1595/0x25f0 mm/filemap.c:2557
> >  ext4_filemap_fault+0x82/0xad fs/ext4/inode.c:6317
> >  __do_fault+0x100/0x6b0 mm/memory.c:2997
> >  do_read_fault mm/memory.c:3409 [inline]
> >  do_fault mm/memory.c:3535 [inline]
> >  handle_pte_fault mm/memory.c:3766 [inline]
> >  __handle_mm_fault+0x392f/0x5630 mm/memory.c:3890
> >  handle_mm_fault+0x54f/0xc70 mm/memory.c:3927
> >  do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
> >  __do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
> >  do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
> >  page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
> > RIP: 0033:0x7f00f990e1fd
> > Code: Bad RIP value.
> > RSP: 002b:00007f00f6eade30 EFLAGS: 00010293
> > RAX: 0000000000000fd2 RBX: 000000000111f170 RCX: 00007f00f990e1fd
> > RDX: 0000000000000fff RSI: 00007f00f86e25a0 RDI: 0000000000000004
> > RBP: 0000000000000000 R08: 000000000110a260 R09: 0000000000000000
> > R10: 74616c7567657227 R11: 0000000000000293 R12: 000000000065e420
> > R13: 00007f00f6eae9c0 R14: 00007f00f9f53040 R15: 0000000000000003
> > Kernel Offset: disabled
> > Rebooting in 86400 seconds..
> >
> >
> > ---
> > This bug is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> >
> > syzbot will keep track of this bug report. See:
> > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
> > syzbot can test patches for this bug, for details see:
> > https://goo.gl/tpsmEJ#testing-patches
> >

^ permalink raw reply	[flat|nested] 42+ messages in thread

> >  process_one_work+0xc90/0x1c40 kernel/workqueue.c:2153
> >  worker_thread+0x17f/0x1390 kernel/workqueue.c:2296
> >  kthread+0x35a/0x440 kernel/kthread.c:246
> >  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
> > CPU: 1 PID: 7840 Comm: rsyslogd Not tainted 4.20.0+ #396
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
> > 01/01/2011
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:77 [inline]
> >  dump_stack+0x1d3/0x2c6 lib/dump_stack.c:113
> >  dump_header+0x253/0x1239 mm/oom_kill.c:451
> >  oom_kill_process.cold.27+0x10/0x903 mm/oom_kill.c:966
> >  out_of_memory+0x8ba/0x1480 mm/oom_kill.c:1133
> >  __alloc_pages_may_oom mm/page_alloc.c:3666 [inline]
> >  __alloc_pages_slowpath+0x230c/0x2db0 mm/page_alloc.c:4379
> >  __alloc_pages_nodemask+0xa89/0xde0 mm/page_alloc.c:4549
> >  alloc_pages_current+0x10c/0x210 mm/mempolicy.c:2106
> >  alloc_pages include/linux/gfp.h:509 [inline]
> >  __page_cache_alloc+0x38c/0x5b0 mm/filemap.c:924
> >  page_cache_read mm/filemap.c:2373 [inline]
> >  filemap_fault+0x1595/0x25f0 mm/filemap.c:2557
> >  ext4_filemap_fault+0x82/0xad fs/ext4/inode.c:6317
> >  __do_fault+0x100/0x6b0 mm/memory.c:2997
> >  do_read_fault mm/memory.c:3409 [inline]
> >  do_fault mm/memory.c:3535 [inline]
> >  handle_pte_fault mm/memory.c:3766 [inline]
> >  __handle_mm_fault+0x392f/0x5630 mm/memory.c:3890
> >  handle_mm_fault+0x54f/0xc70 mm/memory.c:3927
> >  do_user_addr_fault arch/x86/mm/fault.c:1475 [inline]
> >  __do_page_fault+0x5f6/0xd70 arch/x86/mm/fault.c:1541
> >  do_page_fault+0xf2/0x7e0 arch/x86/mm/fault.c:1572
> >  page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1143
> > RIP: 0033:0x7f00f990e1fd
> > Code: Bad RIP value.
> > RSP: 002b:00007f00f6eade30 EFLAGS: 00010293
> > RAX: 0000000000000fd2 RBX: 000000000111f170 RCX: 00007f00f990e1fd
> > RDX: 0000000000000fff RSI: 00007f00f86e25a0 RDI: 0000000000000004
> > RBP: 0000000000000000 R08: 000000000110a260 R09: 0000000000000000
> > R10: 74616c7567657227 R11: 0000000000000293 R12: 000000000065e420
> > R13: 00007f00f6eae9c0 R14: 00007f00f9f53040 R15: 0000000000000003
> > Kernel Offset: disabled
> > Rebooting in 86400 seconds..
> >
> >
> > ---
> > This bug is generated by a bot. It may contain errors.
> > See https://goo.gl/tpsmEJ for more information about syzbot.
> > syzbot engineers can be reached at syzkaller@googlegroups.com.
> >
> > syzbot will keep track of this bug report. See:
> > https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
> > syzbot can test patches for this bug, for details see:
> > https://goo.gl/tpsmEJ#testing-patches
> >
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2018-12-31  3:41 ` syzbot
  (?)
@ 2019-03-17 20:49   ` syzbot
  -1 siblings, 0 replies; 42+ messages in thread
From: syzbot @ 2019-03-17 20:49 UTC (permalink / raw)
  To: akpm, aryabinin, cai, davem, dvyukov, guro, hannes, jbacik,
	ktkhai, linux-kernel, linux-mm, linux-sctp, mgorman, mhocko,
	netdev, nhorman, shakeelb, syzkaller-bugs, viro, vyasevich,
	willy

syzbot has bisected this bug to:

commit c981f254cc82f50f8cb864ce6432097b23195b9c
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sun Jan 7 18:19:09 2018 +0000

     sctp: use vmemdup_user() rather than badly open-coding memdup_user()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
start commit:   c981f254 sctp: use vmemdup_user() rather than badly open-c..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000

Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-17 20:49   ` syzbot
  (?)
@ 2019-03-19 18:03     ` Xin Long
  -1 siblings, 0 replies; 42+ messages in thread
From: Xin Long @ 2019-03-19 18:03 UTC (permalink / raw)
  To: syzbot
  Cc: akpm, aryabinin, cai, davem, Dmitry Vyukov, guro, hannes, jbacik,
	Kirill Tkhai, LKML, linux-mm, linux-sctp, mgorman, mhocko,
	network dev, Neil Horman, shakeelb, syzkaller-bugs, viro,
	Vlad Yasevich, willy

On Mon, Mar 18, 2019 at 4:49 AM syzbot
<syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com> wrote:
>
> syzbot has bisected this bug to:
>
> commit c981f254cc82f50f8cb864ce6432097b23195b9c
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date:   Sun Jan 7 18:19:09 2018 +0000
>
>      sctp: use vmemdup_user() rather than badly open-coding memdup_user()
'addrs_size' is passed in from userspace, and we actually used GFP_USER to
put some more restrictions on it in this commit:

commit cacc06215271104b40773c99547c506095db6ad4
Author: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date:   Mon Nov 30 14:32:54 2015 -0200

    sctp: use GFP_USER for user-controlled kmalloc

However, vmemdup_user() will 'ignore' this flag when going to vmalloc_*(),
so we should probably fix it by using memdup_user() to replace that
open-coded part instead:

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index ea95cd4..e5bcade 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -999,7 +999,7 @@ static int sctp_setsockopt_bindx(struct sock *sk,
        if (unlikely(addrs_size <= 0))
                return -EINVAL;

-       kaddrs = vmemdup_user(addrs, addrs_size);
+       kaddrs = memdup_user(addrs, addrs_size);
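
To illustrate why the choice of helper matters for a user-controlled size, here is a minimal userspace sketch, not kernel code. It models only the size bound: kmalloc()-backed copies fail above KMALLOC_MAX_SIZE, while a kvmalloc()-backed copy keeps trying with larger requests. It does not model GFP flag handling. The sketch_* names are hypothetical, and the PAGE_SHIFT/MAX_ORDER values and the shift formula are assumptions for a typical x86-64 config.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed typical x86-64 values; a real kernel derives these from its config. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_ORDER  11
/* Assumed derivation of the kmalloc() limit: 1 << 22, i.e. 4 MiB here. */
#define SKETCH_KMALLOC_MAX_SIZE \
	((size_t)1 << (SKETCH_MAX_ORDER + SKETCH_PAGE_SHIFT - 1))

/* Stand-in for memdup_user(): refuses requests above the kmalloc() limit. */
static void *sketch_memdup(const void *src, size_t len)
{
	void *p;

	assert(src != NULL);
	if (len > SKETCH_KMALLOC_MAX_SIZE)
		return NULL;	/* the kernel helper would return an ERR_PTR */
	p = malloc(len);
	if (p != NULL)
		memcpy(p, src, len);
	return p;
}

/* Stand-in for vmemdup_user(): no size cap, large requests are attempted. */
static void *sketch_vmemdup(const void *src, size_t len)
{
	void *p;

	assert(src != NULL);
	p = malloc(len);
	if (p != NULL)
		memcpy(p, src, len);
	return p;
}
```

With these stand-ins, an 8 MiB request is rejected by sketch_memdup() but served by sketch_vmemdup(), which is the behaviour difference the one-line change relies on.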

>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
> start commit:   c981f254 sctp: use vmemdup_user() rather than badly open-c..
> git tree:       upstream
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
> console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000
>
> Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
> Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding
> memdup_user()")

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-17 20:49   ` syzbot
@ 2019-03-20  9:56     ` Andrey Ryabinin
  -1 siblings, 0 replies; 42+ messages in thread
From: Andrey Ryabinin @ 2019-03-20  9:56 UTC (permalink / raw)
  To: syzbot, akpm, cai, davem, dvyukov, guro, hannes, jbacik, ktkhai,
	linux-kernel, linux-mm, linux-sctp, mgorman, mhocko, netdev,
	nhorman, shakeelb, syzkaller-bugs, viro, vyasevich, willy
  Cc: Xin Long



On 3/17/19 11:49 PM, syzbot wrote:
> syzbot has bisected this bug to:
> 
> commit c981f254cc82f50f8cb864ce6432097b23195b9c
> Author: Al Viro <viro@zeniv.linux.org.uk>
> Date:   Sun Jan 7 18:19:09 2018 +0000
> 
>     sctp: use vmemdup_user() rather than badly open-coding memdup_user()
> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
> start commit:   c981f254 sctp: use vmemdup_user() rather than badly open-c..
> git tree:       upstream
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
> console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000
> 
> Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
> Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")

From the bisection log:

	testing release v4.17
	testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
	run #0: crashed: kernel panic: corrupted stack end in wb_workfn
	run #1: crashed: kernel panic: corrupted stack end in worker_thread
	run #2: crashed: kernel panic: Out of memory and no killable processes...
	run #3: crashed: kernel panic: corrupted stack end in wb_workfn
	run #4: crashed: kernel panic: corrupted stack end in wb_workfn
	run #5: crashed: kernel panic: corrupted stack end in wb_workfn
	run #6: crashed: kernel panic: corrupted stack end in wb_workfn
	run #7: crashed: kernel panic: corrupted stack end in wb_workfn
	run #8: crashed: kernel panic: Out of memory and no killable processes...
	run #9: crashed: kernel panic: corrupted stack end in wb_workfn
	testing release v4.16
	testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
	run #0: OK
	run #1: OK
	run #2: OK
	run #3: OK
	run #4: OK
	run #5: crashed: kernel panic: Out of memory and no killable processes...
	run #6: OK
	run #7: crashed: kernel panic: Out of memory and no killable processes...
	run #8: OK
	run #9: OK
	testing release v4.15
	testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
	all runs: OK
	# git bisect start v4.16 v4.15

Why did the bisection start between 4.16 and 4.15 instead of 4.17 and 4.16?


	testing commit c14376de3a1befa70d9811ca2872d47367b48767 with gcc (GCC) 8.1.0
	run #0: crashed: kernel panic: Out of memory and no killable processes...
	run #1: crashed: kernel panic: Out of memory and no killable processes...
	run #2: crashed: kernel panic: Out of memory and no killable processes...
	run #3: crashed: kernel panic: Out of memory and no killable processes...
	run #4: OK
	run #5: OK
	run #6: crashed: WARNING: ODEBUG bug in netdev_freemem
	run #7: crashed: no output from test machine
	run #8: OK
	run #9: OK
	# git bisect bad c14376de3a1befa70d9811ca2872d47367b48767

Why is c14376de3a1befa70d9811ca2872d47367b48767 marked bad? There was no stack corruption.
It looks like syzbot was bisecting a different bug: "kernel panic: Out of memory and no killable processes..."
The bisection for that bug seems to be correct. kvmalloc() in vmemdup_user() may eat up all memory, unlike kmalloc(), which is limited by KMALLOC_MAX_SIZE (usually 4MB).
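
A small sketch of the two ways a set of runs could be judged, to show how the verdict above can come out "bad" without any stack corruption. This is a userspace illustration only. The sketch_* names are hypothetical, and the "any crash marks the commit bad" policy is an assumption inferred from the log, not taken from the syzbot sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One test run; title is NULL when the run finished OK. */
struct sketch_run {
	const char *title;
};

/* Assumed policy: any crash at all marks the commit as bad. */
static bool sketch_bad_any_crash(const struct sketch_run *runs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (runs[i].title != NULL)
			return true;
	return false;
}

/* Hypothetical stricter policy: only crashes whose title contains needle count. */
static bool sketch_bad_matching(const struct sketch_run *runs, size_t n,
				const char *needle)
{
	assert(needle != NULL);
	for (size_t i = 0; i < n; i++)
		if (runs[i].title != NULL && strstr(runs[i].title, needle) != NULL)
			return true;
	return false;
}
```

Fed the ten runs for c14376de3a1b, the first policy reports bad while the second, matching on "corrupted stack end", reports good, so under the first policy the bisection can follow the out-of-memory panic instead of the originally reported crash.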

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20  9:56     ` Andrey Ryabinin
  (?)
@ 2019-03-20  9:59       ` Dmitry Vyukov
  -1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20  9:59 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 10:56 AM Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
>
> On 3/17/19 11:49 PM, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit c981f254cc82f50f8cb864ce6432097b23195b9c
> > Author: Al Viro <viro@zeniv.linux.org.uk>
> > Date:   Sun Jan 7 18:19:09 2018 +0000
> >
> >     sctp: use vmemdup_user() rather than badly open-coding memdup_user()
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
> > start commit:   c981f254 sctp: use vmemdup_user() rather than badly open-c..
> > git tree:       upstream
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000
> >
> > Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
> > Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")
>
> From bisection log:
>
>         testing release v4.17
>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #8: crashed: kernel panic: Out of memory and no killable processes...
>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
>         testing release v4.16
>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
>         run #0: OK
>         run #1: OK
>         run #2: OK
>         run #3: OK
>         run #4: OK
>         run #5: crashed: kernel panic: Out of memory and no killable processes...
>         run #6: OK
>         run #7: crashed: kernel panic: Out of memory and no killable processes...
>         run #8: OK
>         run #9: OK
>         testing release v4.15
>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
>         all runs: OK
>         # git bisect start v4.16 v4.15
>
> Why bisect started between 4.16 4.15 instead of 4.17 4.16?

Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
looks like the right range, no?


>         testing commit c14376de3a1befa70d9811ca2872d47367b48767 with gcc (GCC) 8.1.0
>         run #0: crashed: kernel panic: Out of memory and no killable processes...
>         run #1: crashed: kernel panic: Out of memory and no killable processes...
>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>         run #3: crashed: kernel panic: Out of memory and no killable processes...
>         run #4: OK
>         run #5: OK
>         run #6: crashed: WARNING: ODEBUG bug in netdev_freemem
>         run #7: crashed: no output from test machine
>         run #8: OK
>         run #9: OK
>         # git bisect bad c14376de3a1befa70d9811ca2872d47367b48767
>
> Why c14376de3a1befa70d9811ca2872d47367b48767 is bad? There was no stack corruption.
> It looks like the syzbot were bisecting a different bug - "kernel panic: Out of memory and no killable processes..."
> And bisection for that bug seems to be correct. kvmalloc() in vmemdup_user() may eat up all memory unlike kmalloc which is limited by KMALLOC_MAX_SIZE (4MB usually).

Please see https://github.com/google/syzkaller/blob/master/docs/syzbot.md#bisection
for the answer.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
@ 2019-03-20  9:59       ` Dmitry Vyukov
  0 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20  9:59 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 10:56 AM Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
>
> On 3/17/19 11:49 PM, syzbot wrote:
> > syzbot has bisected this bug to:
> >
> > commit c981f254cc82f50f8cb864ce6432097b23195b9c
> > Author: Al Viro <viro@zeniv.linux.org.uk>
> > Date:   Sun Jan 7 18:19:09 2018 +0000
> >
> >     sctp: use vmemdup_user() rather than badly open-coding memdup_user()
> >
> > bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=137bcecf200000
> > start commit:   c981f254 sctp: use vmemdup_user() rather than badly open-c..
> > git tree:       upstream
> > final crash:    https://syzkaller.appspot.com/x/report.txt?x=10fbcecf200000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=177bcecf200000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=5e7dc790609552d7
> > dashboard link: https://syzkaller.appspot.com/bug?extid=ec1b7575afef85a0e5ca
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16a9a84b400000
> > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17199bb3400000
> >
> > Reported-by: syzbot+ec1b7575afef85a0e5ca@syzkaller.appspotmail.com
> > Fixes: c981f254 ("sctp: use vmemdup_user() rather than badly open-coding memdup_user()")
>
> From bisection log:
>
>         testing release v4.17
>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
>         run #8: crashed: kernel panic: Out of memory and no killable processes...
>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
>         testing release v4.16
>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
>         run #0: OK
>         run #1: OK
>         run #2: OK
>         run #3: OK
>         run #4: OK
>         run #5: crashed: kernel panic: Out of memory and no killable processes...
>         run #6: OK
>         run #7: crashed: kernel panic: Out of memory and no killable processes...
>         run #8: OK
>         run #9: OK
>         testing release v4.15
>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
>         all runs: OK
>         # git bisect start v4.16 v4.15
>
> Why bisect started between 4.16 4.15 instead of 4.17 4.16?

Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
looks like the right range, no?


>         testing commit c14376de3a1befa70d9811ca2872d47367b48767 with gcc (GCC) 8.1.0
>         run #0: crashed: kernel panic: Out of memory and no killable processes...
>         run #1: crashed: kernel panic: Out of memory and no killable processes...
>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>         run #3: crashed: kernel panic: Out of memory and no killable processes...
>         run #4: OK
>         run #5: OK
>         run #6: crashed: WARNING: ODEBUG bug in netdev_freemem
>         run #7: crashed: no output from test machine
>         run #8: OK
>         run #9: OK
>         # git bisect bad c14376de3a1befa70d9811ca2872d47367b48767
>
> Why c14376de3a1befa70d9811ca2872d47367b48767 is bad? There was no stack corruption.
> It looks like the syzbot were bisecting a different bug - "kernel panic: Out of memory and no killable processes..."
> And bisection for that bug seems to be correct. kvmalloc() in vmemdup_user() may eat up all memory unlike kmalloc which is limited by KMALLOC_MAX_SIZE (4MB usually).

Please see https://github.com/google/syzkaller/blob/master/docs/syzbot.md#bisection
for the answer.
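
The range selection discussed above can be sketched in a few lines. This
is only an illustration, not syzbot's actual code: the function names
are made up, the predicate "any crash at all marks the release bad" is
inferred from the reply above, and v4.15 is assumed to have had 10 runs
because the log only says "all runs: OK".

```python
# Run outcomes copied from the quoted bisection log.
# None means the run finished OK; a string is the crash title.
OOM = "kernel panic: Out of memory and no killable processes..."
STACK_WB = "kernel panic: corrupted stack end in wb_workfn"
STACK_WT = "kernel panic: corrupted stack end in worker_thread"

RUNS = {
    "v4.17": [STACK_WB, STACK_WT, OOM, STACK_WB, STACK_WB,
              STACK_WB, STACK_WB, STACK_WB, OOM, STACK_WB],
    "v4.16": [None, None, None, None, None, OOM, None, OOM, None, None],
    "v4.15": [None] * 10,  # assumption: 10 runs, log says "all runs: OK"
}


def is_bad_any_crash(runs):
    """Hypothetical predicate: bad if any run crashed, whatever the title."""
    return any(title is not None for title in runs)


def pick_range(releases, runs_by_release, is_bad):
    """Walk releases newest to oldest; return (oldest bad, first good)."""
    last_bad = None
    for release in releases:
        if is_bad(runs_by_release[release]):
            last_bad = release
        else:
            return last_bad, release
    return last_bad, None


bad, good = pick_range(["v4.17", "v4.16", "v4.15"], RUNS, is_bad_any_crash)
print("git bisect start", bad, good)
```

With this predicate the two OOM panics on v4.16 are enough to mark it
bad, which is how the range ends up as v4.16..v4.15.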

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20  9:59       ` Dmitry Vyukov
@ 2019-03-20 10:23         ` Tetsuo Handa
  -1 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2019-03-20 10:23 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Ryabinin
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On 2019/03/20 18:59, Dmitry Vyukov wrote:
>> From bisection log:
>>
>>         testing release v4.17
>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
>>         testing release v4.16
>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
>>         run #0: OK
>>         run #1: OK
>>         run #2: OK
>>         run #3: OK
>>         run #4: OK
>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
>>         run #6: OK
>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
>>         run #8: OK
>>         run #9: OK
>>         testing release v4.15
>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
>>         all runs: OK
>>         # git bisect start v4.16 v4.15
>>
>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> 
> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> looks like the right range, no?

No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
"Stack corruption" can't manifest as "Out of memory and no killable processes".

"kernel panic: Out of memory and no killable processes..." is completely
unrelated to "kernel panic: corrupted stack end in wb_workfn".

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:23         ` Tetsuo Handa
  (?)
@ 2019-03-20 10:38           ` Dmitry Vyukov
  -1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 10:38 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> >> From bisection log:
> >>
> >>         testing release v4.17
> >>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> >>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> >>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> >>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> >>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> >>         testing release v4.16
> >>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> >>         run #0: OK
> >>         run #1: OK
> >>         run #2: OK
> >>         run #3: OK
> >>         run #4: OK
> >>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> >>         run #6: OK
> >>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> >>         run #8: OK
> >>         run #9: OK
> >>         testing release v4.15
> >>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> >>         all runs: OK
> >>         # git bisect start v4.16 v4.15
> >>
> >> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> >
> > Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > looks like the right range, no?
>
> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> "Stack corruption" can't manifest as "Out of memory and no killable processes".
>
> "kernel panic: Out of memory and no killable processes..." is completely
> unrelated to "kernel panic: corrupted stack end in wb_workfn".


Do you think this predicate is possible to code? Looking at the
examples we have, distinguishing different bugs does not look feasible
to me. If the predicate is not accurate, you just trade one set of
false positives for another set of false positives, and then you are at
the beginning of an infinite slippery slope of refining it.
Also, if we see a different bug (assuming we can distinguish them),
does it mean that the original bug is not present? Or is it also
present, but we just hit the other one first? This also does not look
feasible to answer. And if you give a wrong answer, bisection goes the
wrong way and we are back where we started, just with more complex code
and things being even harder to explain to other people.
I mean, yes, I agree, kernel bug bisection won't be perfect. But do
you see anything actionable here?
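
To make the disagreement concrete, here is a rough sketch of a
title-aware predicate applied to the runs from the bisection log quoted
earlier in this thread. It is not syzbot code: the function name, the
substring match on "corrupted stack end" and the third "unknown" verdict
are all assumptions made for illustration.

```python
# Run outcomes copied from the bisection log quoted earlier in the thread.
# None means the run finished OK; a string is the crash title.
OOM = "kernel panic: Out of memory and no killable processes..."
STACK_WB = "kernel panic: corrupted stack end in wb_workfn"
STACK_WT = "kernel panic: corrupted stack end in worker_thread"
ODEBUG = "WARNING: ODEBUG bug in netdev_freemem"
NO_OUTPUT = "no output from test machine"

RUNS = {
    "v4.17": [STACK_WB, STACK_WT, OOM, STACK_WB, STACK_WB,
              STACK_WB, STACK_WB, STACK_WB, OOM, STACK_WB],
    "v4.16": [None, None, None, None, None, OOM, None, OOM, None, None],
    "v4.15": [None] * 10,  # assumption: 10 runs, log says "all runs: OK"
    "c14376de": [OOM, OOM, OOM, OOM, None, None,
                 ODEBUG, NO_OUTPUT, None, None],
}


def verdict(runs, wanted="corrupted stack end"):
    """Hypothetical title-aware predicate with three outcomes.

    "bad":     at least one crash title contains the wanted text
    "good":    no run crashed at all
    "unknown": runs crashed, but only with other titles, so the
               original bug may be absent or merely masked
    """
    titles = [title for title in runs if title is not None]
    if any(wanted in title for title in titles):
        return "bad"
    if titles:
        return "unknown"
    return "good"


for name, runs in RUNS.items():
    print(name, verdict(runs))
```

Under these assumptions v4.17 is bad and v4.15 is good, but both v4.16
and c14376de land in "unknown": the title match alone cannot say whether
the stack corruption is absent there or just hidden behind the OOM panic.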

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:38           ` Dmitry Vyukov
  (?)
@ 2019-03-20 10:42             ` Dmitry Vyukov
  -1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 10:42 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 11:38 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >
> > On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > >> From bisection log:
> > >>
> > >>         testing release v4.17
> > >>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > >>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > >>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> > >>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> > >>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>         testing release v4.16
> > >>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > >>         run #0: OK
> > >>         run #1: OK
> > >>         run #2: OK
> > >>         run #3: OK
> > >>         run #4: OK
> > >>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> > >>         run #6: OK
> > >>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> > >>         run #8: OK
> > >>         run #9: OK
> > >>         testing release v4.15
> > >>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > >>         all runs: OK
> > >>         # git bisect start v4.16 v4.15
> > >>
> > >> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > >
> > > Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > > looks like the right range, no?
> >
> > No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > "Stack corruption" can't manifest as "Out of memory and no killable processes".
> >
> > "kernel panic: Out of memory and no killable processes..." is completely
> > unrelated to "kernel panic: corrupted stack end in wb_workfn".
>
>
> Do you think this predicate is possible to code? Looking at the
> examples we have, distinguishing different bugs does not look feasible
> to me. If the predicate is not accurate, you just trade one set of
> false positives to another set of false positives and then you at the
> beginning of an infinite slippery slope refining it.
> Also, if we see a different bug (assuming we can distinguish them),
> does it mean that the original bug is not present? Or it's also
> present, but we just hit the other one first? This also does not look
> feasible to answer. And if you give a wrong answer, bisection goes the
> wrong way and we are where we started. Just with more complex code and
> things being even harder to explain to other people.
> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
> you see anything actionable here?

I see the larger long-term bisection quality improvement (for syzbot
and for everybody else) in doing some actual testing of each kernel
commit before it is merged into any kernel tree, so that we have
fewer of these cases: a single program triggering 3 different bugs,
stray unrelated bugs, broken release boots, etc. I don't see how
reliable bisection is possible without that.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:42             ` Dmitry Vyukov
@ 2019-03-20 10:58               ` Tetsuo Handa
  -1 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2019-03-20 10:58 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On 2019/03/20 19:42, Dmitry Vyukov wrote:
>> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
>> you see anything actionable here?

Allow users to manually specify the bisection range when
automatic bisection finds a wrong commit.

Also, allow users to specify the reproducer program
when automatic bisection finds a wrong commit.

Yes, this is anti-automation. But since automation can't become perfect,
I'm suggesting manual adjustment. Even if we involve manual adjustment,
syzbot's plentiful CPU resources for building/testing kernels are highly
appreciated (compared to doing manual bisection by building/testing kernels
on personal PC environments).

> 
> I see the larger long term bisection quality improvement (for syzbot
> and for everybody else) in doing some actual testing for each kernel
> commit before it's being merged into any kernel tree, so that we have
> less of these a single program triggers 3 different bugs, stray
> unrelated bugs, broken release boots, etc. I don't see how reliable
> bisection is possible without that.
> 

syzbot currently cannot test kernels with custom patches (except via "#syz test:" requests).
Are you saying that syzbot will become able to test kernels with custom patches?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:38           ` Dmitry Vyukov
@ 2019-03-20 13:34             ` Andrey Ryabinin
  -1 siblings, 0 replies; 42+ messages in thread
From: Andrey Ryabinin @ 2019-03-20 13:34 UTC (permalink / raw)
  To: Dmitry Vyukov, Tetsuo Handa
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long



On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> On 2019/03/20 18:59, Dmitry Vyukov wrote:
>>>> From bisection log:
>>>>
>>>>         testing release v4.17
>>>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>>>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
>>>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
>>>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
>>>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
>>>>         testing release v4.16
>>>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
>>>>         run #0: OK
>>>>         run #1: OK
>>>>         run #2: OK
>>>>         run #3: OK
>>>>         run #4: OK
>>>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
>>>>         run #6: OK
>>>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
>>>>         run #8: OK
>>>>         run #9: OK
>>>>         testing release v4.15
>>>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
>>>>         all runs: OK
>>>>         # git bisect start v4.16 v4.15
>>>>
>>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
>>>
>>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
>>> looks like the right range, no?
>>
>> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
>> "Stack corruption" can't manifest as "Out of memory and no killable processes".
>>
>> "kernel panic: Out of memory and no killable processes..." is completely
>> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> 
> 
> Do you think this predicate is possible to code?

Something like the sketch below would probably work better than the current behavior.

For starters, is_duplicates() might just compare the 'crash' title with the 'target_crash' title and the titles of its duplicates.
syzbot has some knowledge about duplicates with different crash titles from people using the "syz dup" command.
Also, it might be worth experimenting with neural networks to identify duplicates.


target_crash = 'kernel panic: corrupted stack end in wb_workfn'
test commit:
	bad = false;
	skip = true;
	foreach run:
		run_started, crashed, crash := run_repro();

		//kernel built, booted, reproducer launched successfully
		if (run_started)
			skip = false;
		if (crashed && is_duplicates(crash, target_crash))
			bad = true;
	
	if (skip)
		git bisect skip;
	else if (bad)
		git bisect bad;
	else
		git bisect good;
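
For reference, the pseudocode above can be written as a small runnable sketch. This is only an illustration under stated assumptions, not syzbot's actual code: RunResult, bisect_verdict() and same_title() are hypothetical names, and same_title() is the simplest possible stand-in for is_duplicates() (exact title comparison only, with no knowledge of "syz dup" data). The sample data is the v4.16 run list from the bisection log quoted earlier in the thread.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class RunResult:
    """Outcome of one reproducer run (hypothetical structure)."""
    started: bool                      # kernel built, booted, reproducer launched
    crash_title: Optional[str] = None  # None means the run ended without a crash


def bisect_verdict(runs: Iterable[RunResult],
                   target_title: str,
                   is_duplicate: Callable[[str, str], bool]) -> str:
    """Return 'skip', 'bad' or 'good' for one tested commit."""
    started_any = False
    bad = False
    for run in runs:
        if run.started:
            started_any = True
        if run.crash_title is not None and is_duplicate(run.crash_title, target_title):
            bad = True
    if not started_any:
        return "skip"  # no run got far enough to tell us anything
    return "bad" if bad else "good"


def same_title(crash: str, target: str) -> bool:
    """Simplest stand-in for is_duplicates(): exact title match."""
    return crash == target


TARGET = "kernel panic: corrupted stack end in wb_workfn"
OOM = "kernel panic: Out of memory and no killable processes..."

# Runs #0..#9 on v4.16, copied from the bisection log.
v4_16_runs = [
    RunResult(True), RunResult(True), RunResult(True), RunResult(True),
    RunResult(True), RunResult(True, OOM), RunResult(True),
    RunResult(True, OOM), RunResult(True), RunResult(True),
]

print(bisect_verdict(v4_16_runs, TARGET, same_title))
```

With exact title matching, the two OOM panics on v4.16 do not count as the target crash, so v4.16 is classified as good and the search would continue in 4.16..4.17. The quality of the result depends entirely on how is_duplicate is defined.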

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 13:34             ` Andrey Ryabinin
  (?)
@ 2019-03-20 13:57               ` Dmitry Vyukov
  -1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 13:57 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>
>
>
> On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> >>
> >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> >>>> From bisection log:
> >>>>
> >>>>         testing release v4.17
> >>>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> >>>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> >>>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> >>>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> >>>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> >>>>         testing release v4.16
> >>>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> >>>>         run #0: OK
> >>>>         run #1: OK
> >>>>         run #2: OK
> >>>>         run #3: OK
> >>>>         run #4: OK
> >>>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> >>>>         run #6: OK
> >>>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> >>>>         run #8: OK
> >>>>         run #9: OK
> >>>>         testing release v4.15
> >>>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> >>>>         all runs: OK
> >>>>         # git bisect start v4.16 v4.15
> >>>>
> >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> >>>
> >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> >>> looks like the right range, no?
> >>
> >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> >>
> >> "kernel panic: Out of memory and no killable processes..." is completely
> >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> >
> >
> > Do you think this predicate is possible to code?
>
> Something like bellow probably would work better than current behavior.
>
> For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.

Lots of bugs (half?) manifest differently. On top of this, titles
change as we go back in history. On top of this, if we see a different
bug, it does not mean that the original bug is also not there.
This will surely solve some subset of cases better than the current
logic. But I feel that that subset is smaller than what the current
logic solves.
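
To make the point about differing titles concrete, here is a small sketch using the v4.17 runs from the bisection log. The bug_class() helper is a hypothetical normalization (it strips a trailing " in <function>"); it is not something syzbot is known to do, and it is shown only to illustrate how the answer shifts with the comparison rule.

```python
import re


def bug_class(title: str) -> str:
    """Drop a trailing ' in <function>' so that only the bug class remains."""
    return re.sub(r" in [A-Za-z_][A-Za-z0-9_.]*$", "", title)


TARGET = "kernel panic: corrupted stack end in wb_workfn"
WORKER = "kernel panic: corrupted stack end in worker_thread"
OOM = "kernel panic: Out of memory and no killable processes..."

# Crash titles of runs #0..#9 on v4.17, copied from the bisection log.
v4_17_titles = [TARGET, WORKER, OOM, TARGET, TARGET,
                TARGET, TARGET, TARGET, OOM, TARGET]

# Exact comparison misses the worker_thread variant of the same panic.
exact_matches = sum(t == TARGET for t in v4_17_titles)

# Comparing only the bug class catches it, but still treats OOM as unrelated.
class_matches = sum(bug_class(t) == bug_class(TARGET) for t in v4_17_titles)

print(exact_matches, class_matches)
```

Neither rule can say whether the OOM runs also contained the stack corruption and simply hit the other panic first, which is the part that title comparison cannot answer.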

> syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.

This is a very limited set of info. And in the end I think we've seen
all bug types being duped to all other bug types pair-wise, and at
the same time we've seen all bug types being not dups of all other bug
types. So I don't see where this gets us.
And again, as we go back in history all these titles change.

> Also it might be worth to experiment with using neural networks to identify duplicates.
>
>
> target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> test commit:
>         bad = false;
>         skip = true;
>         foreach run:
>                 run_started, crashed, crash := run_repro();
>
>                 //kernel built, booted, reproducer launched successfully
>                 if (run_started)
>                         skip = false;
>                 if (crashed && is_duplicates(crash, target_crash))
>                         bad = true;
>
>         if (skip)
>                 git bisect skip;
>         else if (bad)
>                 git bisect bad;
>         else
>                 git bisect good;

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 10:58               ` Tetsuo Handa
  (?)
@ 2019-03-20 13:59                 ` Dmitry Vyukov
  -1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-20 13:59 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrey Ryabinin, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 11:59 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2019/03/20 19:42, Dmitry Vyukov wrote:
> >> I mean, yes, I agree, kernel bug bisection won't be perfect. But do
> >> you see anything actionable here?
>
> Allow users to manually tell bisection range when
> automatic bisection found a wrong commit.
>
> Also, allow users to specify reproducer program
> when automatic bisection found a wrong commit.
>
> Yes, this is anti automation. But since automation can't become perfect,
> I'm suggesting manual adjustment. Even if we involve manual adjustment,
> the syzbot's plenty CPU resources for building/testing kernels is highly
> appreciated (compared to doing manual bisection by building/testing kernels
> on personal PC environments).

FTR: provided an extended answer here:
https://groups.google.com/d/msg/syzkaller-bugs/1BSkmb_fawo/DOcDxv_KAgAJ


> > I see the larger long term bisection quality improvement (for syzbot
> > and for everybody else) in doing some actual testing for each kernel
> > commit before it's being merged into any kernel tree, so that we have
> > less of these a single program triggers 3 different bugs, stray
> > unrelated bugs, broken release boots, etc. I don't see how reliable
> > bisection is possible without that.
> >
>
> syzbot currently cannot test kernels with custom patches (unless "#syz test:" requests).
> Are you saying that syzbot will become be able to test kernels with custom patches?

I mean if we start improving kernel quality over time so that we have
fewer cases where a single program triggers 3 different bugs, fewer stray
unrelated bugs, fewer broken release boots, etc., it will improve bisection
quality for everybody (besides being hugely useful in itself).

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-20 13:57               ` Dmitry Vyukov
  (?)
@ 2019-03-21  9:45                 ` Dmitry Vyukov
  -1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-21  9:45 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Wed, Mar 20, 2019 at 2:57 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> >
> >
> >
> > On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > >>
> > >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > >>>> From bisection log:
> > >>>>
> > >>>>         testing release v4.17
> > >>>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > >>>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > >>>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> > >>>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> > >>>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > >>>>         testing release v4.16
> > >>>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > >>>>         run #0: OK
> > >>>>         run #1: OK
> > >>>>         run #2: OK
> > >>>>         run #3: OK
> > >>>>         run #4: OK
> > >>>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> > >>>>         run #6: OK
> > >>>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> > >>>>         run #8: OK
> > >>>>         run #9: OK
> > >>>>         testing release v4.15
> > >>>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > >>>>         all runs: OK
> > >>>>         # git bisect start v4.16 v4.15
> > >>>>
> > >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > >>>
> > >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > >>> looks like the right range, no?
> > >>
> > >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> > >>
> > >> "kernel panic: Out of memory and no killable processes..." is completely
> > >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> > >
> > >
> > > Do you think this predicate is possible to code?
> >
> > Something like below probably would work better than the current behavior.
> >
> > For starters, is_duplicates() might just compare the 'crash' title with the 'target_crash' title and its duplicates' titles.
>
> Lots of bugs (half?) manifest differently. On top of this, titles
> change as we go back in history. On top of this, if we see a different
> bug, it does not mean that the original bug is also not there.
> This will surely solve some subset of cases better than the current
> logic. But I feel that that subset is smaller than what the current
> logic solves.

Counter-examples come up in basically every other bisection.
For example:

bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.19
testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
testing release v4.18
testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
testing release v4.17
testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test

That's a different crash title, unless somebody explicitly codes this case.
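
To make the point concrete, here is a minimal Python sketch of exact
title comparison applied to the titles from the log above. The helper
same_title() is hypothetical and is not syzkaller code; it only shows
what a naive predicate would conclude for each release.

```python
# Hypothetical helper, not syzkaller code: naive exact title comparison.
def same_title(crash: str, target: str) -> bool:
    return crash == target

# Title reported on the starting commit in the log above.
target = "KASAN: null-ptr-deref Read in refcount_sub_and_test_checked"

# Titles observed on each release in the same log.
observed = {
    "v4.19": "KASAN: null-ptr-deref Read in refcount_sub_and_test_checked",
    "v4.18": "KASAN: null-ptr-deref Read in refcount_sub_and_test",
    "v4.17": "KASAN: null-ptr-deref Read in refcount_sub_and_test",
}

verdicts = {rel: same_title(title, target) for rel, title in observed.items()}
for rel, match in verdicts.items():
    print(rel, "bad" if match else "good (title differs)")
```

With exact matching, v4.18 and v4.17 would be treated as good, and the
bisection would start in the v4.18..v4.19 range even though every run
still crashed on the older releases.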

Or, what crash is this?

testing commit 52358cb5a310990ea5069f986bdab3620e01181f with gcc (GCC) 8.1.0
run #1: crashed: general protection fault in cpuacct_charge
run #2: crashed: WARNING: suspicious RCU usage in corrupted
run #3: crashed: general protection fault in cpuacct_charge
run #4: crashed: BUG: unable to handle kernel paging request in ipt_do_table
run #5: crashed: KASAN: stack-out-of-bounds Read in cpuacct_charge
run #6: crashed: WARNING: suspicious RCU usage
run #7: crashed: no output from test machine
run #8: crashed: no output from test machine


Or, a kernel that hits "INFO: trying to register non-static key in
can_notifier" never gets to do any testing, so is "WARNING in
dma_buf_vunmap" still there or not?

testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: WARNING in dma_buf_vunmap
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: OK
# git bisect start v4.12 v4.11
Bisecting: 7831 revisions left to test after this (roughly 13 steps)
[2bd80401743568ced7d303b008ae5298ce77e695] Merge tag 'gpio-v4.12-1' of
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
testing commit 2bd80401743568ced7d303b008ae5298ce77e695 with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 2bd80401743568ced7d303b008ae5298ce77e695
Bisecting: 3853 revisions left to test after this (roughly 12 steps)
[8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
testing commit 8d65b08debc7e62b2c6032d7fe7389d895b92cbc with gcc (GCC) 7.3.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
# git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
Bisecting: 2022 revisions left to test after this (roughly 11 steps)
[cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag
'mac80211-next-for-davem-2017-04-28' of
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
testing commit cec381919818a9a0cb85600b3c82404bdd38cf36 with gcc (GCC) 5.5.0
all runs: crashed: INFO: trying to register non-static key in can_notifier
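
As a side note on reading this log: the "roughly N steps" figures are
consistent with a plain binary-search estimate, ceil(log2(revisions
left)). The Python sketch below only checks that arithmetic against
the three lines above; git's own estimator may differ in edge cases,
so treat the formula as an approximation, not as git's definition.

```python
import math

# (revisions left, steps printed by git), copied from the log above.
log_lines = [(7831, 13), (3853, 12), (2022, 11)]

# Binary search halves the candidate set at each step.
estimates = [math.ceil(math.log2(revisions)) for revisions, _ in log_lines]

for (revisions, printed), estimate in zip(log_lines, estimates):
    print(f"{revisions} revisions: estimate {estimate}, git printed {printed}")
```

So each "bad" verdict caused by the unrelated can_notifier report
discards about half of the remaining range, whether or not the original
bug is present there.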






> > syzbot has some knowledge about duplicates with different crash titles when people use the "syz dup" command.
>
> This is a very limited set of info. And in the end I think we've seen
> all bug types being duped to all other bug types pairwise, and at
> the same time we've seen all bug types not being dups of all other bug
> types. So I don't see where this gets us.
> And again, as we go back in history all these titles change.
>
> > Also it might be worth experimenting with using neural networks to identify duplicates.
> >
> >
> > target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> > test commit:
> >         bad = false;
> >         skip = true;
> >         foreach run:
> >                 run_started, crashed, crash := run_repro();
> >
> >                 //kernel built, booted, reproducer launched successfully
> >                 if (run_started)
> >                         skip = false;
> >                 if (crashed && is_duplicates(crash, target_crash))
> >                         bad = true;
> >
> >         if (skip)
> >                 git bisect skip;
> >         else if (bad)
> >                 git bisect bad;
> >         else
> >                 git bisect good;
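
For reference, the quoted pseudocode can be written as a runnable
Python sketch and applied to the v4.16 and v4.17 runs from the
bisection log quoted earlier. This is an illustration, not syzkaller
code: verdict() and is_duplicate() are hypothetical names, and
is_duplicate() is assumed to be the simplest possible predicate, an
exact title match.

```python
from typing import List, Optional, Tuple

TARGET = "kernel panic: corrupted stack end in wb_workfn"
OOM = "kernel panic: Out of memory and no killable processes..."

def is_duplicate(crash: str, target: str) -> bool:
    # Assumption: exact title match stands in for a real dup predicate.
    return crash == target

def verdict(runs: List[Tuple[bool, Optional[str]]], target: str = TARGET) -> str:
    """Each run is (run_started, crash_title or None if no crash)."""
    bad = False
    skip = True
    for run_started, crash in runs:
        # Kernel built, booted, and the reproducer launched successfully.
        if run_started:
            skip = False
        if crash is not None and is_duplicate(crash, target):
            bad = True
    if skip:
        return "skip"
    return "bad" if bad else "good"

# v4.16: eight OK runs and two OOM panics.
v416 = [(True, None)] * 8 + [(True, OOM)] * 2
# v4.17: seven wb_workfn panics, one worker_thread panic, two OOM panics.
v417 = ([(True, TARGET)] * 7
        + [(True, "kernel panic: corrupted stack end in worker_thread")]
        + [(True, OOM)] * 2)

print("v4.16:", verdict(v416))
print("v4.17:", verdict(v417))
print("nothing started:", verdict([(False, None)] * 10))
```

Under this predicate v4.16 comes out as good, which gives the
v4.16..v4.17 range Tetsuo asked for. The same exact-match rule is what
breaks on the refcount_sub_and_test example above, where the title
changes between releases.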

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-21  9:45                 ` Dmitry Vyukov
  (?)
@ 2019-03-21  9:51                   ` Dmitry Vyukov
  -1 siblings, 0 replies; 42+ messages in thread
From: Dmitry Vyukov @ 2019-03-21  9:51 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Tetsuo Handa, syzbot, Andrew Morton, Qian Cai, David Miller,
	guro, Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On Thu, Mar 21, 2019 at 10:45 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Mar 20, 2019 at 2:57 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Wed, Mar 20, 2019 at 2:33 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> > >
> > >
> > >
> > > On 3/20/19 1:38 PM, Dmitry Vyukov wrote:
> > > > On Wed, Mar 20, 2019 at 11:24 AM Tetsuo Handa
> > > > <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > > >>
> > > >> On 2019/03/20 18:59, Dmitry Vyukov wrote:
> > > >>>> From bisection log:
> > > >>>>
> > > >>>>         testing release v4.17
> > > >>>>         testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> > > >>>>         run #0: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #1: crashed: kernel panic: corrupted stack end in worker_thread
> > > >>>>         run #2: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>>         run #3: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #4: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #5: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #6: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #7: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         run #8: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>>         run #9: crashed: kernel panic: corrupted stack end in wb_workfn
> > > >>>>         testing release v4.16
> > > >>>>         testing commit 0adb32858b0bddf4ada5f364a84ed60b196dbcda with gcc (GCC) 8.1.0
> > > >>>>         run #0: OK
> > > >>>>         run #1: OK
> > > >>>>         run #2: OK
> > > >>>>         run #3: OK
> > > >>>>         run #4: OK
> > > >>>>         run #5: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>>         run #6: OK
> > > >>>>         run #7: crashed: kernel panic: Out of memory and no killable processes...
> > > >>>>         run #8: OK
> > > >>>>         run #9: OK
> > > >>>>         testing release v4.15
> > > >>>>         testing commit d8a5b80568a9cb66810e75b182018e9edb68e8ff with gcc (GCC) 8.1.0
> > > >>>>         all runs: OK
> > > >>>>         # git bisect start v4.16 v4.15
> > > >>>>
> > > >>>> Why bisect started between 4.16 4.15 instead of 4.17 4.16?
> > > >>>
> > > >>> Because 4.16 was still crashing and 4.15 was not crashing. 4.15..4.16
> > > >>> looks like the right range, no?
> > > >>
> > > >> No, syzbot should bisect between 4.16 and 4.17 regarding this bug, for
> > > >> "Stack corruption" can't manifest as "Out of memory and no killable processes".
> > > >>
> > > >> "kernel panic: Out of memory and no killable processes..." is completely
> > > >> unrelated to "kernel panic: corrupted stack end in wb_workfn".
> > > >
> > > >
> > > > Do you think this predicate is possible to code?
> > >
> > > Something like below would probably work better than the current behavior.
> > >
> > > For starters, is_duplicates() might just compare 'crash' title with 'target_crash' title and its duplicates titles.
> >
> > Lots of bugs (half?) manifest differently. On top of this, titles
> > change as we go back in history. On top of this, if we see a different
> > bug, it does not mean that the original bug is also not there.
> > This will surely solve some subset of cases better than the current
> > logic. But I feel that that subset is smaller than what the current
> > logic solves.
>
> Counter-examples come up in basically every other bisection.
> For example:
>
> bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
> building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
> testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
> testing release v4.19
> testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
> testing release v4.18
> testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
> testing release v4.17
> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test


And to make things even more interesting, this later changes to "BUG:
unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":

testing release v4.12
testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
all runs: crashed: general protection fault in refcount_sub_and_test
testing release v4.11
testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
all runs: crashed: BUG: unable to handle kernel NULL pointer
dereference in vb2_vmalloc_put

And since the original bug is in the vb2 subsystem
(https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
it's not clear even to me whether we should treat it as the same
bug or not. It may be a different manifestation of the same root cause,
or a different bug nearby.
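
For illustration only (this is not syzbot code), the title drift in the
log above can be checked with a small Python sketch. The dictionary copies
the crash titles from the log; same_title() is a hypothetical stand-in for
a title-only is_duplicates() predicate.

```python
# Crash titles per release, copied from the bisection log quoted above.
TITLES_BY_RELEASE = {
    "v4.19": "KASAN: null-ptr-deref Read in refcount_sub_and_test_checked",
    "v4.18": "KASAN: null-ptr-deref Read in refcount_sub_and_test",
    "v4.17": "KASAN: null-ptr-deref Read in refcount_sub_and_test",
    "v4.12": "general protection fault in refcount_sub_and_test",
    "v4.11": "BUG: unable to handle kernel NULL pointer dereference in vb2_vmalloc_put",
}


def same_title(crash, target):
    """Hypothetical title-only predicate: exact string equality."""
    return crash == target


def releases_matching(target):
    """Return the releases whose crash title equals the target title."""
    return [release for release, title in TITLES_BY_RELEASE.items()
            if same_title(title, target)]


if __name__ == "__main__":
    # Use the v4.19 title as the target crash.
    print(releases_matching(TITLES_BY_RELEASE["v4.19"]))
```

With the v4.19 title as the target, only v4.19 itself matches, although
every release in the log crashed.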





> That's a different crash title, unless somebody explicitly code this case.
>
> Or, what crash is this?
>
> testing commit 52358cb5a310990ea5069f986bdab3620e01181f with gcc (GCC) 8.1.0
> run #1: crashed: general protection fault in cpuacct_charge
> run #2: crashed: WARNING: suspicious RCU usage in corrupted
> run #3: crashed: general protection fault in cpuacct_charge
> run #4: crashed: BUG: unable to handle kernel paging request in ipt_do_table
> run #5: crashed: KASAN: stack-out-of-bounds Read in cpuacct_charge
> run #6: crashed: WARNING: suspicious RCU usage
> run #7: crashed: no output from test machine
> run #8: crashed: no output from test machine
>
>
> Or, that "INFO: trying to register non-static key in can_notifier"
> does not do any testing, but is "WARNING in dma_buf_vunmap" still
> there or not?
>
> testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
> all runs: crashed: WARNING in dma_buf_vunmap
> testing release v4.11
> testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
> all runs: OK
> # git bisect start v4.12 v4.11
> Bisecting: 7831 revisions left to test after this (roughly 13 steps)
> [2bd80401743568ced7d303b008ae5298ce77e695] Merge tag 'gpio-v4.12-1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
> testing commit 2bd80401743568ced7d303b008ae5298ce77e695 with gcc (GCC) 7.3.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
> # git bisect bad 2bd80401743568ced7d303b008ae5298ce77e695
> Bisecting: 3853 revisions left to test after this (roughly 12 steps)
> [8d65b08debc7e62b2c6032d7fe7389d895b92cbc] Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
> testing commit 8d65b08debc7e62b2c6032d7fe7389d895b92cbc with gcc (GCC) 7.3.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
> # git bisect bad 8d65b08debc7e62b2c6032d7fe7389d895b92cbc
> Bisecting: 2022 revisions left to test after this (roughly 11 steps)
> [cec381919818a9a0cb85600b3c82404bdd38cf36] Merge tag
> 'mac80211-next-for-davem-2017-04-28' of
> git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
> testing commit cec381919818a9a0cb85600b3c82404bdd38cf36 with gcc (GCC) 5.5.0
> all runs: crashed: INFO: trying to register non-static key in can_notifier
>
>
>
>
>
>
> > > syzbot has some knowledge about duplicates with different crash titles when people use "syz dup" command.
> >
> > This is very limited set of info. And in the end I think we've seen
> > all bug types being duped on all other bugs types pair-wise, and at
> > the same time we've seen all bug types being not dups to all other bug
> > types. So I don't see where this gets us.
> > And again as we go back in history all these titles change.
> >
> > > Also it might be worth to experiment with using neural networks to identify duplicates.
> > >
> > >
> > > target_crash = 'kernel panic: corrupted stack end in wb_workfn'
> > > test commit:
> > >         bad = false;
> > >         skip = true;
> > >         foreach run:
> > >                 run_started, crashed, crash := run_repro();
> > >
> > >                 //kernel built, booted, reproducer launched successfully
> > >                 if (run_started)
> > >                         skip = false;
> > >                 if (crashed && is_duplicates(crash, target_crash))
> > >                         bad = true;
> > >
> > >         if (skip)
> > >                 git bisect skip;
> > >         else if (bad)
> > >                 git bisect bad;
> > >         else
> > >                 git bisect good;
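
A minimal Python sketch of the per-commit verdict logic in the pseudocode
above, assuming each run reports whether it started and an optional crash
title. RunResult, bisect_verdict() and exact_title() are hypothetical names
for illustration, not syzkaller APIs; the run data is taken from the v4.16
and v4.17 entries of the bisection log.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class RunResult:
    started: bool                      # kernel built, booted, reproducer launched
    crash_title: Optional[str] = None  # None means the run did not crash


def bisect_verdict(runs: Iterable[RunResult], target: str,
                   is_duplicate: Callable[[str, str], bool]) -> str:
    """Return "skip", "bad" or "good" for one tested commit."""
    bad = False
    skip = True
    for run in runs:
        if run.started:
            skip = False
        if run.crash_title is not None and is_duplicate(run.crash_title, target):
            bad = True
    if skip:
        return "skip"
    return "bad" if bad else "good"


def exact_title(crash: str, target: str) -> bool:
    """Simplest possible predicate (an assumption): exact title match."""
    return crash == target


TARGET = "kernel panic: corrupted stack end in wb_workfn"
OOM = "kernel panic: Out of memory and no killable processes..."
WORKER = "kernel panic: corrupted stack end in worker_thread"

# v4.16 in the log: eight OK runs and two OOM panics.
V416_RUNS = [RunResult(True)] * 8 + [RunResult(True, OOM)] * 2
# v4.17 in the log: seven wb_workfn panics, one worker_thread, two OOM.
V417_RUNS = ([RunResult(True, TARGET)] * 7 + [RunResult(True, WORKER)]
             + [RunResult(True, OOM)] * 2)

if __name__ == "__main__":
    print("v4.16:", bisect_verdict(V416_RUNS, TARGET, exact_title))
    print("v4.17:", bisect_verdict(V417_RUNS, TARGET, exact_title))
```

Under this predicate v4.16 is marked good and v4.17 bad, so bisection
would start in the 4.16..4.17 range. Whether an exact-title predicate
gives the right answer in general is the open question in this thread.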

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
  2019-03-21  9:51                   ` Dmitry Vyukov
@ 2019-03-21 11:41                     ` Tetsuo Handa
  -1 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2019-03-21 11:41 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Ryabinin
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On 2019/03/21 18:51, Dmitry Vyukov wrote:
>>> Lots of bugs (half?) manifest differently. On top of this, titles
>>> change as we go back in history. On top of this, if we see a different
>>> bug, it does not mean that the original bug is also not there.
>>> This will surely solve some subset of cases better than the current
>>> logic. But I feel that that subset is smaller than what the current
>>> logic solves.
>>
>> Counter-examples come up in basically every other bisection.
>> For example:
>>
>> bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
>> building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
>> testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.19
>> testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.18
>> testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
>> testing release v4.17
>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
> 
> 
> And to make things even more interesting, this later changes to "BUG:
> unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":
> 
> testing release v4.12
> testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
> all runs: crashed: general protection fault in refcount_sub_and_test
> testing release v4.11
> testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
> all runs: crashed: BUG: unable to handle kernel NULL pointer
> dereference in vb2_vmalloc_put
> 
> And since the original bug is in vb2 subsystem
> (https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
> it's actually not clear even for me, if we should treat it as the same
> bug or not. May be different manifestation of the same root cause, or
> a different bug around.
> 

Well, maybe we should use reproducers to check whether each not-yet-fixed
problem is reproducible on old kernels, rather than to find the specific
commit that causes a specific problem?

I think there are two patterns in which syzbot starts reporting a problem.

  (a) a commit which causes one or more problems is merged into a codebase that
      syzbot was already testing, because syzbot already knew what to test in
      that codebase and how.

  (b) a commit which causes one or more problems was already present in a codebase
      that syzbot did not know how to test until now.

(a) tends to require testing new kernels (i.e. bisection range is narrow) whereas
(b) tends to require testing old kernels (i.e. bisection range is wide).

Regarding case (b), it is difficult for developers to guess when the problem
started, and I think that (b) tends to confuse automatic bisection attempts.

Therefore, instead of trying to find the specific commit for a specific problem
using the "git bisect" approach, try running all reproducers (gathered from all
problems) on each release (e.g. each git tag) and append the reproduced crashes to the

  Manager Time Kernel Commit Syzkaller Config Log Report Syz repro C repro Maintainers

table for each not-yet-fixed problem in the dashboard interface. That is, if running
repro1 from problem1 on some old kernel reproduces a crash for problem2, append the
crash to problem2's table. Maybe we want to use a new table with only

  Kernel Commit Syzkaller Config Log Report Syz repro C repro

columns, because what we want to know is the oldest kernel release, which helps
in guessing when the problem started.
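
A rough Python sketch of this idea, assuming a run_repro(release, repro)
callback that returns the crash title or None. The function name, the
callback and the FAKE_RESULTS table are all made up for illustration;
nothing here is an existing syzbot interface or a real test result.

```python
def oldest_release_per_title(releases, reproducers, run_repro):
    """Map each reproduced crash title to the oldest release showing it.

    `releases` must be ordered from oldest to newest. A crash is filed
    under its own title, whichever problem the reproducer came from.
    """
    oldest = {}
    for release in releases:
        for repro in reproducers:
            title = run_repro(release, repro)
            if title is not None:
                # Keep the first (oldest) release seen for this title.
                oldest.setdefault(title, release)
    return oldest


# Made-up results: (release, reproducer) -> crash title.
FAKE_RESULTS = {
    ("v4.16", "repro1"): "crash B",
    ("v4.17", "repro1"): "crash A",
    ("v4.17", "repro2"): "crash B",
}


def fake_run_repro(release, repro):
    """Stand-in for building, booting and running one reproducer."""
    return FAKE_RESULTS.get((release, repro))


if __name__ == "__main__":
    table = oldest_release_per_title(
        ["v4.15", "v4.16", "v4.17"], ["repro1", "repro2"], fake_run_repro)
    for title, release in table.items():
        print(title, "first seen on", release)
```

Here repro1 reproduces "crash B" on v4.16, so v4.16 is recorded for
"crash B" even though repro1 was not gathered from that problem.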

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: kernel panic: corrupted stack end in wb_workfn
@ 2019-03-21 11:41                     ` Tetsuo Handa
  0 siblings, 0 replies; 42+ messages in thread
From: Tetsuo Handa @ 2019-03-21 11:41 UTC (permalink / raw)
  To: Dmitry Vyukov, Andrey Ryabinin
  Cc: syzbot, Andrew Morton, Qian Cai, David Miller, guro,
	Johannes Weiner, Josef Bacik, Kirill Tkhai, LKML, Linux-MM,
	linux-sctp, Mel Gorman, Michal Hocko, netdev, Neil Horman,
	Shakeel Butt, syzkaller-bugs, Al Viro, Vladislav Yasevich,
	Matthew Wilcox, Xin Long

On 2019/03/21 18:51, Dmitry Vyukov wrote:
>>> Lots of bugs (half?) manifest differently. On top of this, titles
>>> change as we go back in history. On top of this, if we see a different
>>> bug, it does not mean that the original bug is also not there.
>>> This will surely solve some subset of cases better than the current
>>> logic. But I feel that that subset is smaller than what the current
>>> logic solves.
>>
>> Counter-examples come up in basically every other bisection.
>> For example:
>>
>> bisecting cause commit starting from ccda4af0f4b92f7b4c308d3acc262f4a7e3affad
>> building syzkaller on 5f5f6d14e80b8bd6b42db961118e902387716bcb
>> testing commit ccda4af0f4b92f7b4c308d3acc262f4a7e3affad with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.19
>> testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test_checked
>> testing release v4.18
>> testing commit 94710cac0ef4ee177a63b5227664b38c95bbf703 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
>> testing release v4.17
>> testing commit 29dcea88779c856c7dc92040a0c01233263101d4 with gcc (GCC) 8.1.0
>> all runs: crashed: KASAN: null-ptr-deref Read in refcount_sub_and_test
> 
> 
> And to make things even more interesting, this later changes to "BUG:
> unable to handle kernel NULL pointer dereference in vb2_vmalloc_put":
> 
> testing release v4.12
> testing commit 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c with gcc (GCC) 8.1.0
> all runs: crashed: general protection fault in refcount_sub_and_test
> testing release v4.11
> testing commit a351e9b9fc24e982ec2f0e76379a49826036da12 with gcc (GCC) 7.3.0
> all runs: crashed: BUG: unable to handle kernel NULL pointer
> dereference in vb2_vmalloc_put
> 
> And since the original bug is in vb2 subsystem
> (https://syzkaller.appspot.com/bug?id=17535f4bf5b322437f7c639b59161ce343fc55a9),
> it's actually not clear, even to me, whether we should treat it as the
> same bug or not. It may be a different manifestation of the same root
> cause, or a different bug nearby.
> 

Well, maybe we should use reproducers to check whether each not-yet-fixed
problem is reproducible on old kernels, rather than trying to find the specific
commit that causes a specific problem?

I think there are two patterns in which syzbot starts reporting a problem.

  (a) a commit which causes one or more problems is merged into a codebase that
      syzbot was already testing, because syzbot already knew what to test in
      that codebase and how.

  (b) a commit which causes one or more problems was already present in a codebase
      for which syzbot did not know, until now, what to test or how.

(a) tends to require testing new kernels (i.e. bisection range is narrow) whereas
(b) tends to require testing old kernels (i.e. bisection range is wide).

Regarding case (b), it is difficult for developers to guess when the problem
started, and I think that (b) tends to confuse automatic bisection attempts.

Therefore, instead of trying to find the specific commit for a specific problem
with a "git bisect" approach, try running all reproducers (gathered from all
problems) on each release (e.g. each git tag), and append the reproduced crashes
to the

  Manager Time Kernel Commit Syzkaller Config Log Report Syz repro C repro Maintainers

table of each not-yet-fixed problem in the dashboard interface. That is, if running
repro1 from problem1 on some old kernel reproduces a crash for problem2, append the
crash to problem2's table. Maybe we want to use a new table with only

  Kernel Commit Syzkaller Config Log Report Syz repro C repro

entries, because what we want to know is the oldest kernel release on which the
problem reproduces, which helps in guessing when the problem started.
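
As a rough sketch of the bookkeeping described above (not anything syzbot
actually implements): the release list, the (repro, release, crash title)
tuple layout, and the function names below are all assumptions made up for
illustration. Each reproduced crash is filed under the problem it matched,
which may differ from the problem the reproducer came from, and the oldest
release is then read off each problem's table.

```python
from collections import defaultdict

# Hypothetical release order, oldest first. A real run would enumerate git tags.
RELEASES = ["v4.11", "v4.12", "v4.17", "v4.18", "v4.19"]


def build_tables(runs):
    """Group reproduced crashes by the problem they matched.

    `runs` is an iterable of (repro, release, problem) tuples, where
    `problem` is None if the reproducer did not crash that kernel.
    """
    tables = defaultdict(list)
    for repro, release, problem in runs:
        if problem is not None:
            # File the crash under the problem it reproduced, even if the
            # reproducer was originally gathered from another problem.
            tables[problem].append((release, repro))
    return tables


def oldest_release(table):
    """Return the oldest release on which a problem was reproduced."""
    return min((release for release, _ in table), key=RELEASES.index)


# Made-up results: repro1 (from problem1) hits problem2 on an old kernel.
runs = [
    ("repro1", "v4.19", "problem1"),
    ("repro1", "v4.12", "problem2"),
    ("repro2", "v4.17", "problem2"),
    ("repro2", "v4.11", None),
]
tables = build_tables(runs)
```

With these made-up results, problem2's table gains an entry from repro1, and
oldest_release(tables["problem2"]) gives the release from which guessing when
the problem started could begin.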

