* KASAN: stack-out-of-bounds Read in csd_lock_record
@ 2020-07-03 23:31 syzbot
2020-07-04 0:48 ` syzbot
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: syzbot @ 2020-07-03 23:31 UTC (permalink / raw)
To: bigeasy, linux-kernel, mingo, paulmck, peterz, syzkaller-bugs, tglx
Hello,
syzbot found the following crash on:
HEAD commit: 9e50b94b Add linux-next specific files for 20200703
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
==================================================================
BUG: KASAN: stack-out-of-bounds in csd_lock_record+0xcb/0xe0 kernel/smp.c:118
Read of size 8 at addr ffffc90001727710 by task syz-executor.0/10721
CPU: 1 PID: 10721 Comm: syz-executor.0 Not tainted 5.8.0-rc3-next-20200703-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0x5/0x436 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
csd_lock_record+0xcb/0xe0 kernel/smp.c:118
flush_smp_call_function_queue+0x285/0x730 kernel/smp.c:391
__sysvec_call_function_single+0x98/0x490 arch/x86/kernel/smp.c:248
asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
sysvec_call_function_single+0xe0/0x120 arch/x86/kernel/smp.c:243
asm_sysvec_call_function_single+0x12/0x20 arch/x86/include/asm/idtentry.h:604
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:765 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191
Code: 48 c7 c0 00 ff b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 37 48 83 3d 9b 74 c8 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 00 00 e8 95 fb 62 f9 65 8b 05 fe 73 15 78
RSP: 0018:ffffc900016e7558 EFLAGS: 00000282
RAX: 1ffffffff1369fe0 RBX: 0000000000000282 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000282
RBP: ffffffff8cb02508 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 1ffffffff19604a0
R13: 0000000000000000 R14: dead000000000100 R15: dffffc0000000000
__debug_check_no_obj_freed lib/debugobjects.c:977 [inline]
debug_check_no_obj_freed+0x20c/0x41c lib/debugobjects.c:998
free_pages_prepare mm/page_alloc.c:1219 [inline]
__free_pages_ok+0x20b/0xc90 mm/page_alloc.c:1471
release_pages+0x5ec/0x17a0 mm/swap.c:880
tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
tlb_flush_mmu+0xe9/0x6b0 mm/mmu_gather.c:249
zap_pte_range mm/memory.c:1155 [inline]
zap_pmd_range mm/memory.c:1193 [inline]
zap_pud_range mm/memory.c:1222 [inline]
zap_p4d_range mm/memory.c:1243 [inline]
unmap_page_range+0x1e22/0x2b20 mm/memory.c:1264
unmap_single_vma+0x198/0x300 mm/memory.c:1309
unmap_vmas+0x16f/0x2f0 mm/memory.c:1341
exit_mmap+0x2b1/0x530 mm/mmap.c:3165
__mmput+0x122/0x470 kernel/fork.c:1075
mmput+0x53/0x60 kernel/fork.c:1096
exit_mm kernel/exit.c:483 [inline]
do_exit+0xa8f/0x2a40 kernel/exit.c:793
do_group_exit+0x125/0x310 kernel/exit.c:904
get_signal+0x40b/0x1ee0 kernel/signal.c:2743
do_signal+0x82/0x2520 arch/x86/kernel/signal.c:810
exit_to_usermode_loop arch/x86/entry/common.c:218 [inline]
__prepare_exit_to_usermode+0x156/0x1f0 arch/x86/entry/common.c:252
do_syscall_64+0x6c/0xe0 arch/x86/entry/common.c:376
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007fb154b96cf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 000000000078bf08 RCX: 000000000045cb29
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 000000000078bf0c
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078bf0c
R13: 00007ffd3933f26f R14: 00007fb154b979c0 R15: 000000000078bf0c
Memory state around the buggy address:
 ffffc90001727600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc90001727680: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
>ffffc90001727700: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
                         ^
 ffffc90001727780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc90001727800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply [flat|nested] 9+ messages in thread
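The `^` marker in the memory-state dump of the report above points at the shadow byte that covers the faulting address, and the same byte can be found by arithmetic. The following is a minimal sketch, assuming generic KASAN's layout in which one shadow byte describes 8 bytes of memory and each dump row holds 16 shadow bytes. The helper name `shadow_byte_for` is made up for illustration and is not a kernel or syzbot function.

```python
# One shadow byte describes 8 bytes of memory in generic KASAN (assumed here).
KASAN_GRANULE = 8

# Stack-related shadow values used by generic KASAN; other values are not decoded.
SHADOW_MEANING = {
    0x00: "fully addressable",
    0xF1: "stack left redzone",
    0xF2: "stack mid redzone",
    0xF3: "stack right redzone",
}


def shadow_byte_for(addr, row_base, row_hex):
    """Return the shadow byte covering addr, given one row of the dump.

    row_base is the address printed at the start of the row and row_hex is
    the row's space-separated shadow bytes (hypothetical helper, not kernel code).
    """
    row = bytes.fromhex(row_hex)
    offset = addr - row_base
    if not 0 <= offset < len(row) * KASAN_GRANULE:
        raise ValueError("address is not covered by this row")
    return row[offset // KASAN_GRANULE]


# Values copied from the first report: the faulting address and the '>' row.
fault_addr = 0xFFFFC90001727710
row_base = 0xFFFFC90001727700
row_hex = "f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00"

value = shadow_byte_for(fault_addr, row_base, row_hex)
print(f"shadow byte {value:#04x}: {SHADOW_MEANING.get(value, 'not decoded')}")
```

The read at offset 0x10 into the row lands on the third shadow byte, f3, which is consistent with the "stack-out-of-bounds" classification in the report title.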
* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
2020-07-03 23:31 KASAN: stack-out-of-bounds Read in csd_lock_record syzbot
@ 2020-07-04 0:48 ` syzbot
2020-07-04 16:45 ` Paul E. McKenney
2020-10-09 6:35 ` [tip: core/rcu] kernel/smp: Provide CSD lock timeout diagnostics tip-bot2 for Paul E. McKenney
2 siblings, 0 replies; 9+ messages in thread
From: syzbot @ 2020-07-04 0:48 UTC (permalink / raw)
To: bigeasy, linux-kernel, mingo, paulmck, peterz, syzkaller-bugs, tglx
syzbot has found a reproducer for the following crash on:
HEAD commit: 9e50b94b Add linux-next specific files for 20200703
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1224dc83100000
kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
compiler: gcc (GCC) 10.1.0-syz 20200507
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=170442d5100000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=162ef66d100000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
==================================================================
BUG: KASAN: stack-out-of-bounds in csd_lock_record+0xd2/0xe0 kernel/smp.c:119
Read of size 8 at addr ffffc900016d75f8 by task swapper/1/0
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3-next-20200703-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0x5/0x436 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
csd_lock_record+0xd2/0xe0 kernel/smp.c:119
flush_smp_call_function_queue+0x285/0x730 kernel/smp.c:391
__sysvec_call_function_single+0x98/0x490 arch/x86/kernel/smp.c:248
asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
sysvec_call_function_single+0xe0/0x120 arch/x86/kernel/smp.c:243
asm_sysvec_call_function_single+0x12/0x20 arch/x86/include/asm/idtentry.h:604
RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
Code: ff 4c 89 ef e8 33 30 c7 f9 e9 8e fe ff ff 48 89 df e8 26 30 c7 f9 eb 8a cc cc cc cc e9 07 00 00 00 0f 00 2d 14 4b 5c 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 04 4b 5c 00 f4 c3 cc cc 55 53 e8 c9
RSP: 0018:ffffc90000d3fd18 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8880a95f0340 RSI: ffffffff87ec78c8 RDI: ffffffff87ec789e
RBP: ffff88821af4d864 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88821af4d864
R13: 1ffff920001a7fad R14: ffff88821af4d865 R15: 0000000000000001
arch_safe_halt arch/x86/include/asm/paravirt.h:150 [inline]
acpi_safe_halt+0x8d/0x110 drivers/acpi/processor_idle.c:111
acpi_idle_do_entry+0x15c/0x1b0 drivers/acpi/processor_idle.c:525
acpi_idle_enter+0x3f9/0xab0 drivers/acpi/processor_idle.c:651
cpuidle_enter_state+0xff/0x960 drivers/cpuidle/cpuidle.c:235
cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:346
call_cpuidle kernel/sched/idle.c:126 [inline]
cpuidle_idle_call kernel/sched/idle.c:214 [inline]
do_idle+0x431/0x6d0 kernel/sched/idle.c:276
cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:372
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
Memory state around the buggy address:
 ffffc900016d7480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc900016d7500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc900016d7580: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 f3 f3 f3 f3
                                                                ^
 ffffc900016d7600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc900016d7680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
2020-07-03 23:31 KASAN: stack-out-of-bounds Read in csd_lock_record syzbot
2020-07-04 0:48 ` syzbot
@ 2020-07-04 16:45 ` Paul E. McKenney
2020-07-04 18:34 ` Dmitry Vyukov
2020-10-09 6:35 ` [tip: core/rcu] kernel/smp: Provide CSD lock timeout diagnostics tip-bot2 for Paul E. McKenney
2 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2020-07-04 16:45 UTC (permalink / raw)
To: syzbot; +Cc: bigeasy, linux-kernel, mingo, peterz, syzkaller-bugs, tglx
On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 9e50b94b Add linux-next specific files for 20200703
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> compiler: gcc (GCC) 10.1.0-syz 20200507
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com

Good catch! A call to csd_lock_record() was on the wrong side of a
call to csd_unlock().
But it is folded into another commit for bisectability reasons, so
"Reported-by" would not make sense. I have instead added this to the
commit log:

[ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com

Thanx, Paul
^ permalink raw reply [flat|nested] 9+ messages in thread
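The diagnosis in the message above is that csd_lock_record() ran on the wrong side of csd_unlock(). The thread does not show the patch itself, so what follows is only a toy model of why that ordering matters, assuming that once the csd is unlocked the requesting CPU may return and reuse the stack memory that held it. Every name here (`ToyCsd`, `toy_record`, `toy_unlock`, the two handlers, and the placeholder values) is hypothetical and is not kernel code.

```python
class ToyCsd:
    """Toy stand-in for a call_single_data object living on the requester's stack."""

    def __init__(self, func, info):
        self._func = func
        self._info = info
        self.locked = True  # the requester waits until the handler unlocks

    def _require_locked(self):
        # After unlock the requester may return and reuse its stack frame,
        # so any further read of the csd is treated as an error in this model.
        if not self.locked:
            raise RuntimeError("csd read after unlock")

    @property
    def func(self):
        self._require_locked()
        return self._func

    @property
    def info(self):
        self._require_locked()
        return self._info


def toy_record(csd, log):
    """Model of recording: it reads fields of the csd."""
    log.append((csd.func, csd.info))


def toy_unlock(csd):
    """Model of unlocking: ownership returns to the requester."""
    csd.locked = False


def handler_record_after_unlock(csd, log):
    toy_unlock(csd)
    toy_record(csd, log)  # reads the csd too late


def handler_record_before_unlock(csd, log):
    toy_record(csd, log)  # reads the csd while it is still locked
    toy_unlock(csd)


log = []
handler_record_before_unlock(ToyCsd("some_func", 42), log)
print(log)

try:
    handler_record_after_unlock(ToyCsd("some_func", 42), log)
except RuntimeError as err:
    print("rejected:", err)
```

In the model, recording before unlocking succeeds, while recording after unlocking trips the check, which plays the role that KASAN played in the real report.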
* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
2020-07-04 16:45 ` Paul E. McKenney
@ 2020-07-04 18:34 ` Dmitry Vyukov
2020-07-07 15:51 ` Dmitry Vyukov
0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Vyukov @ 2020-07-04 18:34 UTC (permalink / raw)
To: Paul E. McKenney
Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner
On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit: 9e50b94b Add linux-next specific files for 20200703
> > git tree: linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > compiler: gcc (GCC) 10.1.0-syz 20200507
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
>
> Good catch! A call to csd_lock_record() was on the wrong side of a
> call to csd_unlock().

Thanks for taking a look.

> But it is folded into another commit for bisectability reasons, so
> "Reported-by" would not make sense. I have instead added this to the
> commit log:
>
> [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com

This should work; as far as I remember, syzbot looks for the email+hash
anywhere in the commit.
FWIW Tested-by can make sense as well.

> Thanx, Paul
^ permalink raw reply [flat|nested] 9+ messages in thread
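The message above recalls, from memory, that syzbot looks for the email+hash anywhere in the commit rather than only in a Reported-by tag. A rough sketch of what such matching could look like follows. It is not syzbot's actual implementation; the pattern and the function name `find_syzbot_hashes` are assumptions made for illustration, and the sample text reuses the lines quoted in this thread.

```python
import re

# Reporting address format seen in this thread:
# "syzbot+<hex hash>@syzkaller.appspotmail.com" (pattern is an assumption).
SYZBOT_ADDR = re.compile(r"syzbot\+([0-9a-f]+)@syzkaller\.appspotmail\.com")


def find_syzbot_hashes(commit_message):
    """Return every syzbot bug hash mentioned anywhere in the text."""
    return SYZBOT_ADDR.findall(commit_message)


# Lines taken from the commit-log note quoted in the thread.
commit_log = """\
[ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
"""

print(find_syzbot_hashes(commit_log))
```

Under this sketch the bracketed note is enough to associate the commit with the bug, and the Link lines do not produce false matches because they lack the `syzbot+` prefix and the reporting domain.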
* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
2020-07-04 18:34 ` Dmitry Vyukov
@ 2020-07-07 15:51 ` Dmitry Vyukov
2020-07-07 16:26 ` Paul E. McKenney
0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Vyukov @ 2020-07-07 15:51 UTC (permalink / raw)
To: Paul E. McKenney
Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner
On Sat, Jul 4, 2020 at 8:34 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit: 9e50b94b Add linux-next specific files for 20200703
> > > git tree: linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > > compiler: gcc (GCC) 10.1.0-syz 20200507
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
> >
> > Good catch! A call to csd_lock_record() was on the wrong side of a
> > call to csd_unlock().
>
> Thanks for taking a look.
>
> > But it is folded into another commit for bisectability reasons, so
> > "Reported-by" would not make sense. I have instead added this to the
> > commit log:
> >
> > [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> > Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> > Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
>
> This should work; as far as I remember, syzbot looks for the email+hash
> anywhere in the commit.
> FWIW Tested-by can make sense as well.

Paul, there is also a spike of stalls in smp_call_function,
if you look at the top ones at:
https://syzkaller.appspot.com/upstream#open

Can these be caused by the same root cause?
I am not sure in which trees the bug was/is present... This seems to only
happen on linux-next and nowhere else. But these stalls equally happen
on mainline...

> > Thanx, Paul
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
2020-07-07 15:51 ` Dmitry Vyukov
@ 2020-07-07 16:26 ` Paul E. McKenney
2020-07-09 10:13 ` Dmitry Vyukov
0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2020-07-07 16:26 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner
On Tue, Jul 07, 2020 at 05:51:48PM +0200, Dmitry Vyukov wrote:
> On Sat, Jul 4, 2020 at 8:34 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit: 9e50b94b Add linux-next specific files for 20200703
> > > > git tree: linux-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > > > compiler: gcc (GCC) 10.1.0-syz 20200507
> > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> > > >
> > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
> > >
> > > Good catch! A call to csd_lock_record() was on the wrong side of a
> > > call to csd_unlock().
> >
> > Thanks for taking a look.
> >
> > > But it is folded into another commit for bisectability reasons, so
> > > "Reported-by" would not make sense. I have instead added this to the
> > > commit log:
> > >
> > > [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> > > Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> > > Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
> >
> > This should work; as far as I remember, syzbot looks for the email+hash
> > anywhere in the commit.
> > FWIW Tested-by can make sense as well.
>
> Paul, there is also a spike of stalls in smp_call_function,
> if you look at the top ones at:
> https://syzkaller.appspot.com/upstream#open
>
> Can these be caused by the same root cause?
> I am not sure in which trees the bug was/is present... This seems to only
> happen on linux-next and nowhere else. But these stalls equally happen
> on mainline...

I would be surprised, given that the csd_unlock() was before the faulting
reference. But then again, I have been surprised before.

You aren't running scftorture with its longwait parameter set to a
non-zero value, are you? In that case, stalls are expected behavior.
This is to support testing of the CSD lock diagnostics in -rcu, which
isn't in mainline yet, so maybe I am asking a stupid question.

If these are repeatable, one thing to try is to build the kernel with
CSD_LOCK_WAIT_DEBUG=y. This requires c6c67d89c059 ("smp: Add source and
destination CPUs to __call_single_data") and 216d15e0d870 ("kernel/smp:
Provide CSD lock timeout diagnostics") from the -rcu tree's "dev" branch.
This will dump out the smp_call_function() function that was to be
invoked, on the off-chance that the problem is something like lock
contention in that function.
							Thanx, Paul

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
2020-07-07 16:26 ` Paul E. McKenney
@ 2020-07-09 10:13 ` Dmitry Vyukov
2020-07-09 16:45 ` Paul E. McKenney
0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Vyukov @ 2020-07-09 10:13 UTC (permalink / raw)
To: Paul E. McKenney
Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner

On Tue, Jul 7, 2020 at 6:26 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Jul 07, 2020 at 05:51:48PM +0200, Dmitry Vyukov wrote:
> > On Sat, Jul 4, 2020 at 8:34 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > >
> > > On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot found the following crash on:
> > > > >
> > > > > HEAD commit: 9e50b94b Add linux-next specific files for 20200703
> > > > > git tree: linux-next
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > > > > compiler: gcc (GCC) 10.1.0-syz 20200507
> > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> > > > >
> > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
> > > >
> > > > Good catch! A call to csd_lock_record() was on the wrong side of a
> > > > call to csd_unlock().
> > >
> > > Thanks for taking a look.
> > >
> > > > But is folded into another commit for bisectability reasons, so
> > > > "Reported-by" would not make sense. I have instead added this to the
> > > > commit log:
> > > >
> > > > [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> > > > Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> > > > Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
> > >
> > > This should work, as far as I remember syzbot looks for the email+hash
> > > anywhere in the commit.
> > > FWIW Tested-by can make sense as well.
> >
> > Paul, there is also some spike of stalls in smp_call_function,
> > if you look at the top ones at:
> > https://syzkaller.appspot.com/upstream#open
> >
> > Can these be caused by the same root cause?
> > I am not sure which trees the bug was/is present in... This seems to
> > only happen on linux-next and nowhere else. But these stalls equally
> > happen on mainline...
>
> I would be surprised, given that the csd_unlock() was before the faulting
> reference. But then again, I have been surprised before.

Yes, it seems unrelated.
It looks like something broke in the kernel recently, and now, instead
of diagnosing a stall on one CPU, it diagnoses it as a stall in
smp_call_function on another CPU. This produces a large number of
assorted stall reports which are not too actionable...

> You aren't running scftorture with its longwait parameter set to a
> non-zero value, are you? In that case, stalls are expected behavior.
> This is to support testing of the CSD lock diagnostics in -rcu. Which
> isn't in mainline yet, so maybe I am asking a stupid question.

Since I don't know what scftorture/longwait is, I guess I am not running it :)

> If these are repeatable, one thing to try is to build the kernel with
> CSD_LOCK_WAIT_DEBUG=y. This requires c6c67d89c059 ("smp: Add source and
> destination CPUs to __call_single_data") and 216d15e0d870 ("kernel/smp:
> Provide CSD lock timeout diagnostics") from the -rcu tree's "dev" branch.
> This will dump out the smp_call_function() function that was to be
> invoked, on the off-chance that the problem is something like lock
> contention in that function.

Here are some with reproducers:
https://syzkaller.appspot.com/bug?id=8a1e95291152ce5afea43c103a1fd62a257fcf4b
https://syzkaller.appspot.com/bug?id=5e3ac329b6304aacc6304cfaab1a514bca12ce82
https://syzkaller.appspot.com/bug?id=a01b4478f89e19cee91531f7c2b7751f0caf8c0c
https://syzkaller.appspot.com/bug?id=e4caef9fc41d0c019c532a4257faec129699a42e

But the question is whether CSD_LOCK_WAIT_DEBUG=y is useful in
general. Should we enable it all the time?

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
2020-07-09 10:13 ` Dmitry Vyukov
@ 2020-07-09 16:45 ` Paul E. McKenney
0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2020-07-09 16:45 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar, Peter Zijlstra, syzkaller-bugs, Thomas Gleixner

On Thu, Jul 09, 2020 at 12:13:44PM +0200, Dmitry Vyukov wrote:
> On Tue, Jul 7, 2020 at 6:26 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, Jul 07, 2020 at 05:51:48PM +0200, Dmitry Vyukov wrote:
> > > On Sat, Jul 4, 2020 at 8:34 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > >
> > > > On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > > > > > Hello,
> > > > > >
> > > > > > syzbot found the following crash on:
> > > > > >
> > > > > > HEAD commit: 9e50b94b Add linux-next specific files for 20200703
> > > > > > git tree: linux-next
> > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > > > > > compiler: gcc (GCC) 10.1.0-syz 20200507
> > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> > > > > >
> > > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > > > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
> > > > >
> > > > > Good catch! A call to csd_lock_record() was on the wrong side of a
> > > > > call to csd_unlock().
> > > >
> > > > Thanks for taking a look.
> > > >
> > > > > But is folded into another commit for bisectability reasons, so
> > > > > "Reported-by" would not make sense. I have instead added this to the
> > > > > commit log:
> > > > >
> > > > > [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> > > > > Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> > > > > Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
> > > >
> > > > This should work, as far as I remember syzbot looks for the email+hash
> > > > anywhere in the commit.
> > > > FWIW Tested-by can make sense as well.
> > >
> > > Paul, there is also some spike of stalls in smp_call_function,
> > > if you look at the top ones at:
> > > https://syzkaller.appspot.com/upstream#open
> > >
> > > Can these be caused by the same root cause?
> > > I am not sure which trees the bug was/is present in... This seems to
> > > only happen on linux-next and nowhere else. But these stalls equally
> > > happen on mainline...
> >
> > I would be surprised, given that the csd_unlock() was before the faulting
> > reference. But then again, I have been surprised before.
>
> Yes, it seems unrelated.
> It looks like something broke in the kernel recently, and now, instead
> of diagnosing a stall on one CPU, it diagnoses it as a stall in
> smp_call_function on another CPU. This produces a large number of
> assorted stall reports which are not too actionable...
>
> > You aren't running scftorture with its longwait parameter set to a
> > non-zero value, are you? In that case, stalls are expected behavior.
> > This is to support testing of the CSD lock diagnostics in -rcu. Which
> > isn't in mainline yet, so maybe I am asking a stupid question.
>
> Since I don't know what scftorture/longwait is, I guess I am not running it :)
>
> > If these are repeatable, one thing to try is to build the kernel with
> > CSD_LOCK_WAIT_DEBUG=y. This requires c6c67d89c059 ("smp: Add source and
> > destination CPUs to __call_single_data") and 216d15e0d870 ("kernel/smp:
> > Provide CSD lock timeout diagnostics") from the -rcu tree's "dev" branch.
> > This will dump out the smp_call_function() function that was to be
> > invoked, on the off-chance that the problem is something like lock
> > contention in that function.
>
> Here are some with reproducers:
> https://syzkaller.appspot.com/bug?id=8a1e95291152ce5afea43c103a1fd62a257fcf4b
> https://syzkaller.appspot.com/bug?id=5e3ac329b6304aacc6304cfaab1a514bca12ce82
> https://syzkaller.appspot.com/bug?id=a01b4478f89e19cee91531f7c2b7751f0caf8c0c
> https://syzkaller.appspot.com/bug?id=e4caef9fc41d0c019c532a4257faec129699a42e
>
> But the question is whether CSD_LOCK_WAIT_DEBUG=y is useful in
> general. Should we enable it all the time?

The CSD_LOCK_WAIT_DEBUG functionality is quite new, so it is quite
possible that it is causing rather than detecting problems. ;-)

But once it is stable, then yes, it might be quite generally useful.

							Thanx, Paul

^ permalink raw reply [flat|nested] 9+ messages in thread
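As a rough illustration of what the CSD lock timeout diagnostics discussed above do, here is a userspace sketch of a wait loop that complains each time a further timeout interval passes without the lock being released. It is not kernel code: every name starting with sketch_ or SKETCH_ is hypothetical, the clock is a fake counter passed in by the caller so that the behavior is deterministic, and the real diagnostics additionally dump stacks and may re-send the IPI.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_FLAG_LOCK  0x01u           /* hypothetical stand-in for CSD_FLAG_LOCK */
#define SKETCH_TIMEOUT_NS 5000000000ULL   /* same value as 5ULL * NSEC_PER_SEC */

struct sketch_csd {
	_Atomic unsigned int flags;
};

/*
 * Return true once the lock bit is clear. Otherwise return false, and
 * complain if more than SKETCH_TIMEOUT_NS has passed since the last
 * complaint (or since the wait began). "now" is supplied by the caller.
 */
static bool sketch_wait_toolong(struct sketch_csd *csd, uint64_t now,
				uint64_t ts0, uint64_t *ts1, int *reports)
{
	unsigned int flags = atomic_load_explicit(&csd->flags, memory_order_acquire);

	if (!(flags & SKETCH_FLAG_LOCK))
		return true;
	if (now - *ts1 <= SKETCH_TIMEOUT_NS)
		return false;

	(*reports)++;
	printf("sketch: lock still held, waited %" PRIu64 " ns in total (report #%d)\n",
	       now - ts0, *reports);
	*ts1 = now;	/* restart the interval so the next report is a full timeout away */
	return false;
}

/*
 * Drive the wait loop with a fake clock that advances by step_ns per
 * iteration; the lock is released once the clock reaches unlock_at_ns.
 * Returns the number of complaints. step_ns must be non-zero.
 */
static int sketch_simulate(uint64_t unlock_at_ns, uint64_t step_ns)
{
	struct sketch_csd csd;
	uint64_t now = 0, ts0 = 0, ts1 = 0;
	int reports = 0;

	atomic_init(&csd.flags, SKETCH_FLAG_LOCK);
	for (;;) {
		if (now >= unlock_at_ns)
			atomic_store_explicit(&csd.flags, 0, memory_order_release);
		if (sketch_wait_toolong(&csd, now, ts0, &ts1, &reports))
			break;
		now += step_ns;
	}
	return reports;
}
```

With a one-second step, releasing the lock after three seconds produces no report, while releasing it after twelve seconds produces one, because the second interval has not yet fully elapsed when the lock is dropped.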
* [tip: core/rcu] kernel/smp: Provide CSD lock timeout diagnostics
2020-07-03 23:31 KASAN: stack-out-of-bounds Read in csd_lock_record syzbot
2020-07-04 0:48 ` syzbot
2020-07-04 16:45 ` Paul E. McKenney
@ 2020-10-09 6:35 ` tip-bot2 for Paul E. McKenney
2 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Paul E. McKenney @ 2020-10-09 6:35 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Sebastian Andrzej Siewior, Paul E. McKenney, x86, LKML

The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     35feb60474bf4f7fa7840e14fc7fd344996b919d
Gitweb:        https://git.kernel.org/tip/35feb60474bf4f7fa7840e14fc7fd344996b919d
Author:        Paul E. McKenney <paulmck@kernel.org>
AuthorDate:    Tue, 30 Jun 2020 13:22:54 -07:00
Committer:     Paul E. McKenney <paulmck@kernel.org>
CommitterDate: Fri, 04 Sep 2020 11:52:50 -07:00

kernel/smp: Provide CSD lock timeout diagnostics

This commit causes csd_lock_wait() to emit diagnostics when a CPU
fails to respond quickly enough to one of the smp_call_function()
family of function calls. These diagnostics are enabled by a new
CSD_LOCK_WAIT_DEBUG Kconfig option that depends on DEBUG_KERNEL.

This commit was inspired by an earlier patch by Josef Bacik.

[ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
[ paulmck: Fix KASAN use-after-free issue reported by Qian Cai. ]
[ paulmck: Fix botched nr_cpu_ids comparison per Dan Carpenter. ]
[ paulmck: Apply Peter Zijlstra feedback. ]
Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/smp.c      | 132 ++++++++++++++++++++++++++++++++++++++++++++-
 lib/Kconfig.debug |  11 ++++-
 2 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 865a876..c5d3188 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -20,6 +20,9 @@
 #include <linux/sched.h>
 #include <linux/sched/idle.h>
 #include <linux/hypervisor.h>
+#include <linux/sched/clock.h>
+#include <linux/nmi.h>
+#include <linux/sched/debug.h>
 
 #include "smpboot.h"
 #include "sched/smp.h"
@@ -96,6 +99,103 @@ void __init call_function_init(void)
 	smpcfd_prepare_cpu(smp_processor_id());
 }
 
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+
+static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
+static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
+static DEFINE_PER_CPU(void *, cur_csd_info);
+
+#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
+atomic_t csd_bug_count = ATOMIC_INIT(0);
+
+/* Record current CSD work for current CPU, NULL to erase. */
+static void csd_lock_record(call_single_data_t *csd)
+{
+	if (!csd) {
+		smp_mb(); /* NULL cur_csd after unlock. */
+		__this_cpu_write(cur_csd, NULL);
+		return;
+	}
+	__this_cpu_write(cur_csd_func, csd->func);
+	__this_cpu_write(cur_csd_info, csd->info);
+	smp_wmb(); /* func and info before csd. */
+	__this_cpu_write(cur_csd, csd);
+	smp_mb(); /* Update cur_csd before function call. */
+		  /* Or before unlock, as the case may be. */
+}
+
+static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd)
+{
+	unsigned int csd_type;
+
+	csd_type = CSD_TYPE(csd);
+	if (csd_type == CSD_TYPE_ASYNC || csd_type == CSD_TYPE_SYNC)
+		return csd->dst; /* Other CSD_TYPE_ values might not have ->dst. */
+	return -1;
+}
+
+/*
+ * Complain if too much time spent waiting.  Note that only
+ * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
+ * so waiting on other types gets much less information.
+ */
+static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
+{
+	int cpu = -1;
+	int cpux;
+	bool firsttime;
+	u64 ts2, ts_delta;
+	call_single_data_t *cpu_cur_csd;
+	unsigned int flags = READ_ONCE(csd->flags);
+
+	if (!(flags & CSD_FLAG_LOCK)) {
+		if (!unlikely(*bug_id))
+			return true;
+		cpu = csd_lock_wait_getcpu(csd);
+		pr_alert("csd: CSD lock (#%d) got unstuck on CPU#%02d, CPU#%02d released the lock.\n",
+			 *bug_id, raw_smp_processor_id(), cpu);
+		return true;
+	}
+
+	ts2 = sched_clock();
+	ts_delta = ts2 - *ts1;
+	if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
+		return false;
+
+	firsttime = !*bug_id;
+	if (firsttime)
+		*bug_id = atomic_inc_return(&csd_bug_count);
+	cpu = csd_lock_wait_getcpu(csd);
+	if (WARN_ONCE(cpu < 0 || cpu >= nr_cpu_ids, "%s: cpu = %d\n", __func__, cpu))
+		cpux = 0;
+	else
+		cpux = cpu;
+	cpu_cur_csd = smp_load_acquire(&per_cpu(cur_csd, cpux)); /* Before func and info. */
+	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %llu ns for CPU#%02d %pS(%ps).\n",
+		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts2 - ts0,
+		 cpu, csd->func, csd->info);
+	if (cpu_cur_csd && csd != cpu_cur_csd) {
+		pr_alert("\tcsd: CSD lock (#%d) handling prior %pS(%ps) request.\n",
+			 *bug_id, READ_ONCE(per_cpu(cur_csd_func, cpux)),
+			 READ_ONCE(per_cpu(cur_csd_info, cpux)));
+	} else {
+		pr_alert("\tcsd: CSD lock (#%d) %s.\n",
+			 *bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
+	}
+	if (cpu >= 0) {
+		if (!trigger_single_cpu_backtrace(cpu))
+			dump_cpu_task(cpu);
+		if (!cpu_cur_csd) {
+			pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
+			arch_send_call_function_single_ipi(cpu);
+		}
+	}
+	dump_stack();
+	*ts1 = ts2;
+
+	return false;
+}
+
 /*
  * csd_lock/csd_unlock used to serialize access to per-cpu csd resources
  *
@@ -105,8 +205,28 @@ void __init call_function_init(void)
  */
 static __always_inline void csd_lock_wait(call_single_data_t *csd)
 {
+	int bug_id = 0;
+	u64 ts0, ts1;
+
+	ts1 = ts0 = sched_clock();
+	for (;;) {
+		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id))
+			break;
+		cpu_relax();
+	}
+	smp_acquire__after_ctrl_dep();
+}
+
+#else
+static void csd_lock_record(call_single_data_t *csd)
+{
+}
+
+static __always_inline void csd_lock_wait(call_single_data_t *csd)
+{
 	smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
 }
+#endif
 
 static __always_inline void csd_lock(call_single_data_t *csd)
 {
@@ -166,9 +286,11 @@ static int generic_exec_single(int cpu, call_single_data_t *csd)
 		 * We can unlock early even for the synchronous on-stack case,
 		 * since we're doing this from the same CPU..
 		 */
+		csd_lock_record(csd);
 		csd_unlock(csd);
 		local_irq_save(flags);
 		func(info);
+		csd_lock_record(NULL);
 		local_irq_restore(flags);
 		return 0;
 	}
@@ -268,8 +390,10 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 				entry = &csd_next->llist;
 			}
 
+			csd_lock_record(csd);
 			func(info);
 			csd_unlock(csd);
+			csd_lock_record(NULL);
 		} else {
 			prev = &csd->llist;
 		}
@@ -296,8 +420,10 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 				smp_call_func_t func = csd->func;
 				void *info = csd->info;
 
+				csd_lock_record(csd);
 				csd_unlock(csd);
 				func(info);
+				csd_lock_record(NULL);
 			} else if (type == CSD_TYPE_IRQ_WORK) {
 				irq_work_single(csd);
 			}
@@ -375,7 +501,8 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 
 	csd->func = func;
 	csd->info = info;
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+	csd->src = smp_processor_id();
 	csd->dst = cpu;
 #endif
 
@@ -543,7 +670,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 			csd->flags |= CSD_TYPE_SYNC;
 		csd->func = func;
 		csd->info = info;
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+		csd->src = smp_processor_id();
 		csd->dst = cpu;
 #endif
 		if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e068c3c..86a35fd 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1367,6 +1367,17 @@ config WW_MUTEX_SELFTEST
 	  Say M if you want these self tests to build as a module.
 	  Say N if you are unsure.
 
+config CSD_LOCK_WAIT_DEBUG
+	bool "Debugging for csd_lock_wait(), called from smp_call_function*()"
+	depends on DEBUG_KERNEL
+	depends on 64BIT
+	default n
+	help
+	  This option enables debug prints when CPUs are slow to respond
+	  to the smp_call_function*() IPI wrappers.  These debug prints
+	  include the IPI handler function currently executing (if any)
+	  and relevant stack traces.
+
 endmenu # lock debugging
 
 config TRACE_IRQFLAGS

^ permalink raw reply related [flat|nested] 9+ messages in thread
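The thread attributes the KASAN report to a csd_lock_record() call that sat on the wrong side of csd_unlock(), and the merged patch records before unlocking at every call site that unlocks early. The userspace sketch below illustrates why that order matters. It rests on an assumption the thread does not spell out: that once the CSD is unlocked, its owner may reuse or discard it (for example, an on-stack CSD whose frame goes away), which is modeled here by clobbering the fields at unlock time. All sketch_ names are hypothetical, and the thread does not say which call site was originally misordered.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*sketch_func_t)(void *);

struct sketch_csd {
	sketch_func_t func;
	void *info;
	int locked;
};

/* Stand-ins for the per-CPU cur_csd_func / cur_csd_info diagnostics. */
static sketch_func_t sketch_recorded_func;
static void *sketch_recorded_info;

/* Record the work about to run; NULL erases the record. */
static void sketch_record(const struct sketch_csd *csd)
{
	if (!csd) {
		sketch_recorded_func = NULL;
		sketch_recorded_info = NULL;
		return;
	}
	sketch_recorded_func = csd->func;
	sketch_recorded_info = csd->info;
}

/*
 * Assumption for this sketch: after unlock the owner may reuse the CSD
 * at any time. Clobbering the fields makes that visible deterministically.
 */
static void sketch_unlock(struct sketch_csd *csd)
{
	csd->locked = 0;
	csd->func = NULL;
	csd->info = NULL;
}

static void sketch_handler(void *info)
{
	++*(int *)info;
}

/* Order used by the merged patch where it unlocks early: record, unlock, call. */
static int sketch_run_record_first(void)
{
	int calls = 0;
	struct sketch_csd csd = { sketch_handler, &calls, 1 };
	sketch_func_t func = csd.func;	/* copied out before unlock, as in the kernel code */
	void *info = csd.info;
	int record_ok;

	sketch_record(&csd);
	sketch_unlock(&csd);
	record_ok = sketch_recorded_func == sketch_handler &&
		    sketch_recorded_info == (void *)&calls;
	func(info);
	sketch_record(NULL);
	return record_ok && calls == 1;
}

/* Misordered variant: the record step reads the CSD after it was given back. */
static int sketch_run_unlock_first(void)
{
	int calls = 0;
	struct sketch_csd csd = { sketch_handler, &calls, 1 };
	sketch_func_t func = csd.func;
	void *info = csd.info;
	int record_ok;

	sketch_unlock(&csd);
	sketch_record(&csd);	/* too late: reads fields we no longer own */
	record_ok = sketch_recorded_func == sketch_handler &&
		    sketch_recorded_info == (void *)&calls;
	func(info);
	sketch_record(NULL);
	return record_ok && calls == 1;
}
```

In both variants the handler itself still runs correctly, because func and info were copied out beforehand; only the diagnostic record differs. In the kernel the equivalent late read touches memory that may no longer belong to the CSD, which is consistent with the stack-out-of-bounds read that KASAN reported.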
end of thread, other threads:[~2020-10-09 6:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-03 23:31 KASAN: stack-out-of-bounds Read in csd_lock_record syzbot
2020-07-04  0:48 ` syzbot
2020-07-04 16:45 ` Paul E. McKenney
2020-07-04 18:34   ` Dmitry Vyukov
2020-07-07 15:51     ` Dmitry Vyukov
2020-07-07 16:26       ` Paul E. McKenney
2020-07-09 10:13         ` Dmitry Vyukov
2020-07-09 16:45           ` Paul E. McKenney
2020-10-09  6:35 ` [tip: core/rcu] kernel/smp: Provide CSD lock timeout diagnostics tip-bot2 for Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).