linux-kernel.vger.kernel.org archive mirror
* KASAN: stack-out-of-bounds Read in csd_lock_record
@ 2020-07-03 23:31 syzbot
  2020-07-04  0:48 ` syzbot
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: syzbot @ 2020-07-03 23:31 UTC (permalink / raw)
  To: bigeasy, linux-kernel, mingo, paulmck, peterz, syzkaller-bugs, tglx

Hello,

syzbot found the following crash on:

HEAD commit:    9e50b94b Add linux-next specific files for 20200703
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
compiler:       gcc (GCC) 10.1.0-syz 20200507
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: stack-out-of-bounds in csd_lock_record+0xcb/0xe0 kernel/smp.c:118
Read of size 8 at addr ffffc90001727710 by task syz-executor.0/10721

CPU: 1 PID: 10721 Comm: syz-executor.0 Not tainted 5.8.0-rc3-next-20200703-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0x5/0x436 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 csd_lock_record+0xcb/0xe0 kernel/smp.c:118
 flush_smp_call_function_queue+0x285/0x730 kernel/smp.c:391
 __sysvec_call_function_single+0x98/0x490 arch/x86/kernel/smp.c:248
 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
 sysvec_call_function_single+0xe0/0x120 arch/x86/kernel/smp.c:243
 asm_sysvec_call_function_single+0x12/0x20 arch/x86/include/asm/idtentry.h:604
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:765 [inline]
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x8c/0xe0 kernel/locking/spinlock.c:191
Code: 48 c7 c0 00 ff b4 89 48 ba 00 00 00 00 00 fc ff df 48 c1 e8 03 80 3c 10 00 75 37 48 83 3d 9b 74 c8 01 00 74 22 48 89 df 57 9d <0f> 1f 44 00 00 bf 01 00 00 00 e8 95 fb 62 f9 65 8b 05 fe 73 15 78
RSP: 0018:ffffc900016e7558 EFLAGS: 00000282
RAX: 1ffffffff1369fe0 RBX: 0000000000000282 RCX: 0000000000000000
RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000282
RBP: ffffffff8cb02508 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 1ffffffff19604a0
R13: 0000000000000000 R14: dead000000000100 R15: dffffc0000000000
 __debug_check_no_obj_freed lib/debugobjects.c:977 [inline]
 debug_check_no_obj_freed+0x20c/0x41c lib/debugobjects.c:998
 free_pages_prepare mm/page_alloc.c:1219 [inline]
 __free_pages_ok+0x20b/0xc90 mm/page_alloc.c:1471
 release_pages+0x5ec/0x17a0 mm/swap.c:880
 tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
 tlb_flush_mmu+0xe9/0x6b0 mm/mmu_gather.c:249
 zap_pte_range mm/memory.c:1155 [inline]
 zap_pmd_range mm/memory.c:1193 [inline]
 zap_pud_range mm/memory.c:1222 [inline]
 zap_p4d_range mm/memory.c:1243 [inline]
 unmap_page_range+0x1e22/0x2b20 mm/memory.c:1264
 unmap_single_vma+0x198/0x300 mm/memory.c:1309
 unmap_vmas+0x16f/0x2f0 mm/memory.c:1341
 exit_mmap+0x2b1/0x530 mm/mmap.c:3165
 __mmput+0x122/0x470 kernel/fork.c:1075
 mmput+0x53/0x60 kernel/fork.c:1096
 exit_mm kernel/exit.c:483 [inline]
 do_exit+0xa8f/0x2a40 kernel/exit.c:793
 do_group_exit+0x125/0x310 kernel/exit.c:904
 get_signal+0x40b/0x1ee0 kernel/signal.c:2743
 do_signal+0x82/0x2520 arch/x86/kernel/signal.c:810
 exit_to_usermode_loop arch/x86/entry/common.c:218 [inline]
 __prepare_exit_to_usermode+0x156/0x1f0 arch/x86/entry/common.c:252
 do_syscall_64+0x6c/0xe0 arch/x86/entry/common.c:376
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45cb29
Code: Bad RIP value.
RSP: 002b:00007fb154b96cf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 000000000078bf08 RCX: 000000000045cb29
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 000000000078bf0c
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078bf0c
R13: 00007ffd3933f26f R14: 00007fb154b979c0 R15: 000000000078bf0c


Memory state around the buggy address:
 ffffc90001727600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc90001727680: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00
>ffffc90001727700: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
                         ^
 ffffc90001727780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc90001727800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
  2020-07-03 23:31 KASAN: stack-out-of-bounds Read in csd_lock_record syzbot
@ 2020-07-04  0:48 ` syzbot
  2020-07-04 16:45 ` Paul E. McKenney
  2020-10-09  6:35 ` [tip: core/rcu] kernel/smp: Provide CSD lock timeout diagnostics tip-bot2 for Paul E. McKenney
  2 siblings, 0 replies; 9+ messages in thread
From: syzbot @ 2020-07-04  0:48 UTC (permalink / raw)
  To: bigeasy, linux-kernel, mingo, paulmck, peterz, syzkaller-bugs, tglx

syzbot has found a reproducer for the following crash on:

HEAD commit:    9e50b94b Add linux-next specific files for 20200703
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1224dc83100000
kernel config:  https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
compiler:       gcc (GCC) 10.1.0-syz 20200507
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=170442d5100000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=162ef66d100000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: stack-out-of-bounds in csd_lock_record+0xd2/0xe0 kernel/smp.c:119
Read of size 8 at addr ffffc900016d75f8 by task swapper/1/0

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.8.0-rc3-next-20200703-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x18f/0x20d lib/dump_stack.c:118
 print_address_description.constprop.0.cold+0x5/0x436 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
 csd_lock_record+0xd2/0xe0 kernel/smp.c:119
 flush_smp_call_function_queue+0x285/0x730 kernel/smp.c:391
 __sysvec_call_function_single+0x98/0x490 arch/x86/kernel/smp.c:248
 asm_call_on_stack+0xf/0x20 arch/x86/entry/entry_64.S:706
 </IRQ>
 __run_on_irqstack arch/x86/include/asm/irq_stack.h:22 [inline]
 run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:48 [inline]
 sysvec_call_function_single+0xe0/0x120 arch/x86/kernel/smp.c:243
 asm_sysvec_call_function_single+0x12/0x20 arch/x86/include/asm/idtentry.h:604
RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
Code: ff 4c 89 ef e8 33 30 c7 f9 e9 8e fe ff ff 48 89 df e8 26 30 c7 f9 eb 8a cc cc cc cc e9 07 00 00 00 0f 00 2d 14 4b 5c 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 04 4b 5c 00 f4 c3 cc cc 55 53 e8 c9
RSP: 0018:ffffc90000d3fd18 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff8880a95f0340 RSI: ffffffff87ec78c8 RDI: ffffffff87ec789e
RBP: ffff88821af4d864 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88821af4d864
R13: 1ffff920001a7fad R14: ffff88821af4d865 R15: 0000000000000001
 arch_safe_halt arch/x86/include/asm/paravirt.h:150 [inline]
 acpi_safe_halt+0x8d/0x110 drivers/acpi/processor_idle.c:111
 acpi_idle_do_entry+0x15c/0x1b0 drivers/acpi/processor_idle.c:525
 acpi_idle_enter+0x3f9/0xab0 drivers/acpi/processor_idle.c:651
 cpuidle_enter_state+0xff/0x960 drivers/cpuidle/cpuidle.c:235
 cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:346
 call_cpuidle kernel/sched/idle.c:126 [inline]
 cpuidle_idle_call kernel/sched/idle.c:214 [inline]
 do_idle+0x431/0x6d0 kernel/sched/idle.c:276
 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:372
 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243


Memory state around the buggy address:
 ffffc900016d7480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc900016d7500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffc900016d7580: 00 00 00 00 f1 f1 f1 f1 00 00 00 00 f3 f3 f3 f3
                                                                ^
 ffffc900016d7600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffffc900016d7680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================



* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
  2020-07-03 23:31 KASAN: stack-out-of-bounds Read in csd_lock_record syzbot
  2020-07-04  0:48 ` syzbot
@ 2020-07-04 16:45 ` Paul E. McKenney
  2020-07-04 18:34   ` Dmitry Vyukov
  2020-10-09  6:35 ` [tip: core/rcu] kernel/smp: Provide CSD lock timeout diagnostics tip-bot2 for Paul E. McKenney
  2 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2020-07-04 16:45 UTC (permalink / raw)
  To: syzbot; +Cc: bigeasy, linux-kernel, mingo, peterz, syzkaller-bugs, tglx

On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    9e50b94b Add linux-next specific files for 20200703
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> compiler:       gcc (GCC) 10.1.0-syz 20200507
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com

Good catch!  A call to csd_lock_record() was on the wrong side of a
call to csd_unlock().
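
To illustrate why the ordering matters, here is a minimal user-space
sketch. It assumes that the request descriptor lives on the requester's
stack and that the requester may return as soon as the descriptor is
unlocked, which would match the stack-out-of-bounds read KASAN reported.
The struct layout and the fake_* names are hypothetical stand-ins, not
the actual kernel/smp.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a call descriptor; not the kernel's layout. */
struct fake_csd {
	void (*func)(void *info);
	void *info;
	bool locked;	/* while true, the requester keeps the descriptor alive */
};

/* Hypothetical diagnostic slot remembering the request being handled. */
static void (*recorded_func)(void *info);

static void fake_csd_lock_record(const struct fake_csd *csd)
{
	/* Reads the descriptor, so it must still be valid at this point. */
	recorded_func = csd ? csd->func : NULL;
}

static void fake_csd_unlock(struct fake_csd *csd)
{
	/* After this store the requester may return and reuse its stack. */
	csd->locked = false;
}

/* Example callback: counts how often it ran. */
static void bump(void *info)
{
	++*(int *)info;
}

/* Safe ordering for a synchronous request: record, run, unlock, clear. */
static void handle_sync_request(struct fake_csd *csd)
{
	assert(csd->locked);
	fake_csd_lock_record(csd);	/* descriptor still owned by the waiter */
	csd->func(csd->info);
	fake_csd_unlock(csd);		/* last access to *csd */
	fake_csd_lock_record(NULL);	/* clearing touches no descriptor */
}
```

Calling fake_csd_lock_record(csd) after fake_csd_unlock(csd) instead
would read a descriptor that the requester may already have abandoned.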

But the fix is folded into another commit for bisectability reasons, so
"Reported-by" would not make sense.  I have instead added this to the
commit log:

[ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com

							Thanx, Paul



* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
  2020-07-04 16:45 ` Paul E. McKenney
@ 2020-07-04 18:34   ` Dmitry Vyukov
  2020-07-07 15:51     ` Dmitry Vyukov
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Vyukov @ 2020-07-04 18:34 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar,
	Peter Zijlstra, syzkaller-bugs, Thomas Gleixner

On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    9e50b94b Add linux-next specific files for 20200703
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> >
> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
>
> Good catch!  A call to csd_lock_record() was on the wrong side of a
> call to csd_unlock().

Thanks for taking a look.

> But the fix is folded into another commit for bisectability reasons, so
> "Reported-by" would not make sense.  I have instead added this to the
> commit log:
>
> [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com

This should work; as far as I remember, syzbot looks for the email+hash
anywhere in the commit.
FWIW Tested-by can make sense as well.
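
A rough sketch of that kind of "anywhere in the commit" match, assuming
a plain substring search is enough. This is not syzbot's actual
implementation, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns true if commit_msg mentions
 * syzbot+<hash>@syzkaller.appspotmail.com anywhere, regardless of
 * which tag (Reported-by, Tested-by, free text) surrounds it.
 */
static bool commit_mentions_syzbot(const char *commit_msg, const char *hash)
{
	char needle[128];
	int n;

	assert(commit_msg != NULL && hash != NULL);
	n = snprintf(needle, sizeof(needle),
		     "syzbot+%s@syzkaller.appspotmail.com", hash);
	if (n < 0 || (size_t)n >= sizeof(needle))
		return false;	/* hash too long for this sketch */
	return strstr(commit_msg, needle) != NULL;
}
```

With a check like this, the bracketed "[ paulmck: Fix for ... ]" line
would match just as a Reported-by tag would.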

>                                                         Thanx, Paul


* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
  2020-07-04 18:34   ` Dmitry Vyukov
@ 2020-07-07 15:51     ` Dmitry Vyukov
  2020-07-07 16:26       ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Vyukov @ 2020-07-07 15:51 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar,
	Peter Zijlstra, syzkaller-bugs, Thomas Gleixner

On Sat, Jul 4, 2020 at 8:34 PM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit:    9e50b94b Add linux-next specific files for 20200703
> > > git tree:       linux-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
> >
> > Good catch!  A call to csd_lock_record() was on the wrong side of a
> > call to csd_unlock().
>
> Thanks for taking a look.
>
> > But the fix is folded into another commit for bisectability reasons, so
> > "Reported-by" would not make sense.  I have instead added this to the
> > commit log:
> >
> > [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> > Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> > Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
>
> This should work; as far as I remember, syzbot looks for the email+hash
> anywhere in the commit.
> FWIW Tested-by can make sense as well.


Paul, there is also a spike of stalls in smp_call_function;
see the top entries at:
https://syzkaller.appspot.com/upstream#open

Could these share the same root cause?
I am not sure which trees the bug was/is present in. This one seems to
happen only on linux-next and nowhere else, but these stalls happen on
mainline as well.



> >                                                         Thanx, Paul


* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
  2020-07-07 15:51     ` Dmitry Vyukov
@ 2020-07-07 16:26       ` Paul E. McKenney
  2020-07-09 10:13         ` Dmitry Vyukov
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2020-07-07 16:26 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar,
	Peter Zijlstra, syzkaller-bugs, Thomas Gleixner

On Tue, Jul 07, 2020 at 05:51:48PM +0200, Dmitry Vyukov wrote:
> On Sat, Jul 4, 2020 at 8:34 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> >
> > On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > >
> > > On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit:    9e50b94b Add linux-next specific files for 20200703
> > > > git tree:       linux-next
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > > > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> > > >
> > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
> > >
> > > Good catch!  A call to csd_lock_record() was on the wrong side of a
> > > call to csd_unlock().
> >
> > Thanks for taking a look.
> >
> > > But the fix is folded into another commit for bisectability reasons, so
> > > "Reported-by" would not make sense.  I have instead added this to the
> > > commit log:
> > >
> > > [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> > > Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> > > Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
> >
> > This should work; as far as I remember, syzbot looks for the email+hash
> > anywhere in the commit.
> > FWIW Tested-by can make sense as well.
> 
> Paul, there is also a spike of stalls in smp_call_function,
> if you look at the top ones at:
> https://syzkaller.appspot.com/upstream#open
> 
> Can these be caused by the same root cause?
> I am not sure in which trees the bug was/is present... This seems to only
> happen on linux-next and nowhere else. But these stalls equally happen
> on mainline...

I would be surprised, given that the csd_unlock() was before the faulting
reference.  But then again, I have been surprised before.

You aren't running scftorture with its longwait parameter set to a
non-zero value, are you?  In that case, stalls are expected behavior.
This is to support testing the CSD lock diagnostics in -rcu.  Which isn't
in mainline yet, so maybe I am asking a stupid question.

If these are repeatable, one thing to try is to build the kernel with
CSD_LOCK_WAIT_DEBUG=y.  This requires c6c67d89c059 ("smp: Add source and
destination CPUs to __call_single_data") and 216d15e0d870 ("kernel/smp:
Provide CSD lock timeout diagnostics") from the -rcu tree's "dev" branch.
This will dump out the smp_call_function() function that was to be
invoked, on the off-chance that the problem is something like lock
contention in that function.

							Thanx, Paul
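
The ordering point above (csd_lock_record() on the wrong side of
csd_unlock()) can be illustrated with a small user-space sketch.  This is
not kernel code and not the actual fix: fake_csd, fake_record(),
fake_unlock() and fake_handle_sync() are hypothetical names, and the
assumption is that dropping the lock hands the structure back to its owner,
whose on-stack copy may then disappear.  The sketch models that by clearing
the fields on unlock, so any read after the unlock would see stale data.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*call_func_t)(void *info);

/* Hypothetical stand-in for call_single_data_t. */
struct fake_csd {
	unsigned int locked;
	call_func_t func;
	void *info;
};

/* Hypothetical stand-ins for the per-CPU cur_csd* variables. */
static struct fake_csd *cur_csd;
static call_func_t cur_func;
static void *cur_info;

/* Record the request being handled, or NULL to erase it. */
static void fake_record(struct fake_csd *csd)
{
	if (!csd) {
		cur_csd = NULL;
		return;
	}
	cur_func = csd->func;	/* These reads are only safe while locked. */
	cur_info = csd->info;
	cur_csd = csd;
}

/* Assumption: once unlocked, the owner may reuse or discard the storage. */
static void fake_unlock(struct fake_csd *csd)
{
	csd->locked = 0;
	csd->func = NULL;	/* Model the owner's stack frame going away. */
	csd->info = NULL;
}

/* Sample handler: increments the int that info points to. */
static void bump(void *info)
{
	*(int *)info += 1;
}

/* Record before unlock; the csd is not touched after the unlock. */
static void fake_handle_sync(struct fake_csd *csd)
{
	call_func_t func = csd->func;
	void *info = csd->info;

	fake_record(csd);
	fake_unlock(csd);
	func(info);
	fake_record(NULL);
}
```

Calling fake_record(csd) after fake_unlock(csd) in this model would record
NULL for both the function and its argument, which is the user-space
analogue of reading a stack slot that no longer belongs to the request.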


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
  2020-07-07 16:26       ` Paul E. McKenney
@ 2020-07-09 10:13         ` Dmitry Vyukov
  2020-07-09 16:45           ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Vyukov @ 2020-07-09 10:13 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar,
	Peter Zijlstra, syzkaller-bugs, Thomas Gleixner

On Tue, Jul 7, 2020 at 6:26 PM Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Tue, Jul 07, 2020 at 05:51:48PM +0200, Dmitry Vyukov wrote:
> > On Sat, Jul 4, 2020 at 8:34 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > >
> > > On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > >
> > > > On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > > > > Hello,
> > > > >
> > > > > syzbot found the following crash on:
> > > > >
> > > > > HEAD commit:    9e50b94b Add linux-next specific files for 20200703
> > > > > git tree:       linux-next
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > > > > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> > > > >
> > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
> > > >
> > > > Good catch!  A call to csd_lock_record() was on the wrong side of a
> > > > call to csd_unlock().
> > >
> > > Thanks for taking a look.
> > >
> > > > But the fix is folded into another commit for bisectability reasons, so
> > > > "Reported-by" would not make sense.  I have instead added this to the
> > > > commit log:
> > > >
> > > > [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> > > > Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> > > > Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
> > >
> > > This should work; as far as I remember, syzbot looks for the email+hash
> > > anywhere in the commit.
> > > FWIW Tested-by can make sense as well.
> >
> > Paul, there is also a spike of stalls in smp_call_function,
> > if you look at the top ones at:
> > https://syzkaller.appspot.com/upstream#open
> >
> > Can these be caused by the same root cause?
> > I am not sure in which trees the bug was/is present... This seems to only
> > happen on linux-next and nowhere else. But these stalls equally happen
> > on mainline...
>
> I would be surprised, given that the csd_unlock() was before the faulting
> reference.  But then again, I have been surprised before.

Yes, it seems unrelated.
It looks like something broke in the kernel recently and now, instead
of diagnosing a stall on one CPU, it diagnoses it as a stall in
smp_call_function on another CPU. This produces a large number of
assorted stall reports which are not too actionable...


> You aren't running scftorture with its longwait parameter set to a
> non-zero value, are you?  In that case, stalls are expected behavior.
> This is to support testing the CSD lock diagnostics in -rcu.  Which isn't
> in mainline yet, so maybe I am asking a stupid question.

Since I don't know what scftorture/longwait is, I guess I am not running it :)

> If these are repeatable, one thing to try is to build the kernel with
> CSD_LOCK_WAIT_DEBUG=y.  This requires c6c67d89c059 ("smp: Add source and
> destination CPUs to __call_single_data") and 216d15e0d870 ("kernel/smp:
> Provide CSD lock timeout diagnostics") from the -rcu tree's "dev" branch.
> This will dump out the smp_call_function() function that was to be
> invoked, on the off-chance that the problem is something like lock
> contention in that function.

Here are some with reproducers:
https://syzkaller.appspot.com/bug?id=8a1e95291152ce5afea43c103a1fd62a257fcf4b
https://syzkaller.appspot.com/bug?id=5e3ac329b6304aacc6304cfaab1a514bca12ce82
https://syzkaller.appspot.com/bug?id=a01b4478f89e19cee91531f7c2b7751f0caf8c0c
https://syzkaller.appspot.com/bug?id=e4caef9fc41d0c019c532a4257faec129699a42e

But the question is whether CSD_LOCK_WAIT_DEBUG=y is useful in
general. Should we enable it all the time?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: KASAN: stack-out-of-bounds Read in csd_lock_record
  2020-07-09 10:13         ` Dmitry Vyukov
@ 2020-07-09 16:45           ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2020-07-09 16:45 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: syzbot, Sebastian Andrzej Siewior, LKML, Ingo Molnar,
	Peter Zijlstra, syzkaller-bugs, Thomas Gleixner

On Thu, Jul 09, 2020 at 12:13:44PM +0200, Dmitry Vyukov wrote:
> On Tue, Jul 7, 2020 at 6:26 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Tue, Jul 07, 2020 at 05:51:48PM +0200, Dmitry Vyukov wrote:
> > > On Sat, Jul 4, 2020 at 8:34 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> > > >
> > > > On Sat, Jul 4, 2020 at 6:45 PM Paul E. McKenney <paulmck@kernel.org> wrote:
> > > > >
> > > > > On Fri, Jul 03, 2020 at 04:31:22PM -0700, syzbot wrote:
> > > > > > Hello,
> > > > > >
> > > > > > syzbot found the following crash on:
> > > > > >
> > > > > > HEAD commit:    9e50b94b Add linux-next specific files for 20200703
> > > > > > git tree:       linux-next
> > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1024b405100000
> > > > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=f99cc0faa1476ed6
> > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=0f719294463916a3fc0e
> > > > > > compiler:       gcc (GCC) 10.1.0-syz 20200507
> > > > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16dc490f100000
> > > > > >
> > > > > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > > > > Reported-by: syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com
> > > > >
> > > > > Good catch!  A call to csd_lock_record() was on the wrong side of a
> > > > > call to csd_unlock().
> > > >
> > > > Thanks for taking a look.
> > > >
> > > > > But the fix is folded into another commit for bisectability reasons, so
> > > > > "Reported-by" would not make sense.  I have instead added this to the
> > > > > commit log:
> > > > >
> > > > > [ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
> > > > > Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
> > > > > Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
> > > >
> > > > This should work; as far as I remember, syzbot looks for the email+hash
> > > > anywhere in the commit.
> > > > FWIW Tested-by can make sense as well.
> > >
> > > Paul, there is also a spike of stalls in smp_call_function,
> > > if you look at the top ones at:
> > > https://syzkaller.appspot.com/upstream#open
> > >
> > > Can these be caused by the same root cause?
> > > I am not sure in which trees the bug was/is present... This seems to only
> > > happen on linux-next and nowhere else. But these stalls equally happen
> > > on mainline...
> >
> > I would be surprised, given that the csd_unlock() was before the faulting
> > reference.  But then again, I have been surprised before.
> 
> Yes, it seems unrelated.
> It looks like something broke in the kernel recently and now, instead
> of diagnosing a stall on one CPU, it diagnoses it as a stall in
> smp_call_function on another CPU. This produces a large number of
> assorted stall reports which are not too actionable...
> 
> 
> > You aren't running scftorture with its longwait parameter set to a
> > non-zero value, are you?  In that case, stalls are expected behavior.
> > This is to support testing the CSD lock diagnostics in -rcu.  Which isn't
> > in mainline yet, so maybe I am asking a stupid question.
> 
> Since I don't know what scftorture/longwait is, I guess I am not running it :)
> 
> > If these are repeatable, one thing to try is to build the kernel with
> > CSD_LOCK_WAIT_DEBUG=y.  This requires c6c67d89c059 ("smp: Add source and
> > destination CPUs to __call_single_data") and 216d15e0d870 ("kernel/smp:
> > Provide CSD lock timeout diagnostics") from the -rcu tree's "dev" branch.
> > This will dump out the smp_call_function() function that was to be
> > invoked, on the off-chance that the problem is something like lock
> > contention in that function.
> 
> Here are some with reproducers:
> https://syzkaller.appspot.com/bug?id=8a1e95291152ce5afea43c103a1fd62a257fcf4b
> https://syzkaller.appspot.com/bug?id=5e3ac329b6304aacc6304cfaab1a514bca12ce82
> https://syzkaller.appspot.com/bug?id=a01b4478f89e19cee91531f7c2b7751f0caf8c0c
> https://syzkaller.appspot.com/bug?id=e4caef9fc41d0c019c532a4257faec129699a42e
> 
> But the question is whether CSD_LOCK_WAIT_DEBUG=y is useful in
> general. Should we enable it all the time?

The CSD_LOCK_WAIT_DEBUG functionality is quite new, so it is quite
possible that it is causing rather than detecting problems.  ;-)

But once it is stable, then yes, it might be quite generally useful.

							Thanx, Paul
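
For anyone wanting to try the option discussed above, a .config fragment
might look like the following.  This is only a sketch: it assumes a 64-bit
kernel tree that already carries the CSD lock diagnostics patches, and it
relies on the dependency on DEBUG_KERNEL stated in the patch's Kconfig
entry.

```
CONFIG_DEBUG_KERNEL=y
CONFIG_CSD_LOCK_WAIT_DEBUG=y
```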

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: core/rcu] kernel/smp: Provide CSD lock timeout diagnostics
  2020-07-03 23:31 KASAN: stack-out-of-bounds Read in csd_lock_record syzbot
  2020-07-04  0:48 ` syzbot
  2020-07-04 16:45 ` Paul E. McKenney
@ 2020-10-09  6:35 ` tip-bot2 for Paul E. McKenney
  2 siblings, 0 replies; 9+ messages in thread
From: tip-bot2 for Paul E. McKenney @ 2020-10-09  6:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Sebastian Andrzej Siewior, Paul E. McKenney, x86, LKML

The following commit has been merged into the core/rcu branch of tip:

Commit-ID:     35feb60474bf4f7fa7840e14fc7fd344996b919d
Gitweb:        https://git.kernel.org/tip/35feb60474bf4f7fa7840e14fc7fd344996b919d
Author:        Paul E. McKenney <paulmck@kernel.org>
AuthorDate:    Tue, 30 Jun 2020 13:22:54 -07:00
Committer:     Paul E. McKenney <paulmck@kernel.org>
CommitterDate: Fri, 04 Sep 2020 11:52:50 -07:00

kernel/smp: Provide CSD lock timeout diagnostics

This commit causes csd_lock_wait() to emit diagnostics when a CPU
fails to respond quickly enough to one of the smp_call_function()
family of function calls.  These diagnostics are enabled by a new
CSD_LOCK_WAIT_DEBUG Kconfig option that depends on DEBUG_KERNEL.

This commit was inspired by an earlier patch by Josef Bacik.

[ paulmck: Fix for syzbot+0f719294463916a3fc0e@syzkaller.appspotmail.com ]
[ paulmck: Fix KASAN use-after-free issue reported by Qian Cai. ]
[ paulmck: Fix botched nr_cpu_ids comparison per Dan Carpenter. ]
[ paulmck: Apply Peter Zijlstra feedback. ]
Link: https://lore.kernel.org/lkml/00000000000042f21905a991ecea@google.com
Link: https://lore.kernel.org/lkml/0000000000002ef21705a9933cf3@google.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/smp.c      | 132 ++++++++++++++++++++++++++++++++++++++++++++-
 lib/Kconfig.debug |  11 ++++-
 2 files changed, 141 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 865a876..c5d3188 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -20,6 +20,9 @@
 #include <linux/sched.h>
 #include <linux/sched/idle.h>
 #include <linux/hypervisor.h>
+#include <linux/sched/clock.h>
+#include <linux/nmi.h>
+#include <linux/sched/debug.h>
 
 #include "smpboot.h"
 #include "sched/smp.h"
@@ -96,6 +99,103 @@ void __init call_function_init(void)
 	smpcfd_prepare_cpu(smp_processor_id());
 }
 
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+
+static DEFINE_PER_CPU(call_single_data_t *, cur_csd);
+static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func);
+static DEFINE_PER_CPU(void *, cur_csd_info);
+
+#define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC)
+atomic_t csd_bug_count = ATOMIC_INIT(0);
+
+/* Record current CSD work for current CPU, NULL to erase. */
+static void csd_lock_record(call_single_data_t *csd)
+{
+	if (!csd) {
+		smp_mb(); /* NULL cur_csd after unlock. */
+		__this_cpu_write(cur_csd, NULL);
+		return;
+	}
+	__this_cpu_write(cur_csd_func, csd->func);
+	__this_cpu_write(cur_csd_info, csd->info);
+	smp_wmb(); /* func and info before csd. */
+	__this_cpu_write(cur_csd, csd);
+	smp_mb(); /* Update cur_csd before function call. */
+		  /* Or before unlock, as the case may be. */
+}
+
+static __always_inline int csd_lock_wait_getcpu(call_single_data_t *csd)
+{
+	unsigned int csd_type;
+
+	csd_type = CSD_TYPE(csd);
+	if (csd_type == CSD_TYPE_ASYNC || csd_type == CSD_TYPE_SYNC)
+		return csd->dst; /* Other CSD_TYPE_ values might not have ->dst. */
+	return -1;
+}
+
+/*
+ * Complain if too much time spent waiting.  Note that only
+ * the CSD_TYPE_SYNC/ASYNC types provide the destination CPU,
+ * so waiting on other types gets much less information.
+ */
+static __always_inline bool csd_lock_wait_toolong(call_single_data_t *csd, u64 ts0, u64 *ts1, int *bug_id)
+{
+	int cpu = -1;
+	int cpux;
+	bool firsttime;
+	u64 ts2, ts_delta;
+	call_single_data_t *cpu_cur_csd;
+	unsigned int flags = READ_ONCE(csd->flags);
+
+	if (!(flags & CSD_FLAG_LOCK)) {
+		if (!unlikely(*bug_id))
+			return true;
+		cpu = csd_lock_wait_getcpu(csd);
+		pr_alert("csd: CSD lock (#%d) got unstuck on CPU#%02d, CPU#%02d released the lock.\n",
+			 *bug_id, raw_smp_processor_id(), cpu);
+		return true;
+	}
+
+	ts2 = sched_clock();
+	ts_delta = ts2 - *ts1;
+	if (likely(ts_delta <= CSD_LOCK_TIMEOUT))
+		return false;
+
+	firsttime = !*bug_id;
+	if (firsttime)
+		*bug_id = atomic_inc_return(&csd_bug_count);
+	cpu = csd_lock_wait_getcpu(csd);
+	if (WARN_ONCE(cpu < 0 || cpu >= nr_cpu_ids, "%s: cpu = %d\n", __func__, cpu))
+		cpux = 0;
+	else
+		cpux = cpu;
+	cpu_cur_csd = smp_load_acquire(&per_cpu(cur_csd, cpux)); /* Before func and info. */
+	pr_alert("csd: %s non-responsive CSD lock (#%d) on CPU#%d, waiting %llu ns for CPU#%02d %pS(%ps).\n",
+		 firsttime ? "Detected" : "Continued", *bug_id, raw_smp_processor_id(), ts2 - ts0,
+		 cpu, csd->func, csd->info);
+	if (cpu_cur_csd && csd != cpu_cur_csd) {
+		pr_alert("\tcsd: CSD lock (#%d) handling prior %pS(%ps) request.\n",
+			 *bug_id, READ_ONCE(per_cpu(cur_csd_func, cpux)),
+			 READ_ONCE(per_cpu(cur_csd_info, cpux)));
+	} else {
+		pr_alert("\tcsd: CSD lock (#%d) %s.\n",
+			 *bug_id, !cpu_cur_csd ? "unresponsive" : "handling this request");
+	}
+	if (cpu >= 0) {
+		if (!trigger_single_cpu_backtrace(cpu))
+			dump_cpu_task(cpu);
+		if (!cpu_cur_csd) {
+			pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
+			arch_send_call_function_single_ipi(cpu);
+		}
+	}
+	dump_stack();
+	*ts1 = ts2;
+
+	return false;
+}
+
 /*
  * csd_lock/csd_unlock used to serialize access to per-cpu csd resources
  *
@@ -105,8 +205,28 @@ void __init call_function_init(void)
  */
 static __always_inline void csd_lock_wait(call_single_data_t *csd)
 {
+	int bug_id = 0;
+	u64 ts0, ts1;
+
+	ts1 = ts0 = sched_clock();
+	for (;;) {
+		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id))
+			break;
+		cpu_relax();
+	}
+	smp_acquire__after_ctrl_dep();
+}
+
+#else
+static void csd_lock_record(call_single_data_t *csd)
+{
+}
+
+static __always_inline void csd_lock_wait(call_single_data_t *csd)
+{
 	smp_cond_load_acquire(&csd->flags, !(VAL & CSD_FLAG_LOCK));
 }
+#endif
 
 static __always_inline void csd_lock(call_single_data_t *csd)
 {
@@ -166,9 +286,11 @@ static int generic_exec_single(int cpu, call_single_data_t *csd)
 		 * We can unlock early even for the synchronous on-stack case,
 		 * since we're doing this from the same CPU..
 		 */
+		csd_lock_record(csd);
 		csd_unlock(csd);
 		local_irq_save(flags);
 		func(info);
+		csd_lock_record(NULL);
 		local_irq_restore(flags);
 		return 0;
 	}
@@ -268,8 +390,10 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 				entry = &csd_next->llist;
 			}
 
+			csd_lock_record(csd);
 			func(info);
 			csd_unlock(csd);
+			csd_lock_record(NULL);
 		} else {
 			prev = &csd->llist;
 		}
@@ -296,8 +420,10 @@ static void flush_smp_call_function_queue(bool warn_cpu_offline)
 				smp_call_func_t func = csd->func;
 				void *info = csd->info;
 
+				csd_lock_record(csd);
 				csd_unlock(csd);
 				func(info);
+				csd_lock_record(NULL);
 			} else if (type == CSD_TYPE_IRQ_WORK) {
 				irq_work_single(csd);
 			}
@@ -375,7 +501,8 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 
 	csd->func = func;
 	csd->info = info;
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+	csd->src = smp_processor_id();
 	csd->dst = cpu;
 #endif
 
@@ -543,7 +670,8 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 			csd->flags |= CSD_TYPE_SYNC;
 		csd->func = func;
 		csd->info = info;
-#ifdef CONFIG_64BIT
+#ifdef CONFIG_CSD_LOCK_WAIT_DEBUG
+		csd->src = smp_processor_id();
 		csd->dst = cpu;
 #endif
 		if (llist_add(&csd->llist, &per_cpu(call_single_queue, cpu)))
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e068c3c..86a35fd 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1367,6 +1367,17 @@ config WW_MUTEX_SELFTEST
 	  Say M if you want these self tests to build as a module.
 	  Say N if you are unsure.
 
+config CSD_LOCK_WAIT_DEBUG
+	bool "Debugging for csd_lock_wait(), called from smp_call_function*()"
+	depends on DEBUG_KERNEL
+	depends on 64BIT
+	default n
+	help
+	  This option enables debug prints when CPUs are slow to respond
+	  to the smp_call_function*() IPI wrappers.  These debug prints
+	  include the IPI handler function currently executing (if any)
+	  and relevant stack traces.
+
 endmenu # lock debugging
 
 config TRACE_IRQFLAGS
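
The wait loop added by the patch above can be sketched in user space as
follows.  This is a single-threaded simulation under stated assumptions,
not kernel code: fake_csd, fake_clock(), FAKE_TIMEOUT and release_at are
hypothetical names, the clock is a counter that advances one tick per read,
and the "remote CPU" is modeled by clearing the lock once the clock reaches
release_at.  Unlike the patch, the sketch reads the clock before testing the
lock, because here the clock is what drives the simulated release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FAKE_TIMEOUT 5u	/* Ticks; the patch uses 5 s of sched_clock() time. */

/* Hypothetical stand-in for call_single_data_t. */
struct fake_csd {
	bool locked;
	uint64_t release_at;	/* Tick at which the simulated remote CPU unlocks. */
};

static uint64_t fake_now;	/* Simulated clock, advanced once per read. */

static uint64_t fake_clock(void)
{
	return fake_now++;
}

/* One poll of the lock.  Returns true once the wait is over. */
static bool fake_wait_toolong(struct fake_csd *csd, uint64_t ts0,
			      uint64_t *ts1, int *reports)
{
	uint64_t ts2 = fake_clock();

	if (ts2 >= csd->release_at)
		csd->locked = false;	/* Simulated remote CPU finishes. */
	if (!csd->locked)
		return true;
	if (ts2 - *ts1 <= FAKE_TIMEOUT)
		return false;

	/* Timeout interval exceeded: complain, then restart the interval. */
	(*reports)++;
	printf("still waiting after %llu ticks\n",
	       (unsigned long long)(ts2 - ts0));
	*ts1 = ts2;
	return false;
}

/* Spin until unlocked; returns how many timeout reports were emitted. */
static int fake_csd_lock_wait(struct fake_csd *csd)
{
	int reports = 0;
	uint64_t ts0, ts1;

	ts1 = ts0 = fake_clock();
	while (!fake_wait_toolong(csd, ts0, &ts1, &reports))
		;
	return reports;
}
```

Starting from fake_now = 0 with release_at = 20, fake_csd_lock_wait()
reports at ticks 6, 12 and 18 and returns 3.  Resetting ts1 after each
report is what turns one long stall into a series of periodic messages
rather than a flood.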

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-10-09  6:39 UTC | newest]

Thread overview: 9+ messages
2020-07-03 23:31 KASAN: stack-out-of-bounds Read in csd_lock_record syzbot
2020-07-04  0:48 ` syzbot
2020-07-04 16:45 ` Paul E. McKenney
2020-07-04 18:34   ` Dmitry Vyukov
2020-07-07 15:51     ` Dmitry Vyukov
2020-07-07 16:26       ` Paul E. McKenney
2020-07-09 10:13         ` Dmitry Vyukov
2020-07-09 16:45           ` Paul E. McKenney
2020-10-09  6:35 ` [tip: core/rcu] kernel/smp: Provide CSD lock timeout diagnostics tip-bot2 for Paul E. McKenney
