All of lore.kernel.org
 help / color / mirror / Atom feed
* [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult
@ 2024-03-18 10:07 syzbot
  2024-03-21  0:25 ` Edward Adam Davis
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: syzbot @ 2024-03-18 10:07 UTC (permalink / raw)
  To: 42.hyeyoo, andrii, ast, bpf, daniel, davem, edumazet, jakub,
	john.fastabend, kafai, kpsingh, kuba, linux-kernel, namhyung,
	netdev, pabeni, peterz, songliubraving, syzkaller-bugs, yhs

Hello,

syzbot found the following issue on:

HEAD commit:    ea80e3ed09ab net: ethernet: mtk_eth_soc: fix PPE hanging i..
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=1249daa5180000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6fb1be60a193d440
dashboard link: https://syzkaller.appspot.com/bug?extid=c4f4d25859c2e5859988
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17fd8c81180000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1795afc1180000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/4c6c49a7ef5c/disk-ea80e3ed.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/242942b30f2d/vmlinux-ea80e3ed.xz
kernel image: https://storage.googleapis.com/syzbot-assets/74dcc2059655/bzImage-ea80e3ed.xz

The issue was bisected to:

commit ee042be16cb455116d0fe99b77c6bc8baf87c8c6
Author: Namhyung Kim <namhyung@kernel.org>
Date:   Tue Mar 22 18:57:09 2022 +0000

    locking: Apply contention tracepoints in the slow path

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1702c2a5180000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=1482c2a5180000
console output: https://syzkaller.appspot.com/x/log.txt?x=1082c2a5180000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+c4f4d25859c2e5859988@syzkaller.appspotmail.com
Fixes: ee042be16cb4 ("locking: Apply contention tracepoints in the slow path")

=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.8.0-syzkaller-05221-gea80e3ed09ab #0 Not tainted
-----------------------------------------------------
rcu_exp_gp_kthr/18 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939

and this task is already holding:
ffffffff8e136558 (rcu_node_0){-.-.}-{2:2}, at: sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
which would create a new lock dependency:
 (rcu_node_0){-.-.}-{2:2} -> (&htab->buckets[i].lock){+...}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (rcu_node_0){-.-.}-{2:2}

... which became HARDIRQ-irq-safe at:
  lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
  rcu_report_exp_cpu_mult+0x27/0x2f0 kernel/rcu/tree_exp.h:238
  csd_do_func kernel/smp.c:133 [inline]
  __flush_smp_call_function_queue+0xb2e/0x15b0 kernel/smp.c:542
  __sysvec_call_function_single+0xa8/0x3e0 arch/x86/kernel/smp.c:271
  instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
  sysvec_call_function_single+0x9e/0xc0 arch/x86/kernel/smp.c:266
  asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:709
  __sanitizer_cov_trace_switch+0x90/0x120
  update_event_printk kernel/trace/trace_events.c:2750 [inline]
  trace_event_eval_update+0x311/0xf90 kernel/trace/trace_events.c:2922
  process_one_work kernel/workqueue.c:3254 [inline]
  process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
  worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
  kthread+0x2f0/0x390 kernel/kthread.c:388
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

to a HARDIRQ-irq-unsafe lock:
 (&htab->buckets[i].lock){+...}-{2:2}

... which became HARDIRQ-irq-unsafe at:
...
  lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
  _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
  spin_lock_bh include/linux/spinlock.h:356 [inline]
  sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
  0xffffffffa0001b0e
  bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
  __bpf_prog_run include/linux/filter.h:657 [inline]
  bpf_prog_run include/linux/filter.h:664 [inline]
  __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
  bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
  trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
  __mutex_lock_common kernel/locking/mutex.c:617 [inline]
  __mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
  futex_cleanup_begin kernel/futex/core.c:1091 [inline]
  futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
  exit_mm_release+0x1a/0x30 kernel/fork.c:1652
  exit_mm+0xb0/0x310 kernel/exit.c:542
  do_exit+0x99e/0x27e0 kernel/exit.c:865
  do_group_exit+0x207/0x2c0 kernel/exit.c:1027
  __do_sys_exit_group kernel/exit.c:1038 [inline]
  __se_sys_exit_group kernel/exit.c:1036 [inline]
  __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
  do_syscall_64+0xfb/0x240
  entry_SYSCALL_64_after_hwframe+0x6d/0x75

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&htab->buckets[i].lock);
                               local_irq_disable();
                               lock(rcu_node_0);
                               lock(&htab->buckets[i].lock);
  <Interrupt>
    lock(rcu_node_0);

 *** DEADLOCK ***

2 locks held by rcu_exp_gp_kthr/18:
 #0: ffffffff8e136558 (rcu_node_0){-.-.}-{2:2}, at: sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
 #1: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
 #1: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
 #1: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2380 [inline]
 #1: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x114/0x420 kernel/trace/bpf_trace.c:2420

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (rcu_node_0){-.-.}-{2:2} {
   IN-HARDIRQ-W at:
                    lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
                    _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
                    rcu_report_exp_cpu_mult+0x27/0x2f0 kernel/rcu/tree_exp.h:238
                    csd_do_func kernel/smp.c:133 [inline]
                    __flush_smp_call_function_queue+0xb2e/0x15b0 kernel/smp.c:542
                    __sysvec_call_function_single+0xa8/0x3e0 arch/x86/kernel/smp.c:271
                    instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
                    sysvec_call_function_single+0x9e/0xc0 arch/x86/kernel/smp.c:266
                    asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:709
                    __sanitizer_cov_trace_switch+0x90/0x120
                    update_event_printk kernel/trace/trace_events.c:2750 [inline]
                    trace_event_eval_update+0x311/0xf90 kernel/trace/trace_events.c:2922
                    process_one_work kernel/workqueue.c:3254 [inline]
                    process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
                    worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
                    kthread+0x2f0/0x390 kernel/kthread.c:388
                    ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
                    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
   IN-SOFTIRQ-W at:
                    lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
                    _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
                    rcu_report_qs_rdp kernel/rcu/tree.c:2018 [inline]
                    rcu_check_quiescent_state kernel/rcu/tree.c:2100 [inline]
                    rcu_core+0x3ae/0x1830 kernel/rcu/tree.c:2455
                    __do_softirq+0x2bc/0x943 kernel/softirq.c:554
                    invoke_softirq kernel/softirq.c:428 [inline]
                    __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
                    irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
                    instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
                    sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
                    asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
                    unwind_next_frame+0x1d8e/0x2a00 arch/x86/kernel/unwind_orc.c:665
                    arch_stack_walk+0x151/0x1b0 arch/x86/kernel/stacktrace.c:25
                    stack_trace_save+0x118/0x1d0 kernel/stacktrace.c:122
                    save_stack+0xfb/0x1f0 mm/page_owner.c:129
                    __set_page_owner+0x29/0x380 mm/page_owner.c:195
                    set_page_owner include/linux/page_owner.h:31 [inline]
                    post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
                    prep_new_page mm/page_alloc.c:1540 [inline]
                    get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
                    __alloc_pages+0x256/0x680 mm/page_alloc.c:4569
                    __alloc_pages_node include/linux/gfp.h:238 [inline]
                    alloc_pages_node include/linux/gfp.h:261 [inline]
                    alloc_slab_page+0x5f/0x160 mm/slub.c:2190
                    allocate_slab mm/slub.c:2354 [inline]
                    new_slab+0x84/0x2f0 mm/slub.c:2407
                    ___slab_alloc+0xd1b/0x13e0 mm/slub.c:3540
                    __slab_alloc mm/slub.c:3625 [inline]
                    __slab_alloc_node mm/slub.c:3678 [inline]
                    slab_alloc_node mm/slub.c:3850 [inline]
                    kmalloc_trace+0x267/0x360 mm/slub.c:4007
                    kmalloc include/linux/slab.h:590 [inline]
                    kzalloc include/linux/slab.h:711 [inline]
                    ddebug_add_module+0x88/0x800 lib/dynamic_debug.c:1240
                    dynamic_debug_init+0x205/0x5a0 lib/dynamic_debug.c:1446
                    do_one_initcall+0x238/0x830 init/main.c:1241
                    do_pre_smp_initcalls+0x57/0xa0 init/main.c:1347
                    kernel_init_freeable+0x40d/0x5d0 init/main.c:1546
                    kernel_init+0x1d/0x2a0 init/main.c:1446
                    ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
                    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
   INITIAL USE at:
                   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
                   _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
                   rcutree_prepare_cpu+0x71/0x640 kernel/rcu/tree.c:4484
                   rcu_init+0x9b/0x140 kernel/rcu/tree.c:5224
                   start_kernel+0x1f7/0x500 init/main.c:969
                   x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:509
                   x86_64_start_kernel+0x99/0xa0 arch/x86/kernel/head64.c:490
                   common_startup_64+0x13e/0x147
 }
 ... key      at: [<ffffffff945012e0>] rcu_init_one.rcu_node_class+0x0/0x20

the dependencies between the lock to be acquired
 and HARDIRQ-irq-unsafe lock:
-> (&htab->buckets[i].lock){+...}-{2:2} {
   HARDIRQ-ON-W at:
                    lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
                    _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
                    spin_lock_bh include/linux/spinlock.h:356 [inline]
                    sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
                    0xffffffffa0001b0e
                    bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
                    __bpf_prog_run include/linux/filter.h:657 [inline]
                    bpf_prog_run include/linux/filter.h:664 [inline]
                    __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
                    bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
                    trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
                    __mutex_lock_common kernel/locking/mutex.c:617 [inline]
                    __mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
                    futex_cleanup_begin kernel/futex/core.c:1091 [inline]
                    futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
                    exit_mm_release+0x1a/0x30 kernel/fork.c:1652
                    exit_mm+0xb0/0x310 kernel/exit.c:542
                    do_exit+0x99e/0x27e0 kernel/exit.c:865
                    do_group_exit+0x207/0x2c0 kernel/exit.c:1027
                    __do_sys_exit_group kernel/exit.c:1038 [inline]
                    __se_sys_exit_group kernel/exit.c:1036 [inline]
                    __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
                    do_syscall_64+0xfb/0x240
                    entry_SYSCALL_64_after_hwframe+0x6d/0x75
   INITIAL USE at:
                   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                   __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
                   _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
                   spin_lock_bh include/linux/spinlock.h:356 [inline]
                   sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
                   0xffffffffa0001b0e
                   bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
                   __bpf_prog_run include/linux/filter.h:657 [inline]
                   bpf_prog_run include/linux/filter.h:664 [inline]
                   __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
                   bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
                   trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
                   __mutex_lock_common kernel/locking/mutex.c:617 [inline]
                   __mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
                   futex_cleanup_begin kernel/futex/core.c:1091 [inline]
                   futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
                   exit_mm_release+0x1a/0x30 kernel/fork.c:1652
                   exit_mm+0xb0/0x310 kernel/exit.c:542
                   do_exit+0x99e/0x27e0 kernel/exit.c:865
                   do_group_exit+0x207/0x2c0 kernel/exit.c:1027
                   __do_sys_exit_group kernel/exit.c:1038 [inline]
                   __se_sys_exit_group kernel/exit.c:1036 [inline]
                   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
                   do_syscall_64+0xfb/0x240
                   entry_SYSCALL_64_after_hwframe+0x6d/0x75
 }
 ... key      at: [<ffffffff94882300>] sock_hash_alloc.__key+0x0/0x20
 ... acquired at:
   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
   __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
   _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
   spin_lock_bh include/linux/spinlock.h:356 [inline]
   sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
   bpf_prog_43221478a22f23b5+0x42/0x46
   bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
   __bpf_prog_run include/linux/filter.h:657 [inline]
   bpf_prog_run include/linux/filter.h:664 [inline]
   __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
   bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
   trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
   __pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
   pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
   queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
   queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
   do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
   _raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
   sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
   synchronize_rcu_expedited_wait_once kernel/rcu/tree_exp.h:516 [inline]
   synchronize_rcu_expedited_wait kernel/rcu/tree_exp.h:570 [inline]
   rcu_exp_wait_wake kernel/rcu/tree_exp.h:641 [inline]
   rcu_exp_sel_wait_wake+0x628/0x1df0 kernel/rcu/tree_exp.h:675
   kthread_worker_fn+0x4bf/0xab0 kernel/kthread.c:841
   kthread+0x2f0/0x390 kernel/kthread.c:388
   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243


stack backtrace:
CPU: 1 PID: 18 Comm: rcu_exp_gp_kthr Not tainted 6.8.0-syzkaller-05221-gea80e3ed09ab #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
 print_bad_irq_dependency kernel/locking/lockdep.c:2626 [inline]
 check_irq_usage kernel/locking/lockdep.c:2865 [inline]
 check_prev_add kernel/locking/lockdep.c:3138 [inline]
 check_prevs_add kernel/locking/lockdep.c:3253 [inline]
 validate_chain+0x4dc7/0x58e0 kernel/locking/lockdep.c:3869
 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
 lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
 _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
 spin_lock_bh include/linux/spinlock.h:356 [inline]
 sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
 bpf_prog_43221478a22f23b5+0x42/0x46
 bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
 __bpf_prog_run include/linux/filter.h:657 [inline]
 bpf_prog_run include/linux/filter.h:664 [inline]
 __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
 bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
 trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
 __pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
 queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
 queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
 do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
 _raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
 sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
 synchronize_rcu_expedited_wait_once kernel/rcu/tree_exp.h:516 [inline]
 synchronize_rcu_expedited_wait kernel/rcu/tree_exp.h:570 [inline]
 rcu_exp_wait_wake kernel/rcu/tree_exp.h:641 [inline]
 rcu_exp_sel_wait_wake+0x628/0x1df0 kernel/rcu/tree_exp.h:675
 kthread_worker_fn+0x4bf/0xab0 kernel/kthread.c:841
 kthread+0x2f0/0x390 kernel/kthread.c:388
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
 </TASK>
------------[ cut here ]------------
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 1 PID: 18 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x29/0x40 kernel/locking/irqflag-debug.c:10
Modules linked in:
CPU: 1 PID: 18 Comm: rcu_exp_gp_kthr Not tainted 6.8.0-syzkaller-05221-gea80e3ed09ab #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
RIP: 0010:warn_bogus_irq_restore+0x29/0x40 kernel/locking/irqflag-debug.c:10
Code: 90 f3 0f 1e fa 90 80 3d 9e 69 01 04 00 74 06 90 c3 cc cc cc cc c6 05 8f 69 01 04 01 90 48 c7 c7 20 ba aa 8b e8 f8 e5 e7 f5 90 <0f> 0b 90 90 90 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
RSP: 0018:ffffc90000177bb8 EFLAGS: 00010246
RAX: bd04dc17ab040900 RBX: 1ffff9200002ef7c RCX: ffff8880172c1e00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90000177c48 R08: ffffffff8157cc12 R09: 1ffff9200002eecc
R10: dffffc0000000000 R11: fffff5200002eecd R12: dffffc0000000000
R13: 1ffff9200002ef78 R14: ffffc90000177be0 R15: 0000000000000246
FS:  0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6d95bcb0d0 CR3: 000000002098e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline]
 _raw_spin_unlock_irqrestore+0x120/0x140 kernel/locking/spinlock.c:194
 sync_rcu_exp_done_unlocked+0xdb/0x140 kernel/rcu/tree_exp.h:171
 synchronize_rcu_expedited_wait_once kernel/rcu/tree_exp.h:516 [inline]
 synchronize_rcu_expedited_wait kernel/rcu/tree_exp.h:570 [inline]
 rcu_exp_wait_wake kernel/rcu/tree_exp.h:641 [inline]
 rcu_exp_sel_wait_wake+0x628/0x1df0 kernel/rcu/tree_exp.h:675
 kthread_worker_fn+0x4bf/0xab0 kernel/kthread.c:841
 kthread+0x2f0/0x390 kernel/kthread.c:388
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult
  2024-03-18 10:07 [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult syzbot
@ 2024-03-21  0:25 ` Edward Adam Davis
  2024-03-21 15:04   ` syzbot
  2024-03-22  0:17 ` Edward Adam Davis
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Edward Adam Davis @ 2024-03-21  0:25 UTC (permalink / raw)
  To: syzbot+c4f4d25859c2e5859988; +Cc: linux-kernel, syzkaller-bugs

please test dl in rcu_report_exp_cpu_mult

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 27d733c0f65e..8a21a59eb599 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -932,11 +932,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_bucket *bucket;
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
+	unsigned long flags;
 
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	spin_lock_bh(&bucket->lock);
+	spin_lock_irqsave(&bucket->lock, flags);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem) {
 		hlist_del_rcu(&elem->node);
@@ -944,7 +945,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 		sock_hash_free_elem(htab, elem);
 		ret = 0;
 	}
-	spin_unlock_bh(&bucket->lock);
+	spin_unlock_irqrestore(&bucket->lock, flags);
 	return ret;
 }
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult
  2024-03-21  0:25 ` Edward Adam Davis
@ 2024-03-21 15:04   ` syzbot
  0 siblings, 0 replies; 13+ messages in thread
From: syzbot @ 2024-03-21 15:04 UTC (permalink / raw)
  To: eadavis, linux-kernel, syzkaller-bugs

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
possible deadlock in add_timer_on

=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.8.0-syzkaller-05231-ga51cd6bf8e10-dirty #0 Not tainted
-----------------------------------------------------
udevd/5417 [HC0[0]:SC1[1]:HE0:SE0] is trying to acquire:
ffff88806c3c9020 (&htab->buckets[i].lock){+.-.}-{2:2}, at: sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940

and this task is already holding:
ffffffff94697d58 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
which would create a new lock dependency:
 (&obj_hash[i].lock){-.-.}-{2:2} -> (&htab->buckets[i].lock){+.-.}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&obj_hash[i].lock){-.-.}-{2:2}

... which became HARDIRQ-irq-safe at:
  lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
  debug_object_assert_init+0x164/0x440 lib/debugobjects.c:897
  debug_timer_assert_init kernel/time/timer.c:846 [inline]
  debug_assert_init kernel/time/timer.c:891 [inline]
  add_timer_on+0xc3/0x5c0 kernel/time/timer.c:1351
  handle_irq_event_percpu kernel/irq/handle.c:195 [inline]
  handle_irq_event+0xad/0x1f0 kernel/irq/handle.c:210
  handle_level_irq+0x3c5/0x6e0 kernel/irq/chip.c:648
  generic_handle_irq_desc include/linux/irqdesc.h:161 [inline]
  handle_irq arch/x86/kernel/irq.c:238 [inline]
  __common_interrupt+0x13a/0x230 arch/x86/kernel/irq.c:257
  common_interrupt+0xa5/0xd0 arch/x86/kernel/irq.c:247
  asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
  __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
  _raw_spin_unlock_irqrestore+0xd8/0x140 kernel/locking/spinlock.c:194
  __setup_irq+0x1277/0x1cf0 kernel/irq/manage.c:1818
  request_threaded_irq+0x2ab/0x380 kernel/irq/manage.c:2202
  request_irq include/linux/interrupt.h:168 [inline]
  setup_default_timer_irq+0x25/0x60 arch/x86/kernel/time.c:70
  x86_late_time_init+0x66/0xc0 arch/x86/kernel/time.c:94
  start_kernel+0x3f3/0x500 init/main.c:1039
  x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:509
  x86_64_start_kernel+0x99/0xa0 arch/x86/kernel/head64.c:490
  common_startup_64+0x13e/0x147

to a HARDIRQ-irq-unsafe lock:
 (&htab->buckets[i].lock){+.-.}-{2:2}

... which became HARDIRQ-irq-unsafe at:
...
  lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
  _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
  spin_lock_bh include/linux/spinlock.h:356 [inline]
  sock_hash_free+0x164/0x820 net/core/sock_map.c:1155
  bpf_map_free_deferred+0xe6/0x110 kernel/bpf/syscall.c:734
  process_one_work kernel/workqueue.c:3254 [inline]
  process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
  worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
  kthread+0x2f0/0x390 kernel/kthread.c:388
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&htab->buckets[i].lock);
                               local_irq_disable();
                               lock(&obj_hash[i].lock);
                               lock(&htab->buckets[i].lock);
  <Interrupt>
    lock(&obj_hash[i].lock);

 *** DEADLOCK ***

5 locks held by udevd/5417:
 #0: ffff88802a208420 (sb_writers#5){.+.+}-{0:0}, at: mnt_want_write+0x3f/0x90 fs/namespace.c:409
 #1: ffff8880758002d0 (&type->i_mutex_dir_key#5){++++}-{3:3}, at: inode_lock include/linux/fs.h:793 [inline]
 #1: ffff8880758002d0 (&type->i_mutex_dir_key#5){++++}-{3:3}, at: open_last_lookups fs/namei.c:3564 [inline]
 #1: ffff8880758002d0 (&type->i_mutex_dir_key#5){++++}-{3:3}, at: path_openat+0x7d3/0x3240 fs/namei.c:3797
 #2: ffffffff8e236790 (remove_cache_srcu){.+.+}-{0:0}, at: srcu_lock_acquire include/linux/srcu.h:116 [inline]
 #2: ffffffff8e236790 (remove_cache_srcu){.+.+}-{0:0}, at: srcu_read_lock+0x24/0x50 include/linux/srcu.h:215
 #3: ffffffff94697d58 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
 #4: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
 #4: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
 #4: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2380 [inline]
 #4: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x114/0x420 kernel/trace/bpf_trace.c:2420

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&obj_hash[i].lock){-.-.}-{2:2} {
   IN-HARDIRQ-W at:
                    lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
                    _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
                    debug_object_assert_init+0x164/0x440 lib/debugobjects.c:897
                    debug_timer_assert_init kernel/time/timer.c:846 [inline]
                    debug_assert_init kernel/time/timer.c:891 [inline]
                    add_timer_on+0xc3/0x5c0 kernel/time/timer.c:1351
                    handle_irq_event_percpu kernel/irq/handle.c:195 [inline]
                    handle_irq_event+0xad/0x1f0 kernel/irq/handle.c:210
                    handle_level_irq+0x3c5/0x6e0 kernel/irq/chip.c:648
                    generic_handle_irq_desc include/linux/irqdesc.h:161 [inline]
                    handle_irq arch/x86/kernel/irq.c:238 [inline]
                    __common_interrupt+0x13a/0x230 arch/x86/kernel/irq.c:257
                    common_interrupt+0xa5/0xd0 arch/x86/kernel/irq.c:247
                    asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
                    __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
                    _raw_spin_unlock_irqrestore+0xd8/0x140 kernel/locking/spinlock.c:194
                    __setup_irq+0x1277/0x1cf0 kernel/irq/manage.c:1818
                    request_threaded_irq+0x2ab/0x380 kernel/irq/manage.c:2202
                    request_irq include/linux/interrupt.h:168 [inline]
                    setup_default_timer_irq+0x25/0x60 arch/x86/kernel/time.c:70
                    x86_late_time_init+0x66/0xc0 arch/x86/kernel/time.c:94
                    start_kernel+0x3f3/0x500 init/main.c:1039
                    x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:509
                    x86_64_start_kernel+0x99/0xa0 arch/x86/kernel/head64.c:490
                    common_startup_64+0x13e/0x147
   IN-SOFTIRQ-W at:
                    lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
                    _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
                    debug_object_deactivate+0x158/0x390 lib/debugobjects.c:763
                    debug_timer_deactivate kernel/time/timer.c:841 [inline]
                    debug_deactivate kernel/time/timer.c:885 [inline]
                    detach_timer+0x24/0x300 kernel/time/timer.c:932
                    expire_timers kernel/time/timer.c:1826 [inline]
                    __run_timers kernel/time/timer.c:2408 [inline]
                    __run_timer_base+0x5ef/0x8e0 kernel/time/timer.c:2419
                    run_timer_base kernel/time/timer.c:2428 [inline]
                    run_timer_softirq+0x67/0x170 kernel/time/timer.c:2436
                    __do_softirq+0x2be/0x943 kernel/softirq.c:554
                    invoke_softirq kernel/softirq.c:428 [inline]
                    __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
                    irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
                    common_interrupt+0xaa/0xd0 arch/x86/kernel/irq.c:247
                    asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
                    console_flush_all+0x9cd/0xec0
                    console_unlock+0x13b/0x4d0 kernel/printk/printk.c:3025
                    vprintk_emit+0x509/0x720 kernel/printk/printk.c:2292
                    _printk+0xd5/0x120 kernel/printk/printk.c:2317
                    calibrate_delay+0x1597/0x16b0 init/calibrate.c:308
                    start_kernel+0x3fd/0x500 init/main.c:1041
                    x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:509
                    x86_64_start_kernel+0x99/0xa0 arch/x86/kernel/head64.c:490
                    common_startup_64+0x13e/0x147
   INITIAL USE at:
 }
 ... key      at: [<ffffffff9466d4c0>] debug_objects_early_init.__key+0x0/0x20

the dependencies between the lock to be acquired
 and HARDIRQ-irq-unsafe lock:
-> (&htab->buckets[i].lock){+.-.}-{2:2} {
   HARDIRQ-ON-W at:
                    lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
                    _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
                    spin_lock_bh include/linux/spinlock.h:356 [inline]
                    sock_hash_free+0x164/0x820 net/core/sock_map.c:1155
                    bpf_map_free_deferred+0xe6/0x110 kernel/bpf/syscall.c:734
                    process_one_work kernel/workqueue.c:3254 [inline]
                    process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
                    worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
                    kthread+0x2f0/0x390 kernel/kthread.c:388
                    ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
                    ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
   IN-SOFTIRQ-W at:
                    lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
                    _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
                    sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940
                    bpf_prog_43221478a22f23b5+0x42/0x46
                    bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
                    __bpf_prog_run include/linux/filter.h:657 [inline]
                    bpf_prog_run include/linux/filter.h:664 [inline]
                    __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
                    bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
                    trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
                    __pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
                    pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
                    queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
                    queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
                    do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
                    __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
                    _raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
                    debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
                    debug_rcu_head_unqueue kernel/rcu/rcu.h:236 [inline]
                    rcu_do_batch kernel/rcu/tree.c:2188 [inline]
                    rcu_core+0xa70/0x1830 kernel/rcu/tree.c:2471
                    __do_softirq+0x2bc/0x943 kernel/softirq.c:554
                    invoke_softirq kernel/softirq.c:428 [inline]
                    __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
                    irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
                    instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
                    sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
                    asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
                    _compound_head include/linux/page-flags.h:247 [inline]
                    virt_to_folio include/linux/mm.h:1294 [inline]
                    virt_to_slab mm/kasan/../slab.h:204 [inline]
                    qlink_to_cache+0x1c/0xb0 mm/kasan/quarantine.c:131
                    qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:176
                    kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
                    __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
                    kasan_slab_alloc include/linux/kasan.h:201 [inline]
                    slab_post_alloc_hook mm/slub.c:3813 [inline]
                    slab_alloc_node mm/slub.c:3860 [inline]
                    kmem_cache_alloc_lru+0x175/0x350 mm/slub.c:3879
                    alloc_inode_sb include/linux/fs.h:3088 [inline]
                    shmem_alloc_inode+0x28/0x40 mm/shmem.c:4425
                    alloc_inode fs/inode.c:261 [inline]
                    new_inode_pseudo+0x69/0x1e0 fs/inode.c:1007
                    new_inode+0x22/0x1d0 fs/inode.c:1033
                    __shmem_get_inode mm/shmem.c:2477 [inline]
                    shmem_get_inode+0x34a/0xd40 mm/shmem.c:2548
                    shmem_mknod+0x5f/0x1d0 mm/shmem.c:3242
                    lookup_open fs/namei.c:3498 [inline]
                    open_last_lookups fs/namei.c:3567 [inline]
                    path_openat+0x1425/0x3240 fs/namei.c:3797
                    do_filp_open+0x235/0x490 fs/namei.c:3827
                    do_sys_openat2+0x13e/0x1d0 fs/open.c:1407
                    do_sys_open fs/open.c:1422 [inline]
                    __do_sys_openat fs/open.c:1438 [inline]
                    __se_sys_openat fs/open.c:1433 [inline]
                    __x64_sys_openat+0x247/0x2a0 fs/open.c:1433
                    do_syscall_64+0xfb/0x240
                    entry_SYSCALL_64_after_hwframe+0x6d/0x75
   INITIAL USE at:
                   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
                   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
                   _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
                   sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940
                   0xffffffffa000556a
                   bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
                   __bpf_prog_run include/linux/filter.h:657 [inline]
                   bpf_prog_run include/linux/filter.h:664 [inline]
                   __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
                   bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
                   trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
                   __mutex_lock_common kernel/locking/mutex.c:617 [inline]
                   __mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
                   tracepoint_probe_unregister+0x32/0x990 kernel/tracepoint.c:548
                   bpf_raw_tp_link_release+0x63/0x90 kernel/bpf/syscall.c:3482
                   bpf_link_free kernel/bpf/syscall.c:3033 [inline]
                   bpf_link_put_direct+0x123/0x1b0 kernel/bpf/syscall.c:3064
                   bpf_link_release+0x3b/0x50 kernel/bpf/syscall.c:3071
                   __fput+0x429/0x8a0 fs/file_table.c:423
                   __do_sys_close fs/open.c:1557 [inline]
                   __se_sys_close fs/open.c:1542 [inline]
                   __x64_sys_close+0x7f/0x110 fs/open.c:1542
                   do_syscall_64+0xfb/0x240
                   entry_SYSCALL_64_after_hwframe+0x6d/0x75
 }
 ... key      at: [<ffffffff94882300>] sock_hash_alloc.__key+0x0/0x20
 ... acquired at:
   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
   _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
   sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940
   bpf_prog_43221478a22f23b5+0x42/0x46
   bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
   __bpf_prog_run include/linux/filter.h:657 [inline]
   bpf_prog_run include/linux/filter.h:664 [inline]
   __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
   bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
   trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
   __pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
   pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
   queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
   queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
   do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
   _raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
   debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
   debug_rcu_head_unqueue kernel/rcu/rcu.h:236 [inline]
   rcu_do_batch kernel/rcu/tree.c:2188 [inline]
   rcu_core+0xa70/0x1830 kernel/rcu/tree.c:2471
   __do_softirq+0x2bc/0x943 kernel/softirq.c:554
   invoke_softirq kernel/softirq.c:428 [inline]
   __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
   irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
   instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
   sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
   asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
   _compound_head include/linux/page-flags.h:247 [inline]
   virt_to_folio include/linux/mm.h:1294 [inline]
   virt_to_slab mm/kasan/../slab.h:204 [inline]
   qlink_to_cache+0x1c/0xb0 mm/kasan/quarantine.c:131
   qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:176
   kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
   __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
   kasan_slab_alloc include/linux/kasan.h:201 [inline]
   slab_post_alloc_hook mm/slub.c:3813 [inline]
   slab_alloc_node mm/slub.c:3860 [inline]
   kmem_cache_alloc_lru+0x175/0x350 mm/slub.c:3879
   alloc_inode_sb include/linux/fs.h:3088 [inline]
   shmem_alloc_inode+0x28/0x40 mm/shmem.c:4425
   alloc_inode fs/inode.c:261 [inline]
   new_inode_pseudo+0x69/0x1e0 fs/inode.c:1007
   new_inode+0x22/0x1d0 fs/inode.c:1033
   __shmem_get_inode mm/shmem.c:2477 [inline]
   shmem_get_inode+0x34a/0xd40 mm/shmem.c:2548
   shmem_mknod+0x5f/0x1d0 mm/shmem.c:3242
   lookup_open fs/namei.c:3498 [inline]
   open_last_lookups fs/namei.c:3567 [inline]
   path_openat+0x1425/0x3240 fs/namei.c:3797
   do_filp_open+0x235/0x490 fs/namei.c:3827
   do_sys_openat2+0x13e/0x1d0 fs/open.c:1407
   do_sys_open fs/open.c:1422 [inline]
   __do_sys_openat fs/open.c:1438 [inline]
   __se_sys_openat fs/open.c:1433 [inline]
   __x64_sys_openat+0x247/0x2a0 fs/open.c:1433
   do_syscall_64+0xfb/0x240
   entry_SYSCALL_64_after_hwframe+0x6d/0x75


stack backtrace:
CPU: 1 PID: 5417 Comm: udevd Not tainted 6.8.0-syzkaller-05231-ga51cd6bf8e10-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
 print_bad_irq_dependency kernel/locking/lockdep.c:2626 [inline]
 check_irq_usage kernel/locking/lockdep.c:2865 [inline]
 check_prev_add kernel/locking/lockdep.c:3138 [inline]
 check_prevs_add kernel/locking/lockdep.c:3253 [inline]
 validate_chain+0x4dc7/0x58e0 kernel/locking/lockdep.c:3869
 __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
 lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
 _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
 sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940
 bpf_prog_43221478a22f23b5+0x42/0x46
 bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
 __bpf_prog_run include/linux/filter.h:657 [inline]
 bpf_prog_run include/linux/filter.h:664 [inline]
 __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
 bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
 trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
 __pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
 pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
 queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
 queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
 do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
 _raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
 debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
 debug_rcu_head_unqueue kernel/rcu/rcu.h:236 [inline]
 rcu_do_batch kernel/rcu/tree.c:2188 [inline]
 rcu_core+0xa70/0x1830 kernel/rcu/tree.c:2471
 __do_softirq+0x2bc/0x943 kernel/softirq.c:554
 invoke_softirq kernel/softirq.c:428 [inline]
 __irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
 sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:_compound_head include/linux/page-flags.h:249 [inline]
RIP: 0010:virt_to_folio include/linux/mm.h:1294 [inline]
RIP: 0010:virt_to_slab mm/kasan/../slab.h:204 [inline]
RIP: 0010:qlink_to_cache+0x1c/0xb0 mm/kasan/quarantine.c:131
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e8 ab a3 49 ff 48 c1 e8 06 48 83 e0 c0 48 ba 00 00 00 00 00 ea ff ff 48 8b 4c 10 08 <f6> c1 01 75 44 48 01 d0 66 90 48 8b 48 08 f6 c1 01 75 65 66 90 48
RSP: 0018:ffffc90004c1f6d0 EFLAGS: 00000206
RAX: 0000000000ba42c0 RBX: ffff88802e90b300 RCX: ffffea0000ba4201
RDX: ffffea0000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88801caa0780 R08: ffffffff8141ef1c R09: 1ffffffff2598ea5
R10: dffffc0000000000 R11: fffffbfff2598ea6 R12: 0000000000000000
R13: ffff88802e90b300 R14: ffffc90004c1f708 R15: 0000000000000000
 qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:176
 kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook mm/slub.c:3813 [inline]
 slab_alloc_node mm/slub.c:3860 [inline]
 kmem_cache_alloc_lru+0x175/0x350 mm/slub.c:3879
 alloc_inode_sb include/linux/fs.h:3088 [inline]
 shmem_alloc_inode+0x28/0x40 mm/shmem.c:4425
 alloc_inode fs/inode.c:261 [inline]
 new_inode_pseudo+0x69/0x1e0 fs/inode.c:1007
 new_inode+0x22/0x1d0 fs/inode.c:1033
 __shmem_get_inode mm/shmem.c:2477 [inline]
 shmem_get_inode+0x34a/0xd40 mm/shmem.c:2548
 shmem_mknod+0x5f/0x1d0 mm/shmem.c:3242
 lookup_open fs/namei.c:3498 [inline]
 open_last_lookups fs/namei.c:3567 [inline]
 path_openat+0x1425/0x3240 fs/namei.c:3797
 do_filp_open+0x235/0x490 fs/namei.c:3827
 do_sys_openat2+0x13e/0x1d0 fs/open.c:1407
 do_sys_open fs/open.c:1422 [inline]
 __do_sys_openat fs/open.c:1438 [inline]
 __se_sys_openat fs/open.c:1433 [inline]
 __x64_sys_openat+0x247/0x2a0 fs/open.c:1433
 do_syscall_64+0xfb/0x240
 entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7f2bc89169a4
Code: 24 20 48 8d 44 24 30 48 89 44 24 28 64 8b 04 25 18 00 00 00 85 c0 75 2c 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 76 60 48 8b 15 55 a4 0d 00 f7 d8 64 89 02 48 83
RSP: 002b:00007ffcd2db6460 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f2bc89169a4
RDX: 0000000000080241 RSI: 00007ffcd2db69a8 RDI: 00000000ffffff9c
RBP: 00007ffcd2db69a8 R08: 0000000000000004 R09: 0000000000000001
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000080241
R13: 000055cf24a9672e R14: 0000000000000001 R15: 000055cf24ab1160
 </TASK>
----------------
Code disassembly (best guess):
   0:	90                   	nop
   1:	90                   	nop
   2:	90                   	nop
   3:	90                   	nop
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	90                   	nop
   9:	90                   	nop
   a:	90                   	nop
   b:	90                   	nop
   c:	90                   	nop
   d:	90                   	nop
   e:	e8 ab a3 49 ff       	call   0xff49a3be
  13:	48 c1 e8 06          	shr    $0x6,%rax
  17:	48 83 e0 c0          	and    $0xffffffffffffffc0,%rax
  1b:	48 ba 00 00 00 00 00 	movabs $0xffffea0000000000,%rdx
  22:	ea ff ff
  25:	48 8b 4c 10 08       	mov    0x8(%rax,%rdx,1),%rcx
* 2a:	f6 c1 01             	test   $0x1,%cl <-- trapping instruction
  2d:	75 44                	jne    0x73
  2f:	48 01 d0             	add    %rdx,%rax
  32:	66 90                	xchg   %ax,%ax
  34:	48 8b 48 08          	mov    0x8(%rax),%rcx
  38:	f6 c1 01             	test   $0x1,%cl
  3b:	75 65                	jne    0xa2
  3d:	66 90                	xchg   %ax,%ax
  3f:	48                   	rex.W


Tested on:

commit:         a51cd6bf arm64: bpf: fix 32bit unconditional bswap
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=1797ba81180000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6fb1be60a193d440
dashboard link: https://syzkaller.appspot.com/bug?extid=c4f4d25859c2e5859988
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=100f1d66180000


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult
  2024-03-18 10:07 [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult syzbot
  2024-03-21  0:25 ` Edward Adam Davis
@ 2024-03-22  0:17 ` Edward Adam Davis
  2024-03-22 10:56   ` syzbot
  2024-03-23  5:42 ` [PATCH] bpf, sockmap: fix " Edward Adam Davis
  2024-04-20 14:51 ` [syzbot] [bpf?] [net?] possible " Tetsuo Handa
  3 siblings, 1 reply; 13+ messages in thread
From: Edward Adam Davis @ 2024-03-22  0:17 UTC (permalink / raw)
  To: syzbot+c4f4d25859c2e5859988; +Cc: linux-kernel, syzkaller-bugs

please test dl in rcu_report_exp_cpu_mult

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master


diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 27d733c0f65e..ae8f81b26e16 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -932,11 +932,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_bucket *bucket;
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
+	unsigned long flags;
 
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	spin_lock_bh(&bucket->lock);
+	spin_lock_irqsave(&bucket->lock, flags);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem) {
 		hlist_del_rcu(&elem->node);
@@ -944,7 +945,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 		sock_hash_free_elem(htab, elem);
 		ret = 0;
 	}
-	spin_unlock_bh(&bucket->lock);
+	spin_unlock_irqrestore(&bucket->lock, flags);
 	return ret;
 }
 
@@ -1136,6 +1137,7 @@ static void sock_hash_free(struct bpf_map *map)
 	struct bpf_shtab_elem *elem;
 	struct hlist_node *node;
 	int i;
+	unsigned long flags;
 
 	/* After the sync no updates or deletes will be in-flight so it
 	 * is safe to walk map and remove entries without risking a race
@@ -1151,11 +1153,11 @@ static void sock_hash_free(struct bpf_map *map)
 		 * exists, psock exists and holds a ref to socket. That
 		 * lets us to grab a socket ref too.
 		 */
-		spin_lock_bh(&bucket->lock);
+		spin_lock_irqsave(&bucket->lock, flags);
 		hlist_for_each_entry(elem, &bucket->head, node)
 			sock_hold(elem->sk);
 		hlist_move_list(&bucket->head, &unlink_list);
-		spin_unlock_bh(&bucket->lock);
+		spin_unlock_irqrestore(&bucket->lock, flags);
 
 		/* Process removed entries out of atomic context to
 		 * block for socket lock before deleting the psock's


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult
  2024-03-22  0:17 ` Edward Adam Davis
@ 2024-03-22 10:56   ` syzbot
  0 siblings, 0 replies; 13+ messages in thread
From: syzbot @ 2024-03-22 10:56 UTC (permalink / raw)
  To: eadavis, linux-kernel, syzkaller-bugs

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+c4f4d25859c2e5859988@syzkaller.appspotmail.com

Tested on:

commit:         ddb2ffdc libbpf: Define MFD_CLOEXEC if not available
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=170141a5180000
kernel config:  https://syzkaller.appspot.com/x/.config?x=6fb1be60a193d440
dashboard link: https://syzkaller.appspot.com/bug?extid=c4f4d25859c2e5859988
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch:          https://syzkaller.appspot.com/x/patch.diff?x=14b44711180000

Note: testing is done by a robot and is best-effort only.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult
  2024-03-18 10:07 [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult syzbot
  2024-03-21  0:25 ` Edward Adam Davis
  2024-03-22  0:17 ` Edward Adam Davis
@ 2024-03-23  5:42 ` Edward Adam Davis
  2024-03-23  7:08   ` Alexei Starovoitov
  2024-04-20 14:51 ` [syzbot] [bpf?] [net?] possible " Tetsuo Handa
  3 siblings, 1 reply; 13+ messages in thread
From: Edward Adam Davis @ 2024-03-23  5:42 UTC (permalink / raw)
  To: syzbot+c4f4d25859c2e5859988
  Cc: 42.hyeyoo, andrii, ast, bpf, daniel, davem, edumazet, jakub,
	john.fastabend, kafai, kpsingh, kuba, linux-kernel, namhyung,
	netdev, pabeni, peterz, songliubraving, syzkaller-bugs, yhs

[Syzbot reported]
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.8.0-syzkaller-05221-gea80e3ed09ab #0 Not tainted
-----------------------------------------------------
rcu_exp_gp_kthr/18 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939

and this task is already holding:
ffffffff8e136558 (rcu_node_0){-.-.}-{2:2}, at: sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
which would create a new lock dependency:
 (rcu_node_0){-.-.}-{2:2} -> (&htab->buckets[i].lock){+...}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (rcu_node_0){-.-.}-{2:2}

... which became HARDIRQ-irq-safe at:
  lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
  _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
  rcu_report_exp_cpu_mult+0x27/0x2f0 kernel/rcu/tree_exp.h:238
  csd_do_func kernel/smp.c:133 [inline]
  __flush_smp_call_function_queue+0xb2e/0x15b0 kernel/smp.c:542
  __sysvec_call_function_single+0xa8/0x3e0 arch/x86/kernel/smp.c:271
  instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
  sysvec_call_function_single+0x9e/0xc0 arch/x86/kernel/smp.c:266
  asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:709
  __sanitizer_cov_trace_switch+0x90/0x120
  update_event_printk kernel/trace/trace_events.c:2750 [inline]
  trace_event_eval_update+0x311/0xf90 kernel/trace/trace_events.c:2922
  process_one_work kernel/workqueue.c:3254 [inline]
  process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
  worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
  kthread+0x2f0/0x390 kernel/kthread.c:388
  ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

to a HARDIRQ-irq-unsafe lock:
 (&htab->buckets[i].lock){+...}-{2:2}

... which became HARDIRQ-irq-unsafe at:
...
  lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
  _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
  spin_lock_bh include/linux/spinlock.h:356 [inline]
  sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
  0xffffffffa0001b0e
  bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
  __bpf_prog_run include/linux/filter.h:657 [inline]
  bpf_prog_run include/linux/filter.h:664 [inline]
  __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
  bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
  trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
  __mutex_lock_common kernel/locking/mutex.c:617 [inline]
  __mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
  futex_cleanup_begin kernel/futex/core.c:1091 [inline]
  futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
  exit_mm_release+0x1a/0x30 kernel/fork.c:1652
  exit_mm+0xb0/0x310 kernel/exit.c:542
  do_exit+0x99e/0x27e0 kernel/exit.c:865
  do_group_exit+0x207/0x2c0 kernel/exit.c:1027
  __do_sys_exit_group kernel/exit.c:1038 [inline]
  __se_sys_exit_group kernel/exit.c:1036 [inline]
  __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
  do_syscall_64+0xfb/0x240
  entry_SYSCALL_64_after_hwframe+0x6d/0x75

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&htab->buckets[i].lock);
                               local_irq_disable();
                               lock(rcu_node_0);
                               lock(&htab->buckets[i].lock);
  <Interrupt>
    lock(rcu_node_0);

 *** DEADLOCK ***
[Fix]
Ensure that the context interrupt state is the same before and after using the
bucket->lock.

Reported-and-tested-by: syzbot+c4f4d25859c2e5859988@syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
---
 net/core/sock_map.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 27d733c0f65e..ae8f81b26e16 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -932,11 +932,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_bucket *bucket;
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
+	unsigned long flags;
 
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	spin_lock_bh(&bucket->lock);
+	spin_lock_irqsave(&bucket->lock, flags);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem) {
 		hlist_del_rcu(&elem->node);
@@ -944,7 +945,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 		sock_hash_free_elem(htab, elem);
 		ret = 0;
 	}
-	spin_unlock_bh(&bucket->lock);
+	spin_unlock_irqrestore(&bucket->lock, flags);
 	return ret;
 }
 
@@ -1136,6 +1137,7 @@ static void sock_hash_free(struct bpf_map *map)
 	struct bpf_shtab_elem *elem;
 	struct hlist_node *node;
 	int i;
+	unsigned long flags;
 
 	/* After the sync no updates or deletes will be in-flight so it
 	 * is safe to walk map and remove entries without risking a race
@@ -1151,11 +1153,11 @@ static void sock_hash_free(struct bpf_map *map)
 		 * exists, psock exists and holds a ref to socket. That
 		 * lets us to grab a socket ref too.
 		 */
-		spin_lock_bh(&bucket->lock);
+		spin_lock_irqsave(&bucket->lock, flags);
 		hlist_for_each_entry(elem, &bucket->head, node)
 			sock_hold(elem->sk);
 		hlist_move_list(&bucket->head, &unlink_list);
-		spin_unlock_bh(&bucket->lock);
+		spin_unlock_irqrestore(&bucket->lock, flags);
 
 		/* Process removed entries out of atomic context to
 		 * block for socket lock before deleting the psock's
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult
  2024-03-23  5:42 ` [PATCH] bpf, sockmap: fix " Edward Adam Davis
@ 2024-03-23  7:08   ` Alexei Starovoitov
  2024-03-25 12:23     ` Jakub Sitnicki
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2024-03-23  7:08 UTC (permalink / raw)
  To: Edward Adam Davis, John Fastabend
  Cc: syzbot+c4f4d25859c2e5859988, 42.hyeyoo, andrii, ast, bpf, daniel,
	davem, edumazet, jakub, kafai, kpsingh, kuba, linux-kernel,
	namhyung, netdev, pabeni, peterz, songliubraving, syzkaller-bugs,
	yhs

John,
please review.
It seems this bug was causing multiple syzbot reports.

On Fri, Mar 22, 2024 at 10:42 PM Edward Adam Davis <eadavis@qq.com> wrote:
>
> [Syzbot reported]
> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> 6.8.0-syzkaller-05221-gea80e3ed09ab #0 Not tainted
> -----------------------------------------------------
> rcu_exp_gp_kthr/18 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
> ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
> ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
>
> and this task is already holding:
> ffffffff8e136558 (rcu_node_0){-.-.}-{2:2}, at: sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
> which would create a new lock dependency:
>  (rcu_node_0){-.-.}-{2:2} -> (&htab->buckets[i].lock){+...}-{2:2}
>
> but this new dependency connects a HARDIRQ-irq-safe lock:
>  (rcu_node_0){-.-.}-{2:2}
>
> ... which became HARDIRQ-irq-safe at:
>   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
>   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>   _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
>   rcu_report_exp_cpu_mult+0x27/0x2f0 kernel/rcu/tree_exp.h:238
>   csd_do_func kernel/smp.c:133 [inline]
>   __flush_smp_call_function_queue+0xb2e/0x15b0 kernel/smp.c:542
>   __sysvec_call_function_single+0xa8/0x3e0 arch/x86/kernel/smp.c:271
>   instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
>   sysvec_call_function_single+0x9e/0xc0 arch/x86/kernel/smp.c:266
>   asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:709
>   __sanitizer_cov_trace_switch+0x90/0x120
>   update_event_printk kernel/trace/trace_events.c:2750 [inline]
>   trace_event_eval_update+0x311/0xf90 kernel/trace/trace_events.c:2922
>   process_one_work kernel/workqueue.c:3254 [inline]
>   process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
>   worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
>   kthread+0x2f0/0x390 kernel/kthread.c:388
>   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
>
> to a HARDIRQ-irq-unsafe lock:
>  (&htab->buckets[i].lock){+...}-{2:2}
>
> ... which became HARDIRQ-irq-unsafe at:
> ...
>   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
>   __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
>   _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
>   spin_lock_bh include/linux/spinlock.h:356 [inline]
>   sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
>   0xffffffffa0001b0e
>   bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
>   __bpf_prog_run include/linux/filter.h:657 [inline]
>   bpf_prog_run include/linux/filter.h:664 [inline]
>   __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
>   bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
>   trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
>   __mutex_lock_common kernel/locking/mutex.c:617 [inline]
>   __mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
>   futex_cleanup_begin kernel/futex/core.c:1091 [inline]
>   futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
>   exit_mm_release+0x1a/0x30 kernel/fork.c:1652
>   exit_mm+0xb0/0x310 kernel/exit.c:542
>   do_exit+0x99e/0x27e0 kernel/exit.c:865
>   do_group_exit+0x207/0x2c0 kernel/exit.c:1027
>   __do_sys_exit_group kernel/exit.c:1038 [inline]
>   __se_sys_exit_group kernel/exit.c:1036 [inline]
>   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
>   do_syscall_64+0xfb/0x240
>   entry_SYSCALL_64_after_hwframe+0x6d/0x75
>
> other info that might help us debug this:
>
>  Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&htab->buckets[i].lock);
>                                local_irq_disable();
>                                lock(rcu_node_0);
>                                lock(&htab->buckets[i].lock);
>   <Interrupt>
>     lock(rcu_node_0);
>
>  *** DEADLOCK ***
> [Fix]
> Ensure that the context interrupt state is the same before and after using the
> bucket->lock.
>
> Reported-and-tested-by: syzbot+c4f4d25859c2e5859988@syzkaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
> ---
>  net/core/sock_map.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
> index 27d733c0f65e..ae8f81b26e16 100644
> --- a/net/core/sock_map.c
> +++ b/net/core/sock_map.c
> @@ -932,11 +932,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>         struct bpf_shtab_bucket *bucket;
>         struct bpf_shtab_elem *elem;
>         int ret = -ENOENT;
> +       unsigned long flags;
>
>         hash = sock_hash_bucket_hash(key, key_size);
>         bucket = sock_hash_select_bucket(htab, hash);
>
> -       spin_lock_bh(&bucket->lock);
> +       spin_lock_irqsave(&bucket->lock, flags);
>         elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
>         if (elem) {
>                 hlist_del_rcu(&elem->node);
> @@ -944,7 +945,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>                 sock_hash_free_elem(htab, elem);
>                 ret = 0;
>         }
> -       spin_unlock_bh(&bucket->lock);
> +       spin_unlock_irqrestore(&bucket->lock, flags);
>         return ret;
>  }
>
> @@ -1136,6 +1137,7 @@ static void sock_hash_free(struct bpf_map *map)
>         struct bpf_shtab_elem *elem;
>         struct hlist_node *node;
>         int i;
> +       unsigned long flags;
>
>         /* After the sync no updates or deletes will be in-flight so it
>          * is safe to walk map and remove entries without risking a race
> @@ -1151,11 +1153,11 @@ static void sock_hash_free(struct bpf_map *map)
>                  * exists, psock exists and holds a ref to socket. That
>                  * lets us to grab a socket ref too.
>                  */
> -               spin_lock_bh(&bucket->lock);
> +               spin_lock_irqsave(&bucket->lock, flags);
>                 hlist_for_each_entry(elem, &bucket->head, node)
>                         sock_hold(elem->sk);
>                 hlist_move_list(&bucket->head, &unlink_list);
> -               spin_unlock_bh(&bucket->lock);
> +               spin_unlock_irqrestore(&bucket->lock, flags);
>
>                 /* Process removed entries out of atomic context to
>                  * block for socket lock before deleting the psock's
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult
  2024-03-23  7:08   ` Alexei Starovoitov
@ 2024-03-25 12:23     ` Jakub Sitnicki
  2024-03-25 13:49       ` Jakub Sitnicki
  2024-03-26 22:15       ` Jakub Sitnicki
  0 siblings, 2 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2024-03-25 12:23 UTC (permalink / raw)
  To: Alexei Starovoitov, Edward Adam Davis, John Fastabend
  Cc: syzbot+c4f4d25859c2e5859988, 42.hyeyoo, andrii, ast, bpf, daniel,
	davem, edumazet, kafai, kpsingh, kuba, linux-kernel, namhyung,
	netdev, pabeni, peterz, songliubraving, syzkaller-bugs, yhs

On Sat, Mar 23, 2024 at 12:08 AM -07, Alexei Starovoitov wrote:
> John,
> please review.
> It seems this bug was causing multiple syzbot reports.

Any chance we could disallow mutating sockhash from interrupt context?

If that is not an option, then this looks like a good start of a fix.
But we also need to cover sock_map_unref->sock_sock_map_del_link called
from sock_hash_delete_elem. It also grabs a spin lock.

Also, sockhash is not the only affected map type. I see we're grabbing a
spin lock in ->map_delete_elem without disabling interrupts as well in:

- sock_map_delete_elem
- reuseport_array_delete_elem
- xsk_map_delete_elem

> On Fri, Mar 22, 2024 at 10:42 PM Edward Adam Davis <eadavis@qq.com> wrote:
>>
>> [Syzbot reported]
>> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>> 6.8.0-syzkaller-05221-gea80e3ed09ab #0 Not tainted
>> -----------------------------------------------------
>> rcu_exp_gp_kthr/18 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
>> ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
>> ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
>>
>> and this task is already holding:
>> ffffffff8e136558 (rcu_node_0){-.-.}-{2:2}, at: sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
>> which would create a new lock dependency:
>>  (rcu_node_0){-.-.}-{2:2} -> (&htab->buckets[i].lock){+...}-{2:2}
>>
>> but this new dependency connects a HARDIRQ-irq-safe lock:
>>  (rcu_node_0){-.-.}-{2:2}
>>
>> ... which became HARDIRQ-irq-safe at:
>>   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
>>   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>>   _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
>>   rcu_report_exp_cpu_mult+0x27/0x2f0 kernel/rcu/tree_exp.h:238
>>   csd_do_func kernel/smp.c:133 [inline]
>>   __flush_smp_call_function_queue+0xb2e/0x15b0 kernel/smp.c:542
>>   __sysvec_call_function_single+0xa8/0x3e0 arch/x86/kernel/smp.c:271
>>   instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
>>   sysvec_call_function_single+0x9e/0xc0 arch/x86/kernel/smp.c:266
>>   asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:709
>>   __sanitizer_cov_trace_switch+0x90/0x120
>>   update_event_printk kernel/trace/trace_events.c:2750 [inline]
>>   trace_event_eval_update+0x311/0xf90 kernel/trace/trace_events.c:2922
>>   process_one_work kernel/workqueue.c:3254 [inline]
>>   process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
>>   worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
>>   kthread+0x2f0/0x390 kernel/kthread.c:388
>>   ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>>   ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
>>
>> to a HARDIRQ-irq-unsafe lock:
>>  (&htab->buckets[i].lock){+...}-{2:2}
>>
>> ... which became HARDIRQ-irq-unsafe at:
>> ...
>>   lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
>>   __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
>>   _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
>>   spin_lock_bh include/linux/spinlock.h:356 [inline]
>>   sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
>>   0xffffffffa0001b0e
>>   bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
>>   __bpf_prog_run include/linux/filter.h:657 [inline]
>>   bpf_prog_run include/linux/filter.h:664 [inline]
>>   __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
>>   bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
>>   trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
>>   __mutex_lock_common kernel/locking/mutex.c:617 [inline]
>>   __mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
>>   futex_cleanup_begin kernel/futex/core.c:1091 [inline]
>>   futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
>>   exit_mm_release+0x1a/0x30 kernel/fork.c:1652
>>   exit_mm+0xb0/0x310 kernel/exit.c:542
>>   do_exit+0x99e/0x27e0 kernel/exit.c:865
>>   do_group_exit+0x207/0x2c0 kernel/exit.c:1027
>>   __do_sys_exit_group kernel/exit.c:1038 [inline]
>>   __se_sys_exit_group kernel/exit.c:1036 [inline]
>>   __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
>>   do_syscall_64+0xfb/0x240
>>   entry_SYSCALL_64_after_hwframe+0x6d/0x75
>>
>> other info that might help us debug this:
>>
>>  Possible interrupt unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(&htab->buckets[i].lock);
>>                                local_irq_disable();
>>                                lock(rcu_node_0);
>>                                lock(&htab->buckets[i].lock);
>>   <Interrupt>
>>     lock(rcu_node_0);
>>
>>  *** DEADLOCK ***
>> [Fix]
>> Ensure that the context interrupt state is the same before and after using the
>> bucket->lock.
>>
>> Reported-and-tested-by: syzbot+c4f4d25859c2e5859988@syzkaller.appspotmail.com
>> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
>> ---
>>  net/core/sock_map.c | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> index 27d733c0f65e..ae8f81b26e16 100644
>> --- a/net/core/sock_map.c
>> +++ b/net/core/sock_map.c
>> @@ -932,11 +932,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>>         struct bpf_shtab_bucket *bucket;
>>         struct bpf_shtab_elem *elem;
>>         int ret = -ENOENT;
>> +       unsigned long flags;
>>
>>         hash = sock_hash_bucket_hash(key, key_size);
>>         bucket = sock_hash_select_bucket(htab, hash);
>>
>> -       spin_lock_bh(&bucket->lock);
>> +       spin_lock_irqsave(&bucket->lock, flags);
>>         elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
>>         if (elem) {
>>                 hlist_del_rcu(&elem->node);
>> @@ -944,7 +945,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>>                 sock_hash_free_elem(htab, elem);
>>                 ret = 0;
>>         }
>> -       spin_unlock_bh(&bucket->lock);
>> +       spin_unlock_irqrestore(&bucket->lock, flags);
>>         return ret;
>>  }
>>
>> @@ -1136,6 +1137,7 @@ static void sock_hash_free(struct bpf_map *map)
>>         struct bpf_shtab_elem *elem;
>>         struct hlist_node *node;
>>         int i;
>> +       unsigned long flags;
>>
>>         /* After the sync no updates or deletes will be in-flight so it
>>          * is safe to walk map and remove entries without risking a race
>> @@ -1151,11 +1153,11 @@ static void sock_hash_free(struct bpf_map *map)
>>                  * exists, psock exists and holds a ref to socket. That
>>                  * lets us to grab a socket ref too.
>>                  */
>> -               spin_lock_bh(&bucket->lock);
>> +               spin_lock_irqsave(&bucket->lock, flags);
>>                 hlist_for_each_entry(elem, &bucket->head, node)
>>                         sock_hold(elem->sk);
>>                 hlist_move_list(&bucket->head, &unlink_list);
>> -               spin_unlock_bh(&bucket->lock);
>> +               spin_unlock_irqrestore(&bucket->lock, flags);
>>
>>                 /* Process removed entries out of atomic context to
>>                  * block for socket lock before deleting the psock's
>> --
>> 2.43.0
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult
  2024-03-25 12:23     ` Jakub Sitnicki
@ 2024-03-25 13:49       ` Jakub Sitnicki
  2024-03-29  5:29         ` John Fastabend
  2024-03-26 22:15       ` Jakub Sitnicki
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Sitnicki @ 2024-03-25 13:49 UTC (permalink / raw)
  To: Alexei Starovoitov, Edward Adam Davis, John Fastabend
  Cc: syzbot+c4f4d25859c2e5859988, 42.hyeyoo, andrii, ast, bpf, daniel,
	davem, edumazet, kafai, kpsingh, kuba, linux-kernel, namhyung,
	netdev, pabeni, peterz, songliubraving, syzkaller-bugs, yhs

On Mon, Mar 25, 2024 at 01:23 PM +01, Jakub Sitnicki wrote:

[...]

> But we also need to cover sock_map_unref->sock_sock_map_del_link called
> from sock_hash_delete_elem. It also grabs a spin lock.

On second look, no need to disable interrupts in
sock_map_unref->sock_sock_map_del_link. Call is enclosed in the critical
section in sock_hash_delete_elem that has been updated.

I have a question, though, why are we patching sock_hash_free? It
doesn't get called unless there are no more existing users of the BPF
map. So nothing can mutate it from interrupt context.

[...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult
  2024-03-25 12:23     ` Jakub Sitnicki
  2024-03-25 13:49       ` Jakub Sitnicki
@ 2024-03-26 22:15       ` Jakub Sitnicki
  2024-03-29 15:52         ` Shung-Hsi Yu
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Sitnicki @ 2024-03-26 22:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Edward Adam Davis, John Fastabend
  Cc: syzbot+c4f4d25859c2e5859988, 42.hyeyoo, andrii, ast, bpf, daniel,
	davem, edumazet, kafai, kpsingh, kuba, linux-kernel, namhyung,
	netdev, pabeni, peterz, songliubraving, syzkaller-bugs, yhs

On Mon, Mar 25, 2024 at 01:23 PM +01, Jakub Sitnicki wrote:
> On Sat, Mar 23, 2024 at 12:08 AM -07, Alexei Starovoitov wrote:
>> It seems this bug was causing multiple syzbot reports.
> Any chance we could disallow mutating sockhash from interrupt context?

I've been playing with the repro from one of the other reports:

https://lore.kernel.org/all/CABOYnLzaRiZ+M1v7dPaeObnj_=S4JYmWbgrXaYsyBbWh=553vQ@mail.gmail.com/

syzkaller workload is artificial. So, if we can avoid it, I'd rather not
support modifying sockmap/sockhash in contexts where irqs are disabled,
and lock safety rules are stricter than what we abide to today.

Ideally, we allow task and softirq contexts with irqs enabled (so no
tracing progs attached to timer tick, which syzcaller is using as corpus
here). Otherwise, we will have to cover for that in selftests.

I'm thinking about a restriction like:

---8<---

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 27d733c0f65e..3692f7256dd6 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -907,6 +907,7 @@ static void sock_hash_delete_from_link(struct bpf_map *map, struct sock *sk,
 	struct bpf_shtab_elem *elem_probe, *elem = link_raw;
 	struct bpf_shtab_bucket *bucket;
 
+	WARN_ON_ONCE(irqs_disabled());
 	WARN_ON_ONCE(!rcu_read_lock_held());
 	bucket = sock_hash_select_bucket(htab, elem->hash);
 
@@ -933,6 +934,10 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
 
+	/* Can't run. We don't play nice with hardirq-safe locks. */
+	if (irqs_disabled())
+		return -EOPNOTSUPP;
+
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
@@ -986,6 +991,7 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
 	struct sk_psock *psock;
 	int ret;
 
+	WARN_ON_ONCE(irqs_disabled());
 	WARN_ON_ONCE(!rcu_read_lock_held());
 	if (unlikely(flags > BPF_EXIST))
 		return -EINVAL;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult
  2024-03-25 13:49       ` Jakub Sitnicki
@ 2024-03-29  5:29         ` John Fastabend
  0 siblings, 0 replies; 13+ messages in thread
From: John Fastabend @ 2024-03-29  5:29 UTC (permalink / raw)
  To: Jakub Sitnicki, Alexei Starovoitov, Edward Adam Davis, John Fastabend
  Cc: syzbot+c4f4d25859c2e5859988, 42.hyeyoo, andrii, ast, bpf, daniel,
	davem, edumazet, kafai, kpsingh, kuba, linux-kernel, namhyung,
	netdev, pabeni, peterz, songliubraving, syzkaller-bugs, yhs

Jakub Sitnicki wrote:
> On Mon, Mar 25, 2024 at 01:23 PM +01, Jakub Sitnicki wrote:
> 
> [...]
> 
> > But we also need to cover sock_map_unref->sock_sock_map_del_link called
> > from sock_hash_delete_elem. It also grabs a spin lock.
> 
> On second look, no need to disable interrupts in
> sock_map_unref->sock_sock_map_del_link. Call is enclosed in the critical
> section in sock_hash_delete_elem that has been updated.
> 
> I have a question, though, why are we patching sock_hash_free? It
> doesn't get called unless there are no more existing users of the BPF
> map. So nothing can mutate it from interrupt context.
> 
> [...]

Agree sock_hash_free should be only after all refs are dropped.

Edward, did you want to send a v2 for this? Also if you want fixing the
sockmap case as well would be useful. Also happy to finish up the patches
if you would rather not.

Thanks,
John

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult
  2024-03-26 22:15       ` Jakub Sitnicki
@ 2024-03-29 15:52         ` Shung-Hsi Yu
  0 siblings, 0 replies; 13+ messages in thread
From: Shung-Hsi Yu @ 2024-03-29 15:52 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Alexei Starovoitov, Edward Adam Davis, John Fastabend,
	syzbot+c4f4d25859c2e5859988, 42.hyeyoo, andrii, ast, bpf, daniel,
	davem, edumazet, kafai, kpsingh, kuba, linux-kernel, namhyung,
	netdev, pabeni, peterz, songliubraving, syzkaller-bugs, yhs,
	Xin Liu

On Tue, Mar 26, 2024 at 11:15:47PM +0100, Jakub Sitnicki wrote:
> On Mon, Mar 25, 2024 at 01:23 PM +01, Jakub Sitnicki wrote:
> > On Sat, Mar 23, 2024 at 12:08 AM -07, Alexei Starovoitov wrote:
> >> It seems this bug was causing multiple syzbot reports.
> > Any chance we could disallow mutating sockhash from interrupt context?
> 
> I've been playing with the repro from one of the other reports:
> 
> https://lore.kernel.org/all/CABOYnLzaRiZ+M1v7dPaeObnj_=S4JYmWbgrXaYsyBbWh=553vQ@mail.gmail.com/

Possibly also related:
- "A potential deadlock in sockhash map"[1] report awhile back
- commit ed17aa92dc56b ("bpf, sockmap: fix deadlocks in the sockhash and
  sockmap")
- commit 8c5c2a4898e3d ("bpf, sockmap: Revert buggy deadlock fix in the
  sockhash and sockmap")

1: https://lore.kernel.org/all/CABcoxUayum5oOqFMMqAeWuS8+EzojquSOSyDA3J_2omY=2EeAg@mail.gmail.com/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult
  2024-03-18 10:07 [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult syzbot
                   ` (2 preceding siblings ...)
  2024-03-23  5:42 ` [PATCH] bpf, sockmap: fix " Edward Adam Davis
@ 2024-04-20 14:51 ` Tetsuo Handa
  3 siblings, 0 replies; 13+ messages in thread
From: Tetsuo Handa @ 2024-04-20 14:51 UTC (permalink / raw)
  To: syzbot, linux-kernel, syzkaller-bugs

#syz fix: bpf, sockmap: Prevent lock inversion deadlock in map delete elem


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-04-20 14:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-18 10:07 [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult syzbot
2024-03-21  0:25 ` Edward Adam Davis
2024-03-21 15:04   ` syzbot
2024-03-22  0:17 ` Edward Adam Davis
2024-03-22 10:56   ` syzbot
2024-03-23  5:42 ` [PATCH] bpf, sockmap: fix " Edward Adam Davis
2024-03-23  7:08   ` Alexei Starovoitov
2024-03-25 12:23     ` Jakub Sitnicki
2024-03-25 13:49       ` Jakub Sitnicki
2024-03-29  5:29         ` John Fastabend
2024-03-26 22:15       ` Jakub Sitnicki
2024-03-29 15:52         ` Shung-Hsi Yu
2024-04-20 14:51 ` [syzbot] [bpf?] [net?] possible " Tetsuo Handa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.