From: Jakub Sitnicki <jakub@cloudflare.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Edward Adam Davis <eadavis@qq.com>,
John Fastabend <john.fastabend@gmail.com>
Cc: syzbot+c4f4d25859c2e5859988@syzkaller.appspotmail.com,
42.hyeyoo@gmail.com, andrii@kernel.org, ast@kernel.org,
bpf@vger.kernel.org, daniel@iogearbox.net, davem@davemloft.net,
edumazet@google.com, kafai@fb.com, kpsingh@kernel.org,
kuba@kernel.org, linux-kernel@vger.kernel.org,
namhyung@kernel.org, netdev@vger.kernel.org, pabeni@redhat.com,
peterz@infradead.org, songliubraving@fb.com,
syzkaller-bugs@googlegroups.com, yhs@fb.com
Subject: Re: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult
Date: Mon, 25 Mar 2024 13:23:07 +0100 [thread overview]
Message-ID: <87y1a6biie.fsf@cloudflare.com> (raw)
In-Reply-To: <CAADnVQJQvcZOA_BbFxPqNyRbMdKTBSMnf=cKvW7NJ8LxxP54sA@mail.gmail.com>
On Sat, Mar 23, 2024 at 12:08 AM -07, Alexei Starovoitov wrote:
> John,
> please review.
> It seems this bug was causing multiple syzbot reports.
Any chance we could disallow mutating sockhash from interrupt context?
If that is not an option, then this looks like a good start of a fix.
But we also need to cover sock_map_unref->sock_sock_map_del_link called
from sock_hash_delete_elem. It also grabs a spin lock.
Also, sockhash is not the only affected map type. I see we're grabbing a
spin lock in ->map_delete_elem without disabling interrupts as well in:
- sock_map_delete_elem
- reuseport_array_delete_elem
- xsk_map_delete_elem
> On Fri, Mar 22, 2024 at 10:42 PM Edward Adam Davis <eadavis@qq.com> wrote:
>>
>> [Syzbot reported]
>> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
>> 6.8.0-syzkaller-05221-gea80e3ed09ab #0 Not tainted
>> -----------------------------------------------------
>> rcu_exp_gp_kthr/18 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
>> ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
>> ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
>>
>> and this task is already holding:
>> ffffffff8e136558 (rcu_node_0){-.-.}-{2:2}, at: sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
>> which would create a new lock dependency:
>> (rcu_node_0){-.-.}-{2:2} -> (&htab->buckets[i].lock){+...}-{2:2}
>>
>> but this new dependency connects a HARDIRQ-irq-safe lock:
>> (rcu_node_0){-.-.}-{2:2}
>>
>> ... which became HARDIRQ-irq-safe at:
>> lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
>> __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>> _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
>> rcu_report_exp_cpu_mult+0x27/0x2f0 kernel/rcu/tree_exp.h:238
>> csd_do_func kernel/smp.c:133 [inline]
>> __flush_smp_call_function_queue+0xb2e/0x15b0 kernel/smp.c:542
>> __sysvec_call_function_single+0xa8/0x3e0 arch/x86/kernel/smp.c:271
>> instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
>> sysvec_call_function_single+0x9e/0xc0 arch/x86/kernel/smp.c:266
>> asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:709
>> __sanitizer_cov_trace_switch+0x90/0x120
>> update_event_printk kernel/trace/trace_events.c:2750 [inline]
>> trace_event_eval_update+0x311/0xf90 kernel/trace/trace_events.c:2922
>> process_one_work kernel/workqueue.c:3254 [inline]
>> process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
>> worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
>> kthread+0x2f0/0x390 kernel/kthread.c:388
>> ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
>> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
>>
>> to a HARDIRQ-irq-unsafe lock:
>> (&htab->buckets[i].lock){+...}-{2:2}
>>
>> ... which became HARDIRQ-irq-unsafe at:
>> ...
>> lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
>> __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
>> _raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
>> spin_lock_bh include/linux/spinlock.h:356 [inline]
>> sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
>> 0xffffffffa0001b0e
>> bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
>> __bpf_prog_run include/linux/filter.h:657 [inline]
>> bpf_prog_run include/linux/filter.h:664 [inline]
>> __bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
>> bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
>> trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
>> __mutex_lock_common kernel/locking/mutex.c:617 [inline]
>> __mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
>> futex_cleanup_begin kernel/futex/core.c:1091 [inline]
>> futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
>> exit_mm_release+0x1a/0x30 kernel/fork.c:1652
>> exit_mm+0xb0/0x310 kernel/exit.c:542
>> do_exit+0x99e/0x27e0 kernel/exit.c:865
>> do_group_exit+0x207/0x2c0 kernel/exit.c:1027
>> __do_sys_exit_group kernel/exit.c:1038 [inline]
>> __se_sys_exit_group kernel/exit.c:1036 [inline]
>> __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
>> do_syscall_64+0xfb/0x240
>> entry_SYSCALL_64_after_hwframe+0x6d/0x75
>>
>> other info that might help us debug this:
>>
>> Possible interrupt unsafe locking scenario:
>>
>> CPU0 CPU1
>> ---- ----
>> lock(&htab->buckets[i].lock);
>> local_irq_disable();
>> lock(rcu_node_0);
>> lock(&htab->buckets[i].lock);
>> <Interrupt>
>> lock(rcu_node_0);
>>
>> *** DEADLOCK ***
>> [Fix]
>> Ensure that the context interrupt state is the same before and after using the
>> bucket->lock.
>>
>> Reported-and-tested-by: syzbot+c4f4d25859c2e5859988@syzkaller.appspotmail.com
>> Signed-off-by: Edward Adam Davis <eadavis@qq.com>
>> ---
>> net/core/sock_map.c | 10 ++++++----
>> 1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> index 27d733c0f65e..ae8f81b26e16 100644
>> --- a/net/core/sock_map.c
>> +++ b/net/core/sock_map.c
>> @@ -932,11 +932,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>> struct bpf_shtab_bucket *bucket;
>> struct bpf_shtab_elem *elem;
>> int ret = -ENOENT;
>> + unsigned long flags;
>>
>> hash = sock_hash_bucket_hash(key, key_size);
>> bucket = sock_hash_select_bucket(htab, hash);
>>
>> - spin_lock_bh(&bucket->lock);
>> + spin_lock_irqsave(&bucket->lock, flags);
>> elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
>> if (elem) {
>> hlist_del_rcu(&elem->node);
>> @@ -944,7 +945,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
>> sock_hash_free_elem(htab, elem);
>> ret = 0;
>> }
>> - spin_unlock_bh(&bucket->lock);
>> + spin_unlock_irqrestore(&bucket->lock, flags);
>> return ret;
>> }
>>
>> @@ -1136,6 +1137,7 @@ static void sock_hash_free(struct bpf_map *map)
>> struct bpf_shtab_elem *elem;
>> struct hlist_node *node;
>> int i;
>> + unsigned long flags;
>>
>> /* After the sync no updates or deletes will be in-flight so it
>> * is safe to walk map and remove entries without risking a race
>> @@ -1151,11 +1153,11 @@ static void sock_hash_free(struct bpf_map *map)
>> * exists, psock exists and holds a ref to socket. That
>> * lets us to grab a socket ref too.
>> */
>> - spin_lock_bh(&bucket->lock);
>> + spin_lock_irqsave(&bucket->lock, flags);
>> hlist_for_each_entry(elem, &bucket->head, node)
>> sock_hold(elem->sk);
>> hlist_move_list(&bucket->head, &unlink_list);
>> - spin_unlock_bh(&bucket->lock);
>> + spin_unlock_irqrestore(&bucket->lock, flags);
>>
>> /* Process removed entries out of atomic context to
>> * block for socket lock before deleting the psock's
>> --
>> 2.43.0
>>
next prev parent reply other threads:[~2024-03-25 12:31 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-18 10:07 [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult syzbot
2024-03-23 5:42 ` [PATCH] bpf, sockmap: fix " Edward Adam Davis
2024-03-23 7:08 ` Alexei Starovoitov
2024-03-25 12:23 ` Jakub Sitnicki [this message]
2024-03-25 13:49 ` Jakub Sitnicki
2024-03-29 5:29 ` John Fastabend
2024-03-26 22:15 ` Jakub Sitnicki
2024-03-29 15:52 ` Shung-Hsi Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87y1a6biie.fsf@cloudflare.com \
--to=jakub@cloudflare.com \
--cc=42.hyeyoo@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=eadavis@qq.com \
--cc=edumazet@google.com \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=namhyung@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=peterz@infradead.org \
--cc=songliubraving@fb.com \
--cc=syzbot+c4f4d25859c2e5859988@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).