linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] kernel: bpf: stackmap: fix a possible sleep-in-atomic bug in bpf_mmap_unlock_get_irq_work()
@ 2023-03-17  3:52 starmiku1207184332
  2023-03-17 17:16 ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: starmiku1207184332 @ 2023-03-17  3:52 UTC (permalink / raw)
  To: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa
  Cc: bpf, linux-kernel, baijiaju1990, Teng Qi

From: Teng Qi <starmiku1207184332@gmail.com>

bpf_mmap_unlock_get_irq_work() and bpf_mmap_unlock_mm() cooperate to safely
acquire mm->mmap_lock safely. The code in bpf_mmap_unlock_get_irq_work():
 struct mmap_unlock_irq_work *work = NULL;
 bool irq_work_busy = false;
 if (irqs_disabled()) {
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
 		work = this_cpu_ptr(&mmap_unlock_work);
 		if (irq_work_is_busy(&work->irq_work)) {
 			irq_work_busy = true;
 		}
 	} else {
 		irq_work_busy = true;
 	}
 }
 *work_ptr = work;

shows that the pointer of struct mmap_unlock_irq_work "work" is not NULL if
irqs_disabled() == true and IS_ENABLED(CONFIG_PREEMPT_RT) == false or NULL in
other cases. The "work" will be passed to bpf_mmap_unlock_mm() as the argument.
The code in bpf_mmap_unlock_mm():
 if (!work) {
 	mmap_read_unlock(mm);
 } else {
 	work->mm = mm;
 	rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_);
 	irq_work_queue(&work->irq_work);
 }

shows that mm->mmap_lock is released directly if "work" is NULL. Otherwise,
irq_work_queue is called to avoid calling mmap_read_unlock() in an irq disabled
context because of its possible sleep operation. However, mmap_read_unlock()
is unsafely called in a preempt disabled context when spin_lock() or
rcu_read_lock() has been called.

We found that some ebpf helpers that call these two functions may be invoked in
preempt disabled contexts through various hooks. We can give an example:
 SEC("kprobe/kmem_cache_free")
 int bpf_prog1(struct pt_regs *ctx)
 {
 	char buff[50];
 	bpf_get_stack(ctx, buff, sizeof(struct bpf_stack_build_id),
	              BPF_F_USER_BUILD_ID | BPF_F_USER_STACK);
 	return 0;
 }

The hook "kprobe/kmem_cache_free" is often called in preempt disabled contexts
by many modules. To fix this possible bug, we add in_atomic() in
bpf_mmap_unlock_get_irq_work().


Signed-off-by: Teng Qi <starmiku1207184332@gmail.com>
---
v2:
Thank for John Fastabend`s friendly response.

We are currently developing a static analysis tool to detect eBPF bugs in the
kernel. During our work, we discovered several possible bugs, including this
one. Unfortunately, we do not have enough information to provide a runnable
case (e.g. selftest case) that would trigger this bug, nor do we have a stack 
trace to offer. Going forward, we plan to improve our tool to provide necessary
information to construct a runnable case thst could reproduce this bug.

Fixes: 7c7e3d31e785 ("bpf: Introduce helper bpf_find_vma")
---
 kernel/bpf/mmap_unlock_work.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/mmap_unlock_work.h b/kernel/bpf/mmap_unlock_work.h
index 5d18d7d85bef..3d472d24d88f 100644
--- a/kernel/bpf/mmap_unlock_work.h
+++ b/kernel/bpf/mmap_unlock_work.h
@@ -26,7 +26,7 @@ static inline bool bpf_mmap_unlock_get_irq_work(struct mmap_unlock_irq_work **wo
 	struct mmap_unlock_irq_work *work = NULL;
 	bool irq_work_busy = false;
 
-	if (irqs_disabled()) {
+	if (in_atomic() || irqs_disabled()) {
 		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
 			work = this_cpu_ptr(&mmap_unlock_work);
 			if (irq_work_is_busy(&work->irq_work)) {
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] kernel: bpf: stackmap: fix a possible sleep-in-atomic bug in bpf_mmap_unlock_get_irq_work()
  2023-03-17  3:52 [PATCH v2] kernel: bpf: stackmap: fix a possible sleep-in-atomic bug in bpf_mmap_unlock_get_irq_work() starmiku1207184332
@ 2023-03-17 17:16 ` Alexei Starovoitov
  2023-03-17 18:14   ` John Fastabend
       [not found]   ` <CALyQVayJaZ_s9yuL07ReZRmTT52ua7B+92CdYnLi9GiegpOKNw@mail.gmail.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2023-03-17 17:16 UTC (permalink / raw)
  To: starmiku1207184332
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, linux-kernel, baijiaju1990

On Fri, Mar 17, 2023 at 03:52:27AM +0000, starmiku1207184332@gmail.com wrote:
> context because of its possible sleep operation. However, mmap_read_unlock()
> is unsafely called in a preempt disabled context when spin_lock() or
> rcu_read_lock() has been called.

Why is that unsafe?
See __up_read(). It's doing preempt_disable().


> -	if (irqs_disabled()) {
> +	if (in_atomic() || irqs_disabled()) {

We cannot do this. It will significantly hurt stack traces with build_id.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] kernel: bpf: stackmap: fix a possible sleep-in-atomic bug in bpf_mmap_unlock_get_irq_work()
  2023-03-17 17:16 ` Alexei Starovoitov
@ 2023-03-17 18:14   ` John Fastabend
       [not found]   ` <CALyQVayJaZ_s9yuL07ReZRmTT52ua7B+92CdYnLi9GiegpOKNw@mail.gmail.com>
  1 sibling, 0 replies; 6+ messages in thread
From: John Fastabend @ 2023-03-17 18:14 UTC (permalink / raw)
  To: Alexei Starovoitov, starmiku1207184332
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, linux-kernel, baijiaju1990

Alexei Starovoitov wrote:
> On Fri, Mar 17, 2023 at 03:52:27AM +0000, starmiku1207184332@gmail.com wrote:
> > context because of its possible sleep operation. However, mmap_read_unlock()
> > is unsafely called in a preempt disabled context when spin_lock() or
> > rcu_read_lock() has been called.
> 
> Why is that unsafe?
> See __up_read(). It's doing preempt_disable().

Yep I didn't see the issue either that is why I asked for the stack trace. If
its a bug we would want a reproducer as well seems like it should be trivially
tested in selftests.

> 
> 
> > -	if (irqs_disabled()) {
> > +	if (in_atomic() || irqs_disabled()) {
> 
> We cannot do this. It will significantly hurt stack traces with build_id.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] kernel: bpf: stackmap: fix a possible sleep-in-atomic bug in bpf_mmap_unlock_get_irq_work()
       [not found]   ` <CALyQVayJaZ_s9yuL07ReZRmTT52ua7B+92CdYnLi9GiegpOKNw@mail.gmail.com>
@ 2023-03-19 16:47     ` Alexei Starovoitov
       [not found]       ` <CALyQVazN_KTOhNVowuOV4FSr_zd5htCaBJ+xKgCDaL1LgVG50Q@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2023-03-19 16:47 UTC (permalink / raw)
  To: Teng Qi
  Cc: ast, daniel, andrii, martin.lau, song, yhs, john.fastabend,
	kpsingh, sdf, haoluo, jolsa, bpf, linux-kernel, baijiaju1990

On Sun, Mar 19, 2023 at 02:12:49AM +0800, Teng Qi wrote:
> Regarding the first problem, our tool discovered a possible code path that
> 
> starts from mmap_read_unlock() and leads to sleep in kernel v6.3-rc2 source
> 
> code.
> 
> 
> 
> kernel/bpf/mmap_unlock_work.h:52 mmap_read_unlock(mm);
> 
> include/linux/mmap_lock.h:144 up_read(&mm->mmap_lock);
> 
> kernel/locking/rwsem.c:1616 __up_read(sem);
> 
> kernel/locking/rwsem.c:1352 rwsem_wake(sem);
> 
> kernel/locking/rwsem.c:1211 rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
> 
> kernel/locking/rwsem.c:566 wake_q_add_safe(wake_q, tsk);
> 
> kernel/sched/core.c:990 put_task_struct(task);
> 
> include/linux/sched/task.h:119 __put_task_struct(t);
> 
> kernel/fork.c:857 exit_creds(tsk);
> 
> kernel/cred.c:172 put_cred(cred);
> 
> include/linux/cred.h:288 __put_cred(cred);
> 
> kernel/cred.c:151 put_cred_rcu(&cred->rcu);
> 
> kernel/cred.c:121 put_group_info(cred->group_info);
> 
> include/linux/cred.h:53 groups_free(group_info);
> 
> kernel/groups.c:31 kvfree(group_info);
> 
> mm/util.c:647 vfree(addr); <- oops, sleep when size of group_info is large
> 
> 
> 
> However, we cannot guarantee that this code path will be triggered during
> 
> runtime since it was generated by a static analysis tool.

So it is a purely theoretical issue and out of thousands users of up_read()
you've decided to fix one where it is called form mmap_read_unlock().
Why?

You also see that __up_read is doing preempt_disable and then calls rwsem_wake()
which will theoretically can call vfree() with "oops", right?
So agian, why target mmap_read_unlock() ?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] kernel: bpf: stackmap: fix a possible sleep-in-atomic bug in bpf_mmap_unlock_get_irq_work()
       [not found]       ` <CALyQVazN_KTOhNVowuOV4FSr_zd5htCaBJ+xKgCDaL1LgVG50Q@mail.gmail.com>
@ 2023-03-20 15:20         ` Alexei Starovoitov
       [not found]           ` <CALyQVawAZQ=K5RCnq0yz+g3fUT6vd5h15wMAeGXnDwdrZi87Qg@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Alexei Starovoitov @ 2023-03-20 15:20 UTC (permalink / raw)
  To: Teng Qi
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, bpf, LKML,
	baijiaju1990

On Mon, Mar 20, 2023 at 5:28 AM Teng Qi <starmiku1207184332@gmail.com> wrote:
>
> Yeah, we got your points. There are two key questions. The first question is
> that preempt_disable() and preempt_enable() will be conflicted with vfree()
> before the mmap_read_unlock().

What does this sentence mean?

> The second question is that thousands callers
> of up_read() only make sure irqs_disabled() == false needed fixed if
> the mmap_read_unlock() is fixed.

that doesn't answer my question either.

> Detecting ebpf bugs can be challenging since it is difficult to prove that a
> bug can be triggered during runtime, as well as fixing the bug. We decided to
> give up this patch that fixes the possible sleep-in-atomic bug in
> bpf_mmap_unlock_get_irq_work(). Instead, we will focus on improving our static
> analysis tool to find ebpf-specific bugs.

Please don't.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] kernel: bpf: stackmap: fix a possible sleep-in-atomic bug in bpf_mmap_unlock_get_irq_work()
       [not found]           ` <CALyQVawAZQ=K5RCnq0yz+g3fUT6vd5h15wMAeGXnDwdrZi87Qg@mail.gmail.com>
@ 2023-03-22 22:30             ` Alexei Starovoitov
  0 siblings, 0 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2023-03-22 22:30 UTC (permalink / raw)
  To: Teng Qi
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, bpf, LKML,
	baijiaju1990

On Wed, Mar 22, 2023 at 3:26 AM Teng Qi <starmiku1207184332@gmail.com> wrote:
>
> We are returning to this possible bug because it is better not to give up.
> Please don't mind any previous retreats.
>
> The reason we are only fixing the mmap_read_unlock() function is that our goal
> is to develop a static analysis tool to detect bugs in ebpf. According to the
> tool's output, we found that the mmap_read_unlock() function may be called
> indirectly by ebpf hooks in a context where preemption is disabled, which may
> lead to sleepable function calls through this code path.
>
>
> kernel/bpf/mmap_unlock_work.h:52 mmap_read_unlock(mm);

bpf rcu scope is a 1st non sleepable line.

> include/linux/mmap_lock.h:144 up_read(&mm->mmap_lock);
> kernel/locking/rwsem.c:1616 __up_read(sem);
> kernel/locking/rwsem.c:1352 rwsem_wake(sem); <- preempt_disable()

and this is 2nd non sleepable line.

> kernel/locking/rwsem.c:1211 rwsem_mark_wake(sem, RWSEM_WAKE_ANY,
>  &wake_q); <- raw_spin_lock_irqsave()

and this is 3rd.

> kernel/locking/rwsem.c:566 wake_q_add_safe(wake_q, tsk);
> kernel/sched/core.c:990 put_task_struct(task);
> include/linux/sched/task.h:119 __put_task_struct(t);
> kernel/fork.c:857 exit_creds(tsk);
> kernel/cred.c:172 put_cred(cred);
> include/linux/cred.h:288 __put_cred(cred);
> kernel/cred.c:151 put_cred_rcu(&cred->rcu);
> kernel/cred.c:121 put_group_info(cred->group_info);
> include/linux/cred.h:53 groups_free(group_info);
> kernel/groups.c:31 kvfree(group_info);
> mm/util.c:647 vfree(addr); <- oops, sleep when size of group_info is large
>
>
> Our focus has been on detecting and fixing bugs in ebpf, and we were not
> previously aware that vfree() might be called in other contexts where preemption
> is disabled.

preemption and non-sleepable are not the same.

> Additionally, you mentioned that rwsem_wake() calls
> preempt_disable(). Upon investigating the code path, we discovered another
> occurrence of raw_spin_lock_irqsave() in rwsem_mark_wake(). We understand that
> our tool does not currently account for context operations from helpers to
> sleepable functions.
>
> To address this limitation, we have decided to enhance our tool's capabilities
> to collect and display information on context operations in the callee functions
> of helpers and potential callers of sleepable functions. However, this work will
> require some time. Consequently, we have decided to abandon this patch before.
>
> At present, we are uncertain about how to fix this potential and theoretical
> bug. One potential solution could be to replace the use of kvfree() with
> kfree_rcu() in groups_free(). Among the callees in put_group_info(),
> groups_free() is the only one that may call sleepable kvfree(). Therefore, we
> propose modifying groups_free() to ensure that put_group_info() does not sleep.

Do not fix what is not broken.
So far you haven't demonstrated that this stack trace is possible.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-03-22 22:30 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-17  3:52 [PATCH v2] kernel: bpf: stackmap: fix a possible sleep-in-atomic bug in bpf_mmap_unlock_get_irq_work() starmiku1207184332
2023-03-17 17:16 ` Alexei Starovoitov
2023-03-17 18:14   ` John Fastabend
     [not found]   ` <CALyQVayJaZ_s9yuL07ReZRmTT52ua7B+92CdYnLi9GiegpOKNw@mail.gmail.com>
2023-03-19 16:47     ` Alexei Starovoitov
     [not found]       ` <CALyQVazN_KTOhNVowuOV4FSr_zd5htCaBJ+xKgCDaL1LgVG50Q@mail.gmail.com>
2023-03-20 15:20         ` Alexei Starovoitov
     [not found]           ` <CALyQVawAZQ=K5RCnq0yz+g3fUT6vd5h15wMAeGXnDwdrZi87Qg@mail.gmail.com>
2023-03-22 22:30             ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).