linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yonghong Song <yhs@meta.com>
To: Teng Qi <starmiku1207184332@gmail.com>
Cc: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
	andrii@kernel.org, martin.lau@linux.dev, song@kernel.org,
	yhs@fb.com, kpsingh@kernel.org, sdf@google.com,
	haoluo@google.com, jolsa@kernel.org, davem@davemloft.net,
	kuba@kernel.org, hawk@kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [bug] kernel: bpf: syscall: a possible sleep-in-atomic bug in __bpf_prog_put()
Date: Mon, 22 May 2023 21:33:41 -0700	[thread overview]
Message-ID: <d027cb6b-e32c-36ad-3aba-9a7b1177f89f@meta.com> (raw)
In-Reply-To: <CALyQVazb=D1ejapiFdTnan6JbjFJA2q9ifhSsmF4OC9MDz3oAw@mail.gmail.com>



On 5/21/23 6:39 AM, Teng Qi wrote:
> Thank you.
> 
>  > Your above analysis makes sense if indeed that kvfree cannot appear
>  > inside a spin lock region or RCU read lock region. But is it true?
>  > I checked a few code paths in kvfree/kfree. It is either guarded
>  > with local_irq_save/restore or by
>  > spin_lock_irqsave/spin_unlock_
>  > irqrestore, etc. Did I miss
>  > anything? Are you talking about RT kernel here?
> 
> To see the sleepable possibility of kvfree, it is important to analyze the
> following calling stack:
> mm/util.c: 645 kvfree()
> mm/vmalloc.c: 2763 vfree()
> 
> In kvfree(), to call vfree, if the pointer addr points to memory 
> allocated by
> vmalloc(), it calls vfree().
> void kvfree(const void *addr)
> {
>          if (is_vmalloc_addr(addr))
>                  vfree(addr);
>          else
>                  kfree(addr);
> }
> 
> In vfree(), in_interrupt() and might_sleep() need to be considered.
> void vfree(const void *addr)
> {
>          // ...
>          if (unlikely(in_interrupt()))
>          {
>                  vfree_atomic(addr);
>                  return;
>          }
>          // ...
>          might_sleep();
>          // ...
> }

Sorry. I didn't check vfree path. So it does look like that
we need to pay special attention to non interrupt part.

> 
> The vfree() may sleep if in_interrupt() == false. The RCU read lock region
> could have in_interrupt() == false and spin lock region which only disables
> preemption also has in_interrupt() == false. So the kvfree() cannot appear
> inside a spin lock region or RCU read lock region if the pointer addr points
> to memory allocated by vmalloc().
> 
>  > > Therefore, we propose modifying the condition to include
>  > > in_atomic(). Could we
>  > > update the condition as follows: "in_irq() || irqs_disabled() ||
>  > > in_atomic()"?
>  > Thank you! We look forward to your feedback.
> 
> We now think that ‘irqs_disabled() || in_atomic() || 
> rcu_read_lock_held()’ is
> more proper. irqs_disabled() is for irq flag reg, in_atomic() is for
> preempt count and rcu_read_lock_held() is for RCU read lock region.

We cannot use rcu_read_lock_held() in the 'if' statement. The return
value rcu_read_lock_held() could be 1 for some configuraitons regardless
whether rcu_read_lock() is really held or not. In most cases,
rcu_read_lock_held() is used in issuing potential warnings.
Maybe there are other ways to record whether rcu_read_lock() is held or not?

I agree with your that 'irqs_disabled() || in_atomic()' makes sense
since it covers process context local_irq_save() and spin_lock() cases.

If we cannot resolve rcu_read_lock() presence issue, maybe the condition
can be !in_interrupt(), so any process-context will go to a workqueue.

Alternatively, we could have another solution. We could add another
function e.g., bpf_prog_put_rcu(), which indicates that bpf_prog_put()
will be done in rcu context. So if in_interrupt(), do kvfree, otherwise,
put into a workqueue.


> 
> -- Teng Qi
> 
> On Sun, May 21, 2023 at 11:45 AM Yonghong Song <yhs@meta.com 
> <mailto:yhs@meta.com>> wrote:
> 
> 
> 
>     On 5/19/23 7:18 AM, Teng Qi wrote:
>      > Thank you for your response.
>      >  > Looks like you only have suspicion here. Could you find a real
>     violation
>      >  > here where __bpf_prog_put() is called with !in_irq() &&
>      >  > !irqs_disabled(), but inside spin_lock or rcu read lock? I
>     have not seen
>      >  > things like that.
>      >
>      > For the complex conditions to call bpf_prog_put() with 1 refcnt,
>     we have
>      > been
>      > unable to really trigger this atomic violation after trying to
>     construct
>      > test cases manually. But we found that it is possible to show
>     cases with
>      > !in_irq() && !irqs_disabled(), but inside spin_lock or rcu read lock.
>      > For example, even a failed case, one of selftest cases of bpf,
>     netns_cookie,
>      > calls bpf_sock_map_update() and may indirectly call bpf_prog_put()
>      > only inside rcu read lock: The possible call stack is:
>      > net/core/sock_map.c: 615 bpf_sock_map_update()
>      > net/core/sock_map.c: 468 sock_map_update_common()
>      > net/core/sock_map.c:  217 sock_map_link()
>      > kernel/bpf/syscall.c: 2111 bpf_prog_put()
>      >
>      > The files about netns_cookie include
>      > tools/testing/selftests/bpf/progs/netns_cookie_prog.c and
>      > tools/testing/selftests/bpf/prog_tests/netns_cookie.c. We
>     inserted the
>      > following code in
>      > ‘net/core/sock_map.c: 468 sock_map_update_common()’:
>      > static int sock_map_update_common(..)
>      > {
>      >          int inIrq = in_irq();
>      >          int irqsDisabled = irqs_disabled();
>      >          int preemptBits = preempt_count();
>      >          int inAtomic = in_atomic();
>      >          int rcuHeld = rcu_read_lock_held();
>      >          printk("in_irq() %d, irqs_disabled() %d, preempt_count() %d,
>      >            in_atomic() %d, rcu_read_lock_held() %d", inIrq,
>     irqsDisabled,
>      >            preemptBits, inAtomic, rcuHeld);
>      > }
>      >
>      > The output message is as follows:
>      > root@(none):/root/bpf# ./test_progs -t netns_cookie
>      > [  137.639188] in_irq() 0, irqs_disabled() 0, preempt_count() 0,
>      > in_atomic() 0,
>      >          rcu_read_lock_held() 1
>      > #113     netns_cookie:OK
>      > Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
>      >
>      > We notice that there are numerous callers in kernel/, net/ and
>     drivers/,
>      > so we
>      > highly suggest modifying __bpf_prog_put() to address this gap.
>     The gap
>      > exists
>      > because __bpf_prog_put() is only safe under in_irq() ||
>     irqs_disabled()
>      > but not in_atomic() || rcu_read_lock_held(). The following code
>     snippet may
>      > mislead developers into thinking that bpf_prog_put() is safe in all
>      > contexts.
>      > if (in_irq() || irqs_disabled()) {
>      >          INIT_WORK(&aux->work, bpf_prog_put_deferred);
>      >          schedule_work(&aux->work);
>      > } else {
>      >          bpf_prog_put_deferred(&aux->work);
>      > }
>      >
>      > Implicit dependency may lead to issues.
>      >
>      >  > Any problem here?
>      > We mentioned it to demonstrate the possibility of kvfree() being
>      > called by __bpf_prog_put_noref().
>      >
>      > Thanks.
>      > -- Teng Qi
>      >
>      > On Wed, May 17, 2023 at 1:08 AM Yonghong Song <yhs@meta.com
>     <mailto:yhs@meta.com>
>      > <mailto:yhs@meta.com <mailto:yhs@meta.com>>> wrote:
>      >
>      >
>      >
>      >     On 5/16/23 4:18 AM, starmiku1207184332@gmail.com
>     <mailto:starmiku1207184332@gmail.com>
>      >     <mailto:starmiku1207184332@gmail.com
>     <mailto:starmiku1207184332@gmail.com>> wrote:
>      >      > From: Teng Qi <starmiku1207184332@gmail.com
>     <mailto:starmiku1207184332@gmail.com>
>      >     <mailto:starmiku1207184332@gmail.com
>     <mailto:starmiku1207184332@gmail.com>>>
>      >      >
>      >      > Hi, bpf developers,
>      >      >
>      >      > We are developing a static tool to check the matching between
>      >     helpers and the
>      >      > context of hooks. During our analysis, we have discovered some
>      >     important
>      >      > findings that we would like to report.
>      >      >
>      >      > ‘kernel/bpf/syscall.c: 2097 __bpf_prog_put()’ shows that
>     function
>      >      > bpf_prog_put_deferred() won`t be called in the condition of
>      >      > ‘in_irq() || irqs_disabled()’.
>      >      > if (in_irq() || irqs_disabled()) {
>      >      >      INIT_WORK(&aux->work, bpf_prog_put_deferred);
>      >      >      schedule_work(&aux->work);
>      >      > } else {
>      >      >
>      >      >      bpf_prog_put_deferred(&aux->work);
>      >      > }
>      >      >
>      >      > We suspect this condition exists because there might be
>     sleepable
>      >     operations
>      >      > in the callees of the bpf_prog_put_deferred() function:
>      >      > kernel/bpf/syscall.c: 2097 __bpf_prog_put()
>      >      > kernel/bpf/syscall.c: 2084 bpf_prog_put_deferred()
>      >      > kernel/bpf/syscall.c: 2063 __bpf_prog_put_noref()
>      >      > kvfree(prog->aux->jited_linfo);
>      >      > kvfree(prog->aux->linfo);
>      >
>      >     Looks like you only have suspicion here. Could you find a real
>      >     violation
>      >     here where __bpf_prog_put() is called with !in_irq() &&
>      >     !irqs_disabled(), but inside spin_lock or rcu read lock? I
>     have not seen
>      >     things like that.
>      >
>      >      >
>      >      > Additionally, we found that array prog->aux->jited_linfo is
>      >     initialized in
>      >      > ‘kernel/bpf/core.c: 157 bpf_prog_alloc_jited_linfo()’:
>      >      > prog->aux->jited_linfo = kvcalloc(prog->aux->nr_linfo,
>      >      >    sizeof(*prog->aux->jited_linfo),
>     bpf_memcg_flags(GFP_KERNEL |
>      >     __GFP_NOWARN));
>      >
>      >     Any problem here?
>      >
>      >      >
>      >      > Our question is whether the condition 'in_irq() ||
>      >     irqs_disabled() == false' is
>      >      > sufficient for calling 'kvfree'. We are aware that calling
>      >     'kvfree' within the
>      >      > context of a spin lock or an RCU lock is unsafe.
> 
>     Your above analysis makes sense if indeed that kvfree cannot appear
>     inside a spin lock region or RCU read lock region. But is it true?
>     I checked a few code paths in kvfree/kfree. It is either guarded
>     with local_irq_save/restore or by
>     spin_lock_irqsave/spin_unlock_irqrestore, etc. Did I miss
>     anything? Are you talking about RT kernel here?
> 
> 
>      >      >
>      >      > Therefore, we propose modifying the condition to include
>      >     in_atomic(). Could we
>      >      > update the condition as follows: "in_irq() ||
>     irqs_disabled() ||
>      >     in_atomic()"?
>      >      >
>      >      > Thank you! We look forward to your feedback.
>      >      >
>      >      > Signed-off-by: Teng Qi <starmiku1207184332@gmail.com
>     <mailto:starmiku1207184332@gmail.com>
>      >     <mailto:starmiku1207184332@gmail.com
>     <mailto:starmiku1207184332@gmail.com>>>
>      >
> 

  parent reply	other threads:[~2023-05-23  4:34 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-16 11:18 [bug] kernel: bpf: syscall: a possible sleep-in-atomic bug in __bpf_prog_put() starmiku1207184332
2023-05-16 17:08 ` Yonghong Song
     [not found]   ` <CALyQVax8X63qekZVhvRTmZFFs+ucPKRkBB7UnRZk6Hu3ggi7Og@mail.gmail.com>
2023-05-21  3:44     ` Yonghong Song
     [not found]       ` <CALyQVazb=D1ejapiFdTnan6JbjFJA2q9ifhSsmF4OC9MDz3oAw@mail.gmail.com>
2023-05-23  4:33         ` Yonghong Song [this message]
2023-05-24 12:42           ` Teng Qi
2023-05-24 19:34             ` Yonghong Song
2023-05-24 19:44               ` Alexei Starovoitov
2023-05-25  5:37                 ` Yonghong Song
2023-06-11 13:02               ` Teng Qi
2023-06-12  0:01                 ` Yonghong Song
2023-06-19  9:05                   ` Teng Qi
2023-06-19 19:01                     ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d027cb6b-e32c-36ad-3aba-9a7b1177f89f@meta.com \
    --to=yhs@meta.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=song@kernel.org \
    --cc=starmiku1207184332@gmail.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).