bpf.vger.kernel.org archive mirror
* Re: [PATCH] tracing: Add generic test_recursion_try_acquire()
       [not found] <20230414221702.554c44fe@rorschach.local.home>
@ 2023-04-15 13:33 ` Yafang Shao
  2023-04-15 14:33   ` Yafang Shao
  0 siblings, 1 reply; 4+ messages in thread
From: Yafang Shao @ 2023-04-15 13:33 UTC (permalink / raw)
  To: Steven Rostedt, Alexei Starovoitov
  Cc: LKML, Linux trace kernel, Masami Hiramatsu, Mark Rutland, bpf

On Sat, Apr 15, 2023 at 10:17 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
>
> The ftrace_test_recursion_trylock() also disables preemption. This is not
> required, but was a clean up as every place that called it also disabled
> preemption, and making the two tightly coupled appeared to make the code
> simpler.
>
> But the recursion protection can be used for other purposes that do not
> require disabling preemption. As the recursion bits are attached to the
> task_struct, it follows the task, so there's no need for preemption being
> disabled.
>
> Add test_recursion_try_acquire/release() functions to be used generically,
> and separate it from being associated with ftrace. It also removes the
> "lock" name, as there is no lock happening. Keeping the "lock" for the
> ftrace version is better, as it at least differentiates that preemption is
> being disabled (hence, "locking the CPU").
>
> Link: https://lore.kernel.org/linux-trace-kernel/20230321020103.13494-1-laoar.shao@gmail.com/
>
> Cc: Yafang Shao <laoar.shao@gmail.com>
> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Acked-by: Yafang Shao <laoar.shao@gmail.com>

+Alexei, bpf

Thanks Steven.
I have almost finished replacing prog->active with
test_recursion_try_{acquire,release}[1], but I still need to run more
tests.

During verification I noticed something abnormal: when I ran the bpf
selftests after the replacement, the fentry recursion test failed[2].
The test results are as follows:

main:PASS:skel_open_and_load 0 nsec
main:PASS:skel_attach 0 nsec
main:PASS:pass1 == 0 0 nsec
main:PASS:pass1 == 1 0 nsec
main:PASS:pass1 == 2 0 nsec
main:PASS:pass2 == 0 0 nsec
main:FAIL:pass2 == 1 unexpected pass2 == 1: actual 2 != expected 1 [0]
main:FAIL:pass2 == 2 unexpected pass2 == 2: actual 4 != expected 2
main:PASS:get_prog_info 0 nsec
main:PASS:recursion_misses 0 nsec

The reason is that the bpf_map_delete_elem() in this fentry SEC()[2]
hits the first if-condition[3] but fails to hit the second
if-condition[4], so the first nested call is not considered recursion
and the value of 'pass2' ends up being 2[0]. Illustrated as follows:

SEC("fentry/htab_map_delete_elem")
    pass2++;   <<<< Turn out to be 1 after this operation.
    bpf_map_delete_elem(&hash2, &key);
         SEC("fentry/htab_map_delete_elem")    <<<< not recursion
            pass2++; <<<< Turn out to be 2 after this operation.
            bpf_map_delete_elem(&hash2, &key);
                SEC("fentry/htab_map_delete_elem")    <<<< RECURSION!!

That said, when a function calls itself, the first nested call is not
considered recursion. Is that expected?

One thing I can be sure of is that prog->active can't handle the
interrupted case[5]: if an interrupt happens, it is considered a
recursion. But the bpf_map_delete_elem() in this fentry SEC() seems
to still be in process context.

[1]. https://lore.kernel.org/bpf/CALOAHbACzCwu-VeMczEJw8A4WFgkL-uQDS1NkcVR2pqEMZyAQQ@mail.gmail.com/
[2].  https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/recursion.c#n38
[3]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/trace_recursion.h#n166
[4]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/trace_recursion.h#n176
[5]. https://lore.kernel.org/bpf/20230409220239.0fcf6738@rorschach.local.home/

> ---
>  include/linux/trace_recursion.h | 47 ++++++++++++++++++++++++---------
>  kernel/trace/ftrace.c           |  2 ++
>  2 files changed, 37 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index d48cd92d2364..80de2ee7b4c3 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -150,9 +150,6 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
>  # define trace_warn_on_no_rcu(ip)      false
>  #endif
>
> -/*
> - * Preemption is promised to be disabled when return bit >= 0.
> - */
>  static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
>                                                         int start)
>  {
> @@ -182,18 +179,11 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
>         val |= 1 << bit;
>         current->trace_recursion = val;
>         barrier();
> -
> -       preempt_disable_notrace();
> -
>         return bit;
>  }
>
> -/*
> - * Preemption will be enabled (if it was previously enabled).
> - */
>  static __always_inline void trace_clear_recursion(int bit)
>  {
> -       preempt_enable_notrace();
>         barrier();
>         trace_recursion_clear(bit);
>  }
> @@ -205,12 +195,18 @@ static __always_inline void trace_clear_recursion(int bit)
>   * tracing recursed in the same context (normal vs interrupt),
>   *
>   * Returns: -1 if a recursion happened.
> - *           >= 0 if no recursion.
> + *           >= 0 if no recursion and preemption will be disabled.
>   */
>  static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>                                                          unsigned long parent_ip)
>  {
> -       return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> +       int bit;
> +
> +       bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> +       if (unlikely(bit < 0))
> +               return bit;
> +       preempt_disable_notrace();
> +       return bit;
>  }
>
>  /**
> @@ -220,6 +216,33 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
>   * This is used at the end of a ftrace callback.
>   */
>  static __always_inline void ftrace_test_recursion_unlock(int bit)
> +{
> +       preempt_enable_notrace();
> +       trace_clear_recursion(bit);
> +}
> +
> +/**
> + * test_recursion_try_acquire - tests for recursion in same context
> + *
> + * This will detect recursion of a function.
> + *
> + * Returns: -1 if a recursion happened.
> + *           >= 0 if no recursion
> + */
> +static __always_inline int test_recursion_try_acquire(unsigned long ip,
> +                                                     unsigned long parent_ip)
> +{
> +       return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> +}
> +
> +/**
> + * test_recursion_release - called after a success of test_recursion_try_acquire()
> + * @bit: The return of a successful test_recursion_try_acquire()
> + *
> + * This releases the recursion lock taken by a non-negative return call
> + * by test_recursion_try_acquire().
> + */
> +static __always_inline void test_recursion_release(int bit)
>  {
>         trace_clear_recursion(bit);
>  }
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index db8532a4d5c8..1b117ab057ed 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7299,6 +7299,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
>         if (bit < 0)
>                 return;
>
> +       preempt_disable();
>         do_for_each_ftrace_op(op, ftrace_ops_list) {
>                 /* Stub functions don't need to be called nor tested */
>                 if (op->flags & FTRACE_OPS_FL_STUB)
> @@ -7320,6 +7321,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
>                 }
>         } while_for_each_ftrace_op(op);
>  out:
> +       preempt_enable();
>         trace_clear_recursion(bit);
>  }
>
> --
> 2.39.2
>


-- 
Regards
Yafang


* Re: [PATCH] tracing: Add generic test_recursion_try_acquire()
  2023-04-15 13:33 ` [PATCH] tracing: Add generic test_recursion_try_acquire() Yafang Shao
@ 2023-04-15 14:33   ` Yafang Shao
  2023-04-15 15:46     ` Steven Rostedt
  0 siblings, 1 reply; 4+ messages in thread
From: Yafang Shao @ 2023-04-15 14:33 UTC (permalink / raw)
  To: Steven Rostedt, Alexei Starovoitov
  Cc: LKML, Linux trace kernel, Masami Hiramatsu, Mark Rutland, bpf

On Sat, Apr 15, 2023 at 9:33 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Sat, Apr 15, 2023 at 10:17 AM Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > From: "Steven Rostedt (Google)" <rostedt@goodmis.org>
> >
> > The ftrace_test_recursion_trylock() also disables preemption. This is not
> > required, but was a clean up as every place that called it also disabled
> > preemption, and making the two tightly coupled appeared to make the code
> > simpler.
> >
> > But the recursion protection can be used for other purposes that do not
> > require disabling preemption. As the recursion bits are attached to the
> > task_struct, it follows the task, so there's no need for preemption being
> > disabled.
> >
> > Add test_recursion_try_acquire/release() functions to be used generically,
> > and separate it from being associated with ftrace. It also removes the
> > "lock" name, as there is no lock happening. Keeping the "lock" for the
> > ftrace version is better, as it at least differentiates that preemption is
> > being disabled (hence, "locking the CPU").
> >
> > Link: https://lore.kernel.org/linux-trace-kernel/20230321020103.13494-1-laoar.shao@gmail.com/
> >
> > Cc: Yafang Shao <laoar.shao@gmail.com>
> > Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>
> Acked-by: Yafang Shao <laoar.shao@gmail.com>
>
> +Alexei, bpf
>
> Thanks Steven.
> I have almost finished replacing prog->active with
> test_recursion_try_{acquire,release}[1], but I still need to run more
> tests.
>
> During verification I noticed something abnormal: when I ran the bpf
> selftests after the replacement, the fentry recursion test failed[2].
> The test results are as follows:
>
> main:PASS:skel_open_and_load 0 nsec
> main:PASS:skel_attach 0 nsec
> main:PASS:pass1 == 0 0 nsec
> main:PASS:pass1 == 1 0 nsec
> main:PASS:pass1 == 2 0 nsec
> main:PASS:pass2 == 0 0 nsec
> main:FAIL:pass2 == 1 unexpected pass2 == 1: actual 2 != expected 1 [0]
> main:FAIL:pass2 == 2 unexpected pass2 == 2: actual 4 != expected 2
> main:PASS:get_prog_info 0 nsec
> main:PASS:recursion_misses 0 nsec
>

I don't have a clear idea why TRACE_CTX_TRANSITION is needed, but
it seems we would have to make the change below for the fentry test
case to pass:

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index d48cd92d2364..f6b4601dd604 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -163,21 +163,8 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
                return -1;
 
        bit = trace_get_context_bit() + start;
-       if (unlikely(val & (1 << bit))) {
-               /*
-                * If an interrupt occurs during a trace, and another trace
-                * happens in that interrupt but before the preempt_count is
-                * updated to reflect the new interrupt context, then this
-                * will think a recursion occurred, and the event will be dropped.
-                * Let a single instance happen via the TRANSITION_BIT to
-                * not drop those events.
-                */
-               bit = TRACE_CTX_TRANSITION + start;
-               if (val & (1 << bit)) {
-                       do_ftrace_record_recursion(ip, pip);
-                       return -1;
-               }
-       }
+       if (unlikely(val & (1 << bit)))
+               return -1;
 
        val |= 1 << bit;
        current->trace_recursion = val;


> The reason is that the bpf_map_delete_elem() in this fentry SEC()[2]
> hits the first if-condition[3] but fails to hit the second
> if-condition[4], so the first nested call is not considered recursion
> and the value of 'pass2' ends up being 2[0]. Illustrated as follows:
>
> SEC("fentry/htab_map_delete_elem")
>     pass2++;   <<<< Turn out to be 1 after this operation.
>     bpf_map_delete_elem(&hash2, &key);
>          SEC("fentry/htab_map_delete_elem")    <<<< not recursion
>             pass2++; <<<< Turn out to be 2 after this operation.
>             bpf_map_delete_elem(&hash2, &key);
>                 SEC("fentry/htab_map_delete_elem")    <<<< RECURSION!!
>
> That said, when a function calls itself, the first nested call is not
> considered recursion. Is that expected?
>
> One thing I can be sure of is that prog->active can't handle the
> interrupted case[5]: if an interrupt happens, it is considered a
> recursion. But the bpf_map_delete_elem() in this fentry SEC() seems
> to still be in process context.
>
> [1]. https://lore.kernel.org/bpf/CALOAHbACzCwu-VeMczEJw8A4WFgkL-uQDS1NkcVR2pqEMZyAQQ@mail.gmail.com/
> [2].  https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/recursion.c#n38
> [3]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/trace_recursion.h#n166
> [4]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/include/linux/trace_recursion.h#n176
> [5]. https://lore.kernel.org/bpf/20230409220239.0fcf6738@rorschach.local.home/
>
> > ---
> >  include/linux/trace_recursion.h | 47 ++++++++++++++++++++++++---------
> >  kernel/trace/ftrace.c           |  2 ++
> >  2 files changed, 37 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> > index d48cd92d2364..80de2ee7b4c3 100644
> > --- a/include/linux/trace_recursion.h
> > +++ b/include/linux/trace_recursion.h
> > @@ -150,9 +150,6 @@ extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
> >  # define trace_warn_on_no_rcu(ip)      false
> >  #endif
> >
> > -/*
> > - * Preemption is promised to be disabled when return bit >= 0.
> > - */
> >  static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsigned long pip,
> >                                                         int start)
> >  {
> > @@ -182,18 +179,11 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
> >         val |= 1 << bit;
> >         current->trace_recursion = val;
> >         barrier();
> > -
> > -       preempt_disable_notrace();
> > -
> >         return bit;
> >  }
> >
> > -/*
> > - * Preemption will be enabled (if it was previously enabled).
> > - */
> >  static __always_inline void trace_clear_recursion(int bit)
> >  {
> > -       preempt_enable_notrace();
> >         barrier();
> >         trace_recursion_clear(bit);
> >  }
> > @@ -205,12 +195,18 @@ static __always_inline void trace_clear_recursion(int bit)
> >   * tracing recursed in the same context (normal vs interrupt),
> >   *
> >   * Returns: -1 if a recursion happened.
> > - *           >= 0 if no recursion.
> > + *           >= 0 if no recursion and preemption will be disabled.
> >   */
> >  static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
> >                                                          unsigned long parent_ip)
> >  {
> > -       return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> > +       int bit;
> > +
> > +       bit = trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> > +       if (unlikely(bit < 0))
> > +               return bit;
> > +       preempt_disable_notrace();
> > +       return bit;
> >  }
> >
> >  /**
> > @@ -220,6 +216,33 @@ static __always_inline int ftrace_test_recursion_trylock(unsigned long ip,
> >   * This is used at the end of a ftrace callback.
> >   */
> >  static __always_inline void ftrace_test_recursion_unlock(int bit)
> > +{
> > +       preempt_enable_notrace();
> > +       trace_clear_recursion(bit);
> > +}
> > +
> > +/**
> > + * test_recursion_try_acquire - tests for recursion in same context
> > + *
> > + * This will detect recursion of a function.
> > + *
> > + * Returns: -1 if a recursion happened.
> > + *           >= 0 if no recursion
> > + */
> > +static __always_inline int test_recursion_try_acquire(unsigned long ip,
> > +                                                     unsigned long parent_ip)
> > +{
> > +       return trace_test_and_set_recursion(ip, parent_ip, TRACE_FTRACE_START);
> > +}
> > +
> > +/**
> > + * test_recursion_release - called after a success of test_recursion_try_acquire()
> > + * @bit: The return of a successful test_recursion_try_acquire()
> > + *
> > + * This releases the recursion lock taken by a non-negative return call
> > + * by test_recursion_try_acquire().
> > + */
> > +static __always_inline void test_recursion_release(int bit)
> >  {
> >         trace_clear_recursion(bit);
> >  }
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index db8532a4d5c8..1b117ab057ed 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -7299,6 +7299,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> >         if (bit < 0)
> >                 return;
> >
> > +       preempt_disable();
> >         do_for_each_ftrace_op(op, ftrace_ops_list) {
> >                 /* Stub functions don't need to be called nor tested */
> >                 if (op->flags & FTRACE_OPS_FL_STUB)
> > @@ -7320,6 +7321,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> >                 }
> >         } while_for_each_ftrace_op(op);
> >  out:
> > +       preempt_enable();
> >         trace_clear_recursion(bit);
> >  }
> >
> > --
> > 2.39.2
> >
>
>
> --
> Regards
> Yafang



-- 
Regards
Yafang


* Re: [PATCH] tracing: Add generic test_recursion_try_acquire()
  2023-04-15 14:33   ` Yafang Shao
@ 2023-04-15 15:46     ` Steven Rostedt
  2023-04-16  2:42       ` Yafang Shao
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Rostedt @ 2023-04-15 15:46 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Alexei Starovoitov, LKML, Linux trace kernel, Masami Hiramatsu,
	Mark Rutland, bpf

On Sat, 15 Apr 2023 22:33:17 +0800
Yafang Shao <laoar.shao@gmail.com> wrote:

> 
> I don't have a clear idea why TRACE_CTX_TRANSITION must be needed, but
> it seems we have to do below code for the fentry test case,

The issue is that there are still cases where we can trace before
preempt_count has been updated for the new context. That is,
preempt_count is used to determine whether we are in softirq, hardirq,
or NMI context. But there are some places that have:

 normal context:
 func_a() --> traced
   --> interrupt
    func_b() --> traced
    preempt_count_add(HARDIRQ_OFFSET)

Now we drop the second trace because it is flagged as a recursion,
when in reality it is in a new context whose preempt_count has not
been updated yet.

We are currently fixing these locations, but that work is not finished.
And since there are tools that rely on these events not being dropped,
the transition bit needs to stay until the situation is dealt with.

Can you change the tests to allow a single recursion?

-- Steve

> 
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index d48cd92d2364..f6b4601dd604 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -163,21 +163,8 @@ static __always_inline int trace_test_and_set_recursion(unsigned long ip, unsign
>                 return -1;
>  
>         bit = trace_get_context_bit() + start;
> -       if (unlikely(val & (1 << bit))) {
> -               /*
> -                * If an interrupt occurs during a trace, and another trace
> -                * happens in that interrupt but before the preempt_count is
> -                * updated to reflect the new interrupt context, then this
> -                * will think a recursion occurred, and the event will be dropped.
> -                * Let a single instance happen via the TRANSITION_BIT to
> -                * not drop those events.
> -                */
> -               bit = TRACE_CTX_TRANSITION + start;
> -               if (val & (1 << bit)) {
> -                       do_ftrace_record_recursion(ip, pip);
> -                       return -1;
> -               }
> -       }
> +       if (unlikely(val & (1 << bit)))
> +               return -1;
>  
>         val |= 1 << bit;
>         current->trace_recursion = val;
> 
> 
> > The reason is that the bpf_map_delete_elem() in this fentry SEC()[2]


* Re: [PATCH] tracing: Add generic test_recursion_try_acquire()
  2023-04-15 15:46     ` Steven Rostedt
@ 2023-04-16  2:42       ` Yafang Shao
  0 siblings, 0 replies; 4+ messages in thread
From: Yafang Shao @ 2023-04-16  2:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Alexei Starovoitov, LKML, Linux trace kernel, Masami Hiramatsu,
	Mark Rutland, bpf

On Sat, Apr 15, 2023 at 11:46 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Sat, 15 Apr 2023 22:33:17 +0800
> Yafang Shao <laoar.shao@gmail.com> wrote:
>
> >
> > I don't have a clear idea why TRACE_CTX_TRANSITION is needed, but
> > it seems we would have to make the change below for the fentry test
> > case to pass:
>
> The issue is that there's still cases that we can trace when
> preempt_count hasn't been updated to the new context. That is,
> preempt_count is used to determine if we are in softirq, hardirq or NMI
> context. But there's some places that have:
>
>  normal context:
>  func_a() --> traced
>    --> interrupt
>     func_b() --> traced
>     preempt_count_add(HARDIRQ_OFFSET)
>
> Now we drop the second trace because it is flagged as a recursion when
> in reality it is in a new context, but the preempt_count has not been
> updated yet.
>
> We are currently fixing these locations, but it's not there yet. And
> since there's tools that relies on not dropping these locations, the
> transition bit needs to be there until this situation is dealt with.
>

Got it. Thanks for your explanation.

> Can you change the tests to allow a single recursion?
>

I think a single recursion should be acceptable. I will change the tests.

-- 
Regards
Yafang

