[bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare
diff mbox series

Message ID 20210505162307.2545061-1-revest@chromium.org
State New, archived
Headers show
Series
  • [bpf] bpf: Don't WARN_ON_ONCE in bpf_bprintf_prepare
Related show

Commit Message

Florent Revest May 5, 2021, 4:23 p.m. UTC
The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
per-cpu buffer that they use to store temporary data (arguments to
bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
by the end of their scope with bpf_bprintf_cleanup.

If one of these helpers gets called within the scope of one of these
helpers, for example: a first bpf program gets called, uses
bpf_trace_printk which calls raw_spin_lock_irqsave which is traced by
another bpf program that calls bpf_trace_printk again, then the second
"get" fails. Essentially, these helpers are not re-entrant, and it's not
that bad because they would simply return -EBUSY and recover gracefully.

However, when this happens, the code hits a WARN_ON_ONCE. The guidelines
in include/asm-generic/bug.h say "Do not use these macros [...] on
transient conditions like ENOMEM or EAGAIN."

This condition qualifies as transient, for example, the next
raw_spin_lock_irqsave probe is likely to succeed, so it does not deserve
a WARN_ON_ONCE.

The guidelines also say "Do not use these macros when checking for
invalid external inputs (e.g. invalid system call arguments" and, in a
way, this can be seen as an invalid input because syzkaller triggered
it.

Signed-off-by: Florent Revest <revest@chromium.org>
Reported-by: syzbot <syzbot@syzkaller.appspotmail.com>
Fixes: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
---
 kernel/bpf/helpers.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andrii Nakryiko May 5, 2021, 6:55 p.m. UTC | #1
On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
>
> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> per-cpu buffer that they use to store temporary data (arguments to
> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> by the end of their scope with bpf_bprintf_cleanup.
>
> If one of these helpers gets called within the scope of one of these
> helpers, for example: a first bpf program gets called, uses

Can we afford having few struct bpf_printf_bufs? They are just 512
bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
only situation where this can occur, right? If someone is doing
bpf_snprintf() and interrupt occurs and we run another BPF program, it
will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
second BPF program, etc. We can't eliminate the probability, but
having a small stack of buffers would make the probability so
miniscule as to not worry about it at all.

Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
the changes are minimal. Nestedness property is preserved for
non-sleepable BPF programs, right? If we want this to work for
sleepable we'd need to either: 1) disable migration or 2) instead of
assuming a stack of buffers, do a loop to find unused one. Should be
acceptable performance-wise, as it's not the fastest code anyway
(printf'ing in general).

In any case, re-using the same buffer for sort-of-optional-to-work
bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
suboptimal, so seems worth fixing this.

Thoughts?

> bpf_trace_printk which calls raw_spin_lock_irqsave which is traced by
> another bpf program that calls bpf_trace_printk again, then the second
> "get" fails. Essentially, these helpers are not re-entrant, and it's not
> that bad because they would simply return -EBUSY and recover gracefully.
>
> However, when this happens, the code hits a WARN_ON_ONCE. The guidelines
> in include/asm-generic/bug.h say "Do not use these macros [...] on
> transient conditions like ENOMEM or EAGAIN."
>
> This condition qualifies as transient, for example, the next
> raw_spin_lock_irqsave probe is likely to succeed, so it does not deserve
> a WARN_ON_ONCE.
>
> The guidelines also say "Do not use these macros when checking for
> invalid external inputs (e.g. invalid system call arguments" and, in a
> way, this can be seen as an invalid input because syzkaller triggered
> it.
>
> Signed-off-by: Florent Revest <revest@chromium.org>
> Reported-by: syzbot <syzbot@syzkaller.appspotmail.com>
> Fixes: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
> ---
>  kernel/bpf/helpers.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 544773970dbc..007fa26eb3f5 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -709,7 +709,7 @@ static int try_get_fmt_tmp_buf(char **tmp_buf)
>
>         preempt_disable();
>         used = this_cpu_inc_return(bpf_printf_buf_used);
> -       if (WARN_ON_ONCE(used > 1)) {
> +       if (used > 1) {
>                 this_cpu_dec(bpf_printf_buf_used);
>                 preempt_enable();
>                 return -EBUSY;
> --
> 2.31.1.527.g47e6f16901-goog
>
Daniel Borkmann May 5, 2021, 8 p.m. UTC | #2
On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
>>
>> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
>> per-cpu buffer that they use to store temporary data (arguments to
>> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
>> by the end of their scope with bpf_bprintf_cleanup.
>>
>> If one of these helpers gets called within the scope of one of these
>> helpers, for example: a first bpf program gets called, uses
> 
> Can we afford having few struct bpf_printf_bufs? They are just 512
> bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> only situation where this can occur, right? If someone is doing
> bpf_snprintf() and interrupt occurs and we run another BPF program, it
> will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> second BPF program, etc. We can't eliminate the probability, but
> having a small stack of buffers would make the probability so
> miniscule as to not worry about it at all.
> 
> Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> the changes are minimal. Nestedness property is preserved for
> non-sleepable BPF programs, right? If we want this to work for
> sleepable we'd need to either: 1) disable migration or 2) instead of
> assuming a stack of buffers, do a loop to find unused one. Should be
> acceptable performance-wise, as it's not the fastest code anyway
> (printf'ing in general).
> 
> In any case, re-using the same buffer for sort-of-optional-to-work
> bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> suboptimal, so seems worth fixing this.
> 
> Thoughts?

Yes, agree, it would otherwise be really hard to debug. I had the same
thought on why not allowing nesting here given users very likely expect
these helpers to just work for all the contexts.

Thanks,
Daniel
Andrii Nakryiko May 5, 2021, 8:48 p.m. UTC | #3
On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
> >>
> >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> >> per-cpu buffer that they use to store temporary data (arguments to
> >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> >> by the end of their scope with bpf_bprintf_cleanup.
> >>
> >> If one of these helpers gets called within the scope of one of these
> >> helpers, for example: a first bpf program gets called, uses
> >
> > Can we afford having few struct bpf_printf_bufs? They are just 512
> > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > only situation where this can occur, right? If someone is doing
> > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > second BPF program, etc. We can't eliminate the probability, but
> > having a small stack of buffers would make the probability so
> > miniscule as to not worry about it at all.
> >
> > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > the changes are minimal. Nestedness property is preserved for
> > non-sleepable BPF programs, right? If we want this to work for
> > sleepable we'd need to either: 1) disable migration or 2) instead of

oh wait, we already disable migration for sleepable BPF progs, so it
should be good to do nestedness level only

> > assuming a stack of buffers, do a loop to find unused one. Should be
> > acceptable performance-wise, as it's not the fastest code anyway
> > (printf'ing in general).
> >
> > In any case, re-using the same buffer for sort-of-optional-to-work
> > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > suboptimal, so seems worth fixing this.
> >
> > Thoughts?
>
> Yes, agree, it would otherwise be really hard to debug. I had the same
> thought on why not allowing nesting here given users very likely expect
> these helpers to just work for all the contexts.
>
> Thanks,
> Daniel
Andrii Nakryiko May 5, 2021, 8:52 p.m. UTC | #4
On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > > On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
> > >>
> > >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> > >> per-cpu buffer that they use to store temporary data (arguments to
> > >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> > >> by the end of their scope with bpf_bprintf_cleanup.
> > >>
> > >> If one of these helpers gets called within the scope of one of these
> > >> helpers, for example: a first bpf program gets called, uses
> > >
> > > Can we afford having few struct bpf_printf_bufs? They are just 512
> > > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > > only situation where this can occur, right? If someone is doing
> > > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > > second BPF program, etc. We can't eliminate the probability, but
> > > having a small stack of buffers would make the probability so
> > > miniscule as to not worry about it at all.
> > >
> > > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > > the changes are minimal. Nestedness property is preserved for
> > > non-sleepable BPF programs, right? If we want this to work for
> > > sleepable we'd need to either: 1) disable migration or 2) instead of
>
> oh wait, we already disable migration for sleepable BPF progs, so it
> should be good to do nestedness level only

actually, migrate_disable() might not be enough. Unless it is
impossible for some reason I miss, worst case it could be that two
sleepable programs (A and B) can be intermixed on the same CPU: A
starts&sleeps - B starts&sleeps - A continues&returns - B continues
and nestedness doesn't work anymore. So something like "reserving a
slot" would work better.

>
> > > assuming a stack of buffers, do a loop to find unused one. Should be
> > > acceptable performance-wise, as it's not the fastest code anyway
> > > (printf'ing in general).
> > >
> > > In any case, re-using the same buffer for sort-of-optional-to-work
> > > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > > suboptimal, so seems worth fixing this.
> > >
> > > Thoughts?
> >
> > Yes, agree, it would otherwise be really hard to debug. I had the same
> > thought on why not allowing nesting here given users very likely expect
> > these helpers to just work for all the contexts.
> >
> > Thanks,
> > Daniel
Florent Revest May 5, 2021, 10:29 p.m. UTC | #5
On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >
> > > On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > > > On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
> > > >>
> > > >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> > > >> per-cpu buffer that they use to store temporary data (arguments to
> > > >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> > > >> by the end of their scope with bpf_bprintf_cleanup.
> > > >>
> > > >> If one of these helpers gets called within the scope of one of these
> > > >> helpers, for example: a first bpf program gets called, uses
> > > >
> > > > Can we afford having few struct bpf_printf_bufs? They are just 512
> > > > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > > > only situation where this can occur, right? If someone is doing
> > > > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > > > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > > > second BPF program, etc. We can't eliminate the probability, but
> > > > having a small stack of buffers would make the probability so
> > > > miniscule as to not worry about it at all.
> > > >
> > > > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > > > the changes are minimal. Nestedness property is preserved for
> > > > non-sleepable BPF programs, right? If we want this to work for
> > > > sleepable we'd need to either: 1) disable migration or 2) instead of
> >
> > oh wait, we already disable migration for sleepable BPF progs, so it
> > should be good to do nestedness level only
>
> actually, migrate_disable() might not be enough. Unless it is
> impossible for some reason I miss, worst case it could be that two
> sleepable programs (A and B) can be intermixed on the same CPU: A
> starts&sleeps - B starts&sleeps - A continues&returns - B continues
> and nestedness doesn't work anymore. So something like "reserving a
> slot" would work better.

Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?

> >
> > > > assuming a stack of buffers, do a loop to find unused one. Should be
> > > > acceptable performance-wise, as it's not the fastest code anyway
> > > > (printf'ing in general).
> > > >
> > > > In any case, re-using the same buffer for sort-of-optional-to-work
> > > > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > > > suboptimal, so seems worth fixing this.
> > > >
> > > > Thoughts?
> > >
> > > Yes, agree, it would otherwise be really hard to debug. I had the same
> > > thought on why not allowing nesting here given users very likely expect
> > > these helpers to just work for all the contexts.
> > >
> > > Thanks,
> > > Daniel

What would you think of just letting the helpers own these 512 bytes
buffers as local variables on their stacks ? Then bpf_prepare_bprintf
would only need to write there, there would be no acquire semantic
(like try_get_fmt_tmp_buf) and the stack frame would just be freed on
the helper return so there would be no bpf_printf_cleanup either. We
would also not pre-reserve static memory for all CPUs and it becomes
trivial to handle re-entrant helper calls.

I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
code but I've not been convinced of its necessity.
Andrii Nakryiko May 6, 2021, 6:52 p.m. UTC | #6
On Wed, May 5, 2021 at 3:29 PM Florent Revest <revest@chromium.org> wrote:
>
> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > >
> > > > On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > > > > On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
> > > > >>
> > > > >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> > > > >> per-cpu buffer that they use to store temporary data (arguments to
> > > > >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> > > > >> by the end of their scope with bpf_bprintf_cleanup.
> > > > >>
> > > > >> If one of these helpers gets called within the scope of one of these
> > > > >> helpers, for example: a first bpf program gets called, uses
> > > > >
> > > > > Can we afford having few struct bpf_printf_bufs? They are just 512
> > > > > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > > > > only situation where this can occur, right? If someone is doing
> > > > > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > > > > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > > > > second BPF program, etc. We can't eliminate the probability, but
> > > > > having a small stack of buffers would make the probability so
> > > > > miniscule as to not worry about it at all.
> > > > >
> > > > > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > > > > the changes are minimal. Nestedness property is preserved for
> > > > > non-sleepable BPF programs, right? If we want this to work for
> > > > > sleepable we'd need to either: 1) disable migration or 2) instead of
> > >
> > > oh wait, we already disable migration for sleepable BPF progs, so it
> > > should be good to do nestedness level only
> >
> > actually, migrate_disable() might not be enough. Unless it is
> > impossible for some reason I miss, worst case it could be that two
> > sleepable programs (A and B) can be intermixed on the same CPU: A
> > starts&sleeps - B starts&sleeps - A continues&returns - B continues
> > and nestedness doesn't work anymore. So something like "reserving a
> > slot" would work better.
>
> Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
>
> > >
> > > > > assuming a stack of buffers, do a loop to find unused one. Should be
> > > > > acceptable performance-wise, as it's not the fastest code anyway
> > > > > (printf'ing in general).
> > > > >
> > > > > In any case, re-using the same buffer for sort-of-optional-to-work
> > > > > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > > > > suboptimal, so seems worth fixing this.
> > > > >
> > > > > Thoughts?
> > > >
> > > > Yes, agree, it would otherwise be really hard to debug. I had the same
> > > > thought on why not allowing nesting here given users very likely expect
> > > > these helpers to just work for all the contexts.
> > > >
> > > > Thanks,
> > > > Daniel
>
> What would you think of just letting the helpers own these 512 bytes
> buffers as local variables on their stacks ? Then bpf_prepare_bprintf
> would only need to write there, there would be no acquire semantic
> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
> the helper return so there would be no bpf_printf_cleanup either. We
> would also not pre-reserve static memory for all CPUs and it becomes
> trivial to handle re-entrant helper calls.
>
> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
> code but I've not been convinced of its necessity.

I got the impression that extra 512 bytes on the kernel stack is quite
a lot and that's why we have per-cpu buffers. Especially that
bpf_trace_printk() can be called from any context, including NMI.
Florent Revest May 6, 2021, 8:17 p.m. UTC | #7
On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, May 5, 2021 at 3:29 PM Florent Revest <revest@chromium.org> wrote:
> >
> > On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > > >
> > > > > On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> > > > > > On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
> > > > > >>
> > > > > >> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> > > > > >> per-cpu buffer that they use to store temporary data (arguments to
> > > > > >> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> > > > > >> by the end of their scope with bpf_bprintf_cleanup.
> > > > > >>
> > > > > >> If one of these helpers gets called within the scope of one of these
> > > > > >> helpers, for example: a first bpf program gets called, uses
> > > > > >
> > > > > > Can we afford having few struct bpf_printf_bufs? They are just 512
> > > > > > bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> > > > > > only situation where this can occur, right? If someone is doing
> > > > > > bpf_snprintf() and interrupt occurs and we run another BPF program, it
> > > > > > will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> > > > > > second BPF program, etc. We can't eliminate the probability, but
> > > > > > having a small stack of buffers would make the probability so
> > > > > > miniscule as to not worry about it at all.
> > > > > >
> > > > > > Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> > > > > > the changes are minimal. Nestedness property is preserved for
> > > > > > non-sleepable BPF programs, right? If we want this to work for
> > > > > > sleepable we'd need to either: 1) disable migration or 2) instead of
> > > >
> > > > oh wait, we already disable migration for sleepable BPF progs, so it
> > > > should be good to do nestedness level only
> > >
> > > actually, migrate_disable() might not be enough. Unless it is
> > > impossible for some reason I miss, worst case it could be that two
> > > sleepable programs (A and B) can be intermixed on the same CPU: A
> > > starts&sleeps - B starts&sleeps - A continues&returns - B continues
> > > and nestedness doesn't work anymore. So something like "reserving a
> > > slot" would work better.
> >
> > Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
> >
> > > >
> > > > > > assuming a stack of buffers, do a loop to find unused one. Should be
> > > > > > acceptable performance-wise, as it's not the fastest code anyway
> > > > > > (printf'ing in general).
> > > > > >
> > > > > > In any case, re-using the same buffer for sort-of-optional-to-work
> > > > > > bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> > > > > > suboptimal, so seems worth fixing this.
> > > > > >
> > > > > > Thoughts?
> > > > >
> > > > > Yes, agree, it would otherwise be really hard to debug. I had the same
> > > > > thought on why not allowing nesting here given users very likely expect
> > > > > these helpers to just work for all the contexts.
> > > > >
> > > > > Thanks,
> > > > > Daniel
> >
> > What would you think of just letting the helpers own these 512 bytes
> > buffers as local variables on their stacks ? Then bpf_prepare_bprintf
> > would only need to write there, there would be no acquire semantic
> > (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
> > the helper return so there would be no bpf_printf_cleanup either. We
> > would also not pre-reserve static memory for all CPUs and it becomes
> > trivial to handle re-entrant helper calls.
> >
> > I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
> > code but I've not been convinced of its necessity.
>
> I got the impression that extra 512 bytes on the kernel stack is quite
> a lot and that's why we have per-cpu buffers. Especially that
> bpf_trace_printk() can be called from any context, including NMI.

Ok, I understand.

What about having one buffer per helper, synchronized with a spinlock?
Actually, bpf_trace_printk already has that, not for the bprintf
arguments but for the bprintf output so this wouldn't change much to
the performance of the helpers anyway:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385

These helpers are not performance sensitive so a per-cpu stack of
buffers feels over-engineered to me (and is also complexity I feel a
bit uncomfortable with).
Daniel Borkmann May 6, 2021, 9:38 p.m. UTC | #8
On 5/6/21 10:17 PM, Florent Revest wrote:
> On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>> On Wed, May 5, 2021 at 3:29 PM Florent Revest <revest@chromium.org> wrote:
>>> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
>>> <andrii.nakryiko@gmail.com> wrote:
>>>> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
>>>> <andrii.nakryiko@gmail.com> wrote:
>>>>> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>>>> On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
>>>>>>> On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
>>>>>>>>
>>>>>>>> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
>>>>>>>> per-cpu buffer that they use to store temporary data (arguments to
>>>>>>>> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
>>>>>>>> by the end of their scope with bpf_bprintf_cleanup.
>>>>>>>>
>>>>>>>> If one of these helpers gets called within the scope of one of these
>>>>>>>> helpers, for example: a first bpf program gets called, uses
>>>>>>>
>>>>>>> Can we afford having few struct bpf_printf_bufs? They are just 512
>>>>>>> bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
>>>>>>> only situation where this can occur, right? If someone is doing
>>>>>>> bpf_snprintf() and interrupt occurs and we run another BPF program, it
>>>>>>> will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
>>>>>>> second BPF program, etc. We can't eliminate the probability, but
>>>>>>> having a small stack of buffers would make the probability so
>>>>>>> miniscule as to not worry about it at all.
>>>>>>>
>>>>>>> Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
>>>>>>> the changes are minimal. Nestedness property is preserved for
>>>>>>> non-sleepable BPF programs, right? If we want this to work for
>>>>>>> sleepable we'd need to either: 1) disable migration or 2) instead of
>>>>>
>>>>> oh wait, we already disable migration for sleepable BPF progs, so it
>>>>> should be good to do nestedness level only
>>>>
>>>> actually, migrate_disable() might not be enough. Unless it is
>>>> impossible for some reason I miss, worst case it could be that two
>>>> sleepable programs (A and B) can be intermixed on the same CPU: A
>>>> starts&sleeps - B starts&sleeps - A continues&returns - B continues
>>>> and nestedness doesn't work anymore. So something like "reserving a
>>>> slot" would work better.
>>>
>>> Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
>>>
>>>>>>> assuming a stack of buffers, do a loop to find unused one. Should be
>>>>>>> acceptable performance-wise, as it's not the fastest code anyway
>>>>>>> (printf'ing in general).
>>>>>>>
>>>>>>> In any case, re-using the same buffer for sort-of-optional-to-work
>>>>>>> bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
>>>>>>> suboptimal, so seems worth fixing this.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>
>>>>>> Yes, agree, it would otherwise be really hard to debug. I had the same
>>>>>> thought on why not allowing nesting here given users very likely expect
>>>>>> these helpers to just work for all the contexts.
>>>>>>
>>>>>> Thanks,
>>>>>> Daniel
>>>
>>> What would you think of just letting the helpers own these 512 bytes
>>> buffers as local variables on their stacks ? Then bpf_prepare_bprintf
>>> would only need to write there, there would be no acquire semantic
>>> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
>>> the helper return so there would be no bpf_printf_cleanup either. We
>>> would also not pre-reserve static memory for all CPUs and it becomes
>>> trivial to handle re-entrant helper calls.
>>>
>>> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
>>> code but I've not been convinced of its necessity.
>>
>> I got the impression that extra 512 bytes on the kernel stack is quite
>> a lot and that's why we have per-cpu buffers. Especially that
>> bpf_trace_printk() can be called from any context, including NMI.
> 
> Ok, I understand.
> 
> What about having one buffer per helper, synchronized with a spinlock?
> Actually, bpf_trace_printk already has that, not for the bprintf
> arguments but for the bprintf output so this wouldn't change much to
> the performance of the helpers anyway:
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385
> 
> These helpers are not performance sensitive so a per-cpu stack of
> buffers feels over-engineered to me (and is also complexity I feel a
> bit uncomfortable with).

But wouldn't this have same potential of causing a deadlock? Simple example
would be if you have a tracing prog attached to bstr_printf(), and one of
the other helpers using the same lock called from a non-tracing prog. If
it can be avoided fairly easily, I'd also opt for per-cpu buffers as Andrii
mentioned earlier. We've had few prior examples with similar issues [0].

   [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99
Florent Revest May 7, 2021, 10:39 a.m. UTC | #9
On Thu, May 6, 2021 at 11:38 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 5/6/21 10:17 PM, Florent Revest wrote:
> > On Thu, May 6, 2021 at 8:52 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> >> On Wed, May 5, 2021 at 3:29 PM Florent Revest <revest@chromium.org> wrote:
> >>> On Wed, May 5, 2021 at 10:52 PM Andrii Nakryiko
> >>> <andrii.nakryiko@gmail.com> wrote:
> >>>> On Wed, May 5, 2021 at 1:48 PM Andrii Nakryiko
> >>>> <andrii.nakryiko@gmail.com> wrote:
> >>>>> On Wed, May 5, 2021 at 1:00 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>>>>> On 5/5/21 8:55 PM, Andrii Nakryiko wrote:
> >>>>>>> On Wed, May 5, 2021 at 9:23 AM Florent Revest <revest@chromium.org> wrote:
> >>>>>>>>
> >>>>>>>> The bpf_seq_printf, bpf_trace_printk and bpf_snprintf helpers share one
> >>>>>>>> per-cpu buffer that they use to store temporary data (arguments to
> >>>>>>>> bprintf). They "get" that buffer with try_get_fmt_tmp_buf and "put" it
> >>>>>>>> by the end of their scope with bpf_bprintf_cleanup.
> >>>>>>>>
> >>>>>>>> If one of these helpers gets called within the scope of one of these
> >>>>>>>> helpers, for example: a first bpf program gets called, uses
> >>>>>>>
> >>>>>>> Can we afford having few struct bpf_printf_bufs? They are just 512
> >>>>>>> bytes, so can we have 3-5 of them? Tracing low-level stuff isn't the
> >>>>>>> only situation where this can occur, right? If someone is doing
> >>>>>>> bpf_snprintf() and interrupt occurs and we run another BPF program, it
> >>>>>>> will be impossible to do bpf_snprintf() or bpf_trace_printk() from the
> >>>>>>> second BPF program, etc. We can't eliminate the probability, but
> >>>>>>> having a small stack of buffers would make the probability so
> >>>>>>> miniscule as to not worry about it at all.
> >>>>>>>
> >>>>>>> Good thing is that try_get_fmt_tmp_buf() abstracts all the details, so
> >>>>>>> the changes are minimal. Nestedness property is preserved for
> >>>>>>> non-sleepable BPF programs, right? If we want this to work for
> >>>>>>> sleepable we'd need to either: 1) disable migration or 2) instead of
> >>>>>
> >>>>> oh wait, we already disable migration for sleepable BPF progs, so it
> >>>>> should be good to do nestedness level only
> >>>>
> >>>> actually, migrate_disable() might not be enough. Unless it is
> >>>> impossible for some reason I miss, worst case it could be that two
> >>>> sleepable programs (A and B) can be intermixed on the same CPU: A
> >>>> starts&sleeps - B starts&sleeps - A continues&returns - B continues
> >>>> and nestedness doesn't work anymore. So something like "reserving a
> >>>> slot" would work better.
> >>>
> >>> Iiuc try_get_fmt_tmp_buf does preempt_enable to avoid that situation ?
> >>>
> >>>>>>> assuming a stack of buffers, do a loop to find unused one. Should be
> >>>>>>> acceptable performance-wise, as it's not the fastest code anyway
> >>>>>>> (printf'ing in general).
> >>>>>>>
> >>>>>>> In any case, re-using the same buffer for sort-of-optional-to-work
> >>>>>>> bpf_trace_printk() and probably-important-to-work bpf_snprintf() is
> >>>>>>> suboptimal, so seems worth fixing this.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>
> >>>>>> Yes, agree, it would otherwise be really hard to debug. I had the same
> >>>>>> thought on why not allowing nesting here given users very likely expect
> >>>>>> these helpers to just work for all the contexts.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Daniel
> >>>
> >>> What would you think of just letting the helpers own these 512 bytes
> >>> buffers as local variables on their stacks ? Then bpf_prepare_bprintf
> >>> would only need to write there, there would be no acquire semantic
> >>> (like try_get_fmt_tmp_buf) and the stack frame would just be freed on
> >>> the helper return so there would be no bpf_printf_cleanup either. We
> >>> would also not pre-reserve static memory for all CPUs and it becomes
> >>> trivial to handle re-entrant helper calls.
> >>>
> >>> I inherited this per-cpu buffer from the pre-existing bpf_seq_printf
> >>> code but I've not been convinced of its necessity.
> >>
> >> I got the impression that extra 512 bytes on the kernel stack is quite
> >> a lot and that's why we have per-cpu buffers. Especially that
> >> bpf_trace_printk() can be called from any context, including NMI.
> >
> > Ok, I understand.
> >
> > What about having one buffer per helper, synchronized with a spinlock?
> > Actually, bpf_trace_printk already has that, not for the bprintf
> > arguments but for the bprintf output so this wouldn't change much to
> > the performance of the helpers anyway:
> > https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/kernel/trace/bpf_trace.c?id=9d31d2338950293ec19d9b095fbaa9030899dcb4#n385
> >
> > These helpers are not performance sensitive so a per-cpu stack of
> > buffers feels over-engineered to me (and is also complexity I feel a
> > bit uncomfortable with).
>
> But wouldn't this have same potential of causing a deadlock? Simple example
> would be if you have a tracing prog attached to bstr_printf(), and one of
> the other helpers using the same lock called from a non-tracing prog. If

Ah, right, I see :/

> it can be avoided fairly easily, I'd also opt for per-cpu buffers as Andrii
> mentioned earlier. We've had few prior examples with similar issues [0].
>
>    [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9594dc3c7e71b9f52bee1d7852eb3d4e3aea9e99

Ok it's not as bad as I imagined, thank you Daniel :) I'll look into
it beginning of next week.

Patch
diff mbox series

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 544773970dbc..007fa26eb3f5 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -709,7 +709,7 @@  static int try_get_fmt_tmp_buf(char **tmp_buf)
 
 	preempt_disable();
 	used = this_cpu_inc_return(bpf_printf_buf_used);
-	if (WARN_ON_ONCE(used > 1)) {
+	if (used > 1) {
 		this_cpu_dec(bpf_printf_buf_used);
 		preempt_enable();
 		return -EBUSY;