From: Thomas Gleixner <tglx@linutronix.de>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Clark Williams <williams@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Ingo Molnar <mingo@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Vinicius Costa Gomes <vinicius.gomes@intel.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: Re: [patch V2 01/20] bpf: Enforce preallocation for all instrumentation programs
Date: Sat, 22 Feb 2020 09:40:10 +0100
Message-ID: <87o8tr3thx.fsf@nanos.tec.linutronix.de>
In-Reply-To: <20200222042916.k3r5dj5njoo2ywyj@ast-mbp>

Alexei,

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Thu, Feb 20, 2020 at 09:45:18PM +0100, Thomas Gleixner wrote:
>> The assumption that only programs attached to perf NMI events can deadlock
>> on memory allocators is wrong. Assume the following simplified callchain:
>>  	 */
>> -	if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
>> +	if ((is_tracing_prog_type(prog->type)) {
>
> This doesn't build.
> I assumed the typo somehow sneaked in and proceeded, but it broke
> a bunch of tests:
> Summary: 1526 PASSED, 0 SKIPPED, 54 FAILED
> One can argue that the tests are unsafe and broken.
> We used to test all those tests with and without prealloc:
> map_flags = 0;
> run_all_tests();
> map_flags = BPF_F_NO_PREALLOC;
> run_all_tests();
> Then 4 years ago commit 5aa5bd14c5f866 switched hashmap to be no_prealloc
> always and that is how it has stayed since then. We can adjust the tests to use
> prealloc with tracing progs, but this breakage shows that there could be plenty
> of bpf users that also use BPF_F_NO_PREALLOC with tracing. It could simply
> be because they know that their kprobes are in a safe spot (and kmalloc is ok)
> and they want to save memory. They could be using large max_entries parameter
> for worst case hash map usage, but typical load is low. In general hashtables
> don't perform well after 50%, so prealloc is wasting half of the memory. Since
> we cannot control where kprobes are placed I'm not sure what is the right fix
> here. It feels that if we proceed with this patch somebody will complain and we
> would have to revert, but I'm willing to take this risk if we cannot come up
> with an alternative fix.

Leaving something exposed which is known to be broken is not a good
option either.

Just assume that someone is investigating a kernel issue. The BOFH, who is
stuck in the 90's, uses perf, kprobes and tracepoints. Now he goes on
vacation and the new kid on the team decides to flip that over to BPF.
So now, instead of getting information, the new kid deadlocks or crashes
the machine.

You can't just tell him "don't do that then". It's broken by design and
you really can't tell which probes are safe and which are not, because
the allocator calls out into functions which might look completely
unrelated.

So one way to phase this out would be:

	if (is_tracing()) {
		if (is_perf() || IS_ENABLED(RT))
			return -EINVAL;
		WARN_ONCE(.....)
	}

And clearly write in the warning that this is dangerous, broken and
about to be forbidden. Hmm?
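
To make the proposed phase-out concrete, here is a minimal user-space
sketch of that decision logic. It is not kernel code: the enum, the
helper names, the set of types treated as tracing, the runtime
rt_enabled flag (standing in for IS_ENABLED(RT)) and the warning text
are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for BPF program types; not the kernel's enum. */
enum prog_type {
	PROG_SOCKET_FILTER,
	PROG_KPROBE,
	PROG_TRACEPOINT,
	PROG_RAW_TRACEPOINT,
	PROG_PERF_EVENT,
};

/* Which types count as tracing is an assumption of this sketch. */
static bool is_tracing(enum prog_type type)
{
	switch (type) {
	case PROG_KPROBE:
	case PROG_TRACEPOINT:
	case PROG_RAW_TRACEPOINT:
	case PROG_PERF_EVENT:
		return true;
	default:
		return false;
	}
}

static bool is_perf(enum prog_type type)
{
	return type == PROG_PERF_EVENT;
}

/* Set after the first warning so that it is printed only once. */
static bool warned_once;

/*
 * Decide whether a program may use a map created without preallocation.
 * rt_enabled models IS_ENABLED(RT) as a runtime flag for this sketch.
 * Returns 0 if allowed, -EINVAL if rejected.
 */
static int check_no_prealloc_map(enum prog_type type, bool rt_enabled)
{
	if (!is_tracing(type))
		return 0;

	/* Hard failure where the deadlock is known or RT is enabled. */
	if (is_perf(type) || rt_enabled)
		return -EINVAL;

	/* Everything else is still allowed, but warned about once. */
	if (!warned_once) {
		warned_once = true;
		fprintf(stderr, "tracing program uses a non-preallocated map: "
			"dangerous, broken and about to be forbidden\n");
	}
	return 0;
}

/* Usage example: perf and RT are rejected, other tracing only warns. */
void example_usage(void)
{
	assert(check_no_prealloc_map(PROG_PERF_EVENT, false) == -EINVAL);
	assert(check_no_prealloc_map(PROG_KPROBE, true) == -EINVAL);
	assert(check_no_prealloc_map(PROG_KPROBE, false) == 0);
}
```

The point of the split is that existing non-RT users keep working for
now and get a loud notice, while the cases that are known to be unsafe
fail right away.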

> Going further with the patchset.
>
> Patch 9 "bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites."
> adds new warning:
> ../kernel/seccomp.c: In function ‘seccomp_run_filters’:
> ../kernel/seccomp.c:272:50: warning: passing argument 2 of ‘bpf_prog_run_pin_on_cpu’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
>    u32 cur_ret = bpf_prog_run_pin_on_cpu(f->prog, sd);

Uurgh. I'm sure I fixed that and then I must have lost it again while
reshuffling stuff. Sorry about that.
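
For reference, this class of warning shows up when a pointer to const
data is passed to a parameter declared without const. A minimal sketch
of one way to avoid it, assuming the helper only reads the context, is
to const-qualify the parameter. All names below are hypothetical models
and not the kernel's structures or helpers; how the series resolves the
warning is not shown in this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models; not the kernel's structures. */
struct ctx_model {
	int nr;
};

struct prog_model {
	uint32_t (*run)(const void *ctx);
};

/*
 * Taking "const void *ctx" lets callers pass a pointer to const data
 * without discarding the qualifier.
 */
static uint32_t run_prog_model(const struct prog_model *prog, const void *ctx)
{
	return prog->run(ctx);
}

/* Example program body: reads the context, never writes to it. */
static uint32_t return_nr(const void *ctx)
{
	const struct ctx_model *sd = ctx;

	return (uint32_t)sd->nr;
}

/* Usage example: the context is const at the call site. */
void example_run(void)
{
	const struct ctx_model sd = { .nr = 42 };
	const struct prog_model prog = { .run = return_nr };

	assert(run_prog_model(&prog, &sd) == 42);
}
```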

> That's where I gave up.

Fair enough.

> I pulled sched-for-bpf-2020-02-20 branch from tip and pushed it into bpf-next.
> Could you please rebase your set on top of bpf-next and repost?
> The logic in all patches looks good.

Will do.

Thanks,

        tglx
