BPF Archive on lore.kernel.org
From: Thomas Gleixner <tglx@linutronix.de>
To: LKML <linux-kernel@vger.kernel.org>
Cc: David Miller <davem@davemloft.net>,
	bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Clark Williams <williams@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Ingo Molnar <mingo@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Vinicius Costa Gomes <vinicius.gomes@intel.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: [patch V3 01/22] bpf: Tighten the requirements for preallocated hash maps
Date: Mon, 24 Feb 2020 15:01:32 +0100
Message-ID: <20200224145642.540542802@linutronix.de>
In-Reply-To: <20200224140131.461979697@linutronix.de>

The assumption that only programs attached to perf NMI events can deadlock
on memory allocators is wrong. Assume the following simplified callchain:

 kmalloc() from regular non BPF context
  cache empty
   freelist empty
    lock(zone->lock);
     tracepoint or kprobe
      BPF()
       update_elem()
        lock(bucket)
          kmalloc()
           cache empty
            freelist empty
             lock(zone->lock);  <- DEADLOCK
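
The callchain above can be modelled in user space. This is only a sketch:
every toy_* name is hypothetical and none of it is kernel code. A
non-recursive lock is reduced to a flag, and acquiring it while it is
already held is reported as a failure instead of spinning forever.

```c
#include <stdbool.h>

/* Toy stand-in for a non-recursive spinlock such as zone->lock. */
struct toy_lock {
	bool held;
};

static struct toy_lock toy_zone_lock;

/* Returns false at the point where a real spinlock would spin forever. */
static bool toy_lock_acquire(struct toy_lock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *lock)
{
	lock->held = false;
}

static bool toy_alloc(bool probe_attached, bool map_preallocated);

/* Models tracepoint/kprobe -> BPF() -> update_elem(). */
static bool toy_probe_fires(bool map_preallocated)
{
	if (map_preallocated)
		return true;	/* element comes from the map's own pool */
	/* Run-time allocation re-enters the allocator slow path. */
	return toy_alloc(false, map_preallocated);
}

/*
 * Models the kmalloc() slow path: cache and freelist are empty, so the
 * zone lock is taken. Returns false if the call chain would deadlock.
 */
static bool toy_alloc(bool probe_attached, bool map_preallocated)
{
	bool ok = true;

	if (!toy_lock_acquire(&toy_zone_lock))
		return false;	/* lock(zone->lock) <- DEADLOCK */
	if (probe_attached)
		ok = toy_probe_fires(map_preallocated);
	toy_lock_release(&toy_zone_lock);
	return ok;
}
```

In this model toy_alloc(true, false) fails, while toy_alloc(true, true)
succeeds because the probe never re-enters the allocator.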

There are other ways to create wreckage which do not involve locking:

 kmalloc() from regular non BPF context
  local_irq_save();
   ...
    obj = slab_first();
     kprobe()
      BPF()
       update_elem()
        lock(bucket)
         kmalloc()
          local_irq_save();
           ...
            obj = slab_first(); <- Same object as above ...
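
The same effect can be reproduced with a toy freelist in user space. The
sketch below is not the slab allocator; toy_pool, toy_alloc_obj() and the
nested_out parameter are made up for illustration. The nested call stands
for the kprobe firing after the list head was read but before it was
unlinked.

```c
#include <stddef.h>

#define TOY_POOL_SIZE 4

struct toy_obj {
	struct toy_obj *next;
};

static struct toy_obj toy_pool[TOY_POOL_SIZE];
static struct toy_obj *toy_freelist;

/* Chain all pool objects into a singly linked freelist. */
static void toy_pool_init(void)
{
	size_t i;

	for (i = 0; i + 1 < TOY_POOL_SIZE; i++)
		toy_pool[i].next = &toy_pool[i + 1];
	toy_pool[TOY_POOL_SIZE - 1].next = NULL;
	toy_freelist = &toy_pool[0];
}

/*
 * Pops the first free object. If nested_out is non-NULL, a probe "fires"
 * after the head was read but before it was unlinked, and the object
 * handed to the nested allocation is stored in *nested_out.
 */
static struct toy_obj *toy_alloc_obj(struct toy_obj **nested_out)
{
	struct toy_obj *obj = toy_freelist;	/* obj = slab_first() */

	if (nested_out)
		*nested_out = toy_alloc_obj(NULL);	/* kprobe -> BPF -> kmalloc() */

	if (obj)
		toy_freelist = obj->next;	/* unlink, unaware of the nested pop */
	return obj;
}
```

With the probe in place both the outer and the nested allocation return
the same object. Two back-to-back calls without the probe return
different objects.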

So preallocation _must_ be enforced for all variants of intrusive
instrumentation.

Unfortunately, immediate enforcement would break backwards compatibility, so
for now such programs are still allowed to run. A one-time warning is
emitted in dmesg, and the verifier emits a warning in the verifier log as
well, so developers are made aware of this and can fix their programs
before the enforcement becomes mandatory.
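
The transitional policy described above boils down to three outcomes. A
minimal user-space sketch follows; the enum and function names are
hypothetical, and the kernel works on enum bpf_prog_type and struct
bpf_map instead.

```c
#include <stdbool.h>

/* Hypothetical program classes; the kernel uses enum bpf_prog_type. */
enum toy_prog_kind {
	TOY_PROG_OTHER,		/* not a tracing program */
	TOY_PROG_PERF_EVENT,	/* may run from the perf NMI */
	TOY_PROG_TRACE,		/* kprobe, tracepoint, ... */
};

enum toy_verdict {
	TOY_ACCEPT,
	TOY_ACCEPT_WITH_WARNING,
	TOY_REJECT,
};

/* Decide what happens when a program of the given kind uses a hash map. */
static enum toy_verdict toy_check_map(enum toy_prog_kind kind,
				      bool map_preallocated)
{
	if (kind == TOY_PROG_OTHER || map_preallocated)
		return TOY_ACCEPT;
	if (kind == TOY_PROG_PERF_EVENT)
		return TOY_REJECT;		/* already mandatory */
	return TOY_ACCEPT_WITH_WARNING;		/* allowed for now, but warned */
}
```

Once enforcement becomes mandatory, the last return would turn into a
rejection as well.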

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V3: Still allow run-time allocation for !RT. Emit warnings. Split
    out the RT part as this really should be backported to stable
    kernels.
V2: New patch
---
 kernel/bpf/verifier.c |   39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8143,26 +8143,43 @@ static bool is_tracing_prog_type(enum bp
 	}
 }
 
+static bool is_preallocated_map(struct bpf_map *map)
+{
+	if (!check_map_prealloc(map))
+		return false;
+	if (map->inner_map_meta && !check_map_prealloc(map->inner_map_meta))
+		return false;
+	return true;
+}
+
 static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 					struct bpf_map *map,
 					struct bpf_prog *prog)
 
 {
-	/* Make sure that BPF_PROG_TYPE_PERF_EVENT programs only use
-	 * preallocated hash maps, since doing memory allocation
-	 * in overflow_handler can crash depending on where nmi got
-	 * triggered.
+	/*
+	 * Validate that trace type programs use preallocated hash maps.
+	 *
+	 * For programs attached to PERF events this is mandatory as the
+	 * perf NMI can hit any arbitrary code sequence.
+	 *
+	 * All other trace types using run-time allocated hash maps are unsafe as
+	 * well because tracepoint or kprobes can be inside locked regions
+	 * of the memory allocator or at a place where a recursion into the
+	 * memory allocator would see inconsistent state.
+	 *
+	 * For now running such programs is allowed for backwards
+	 * compatibility reasons, but warnings are emitted so developers are
+	 * made aware of the unsafety and can fix their programs before this
+	 * is enforced.
 	 */
-	if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
-		if (!check_map_prealloc(map)) {
+	if (is_tracing_prog_type(prog->type) && !is_preallocated_map(map)) {
+		if (prog->type == BPF_PROG_TYPE_PERF_EVENT) {
 			verbose(env, "perf_event programs can only use preallocated hash map\n");
 			return -EINVAL;
 		}
-		if (map->inner_map_meta &&
-		    !check_map_prealloc(map->inner_map_meta)) {
-			verbose(env, "perf_event programs can only use preallocated inner hash map\n");
-			return -EINVAL;
-		}
+		WARN_ONCE(1, "trace type BPF program uses run-time allocation\n");
+		verbose(env, "trace type programs with run-time allocated hash maps are unsafe. Switch to preallocated hash maps.\n");
 	}
 
 	if ((is_tracing_prog_type(prog->type) ||
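
For readers who want to experiment with the helper outside the kernel, here
is a user-space sketch of what is_preallocated_map() checks. struct toy_map
and its fields are hypothetical. The body of toy_check_map_prealloc() is an
assumption about check_map_prealloc(), which this patch does not show:
non-hash maps count as preallocated, and hash maps do unless they were
created with run-time allocation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for struct bpf_map. */
struct toy_map {
	bool is_hash;				/* hash-type map? */
	bool no_prealloc;			/* created with run-time allocation */
	const struct toy_map *inner_map_meta;	/* set for map-in-map types */
};

/* Assumed behaviour of check_map_prealloc(); not taken from this patch. */
static bool toy_check_map_prealloc(const struct toy_map *map)
{
	return !map->is_hash || !map->no_prealloc;
}

/* Mirrors is_preallocated_map(): outer and inner map must both pass. */
static bool toy_is_preallocated_map(const struct toy_map *map)
{
	if (!toy_check_map_prealloc(map))
		return false;
	if (map->inner_map_meta &&
	    !toy_check_map_prealloc(map->inner_map_meta))
		return false;
	return true;
}
```

The point of folding both checks into one helper is that an outer map
which itself passes is still reported as not preallocated when its inner
map template uses run-time allocation.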


Thread overview: 30+ messages
2020-02-24 14:01 [patch V3 00/22] bpf: Make BPF and PREEMPT_RT co-exist Thomas Gleixner
2020-02-24 14:01 ` Thomas Gleixner [this message]
2020-02-24 14:01 ` [patch V3 02/22] bpf: Enforce preallocation for instrumentation programs on RT Thomas Gleixner
2020-02-24 14:01 ` [patch V3 03/22] bpf: Update locking comment in hashtab code Thomas Gleixner
2020-02-24 14:01 ` [patch V3 04/22] bpf/tracing: Remove redundant preempt_disable() in __bpf_trace_run() Thomas Gleixner
2020-02-24 14:01 ` [patch V3 05/22] bpf/trace: Remove EXPORT from trace_call_bpf() Thomas Gleixner
2020-02-24 18:16   ` Alexei Starovoitov
2020-02-24 14:01 ` [patch V3 06/22] bpf/trace: Remove redundant preempt_disable " Thomas Gleixner
2020-02-24 19:40   ` Alexei Starovoitov
2020-02-24 20:42     ` Thomas Gleixner
2020-02-25  0:33       ` Alexei Starovoitov
2020-02-25 12:36         ` Thomas Gleixner
2020-02-24 14:01 ` [patch V3 07/22] perf/bpf: Remove preempt disable around BPF invocation Thomas Gleixner
2020-02-24 14:01 ` [patch V3 08/22] bpf: Remove recursion prevention from rcu free callback Thomas Gleixner
2020-02-24 14:01 ` [patch V3 09/22] bpf: Dont iterate over possible CPUs with interrupts disabled Thomas Gleixner
2020-02-24 14:01 ` [patch V3 10/22] bpf: Provide bpf_prog_run_pin_on_cpu() helper Thomas Gleixner
2020-02-24 18:14   ` Thomas Gleixner
2020-02-24 18:41   ` [patch V4 " Thomas Gleixner
2020-02-24 14:01 ` [patch V3 11/22] bpf: Replace cant_sleep() with cant_migrate() Thomas Gleixner
2020-02-24 14:01 ` [patch V3 12/22] bpf: Use bpf_prog_run_pin_on_cpu() at simple call sites Thomas Gleixner
2020-02-24 14:01 ` [patch V3 13/22] bpf/tests: Use migrate disable instead of preempt disable Thomas Gleixner
2020-02-24 14:01 ` [patch V3 14/22] bpf: Use migrate_disable/enabe() in trampoline code Thomas Gleixner
2020-02-24 14:01 ` [patch V3 15/22] bpf: Use migrate_disable/enable in array macros and cgroup/lirc code Thomas Gleixner
2020-02-24 14:01 ` [patch V3 16/22] bpf: Provide recursion prevention helpers Thomas Gleixner
2020-02-24 14:01 ` [patch V3 17/22] bpf: Use recursion prevention helpers in hashtab code Thomas Gleixner
2020-02-24 14:01 ` [patch V3 18/22] bpf: Replace open coded recursion prevention in sys_bpf() Thomas Gleixner
2020-02-24 14:01 ` [patch V3 19/22] bpf: Factor out hashtab bucket lock operations Thomas Gleixner
2020-02-24 14:01 ` [patch V3 20/22] bpf: Prepare hashtab locking for PREEMPT_RT Thomas Gleixner
2020-02-24 14:01 ` [patch V3 21/22] bpf, lpm: Make locking RT friendly Thomas Gleixner
2020-02-24 14:01 ` [patch V3 22/22] bpf/stackmap: Dont trylock mmap_sem with PREEMPT_RT and interrupts disabled Thomas Gleixner
