From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: from mail.linutronix.de (146.0.238.70:993) by
	crypto-ml.lab.linutronix.de with IMAP4-SSL for ;
	24 Feb 2019 15:08:25 -0000
Received: from mga01.intel.com ([192.55.52.88]) by Galois.linutronix.de with
	esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80)
	(envelope-from ) id 1gxvNr-0001Qv-2Z for speck@linutronix.de;
	Sun, 24 Feb 2019 16:08:04 +0100
From: Andi Kleen 
Subject: [MODERATED] [PATCH v6 25/43] MDSv6
Date: Sun, 24 Feb 2019 07:07:31 -0800
Message-Id: 
In-Reply-To: 
References: 
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0
To: speck@linutronix.de
Cc: Andi Kleen 
List-ID: 

BPF allows the user to run untrusted code in the kernel. Normally MDS
would allow some information leakage, either from other processes or
from sensitive kernel code, to the user controlled BPF code. We cannot
rule out that BPF code contains an MDS exploit, and it is difficult to
pattern match gadgets.

This patch adds a limited number of CPU clears before BPF executions
to guarantee that eBPF executions cannot leak data. We assume BPF
execution does not touch other users' data, so it does not need to
schedule a clear for itself.

For eBPF programs loaded privileged (by root) we never clear, because
we already assume they are trusted.

When the BPF program was loaded unprivileged, clear the CPU before the
BPF execution, depending on the context it is running in:

We only do this when running in an interrupt, or if a CPU clear is
already scheduled (which means, for example, there was a context
switch or a crypto operation before).

In process context we check if the current process context has the
same userns+euid as the process that created the BPF program. This
handles the common seccomp filter case without any extra clears, but
still adds clears when e.g. a socket filter runs on a socket inherited
by a process with a different user id. It also handles various other
common cases.
Technically we would only need to do this if the BPF program contains
conditional branches and loads dominated by them, but let's assume
that nearly all do.

For example, when running chromium with seccomp filters I see only
15-18% of all sandbox system calls have a clear; most are likely
caused by context switches.

Unprivileged eBPF usages in interrupts currently always clear.

This could be further optimized by allowing callers that do a lot of
individual BPF runs, and are sure they don't touch other users' data
(that is not accessible to the eBPF anyway) in between, to do the
clear only once at the beginning. We can add such optimizations later
based on profile data.

Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/clearbpf.h | 29 +++++++++++++++++++++++++++++
 include/linux/filter.h          | 21 +++++++++++++++++++--
 kernel/bpf/core.c               |  2 ++
 kernel/bpf/cpumap.c             |  3 +++
 4 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/clearbpf.h

diff --git a/arch/x86/include/asm/clearbpf.h b/arch/x86/include/asm/clearbpf.h
new file mode 100644
index 000000000000..3da885e4eb29
--- /dev/null
+++ b/arch/x86/include/asm/clearbpf.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_CLEARBPF_H
+#define _ASM_CLEARBPF_H 1
+
+#include
+#include
+#include
+
+/*
+ * When the BPF program was loaded unprivileged, clear the CPU
+ * to prevent any exploits written in BPF using side channels to read
+ * data leaked from other kernel code. In some cases, like
+ * process context with the same uid, we can avoid it.
+ *
+ * See Documentation/clearcpu.txt for more details.
+ */
+static inline void arch_bpf_prepare_nonpriv(kuid_t uid)
+{
+	if (!static_cpu_has(X86_BUG_MDS))
+		return;
+	if (in_interrupt() ||
+	    __this_cpu_read(clear_cpu_flag) ||
+	    !uid_eq(current_euid(), uid)) {
+		clear_cpu();
+		__this_cpu_write(clear_cpu_flag, 0);
+	}
+}
+
+#endif
diff --git a/include/linux/filter.h b/include/linux/filter.h
index e532fcc6e4b5..2c7f62f8047a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -20,12 +20,21 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 
+#ifdef CONFIG_ARCH_HAS_CLEAR_CPU
+#include
+#else
+static inline void arch_bpf_prepare_nonpriv(kuid_t uid)
+{
+}
+#endif
+
 struct sk_buff;
 struct sock;
 struct seccomp_data;
@@ -490,7 +499,9 @@ struct bpf_prog {
 				blinded:1,	/* Was blinded */
 				is_func:1,	/* program is a bpf function */
 				kprobe_override:1, /* Do we override a kprobe? */
-				has_callchain_buf:1; /* callchain buffer allocated? */
+				has_callchain_buf:1, /* callchain buffer allocated? */
+				priv:1;		/* Was loaded privileged */
+	kuid_t			uid;		/* Original uid who created it */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
@@ -513,7 +524,13 @@ struct sk_filter {
 	struct bpf_prog	*prog;
 };
 
-#define BPF_PROG_RUN(filter, ctx)	(*(filter)->bpf_func)(ctx, (filter)->insnsi)
+static inline unsigned _bpf_prog_run(const struct bpf_prog *bp, const void *ctx)
+{
+	if (!bp->priv)
+		arch_bpf_prepare_nonpriv(bp->uid);
+	return bp->bpf_func(ctx, bp->insnsi);
+}
+#define BPF_PROG_RUN(filter, ctx) _bpf_prog_run(filter, ctx)
 
 #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index f908b9356025..67d845229d46 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -99,6 +99,8 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
 	fp->aux = aux;
 	fp->aux->prog = fp;
 	fp->jit_requested = ebpf_jit_enabled();
+	fp->priv = !!capable(CAP_SYS_ADMIN);
+	fp->uid = current_euid();
 
 	INIT_LIST_HEAD_RCU(&fp->aux->ksym_lnode);
 
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 8974b3755670..a5c9764168f9 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -376,6 +376,9 @@ static void __cpu_map_entry_free(struct rcu_head *rcu)
 
 		/* No concurrent bq_enqueue can run at this point */
 		bq_flush_to_queue(rcpu, bq, false);
+
+		/* Do lazy_clear_cpu_interrupt here? */
+
 	}
 	free_percpu(rcpu->bulkq);
 	/* Cannot kthread_stop() here, last put free rcpu resources */
-- 
2.17.2