From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: 
Received: from mail.linutronix.de (146.0.238.70:993) by
	crypto-ml.lab.linutronix.de with IMAP4-SSL for ;
	24 Feb 2019 15:08:25 -0000
Received: from mga01.intel.com ([192.55.52.88]) by Galois.linutronix.de with
	esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80)
	(envelope-from ) id 1gxvNr-0001Qv-2Z for speck@linutronix.de;
	Sun, 24 Feb 2019 16:08:04 +0100
From: Andi Kleen 
Subject: [MODERATED] [PATCH v6 25/43] MDSv6
Date: Sun, 24 Feb 2019 07:07:31 -0800
Message-Id: 
In-Reply-To: 
References: 
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0
To: speck@linutronix.de
Cc: Andi Kleen 
List-ID: 

BPF allows the user to run untrusted code in the kernel. Normally MDS
would allow some information leakage, either from other processes or
from sensitive kernel code, to the user controlled BPF code. We cannot
rule out that BPF code contains an MDS exploit, and it is difficult to
pattern match gadgets.

This patch adds a limited number of CPU clears before BPF executions
to guarantee that eBPF executions cannot leak data. We assume BPF
execution does not touch other users' data, so it does not need to
schedule a clear for itself.

For eBPF programs loaded privileged (by root) we never clear, because
we already assume they are trusted.

When the BPF program was loaded unprivileged, clear the CPU before the
BPF execution, depending on the context it is running in:

We only do this when running in an interrupt, or if a CPU clear is
already scheduled (which means, for example, there was a context
switch or a crypto operation before).

In process context we check if the current process context has the
same userns+euid as the process that created the BPF program. This
handles the common seccomp filter case without any extra clears, but
still adds clears when e.g. a socket filter runs on a socket inherited
by a process with a different user id. It also handles various other
common cases.
Technically we would only need to do this if the BPF program contains
conditional branches and loads dominated by them, but let's assume
that nearly all do.

For example, when running chromium with seccomp filters I see only
15-18% of all sandbox system calls have a clear; most are likely
caused by context switches.

Unprivileged eBPF usages in interrupts currently always clear.

This could be further optimized by allowing callers that do a lot of
individual BPF runs, and are sure they don't touch other users' data
(that is not accessible to the eBPF anyway) in between, to do the
clear only once at the beginning. We can add such optimizations later
based on profile data.

Signed-off-by: Andi Kleen 
---
 arch/x86/include/asm/clearbpf.h | 29 +++++++++++++++++++++++++++++
 include/linux/filter.h          | 21 +++++++++++++++++++--
 kernel/bpf/core.c               |  2 ++
 kernel/bpf/cpumap.c             |  3 +++
 4 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/include/asm/clearbpf.h

diff --git a/arch/x86/include/asm/clearbpf.h b/arch/x86/include/asm/clearbpf.h
new file mode 100644
index 000000000000..3da885e4eb29
--- /dev/null
+++ b/arch/x86/include/asm/clearbpf.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_CLEARBPF_H
+#define _ASM_CLEARBPF_H 1
+
+#include
+#include
+#include
+
+/*
+ * When the BPF program was loaded unprivileged, clear the CPU
+ * to prevent any exploits written in BPF using side channels to read
+ * data leaked from other kernel code. In some cases, like
+ * process context with the same uid, we can avoid it.
+ *
+ * See Documentation/clearcpu.txt for more details.
+ */
+static inline void arch_bpf_prepare_nonpriv(kuid_t uid)
+{
+	if (!static_cpu_has(X86_BUG_MDS))
+		return;
+	if (in_interrupt() ||
+	    __this_cpu_read(clear_cpu_flag) ||
+	    !uid_eq(current_euid(), uid)) {
+		clear_cpu();
+		__this_cpu_write(clear_cpu_flag, 0);
+	}
+}
+
+#endif
diff --git a/include/linux/filter.h b/include/linux/filter.h
index e532fcc6e4b5..2c7f62f8047a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -20,12 +20,21 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 
+#ifdef CONFIG_ARCH_HAS_CLEAR_CPU
+#include
+#else
+static inline void arch_bpf_prepare_nonpriv(kuid_t uid)
+{
+}
+#endif
+
 struct sk_buff;
 struct sock;
 struct seccomp_data;
@@ -490,7 +499,9 @@ struct bpf_prog {
 				blinded:1,	/* Was blinded */
 				is_func:1,	/* program is a bpf function */
 				kprobe_override:1, /* Do we override a kprobe? */
-				has_callchain_buf:1; /* callchain buffer allocated? */
+				has_callchain_buf:1, /* callchain buffer allocated? */
+				priv:1;		/* Was loaded privileged */
+	kuid_t			uid;		/* Original uid who created it */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
@@ -513,7 +524,13 @@ struct sk_filter {
 	struct bpf_prog	*prog;
 };
 
-#define BPF_PROG_RUN(filter, ctx)	(*(filter)->bpf_func)(ctx, (filter)->insnsi)
+static inline unsigned _bpf_prog_run(const struct bpf_prog *bp, const void *ctx)
+{
+	if (!bp->priv)
+		arch_bpf_prepare_nonpriv(bp->uid);
+	return bp->bpf_func(ctx, bp->insnsi);
+}
+#define BPF_PROG_RUN(filter, ctx) _bpf_prog_run(filter, ctx)
 
 #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index f908b9356025..67d845229d46 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -99,6 +99,8 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags)
 	fp->aux = aux;
 	fp->aux->prog = fp;
 	fp->jit_requested = ebpf_jit_enabled();
+	fp->priv = !!capable(CAP_SYS_ADMIN);
+	fp->uid = current_euid();
 
 	INIT_LIST_HEAD_RCU(&fp->aux->ksym_lnode);
 
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 8974b3755670..a5c9764168f9 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -376,6 +376,9 @@ static void __cpu_map_entry_free(struct rcu_head *rcu)
 
 		/* No concurrent bq_enqueue can run at this point */
 		bq_flush_to_queue(rcpu, bq, false);
+
+		/* Do lazy_clear_cpu_interrupt here? */
+
 	}
 	free_percpu(rcpu->bulkq);
 	/* Cannot kthread_stop() here, last put free rcpu resources */
-- 
2.17.2