* [MODERATED] [PATCH 0/5] SSB extra 0
@ 2018-05-03 22:29 Dave Hansen
  2018-05-03 22:29 ` [MODERATED] [PATCH 1/5] SSB extra 2 Dave Hansen
                   ` (9 more replies)
  0 siblings, 10 replies; 31+ messages in thread
From: Dave Hansen @ 2018-05-03 22:29 UTC (permalink / raw)
  To: speck; +Cc: Dave Hansen

BPF is a potential source of gadgets that can be used for memory
disambiguation-based attacks.  To help mitigate these, we set the
bit in SPEC_CTRL which enables the reduced (memory) speculation
mode on the processor while BPF code is running.

This is far from optimal: it does not opt out of the mitigation
for BPF programs which are trusted or less likely to be exploited,
such as those which have been offloaded to devices.

It also uses a fixed (and stupid) algorithm for keeping the MSR
write traffic to a minimum.  Each time a CPU uses BPF, it does the
MSR write to enable the mitigation and then schedules some work
in 10ms to disable the mitigation.  It repeats this every 10ms if
the CPU keeps seeing BPF activity.  This can obviously be improved
on, but it is simple at the moment and _works_.

Dave Hansen (5):
  bpf: add enter/exit markers
  bpf: track entry to and exit from BPF code
  bpf: use reduced speculation mitigations
  x86, bugs: centralize SPEC_CTRL MSR mask generation
  x86: implement reduced speculation when running BPF

 arch/x86/Kconfig                 |  4 ++
 arch/x86/include/asm/rmspec.h    | 24 ++++++++++++
 arch/x86/include/asm/spec-ctrl.h |  3 ++
 arch/x86/kernel/cpu/bugs.c       | 81 ++++++++++++++++++++++++++++++++--------
 include/linux/bpf.h              |  2 +
 include/linux/filter.h           | 34 ++++++++++++++++-
 include/linux/nospec.h           | 11 ++++++
 kernel/bpf/sockmap.c             |  6 +++
 net/core/filter.c                | 65 ++++++++++++++++++++++++++++++++
 net/kcm/kcmsock.c                |  7 +++-
 10 files changed, 220 insertions(+), 17 deletions(-)
 create mode 100644 arch/x86/include/asm/rmspec.h

-- 
2.9.5


* [MODERATED] [PATCH 1/5] SSB extra 2
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
@ 2018-05-03 22:29 ` Dave Hansen
  2018-05-03 22:29 ` [MODERATED] [PATCH 2/5] SSB extra 3 Dave Hansen
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Dave Hansen @ 2018-05-03 22:29 UTC (permalink / raw)
  To: speck; +Cc: Dave Hansen

From: Dave Hansen <dave.hansen@linux.intel.com>

BPF code is often supplied from outside the kernel.  While it can
be programmatically verified, it is very difficult to verify
potential effects from speculative execution.

This patch adds marker functions that are called as BPF code is
entered or exited.  These serve only as stubs for now.

There are many possibilities for optimization.  The BPF programs
that run on devices, for instance, are less likely to need any
CPU-based mitigations.  These patches are an entirely unoptimized
first pass.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
---
 include/linux/bpf.h    |  2 ++
 include/linux/filter.h | 18 +++++++++++++++++-
 kernel/bpf/sockmap.c   |  6 ++++++
 net/kcm/kcmsock.c      |  7 ++++++-
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 486e65e..887c908 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -357,7 +357,9 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
 			goto _out;			\
 		_prog = _array->progs;			\
 		while ((__prog = READ_ONCE(*_prog))) {	\
+			bpf_enter_prog(__prog);		\
 			_ret &= func(__prog, ctx);	\
+			bpf_leave_prog(__prog);		\
 			_prog++;			\
 		}					\
 _out:							\
diff --git a/include/linux/filter.h b/include/linux/filter.h
index fc4e8f9..e92854d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -490,7 +490,23 @@ struct sk_filter {
 	struct bpf_prog	*prog;
 };
 
-#define BPF_PROG_RUN(filter, ctx)  (*(filter)->bpf_func)(ctx, (filter)->insnsi)
+static inline void bpf_enter_prog(const struct bpf_prog *fp)
+{
+}
+
+static inline void bpf_leave_prog(const struct bpf_prog *fp)
+{
+}
+
+#define BPF_PROG_RUN(filter, ctx)  ({				\
+	int __ret;						\
+								\
+	bpf_enter_prog(filter);					\
+	__ret = (*(filter)->bpf_func)(ctx, (filter)->insnsi);	\
+	bpf_leave_prog(filter);					\
+								\
+	__ret;							\
+})
 
 #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN
 
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index a3b2138..1858b9d 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -477,7 +477,9 @@ static unsigned int smap_do_tx_msg(struct sock *sk,
 	}
 
 	bpf_compute_data_pointers_sg(md);
+	bpf_enter_prog(prog);
 	rc = (*prog->bpf_func)(md, prog->insnsi);
+	bpf_leave_prog(prog);
 	psock->apply_bytes = md->apply_bytes;
 
 	/* Moving return codes from UAPI namespace into internal namespace */
@@ -1049,7 +1051,9 @@ static int smap_verdict_func(struct smap_psock *psock, struct sk_buff *skb)
 	skb->sk = psock->sock;
 	bpf_compute_data_pointers(skb);
 	preempt_disable();
+	bpf_enter_prog(prog);
 	rc = (*prog->bpf_func)(skb, prog->insnsi);
+	bpf_leave_prog(prog);
 	preempt_enable();
 	skb->sk = NULL;
 
@@ -1301,7 +1305,9 @@ static int smap_parse_func_strparser(struct strparser *strp,
 	 */
 	skb->sk = psock->sock;
 	bpf_compute_data_pointers(skb);
+	bpf_enter_prog(prog);
 	rc = (*prog->bpf_func)(skb, prog->insnsi);
+	bpf_leave_prog(prog);
 	skb->sk = NULL;
 	rcu_read_unlock();
 	return rc;
diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c
index dc76bc3..63bfc2a 100644
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -381,8 +381,13 @@ static int kcm_parse_func_strparser(struct strparser *strp, struct sk_buff *skb)
 {
 	struct kcm_psock *psock = container_of(strp, struct kcm_psock, strp);
 	struct bpf_prog *prog = psock->bpf_prog;
+	int ret;
 
-	return (*prog->bpf_func)(skb, prog->insnsi);
+	bpf_enter_prog(prog);
+	ret = (*prog->bpf_func)(skb, prog->insnsi);
+	bpf_leave_prog(prog);
+
+	return ret;
 }
 
 static int kcm_read_sock_done(struct strparser *strp, int err)
-- 
2.9.5


* [MODERATED] [PATCH 2/5] SSB extra 3
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
  2018-05-03 22:29 ` [MODERATED] [PATCH 1/5] SSB extra 2 Dave Hansen
@ 2018-05-03 22:29 ` Dave Hansen
  2018-05-03 22:29 ` [MODERATED] [PATCH 3/5] SSB extra 1 Dave Hansen
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Dave Hansen @ 2018-05-03 22:29 UTC (permalink / raw)
  To: speck; +Cc: Dave Hansen

From: Dave Hansen <dave.hansen@linux.intel.com>

Now that we have hooks called when we enter/exit BPF code, track
when we enter/leave.  We "leave" lazily.  The first time we leave, we
schedule some work to do the actual "leave" at some point in the future.
This way, we do not thrash by enabling and disabling mitigations
frequently.

This means that the per-BPF-program overhead is hopefully just the
cost of incrementing and decrementing a per-cpu variable.

The per-cpu counter 'bpf_prog_active' looks superficially like a great
mechanism to use.  However, it does not track active BPF programs.
It appears to only be active when kprobe BPF handlers are running.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
---
 include/linux/filter.h | 11 ++++++++++
 net/core/filter.c      | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index e92854d..83c1298 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -490,12 +490,23 @@ struct sk_filter {
 	struct bpf_prog	*prog;
 };
 
+DECLARE_PER_CPU(unsigned int, bpf_prog_ran);
+
 static inline void bpf_enter_prog(const struct bpf_prog *fp)
 {
+	int *count = &get_cpu_var(bpf_prog_ran);
+	(*count)++;
 }
 
+extern void bpf_leave_prog_deferred(const struct bpf_prog *fp);
 static inline void bpf_leave_prog(const struct bpf_prog *fp)
 {
+	int *count = this_cpu_ptr(&bpf_prog_ran);
+	if (*count == 1)
+		bpf_leave_prog_deferred(fp);
+	else
+		(*count)--;
+	put_cpu_var(bpf_prog_ran);
 }
 
 #define BPF_PROG_RUN(filter, ctx)  ({				\
diff --git a/net/core/filter.c b/net/core/filter.c
index d31aff9..ffca000 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5649,3 +5649,61 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
 	release_sock(sk);
 	return ret;
 }
+
+/*
+ * 0 when no BPF code has executed on the CPU.
+ * Incremented when running BPF code.
+ * When ==1, work will be scheduled.
+ * When >1, work will not be scheduled because work is already
+ * scheduled.
+ * When work is performed, count will be decremented from 1->0.
+ */
+DEFINE_PER_CPU(unsigned int, bpf_prog_ran);
+EXPORT_SYMBOL_GPL(bpf_prog_ran);
+static void bpf_done_on_this_cpu(struct work_struct *work)
+{
+	if (!this_cpu_dec_return(bpf_prog_ran))
+		return;
+
+	/*
+	 * This is unexpected.  The elevated refcount indicates
+	 * being in the *middle* of a BPF program, which should
+	 * be impossible.  They are executed inside
+	 * rcu_read_lock() where we can not sleep and where
+	 * preemption is disabled.
+	 */
+	WARN_ON_ONCE(1);
+}
+
+DEFINE_PER_CPU(struct delayed_work, bpf_prog_delayed_work);
+static __init int bpf_init_delayed_work(void)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		struct delayed_work *w = &per_cpu(bpf_prog_delayed_work, i);
+
+		INIT_DELAYED_WORK(w, bpf_done_on_this_cpu);
+	}
+	return 0;
+}
+subsys_initcall(bpf_init_delayed_work);
+
+/*
+ * Must be called with preempt disabled
+ *
+ * The schedule_delayed_work_on() is relatively expensive.  So,
+ * this way, someone doing a bunch of repeated BPF calls will
+ * only pay the cost of scheduling work on the *first* BPF call.
+ * The subsequent calls only pay the cost of incrementing a
+ * per-cpu variable, which is cheap.
+ */
+void bpf_leave_prog_deferred(const struct bpf_prog *fp)
+{
+	int cpu = smp_processor_id();
+	struct delayed_work *w = &per_cpu(bpf_prog_delayed_work, cpu);
+	unsigned long delay_jiffies = msecs_to_jiffies(10);
+
+	schedule_delayed_work_on(cpu, w, delay_jiffies);
+}
+EXPORT_SYMBOL_GPL(bpf_leave_prog_deferred);
-- 
2.9.5


* [MODERATED] [PATCH 3/5] SSB extra 1
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
  2018-05-03 22:29 ` [MODERATED] [PATCH 1/5] SSB extra 2 Dave Hansen
  2018-05-03 22:29 ` [MODERATED] [PATCH 2/5] SSB extra 3 Dave Hansen
@ 2018-05-03 22:29 ` Dave Hansen
  2018-05-03 22:29 ` [MODERATED] [PATCH 4/5] SSB extra 5 Dave Hansen
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Dave Hansen @ 2018-05-03 22:29 UTC (permalink / raw)
  To: speck; +Cc: Dave Hansen

From: Dave Hansen <dave.hansen@linux.intel.com>

The previous patches put in place the infrastructure to tell when
BPF code is running.  Now, we hook into that code to call out to
some architecture-specific code which will implement those
mitigations.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
---
 include/linux/filter.h |  7 +++++++
 include/linux/nospec.h |  9 +++++++++
 net/core/filter.c      | 23 +++++++++++++++--------
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 83c1298..ad63d9a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -19,6 +19,7 @@
 #include <linux/cryptohash.h>
 #include <linux/set_memory.h>
 #include <linux/kallsyms.h>
+#include <linux/nospec.h>
 
 #include <net/sch_generic.h>
 
@@ -496,6 +497,12 @@ static inline void bpf_enter_prog(const struct bpf_prog *fp)
 {
 	int *count = &get_cpu_var(bpf_prog_ran);
 	(*count)++;
+	/*
+	 * Upon the first entry to BPF code, we need to reduce
+	 * memory speculation to mitigate attacks targeting it.
+	 */
+	if (*count == 1)
+		cpu_enter_reduced_memory_speculation();
 }
 
 extern void bpf_leave_prog_deferred(const struct bpf_prog *fp);
diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index 1e63a0a..037ed8e 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -60,4 +60,13 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 int arch_prctl_set_spec_ctrl(unsigned long which, unsigned long ctrl);
 int arch_prctl_get_spec_ctrl(unsigned long which);
 
+#ifndef CONFIG_ARCH_HAS_REDUCED_MEMORY_SPECULATION
+static inline void cpu_enter_reduced_memory_speculation(void)
+{
+}
+static inline void cpu_leave_reduced_memory_speculation(void)
+{
+}
+#endif
+
 #endif /* _LINUX_NOSPEC_H */
diff --git a/net/core/filter.c b/net/core/filter.c
index ffca000..e7d7a29 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -33,6 +33,7 @@
 #include <linux/if_packet.h>
 #include <linux/if_arp.h>
 #include <linux/gfp.h>
+#include <linux/nospec.h>
 #include <net/inet_common.h>
 #include <net/ip.h>
 #include <net/protocol.h>
@@ -5662,17 +5663,23 @@ DEFINE_PER_CPU(unsigned int, bpf_prog_ran);
 EXPORT_SYMBOL_GPL(bpf_prog_ran);
 static void bpf_done_on_this_cpu(struct work_struct *work)
 {
-	if (!this_cpu_dec_return(bpf_prog_ran))
-		return;
+	if (this_cpu_dec_return(bpf_prog_ran)) {
+		/*
+		 * This is unexpected.  The elevated refcount indicates
+		 * being in the *middle* of a BPF program, which should
+		 * be impossible.  They are executed inside
+		 * rcu_read_lock() where we can not sleep and where
+		 * preemption is disabled.
+		 */
+		WARN_ON_ONCE(1);
+	}
 
 	/*
-	 * This is unexpected.  The elevated refcount indicates
-	 * being in the *middle* of a BPF program, which should
-	 * be impossible.  They are executed inside
-	 * rcu_read_lock() where we can not sleep and where
-	 * preemption is disabled.
+	 * Unsafe BPF code is no longer running, disable mitigations.
+	 * This must be done after bpf_prog_ran because the mitigation
+	 * code looks at its state.
 	 */
-	WARN_ON_ONCE(1);
+	cpu_leave_reduced_memory_speculation();
 }
 
 DEFINE_PER_CPU(struct delayed_work, bpf_prog_delayed_work);
-- 
2.9.5


* [MODERATED] [PATCH 4/5] SSB extra 5
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
                   ` (2 preceding siblings ...)
  2018-05-03 22:29 ` [MODERATED] [PATCH 3/5] SSB extra 1 Dave Hansen
@ 2018-05-03 22:29 ` Dave Hansen
  2018-05-03 22:29 ` [MODERATED] [PATCH 5/5] SSB extra 4 Dave Hansen
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Dave Hansen @ 2018-05-03 22:29 UTC (permalink / raw)
  To: speck; +Cc: Dave Hansen

From: Dave Hansen <dave.hansen@linux.intel.com>

The KVM code manipulates the SPEC_CTRL MSR when it enters and exits
the guest.  It overwrites the "kernel" value when it enters the guest
and restores the "kernel" value after leaving the guest.

Both code paths take into account the "base" (x86_spec_ctrl_base)
value and the per-task TIF_RDS flag (on Intel).  They then see if the
new state differs from the existing state and avoid the MSR write if
no change is made.

But, these two paths could be a bit more unified.  Introduce a new
function: x86_calculate_kernel_spec_ctrl() which will figure out the
"kernel" value to contrast it with the "guest" value.  We also
rename the arguments to the set/restore functions to make it clear
that while the arguments are both "guest" state, they really mean
different things to the two functions.

This will make the next step easier when we have more state to
consult when doing the x86_calculate_kernel_spec_ctrl() calculation.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
---
 arch/x86/kernel/cpu/bugs.c | 50 ++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index c28856e..0e17379 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -146,33 +146,49 @@ u64 x86_get_default_spec_ctrl(void)
 }
 EXPORT_SYMBOL_GPL(x86_get_default_spec_ctrl);
 
-void x86_set_guest_spec_ctrl(u64 guest_spec_ctrl)
+static inline u64 intel_rds_mask(void)
 {
-	u64 host = x86_spec_ctrl_base;
+	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+		return 0;
+
+	return rds_tif_to_spec_ctrl(current_thread_info()->flags);
+}
+
+/*
+ * Calculate the SPEC_CTRL MSR value that the kernel
+ * should be using under normal operation.
+ */
+static u64 x86_calculate_kernel_spec_ctrl(void)
+{
+	u64 spec_ctrl;
 
 	if (!boot_cpu_has(X86_FEATURE_IBRS))
-		return;
+		return 0;
 
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
-		host |= rds_tif_to_spec_ctrl(current_thread_info()->flags);
+	spec_ctrl = x86_spec_ctrl_base;
+	spec_ctrl |= intel_rds_mask();
 
-	if (host != guest_spec_ctrl)
-		wrmsrl(MSR_IA32_SPEC_CTRL, guest_spec_ctrl);
+	return spec_ctrl;
 }
-EXPORT_SYMBOL_GPL(x86_set_guest_spec_ctrl);
 
-void x86_restore_host_spec_ctrl(u64 guest_spec_ctrl)
+/* We are entering a guest and need to set its MSR value. */
+void x86_set_guest_spec_ctrl(u64 new_spec_ctrl)
 {
-	u64 host = x86_spec_ctrl_base;
-
-	if (!boot_cpu_has(X86_FEATURE_IBRS))
-		return;
+	if (x86_calculate_kernel_spec_ctrl() != new_spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, new_spec_ctrl);
+}
+EXPORT_SYMBOL_GPL(x86_set_guest_spec_ctrl);
 
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
-		host |= rds_tif_to_spec_ctrl(current_thread_info()->flags);
+/*
+ * We are leaving a guest and need to restore the kernel's MSR
+ * value that it uses for normal operation.
+ */
+void x86_restore_host_spec_ctrl(u64 current_spec_ctrl)
+{
+	u64 new_spec_ctrl = x86_calculate_kernel_spec_ctrl();
 
-	if (host != guest_spec_ctrl)
-		wrmsrl(MSR_IA32_SPEC_CTRL, host);
+	if (new_spec_ctrl != current_spec_ctrl)
+		wrmsrl(MSR_IA32_SPEC_CTRL, new_spec_ctrl);
 }
 EXPORT_SYMBOL_GPL(x86_restore_host_spec_ctrl);
 
-- 
2.9.5


* [MODERATED] [PATCH 5/5] SSB extra 4
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
                   ` (3 preceding siblings ...)
  2018-05-03 22:29 ` [MODERATED] [PATCH 4/5] SSB extra 5 Dave Hansen
@ 2018-05-03 22:29 ` Dave Hansen
  2018-05-03 23:27 ` [MODERATED] Re: [PATCH 0/5] SSB extra 0 Kees Cook
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 31+ messages in thread
From: Dave Hansen @ 2018-05-03 22:29 UTC (permalink / raw)
  To: speck; +Cc: Dave Hansen

From: Dave Hansen <dave.hansen@linux.intel.com>

Enable the SPEC_CTRL_RDS feature when running BPF code.

Underneath x86_calculate_kernel_spec_ctrl(), we now check the
per-cpu bpf_prog_ran counter.  If the counter is elevated, we
need to set the SPEC_CTRL_RDS bit.

We also add MSR writes (via x86_sync_spec_ctrl()) to:

	cpu_enter_reduced_memory_speculation() and
	cpu_leave_reduced_memory_speculation()

I'm not super happy that x86_sync_spec_ctrl() does an
unconditional MSR write.  But these writes should be infrequent
since they only happen twice per timeout period.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
---
 arch/x86/Kconfig                 |  4 ++++
 arch/x86/include/asm/rmspec.h    | 24 ++++++++++++++++++++++++
 arch/x86/include/asm/spec-ctrl.h |  3 +++
 arch/x86/kernel/cpu/bugs.c       | 37 ++++++++++++++++++++++++++++++++++++-
 include/linux/filter.h           |  4 +---
 include/linux/nospec.h           |  2 ++
 6 files changed, 70 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/include/asm/rmspec.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c07f492..aa023d0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -58,6 +58,7 @@ config X86
 	select ARCH_HAS_KCOV			if X86_64
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_PMEM_API		if X86_64
+	select ARCH_HAS_REDUCED_MEMORY_SPECULATION
 	select ARCH_HAS_REFCOUNT
 	select ARCH_HAS_UACCESS_FLUSHCACHE	if X86_64
 	select ARCH_HAS_SET_MEMORY
@@ -268,6 +269,9 @@ config RWSEM_XCHGADD_ALGORITHM
 config GENERIC_CALIBRATE_DELAY
 	def_bool y
 
+config ARCH_HAS_REDUCED_MEMORY_SPECULATION
+	def_bool y
+
 config ARCH_HAS_CPU_RELAX
 	def_bool y
 
diff --git a/arch/x86/include/asm/rmspec.h b/arch/x86/include/asm/rmspec.h
new file mode 100644
index 0000000..cac0189
--- /dev/null
+++ b/arch/x86/include/asm/rmspec.h
@@ -0,0 +1,24 @@
+#ifndef _LINUX_RMSPEC_H
+#define _LINUX_RMSPEC_H
+#include <asm/msr.h>
+#include <asm/spec-ctrl.h>
+
+/*
+ * We call these when we *know* the CPU can go in/out of its
+ * "safer" reduced memory speculation mode.
+ *
+ * For BPF, x86_sync_spec_ctrl() reads the per-cpu BPF state
+ * variable and figures out the MSR value by itself.  Thus,
+ * we do not need to pass the "direction".
+ */
+static inline void cpu_enter_reduced_memory_speculation(void)
+{
+	x86_sync_spec_ctrl();
+}
+
+static inline void cpu_leave_reduced_memory_speculation(void)
+{
+	x86_sync_spec_ctrl();
+}
+
+#endif /* _LINUX_RMSPEC_H */
diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
index 607236a..881c353 100644
--- a/arch/x86/include/asm/spec-ctrl.h
+++ b/arch/x86/include/asm/spec-ctrl.h
@@ -15,6 +15,9 @@
 extern void x86_set_guest_spec_ctrl(u64);
 extern void x86_restore_host_spec_ctrl(u64);
 
+/* Write a new SPEC_CTRL MSR based on current kernel state: */
+extern void x86_sync_spec_ctrl(void);
+
 /* AMD specific Speculative Store Bypass MSR data */
 extern u64 x86_amd_ls_cfg_base;
 extern u64 x86_amd_ls_cfg_rds_mask;
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0e17379..947ae8a 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -11,6 +11,7 @@
 #include <linux/init.h>
 #include <linux/utsname.h>
 #include <linux/cpu.h>
+#include <linux/filter.h>
 #include <linux/module.h>
 #include <linux/nospec.h>
 #include <linux/prctl.h>
@@ -148,10 +149,23 @@ EXPORT_SYMBOL_GPL(x86_get_default_spec_ctrl);
 
 static inline u64 intel_rds_mask(void)
 {
+	u64 mask;
+
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
 		return 0;
 
-	return rds_tif_to_spec_ctrl(current_thread_info()->flags);
+	mask = rds_tif_to_spec_ctrl(current_thread_info()->flags);
+
+	/*
+	 * BPF programs can be exploited to attack the kernel.
+	 * Leave the RDS bit on when we recently ran one.  This
+	 * bit gets cleared after a BPF program has not run on
+	 * the CPU for a while.
+	 */
+	if (get_cpu_var(bpf_prog_ran))
+		mask |= SPEC_CTRL_RDS;
+
+	return mask;
 }
 
 /*
@@ -192,6 +206,27 @@ void x86_restore_host_spec_ctrl(u64 current_spec_ctrl)
 }
 EXPORT_SYMBOL_GPL(x86_restore_host_spec_ctrl);
 
+/*
+ * A condition that may affect the SPEC_CTRL MSR has changed.
+ * Recalculate a new value for this CPU and set it.
+ *
+ * It is not easy to optimize the wrmsrl() away unless the
+ * callers have a full understanding of all the conditions
+ * that affect the output of x86_calculate_kernel_spec_ctrl().
+ *
+ * Try not to call this too often.
+ */
+void x86_sync_spec_ctrl(void)
+{
+	u64 new_spec_ctrl = x86_calculate_kernel_spec_ctrl();
+
+	if (!boot_cpu_has(X86_FEATURE_IBRS))
+		return;
+
+	wrmsrl(MSR_IA32_SPEC_CTRL, new_spec_ctrl);
+}
+EXPORT_SYMBOL_GPL(x86_sync_spec_ctrl);
+
 static void x86_amd_rds_enable(void)
 {
 	u64 msrval = x86_amd_ls_cfg_base | x86_amd_ls_cfg_rds_mask;
diff --git a/include/linux/filter.h b/include/linux/filter.h
index ad63d9a..7e51184 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -495,13 +495,11 @@ DECLARE_PER_CPU(unsigned int, bpf_prog_ran);
 
 static inline void bpf_enter_prog(const struct bpf_prog *fp)
 {
-	int *count = &get_cpu_var(bpf_prog_ran);
-	(*count)++;
 	/*
 	 * Upon the first entry to BPF code, we need to reduce
 	 * memory speculation to mitigate attacks targeting it.
 	 */
-	if (*count == 1)
+	if (this_cpu_inc_return(bpf_prog_ran) == 1)
 		cpu_enter_reduced_memory_speculation();
 }
 
diff --git a/include/linux/nospec.h b/include/linux/nospec.h
index 037ed8e..26873ea 100644
--- a/include/linux/nospec.h
+++ b/include/linux/nospec.h
@@ -67,6 +67,8 @@ static inline void cpu_enter_reduced_memory_speculation(void)
 static inline void cpu_leave_reduced_memory_speculation(void)
 {
 }
+#else
+#include <asm/rmspec.h>
 #endif
 
 #endif /* _LINUX_NOSPEC_H */
-- 
2.9.5


* [MODERATED] Re: [PATCH 0/5] SSB extra 0
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
                   ` (4 preceding siblings ...)
  2018-05-03 22:29 ` [MODERATED] [PATCH 5/5] SSB extra 4 Dave Hansen
@ 2018-05-03 23:27 ` Kees Cook
  2018-05-04  1:37   ` Dave Hansen
  2018-05-04  9:20 ` [MODERATED] Re: [PATCH 1/5] SSB extra 2 Peter Zijlstra
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Kees Cook @ 2018-05-03 23:27 UTC (permalink / raw)
  To: speck

On Thu, May 03, 2018 at 03:29:43PM -0700, speck for Dave Hansen wrote:
> BPF is a potential source of gadgets that can be used for memory
> disambiguation-based attacks.  To help mitigate these, we set the
> bit in SPEC_CTRL which enables the reduced (memory) speculation
> mode on the processor while BPF code is running.

Do you mean eBPF, or even cBPF? For example, can gadgets be built using the
BPF used in seccomp()? Prior speculation flaws weren't exposed there, so
it might be possible (though ironic given my other seccomp series) to not
trigger this for seccomp BPF execution... :P

-Kees

-- 
Kees Cook                                            @outflux.net


* [MODERATED] Re: [PATCH 0/5] SSB extra 0
  2018-05-03 23:27 ` [MODERATED] Re: [PATCH 0/5] SSB extra 0 Kees Cook
@ 2018-05-04  1:37   ` Dave Hansen
  2018-05-04 22:26     ` Kees Cook
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Hansen @ 2018-05-04  1:37 UTC (permalink / raw)
  To: speck

On 05/03/2018 04:27 PM, speck for Kees Cook wrote:
> On Thu, May 03, 2018 at 03:29:43PM -0700, speck for Dave Hansen wrote:
>> BPF is a potential source of gadgets that can be used for memory
>> disambiguation-based attacks.  To help mitigate these, we set the
>> bit in SPEC_CTRL which enables the reduced (memory) speculation
>> mode on the processor while BPF code is running.
> 
> Do you mean eBPF, or even cBPF? 

Right or wrong, my assumption is that you can build gadgets with any of
the variants.  I haven't looked in detail at whether the classic VM
and extended VM have the building blocks.

My expectation is that we would build out a list of exceptions in
bpf_enter_prog() for programs that are not subject to mitigation.
Things that come from trusted sources, that get offloaded to devices,
or probably those that do not write to memory.

My thought was that we'll get to the point where the BPF verifier sets a
flag in the bpf_prog to tell us what kind of mitigations it needs.  We
already do that to some degree, and some BPF program "types" have more
freedom than others.
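
A rough sketch of that direction (nothing below exists today; 'requires_rms'
is an invented flag and the helpers are the stubs from patch 1):

/*
 * Illustrative only: 'requires_rms' would be set by the verifier/loader
 * for programs that need the mitigation; it is not a real field in
 * struct bpf_prog today.
 */
static inline void bpf_enter_prog(const struct bpf_prog *fp)
{
	if (!fp->requires_rms)	/* trusted, offloaded, read-only, ... */
		return;

	if (this_cpu_inc_return(bpf_prog_ran) == 1)
		cpu_enter_reduced_memory_speculation();
}
/* bpf_leave_prog() would need the same check to keep the count balanced. */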

Either way, I don't think we have the BPF experts on this list, yet.
I'm a BPF newbie for sure.

> For example, can gadgets be built using the BPF used in seccomp()?
> Prior speculation flaws weren't exposed there, so it might be
> possible (though ironic given my other seccomp series) to not trigger
> this for seccomp BPF execution... :P

The irony of seccomp being involved here is not lost on me. :)

Were we just not concerned before because seccomp didn't use arrays?


* [MODERATED] Re: [PATCH 1/5] SSB extra 2
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
                   ` (5 preceding siblings ...)
  2018-05-03 23:27 ` [MODERATED] Re: [PATCH 0/5] SSB extra 0 Kees Cook
@ 2018-05-04  9:20 ` Peter Zijlstra
  2018-05-04 14:04   ` Dave Hansen
  2018-05-04 13:33 ` [PATCH 3/5] SSB extra 1 Thomas Gleixner
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-05-04  9:20 UTC (permalink / raw)
  To: speck

On Thu, May 03, 2018 at 03:29:44PM -0700, speck for Dave Hansen wrote:

> BPF code is often supplied from outside the kernel.  While it can
> be programmatically verified, it is very difficult to verify
> potential effects from speculative execution.
> 
> This patch adds marker functions that are called as BPF code is
> entered or exited.  These serve only as stubs for now.
> 
> There are many possibilities for optimization.  The BPF programs
> that run on devices, for instance, are less likely to need any
> CPU-based mitigations.  These patches are an entirely unoptimized
> first pass.

> @@ -357,7 +357,9 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
>  			goto _out;			\
>  		_prog = _array->progs;			\
>  		while ((__prog = READ_ONCE(*_prog))) {	\
> +			bpf_enter_prog(__prog);		\
>  			_ret &= func(__prog, ctx);	\
> +			bpf_leave_prog(__prog);		\
>  			_prog++;			\
>  		}					\
>  _out:							\

So I actually had a look at this yesterday, and why did you choose to
put these hooks at the prog callsites instead of inside the prog itself?

It seems to me that placing them inside the program (either the
interpreter or the jit) is much less invasive.


* Re: [PATCH 3/5] SSB extra 1
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
                   ` (6 preceding siblings ...)
  2018-05-04  9:20 ` [MODERATED] Re: [PATCH 1/5] SSB extra 2 Peter Zijlstra
@ 2018-05-04 13:33 ` Thomas Gleixner
  2018-05-04 14:22   ` [MODERATED] " Dave Hansen
  2018-05-04 17:01 ` [MODERATED] Re: [PATCH 4/5] SSB extra 5 Konrad Rzeszutek Wilk
  2018-05-21  9:56 ` [MODERATED] Re: [PATCH 5/5] SSB extra 4 Jiri Kosina
  9 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2018-05-04 13:33 UTC (permalink / raw)
  To: speck

On Thu, 3 May 2018, speck for Dave Hansen wrote:
>  static void bpf_done_on_this_cpu(struct work_struct *work)
>  {
> -	if (!this_cpu_dec_return(bpf_prog_ran))
> -		return;
> +	if (this_cpu_dec_return(bpf_prog_ran)) {
> +		/*
> +		 * This is unexpected.  The elevated refcount indicates
> +		 * being in the *middle* of a BPF program, which should
> +		 * be impossible.  They are executed inside
> +		 * rcu_read_lock() where we can not sleep and where
> +		 * preemption is disabled.

Preemptible RCU has existed in mainline since 2008, so your assumption
that preemption is disabled is simply wrong. And please don't just slap a
preempt_disable() somewhere, as this really can be done without. It's
per-thread information, not per-cpu.

Thanks,

	tglx


* [MODERATED] Re: [PATCH 1/5] SSB extra 2
  2018-05-04  9:20 ` [MODERATED] Re: [PATCH 1/5] SSB extra 2 Peter Zijlstra
@ 2018-05-04 14:04   ` Dave Hansen
  2018-05-04 15:50     ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Hansen @ 2018-05-04 14:04 UTC (permalink / raw)
  To: speck

On 05/04/2018 02:20 AM, speck for Peter Zijlstra wrote:
>> @@ -357,7 +357,9 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
>>  			goto _out;			\
>>  		_prog = _array->progs;			\
>>  		while ((__prog = READ_ONCE(*_prog))) {	\
>> +			bpf_enter_prog(__prog);		\
>>  			_ret &= func(__prog, ctx);	\
>> +			bpf_leave_prog(__prog);		\
>>  			_prog++;			\
>>  		}					\
>>  _out:							\
> So I actually had a look at this yesterday, and why did you choose to
> put these hooks at the prog callsites instead of inside the prog itself?
> 
> It seems to me that placing them inside the program (either the
> interpreter or the jit) is much less invasive.

It's not a bad idea.  This didn't seem *too* invasive other than I had
to hit all the variants of calling func().

Were you thinking we add some eBPF instruction which calls out to the
kernel machinery and then the kernel BPF code inserts it as the first
instruction of the program?


* [MODERATED] Re: [PATCH 3/5] SSB extra 1
  2018-05-04 13:33 ` [PATCH 3/5] SSB extra 1 Thomas Gleixner
@ 2018-05-04 14:22   ` Dave Hansen
  2018-05-04 14:26     ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Hansen @ 2018-05-04 14:22 UTC (permalink / raw)
  To: speck

On 05/04/2018 06:33 AM, speck for Thomas Gleixner wrote:
> On Thu, 3 May 2018, speck for Dave Hansen wrote:
>>  static void bpf_done_on_this_cpu(struct work_struct *work)
>>  {
>> -	if (!this_cpu_dec_return(bpf_prog_ran))
>> -		return;
>> +	if (this_cpu_dec_return(bpf_prog_ran)) {
>> +		/*
>> +		 * This is unexpected.  The elevated refcount indicates
>> +		 * being in the *middle* of a BPF program, which should
>> +		 * be impossible.  They are executed inside
>> +		 * rcu_read_lock() where we can not sleep and where
>> +		 * preemption is disabled.
> 
> Preemptible RCU exists in mainline since 2008, so your assumption that
> preemption is disabled is simply wrong. And please don't just slap a
> preempt_disable() somewhere as this really can be done without. It's per
> thread information not per cpu.

Ahhh, that's an interesting point.  Are you thinking that we add a BPF
"instruction" to enable or disable the mitigations, and then have the
verifier insert it as the first instruction and then before any exit
instructions?  Is there some precedent for doing this?


* Re: [PATCH 3/5] SSB extra 1
  2018-05-04 14:22   ` [MODERATED] " Dave Hansen
@ 2018-05-04 14:26     ` Thomas Gleixner
  2018-05-04 16:04       ` [MODERATED] " Andi Kleen
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2018-05-04 14:26 UTC (permalink / raw)
  To: speck

On Fri, 4 May 2018, speck for Dave Hansen wrote:

> On 05/04/2018 06:33 AM, speck for Thomas Gleixner wrote:
> > On Thu, 3 May 2018, speck for Dave Hansen wrote:
> >>  static void bpf_done_on_this_cpu(struct work_struct *work)
> >>  {
> >> -	if (!this_cpu_dec_return(bpf_prog_ran))
> >> -		return;
> >> +	if (this_cpu_dec_return(bpf_prog_ran)) {
> >> +		/*
> >> +		 * This is unexpected.  The elevated refcount indicates
> >> +		 * being in the *middle* of a BPF program, which should
> >> +		 * be impossible.  They are executed inside
> >> +		 * rcu_read_lock() where we can not sleep and where
> >> +		 * preemption is disabled.
> > 
> > Preemptible RCU exists in mainline since 2008, so your assumption that
> > preemption is disabled is simply wrong. And please don't just slap a
> > preempt_disable() somewhere as this really can be done without. It's per
> > thread information not per cpu.
> 
> Ahhh, that's an interesting point.  Are you thinking that we add a BPF
> "instruction" to enable or disable the mitigations, and then have the
> verifier insert it as the first instruction and then before any exit
> instructions?  Is there some precedent for doing this?

Dunno, but it would be the obvious thing to do I think.

Thanks,

	tglx


* [MODERATED] Re: [PATCH 1/5] SSB extra 2
  2018-05-04 14:04   ` Dave Hansen
@ 2018-05-04 15:50     ` Peter Zijlstra
  2018-05-04 15:54       ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2018-05-04 15:50 UTC (permalink / raw)
  To: speck

On Fri, May 04, 2018 at 07:04:53AM -0700, speck for Dave Hansen wrote:
> On 05/04/2018 02:20 AM, speck for Peter Zijlstra wrote:
> >> @@ -357,7 +357,9 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
> >>  			goto _out;			\
> >>  		_prog = _array->progs;			\
> >>  		while ((__prog = READ_ONCE(*_prog))) {	\
> >> +			bpf_enter_prog(__prog);		\
> >>  			_ret &= func(__prog, ctx);	\
> >> +			bpf_leave_prog(__prog);		\
> >>  			_prog++;			\
> >>  		}					\
> >>  _out:							\
> > So I actually had a look at this yesterday, and why did you choose to
> > put these hooks at the prog callsites instead of inside the prog itself?
> > 
> > It seems to me that placing them inside the program (either the
> > interpreter or the jit) is much less invasive.
> 
> It's not a bad idea.  This didn't seem *too* invasive other than I had
> to hit all the variants of calling func().

> Were you thinking we add some eBPF instruction which calls out to the
> kernel machinery and then the kernel BPF code inserts it as the first
> instruction of the program?

Teaching the interpreter to do this is trivial, for the JIT we would
emit a "CALL $asm_pre_thunk" as first instruction and "CALL
$asm_post_thunk" before every RET.

And when it doesn't need to mitigate, don't issue the instructions at
all.
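
As a purely illustrative sketch (the thunk names below are invented, and the
real change would be emitted by the interpreter/JIT rather than written as a
C wrapper):

/*
 * Conceptual shape only: untrusted programs get their entry point
 * wrapped so the mitigation calls live "inside" the program, while the
 * call sites stay untouched.  Trusted programs keep the raw bpf_func
 * and never pay for the calls.
 */
static unsigned int bpf_prog_run_untrusted(const struct bpf_prog *prog,
					   const void *ctx)
{
	unsigned int ret;

	ssb_mitigation_enter();			/* "CALL $asm_pre_thunk"  */
	ret = prog->bpf_func(ctx, prog->insnsi);
	ssb_mitigation_leave();			/* "CALL $asm_post_thunk" */

	return ret;
}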


* [MODERATED] Re: [PATCH 1/5] SSB extra 2
  2018-05-04 15:50     ` Peter Zijlstra
@ 2018-05-04 15:54       ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2018-05-04 15:54 UTC (permalink / raw)
  To: speck



On Fri, 4 May 2018, speck for Peter Zijlstra wrote:
> 
> Teaching the interpreter to do this is trivial, for the JIT we would
> emit a "CALL $asm_pre_thunk" as first instruction and "CALL
> $asm_post_thunk" before every RET.
> 
> And when it doesn't need to mitigate, don't issue the instructions at
> all.

Ack, that sounds better.

           Linus


* [MODERATED] Re: [PATCH 3/5] SSB extra 1
  2018-05-04 14:26     ` Thomas Gleixner
@ 2018-05-04 16:04       ` Andi Kleen
  2018-05-04 16:09         ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2018-05-04 16:04 UTC (permalink / raw)
  To: speck

> > Ahhh, that's an interesting point.  Are you thinking that we add a BPF
> > "instruction" to enable or disable the mitigations, and then have the
> > verifier insert it as the first instruction and then before any exit
> > instructions?  Is there some precedent for doing this?
> 
> Dunno, but it would be the obvious thing to do I think.

Other option would be a preempt notifier migrating the SSB state?

-Andi


* Re: [PATCH 3/5] SSB extra 1
  2018-05-04 16:04       ` [MODERATED] " Andi Kleen
@ 2018-05-04 16:09         ` Thomas Gleixner
  2018-05-04 16:28           ` [MODERATED] " Andi Kleen
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2018-05-04 16:09 UTC (permalink / raw)
  To: speck

On Fri, 4 May 2018, speck for Andi Kleen wrote:

> > > Ahhh, that's an interesting point.  Are you thinking that we add a BPF
> > > "instruction" to enable or disable the mitigations, and then have the
> > > verifier insert it as the first instruction and then before any exit
> > > instructions?  Is there some precedent for doing this?
> > 
> > Dunno, but it would be the obvious thing to do I think.
> 
> Other option would be a preempt notifier migrating the SSB state?

Why so? Because it's more complex than just using the TIF_RDS flag which
gets evaluated anyway?

Thanks,

	tglx


* [MODERATED] Re: [PATCH 3/5] SSB extra 1
  2018-05-04 16:09         ` Thomas Gleixner
@ 2018-05-04 16:28           ` Andi Kleen
  2018-05-04 16:32             ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2018-05-04 16:28 UTC (permalink / raw)
  To: speck

On Fri, May 04, 2018 at 06:09:04PM +0200, speck for Thomas Gleixner wrote:
> On Fri, 4 May 2018, speck for Andi Kleen wrote:
> 
> > > > Ahhh, that's an interesting point.  Are you thinking that we add a BPF
> > > > "instruction" to enable or disable the mitigations, and then have the
> > > > verifier insert it as the first instruction and then before any exit
> > > > instructions?  Is there some precedent for doing this?
> > > 
> > > Dunno, but it would be the obvious thing to do I think.
> > 
> > Other option would be a preempt notifier migrating the SSB state?
> 
> Why so? Because it's more complex than just using the TIF_RDS flag which
> gets evaluated anyway?

The flag doesn't know anything about the timer. You would
need another flag that says "start a delay timer on the new CPU
too".

-Andi


* Re: [PATCH 3/5] SSB extra 1
  2018-05-04 16:28           ` [MODERATED] " Andi Kleen
@ 2018-05-04 16:32             ` Thomas Gleixner
  2018-05-04 16:43               ` [MODERATED] " Dave Hansen
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2018-05-04 16:32 UTC (permalink / raw)
  To: speck

On Fri, 4 May 2018, speck for Andi Kleen wrote:
> On Fri, May 04, 2018 at 06:09:04PM +0200, speck for Thomas Gleixner wrote:
> > On Fri, 4 May 2018, speck for Andi Kleen wrote:
> > 
> > > > > Ahhh, that's an interesting point.  Are you thinking that we add a BPF
> > > > > "instruction" to enable or disable the mitigations, and then have the
> > > > > verifier insert it as the first instruction and then before any exit
> > > > > instructions?  Is there some precedent for doing this?
> > > > 
> > > > Dunno, but it would be the obvious thing to do I think.
> > > 
> > > Other option would be a preempt notifier migrating the SSB state?
> > 
> > Why so? Because it's more complex than just using the TIF_RDS flag which
> > gets evaluated anyway?
> 
> The flag doesn't know anything about the timer. You would
> need another flag that says "start a delay timer on the new CPU
> too".

Color me confused. I don't see a timer anywhere.


* [MODERATED] Re: [PATCH 3/5] SSB extra 1
  2018-05-04 16:32             ` Thomas Gleixner
@ 2018-05-04 16:43               ` Dave Hansen
  2018-05-04 18:39                 ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Hansen @ 2018-05-04 16:43 UTC (permalink / raw)
  To: speck

On 05/04/2018 09:32 AM, speck for Thomas Gleixner wrote:
>> The flag doesn't know anything about the timer. You would
>> need another flag that says "start a delay timer on the new CPU
>> too".
> Color me confused. I don't see a timer anywhere.

We were trying not to ping-pong the MSRs too much if we are going in/out
of the BPF code frequently.

So, there's a schedule_delayed_work_on() in there that waits 10ms and
turns the mitigations off, so we're only ping-ponging the MSRs every 10ms.

So, even in the case that we're generating BPF instructions for the
mitigation enable/disable, we still might want some mechanism to make
sure that we're not touching the MSRs *too* frequently if we're going
in/out of BPF frequently.

That's a wee bit harder if we're tracking the mitigation on the task
level than the CPU.  I don't think it's impossible, but it's certainly
more code than is there at the moment.


* [MODERATED] Re: [PATCH 4/5] SSB extra 5
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
                   ` (7 preceding siblings ...)
  2018-05-04 13:33 ` [PATCH 3/5] SSB extra 1 Thomas Gleixner
@ 2018-05-04 17:01 ` Konrad Rzeszutek Wilk
  2018-05-21  9:56 ` [MODERATED] Re: [PATCH 5/5] SSB extra 4 Jiri Kosina
  9 siblings, 0 replies; 31+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-05-04 17:01 UTC (permalink / raw)
  To: speck

On Thu, May 03, 2018 at 03:29:47PM -0700, speck for Dave Hansen wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
> Subject:  x86, bugs: centralize SPEC_CTRL MSR mask generation
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> The KVM code manipulates the SPEC_CTRL MSR when it enters and exits
> the guest.  It overwrites the "kernel" value when it enters the guest
> and restores the "kernel" value after leaving the guest.
> 
> Both code paths take into account the "base" (x86_spec_ctrl_base)
> value and the per-task TIF_RDS flag (on Intel).  They then see if the
> new state differs from the existing state and avoid the MSR write if
> no change is made.
> 
> But, these two paths could be a bit more unified.  Introduce a new
> function: x86_calculate_kernel_spec_ctrl() which will figure out the
> "kernel" value to contrast it with the "guest" value.  We also
> rename the arguments to the set/restore functions to make it clear
> that while the arguments are both "guest" state, they really mean
> different things to the two functions.
> 
> This will make the next step easier when we have more state to
> consult when doing the x86_calculate_kernel_spec_ctrl() calculation.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>


* Re: [PATCH 3/5] SSB extra 1
  2018-05-04 16:43               ` [MODERATED] " Dave Hansen
@ 2018-05-04 18:39                 ` Thomas Gleixner
  2018-05-06  8:32                   ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2018-05-04 18:39 UTC (permalink / raw)
  To: speck

On Fri, 4 May 2018, speck for Dave Hansen wrote:
> On 05/04/2018 09:32 AM, speck for Thomas Gleixner wrote:
> >> The flag doesn't know anything about the timer. You would
> >> need another flag that says "start a delay timer on the new CPU
> >> too".
> > Color me confused. I don't see a timer anywhere.
> 
> We were trying not to ping-pong the MSRs too much if we are going in/out
> of the BPF code frequently.
> 
> So, there's a schedule_delayed_work_on() in there that waits 10ms and
> turns the mitigations off, so we're only ping-ponging the MSRs every 10ms.
> 
> So, even in the case that we're generating BPF instructions for the
> mitigation enable/disable, we still might want some mechanism to make
> sure that we're not touching the MSRs *too* frequently if we're going
> in/out of BPF frequently.
> 
> That's a wee bit harder if we're tracking the mitigation on the task
> level than the CPU.  I don't think it's impossible, but it's certainly
> more code than is there at the moment.

Groan, you're again convoluting stuff in a completely nonsensical way.

Let's do a proper problem analysis first:

1) CPU resource speculation

   Can be controlled:

     A) Globally (commandline on/off or not vulnerable system)

     B) Context dependent

        Doing a time based prevention of toggling it too often is a
        completely orthogonal problem. Let's talk about that later.

2) Context

   The disable/enable is tied to execution contexts

     - task via prctl

     - task via seccomp

     - ebpf

   Now the fundamental context in Linux is a task. Soft and hard interrupts
   nest into the task context, or you can consider them context
   stealing.

   That has a fundamental consequence:

     Nested contexts can always disable the speculation if requested, but
     they can only enable speculation when the context in which they nest
     has it enabled as well.

  So any nested context has to look at the previous level to see whether it
  can reenable. So you need storage for that.

  The regular task storage is TIF_RDS, the softirq storage can be per cpu
  and the hardirq context does not need storage, assuming that NMI is not
  allowed to fiddle with that. But we can very simply use a per cpu
  refcount for soft and hard interrupt contexts. That refcount is
  incremented on disable and decremented on enable. prctl/seccomp has no
  influence at that point.  But if on enable the count goes to zero then it
  has to check task->TIF_RDS to decide whether it can be reenabled or not.

  Now let's look at EBPF. EBPF is also a nesting context.

  So, if EBPF runs in preemptible task context, then it sets a flag in the
  task 'ebpf_speculation_disabled' and sets TIF_RDS, which means that on
  migration the normal switch_to() logic will take care of it. Obviously we
  need a per task storage for the prctl selected state. I already did this
  for the force disable thing. If EBPF reenables then it uses the per task
  prctl state.

  If EBPF runs in soft or hardirq context then it can use the per cpu
  refcount. The above rules apply.

So now let's talk about the toggle timer thing.

  We have one central place where the MSR is written to and we already only
  write it when the control state changes between two contexts.

  So on every toggle, you increment a per cpu counter and you have a timer
  which polls that counter periodically and if the toggle count is over a
  threshold then it sets a per cpu flag which prevents MSR enable writes. A
  speculation disable write must always succeed, but that's at maximum one
  for the observation period. The toggle counter is still updated so the
  timer can check whether the wave of toggles has subsided or not. If yes,
  it lifts the MSR write restriction, if not it stays.
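
A minimal sketch of that throttle, with invented names and an arbitrary
threshold, just to make the scheme concrete:

/* Both variables are per CPU and only ever touched on the local CPU. */
static DEFINE_PER_CPU(unsigned int, spec_ctrl_toggles);
static DEFINE_PER_CPU(bool, spec_ctrl_enable_blocked);

/* Called from the one central SPEC_CTRL write path on every toggle. */
static bool spec_ctrl_write_allowed(bool enable_speculation)
{
	this_cpu_inc(spec_ctrl_toggles);

	/* A speculation *disable* write must always go through. */
	if (!enable_speculation)
		return true;

	return !this_cpu_read(spec_ctrl_enable_blocked);
}

/* Periodic per-CPU check (timer or tick); 100 is an arbitrary threshold. */
static void spec_ctrl_poll_toggles(void)
{
	unsigned int toggles = this_cpu_xchg(spec_ctrl_toggles, 0);

	this_cpu_write(spec_ctrl_enable_blocked, toggles > 100);
}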

That just works and has the right separation levels and covers everything
from high speed context switches to high speed ebpf invocations in every
nested or non nested context. And it keeps everything which is in regular
task context tied to the task, and therefore preemption and migration are
nothing special. No notifiers, no timer migration, no preempt disable
assumptions, nothing.

It's really that simple if you do a proper analysis before trying to solve
it by duct taping things together which are fundamentally separate.

Thanks,

	tglx


* [MODERATED] Re: [PATCH 0/5] SSB extra 0
  2018-05-04  1:37   ` Dave Hansen
@ 2018-05-04 22:26     ` Kees Cook
  2018-05-23  7:17       ` [MODERATED] cBPF affectedness (was Re: [PATCH 0/5] SSB extra 0) Jiri Kosina
  0 siblings, 1 reply; 31+ messages in thread
From: Kees Cook @ 2018-05-04 22:26 UTC (permalink / raw)
  To: speck

On Thu, May 03, 2018 at 06:37:46PM -0700, speck for Dave Hansen wrote:
> On 05/03/2018 04:27 PM, speck for Kees Cook wrote:> On Thu, May 03, 2018
> at 03:29:43PM -0700, speck for Dave Hansen wrote:
> >> BPF is a potential source of gadgets that can be used for memory 
> >> diambiguation-based attacks.  To help mitigate these, we enable the
> >> bit in SPEC_CTRL which enables the reduced (memory) speculation
> >> mode on the processor when runing BPF code.
> > 
> > Do you mean eBPF, or even cBPF? 
> 
> Right or wrong, my assumption is that you can build gadgets with any of
> the variants.  I haven't looked into detail to whether the classic VM
> and enhanced VM have the building blocks.
> [...]
> Were we just not concerned before because seccomp didn't use arrays?

My understanding from Jann Horn (who looked at the seccomp cases before)
was that eBPF maps were required for SpectreV1. I've asked him for
clarification on SSB, but I think the reasoning holds there too.

Specifically, my memory of the details is that since seccomp uses a
subset of even cBPF and can only use absolute indexing (not dynamic), cBPF
programs cannot be constructed that do array access outside of the seccomp
data buffer (which consists entirely of userspace information: pid,
syscall number, syscall arguments), not even from the 16-byte "scratch"
space. i.e. the seccomp cBPF verifier, seccomp_check_filter(), results
in no seccomp cBPF programs having the "untrusted_offset_from_caller"
indexing (to borrow the variable name from the Project Zero write-up on
SpectreV1). It also has no maps (so there cannot be data sharing between
multiple seccomp cBPF programs nor to the outside world). And finally,
the only "visible" output from seccomp is via syscall behaviors, which
would create too much noise for that exfiltration path.
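
For reference, this is the shape of filter seccomp accepts (standard
userspace seccomp usage, not code from this series): every load is an
absolute offset into struct seccomp_data, with no maps and no dynamically
indexed memory.

#include <stddef.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Allow getpid(), kill the task for everything else.  A real filter
 * would also validate seccomp_data->arch first. */
static struct sock_filter filter[] = {
	/* Absolute load of the syscall number; no dynamic index. */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
		 offsetof(struct seccomp_data, nr)),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};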

Note, though, that I am not an expert in eBPF nor writing speculation
exploits. Which gets us to the next question...

> [...]
> Either way, I don't think we have the BPF experts on this list, yet.
> I'm a BPF newbie for sure.

What's the plan for involving them?

Whatever the case, I think we need to find a way to exempt seccomp cBPF
from this...

-Kees

-- 
Kees Cook                                            @outflux.net


* Re: [PATCH 3/5] SSB extra 1
  2018-05-04 18:39                 ` Thomas Gleixner
@ 2018-05-06  8:32                   ` Thomas Gleixner
  2018-05-06 21:48                     ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2018-05-06  8:32 UTC (permalink / raw)
  To: speck

On Fri, 4 May 2018, speck for Thomas Gleixner wrote:
>   Now let's look at EBPF. EBPF is also a nesting context.
> 
>   So, if EBPF runs in preemptible task context, then it sets a flag in the
>   task 'ebpf_speculation_disabled' and sets TIF_RDS, which means that on
>   migration the normal switch_to() logic will take care of it. Obviously we
>   need a per task storage for the prctl selected state. I already did this
>   for the force disable thing. If EBPF reenables then it uses the per task
>   prctl state.
> 
>   If EBPF runs in soft or hardirq context then it can use the per cpu
>   refcount. The above rules apply.

Following up the discussion on IRC:

 - BPF runs also in NMI context, but that's not a real problem

 - BPF programs can 'call' other BPF programs

   There are two forms: Regular calls which return to the caller and tail
   calls. Tail calls make the speculation control more complicated because
   the resulting disable/enable invocations become imbalanced.

   So the counter does not work, but we can make BPF speculation control
   to use a simple boolean flag which denotes that speculation has been
   disabled.

   The JIT or interpreter injects only the speculation_disable() call into
   the BPF programs which are not trusted, but no speculation_enable() call.

   speculation_enable() is called unconditionally at the BPF invocation
   wrapper after the BPF execution terminates.

   speculation_disable() notes the disabling per context (thread, softirq,
   hardirq, nmi) and speculation_enable() checks the disabled bit for the
   context and acts accordingly.

   That means for trusted programs which do not disable speculation
   speculation_enable() is a simple bit check and return.

   If in the call chain of a BPF program speculation is disabled then it
   stays disabled until the program returns.
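
In (deliberately simplified) code, with invented names and the interaction
with the outer context / TIF_RDS left out:

/* One "disabled" flag per nesting level on this CPU: 0=task, 1=softirq,
 * 2=hardirq, 3=nmi.  bpf_exec_context() is a hypothetical helper that
 * derives the level from preempt_count()/in_nmi(). */
static DEFINE_PER_CPU(bool, bpf_spec_disabled[4]);

/* Injected by the JIT/interpreter into untrusted programs only. */
void bpf_speculation_disable(void)
{
	int ctx = bpf_exec_context();

	if (!this_cpu_read(bpf_spec_disabled[ctx])) {
		this_cpu_write(bpf_spec_disabled[ctx], true);
		cpu_enter_reduced_memory_speculation();
	}
}

/* Called unconditionally by the invocation wrapper once the program,
 * including any tail calls, has terminated.  Real code would also honour
 * the task's prctl/TIF_RDS state before re-enabling. */
void bpf_speculation_enable(void)
{
	int ctx = bpf_exec_context();

	if (this_cpu_read(bpf_spec_disabled[ctx])) {
		this_cpu_write(bpf_spec_disabled[ctx], false);
		cpu_leave_reduced_memory_speculation();
	}
}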

Thoughts?

	tglx



   


* Re: [PATCH 3/5] SSB extra 1
  2018-05-06  8:32                   ` Thomas Gleixner
@ 2018-05-06 21:48                     ` Thomas Gleixner
  2018-05-06 22:40                       ` [MODERATED] " Dave Hansen
  0 siblings, 1 reply; 31+ messages in thread
From: Thomas Gleixner @ 2018-05-06 21:48 UTC (permalink / raw)
  To: speck

On Sun, 6 May 2018, speck for Thomas Gleixner wrote:
> On Fri, 4 May 2018, speck for Thomas Gleixner wrote:
> >   Now let's look at EBPF. EBPF is also a nesting context.
> > 
> >   So, if EBPF runs in preemptible task context, then it sets a flag in the
> >   task 'ebpf_speculation_disabled' and sets TIF_RDS, which means that on
> >   migration the normal switch_to() logic will take care of it. Obviously we
> >   need a per task storage for the prctl selected state. I already did this
> >   for the force disable thing. If EBPF reenables then it uses the per task
> >   prctl state.
> > 
> >   If EBPF runs in soft or hardirq context then it can use the per cpu
> >   refcount. The above rules apply.
> 
> Following up the discussion on IRC:
> 
>  - BPF runs also in NMI context, but that's not a real problem
> 
>  - BPF programs can 'call' other BPF programs
> 
>    There are two forms: Regular calls which return to the caller and tail
>    calls. Tail calls make the speculation control more complicated because
>    the resulting disable/enable invocations become imbalanced.
> 
>    So the counter does not work, but we can make BPF speculation control
>    to use a simple boolean flag which denotes that speculation has been
>    disabled.
> 
>    The JIT or interpreter injects only the speculation_disable() call into
>    the BPF programs which are not trusted, but no speculation_enable() call.
> 
>    speculation_enable() is called unconditionally at the BPF invocation
>    wrapper after the BPF execution terminates.
> 
>    speculation_disable() notes the disabling per context (thread, softirq,
>    hardirq, nmi) and speculation_enable() checks the disabled bit for the
>    context and acts accordingly.
> 
>    That means for trusted programs which do not disable speculation
>    speculation_enable() is a simple bit check and return.
> 
>    If in the call chain of a BPF program speculation is disabled then it
>    stays disabled until the program returns.

I looked deeper into that and it's just ugly.

So here are a few more questions from a BPF oblivious person:

  1) I read about trusted and untrusted BPF programs. How is that
     determined/defined?

  2) What's the context in which untrusted BPF can run?

     There is perf context. For untrusted programs that hopefully means
     they run in the context of a user space application which is monitored
     with perf. Of course the BPF filter attached to perf runs in NMI
     context, but it should only have access to user space data which
     belongs to the monitored process.

     I read about socket filters, but did not find the context. If they
     run in softirq delivery context then the whole concept would be
     broken, so I assume it's in the syscall context of the application.

     So if untrusted code is generally confined to the context and data of
     the process which sets up the BPF, then we could simply force disable
     SSB when the BPF is set up and be done with it. That would simplify
     the whole thing massively.

Though that might be just me being naive. Any insight?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [MODERATED] Re: [PATCH 3/5] SSB extra 1
  2018-05-06 21:48                     ` Thomas Gleixner
@ 2018-05-06 22:40                       ` Dave Hansen
  2018-05-07  6:19                         ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Dave Hansen @ 2018-05-06 22:40 UTC (permalink / raw)
  To: speck

On 05/06/2018 02:48 PM, speck for Thomas Gleixner wrote:
>   1) I read about trusted and untrusted BPF programs. How is that
>      determined/defined?

There's a rather steep learning curve, I fear.

There's bpf_map->unpriv_array which tells whether or not you have to
deploy Spectre V1 mitigations.  That's just derived from CAP_SYS_ADMIN
when the map is allocated.

Separately, I've seen some hard-coded checks against BPF_PROG_TYPE_*,
like may_access_direct_pkt_data().

So I think the trust ends up being anchored in the interfaces that
allocated the program (or map?) to begin with, but it's somewhat hard to
follow.
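
If it helps, the CAP_SYS_ADMIN part boils down to roughly this at map
allocation time (paraphrased from memory, not a verbatim quote of the
allocation path, and the helper name is made up):

	/* Sketch only: decided once, when the map is created. */
	static bool bpf_map_created_unpriv(void)
	{
		/*
		 * Maps set up without CAP_SYS_ADMIN are flagged as
		 * unprivileged; the verifier/JIT later consult that flag
		 * (unpriv_array) to decide whether the Spectre V1 index
		 * masking must be emitted.
		 */
		return !capable(CAP_SYS_ADMIN);
	}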

Is that what you meant?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 3/5] SSB extra 1
  2018-05-06 22:40                       ` [MODERATED] " Dave Hansen
@ 2018-05-07  6:19                         ` Thomas Gleixner
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2018-05-07  6:19 UTC (permalink / raw)
  To: speck

On Sun, 6 May 2018, speck for Dave Hansen wrote:
> On 05/06/2018 02:48 PM, speck for Thomas Gleixner wrote:
> >   1) I read about trusted and untrusted BPF programs. How is that
> >      determined/defined?
> 
> There's a rather steep learning curve, I fear.
> 
> There's bpf_map->unpriv_array which tells whether or not you have to
> deploy Spectre V1 mitigations.  That's just derived from CAP_SYS_ADMIN
> when the map is allocated.
> 
> Separately, I've seen some hard-coded checks against BPF_PROG_TYPE_*,
> like may_access_direct_pkt_data().
> 
> So I think the trust ends up being anchored in the interfaces that
> allocated the program (or map?) to begin with, but it's somewhat hard to
> follow.
> 
> Is that what you meant?

Kinda. So the question is what is the definition of trusted vs. untrusted,
if there is any.

I dug deeper into the code and the user supplied socket filters run in
SOFTIRQ_RX context, so the question whether this can be restricted to the
context of the program which installed the filter is answered: we
can't...

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [MODERATED] Re: [PATCH 5/5] SSB extra 4
  2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
                   ` (8 preceding siblings ...)
  2018-05-04 17:01 ` [MODERATED] Re: [PATCH 4/5] SSB extra 5 Konrad Rzeszutek Wilk
@ 2018-05-21  9:56 ` Jiri Kosina
  2018-05-21 13:38   ` Thomas Gleixner
  9 siblings, 1 reply; 31+ messages in thread
From: Jiri Kosina @ 2018-05-21  9:56 UTC (permalink / raw)
  To: speck

On Thu, 3 May 2018, speck for Dave Hansen wrote:

> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 0e17379..947ae8a 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -11,6 +11,7 @@
>  #include <linux/init.h>
>  #include <linux/utsname.h>
>  #include <linux/cpu.h>
> +#include <linux/filter.h>
>  #include <linux/module.h>
>  #include <linux/nospec.h>
>  #include <linux/prctl.h>
> @@ -148,10 +149,23 @@ EXPORT_SYMBOL_GPL(x86_get_default_spec_ctrl);
>  
>  static inline u64 intel_rds_mask(void)
>  {
> +	u64 mask;
> +
>  	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
>  		return 0;
>  
> -	return rds_tif_to_spec_ctrl(current_thread_info()->flags);
> +	mask = rds_tif_to_spec_ctrl(current_thread_info()->flags);
> +
> +	/*
> +	 * BPF programs can be exploited to attack the kernel.
> +	 * Leave the RDS bit on when we recently ran one.  This
> +	 * bit gets cleared after a BPF program has not run on
> +	 * the CPU for a while.
> +	 */
> +	if (get_cpu_var(bpf_prog_ran))
> +		mask |= SPEC_CTRL_RDS;
> +
> +	return mask;

I know that a different approach is being taken for BPF mitigation, but in 
case anyone is basing their first wave of updates on this ...

This doesn't look right. intel_rds_mask() is being called whenever 
X86_FEATURE_IBRS is set && vendor is X86_VENDOR_INTEL.

There is no check whatsoever whether X86_FEATURE_RDS is set, and therefore 
on a ucode / kernel combination that has X86_FEATURE_IBRS but doesn't have 
X86_FEATURE_RDS, the RDS bit gets set in the MSR value anyway.
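
In case anyone does pick this up, the minimal fix would presumably be an
early feature check along these lines (untested; only the RDS check is
added, everything else is as in the patch above):

	static inline u64 intel_rds_mask(void)
	{
		u64 mask;

		if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
			return 0;

		/* No RDS support in the microcode -> never set the bit. */
		if (!boot_cpu_has(X86_FEATURE_RDS))
			return 0;

		mask = rds_tif_to_spec_ctrl(current_thread_info()->flags);

		if (get_cpu_var(bpf_prog_ran))
			mask |= SPEC_CTRL_RDS;

		return mask;
	}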

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/5] SSB extra 4
  2018-05-21  9:56 ` [MODERATED] Re: [PATCH 5/5] SSB extra 4 Jiri Kosina
@ 2018-05-21 13:38   ` Thomas Gleixner
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2018-05-21 13:38 UTC (permalink / raw)
  To: speck

On Mon, 21 May 2018, speck for Jiri Kosina wrote:
> On Thu, 3 May 2018, speck for Dave Hansen wrote:
> 
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index 0e17379..947ae8a 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -11,6 +11,7 @@
> >  #include <linux/init.h>
> >  #include <linux/utsname.h>
> >  #include <linux/cpu.h>
> > +#include <linux/filter.h>
> >  #include <linux/module.h>
> >  #include <linux/nospec.h>
> >  #include <linux/prctl.h>
> > @@ -148,10 +149,23 @@ EXPORT_SYMBOL_GPL(x86_get_default_spec_ctrl);
> >  
> >  static inline u64 intel_rds_mask(void)
> >  {
> > +	u64 mask;
> > +
> >  	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> >  		return 0;
> >  
> > -	return rds_tif_to_spec_ctrl(current_thread_info()->flags);
> > +	mask = rds_tif_to_spec_ctrl(current_thread_info()->flags);
> > +
> > +	/*
> > +	 * BPF programs can be exploited to attack the kernel.
> > +	 * Leave the RDS bit on when we recently ran one.  This
> > +	 * bit gets cleared after a BPF program has not run on
> > +	 * the CPU for a while.
> > +	 */
> > +	if (get_cpu_var(bpf_prog_ran))
> > +		mask |= SPEC_CTRL_RDS;
> > +
> > +	return mask;
> 
> I know that a different approach is being taken for BPF mitigation, but in 
> case anyone is basing their first wave of updates on this ...

Please don't. It's not working in several ways.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [MODERATED] cBPF affectedness (was Re: [PATCH 0/5] SSB extra 0)
  2018-05-04 22:26     ` Kees Cook
@ 2018-05-23  7:17       ` Jiri Kosina
  2018-05-23 13:56         ` [MODERATED] " Alexei Starovoitov
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Kosina @ 2018-05-23  7:17 UTC (permalink / raw)
  To: speck

On Fri, 4 May 2018, speck for Kees Cook wrote:

> > >> BPF is a potential source of gadgets that can be used for memory
> > >> diambiguation-based attacks.  To help mitigate these, we enable the
> > >> bit in SPEC_CTRL which enables the reduced (memory) speculation
> > >> mode on the processor when runing BPF code.
> > >
> > > Do you mean eBPF, or even cBPF?
> > >
> > Right or wrong, my assumption is that you can build gadgets with any of
> > the variants.  I haven't looked into detail to whether the classic VM
> > and enhanced VM have the building blocks.
> > [...]
> > Were we just not concerned before because seccomp didn't use arrays?
> 
> My understanding from Jann Horn (who looked at the seccomp cases before)
> was that eBPF maps were required for SpectreV1. I've asked him for
> clarification on SSB, but I think the reasoning holds there too.
> 
> > Specifically, my memory of the details is that since seccomp uses a
> subset of even cBPF and can only use absolute indexing (not dynamic), cBPF
> programs cannot be constructed that do array access outside of the seccomp
> data buffer (which consists entirely of userspace information: pid,
> syscall number, syscall arguments), not even from the 16-byte "scratch"
> space. i.e. the seccomp cBPF verifier, seccomp_check_filter(), results
> in no seccomp cBPF programs having the "untrusted_offset_from_caller"
> indexing (to borrow the variable name from the Project Zero write-up on
> SpectreV1). It also has no maps (so there cannot be data sharing between
> multiple seccomp cBPF programs nor to the outside world). And finally,
> the only "visible" output from seccomp is via syscall behaviors, which
> would create too much noise for that exfiltration path.
> 
> Note, though, that I am not an expert in eBPF nor writing speculation
> exploits. Which gets us to the next question...

Now that Alexei is on this list, let me bring the question of 
(seccomp-)cBPF up again.

Specifically I am of course asking to see whether we need any kind of 
mitigation in pre-eBPF (and also pre-1be7f75d1668 eBPF) kernels.

Thanks.

> > [...]
> > Either way, I don't think we have the BPF experts on this list, yet.
> > I'm a BPF newbie for sure.
> 
> What's the plan for involving them?
> 
> Whatever the case, I think we need to find a way to exempt seccomp cBPF
> from this...

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [MODERATED] Re: cBPF affectedness (was Re: [PATCH 0/5] SSB extra 0)
  2018-05-23  7:17       ` [MODERATED] cBPF affectedness (was Re: [PATCH 0/5] SSB extra 0) Jiri Kosina
@ 2018-05-23 13:56         ` Alexei Starovoitov
  0 siblings, 0 replies; 31+ messages in thread
From: Alexei Starovoitov @ 2018-05-23 13:56 UTC (permalink / raw)
  To: speck

On Wed, May 23, 2018 at 09:17:06AM +0200, speck for Jiri Kosina wrote:
> On Fri, 4 May 2018, speck for Kees Cook wrote:
> 
> > > >> BPF is a potential source of gadgets that can be used for memory
> > > >> diambiguation-based attacks.  To help mitigate these, we enable the
> > > >> bit in SPEC_CTRL which enables the reduced (memory) speculation
> > > >> mode on the processor when runing BPF code.
> > > >
> > > > Do you mean eBPF, or even cBPF?
> > > >
> > > Right or wrong, my assumption is that you can build gadgets with any of
> > > the variants.  I haven't looked into detail to whether the classic VM
> > > and enhanced VM have the building blocks.
> > > [...]
> > > Were we just not concerned before because seccomp didn't use arrays?
> > 
> > My understanding from Jann Horn (who looked at the seccomp cases before)
> > was that eBPF maps were required for SpectreV1. I've asked him for
> > clarification on SSB, but I think the reasoning holds there too.
> > 
> > Specifically, my memory of the details is that since seccomp uses a
> > subset of even cBPF and can only use absolute indexing (not dynamic), cBPF
> > programs cannot be constructed that do array access outside of the seccomp
> > data buffer (which consists entirely of userspace information: pid,
> > syscall number, syscall arguments), not even from the 16-byte "scratch"
> > space. i.e. the seccomp cBPF verifier, seccomp_check_filter(), results
> > in no seccomp cBPF programs having the "untrusted_offset_from_caller"
> > indexing (to borrow the variable name from the Project Zero write-up on
> > SpectreV1). It also has no maps (so there cannot be data sharing between
> > multiple seccomp cBPF programs nor to the outside world). And finally,
> > the only "visible" output from seccomp is via syscall behaviors, which
> > would create too much noise for that exfiltration path.
> > 
> > Note, though, that I am not an expert in eBPF nor writing speculation
> > exploits. Which gets us to the next question...
> 
> Now that Alexei is on this list, let me bring the question of 
> (seccomp-)cBPF up again.
> 
> Specifically I am of course asking to see whether we need any kind of 
> mitigation in pre-eBPF (and also pre-1be7f75d1668 eBPF) kernels.

You mean whether an SSB exploit can be made to work with cBPF?
I don't think so. Classic BPF doesn't have a concept of pointers.
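
To make that concrete for the non-BPF folks here: a seccomp cBPF filter
can only do absolute loads from the fixed-layout struct seccomp_data and
return a verdict. A minimal user space example (illustrative and untested;
a real filter would also check seccomp_data->arch):

	#include <stddef.h>
	#include <sys/syscall.h>
	#include <linux/filter.h>
	#include <linux/seccomp.h>

	/* Allow getpid(), kill everything else. */
	static struct sock_filter filter[] = {
		/* A = seccomp_data->nr (absolute, compile-time offset) */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* if (A == __NR_getpid) allow, else kill */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
	};

	/*
	 * Installed via prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)
	 * after PR_SET_NO_NEW_PRIVS.  There is no way to express a
	 * computed, attacker controlled memory access in this language.
	 */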

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2018-05-23 13:56 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-03 22:29 [MODERATED] [PATCH 0/5] SSB extra 0 Dave Hansen
2018-05-03 22:29 ` [MODERATED] [PATCH 1/5] SSB extra 2 Dave Hansen
2018-05-03 22:29 ` [MODERATED] [PATCH 2/5] SSB extra 3 Dave Hansen
2018-05-03 22:29 ` [MODERATED] [PATCH 3/5] SSB extra 1 Dave Hansen
2018-05-03 22:29 ` [MODERATED] [PATCH 4/5] SSB extra 5 Dave Hansen
2018-05-03 22:29 ` [MODERATED] [PATCH 5/5] SSB extra 4 Dave Hansen
2018-05-03 23:27 ` [MODERATED] Re: [PATCH 0/5] SSB extra 0 Kees Cook
2018-05-04  1:37   ` Dave Hansen
2018-05-04 22:26     ` Kees Cook
2018-05-23  7:17       ` [MODERATED] cBPF affectedness (was Re: [PATCH 0/5] SSB extra 0) Jiri Kosina
2018-05-23 13:56         ` [MODERATED] " Alexei Starovoitov
2018-05-04  9:20 ` [MODERATED] Re: [PATCH 1/5] SSB extra 2 Peter Zijlstra
2018-05-04 14:04   ` Dave Hansen
2018-05-04 15:50     ` Peter Zijlstra
2018-05-04 15:54       ` Linus Torvalds
2018-05-04 13:33 ` [PATCH 3/5] SSB extra 1 Thomas Gleixner
2018-05-04 14:22   ` [MODERATED] " Dave Hansen
2018-05-04 14:26     ` Thomas Gleixner
2018-05-04 16:04       ` [MODERATED] " Andi Kleen
2018-05-04 16:09         ` Thomas Gleixner
2018-05-04 16:28           ` [MODERATED] " Andi Kleen
2018-05-04 16:32             ` Thomas Gleixner
2018-05-04 16:43               ` [MODERATED] " Dave Hansen
2018-05-04 18:39                 ` Thomas Gleixner
2018-05-06  8:32                   ` Thomas Gleixner
2018-05-06 21:48                     ` Thomas Gleixner
2018-05-06 22:40                       ` [MODERATED] " Dave Hansen
2018-05-07  6:19                         ` Thomas Gleixner
2018-05-04 17:01 ` [MODERATED] Re: [PATCH 4/5] SSB extra 5 Konrad Rzeszutek Wilk
2018-05-21  9:56 ` [MODERATED] Re: [PATCH 5/5] SSB extra 4 Jiri Kosina
2018-05-21 13:38   ` Thomas Gleixner
