All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
@ 2018-08-31 20:56 Jiri Kosina
  2018-09-03  8:51 ` Jiri Kosina
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-08-31 20:56 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David
  Cc: linux-kernel, x86

From: Jiri Kosina <jkosina@suse.cz>

STIBP is a feature provided by certain Intel ucodes / CPUs. This feature 
(once enabled) prevents cross-hyperthread control of decisions made by 
indirect branch predictors.

Enable this feature if

- the CPU is vulnerable to spectre v2
- the CPU supports SMT
- spectre_v2 mitigation autoselection is enabled (default)

After some previous discussion, this patch leaves STIBP on all the time, 
as wrmsr on crossing kernel boundary is a no-no. This could perhaps later 
be a bit more optimized (like disabling it in NOHZ, experiment with 
disabling it in idle, etc) if needed.

Cc: stable@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

Let's add the most basic STIBP support, as it has been kind of lost in all 
the previous noise.

 arch/x86/kernel/cpu/bugs.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 4c2313d0b9ca..02edf8a6ced7 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -325,6 +325,12 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 	return cmd;
 }
 
+static bool __init stibp_needed(void)
+{
+	return (cpu_smt_control != CPU_SMT_NOT_SUPPORTED &&
+			boot_cpu_has(X86_FEATURE_STIBP));
+}
+
 static void __init spectre_v2_select_mitigation(void)
 {
 	enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -344,6 +350,12 @@ static void __init spectre_v2_select_mitigation(void)
 
 	case SPECTRE_V2_CMD_FORCE:
 	case SPECTRE_V2_CMD_AUTO:
+		if (stibp_needed()) {
+			/* Enable STIBP on SMT-capable systems */
+			pr_info("Spectre v2 cross-process SMT mitigation: Enabling STIBP\n");
+			x86_spec_ctrl_base |= SPEC_CTRL_STIBP;
+			wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
+		}
 		if (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED)) {
 			mode = SPECTRE_V2_IBRS_ENHANCED;
 			/* Force it so VMEXIT will restore correctly */
-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
  2018-08-31 20:56 [PATCH] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation Jiri Kosina
@ 2018-09-03  8:51 ` Jiri Kosina
  2018-09-03 12:45 ` [PATCH v2 0/3] Harden spectrev2 userspace-userspace protection Jiri Kosina
  2018-09-04 14:23 ` [PATCH v3 " Jiri Kosina
  2 siblings, 0 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-03  8:51 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David
  Cc: linux-kernel, x86

On Fri, 31 Aug 2018, Jiri Kosina wrote:

> From: Jiri Kosina <jkosina@suse.cz>
> 
> STIBP is a feature provided by certain Intel ucodes / CPUs. This feature 
> (once enabled) prevents cross-hyperthread control of decisions made by 
> indirect branch predictors.
> 
> Enable this feature if
> 
> - the CPU is vulnerable to spectre v2
> - the CPU supports SMT
> - spectre_v2 mitigation autoselection is enabled (default)
> 
> After some previous discussion, this patch leaves STIBP on all the time, 
> as wrmsr on crossing kernel boundary is a no-no. This could perhaps later 
> be a bit more optimized (like disabling it in NOHZ, experiment with 
> disabling it in idle, etc) if needed.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
> 
> Let's add the most basic STIBP support, as it has been kind of lost in all 
> the previous noise.

After some discussions with Peter, this actually makes a little sense with 
the IBPB implementation we currently have upstream, as that's basically 
never used (I thought upstream had the same what distros had -- IBPB 
issued in cases where tasks can't ptrace each other, but that apparently 
got ditched in the process for some reason).

If the argument was that this is too expensive performance-wise, well, 
then there is nospectre_v2 for those who are fine with that.

Given the fact that the attack is real, I think we should default to 
STIBP+IBPB in the non-ptrace-capable case. Some distros (SUSE for example) 
do issue the IBPB in such a way.

I'll submit v2 with that later today.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 0/3] Harden spectrev2 userspace-userspace protection
  2018-08-31 20:56 [PATCH] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation Jiri Kosina
  2018-09-03  8:51 ` Jiri Kosina
@ 2018-09-03 12:45 ` Jiri Kosina
  2018-09-04 14:23 ` [PATCH v3 " Jiri Kosina
  2 siblings, 0 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-03 12:45 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David
  Cc: linux-kernel, x86, Oleg Nesterov, Tim Chen


[ added a few more CCs for v2 ]

Currently, linux kernel is basically not preventing userspace-userspace 
spectrev2 attack, because:

- IBPB is basically unused (issued only for tasks that marked themselves 
  explicitly non-dumpable, which is absolutely negligible minority of all 
  software out there), therefore cross-process branch buffer posioning 
  using spectrev2 is possible

- STIBP is completely unused, therefore cross-process branch buffer 
  poisoning using spectrev2 between processess running on two HT siblings 
  thread s is possible

This patchset changes IBPB semantics, so that it's now applied whenever 
context-switching between processess that can't use ptrace() to achieve 
the same. This admittedly comes with extra overhad on a context switch; 
systems that don't care about could disable the mitigation using 
nospectre_v2 boot option.
The IBPB implementaion is heavily based on original patches by Tim Chen.

In addition to that, we unconditionally turn STIBP on so that HT siblings 
always have separate branch buffers.

We've been carrying IBPB implementation with the same semantics in our 
(SUSE) trees since january disclosure; STIBP was more or less ignored up 
to today.

Jiri Kosina (3):
      ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
      x86/speculation: Apply IBPB more strictly to avoid cross-process spectre v2 leak
      x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation

 arch/x86/kernel/cpu/bugs.c | 12 ++++++++++++
 arch/x86/mm/tlb.c          | 15 ++++++---------
 include/linux/ptrace.h     | 15 +++++++++++++++
 kernel/ptrace.c            | 17 +++++++++++++----
 4 files changed, 46 insertions(+), 13 deletions(-)

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 0/3] Harden spectrev2 userspace-userspace protection
  2018-08-31 20:56 [PATCH] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation Jiri Kosina
  2018-09-03  8:51 ` Jiri Kosina
  2018-09-03 12:45 ` [PATCH v2 0/3] Harden spectrev2 userspace-userspace protection Jiri Kosina
@ 2018-09-04 14:23 ` Jiri Kosina
  2018-09-04 14:40   ` [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks Jiri Kosina
                     ` (2 more replies)
  2 siblings, 3 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-04 14:23 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov, Tim Chen
  Cc: linux-kernel, x86

Currently, linux kernel is basically not preventing userspace-userspace 
spectrev2 attack, because:

- IBPB is basically unused (issued only for tasks that marked themselves 
  explicitly non-dumpable, which is absolutely negligible minority of all 
  software out there), therefore cross-process branch buffer posioning 
  using spectrev2 is possible

- STIBP is completely unused, therefore cross-process branch buffer 
  poisoning using spectrev2 between processess running on two HT siblings 
  thread s is possible

This patchset changes IBPB semantics, so that it's now applied whenever 
context-switching between processess that can't use ptrace() to achieve 
the same. This admittedly comes with extra overhad on a context switch; 
systems that don't care about could disable the mitigation using 
nospectre_v2 boot option.
The IBPB implementaion is heavily based on original patches by Tim Chen.

In addition to that, we unconditionally turn STIBP on so that HT siblings 
always have separate branch buffers.

We've been carrying IBPB implementation with the same semantics in our 
(SUSE) trees since january disclosure; STIBP was more or less ignored up 
to today.

v1->v2:
	include IBPB changes
v2->v3: 
	fix IBPB 'who can trace who' semantics
	wire up STIBP flipping to SMT hotplug

Jiri Kosina (3):
      ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
      x86/speculation: Apply IBPB more strictly to avoid cross-process spectre v2 leak
      x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation

 arch/x86/kernel/cpu/bugs.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/tlb.c          | 15 ++++++---------
 include/linux/ptrace.h     | 15 +++++++++++++++
 kernel/cpu.c               | 13 ++++++++++++-
 kernel/ptrace.c            | 17 +++++++++++++----
 5 files changed, 109 insertions(+), 14 deletions(-)

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 14:23 ` [PATCH v3 " Jiri Kosina
@ 2018-09-04 14:40   ` Jiri Kosina
  2018-09-04 16:13     ` Thomas Gleixner
                       ` (2 more replies)
  2018-09-04 14:42   ` [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak Jiri Kosina
  2018-09-04 14:42   ` [PATCH v3 3/3] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation Jiri Kosina
  2 siblings, 3 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-04 14:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov, Tim Chen
  Cc: linux-kernel, x86

From: Jiri Kosina <jkosina@suse.cz>

Current ptrace_may_access() implementation assumes that the 'source' task is
always the caller (current).

Expose ___ptrace_may_access() that can be used to apply the check on arbitrary
tasks.

Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

Sorry for the resend, my pine is buggered and broke threading.

 include/linux/ptrace.h | 12 ++++++++++++
 kernel/ptrace.c        | 13 ++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4f36431c380b..09744d4113fb 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -87,6 +87,18 @@ extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead);
  */
 extern bool ptrace_may_access(struct task_struct *task, unsigned int mode);
 
+/**
+ * ___ptrace_may_access - variant of ptrace_may_access that can be used
+ * between two arbitrary tasks
+ * @curr: source task
+ * @task: target task
+ * @mode: selects type of access and caller credentials
+ *
+ * Returns true on success, false on denial.
+ */
+extern int ___ptrace_may_access(struct task_struct *curr, struct task_struct *task,
+				unsigned int mode);
+
 static inline int ptrace_reparented(struct task_struct *child)
 {
 	return !same_thread_group(child->real_parent, child->parent);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 21fec73d45d4..07ff6685ebed 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -268,9 +268,10 @@ static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
 }
 
 /* Returns 0 on success, -errno on denial. */
-static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
+int ___ptrace_may_access(struct task_struct *curr, struct task_struct *task,
+		unsigned int mode)
 {
-	const struct cred *cred = current_cred(), *tcred;
+	const struct cred *cred, *tcred;
 	struct mm_struct *mm;
 	kuid_t caller_uid;
 	kgid_t caller_gid;
@@ -290,9 +291,10 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 	 */
 
 	/* Don't let security modules deny introspection */
-	if (same_thread_group(task, current))
+	if (same_thread_group(task, curr))
 		return 0;
 	rcu_read_lock();
+	cred =  __task_cred(curr);
 	if (mode & PTRACE_MODE_FSCREDS) {
 		caller_uid = cred->fsuid;
 		caller_gid = cred->fsgid;
@@ -331,6 +333,11 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 	return security_ptrace_access_check(task, mode);
 }
 
+static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
+{
+	return ___ptrace_may_access(current, task, mode);
+}
+
 bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 {
 	int err;

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak
  2018-09-04 14:23 ` [PATCH v3 " Jiri Kosina
  2018-09-04 14:40   ` [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks Jiri Kosina
@ 2018-09-04 14:42   ` Jiri Kosina
  2018-09-04 16:18     ` Thomas Gleixner
  2018-09-05  7:52     ` Peter Zijlstra
  2018-09-04 14:42   ` [PATCH v3 3/3] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation Jiri Kosina
  2 siblings, 2 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-04 14:42 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov, Tim Chen
  Cc: linux-kernel, x86

From: Jiri Kosina <jkosina@suse.cz>

Currently, we are issuing IBPB only in cases when switching into a non-dumpable
process, the rationale being to protect such 'important and security sensitive'
processess (such as GPG) from data leak into a different userspace process via
spectre v2.

This is however completely insufficient to provide proper userspace-to-userpace
spectrev2 protection, as any process can poison branch buffers before being
scheduled out, and the newly scheduled process immediately becomes spectrev2
victim.

In order to minimize the performance impact (for usecases that do require
spectrev2 protection), issue the barrier only in cases when switching between
processess where the victim can't be ptraced by the potential attacker (as in
such cases, the attacker doesn't have to bother with branch buffers at all).

Fixes: 18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

Sorry for the resend, my pine is buggered and broke threading.

 arch/x86/mm/tlb.c      | 15 ++++++---------
 include/linux/ptrace.h |  3 +++
 kernel/ptrace.c        |  4 +++-
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e96b99eb800c..ce33e58b9327 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -7,6 +7,7 @@
 #include <linux/export.h>
 #include <linux/cpu.h>
 #include <linux/debugfs.h>
+#include <linux/ptrace.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -262,18 +263,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * one process from doing Spectre-v2 attacks on another.
 		 *
 		 * As an optimization, flush indirect branches only when
-		 * switching into processes that disable dumping. This
-		 * protects high value processes like gpg, without having
-		 * too high performance overhead. IBPB is *expensive*!
-		 *
-		 * This will not flush branches when switching into kernel
-		 * threads. It will also not flush if we switch to idle
-		 * thread and back to the same process. It will flush if we
-		 * switch to a different non-dumpable process.
+		 * switching into a processes that can't be ptrace by the
+		 * current one (as in such case, attacker has much more
+		 * convenient way how to tamper with the next process than
+		 * branch buffer poisoning).
 		 */
 		if (tsk && tsk->mm &&
 		    tsk->mm->context.ctx_id != last_ctx_id &&
-		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
+		    ___ptrace_may_access(current, tsk, PTRACE_MODE_IBPB))
 			indirect_branch_prediction_barrier();
 
 		if (IS_ENABLED(CONFIG_VMAP_STACK)) {
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 09744d4113fb..adab379b5456 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -64,12 +64,15 @@ extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead);
 #define PTRACE_MODE_NOAUDIT	0x04
 #define PTRACE_MODE_FSCREDS 0x08
 #define PTRACE_MODE_REALCREDS 0x10
+#define PTRACE_MODE_NOACCESS_CHK 0x20
 
 /* shorthands for READ/ATTACH and FSCREDS/REALCREDS combinations */
 #define PTRACE_MODE_READ_FSCREDS (PTRACE_MODE_READ | PTRACE_MODE_FSCREDS)
 #define PTRACE_MODE_READ_REALCREDS (PTRACE_MODE_READ | PTRACE_MODE_REALCREDS)
 #define PTRACE_MODE_ATTACH_FSCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_FSCREDS)
 #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
+#define PTRACE_MODE_IBPB (PTRACE_MODE_ATTACH | PTRACE_MODE_NOAUDIT \
+			  | PTRACE_MODE_NOACCESS_CHK | PTRACE_MODE_REALCREDS)
 
 /**
  * ptrace_may_access - check whether the caller is permitted to access
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 07ff6685ebed..b41c37f44c32 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -330,7 +330,9 @@ int ___ptrace_may_access(struct task_struct *curr, struct task_struct *task,
 	     !ptrace_has_cap(mm->user_ns, mode)))
 	    return -EPERM;
 
-	return security_ptrace_access_check(task, mode);
+	if (!(mode & PTRACE_MODE_NOACCESS_CHK))
+		return security_ptrace_access_check(task, mode);
+	return 0;
 }
 
 static int __ptrace_may_access(struct task_struct *task, unsigned int mode)


-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v3 3/3] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
  2018-09-04 14:23 ` [PATCH v3 " Jiri Kosina
  2018-09-04 14:40   ` [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks Jiri Kosina
  2018-09-04 14:42   ` [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak Jiri Kosina
@ 2018-09-04 14:42   ` Jiri Kosina
  2 siblings, 0 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-04 14:42 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov, Tim Chen
  Cc: linux-kernel, x86

From: Jiri Kosina <jkosina@suse.cz>

STIBP is a feature provided by certain Intel ucodes / CPUs. This feature 
(once enabled) prevents cross-hyperthread control of decisions made by 
indirect branch predictors.

Enable this feature if

- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)

After some previous discussion, this patch leaves STIBP on all the time, 
as wrmsr on crossing kernel boundary is a no-no. This could perhaps later 
be a bit more optimized (like disabling it in NOHZ, experiment with 
disabling it in idle, etc) if needed.

Note: the code could be made less awkward if it'd be guaranteed that STIBP 
could be kept on on a primary thread with SMT sibling being offline, 
without potentially imposing performance penalty. This doesn't seem to be 
defined anywhere though, so let's better be safe then sorry and always 
flip STIBP both on primary and sibling threads on hotplug transitions.

Cc: stable@vger.kernel.org
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---

Sorry for the resend, my pine is buggered and broke threading.

 arch/x86/kernel/cpu/bugs.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/cpu.c               | 13 +++++++++-
 2 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 40bdaea97fe7..ba3df0a49a2e 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -325,6 +325,56 @@ static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void)
 	return cmd;
 }
 
+static bool stibp_needed(void)
+{
+	if (spectre_v2_enabled == SPECTRE_V2_NONE)
+		return false;
+
+	if (cpu_smt_control != CPU_SMT_ENABLED)
+		return false;
+
+	if (!boot_cpu_has(X86_FEATURE_STIBP))
+		return false;
+
+	return true;
+}
+
+/*
+ * The read-modify-write of the MSR doesn't need any race protection here,
+ * as we're running in atomic context.
+ */
+static void enable_stibp(void *info)
+{
+	u64 mask;
+	rdmsrl(MSR_IA32_SPEC_CTRL, mask);
+	mask |= SPEC_CTRL_STIBP;
+	wrmsrl(MSR_IA32_SPEC_CTRL, mask);
+}
+
+static void disable_stibp(void *info)
+{
+	u64 mask;
+	rdmsrl(MSR_IA32_SPEC_CTRL, mask);
+	mask &= ~SPEC_CTRL_STIBP;
+	wrmsrl(MSR_IA32_SPEC_CTRL, mask);
+}
+
+void arch_smt_enable_errata(void)
+{
+	if (stibp_needed()) {
+		pr_info("Spectre v2 cross-process SMT mitigation: Enabling STIBP\n");
+		on_each_cpu(enable_stibp, NULL, 1);
+	}
+}
+
+void arch_smt_disable_errata(void)
+{
+	if (stibp_needed()) {
+		pr_info("Spectre v2 cross-process SMT mitigation: Disabling STIBP\n");
+		on_each_cpu(disable_stibp, NULL, 1);
+	}
+}
+
 static void __init spectre_v2_select_mitigation(void)
 {
 	enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline();
@@ -424,6 +474,9 @@ static void __init spectre_v2_select_mitigation(void)
 		setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW);
 		pr_info("Enabling Restricted Speculation for firmware calls\n");
 	}
+
+	/* Enable STIBP on BP if needed */
+	arch_smt_enable_errata();
 }
 
 #undef pr_fmt
@@ -655,6 +708,16 @@ void x86_spec_ctrl_setup_ap(void)
 
 	if (ssb_mode == SPEC_STORE_BYPASS_DISABLE)
 		x86_amd_ssb_disable();
+
+	/*
+	 * If we are here during system bootup, enable STIBP.
+	 *
+	 * If we are here because of SMT hotplug, STIBP will be enabled by the
+	 * SMT control code (enabling here would not be sufficient, as it
+	 * needs to happen on primary threads as well).
+	 */
+	if (stibp_needed() && system_state < SYSTEM_RUNNING)
+		enable_stibp(NULL);
 }
 
 #undef pr_fmt
diff --git a/kernel/cpu.c b/kernel/cpu.c
index aa7fe85ad62e..d3613d546829 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2025,17 +2025,27 @@ static void cpuhp_online_cpu_device(unsigned int cpu)
 	kobject_uevent(&dev->kobj, KOBJ_ONLINE);
 }
 
+/*
+ * Architectures that need SMT-specific errata handling during SMT hotplug
+ * should override these.
+ */
+void __weak arch_smt_enable_errata(void) { };
+void __weak arch_smt_disable_errata(void) { };
+
 static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
 {
 	int cpu, ret = 0;
 
 	cpu_maps_update_begin();
+	arch_smt_disable_errata();
 	for_each_online_cpu(cpu) {
 		if (topology_is_primary_thread(cpu))
 			continue;
 		ret = cpu_down_maps_locked(cpu, CPUHP_OFFLINE);
-		if (ret)
+		if (ret) {
+			arch_smt_enable_errata();
 			break;
+		}
 		/*
 		 * As this needs to hold the cpu maps lock it's impossible
 		 * to call device_offline() because that ends up calling
@@ -2073,6 +2083,7 @@ static int cpuhp_smt_enable(void)
 		/* See comment in cpuhp_smt_disable() */
 		cpuhp_online_cpu_device(cpu);
 	}
+	arch_smt_enable_errata();
 	cpu_maps_update_done();
 	return ret;
 }

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 14:40   ` [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks Jiri Kosina
@ 2018-09-04 16:13     ` Thomas Gleixner
  2018-09-04 16:21       ` Thomas Gleixner
  2018-09-04 17:26     ` Tim Chen
  2018-09-05  7:51     ` Peter Zijlstra
  2 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2018-09-04 16:13 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Andrea Arcangeli,
	Woodhouse, David, Oleg Nesterov, Tim Chen, linux-kernel, x86

On Tue, 4 Sep 2018, Jiri Kosina wrote:
> +/**
> + * ___ptrace_may_access - variant of ptrace_may_access that can be used
> + * between two arbitrary tasks
> + * @curr: source task

That's a pretty shitty name, really. If @curr is supposed to be an
arbitrary task, then please rename it in a way which makes that entirely
clear. curr is too close to current. Just grep for 'task_struct \*curr'.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak
  2018-09-04 14:42   ` [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak Jiri Kosina
@ 2018-09-04 16:18     ` Thomas Gleixner
  2018-09-05  7:59       ` Peter Zijlstra
  2018-09-05  7:52     ` Peter Zijlstra
  1 sibling, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2018-09-04 16:18 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Andrea Arcangeli,
	Woodhouse, David, Oleg Nesterov, Tim Chen, linux-kernel, x86

On Tue, 4 Sep 2018, Jiri Kosina wrote:
>  		if (tsk && tsk->mm &&
>  		    tsk->mm->context.ctx_id != last_ctx_id &&
> -		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
> +		    ___ptrace_may_access(current, tsk, PTRACE_MODE_IBPB))

Uurgh. If X86_FEATURE_USE_IBPB is not enabled, then the whole
__ptrace_may_access() overhead is just done for nothing.

>  			indirect_branch_prediction_barrier();

This really wants to be runtime patched:

		if (static_cpu_has(X86_FEATURE_USE_IBPB))
			stop_speculation(tsk, last_ctx_id);			

and have an inline for that:

static inline void stop_speculation(struct task_struct *tsk, u64 last_ctx_id)
{
	if (tsk && tsk->mm && tsk->mm->context.ctx_id != last_ctx_id &&
		___ptrace_may_access(current, tsk, PTRACE_MODE_IBPB))
		indirect_branch_prediction_barrier();
}

which also makes the whole mess readable.

Hmm?

Thanks,

	tglx




^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 16:13     ` Thomas Gleixner
@ 2018-09-04 16:21       ` Thomas Gleixner
  0 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2018-09-04 16:21 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Andrea Arcangeli,
	Woodhouse, David, Oleg Nesterov, Tim Chen, linux-kernel, x86

On Tue, 4 Sep 2018, Thomas Gleixner wrote:
> On Tue, 4 Sep 2018, Jiri Kosina wrote:
> > +/**
> > + * ___ptrace_may_access - variant of ptrace_may_access that can be used
> > + * between two arbitrary tasks
> > + * @curr: source task
> 
> That's a pretty shitty name, really. If @curr is supposed to be an
> arbitrary task, then please rename it in a way which makes that entirely
> clear. curr is too close to current. Just grep for 'task_struct \*curr'.

Something like '*tracer' would make it entirely clear what's going on.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 14:40   ` [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks Jiri Kosina
  2018-09-04 16:13     ` Thomas Gleixner
@ 2018-09-04 17:26     ` Tim Chen
  2018-09-04 17:35       ` Jiri Kosina
  2018-09-05  7:51     ` Peter Zijlstra
  2 siblings, 1 reply; 38+ messages in thread
From: Tim Chen @ 2018-09-04 17:26 UTC (permalink / raw)
  To: Jiri Kosina, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Josh Poimboeuf, Andrea Arcangeli, Woodhouse, David,
	Oleg Nesterov, Casey Schaufler
  Cc: linux-kernel, x86

On 09/04/2018 07:40 AM, Jiri Kosina wrote:
> From: Jiri Kosina <jkosina@suse.cz>
> 
> Current ptrace_may_access() implementation assumes that the 'source' task is
> always the caller (current).
> 
> Expose ___ptrace_may_access() that can be used to apply the check on arbitrary
> tasks.

Casey recently has proposed putting the decision making of whether to
do IBPB in the security module.

https://lwn.net/ml/kernel-hardening/20180815235355.14908-4-casey.schaufler@intel.com/

That will have the advantage of giving the administrator a more flexibility
of when to turn on IBPB.  The policy is very similar to what you have proposed here
but I think the security module is a more appropriate place for the security policy.

Thanks.

Tim

> 
> Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
> ---
> 
> Sorry for the resend, my pine is buggered and broke threading.
> 
>  include/linux/ptrace.h | 12 ++++++++++++
>  kernel/ptrace.c        | 13 ++++++++++---
>  2 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
> index 4f36431c380b..09744d4113fb 100644
> --- a/include/linux/ptrace.h
> +++ b/include/linux/ptrace.h
> @@ -87,6 +87,18 @@ extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead);
>   */
>  extern bool ptrace_may_access(struct task_struct *task, unsigned int mode);
>  
> +/**
> + * ___ptrace_may_access - variant of ptrace_may_access that can be used
> + * between two arbitrary tasks
> + * @curr: source task
> + * @task: target task
> + * @mode: selects type of access and caller credentials
> + *
> + * Returns true on success, false on denial.
> + */
> +extern int ___ptrace_may_access(struct task_struct *curr, struct task_struct *task,
> +				unsigned int mode);
> +
>  static inline int ptrace_reparented(struct task_struct *child)
>  {
>  	return !same_thread_group(child->real_parent, child->parent);
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 21fec73d45d4..07ff6685ebed 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -268,9 +268,10 @@ static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
>  }
>  
>  /* Returns 0 on success, -errno on denial. */
> -static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> +int ___ptrace_may_access(struct task_struct *curr, struct task_struct *task,
> +		unsigned int mode)
>  {
> -	const struct cred *cred = current_cred(), *tcred;
> +	const struct cred *cred, *tcred;
>  	struct mm_struct *mm;
>  	kuid_t caller_uid;
>  	kgid_t caller_gid;
> @@ -290,9 +291,10 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
>  	 */
>  
>  	/* Don't let security modules deny introspection */
> -	if (same_thread_group(task, current))
> +	if (same_thread_group(task, curr))
>  		return 0;
>  	rcu_read_lock();
> +	cred =  __task_cred(curr);
>  	if (mode & PTRACE_MODE_FSCREDS) {
>  		caller_uid = cred->fsuid;
>  		caller_gid = cred->fsgid;
> @@ -331,6 +333,11 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
>  	return security_ptrace_access_check(task, mode);
>  }
>  
> +static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> +{
> +	return ___ptrace_may_access(current, task, mode);
> +}
> +
>  bool ptrace_may_access(struct task_struct *task, unsigned int mode)
>  {
>  	int err;
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 17:26     ` Tim Chen
@ 2018-09-04 17:35       ` Jiri Kosina
  2018-09-04 18:10         ` Schaufler, Casey
  2018-09-05  8:00         ` Peter Zijlstra
  0 siblings, 2 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-04 17:35 UTC (permalink / raw)
  To: Tim Chen
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov,
	Casey Schaufler, linux-kernel, x86

On Tue, 4 Sep 2018, Tim Chen wrote:

> > Current ptrace_may_access() implementation assumes that the 'source' task is
> > always the caller (current).
> > 
> > Expose ___ptrace_may_access() that can be used to apply the check on arbitrary
> > tasks.
> 
> Casey recently has proposed putting the decision making of whether to
> do IBPB in the security module.
> 
> https://lwn.net/ml/kernel-hardening/20180815235355.14908-4-casey.schaufler@intel.com/
> 
> That will have the advantage of giving the administrator a more flexibility
> of when to turn on IBPB.  The policy is very similar to what you have proposed here
> but I think the security module is a more appropriate place for the security policy.

Yeah, well, honestly, I have a bit hard time buying the "generic 
sidechannel prevention security module" idea, given how completely 
different in nature all the mitigations have been so far. I don't see that 
trying to abstract this somehow provides more clarity.

So if this should be done in LSM, it'd probably have to be written by 
someone else than me :) who actually understands how the "sidechannel LSM" 
idea works.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 17:35       ` Jiri Kosina
@ 2018-09-04 18:10         ` Schaufler, Casey
  2018-09-04 18:48           ` Jiri Kosina
  2018-09-04 23:37           ` Andrea Arcangeli
  2018-09-05  8:00         ` Peter Zijlstra
  1 sibling, 2 replies; 38+ messages in thread
From: Schaufler, Casey @ 2018-09-04 18:10 UTC (permalink / raw)
  To: Jiri Kosina, Tim Chen
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov, linux-kernel,
	x86, Schaufler, Casey

> -----Original Message-----
> From: Jiri Kosina [mailto:jikos@kernel.org]
> Sent: Tuesday, September 04, 2018 10:35 AM
> To: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>;
> Peter Zijlstra <peterz@infradead.org>; Josh Poimboeuf
> <jpoimboe@redhat.com>; Andrea Arcangeli <aarcange@redhat.com>;
> Woodhouse, David <dwmw@amazon.co.uk>; Oleg Nesterov
> <oleg@redhat.com>; Schaufler, Casey <casey.schaufler@intel.com>; linux-
> kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can
> be applied on arbitrary tasks
> 
> On Tue, 4 Sep 2018, Tim Chen wrote:
> 
> > > Current ptrace_may_access() implementation assumes that the 'source'
> task is
> > > always the caller (current).
> > >
> > > Expose ___ptrace_may_access() that can be used to apply the check on
> arbitrary
> > > tasks.
> >
> > Casey recently has proposed putting the decision making of whether to
> > do IBPB in the security module.
> >
> > https://lwn.net/ml/kernel-hardening/20180815235355.14908-4-
> casey.schaufler@intel.com/
> >
> > That will have the advantage of giving the administrator a more flexibility
> > of when to turn on IBPB.  The policy is very similar to what you have proposed
> here
> > but I think the security module is a more appropriate place for the security
> policy.
> 
> Yeah, well, honestly, I have a bit hard time buying the "generic
> sidechannel prevention security module" idea, given how completely
> different in nature all the mitigations have been so far. I don't see that
> trying to abstract this somehow provides more clarity.

The real reason to use an LSM based approach is that overloading ptrace
checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a
processor interface. Even if ptrace_may_access() does exactly what you
want it to for side-channel mitigation today it would be incredibly
inappropriate to tie the two together for eternity. You don't want to restrict
the ptrace checks to only those things that are also good for side-channel,
and vice versa. 


> So if this should be done in LSM, it'd probably have to be written by
> someone else than me :) who actually understands how the "sidechannel LSM"
> idea works.

Yes. That would be me. 

> 
> Thanks,
> 
> --
> Jiri Kosina
> SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 18:10         ` Schaufler, Casey
@ 2018-09-04 18:48           ` Jiri Kosina
  2018-09-04 23:26             ` Tim Chen
  2018-09-04 23:37           ` Andrea Arcangeli
  1 sibling, 1 reply; 38+ messages in thread
From: Jiri Kosina @ 2018-09-04 18:48 UTC (permalink / raw)
  To: Schaufler, Casey
  Cc: Tim Chen, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Josh Poimboeuf, Andrea Arcangeli, Woodhouse, David,
	Oleg Nesterov, linux-kernel, x86

On Tue, 4 Sep 2018, Schaufler, Casey wrote:

> > So if this should be done in LSM, it'd probably have to be written by
> > someone else than me :) who actually understands how the "sidechannel LSM"
> > idea works.
> 
> Yes. That would be me. 

Ok, cool. Then 1/2 and 2/3 can be ignored / replaced by Casey's LSM stuff. 

Some form of 3/3 still should be merged independently on that.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 18:48           ` Jiri Kosina
@ 2018-09-04 23:26             ` Tim Chen
  2018-09-05  6:22               ` Jiri Kosina
  0 siblings, 1 reply; 38+ messages in thread
From: Tim Chen @ 2018-09-04 23:26 UTC (permalink / raw)
  To: Jiri Kosina, Schaufler, Casey
  Cc: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov, linux-kernel,
	x86, Andi Kleen

On 09/04/2018 11:48 AM, Jiri Kosina wrote:
> On Tue, 4 Sep 2018, Schaufler, Casey wrote:
> 
>>> So if this should be done in LSM, it'd probably have to be written by
>>> someone else than me :) who actually understands how the "sidechannel LSM"
>>> idea works.
>>
>> Yes. That would be me. 
> 
> Ok, cool. Then 1/2 and 2/3 can be ignored / replaced by Casey's LSM stuff. 
> 
> Some form of 3/3 still should be merged independently on that.


I think STIBP should be an opt in option as it will have significant
impact on performance.  The attack from neighbor thread is pretty
difficult to pull off considering you have to know what the sibling
thread is running and its address allocation.

We could also use a security module to opt in the STIBP policy.

Tim 

> 
> Thanks,
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 18:10         ` Schaufler, Casey
  2018-09-04 18:48           ` Jiri Kosina
@ 2018-09-04 23:37           ` Andrea Arcangeli
  2018-09-05  1:00             ` Schaufler, Casey
  1 sibling, 1 reply; 38+ messages in thread
From: Andrea Arcangeli @ 2018-09-04 23:37 UTC (permalink / raw)
  To: Schaufler, Casey
  Cc: Jiri Kosina, Tim Chen, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Josh Poimboeuf, Woodhouse, David, Oleg Nesterov,
	linux-kernel, x86

Hello,

On Tue, Sep 04, 2018 at 06:10:47PM +0000, Schaufler, Casey wrote:
> The real reason to use an LSM based approach is that overloading ptrace
> checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a
> processor interface. Even if ptrace_may_access() does exactly what you

"Side channel is a processor interface" doesn't make me optimistic,
but I assume you're not speaking for Intel.

> want it to for side-channel mitigation today it would be incredibly
> inappropriate to tie the two together for eternity. You don't want to restrict
> the ptrace checks to only those things that are also good for side-channel,
> and vice versa. 

It seems like you want to make this more configurable, we have all
debugfs x86 specific tweaks to disable IBPB at runtime and we don't
allow a runtime opt-out of IBPB alone.

If you shutdown either IBRS or retpolines at runtime with debugfs,
then IBPB goes away too.

Giving a finegrined way to disable only IBPB we found was overkill
because IBPB has never been measurable if done only when the prev task
cannot ptrace the next task (which is a superset of the too weak
upstream not dumpable check, but still not a performance issue).

Allowing IBPB runtime opt-out doesn't make much sense if you don't
allow to disable retpolines too still at runtime. And disabling
retpolines from LSM doesn't sounds the right place, it's an x86
temporary quirk only relevant for current buggy CPUs.

There should be a function that decides when IBPB and flush_RSB should
be run (flush_RSB has to be run if SMEP because with SMEP there's no
need to run flush_RSB at every kernel entry anymore), and that
function happens to check ptrace today, but it would check other stuff
too if we had other interfaces besides ptrace that would allow the
prev task to read the memory of the next task. So it's not so much
about ptrace nor about IBPB, it's about "IBPB&flush_RSB" vs "prev task
can read memory of next task". Then each arch can implement
"IBPB&flush_RSB" method its own way but the check is for the common
code and it should be in the scheduler and there's just 1 version of
this check needed.

I don't think there should be a ton of different versions of this
function each providing a different answer, which is what LSM would
provide.

At most you can add a x86 debugfs tweak to shut off the logic but
again it would make more sense if retpolines could be shut off at
runtime too, doing it just for IBPB sounds backwards because it has
such an unmeasurable overhead.

> Yes. That would be me. 

Even the attempt to make an innocuous call like
ptrace_has_cap(tcred->user_ns, mode) will eventually
deadlock there.

I used a PTRACE_MODE_ check that the ptrace check code uses to filter
out specific calls that may eventually enter LSM and hard lockup in
non reproducible workloads (some false positive IBPB is ok, especially
if it avoids a deadlock).

Everything can be fixed in any way, but calling LSM from scheduler
code doesn't sound the most robust thing to do in general because what
works outside the scheduler doesn't work from within those semi atomic
regions (it tends to work initially until it eventually
deadlocks). The original code of such check, had all sort of deadlocks
because of that.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 23:37           ` Andrea Arcangeli
@ 2018-09-05  1:00             ` Schaufler, Casey
  2018-09-05  2:38               ` Andrea Arcangeli
  0 siblings, 1 reply; 38+ messages in thread
From: Schaufler, Casey @ 2018-09-05  1:00 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Jiri Kosina, Tim Chen, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Josh Poimboeuf, Woodhouse, David, Oleg Nesterov,
	linux-kernel, x86, Schaufler, Casey

> -----Original Message-----
> From: Andrea Arcangeli [mailto:aarcange@redhat.com]
> Sent: Tuesday, September 04, 2018 4:37 PM
> To: Schaufler, Casey <casey.schaufler@intel.com>
> Cc: Jiri Kosina <jikos@kernel.org>; Tim Chen <tim.c.chen@linux.intel.com>;
> Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>;
> Peter Zijlstra <peterz@infradead.org>; Josh Poimboeuf
> <jpoimboe@redhat.com>; Woodhouse, David <dwmw@amazon.co.uk>; Oleg
> Nesterov <oleg@redhat.com>; linux-kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can
> be applied on arbitrary tasks
> 
> Hello,
> 
> On Tue, Sep 04, 2018 at 06:10:47PM +0000, Schaufler, Casey wrote:
> > The real reason to use an LSM based approach is that overloading ptrace
> > checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a
> > processor interface. Even if ptrace_may_access() does exactly what you
> 
> "Side channel is a processor interface" doesn't make me optimistic,
> but I assume you're not speaking for Intel.

Sorry, I've been working in security too long for my
optimistic streak to be very wide.

> > want it to for side-channel mitigation today it would be incredibly
> > inappropriate to tie the two together for eternity. You don't want to restrict
> > the ptrace checks to only those things that are also good for side-channel,
> > and vice versa.
> 
> It seems like you want to make this more configurable,

Not especially, no. I have gotten considerable feedback that
it should be configurable, and while I may not see the value myself,
I do respect the people who've requested the configurability.

> we have all
> debugfs x86 specific tweaks to disable IBPB at runtime and we don't
> allow a runtime opt-out of IBPB alone.
> 
> If you shutdown either IBRS or retpolines at runtime with debugfs,
> then IBPB goes away too.
> 
> Giving a finegrined way to disable only IBPB we found was overkill
> because IBPB has never been measurable if done only when the prev task
> cannot ptrace the next task (which is a superset of the too weak
> upstream not dumpable check, but still not a performance issue).
> 
> Allowing IBPB runtime opt-out doesn't make much sense if you don't
> allow to disable retpolines too still at runtime. And disabling
> retpolines from LSM doesn't sounds the right place, it's an x86
> temporary quirk only relevant for current buggy CPUs.
> 
> There should be a function that decides when IBPB and flush_RSB should
> be run (flush_RSB has to be run if SMEP because with SMEP there's no
> need to run flush_RSB at every kernel entry anymore), and that
> function happens to check ptrace today, but it would check other stuff
> too if we had other interfaces besides ptrace that would allow the
> prev task to read the memory of the next task. So it's not so much
> about ptrace nor about IBPB, it's about "IBPB&flush_RSB" vs "prev task
> can read memory of next task". Then each arch can implement
> "IBPB&flush_RSB" method its own way but the check is for the common
> code and it should be in the scheduler and there's just 1 version of
> this check needed.

Right, I get that. There's more to ptrace access than "I can read the
other task". There's more to what the processor is up to than IBPB.

> 
> I don't think there should be a ton of different versions of this
> function each providing a different answer, which is what LSM would
> provide.

If they provide different answers there should be different
functions. It's a question of viewpoint. If you don't care about
the SELinux viewpoint you shouldn't have to include it in your
checks, any more than you care about x86 issues when you're
running on a MIPS processor.

> At most you can add a x86 debugfs tweak to shut off the logic but
> again it would make more sense if retpolines could be shut off at
> runtime too, doing it just for IBPB sounds backwards because it has
> such an unmeasurable overhead.
> 
> > Yes. That would be me.
> 
> Even the attempt to make an innocuous call like
> ptrace_has_cap(tcred->user_ns, mode) will eventually
> deadlock there.

Which is yet another reason there needs to be separate logic.

> I used a PTRACE_MODE_ check that the ptrace check code uses to filter
> out specific calls that may eventually enter LSM and hard lockup in
> non reproducible workloads (some false positive IBPB is ok, especially
> if it avoids a deadlock).

Yes, even security people have to worry about locking.

> Everything can be fixed in any way, but calling LSM from scheduler
> code doesn't sound the most robust thing to do in general because what
> works outside the scheduler doesn't work from within those semi atomic
> regions (it tends to work initially until it eventually
> deadlocks). The original code of such check, had all sort of deadlocks
> because of that.

What *is* the most robust way?
Yes, there are locking issues. The code in the LSM infrastructure and in
the security modules needs to be aware of that and deal with it. The SELinux
code I proposed is more complex than it could be because the audit code
requires locking that doesn't work in the switching context.

> Thanks,
> Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05  1:00             ` Schaufler, Casey
@ 2018-09-05  2:38               ` Andrea Arcangeli
  0 siblings, 0 replies; 38+ messages in thread
From: Andrea Arcangeli @ 2018-09-05  2:38 UTC (permalink / raw)
  To: Schaufler, Casey
  Cc: Jiri Kosina, Tim Chen, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Josh Poimboeuf, Woodhouse, David, Oleg Nesterov,
	linux-kernel, x86

On Wed, Sep 05, 2018 at 01:00:37AM +0000, Schaufler, Casey wrote:
> Sorry, I've been working in security too long for my
> optimistic streak to be very wide.

Eheh. So I was simply trying to follow in context, but it wasn't
entirely clear, so I tried to take it out of context and then it was
even less clear, never mind I was only looking for some more
explanation.

> Not especially, no. I have gotten considerable feedback that
> it should be configurable, and while I may not see the value myself,
> I do respect the people who've requested the configurability.

Correct me if I'm wrong, but LSM as last word "module" implies to make
this logic modular.

My wondering is because:

1) why not just a single version of this logic in the scheduler?
   (i.e. can only be on or off or perhaps a only-ibpb-dumpable tweak
   to retain current slightly higher perf but less secure behavior)

2) even if you want a myriad of versions of this IBPB logic, how do
   the different versions can possibly fit in a whole "module" whose
   semantics have pretty much nothing to do with the other methods in
   the module like LSM_HOOK_INIT(capable, cap_capable),
   LSM_HOOK_INIT(vm_enough_memory, cap_vm_enough_memory), and
   LSM_HOOK_INIT(mmap_addr, cap_mmap_addr), or even
   LSM_HOOK_INIT(inode_follow_link, selinux_inode_follow_link),
   LSM_HOOK_INIT(inode_permission, selinux_inode_permission). I mean
   it looks an apple to oranges monolithic link in the same module. Or
   are you going to load this method in a separate module with only a
   single method implemented and then load multiple LSM modules at the
   same time?

3) if you need so much tweaking that not even a single off/on switch
   independent of any module loaded through LSM is not enough, how is
   it ok that all the rest shouldn't be configurable? All the rest has
   more performance impact than this one so it'd start from there if
   something.

I understand how configurablity is valuable (this is why we kept
everything dynamic tunable at runtime, including the PTI stop machine
to allow even PTI TLB flushes to be turned off).

I'm just trying to understand how IBPB fits in a LSM module
implementation.

Even if you build with CONFIG_SECURITY=n PTI won't go away, retpoline
won't go away, the l1flush in vmenter won't go away, the
pte/transhugepmd inversion won't go away, why only the runtime
tunability or tweaking of IBPB fits in a LSM module?

> If they provide different answers there should be different
> functions. It's a question of viewpoint. If you don't care about
> the SELinux viewpoint you shouldn't have to include it in your
> checks, any more than you care about x86 issues when you're
> running on a MIPS processor.

I don't see how selinux fits into this. Selinux doesn't affect virtual
memory protection. Not having selinux even built in doesn't mean you
are ok with virtual memory protection not being provided by the CPU.

Or in other words selinux fits into this as much as seccomp or
apparmor fits into this.

This is just like a memory barrier mb() to be executed in the context
switch code. Security issues can emerge with the lack of it regardless
if selinux is built in and enabled or CONFIG_SECURITY=n.

selinux matters after an exploit already happened, this as opposed is
needed to prevent the exploit in the first place.

Also the correct fully secure version provides a single answer
(i.e. in theory assuming a perfect implementation that didn't forget
anything so having a single implementation will also increase the
chances that we get as close as possible to the theoretical correct
implementation).

If it provides different answers in this case it is because somebody
wants not perfect security to provide higher performance, i.e. the
"configurablity" which is valuable and fine feature to provide. Just a
LSM module doesn't seem the most flexibile way to provide
configurability by far.

> Yes, even security people have to worry about locking.

Yes it was some lock that if contended would lockup if used from
inside the scheduler.

> What *is* the most robust way?

The below pretty much explains it.

> Yes, there are locking issues. The code in the LSM infrastructure and in
> the security modules needs to be aware of that and deal with it. The SELinux
> code I proposed is more complex than it could be because the audit code
> requires locking that doesn't work in the switching context.

Or in other words having multiple versions of this called from within
the scheduler is a bit like making the kernel modular and then because
the locking is subtle you may then have scheduler deadlocks only
happening with some kernel module loaded instead of others, but the
real question is what is the payoff compared to just allowing the
scheduler code to be tuned with x86 debugfs or sysctl the normal
way.

Also how would a new common code method in LSM fit for CPUs that are
so slow that they can't possibly need anything like IBPB and
flush_RSB()?

Thanks!
Andrea

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 23:26             ` Tim Chen
@ 2018-09-05  6:22               ` Jiri Kosina
  2018-09-05 15:58                 ` Andi Kleen
  0 siblings, 1 reply; 38+ messages in thread
From: Jiri Kosina @ 2018-09-05  6:22 UTC (permalink / raw)
  To: Tim Chen
  Cc: Schaufler, Casey, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Josh Poimboeuf, Andrea Arcangeli, Woodhouse, David,
	Oleg Nesterov, linux-kernel, x86, Andi Kleen

On Tue, 4 Sep 2018, Tim Chen wrote:

> I think STIBP should be an opt in option as it will have significant
> impact on performance.  The attack from neighbor thread is pretty
> difficult to pull off considering you have to know what the sibling
> thread is running and its address allocation.

In many scenarios the attacker can just easily taskset itself to the 
correct sibling.

> We could also use a security module to opt in the STIBP policy.

I am a bit afraid that we are offloading to sysadmins decisions that are 
very hard for them to make, as they require deep understanding of both the 
technical details of the security issue in the CPU, and the mitigation.

I surely understand that Intel is doing what they could to minimize the 
performance effect, but achieving that by making it a rocket science to 
configure it properly doesn't feel right.

So, after giving it a bit more thought, I still believe "I want spectre V2 
protection" vs. "I do not care about spectre V2 on my system 
(=nospectre_v2)" are the sane options we should provide; so I'll respin v4 
of my patchset, including the ptrace check in switch_mm() (statically 
patched out on !IBPB-capable systems), and we can then later see whether 
the LSM implementation, once it exists, should be used instead.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 14:40   ` [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks Jiri Kosina
  2018-09-04 16:13     ` Thomas Gleixner
  2018-09-04 17:26     ` Tim Chen
@ 2018-09-05  7:51     ` Peter Zijlstra
  2 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2018-09-05  7:51 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Andrea Arcangeli,
	Woodhouse, David, Oleg Nesterov, Tim Chen, linux-kernel, x86

On Tue, Sep 04, 2018 at 04:40:57PM +0200, Jiri Kosina wrote:
> From: Jiri Kosina <jkosina@suse.cz>
> 
> Current ptrace_may_access() implementation assumes that the 'source' task is
> always the caller (current).

Note that now you 'fixed' the new user, this is still true. Which makes
me think you do not in fact need this patch, right?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak
  2018-09-04 14:42   ` [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak Jiri Kosina
  2018-09-04 16:18     ` Thomas Gleixner
@ 2018-09-05  7:52     ` Peter Zijlstra
  2018-09-05  7:55       ` Jiri Kosina
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2018-09-05  7:52 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Andrea Arcangeli,
	Woodhouse, David, Oleg Nesterov, Tim Chen, linux-kernel, x86

On Tue, Sep 04, 2018 at 04:42:02PM +0200, Jiri Kosina wrote:
>  		if (tsk && tsk->mm &&
>  		    tsk->mm->context.ctx_id != last_ctx_id &&
> -		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
> +		    ___ptrace_may_access(current, tsk, PTRACE_MODE_IBPB))
>  			indirect_branch_prediction_barrier();

See how the new (first) argument is 'current' and you could've just used
the old __ptrace_may_access().

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak
  2018-09-05  7:52     ` Peter Zijlstra
@ 2018-09-05  7:55       ` Jiri Kosina
  0 siblings, 0 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-05  7:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Andrea Arcangeli,
	Woodhouse, David, Oleg Nesterov, Tim Chen, linux-kernel, x86

On Wed, 5 Sep 2018, Peter Zijlstra wrote:

> >  		if (tsk && tsk->mm &&
> >  		    tsk->mm->context.ctx_id != last_ctx_id &&
> > -		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
> > +		    ___ptrace_may_access(current, tsk, PTRACE_MODE_IBPB))
> >  			indirect_branch_prediction_barrier();
> 
> See how the new (first) argument is 'current' and you could've just used
> the old __ptrace_may_access().

Yeah, 1/3 is dropped in my current series already; this was cherry-picked 
from original Tim's series, but it is indeed superfluous.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak
  2018-09-04 16:18     ` Thomas Gleixner
@ 2018-09-05  7:59       ` Peter Zijlstra
  2018-09-05  8:02         ` Jiri Kosina
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2018-09-05  7:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jiri Kosina, Ingo Molnar, Josh Poimboeuf, Andrea Arcangeli,
	Woodhouse, David, Oleg Nesterov, Tim Chen, linux-kernel, x86

On Tue, Sep 04, 2018 at 06:18:55PM +0200, Thomas Gleixner wrote:
> On Tue, 4 Sep 2018, Jiri Kosina wrote:
> >  		if (tsk && tsk->mm &&
> >  		    tsk->mm->context.ctx_id != last_ctx_id &&
> > -		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
> > +		    ___ptrace_may_access(current, tsk, PTRACE_MODE_IBPB))
> 
> Uurgh. If X86_FEATURE_USE_IBPB is not enabled, then the whole
> __ptrace_may_access() overhead is just done for nothing.
> 
> >  			indirect_branch_prediction_barrier();
> 
> This really wants to be runtime patched:
> 
> 		if (static_cpu_has(X86_FEATURE_USE_IBPB))
> 			stop_speculation(tsk, last_ctx_id);
> 
> and have an inline for that:
> 
> static inline void stop_speculation(struct task_struct *tsk, u64 last_ctx_id)
> {
> 	if (tsk && tsk->mm && tsk->mm->context.ctx_id != last_ctx_id &&
> 		___ptrace_may_access(current, tsk, PTRACE_MODE_IBPB))
> 		indirect_branch_prediction_barrier();
> }
> 
> which also makes the whole mess readable.

How about something like:

	if (static_cpu_has(X86_FEATURE_USE_IBPB) && need_ibpb(tsk, last_ctx_id))
		indirect_branch_predictor_barrier();

where:

static inline bool need_ibpb(struct task_struct *next, u64 last_ctx_id)
{
	return next && next->mm && next->mm->context.ctx_id != last_ctx_id &&
		__ptrace_may_access(next, PTRACE_MODE_IBPB));
}

I don't much like "stop_speculation" for a name here.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-04 17:35       ` Jiri Kosina
  2018-09-04 18:10         ` Schaufler, Casey
@ 2018-09-05  8:00         ` Peter Zijlstra
  2018-09-05 15:37           ` Schaufler, Casey
  1 sibling, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2018-09-05  8:00 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Tim Chen, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov,
	Casey Schaufler, linux-kernel, x86

On Tue, Sep 04, 2018 at 07:35:29PM +0200, Jiri Kosina wrote:
> On Tue, 4 Sep 2018, Tim Chen wrote:
> 
> > > Current ptrace_may_access() implementation assumes that the 'source' task is
> > > always the caller (current).
> > > 
> > > Expose ___ptrace_may_access() that can be used to apply the check on arbitrary
> > > tasks.
> > 
> > Casey recently has proposed putting the decision making of whether to
> > do IBPB in the security module.
> > 
> > https://lwn.net/ml/kernel-hardening/20180815235355.14908-4-casey.schaufler@intel.com/
> > 
> > That will have the advantage of giving the administrator a more flexibility
> > of when to turn on IBPB.  The policy is very similar to what you have proposed here
> > but I think the security module is a more appropriate place for the security policy.
> 
> Yeah, well, honestly, I have a bit hard time buying the "generic 
> sidechannel prevention security module" idea, given how completely 
> different in nature all the mitigations have been so far. I don't see that 
> trying to abstract this somehow provides more clarity.
> 
> So if this should be done in LSM, it'd probably have to be written by 
> someone else than me :) who actually understands how the "sidechannel LSM" 
> idea works.

Yeah, I'm not convinced on LSM either. Lets just do these here patches
first and then Casey can try and convince us later.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak
  2018-09-05  7:59       ` Peter Zijlstra
@ 2018-09-05  8:02         ` Jiri Kosina
  2018-09-05  9:40           ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Jiri Kosina @ 2018-09-05  8:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Andrea Arcangeli,
	Woodhouse, David, Oleg Nesterov, Tim Chen, linux-kernel, x86

On Wed, 5 Sep 2018, Peter Zijlstra wrote:

> On Tue, Sep 04, 2018 at 06:18:55PM +0200, Thomas Gleixner wrote:
> > On Tue, 4 Sep 2018, Jiri Kosina wrote:
> > >  		if (tsk && tsk->mm &&
> > >  		    tsk->mm->context.ctx_id != last_ctx_id &&
> > > -		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
> > > +		    ___ptrace_may_access(current, tsk, PTRACE_MODE_IBPB))
> > 
> > Uurgh. If X86_FEATURE_USE_IBPB is not enabled, then the whole
> > __ptrace_may_access() overhead is just done for nothing.
> > 
> > >  			indirect_branch_prediction_barrier();
> > 
> > This really wants to be runtime patched:
> > 
> > 		if (static_cpu_has(X86_FEATURE_USE_IBPB))
> > 			stop_speculation(tsk, last_ctx_id);
> > 
> > and have an inline for that:
> > 
> > static inline void stop_speculation(struct task_struct *tsk, u64 last_ctx_id)
> > {
> > 	if (tsk && tsk->mm && tsk->mm->context.ctx_id != last_ctx_id &&
> > 		___ptrace_may_access(current, tsk, PTRACE_MODE_IBPB))
> > 		indirect_branch_prediction_barrier();
> > }
> > 
> > which also makes the whole mess readable.
> 
> How about something like:
> 
> 	if (static_cpu_has(X86_FEATURE_USE_IBPB) && need_ibpb(tsk, last_ctx_id))
> 		indirect_branch_predictor_barrier();
> 
> where:
> 
> static inline bool need_ibpb(struct task_struct *next, u64 last_ctx_id)
> {
> 	return next && next->mm && next->mm->context.ctx_id != last_ctx_id &&
> 		__ptrace_may_access(next, PTRACE_MODE_IBPB));
> }
> 
> I don't much like "stop_speculation" for a name here.

Yeah, I did more or less that earlier today; my series currently has

+static bool ibpb_needed(struct task_struct *tsk, u64 last_ctx_id)
+{
+       return (tsk && tsk->mm && tsk->mm->context.ctx_id != last_ctx_id &&
+                       __ptrace_may_access(tsk, PTRACE_MODE_IBPB));
+}
+
[ ... ]
-               if (tsk && tsk->mm &&
-                   tsk->mm->context.ctx_id != last_ctx_id &&
-                   get_dumpable(tsk->mm) != SUID_DUMP_USER)
+               if (static_cpu_has(X86_FEATURE_USE_IBPB) &&
+                               ibpb_needed(tsk, last_ctx_id))
                        indirect_branch_prediction_barrier();

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak
  2018-09-05  8:02         ` Jiri Kosina
@ 2018-09-05  9:40           ` Peter Zijlstra
  0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2018-09-05  9:40 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Andrea Arcangeli,
	Woodhouse, David, Oleg Nesterov, Tim Chen, linux-kernel, x86

On Wed, Sep 05, 2018 at 10:02:41AM +0200, Jiri Kosina wrote:
> Yeah, I did more or less that earlier today; my series currently has

Excellent, maybe add a wee comment like so?

> +static bool ibpb_needed(struct task_struct *tsk, u64 last_ctx_id)
> +{
	/*
	 * Check if the current (previous) task has access to the memory
	 * of the @tsk (next) task. If access is denied, make sure to
	 * issue a IBPB to stop user->user Spectre-v2 attacks.
	 *
	 * Note: __ptrace_may_access() returns 0 or -ERRNO.
	 */
> +       return (tsk && tsk->mm && tsk->mm->context.ctx_id != last_ctx_id &&
> +                       __ptrace_may_access(tsk, PTRACE_MODE_IBPB));
> +}

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05  8:00         ` Peter Zijlstra
@ 2018-09-05 15:37           ` Schaufler, Casey
  0 siblings, 0 replies; 38+ messages in thread
From: Schaufler, Casey @ 2018-09-05 15:37 UTC (permalink / raw)
  To: Peter Zijlstra, Jiri Kosina
  Cc: Tim Chen, Thomas Gleixner, Ingo Molnar, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov, linux-kernel,
	x86, Schaufler, Casey

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Wednesday, September 05, 2018 1:00 AM
> To: Jiri Kosina <jikos@kernel.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>; Thomas Gleixner
> <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; Josh Poimboeuf
> <jpoimboe@redhat.com>; Andrea Arcangeli <aarcange@redhat.com>;
> Woodhouse, David <dwmw@amazon.co.uk>; Oleg Nesterov
> <oleg@redhat.com>; Schaufler, Casey <casey.schaufler@intel.com>; linux-
> kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can
> be applied on arbitrary tasks
> 
> On Tue, Sep 04, 2018 at 07:35:29PM +0200, Jiri Kosina wrote:
> > On Tue, 4 Sep 2018, Tim Chen wrote:
> >
> > > > Current ptrace_may_access() implementation assumes that the 'source'
> task is
> > > > always the caller (current).
> > > >
> > > > Expose ___ptrace_may_access() that can be used to apply the check on
> arbitrary
> > > > tasks.
> > >
> > > Casey recently has proposed putting the decision making of whether to
> > > do IBPB in the security module.
> > >
> > > https://lwn.net/ml/kernel-hardening/20180815235355.14908-4-
> casey.schaufler@intel.com/
> > >
> > > That will have the advantage of giving the administrator a more flexibility
> > > of when to turn on IBPB.  The policy is very similar to what you have
> proposed here
> > > but I think the security module is a more appropriate place for the security
> policy.
> >
> > Yeah, well, honestly, I have a bit hard time buying the "generic
> > sidechannel prevention security module" idea, given how completely
> > different in nature all the mitigations have been so far. I don't see that
> > trying to abstract this somehow provides more clarity.
> >
> > So if this should be done in LSM, it'd probably have to be written by
> > someone else than me :) who actually understands how the "sidechannel
> LSM"
> > idea works.
> 
> Yeah, I'm not convinced on LSM either. Lets just do these here patches
> first and then Casey can try and convince us later.

Works for me. There are advantages to doing it either way.
The LSM approach allows you to consider implications of LSM
data, which you can't do otherwise. Once the hook is available
it becomes the obvious place to do other checks.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05  6:22               ` Jiri Kosina
@ 2018-09-05 15:58                 ` Andi Kleen
  2018-09-05 18:04                   ` Andrea Arcangeli
                                     ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Andi Kleen @ 2018-09-05 15:58 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Tim Chen, Schaufler, Casey, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Josh Poimboeuf, Andrea Arcangeli, Woodhouse,
	David, Oleg Nesterov, linux-kernel, x86

> So, after giving it a bit more thought, I still believe "I want spectre V2 
> protection" vs. "I do not care about spectre V2 on my system 
> (=nospectre_v2)" are the sane options we should provide; so I'll respin v4 
> of my patchset, including the ptrace check in switch_mm() (statically 
> patched out on !IBPB-capable systems), and we can then later see whether 
> the LSM implementation, once it exists, should be used instead.

Please if you repost include plenty of performance numbers for multi threaded
workloads.  It's ridiculous to even discuss this without them.

-Andi

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05 15:58                 ` Andi Kleen
@ 2018-09-05 18:04                   ` Andrea Arcangeli
  2018-09-05 18:29                     ` Jiri Kosina
  2018-09-05 18:26                   ` Thomas Gleixner
  2018-09-05 18:35                   ` Jiri Kosina
  2 siblings, 1 reply; 38+ messages in thread
From: Andrea Arcangeli @ 2018-09-05 18:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jiri Kosina, Tim Chen, Schaufler, Casey, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Woodhouse, David,
	Oleg Nesterov, linux-kernel, x86

On Wed, Sep 05, 2018 at 08:58:23AM -0700, Andi Kleen wrote:
> > So, after giving it a bit more thought, I still believe "I want spectre V2 
> > protection" vs. "I do not care about spectre V2 on my system 
> > (=nospectre_v2)" are the sane options we should provide; so I'll respin v4 
> > of my patchset, including the ptrace check in switch_mm() (statically 
> > patched out on !IBPB-capable systems), and we can then later see whether 
> > the LSM implementation, once it exists, should be used instead.
> 
> Please if you repost include plenty of performance numbers for multi threaded
> workloads.  It's ridiculous to even discuss this without them.

Multi threaded workloads won't be affected because they share the
memory in the first place... the check itself is lost in the noise
too. Maybe you meant to ask for multiple parallel processes
(multithreaded or not, zero difference) all with a different user id?

What is more weird for me is to attempt to discuss the STIBP part of
the patch without knowing which microcodes exactly implement STIBP in
a way that is slow. Tim already said it's a measurable performance hit,
but on some CPU it's zero performance hit. We don't even know if STIBP
is actually useful or if it's a noop on those CPUs where it won't
affect performance.

Back to the IBPB, from implementation standpoint at least on 3.10 this
code posted would lockup hard eventually and we got complains.

ptrace_has_cap(tcred->user_ns, mode) is supposed to eventually lockup
hard if called from scheduler as it does some locking, and we fixed
that already half a year ago.

Not sure how it's still unfixed in Jiri's codebase after so long, or
if it's an issue specific to 3.10 and upstream gets away without this.

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index eb7862f185ff..4a8d0dd73c93 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -285,7 +285,8 @@ int ___ptrace_may_access(struct task_struct *tracer,
 	    gid_eq(caller_gid, tcred->sgid) &&
 	    gid_eq(caller_gid, tcred->gid))
 		goto ok;
-	if (ptrace_has_cap(tcred->user_ns, mode))
+	if (!(mode & PTRACE_MODE_NOACCESS_CHK) &&
+	    ptrace_has_cap(tcred->user_ns, mode))
 		goto ok;
 	rcu_read_unlock();
 	return -EPERM;
@@ -296,7 +297,8 @@ ok:
 		dumpable = get_dumpable(task->mm);
 	rcu_read_lock();
 	if (dumpable != SUID_DUMP_USER &&
-	    !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
+	    ((mode & PTRACE_MODE_NOACCESS_CHK) ||
+	     !ptrace_has_cap(__task_cred(task)->user_ns, mode))) {
 		rcu_read_unlock();
 		return -EPERM;
 	}

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05 15:58                 ` Andi Kleen
  2018-09-05 18:04                   ` Andrea Arcangeli
@ 2018-09-05 18:26                   ` Thomas Gleixner
  2018-09-05 18:35                   ` Jiri Kosina
  2 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2018-09-05 18:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jiri Kosina, Tim Chen, Schaufler, Casey, Ingo Molnar,
	Peter Zijlstra, Josh Poimboeuf, Andrea Arcangeli, Woodhouse,
	David, Oleg Nesterov, linux-kernel, x86

On Wed, 5 Sep 2018, Andi Kleen wrote:

> > So, after giving it a bit more thought, I still believe "I want spectre V2 
> > protection" vs. "I do not care about spectre V2 on my system 
> > (=nospectre_v2)" are the sane options we should provide; so I'll respin v4 
> > of my patchset, including the ptrace check in switch_mm() (statically 
> > patched out on !IBPB-capable systems), and we can then later see whether 
> > the LSM implementation, once it exists, should be used instead.
> 
> Please if you repost include plenty of performance numbers for multi threaded
> workloads.  It's ridiculous to even discuss this without them.

Either we care about that problem and provide a proper mechanism to protect
systems or we do not. That's not a performance number problem at all.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05 18:04                   ` Andrea Arcangeli
@ 2018-09-05 18:29                     ` Jiri Kosina
  2018-09-05 18:40                       ` Andrea Arcangeli
  0 siblings, 1 reply; 38+ messages in thread
From: Jiri Kosina @ 2018-09-05 18:29 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Tim Chen, Schaufler, Casey, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Woodhouse, David,
	Oleg Nesterov, linux-kernel, x86

On Wed, 5 Sep 2018, Andrea Arcangeli wrote:

> ptrace_has_cap(tcred->user_ns, mode) is supposed to eventually lockup
> hard if called from scheduler as it does some locking, and we fixed
> that already half a year ago.
> 
> Not sure how it's still unfixed in Jiri's codebase after so long, or
> if it's an issue specific to 3.10 and upstream gets away without this.

We haven't got any lockup reports in our kernels (and we do carry a 
variant of this patch), so it might be somehow specific to 3.10.

> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index eb7862f185ff..4a8d0dd73c93 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -285,7 +285,8 @@ int ___ptrace_may_access(struct task_struct *tracer,
>  	    gid_eq(caller_gid, tcred->sgid) &&
>  	    gid_eq(caller_gid, tcred->gid))
>  		goto ok;
> -	if (ptrace_has_cap(tcred->user_ns, mode))
> +	if (!(mode & PTRACE_MODE_NOACCESS_CHK) &&
> +	    ptrace_has_cap(tcred->user_ns, mode))
>  		goto ok;
>  	rcu_read_unlock();
>  	return -EPERM;
> @@ -296,7 +297,8 @@ ok:
>  		dumpable = get_dumpable(task->mm);
>  	rcu_read_lock();
>  	if (dumpable != SUID_DUMP_USER &&
> -	    !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
> +	    ((mode & PTRACE_MODE_NOACCESS_CHK) ||
> +	     !ptrace_has_cap(__task_cred(task)->user_ns, mode))) {
>  		rcu_read_unlock();
>  		return -EPERM;

I will look into this whether it's still applicable or not, thanks a lot 
for the pointer.

(and no, my testing of the patch I sent on current tree didn't produce any 
hangs -- was there a reliable way to trigger it on 3.10?).

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05 15:58                 ` Andi Kleen
  2018-09-05 18:04                   ` Andrea Arcangeli
  2018-09-05 18:26                   ` Thomas Gleixner
@ 2018-09-05 18:35                   ` Jiri Kosina
  2 siblings, 0 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-05 18:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Tim Chen, Schaufler, Casey, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Josh Poimboeuf, Andrea Arcangeli, Woodhouse,
	David, Oleg Nesterov, linux-kernel, x86

On Wed, 5 Sep 2018, Andi Kleen wrote:

> Please if you repost include plenty of performance numbers for multi 
> threaded workloads.  It's ridiculous to even discuss this without them.

Talking about ridiculous ... I find it a bit sad that Intel has let this 
be unfixed for 3/4 years in linux; that doesn't really signal deep 
dedication to customer safety. Have any STIBP patches been even submitted?

This is not the same situation as IBRS which was mostly ignored -- there 
we have retpolines to protect the kernel, and it's debatable whether it's 
exploitable on SKL at all.

Ignoring IBPB and STIBP is keeping the system plain vulnerable to 
user-user attacks, and us not providing users with possibiliy to easily 
mitigate, is a bit embarassing in my eyes.

Thanks,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05 18:29                     ` Jiri Kosina
@ 2018-09-05 18:40                       ` Andrea Arcangeli
  2018-09-05 18:42                         ` Jiri Kosina
                                           ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Andrea Arcangeli @ 2018-09-05 18:40 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Andi Kleen, Tim Chen, Schaufler, Casey, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Woodhouse, David,
	Oleg Nesterov, linux-kernel, x86

On Wed, Sep 05, 2018 at 08:29:07PM +0200, Jiri Kosina wrote:
> (and no, my testing of the patch I sent on current tree didn't produce any 
> hangs -- was there a reliable way to trigger it on 3.10?).

Only a very specific libvirt acceptance test found this after a while
and it wasn't a customer it was caught by QA. The reporter said it
wasn't sure about how to reproduce this issue either, it happened once
in a while the backtrace was still enough to fix it for sure and then
it never happened again.

It's not because of virt but probably because of selinux+audit. This
is precisely why I thought once you enter LSM from the scheduler
atomic path the trouble starts as each LSM implementation of those
calls may crash or not crash.

Perhaps you didn't sandbox KVM inside selinux by default?

This is the lockup the patch I posted fixed for 3.10.

[ 1838.621010] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 6
[ 1838.629070] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 3.10.0-327.62.4.el7.x86_64 #1
[ 1838.637610] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.4.2 01/09/2017
[ 1838.645954] Call Trace:
[ 1838.648680]  <NMI>  [<ffffffff8163a05d>] dump_stack+0x19/0x1b
[ 1838.655113]  [<ffffffff816338d0>] panic+0xd8/0x1e7
[ 1838.660460]  [<ffffffff8111e960>] ? restart_watchdog_hrtimer+0x50/0x50
[ 1838.667742]  [<ffffffff8111ea22>] watchdog_overflow_callback+0xc2/0xd0
[ 1838.675024]  [<ffffffff81162211>] __perf_event_overflow+0xa1/0x250
[ 1838.681920]  [<ffffffff81162ce4>] perf_event_overflow+0x14/0x20
[ 1838.688526]  [<ffffffff810337c8>] intel_pmu_handle_irq+0x1e8/0x470
[ 1838.695423]  [<ffffffff812f83cc>] ? ioremap_page_range+0x24c/0x330
[ 1838.702320]  [<ffffffff811a9031>] ? unmap_kernel_range_noflush+0x11/0x20
[ 1838.709797]  [<ffffffff813997f4>] ? ghes_copy_tofrom_phys+0x124/0x210
[ 1838.716984]  [<ffffffff81399980>] ? ghes_read_estatus+0xa0/0x190
[ 1838.723687]  [<ffffffff816444bb>] perf_event_nmi_handler+0x2b/0x50
[ 1838.730582]  [<ffffffff81643c09>] nmi_handle.isra.0+0x69/0xb0
[ 1838.736992]  [<ffffffff81643db9>] do_nmi+0x169/0x340
[ 1838.742532]  [<ffffffff81642ff9>] end_repeat_nmi+0x1e/0x7e
[ 1838.748653]  [<ffffffff81641bbd>] ? _raw_spin_lock_irqsave+0x3d/0x60
[ 1838.755742]  [<ffffffff81641bbd>] ? _raw_spin_lock_irqsave+0x3d/0x60
[ 1838.762831]  [<ffffffff81641bbd>] ? _raw_spin_lock_irqsave+0x3d/0x60
[ 1838.769917]  <<EOE>>  [<ffffffff816391e5>] avc_compute_av+0x126/0x1b5
[ 1838.777125]  [<ffffffff810b842e>] ? walk_tg_tree_from+0xbe/0x110
[ 1838.783828]  [<ffffffff8128b9c4>] avc_has_perm_noaudit+0xc4/0x110
[ 1838.790628]  [<ffffffff8128f1fb>] cred_has_capability+0x6b/0x120
[ 1838.797331]  [<ffffffff810db71c>] ? ktime_get+0x4c/0xd0
[ 1838.803160]  [<ffffffff810e167b>] ? clockevents_program_event+0x6b/0xf0
[ 1838.810532]  [<ffffffff8128f2de>] selinux_capable+0x2e/0x40
[ 1838.816748]  [<ffffffff81288f65>] security_capable_noaudit+0x15/0x20
[ 1838.823829]  [<ffffffff8108b975>] has_ns_capability_noaudit+0x15/0x20
[ 1838.831014]  [<ffffffff8108bc55>] ptrace_has_cap+0x35/0x40
[ 1838.837126]  [<ffffffff8108c717>] ___ptrace_may_access+0xa7/0x1e0
[ 1838.843925]  [<ffffffff8163f0ae>] __schedule+0x26e/0xa00
[ 1838.849855]  [<ffffffff81640949>] schedule_preempt_disabled+0x29/0x70
[ 1838.857041]  [<ffffffff810d9324>] cpu_startup_entry+0x184/0x290
[ 1838.863637]  [<ffffffff8104891a>] start_secondary+0x1da/0x250

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05 18:40                       ` Andrea Arcangeli
@ 2018-09-05 18:42                         ` Jiri Kosina
  2018-09-05 19:03                         ` Peter Zijlstra
  2018-09-05 20:02                         ` Jiri Kosina
  2 siblings, 0 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-05 18:42 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Tim Chen, Schaufler, Casey, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Woodhouse, David,
	Oleg Nesterov, linux-kernel, x86

On Wed, 5 Sep 2018, Andrea Arcangeli wrote:

> Perhaps you didn't sandbox KVM inside selinux by default?

We by default do not enable selinux, so that's probably why.

Thanks again, will go through the code in mainline and adapt the patch 
before sending v4.

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05 18:40                       ` Andrea Arcangeli
  2018-09-05 18:42                         ` Jiri Kosina
@ 2018-09-05 19:03                         ` Peter Zijlstra
  2018-09-05 19:27                           ` Schaufler, Casey
  2018-09-05 20:02                         ` Jiri Kosina
  2 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2018-09-05 19:03 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Jiri Kosina, Andi Kleen, Tim Chen, Schaufler, Casey,
	Thomas Gleixner, Ingo Molnar, Josh Poimboeuf, Woodhouse, David,
	Oleg Nesterov, linux-kernel, x86

On Wed, Sep 05, 2018 at 02:40:18PM -0400, Andrea Arcangeli wrote:
> [ 1838.769917]  <<EOE>>  [<ffffffff816391e5>] avc_compute_av+0x126/0x1b5

That does read_lock(), which is not allowed from scheduler context.

> [ 1838.777125]  [<ffffffff810b842e>] ? walk_tg_tree_from+0xbe/0x110
> [ 1838.783828]  [<ffffffff8128b9c4>] avc_has_perm_noaudit+0xc4/0x110

In current code this can end up in avc_update_node() which uses
spin_lock(), which is a bug from scheduler context.o

> [ 1838.790628]  [<ffffffff8128f1fb>] cred_has_capability+0x6b/0x120
> [ 1838.797331]  [<ffffffff810db71c>] ? ktime_get+0x4c/0xd0
> [ 1838.803160]  [<ffffffff810e167b>] ? clockevents_program_event+0x6b/0xf0
> [ 1838.810532]  [<ffffffff8128f2de>] selinux_capable+0x2e/0x40
> [ 1838.816748]  [<ffffffff81288f65>] security_capable_noaudit+0x15/0x20
> [ 1838.823829]  [<ffffffff8108b975>] has_ns_capability_noaudit+0x15/0x20
> [ 1838.831014]  [<ffffffff8108bc55>] ptrace_has_cap+0x35/0x40
> [ 1838.837126]  [<ffffffff8108c717>] ___ptrace_may_access+0xa7/0x1e0
> [ 1838.843925]  [<ffffffff8163f0ae>] __schedule+0x26e/0xa00
> [ 1838.849855]  [<ffffffff81640949>] schedule_preempt_disabled+0x29/0x70
> [ 1838.857041]  [<ffffffff810d9324>] cpu_startup_entry+0x184/0x290
> [ 1838.863637]  [<ffffffff8104891a>] start_secondary+0x1da/0x250

So yes, looks like all that security LSM nonsense isn't going to work
here.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05 19:03                         ` Peter Zijlstra
@ 2018-09-05 19:27                           ` Schaufler, Casey
  0 siblings, 0 replies; 38+ messages in thread
From: Schaufler, Casey @ 2018-09-05 19:27 UTC (permalink / raw)
  To: Peter Zijlstra, Andrea Arcangeli
  Cc: Jiri Kosina, Andi Kleen, Tim Chen, Thomas Gleixner, Ingo Molnar,
	Josh Poimboeuf, Woodhouse, David, Oleg Nesterov, linux-kernel,
	x86, Schaufler, Casey

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@infradead.org]
> Sent: Wednesday, September 05, 2018 12:03 PM
> To: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Jiri Kosina <jikos@kernel.org>; Andi Kleen <ak@linux.intel.com>; Tim Chen
> <tim.c.chen@linux.intel.com>; Schaufler, Casey <casey.schaufler@intel.com>;
> Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>;
> Josh Poimboeuf <jpoimboe@redhat.com>; Woodhouse, David
> <dwmw@amazon.co.uk>; Oleg Nesterov <oleg@redhat.com>; linux-
> kernel@vger.kernel.org; x86@kernel.org
> Subject: Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can
> be applied on arbitrary tasks
> 
> On Wed, Sep 05, 2018 at 02:40:18PM -0400, Andrea Arcangeli wrote:
> > [ 1838.769917]  <<EOE>>  [<ffffffff816391e5>]
> avc_compute_av+0x126/0x1b5
> 
> That does read_lock(), which is not allowed from scheduler context.
> 
> > [ 1838.777125]  [<ffffffff810b842e>] ? walk_tg_tree_from+0xbe/0x110
> > [ 1838.783828]  [<ffffffff8128b9c4>] avc_has_perm_noaudit+0xc4/0x110
> 
> In current code this can end up in avc_update_node() which uses
> spin_lock(), which is a bug from scheduler context.o
> 
> > [ 1838.790628]  [<ffffffff8128f1fb>] cred_has_capability+0x6b/0x120
> > [ 1838.797331]  [<ffffffff810db71c>] ? ktime_get+0x4c/0xd0
> > [ 1838.803160]  [<ffffffff810e167b>] ?
> clockevents_program_event+0x6b/0xf0
> > [ 1838.810532]  [<ffffffff8128f2de>] selinux_capable+0x2e/0x40
> > [ 1838.816748]  [<ffffffff81288f65>] security_capable_noaudit+0x15/0x20
> > [ 1838.823829]  [<ffffffff8108b975>] has_ns_capability_noaudit+0x15/0x20
> > [ 1838.831014]  [<ffffffff8108bc55>] ptrace_has_cap+0x35/0x40
> > [ 1838.837126]  [<ffffffff8108c717>] ___ptrace_may_access+0xa7/0x1e0
> > [ 1838.843925]  [<ffffffff8163f0ae>] __schedule+0x26e/0xa00
> > [ 1838.849855]  [<ffffffff81640949>] schedule_preempt_disabled+0x29/0x70
> > [ 1838.857041]  [<ffffffff810d9324>] cpu_startup_entry+0x184/0x290
> > [ 1838.863637]  [<ffffffff8104891a>] start_secondary+0x1da/0x250
> 
> So yes, looks like all that security LSM nonsense isn't going to work
> here.

What won't work is using the ptrace code. That is one of the reasons why
you can't just blindly use it. Look at the patch set I submitted and you'll see
that the SELinux selinux_task_safe_sidechannel() hook does not do the things
that cause the lockup.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
  2018-09-05 18:40                       ` Andrea Arcangeli
  2018-09-05 18:42                         ` Jiri Kosina
  2018-09-05 19:03                         ` Peter Zijlstra
@ 2018-09-05 20:02                         ` Jiri Kosina
  2 siblings, 0 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-05 20:02 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Tim Chen, Schaufler, Casey, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Josh Poimboeuf, Woodhouse, David,
	Oleg Nesterov, linux-kernel, x86

On Wed, 5 Sep 2018, Andrea Arcangeli wrote:

> [ 1838.621010] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 6
> [ 1838.629070] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 3.10.0-327.62.4.el7.x86_64 #1
> [ 1838.637610] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.4.2 01/09/2017
> [ 1838.645954] Call Trace:
> [ 1838.648680]  <NMI>  [<ffffffff8163a05d>] dump_stack+0x19/0x1b
> [ 1838.655113]  [<ffffffff816338d0>] panic+0xd8/0x1e7
> [ 1838.660460]  [<ffffffff8111e960>] ? restart_watchdog_hrtimer+0x50/0x50
> [ 1838.667742]  [<ffffffff8111ea22>] watchdog_overflow_callback+0xc2/0xd0
> [ 1838.675024]  [<ffffffff81162211>] __perf_event_overflow+0xa1/0x250
> [ 1838.681920]  [<ffffffff81162ce4>] perf_event_overflow+0x14/0x20
> [ 1838.688526]  [<ffffffff810337c8>] intel_pmu_handle_irq+0x1e8/0x470
> [ 1838.695423]  [<ffffffff812f83cc>] ? ioremap_page_range+0x24c/0x330
> [ 1838.702320]  [<ffffffff811a9031>] ? unmap_kernel_range_noflush+0x11/0x20
> [ 1838.709797]  [<ffffffff813997f4>] ? ghes_copy_tofrom_phys+0x124/0x210
> [ 1838.716984]  [<ffffffff81399980>] ? ghes_read_estatus+0xa0/0x190
> [ 1838.723687]  [<ffffffff816444bb>] perf_event_nmi_handler+0x2b/0x50
> [ 1838.730582]  [<ffffffff81643c09>] nmi_handle.isra.0+0x69/0xb0
> [ 1838.736992]  [<ffffffff81643db9>] do_nmi+0x169/0x340
> [ 1838.742532]  [<ffffffff81642ff9>] end_repeat_nmi+0x1e/0x7e
> [ 1838.748653]  [<ffffffff81641bbd>] ? _raw_spin_lock_irqsave+0x3d/0x60
> [ 1838.755742]  [<ffffffff81641bbd>] ? _raw_spin_lock_irqsave+0x3d/0x60
> [ 1838.762831]  [<ffffffff81641bbd>] ? _raw_spin_lock_irqsave+0x3d/0x60
> [ 1838.769917]  <<EOE>>  [<ffffffff816391e5>] avc_compute_av+0x126/0x1b5
> [ 1838.777125]  [<ffffffff810b842e>] ? walk_tg_tree_from+0xbe/0x110
> [ 1838.783828]  [<ffffffff8128b9c4>] avc_has_perm_noaudit+0xc4/0x110
> [ 1838.790628]  [<ffffffff8128f1fb>] cred_has_capability+0x6b/0x120
> [ 1838.797331]  [<ffffffff810db71c>] ? ktime_get+0x4c/0xd0
> [ 1838.803160]  [<ffffffff810e167b>] ? clockevents_program_event+0x6b/0xf0
> [ 1838.810532]  [<ffffffff8128f2de>] selinux_capable+0x2e/0x40
> [ 1838.816748]  [<ffffffff81288f65>] security_capable_noaudit+0x15/0x20
> [ 1838.823829]  [<ffffffff8108b975>] has_ns_capability_noaudit+0x15/0x20
> [ 1838.831014]  [<ffffffff8108bc55>] ptrace_has_cap+0x35/0x40
> [ 1838.837126]  [<ffffffff8108c717>] ___ptrace_may_access+0xa7/0x1e0
> [ 1838.843925]  [<ffffffff8163f0ae>] __schedule+0x26e/0xa00
> [ 1838.849855]  [<ffffffff81640949>] schedule_preempt_disabled+0x29/0x70
> [ 1838.857041]  [<ffffffff810d9324>] cpu_startup_entry+0x184/0x290
> [ 1838.863637]  [<ffffffff8104891a>] start_secondary+0x1da/0x250

So yeah, current Linus' tree needs the same treatment -- we have to avoid 
calling out to ptrace_has_cap() in PTRACE_MODE_NOACCESS_CHK cases.

I have updated the patch for v4. Thanks again,

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks
@ 2018-09-04 14:24 Jiri Kosina
  0 siblings, 0 replies; 38+ messages in thread
From: Jiri Kosina @ 2018-09-04 14:24 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Josh Poimboeuf,
	Andrea Arcangeli, Woodhouse, David, Oleg Nesterov, Tim Chen
  Cc: linux-kernel, x86

From: Jiri Kosina <jkosina@suse.cz>

Current ptrace_may_access() implementation assumes that the 'source' task is
always the caller (current).

Expose ___ptrace_may_access() that can be used to apply the check on arbitrary
tasks.

Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 include/linux/ptrace.h | 12 ++++++++++++
 kernel/ptrace.c        | 13 ++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 4f36431c380b..09744d4113fb 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -87,6 +87,18 @@ extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead);
  */
 extern bool ptrace_may_access(struct task_struct *task, unsigned int mode);
 
+/**
+ * ___ptrace_may_access - variant of ptrace_may_access that can be used
+ * between two arbitrary tasks
+ * @curr: source task
+ * @task: target task
+ * @mode: selects type of access and caller credentials
+ *
+ * Returns true on success, false on denial.
+ */
+extern int ___ptrace_may_access(struct task_struct *curr, struct task_struct *task,
+				unsigned int mode);
+
 static inline int ptrace_reparented(struct task_struct *child)
 {
 	return !same_thread_group(child->real_parent, child->parent);
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 21fec73d45d4..07ff6685ebed 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -268,9 +268,10 @@ static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
 }
 
 /* Returns 0 on success, -errno on denial. */
-static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
+int ___ptrace_may_access(struct task_struct *curr, struct task_struct *task,
+		unsigned int mode)
 {
-	const struct cred *cred = current_cred(), *tcred;
+	const struct cred *cred, *tcred;
 	struct mm_struct *mm;
 	kuid_t caller_uid;
 	kgid_t caller_gid;
@@ -290,9 +291,10 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 	 */
 
 	/* Don't let security modules deny introspection */
-	if (same_thread_group(task, current))
+	if (same_thread_group(task, curr))
 		return 0;
 	rcu_read_lock();
+	cred =  __task_cred(curr);
 	if (mode & PTRACE_MODE_FSCREDS) {
 		caller_uid = cred->fsuid;
 		caller_gid = cred->fsgid;
@@ -331,6 +333,11 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 	return security_ptrace_access_check(task, mode);
 }
 
+static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
+{
+	return ___ptrace_may_access(current, task, mode);
+}
+
 bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 {
 	int err;

-- 
Jiri Kosina
SUSE Labs


^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-09-05 20:02 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-31 20:56 [PATCH] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation Jiri Kosina
2018-09-03  8:51 ` Jiri Kosina
2018-09-03 12:45 ` [PATCH v2 0/3] Harden spectrev2 userspace-userspace protection Jiri Kosina
2018-09-04 14:23 ` [PATCH v3 " Jiri Kosina
2018-09-04 14:40   ` [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks Jiri Kosina
2018-09-04 16:13     ` Thomas Gleixner
2018-09-04 16:21       ` Thomas Gleixner
2018-09-04 17:26     ` Tim Chen
2018-09-04 17:35       ` Jiri Kosina
2018-09-04 18:10         ` Schaufler, Casey
2018-09-04 18:48           ` Jiri Kosina
2018-09-04 23:26             ` Tim Chen
2018-09-05  6:22               ` Jiri Kosina
2018-09-05 15:58                 ` Andi Kleen
2018-09-05 18:04                   ` Andrea Arcangeli
2018-09-05 18:29                     ` Jiri Kosina
2018-09-05 18:40                       ` Andrea Arcangeli
2018-09-05 18:42                         ` Jiri Kosina
2018-09-05 19:03                         ` Peter Zijlstra
2018-09-05 19:27                           ` Schaufler, Casey
2018-09-05 20:02                         ` Jiri Kosina
2018-09-05 18:26                   ` Thomas Gleixner
2018-09-05 18:35                   ` Jiri Kosina
2018-09-04 23:37           ` Andrea Arcangeli
2018-09-05  1:00             ` Schaufler, Casey
2018-09-05  2:38               ` Andrea Arcangeli
2018-09-05  8:00         ` Peter Zijlstra
2018-09-05 15:37           ` Schaufler, Casey
2018-09-05  7:51     ` Peter Zijlstra
2018-09-04 14:42   ` [PATCH v3 2/3] x86/speculation: apply IBPB more strictly to avoid cross-process data leak Jiri Kosina
2018-09-04 16:18     ` Thomas Gleixner
2018-09-05  7:59       ` Peter Zijlstra
2018-09-05  8:02         ` Jiri Kosina
2018-09-05  9:40           ` Peter Zijlstra
2018-09-05  7:52     ` Peter Zijlstra
2018-09-05  7:55       ` Jiri Kosina
2018-09-04 14:42   ` [PATCH v3 3/3] x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation Jiri Kosina
2018-09-04 14:24 [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks Jiri Kosina

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.