All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
@ 2018-01-29 22:04 David Woodhouse
  2018-01-30 17:48 ` Josh Poimboeuf
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: David Woodhouse @ 2018-01-29 22:04 UTC (permalink / raw)
  To: arjan, tglx, karahmed, x86, linux-kernel, tim.c.chen, bp, peterz,
	pbonzini, ak, torvalds, gregkh, mingo, luto, linux

From: Tim Chen <tim.c.chen@linux.intel.com>

Flush indirect branches when switching into a process that marked itself
non dumpable. This protects high value processes like gpg better,
without having too high performance overhead.

If done naïvely, we could switch to a kernel idle thread and then back
to the original process, such as:

    process A -> idle -> process A

In such scenario, we do not have to do IBPB here even though the process
is non-dumpable, as we are switching back to the same process after a
hiatus.

To avoid the redundant IBPB, which is expensive, we track the last mm
user context ID. The cost is to have an extra u64 mm context id to track
the last mm we were using before switching to the init_mm used by idle.
Avoiding the extra IBPB is probably worth the extra memory for this
common scenario.

For those cases where tlb_defer_switch_to_init_mm() returns true (non
PCID), lazy tlb will defer switch to init_mm, so we will not be changing
the mm for the process A -> idle -> process A switch. So IBPB will be
skipped for this case.

Thanks to the reviewers and Andy Lutomirski for the suggestion of
using ctx_id which got rid of the problem of mm pointer recycling.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/tlbflush.h |  2 ++
 arch/x86/mm/tlb.c               | 33 ++++++++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 3effd3c..4405c4b 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -174,6 +174,8 @@ struct tlb_state {
 	struct mm_struct *loaded_mm;
 	u16 loaded_mm_asid;
 	u16 next_asid;
+	/* last user mm's ctx id */
+	u64 last_ctx_id;
 
 	/*
 	 * We can be in one of several states:
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a156195..7489890 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
 #include <linux/interrupt.h>
 #include <linux/export.h>
 #include <linux/cpu.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
+#include <asm/nospec-branch.h>
 #include <asm/cache.h>
 #include <asm/apic.h>
 #include <asm/uv/uv.h>
-#include <linux/debugfs.h>
 
 /*
  *	TLB flushing, formerly SMP-only
@@ -219,6 +220,27 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	} else {
 		u16 new_asid;
 		bool need_flush;
+		u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
+
+		/*
+		 * Avoid user/user BTB poisoning by flushing the branch
+		 * predictor when switching between processes. This stops
+		 * one process from doing Spectre-v2 attacks on another.
+		 *
+		 * As an optimization, flush indirect branches only when
+		 * switching into processes that disable dumping. This
+		 * protects high value processes like gpg, without having
+		 * too high performance overhead. IBPB is *expensive*!
+		 *
+		 * This will not flush branches when switching into kernel
+		 * threads. It will also not flush if we switch to idle
+		 * thread and back to the same process. It will flush if we
+		 * switch to a different non-dumpable process.
+		 */
+		if (tsk && tsk->mm &&
+		    tsk->mm->context.ctx_id != last_ctx_id &&
+		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
+			indirect_branch_prediction_barrier();
 
 		if (IS_ENABLED(CONFIG_VMAP_STACK)) {
 			/*
@@ -268,6 +290,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
 		}
 
+		/*
+		 * Record last user mm's context id, so we can avoid
+		 * flushing branch buffer with IBPB if we switch back
+		 * to the same user.
+		 */
+		if (next != &init_mm)
+			this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
+
 		this_cpu_write(cpu_tlbstate.loaded_mm, next);
 		this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
 	}
@@ -345,6 +375,7 @@ void initialize_tlbstate_and_flush(void)
 	write_cr3(build_cr3(mm->pgd, 0));
 
 	/* Reinitialize tlbstate. */
+	this_cpu_write(cpu_tlbstate.last_ctx_id, mm->context.ctx_id);
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
 	this_cpu_write(cpu_tlbstate.next_asid, 1);
 	this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-29 22:04 [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch David Woodhouse
@ 2018-01-30 17:48 ` Josh Poimboeuf
  2018-01-30 21:23   ` Tim Chen
  2018-01-30 20:38 ` Borislav Petkov
  2018-01-30 22:39 ` [tip:x86/pti] " tip-bot for Tim Chen
  2 siblings, 1 reply; 29+ messages in thread
From: Josh Poimboeuf @ 2018-01-30 17:48 UTC (permalink / raw)
  To: David Woodhouse
  Cc: arjan, tglx, karahmed, x86, linux-kernel, tim.c.chen, bp, peterz,
	pbonzini, ak, torvalds, gregkh, mingo, luto, linux

On Mon, Jan 29, 2018 at 10:04:47PM +0000, David Woodhouse wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> Flush indirect branches when switching into a process that marked itself
> non dumpable. This protects high value processes like gpg better,
> without having too high performance overhead.

I wonder what the point of this patch is.  An audit of my laptop shows
only a single user of PR_SET_DUMPABLE: systemd-coredump.

[ And yes, I have gpg-agent running.  Also, a grep of the gnupg source
doesn't show any evidence of it being used there.  So the gpg thing
seems to be a myth. ]

But also, I much preferred the original version of the patch which only
skipped IBPB when 'prev' could ptrace 'next'.

If performance is a concern, let's look at that in more detail.  But I
don't see how the solution to a performance issue could possibly be
"leave (almost) all tasks vulnerable by default."

If the argument is that everyone should "rebuild the world" with
retpolines, then this patch would still be pointless, as we wouldn't
even need IBPB.

-- 
Josh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-29 22:04 [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch David Woodhouse
  2018-01-30 17:48 ` Josh Poimboeuf
@ 2018-01-30 20:38 ` Borislav Petkov
  2018-01-30 21:03   ` Tim Chen
  2018-01-30 22:39 ` [tip:x86/pti] " tip-bot for Tim Chen
  2 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2018-01-30 20:38 UTC (permalink / raw)
  To: David Woodhouse
  Cc: arjan, tglx, karahmed, x86, linux-kernel, tim.c.chen, peterz,
	pbonzini, ak, torvalds, gregkh, mingo, luto, linux

On Mon, Jan 29, 2018 at 10:04:47PM +0000, David Woodhouse wrote:
> +		if (tsk && tsk->mm &&
> +		    tsk->mm->context.ctx_id != last_ctx_id &&
> +		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
> +			indirect_branch_prediction_barrier();

Ok, so while staring at this, someone just came up with the following
sequence:

1. Malicious process runs with UID=A, does BTB poisoning
2. Sensitive process (e.g. gpg) starts also with UID=A, no IBPB flush occurs since task is initially dumpable
3. gpg now does prctl(PR_SET_DUMPABLE, ...) to clear the dumpable flag
4. gpg now does sensitive stuff that it thinks is protected
5. gpg does indirect branches that shouldn't be influenced by the malicious process

Now, if you switch between steps 3. and 4., you're good because gpg
became non-dumpable. But if you *don't* switch, the bad BTB entries are
still there.

So, *actually*, we need to flush IBPB in set_dumpable() too, when we
clear SUID_DUMP_USER.

Or, are we missing something obvious here and that is not needed because
of reasons I haven't thought about?

I know, gpg doesn't do prctl() but disables core dumping with
setrlimit() but there might be other processes who do that...

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 20:38 ` Borislav Petkov
@ 2018-01-30 21:03   ` Tim Chen
  2018-01-30 21:57     ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Tim Chen @ 2018-01-30 21:03 UTC (permalink / raw)
  To: Borislav Petkov, David Woodhouse
  Cc: arjan, tglx, karahmed, x86, linux-kernel, peterz, pbonzini, ak,
	torvalds, gregkh, mingo, luto, linux

On 01/30/2018 12:38 PM, Borislav Petkov wrote:
> On Mon, Jan 29, 2018 at 10:04:47PM +0000, David Woodhouse wrote:
>> +		if (tsk && tsk->mm &&
>> +		    tsk->mm->context.ctx_id != last_ctx_id &&
>> +		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
>> +			indirect_branch_prediction_barrier();
> 
> Ok, so while staring at this, someone just came up with the following
> sequence:
> 
> 1. Malicious process runs with UID=A, does BTB poisoning
> 2. Sensitive process (e.g. gpg) starts also with UID=A, no IBPB flush occurs since task is initially dumpable
> 3. gpg now does prctl(PR_SET_DUMPABLE, ...) to clear the dumpable flag
> 4. gpg now does sensitive stuff that it thinks is protected
> 5. gpg does indirect branches that shouldn't be influenced by the malicious process
> 
> Now, if you switch between steps 3. and 4., you're good because gpg
> became non-dumpable. But if you *don't* switch, the bad BTB entries are
> still there.

The attacker has to guarantee itself to run right before victim. And
the window of attack is very small, as only the first context switch to victim
may be vulnerable. The victim will be immune on the next switch.  
For timing attack, the attacker needs to scan one bit at a time.  
So probably the attacker could glean 1 bit of information
on the switch back to the attacker and then the attacker could not scan
further.  So it doesn't seem to be very practical attack if the victim
has set itself to be non-dumpable.

Tim

> 
> So, *actually*, we need to flush IBPB in set_dumpable() too, when we
> clear SUID_DUMP_USER.
> 
> Or, are we missing something obvious here and that is not needed because
> of reasons I haven't thought about?
> 
> I know, gpg doesn't do prctl() but disables core dumping with
> setrlimit() but there might be other processes who do that...
> 
> Thx.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 17:48 ` Josh Poimboeuf
@ 2018-01-30 21:23   ` Tim Chen
  2018-01-30 22:00     ` Borislav Petkov
  2018-01-31  3:59     ` Josh Poimboeuf
  0 siblings, 2 replies; 29+ messages in thread
From: Tim Chen @ 2018-01-30 21:23 UTC (permalink / raw)
  To: Josh Poimboeuf, David Woodhouse
  Cc: arjan, tglx, karahmed, x86, linux-kernel, bp, peterz, pbonzini,
	ak, torvalds, gregkh, mingo, luto, linux

On 01/30/2018 09:48 AM, Josh Poimboeuf wrote:
> On Mon, Jan 29, 2018 at 10:04:47PM +0000, David Woodhouse wrote:
>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>
>> Flush indirect branches when switching into a process that marked itself
>> non dumpable. This protects high value processes like gpg better,
>> without having too high performance overhead.
> 
> I wonder what the point of this patch is.  An audit of my laptop shows
> only a single user of PR_SET_DUMPABLE: systemd-coredump.

This is an opt in approach.  For processes who need extra
security, it set itself as non-dumpable.  Then it can
ensure that it doesn't see any poisoned BTB.  

> 
> [ And yes, I have gpg-agent running.  Also, a grep of the gnupg source
> doesn't show any evidence of it being used there.  So the gpg thing
> seems to be a myth. ]

I'm less familiar with gpg-agent.  Dave was the one who
put in comments about gpg-agent in this patch so perhaps
he can comment.

> 
> But also, I much preferred the original version of the patch which only
> skipped IBPB when 'prev' could ptrace 'next'.

For the A->kernel thread->B scenario, you will need context of A
to decide if you need IBPB when switching to B.  You need to
worry about whether the context of A has been released ... etc if
you want to use ptrace.

> 
> If performance is a concern, let's look at that in more detail.  But I
> don't see how the solution to a performance issue could possibly be
> "leave (almost) all tasks vulnerable by default."
> 

Thanks.

Tim

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 21:03   ` Tim Chen
@ 2018-01-30 21:57     ` Borislav Petkov
  2018-01-30 22:26       ` Tim Chen
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2018-01-30 21:57 UTC (permalink / raw)
  To: Tim Chen
  Cc: David Woodhouse, arjan, tglx, karahmed, x86, linux-kernel,
	peterz, pbonzini, ak, torvalds, gregkh, mingo, luto, linux

On Tue, Jan 30, 2018 at 01:03:20PM -0800, Tim Chen wrote:
> So it doesn't seem to be very practical attack if the victim has set
> itself to be non-dumpable.

Probably, but considering how cheap our fix is, we might just as well
plug that hole too.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 21:23   ` Tim Chen
@ 2018-01-30 22:00     ` Borislav Petkov
  2018-01-30 22:21       ` Thomas Gleixner
  2018-01-31  3:59     ` Josh Poimboeuf
  1 sibling, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2018-01-30 22:00 UTC (permalink / raw)
  To: Tim Chen
  Cc: Josh Poimboeuf, David Woodhouse, arjan, tglx, karahmed, x86,
	linux-kernel, peterz, pbonzini, ak, torvalds, gregkh, mingo,
	luto, linux

On Tue, Jan 30, 2018 at 01:23:17PM -0800, Tim Chen wrote:
> I'm less familiar with gpg-agent.  Dave was the one who
> put in comments about gpg-agent in this patch so perhaps
> he can comment.

So I looked at gpg-agent and AFAICT, it disables core dumping with
setrlimit().

I wasn't able to attach to it either with gdb but didn't find where we
disable the attaching, for the couple of minutes I grepped through it.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 22:00     ` Borislav Petkov
@ 2018-01-30 22:21       ` Thomas Gleixner
  2018-01-30 22:55         ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2018-01-30 22:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tim Chen, Josh Poimboeuf, David Woodhouse, arjan, karahmed, x86,
	linux-kernel, peterz, pbonzini, ak, torvalds, gregkh, mingo,
	luto, linux

On Tue, 30 Jan 2018, Borislav Petkov wrote:

> On Tue, Jan 30, 2018 at 01:23:17PM -0800, Tim Chen wrote:
> > I'm less familiar with gpg-agent.  Dave was the one who
> > put in comments about gpg-agent in this patch so perhaps
> > he can comment.
> 
> So I looked at gpg-agent and AFAICT, it disables core dumping with
> setrlimit().

setrlimit() does not end up fiddling with the dumpable bits in
mm->flags. So no.

> I wasn't able to attach to it either with gdb but didn't find where we
> disable the attaching, for the couple of minutes I grepped through it.

prctl(PR_SET_DUMPABLE, 0)               = 0

That does the trick.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 21:57     ` Borislav Petkov
@ 2018-01-30 22:26       ` Tim Chen
  2018-01-30 22:43         ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Tim Chen @ 2018-01-30 22:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: David Woodhouse, arjan, tglx, karahmed, x86, linux-kernel,
	peterz, pbonzini, ak, torvalds, gregkh, mingo, luto, linux

On 01/30/2018 01:57 PM, Borislav Petkov wrote:
> On Tue, Jan 30, 2018 at 01:03:20PM -0800, Tim Chen wrote:
>> So it doesn't seem to be very practical attack if the victim has set
>> itself to be non-dumpable.
> 
> Probably, but considering how cheap our fix is, we might just as well
> plug that hole too.
> 

If the process has multiple threads running on different cpus,
you will need to set IBPB on all cpus they are running in
order to achieve your purpose.  So it is not necessarily cheap.
But I don't think it is really necessary.

Tim

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-29 22:04 [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch David Woodhouse
  2018-01-30 17:48 ` Josh Poimboeuf
  2018-01-30 20:38 ` Borislav Petkov
@ 2018-01-30 22:39 ` tip-bot for Tim Chen
  2018-01-31  7:03   ` Dominik Brodowski
  2018-02-05 14:18   ` David Woodhouse
  2 siblings, 2 replies; 29+ messages in thread
From: tip-bot for Tim Chen @ 2018-01-30 22:39 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tim.c.chen, dwmw, hpa, mingo, linux-kernel, tglx

Commit-ID:  18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7
Gitweb:     https://git.kernel.org/tip/18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7
Author:     Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate: Mon, 29 Jan 2018 22:04:47 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 30 Jan 2018 23:09:21 +0100

x86/speculation: Use Indirect Branch Prediction Barrier in context switch

Flush indirect branches when switching into a process that marked itself
non dumpable. This protects high value processes like gpg better,
without having too high performance overhead.

If done naïvely, we could switch to a kernel idle thread and then back
to the original process, such as:

    process A -> idle -> process A

In such scenario, we do not have to do IBPB here even though the process
is non-dumpable, as we are switching back to the same process after a
hiatus.

To avoid the redundant IBPB, which is expensive, we track the last mm
user context ID. The cost is to have an extra u64 mm context id to track
the last mm we were using before switching to the init_mm used by idle.
Avoiding the extra IBPB is probably worth the extra memory for this
common scenario.

For those cases where tlb_defer_switch_to_init_mm() returns true (non
PCID), lazy tlb will defer switch to init_mm, so we will not be changing
the mm for the process A -> idle -> process A switch. So IBPB will be
skipped for this case.

Thanks to the reviewers and Andy Lutomirski for the suggestion of
using ctx_id which got rid of the problem of mm pointer recycling.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: ak@linux.intel.com
Cc: karahmed@amazon.de
Cc: arjan@linux.intel.com
Cc: torvalds@linux-foundation.org
Cc: linux@dominikbrodowski.net
Cc: peterz@infradead.org
Cc: bp@alien8.de
Cc: luto@kernel.org
Cc: pbonzini@redhat.com
Cc: gregkh@linux-foundation.org
Link: https://lkml.kernel.org/r/1517263487-3708-1-git-send-email-dwmw@amazon.co.uk

---
 arch/x86/include/asm/tlbflush.h |  2 ++
 arch/x86/mm/tlb.c               | 33 ++++++++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index d33e4a2..2b8f18c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -174,6 +174,8 @@ struct tlb_state {
 	struct mm_struct *loaded_mm;
 	u16 loaded_mm_asid;
 	u16 next_asid;
+	/* last user mm's ctx id */
+	u64 last_ctx_id;
 
 	/*
 	 * We can be in one of several states:
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 5bfe61a..012d026 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
 #include <linux/interrupt.h>
 #include <linux/export.h>
 #include <linux/cpu.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
+#include <asm/nospec-branch.h>
 #include <asm/cache.h>
 #include <asm/apic.h>
 #include <asm/uv/uv.h>
-#include <linux/debugfs.h>
 
 /*
  *	TLB flushing, formerly SMP-only
@@ -247,6 +248,27 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	} else {
 		u16 new_asid;
 		bool need_flush;
+		u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
+
+		/*
+		 * Avoid user/user BTB poisoning by flushing the branch
+		 * predictor when switching between processes. This stops
+		 * one process from doing Spectre-v2 attacks on another.
+		 *
+		 * As an optimization, flush indirect branches only when
+		 * switching into processes that disable dumping. This
+		 * protects high value processes like gpg, without having
+		 * too high performance overhead. IBPB is *expensive*!
+		 *
+		 * This will not flush branches when switching into kernel
+		 * threads. It will also not flush if we switch to idle
+		 * thread and back to the same process. It will flush if we
+		 * switch to a different non-dumpable process.
+		 */
+		if (tsk && tsk->mm &&
+		    tsk->mm->context.ctx_id != last_ctx_id &&
+		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
+			indirect_branch_prediction_barrier();
 
 		if (IS_ENABLED(CONFIG_VMAP_STACK)) {
 			/*
@@ -292,6 +314,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
 		}
 
+		/*
+		 * Record last user mm's context id, so we can avoid
+		 * flushing branch buffer with IBPB if we switch back
+		 * to the same user.
+		 */
+		if (next != &init_mm)
+			this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
+
 		this_cpu_write(cpu_tlbstate.loaded_mm, next);
 		this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
 	}
@@ -369,6 +399,7 @@ void initialize_tlbstate_and_flush(void)
 	write_cr3(build_cr3(mm->pgd, 0));
 
 	/* Reinitialize tlbstate. */
+	this_cpu_write(cpu_tlbstate.last_ctx_id, mm->context.ctx_id);
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
 	this_cpu_write(cpu_tlbstate.next_asid, 1);
 	this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 22:26       ` Tim Chen
@ 2018-01-30 22:43         ` Borislav Petkov
  2018-01-31  0:25           ` Tim Chen
  0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2018-01-30 22:43 UTC (permalink / raw)
  To: Tim Chen
  Cc: David Woodhouse, arjan, tglx, karahmed, x86, linux-kernel,
	peterz, pbonzini, ak, torvalds, gregkh, mingo, luto, linux

On Tue, Jan 30, 2018 at 02:26:53PM -0800, Tim Chen wrote:
> If the process has multiple threads running on different cpus,

I'm talking about issuing the barrier in set_dumpable(). What threads on
multiple CPUs?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 22:21       ` Thomas Gleixner
@ 2018-01-30 22:55         ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-01-30 22:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tim Chen, Josh Poimboeuf, David Woodhouse, arjan, karahmed, x86,
	linux-kernel, peterz, pbonzini, ak, torvalds, gregkh, mingo,
	luto, linux

On Tue, Jan 30, 2018 at 11:21:58PM +0100, Thomas Gleixner wrote:
> prctl(PR_SET_DUMPABLE, 0)               = 0
> 
> That does the trick.

Ok, found it: https://dev.gnupg.org/T1211

Looks like this is debian-specific for now. Not in gnupg mainlne, AFAICT.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 22:43         ` Borislav Petkov
@ 2018-01-31  0:25           ` Tim Chen
  2018-01-31  0:41             ` Borislav Petkov
  0 siblings, 1 reply; 29+ messages in thread
From: Tim Chen @ 2018-01-31  0:25 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: David Woodhouse, arjan, tglx, karahmed, x86, linux-kernel,
	peterz, pbonzini, ak, torvalds, gregkh, mingo, luto, linux

On 01/30/2018 02:43 PM, Borislav Petkov wrote:
> On Tue, Jan 30, 2018 at 02:26:53PM -0800, Tim Chen wrote:
>> If the process has multiple threads running on different cpus,
> 
> I'm talking about issuing the barrier in set_dumpable(). What threads on
> multiple CPUs?
> 

As dumpable is a property in mm->flags, it affects all threads running on other cpus sharing
the same mm.  If you issue IBPB only on the cpu that perform the set_dumpable(),
the theoretical hole you are trying to close still exist on threads running on
other cpu.

        time ----->
(cpu A)                set_dumpable victim (thread1), issue IBPB 
(cpu B) attacker -> victim (thread2), missed IBPB                 -> attacker -> victim (IBPB issued)                      


That said, I think the risk is minuscule and is not worth the cost to set IBPB on the other cpus.

Tim

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-31  0:25           ` Tim Chen
@ 2018-01-31  0:41             ` Borislav Petkov
  0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2018-01-31  0:41 UTC (permalink / raw)
  To: Tim Chen
  Cc: David Woodhouse, arjan, tglx, karahmed, x86, linux-kernel,
	peterz, pbonzini, ak, torvalds, gregkh, mingo, luto, linux

On Tue, Jan 30, 2018 at 04:25:26PM -0800, Tim Chen wrote:
> As dumpable is a property in mm->flags, it affects all threads running
> on other cpus sharing the same mm.

It is not about sharing the same mm - it is about sharing the RSB. How
many logical CPUs share an RSB? If it is per core (which can have two
threads) then issuing the barrier should have effect on both threads.
Thus one barrier is enough.

IOW, the granularity is determined by how many logical CPUs share the
RSB not by how many logical CPUs share an mm.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 21:23   ` Tim Chen
  2018-01-30 22:00     ` Borislav Petkov
@ 2018-01-31  3:59     ` Josh Poimboeuf
  2018-01-31 23:25       ` Tim Chen
  1 sibling, 1 reply; 29+ messages in thread
From: Josh Poimboeuf @ 2018-01-31  3:59 UTC (permalink / raw)
  To: Tim Chen
  Cc: David Woodhouse, arjan, tglx, karahmed, x86, linux-kernel, bp,
	peterz, pbonzini, ak, torvalds, gregkh, mingo, luto, linux

On Tue, Jan 30, 2018 at 01:23:17PM -0800, Tim Chen wrote:
> On 01/30/2018 09:48 AM, Josh Poimboeuf wrote:
> > On Mon, Jan 29, 2018 at 10:04:47PM +0000, David Woodhouse wrote:
> >> From: Tim Chen <tim.c.chen@linux.intel.com>
> >>
> >> Flush indirect branches when switching into a process that marked itself
> >> non dumpable. This protects high value processes like gpg better,
> >> without having too high performance overhead.
> > 
> > I wonder what the point of this patch is.  An audit of my laptop shows
> > only a single user of PR_SET_DUMPABLE: systemd-coredump.
> 
> This is an opt in approach.  For processes who need extra
> security, it set itself as non-dumpable.  Then it can
> ensure that it doesn't see any poisoned BTB.  

I don't want other users reading my applications' memory.

I don't want other containers reading my containers' memory.

I don't want *any* user tasks reading root daemons' memory.

Those are not unreasonable expectations.

So now I have to go and modify all my containers and applications to set
PR_SET_DUMPABLE?  That seems highly impractical and unlikely.

Plus, I happen to *like* core dumps.

The other option is to rebuild the entire userland with retpolines, but
again, that would make this patch completely pointless.

> > [ And yes, I have gpg-agent running.  Also, a grep of the gnupg source
> > doesn't show any evidence of it being used there.  So the gpg thing
> > seems to be a myth. ]
> 
> I'm less familiar with gpg-agent.  Dave was the one who
> put in comments about gpg-agent in this patch so perhaps
> he can comment.
> 
> > 
> > But also, I much preferred the original version of the patch which only
> > skipped IBPB when 'prev' could ptrace 'next'.
> 
> For the A->kernel thread->B scenario, you will need context of A
> to decide if you need IBPB when switching to B.  You need to
> worry about whether the context of A has been released ... etc if
> you want to use ptrace.

Is that why the ptrace approach was abandoned?  Surely that's a solvable
problem?  We have some smart people on lkml.  And anyway I didn't see it
discussed anywhere.  In the worst case we could just always do IBPB when
switching between kernel and user tasks.

-- 
Josh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 22:39 ` [tip:x86/pti] " tip-bot for Tim Chen
@ 2018-01-31  7:03   ` Dominik Brodowski
  2018-01-31 13:24     ` Josh Poimboeuf
                       ` (2 more replies)
  2018-02-05 14:18   ` David Woodhouse
  1 sibling, 3 replies; 29+ messages in thread
From: Dominik Brodowski @ 2018-01-31  7:03 UTC (permalink / raw)
  To: mingo, hpa, tim.c.chen, dwmw, linux-kernel, tglx; +Cc: jpoimboe

On Tue, Jan 30, 2018 at 02:39:45PM -0800, tip-bot for Tim Chen wrote:
> Commit-ID:  18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7
> Gitweb:     https://git.kernel.org/tip/18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7
> Author:     Tim Chen <tim.c.chen@linux.intel.com>
> AuthorDate: Mon, 29 Jan 2018 22:04:47 +0000
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Tue, 30 Jan 2018 23:09:21 +0100
> 
> x86/speculation: Use Indirect Branch Prediction Barrier in context switch
> 
> Flush indirect branches when switching into a process that marked itself
> non dumpable. This protects high value processes like gpg better,
> without having too high performance overhead.

For the record, I am still opposed to limit this to non-dumpable processes.
Whether a process needs protection by IBPB on context switches is a
different question to whether a process should be allowed to be dumped,
though the former may be a superset of the latter. In my opinion, IBPB
should be enabled on all context switches to userspace processes, until we
have a clear mitigation strategy for userspace against Spectre-v2 designed
and implemented.

Thanks,
	Dominik

--------------------------
From: Dominik Brodowski <linux@dominikbrodowski.net>
Date: Wed, 31 Jan 2018 07:43:12 +0100
Subject: [PATCH] x86/speculation: Do not limit Indirect Branch Prediction Barrier to non-dumpable processes

Whether a process needs protection by IBPB on context switches is a
different question to whether a process should be allowed to be dumped,
though the former may be a superset of the latter. Enable IBPB on all
context switches to a different userspace process, until we have a clear
mitigation strategy for userspace against Spectre-v2 designed and
implemented.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 012d02624848..f54897b68b16 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -255,19 +255,13 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * predictor when switching between processes. This stops
 		 * one process from doing Spectre-v2 attacks on another.
 		 *
-		 * As an optimization, flush indirect branches only when
-		 * switching into processes that disable dumping. This
-		 * protects high value processes like gpg, without having
-		 * too high performance overhead. IBPB is *expensive*!
-		 *
 		 * This will not flush branches when switching into kernel
 		 * threads. It will also not flush if we switch to idle
 		 * thread and back to the same process. It will flush if we
-		 * switch to a different non-dumpable process.
+		 * switch to a different user process.
 		 */
 		if (tsk && tsk->mm &&
-		    tsk->mm->context.ctx_id != last_ctx_id &&
-		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
+		    tsk->mm->context.ctx_id != last_ctx_id)
 			indirect_branch_prediction_barrier();
 
 		if (IS_ENABLED(CONFIG_VMAP_STACK)) {

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-31  7:03   ` Dominik Brodowski
@ 2018-01-31 13:24     ` Josh Poimboeuf
  2018-02-01  8:25     ` Christian Brauner
  2018-02-01  8:31     ` David Woodhouse
  2 siblings, 0 replies; 29+ messages in thread
From: Josh Poimboeuf @ 2018-01-31 13:24 UTC (permalink / raw)
  To: Dominik Brodowski; +Cc: mingo, hpa, tim.c.chen, dwmw, linux-kernel, tglx

On Wed, Jan 31, 2018 at 08:03:00AM +0100, Dominik Brodowski wrote:
> On Tue, Jan 30, 2018 at 02:39:45PM -0800, tip-bot for Tim Chen wrote:
> > Commit-ID:  18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7
> > Gitweb:     https://git.kernel.org/tip/18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7
> > Author:     Tim Chen <tim.c.chen@linux.intel.com>
> > AuthorDate: Mon, 29 Jan 2018 22:04:47 +0000
> > Committer:  Thomas Gleixner <tglx@linutronix.de>
> > CommitDate: Tue, 30 Jan 2018 23:09:21 +0100
> > 
> > x86/speculation: Use Indirect Branch Prediction Barrier in context switch
> > 
> > Flush indirect branches when switching into a process that marked itself
> > non dumpable. This protects high value processes like gpg better,
> > without having too high performance overhead.
> 
> For the record, I am still opposed to limit this to non-dumpable processes.
> Whether a process needs protection by IBPB on context switches is a
> different question to whether a process should be allowed to be dumped,
> though the former may be a superset of the latter. In my opinion, IBPB
> should be enabled on all context switches to userspace processes, until we
> have a clear mitigation strategy for userspace against Spectre-v2 designed
> and implemented.
> 
> Thanks,
> 	Dominik
> 
> --------------------------
> From: Dominik Brodowski <linux@dominikbrodowski.net>
> Date: Wed, 31 Jan 2018 07:43:12 +0100
> Subject: [PATCH] x86/speculation: Do not limit Indirect Branch Prediction Barrier to non-dumpable processes
> 
> Whether a process needs protection by IBPB on context switches is a
> different question to whether a process should be allowed to be dumped,
> though the former may be a superset of the latter. Enable IBPB on all
> context switches to a different userspace process, until we have a clear
> mitigation strategy for userspace against Spectre-v2 designed and
> implemented.
> 
> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-31  3:59     ` Josh Poimboeuf
@ 2018-01-31 23:25       ` Tim Chen
  0 siblings, 0 replies; 29+ messages in thread
From: Tim Chen @ 2018-01-31 23:25 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: David Woodhouse, arjan, tglx, karahmed, x86, linux-kernel, bp,
	peterz, pbonzini, ak, torvalds, gregkh, mingo, luto, linux

On 01/30/2018 07:59 PM, Josh Poimboeuf wrote:
> On Tue, Jan 30, 2018 at 01:23:17PM -0800, Tim Chen wrote:
>> On 01/30/2018 09:48 AM, Josh Poimboeuf wrote:
>>> On Mon, Jan 29, 2018 at 10:04:47PM +0000, David Woodhouse wrote:
>>>> From: Tim Chen <tim.c.chen@linux.intel.com>
>>>>
>>>> Flush indirect branches when switching into a process that marked itself
>>>> non dumpable. This protects high value processes like gpg better,
>>>> without having too high performance overhead.
>>>
>>> I wonder what the point of this patch is.  An audit of my laptop shows
>>> only a single user of PR_SET_DUMPABLE: systemd-coredump.
>>
>> This is an opt in approach.  For processes who need extra
>> security, it set itself as non-dumpable.  Then it can
>> ensure that it doesn't see any poisoned BTB.  
> 
> I don't want other users reading my applications' memory.
> 
> I don't want other containers reading my containers' memory.
> 
> I don't want *any* user tasks reading root daemons' memory.
> 
> Those are not unreasonable expectations.
> 
> So now I have to go and modify all my containers and applications to set
> PR_SET_DUMPABLE?  That seems highly impractical and unlikely.
> 
> Plus, I happen to *like* core dumps.
> 
> The other option is to rebuild the entire userland with retpolines, but
> again, that would make this patch completely pointless.
> 
>>> [ And yes, I have gpg-agent running.  Also, a grep of the gnupg source
>>> doesn't show any evidence of it being used there.  So the gpg thing
>>> seems to be a myth. ]
>>
>> I'm less familiar with gpg-agent.  Dave was the one who
>> put in comments about gpg-agent in this patch so perhaps
>> he can comment.
>>
>>>
>>> But also, I much preferred the original version of the patch which only
>>> skipped IBPB when 'prev' could ptrace 'next'.
>>
>> For the A->kernel thread->B scenario, you will need context of A
>> to decide if you need IBPB when switching to B.  You need to
>> worry about whether the context of A has been released ... etc if
>> you want to use ptrace.
> 
> Is that why the ptrace approach was abandoned?  Surely that's a solvable
> problem?  We have some smart people on lkml.  And anyway I didn't see it
> discussed anywhere.  In the worst case we could just always do IBPB when
> switching between kernel and user tasks.
> 

I think dumpable is not the end all policy.  It is a reasonable starting point
to provide us a means to secure the most sensitive processes without
IBPBing the world.  It is on the performance end of the security/performance trade off.

For people who opt for more security, it is reasonable to consider
alternate policies to distinguish friend and foe so we know if we are coming
from a potentially hostile environment.  Ptrace is one means to do so, and probably
there are other ways depending on usages.  I hope we can have a discussion on what we should
use to determine if two processes are friend or foe.  Say do all the processes
from the same containers are considered friends with each other?
I think once we have this decided, actually putting in IBPB will simple.

Tim

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-31  7:03   ` Dominik Brodowski
  2018-01-31 13:24     ` Josh Poimboeuf
@ 2018-02-01  8:25     ` Christian Brauner
  2018-02-01  8:31     ` David Woodhouse
  2 siblings, 0 replies; 29+ messages in thread
From: Christian Brauner @ 2018-02-01  8:25 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: mingo, hpa, tim.c.chen, dwmw, linux-kernel, tglx, jpoimboe

On Wed, Jan 31, 2018 at 08:03:00AM +0100, Dominik Brodowski wrote:
> On Tue, Jan 30, 2018 at 02:39:45PM -0800, tip-bot for Tim Chen wrote:
> > Commit-ID:  18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7
> > Gitweb:     https://git.kernel.org/tip/18bf3c3ea8ece8f03b6fc58508f2dfd23c7711c7
> > Author:     Tim Chen <tim.c.chen@linux.intel.com>
> > AuthorDate: Mon, 29 Jan 2018 22:04:47 +0000
> > Committer:  Thomas Gleixner <tglx@linutronix.de>
> > CommitDate: Tue, 30 Jan 2018 23:09:21 +0100
> > 
> > x86/speculation: Use Indirect Branch Prediction Barrier in context switch
> > 
> > Flush indirect branches when switching into a process that marked itself
> > non dumpable. This protects high value processes like gpg better,
> > without having too high performance overhead.
> 
> For the record, I am still opposed to limit this to non-dumpable processes.
> Whether a process needs protection by IBPB on context switches is a
> different question to whether a process should be allowed to be dumped,

This is especially true since you can get into a non-dumpable state
implicitly due to setuid() in a new user namespace. So avoiding the
performance hit due to IBPD is not necessary a concious decision taken
by calling prctl(), that is to say even if you care about performance
and don't want to take the IBPB hit you might accidently have to take it
when doing a setuid(). Also there are definitely workloads where
you want to be able to ptrace a task while at the same time having it do
IBPB. So let's not limit this to non-dumpable taks! Thanks Dominik!

> though the former may be a superset of the latter. In my opinion, IBPB
> should be enabled on all context switches to userspace processes, until we
> have a clear mitigation strategy for userspace against Spectre-v2 designed
> and implemented.
> 
> Thanks,
> 	Dominik
> 
> --------------------------
> From: Dominik Brodowski <linux@dominikbrodowski.net>
> Date: Wed, 31 Jan 2018 07:43:12 +0100
> Subject: [PATCH] x86/speculation: Do not limit Indirect Branch Prediction Barrier to non-dumpable processes
> 
> Whether a process needs protection by IBPB on context switches is a
> different question to whether a process should be allowed to be dumped,
> though the former may be a superset of the latter. Enable IBPB on all
> context switches to a different userspace process, until we have a clear
> mitigation strategy for userspace against Spectre-v2 designed and
> implemented.

Thanks Dominik. That makes a lot more sense!

> 
> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 012d02624848..f54897b68b16 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -255,19 +255,13 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  		 * predictor when switching between processes. This stops
>  		 * one process from doing Spectre-v2 attacks on another.
>  		 *
> -		 * As an optimization, flush indirect branches only when
> -		 * switching into processes that disable dumping. This
> -		 * protects high value processes like gpg, without having
> -		 * too high performance overhead. IBPB is *expensive*!
> -		 *
>  		 * This will not flush branches when switching into kernel
>  		 * threads. It will also not flush if we switch to idle
>  		 * thread and back to the same process. It will flush if we
> -		 * switch to a different non-dumpable process.
> +		 * switch to a different user process.
>  		 */
>  		if (tsk && tsk->mm &&
> -		    tsk->mm->context.ctx_id != last_ctx_id &&
> -		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
> +		    tsk->mm->context.ctx_id != last_ctx_id)
>  			indirect_branch_prediction_barrier();
>  
>  		if (IS_ENABLED(CONFIG_VMAP_STACK)) {

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-31  7:03   ` Dominik Brodowski
  2018-01-31 13:24     ` Josh Poimboeuf
  2018-02-01  8:25     ` Christian Brauner
@ 2018-02-01  8:31     ` David Woodhouse
  2018-02-01 15:40       ` Josh Poimboeuf
  2018-02-04 19:39       ` Dominik Brodowski
  2 siblings, 2 replies; 29+ messages in thread
From: David Woodhouse @ 2018-02-01  8:31 UTC (permalink / raw)
  To: Dominik Brodowski, mingo, hpa, tim.c.chen, linux-kernel, tglx
  Cc: jpoimboe, Wieczorkiewicz, Pawel

[-- Attachment #1: Type: text/plain, Size: 1103 bytes --]

On Wed, 2018-01-31 at 08:03 +0100, Dominik Brodowski wrote:
> Whether a process needs protection by IBPB on context switches is a
> different question to whether a process should be allowed to be dumped,
> though the former may be a superset of the latter. Enable IBPB on all
> context switches to a different userspace process, until we have a clear
> mitigation strategy for userspace against Spectre-v2 designed and
> implemented.
>
> ...
>                 if (tsk && tsk->mm &&
> -                   tsk->mm->context.ctx_id != last_ctx_id &&
> -                   get_dumpable(tsk->mm) != SUID_DUMP_USER)
> +                   tsk->mm->context.ctx_id != last_ctx_id)
>                         indirect_branch_prediction_barrier();


I understand your argument and I sympathise.

But that's going to hurt a *lot*, and we don't even have a viable
proof-of-concept for a user←→user Spectre v2 attack, do we? It's only
theoretical?

Show a working PoC and it makes the argument somewhat more
convincing...

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-02-01  8:31     ` David Woodhouse
@ 2018-02-01 15:40       ` Josh Poimboeuf
  2018-02-04 19:39       ` Dominik Brodowski
  1 sibling, 0 replies; 29+ messages in thread
From: Josh Poimboeuf @ 2018-02-01 15:40 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Dominik Brodowski, mingo, hpa, tim.c.chen, linux-kernel, tglx,
	Wieczorkiewicz, Pawel

On Thu, Feb 01, 2018 at 08:31:53AM +0000, David Woodhouse wrote:
> On Wed, 2018-01-31 at 08:03 +0100, Dominik Brodowski wrote:
> > Whether a process needs protection by IBPB on context switches is a
> > different question to whether a process should be allowed to be dumped,
> > though the former may be a superset of the latter. Enable IBPB on all
> > context switches to a different userspace process, until we have a clear
> > mitigation strategy for userspace against Spectre-v2 designed and
> > implemented.
> >
> > ...
> >                 if (tsk && tsk->mm &&
> > -                   tsk->mm->context.ctx_id != last_ctx_id &&
> > -                   get_dumpable(tsk->mm) != SUID_DUMP_USER)
> > +                   tsk->mm->context.ctx_id != last_ctx_id)
> >                         indirect_branch_prediction_barrier();
> 
> 
> I understand your argument and I sympathise.
> 
> But that's going to hurt a *lot*, and we don't even have a viable
> proof-of-concept for a user←→user Spectre v2 attack, do we? It's only
> theoretical?
> 
> Show a working PoC and it makes the argument somewhat more
> convincing...

Fair point.  From what I can gather, user space ASLR seems to be the
only layer of protection before a POC would be feasible.  So, unless I'm
mistaken, which is very possible, it seems to be a question of how much
we trust ASLR.

-- 
Josh

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-02-01  8:31     ` David Woodhouse
  2018-02-01 15:40       ` Josh Poimboeuf
@ 2018-02-04 19:39       ` Dominik Brodowski
  1 sibling, 0 replies; 29+ messages in thread
From: Dominik Brodowski @ 2018-02-04 19:39 UTC (permalink / raw)
  To: David Woodhouse, Tim Chen
  Cc: mingo, hpa, linux-kernel, tglx, jpoimboe, Wieczorkiewicz, Pawel,
	David Woodhouse, arjan, karahmed, x86, bp, peterz, pbonzini, ak,
	torvalds, gregkh, luto

On Thu, Feb 01, 2018 at 08:31:53AM +0000, David Woodhouse wrote:
> On Wed, 2018-01-31 at 08:03 +0100, Dominik Brodowski wrote:
> > Whether a process needs protection by IBPB on context switches is a
> > different question to whether a process should be allowed to be dumped,
> > though the former may be a superset of the latter. Enable IBPB on all
> > context switches to a different userspace process, until we have a clear
> > mitigation strategy for userspace against Spectre-v2 designed and
> > implemented.
> >
> > ...
> >                 if (tsk && tsk->mm &&
> > -                   tsk->mm->context.ctx_id != last_ctx_id &&
> > -                   get_dumpable(tsk->mm) != SUID_DUMP_USER)
> > +                   tsk->mm->context.ctx_id != last_ctx_id)
> >                         indirect_branch_prediction_barrier();
> 
> 
> I understand your argument and I sympathise.
> 
> But that's going to hurt a *lot*, and we don't even have a viable
> proof-of-concept for a user←→user Spectre v2 attack, do we? It's only
> theoretical?

Wasn't the PoC in the Spectre paper user←→user (though on a different OS)?
And what makes KVM←→KVM so much more likely/dangerous/..., that IBPB will
be done there unconditionally (AFAICS)?


And, somewhat related, @Tim Chen:

On Wed, Jan 31, 2018 at 03:25:44PM -0800, Tim Chen wrote:
> For people who opt for more security, it is reasonable to consider
> alternate policies to distinguish friend and foe so we know if we are coming
> from a potentially hostile environment.  Ptrace is one means to do so, and probably
> there are other ways depending on usages.  I hope we can have a discussion on what we should
> use to determine if two processes are friend or foe.  Say do all the processes
> from the same containers are considered friends with each other?

To my understanding, the concept of "containers" is meant to be kept outside
of the kernel. What *namespaces* / *control groups* can be considered
friends with each other?


Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-30 22:39 ` [tip:x86/pti] " tip-bot for Tim Chen
  2018-01-31  7:03   ` Dominik Brodowski
@ 2018-02-05 14:18   ` David Woodhouse
  2018-02-05 19:35       ` Tim Chen
  1 sibling, 1 reply; 29+ messages in thread
From: David Woodhouse @ 2018-02-05 14:18 UTC (permalink / raw)
  To: mingo, hpa, tim.c.chen, linux-kernel, tglx, luto,
	Greg Kroah-Hartman, linux-mm

[-- Attachment #1: Type: text/plain, Size: 264 bytes --]

On Tue, 2018-01-30 at 14:39 -0800, tip-bot for Tim Chen wrote:
> Thanks to the reviewers and Andy Lutomirski for the suggestion of
> using ctx_id which got rid of the problem of mm pointer recycling.

That one doesn't backport well to 4.9. Suggestions welcome.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-02-05 14:18   ` David Woodhouse
@ 2018-02-05 19:35       ` Tim Chen
  0 siblings, 0 replies; 29+ messages in thread
From: Tim Chen @ 2018-02-05 19:35 UTC (permalink / raw)
  To: David Woodhouse, mingo, hpa, linux-kernel, tglx, luto,
	Greg Kroah-Hartman, linux-mm

On 02/05/2018 06:18 AM, David Woodhouse wrote:
> On Tue, 2018-01-30 at 14:39 -0800, tip-bot for Tim Chen wrote:
>> Thanks to the reviewers and Andy Lutomirski for the suggestion of
>> using ctx_id which got rid of the problem of mm pointer recycling.
> 
> That one doesn't backport well to 4.9. Suggestions welcome.
> 

Will something like the following work for 4.9 using active_mm?
This patch is not really tested, but just
want to put it out here to see if this is a reasonable backport.

Tim

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a7655f6..4994db2 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -9,6 +9,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
+#include <asm/nospec-branch.h>
 #include <asm/cache.h>
 #include <asm/apic.h>
 #include <asm/uv/uv.h>
@@ -75,6 +76,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
 	unsigned cpu = smp_processor_id();
+#ifdef CONFIG_SMP
+	struct mm_struct *active_mm = this_cpu_read(cpu_tlbstate.active_mm);
+#endif
 
 	if (likely(prev != next)) {
 		if (IS_ENABLED(CONFIG_VMAP_STACK)) {
@@ -91,6 +95,28 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 				set_pgd(pgd, init_mm.pgd[stack_pgd_index]);
 		}
 
+		/*
+		 * Avoid user/user BTB poisoning by flushing the branch
+		 * predictor when switching between processes. This stops
+		 * one process from doing Spectre-v2 attacks on another.
+		 *
+		 * As an optimization, flush indirect branches only when
+		 * switching into processes that disable dumping. This
+		 * protects high value processes like gpg, without having
+		 * too high performance overhead. IBPB is *expensive*!
+		 *
+		 * This will not flush branches when switching into kernel
+		 * threads. It will also not flush if we switch to idle
+		 * thread and back to the same process. It will flush if we
+		 * switch to a different non-dumpable process.
+		 */
+		if (tsk && tsk->mm &&
+#ifdef CONFIG_SMP
+		    next != active_mm &&
+#endif
+		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
+			indirect_branch_prediction_barrier();
+
 #ifdef CONFIG_SMP
 		this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
 		this_cpu_write(cpu_tlbstate.active_mm, next);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [tip:x86/pti] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
@ 2018-02-05 19:35       ` Tim Chen
  0 siblings, 0 replies; 29+ messages in thread
From: Tim Chen @ 2018-02-05 19:35 UTC (permalink / raw)
  To: David Woodhouse, mingo, hpa, linux-kernel, tglx, luto,
	Greg Kroah-Hartman, linux-mm

On 02/05/2018 06:18 AM, David Woodhouse wrote:
> On Tue, 2018-01-30 at 14:39 -0800, tip-bot for Tim Chen wrote:
>> Thanks to the reviewers and Andy Lutomirski for the suggestion of
>> using ctx_id which got rid of the problem of mm pointer recycling.
> 
> That one doesn't backport well to 4.9. Suggestions welcome.
> 

Will something like the following work for 4.9 using active_mm?
This patch is not really tested, but just
want to put it out here to see if this is a reasonable backport.

Tim

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a7655f6..4994db2 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -9,6 +9,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
+#include <asm/nospec-branch.h>
 #include <asm/cache.h>
 #include <asm/apic.h>
 #include <asm/uv/uv.h>
@@ -75,6 +76,9 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
 	unsigned cpu = smp_processor_id();
+#ifdef CONFIG_SMP
+	struct mm_struct *active_mm = this_cpu_read(cpu_tlbstate.active_mm);
+#endif
 
 	if (likely(prev != next)) {
 		if (IS_ENABLED(CONFIG_VMAP_STACK)) {
@@ -91,6 +95,28 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 				set_pgd(pgd, init_mm.pgd[stack_pgd_index]);
 		}
 
+		/*
+		 * Avoid user/user BTB poisoning by flushing the branch
+		 * predictor when switching between processes. This stops
+		 * one process from doing Spectre-v2 attacks on another.
+		 *
+		 * As an optimization, flush indirect branches only when
+		 * switching into processes that disable dumping. This
+		 * protects high value processes like gpg, without having
+		 * too high performance overhead. IBPB is *expensive*!
+		 *
+		 * This will not flush branches when switching into kernel
+		 * threads. It will also not flush if we switch to idle
+		 * thread and back to the same process. It will flush if we
+		 * switch to a different non-dumpable process.
+		 */
+		if (tsk && tsk->mm &&
+#ifdef CONFIG_SMP
+		    next != active_mm &&
+#endif
+		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
+			indirect_branch_prediction_barrier();
+
 #ifdef CONFIG_SMP
 		this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
 		this_cpu_write(cpu_tlbstate.active_mm, next);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-29 12:44   ` David Woodhouse
@ 2018-01-29 13:56     ` Dominik Brodowski
  0 siblings, 0 replies; 29+ messages in thread
From: Dominik Brodowski @ 2018-01-29 13:56 UTC (permalink / raw)
  To: David Woodhouse
  Cc: arjan, tglx, karahmed, x86, linux-kernel, tim.c.chen, bp, peterz

On Mon, Jan 29, 2018 at 12:44:59PM +0000, David Woodhouse wrote:
> On Mon, 2018-01-29 at 13:28 +0100, Dominik Brodowski wrote:
> 
> > The commit message is much more about the A->idle-> improvement than
> > on the basic design decisions to limit this to non-dumpable processes.
> 
> Yeah, I collapsed the commit messages from the three commits into one.
> But the resulting commit message does start with the basic non-dumpable 
> concept, and explain the trade-off (sensitivity vs. overhead) using the
> comment from what was the second path in the series.
> 
> >  And
> > that still seems to be under discussion (see, for example, Jon Masters
> > message of today, https://lkml.org/lkml/2018/1/29/34 ). So this design
> > choice should, at least, be more explicit (if not tunable...).
> 
> That isn't stunningly relevant here. Yes, we might build userspace with
> retpoline support and there'll be an additional case here so it becomes
> !dumpable && !retpoline-userspace. But that's all way off in the
> future.
> 
> > > @@ -219,6 +220,25 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> > >  	} else {
> > >  		u16 new_asid;
> > >  		bool need_flush;
> > > +		u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
> > > +
> > > +		/*
> > > +		 * Avoid user/user BTB poisoning by flushing the branch
> > > +		 * predictor when switching between processes. This stops
> > > +		 * one process from doing Spectre-v2 attacks on another.
> > > +		 *
> > > +                 * As an optimization, flush indirect branches only when
> > > +                 * switching into processes that disable dumping.
> > > +                 *
> > > +                 * This will not flush branches when switching into kernel
> > > +		 * threads. It will also not flush if we switch to idle
> > Whitespace damage. And maybe add ", as the kernel depends on retpoline
> > protection instead" after "threads" here -- I think that was the reason why
> > you think kernel threads are safe; or did I misunderstand you?
> 
> I'll fix up the whitespace; thanks. For the other comment... no, if
> kernel threads needed *any* kind of protection from this IBPB then we'd
> be hosed by the time we got here anyway. Let's not imply that an IBPB
> here would be at all useful for the kernel under any circumstances.
> 
> 
> > > 
> > > +		 * thread and back to the same process. It will flush if we
> > > +		 * switch to a different non-dumpable process.
> > "process, as that gives additional protection to high value processes like
> > gpg. Other processes are left unprotected here to reduce the overhead of the
> > barrier [... maybe add some rationale here ...]"
> 
> The rationale is to reduce the overhead of the barrier. I've added that
> to the comment, based on what's in the commit message already. Thanks.
> 
> http://git.infradead.org/users/dwmw2/linux-retpoline.git/commitdiff/8431c5a87520cd14f8769745589443e603682b23

That reads much better now, thanks. All other issues (e.g. adding an
IPBP also for dumpable tasks) can easily be discussed on top of that commit.

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-29 12:28 ` Dominik Brodowski
@ 2018-01-29 12:44   ` David Woodhouse
  2018-01-29 13:56     ` Dominik Brodowski
  0 siblings, 1 reply; 29+ messages in thread
From: David Woodhouse @ 2018-01-29 12:44 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: arjan, tglx, karahmed, x86, linux-kernel, tim.c.chen, bp, peterz

[-- Attachment #1: Type: text/plain, Size: 2889 bytes --]

On Mon, 2018-01-29 at 13:28 +0100, Dominik Brodowski wrote:

> The commit message is much more about the A->idle-> improvement than
> on the basic design decisions to limit this to non-dumpable processes.

Yeah, I collapsed the commit messages from the three commits into one.
But the resulting commit message does start with the basic non-dumpable 
concept, and explain the trade-off (sensitivity vs. overhead) using the
comment from what was the second path in the series.

>  And
> that still seems to be under discussion (see, for example, Jon Masters
> message of today, https://lkml.org/lkml/2018/1/29/34 ). So this design
> choice should, at least, be more explicit (if not tunable...).

That isn't stunningly relevant here. Yes, we might build userspace with
retpoline support and there'll be an additional case here so it becomes
!dumpable && !retpoline-userspace. But that's all way off in the
future.

> > @@ -219,6 +220,25 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> >  	} else {
> >  		u16 new_asid;
> >  		bool need_flush;
> > +		u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
> > +
> > +		/*
> > +		 * Avoid user/user BTB poisoning by flushing the branch
> > +		 * predictor when switching between processes. This stops
> > +		 * one process from doing Spectre-v2 attacks on another.
> > +		 *
> > +                 * As an optimization, flush indirect branches only when
> > +                 * switching into processes that disable dumping.
> > +                 *
> > +                 * This will not flush branches when switching into kernel
> > +		 * threads. It will also not flush if we switch to idle
> Whitespace damage. And maybe add ", as the kernel depends on retpoline
> protection instead" after "threads" here -- I think that was the reason why
> you think kernel threads are safe; or did I misunderstand you?

I'll fix up the whitespace; thanks. For the other comment... no, if
kernel threads needed *any* kind of protection from this IBPB then we'd
be hosed by the time we got here anyway. Let's not imply that an IBPB
here would be at all useful for the kernel under any circumstances.


> > 
> > +		 * thread and back to the same process. It will flush if we
> > +		 * switch to a different non-dumpable process.
> "process, as that gives additional protection to high value processes like
> gpg. Other processes are left unprotected here to reduce the overhead of the
> barrier [... maybe add some rationale here ...]"

The rationale is to reduce the overhead of the barrier. I've added that
to the comment, based on what's in the commit message already. Thanks.

http://git.infradead.org/users/dwmw2/linux-retpoline.git/commitdiff/8431c5a87520cd14f8769745589443e603682b23

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
  2018-01-29 11:33 [PATCH] " David Woodhouse
@ 2018-01-29 12:28 ` Dominik Brodowski
  2018-01-29 12:44   ` David Woodhouse
  0 siblings, 1 reply; 29+ messages in thread
From: Dominik Brodowski @ 2018-01-29 12:28 UTC (permalink / raw)
  To: David Woodhouse
  Cc: arjan, tglx, karahmed, x86, linux-kernel, tim.c.chen, bp, peterz,
	pbonzini, ak, torvalds, gregkh, mingo, luto, jcm

On Mon, Jan 29, 2018 at 11:33:28AM +0000, David Woodhouse wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> Flush indirect branches when switching into a process that marked itself
> non dumpable. This protects high value processes like gpg better,
> without having too high performance overhead.
> 
> If done naïvely, we could switch to a kernel idle thread and then back
> to the original process, such as:
> 
>     process A -> idle -> process A
> 
> In such scenario, we do not have to do IBPB here even though the process
> is non-dumpable, as we are switching back to the same process after a
> hiatus.
> 
> To avoid the redundant IBPB, which is expensive, we track the last mm
> user context ID. The cost is to have an extra u64 mm context id to track
> the last mm we were using before switching to the init_mm used by idle.
> Avoiding the extra IBPB is probably worth the extra memory for this
> common scenario.
> 
> For those cases where tlb_defer_switch_to_init_mm() returns true (non
> PCID), lazy tlb will defer switch to init_mm, so we will not be changing
> the mm for the process A -> idle -> process A switch. So IBPB will be
> skipped for this case.
> 
> Thanks to the reviewers and Andy Lutomirski for the suggestion of
> using ctx_id which got rid of the problem of mm pointer recycling.
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> How close are we to done with bikeshedding this one?... 

The commit message is much more about the A->idle-> improvement than
on the basic design decisions to limit this to non-dumpable processes. And
that still seems to be under discussion (see, for example, Jon Masters
message of today, https://lkml.org/lkml/2018/1/29/34 ). So this design
choice should, at least, be more explicit (if not tunable...).

> @@ -219,6 +220,25 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  	} else {
>  		u16 new_asid;
>  		bool need_flush;
> +		u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
> +
> +		/*
> +		 * Avoid user/user BTB poisoning by flushing the branch
> +		 * predictor when switching between processes. This stops
> +		 * one process from doing Spectre-v2 attacks on another.
> +		 *
> +                 * As an optimization, flush indirect branches only when
> +                 * switching into processes that disable dumping.
> +                 *
> +                 * This will not flush branches when switching into kernel
> +		 * threads. It will also not flush if we switch to idle

Whitespace damage. And maybe add ", as the kernel depends on retpoline
protection instead" after "threads" here -- I think that was the reason why
you think kernel threads are safe; or did I misunderstand you?

> +		 * thread and back to the same process. It will flush if we
> +		 * switch to a different non-dumpable process.

"process, as that gives additional protection to high value processes like
gpg. Other processes are left unprotected here to reduce the overhead of the
barrier [... maybe add some rationale here ...]"

Thanks,
	Dominik

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch
@ 2018-01-29 11:33 David Woodhouse
  2018-01-29 12:28 ` Dominik Brodowski
  0 siblings, 1 reply; 29+ messages in thread
From: David Woodhouse @ 2018-01-29 11:33 UTC (permalink / raw)
  To: arjan, tglx, karahmed, x86, linux-kernel, tim.c.chen, bp, peterz,
	pbonzini, ak, torvalds, gregkh, mingo, luto

From: Tim Chen <tim.c.chen@linux.intel.com>

Flush indirect branches when switching into a process that marked itself
non dumpable. This protects high value processes like gpg better,
without having too high performance overhead.

If done naïvely, we could switch to a kernel idle thread and then back
to the original process, such as:

    process A -> idle -> process A

In such scenario, we do not have to do IBPB here even though the process
is non-dumpable, as we are switching back to the same process after a
hiatus.

To avoid the redundant IBPB, which is expensive, we track the last mm
user context ID. The cost is to have an extra u64 mm context id to track
the last mm we were using before switching to the init_mm used by idle.
Avoiding the extra IBPB is probably worth the extra memory for this
common scenario.

For those cases where tlb_defer_switch_to_init_mm() returns true (non
PCID), lazy tlb will defer switch to init_mm, so we will not be changing
the mm for the process A -> idle -> process A switch. So IBPB will be
skipped for this case.

Thanks to the reviewers and Andy Lutomirski for the suggestion of
using ctx_id which got rid of the problem of mm pointer recycling.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
How close are we to done with bikeshedding this one?... 

 arch/x86/include/asm/tlbflush.h |  2 ++
 arch/x86/mm/tlb.c               | 31 ++++++++++++++++++++++++++++++-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 3effd3c..4405c4b 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -174,6 +174,8 @@ struct tlb_state {
 	struct mm_struct *loaded_mm;
 	u16 loaded_mm_asid;
 	u16 next_asid;
+	/* last user mm's ctx id */
+	u64 last_ctx_id;
 
 	/*
 	 * We can be in one of several states:
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a156195..870fb99 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -6,13 +6,14 @@
 #include <linux/interrupt.h>
 #include <linux/export.h>
 #include <linux/cpu.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
+#include <asm/nospec-branch.h>
 #include <asm/cache.h>
 #include <asm/apic.h>
 #include <asm/uv/uv.h>
-#include <linux/debugfs.h>
 
 /*
  *	TLB flushing, formerly SMP-only
@@ -219,6 +220,25 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 	} else {
 		u16 new_asid;
 		bool need_flush;
+		u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
+
+		/*
+		 * Avoid user/user BTB poisoning by flushing the branch
+		 * predictor when switching between processes. This stops
+		 * one process from doing Spectre-v2 attacks on another.
+		 *
+                 * As an optimization, flush indirect branches only when
+                 * switching into processes that disable dumping.
+                 *
+                 * This will not flush branches when switching into kernel
+		 * threads. It will also not flush if we switch to idle
+		 * thread and back to the same process. It will flush if we
+		 * switch to a different non-dumpable process.
+		 */
+		if (tsk && tsk->mm &&
+		    tsk->mm->context.ctx_id != last_ctx_id &&
+		    get_dumpable(tsk->mm) != SUID_DUMP_USER)
+			indirect_branch_prediction_barrier();
 
 		if (IS_ENABLED(CONFIG_VMAP_STACK)) {
 			/*
@@ -268,6 +288,14 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0);
 		}
 
+		/*
+		 * Record last user mm's context id, so we can avoid
+		 * flushing branch buffer with IBPB if we switch back
+		 * to the same user.
+		 */
+		if (next != &init_mm)
+			this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id);
+
 		this_cpu_write(cpu_tlbstate.loaded_mm, next);
 		this_cpu_write(cpu_tlbstate.loaded_mm_asid, new_asid);
 	}
@@ -345,6 +373,7 @@ void initialize_tlbstate_and_flush(void)
 	write_cr3(build_cr3(mm->pgd, 0));
 
 	/* Reinitialize tlbstate. */
+	this_cpu_write(cpu_tlbstate.last_ctx_id, mm->context.ctx_id);
 	this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
 	this_cpu_write(cpu_tlbstate.next_asid, 1);
 	this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2018-02-05 19:35 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-29 22:04 [PATCH] x86/speculation: Use Indirect Branch Prediction Barrier in context switch David Woodhouse
2018-01-30 17:48 ` Josh Poimboeuf
2018-01-30 21:23   ` Tim Chen
2018-01-30 22:00     ` Borislav Petkov
2018-01-30 22:21       ` Thomas Gleixner
2018-01-30 22:55         ` Borislav Petkov
2018-01-31  3:59     ` Josh Poimboeuf
2018-01-31 23:25       ` Tim Chen
2018-01-30 20:38 ` Borislav Petkov
2018-01-30 21:03   ` Tim Chen
2018-01-30 21:57     ` Borislav Petkov
2018-01-30 22:26       ` Tim Chen
2018-01-30 22:43         ` Borislav Petkov
2018-01-31  0:25           ` Tim Chen
2018-01-31  0:41             ` Borislav Petkov
2018-01-30 22:39 ` [tip:x86/pti] " tip-bot for Tim Chen
2018-01-31  7:03   ` Dominik Brodowski
2018-01-31 13:24     ` Josh Poimboeuf
2018-02-01  8:25     ` Christian Brauner
2018-02-01  8:31     ` David Woodhouse
2018-02-01 15:40       ` Josh Poimboeuf
2018-02-04 19:39       ` Dominik Brodowski
2018-02-05 14:18   ` David Woodhouse
2018-02-05 19:35     ` Tim Chen
2018-02-05 19:35       ` Tim Chen
  -- strict thread matches above, loose matches on Subject: below --
2018-01-29 11:33 [PATCH] " David Woodhouse
2018-01-29 12:28 ` Dominik Brodowski
2018-01-29 12:44   ` David Woodhouse
2018-01-29 13:56     ` Dominik Brodowski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.