* [PATCH v4 1/5] x86/smp: Add a per-cpu view of SMT state
2021-01-08 12:10 [PATCH v4 0/5] Next revision of the L1D flush patches Balbir Singh
@ 2021-01-08 12:10 ` Balbir Singh
2021-07-28 9:58 ` [tip: x86/cpu] " tip-bot2 for Balbir Singh
2021-01-08 12:10 ` [PATCH v4 2/5] x86/mm: Refactor cond_ibpb() to support other use cases Balbir Singh
` (7 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Balbir Singh @ 2021-01-08 12:10 UTC (permalink / raw)
To: tglx, mingo
Cc: peterz, linux-kernel, keescook, jpoimboe, tony.luck, benh, x86,
dave.hansen, thomas.lendacky, torvalds, Balbir Singh
A new field smt_active in cpuinfo_x86 identifies if the current core/cpu
is in SMT mode or not. This can be very helpful if the system has some
of its cores with threads offlined and can be used for cases where
action is taken based on the state of SMT. The follow up patches use
this feature.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
---
arch/x86/include/asm/processor.h | 2 ++
arch/x86/kernel/smpboot.c | 10 +++++++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c20a52b5534b..a411466a6e74 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -136,6 +136,8 @@ struct cpuinfo_x86 {
u16 logical_die_id;
/* Index into per_cpu list: */
u16 cpu_index;
+ /* Is SMT active on this core? */
+ bool smt_active;
u32 microcode;
/* Address space bits used by the cache internally */
u8 x86_cache_bits;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 8ca66af96a54..5f6df298d785 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -640,6 +640,9 @@ void set_cpu_sibling_map(int cpu)
threads = cpumask_weight(topology_sibling_cpumask(cpu));
if (threads > __max_smt_threads)
__max_smt_threads = threads;
+
+ for_each_cpu(i, topology_sibling_cpumask(cpu))
+ cpu_data(i).smt_active = threads > 1;
}
/* maps the cpu to the sched domain representing multi-core */
@@ -1551,8 +1554,13 @@ static void remove_siblinginfo(int cpu)
for_each_cpu(sibling, topology_die_cpumask(cpu))
cpumask_clear_cpu(cpu, topology_die_cpumask(sibling));
- for_each_cpu(sibling, topology_sibling_cpumask(cpu))
+
+ for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
+ if (cpumask_weight(topology_sibling_cpumask(sibling)) == 1)
+ cpu_data(sibling).smt_active = false;
+ }
+
for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
cpumask_clear(cpu_llc_shared_mask(cpu));
--
2.17.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: x86/cpu] x86/smp: Add a per-cpu view of SMT state
2021-01-08 12:10 ` [PATCH v4 1/5] x86/smp: Add a per-cpu view of SMT state Balbir Singh
@ 2021-07-28 9:58 ` tip-bot2 for Balbir Singh
0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Balbir Singh @ 2021-07-28 9:58 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Balbir Singh, x86, linux-kernel
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: c52787b590634646d4da3d8f23c4532ba050d40d
Gitweb: https://git.kernel.org/tip/c52787b590634646d4da3d8f23c4532ba050d40d
Author: Balbir Singh <sblbir@amazon.com>
AuthorDate: Fri, 08 Jan 2021 23:10:52 +11:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Jul 2021 11:42:23 +02:00
x86/smp: Add a per-cpu view of SMT state
A new field smt_active in cpuinfo_x86 identifies if the current core/cpu
is in SMT mode or not.
This is helpful when the system has some of its cores with threads offlined
and can be used for cases where action is taken based on the state of SMT.
The upcoming support for paranoid L1D flush will make use of this information.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-2-sblbir@amazon.com
---
arch/x86/include/asm/processor.h | 2 ++
arch/x86/kernel/smpboot.c | 10 +++++++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f3020c5..1e0d13c 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -136,6 +136,8 @@ struct cpuinfo_x86 {
u16 logical_die_id;
/* Index into per_cpu list: */
u16 cpu_index;
+ /* Is SMT active on this core? */
+ bool smt_active;
u32 microcode;
/* Address space bits used by the cache internally */
u8 x86_cache_bits;
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9320285..85f6e24 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -610,6 +610,9 @@ void set_cpu_sibling_map(int cpu)
if (threads > __max_smt_threads)
__max_smt_threads = threads;
+ for_each_cpu(i, topology_sibling_cpumask(cpu))
+ cpu_data(i).smt_active = threads > 1;
+
/*
* This needs a separate iteration over the cpus because we rely on all
* topology_sibling_cpumask links to be set-up.
@@ -1552,8 +1555,13 @@ static void remove_siblinginfo(int cpu)
for_each_cpu(sibling, topology_die_cpumask(cpu))
cpumask_clear_cpu(cpu, topology_die_cpumask(sibling));
- for_each_cpu(sibling, topology_sibling_cpumask(cpu))
+
+ for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
cpumask_clear_cpu(cpu, topology_sibling_cpumask(sibling));
+ if (cpumask_weight(topology_sibling_cpumask(sibling)) == 1)
+ cpu_data(sibling).smt_active = false;
+ }
+
for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
cpumask_clear(cpu_llc_shared_mask(cpu));
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v4 2/5] x86/mm: Refactor cond_ibpb() to support other use cases
2021-01-08 12:10 [PATCH v4 0/5] Next revision of the L1D flush patches Balbir Singh
2021-01-08 12:10 ` [PATCH v4 1/5] x86/smp: Add a per-cpu view of SMT state Balbir Singh
@ 2021-01-08 12:10 ` Balbir Singh
2021-07-28 9:58 ` [tip: x86/cpu] " tip-bot2 for Balbir Singh
2021-01-08 12:10 ` [PATCH v4 3/5] x86/mm: Optionally flush L1D on context switch Balbir Singh
` (6 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Balbir Singh @ 2021-01-08 12:10 UTC (permalink / raw)
To: tglx, mingo
Cc: peterz, linux-kernel, keescook, jpoimboe, tony.luck, benh, x86,
dave.hansen, thomas.lendacky, torvalds, Balbir Singh
cond_ibpb() has the necessary bits required to track the previous mm in
switch_mm_irqs_off(). This can be reused for other use cases like L1D
flushing on context switch.
[ tglx: Moved comment, added a separate define for state (re)initialization ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200510014803.12190-4-sblbir@amazon.com
Link: https://lore.kernel.org/r/20200729001103.6450-3-sblbir@amazon.com
---
arch/x86/include/asm/tlbflush.h | 2 +-
arch/x86/mm/tlb.c | 53 ++++++++++++++++++---------------
2 files changed, 30 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 8c87a2e0b660..a927d40664df 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -83,7 +83,7 @@ struct tlb_state {
/* Last user mm for optimizing IBPB */
union {
struct mm_struct *last_user_mm;
- unsigned long last_user_mm_ibpb;
+ unsigned long last_user_mm_spec;
};
u16 loaded_mm_asid;
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 569ac1d57f55..7320348b5a61 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -42,10 +42,14 @@
*/
/*
- * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is
- * stored in cpu_tlb_state.last_user_mm_ibpb.
+ * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
+ * stored in cpu_tlb_state.last_user_mm_spec.
*/
#define LAST_USER_MM_IBPB 0x1UL
+#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB)
+
+/* Bits to set when tlbstate and flush is (re)initialized */
+#define LAST_USER_MM_INIT LAST_USER_MM_IBPB
/*
* The x86 feature is called PCID (Process Context IDentifier). It is similar
@@ -316,20 +320,29 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
local_irq_restore(flags);
}
-static inline unsigned long mm_mangle_tif_spec_ib(struct task_struct *next)
+static inline unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
{
unsigned long next_tif = task_thread_info(next)->flags;
- unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB;
+ unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK;
- return (unsigned long)next->mm | ibpb;
+ return (unsigned long)next->mm | spec_bits;
}
-static void cond_ibpb(struct task_struct *next)
+static void cond_mitigation(struct task_struct *next)
{
+ unsigned long prev_mm, next_mm;
+
if (!next || !next->mm)
return;
+ next_mm = mm_mangle_tif_spec_bits(next);
+ prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
+
/*
+ * Avoid user/user BTB poisoning by flushing the branch predictor
+ * when switching between processes. This stops one process from
+ * doing Spectre-v2 attacks on another.
+ *
* Both, the conditional and the always IBPB mode use the mm
* pointer to avoid the IBPB when switching between tasks of the
* same process. Using the mm pointer instead of mm->context.ctx_id
@@ -339,8 +352,6 @@ static void cond_ibpb(struct task_struct *next)
* exposed data is not really interesting.
*/
if (static_branch_likely(&switch_mm_cond_ibpb)) {
- unsigned long prev_mm, next_mm;
-
/*
* This is a bit more complex than the always mode because
* it has to handle two cases:
@@ -370,20 +381,14 @@ static void cond_ibpb(struct task_struct *next)
* Optimize this with reasonably small overhead for the
* above cases. Mangle the TIF_SPEC_IB bit into the mm
* pointer of the incoming task which is stored in
- * cpu_tlbstate.last_user_mm_ibpb for comparison.
- */
- next_mm = mm_mangle_tif_spec_ib(next);
- prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_ibpb);
-
- /*
+ * cpu_tlbstate.last_user_mm_spec for comparison.
+ *
* Issue IBPB only if the mm's are different and one or
* both have the IBPB bit set.
*/
if (next_mm != prev_mm &&
(next_mm | prev_mm) & LAST_USER_MM_IBPB)
indirect_branch_prediction_barrier();
-
- this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, next_mm);
}
if (static_branch_unlikely(&switch_mm_always_ibpb)) {
@@ -392,11 +397,12 @@ static void cond_ibpb(struct task_struct *next)
* different context than the user space task which ran
* last on this CPU.
*/
- if (this_cpu_read(cpu_tlbstate.last_user_mm) != next->mm) {
+ if ((prev_mm & ~LAST_USER_MM_SPEC_MASK) !=
+ (unsigned long)next->mm)
indirect_branch_prediction_barrier();
- this_cpu_write(cpu_tlbstate.last_user_mm, next->mm);
- }
}
+
+ this_cpu_write(cpu_tlbstate.last_user_mm_spec, next_mm);
}
#ifdef CONFIG_PERF_EVENTS
@@ -524,11 +530,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
need_flush = true;
} else {
/*
- * Avoid user/user BTB poisoning by flushing the branch
- * predictor when switching between processes. This stops
- * one process from doing Spectre-v2 attacks on another.
+ * Apply process to process speculation vulnerability
+ * mitigations if applicable.
*/
- cond_ibpb(tsk);
+ cond_mitigation(tsk);
/*
* Stop remote flushes for the previous mm.
@@ -636,7 +641,7 @@ void initialize_tlbstate_and_flush(void)
write_cr3(build_cr3(mm->pgd, 0));
/* Reinitialize tlbstate. */
- this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB);
+ this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_INIT);
this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
this_cpu_write(cpu_tlbstate.next_asid, 1);
this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
--
2.17.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: x86/cpu] x86/mm: Refactor cond_ibpb() to support other use cases
2021-01-08 12:10 ` [PATCH v4 2/5] x86/mm: Refactor cond_ibpb() to support other use cases Balbir Singh
@ 2021-07-28 9:58 ` tip-bot2 for Balbir Singh
0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Balbir Singh @ 2021-07-28 9:58 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Balbir Singh, x86, linux-kernel
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 371b09c6fdc436f2c7bb67fc90df5eec8ce90f06
Gitweb: https://git.kernel.org/tip/371b09c6fdc436f2c7bb67fc90df5eec8ce90f06
Author: Balbir Singh <sblbir@amazon.com>
AuthorDate: Fri, 08 Jan 2021 23:10:53 +11:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Jul 2021 11:42:24 +02:00
x86/mm: Refactor cond_ibpb() to support other use cases
cond_ibpb() has the necessary bits required to track the previous mm in
switch_mm_irqs_off(). This can be reused for other use cases like L1D
flushing on context switch.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-3-sblbir@amazon.com
---
arch/x86/include/asm/tlbflush.h | 2 +-
arch/x86/mm/tlb.c | 53 +++++++++++++++++---------------
2 files changed, 30 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index fa952ea..b587a9e 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -83,7 +83,7 @@ struct tlb_state {
/* Last user mm for optimizing IBPB */
union {
struct mm_struct *last_user_mm;
- unsigned long last_user_mm_ibpb;
+ unsigned long last_user_mm_spec;
};
u16 loaded_mm_asid;
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index cfe6b1e..c98bc84 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -43,10 +43,14 @@
*/
/*
- * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is
- * stored in cpu_tlb_state.last_user_mm_ibpb.
+ * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
+ * stored in cpu_tlb_state.last_user_mm_spec.
*/
#define LAST_USER_MM_IBPB 0x1UL
+#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB)
+
+/* Bits to set when tlbstate and flush is (re)initialized */
+#define LAST_USER_MM_INIT LAST_USER_MM_IBPB
/*
* The x86 feature is called PCID (Process Context IDentifier). It is similar
@@ -317,20 +321,29 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
local_irq_restore(flags);
}
-static unsigned long mm_mangle_tif_spec_ib(struct task_struct *next)
+static unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
{
unsigned long next_tif = task_thread_info(next)->flags;
- unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB;
+ unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK;
- return (unsigned long)next->mm | ibpb;
+ return (unsigned long)next->mm | spec_bits;
}
-static void cond_ibpb(struct task_struct *next)
+static void cond_mitigation(struct task_struct *next)
{
+ unsigned long prev_mm, next_mm;
+
if (!next || !next->mm)
return;
+ next_mm = mm_mangle_tif_spec_bits(next);
+ prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
+
/*
+ * Avoid user/user BTB poisoning by flushing the branch predictor
+ * when switching between processes. This stops one process from
+ * doing Spectre-v2 attacks on another.
+ *
* Both, the conditional and the always IBPB mode use the mm
* pointer to avoid the IBPB when switching between tasks of the
* same process. Using the mm pointer instead of mm->context.ctx_id
@@ -340,8 +353,6 @@ static void cond_ibpb(struct task_struct *next)
* exposed data is not really interesting.
*/
if (static_branch_likely(&switch_mm_cond_ibpb)) {
- unsigned long prev_mm, next_mm;
-
/*
* This is a bit more complex than the always mode because
* it has to handle two cases:
@@ -371,20 +382,14 @@ static void cond_ibpb(struct task_struct *next)
* Optimize this with reasonably small overhead for the
* above cases. Mangle the TIF_SPEC_IB bit into the mm
* pointer of the incoming task which is stored in
- * cpu_tlbstate.last_user_mm_ibpb for comparison.
- */
- next_mm = mm_mangle_tif_spec_ib(next);
- prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_ibpb);
-
- /*
+ * cpu_tlbstate.last_user_mm_spec for comparison.
+ *
* Issue IBPB only if the mm's are different and one or
* both have the IBPB bit set.
*/
if (next_mm != prev_mm &&
(next_mm | prev_mm) & LAST_USER_MM_IBPB)
indirect_branch_prediction_barrier();
-
- this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, next_mm);
}
if (static_branch_unlikely(&switch_mm_always_ibpb)) {
@@ -393,11 +398,12 @@ static void cond_ibpb(struct task_struct *next)
* different context than the user space task which ran
* last on this CPU.
*/
- if (this_cpu_read(cpu_tlbstate.last_user_mm) != next->mm) {
+ if ((prev_mm & ~LAST_USER_MM_SPEC_MASK) !=
+ (unsigned long)next->mm)
indirect_branch_prediction_barrier();
- this_cpu_write(cpu_tlbstate.last_user_mm, next->mm);
- }
}
+
+ this_cpu_write(cpu_tlbstate.last_user_mm_spec, next_mm);
}
#ifdef CONFIG_PERF_EVENTS
@@ -531,11 +537,10 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
need_flush = true;
} else {
/*
- * Avoid user/user BTB poisoning by flushing the branch
- * predictor when switching between processes. This stops
- * one process from doing Spectre-v2 attacks on another.
+ * Apply process to process speculation vulnerability
+ * mitigations if applicable.
*/
- cond_ibpb(tsk);
+ cond_mitigation(tsk);
/*
* Stop remote flushes for the previous mm.
@@ -643,7 +648,7 @@ void initialize_tlbstate_and_flush(void)
write_cr3(build_cr3(mm->pgd, 0));
/* Reinitialize tlbstate. */
- this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB);
+ this_cpu_write(cpu_tlbstate.last_user_mm_spec, LAST_USER_MM_INIT);
this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0);
this_cpu_write(cpu_tlbstate.next_asid, 1);
this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v4 3/5] x86/mm: Optionally flush L1D on context switch
2021-01-08 12:10 [PATCH v4 0/5] Next revision of the L1D flush patches Balbir Singh
2021-01-08 12:10 ` [PATCH v4 1/5] x86/smp: Add a per-cpu view of SMT state Balbir Singh
2021-01-08 12:10 ` [PATCH v4 2/5] x86/mm: Refactor cond_ibpb() to support other use cases Balbir Singh
@ 2021-01-08 12:10 ` Balbir Singh
2021-01-08 17:31 ` kernel test robot
2021-01-08 12:10 ` [PATCH v4 4/5] prctl: Hook L1D flushing in via prctl Balbir Singh
` (5 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Balbir Singh @ 2021-01-08 12:10 UTC (permalink / raw)
To: tglx, mingo
Cc: peterz, linux-kernel, keescook, jpoimboe, tony.luck, benh, x86,
dave.hansen, thomas.lendacky, torvalds, Balbir Singh
Implement a mechanism to selectively flush the L1D cache. The goal is to
allow tasks that want to save sensitive information, found by the recent
snoop assisted data sampling vulnerabilites, to flush their L1D on being
switched out. This protects their data from being snooped or leaked via
side channels after the task has context switched out.
There are two scenarios we might want to protect against, a task leaving
the CPU with data still in L1D (which is the main concern of this patch),
the second scenario is a malicious task coming in (not so well trusted)
for which we want to clean up the cache before it starts. Only the case
for the former is addressed.
A new thread_info flag TIF_SPEC_L1D_FLUSH is added to track tasks which
opt-into L1D flushing. cpu_tlbstate.last_user_mm_spec is used to convert
the TIF flags into mm state (per cpu via last_user_mm_spec) in
cond_mitigation(), which then used to do decide when to flush the
L1D cache.
A new helper inline function l1d_flush_hw() has been introduced.
Currently it returns an error code if hardware flushing is not
supported. The caller currently does not check the return value, in the
context of these patches, the routine is called only when HW assisted
flushing is available.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200729001103.6450-4-sblbir@amazon.com
---
arch/x86/include/asm/cacheflush.h | 8 ++++++++
arch/x86/include/asm/thread_info.h | 9 +++++++--
arch/x86/mm/tlb.c | 16 ++++++++++++++--
3 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index b192d917a6d0..554eaf697f3f 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -10,4 +10,12 @@
void clflush_cache_range(void *addr, unsigned int size);
+static inline int l1d_flush_hw(void)
+{
+ if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
+ wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+ return 0;
+ }
+ return -EOPNOTSUPP;
+}
#endif /* _ASM_X86_CACHEFLUSH_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 0d751d5da702..33b637442b9e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -81,7 +81,7 @@ struct thread_info {
#define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/
#define TIF_SSBD 5 /* Speculative store bypass disable */
#define TIF_SPEC_IB 9 /* Indirect branch speculation mitigation */
-#define TIF_SPEC_FORCE_UPDATE 10 /* Force speculation MSR update in context switch */
+#define TIF_SPEC_L1D_FLUSH 10 /* Flush L1D on mm switches (processes) */
#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
#define TIF_UPROBE 12 /* breakpointed or singlestepping */
#define TIF_PATCH_PENDING 13 /* pending live patching update */
@@ -93,6 +93,7 @@ struct thread_info {
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
#define TIF_IO_BITMAP 22 /* uses I/O bitmap */
+#define TIF_SPEC_FORCE_UPDATE 23 /* Force speculation MSR update in context switch */
#define TIF_FORCED_TF 24 /* true if TF in eflags artificially */
#define TIF_BLOCKSTEP 25 /* set when we want DEBUGCTLMSR_BTF */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
@@ -104,7 +105,7 @@ struct thread_info {
#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
#define _TIF_SSBD (1 << TIF_SSBD)
#define _TIF_SPEC_IB (1 << TIF_SPEC_IB)
-#define _TIF_SPEC_FORCE_UPDATE (1 << TIF_SPEC_FORCE_UPDATE)
+#define _TIF_SPEC_L1D_FLUSH (1 << TIF_SPEC_L1D_FLUSH)
#define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
#define _TIF_UPROBE (1 << TIF_UPROBE)
#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING)
@@ -115,6 +116,7 @@ struct thread_info {
#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
+#define _TIF_SPEC_FORCE_UPDATE (1 << TIF_SPEC_FORCE_UPDATE)
#define _TIF_FORCED_TF (1 << TIF_FORCED_TF)
#define _TIF_BLOCKSTEP (1 << TIF_BLOCKSTEP)
#define _TIF_LAZY_MMU_UPDATES (1 << TIF_LAZY_MMU_UPDATES)
@@ -217,6 +219,9 @@ static inline int arch_within_stack_frames(const void * const stack,
current_thread_info()->status & TS_COMPAT)
#endif
+extern int enable_l1d_flush_for_task(struct task_struct *tsk);
+extern int disable_l1d_flush_for_task(struct task_struct *tsk);
+
extern void arch_task_cache_init(void);
extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
extern void arch_release_task_struct(struct task_struct *tsk);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 7320348b5a61..f67c5bd58158 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -8,11 +8,13 @@
#include <linux/export.h>
#include <linux/cpu.h>
#include <linux/debugfs.h>
+#include <linux/sched/smt.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/nospec-branch.h>
#include <asm/cache.h>
+#include <asm/cacheflush.h>
#include <asm/apic.h>
#include "mm_internal.h"
@@ -42,11 +44,12 @@
*/
/*
- * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
+ * Bits to mangle the TIF_SPEC_* state into the mm pointer which is
* stored in cpu_tlb_state.last_user_mm_spec.
*/
#define LAST_USER_MM_IBPB 0x1UL
-#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB)
+#define LAST_USER_MM_L1D_FLUSH 0x2UL
+#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB | LAST_USER_MM_L1D_FLUSH)
/* Bits to set when tlbstate and flush is (re)initialized */
#define LAST_USER_MM_INIT LAST_USER_MM_IBPB
@@ -325,6 +328,7 @@ static inline unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
unsigned long next_tif = task_thread_info(next)->flags;
unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK;
+ BUILD_BUG_ON(TIF_SPEC_L1D_FLUSH != TIF_SPEC_IB + 1);
return (unsigned long)next->mm | spec_bits;
}
@@ -402,6 +406,14 @@ static void cond_mitigation(struct task_struct *next)
indirect_branch_prediction_barrier();
}
+ /*
+ * Flush only if SMT is disabled as per the contract, which is checked
+ * when the feature is enabled.
+ */
+ if (!this_cpu_read(cpu_info.smt_active) &&
+ (prev_mm & LAST_USER_MM_L1D_FLUSH))
+ l1d_flush_hw();
+
this_cpu_write(cpu_tlbstate.last_user_mm_spec, next_mm);
}
--
2.17.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v4 3/5] x86/mm: Optionally flush L1D on context switch
2021-01-08 12:10 ` [PATCH v4 3/5] x86/mm: Optionally flush L1D on context switch Balbir Singh
@ 2021-01-08 17:31 ` kernel test robot
0 siblings, 0 replies; 21+ messages in thread
From: kernel test robot @ 2021-01-08 17:31 UTC (permalink / raw)
To: kbuild-all
[-- Attachment #1: Type: text/plain, Size: 5551 bytes --]
Hi Balbir,
I love your patch! Perhaps something to improve:
[auto build test WARNING on next-20210108]
[also build test WARNING on v5.11-rc2]
[cannot apply to tip/x86/mm tip/master linus/master tip/x86/core v5.11-rc2 v5.11-rc1 v5.10]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Balbir-Singh/Next-revision-of-the-L1D-flush-patches/20210108-201551
base: 1c925d2030afd354a02c23500386e620e662622b
config: x86_64-randconfig-s022-20210108 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-208-g46a52ca4-dirty
# https://github.com/0day-ci/linux/commit/9dcce59b808dbd79d6535898f9262c0e4073aff0
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Balbir-Singh/Next-revision-of-the-L1D-flush-patches/20210108-201551
git checkout 9dcce59b808dbd79d6535898f9262c0e4073aff0
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
"sparse warnings: (new ones prefixed by >>)"
>> arch/x86/mm/tlb.c:413:14: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected void const [noderef] __percpu *__vpp_verify @@ got bool * @@
arch/x86/mm/tlb.c:413:14: sparse: expected void const [noderef] __percpu *__vpp_verify
arch/x86/mm/tlb.c:413:14: sparse: got bool *
vim +413 arch/x86/mm/tlb.c
334
335 static void cond_mitigation(struct task_struct *next)
336 {
337 unsigned long prev_mm, next_mm;
338
339 if (!next || !next->mm)
340 return;
341
342 next_mm = mm_mangle_tif_spec_bits(next);
343 prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_spec);
344
345 /*
346 * Avoid user/user BTB poisoning by flushing the branch predictor
347 * when switching between processes. This stops one process from
348 * doing Spectre-v2 attacks on another.
349 *
350 * Both, the conditional and the always IBPB mode use the mm
351 * pointer to avoid the IBPB when switching between tasks of the
352 * same process. Using the mm pointer instead of mm->context.ctx_id
353 * opens a hypothetical hole vs. mm_struct reuse, which is more or
354 * less impossible to control by an attacker. Aside of that it
355 * would only affect the first schedule so the theoretically
356 * exposed data is not really interesting.
357 */
358 if (static_branch_likely(&switch_mm_cond_ibpb)) {
359 /*
360 * This is a bit more complex than the always mode because
361 * it has to handle two cases:
362 *
363 * 1) Switch from a user space task (potential attacker)
364 * which has TIF_SPEC_IB set to a user space task
365 * (potential victim) which has TIF_SPEC_IB not set.
366 *
367 * 2) Switch from a user space task (potential attacker)
368 * which has TIF_SPEC_IB not set to a user space task
369 * (potential victim) which has TIF_SPEC_IB set.
370 *
371 * This could be done by unconditionally issuing IBPB when
372 * a task which has TIF_SPEC_IB set is either scheduled in
373 * or out. Though that results in two flushes when:
374 *
375 * - the same user space task is scheduled out and later
376 * scheduled in again and only a kernel thread ran in
377 * between.
378 *
379 * - a user space task belonging to the same process is
380 * scheduled in after a kernel thread ran in between
381 *
382 * - a user space task belonging to the same process is
383 * scheduled in immediately.
384 *
385 * Optimize this with reasonably small overhead for the
386 * above cases. Mangle the TIF_SPEC_IB bit into the mm
387 * pointer of the incoming task which is stored in
388 * cpu_tlbstate.last_user_mm_spec for comparison.
389 *
390 * Issue IBPB only if the mm's are different and one or
391 * both have the IBPB bit set.
392 */
393 if (next_mm != prev_mm &&
394 (next_mm | prev_mm) & LAST_USER_MM_IBPB)
395 indirect_branch_prediction_barrier();
396 }
397
398 if (static_branch_unlikely(&switch_mm_always_ibpb)) {
399 /*
400 * Only flush when switching to a user space task with a
401 * different context than the user space task which ran
402 * last on this CPU.
403 */
404 if ((prev_mm & ~LAST_USER_MM_SPEC_MASK) !=
405 (unsigned long)next->mm)
406 indirect_branch_prediction_barrier();
407 }
408
409 /*
410 * Flush only if SMT is disabled as per the contract, which is checked
411 * when the feature is enabled.
412 */
> 413 if (!this_cpu_read(cpu_info.smt_active) &&
414 (prev_mm & LAST_USER_MM_L1D_FLUSH))
415 l1d_flush_hw();
416
417 this_cpu_write(cpu_tlbstate.last_user_mm_spec, next_mm);
418 }
419
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 31750 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH v4 4/5] prctl: Hook L1D flushing in via prctl
2021-01-08 12:10 [PATCH v4 0/5] Next revision of the L1D flush patches Balbir Singh
` (2 preceding siblings ...)
2021-01-08 12:10 ` [PATCH v4 3/5] x86/mm: Optionally flush L1D on context switch Balbir Singh
@ 2021-01-08 12:10 ` Balbir Singh
2021-07-28 9:58 ` [tip: x86/cpu] x86, " tip-bot2 for Balbir Singh
2021-01-08 12:10 ` [PATCH v4 5/5] Documentation: Add L1D flushing Documentation Balbir Singh
` (4 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Balbir Singh @ 2021-01-08 12:10 UTC (permalink / raw)
To: tglx, mingo
Cc: peterz, linux-kernel, keescook, jpoimboe, tony.luck, benh, x86,
dave.hansen, thomas.lendacky, torvalds, Balbir Singh
Use the existing PR_GET/SET_SPECULATION_CTRL API to expose the L1D
flush capability. For L1D flushing PR_SPEC_FORCE_DISABLE and
PR_SPEC_DISABLE_NOEXEC are not supported.
Enabling L1D flush does not check if the task is running on
an SMT enabled core, rather a check is done at runtime (at the
time of flush), if the task runs on a non SMT enabled core
then the task is sent a SIGBUS (this is done prior to the task
executing on the core, so no data is leaked). This is better
than the other alternatives of
a. Ensuring strict affinity of the task (hard to enforce
without further changes in the scheduler)
b. Silently skipping flush for tasks that move to SMT enabled
cores.
An arch config ARCH_HAS_PARANOID_L1D_FLUSH has been added
and struct task carries a callback_head for arch's that support
this config (currently on x86), this callback head is used
to schedule task work (SIGBUS delivery).
There is also no seccomp integration for the feature.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/Kconfig | 4 ++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/nospec-branch.h | 2 +
arch/x86/include/asm/thread_info.h | 3 --
arch/x86/kernel/cpu/bugs.c | 71 ++++++++++++++++++++++++++++
arch/x86/mm/tlb.c | 29 ++++++++++--
include/linux/sched.h | 10 ++++
include/uapi/linux/prctl.h | 1 +
8 files changed, 115 insertions(+), 6 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 7091f7187951..0a0701e0a1ed 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -327,6 +327,10 @@ config ARCH_32BIT_OFF_T
still support 32-bit off_t. This option is enabled for all such
architectures explicitly.
+config ARCH_HAS_PARANOID_L1D_FLUSH
+ bool
+ default n
+
config HAVE_ASM_MODVERSIONS
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bd4993b276fd..2bb53bfaea02 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -107,6 +107,7 @@ config X86
select ARCH_WANT_HUGE_PMD_SHARE
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANTS_THP_SWAP if X86_64
+ select ARCH_HAS_PARANOID_L1D_FLUSH
select BUILDTIME_TABLE_SORT
select CLKEVT_I8253
select CLOCKSOURCE_VALIDATE_LAST_CYCLE
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index cb9ad6b73973..cd60934c6075 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -253,6 +253,8 @@ DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
DECLARE_STATIC_KEY_FALSE(mds_user_clear);
DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
+DECLARE_STATIC_KEY_FALSE(l1d_flush_enabled);
+
#include <asm/segment.h>
/**
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index 33b637442b9e..054dc0f58ac4 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -219,9 +219,6 @@ static inline int arch_within_stack_frames(const void * const stack,
current_thread_info()->status & TS_COMPAT)
#endif
-extern int enable_l1d_flush_for_task(struct task_struct *tsk);
-extern int disable_l1d_flush_for_task(struct task_struct *tsk);
-
extern void arch_task_cache_init(void);
extern int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src);
extern void arch_release_task_struct(struct task_struct *tsk);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d41b70fe4918..e07d2a1d5eb2 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -76,6 +76,20 @@ EXPORT_SYMBOL_GPL(mds_user_clear);
DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
EXPORT_SYMBOL_GPL(mds_idle_clear);
+/*
+ * Controls whether l1d flush based mitigations are enabled,
+ * based on hw features and admin setting via boot parameter
+ * defaults to false
+ */
+DEFINE_STATIC_KEY_FALSE(l1d_flush_enabled);
+
+enum l1d_flush_mitigations {
+ L1D_FLUSH_OFF = 0,
+ L1D_FLUSH_ON,
+};
+
+static enum l1d_flush_mitigations l1d_flush_mitigation __initdata = L1D_FLUSH_OFF;
+
void __init check_bugs(void)
{
identify_boot_cpu();
@@ -150,6 +164,10 @@ void __init check_bugs(void)
if (!direct_gbpages)
set_memory_4k((unsigned long)__va(0), 1);
#endif
+ if (!l1d_flush_mitigation || !boot_cpu_has_bug(X86_BUG_L1TF) ||
+ !boot_cpu_has(X86_FEATURE_FLUSH_L1D))
+ return;
+ static_branch_enable(&l1d_flush_enabled);
}
void
@@ -379,6 +397,15 @@ static void __init taa_select_mitigation(void)
pr_info("%s\n", taa_strings[taa_mitigation]);
}
+static int __init l1d_flush_parse_cmdline(char *str)
+{
+ if (!strcmp(str, "on"))
+ l1d_flush_mitigation = L1D_FLUSH_ON;
+
+ return 0;
+}
+early_param("l1d_flush", l1d_flush_parse_cmdline);
+
static int __init tsx_async_abort_parse_cmdline(char *str)
{
if (!boot_cpu_has_bug(X86_BUG_TAA))
@@ -1215,6 +1242,35 @@ static void task_update_spec_tif(struct task_struct *tsk)
speculation_ctrl_update_current();
}
+static inline int enable_l1d_flush_for_task(struct task_struct *tsk)
+{
+ set_ti_thread_flag(&tsk->thread_info, TIF_SPEC_L1D_FLUSH);
+ return 0;
+}
+
+static inline int disable_l1d_flush_for_task(struct task_struct *tsk)
+{
+ clear_ti_thread_flag(&tsk->thread_info, TIF_SPEC_L1D_FLUSH);
+ return 0;
+}
+
+static int l1d_flush_prctl_set(struct task_struct *task, unsigned long ctrl)
+{
+
+ if (!static_branch_unlikely(&l1d_flush_enabled))
+ return -EPERM;
+
+ switch (ctrl) {
+ case PR_SPEC_ENABLE:
+ return enable_l1d_flush_for_task(task);
+ case PR_SPEC_DISABLE:
+ return disable_l1d_flush_for_task(task);
+ default:
+ return -ERANGE;
+ }
+ return 0;
+}
+
static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl)
{
if (ssb_mode != SPEC_STORE_BYPASS_PRCTL &&
@@ -1324,6 +1380,8 @@ int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which,
return ssb_prctl_set(task, ctrl);
case PR_SPEC_INDIRECT_BRANCH:
return ib_prctl_set(task, ctrl);
+ case PR_SPEC_L1D_FLUSH:
+ return l1d_flush_prctl_set(task, ctrl);
default:
return -ENODEV;
}
@@ -1340,6 +1398,17 @@ void arch_seccomp_spec_mitigate(struct task_struct *task)
}
#endif
+static int l1d_flush_prctl_get(struct task_struct *task)
+{
+ if (!static_branch_unlikely(&l1d_flush_enabled))
+ return PR_SPEC_FORCE_DISABLE;
+
+ if (test_ti_thread_flag(&task->thread_info, TIF_SPEC_L1D_FLUSH))
+ return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
+ else
+ return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
+}
+
static int ssb_prctl_get(struct task_struct *task)
{
switch (ssb_mode) {
@@ -1390,6 +1459,8 @@ int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which)
return ssb_prctl_get(task);
case PR_SPEC_INDIRECT_BRANCH:
return ib_prctl_get(task);
+ case PR_SPEC_L1D_FLUSH:
+ return l1d_flush_prctl_get(task);
default:
return -ENODEV;
}
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index f67c5bd58158..aa9286b83f8f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -323,6 +323,28 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
local_irq_restore(flags);
}
+/*
+ * Sent to a task that opts into L1D flushing via the prctl interface
+ * but ends up running on an SMT enabled core.
+ */
+static void l1d_flush_kill(struct callback_head *ch)
+{
+ force_sig(SIGBUS);
+}
+
+static void l1d_flush_evaluate(unsigned long prev_mm, unsigned long next_mm,
+ struct task_struct *next)
+{
+ if (prev_mm & LAST_USER_MM_L1D_FLUSH)
+ l1d_flush_hw();
+
+ if ((next_mm & LAST_USER_MM_L1D_FLUSH) && this_cpu_read(cpu_info.smt_active)) {
+ clear_ti_thread_flag(&next->thread_info, TIF_SPEC_L1D_FLUSH);
+ next->l1d_flush_kill.func = l1d_flush_kill;
+ task_work_add(next, &next->l1d_flush_kill, TWA_RESUME);
+ }
+}
+
static inline unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
{
unsigned long next_tif = task_thread_info(next)->flags;
@@ -410,9 +432,10 @@ static void cond_mitigation(struct task_struct *next)
* Flush only if SMT is disabled as per the contract, which is checked
* when the feature is enabled.
*/
- if (!this_cpu_read(cpu_info.smt_active) &&
- (prev_mm & LAST_USER_MM_L1D_FLUSH))
- l1d_flush_hw();
+ if (static_branch_unlikely(&l1d_flush_enabled)) {
+ if (unlikely((prev_mm | next_mm) & LAST_USER_MM_L1D_FLUSH))
+ l1d_flush_evaluate(prev_mm, next_mm, next);
+ }
this_cpu_write(cpu_tlbstate.last_user_mm_spec, next_mm);
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e5ad6d354b7b..77e9d32d70ca 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1376,6 +1376,16 @@ struct task_struct {
unsigned long getblk_bh_state;
#endif
+#ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
+ /*
+ * If L1D flush is supported on mm context switch
+ * then we use this callback head to queue kill work
+ * to kill tasks that are not running on SMT disabled
+ * cores
+ */
+ struct callback_head l1d_flush_kill;
+#endif
+
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 90deb41c8a34..44adcae6641c 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -213,6 +213,7 @@ struct prctl_mm_map {
/* Speculation control variants */
# define PR_SPEC_STORE_BYPASS 0
# define PR_SPEC_INDIRECT_BRANCH 1
+# define PR_SPEC_L1D_FLUSH 2
/* Return and control values for PR_SET/GET_SPECULATION_CTRL */
# define PR_SPEC_NOT_AFFECTED 0
# define PR_SPEC_PRCTL (1UL << 0)
--
2.17.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: x86/cpu] x86, prctl: Hook L1D flushing in via prctl
2021-01-08 12:10 ` [PATCH v4 4/5] prctl: Hook L1D flushing in via prctl Balbir Singh
@ 2021-07-28 9:58 ` tip-bot2 for Balbir Singh
0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Balbir Singh @ 2021-07-28 9:58 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Balbir Singh, x86, linux-kernel
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: e893bb1bb4d2eb635eba61e5d9c5135d96855773
Gitweb: https://git.kernel.org/tip/e893bb1bb4d2eb635eba61e5d9c5135d96855773
Author: Balbir Singh <sblbir@amazon.com>
AuthorDate: Fri, 08 Jan 2021 23:10:55 +11:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Jul 2021 11:42:25 +02:00
x86, prctl: Hook L1D flushing in via prctl
Use the existing PR_GET/SET_SPECULATION_CTRL API to expose the L1D flush
capability. For L1D flushing PR_SPEC_FORCE_DISABLE and
PR_SPEC_DISABLE_NOEXEC are not supported.
Enabling L1D flush does not check if the task is running on an SMT enabled
core, rather a check is done at runtime (at the time of flush), if the task
runs on a SMT sibling then the task is sent a SIGBUS which is executed
before the task returns to user space or to a guest.
This is better than the other alternatives of:
a. Ensuring strict affinity of the task (hard to enforce without further
changes in the scheduler)
b. Silently skipping flush for tasks that move to SMT enabled cores.
Hook up the core prctl and implement the x86 specific parts which in turn
makes it functional.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-5-sblbir@amazon.com
---
arch/x86/kernel/cpu/bugs.c | 33 +++++++++++++++++++++++++++++++++
include/uapi/linux/prctl.h | 1 +
2 files changed, 34 insertions(+)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 1a5a1b0..ecfca3b 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1252,6 +1252,24 @@ static void task_update_spec_tif(struct task_struct *tsk)
speculation_ctrl_update_current();
}
+static int l1d_flush_prctl_set(struct task_struct *task, unsigned long ctrl)
+{
+
+ if (!static_branch_unlikely(&switch_mm_cond_l1d_flush))
+ return -EPERM;
+
+ switch (ctrl) {
+ case PR_SPEC_ENABLE:
+ set_ti_thread_flag(&task->thread_info, TIF_SPEC_L1D_FLUSH);
+ return 0;
+ case PR_SPEC_DISABLE:
+ clear_ti_thread_flag(&task->thread_info, TIF_SPEC_L1D_FLUSH);
+ return 0;
+ default:
+ return -ERANGE;
+ }
+}
+
static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl)
{
if (ssb_mode != SPEC_STORE_BYPASS_PRCTL &&
@@ -1361,6 +1379,8 @@ int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which,
return ssb_prctl_set(task, ctrl);
case PR_SPEC_INDIRECT_BRANCH:
return ib_prctl_set(task, ctrl);
+ case PR_SPEC_L1D_FLUSH:
+ return l1d_flush_prctl_set(task, ctrl);
default:
return -ENODEV;
}
@@ -1377,6 +1397,17 @@ void arch_seccomp_spec_mitigate(struct task_struct *task)
}
#endif
+static int l1d_flush_prctl_get(struct task_struct *task)
+{
+ if (!static_branch_unlikely(&switch_mm_cond_l1d_flush))
+ return PR_SPEC_FORCE_DISABLE;
+
+ if (test_ti_thread_flag(&task->thread_info, TIF_SPEC_L1D_FLUSH))
+ return PR_SPEC_PRCTL | PR_SPEC_ENABLE;
+ else
+ return PR_SPEC_PRCTL | PR_SPEC_DISABLE;
+}
+
static int ssb_prctl_get(struct task_struct *task)
{
switch (ssb_mode) {
@@ -1427,6 +1458,8 @@ int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which)
return ssb_prctl_get(task);
case PR_SPEC_INDIRECT_BRANCH:
return ib_prctl_get(task);
+ case PR_SPEC_L1D_FLUSH:
+ return l1d_flush_prctl_get(task);
default:
return -ENODEV;
}
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 967d9c5..964c41e 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -213,6 +213,7 @@ struct prctl_mm_map {
/* Speculation control variants */
# define PR_SPEC_STORE_BYPASS 0
# define PR_SPEC_INDIRECT_BRANCH 1
+# define PR_SPEC_L1D_FLUSH 2
/* Return and control values for PR_SET/GET_SPECULATION_CTRL */
# define PR_SPEC_NOT_AFFECTED 0
# define PR_SPEC_PRCTL (1UL << 0)
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH v4 5/5] Documentation: Add L1D flushing Documentation
2021-01-08 12:10 [PATCH v4 0/5] Next revision of the L1D flush patches Balbir Singh
` (3 preceding siblings ...)
2021-01-08 12:10 ` [PATCH v4 4/5] prctl: Hook L1D flushing in via prctl Balbir Singh
@ 2021-01-08 12:10 ` Balbir Singh
2021-07-28 9:58 ` [tip: x86/cpu] " tip-bot2 for Balbir Singh
2021-01-25 9:27 ` [PATCH v4 0/5] Next revision of the L1D flush patches Singh, Balbir
` (3 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Balbir Singh @ 2021-01-08 12:10 UTC (permalink / raw)
To: tglx, mingo
Cc: peterz, linux-kernel, keescook, jpoimboe, tony.luck, benh, x86,
dave.hansen, thomas.lendacky, torvalds, Balbir Singh
Add documentation of l1d flushing, explain the need for the
feature and how it can be used.
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/l1d_flush.rst | 70 +++++++++++++++++++
.../admin-guide/kernel-parameters.txt | 17 +++++
Documentation/userspace-api/spec_ctrl.rst | 8 +++
4 files changed, 96 insertions(+)
create mode 100644 Documentation/admin-guide/hw-vuln/l1d_flush.rst
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index ca4dbdd9016d..21710f8609fe 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -15,3 +15,4 @@ are configurable at compile, boot or run time.
tsx_async_abort
multihit.rst
special-register-buffer-data-sampling.rst
+ l1d_flush.rst
diff --git a/Documentation/admin-guide/hw-vuln/l1d_flush.rst b/Documentation/admin-guide/hw-vuln/l1d_flush.rst
new file mode 100644
index 000000000000..d9bd931641b3
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/l1d_flush.rst
@@ -0,0 +1,70 @@
+L1D Flushing
+============
+
+With an increasing number of vulnerabilities being reported around data
+leaks from the Level 1 Data cache (L1D) the kernel provides an opt-in
+mechanism to flush the L1D cache on context switch.
+
+This mechanism can be used to address e.g. CVE-2020-0550. For applications
+the mechanism keeps them safe from vulnerabilities, related to leaks
+(snooping of) from the L1D cache.
+
+
+Related CVEs
+------------
+The following CVEs can be addressed by this
+mechanism
+
+ ============= ======================== ==================
+ CVE-2020-0550 Improper Data Forwarding OS related aspects
+ ============= ======================== ==================
+
+Usage Guidelines
+----------------
+
+Please see document: :ref:`Documentation/userspace-api/spec_ctrl.rst
+<set_spec_ctrl>` for details.
+
+**NOTE**: The feature is disabled by default, applications need to
+specifically opt into the feature to enable it.
+
+Mitigation
+----------
+
+When PR_SET_L1D_FLUSH is enabled for a task a flush of the L1D cache is
+performed when the task is scheduled out and the incoming task belongs to a
+different process and therefore to a different address space.
+
+If the underlying CPU supports L1D flushing in hardware, the hardware
+mechanism is used, software fallback for the mitigation, is not supported.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+The kernel command line allows to control the L1D flush mitigations at boot
+time with the option "l1d_flush=". The valid arguments for this option are:
+
+ ============ =============================================================
+ on Enables the prctl interface, applications trying to use
+ the prctl() will fail with an error if l1d_flush is not
+ enabled
+ ============ =============================================================
+
+By default the API is enabled and applications opt-in by using the prctl
+API.
+
+Limitations
+-----------
+
+The mechanism does not mitigate L1D data leaks between tasks belonging to
+different processes which are concurrently executing on sibling threads of
+a physical CPU core when SMT is enabled on the system.
+
+This can be addressed by controlled placement of processes on physical CPU
+cores or by disabling SMT. See the relevant chapter in the L1TF mitigation
+document: :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
+
+**NOTE** : The opt-in of a task for L1D flushing will work only when the
+tasks affinity is limited to cores running in non-SMT mode. Running the task
+on a CPU with SMT enabled would result in the task getting a SIGBUS when
+t executes on the non-SMT core.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bc20e2f4677f..bd1e8e329727 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2356,6 +2356,23 @@
feature (tagged TLBs) on capable Intel chips.
Default is 1 (enabled)
+ l1d_flush= [X86,INTEL]
+ Control mitigation for L1D based snooping vulnerability.
+
+ Certain CPUs are vulnerable to an exploit against CPU
+ internal buffers which can forward information to a
+ disclosure gadget under certain conditions.
+
+ In vulnerable processors, the speculatively
+ forwarded data can be used in a cache side channel
+ attack, to access data to which the attacker does
+ not have direct access.
+
+ This parameter controls the mitigation. The
+ options are:
+
+ on - enable the interface for the mitigation
+
l1tf= [X86] Control mitigation of the L1TF vulnerability on
affected CPUs
diff --git a/Documentation/userspace-api/spec_ctrl.rst b/Documentation/userspace-api/spec_ctrl.rst
index 7ddd8f667459..5e8ed9eef9aa 100644
--- a/Documentation/userspace-api/spec_ctrl.rst
+++ b/Documentation/userspace-api/spec_ctrl.rst
@@ -106,3 +106,11 @@ Speculation misfeature controls
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
+
+- PR_SPEC_L1D_FLUSH: Flush L1D Cache on context switch out of the task
+ (works only when tasks run on non SMT cores)
+
+ Invocations:
+ * prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, 0, 0, 0);
+ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_ENABLE, 0, 0);
+ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_DISABLE, 0, 0);
--
2.17.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: x86/cpu] Documentation: Add L1D flushing Documentation
2021-01-08 12:10 ` [PATCH v4 5/5] Documentation: Add L1D flushing Documentation Balbir Singh
@ 2021-07-28 9:58 ` tip-bot2 for Balbir Singh
0 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Balbir Singh @ 2021-07-28 9:58 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Balbir Singh, Thomas Gleixner, x86, linux-kernel
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: b7fe54f6c2d437082dcbecfbd832f38edd9caaf4
Gitweb: https://git.kernel.org/tip/b7fe54f6c2d437082dcbecfbd832f38edd9caaf4
Author: Balbir Singh <sblbir@amazon.com>
AuthorDate: Fri, 08 Jan 2021 23:10:56 +11:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Jul 2021 11:42:25 +02:00
Documentation: Add L1D flushing Documentation
Add documentation of l1d flushing, explain the need for the
feature and how it can be used.
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-6-sblbir@amazon.com
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +-
Documentation/admin-guide/hw-vuln/l1d_flush.rst | 69 ++++++++++++++++-
Documentation/admin-guide/kernel-parameters.txt | 17 ++++-
Documentation/userspace-api/spec_ctrl.rst | 8 ++-
4 files changed, 95 insertions(+)
create mode 100644 Documentation/admin-guide/hw-vuln/l1d_flush.rst
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index f12cda5..8cbc711 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -16,3 +16,4 @@ are configurable at compile, boot or run time.
multihit.rst
special-register-buffer-data-sampling.rst
core-scheduling.rst
+ l1d_flush.rst
diff --git a/Documentation/admin-guide/hw-vuln/l1d_flush.rst b/Documentation/admin-guide/hw-vuln/l1d_flush.rst
new file mode 100644
index 0000000..210020b
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/l1d_flush.rst
@@ -0,0 +1,69 @@
+L1D Flushing
+============
+
+With an increasing number of vulnerabilities being reported around data
+leaks from the Level 1 Data cache (L1D) the kernel provides an opt-in
+mechanism to flush the L1D cache on context switch.
+
+This mechanism can be used to address e.g. CVE-2020-0550. For applications
+the mechanism keeps them safe from vulnerabilities, related to leaks
+(snooping of) from the L1D cache.
+
+
+Related CVEs
+------------
+The following CVEs can be addressed by this
+mechanism
+
+ ============= ======================== ==================
+ CVE-2020-0550 Improper Data Forwarding OS related aspects
+ ============= ======================== ==================
+
+Usage Guidelines
+----------------
+
+Please see document: :ref:`Documentation/userspace-api/spec_ctrl.rst
+<set_spec_ctrl>` for details.
+
+**NOTE**: The feature is disabled by default, applications need to
+specifically opt into the feature to enable it.
+
+Mitigation
+----------
+
+When PR_SET_L1D_FLUSH is enabled for a task a flush of the L1D cache is
+performed when the task is scheduled out and the incoming task belongs to a
+different process and therefore to a different address space.
+
+If the underlying CPU supports L1D flushing in hardware, the hardware
+mechanism is used, software fallback for the mitigation, is not supported.
+
+Mitigation control on the kernel command line
+---------------------------------------------
+
+The kernel command line allows to control the L1D flush mitigations at boot
+time with the option "l1d_flush=". The valid arguments for this option are:
+
+ ============ =============================================================
+ on Enables the prctl interface, applications trying to use
+ the prctl() will fail with an error if l1d_flush is not
+ enabled
+ ============ =============================================================
+
+By default the mechanism is disabled.
+
+Limitations
+-----------
+
+The mechanism does not mitigate L1D data leaks between tasks belonging to
+different processes which are concurrently executing on sibling threads of
+a physical CPU core when SMT is enabled on the system.
+
+This can be addressed by controlled placement of processes on physical CPU
+cores or by disabling SMT. See the relevant chapter in the L1TF mitigation
+document: :ref:`Documentation/admin-guide/hw-vuln/l1tf.rst <smt_control>`.
+
+**NOTE** : The opt-in of a task for L1D flushing works only when the task's
+affinity is limited to cores running in non-SMT mode. If a task which
+requested L1D flushing is scheduled on a SMT-enabled core the kernel sends
+a SIGBUS to the task.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index bdb2200..b105db2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2421,6 +2421,23 @@
feature (tagged TLBs) on capable Intel chips.
Default is 1 (enabled)
+ l1d_flush= [X86,INTEL]
+ Control mitigation for L1D based snooping vulnerability.
+
+ Certain CPUs are vulnerable to an exploit against CPU
+ internal buffers which can forward information to a
+ disclosure gadget under certain conditions.
+
+ In vulnerable processors, the speculatively
+ forwarded data can be used in a cache side channel
+ attack, to access data to which the attacker does
+ not have direct access.
+
+ This parameter controls the mitigation. The
+ options are:
+
+ on - enable the interface for the mitigation
+
l1tf= [X86] Control mitigation of the L1TF vulnerability on
affected CPUs
diff --git a/Documentation/userspace-api/spec_ctrl.rst b/Documentation/userspace-api/spec_ctrl.rst
index 7ddd8f6..5e8ed9e 100644
--- a/Documentation/userspace-api/spec_ctrl.rst
+++ b/Documentation/userspace-api/spec_ctrl.rst
@@ -106,3 +106,11 @@ Speculation misfeature controls
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
* prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
+
+- PR_SPEC_L1D_FLUSH: Flush L1D Cache on context switch out of the task
+ (works only when tasks run on non SMT cores)
+
+ Invocations:
+ * prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, 0, 0, 0);
+ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_ENABLE, 0, 0);
+ * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH, PR_SPEC_DISABLE, 0, 0);
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v4 0/5] Next revision of the L1D flush patches
2021-01-08 12:10 [PATCH v4 0/5] Next revision of the L1D flush patches Balbir Singh
` (4 preceding siblings ...)
2021-01-08 12:10 ` [PATCH v4 5/5] Documentation: Add L1D flushing Documentation Balbir Singh
@ 2021-01-25 9:27 ` Singh, Balbir
2021-04-08 20:23 ` Kees Cook
2021-07-28 9:58 ` [tip: x86/cpu] x86/mm: Prepare for opt-in based L1D flush in switch_mm() tip-bot2 for Balbir Singh
` (2 subsequent siblings)
8 siblings, 1 reply; 21+ messages in thread
From: Singh, Balbir @ 2021-01-25 9:27 UTC (permalink / raw)
To: tglx, mingo
Cc: linux-kernel, peterz, keescook, torvalds, jpoimboe, x86,
tony.luck, dave.hansen, thomas.lendacky, benh
On Fri, 2021-01-08 at 23:10 +1100, Balbir Singh wrote:
> Implement a mechanism that allows tasks to conditionally flush
> their L1D cache (mitigation mechanism suggested in [2]). The previous
> posts of these patches were sent for inclusion (see [3]) and were not
> included due to the concern for the need for additional checks,
> those checks were:
>
> 1. Implement this mechanism only for CPUs affected by the L1TF bug
> 2. Disable the software fallback
> 3. Provide an override to enable this mechanism
> 4. Be SMT aware in the implementation
>
> The patches support a use case where the entire system is not in
> non SMT mode, but rather a few CPUs can have their SMT turned off
> and processes that want to opt-in are expected to run on non SMT
> cores. This gives the administrator complete control over setting
> up the mitigation for the issue. In addition, the administrator
> has a boot time override (l1d_flush=on) to turn on the mechanism
> without which this mechanism will not work.
>
> To implement these efficiently, a new per cpu view of whether the core
> is in SMT mode or not is implemented in patch 1. The code is refactored
> in patch 2 so that the existing code can allow for other speculation
> related checks when switching mm between tasks, this mechanism has not
> changed since the last post. The ability to flush L1D for tasks if the
> TIF_SPEC_L1D_FLUSH bit is set and the task has context switched out of a
> non SMT core is provided by patch 3. Hooks for the user space API, for
> this feature to be invoked via prctl are provided in patch 4, along with
> the checks described above (1, 2, and 3). Documentation updates are in
> patch 5, with updates on l1d_flush, the prctl changes and updates to the
> kernel-parameters (l1d_flush_out).
>
> The checks for opting into L1D flushing are:
> a. If the CPU is affected by L1TF
> b. Hardware L1D flush mechanism is available
>
> A task running on a core with SMT enabled and opting into this feature will
> receive a SIGBUS.
>
> References
> [1] https://software.intel.com/security-software-guidance/software-guidance/snoop-assisted-l1-data-sampling
> [2] https://software.intel.com/security-software-guidance/insights/deep-dive-snoop-assisted-l1-data-sampling
> [3] https://lkml.org/lkml/2020/6/2/1150
> [4] https://lore.kernel.org/lkml/20200729001103.6450-1-sblbir@amazon.com/
> [5] https://lore.kernel.org/lkml/20201117234934.25985-2-sblbir@amazon.com/
>
> Reviewers guide to v4
> - The key patch in the series and most of the changes to this
> revision are to patch 4. patches 3 and 5 have been modified
> to keep them consistent with the changes to patch 4.
>
> Changelog v4:
> - Use a static key to enable the mechanism (remove overheads)
> - By default have the mechanism turned off, so there are two
> opt-ins needed, one by the administrator at boot time, second
> by the application
> - Rename l1d_flush_out/L1D_FLUSH_OUT to l1d_flush/L1D_FLUSH
> - Implement other review recommendations
> Changelog v3:
> - Implement the SIGBUS mechansim
> - Update and fix the documentation
>
>
> Balbir Singh (5):
> x86/smp: Add a per-cpu view of SMT state
> x86/mm: Refactor cond_ibpb() to support other use cases
> x86/mm: Optionally flush L1D on context switch
> prctl: Hook L1D flushing in via prctl
> Documentation: Add L1D flushing Documentation
>
> Documentation/admin-guide/hw-vuln/index.rst | 1 +
> .../admin-guide/hw-vuln/l1d_flush.rst | 70 +++++++++++++++
> .../admin-guide/kernel-parameters.txt | 17 ++++
> Documentation/userspace-api/spec_ctrl.rst | 8 ++
> arch/Kconfig | 4 +
> arch/x86/Kconfig | 1 +
> arch/x86/include/asm/cacheflush.h | 8 ++
> arch/x86/include/asm/nospec-branch.h | 2 +
> arch/x86/include/asm/processor.h | 2 +
> arch/x86/include/asm/thread_info.h | 6 +-
> arch/x86/include/asm/tlbflush.h | 2 +-
> arch/x86/kernel/cpu/bugs.c | 71 +++++++++++++++
> arch/x86/kernel/smpboot.c | 10 ++-
> arch/x86/mm/tlb.c | 88 ++++++++++++++-----
> include/linux/sched.h | 10 +++
> include/uapi/linux/prctl.h | 1 +
> 16 files changed, 273 insertions(+), 28 deletions(-)
> create mode 100644 Documentation/admin-guide/hw-vuln/l1d_flush.rst
>
Ping on any review comments? Suggested refactoring?
Balbir Singh
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v4 0/5] Next revision of the L1D flush patches
2021-01-25 9:27 ` [PATCH v4 0/5] Next revision of the L1D flush patches Singh, Balbir
@ 2021-04-08 20:23 ` Kees Cook
[not found] ` <87y2d5tpjh.ffs@nanos.tec.linutronix.de>
0 siblings, 1 reply; 21+ messages in thread
From: Kees Cook @ 2021-04-08 20:23 UTC (permalink / raw)
To: Singh, Balbir, tglx
Cc: mingo, linux-kernel, peterz, torvalds, jpoimboe, x86, tony.luck,
dave.hansen, thomas.lendacky, benh, linux-hardening
*thread necromancy*
https://lore.kernel.org/lkml/20210108121056.21940-1-sblbir@amazon.com/
On Mon, Jan 25, 2021 at 09:27:38AM +0000, Singh, Balbir wrote:
> On Fri, 2021-01-08 at 23:10 +1100, Balbir Singh wrote:
> > Implement a mechanism that allows tasks to conditionally flush
> > their L1D cache (mitigation mechanism suggested in [2]). The previous
> > posts of these patches were sent for inclusion (see [3]) and were not
> > included due to the concern for the need for additional checks,
> > those checks were:
> >
> > 1. Implement this mechanism only for CPUs affected by the L1TF bug
> > 2. Disable the software fallback
> > 3. Provide an override to enable this mechanism
> > 4. Be SMT aware in the implementation
> > [...]
> Ping on any review comments? Suggested refactoring?
Hi!
I'd still really like to see this -- it's a big hammer, but that's the
point for cases where some new flaw appears and we can point to the
toolbox and say "you can mitigate it with this while you wait for new
kernel/CPU."
Any further thoughts from x86 maintainers? This seems like it addressed
all of tglx's review comments.
--
Kees Cook
^ permalink raw reply [flat|nested] 21+ messages in thread
* [tip: x86/cpu] x86/mm: Prepare for opt-in based L1D flush in switch_mm()
2021-01-08 12:10 [PATCH v4 0/5] Next revision of the L1D flush patches Balbir Singh
` (5 preceding siblings ...)
2021-01-25 9:27 ` [PATCH v4 0/5] Next revision of the L1D flush patches Singh, Balbir
@ 2021-07-28 9:58 ` tip-bot2 for Balbir Singh
2021-07-28 9:58 ` [tip: x86/cpu] x86/process: Make room for TIF_SPEC_L1D_FLUSH tip-bot2 for Balbir Singh
2021-07-28 9:58 ` [tip: x86/cpu] sched: Add task_work callback for paranoid L1D flush tip-bot2 for Balbir Singh
8 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Balbir Singh @ 2021-07-28 9:58 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Balbir Singh, x86, linux-kernel
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: b5f06f64e269f9820cd5ad9e9a98afa6c8914b7a
Gitweb: https://git.kernel.org/tip/b5f06f64e269f9820cd5ad9e9a98afa6c8914b7a
Author: Balbir Singh <sblbir@amazon.com>
AuthorDate: Mon, 26 Apr 2021 21:42:30 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Jul 2021 11:42:24 +02:00
x86/mm: Prepare for opt-in based L1D flush in switch_mm()
The goal of this is to allow tasks that want to protect sensitive
information, against e.g. the recently found snoop assisted data sampling
vulnerabilites, to flush their L1D on being switched out. This protects
their data from being snooped or leaked via side channels after the task
has context switched out.
This could also be used to wipe L1D when an untrusted task is switched in,
but that's not a really well defined scenario while the opt-in variant is
clearly defined.
The mechanism is default disabled and can be enabled on the kernel command
line.
Prepare for the actual prctl based opt-in:
1) Provide the necessary setup functionality similar to the other
mitigations and enable the static branch when the command line option
is set and the CPU provides support for hardware assisted L1D
flushing. Software based L1D flush is not supported because it's CPU
model specific and not really well defined.
This does not come with a sysfs file like the other mitigations
because it is not bound to any specific vulnerability.
Support has to be queried via the prctl(2) interface.
2) Add TIF_SPEC_L1D_FLUSH next to L1D_SPEC_IB so the two bits can be
mangled into the mm pointer in one go which allows to reuse the
existing mechanism in switch_mm() for the conditional IBPB speculation
barrier efficiently.
3) Add the L1D flush specific functionality which flushes L1D when the
outgoing task opted in.
Also check whether the incoming task has requested L1D flush and if so
validate that it is not accidentaly running on an SMT sibling as this
makes the whole excercise moot because SMT siblings share L1D which
opens tons of other attack vectors. If that happens schedule task work
which signals the incoming task on return to user/guest with SIGBUS as
this is part of the paranoid L1D flush contract.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-1-sblbir@amazon.com
---
arch/x86/Kconfig | 1 +-
arch/x86/include/asm/nospec-branch.h | 2 +-
arch/x86/include/asm/thread_info.h | 2 +-
arch/x86/kernel/cpu/bugs.c | 37 +++++++++++++++++-
arch/x86/mm/tlb.c | 58 ++++++++++++++++++++++++++-
5 files changed, 98 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4927065..d8a2c3f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -119,6 +119,7 @@ config X86
select ARCH_WANT_HUGE_PMD_SHARE
select ARCH_WANT_LD_ORPHAN_WARN
select ARCH_WANTS_THP_SWAP if X86_64
+ select ARCH_HAS_PARANOID_L1D_FLUSH
select BUILDTIME_TABLE_SORT
select CLKEVT_I8253
select CLOCKSOURCE_VALIDATE_LAST_CYCLE
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 3ad8c6d..ec2d5c8 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -252,6 +252,8 @@ DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
DECLARE_STATIC_KEY_FALSE(mds_user_clear);
DECLARE_STATIC_KEY_FALSE(mds_idle_clear);
+DECLARE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
+
#include <asm/segment.h>
/**
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d9afd35..cf13266 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -81,6 +81,7 @@ struct thread_info {
#define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/
#define TIF_SSBD 5 /* Speculative store bypass disable */
#define TIF_SPEC_IB 9 /* Indirect branch speculation mitigation */
+#define TIF_SPEC_L1D_FLUSH 10 /* Flush L1D on mm switches (processes) */
#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
#define TIF_UPROBE 12 /* breakpointed or singlestepping */
#define TIF_PATCH_PENDING 13 /* pending live patching update */
@@ -104,6 +105,7 @@ struct thread_info {
#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
#define _TIF_SSBD (1 << TIF_SSBD)
#define _TIF_SPEC_IB (1 << TIF_SPEC_IB)
+#define _TIF_SPEC_L1D_FLUSH (1 << TIF_SPEC_L1D_FLUSH)
#define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
#define _TIF_UPROBE (1 << TIF_UPROBE)
#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d41b70f..1a5a1b0 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -43,6 +43,7 @@ static void __init mds_select_mitigation(void);
static void __init mds_print_mitigation(void);
static void __init taa_select_mitigation(void);
static void __init srbds_select_mitigation(void);
+static void __init l1d_flush_select_mitigation(void);
/* The base value of the SPEC_CTRL MSR that always has to be preserved. */
u64 x86_spec_ctrl_base;
@@ -76,6 +77,13 @@ EXPORT_SYMBOL_GPL(mds_user_clear);
DEFINE_STATIC_KEY_FALSE(mds_idle_clear);
EXPORT_SYMBOL_GPL(mds_idle_clear);
+/*
+ * Controls whether l1d flush based mitigations are enabled,
+ * based on hw features and admin setting via boot parameter
+ * defaults to false
+ */
+DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush);
+
void __init check_bugs(void)
{
identify_boot_cpu();
@@ -111,6 +119,7 @@ void __init check_bugs(void)
mds_select_mitigation();
taa_select_mitigation();
srbds_select_mitigation();
+ l1d_flush_select_mitigation();
/*
* As MDS and TAA mitigations are inter-related, print MDS
@@ -492,6 +501,34 @@ static int __init srbds_parse_cmdline(char *str)
early_param("srbds", srbds_parse_cmdline);
#undef pr_fmt
+#define pr_fmt(fmt) "L1D Flush : " fmt
+
+enum l1d_flush_mitigations {
+ L1D_FLUSH_OFF = 0,
+ L1D_FLUSH_ON,
+};
+
+static enum l1d_flush_mitigations l1d_flush_mitigation __initdata = L1D_FLUSH_OFF;
+
+static void __init l1d_flush_select_mitigation(void)
+{
+ if (!l1d_flush_mitigation || !boot_cpu_has(X86_FEATURE_FLUSH_L1D))
+ return;
+
+ static_branch_enable(&switch_mm_cond_l1d_flush);
+ pr_info("Conditional flush on switch_mm() enabled\n");
+}
+
+static int __init l1d_flush_parse_cmdline(char *str)
+{
+ if (!strcmp(str, "on"))
+ l1d_flush_mitigation = L1D_FLUSH_ON;
+
+ return 0;
+}
+early_param("l1d_flush", l1d_flush_parse_cmdline);
+
+#undef pr_fmt
#define pr_fmt(fmt) "Spectre V1 : " fmt
enum spectre_v1_mitigation {
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index c98bc84..59ba296 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -8,11 +8,13 @@
#include <linux/export.h>
#include <linux/cpu.h>
#include <linux/debugfs.h>
+#include <linux/sched/smt.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
#include <asm/nospec-branch.h>
#include <asm/cache.h>
+#include <asm/cacheflush.h>
#include <asm/apic.h>
#include <asm/perf_event.h>
@@ -43,11 +45,12 @@
*/
/*
- * Bits to mangle the TIF_SPEC_IB state into the mm pointer which is
+ * Bits to mangle the TIF_SPEC_* state into the mm pointer which is
* stored in cpu_tlb_state.last_user_mm_spec.
*/
#define LAST_USER_MM_IBPB 0x1UL
-#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB)
+#define LAST_USER_MM_L1D_FLUSH 0x2UL
+#define LAST_USER_MM_SPEC_MASK (LAST_USER_MM_IBPB | LAST_USER_MM_L1D_FLUSH)
/* Bits to set when tlbstate and flush is (re)initialized */
#define LAST_USER_MM_INIT LAST_USER_MM_IBPB
@@ -321,11 +324,52 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
local_irq_restore(flags);
}
+/*
+ * Invoked from return to user/guest by a task that opted-in to L1D
+ * flushing but ended up running on an SMT enabled core due to wrong
+ * affinity settings or CPU hotplug. This is part of the paranoid L1D flush
+ * contract which this task requested.
+ */
+static void l1d_flush_force_sigbus(struct callback_head *ch)
+{
+ force_sig(SIGBUS);
+}
+
+static void l1d_flush_evaluate(unsigned long prev_mm, unsigned long next_mm,
+ struct task_struct *next)
+{
+ /* Flush L1D if the outgoing task requests it */
+ if (prev_mm & LAST_USER_MM_L1D_FLUSH)
+ wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
+
+ /* Check whether the incoming task opted in for L1D flush */
+ if (likely(!(next_mm & LAST_USER_MM_L1D_FLUSH)))
+ return;
+
+ /*
+ * Validate that it is not running on an SMT sibling as this would
+ * make the excercise pointless because the siblings share L1D. If
+ * it runs on a SMT sibling, notify it with SIGBUS on return to
+ * user/guest
+ */
+ if (this_cpu_read(cpu_info.smt_active)) {
+ clear_ti_thread_flag(&next->thread_info, TIF_SPEC_L1D_FLUSH);
+ next->l1d_flush_kill.func = l1d_flush_force_sigbus;
+ task_work_add(next, &next->l1d_flush_kill, TWA_RESUME);
+ }
+}
+
static unsigned long mm_mangle_tif_spec_bits(struct task_struct *next)
{
unsigned long next_tif = task_thread_info(next)->flags;
unsigned long spec_bits = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_SPEC_MASK;
+ /*
+ * Ensure that the bit shift above works as expected and the two flags
+ * end up in bit 0 and 1.
+ */
+ BUILD_BUG_ON(TIF_SPEC_L1D_FLUSH != TIF_SPEC_IB + 1);
+
return (unsigned long)next->mm | spec_bits;
}
@@ -403,6 +447,16 @@ static void cond_mitigation(struct task_struct *next)
indirect_branch_prediction_barrier();
}
+ if (static_branch_unlikely(&switch_mm_cond_l1d_flush)) {
+ /*
+ * Flush L1D when the outgoing task requested it and/or
+ * check whether the incoming task requested L1D flushing
+ * and ended up on an SMT sibling.
+ */
+ if (unlikely((prev_mm | next_mm) & LAST_USER_MM_L1D_FLUSH))
+ l1d_flush_evaluate(prev_mm, next_mm, next);
+ }
+
this_cpu_write(cpu_tlbstate.last_user_mm_spec, next_mm);
}
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: x86/cpu] x86/process: Make room for TIF_SPEC_L1D_FLUSH
2021-01-08 12:10 [PATCH v4 0/5] Next revision of the L1D flush patches Balbir Singh
` (6 preceding siblings ...)
2021-07-28 9:58 ` [tip: x86/cpu] x86/mm: Prepare for opt-in based L1D flush in switch_mm() tip-bot2 for Balbir Singh
@ 2021-07-28 9:58 ` tip-bot2 for Balbir Singh
2021-07-28 9:58 ` [tip: x86/cpu] sched: Add task_work callback for paranoid L1D flush tip-bot2 for Balbir Singh
8 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Balbir Singh @ 2021-07-28 9:58 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Balbir Singh, x86, linux-kernel
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 8aacd1eab53ec853c2d29cdc9b64e9dc87d2a519
Gitweb: https://git.kernel.org/tip/8aacd1eab53ec853c2d29cdc9b64e9dc87d2a519
Author: Balbir Singh <sblbir@amazon.com>
AuthorDate: Mon, 26 Apr 2021 22:09:43 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Jul 2021 11:42:24 +02:00
x86/process: Make room for TIF_SPEC_L1D_FLUSH
The upcoming support for paranoid L1D flush in switch_mm() requires that
TIF_SPEC_IB and the new TIF_SPEC_L1D_FLUSH are two consecutive bits in
thread_info::flags.
Move TIF_SPEC_FORCE_UPDATE to a spare bit to make room for the new one.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-1-sblbir@amazon.com
---
arch/x86/include/asm/thread_info.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index de406d9..d9afd35 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -81,7 +81,6 @@ struct thread_info {
#define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/
#define TIF_SSBD 5 /* Speculative store bypass disable */
#define TIF_SPEC_IB 9 /* Indirect branch speculation mitigation */
-#define TIF_SPEC_FORCE_UPDATE 10 /* Force speculation MSR update in context switch */
#define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */
#define TIF_UPROBE 12 /* breakpointed or singlestepping */
#define TIF_PATCH_PENDING 13 /* pending live patching update */
@@ -93,6 +92,7 @@ struct thread_info {
#define TIF_MEMDIE 20 /* is terminating due to OOM killer */
#define TIF_POLLING_NRFLAG 21 /* idle is polling for TIF_NEED_RESCHED */
#define TIF_IO_BITMAP 22 /* uses I/O bitmap */
+#define TIF_SPEC_FORCE_UPDATE 23 /* Force speculation MSR update in context switch */
#define TIF_FORCED_TF 24 /* true if TF in eflags artificially */
#define TIF_BLOCKSTEP 25 /* set when we want DEBUGCTLMSR_BTF */
#define TIF_LAZY_MMU_UPDATES 27 /* task is updating the mmu lazily */
@@ -104,7 +104,6 @@ struct thread_info {
#define _TIF_SINGLESTEP (1 << TIF_SINGLESTEP)
#define _TIF_SSBD (1 << TIF_SSBD)
#define _TIF_SPEC_IB (1 << TIF_SPEC_IB)
-#define _TIF_SPEC_FORCE_UPDATE (1 << TIF_SPEC_FORCE_UPDATE)
#define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY)
#define _TIF_UPROBE (1 << TIF_UPROBE)
#define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING)
@@ -115,6 +114,7 @@ struct thread_info {
#define _TIF_SLD (1 << TIF_SLD)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_IO_BITMAP (1 << TIF_IO_BITMAP)
+#define _TIF_SPEC_FORCE_UPDATE (1 << TIF_SPEC_FORCE_UPDATE)
#define _TIF_FORCED_TF (1 << TIF_FORCED_TF)
#define _TIF_BLOCKSTEP (1 << TIF_BLOCKSTEP)
#define _TIF_LAZY_MMU_UPDATES (1 << TIF_LAZY_MMU_UPDATES)
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [tip: x86/cpu] sched: Add task_work callback for paranoid L1D flush
2021-01-08 12:10 [PATCH v4 0/5] Next revision of the L1D flush patches Balbir Singh
` (7 preceding siblings ...)
2021-07-28 9:58 ` [tip: x86/cpu] x86/process: Make room for TIF_SPEC_L1D_FLUSH tip-bot2 for Balbir Singh
@ 2021-07-28 9:58 ` tip-bot2 for Balbir Singh
8 siblings, 0 replies; 21+ messages in thread
From: tip-bot2 for Balbir Singh @ 2021-07-28 9:58 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Balbir Singh, Thomas Gleixner, x86, linux-kernel
The following commit has been merged into the x86/cpu branch of tip:
Commit-ID: 58e106e725eed59896b9141a1c9a917d2f67962a
Gitweb: https://git.kernel.org/tip/58e106e725eed59896b9141a1c9a917d2f67962a
Author: Balbir Singh <sblbir@amazon.com>
AuthorDate: Mon, 26 Apr 2021 21:59:11 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 28 Jul 2021 11:42:24 +02:00
sched: Add task_work callback for paranoid L1D flush
The upcoming paranoid L1D flush infrastructure allows to conditionally
(opt-in) flush L1D in switch_mm() as a defense against potential new side
channels or for paranoia reasons. As the flush makes only sense when a task
runs on a non-SMT enabled core, because SMT siblings share L1, the
switch_mm() logic will kill a task which is flagged for L1D flush when it
is running on a SMT thread.
Add a taskwork callback so switch_mm() can queue a SIG_KILL command which
is invoked when the task tries to return to user space.
Signed-off-by: Balbir Singh <sblbir@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210108121056.21940-1-sblbir@amazon.com
---
arch/Kconfig | 3 +++
include/linux/sched.h | 10 ++++++++++
2 files changed, 13 insertions(+)
diff --git a/arch/Kconfig b/arch/Kconfig
index 129df49..98db634 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1282,6 +1282,9 @@ config ARCH_SPLIT_ARG64
config ARCH_HAS_ELFCORE_COMPAT
bool
+config ARCH_HAS_PARANOID_L1D_FLUSH
+ bool
+
source "kernel/gcov/Kconfig"
source "scripts/gcc-plugins/Kconfig"
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ec8d07d..c048e59 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1400,6 +1400,16 @@ struct task_struct {
struct llist_head kretprobe_instances;
#endif
+#ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
+ /*
+ * If L1D flush is supported on mm context switch
+ * then we use this callback head to queue kill work
+ * to kill tasks that are not running on SMT disabled
+ * cores
+ */
+ struct callback_head l1d_flush_kill;
+#endif
+
/*
* New fields for task_struct should be added above here, so that
* they are included in the randomized portion of task_struct.
^ permalink raw reply related [flat|nested] 21+ messages in thread