linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/5] x86/speculation: Disable IBRS when idle
@ 2023-06-20 14:06 Waiman Long
  2023-06-20 14:06 ` [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs Waiman Long
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Waiman Long @ 2023-06-20 14:06 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Pawan Gupta,
	Jacob Pan, Len Brown
  Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario, Waiman Long

For Intel processors that need to turn on IBRS to protect against
Spectre v2 and Retbleed, the IBRS bit in the SPEC_CTRL MSR affects
the performance of the whole core even if only one thread is turning
it on when running in the kernel. For user space heavy applications,
the performance impact of occasionally turning IBRS on during syscalls
shouldn't be significant. Unfortunately, that is not the case when the
sibling thread is idling in the kernel. In that case, the performance
impact can be significant.

When DPDK is running on an isolated CPU thread processing network packets
in user space while its sibling thread is idle. The performance of the
busy DPDK thread with IBRS on and off in the sibling idle thread are:

                                IBRS on         IBRS off
                                -------         --------
  packets/second:                  7.8M           10.4M
  avg tsc cycles/packet:         282.26          209.86

This is a 25% performance degradation. The test system is a Intel Xeon
4114 CPU @ 2.20GHz.

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle (C6 or below). However, there
are existing users out there who have set "intel_idle.max_cstate=1" or
even "intel_idle.max_cstate=0" to decrease latency. Those users won't be
able to benefit from this commit. This patch series extends this commit
by providing a new "intel_idle.no_ibrs" module option to force disable
IBRS even when "intel_idle.max_cstate=1" at the expense of increased IRQ
response latency. It also includes commit to allow the disabling of IBRS
with "intel_idle.max_cstate=0" as well as when a CPU becomes offline.

The first patch adds a new x86/spec_ctrl_msrs debugfs file which display
the current cached values of the SPEC_CTRL MSRs of all the CPUs. This
allows us to verify that IBRS bit is correctly turned off in idle CPUs
for various cstate values.

Waiman Long (5):
  x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs
  x86/idle: Disable IBRS when cpu is offline
  intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
  intel_idle: Add no_ibrs module parameter to force disable IBRS
  x86/idle: Disable IBRS entering mwait idle and enable it on wakeup

 arch/x86/include/asm/mwait.h | 17 ++++++++
 arch/x86/kernel/cpu/bugs.c   | 79 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c    | 13 ++++++
 drivers/idle/intel_idle.c    | 22 ++++++++--
 4 files changed, 127 insertions(+), 4 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs
  2023-06-20 14:06 [PATCH v2 0/5] x86/speculation: Disable IBRS when idle Waiman Long
@ 2023-06-20 14:06 ` Waiman Long
  2023-06-21  7:41   ` Peter Zijlstra
  2023-06-20 14:06 ` [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline Waiman Long
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2023-06-20 14:06 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Pawan Gupta,
	Jacob Pan, Len Brown
  Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario, Waiman Long

Sometimes it is useful to know the states the SPEC_CTRL MSRs to see what
mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 arch/x86/kernel/cpu/bugs.c | 79 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 182af64387d0..f6e5910a4a2d 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -17,6 +17,7 @@
 #include <linux/sched/smt.h>
 #include <linux/pgtable.h>
 #include <linux/bpf.h>
+#include <linux/debugfs.h>
 
 #include <asm/spec-ctrl.h>
 #include <asm/cmdline.h>
@@ -1733,6 +1734,84 @@ void cpu_bugs_smt_update(void)
 	mutex_unlock(&spec_ctrl_mutex);
 }
 
+#ifdef CONFIG_DEBUG_FS
+/*
+ * Provide a debugfs file to dump SPEC_CTRL MSRs of all the CPUs
+ * Consecutive MSR values are collapsed together if they are the same.
+ */
+static ssize_t spec_ctrl_msrs_read(struct file *file, char __user *user_buf,
+				   size_t count, loff_t *ppos)
+{
+	int bufsiz = min(count, PAGE_SIZE);
+	int cpu, prev_cpu, len, cnt = 0;
+	u64 val, prev_val;
+	char *buf;
+
+	/*
+	 * The MSRs info should be small enough that the whole buffer is
+	 * copied out in one call. However, user space may read it again
+	 * to see if there is any data left. Rereading the cached SPEC_CTRL
+	 * MSR values may produce a different result causing corruption in
+	 * output data. So skipping the call if *ppos is not starting from 0.
+	 */
+	if (*ppos)
+		return 0;
+
+	buf = kmalloc(bufsiz, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		val = per_cpu(x86_spec_ctrl_current, cpu);
+
+		if (!cpu)
+			goto next;
+
+		if (val == prev_val)
+			continue;
+
+		if (prev_cpu == cpu - 1)
+			len = snprintf(buf + cnt, bufsiz - cnt, "CPU  %d: 0x%llx\n",
+				       prev_cpu, prev_val);
+		else
+			len = snprintf(buf + cnt, bufsiz - cnt, "CPUs %d-%d: 0x%llx\n",
+					prev_cpu, cpu - 1, prev_val);
+
+		cnt += len;
+		if (!len)
+			break;	/* Out of buffer */
+next:
+		prev_cpu = cpu;
+		prev_val = val;
+	}
+
+	if (prev_cpu == cpu - 1)
+		cnt += snprintf(buf + cnt, bufsiz - cnt, "CPU  %d: 0x%llx\n",
+			       prev_cpu, prev_val);
+	else
+		cnt += snprintf(buf + cnt, bufsiz - cnt, "CPUs %d-%d: 0x%llx\n",
+				prev_cpu, cpu - 1, prev_val);
+
+	count = simple_read_from_buffer(user_buf, count, ppos, buf, cnt);
+	kfree(buf);
+	return count;
+}
+
+static const struct file_operations fops_spec_ctrl = {
+	.read = spec_ctrl_msrs_read,
+	.llseek = default_llseek,
+};
+
+static int __init init_spec_ctrl_debugfs(void)
+{
+	if (!debugfs_create_file("spec_ctrl_msrs", 0400, arch_debugfs_dir,
+				 NULL, &fops_spec_ctrl))
+		return -ENOMEM;
+	return 0;
+}
+fs_initcall(init_spec_ctrl_debugfs);
+#endif
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"Speculative Store Bypass: " fmt
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline
  2023-06-20 14:06 [PATCH v2 0/5] x86/speculation: Disable IBRS when idle Waiman Long
  2023-06-20 14:06 ` [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs Waiman Long
@ 2023-06-20 14:06 ` Waiman Long
  2023-06-21  7:23   ` Peter Zijlstra
  2023-06-20 14:06 ` [PATCH v2 3/5] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current Waiman Long
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2023-06-20 14:06 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Pawan Gupta,
	Jacob Pan, Len Brown
  Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario, Waiman Long

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle. However, when a CPU becomes
offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
enabled. That will impact the performance of a sibling CPU. Mitigate
this performance impact by clearing all the mitigation bits in SPEC_CTRL
MSR when offline and restoring the value of the MSR when it becomes
online again.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 arch/x86/kernel/smpboot.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 352f0ce1ece4..5ff82fef413c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -84,6 +84,7 @@
 #include <asm/hw_irq.h>
 #include <asm/stackprotector.h>
 #include <asm/sev.h>
+#include <asm/nospec-branch.h>
 
 /* representing HT siblings of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
@@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
 
 void native_play_dead(void)
 {
+	u64 spec_ctrl = spec_ctrl_current();
+
+	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
+		this_cpu_write(x86_spec_ctrl_current, 0);
+		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+	}
+
 	play_dead_common();
 	tboot_shutdown(TB_SHUTDOWN_WFS);
 
 	mwait_play_dead();
 	if (cpuidle_play_dead())
 		hlt_play_dead();
+
+	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
+		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+		this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
+	}
 }
 
 #else /* ... !CONFIG_HOTPLUG_CPU */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/5] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
  2023-06-20 14:06 [PATCH v2 0/5] x86/speculation: Disable IBRS when idle Waiman Long
  2023-06-20 14:06 ` [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs Waiman Long
  2023-06-20 14:06 ` [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline Waiman Long
@ 2023-06-20 14:06 ` Waiman Long
  2023-06-21  7:38   ` Peter Zijlstra
  2023-06-20 14:06 ` [PATCH v2 4/5] intel_idle: Add no_ibrs module parameter to force disable IBRS Waiman Long
  2023-06-20 14:06 ` [PATCH v2 5/5] x86/idle: Disable IBRS entering mwait idle and enable it on wakeup Waiman Long
  4 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2023-06-20 14:06 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Pawan Gupta,
	Jacob Pan, Len Brown
  Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario, Waiman Long

When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to 0
in order disable IBRS. However, the new MSR value isn't reflected in
x86_spec_ctrl_current. That will cause the new spec_ctrl_msrs debugfs
file to show incorrect result. Fix that by updating x86_spec_ctrl_current
percpu value to always match the content of the SPEC_CTRL MSR.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 drivers/idle/intel_idle.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index aa2d19db2b1d..07fa23707b3c 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -181,13 +181,17 @@ static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
 	u64 spec_ctrl = spec_ctrl_current();
 	int ret;
 
-	if (smt_active)
+	if (smt_active) {
+		__this_cpu_write(x86_spec_ctrl_current, 0);
 		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+	}
 
 	ret = __intel_idle(dev, drv, index);
 
-	if (smt_active)
+	if (smt_active) {
 		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+		__this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
+	}
 
 	return ret;
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 4/5] intel_idle: Add no_ibrs module parameter to force disable IBRS
  2023-06-20 14:06 [PATCH v2 0/5] x86/speculation: Disable IBRS when idle Waiman Long
                   ` (2 preceding siblings ...)
  2023-06-20 14:06 ` [PATCH v2 3/5] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current Waiman Long
@ 2023-06-20 14:06 ` Waiman Long
  2023-06-21  7:30   ` Peter Zijlstra
  2023-06-20 14:06 ` [PATCH v2 5/5] x86/idle: Disable IBRS entering mwait idle and enable it on wakeup Waiman Long
  4 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2023-06-20 14:06 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Pawan Gupta,
	Jacob Pan, Len Brown
  Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario, Waiman Long

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the cstate is 6 or lower. However, there are
some use cases where a customer may want to use max_cstate=1 to
lower latency. Such use cases will suffer from the performance
degradation caused by the enabling of IBRS in the sibling idle thread.
Add a "no_ibrs" module parameter to force disable IBRS and the
CPUIDLE_FLAG_IRQ_ENABLE flag if set.

In the case of a Skylake server with max_cstate=1, this new no_ibrs
option will increase the IRQ response latency as IRQ will now be
disabled.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 drivers/idle/intel_idle.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 07fa23707b3c..366dacccc971 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -69,6 +69,7 @@ static int max_cstate = CPUIDLE_STATE_MAX - 1;
 static unsigned int disabled_states_mask __read_mostly;
 static unsigned int preferred_states_mask __read_mostly;
 static bool force_irq_on __read_mostly;
+static bool no_ibrs __read_mostly;
 
 static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
 
@@ -1907,12 +1908,15 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
 			WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
 			state->enter = intel_idle_xstate;
 		} else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
-			   state->flags & CPUIDLE_FLAG_IBRS) {
+			  ((state->flags & CPUIDLE_FLAG_IBRS) || no_ibrs)) {
 			/*
 			 * IBRS mitigation requires that C-states are entered
 			 * with interrupts disabled.
 			 */
-			WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
+			if (no_ibrs && (state->flags & CPUIDLE_FLAG_IRQ_ENABLE))
+				state->flags &= ~CPUIDLE_FLAG_IRQ_ENABLE;
+			else
+				WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
 			state->enter = intel_idle_ibrs;
 		} else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
 			state->enter = intel_idle_irq;
@@ -2165,3 +2169,9 @@ MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
  * 'CPUIDLE_FLAG_INIT_XSTATE' and 'CPUIDLE_FLAG_IBRS' flags.
  */
 module_param(force_irq_on, bool, 0444);
+/*
+ * Force the disabling of IBRS when X86_FEATURE_KERNEL_IBRS is on and
+ * CPUIDLE_FLAG_IRQ_ENABLE isn't set.
+ */
+module_param(no_ibrs, bool, 0444);
+MODULE_PARM_DESC(no_ibrs, "Disable IBRS when idle");
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/5] x86/idle: Disable IBRS entering mwait idle and enable it on wakeup
  2023-06-20 14:06 [PATCH v2 0/5] x86/speculation: Disable IBRS when idle Waiman Long
                   ` (3 preceding siblings ...)
  2023-06-20 14:06 ` [PATCH v2 4/5] intel_idle: Add no_ibrs module parameter to force disable IBRS Waiman Long
@ 2023-06-20 14:06 ` Waiman Long
  2023-06-21  7:32   ` Peter Zijlstra
  4 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2023-06-20 14:06 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Josh Poimboeuf, Pawan Gupta,
	Jacob Pan, Len Brown
  Cc: linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario, Waiman Long

When a user sets "intel_idle.max_cstate=0", it will disable
intel_idle and fall back to acpi_idle instead. The acpi_idle code
will then call mwait_idle_with_hints() to enter idle state. So when
X86_FEATURE_KERNEL_IBRS is enabled, it is necessary to disable IBRS
within mwait_idle_with_hints() when IRQ was disabled to avoid performance
degradation on silbing thread running user workload.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 arch/x86/include/asm/mwait.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 778df05f8539..1e36cdc21661 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -108,15 +108,32 @@ static __always_inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 static __always_inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 {
 	if (static_cpu_has_bug(X86_BUG_MONITOR) || !current_set_polling_and_test()) {
+		bool ibrs_disabled = false;
+		u64 spec_ctrl;
+
 		if (static_cpu_has_bug(X86_BUG_CLFLUSH_MONITOR)) {
 			mb();
 			clflush((void *)&current_thread_info()->flags);
 			mb();
 		}
 
+		if (irqs_disabled() &&
+		    cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
+			/* NMI always enable IBRS on exception entry */
+			ibrs_disabled = true;
+			spec_ctrl = spec_ctrl_current();
+			__this_cpu_write(x86_spec_ctrl_current, 0);
+			native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
+		}
+
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
 		if (!need_resched())
 			__mwait(eax, ecx);
+
+		if (ibrs_disabled) {
+			native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
+			__this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
+		}
 	}
 	current_clr_polling();
 }
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline
  2023-06-20 14:06 ` [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline Waiman Long
@ 2023-06-21  7:23   ` Peter Zijlstra
  2023-06-21 13:59     ` Waiman Long
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-21  7:23 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> disables IBRS when the CPU enters long idle. However, when a CPU becomes
> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
> enabled. That will impact the performance of a sibling CPU. Mitigate
> this performance impact by clearing all the mitigation bits in SPEC_CTRL
> MSR when offline and restoring the value of the MSR when it becomes
> online again.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  arch/x86/kernel/smpboot.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 352f0ce1ece4..5ff82fef413c 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -84,6 +84,7 @@
>  #include <asm/hw_irq.h>
>  #include <asm/stackprotector.h>
>  #include <asm/sev.h>
> +#include <asm/nospec-branch.h>
>  
>  /* representing HT siblings of each logical CPU */
>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>  
>  void native_play_dead(void)
>  {
> +	u64 spec_ctrl = spec_ctrl_current();
> +
> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> +		this_cpu_write(x86_spec_ctrl_current, 0);
> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> +	}
> +
>  	play_dead_common();
>  	tboot_shutdown(TB_SHUTDOWN_WFS);
>  
>  	mwait_play_dead();
>  	if (cpuidle_play_dead())
>  		hlt_play_dead();
> +
> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> +		this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> +	}
>  }

play_dead() is marked __noreturn

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 4/5] intel_idle: Add no_ibrs module parameter to force disable IBRS
  2023-06-20 14:06 ` [PATCH v2 4/5] intel_idle: Add no_ibrs module parameter to force disable IBRS Waiman Long
@ 2023-06-21  7:30   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-21  7:30 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On Tue, Jun 20, 2023 at 10:06:24AM -0400, Waiman Long wrote:
> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> disables IBRS when the cstate is 6 or lower. However, there are
> some use cases where a customer may want to use max_cstate=1 to
> lower latency. 

And then add the WRMSRs to increase latency again...

Since you're mucking about with all this, perhaps see if you can measure
the latency impact of all this.

> Such use cases will suffer from the performance
> degradation caused by the enabling of IBRS in the sibling idle thread.
> Add a "no_ibrs" module parameter to force disable IBRS and the
> CPUIDLE_FLAG_IRQ_ENABLE flag if set.
> 
> In the case of a Skylake server with max_cstate=1, this new no_ibrs
> option will increase the IRQ response latency as IRQ will now be
> disabled.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  drivers/idle/intel_idle.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 07fa23707b3c..366dacccc971 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -69,6 +69,7 @@ static int max_cstate = CPUIDLE_STATE_MAX - 1;
>  static unsigned int disabled_states_mask __read_mostly;
>  static unsigned int preferred_states_mask __read_mostly;
>  static bool force_irq_on __read_mostly;
> +static bool no_ibrs __read_mostly;
>  
>  static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
>  
> @@ -1907,12 +1908,15 @@ static void __init intel_idle_init_cstates_icpu(struct cpuidle_driver *drv)
>  			WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
>  			state->enter = intel_idle_xstate;
>  		} else if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) &&
> -			   state->flags & CPUIDLE_FLAG_IBRS) {
> +			  ((state->flags & CPUIDLE_FLAG_IBRS) || no_ibrs)) {
>  			/*
>  			 * IBRS mitigation requires that C-states are entered
>  			 * with interrupts disabled.
>  			 */
> -			WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
> +			if (no_ibrs && (state->flags & CPUIDLE_FLAG_IRQ_ENABLE))
> +				state->flags &= ~CPUIDLE_FLAG_IRQ_ENABLE;
> +			else
> +				WARN_ON_ONCE(state->flags & CPUIDLE_FLAG_IRQ_ENABLE);
>  			state->enter = intel_idle_ibrs;
>  		} else if (state->flags & CPUIDLE_FLAG_IRQ_ENABLE) {
>  			state->enter = intel_idle_irq;
> @@ -2165,3 +2169,9 @@ MODULE_PARM_DESC(preferred_cstates, "Mask of preferred idle states");
>   * 'CPUIDLE_FLAG_INIT_XSTATE' and 'CPUIDLE_FLAG_IBRS' flags.
>   */
>  module_param(force_irq_on, bool, 0444);
> +/*
> + * Force the disabling of IBRS when X86_FEATURE_KERNEL_IBRS is on and
> + * CPUIDLE_FLAG_IRQ_ENABLE isn't set.
> + */
> +module_param(no_ibrs, bool, 0444);
> +MODULE_PARM_DESC(no_ibrs, "Disable IBRS when idle");

Urgghhh.. the flag is CPUIDLE_FLAG_IBRS, so "no_ibrs" implies clearing
that flag.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] x86/idle: Disable IBRS entering mwait idle and enable it on wakeup
  2023-06-20 14:06 ` [PATCH v2 5/5] x86/idle: Disable IBRS entering mwait idle and enable it on wakeup Waiman Long
@ 2023-06-21  7:32   ` Peter Zijlstra
  2023-06-21 14:05     ` Waiman Long
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-21  7:32 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On Tue, Jun 20, 2023 at 10:06:25AM -0400, Waiman Long wrote:
> When a user sets "intel_idle.max_cstate=0", it will disable
> intel_idle and fall back to acpi_idle instead. The acpi_idle code
> will then call mwait_idle_with_hints() to enter idle state. So when
> X86_FEATURE_KERNEL_IBRS is enabled, it is necessary to disable IBRS
> within mwait_idle_with_hints() when IRQ was disabled to avoid performance
> degradation on silbing thread running user workload.

Urgh, no, just no. This is nasty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 3/5] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
  2023-06-20 14:06 ` [PATCH v2 3/5] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current Waiman Long
@ 2023-06-21  7:38   ` Peter Zijlstra
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-21  7:38 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On Tue, Jun 20, 2023 at 10:06:23AM -0400, Waiman Long wrote:
> When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to 0
> in order disable IBRS. However, the new MSR value isn't reflected in
> x86_spec_ctrl_current. That will cause the new spec_ctrl_msrs debugfs
> file to show incorrect result. Fix that by updating x86_spec_ctrl_current
> percpu value to always match the content of the SPEC_CTRL MSR.

What debugfs file?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs
  2023-06-20 14:06 ` [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs Waiman Long
@ 2023-06-21  7:41   ` Peter Zijlstra
  2023-06-21  8:24     ` Borislav Petkov
  2023-06-21 13:47     ` Waiman Long
  0 siblings, 2 replies; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-21  7:41 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
> Sometimes it is useful to know the states the SPEC_CTRL MSRs to see what
> mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
> debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
> 

Pff, clearly I can't even read email anymore..

We don't do this for any of the other MSRs, so why start now?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs
  2023-06-21  7:41   ` Peter Zijlstra
@ 2023-06-21  8:24     ` Borislav Petkov
  2023-06-21 14:02       ` Waiman Long
  2023-06-21 13:47     ` Waiman Long
  1 sibling, 1 reply; 22+ messages in thread
From: Borislav Petkov @ 2023-06-21  8:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Thomas Gleixner, Ingo Molnar, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On Wed, Jun 21, 2023 at 09:41:05AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
> > Sometimes it is useful to know the states the SPEC_CTRL MSRs to see what
> > mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
> > debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
> > 
> 
> Pff, clearly I can't even read email anymore..
> 
> We don't do this for any of the other MSRs, so why start now?

Hell no.

There's /sys/devices/system/cpu/vulnerabilities/ for that.

We are abstracting MSRs away from APIs - not do the backwards thing.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs
  2023-06-21  7:41   ` Peter Zijlstra
  2023-06-21  8:24     ` Borislav Petkov
@ 2023-06-21 13:47     ` Waiman Long
  1 sibling, 0 replies; 22+ messages in thread
From: Waiman Long @ 2023-06-21 13:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario


On 6/21/23 03:41, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
>> Sometimes it is useful to know the states the SPEC_CTRL MSRs to see what
>> mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
>> debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
>>
> Pff, clearly I can't even read email anymore..
>
> We don't do this for any of the other MSRs, so why start now?

That is true since most of the MSRs are static. IOW, they don't change 
once they are set. The current way to read the content of the MSRs is 
via the /dev/cpu/<n>/msr files. There are user space tools to do that.

SPEC_CTRL, however, can be subjected to frequent changes especially when 
X86_FEATURE_KERNEL_IBRS is set. As a result, the current way of reading 
MSRs from /dev/cpu/<n>/msr doesn't quite work for SPEC_CTRL as the IBRS 
bit is always set due to the fact that the reading is done internally 
via an IPI in kernel space. That is the main reason that I add this 
debugfs file to get a good snapshot of the current set of cached 
SPEC_CTRL MSR values without the need to disturb what the CPUs are 
currently doing at that point in time.

This patch is not central to the main purpose of this patch series, but 
it does enable me to quickly verify the other patches are working 
correctly. I can take it out if people don't think it is a good idea.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline
  2023-06-21  7:23   ` Peter Zijlstra
@ 2023-06-21 13:59     ` Waiman Long
  2023-06-21 14:36       ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2023-06-21 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario


On 6/21/23 03:23, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>> disables IBRS when the CPU enters long idle. However, when a CPU becomes
>> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
>> enabled. That will impact the performance of a sibling CPU. Mitigate
>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>> MSR when offline and restoring the value of the MSR when it becomes
>> online again.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   arch/x86/kernel/smpboot.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 352f0ce1ece4..5ff82fef413c 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -84,6 +84,7 @@
>>   #include <asm/hw_irq.h>
>>   #include <asm/stackprotector.h>
>>   #include <asm/sev.h>
>> +#include <asm/nospec-branch.h>
>>   
>>   /* representing HT siblings of each logical CPU */
>>   DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>>   
>>   void native_play_dead(void)
>>   {
>> +	u64 spec_ctrl = spec_ctrl_current();
>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> +		this_cpu_write(x86_spec_ctrl_current, 0);
>> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>> +	}
>> +
>>   	play_dead_common();
>>   	tboot_shutdown(TB_SHUTDOWN_WFS);
>>   
>>   	mwait_play_dead();
>>   	if (cpuidle_play_dead())
>>   		hlt_play_dead();
>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>> +		this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>> +	}
>>   }
> play_dead() is marked __noreturn

There are different versions of play_dead() in the kernel. Some of them 
are indeed marked __noreturn like the non-SMP one in 
arch/x86/kernel/process.c. The native_play_dead() that I am patching 
isn't one of those.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs
  2023-06-21  8:24     ` Borislav Petkov
@ 2023-06-21 14:02       ` Waiman Long
  0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2023-06-21 14:02 UTC (permalink / raw)
  To: Borislav Petkov, Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, H. Peter Anvin,
	Josh Poimboeuf, Pawan Gupta, Jacob Pan, Len Brown, linux-kernel,
	x86, linux-pm, Robin Jarry, Joe Mario


On 6/21/23 04:24, Borislav Petkov wrote:
> On Wed, Jun 21, 2023 at 09:41:05AM +0200, Peter Zijlstra wrote:
>> On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
>>> Sometimes it is useful to know the states the SPEC_CTRL MSRs to see what
>>> mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
>>> debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
>>>
>> Pff, clearly I can't even read email anymore..
>>
>> We don't do this for any of the other MSRs, so why start now?
> Hell no.
>
> There's /sys/devices/system/cpu/vulnerabilities/ for that.
>
> We are abstracting MSRs away from APIs - not do the backwards thing.
>
OK, as I have said. This is not central to the main purpose of this 
patch series. It is mostly there for verification purpose. I can 
certainly take this out.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] x86/idle: Disable IBRS entering mwait idle and enable it on wakeup
  2023-06-21  7:32   ` Peter Zijlstra
@ 2023-06-21 14:05     ` Waiman Long
  0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2023-06-21 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On 6/21/23 03:32, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:25AM -0400, Waiman Long wrote:
>> When a user sets "intel_idle.max_cstate=0", it will disable
>> intel_idle and fall back to acpi_idle instead. The acpi_idle code
>> will then call mwait_idle_with_hints() to enter idle state. So when
>> X86_FEATURE_KERNEL_IBRS is enabled, it is necessary to disable IBRS
>> within mwait_idle_with_hints() when IRQ was disabled to avoid performance
>> degradation on silbing thread running user workload.
> Urgh, no, just no. This is nasty.

OK, will take this out in the next version.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline
  2023-06-21 13:59     ` Waiman Long
@ 2023-06-21 14:36       ` Peter Zijlstra
  2023-06-21 14:44         ` Waiman Long
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-21 14:36 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On Wed, Jun 21, 2023 at 09:59:52AM -0400, Waiman Long wrote:
> 
> On 6/21/23 03:23, Peter Zijlstra wrote:
> > On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
> > > Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> > > disables IBRS when the CPU enters long idle. However, when a CPU becomes
> > > offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
> > > enabled. That will impact the performance of a sibling CPU. Mitigate
> > > this performance impact by clearing all the mitigation bits in SPEC_CTRL
> > > MSR when offline and restoring the value of the MSR when it becomes
> > > online again.
> > > 
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > ---
> > >   arch/x86/kernel/smpboot.c | 13 +++++++++++++
> > >   1 file changed, 13 insertions(+)
> > > 
> > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > > index 352f0ce1ece4..5ff82fef413c 100644
> > > --- a/arch/x86/kernel/smpboot.c
> > > +++ b/arch/x86/kernel/smpboot.c
> > > @@ -84,6 +84,7 @@
> > >   #include <asm/hw_irq.h>
> > >   #include <asm/stackprotector.h>
> > >   #include <asm/sev.h>
> > > +#include <asm/nospec-branch.h>
> > >   /* representing HT siblings of each logical CPU */
> > >   DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> > > @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
> > >   void native_play_dead(void)
> > >   {
> > > +	u64 spec_ctrl = spec_ctrl_current();
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> > > +		this_cpu_write(x86_spec_ctrl_current, 0);
> > > +		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> > > +	}
> > > +
> > >   	play_dead_common();
> > >   	tboot_shutdown(TB_SHUTDOWN_WFS);
> > >   	mwait_play_dead();
> > >   	if (cpuidle_play_dead())
> > >   		hlt_play_dead();
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> > > +		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> > > +		this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> > > +	}
> > >   }
> > play_dead() is marked __noreturn
> 
> There are different versions of play_dead() in the kernel. Some of them are
> indeed marked __noreturn like the non-SMP one in arch/x86/kernel/process.c.
> The native_play_dead() that I am patching isn't one of those.

mostly by accident I think, hlt_play_dead() is, so I'm thinking
everybody (all compiler and objtool) managed to figure out
native_play_dead() is __noreturn too.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline
  2023-06-21 14:36       ` Peter Zijlstra
@ 2023-06-21 14:44         ` Waiman Long
  2023-06-21 14:48           ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2023-06-21 14:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On 6/21/23 10:36, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 09:59:52AM -0400, Waiman Long wrote:
>> On 6/21/23 03:23, Peter Zijlstra wrote:
>>> On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
>>>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>>>> disables IBRS when the CPU enters long idle. However, when a CPU becomes
>>>> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
>>>> enabled. That will impact the performance of a sibling CPU. Mitigate
>>>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>>>> MSR when offline and restoring the value of the MSR when it becomes
>>>> online again.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>    arch/x86/kernel/smpboot.c | 13 +++++++++++++
>>>>    1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>>>> index 352f0ce1ece4..5ff82fef413c 100644
>>>> --- a/arch/x86/kernel/smpboot.c
>>>> +++ b/arch/x86/kernel/smpboot.c
>>>> @@ -84,6 +84,7 @@
>>>>    #include <asm/hw_irq.h>
>>>>    #include <asm/stackprotector.h>
>>>>    #include <asm/sev.h>
>>>> +#include <asm/nospec-branch.h>
>>>>    /* representing HT siblings of each logical CPU */
>>>>    DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>>>> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>>>>    void native_play_dead(void)
>>>>    {
>>>> +	u64 spec_ctrl = spec_ctrl_current();
>>>> +
>>>> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>>>> +		this_cpu_write(x86_spec_ctrl_current, 0);
>>>> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>>>> +	}
>>>> +
>>>>    	play_dead_common();
>>>>    	tboot_shutdown(TB_SHUTDOWN_WFS);
>>>>    	mwait_play_dead();
>>>>    	if (cpuidle_play_dead())
>>>>    		hlt_play_dead();
>>>> +
>>>> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>>>> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>>>> +		this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>>>> +	}
>>>>    }
>>> play_dead() is marked __noreturn
>> There are different versions of play_dead() in the kernel. Some of them are
>> indeed marked __noreturn like the non-SMP one in arch/x86/kernel/process.c.
>> The native_play_dead() that I am patching isn't one of those.
> mostly by accident I think, hlt_play_dead() is, so I'm thinking
> everybody (all compiler and objtool) managed to figure out
> native_play_dead() is __noreturn too.

Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an 
error which is not the typical case. My testing does confirm that this 
patch is able to keep the IBRS bit off when a CPU is offline via its 
online sysfs file.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline
  2023-06-21 14:44         ` Waiman Long
@ 2023-06-21 14:48           ` Peter Zijlstra
  2023-06-21 14:51             ` Waiman Long
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-21 14:48 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:

> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
> which is not the typical case. My testing does confirm that this patch is
> able to keep the IBRS bit off when a CPU is offline via its online sysfs
> file.

The point is; your re-enable IBRS hunk at the end is dead-code. It
should never ever run and having it is confusing.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline
  2023-06-21 14:48           ` Peter Zijlstra
@ 2023-06-21 14:51             ` Waiman Long
  2023-06-21 14:54               ` Peter Zijlstra
  0 siblings, 1 reply; 22+ messages in thread
From: Waiman Long @ 2023-06-21 14:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario


On 6/21/23 10:48, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
>
>> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
>> which is not the typical case. My testing does confirm that this patch is
>> able to keep the IBRS bit off when a CPU is offline via its online sysfs
>> file.
> The point is; your re-enable IBRS hunk at the end is dead-code. It
> should never ever run and having it is confusing.

What I meant is that hlt_play_dead() should never be called unless there 
is some serious problem with the system and native_play_dead() does 
return in normal usage.

Cheers,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline
  2023-06-21 14:51             ` Waiman Long
@ 2023-06-21 14:54               ` Peter Zijlstra
  2023-06-21 15:42                 ` Waiman Long
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Zijlstra @ 2023-06-21 14:54 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On Wed, Jun 21, 2023 at 10:51:33AM -0400, Waiman Long wrote:
> 
> On 6/21/23 10:48, Peter Zijlstra wrote:
> > On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
> > 
> > > Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
> > > which is not the typical case. My testing does confirm that this patch is
> > > able to keep the IBRS bit off when a CPU is offline via its online sysfs
> > > file.
> > The point is; your re-enable IBRS hunk at the end is dead-code. It
> > should never ever run and having it is confusing.
> 
> What I meant is that hlt_play_dead() should never be called unless there is
> some serious problem with the system and native_play_dead() does return in
> normal usage.

This is all through arch_cpu_idle_dead() which is __noreturn. And no,
none of this must ever return.

If you want an offline CPU to come back, you re-init.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline
  2023-06-21 14:54               ` Peter Zijlstra
@ 2023-06-21 15:42                 ` Waiman Long
  0 siblings, 0 replies; 22+ messages in thread
From: Waiman Long @ 2023-06-21 15:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Josh Poimboeuf, Pawan Gupta, Jacob Pan,
	Len Brown, linux-kernel, x86, linux-pm, Robin Jarry, Joe Mario

On 6/21/23 10:54, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:51:33AM -0400, Waiman Long wrote:
>> On 6/21/23 10:48, Peter Zijlstra wrote:
>>> On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
>>>
>>>> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
>>>> which is not the typical case. My testing does confirm that this patch is
>>>> able to keep the IBRS bit off when a CPU is offline via its online sysfs
>>>> file.
>>> The point is; your re-enable IBRS hunk at the end is dead-code. It
>>> should never ever run and having it is confusing.
>> What I meant is that hlt_play_dead() should never be called unless there is
>> some serious problem with the system and native_play_dead() does return in
>> normal usage.
> This is all through arch_cpu_idle_dead() which is __noreturn. And no,
> none of this must ever return.
>
> If you want an offline CPU to come back, you re-init.

Yes, you are right. I thought it will return. I will update the patch 
accordingly.

Thanks,
Longman


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2023-06-21 15:43 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-20 14:06 [PATCH v2 0/5] x86/speculation: Disable IBRS when idle Waiman Long
2023-06-20 14:06 ` [PATCH v2 1/5] x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs Waiman Long
2023-06-21  7:41   ` Peter Zijlstra
2023-06-21  8:24     ` Borislav Petkov
2023-06-21 14:02       ` Waiman Long
2023-06-21 13:47     ` Waiman Long
2023-06-20 14:06 ` [PATCH v2 2/5] x86/idle: Disable IBRS when cpu is offline Waiman Long
2023-06-21  7:23   ` Peter Zijlstra
2023-06-21 13:59     ` Waiman Long
2023-06-21 14:36       ` Peter Zijlstra
2023-06-21 14:44         ` Waiman Long
2023-06-21 14:48           ` Peter Zijlstra
2023-06-21 14:51             ` Waiman Long
2023-06-21 14:54               ` Peter Zijlstra
2023-06-21 15:42                 ` Waiman Long
2023-06-20 14:06 ` [PATCH v2 3/5] intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current Waiman Long
2023-06-21  7:38   ` Peter Zijlstra
2023-06-20 14:06 ` [PATCH v2 4/5] intel_idle: Add no_ibrs module parameter to force disable IBRS Waiman Long
2023-06-21  7:30   ` Peter Zijlstra
2023-06-20 14:06 ` [PATCH v2 5/5] x86/idle: Disable IBRS entering mwait idle and enable it on wakeup Waiman Long
2023-06-21  7:32   ` Peter Zijlstra
2023-06-21 14:05     ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).