linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] x86/bus_lock: Enable bus lock detection
@ 2020-07-17 21:35 Fenghua Yu
  2020-07-29  3:02 ` Sean Christopherson
  2020-07-29  8:49 ` peterz
  0 siblings, 2 replies; 16+ messages in thread
From: Fenghua Yu @ 2020-07-17 21:35 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Tony Luck, H Peter Anvin, Andy Lutomirski, Ravi V Shankar,
	Xiaoyao Li
  Cc: x86, linux-kernel, Fenghua Yu

A bus lock [1] is acquired either through split locked access to writeback (WB)
memory or by using locks to uncacheable (UC) memory (e.g. direct device
assignment). This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores.

Although split lock can be detected by #AC trap, the trap is triggered
before the instruction acquires bus lock. This makes it difficult to
mitigate bus lock (e.g. throttle the user application).

Some CPUs have ability to notify the kernel by an #DB trap after the
instruction acquires a bus lock and is executed. This allows the kernel
to enforce user application throttling or mitigations and also provides
a better environment to debug kernel split lock issues since the kernel
can continue instead of crashing.

#DB for bus lock detect fixes all issues in #AC for split lock detect:
1) It's architectural ... just need to look at one CPUID bit to know it
   exists
2) The IA32_DEBUGCTL MSR, which reports bus lock in #DB, is per-thread.
   So each process or guest can have different behavior.
3) It has support for VMM/guests (new VMEXIT codes, etc).

Use the existing kernel command line option "split_lock_detect=" to handle
#DB for bus lock:

split_lock_detect=
		#AC for split lock		#DB for bus lock

off		Do nothing			Do nothing

warn		Kernel OOPs			Kernel warns rate limited
		Warn once per task and		and continues to run.
		disable future checking 	Warn once per task and
						and continue to run.
						When both features are
						supported, warn in #DB

fatal		Kernel OOPs			Kernel warn rate limited
		Send SIGBUS to user		Send SIGBUS to user
		When both features are
		supported, fatal in #AC.

ratelimit:N	Do nothing			Kernel warns rate limited
						and continue to run.
						Limit bus lock rate to
						N per second in the
						current non root user.

On systems that support #DB for bus lock detection the default is "warn".

[1] Chapter 8 https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  48 +++++-
 arch/x86/include/asm/cpu.h                    |  16 +-
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   1 +
 arch/x86/include/uapi/asm/debugreg.h          |   3 +-
 arch/x86/kernel/cpu/common.c                  |   2 +-
 arch/x86/kernel/cpu/intel.c                   | 156 +++++++++++++++---
 arch/x86/kernel/traps.c                       |  10 ++
 include/linux/sched/user.h                    |   4 +-
 kernel/user.c                                 |   7 +
 10 files changed, 214 insertions(+), 34 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb95fad81c79..7a1cb6fe8b8e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4816,27 +4816,59 @@
 	spia_peddr=
 
 	split_lock_detect=
-			[X86] Enable split lock detection
+			[X86] Enable split lock detection or bus lock detection
 
 			When enabled (and if hardware support is present), atomic
 			instructions that access data across cache line
-			boundaries will result in an alignment check exception.
+			boundaries will result in an alignment check exception
+			for split lock detection or an debug exception for
+			bus lock detection.
 
 			off	- not enabled
 
-			warn	- the kernel will emit rate limited warnings
-				  about applications triggering the #AC
-				  exception. This mode is the default on CPUs
-				  that supports split lock detection.
+			warn	- Default mode.
 
-			fatal	- the kernel will send SIGBUS to applications
-				  that trigger the #AC exception.
+				  If split lock detection is enabled in
+				  hardware, the kernel will emit rate limited
+				  warnings about applications triggering the #AC
+				  exception.
+
+				  If bus lock detection is enabled in hardware,
+				  the kernel will emit rate limited warnings
+				  about applications triggering the #D
+				  exception.
+
+				  Default behavior is from bus lock detection
+				  if both features are enabled in hardware.
+
+			fatal	- If split lock detection is enabled in
+				  hardware, the kernel will send SIGBUS to
+				  applications that trigger the #AC exception.
+
+				  If bus lock detection is enabled in hardware,
+				  the kernel will send SIGBUS to application
+				  that trigger the #DB exception.
+
+				  Default behavior is from split lock detection
+				  if both are enabled in hardware.
+
+			ratelimit:N
+				  Set rate limit to N bus locks per second
+				  for bus lock detection. 0 < N <= HZ/2 and
+				  N is approximate. Not applied to root user
+				  and the kernel. Only applied to non root user.
+
+				  N/A for split lock detection.
 
 			If an #AC exception is hit in the kernel or in
 			firmware (i.e. not while executing in user mode)
 			the kernel will oops in either "warn" or "fatal"
 			mode.
 
+			If an #DB exception is hit in the kernel or in
+			firmware, the kernel will warn in either "warn" or
+			"fatal" mode.
+
 	srbds=		[X86,INTEL]
 			Control the Special Register Buffer Data Sampling
 			(SRBDS) mitigation.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index da78ccbd493b..0c3512ccf954 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -41,12 +41,14 @@ unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
 #ifdef CONFIG_CPU_SUP_INTEL
-extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void __init sld_setup(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 extern bool handle_guest_split_lock(unsigned long ip);
+extern bool handle_user_bus_lock(struct pt_regs *regs);
+extern bool handle_kernel_bus_lock(struct pt_regs *regs);
 #else
-static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void __init sld_setup(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
 static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
@@ -57,6 +59,16 @@ static inline bool handle_guest_split_lock(unsigned long ip)
 {
 	return false;
 }
+
+static inline bool handle_user_bus_lock(struct pt_regs *regs)
+{
+	return false;
+}
+
+static inline bool handle_kernel_bus_lock(struct pt_regs *regs)
+{
+	return false;
+}
 #endif
 #ifdef CONFIG_IA32_FEAT_CTL
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 02dabc9e77b0..4bf8b1b51287 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -348,6 +348,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ	(16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_BUS_LOCK_DETECT	(16*32+24) /* Bus Lock detect */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B		(16*32+28) /* MOVDIR64B instruction */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e8370e64a155..9a095af03eba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -241,6 +241,7 @@
 #define DEBUGCTLMSR_LBR			(1UL <<  0) /* last branch recording */
 #define DEBUGCTLMSR_BTF_SHIFT		1
 #define DEBUGCTLMSR_BTF			(1UL <<  1) /* single-step on branches */
+#define DEBUGCTLMSR_BUS_LOCK_DETECT	(1UL <<  2) /* Bus lock detect */
 #define DEBUGCTLMSR_TR			(1UL <<  6)
 #define DEBUGCTLMSR_BTS			(1UL <<  7)
 #define DEBUGCTLMSR_BTINT		(1UL <<  8)
diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
index d95d080b30e3..61078319fc6c 100644
--- a/arch/x86/include/uapi/asm/debugreg.h
+++ b/arch/x86/include/uapi/asm/debugreg.h
@@ -16,7 +16,7 @@
    are either reserved or not of interest to us. */
 
 /* Define reserved bits in DR6 which are always set to 1 */
-#define DR6_RESERVED	(0xFFFF0FF0)
+#define DR6_RESERVED	(0xFFFF07F0)
 
 #define DR_TRAP0	(0x1)		/* db0 */
 #define DR_TRAP1	(0x2)		/* db1 */
@@ -24,6 +24,7 @@
 #define DR_TRAP3	(0x8)		/* db3 */
 #define DR_TRAP_BITS	(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)
 
+#define DR_BUS_LOCK	(0x800)		/* bus_lock */
 #define DR_STEP		(0x4000)	/* single-step */
 #define DR_SWITCH	(0x8000)	/* task switch */
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 95c090a45b4b..a5488d644764 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1255,7 +1255,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
-	cpu_set_core_cap_bits(c);
+	sld_setup(c);
 
 	fpu__init_system(c);
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0ab48f1cdf84..f498472990af 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -10,6 +10,9 @@
 #include <linux/thread_info.h>
 #include <linux/init.h>
 #include <linux/uaccess.h>
+#include <linux/cred.h>
+#include <linux/delay.h>
+#include <linux/sched/user.h>
 
 #include <asm/cpufeature.h>
 #include <asm/msr.h>
@@ -39,15 +42,20 @@ enum split_lock_detect_state {
 	sld_off = 0,
 	sld_warn,
 	sld_fatal,
+	sld_ratelimit,
 };
 
 /*
  * Default to sld_off because most systems do not support split lock detection
- * split_lock_setup() will switch this to sld_warn on systems that support
- * split lock detect, unless there is a command line override.
+ * sld_state_setup() will switch this to sld_warn on systems that support
+ * split lock/bus lock detect, unless there is a command line override.
  */
 static enum split_lock_detect_state sld_state __ro_after_init = sld_off;
 static u64 msr_test_ctrl_cache __ro_after_init;
+/* Split lock detection is enabled if it's true. */
+static bool sld;
+/* Bus lock detection is enabled if it's true. */
+static bool bld;
 
 /*
  * With a name like MSR_TEST_CTL it should go without saying, but don't touch
@@ -601,6 +609,7 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 }
 
 static void split_lock_init(void);
+static void bus_lock_init(void);
 
 static void init_intel(struct cpuinfo_x86 *c)
 {
@@ -717,6 +726,7 @@ static void init_intel(struct cpuinfo_x86 *c)
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
 
+	bus_lock_init();
 	split_lock_init();
 }
 
@@ -991,13 +1001,25 @@ static const struct {
 	{ "off",	sld_off   },
 	{ "warn",	sld_warn  },
 	{ "fatal",	sld_fatal },
+	{ "ratelimit:", sld_ratelimit },
 };
 
 static inline bool match_option(const char *arg, int arglen, const char *opt)
 {
-	int len = strlen(opt);
 
-	return len == arglen && !strncmp(arg, opt, len);
+	int len = strlen(opt), ratelimit;
+
+	if (strncmp(arg, opt, len))
+		return false;
+
+	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0 &&
+	    ratelimit_bl <= HZ / 2) {
+		ratelimit_bl = ratelimit;
+
+		return true;
+	}
+
+	return len == arglen;
 }
 
 static bool split_lock_verify_msr(bool on)
@@ -1016,16 +1038,15 @@ static bool split_lock_verify_msr(bool on)
 	return ctrl == tmp;
 }
 
-static void __init split_lock_setup(void)
+static void __init sld_state_setup(void)
 {
 	enum split_lock_detect_state state = sld_warn;
 	char arg[20];
 	int i, ret;
 
-	if (!split_lock_verify_msr(false)) {
-		pr_info("MSR access failed: Disabled\n");
+	if (!static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+	    !static_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
 		return;
-	}
 
 	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
 				  arg, sizeof(arg));
@@ -1037,17 +1058,14 @@ static void __init split_lock_setup(void)
 			}
 		}
 	}
+	sld_state = state;
+}
 
-	switch (state) {
-	case sld_off:
-		pr_info("disabled\n");
+static void __init _split_lock_setup(void)
+{
+	if (!split_lock_verify_msr(false)) {
+		pr_info("MSR access failed: Disabled\n");
 		return;
-	case sld_warn:
-		pr_info("warning about user-space split_locks\n");
-		break;
-	case sld_fatal:
-		pr_info("sending SIGBUS on user-space split_locks\n");
-		break;
 	}
 
 	rdmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache);
@@ -1057,8 +1075,11 @@ static void __init split_lock_setup(void)
 		return;
 	}
 
-	sld_state = state;
+	/* Restore the MSR to its cached value. */
+	wrmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache);
+
 	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+	sld = true;
 }
 
 /*
@@ -1078,6 +1099,10 @@ static void sld_update_msr(bool on)
 
 static void split_lock_init(void)
 {
+	/* If supported, #DB for bus lock will handle warn and ratelimit. */
+	if (bld && (sld_state == sld_warn || sld_state == sld_ratelimit))
+		return;
+
 	if (cpu_model_supports_sld)
 		split_lock_verify_msr(sld_state != sld_off);
 }
@@ -1114,14 +1139,58 @@ bool handle_guest_split_lock(unsigned long ip)
 }
 EXPORT_SYMBOL_GPL(handle_guest_split_lock);
 
+static void bus_lock_init(void)
+{
+	u64 val;
+
+	if (!bld)
+		return;
+
+	/* If supported, #AC for split lock will handle fatal. */
+	if (sld && sld_state == sld_fatal)
+		return;
+
+	rdmsrl(MSR_IA32_DEBUGCTLMSR, val);
+	val |= DEBUGCTLMSR_BUS_LOCK_DETECT;
+	wrmsrl(MSR_IA32_DEBUGCTLMSR, val);
+}
+
 bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
-	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+	if ((regs->flags & X86_EFLAGS_AC) || !sld || sld_state == sld_fatal ||
+	    sld_state == sld_ratelimit)
 		return false;
 	split_lock_warn(regs->ip);
 	return true;
 }
 
+bool handle_user_bus_lock(struct pt_regs *regs)
+{
+	if (!bld)
+		return false;
+
+	pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address: 0x%lx\n",
+			    current->comm, current->pid, regs->ip);
+
+	if (sld_state == sld_ratelimit) {
+		while (!__ratelimit(&get_current_user()->ratelimit_bl))
+			msleep(1000 / ratelimit_bl);
+	}
+
+	return true;
+}
+
+bool handle_kernel_bus_lock(struct pt_regs *regs)
+{
+	if (!bld)
+		return false;
+
+	pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address: 0x%lx\n",
+			    current->comm, current->pid, regs->ip);
+
+	return true;
+}
+
 /*
  * This function is called only when switching between tasks with
  * different split-lock detection modes. It sets the MSR for the
@@ -1159,7 +1228,7 @@ static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
 	{}
 };
 
-void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+static void __init split_lock_setup(struct cpuinfo_x86 *c)
 {
 	const struct x86_cpu_id *m;
 	u64 ia32_core_caps;
@@ -1186,5 +1255,50 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
 	}
 
 	cpu_model_supports_sld = true;
-	split_lock_setup();
+	_split_lock_setup();
+}
+
+static void sld_state_show(void)
+{
+	if (!bld && !sld)
+		return;
+
+	switch (sld_state) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		if (bld)
+			pr_info("#DB: warning about kernel and user-space bus_locks\n");
+		else
+			pr_info("#AC: crashing the kernel about kernel split_locks and warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		if (sld)
+			pr_info("#AC: crashing the kernel on kernel split_locks and sending SIGBUS on user-space split_locks\n");
+		else
+			pr_info("#DB: warning about kernel bus_locks and sending SIGBUS on user-space bus_locks\n");
+		break;
+
+	case sld_ratelimit:
+		if (bld)
+			pr_info("#DB: warning about kernel bus_locks and setting silent rate limit to %d/sec per user on non root user-space bus_locks\n", ratelimit_bl);
+		break;
+	}
+}
+
+static void __init bus_lock_setup(void)
+{
+	if (static_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) && sld_state != sld_off)
+		bld = true;
+}
+
+void __init sld_setup(struct cpuinfo_x86 *c)
+{
+	sld_state_setup();
+	split_lock_setup(c);
+	bus_lock_setup();
+	sld_state_show();
 }
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index b038695f36c5..58725567da39 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -812,6 +812,16 @@ static void handle_debug(struct pt_regs *regs, unsigned long dr6, bool user)
 	if (!user && !dr6)
 		return;
 
+	/* Handle bus lock. */
+	if (!(dr6 & DR_BUS_LOCK)) {
+		cond_local_irq_enable(regs);
+		if (user)
+			handle_user_bus_lock(regs);
+		else
+			handle_kernel_bus_lock(regs);
+		goto out;
+	}
+
 	/*
 	 * If dr6 has no reason to give us about the origin of this trap,
 	 * then it's very likely the result of an icebp/int01 trap.
diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index 917d88edb7b9..fc757ec6c19f 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -37,8 +37,9 @@ struct user_struct {
 	atomic_long_t locked_vm;
 #endif
 
-	/* Miscellaneous per-user rate limit */
+	/* Miscellaneous per-user rate limits */
 	struct ratelimit_state ratelimit;
+	struct ratelimit_state ratelimit_bl;
 };
 
 extern int uids_sysfs_init(void);
@@ -48,6 +49,7 @@ extern struct user_struct *find_user(kuid_t);
 extern struct user_struct root_user;
 #define INIT_USER (&root_user)
 
+extern int ratelimit_bl;
 
 /* per-UID process charging. */
 extern struct user_struct * alloc_uid(kuid_t);
diff --git a/kernel/user.c b/kernel/user.c
index b1635d94a1f2..023dad617625 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -103,6 +103,7 @@ struct user_struct root_user = {
 	.locked_shm     = 0,
 	.uid		= GLOBAL_ROOT_UID,
 	.ratelimit	= RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
+	.ratelimit_bl	= RATELIMIT_STATE_INIT(root_user.ratelimit_bl, 0, 0),
 };
 
 /*
@@ -172,6 +173,9 @@ void free_uid(struct user_struct *up)
 		free_user(up, flags);
 }
 
+/* Architectures (e.g. X86) may set this for rate limited bus locks. */
+int ratelimit_bl;
+
 struct user_struct *alloc_uid(kuid_t uid)
 {
 	struct hlist_head *hashent = uidhashentry(uid);
@@ -190,6 +194,9 @@ struct user_struct *alloc_uid(kuid_t uid)
 		refcount_set(&new->__count, 1);
 		ratelimit_state_init(&new->ratelimit, HZ, 100);
 		ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
+		ratelimit_state_init(&new->ratelimit_bl, HZ, ratelimit_bl);
+		ratelimit_set_flags(&new->ratelimit_bl,
+				    RATELIMIT_MSG_ON_RELEASE);
 
 		/*
 		 * Before adding this, check whether we raced
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-17 21:35 [PATCH RFC] x86/bus_lock: Enable bus lock detection Fenghua Yu
@ 2020-07-29  3:02 ` Sean Christopherson
  2020-07-29  8:50   ` peterz
  2020-07-29 18:09   ` Yu, Fenghua
  2020-07-29  8:49 ` peterz
  1 sibling, 2 replies; 16+ messages in thread
From: Sean Christopherson @ 2020-07-29  3:02 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Tony Luck, H Peter Anvin, Andy Lutomirski, Ravi V Shankar,
	Xiaoyao Li, x86, linux-kernel

On Fri, Jul 17, 2020 at 02:35:00PM -0700, Fenghua Yu wrote:
> A bus lock [1] is acquired either through split locked access to writeback (WB)
> memory or by using locks to uncacheable (UC) memory (e.g. direct device

Does SLD not detect the lock to UC memory?

> assignment). This is typically >1000 cycles slower than an atomic operation
> within a cache line. It also disrupts performance on other cores.
> 
> Although split lock can be detected by #AC trap, the trap is triggered
> before the instruction acquires bus lock. This makes it difficult to
> mitigate bus lock (e.g. throttle the user application).

Mitigate _in a non-fatal way_.  The #AC makes it very easy to mitigate split
locks, it just has the side effect of SIGBUGS or killing the KVM guest.

> Some CPUs have ability to notify the kernel by an #DB trap after the
> instruction acquires a bus lock and is executed. This allows the kernel
> to enforce user application throttling or mitigations and also provides
> a better environment to debug kernel split lock issues since the kernel
> can continue instead of crashing.
> 
> #DB for bus lock detect fixes all issues in #AC for split lock detect:

Fixes "all" issues... and creates some new ones, e.g. there are use cases
where preventing the split lock from happening in the first place is
strongly desired.  It's why that train wreck exists.

> 1) It's architectural ... just need to look at one CPUID bit to know it
>    exists
> 2) The IA32_DEBUGCTL MSR, which reports bus lock in #DB, is per-thread.
>    So each process or guest can have different behavior.
> 3) It has support for VMM/guests (new VMEXIT codes, etc).
> 
> Use the existing kernel command line option "split_lock_detect=" to handle
> #DB for bus lock:

Are SLD and BLD mutually exclusive?  Can we even guarantee that given the
track record of SLD?  If not, we'll likely want to allow the user to choose
between SDL and BLD via split_lock_detect.

> split_lock_detect=
> 		#AC for split lock		#DB for bus lock
> 
> off		Do nothing			Do nothing
> 
> warn		Kernel OOPs			Kernel warns rate limited
> 		Warn once per task and		and continues to run.
> 		disable future checking 	Warn once per task and
> 						and continue to run.
> 						When both features are
> 						supported, warn in #DB
> 
> fatal		Kernel OOPs			Kernel warn rate limited

Unless the lock to UC #DB is new behavior, why would we revert to allowing
split locks in the kernel?

> 		Send SIGBUS to user		Send SIGBUS to user
> 		When both features are
> 		supported, fatal in #AC.
> 
> ratelimit:N	Do nothing			Kernel warns rate limited

This should be more than "Do nothing" for #AC, e.g. fall back to warn or
at least print a loud error.

> 						and continue to run.
> 						Limit bus lock rate to
> 						N per second in the
> 						current non root user.
> 
> On systems that support #DB for bus lock detection the default is "warn".
> 
> [1] Chapter 8 https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  48 +++++-
>  arch/x86/include/asm/cpu.h                    |  16 +-
>  arch/x86/include/asm/cpufeatures.h            |   1 +
>  arch/x86/include/asm/msr-index.h              |   1 +
>  arch/x86/include/uapi/asm/debugreg.h          |   3 +-
>  arch/x86/kernel/cpu/common.c                  |   2 +-
>  arch/x86/kernel/cpu/intel.c                   | 156 +++++++++++++++---
>  arch/x86/kernel/traps.c                       |  10 ++
>  include/linux/sched/user.h                    |   4 +-
>  kernel/user.c                                 |   7 +
>  10 files changed, 214 insertions(+), 34 deletions(-)

Maybe it's just me, but it'd be nice to break this into multiple patches
so that the SLD refactoring is separate from the introduction of BLD.  As
is, I find it hard to review as I can't easily distinguish refactoring from
new functionality.

> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index fb95fad81c79..7a1cb6fe8b8e 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4816,27 +4816,59 @@
>  	spia_peddr=
>  
>  	split_lock_detect=
> -			[X86] Enable split lock detection
> +			[X86] Enable split lock detection or bus lock detection
>  
>  			When enabled (and if hardware support is present), atomic
>  			instructions that access data across cache line
> -			boundaries will result in an alignment check exception.
> +			boundaries will result in an alignment check exception
> +			for split lock detection or an debug exception for
> +			bus lock detection.
>  
>  			off	- not enabled
>  
> -			warn	- the kernel will emit rate limited warnings
> -				  about applications triggering the #AC
> -				  exception. This mode is the default on CPUs
> -				  that supports split lock detection.
> +			warn	- Default mode.
>  
> -			fatal	- the kernel will send SIGBUS to applications
> -				  that trigger the #AC exception.
> +				  If split lock detection is enabled in
> +				  hardware, the kernel will emit rate limited
> +				  warnings about applications triggering the #AC
> +				  exception.
> +
> +				  If bus lock detection is enabled in hardware,
> +				  the kernel will emit rate limited warnings
> +				  about applications triggering the #D

s/#D/#DB

> +				  exception.
> +
> +				  Default behavior is from bus lock detection
> +				  if both features are enabled in hardware.
> +
> +			fatal	- If split lock detection is enabled in
> +				  hardware, the kernel will send SIGBUS to
> +				  applications that trigger the #AC exception.
> +
> +				  If bus lock detection is enabled in hardware,
> +				  the kernel will send SIGBUS to application
> +				  that trigger the #DB exception.
> +
> +				  Default behavior is from split lock detection
> +				  if both are enabled in hardware.
> +
> +			ratelimit:N
> +				  Set rate limit to N bus locks per second
> +				  for bus lock detection. 0 < N <= HZ/2 and
> +				  N is approximate. Not applied to root user
> +				  and the kernel. Only applied to non root user.
> +
> +				  N/A for split lock detection.
>  
>  			If an #AC exception is hit in the kernel or in
>  			firmware (i.e. not while executing in user mode)
>  			the kernel will oops in either "warn" or "fatal"
>  			mode.
>  
> +			If an #DB exception is hit in the kernel or in
> +			firmware, the kernel will warn in either "warn" or
> +			"fatal" mode.
> +
>  	srbds=		[X86,INTEL]
>  			Control the Special Register Buffer Data Sampling
>  			(SRBDS) mitigation.

...

> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 0ab48f1cdf84..f498472990af 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -10,6 +10,9 @@
>  #include <linux/thread_info.h>
>  #include <linux/init.h>
>  #include <linux/uaccess.h>
> +#include <linux/cred.h>
> +#include <linux/delay.h>
> +#include <linux/sched/user.h>
>  
>  #include <asm/cpufeature.h>
>  #include <asm/msr.h>
> @@ -39,15 +42,20 @@ enum split_lock_detect_state {
>  	sld_off = 0,
>  	sld_warn,
>  	sld_fatal,
> +	sld_ratelimit,
>  };
>  
>  /*
>   * Default to sld_off because most systems do not support split lock detection
> - * split_lock_setup() will switch this to sld_warn on systems that support
> - * split lock detect, unless there is a command line override.
> + * sld_state_setup() will switch this to sld_warn on systems that support
> + * split lock/bus lock detect, unless there is a command line override.
>   */
>  static enum split_lock_detect_state sld_state __ro_after_init = sld_off;
>  static u64 msr_test_ctrl_cache __ro_after_init;
> +/* Split lock detection is enabled if it's true. */
> +static bool sld;
> +/* Bus lock detection is enabled if it's true. */
> +static bool bld;

Why can't these be tracked/reflected in X86_FEATURE_*?

>  /*
>   * With a name like MSR_TEST_CTL it should go without saying, but don't touch
> @@ -601,6 +609,7 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
>  }
>  
>  static void split_lock_init(void);
> +static void bus_lock_init(void);
>  
>  static void init_intel(struct cpuinfo_x86 *c)
>  {
> @@ -717,6 +726,7 @@ static void init_intel(struct cpuinfo_x86 *c)
>  	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
>  		tsx_disable();
>  
> +	bus_lock_init();
>  	split_lock_init();
>  }
>  
> @@ -991,13 +1001,25 @@ static const struct {
>  	{ "off",	sld_off   },
>  	{ "warn",	sld_warn  },
>  	{ "fatal",	sld_fatal },
> +	{ "ratelimit:", sld_ratelimit },
>  };
>  
>  static inline bool match_option(const char *arg, int arglen, const char *opt)
>  {
> -	int len = strlen(opt);
>  
> -	return len == arglen && !strncmp(arg, opt, len);
> +	int len = strlen(opt), ratelimit;
> +
> +	if (strncmp(arg, opt, len))
> +		return false;
> +
> +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0 &&
> +	    ratelimit_bl <= HZ / 2) {
> +		ratelimit_bl = ratelimit;
> +
> +		return true;
> +	}
> +
> +	return len == arglen;
>  }
>  
>  static bool split_lock_verify_msr(bool on)
> @@ -1016,16 +1038,15 @@ static bool split_lock_verify_msr(bool on)
>  	return ctrl == tmp;
>  }
>  
> -static void __init split_lock_setup(void)
> +static void __init sld_state_setup(void)
>  {
>  	enum split_lock_detect_state state = sld_warn;
>  	char arg[20];
>  	int i, ret;
>  
> -	if (!split_lock_verify_msr(false)) {
> -		pr_info("MSR access failed: Disabled\n");
> +	if (!static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
> +	    !static_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))

Pretty sure static_cpu_has() in an __init function is a waste.

>  		return;
> -	}
>  
>  	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
>  				  arg, sizeof(arg));
> @@ -1037,17 +1058,14 @@ static void __init split_lock_setup(void)
>  			}
>  		}
>  	}
> +	sld_state = state;
> +}
>  
> -	switch (state) {
> -	case sld_off:
> -		pr_info("disabled\n");
> +static void __init _split_lock_setup(void)
> +{
> +	if (!split_lock_verify_msr(false)) {
> +		pr_info("MSR access failed: Disabled\n");
>  		return;
> -	case sld_warn:
> -		pr_info("warning about user-space split_locks\n");
> -		break;
> -	case sld_fatal:
> -		pr_info("sending SIGBUS on user-space split_locks\n");
> -		break;
>  	}
>  
>  	rdmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache);
> @@ -1057,8 +1075,11 @@ static void __init split_lock_setup(void)
>  		return;
>  	}
>  
> -	sld_state = state;
> +	/* Restore the MSR to its cached value. */
> +	wrmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache);
> +
>  	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> +	sld = true;
>  }
>  
>  /*
> @@ -1078,6 +1099,10 @@ static void sld_update_msr(bool on)
>  
>  static void split_lock_init(void)
>  {
> +	/* If supported, #DB for bus lock will handle warn and ratelimit. */
> +	if (bld && (sld_state == sld_warn || sld_state == sld_ratelimit))
> +		return;
> +
>  	if (cpu_model_supports_sld)
>  		split_lock_verify_msr(sld_state != sld_off);
>  }
> @@ -1114,14 +1139,58 @@ bool handle_guest_split_lock(unsigned long ip)
>  }
>  EXPORT_SYMBOL_GPL(handle_guest_split_lock);
>  
> +static void bus_lock_init(void)
> +{
> +	u64 val;
> +
> +	if (!bld)
> +		return;
> +
> +	/* If supported, #AC for split lock will handle fatal. */
> +	if (sld && sld_state == sld_fatal)
> +		return;
> +
> +	rdmsrl(MSR_IA32_DEBUGCTLMSR, val);
> +	val |= DEBUGCTLMSR_BUS_LOCK_DETECT;

Uh, doesn't this enable BLD even if sld_state == sld_off?

> +	wrmsrl(MSR_IA32_DEBUGCTLMSR, val);
> +}
> +
>  bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>  {
> -	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +	if ((regs->flags & X86_EFLAGS_AC) || !sld || sld_state == sld_fatal ||
> +	    sld_state == sld_ratelimit)
>  		return false;
>  	split_lock_warn(regs->ip);
>  	return true;
>  }
>  
> +bool handle_user_bus_lock(struct pt_regs *regs)
> +{
> +	if (!bld)
> +		return false;
> +
> +	pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address: 0x%lx\n",
> +			    current->comm, current->pid, regs->ip);
> +
> +	if (sld_state == sld_ratelimit) {
> +		while (!__ratelimit(&get_current_user()->ratelimit_bl))
> +			msleep(1000 / ratelimit_bl);
> +	}
> +
> +	return true;
> +}
> +
> +bool handle_kernel_bus_lock(struct pt_regs *regs)
> +{
> +	if (!bld)
> +		return false;
> +
> +	pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address: 0x%lx\n",
> +			    current->comm, current->pid, regs->ip);
> +
> +	return true;
> +}
> +
>  /*
>   * This function is called only when switching between tasks with
>   * different split-lock detection modes. It sets the MSR for the
> @@ -1159,7 +1228,7 @@ static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
>  	{}
>  };
>  
> -void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> +static void __init split_lock_setup(struct cpuinfo_x86 *c)
>  {
>  	const struct x86_cpu_id *m;
>  	u64 ia32_core_caps;
> @@ -1186,5 +1255,50 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
>  	}
>  
>  	cpu_model_supports_sld = true;
> -	split_lock_setup();
> +	_split_lock_setup();
> +}
> +
> +static void sld_state_show(void)
> +{
> +	if (!bld && !sld)
> +		return;
> +
> +	switch (sld_state) {
> +	case sld_off:
> +		pr_info("disabled\n");
> +		break;
> +
> +	case sld_warn:
> +		if (bld)
> +			pr_info("#DB: warning about kernel and user-space bus_locks\n");
> +		else
> +			pr_info("#AC: crashing the kernel about kernel split_locks and warning about user-space split_locks\n");
> +		break;
> +
> +	case sld_fatal:
> +		if (sld)
> +			pr_info("#AC: crashing the kernel on kernel split_locks and sending SIGBUS on user-space split_locks\n");
> +		else
> +			pr_info("#DB: warning about kernel bus_locks and sending SIGBUS on user-space bus_locks\n");
> +		break;
> +
> +	case sld_ratelimit:
> +		if (bld)
> +			pr_info("#DB: warning about kernel bus_locks and setting silent rate limit to %d/sec per user on non root user-space bus_locks\n", ratelimit_bl);
> +		break;
> +	}
> +}
> +
> +static void __init bus_lock_setup(void)
> +{
> +	if (static_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) && sld_state != sld_off)

More overkill.

> +		bld = true;
> +}
> +
> +void __init sld_setup(struct cpuinfo_x86 *c)

This wrapper probably should call out that it configures both sld and bld.

> +{
> +	sld_state_setup();
> +	split_lock_setup(c);
> +	bus_lock_setup();
> +	sld_state_show();
>  }
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index b038695f36c5..58725567da39 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -812,6 +812,16 @@ static void handle_debug(struct pt_regs *regs, unsigned long dr6, bool user)
>  	if (!user && !dr6)
>  		return;
>  
> +	/* Handle bus lock. */
> +	if (!(dr6 & DR_BUS_LOCK)) {
> +		cond_local_irq_enable(regs);
> +		if (user)
> +			handle_user_bus_lock(regs);
> +		else
> +			handle_kernel_bus_lock(regs);
> +		goto out;
> +	}
> +
>  	/*
>  	 * If dr6 has no reason to give us about the origin of this trap,
>  	 * then it's very likely the result of an icebp/int01 trap.
> diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
> index 917d88edb7b9..fc757ec6c19f 100644
> --- a/include/linux/sched/user.h
> +++ b/include/linux/sched/user.h
> @@ -37,8 +37,9 @@ struct user_struct {
>  	atomic_long_t locked_vm;
>  #endif
>  
> -	/* Miscellaneous per-user rate limit */
> +	/* Miscellaneous per-user rate limits */
>  	struct ratelimit_state ratelimit;
> +	struct ratelimit_state ratelimit_bl;

Why not spell out ratelimit_bus_lock?  There's no way someone looking at
this code in isolation is going to have any clue what "bl" means.

>  };
>  
>  extern int uids_sysfs_init(void);
> @@ -48,6 +49,7 @@ extern struct user_struct *find_user(kuid_t);
>  extern struct user_struct root_user;
>  #define INIT_USER (&root_user)
>  
> +extern int ratelimit_bl;
>  
>  /* per-UID process charging. */
>  extern struct user_struct * alloc_uid(kuid_t);
> diff --git a/kernel/user.c b/kernel/user.c
> index b1635d94a1f2..023dad617625 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -103,6 +103,7 @@ struct user_struct root_user = {
>  	.locked_shm     = 0,
>  	.uid		= GLOBAL_ROOT_UID,
>  	.ratelimit	= RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
> +	.ratelimit_bl	= RATELIMIT_STATE_INIT(root_user.ratelimit_bl, 0, 0),
>  };
>  
>  /*
> @@ -172,6 +173,9 @@ void free_uid(struct user_struct *up)
>  		free_user(up, flags);
>  }
>  
> +/* Architectures (e.g. X86) may set this for rate limited bus locks. */
> +int ratelimit_bl;
> +
>  struct user_struct *alloc_uid(kuid_t uid)
>  {
>  	struct hlist_head *hashent = uidhashentry(uid);
> @@ -190,6 +194,9 @@ struct user_struct *alloc_uid(kuid_t uid)
>  		refcount_set(&new->__count, 1);
>  		ratelimit_state_init(&new->ratelimit, HZ, 100);
>  		ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
> +		ratelimit_state_init(&new->ratelimit_bl, HZ, ratelimit_bl);
> +		ratelimit_set_flags(&new->ratelimit_bl,
> +				    RATELIMIT_MSG_ON_RELEASE);
>  
>  		/*
>  		 * Before adding this, check whether we raced
> -- 
> 2.19.1
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-17 21:35 [PATCH RFC] x86/bus_lock: Enable bus lock detection Fenghua Yu
  2020-07-29  3:02 ` Sean Christopherson
@ 2020-07-29  8:49 ` peterz
  2020-07-29 20:40   ` Fenghua Yu
  1 sibling, 1 reply; 16+ messages in thread
From: peterz @ 2020-07-29  8:49 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Tony Luck,
	H Peter Anvin, Andy Lutomirski, Ravi V Shankar, Xiaoyao Li ,
	x86, linux-kernel

On Fri, Jul 17, 2020 at 02:35:00PM -0700, Fenghua Yu wrote:

> #DB for bus lock detect fixes all issues in #AC for split lock detect:
> 1) It's architectural ... just need to look at one CPUID bit to know it
>    exists
> 2) The IA32_DEBUGCTL MSR, which reports bus lock in #DB, is per-thread.
>    So each process or guest can have different behavior.

And it generates a whole new problem due to #DB being an IST, and

> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index b038695f36c5..58725567da39 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -812,6 +812,16 @@ static void handle_debug(struct pt_regs *regs, unsigned long dr6, bool user)
>  	if (!user && !dr6)
>  		return;
>  
> +	/* Handle bus lock. */
> +	if (!(dr6 & DR_BUS_LOCK)) {
> +		cond_local_irq_enable(regs);
> +		if (user)
> +			handle_user_bus_lock(regs);
> +		else
> +			handle_kernel_bus_lock(regs);
> +		goto out;
> +	}
> +
>  	/*
>  	 * If dr6 has no reason to give us about the origin of this trap,
>  	 * then it's very likely the result of an icebp/int01 trap.

we very much rely on #DB never recursing, which we carefully crafted by
disallowing hardare breakpoints on noinstr code and clearing DR7 early.

But now it can... please keep the pieces.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29  3:02 ` Sean Christopherson
@ 2020-07-29  8:50   ` peterz
  2020-07-29 18:09   ` Yu, Fenghua
  1 sibling, 0 replies; 16+ messages in thread
From: peterz @ 2020-07-29  8:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fenghua Yu, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
	Tony Luck, H Peter Anvin, Andy Lutomirski, Ravi V Shankar,
	Xiaoyao Li, x86, linux-kernel

On Tue, Jul 28, 2020 at 08:02:32PM -0700, Sean Christopherson wrote:
> Maybe it's just me, but it'd be nice to break this into multiple patches
> so that the SLD refactoring is separate from the introduction of BLD.  As
> is, I find it hard to review as I can't easily distinguish refactoring from
> new functionality.

This. Absolutely 100% this. It's like Fenghua has never send a patch
before..

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29  3:02 ` Sean Christopherson
  2020-07-29  8:50   ` peterz
@ 2020-07-29 18:09   ` Yu, Fenghua
  2020-07-29 18:46     ` Sean Christopherson
  1 sibling, 1 reply; 16+ messages in thread
From: Yu, Fenghua @ 2020-07-29 18:09 UTC (permalink / raw)
  To: Christopherson, Sean J
  Cc: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Shanbhogue, Vedvyas, Luck, Tony, H Peter Anvin, Andy Lutomirski,
	Shankar, Ravi V, Li, Xiaoyao, x86, linux-kernel

Hi, Sean,

> > A bus lock [1] is acquired either through split locked access to
> > writeback (WB) memory or by using locks to uncacheable (UC) memory
> > (e.g. direct device
> 
> Does SLD not detect the lock to UC memory?

The statement might not be accurate. Split Lock Detection doesn't detect bus
lock to non-WB memory (including UC memory).

> 
> > assignment). This is typically >1000 cycles slower than an atomic
> > operation within a cache line. It also disrupts performance on other cores.
> >
> > Although split lock can be detected by #AC trap, the trap is triggered
> > before the instruction acquires bus lock. This makes it difficult to
> > mitigate bus lock (e.g. throttle the user application).
> 
> Mitigate _in a non-fatal way_.  The #AC makes it very easy to mitigate split
> locks, it just has the side effect of SIGBUGS or killing the KVM guest.

Adding "in a non-fatal way" is more better. Will fix it.

> 
> > Some CPUs have ability to notify the kernel by an #DB trap after the
> > instruction acquires a bus lock and is executed. This allows the
> > kernel to enforce user application throttling or mitigations and also
> > provides a better environment to debug kernel split lock issues since
> > the kernel can continue instead of crashing.
> >
> > #DB for bus lock detect fixes all issues in #AC for split lock detect:
> 
> Fixes "all" issues... and creates some new ones, e.g. there are use cases
> where preventing the split lock from happening in the first place is strongly
> desired.  It's why that train wreck exists.

Bus Lock Detection doesn't replace Split Lock Detection. If both features
are enabled, default behavior is warning from bus lock, fatal behavior is
still from split lock. See the behavior table as follows.

> 
> > 1) It's architectural ... just need to look at one CPUID bit to know it
> >    exists
> > 2) The IA32_DEBUGCTL MSR, which reports bus lock in #DB, is per-thread.
> >    So each process or guest can have different behavior.
> > 3) It has support for VMM/guests (new VMEXIT codes, etc).
> >
> > Use the existing kernel command line option "split_lock_detect=" to
> > handle #DB for bus lock:
> 
> Are SLD and BLD mutually exclusive?  Can we even guarantee that given the
> track record of SLD?  If not, we'll likely want to allow the user to choose
> between SDL and BLD via split_lock_detect.

The two hardware features can be enabled on the same platform.
But they are mutually exclusive in the kernel because #AC from SLD happens
before the instruction is executed and #DB happens after the instruction is
executed.

Right now, if both of them are enabled, "warn" behavior goes to
bus lock and "fatal" behavior goes to split lock.

Do you want the user to override the behaviors by something like this?

split_lock_detect=warn[,sld]: if given "sld" while both features are enabled,
warn behavior is from split lock instead of bus lock detection.

split_lock_detect=fatal[,bld]: if given "bld" while both features are enabled,
fatal behavior is from bus lock detection.

> 
> > split_lock_detect=
> > 		#AC for split lock		#DB for bus lock
> >
> > off		Do nothing			Do nothing
> >
> > warn		Kernel OOPs			Kernel warns rate limited
> > 		Warn once per task and		and continues to run.
> > 		disable future checking 	Warn once per task and
> > 						and continue to run.
> > 						When both features are
> > 						supported, warn in #DB
> >
> > fatal		Kernel OOPs			Kernel warn rate limited
> 
> Unless the lock to UC #DB is new behavior, why would we revert to allowing
> split locks in the kernel?

SLD cannot detect lock to non-WB memory (including UC). BLD can detect
both bus locks from both split lock and lock to non-WB.

> 
> > 		Send SIGBUS to user		Send SIGBUS to user
> > 		When both features are
> > 		supported, fatal in #AC.
> >
> > ratelimit:N	Do nothing			Kernel warns rate limited
> 
> This should be more than "Do nothing" for #AC, e.g. fall back to warn or at
> least print a loud error.

Ok. Will fall back to warn.

> 
> > 						and continue to run.
> > 						Limit bus lock rate to
> > 						N per second in the
> > 						current non root user.
> >
> > On systems that support #DB for bus lock detection the default is "warn".
> >
> > [1] Chapter 8
> > https://software.intel.com/sites/default/files/managed/c5/15/architect
> > ure-instruction-set-extensions-programming-reference.pdf
> >
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |  48 +++++-
> >  arch/x86/include/asm/cpu.h                    |  16 +-
> >  arch/x86/include/asm/cpufeatures.h            |   1 +
> >  arch/x86/include/asm/msr-index.h              |   1 +
> >  arch/x86/include/uapi/asm/debugreg.h          |   3 +-
> >  arch/x86/kernel/cpu/common.c                  |   2 +-
> >  arch/x86/kernel/cpu/intel.c                   | 156 +++++++++++++++---
> >  arch/x86/kernel/traps.c                       |  10 ++
> >  include/linux/sched/user.h                    |   4 +-
> >  kernel/user.c                                 |   7 +
> >  10 files changed, 214 insertions(+), 34 deletions(-)
> 
> Maybe it's just me, but it'd be nice to break this into multiple patches so that
> the SLD refactoring is separate from the introduction of BLD.  As is, I find it
> hard to review as I can't easily distinguish refactoring from new functionality.

OK. Will split the patch into multiple patches.

> 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt
> > b/Documentation/admin-guide/kernel-parameters.txt
> > index fb95fad81c79..7a1cb6fe8b8e 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -4816,27 +4816,59 @@
> >  	spia_peddr=
> >
> >  	split_lock_detect=
> > -			[X86] Enable split lock detection
> > +			[X86] Enable split lock detection or bus lock detection
> >
> >  			When enabled (and if hardware support is present),
> atomic
> >  			instructions that access data across cache line
> > -			boundaries will result in an alignment check
> exception.
> > +			boundaries will result in an alignment check
> exception
> > +			for split lock detection or an debug exception for
> > +			bus lock detection.
> >
> >  			off	- not enabled
> >
> > -			warn	- the kernel will emit rate limited warnings
> > -				  about applications triggering the #AC
> > -				  exception. This mode is the default on CPUs
> > -				  that supports split lock detection.
> > +			warn	- Default mode.
> >
> > -			fatal	- the kernel will send SIGBUS to applications
> > -				  that trigger the #AC exception.
> > +				  If split lock detection is enabled in
> > +				  hardware, the kernel will emit rate limited
> > +				  warnings about applications triggering the
> #AC
> > +				  exception.
> > +
> > +				  If bus lock detection is enabled in hardware,
> > +				  the kernel will emit rate limited warnings
> > +				  about applications triggering the #D
> 
> s/#D/#DB
> 
> > +				  exception.
> > +
> > +				  Default behavior is from bus lock detection
> > +				  if both features are enabled in hardware.
> > +
> > +			fatal	- If split lock detection is enabled in
> > +				  hardware, the kernel will send SIGBUS to
> > +				  applications that trigger the #AC exception.
> > +
> > +				  If bus lock detection is enabled in hardware,
> > +				  the kernel will send SIGBUS to application
> > +				  that trigger the #DB exception.
> > +
> > +				  Default behavior is from split lock detection
> > +				  if both are enabled in hardware.
> > +
> > +			ratelimit:N
> > +				  Set rate limit to N bus locks per second
> > +				  for bus lock detection. 0 < N <= HZ/2 and
> > +				  N is approximate. Not applied to root user
> > +				  and the kernel. Only applied to non root
> user.
> > +
> > +				  N/A for split lock detection.
> >
> >  			If an #AC exception is hit in the kernel or in
> >  			firmware (i.e. not while executing in user mode)
> >  			the kernel will oops in either "warn" or "fatal"
> >  			mode.
> >
> > +			If an #DB exception is hit in the kernel or in
> > +			firmware, the kernel will warn in either "warn" or
> > +			"fatal" mode.
> > +
> >  	srbds=		[X86,INTEL]
> >  			Control the Special Register Buffer Data Sampling
> >  			(SRBDS) mitigation.
> 
> ...
> 
> > diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> > index 0ab48f1cdf84..f498472990af 100644
> > --- a/arch/x86/kernel/cpu/intel.c
> > +++ b/arch/x86/kernel/cpu/intel.c
> > @@ -10,6 +10,9 @@
> >  #include <linux/thread_info.h>
> >  #include <linux/init.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/cred.h>
> > +#include <linux/delay.h>
> > +#include <linux/sched/user.h>
> >
> >  #include <asm/cpufeature.h>
> >  #include <asm/msr.h>
> > @@ -39,15 +42,20 @@ enum split_lock_detect_state {
> >  	sld_off = 0,
> >  	sld_warn,
> >  	sld_fatal,
> > +	sld_ratelimit,
> >  };
> >
> >  /*
> >   * Default to sld_off because most systems do not support split lock
> > detection
> > - * split_lock_setup() will switch this to sld_warn on systems that
> > support
> > - * split lock detect, unless there is a command line override.
> > + * sld_state_setup() will switch this to sld_warn on systems that
> > + support
> > + * split lock/bus lock detect, unless there is a command line override.
> >   */
> >  static enum split_lock_detect_state sld_state __ro_after_init =
> > sld_off;  static u64 msr_test_ctrl_cache __ro_after_init;
> > +/* Split lock detection is enabled if it's true. */ static bool sld;
> > +/* Bus lock detection is enabled if it's true. */ static bool bld;
> 
> Why can't these be tracked/reflected in X86_FEATURE_*?

sld and bld are enabled depending on kernel parameter "split_lock_detect=".
They are not static and cannot be tracked by static X86_FEATURE_*.

> 
> >  /*
> >   * With a name like MSR_TEST_CTL it should go without saying, but
> > don't touch @@ -601,6 +609,7 @@ static void
> > init_intel_misc_features(struct cpuinfo_x86 *c)  }
> >
> >  static void split_lock_init(void);
> > +static void bus_lock_init(void);
> >
> >  static void init_intel(struct cpuinfo_x86 *c)  { @@ -717,6 +726,7 @@
> > static void init_intel(struct cpuinfo_x86 *c)
> >  	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
> >  		tsx_disable();
> >
> > +	bus_lock_init();
> >  	split_lock_init();
> >  }
> >
> > @@ -991,13 +1001,25 @@ static const struct {
> >  	{ "off",	sld_off   },
> >  	{ "warn",	sld_warn  },
> >  	{ "fatal",	sld_fatal },
> > +	{ "ratelimit:", sld_ratelimit },
> >  };
> >
> >  static inline bool match_option(const char *arg, int arglen, const
> > char *opt)  {
> > -	int len = strlen(opt);
> >
> > -	return len == arglen && !strncmp(arg, opt, len);
> > +	int len = strlen(opt), ratelimit;
> > +
> > +	if (strncmp(arg, opt, len))
> > +		return false;
> > +
> > +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0 &&
> > +	    ratelimit_bl <= HZ / 2) {
> > +		ratelimit_bl = ratelimit;
> > +
> > +		return true;
> > +	}
> > +
> > +	return len == arglen;
> >  }
> >
> >  static bool split_lock_verify_msr(bool on) @@ -1016,16 +1038,15 @@
> > static bool split_lock_verify_msr(bool on)
> >  	return ctrl == tmp;
> >  }
> >
> > -static void __init split_lock_setup(void)
> > +static void __init sld_state_setup(void)
> >  {
> >  	enum split_lock_detect_state state = sld_warn;
> >  	char arg[20];
> >  	int i, ret;
> >
> > -	if (!split_lock_verify_msr(false)) {
> > -		pr_info("MSR access failed: Disabled\n");
> > +	if (!static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
> > +	    !static_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
> 
> Pretty sure static_cpu_has() in an __init function is a waste.

Sure. Will change it.

> 
> >  		return;
> > -	}
> >
> >  	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
> >  				  arg, sizeof(arg));
> > @@ -1037,17 +1058,14 @@ static void __init split_lock_setup(void)
> >  			}
> >  		}
> >  	}
> > +	sld_state = state;
> > +}
> >
> > -	switch (state) {
> > -	case sld_off:
> > -		pr_info("disabled\n");
> > +static void __init _split_lock_setup(void) {
> > +	if (!split_lock_verify_msr(false)) {
> > +		pr_info("MSR access failed: Disabled\n");
> >  		return;
> > -	case sld_warn:
> > -		pr_info("warning about user-space split_locks\n");
> > -		break;
> > -	case sld_fatal:
> > -		pr_info("sending SIGBUS on user-space split_locks\n");
> > -		break;
> >  	}
> >
> >  	rdmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache); @@ -1057,8 +1075,11
> @@
> > static void __init split_lock_setup(void)
> >  		return;
> >  	}
> >
> > -	sld_state = state;
> > +	/* Restore the MSR to its cached value. */
> > +	wrmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache);
> > +
> >  	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> > +	sld = true;
> >  }
> >
> >  /*
> > @@ -1078,6 +1099,10 @@ static void sld_update_msr(bool on)
> >
> >  static void split_lock_init(void)
> >  {
> > +	/* If supported, #DB for bus lock will handle warn and ratelimit. */
> > +	if (bld && (sld_state == sld_warn || sld_state == sld_ratelimit))
> > +		return;
> > +
> >  	if (cpu_model_supports_sld)
> >  		split_lock_verify_msr(sld_state != sld_off);  } @@ -1114,14
> > +1139,58 @@ bool handle_guest_split_lock(unsigned long ip)  }
> > EXPORT_SYMBOL_GPL(handle_guest_split_lock);
> >
> > +static void bus_lock_init(void)
> > +{
> > +	u64 val;
> > +
> > +	if (!bld)
> > +		return;
> > +
> > +	/* If supported, #AC for split lock will handle fatal. */
> > +	if (sld && sld_state == sld_fatal)
> > +		return;
> > +
> > +	rdmsrl(MSR_IA32_DEBUGCTLMSR, val);
> > +	val |= DEBUGCTLMSR_BUS_LOCK_DETECT;
> 
> Uh, doesn't this enable BLD even if sld_state == sld_off?

If sld_state == sld_off, bld is 0 as well.
Then "if (!bld) return" will disable BLD.

> 
> > +	wrmsrl(MSR_IA32_DEBUGCTLMSR, val);
> > +}
> > +
> >  bool handle_user_split_lock(struct pt_regs *regs, long error_code)  {
> > -	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> > +	if ((regs->flags & X86_EFLAGS_AC) || !sld || sld_state == sld_fatal ||
> > +	    sld_state == sld_ratelimit)
> >  		return false;
> >  	split_lock_warn(regs->ip);
> >  	return true;
> >  }
> >
> > +bool handle_user_bus_lock(struct pt_regs *regs) {
> > +	if (!bld)
> > +		return false;
> > +
> > +	pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address:
> 0x%lx\n",
> > +			    current->comm, current->pid, regs->ip);
> > +
> > +	if (sld_state == sld_ratelimit) {
> > +		while (!__ratelimit(&get_current_user()->ratelimit_bl))
> > +			msleep(1000 / ratelimit_bl);
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +bool handle_kernel_bus_lock(struct pt_regs *regs) {
> > +	if (!bld)
> > +		return false;
> > +
> > +	pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address:
> 0x%lx\n",
> > +			    current->comm, current->pid, regs->ip);
> > +
> > +	return true;
> > +}
> > +
> >  /*
> >   * This function is called only when switching between tasks with
> >   * different split-lock detection modes. It sets the MSR for the @@
> > -1159,7 +1228,7 @@ static const struct x86_cpu_id split_lock_cpu_ids[]
> __initconst = {
> >  	{}
> >  };
> >
> > -void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> > +static void __init split_lock_setup(struct cpuinfo_x86 *c)
> >  {
> >  	const struct x86_cpu_id *m;
> >  	u64 ia32_core_caps;
> > @@ -1186,5 +1255,50 @@ void __init cpu_set_core_cap_bits(struct
> cpuinfo_x86 *c)
> >  	}
> >
> >  	cpu_model_supports_sld = true;
> > -	split_lock_setup();
> > +	_split_lock_setup();
> > +}
> > +
> > +static void sld_state_show(void)
> > +{
> > +	if (!bld && !sld)
> > +		return;
> > +
> > +	switch (sld_state) {
> > +	case sld_off:
> > +		pr_info("disabled\n");
> > +		break;
> > +
> > +	case sld_warn:
> > +		if (bld)
> > +			pr_info("#DB: warning about kernel and user-space
> bus_locks\n");
> > +		else
> > +			pr_info("#AC: crashing the kernel about kernel
> split_locks and warning about user-space split_locks\n");
> > +		break;
> > +
> > +	case sld_fatal:
> > +		if (sld)
> > +			pr_info("#AC: crashing the kernel on kernel
> split_locks and sending SIGBUS on user-space split_locks\n");
> > +		else
> > +			pr_info("#DB: warning about kernel bus_locks and
> sending SIGBUS on user-space bus_locks\n");
> > +		break;
> > +
> > +	case sld_ratelimit:
> > +		if (bld)
> > +			pr_info("#DB: warning about kernel bus_locks and
> setting silent rate limit to %d/sec per user on non root user-space
> bus_locks\n", ratelimit_bl);
> > +		break;
> > +	}
> > +}
> > +
> > +static void __init bus_lock_setup(void) {
> > +	if (static_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) && sld_state !=
> > +sld_off)
> 
> More overkill.
> 
> > +		bld = true;
> > +}
> > +
> > +void __init sld_setup(struct cpuinfo_x86 *c)
> 
> This wrapper probably should call out that it configures both sld and bld.

Will fix it.

> 
> > +{
> > +	sld_state_setup();
> > +	split_lock_setup(c);
> > +	bus_lock_setup();
> > +	sld_state_show();
> >  }
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index
> > b038695f36c5..58725567da39 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -812,6 +812,16 @@ static void handle_debug(struct pt_regs *regs,
> unsigned long dr6, bool user)
> >  	if (!user && !dr6)
> >  		return;
> >
> > +	/* Handle bus lock. */
> > +	if (!(dr6 & DR_BUS_LOCK)) {
> > +		cond_local_irq_enable(regs);
> > +		if (user)
> > +			handle_user_bus_lock(regs);
> > +		else
> > +			handle_kernel_bus_lock(regs);
> > +		goto out;
> > +	}
> > +
> >  	/*
> >  	 * If dr6 has no reason to give us about the origin of this trap,
> >  	 * then it's very likely the result of an icebp/int01 trap.
> > diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
> > index 917d88edb7b9..fc757ec6c19f 100644
> > --- a/include/linux/sched/user.h
> > +++ b/include/linux/sched/user.h
> > @@ -37,8 +37,9 @@ struct user_struct {
> >  	atomic_long_t locked_vm;
> >  #endif
> >
> > -	/* Miscellaneous per-user rate limit */
> > +	/* Miscellaneous per-user rate limits */
> >  	struct ratelimit_state ratelimit;
> > +	struct ratelimit_state ratelimit_bl;
> 
> Why not spell out ratelimit_bus_lock?  There's no way someone looking at
> this code in isolation is going to have any clue what "bl" means.

Sure. Will fix it.

Thank you very much for your review!

-Fenghua

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 18:09   ` Yu, Fenghua
@ 2020-07-29 18:46     ` Sean Christopherson
  2020-07-29 19:42       ` Fenghua Yu
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2020-07-29 18:46 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Shanbhogue, Vedvyas, Luck, Tony, H Peter Anvin, Andy Lutomirski,
	Shankar, Ravi V, Li, Xiaoyao, x86, linux-kernel

On Wed, Jul 29, 2020 at 11:09:16AM -0700, Yu, Fenghua wrote:
> > > Some CPUs have ability to notify the kernel by an #DB trap after the
> > > instruction acquires a bus lock and is executed. This allows the
> > > kernel to enforce user application throttling or mitigations and also
> > > provides a better environment to debug kernel split lock issues since
> > > the kernel can continue instead of crashing.
> > >
> > > #DB for bus lock detect fixes all issues in #AC for split lock detect:
> > 
> > Fixes "all" issues... and creates some new ones, e.g. there are use cases
> > where preventing the split lock from happening in the first place is strongly
> > desired.  It's why that train wreck exists.
> 
> Bus Lock Detection doesn't replace Split Lock Detection. If both features
> are enabled, default behavior is warning from bus lock, fatal behavior is
> still from split lock. See the behavior table as follows.
> 
> > 
> > > 1) It's architectural ... just need to look at one CPUID bit to know it
> > >    exists
> > > 2) The IA32_DEBUGCTL MSR, which reports bus lock in #DB, is per-thread.
> > >    So each process or guest can have different behavior.
> > > 3) It has support for VMM/guests (new VMEXIT codes, etc).
> > >
> > > Use the existing kernel command line option "split_lock_detect=" to
> > > handle #DB for bus lock:
> > 
> > Are SLD and BLD mutually exclusive?  Can we even guarantee that given the
> > track record of SLD?  If not, we'll likely want to allow the user to choose
> > between SDL and BLD via split_lock_detect.
> 
> The two hardware features can be enabled on the same platform.
> But they are mutually exclusive in the kernel because #AC from SLD happens
> before the instruction is executed and #DB happens after the instruction is
> executed.
> 
> Right now, if both of them are enabled, "warn" behavior goes to
> bus lock and "fatal" behavior goes to split lock.
> 
> Do you want the user to override the behaviors by something like this?
> 
> split_lock_detect=warn[,sld]: if given "sld" while both features are enabled,
> warn behavior is from split lock instead of bus lock detection.
> 
> split_lock_detect=fatal[,bld]: if given "bld" while both features are enabled,
> fatal behavior is from bus lock detection.

IMO these should be completely independent features (that happen to share
some code).

BLD in fatal mode doesn't make any sense because it can't be fatal without
a completely different implementation, e.g. the bus lock has already
happened and the application can eat the SIGBUS.  The current SLD code
works because the split lock is prevented entirely, i.e. eating SIGBUS
doesn't allow the application to make forward progress.

Smushing the two into a single option is confusing, e.g. from the table
below it's not at all clear what will happen if sld=fatal, both features
are supported, and the kernel generates a split lock.

Given that both SLD (per-core, not architectural) and BLD (#DB recursion and
inverted DR6 flag) have warts, it would be very nice to enable/disable them
independently.  The lock to non-WB behavior for BLD may also be problematic,
e.g. maybe it turns out that fixing drivers to avoid locks to non-WB isn't
as straightforward as avoiding split locks.

> > >  /*
> > >   * Default to sld_off because most systems do not support split lock
> > > detection
> > > - * split_lock_setup() will switch this to sld_warn on systems that
> > > support
> > > - * split lock detect, unless there is a command line override.
> > > + * sld_state_setup() will switch this to sld_warn on systems that
> > > + support
> > > + * split lock/bus lock detect, unless there is a command line override.
> > >   */
> > >  static enum split_lock_detect_state sld_state __ro_after_init =
> > > sld_off;  static u64 msr_test_ctrl_cache __ro_after_init;
> > > +/* Split lock detection is enabled if it's true. */ static bool sld;
> > > +/* Bus lock detection is enabled if it's true. */ static bool bld;
> > 
> > Why can't these be tracked/reflected in X86_FEATURE_*?
> 
> sld and bld are enabled depending on kernel parameter "split_lock_detect=".
> They are not static and cannot be tracked by static X86_FEATURE_*.

X86_FEATURE_* flags aren't static, the kernel sets/clears them all over the
place.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 18:46     ` Sean Christopherson
@ 2020-07-29 19:42       ` Fenghua Yu
  2020-07-29 20:00         ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: Fenghua Yu @ 2020-07-29 19:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Yu, Fenghua, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
	Peter Zijlstra, Shanbhogue, Vedvyas, Luck, Tony, H Peter Anvin,
	Andy Lutomirski, Shankar, Ravi V, Li, Xiaoyao, x86, linux-kernel

Hi, Sean,

On Wed, Jul 29, 2020 at 11:46:14AM -0700, Sean Christopherson wrote:
> On Wed, Jul 29, 2020 at 11:09:16AM -0700, Yu, Fenghua wrote:
> > > > Some CPUs have ability to notify the kernel by an #DB trap after the
> > > > instruction acquires a bus lock and is executed. This allows the
> > > > kernel to enforce user application throttling or mitigations and also
> > > > provides a better environment to debug kernel split lock issues since
> > > > the kernel can continue instead of crashing.
> > > >
> > > > #DB for bus lock detect fixes all issues in #AC for split lock detect:
> > > 
> > > Fixes "all" issues... and creates some new ones, e.g. there are use cases
> > > where preventing the split lock from happening in the first place is strongly
> > > desired.  It's why that train wreck exists.
> > 
> > Bus Lock Detection doesn't replace Split Lock Detection. If both features
> > are enabled, default behavior is warning from bus lock, fatal behavior is
> > still from split lock. See the behavior table as follows.
> > 
> > > 
> > > > 1) It's architectural ... just need to look at one CPUID bit to know it
> > > >    exists
> > > > 2) The IA32_DEBUGCTL MSR, which reports bus lock in #DB, is per-thread.
> > > >    So each process or guest can have different behavior.
> > > > 3) It has support for VMM/guests (new VMEXIT codes, etc).
> > > >
> > > > Use the existing kernel command line option "split_lock_detect=" to
> > > > handle #DB for bus lock:
> > > 
> > > Are SLD and BLD mutually exclusive?  Can we even guarantee that given the
> > > track record of SLD?  If not, we'll likely want to allow the user to choose
> > > between SDL and BLD via split_lock_detect.
> > 
> > The two hardware features can be enabled on the same platform.
> > But they are mutually exclusive in the kernel because #AC from SLD happens
> > before the instruction is executed and #DB happens after the instruction is
> > executed.
> > 
> > Right now, if both of them are enabled, "warn" behavior goes to
> > bus lock and "fatal" behavior goes to split lock.
> > 
> > Do you want the user to override the behaviors by something like this?
> > 
> > split_lock_detect=warn[,sld]: if given "sld" while both features are enabled,
> > warn behavior is from split lock instead of bus lock detection.
> > 
> > split_lock_detect=fatal[,bld]: if given "bld" while both features are enabled,
> > fatal behavior is from bus lock detection.
> 
> IMO these should be completely independent features (that happen to share
> some code).
> 
> BLD in fatal mode doesn't make any sense because it can't be fatal without
> a completely different implementation, e.g. the bus lock has already
> happened and the application can eat the SIGBUS.  The current SLD code
> works because the split lock is prevented entirely, i.e. eating SIGBUS
> doesn't allow the application to make forward progress.

If "fatal" is meaningless for BLD, we can remove it for BLD. If we need
it in the future, we can add it later.

The "ratelimit:N" maybe more useful for BLD: it mitigates DOS from bus locks.

> 
> Smushing the two into a single option is confusing, e.g. from the table
> below it's not at all clear what will happen if sld=fatal, both features
> are supported, and the kernel generates a split lock.
> 
> Given that both SLD (per-core, not architectural) and BLD (#DB recursion and
> inverted DR6 flag) have warts, it would be very nice to enable/disable them
> independently.  The lock to non-WB behavior for BLD may also be problematic,
> e.g. maybe it turns out that fixing drivers to avoid locks to non-WB isn't
> as straightforward as avoiding split locks.

But the two features are related if both of them are enabled in hardware:
If a split lock happens, SLD will generate #AC before instruction execution
and BLD will generate #DB after instruction execution.

The software needs to make them exclusive. The same kernel option reflects
the relationship and make them exclusive, e.g. "fatal" enables SLD and
disables BLD, "warn" does the other way.

If using two different kernel options, the user needs to give right options
to make both work, e.g. can the user give this combination
"split_lock_detect=fatal bus_lock_detect=warn"? What does the combination
mean? There could be many combinations of the two options, some of them
are meaningful and some of them aren't. Maintaining the combinations is
unnecessary complex, right?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 19:42       ` Fenghua Yu
@ 2020-07-29 20:00         ` Sean Christopherson
  2020-07-29 20:09           ` peterz
  2020-07-29 20:35           ` Fenghua Yu
  0 siblings, 2 replies; 16+ messages in thread
From: Sean Christopherson @ 2020-07-29 20:00 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Shanbhogue, Vedvyas, Luck, Tony, H Peter Anvin, Andy Lutomirski,
	Shankar, Ravi V, Li, Xiaoyao, x86, linux-kernel

On Wed, Jul 29, 2020 at 07:42:59PM +0000, Fenghua Yu wrote:
> > Smushing the two into a single option is confusing, e.g. from the table
> > below it's not at all clear what will happen if sld=fatal, both features
> > are supported, and the kernel generates a split lock.
> > 
> > Given that both SLD (per-core, not architectural) and BLD (#DB recursion and
> > inverted DR6 flag) have warts, it would be very nice to enable/disable them
> > independently.  The lock to non-WB behavior for BLD may also be problematic,
> > e.g. maybe it turns out that fixing drivers to avoid locks to non-WB isn't
> > as straightforward as avoiding split locks.
> 
> But the two features are related if both of them are enabled in hardware:
> If a split lock happens, SLD will generate #AC before instruction execution
> and BLD will generate #DB after instruction execution.
> 
> The software needs to make them exclusive. The same kernel option reflects
> the relationship and make them exclusive, e.g. "fatal" enables SLD and
> disables BLD, "warn" does the other way.

Why do they need to be exclusive?  We've already established that BLD catches
things that SLD does not.  What's wrong with running sld=fatal and bld=ratelimit
so that split locks never happen and kill applications, and non-WB locks are
are ratelimited?

Sure, sld==warn with bld!=off is a bit silly, but the kernel can easily handle
that particular case.

> If using two different kernel options, the user needs to give right options
> to make both work, e.g. can the user give this combination
> "split_lock_detect=fatal bus_lock_detect=warn"? What does the combination
> mean?

Split locks are fatal, non-WB locks are logged but not fatal.

> There could be many combinations of the two options, some of them
> are meaningful and some of them aren't. Maintaining the combinations is
> unnecessary complex, right?

Honestly, it seems less complex than deciphering the resulting behavior from
that table.

  sld=off|warn|fatal
  bld=off|warn|ratelimit

As above, sld then could become

  if (sld == warn && bld != off) {
          pr_warn("disabling SLD in favor of BLD\n");
          sld = off;
  }

Everything else should simply work.  The necessary refactoring for SLD should
be minimial as well.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 20:00         ` Sean Christopherson
@ 2020-07-29 20:09           ` peterz
  2020-07-29 20:35           ` Fenghua Yu
  1 sibling, 0 replies; 16+ messages in thread
From: peterz @ 2020-07-29 20:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fenghua Yu, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
	Shanbhogue, Vedvyas, Luck, Tony, H Peter Anvin, Andy Lutomirski,
	Shankar, Ravi V, Li, Xiaoyao, x86, linux-kernel

On Wed, Jul 29, 2020 at 01:00:33PM -0700, Sean Christopherson wrote:
> Why do they need to be exclusive?  We've already established that BLD catches
> things that SLD does not.  What's wrong with running sld=fatal and bld=ratelimit
> so that split locks never happen and kill applications, and non-WB locks are
> are ratelimited?

It's all moot until there's a sane proposal for #DB that isn't utterly
wrecked.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 20:00         ` Sean Christopherson
  2020-07-29 20:09           ` peterz
@ 2020-07-29 20:35           ` Fenghua Yu
  2020-07-29 20:39             ` Sean Christopherson
  1 sibling, 1 reply; 16+ messages in thread
From: Fenghua Yu @ 2020-07-29 20:35 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fenghua Yu, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
	Peter Zijlstra, Shanbhogue, Vedvyas, Luck, Tony, H Peter Anvin,
	Andy Lutomirski, Shankar, Ravi V, Li, Xiaoyao, x86, linux-kernel

Hi, Sean,

On Wed, Jul 29, 2020 at 01:00:33PM -0700, Sean Christopherson wrote:
> On Wed, Jul 29, 2020 at 07:42:59PM +0000, Fenghua Yu wrote:
> > > Smushing the two into a single option is confusing, e.g. from the table
> > > below it's not at all clear what will happen if sld=fatal, both features
> > > are supported, and the kernel generates a split lock.
> > > 
> > > Given that both SLD (per-core, not architectural) and BLD (#DB recursion and
> > > inverted DR6 flag) have warts, it would be very nice to enable/disable them
> > > independently.  The lock to non-WB behavior for BLD may also be problematic,
> > > e.g. maybe it turns out that fixing drivers to avoid locks to non-WB isn't
> > > as straightforward as avoiding split locks.
> > 
> > But the two features are related if both of them are enabled in hardware:
> > If a split lock happens, SLD will generate #AC before instruction execution
> > and BLD will generate #DB after instruction execution.
> > 
> > The software needs to make them exclusive. The same kernel option reflects
> > the relationship and make them exclusive, e.g. "fatal" enables SLD and
> > disables BLD, "warn" does the other way.
> 
> Why do they need to be exclusive?  We've already established that BLD catches
> things that SLD does not.  What's wrong with running sld=fatal and bld=ratelimit
> so that split locks never happen and kill applications, and non-WB locks are
> are ratelimited?

Sorry if I didn't explain bus lock and split lock detections clearly before.

There are two causes of bus locks:
1. a locked access across cache line boundary: this is split lock.
2. a locked access to non-WB memory.

BLD detects both causes and SLD only detects the first one, i.e. BLD can detect
both split lock AND lock to non-WB memory.

If sld=fatal and bld=ratelimit (both sld and bld are enabled in hw),
a split lock always generates #AC and kills the app and bld will never have
a chance to trigger #DB for split lock. So effectively the combination makes
the kernel to take two different actions after detecting a bus lock: if the
bus lock comes from a split lock, fatal (sld); if the bus lock comes from
lock to non-WB memory, ratelimit (bld). Seems this is not a useful combination
and is not what the user really wants to do because the user wants ratelimit
for BLD, right?

> > If using two different kernel options, the user needs to give right options
> > to make both work, e.g. can the user give this combination
> > "split_lock_detect=fatal bus_lock_detect=warn"? What does the combination
> > mean?
> 
> Split locks are fatal, non-WB locks are logged but not fatal.

Similar here: bus lock from a split lock is fatal (sld triggers #AC) and
bus lock from lock to non-WB mem is warn (bld triggers #DB). Seems not what
the user really wants, right?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 20:35           ` Fenghua Yu
@ 2020-07-29 20:39             ` Sean Christopherson
  2020-07-29 22:07               ` Fenghua Yu
  0 siblings, 1 reply; 16+ messages in thread
From: Sean Christopherson @ 2020-07-29 20:39 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Shanbhogue, Vedvyas, Luck, Tony, H Peter Anvin, Andy Lutomirski,
	Shankar, Ravi V, Li, Xiaoyao, x86, linux-kernel

On Wed, Jul 29, 2020 at 08:35:57PM +0000, Fenghua Yu wrote:
> Hi, Sean,
> 
> On Wed, Jul 29, 2020 at 01:00:33PM -0700, Sean Christopherson wrote:
> > On Wed, Jul 29, 2020 at 07:42:59PM +0000, Fenghua Yu wrote:
> > > > Smushing the two into a single option is confusing, e.g. from the table
> > > > below it's not at all clear what will happen if sld=fatal, both features
> > > > are supported, and the kernel generates a split lock.
> > > > 
> > > > Given that both SLD (per-core, not architectural) and BLD (#DB recursion and
> > > > inverted DR6 flag) have warts, it would be very nice to enable/disable them
> > > > independently.  The lock to non-WB behavior for BLD may also be problematic,
> > > > e.g. maybe it turns out that fixing drivers to avoid locks to non-WB isn't
> > > > as straightforward as avoiding split locks.
> > > 
> > > But the two features are related if both of them are enabled in hardware:
> > > If a split lock happens, SLD will generate #AC before instruction execution
> > > and BLD will generate #DB after instruction execution.
> > > 
> > > The software needs to make them exclusive. The same kernel option reflects
> > > the relationship and make them exclusive, e.g. "fatal" enables SLD and
> > > disables BLD, "warn" does the other way.
> > 
> > Why do they need to be exclusive?  We've already established that BLD catches
> > things that SLD does not.  What's wrong with running sld=fatal and bld=ratelimit
> > so that split locks never happen and kill applications, and non-WB locks are
> > are ratelimited?
> 
> Sorry if I didn't explain bus lock and split lock detections clearly before.
> 
> There are two causes of bus locks:
> 1. a locked access across cache line boundary: this is split lock.
> 2. a locked access to non-WB memory.
> 
> BLD detects both causes and SLD only detects the first one, i.e. BLD can detect
> both split lock AND lock to non-WB memory.
> 
> If sld=fatal and bld=ratelimit (both sld and bld are enabled in hw),
> a split lock always generates #AC and kills the app and bld will never have
> a chance to trigger #DB for split lock. So effectively the combination makes
> the kernel to take two different actions after detecting a bus lock: if the
> bus lock comes from a split lock, fatal (sld); if the bus lock comes from
> lock to non-WB memory, ratelimit (bld). Seems this is not a useful combination
> and is not what the user really wants to do because the user wants ratelimit
> for BLD, right?

I understood all off that.  And as I user I want to run sld=fatal and
bld=ratelimit to provide maximum protection, i.e. disallow split locks at
all times, and ratelimit the crud SLD #AC can't catch.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29  8:49 ` peterz
@ 2020-07-29 20:40   ` Fenghua Yu
  2020-07-29 21:09     ` peterz
  0 siblings, 1 reply; 16+ messages in thread
From: Fenghua Yu @ 2020-07-29 20:40 UTC (permalink / raw)
  To: peterz
  Cc: Fenghua Yu, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
	Tony Luck, H Peter Anvin, Andy Lutomirski, Ravi V Shankar,
	Xiaoyao Li, x86, linux-kernel

Hi, Peter,

On Wed, Jul 29, 2020 at 10:49:47AM +0200, peterz@infradead.org wrote:
> On Fri, Jul 17, 2020 at 02:35:00PM -0700, Fenghua Yu wrote:
> 
> > #DB for bus lock detect fixes all issues in #AC for split lock detect:
> > 1) It's architectural ... just need to look at one CPUID bit to know it
> >    exists
> > 2) The IA32_DEBUGCTL MSR, which reports bus lock in #DB, is per-thread.
> >    So each process or guest can have different behavior.
> 
> And it generates a whole new problem due to #DB being an IST, and
> 
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index b038695f36c5..58725567da39 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -812,6 +812,16 @@ static void handle_debug(struct pt_regs *regs, unsigned long dr6, bool user)
> >  	if (!user && !dr6)
> >  		return;
> >  
> > +	/* Handle bus lock. */
> > +	if (!(dr6 & DR_BUS_LOCK)) {
> > +		cond_local_irq_enable(regs);
> > +		if (user)
> > +			handle_user_bus_lock(regs);
> > +		else
> > +			handle_kernel_bus_lock(regs);
> > +		goto out;
> > +	}
> > +
> >  	/*
> >  	 * If dr6 has no reason to give us about the origin of this trap,
> >  	 * then it's very likely the result of an icebp/int01 trap.
> 
> we very much rely on #DB never recursing, which we carefully crafted by
> disallowing hardare breakpoints on noinstr code and clearing DR7 early.
> 
> But now it can... please keep the pieces.

Can we disable Bus Lock Detection before handle it and re-enable it
after handle it? Will that resolve the recursion issue?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 20:40   ` Fenghua Yu
@ 2020-07-29 21:09     ` peterz
  2020-07-30 10:08       ` Thomas Gleixner
  0 siblings, 1 reply; 16+ messages in thread
From: peterz @ 2020-07-29 21:09 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Tony Luck,
	H Peter Anvin, Andy Lutomirski, Ravi V Shankar, Xiaoyao Li, x86,
	linux-kernel

On Wed, Jul 29, 2020 at 08:40:57PM +0000, Fenghua Yu wrote:
> On Wed, Jul 29, 2020 at 10:49:47AM +0200, peterz@infradead.org wrote:
> > On Fri, Jul 17, 2020 at 02:35:00PM -0700, Fenghua Yu wrote:
> > 
> > > #DB for bus lock detect fixes all issues in #AC for split lock detect:
> > > 1) It's architectural ... just need to look at one CPUID bit to know it
> > >    exists
> > > 2) The IA32_DEBUGCTL MSR, which reports bus lock in #DB, is per-thread.
> > >    So each process or guest can have different behavior.
> > 
> > And it generates a whole new problem due to #DB being an IST, and

> > we very much rely on #DB never recursing, which we carefully crafted by
> > disallowing hardare breakpoints on noinstr code and clearing DR7 early.
> > 
> > But now it can... please keep the pieces.
> 
> Can we disable Bus Lock Detection before handle it and re-enable it
> after handle it? Will that resolve the recursion issue?

Because WRMSR is cheap, right?

You have to unconditionally {dis,en}able it on #DB entry/exit. Not only
when it's a DR_BUS_LOCK, _always_. Then maybe. I'm too tired to think
through the IST mess.

IST's suck, they're horrible crap.

Suppose we get a #DB, then we get an NMI right before it does WRMSR, so
BUS_LOCK is still on, then the NMI does a dodgy LOCK op, we die.

So that means, you get to disable it on every NMI-like exception too,
but we happen to care about performance for those, you loose.


Also, what happens if you have a hardware watchpoint on the instruction
that causes DR_BUS_LOCK? Does that work as expected?



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 20:39             ` Sean Christopherson
@ 2020-07-29 22:07               ` Fenghua Yu
  2020-07-29 23:28                 ` Sean Christopherson
  0 siblings, 1 reply; 16+ messages in thread
From: Fenghua Yu @ 2020-07-29 22:07 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fenghua Yu, Thomas Gleixner, Borislav Petkov, Ingo Molnar,
	Peter Zijlstra, Shanbhogue, Vedvyas, Luck, Tony, H Peter Anvin,
	Andy Lutomirski, Shankar, Ravi V, Li, Xiaoyao, x86, linux-kernel

Hi, Sean,

On Wed, Jul 29, 2020 at 01:39:05PM -0700, Sean Christopherson wrote:
> On Wed, Jul 29, 2020 at 08:35:57PM +0000, Fenghua Yu wrote:
> > If sld=fatal and bld=ratelimit (both sld and bld are enabled in hw),
> > a split lock always generates #AC and kills the app and bld will never have
> > a chance to trigger #DB for split lock. So effectively the combination makes
> > the kernel to take two different actions after detecting a bus lock: if the
> > bus lock comes from a split lock, fatal (sld); if the bus lock comes from
> > lock to non-WB memory, ratelimit (bld). Seems this is not a useful combination
> > and is not what the user really wants to do because the user wants ratelimit
> > for BLD, right?
> 
> I understood all off that.  And as I user I want to run sld=fatal and
> bld=ratelimit to provide maximum protection, i.e. disallow split locks at
> all times, and ratelimit the crud SLD #AC can't catch.

Then this will expand the current usages and do need two options. Let me work
on adding a new "bus_lock_detect=" option as you suggested.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 22:07               ` Fenghua Yu
@ 2020-07-29 23:28                 ` Sean Christopherson
  0 siblings, 0 replies; 16+ messages in thread
From: Sean Christopherson @ 2020-07-29 23:28 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Borislav Petkov, Ingo Molnar, Peter Zijlstra,
	Shanbhogue, Vedvyas, Luck, Tony, H Peter Anvin, Andy Lutomirski,
	Shankar, Ravi V, Li, Xiaoyao, x86, linux-kernel

On Wed, Jul 29, 2020 at 10:07:14PM +0000, Fenghua Yu wrote:
> Hi, Sean,
> 
> On Wed, Jul 29, 2020 at 01:39:05PM -0700, Sean Christopherson wrote:
> > On Wed, Jul 29, 2020 at 08:35:57PM +0000, Fenghua Yu wrote:
> > > If sld=fatal and bld=ratelimit (both sld and bld are enabled in hw),
> > > a split lock always generates #AC and kills the app and bld will never have
> > > a chance to trigger #DB for split lock. So effectively the combination makes
> > > the kernel to take two different actions after detecting a bus lock: if the
> > > bus lock comes from a split lock, fatal (sld); if the bus lock comes from
> > > lock to non-WB memory, ratelimit (bld). Seems this is not a useful combination
> > > and is not what the user really wants to do because the user wants ratelimit
> > > for BLD, right?
> > 
> > I understood all off that.  And as I user I want to run sld=fatal and
> > bld=ratelimit to provide maximum protection, i.e. disallow split locks at
> > all times, and ratelimit the crud SLD #AC can't catch.
> 
> Then this will expand the current usages and do need two options. Let me work
> on adding a new "bus_lock_detect=" option as you suggested.

I'd wait for feedback from others before spending too much effort rewriting
everything, I'm just one person with an opinion.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH RFC] x86/bus_lock: Enable bus lock detection
  2020-07-29 21:09     ` peterz
@ 2020-07-30 10:08       ` Thomas Gleixner
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Gleixner @ 2020-07-30 10:08 UTC (permalink / raw)
  To: peterz, Fenghua Yu
  Cc: Borislav Petkov, Ingo Molnar, Tony Luck, H Peter Anvin,
	Andy Lutomirski, Ravi V Shankar, Xiaoyao Li, x86, linux-kernel

peterz@infradead.org writes:
> On Wed, Jul 29, 2020 at 08:40:57PM +0000, Fenghua Yu wrote:
>> Can we disable Bus Lock Detection before handle it and re-enable it
>> after handle it? Will that resolve the recursion issue?
>
> Because WRMSR is cheap, right?
>
> You have to unconditionally {dis,en}able it on #DB entry/exit. Not only
> when it's a DR_BUS_LOCK, _always_. Then maybe. I'm too tired to think
> through the IST mess.
>
> IST's suck, they're horrible crap.
>
> Suppose we get a #DB, then we get an NMI right before it does WRMSR, so
> BUS_LOCK is still on, then the NMI does a dodgy LOCK op, we die.
>
> So that means, you get to disable it on every NMI-like exception too,
> but we happen to care about performance for those, you loose.
>
> Also, what happens if you have a hardware watchpoint on the instruction
> that causes DR_BUS_LOCK? Does that work as expected?

Q: Why on earth are Intel hardware folks cramming this into #DB?
A: Just because there was a bit left in DR6 to indicate it, right?

Q: Why can't hardware folks talk to us _before_ they make the x86 exception
   trainwreck even worse?
A: Just because they know that we'd tell them to go back to the drawing
   board.

Q: Is that going to be supported by the kernel?
A: No, go back to the drawing board and talk to us _before_ coming back
   with the next half thought out tinkerware cast in silicon.

I'm really tired of wasting time dealing with such misfeatures which create
more problems than they solve.

Thanks,

        Thomas



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-07-30 10:08 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-17 21:35 [PATCH RFC] x86/bus_lock: Enable bus lock detection Fenghua Yu
2020-07-29  3:02 ` Sean Christopherson
2020-07-29  8:50   ` peterz
2020-07-29 18:09   ` Yu, Fenghua
2020-07-29 18:46     ` Sean Christopherson
2020-07-29 19:42       ` Fenghua Yu
2020-07-29 20:00         ` Sean Christopherson
2020-07-29 20:09           ` peterz
2020-07-29 20:35           ` Fenghua Yu
2020-07-29 20:39             ` Sean Christopherson
2020-07-29 22:07               ` Fenghua Yu
2020-07-29 23:28                 ` Sean Christopherson
2020-07-29  8:49 ` peterz
2020-07-29 20:40   ` Fenghua Yu
2020-07-29 21:09     ` peterz
2020-07-30 10:08       ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).