* [PATCH v5 0/3] x86/bus_lock: Enable bus lock detection
@ 2021-03-13  5:49 Fenghua Yu
  2021-03-13  5:49 ` [PATCH v5 1/3] x86/cpufeatures: Enumerate #DB for " Fenghua Yu
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Fenghua Yu @ 2021-03-13  5:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Tony Luck, Randy Dunlap, Xiaoyao Li ,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

A bus lock [1] is acquired through either a split locked access to
writeback (WB) memory or any locked access to non-WB memory. This is
typically >1000 cycles slower than an atomic operation within
a cache line. It also disrupts performance on other cores.
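
As a purely illustrative aside (not part of this series), a user-space
snippet that provokes such a split lock could look roughly like the sketch
below; the 63-byte offset only serves to make the 4-byte atomic straddle a
64-byte cache line boundary:

        #include <stdint.h>

        /* 128 bytes, 64-byte aligned, so bytes 63..66 cross a cache line. */
        static char buf[128] __attribute__((aligned(64)));

        int main(void)
        {
                /* Deliberately misaligned pointer into buf. */
                uint32_t *p = (uint32_t *)(buf + 63);

                /* Locked read-modify-write on a split object -> bus lock. */
                __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
                return 0;
        }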

Although a split lock can be detected by the #AC trap, the trap is triggered
before the instruction acquires the bus lock. This makes it difficult to
mitigate bus locks (e.g. throttle the user application).

Some CPUs have the ability to notify the kernel by an #DB trap after a user
instruction acquires a bus lock and is executed. This allows the kernel
to enforce user application throttling or mitigations.

#DB for bus lock detect fixes issues in #AC for split lock detect:
1) It's architectural ... just need to look at one CPUID bit to know it
   exists
2) The IA32_DEBUGCTL MSR, which enables bus lock reporting via #DB, is per-thread.
   So each process or guest can have different behavior.
3) It has support for VMM/guests (new VMEXIT codes, etc).
4) It detects not only split locks but also bus locks from non-WB memory.

Hardware only generates #DB for bus lock detect when CPL>0 to avoid
nested #DB from multiple bus locks while the first #DB is being handled.

Use the existing kernel command line parameter "split_lock_detect=" to
handle #DB for bus lock, with an additional option "ratelimit:N" to set
the bus lock rate limit for a user.
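
For example (an illustrative value only), booting with

        split_lock_detect=ratelimit:1000

would limit each non-root user to at most 1000 bus locks per second.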

[1] Intel Instruction Set Extension Chapter 9:
https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

Change Log:
v5:
Address all comments from Thomas:
- In the cover letter, update the latest ISE link to include the #DB
  for bus lock spec.
- In patch 1, add commit message for breakpoint and bus lock on the same
  instruction.
- In patch 2, change warn to #AC if both #AC and #DB are supported, remove
  sld and bld variables, remove bus lock checking in handle_bus_lock() etc.
- In patches 3 and 4, remove the bld_ratelimit < HZ/2 check and define
  bld_ratelimit only for Intel CPUs.
- Merge patch 2 and 3 into one patch for handling warn, fatal, and
  ratelimit.
v4 is here: https://lore.kernel.org/lkml/20201124205245.4164633-2-fenghua.yu@intel.com/

v4:
- Fix a ratelimit wording issue in the doc (Randy).
- Patch 4 is acked by Randy (Randy).

v3:
- Enable Bus Lock Detection when fatal to handle bus lock from non-WB
  (PeterZ).
- Add Acked-by: PeterZ in patch 2.

v2:
- Send SIGBUS in fatal case for bus lock #DB (PeterZ).

v1:
- Check bus lock bit by its positive polarity (Xiaoyao).
- Fix a few wording issues in the documentation (Randy).
[RFC v3 can be found at: https://lore.kernel.org/patchwork/cover/1329943/]

RFC v3:
- Remove DR6_RESERVED change (PeterZ).
- Simplify the documentation (Randy).

RFC v2:
- Architecture changed based on feedback from Thomas and PeterZ. #DB is
  no longer generated for bus lock in ring0.
- Split the one single patch into four patches.
[RFC v1 can be found at: https://lore.kernel.org/lkml/1595021700-68460-1-git-send-email-fenghua.yu@intel.com/]

Fenghua Yu (3):
  x86/cpufeatures: Enumerate #DB for bus lock detection
  x86/bus_lock: Handle #DB for bus lock
  Documentation/admin-guide: Change doc for split_lock_detect parameter

 .../admin-guide/kernel-parameters.txt         |  30 +++-
 arch/x86/include/asm/cpu.h                    |  10 +-
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   1 +
 arch/x86/include/uapi/asm/debugreg.h          |   1 +
 arch/x86/kernel/cpu/common.c                  |   2 +-
 arch/x86/kernel/cpu/intel.c                   | 148 +++++++++++++++---
 arch/x86/kernel/traps.c                       |   7 +
 include/linux/sched/user.h                    |   5 +-
 kernel/user.c                                 |  13 ++
 10 files changed, 187 insertions(+), 31 deletions(-)

-- 
2.30.2



* [PATCH v5 1/3] x86/cpufeatures: Enumerate #DB for bus lock detection
  2021-03-13  5:49 [PATCH v5 0/3] x86/bus_lock: Enable bus lock detection Fenghua Yu
@ 2021-03-13  5:49 ` Fenghua Yu
  2021-03-19 20:35   ` Thomas Gleixner
  2021-03-13  5:49 ` [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock Fenghua Yu
  2021-03-13  5:49 ` [PATCH v5 3/3] Documentation/admin-guide: Change doc for split_lock_detect parameter Fenghua Yu
  2 siblings, 1 reply; 18+ messages in thread
From: Fenghua Yu @ 2021-03-13  5:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Tony Luck, Randy Dunlap, Xiaoyao Li ,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

A bus lock is acquired though either split locked access to
writeback (WB) memory or any locked access to non-WB memory. This is
typically >1000 cycles slower than an atomic operation within a cache
line. It also disrupts performance on other cores.

Some CPUs have ability to notify the kernel by an #DB trap after a user
instruction acquires a bus lock and is executed. This allows the kernel
to enforce user application throttling or mitigations. Both breakpoint
and bus lock can trigger the #DB trap in the same instruction and the
ordering of handling them is the kernel #DB handler's choice.

The CPU feature flag to be shown in /proc/cpuinfo will be "bus_lock_detect".
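
As an illustrative note, kernel code can then gate on the new flag the same
way a later patch does in bus_lock_init(), e.g. (hypothetical snippet):

        if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
                pr_info("bus lock detection available\n");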

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
Change Log:
v5:
- Add "Both breakpoint and bus lock can trigger an #DB trap..." in the
  commit message (Thomas).

 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index cc96e26d69f7..faec3d92d09b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -354,6 +354,7 @@
 #define X86_FEATURE_AVX512_VPOPCNTDQ	(16*32+14) /* POPCNT for vectors of DW/QW */
 #define X86_FEATURE_LA57		(16*32+16) /* 5-level page tables */
 #define X86_FEATURE_RDPID		(16*32+22) /* RDPID instruction */
+#define X86_FEATURE_BUS_LOCK_DETECT	(16*32+24) /* Bus Lock detect */
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B		(16*32+28) /* MOVDIR64B instruction */
-- 
2.30.2



* [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-03-13  5:49 [PATCH v5 0/3] x86/bus_lock: Enable bus lock detection Fenghua Yu
  2021-03-13  5:49 ` [PATCH v5 1/3] x86/cpufeatures: Enumerate #DB for " Fenghua Yu
@ 2021-03-13  5:49 ` Fenghua Yu
  2021-03-19 21:30   ` Thomas Gleixner
  2021-03-13  5:49 ` [PATCH v5 3/3] Documentation/admin-guide: Change doc for split_lock_detect parameter Fenghua Yu
  2 siblings, 1 reply; 18+ messages in thread
From: Fenghua Yu @ 2021-03-13  5:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Tony Luck, Randy Dunlap, Xiaoyao Li ,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Bus locks degrade performance for the whole system, not just for the CPU
that requested the bus lock. Two CPU features "#AC for split lock" and
"#DB for bus lock" provide hooks so that the operating system may choose
one of several mitigation strategies.

#AC for split lock is already implemented. Add code to use the #DB for
bus lock feature to cover additional situations with new options to
mitigate.

split_lock_detect=
		#AC for split lock		#DB for bus lock

off		Do nothing			Do nothing

warn		Kernel OOPs			Warn once per task and
		Warn once per task and		continue to run.
		disable future checking
	 	When both features are
		supported, warn in #AC

fatal		Kernel OOPs			Send SIGBUS to user.
		Send SIGBUS to user
		When both features are
		supported, fatal in #AC

ratelimit:N	Do nothing			Limit bus lock rate to
						N per second in the
						current non-root user.

Default option is "warn".

Hardware only generates #DB for bus lock detect when CPL>0 to avoid
nested #DB from multiple bus locks while the first #DB is being handled.
So there is no need to handle #DB for bus locks detected in the kernel.

#DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR
while #AC for split lock is enabled by split lock detection bit 29 in
TEST_CTRL MSR.

Both breakpoint and bus lock in the same instruction can trigger one #DB.
The bus lock is handled before the breakpoint in the #DB handler.

Delivery of #DB for bus lock in userspace clears DR6[11]. To avoid
confusion in identifying #DB, the #DB handler sets the bit to 1 before
returning to the interrupted task.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
Change Log:
v5:
Address all comments from Thomas:
- Merge patch 2 and patch 3 into one patch so all "split_lock_detect="
  options are processed in one patch.
- Change warn to #AC if both #AC and #DB are supported.
- Remove sld and bld variables and use boot_cpu_has() to check bus lock
  and split lock support.
- Remove bus lock checking in handle_bus_lock().
- Remove bld_ratelimit < HZ/2 check.
- Add rate limit handling comment in bus lock #DB.
- Define bld_ratelimit only for Intel CPUs.

v3:
- Enable Bus Lock Detection when fatal to handle bus lock from non-WB
  (PeterZ).

v2:
- Send SIGBUS in fatal case for bus lock #DB (PeterZ).

v1:
- Check bus lock bit by its positive polarity (Xiaoyao).

RFC v3:
- Remove DR6_RESERVED change (PeterZ).

 arch/x86/include/asm/cpu.h           |  10 +-
 arch/x86/include/asm/msr-index.h     |   1 +
 arch/x86/include/uapi/asm/debugreg.h |   1 +
 arch/x86/kernel/cpu/common.c         |   2 +-
 arch/x86/kernel/cpu/intel.c          | 148 +++++++++++++++++++++++----
 arch/x86/kernel/traps.c              |   7 ++
 include/linux/sched/user.h           |   5 +-
 kernel/user.c                        |  13 +++
 8 files changed, 162 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index da78ccbd493b..991de5f2a09c 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -41,12 +41,14 @@ unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
 #ifdef CONFIG_CPU_SUP_INTEL
-extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void __init sld_setup(struct cpuinfo_x86 *c);
 extern void switch_to_sld(unsigned long tifn);
 extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
 extern bool handle_guest_split_lock(unsigned long ip);
+extern void handle_bus_lock(struct pt_regs *regs);
+extern int bld_ratelimit;
 #else
-static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void __init sld_setup(struct cpuinfo_x86 *c) {}
 static inline void switch_to_sld(unsigned long tifn) {}
 static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
@@ -57,6 +59,10 @@ static inline bool handle_guest_split_lock(unsigned long ip)
 {
 	return false;
 }
+
+static inline void handle_bus_lock(struct pt_regs *regs)
+{
+}
 #endif
 #ifdef CONFIG_IA32_FEAT_CTL
 void init_ia32_feat_ctl(struct cpuinfo_x86 *c);
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 546d6ecf0a35..558485965f21 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -265,6 +265,7 @@
 #define DEBUGCTLMSR_LBR			(1UL <<  0) /* last branch recording */
 #define DEBUGCTLMSR_BTF_SHIFT		1
 #define DEBUGCTLMSR_BTF			(1UL <<  1) /* single-step on branches */
+#define DEBUGCTLMSR_BUS_LOCK_DETECT	(1UL <<  2) /* Bus lock detect */
 #define DEBUGCTLMSR_TR			(1UL <<  6)
 #define DEBUGCTLMSR_BTS			(1UL <<  7)
 #define DEBUGCTLMSR_BTINT		(1UL <<  8)
diff --git a/arch/x86/include/uapi/asm/debugreg.h b/arch/x86/include/uapi/asm/debugreg.h
index d95d080b30e3..0007ba077c0c 100644
--- a/arch/x86/include/uapi/asm/debugreg.h
+++ b/arch/x86/include/uapi/asm/debugreg.h
@@ -24,6 +24,7 @@
 #define DR_TRAP3	(0x8)		/* db3 */
 #define DR_TRAP_BITS	(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)
 
+#define DR_BUS_LOCK	(0x800)		/* bus_lock */
 #define DR_STEP		(0x4000)	/* single-step */
 #define DR_SWITCH	(0x8000)	/* task switch */
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ab640abe26b6..1a4e260b9027 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1330,7 +1330,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
-	cpu_set_core_cap_bits(c);
+	sld_setup(c);
 
 	fpu__init_system(c);
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 0e422a544835..2bd680c0c9ae 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -10,6 +10,9 @@
 #include <linux/thread_info.h>
 #include <linux/init.h>
 #include <linux/uaccess.h>
+#include <linux/cred.h>
+#include <linux/delay.h>
+#include <linux/sched/user.h>
 
 #include <asm/cpufeature.h>
 #include <asm/msr.h>
@@ -41,12 +44,13 @@ enum split_lock_detect_state {
 	sld_off = 0,
 	sld_warn,
 	sld_fatal,
+	sld_ratelimit,
 };
 
 /*
- * Default to sld_off because most systems do not support split lock detection
- * split_lock_setup() will switch this to sld_warn on systems that support
- * split lock detect, unless there is a command line override.
+ * Default to sld_off because most systems do not support split lock detection.
+ * sld_state_setup() will switch this to sld_warn on systems that support
+ * split lock/bus lock detect, unless there is a command line override.
  */
 static enum split_lock_detect_state sld_state __ro_after_init = sld_off;
 static u64 msr_test_ctrl_cache __ro_after_init;
@@ -603,6 +607,7 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 }
 
 static void split_lock_init(void);
+static void bus_lock_init(void);
 
 static void init_intel(struct cpuinfo_x86 *c)
 {
@@ -720,6 +725,7 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_disable();
 
 	split_lock_init();
+	bus_lock_init();
 
 	intel_init_thermal(c);
 }
@@ -995,13 +1001,24 @@ static const struct {
 	{ "off",	sld_off   },
 	{ "warn",	sld_warn  },
 	{ "fatal",	sld_fatal },
+	{ "ratelimit:", sld_ratelimit },
 };
 
 static inline bool match_option(const char *arg, int arglen, const char *opt)
 {
-	int len = strlen(opt);
 
-	return len == arglen && !strncmp(arg, opt, len);
+	int len = strlen(opt), ratelimit;
+
+	if (strncmp(arg, opt, len))
+		return false;
+
+	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0) {
+		bld_ratelimit = ratelimit;
+
+		return true;
+	}
+
+	return len == arglen;
 }
 
 static bool split_lock_verify_msr(bool on)
@@ -1020,16 +1037,15 @@ static bool split_lock_verify_msr(bool on)
 	return ctrl == tmp;
 }
 
-static void __init split_lock_setup(void)
+static void __init sld_state_setup(void)
 {
 	enum split_lock_detect_state state = sld_warn;
 	char arg[20];
 	int i, ret;
 
-	if (!split_lock_verify_msr(false)) {
-		pr_info("MSR access failed: Disabled\n");
+	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+	    !boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
 		return;
-	}
 
 	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
 				  arg, sizeof(arg));
@@ -1041,17 +1057,14 @@ static void __init split_lock_setup(void)
 			}
 		}
 	}
+	sld_state = state;
+}
 
-	switch (state) {
-	case sld_off:
-		pr_info("disabled\n");
+static void __init __split_lock_setup(void)
+{
+	if (!split_lock_verify_msr(false)) {
+		pr_info("MSR access failed: Disabled\n");
 		return;
-	case sld_warn:
-		pr_info("warning about user-space split_locks\n");
-		break;
-	case sld_fatal:
-		pr_info("sending SIGBUS on user-space split_locks\n");
-		break;
 	}
 
 	rdmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache);
@@ -1061,7 +1074,9 @@ static void __init split_lock_setup(void)
 		return;
 	}
 
-	sld_state = state;
+	/* Restore the MSR to its cached value. */
+	wrmsrl(MSR_TEST_CTRL, msr_test_ctrl_cache);
+
 	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
 }
 
@@ -1082,6 +1097,15 @@ static void sld_update_msr(bool on)
 
 static void split_lock_init(void)
 {
+	/*
+	 * #DB for bus lock handles ratelimit and #AC for split lock is
+	 * disabled.
+	 */
+	if (sld_state == sld_ratelimit) {
+		split_lock_verify_msr(false);
+		return;
+	}
+
 	if (cpu_model_supports_sld)
 		split_lock_verify_msr(sld_state != sld_off);
 }
@@ -1118,6 +1142,29 @@ bool handle_guest_split_lock(unsigned long ip)
 }
 EXPORT_SYMBOL_GPL(handle_guest_split_lock);
 
+static void bus_lock_init(void)
+{
+	u64 val;
+
+	/*
+	 * Warn and fatal are handled by #AC for split lock if #AC for
+	 * split lock is supported.
+	 */
+	if (!boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) ||
+	    (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
+	    (sld_state == sld_warn || sld_state == sld_fatal)) ||
+	    sld_state == sld_off)
+		return;
+
+	/*
+	 * Enable #DB for bus lock. All bus locks are handled in #DB except
+	 * split locks are handled in #AC in fatal case.
+	 */
+	rdmsrl(MSR_IA32_DEBUGCTLMSR, val);
+	val |= DEBUGCTLMSR_BUS_LOCK_DETECT;
+	wrmsrl(MSR_IA32_DEBUGCTLMSR, val);
+}
+
 bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 {
 	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
@@ -1126,6 +1173,26 @@ bool handle_user_split_lock(struct pt_regs *regs, long error_code)
 	return true;
 }
 
+void handle_bus_lock(struct pt_regs *regs)
+{
+	switch (sld_state) {
+	case sld_off:
+		break;
+	case sld_warn:
+		pr_warn_ratelimited("#DB: %s/%d took a bus_lock trap at address: 0x%lx\n",
+				    current->comm, current->pid, regs->ip);
+		break;
+	case sld_fatal:
+		force_sig_fault(SIGBUS, BUS_ADRALN, NULL);
+		break;
+	case sld_ratelimit:
+		/* Enforce no more than bld_ratelimit bus locks/sec. */
+		while (!__ratelimit(&get_current_user()->bld_ratelimit))
+			msleep(1000 / bld_ratelimit);
+		break;
+	}
+}
+
 /*
  * This function is called only when switching between tasks with
  * different split-lock detection modes. It sets the MSR for the
@@ -1166,7 +1233,7 @@ static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
 	{}
 };
 
-void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+static void __init split_lock_setup(struct cpuinfo_x86 *c)
 {
 	const struct x86_cpu_id *m;
 	u64 ia32_core_caps;
@@ -1193,5 +1260,44 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
 	}
 
 	cpu_model_supports_sld = true;
-	split_lock_setup();
+	__split_lock_setup();
+}
+
+static void sld_state_show(void)
+{
+	if (!boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT) &&
+	    !boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+		return;
+
+	switch (sld_state) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+	case sld_warn:
+		if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+			pr_info("#AC: crashing the kernel on kernel split_locks and warning on user-space split_locks\n");
+		else if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
+			pr_info("#DB: warning on user-space bus_locks\n");
+		break;
+	case sld_fatal:
+		if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+			pr_info("#AC: crashing the kernel on kernel split_locks and sending SIGBUS on user-space split_locks\n");
+		} else if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT)) {
+			pr_info("#DB: sending SIGBUS on user-space bus_locks%s\n",
+				boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) ?
+				" from non-WB" : "");
+		}
+		break;
+	case sld_ratelimit:
+		if (boot_cpu_has(X86_FEATURE_BUS_LOCK_DETECT))
+			pr_info("#DB: setting rate limit to %d/sec per user on non-root user-space bus_locks\n", bld_ratelimit);
+		break;
+	}
+}
+
+void __init sld_setup(struct cpuinfo_x86 *c)
+{
+	split_lock_setup(c);
+	sld_state_setup();
+	sld_state_show();
 }
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 7f5aec758f0e..21f885a1609d 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -979,6 +979,13 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
 		goto out_irq;
 	}
 
+	/*
+	 * Handle bus lock. #DB for bus lock can only be triggered from
+	 * userspace. The bus lock bit was flipped to positive polarity.
+	 */
+	if (dr6 & DR_BUS_LOCK)
+		handle_bus_lock(regs);
+
 	/* Add the virtual_dr6 bits for signals. */
 	dr6 |= current->thread.virtual_dr6;
 	if (dr6 & (DR_STEP | DR_TRAP_BITS) || icebp)
diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h
index a8ec3b6093fc..af8ec609910e 100644
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -40,8 +40,11 @@ struct user_struct {
 	atomic_t nr_watches;	/* The number of watches this user currently has */
 #endif
 
-	/* Miscellaneous per-user rate limit */
+	/* Miscellaneous per-user rate limits */
 	struct ratelimit_state ratelimit;
+#ifdef CONFIG_CPU_SUP_INTEL
+	struct ratelimit_state bld_ratelimit;
+#endif
 };
 
 extern int uids_sysfs_init(void);
diff --git a/kernel/user.c b/kernel/user.c
index a2478cddf536..654aefb7ce8f 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -103,6 +103,9 @@ struct user_struct root_user = {
 	.locked_shm     = 0,
 	.uid		= GLOBAL_ROOT_UID,
 	.ratelimit	= RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
+#ifdef CONFIG_CPU_SUP_INTEL
+	.bld_ratelimit	= RATELIMIT_STATE_INIT(root_user.bld_ratelimit, 0, 0),
+#endif
 };
 
 /*
@@ -172,6 +175,11 @@ void free_uid(struct user_struct *up)
 		free_user(up, flags);
 }
 
+#ifdef CONFIG_CPU_SUP_INTEL
+/* Some Intel CPUs may set this for rate-limited bus locks. */
+int bld_ratelimit;
+#endif
+
 struct user_struct *alloc_uid(kuid_t uid)
 {
 	struct hlist_head *hashent = uidhashentry(uid);
@@ -190,6 +198,11 @@ struct user_struct *alloc_uid(kuid_t uid)
 		refcount_set(&new->__count, 1);
 		ratelimit_state_init(&new->ratelimit, HZ, 100);
 		ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
+#ifdef CONFIG_CPU_SUP_INTEL
+		ratelimit_state_init(&new->bld_ratelimit, HZ, bld_ratelimit);
+		ratelimit_set_flags(&new->bld_ratelimit,
+				    RATELIMIT_MSG_ON_RELEASE);
+#endif
 
 		/*
 		 * Before adding this, check whether we raced
-- 
2.30.2



* [PATCH v5 3/3] Documentation/admin-guide: Change doc for split_lock_detect parameter
  2021-03-13  5:49 [PATCH v5 0/3] x86/bus_lock: Enable bus lock detection Fenghua Yu
  2021-03-13  5:49 ` [PATCH v5 1/3] x86/cpufeatures: Enumerate #DB for " Fenghua Yu
  2021-03-13  5:49 ` [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock Fenghua Yu
@ 2021-03-13  5:49 ` Fenghua Yu
  2021-03-19 21:35   ` Thomas Gleixner
  2 siblings, 1 reply; 18+ messages in thread
From: Fenghua Yu @ 2021-03-13  5:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Tony Luck, Randy Dunlap, Xiaoyao Li ,
	Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Since #DB for bus lock detect changes the split_lock_detect parameter,
update the documentation for the changes.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
---
Change Log:
v5:
- Remove N < HZ/2 check info in the doc (Thomas).

v4:
- Fix a ratelimit wording issue in the doc (Randy).
- Patch 4 is acked by Randy (Randy).

v3:
- Enable Bus Lock Detection when fatal to handle bus lock from non-WB
  (PeterZ).

v1:
- Fix a few wording issues (Randy).

RFC v2:
- Simplify the documentation (Randy).

 .../admin-guide/kernel-parameters.txt         | 30 +++++++++++++++----
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..16b2e1c45d04 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5100,27 +5100,45 @@
 	spia_peddr=
 
 	split_lock_detect=
-			[X86] Enable split lock detection
+			[X86] Enable split lock detection or bus lock detection
 
 			When enabled (and if hardware support is present), atomic
 			instructions that access data across cache line
-			boundaries will result in an alignment check exception.
+			boundaries will result in an alignment check exception
+			for split lock detection or a debug exception for
+			bus lock detection.
 
 			off	- not enabled
 
-			warn	- the kernel will emit rate limited warnings
+			warn	- the kernel will emit rate-limited warnings
 				  about applications triggering the #AC
-				  exception. This mode is the default on CPUs
-				  that supports split lock detection.
+				  exception or the #DB exception. This mode is
+				  the default on CPUs that support split lock
+				  detection or bus lock detection. Default
+				  behavior is by #DB if both features are
+				  enabled in hardware.
 
 			fatal	- the kernel will send SIGBUS to applications
-				  that trigger the #AC exception.
+				  that trigger the #AC exception or the #DB
+				  exception. If both features are enabled in
+				  hardware, split lock triggers #AC and bus
+				  lock from non-WB triggers #DB.
+
+			ratelimit:N -
+				  Set rate limit to N bus locks per second
+				  for bus lock detection. 0 < N.
+				  Only applied to non-root users.
+
+				  N/A for split lock detection.
 
 			If an #AC exception is hit in the kernel or in
 			firmware (i.e. not while executing in user mode)
 			the kernel will oops in either "warn" or "fatal"
 			mode.
 
+			#DB exception for bus lock is triggered only when
+			CPL > 0.
+
 	srbds=		[X86,INTEL]
 			Control the Special Register Buffer Data Sampling
 			(SRBDS) mitigation.
-- 
2.30.2



* Re: [PATCH v5 1/3] x86/cpufeatures: Enumerate #DB for bus lock detection
  2021-03-13  5:49 ` [PATCH v5 1/3] x86/cpufeatures: Enumerate #DB for " Fenghua Yu
@ 2021-03-19 20:35   ` Thomas Gleixner
  2021-03-19 21:00     ` Fenghua Yu
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2021-03-19 20:35 UTC (permalink / raw)
  To: Fenghua Yu, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Tony Luck, Randy Dunlap, Xiaoyao Li, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

On Sat, Mar 13 2021 at 05:49, Fenghua Yu wrote:
> A bus lock is acquired though either split locked access to

s/though/through/
either a 

> writeback (WB) memory or any locked access to non-WB memory. This is
> typically >1000 cycles slower than an atomic operation within a cache
> line. It also disrupts performance on other cores.
>
> Some CPUs have ability to notify the kernel by an #DB trap after a user

the ability

> instruction acquires a bus lock and is executed. This allows the kernel
> to enforce user application throttling or mitigations. Both breakpoint
> and bus lock can trigger the #DB trap in the same instruction and the
> ordering of handling them is the kernel #DB handler's choice.

Thanks,

        tglx


* Re: [PATCH v5 1/3] x86/cpufeatures: Enumerate #DB for bus lock detection
  2021-03-19 20:35   ` Thomas Gleixner
@ 2021-03-19 21:00     ` Fenghua Yu
  0 siblings, 0 replies; 18+ messages in thread
From: Fenghua Yu @ 2021-03-19 21:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Peter Zijlstra, Tony Luck,
	Randy Dunlap, Xiaoyao Li, Ravi V Shankar, linux-kernel, x86

On Fri, Mar 19, 2021 at 09:35:39PM +0100, Thomas Gleixner wrote:
> On Sat, Mar 13 2021 at 05:49, Fenghua Yu wrote:
> > A bus lock is acquired though either split locked access to
> 
> s/though/through/
> either a 
> > Some CPUs have ability to notify the kernel by an #DB trap after a user
> 
> the ability

Thank you for your review, Thomas! Will fix the issues.

-Fenghua


* Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-03-13  5:49 ` [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock Fenghua Yu
@ 2021-03-19 21:30   ` Thomas Gleixner
  2021-03-19 21:50     ` Luck, Tony
  2021-03-19 22:19     ` Fenghua Yu
  0 siblings, 2 replies; 18+ messages in thread
From: Thomas Gleixner @ 2021-03-19 21:30 UTC (permalink / raw)
  To: Fenghua Yu, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Tony Luck, Randy Dunlap, Xiaoyao Li, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

On Sat, Mar 13 2021 at 05:49, Fenghua Yu wrote:
> Change Log:
> v5:
> Address all comments from Thomas:
> - Merge patch 2 and patch 3 into one patch so all "split_lock_detect="
>   options are processed in one patch.

What? I certainly did not request that. I said:

 "Why is this seperate and an all in one thing? patch 2/4 changes the
  parameter first and 3/4 adds a new option...."

which means we want documentation for patch 2 and documentation for
patch 3? 

The ratelimit thing is clearly an extra functionality on top of that
buslock muck.

Next time I write it out..

> +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0) {
> +		bld_ratelimit = ratelimit;

So any rate up to INTMAX/s is valid here, right?

> +	case sld_ratelimit:
> +		/* Enforce no more than bld_ratelimit bus locks/sec. */
> +		while (!__ratelimit(&get_current_user()->bld_ratelimit))
> +			msleep(1000 / bld_ratelimit);

which is cute because msleep() will always sleep until the next jiffie
increment happens.

What's not so cute here is the fact that get_current_user() takes a
reference on current's UID on every invocation, but nothing ever calls
free_uid(). I missed that last time over the way more obvious HZ division.

> +++ b/kernel/user.c
> @@ -103,6 +103,9 @@ struct user_struct root_user = {
>  	.locked_shm     = 0,
>  	.uid		= GLOBAL_ROOT_UID,
>  	.ratelimit	= RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
> +#ifdef CONFIG_CPU_SUP_INTEL
> +	.bld_ratelimit	= RATELIMIT_STATE_INIT(root_user.bld_ratelimit, 0, 0),
> +#endif
>  };
>  
>  /*
> @@ -172,6 +175,11 @@ void free_uid(struct user_struct *up)
>  		free_user(up, flags);
>  }
>  
> +#ifdef CONFIG_CPU_SUP_INTEL
> +/* Some Intel CPUs may set this for rate-limited bus locks. */
> +int bld_ratelimit;
> +#endif

Of course this variable is still required to be in the core kernel code
because?

While you decided to munge this all together, you obviously ignored the
following review comment:

  "It also lacks the information that the ratelimiting is per UID
   and not per task and why this was chosen to be per UID..."

There is still no reasoning neither in the changelog nor in the cover
letter nor in a reply to my review.

So let me repeat my question and make it more explicit:

  What is the justification for making this rate limit per UID and not
  per task, per process or systemwide?

>  struct user_struct *alloc_uid(kuid_t uid)
>  {
>  	struct hlist_head *hashent = uidhashentry(uid);
> @@ -190,6 +198,11 @@ struct user_struct *alloc_uid(kuid_t uid)
>  		refcount_set(&new->__count, 1);
>  		ratelimit_state_init(&new->ratelimit, HZ, 100);
>  		ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
> +#ifdef CONFIG_CPU_SUP_INTEL
> +		ratelimit_state_init(&new->bld_ratelimit, HZ, bld_ratelimit);
> +		ratelimit_set_flags(&new->bld_ratelimit,
> +				    RATELIMIT_MSG_ON_RELEASE);
> +#endif

If this has a proper justification for being per user and having to add
40 bytes per UID for something which is mostly unused then there are
definitely better ways to do that than slapping #ifdefs into
architecture agnostic core code.

So if you instead of munging the code patches had split the
documentation, then I could apply the first 3 patches and we would only
have to sort out the ratelimiting muck.

Thanks,

        tglx


* Re: [PATCH v5 3/3] Documentation/admin-guide: Change doc for split_lock_detect parameter
  2021-03-13  5:49 ` [PATCH v5 3/3] Documentation/admin-guide: Change doc for split_lock_detect parameter Fenghua Yu
@ 2021-03-19 21:35   ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2021-03-19 21:35 UTC (permalink / raw)
  To: Fenghua Yu, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Tony Luck, Randy Dunlap, Xiaoyao Li, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

On Sat, Mar 13 2021 at 05:49, Fenghua Yu wrote:
> +			ratelimit:N -
> +				  Set rate limit to N bus locks per second
> +				  for bus lock detection. 0 < N.
> +				  Only applied to non-root users.

Of course this does not mention that the limit is per UID and it neither
tells what the action is to rate limit the bus lock attempts. The rate
limit saturation which is induced by msleep() is not mentioned either.

Really useful for administrators.

Thanks,

        tglx


* RE: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-03-19 21:30   ` Thomas Gleixner
@ 2021-03-19 21:50     ` Luck, Tony
  2021-03-20  1:01       ` Thomas Gleixner
  2021-03-19 22:19     ` Fenghua Yu
  1 sibling, 1 reply; 18+ messages in thread
From: Luck, Tony @ 2021-03-19 21:50 UTC (permalink / raw)
  To: Thomas Gleixner, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Randy Dunlap, Li, Xiaoyao, Shankar, Ravi V
  Cc: linux-kernel, x86, Yu, Fenghua

>  What is the justification for making this rate limit per UID and not
>  per task, per process or systemwide?

The concern is that a malicious user is running a workload that loops
obtaining the buslock. This brings the whole system to its knees.

Limiting per task doesn't help. The user can just fork(2) a whole bunch
of tasks for a distributed buslock attack..

Systemwide might be an interesting alternative. Downside would be accidental
rate limit of non-malicious tasks that happen to grab a bus lock periodically
but in the same window with other buslocks from other users.

Do you think that's a risk worth taking to make the code simpler?

-Tony


* Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-03-19 21:30   ` Thomas Gleixner
  2021-03-19 21:50     ` Luck, Tony
@ 2021-03-19 22:19     ` Fenghua Yu
  2021-03-20 12:42       ` Thomas Gleixner
  1 sibling, 1 reply; 18+ messages in thread
From: Fenghua Yu @ 2021-03-19 22:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Peter Zijlstra, Tony Luck,
	Randy Dunlap, Xiaoyao Li, Ravi V Shankar, linux-kernel, x86

Hi, Thomas,

On Fri, Mar 19, 2021 at 10:30:50PM +0100, Thomas Gleixner wrote:
> On Sat, Mar 13 2021 at 05:49, Fenghua Yu wrote:
> > Change Log:
> > v5:
> > Address all comments from Thomas:
> > - Merge patch 2 and patch 3 into one patch so all "split_lock_detect="
> >   options are processed in one patch.
> 
> What? I certainly did not request that. I said:
> 
>  "Why is this seperate and an all in one thing? patch 2/4 changes the
>   parameter first and 3/4 adds a new option...."
> 
> which means we want documentation for patch 2 and documentation for
> patch 3? 
> 
> The ratelimit thing is clearly an extra functionality on top of that
> buslock muck.
> 
> Next time I write it out..

Sorry for misunderstanding your comments. I will split the document patch
into two: one for patch 2 (warn and fatal) and one for patch 3 (ratelimit).

> 
> > +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0) {
> > +		bld_ratelimit = ratelimit;
> 
> So any rate up to INTMAX/s is valid here, right?

Yes. I don't see a smaller limitation than INTMAX/s. Is that right?

> 
> > +	case sld_ratelimit:
> > +		/* Enforce no more than bld_ratelimit bus locks/sec. */
> > +		while (!__ratelimit(&get_current_user()->bld_ratelimit))
> > +			msleep(1000 / bld_ratelimit);
> 
> which is cute because msleep() will always sleep until the next jiffie
> increment happens.
> 
> What's not so cute here is the fact that get_current_user() takes a
> reference on current's UID on every invocation, but nothing ever calls
> free_uid(). I missed that last time over the way more obvious HZ division.

I will call free_uid().
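
For instance, a rough sketch of that fix (not the final patch) would pair
the reference with free_uid() in the ratelimit case, with "user" being a
local struct user_struct * in handle_bus_lock():

        case sld_ratelimit:
                /* Enforce no more than bld_ratelimit bus locks/sec. */
                user = get_current_user();
                while (!__ratelimit(&user->bld_ratelimit))
                        msleep(1000 / bld_ratelimit);
                free_uid(user);
                break;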

> 
> > +++ b/kernel/user.c
> > @@ -103,6 +103,9 @@ struct user_struct root_user = {
> >  	.locked_shm     = 0,
> >  	.uid		= GLOBAL_ROOT_UID,
> >  	.ratelimit	= RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
> > +#ifdef CONFIG_CPU_SUP_INTEL
> > +	.bld_ratelimit	= RATELIMIT_STATE_INIT(root_user.bld_ratelimit, 0, 0),
> > +#endif
> >  };
> >  
> >  /*
> > @@ -172,6 +175,11 @@ void free_uid(struct user_struct *up)
> >  		free_user(up, flags);
> >  }
> >  
> > +#ifdef CONFIG_CPU_SUP_INTEL
> > +/* Some Intel CPUs may set this for rate-limited bus locks. */
> > +int bld_ratelimit;
> > +#endif
> 
> Of course this variable is still required to be in the core kernel code
> because?
> 
> While you decided to munge this all together, you obviously ignored the
> following review comment:
> 
>   "It also lacks the information that the ratelimiting is per UID
>    and not per task and why this was chosen to be per UID..."
> 
> There is still no reasoning neither in the changelog nor in the cover
> letter nor in a reply to my review.
> 
> So let me repeat my question and make it more explicit:
> 
>   What is the justification for making this rate limit per UID and not
>   per task, per process or systemwide?

Tony just answered the justification question. If that's OK, I will add the
answer to the commit message.

> 
> >  struct user_struct *alloc_uid(kuid_t uid)
> >  {
> >  	struct hlist_head *hashent = uidhashentry(uid);
> > @@ -190,6 +198,11 @@ struct user_struct *alloc_uid(kuid_t uid)
> >  		refcount_set(&new->__count, 1);
> >  		ratelimit_state_init(&new->ratelimit, HZ, 100);
> >  		ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
> > +#ifdef CONFIG_CPU_SUP_INTEL
> > +		ratelimit_state_init(&new->bld_ratelimit, HZ, bld_ratelimit);
> > +		ratelimit_set_flags(&new->bld_ratelimit,
> > +				    RATELIMIT_MSG_ON_RELEASE);
> > +#endif
> 
> If this has a proper justification for being per user and having to add
> 40 bytes per UID for something which is mostly unused then there are
> definitely better ways to do that than slapping #ifdefs into
> architecture agnostic core code.
> 
> So if you instead of munging the code patches had split the
> documentation, then I could apply the first 3 patches and we would only
> have to sort out the ratelimiting muck.

If I split this whole patch set into two patch sets:
1. Three patches in the first patch set: the enumeration patch, the warn
   and fatal patch, and the documentation patch.
2. Two patches in the second patch set: the ratelimit patch and the
   documentation patch.

Then I will send the two patch sets separately, you will accept them one
by one. Is that OK?

Or should I still send the 5 patches in one patch set so you will pick up
the first 3 patches and then the next 2 patches separately?

Thanks.

-Fenghua



* RE: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-03-19 21:50     ` Luck, Tony
@ 2021-03-20  1:01       ` Thomas Gleixner
  2021-03-20 13:57         ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2021-03-20  1:01 UTC (permalink / raw)
  To: Luck, Tony, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Randy Dunlap, Li, Xiaoyao, Shankar, Ravi V
  Cc: linux-kernel, x86, Yu, Fenghua

On Fri, Mar 19 2021 at 21:50, Tony Luck wrote:
>>  What is the justification for making this rate limit per UID and not
>>  per task, per process or systemwide?
>
> The concern is that a malicious user is running a workload that loops
> obtaining the buslock. This brings the whole system to its knees.
>
> Limiting per task doesn't help. The user can just fork(2) a whole bunch
> of tasks for a distributed buslock attack..

Fair enough.

> Systemwide might be an interesting alternative. Downside would be accidental
> rate limit of non-malicious tasks that happen to grab a bus lock periodically
> but in the same window with other buslocks from other users.
>
> Do you think that's a risk worth taking to make the code simpler?

I'd consider it low risk, but I just looked for the usage of the
existing ratelimit in struct user and the related commit. Now it dawns
on me where you are coming from.

So that seems to become a pattern ... so the uncompiled thing below
might solve that.

Yes, it makes the efivars thingy slower, but do we care?

We neither care about efivars performance nor about the buslock
performance.

But I pretty much care about not having to stare at code which gets the
fundamental refcounting wrong.

Thanks,

        tglx
---
 fs/efivarfs/Kconfig           |    1 +
 fs/efivarfs/file.c            |   11 +++++++++--
 include/linux/ratelimit.h     |    1 +
 include/linux/ratelimit_uid.h |   26 ++++++++++++++++++++++++++
 include/linux/sched/user.h    |    4 ++--
 kernel/user.c                 |    7 ++++---
 lib/Kconfig                   |    3 +++
 lib/ratelimit.c               |   41 +++++++++++++++++++++++++++++++++++++++++
 8 files changed, 87 insertions(+), 7 deletions(-)

--- a/fs/efivarfs/Kconfig
+++ b/fs/efivarfs/Kconfig
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config EFIVAR_FS
 	tristate "EFI Variable filesystem"
+	select UID_RATELIMIT
 	depends on EFI
 	default m
 	help
--- a/fs/efivarfs/file.c
+++ b/fs/efivarfs/file.c
@@ -63,6 +63,14 @@ static ssize_t efivarfs_file_write(struc
 	return bytes;
 }
 
+static const struct uid_ratelimit_cfg efivars_rl = {
+	.which		= UID_RATELIMIT_EFIVARS,
+	.interval	= HZ,
+	.burst		= 100,
+	.flags		= RATELIMIT_MSG_ON_RELEASE,
+	.delay		= 50,
+};
+
 static ssize_t efivarfs_file_read(struct file *file, char __user *userbuf,
 		size_t count, loff_t *ppos)
 {
@@ -73,8 +81,7 @@ static ssize_t efivarfs_file_read(struct
 	ssize_t size = 0;
 	int err;
 
-	while (!__ratelimit(&file->f_cred->user->ratelimit))
-		msleep(50);
+	uid_ratelimit(&efivars_rl);
 
 	err = efivar_entry_size(var, &datasize);
 
--- a/include/linux/ratelimit.h
+++ b/include/linux/ratelimit.h
@@ -3,6 +3,7 @@
 #define _LINUX_RATELIMIT_H
 
 #include <linux/ratelimit_types.h>
+#include <linux/ratelimit_uid.h>
 #include <linux/sched.h>
 #include <linux/spinlock.h>
 
--- /dev/null
+++ b/include/linux/ratelimit_uid.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_RATELIMIT_UID_H
+#define _LINUX_RATELIMIT_UID_H
+
+/* Per UID ratelimits */
+enum uid_ratelimits {
+#ifdef CONFIG_EFIVAR_FS
+	UID_RATELIMIT_EFIVARS,
+#endif
+	UID_RATELIMIT_MAX,
+};
+
+#define UID_RATELIMIT_NODELAY	ULONG_MAX
+
+struct uid_ratelimit_cfg {
+	enum uid_ratelimits	which;
+	int			interval;
+	int			burst;
+	unsigned long		flags;
+	unsigned long		delay;
+};
+
+extern int __uid_ratelimit(const struct uid_ratelimit_cfg *cfg, const void *func);
+#define uid_ratelimit(cfg) __uid_ratelimit(cfg, __func__)
+
+#endif
--- a/include/linux/sched/user.h
+++ b/include/linux/sched/user.h
@@ -40,8 +40,8 @@ struct user_struct {
 	atomic_t nr_watches;	/* The number of watches this user currently has */
 #endif
 
-	/* Miscellaneous per-user rate limit */
-	struct ratelimit_state ratelimit;
+	/* Miscellaneous per-user rate limits storage */
+	struct ratelimit_state	*ratelimits[UID_RATELIMIT_MAX];
 };
 
 extern int uids_sysfs_init(void);
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -102,7 +102,6 @@ struct user_struct root_user = {
 	.sigpending	= ATOMIC_INIT(0),
 	.locked_shm     = 0,
 	.uid		= GLOBAL_ROOT_UID,
-	.ratelimit	= RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0),
 };
 
 /*
@@ -139,8 +138,12 @@ static struct user_struct *uid_hash_find
 static void free_user(struct user_struct *up, unsigned long flags)
 	__releases(&uidhash_lock)
 {
+	unsigned int i;
+
 	uid_hash_remove(up);
 	spin_unlock_irqrestore(&uidhash_lock, flags);
+	for (i = 0; i < UID_RATELIMIT_MAX; i++)
+		kfree(up->ratelimits[i]);
 	kmem_cache_free(uid_cachep, up);
 }
 
@@ -188,8 +191,6 @@ struct user_struct *alloc_uid(kuid_t uid
 
 		new->uid = uid;
 		refcount_set(&new->__count, 1);
-		ratelimit_state_init(&new->ratelimit, HZ, 100);
-		ratelimit_set_flags(&new->ratelimit, RATELIMIT_MSG_ON_RELEASE);
 
 		/*
 		 * Before adding this, check whether we raced
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -701,3 +701,6 @@ config GENERIC_LIB_DEVMEM_IS_ALLOWED
 config PLDMFW
 	bool
 	default n
+
+config UID_RATELIMIT
+	bool
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -11,6 +11,9 @@
 #include <linux/ratelimit.h>
 #include <linux/jiffies.h>
 #include <linux/export.h>
+#include <linux/delay.h>
+#include <linux/slab.h>
+#include <linux/cred.h>
 
 /*
  * __ratelimit - rate limiting
@@ -68,3 +71,41 @@ int ___ratelimit(struct ratelimit_state
 	return ret;
 }
 EXPORT_SYMBOL(___ratelimit);
+
+#ifdef CONFIG_UID_RATELIMIT
+int __uid_ratelimit(const struct uid_ratelimit_cfg *cfg, const void *func)
+{
+	struct user_struct *up = get_current_user();
+	unsigned int which = cfg->which;
+	struct ratelimit_state *rs;
+	int ret = 0;
+
+	if (WARN_ON_ONCE(which >= UID_RATELIMIT_MAX))
+		goto out;
+
+	rs = up->ratelimits[which];
+	if (!rs) {
+		/* ratelimit_state_init() does a memset(,0,) */
+		rs = kmalloc(sizeof(*rs), GFP_KERNEL);
+		if (!rs)
+			goto out;
+		ratelimit_state_init(rs, cfg->interval, cfg->burst);
+		ratelimit_set_flags(rs, cfg->flags);
+		if (cmpxchg(&up->ratelimits[which], NULL, rs) != NULL) {
+			kfree(rs);
+			rs = up->ratelimits[which];
+		}
+	}
+
+	while (1) {
+		ret = ___ratelimit(rs, func);
+		if (ret || cfg->delay == UID_RATELIMIT_NODELAY)
+			break;
+		msleep(cfg->delay);
+	}
+out:
+	free_uid(up);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__uid_ratelimit);
+#endif



        


* Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-03-19 22:19     ` Fenghua Yu
@ 2021-03-20 12:42       ` Thomas Gleixner
  2021-04-03  1:04         ` Fenghua Yu
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2021-03-20 12:42 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, Peter Zijlstra, Tony Luck,
	Randy Dunlap, Xiaoyao Li, Ravi V Shankar, linux-kernel, x86

On Fri, Mar 19 2021 at 22:19, Fenghua Yu wrote:
> On Fri, Mar 19, 2021 at 10:30:50PM +0100, Thomas Gleixner wrote:
>> > +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0) {
>> > +		bld_ratelimit = ratelimit;
>> 
>> So any rate up to INTMAX/s is valid here, right?
>
> Yes. I don't see a smaller limitation than INTMAX/s. Is that right?

That's a given, but what's the point of limits in that range?

A buslock access locks up the system for X cycles. So the total amount
of allowable damage in cycles per second is:

   limit * stall_cycles_per_bus_lock

ergo the time (in seconds) which the system is locked up is:

   limit * stall_cycles_per_bus_lock / cpufreq

Which means for ~INTMAX/2 on a 2 GHz CPU:

   2 * 10^9 * $CYCLES  / 2 * 10^9  = $CYCLES seconds

Assuming the inflicted damage is only 1 cycle, #LOCK is pretty much
permanently on if there are enough threads. Sure #DB will slow them
down, but it still does not make any sense at all especially as the
damage is certainly greater than a single cycle.
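
For a concrete illustration with assumed (not measured) numbers: a limit of
100000/s with ~1000 stall cycles per bus lock on a 2 GHz CPU still allows

   100000 * 1000 / (2 * 10^9) = 5%

of every second to be spent with the bus locked, from the rate limit alone.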

And because the changelogs and the docs are void of numbers I just got
real numbers myself.

With a single thread doing a 'lock inc *mem' across a cache line
boundary the workload which I measured with perf stat goes from:

     5,940,985,091      instructions              #    0.88  insn per cycle         
       2.780950806 seconds time elapsed
       0.998480000 seconds user
       4.202137000 seconds sys
to

     7,467,979,504      instructions              #    0.10  insn per cycle         
       5.110795917 seconds time elapsed
       7.123499000 seconds user
      37.266852000 seconds sys

The buslock injection rate is ~250k per second.

Even if I ratelimit the locked inc by a delay loop of ~5000 cycles
which is probably more than what the #DB will cost then this single task
still impacts the workload significantly:

     6,496,994,537      instructions              #    0.39  insn per cycle         
       3.043275473 seconds time elapsed
       1.899852000 seconds user
       8.957088000 seconds sys

The buslock injection rate is down to ~150k per second in this case.

And even with throttling the injection rate further down to 25k per
second the impact on the workload is still significant in the 10% range.

And of course the documentation of the ratelimit parameter explains all
of this in great detail so the administrator has a trivial job to tune
that, right?

>> > +	case sld_ratelimit:
>> > +		/* Enforce no more than bld_ratelimit bus locks/sec. */
>> > +		while (!__ratelimit(&get_current_user()->bld_ratelimit))
>> > +			msleep(1000 / bld_ratelimit);

For any ratelimit > 1000 this will loop up to 1000 times with
CONFIG_HZ=1000.

Assume that the buslock producer has tons of threads which all end up
here pretty soon then you launch a mass wakeup in the worst case every
jiffy. Are you sure that the cure is better than the disease?

> If I split this whole patch set into two patch sets:
> 1. Three patches in the first patch set: the enumeration patch, the warn
>    and fatal patch, and the documentation patch.
> 2. Two patches in the second patch set: the ratelimit patch and the
>    documentation patch.
>
> Then I will send the two patch sets separately, you will accept them one
> by one. Is that OK?

That's obviously the right thing to do because #1 should be ready and we
can sort out #2 separately. See the conversation with Tony.

Thanks,

        tglx


* RE: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-03-20  1:01       ` Thomas Gleixner
@ 2021-03-20 13:57         ` Thomas Gleixner
  2021-04-03  0:50           ` Fenghua Yu
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2021-03-20 13:57 UTC (permalink / raw)
  To: Luck, Tony, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, Randy Dunlap, Li, Xiaoyao, Shankar, Ravi V
  Cc: linux-kernel, x86, Yu, Fenghua

On Sat, Mar 20 2021 at 02:01, Thomas Gleixner wrote:

> On Fri, Mar 19 2021 at 21:50, Tony Luck wrote:
>>>  What is the justification for making this rate limit per UID and not
>>>  per task, per process or systemwide?
>>
>> The concern is that a malicious user is running a workload that loops
>> obtaining the buslock. This brings the whole system to its knees.
>>
>> Limiting per task doesn't help. The user can just fork(2) a whole bunch
>> of tasks for a distributed buslock attack..
>
> Fair enough.
>
>> Systemwide might be an interesting alternative. Downside would be accidental
>> rate limit of non-malicious tasks that happen to grab a bus lock periodically
>> but in the same window with other buslocks from other users.
>>
>> Do you think that's a risk worth taking to make the code simpler?
>
> I'd consider it low risk, but I just looked for the usage of the
> existing ratelimit in struct user and the related commit. Now it dawns
> on me where you are coming from.

So after getting real numbers myself, I have more thoughts on
this. Setting a reasonable per user limit might be hard when you want to
protect e.g. against an orchestrated effort by several users
(containers...). If each of them stays under the limit which is easy
enough to figure out then you still end up with significant accumulated
damage.

So systemwide might not be the worst option after all.

The question is how widespread are bus locks in existing applications?
I haven't found any on a dozen machines with random variants of
workloads so far according to perf ... -e sq_misc.split_lock.

What's the actual scenario in the real world where a buslock access
might be legitimate?

And what's the advice/recommendation for a system administrator on how to
analyze the situation and what kind of parameter to set?

I tried to get answers from Documentation/x86/buslock.rst, but ....

Thanks,

        tglx




* Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-03-20 13:57         ` Thomas Gleixner
@ 2021-04-03  0:50           ` Fenghua Yu
  0 siblings, 0 replies; 18+ messages in thread
From: Fenghua Yu @ 2021-04-03  0:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Luck, Tony, Ingo Molnar, Borislav Petkov, Peter Zijlstra,
	Randy Dunlap, Li, Xiaoyao, Shankar, Ravi V, linux-kernel, x86

Hi, Thomas,

On Sat, Mar 20, 2021 at 02:57:52PM +0100, Thomas Gleixner wrote:
> On Sat, Mar 20 2021 at 02:01, Thomas Gleixner wrote:
> 
> > On Fri, Mar 19 2021 at 21:50, Tony Luck wrote:
> >>>  What is the justification for making this rate limit per UID and not
> >>>  per task, per process or systemwide?
> >>
> >> The concern is that a malicious user is running a workload that loops
> >> obtaining the buslock. This brings the whole system to its knees.
> >>
> >> Limiting per task doesn't help. The user can just fork(2) a whole bunch
> >> of tasks for a distributed buslock attack..
> >
> > Fair enough.
> >
> >> Systemwide might be an interesting alternative. Downside would be accidental
> >> rate limit of non-malicious tasks that happen to grab a bus lock periodically
> >> but in the same window with other buslocks from other users.
> >>
> >> Do you think that's a risk worth taking to make the code simpler?
> >
> > I'd consider it low risk, but I just looked for the usage of the
> > existing ratelimit in struct user and the related commit. Now it dawns
> > on me where you are coming from.
> 
> So after getting real numbers myself, I have more thoughts on
> this. Setting a reasonable per user limit might be hard when you want to
> protect e.g. against an orchestrated effort by several users
> (containers...). If each of them stays under the limit which is easy
> enough to figure out then you still end up with significant accumulated
> damage.
> 
> So systemwide might not be the worst option after all.

Indeed.

> 
> The question is how widespread are bus locks in existing applications?
> I haven't found any on a dozen machines with random variants of
> workloads so far according to perf ... -e sq_misc.split_lock.

We have been running various tests widely inside Intel (and also outside)
after enabling split lock and captured a few split lock issues in firmware,
kernel, drivers, and apps. As you know, we have submitted a few patches to
fix the split lock issues in the kernel and drivers (e.g. split lock
in bit ops) and fixed a few split lock issues in firmware.

But so far I'm not aware of any split lock issues in user space yet.
I guess compilers do a good job of cache line alignment to avoid this
issue. But inline asm in user apps can easily hit this issue (on purpose).

> 
> What's the actual scenario in the real world where a buslock access
> might be legitimate?

I did a simple experiment: looping on a split locked instruction on
one core as one user can slow down a "dd" command running on another core
as another user by 7 times. A malicious user can do similar things to
slow down whole-system performance, right?

> 
> And what's the advice/recommendation for a system administrator on how to
> analyze the situation and what kind of parameter to set?
> 
> I tried to get answers from Documentation/x86/buslock.rst, but ....

Can I change the sleep code in the handle_bus_lock() to the following?

               while (!__ratelimit(&global_bld_ratelimit))
                       usleep_range(1000000 / bld_ratelimit,
                                    1000000 / bld_ratelimit);

Maybe the system-wide bus lock ratelimit can be set to a default value of
1,000,000/s, which is also the max ratelimit value.

The minimum sleep time here is 1 us, which means the max bld_ratelimit
can be up to 1,000,000.

If the system administrator wants to tolerate bus locks less and throttle
them further, bld_ratelimit can be set to a smaller number.
The smallest bld_ratelimit is 1.

When I gradually decrease the bld_ratelimit value, I can see that fewer
bus locks are issued per second systemwide and the "dd" command or other
memory benchmarks are less impacted by the bus locks.

If this works, I will have the buslock.rst doc explain the situation
and how to set the parameter.

Thanks.

-Fenghua


* Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-03-20 12:42       ` Thomas Gleixner
@ 2021-04-03  1:04         ` Fenghua Yu
  2021-04-12  7:15           ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Fenghua Yu @ 2021-04-03  1:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Peter Zijlstra, Tony Luck,
	Randy Dunlap, Xiaoyao Li, Ravi V Shankar, linux-kernel, x86

Hi, Thomas,

On Sat, Mar 20, 2021 at 01:42:52PM +0100, Thomas Gleixner wrote:
> On Fri, Mar 19 2021 at 22:19, Fenghua Yu wrote:
> > On Fri, Mar 19, 2021 at 10:30:50PM +0100, Thomas Gleixner wrote:
> >> > +	if (sscanf(arg, "ratelimit:%d", &ratelimit) == 1 && ratelimit > 0) {
> >> > +		bld_ratelimit = ratelimit;
> >> 
> >> So any rate up to INTMAX/s is valid here, right?
> >
> > Yes. I don't see a smaller limit than INTMAX/s. Is that right?
> 
> That's a given, but what's the point of limits in that range?
> 
> A buslock access locks up the system for X cycles. So the total amount
> of allowable damage in cycles per second is:
> 
>    limit * stall_cycles_per_bus_lock
> 
> ergo the time (in seconds) which the system is locked up is:
> 
>    limit * stall_cycles_per_bus_lock / cpufreq
> 
> Which means for ~INTMAX/2 on a 2 GHz CPU:
> 
>    2 * 10^9 * $CYCLES  / 2 * 10^9  = $CYCLES seconds
> 
> Assuming the inflicted damage is only 1 cycle, #LOCK is pretty much
> permanently on if there are enough threads. Sure #DB will slow them
> down, but it still does not make any sense at all especially as the
> damage is certainly greater than a single cycle.
> 
> And because the changelogs and the docs are void of numbers I just got
> real numbers myself.
> 
> With a single thread doing a 'lock inc *mem' across a cache line
> boundary the workload which I measured with perf stat goes from:
> 
>      5,940,985,091      instructions              #    0.88  insn per cycle         
>        2.780950806 seconds time elapsed
>        0.998480000 seconds user
>        4.202137000 seconds sys
> to
> 
>      7,467,979,504      instructions              #    0.10  insn per cycle         
>        5.110795917 seconds time elapsed
>        7.123499000 seconds user
>       37.266852000 seconds sys
> 
> The buslock injection rate is ~250k per second.
> 
> Even if I ratelimit the locked inc by a delay loop of ~5000 cycles
> which is probably more than what the #DB will cost then this single task
> still impacts the workload significantly:
> 
>      6,496,994,537      instructions              #    0.39  insn per cycle         
>        3.043275473 seconds time elapsed
>        1.899852000 seconds user
>        8.957088000 seconds sys
> 
> The buslock injection rate is down to ~150k per second in this case.
> 
> And even with throttling the injection rate further down to 25k per
> second the impact on the workload is still significant in the 10% range.

Thank you for your insight!

So I can change the ratelimit to be system wide and call usleep_range()
to sleep:
               while (!__ratelimit(&global_bld_ratelimit))
                       usleep_range(1000000 / bld_ratelimit,
                                    1000000 / bld_ratelimit);

The max bld_ratelimit is 1,000,000/s because the max sleeping time is 1 usec.
The min bld_ratelimit is 1/s.

> 
> And of course the documentation of the ratelimit parameter explains all
> of this in great detail so the administrator has a trivial job to tune
> that, right?

I will explain how to tune the parameter in buslock.rst doc.

> 
> >> > +	case sld_ratelimit:
> >> > +		/* Enforce no more than bld_ratelimit bus locks/sec. */
> >> > +		while (!__ratelimit(&get_current_user()->bld_ratelimit))
> >> > +			msleep(1000 / bld_ratelimit);
> 
> For any ratelimit > 1000 this will loop up to 1000 times with
> CONFIG_HZ=1000.
> 
> Assume that the buslock producer has tons of threads which all end up
> here pretty soon; then you launch a mass wakeup in the worst case every
> jiffy. Are you sure that the cure is better than the disease?

If usleep_range() is used to sleep, the threads will not all pile up on the
same jiffy-aligned wakeup, right? Maybe I can use msleep() when the delay is
in the msec range (smaller bld_ratelimit) and usleep_range() when it is in
the usec range (bigger bld_ratelimit)?
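
Roughly like this (just a sketch of that hybrid idea, not the posted code;
the delay computation mirrors the usleep_range() snippet above):

               unsigned long delay_us = 1000000 / bld_ratelimit;

               /*
                * Small bld_ratelimit => msec-scale delay, msleep() is fine.
                * Large bld_ratelimit => sub-msec delay, use usleep_range()
                * to avoid jiffy-aligned mass wakeups.
                */
               while (!__ratelimit(&global_bld_ratelimit)) {
                       if (delay_us >= 1000)
                               msleep(delay_us / 1000);
                       else
                               usleep_range(delay_us, delay_us);
               }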

Even if there is a mass wakeup, throttling the threads avoids the system
wide performance degradation (e.g. the 7x slower dd command in another
user). Is that a good enough justification for throttling the threads?

> 
> > If I split this whole patch set into two patch sets:
> > 1. Three patches in the first patch set: the enumeration patch, the warn
> >    and fatal patch, and the documentation patch.
> > 2. Two patches in the second patch set: the ratelimit patch and the
> >    documentation patch.
> >
> > Then I will send the two patch sets separately, you will accept them one
> > by one. Is that OK?
> 
> That's obviously the right thing to do because #1 should be ready and we
> can sort out #2 seperately. See the conversation with Tony.

Thank you for picking up the first patch set!

-Fenghua

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-04-03  1:04         ` Fenghua Yu
@ 2021-04-12  7:15           ` Thomas Gleixner
  2021-04-13 23:40             ` Fenghua Yu
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2021-04-12  7:15 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, Peter Zijlstra, Tony Luck,
	Randy Dunlap, Xiaoyao Li, Ravi V Shankar, linux-kernel, x86

On Sat, Apr 03 2021 at 01:04, Fenghua Yu wrote:
> On Sat, Mar 20, 2021 at 01:42:52PM +0100, Thomas Gleixner wrote:
>> On Fri, Mar 19 2021 at 22:19, Fenghua Yu wrote:
>> And even with throttling the injection rate further down to 25k per
>> second the impact on the workload is still significant in the 10% range.
>
> Thank you for your insight!
>
> So I can change the ratelimit to system wide and call usleep_range()
> to sleep: 
>                while (!__ratelimit(&global_bld_ratelimit))
>                        usleep_range(1000000 / bld_ratelimit,
>                                     1000000 / bld_ratelimit);
>
> The max bld_ratelimit is 1,000,000/s because the max sleeping time is 1
> usec.

Maximum sleep time is 1usec?

> The min bld_ratelimit is 1/s.

Again. This does not make sense at all. 1 million bus lock events per second
are way beyond the point where the machine does anything other than being
stuck in buslocks.

Aside from that, why are you trying to make this throttling in any way
accurate? It does not matter at all, really. Limit reached, put it to
sleep for some time and be done with it. No point in trying to be clever
for no value.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-04-12  7:15           ` Thomas Gleixner
@ 2021-04-13 23:40             ` Fenghua Yu
  2021-04-14  9:41               ` Thomas Gleixner
  0 siblings, 1 reply; 18+ messages in thread
From: Fenghua Yu @ 2021-04-13 23:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, Peter Zijlstra, Tony Luck,
	Randy Dunlap, Xiaoyao Li, Ravi V Shankar, linux-kernel, x86

Hi, Thomas,

On Mon, Apr 12, 2021 at 09:15:08AM +0200, Thomas Gleixner wrote:
> On Sat, Apr 03 2021 at 01:04, Fenghua Yu wrote:
> > On Sat, Mar 20, 2021 at 01:42:52PM +0100, Thomas Gleixner wrote:
> >> On Fri, Mar 19 2021 at 22:19, Fenghua Yu wrote:
> >> And even with throttling the injection rate further down to 25k per
> >> second the impact on the workload is still significant in the 10% range.
> > So I can change the ratelimit to system wide and call usleep_range()
> > to sleep: 
> >                while (!__ratelimit(&global_bld_ratelimit))
> >                        usleep_range(1000000 / bld_ratelimit,
> >                                     1000000 / bld_ratelimit);
> >
> > The max bld_ratelimit is 1,000,000/s because the max sleeping time is 1
> > usec.
> 
> Maximum sleep time is 1usec?
> 
> > The min bld_ratelimit is 1/s.
> 
> Again. This does not make sense at all. 1 million bus lock events per second
> are way beyond the point where the machine does anything other than being
> stuck in buslocks.
> 
> Aside from that, why are you trying to make this throttling in any way
> accurate? It does not matter at all, really. Limit reached, put it to
> sleep for some time and be done with it. No point in trying to be clever
> for no value.

Is it OK to restrict bld_ratelimit to between 1 and 1,000 bus locks/sec?

Can I do the throttling like this?

               /* Enforce no more than bld_ratelimit bus locks/sec. */
               while (!__ratelimit(&global_bld_ratelimit))
                       msleep(10);

On one machine, with bld_ratelimit=1,000, a busy bus lock loop takes about
5 msec to use up its budget, i.e. the bus is locked for about 5 msec and
then the process sleeps in 10 msec steps and thus doesn't generate any bus
locks. A "dd" command running on other cores shows no noticeable
degradation with bld_ratelimit=1,000.
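
For reference, the back-of-the-envelope math behind those numbers (my
reading of the measurement above; the per-lock cost is an inferred
assumption, not something I measured directly):

               1,000 allowed locks consumed in ~5 msec
                       => roughly 5 usec of bus-locked time per lock
               1,000 locks/sec * ~5 usec/lock ≈ 5 msec of locked bus per
               second, i.e. well under 1% of wall time, with the offender
               sleeping in 10 msec steps for the rest of its 1-second
               ratelimit window.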

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock
  2021-04-13 23:40             ` Fenghua Yu
@ 2021-04-14  9:41               ` Thomas Gleixner
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Gleixner @ 2021-04-14  9:41 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, Peter Zijlstra, Tony Luck,
	Randy Dunlap, Xiaoyao Li, Ravi V Shankar, linux-kernel, x86

Fenghua,

On Tue, Apr 13 2021 at 23:40, Fenghua Yu wrote:
> On Mon, Apr 12, 2021 at 09:15:08AM +0200, Thomas Gleixner wrote:
>> Aside of that why are you trying to make this throttling in any way
>> accurate? It does not matter at all, really. Limit reached, put it to
>> sleep for some time and be done with it. No point in trying to be clever
>> for no value.
>
> Is it OK to restrict bld_ratelimit to between 1 and 1,000 bus locks/sec?
>
> Can I do the throttling like this?
>
>                /* Enforce no more than bld_ratelimit bus locks/sec. */
>                while (!__ratelimit(&global_bld_ratelimit))
>                        msleep(10);
>
> On one machine, with bld_ratelimit=1,000, a busy bus lock loop takes about
> 5 msec to use up its budget, i.e. the bus is locked for about 5 msec and
> then the process sleeps in 10 msec steps and thus doesn't generate any bus
> locks. A "dd" command running on other cores shows no noticeable
> degradation with bld_ratelimit=1,000.

Something like this makes sense. Add some rationale for the upper limit
you finally end up with so sysadmins can make informed decisions.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-04-14  9:41 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-13  5:49 [PATCH v5 0/3] x86/bus_lock: Enable bus lock detection Fenghua Yu
2021-03-13  5:49 ` [PATCH v5 1/3] x86/cpufeatures: Enumerate #DB for " Fenghua Yu
2021-03-19 20:35   ` Thomas Gleixner
2021-03-19 21:00     ` Fenghua Yu
2021-03-13  5:49 ` [PATCH v5 2/3] x86/bus_lock: Handle #DB for bus lock Fenghua Yu
2021-03-19 21:30   ` Thomas Gleixner
2021-03-19 21:50     ` Luck, Tony
2021-03-20  1:01       ` Thomas Gleixner
2021-03-20 13:57         ` Thomas Gleixner
2021-04-03  0:50           ` Fenghua Yu
2021-03-19 22:19     ` Fenghua Yu
2021-03-20 12:42       ` Thomas Gleixner
2021-04-03  1:04         ` Fenghua Yu
2021-04-12  7:15           ` Thomas Gleixner
2021-04-13 23:40             ` Fenghua Yu
2021-04-14  9:41               ` Thomas Gleixner
2021-03-13  5:49 ` [PATCH v5 3/3] Documentation/admin-guide: Change doc for split_lock_detect parameter Fenghua Yu
2021-03-19 21:35   ` Thomas Gleixner
