* [PATCH v10 0/6] Enable split lock detection for real time and debug
@ 2019-11-21  0:53 Fenghua Yu
  2019-11-21  0:53 ` [PATCH v10 1/6] x86/msr-index: Add two new MSRs Fenghua Yu
                   ` (5 more replies)
  0 siblings, 6 replies; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21  0:53 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

This is a stripped down version of the patch series.

Goals:
======
1) To provide a boot time option (default off) to enable split lock
   detection.
2) To ensure that the kernel crashes cleanly if OS code executes an
   atomic instruction that crosses cache lines.
3) Enable the feature on some existing CPUs that do not provide CPUID
   enumeration of it, together with architectural enumeration for future
   CPUs.

Non-goals:
==========
1) Fancy methods to have the kernel recover/continue
2) Selective enabling (it is either "on" or "off").
3) /sys files to enable/disable at run time
4) Virtualization support (guests just SIGBUS)

Accessing misaligned data that spans two cache lines with an atomic
instruction (a.k.a. a split lock) takes over 1000 extra cycles compared to
an atomic access within a single cache line. A split lock degrades
performance not only on the current CPU but on the whole system, because
while the split lock is held the instruction holds the bus lock and blocks
any other memory access on the bus.

Some real time environments cannot meet deadlines if the processor
is handling split locks.

On Intel Tremont and future processors, split lock detection is enabled by
setting bit 29 in the TEST_CTRL MSR (0x33); a detected split lock then
raises an #AC exception [1].

When split lock detection is enabled, a split lock in the kernel causes a
kernel panic; a split lock in a user process kills that process with
SIGBUS.
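
For illustration, here is a minimal userspace sketch (not part of this
series; the program and its names are only an example) that provokes a
split lock and reports whether it received the resulting SIGBUS:

#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static void sigbus_handler(int sig)
{
	/* The #AC raised by the split lock reaches user space as SIGBUS. */
	static const char msg[] = "SIGBUS: split lock detection is enabled\n";

	write(STDOUT_FILENO, msg, sizeof(msg) - 1);
	_exit(0);
}

int main(void)
{
	/* Map two adjacent pages so the straddling access stays valid. */
	char *p = mmap(NULL, 2 * 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	unsigned int *var;

	if (p == MAP_FAILED)
		return 1;

	signal(SIGBUS, sigbus_handler);

	/* 4-byte variable placed across the page (and cache line) boundary. */
	var = (unsigned int *)(p + 4096 - 2);

	/* A locked read-modify-write on this address is a split lock. */
	asm volatile("lock incl %0" : "+m" (*var));

	printf("no SIGBUS: split lock detection is disabled\n");
	return 0;
}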

To get a split lock free real time system, kernel and user application
developers need to enable split lock detection and find and fix all
possible split lock issues.

Split lock detection is disabled by default because potential split lock
issues can panic the kernel or kill user processes. It is enabled only for
real time or debugging purposes through the kernel parameter
"split_lock_detect".

Enabling split lock detection has already found split lock issues in atomic
bit operations, and some of the blocking issues have been fixed in the tip
tree:
https://lore.kernel.org/lkml/157384597983.12247.8995835529288193538.tip-bot2@tip-bot2/
https://lore.kernel.org/lkml/157384597947.12247.7200239597382357556.tip-bot2@tip-bot2/

[1] Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the TEST_CTRL MSR, the
split lock detection bit, and how to enumerate the feature.

==Changelog==
v10:
- Reduce the scope of this patch set to real time and debug usage only,
  because this usage is requested by customers and is easier to implement
  than enabling the feature by default:
  1. Disable split lock detection by default and enable it only for
     real time or debugging purposes.
  2. The kernel panics or kills the user process on an #AC for a split lock.
  3. Drop the KVM and debugfs knobs.

v9:
Address Thomas Gleixner's comments:
- Use wrmsr() in split_lock_update_msr() to spare the RMW.
- Print warnings in atomic bit operations xxx_bit() if the address is
unaligned to unsigned long.
- When the host enables split lock detection, force it enabled for the guest.
- Use the msr_test_ctl_mask to decide which bits need to be switched in
atomic_switch_msr_test_ctl().
- Warn if addr is unaligned to unsigned long in atomic ops xxx_bit().

Address Ingo Molnar's comments:
- Follow the right MSR register and bit naming conventions.
- Use the right naming conventions for variables and functions.
- Use split_lock_debug for atomic operations of WARN_ONCE in the #AC handler
and split_lock_detect_wr().
- Move the sysfs interface to the debugfs interface
/sys/kernel/debug/x86/split_lock_detect.

Other fixes:
- Update vmx->msr_test_ctl_mask when changing MSR_IA32_CORE_CAP.
- Support resume from suspend/hibernation.

- The split lock fix patch (#0003) for the wlcore wireless driver has been
upstreamed, so remove it from this patch set.

v8:
Address issues pointed out by Thomas Gleixner:
- Remove all "clearcpuid=" related patches.
- Add kernel parameter "nosplit_lock_detect" patch.
- Merge definition and initialization of msr_test_ctl_cache into #AC
  handling patch which first uses the variable.
- Add justification for the sysfs knob and combine function and doc
  patches into one patch 0015.
- A few other adjustments.

v7:
- Add a per-CPU variable to cache MSR TEST_CTL. Suggested by Thomas Gleixner.
- A few other changes including locking, code simplification, work flow,
KVM fixes, etc. Suggested by Thomas Gleixner.
- Fix KVM issues pointed out by Sean Christopherson.

v6:
- Fix #AC handler issues pointed out by Dave Hansen
- Add doc for the sysfs interface pointed out by Dave Hansen
- Fix a lock issue around wrmsr during split lock init, pointed out by Dave
  Hansen
- Update descriptions and comments suggested by Dave Hansen
- Fix __le32 issue in wlcore raised by Kalle Valo
- Add feature enumeration based on family/model/stepping for Icelake mobile

v5:
- Fix wlcore issue from Paolo Bonzini
- Fix b44 issue from Peter Zijlstra
- Change init sequence by Dave Hansen
- Fix KVM issues from Paolo Bonzini
- Re-order patch sequence

v4:
- Remove "setcpuid=" option
- Enable IA32_CORE_CAPABILITY enumeration for split lock
- Handle CPUID faulting by Peter Zijlstra
- Enable /sys interface to enable/disable split lock detection

v3:
- Handle split lock as suggested by Thomas Gleixner.
- Fix a few potential split lock issues suggested by Thomas Gleixner.
- Support kernel option "setcpuid=" suggested by Dave Hansen and Thomas
Gleixner.
- Support flag string in "clearcpuid=" suggested by Dave Hansen and
Thomas Gleixner.

v2:
- Remove code that handles split lock issue in firmware and fix
x86_capability issue mainly based on comments from Thomas Gleixner and
Peter Zijlstra.

In previous version:
Comments from Dave Hansen:
- Enumerate feature in X86_FEATURE_SPLIT_LOCK_AC
- Separate #AC handler from do_error_trap
- Use CONFIG options to inherit the BIOS setting, or to enable or disable
  split lock detection. Remove the kernel parameter "split_lock_ac="
- Change config interface to debugfs from sysfs
- Fix a few bisectable issues
- Other changes.

Comment from Tony Luck and Dave Hansen:
- Dump right information in #AC handler

Comment from Alan Cox and Dave Hansen:
- Description of split lock in patch 0

Others:
- Remove tracing because we can trace split lock in existing
  sq_misc.split_lock.
- Add a CONFIG option to either panic or re-execute the faulting instruction
  on a split lock in the kernel.
- other minor changes.

Fenghua Yu (6):
  x86/msr-index: Add two new MSRs
  x86/cpufeatures: Enumerate the IA32_CORE_CAPABILITIES MSR
  x86/split_lock: Enumerate split lock detection by the
    IA32_CORE_CAPABILITIES MSR
  x86/split_lock: Enumerate split lock detection if the
    IA32_CORE_CAPABILITIES MSR is not supported
  x86/split_lock: Handle #AC exception for split lock
  x86/split_lock: Enable split lock detection by kernel parameter

 .../admin-guide/kernel-parameters.txt         | 10 +++
 arch/x86/include/asm/cpu.h                    |  5 ++
 arch/x86/include/asm/cpufeatures.h            |  2 +
 arch/x86/include/asm/msr-index.h              |  8 +++
 arch/x86/include/asm/traps.h                  |  3 +
 arch/x86/kernel/cpu/common.c                  |  2 +
 arch/x86/kernel/cpu/intel.c                   | 72 +++++++++++++++++++
 arch/x86/kernel/traps.c                       | 22 +++++-
 8 files changed, 123 insertions(+), 1 deletion(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 145+ messages in thread

* [PATCH v10 1/6] x86/msr-index: Add two new MSRs
  2019-11-21  0:53 [PATCH v10 0/6] Enable split lock detection for real time and debug Fenghua Yu
@ 2019-11-21  0:53 ` Fenghua Yu
  2019-11-21  0:53 ` [PATCH v10 2/6] x86/cpufeatures: Enumerate the IA32_CORE_CAPABILITIES MSR Fenghua Yu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21  0:53 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

IA32_CORE_CAPABILITIES(0xCF): Core Capabilities Register
        Bit5: #AC(0) exception for split locked accesses supported.

TEST_CTRL(0x33): Test Control Register
        Bit29: Enable #AC(0) exception for split locked accesses.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/msr-index.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6a3124664289..7b25cec494fd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+#define MSR_IA32_CORE_CAPABILITIES			  0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH v10 2/6] x86/cpufeatures: Enumerate the IA32_CORE_CAPABILITIES MSR
  2019-11-21  0:53 [PATCH v10 0/6] Enable split lock detection for real time and debug Fenghua Yu
  2019-11-21  0:53 ` [PATCH v10 1/6] x86/msr-index: Add two new MSRs Fenghua Yu
@ 2019-11-21  0:53 ` Fenghua Yu
  2019-11-21  0:53 ` [PATCH v10 3/6] x86/split_lock: Enumerate split lock detection by " Fenghua Yu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21  0:53 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

The IA32_CORE_CAPABILITIES (0xcf) MSR contains bits that enumerate
some model specific features.

The MSR itself is enumerated by CPUID.(EAX=0x7,ECX=0):EDX[30].
When this CPUID bit is 1, the MSR 0xcf exists.

There is no flag shown in /proc/cpuinfo.
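
For reference, a minimal userspace sketch (illustrative only, not part of
this patch, and assuming GCC's <cpuid.h> helper) to check that CPUID bit:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.(EAX=0x7,ECX=0):EDX[30] enumerates IA32_CORE_CAPABILITIES. */
	if (!__get_cpuid_count(0x7, 0, &eax, &ebx, &ecx, &edx))
		return 1;

	printf("IA32_CORE_CAPABILITIES MSR is %s\n",
	       (edx & (1U << 30)) ? "present" : "not present");
	return 0;
}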

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index c4fbe379cc0b..d708a1f83f40 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -364,6 +364,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH v10 3/6] x86/split_lock: Enumerate split lock detection by the IA32_CORE_CAPABILITIES MSR
  2019-11-21  0:53 [PATCH v10 0/6] Enable split lock detection for real time and debug Fenghua Yu
  2019-11-21  0:53 ` [PATCH v10 1/6] x86/msr-index: Add two new MSRs Fenghua Yu
  2019-11-21  0:53 ` [PATCH v10 2/6] x86/cpufeatures: Enumerate the IA32_CORE_CAPABILITIES MSR Fenghua Yu
@ 2019-11-21  0:53 ` Fenghua Yu
  2019-11-21  0:53 ` [PATCH v10 4/6] x86/split_lock: Enumerate split lock detection if the IA32_CORE_CAPABILITIES MSR is not supported Fenghua Yu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21  0:53 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Bits in the IA32_CORE_CAPABILITIES MSR enumerate a few features that are
not enumerated through CPUID. Currently only bit 5 is defined: it enumerates
the split lock detection feature. All other bits are reserved for now.

When bit 5 is 1, the feature is supported and feature bit
X86_FEATURE_SPLIT_LOCK_DETECT is set. Otherwise, the feature is not
available.

The flag shown in /proc/cpuinfo is "split_lock_detect".

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/cpu.h         |  5 +++++
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/kernel/cpu/common.c       |  2 ++
 arch/x86/kernel/cpu/intel.c        | 19 +++++++++++++++++++
 4 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..4e03f53fc079 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,9 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d708a1f83f40..92be003c02ba 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
 #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
 #define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	( 7*32+31) /* #AC for split lock */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fffe21945374..2ee5fd49266f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1235,6 +1235,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 11d5c5950e2d..ce87e2c68767 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1028,3 +1028,22 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+static void __init split_lock_setup(void)
+{
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+}
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITIES))
+		return;
+
+	/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+	rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH v10 4/6] x86/split_lock: Enumerate split lock detection if the IA32_CORE_CAPABILITIES MSR is not supported
  2019-11-21  0:53 [PATCH v10 0/6] Enable split lock detection for real time and debug Fenghua Yu
                   ` (2 preceding siblings ...)
  2019-11-21  0:53 ` [PATCH v10 3/6] x86/split_lock: Enumerate split lock detection by " Fenghua Yu
@ 2019-11-21  0:53 ` Fenghua Yu
  2019-11-21 22:07   ` Andy Lutomirski
  2019-11-21  0:53 ` [PATCH v10 5/6] x86/split_lock: Handle #AC exception for split lock Fenghua Yu
  2019-11-21  0:53 ` [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter Fenghua Yu
  5 siblings, 1 reply; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21  0:53 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Architecturally, the split lock detection feature is enumerated by the
IA32_CORE_CAPABILITIES MSR, and future CPU models will indicate its presence
by setting bit 5. But the feature is also present in a few older models
where it has to be enumerated by CPU family and model.

Use an "x86_cpu_id" table to list the older CPU models that have the
feature.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ce87e2c68767..2614616fb6d3 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,7 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -1034,15 +1035,31 @@ static void __init split_lock_setup(void)
 	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
 }
 
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
 void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
 {
 	u64 ia32_core_caps = 0;
 
-	if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITIES))
-		return;
-
-	/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
-	rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+	} else {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
+	}
 
 	if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
 		split_lock_setup();
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH v10 5/6] x86/split_lock: Handle #AC exception for split lock
  2019-11-21  0:53 [PATCH v10 0/6] Enable split lock detection for real time and debug Fenghua Yu
                   ` (3 preceding siblings ...)
  2019-11-21  0:53 ` [PATCH v10 4/6] x86/split_lock: Enumerate split lock detection if the IA32_CORE_CAPABILITIES MSR is not supported Fenghua Yu
@ 2019-11-21  0:53 ` Fenghua Yu
  2019-11-21 22:10   ` Andy Lutomirski
  2019-11-21  0:53 ` [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter Fenghua Yu
  5 siblings, 1 reply; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21  0:53 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Currently Linux does not expect to see an alignment check exception in
kernel mode (since it does not set CR4.AC). The existing #AC handler just
returns from the exception to the faulting instruction, which triggers the
exception again.

Add a new #AC handler that forces a panic on a split lock in kernel mode.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/traps.h |  3 +++
 arch/x86/kernel/cpu/intel.c  |  2 ++
 arch/x86/kernel/traps.c      | 22 +++++++++++++++++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index b25e633033c3..0fa4eef83057 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -172,4 +172,7 @@ enum x86_pf_error_code {
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
 };
+
+extern bool split_lock_detect_enabled;
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2614616fb6d3..bc0c2f288509 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -32,6 +32,8 @@
 #include <asm/apic.h>
 #endif
 
+bool split_lock_detect_enabled;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4bb0f8447112..044033ff4326 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -293,9 +293,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	unsigned int trapnr = X86_TRAP_AC;
+	char str[] = "alignment check";
+	int signr = SIGBUS;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+		return;
+
+	if (!user_mode(regs) && split_lock_detect_enabled)
+		panic("Split lock detected\n");
+
+	cond_local_irq_enable(regs);
+
+	/* Handle #AC generated in any other cases. */
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21  0:53 [PATCH v10 0/6] Enable split lock detection for real time and debug Fenghua Yu
                   ` (4 preceding siblings ...)
  2019-11-21  0:53 ` [PATCH v10 5/6] x86/split_lock: Handle #AC exception for split lock Fenghua Yu
@ 2019-11-21  0:53 ` Fenghua Yu
  2019-11-21  6:04   ` Ingo Molnar
  2019-11-21  8:00   ` Peter Zijlstra
  5 siblings, 2 replies; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21  0:53 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar
  Cc: linux-kernel, x86, Fenghua Yu

Split lock detection is disabled by default. Enable the feature with the
kernel parameter "split_lock_detect".

It is usually enabled on real time systems, where expensive split locks
cannot be tolerated and should be treated as fatal errors, or for debugging,
to find and fix split lock issues and improve performance.

Please note: enabling this feature will cause a kernel panic or a SIGBUS to
the user application when a split lock is detected.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
 .../admin-guide/kernel-parameters.txt         | 10 ++++++
 arch/x86/kernel/cpu/intel.c                   | 34 +++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 8dee8f68fe15..1ed313891f44 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3166,6 +3166,16 @@
 
 	nosoftlockup	[KNL] Disable the soft-lockup detector.
 
+	split_lock_detect
+			[X86] Enable split lock detection
+			This is a real time or debugging feature. When enabled
+			(and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+			When triggered in applications the kernel will send
+			SIGBUS. The kernel will panic for a split lock in
+			OS code.
+
 	nosync		[HW,M68K] Disables sync negotiation for all devices.
 
 	nowatchdog	[KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index bc0c2f288509..9bf6daf185b9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -20,6 +20,7 @@
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
 #include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -655,6 +656,26 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void)
+{
+	if (split_lock_detect_enabled) {
+		u64 test_ctrl_val;
+
+		/*
+		 * The TEST_CTRL MSR is per core. So multiple threads can
+		 * read/write the MSR in parallel. But it's possible to
+		 * simplify the read/write without locking and without
+		 * worry about overwriting the MSR because only bit 29
+		 * is implemented in the MSR and the bit is set as 1 by all
+		 * threads. Locking may be needed in the future if situation
+		 * is changed e.g. other bits are implemented.
+		 */
+		rdmsrl(MSR_TEST_CTRL, test_ctrl_val);
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+		wrmsrl(MSR_TEST_CTRL, test_ctrl_val);
+	}
+}
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -770,6 +791,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -1032,9 +1055,20 @@ static const struct cpu_dev intel_cpu_dev = {
 
 cpu_dev_register(intel_cpu_dev);
 
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
 static void __init split_lock_setup(void)
 {
 	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	if (cmdline_find_option_bool(boot_command_line,
+				     "split_lock_detect")) {
+		split_lock_detect_enabled = true;
+		pr_info("enabled\n");
+	} else {
+		pr_info("disabled\n");
+	}
 }
 
 #define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21  0:53 ` [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter Fenghua Yu
@ 2019-11-21  6:04   ` Ingo Molnar
  2019-11-21 13:01     ` Peter Zijlstra
  2019-11-21  8:00   ` Peter Zijlstra
  1 sibling, 1 reply; 145+ messages in thread
From: Ingo Molnar @ 2019-11-21  6:04 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86


* Fenghua Yu <fenghua.yu@intel.com> wrote:

> Split lock detection is disabled by default. Enable the feature by
> kernel parameter "split_lock_detect".
> 
> Usually it is enabled in real time when expensive split lock issues
> cannot be tolerated so should be fatal errors, or for debugging and
> fixing the split lock issues to improve performance.
> 
> Please note: enabling this feature will cause kernel panic or SIGBUS
> to user application when a split lock issue is detected.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
>  .../admin-guide/kernel-parameters.txt         | 10 ++++++
>  arch/x86/kernel/cpu/intel.c                   | 34 +++++++++++++++++++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 8dee8f68fe15..1ed313891f44 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3166,6 +3166,16 @@
>  
>  	nosoftlockup	[KNL] Disable the soft-lockup detector.
>  
> +	split_lock_detect
> +			[X86] Enable split lock detection
> +			This is a real time or debugging feature. When enabled
> +			(and if hardware support is present), atomic
> +			instructions that access data across cache line
> +			boundaries will result in an alignment check exception.
> +			When triggered in applications the kernel will send
> +			SIGBUS. The kernel will panic for a split lock in
> +			OS code.

It would be really nice to be able to enable/disable this runtime as 
well, has this been raised before, and what was the conclusion?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21  0:53 ` [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter Fenghua Yu
  2019-11-21  6:04   ` Ingo Molnar
@ 2019-11-21  8:00   ` Peter Zijlstra
  1 sibling, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-21  8:00 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Tony Luck, Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Wed, Nov 20, 2019 at 04:53:23PM -0800, Fenghua Yu wrote:
> Split lock detection is disabled by default. Enable the feature by
> kernel parameter "split_lock_detect".
> 
> Usually it is enabled in real time when expensive split lock issues
> cannot be tolerated so should be fatal errors, or for debugging and
> fixing the split lock issues to improve performance.
> 
> Please note: enabling this feature will cause kernel panic or SIGBUS
> to user application when a split lock issue is detected.

ARGGGHH, by having this default disabled, firmware will _NEVER_ be
exposed to this before it ships.

How will you guarantee the firmware will not explode the moment you
enable this?

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21  6:04   ` Ingo Molnar
@ 2019-11-21 13:01     ` Peter Zijlstra
  2019-11-21 13:15       ` Peter Zijlstra
                         ` (2 more replies)
  0 siblings, 3 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-21 13:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> * Fenghua Yu <fenghua.yu@intel.com> wrote:

> > +	split_lock_detect
> > +			[X86] Enable split lock detection
> > +			This is a real time or debugging feature. When enabled
> > +			(and if hardware support is present), atomic
> > +			instructions that access data across cache line
> > +			boundaries will result in an alignment check exception.
> > +			When triggered in applications the kernel will send
> > +			SIGBUS. The kernel will panic for a split lock in
> > +			OS code.
> 
> It would be really nice to be able to enable/disable this runtime as 
> well, has this been raised before, and what was the conclusion?

It has, previous versions had that. Somehow a lot of things went missing
and we're back to a broken neutered useless mess.

The problem appears to be that due to hardware design the feature cannot
be virtualized, and instead of then disabling it when a VM runs/exists
they just threw in the towel and went back to useless mode.. :-(

This feature MUST be default enabled, otherwise everything will
be/remain broken and we'll end up in the situation where you can't use
it even if you wanted to.

Imagine the BIOS/EFI/firmware containing an #AC exception. At that point
the feature becomes useless, because you cannot enable it without your
machine dying.

Now, from long and painful experience we all know that if a BIOS can be
wrong, it will be. Therefore this feature will be/is useless as
presented.

And I can't be arsed to look it up, but we've been making this very same
argument since very early (possible the very first) version.

So this version goes straight into the bit bucket. Please try again.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 13:01     ` Peter Zijlstra
@ 2019-11-21 13:15       ` Peter Zijlstra
  2019-11-21 21:51         ` Luck, Tony
  2019-11-21 16:14       ` Fenghua Yu
  2019-11-21 17:12       ` Ingo Molnar
  2 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-21 13:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 02:01:53PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> > * Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> > > +	split_lock_detect
> > > +			[X86] Enable split lock detection
> > > +			This is a real time or debugging feature. When enabled
> > > +			(and if hardware support is present), atomic
> > > +			instructions that access data across cache line
> > > +			boundaries will result in an alignment check exception.
> > > +			When triggered in applications the kernel will send
> > > +			SIGBUS. The kernel will panic for a split lock in
> > > +			OS code.
> > 
> > It would be really nice to be able to enable/disable this runtime as 
> > well, has this been raised before, and what was the conclusion?
> 
> It has, previous versions had that. Somehow a lot of things went missing
> and we're back to a broken neutered useless mess.
> 
> The problem appears to be that due to hardware design the feature cannot
> be virtualized, and instead of then disabling it when a VM runs/exists
> they just threw in the towel and went back to useless mode.. :-(
> 
> This feature MUST be default enabled, otherwise everything will
> be/remain broken and we'll end up in the situation where you can't use
> it even if you wanted to.
> 
> Imagine the BIOS/EFI/firmware containing an #AC exception. At that point
> the feature becomes useless, because you cannot enable it without your
> machine dying.
> 
> Now, from long and painful experience we all know that if a BIOS can be
> wrong, it will be. Therefore this feature will be/is useless as
> presented.
> 
> And I can't be arsed to look it up, but we've been making this very same
> argument since very early (possible the very first) version.
> 
> So this version goes straight into the bit bucket. Please try again.

Also, just to remind everyone why we really want this. Split lock is a
potent, unprivileged, DoS vector.

It works nicely across guests and everything. Furthermore no sane
software should have #AC, because RISC machines have been throwing
alignment checks on stupid crap like that forever.

And even on x86, where it 'works' it has been a performance nightmare
for pretty much ever since we lost the Front Side Bus or something like
that.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 13:01     ` Peter Zijlstra
  2019-11-21 13:15       ` Peter Zijlstra
@ 2019-11-21 16:14       ` Fenghua Yu
  2019-11-21 17:14         ` Ingo Molnar
  2019-11-21 17:35         ` Peter Zijlstra
  2019-11-21 17:12       ` Ingo Molnar
  2 siblings, 2 replies; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21 16:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 02:01:53PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> > * Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> > > +	split_lock_detect
> > > +			[X86] Enable split lock detection
> > > +			This is a real time or debugging feature. When enabled
> > > +			(and if hardware support is present), atomic
> > > +			instructions that access data across cache line
> > > +			boundaries will result in an alignment check exception.
> > > +			When triggered in applications the kernel will send
> > > +			SIGBUS. The kernel will panic for a split lock in
> > > +			OS code.
> > 
> > It would be really nice to be able to enable/disable this runtime as 
> > well, has this been raised before, and what was the conclusion?
> 
> It has, previous versions had that. Somehow a lot of things went missing
> and we're back to a broken neutered useless mess.
> 
> The problem appears to be that due to hardware design the feature cannot
> be virtualized, and instead of then disabling it when a VM runs/exists
> they just threw in the towel and went back to useless mode.. :-(

It's a bit complex to virtualize the TEST_CTRL MSR because it's per core
instead of per thread. But it's still doable to virtualize it, as discussed
here:
https://lore.kernel.org/lkml/20191017233824.GA23654@linux.intel.com/

KVM code will be released later. Even without KVM code for split lock, the
patch set will kill qemu/the guest if a split lock happens there. The goal
of this patch set is to have the basic enabling code.

> 
> This feature MUST be default enabled, otherwise everything will
> be/remain broken and we'll end up in the situation where you can't use
> it even if you wanted to.

The usage scope of this patch set has been largely reduced to real time
only. The long split lock processing time (>1000 cycles) cannot be tolerated
by real time.

Real time customers do want to use this feature to detect the fatal split
lock error. They don't want any split lock issue from BIOS/EFI/firmware/
kernel/drivers/user apps.

Real time users can enable the feature (set bit 29 in the TEST_CTRL MSR) in
the BIOS and don't need the OS to enable it. But the current #AC handler
cannot handle a split lock in the kernel: it returns to the faulting
instruction and re-enters #AC, so it doesn't provide useful information for
the customers. That's why we add the new #AC handler in this patch set.

>
> Imagine the BIOS/EFI/firmware containing an #AC exception. At that point
> the feature becomes useless, because you cannot enable it without your
> machine dying.

I believe the Intel real time team guarantees to deliver a split lock FREE
BIOS/EFI/firmware to their real time users.

From the kernel point of view, we are working on a split lock free kernel.
Some blocking split lock issues have been fixed in the TIP tree.

Only a limited set of user apps can run on real time, and they should be
split lock free before they are allowed to run on the real time system.

So the feature is enabled only for real time, which wants a controlled,
split lock free environment.

The point is that a split lock is a FATAL error on real time. Whenever it
happens, the long processing time (>1000 cycles) can no longer meet the
hard real time requirement and the system/user app has to die.

> 
> Now, from long and painful experience we all know that if a BIOS can be
> wrong, it will be. Therefore this feature will be/is useless as
> presented.
> 
> And I can't be arsed to look it up, but we've been making this very same
> argument since very early (possible the very first) version.
> 
> So this version goes straight into the bit bucket. Please try again.

In summary, the patch set only wants to enable the feature for real time
and disable it by default.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 13:01     ` Peter Zijlstra
  2019-11-21 13:15       ` Peter Zijlstra
  2019-11-21 16:14       ` Fenghua Yu
@ 2019-11-21 17:12       ` Ingo Molnar
  2019-11-21 17:34         ` Luck, Tony
  2019-11-21 17:43         ` [PATCH v10 6/6] " David Laight
  2 siblings, 2 replies; 145+ messages in thread
From: Ingo Molnar @ 2019-11-21 17:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> > * Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> > > +	split_lock_detect
> > > +			[X86] Enable split lock detection
> > > +			This is a real time or debugging feature. When enabled
> > > +			(and if hardware support is present), atomic
> > > +			instructions that access data across cache line
> > > +			boundaries will result in an alignment check exception.
> > > +			When triggered in applications the kernel will send
> > > +			SIGBUS. The kernel will panic for a split lock in
> > > +			OS code.
> > 
> > It would be really nice to be able to enable/disable this runtime as 
> > well, has this been raised before, and what was the conclusion?
> 
> It has, previous versions had that. Somehow a lot of things went missing
> and we're back to a broken neutered useless mess.
> 
> The problem appears to be that due to hardware design the feature cannot
> be virtualized, and instead of then disabling it when a VM runs/exists
> they just threw in the towel and went back to useless mode.. :-(
> 
> This feature MUST be default enabled, otherwise everything will
> be/remain broken and we'll end up in the situation where you can't use
> it even if you wanted to.

Agreed.

> And I can't be arsed to look it up, but we've been making this very 
> same argument since very early (possible the very first) version.

Yeah, I now have a distinct deja vu...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 16:14       ` Fenghua Yu
@ 2019-11-21 17:14         ` Ingo Molnar
  2019-11-21 17:35         ` Peter Zijlstra
  1 sibling, 0 replies; 145+ messages in thread
From: Ingo Molnar @ 2019-11-21 17:14 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86


* Fenghua Yu <fenghua.yu@intel.com> wrote:

> > This feature MUST be default enabled, otherwise everything will 
> > be/remain broken and we'll end up in the situation where you can't 
> > use it even if you wanted to.
> 
> The usage scope of this patch set is largely reduced to only real time. 
> The long split lock processing time (>1000 cycles) cannot be tolerated 
> by real time.
> 
> Real time customers do want to use this feature to detect the fatal 
> split lock error. They don't want any split lock issue from BIOS/EFI/ 
> firmware/kerne/drivers/user apps.
> 
> Real time can enable the feature (set bit 29 in TEST_CTRL MSR) in BIOS 
> and don't need OS to enable it. But, #AC handler cannot handle split 
> lock in the kernel and will return to the faulting instruction and 
> re-enter #AC. So current #AC handler doesn't provide useful information 
> for the customers. That's why we add the new #AC handler in this patch 
> set.

Immaterial - for this feature to be useful it must be default-enabled, 
with reasonable quirk knobs offered to people who happen to be bitten by 
such bugs and cannot fix the software.

But default-enabled is a must-have, as Peter said.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 17:12       ` Ingo Molnar
@ 2019-11-21 17:34         ` Luck, Tony
  2019-11-22 10:51           ` Peter Zijlstra
  2019-11-21 17:43         ` [PATCH v10 6/6] " David Laight
  1 sibling, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2019-11-21 17:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 06:12:14PM +0100, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, Nov 21, 2019 at 07:04:44AM +0100, Ingo Molnar wrote:
> > > * Fenghua Yu <fenghua.yu@intel.com> wrote:
> > 
> > > > +	split_lock_detect
> > > > +			[X86] Enable split lock detection
> > > > +			This is a real time or debugging feature. When enabled
> > > > +			(and if hardware support is present), atomic
> > > > +			instructions that access data across cache line
> > > > +			boundaries will result in an alignment check exception.
> > > > +			When triggered in applications the kernel will send
> > > > +			SIGBUS. The kernel will panic for a split lock in
> > > > +			OS code.
> > > 
> > > It would be really nice to be able to enable/disable this runtime as 
> > > well, has this been raised before, and what was the conclusion?
> > 
> > It has, previous versions had that. Somehow a lot of things went missing
> > and we're back to a broken neutered useless mess.
> > 
> > The problem appears to be that due to hardware design the feature cannot
> > be virtualized, and instead of then disabling it when a VM runs/exists
> > they just threw in the towel and went back to useless mode.. :-(
> > 
> > This feature MUST be default enabled, otherwise everything will
> > be/remain broken and we'll end up in the situation where you can't use
> > it even if you wanted to.
> 
> Agreed.
> 
> > And I can't be arsed to look it up, but we've been making this very 
> > same argument since very early (possible the very first) version.
> 
> Yeah, I now have a distinct deja vu...

You'll notice that we are at version 10 ... lots of things have been tried
in previous versions. This new version is to get the core functionality
in, so we can build fancier features later.  Painful experience has shown
that trying to do this all at once just leads to churn with no progress.

Enabling by default at this point would result in a flurry of complaints
about applications being killed and kernels panicking. That would be
followed by:

#include <linus/all-caps-rant-about-backwards-compatability.h>

and the patches being reverted.

This version can serve a very useful purpose. CI systems with h/w that
supports split lock can enable it and begin the process of finding
and fixing the remaining kernel issues. Especially helpful if they run
randconfig and fuzzers.

We'd also find out which libraries and applications currently use
split locks.

Real-time folks that have identified split lock as a fatal issue (they
don't meet their deadlines) could also enable it as is (because it is
better to crash the kernel and have the laser powered down than to keep
firing long past the point it should have stopped).

Any developer with concerns about their BIOS using split locks can also
enable using this patch and begin testing today.

I'm totally on board with follow up patches providing extra features like:

	A way to enable/disable at run time.

	Providing a way to allow but rate limit applications that cause
	split locks.

	Figuring out something useful to do with virtualization.

Those are all good things to have - but we won't get *any* of them if we
wait until *all* have them have been perfected.

<soapbox>
So let's just take the first step now and solve world hunger tomorrow.
</soapbox>

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 16:14       ` Fenghua Yu
  2019-11-21 17:14         ` Ingo Molnar
@ 2019-11-21 17:35         ` Peter Zijlstra
  1 sibling, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-21 17:35 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 08:14:10AM -0800, Fenghua Yu wrote:

> The usage scope of this patch set is largely reduced to only real time.
> The long split lock processing time (>1000 cycles) cannot be tolerated
> by real time.

I'm thinking you're clueless on realtime. There's plenty of things that
can cause many cycles to go out the window. And just a single
instruction soaking up cycles like that really isn't the problem.

The problem is that split lock defeats isolation. An otherwise contained
task can have pretty horrific side effects on other tasks.

> Real time customers do want to use this feature to detect the fatal
> split lock error. They don't want any split lock issue from BIOS/EFI/
> firmware/kerne/drivers/user apps.

Cloud vendors also don't want them. Nobody wants them, they stink. They
have a system wide impact.

I don't want them on my system.

> > Imagine the BIOS/EFI/firmware containing an #AC exception. At that point
> > the feature becomes useless, because you cannot enable it without your
> > machine dying.
> 
> I believe Intel real time team guarantees to deliever a split lock FREE
> BIOS/EFI/firmware to their real time users.

Not good enough. Any system shipping with this capability needs to have
a split lock free firmware blob. And the only way to make that happen is
to force enable it by default.

> From kernel point of view, we are working on a split lock free kernel.
> Some blocking split lock issues have been fixed in TIP tree.

Haven't we fixed them all by now?

> Only limited user apps can run on real time and should be split lock
> free before they are allowed to run on the real time system.

I'm thinking most of the normal Linux userspace will run just fine.
Seeing how other architectures have rejected such nonsense forever.

> In summary, the patch set only wants to enable the feature for real time
> and disable it by default.

We told you that wasn't good enough many times. Lots of people run the
preempt-rt kernel on lots of different hardware. And like I said, even
cloudy folks would want this.

Features that require special firmware that nobody has are useless.

For giggles, run the below. You can notice your desktop getting slower.

---
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

void main(void)
{
	void *addr = mmap(NULL, 4096*2, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	unsigned int *var;
	if (addr == (void*)-1) {
		printf("fail\n");
		return;
	}

	var = addr + 4096 - 2;

	for (;;)
		asm volatile ("lock incl %0" : : "m" (*var));
}

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 17:12       ` Ingo Molnar
  2019-11-21 17:34         ` Luck, Tony
@ 2019-11-21 17:43         ` David Laight
  2019-11-21 17:51           ` Andy Lutomirski
  1 sibling, 1 reply; 145+ messages in thread
From: David Laight @ 2019-11-21 17:43 UTC (permalink / raw)
  To: 'Ingo Molnar', Peter Zijlstra
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

From: Ingo Molnar
> Sent: 21 November 2019 17:12
> * Peter Zijlstra <peterz@infradead.org> wrote:
...
> > This feature MUST be default enabled, otherwise everything will
> > be/remain broken and we'll end up in the situation where you can't use
> > it even if you wanted to.
> 
> Agreed.

Before it can be enabled by default someone needs to go through the
kernel and fix all the code that abuses the 'bit' functions by using them
on int[] instead of long[].

I've only seen one fix go through for one use case of one piece of code
that repeatedly uses potentially misaligned int[] arrays for bitmasks.
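
For illustration (hypothetical structure and function, not a specific
in-tree case), the problematic pattern looks roughly like this:

#include <linux/bitops.h>
#include <linux/types.h>

struct foo {
	u32 flags[2];	/* bitmap declared as int[], only 4-byte aligned */
};

static void mark(struct foo *f, int nr)
{
	/*
	 * With a non-constant 'nr' this becomes a locked 8-byte access at
	 * a 4-byte aligned address on x86-64; depending on where the
	 * structure lands it can straddle a cache line, i.e. a split lock.
	 */
	set_bit(nr, (unsigned long *)f->flags);
}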

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 17:43         ` [PATCH v10 6/6] " David Laight
@ 2019-11-21 17:51           ` Andy Lutomirski
  2019-11-21 18:53             ` Fenghua Yu
                               ` (2 more replies)
  0 siblings, 3 replies; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-21 17:51 UTC (permalink / raw)
  To: David Laight
  Cc: Ingo Molnar, Peter Zijlstra, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Tony Luck,
	Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 9:43 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Ingo Molnar
> > Sent: 21 November 2019 17:12
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> ...
> > > This feature MUST be default enabled, otherwise everything will
> > > be/remain broken and we'll end up in the situation where you can't use
> > > it even if you wanted to.
> >
> > Agreed.
>
> Before it can be enabled by default someone needs to go through the
> kernel and fix all the code that abuses the 'bit' functions by using them
> on int[] instead of long[].
>
> I've only seen one fix go through for one use case of one piece of code
> that repeatedly uses potentially misaligned int[] arrays for bitmasks.
>

Can we really not just change the lock asm to use 32-bit accesses for
set_bit(), etc?  Sure, it will fail if the bit index is greater than
2^32, but that seems nuts.

(Why the *hell* do the bitops use long anyway?  They're *bit masks*
for crying out loud.  As in, users generally want to operate on fixed
numbers of bits.)

--Andy

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 17:51           ` Andy Lutomirski
@ 2019-11-21 18:53             ` Fenghua Yu
  2019-11-21 19:01               ` Andy Lutomirski
                                 ` (2 more replies)
  2019-11-21 19:56             ` Peter Zijlstra
  2019-11-22  9:46             ` David Laight
  2 siblings, 3 replies; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21 18:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Laight, Ingo Molnar, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Tony Luck,
	Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
> On Thu, Nov 21, 2019 at 9:43 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Ingo Molnar
> > > Sent: 21 November 2019 17:12
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > ...
> > > > This feature MUST be default enabled, otherwise everything will
> > > > be/remain broken and we'll end up in the situation where you can't use
> > > > it even if you wanted to.
> > >
> > > Agreed.
> >
> > Before it can be enabled by default someone needs to go through the
> > kernel and fix all the code that abuses the 'bit' functions by using them
> > on int[] instead of long[].
> >
> > I've only seen one fix go through for one use case of one piece of code
> > that repeatedly uses potentially misaligned int[] arrays for bitmasks.
> >
> 
> Can we really not just change the lock asm to use 32-bit accesses for
> set_bit(), etc?  Sure, it will fail if the bit index is greater than
> 2^32, but that seems nuts.
> 
> (Why the *hell* do the bitops use long anyway?  They're *bit masks*
> for crying out loud.  As in, users generally want to operate on fixed
> numbers of bits.)

We are working on a separate patch set to fix all split lock issues
in atomic bitops. Per Peter Anvin's and Tony Luck's suggestions:
1. Still keep the byte optimization if nr is constant. No split lock.
2. If the type of *addr is unsigned long, do a quadword atomic instruction
   on addr. No split lock.
3. If the type of *addr is unsigned int, do a word atomic instruction
   on addr. No split lock.
4. Otherwise, re-calculate addr to point to the 32-bit word that contains
   the bit and operate on that bit. No split lock.

Only a small percentage of atomic bitops calls fall into case 4 (e.g. 3%
for set_bit()); these need a few extra instructions to re-calculate the
address but avoid the big split lock overhead.

To get the real type of *addr instead of the cast type "unsigned long",
the atomic bitops APIs are changed from functions to macros. This change
needs to touch all architectures.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 18:53             ` Fenghua Yu
@ 2019-11-21 19:01               ` Andy Lutomirski
  2019-11-21 20:25                 ` Fenghua Yu
  2019-11-21 19:46               ` Peter Zijlstra
  2019-11-21 20:25               ` Peter Zijlstra
  2 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-21 19:01 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, David Laight, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Tony Luck, Ashok Raj, Ravi V Shankar, linux-kernel, x86


> On Nov 21, 2019, at 10:40 AM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
>>> On Thu, Nov 21, 2019 at 9:43 AM David Laight <David.Laight@aculab.com> wrote:
>>> 
>>> From: Ingo Molnar
>>>> Sent: 21 November 2019 17:12
>>>> * Peter Zijlstra <peterz@infradead.org> wrote:
>>> ...
>>>>> This feature MUST be default enabled, otherwise everything will
>>>>> be/remain broken and we'll end up in the situation where you can't use
>>>>> it even if you wanted to.
>>>> 
>>>> Agreed.
>>> 
>>> Before it can be enabled by default someone needs to go through the
>>> kernel and fix all the code that abuses the 'bit' functions by using them
>>> on int[] instead of long[].
>>> 
>>> I've only seen one fix go through for one use case of one piece of code
>>> that repeatedly uses potentially misaligned int[] arrays for bitmasks.
>>> 
>> 
>> Can we really not just change the lock asm to use 32-bit accesses for
>> set_bit(), etc?  Sure, it will fail if the bit index is greater than
>> 2^32, but that seems nuts.
>> 
>> (Why the *hell* do the bitops use long anyway?  They're *bit masks*
>> for crying out loud.  As in, users generally want to operate on fixed
>> numbers of bits.)
> 
> We are working on a separate patch set to fix all split lock issues
> in atomic bitops. Per Peter Anvin and Tony Luck suggestions:
> 1. Still keep the byte optimization if nr is constant. No split lock.
> 2. If type of *addr is unsigned long, do quadword atomic instruction
>   on addr. No split lock.
> 3. If type of *addr is unsigned int, do word atomic instruction
>   on addr. No split lock.
> 4. Otherwise, re-calculate addr to point the 32-bit address which contains
>   the bit and operate on the bit. No split lock.
> 
> Only small percentage of atomic bitops calls are in case 4 (e.g. 3%
> for set_bit()) which need a few extra instructions to re-calculate
> address but can avoid big split lock overhead.
> 
> To get real type of *addr instead of type cast type "unsigned long",
> the atomic bitops APIs are changed to macros from functions. This change
> need to touch all architectures.
> 

Isn’t the kernel full of casts to long* to match the signature?  Doing this based on type seems silly to me. I think it’s better to just do a 32-bit operation unconditionally and to try to optimize it using b*l when safe.
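
As a rough sketch of the unconditional 32-bit variant (compiler builtins
used here purely for illustration; the kernel would keep its own asm, and
the bitmap is assumed to be at least 4-byte aligned):

  #include <stdint.h>

  /* Sketch: always index the bitmap in 32-bit words so the locked RmW is
   * exactly 4 bytes wide and naturally aligned. */
  static inline void set_bit_32(unsigned int nr, void *addr)
  {
          uint32_t *word = (uint32_t *)addr + nr / 32;

          __atomic_fetch_or(word, 1u << (nr % 32), __ATOMIC_SEQ_CST);
  }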

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 18:53             ` Fenghua Yu
  2019-11-21 19:01               ` Andy Lutomirski
@ 2019-11-21 19:46               ` Peter Zijlstra
  2019-11-21 20:25               ` Peter Zijlstra
  2 siblings, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-21 19:46 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, David Laight, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Tony Luck,
	Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 10:53:03AM -0800, Fenghua Yu wrote:

> We are working on a separate patch set to fix all split lock issues
> in atomic bitops. Per Peter Anvin and Tony Luck suggestions:
> 1. Still keep the byte optimization if nr is constant. No split lock.
> 2. If type of *addr is unsigned long, do quadword atomic instruction
>    on addr. No split lock.
> 3. If type of *addr is unsigned int, do word atomic instruction
>    on addr. No split lock.
> 4. Otherwise, re-calculate addr to point the 32-bit address which contains
>    the bit and operate on the bit. No split lock.

Yeah, let's not do that. That sounds overly complicated for no real
purpose.


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 17:51           ` Andy Lutomirski
  2019-11-21 18:53             ` Fenghua Yu
@ 2019-11-21 19:56             ` Peter Zijlstra
  2019-11-21 21:01               ` Andy Lutomirski
  2019-11-22  9:46             ` David Laight
  2 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-21 19:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: David Laight, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Tony Luck,
	Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:

> Can we really not just change the lock asm to use 32-bit accesses for
> set_bit(), etc?  Sure, it will fail if the bit index is greater than
> 2^32, but that seems nuts.

There are 64bit architectures that do exactly that: Alpha, IA64.

And because of the byte 'optimization' from x86 we already could not
rely on word atomicity (we actually play games with multi-bit atomicity
for PG_waiters and clear_bit_unlock_is_negative_byte).

Also, there's a fun paper on the properties of mixed size atomic
operations for when you want to hurt your brain real bad:

  https://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf

_If_ we're going to change the bitops interface, I would propose we
change it to u32 and mandate every operation is indeed 32bit wide.
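
Roughly this interface shape (declarations only, as a sketch of the
proposal, not an existing API):

  /* every atomic bitop is a 32-bit wide RmW on a u32-indexed bitmap */
  void set_bit(unsigned int nr, volatile u32 *addr);
  void clear_bit(unsigned int nr, volatile u32 *addr);
  void change_bit(unsigned int nr, volatile u32 *addr);
  int  test_and_set_bit(unsigned int nr, volatile u32 *addr);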

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 20:25                 ` Fenghua Yu
@ 2019-11-21 20:19                   ` Peter Zijlstra
  0 siblings, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-21 20:19 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Andy Lutomirski, David Laight, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Tony Luck, Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 12:25:35PM -0800, Fenghua Yu wrote:

> > > We are working on a separate patch set to fix all split lock issues
> > > in atomic bitops. Per Peter Anvin and Tony Luck suggestions:
> > > 1. Still keep the byte optimization if nr is constant. No split lock.
> > > 2. If type of *addr is unsigned long, do quadword atomic instruction
> > >   on addr. No split lock.
> > > 3. If type of *addr is unsigned int, do word atomic instruction
> > >   on addr. No split lock.
> > > 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> > >   the bit and operate on the bit. No split lock.

> Actually we only find 8 places calling atomic bitops using type casting
> "unsigned long *". After above changes, other 8 patches remove the type
> castings and then split lock free in atomic bitops in the current kernel.

Those above changes are never going to happen.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 18:53             ` Fenghua Yu
  2019-11-21 19:01               ` Andy Lutomirski
  2019-11-21 19:46               ` Peter Zijlstra
@ 2019-11-21 20:25               ` Peter Zijlstra
  2019-11-21 21:22                 ` Andy Lutomirski
  2 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-21 20:25 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, David Laight, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Tony Luck,
	Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 10:53:03AM -0800, Fenghua Yu wrote:

> 4. Otherwise, re-calculate addr to point the 32-bit address which contains
>    the bit and operate on the bit. No split lock.

That sounds confused. Even BT{,C,R,S} have a RmW size. There is no
'operate on the bit'.

Specifically I hard rely on BTSL to be a 32bit RmW, see commit:

  7aa54be29765 ("locking/qspinlock, x86: Provide liveness guarantee")

You might need to read this paper:

  https://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 19:01               ` Andy Lutomirski
@ 2019-11-21 20:25                 ` Fenghua Yu
  2019-11-21 20:19                   ` Peter Zijlstra
  0 siblings, 1 reply; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21 20:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, David Laight, Ingo Molnar, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Tony Luck, Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 11:01:39AM -0800, Andy Lutomirski wrote:
> 
> > On Nov 21, 2019, at 10:40 AM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > 
> > On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
> >>> On Thu, Nov 21, 2019 at 9:43 AM David Laight <David.Laight@aculab.com> wrote:
> >>> 
> >>> From: Ingo Molnar
> >>>> Sent: 21 November 2019 17:12
> >>>> * Peter Zijlstra <peterz@infradead.org> wrote:
> >>> ...
> >>>>> This feature MUST be default enabled, otherwise everything will
> >>>>> be/remain broken and we'll end up in the situation where you can't use
> >>>>> it even if you wanted to.
> >>>> 
> >>>> Agreed.
> >>> 
> >>> Before it can be enabled by default someone needs to go through the
> >>> kernel and fix all the code that abuses the 'bit' functions by using them
> >>> on int[] instead of long[].
> >>> 
> >>> I've only seen one fix go through for one use case of one piece of code
> >>> that repeatedly uses potentially misaligned int[] arrays for bitmasks.
> >>> 
> >> 
> >> Can we really not just change the lock asm to use 32-bit accesses for
> >> set_bit(), etc?  Sure, it will fail if the bit index is greater than
> >> 2^32, but that seems nuts.
> >> 
> >> (Why the *hell* do the bitops use long anyway?  They're *bit masks*
> >> for crying out loud.  As in, users generally want to operate on fixed
> >> numbers of bits.)
> > 
> > We are working on a separate patch set to fix all split lock issues
> > in atomic bitops. Per Peter Anvin and Tony Luck suggestions:
> > 1. Still keep the byte optimization if nr is constant. No split lock.
> > 2. If type of *addr is unsigned long, do quadword atomic instruction
> >   on addr. No split lock.
> > 3. If type of *addr is unsigned int, do word atomic instruction
> >   on addr. No split lock.
> > 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> >   the bit and operate on the bit. No split lock.
> > 
> > Only small percentage of atomic bitops calls are in case 4 (e.g. 3%
> > for set_bit()) which need a few extra instructions to re-calculate
> > address but can avoid big split lock overhead.
> > 
> > To get real type of *addr instead of type cast type "unsigned long",
> > the atomic bitops APIs are changed to macros from functions. This change
> > need to touch all architectures.
> > 
> 
> Isn’t the kernel full of casts to long* to match the signature?  Doing this based on type seems silly to me. I think it’s better to just to a 32-bit operation unconditionally and to try to optimize it
>using b*l when safe.

Actually we only found 8 places calling atomic bitops with a cast to
"unsigned long *". After the above changes, 8 other patches remove those
casts, and then atomic bitops in the current kernel are split lock free.

To catch such casts in new patches, we add a checkpatch.pl rule that warns
on any cast in atomic bitops calls, because the APIs are macros and gcc
doesn't warn or error on the cast.

Using b*l will change the 8 places as well plus a lot of other places
where *addr is defined as "unsigned long *", right?

Thanks.

-Fenghua


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 19:56             ` Peter Zijlstra
@ 2019-11-21 21:01               ` Andy Lutomirski
  2019-11-22  9:36                 ` Peter Zijlstra
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-21 21:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, David Laight, Ingo Molnar, Fenghua Yu,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Tony Luck, Ashok Raj, Ravi V Shankar, linux-kernel, x86


> On Nov 21, 2019, at 11:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
> 
>> Can we really not just change the lock asm to use 32-bit accesses for
>> set_bit(), etc?  Sure, it will fail if the bit index is greater than
>> 2^32, but that seems nuts.
> 
> There are 64bit architectures that do exactly that: Alpha, IA64.
> 
> And because of the byte 'optimization' from x86 we already could not
> rely on word atomicity (we actually play games with multi-bit atomicity
> for PG_waiters and clear_bit_unlock_is_negative_byte).

I read a couple pages of the paper you linked and I didn’t spot what you’re talking about as it refers to x86.  What are the relevant word properties of x86 bitops or the byte optimization?

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 20:25               ` Peter Zijlstra
@ 2019-11-21 21:22                 ` Andy Lutomirski
  2019-11-22  9:25                   ` Peter Zijlstra
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-21 21:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Fenghua Yu, Andy Lutomirski, David Laight, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Tony Luck, Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 12:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Nov 21, 2019 at 10:53:03AM -0800, Fenghua Yu wrote:
>
> > 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> >    the bit and operate on the bit. No split lock.
>
> That sounds confused, Even BT{,CRS} have a RmW size. There is no
> 'operate on the bit'.
>
> Specifically I hard rely on BTSL to be a 32bit RmW, see commit:
>
>   7aa54be29765 ("locking/qspinlock, x86: Provide liveness guarantee")
>

Okay, spent a bit of time trying to grok this.  Are you saying that
LOCK BTSL suffices in a case where LOCK BTSB or LOCK XCHG8 would not?
On x86, all the LOCK operations are full barriers, so they should
order with adjacent normal accesses even to unrelated addresses,
right?

I certainly understand that a *non-locked* RMW to a bit might need to
have a certain width to get the right ordering guarantees, but those
aren't affected by split-lock detection regardless.

--Andy

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 13:15       ` Peter Zijlstra
@ 2019-11-21 21:51         ` Luck, Tony
  2019-11-21 22:24           ` Andy Lutomirski
  2019-11-22 10:08           ` Peter Zijlstra
  0 siblings, 2 replies; 145+ messages in thread
From: Luck, Tony @ 2019-11-21 21:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 02:15:22PM +0100, Peter Zijlstra wrote:
> Also, just to remind everyone why we really want this. Split lock is a
> potent, unprivileged, DoS vector.

So how much do we "really want this"?

It's been 543 days since the first version of this patch was
posted. We've made exactly zero progress.

The current cut-down patch series is the foundation to move one
small step towards getting this done.

Almost all of what's in this set will be required in whatever
final solution we want to end up with. Out of this:

 Documentation/admin-guide/kernel-parameters.txt |   10 +++
 arch/x86/include/asm/cpu.h                      |    5 +
 arch/x86/include/asm/cpufeatures.h              |    2 
 arch/x86/include/asm/msr-index.h                |    8 ++
 arch/x86/include/asm/traps.h                    |    3 +
 arch/x86/kernel/cpu/common.c                    |    2 
 arch/x86/kernel/cpu/intel.c                     |   72 ++++++++++++++++++++++++
 arch/x86/kernel/traps.c                         |   22 +++++++
 8 files changed, 123 insertions(+), 1 deletion(-)

the only substantive thing that will *change* is to make the default
be "on" rather than "off".

Everything else we want to do is *additions* to this base. We could
wait until we have those done and maybe see if we can stall out this
series to an even thousand days. Or, we can take the imperfect base
and build incrementally on it.

You've expressed concern about firmware ... with a simple kernel command
line switch to flip, LUV (https://01.org/linux-uefi-validation) could begin
testing to make sure that firmware is ready for the big day when we throw
the switch from "off" to "on".

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 4/6] x86/split_lock: Enumerate split lock detection if the IA32_CORE_CAPABILITIES MSR is not supported
  2019-11-21  0:53 ` [PATCH v10 4/6] x86/split_lock: Enumerate split lock detection if the IA32_CORE_CAPABILITIES MSR is not supported Fenghua Yu
@ 2019-11-21 22:07   ` Andy Lutomirski
  2019-11-22  0:37     ` Fenghua Yu
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-21 22:07 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86



> On Nov 20, 2019, at 5:45 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> Architecturally the split lock detection feature is enumerated by
> IA32_CORE_CAPABILITIES MSR and future CPU models will indicate presence
> of the feature by setting bit 5. But the feature is present in a few
> older models where split lock detection is enumerated by the CPU models.
> 
> Use a "x86_cpu_id" table to list the older CPU models with the feature.
> 

This may need to be disabled if the HYPERVISOR bit is set.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 5/6] x86/split_lock: Handle #AC exception for split lock
  2019-11-21  0:53 ` [PATCH v10 5/6] x86/split_lock: Handle #AC exception for split lock Fenghua Yu
@ 2019-11-21 22:10   ` Andy Lutomirski
  2019-11-21 23:14     ` Fenghua Yu
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-21 22:10 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86



> On Nov 20, 2019, at 5:45 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> Currently Linux does not expect to see an alignment check exception in
> kernel mode (since it does not set CR4.AC). The existing #AC handlers
> will just return from exception to the faulting instruction which will
> trigger another exception.
> 
> Add a new handler for #AC exceptions that will force a panic on split
> lock for kernel mode.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
> arch/x86/include/asm/traps.h |  3 +++
> arch/x86/kernel/cpu/intel.c  |  2 ++
> arch/x86/kernel/traps.c      | 22 +++++++++++++++++++++-
> 3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> index b25e633033c3..0fa4eef83057 100644
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@ -172,4 +172,7 @@ enum x86_pf_error_code {
>    X86_PF_INSTR    =        1 << 4,
>    X86_PF_PK    =        1 << 5,
> };
> +
> +extern bool split_lock_detect_enabled;
> +
> #endif /* _ASM_X86_TRAPS_H */
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 2614616fb6d3..bc0c2f288509 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -32,6 +32,8 @@
> #include <asm/apic.h>
> #endif
> 
> +bool split_lock_detect_enabled;
> +
> /*
>  * Just in case our CPU detection goes bad, or you have a weird system,
>  * allow a way to override the automatic disabling of MPX.
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 4bb0f8447112..044033ff4326 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -293,9 +293,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
> DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
> DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
> DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
> -DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
> #undef IP
> 
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +    unsigned int trapnr = X86_TRAP_AC;
> +    char str[] = "alignment check";
> +    int signr = SIGBUS;
> +
> +    RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +    if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> +        return;
> +
> +    if (!user_mode(regs) && split_lock_detect_enabled)
> +        panic("Split lock detected\n");

NAK.

1. Don’t say “split lock detected” if you don’t know that you detected a split lock.  Or is this genuinely the only way to get #AC from kernel mode?

2. Don’t panic. Use die() just like every other error where nothing is corrupted.

And maybe instead turn off split lock detection and print a stack trace instead.  Then the kernel is even more likely to survive to log something useful.
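
Something along these lines, reusing the names from the diff above (a
sketch, not a tested patch):

	if (!user_mode(regs) && split_lock_detect_enabled) {
		/* die() prints regs and a stack trace and, unlike panic(),
		 * lets the rest of the system keep running for logging */
		die("Split lock detected", regs, error_code);
		return;
	}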


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 21:51         ` Luck, Tony
@ 2019-11-21 22:24           ` Andy Lutomirski
  2019-11-21 22:29             ` Luck, Tony
  2019-11-22  0:55             ` Luck, Tony
  2019-11-22 10:08           ` Peter Zijlstra
  1 sibling, 2 replies; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-21 22:24 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Peter Zijlstra, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86


> On Nov 21, 2019, at 1:51 PM, Luck, Tony <tony.luck@intel.com> wrote:
> 
> On Thu, Nov 21, 2019 at 02:15:22PM +0100, Peter Zijlstra wrote:
>> Also, just to remind everyone why we really want this. Split lock is a
>> potent, unprivileged, DoS vector.
> 
> So how much do we "really want this"?
> 
> It's been 543 days since the first version of this patch was
> posted. We've made exactly zero progress.
> 
> Current cut down patch series is the foundation to move one
> small step towards getting this done.
> 
> Almost all of what's in this set will be required in whatever
> final solution we want to end up with. Out of this:

Why don’t we beat it into shape and apply it, hidden behind BROKEN. Then we can work on the rest of the patches and have a way to test them.

It would be really, really nice if we could pass this feature through to a VM. Can we?

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 22:24           ` Andy Lutomirski
@ 2019-11-21 22:29             ` Luck, Tony
  2019-11-21 23:18               ` Andy Lutomirski
  2019-11-22  0:55             ` Luck, Tony
  1 sibling, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2019-11-21 22:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86

> It would be really, really nice if we could pass this feature through to a VM. Can we?

It's hard because the MSR is core scoped rather than thread scoped.  So on an HT
enabled system a pair of logical processors gets enabled/disabled together.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 5/6] x86/split_lock: Handle #AC exception for split lock
  2019-11-21 23:14     ` Fenghua Yu
@ 2019-11-21 23:12       ` Andy Lutomirski
  0 siblings, 0 replies; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-21 23:12 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86


> On Nov 21, 2019, at 3:02 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> On Thu, Nov 21, 2019 at 02:10:38PM -0800, Andy Lutomirski wrote:
>> 
>> 
>>>> On Nov 20, 2019, at 5:45 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
>>> 
>>> +    if (!user_mode(regs) && split_lock_detect_enabled)
>>> +        panic("Split lock detected\n");
>> 
>> NAK.
>> 
>> 1. Don’t say “split lock detected” if you don’t know that you detected a split lock.  Or is this genuinely the only way to get #AC from kernel mode?
> 
> Intel hardware design team confirmed that the only reason for #AC in ring 0 is
> split lock.

Okay.

This should eventually get integrated with Jann’s decoder work to print the lock address and size.

> 
>> 
>> 2. Don’t panic. Use die() just like every other error where nothing is corrupted.
> 
> Ok. Will change to die() which provides all the trace information and
> allow multiple split lock in one boot.
> 
>> 
>> And maybe instead turn off split lock detection and print a stack trace instead.  Then the kernel is even more likely to survive to log something useful.
> 
> How about we just use simple policy die() in this patch set to allow
> detect and debug split lock issues and extend the code base to handle
> split lock with different policies (panic, disable split lock, maybe other
> options) in the future?
> 
> 

I’m okay with this.  Peter?

> 

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 5/6] x86/split_lock: Handle #AC exception for split lock
  2019-11-21 22:10   ` Andy Lutomirski
@ 2019-11-21 23:14     ` Fenghua Yu
  2019-11-21 23:12       ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21 23:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 02:10:38PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Nov 20, 2019, at 5:45 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > 
> > +    if (!user_mode(regs) && split_lock_detect_enabled)
> > +        panic("Split lock detected\n");
> 
> NAK.
> 
> 1. Don’t say “split lock detected” if you don’t know that you detected a split lock.  Or is this genuinely the only way to get #AC from kernel mode?

Intel hardware design team confirmed that the only reason for #AC in ring 0 is
split lock.

> 
> 2. Don’t panic. Use die() just like every other error where nothing is corrupted.

Ok. Will change to die(), which provides all the trace information and
allows multiple split locks in one boot.

> 
> And maybe instead turn off split lock detection and print a stack trace instead.  Then the kernel is even more likely to survive to log something useful.

How about we just use the simple die() policy in this patch set, to allow
detecting and debugging split lock issues, and extend the code base to
handle split lock with different policies (panic, disable split lock,
maybe other options) in the future?

Thanks.

-Fenghua
 

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 22:29             ` Luck, Tony
@ 2019-11-21 23:18               ` Andy Lutomirski
  2019-11-21 23:53                 ` Fenghua Yu
  2019-11-21 23:55                 ` Luck, Tony
  0 siblings, 2 replies; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-21 23:18 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86



> On Nov 21, 2019, at 2:29 PM, Luck, Tony <tony.luck@intel.com> wrote:
> 
> 
>> 
>> It would be really, really nice if we could pass this feature through to a VM. Can we?
> 
> It's hard because the MSR is core scoped rather than thread scoped.  So on an HT
> enabled system a pair of logical processors gets enabled/disabled together.
> 
> 

Well that sucks.

Could we pass it through if the host has no HT?  Debugging is *so* much easier in a VM.  And HT is a bit dubious these days anyway.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 23:18               ` Andy Lutomirski
@ 2019-11-21 23:53                 ` Fenghua Yu
  2019-11-22  1:52                   ` Sean Christopherson
  2019-11-21 23:55                 ` Luck, Tony
  1 sibling, 1 reply; 145+ messages in thread
From: Fenghua Yu @ 2019-11-21 23:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Christopherson Sean J, Xiaoyao Li, Luck, Tony, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Nov 21, 2019, at 2:29 PM, Luck, Tony <tony.luck@intel.com> wrote:
> > 
> > 
> >> 
> >> It would be really, really nice if we could pass this feature through to a VM. Can we?
> > 
> > It's hard because the MSR is core scoped rather than thread scoped.  So on an HT
> > enabled system a pair of logical processors gets enabled/disabled together.
> > 
> > 
> 
> Well that sucks.
> 
> Could we pass it through if the host has no HT?  Debugging is *so* much easier in a VM.  And HT is a bit dubious these days anyway.

I think it's doable to pass it through to KVM. The difficulty is disabling
split lock detection from within KVM, because that would disable it on the
whole core, including the host's sibling threads. As long as the guest is
not allowed to disable it, debugging split lock in KVM is doable.

Sean and Xiaoyao are working on split lock for KVM (in separate patch set).
They may have insight on how to do this.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 23:18               ` Andy Lutomirski
  2019-11-21 23:53                 ` Fenghua Yu
@ 2019-11-21 23:55                 ` Luck, Tony
  1 sibling, 0 replies; 145+ messages in thread
From: Luck, Tony @ 2019-11-21 23:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86

> Could we pass it through if the host has no HT?  Debugging is *so* much easier in a VM.  And HT is a bit dubious these days anyway.

Sure ... we can look at doing that in a future series once we get to agreement on the foundation pieces.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 4/6] x86/split_lock: Enumerate split lock detection if the IA32_CORE_CAPABILITIES MSR is not supported
  2019-11-21 22:07   ` Andy Lutomirski
@ 2019-11-22  0:37     ` Fenghua Yu
  2019-11-22  2:13       ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Fenghua Yu @ 2019-11-22  0:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 02:07:38PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Nov 20, 2019, at 5:45 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> > 
> > Architecturally the split lock detection feature is enumerated by
> > IA32_CORE_CAPABILITIES MSR and future CPU models will indicate presence
> > of the feature by setting bit 5. But the feature is present in a few
> > older models where split lock detection is enumerated by the CPU models.
> > 
> > Use a "x86_cpu_id" table to list the older CPU models with the feature.
> > 
> 
> This may need to be disabled if the HYPERVISOR bit is set.

How about just keeping this patch set as basic enabling code and
keeping HYPERVISOR out of scope for now? KVM folks will have better
handling of split lock in KVM once this patch set is available in
the kernel.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 22:24           ` Andy Lutomirski
  2019-11-21 22:29             ` Luck, Tony
@ 2019-11-22  0:55             ` Luck, Tony
  1 sibling, 0 replies; 145+ messages in thread
From: Luck, Tony @ 2019-11-22  0:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 02:24:21PM -0800, Andy Lutomirski wrote:
> 
> > On Nov 21, 2019, at 1:51 PM, Luck, Tony <tony.luck@intel.com> wrote:

> > Almost all of what's in this set will be required in whatever
> > final solution we want to end up with. Out of this:
> 
> Why don’t we beat it into shape and apply it, hidden behind BROKEN.
> Then we can work on the rest of the patches and have a way to test them.

That's my goal (and thanks for the help with the constructive beating,
"die" is a much better choice that "panic" at this stage of development).

I'm not sure I see the need to hide it behind BROKEN. The reasoning
behind choosing disabled by default was so that this wouldn't affect
anyone unless they chose to turn it on.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 23:53                 ` Fenghua Yu
@ 2019-11-22  1:52                   ` Sean Christopherson
  2019-11-22  2:21                     ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Sean Christopherson @ 2019-11-22  1:52 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Andy Lutomirski, Xiaoyao Li, Luck, Tony, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Thu, Nov 21, 2019 at 03:53:29PM -0800, Fenghua Yu wrote:
> On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
> > 
> > > On Nov 21, 2019, at 2:29 PM, Luck, Tony <tony.luck@intel.com> wrote:
> > > 
> > >> It would be really, really nice if we could pass this feature through to a VM. Can we?
> > > 
> > > It's hard because the MSR is core scoped rather than thread scoped.  So on an HT
> > > enabled system a pair of logical processors gets enabled/disabled together.
> > > 
> > 
> > Well that sucks.
> > 
> > Could we pass it through if the host has no HT?  Debugging is *so* much
> > easier in a VM.  And HT is a bit dubious these days anyway.
> 
> I think it's doable to pass it through to KVM. The difficulty is to disable
> split lock detection in KVM because that will disable split lock on the whole
> core including threads for the host. Without disabling split lock in KVM,
> it's doable to debug split lock in KVM.
> 
> Sean and Xiaoyao are working on split lock for KVM (in separate patch set).
> They may have insight on how to do this.

Yes, with SMT off KVM could allow the guest to enable split lock #AC, but
for the initial implementation we'd want to allow it if and only if split
lock #AC is disabled in the host kernel.  Otherwise we have to pull in the
logic to control whether or not a guest can disable split lock #AC, what
to do if a split lock #AC happens when it's enabled by the host but
disabled by the guest, etc...

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 4/6] x86/split_lock: Enumerate split lock detection if the IA32_CORE_CAPABILITIES MSR is not supported
  2019-11-22  0:37     ` Fenghua Yu
@ 2019-11-22  2:13       ` Andy Lutomirski
  2019-11-22  9:46         ` Peter Zijlstra
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-22  2:13 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86


> On Nov 21, 2019, at 4:25 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
> 
> On Thu, Nov 21, 2019 at 02:07:38PM -0800, Andy Lutomirski wrote:
>> 
>> 
>>>> On Nov 20, 2019, at 5:45 PM, Fenghua Yu <fenghua.yu@intel.com> wrote:
>>> 
>>> Architecturally the split lock detection feature is enumerated by
>>> IA32_CORE_CAPABILITIES MSR and future CPU models will indicate presence
>>> of the feature by setting bit 5. But the feature is present in a few
>>> older models where split lock detection is enumerated by the CPU models.
>>> 
>>> Use a "x86_cpu_id" table to list the older CPU models with the feature.
>>> 
>> 
>> This may need to be disabled if the HYPERVISOR bit is set.
> 
> How about just keeping this patch set as basic enabling code and
> keep HYPERVISOR out of scope as of now? KVM folks will have better
> handling of split lock in KVM once this patch set is available in
> the kernel.
> 
> 

You seem to be assuming that certain model CPUs have this feature even if not enumerated. You need to make sure you don’t try to use it in a VM without the hypervisor giving you an indication that it’s available and permitted. My suggestion is to disable model-based enumeration if HYPERVISOR is set.  You should also consider probing the MSR to double check even if you don’t think you have a hypervisor.
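
A hedged sketch of both checks (MSR address 0x33 per the cover letter;
this is not the code in the series):

	u64 test_ctrl;

	/* don't trust model-based enumeration when running as a guest */
	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
		return false;

	/* and probe TEST_CTRL instead of assuming it is implemented */
	if (rdmsrl_safe(0x33, &test_ctrl))
		return false;

	return true;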

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22  1:52                   ` Sean Christopherson
@ 2019-11-22  2:21                     ` Andy Lutomirski
  2019-11-22  2:39                       ` Xiaoyao Li
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-22  2:21 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fenghua Yu, Xiaoyao Li, Luck, Tony, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86


> On Nov 21, 2019, at 5:52 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> 
> On Thu, Nov 21, 2019 at 03:53:29PM -0800, Fenghua Yu wrote:
>>> On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
>>> 
>>>> On Nov 21, 2019, at 2:29 PM, Luck, Tony <tony.luck@intel.com> wrote:
>>>> 
>>>>> It would be really, really nice if we could pass this feature through to a VM. Can we?
>>>> 
>>>> It's hard because the MSR is core scoped rather than thread scoped.  So on an HT
>>>> enabled system a pair of logical processors gets enabled/disabled together.
>>>> 
>>> 
>>> Well that sucks.
>>> 
>>> Could we pass it through if the host has no HT?  Debugging is *so* much
>>> easier in a VM.  And HT is a bit dubious these days anyway.
>> 
>> I think it's doable to pass it through to KVM. The difficulty is to disable
>> split lock detection in KVM because that will disable split lock on the whole
>> core including threads for the host. Without disabling split lock in KVM,
>> it's doable to debug split lock in KVM.
>> 
>> Sean and Xiaoyao are working on split lock for KVM (in separate patch set).
>> They may have insight on how to do this.
> 
> Yes, with SMT off KVM could allow the guest to enable split lock #AC, but
> for the initial implementation we'd want to allow it if and only if split
> lock #AC is disabled in the host kernel.  Otherwise we have to pull in the
> logic to control whether or not a guest can disable split lock #AC, what
> to do if a split lock #AC happens when it's enabled by the host but
> disabled by the guest, etc...

What’s the actual issue?  There’s a window around entry and exit when a split lock in the host might not give #AC, but as long as no user code is run, this doesn’t seem like a big problem.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22  2:21                     ` Andy Lutomirski
@ 2019-11-22  2:39                       ` Xiaoyao Li
  2019-11-22  2:57                         ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Xiaoyao Li @ 2019-11-22  2:39 UTC (permalink / raw)
  To: Andy Lutomirski, Sean Christopherson
  Cc: Fenghua Yu, Luck, Tony, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On 11/22/2019 10:21 AM, Andy Lutomirski wrote:
> 
>> On Nov 21, 2019, at 5:52 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>
>> On Thu, Nov 21, 2019 at 03:53:29PM -0800, Fenghua Yu wrote:
>>>> On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
>>>>
>>>>> On Nov 21, 2019, at 2:29 PM, Luck, Tony <tony.luck@intel.com> wrote:
>>>>>
>>>>>> It would be really, really nice if we could pass this feature through to a VM. Can we?
>>>>>
>>>>> It's hard because the MSR is core scoped rather than thread scoped.  So on an HT
>>>>> enabled system a pair of logical processors gets enabled/disabled together.
>>>>>
>>>>
>>>> Well that sucks.
>>>>
>>>> Could we pass it through if the host has no HT?  Debugging is *so* much
>>>> easier in a VM.  And HT is a bit dubious these days anyway.
>>>
>>> I think it's doable to pass it through to KVM. The difficulty is to disable
>>> split lock detection in KVM because that will disable split lock on the whole
>>> core including threads for the host. Without disabling split lock in KVM,
>>> it's doable to debug split lock in KVM.
>>>
>>> Sean and Xiaoyao are working on split lock for KVM (in separate patch set).
>>> They may have insight on how to do this.
>>
>> Yes, with SMT off KVM could allow the guest to enable split lock #AC, but
>> for the initial implementation we'd want to allow it if and only if split
>> lock #AC is disabled in the host kernel.  Otherwise we have to pull in the
>> logic to control whether or not a guest can disable split lock #AC, what
>> to do if a split lock #AC happens when it's enabled by the host but
>> disabled by the guest, etc...
> 
> What’s the actual issue?  There’s a window around entry and exit when a split lock in the host might not give #AC, but as long as no user code is run, this doesn’t seem like a big problem.
> 
The problem is that the guest can trigger split-locked memory accesses just
by disabling split lock #AC, even when the host has it enabled. In that
situation a bus lock is held on the hardware without any #AC being
triggered, which conflicts with the host's purpose in enabling split lock #AC.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22  2:39                       ` Xiaoyao Li
@ 2019-11-22  2:57                         ` Andy Lutomirski
  0 siblings, 0 replies; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-22  2:57 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Sean Christopherson, Fenghua Yu, Luck, Tony, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86


> On Nov 21, 2019, at 6:39 PM, Xiaoyao Li <xiaoyao.li@intel.com> wrote:
> 
> On 11/22/2019 10:21 AM, Andy Lutomirski wrote:
>>>> On Nov 21, 2019, at 5:52 PM, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
>>> 
>>> On Thu, Nov 21, 2019 at 03:53:29PM -0800, Fenghua Yu wrote:
>>>>> On Thu, Nov 21, 2019 at 03:18:46PM -0800, Andy Lutomirski wrote:
>>>>> 
>>>>>> On Nov 21, 2019, at 2:29 PM, Luck, Tony <tony.luck@intel.com> wrote:
>>>>>> 
>>>>>>> It would be really, really nice if we could pass this feature through to a VM. Can we?
>>>>>> 
>>>>>> It's hard because the MSR is core scoped rather than thread scoped.  So on an HT
>>>>>> enabled system a pair of logical processors gets enabled/disabled together.
>>>>>> 
>>>>> 
>>>>> Well that sucks.
>>>>> 
>>>>> Could we pass it through if the host has no HT?  Debugging is *so* much
>>>>> easier in a VM.  And HT is a bit dubious these days anyway.
>>>> 
>>>> I think it's doable to pass it through to KVM. The difficulty is to disable
>>>> split lock detection in KVM because that will disable split lock on the whole
>>>> core including threads for the host. Without disabling split lock in KVM,
>>>> it's doable to debug split lock in KVM.
>>>> 
>>>> Sean and Xiaoyao are working on split lock for KVM (in separate patch set).
>>>> They may have insight on how to do this.
>>> 
>>> Yes, with SMT off KVM could allow the guest to enable split lock #AC, but
>>> for the initial implementation we'd want to allow it if and only if split
>>> lock #AC is disabled in the host kernel.  Otherwise we have to pull in the
>>> logic to control whether or not a guest can disable split lock #AC, what
>>> to do if a split lock #AC happens when it's enabled by the host but
>>> disabled by the guest, etc...
>> What’s the actual issue?  There’s a window around entry and exit when a split lock in the host might not give #AC, but as long as no user code is run, this doesn’t seem like a big problem.
> The problem is that guest can trigger split locked memory access just by disabling split lock #AC even when host has it enabled. In this situation, there is bus lock held on the hardware without #AC triggered, which is conflict with the purpose that host enables split lock #AC

Fair enough. You need some way to get this enabled in guests eventually, though.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 21:22                 ` Andy Lutomirski
@ 2019-11-22  9:25                   ` Peter Zijlstra
  2019-11-22 17:48                     ` Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22  9:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, David Laight, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Tony Luck,
	Ashok Raj, Ravi V Shankar, linux-kernel, x86, Will Deacon

On Thu, Nov 21, 2019 at 01:22:13PM -0800, Andy Lutomirski wrote:
> On Thu, Nov 21, 2019 at 12:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Nov 21, 2019 at 10:53:03AM -0800, Fenghua Yu wrote:
> >
> > > 4. Otherwise, re-calculate addr to point the 32-bit address which contains
> > >    the bit and operate on the bit. No split lock.
> >
> > That sounds confused, Even BT{,CRS} have a RmW size. There is no
> > 'operate on the bit'.
> >
> > Specifically I hard rely on BTSL to be a 32bit RmW, see commit:
> >
> >   7aa54be29765 ("locking/qspinlock, x86: Provide liveness guarantee")
> >
> 
> Okay, spent a bit of time trying to grok this.  Are you saying that
> LOCK BTSL suffices in a case where LOCK BTSB or LOCK XCHG8 would not?

Yep.

> On x86, all the LOCK operations are full barriers, so they should
> order with adjacent normal accesses even to unrelated addresses,
> right?

Yep, still.

The barrier is not the problem here. Yes the whole value load must come
after the atomic op, be it XCHGB/BTSB or BTSL.

The problem with XCHGB is that it is an 8bit RmW and therefore it makes
no guarantees about the contents of the bytes next to it.

When we use byte ops, we must consider the word as 4 independent
variables. And in that case the later load might observe the lock-byte
state from 3, because the modification to the lock byte from 4 is in
CPU2's store-buffer.

However, by using a 32bit RmW, we force a write on all 4 bytes at the
same time which forces that store from CPU2 to be flushed (because the
operations overlap, whereas an 8bit RmW would not overlap and be
independent).
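
Concretely, the qspinlock pending-bit acquisition relies on a pattern like
this (generic form shown, simplified; the x86 version in that commit does
the same thing with a LOCK BTSL):

	/*
	 * 32-bit locked RmW: set the pending bit and get the whole lock
	 * word back. Because the RmW overlaps the locked byte, the value
	 * returned cannot miss a store to that byte that is still sitting
	 * in another CPU's store buffer -- a byte-sized XCHG/BTS could.
	 */
	val = atomic_fetch_or_acquire(_Q_PENDING_VAL, &lock->val);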

Now, it _might_ work with an XCHGB anyway, _if_ coherency is per
cacheline, and not on a smaller granularity. But I don't think that is
something the architecture guarantees -- they could play fun and games
with partial forwards or whatever.

Specifically, we made this change:

  450cbdd0125c ("locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE")

Now, we know that MFENCE will in fact flush the store buffers, and LOCK
prefix being faster does seem to imply it does not. LOCK prefix only
guarantees order, it does not guarantee completion (that's what makes
MFENCE so much more expensive).


Also; if we're going to change the bitops API, that is a generic change
and we must consider all architectures. Me having audited the atomic
bitops width a fair number of times now, to answer questions about what the
code actually does and/or whether a proposed change is valid, indicates the
current state is crap, irrespective of the long vs u32 question.

So I'm saying that if we're going to muck with bitops, let's make it a
simple and consistent thing. This concurrency crap is hard enough
without fancy bells on.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 21:01               ` Andy Lutomirski
@ 2019-11-22  9:36                 ` Peter Zijlstra
  0 siblings, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22  9:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, David Laight, Ingo Molnar, Fenghua Yu,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Tony Luck, Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Thu, Nov 21, 2019 at 01:01:08PM -0800, Andy Lutomirski wrote:
> 
> > On Nov 21, 2019, at 11:56 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Thu, Nov 21, 2019 at 09:51:03AM -0800, Andy Lutomirski wrote:
> > 
> >> Can we really not just change the lock asm to use 32-bit accesses for
> >> set_bit(), etc?  Sure, it will fail if the bit index is greater than
> >> 2^32, but that seems nuts.
> > 
> > There are 64bit architectures that do exactly that: Alpha, IA64.
> > 
> > And because of the byte 'optimization' from x86 we already could not
> > rely on word atomicity (we actually play games with multi-bit atomicity
> > for PG_waiters and clear_bit_unlock_is_negative_byte).
> 
> I read a couple pages of the paper you linked and I didn’t spot what
> you’re talking about as it refers to x86.  What are the relevant word
> properties of x86 bitops or the byte optimization?

The paper mostly deals with Power and ARM; x86 only gets sporadic
mention. It does present a way to reason about mixed size atomic
operations though.

And the bitops API is very much cross-architecture. And like I wrote in
that other email, having audited the atomic bitop width a number of
times now makes me say no to anything complicated.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 17:51           ` Andy Lutomirski
  2019-11-21 18:53             ` Fenghua Yu
  2019-11-21 19:56             ` Peter Zijlstra
@ 2019-11-22  9:46             ` David Laight
  2019-11-22 20:32               ` Peter Zijlstra
  2 siblings, 1 reply; 145+ messages in thread
From: David Laight @ 2019-11-22  9:46 UTC (permalink / raw)
  To: 'Andy Lutomirski'
  Cc: Ingo Molnar, Peter Zijlstra, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Tony Luck,
	Ashok Raj, Ravi V Shankar, linux-kernel, x86

From Andy Lutomirski
> Sent: 21 November 2019 17:51
> On Thu, Nov 21, 2019 at 9:43 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Ingo Molnar
> > > Sent: 21 November 2019 17:12
> > > * Peter Zijlstra <peterz@infradead.org> wrote:
> > ...
> > > > This feature MUST be default enabled, otherwise everything will
> > > > be/remain broken and we'll end up in the situation where you can't use
> > > > it even if you wanted to.
> > >
> > > Agreed.
> >
> > Before it can be enabled by default someone needs to go through the
> > kernel and fix all the code that abuses the 'bit' functions by using them
> > on int[] instead of long[].
> >
> > I've only seen one fix go through for one use case of one piece of code
> > that repeatedly uses potentially misaligned int[] arrays for bitmasks.
> >
> 
> Can we really not just change the lock asm to use 32-bit accesses for
> set_bit(), etc?  Sure, it will fail if the bit index is greater than
> 2^32, but that seems nuts.

For little endian 64bit cpu it is safe(ish) to cast int [] to long [] for the bitops.
On BE 64bit cpu all hell breaks loose if you do that.
It really wasn't obvious that all the casts I found were anywhere near right
on 64bit BE systems.

So while it is almost certainly safe to change the x86-64 bitops to use
32 bit accesses, some of the code is horribly broken.

> (Why the *hell* do the bitops use long anyway?  They're *bit masks*
> for crying out loud.  As in, users generally want to operate on fixed
> numbers of bits.)

The bitops functions were (probably) written for large bitmaps that
are bigger than the size of a 'word'  (> 32 bits) and likely to be
variable size.
Quite why they use long [] is anybody's guess, but that is the definition.
It also isn't quite clear to me why they are required to be atomic.
On x86 atomicity doesn't cost much, on other architectures the cost
is significant.
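
For instance, the sort of user the long [] interface was presumably
written for:

	/* a variable-sized bitmap, bigger than one word, stored as
	 * unsigned long[] under the hood */
	DECLARE_BITMAP(cpus_seen, NR_CPUS);

	set_bit(cpu, cpus_seen);	/* nr may well exceed BITS_PER_LONG */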

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 4/6] x86/split_lock: Enumerate split lock detection if the IA32_CORE_CAPABILITIES MSR is not supported
  2019-11-22  2:13       ` Andy Lutomirski
@ 2019-11-22  9:46         ` Peter Zijlstra
  0 siblings, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22  9:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Tony Luck, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 06:13:18PM -0800, Andy Lutomirski wrote:

> You seem to be assuming that certain model CPUs have this feature even
> if not enumerated. You need to make sure you don’t try to use it in a
> VM without the hypervisor giving you an indication that it’s available
> and permitted. My suggestion is to disable model-based enumeration if
> HYPERVISOR is set.  You should also consider probing the MSR to double
> check even if you don’t think you have a hypervisor.

Yep, in patch 6 this results in an unconditional WRMSR, which, when run
under a HV, will explode most mightily.

He doesn't double check, doesn't use wrmsrl_safe()...

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 21:51         ` Luck, Tony
  2019-11-21 22:24           ` Andy Lutomirski
@ 2019-11-22 10:08           ` Peter Zijlstra
  1 sibling, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22 10:08 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Ingo Molnar, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 01:51:26PM -0800, Luck, Tony wrote:
> On Thu, Nov 21, 2019 at 02:15:22PM +0100, Peter Zijlstra wrote:
> > Also, just to remind everyone why we really want this. Split lock is a
> > potent, unprivileged, DoS vector.
> 
> So how much do we "really want this"?
> 
> It's been 543 days since the first version of this patch was
> posted. We've made exactly zero progress.

Well, I was thinking we were getting there, but then, all of 58 days ago
you discovered the MSR was per core, which is rather fundamental and
would've been rather useful to know at v1.

  http://lkml.kernel.org/r/20190925180931.GG31852@linux.intel.com

So that is ~485 days wasted because we didn't know how the hardware
actually worked. I'm not thinking that's on us.


Also, talk like:

> I believe the Intel real time team guarantees to deliver a split lock FREE
> BIOS/EFI/firmware to their real time users.

is fundamentally misguided. Everybody who buys a chip (with this on) is
a potential real-time customer.



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-21 17:34         ` Luck, Tony
@ 2019-11-22 10:51           ` Peter Zijlstra
  2019-11-22 15:27             ` Peter Zijlstra
  0 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22 10:51 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Ingo Molnar, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Thu, Nov 21, 2019 at 09:34:44AM -0800, Luck, Tony wrote:

> You'll notice that we are at version 10 ... lots of things have been tried
> in previous versions. This new version is to get the core functionality
> in, so we can build fancier features later.

The cover letter actually mentions that as a non-goal. Seems like a
conflicting message here.

> Enabling by default at this point would result in a flurry of complaints
> about applications being killed and kernels panicing. That would be
> followed by:

I thought we already found and fixed all the few kernel users that got
it wrong?

And applications? I've desktop'ed around a little with:

  perf stat -e sq_misc.split_lock -a -I 1000

running and that shows exactly, a grand total of, _ZERO_ split lock
usage. Except when I run my explicit split lock proglet, then it goes
through the roof.
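
For reference, a trivial proglet along these lines (a sketch, not
necessarily the exact one I used) is enough to make that counter explode:

	/* split.c: hammer a LOCK'ed RMW that straddles a cache line */
	#include <stdint.h>
	#include <stdlib.h>

	int main(void)
	{
		char *buf;

		/* 64-byte aligned buffer; offset 62 straddles the first line */
		if (posix_memalign((void **)&buf, 64, 128))
			return 1;

		volatile uint32_t *p = (volatile uint32_t *)(buf + 62);

		for (;;)
			__sync_fetch_and_add(p, 1); /* LOCK'ed add across the boundary */
	}

Build with something like 'gcc -O2 -o split split.c' and run it next to the
perf stat above; the counter should go from zero to silly immediately.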

So I really don't buy that argument. Like I've been saying forever, sane
architectures have never allowed unaligned atomics in the first place,
which means that sane software won't have any.

Furthermore, split_lock has been a performance issue on x86 for a long
long time, which is another reason why x86-specific software will not
have them.

And if you really really worry, just do a mode that pr_warn()s about the
userspace instead of SIGBUS.

> #include <linus/all-caps-rant-about-backwards-compatability.h>
>
> and the patches being reverted.

I don't buy that either; it would _maybe_ mean flipping the default. But
that very much depends on how many users and what sort of 'quality'
software they're running.

I suspect we can get away with a no_split_lock_detect boot flag. We've
had various such kernel flags in the past for new/dodgy features and
we've lived through that just fine.

Witness: no5lvl, noapic, noclflush, noefi, nofxsr, etc.

> This version can serve a very useful purpose. CI systems with h/w that
> supports split lock can enable it and begin the process of finding
> and fixing the remaining kernel issues. Especially helpful if they run
> randconfig and fuzzers.

A non-lethal default enabled variant would be even better for them :-)

> We'd also find out which libraries and applications currently use
> split locks.

On my debian desktop, absolutely nothing I've used in the past hour or
so. That includes both major browsers and some A/V stuff, as well as
building a kernel and writing emails.

> Any developer with concerns about their BIOS using split locks can also
> enable using this patch and begin testing today.

I don't worry about developers much; they can't fix their BIOS other
than to return the box and try and get their money back :/


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 10:51           ` Peter Zijlstra
@ 2019-11-22 15:27             ` Peter Zijlstra
  2019-11-22 17:22               ` Luck, Tony
                                 ` (4 more replies)
  0 siblings, 5 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22 15:27 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Ingo Molnar, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Fri, Nov 22, 2019 at 11:51:41AM +0100, Peter Zijlstra wrote:

> A non-lethal default enabled variant would be even better for them :-)

fresh from the keyboard, *completely* untested.

it requires we get the kernel and firmware clean, but only warns about
dodgy userspace, which I really don't think there is much of.

getting the kernel clean should be pretty simple.

---
 Documentation/admin-guide/kernel-parameters.txt |  18 +++
 arch/x86/include/asm/cpu.h                      |  17 +++
 arch/x86/include/asm/cpufeatures.h              |   2 +
 arch/x86/include/asm/msr-index.h                |   8 ++
 arch/x86/include/asm/thread_info.h              |   6 +-
 arch/x86/include/asm/traps.h                    |   1 +
 arch/x86/kernel/cpu/common.c                    |   2 +
 arch/x86/kernel/cpu/intel.c                     | 165 ++++++++++++++++++++++++
 arch/x86/kernel/process.c                       |   3 +
 arch/x86/kernel/traps.c                         |  28 +++-
 10 files changed, 246 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9983ac73b66d..18f15defdba6 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3172,6 +3172,24 @@
 
 	nosoftlockup	[KNL] Disable the soft-lockup detector.
 
+	split_lock_detect=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will pr_alert about applications
+				  triggering the #AC exception
+
+			fatal	- the kernel will SIGBUS applications that
+				  trigger the #AC exception.
+
+			For any mode other than 'off' the kernel will die if
+			it (or firmware) triggers #AC.
+
 	nosync		[HW,M68K] Disables sync negotiation for all devices.
 
 	nowatchdog	[KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..fa75bbd502b3 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool handle_split_lock(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *prev);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool handle_split_lock(void)
+{
+	return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..c3edd2bba184 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
 #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
 #define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	( 7*32+31) /* #AC for split lock */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
@@ -365,6 +366,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6a3124664289..7b25cec494fd 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+#define MSR_IA32_CORE_CAPABILITIES			  0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d779366ce3f8..d23638a0525e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* split_lock_detect */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {
 
 #ifdef CONFIG_X86_IOPL_IOPERM
 # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
-				 _TIF_IO_BITMAP)
+				 _TIF_IO_BITMAP | _TIF_SLD)
 #else
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
 #endif
 
 #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index b25e633033c3..2a7cfe8e8c3f 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -172,4 +172,5 @@ enum x86_pf_error_code {
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
 };
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4fc016bc6abd..a6b176fc3996 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1233,6 +1233,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4a900804a023..d83b8031a124 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,14 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+static enum split_lock_detect_state sld_state = sld_warn;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -652,6 +662,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -767,6 +779,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -1028,3 +1042,154 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "force",	sld_force },
+};
+
+static void __init split_lock_setup(void)
+{
+	enum split_lock_detect_state sld = sld_state;
+	char arg[20];
+	int i, ret;
+
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+				  arg, sizeof(arg));
+	if (ret < 0)
+		goto print;
+
+	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+		if (match_option(arg, ret, sld_options[i].option)) {
+			sld = sld_options[i].state;
+			break;
+		}
+	}
+
+	if (sld != sld_state)
+		sld_state = sld;
+
+print:
+	switch(sld) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+		return false;
+
+	return true;
+}
+
+static void split_lock_init(void)
+{
+	u64 test_ctrl_val;
+
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	__sld_set_all(sld_off);
+}
+
+void handle_split_lock(void)
+{
+	return sld_state != sld_off;
+}
+
+void handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if (sld_state == sld_fatal)
+		return false;
+
+	pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+		 current->comm, current->pid, regs->ip);
+
+	__sld_set_msr(false);
+	set_tsk_thread_flag(current, TIF_CLD);
+	return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+	__sld_set_msr(true);
+	clear_tsk_thread_flag(current, TIF_CLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index bd2a11ca5dd6..c04476a1f970 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	if (tifp & _TIF_SLD)
+		switch_sld(prev_p);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 3451a004e162..3cba28c9c4d9 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -242,7 +242,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -288,9 +287,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	unsigned int trapnr = X86_TRAP_AC;
+	char str[] = "alignment check";
+	int signr = SIGBUS;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+		return;
+
+	if (!handle_split_lock())
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	cond_local_irq_enable(regs);
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,

^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 15:27             ` Peter Zijlstra
@ 2019-11-22 17:22               ` Luck, Tony
  2019-11-22 20:23                 ` Peter Zijlstra
  2019-11-22 18:02               ` Luck, Tony
                                 ` (3 subsequent siblings)
  4 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2019-11-22 17:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> +void handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> +	if (sld_state == sld_fatal)
> +		return false;
> +
> +	pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> +		 current->comm, current->pid, regs->ip);
> +
> +	__sld_set_msr(false);
> +	set_tsk_thread_flag(current, TIF_CLD);
> +	return true;
> +}

I think you need an extra check in here. While a #AC in the kernel
is an indication of a split lock, a user might have enabled alignment
checking, so this #AC might not be from a split lock.

I think the extra code is just to change that first test to:

	if ((regs->eflags & X86_EFLAGS_AC) || sld_fatal)
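
In full, and using the names as actually declared elsewhere in the patch
(__sld_msr_set(), TIF_SLD; the pt_regs field is 'flags'), that would look
something like this sketch:

	bool handle_user_split_lock(struct pt_regs *regs, long error_code)
	{
		if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
			return false;	/* fall through to the SIGBUS path */

		pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
			 current->comm, current->pid, regs->ip);

		/* Disable #AC for this task; re-enabled when it schedules out. */
		__sld_msr_set(false);
		set_tsk_thread_flag(current, TIF_SLD);
		return true;
	}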

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22  9:25                   ` Peter Zijlstra
@ 2019-11-22 17:48                     ` Luck, Tony
  2019-11-22 20:31                       ` Peter Zijlstra
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2019-11-22 17:48 UTC (permalink / raw)
  To: Peter Zijlstra, Andy Lutomirski
  Cc: Yu, Fenghua, David Laight, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86, Will Deacon

> When we use byte ops, we must consider the word as 4 independent
> variables. And in that case the later load might observe the lock-byte
> state from 3, because the modification to the lock byte from 4 is in
> CPU2's store-buffer.

So we absolutely violate this with the optimization for constant arguments
to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.

So is code that does:

	set_bit(0, bitmap);

on one CPU. While another is doing:

	set_bit(mybit, bitmap);

on another CPU safe? The first operates on just one byte, the second  on 8 bytes.
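
For reference, roughly what those two calls turn into on x86-64 (going by
asm/bitops.h, not actual disassembly):

	set_bit(0, bitmap);	/* constant nr: lock orb $0x1,(bitmap)  -- 1-byte access */
	set_bit(mybit, bitmap);	/* variable nr: lock btsq %rax,(bitmap) -- 8-byte access */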

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 15:27             ` Peter Zijlstra
  2019-11-22 17:22               ` Luck, Tony
@ 2019-11-22 18:02               ` Luck, Tony
  2019-11-22 20:23                 ` Peter Zijlstra
  2019-11-22 18:44               ` Sean Christopherson
                                 ` (2 subsequent siblings)
  4 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2019-11-22 18:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Yu, Fenghua, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

> it requires we get the kernel and firmware clean, but only warns about
> dodgy userspace, which I really don't think there is much of.
>
> getting the kernel clean should be pretty simple.

Fenghua has a half dozen additional patches (I think they were
all posted in previous iterations of the patch) that were found by
code inspection, rather than by actually hitting them.

Those should go in ahead of this.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 15:27             ` Peter Zijlstra
  2019-11-22 17:22               ` Luck, Tony
  2019-11-22 18:02               ` Luck, Tony
@ 2019-11-22 18:44               ` Sean Christopherson
  2019-11-22 20:30                 ` Peter Zijlstra
  2019-11-23  0:30               ` Luck, Tony
  2019-12-13  0:09               ` [PATCH v11] x86/split_lock: Enable split lock detection by kernel parameter Tony Luck
  4 siblings, 1 reply; 145+ messages in thread
From: Sean Christopherson @ 2019-11-22 18:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Luck, Tony, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 22, 2019 at 11:51:41AM +0100, Peter Zijlstra wrote:
> 
> > A non-lethal default enabled variant would be even better for them :-)
> 
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index d779366ce3f8..d23638a0525e 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -92,6 +92,7 @@ struct thread_info {
>  #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
>  #define TIF_NOTSC		16	/* TSC is not accessible in userland */
>  #define TIF_IA32		17	/* IA32 compatibility process */
> +#define TIF_SLD			18	/* split_lock_detect */

Maybe use SLAC (Split-Lock AC) as the acronym?  I can't help but read
SLD as "split-lock disabled".  And name this TIF_NOSLAC (or TIF_NOSLD if
you don't like SLAC) since it's set when the task is running without #AC?

>  #define TIF_NOHZ		19	/* in adaptive nohz mode */
>  #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
>  #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
> @@ -122,6 +123,7 @@ struct thread_info {
>  #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
>  #define _TIF_NOTSC		(1 << TIF_NOTSC)
>  #define _TIF_IA32		(1 << TIF_IA32)
> +#define _TIF_SLD		(1 << TIF_SLD)
>  #define _TIF_NOHZ		(1 << TIF_NOHZ)
>  #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
>  #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)

...

> +void handle_split_lock(void)
> +{
> +	return sld_state != sld_off;
> +}
> +
> +void handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> +	if (sld_state == sld_fatal)
> +		return false;
> +
> +	pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> +		 current->comm, current->pid, regs->ip);
> +
> +	__sld_set_msr(false);
> +	set_tsk_thread_flag(current, TIF_CLD);
> +	return true;
> +}
> +
> +void switch_sld(struct task_struct *prev)
> +{
> +	__sld_set_msr(true);
> +	clear_tsk_thread_flag(current, TIF_CLD);
> +}

...

> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index bd2a11ca5dd6..c04476a1f970 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
>  		/* Enforce MSR update to ensure consistent state */
>  		__speculation_ctrl_update(~tifn, tifn);
>  	}
> +
> +	if (tifp & _TIF_SLD)
> +		switch_sld(prev_p);
>  }

Re-enabling #AC when scheduling out the misbehaving task would also work
well for KVM, e.g. call a variant of handle_user_split_lock() on an
unhandled #AC in the guest.  We can also reuse KVM's existing code to
restore the MSR on return to userspace so that an #AC in the guest doesn't
disable detection in the userspace VMM.

Alternatively, KVM could manually do its own thing and context switch
the MSR on VM-Enter/VM-Exit (after an unhandled #AC), but I'd rather keep
this out of the VM-Enter path and also avoid thrashing the MSR on an SMT
CPU.  The only downside is that KVM itself would occasionally run with #AC
disabled, but that doesn't seem like a big deal since split locks should
not be magically appearing in KVM.

Last thought, KVM should only expose split lock #AC to the guest if SMT=n
or the host is in "force" mode so that split lock #AC is always enabled
in hardware (for the guest) when the guest wants it enabled.  KVM would
obviously not actually disable #AC in hardware when running in force mode,
regardless of the guest's wishes.

>  /*
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 3451a004e162..3cba28c9c4d9 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -242,7 +242,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
>  {
>  	struct task_struct *tsk = current;
>  
> -
>  	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
>  		return;
>  
> @@ -288,9 +287,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
>  DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
>  DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
>  DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
> -DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
>  #undef IP
>  
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +	unsigned int trapnr = X86_TRAP_AC;
> +	char str[] = "alignment check";
> +	int signr = SIGBUS;
> +
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> +		return;
> +
> +	if (!handle_split_lock())

Pretty sure this should be omitted entirely.  For an #AC in the kernel,
simply restarting the instruction will fault indefinitely, e.g. dying is
probably the best course of action if a (completely unexpected) #AC occurs
in "off" mode.  Dropping this check also lets handle_user_split_lock() do
the right thing for #AC due to EFLAGS.AC=1 (pointed out by Tony).

> +		return;
> +
> +	if (!user_mode(regs))
> +		die("Split lock detected\n", regs, error_code);
> +
> +	cond_local_irq_enable(regs);
> +
> +	if (handle_user_split_lock(regs, error_code))
> +		return;
> +
> +	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> +		error_code, BUS_ADRALN, NULL);
> +}
> +
>  #ifdef CONFIG_VMAP_STACK
>  __visible void __noreturn handle_stack_overflow(const char *message,
>  						struct pt_regs *regs,

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 17:22               ` Luck, Tony
@ 2019-11-22 20:23                 ` Peter Zijlstra
  0 siblings, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22 20:23 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Ingo Molnar, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Fri, Nov 22, 2019 at 09:22:46AM -0800, Luck, Tony wrote:
> On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> > +void handle_user_split_lock(struct pt_regs *regs, long error_code)
> > +{
> > +	if (sld_state == sld_fatal)
> > +		return false;
> > +
> > +	pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> > +		 current->comm, current->pid, regs->ip);
> > +
> > +	__sld_set_msr(false);
> > +	set_tsk_thread_flag(current, TIF_CLD);
> > +	return true;
> > +}
> 
> I think you need an extra check in here. While a #AC in the kernel
> is an indication of a split lock, a user might have enabled alignment
> checking, so this #AC might not be from a split lock.
> 
> I think the extra code is just to change that first test to:
> 
> 	if ((regs->eflags & X86_EFLAGS_AC) || sld_fatal)

Indeed.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 18:02               ` Luck, Tony
@ 2019-11-22 20:23                 ` Peter Zijlstra
  2019-11-22 20:42                   ` Fenghua Yu
  0 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22 20:23 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Ingo Molnar, Yu, Fenghua, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

On Fri, Nov 22, 2019 at 06:02:04PM +0000, Luck, Tony wrote:
> > it requires we get the kernel and firmware clean, but only warns about
> > dodgy userspace, which I really don't think there is much of.
> >
> > getting the kernel clean should be pretty simple.
> 
> Fenghua has a half dozen additional patches (I think they were
> all posted in previous iterations of the patch) that were found by
> code inspection, rather than by actually hitting them.

I thought we merged at least some of that, but maybe my recollection is
faulty.

> Those should go in ahead of this.

Yes, we should make the kernel as clean as possible before doing this.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 18:44               ` Sean Christopherson
@ 2019-11-22 20:30                 ` Peter Zijlstra
  0 siblings, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22 20:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Luck, Tony, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Fri, Nov 22, 2019 at 10:44:57AM -0800, Sean Christopherson wrote:
> On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 22, 2019 at 11:51:41AM +0100, Peter Zijlstra wrote:
> > 
> > > A non-lethal default enabled variant would be even better for them :-)
> > 
> > diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> > index d779366ce3f8..d23638a0525e 100644
> > --- a/arch/x86/include/asm/thread_info.h
> > +++ b/arch/x86/include/asm/thread_info.h
> > @@ -92,6 +92,7 @@ struct thread_info {
> >  #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
> >  #define TIF_NOTSC		16	/* TSC is not accessible in userland */
> >  #define TIF_IA32		17	/* IA32 compatibility process */
> > +#define TIF_SLD			18	/* split_lock_detect */
> 
> Maybe use SLAC (Split-Lock AC) as the acronym?  I can't help but read
> SLD as "split-lock disabled".  And name this TIF_NOSLAC (or TIF_NOSLD if
> you don't like SLAC) since it's set when the task is running without #AC?

I'll take any other name, really. I was typing in a hurry and my
pick-a-sensible-name generator was definitely not running.

> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index bd2a11ca5dd6..c04476a1f970 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> >  		/* Enforce MSR update to ensure consistent state */
> >  		__speculation_ctrl_update(~tifn, tifn);
> >  	}
> > +
> > +	if (tifp & _TIF_SLD)
> > +		switch_sld(prev_p);
> >  }
> 
> Re-enabling #AC when scheduling out the misbehaving task would also work
> well for KVM, e.g. call a variant of handle_user_split_lock() on an
> unhandled #AC in the guest.

Initially I thought of having a timer to re-enable it, but this also
works. We really shouldn't be hitting this much. And any actual
occurrence needs to be investigated and fixed anyway.

I've not thought much about guests; that's not really my thing. But I'll
think about it a bit :-)

> > +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> > +{
> > +	unsigned int trapnr = X86_TRAP_AC;
> > +	char str[] = "alignment check";
> > +	int signr = SIGBUS;
> > +
> > +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> > +
> > +	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> > +		return;
> > +
> > +	if (!handle_split_lock())
> 
> Pretty sure this should be omitted entirely. 

Yes, I just wanted to early exit the thing for !SUP_INTEL.

> For an #AC in the kernel,
> simply restarting the instruction will fault indefinitely, e.g. dying is
> probably the best course of action if a (completely unexpected) #AC occurs
> in "off" mode.  Dropping this check also lets handle_user_split_lock() do
> the right thing for #AC due to EFLAGS.AC=1 (pointed out by Tony).

However I'd completely forgotten about EFLAGS.AC.

> > +		return;
> > +
> > +	if (!user_mode(regs))
> > +		die("Split lock detected\n", regs, error_code);
> > +
> > +	cond_local_irq_enable(regs);
> > +
> > +	if (handle_user_split_lock(regs, error_code))
> > +		return;
> > +
> > +	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> > +		error_code, BUS_ADRALN, NULL);
> > +}

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 17:48                     ` Luck, Tony
@ 2019-11-22 20:31                       ` Peter Zijlstra
  2019-11-22 21:23                         ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22 20:31 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andy Lutomirski, Yu, Fenghua, David Laight, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > When we use byte ops, we must consider the word as 4 independent
> > variables. And in that case the later load might observe the lock-byte
> > state from 3, because the modification to the lock byte from 4 is in
> > CPU2's store-buffer.
> 
> So we absolutely violate this with the optimization for constant arguments
> to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> 
> So is code that does:
> 
> 	set_bit(0, bitmap);
> 
> on one CPU. While another is doing:
> 
> 	set_bit(mybit, bitmap);
> 
> on another CPU safe? The first operates on just one byte, the second  on 8 bytes.

It is safe if all you care about is the consistency of that one bit.


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22  9:46             ` David Laight
@ 2019-11-22 20:32               ` Peter Zijlstra
  0 siblings, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-11-22 20:32 UTC (permalink / raw)
  To: David Laight
  Cc: 'Andy Lutomirski',
	Ingo Molnar, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Tony Luck, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Fri, Nov 22, 2019 at 09:46:16AM +0000, David Laight wrote:
> From Andy Lutomirski

> > Can we really not just change the lock asm to use 32-bit accesses for
> > set_bit(), etc?  Sure, it will fail if the bit index is greater than
> > 2^32, but that seems nuts.
> 
> For little endian 64bit cpu it is safe(ish) to cast int [] to long [] for the bitops.

But that generates the alignment issues this patch set is concerned
about.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 20:23                 ` Peter Zijlstra
@ 2019-11-22 20:42                   ` Fenghua Yu
  2019-11-22 21:25                     ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Fenghua Yu @ 2019-11-22 20:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Luck, Tony, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

On Fri, Nov 22, 2019 at 09:23:45PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 22, 2019 at 06:02:04PM +0000, Luck, Tony wrote:
> > > it requires we get the kernel and firmware clean, but only warns about
> > > dodgy userspace, which I really don't think there is much of.
> > >
> > > getting the kernel clean should be pretty simple.
> > 
> > Fenghua has a half dozen additional patches (I think they were
> > all posted in previous iterations of the patch) that were found by
> > code inspection, rather than by actually hitting them.
> 
> I thought we merged at least some of that, but maybe my recollection is
> faulty.

At least 2 key fixes are in TIP tree:
https://lore.kernel.org/lkml/157384597983.12247.8995835529288193538.tip-bot2@tip-bot2/
https://lore.kernel.org/lkml/157384597947.12247.7200239597382357556.tip-bot2@tip-bot2/

The two issues are blocking kernel boot when split lock is enabled.

> 
> > Those should go in ahead of this.
> 
> Yes, we should make the kernel as clean as possible before doing this.

I'll send out the other 6 fixes for atomic bitops shortly. These issues were
found by code inspection.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 20:31                       ` Peter Zijlstra
@ 2019-11-22 21:23                         ` Andy Lutomirski
  2019-12-11 17:52                           ` Peter Zijlstra
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-22 21:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Luck, Tony, Andy Lutomirski, Yu, Fenghua, David Laight,
	Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86,
	Will Deacon

On Fri, Nov 22, 2019 at 12:31 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > > When we use byte ops, we must consider the word as 4 independent
> > > variables. And in that case the later load might observe the lock-byte
> > > state from 3, because the modification to the lock byte from 4 is in
> > > CPU2's store-buffer.
> >
> > So we absolutely violate this with the optimization for constant arguments
> > to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> >
> > So is code that does:
> >
> >       set_bit(0, bitmap);
> >
> > on one CPU. While another is doing:
> >
> >       set_bit(mybit, bitmap);
> >
> > on another CPU safe? The first operates on just one byte, the second  on 8 bytes.
>
> It is safe if all you care about is the consistency of that one bit.
>

I'm still lost here.  Can you explain how one could write code that
observes an issue?  My trusty SDM, Vol 3 8.2.2 says "Locked
instructions have a total order."  8.2.3.9 says "Loads and Stores Are
Not Reordered with Locked Instructions."  Admittedly, the latter is an
"example", but the section is very clear about the fact that a locked
instruction prevents reordering of a load or a store issued by the
same CPU relative to the locked instruction *regardless of whether
they overlap*.

So using LOCK to implement smp_mb() is correct, and I still don't
understand your particular concern.

I understand that the CPU is probably permitted to optimize a LOCK RMW
operation such that it retires before the store buffers of earlier
instructions are fully flushed, but only if the store buffer and cache
coherency machinery work together to preserve the architecturally
guaranteed ordering.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 20:42                   ` Fenghua Yu
@ 2019-11-22 21:25                     ` Andy Lutomirski
  2019-12-12  8:57                       ` Peter Zijlstra
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-11-22 21:25 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Peter Zijlstra, Luck, Tony, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86

On Fri, Nov 22, 2019 at 12:29 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> On Fri, Nov 22, 2019 at 09:23:45PM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 22, 2019 at 06:02:04PM +0000, Luck, Tony wrote:
> > > > it requires we get the kernel and firmware clean, but only warns about
> > > > dodgy userspace, which I really don't think there is much of.
> > > >
> > > > getting the kernel clean should be pretty simple.
> > >
> > > Fenghua has a half dozen additional patches (I think they were
> > > all posted in previous iterations of the patch) that were found by
> > > code inspection, rather than by actually hitting them.
> >
> > I thought we merged at least some of that, but maybe my recollection is
> > faulty.
>
> At least 2 key fixes are in TIP tree:
> https://lore.kernel.org/lkml/157384597983.12247.8995835529288193538.tip-bot2@tip-bot2/
> https://lore.kernel.org/lkml/157384597947.12247.7200239597382357556.tip-bot2@tip-bot2/

I do not like these patches at all.  I would *much* rather see the
bitops fixed and those patches reverted.

Is there any Linux architecture that doesn't have 32-bit atomic
operations?  If all architectures can support them, then we should add
set_bit_u32(), etc and/or make x86's set_bit() work for a
4-byte-aligned pointer.
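
On x86 something like this untested sketch would do (name and exact
constraints are my guess, not existing code):

	static __always_inline void set_bit_u32(unsigned int nr, volatile u32 *addr)
	{
		/* nr must be < 32 so the access stays within the one u32 */
		asm volatile(LOCK_PREFIX "btsl %1,%0"
			     : "+m" (*addr)
			     : "Ir" (nr)
			     : "memory");
	}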

--Andy

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 15:27             ` Peter Zijlstra
                                 ` (2 preceding siblings ...)
  2019-11-22 18:44               ` Sean Christopherson
@ 2019-11-23  0:30               ` Luck, Tony
  2019-11-25 16:13                 ` Sean Christopherson
  2019-12-13  0:09               ` [PATCH v11] x86/split_lock: Enable split lock detection by kernel parameter Tony Luck
  4 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2019-11-23  0:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Fenghua Yu, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
	linux-kernel, x86

On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:

This all looks dubious on an HT system .... three snips
from your patch:

> +static bool __sld_msr_set(bool on)
> +{
> +	u64 test_ctrl_val;
> +
> +	if (rdmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> +		return false;
> +
> +	if (on)
> +		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +	else
> +		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> +	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> +		return false;
> +
> +	return true;
> +}

> +void switch_sld(struct task_struct *prev)
> +{
> +	__sld_set_msr(true);
> +	clear_tsk_thread_flag(current, TIF_CLD);
> +}

> @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
>  		/* Enforce MSR update to ensure consistent state */
>  		__speculation_ctrl_update(~tifn, tifn);
>  	}
> +
> +	if (tifp & _TIF_SLD)
> +		switch_sld(prev_p);
>  }

Don't you have some horrible races between the two logical
processors on the same core as they both try to set/clear the
MSR that is shared at the core level?

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-23  0:30               ` Luck, Tony
@ 2019-11-25 16:13                 ` Sean Christopherson
  2019-12-02 18:20                   ` Luck, Tony
  2019-12-12  8:59                   ` Peter Zijlstra
  0 siblings, 2 replies; 145+ messages in thread
From: Sean Christopherson @ 2019-11-25 16:13 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Peter Zijlstra, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Fri, Nov 22, 2019 at 04:30:56PM -0800, Luck, Tony wrote:
> On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> 
> This all looks dubious on an HT system .... three snips
> from your patch:
> 
> > +static bool __sld_msr_set(bool on)
> > +{
> > +	u64 test_ctrl_val;
> > +
> > +	if (rdmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> > +		return false;
> > +
> > +	if (on)
> > +		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> > +	else
> > +		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> > +
> > +	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> > +		return false;
> > +
> > +	return true;
> > +}
> 
> > +void switch_sld(struct task_struct *prev)
> > +{
> > +	__sld_set_msr(true);
> > +	clear_tsk_thread_flag(current, TIF_CLD);
> > +}
> 
> > @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> >  		/* Enforce MSR update to ensure consistent state */
> >  		__speculation_ctrl_update(~tifn, tifn);
> >  	}
> > +
> > +	if (tifp & _TIF_SLD)
> > +		switch_sld(prev_p);
> >  }
> 
> Don't you have some horrible races between the two logical
> processors on the same core as they both try to set/clear the
> MSR that is shared at the core level?

Yes and no.  Yes, there will be races, but they won't be fatal in any way.

  - Only the split-lock bit is supported by the kernel, so there isn't a
    risk of corrupting other bits as both threads will rewrite the current
    hardware value.

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.

  - Transitioning between modes at runtime isn't supported and disabling
    is tracked per task, so hardware will always reach a steady state that
    matches the configured mode.  I.e. split-lock is guaranteed to be
    enabled in hardware once all _TIF_SLD threads have been scheduled out.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-25 16:13                 ` Sean Christopherson
@ 2019-12-02 18:20                   ` Luck, Tony
  2019-12-12  8:59                   ` Peter Zijlstra
  1 sibling, 0 replies; 145+ messages in thread
From: Luck, Tony @ 2019-12-02 18:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Mon, Nov 25, 2019 at 08:13:48AM -0800, Sean Christopherson wrote:
> On Fri, Nov 22, 2019 at 04:30:56PM -0800, Luck, Tony wrote:
> > Don't you have some horrible races between the two logical
> > processors on the same core as they both try to set/clear the
> > MSR that is shared at the core level?
> 
> Yes and no.  Yes, there will be races, but they won't be fatal in any way.
> 
>   - Only the split-lock bit is supported by the kernel, so there isn't a
>     risk of corrupting other bits as both threads will rewrite the current
>     hardware value.
> 
>   - Toggling of split-lock is only done in "warn" mode.  Worst case
>     scenario of a race is that a misbehaving task will generate multiple
>     #AC exceptions on the same instruction.  And this race will only occur
>     if both siblings are running tasks that generate split-lock #ACs, e.g.
>     a race where sibling threads are writing different values will only
>     occur if CPUx is disabling split-lock after an #AC and CPUy is
>     re-enabling split-lock after *its* previous task generated an #AC.
> 
>   - Transitioning between modes at runtime isn't supported and disabling
>     is tracked per task, so hardware will always reach a steady state that
>     matches the configured mode.  I.e. split-lock is guaranteed to be
>     enabled in hardware once all _TIF_SLD threads have been scheduled out.

We should probably include this analysis in the commit
comment. Maybe a comment or two in the code too to note
that the races are mostly harmless and guaranteed to end
quickly.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 21:23                         ` Andy Lutomirski
@ 2019-12-11 17:52                           ` Peter Zijlstra
  2019-12-11 18:12                             ` Andy Lutomirski
  2019-12-11 18:44                             ` Luck, Tony
  0 siblings, 2 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-12-11 17:52 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luck, Tony, Yu, Fenghua, David Laight, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

On Fri, Nov 22, 2019 at 01:23:30PM -0800, Andy Lutomirski wrote:
> On Fri, Nov 22, 2019 at 12:31 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > > > When we use byte ops, we must consider the word as 4 independent
> > > > variables. And in that case the later load might observe the lock-byte
> > > > state from 3, because the modification to the lock byte from 4 is in
> > > > CPU2's store-buffer.
> > >
> > > So we absolutely violate this with the optimization for constant arguments
> > > to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> > >
> > > So is code that does:
> > >
> > >       set_bit(0, bitmap);
> > >
> > > on one CPU. While another is doing:
> > >
> > >       set_bit(mybit, bitmap);
> > >
> > > on another CPU safe? The first operates on just one byte, the second  on 8 bytes.
> >
> > It is safe if all you care about is the consistency of that one bit.
> >
> 
> I'm still lost here.  Can you explain how one could write code that
> observes an issue?  My trusty SDM, Vol 3 8.2.2 says "Locked
> instructions have a total order."

This is the thing I don't fully believe. Per this thread the bus-lock is
*BAD* and not used for normal LOCK prefixed operations. But without the
bus-lock it becomes very hard to guarantee total order.

After all, if some CPU doesn't observe a specific variable, it doesn't
care where in the order it fell. So I'm thinking they punted and went
with some partial order that is near enough that it becomes very hard to
tell the difference the moment you actually do observe stuff.

> 8.2.3.9 says "Loads and Stores Are
> Not Reordered with Locked Instructions."  Admittedly, the latter is an
> "example", but the section is very clear about the fact that a locked
> instruction prevents reordering of a load or a store issued by the
> same CPU relative to the locked instruction *regardless of whether
> they overlap*.

IIRC this rule is CPU-local.

Sure, but we're talking two cpus here.

	u32 var = 0;
	u8 *ptr = &var;

	CPU0			CPU1

				xchg(ptr, 1)

	xchg((ptr+1, 1);
	r = READ_ONCE(var);

AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
doesn't force a snoop or forward.

From the perspective of the LOCK prefixed instructions CPU0 never
observes the variable @ptr. And therefore doesn't need to provide order.

Note how the READ_ONCE() is a normal load on CPU0, and per the rules is
only forced to happen after its own LOCK prefixed instruction, but it
is free to observe ptr[0,2,3] from before, only ptr[1] will be forwarded
from its own store-buffer.

This is exactly the one reorder TSO allows.

> I understand that the CPU is probably permitted to optimize a LOCK RMW
> operation such that it retires before the store buffers of earlier
> instructions are fully flushed, but only if the store buffer and cache
> coherency machinery work together to preserve the architecturally
> guaranteed ordering.

Maybe, maybe not. I'm very loathe to trust this without things being
better specified.

Like I said, it is possible that it all works, but the way I understand
things I _really_ don't want to rely on it.

Therefore, I've written:

	u32 var = 0;
	u8 *ptr = &var;

	CPU0			CPU1

				xchg(ptr, 1)

	set_bit(8, ptr);

	r = READ_ONCE(var);

Because then the LOCK BTSL overlaps with the LOCK XCHGB and CPU0 now
observes the variable @ptr and therefore must force order.

Did this clarify, or confuse more?

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-11 17:52                           ` Peter Zijlstra
@ 2019-12-11 18:12                             ` Andy Lutomirski
  2019-12-11 22:34                               ` Peter Zijlstra
  2019-12-11 18:44                             ` Luck, Tony
  1 sibling, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-12-11 18:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Luck, Tony, Yu, Fenghua, David Laight,
	Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86,
	Will Deacon

On Wed, Dec 11, 2019 at 9:52 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Nov 22, 2019 at 01:23:30PM -0800, Andy Lutomirski wrote:
> > On Fri, Nov 22, 2019 at 12:31 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > > > > When we use byte ops, we must consider the word as 4 independent
> > > > > variables. And in that case the later load might observe the lock-byte
> > > > > state from 3, because the modification to the lock byte from 4 is in
> > > > > CPU2's store-buffer.
> > > >
> > > > So we absolutely violate this with the optimization for constant arguments
> > > > to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> > > >
> > > > So is code that does:
> > > >
> > > >       set_bit(0, bitmap);
> > > >
> > > > on one CPU. While another is doing:
> > > >
> > > >       set_bit(mybit, bitmap);
> > > >
> > > > on another CPU safe? The first operates on just one byte, the second  on 8 bytes.
> > >
> > > It is safe if all you care about is the consistency of that one bit.
> > >
> >
> > I'm still lost here.  Can you explain how one could write code that
> > observes an issue?  My trusty SDM, Vol 3 8.2.2 says "Locked
> > instructions have a total order."
>
> This is the thing I don't fully believe. Per this thread the bus-lock is
> *BAD* and not used for normal LOCK prefixed operations. But without the
> bus-lock it becomes very hard to guarantee total order.
>
> After all, if some CPU doesn't observe a specific variable, it doesn't
> care where in the order it fell. So I'm thinking they punted and went
> with some partial order that is near enough that it becomes very hard to
> tell the difference the moment you actually do observe stuff.

I hope that, if the SDM is indeed wrong, that Intel would fix the SDM.
It's definitely not fun to try to understand locking if we don't trust
the manual.

>
> > 8.2.3.9 says "Loads and Stores Are
> > Not Reordered with Locked Instructions."  Admittedly, the latter is an
> > "example", but the section is very clear about the fact that a locked
> > instruction prevents reordering of a load or a store issued by the
> > same CPU relative to the locked instruction *regardless of whether
> > they overlap*.
>
> IIRC this rule is CPU-local.
>
> Sure, but we're talking two cpus here.
>
>         u32 var = 0;
>         u8 *ptr = &var;
>
>         CPU0                    CPU1
>
>                                 xchg(ptr, 1)
>
> 	xchg(ptr+1, 1);
>         r = READ_ONCE(var);
>
> AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
> CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
> doesn't force a snoop or forward.

I think I don't quite understand.  The final value of var had better
be 0x0101 or something is severely wrong.  But r can be 0x0100 because
nothing in this example guarantees that the total order of the locked
instructions has CPU 1's instruction first.

>
> From the perspective of the LOCK prefixed instructions CPU0 never
> observes the variable @ptr. And therefore doesn't need to provide order.

I suspect that the implementation works on whole cache lines for
everything except the actual store buffer entries, which would mean
that CPU 0 does think it observed ptr[0].

>
> Note how the READ_ONCE() is a normal load on CPU0, and per the rules is
> only forced to happen after its own LOCK prefixed instruction, but it
> is free to observe ptr[0,2,3] from before, only ptr[1] will be forwarded
> from its own store-buffer.
>
> This is exactly the one reorder TSO allows.

If so, then our optimized smp_mb() has all kinds of problems, no?

>
> > I understand that the CPU is probably permitted to optimize a LOCK RMW
> > operation such that it retires before the store buffers of earlier
> > instructions are fully flushed, but only if the store buffer and cache
> > coherency machinery work together to preserve the architecturally
> > guaranteed ordering.
>
> Maybe, maybe not. I'm very loathe to trust this without things being
> better specified.
>
> Like I said, it is possible that it all works, but the way I understand
> things I _really_ don't want to rely on it.
>
> Therefore, I've written:
>
>         u32 var = 0;
>         u8 *ptr = &var;
>
>         CPU0                    CPU1
>
>                                 xchg(ptr, 1)
>
>         set_bit(8, ptr);
>
>         r = READ_ONCE(var);
>
> Because then the LOCK BTSL overlaps with the LOCK XCHGB and CPU0 now
> observes the variable @ptr and therefore must force order.
>
> Did this clarify, or confuse more?

Probably confuses more.

If you're actually concerned that the SDM is wrong, I think that roping
in some architects would be a good idea.

I still think that making set_bit() do 32-bit or smaller accesses is okay.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-11 17:52                           ` Peter Zijlstra
  2019-12-11 18:12                             ` Andy Lutomirski
@ 2019-12-11 18:44                             ` Luck, Tony
  2019-12-11 22:39                               ` Peter Zijlstra
  1 sibling, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2019-12-11 18:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Yu, Fenghua, David Laight, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

On Wed, Dec 11, 2019 at 06:52:02PM +0100, Peter Zijlstra wrote:
> Sure, but we're talking two cpus here.
> 
> 	u32 var = 0;
> 	u8 *ptr = &var;
> 
> 	CPU0			CPU1
> 
> 				xchg(ptr, 1)
> 
> 	xchg((ptr+1, 1);
> 	r = READ_ONCE(var);

It looks like our current implementation of set_bit() would already run
into this if some call sites for a particular bitmap pass in constant
bit positions (which get optimized to byte wide "orb") while others pass
in a variable bit (which execute as 64-bit "bts").
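
For reference, the two code paths look roughly like this; a simplified
sketch loosely modeled on the arch/x86 bitops, the helper name and exact
asm constraints are mine:

static inline void sketch_set_bit(long nr, volatile unsigned long *addr)
{
	if (__builtin_constant_p(nr)) {
		/* constant bit: byte-wide "lock orb" on the one affected byte */
		asm volatile("lock orb %1, %0"
			     : "+m" (*((volatile char *)addr + (nr >> 3)))
			     : "iq" ((unsigned char)(1 << (nr & 7)))
			     : "memory");
	} else {
		/* variable bit: word-wide "lock bts" */
		asm volatile("lock btsq %1, %0"
			     : "+m" (*addr)
			     : "r" (nr)
			     : "memory");
	}
}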

I'm not a h/w architect ... but I've assumed that a LOCK operation
on something contained entirely within a cache line gets its atomicity
by keeping exclusive ownership of the cache line. Split lock happens
because you can't keep ownership for two cache lines, so it gets
escalated to a bus lock.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-11 18:12                             ` Andy Lutomirski
@ 2019-12-11 22:34                               ` Peter Zijlstra
  2019-12-12 19:40                                 ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-12-11 22:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luck, Tony, Yu, Fenghua, David Laight, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

On Wed, Dec 11, 2019 at 10:12:56AM -0800, Andy Lutomirski wrote:
> On Wed, Dec 11, 2019 at 9:52 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Nov 22, 2019 at 01:23:30PM -0800, Andy Lutomirski wrote:
> > > On Fri, Nov 22, 2019 at 12:31 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Fri, Nov 22, 2019 at 05:48:14PM +0000, Luck, Tony wrote:
> > > > > > When we use byte ops, we must consider the word as 4 independent
> > > > > > variables. And in that case the later load might observe the lock-byte
> > > > > > state from 3, because the modification to the lock byte from 4 is in
> > > > > > CPU2's store-buffer.
> > > > >
> > > > > So we absolutely violate this with the optimization for constant arguments
> > > > > to set_bit(), clear_bit() and change_bit() that are implemented as byte ops.
> > > > >
> > > > > So is code that does:
> > > > >
> > > > >       set_bit(0, bitmap);
> > > > >
> > > > > on one CPU. While another is doing:
> > > > >
> > > > >       set_bit(mybit, bitmap);
> > > > >
> > > > > on another CPU safe? The first operates on just one byte, the second  on 8 bytes.
> > > >
> > > > It is safe if all you care about is the consistency of that one bit.
> > > >
> > >
> > > I'm still lost here.  Can you explain how one could write code that
> > > observes an issue?  My trusty SDM, Vol 3 8.2.2 says "Locked
> > > instructions have a total order."
> >
> > This is the thing I don't fully believe. Per this thread the bus-lock is
> > *BAD* and not used for normal LOCK prefixed operations. But without the
> > bus-lock it becomes very hard to guarantee total order.
> >
> > After all, if some CPU doesn't observe a specific variable, it doesn't
> > care where in the order it fell. So I'm thinking they punted and went
> > with some partial order that is near enough that it becomes very hard to
> > tell the difference the moment you actually do observe stuff.
> 
> I hope that, if the SDM is indeed wrong, that Intel would fix the SDM.
> It's definitely not fun to try to understand locking if we don't trust
> the manual.

I can try and find a HW person; but getting the SDM updated is
difficult.

Anyway, the way I see it, it is a scalability thing. Absolute total
order is untenable; it would mean that if you have your 16-socket,
20-core system with hyperthreads, and each logical CPU is doing a
LOCK prefixed instruction on a separate page, all 640 of them need to
sit down and discuss who goes first.

Some sort of partial order that connects where variables/lines are
actually shared is needed. Then again, I'm not a HW person, just a poor
sod trying to understand how this can work.

> > Sure, but we're talking two cpus here.
> >
> >         u32 var = 0;
> >         u8 *ptr = &var;
> >
> >         CPU0                    CPU1
> >
> >                                 xchg(ptr, 1)
> >
> > 	xchg(ptr+1, 1);
> >         r = READ_ONCE(var);
> >
> > AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
> > CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
> > doesn't force a snoop or forward.
> 
> I think I don't quite understand.  The final value of var had better
> be 0x0101 or something is severely wrong.  

> But r can be 0x0100 because
> nothing in this example guarantees that the total order of the locked
> instructions has CPU 1's instruction first.

Assuming CPU1 goes first, why would the load from CPU0 see CPU1's
ptr[0]? It can be in CPU1 store buffer, and TSO allows regular reads to
ignore (remote) store-buffers.

> > From the perspective of the LOCK prefixed instructions CPU0 never
> > observes the variable @ptr. And therefore doesn't need to provide order.
> 
> I suspect that the implementation works on whole cache lines for
> everything except the actual store buffer entries, which would mean
> that CPU 0 does think it observed ptr[0].

Quite possible, but consider SMT where each thread has its own
store-buffer. Then the core owns the line, but the value is still not
visible.

I don't know if they want to tie down those semantics.

> > Note how the READ_ONCE() is a normal load on CPU0, and per the rules is
> > only forced to happen after it's own LOCK prefixed instruction, but it
> > is free to observe ptr[0,2,3] from before, only ptr[1] will be forwarded
> > from its own store-buffer.
> >
> > This is exactly the one reorder TSO allows.
> 
> If so, then our optimized smp_mb() has all kinds of problems, no?

Why? All smp_mb() guarantees is order between two memops and it does
that just fine.

> > Did this clarify, or confuse more?
> 
> Probably confuses more.

Let's put it this way: the first approach has many questions and subtle
points; the second approach must always work without question.

> If you're actually concerned that the SDM is wrong, I think that roping
> in some architects would be a good idea.

I'll see what I can do, getting them to commit to something is always
the hard part.

> I still think that making set_bit() do 32-bit or smaller accesses is okay.

Yes, that really should not be a problem. This whole subthread was more
of a cautionary tale that it is not immediately obviously safe. And like
I've said before, the bitops interface is across all archs, we must
consider the weakest behaviour.

Anyway, we considered these things when we did
clear_bit_unlock_is_negative_byte(), and there is a reason we ended up
with BIT(7), there is no way to slice up a byte.
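
For completeness, that one ended up looking roughly like this (a sketch
from memory; IIRC the in-tree version spells the flag output via
CC_SET()/CC_OUT()): clear the bit and report bit 7 of that same byte
with a single LOCK ANDB, which only works because everything lives
inside one byte:

static inline bool sketch_clear_bit_unlock_is_negative_byte(long nr,
					volatile unsigned long *addr)
{
	bool negative;

	/* nr must be < 8: bit and sign flag come from the same byte */
	asm volatile("lock andb %2, %1"
		     : "=@ccs" (negative), "+m" (*(volatile char *)addr)
		     : "iq" ((unsigned char)~(1U << nr))
		     : "memory");
	return negative;
}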

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-11 18:44                             ` Luck, Tony
@ 2019-12-11 22:39                               ` Peter Zijlstra
  2019-12-12 10:36                                 ` David Laight
  0 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-12-11 22:39 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andy Lutomirski, Yu, Fenghua, David Laight, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

On Wed, Dec 11, 2019 at 10:44:16AM -0800, Luck, Tony wrote:
> On Wed, Dec 11, 2019 at 06:52:02PM +0100, Peter Zijlstra wrote:
> > Sure, but we're talking two cpus here.
> > 
> > 	u32 var = 0;
> > 	u8 *ptr = &var;
> > 
> > 	CPU0			CPU1
> > 
> > 				xchg(ptr, 1)
> > 
> > 	xchg(ptr+1, 1);
> > 	r = READ_ONCE(var);
> 
> It looks like our current implementation of set_bit() would already run
> into this if some call sites for a particular bitmap pass in constant
> bit positions (which get optimized to byte wide "orb") while others pass
> in a variable bit (which execute as 64-bit "bts").

Yes, but luckily most nobody cares.

I only know of two places in the entire kernel where we considered this,
one is clear_bit_unlock_is_negative_byte() and there we punted and
stuffed everything in a single byte, and the other is that x86
queued_fetch_set_pending_acquire() thing I pointed out earlier.

> I'm not a h/w architect ... but I've assumed that a LOCK operation
> on something contained entirely within a cache line gets its atomicity
> by keeping exclusive ownership of the cache line.

Right, but like I just wrote to Andy, consider SMT where each thread has
its own store-buffer. Then the line is local to the core, but there
still is a remote sb to hide stores in.

I don't know if anything x86 does that, or even allows that, but I'm not
aware of specs that are clear enough to say either way.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 21:25                     ` Andy Lutomirski
@ 2019-12-12  8:57                       ` Peter Zijlstra
  2019-12-12 18:52                         ` Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-12-12  8:57 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Luck, Tony, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86

On Fri, Nov 22, 2019 at 01:25:45PM -0800, Andy Lutomirski wrote:
> On Fri, Nov 22, 2019 at 12:29 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > On Fri, Nov 22, 2019 at 09:23:45PM +0100, Peter Zijlstra wrote:
> > > On Fri, Nov 22, 2019 at 06:02:04PM +0000, Luck, Tony wrote:
> > > > > it requires we get the kernel and firmware clean, but only warns about
> > > > > dodgy userspace, which I really don't think there is much of.
> > > > >
> > > > > getting the kernel clean should be pretty simple.
> > > >
> > > > Fenghua has a half dozen additional patches (I think they were
> > > > all posted in previous iterations of the patch) that were found by
> > > > code inspection, rather than by actually hitting them.
> > >
> > > I thought we merged at least some of that, but maybe my recollection is
> > > faulty.
> >
> > At least 2 key fixes are in TIP tree:
> > https://lore.kernel.org/lkml/157384597983.12247.8995835529288193538.tip-bot2@tip-bot2/
> > https://lore.kernel.org/lkml/157384597947.12247.7200239597382357556.tip-bot2@tip-bot2/
> 
> I do not like these patches at all.  I would *much* rather see the
> bitops fixed and those patches reverted.
> 
> Is there any Linux architecture that doesn't have 32-bit atomic
> operations?

Of course! The right question is if there's any architecture that has
SMP and doesn't have 32bit atomic instructions, and then I'd have to
tell you that yes we have those too :/

Personally I'd love to mandate any SMP system has proper atomic ops, but
for now we sorta have to make PARISC and SPARC32 (and some ARC variant
IIRC) limp along.

PARISC and SPARC32 only have the equivalent of an xchgb or something.
Using that you can build a test-and-set spinlock, and then you have to
build atomic primitives using a hashtable of spinlocks.

Awesome, right?
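
For the curious, the hashed-spinlock scheme looks roughly like this (a
sketch loosely modeled on the sparc32 version, not copied from it):

#define ATOMIC_HASH_SIZE	16

static spinlock_t atomic_locks[ATOMIC_HASH_SIZE] = {
	[0 ... ATOMIC_HASH_SIZE - 1] = __SPIN_LOCK_UNLOCKED(atomic_locks)
};

/* hash the address so unrelated atomics don't all contend on one lock */
static spinlock_t *atomic_hash(const volatile void *v)
{
	return &atomic_locks[((unsigned long)v >> 4) % ATOMIC_HASH_SIZE];
}

static void sketch_atomic_add(int i, atomic_t *v)
{
	unsigned long flags;

	spin_lock_irqsave(atomic_hash(v), flags);
	v->counter += i;
	spin_unlock_irqrestore(atomic_hash(v), flags);
}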

> If all architectures can support them, then we should add
> set_bit_u32(), etc and/or make x86's set_bit() work for a
> 4-byte-aligned pointer.

I object to _u32() variants of the atomic bitops; the bitops interface
is a big enough trainwreck already, let's not make it worse. Making the
existing bitops use 32bit atomics on the inside should be fine though.

If anything we could switch the entire bitmap interface to unsigned int,
but I'm not sure that'd actually help much.

Anyway, many of the unaligned usages appear not to require atomicity
in the first place, see the other patches he sent [*]. And like pointed out
elsewhere, any code that casts random pointers to (unsigned long *) is
probably already broken due to endian issues. Just making the unaligned
check go away isn't fixing it.

[*] https://lkml.kernel.org/r/1574710984-208305-1-git-send-email-fenghua.yu@intel.com



^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-25 16:13                 ` Sean Christopherson
  2019-12-02 18:20                   ` Luck, Tony
@ 2019-12-12  8:59                   ` Peter Zijlstra
  2020-01-10 19:24                     ` [PATCH v11] x86/split_lock: Enable split lock detection by kernel Luck, Tony
  1 sibling, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2019-12-12  8:59 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Luck, Tony, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Mon, Nov 25, 2019 at 08:13:48AM -0800, Sean Christopherson wrote:
> On Fri, Nov 22, 2019 at 04:30:56PM -0800, Luck, Tony wrote:
> > On Fri, Nov 22, 2019 at 04:27:15PM +0100, Peter Zijlstra wrote:
> > 
> > This all looks dubious on an HT system .... three snips
> > from your patch:
> > 
> > > +static bool __sld_msr_set(bool on)
> > > +{
> > > +	u64 test_ctrl_val;
> > > +
> > > +	if (rdmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> > > +		return false;
> > > +
> > > +	if (on)
> > > +		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> > > +	else
> > > +		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> > > +
> > > +	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> > > +		return false;
> > > +
> > > +	return true;
> > > +}
> > 
> > > +void switch_sld(struct task_struct *prev)
> > > +{
> > > +	__sld_set_msr(true);
> > > +	clear_tsk_thread_flag(current, TIF_CLD);
> > > +}
> > 
> > > @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
> > >  		/* Enforce MSR update to ensure consistent state */
> > >  		__speculation_ctrl_update(~tifn, tifn);
> > >  	}
> > > +
> > > +	if (tifp & _TIF_SLD)
> > > +		switch_sld(prev_p);
> > >  }
> > 
> > Don't you have some horrible races between the two logical
> > processors on the same core as they both try to set/clear the
> > MSR that is shared at the core level?
> 
> Yes and no.  Yes, there will be races, but they won't be fatal in any way.
> 
>   - Only the split-lock bit is supported by the kernel, so there isn't a
>     risk of corrupting other bits as both threads will rewrite the current
>     hardware value.
> 
>   - Toggling of split-lock is only done in "warn" mode.  Worst case
>     scenario of a race is that a misbehaving task will generate multiple
>     #AC exceptions on the same instruction.  And this race will only occur
>     if both siblings are running tasks that generate split-lock #ACs, e.g.
>     a race where sibling threads are writing different values will only
>     occur if CPUx is disabling split-lock after an #AC and CPUy is
>     re-enabling split-lock after *its* previous task generated an #AC.
> 
>   - Transitioning between modes at runtime isn't supported and disabling
>     is tracked per task, so hardware will always reach a steady state that
>     matches the configured mode.  I.e. split-lock is guaranteed to be
>     enabled in hardware once all _TIF_SLD threads have been scheduled out.

Just so, thanks for clarifying.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-11 22:39                               ` Peter Zijlstra
@ 2019-12-12 10:36                                 ` David Laight
  2019-12-12 13:04                                   ` Peter Zijlstra
  0 siblings, 1 reply; 145+ messages in thread
From: David Laight @ 2019-12-12 10:36 UTC (permalink / raw)
  To: 'Peter Zijlstra', Luck, Tony
  Cc: Andy Lutomirski, Yu, Fenghua, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86, Will Deacon

From: Peter Zijlstra
> Sent: 11 December 2019 22:39
> On Wed, Dec 11, 2019 at 10:44:16AM -0800, Luck, Tony wrote:
> > On Wed, Dec 11, 2019 at 06:52:02PM +0100, Peter Zijlstra wrote:
> > > Sure, but we're talking two cpus here.
> > >
> > > 	u32 var = 0;
> > > 	u8 *ptr = &var;
> > >
> > > 	CPU0			CPU1
> > >
> > > 				xchg(ptr, 1)
> > >
> > > 	xchg(ptr+1, 1);
> > > 	r = READ_ONCE(var);
> >
> > It looks like our current implementation of set_bit() would already run
> > into this if some call sites for a particular bitmap pass in constant
> > bit positions (which get optimized to byte wide "orb") while others pass
> > in a variable bit (which execute as 64-bit "bts").
> 
> Yes, but luckily almost nobody cares.
> 
> I only know of two places in the entire kernel where we considered this,
> one is clear_bit_unlock_is_negative_byte() and there we punted and
> stuffed everything in a single byte, and the other is that x86
> queued_fetch_set_pending_acquire() thing I pointed out earlier.
> 
> > I'm not a h/w architect ... but I've assumed that a LOCK operation
> > on something contained entirely within a cache line gets its atomicity
> > by keeping exclusive ownership of the cache line.
> 
> Right, but like I just wrote to Andy, consider SMT where each thread has
> its own store-buffer. Then the line is local to the core, but there
> still is a remote sb to hide stores in.
> 
> I don't know if anything x86 does that, or even allows that, but I'm not
> aware of specs that are clear enough to say either way.

On x86 'xchg' is always 'locked' regardless of whether there is a 'lock' prefix.
set_bit() (etc) include the 'lock' prefix (dunno why this decision was made...).

For locked operations (including misaligned ones) that don't cross cache-line
boundaries the read operation almost certainly locks the cache line (against
a snoop) until the write has updated the cache line.
This won't happen until the write 'drains' from the store buffer.
(I suspect that locked read requests act like write requests in ensuring
that no other cpu has a dirty copy of the cache line, and also marking it dirty.)
Although this will delay the response to the snoop it will only
stall the cpu (or other bus master), not the entire memory 'bus'.

If you read the description of 'lock btr' you'll see that it always does the
write cycle (to complete the atomic RMW expected by the memory
subsystem) even when the bit is clear.

Remote store buffers are irrelevant to locked accesses.
(If you are doing concurrent locked and unlocked accesses to the same
memory location something is badly broken.)

It really can't matter whether one access is a mis-aligned 64bit word
and the other a byte. Both do atomic RMW updates so the result
cannot be unexpected.

In principle two separate 8 bit RMW cycles could be done concurrently
to two halves of a 16 bit 'flag' word without losing any bits or any reads
returning anything other than one of the 4 expected values.
Not that any memory system would support such updates.

	David


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-12 10:36                                 ` David Laight
@ 2019-12-12 13:04                                   ` Peter Zijlstra
  2019-12-12 16:02                                     ` Andy Lutomirski
  2019-12-12 16:29                                     ` David Laight
  0 siblings, 2 replies; 145+ messages in thread
From: Peter Zijlstra @ 2019-12-12 13:04 UTC (permalink / raw)
  To: David Laight
  Cc: Luck, Tony, Andy Lutomirski, Yu, Fenghua, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

On Thu, Dec 12, 2019 at 10:36:27AM +0000, David Laight wrote:

> On x86 'xchg' is always 'locked' regardless of whether there is a 'lock' prefix.

Sure, irrelevant here though.

> set_bit() (etc) include the 'lock' prefix (dunno why this decision was made...).

Because it is the atomic set bit function, we have __set_bit() if you
want the non-atomic one.

Atomic bitops are (obviously) useful if you have concurrent changes to
your bitmap.

Lots of people seem confused on this though, as evidenced by a lot of
the broken crap we keep finding (then again, them using __set_bit()
would still be broken due to the endian thing).

> For locked operations (including misaligned ones) that don't cross cache-line
> boundaries the read operation almost certainly locks the cache line (against
> a snoop) until the write has updated the cache line.

Note your use of 'almost'. Almost isn't good enough. Note that other
architectures allow the store from atomic operations to hit the store
buffer. And I strongly suspect x86 does the same.

Waiting for a store-buffer drain is *expensive*.

Try timing:

	LOCK INC (ptr);

vs

	LOCK INC (ptr);
	MFENCE

My guess is the second one is *far* more expensive. MFENCE drains (and waits
for completion thereof) the store-buffer -- it must since it fences
against non-coherent stuff.

I suppose ARM's DMB vs DSB is of similar distinction.
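
A trivial (and deliberately unscientific) user-space loop along these
lines shows the difference; numbers will vary wildly by part:

#include <stdio.h>
#include <stdint.h>
#include <x86intrin.h>

static volatile long counter;

int main(void)
{
	const long n = 10 * 1000 * 1000;
	uint64_t t0, t1, t2;
	long i;

	t0 = __rdtsc();
	for (i = 0; i < n; i++)
		asm volatile("lock incq %0" : "+m" (counter) : : "memory");
	t1 = __rdtsc();
	for (i = 0; i < n; i++)
		asm volatile("lock incq %0; mfence" : "+m" (counter) : : "memory");
	t2 = __rdtsc();

	printf("lock inc         : %.1f cycles/op\n", (double)(t1 - t0) / n);
	printf("lock inc + mfence: %.1f cycles/op\n", (double)(t2 - t1) / n);
	return 0;
}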

> This won't happen until the write 'drains' from the store buffer.
> (I suspect that locked read requests act like write requests in ensuring
> that no other cpu has a dirty copy of the cache line, and also marking it dirty.)
> Although this will delay the response to the snoop it will only
> stall the cpu (or other bus master), not the entire memory 'bus'.

I really don't think so. The commit I pointed to earlier in the thread,
that replaced MFENCE with LOCK ADD $0, -4(%RSP) for smp_mb(), strongly
indicates LOCK prefixed instructions do _NOT_ flush the store buffer.
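
(For reference, the definition that commit switched to is roughly the
following; quoting from memory, the in-tree spelling differs slightly:

#define __smp_mb()	asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")

i.e. smp_mb() is itself just a LOCK prefixed RMW on a stack location
nobody else looks at.)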

All barriers impose is order, if your store-buffer can preserve order,
all should just work. One possible way would be to tag each entry, and
increment the tag on barrier. Then ensure that all smaller tags are
flushed before allowing a higher tagged entry to leave.

> If you read the description of 'lock btr' you'll see that it always does the
> write cycle (to complete the atomic RMW expected by the memory
> subsystem) even when the bit is clear.

I know it does, but I don't see how that is relevant here.

> Remote store buffers are irrelevant to locked accesses.

They are not in general and I've seen nothing to indicate this is the
case on x86.

> (If you are doing concurrent locked and unlocked accesses to the same
> memory location something is badly broken.)

It is actually quite common.

> It really can't matter whether one access is a mis-aligned 64bit word
> and the other a byte. Both do atomic RMW updates so the result
> cannot be unexpected.

Expectations are often violated. Esp when talking about memory ordering.

> In principle two separate 8 bit RMW cycles could be done concurrently
> to two halves of a 16 bit 'flag' word without losing any bits or any reads
> returning any of the expected 4 values.
> Not that any memory system would support such updates.

I'm thinking you ought to go read that paper on mixed size concurrency I
referenced earlier in this thread. IIRC the conclusion was that PowerPC
does exactly that and ARM64 allows for it but it hasn't been observed,
yet.

Anyway, I'm not saying x86 behaves this way, I'm saying that I have lots
of questions and very little answers. I'm also saying that the variant
with non-overlapping atomics could conceivably misbehave, while the
variant with overlapping atomics is guaranteed not to.

Specifically smp_mb()/SYNC on PowerPC can not restore Sequential
Consistency under mixed size operations. How's that for expectations?

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-12 13:04                                   ` Peter Zijlstra
@ 2019-12-12 16:02                                     ` Andy Lutomirski
  2019-12-12 16:23                                       ` David Laight
  2019-12-12 16:29                                     ` David Laight
  1 sibling, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-12-12 16:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Laight, Luck, Tony, Andy Lutomirski, Yu, Fenghua,
	Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86,
	Will Deacon


> On Dec 12, 2019, at 5:04 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> Waiting for a store-buffer drain is *expensive*.
> 
> Try timing:
> 
>    LOCK INC (ptr);
> 
> vs
> 
>    LOCK INC (ptr);
>    MFENCE
> 
> My guess is the second one *far* more expensive. MFENCE drains (and waits
> for completion thereof) the store-buffer -- it must since it fences
> against non-coherent stuff.

MFENCE also implies LFENCE, and LFENCE is fairly slow despite having no architectural semantics other than blocking speculative execution. AFAICT, in the absence of side channels timing oddities, there is no code whatsoever that would be correct with LFENCE but incorrect without it. “Serialization” is, to some extent, a weaker example of this — MOV to CR2 is *much* slower than MFENCE or LOCK despite the fact that, as far as the memory model is concerned, it doesn’t do a whole lot more.

So the fact that draining some buffer or stalling some superscalar thingy is expensive doesn’t necessarily mean that the lack of said draining is observable in the memory model.

(LFENCE before RDTSC counts as “timing” here.)

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-12 16:02                                     ` Andy Lutomirski
@ 2019-12-12 16:23                                       ` David Laight
  0 siblings, 0 replies; 145+ messages in thread
From: David Laight @ 2019-12-12 16:23 UTC (permalink / raw)
  To: 'Andy Lutomirski', Peter Zijlstra
  Cc: Luck, Tony, Andy Lutomirski, Yu, Fenghua, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

From: Andy Lutomirski
> Sent: 12 December 2019 16:02
...
> MFENCE also implies LFENCE, and LFENCE is fairly slow despite having no architectural semantics other than blocking speculative
> execution. AFAICT, in the absence of side channels timing oddities, there is no code whatsoever that would be correct with LFENCE
> but incorrect without it. “Serialization” is, to some extent, a weaker example of this — MOV to CR2 is *much* slower than MFENCE or
> LOCK despite the fact that, as far as the memory model is concerned, it doesn’t do a whole lot more.

IIRC LFENCE does affect things when you are mixing non-temporal and/or write-combining
memory accesses.

I also thought there was a case where you needed to stop the speculative reads.
But can't remember why.

	David


^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-12 13:04                                   ` Peter Zijlstra
  2019-12-12 16:02                                     ` Andy Lutomirski
@ 2019-12-12 16:29                                     ` David Laight
  1 sibling, 0 replies; 145+ messages in thread
From: David Laight @ 2019-12-12 16:29 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: Luck, Tony, Andy Lutomirski, Yu, Fenghua, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

From: Peter Zijlstra
> Sent: 12 December 2019 13:04
> On Thu, Dec 12, 2019 at 10:36:27AM +0000, David Laight wrote:
...
> > set_bit() (etc) include the 'lock' prefix (dunno why this decision was made...).
> 
> Because it is the atomic set bit function, we have __set_bit() if you
> want the non-atomic one.

Horrid name, looks like part of the implementation...
I know _ prefixes get used for functions that don't acquire the obvious lock,
but they usually require the caller to hold the lock.

set_bit_nonatomic() and set_bit_atomic() would be better names.

> Atomic bitops are (obviously) useful if you have concurrent changes to
> your bitmap.
> 
> Lots of people seem confused on this though, as evidenced by a lot of
> the broken crap we keep finding (then again, them using __set_bit()
> would still be broken due to the endian thing).

Yep, quite a bit of code just wants  x |= 1 << n;

> > For locked operations (including misaligned ones) that don't cross cache-line
> > boundaries the read operation almost certainly locks the cache line (against
> > a snoop) until the write has updated the cache line.
> 
> Note your use of 'almost'. Almost isn't good enough. Note that other
> architectures allow the store from atomic operations to hit the store
> buffer. And I strongly suspect x86 does the same.
> 
> Waiting for a store-buffer drain is *expensive*.

Right, the cpu doesn't need to wait for the store buffer to drain,
but the cache line needs to remain locked until it has drained.

...
> > This won't happen until the write 'drains' from the store buffer.
> > (I suspect that locked read requests act like write requests in ensuring
> > that no other cpu has a dirty copy of the cache line, and also marking it dirty.)
> > Although this will delay the response to the snoop it will only
> > stall the cpu (or other bus master), not the entire memory 'bus'.
> 
> I really don't think so. The commit I pointed to earlier in the thread,
> that replaced MFENCE with LOCK ADD $0, -4(%RSP) for smp_mb(), strongly
> indicates LOCK prefixed instructions do _NOT_ flush the store buffer.

They don't need to.
It is only a remote cpu trying to gain exclusive access to the cache line
that needs to be stalled by the LOCK prefix write.
Once that write has escaped the store buffer the cache line can be released.

Of course the store buffer may be able to contain the write data for multiple
atomic operations to different parts of the same cache line.

...
> > (If you are doing concurrent locked and unlocked accesses to the same
> > memory location something is badly broken.)
> 
> It is actually quite common.

Sorry I meant unlocked writes.

> > It really can't matter whether one access is a mis-aligned 64bit word
> > and the other a byte. Both do atomic RMW updates so the result
> > cannot be unexpected.
> 
> Expectations are often violated. Esp when talking about memory ordering.

Especially on DEC Alpha :-)

> > In principle two separate 8 bit RMW cycles could be done concurrently
> > to two halves of a 16 bit 'flag' word without losing any bits or any reads
> > returning anything other than one of the 4 expected values.
> > Not that any memory system would support such updates.
> 
> I'm thinking you ought to go read that paper on mixed size concurrency I
> referenced earlier in this thread. IIRC the conclusion was that PowerPC
> does exactly that and ARM64 allows for it but it hasn't been observed,
> yet.

CPUs with shared L1 cache might manage to behave 'oddly'.
But they still need to do locked RMW cycles.

> Anyway, I'm not saying x86 behaves this way, I'm saying that I have lots
> of questions and very little answers. I'm also saying that the variant
> with non-overlapping atomics could conceivably misbehave, while the
> variant with overlapping atomics is guaranteed not to.
> 
> Specifically smp_mb()/SYNC on PowerPC can not restore Sequential
> Consistency under mixed size operations. How's that for expectations?

Is that the Spanish Inquisition?

	David


^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-12  8:57                       ` Peter Zijlstra
@ 2019-12-12 18:52                         ` Luck, Tony
  2019-12-12 19:46                           ` Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2019-12-12 18:52 UTC (permalink / raw)
  To: Peter Zijlstra, Andy Lutomirski
  Cc: Yu, Fenghua, Ingo Molnar, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

> If anything we could switch the entire bitmap interface to unsigned int,
> but I'm not sure that'd actually help much.

As we've been looking for potential split lock issues in kernel code, most of
the ones we found relate to callers who have <=32 bits and thus stick:

	u32 flags;

in their structure.  So it would solve those places, and fix any future code
where someone does the same thing.
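
The pattern is basically this (made-up struct, but representative of
what we keep finding):

struct widget {				/* hypothetical example */
	u32	state;
	u32	flags;			/* 4-byte aligned, not 8 */
};

static void widget_mark_busy(struct widget *w)
{
	/*
	 * On x86-64 this is a 64-bit LOCK BTS on a 4-byte aligned address;
	 * if w->flags sits in the last 4 bytes of a cache line it is a
	 * split lock.  (It is also wrong on 64-bit big-endian.)
	 */
	set_bit(0, (unsigned long *)&w->flags);
}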

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-11 22:34                               ` Peter Zijlstra
@ 2019-12-12 19:40                                 ` Andy Lutomirski
  2019-12-16  9:59                                   ` David Laight
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-12-12 19:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Luck, Tony, Yu, Fenghua, David Laight,
	Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86,
	Will Deacon

On Wed, Dec 11, 2019 at 2:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Wed, Dec 11, 2019 at 10:12:56AM -0800, Andy Lutomirski wrote:

> > > Sure, but we're talking two cpus here.
> > >
> > >         u32 var = 0;
> > >         u8 *ptr = &var;
> > >
> > >         CPU0                    CPU1
> > >
> > >                                 xchg(ptr, 1)
> > >
> > > 	xchg(ptr+1, 1);
> > >         r = READ_ONCE(var);
> > >
> > > AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
> > > CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
> > > doesn't force a snoop or forward.
> >
> > I think I don't quite understand.  The final value of var had better
> > be 0x0101 or something is severely wrong.
>
> > But r can be 0x0100 because
> > nothing in this example guarantees that the total order of the locked
> > instructions has CPU 1's instruction first.
>
> Assuming CPU1 goes first, why would the load from CPU0 see CPU1's
> ptr[0]? It can be in CPU1 store buffer, and TSO allows regular reads to
> ignore (remote) store-buffers.

What I'm saying is: if CPU0 goes first, then the three operations order as:



xchg(ptr+1, 1);
r = READ_ONCE(var);  /* 0x0100 */
xchg(ptr, 1);

Anyway, this is all a bit too hypothetical for me.  Is there a clear
example where the total ordering of LOCKed instructions is observable?
 That is, is there a sequence of operations on, presumably, two or
three CPUs, such that LOCKed instructions being only partially ordered
allows an outcome that is disallowed by a total ordering?  I suspect
there is, but I haven't come up with it yet.  (I mean in an x86-like
memory model.  Getting this in a relaxed atomic model is easy.)

As a probably bad example:

u32 x0, x1, a1, b0, b1;

CPU 0:
xchg(&x0, 1);
barrier();
a1 = READ_ONCE(x1);

CPU 1:
xchg(&x1, 1);

CPU 2:
b1 = READ_ONCE(x1);
smp_rmb();  /* which is just barrier() on x86 */
b0 = READ_ONCE(x0);

Suppose a1 == 0 and b1 == 1.  Then we know that CPU0's READ_ONCE
happened before CPU1's xchg and hence CPU0's xchg happened before
CPU1's xchg.  We also know that CPU2's first read observed the write
from CPU1's xchg, which means that CPU2's second read should have been
after CPU0's xchg (because the xchg operations have a total order
according to the SDM).  This means that b0 can't be 0.

Hence the outcome (a1, b1, b0) == (0, 1, 0) is disallowed.
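
(For completeness, a user-space rendition of the above with C11 atomics
and pthreads. Only a sketch: a single run will essentially never show
anything, a real litmus test would pin the threads and loop many
millions of times.)

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_uint x0, x1;
static unsigned int a1, b0, b1;

static void *cpu0(void *arg)
{
	atomic_exchange(&x0, 1);				/* xchg(&x0, 1) */
	a1 = atomic_load_explicit(&x1, memory_order_relaxed);	/* READ_ONCE */
	return NULL;
}

static void *cpu1(void *arg)
{
	atomic_exchange(&x1, 1);				/* xchg(&x1, 1) */
	return NULL;
}

static void *cpu2(void *arg)
{
	b1 = atomic_load_explicit(&x1, memory_order_relaxed);
	atomic_signal_fence(memory_order_seq_cst);	/* smp_rmb(): compiler barrier on x86 */
	b0 = atomic_load_explicit(&x0, memory_order_relaxed);
	return NULL;
}

int main(void)
{
	pthread_t t[3];
	int i;

	pthread_create(&t[0], NULL, cpu0, NULL);
	pthread_create(&t[1], NULL, cpu1, NULL);
	pthread_create(&t[2], NULL, cpu2, NULL);
	for (i = 0; i < 3; i++)
		pthread_join(t[i], NULL);

	/* forbidden if the two xchg()s are totally ordered */
	if (a1 == 0 && b1 == 1 && b0 == 0)
		printf("observed (a1, b1, b0) = (0, 1, 0)\n");
	return 0;
}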

It's entirely possible that I screwed up the analysis.  But I think
this means that the cache coherency mechanism is doing something more
intelligent than just shoving the x0=1 write into the store buffer and
letting it hang out there.  Something needs to make sure that CPU 2
observes everything in the same order that CPU 0 observes, and, as far
as I know it, there is a considerable amount of complexity in the CPUs
that makes sure this happens.

So here's my question: do you have a concrete example of a series of
operations and an outcome that you suspect Intel CPUs allow but that
is disallowed in the SDM?

--Andy

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-12 18:52                         ` Luck, Tony
@ 2019-12-12 19:46                           ` Luck, Tony
  2019-12-12 20:01                             ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2019-12-12 19:46 UTC (permalink / raw)
  To: 'Peter Zijlstra', 'Andy Lutomirski'
  Cc: Yu, Fenghua, 'Ingo Molnar', 'Thomas Gleixner',
	'Ingo Molnar', 'Borislav Petkov',
	'H Peter Anvin',
	Raj, Ashok, Shankar, Ravi V, 'linux-kernel',
	'x86'

>> If anything we could switch the entire bitmap interface to unsigned int,
>> but I'm not sure that'd actually help much.
>
> As we've been looking for potential split lock issues in kernel code, most of
> the ones we found relate to callers who have <=32 bits and thus stick:
>
>	u32 flags;
>
> in their structure.  So it would solve those places, and fix any future code
> where someone does the same thing.

If different architectures can do better with 8-bit/16-bit/32-bit/64-bit instructions
to manipulate bitmaps, then perhaps this is justification to make all the
functions operate on "bitmap_t" and have each architecture provide the
typedef for their favorite width.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-12 19:46                           ` Luck, Tony
@ 2019-12-12 20:01                             ` Andy Lutomirski
  2019-12-16 16:21                               ` David Laight
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-12-12 20:01 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Peter Zijlstra, Andy Lutomirski, Yu, Fenghua, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Thu, Dec 12, 2019 at 11:46 AM Luck, Tony <tony.luck@intel.com> wrote:
>
> >> If anything we could switch the entire bitmap interface to unsigned int,
> >> but I'm not sure that'd actually help much.
> >
> > As we've been looking for potential split lock issues in kernel code, most of
> > the ones we found relate to callers who have <=32 bits and thus stick:
> >
> >       u32 flags;
> >
> > in their structure.  So it would solve those places, and fix any future code
> > where someone does the same thing.
>
> If different architectures can do better with 8-bit/16-bit/32-bit/64-bit instructions
> to manipulate bitmaps, then perhaps this is justification to make all the
> functions operate on "bitmap_t" and have each architecture provide the
> typedef for their favorite width.
>

Hmm.  IMO there are really two different types of uses of the API.

1. There's a field somewhere and I want to atomically set a bit.  Something like:

struct whatever {
  ...
  whatever_t field;
 ...
};

struct whatever *w;
set_bit(3, &w->field);

If whatever_t is architecture-dependent, then it's really awkward to
use more than 32 bits, since some architectures won't have more than
32 bits.


2. DECLARE_BITMAP(), etc.  That is, someone wants a biggish bitmap
with a certain number of bits.

Here the type doesn't really matter.

On an architecture with genuinely atomic bit operations (i.e. no
hashed spinlocks involved), the width really shouldn't matter.
set_bit() should promise to be atomic on that bit, to be a full
barrier, and to not modify adjacent bits.  I don't see why the width
would matter for most use cases.  If we're concerned, the
implementation could actually use the largest atomic operation and
just suitably align it.  IOW, on x86, LOCK BTSQ *where we manually
align the pointer to 8 bytes and adjust the bit number accordingly*
should cover every possible case even if PeterZ's concerns are
correct.
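
Concretely, the align-and-adjust idea could look something like this (my
own illustration, not a proposal):

static inline void set_bit_aligned64(long nr, void *addr)
{
	unsigned long base = (unsigned long)addr;
	unsigned long *word = (unsigned long *)(base & ~7UL);

	/* little-endian: each byte of misalignment is 8 more bits */
	nr += (base & 7UL) * 8;

	asm volatile("lock btsq %1, %0" : "+m" (*word) : "r" (nr) : "memory");
}

For a 4-byte aligned u32 field the adjusted bit number stays below 64,
so the access is one naturally aligned quadword.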

For the "I have a field in a struct and I just want an atomic RMW that
changes one bit", an API that matches the rest of the atomic API seems
nice: just act on atomic_t and atomic64_t.
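
E.g. something like this, using the existing atomic API (illustrative
only):

struct whatever {
	atomic_t	flags;			/* 32 bits everywhere */
};

static void whatever_set_flag(struct whatever *w, int nr)
{
	atomic_or(BIT(nr), &w->flags);		/* aligned 32-bit LOCK OR */
}

static bool whatever_test_and_set_flag(struct whatever *w, int nr)
{
	return atomic_fetch_or(BIT(nr), &w->flags) & BIT(nr);
}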

The current "unsigned long" thing basically can't be used on a 64-bit
big-endian architecture with a 32-bit field without gross hackery.
And sometimes we actually want a 32-bit field.

Or am I missing some annoying subtlety here?

^ permalink raw reply	[flat|nested] 145+ messages in thread

* [PATCH v11] x86/split_lock: Enable split lock detection by kernel parameter
  2019-11-22 15:27             ` Peter Zijlstra
                                 ` (3 preceding siblings ...)
  2019-11-23  0:30               ` Luck, Tony
@ 2019-12-13  0:09               ` Tony Luck
  2019-12-13  0:16                 ` Luck, Tony
  4 siblings, 1 reply; 145+ messages in thread
From: Tony Luck @ 2019-12-13  0:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Fenghua Yu, Tony Luck, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Ashok Raj, Ravi V Shankar,
	Sean Christopherson, Andy Lutomirski, linux-kernel, x86

From: Peter Zijlstra <peterz@infradead.org>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will oops.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between modes at runtime isn't supported and disabling
    is tracked per task, so hardware will always reach a steady state that
    matches the configured mode.  I.e. split-lock is guaranteed to be
    enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>

---

[Note that I gave PeterZ Author credit because the majority
of the code here came from his untested patch. I just fixed
the typos. He didn't give a "Signed-off-by" ... so he can
either add one to this, or disavow all knowledge - his choice]
---
 .../admin-guide/kernel-parameters.txt         |  18 ++
 Makefile                                      |   4 +-
 arch/x86/include/asm/cpu.h                    |  17 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/msr-index.h              |   8 +
 arch/x86/include/asm/thread_info.h            |   6 +-
 arch/x86/include/asm/traps.h                  |   1 +
 arch/x86/kernel/cpu/common.c                  |   2 +
 arch/x86/kernel/cpu/intel.c                   | 170 ++++++++++++++++++
 arch/x86/kernel/process.c                     |   3 +
 arch/x86/kernel/traps.c                       |  29 ++-
 11 files changed, 254 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..173c1acff5f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3181,6 +3181,24 @@
 
 	nosoftlockup	[KNL] Disable the soft-lockup detector.
 
+	split_lock_detect=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will pr_alert about applications
+				  triggering the #AC exception
+
+			fatal	- the kernel will SIGBUS applications that
+				  trigger the #AC exception.
+
+			For any mode other than 'off' the kernel will die if
+			it (or firmware) triggers #AC.
+
 	nosync		[HW,M68K] Disables sync negotiation for all devices.
 
 	nowatchdog	[KNL] Disable both lockup detectors, i.e.
diff --git a/Makefile b/Makefile
index 999a197d67d2..73e3c2802927 100644
--- a/Makefile
+++ b/Makefile
@@ -1,8 +1,8 @@
 # SPDX-License-Identifier: GPL-2.0
 VERSION = 5
-PATCHLEVEL = 4
+PATCHLEVEL = 5
 SUBLEVEL = 0
-EXTRAVERSION =
+EXTRAVERSION = -rc1
 NAME = Kleptomaniac Octopus
 
 # *DOCUMENTATION*
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..5223504c7e7c 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool handle_split_lock(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool handle_split_lock(void)
+{
+	return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..c3edd2bba184 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
 #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
 #define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	( 7*32+31) /* #AC for split lock */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
@@ -365,6 +366,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 084e98da04a7..8bb2e08ce4a3 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+#define MSR_IA32_CORE_CAPABILITIES			  0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d779366ce3f8..d23638a0525e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* split_lock_detect */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {
 
 #ifdef CONFIG_X86_IOPL_IOPERM
 # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
-				 _TIF_IO_BITMAP)
+				 _TIF_IO_BITMAP | _TIF_SLD)
 #else
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
 #endif
 
 #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index ffa0dc8a535e..6ceab60370f0 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -175,4 +175,5 @@ enum x86_pf_error_code {
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
 };
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2e4d90294fe6..39245f61fad0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1234,6 +1234,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4a900804a023..79cec85c5132 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,14 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+static enum split_lock_detect_state sld_state = sld_warn;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -652,6 +662,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -767,6 +779,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -1028,3 +1042,159 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "fatal",	sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+	int len = strlen(opt);
+
+	return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+	enum split_lock_detect_state sld = sld_state;
+	char arg[20];
+	int i, ret;
+
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+				  arg, sizeof(arg));
+	if (ret < 0)
+		goto print;
+
+	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+		if (match_option(arg, ret, sld_options[i].option)) {
+			sld = sld_options[i].state;
+			break;
+		}
+	}
+
+	if (sld != sld_state)
+		sld_state = sld;
+
+print:
+	switch(sld) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+		return false;
+
+	return true;
+}
+
+static void split_lock_init(void)
+{
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	__sld_msr_set(sld_off);
+}
+
+bool handle_split_lock(void)
+{
+	return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+
+	pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+		 current->comm, current->pid, regs->ip);
+
+	__sld_msr_set(false);
+	set_tsk_thread_flag(current, TIF_SLD);
+	return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+	__sld_msr_set(true);
+	clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 61e93a318983..55d205820f35 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	if (tifp & _TIF_SLD)
+		switch_sld(prev_p);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 05da6b5b167b..a933a01f6e40 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
 #include <asm/fixmap.h>
@@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -288,9 +288,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	unsigned int trapnr = X86_TRAP_AC;
+	char str[] = "alignment check";
+	int signr = SIGBUS;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+		return;
+
+	if (!handle_split_lock())
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	cond_local_irq_enable(regs);
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v11] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-13  0:09               ` [PATCH v11] x86/split_lock: Enable split lock detection by kernel parameter Tony Luck
@ 2019-12-13  0:16                 ` Luck, Tony
  0 siblings, 0 replies; 145+ messages in thread
From: Luck, Tony @ 2019-12-13  0:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Ravi V Shankar, Sean Christopherson,
	Andy Lutomirski, linux-kernel, x86

On Thu, Dec 12, 2019 at 04:09:08PM -0800, Tony Luck wrote:
> diff --git a/Makefile b/Makefile
> index 999a197d67d2..73e3c2802927 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1,8 +1,8 @@
>  # SPDX-License-Identifier: GPL-2.0
>  VERSION = 5
> -PATCHLEVEL = 4
> +PATCHLEVEL = 5
>  SUBLEVEL = 0
> -EXTRAVERSION =
> +EXTRAVERSION = -rc1
>  NAME = Kleptomaniac Octopus
>  
>  # *DOCUMENTATION*

Aaargh - brown paper bag time ... obviously this doesn't
belong here. Must have slipped in when I moved base from
5.4 to 5.5-rc1

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-12 19:40                                 ` Andy Lutomirski
@ 2019-12-16  9:59                                   ` David Laight
  2019-12-16 17:22                                     ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: David Laight @ 2019-12-16  9:59 UTC (permalink / raw)
  To: 'Andy Lutomirski', Peter Zijlstra
  Cc: Luck, Tony, Yu, Fenghua, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86, Will Deacon

From: Andy Lutomirski
> Sent: 12 December 2019 19:41
> On Wed, Dec 11, 2019 at 2:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Dec 11, 2019 at 10:12:56AM -0800, Andy Lutomirski wrote:
> 
> > > > Sure, but we're talking two cpus here.
> > > >
> > > >         u32 var = 0;
> > > >         u8 *ptr = &var;
> > > >
> > > >         CPU0                    CPU1
> > > >
> > > >                                 xchg(ptr, 1)
> > > >
> > > >         xchg((ptr+1, 1);
> > > >         r = READ_ONCE(var);
> > > >
> > > > AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
> > > > CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
> > > > doesn't force a snoop or forward.
> > >
> > > I think I don't quite understand.  The final value of var had better
> > > be 0x0101 or something is severely wrong.
> >
> > > But r can be 0x0100 because
> > > nothing in this example guarantees that the total order of the locked
> > > instructions has CPU 1's instruction first.
> >
> > Assuming CPU1 goes first, why would the load from CPU0 see CPU1's
> > ptr[0]? It can be in CPU1 store buffer, and TSO allows regular reads to
> > ignore (remote) store-buffers.
> 
> What I'm saying is: if CPU0 goes first, then the three operations order as:
> 
> 
> 
> xchg(ptr+1, 1);
> r = READ_ONCE(var);  /* 0x0100 */
> xchg(ptr, 1);
> 
> Anyway, this is all a bit too hypothetical for me.  Is there a clear
> example where the total ordering of LOCKed instructions is observable?
>  That is, is there a sequence of operations on, presumably, two or
> three CPUs, such that LOCKed instructions being only partially ordered
> allows an outcome that is disallowed by a total ordering?  I suspect
> there is, but I haven't come up with it yet.  (I mean in an x86-like
> memory model.  Getting this in a relaxed atomic model is easy.)
> 
> As a probably bad example:
> 
> u32 x0, x1, a1, b0, b1;
> 
> CPU 0:
> xchg(&x0, 1);
> barrier();
> a1 = READ_ONCE(x1);
> 
> CPU 1:
> xchg(&x1, 1);
> 
> CPU 2:
> b1 = READ_ONCE(x1);
> smp_rmb();  /* which is just barrier() on x86 */
> b0 = READ_ONCE(x0);
> 
> Suppose a1 == 0 and b1 == 1.  Then we know that CPU0's READ_ONCE
> happened before CPU1's xchg and hence CPU0's xchg happened before
> CPU1's xchg.  We also know that CPU2's first read observed the write
> from CPU1's xchg, which means that CPU2's second read should have been
> after CPU0's xchg (because the xchg operations have a total order
> according to the SDM).  This means that b0 can't be 0.
> 
> Hence the outcome (a1, b1, b0) == (0, 1, 0) is disallowed.
> 
> It's entirely possible that I screwed up the analysis.  But I think
> this means that the cache coherency mechanism is doing something more
> intelligent than just shoving the x0=1 write into the store buffer and
> letting it hang out there.  Something needs to make sure that CPU 2
> observes everything in the same order that CPU 0 observes, and, as far
> as I know it, there is a considerable amount of complexity in the CPUs
> that makes sure this happens.
> 
> So here's my question: do you have a concrete example of a series of
> operations and an outcome that you suspect Intel CPUs allow but that
> is disallowed in the SDM?

I'm not sure that example is at all relevant.
READ_ONCE() doesn't have any sequencing requirements on the cpu, just on the compiler.
(The same is true of any 'atomic read'.)
Locks work because the RMW operation is atomic, all writes occur in sequence
and (I think) reads cannot 'overtake' the RMW sequence.
In particular you don't want speculative reads being done before the RMW
instruction completes.
I believe that, on x86, this is guaranteed without requiring an LFENCE provided
there are no non-temporal (ie cache bypassing) or write-combining memory cycles.
Other architectures require additional software.

To get any kind of hardware interlock from a read (eg of an atomic counter)
you need to read it with a RMW cycle (eg XADD $0,memory).
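
A rough user-space sketch of that "read via RMW" idea (illustrative only,
not from the original mail): an atomic fetch-add of zero typically compiles
to LOCK XADD on x86, so the read participates in the ordering of locked
operations in a way a plain MOV does not:

#include <stdint.h>

static inline uint64_t read_counter_locked(uint64_t *ctr)
{
	/* LOCK XADD with an addend of 0: the value is read as part of a RMW */
	return __atomic_fetch_add(ctr, 0, __ATOMIC_SEQ_CST);
}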

The point about bis(&x,1) and bis(&(x+1),1) is that, if you wait long enough,
you'll always see both bits set.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-12 20:01                             ` Andy Lutomirski
@ 2019-12-16 16:21                               ` David Laight
  0 siblings, 0 replies; 145+ messages in thread
From: David Laight @ 2019-12-16 16:21 UTC (permalink / raw)
  To: 'Andy Lutomirski', Luck, Tony
  Cc: Peter Zijlstra, Yu, Fenghua, Ingo Molnar, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86

From: Andy Lutomirski
> Sent: 12 December 2019 20:01
> On Thu, Dec 12, 2019 at 11:46 AM Luck, Tony <tony.luck@intel.com> wrote:
> >
> > >> If anything we could switch the entire bitmap interface to unsigned int,
> > >> but I'm not sure that'd actually help much.

That would break all the code that assumes it is 'unsigned long'.
At best it could be changed to a structure with an integral member.
That would make it a little harder for code to 'peek inside' the abstraction.

> > > As we've been looking for potential split lock issues in kernel code, most of
> > > the ones we found relate to callers who have <=32 bits and thus stick:
> > >
> > >       u32 flags;
> > >
> > > in their structure.  So it would solve those places, and fix any future code
> > > where someone does the same thing.

And break all the places that use 'unsigned long' - especially on BE.

> > If different architectures can do better with 8-bit/16-bit/32-bit/64-bit instructions
> > to manipulate bitmaps, then perhaps this is justification to make all the
> > functions operate on "bitmap_t" and have each architecture provide the
> > typedef for their favorite width.

typedef struct { u8/u32/u64 bitmap_val } bitmap_t;

> Hmm.  IMO there are really two different types of uses of the API.
> 
> 1 There's a field somewhere and I want to atomically set a bit.  Something like:
> 
> struct whatever {
>   ...
>   whatever_t field;
>  ...
> };
> 
> struct whatever *w;
> set_bit(3, &w->field);
> 
> If whatever_t is architecture-dependent, then it's really awkward to
> use more than 32 bits, since some architectures won't have more than
> 32-bits.

You could implement that using multiple functions and 'sizeof'.

At the moment that code is broken on BE systems unless whatever_t is
the same size as 'unsigned long'.
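
A concrete sketch of the breakage (hypothetical struct, not from the
thread): on a 64-bit big-endian machine set_bit() operates on a whole
unsigned long, so a bit "in" a u32 field can land in its neighbour:

struct whatever {
	u32 field;	/* offset 0 */
	u32 other;	/* offset 4 */
};

/*
 * set_bit(3, (unsigned long *)&w->field) reads and writes 8 bytes
 * starting at &w->field.  On little-endian, bit 3 of that long lives
 * in byte 0, inside 'field'.  On 64-bit big-endian, bits 0-31 live in
 * bytes 4-7, i.e. inside 'other' - the wrong field gets modified.
 */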

> 2. DECLARE_BITMAP(), etc.  That is, someone wants a biggish bitmap
> with a certain number of bits.
> 
> Here the type doesn't really matter.

Except some code uses its own 'unsigned long[]' instead of DECLARE_BITMAP.

The low level x86 code actually passes 'unsigned int[]' knowing that
the cast happened to be ok.

> On an architecture with genuinely atomic bit operations (i.e. no
> hashed spinlocks involved), the width really shouldn't matter.
> set_bit() should promise to be atomic on that bit, to be a full
> barrier, and to not modify adjacent bits.  I don't see why the width
> would matter for most use cases.  If we're concerned, the
> implementation could actually use the largest atomic operation and
> just suitably align it.  IOW, on x86, LOCK BTSQ *where we manually
> align the pointer to 8 bytes and adjust the bit number accordingly*
> should cover every possible case even of PeterZ's concerns are
> correct.

A properly abstracted BITMAP library should be allowed to permute
the bit number using a run-time initialised map.
(eg xor with any value less than the number of bits in 'unsigned long'.)
Otherwise you'll always allow the user to 'peek inside'.

There is also:

1a) I've a field I need to set a bit in.
  There must be a function to do that (I like functions).
  Ah yes:
	set_bit(3, &s->m);
   Bugger doesn't compile, try:
	set_bit(3, (void *)&s->m);
   That must be how I should do it.

   ISTR at least one driver does that when writing ring buffer entries.

2b) I've a 'u32[]', if I cast it to 'unsigned long' I can use the 'bit' functions on it.
   The data never changes (after initialisation), but I've used the atomic operations
   anyway.

> For the "I have a field in a struct and I just want an atomic RMW that
> changes one bit*, an API that matches the rest of the atomic API seems
> nice: just act on atomic_t and atomic64_t.
> 
> The current "unsigned long" thing basically can't be used on a 64-bit
> big-endian architecture with a 32-bit field without gross hackery.

Well, you can xor the bit number with 63 on BE systems.
Then 8/16/32 sized field members work fine - provided you don't
care what values are actually used.
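
A minimal sketch of that remapping (hypothetical wrapper, and it assumes
the field sits at the start of a long-aligned word):

static inline void set_bit_in_small_field(int nr, void *addr)
{
#ifdef __BIG_ENDIAN
	/*
	 * nr 0-7 now land in byte 0, nr 8-15 in byte 1, and so on, so an
	 * 8/16/32-bit member is never written outside its own bytes.  The
	 * numeric value stored is no longer the "obvious" one, hence
	 * "provided you don't care what values are actually used".
	 */
	nr ^= BITS_PER_LONG - 1;
#endif
	set_bit(nr, addr);
}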

> And sometimes we actually want a 32-bit field.
> 
> Or am I missing some annoying subtlely here?

Some code has assumed DECLARE_BITMAP() uses 'unsigned long'.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-16  9:59                                   ` David Laight
@ 2019-12-16 17:22                                     ` Andy Lutomirski
  2019-12-16 17:45                                       ` David Laight
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-12-16 17:22 UTC (permalink / raw)
  To: David Laight
  Cc: Andy Lutomirski, Peter Zijlstra, Luck, Tony, Yu, Fenghua,
	Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86,
	Will Deacon

On Mon, Dec 16, 2019 at 1:59 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Andy Lutomirski
> > Sent: 12 December 2019 19:41
> > On Wed, Dec 11, 2019 at 2:34 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Wed, Dec 11, 2019 at 10:12:56AM -0800, Andy Lutomirski wrote:
> >
> > > > > Sure, but we're talking two cpus here.
> > > > >
> > > > >         u32 var = 0;
> > > > >         u8 *ptr = &var;
> > > > >
> > > > >         CPU0                    CPU1
> > > > >
> > > > >                                 xchg(ptr, 1)
> > > > >
> > > > >         xchg((ptr+1, 1);
> > > > >         r = READ_ONCE(var);
> > > > >
> > > > > AFAICT nothing guarantees r == 0x0101. The CPU1 store can be stuck in
> > > > > CPU1's store-buffer. CPU0's xchg() does not overlap and therefore
> > > > > doesn't force a snoop or forward.
> > > >
> > > > I think I don't quite understand.  The final value of var had better
> > > > be 0x0101 or something is severely wrong.
> > >
> > > > But r can be 0x0100 because
> > > > nothing in this example guarantees that the total order of the locked
> > > > instructions has CPU 1's instruction first.
> > >
> > > Assuming CPU1 goes first, why would the load from CPU0 see CPU1's
> > > ptr[0]? It can be in CPU1 store buffer, and TSO allows regular reads to
> > > ignore (remote) store-buffers.
> >
> > What I'm saying is: if CPU0 goes first, then the three operations order as:
> >
> >
> >
> > xchg(ptr+1, 1);
> > r = READ_ONCE(var);  /* 0x0100 */
> > xchg(ptr, 1);
> >
> > Anyway, this is all a bit too hypothetical for me.  Is there a clear
> > example where the total ordering of LOCKed instructions is observable?
> >  That is, is there a sequence of operations on, presumably, two or
> > three CPUs, such that LOCKed instructions being only partially ordered
> > allows an outcome that is disallowed by a total ordering?  I suspect
> > there is, but I haven't come up with it yet.  (I mean in an x86-like
> > memory model.  Getting this in a relaxed atomic model is easy.)
> >
> > As a probably bad example:
> >
> > u32 x0, x1, a1, b0, b1;
> >
> > CPU 0:
> > xchg(&x0, 1);
> > barrier();
> > a1 = READ_ONCE(x1);
> >
> > CPU 1:
> > xchg(&x1, 1);
> >
> > CPU 2:
> > b1 = READ_ONCE(x1);
> > smp_rmb();  /* which is just barrier() on x86 */
> > b0 = READ_ONCE(x0);
> >
> > Suppose a1 == 0 and b1 == 1.  Then we know that CPU0's READ_ONCE
> > happened before CPU1's xchg and hence CPU0's xchg happened before
> > CPU1's xchg.  We also know that CPU2's first read observed the write
> > from CPU1's xchg, which means that CPU2's second read should have been
> > after CPU0's xchg (because the xchg operations have a total order
> > according to the SDM).  This means that b0 can't be 0.
> >
> > Hence the outcome (a1, b1, b0) == (0, 1, 0) is disallowed.
> >
> > It's entirely possible that I screwed up the analysis.  But I think
> > this means that the cache coherency mechanism is doing something more
> > intelligent than just shoving the x0=1 write into the store buffer and
> > letting it hang out there.  Something needs to make sure that CPU 2
> > observes everything in the same order that CPU 0 observes, and, as far
> > as I know it, there is a considerable amount of complexity in the CPUs
> > that makes sure this happens.
> >
> > So here's my question: do you have a concrete example of a series of
> > operations and an outcome that you suspect Intel CPUs allow but that
> > is disallowed in the SDM?
>
> I'm not sure that example is at all relevant.
> READ_ONCE() doesn't have any sequencing requirements on the cpu, just on the compiler.
> (The same is true of any 'atomic read'.)

I'm talking specifically about x86 here, where, for example, "Reads
are not reordered with other reads".  So READ_ONCE *does* have
sequencing requirements on the CPUs.

Feel free to replace READ_ONCE with MOV in your head if you like :)

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-16 17:22                                     ` Andy Lutomirski
@ 2019-12-16 17:45                                       ` David Laight
  2019-12-16 18:06                                         ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: David Laight @ 2019-12-16 17:45 UTC (permalink / raw)
  To: 'Andy Lutomirski'
  Cc: Peter Zijlstra, Luck, Tony, Yu, Fenghua, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

From: Andy Lutomirski
> Sent: 16 December 2019 17:23
...
> I'm talking specifically about x86 here, where, for example, "Reads
> are not reordered with other reads".  So READ_ONCE *does* have
> sequencing requirements on the CPUs.
> 
> Feel free to replace READ_ONCE with MOV in your head if you like :)

I got a little confused because I thought your reference to READ_ONCE()
was relevant.

Sometimes remembering all this gets hard.
The docs about the effects of LFENCE and MFENCE don't really help
(they make my brain hurt).
I'm pretty sure I've decided in the past they are almost never needed.

Usually the ordering of reads doesn't help you.
IIRC, if locations 'a' and 'b' get changed from 0 to 1 it is perfectly possible
for one cpu to see a==0, b==1 and another a==1, b==0 even
though both read a then b.
(On non-alpha this may require different cpus update a and b.)
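
Spelled out as a litmus test (sketch of the scenario above, not from the
original mail; four CPUs, a and b initially 0):

	/* CPU0 */  WRITE_ONCE(a, 1);
	/* CPU1 */  WRITE_ONCE(b, 1);
	/* CPU2 */  r1 = READ_ONCE(a);  r2 = READ_ONCE(b);
	/* CPU3 */  r3 = READ_ONCE(b);  r4 = READ_ONCE(a);

The disputed outcome is r1==1, r2==0, r3==1, r4==0, i.e. the two readers
disagree about which write happened first even though each reads in
program order.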

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-16 17:45                                       ` David Laight
@ 2019-12-16 18:06                                         ` Andy Lutomirski
  2019-12-17 10:03                                           ` David Laight
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2019-12-16 18:06 UTC (permalink / raw)
  To: David Laight
  Cc: Andy Lutomirski, Peter Zijlstra, Luck, Tony, Yu, Fenghua,
	Ingo Molnar, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86,
	Will Deacon

On Mon, Dec 16, 2019 at 9:45 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Andy Lutomirski
> > Sent: 16 December 2019 17:23
> ...
> > I'm talking specifically about x86 here, where, for example, "Reads
> > are not reordered with other reads".  So READ_ONCE *does* have
> > sequencing requirements on the CPUs.
> >
> > Feel free to replace READ_ONCE with MOV in your head if you like :)
>
> I got a little confused because I thought your reference to READ_ONCE()
> was relevant.
>
> Sometimes remembering all this gets hard.
> The docs about the effects of LFENCE and MFENCE don't really help
> (they make my brain hurt).
> I'm pretty sure I've decided in the past they are almost never needed.
>

Me too.

This whole discussion is about the fact that PeterZ is sceptical that
actual x86 CPUs have as strong a memory model as the SDM suggests, and
I'm trying to understand the exact concern.  This may or may not be
directly relevant to the kernel. :)

> Usually the ordering of reads doesn't help you.
> IIRC If locations 'a' and 'b' get changed from 0 to 1 it is perfectly possible
> for one cpu to see a==0, b==1 and another a==1, b==0 even
> though both read a then b.
> (On non-alpha this may require different cpus update a and b.)
>

x86 mostly prevents this.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter
  2019-12-16 18:06                                         ` Andy Lutomirski
@ 2019-12-17 10:03                                           ` David Laight
  0 siblings, 0 replies; 145+ messages in thread
From: David Laight @ 2019-12-17 10:03 UTC (permalink / raw)
  To: 'Andy Lutomirski'
  Cc: Peter Zijlstra, Luck, Tony, Yu, Fenghua, Ingo Molnar,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86, Will Deacon

From: Andy Lutomirski
> Sent: 16 December 2019 18:06
..
> This whole discussion is about the fact that PeterZ is sceptical that
> actual x86 CPUs have as strong a memory model as the SDM suggests, and
> I'm trying to understand the exact concern.  This may or may not be
> directly relevant to the kernel. :)

The x86 memory model is pretty strong.
It has to be to support historic code - including self modifying code.
I think DOS from 1982 should still boot.

Even for SMP they probably can't relax anything from the original implementations.
(Except cpu specific kernel bits - since that has all changed since some dual 486 boxes.)

Actually the weakest x86 memory model was that defined in some P-pro
era Intel docs that said that IOR/IOW weren't sequenced with memory accesses.
Fortunately no cpu ever did that reordering, and now it isn't allowed.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 145+ messages in thread

* [PATCH v11] x86/split_lock: Enable split lock detection by kernel
  2019-12-12  8:59                   ` Peter Zijlstra
@ 2020-01-10 19:24                     ` Luck, Tony
  2020-01-14  5:55                       ` Sean Christopherson
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-10 19:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sean Christopherson, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

From: Peter Zijlstra <peterz@infradead.org>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will oops.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between modes at runtime isn't supported and disabling
    is tracked per task, so hardware will always reach a steady state that
    matches the configured mode.  I.e. split-lock is guaranteed to be
    enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

I think all the known places where split locks occur in the kernel
have already been patched, or the patches are queued for the upcoming
merge window.  If we missed some, well this patch will help find them
(for people with Icelake or Icelake Xeon systems). PeterZ didn't see
any application level use of split locks in a few hours of runtime
on his desktop. So likely little fallout there (default is just to
warn for applications, so just console noise rather than failure).
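
For anyone who wants to see the warning fire, a hypothetical user-space
reproducer (illustrative sketch, not part of the patch) is just a locked
RMW on a value that straddles a 64-byte cache line.  Booted with
split_lock_detect=warn this should trigger the pr_alert added below; with
split_lock_detect=fatal the process gets SIGBUS:

#include <stdint.h>
#include <stdlib.h>

int main(void)
{
	char *buf;
	uint32_t *p;

	if (posix_memalign((void **)&buf, 64, 128))
		return 1;

	/* bytes 62..65: this u32 crosses the first cache-line boundary */
	p = (uint32_t *)(buf + 62);

	/* deliberately misaligned locked RMW -> split lock (#AC if enabled) */
	__atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);

	free(buf);
	return 0;
}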

 .../admin-guide/kernel-parameters.txt         |  18 ++
 arch/x86/include/asm/cpu.h                    |  17 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/msr-index.h              |   8 +
 arch/x86/include/asm/thread_info.h            |   6 +-
 arch/x86/include/asm/traps.h                  |   1 +
 arch/x86/kernel/cpu/common.c                  |   2 +
 arch/x86/kernel/cpu/intel.c                   | 170 ++++++++++++++++++
 arch/x86/kernel/process.c                     |   3 +
 arch/x86/kernel/traps.c                       |  29 ++-
 10 files changed, 252 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..173c1acff5f0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3181,6 +3181,24 @@
 
 	nosoftlockup	[KNL] Disable the soft-lockup detector.
 
+	split_lock_detect=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will pr_alert about applications
+				  triggering the #AC exception
+
+			fatal	- the kernel will SIGBUS applications that
+				  trigger the #AC exception.
+
+			For any mode other than 'off' the kernel will die if
+			it (or firmware) triggers #AC.
+
 	nosync		[HW,M68K] Disables sync negotiation for all devices.
 
 	nowatchdog	[KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..5223504c7e7c 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool handle_split_lock(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool handle_split_lock(void)
+{
+	return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..c3edd2bba184 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
 #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
 #define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	( 7*32+31) /* #AC for split lock */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
@@ -365,6 +366,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 084e98da04a7..8bb2e08ce4a3 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+#define MSR_IA32_CORE_CAPABILITIES			  0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d779366ce3f8..d23638a0525e 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* split_lock_detect */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {
 
 #ifdef CONFIG_X86_IOPL_IOPERM
 # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
-				 _TIF_IO_BITMAP)
+				 _TIF_IO_BITMAP | _TIF_SLD)
 #else
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
 #endif
 
 #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
index ffa0dc8a535e..6ceab60370f0 100644
--- a/arch/x86/include/asm/traps.h
+++ b/arch/x86/include/asm/traps.h
@@ -175,4 +175,5 @@ enum x86_pf_error_code {
 	X86_PF_INSTR	=		1 << 4,
 	X86_PF_PK	=		1 << 5,
 };
+
 #endif /* _ASM_X86_TRAPS_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2e4d90294fe6..39245f61fad0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1234,6 +1234,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4a900804a023..43cc7a8f077e 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,14 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+static enum split_lock_detect_state sld_state = sld_warn;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -652,6 +662,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -767,6 +779,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -1028,3 +1042,159 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "fatal",	sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+	int len = strlen(opt);
+
+	return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+	enum split_lock_detect_state sld = sld_state;
+	char arg[20];
+	int i, ret;
+
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+				  arg, sizeof(arg));
+	if (ret < 0)
+		goto print;
+
+	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+		if (match_option(arg, ret, sld_options[i].option)) {
+			sld = sld_options[i].state;
+			break;
+		}
+	}
+
+	if (sld != sld_state)
+		sld_state = sld;
+
+print:
+	switch(sld) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+		return false;
+
+	return true;
+}
+
+static void split_lock_init(void)
+{
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	__sld_msr_set(sld_off);
+}
+
+bool handle_split_lock(void)
+{
+	return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+
+	pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+		 current->comm, current->pid, regs->ip);
+
+	__sld_msr_set(false);
+	set_tsk_thread_flag(current, TIF_SLD);
+	return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+	__sld_msr_set(true);
+	clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 61e93a318983..55d205820f35 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	if (tifp & _TIF_SLD)
+		switch_sld(prev_p);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 05da6b5b167b..a933a01f6e40 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
 #include <asm/fixmap.h>
@@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -288,9 +288,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	unsigned int trapnr = X86_TRAP_AC;
+	char str[] = "alignment check";
+	int signr = SIGBUS;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+		return;
+
+	if (!handle_split_lock())
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	cond_local_irq_enable(regs);
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v11] x86/split_lock: Enable split lock detection by kernel
  2020-01-10 19:24                     ` [PATCH v11] x86/split_lock: Enable split lock detection by kernel Luck, Tony
@ 2020-01-14  5:55                       ` Sean Christopherson
  2020-01-15 22:27                         ` Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Sean Christopherson @ 2020-01-14  5:55 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Peter Zijlstra, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Fri, Jan 10, 2020 at 11:24:09AM -0800, Luck, Tony wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> A split-lock occurs when an atomic instruction operates on data
> that spans two cache lines. In order to maintain atomicity the
> core takes a global bus lock.
> 
> This is typically >1000 cycles slower than an atomic operation
> within a cache line. It also disrupts performance on other cores
> (which must wait for the bus lock to be released before their
> memory operations can complete). For real-time systems this may
> mean missing deadlines. For other systems it may just be very
> annoying.
> 
> Some CPUs have the capability to raise an #AC trap when a
> split lock is attempted.
> 
> Provide a command line option to give the user choices on how
> to handle this. split_lock_detect=
> 	off	- not enabled (no traps for split locks)
> 	warn	- warn once when an application does a
> 		  split lock, but allow it to continue
> 		  running.
> 	fatal	- Send SIGBUS to applications that cause split lock
> 
> Default is "warn". Note that if the kernel hits a split lock
> in any mode other than "off" it will oops.
> 
> One implementation wrinkle is that the MSR to control the
> split lock detection is per-core, not per thread. This might
> result in some short lived races on HT systems in "warn" mode
> if Linux tries to enable on one thread while disabling on
> the other. Race analysis by Sean Christopherson:
> 
>   - Toggling of split-lock is only done in "warn" mode.  Worst case
>     scenario of a race is that a misbehaving task will generate multiple
>     #AC exceptions on the same instruction.  And this race will only occur
>     if both siblings are running tasks that generate split-lock #ACs, e.g.
>     a race where sibling threads are writing different values will only
>     occur if CPUx is disabling split-lock after an #AC and CPUy is
>     re-enabling split-lock after *its* previous task generated an #AC.
>   - Transitioning between modes at runtime isn't supported and disabling
>     is tracked per task, so hardware will always reach a steady state that
>     matches the configured mode.  I.e. split-lock is guaranteed to be
>     enabled in hardware once all _TIF_SLD threads have been scheduled out.
> 
> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>

Need Fenghua's SoB.

> Co-developed-by: Peter Zijlstra <peterz@infradead.org>

Co-developed-by for Peter not needed since he's the author (attributed
via From).

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> 
> I think all the known places where split locks occur in the kernel
> have already been patched, or the patches are queued for the upcoming
> merge window.  If we missed some, well this patch will help find them
> (for people with Icelake or Icelake Xeon systems). PeterZ didn't see
> any application level use of split locks in a few hours of runtime
> on his desktop. So likely little fallout there (default is just to
> warn for applications, so just console noise rather than failure).
> 
>  .../admin-guide/kernel-parameters.txt         |  18 ++
>  arch/x86/include/asm/cpu.h                    |  17 ++
>  arch/x86/include/asm/cpufeatures.h            |   2 +
>  arch/x86/include/asm/msr-index.h              |   8 +
>  arch/x86/include/asm/thread_info.h            |   6 +-
>  arch/x86/include/asm/traps.h                  |   1 +
>  arch/x86/kernel/cpu/common.c                  |   2 +
>  arch/x86/kernel/cpu/intel.c                   | 170 ++++++++++++++++++
>  arch/x86/kernel/process.c                     |   3 +
>  arch/x86/kernel/traps.c                       |  29 ++-
>  10 files changed, 252 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index ade4e6ec23e0..173c1acff5f0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3181,6 +3181,24 @@
>  
>  	nosoftlockup	[KNL] Disable the soft-lockup detector.
>  
> +	split_lock_detect=

Would it make sense to name this split_lock_ac?  To help clarify what the
param does and to future proof a bit in the event split lock detection is
able to signal some other form of fault/trap.

> +			[X86] Enable split lock detection
> +
> +			When enabled (and if hardware support is present), atomic
> +			instructions that access data across cache line
> +			boundaries will result in an alignment check exception.
> +
> +			off	- not enabled
> +
> +			warn	- the kernel will pr_alert about applications
> +				  triggering the #AC exception
> +
> +			fatal	- the kernel will SIGBUS applications that
> +				  trigger the #AC exception.
> +
> +			For any mode other than 'off' the kernel will die if
> +			it (or firmware) triggers #AC.
> +
>  	nosync		[HW,M68K] Disables sync negotiation for all devices.
>  
>  	nowatchdog	[KNL] Disable both lockup detectors, i.e.

...

> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index d779366ce3f8..d23638a0525e 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -92,6 +92,7 @@ struct thread_info {
>  #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
>  #define TIF_NOTSC		16	/* TSC is not accessible in userland */
>  #define TIF_IA32		17	/* IA32 compatibility process */
> +#define TIF_SLD			18	/* split_lock_detect */

A more informative name comment would be helpful since the flag is set when
SLD is disabled by the previous task.  Something like? 

#define TIF_NEED_SLD_RESTORE	18	/* Restore split lock detection on context switch */

>  #define TIF_NOHZ		19	/* in adaptive nohz mode */
>  #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
>  #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
> @@ -122,6 +123,7 @@ struct thread_info {
>  #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
>  #define _TIF_NOTSC		(1 << TIF_NOTSC)
>  #define _TIF_IA32		(1 << TIF_IA32)
> +#define _TIF_SLD		(1 << TIF_SLD)
>  #define _TIF_NOHZ		(1 << TIF_NOHZ)
>  #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
>  #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
> @@ -158,9 +160,9 @@ struct thread_info {
>  
>  #ifdef CONFIG_X86_IOPL_IOPERM
>  # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
> -				 _TIF_IO_BITMAP)
> +				 _TIF_IO_BITMAP | _TIF_SLD)
>  #else
> -# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
> +# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
>  #endif
>  
>  #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
> diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h
> index ffa0dc8a535e..6ceab60370f0 100644
> --- a/arch/x86/include/asm/traps.h
> +++ b/arch/x86/include/asm/traps.h
> @@ -175,4 +175,5 @@ enum x86_pf_error_code {
>  	X86_PF_INSTR	=		1 << 4,
>  	X86_PF_PK	=		1 << 5,
>  };
> +

Spurious whitespace.

>  #endif /* _ASM_X86_TRAPS_H */
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 2e4d90294fe6..39245f61fad0 100644

...

> +bool handle_split_lock(void)

This is a confusing name IMO, e.g. split_lock_detect_enabled() or similar
would be more intuitive.  It'd also avoid the weirdness of having different
semantics for the return values of handle_split_lock() and
handle_user_split_lock().

> +{
> +	return sld_state != sld_off;
> +}
> +
> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> +	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +		return false;

Maybe add "|| WARN_ON_ONCE(sld_state != sld_off)" to try to prevent the
kernel from going fully into the weeds if a spurious #AC occurs.

> +
> +	pr_alert("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",

pr_warn_ratelimited since it's user controlled?

> +		 current->comm, current->pid, regs->ip);
> +
> +	__sld_msr_set(false);
> +	set_tsk_thread_flag(current, TIF_SLD);
> +	return true;
> +}
> +
> +void switch_sld(struct task_struct *prev)
> +{
> +	__sld_msr_set(true);
> +	clear_tsk_thread_flag(prev, TIF_SLD);
> +}
> +
> +#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
> +
> +/*
> + * The following processors have split lock detection feature. But since they
> + * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
> + * the MSR. So enumerate the feature by family and model on these processors.
> + */
> +static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
> +	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
> +	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
> +	{}
> +};
> +
> +void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> +{
> +	u64 ia32_core_caps = 0;
> +
> +	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
> +		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
> +		rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
> +	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> +		/* Enumerate split lock detection by family and model. */
> +		if (x86_match_cpu(split_lock_cpu_ids))
> +			ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
> +	}
> +
> +	if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
> +		split_lock_setup();
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 61e93a318983..55d205820f35 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
>  		/* Enforce MSR update to ensure consistent state */
>  		__speculation_ctrl_update(~tifn, tifn);
>  	}
> +
> +	if (tifp & _TIF_SLD)
> +		switch_sld(prev_p);
>  }
>  
>  /*
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 05da6b5b167b..a933a01f6e40 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -46,6 +46,7 @@
>  #include <asm/traps.h>
>  #include <asm/desc.h>
>  #include <asm/fpu/internal.h>
> +#include <asm/cpu.h>
>  #include <asm/cpu_entry_area.h>
>  #include <asm/mce.h>
>  #include <asm/fixmap.h>
> @@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
>  {
>  	struct task_struct *tsk = current;
>  
> -

Whitespace.

>  	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
>  		return;
>  
> @@ -288,9 +288,34 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
>  DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
>  DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
>  DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
> -DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
>  #undef IP
>  
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +	unsigned int trapnr = X86_TRAP_AC;
> +	char str[] = "alignment check";

const if you want to keep it.

> +	int signr = SIGBUS;

Don't see any reason for these, e.g. they're not used for do_trap().
trapnr and signr in particular do more harm than good.

> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> +		return;
> +
> +	if (!handle_split_lock())
> +		return;
> +
> +	if (!user_mode(regs))
> +		die("Split lock detected\n", regs, error_code);
> +
> +	cond_local_irq_enable(regs);
> +
> +	if (handle_user_split_lock(regs, error_code))
> +		return;
> +
> +	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> +		error_code, BUS_ADRALN, NULL);
> +}
> +
>  #ifdef CONFIG_VMAP_STACK
>  __visible void __noreturn handle_stack_overflow(const char *message,
>  						struct pt_regs *regs,
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v11] x86/split_lock: Enable split lock detection by kernel
  2020-01-14  5:55                       ` Sean Christopherson
@ 2020-01-15 22:27                         ` Luck, Tony
  2020-01-15 22:57                           ` Sean Christopherson
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-15 22:27 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Mon, Jan 13, 2020 at 09:55:21PM -0800, Sean Christopherson wrote:
> On Fri, Jan 10, 2020 at 11:24:09AM -0800, Luck, Tony wrote:

All comments accepted and code changed ... except for these three:

> > +#define TIF_SLD			18	/* split_lock_detect */
> 
> A more informative name comment would be helpful since the flag is set when
> SLD is disabled by the previous task.  Something like? 
> 
> #define TIF_NEED_SLD_RESTORE	18	/* Restore split lock detection on context switch */

That name is more informative ... but it is also really, really long. Are
you sure?

> > +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> > +{
> > +	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> > +		return false;
> 
> Maybe add "|| WARN_ON_ONCE(sld_state != sld_off)" to try to prevent the
> kernel from going fully into the weeds if a spurious #AC occurs.

Can a spurious #AC occur? I don't see how.

> > @@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
> >  {
> >  	struct task_struct *tsk = current;
> >  
> > -
> 
> Whitespace.
> 
> >  	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
> >  		return;

I'm staring at the post patch code, and I can't see what whitespace
issue you see.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v11] x86/split_lock: Enable split lock detection by kernel
  2020-01-15 22:27                         ` Luck, Tony
@ 2020-01-15 22:57                           ` Sean Christopherson
  2020-01-15 23:48                             ` Luck, Tony
  2020-01-22 18:55                             ` [PATCH v12] " Luck, Tony
  0 siblings, 2 replies; 145+ messages in thread
From: Sean Christopherson @ 2020-01-15 22:57 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Peter Zijlstra, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Wed, Jan 15, 2020 at 02:27:54PM -0800, Luck, Tony wrote:
> On Mon, Jan 13, 2020 at 09:55:21PM -0800, Sean Christopherson wrote:
> > On Fri, Jan 10, 2020 at 11:24:09AM -0800, Luck, Tony wrote:
> 
> All comments accepted and code changed ... except for these three:

Sounds like you're also writing code, in which case you should give
yourself credit with your own Co-developed-by: tag.

> > > +#define TIF_SLD			18	/* split_lock_detect */
> > 
> > A more informative name comment would be helpful since the flag is set when
> > SLD is disabled by the previous task.  Something like? 
> > 
> > #define TIF_NEED_SLD_RESTORE	18	/* Restore split lock detection on context switch */
> 
> That name is more informative ... but it is also really, really long. Are
> you sure?

Not at all.  I picked a semi-arbitrary name that was similar to existing
TIF names, I'll defer to anyone with an opinion.

> > > +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> > > +{
> > > +	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> > > +		return false;
> > 
> > Maybe add "|| WARN_ON_ONCE(sld_state != sld_off)" to try to prevent the
> > kernel from going fully into the weeds if a spurious #AC occurs.
> 
> Can a spurious #AC occur? I don't see how.

It's mostly paranoia, e.g. if sld_state==sld_off but the MSR bit was
misconfigured.  No objection if you want to omit the check.

> > > @@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
> > >  {
> > >  	struct task_struct *tsk = current;
> > >  
> > > -
> > 
> > Whitespace.
> > 
> > >  	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
> > >  		return;
> 
> I'm staring at the post patch code, and I can't see what whitespace
> issue you see.

There's a random newline removal in do_trap().  It's a good change in the
sense that it eliminates an extra newline, bad in the sense that it's
unrelated to the rest of the patch.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v11] x86/split_lock: Enable split lock detection by kernel
  2020-01-15 22:57                           ` Sean Christopherson
@ 2020-01-15 23:48                             ` Luck, Tony
  2020-01-22 18:55                             ` [PATCH v12] " Luck, Tony
  1 sibling, 0 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-15 23:48 UTC (permalink / raw)
  To: Christopherson, Sean J
  Cc: Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86

>> All comments accepted and code changed ... except for these three:
>
> Sounds like you're also writing code, in which case you should give
> yourself credit with your own Co-developed-by: tag.

I just fixed some typos in PeterZ's untested example patch. Now changed
a few names as per your suggestions. I don't really think of that as "writing code".

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-15 22:57                           ` Sean Christopherson
  2020-01-15 23:48                             ` Luck, Tony
@ 2020-01-22 18:55                             ` Luck, Tony
  2020-01-22 19:04                               ` Borislav Petkov
  2020-01-22 22:42                               ` Arvind Sankar
  1 sibling, 2 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-22 18:55 UTC (permalink / raw)
  To: Thomas Gleixner, Sean Christopherson
  Cc: Peter Zijlstra, Ingo Molnar, Fenghua Yu, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

From: Peter Zijlstra <peterz@infradead.org>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.
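
As an illustration (the buffer name and the 62-byte offset below are
made up for this example and are not taken from any test in this
series), a user-space program along these lines produces a split lock
on x86-64:

  #include <stdint.h>
  #include <stdatomic.h>

  /* 64-byte aligned buffer, so buf+64 starts a new cache line. */
  static _Alignas(64) unsigned char buf[128];

  int main(void)
  {
  	/*
  	 * buf+62 leaves only two bytes in the first cache line, so the
  	 * 4-byte locked add below straddles two lines: a split lock.
  	 * Deliberately misaligned (undefined behavior as strict C),
  	 * shown only to illustrate the hardware event.
  	 */
  	_Atomic uint32_t *p = (_Atomic uint32_t *)(void *)(buf + 62);

  	atomic_fetch_add(p, 1);
  	return 0;
  }

With split lock detection enabled, the locked add raises #AC instead
of silently taking the bus lock described above.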

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will oops.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between modes at runtime isn't supported and disabling
    is tracked per task, so hardware will always reach a steady state that
    matches the configured mode.  I.e. split-lock is guaranteed to be
    enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

v12: Applied all changes suggested by Sean except:
	1) Keep the short name TIF_SLD (though I did take the
	   improved comment on what it does)
	2) Did not add a WARN_ON in trap code for unexpected #AC
	3) Kept the white space cleanup (delete unneeded blank line)
	   in do_trap()

 .../admin-guide/kernel-parameters.txt         |  18 ++
 arch/x86/include/asm/cpu.h                    |  17 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/msr-index.h              |   8 +
 arch/x86/include/asm/thread_info.h            |   6 +-
 arch/x86/kernel/cpu/common.c                  |   2 +
 arch/x86/kernel/cpu/intel.c                   | 170 ++++++++++++++++++
 arch/x86/kernel/process.c                     |   3 +
 arch/x86/kernel/traps.c                       |  27 ++-
 9 files changed, 249 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ade4e6ec23e0..36a4e0e2654b 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3181,6 +3181,24 @@
 
 	nosoftlockup	[KNL] Disable the soft-lockup detector.
 
+	split_lock_ac=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will pr_alert about applications
+				  triggering the #AC exception
+
+			fatal	- the kernel will SIGBUS applications that
+				  trigger the #AC exception.
+
+			For any mode other than 'off' the kernel will die if
+			it (or firmware) triggers #AC.
+
 	nosync		[HW,M68K] Disables sync negotiation for all devices.
 
 	nowatchdog	[KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..32a295533e2d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool split_lock_detect_enabled(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool split_lock_detect_enabled(void)
+{
+	return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index e9b62498fe75..c3edd2bba184 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -220,6 +220,7 @@
 #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
 #define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	( 7*32+31) /* #AC for split lock */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
@@ -365,6 +366,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 084e98da04a7..8bb2e08ce4a3 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,10 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+#define MSR_IA32_CORE_CAPABILITIES			  0x000000cf
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d779366ce3f8..cd88642e9e15 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* Restore split lock detection on context switch */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {
 
 #ifdef CONFIG_X86_IOPL_IOPERM
 # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
-				 _TIF_IO_BITMAP)
+				 _TIF_IO_BITMAP | _TIF_SLD)
 #else
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
 #endif
 
 #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2e4d90294fe6..39245f61fad0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1234,6 +1234,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4a900804a023..708fde6db703 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,14 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+static enum split_lock_detect_state sld_state = sld_warn;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -652,6 +662,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -767,6 +779,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -1028,3 +1042,159 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "fatal",	sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+	int len = strlen(opt);
+
+	return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+	enum split_lock_detect_state sld = sld_state;
+	char arg[20];
+	int i, ret;
+
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_ac",
+				  arg, sizeof(arg));
+	if (ret < 0)
+		goto print;
+
+	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+		if (match_option(arg, ret, sld_options[i].option)) {
+			sld = sld_options[i].state;
+			break;
+		}
+	}
+
+	if (sld != sld_state)
+		sld_state = sld;
+
+print:
+	switch(sld) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+		return false;
+
+	return true;
+}
+
+static void split_lock_init(void)
+{
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	__sld_msr_set(sld_off);
+}
+
+bool split_lock_detect_enabled(void)
+{
+	return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+
+	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+		 current->comm, current->pid, regs->ip);
+
+	__sld_msr_set(false);
+	set_tsk_thread_flag(current, TIF_SLD);
+	return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+	__sld_msr_set(true);
+	clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPABILITIES, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 61e93a318983..55d205820f35 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -654,6 +654,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	if (tifp & _TIF_SLD)
+		switch_sld(prev_p);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 05da6b5b167b..ef287effd8ba 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
 #include <asm/fixmap.h>
@@ -242,7 +243,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -288,9 +288,32 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	const char str[] = "alignment check";
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+		return;
+
+	if (!split_lock_detect_enabled())
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	cond_local_irq_enable(regs);
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-22 18:55                             ` [PATCH v12] " Luck, Tony
@ 2020-01-22 19:04                               ` Borislav Petkov
  2020-01-22 20:03                                 ` Luck, Tony
  2020-01-22 22:42                               ` Arvind Sankar
  1 sibling, 1 reply; 145+ messages in thread
From: Borislav Petkov @ 2020-01-22 19:04 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Sean Christopherson, Peter Zijlstra,
	Ingo Molnar, Fenghua Yu, Ingo Molnar, H Peter Anvin, Ashok Raj,
	Ravi V Shankar, linux-kernel, x86

On Wed, Jan 22, 2020 at 10:55:14AM -0800, Luck, Tony wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index e9b62498fe75..c3edd2bba184 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -220,6 +220,7 @@
>  #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
>  #define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
>  #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
> +#define X86_FEATURE_SPLIT_LOCK_DETECT	( 7*32+31) /* #AC for split lock */

That word is already full in tip:

...
#define X86_FEATURE_MSR_IA32_FEAT_CTL   ( 7*32+31) /* "" MSR IA32_FEAT_CTL configured */

use word 11 instead.

> +#define MSR_TEST_CTRL				0x00000033
> +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
> +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
> +
>  #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
>  #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
>  #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
> @@ -70,6 +74,10 @@
>   */
>  #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
>  
> +#define MSR_IA32_CORE_CAPABILITIES			  0x000000cf
> +#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT  5
> +#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)

Any chance making those shorter?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-22 19:04                               ` Borislav Petkov
@ 2020-01-22 20:03                                 ` Luck, Tony
  2020-01-22 20:55                                   ` Borislav Petkov
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-22 20:03 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, H Peter Anvin, Raj, Ashok,
	Shankar, Ravi V, linux-kernel, x86

>> +#define X86_FEATURE_SPLIT_LOCK_DETECT	( 7*32+31) /* #AC for split lock */
>
> That word is already full in tip:
> ...
> use word 11 instead.

Will rebase against tip/master and move to word 11.

>> +#define MSR_IA32_CORE_CAPABILITIES			  0x000000cf
>> +#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT  5
>> +#define MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPABILITIES_SPLIT_LOCK_DETECT_BIT)
>
> Any chance making those shorter?

I could abbreviate CAPABILITIES as "CAP", that would save 9 characters. Is that enough?

I'm not fond of the "remove the vowels": SPLT_LCK_DTCT, but that is sort of readable
and would save 4 more. What do you think?

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-22 20:03                                 ` Luck, Tony
@ 2020-01-22 20:55                                   ` Borislav Petkov
  0 siblings, 0 replies; 145+ messages in thread
From: Borislav Petkov @ 2020-01-22 20:55 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, H Peter Anvin, Raj, Ashok,
	Shankar, Ravi V, linux-kernel, x86

On Wed, Jan 22, 2020 at 08:03:28PM +0000, Luck, Tony wrote:
> I could abbreviate CAPABILITIES as "CAP", that would save 9
> characters. Is that enough?

Sure, except...

> I'm not fond of the "remove the vowels": SPLT_LCK_DTCT, but that is
> sort of readable and would save 4 more. What do you think?

... we've been trying to keep the MSR names as spelled in the SDM to
avoid confusion.

Looking at it,

MSR_IA32_CORE_CAPABILITIES -> MSR_IA32_CORE_CAPS

along with a comment above its definition sounds like a good compromise
to me. IMO, of course.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-22 18:55                             ` [PATCH v12] " Luck, Tony
  2020-01-22 19:04                               ` Borislav Petkov
@ 2020-01-22 22:42                               ` Arvind Sankar
  2020-01-22 22:52                                 ` Arvind Sankar
  2020-01-22 23:24                                 ` Luck, Tony
  1 sibling, 2 replies; 145+ messages in thread
From: Arvind Sankar @ 2020-01-22 22:42 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Sean Christopherson, Peter Zijlstra,
	Ingo Molnar, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Wed, Jan 22, 2020 at 10:55:14AM -0800, Luck, Tony wrote:
> +
> +static enum split_lock_detect_state sld_state = sld_warn;
> +

This sets sld_state to sld_warn even on CPUs that don't support
split-lock detection. split_lock_init will then try to read/write the
MSR to turn it on. Would it be better to initialize it to sld_off and
set it to sld_warn in split_lock_setup instead, which is only called if
the CPU supports the feature?
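
For illustration, here is a minimal user-space model of that flow (the
enum and function names mirror the patch; the MSR write is stubbed out
with a printf):

#include <stdbool.h>
#include <stdio.h>

enum split_lock_detect_state { sld_off = 0, sld_warn, sld_fatal };

/* Default off: CPUs without the feature never reach split_lock_setup(). */
static enum split_lock_detect_state sld_state = sld_off;

/* Called only when the CPU enumerates split lock detection. */
static void split_lock_setup(void)
{
	sld_state = sld_warn;	/* command line parsing would go here */
}

/* Called unconditionally, as init_intel() does in the patch. */
static void split_lock_init(void)
{
	if (sld_state == sld_off)
		return;		/* no MSR access on unsupported CPUs */
	printf("would set MSR_TEST_CTRL split lock bit\n");
}

int main(void)
{
	bool cpu_has_sld = false;	/* flip to model a CPU with the feature */

	if (cpu_has_sld)
		split_lock_setup();
	split_lock_init();
	return 0;
}

With the sld_warn initializer from the patch, the MSR access happens
even in the cpu_has_sld == false case, which is the problem above.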

>  
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +	const char str[] = "alignment check";
> +
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
> +		return;
> +
> +	if (!split_lock_detect_enabled())
> +		return;

This misses one comment from Sean [1] that this check should be dropped,
otherwise user-space alignment check via EFLAGS.AC will get ignored when
split lock detection is disabled.

[1] https://lore.kernel.org/lkml/20191122184457.GA31235@linux.intel.com/
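
For reference, the legacy behavior that would get lost is easy to see
from user space (illustrative sketch, x86-64 only; the kernel already
sets CR0.AM, so EFLAGS.AC is all a process needs to flip):

#include <stdint.h>
#include <stdio.h>

static _Alignas(4) unsigned char buf[16];

int main(void)
{
	/* Set EFLAGS.AC (bit 18) from user space. */
	__asm__ volatile("pushfq; orq $0x40000, (%%rsp); popfq" ::: "cc", "memory");

	/* Misaligned 4-byte load: should take #AC and die with SIGBUS. */
	volatile uint32_t v = *(volatile uint32_t *)(buf + 1);
	(void)v;

	printf("alignment check was not delivered\n");
	return 0;
}

With the split_lock_detect_enabled() check left in and detection
disabled, the #AC raised here gets ignored instead of delivering the
expected SIGBUS.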

> +
> +	if (!user_mode(regs))
> +		die("Split lock detected\n", regs, error_code);
> +
> +	cond_local_irq_enable(regs);
> +
> +	if (handle_user_split_lock(regs, error_code))
> +		return;
> +
> +	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> +		error_code, BUS_ADRALN, NULL);
> +}
> +

Peter [2] called this a possible DOS vector. If userspace is malicious
rather than buggy, couldn't it simply ignore SIGBUS?

[2] https://lore.kernel.org/lkml/20191121131522.GX5671@hirez.programming.kicks-ass.net/

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-22 22:42                               ` Arvind Sankar
@ 2020-01-22 22:52                                 ` Arvind Sankar
  2020-01-22 23:24                                 ` Luck, Tony
  1 sibling, 0 replies; 145+ messages in thread
From: Arvind Sankar @ 2020-01-22 22:52 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Luck, Tony, Thomas Gleixner, Sean Christopherson, Peter Zijlstra,
	Ingo Molnar, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Ashok Raj, Ravi V Shankar, linux-kernel, x86

On Wed, Jan 22, 2020 at 05:42:51PM -0500, Arvind Sankar wrote:
> 
> Peter [2] called this a possible DOS vector. If userspace is malicious
> rather than buggy, couldn't it simply ignore SIGBUS?
> 
> [2] https://lore.kernel.org/lkml/20191121131522.GX5671@hirez.programming.kicks-ass.net/

Ignore this last bit, wasn't thinking right.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-22 22:42                               ` Arvind Sankar
  2020-01-22 22:52                                 ` Arvind Sankar
@ 2020-01-22 23:24                                 ` Luck, Tony
  2020-01-23  0:45                                   ` Arvind Sankar
  1 sibling, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-22 23:24 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

>> +static enum split_lock_detect_state sld_state = sld_warn;
>> +
>
> This sets sld_state to sld_warn even on CPUs that don't support
> split-lock detection. split_lock_init will then try to read/write the
> MSR to turn it on. Would it be better to initialize it to sld_off and
> set it to sld_warn in split_lock_setup instead, which is only called if
> the CPU supports the feature?

I've lost some bits of this patch series somewhere along the way :-(  There
was once code to decide whether the feature was supported (either with
x86_match_cpu() for a couple of models, or using the architectural test
based on some MSR bits.  I need to dig that out and put it back in. Then
stuff can check X86_FEATURE_SPLIT_LOCK before wandering into code
that messes with MSRs

>> +	if (!split_lock_detect_enabled())
>> +		return;
>
> This misses one comment from Sean [1] that this check should be dropped,
> otherwise user-space alignment check via EFLAGS.AC will get ignored when
> split lock detection is disabled.

Ah yes. Good catch.  Will fix.

Thanks for the review.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-22 23:24                                 ` Luck, Tony
@ 2020-01-23  0:45                                   ` Arvind Sankar
  2020-01-23  1:23                                     ` Luck, Tony
  2020-01-23  3:53                                     ` [PATCH v13] " Luck, Tony
  0 siblings, 2 replies; 145+ messages in thread
From: Arvind Sankar @ 2020-01-23  0:45 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Arvind Sankar, Thomas Gleixner, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

On Wed, Jan 22, 2020 at 11:24:34PM +0000, Luck, Tony wrote:
> >> +static enum split_lock_detect_state sld_state = sld_warn;
> >> +
> >
> > This sets sld_state to sld_warn even on CPUs that don't support
> > split-lock detection. split_lock_init will then try to read/write the
> > MSR to turn it on. Would it be better to initialize it to sld_off and
> > set it to sld_warn in split_lock_setup instead, which is only called if
> > the CPU supports the feature?
> 
> I've lost some bits of this patch series somewhere along the way :-(  There
> was once code to decide whether the feature was supported (either with
> x86_match_cpu() for a couple of models, or using the architectural test
> based on some MSR bits.  I need to dig that out and put it back in. Then
> stuff can check X86_FEATURE_SPLIT_LOCK before wandering into code
> that messes with MSRs

That code is still there (cpu_set_core_cap_bits). The issue is that with
the initialization here, nothing ever sets sld_state to sld_off if the
feature isn't supported.

v10 had a corresponding split_lock_detect_enabled that was
0-initialized, but Peter's patch as he sent out had the flag initialized
to sld_warn.

> 
> >> +	if (!split_lock_detect_enabled())
> >> +		return;
> >
> > This misses one comment from Sean [1] that this check should be dropped,
> > otherwise user-space alignment check via EFLAGS.AC will get ignored when
> > split lock detection is disabled.
> 
> Ah yes. Good catch.  Will fix.
> 
> Thanks for the review.
> 
> -Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-23  0:45                                   ` Arvind Sankar
@ 2020-01-23  1:23                                     ` Luck, Tony
  2020-01-23  4:21                                       ` Arvind Sankar
  2020-01-23  3:53                                     ` [PATCH v13] " Luck, Tony
  1 sibling, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-23  1:23 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Wed, Jan 22, 2020 at 07:45:08PM -0500, Arvind Sankar wrote:
> On Wed, Jan 22, 2020 at 11:24:34PM +0000, Luck, Tony wrote:
> > >> +static enum split_lock_detect_state sld_state = sld_warn;
> > >> +
> > >
> > > This sets sld_state to sld_warn even on CPUs that don't support
> > > split-lock detection. split_lock_init will then try to read/write the
> > > MSR to turn it on. Would it be better to initialize it to sld_off and
> > > set it to sld_warn in split_lock_setup instead, which is only called if
> > > the CPU supports the feature?
> > 
> > I've lost some bits of this patch series somewhere along the way :-(  There
> > was once code to decide whether the feature was supported (either with
> > x86_match_cpu() for a couple of models, or using the architectural test
> > based on some MSR bits.  I need to dig that out and put it back in. Then
> > stuff can check X86_FEATURE_SPLIT_LOCK before wandering into code
> > that messes with MSRs
> 
> That code is still there (cpu_set_core_cap_bits). The issue is that with
> the initialization here, nothing ever sets sld_state to sld_off if the
> feature isn't supported.
> 
> v10 had a corresponding split_lock_detect_enabled that was
> 0-initialized, but Peter's patch as he sent out had the flag initialized
> to sld_warn.

Ah yes. Maybe the problem is that split_lock_init() is only
called on systems that support split lock detect, while we call
split_lock_init() unconditionally.

What if we start with sld_state = sld_off, and then have split_lock_setup
set it to either sld_warn, or whatever the user chose on the command
line.  Patch below (on top of patch so you can see what I'm saying,
but will just merge it in for next version.

-Tony


diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 7478bebcd735..b6046ccfa372 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -39,7 +39,13 @@ enum split_lock_detect_state {
 	sld_fatal,
 };
 
-static enum split_lock_detect_state sld_state = sld_warn;
+/*
+ * Default to sld_off because most systems do not support
+ * split lock detection. split_lock_setup() will switch this
+ * to sld_warn, and then check to see if there is a command
+ * line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
 
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
@@ -1017,10 +1023,11 @@ static inline bool match_option(const char *arg, int arglen, const char *opt)
 
 static void __init split_lock_setup(void)
 {
-	enum split_lock_detect_state sld = sld_state;
+	enum split_lock_detect_state sld;
 	char arg[20];
 	int i, ret;
 
+	sld_state = sld = sld_warn;
 	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
 
 	ret = cmdline_find_option(boot_command_line, "split_lock_ac",

^ permalink raw reply related	[flat|nested] 145+ messages in thread

* [PATCH v13] x86/split_lock: Enable split lock detection by kernel
  2020-01-23  0:45                                   ` Arvind Sankar
  2020-01-23  1:23                                     ` Luck, Tony
@ 2020-01-23  3:53                                     ` Luck, Tony
  2020-01-23  4:45                                       ` Arvind Sankar
  1 sibling, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-23  3:53 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

From: Peter Zijlstra <peterz@infradead.org>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will oops.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between modes at runtime isn't supported and disabling
    is tracked per task, so hardware will always reach a steady state that
    matches the configured mode.  I.e. split-lock is guaranteed to be
    enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

v13: (rebased to tip/master because of first item below)
	Boris: X86 features word 7 is full, move to word 11
	Boris: MSR_IA32_CORE_CAPABILITIES too long. Abbreviate
	       (but include comment with SDM matching name)
	Arvind: Missed a comment from Sean about bogus test in
		trap handling. Delete it.
	Arvind: split_lock_init() accesses MSR on platforms that
		don't support it. Change default to "off" and
		only upgrade to "warn" on platforms that support
		split lock detect.

 .../admin-guide/kernel-parameters.txt         |  18 ++
 arch/x86/include/asm/cpu.h                    |  17 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/msr-index.h              |   9 +
 arch/x86/include/asm/thread_info.h            |   6 +-
 arch/x86/kernel/cpu/common.c                  |   2 +
 arch/x86/kernel/cpu/intel.c                   | 177 ++++++++++++++++++
 arch/x86/kernel/process.c                     |   3 +
 arch/x86/kernel/traps.c                       |  24 ++-
 9 files changed, 254 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7f1e2f327e43..b420e0cebc0c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3207,6 +3207,24 @@
 
 	nosoftlockup	[KNL] Disable the soft-lockup detector.
 
+	split_lock_ac=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will pr_alert about applications
+				  triggering the #AC exception
+
+			fatal	- the kernel will SIGBUS applications that
+				  trigger the #AC exception.
+
+			For any mode other than 'off' the kernel will die if
+			it (or firmware) triggers #AC.
+
 	nosync		[HW,M68K] Disables sync negotiation for all devices.
 
 	nowatchdog	[KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..32a295533e2d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool split_lock_detect_enabled(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool split_lock_detect_enabled(void)
+{
+	return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS			  0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e0d12517f348 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* Restore split lock detection on context switch */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {
 
 #ifdef CONFIG_X86_IOPL_IOPERM
 # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
-				 _TIF_IO_BITMAP)
+				 _TIF_IO_BITMAP | _TIF_SLD)
 #else
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
 #endif
 
 #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..b6046ccfa372 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,20 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support
+ * split lock detection. split_lock_setup() will switch this
+ * to sld_warn, and then check to see if there is a command
+ * line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -606,6 +622,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -720,6 +738,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -981,3 +1001,160 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "fatal",	sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+	int len = strlen(opt);
+
+	return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+	enum split_lock_detect_state sld;
+	char arg[20];
+	int i, ret;
+
+	sld_state = sld = sld_warn;
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_ac",
+				  arg, sizeof(arg));
+	if (ret < 0)
+		goto print;
+
+	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+		if (match_option(arg, ret, sld_options[i].option)) {
+			sld = sld_options[i].state;
+			break;
+		}
+	}
+
+	if (sld != sld_state)
+		sld_state = sld;
+
+print:
+	switch(sld) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+		return false;
+
+	return true;
+}
+
+static void split_lock_init(void)
+{
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	__sld_msr_set(sld_off);
+}
+
+bool split_lock_detect_enabled(void)
+{
+	return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+
+	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+		 current->comm, current->pid, regs->ip);
+
+	__sld_msr_set(false);
+	set_tsk_thread_flag(current, TIF_SLD);
+	return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+	__sld_msr_set(true);
+	clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..355760d36505 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	if (tifp & _TIF_SLD)
+		switch_sld(prev_p);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..61c576b95184 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
 #include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	const char str[] = "alignment check";
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	cond_local_irq_enable(regs);
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-23  1:23                                     ` Luck, Tony
@ 2020-01-23  4:21                                       ` Arvind Sankar
  2020-01-23 17:15                                         ` Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Arvind Sankar @ 2020-01-23  4:21 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Arvind Sankar, Thomas Gleixner, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

On Wed, Jan 22, 2020 at 05:23:17PM -0800, Luck, Tony wrote:
> On Wed, Jan 22, 2020 at 07:45:08PM -0500, Arvind Sankar wrote:
> > On Wed, Jan 22, 2020 at 11:24:34PM +0000, Luck, Tony wrote:
> > > >> +static enum split_lock_detect_state sld_state = sld_warn;
> > > >> +
> > > >
> > > > This sets sld_state to sld_warn even on CPUs that don't support
> > > > split-lock detection. split_lock_init will then try to read/write the
> > > > MSR to turn it on. Would it be better to initialize it to sld_off and
> > > > set it to sld_warn in split_lock_setup instead, which is only called if
> > > > the CPU supports the feature?
> > > 
> > > I've lost some bits of this patch series somewhere along the way :-(  There
> > > was once code to decide whether the feature was supported (either with
> > > x86_match_cpu() for a couple of models, or using the architectural test
> > > based on some MSR bits.  I need to dig that out and put it back in. Then
> > > stuff can check X86_FEATURE_SPLIT_LOCK before wandering into code
> > > that messes with MSRs
> > 
> > That code is still there (cpu_set_core_cap_bits). The issue is that with
> > the initialization here, nothing ever sets sld_state to sld_off if the
> > feature isn't supported.
> > 
> > v10 had a corresponding split_lock_detect_enabled that was
> > 0-initialized, but Peter's patch as he sent out had the flag initialized
> > to sld_warn.
> 
> Ah yes. Maybe the problem is that split_lock_init() should only be
> called on systems that support split lock detect, while we call
> split_lock_init() unconditionally.

It was unconditional in v10 too?
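
(As far as I can tell the call paths are:

	early_identify_cpu()
	  cpu_set_core_cap_bits()
	    split_lock_setup()		<- only when the feature is enumerated

	init_intel()
	  split_lock_init()		<- on every Intel CPU

so split_lock_init() runs regardless of whether split_lock_setup() ever
touched sld_state.)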

> 
> What if we start with sld_state = sld_off, and then have split_lock_setup
> set it to either sld_warn, or whatever the user chose on the command
> line.  Patch below (on top of the patch so you can see what I'm saying,
> but I will just merge it in for the next version).

Yep, that's what I suggested.

> 
> -Tony
> 
> 
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 7478bebcd735..b6046ccfa372 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -39,7 +39,13 @@ enum split_lock_detect_state {
>  	sld_fatal,
>  };
>  
> -static enum split_lock_detect_state sld_state = sld_warn;
> +/*
> + * Default to sld_off because most systems do not support
> + * split lock detection. split_lock_setup() will switch this
> + * to sld_warn, and then check to see if there is a command
> + * line override.
> + */
> +static enum split_lock_detect_state sld_state = sld_off;
>  
>  /*
>   * Just in case our CPU detection goes bad, or you have a weird system,
> @@ -1017,10 +1023,11 @@ static inline bool match_option(const char *arg, int arglen, const char *opt)
>  
>  static void __init split_lock_setup(void)
>  {
> -	enum split_lock_detect_state sld = sld_state;
> +	enum split_lock_detect_state sld;

This is bike-shedding, but initializing sld = sld_warn here would have
been enough with no other changes to the patch I think?

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v13] x86/split_lock: Enable split lock detection by kernel
  2020-01-23  3:53                                     ` [PATCH v13] " Luck, Tony
@ 2020-01-23  4:45                                       ` Arvind Sankar
  2020-01-23 23:16                                         ` [PATCH v14] " Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Arvind Sankar @ 2020-01-23  4:45 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Arvind Sankar, Thomas Gleixner, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

On Wed, Jan 22, 2020 at 07:53:59PM -0800, Luck, Tony wrote:
>  
> +	split_lock_ac=
> +			[X86] Enable split lock detection

More bike-shedding: I actually don't get Sean's suggestion to rename
this to split_lock_ac [1]. If split lock detection were ever to trigger
some other form of fault/trap, we would just change the implementation to
cope; we would not want to change the command line argument that enables
it, so split_lock_detect is more informative?

And if the concern is the earlier one [2], then surely everything should
be renamed sld -> slac?

[1] https://lore.kernel.org/lkml/20200114055521.GI14928@linux.intel.com/
[2] https://lore.kernel.org/lkml/20191122184457.GA31235@linux.intel.com/

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v12] x86/split_lock: Enable split lock detection by kernel
  2020-01-23  4:21                                       ` Arvind Sankar
@ 2020-01-23 17:15                                         ` Luck, Tony
  0 siblings, 0 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-23 17:15 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

>>  static void __init split_lock_setup(void)
>>  {
>> -	enum split_lock_detect_state sld = sld_state;
>> +	enum split_lock_detect_state sld;
>
> This is bike-shedding, but initializing sld = sld_warn here would have
> been enough with no other changes to the patch I think?

Not quite. If there isn't a command line option, we get here:

	if (ret < 0)
		goto print;

which skips copying the local "sld" to the global "sld_state".
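
With only the local initializer changed (as suggested), the flow would
roughly have been (sketch, not the exact v13 code):

	static enum split_lock_detect_state sld_state = sld_off;

	static void __init split_lock_setup(void)
	{
		enum split_lock_detect_state sld = sld_warn;
		char arg[20];
		int ret;

		setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);

		ret = cmdline_find_option(boot_command_line, "split_lock_detect",
					  arg, sizeof(arg));
		if (ret < 0)
			goto print;		/* no option on the command line */

		/* ... loop over sld_options[] filling the local sld ... */

		if (sld != sld_state)
			sld_state = sld;	/* only reached when an option was given */
	print:
		/* ... print the chosen mode ... */
	}

i.e. without a command line option the copy into sld_state is skipped and
detection stays off even on supported CPUs.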

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* [PATCH v14] x86/split_lock: Enable split lock detection by kernel
  2020-01-23  4:45                                       ` Arvind Sankar
@ 2020-01-23 23:16                                         ` Luck, Tony
  2020-01-24 21:36                                           ` Thomas Gleixner
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-23 23:16 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

From: Peter Zijlstra <peterz@infradead.org>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

Default is "warn". Note that if the kernel hits a split lock
in any mode other than "off" it will OOPs.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between modes at runtime isn't supported and disabling
    is tracked per task, so hardware will always reach a steady state that
    matches the configured mode.  I.e. split-lock is guaranteed to be
    enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

v14: I chatted offline with Sean about the kernel parameter name. He's
     now OK with the more generic name "split_lock_detect" rather than
     the trap specific split_lock_ac. So this reverts that change in
     the code and Documentation.  Thanks to Arvind for making us see
     sense ... not bike shedding at all!

 .../admin-guide/kernel-parameters.txt         |  18 ++
 arch/x86/include/asm/cpu.h                    |  17 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/msr-index.h              |   9 +
 arch/x86/include/asm/thread_info.h            |   6 +-
 arch/x86/kernel/cpu/common.c                  |   2 +
 arch/x86/kernel/cpu/intel.c                   | 177 ++++++++++++++++++
 arch/x86/kernel/process.c                     |   3 +
 arch/x86/kernel/traps.c                       |  24 ++-
 9 files changed, 254 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7f1e2f327e43..568d20c04441 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3207,6 +3207,24 @@
 
 	nosoftlockup	[KNL] Disable the soft-lockup detector.
 
+	split_lock_detect=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will pr_alert about applications
+				  triggering the #AC exception
+
+			fatal	- the kernel will SIGBUS applications that
+				  trigger the #AC exception.
+
+			For anything other than 'off' the kernel will die if
+			it (or firmware) triggers #AC.
+
 	nosync		[HW,M68K] Disables sync negotiation for all devices.
 
 	nowatchdog	[KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..32a295533e2d 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool split_lock_detect_enabled(void);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_sld(struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool split_lock_detect_enabled(void)
+{
+	return false;
+}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+static inline void switch_sld(struct task_struct *prev) {}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS			  0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e0d12517f348 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* Restore split lock detection on context switch */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {
 
 #ifdef CONFIG_X86_IOPL_IOPERM
 # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
-				 _TIF_IO_BITMAP)
+				 _TIF_IO_BITMAP | _TIF_SLD)
 #else
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
 #endif
 
 #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..68d2a7044779 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,20 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support
+ * split lock detection. split_lock_setup() will switch this
+ * to sld_warn, and then check to see if there is a command
+ * line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -606,6 +622,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -720,6 +738,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -981,3 +1001,160 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "fatal",	sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+	int len = strlen(opt);
+
+	return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+	enum split_lock_detect_state sld;
+	char arg[20];
+	int i, ret;
+
+	sld_state = sld = sld_warn;
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+				  arg, sizeof(arg));
+	if (ret < 0)
+		goto print;
+
+	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+		if (match_option(arg, ret, sld_options[i].option)) {
+			sld = sld_options[i].state;
+			break;
+		}
+	}
+
+	if (sld != sld_state)
+		sld_state = sld;
+
+print:
+	switch(sld) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * The TEST_CTRL MSR is per core. So multiple threads can
+ * read/write the MSR in parallel. But it's possible to
+ * simplify the read/write without locking and without
+ * worry about overwriting the MSR because only bit 29
+ * is implemented in the MSR and the bit is set as 1 by all
+ * threads. Locking may be needed in the future if situation
+ * is changed e.g. other bits are implemented.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+		return false;
+
+	return true;
+}
+
+static void split_lock_init(void)
+{
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	__sld_msr_set(sld_off);
+}
+
+bool split_lock_detect_enabled(void)
+{
+	return sld_state != sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+
+	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+		 current->comm, current->pid, regs->ip);
+
+	__sld_msr_set(false);
+	set_tsk_thread_flag(current, TIF_SLD);
+	return true;
+}
+
+void switch_sld(struct task_struct *prev)
+{
+	__sld_msr_set(true);
+	clear_tsk_thread_flag(prev, TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have the split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..355760d36505 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	if (tifp & _TIF_SLD)
+		switch_sld(prev_p);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..61c576b95184 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
 #include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	const char str[] = "alignment check";
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	cond_local_irq_enable(regs);
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v14] x86/split_lock: Enable split lock detection by kernel
  2020-01-23 23:16                                         ` [PATCH v14] " Luck, Tony
@ 2020-01-24 21:36                                           ` Thomas Gleixner
  2020-01-25  2:47                                             ` [PATCH v15] " Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Thomas Gleixner @ 2020-01-24 21:36 UTC (permalink / raw)
  To: Luck, Tony, Arvind Sankar
  Cc: Christopherson, Sean J, Peter Zijlstra, Ingo Molnar, Yu, Fenghua,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86

Tony,

"Luck, Tony" <tony.luck@intel.com> writes:
> +	split_lock_detect=
> +			[X86] Enable split lock detection
> +
> +			When enabled (and if hardware support is present), atomic
> +			instructions that access data across cache line
> +			boundaries will result in an alignment check exception.
> +
> +			off	- not enabled
> +
> +			warn	- the kernel will pr_alert about applications

pr_alert is not a verb. And the implementation uses
pr_warn_ratelimited(). So this should be something like:

                       The kernel will emit rate limited warnings about
                       applications ...

> +				  triggering the #AC exception
> @@ -40,4 +40,21 @@ int mwait_usable(const struct cpuinfo_x86 *);
>  unsigned int x86_family(unsigned int sig);
>  unsigned int x86_model(unsigned int sig);
>  unsigned int x86_stepping(unsigned int sig);
> +#ifdef CONFIG_CPU_SUP_INTEL
> +extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
> +extern bool split_lock_detect_enabled(void);

That function is unused.

> +extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
> +extern void switch_sld(struct task_struct *);
> +#else
> +static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
> +static inline bool split_lock_detect_enabled(void)
> +{
> +	return false;
> +}
> +static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> +	return false;
> +}
> +static inline void switch_sld(struct task_struct *prev) {}
> +#endif
  
> +enum split_lock_detect_state {
> +	sld_off = 0,
> +	sld_warn,
> +	sld_fatal,
> +};
> +
> +/*
> + * Default to sld_off because most systems do not support
> + * split lock detection. split_lock_setup() will switch this

Can you please add: If supported, then ...

> + * to sld_warn, and then check to see if there is a command
> + * line override.

I had to read this 3 times and then stare at the code.

> + */
> +static enum split_lock_detect_state sld_state = sld_off;
> +
> +static void __init split_lock_setup(void)
> +{
> +	enum split_lock_detect_state sld;
> +	char arg[20];
> +	int i, ret;
> +
> +	sld_state = sld = sld_warn;

This intermediate variable is pointless.

> +	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> +
> +	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
> +				  arg, sizeof(arg));
> +	if (ret < 0)
> +		goto print;
> +
> +	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
> +		if (match_option(arg, ret, sld_options[i].option)) {
> +			sld = sld_options[i].state;
> +			break;
> +		}
> +	}
> +
> +	if (sld != sld_state)
> +		sld_state = sld;
> +
> +print:

> +/*
> + * The TEST_CTRL MSR is per core. So multiple threads can
> + * read/write the MSR in parallel. But it's possible to
> + * simplify the read/write without locking and without
> + * worry about overwriting the MSR because only bit 29
> + * is implemented in the MSR and the bit is set as 1 by all
> + * threads. Locking may be needed in the future if situation
> + * is changed e.g. other bits are implemented.

This sentence doesn't parse. Something like this perhaps:

     Locking is not required at the moment because only bit 29 of this
     MSR is implemented and locking would not prevent that the operation
     of one thread is immediately undone by the sibling thread.

This implies that locking might become necessary when new bits are added.

> + */
> +
> +static bool __sld_msr_set(bool on)
> +{
> +	u64 test_ctrl_val;
> +
> +	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
> +		return false;
> +
> +	if (on)
> +		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +	else
> +		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> +	if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
> +		return false;
> +
> +	return true;

	return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);

> +}
> +
> +static void split_lock_init(void)
> +{
> +	if (sld_state == sld_off)
> +		return;
> +
> +	if (__sld_msr_set(true))
> +		return;
> +
> +	/*
> +	 * If this is anything other than the boot-cpu, you've done
> +	 * funny things and you get to keep whatever pieces.
> +	 */
> +	pr_warn("MSR fail -- disabled\n");
> +	__sld_msr_set(sld_off);

That should do:

        sld_state = sld_off;

for consistency sake.

> +}
> +
> +bool split_lock_detect_enabled(void)
> +{
> +	return sld_state != sld_off;
> +}
> +
> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> +	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +		return false;
> +
> +	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> +		 current->comm, current->pid, regs->ip);

So with 10 prints per 5 seconds an intentional offender can still fill dmesg
pretty good. A standard dmesg buffer should be full of this in
~15min. Not a big issue, but it might be annoying. Let's start with this
and deal with it when people complain.
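
(Back of the envelope, assuming the default ratelimit of 10 messages per
5 seconds, ~128 bytes per printk record and a 256K log buffer:
2 lines/s * 900 s ~= 1800 lines * 128 bytes ~= 225K, so the buffer wraps
after roughly 15 minutes of sustained abuse.)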

The magic below really lacks a comment. Something like:

	/*
         * Disable the split lock detection for this task so it can make
         * progress and set TIF_SLD so the detection is reenabled via
         * switch_to_sld() when the task is scheduled out.
         */

> +	__sld_msr_set(false);
> +	set_tsk_thread_flag(current, TIF_SLD);
> +	return true;
> +}
> +
> +void switch_sld(struct task_struct *prev)

switch_to_sld() perhaps?

> +{
> +	__sld_msr_set(true);
> +	clear_tsk_thread_flag(prev, TIF_SLD);
> +}
>  
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +	const char str[] = "alignment check";
> +
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
> +		return;
> +
> +	if (!user_mode(regs))
> +		die("Split lock detected\n", regs, error_code);
> +
> +	cond_local_irq_enable(regs);

This cond is pointless. We recently removed the ability for user space
to disable interrupts and even if that would still be allowed then
keeping interrupts disabled here does not make sense.
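
I.e. just use the unconditional form here:

-	cond_local_irq_enable(regs);
+	local_irq_enable();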

Other than those details, I really like this approach.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 145+ messages in thread

* [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-24 21:36                                           ` Thomas Gleixner
@ 2020-01-25  2:47                                             ` Luck, Tony
  2020-01-25 10:44                                               ` Borislav Petkov
                                                                 ` (2 more replies)
  0 siblings, 3 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-25  2:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Arvind Sankar, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

From: Peter Zijlstra <peterz@infradead.org>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it
will OOPs.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between modes at runtime isn't supported and disabling
    is tracked per task, so hardware will always reach a steady state that
    matches the configured mode.  I.e. split-lock is guaranteed to be
    enabled in hardware once all _TIF_SLD threads have been scheduled out.

Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

tglx> Other than those details, I really like this approach.

Thanks for the review. Here is V15 with all your V14 comments addressed.

I did find something with a new test. Applications that hit a
split lock warn as expected. But if they sleep before they hit
a new split lock, we get another warning. This may be because
I messed up when fixing a PeterZ typo in the untested patch.
But I think there may have been bigger problems.

Context switch in V14 code did: 

       if (tifp & _TIF_SLD)
               switch_to_sld(prev_p);

void switch_to_sld(struct task_struct *prev)
{
       __sld_msr_set(true);
       clear_tsk_thread_flag(prev, TIF_SLD);
}

Which re-enables split lock checking for the next process to run. But
mysteriously clears the TIF_SLD bit on the previous task.

I think we need to consider TIF_SLD state of both previous and next
process when deciding what to do with the MSR. Three cases:

1) If they are both the same, leave the MSR alone, it is (probably) right (modulo
   the other thread having messed with it).
2) Next process has _TIF_SLD set ... disable checking
3) Next process doesn't have _TIF_SLD set ... enable checking

So please look closely at the new version of switch_to_sld() which is
now called unconditionally on every switch ... but commonly will do
nothing.

 .../admin-guide/kernel-parameters.txt         |  18 ++
 arch/x86/include/asm/cpu.h                    |  12 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/msr-index.h              |   9 +
 arch/x86/include/asm/thread_info.h            |   6 +-
 arch/x86/kernel/cpu/common.c                  |   2 +
 arch/x86/kernel/cpu/intel.c                   | 177 ++++++++++++++++++
 arch/x86/kernel/process.c                     |   2 +
 arch/x86/kernel/traps.c                       |  24 ++-
 9 files changed, 248 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7f1e2f327e43..27f61d44a37f 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3207,6 +3207,24 @@
 
 	nosoftlockup	[KNL] Disable the soft-lockup detector.
 
+	split_lock_detect=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will emit rate limited warnings
+				  about applications triggering the #AC exception
+
+			fatal	- the kernel will SIGBUS applications that
+				  trigger the #AC exception.
+
+			For anything other than 'off' the kernel will die if
+			it (or firmware) triggers #AC.
+
 	nosync		[HW,M68K] Disables sync negotiation for all devices.
 
 	nowatchdog	[KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..2dede2bbb7cf 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+extern void switch_to_sld(struct task_struct *, struct task_struct *);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+static inline void switch_to_sld(struct task_struct *prev, struct task_struct *next) {}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS			  0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e0d12517f348 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* Restore split lock detection on context switch */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -158,9 +160,9 @@ struct thread_info {
 
 #ifdef CONFIG_X86_IOPL_IOPERM
 # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
-				 _TIF_IO_BITMAP)
+				 _TIF_IO_BITMAP | _TIF_SLD)
 #else
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
 #endif
 
 #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..d9842c64e5af 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,20 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support
+ * split lock detection. split_lock_setup() will switch this
+ * to sld_warn on systems that support split lock detect, and
+ * then check to see if there is a command line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -606,6 +622,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -720,6 +738,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -981,3 +1001,160 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "fatal",	sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+	int len = strlen(opt);
+
+	return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+	char arg[20];
+	int i, ret;
+
+	sld_state = sld_warn;
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+				  arg, sizeof(arg));
+	if (ret < 0)
+		goto print;
+
+	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+		if (match_option(arg, ret, sld_options[i].option)) {
+			sld_state = sld_options[i].state;
+			break;
+		}
+	}
+
+print:
+	switch(sld_state) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * Locking is not required at the moment because only bit 29 of this
+ * MSR is implemented and locking would not prevent that the operation
+ * of one thread is immediately undone by the sibling thread.
+ */
+
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+}
+
+static void split_lock_init(void)
+{
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	__sld_msr_set(sld_off);
+	sld_state = sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+
+	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+		 current->comm, current->pid, regs->ip);
+
+	/*
+	 * Disable the split lock detection for this task so it can make
+	 * progress and set TIF_SLD so the detection is reenabled via
+	 * switch_to_sld() when the task is scheduled out.
+	 */
+	__sld_msr_set(false);
+	set_tsk_thread_flag(current, TIF_SLD);
+	return true;
+}
+
+void switch_to_sld(struct task_struct *prev, struct task_struct *next)
+{
+	bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
+	bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
+
+	/*
+	 * If we are switching between tasks that have the same
+	 * need for split lock checking, then the MSR is (probably)
+	 * right (modulo the other thread messing with it).
+	 * Otherwise look at whether the new task needs split
+	 * lock enabled.
+	 */
+	if (prevflag != nextflag)
+		__sld_msr_set(nextflag);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have the split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..b34d359c4e39 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	switch_to_sld(prev_p, next_p);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..884e8e59dafd 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
 #include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	const char str[] = "alignment check";
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	local_irq_enable();
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25  2:47                                             ` [PATCH v15] " Luck, Tony
@ 2020-01-25 10:44                                               ` Borislav Petkov
  2020-01-25 19:55                                                 ` Luck, Tony
  2020-01-25 13:41                                               ` Thomas Gleixner
  2020-01-25 21:25                                               ` [PATCH v15] " Arvind Sankar
  2 siblings, 1 reply; 145+ messages in thread
From: Borislav Petkov @ 2020-01-25 10:44 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> A split-lock occurs when an atomic instruction operates on data
> that spans two cache lines. In order to maintain atomicity the
> core takes a global bus lock.
> 
> This is typically >1000 cycles slower than an atomic operation
> within a cache line. It also disrupts performance on other cores
> (which must wait for the bus lock to be released before their
> memory operations can complete). For real-time systems this may
> mean missing deadlines. For other systems it may just be very
> annoying.
> 
> Some CPUs have the capability to raise an #AC trap when a
> split lock is attempted.
> 
> Provide a command line option to give the user choices on how
> to handle this. split_lock_detect=
> 	off	- not enabled (no traps for split locks)
> 	warn	- warn once when an application does a
> 		  split lock, but allow it to continue
> 		  running.
> 	fatal	- Send SIGBUS to applications that cause split lock
> 
> On systems that support split lock detection the default is "warn". Note
> that if the kernel hits a split lock in any mode other than "off" it
> will OOPs.
> 
> One implementation wrinkle is that the MSR to control the
> split lock detection is per-core, not per thread. This might
> result in some short lived races on HT systems in "warn" mode
> if Linux tries to enable on one thread while disabling on
> the other. Race analysis by Sean Christopherson:
> 
>   - Toggling of split-lock is only done in "warn" mode.  Worst case
>     scenario of a race is that a misbehaving task will generate multiple
>     #AC exceptions on the same instruction.  And this race will only occur
>     if both siblings are running tasks that generate split-lock #ACs, e.g.
>     a race where sibling threads are writing different values will only
>     occur if CPUx is disabling split-lock after an #AC and CPUy is
>     re-enabling split-lock after *its* previous task generated an #AC.
>   - Transitioning between modes at runtime isn't supported and disabling
>     is tracked per task, so hardware will always reach a steady state that
>     matches the configured mode.  I.e. split-lock is guaranteed to be
>     enabled in hardware once all _TIF_SLD threads have been scheduled out.

I think this "wrinkle" needs to be written down somewhere more prominent
- not in the commit message only - so that people can find it when using
the thing and start seeing the multiple #ACs on the same insn.

> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
> Co-developed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Tony Luck <tony.luck@intel.com>

checkpatch is bitching here:

WARNING: Co-developed-by: must be immediately followed by Signed-off-by:
#66: 
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
WARNING: Co-developed-by and Signed-off-by: name/email do not match 
#67: 
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

> ---
> 
> tglx> Other than those details, I really like this approach.
> 
> Thanks for the review. Here is V15 with all your V14 comments addressed.
> 
> I did find something with a new test. Applications that hit a
> split lock warn as expected. But if they sleep before they hit
> a new split lock, we get another warning. This may be because
> I messed up when fixing a PeterZ typo in the untested patch.
> But I think there may have been bigger problems.
> 
> Context switch in V14 code did: 
> 
>        if (tifp & _TIF_SLD)
>                switch_to_sld(prev_p);
> 
> void switch_to_sld(struct task_struct *prev)
> {
>        __sld_msr_set(true);
>        clear_tsk_thread_flag(prev, TIF_SLD);
> }
> 
> Which re-enables split lock checking for the next process to run. But
> mysteriously clears the TIF_SLD bit on the previous task.
> 
> I think we need to consider TIF_SLD state of both previous and next
> process when deciding what to do with the MSR. Three cases:
> 
> 1) If they are both the same, leave the MSR alone, it is (probably) right (modulo
>    the other thread having messed with it).
> 2) Next process has _TIF_SLD set ... disable checking
> 3) Next process doesn't have _TIF_SLD set ... enable checking
> 
> So please look closely at the new version of switch_to_sld() which is
> now called unconditionally on every switch ... but commonly will do
> nothing.
> 
>  .../admin-guide/kernel-parameters.txt         |  18 ++
>  arch/x86/include/asm/cpu.h                    |  12 ++
>  arch/x86/include/asm/cpufeatures.h            |   2 +
>  arch/x86/include/asm/msr-index.h              |   9 +
>  arch/x86/include/asm/thread_info.h            |   6 +-
>  arch/x86/kernel/cpu/common.c                  |   2 +
>  arch/x86/kernel/cpu/intel.c                   | 177 ++++++++++++++++++
>  arch/x86/kernel/process.c                     |   2 +
>  arch/x86/kernel/traps.c                       |  24 ++-
>  9 files changed, 248 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 7f1e2f327e43..27f61d44a37f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3207,6 +3207,24 @@
>  
>  	nosoftlockup	[KNL] Disable the soft-lockup detector.
>  
> +	split_lock_detect=

Needs to be alphabetically sorted.

> +			[X86] Enable split lock detection
> +
> +			When enabled (and if hardware support is present), atomic
> +			instructions that access data across cache line
> +			boundaries will result in an alignment check exception.
> +
> +			off	- not enabled
> +
> +			warn	- the kernel will emit rate limited warnings
> +				  about applications triggering the #AC exception
> +
> +			fatal	- the kernel will SIGBUS applications that

"... the kernel will send a SIGBUG to applications..."

> +				  trigger the #AC exception.
> +
> +			For anything other than 'off' the kernel will die if
> +			it (or firmware) triggers #AC.

Why would the kernel die in the "warn" case? It prints ratelimited
warnings only, if I'm reading this help text correctly. Commit message says

" Note that if the kernel hits a split lock in any mode other than
"off" it will OOPs."

but this text doesn't say why and leaves people scratching heads and
making them look at the code...

/me scrolls down

aaha, you mean this:

        if (!user_mode(regs))
                die("Split lock detected\n", regs, error_code);

so what you're trying to say is, "if an #AC exception is hit in the
kernel or the firmware - not in a user task - then we will oops."

Yes?

If so, pls extend so that it is clear what this means.

And the default setting is? I.e., put a short sentence after "warn"
saying so.

> +
>  	nosync		[HW,M68K] Disables sync negotiation for all devices.
>  
>  	nowatchdog	[KNL] Disable both lockup detectors, i.e.
> diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
> index adc6cc86b062..2dede2bbb7cf 100644
> --- a/arch/x86/include/asm/cpu.h
> +++ b/arch/x86/include/asm/cpu.h
> @@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
>  unsigned int x86_family(unsigned int sig);
>  unsigned int x86_model(unsigned int sig);
>  unsigned int x86_stepping(unsigned int sig);
> +#ifdef CONFIG_CPU_SUP_INTEL
> +extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
> +extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
> +extern void switch_to_sld(struct task_struct *, struct task_struct *);

WARNING: function definition argument 'struct task_struct *' should also have an identifier name
#160: FILE: arch/x86/include/asm/cpu.h:46:
+extern void switch_to_sld(struct task_struct *, struct task_struct *);

> +#else
> +static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
> +static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> +	return false;
> +}
> +static inline void switch_to_sld(struct task_struct *prev, struct task_struct *next) {}
> +#endif
>  #endif /* _ASM_X86_CPU_H */
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index f3327cb56edf..cd56ad5d308e 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -285,6 +285,7 @@
>  #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
>  #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
>  #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
> +#define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */

Do you really want to have "split_lock_detect" in /proc/cpuinfo or
rather something shorter?

>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
> @@ -367,6 +368,7 @@
>  #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
>  #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
>  #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
> +#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
>  #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
>  
>  /*
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index ebe1685e92dd..8821697a7549 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -41,6 +41,10 @@
>  
>  /* Intel MSRs. Some also available on other CPUs */
>  
> +#define MSR_TEST_CTRL				0x00000033
> +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
> +#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
> +
>  #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
>  #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
>  #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
> @@ -70,6 +74,11 @@
>   */
>  #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
>  
> +/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
> +#define MSR_IA32_CORE_CAPS			  0x000000cf
> +#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT  5
> +#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
> +
>  #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
>  #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
>  #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
> diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
> index cf4327986e98..e0d12517f348 100644
> --- a/arch/x86/include/asm/thread_info.h
> +++ b/arch/x86/include/asm/thread_info.h
> @@ -92,6 +92,7 @@ struct thread_info {
>  #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
>  #define TIF_NOTSC		16	/* TSC is not accessible in userland */
>  #define TIF_IA32		17	/* IA32 compatibility process */
> +#define TIF_SLD			18	/* Restore split lock detection on context switch */
>  #define TIF_NOHZ		19	/* in adaptive nohz mode */
>  #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
>  #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
> @@ -122,6 +123,7 @@ struct thread_info {
>  #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
>  #define _TIF_NOTSC		(1 << TIF_NOTSC)
>  #define _TIF_IA32		(1 << TIF_IA32)
> +#define _TIF_SLD		(1 << TIF_SLD)
>  #define _TIF_NOHZ		(1 << TIF_NOHZ)
>  #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
>  #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
> @@ -158,9 +160,9 @@ struct thread_info {
>  
>  #ifdef CONFIG_X86_IOPL_IOPERM
>  # define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
> -				 _TIF_IO_BITMAP)
> +				 _TIF_IO_BITMAP | _TIF_SLD)
>  #else
> -# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
> +# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)

Can you fix those while at it pls:

ERROR: need consistent spacing around '|' (ctx:VxW)
#245: FILE: arch/x86/include/asm/thread_info.h:165:
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
                              	                ^
>  #endif
>  
>  #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 86b8241c8209..adb2f639f388 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>  
>  	cpu_set_bug_bits(c);
>  
> +	cpu_set_core_cap_bits(c);
> +
>  	fpu__init_system(c);
>  
>  #ifdef CONFIG_X86_32
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 57473e2c0869..d9842c64e5af 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -19,6 +19,8 @@
>  #include <asm/microcode_intel.h>
>  #include <asm/hwcap2.h>
>  #include <asm/elf.h>
> +#include <asm/cpu_device_id.h>
> +#include <asm/cmdline.h>
>  
>  #ifdef CONFIG_X86_64
>  #include <linux/topology.h>
> @@ -31,6 +33,20 @@
>  #include <asm/apic.h>
>  #endif
>  
> +enum split_lock_detect_state {
> +	sld_off = 0,
> +	sld_warn,
> +	sld_fatal,
> +};
> +
> +/*
> + * Default to sld_off because most systems do not support
> + * split lock detection. split_lock_setup() will switch this
> + * to sld_warn on systems that support split lock detect, and
> + * then check to see if there is a command line override.
> + */

That comment is shorter than 80 cols while others below aren't.

> +static enum split_lock_detect_state sld_state = sld_off;
> +
>  /*
>   * Just in case our CPU detection goes bad, or you have a weird system,
>   * allow a way to override the automatic disabling of MPX.
> @@ -606,6 +622,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
>  	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
>  }
>  
> +static void split_lock_init(void);
> +
>  static void init_intel(struct cpuinfo_x86 *c)
>  {
>  	early_init_intel(c);
> @@ -720,6 +738,8 @@ static void init_intel(struct cpuinfo_x86 *c)
>  		tsx_enable();
>  	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
>  		tsx_disable();
> +
> +	split_lock_init();
>  }
>  
>  #ifdef CONFIG_X86_32
> @@ -981,3 +1001,160 @@ static const struct cpu_dev intel_cpu_dev = {
>  };
>  
>  cpu_dev_register(intel_cpu_dev);
> +
> +#undef pr_fmt
> +#define pr_fmt(fmt) "x86/split lock detection: " fmt
> +
> +static const struct {
> +	const char			*option;
> +	enum split_lock_detect_state	state;
> +} sld_options[] __initconst = {
> +	{ "off",	sld_off   },
> +	{ "warn",	sld_warn  },
> +	{ "fatal",	sld_fatal },
> +};
> +
> +static inline bool match_option(const char *arg, int arglen, const char *opt)
> +{
> +	int len = strlen(opt);
> +
> +	return len == arglen && !strncmp(arg, opt, len);
> +}

There's the same function in arch/x86/kernel/cpu/bugs.c. Why are you
duplicating it here?

Yeah, this whole chunk looks like it has been "influenced" by the sec
mitigations in bugs.c :-)

> +static void __init split_lock_setup(void)
> +{
> +	char arg[20];
> +	int i, ret;
> +
> +	sld_state = sld_warn;
> +	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
> +
> +	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
> +				  arg, sizeof(arg));
> +	if (ret < 0)
> +		goto print;
> +
> +	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
> +		if (match_option(arg, ret, sld_options[i].option)) {
> +			sld_state = sld_options[i].state;
> +			break;
> +		}
> +	}
> +
> +print:
> +	switch(sld_state) {

ERROR: space required before the open parenthesis '('
#359: FILE: arch/x86/kernel/cpu/intel.c:1045:
+	switch(sld_state) {

> +	case sld_off:
> +		pr_info("disabled\n");
> +		break;
> +
> +	case sld_warn:
> +		pr_info("warning about user-space split_locks\n");
> +		break;
> +
> +	case sld_fatal:
> +		pr_info("sending SIGBUS on user-space split_locks\n");
> +		break;
> +	}
> +}
> +
> +/*
> + * Locking is not required at the moment because only bit 29 of this
> + * MSR is implemented and locking would not prevent that the operation
> + * of one thread is immediately undone by the sibling thread.
> + */
> +

^ Superfluous newline.

> +static bool __sld_msr_set(bool on)
> +{
> +	u64 test_ctrl_val;
> +
> +	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
> +		return false;
> +
> +	if (on)
> +		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +	else
> +		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> +	return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
> +}
> +
> +static void split_lock_init(void)
> +{
> +	if (sld_state == sld_off)
> +		return;
> +
> +	if (__sld_msr_set(true))
> +		return;
> +
> +	/*
> +	 * If this is anything other than the boot-cpu, you've done
> +	 * funny things and you get to keep whatever pieces.
> +	 */
> +	pr_warn("MSR fail -- disabled\n");

What's that for? Guests?

> +	__sld_msr_set(sld_off);
> +	sld_state = sld_off;
> +}
> +
> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
> +{
> +	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +		return false;
> +
> +	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> +		 current->comm, current->pid, regs->ip);
> +
> +	/*
> +	 * Disable the split lock detection for this task so it can make
> +	 * progress and set TIF_SLD so the detection is reenabled via
> +	 * switch_to_sld() when the task is scheduled out.
> +	 */
> +	__sld_msr_set(false);
> +	set_tsk_thread_flag(current, TIF_SLD);
> +	return true;
> +}
> +
> +void switch_to_sld(struct task_struct *prev, struct task_struct *next)

This will get called on other vendors but let's just assume, for
simplicity's sake, TIF_SLD won't be set there so it is only a couple of
insns on a task switch going to waste.

> +{
> +	bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
> +	bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
> +
> +	/*
> +	 * If we are switching between tasks that have the same
> +	 * need for split lock checking, then the MSR is (probably)
> +	 * right (modulo the other thread messing with it).
> +	 * Otherwise look at whether the new task needs split
> +	 * lock enabled.
> +	 */
> +	if (prevflag != nextflag)
> +		__sld_msr_set(nextflag);
> +}
> +
> +#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
> +
> +/*
> + * The following processors have split lock detection feature. But since they
> + * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
> + * the MSR. So enumerate the feature by family and model on these processors.
> + */
> +static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
> +	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
> +	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
> +	{}
> +};
> +
> +void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> +{
> +	u64 ia32_core_caps = 0;

So this gets called on other vendors too and even if they should not
have set X86_FEATURE_CORE_CAPABILITIES, a vendor check here would be
prudent for the future:

	if (c->x86_vendor != X86_VENDOR_INTEL)
		return;

> +
> +	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
> +		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
> +		rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
> +	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> +		/* Enumerate split lock detection by family and model. */
> +		if (x86_match_cpu(split_lock_cpu_ids))
> +			ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
> +	}
> +
> +	if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
> +		split_lock_setup();
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 839b5244e3b7..b34d359c4e39 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -650,6 +650,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
>  		/* Enforce MSR update to ensure consistent state */
>  		__speculation_ctrl_update(~tifn, tifn);
>  	}
> +
> +	switch_to_sld(prev_p, next_p);
>  }
>  
>  /*
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 9e6f822922a3..884e8e59dafd 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -46,6 +46,7 @@
>  #include <asm/traps.h>
>  #include <asm/desc.h>
>  #include <asm/fpu/internal.h>
> +#include <asm/cpu.h>
>  #include <asm/cpu_entry_area.h>
>  #include <asm/mce.h>
>  #include <asm/fixmap.h>
> @@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
>  {
>  	struct task_struct *tsk = current;
>  
> -
>  	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
>  		return;
>  
> @@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
>  DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
>  DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
>  DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
> -DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
>  #undef IP
>  
> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +	const char str[] = "alignment check";

WARNING: const array should probably be static const
#517: FILE: arch/x86/kernel/traps.c:297:
+	const char str[] = "alignment check";

> +
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
> +		return;
> +
> +	if (!user_mode(regs))
> +		die("Split lock detected\n", regs, error_code);
> +
> +	local_irq_enable();
> +
> +	if (handle_user_split_lock(regs, error_code))
> +		return;
> +
> +	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> +		error_code, BUS_ADRALN, NULL);
> +}
> +
>  #ifdef CONFIG_VMAP_STACK
>  __visible void __noreturn handle_stack_overflow(const char *message,
>  						struct pt_regs *regs,
> -- 
> 2.21.1
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25  2:47                                             ` [PATCH v15] " Luck, Tony
  2020-01-25 10:44                                               ` Borislav Petkov
@ 2020-01-25 13:41                                               ` Thomas Gleixner
  2020-01-25 22:07                                                 ` [PATCH v16] " Luck, Tony
  2020-01-25 21:25                                               ` [PATCH v15] " Arvind Sankar
  2 siblings, 1 reply; 145+ messages in thread
From: Thomas Gleixner @ 2020-01-25 13:41 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Arvind Sankar, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

Tony,

"Luck, Tony" <tony.luck@intel.com> writes:
> +
> +void switch_to_sld(struct task_struct *prev, struct task_struct *next)
> +{
> +	bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
> +	bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
> +
> +	/*
> +	 * If we are switching between tasks that have the same
> +	 * need for split lock checking, then the MSR is (probably)
> +	 * right (modulo the other thread messing with it).
> +	 * Otherwise look at whether the new task needs split
> +	 * lock enabled.
> +	 */
> +	if (prevflag != nextflag)
> +		__sld_msr_set(nextflag);
> +}
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 839b5244e3b7..b34d359c4e39 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -650,6 +650,8 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
>  		/* Enforce MSR update to ensure consistent state */
>  		__speculation_ctrl_update(~tifn, tifn);
>  	}
> +
> +	switch_to_sld(prev_p, next_p);

This really wants to follow the logic of the other TIF checks.

        if ((tifp ^ tifn) & _TIF_SLD)
        	switch_to_sld(tifn);

and

void switch_to_sld(tifn)
{
        __sld_msr_set(tif & _TIF_SLD);
}

That reuses tifp, tifn which are ready to consume there and calls only
out of line when the bits differ. The xor/and combo turned out to result
in the most efficient code.
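
If it helps, here is a throwaway user-space sketch (made-up flag words,
not kernel code) showing that the xor/and test fires only when the
TIF_SLD bit actually differs between the outgoing and the incoming task:

#include <stdio.h>

#define TIF_SLD		18
#define _TIF_SLD	(1UL << TIF_SLD)

int main(void)
{
	unsigned long cases[][2] = {
		{ 0,		0		},	/* neither has TIF_SLD: skip */
		{ _TIF_SLD,	_TIF_SLD	},	/* both have it: skip        */
		{ 0,		_TIF_SLD	},	/* bit differs: call         */
		{ _TIF_SLD,	0		},	/* bit differs: call         */
	};

	for (int i = 0; i < 4; i++) {
		unsigned long tifp = cases[i][0], tifn = cases[i][1];

		printf("tifp=%#lx tifn=%#lx -> %s\n", tifp, tifn,
		       ((tifp ^ tifn) & _TIF_SLD) ? "switch_to_sld(tifn)" : "skip");
	}
	return 0;
}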

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 10:44                                               ` Borislav Petkov
@ 2020-01-25 19:55                                                 ` Luck, Tony
  2020-01-25 20:12                                                   ` Peter Zijlstra
  2020-01-25 20:29                                                   ` Borislav Petkov
  0 siblings, 2 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-25 19:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 11:44:19AM +0100, Borislav Petkov wrote:

Boris,

Thanks for the review. All comments accepted and changes made, except as
listed below.  Also will fix up some other checkpatch fluff.

-Tony


> > +#define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
> 
> Do you really want to have "split_lock_detect" in /proc/cpuinfo or
> rather something shorter?

I don't have a good abbreviation.  It would become the joint 2nd longest
flag name ... top ten lengths look like this on my test machine. So while
long, not unprecedented.

18 tsc_deadline_timer
17 split_lock_detect
17 arch_capabilities
16 avx512_vpopcntdq
14 tsc_known_freq
14 invpcid_single
14 hwp_act_window
13 ibrs_enhanced
13 cqm_occup_llc
13 cqm_mbm_total
13 cqm_mbm_local
13 avx512_bitalg
13 3dnowprefetch


> > +static inline bool match_option(const char *arg, int arglen, const char *opt)
> > +{
> > +	int len = strlen(opt);
> > +
> > +	return len == arglen && !strncmp(arg, opt, len);
> > +}
> 
> There's the same function in arch/x86/kernel/cpu/bugs.c. Why are you
> duplicating it here?
> 
> Yeah, this whole chunk looks like it has been "influenced" by the sec
> mitigations in bugs.c :-)

Blame PeterZ for that. For now I'd like to add the duplicate inline function
and then clean up by putting it into some header file (and maybe hunting down
other places where it could be used).

> > +	/*
> > +	 * If this is anything other than the boot-cpu, you've done
> > +	 * funny things and you get to keep whatever pieces.
> > +	 */
> > +	pr_warn("MSR fail -- disabled\n");
> 
> What's that for? Guests?

Also some PeterZ code. As the comment implies we really shouldn't be able
to get here. This whole function should only be called on CPU models that
support the MSR ... but PeterZ is defending against the situation that sometimes
there are special SKUs with the same model number (since we may be here because
of an x86_match_cpu() hit, rather than the architectural enumeration check).

> > +void switch_to_sld(struct task_struct *prev, struct task_struct *next)
> 
> This will get called on other vendors but let's just assume, for
> simplicity's sake, TIF_SLD won't be set there so it is only a couple of
> insns on a task switch going to waste.

Thomas explained how to fix it so we only call the function if TIF_SLD
is set in either the previous or next process (but not both). So the
overhead is just extra XOR/AND in the caller.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 19:55                                                 ` Luck, Tony
@ 2020-01-25 20:12                                                   ` Peter Zijlstra
  2020-01-25 20:33                                                     ` Borislav Petkov
  2020-01-25 20:29                                                   ` Borislav Petkov
  1 sibling, 1 reply; 145+ messages in thread
From: Peter Zijlstra @ 2020-01-25 20:12 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, Thomas Gleixner, Arvind Sankar, Christopherson,
	Sean J, Ingo Molnar, Yu, Fenghua, Ingo Molnar, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 11:55:13AM -0800, Luck, Tony wrote:
> > > +static inline bool match_option(const char *arg, int arglen, const char *opt)
> > > +{
> > > +	int len = strlen(opt);
> > > +
> > > +	return len == arglen && !strncmp(arg, opt, len);
> > > +}
> > 
> > There's the same function in arch/x86/kernel/cpu/bugs.c. Why are you
> > duplicating it here?
> > 
> > Yeah, this whole chunk looks like it has been "influenced" by the sec
> > mitigations in bugs.c :-)
> 
> Blame PeterZ for that. For now I'd like to add the duplicate inline function
> and then clean up by putting it into some header file (and maybe hunting down
> other places where it could be used).

Yeah, I copy/paste cobbled that together. I figured it was easier to
'borrow' something that worked and adapt it than try and write
something new in a hurry.

> > > +	/*
> > > +	 * If this is anything other than the boot-cpu, you've done
> > > +	 * funny things and you get to keep whatever pieces.
> > > +	 */
> > > +	pr_warn("MSR fail -- disabled\n");
> > 
> > What's that for? Guests?
> 
> Also some PeterZ code. As the comment implies we really shouldn't be able
> to get here. This whole function should only be called on CPU models that
> support the MSR ... but PeterZ is defending against the situation that sometimes
> there are special SKUs with the same model number (since we may be here because
> of an x86_match_cpu() hit, rather than the architectural enumeration check).

My thinking was Virt, virt likes to mess up all msr expectations.


^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 19:55                                                 ` Luck, Tony
  2020-01-25 20:12                                                   ` Peter Zijlstra
@ 2020-01-25 20:29                                                   ` Borislav Petkov
  1 sibling, 0 replies; 145+ messages in thread
From: Borislav Petkov @ 2020-01-25 20:29 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 11:55:13AM -0800, Luck, Tony wrote:
> I don't have a good abbreviation.  It would become the joint 2nd longest
> flag name ... top ten lengths look like this on my test machine. So while
> long, not unprecedented.

Yah, I guess we lost that battle long ago.

> Thomas explained how to fix it so we only call the function if TIF_SLD
> is set in either the previous or next process (but not both). So the
> overhead is just extra XOR/AND in the caller.

Yeah.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 20:12                                                   ` Peter Zijlstra
@ 2020-01-25 20:33                                                     ` Borislav Petkov
  2020-01-25 21:42                                                       ` Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Borislav Petkov @ 2020-01-25 20:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Luck, Tony, Thomas Gleixner, Arvind Sankar, Christopherson,
	Sean J, Ingo Molnar, Yu, Fenghua, Ingo Molnar, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 09:12:21PM +0100, Peter Zijlstra wrote:
> > Blame PeterZ for that. For now I'd like to add the duplicate inline function
> > and then clean up by putting it into some header file (and maybe hunting down
> > other places where it could be used).

Sounds like a good plan.

> Yeah, I copy/paste cobbled that together. I figured it was easier to
> 'borrow' something that worked and adapt it than try and write
> something new in a hurry.

Yeah.

> > Also some PeterZ code. As the comment implies we really shouldn't be able
> > to get here. This whole function should only be called on CPU models that
> > support the MSR ... but PeterZ is defending against the situation that sometimes
> > there are special SKUs with the same model number (since we may be here because
> > of an x86_match_cpu() hit, rather than the architectural enumeration check).
> 
> My thinking was Virt, virt likes to mess up all msr expectations.

My only worry is to have it written down why we're doing this so that it
can be changed/removed later, when we've forgotten all about split lock.
Because pretty often we look at a comment-less chunk of code and wonder,
"why the hell did we add this in the first place."

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25  2:47                                             ` [PATCH v15] " Luck, Tony
  2020-01-25 10:44                                               ` Borislav Petkov
  2020-01-25 13:41                                               ` Thomas Gleixner
@ 2020-01-25 21:25                                               ` Arvind Sankar
  2020-01-25 21:50                                                 ` Luck, Tony
  2020-01-27  8:02                                                 ` Peter Zijlstra
  2 siblings, 2 replies; 145+ messages in thread
From: Arvind Sankar @ 2020-01-25 21:25 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> I did find something with a new test. Applications that hit a
> split lock warn as expected. But if they sleep before they hit
> > a new split lock, we get another warning. This may be because
> I messed up when fixing a PeterZ typo in the untested patch.
> But I think there may have been bigger problems.
> 
> Context switch in V14 code did: 
> 
>        if (tifp & _TIF_SLD)
>                switch_to_sld(prev_p);
> 
> void switch_to_sld(struct task_struct *prev)
> {
>        __sld_msr_set(true);
>        clear_tsk_thread_flag(prev, TIF_SLD);
> }
> 
> Which re-enables split lock checking for the next process to run. But
> mysteriously clears the TIF_SLD bit on the previous task.

Did Peter mean to disable it only for the current timeslice and
re-enable it for the next time it's scheduled?

> 
> I think we need to consider TIF_SLD state of both previous and next
> process when deciding what to do with the MSR. Three cases:
> 
> 1) If they are both the same, leave the MSR alone it is (probably) right (modulo
>    the other thread having messed with it).
> 2) Next process has _TIF_SLD set ... disable checking
> 3) Next process doesn't have _TIF_SLD set ... enable checking
> 
> So please look closely at the new version of switch_to_sld() which is
> > now called unconditionally on every switch ... but commonly will do
> nothing.
...
> +	/*
> +	 * Disable the split lock detection for this task so it can make
> +	 * progress and set TIF_SLD so the detection is reenabled via
> +	 * switch_to_sld() when the task is scheduled out.
> +	 */
> +	__sld_msr_set(false);
> +	set_tsk_thread_flag(current, TIF_SLD);
> +	return true;
> +}
> +
> +void switch_to_sld(struct task_struct *prev, struct task_struct *next)
> +{
> +	bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
> +	bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
> +
> +	/*
> +	 * If we are switching between tasks that have the same
> +	 * need for split lock checking, then the MSR is (probably)
> +	 * right (modulo the other thread messing with it.
> +	 * Otherwise look at whether the new task needs split
> +	 * lock enabled.
> +	 */
> +	if (prevflag != nextflag)
> +		__sld_msr_set(nextflag);
> +}

I might be missing something but shouldn't this be !nextflag given the
flag being unset is when the task wants sld?

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 20:33                                                     ` Borislav Petkov
@ 2020-01-25 21:42                                                       ` Luck, Tony
  2020-01-25 22:17                                                         ` Borislav Petkov
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-25 21:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Peter Zijlstra, Thomas Gleixner, Arvind Sankar, Christopherson,
	Sean J, Ingo Molnar, Yu, Fenghua, Ingo Molnar, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 09:33:12PM +0100, Borislav Petkov wrote:
> On Sat, Jan 25, 2020 at 09:12:21PM +0100, Peter Zijlstra wrote:
> > My thinking was Virt, virt likes to mess up all msr expectations.
> 
> My only worry is to have it written down why we're doing this so that it
> can be changed/removed later, when we've forgotten all about split lock.
> Because pretty often we look at a comment-less chunk of code and wonder,
> "why the hell did we add this in the first place."

Ok. I added a comment:

 * Use the "safe" versions of rdmsr/wrmsr here because although code
 * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
 * exist, there may be glitches in virtualization that leave a guest
 * with an incorrect view of real h/w capabilities.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 21:25                                               ` [PATCH v15] " Arvind Sankar
@ 2020-01-25 21:50                                                 ` Luck, Tony
  2020-01-25 23:51                                                   ` Arvind Sankar
  2020-01-27  8:04                                                   ` Peter Zijlstra
  2020-01-27  8:02                                                 ` Peter Zijlstra
  1 sibling, 2 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-25 21:50 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 04:25:25PM -0500, Arvind Sankar wrote:
> On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> > I did find something with a new test. Applications that hit a
> > split lock warn as expected. But if they sleep before they hit
> > a new split lock, we get another warning. This may be because
> > I messed up when fixing a PeterZ typo in the untested patch.
> > But I think there may have been bigger problems.
> > 
> > Context switch in V14 code did: 
> > 
> >        if (tifp & _TIF_SLD)
> >                switch_to_sld(prev_p);
> > 
> > void switch_to_sld(struct task_struct *prev)
> > {
> >        __sld_msr_set(true);
> >        clear_tsk_thread_flag(prev, TIF_SLD);
> > }
> > 
> > Which re-enables split lock checking for the next process to run. But
> > mysteriously clears the TIF_SLD bit on the previous task.
> 
> Did Peter mean to disable it only for the current timeslice and
> re-enable it for the next time it's scheduled?

He's seen and commented on this thread since I made this comment. So
I'll assume not.  Things get really noisy on the console (even with
the rate limit) if split lock detection is re-enabled after a context
switch (my new test highlighted this!)
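
For reference, the test itself is nothing fancy: conceptually it is a
locked read-modify-write of a 4-byte value that straddles a cache line,
a sleep, then the same access again. A rough user-space sketch (assuming
64-byte cache lines; this is not the actual test source):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	/* 64-byte cache lines assumed; put a 4-byte counter across a boundary */
	char *buf = aligned_alloc(64, 128);

	if (!buf)
		return 1;

	uint32_t *counter = (uint32_t *)(buf + 62);	/* spans bytes 62..65 */

	__atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);	/* first split lock: warns */
	sleep(1);
	__atomic_fetch_add(counter, 1, __ATOMIC_SEQ_CST);	/* second one, after the sleep */

	printf("done\n");
	free(buf);
	return 0;
}

With split lock detection getting re-enabled on the context switch, the
second __atomic_fetch_add() warns all over again, which is where the
console noise comes from.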

> > +void switch_to_sld(struct task_struct *prev, struct task_struct *next)
> > +{
> > +	bool prevflag = test_tsk_thread_flag(prev, TIF_SLD);
> > +	bool nextflag = test_tsk_thread_flag(next, TIF_SLD);
> > +
> > +	/*
> > +	 * If we are switching between tasks that have the same
> > +	 * need for split lock checking, then the MSR is (probably)
> > +	 * right (modulo the other thread messing with it.
> > +	 * Otherwise look at whether the new task needs split
> > +	 * lock enabled.
> > +	 */
> > +	if (prevflag != nextflag)
> > +		__sld_msr_set(nextflag);
> > +}
> 
> I might be missing something but shouldn't this be !nextflag given the
> flag being unset is when the task wants sld?

That logic is convoluted ... but Thomas showed me a much better
way that is also much simpler ... so this code has gone now. The
new version is far easier to read (the argument is the flags of the new
task that we are switching to):

void switch_to_sld(unsigned long tifn)
{
        __sld_msr_set(tifn & _TIF_SLD);
}

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* [PATCH v16] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 13:41                                               ` Thomas Gleixner
@ 2020-01-25 22:07                                                 ` Luck, Tony
  2020-01-25 22:43                                                   ` Mark D Rustad
  2020-01-26  0:34                                                   ` [PATCH v16] " Andy Lutomirski
  0 siblings, 2 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-25 22:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Arvind Sankar, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it
will OOPs.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between modes at runtime isn't supported and disabling
    is tracked per task, so hardware will always reach a steady state that
    matches the configured mode.  I.e. split-lock is guaranteed to be
    enabled in hardware once all _TIF_SLD threads have been scheduled out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

V16:

Thomas: Rewrote the context switch as you suggested with XOR/AND
	to avoid the function call when TIF_SLD hasn't changed
Boris:	Fixed up all the bits from your comments (except the few
	that I listed in the reply to your e-mail). I think the
	only outstanding item is a followup patch to remove
	the duplicate match_option() inline function pasted from
	cpu/bugs.c ... we can bikeshed what to name it in another
	thread.

 .../admin-guide/kernel-parameters.txt         |  22 +++
 arch/x86/include/asm/cpu.h                    |  12 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/msr-index.h              |   9 +
 arch/x86/include/asm/thread_info.h            |   8 +-
 arch/x86/kernel/cpu/common.c                  |   2 +
 arch/x86/kernel/cpu/intel.c                   | 177 ++++++++++++++++++
 arch/x86/kernel/process.c                     |   3 +
 arch/x86/kernel/traps.c                       |  24 ++-
 9 files changed, 254 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7f1e2f327e43..869afed16154 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4625,6 +4625,28 @@
 	spia_pedr=
 	spia_peddr=
 
+	split_lock_detect=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will emit rate limited warnings
+				  about applications triggering the #AC
+				  exception. This mode is the default on h/w
+				  that supports split lock detection.
+
+			fatal	- the kernel will send SIGBUS to applications
+				  that trigger the #AC exception.
+
+			If an #AC exception is hit in the kernel or in
+			firmware (i.e. not while executing in user mode)
+			then Linux will oops in either "warn" or "fatal"
+			mode.
+
 	srcutree.counter_wrap_check [KNL]
 			Specifies how frequently to check for
 			grace-period sequence counter wrap for the
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..ff6f3ca649b3 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void switch_to_sld(unsigned long tifn);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void switch_to_sld(unsigned long tifn) {}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS			  0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e399dcefc2a7 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* Restore split lock detection on context switch */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -157,10 +159,10 @@ struct thread_info {
 #endif
 
 #ifdef CONFIG_X86_IOPL_IOPERM
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
-				 _TIF_IO_BITMAP)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW | _TIF_USER_RETURN_NOTIFY | \
+				 _TIF_IO_BITMAP | _TIF_SLD)
 #else
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW | _TIF_USER_RETURN_NOTIFY | _TIF_SLD)
 #endif
 
 #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..a84de224ffb0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,19 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support split lock detection.
+ * split_lock_setup() will switch this to sld_warn on systems that support
+ * split lock detect, and then check to see if there is a command line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -606,6 +621,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -720,6 +737,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -981,3 +1000,161 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "fatal",	sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+	int len = strlen(opt);
+
+	return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+	char arg[20];
+	int i, ret;
+
+	sld_state = sld_warn;
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+				  arg, sizeof(arg));
+	if (ret < 0)
+		goto print;
+
+	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+		if (match_option(arg, ret, sld_options[i].option)) {
+			sld_state = sld_options[i].state;
+			break;
+		}
+	}
+
+print:
+	switch (sld_state) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * Locking is not required at the moment because only bit 29 of this
+ * MSR is implemented and locking would not prevent that the operation
+ * of one thread is immediately undone by the sibling thread.
+ * Use the "safe" versions of rdmsr/wrmsr here because although code
+ * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
+ * exist, there may be glitches in virtualization that leave a guest
+ * with an incorrect view of real h/w capabilities.
+ */
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+}
+
+static void split_lock_init(void)
+{
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	__sld_msr_set(sld_off);
+	sld_state = sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+
+	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+			    current->comm, current->pid, regs->ip);
+
+	/*
+	 * Disable the split lock detection for this task so it can make
+	 * progress and set TIF_SLD so the detection is re-enabled via
+	 * switch_to_sld() when the task is scheduled out.
+	 */
+	__sld_msr_set(false);
+	set_tsk_thread_flag(current, TIF_SLD);
+	return true;
+}
+
+/*
+ * This function is called only when switching between tasks with
+ * different split-lock detection modes. It sets the MSR for the
+ * mode of the new task. This is right most of the time, but since
+ * the MSR is shared by hyperthreads on a physical core there can
+ * be glitches when the two threads need different modes.
+ */
+void switch_to_sld(unsigned long tifn)
+{
+	__sld_msr_set(tifn & _TIF_SLD);
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		return;
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..a43c32868c3c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	if ((tifp ^ tifn) & _TIF_SLD)
+		switch_to_sld(tifn);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..9f42f0a32185 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
 #include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	char *str = "alignment check";
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	local_irq_enable();
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 21:42                                                       ` Luck, Tony
@ 2020-01-25 22:17                                                         ` Borislav Petkov
  0 siblings, 0 replies; 145+ messages in thread
From: Borislav Petkov @ 2020-01-25 22:17 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Peter Zijlstra, Thomas Gleixner, Arvind Sankar, Christopherson,
	Sean J, Ingo Molnar, Yu, Fenghua, Ingo Molnar, H Peter Anvin,
	Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 01:42:32PM -0800, Luck, Tony wrote:
> On Sat, Jan 25, 2020 at 09:33:12PM +0100, Borislav Petkov wrote:
> > On Sat, Jan 25, 2020 at 09:12:21PM +0100, Peter Zijlstra wrote:
> > > My thinking was Virt, virt likes to mess up all msr expectations.
> > 
> > My only worry is to have it written down why we're doing this so that it
> > can be changed/removed later, when we've forgotten all about split lock.
> > Because pretty often we look at a comment-less chunk of code and wonder,
> > "why the hell did we add this in the first place."
> 
> Ok. I added a comment:
> 
>  * Use the "safe" versions of rdmsr/wrmsr here because although code
>  * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
>  * exist, there may be glitches in virtualization that leave a guest
>  * with an incorrect view of real h/w capabilities.

Yap, nice.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 22:07                                                 ` [PATCH v16] " Luck, Tony
@ 2020-01-25 22:43                                                   ` Mark D Rustad
  2020-01-25 23:10                                                     ` Luck, Tony
  2020-01-26  0:34                                                   ` [PATCH v16] " Andy Lutomirski
  1 sibling, 1 reply; 145+ messages in thread
From: Mark D Rustad @ 2020-01-25 22:43 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

On Jan 25, 2020, at 2:07 PM, Luck, Tony <tony.luck@intel.com> wrote:

>   - Transitioning between modes at runtime isn't supported and disabling
>     is tracked per task, so hardware will always reach a steady state that
>     matches the configured mode.

Maybe "isn't supported" is not really the right wording. I would think that  
if it truly weren't supported that you really shouldn't be changing the  
mode at all at runtime. Do you really just mean "isn't atomic"? Or is there  
something deeper about it? If so, are there other possible risks associated  
with changing the mode at runtime?

Sorry, the wording just happened to catch my eye and my mind immediately  
went to "how can you be doing something that is not supported?"

--
Mark Rustad, MRustad@gmail.com

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 22:43                                                   ` Mark D Rustad
@ 2020-01-25 23:10                                                     ` Luck, Tony
  2020-01-26 17:27                                                       ` Mark D Rustad
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-25 23:10 UTC (permalink / raw)
  To: Mark D Rustad
  Cc: Thomas Gleixner, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86


> 
> Maybe "isn't supported" is not really the right wording. I would think that if it truly weren't supported that you really shouldn't be changing the mode at all at runtime. Do you really just mean "isn't atomic"? Or is there something deeper about it? If so, are there other possible risks associated with changing the mode at runtime?
> 
> Sorry, the wording just happened to catch my eye 

The “modes” here mean the three options selectable by the command line option: off/warn/fatal. Some earlier versions of this patch had a sysfs interface to switch things around.

Not whether we have the MSR enabled/disabled.

If Thomas or Boris finds more things to fix then I’ll take a look at clarifying this comment too.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 21:50                                                 ` Luck, Tony
@ 2020-01-25 23:51                                                   ` Arvind Sankar
  2020-01-26  2:52                                                     ` Luck, Tony
  2020-01-27  8:04                                                   ` Peter Zijlstra
  1 sibling, 1 reply; 145+ messages in thread
From: Arvind Sankar @ 2020-01-25 23:51 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Arvind Sankar, Thomas Gleixner, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

On Sat, Jan 25, 2020 at 01:50:03PM -0800, Luck, Tony wrote:
> > 
> > I might be missing something but shouldnt this be !nextflag given the
> > flag being unset is when the task wants sld?
> 
> That logic is convoluted ... but Thomas showed me a much better
> way that is also much simpler ... so this code has gone now. The
> new version is far easier to read (argument is flags for the new task
> that we are switching to)
> 
> void switch_to_sld(unsigned long tifn)
> {
>         __sld_msr_set(tifn & _TIF_SLD);
> }
> 
> -Tony

why doesn't this have the same problem though? tifn & _TIF_SLD still
needs to be logically negated, no?

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 22:07                                                 ` [PATCH v16] " Luck, Tony
  2020-01-25 22:43                                                   ` Mark D Rustad
@ 2020-01-26  0:34                                                   ` Andy Lutomirski
  2020-01-26 20:01                                                     ` Luck, Tony
  1 sibling, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2020-01-26  0:34 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86, Andrew Cooper

On Sat, Jan 25, 2020 at 2:07 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> From: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>

> +void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> +{
> +       u64 ia32_core_caps = 0;
> +
> +       if (c->x86_vendor != X86_VENDOR_INTEL)
> +               return;
> +       if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
> +               /* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
> +               rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
> +       } else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> +               /* Enumerate split lock detection by family and model. */
> +               if (x86_match_cpu(split_lock_cpu_ids))
> +                       ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
> +       }

I was chatting with Andrew Cooper, and apparently there are a ton of
hypervisor bugs in this space, and the bugs take two forms.  Some
hypervisors might #GP the read, and some might allow the read but
silently swallow writes.  This isn't *that* likely given that the
hypervisor bit is the default, but we could improve this along these lines:

static bool have_split_lock_detect(struct cpuinfo_x86 *c)
{
	u64 tmp;

	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
		rdmsrl(MSR_IA32_CORE_CAPS, tmp);
		if (tmp & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
			return true;
	}

	if (cpu_has(c, X86_FEATURE_HYPERVISOR))
		return false;

	/* Probe the MSR directly: read it, toggle the bit, then restore it. */
	if (rdmsrl_safe(MSR_TEST_CTRL, &tmp))
		return false;

	if (wrmsrl_safe(MSR_TEST_CTRL, tmp ^ MSR_TEST_CTRL_SPLIT_LOCK_DETECT))
		return false;

	wrmsrl(MSR_TEST_CTRL, tmp);
	return true;
}

Although I suppose the pile of wrmsrl_safes() in the existing patch
might be sufficient.

All this being said, the current code appears wrong if a CPU is in the
list but does have X86_FEATURE_CORE_CAPABILITIES.  Are there such
CPUs?  I think either the logic should be changed or a comment should
be added.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 23:51                                                   ` Arvind Sankar
@ 2020-01-26  2:52                                                     ` Luck, Tony
  2020-01-27  2:05                                                       ` Tony Luck
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-01-26  2:52 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 06:51:31PM -0500, Arvind Sankar wrote:
> On Sat, Jan 25, 2020 at 01:50:03PM -0800, Luck, Tony wrote:
> > > 
> > > I might be missing something but shouldnt this be !nextflag given the
> > > flag being unset is when the task wants sld?
> > 
> > That logic is convoluted ... but Thomas showed me a much better
> > way that is also much simpler ... so this code has gone now. The
> > new version is far easier to read (argument is flags for the new task
> > that we are switching to)
> > 
> > void switch_to_sld(unsigned long tifn)
> > {
> >         __sld_msr_set(tifn & _TIF_SLD);
> > }
> > 
> > -Tony
> 
> why doesnt this have the same problem though? tifn & _TIF_SLD still
> needs to be logically negated no?

There's something very odd happening. I added this trace code:

        if ((tifp ^ tifn) & _TIF_SLD) {
                pr_info("switch from %d (%d) to %d (%d)\n",
                        task_tgid_nr(prev_p), (tifp & _TIF_SLD) != 0,
                        task_tgid_nr(next_p), (tifn & _TIF_SLD) != 0);
                switch_to_sld(tifn);
        }

Then ran:

$ taskset -cp 10 $$	# bind everything to just one CPU
pid 3205's current affinity list: 0-55
pid 3205's new affinity list: 10
$ ./spin &		# infinite loop
[1] 3289
$ ./split_lock_test &	# 10 * split lock with udelay(1000) between
[2] 3294
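
A minimal sketch of what such a split_lock_test program could look like
(a hypothetical reconstruction, not the actual test; usleep() stands in
for the udelay(1000) pacing mentioned in the comment above):

/* Perform ten locked adds on a 4-byte value that straddles a 64-byte cache line. */
#include <unistd.h>

static char buf[128] __attribute__((aligned(64)));

int main(void)
{
	/* bytes 62..65 of buf: the int crosses the first cache line boundary */
	volatile int *p = (volatile int *)(buf + 62);

	for (int i = 0; i < 10; i++) {
		__atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);	/* locked RMW across the line */
		usleep(1000);					/* pacing between split locks */
	}
	return 0;
}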

I was expecting to see transitions back & forward between the "spin"
process (which won't have TIF_SLD set) and the test program (which
will have it set after the first split executes).

But I see:
[   83.871629] x86/split lock detection: #AC: split_lock_test/3294 took a split_lock trap at address: 0x4007fc
[   83.871638] process: switch from 3294 (1) to 3289 (0)
[   83.882583] process: switch from 3294 (1) to 3289 (0)
[   83.893555] process: switch from 3294 (1) to 3289 (0)
[   83.904528] process: switch from 3294 (1) to 3289 (0)
[   83.915501] process: switch from 3294 (1) to 3289 (0)
[   83.926475] process: switch from 3294 (1) to 3289 (0)
[   83.937448] process: switch from 3294 (1) to 3289 (0)
[   83.948421] process: switch from 3294 (1) to 3289 (0)
[   83.959394] process: switch from 3294 (1) to 3289 (0)
[   83.970439] process: switch from 3294 (1) to 3289 (0)

i.e. only the switches from the test process to the spinner.

So split-lock testing is disabled when we first hit the #AC
and is never re-enabled because we don't pass through this
code when switching to the spinner.

So you are right that the argument is inverted. We should be
ENABLING split lock detection when switching to the spin loop process,
but we actually disable it.
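
The fix that eventually lands in v17 (later in this thread) is to negate the
flag, so a task with TIF_SLD set runs with detection disabled and every other
task runs with it enabled; in sketch form:

void switch_to_sld(unsigned long tifn)
{
	/* TIF_SLD set => this task already tripped #AC, keep detection off */
	__sld_msr_set(!(tifn & _TIF_SLD));
}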

So why don't we come through __switch_to_xtra() when the spinner
runs out its time slice (or the udelay interrupt happens and
preempts the spinner)?

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 23:10                                                     ` Luck, Tony
@ 2020-01-26 17:27                                                       ` Mark D Rustad
  2020-01-26 20:05                                                         ` [PATCH v17] " Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Mark D Rustad @ 2020-01-26 17:27 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

[-- Attachment #1: Type: text/plain, Size: 378 bytes --]

On Jan 25, 2020, at 3:10 PM, Luck, Tony <tony.luck@intel.com> wrote:

> The “modes” here means the three option selectable by command line  
> option. Off/warn/fatal. Some earlier versions of this patch had a sysfs  
> interface to switch things around.
>
> Not whether we have the MSR enabled/disabled.

Ok. Thanks for the clarification.

--
Mark Rustad, MRustad@gmail.com

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v16] x86/split_lock: Enable split lock detection by kernel
  2020-01-26  0:34                                                   ` [PATCH v16] " Andy Lutomirski
@ 2020-01-26 20:01                                                     ` Luck, Tony
  0 siblings, 0 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-26 20:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86, Andrew Cooper

On Sat, Jan 25, 2020 at 04:34:29PM -0800, Andy Lutomirski wrote:
> Although I suppose the pile of wrmsrl_safes() in the existing patch
> might be sufficient.
> 
> All this being said, the current code appears wrong if a CPU is in the
> list but does have X86_FEATURE_CORE_CAPABILITIES.  Are there such
> CPUs?  I think either the logic should be changed or a comment should
> be added.

Is it really wrong? The code checks CPUID & CORE_CAPABILITIES first and
believes what they say. Otherwise it falls back to the x86_match_cpu()
list.

I don't believe we put a CPU on that list that currently says
it supports CORE_CAPABILITIES. That could theoretically change
with a microcode update. I doubt we'd waste microcode space to do
that, but if we did, I assume we'd include the split lock bit
in the newly present MSR. So behavior would not change.

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* [PATCH v17] x86/split_lock: Enable split lock detection by kernel
  2020-01-26 17:27                                                       ` Mark D Rustad
@ 2020-01-26 20:05                                                         ` Luck, Tony
  2020-01-29 12:31                                                           ` Thomas Gleixner
                                                                             ` (3 more replies)
  0 siblings, 4 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-26 20:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Mark D Rustad, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

A split-lock occurs when an atomic instruction operates on data
that spans two cache lines. In order to maintain atomicity the
core takes a global bus lock.

This is typically >1000 cycles slower than an atomic operation
within a cache line. It also disrupts performance on other cores
(which must wait for the bus lock to be released before their
memory operations can complete). For real-time systems this may
mean missing deadlines. For other systems it may just be very
annoying.

Some CPUs have the capability to raise an #AC trap when a
split lock is attempted.

Provide a command line option to give the user choices on how
to handle this. split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it
will OOPs.

One implementation wrinkle is that the MSR to control the
split lock detection is per-core, not per thread. This might
result in some short lived races on HT systems in "warn" mode
if Linux tries to enable on one thread while disabling on
the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between off/warn/fatal modes at runtime isn't supported
    and disabling is tracked per task, so hardware will always reach a steady
    state that matches the configured mode.  I.e. split-lock is guaranteed to
    be enabled in hardware once all _TIF_SLD threads have been scheduled out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---

v17:
Mark Rustad:
	Clarify in commit comment that changing modes refers to the
	boot time option off/warn/fatal. Not to the split-lock detection
	mode set by the TEST_CTL MSR.

Arvind Sankar:
	The test for whether to reset the MSR in context switch was reversed.
	Should be: __sld_msr_set(!(tifn & _TIF_SLD));
	[Sorry you had to tell me twice]

Me:
	Make sure we call __switch_to_xtra() both when switching to a task
	with TIF_SLD set as well as when switching from a TIF_SLD task:
	<asm/thread_info.h> now sets _TIF_SLD in _TIF_WORK_CTXSW_BASE
	instead of in _TIF_WORK_CTXSW_PREV.

 .../admin-guide/kernel-parameters.txt         |  22 +++
 arch/x86/include/asm/cpu.h                    |  12 ++
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/msr-index.h              |   9 +
 arch/x86/include/asm/thread_info.h            |   8 +-
 arch/x86/kernel/cpu/common.c                  |   2 +
 arch/x86/kernel/cpu/intel.c                   | 177 ++++++++++++++++++
 arch/x86/kernel/process.c                     |   3 +
 arch/x86/kernel/traps.c                       |  24 ++-
 9 files changed, 254 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 0ab95f48292b..97d7c7cfd107 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4638,6 +4638,28 @@
 	spia_pedr=
 	spia_peddr=
 
+	split_lock_detect=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will emit rate limited warnings
+				  about applications triggering the #AC
+				  exception. This mode is the default on h/w
+				  that supports split lock detection.
+
+			fatal	- the kernel will send SIGBUS to applications
+				  that trigger the #AC exception.
+
+			If an #AC exception is hit in the kernel or in
+			firmware (i.e. not while executing in user mode)
+			the kernel will oops in either "warn" or "fatal"
+			mode.
+
 	srcutree.counter_wrap_check [KNL]
 			Specifies how frequently to check for
 			grace-period sequence counter wrap for the
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..ff6f3ca649b3 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void switch_to_sld(unsigned long tifn);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void switch_to_sld(unsigned long tifn) {}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb56edf..cd56ad5d308e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685e92dd..8821697a7549 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS			  0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf4327986e98..e90ddac22d11 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* Restore split lock detection on context switch */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -145,7 +147,7 @@ struct thread_info {
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW_BASE					\
 	(_TIF_NOCPUID | _TIF_NOTSC | _TIF_BLOCKSTEP |		\
-	 _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE)
+	 _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE | _TIF_SLD)
 
 /*
  * Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated.
@@ -157,10 +159,10 @@ struct thread_info {
 #endif
 
 #ifdef CONFIG_X86_IOPL_IOPERM
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY | \
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW | _TIF_USER_RETURN_NOTIFY | \
 				 _TIF_IO_BITMAP)
 #else
-# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW| _TIF_USER_RETURN_NOTIFY)
+# define _TIF_WORK_CTXSW_PREV	(_TIF_WORK_CTXSW | _TIF_USER_RETURN_NOTIFY)
 #endif
 
 #define _TIF_WORK_CTXSW_NEXT	(_TIF_WORK_CTXSW)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241c8209..adb2f639f388 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2c0869..99f62e7eb4b0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,19 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support split lock detection.
+ * split_lock_setup() will switch this to sld_warn on systems that support
+ * split lock detect, and then check to see if there is a command line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -606,6 +621,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -720,6 +737,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -981,3 +1000,161 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "fatal",	sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+	int len = strlen(opt);
+
+	return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+	char arg[20];
+	int i, ret;
+
+	sld_state = sld_warn;
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+				  arg, sizeof(arg));
+	if (ret < 0)
+		goto print;
+
+	for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+		if (match_option(arg, ret, sld_options[i].option)) {
+			sld_state = sld_options[i].state;
+			break;
+		}
+	}
+
+print:
+	switch (sld_state) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * Locking is not required at the moment because only bit 29 of this
+ * MSR is implemented and locking would not prevent that the operation
+ * of one thread is immediately undone by the sibling thread.
+ * Use the "safe" versions of rdmsr/wrmsr here because although code
+ * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
+ * exist, there may be glitches in virtualization that leave a guest
+ * with an incorrect view of real h/w capabilities.
+ */
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+}
+
+static void split_lock_init(void)
+{
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	__sld_msr_set(sld_off);
+	sld_state = sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+
+	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+			    current->comm, current->pid, regs->ip);
+
+	/*
+	 * Disable the split lock detection for this task so it can make
+	 * progress and set TIF_SLD so the detection is re-enabled via
+	 * switch_to_sld() when the task is scheduled out.
+	 */
+	__sld_msr_set(false);
+	set_tsk_thread_flag(current, TIF_SLD);
+	return true;
+}
+
+/*
+ * This function is called only when switching between tasks with
+ * different split-lock detection modes. It sets the MSR for the
+ * mode of the new task. This is right most of the time, but since
+ * the MSR is shared by hyperthreads on a physical core there can
+ * be glitches when the two threads need different modes.
+ */
+void switch_to_sld(unsigned long tifn)
+{
+	__sld_msr_set(!(tifn & _TIF_SLD));
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have the split lock detection feature. But since they
+ * don't have MSR IA32_CORE_CAPABILITIES, the feature cannot be enumerated by
+ * the MSR. So enumerate the feature by family and model on these processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		return;
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b5244e3b7..a43c32868c3c 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	if ((tifp ^ tifn) & _TIF_SLD)
+		switch_to_sld(tifn);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822922a3..9f42f0a32185 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
 #include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	char *str = "alignment check";
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	local_irq_enable();
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-26  2:52                                                     ` Luck, Tony
@ 2020-01-27  2:05                                                       ` Tony Luck
  0 siblings, 0 replies; 145+ messages in thread
From: Tony Luck @ 2020-01-27  2:05 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Thomas Gleixner, Christopherson, Sean J, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 6:53 PM Luck, Tony <tony.luck@intel.com> wrote:

> So why don't we come through __switch_to_xtra() when the spinner
> runs out its time slice (or the udelay interrupt happens and
> preempts the spinner)?

To close out this part of the thread: Linux doesn't call __switch_to_xtra()
in this case because I didn't ask it to. There are separate masks to check
TIF bits for the previous and next tasks in a context switch. I'd only set the
_TIF_SLD bit in the mask for the previous task.

See the v17 I posted a few hours before this message for the fix.
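
In sketch form (simplified helper name and flag reads, not the exact kernel
macro), the check in question looks roughly like this; with _TIF_SLD only in
the previous-task mask, a switch *to* a TIF_SLD task never reaches
__switch_to_xtra(), so the MSR is never flipped back:

static inline void maybe_switch_to_xtra(struct task_struct *prev,
					struct task_struct *next)
{
	unsigned long tifp = task_thread_info(prev)->flags;
	unsigned long tifn = task_thread_info(next)->flags;

	/* prev and next are screened against different TIF masks */
	if ((tifp & _TIF_WORK_CTXSW_PREV) || (tifn & _TIF_WORK_CTXSW_NEXT))
		__switch_to_xtra(prev, next);
}

Moving _TIF_SLD into _TIF_WORK_CTXSW_BASE, which feeds both masks, is what
makes both directions of the switch take the slow path in v17.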

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 21:25                                               ` [PATCH v15] " Arvind Sankar
  2020-01-25 21:50                                                 ` Luck, Tony
@ 2020-01-27  8:02                                                 ` Peter Zijlstra
  1 sibling, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2020-01-27  8:02 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Luck, Tony, Thomas Gleixner, Christopherson, Sean J, Ingo Molnar,
	Yu, Fenghua, Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj,
	Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 04:25:25PM -0500, Arvind Sankar wrote:
> On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> > I did find something with a new test. Applications that hit a
> > split lock warn as expected. But if they sleep before they hit
> > a new split lock, we get another warning. This is may be because
> > I messed up when fixing a PeterZ typo in the untested patch.
> > But I think there may have been bigger problems.
> > 
> > Context switch in V14 code did: 
> > 
> >        if (tifp & _TIF_SLD)
> >                switch_to_sld(prev_p);
> > 
> > void switch_to_sld(struct task_struct *prev)
> > {
> >        __sld_msr_set(true);
> >        clear_tsk_thread_flag(prev, TIF_SLD);
> > }
> > 
> > Which re-enables split lock checking for the next process to run. But
> > mysteriously clears the TIF_SLD bit on the previous task.
> 
> Did Peter mean to disable it only for the current timeslice and
> re-enable it for the next time its scheduled?

That was the initial approach, yes. I was thinking it might help find
multiple spots in bad programs.

And as I said, I used perf on my desktop and couldn't find a single bad
program, so I'm not actually expecting this to trigger much.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-25 21:50                                                 ` Luck, Tony
  2020-01-25 23:51                                                   ` Arvind Sankar
@ 2020-01-27  8:04                                                   ` Peter Zijlstra
  2020-01-27  8:36                                                     ` Peter Zijlstra
  2020-01-27 17:35                                                     ` Luck, Tony
  1 sibling, 2 replies; 145+ messages in thread
From: Peter Zijlstra @ 2020-01-27  8:04 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Arvind Sankar, Thomas Gleixner, Christopherson, Sean J,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sat, Jan 25, 2020 at 01:50:03PM -0800, Luck, Tony wrote:
> On Sat, Jan 25, 2020 at 04:25:25PM -0500, Arvind Sankar wrote:
> > On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> > > I did find something with a new test. Applications that hit a
> > > split lock warn as expected. But if they sleep before they hit
> > > a new split lock, we get another warning. This is may be because
> > > I messed up when fixing a PeterZ typo in the untested patch.
> > > But I think there may have been bigger problems.
> > > 
> > > Context switch in V14 code did: 
> > > 
> > >        if (tifp & _TIF_SLD)
> > >                switch_to_sld(prev_p);
> > > 
> > > void switch_to_sld(struct task_struct *prev)
> > > {
> > >        __sld_msr_set(true);
> > >        clear_tsk_thread_flag(prev, TIF_SLD);
> > > }
> > > 
> > > Which re-enables split lock checking for the next process to run. But
> > > mysteriously clears the TIF_SLD bit on the previous task.
> > 
> > Did Peter mean to disable it only for the current timeslice and
> > re-enable it for the next time its scheduled?
> 
> He's seen and commented on this thread since I made this comment. So

Yeah, I sorta don't care either way :-)

> I'll assume not.  Things get really noisy on the console (even with
> the rate limit) if split lock detection is re-enabled after a context
> switch (my new test highlighted this!)

Have you found any actual bad software? The only way I could trigger
was by explicitly writing a program to tickle it.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-27  8:04                                                   ` Peter Zijlstra
@ 2020-01-27  8:36                                                     ` Peter Zijlstra
  2020-01-27 17:35                                                     ` Luck, Tony
  1 sibling, 0 replies; 145+ messages in thread
From: Peter Zijlstra @ 2020-01-27  8:36 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Arvind Sankar, Thomas Gleixner, Christopherson, Sean J,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Mon, Jan 27, 2020 at 09:04:19AM +0100, Peter Zijlstra wrote:
> On Sat, Jan 25, 2020 at 01:50:03PM -0800, Luck, Tony wrote:
> > On Sat, Jan 25, 2020 at 04:25:25PM -0500, Arvind Sankar wrote:
> > > On Fri, Jan 24, 2020 at 06:47:27PM -0800, Luck, Tony wrote:
> > > > I did find something with a new test. Applications that hit a
> > > > split lock warn as expected. But if they sleep before they hit
> > > > a new split lock, we get another warning. This is may be because
> > > > I messed up when fixing a PeterZ typo in the untested patch.
> > > > But I think there may have been bigger problems.
> > > > 
> > > > Context switch in V14 code did: 
> > > > 
> > > >        if (tifp & _TIF_SLD)
> > > >                switch_to_sld(prev_p);
> > > > 
> > > > void switch_to_sld(struct task_struct *prev)
> > > > {
> > > >        __sld_msr_set(true);
> > > >        clear_tsk_thread_flag(prev, TIF_SLD);
> > > > }
> > > > 
> > > > Which re-enables split lock checking for the next process to run. But
> > > > mysteriously clears the TIF_SLD bit on the previous task.
> > > 
> > > Did Peter mean to disable it only for the current timeslice and
> > > re-enable it for the next time its scheduled?
> > 
> > He's seen and commented on this thread since I made this comment. So
> 
> Yeah, I sorta don't care either way :-)

Part of the reason I did that was to get the MSR back to enabled ASAP,
to limit the blind spot on the sibling.

By not clearing TIF_SLD for a task, and using the XOR logic used
for other TIF flags, the blind spots will be much larger.

^ permalink raw reply	[flat|nested] 145+ messages in thread

* RE: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
  2020-01-27  8:04                                                   ` Peter Zijlstra
  2020-01-27  8:36                                                     ` Peter Zijlstra
@ 2020-01-27 17:35                                                     ` Luck, Tony
  1 sibling, 0 replies; 145+ messages in thread
From: Luck, Tony @ 2020-01-27 17:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arvind Sankar, Thomas Gleixner, Christopherson, Sean J,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

> Have you found any actual bad software ? The only way I could trigger
> was by explicitly writing a program to tickle it.

No application or library issues found so far (though I'm not running the kind of multi-threaded
applications that might be using atomic operations for synchronization).

Only the Linux kernel seems to have APIs that make it easy for programmers to
accidentally split an atomic operation between cache lines.
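
A hypothetical illustration of the kind of pattern meant here (struct and
field layout invented for the example): the atomic bitops operate on
unsigned long, so aiming them at a misaligned field can generate a locked
access that straddles a cache line.

#include <linux/bitops.h>
#include <linux/types.h>

struct packed_desc {
	u8  pad[62];
	u32 flags;		/* offset 62: not aligned to unsigned long */
} __packed;

static void mark_ready(struct packed_desc *d, int bit)
{
	/*
	 * With a non-constant bit number, x86 set_bit() is a locked bts on
	 * an unsigned-long-sized operand.  If the descriptor starts on a
	 * cache line boundary, that access spans two lines and raises #AC
	 * when split lock detection is enabled.
	 */
	set_bit(bit, (unsigned long *)&d->flags);
}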

-Tony

^ permalink raw reply	[flat|nested] 145+ messages in thread

* Re: [PATCH v17] x86/split_lock: Enable split lock detection by kernel
  2020-01-26 20:05                                                         ` [PATCH v17] " Luck, Tony
@ 2020-01-29 12:31                                                           ` Thomas Gleixner
  2020-01-29 15:24                                                           ` [tip: x86/cpu] " tip-bot2 for Peter Zijlstra (Intel)
                                                                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 145+ messages in thread
From: Thomas Gleixner @ 2020-01-29 12:31 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Mark D Rustad, Arvind Sankar, Christopherson, Sean J,
	Peter Zijlstra, Ingo Molnar, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar, Ravi V,
	linux-kernel, x86

"Luck, Tony" <tony.luck@intel.com> writes:
> +static bool __sld_msr_set(bool on)
> +{
> +	u64 test_ctrl_val;
> +
> +	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
> +		return false;
> +
> +	if (on)
> +		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +	else
> +		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> +	return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
> +}
> +
> +static void split_lock_init(void)
> +{
> +	if (sld_state == sld_off)
> +		return;
> +
> +	if (__sld_msr_set(true))
> +		return;
> +
> +	/*
> +	 * If this is anything other than the boot-cpu, you've done
> +	 * funny things and you get to keep whatever pieces.
> +	 */
> +	pr_warn("MSR fail -- disabled\n");
> +	__sld_msr_set(sld_off);

This one is pretty pointless. If the rdmsrl or the wrmsrl failed, then
the next attempt is going to fail too. Aside from that, sld_off would not
really be the right argument value here. I just zap that line.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 145+ messages in thread

* [tip: x86/cpu] x86/split_lock: Enable split lock detection by kernel
  2020-01-26 20:05                                                         ` [PATCH v17] " Luck, Tony
  2020-01-29 12:31                                                           ` Thomas Gleixner
@ 2020-01-29 15:24                                                           ` tip-bot2 for Peter Zijlstra (Intel)
  2020-02-03 20:41                                                           ` [PATCH v17] " Sean Christopherson
  2020-02-04  0:04                                                           ` [PATCH v17] x86/split_lock: Enable split lock detection by kernel Sean Christopherson
  3 siblings, 0 replies; 145+ messages in thread
From: tip-bot2 for Peter Zijlstra (Intel) @ 2020-01-29 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Fenghua Yu, Tony Luck, Thomas Gleixner, x86, LKML

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID:     fdbfb51ae760d1bba3f89e4fa00da83016ec4dbe
Gitweb:        https://git.kernel.org/tip/fdbfb51ae760d1bba3f89e4fa00da83016ec4dbe
Author:        Peter Zijlstra (Intel) <peterz@infradead.org>
AuthorDate:    Sun, 26 Jan 2020 12:05:35 -08:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 29 Jan 2020 13:42:39 +01:00

x86/split_lock: Enable split lock detection by kernel

A split-lock occurs when an atomic instruction operates on data that spans
two cache lines. In order to maintain atomicity the core takes a global bus
lock.

This is typically >1000 cycles slower than an atomic operation within a
cache line. It also disrupts performance on other cores (which must wait
for the bus lock to be released before their memory operations can
complete). For real-time systems this may mean missing deadlines. For other
systems it may just be very annoying.

Some CPUs have the capability to raise an #AC trap when a split lock is
attempted.

Provide a command line option to give the user choices on how to handle
this:

split_lock_detect=
	off	- not enabled (no traps for split locks)
	warn	- warn once when an application does a
		  split lock, but allow it to continue
		  running.
	fatal	- Send SIGBUS to applications that cause split lock

On systems that support split lock detection the default is "warn". Note
that if the kernel hits a split lock in any mode other than "off" it will
OOPs.

One implementation wrinkle is that the MSR to control the split lock
detection is per-core, not per thread. This might result in some short
lived races on HT systems in "warn" mode if Linux tries to enable on one
thread while disabling on the other. Race analysis by Sean Christopherson:

  - Toggling of split-lock is only done in "warn" mode.  Worst case
    scenario of a race is that a misbehaving task will generate multiple
    #AC exceptions on the same instruction.  And this race will only occur
    if both siblings are running tasks that generate split-lock #ACs, e.g.
    a race where sibling threads are writing different values will only
    occur if CPUx is disabling split-lock after an #AC and CPUy is
    re-enabling split-lock after *its* previous task generated an #AC.
  - Transitioning between off/warn/fatal modes at runtime isn't supported
    and disabling is tracked per task, so hardware will always reach a steady
    state that matches the configured mode.  I.e. split-lock is guaranteed to
    be enabled in hardware once all _TIF_SLD threads have been scheduled out.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200126200535.GB30377@agluck-desk2.amr.corp.intel.com

---
 Documentation/admin-guide/kernel-parameters.txt |  22 ++-
 arch/x86/include/asm/cpu.h                      |  12 +-
 arch/x86/include/asm/cpufeatures.h              |   2 +-
 arch/x86/include/asm/msr-index.h                |   9 +-
 arch/x86/include/asm/thread_info.h              |   4 +-
 arch/x86/kernel/cpu/common.c                    |   2 +-
 arch/x86/kernel/cpu/intel.c                     | 175 +++++++++++++++-
 arch/x86/kernel/process.c                       |   3 +-
 arch/x86/kernel/traps.c                         |  24 +-
 9 files changed, 250 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ec92120..87176a9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4637,6 +4637,28 @@
 	spia_pedr=
 	spia_peddr=
 
+	split_lock_detect=
+			[X86] Enable split lock detection
+
+			When enabled (and if hardware support is present), atomic
+			instructions that access data across cache line
+			boundaries will result in an alignment check exception.
+
+			off	- not enabled
+
+			warn	- the kernel will emit rate limited warnings
+				  about applications triggering the #AC
+				  exception. This mode is the default on CPUs
+				  that support split lock detection.
+
+			fatal	- the kernel will send SIGBUS to applications
+				  that trigger the #AC exception.
+
+			If an #AC exception is hit in the kernel or in
+			firmware (i.e. not while executing in user mode)
+			the kernel will oops in either "warn" or "fatal"
+			mode.
+
 	srcutree.counter_wrap_check [KNL]
 			Specifies how frequently to check for
 			grace-period sequence counter wrap for the
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc8..ff6f3ca 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,16 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+extern void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+extern void switch_to_sld(unsigned long tifn);
+extern bool handle_user_split_lock(struct pt_regs *regs, long error_code);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void switch_to_sld(unsigned long tifn) {}
+static inline bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	return false;
+}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index f3327cb..cd56ad5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -285,6 +285,7 @@
 #define X86_FEATURE_CQM_MBM_LOCAL	(11*32+ 3) /* LLC Local MBM monitoring */
 #define X86_FEATURE_FENCE_SWAPGS_USER	(11*32+ 4) /* "" LFENCE in user entry SWAPGS path */
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
@@ -367,6 +368,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITIES	(18*32+30) /* "" IA32_CORE_CAPABILITIES MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index ebe1685..8821697 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTRL				0x00000033
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
@@ -70,6 +74,11 @@
  */
 #define MSR_IA32_UMWAIT_CONTROL_TIME_MASK	(~0x03U)
 
+/* Abbreviated from Intel SDM name IA32_CORE_CAPABILITIES */
+#define MSR_IA32_CORE_CAPS			  0x000000cf
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT  5
+#define MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT	  BIT(MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index cf43279..f807930 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -92,6 +92,7 @@ struct thread_info {
 #define TIF_NOCPUID		15	/* CPUID is not accessible in userland */
 #define TIF_NOTSC		16	/* TSC is not accessible in userland */
 #define TIF_IA32		17	/* IA32 compatibility process */
+#define TIF_SLD			18	/* Restore split lock detection on context switch */
 #define TIF_NOHZ		19	/* in adaptive nohz mode */
 #define TIF_MEMDIE		20	/* is terminating due to OOM killer */
 #define TIF_POLLING_NRFLAG	21	/* idle is polling for TIF_NEED_RESCHED */
@@ -122,6 +123,7 @@ struct thread_info {
 #define _TIF_NOCPUID		(1 << TIF_NOCPUID)
 #define _TIF_NOTSC		(1 << TIF_NOTSC)
 #define _TIF_IA32		(1 << TIF_IA32)
+#define _TIF_SLD		(1 << TIF_SLD)
 #define _TIF_NOHZ		(1 << TIF_NOHZ)
 #define _TIF_POLLING_NRFLAG	(1 << TIF_POLLING_NRFLAG)
 #define _TIF_IO_BITMAP		(1 << TIF_IO_BITMAP)
@@ -145,7 +147,7 @@ struct thread_info {
 /* flags to check in __switch_to() */
 #define _TIF_WORK_CTXSW_BASE					\
 	(_TIF_NOCPUID | _TIF_NOTSC | _TIF_BLOCKSTEP |		\
-	 _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE)
+	 _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE | _TIF_SLD)
 
 /*
  * Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated.
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 86b8241..adb2f63 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1242,6 +1242,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 57473e2..5d92e38 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,8 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cpu_device_id.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -31,6 +33,19 @@
 #include <asm/apic.h>
 #endif
 
+enum split_lock_detect_state {
+	sld_off = 0,
+	sld_warn,
+	sld_fatal,
+};
+
+/*
+ * Default to sld_off because most systems do not support split lock detection.
+ * split_lock_setup() will switch this to sld_warn on systems that support
+ * split lock detect, unless there is a command line override.
+ */
+static enum split_lock_detect_state sld_state = sld_off;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -606,6 +621,8 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(void);
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -720,6 +737,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_enable();
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+
+	split_lock_init();
 }
 
 #ifdef CONFIG_X86_32
@@ -981,3 +1000,159 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
+static const struct {
+	const char			*option;
+	enum split_lock_detect_state	state;
+} sld_options[] __initconst = {
+	{ "off",	sld_off   },
+	{ "warn",	sld_warn  },
+	{ "fatal",	sld_fatal },
+};
+
+static inline bool match_option(const char *arg, int arglen, const char *opt)
+{
+	int len = strlen(opt);
+
+	return len == arglen && !strncmp(arg, opt, len);
+}
+
+static void __init split_lock_setup(void)
+{
+	char arg[20];
+	int i, ret;
+
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+	sld_state = sld_warn;
+
+	ret = cmdline_find_option(boot_command_line, "split_lock_detect",
+				  arg, sizeof(arg));
+	if (ret >= 0) {
+		for (i = 0; i < ARRAY_SIZE(sld_options); i++) {
+			if (match_option(arg, ret, sld_options[i].option)) {
+				sld_state = sld_options[i].state;
+				break;
+			}
+		}
+	}
+
+	switch (sld_state) {
+	case sld_off:
+		pr_info("disabled\n");
+		break;
+
+	case sld_warn:
+		pr_info("warning about user-space split_locks\n");
+		break;
+
+	case sld_fatal:
+		pr_info("sending SIGBUS on user-space split_locks\n");
+		break;
+	}
+}
+
+/*
+ * Locking is not required at the moment because only bit 29 of this
+ * MSR is implemented and locking would not prevent that the operation
+ * of one thread is immediately undone by the sibling thread.
+ * Use the "safe" versions of rdmsr/wrmsr here because although code
+ * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
+ * exist, there may be glitches in virtualization that leave a guest
+ * with an incorrect view of real h/w capabilities.
+ */
+static bool __sld_msr_set(bool on)
+{
+	u64 test_ctrl_val;
+
+	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+		return false;
+
+	if (on)
+		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+	else
+		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+}
+
+static void split_lock_init(void)
+{
+	if (sld_state == sld_off)
+		return;
+
+	if (__sld_msr_set(true))
+		return;
+
+	/*
+	 * If this is anything other than the boot-cpu, you've done
+	 * funny things and you get to keep whatever pieces.
+	 */
+	pr_warn("MSR fail -- disabled\n");
+	sld_state = sld_off;
+}
+
+bool handle_user_split_lock(struct pt_regs *regs, long error_code)
+{
+	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
+		return false;
+
+	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
+			    current->comm, current->pid, regs->ip);
+
+	/*
+	 * Disable the split lock detection for this task so it can make
+	 * progress and set TIF_SLD so the detection is re-enabled via
+	 * switch_to_sld() when the task is scheduled out.
+	 */
+	__sld_msr_set(false);
+	set_tsk_thread_flag(current, TIF_SLD);
+	return true;
+}
+
+/*
+ * This function is called only when switching between tasks with
+ * different split-lock detection modes. It sets the MSR for the
+ * mode of the new task. This is right most of the time, but since
+ * the MSR is shared by hyperthreads on a physical core there can
+ * be glitches when the two threads need different modes.
+ */
+void switch_to_sld(unsigned long tifn)
+{
+	__sld_msr_set(!(tifn & _TIF_SLD));
+}
+
+#define SPLIT_LOCK_CPU(model) {X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY}
+
+/*
+ * The following processors have the split lock detection feature. But
+ * since they don't have the IA32_CORE_CAPABILITIES MSR, the feature cannot
+ * be enumerated. Enable it by family and model matching on these
+ * processors.
+ */
+static const struct x86_cpu_id split_lock_cpu_ids[] __initconst = {
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_X),
+	SPLIT_LOCK_CPU(INTEL_FAM6_ICELAKE_L),
+	{}
+};
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_caps = 0;
+
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		return;
+	if (cpu_has(c, X86_FEATURE_CORE_CAPABILITIES)) {
+		/* Enumerate features reported in IA32_CORE_CAPABILITIES MSR. */
+		rdmsrl(MSR_IA32_CORE_CAPS, ia32_core_caps);
+	} else if (!boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		/* Enumerate split lock detection by family and model. */
+		if (x86_match_cpu(split_lock_cpu_ids))
+			ia32_core_caps |= MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT;
+	}
+
+	if (ia32_core_caps & MSR_IA32_CORE_CAPS_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 839b524..a43c328 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -650,6 +650,9 @@ void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p)
 		/* Enforce MSR update to ensure consistent state */
 		__speculation_ctrl_update(~tifn, tifn);
 	}
+
+	if ((tifp ^ tifn) & _TIF_SLD)
+		switch_to_sld(tifn);
 }
 
 /*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9e6f822..9f42f0a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -46,6 +46,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/fpu/internal.h>
+#include <asm/cpu.h>
 #include <asm/cpu_entry_area.h>
 #include <asm/mce.h>
 #include <asm/fixmap.h>
@@ -244,7 +245,6 @@ do_trap(int trapnr, int signr, char *str, struct pt_regs *regs,
 {
 	struct task_struct *tsk = current;
 
-
 	if (!do_trap_no_signal(tsk, trapnr, str, regs, error_code))
 		return;
 
@@ -290,9 +290,29 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	char *str = "alignment check";
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
+		return;
+
+	if (!user_mode(regs))
+		die("Split lock detected\n", regs, error_code);
+
+	local_irq_enable();
+
+	if (handle_user_split_lock(regs, error_code))
+		return;
+
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,


* Re: [PATCH v17] x86/split_lock: Enable split lock detection by kernel
  2020-01-26 20:05                                                         ` [PATCH v17] " Luck, Tony
  2020-01-29 12:31                                                           ` Thomas Gleixner
  2020-01-29 15:24                                                           ` [tip: x86/cpu] " tip-bot2 for Peter Zijlstra (Intel)
@ 2020-02-03 20:41                                                           ` Sean Christopherson
  2020-02-06  0:49                                                             ` [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR Luck, Tony
  2020-02-04  0:04                                                           ` [PATCH v17] x86/split_lock: Enable split lock detection by kernel Sean Christopherson
  3 siblings, 1 reply; 145+ messages in thread
From: Sean Christopherson @ 2020-02-03 20:41 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Mark D Rustad, Arvind Sankar, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sun, Jan 26, 2020 at 12:05:35PM -0800, Luck, Tony wrote:
> +/*
> + * Locking is not required at the moment because only bit 29 of this
> + * MSR is implemented and locking would not prevent that the operation
> + * of one thread is immediately undone by the sibling thread.
> + * Use the "safe" versions of rdmsr/wrmsr here because although code
> + * checks CPUID and MSR bits to make sure the TEST_CTRL MSR should
> + * exist, there may be glitches in virtualization that leave a guest
> + * with an incorrect view of real h/w capabilities.
> + */
> +static bool __sld_msr_set(bool on)
> +{
> +	u64 test_ctrl_val;
> +
> +	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
> +		return false;

How about caching the MSR value on a per-{cpu/core} basis at boot to avoid
the RDMSR when switching to/from a misbehaving task?  E.g. to avoid
penalizing well-behaved tasks any more than necessary.

We've likely got bigger issues if MSR_TEST_CTRL is being written by BIOS
at runtime, even if the writes were limited to synchronous calls from the
kernel.

Probably makes sense to split the MSR's init sequence from its runtime
sequence, e.g. to use an unsafe wrmsrl() at runtime so that an unexpected
#GP generates a WARN.
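
A rough sketch of that split, assuming the MSR_TEST_CTRL definitions
from this series and the usual <asm/msr.h> helpers (function names and
the cached reserved-bits value are illustrative, not the posted patch):

/* Boot-time copy of the MSR, kept so the runtime path never reads it. */
static u64 test_ctrl_cache;

static bool sld_msr_init(void)
{
	u64 val;

	/* Safe variant: the MSR may be absent under a hypervisor. */
	if (rdmsrl_safe(MSR_TEST_CTRL, &val))
		return false;

	test_ctrl_cache = val;
	return !wrmsrl_safe(MSR_TEST_CTRL, val | MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
}

static void sld_msr_set(bool on)
{
	u64 val = test_ctrl_cache;

	if (on)
		val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
	else
		val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;

	/* Plain wrmsrl(): an unexpected #GP here turns into a WARN. */
	wrmsrl(MSR_TEST_CTRL, val);
}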

> +
> +	if (on)
> +		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +	else
> +		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
> +
> +	return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
> +}


* Re: [PATCH v17] x86/split_lock: Enable split lock detection by kernel
  2020-01-26 20:05                                                         ` [PATCH v17] " Luck, Tony
                                                                             ` (2 preceding siblings ...)
  2020-02-03 20:41                                                           ` [PATCH v17] " Sean Christopherson
@ 2020-02-04  0:04                                                           ` Sean Christopherson
  2020-02-04 12:52                                                             ` Thomas Gleixner
  3 siblings, 1 reply; 145+ messages in thread
From: Sean Christopherson @ 2020-02-04  0:04 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Mark D Rustad, Arvind Sankar, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

On Sun, Jan 26, 2020 at 12:05:35PM -0800, Luck, Tony wrote:

...

> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)

No reason to take the error code unless there's a plan to use it.

> +{
> +	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
> +		return false;

Any objection to moving the EFLAGS.AC check up to do_alignment_check()?  And
take "unsigned long rip" instead of @regs?

That would allow KVM to reuse handle_user_split_lock() for guest faults
without any changes (other than exporting).

E.g. do_alignment_check() becomes:

	if (!(regs->flags & X86_EFLAGS_AC) && handle_user_split_lock(regs->ip))
		return;
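
A rough sketch of the reworked helper under that suggestion (names and
details are illustrative; the TIF_SLD handling is unchanged from the
quoted patch):

bool handle_user_split_lock(unsigned long ip)
{
	if (sld_state == sld_fatal)
		return false;

	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
			    current->comm, current->pid, ip);

	/*
	 * Let the task make progress; TIF_SLD re-enables detection via
	 * switch_to_sld() when the task is scheduled out.
	 */
	__sld_msr_set(false);
	set_tsk_thread_flag(current, TIF_SLD);
	return true;
}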

> +
> +	pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
> +			    current->comm, current->pid, regs->ip);
> +
> +	/*
> +	 * Disable the split lock detection for this task so it can make
> +	 * progress and set TIF_SLD so the detection is re-enabled via
> +	 * switch_to_sld() when the task is scheduled out.
> +	 */
> +	__sld_msr_set(false);
> +	set_tsk_thread_flag(current, TIF_SLD);
> +	return true;
> +}

...

> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +	char *str = "alignment check";
> +
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
> +		return;
> +
> +	if (!user_mode(regs))
> +		die("Split lock detected\n", regs, error_code);
> +
> +	local_irq_enable();
> +
> +	if (handle_user_split_lock(regs, error_code))
> +		return;
> +
> +	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> +		error_code, BUS_ADRALN, NULL);
> +}
> +
>  #ifdef CONFIG_VMAP_STACK
>  __visible void __noreturn handle_stack_overflow(const char *message,
>  						struct pt_regs *regs,
> -- 
> 2.21.1
> 


* Re: [PATCH v17] x86/split_lock: Enable split lock detection by kernel
  2020-02-04  0:04                                                           ` [PATCH v17] x86/split_lock: Enable split lock detection by kernel Sean Christopherson
@ 2020-02-04 12:52                                                             ` Thomas Gleixner
  0 siblings, 0 replies; 145+ messages in thread
From: Thomas Gleixner @ 2020-02-04 12:52 UTC (permalink / raw)
  To: Sean Christopherson, Luck, Tony
  Cc: Mark D Rustad, Arvind Sankar, Peter Zijlstra, Ingo Molnar, Yu,
	Fenghua, Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok,
	Shankar, Ravi V, linux-kernel, x86

Sean Christopherson <sean.j.christopherson@intel.com> writes:

> On Sun, Jan 26, 2020 at 12:05:35PM -0800, Luck, Tony wrote:
>
> ...
>
>> +bool handle_user_split_lock(struct pt_regs *regs, long error_code)
>
> No reason to take the error code unless there's a plan to use it.
>
>> +{
>> +	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
>> +		return false;
>
> Any objection to moving the EFLAGS.AC check up to do_alignment_check()?  And
> take "unsigned long rip" instead of @regs?
>
> That would allow KVM to reuse handle_user_split_lock() for guest faults
> without any changes (other than exporting).
>
> E.g. do_alignment_check() becomes:
>
> 	if (!(regs->flags & X86_EFLAGS_AC) && handle_user_split_lock(regs->ip))
> 		return;

No objections.

Thanks,

        tglx


* [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR
  2020-02-03 20:41                                                           ` [PATCH v17] " Sean Christopherson
@ 2020-02-06  0:49                                                             ` Luck, Tony
  2020-02-06  1:18                                                               ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-02-06  0:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Thomas Gleixner, Mark D Rustad, Arvind Sankar, Peter Zijlstra,
	Ingo Molnar, Yu, Fenghua, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Raj, Ashok, Shankar, Ravi V, linux-kernel, x86

In a context switch from a task that is detecting split locks
to one that is not (or vice versa) we need to update the TEST_CTRL
MSR. Currently this is done with the common sequence:
	read the MSR
	flip the bit
	write the MSR
in order to avoid changing the value of any reserved bits in the MSR.

Cache the value of the TEST_CTRL MSR when we read it during initialization
so we can avoid an expensive RDMSR instruction during context switch.

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/intel.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 5d92e381fd91..78de69c5887a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1054,6 +1054,14 @@ static void __init split_lock_setup(void)
 	}
 }
 
+/*
+ * Soft copy of MSR_TEST_CTRL initialized when we first read the
+ * MSR. Used at runtime to avoid using rdmsr again just to collect
+ * the reserved bits in the MSR. We assume reserved bits are the
+ * same on all CPUs.
+ */
+static u64 test_ctrl_val;
+
 /*
  * Locking is not required at the moment because only bit 29 of this
  * MSR is implemented and locking would not prevent that the operation
@@ -1063,19 +1071,29 @@ static void __init split_lock_setup(void)
  * exist, there may be glitches in virtualization that leave a guest
  * with an incorrect view of real h/w capabilities.
  */
-static bool __sld_msr_set(bool on)
+static bool __sld_msr_init(void)
 {
-	u64 test_ctrl_val;
+	u64 val;
 
-	if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+	if (rdmsrl_safe(MSR_TEST_CTRL, &val))
 		return false;
+	test_ctrl_val = val;
+
+	val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+
+	return !wrmsrl_safe(MSR_TEST_CTRL, val);
+}
+
+static void __sld_msr_set(bool on)
+{
+	u64 val = test_ctrl_val;
 
 	if (on)
-		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+		val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
 	else
-		test_ctrl_val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
+		val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
 
-	return !wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val);
+	wrmsrl_safe(MSR_TEST_CTRL, val);
 }
 
 static void split_lock_init(void)
@@ -1083,7 +1101,7 @@ static void split_lock_init(void)
 	if (sld_state == sld_off)
 		return;
 
-	if (__sld_msr_set(true))
+	if (__sld_msr_init())
 		return;
 
 	/*
-- 
2.21.1



* Re: [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR
  2020-02-06  0:49                                                             ` [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR Luck, Tony
@ 2020-02-06  1:18                                                               ` Andy Lutomirski
  2020-02-06 16:46                                                                 ` Luck, Tony
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2020-02-06  1:18 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Sean Christopherson, Thomas Gleixner, Mark D Rustad,
	Arvind Sankar, Peter Zijlstra, Ingo Molnar, Yu, Fenghua,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86

On Wed, Feb 5, 2020 at 4:49 PM Luck, Tony <tony.luck@intel.com> wrote:
>
> In a context switch from a task that is detecting split locks
> to one that is not (or vice versa) we need to update the TEST_CTRL
> MSR. Currently this is done with the common sequence:
>         read the MSR
>         flip the bit
>         write the MSR
> in order to avoid changing the value of any reserved bits in the MSR.
>
> Cache the value of the TEST_CTRL MSR when we read it during initialization
> so we can avoid an expensive RDMSR instruction during context switch.

If something else that is per-cpu-ish gets added to the MSR in the
future, I will personally make fun of you for not making this percpu.


* Re: [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR
  2020-02-06  1:18                                                               ` Andy Lutomirski
@ 2020-02-06 16:46                                                                 ` Luck, Tony
  2020-02-06 19:37                                                                   ` Andy Lutomirski
  0 siblings, 1 reply; 145+ messages in thread
From: Luck, Tony @ 2020-02-06 16:46 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Sean Christopherson, Thomas Gleixner, Mark D Rustad,
	Arvind Sankar, Peter Zijlstra, Ingo Molnar, Yu, Fenghua,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86, Xiaoyao Li

On Wed, Feb 05, 2020 at 05:18:23PM -0800, Andy Lutomirski wrote:
> On Wed, Feb 5, 2020 at 4:49 PM Luck, Tony <tony.luck@intel.com> wrote:
> >
> > In a context switch from a task that is detecting split locks
> > to one that is not (or vice versa) we need to update the TEST_CTRL
> > MSR. Currently this is done with the common sequence:
> >         read the MSR
> >         flip the bit
> >         write the MSR
> > in order to avoid changing the value of any reserved bits in the MSR.
> >
> > Cache the value of the TEST_CTRL MSR when we read it during initialization
> > so we can avoid an expensive RDMSR instruction during context switch.
> 
> If something else that is per-cpu-ish gets added to the MSR in the
> future, I will personally make fun of you for not making this percpu.

Xiaoyao Li has posted a version using a percpu cache value:

https://lore.kernel.org/r/20200206070412.17400-4-xiaoyao.li@intel.com

So take that if it makes you happier.  My patch only used the
cached value to store the state of the reserved bits in the MSR
and assumed those are the same for all cores.

Xiaoyao Li's version updates the cached value with whatever was most
recently written on each thread (but doesn't, and can't, make use of
that because the other thread on the core may have changed the actual
value in the MSR since then).
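
Roughly, that per-cpu flavour looks like the following (illustrative
only, possibly not matching Xiaoyao Li's actual patch at the link
above):

static DEFINE_PER_CPU(u64, msr_test_ctrl_cache);

static void sld_update_msr(bool on)
{
	/*
	 * Last value written by this logical CPU; the sibling thread
	 * may have rewritten the hardware MSR since.
	 */
	u64 val = this_cpu_read(msr_test_ctrl_cache);

	if (on)
		val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
	else
		val &= ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT;

	this_cpu_write(msr_test_ctrl_cache, val);
	wrmsrl(MSR_TEST_CTRL, val);
}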

If more bits are implemented that need to be set at run time, we
are likely up the proverbial creek. I'll see if I can find out if
there are plans for that.

-Tony


* Re: [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR
  2020-02-06 16:46                                                                 ` Luck, Tony
@ 2020-02-06 19:37                                                                   ` Andy Lutomirski
  2020-03-03 19:22                                                                     ` Sean Christopherson
  0 siblings, 1 reply; 145+ messages in thread
From: Andy Lutomirski @ 2020-02-06 19:37 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Andy Lutomirski, Sean Christopherson, Thomas Gleixner,
	Mark D Rustad, Arvind Sankar, Peter Zijlstra, Ingo Molnar, Yu,
	Fenghua, Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok,
	Shankar, Ravi V, linux-kernel, x86, Xiaoyao Li


> On Feb 6, 2020, at 8:46 AM, Luck, Tony <tony.luck@intel.com> wrote:
> 
> On Wed, Feb 05, 2020 at 05:18:23PM -0800, Andy Lutomirski wrote:
>>> On Wed, Feb 5, 2020 at 4:49 PM Luck, Tony <tony.luck@intel.com> wrote:
>>> 
>>> In a context switch from a task that is detecting split locks
>>> to one that is not (or vice versa) we need to update the TEST_CTRL
>>> MSR. Currently this is done with the common sequence:
>>>        read the MSR
>>>        flip the bit
>>>        write the MSR
>>> in order to avoid changing the value of any reserved bits in the MSR.
>>> 
>>> Cache the value of the TEST_CTRL MSR when we read it during initialization
>>> so we can avoid an expensive RDMSR instruction during context switch.
>> 
>> If something else that is per-cpu-ish gets added to the MSR in the
>> future, I will personally make fun of you for not making this percpu.
> 
> Xiaoyao Li has posted a version using a percpu cache value:
> 
> https://lore.kernel.org/r/20200206070412.17400-4-xiaoyao.li@intel.com
> 
> So take that if it makes you happier.  My patch only used the
> cached value to store the state of the reserved bits in the MSR
> and assumed those are the same for all cores.
> 
> Xiaoyao Li's version updates the cached value with whatever was most
> recently written on each thread (but doesn't, and can't, make use of
> that because the other thread on the core may have changed the actual
> value in the MSR since then).
> 
> If more bits are implemented that need to be set at run time, we
> are likely up the proverbial creek. I'll see if I can find out if
> there are plans for that.
> 

I suppose that this whole thing is a giant mess, especially since at least one bit there is per-physical-core. Sigh.

So I don’t have a strong preference.


* Re: [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR
  2020-02-06 19:37                                                                   ` Andy Lutomirski
@ 2020-03-03 19:22                                                                     ` Sean Christopherson
  0 siblings, 0 replies; 145+ messages in thread
From: Sean Christopherson @ 2020-03-03 19:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Luck, Tony, Andy Lutomirski, Thomas Gleixner, Mark D Rustad,
	Arvind Sankar, Peter Zijlstra, Ingo Molnar, Yu, Fenghua,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Raj, Ashok, Shankar,
	Ravi V, linux-kernel, x86, Xiaoyao Li

On Thu, Feb 06, 2020 at 11:37:04AM -0800, Andy Lutomirski wrote:
> 
> > On Feb 6, 2020, at 8:46 AM, Luck, Tony <tony.luck@intel.com> wrote:
> > 
> > On Wed, Feb 05, 2020 at 05:18:23PM -0800, Andy Lutomirski wrote:
> >>> On Wed, Feb 5, 2020 at 4:49 PM Luck, Tony <tony.luck@intel.com> wrote:
> >>> 
> >>> In a context switch from a task that is detecting split locks
> >>> to one that is not (or vice versa) we need to update the TEST_CTRL
> >>> MSR. Currently this is done with the common sequence:
> >>>        read the MSR
> >>>        flip the bit
> >>>        write the MSR
> >>> in order to avoid changing the value of any reserved bits in the MSR.
> >>> 
> >>> Cache the value of the TEST_CTRL MSR when we read it during initialization
> >>> so we can avoid an expensive RDMSR instruction during context switch.
> >> 
> >> If something else that is per-cpu-ish gets added to the MSR in the
> >> future, I will personally make fun of you for not making this percpu.
> > 
> > Xiaoyao Li has posted a version using a percpu cache value:
> > 
> > https://lore.kernel.org/r/20200206070412.17400-4-xiaoyao.li@intel.com
> > 
> > So take that if it makes you happier.  My patch only used the
> > cached value to store the state of the reserved bits in the MSR
> > and assumed those are the same for all cores.
> > 
> > Xiaoyao Li's version updates the cached value with whatever was most
> > recently written on each thread (but doesn't, and can't, make use of
> > that because the other thread on the core may have changed the actual
> > value in the MSR since then).
> > 
> > If more bits are implemented that need to be set at run time, we
> > are likely up the proverbial creek. I'll see if I can find out if
> > there are plans for that.
> > 
> 
> I suppose that this whole thing is a giant mess, especially since at least
> one bit there is per-physical-core. Sigh.
> 
> So I don’t have a strong preference.

I'd prefer to go with this patch, i.e. not percpu, to remove the temptation
of incorrectly optimizing away toggling SPLIT_LOCK_DETECT.



Thread overview: 145+ messages
2019-11-21  0:53 [PATCH v10 0/6] Enable split lock detection for real time and debug Fenghua Yu
2019-11-21  0:53 ` [PATCH v10 1/6] x86/msr-index: Add two new MSRs Fenghua Yu
2019-11-21  0:53 ` [PATCH v10 2/6] x86/cpufeatures: Enumerate the IA32_CORE_CAPABILITIES MSR Fenghua Yu
2019-11-21  0:53 ` [PATCH v10 3/6] x86/split_lock: Enumerate split lock detection by " Fenghua Yu
2019-11-21  0:53 ` [PATCH v10 4/6] x86/split_lock: Enumerate split lock detection if the IA32_CORE_CAPABILITIES MSR is not supported Fenghua Yu
2019-11-21 22:07   ` Andy Lutomirski
2019-11-22  0:37     ` Fenghua Yu
2019-11-22  2:13       ` Andy Lutomirski
2019-11-22  9:46         ` Peter Zijlstra
2019-11-21  0:53 ` [PATCH v10 5/6] x86/split_lock: Handle #AC exception for split lock Fenghua Yu
2019-11-21 22:10   ` Andy Lutomirski
2019-11-21 23:14     ` Fenghua Yu
2019-11-21 23:12       ` Andy Lutomirski
2019-11-21  0:53 ` [PATCH v10 6/6] x86/split_lock: Enable split lock detection by kernel parameter Fenghua Yu
2019-11-21  6:04   ` Ingo Molnar
2019-11-21 13:01     ` Peter Zijlstra
2019-11-21 13:15       ` Peter Zijlstra
2019-11-21 21:51         ` Luck, Tony
2019-11-21 22:24           ` Andy Lutomirski
2019-11-21 22:29             ` Luck, Tony
2019-11-21 23:18               ` Andy Lutomirski
2019-11-21 23:53                 ` Fenghua Yu
2019-11-22  1:52                   ` Sean Christopherson
2019-11-22  2:21                     ` Andy Lutomirski
2019-11-22  2:39                       ` Xiaoyao Li
2019-11-22  2:57                         ` Andy Lutomirski
2019-11-21 23:55                 ` Luck, Tony
2019-11-22  0:55             ` Luck, Tony
2019-11-22 10:08           ` Peter Zijlstra
2019-11-21 16:14       ` Fenghua Yu
2019-11-21 17:14         ` Ingo Molnar
2019-11-21 17:35         ` Peter Zijlstra
2019-11-21 17:12       ` Ingo Molnar
2019-11-21 17:34         ` Luck, Tony
2019-11-22 10:51           ` Peter Zijlstra
2019-11-22 15:27             ` Peter Zijlstra
2019-11-22 17:22               ` Luck, Tony
2019-11-22 20:23                 ` Peter Zijlstra
2019-11-22 18:02               ` Luck, Tony
2019-11-22 20:23                 ` Peter Zijlstra
2019-11-22 20:42                   ` Fenghua Yu
2019-11-22 21:25                     ` Andy Lutomirski
2019-12-12  8:57                       ` Peter Zijlstra
2019-12-12 18:52                         ` Luck, Tony
2019-12-12 19:46                           ` Luck, Tony
2019-12-12 20:01                             ` Andy Lutomirski
2019-12-16 16:21                               ` David Laight
2019-11-22 18:44               ` Sean Christopherson
2019-11-22 20:30                 ` Peter Zijlstra
2019-11-23  0:30               ` Luck, Tony
2019-11-25 16:13                 ` Sean Christopherson
2019-12-02 18:20                   ` Luck, Tony
2019-12-12  8:59                   ` Peter Zijlstra
2020-01-10 19:24                     ` [PATCH v11] x86/split_lock: Enable split lock detection by kernel Luck, Tony
2020-01-14  5:55                       ` Sean Christopherson
2020-01-15 22:27                         ` Luck, Tony
2020-01-15 22:57                           ` Sean Christopherson
2020-01-15 23:48                             ` Luck, Tony
2020-01-22 18:55                             ` [PATCH v12] " Luck, Tony
2020-01-22 19:04                               ` Borislav Petkov
2020-01-22 20:03                                 ` Luck, Tony
2020-01-22 20:55                                   ` Borislav Petkov
2020-01-22 22:42                               ` Arvind Sankar
2020-01-22 22:52                                 ` Arvind Sankar
2020-01-22 23:24                                 ` Luck, Tony
2020-01-23  0:45                                   ` Arvind Sankar
2020-01-23  1:23                                     ` Luck, Tony
2020-01-23  4:21                                       ` Arvind Sankar
2020-01-23 17:15                                         ` Luck, Tony
2020-01-23  3:53                                     ` [PATCH v13] " Luck, Tony
2020-01-23  4:45                                       ` Arvind Sankar
2020-01-23 23:16                                         ` [PATCH v14] " Luck, Tony
2020-01-24 21:36                                           ` Thomas Gleixner
2020-01-25  2:47                                             ` [PATCH v15] " Luck, Tony
2020-01-25 10:44                                               ` Borislav Petkov
2020-01-25 19:55                                                 ` Luck, Tony
2020-01-25 20:12                                                   ` Peter Zijlstra
2020-01-25 20:33                                                     ` Borislav Petkov
2020-01-25 21:42                                                       ` Luck, Tony
2020-01-25 22:17                                                         ` Borislav Petkov
2020-01-25 20:29                                                   ` Borislav Petkov
2020-01-25 13:41                                               ` Thomas Gleixner
2020-01-25 22:07                                                 ` [PATCH v16] " Luck, Tony
2020-01-25 22:43                                                   ` Mark D Rustad
2020-01-25 23:10                                                     ` Luck, Tony
2020-01-26 17:27                                                       ` Mark D Rustad
2020-01-26 20:05                                                         ` [PATCH v17] " Luck, Tony
2020-01-29 12:31                                                           ` Thomas Gleixner
2020-01-29 15:24                                                           ` [tip: x86/cpu] " tip-bot2 for Peter Zijlstra (Intel)
2020-02-03 20:41                                                           ` [PATCH v17] " Sean Christopherson
2020-02-06  0:49                                                             ` [PATCH] x86/split_lock: Avoid runtime reads of the TEST_CTRL MSR Luck, Tony
2020-02-06  1:18                                                               ` Andy Lutomirski
2020-02-06 16:46                                                                 ` Luck, Tony
2020-02-06 19:37                                                                   ` Andy Lutomirski
2020-03-03 19:22                                                                     ` Sean Christopherson
2020-02-04  0:04                                                           ` [PATCH v17] x86/split_lock: Enable split lock detection by kernel Sean Christopherson
2020-02-04 12:52                                                             ` Thomas Gleixner
2020-01-26  0:34                                                   ` [PATCH v16] " Andy Lutomirski
2020-01-26 20:01                                                     ` Luck, Tony
2020-01-25 21:25                                               ` [PATCH v15] " Arvind Sankar
2020-01-25 21:50                                                 ` Luck, Tony
2020-01-25 23:51                                                   ` Arvind Sankar
2020-01-26  2:52                                                     ` Luck, Tony
2020-01-27  2:05                                                       ` Tony Luck
2020-01-27  8:04                                                   ` Peter Zijlstra
2020-01-27  8:36                                                     ` Peter Zijlstra
2020-01-27 17:35                                                     ` Luck, Tony
2020-01-27  8:02                                                 ` Peter Zijlstra
2019-12-13  0:09               ` [PATCH v11] x86/split_lock: Enable split lock detection by kernel parameter Tony Luck
2019-12-13  0:16                 ` Luck, Tony
2019-11-21 17:43         ` [PATCH v10 6/6] " David Laight
2019-11-21 17:51           ` Andy Lutomirski
2019-11-21 18:53             ` Fenghua Yu
2019-11-21 19:01               ` Andy Lutomirski
2019-11-21 20:25                 ` Fenghua Yu
2019-11-21 20:19                   ` Peter Zijlstra
2019-11-21 19:46               ` Peter Zijlstra
2019-11-21 20:25               ` Peter Zijlstra
2019-11-21 21:22                 ` Andy Lutomirski
2019-11-22  9:25                   ` Peter Zijlstra
2019-11-22 17:48                     ` Luck, Tony
2019-11-22 20:31                       ` Peter Zijlstra
2019-11-22 21:23                         ` Andy Lutomirski
2019-12-11 17:52                           ` Peter Zijlstra
2019-12-11 18:12                             ` Andy Lutomirski
2019-12-11 22:34                               ` Peter Zijlstra
2019-12-12 19:40                                 ` Andy Lutomirski
2019-12-16  9:59                                   ` David Laight
2019-12-16 17:22                                     ` Andy Lutomirski
2019-12-16 17:45                                       ` David Laight
2019-12-16 18:06                                         ` Andy Lutomirski
2019-12-17 10:03                                           ` David Laight
2019-12-11 18:44                             ` Luck, Tony
2019-12-11 22:39                               ` Peter Zijlstra
2019-12-12 10:36                                 ` David Laight
2019-12-12 13:04                                   ` Peter Zijlstra
2019-12-12 16:02                                     ` Andy Lutomirski
2019-12-12 16:23                                       ` David Laight
2019-12-12 16:29                                     ` David Laight
2019-11-21 19:56             ` Peter Zijlstra
2019-11-21 21:01               ` Andy Lutomirski
2019-11-22  9:36                 ` Peter Zijlstra
2019-11-22  9:46             ` David Laight
2019-11-22 20:32               ` Peter Zijlstra
2019-11-21  8:00   ` Peter Zijlstra
