* [PATCH v9 00/17] x86/split_lock: Enable split lock detection
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

==Introduction==

A split lock is any atomic operation whose operand crosses two cache
lines. Since the operand spans two cache lines and the operation must
be atomic, the system locks the bus while the CPU accesses the two cache
lines.

While the bus is locked, requests from other CPUs or bus agents for control
of the bus are blocked. Blocking bus access from other CPUs plus the
overhead of configuring the bus locking protocol degrade not only the
performance of the one CPU but also overall system performance.

If the operand is cacheable and completely contained in one cache line,
the atomic operation is optimized by the less expensive cache locking on
Intel P6 and more recent processors. If a split lock operation is detected
and a developer fixes the issue so that the operand fits within one cache
line, cache locking rather than the more expensive bus locking will be
used for the atomic operation. Removing the split lock can therefore
improve overall performance.

Instructions that may cause a split lock issue include lock add, lock btc,
xchg, lsl, far call, ltr, etc.
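
As an illustration, here is a minimal user-space sketch (not part of this
patch set) of an atomic operation whose operand straddles a 64-byte cache
line boundary and therefore takes the split lock / bus lock path described
above:

  #include <stdint.h>
  #include <stdio.h>

  /* 64-byte aligned container; "val" starts at offset 62 and therefore
   * straddles the first cache line boundary.
   */
  static struct {
  	char pad[62];
  	uint32_t val;
  } __attribute__((packed, aligned(64))) box;

  int main(void)
  {
  	/* Compiles to a locked read-modify-write (e.g. "lock add"-style
  	 * instruction) on a line-crossing operand, i.e. a split lock.
  	 */
  	__atomic_fetch_add(&box.val, 1, __ATOMIC_SEQ_CST);
  	printf("val = %u\n", box.val);
  	return 0;
  }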

More information about split lock, bus locking, and cache locking can be
found in the latest Intel 64 and IA-32 Architecture Software Developer's
Manual.

==Split lock detection==

Currently Linux can trace the split lock event counter sq_misc.split_lock
for debugging purposes. But for systems deployed in the field, counting
events after the fact is insufficient. We need a mechanism that detects a
split lock before it happens to ensure that a bus lock is never incurred
due to a split lock.

On Tremont and other future processors, Intel introduces a mechanism that
detects a split lock via an Alignment Check (#AC) exception before a badly
aligned atomic instruction can impact whole-system performance.

This capability is critical for real time system designers who build
consolidated real time systems. These systems run hard real time
code on some cores and run "untrusted" user processes on other cores.
The hard real time code cannot afford to have any bus lock from the
untrusted processes hurt real time performance. To date the designers
have been unable to deploy these solutions because they have no way to
prevent the "untrusted" user code from generating split locks and bus
locks that block the hard real time code from accessing memory while the
bus is locked.

This capability may also find usage in the cloud. A user process issuing
split locks in one guest can block other cores from accessing shared
memory during its split locked memory access, which may degrade overall
system performance.

Split locks may also open a security hole: malicious user code can slow
down the overall system by executing instructions with split locks.

==Enumerate split lock detection feature==

A control bit (bit 29) in MSR_TEST_CTL (0x33) will be introduced in
future x86 processors. When bit 29 is set, the processor raises an #AC
exception for split locked accesses at all CPLs.
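
For illustration, a minimal sketch (not from this patch set; the helper
name is made up) of turning the feature on for the current CPU with the
existing kernel MSR accessors and the MSR_TEST_CTL definitions added later
in this series:

  static void enable_split_lock_detect_on_this_cpu(void)
  {
  	u64 test_ctl;

  	/* Set bit 29 so that split locked accesses raise #AC at any CPL */
  	rdmsrl(MSR_TEST_CTL, test_ctl);
  	test_ctl |= MSR_TEST_CTL_SPLIT_LOCK_DETECT;
  	wrmsrl(MSR_TEST_CTL, test_ctl);
  }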

The split lock detection feature is enumerated through bit 5 in
MSR_IA32_CORE_CAPABILITY (0xcf). The MSR 0xcf itself is enumerated by
CPUID.(EAX=0x7,ECX=0):EDX[30].

The enumeration method is published in the latest Intel 64 and IA-32
Architecture Software Developer's Manual.

A few processors have the split lock detection feature but don't have
MSR_IA32_CORE_CAPABILITY to enumerate it. On those processors, the split
lock detection feature is enumerated based on their family/model/stepping
numbers.
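
The resulting enumeration flow, condensed into one sketch of what patches
0005-0007 implement (the helper name here is illustrative):

  static void __init sketch_enumerate_split_lock(struct cpuinfo_x86 *c)
  {
  	u64 core_cap;

  	if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITY)) {
  		/* No MSR 0xcf: fall back to family/model/stepping */
  		if (c->x86 == 6 && c->x86_model == INTEL_FAM6_ICELAKE_MOBILE)
  			setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
  		return;
  	}

  	rdmsrl(MSR_IA32_CORE_CAP, core_cap);
  	if (core_cap & MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT)
  		setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
  }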

==Handle split lock==

There may be different considerations in handling split lock, e.g. how to
handle a split lock issue in firmware after the kernel enables the feature.

But this patch set uses a simple way to handle split lock, as suggested by
Thomas Gleixner and Dave Hansen:

- If a split lock happens in the kernel, a warning is issued and split lock
detection is disabled on the current CPU. The split lock issue should
be fixed in the kernel.

- If a split lock happens in a user process, the process is killed by
SIGBUS. Unless the issue is fixed, the process cannot run in the
system.

- If a split lock happens in firmware, the system may hang in firmware. The
issue should be fixed in firmware.

- Enable split lock detection by default once the feature is enumerated.

- Disable split lock detection via the kernel parameter "nosplit_lock_detect"
at boot time.

- Disable/enable split lock detection via the debugfs interface
/sys/kernel/debug/x86/split_lock_detect at run time.

==Expose to guest==

To expose split lock detection to a guest, the following are needed:
 1. Report the new CPUID bit to the guest.
 2. Emulate the IA32_CORE_CAPABILITY MSR.
 3. Emulate the TEST_CTL MSR.

To prevent a malicious guest from using split locks to mount a slowdown
attack, the following policy is applied:
 - If the host kernel has it enabled then the guest is not allowed to
change it.
 - If the host kernel has it disabled then the guest can enable it for
its own purposes.

Accordingly, #AC is injected back into the guest only when the guest can
handle it.

==Patches==
Patch 0001-0003: Fix a few existing split lock issues.
Patch 0004-0008: Enumerate features and define MSR_TEST_CTL.
Patch 0009: Handle #AC for split lock.
Patch 0010-0011: Enable split lock detection in KVM.
Patch 0012: Enable split lock detection by default after #AC handler and KVM 
are installed.
Patch 0013: Disable split lock detection by kernel parameter
"nosplit_lock_detect" during boot time.
Patch 0014-0015: Define a debugfs interface to enable/disable split lock
detection during run time.
Patch 0016-0017: Warn if addr is unaligned to unsigned long in atomic
ops xxx_bit().

==Changelog==
v9:
Address Thomas Gleixner's comments:
- wrmsr() in split_lock_update_msr() to spare RMW
- Print warnings in atomic bit operations xxx_bit() if the address is
unaligned to unsigned long.
- When the host enables split lock detection, force it enabled for the guest.
- Use msr_test_ctl_mask to decide which bits need to be switched in
atomic_switch_msr_test_ctl().
- Warn if addr is unaligned to unsigned long in atomic ops xxx_bit().

Address Ingo Molnar's comments:
- Follow right MSR register and bits naming convention
- Use right naming convention for variables and functions
- Use split_lock_debug for atomic operations of WARN_ONCE in the #AC handler
and split_lock_detect_wr().
- Move the sysfs interface to debugfs interface /sys/kernel/debug/x86/
split_lock_detect

Other fixes:
- Update vmx->msr_test_ctl_mask when changing MSR_IA32_CORE_CAP.
- Support resume from suspend/hibernation

- The split lock fix patch (#0003) for wlcore wireless driver is
upstreamed. So remove the patch from this patch set.

v8:
Address issues pointed out by Thomas Gleixner:
- Remove all "clearcpuid=" related patches.
- Add kernel parameter "nosplit_lock_detect" patch.
- Merge definition and initialization of msr_test_ctl_cache into #AC
  handling patch which first uses the variable.
- Add justification for the sysfs knob and combine function and doc
  patches into one patch 0015.
- A few other adjustments.

v7:
- Add a per-CPU variable to cache MSR TEST_CTL. Suggested by Thomas Gleixner.
- Make a few other changes including locking, code simplification, work
flow, KVM fixes, etc. Suggested by Thomas Gleixner.
- Fix KVM issues pointed out by Sean Christopherson.

v6:
- Fix #AC handler issues pointed out by Dave Hansen
- Add doc for the sysfs interface pointed out by Dave Hansen
- Fix a lock issue around wrmsr during split lock init, pointed out by Dave
  Hansen
- Update descriptions and comments suggested by Dave Hansen
- Fix __le32 issue in wlcore raised by Kalle Valo
- Add feature enumeration based on family/model/stepping for Icelake mobile

v5:
- Fix wlcore issue from Paolo Bonzini
- Fix b44 issue from Peter Zijlstra
- Change init sequence by Dave Hansen
- Fix KVM issues from Paolo Bonzini
- Re-order patch sequence

v4:
- Remove "setcpuid=" option
- Enable IA32_CORE_CAPABILITY enumeration for split lock
- Handle CPUID faulting by Peter Zijlstra
- Enable /sys interface to enable/disable split lock detection

v3:
- Handle split lock as suggested by Thomas Gleixner.
- Fix a few potential split lock issues suggested by Thomas Gleixner.
- Support kernel option "setcpuid=" suggested by Dave Hansen and Thomas
Gleixner.
- Support flag string in "clearcpuid=" suggested by Dave Hansen and
Thomas Gleixner.

v2:
- Remove code that handles split lock issue in firmware and fix
x86_capability issue mainly based on comments from Thomas Gleixner and
Peter Zijlstra.

In previous version:
Comments from Dave Hansen:
- Enumerate feature in X86_FEATURE_SPLIT_LOCK_AC
- Separate #AC handler from do_error_trap
- Use CONFIG to configure inheriting the BIOS setting, enabling, or
  disabling split lock detection. Remove kernel parameter "split_lock_ac="
- Change config interface to debugfs from sysfs
- Fix a few bisectable issues
- Other changes.

Comment from Tony Luck and Dave Hansen:
- Dump right information in #AC handler

Comment from Alan Cox and Dave Hansen:
- Description of split lock in patch 0

Others:
- Remove tracing because we can trace split lock in existing
  sq_misc.split_lock.
- Add CONFIG to configure either panic or re-execute faulting instruction
  for split lock in kernel.
- other minor changes.

Fenghua Yu (13):
  x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long
  x86/split_lock: Align x86_capability to unsigned long to avoid split
    locked access
  x86/msr-index: Define MSR_IA32_CORE_CAP and split lock detection bit
  x86/cpufeatures: Enumerate MSR_IA32_CORE_CAP
  x86/split_lock: Enumerate split lock detection by MSR_IA32_CORE_CAP
  x86/split_lock: Enumerate split lock detection on Icelake mobile
    processor
  x86/split_lock: Define MSR TEST_CTL register
  x86/split_lock: Handle #AC exception for split lock
  x86/split_lock: Enable split lock detection by default
  x86/split_lock: Disable split lock detection by kernel parameter
    "nosplit_lock_detect"
  x86/split_lock: Add a debugfs interface to enable/disable split lock
    detection during run time
  x86/split_lock: Add documentation for split lock detection interface
  x86/split_lock: Warn on unaligned address in atomic bit operations

Peter Zijlstra (1):
  drivers/net/b44: Align pwol_mask to unsigned long for better
    performance

Sai Praneeth Prakhya (1):
  x86/split_lock: Reorganize few header files in order to call
    WARN_ON_ONCE() in atomic bit ops

Xiaoyao Li (2):
  kvm/x86: Emulate MSR IA32_CORE_CAPABILITY
  kvm/vmx: Emulate MSR TEST_CTL

 Documentation/ABI/testing/debugfs-x86         |  21 ++
 .../admin-guide/kernel-parameters.txt         |   2 +
 arch/microblaze/kernel/cpu/pvr.c              |   1 +
 arch/mips/ralink/mt7620.c                     |   1 +
 arch/powerpc/include/asm/cmpxchg.h            |   1 +
 arch/x86/include/asm/bitops.h                 |  16 ++
 arch/x86/include/asm/cpu.h                    |   8 +
 arch/x86/include/asm/cpufeatures.h            |   2 +
 arch/x86/include/asm/kvm_host.h               |   1 +
 arch/x86/include/asm/msr-index.h              |   8 +
 arch/x86/include/asm/processor.h              |   4 +-
 arch/x86/kernel/cpu/common.c                  |   7 +-
 arch/x86/kernel/cpu/cpuid-deps.c              |  79 +++----
 arch/x86/kernel/cpu/intel.c                   | 216 ++++++++++++++++++
 arch/x86/kernel/traps.c                       |  43 +++-
 arch/x86/kvm/cpuid.c                          |   6 +
 arch/x86/kvm/vmx/vmx.c                        |  92 +++++++-
 arch/x86/kvm/vmx/vmx.h                        |   2 +
 arch/x86/kvm/x86.c                            |  39 ++++
 arch/xtensa/include/asm/traps.h               |   1 +
 .../dvb-frontends/cxd2880/cxd2880_common.c    |   2 +
 drivers/net/ethernet/broadcom/b44.c           |   4 +-
 .../net/ethernet/freescale/fman/fman_muram.c  |   1 +
 drivers/soc/renesas/rcar-sysc.h               |   2 +-
 drivers/staging/fwserial/dma_fifo.c           |   1 +
 include/linux/assoc_array_priv.h              |   1 +
 include/linux/ata.h                           |   1 +
 include/linux/gpio/consumer.h                 |   1 +
 include/linux/iommu-helper.h                  |   1 +
 include/linux/kernel.h                        |   4 -
 include/linux/sched.h                         |   1 +
 kernel/bpf/tnum.c                             |   1 +
 lib/clz_ctz.c                                 |   1 +
 lib/errseq.c                                  |   1 +
 lib/flex_proportions.c                        |   1 +
 lib/hexdump.c                                 |   1 +
 lib/lz4/lz4defs.h                             |   1 +
 lib/math/div64.c                              |   1 +
 lib/math/gcd.c                                |   1 +
 lib/math/reciprocal_div.c                     |   1 +
 lib/siphash.c                                 |   1 +
 net/netfilter/nf_conntrack_h323_asn1.c        |   1 +
 42 files changed, 527 insertions(+), 53 deletions(-)
 create mode 100644 Documentation/ABI/testing/debugfs-x86

-- 
2.19.1



* [PATCH v9 01/17] x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

cpu_caps_cleared[] and cpu_caps_set[] may not be aligned to unsigned long.
Atomic operations (i.e. set_bit() and clear_bit()) on the bitmaps may
access two cache lines (a.k.a. split lock) and cause the CPU to do a bus
lock to block all memory accesses from other processors to ensure
atomicity.

To avoid the overall performance degradation from the bus locking, align
the two variables to unsigned long.

Defining the variables as unsigned long may also fix the issue because
they would be naturally aligned to unsigned long. But that needs additional
code changes. Adding __aligned(sizeof(unsigned long)) is a simpler fix.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/kernel/cpu/common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 2c57fffebf9b..ed2b81b437e0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -494,8 +494,9 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
 	return NULL;		/* Not found */
 }
 
-__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
-__u32 cpu_caps_set[NCAPINTS + NBUGINTS];
+/* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
+__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
+__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 
 void load_percpu_segment(int cpu)
 {
-- 
2.19.1



* [PATCH v9 02/17] drivers/net/b44: Align pwol_mask to unsigned long for better performance
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

From: Peter Zijlstra <peterz@infradead.org>

Bits in pwol_mask are set in b44_magic_pattern() by the atomic set_bit().
But since pwol_mask is local and never exposed to concurrency, there is
no need to set its bits atomically.

set_bit() sets the bit in a single unsigned long location. Because
pwol_mask may not be aligned to unsigned long, that location may cross two
cache lines. On x86, accessing two cache lines with a locked instruction in
set_bit() is called a split locked access and can cause overall performance
degradation.

So use the non-atomic __set_bit() to set pwol_mask bits. __set_bit() won't
hit the split lock issue on x86.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 drivers/net/ethernet/broadcom/b44.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index 97ab0dd25552..5738ab963dfb 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1520,7 +1520,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
 
 	memset(ppattern + offset, 0xff, magicsync);
 	for (j = 0; j < magicsync; j++)
-		set_bit(len++, (unsigned long *) pmask);
+		__set_bit(len++, (unsigned long *)pmask);
 
 	for (j = 0; j < B44_MAX_PATTERNS; j++) {
 		if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
@@ -1532,7 +1532,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
 		for (k = 0; k< ethaddr_bytes; k++) {
 			ppattern[offset + magicsync +
 				(j * ETH_ALEN) + k] = macaddr[k];
-			set_bit(len++, (unsigned long *) pmask);
+			__set_bit(len++, (unsigned long *)pmask);
 		}
 	}
 	return len - 1;
-- 
2.19.1



* [PATCH v9 03/17] x86/split_lock: Align x86_capability to unsigned long to avoid split locked access
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
operate on bitmap defined in x86_capability.

Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
the location is at:
base address of x86_capability + (bit offset in x86_capability / 64) * 8

Since the base address of x86_capability may not be aligned to unsigned
long, the single unsigned long location may cross two cache lines and
accessing the location with locked BTS/BTR instructions will cause a
split lock.

To fix the split lock issue, align x86_capability to the size of unsigned
long so that the location will always be within one cache line.

Changing x86_capability's type to unsigned long may also fix the issue
because x86_capability will be naturally aligned to size of unsigned long.
But this needs additional code changes. So choose the simpler solution
by setting the array's alignment to size of unsigned long.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/processor.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index c34a35c78618..d3e017723634 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -93,7 +93,9 @@ struct cpuinfo_x86 {
 	__u32			extended_cpuid_level;
 	/* Maximum supported CPUID level, -1=no CPUID: */
 	int			cpuid_level;
-	__u32			x86_capability[NCAPINTS + NBUGINTS];
+	/* Aligned to size of unsigned long to avoid split lock in atomic ops */
+	__u32			x86_capability[NCAPINTS + NBUGINTS]
+				__aligned(sizeof(unsigned long));
 	char			x86_vendor_id[16];
 	char			x86_model_id[64];
 	/* in KB - valid for CPUS which support this call: */
-- 
2.19.1



* [PATCH v9 04/17] x86/msr-index: Define MSR_IA32_CORE_CAP and split lock detection bit
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

A new MSR_IA32_CORE_CAP (0xcf) is defined. Each bit in the MSR
enumerates a model specific feature. Currently bit 5 enumerates split
lock detection. When bit 5 is 1, split lock detection is supported.
When the bit is 0, split lock detection is not supported.

Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the MSR and the
split lock detection bit.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/msr-index.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 979ef971cc78..8b2a7899f784 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -61,6 +61,10 @@
 #define MSR_PLATFORM_INFO_CPUID_FAULT_BIT	31
 #define MSR_PLATFORM_INFO_CPUID_FAULT		BIT_ULL(MSR_PLATFORM_INFO_CPUID_FAULT_BIT)
 
+#define MSR_IA32_CORE_CAP			0x000000cf
+#define MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT_BIT 5
+#define MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT	BIT(MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_PKG_CST_CONFIG_CONTROL	0x000000e2
 #define NHM_C3_AUTO_DEMOTE		(1UL << 25)
 #define NHM_C1_AUTO_DEMOTE		(1UL << 26)
-- 
2.19.1



* [PATCH v9 05/17] x86/cpufeatures: Enumerate MSR_IA32_CORE_CAP
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

MSR_IA32_CORE_CAP (0xcf) contains bits that enumerate some model
specific features.

The MSR 0xcf itself is enumerated by CPUID.(EAX=0x7,ECX=0):EDX[30].
When this CPUID bit is 1, the MSR 0xcf exists.

Detailed information on the CPUID bit and the MSR can be found in the
latest Intel 64 and IA-32 Architectures Software Developer's Manual.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 75f27ee2c263..c6e888688a13 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -351,6 +351,7 @@
 #define X86_FEATURE_INTEL_STIBP		(18*32+27) /* "" Single Thread Indirect Branch Predictors */
 #define X86_FEATURE_FLUSH_L1D		(18*32+28) /* Flush L1D cache */
 #define X86_FEATURE_ARCH_CAPABILITIES	(18*32+29) /* IA32_ARCH_CAPABILITIES MSR (Intel) */
+#define X86_FEATURE_CORE_CAPABILITY	(18*32+30) /* "" IA32_CORE_CAPABILITY MSR */
 #define X86_FEATURE_SPEC_CTRL_SSBD	(18*32+31) /* "" Speculative Store Bypass Disable */
 
 /*
-- 
2.19.1



* [PATCH v9 06/17] x86/split_lock: Enumerate split lock detection by MSR_IA32_CORE_CAP
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

Bits in MSR_IA32_CORE_CAP enumerate a few features that are not
enumerated through CPUID. Currently bit 5 is defined to enumerate the
split lock detection feature. All other bits are reserved for now.

When bit 5 is 1, the feature is supported and feature bit
X86_FEATURE_SPLIT_LOCK_DETECT is set. Otherwise, the feature is not
available.

The MSR_IA32_CORE_CAP itself is enumerated by
CPUID.(EAX=0x7,ECX=0):EDX[30].

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpu.h         |  5 ++
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/kernel/cpu/common.c       |  2 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 79 +++++++++++++++---------------
 arch/x86/kernel/cpu/intel.c        | 22 +++++++++
 5 files changed, 70 insertions(+), 39 deletions(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index adc6cc86b062..4e03f53fc079 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -40,4 +40,9 @@ int mwait_usable(const struct cpuinfo_x86 *);
 unsigned int x86_family(unsigned int sig);
 unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
+#ifdef CONFIG_CPU_SUP_INTEL
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+#else
+static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+#endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index c6e888688a13..5e3759b7c5b7 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -221,6 +221,7 @@
 #define X86_FEATURE_ZEN			( 7*32+28) /* "" CPU is AMD family 0x17 (Zen) */
 #define X86_FEATURE_L1TF_PTEINV		( 7*32+29) /* "" L1TF workaround PTE inversion */
 #define X86_FEATURE_IBRS_ENHANCED	( 7*32+30) /* Enhanced IBRS */
+#define X86_FEATURE_SPLIT_LOCK_DETECT	( 7*32+31) /* #AC for split lock */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW		( 8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index ed2b81b437e0..9aa91140024f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1119,6 +1119,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 	cpu_set_bug_bits(c);
 
+	cpu_set_core_cap_bits(c);
+
 	fpu__init_system(c);
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 2c0bd38a44ab..3d633f67fbd7 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -20,45 +20,46 @@ struct cpuid_dep {
  * but it's difficult to tell that to the init reference checker.
  */
 static const struct cpuid_dep cpuid_deps[] = {
-	{ X86_FEATURE_XSAVEOPT,		X86_FEATURE_XSAVE     },
-	{ X86_FEATURE_XSAVEC,		X86_FEATURE_XSAVE     },
-	{ X86_FEATURE_XSAVES,		X86_FEATURE_XSAVE     },
-	{ X86_FEATURE_AVX,		X86_FEATURE_XSAVE     },
-	{ X86_FEATURE_PKU,		X86_FEATURE_XSAVE     },
-	{ X86_FEATURE_MPX,		X86_FEATURE_XSAVE     },
-	{ X86_FEATURE_XGETBV1,		X86_FEATURE_XSAVE     },
-	{ X86_FEATURE_FXSR_OPT,		X86_FEATURE_FXSR      },
-	{ X86_FEATURE_XMM,		X86_FEATURE_FXSR      },
-	{ X86_FEATURE_XMM2,		X86_FEATURE_XMM       },
-	{ X86_FEATURE_XMM3,		X86_FEATURE_XMM2      },
-	{ X86_FEATURE_XMM4_1,		X86_FEATURE_XMM2      },
-	{ X86_FEATURE_XMM4_2,		X86_FEATURE_XMM2      },
-	{ X86_FEATURE_XMM3,		X86_FEATURE_XMM2      },
-	{ X86_FEATURE_PCLMULQDQ,	X86_FEATURE_XMM2      },
-	{ X86_FEATURE_SSSE3,		X86_FEATURE_XMM2,     },
-	{ X86_FEATURE_F16C,		X86_FEATURE_XMM2,     },
-	{ X86_FEATURE_AES,		X86_FEATURE_XMM2      },
-	{ X86_FEATURE_SHA_NI,		X86_FEATURE_XMM2      },
-	{ X86_FEATURE_FMA,		X86_FEATURE_AVX       },
-	{ X86_FEATURE_AVX2,		X86_FEATURE_AVX,      },
-	{ X86_FEATURE_AVX512F,		X86_FEATURE_AVX,      },
-	{ X86_FEATURE_AVX512IFMA,	X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512PF,		X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512ER,		X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512CD,		X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512DQ,		X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512BW,		X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512VL,		X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512VBMI,	X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512_VBMI2,	X86_FEATURE_AVX512VL  },
-	{ X86_FEATURE_GFNI,		X86_FEATURE_AVX512VL  },
-	{ X86_FEATURE_VAES,		X86_FEATURE_AVX512VL  },
-	{ X86_FEATURE_VPCLMULQDQ,	X86_FEATURE_AVX512VL  },
-	{ X86_FEATURE_AVX512_VNNI,	X86_FEATURE_AVX512VL  },
-	{ X86_FEATURE_AVX512_BITALG,	X86_FEATURE_AVX512VL  },
-	{ X86_FEATURE_AVX512_4VNNIW,	X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512_4FMAPS,	X86_FEATURE_AVX512F   },
-	{ X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_XSAVEOPT,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_XSAVEC,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_XSAVES,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_AVX,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_PKU,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_MPX,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_XGETBV1,			X86_FEATURE_XSAVE     },
+	{ X86_FEATURE_FXSR_OPT,			X86_FEATURE_FXSR      },
+	{ X86_FEATURE_XMM,			X86_FEATURE_FXSR      },
+	{ X86_FEATURE_XMM2,			X86_FEATURE_XMM       },
+	{ X86_FEATURE_XMM3,			X86_FEATURE_XMM2      },
+	{ X86_FEATURE_XMM4_1,			X86_FEATURE_XMM2      },
+	{ X86_FEATURE_XMM4_2,			X86_FEATURE_XMM2      },
+	{ X86_FEATURE_XMM3,			X86_FEATURE_XMM2      },
+	{ X86_FEATURE_PCLMULQDQ,		X86_FEATURE_XMM2      },
+	{ X86_FEATURE_SSSE3,			X86_FEATURE_XMM2,     },
+	{ X86_FEATURE_F16C,			X86_FEATURE_XMM2,     },
+	{ X86_FEATURE_AES,			X86_FEATURE_XMM2      },
+	{ X86_FEATURE_SHA_NI,			X86_FEATURE_XMM2      },
+	{ X86_FEATURE_FMA,			X86_FEATURE_AVX       },
+	{ X86_FEATURE_AVX2,			X86_FEATURE_AVX,      },
+	{ X86_FEATURE_AVX512F,			X86_FEATURE_AVX,      },
+	{ X86_FEATURE_AVX512IFMA,		X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512PF,			X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512ER,			X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512CD,			X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512DQ,			X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512BW,			X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512VL,			X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512VBMI,		X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512_VBMI2,		X86_FEATURE_AVX512VL  },
+	{ X86_FEATURE_GFNI,			X86_FEATURE_AVX512VL  },
+	{ X86_FEATURE_VAES,			X86_FEATURE_AVX512VL  },
+	{ X86_FEATURE_VPCLMULQDQ,		X86_FEATURE_AVX512VL  },
+	{ X86_FEATURE_AVX512_VNNI,		X86_FEATURE_AVX512VL  },
+	{ X86_FEATURE_AVX512_BITALG,		X86_FEATURE_AVX512VL  },
+	{ X86_FEATURE_AVX512_4VNNIW,		X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512_4FMAPS,		X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_AVX512_VPOPCNTDQ,		X86_FEATURE_AVX512F   },
+	{ X86_FEATURE_SPLIT_LOCK_DETECT,	X86_FEATURE_CORE_CAPABILITY},
 	{}
 };
 
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index f17c1a714779..d63a4ba203e1 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -995,3 +995,25 @@ static const struct cpu_dev intel_cpu_dev = {
 };
 
 cpu_dev_register(intel_cpu_dev);
+
+static void __init split_lock_setup(void)
+{
+	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+}
+
+void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
+{
+	u64 ia32_core_cap = 0;
+
+	if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITY))
+		return;
+
+	/*
+	 * If MSR_IA32_CORE_CAP exists, enumerate features that are
+	 * reported in the MSR.
+	 */
+	rdmsrl(MSR_IA32_CORE_CAP, ia32_core_cap);
+
+	if (ia32_core_cap & MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT)
+		split_lock_setup();
+}
-- 
2.19.1



* [PATCH v9 07/17] x86/split_lock: Enumerate split lock detection on Icelake mobile processor
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

The Icelake mobile processor can detect split lock operations although
it doesn't have MSR IA32_CORE_CAP and thus no split lock detection bit
in that MSR. Set the split lock detection feature bit
X86_FEATURE_SPLIT_LOCK_DETECT on this processor based on its
family/model/stepping.

In the future, a few other processors may also have the split lock
detection feature but don't have MSR IA32_CORE_CAP. The feature
will be enumerated on those processors once their family/model/stepping
information is released.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/cpu/intel.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index d63a4ba203e1..7ae6cc22657d 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -1005,8 +1005,18 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
 {
 	u64 ia32_core_cap = 0;
 
-	if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITY))
+	if (!cpu_has(c, X86_FEATURE_CORE_CAPABILITY)) {
+		/*
+		 * The following processors have split lock detection feature.
+		 * But since they don't have MSR IA32_CORE_CAP, the
+		 * feature cannot be enumerated by the MSR. So enumerate the
+		 * feature by family/model/stepping.
+		 */
+		if (c->x86 == 6 && c->x86_model == INTEL_FAM6_ICELAKE_MOBILE)
+			split_lock_setup();
+
 		return;
+	}
 
 	/*
 	 * If MSR_IA32_CORE_CAP exists, enumerate features that are
-- 
2.19.1



* [PATCH v9 08/17] x86/split_lock: Define MSR TEST_CTL register
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

Setting bit 29 in MSR TEST_CTL (0x33) enables split lock detection and
clearing the bit disables split lock detection.

Define the MSR and the bit. The definitions will be used in enabling or
disabling split lock detection.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/msr-index.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 8b2a7899f784..2017cab4717a 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -41,6 +41,10 @@
 
 /* Intel MSRs. Some also available on other CPUs */
 
+#define MSR_TEST_CTL				0x00000033
+#define MSR_TEST_CTL_SPLIT_LOCK_DETECT_BIT	29
+#define MSR_TEST_CTL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTL_SPLIT_LOCK_DETECT_BIT)
+
 #define MSR_IA32_SPEC_CTRL		0x00000048 /* Speculation Control */
 #define SPEC_CTRL_IBRS			BIT(0)	   /* Indirect Branch Restricted Speculation */
 #define SPEC_CTRL_STIBP_SHIFT		1	   /* Single Thread Indirect Branch Predictor (STIBP) bit */
-- 
2.19.1



* [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

There may be different considerations on how to handle #AC for split lock,
e.g. how to handle a system hang caused by a split lock issue in firmware,
how to emulate the faulting instruction, etc. We use a simple method to
handle user and kernel split locks and may extend the method in the future.

When the #AC exception for a split lock is triggered from a user process,
the process is killed by SIGBUS. To run the process properly, a user
application developer needs to fix the split lock issue.

When the #AC exception for a split lock is triggered from a kernel
instruction, disable split lock detection on the local CPU and warn about
the split lock issue. After the exception, the faulting instruction will
be executed and kernel execution continues. Split lock detection is only
disabled on the local CPU, not globally. It will be re-enabled if the CPU
goes offline and then comes back online, or through the debugfs interface.

A kernel/driver developer should check the warning, which contains helpful
faulting address, context, and callstack info, and fix the split lock
issues. Then further split lock issues may be captured and fixed.

After bit 29 in MSR_TEST_CTL is set to 1 in the kernel, firmware inherits
the setting when firmware is executed in S4, S5, run time services, SMI,
etc. If there is a split lock operation in firmware, it will trigger
#AC and may hang the system depending on how firmware handles the #AC.
It's up to a firmware developer to fix split lock issues in firmware.

The MSR TEST_CTL value is cached in the per-CPU variable msr_test_ctl_cached
which will be used in virtualization to avoid a costly MSR read.

Ingo suggested using a global split_lock_debug flag to allow only one CPU
to print the split lock warning in the #AC handler because WARN_ONCE() and
the underlying BUGFLAG_ONCE mechanism are not atomic. This also solves
the race where the split-lock #AC fault is re-triggered by an NMI of perf
context interrupting one split-lock warning execution while the original
WARN_ONCE() is executing.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/cpu.h  |  3 +++
 arch/x86/kernel/cpu/intel.c | 38 +++++++++++++++++++++++++++++++++
 arch/x86/kernel/traps.c     | 42 ++++++++++++++++++++++++++++++++++++-
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 4e03f53fc079..81710f2a3eea 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -42,7 +42,10 @@ unsigned int x86_model(unsigned int sig);
 unsigned int x86_stepping(unsigned int sig);
 #ifdef CONFIG_CPU_SUP_INTEL
 void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c);
+DECLARE_PER_CPU(u64, msr_test_ctl_cached);
+void split_lock_disable(void);
 #else
 static inline void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c) {}
+static inline void split_lock_disable(void) {}
 #endif
 #endif /* _ASM_X86_CPU_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 7ae6cc22657d..16cf1631b7f9 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -31,6 +31,9 @@
 #include <asm/apic.h>
 #endif
 
+DEFINE_PER_CPU(u64, msr_test_ctl_cached);
+EXPORT_PER_CPU_SYMBOL_GPL(msr_test_ctl_cached);
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -624,6 +627,17 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_init(struct cpuinfo_x86 *c)
+{
+	if (cpu_has(c, X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		u64 test_ctl_val;
+
+		/* Cache MSR TEST_CTL */
+		rdmsrl(MSR_TEST_CTL, test_ctl_val);
+		this_cpu_write(msr_test_ctl_cached, test_ctl_val);
+	}
+}
+
 static void init_intel(struct cpuinfo_x86 *c)
 {
 	early_init_intel(c);
@@ -734,6 +748,8 @@ static void init_intel(struct cpuinfo_x86 *c)
 		detect_tme(c);
 
 	init_intel_misc_features(c);
+
+	split_lock_init(c);
 }
 
 #ifdef CONFIG_X86_32
@@ -1027,3 +1043,25 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
 	if (ia32_core_cap & MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT)
 		split_lock_setup();
 }
+
+static atomic_t split_lock_debug;
+
+void split_lock_disable(void)
+{
+	/* Disable split lock detection on this CPU */
+	this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
+	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
+
+	/*
+	 * Use the atomic variable split_lock_debug to ensure only the
+	 * first CPU hitting split lock issue prints one single complete
+	 * warning. This also solves the race if the split-lock #AC fault
+	 * is re-triggered by NMI of perf context interrupting one
+	 * split-lock warning execution while the original WARN_ONCE() is
+	 * executing.
+	 */
+	if (atomic_cmpxchg(&split_lock_debug, 0, 1) == 0) {
+		WARN_ONCE(1, "split lock operation detected\n");
+		atomic_set(&split_lock_debug, 0);
+	}
+}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8b6d03e55d2f..38143c028f5a 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -61,6 +61,7 @@
 #include <asm/mpx.h>
 #include <asm/vm86.h>
 #include <asm/umip.h>
+#include <asm/cpu.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -293,9 +294,48 @@ DO_ERROR(X86_TRAP_OLD_MF, SIGFPE,           0, NULL, "coprocessor segment overru
 DO_ERROR(X86_TRAP_TS,     SIGSEGV,          0, NULL, "invalid TSS",         invalid_TSS)
 DO_ERROR(X86_TRAP_NP,     SIGBUS,           0, NULL, "segment not present", segment_not_present)
 DO_ERROR(X86_TRAP_SS,     SIGBUS,           0, NULL, "stack segment",       stack_segment)
-DO_ERROR(X86_TRAP_AC,     SIGBUS,  BUS_ADRALN, NULL, "alignment check",     alignment_check)
 #undef IP
 
+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	unsigned int trapnr = X86_TRAP_AC;
+	char str[] = "alignment check";
+	int signr = SIGBUS;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+		return;
+
+	cond_local_irq_enable(regs);
+	if (!user_mode(regs) && static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		/*
+		 * Only split locks can generate #AC from kernel mode.
+		 *
+		 * The split-lock detection feature is a one-shot
+		 * debugging facility, so we disable it immediately and
+		 * print a warning.
+		 *
+		 * This also solves the instruction restart problem: when
+		 * we return to the faulting instruction right after this,
+		 * it will be executed without generating another #AC fault
+		 * or getting into an infinite loop; instead execution will
+		 * continue without side effects in the interrupted
+		 * execution context.
+		 *
+		 * Split-lock detection will remain disabled after this,
+		 * until the next reboot.
+		 */
+		split_lock_disable();
+
+		return;
+	}
+
+	/* Handle #AC generated in any other cases. */
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
+}
+
 #ifdef CONFIG_VMAP_STACK
 __visible void __noreturn handle_stack_overflow(const char *message,
 						struct pt_regs *regs,
-- 
2.19.1



* [PATCH v9 10/17] kvm/x86: Emulate MSR IA32_CORE_CAPABILITY
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Xiaoyao Li, Fenghua Yu

From: Xiaoyao Li <xiaoyao.li@linux.intel.com>

MSR IA32_CORE_CAPABILITY is a feature-enumerating MSR, bit 5 of which
reports the capability of enabling detection of split locks (to be
supported on future processors based on the Tremont microarchitecture
and later).

CPUID.(EAX=7H,ECX=0):EDX[30] enumerates the presence of the
IA32_CORE_CAPABILITY MSR.

Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the MSR and
the split lock bit.

Since MSR_IA32_CORE_CAP is a feature-enumerating MSR that plays a
similar role to CPUID, it can be emulated in software regardless of the
host's capability. What we need to do is set its value correctly to
report the guest's capability.

In this patch, just set the guest's core_capability to 0, because we
haven't yet added support for the features it indicates to the guest.
This keeps the series bisectable.

Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  6 ++++++
 arch/x86/kvm/x86.c              | 22 ++++++++++++++++++++++
 3 files changed, 29 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 450d69a1e6fa..ddac618e96a1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -572,6 +572,7 @@ struct kvm_vcpu_arch {
 	u64 ia32_xss;
 	u64 microcode_version;
 	u64 arch_capabilities;
+	u64 core_capability;
 
 	/*
 	 * Paging state of the vcpu
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e18a9f9f65b5..7d064a7c5637 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -507,6 +507,12 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 			 * if the host doesn't support it.
 			 */
 			entry->edx |= F(ARCH_CAPABILITIES);
+			/*
+			 * Since we emulate MSR IA32_CORE_CAPABILITY in
+			 * software, we can always enable it for guest
+			 * regardless of host's capability.
+			 */
+			entry->edx |= F(CORE_CAPABILITY);
 		} else {
 			entry->ebx = 0;
 			entry->ecx = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83aefd759846..dc4c72bd6781 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1165,6 +1165,7 @@ static u32 emulated_msrs[] = {
 	MSR_IA32_TSC_ADJUST,
 	MSR_IA32_TSCDEADLINE,
 	MSR_IA32_ARCH_CAPABILITIES,
+	MSR_IA32_CORE_CAP,
 	MSR_IA32_MISC_ENABLE,
 	MSR_IA32_MCG_STATUS,
 	MSR_IA32_MCG_CTL,
@@ -1207,6 +1208,7 @@ static u32 msr_based_features[] = {
 
 	MSR_F10H_DECFG,
 	MSR_IA32_UCODE_REV,
+	MSR_IA32_CORE_CAP,
 	MSR_IA32_ARCH_CAPABILITIES,
 };
 
@@ -1234,9 +1236,17 @@ u64 kvm_get_arch_capabilities(void)
 }
 EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
 
+static u64 kvm_get_core_capability(void)
+{
+	return 0;
+}
+
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
+	case MSR_IA32_CORE_CAP:
+		msr->data = kvm_get_core_capability();
+		break;
 	case MSR_IA32_ARCH_CAPABILITIES:
 		msr->data = kvm_get_arch_capabilities();
 		break;
@@ -2495,6 +2505,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_EFER:
 		return set_efer(vcpu, msr_info);
+	case MSR_IA32_CORE_CAP:
+		if (!msr_info->host_initiated)
+			return 1;
+		vcpu->arch.core_capability = data;
+		break;
 	case MSR_K7_HWCR:
 		data &= ~(u64)0x40;	/* ignore flush filter disable */
 		data &= ~(u64)0x100;	/* ignore ignne emulation enable */
@@ -2808,6 +2823,12 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSC:
 		msr_info->data = kvm_scale_tsc(vcpu, rdtsc()) + vcpu->arch.tsc_offset;
 		break;
+	case MSR_IA32_CORE_CAP:
+		if (!msr_info->host_initiated &&
+		    !guest_cpuid_has(vcpu, X86_FEATURE_CORE_CAPABILITY))
+			return 1;
+		msr_info->data = vcpu->arch.core_capability;
+		break;
 	case MSR_MTRRcap:
 	case 0x200 ... 0x2ff:
 		return kvm_mtrr_get_msr(vcpu, msr_info->index, &msr_info->data);
@@ -8853,6 +8874,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
+	vcpu->arch.core_capability = kvm_get_core_capability();
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
-- 
2.19.1



* [PATCH v9 11/17] kvm/vmx: Emulate MSR TEST_CTL
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Xiaoyao Li, Fenghua Yu

From: Xiaoyao Li <xiaoyao.li@linux.intel.com>

A control bit (bit 29) in TEST_CTL MSR 0x33 will be introduced in
future x86 processors. When bit 29 is set, the processor raises an #AC
exception for split locked accesses at all CPLs.

Please check the latest Intel 64 and IA-32 Architectures Software
Developer's Manual for more detailed information on the MSR and
the split lock bit.

This patch emulates MSR_TEST_CTL with vmx->msr_test_ctl and does the
following:
1. As the guest's MSR TEST_CTL is emulated, enable the related bit
in CORE_CAPABILITY to correctly report this feature to the guest.

2. If the host has split lock detection enabled, force it enabled in the
guest to prevent the guest from mounting a slowdown attack using split
locks. If the host has it disabled, control is given to the guest, which
can enable it for its own purposes.

Note: The guest can read and write bit 29 of MSR_TEST_CTL if the hardware
has the split lock detection feature. But while the guest is running, the
real value in the hardware MSR may differ from the value read in the guest
when the guest has it disabled and the host has it enabled. This can be
regarded as the host's value overriding the guest's value.

To avoid a costly RDMSR of TEST_CTL when switching between host and guest
during VM entry, read the per-CPU variable msr_test_ctl_cached which caches
the MSR value.

Besides, only inject the #AC exception back when the guest can handle it.
Otherwise, it must be a split lock caused #AC; in this case, print a hint.

Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kvm/vmx/vmx.c | 92 ++++++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmx.h |  2 +
 arch/x86/kvm/x86.c     | 19 ++++++++-
 3 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b93e36ddee5e..d096cee48a40 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1640,6 +1640,16 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
 	return !(val & ~valid_bits);
 }
 
+static u64 vmx_get_msr_test_ctl_mask(struct kvm_vcpu *vcpu)
+{
+	u64 mask = 0;
+
+	if (vcpu->arch.core_capability & MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT)
+		mask |= MSR_TEST_CTL_SPLIT_LOCK_DETECT;
+
+	return mask;
+}
+
 static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 {
 	switch (msr->index) {
@@ -1666,6 +1676,11 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_info->index) {
+	case MSR_TEST_CTL:
+		if (!vmx->msr_test_ctl_mask)
+			return 1;
+		msr_info->data = vmx->msr_test_ctl;
+		break;
 #ifdef CONFIG_X86_64
 	case MSR_FS_BASE:
 		msr_info->data = vmcs_readl(GUEST_FS_BASE);
@@ -1803,6 +1818,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	u32 index;
 
 	switch (msr_index) {
+	case MSR_TEST_CTL:
+		if (!vmx->msr_test_ctl_mask ||
+		    (data & vmx->msr_test_ctl_mask) != data)
+			return 1;
+		vmx->msr_test_ctl = data;
+		break;
+	case MSR_IA32_CORE_CAP:
+		if (!msr_info->host_initiated)
+			return 1;
+		vcpu->arch.core_capability = data;
+		vmx->msr_test_ctl_mask = vmx_get_msr_test_ctl_mask(vcpu);
+		break;
 	case MSR_EFER:
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
@@ -4121,6 +4148,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 
 	vmx->rmode.vm86_active = 0;
 	vmx->spec_ctrl = 0;
+	vmx->msr_test_ctl = 0;
+	vmx->msr_test_ctl_mask = vmx_get_msr_test_ctl_mask(vcpu);
 
 	vcpu->arch.microcode_version = 0x100000000ULL;
 	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
@@ -4449,6 +4478,28 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+/*
+ * Per the Intel SDM, #AC can be caused in two ways:
+ *	1. An unaligned memory access when CPL == 3 && CR0.AM == 1 && EFLAGS.AC == 1
+ *	2. A lock operation on a memory access that crosses two cache lines,
+ *	   when split lock detection is enabled (bit 29 of MSR_TEST_CTL is set).
+ *	   This #AC can be generated at any CPL.
+ *
+ * So, when the guest's split lock detection is enabled, it can be assumed
+ * capable of handling #AC at any CPL.
+ * Likewise, when the guest's CR0.AM and EFLAGS.AC are both set, it can be
+ * assumed capable of handling #AC at CPL == 3.
+ */
+static bool guest_can_handle_ac(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	return (vmx->msr_test_ctl & MSR_TEST_CTL_SPLIT_LOCK_DETECT) ||
+	       ((vmx_get_cpl(vcpu) == 3) &&
+		kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
+		(kvm_get_rflags(vcpu) & X86_EFLAGS_AC));
+}
+
 static int handle_exception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -4514,9 +4565,6 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 		return handle_rmode_exception(vcpu, ex_no, error_code);
 
 	switch (ex_no) {
-	case AC_VECTOR:
-		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
-		return 1;
 	case DB_VECTOR:
 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
 		if (!(vcpu->guest_debug &
@@ -4545,6 +4593,15 @@ static int handle_exception(struct kvm_vcpu *vcpu)
 		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
 		kvm_run->debug.arch.exception = ex_no;
 		break;
+	case AC_VECTOR:
+		if (guest_can_handle_ac(vcpu)) {
+			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+			return 1;
+		}
+		pr_warn("kvm: %s[%d]: there is an #AC exception in guest due to split lock. "
+			"Please try to fix it, or disable split lock detection in the host as a workaround.",
+			current->comm, current->pid);
+		/* fall through */
 	default:
 		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
 		kvm_run->ex.exception = ex_no;
@@ -6335,6 +6392,33 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
 					msrs[i].host, false);
 }
 
+static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
+{
+	u64 guest_val;
+	u64 host_val = this_cpu_read(msr_test_ctl_cached);
+	u64 mask = vmx->msr_test_ctl_mask;
+
+	/*
+	 * A guest can degrade overall system performance (of the host or of
+	 * other guests) by using split locks. Hence the following policy:
+	 *  - If the host has split lock detection enabled, force it enabled
+	 *    in the guest during VM entry.
+	 *  - If the host has it disabled, the guest may enable it for its own
+	 *    purposes, so load the guest's value during VM entry.
+	 *
+	 * Use the adjusted mask to achieve this.
+	 */
+	if (host_val & MSR_TEST_CTL_SPLIT_LOCK_DETECT)
+		mask &= ~MSR_TEST_CTL_SPLIT_LOCK_DETECT;
+
+	guest_val = (host_val & ~mask) | (vmx->msr_test_ctl & mask);
+
+	if (host_val == guest_val)
+		clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
+	else
+		add_atomic_switch_msr(vmx, MSR_TEST_CTL, guest_val, host_val, false);
+}
+
 static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
 {
 	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
@@ -6443,6 +6527,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 
 	atomic_switch_perf_msrs(vmx);
 
+	atomic_switch_msr_test_ctl(vmx);
+
 	vmx_update_hv_timer(vcpu);
 
 	/*
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 61128b48c503..2a54b0b5741e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -193,6 +193,8 @@ struct vcpu_vmx {
 	u64		      msr_guest_kernel_gs_base;
 #endif
 
+	u64		      msr_test_ctl;
+	u64		      msr_test_ctl_mask;
 	u64		      spec_ctrl;
 
 	u32 vm_entry_controls_shadow;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dc4c72bd6781..741ad4e61386 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1238,7 +1238,24 @@ EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
 
 static u64 kvm_get_core_capability(void)
 {
-	return 0;
+	u64 data = 0;
+
+	if (boot_cpu_has(X86_FEATURE_CORE_CAPABILITY)) {
+		rdmsrl(MSR_IA32_CORE_CAP, data);
+
+		/* mask non-virtualizable functions */
+		data &= MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT;
+	} else if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		/*
+		 * There will be a list of FMS values that have split lock
+		 * detection but lack the CORE CAPABILITY MSR. In this case,
+		 * set MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT since we emulate
+		 * MSR CORE_CAPABILITY.
+		 */
+		data |= MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT;
+	}
+
+	return data;
 }
 
 static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v9 12/17] x86/split_lock: Enable split lock detection by default
  2019-06-18 22:41 [PATCH v9 00/17] x86/split_lock: Enable split lock detection Fenghua Yu
                   ` (10 preceding siblings ...)
  2019-06-18 22:41 ` [PATCH v9 11/17] kvm/vmx: Emulate MSR TEST_CTL Fenghua Yu
@ 2019-06-18 22:41 ` Fenghua Yu
  2019-06-18 22:41 ` [PATCH v9 13/17] x86/split_lock: Disable split lock detection by kernel parameter "nosplit_lock_detect" Fenghua Yu
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 85+ messages in thread
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

A split locked access locks the bus and degrades overall memory access
performance. When the split lock detection feature is enumerated, enable
it by default by writing 1 to bit 29 of MSR TEST_CTL so that any split
lock issue can be found.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/cpu/intel.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 16cf1631b7f9..4ccd890a45b0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -627,6 +627,13 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
+static void split_lock_update_msr(void)
+{
+	/* Enable split lock detection */
+	this_cpu_or(msr_test_ctl_cached, MSR_TEST_CTL_SPLIT_LOCK_DETECT);
+	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
+}
+
 static void split_lock_init(struct cpuinfo_x86 *c)
 {
 	if (cpu_has(c, X86_FEATURE_SPLIT_LOCK_DETECT)) {
@@ -635,6 +642,8 @@ static void split_lock_init(struct cpuinfo_x86 *c)
 		/* Cache MSR TEST_CTL */
 		rdmsrl(MSR_TEST_CTL, test_ctl_val);
 		this_cpu_write(msr_test_ctl_cached, test_ctl_val);
+
+		split_lock_update_msr();
 	}
 }
 
@@ -1012,9 +1021,13 @@ static const struct cpu_dev intel_cpu_dev = {
 
 cpu_dev_register(intel_cpu_dev);
 
+#undef pr_fmt
+#define pr_fmt(fmt) "x86/split lock detection: " fmt
+
 static void __init split_lock_setup(void)
 {
 	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
+	pr_info("enabled\n");
 }
 
 void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v9 13/17] x86/split_lock: Disable split lock detection by kernel parameter "nosplit_lock_detect"
  2019-06-18 22:41 [PATCH v9 00/17] x86/split_lock: Enable split lock detection Fenghua Yu
                   ` (11 preceding siblings ...)
  2019-06-18 22:41 ` [PATCH v9 12/17] x86/split_lock: Enable split lock detection by default Fenghua Yu
@ 2019-06-18 22:41 ` Fenghua Yu
  2019-06-26 20:34   ` Thomas Gleixner
  2019-06-18 22:41 ` [PATCH v9 14/17] x86/split_lock: Add a debugfs interface to enable/disable split lock detection during run time Fenghua Yu
                   ` (4 subsequent siblings)
  17 siblings, 1 reply; 85+ messages in thread
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

To work around or debug split lock issues, introduce the kernel parameter
"nosplit_lock_detect" to disable the feature at boot time.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 .../admin-guide/kernel-parameters.txt         |  2 ++
 arch/x86/kernel/cpu/intel.c                   | 22 ++++++++++++++++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..bcf578a1bc77 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3086,6 +3086,8 @@
 
 	nosoftlockup	[KNL] Disable the soft-lockup detector.
 
+	nosplit_lock_detect	[X86] Disable split lock detection
+
 	nosync		[HW,M68K] Disables sync negotiation for all devices.
 
 	nowatchdog	[KNL] Disable both lockup detectors, i.e.
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4ccd890a45b0..4a854f051cf4 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -19,6 +19,7 @@
 #include <asm/microcode_intel.h>
 #include <asm/hwcap2.h>
 #include <asm/elf.h>
+#include <asm/cmdline.h>
 
 #ifdef CONFIG_X86_64
 #include <linux/topology.h>
@@ -34,6 +35,8 @@
 DEFINE_PER_CPU(u64, msr_test_ctl_cached);
 EXPORT_PER_CPU_SYMBOL_GPL(msr_test_ctl_cached);
 
+static bool split_lock_detect_enabled;
+
 /*
  * Just in case our CPU detection goes bad, or you have a weird system,
  * allow a way to override the automatic disabling of MPX.
@@ -629,8 +632,13 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 
 static void split_lock_update_msr(void)
 {
-	/* Enable split lock detection */
-	this_cpu_or(msr_test_ctl_cached, MSR_TEST_CTL_SPLIT_LOCK_DETECT);
+	if (split_lock_detect_enabled) {
+		/* Enable split lock detection */
+		this_cpu_or(msr_test_ctl_cached, MSR_TEST_CTL_SPLIT_LOCK_DETECT);
+	} else {
+		/* Disable split lock detection */
+		this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
+	}
 	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
 }
 
@@ -1027,7 +1035,15 @@ cpu_dev_register(intel_cpu_dev);
 static void __init split_lock_setup(void)
 {
 	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
-	pr_info("enabled\n");
+
+	if (cmdline_find_option_bool(boot_command_line,
+				     "nosplit_lock_detect")) {
+		split_lock_detect_enabled = false;
+		pr_info("disabled\n");
+	} else {
+		split_lock_detect_enabled = true;
+		pr_info("enabled\n");
+	}
 }
 
 void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v9 14/17] x86/split_lock: Add a debugfs interface to enable/disable split lock detection during run time
  2019-06-18 22:41 [PATCH v9 00/17] x86/split_lock: Enable split lock detection Fenghua Yu
                   ` (12 preceding siblings ...)
  2019-06-18 22:41 ` [PATCH v9 13/17] x86/split_lock: Disable split lock detection by kernel parameter "nosplit_lock_detect" Fenghua Yu
@ 2019-06-18 22:41 ` Fenghua Yu
  2019-06-26 21:37   ` Thomas Gleixner
  2019-06-18 22:41 ` [PATCH v9 15/17] x86/split_lock: Add documentation for split lock detection interface Fenghua Yu
                   ` (3 subsequent siblings)
  17 siblings, 1 reply; 85+ messages in thread
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

To work around or debug a split lock issue, the administrator may need to
disable or enable split lock detection at run time without rebooting the
system.

Add the interface /sys/kernel/debug/x86/split_lock_detect to allow the
administrator to enable or disable split lock detection and to show the
current split lock detection setting at run time.

Writing [yY1] or [oO][nN] to the file enables split lock detection and
writing [nN0] or [oO][fF] disables split lock detection. Split lock
detection is enabled or disabled on all CPUs.

Reading the file returns current global split lock detection setting:
0: disabled
1: enabled
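
As an example of how an administrator's tooling might drive this interface,
here is a minimal userspace sketch (a hypothetical helper, not part of this
patch; error handling is kept to a minimum):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: enable (1) or disable (0) split lock detection
 * through the debugfs interface described above.
 */
static int set_split_lock_detect(int enable)
{
	const char *path = "/sys/kernel/debug/x86/split_lock_detect";
	const char *val = enable ? "1\n" : "0\n";
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(int argc, char *argv[])
{
	/* "on" enables detection, anything else disables it. */
	int enable = (argc > 1) && !strcmp(argv[1], "on");

	return set_split_lock_detect(enable) ? 1 : 0;
}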

To simplify the code, Ingo suggested using the global atomic
split_lock_debug flag both for the split lock warning in WARN_ONCE() and
for serializing writes to the debugfs interface.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/cpu/intel.c | 121 +++++++++++++++++++++++++++++++++++-
 arch/x86/kernel/traps.c     |   3 +-
 2 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 4a854f051cf4..4005342dfdd0 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -9,6 +9,8 @@
 #include <linux/thread_info.h>
 #include <linux/init.h>
 #include <linux/uaccess.h>
+#include <linux/syscore_ops.h>
+#include <linux/debugfs.h>
 
 #include <asm/cpufeature.h>
 #include <asm/pgtable.h>
@@ -630,8 +632,19 @@ static void init_intel_misc_features(struct cpuinfo_x86 *c)
 	wrmsrl(MSR_MISC_FEATURES_ENABLES, msr);
 }
 
-static void split_lock_update_msr(void)
+static void split_lock_update_msr(void *__unused)
 {
+	unsigned long flags;
+
+	/*
+	 * Need to prevent msr_test_ctl_cached from being changed *and*
+	 * completing its WRMSR between our read and our WRMSR. By turning
+	 * IRQs off here, ensure that no split lock debugfs write happens
+	 * on this CPU and that any concurrent debugfs write from a different
+	 * CPU will not finish updating us via IPI until we're done.
+	 */
+	local_irq_save(flags);
+
 	if (split_lock_detect_enabled) {
 		/* Enable split lock detection */
 		this_cpu_or(msr_test_ctl_cached, MSR_TEST_CTL_SPLIT_LOCK_DETECT);
@@ -640,6 +653,8 @@ static void split_lock_update_msr(void)
 		this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
 	}
 	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
+
+	local_irq_restore(flags);
 }
 
 static void split_lock_init(struct cpuinfo_x86 *c)
@@ -651,7 +666,7 @@ static void split_lock_init(struct cpuinfo_x86 *c)
 		rdmsrl(MSR_TEST_CTL, test_ctl_val);
 		this_cpu_write(msr_test_ctl_cached, test_ctl_val);
 
-		split_lock_update_msr();
+		split_lock_update_msr(NULL);
 	}
 }
 
@@ -1077,10 +1092,23 @@ static atomic_t split_lock_debug;
 
 void split_lock_disable(void)
 {
+	unsigned long flags;
+
+	/*
+	 * Need to prevent msr_test_ctl_cached from being changed *and*
+	 * completing its WRMSR between our read and our WRMSR. By turning
+	 * IRQs off here, ensure that no split lock debugfs write happens
+	 * on this CPU and that any concurrent debugfs write from a different
+	 * CPU will not finish updating us via IPI until we're done.
+	 */
+	local_irq_save(flags);
+
 	/* Disable split lock detection on this CPU */
 	this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
 	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
 
+	local_irq_restore(flags);
+
 	/*
 	 * Use the atomic variable split_lock_debug to ensure only the
 	 * first CPU hitting split lock issue prints one single complete
@@ -1094,3 +1122,92 @@ void split_lock_disable(void)
 		atomic_set(&split_lock_debug, 0);
 	}
 }
+
+static ssize_t split_lock_detect_rd(struct file *f, char __user *user_buf,
+				    size_t count, loff_t *ppos)
+{
+	unsigned int len;
+	char buf[8];
+
+	len = sprintf(buf, "%u\n", split_lock_detect_enabled);
+
+	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static ssize_t split_lock_detect_wr(struct file *f, const char __user *user_buf,
+				    size_t count, loff_t *ppos)
+{
+	unsigned int len;
+	char buf[8];
+	bool val;
+
+	len = min(count, sizeof(buf) - 1);
+	if (copy_from_user(buf, user_buf, len))
+		return -EFAULT;
+
+	buf[len] = '\0';
+	if (kstrtobool(buf, &val))
+		return -EINVAL;
+
+	while (atomic_cmpxchg(&split_lock_debug, 1, 0))
+		cpu_relax();
+
+	if (split_lock_detect_enabled == val)
+		goto out_unlock;
+
+	split_lock_detect_enabled = val;
+
+	/* Update the split lock detection setting in MSR on all online CPUs. */
+	on_each_cpu(split_lock_update_msr, NULL, 1);
+
+	if (split_lock_detect_enabled)
+		pr_info("enabled\n");
+	else
+		pr_info("disabled\n");
+
+out_unlock:
+	atomic_set(&split_lock_debug, 0);
+
+	return count;
+}
+
+static const struct file_operations split_lock_detect_fops = {
+	.read = split_lock_detect_rd,
+	.write = split_lock_detect_wr,
+	.llseek = default_llseek,
+};
+
+/*
+ * Before resume from hibernation, the TEST_CTL MSR has been initialized
+ * to its default value in split_lock_init() on the BP. On resume, restore
+ * the MSR on the BP to the previous value, which could have been changed
+ * via debugfs and thus may differ from the default value.
+ *
+ * The MSR on the BP is not supposed to change during suspend, so it is
+ * unnecessary to set it again on resume from suspend. But at this point
+ * we don't know whether we are resuming from suspend or hibernation. To
+ * keep it simple, just set up the MSR on every resume.
+ *
+ * Set up the MSR on APs when they are re-added later.
+ */
+static void split_lock_syscore_resume(void)
+{
+	split_lock_update_msr(NULL);
+}
+
+static struct syscore_ops split_lock_syscore_ops = {
+	.resume = split_lock_syscore_resume,
+};
+
+static int __init split_lock_detect_initcall(void)
+{
+	if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		debugfs_create_file("split_lock_detect", 0600, arch_debugfs_dir,
+				    NULL, &split_lock_detect_fops);
+
+		register_syscore_ops(&split_lock_syscore_ops);
+	}
+
+	return 0;
+}
+late_initcall(split_lock_detect_initcall);
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 38143c028f5a..691e34828bdf 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -324,7 +324,8 @@ dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
 		 * execution context.
 		 *
 		 * Split-lock detection will remain disabled after this,
-		 * until the next reboot.
+		 * until the next reboot or until it is re-enabled by
+		 * debugfs interface /sys/kernel/debug/x86/split_lock_detect.
 		 */
 		split_lock_disable();
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v9 15/17] x86/split_lock: Add documentation for split lock detection interface
  2019-06-18 22:41 [PATCH v9 00/17] x86/split_lock: Enable split lock detection Fenghua Yu
                   ` (13 preceding siblings ...)
  2019-06-18 22:41 ` [PATCH v9 14/17] x86/split_lock: Add a debugfs interface to enable/disable split lock detection during run time Fenghua Yu
@ 2019-06-18 22:41 ` Fenghua Yu
  2019-06-26 21:51   ` Thomas Gleixner
  2019-06-18 22:41 ` [PATCH v9 16/17] x86/split_lock: Reorganize few header files in order to call WARN_ON_ONCE() in atomic bit ops Fenghua Yu
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 85+ messages in thread
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

It is useful for development and debugging to document the new debugfs
interface /sys/kernel/debug/x86/split_lock_detect.

A new debugfs documentation file is created to describe the split lock
detection interface. In the future, more entries may be added to the
documentation to describe other interfaces under the /sys/kernel/debug/x86
directory.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 Documentation/ABI/testing/debugfs-x86 | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
 create mode 100644 Documentation/ABI/testing/debugfs-x86

diff --git a/Documentation/ABI/testing/debugfs-x86 b/Documentation/ABI/testing/debugfs-x86
new file mode 100644
index 000000000000..17a1e9ed6712
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-x86
@@ -0,0 +1,21 @@
+What:		/sys/kernel/debug/x86/split_lock_detect
+Date:		May 2019
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	(RW) Control split lock detection on Intel Tremont and
+		future CPUs
+
+		Reads return split lock detection status:
+			0: disabled
+			1: enabled
+
+		Writes enable or disable split lock detection:
+			A first character of 'N', 'n' or '0', or the string
+			"off", disables the feature.
+			A first character of 'Y', 'y' or '1', or the string
+			"on", enables the feature.
+
+		Please note the interface only shows or controls the global
+		setting. At run time, split lock detection on a CPU may be
+		disabled if a split lock operation in kernel code happens on
+		that CPU. The interface doesn't show or control split lock
+		detection on individual CPUs.
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v9 16/17] x86/split_lock: Reorganize few header files in order to call WARN_ON_ONCE() in atomic bit ops
  2019-06-18 22:41 [PATCH v9 00/17] x86/split_lock: Enable split lock detection Fenghua Yu
                   ` (14 preceding siblings ...)
  2019-06-18 22:41 ` [PATCH v9 15/17] x86/split_lock: Add documentation for split lock detection interface Fenghua Yu
@ 2019-06-18 22:41 ` Fenghua Yu
  2019-06-18 22:41 ` [PATCH v9 17/17] x86/split_lock: Warn on unaligned address in atomic bit operations Fenghua Yu
  2019-09-16 22:39 ` [PATCH 0/3] Fix some 4-byte vs. 8-byte alignment issues Tony Luck
  17 siblings, 0 replies; 85+ messages in thread
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

From: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>

Calling WARN_ON_ONCE() from atomic bit ops xxx_bit() throws a build error
as shown below.

  HOSTLD  scripts/mod/modpost
  CC      kernel/bounds.s
  CALL    scripts/atomic/check-atomics.sh
In file included from ./include/linux/bitops.h:19:0,
                 from ./include/linux/kernel.h:12,
                 from ./include/asm-generic/bug.h:18,
                 from ./arch/x86/include/asm/bug.h:83,
                 from ./include/linux/bug.h:5,
                 from ./include/linux/page-flags.h:10,
                 from kernel/bounds.c:10:
./arch/x86/include/asm/bitops.h: In function 'set_bit':
./arch/x86/include/asm/bitops.h:164:2: error: implicit declaration of
function 'WARN_ON_ONCE' [-Werror=implicit-function-declaration]
  WARN_ON_ONCE(!ALIGNED_TO_UNSIGNED_LONG((unsigned long)addr));
  ^
  UPD     include/generated/timeconst.h
cc1: some warnings being treated as errors
scripts/Makefile.build:112: recipe for target 'kernel/bounds.s' failed
make[1]: *** [kernel/bounds.s] Error 1
make[1]: *** Waiting for unfinished jobs....
Makefile:1095: recipe for target 'prepare0' failed
make: *** [prepare0] Error 2
make: *** Waiting for unfinished jobs....
  INSTALL usr/include/asm/ (62 files)

The compiler complains that "WARN_ON_ONCE()" is undefined, and the common
approach to fix this would be to include the right header file in
<linux/bitops.h>. But including any of <linux/bug.h>, <asm-generic/bug.h>
or <asm/bug.h> in <linux/bitops.h> doesn't help, because those files are
already being included higher up the chain via <linux/page-flags.h> (see
the include chain above), so their include guards prevent them from being
included again from <linux/bitops.h>.

So, we need a different approach to solve this issue. A look at
<linux/kernel.h> revealed that it doesn't use any macros, functions or
data types introduced by <linux/bitops.h>. Hence, don't include
<linux/bitops.h> in <linux/kernel.h>. This fixes the issue because
<linux/kernel.h> no longer references "WARN_ON_ONCE()". Not including
<linux/bitops.h> in <linux/kernel.h> lets the build progress further, but
it then breaks at a different point. Applying the same technique reveals
that <linux/kernel.h> also doesn't need <linux/log2.h> and <asm/div64.h>.
Hence, don't include them either.

Since <linux/kernel.h> no longer includes <linux/bitops.h>, <linux/log2.h>
and <asm/div64.h>, the build now breaks at yet another point: some files
in the tree include <linux/kernel.h> but refer to macros, functions or
data types introduced by one of <linux/bitops.h>, <linux/log2.h> or
<asm/div64.h>. Hence, fix them up appropriately by including the right
header file directly.

Note: This patch has been tested with "make allyesconfig" for x86_64 and
has been through the 0-day Continuous Integration (CI) system (thanks for
reporting build issues with IA64, Microblaze, MIPS, m68k, xtensa and
powerpc). But please be aware that, since <linux/kernel.h> has been
modified, it could lead to build failures for some untested configs on
architectures other than x86_64. The fix should be simple, i.e. include
the right header file (please refer to this patch for examples of such fixes).

Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Build-tested-by: kbuild test robot <lkp@intel.com>
---

Questions to the community:
---------------------------
1. Although this patch came out of the work to add WARN_ON_ONCE() to the
xxx_bit() ops functions, it solves an underlying issue, i.e. the inclusion of
unnecessary header files in linux/kernel.h. If we consider that to be the
problem, can we view this patch as stand-alone (i.e. decouple it entirely
from split lock)?

2. This patch causes a lot of header thrash and has already broken the build
for various architectures (all the outstanding issues have been addressed).
Do you think the header thrash is worth it?

3. If this patch makes sense, does it also make sense to get this into Andrew
Morton's tree first so that it's well tested before it's considered for Linus's
tree?

Note:
-----
On architectures other than x86, this patch has been tested only for build
issues, not for boot issues (i.e. I haven't booted this kernel on other
architectures because I don't have access to them, and I *assumed* there
shouldn't be any boot problems given the nature of the patch).

 arch/microblaze/kernel/cpu/pvr.c                     | 1 +
 arch/mips/ralink/mt7620.c                            | 1 +
 arch/powerpc/include/asm/cmpxchg.h                   | 1 +
 arch/xtensa/include/asm/traps.h                      | 1 +
 drivers/media/dvb-frontends/cxd2880/cxd2880_common.c | 2 ++
 drivers/net/ethernet/freescale/fman/fman_muram.c     | 1 +
 drivers/soc/renesas/rcar-sysc.h                      | 2 +-
 drivers/staging/fwserial/dma_fifo.c                  | 1 +
 include/linux/assoc_array_priv.h                     | 1 +
 include/linux/ata.h                                  | 1 +
 include/linux/gpio/consumer.h                        | 1 +
 include/linux/iommu-helper.h                         | 1 +
 include/linux/kernel.h                               | 4 ----
 include/linux/sched.h                                | 1 +
 kernel/bpf/tnum.c                                    | 1 +
 lib/clz_ctz.c                                        | 1 +
 lib/errseq.c                                         | 1 +
 lib/flex_proportions.c                               | 1 +
 lib/hexdump.c                                        | 1 +
 lib/lz4/lz4defs.h                                    | 1 +
 lib/math/div64.c                                     | 1 +
 lib/math/gcd.c                                       | 1 +
 lib/math/reciprocal_div.c                            | 1 +
 lib/siphash.c                                        | 1 +
 net/netfilter/nf_conntrack_h323_asn1.c               | 1 +
 25 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/microblaze/kernel/cpu/pvr.c b/arch/microblaze/kernel/cpu/pvr.c
index 8d0dc6db48cf..f139052a39bd 100644
--- a/arch/microblaze/kernel/cpu/pvr.c
+++ b/arch/microblaze/kernel/cpu/pvr.c
@@ -14,6 +14,7 @@
 #include <linux/compiler.h>
 #include <asm/exceptions.h>
 #include <asm/pvr.h>
+#include <linux/irqflags.h>
 
 /*
  * Until we get an assembler that knows about the pvr registers,
diff --git a/arch/mips/ralink/mt7620.c b/arch/mips/ralink/mt7620.c
index c1ce6f43642b..89079885e4bc 100644
--- a/arch/mips/ralink/mt7620.c
+++ b/arch/mips/ralink/mt7620.c
@@ -18,6 +18,7 @@
 #include <asm/mach-ralink/ralink_regs.h>
 #include <asm/mach-ralink/mt7620.h>
 #include <asm/mach-ralink/pinmux.h>
+#include <asm/div64.h>
 
 #include "common.h"
 
diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
index 27183871eb3b..8727e2b9378b 100644
--- a/arch/powerpc/include/asm/cmpxchg.h
+++ b/arch/powerpc/include/asm/cmpxchg.h
@@ -7,6 +7,7 @@
 #include <asm/synch.h>
 #include <linux/bug.h>
 #include <asm/asm-405.h>
+#include <linux/bits.h>
 
 #ifdef __BIG_ENDIAN
 #define BITOFF_CAL(size, off)	((sizeof(u32) - size - off) * BITS_PER_BYTE)
diff --git a/arch/xtensa/include/asm/traps.h b/arch/xtensa/include/asm/traps.h
index f720a57d0a5b..8ae962f5352b 100644
--- a/arch/xtensa/include/asm/traps.h
+++ b/arch/xtensa/include/asm/traps.h
@@ -11,6 +11,7 @@
 #define _XTENSA_TRAPS_H
 
 #include <asm/ptrace.h>
+#include <asm/regs.h>
 
 /*
  * Per-CPU exception handling data structure.
diff --git a/drivers/media/dvb-frontends/cxd2880/cxd2880_common.c b/drivers/media/dvb-frontends/cxd2880/cxd2880_common.c
index d6f5af6609c1..898e211365fb 100644
--- a/drivers/media/dvb-frontends/cxd2880/cxd2880_common.c
+++ b/drivers/media/dvb-frontends/cxd2880/cxd2880_common.c
@@ -7,6 +7,8 @@
  * Copyright (C) 2016, 2017, 2018 Sony Semiconductor Solutions Corporation
  */
 
+#include <linux/bits.h>
+
 #include "cxd2880_common.h"
 
 int cxd2880_convert2s_complement(u32 value, u32 bitlen)
diff --git a/drivers/net/ethernet/freescale/fman/fman_muram.c b/drivers/net/ethernet/freescale/fman/fman_muram.c
index 5ec94d243da0..28edee4779aa 100644
--- a/drivers/net/ethernet/freescale/fman/fman_muram.c
+++ b/drivers/net/ethernet/freescale/fman/fman_muram.c
@@ -35,6 +35,7 @@
 #include <linux/io.h>
 #include <linux/slab.h>
 #include <linux/genalloc.h>
+#include <linux/log2.h>
 
 struct muram_info {
 	struct gen_pool *pool;
diff --git a/drivers/soc/renesas/rcar-sysc.h b/drivers/soc/renesas/rcar-sysc.h
index 485520a5b295..7595b731a6a2 100644
--- a/drivers/soc/renesas/rcar-sysc.h
+++ b/drivers/soc/renesas/rcar-sysc.h
@@ -8,7 +8,7 @@
 #define __SOC_RENESAS_RCAR_SYSC_H__
 
 #include <linux/types.h>
-
+#include <linux/bitops.h>
 
 /*
  * Power Domain flags
diff --git a/drivers/staging/fwserial/dma_fifo.c b/drivers/staging/fwserial/dma_fifo.c
index 5dcbab6fd622..d06b72594658 100644
--- a/drivers/staging/fwserial/dma_fifo.c
+++ b/drivers/staging/fwserial/dma_fifo.c
@@ -9,6 +9,7 @@
 #include <linux/slab.h>
 #include <linux/list.h>
 #include <linux/bug.h>
+#include <linux/log2.h>
 
 #include "dma_fifo.h"
 
diff --git a/include/linux/assoc_array_priv.h b/include/linux/assoc_array_priv.h
index dca733ef6750..9b4b3e666b74 100644
--- a/include/linux/assoc_array_priv.h
+++ b/include/linux/assoc_array_priv.h
@@ -13,6 +13,7 @@
 #ifdef CONFIG_ASSOCIATIVE_ARRAY
 
 #include <linux/assoc_array.h>
+#include <linux/log2.h>
 
 #define ASSOC_ARRAY_FAN_OUT		16	/* Number of slots per node */
 #define ASSOC_ARRAY_FAN_MASK		(ASSOC_ARRAY_FAN_OUT - 1)
diff --git a/include/linux/ata.h b/include/linux/ata.h
index 6e67aded28f8..506f8d4487c5 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -17,6 +17,7 @@
 #include <linux/string.h>
 #include <linux/types.h>
 #include <asm/byteorder.h>
+#include <linux/bitops.h>
 
 /* defines only for the constants which don't work well as enums */
 #define ATA_DMA_BOUNDARY	0xffffUL
diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h
index 9ddcf50a3c59..099048c0edb4 100644
--- a/include/linux/gpio/consumer.h
+++ b/include/linux/gpio/consumer.h
@@ -5,6 +5,7 @@
 #include <linux/bug.h>
 #include <linux/err.h>
 #include <linux/kernel.h>
+#include <linux/bits.h>
 
 struct device;
 
diff --git a/include/linux/iommu-helper.h b/include/linux/iommu-helper.h
index 70d01edcbf8b..20b706abadc7 100644
--- a/include/linux/iommu-helper.h
+++ b/include/linux/iommu-helper.h
@@ -4,6 +4,7 @@
 
 #include <linux/bug.h>
 #include <linux/kernel.h>
+#include <linux/log2.h>
 
 static inline unsigned long iommu_device_max_index(unsigned long size,
 						   unsigned long offset,
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 74b1ee9027f5..117093d268d3 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -9,15 +9,11 @@
 #include <linux/stddef.h>
 #include <linux/types.h>
 #include <linux/compiler.h>
-#include <linux/bitops.h>
-#include <linux/log2.h>
 #include <linux/typecheck.h>
 #include <linux/printk.h>
 #include <linux/build_bug.h>
 #include <asm/byteorder.h>
-#include <asm/div64.h>
 #include <uapi/linux/kernel.h>
-#include <asm/div64.h>
 
 #define STACK_MAGIC	0xdeadbeef
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 11837410690f..8eec4404b4a2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -29,6 +29,7 @@
 #include <linux/mm_types_task.h>
 #include <linux/task_io_accounting.h>
 #include <linux/rseq.h>
+#include <linux/log2.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c
index ca52b9642943..2f28ef5a2929 100644
--- a/kernel/bpf/tnum.c
+++ b/kernel/bpf/tnum.c
@@ -8,6 +8,7 @@
  */
 #include <linux/kernel.h>
 #include <linux/tnum.h>
+#include <linux/bitops.h>
 
 #define TNUM(_v, _m)	(struct tnum){.value = _v, .mask = _m}
 /* A completely unknown value */
diff --git a/lib/clz_ctz.c b/lib/clz_ctz.c
index 2e11e48446ab..8e807c60a69a 100644
--- a/lib/clz_ctz.c
+++ b/lib/clz_ctz.c
@@ -15,6 +15,7 @@
 
 #include <linux/export.h>
 #include <linux/kernel.h>
+#include <linux/bitops.h>
 
 int __weak __ctzsi2(int val);
 int __weak __ctzsi2(int val)
diff --git a/lib/errseq.c b/lib/errseq.c
index 81f9e33aa7e7..93e9b94358dc 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -3,6 +3,7 @@
 #include <linux/bug.h>
 #include <linux/atomic.h>
 #include <linux/errseq.h>
+#include <linux/log2.h>
 
 /*
  * An errseq_t is a way of recording errors in one place, and allowing any
diff --git a/lib/flex_proportions.c b/lib/flex_proportions.c
index 7852bfff50b1..13be57ccd54c 100644
--- a/lib/flex_proportions.c
+++ b/lib/flex_proportions.c
@@ -34,6 +34,7 @@
  * which something happened with proportion of type j.
  */
 #include <linux/flex_proportions.h>
+#include <linux/log2.h>
 
 int fprop_global_init(struct fprop_global *p, gfp_t gfp)
 {
diff --git a/lib/hexdump.c b/lib/hexdump.c
index 81b70ed37209..926c7597920c 100644
--- a/lib/hexdump.c
+++ b/lib/hexdump.c
@@ -13,6 +13,7 @@
 #include <linux/kernel.h>
 #include <linux/export.h>
 #include <asm/unaligned.h>
+#include <linux/log2.h>
 
 const char hex_asc[] = "0123456789abcdef";
 EXPORT_SYMBOL(hex_asc);
diff --git a/lib/lz4/lz4defs.h b/lib/lz4/lz4defs.h
index 1a7fa9d9170f..9de1a56a462b 100644
--- a/lib/lz4/lz4defs.h
+++ b/lib/lz4/lz4defs.h
@@ -37,6 +37,7 @@
 
 #include <asm/unaligned.h>
 #include <linux/string.h>	 /* memset, memcpy */
+#include <linux/bitops.h>
 
 #define FORCE_INLINE __always_inline
 
diff --git a/lib/math/div64.c b/lib/math/div64.c
index 368ca7fd0d82..6e7673033fe7 100644
--- a/lib/math/div64.c
+++ b/lib/math/div64.c
@@ -21,6 +21,7 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/math64.h>
+#include <linux/bitops.h>
 
 /* Not needed on 64bit architectures */
 #if BITS_PER_LONG == 32
diff --git a/lib/math/gcd.c b/lib/math/gcd.c
index e3b042214d1b..6be5fd7d199d 100644
--- a/lib/math/gcd.c
+++ b/lib/math/gcd.c
@@ -2,6 +2,7 @@
 #include <linux/kernel.h>
 #include <linux/gcd.h>
 #include <linux/export.h>
+#include <linux/bitops.h>
 
 /*
  * This implements the binary GCD algorithm. (Often attributed to Stein,
diff --git a/lib/math/reciprocal_div.c b/lib/math/reciprocal_div.c
index bf043258fa00..f438baa3dbc3 100644
--- a/lib/math/reciprocal_div.c
+++ b/lib/math/reciprocal_div.c
@@ -4,6 +4,7 @@
 #include <asm/div64.h>
 #include <linux/reciprocal_div.h>
 #include <linux/export.h>
+#include <linux/bitops.h>
 
 /*
  * For a description of the algorithm please have a look at
diff --git a/lib/siphash.c b/lib/siphash.c
index c47bb6ff2149..16677dce91de 100644
--- a/lib/siphash.c
+++ b/lib/siphash.c
@@ -12,6 +12,7 @@
 
 #include <linux/siphash.h>
 #include <asm/unaligned.h>
+#include <linux/bitops.h>
 
 #if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
 #include <linux/dcache.h>
diff --git a/net/netfilter/nf_conntrack_h323_asn1.c b/net/netfilter/nf_conntrack_h323_asn1.c
index 4c2ef42e189c..5c9d2291f140 100644
--- a/net/netfilter/nf_conntrack_h323_asn1.c
+++ b/net/netfilter/nf_conntrack_h323_asn1.c
@@ -16,6 +16,7 @@
 #include <stdio.h>
 #endif
 #include <linux/netfilter/nf_conntrack_h323_asn1.h>
+#include <linux/bits.h>
 
 /* Trace Flag */
 #ifndef H323_TRACE
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v9 17/17] x86/split_lock: Warn on unaligned address in atomic bit operations
  2019-06-18 22:41 [PATCH v9 00/17] x86/split_lock: Enable split lock detection Fenghua Yu
                   ` (15 preceding siblings ...)
  2019-06-18 22:41 ` [PATCH v9 16/17] x86/split_lock: Reorganize few header files in order to call WARN_ON_ONCE() in atomic bit ops Fenghua Yu
@ 2019-06-18 22:41 ` Fenghua Yu
  2019-06-26 22:00   ` Thomas Gleixner
  2019-09-16 22:39 ` [PATCH 0/3] Fix some 4-byte vs. 8-byte alignment issues Tony Luck
  17 siblings, 1 reply; 85+ messages in thread
From: Fenghua Yu @ 2019-06-18 22:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm, Fenghua Yu

An atomic bit operation operates on one bit within a single unsigned long
location in a bitmap. In 64-bit mode, that location is at:
base address of the bitmap + (bit offset in the bitmap / 64) * 8

If the base address is not aligned to unsigned long, each unsigned long
location operated on by the atomic operation will also be misaligned, and
a split lock will occur whenever such a location crosses two cache lines.

So checking the alignment of the base address can proactively audit
potential split lock issues in the atomic bit operations. Whether a real
split lock actually happens depends on the bit offset.
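
To illustrate the kind of code the warning is meant to catch, here is a
minimal sketch (the struct and helper are made up for this example and are
not from the patch):

/* Illustrative only: a __u32 member used as a bitmap is only guaranteed
 * 4-byte alignment, so the cast below can hand set_bit() a pointer that
 * is not aligned to unsigned long and would trip the WARN_ON_ONCE().
 */
struct example {
	__u32	pad;		/* 4 bytes */
	__u32	bitmap[4];	/* starts at offset 4, not 8 */
};

static void example_set(struct example *e, long nr)
{
	/* On 64-bit, &e->bitmap may not be 8-byte aligned. */
	set_bit(nr, (unsigned long *)e->bitmap);
}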

After analyzing the warning information, a kernel developer can fix the
potential split lock issue by aligning the base address to unsigned long
instead of waiting for a real split lock to happen.

After applying this patch on 5.2-rc1, vmlinux size is increased by 0.2%
and bzImage size is increased by 0.3% with allyesconfig.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---

FYI: after applying this patch, I haven't noticed any warnings generated
by it during boot and limited run time tests on a few platforms.

 arch/x86/include/asm/bitops.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 8e790ec219a5..44d7a353d6fd 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -14,6 +14,7 @@
 #endif
 
 #include <linux/compiler.h>
+#include <linux/bug.h>
 #include <asm/alternative.h>
 #include <asm/rmwcc.h>
 #include <asm/barrier.h>
@@ -67,6 +68,8 @@
 static __always_inline void
 set_bit(long nr, volatile unsigned long *addr)
 {
+	WARN_ON_ONCE(!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)));
+
 	if (IS_IMMEDIATE(nr)) {
 		asm volatile(LOCK_PREFIX "orb %1,%0"
 			: CONST_MASK_ADDR(nr, addr)
@@ -105,6 +108,8 @@ static __always_inline void __set_bit(long nr, volatile unsigned long *addr)
 static __always_inline void
 clear_bit(long nr, volatile unsigned long *addr)
 {
+	WARN_ON_ONCE(!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)));
+
 	if (IS_IMMEDIATE(nr)) {
 		asm volatile(LOCK_PREFIX "andb %1,%0"
 			: CONST_MASK_ADDR(nr, addr)
@@ -137,6 +142,9 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr)
 static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr)
 {
 	bool negative;
+
+	WARN_ON_ONCE(!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)));
+
 	asm volatile(LOCK_PREFIX "andb %2,%1"
 		CC_SET(s)
 		: CC_OUT(s) (negative), WBYTE_ADDR(addr)
@@ -186,6 +194,8 @@ static __always_inline void __change_bit(long nr, volatile unsigned long *addr)
  */
 static __always_inline void change_bit(long nr, volatile unsigned long *addr)
 {
+	WARN_ON_ONCE(!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)));
+
 	if (IS_IMMEDIATE(nr)) {
 		asm volatile(LOCK_PREFIX "xorb %1,%0"
 			: CONST_MASK_ADDR(nr, addr)
@@ -206,6 +216,8 @@ static __always_inline void change_bit(long nr, volatile unsigned long *addr)
  */
 static __always_inline bool test_and_set_bit(long nr, volatile unsigned long *addr)
 {
+	WARN_ON_ONCE(!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)));
+
 	return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(bts), *addr, c, "Ir", nr);
 }
 
@@ -252,6 +264,8 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long *
  */
 static __always_inline bool test_and_clear_bit(long nr, volatile unsigned long *addr)
 {
+	WARN_ON_ONCE(!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)));
+
 	return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btr), *addr, c, "Ir", nr);
 }
 
@@ -305,6 +319,8 @@ static __always_inline bool __test_and_change_bit(long nr, volatile unsigned lon
  */
 static __always_inline bool test_and_change_bit(long nr, volatile unsigned long *addr)
 {
+	WARN_ON_ONCE(!IS_ALIGNED((unsigned long)addr, sizeof(unsigned long)));
+
 	return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btc), *addr, c, "Ir", nr);
 }
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* RE: [PATCH v9 02/17] drivers/net/b44: Align pwol_mask to unsigned long for better performance
  2019-06-18 22:41 ` [PATCH v9 02/17] drivers/net/b44: Align pwol_mask to unsigned long for better performance Fenghua Yu
@ 2019-06-24 15:12   ` David Laight
  2019-06-24 18:43     ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: David Laight @ 2019-06-24 15:12 UTC (permalink / raw)
  To: 'Fenghua Yu',
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm

From: Fenghua Yu
> Sent: 18 June 2019 23:41
> From: Peter Zijlstra <peterz@infradead.org>
> 
> A bit in pwol_mask is set in b44_magic_pattern() by atomic set_bit().
> But since pwol_mask is local and never exposed to concurrency, there is
> no need to set bit in pwol_mask atomically.
> 
> set_bit() sets the bit in a single unsigned long location. Because
> pwol_mask may not be aligned to unsigned long, the location may cross two
> cache lines. On x86, accessing two cache lines in locked instruction in
> set_bit() is called split locked access and can cause overall performance
> degradation.
> 
> So use non atomic __set_bit() to set pwol_mask bits. __set_bit() won't hit
> split lock issue on x86.
> 
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  drivers/net/ethernet/broadcom/b44.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
> index 97ab0dd25552..5738ab963dfb 100644
> --- a/drivers/net/ethernet/broadcom/b44.c
> +++ b/drivers/net/ethernet/broadcom/b44.c
> @@ -1520,7 +1520,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
> 
>  	memset(ppattern + offset, 0xff, magicsync);
>  	for (j = 0; j < magicsync; j++)
> -		set_bit(len++, (unsigned long *) pmask);
> +		__set_bit(len++, (unsigned long *)pmask);
> 
>  	for (j = 0; j < B44_MAX_PATTERNS; j++) {
>  		if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
> @@ -1532,7 +1532,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
>  		for (k = 0; k< ethaddr_bytes; k++) {
>  			ppattern[offset + magicsync +
>  				(j * ETH_ALEN) + k] = macaddr[k];
> -			set_bit(len++, (unsigned long *) pmask);
> +			__set_bit(len++, (unsigned long *)pmask);

Is this code expected to do anything sensible on BE systems?
Casting the bitmask[] argument to any of the set_bit() functions is dubious at best.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [PATCH v9 03/17] x86/split_lock: Align x86_capability to unsigned long to avoid split locked access
  2019-06-18 22:41 ` [PATCH v9 03/17] x86/split_lock: Align x86_capability to unsigned long to avoid split locked access Fenghua Yu
@ 2019-06-24 15:12   ` David Laight
  2019-06-25 23:54     ` Fenghua Yu
  0 siblings, 1 reply; 85+ messages in thread
From: David Laight @ 2019-06-24 15:12 UTC (permalink / raw)
  To: 'Fenghua Yu',
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li ,
	Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm

From: Fenghua Yu
> Sent: 18 June 2019 23:41
> 
> set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> operate on bitmap defined in x86_capability.
> 
> Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> the location is at:
> base address of x86_capability + (bit offset in x86_capability / 64) * 8
> 
> Since base address of x86_capability may not be aligned to unsigned long,
> the single unsigned long location may cross two cache lines and
> accessing the location by locked BTS/BTR introductions will cause
> split lock.
> 
> To fix the split lock issue, align x86_capability to size of unsigned long
> so that the location will be always within one cache line.
> 
> Changing x86_capability's type to unsigned long may also fix the issue
> because x86_capability will be naturally aligned to size of unsigned long.
> But this needs additional code changes. So choose the simpler solution
> by setting the array's alignment to size of unsigned long.

As I've pointed out several times before this isn't the only int[] data item
in this code that gets passed to the bit operations.
Just because you haven't got a 'splat' from the others doesn't mean they don't
need fixing at the same time.

> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  arch/x86/include/asm/processor.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index c34a35c78618..d3e017723634 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
>  	__u32			extended_cpuid_level;
>  	/* Maximum supported CPUID level, -1=no CPUID: */
>  	int			cpuid_level;
> -	__u32			x86_capability[NCAPINTS + NBUGINTS];
> +	/* Aligned to size of unsigned long to avoid split lock in atomic ops */

Wrong comment.
Something like:
	/* Align to sizeof (unsigned long) because the array is passed to the
	 * atomic bit-op functions which require an aligned unsigned long []. */

> +	__u32			x86_capability[NCAPINTS + NBUGINTS]
> +				__aligned(sizeof(unsigned long));

It might be better to use a union (maybe unnamed) here.

>  	char			x86_vendor_id[16];
>  	char			x86_model_id[64];
>  	/* in KB - valid for CPUS which support this call: */
> --
> 2.19.1

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 02/17] drivers/net/b44: Align pwol_mask to unsigned long for better performance
  2019-06-24 15:12   ` David Laight
@ 2019-06-24 18:43     ` Paolo Bonzini
  0 siblings, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2019-06-24 18:43 UTC (permalink / raw)
  To: David Laight, 'Fenghua Yu',
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Radim Krcmar,
	Christopherson Sean J, Ashok Raj, Tony Luck, Dan Williams,
	Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar
  Cc: linux-kernel, x86, kvm

On 24/06/19 17:12, David Laight wrote:
> From: Fenghua Yu
>> Sent: 18 June 2019 23:41
>> From: Peter Zijlstra <peterz@infradead.org>
>>
>> A bit in pwol_mask is set in b44_magic_pattern() by atomic set_bit().
>> But since pwol_mask is local and never exposed to concurrency, there is
>> no need to set bit in pwol_mask atomically.
>>
>> set_bit() sets the bit in a single unsigned long location. Because
>> pwol_mask may not be aligned to unsigned long, the location may cross two
>> cache lines. On x86, accessing two cache lines in locked instruction in
>> set_bit() is called split locked access and can cause overall performance
>> degradation.
>>
>> So use non atomic __set_bit() to set pwol_mask bits. __set_bit() won't hit
>> split lock issue on x86.
>>
>> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
>> ---
>>  drivers/net/ethernet/broadcom/b44.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
>> index 97ab0dd25552..5738ab963dfb 100644
>> --- a/drivers/net/ethernet/broadcom/b44.c
>> +++ b/drivers/net/ethernet/broadcom/b44.c
>> @@ -1520,7 +1520,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
>>
>>  	memset(ppattern + offset, 0xff, magicsync);
>>  	for (j = 0; j < magicsync; j++)
>> -		set_bit(len++, (unsigned long *) pmask);
>> +		__set_bit(len++, (unsigned long *)pmask);
>>
>>  	for (j = 0; j < B44_MAX_PATTERNS; j++) {
>>  		if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
>> @@ -1532,7 +1532,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
>>  		for (k = 0; k< ethaddr_bytes; k++) {
>>  			ppattern[offset + magicsync +
>>  				(j * ETH_ALEN) + k] = macaddr[k];
>> -			set_bit(len++, (unsigned long *) pmask);
>> +			__set_bit(len++, (unsigned long *)pmask);
> 
> Is this code expected to do anything sensible on BE systems?

Probably not, but it's not wrong in different ways before/after the patch.

Paolo

> Casting the bitmask[] argument to any of the set_bit() functions is dubious at best.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 03/17] x86/split_lock: Align x86_capability to unsigned long to avoid split locked access
  2019-06-24 15:12   ` David Laight
@ 2019-06-25 23:54     ` Fenghua Yu
  2019-06-26 19:15       ` Thomas Gleixner
  0 siblings, 1 reply; 85+ messages in thread
From: Fenghua Yu @ 2019-06-25 23:54 UTC (permalink / raw)
  To: David Laight
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar,
	linux-kernel, x86, kvm

On Mon, Jun 24, 2019 at 03:12:49PM +0000, David Laight wrote:
> From: Fenghua Yu
> > Sent: 18 June 2019 23:41
> > 
> > set_cpu_cap() calls locked BTS and clear_cpu_cap() calls locked BTR to
> > operate on bitmap defined in x86_capability.
> > 
> > Locked BTS/BTR accesses a single unsigned long location. In 64-bit mode,
> > the location is at:
> > base address of x86_capability + (bit offset in x86_capability / 64) * 8
> > 
> > Since base address of x86_capability may not be aligned to unsigned long,
> > the single unsigned long location may cross two cache lines and
> > accessing the location by locked BTS/BTR introductions will cause
> > split lock.
> > 
> > To fix the split lock issue, align x86_capability to size of unsigned long
> > so that the location will be always within one cache line.
> > 
> > Changing x86_capability's type to unsigned long may also fix the issue
> > because x86_capability will be naturally aligned to size of unsigned long.
> > But this needs additional code changes. So choose the simpler solution
> > by setting the array's alignment to size of unsigned long.
> 
> As I've pointed out several times before this isn't the only int[] data item
> in this code that gets passed to the bit operations.
> Just because you haven't got a 'splat' from the others doesn't mean they don't
> need fixing at the same time.

As Thomas suggested in https://lkml.org/lkml/2019/4/25/353, patch #0017
in this patch set implements WARN_ON_ONCE() to audit possible unalignment
in atomic bit ops.

This patch set just enables split lock detection first. Fixing ALL split
lock issues might be practical after the patch is upstreamed and used widely.

> 
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > ---
> >  arch/x86/include/asm/processor.h | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index c34a35c78618..d3e017723634 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
> >  	__u32			extended_cpuid_level;
> >  	/* Maximum supported CPUID level, -1=no CPUID: */
> >  	int			cpuid_level;
> > -	__u32			x86_capability[NCAPINTS + NBUGINTS];
> > +	/* Aligned to size of unsigned long to avoid split lock in atomic ops */
> 
> Wrong comment.
> Something like:
> 	/* Align to sizeof (unsigned long) because the array is passed to the
> 	 * atomic bit-op functions which require an aligned unsigned long []. */

The problem we are trying to fix here is not that "the array is passed to
the atomic bit-op functions which require an aligned unsigned long []".

The problem is the possible split lock issue. If it were not for the split
lock issue, there would be no need for this patch.

So I would think my comment is right to point out explicitly why we need
this alignment.

> 
> > +	__u32			x86_capability[NCAPINTS + NBUGINTS]
> > +				__aligned(sizeof(unsigned long));
> 
> It might be better to use a union (maybe unnamed) here.

That would be another patch. This patch just simply fixes the split lock
issue.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 03/17] x86/split_lock: Align x86_capability to unsigned long to avoid split locked access
  2019-06-25 23:54     ` Fenghua Yu
@ 2019-06-26 19:15       ` Thomas Gleixner
  0 siblings, 0 replies; 85+ messages in thread
From: Thomas Gleixner @ 2019-06-26 19:15 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: David Laight, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar,
	linux-kernel, x86, kvm

On Tue, 25 Jun 2019, Fenghua Yu wrote:
> On Mon, Jun 24, 2019 at 03:12:49PM +0000, David Laight wrote:
> > > @@ -93,7 +93,9 @@ struct cpuinfo_x86 {
> > >  	__u32			extended_cpuid_level;
> > >  	/* Maximum supported CPUID level, -1=no CPUID: */
> > >  	int			cpuid_level;
> > > -	__u32			x86_capability[NCAPINTS + NBUGINTS];
> > > +	/* Aligned to size of unsigned long to avoid split lock in atomic ops */
> > 
> > Wrong comment.
> > Something like:
> > 	/* Align to sizeof (unsigned long) because the array is passed to the
> > 	 * atomic bit-op functions which require an aligned unsigned long []. */
> 
> The problem we try to fix here is not because "the array is passed to the
> atomic bit-op functions which require an aligned unsigned long []".
> 
> The problem is because of the possible split lock issue. If it's not because
> of split lock issue, there is no need to have this patch.
> 
> So I would think my comment is right to point out explicitly why we need
> this alignment.

The underlying problem why you need that alignment is that the invocation
of the bitops does a type cast. And that's independent of split lock. Split
lock makes the problem visible. So the alignment papers over that. And
while this 'works' on x86 it's fundamentally broken on big endian. So no,
your comment is not right to the point because it gives the wrong
information.
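
For illustration, a small userspace sketch of that big-endian breakage,
assuming a 64-bit unsigned long; x86 being little-endian is the only
reason the cast appears to work there:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Mimic what casting a __u32 array to unsigned long * does to "set bit 0". */
int main(void)
{
	uint32_t caps[2] = { 0, 0 };
	unsigned long word;

	memcpy(&word, caps, sizeof(word));	/* view the two u32s as one long */
	word |= 1UL;				/* set bit 0 of that long */
	memcpy(caps, &word, sizeof(word));

	/* Little endian: caps[0] == 1.  Big endian: caps[1] == 1 instead. */
	printf("caps[0]=%u caps[1]=%u\n", (unsigned)caps[0], (unsigned)caps[1]);
	return 0;
}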

> > 
> > > +	__u32			x86_capability[NCAPINTS + NBUGINTS]
> > > +				__aligned(sizeof(unsigned long));
> > 
> > It might be better to use a union (maybe unnamed) here.
> 
> That would be another patch. This patch just simply fixes the split lock
> issue.

Why? That's a straightforward and obvious fix and way better than these
alignment games. It's still wrong for BE....

So anyway, this wants a comment which explains the underlying issue and not
a comment which blurbs about split locks.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-06-18 22:41 ` [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock Fenghua Yu
@ 2019-06-26 20:20   ` Thomas Gleixner
  2019-06-26 20:36     ` Fenghua Yu
  0 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-06-26 20:20 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Peter Zijlstra,
	Andrew Morton, Dave Hansen, Paolo Bonzini, Radim Krcmar,
	Christopherson Sean J, Ashok Raj, Tony Luck, Dan Williams,
	Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, kvm

On Tue, 18 Jun 2019, Fenghua Yu wrote:
> +
> +static atomic_t split_lock_debug;
> +
> +void split_lock_disable(void)
> +{
> +	/* Disable split lock detection on this CPU */
> +	this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> +	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
> +
> +	/*
> +	 * Use the atomic variable split_lock_debug to ensure only the
> +	 * first CPU hitting split lock issue prints one single complete
> +	 * warning. This also solves the race if the split-lock #AC fault
> +	 * is re-triggered by NMI of perf context interrupting one
> +	 * split-lock warning execution while the original WARN_ONCE() is
> +	 * executing.
> +	 */
> +	if (atomic_cmpxchg(&split_lock_debug, 0, 1) == 0) {
> +		WARN_ONCE(1, "split lock operation detected\n");
> +		atomic_set(&split_lock_debug, 0);

What's the purpose of this atomic_set()?

> +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> +{
> +	unsigned int trapnr = X86_TRAP_AC;
> +	char str[] = "alignment check";
> +	int signr = SIGBUS;
> +
> +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> +	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> +		return;
> +
> +	cond_local_irq_enable(regs);
> +	if (!user_mode(regs) && static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
> +		/*
> +		 * Only split locks can generate #AC from kernel mode.
> +		 *
> +		 * The split-lock detection feature is a one-shot
> +		 * debugging facility, so we disable it immediately and
> +		 * print a warning.
> +		 *
> +		 * This also solves the instruction restart problem: we
> +		 * return the faulting instruction right after this it

we return the faulting instruction ... to the store so we get our deposit
back :)

  the fault handler returns to the faulting instruction which will then be
  executed without ....

Don't try to impersonate code, cpus or whatever. It doesn't make sense and
confuses people.

> +		 * will be executed without generating another #AC fault
> +		 * and getting into an infinite loop, instead it will
> +		 * continue without side effects to the interrupted
> +		 * execution context.

That last part 'instead .....' is redundant. It's entirely clear from the
above that the faulting instruction is re-executed ....

Please write concise comments and do not try to repeat the same information
with a different painting.

> +		 *
> +		 * Split-lock detection will remain disabled after this,
> +		 * until the next reboot.
> +		 */
> +		split_lock_disable();
> +
> +		return;
> +	}
> +
> +	/* Handle #AC generated in any other cases. */
> +	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
> +		error_code, BUS_ADRALN, NULL);
> +}
> +
>  #ifdef CONFIG_VMAP_STACK
>  __visible void __noreturn handle_stack_overflow(const char *message,
>  						struct pt_regs *regs,
> -- 
> 2.19.1
> 
> 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 13/17] x86/split_lock: Disable split lock detection by kernel parameter "nosplit_lock_detect"
  2019-06-18 22:41 ` [PATCH v9 13/17] x86/split_lock: Disable split lock detection by kernel parameter "nosplit_lock_detect" Fenghua Yu
@ 2019-06-26 20:34   ` Thomas Gleixner
  2019-06-26 20:37     ` Fenghua Yu
  0 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-06-26 20:34 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Peter Zijlstra,
	Andrew Morton, Dave Hansen, Paolo Bonzini, Radim Krcmar,
	Christopherson Sean J, Ashok Raj, Tony Luck, Dan Williams,
	Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, kvm

On Tue, 18 Jun 2019, Fenghua Yu wrote:
>  
>  static void split_lock_update_msr(void)
>  {
> -	/* Enable split lock detection */
> -	this_cpu_or(msr_test_ctl_cached, MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> +	if (split_lock_detect_enabled) {
> +		/* Enable split lock detection */
> +		this_cpu_or(msr_test_ctl_cached, MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> +	} else {
> +		/* Disable split lock detection */

Could you please comment the non-obvious things and not the obvious ones?

> +		this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);

It's entirely clear that the if (enabled) path enables it or am I missing
something?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-06-26 20:20   ` Thomas Gleixner
@ 2019-06-26 20:36     ` Fenghua Yu
  2019-06-26 21:47       ` Thomas Gleixner
  0 siblings, 1 reply; 85+ messages in thread
From: Fenghua Yu @ 2019-06-26 20:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Peter Zijlstra,
	Andrew Morton, Dave Hansen, Paolo Bonzini, Radim Krcmar,
	Christopherson Sean J, Ashok Raj, Tony Luck, Dan Williams,
	Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, kvm

On Wed, Jun 26, 2019 at 10:20:05PM +0200, Thomas Gleixner wrote:
> On Tue, 18 Jun 2019, Fenghua Yu wrote:
> > +
> > +static atomic_t split_lock_debug;
> > +
> > +void split_lock_disable(void)
> > +{
> > +	/* Disable split lock detection on this CPU */
> > +	this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> > +	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
> > +
> > +	/*
> > +	 * Use the atomic variable split_lock_debug to ensure only the
> > +	 * first CPU hitting split lock issue prints one single complete
> > +	 * warning. This also solves the race if the split-lock #AC fault
> > +	 * is re-triggered by NMI of perf context interrupting one
> > +	 * split-lock warning execution while the original WARN_ONCE() is
> > +	 * executing.
> > +	 */
> > +	if (atomic_cmpxchg(&split_lock_debug, 0, 1) == 0) {
> > +		WARN_ONCE(1, "split lock operation detected\n");
> > +		atomic_set(&split_lock_debug, 0);
> 
> What's the purpose of this atomic_set()?

atomic_set() releases the split_lock_debug flag after WARN_ONCE() is done.
The same split_lock_debug flag will also be used for atomic operation in the
sysfs write path, as proposed by Ingo in https://lkml.org/lkml/2019/4/25/48
So that's why the flag needs to be cleared, right?

> 
> > +dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
> > +{
> > +	unsigned int trapnr = X86_TRAP_AC;
> > +	char str[] = "alignment check";
> > +	int signr = SIGBUS;
> > +
> > +	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> > +
> > +	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
> > +		return;
> > +
> > +	cond_local_irq_enable(regs);
> > +	if (!user_mode(regs) && static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
> > +		/*
> > +		 * Only split locks can generate #AC from kernel mode.
> > +		 *
> > +		 * The split-lock detection feature is a one-shot
> > +		 * debugging facility, so we disable it immediately and
> > +		 * print a warning.
> > +		 *
> > +		 * This also solves the instruction restart problem: we
> > +		 * return the faulting instruction right after this it
> 
> we return the faulting instruction ... to the store so we get our deposit
> back :)
> 
>   the fault handler returns to the faulting instruction which will be then
>   executed without ....
> 
> Don't try to impersonate code, cpus or whatever. It doesn't make sense and
> confuses people.
> 
> > +		 * will be executed without generating another #AC fault
> > +		 * and getting into an infinite loop, instead it will
> > +		 * continue without side effects to the interrupted
> > +		 * execution context.
> 
> That last part 'instead .....' is redundant. It's entirely clear from the
> above that the faulting instruction is reexecuted ....
> 
> Please write concise comments and do try to repeat the same information
> with a different painting.

I copied the comment completely from Ingo's comment on v8:
https://lkml.org/lkml/2019/4/25/40

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 13/17] x86/split_lock: Disable split lock detection by kernel parameter "nosplit_lock_detect"
  2019-06-26 20:34   ` Thomas Gleixner
@ 2019-06-26 20:37     ` Fenghua Yu
  0 siblings, 0 replies; 85+ messages in thread
From: Fenghua Yu @ 2019-06-26 20:37 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Peter Zijlstra,
	Andrew Morton, Dave Hansen, Paolo Bonzini, Radim Krcmar,
	Christopherson Sean J, Ashok Raj, Tony Luck, Dan Williams,
	Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, kvm

On Wed, Jun 26, 2019 at 10:34:52PM +0200, Thomas Gleixner wrote:
> On Tue, 18 Jun 2019, Fenghua Yu wrote:
> >  
> >  static void split_lock_update_msr(void)
> >  {
> > -	/* Enable split lock detection */
> > -	this_cpu_or(msr_test_ctl_cached, MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> > +	if (split_lock_detect_enabled) {
> > +		/* Enable split lock detection */
> > +		this_cpu_or(msr_test_ctl_cached, MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> > +	} else {
> > +		/* Disable split lock detection */
> 
> Could you please comment the non obvious things and not the obvious ones?
> 
> > +		this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> 
> It's entirely clear that the if (enabled) path enables it or am I missing
> something?

Ok. I will remove the comments.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 14/17] x86/split_lock: Add a debugfs interface to enable/disable split lock detection during run time
  2019-06-18 22:41 ` [PATCH v9 14/17] x86/split_lock: Add a debugfs interface to enable/disable split lock detection during run time Fenghua Yu
@ 2019-06-26 21:37   ` Thomas Gleixner
  0 siblings, 0 replies; 85+ messages in thread
From: Thomas Gleixner @ 2019-06-26 21:37 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Peter Zijlstra,
	Andrew Morton, Dave Hansen, Paolo Bonzini, Radim Krcmar,
	Christopherson Sean J, Ashok Raj, Tony Luck, Dan Williams,
	Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, kvm

On Tue, 18 Jun 2019, Fenghua Yu wrote:
> To simplify the code, Ingo suggests to use the global atomic
> split_lock_debug flag both for warning split lock in WARN_ONCE() and for
> writing the debugfs interface.

So how is that flag used for writing the debugfs interface? Did you use it
for writing the code or is it used by the admin to write to the interface?

> -static void split_lock_update_msr(void)
> +static void split_lock_update_msr(void *__unused)
>  {
> +	unsigned long flags;
> +
> +	/*
> +	 * Need to prevent msr_test_ctl_cached from being changed *and*
> +	 * completing its WRMSR between our read and our WRMSR. By turning
> +	 * IRQs off here, ensure that no split lock debugfs write happens
> +	 * on this CPU and that any concurrent debugfs write from a different
> +	 * CPU will not finish updating us via IPI until we're done.
> +	 */

That's the same convoluted comment as in the UMWAIT series, but aside of
that it's completely nonsensical here. This function is either called from
the early cpu init code or via an SMP function call. Both have interrupts
disabled.

> +	local_irq_save(flags);

So this is a pointless exercise.

>  	if (split_lock_detect_enabled) {
>  		/* Enable split lock detection */
>  		this_cpu_or(msr_test_ctl_cached, MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> @@ -640,6 +653,8 @@ static void split_lock_update_msr(void)
>  		this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
>  	}
>  	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
> +
> +	local_irq_restore(flags);
>  }
>  
>  static void split_lock_init(struct cpuinfo_x86 *c)
> @@ -651,7 +666,7 @@ static void split_lock_init(struct cpuinfo_x86 *c)
>  		rdmsrl(MSR_TEST_CTL, test_ctl_val);
>  		this_cpu_write(msr_test_ctl_cached, test_ctl_val);
>  
> -		split_lock_update_msr();
> +		split_lock_update_msr(NULL);
>  	}
>  }
>  
> @@ -1077,10 +1092,23 @@ static atomic_t split_lock_debug;
>  
>  void split_lock_disable(void)
>  {
> +	unsigned long flags;
> +
> +	/*
> +	 * Need to prevent msr_test_ctl_cached from being changed *and*
> +	 * completing its WRMSR between our read and our WRMSR. By turning
> +	 * IRQs off here, ensure that no split lock debugfs write happens
> +	 * on this CPU and that any concurrent debugfs write from a different
> +	 * CPU will not finish updating us via IPI until we're done.
> +	 */

Please check the comment above umwait_cpu_online() in the version I fixed
up and make sure that the comment here makes sense as well. The above does
not make any sense at all here. But before you go there ....

> +	local_irq_save(flags);

Neither does this.

>  	/* Disable split lock detection on this CPU */
>  	this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
>  	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
>  
> +	local_irq_restore(flags);
> +
>  	/*
>  	 * Use the atomic variable split_lock_debug to ensure only the
>  	 * first CPU hitting split lock issue prints one single complete
> @@ -1094,3 +1122,92 @@ void split_lock_disable(void)
>  		atomic_set(&split_lock_debug, 0);
>  	}
>  }

The above is called from the AC handler:

+dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
+{
+	unsigned int trapnr = X86_TRAP_AC;
+	char str[] = "alignment check";
+	int signr = SIGBUS;
+
+	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+	if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr) == NOTIFY_STOP)
+		return;
+
+	cond_local_irq_enable(regs);

So why enable interrupts here at all? Just to disable them right away
when this was a split lock? Just keep them disabled, do the check with
interrupts disabled ...

+	if (!user_mode(regs) && static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
+		/*
+		 * Only split locks can generate #AC from kernel mode.
+		 *
+		 * The split-lock detection feature is a one-shot
+		 * debugging facility, so we disable it immediately and
+		 * print a warning.
+		 *
+		 * This also solves the instruction restart problem: we
+		 * return the faulting instruction right after this it
+		 * will be executed without generating another #AC fault
+		 * and getting into an infinite loop, instead it will
+		 * continue without side effects to the interrupted
+		 * execution context.
+		 *
+		 * Split-lock detection will remain disabled after this,
+		 * until the next reboot.
+		 */
+		split_lock_disable();

	split_lock_disable() wants a lockdep_assert_irqs_disabled() inside.

+
+		return;
+	}

and put the cond_local_irq_enable() here.

+	/* Handle #AC generated in any other cases. */
+	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
+		error_code, BUS_ADRALN, NULL);
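
Putting those pieces together, the requested flow looks roughly like the
sketch below; it reuses the names from the patch and is illustrative, not
the final code:

dotraplinkage void do_alignment_check(struct pt_regs *regs, long error_code)
{
	RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");

	if (notify_die(DIE_TRAP, "alignment check", regs, error_code,
		       X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
		return;

	/* Kernel mode #AC can only be a split lock; handle it with IRQs off. */
	if (!user_mode(regs) && static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
		split_lock_disable();	/* asserts irqs disabled inside */
		return;
	}

	/* Only the user mode path needs interrupts enabled. */
	cond_local_irq_enable(regs);
	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
		error_code, BUS_ADRALN, NULL);
}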

> +
> +static ssize_t split_lock_detect_rd(struct file *f, char __user *user_buf,
> +				    size_t count, loff_t *ppos)
> +{
> +	unsigned int len;
> +	char buf[8];
> +
> +	len = sprintf(buf, "%u\n", split_lock_detect_enabled);

The state is inconsistent when #AC is triggered in the kernel, because nothing
clears that enabled variable even though the lock detection is disabled.

> +	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
> +}
> +
> +static ssize_t split_lock_detect_wr(struct file *f, const char __user *user_buf,
> +				    size_t count, loff_t *ppos)
> +{
> +	unsigned int len;
> +	char buf[8];
> +	bool val;
> +
> +	len = min(count, sizeof(buf) - 1);
> +	if (copy_from_user(buf, user_buf, len))
> +		return -EFAULT;
> +
> +	buf[len] = '\0';
> +	if (kstrtobool(buf, &val))
> +		return -EINVAL;
> +
> +	while (atomic_cmpxchg(&split_lock_debug, 1, 0))
> +		cpu_relax();

I assume that this is the magic thing that Ingo suggested. But that's
completely non-obvious, and of course because it's non-obvious it lacks a
comment as well.

I assume that is considered to be magic serialization of the debugfs
write. Let's have a look.

First caller:

       debug == 1

       For simplicity we assume no concurrency

       while ...
       	     1st round:  atomic_cmpxchg(&debug, 1, 0) -> returns 1

	     so the atomic_cmpxchg() succeeded because it returned oldval
	     and that was 1

	     debug contains 0 now

	     so it has to go through a second round

       	     2nd round:  atomic_cmpxchg(&debug, 1, 0) -> returns 0

	     because debug is 0

That means the debug print thing is now disabled and the first caller proceeds.

Guess what happens if another one comes in. It will only take one round to
proceed because debug is already 0

So both think they are alone. Interesting concept.

> +	if (split_lock_detect_enabled == val)
> +		goto out_unlock;

Now lets go to 'out_unlock' ....

> +
> +	split_lock_detect_enabled = val;
> +
> +	/* Update the split lock detection setting in MSR on all online CPUs. */
> +	on_each_cpu(split_lock_update_msr, NULL, 1);
> +
> +	if (split_lock_detect_enabled)
> +		pr_info("enabled\n");
> +	else
> +		pr_info("disabled\n");

Errm. No. The admin wrote to the file. Why do we need to make noise in
dmesg about that?

> +
> +out_unlock:
> +	atomic_set(&split_lock_debug, 0);

out_unlock writes the variable which is already 0 to 0. I'm failing to see
how that locking works, but clearly once the debugfs file is written to, the
kernel-side debug print is disabled forever.

I have no idea how that is supposed to work, but then I might be missing
the magic logic behind it.

Aside of that. Doing locking with atomic_cmpxchg() spinning for such an
interface is outright crap even if implemented correctly. The first writer
might be preempted after acquiring the 'lock' and then the second one spins
until the first one comes back on the CPU and completes. We have spinlocks
for that, but spinlocks are the wrong tool here. Why not use a good old
mutex and be done with it? Just because this file is root-only does not
justify any of this.
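
A minimal sketch of the mutex variant being asked for, reusing the
identifiers from the patch; not the code that was eventually merged:

static DEFINE_MUTEX(split_lock_detect_mutex);

static ssize_t split_lock_detect_wr(struct file *f, const char __user *user_buf,
				    size_t count, loff_t *ppos)
{
	bool val;
	int ret;

	ret = kstrtobool_from_user(user_buf, count, &val);
	if (ret)
		return ret;

	/* Serialize writers; sleeping here is perfectly fine. */
	mutex_lock(&split_lock_detect_mutex);

	if (split_lock_detect_enabled != val) {
		split_lock_detect_enabled = val;
		/* Propagate the new setting to the MSR on all online CPUs. */
		on_each_cpu(split_lock_update_msr, NULL, 1);
	}

	mutex_unlock(&split_lock_detect_mutex);

	return count;
}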

> +	return count;

> +}
> +
> +static const struct file_operations split_lock_detect_fops = {
> +	.read = split_lock_detect_rd,
> +	.write = split_lock_detect_wr,
> +	.llseek = default_llseek,

Even if I repeat myself. May I ask you more or less politely to use tabular
aligned initializers?

> +};
> +
> +/*
> + * Before resume from hibernation, TEST_CTL MSR has been initialized to
> + * default value in split_lock_init() on BP. On resume, restore the MSR
> + * on BP to previous value which could be changed by debugfs and thus could
> + * be different from the default value.

What has this to do with debugfs? This is about state in the kernel which
starts up after hibernation and the state in the hibernated kernel. Their
state can differ for whatever reason and is just unconditionally restored
to the state it had before hibernation. The whole debugfs blurb is
irrelevant.

> + *
> + * The MSR on BP is supposed not to be changed during suspend and thus it's
> + * unnecessary to set it again during resume from suspend. But at this point
> + * we don't know resume is from suspend or hibernation. To simplify the
> + * situation, just set up the MSR on resume from suspend.
> + *
> + * Set up the MSR on APs when they are re-added later.

See the fixed up UMWAIT comment on this.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-06-26 20:36     ` Fenghua Yu
@ 2019-06-26 21:47       ` Thomas Gleixner
  2019-09-25 18:09         ` Sean Christopherson
  0 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-06-26 21:47 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Peter Zijlstra,
	Andrew Morton, Dave Hansen, Paolo Bonzini, Radim Krcmar,
	Christopherson Sean J, Ashok Raj, Tony Luck, Dan Williams,
	Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, kvm

On Wed, 26 Jun 2019, Fenghua Yu wrote:

> On Wed, Jun 26, 2019 at 10:20:05PM +0200, Thomas Gleixner wrote:
> > On Tue, 18 Jun 2019, Fenghua Yu wrote:
> > > +
> > > +static atomic_t split_lock_debug;
> > > +
> > > +void split_lock_disable(void)
> > > +{
> > > +	/* Disable split lock detection on this CPU */
> > > +	this_cpu_and(msr_test_ctl_cached, ~MSR_TEST_CTL_SPLIT_LOCK_DETECT);
> > > +	wrmsrl(MSR_TEST_CTL, this_cpu_read(msr_test_ctl_cached));
> > > +
> > > +	/*
> > > +	 * Use the atomic variable split_lock_debug to ensure only the
> > > +	 * first CPU hitting split lock issue prints one single complete
> > > +	 * warning. This also solves the race if the split-lock #AC fault
> > > +	 * is re-triggered by NMI of perf context interrupting one
> > > +	 * split-lock warning execution while the original WARN_ONCE() is
> > > +	 * executing.
> > > +	 */
> > > +	if (atomic_cmpxchg(&split_lock_debug, 0, 1) == 0) {
> > > +		WARN_ONCE(1, "split lock operation detected\n");
> > > +		atomic_set(&split_lock_debug, 0);
> > 
> > What's the purpose of this atomic_set()?
> 
> atomic_set() releases the split_lock_debug flag after WARN_ONCE() is done.
> The same split_lock_debug flag will be used in sysfs write for atomic
> operation as well, as proposed by Ingo in https://lkml.org/lkml/2019/4/25/48

Your comment above lacks any useful information about that whole thing.

> So that's why the flag needs to be cleared, right?

Errm. No.

CPU 0					CPU 1
					
hits AC					hits AC
  if (atomic_cmpxchg() == success)	  if (atomic_cmpxchg() == success)
  	warn()	       	  		     warn()

So only one of the CPUs will win the cmpxchg race, set the variable to 1 and
warn; the other, and any subsequent #AC on any other CPU, will not warn
either. So you don't need WARN_ONCE() at all. It's redundant and confusing
along with the atomic_set().
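
In other words, the whole thing reduces to something like this sketch;
whether that is what finally gets merged is a separate question:

	/* The cmpxchg already guarantees at most one CPU ever gets here. */
	if (atomic_cmpxchg(&split_lock_debug, 0, 1) == 0)
		WARN(1, "split lock operation detected\n");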

Without reading that link [1], what Ingo proposed was surely not the
trainwreck which you decided to put into that debugfs thing.

Thanks,

	tglx

[1] lkml.org sucks. We have https://lkml.kernel.org/r/$MESSAGEID for
    that. That actually works.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 15/17] x86/split_lock: Add documentation for split lock detection interface
  2019-06-18 22:41 ` [PATCH v9 15/17] x86/split_lock: Add documentation for split lock detection interface Fenghua Yu
@ 2019-06-26 21:51   ` Thomas Gleixner
  0 siblings, 0 replies; 85+ messages in thread
From: Thomas Gleixner @ 2019-06-26 21:51 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Peter Zijlstra,
	Andrew Morton, Dave Hansen, Paolo Bonzini, Radim Krcmar,
	Christopherson Sean J, Ashok Raj, Tony Luck, Dan Williams,
	Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, kvm

On Tue, 18 Jun 2019, Fenghua Yu wrote:

> It is useful for development and debugging to document the new debugfs
> interface /sys/kernel/debug/x86/split_lock_detect.
> 
> A new debugfs documentation is created to describe the split lock detection
> interface. In the future, more entries may be added in the documentation to
> describe other interfaces under /sys/kernel/debug/x86 directory.
> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  Documentation/ABI/testing/debugfs-x86 | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
>  create mode 100644 Documentation/ABI/testing/debugfs-x86
> 
> diff --git a/Documentation/ABI/testing/debugfs-x86 b/Documentation/ABI/testing/debugfs-x86
> new file mode 100644
> index 000000000000..17a1e9ed6712
> --- /dev/null
> +++ b/Documentation/ABI/testing/debugfs-x86
> @@ -0,0 +1,21 @@
> +What:		/sys/kernel/debugfs/x86/split_lock_detect
> +Date:		May 2019
> +Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
> +Description:	(RW) Control split lock detection on Intel Tremont and
> +		future CPUs
> +
> +		Reads return split lock detection status:
> +			0: disabled
> +			1: enabled
> +
> +		Writes enable or disable split lock detection:
> +			The first character is one of 'Nn0' or [oO][fF] for off
> +			disables the feature.
> +			The first character is one of 'Yy1' or [oO][nN] for on
> +			enables the feature.
> +
> +		Please note the interface only shows or controls global setting.
> +		During run time, split lock detection on one CPU may be
> +		disabled if split lock operation in kernel code happens on
> +		the CPU. The interface doesn't show or control split lock
> +		detection on individual CPU.

But it should show that the debug output for the kernel has been globally
disabled instead of merely stating 'enabled' while it's already disabled on
a bunch of CPUs. I fundamentally despise inconsistent information. Even in
debugfs files inconsistency is a pain because it makes debugging via mail/
bugzilla etc. unnecessarily cumbersome.
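
One possible shape for a consistent read side, purely illustrative; the
force-disabled flag is a made-up name for whatever state the #AC path
would have to record:

static ssize_t split_lock_detect_rd(struct file *f, char __user *user_buf,
				    size_t count, loff_t *ppos)
{
	unsigned int len;
	char buf[24];

	if (split_lock_detect_force_disabled)	/* hypothetical: set by the #AC path */
		len = sprintf(buf, "force disabled\n");
	else
		len = sprintf(buf, "%u\n", split_lock_detect_enabled);

	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
}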

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 17/17] x86/split_lock: Warn on unaligned address in atomic bit operations
  2019-06-18 22:41 ` [PATCH v9 17/17] x86/split_lock: Warn on unaligned address in atomic bit operations Fenghua Yu
@ 2019-06-26 22:00   ` Thomas Gleixner
  0 siblings, 0 replies; 85+ messages in thread
From: Thomas Gleixner @ 2019-06-26 22:00 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ingo Molnar, Borislav Petkov, H Peter Anvin, Peter Zijlstra,
	Andrew Morton, Dave Hansen, Paolo Bonzini, Radim Krcmar,
	Christopherson Sean J, Ashok Raj, Tony Luck, Dan Williams,
	Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, kvm

On Tue, 18 Jun 2019, Fenghua Yu wrote:

> An atomic bit operation operates one bit in a single unsigned long location
> in a bitmap. In 64-bit mode, the location is at:
> base address of the bitmap + (bit offset in the bitmap / 64) * 8
> 
> If the base address is unaligned to unsigned long, each unsigned long
> location operated by the atomic operation will be unaligned to unsigned
> long and a split lock issue will happen if the unsigned long location
> crosses two cache lines.

Stop harping on this split lock stuff.

Unalignedness is a problem per se, as I and others have explained to you a
gazillion times now.

The fact that it does not matter on x86 except when it crosses a cacheline
does not make it in any way a split lock issue.

The root cause is misalignment per se.

Aside of that, this debug enhancement wants to be the first patch in the
series, not the last.

Thanks,

	tglx
	




^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 11/17] kvm/vmx: Emulate MSR TEST_CTL
  2019-06-18 22:41 ` [PATCH v9 11/17] kvm/vmx: Emulate MSR TEST_CTL Fenghua Yu
@ 2019-06-27  2:24   ` Xiaoyao Li
  2019-06-27  7:12     ` Thomas Gleixner
  0 siblings, 1 reply; 85+ messages in thread
From: Xiaoyao Li @ 2019-06-27  2:24 UTC (permalink / raw)
  To: Fenghua Yu, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Paolo Bonzini, Radim Krcmar, Christopherson Sean J, Ashok Raj,
	Tony Luck, Dan Williams, Xiaoyao Li, Sai Praneeth Prakhya,
	Ravi V Shankar
  Cc: linux-kernel, x86, kvm

Hi Paolo & tglx,

Do you have any comments on this one, as the policy for how to expose
split lock detection (emulating TEST_CTL) to the guest has changed?

This patch implements the policy as below:

Host	|Guest	|Actual value in guest	|split lock happen in guest
------------------------------------------------------------------
on	|off	|	on		|report #AC to userspace
	|on	|	on		|inject #AC back to guest
------------------------------------------------------------------
off	|off	|	off		|No #AC
	|on	|	on		|inject #AC back to guest

In case 2, when split lock detection is on in both host and guest, if a
split lock happens in the guest, the #AC is injected back into the guest.
Then, if the #AC is from a guest userspace app, the guest kernel sends
SIGBUS to that app instead of the whole guest being killed by the host.
If the #AC is from the guest kernel, the guest kernel may clear its split
lock bit in the TEST_CTL MSR and re-execute the instruction; then it goes
into case 1 and the #AC will be reported to host userspace, e.g., QEMU.

On 6/19/2019 6:41 AM, Fenghua Yu wrote:
> From: Xiaoyao Li <xiaoyao.li@linux.intel.com>
> 
> A control bit (bit 29) in TEST_CTL MSR 0x33 will be introduced in
> future x86 processors. When bit 29 is set, the processor causes #AC
> exception for split locked accesses at all CPL.
> 
> Please check the latest Intel 64 and IA-32 Architectures Software
> Developer's Manual for more detailed information on the MSR and
> the split lock bit.
> 
> This patch emulates MSR_TEST_CTL with vmx->msr_test_ctl and does the
> following:
> 1. As MSR TEST_CTL of guest is emulated, enable the related bit
> in CORE_CAPABILITY to correctly report this feature to guest.
> 
> 2. If host has split lock detection enabled, forcing it enabled in
> guest to avoid guest's slowdown attack by using split lock.
> If host has it disabled, it can give control to guest that guest can
> enable it on its own purpose.
> 
> Note: Guest can read and write bit 29 of MSR_TEST_CTL if hardware has
> feature split lock detection. But when guest running, the real value in
> hardware MSR will be different from the value read in guest when guest
> has it disabled and host has it enabled. It can be regarded as host's
> value overrides guest's value.
> 
> To avoid costly RDMSR of TEST_CTL when switching between host and guest
> during vmentry, read per CPU variable msr_test_ctl_cached which caches
> the MSR value.
> 
> Besides, only inject #AC exception back when guest can handle it.
> Otherwise, it must be a split lock caused #AC. In this case, print a hint.
> 
> Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>   arch/x86/kvm/vmx/vmx.c | 92 ++++++++++++++++++++++++++++++++++++++++--
>   arch/x86/kvm/vmx/vmx.h |  2 +
>   arch/x86/kvm/x86.c     | 19 ++++++++-
>   3 files changed, 109 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index b93e36ddee5e..d096cee48a40 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1640,6 +1640,16 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
>   	return !(val & ~valid_bits);
>   }
>   
> +static u64 vmx_get_msr_test_ctl_mask(struct kvm_vcpu *vcpu)
> +{
> +	u64 mask = 0;
> +
> +	if (vcpu->arch.core_capability & MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT)
> +		mask |= MSR_TEST_CTL_SPLIT_LOCK_DETECT;
> +
> +	return mask;
> +}
> +
>   static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
>   {
>   	switch (msr->index) {
> @@ -1666,6 +1676,11 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	u32 index;
>   
>   	switch (msr_info->index) {
> +	case MSR_TEST_CTL:
> +		if (!vmx->msr_test_ctl_mask)
> +			return 1;
> +		msr_info->data = vmx->msr_test_ctl;
> +		break;
>   #ifdef CONFIG_X86_64
>   	case MSR_FS_BASE:
>   		msr_info->data = vmcs_readl(GUEST_FS_BASE);
> @@ -1803,6 +1818,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	u32 index;
>   
>   	switch (msr_index) {
> +	case MSR_TEST_CTL:
> +		if (!vmx->msr_test_ctl_mask ||
> +		    (data & vmx->msr_test_ctl_mask) != data)
> +			return 1;
> +		vmx->msr_test_ctl = data;
> +		break;
> +	case MSR_IA32_CORE_CAP:
> +		if (!msr_info->host_initiated)
> +			return 1;
> +		vcpu->arch.core_capability = data;
> +		vmx->msr_test_ctl_mask = vmx_get_msr_test_ctl_mask(vcpu);
> +		break;
>   	case MSR_EFER:
>   		ret = kvm_set_msr_common(vcpu, msr_info);
>   		break;
> @@ -4121,6 +4148,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>   
>   	vmx->rmode.vm86_active = 0;
>   	vmx->spec_ctrl = 0;
> +	vmx->msr_test_ctl = 0;
> +	vmx->msr_test_ctl_mask = vmx_get_msr_test_ctl_mask(vcpu);
>   
>   	vcpu->arch.microcode_version = 0x100000000ULL;
>   	vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
> @@ -4449,6 +4478,28 @@ static int handle_machine_check(struct kvm_vcpu *vcpu)
>   	return 1;
>   }
>   
> +/*
> + * In intel SDM, #AC can be caused in two way:
> + *	1. Unaligned memory access when CPL = 3 && CR0.AM == 1 && EFLAGS.AC == 1
> + *	2. Lock on crossing cache line memory access, when split lock detection
> + *	   is enabled (bit 29 of MSR_TEST_CTL is set). This #AC can be generated
> + *	   in any CPL.
> + *
> + * So, when guest's split lock detection is enabled, it can be assumed capable
> + * of handling #AC in any CPL.
> + * Or when guest's CR0.AM and EFLAGS.AC are both set, it can be assumed capable
> + * of handling #AC in CPL == 3.
> + */
> +static bool guest_can_handle_ac(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +	return (vmx->msr_test_ctl & MSR_TEST_CTL_SPLIT_LOCK_DETECT) ||
> +	       ((vmx_get_cpl(vcpu) == 3) &&
> +		kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
> +		(kvm_get_rflags(vcpu) & X86_EFLAGS_AC));
> +}
> +
>   static int handle_exception(struct kvm_vcpu *vcpu)
>   {
>   	struct vcpu_vmx *vmx = to_vmx(vcpu);
> @@ -4514,9 +4565,6 @@ static int handle_exception(struct kvm_vcpu *vcpu)
>   		return handle_rmode_exception(vcpu, ex_no, error_code);
>   
>   	switch (ex_no) {
> -	case AC_VECTOR:
> -		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> -		return 1;
>   	case DB_VECTOR:
>   		dr6 = vmcs_readl(EXIT_QUALIFICATION);
>   		if (!(vcpu->guest_debug &
> @@ -4545,6 +4593,15 @@ static int handle_exception(struct kvm_vcpu *vcpu)
>   		kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip;
>   		kvm_run->debug.arch.exception = ex_no;
>   		break;
> +	case AC_VECTOR:
> +		if (guest_can_handle_ac(vcpu)) {
> +			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
> +			return 1;
> +		}
> +		pr_warn("kvm: %s[%d]: there is an #AC exception in guest due to split lock. "
> +			"Please try to fix it, or disable the split lock detection in host to workaround.",
> +			current->comm, current->pid);
> +		/* fall through */
>   	default:
>   		kvm_run->exit_reason = KVM_EXIT_EXCEPTION;
>   		kvm_run->ex.exception = ex_no;
> @@ -6335,6 +6392,33 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
>   					msrs[i].host, false);
>   }
>   
> +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> +{
> +	u64 guest_val;
> +	u64 host_val = this_cpu_read(msr_test_ctl_cached);
> +	u64 mask = vmx->msr_test_ctl_mask;
> +
> +	/*
> +	 * Guest can cause overall system performance degradation (of host or
> +	 * other guest) by using split lock. Hence, it takes following policy:
> +	 *  - If host has split lock detection enabled, forcing it enabled in
> +	 *    guest during vm entry.
> +	 *  - If host has split lock detection disabled, guest can enable it for
> +	 *    it's own purpose that it will load guest's value during vm entry.
> +	 *
> +	 * So use adjusted mask to achieve this.
> +	 */
> +	if (host_val & MSR_TEST_CTL_SPLIT_LOCK_DETECT)
> +		mask &= ~MSR_TEST_CTL_SPLIT_LOCK_DETECT;
> +
> +	guest_val = (host_val & ~mask) | (vmx->msr_test_ctl & mask);
> +
> +	if (host_val == guest_val)
> +		clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> +	else
> +		add_atomic_switch_msr(vmx, MSR_TEST_CTL, guest_val, host_val, false);
> +}
> +
>   static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
>   {
>   	vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
> @@ -6443,6 +6527,8 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
>   
>   	atomic_switch_perf_msrs(vmx);
>   
> +	atomic_switch_msr_test_ctl(vmx);
> +
>   	vmx_update_hv_timer(vcpu);
>   
>   	/*
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 61128b48c503..2a54b0b5741e 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -193,6 +193,8 @@ struct vcpu_vmx {
>   	u64		      msr_guest_kernel_gs_base;
>   #endif
>   
> +	u64		      msr_test_ctl;
> +	u64		      msr_test_ctl_mask;
>   	u64		      spec_ctrl;
>   
>   	u32 vm_entry_controls_shadow;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index dc4c72bd6781..741ad4e61386 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1238,7 +1238,24 @@ EXPORT_SYMBOL_GPL(kvm_get_arch_capabilities);
>   
>   static u64 kvm_get_core_capability(void)
>   {
> -	return 0;
> +	u64 data = 0;
> +
> +	if (boot_cpu_has(X86_FEATURE_CORE_CAPABILITY)) {
> +		rdmsrl(MSR_IA32_CORE_CAP, data);
> +
> +		/* mask non-virtualizable functions */
> +		data &= MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT;
> +	} else if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT)) {
> +		/*
> +		 * There will be a list of FMS values that have split lock
> +		 * detection but lack the CORE CAPABILITY MSR. In this case,
> +		 * set MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT since we emulate
> +		 * MSR CORE_CAPABILITY.
> +		 */
> +		data |= MSR_IA32_CORE_CAP_SPLIT_LOCK_DETECT;
> +	}
> +
> +	return data;
>   }
>   
>   static int kvm_get_msr_feature(struct kvm_msr_entry *msr)
> 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 11/17] kvm/vmx: Emulate MSR TEST_CTL
  2019-06-27  2:24   ` Xiaoyao Li
@ 2019-06-27  7:12     ` Thomas Gleixner
  2019-06-27  7:58       ` Xiaoyao Li
  0 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-06-27  7:12 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar,
	linux-kernel, x86, kvm


A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

A: Yes
Q: Should I trim all irrelevant context?

On Thu, 27 Jun 2019, Xiaoyao Li wrote:
>
> Do you have any comments on this one as the policy of how to expose split lock
> detection (emulate TEST_CTL) for guest changed.
> 
> This patch makes the implementation as below:
> 
> Host	|Guest	|Actual value in guest	|split lock happen in guest
> ------------------------------------------------------------------
> on	|off	|	on		|report #AC to userspace
> 	|on	|	on		|inject #AC back to guest
> ------------------------------------------------------------------
> off	|off	|	off		|No #AC
> 	|on	|	on		|inject #AC back to guest

A: Because it's way better to provide implementation details and useless
   references to the SDM.

Q: What's the reason that this table is _NOT_ part of the changelog?

> In case 2, when split lock detection of both host and guest on, if there is a
> split lock is guest, it will inject #AC back to userspace. Then if #AC is from
> guest userspace apps, guest kernel sends SIGBUS to userspace apps instead of
> whole guest killed by host. If #AC is from guest kernel, guest kernel may
> clear it's split lock bit in test_ctl msr and re-execute the instruction, then
> it goes into case 1, the #AC will report to host userspace, e.g., QEMU.

The real interesting question is whether the #AC on split lock prevents the
actual bus lock or not. If it does then the above is fine.

If not, then it would be trivial for a malicious guest to set the
SPLIT_LOCK_ENABLE bit and "handle" the exception pro forma, return to the
offending instruction and trigger another one. It lowers the rate, but that
doesn't make it any better.

The SDM is as usual too vague to be useful. Please clarify.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 11/17] kvm/vmx: Emulate MSR TEST_CTL
  2019-06-27  7:12     ` Thomas Gleixner
@ 2019-06-27  7:58       ` Xiaoyao Li
  2019-06-27 12:11         ` Thomas Gleixner
  0 siblings, 1 reply; 85+ messages in thread
From: Xiaoyao Li @ 2019-06-27  7:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar,
	linux-kernel, x86, kvm

On 6/27/2019 3:12 PM, Thomas Gleixner wrote:
> 
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
> 
> A: No.
> Q: Should I include quotations after my reply?
> 
> http://daringfireball.net/2007/07/on_top
> 
> A: Yes
> Q: Should I trim all irrelevant context?
> 

Sorry about this.
Won't do it anymore.

> On Thu, 27 Jun 2019, Xiaoyao Li wrote:
>>
>> Do you have any comments on this one as the policy of how to expose split lock
>> detection (emulate TEST_CTL) for guest changed.
>>
>> This patch makes the implementation as below:
>>
>> Host	|Guest	|Actual value in guest	|split lock happen in guest
>> ------------------------------------------------------------------
>> on	|off	|	on		|report #AC to userspace
>> 	|on	|	on		|inject #AC back to guest
>> ------------------------------------------------------------------
>> off	|off	|	off		|No #AC
>> 	|on	|	on		|inject #AC back to guest
> 
> A: Because it's way better to provide implementation details and useless
>     references to the SDM.
> 
> Q: What's the reason that this table is _NOT_ part of the changelog?
> 

will add it in next version.

>> In case 2, when split lock detection of both host and guest on, if there is a
>> split lock is guest, it will inject #AC back to userspace. Then if #AC is from
>> guest userspace apps, guest kernel sends SIGBUS to userspace apps instead of
>> whole guest killed by host. If #AC is from guest kernel, guest kernel may
>> clear it's split lock bit in test_ctl msr and re-execute the instruction, then
>> it goes into case 1, the #AC will report to host userspace, e.g., QEMU.
> 
> The real interesting question is whether the #AC on split lock prevents the
> actual bus lock or not. If it does then the above is fine.
> 
> If not, then it would be trivial for a malicious guest to set the
> SPLIT_LOCK_ENABLE bit and "handle" the exception pro forma, return to the
> offending instruction and trigger another one. It lowers the rate, but that
> doesn't make it any better.
> 
> The SDM is as usual too vague to be useful. Please clarify.
>

This feature is to ensure there is no bus lock (due to split lock) in
hardware; that is to say, when bit 29 of TEST_CTL is set, no bus lock due
to a split lock can be acquired.

> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 11/17] kvm/vmx: Emulate MSR TEST_CTL
  2019-06-27  7:58       ` Xiaoyao Li
@ 2019-06-27 12:11         ` Thomas Gleixner
  2019-06-27 12:22           ` Xiaoyao Li
  0 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-06-27 12:11 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Xiaoyao Li, Sai Praneeth Prakhya, Ravi V Shankar,
	linux-kernel, x86, kvm

On Thu, 27 Jun 2019, Xiaoyao Li wrote:
> On 6/27/2019 3:12 PM, Thomas Gleixner wrote:
> > The real interesting question is whether the #AC on split lock prevents the
> > actual bus lock or not. If it does then the above is fine.
> > 
> > If not, then it would be trivial for a malicious guest to set the
> > SPLIT_LOCK_ENABLE bit and "handle" the exception pro forma, return to the
> > offending instruction and trigger another one. It lowers the rate, but that
> > doesn't make it any better.
> > 
> > The SDM is as usual too vague to be useful. Please clarify.
> > 
> This feature is to ensure no bus lock (due to split lock) in hardware, that to
> say, when bit 29 of TEST_CTL is set, there is no bus lock due to split lock
> can be acquired.

So enabling this prevents the bus lock, i.e. the exception is raised before
that happens.

Please add that information to the changelog as well because that's
important to know and makes me much more comfortable handing the #AC back
into the guest when it has it enabled.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 11/17] kvm/vmx: Emulate MSR TEST_CTL
  2019-06-27 12:11         ` Thomas Gleixner
@ 2019-06-27 12:22           ` Xiaoyao Li
  0 siblings, 0 replies; 85+ messages in thread
From: Xiaoyao Li @ 2019-06-27 12:22 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Christopherson Sean J, Ashok Raj, Tony Luck,
	Dan Williams, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, kvm

On Thu, 2019-06-27 at 14:11 +0200, Thomas Gleixner wrote:
> On Thu, 27 Jun 2019, Xiaoyao Li wrote:
> > On 6/27/2019 3:12 PM, Thomas Gleixner wrote:
> > > The real interesting question is whether the #AC on split lock prevents
> > > the
> > > actual bus lock or not. If it does then the above is fine.
> > > 
> > > If not, then it would be trivial for a malicious guest to set the
> > > SPLIT_LOCK_ENABLE bit and "handle" the exception pro forma, return to the
> > > offending instruction and trigger another one. It lowers the rate, but
> > > that
> > > doesn't make it any better.
> > > 
> > > The SDM is as usual too vague to be useful. Please clarify.
> > > 
> > 
> > This feature is to ensure no bus lock (due to split lock) in hardware, that
> > to
> > say, when bit 29 of TEST_CTL is set, there is no bus lock due to split lock
> > can be acquired.
> 
> So enabling this prevents the bus lock, i.e. the exception is raised before
> that happens.
> 
exactly.

> Please add that information to the changelog as well because that's
> important to know and makes me much more comfortable handing the #AC back
> into the guest when it has it enabled.
> 
Will add it in next version.

Thanks.




^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 0/3] Fix some 4-byte vs. 8-byte alignment issues
  2019-06-18 22:41 [PATCH v9 00/17] x86/split_lock: Enable split lock detection Fenghua Yu
                   ` (16 preceding siblings ...)
  2019-06-18 22:41 ` [PATCH v9 17/17] x86/split_lock: Warn on unaligned address in atomic bit operations Fenghua Yu
@ 2019-09-16 22:39 ` Tony Luck
  2019-09-16 22:39   ` [PATCH 1/3] x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long Tony Luck
                     ` (2 more replies)
  17 siblings, 3 replies; 85+ messages in thread
From: Tony Luck @ 2019-09-16 22:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tony Luck, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86, Fenghua Yu

This series is made up of three patches from Fenghua Yu's "Split lock"
series last posted here:

https://lore.kernel.org/kvm/1560897679-228028-1-git-send-email-fenghua.yu@intel.com/

Part 3 has been fixed to use a union to force alignment per
feedback from Thomas.

These parts are all simple fixes which are a necessary precursor
before we can enable #AC traps for split lock access. But they
are also worthwhile performance fixes in their own right. So
no sense in holding them back while we discuss the merits of
the rest of the series.

Fenghua Yu (2):
  x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long
  x86/split_lock: Align the x86_capability array to size of unsigned
    long

Peter Zijlstra (1):
  drivers/net/b44: Align pwol_mask to unsigned long for better
    performance

 arch/x86/include/asm/processor.h    | 10 +++++++++-
 arch/x86/kernel/cpu/common.c        |  5 +++--
 drivers/net/ethernet/broadcom/b44.c |  4 ++--
 3 files changed, 14 insertions(+), 5 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH 1/3] x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long
  2019-09-16 22:39 ` [PATCH 0/3] Fix some 4-byte vs. 8-byte alignment issues Tony Luck
@ 2019-09-16 22:39   ` Tony Luck
  2019-11-15 19:26     ` [tip: x86/cpu] x86/cpu: " tip-bot2 for Fenghua Yu
  2019-09-16 22:39   ` [PATCH 2/3] drivers/net/b44: Align pwol_mask to unsigned long for better performance Tony Luck
  2019-09-16 22:39   ` [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long Tony Luck
  2 siblings, 1 reply; 85+ messages in thread
From: Tony Luck @ 2019-09-16 22:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Borislav Petkov, Tony Luck, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Paolo Bonzini, Radim Krcmar, Sai Praneeth Prakhya,
	Ravi V Shankar, linux-kernel, x86

From: Fenghua Yu <fenghua.yu@intel.com>

cpu_caps_cleared[] and cpu_caps_set[] may not be aligned to unsigned long.
Atomic operations (i.e. set_bit() and clear_bit()) on the bitmaps may
access two cache lines (a.k.a. split lock) and cause the CPU to do a bus
lock to block all memory accesses from other processors to ensure
atomicity.

To avoid the overall performance degradation from the bus locking, align
the two variables to unsigned long.

Defining the variables as unsigned long may also fix the issue because
they will be naturally aligned to unsigned long. But that needs additional
code changes. Adding __aligned(unsigned long) is a simpler fix.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f125bf7ecb6f..87627091f45f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -565,8 +565,9 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
 	return NULL;		/* Not found */
 }
 
-__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
-__u32 cpu_caps_set[NCAPINTS + NBUGINTS];
+/* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
+__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
+__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 
 void load_percpu_segment(int cpu)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 2/3] drivers/net/b44: Align pwol_mask to unsigned long for better performance
  2019-09-16 22:39 ` [PATCH 0/3] Fix some 4-byte vs. 8-byte alignment issues Tony Luck
  2019-09-16 22:39   ` [PATCH 1/3] x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long Tony Luck
@ 2019-09-16 22:39   ` Tony Luck
  2019-09-16 22:39   ` [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long Tony Luck
  2 siblings, 0 replies; 85+ messages in thread
From: Tony Luck @ 2019-09-16 22:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Fenghua Yu, Tony Luck, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Andrew Morton, Dave Hansen,
	Paolo Bonzini, Radim Krcmar, Sai Praneeth Prakhya,
	Ravi V Shankar, linux-kernel, x86

From: Peter Zijlstra <peterz@infradead.org>

A bit in pwol_mask is set in b44_magic_pattern() by atomic set_bit().
But since pwol_mask is local and never exposed to concurrency, there is
no need to set bits in pwol_mask atomically.

set_bit() sets the bit in a single unsigned long location. Because
pwol_mask may not be aligned to unsigned long, the location may cross two
cache lines. On x86, accessing two cache lines in a locked instruction in
set_bit() is called a split locked access and can cause overall performance
degradation.

So use the non-atomic __set_bit() to set pwol_mask bits. __set_bit() won't
hit the split lock issue on x86.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/net/ethernet/broadcom/b44.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/b44.c b/drivers/net/ethernet/broadcom/b44.c
index 97ab0dd25552..5738ab963dfb 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1520,7 +1520,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
 
 	memset(ppattern + offset, 0xff, magicsync);
 	for (j = 0; j < magicsync; j++)
-		set_bit(len++, (unsigned long *) pmask);
+		__set_bit(len++, (unsigned long *)pmask);
 
 	for (j = 0; j < B44_MAX_PATTERNS; j++) {
 		if ((B44_PATTERN_SIZE - len) >= ETH_ALEN)
@@ -1532,7 +1532,7 @@ static int b44_magic_pattern(u8 *macaddr, u8 *ppattern, u8 *pmask, int offset)
 		for (k = 0; k< ethaddr_bytes; k++) {
 			ppattern[offset + magicsync +
 				(j * ETH_ALEN) + k] = macaddr[k];
-			set_bit(len++, (unsigned long *) pmask);
+			__set_bit(len++, (unsigned long *)pmask);
 		}
 	}
 	return len - 1;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long
  2019-09-16 22:39 ` [PATCH 0/3] Fix some 4-byte vs. 8-byte alignment issues Tony Luck
  2019-09-16 22:39   ` [PATCH 1/3] x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long Tony Luck
  2019-09-16 22:39   ` [PATCH 2/3] drivers/net/b44: Align pwol_mask to unsigned long for better performance Tony Luck
@ 2019-09-16 22:39   ` Tony Luck
  2019-09-17  8:29     ` David Laight
  2019-11-15 19:26     ` [tip: x86/cpu] x86/cpu: " tip-bot2 for Fenghua Yu
  2 siblings, 2 replies; 85+ messages in thread
From: Tony Luck @ 2019-09-16 22:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, David Laight, Tony Luck, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Paolo Bonzini, Radim Krcmar, Sai Praneeth Prakhya,
	Ravi V Shankar, linux-kernel, x86

From: Fenghua Yu <fenghua.yu@intel.com>

The x86_capability array in cpuinfo_x86 is defined as u32 and thus is
naturally aligned to 4 bytes. But, set_bit() and clear_bit() require
the array to be aligned to size of unsigned long (i.e. 8 bytes in
64-bit).

To fix the alignment issue, align the x86_capability array to size of
unsigned long by using unnamed union and 'unsigned long array_align'
to force the alignment.

Changing the x86_capability array's type to unsigned long may also fix
the issue because the x86_capability array will be naturally aligned
to size of unsigned long. But this needs additional code changes.
So choose the simpler solution by setting the array's alignment to size
of unsigned long.

Suggested-by: David Laight <David.Laight@aculab.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/include/asm/processor.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 6e0a3b43d027..c073534ca485 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -93,7 +93,15 @@ struct cpuinfo_x86 {
 	__u32			extended_cpuid_level;
 	/* Maximum supported CPUID level, -1=no CPUID: */
 	int			cpuid_level;
-	__u32			x86_capability[NCAPINTS + NBUGINTS];
+	/*
+	 * Align to size of unsigned long because the x86_capability array
+	 * is passed to bitops which require the alignment. Use unnamed
+	 * union to enforce the array is aligned to size of unsigned long.
+	 */
+	union {
+		__u32		x86_capability[NCAPINTS + NBUGINTS];
+		unsigned long	x86_capability_alignment;
+	};
 	char			x86_vendor_id[16];
 	char			x86_model_id[64];
 	/* in KB - valid for CPUS which support this call: */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 85+ messages in thread

* RE: [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long
  2019-09-16 22:39   ` [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long Tony Luck
@ 2019-09-17  8:29     ` David Laight
  2019-09-17 19:14       ` Luck, Tony
  2019-11-15 19:26     ` [tip: x86/cpu] x86/cpu: " tip-bot2 for Fenghua Yu
  1 sibling, 1 reply; 85+ messages in thread
From: David Laight @ 2019-09-17  8:29 UTC (permalink / raw)
  To: 'Tony Luck', Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel,
	x86

From: Tony Luck
> Sent: 16 September 2019 23:40
> From: Fenghua Yu <fenghua.yu@intel.com>
> 
> The x86_capability array in cpuinfo_x86 is defined as u32 and thus is
> naturally aligned to 4 bytes. But, set_bit() and clear_bit() require
> the array to be aligned to size of unsigned long (i.e. 8 bytes in
> 64-bit).
> 
> To fix the alignment issue, align the x86_capability array to size of
> unsigned long by using unnamed union and 'unsigned long array_align'
> to force the alignment.
> 
> Changing the x86_capability array's type to unsigned long may also fix
> the issue because the x86_capability array will be naturally aligned
> to size of unsigned long. But this needs additional code changes.
> So choose the simpler solution by setting the array's alignment to size
> of unsigned long.
> 
> Suggested-by: David Laight <David.Laight@aculab.com>

While this is probably the only place where this 'capabilities' array
has been detected as misaligned, ISTR there are several other places
where the identical array is defined and used.
These all need fixing as well.

	David

> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/include/asm/processor.h | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 6e0a3b43d027..c073534ca485 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -93,7 +93,15 @@ struct cpuinfo_x86 {
>  	__u32			extended_cpuid_level;
>  	/* Maximum supported CPUID level, -1=no CPUID: */
>  	int			cpuid_level;
> -	__u32			x86_capability[NCAPINTS + NBUGINTS];
> +	/*
> +	 * Align to size of unsigned long because the x86_capability array
> +	 * is passed to bitops which require the alignment. Use unnamed
> +	 * union to enforce the array is aligned to size of unsigned long.
> +	 */
> +	union {
> +		__u32		x86_capability[NCAPINTS + NBUGINTS];
> +		unsigned long	x86_capability_alignment;
> +	};
>  	char			x86_vendor_id[16];
>  	char			x86_model_id[64];
>  	/* in KB - valid for CPUS which support this call: */
> --
> 2.20.1

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long
  2019-09-17  8:29     ` David Laight
@ 2019-09-17 19:14       ` Luck, Tony
  2019-09-18  8:54         ` David Laight
  0 siblings, 1 reply; 85+ messages in thread
From: Luck, Tony @ 2019-09-17 19:14 UTC (permalink / raw)
  To: David Laight
  Cc: Thomas Gleixner, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Paolo Bonzini, Radim Krcmar, Sai Praneeth Prakhya,
	Ravi V Shankar, linux-kernel, x86

On Tue, Sep 17, 2019 at 08:29:28AM +0000, David Laight wrote:
> From: Tony Luck
> > Sent: 16 September 2019 23:40
> > From: Fenghua Yu <fenghua.yu@intel.com>
> > 
> > The x86_capability array in cpuinfo_x86 is defined as u32 and thus is
> > naturally aligned to 4 bytes. But, set_bit() and clear_bit() require
> > the array to be aligned to size of unsigned long (i.e. 8 bytes in
> > 64-bit).
> > 
> > To fix the alignment issue, align the x86_capability array to size of
> > unsigned long by using unnamed union and 'unsigned long array_align'
> > to force the alignment.
> > 
> > Changing the x86_capability array's type to unsigned long may also fix
> > the issue because the x86_capability array will be naturally aligned
> > to size of unsigned long. But this needs additional code changes.
> > So choose the simpler solution by setting the array's alignment to size
> > of unsigned long.
> > 
> > Suggested-by: David Laight <David.Laight@aculab.com>
> 
> While this is probably the only place where this 'capabilities' array
> has been detected as misaligned, ISTR there are several other places
> where the identical array is defined and used.
> These all need fixing as well.

Agree 100%.  These three patches cover the places *detected* so
far. For bisectability reasons they need to be upstream before
the patches that add WARN_ON, or the one that turns on alignment
traps.  As we find other places, we can fix alignments in other
structures too.

If you remember what those other places are, please let us know
so we can push patches to fix those.

If you have a better strategy to find them ... that also would
be very interesting.

-Tony

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long
  2019-09-17 19:14       ` Luck, Tony
@ 2019-09-18  8:54         ` David Laight
  0 siblings, 0 replies; 85+ messages in thread
From: David Laight @ 2019-09-18  8:54 UTC (permalink / raw)
  To: 'Luck, Tony'
  Cc: Thomas Gleixner, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Paolo Bonzini, Radim Krcmar, Sai Praneeth Prakhya,
	Ravi V Shankar, linux-kernel, x86


From: Luck, Tony
> Sent: 17 September 2019 20:14
> On Tue, Sep 17, 2019 at 08:29:28AM +0000, David Laight wrote:
> > From: Tony Luck
> > > Sent: 16 September 2019 23:40
> > > From: Fenghua Yu <fenghua.yu@intel.com>
> > >
> > > The x86_capability array in cpuinfo_x86 is defined as u32 and thus is
> > > naturally aligned to 4 bytes. But, set_bit() and clear_bit() require
> > > the array to be aligned to size of unsigned long (i.e. 8 bytes in
> > > 64-bit).
> > >
> > > To fix the alignment issue, align the x86_capability array to size of
> > > unsigned long by using unnamed union and 'unsigned long array_align'
> > > to force the alignment.
> > >
> > > Changing the x86_capability array's type to unsigned long may also fix
> > > the issue because the x86_capability array will be naturally aligned
> > > to size of unsigned long. But this needs additional code changes.
> > > So choose the simpler solution by setting the array's alignment to size
> > > of unsigned long.
> > >
> > > Suggested-by: David Laight <David.Laight@aculab.com>
> >
> > While this is probably the only place where this 'capabilities' array
> > has been detected as misaligned, ISTR there are several other places
> > where the identical array is defined and used.
> > These all need fixing as well.
> 
> Agree 100%  These three patches cover the places *detected* so
> far. For bisectability reasons they need to be upstream before
> the patches that add WARN_ON, or the one that turns on alignment
> traps.  As we find other places, we can fix alignments in other
> structures too.
> 
> If you remember what those other places are, please let us know
> so we can push patches to fix those.
> 
> If you have a better strategy to find them ... that also would
> be very interesting.

ISTR doing the following:
1) Looking at the other places where the x86 capabilities got stored.
2) Searching for casts of the bit functions.
Try:
grep -r --include '*.[ch]' '_bit([^(]*, *([^)]*\*)' .
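
The kind of code that grep flags looks roughly like the made-up example
below: a u32 buffer that is only 4-byte aligned gets cast to
unsigned long * and handed to the atomic bitops, so on 64-bit the locked
access may straddle a cache line. The union trick from patch 3 is one
way to fix it (sketch only, all names here are invented):

#include <linux/bitops.h>
#include <linux/types.h>

/* Hypothetical structure with the problematic pattern. */
struct foo {
	u32 flags[4];			/* only guaranteed 4-byte alignment */
};

static void set_foo_flag(struct foo *f, int nr)
{
	/* The cast hides that the 8-byte locked access may be misaligned. */
	set_bit(nr, (unsigned long *)f->flags);
}

/* One possible fix, mirroring the x86_capability patch. */
struct foo_fixed {
	union {
		u32		flags[4];
		unsigned long	flags_align;	/* forces 8-byte alignment */
	};
};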

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-06-26 21:47       ` Thomas Gleixner
@ 2019-09-25 18:09         ` Sean Christopherson
  2019-10-16  6:58           ` Xiaoyao Li
                             ` (2 more replies)
  0 siblings, 3 replies; 85+ messages in thread
From: Sean Christopherson @ 2019-09-25 18:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams, Xiaoyao Li,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, Jun 26, 2019 at 11:47:40PM +0200, Thomas Gleixner wrote:
> So only one of the CPUs will win the cmpxchg race, set the variable to 1 and
> warn, the other and any subsequent AC on any other CPU will not warn
> either. So you don't need WARN_ONCE() at all. It's redundant and confusing
> along with the atomic_set().
> 
> Without reading that link [1], what Ingo proposed was surely not the
> trainwreck which you decided to put into that debugfs thing.

We're trying to sort out the trainwreck, but there's an additional wrinkle
that I'd like your input on.

We overlooked the fact that MSR_TEST_CTRL is per-core, i.e. shared by
sibling hyperthreads.  This is especially problematic for KVM, as loading
MSR_TEST_CTRL during VM-Enter could cause spurious #AC faults in the kernel
and bounce MSR_TEST_CTRL.split_lock.

E.g. if CPU0 and CPU1 are siblings and CPU1 is running a KVM guest with
MSR_TEST_CTRL.split_lock=1, hitting an #AC on CPU0 in the host kernel will
lead to spurious #AC faults and constant toggling of the MSR.

  CPU0               CPU1

         split_lock=enabled

  #AC -> disabled

                     VM-Enter -> enabled

  #AC -> disabled

                     VM-Enter -> enabled

  #AC -> disabled



My thought to handle this:

  - Remove the per-cpu cache.

  - Rework the atomic variable to differentiate between "disabled globally"
    and "disabled by kernel (on some CPUs)".

  - Modify the #AC handler to test/set the same atomic variable as the
    sysfs knob.  This is the "disabled by kernel" flow.

  - Modify the debugfs/sysfs knob to only allow disabling split-lock
    detection.  This is the "disabled globally" path, i.e. sends IPIs to
    clear MSR_TEST_CTRL.split_lock on all online CPUs.

  - Modify the resume/init flow to clear MSR_TEST_CTRL.split_lock if it's
    been disabled on *any* CPU via #AC or via the knob.

  - Modify the debugfs/sysfs read function to either print the raw atomic
    variable, or differentiate between "enabled", "disabled globally" and
   "disabled by kernel".

  - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
    actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
    guest can do WRMSR and handle its own #AC faults, but KVM doesn't
    change the value in hardware.

      * Allowing guest to enable split-lock detection can induce #AC on
        the host after it has been explicitly turned off, e.g. the sibling
        hyperthread hits an #AC in the host kernel, or worse, causes a
        different process in the host to SIGBUS.

      * Allowing guest to disable split-lock detection opens up the host
        to DoS attacks.

  - KVM advertises split-lock detection to guest/userspace if and only if
    split_lock_detect_disabled is zero.

  - Add a pr_warn_once() in KVM that triggers if split locks are disabled
    after support has been advertised to a guest.

Does this sound sane?

The question at the forefront of my mind is: why not have the #AC handler
send a fire-and-forget IPI to online CPUs to disable split-lock detection
on all CPUs?  Would the IPI be problematic?  Globally disabling split-lock
on any #AC would (marginally) simplify the code and would eliminate the
oddity of userspace process (and KVM guest) #AC behavior varying based on
the physical CPU it's running on.


Something like:

#define SPLIT_LOCK_DISABLED_IN_KERNEL	BIT(0)
#define SPLIT_LOCK_DISABLED_GLOBALLY	BIT(1)

static atomic_t split_lock_detect_disabled = ATOMIC_INIT(0);

void split_lock_detect_ac(void)
{
	u64 test_ctrl;

	lockdep_assert_irqs_disabled();

	/* Disable split lock detection on this CPU to avoid reentrant #AC. */
	rdmsrl(MSR_TEST_CTRL, test_ctrl);
	wrmsrl(MSR_TEST_CTRL, test_ctrl & ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT);

	/*
	 * If split-lock detection has not been disabled, either by the kernel
	 * or globally, record that it has been disabled by the kernel and
	 * WARN.  Guarding WARN with the atomic ensures only the first #AC due
	 * to split-lock is logged, e.g. if multiple CPUs encounter #AC or if
	 * #AC is retriggered by a perf context NMI that interrupts the
	 * original WARN.
	 */
	if (atomic_cmpxchg(&split_lock_detect_disabled, 0,
			   SPLIT_LOCK_DISABLED_IN_KERNEL) == 0)
	        WARN(1, "split lock operation detected\n");
}

static ssize_t split_lock_detect_wr(struct file *f, const char __user *user_buf,
				    size_t count, loff_t *ppos)
{
	int old;

	<parse or ignore input value?>
	
	old = atomic_fetch_or(SPLIT_LOCK_DISABLED_GLOBALLY,
			      &split_lock_detect_disabled);

	/* Update MSR_TEST_CTRL unless split-lock was already disabled. */
	if (!(old & SPLIT_LOCK_DISABLED_GLOBALLY))
		on_each_cpu(split_lock_update, NULL, 1);

	return count;
}
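
(split_lock_update() isn't shown above; presumably it is just the
per-CPU IPI callback that clears the enable bit, along the lines of the
guess below, reusing the same MSR names:)

static void split_lock_update(void *unused)
{
	u64 test_ctrl;

	/* Clear the enable bit on this CPU; called via on_each_cpu(). */
	rdmsrl(MSR_TEST_CTRL, test_ctrl);
	wrmsrl(MSR_TEST_CTRL, test_ctrl & ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
}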


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-09-25 18:09         ` Sean Christopherson
@ 2019-10-16  6:58           ` Xiaoyao Li
  2019-10-16  9:29           ` Thomas Gleixner
  2019-10-16  9:40           ` Paolo Bonzini
  2 siblings, 0 replies; 85+ messages in thread
From: Xiaoyao Li @ 2019-10-16  6:58 UTC (permalink / raw)
  To: Sean Christopherson, Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 9/26/2019 2:09 AM, Sean Christopherson wrote:
> On Wed, Jun 26, 2019 at 11:47:40PM +0200, Thomas Gleixner wrote:
>> So only one of the CPUs will win the cmpxchg race, set the variable to 1 and
>> warn, the other and any subsequent AC on any other CPU will not warn
>> either. So you don't need WARN_ONCE() at all. It's redundant and confusing
>> along with the atomic_set().
>>
>> Without reading that link [1], what Ingo proposed was surely not the
>> trainwreck which you decided to put into that debugfs thing.
> 
> We're trying to sort out the trainwreck, but there's an additional wrinkle
> that I'd like your input on.
> 
> We overlooked the fact that MSR_TEST_CTRL is per-core, i.e. shared by
> sibling hyperthreads.  This is especially problematic for KVM, as loading
> MSR_TEST_CTRL during VM-Enter could cause spurious #AC faults in the kernel
> and bounce MSR_TEST_CTRL.split_lock.
> 
> E.g. if CPU0 and CPU1 are siblings and CPU1 is running a KVM guest with
> MSR_TEST_CTRL.split_lock=1, hitting an #AC on CPU0 in the host kernel will
> lead to spurious #AC faults and constant toggling of the MSR.
> 
>    CPU0               CPU1
> 
>           split_lock=enabled
> 
>    #AC -> disabled
> 
>                       VM-Enter -> enabled
> 
>    #AC -> disabled
> 
>                       VM-Enter -> enabled
> 
>    #AC -> disabled
> 
> 
> 
> My thought to handle this:
> 
>    - Remove the per-cpu cache.
> 
>    - Rework the atomic variable to differentiate between "disabled globally"
>      and "disabled by kernel (on some CPUs)".
> 
>    - Modify the #AC handler to test/set the same atomic variable as the
>      sysfs knob.  This is the "disabled by kernel" flow.
> 
>    - Modify the debugfs/sysfs knob to only allow disabling split-lock
>      detection.  This is the "disabled globally" path, i.e. sends IPIs to
>      clear MSR_TEST_CTRL.split_lock on all online CPUs.
> 
>    - Modify the resume/init flow to clear MSR_TEST_CTRL.split_lock if it's
>      been disabled on *any* CPU via #AC or via the knob.
> 
>    - Modify the debugfs/sysfs read function to either print the raw atomic
>      variable, or differentiate between "enabled", "disabled globally" and
>     "disabled by kernel".
> 
>    - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
>      actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
>      guest can do WRMSR and handle its own #AC faults, but KVM doesn't
>      change the value in hardware.
> 
>        * Allowing guest to enable split-lock detection can induce #AC on
>          the host after it has been explicitly turned off, e.g. the sibling
>          hyperthread hits an #AC in the host kernel, or worse, causes a
>          different process in the host to SIGBUS.
> 
>        * Allowing guest to disable split-lock detection opens up the host
>          to DoS attacks.
> 
>    - KVM advertises split-lock detection to guest/userspace if and only if
>      split_lock_detect_disabled is zero.
> 
>    - Add a pr_warn_once() in KVM that triggers if split locks are disabled
>      after support has been advertised to a guest.
> 
> Does this sound sane?
> 
> The question at the forefront of my mind is: why not have the #AC handler
> send a fire-and-forget IPI to online CPUs to disable split-lock detection
> on all CPUs?  Would the IPI be problematic?  Globally disabling split-lock
> on any #AC would (marginally) simplify the code and would eliminate the
> oddity of userspace process (and KVM guest) #AC behavior varying based on
> the physical CPU it's running on.
> 
> 
> Something like:
> 
> #define SPLIT_LOCK_DISABLED_IN_KERNEL	BIT(0)
> #define SPLIT_LOCK_DISABLED_GLOBALLY	BIT(1)
> 
> static atomic_t split_lock_detect_disabled = ATOMIC_INIT(0);
> 
> void split_lock_detect_ac(void)
> {
> 	lockdep_assert_irqs_disabled();
> 
> 	/* Disable split lock detection on this CPU to avoid reentrant #AC. */
> 	wrmsrl(MSR_TEST_CTRL,
> 	       rdmsrl(MSR_TEST_CTRL) & ~MSR_TEST_CTRL_SPLIT_LOCK_DETECT);
> 
> 	/*
> 	 * If split-lock detection has not been disabled, either by the kernel
> 	 * or globally, record that it has been disabled by the kernel and
> 	 * WARN.  Guarding WARN with the atomic ensures only the first #AC due
> 	 * to split-lock is logged, e.g. if multiple CPUs encounter #AC or if
> 	 * #AC is retriggered by a perf context NMI that interrupts the
> 	 * original WARN.
> 	 */
> 	if (atomic_cmpxchg(&split_lock_detect_disabled, 0,
> 			   SPLIT_LOCK_DISABLED_IN_KERNEL) == 0)
> 	        WARN(1, "split lock operation detected\n");
> }
> 
> static ssize_t split_lock_detect_wr(struct file *f, const char __user *user_buf,
> 				    size_t count, loff_t *ppos)
> {
> 	int old;
> 
> 	<parse or ignore input value?>
> 	
> 	old = atomic_fetch_or(SPLIT_LOCK_DISABLED_GLOBALLY,
> 			      &split_lock_detect_disabled);
> 
> 	/* Update MSR_TEST_CTRL unless split-lock was already disabled. */
> 	if (!(old & SPLIT_LOCK_DISABLED_GLOBALLY))
> 		on_each_cpu(split_lock_update, NULL, 1);
> 
> 	return count;
> }
> 

Hi Thomas,

Could you please have a look at Sean's proposal and give your opinion.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-09-25 18:09         ` Sean Christopherson
  2019-10-16  6:58           ` Xiaoyao Li
@ 2019-10-16  9:29           ` Thomas Gleixner
  2019-10-16 15:59             ` Sean Christopherson
  2019-10-16  9:40           ` Paolo Bonzini
  2 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-10-16  9:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams, Xiaoyao Li,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

Sean,

On Wed, 25 Sep 2019, Sean Christopherson wrote:

sorry for the late reply. This got lost in travel/conferencing/vacation
induced backlog.

> On Wed, Jun 26, 2019 at 11:47:40PM +0200, Thomas Gleixner wrote:
> > So only one of the CPUs will win the cmpxchg race, set the variable to 1 and
> > warn, the other and any subsequent AC on any other CPU will not warn
> > either. So you don't need WARN_ONCE() at all. It's redundant and confusing
> > along with the atomic_set().
> > 
> > Without reading that link [1], what Ingo proposed was surely not the
> > trainwreck which you decided to put into that debugfs thing.
> 
> We're trying to sort out the trainwreck, but there's an additional wrinkle
> that I'd like your input on.
> 
> We overlooked the fact that MSR_TEST_CTRL is per-core, i.e. shared by
> sibling hyperthreads.

You must be kidding. It took 9 revisions of trainwreck engineering to
find that out.

> This is especially problematic for KVM, as loading MSR_TEST_CTRL during
> VM-Enter could cause spurious #AC faults in the kernel and bounce
> MSR_TEST_CTRL.split_lock.
>
> E.g. if CPU0 and CPU1 are siblings and CPU1 is running a KVM guest with
> MSR_TEST_CTRL.split_lock=1, hitting an #AC on CPU0 in the host kernel will
> lead to spurious #AC faults and constant toggling of the MSR.
>
> My thought to handle this:
> 
>   - Remove the per-cpu cache.
>
>   - Rework the atomic variable to differentiate between "disabled globally"
>     and "disabled by kernel (on some CPUs)".

Under the assumption that the kernel should never trigger #AC anyway, that
should be good enough.

>   - Modify the #AC handler to test/set the same atomic variable as the
>     sysfs knob.  This is the "disabled by kernel" flow.

That's the #AC in kernel handler, right?
 
>   - Modify the debugfs/sysfs knob to only allow disabling split-lock
>     detection.  This is the "disabled globally" path, i.e. sends IPIs to
>     clear MSR_TEST_CTRL.split_lock on all online CPUs.

Why only disable? What's wrong with reenabling it? The shiny new driver you
are working on is triggering #AC. So in order to test the fix, you need to
reboot the machine instead of just unloading the module, reenabling #AC and
then loading the fixed one?

>   - Modify the resume/init flow to clear MSR_TEST_CTRL.split_lock if it's
>     been disabled on *any* CPU via #AC or via the knob.

Fine.

>   - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
>     actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
>     guest can do WRMSR and handle its own #AC faults, but KVM doesn't
>     change the value in hardware.
> 
>       * Allowing guest to enable split-lock detection can induce #AC on
>         the host after it has been explicitly turned off, e.g. the sibling
>         hyperthread hits an #AC in the host kernel, or worse, causes a
>         different process in the host to SIGBUS.
>
>       * Allowing guest to disable split-lock detection opens up the host
>         to DoS attacks.

Wasn't this discussed before, and agreed that if the host has #AC enabled
the guest should not be able to force-disable it? I surely lost track
of this completely, so my memory might trick me.

The real question is what you do when the host has #AC enabled and the
guest 'disabled' it and triggers #AC. Is that going to be silently ignored
or is the intention to kill the guest in the same way as we kill userspace?

The latter would be the right thing, but given the fact that the current
kernels easily trigger #AC today, that would cause a major wreckage in
hosting scenarios. So I fear we need to bite the bullet and have a knob
which defaults to 'handle silently' and allows to enable the kill mechanics
on purpose. 'Handle silently' needs some logging of course, at least a per
guest counter which can be queried and a tracepoint.
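
Roughly (sketch only: the parameter name, the stat field and the
tracepoint are all made up here, none of this is in the posted patches):

static bool __read_mostly kill_guest_on_split_lock;
module_param(kill_guest_on_split_lock, bool, 0644);

static int handle_guest_split_lock_ac(struct kvm_vcpu *vcpu)
{
	if (kill_guest_on_split_lock) {
		/* Opt-in kill mechanics: hand the vCPU back to userspace. */
		vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
		return 0;
	}

	/* Default: handle silently, but leave something for the admin. */
	++vcpu->stat.split_lock_exits;			/* made-up counter */
	trace_kvm_split_lock(kvm_rip_read(vcpu));	/* made-up tracepoint */

	/*
	 * Something must also keep the retried instruction from faulting
	 * again, e.g. temporarily clearing the MSR bit on this CPU.
	 */
	return 1;
}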

>   - KVM advertises split-lock detection to guest/userspace if and only if
>     split_lock_detect_disabled is zero.

Assuming that the host kernel is clean, fine. If the sysadmin disables it
after boot and after starting guests, it's his problem.

>   - Add a pr_warn_once() in KVM that triggers if split locks are disabled
>     after support has been advertised to a guest.

The pr_warn() is more or less redundant, but no strong opinion here.

> The question at the forefront of my mind is: why not have the #AC handler
> send a fire-and-forget IPI to online CPUs to disable split-lock detection
> on all CPUs?  Would the IPI be problematic?  Globally disabling split-lock
> on any #AC would (marginally) simplify the code and would eliminate the
> oddity of userspace process (and KVM guest) #AC behavior varying based on
> the physical CPU it's running on.

I'm fine with the IPI under the assumption that the kernel should never
trigger it at all in production.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-09-25 18:09         ` Sean Christopherson
  2019-10-16  6:58           ` Xiaoyao Li
  2019-10-16  9:29           ` Thomas Gleixner
@ 2019-10-16  9:40           ` Paolo Bonzini
  2019-10-16  9:47             ` Thomas Gleixner
  2 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-16  9:40 UTC (permalink / raw)
  To: Sean Christopherson, Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Radim Krcmar,
	Ashok Raj, Tony Luck, Dan Williams, Xiaoyao Li,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 25/09/19 20:09, Sean Christopherson wrote:
> We're trying to sort out the trainwreck, but there's an additional wrinkle
> that I'd like your input on.

That's not exactly a wrinkle...

>   - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
>     actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
>     guest can do WRMSR and handle its own #AC faults, but KVM doesn't
>     change the value in hardware.
> 
>       * Allowing guest to enable split-lock detection can induce #AC on
>         the host after it has been explicitly turned off, e.g. the sibling
>         hyperthread hits an #AC in the host kernel, or worse, causes a
>         different process in the host to SIGBUS.
> 
>       * Allowing guest to disable split-lock detection opens up the host
>         to DoS attacks.
> 
>   - KVM advertises split-lock detection to guest/userspace if and only if
>     split_lock_detect_disabled is zero.
> 
>   - Add a pr_warn_once() in KVM that triggers if split locks are disabled
>     after support has been advertised to a guest.
> 
> Does this sound sane?

Not really, unfortunately.  Just never advertise split-lock detection to
guests.  If the host has enabled split-lock detection, trap #AC and
forward it to the host handler---which would disable split lock
detection globally and reenter the guest.

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16  9:40           ` Paolo Bonzini
@ 2019-10-16  9:47             ` Thomas Gleixner
  2019-10-16 10:16               ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-10-16  9:47 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams, Xiaoyao Li,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, 16 Oct 2019, Paolo Bonzini wrote:
> On 25/09/19 20:09, Sean Christopherson wrote:
> >   - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
> >     actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
> >     guest can do WRMSR and handle its own #AC faults, but KVM doesn't
> >     change the value in hardware.
> > 
> >       * Allowing guest to enable split-lock detection can induce #AC on
> >         the host after it has been explicitly turned off, e.g. the sibling
> >         hyperthread hits an #AC in the host kernel, or worse, causes a
> >         different process in the host to SIGBUS.
> > 
> >       * Allowing guest to disable split-lock detection opens up the host
> >         to DoS attacks.
> > 
> >   - KVM advertises split-lock detection to guest/userspace if and only if
> >     split_lock_detect_disabled is zero.
> > 
> >   - Add a pr_warn_once() in KVM that triggers if split locks are disabled
> >     after support has been advertised to a guest.
> > 
> > Does this sound sane?
> 
> Not really, unfortunately.  Just never advertise split-lock detection to
> guests.  If the host has enabled split-lock detection, trap #AC and
> forward it to the host handler---which would disable split lock
> detection globally and reenter the guest.

Which completely defeats the purpose.

1) Sane guest

   Guest kernel has #AC handler and you basically prevent it from detecting
   malicious user space and killing it. You also prevent #AC detection in
   the guest kernel which limits debugability.

2) Malicious guest

   Trigger #AC to disable the host detection and then carry out the DoS
   attack.

Try again.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16  9:47             ` Thomas Gleixner
@ 2019-10-16 10:16               ` Paolo Bonzini
  2019-10-16 11:23                 ` Xiaoyao Li
  2019-10-16 11:49                 ` [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock Thomas Gleixner
  0 siblings, 2 replies; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-16 10:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams, Xiaoyao Li,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 16/10/19 11:47, Thomas Gleixner wrote:
> On Wed, 16 Oct 2019, Paolo Bonzini wrote:
>> Just never advertise split-lock
>> detection to guests.  If the host has enabled split-lock detection,
>> trap #AC and forward it to the host handler---which would disable
>> split lock detection globally and reenter the guest.
> 
> Which completely defeats the purpose.

Yes it does.  But Sean's proposal, as I understand it, leads to the
guest receiving #AC when it wasn't expecting one.  So for an old guest,
as soon as the guest kernel happens to do a split lock, it gets an
unexpected #AC and crashes and burns.  And then, after much googling and
gnashing of teeth, people proceed to disable split lock detection.

(Old guests are the common case: you're a cloud provider and your
customers run old stuff; it's a workstation and you want to play that
game that requires an old version of Windows; etc.).

To save them the googling and gnashing of teeth, I guess we can do a
pr_warn_ratelimited on the first split lock encountered by a guest.  (It
has to be ratelimited because userspace could create an arbitrary amount
of guests to spam the kernel logs).  But the end result is the same,
split lock detection is disabled by the user.

The first alternative I thought of was:

- Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
  actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
  guest can do WRMSR and handle its own #AC faults, but KVM doesn't
  change the value in hardware.

- trap #AC if the guest encounters a split lock while detection is
  disabled, and then disable split-lock detection in the host.

But I discarded it because it still doesn't do anything for malicious
guests, which can trigger #AC as they prefer.  And it makes things
_worse_ for sane guests, because they think split-lock detection is
enabled but they become vulnerable as soon as there is only one
malicious guest on the same machine.

In all of these cases, the common final result is that split-lock
detection is disabled on the host.  So might as well go with the
simplest one and not pretend to virtualize something that (without core
scheduling) is obviously not virtualizable.

Thanks,

Paolo

> 1) Sane guest
> 
> Guest kernel has #AC handler and you basically prevent it from
> detecting malicious user space and killing it. You also prevent #AC
> detection in the guest kernel which limits debugability.
> 
> 2) Malicious guest
> 
> Trigger #AC to disable the host detection and then carry out the DoS 
> attack.



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 10:16               ` Paolo Bonzini
@ 2019-10-16 11:23                 ` Xiaoyao Li
  2019-10-16 11:26                   ` Paolo Bonzini
  2019-10-16 11:49                 ` [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock Thomas Gleixner
  1 sibling, 1 reply; 85+ messages in thread
From: Xiaoyao Li @ 2019-10-16 11:23 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 10/16/2019 6:16 PM, Paolo Bonzini wrote:
> On 16/10/19 11:47, Thomas Gleixner wrote:
>> On Wed, 16 Oct 2019, Paolo Bonzini wrote:
>>> Just never advertise split-lock
>>> detection to guests.  If the host has enabled split-lock detection,
>>> trap #AC and forward it to the host handler---which would disable
>>> split lock detection globally and reenter the guest.
>>
>> Which completely defeats the purpose.
> 
> Yes it does.  But Sean's proposal, as I understand it, leads to the
> guest receiving #AC when it wasn't expecting one.  So for an old guest,
> as soon as the guest kernel happens to do a split lock, it gets an
> unexpected #AC and crashes and burns.  And then, after much googling and
> gnashing of teeth, people proceed to disable split lock detection.
> 
> (Old guests are the common case: you're a cloud provider and your
> customers run old stuff; it's a workstation and you want to play that
> game that requires an old version of Windows; etc.).
> 
> To save them the googling and gnashing of teeth, I guess we can do a
> pr_warn_ratelimited on the first split lock encountered by a guest.  (It
> has to be ratelimited because userspace could create an arbitrary amount
> of guests to spam the kernel logs).  But the end result is the same,
> split lock detection is disabled by the user.
> 
> The first alternative I thought of was:
> 
> - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
>    actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
>    guest can do WRMSR and handle its own #AC faults, but KVM doesn't
>    change the value in hardware.
> 
> - trap #AC if the guest encounters a split lock while detection is
>    disabled, and then disable split-lock detection in the host.
> 
> But I discarded it because it still doesn't do anything for malicious
> guests, which can trigger #AC as they prefer.  And it makes things
> _worse_ for sane guests, because they think split-lock detection is
> enabled but they become vulnerable as soon as there is only one
> malicious guest on the same machine.
> 
> In all of these cases, the common final result is that split-lock
> detection is disabled on the host.  So might as well go with the
> simplest one and not pretend to virtualize something that (without core
> scheduling) is obviously not virtualizable.

Right, the core-scoped nature of MSR_TEST_CTRL makes it hard or
impossible to virtualize.

- Making old guests survive requires disabling split-lock detection in
the host (hardware).
- Defending against malicious guests requires enabling split-lock
detection in the host (hardware).

We cannot achieve both at the same time.

In my opinion, letting KVM disable split-lock detection in the host is
not acceptable, since it just opens the door for malicious guests to
attack. I think we can use Sean's proposal, as below.

KVM always traps #AC, and only advertises split-lock detection to the
guest when the global variable split_lock_detection_enabled in the host
is true.

- If the guest enables #AC (CPL3 alignment check or split-lock
detection enabled), inject the #AC back into the guest, since it is
supposedly capable of handling it.
- If the guest doesn't enable #AC, KVM reports the #AC to userspace
(like other unexpected exceptions), and we can print a hint in the
kernel or let userspace (e.g., QEMU) tell the user that the guest was
killed because of a split lock in the guest. (A rough sketch of this
decision follows below.)
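
Roughly, that decision could look like the sketch below (illustration
only; guest_test_ctrl_sld_on() is a made-up helper for the guest's
emulated MSR_TEST_CTRL value, and error_code is the #AC error code
taken from the VMCS exit information):

static bool guest_expects_ac(struct kvm_vcpu *vcpu)
{
	/* Guest opted in to split-lock #AC via its emulated MSR. */
	if (guest_test_ctrl_sld_on(vcpu))
		return true;

	/* Legacy alignment check: CR0.AM=1, EFLAGS.AC=1 and CPL=3. */
	return vmx_get_cpl(vcpu) == 3 &&
	       kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
}

/* Inside the #AC intercept: */
static int handle_guest_ac(struct kvm_vcpu *vcpu, u32 error_code)
{
	if (guest_expects_ac(vcpu)) {
		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
		return 1;	/* inject back and resume the guest */
	}

	/* Otherwise report it to userspace like other unexpected faults. */
	vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
	return 0;
}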

In this way, malicious guests always get killed by userspace, and old
sane guests cannot survive either if they cause a split lock. If we do
want old sane guests to work, we have to disable split-lock detection
(through a boot parameter or debugfs) in the host, just as we would to
run an old, split-lock generating userspace binary.

But there is an issue: we advertise split-lock detection to the guest
based on split_lock_detection_enabled being true in the host, and that
can turn false dynamically when a split lock happens in the host
kernel. This means the guest's capability changes at run time, and I
don't know if there is a better way to inform the guest. Maybe we need
a pv interface?

> Thanks,
> 
> Paolo
> 
>> 1) Sane guest
>>
>> Guest kernel has #AC handler and you basically prevent it from
>> detecting malicious user space and killing it. You also prevent #AC
>> detection in the guest kernel which limits debugability.
>>
>> 2) Malicious guest
>>
>> Trigger #AC to disable the host detection and then carry out the DoS
>> attack.
> 
> 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 11:23                 ` Xiaoyao Li
@ 2019-10-16 11:26                   ` Paolo Bonzini
  2019-10-16 13:13                     ` Xiaoyao Li
  0 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-16 11:26 UTC (permalink / raw)
  To: Xiaoyao Li, Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 16/10/19 13:23, Xiaoyao Li wrote:
> KVM always traps #AC, and only advertises split-lock detection to guest
> when the global variable split_lock_detection_enabled in host is true.
> 
> - If guest enables #AC (CPL3 alignment check or split-lock detection
> enabled), injecting #AC back into guest since it's supposed capable of
> handling it.
> - If guest doesn't enable #AC, KVM reports #AC to userspace (like other
> unexpected exceptions), and we can print a hint in kernel, or let
> userspace (e.g., QEMU) tell the user guest is killed because there is a
> split-lock in guest.
> 
> In this way, malicious guests always get killed by userspace and old
> sane guests cannot survive as well if it causes split-lock. If we do
> want old sane guests work we have to disable the split-lock detection
> (through booting parameter or debugfs) in the host just the same as we
> want to run an old and split-lock generating userspace binary.

Old guests are prevalent enough that enabling split-lock detection by
default would be a big usability issue.  And even ignoring that, you
would get the issue you describe below:

> But there is an issue that we advertise split-lock detection to guest
> based on the value of split_lock_detection_enabled to be true in host,
> which can be turned into false dynamically when split-lock happens in
> host kernel.

... which means that supposedly safe guests become unsafe, and that is bad.

> This causes guest's capability changes at run time and I
> don't if there is a better way to inform guest? Maybe we need a pv
> interface?

Even a PV interface would not change the basic fact that a supposedly
safe configuration becomes unsafe.

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 10:16               ` Paolo Bonzini
  2019-10-16 11:23                 ` Xiaoyao Li
@ 2019-10-16 11:49                 ` Thomas Gleixner
  2019-10-16 11:58                   ` Paolo Bonzini
  1 sibling, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-10-16 11:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams, Xiaoyao Li,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, 16 Oct 2019, Paolo Bonzini wrote:
> On 16/10/19 11:47, Thomas Gleixner wrote:
> > On Wed, 16 Oct 2019, Paolo Bonzini wrote:
> >> Just never advertise split-lock
> >> detection to guests.  If the host has enabled split-lock detection,
> >> trap #AC and forward it to the host handler---which would disable
> >> split lock detection globally and reenter the guest.
> > 
> > Which completely defeats the purpose.
> 
> Yes it does.  But Sean's proposal, as I understand it, leads to the
> guest receiving #AC when it wasn't expecting one.  So for an old guest,
> as soon as the guest kernel happens to do a split lock, it gets an
> unexpected #AC and crashes and burns.  And then, after much googling and
> gnashing of teeth, people proceed to disable split lock detection.

I don't think that this was what he suggested/intended.

> In all of these cases, the common final result is that split-lock
> detection is disabled on the host.  So might as well go with the
> simplest one and not pretend to virtualize something that (without core
> scheduling) is obviously not virtualizable.

You are completely ignoring any argument here and just leave it behind your
signature (instead of trimming your reply).

> > 1) Sane guest
> > 
> > Guest kernel has #AC handler and you basically prevent it from
> > detecting malicious user space and killing it. You also prevent #AC
> > detection in the guest kernel which limits debugability.

That's a perfectly fine situation. Host has #AC enabled and exposes the
availability of #AC to the guest. Guest kernel has a proper handler and
does the right thing. So the host _CAN_ forward #AC to the guest and let it
deal with it. For that to work you need to expose the MSR so you know the
guest state in the host.

Your lazy 'solution' just renders #AC completely useless even for
debugging.

> > 2) Malicious guest
> > 
> > Trigger #AC to disable the host detection and then carry out the DoS 
> > attack.

With your proposal you render #AC useless even on hosts which have SMT
disabled, which is just wrong. There are enough good reasons to disable
SMT.

I agree that with SMT enabled the situation is truly bad, but we surely can
be smarter than just disabling it globally unconditionally and forever.

Plus we want a knob which treats guests triggering #AC in the same way as
we treat user space, i.e. kill them with SIGBUS.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 11:49                 ` [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock Thomas Gleixner
@ 2019-10-16 11:58                   ` Paolo Bonzini
  2019-10-16 13:51                     ` Xiaoyao Li
  0 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-16 11:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams, Xiaoyao Li,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 16/10/19 13:49, Thomas Gleixner wrote:
> On Wed, 16 Oct 2019, Paolo Bonzini wrote:
>> Yes it does.  But Sean's proposal, as I understand it, leads to the
>> guest receiving #AC when it wasn't expecting one.  So for an old guest,
>> as soon as the guest kernel happens to do a split lock, it gets an
>> unexpected #AC and crashes and burns.  And then, after much googling and
>> gnashing of teeth, people proceed to disable split lock detection.
> 
> I don't think that this was what he suggested/intended.

Xiaoyao's reply suggests that he also understood it like that.

>> In all of these cases, the common final result is that split-lock
>> detection is disabled on the host.  So might as well go with the
>> simplest one and not pretend to virtualize something that (without core
>> scheduling) is obviously not virtualizable.
> 
> You are completely ignoring any argument here and just leave it behind your
> signature (instead of trimming your reply).

I am not ignoring them, I think there is no doubt that this is the
intended behavior.  I disagree that Sean's patches achieve it, however.

>>> 1) Sane guest
>>>
>>> Guest kernel has #AC handler and you basically prevent it from
>>> detecting malicious user space and killing it. You also prevent #AC
>>> detection in the guest kernel which limits debugability.
> 
> That's a perfectly fine situation. Host has #AC enabled and exposes the
> availability of #AC to the guest. Guest kernel has a proper handler and
> does the right thing. So the host _CAN_ forward #AC to the guest and let it
> deal with it. For that to work you need to expose the MSR so you know the
> guest state in the host.
> 
> Your lazy 'solution' just renders #AC completely useless even for
> debugging.
> 
>>> 2) Malicious guest
>>>
>>> Trigger #AC to disable the host detection and then carry out the DoS 
>>> attack.
> 
> With your proposal you render #AC useless even on hosts which have SMT
> disabled, which is just wrong. There are enough good reasons to disable
> SMT.

My lazy "solution" only applies to SMT enabled.  When SMT is either not
supported, or disabled as in "nosmt=force", we can virtualize it like
the posted patches have done so far.

> I agree that with SMT enabled the situation is truly bad, but we surely can
> be smarter than just disabling it globally unconditionally and forever.
> 
> Plus we want a knob which treats guests triggering #AC in the same way as
> we treat user space, i.e. kill them with SIGBUS.

Yes, that's a valid alternative.  But if SMT is possible, I think the
only sane possibilities are global disable and SIGBUS.  SIGBUS (or
better, a new KVM_RUN exit code) can be acceptable for debugging guests too.

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 11:26                   ` Paolo Bonzini
@ 2019-10-16 13:13                     ` Xiaoyao Li
  2019-10-16 14:43                       ` Thomas Gleixner
  0 siblings, 1 reply; 85+ messages in thread
From: Xiaoyao Li @ 2019-10-16 13:13 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 10/16/2019 7:26 PM, Paolo Bonzini wrote:
> On 16/10/19 13:23, Xiaoyao Li wrote:
>> KVM always traps #AC, and only advertises split-lock detection to guest
>> when the global variable split_lock_detection_enabled in host is true.
>>
>> - If guest enables #AC (CPL3 alignment check or split-lock detection
>> enabled), injecting #AC back into guest since it's supposed capable of
>> handling it.
>> - If guest doesn't enable #AC, KVM reports #AC to userspace (like other
>> unexpected exceptions), and we can print a hint in kernel, or let
>> userspace (e.g., QEMU) tell the user guest is killed because there is a
>> split-lock in guest.
>>
>> In this way, malicious guests always get killed by userspace and old
>> sane guests cannot survive as well if it causes split-lock. If we do
>> want old sane guests work we have to disable the split-lock detection
>> (through booting parameter or debugfs) in the host just the same as we
>> want to run an old and split-lock generating userspace binary.
> 
> Old guests are prevalent enough that enabling split-lock detection by
> default would be a big usability issue.  And even ignoring that, you
> would get the issue you describe below:

Right, the decision whether to enable split-lock detection is made by
the administrator. The administrator is supposed to know the
consequences of enabling it. Enabling it means no split lock is wanted
in userspace, and of course VMM software is covered by that as well.

>> But there is an issue that we advertise split-lock detection to guest
>> based on the value of split_lock_detection_enabled to be true in host,
>> which can be turned into false dynamically when split-lock happens in
>> host kernel.
> 
> ... which means that supposedly safe guests become unsafe, and that is bad.
> 
>> This causes guest's capability changes at run time and I
>> don't if there is a better way to inform guest? Maybe we need a pv
>> interface?
> 
> Even a PV interface would not change the basic fact that a supposedly
> safe configuration becomes unsafe.

I don't follow what you mean by unsafe?

If the host disables split-lock detection dynamically, then
MSR_TEST_CTRL.split_lock is cleared in the hardware, and we can use the
PV interface to notify the guest, so that the guest knows it has lost
the split-lock detection capability. In this case, I think safety is
meaningless for both host and guest.

> Paolo
> 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 11:58                   ` Paolo Bonzini
@ 2019-10-16 13:51                     ` Xiaoyao Li
  2019-10-16 14:08                       ` Paolo Bonzini
  2019-10-16 14:50                       ` Thomas Gleixner
  0 siblings, 2 replies; 85+ messages in thread
From: Xiaoyao Li @ 2019-10-16 13:51 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 10/16/2019 7:58 PM, Paolo Bonzini wrote:
> On 16/10/19 13:49, Thomas Gleixner wrote:
>> On Wed, 16 Oct 2019, Paolo Bonzini wrote:
>>> Yes it does.  But Sean's proposal, as I understand it, leads to the
>>> guest receiving #AC when it wasn't expecting one.  So for an old guest,
>>> as soon as the guest kernel happens to do a split lock, it gets an
>>> unexpected #AC and crashes and burns.  And then, after much googling and
>>> gnashing of teeth, people proceed to disable split lock detection.
>>
>> I don't think that this was what he suggested/intended.
> 
> Xiaoyao's reply suggests that he also understood it like that.
>

Actually, what I replied is a little different from what you stated
above: the guest won't receive an #AC when it wasn't expecting one;
instead, userspace receives this #AC.

>>> In all of these cases, the common final result is that split-lock
>>> detection is disabled on the host.  So might as well go with the
>>> simplest one and not pretend to virtualize something that (without core
>>> scheduling) is obviously not virtualizable.
>>
>> You are completely ignoring any argument here and just leave it behind your
>> signature (instead of trimming your reply).
> 
> I am not ignoring them, I think there is no doubt that this is the
> intended behavior.  I disagree that Sean's patches achieve it, however.
> 
>>>> 1) Sane guest
>>>>
>>>> Guest kernel has #AC handler and you basically prevent it from
>>>> detecting malicious user space and killing it. You also prevent #AC
>>>> detection in the guest kernel which limits debugability.
>>
>> That's a perfectly fine situation. Host has #AC enabled and exposes the
>> availability of #AC to the guest. Guest kernel has a proper handler and
>> does the right thing. So the host _CAN_ forward #AC to the guest and let it
>> deal with it. For that to work you need to expose the MSR so you know the
>> guest state in the host.
>>
>> Your lazy 'solution' just renders #AC completely useless even for
>> debugging.
>>
>>>> 2) Malicious guest
>>>>
>>>> Trigger #AC to disable the host detection and then carry out the DoS
>>>> attack.
>>
>> With your proposal you render #AC useless even on hosts which have SMT
>> disabled, which is just wrong. There are enough good reasons to disable
>> SMT.
> 
> My lazy "solution" only applies to SMT enabled.  When SMT is either not
> supported, or disabled as in "nosmt=force", we can virtualize it like
> the posted patches have done so far.
> 

Do we really need to divide it into two cases of SMT enabled and SMT 
disabled?

>> I agree that with SMT enabled the situation is truly bad, but we surely can
>> be smarter than just disabling it globally unconditionally and forever.
>>
>> Plus we want a knob which treats guests triggering #AC in the same way as
>> we treat user space, i.e. kill them with SIGBUS.
> 
> Yes, that's a valid alternative.  But if SMT is possible, I think the
> only sane possibilities are global disable and SIGBUS.  SIGBUS (or
> better, a new KVM_RUN exit code) can be acceptable for debugging guests too.

If we SIGBUS, why do we need to disable globally?

When there is an #AC due to a split lock in the guest, KVM only has the
two choices below:
1) inject it back into the guest.
    - If KVM advertises this feature to the guest, the guest kernel is
recent, and the guest kernel enables it too, this is the happy case
where the guest can handle it for its own purposes.
    - In any other case, the guest gets an unexpected #AC and crashes.
2) report it to userspace (much the same as a SIGBUS, I think)

So for simplicity, we can do what Paolo suggested: don't advertise
this feature, and report the #AC to userspace when an #AC due to a
split lock occurs in the guest, *but* never disable the host's
split-lock detection because of a guest's split lock.

> Paolo
> 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 13:51                     ` Xiaoyao Li
@ 2019-10-16 14:08                       ` Paolo Bonzini
  2019-10-16 14:14                         ` David Laight
  2019-10-16 15:41                         ` Sean Christopherson
  2019-10-16 14:50                       ` Thomas Gleixner
  1 sibling, 2 replies; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-16 14:08 UTC (permalink / raw)
  To: Xiaoyao Li, Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 16/10/19 15:51, Xiaoyao Li wrote:
> On 10/16/2019 7:58 PM, Paolo Bonzini wrote:
>> On 16/10/19 13:49, Thomas Gleixner wrote:
>>> On Wed, 16 Oct 2019, Paolo Bonzini wrote:
>>>> Yes it does.  But Sean's proposal, as I understand it, leads to the
>>>> guest receiving #AC when it wasn't expecting one.  So for an old guest,
>>>> as soon as the guest kernel happens to do a split lock, it gets an
>>>> unexpected #AC and crashes and burns.  And then, after much googling
>>>> and
>>>> gnashing of teeth, people proceed to disable split lock detection.
>>>
>>> I don't think that this was what he suggested/intended.
>>
>> Xiaoyao's reply suggests that he also understood it like that.
> 
> Actually, what I replied is a little different from what you stated
> above that guest won't receive #AC when it wasn't expecting one but the
> userspace receives this #AC.

Okay---but userspace has no choice but to crash the guest, which is okay
for debugging but, most likely, undesirable behavior in production.

>>> With your proposal you render #AC useless even on hosts which have SMT
>>> disabled, which is just wrong. There are enough good reasons to disable
>>> SMT.
>>
>> My lazy "solution" only applies to SMT enabled.  When SMT is either not
>> supported, or disabled as in "nosmt=force", we can virtualize it like
>> the posted patches have done so far.
> 
> Do we really need to divide it into two cases of SMT enabled and SMT
> disabled?

Yes, absolutely.  Because in one case MSR_TEST_CTRL behaves sanely, in
the other it doesn't.

>> Yes, that's a valid alternative.  But if SMT is possible, I think the
>> only sane possibilities are global disable and SIGBUS.  SIGBUS (or
>> better, a new KVM_RUN exit code) can be acceptable for debugging
>> guests too.
> 
> If SIGBUS, why need to globally disable?

SIGBUS (actually a new KVM_EXIT_INTERNAL_ERROR result from KVM_RUN is
better, but that's the idea) is for when you're debugging guests.
Global disable (or alternatively, disable SMT) is for production use.

> When there is an #AC due to split-lock in guest, KVM only has below two
> choices:
> 1) inject back into guest.
>    - If kvm advertise this feature to guest, and guest kernel is latest,
> and guest kernel must enable it too. It's the happy case that guest can
> handler it on its own purpose.
>    - Any other cases, guest get an unexpected #AC and crash.
> 2) report to userspace (I think the same like a SIGBUS)
> 
> So for simplicity, we can do what Paolo suggested that don't advertise
> this feature and report #AC to userspace when an #AC due to split-lock
> in guest *but* we never disable the host's split-lock detection due to
> guest's split-lock.

This is one possibility, but it must be opt-in.  Either you make split
lock detection opt-in in the host (and then a userspace exit is okay),
or you make split lock detection opt-in for KVM (and then #AC causes a
global disable of split-lock detection on the host).

Breaking all old guests with the default options is not a valid choice.

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 14:08                       ` Paolo Bonzini
@ 2019-10-16 14:14                         ` David Laight
  2019-10-16 15:03                           ` Thomas Gleixner
  2019-10-16 15:41                         ` Sean Christopherson
  1 sibling, 1 reply; 85+ messages in thread
From: David Laight @ 2019-10-16 14:14 UTC (permalink / raw)
  To: 'Paolo Bonzini', Xiaoyao Li, Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

For the SMT case, can you make #AC enable a property of the process,
and then disable it on the core if either SMT sibling's process requires
it to be disabled?

This would mean that in a 'mixed environment' not all split accesses
would actually generate #AC - but enough would to detect broken code
that doesn't have #AC explicitly disabled.

I'm not sure you'd want a guest to flip #AC enable based on the process
it is scheduling, but it might work for the bare metal scheduler.
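
Something like this, as a sketch only (TIF_NO_SPLIT_LOCK_AC and the
helpers below are made-up names, not anything from the posted patches):

	/* Context-switch path: the core-wide #AC enable has to take the SMT
	 * sibling's current task into account as well. */
	static void update_core_ac_enable(struct task_struct *next)
	{
		bool off = test_tsk_thread_flag(next, TIF_NO_SPLIT_LOCK_AC) ||
			   sibling_task_wants_ac_off();

		if (off)
			split_lock_ac_disable_this_core();
		else
			split_lock_ac_enable_this_core();
	}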

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 13:13                     ` Xiaoyao Li
@ 2019-10-16 14:43                       ` Thomas Gleixner
  2019-10-16 15:37                         ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-10-16 14:43 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, 16 Oct 2019, Xiaoyao Li wrote:
> On 10/16/2019 7:26 PM, Paolo Bonzini wrote:
> > Old guests are prevalent enough that enabling split-lock detection by
> > default would be a big usability issue.  And even ignoring that, you
> > would get the issue you describe below:
> 
> Right, whether to enable split-lock detection is the administrator's
> decision, and the administrator is supposed to know the consequences of
> enabling it. Enabling it means no split lock is wanted in userspace, and
> of course the VMM software is under the administrator's control.

I have no idea what you are talking about, but the whole thing is trivial
enough to describe in a decision matrix:

N | #AC       | #AC enabled | SMT | Ctrl    | Guest | Action
R | available | on host     |     | exposed | #AC   |
--|-----------|-------------|-----|---------|-------|---------------------
  |           |             |     |         |       |
0 | N         |     x       |  x  |   N     |   x   | None
  |           |             |     |         |       |
1 | Y         |     N       |  x  |   N     |   x   | None
  |           |             |     |         |       |
2 | Y         |     Y       |  x  |   Y     |   Y   | Forward to guest
  |           |             |     |         |       |
3 | Y         |     Y       |  N  |   Y     |   N   | A) Store in vCPU and
  |           |             |     |         |       |    toggle on VMENTER/EXIT
  |           |             |     |         |       |
  |           |             |     |         |       | B) SIGBUS or KVM exit code
  |           |             |     |         |       |
4 | Y         |     Y       |  Y  |   Y     |   N   | A) Disable globally on
  |           |             |     |         |       |    host. Store in vCPU/guest
  |           |             |     |         |       |    state and evtl. reenable
  |           |             |     |         |       |    when guest goes away.
  |           |             |     |         |       | 
  |           |             |     |         |       | B) SIGBUS or KVM exit code

  [234] need proper accounting and tracepoints in KVM

  [34]  need a policy decision in KVM

Now there are two possible state transitions:

 #AC enabled on host during runtime

   Existing guests are not notified. Nothing changes.


 #AC disabled on host during runtime

   That only affects state #2 from the above table and there are two
   possible solutions:

     1) Do nothing.

     2) Issue a notification to the guest. This would be doable at least
     	for Linux guests because any guest kernel which handles #AC is
	at least the same generation as the host which added #AC.

   	Whether it's worth it, I don't know, but it makes sense at least
	for consistency reasons.

     For a first step I'd go for 'Do nothing'

SMT state transitions could be handled in a similar way, but I don't think
it's worth the trouble. The above should cover everything at least on a
best effort basis.
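
In KVM terms the matrix boils down to something like this sketch (all the
sld_*/guest_* names and the vCPU field are made up; the A/B policy choice
for rows 3/4 is a separate knob):

static int handle_guest_split_lock_ac(struct kvm_vcpu *vcpu)
{
	/* Rows 0/1: without the feature, or with it off on the host, no
	 * split-lock #AC ever reaches KVM in the first place. */

	if (guest_can_handle_split_lock_ac(vcpu)) {
		/* Row 2: control exposed and the guest enabled it. */
		kvm_queue_exception(vcpu, AC_VECTOR);
		return 1;
	}

	if (!sched_smt_active()) {
		/* Row 3A: remember per vCPU, toggle on VMENTER/VMEXIT. */
		vcpu->arch.split_lock_ac_off = true;
		return 1;
	}

	/* Row 4A: SMT on, no sane per-vCPU toggle; disable on the host. */
	split_lock_detect_disable_global();
	return 1;
}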

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 13:51                     ` Xiaoyao Li
  2019-10-16 14:08                       ` Paolo Bonzini
@ 2019-10-16 14:50                       ` Thomas Gleixner
  1 sibling, 0 replies; 85+ messages in thread
From: Thomas Gleixner @ 2019-10-16 14:50 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, 16 Oct 2019, Xiaoyao Li wrote:
> On 10/16/2019 7:58 PM, Paolo Bonzini wrote:
> > > With your proposal you render #AC useless even on hosts which have SMT
> > > disabled, which is just wrong. There are enough good reasons to disable
> > > SMT.
> > 
> > My lazy "solution" only applies to SMT enabled.  When SMT is either not
> > supported, or disabled as in "nosmt=force", we can virtualize it like
> > the posted patches have done so far.
> > 
> 
> Do we really need to divide it into two cases of SMT enabled and SMT disabled?

Yes. See the matrix I just sent.

> > > I agree that with SMT enabled the situation is truly bad, but we surely
> > > can
> > > be smarter than just disabling it globally unconditionally and forever.
> > > 
> > > Plus we want a knob which treats guests triggering #AC in the same way as
> > > we treat user space, i.e. kill them with SIGBUS.
> > 
> > Yes, that's a valid alternative.  But if SMT is possible, I think the
> > only sane possibilities are global disable and SIGBUS.  SIGBUS (or
> > better, a new KVM_RUN exit code) can be acceptable for debugging guests too.
> 
> If SIGBUS, why need to globally disable?

See the matrix I just sent.

> When there is an #AC due to split-lock in guest, KVM only has below two
> choices:
> 1) inject back into guest.
>    - If kvm advertise this feature to guest, and guest kernel is latest, and
> guest kernel must enable it too. It's the happy case that guest can handler it
> on its own purpose.
>    - Any other cases, guest get an unexpected #AC and crash.

That's just wrong for obvious reasons.

> 2) report to userspace (I think the same like a SIGBUS)

No. What guarantees that userspace qemu handles the SIGBUS sanely?

> So for simplicity, we can do what Paolo suggested that don't advertise this
> feature and report #AC to userspace when an #AC due to split-lock in guest
> *but* we never disable the host's split-lock detection due to guest's
> split-lock.

No, you can't.

Guess what happens when you just boot some existing guest on a #AC enabled
host without having updated qemu to handle the exit code/SIGBUS.

It simply will crash and burn in nonsensical ways. Same as reinjecting it
into the guest and letting it crash.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 14:14                         ` David Laight
@ 2019-10-16 15:03                           ` Thomas Gleixner
  0 siblings, 0 replies; 85+ messages in thread
From: Thomas Gleixner @ 2019-10-16 15:03 UTC (permalink / raw)
  To: David Laight
  Cc: 'Paolo Bonzini',
	Xiaoyao Li, Sean Christopherson, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, 16 Oct 2019, David Laight wrote:

> For the smt case, can you make #AC enable a property of the process?
> Then disable it on the core if either smt process requires it be disabled?

That would be feasible if the logic of the TEST_CTRL_MSR were AND, but
it's OR.

Thread0	#AC-EN	Thread1 #AC-EN	#AC enabled on core
	0		0		0
	1		0		1
	0		1		1
	1		1		1

So in order to do flips on VMENTER you'd need to IPI the other thread and
handle all the interesting corner cases.

The 'Rescue SMT' mitigation stuff on top of core scheduling is ugly enough
already, but there the state can be transitionally 'unmitigated' while with
#AC you run into trouble immediately if the transitional state is ON at the
wrong point.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 14:43                       ` Thomas Gleixner
@ 2019-10-16 15:37                         ` Paolo Bonzini
  2019-10-16 16:25                           ` Xiaoyao Li
  2019-10-17 12:29                           ` [RFD] x86/split_lock: Request to Intel Thomas Gleixner
  0 siblings, 2 replies; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-16 15:37 UTC (permalink / raw)
  To: Thomas Gleixner, Xiaoyao Li
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 16/10/19 16:43, Thomas Gleixner wrote:
> 
> N | #AC       | #AC enabled | SMT | Ctrl    | Guest | Action
> R | available | on host     |     | exposed | #AC   |
> --|-----------|-------------|-----|---------|-------|---------------------
>   |           |             |     |         |       |
> 0 | N         |     x       |  x  |   N     |   x   | None
>   |           |             |     |         |       |
> 1 | Y         |     N       |  x  |   N     |   x   | None

So far so good.

> 2 | Y         |     Y       |  x  |   Y     |   Y   | Forward to guest
>
> 3 | Y         |     Y       |  N  |   Y     |   N   | A) Store in vCPU and
>   |           |             |     |         |       |    toggle on VMENTER/EXIT
>   |           |             |     |         |       |
>   |           |             |     |         |       | B) SIGBUS or KVM exit code

(2) is problematic for the SMT=y case, because of what happens when #AC 
is disabled on the host---safe guests can start to be susceptible to 
DoS.

For (3), which is the SMT=n case, the behavior is the same independent of
guest #AC.

So I would change these two lines to:

  2 | Y         |     Y       |  Y  |   N     |   x   | On first guest #AC,
    |           |             |     |         |       | disable globally on host.
    |           |             |     |         |       |
  3 | Y         |     Y       |  N  |   Y     |   x   | Switch MSR_TEST_CTRL on
    |           |             |     |         |       | enter/exit, plus:
    |           |             |     |         |       | A) #AC forwarded to guest.
    |           |             |     |         |       | B) SIGBUS or KVM exit code

> 4 | Y         |     Y       |  Y  |   Y     |   N   | A) Disable globally on
>   |           |             |     |         |       |    host. Store in vCPU/guest
>   |           |             |     |         |       |    state and evtl. reenable
>   |           |             |     |         |       |    when guest goes away.
>   |           |             |     |         |       |
>   |           |             |     |         |       | B) SIGBUS or KVM exit code

Also okay.  And finally:

  5 | Y         |     Y       |  Y  |   Y     |   Y   | Forward to guest

> Now there are a two possible state transitions:

>  #AC enabled on host during runtime
> 
>    Existing guests are not notified. Nothing changes.

Switches from (1) to (2) or (4) and (5).  Ugly for (2) and (4A), in that
split-lock detection might end up being forcibly disabled on the host, but
guests do not notice anything.  Okay for (4B) and (5).

>  #AC disabled on host during runtime

Switches from (2), (4) and (5) to (1).  Bad for (4A) and (5), in that
guests might miss #ACs from userspace.  No problem for (2), okay for (4B)
since the host admin decision affects KVM userspace but not KVM guests.

Because (4A) and (5) are problematic, and (4B) can cause guests to halt
irrecoverably on guest #AC, I'd prefer the control not to be
exposed by default.  In KVM API terms:

- KVM_GET_SUPPORTED_CPUID should *not* return the new CPUID bit and
KVM_GET_MSR_INDEX_LIST should not return MSR_TEST_CTRL.  A separate
capability can be queried with KVM_CHECK_EXTENSION to determine whether
KVM support for split-lock detection is available.  The default behavior
will be (2).

- we only need to pick one of (3A)/(4A) and (3B)/(4B).  (3A) should definitely
be the default, probably (4A) too.  But if both are implemented, the
aforementioned capability can be used with KVM_ENABLE_CAP to switch from
one behavior to the other.
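
As an illustration of that shape (KVM_CAP_SPLIT_LOCK_DETECT is a
placeholder name, not an existing capability; KVM_CHECK_EXTENSION and
KVM_ENABLE_CAP are the existing ioctls), userspace probing and opting in
could look like:

#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

#define KVM_CAP_SPLIT_LOCK_DETECT 1000	/* placeholder number */

/* Returns 0 if the exit-to-userspace behavior was enabled, -1 if the
 * capability is absent and the default behavior (2) applies. */
int enable_split_lock_exits(int vm_fd)
{
	struct kvm_enable_cap cap;

	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SPLIT_LOCK_DETECT) <= 0)
		return -1;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_SPLIT_LOCK_DETECT;
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}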

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 14:08                       ` Paolo Bonzini
  2019-10-16 14:14                         ` David Laight
@ 2019-10-16 15:41                         ` Sean Christopherson
  2019-10-16 15:43                           ` Paolo Bonzini
  1 sibling, 1 reply; 85+ messages in thread
From: Sean Christopherson @ 2019-10-16 15:41 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Thomas Gleixner, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, Oct 16, 2019 at 04:08:14PM +0200, Paolo Bonzini wrote:
> SIGBUS (actually a new KVM_EXIT_INTERNAL_ERROR result from KVM_RUN is
> better, but that's the idea) is for when you're debugging guests.
> Global disable (or alternatively, disable SMT) is for production use.

Alternatively, for guests without split-lock #AC enabled, what if KVM were
to emulate the faulting instruction with split-lock detection temporarily
disabled?

The emulator can presumably handle all such lock instructions, and an
unhandled instruction would naturally exit to userspace.

The latency of VM-Enter+VM-Exit should be enough to guard against DoS from
a malicious guest.  KVM could also artificially rate-limit a guest that is
generating copious amounts of split-lock #ACs.
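
Something along these lines, as a sketch (the sld_*_this_core() helpers
are hypothetical; the rate limiting is left out):

static int handle_split_lock_ac_in_guest(struct kvm_vcpu *vcpu)
{
	int ret;

	/* Window in which the whole core (and thus the sibling) runs
	 * without split-lock protection. */
	sld_disable_this_core();
	ret = kvm_emulate_instruction(vcpu, 0);
	sld_enable_this_core();

	/* ret == 0: emulation failed, fall back to userspace. */
	return ret;
}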

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 15:41                         ` Sean Christopherson
@ 2019-10-16 15:43                           ` Paolo Bonzini
  2019-10-16 16:23                             ` Sean Christopherson
  0 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-16 15:43 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Thomas Gleixner, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 16/10/19 17:41, Sean Christopherson wrote:
> On Wed, Oct 16, 2019 at 04:08:14PM +0200, Paolo Bonzini wrote:
>> SIGBUS (actually a new KVM_EXIT_INTERNAL_ERROR result from KVM_RUN is
>> better, but that's the idea) is for when you're debugging guests.
>> Global disable (or alternatively, disable SMT) is for production use.
> 
> Alternatively, for guests without split-lock #AC enabled, what if KVM were
> to emulate the faulting instruction with split-lock detection temporarily
> disabled?

Yes we can get fancy, but remember that KVM is not yet supporting
emulation of locked instructions.  Adding it is possible but shouldn't
be in the critical path for the whole feature.

How would you disable split-lock detection temporarily?  Just tweak
MSR_TEST_CTRL for the time of running the one instruction, and cross
fingers that the sibling doesn't notice?

Thanks,

Paolo

> The emulator can presumably handle all such lock instructions, and an
> unhandled instruction would naturally exit to userspace.
> 
> The latency of VM-Enter+VM-Exit should be enough to guard against DoS from
> a malicious guest.  KVM could also artificially rate-limit a guest that is
> generating copious amounts of split-lock #ACs.
> 


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16  9:29           ` Thomas Gleixner
@ 2019-10-16 15:59             ` Sean Christopherson
  0 siblings, 0 replies; 85+ messages in thread
From: Sean Christopherson @ 2019-10-16 15:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Fenghua Yu, Ingo Molnar, Borislav Petkov, H Peter Anvin,
	Peter Zijlstra, Andrew Morton, Dave Hansen, Paolo Bonzini,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams, Xiaoyao Li,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, Oct 16, 2019 at 11:29:00AM +0200, Thomas Gleixner wrote:
> >   - Modify the #AC handler to test/set the same atomic variable as the
> >     sysfs knob.  This is the "disabled by kernel" flow.
> 
> That's the #AC in kernel handler, right?

Yes.

> >   - Modify the debugfs/sysfs knob to only allow disabling split-lock
> >     detection.  This is the "disabled globally" path, i.e. sends IPIs to
> >     clear MSR_TEST_CTRL.split_lock on all online CPUs.
> 
> Why only disable? What's wrong with reenabling it? The shiny new driver you
> are working on is triggering #AC. So in order to test the fix, you need to
> reboot the machine instead of just unloading the module, reenabling #AC and
> then loading the fixed one?

A re-enabling path adds complexity (though not much) and is undesirable
for a production environment as a split-lock issue in the kernel isn't
going to magically disappear.  And I thought that disable-only was also
your preferred implementation based on a previous comment[*], but that
comment may have been purely in the scope of userspace applications.

Anyways, my personal preference would be to keep things simple and not
support a re-enabling path.  But then again, I do 99.9% of my development
in VMs so my vote probably shouldn't count regarding the module issue.

[*] https://lkml.kernel.org/r/alpine.DEB.2.21.1904180832290.3174@nanos.tec.linutronix.de

> >   - Modify the resume/init flow to clear MSR_TEST_CTRL.split_lock if it's
> >     been disabled on *any* CPU via #AC or via the knob.
> 
> Fine.
> 
> >   - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
> >     actual MSR_TEST_CTRL.  KVM still emulates MSR_TEST_CTRL so that the
> >     guest can do WRMSR and handle its own #AC faults, but KVM doesn't
> >     change the value in hardware.
> > 
> >       * Allowing guest to enable split-lock detection can induce #AC on
> >         the host after it has been explicitly turned off, e.g. the sibling
> >         hyperthread hits an #AC in the host kernel, or worse, causes a
> >         different process in the host to SIGBUS.
> >
> >       * Allowing guest to disable split-lock detection opens up the host
> >         to DoS attacks.
> 
> Wasn't this discussed before and agreed on that if the host has AC enabled
> that the guest should not be able to force disable it? I surely lost track
> of this completely so my memory might trick me.

Yes, I was restating that point, or at least attempting to.
 
> The real question is what you do when the host has #AC enabled and the
> guest 'disabled' it and triggers #AC. Is that going to be silently ignored
> or is the intention to kill the guest in the same way as we kill userspace?
> 
> The latter would be the right thing, but given the fact that the current
> kernels easily trigger #AC today, that would cause a major wreckage in
> hosting scenarios. So I fear we need to bite the bullet and have a knob
> which defaults to 'handle silently' and allows to enable the kill mechanics
> on purpose. 'Handle silently' needs some logging of course, at least a per
> guest counter which can be queried and a tracepoint.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 15:43                           ` Paolo Bonzini
@ 2019-10-16 16:23                             ` Sean Christopherson
  2019-10-16 17:42                               ` Sean Christopherson
  2019-10-21 13:02                               ` Paolo Bonzini
  0 siblings, 2 replies; 85+ messages in thread
From: Sean Christopherson @ 2019-10-16 16:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Thomas Gleixner, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, Oct 16, 2019 at 05:43:53PM +0200, Paolo Bonzini wrote:
> On 16/10/19 17:41, Sean Christopherson wrote:
> > On Wed, Oct 16, 2019 at 04:08:14PM +0200, Paolo Bonzini wrote:
> >> SIGBUS (actually a new KVM_EXIT_INTERNAL_ERROR result from KVM_RUN is
> >> better, but that's the idea) is for when you're debugging guests.
> >> Global disable (or alternatively, disable SMT) is for production use.
> > 
> > Alternatively, for guests without split-lock #AC enabled, what if KVM were
> > to emulate the faulting instruction with split-lock detection temporarily
> > disabled?
> 
> Yes we can get fancy, but remember that KVM is not yet supporting
> emulation of locked instructions.  Adding it is possible but shouldn't
> be in the critical path for the whole feature.

Ah, didn't realize that.  I'm surprised emulating all locks with cmpxchg
doesn't cause problems (or am I misreading the code?).  Assuming I'm
reading the code correctly, the #AC path could kick all other vCPUs on
emulation failure and then retry emulation to "guarantee" success.  Though
that's starting to build quite the house of cards.

> How would you disable split-lock detection temporarily?  Just tweak
> MSR_TEST_CTRL for the time of running the one instruction, and cross
> fingers that the sibling doesn't notice?

Tweak MSR_TEST_CTRL, with logic to handle the scenario where split-lock
detection is globally disabled during emulation (so KVM doesn't
inadvertently re-enable it).

There isn't much for the sibling to notice.  The kernel would temporarily
allow split-locks on the sibling, but that's a performance issue and isn't
directly fatal.  A missed #AC in the host kernel would only delay the
inevitable global disabling of split-lock.  A missed #AC in userspace would
again just delay the inevitable SIGBUS.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 15:37                         ` Paolo Bonzini
@ 2019-10-16 16:25                           ` Xiaoyao Li
  2019-10-16 16:38                             ` Paolo Bonzini
  2019-10-17 12:29                           ` [RFD] x86/split_lock: Request to Intel Thomas Gleixner
  1 sibling, 1 reply; 85+ messages in thread
From: Xiaoyao Li @ 2019-10-16 16:25 UTC (permalink / raw)
  To: Paolo Bonzini, Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 10/16/2019 11:37 PM, Paolo Bonzini wrote:
> On 16/10/19 16:43, Thomas Gleixner wrote:
>>
>> N | #AC       | #AC enabled | SMT | Ctrl    | Guest | Action
>> R | available | on host     |     | exposed | #AC   |
>> --|-----------|-------------|-----|---------|-------|---------------------
>>    |           |             |     |         |       |
>> 0 | N         |     x       |  x  |   N     |   x   | None
>>    |           |             |     |         |       |
>> 1 | Y         |     N       |  x  |   N     |   x   | None
> 
> So far so good.
> 
>> 2 | Y         |     Y       |  x  |   Y     |   Y   | Forward to guest
>>
>> 3 | Y         |     Y       |  N  |   Y     |   N   | A) Store in vCPU and
>>    |           |             |     |         |       |    toggle on VMENTER/EXIT
>>    |           |             |     |         |       |
>>    |           |             |     |         |       | B) SIGBUS or KVM exit code
> 
> (2) is problematic for the SMT=y case, because of what happens when #AC
> is disabled on the host---safe guests can start to be susceptible to
> DoS.
> 
> For (3), which is the SMT=n case,, the behavior is the same independent of
> guest #AC.
> 
> So I would change these two lines to:
> 
>    2 | Y         |     Y       |  Y  |   N     |   x   | On first guest #AC,
>      |           |             |     |         |       | disable globally on host.
>      |           |             |     |         |       |
>    3 | Y         |     Y       |  N  |   Y     |   x   | Switch MSR_TEST_CTRL on
>      |           |             |     |         |       | enter/exit, plus:
>      |           |             |     |         |       | A) #AC forwarded to guest.
>      |           |             |     |         |       | B) SIGBUS or KVM exit code
>

I just want to confirm that in (3) we should split into 2 cases:

a) If the host has it enabled, do we still apply the constraint that the
guest has it forcibly enabled, so we don't switch MSR_TEST_CTRL?

b) If the host has it disabled, we can switch MSR_TEST_CTRL on enter/exit.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 16:25                           ` Xiaoyao Li
@ 2019-10-16 16:38                             ` Paolo Bonzini
  0 siblings, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-16 16:38 UTC (permalink / raw)
  To: Xiaoyao Li, Thomas Gleixner
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 16/10/19 18:25, Xiaoyao Li wrote:
>>
>>    3 | Y         |     Y       |  N  |   Y     |   x   | Switch MSR_TEST_CTRL on
>>      |           |             |     |         |       | enter/exit, plus:
>>      |           |             |     |         |       | A) #AC forwarded to guest.
>>      |           |             |     |         |       | B) SIGBUS or KVM exit code
>>
> 
> I just want to get confirmed that in (3), we should split into 2 case:
> 
> a) if host has it enabled, still apply the constraint that guest is
> forcibly enabled? so we don't switch MSR_TEST_CTL.
> 
> b) if host has it disabled, we can switch MSR_TEST_CTL on enter/exit.

That's doable, yes.

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 16:23                             ` Sean Christopherson
@ 2019-10-16 17:42                               ` Sean Christopherson
  2019-10-17  1:23                                 ` Xiaoyao Li
  2019-10-21 13:03                                 ` Paolo Bonzini
  2019-10-21 13:02                               ` Paolo Bonzini
  1 sibling, 2 replies; 85+ messages in thread
From: Sean Christopherson @ 2019-10-16 17:42 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Thomas Gleixner, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Wed, Oct 16, 2019 at 09:23:37AM -0700, Sean Christopherson wrote:
> On Wed, Oct 16, 2019 at 05:43:53PM +0200, Paolo Bonzini wrote:
> > On 16/10/19 17:41, Sean Christopherson wrote:
> > > On Wed, Oct 16, 2019 at 04:08:14PM +0200, Paolo Bonzini wrote:
> > >> SIGBUS (actually a new KVM_EXIT_INTERNAL_ERROR result from KVM_RUN is
> > >> better, but that's the idea) is for when you're debugging guests.
> > >> Global disable (or alternatively, disable SMT) is for production use.
> > > 
> > > Alternatively, for guests without split-lock #AC enabled, what if KVM were
> > > to emulate the faulting instruction with split-lock detection temporarily
> > > disabled?
> > 
> > Yes we can get fancy, but remember that KVM is not yet supporting
> > emulation of locked instructions.  Adding it is possible but shouldn't
> > be in the critical path for the whole feature.
> 
> Ah, didn't realize that.  I'm surprised emulating all locks with cmpxchg
> doesn't cause problems (or am I misreading the code?).  Assuming I'm
> reading the code correctly, the #AC path could kick all other vCPUS on
> emulation failure and then retry emulation to "guarantee" success.  Though
> that's starting to build quite the house of cards.

Ugh, doesn't the existing emulation behavior create another KVM issue?
KVM uses a locked cmpxchg in emulator_cmpxchg_emulated() and the address
is guest controlled, e.g. a guest could coerce the host into disabling
split-lock detection via the host's #AC handler by triggering emulation
and inducing an #AC in the emulator.

> > How would you disable split-lock detection temporarily?  Just tweak
> > MSR_TEST_CTRL for the time of running the one instruction, and cross
> > fingers that the sibling doesn't notice?
> 
> Tweak MSR_TEST_CTRL, with logic to handle the scenario where split-lock
> detection is globally disable during emulation (so KVM doesn't
> inadvertantly re-enable it).
> 
> There isn't much for the sibling to notice.  The kernel would temporarily
> allow split-locks on the sibling, but that's a performance issue and isn't
> directly fatal.  A missed #AC in the host kernel would only delay the
> inevitable global disabling of split-lock.  A missed #AC in userspace would
> again just delay the inevitable SIGBUS.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 17:42                               ` Sean Christopherson
@ 2019-10-17  1:23                                 ` Xiaoyao Li
  2019-10-21 13:06                                   ` Paolo Bonzini
  2019-10-21 13:03                                 ` Paolo Bonzini
  1 sibling, 1 reply; 85+ messages in thread
From: Xiaoyao Li @ 2019-10-17  1:23 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Thomas Gleixner, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 10/17/2019 1:42 AM, Sean Christopherson wrote:
> On Wed, Oct 16, 2019 at 09:23:37AM -0700, Sean Christopherson wrote:
>> On Wed, Oct 16, 2019 at 05:43:53PM +0200, Paolo Bonzini wrote:
>>> On 16/10/19 17:41, Sean Christopherson wrote:
>>>> On Wed, Oct 16, 2019 at 04:08:14PM +0200, Paolo Bonzini wrote:
>>>>> SIGBUS (actually a new KVM_EXIT_INTERNAL_ERROR result from KVM_RUN is
>>>>> better, but that's the idea) is for when you're debugging guests.
>>>>> Global disable (or alternatively, disable SMT) is for production use.
>>>>
>>>> Alternatively, for guests without split-lock #AC enabled, what if KVM were
>>>> to emulate the faulting instruction with split-lock detection temporarily
>>>> disabled?
>>>
>>> Yes we can get fancy, but remember that KVM is not yet supporting
>>> emulation of locked instructions.  Adding it is possible but shouldn't
>>> be in the critical path for the whole feature.
>>
>> Ah, didn't realize that.  I'm surprised emulating all locks with cmpxchg
>> doesn't cause problems (or am I misreading the code?).  Assuming I'm
>> reading the code correctly, the #AC path could kick all other vCPUS on
>> emulation failure and then retry emulation to "guarantee" success.  Though
>> that's starting to build quite the house of cards.
> 
> Ugh, doesn't the existing emulation behavior create another KVM issue?
> KVM uses a locked cmpxchg in emulator_cmpxchg_emulated() and the address
> is guest controlled, e.g. a guest could coerce the host into disabling
> split-lock detection via the host's #AC handler by triggering emulation
> and inducing an #AC in the emulator.
>

Exactly right.

I have tested with force_emulation_prefix. It did go into the #AC
handler and disabled split-lock detection on the host.

However, without force_emulation_prefix enabled, I'm not sure whether a
malicious guest can create a case that causes emulation of an instruction
with a lock prefix and reaches emulator_cmpxchg_emulated(). I could not
construct such a case myself, but I'm not familiar with emulation at all,
so if I missed something, please let me know.

>>> How would you disable split-lock detection temporarily?  Just tweak
>>> MSR_TEST_CTRL for the time of running the one instruction, and cross
>>> fingers that the sibling doesn't notice?
>>
>> Tweak MSR_TEST_CTRL, with logic to handle the scenario where split-lock
>> detection is globally disable during emulation (so KVM doesn't
>> inadvertantly re-enable it).
>>
>> There isn't much for the sibling to notice.  The kernel would temporarily
>> allow split-locks on the sibling, but that's a performance issue and isn't
>> directly fatal.  A missed #AC in the host kernel would only delay the
>> inevitable global disabling of split-lock.  A missed #AC in userspace would
>> again just delay the inevitable SIGBUS.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [RFD] x86/split_lock: Request to Intel
  2019-10-16 15:37                         ` Paolo Bonzini
  2019-10-16 16:25                           ` Xiaoyao Li
@ 2019-10-17 12:29                           ` Thomas Gleixner
  2019-10-17 17:23                             ` Sean Christopherson
                                               ` (2 more replies)
  1 sibling, 3 replies; 85+ messages in thread
From: Thomas Gleixner @ 2019-10-17 12:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Xiaoyao Li, Sean Christopherson, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

The more I look at this trainwreck, the less interested I am in merging any
of this at all.

The fact that it took Intel more than a year to figure out that the MSR is
per core and not per thread is yet another proof that this industry just
works by pure chance.

There is a simple way out of this misery:

  Intel issues a microcode update which does:

    1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
       AND logic, i.e. when one thread disables AC it's automatically
       disabled on the core.

       Alternatively it suppresses the #AC when the current thread has it
       disabled.

    2) Provide a separate bit which indicates that the AC enable logic is
       actually AND based or that #AC is suppressed when the current thread
       has it disabled.

    Which way I don't really care as long as it makes sense.

If that's not going to happen, then we just bury the whole thing and put it
on hold until a sane implementation of that functionality surfaces in
silicon some day in the not so foreseeable future.

Seriously, this makes only sense when it's by default enabled and not
rendered useless by VIRT. Otherwise we never get any reports and none of
the issues are going to be fixed.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFD] x86/split_lock: Request to Intel
  2019-10-17 12:29                           ` [RFD] x86/split_lock: Request to Intel Thomas Gleixner
@ 2019-10-17 17:23                             ` Sean Christopherson
  2019-10-17 21:31                               ` Thomas Gleixner
  2019-10-17 23:28                             ` Luck, Tony
  2019-10-18  2:36                             ` Xiaoyao Li
  2 siblings, 1 reply; 85+ messages in thread
From: Sean Christopherson @ 2019-10-17 17:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paolo Bonzini, Xiaoyao Li, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote:
> The more I look at this trainwreck, the less interested I am in merging any
> of this at all.
> 
> The fact that it took Intel more than a year to figure out that the MSR is
> per core and not per thread is yet another proof that this industry just
> works by pure chance.
> 
> There is a simple way out of this misery:
> 
>   Intel issues a microcode update which does:
> 
>     1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
>        AND logic, i.e. when one thread disables AC it's automatically
>        disabled on the core.
> 
>        Alternatively it supresses the #AC when the current thread has it
>        disabled.
> 
>     2) Provide a separate bit which indicates that the AC enable logic is
>        actually AND based or that #AC is supressed when the current thread
>        has it disabled.
> 
>     Which way I don't really care as long as it makes sense.

The #AC bit doesn't use OR-logic, it's straight up shared, i.e. writes on
one CPU are immediately visible on its sibling CPU.  It doesn't magically
solve the problem, but I don't think we need IPIs to coordinate between
siblings, e.g. wouldn't something like this work?  The per-cpu things
being pointers that are shared by siblings.

void split_lock_disable(void)
{
        spinlock_t *ac_lock = this_cpu_ptr(split_lock_ac_lock);

	spin_lock(ac_lock);
        if (this_cpu_inc_return(*split_lock_ac_disabled) == 1)
                WRMSR(RDMSR() & ~bit);
        spin_unlock(ac_lock);
}

void split_lock_enable(void)
{
        spinlock_t *ac_lock = this_cpu_ptr(split_lock_ac_lock);

	spin_lock(ac_lock);
        if (this_cpu_dec_return(*split_lock_ac_disabled) == 0)
                WRMSR(RDMSR() | bit);
        spin_unlock(ac_lock);
}


To avoid the spin_lock and WRMSR latency on every VM-Enter and VM-Exit,
actions (3a) and (4a) from your matrix (copied below) could be changed to
only do split_lock_disable() if the guest actually generates an #AC, and
then do split_lock_enable() on the next VM-Exit.  Assuming even legacy
guests are somewhat sane and rarely do split-locks, lazily disabling the
control would eliminate most of the overhead and would also reduce the
time that the sibling CPU is running in the host without #AC protection.
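
Roughly, as a sketch (the sld_off field and where the hooks live are made
up):

/* #AC exit handler for a guest without split-lock #AC enabled: */
static int handle_legacy_guest_split_lock_ac(struct kvm_vcpu *vcpu)
{
	vcpu->arch.sld_off = true;
	split_lock_disable();		/* the refcounted helper above */
	return 1;			/* resume the guest */
}

/* On the next VM-Exit, restore protection for the host: */
static void restore_split_lock_detect(struct kvm_vcpu *vcpu)
{
	if (vcpu->arch.sld_off) {
		vcpu->arch.sld_off = false;
		split_lock_enable();
	}
}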


N | #AC       | #AC enabled | SMT | Ctrl    | Guest | Action
R | available | on host     |     | exposed | #AC   |
--|-----------|-------------|-----|---------|-------|---------------------
  |           |             |     |         |       |
0 | N         |     x       |  x  |   N     |   x   | None
  |           |             |     |         |       |
1 | Y         |     N       |  x  |   N     |   x   | None
  |           |             |     |         |       |
2 | Y         |     Y       |  x  |   Y     |   Y   | Forward to guest
  |           |             |     |         |       |
3 | Y         |     Y       |  N  |   Y     |   N   | A) Store in vCPU and
  |           |             |     |         |       |    toggle on VMENTER/EXIT
  |           |             |     |         |       |
  |           |             |     |         |       | B) SIGBUS or KVM exit code
  |           |             |     |         |       |
4 | Y         |     Y       |  Y  |   Y     |   N   | A) Disable globally on
  |           |             |     |         |       |    host. Store in vCPU/guest
  |           |             |     |         |       |    state and evtl. reenable
  |           |             |     |         |       |    when guest goes away.
  |           |             |     |         |       | 
  |           |             |     |         |       | B) SIGBUS or KVM exit code


> If that's not going to happen, then we just bury the whole thing and put it
> on hold until a sane implementation of that functionality surfaces in
> silicon some day in the not so foreseeable future.
> 
> Seriously, this makes only sense when it's by default enabled and not
> rendered useless by VIRT. Otherwise we never get any reports and none of
> the issues are going to be fixed.
> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFD] x86/split_lock: Request to Intel
  2019-10-17 17:23                             ` Sean Christopherson
@ 2019-10-17 21:31                               ` Thomas Gleixner
  2019-10-17 23:38                                 ` Sean Christopherson
  0 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-10-17 21:31 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Xiaoyao Li, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Thu, 17 Oct 2019, Sean Christopherson wrote:
> On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote:
> > The more I look at this trainwreck, the less interested I am in merging any
> > of this at all.
> > 
> > The fact that it took Intel more than a year to figure out that the MSR is
> > per core and not per thread is yet another proof that this industry just
> > works by pure chance.
> > 
> > There is a simple way out of this misery:
> > 
> >   Intel issues a microcode update which does:
> > 
> >     1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
> >        AND logic, i.e. when one thread disables AC it's automatically
> >        disabled on the core.
> > 
> >        Alternatively it supresses the #AC when the current thread has it
> >        disabled.
> > 
> >     2) Provide a separate bit which indicates that the AC enable logic is
> >        actually AND based or that #AC is supressed when the current thread
> >        has it disabled.
> > 
> >     Which way I don't really care as long as it makes sense.
> 
> The #AC bit doesn't use OR-logic, it's straight up shared, i.e. writes on
> one CPU are immediately visible on its sibling CPU.

That's less horrible than I read out of your initial explanation.

Thankfully all of this is meticulously documented in the SDM ...

Though it changes the picture radically. The truly shared MSR allows
regular software synchronization without IPIs and without an insane amount
of corner case handling.

So as you pointed out we need a per core state, which is influenced by:

 1) The global enablement switch

 2) Host induced #AC

 3) Guest induced #AC

    A) Guest has #AC handling

    B) Guest has no #AC handling

#1:

   - OFF: #AC is globally disabled

   - ON:  #AC is globally enabled

   - FORCE: same as ON but #AC is enforced on guests

#2:

   If the host triggers an #AC then the #AC has to be force disabled on the
   affected core independent of the state of #1. Nothing we can do about
   that and once the initial wave of #AC issues is fixed this should not
   happen on production systems. That disables #3 even for the #3.A case
   for simplicity sake.

#3:

   A) Guest has #AC handling
    
      #AC is forwarded to the guest. No further action required aside of
      accounting

   B) Guest has no #AC handling

      If #AC triggers the resulting action depends on the state of #1:

      	 - FORCE: Guest is killed with SIGBUS or whatever the virt crowd
	   	  thinks is the appropriate solution

         - ON: #AC triggered state is recorded per vCPU and the MSR is
	   	toggled on VMENTER/VMEXIT in software from that point on.

So the only interesting case is #3.B and #1.state == ON. There you need
serialization of the state and the MSR write between the cores, but only
when the vCPU triggered an #AC. Until then, nothing to do.

vmenter()
{
	if (vcpu->ac_disable)
		this_core_disable_ac();
}

vmexit()
{
	if (vcpu->ac_disable)
		this_core_enable_ac();
}

this_core_dis/enable_ac() takes the global state into account and has the
necessary serialization in place.
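
One possible shape for those helpers, reusing the shared per-core
lock/count idea from earlier in the thread (all names made up, sld_state
being the OFF/ON/FORCE switch from #1):

static void this_core_disable_ac(void)
{
	spinlock_t *lock = this_cpu_ptr(sld_core_lock);	/* shared by siblings */

	spin_lock(lock);
	if (sld_state != SLD_OFF && this_cpu_inc_return(*sld_core_off_count) == 1)
		sld_clear_msr_bit();	/* clears the TEST_CTRL bit for the core */
	spin_unlock(lock);
}

static void this_core_enable_ac(void)
{
	spinlock_t *lock = this_cpu_ptr(sld_core_lock);

	spin_lock(lock);
	if (sld_state != SLD_OFF && this_cpu_dec_return(*sld_core_off_count) == 0)
		sld_set_msr_bit();
	spin_unlock(lock);
}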

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [RFD] x86/split_lock: Request to Intel
  2019-10-17 12:29                           ` [RFD] x86/split_lock: Request to Intel Thomas Gleixner
  2019-10-17 17:23                             ` Sean Christopherson
@ 2019-10-17 23:28                             ` Luck, Tony
  2019-10-18 10:45                               ` David Laight
  2019-10-18  2:36                             ` Xiaoyao Li
  2 siblings, 1 reply; 85+ messages in thread
From: Luck, Tony @ 2019-10-17 23:28 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini
  Cc: Li, Xiaoyao, Christopherson, Sean J, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Hansen, Dave, Radim Krcmar, Raj, Ashok, Williams, Dan J, Prakhya,
	Sai Praneeth, Shankar, Ravi V, linux-kernel, x86, kvm

> If that's not going to happen, then we just bury the whole thing and put it
> on hold until a sane implementation of that functionality surfaces in
> silicon some day in the not so foreseeable future.

We will drop the patches to flip the MSR bits to enable checking.

But we can fix the split lock issues that have already been found in the kernel.

Two strategies:

1) Adjust alignments of arrays passed to set_bit() et al.

2) Fix set_bit() et al. to not issue atomic operations that cross cache line boundaries.

Fenghua had been pursuing option #1 in previous iterations. He found a few
more places with the help of the "grep" patterns suggested by David Laight.
So that path is up to ~8 patches now that do one of:
	+ Change from u32 to u64
	+ Force alignment with a union with a u64
	+ Change to non-atomic (places that didn't need atomic)

Downside of strategy #1 is that people will add new misaligned cases in the
future. So this process has no defined end point.

Strategy #2 began when, looking at the split-lock issue, I saw that with a
constant bit argument set_bit() just does an "ORB" on the affected byte (i.e.
no split lock). Similar for clear_bit() and change_bit(). Changing the code
to also do that for the variable bit case is easy.
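
For illustration, the variable-bit case can be narrowed to the single byte
that contains the bit, so the locked access can never span a cache line.
A sketch of the shape (not the actual patch):

	static __always_inline void set_bit_nosplit(long nr, volatile unsigned long *addr)
	{
		volatile u8 *byte = (volatile u8 *)addr + (nr >> 3);

		/* LOCK_PREFIX comes from <asm/alternative.h>; operating on one
		 * byte keeps the locked access inside a single cache line. */
		asm volatile(LOCK_PREFIX "orb %1,%0"
			     : "+m" (*byte)
			     : "iq" ((u8)(1 << (nr & 7)))
			     : "memory");
	}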

test_and_clr_bit() needs more care, but luckily, we had Peter Anvin nearby
to give us a neat solution.

So strategy #2 is being tried now (and Fenghua will post some patches
soon).

Strategy #2 does increase code size when the bit number argument isn't
a constant. But that isn't the common case (Fenghua is counting and will
give numbers when patches are ready).

So take a look at the option #2 patches when they are posted. If the code
size increase is unacceptable, we can go back to fixing each of the callers
to get alignment right.

-Tony



^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFD] x86/split_lock: Request to Intel
  2019-10-17 21:31                               ` Thomas Gleixner
@ 2019-10-17 23:38                                 ` Sean Christopherson
  0 siblings, 0 replies; 85+ messages in thread
From: Sean Christopherson @ 2019-10-17 23:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paolo Bonzini, Xiaoyao Li, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Thu, Oct 17, 2019 at 11:31:15PM +0200, Thomas Gleixner wrote:
> On Thu, 17 Oct 2019, Sean Christopherson wrote:
> > On Thu, Oct 17, 2019 at 02:29:45PM +0200, Thomas Gleixner wrote:
> > > The more I look at this trainwreck, the less interested I am in merging any
> > > of this at all.
> > > 
> > > The fact that it took Intel more than a year to figure out that the MSR is
> > > per core and not per thread is yet another proof that this industry just
> > > works by pure chance.
> > > 
> > > There is a simple way out of this misery:
> > > 
> > >   Intel issues a microcode update which does:
> > > 
> > >     1) Convert the OR logic of the AC enable bit in the TEST_CTRL MSR to
> > >        AND logic, i.e. when one thread disables AC it's automatically
> > >        disabled on the core.
> > > 
> > >        Alternatively it supresses the #AC when the current thread has it
> > >        disabled.
> > > 
> > >     2) Provide a separate bit which indicates that the AC enable logic is
> > >        actually AND based or that #AC is supressed when the current thread
> > >        has it disabled.
> > > 
> > >     Which way I don't really care as long as it makes sense.
> > 
> > The #AC bit doesn't use OR-logic, it's straight up shared, i.e. writes on
> > one CPU are immediately visible on its sibling CPU.
> 
> That's less horrible than I read out of your initial explanation.
> 
> Thankfully all of this is meticulously documented in the SDM ...

Preaching to the choir on this one...

> Though it changes the picture radically. The truly shared MSR allows
> regular software synchronization without IPIs and without an insane amount
> of corner case handling.
> 
> So as you pointed out we need a per core state, which is influenced by:
> 
>  1) The global enablement switch
> 
>  2) Host induced #AC
> 
>  3) Guest induced #AC
> 
>     A) Guest has #AC handling
> 
>     B) Guest has no #AC handling
> 
> #1:
> 
>    - OFF: #AC is globally disabled
> 
>    - ON:  #AC is globally enabled
> 
>    - FORCE: same as ON but #AC is enforced on guests
> 
> #2:
> 
>    If the host triggers an #AC then the #AC has to be force disabled on the
>    affected core independent of the state of #1. Nothing we can do about
>    that and once the initial wave of #AC issues is fixed this should not
>    happen on production systems. That disables #3 even for the #3.A case
>    for simplicity sake.
> 
> #3:
> 
>    A) Guest has #AC handling
>     
>       #AC is forwarded to the guest. No further action required aside of
>       accounting
> 
>    B) Guest has no #AC handling
> 
>       If #AC triggers the resulting action depends on the state of #1:
> 
>       	 - FORCE: Guest is killed with SIGBUS or whatever the virt crowd
> 	   	  thinks is the appropriate solution
>          - ON: #AC triggered state is recorded per vCPU and the MSR is
> 	   	toggled on VMENTER/VMEXIT in software from that point on.
>
> So the only interesting case is #3.B and #1.state == ON. There you need
> serialization of the state and the MSR write between the cores, but only
> when the vCPU triggered an #AC. Until then, nothing to do.

And "vCPU triggered an #AC" should include an explicit check in KVM's
emulator.

> vmenter()
> {
> 	if (vcpu->ac_disable)
> 		this_core_disable_ac();
> }
> 
> vmexit()
> {
> 	if (vcpu->ac_disable)
> 		this_core_enable_ac();
> }
> 
> this_core_dis/enable_ac() takes the global state into account and has the
> necessary serialization in place.

Overall, looks good to me.  Although Tony's mail makes it obvious we need
to sync internally...

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFD] x86/split_lock: Request to Intel
  2019-10-17 12:29                           ` [RFD] x86/split_lock: Request to Intel Thomas Gleixner
  2019-10-17 17:23                             ` Sean Christopherson
  2019-10-17 23:28                             ` Luck, Tony
@ 2019-10-18  2:36                             ` Xiaoyao Li
  2019-10-18  9:02                               ` Thomas Gleixner
  2 siblings, 1 reply; 85+ messages in thread
From: Xiaoyao Li @ 2019-10-18  2:36 UTC (permalink / raw)
  To: Thomas Gleixner, Paolo Bonzini
  Cc: Sean Christopherson, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 10/17/2019 8:29 PM, Thomas Gleixner wrote:
> The more I look at this trainwreck, the less interested I am in merging any
> of this at all.
> 
> The fact that it took Intel more than a year to figure out that the MSR is
> per core and not per thread is yet another proof that this industry just
> works by pure chance.
> 

Whether it's per-core or per-thread doesn't change much about how we
implement this for the host/native case.

And no matter whether it's per-core or per-thread, we can always do
something for VIRT.

Maybe what matters is the point below.

> Seriously, this makes only sense when it's by default enabled and not
> rendered useless by VIRT. Otherwise we never get any reports and none of
> the issues are going to be fixed.
>

For VIRT, we don't want old guests to be killed due to #AC. But for
native, we don't want VIRT to disable the #AC detection.

I think it's just about the default behavior: whether to disable the
host's #AC detection or to kill the guest (SIGBUS or something else) once
there is a split-lock #AC in the guest.

So we can provide a CONFIG option to set the default behavior and a
module parameter to let KVM set/change the default behavior.
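
For example, a rough sketch of such a module parameter (the names and the
Kconfig default are made up, just to illustrate the shape):

	#include <linux/moduleparam.h>

	/* false: silently disable the host's #AC detection on a guest split lock,
	 * true:  kill the guest (SIGBUS / exit to userspace). */
	static bool sld_kill_guest = IS_ENABLED(CONFIG_KVM_SLD_KILL_GUEST_DEFAULT);
	module_param(sld_kill_guest, bool, 0644);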


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFD] x86/split_lock: Request to Intel
  2019-10-18  2:36                             ` Xiaoyao Li
@ 2019-10-18  9:02                               ` Thomas Gleixner
  2019-10-18 10:20                                 ` Xiaoyao Li
  0 siblings, 1 reply; 85+ messages in thread
From: Thomas Gleixner @ 2019-10-18  9:02 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Paolo Bonzini, Sean Christopherson, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Fri, 18 Oct 2019, Xiaoyao Li wrote:
> On 10/17/2019 8:29 PM, Thomas Gleixner wrote:
> > The more I look at this trainwreck, the less interested I am in merging any
> > of this at all.
> > 
> > The fact that it took Intel more than a year to figure out that the MSR is
> > per core and not per thread is yet another proof that this industry just
> > works by pure chance.
> > 
> 
> Whether it's per-core or per-thread doesn't affect much how we implement for
> host/native.

How useful.

> And also, no matter it's per-core or per-thread, we always can do something in
> VIRT.

It matters a lot. If it would be per thread then we would not have this
discussion at all.

> Maybe what matters is below.
> 
> > Seriously, this makes only sense when it's by default enabled and not
> > rendered useless by VIRT. Otherwise we never get any reports and none of
> > the issues are going to be fixed.
> > 
> 
> For VIRT, it doesn't want old guest to be killed due to #AC. But for native,
> it doesn't want VIRT to disable the #AC detection
> 
> I think it's just about the default behavior that whether to disable the
> host's #AC detection or kill the guest (SIGBUS or something else) once there
> is an split-lock #AC in guest.
> 
> So we can provide CONFIG option to set the default behavior and module
> parameter to let KVM set/change the default behavior.

Care to read through the whole discussion and figure out WHY it's not that
simple?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFD] x86/split_lock: Request to Intel
  2019-10-18  9:02                               ` Thomas Gleixner
@ 2019-10-18 10:20                                 ` Xiaoyao Li
  2019-10-18 10:43                                   ` Peter Zijlstra
  0 siblings, 1 reply; 85+ messages in thread
From: Xiaoyao Li @ 2019-10-18 10:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Paolo Bonzini, Sean Christopherson, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 10/18/2019 5:02 PM, Thomas Gleixner wrote:
> On Fri, 18 Oct 2019, Xiaoyao Li wrote:
>> On 10/17/2019 8:29 PM, Thomas Gleixner wrote:
>>> The more I look at this trainwreck, the less interested I am in merging any
>>> of this at all.
>>>
>>> The fact that it took Intel more than a year to figure out that the MSR is
>>> per core and not per thread is yet another proof that this industry just
>>> works by pure chance.
>>>
>>
>> Whether it's per-core or per-thread doesn't affect much how we implement for
>> host/native.
> 
> How useful.

OK. IIUC, we can agree on the following use model for native:

We enable #AC on all cores/threads to detect split locks.
  - If user space causes #AC, send SIGBUS to it.
  - If the kernel causes #AC, globally disable #AC on all cores/threads,
WARN, and let the kernel keep running. (Disabling #AC only on the thread
that generated it doesn't help, since the buggy kernel code can run on
any thread; hence #AC is disabled on all of them.)

As described above, #AC is either enabled globally or disabled globally,
so whether the control is per-core or per-thread really doesn't matter.
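
(A minimal sketch of that flow in the #AC handler; the function name and
helpers such as split_lock_detect_disable_all() are hypothetical, not the
actual patch:)

#include <linux/sched/signal.h>
#include <linux/ptrace.h>

/* Called from the #AC exception entry point. */
static void handle_ac_split_lock(struct pt_regs *regs)
{
	if (user_mode(regs)) {
		/* Misaligned atomic in user space: kill the offending task. */
		force_sig_fault(SIGBUS, BUS_ADRALN, (void __user *)regs->ip);
		return;
	}

	/*
	 * Split lock in kernel code: warn once, then turn detection off
	 * on all cores/threads, since the buggy code may run anywhere.
	 */
	WARN_ONCE(1, "split lock detected in kernel code at %pS\n",
		  (void *)regs->ip);
	split_lock_detect_disable_all();	/* hypothetical helper */
}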

>> And also, no matter it's per-core or per-thread, we always can do something in
>> VIRT.
> 
> It matters a lot. If it would be per thread then we would not have this
> discussion at all.

Indeed, it's the fact that the control MSR bit is per-core that causes
this discussion. But the per-core scope only makes this feature difficult
or impossible to virtualize.

We could decide not to expose it to the guest to avoid the really bad
case. However, even if we don't expose this feature to the guest and
don't virtualize it, the problem below is still there.

If you think it's not a problem, and it's acceptable to add an option to
let KVM disable the host's #AC detection, we can just do it this way.
Then we can design the virtualization part without any change to the
native design at all.

>> Maybe what matters is below.
>>
>>> Seriously, this makes only sense when it's by default enabled and not
>>> rendered useless by VIRT. Otherwise we never get any reports and none of
>>> the issues are going to be fixed.
>>>
>>
>> For VIRT, it doesn't want old guest to be killed due to #AC. But for native,
>> it doesn't want VIRT to disable the #AC detection
>>
>> I think it's just about the default behavior that whether to disable the
>> host's #AC detection or kill the guest (SIGBUS or something else) once there
>> is an split-lock #AC in guest.
>>
>> So we can provide CONFIG option to set the default behavior and module
>> parameter to let KVM set/change the default behavior.
> 
> Care to read through the whole discussion and figure out WHY it's not that
> simple?
> 
> Thanks,
> 
> 	tglx
> 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [RFD] x86/split_lock: Request to Intel
  2019-10-18 10:20                                 ` Xiaoyao Li
@ 2019-10-18 10:43                                   ` Peter Zijlstra
  0 siblings, 0 replies; 85+ messages in thread
From: Peter Zijlstra @ 2019-10-18 10:43 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: Thomas Gleixner, Paolo Bonzini, Sean Christopherson, Fenghua Yu,
	Ingo Molnar, Borislav Petkov, H Peter Anvin, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On Fri, Oct 18, 2019 at 06:20:44PM +0800, Xiaoyao Li wrote:

> We enable #AC on all cores/threads to detect split lock.
>  -If user space causes #AC, sending SIGBUS to it.
>  -If kernel causes #AC, we globally disable #AC on all cores/threads,
> letting kernel go on working and WARN. (only disabling #AC on the thread
> generates it just doesn't help, since the buggy kernel code is possible to
> run on any threads and thus disabling #AC on all of them)
> 
> As described above, either enabled globally or disabled globally, so whether
> it's per-core or per-thread really doesn't matter

Go back and read the friggin' thread already. A big clue: virt ruins it
(like it tends to do).

^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [RFD] x86/split_lock: Request to Intel
  2019-10-17 23:28                             ` Luck, Tony
@ 2019-10-18 10:45                               ` David Laight
  2019-10-18 21:03                                 ` hpa
  0 siblings, 1 reply; 85+ messages in thread
From: David Laight @ 2019-10-18 10:45 UTC (permalink / raw)
  To: 'Luck, Tony', Thomas Gleixner, Paolo Bonzini
  Cc: Li, Xiaoyao, Christopherson, Sean J, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Hansen, Dave, Radim Krcmar, Raj, Ashok, Williams, Dan J, Prakhya,
	Sai Praneeth, Shankar, Ravi V, linux-kernel, x86, kvm

From: Luck, Tony
> Sent: 18 October 2019 00:28
...
> 2) Fix set_bit() et. al. to not issue atomic operations that cross boundaries.
> 
> Fenghua had been pursuing option #1 in previous iterations. He found a few
> more places with the help of the "grep" patterns suggested by David Laight.
> So that path is up to ~8 patches now that do one of:
> 	+ Change from u32 to u64
> 	+ Force alignment with a union with a u64
> 	+ Change to non-atomic (places that didn't need atomic)
> 
> Downside of strategy #1 is that people will add new misaligned cases in the
> future. So this process has no defined end point.
> 
> Strategy #2 begun when I looked at the split-lock issue I saw that with a
> constant bit argument set_bit() just does a "ORB" on the affected byte (i.e.
> no split lock). Similar for clear_bit() and change_bit(). Changing code to also
> do that for the variable bit case is easy.
> 
> test_and_clr_bit() needs more care, but luckily, we had Peter Anvin nearby
> to give us a neat solution.

Changing the x86-64 bitops to use 32bit memory cycles is trivial
(provided you are willing to accept a limit of 2G bits).

OTOH this only works because x86 is LE.
On any BE system, passing an 'int []' to any of the bit-functions is so terribly
wrong it is unbelievable.

So changing the x86-64 bitops is largely papering over a crack.

In essence any code that casts the argument to any of the bitops functions
is almost certainly badly broken on BE systems.
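
(To make that concrete, here is a small user-space demonstration -- not
kernel code, and assuming a 64-bit build -- of how a long-indexed set_bit()
puts bit 32 into the "wrong" u32 word on BE:)

#include <stdio.h>

/* Generic set_bit() addresses the bitmap in unsigned long units. */
static void set_bit_long(int nr, unsigned long *addr)
{
	addr[nr / (8 * sizeof(long))] |= 1UL << (nr % (8 * sizeof(long)));
}

int main(void)
{
	union {
		unsigned long lw[1];
		unsigned int  w[2];
	} bm = { .w = { 0, 0 } };

	/* A u32-indexed bitmap expects bit 32 to be bit 0 of w[1]. */
	set_bit_long(32, bm.lw);

	/* 64-bit LE prints w[0]=0 w[1]=0x1; 64-bit BE prints w[0]=0x1 w[1]=0. */
	printf("w[0]=%#x w[1]=%#x\n", bm.w[0], bm.w[1]);
	return 0;
}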

The x86 cpu features code is always LE.
It probably ought to have a typedef for a union of long [] and int [].
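
(Roughly the shape of that typedef -- the name is invented here, and
NCAPINTS/NBUGINTS are the existing kernel constants; the tip commits
further down end up solving the same problem with an anonymous union /
__aligned() instead:)

typedef union {
	__u32		caps[NCAPINTS + NBUGINTS];
	unsigned long	align;	/* forces unsigned long alignment */
} x86_capability_t;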

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 85+ messages in thread

* RE: [RFD] x86/split_lock: Request to Intel
  2019-10-18 10:45                               ` David Laight
@ 2019-10-18 21:03                                 ` hpa
  0 siblings, 0 replies; 85+ messages in thread
From: hpa @ 2019-10-18 21:03 UTC (permalink / raw)
  To: David Laight, 'Luck, Tony', Thomas Gleixner, Paolo Bonzini
  Cc: Li, Xiaoyao, Christopherson, Sean J, Yu, Fenghua, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Hansen, Dave,
	Radim Krcmar, Raj, Ashok, Williams, Dan J, Prakhya, Sai Praneeth,
	Shankar, Ravi V, linux-kernel, x86, kvm

On October 18, 2019 3:45:14 AM PDT, David Laight <David.Laight@ACULAB.COM> wrote:
>From: Luck, Tony
>> Sent: 18 October 2019 00:28
>...
>> 2) Fix set_bit() et. al. to not issue atomic operations that cross
>boundaries.
>> 
>> Fenghua had been pursuing option #1 in previous iterations. He found
>a few
>> more places with the help of the "grep" patterns suggested by David
>Laight.
>> So that path is up to ~8 patches now that do one of:
>> 	+ Change from u32 to u64
>> 	+ Force alignment with a union with a u64
>> 	+ Change to non-atomic (places that didn't need atomic)
>> 
>> Downside of strategy #1 is that people will add new misaligned cases
>in the
>> future. So this process has no defined end point.
>> 
>> Strategy #2 begun when I looked at the split-lock issue I saw that
>with a
>> constant bit argument set_bit() just does a "ORB" on the affected
>byte (i.e.
>> no split lock). Similar for clear_bit() and change_bit(). Changing
>code to also
>> do that for the variable bit case is easy.
>> 
>> test_and_clr_bit() needs more care, but luckily, we had Peter Anvin
>nearby
>> to give us a neat solution.
>
>Changing the x86-64 bitops to use 32bit memory cycles is trivial
>(provided you are willing to accept a limit of 2G bits).
>
>OTOH this only works because x86 is LE.
>On any BE systems passing an 'int []' to any of the bit-functions is so
>terribly
>wrong it is unbelievable.
>
>So changing the x86-64 bitops is largely papering over a crack.
>
>In essence any code that casts the argument to any of the bitops
>functions
>is almost certainly badly broken on BE systems.
>
>The x86 cpu features code is always LE.
>It probably ought to have a typedef for a union of long [] and int [].
>
>	David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

One thing I suggested is that we should actually expose the violations at compile time, either by wrapping them in macros using __alignof__ and/or making the kernel compile with -Wcast-align.
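
(One possible shape of such a macro -- purely illustrative, not an existing
kernel interface; it relies on the kernel's BUILD_BUG_ON() and set_bit():)

#include <linux/bitops.h>
#include <linux/build_bug.h>

#define set_bit_checked(nr, addr)					\
({									\
	/* Reject bitmaps whose declared type is not long-aligned. */	\
	BUILD_BUG_ON(__alignof__(*(addr)) < __alignof__(unsigned long));\
	set_bit((nr), (unsigned long *)(addr));				\
})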

On x86 the btsl/btcl/btrl instructions can be used without limiting to 2G bits if the address is computed, the way one does for plain and, or, etc. However, if the real types of the arguments are exposed then it is possible to do better.

Finally, as far as bigendian is concerned: the problem Linux has on bigendian machines is that it tries to use littleendian bitmaps on bigendian machines: on bigendian machines, bit 0 is naturally the MSB. If your reaction is "but that is absurd", then you have just grokked why bigendian is fundamentally broken.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 16:23                             ` Sean Christopherson
  2019-10-16 17:42                               ` Sean Christopherson
@ 2019-10-21 13:02                               ` Paolo Bonzini
  1 sibling, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-21 13:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Thomas Gleixner, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 16/10/19 18:23, Sean Christopherson wrote:
>> Yes we can get fancy, but remember that KVM is not yet supporting
>> emulation of locked instructions.  Adding it is possible but shouldn't
>> be in the critical path for the whole feature.
> Ah, didn't realize that.  I'm surprised emulating all locks with cmpxchg
> doesn't cause problems (or am I misreading the code?).

It would cause problems if something was trying to do crazy stuff such
as locked operations on MMIO registers, or more plausibly (sort of...)
SMP in big real mode on pre-Westmere processors.  I've personally never
seen X86EMUL_CMPXCHG_FAILED happen in the real world.

Paolo

> reading the code correctly, the #AC path could kick all other vCPUS on
> emulation failure and then retry emulation to "guarantee" success.  Though
> that's starting to build quite the house of cards.
> 


^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-16 17:42                               ` Sean Christopherson
  2019-10-17  1:23                                 ` Xiaoyao Li
@ 2019-10-21 13:03                                 ` Paolo Bonzini
  1 sibling, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-21 13:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Xiaoyao Li, Thomas Gleixner, Fenghua Yu, Ingo Molnar,
	Borislav Petkov, H Peter Anvin, Peter Zijlstra, Andrew Morton,
	Dave Hansen, Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 16/10/19 19:42, Sean Christopherson wrote:
> KVM uses a locked cmpxchg in emulator_cmpxchg_emulated() and the address
> is guest controlled, e.g. a guest could coerce the host into disabling
> split-lock detection via the host's #AC handler by triggering emulation
> and inducing an #AC in the emulator.

Yes, that's a possible issue.

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock
  2019-10-17  1:23                                 ` Xiaoyao Li
@ 2019-10-21 13:06                                   ` Paolo Bonzini
  0 siblings, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2019-10-21 13:06 UTC (permalink / raw)
  To: Xiaoyao Li, Sean Christopherson
  Cc: Thomas Gleixner, Fenghua Yu, Ingo Molnar, Borislav Petkov,
	H Peter Anvin, Peter Zijlstra, Andrew Morton, Dave Hansen,
	Radim Krcmar, Ashok Raj, Tony Luck, Dan Williams,
	Sai Praneeth Prakhya, Ravi V Shankar, linux-kernel, x86, kvm

On 17/10/19 03:23, Xiaoyao Li wrote:
> However, without force_emulation_prefix enabled, I'm not sure whether
> malicious guest can create the case causing the emulation with a lock
> prefix and going to the emulator_cmpxchg_emulated().
> I found it impossible without force_emulation_prefix enabled and I'm not
> familiar with emulation at all. If I missed something, please let me know.

It's always possible to invoke the emulator on arbitrary instructions
without FEP:

1) use big real mode on processors without unrestricted mode

2) set up two processors racing between executing an MMIO access, and
rewriting it so that the emulator sees a different instruction

3) a variant of (2) where you rewrite the page tables so that the
processor's iTLB lookup uses a stale translation.  Then the stale
translation can point to an MMIO access, while the emulator sees the
instruction pointed by the current contents of the page tables.

FEP was introduced just to keep the test code clean.

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [tip: x86/cpu] x86/cpu: Align the x86_capability array to size of unsigned long
  2019-09-16 22:39   ` [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long Tony Luck
  2019-09-17  8:29     ` David Laight
@ 2019-11-15 19:26     ` tip-bot2 for Fenghua Yu
  1 sibling, 0 replies; 85+ messages in thread
From: tip-bot2 for Fenghua Yu @ 2019-11-15 19:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David Laight, Thomas Gleixner, Fenghua Yu, Tony Luck,
	Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID:     db8c33f8b5bea59d00ca12dcd6b65d01b1ea98ef
Gitweb:        https://git.kernel.org/tip/db8c33f8b5bea59d00ca12dcd6b65d01b1ea98ef
Author:        Fenghua Yu <fenghua.yu@intel.com>
AuthorDate:    Mon, 16 Sep 2019 15:39:58 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 15 Nov 2019 20:20:33 +01:00

x86/cpu: Align the x86_capability array to size of unsigned long

The x86_capability array in cpuinfo_x86 is of type u32 and thus is
naturally aligned to 4 bytes. But, set_bit() and clear_bit() require the
array to be aligned to size of unsigned long (i.e. 8 bytes on 64-bit
systems).

The array pointer is handed into atomic bit operations. If the access is
not aligned to unsigned long then the atomic bit operations can end up
crossing a cache line boundary, which causes the CPU to do a full bus lock
as it can't lock both cache lines at once. The bus lock operation is heavy
weight and can cause severe performance degradation.

The upcoming #AC split lock detection mechanism will issue warnings for
this kind of access.

Force the alignment of the array to unsigned long. This avoids the massive
code changes which would be required when converting the array data type to
unsigned long.

[ tglx: Rewrote changelog so it contains information WHY this is required ]

Suggested-by: David Laight <David.Laight@aculab.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190916223958.27048-4-tony.luck@intel.com

---
 arch/x86/include/asm/processor.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 6e0a3b4..c073534 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -93,7 +93,15 @@ struct cpuinfo_x86 {
 	__u32			extended_cpuid_level;
 	/* Maximum supported CPUID level, -1=no CPUID: */
 	int			cpuid_level;
-	__u32			x86_capability[NCAPINTS + NBUGINTS];
+	/*
+	 * Align to size of unsigned long because the x86_capability array
+	 * is passed to bitops which require the alignment. Use unnamed
+	 * union to enforce the array is aligned to size of unsigned long.
+	 */
+	union {
+		__u32		x86_capability[NCAPINTS + NBUGINTS];
+		unsigned long	x86_capability_alignment;
+	};
 	char			x86_vendor_id[16];
 	char			x86_model_id[64];
 	/* in KB - valid for CPUS which support this call: */

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [tip: x86/cpu] x86/cpu: Align cpu_caps_cleared and cpu_caps_set to unsigned long
  2019-09-16 22:39   ` [PATCH 1/3] x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long Tony Luck
@ 2019-11-15 19:26     ` tip-bot2 for Fenghua Yu
  0 siblings, 0 replies; 85+ messages in thread
From: tip-bot2 for Fenghua Yu @ 2019-11-15 19:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Fenghua Yu, Tony Luck, Thomas Gleixner, Borislav Petkov,
	Ingo Molnar, Borislav Petkov, linux-kernel

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID:     f6a892ddd53e555362dbf64d31b47fde0f550ec4
Gitweb:        https://git.kernel.org/tip/f6a892ddd53e555362dbf64d31b47fde0f550ec4
Author:        Fenghua Yu <fenghua.yu@intel.com>
AuthorDate:    Mon, 16 Sep 2019 15:39:56 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 15 Nov 2019 20:20:32 +01:00

x86/cpu: Align cpu_caps_cleared and cpu_caps_set to unsigned long

cpu_caps_cleared[] and cpu_caps_set[] are arrays of type u32 and therefore
naturally aligned to 4 bytes, which is also unsigned long aligned on
32-bit, but not on 64-bit.

The array pointer is handed into atomic bit operations. If the access not
aligned to unsigned long then the atomic bit operations can end up crossing
a cache line boundary, which causes the CPU to do a full bus lock as it
can't lock both cache lines at once. The bus lock operation is heavy weight
and can cause severe performance degradation.

The upcoming #AC split lock detection mechanism will issue warnings for
this kind of access.

Force the alignment of these arrays to unsigned long. This avoids the
massive code changes which would be required when converting the array data
type to unsigned long.

[ tglx: Rewrote changelog ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20190916223958.27048-2-tony.luck@intel.com

---
 arch/x86/kernel/cpu/common.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9ae7d1b..1e9430b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -565,8 +565,9 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
 	return NULL;		/* Not found */
 }
 
-__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS];
-__u32 cpu_caps_set[NCAPINTS + NBUGINTS];
+/* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
+__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
+__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
 
 void load_percpu_segment(int cpu)
 {

^ permalink raw reply related	[flat|nested] 85+ messages in thread

end of thread, other threads:[~2019-11-15 19:26 UTC | newest]

Thread overview: 85+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-18 22:41 [PATCH v9 00/17] x86/split_lock: Enable split lock detection Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 01/17] x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 02/17] drivers/net/b44: Align pwol_mask to unsigned long for better performance Fenghua Yu
2019-06-24 15:12   ` David Laight
2019-06-24 18:43     ` Paolo Bonzini
2019-06-18 22:41 ` [PATCH v9 03/17] x86/split_lock: Align x86_capability to unsigned long to avoid split locked access Fenghua Yu
2019-06-24 15:12   ` David Laight
2019-06-25 23:54     ` Fenghua Yu
2019-06-26 19:15       ` Thomas Gleixner
2019-06-18 22:41 ` [PATCH v9 04/17] x86/msr-index: Define MSR_IA32_CORE_CAP and split lock detection bit Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 05/17] x86/cpufeatures: Enumerate MSR_IA32_CORE_CAP Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 06/17] x86/split_lock: Enumerate split lock detection by MSR_IA32_CORE_CAP Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 07/17] x86/split_lock: Enumerate split lock detection on Icelake mobile processor Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 08/17] x86/split_lock: Define MSR TEST_CTL register Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock Fenghua Yu
2019-06-26 20:20   ` Thomas Gleixner
2019-06-26 20:36     ` Fenghua Yu
2019-06-26 21:47       ` Thomas Gleixner
2019-09-25 18:09         ` Sean Christopherson
2019-10-16  6:58           ` Xiaoyao Li
2019-10-16  9:29           ` Thomas Gleixner
2019-10-16 15:59             ` Sean Christopherson
2019-10-16  9:40           ` Paolo Bonzini
2019-10-16  9:47             ` Thomas Gleixner
2019-10-16 10:16               ` Paolo Bonzini
2019-10-16 11:23                 ` Xiaoyao Li
2019-10-16 11:26                   ` Paolo Bonzini
2019-10-16 13:13                     ` Xiaoyao Li
2019-10-16 14:43                       ` Thomas Gleixner
2019-10-16 15:37                         ` Paolo Bonzini
2019-10-16 16:25                           ` Xiaoyao Li
2019-10-16 16:38                             ` Paolo Bonzini
2019-10-17 12:29                           ` [RFD] x86/split_lock: Request to Intel Thomas Gleixner
2019-10-17 17:23                             ` Sean Christopherson
2019-10-17 21:31                               ` Thomas Gleixner
2019-10-17 23:38                                 ` Sean Christopherson
2019-10-17 23:28                             ` Luck, Tony
2019-10-18 10:45                               ` David Laight
2019-10-18 21:03                                 ` hpa
2019-10-18  2:36                             ` Xiaoyao Li
2019-10-18  9:02                               ` Thomas Gleixner
2019-10-18 10:20                                 ` Xiaoyao Li
2019-10-18 10:43                                   ` Peter Zijlstra
2019-10-16 11:49                 ` [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock Thomas Gleixner
2019-10-16 11:58                   ` Paolo Bonzini
2019-10-16 13:51                     ` Xiaoyao Li
2019-10-16 14:08                       ` Paolo Bonzini
2019-10-16 14:14                         ` David Laight
2019-10-16 15:03                           ` Thomas Gleixner
2019-10-16 15:41                         ` Sean Christopherson
2019-10-16 15:43                           ` Paolo Bonzini
2019-10-16 16:23                             ` Sean Christopherson
2019-10-16 17:42                               ` Sean Christopherson
2019-10-17  1:23                                 ` Xiaoyao Li
2019-10-21 13:06                                   ` Paolo Bonzini
2019-10-21 13:03                                 ` Paolo Bonzini
2019-10-21 13:02                               ` Paolo Bonzini
2019-10-16 14:50                       ` Thomas Gleixner
2019-06-18 22:41 ` [PATCH v9 10/17] kvm/x86: Emulate MSR IA32_CORE_CAPABILITY Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 11/17] kvm/vmx: Emulate MSR TEST_CTL Fenghua Yu
2019-06-27  2:24   ` Xiaoyao Li
2019-06-27  7:12     ` Thomas Gleixner
2019-06-27  7:58       ` Xiaoyao Li
2019-06-27 12:11         ` Thomas Gleixner
2019-06-27 12:22           ` Xiaoyao Li
2019-06-18 22:41 ` [PATCH v9 12/17] x86/split_lock: Enable split lock detection by default Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 13/17] x86/split_lock: Disable split lock detection by kernel parameter "nosplit_lock_detect" Fenghua Yu
2019-06-26 20:34   ` Thomas Gleixner
2019-06-26 20:37     ` Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 14/17] x86/split_lock: Add a debugfs interface to enable/disable split lock detection during run time Fenghua Yu
2019-06-26 21:37   ` Thomas Gleixner
2019-06-18 22:41 ` [PATCH v9 15/17] x86/split_lock: Add documentation for split lock detection interface Fenghua Yu
2019-06-26 21:51   ` Thomas Gleixner
2019-06-18 22:41 ` [PATCH v9 16/17] x86/split_lock: Reorganize few header files in order to call WARN_ON_ONCE() in atomic bit ops Fenghua Yu
2019-06-18 22:41 ` [PATCH v9 17/17] x86/split_lock: Warn on unaligned address in atomic bit operations Fenghua Yu
2019-06-26 22:00   ` Thomas Gleixner
2019-09-16 22:39 ` [PATCH 0/3] Fix some 4-byte vs. 8-byte alignment issues Tony Luck
2019-09-16 22:39   ` [PATCH 1/3] x86/common: Align cpu_caps_cleared and cpu_caps_set to unsigned long Tony Luck
2019-11-15 19:26     ` [tip: x86/cpu] x86/cpu: " tip-bot2 for Fenghua Yu
2019-09-16 22:39   ` [PATCH 2/3] drivers/net/b44: Align pwol_mask to unsigned long for better performance Tony Luck
2019-09-16 22:39   ` [PATCH 3/3] x86/split_lock: Align the x86_capability array to size of unsigned long Tony Luck
2019-09-17  8:29     ` David Laight
2019-09-17 19:14       ` Luck, Tony
2019-09-18  8:54         ` David Laight
2019-11-15 19:26     ` [tip: x86/cpu] x86/cpu: " tip-bot2 for Fenghua Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).