linux-kernel.vger.kernel.org archive mirror
* [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation
@ 2023-10-04 15:29 René Rebe
  2023-10-04 22:25 ` Borislav Petkov
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: René Rebe @ 2023-10-04 15:29 UTC (permalink / raw)
  To: linux-kernel, x86; +Cc: Boris Petkov

Hello everyone,

While cross-compiling our “Embedded” Linux distribution T2 (https://t2sde.org), I have observed random illegal-instruction build errors ever since we got ourselves a Ryzen 7950x on launch day a year ago:

vendor_id : AuthenticAMD
cpu family : 25
model : 97
model name : AMD Ryzen 9 7950X 16-Core Processor
stepping : 2
microcode : 0xa601203

Initially I thought it must surely be some early system instability, and that DDR5 AGESA and microcode updates would eventually take care of it. Month after month passed, and so far no BIOS update has helped. So over the last months I finally started to investigate: I ran the DDR5 memory at base clock, then disabled Precision Boost, and in the end ran the CPU and RAM even below the advertised base clock. The pseudo-random illegal instructions in some gcc instances were still observed, and I started to realize that 99% of them were nearly identical, around gcc user-space address 0xc0e0c0 (sometimes slightly off, like 0xc08aaf):

during GIMPLE pass: switchlower
../src/intel/vulkan/anv_nir_lower_ubo_loads.c: In function 'lower_ubo_load_instr':
../src/intel/vulkan/anv_nir_lower_ubo_loads.c:28:1: internal compiler error: Illegal instruction
   28 | lower_ubo_load_instr(nir_builder *b, nir_instr *instr, UNUSED void *_data)
      | ^~~~~~~~~~~~~~~~~~~~
0x1435c95 internal_error(char const*, ...)
        ???:0
0xc0e0c0 tree_switch_conversion::switch_decision_tree::try_switch_expansion(vec<tree_switch_conversion::cluster*, va_heap, vl_ptr>&)
        ???:0
0xc0eb69 tree_switch_conversion::switch_decision_tree::analyze_switch_statement()
        ???:0

   0x0000000000c0e0b6 <+246>:   cmp    $0xffffffff,%r14d
   0x0000000000c0e0ba <+250>:   je     0xc0e190 <_ZN22tree_switch_conversion20switch_decision_tree20try_switch_expansionER3vecIPNS_7clusterE7va_heap6vl_ptrE+464>
   0x0000000000c0e0c0 <+256>:   mov    %rax,%rcx # <----- HERE -----!
   0x0000000000c0e0c3 <+259>:   cmpb   $0x0,0xa8(%r13)
   0x0000000000c0e0cb <+267>:   jne    0xc0e060 <_ZN22tree_switch_conversion20switch_decision_tree20try_switch_expansionER3vecIPNS_7clusterE7va_heap6vl_ptrE+160>
   0x0000000000c0e0cd <+269>:   mov    0xa0(%r13),%rsi
   0x0000000000c0e0d4 <+276>:   mov    $0x8,%eax

The illegal instructions only occur sometimes, so rebuilding a package is usually successful.
To rule out any software-induced instability, I booted a bit-identical copy of the system (made with rsync) on a Ryzen 5950x and could build the whole system using the identical kernel and gcc binaries without any such spurious illegal instructions.

This appeared to show up mostly with gcc cross-compiled for sparc64, but that should not matter, as this is just generic x86-64(-v1) code; a similar instruction sequence, memory access and I/O pattern could likely appear in any other sufficiently complex user-space program.

Trying to narrow this down further, and to check whether it is just one defective core, I patched the kernel to show the likely CPU. I am not sure this is the most reliable way, but here is the patch:

--- linux-6.5/arch/x86/kernel/traps.c.orig      2023-10-02 11:53:47.413623693 +0200
+++ linux-6.5/arch/x86/kernel/traps.c   2023-10-02 11:53:58.580624927 +0200
@@ -294,8 +294,12 @@
 static inline void handle_invalid_op(struct pt_regs *regs)
 #endif
 {
+       void __user *addr = error_get_trap_addr(regs);
+       int cpu = raw_smp_processor_id();
+       printk("INVALID OPCODE: %lx likely on CPU %d (core %d, socket %d)\n",
+               (unsigned long)addr, cpu, topology_core_id(cpu), topology_physical_package_id(cpu));
        do_error_trap(regs, 0, "invalid opcode", X86_TRAP_UD, SIGILL,
-                     ILL_ILLOPN, error_get_trap_addr(regs));
+                     ILL_ILLOPN, addr);
 }

This showed cores across all CCXs to be affected:

[ 1901.688448] INVALID OPCODE: c0e0c0 likely on CPU 26 (core 10, socket 0)
[ 1930.529211] INVALID OPCODE: c0e0c0 likely on CPU 21 (core 5, socket 0)
[ 1971.898911] INVALID OPCODE: c0e0c0 likely on CPU 27 (core 11, socket 0)
[ 2006.781557] INVALID OPCODE: c0e0c0 likely on CPU 19 (core 3, socket 0)
[ 2054.672900] INVALID OPCODE: c0e0c0 likely on CPU 30 (core 14, socket 0)
[ 2097.180969] INVALID OPCODE: c0e0c0 likely on CPU 27 (core 11, socket 0)
[ 2140.558150] INVALID OPCODE: c0e0c0 likely on CPU 23 (core 7, socket 0)
[ 2168.601674] INVALID OPCODE: c0e0c0 likely on CPU 15 (core 15, socket 0)
…

I sorted the result:
# dmesg | grep INVALID | sed 's/.*://' | cut -d ' ' -f 6 | sort -n | uniq -c
      4 0
      2 1
      2 2
      2 3
      5 4
      5 5 
      3 7 
      4 8 
      4 9 
      2 10
      4 11
      3 12
      5 13
      5 14
      2 15
      6 16
      8 17
      2 18
      5 19
      5 20
      3 21
      7 22
      5 23
      3 24
      2 25
      2 26
      3 27
      4 28
      4 29
      6 30
      5 31
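
The same per-CPU tally can also be done without counting cut fields by hand. The following is only a sketch in user-space C, assuming the exact log line format printed by the debug patch above; the helper names parse_invalid_opcode_cpu() and tally_invalid_opcode_cpus() are made up for illustration and are not part of the kernel or of any existing tool.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the logical CPU number from one dmesg line of the form
 *   "[ 1901.688448] INVALID OPCODE: c0e0c0 likely on CPU 26 (core 10, socket 0)"
 * Returns the CPU number, or -1 if the line does not match.
 * (Hypothetical helper, not a kernel API.)
 */
int parse_invalid_opcode_cpu(const char *line)
{
	const char *p = strstr(line, "INVALID OPCODE:");
	unsigned long addr;
	int cpu;

	if (!p)
		return -1;
	if (sscanf(p, "INVALID OPCODE: %lx likely on CPU %d", &addr, &cpu) != 2)
		return -1;
	return cpu < 0 ? -1 : cpu;
}

/*
 * Count matching lines per logical CPU. counts[] must have ncpus entries
 * and is zeroed first. Returns the number of lines that were counted.
 */
size_t tally_invalid_opcode_cpus(const char *const *lines, size_t nlines,
				 unsigned int *counts, size_t ncpus)
{
	size_t i, hits = 0;

	memset(counts, 0, ncpus * sizeof(*counts));
	for (i = 0; i < nlines; i++) {
		int cpu = parse_invalid_opcode_cpu(lines[i]);

		if (cpu >= 0 && (size_t)cpu < ncpus) {
			counts[cpu]++;
			hits++;
		}
	}
	return hits;
}
```

Feeding it the dmesg lines and printing every non-zero counts[] entry should give the same kind of table as the shell pipeline, keyed by logical CPU number.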

While discussing this issue with some other folks and kernel developers, it was suggested that it could be TLB related. We realized we were booting with mitigations=off for slightly higher whole-system compilation performance, and I can report that without mitigations=off these spurious illegal instructions do not appear. Disabling SMT makes the problem disappear, too.

So I iterated over all the mitigation options and found spectre_v2_user=off to be enough to make this bug appear reproducibly when loading most cores with this sparc64-t2-linux-gcc.

Now the good news is: running with modern security mitigations enabled hides what to me looks like a Zen 4 SMT sibling processor state corruption bug or mis-speculation. However, I would argue that non-malicious user-mode programs should not exhibit spurious illegal instructions when the operating system runs in a classic, high-performance mode without any special security mitigations in place.

As this is very reproducible with GCC for sparc64 for me, I created an initrd with a pre-processed source file (from Mesa IIRC), set up to boot into a loop running sparc64-t2-linux-gcc on all cores (everything grouped in usr/local), for others to test how widespread this issue is:

https://dl.t2sde.org/amd-zen4-smt-c0fefe/

Boot with:
spectre_v2_user=off or mitigations=off

It is even reproducible in qemu/kvm running on a host booted with spectre_v2_user=off:
qemu-system-x86_64 --enable-kvm -smp 32 -cpu host -m 4G -kernel vmlinuz-6.5.5-t2 -initrd initrd-6.5.5-t2.gz -nographic -append "console=ttyS0"

To test on your system with chroot:
mkdir bug; cd bug; gunzip -c ../initrd-6.5.5-t2.gz | cpio -i
chroot . usr/local/init

With this reduced test case illegal instructions appear within an average of just 5 seconds on my Ryzen 7950x.

To rule out that this is some random Linux kernel config and optimization fluke, I built the kernel with both clang and gcc, without any change in the result, and also downloaded the latest Intel Clear Linux kernel binary to double-check that it is affected in the same way, and sure enough, it is.

After all this research, to me this looks like a Zen 4 CPU bug, but any other comments, hints, patches welcome!

I realize AMD has newer microcode for Epyc server CPUs. If this is already fixed in some newer microcode, it would really be amazing (hint) if AMD would release microcode updates for $999 consumer CPUs in a more timely manner, and not only for high-end server SKUs via linux-firmware, ...

Thank you so much,

René Rebe

--
ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
http://exactcode.com | http://exactscan.com | http://ocrkit.com



* Re: [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation
  2023-10-04 15:29 [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation René Rebe
@ 2023-10-04 22:25 ` Borislav Petkov
  2023-10-06  9:21   ` René Rebe
  2023-10-11  9:23 ` [tip: x86/urgent] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs tip-bot2 for Borislav Petkov (AMD)
  2023-10-12 18:20 ` [tip: perf/core] x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations tip-bot2 for Borislav Petkov
  2 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2023-10-04 22:25 UTC (permalink / raw)
  To: René Rebe; +Cc: linux-kernel, x86

On Wed, Oct 04, 2023 at 05:29:32PM +0200, René Rebe wrote:
> While cross-compiling our “Embedded” Linux distribution T2 (https://t2sde.org), I have observed random illegal-instruction build errors ever since we got ourselves a Ryzen 7950x on launch day a year ago:

Thanks for reporting. I'm looking into it.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation
  2023-10-04 22:25 ` Borislav Petkov
@ 2023-10-06  9:21   ` René Rebe
  2023-10-06  9:32     ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: René Rebe @ 2023-10-06  9:21 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86

Hi,

> On 5. Oct 2023, at 00:25, Borislav Petkov <bp@alien8.de> wrote:
> 
> On Wed, Oct 04, 2023 at 05:29:32PM +0200, René Rebe wrote:
>> While cross-compiling our “Embedded” Linux distribution T2 (https://t2sde.org), I have observed random illegal-instruction build errors ever since we got ourselves a Ryzen 7950x on launch day a year ago:
> 
> Thanks for reporting. I'm looking into it.

Thank you Borislav, were you able to reproduce this on a Zen 4 system you have access to?

Thanks,
	René

--
ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
http://exactcode.com | http://exactscan.com | http://ocrkit.com



* Re: [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation
  2023-10-06  9:21   ` René Rebe
@ 2023-10-06  9:32     ` Borislav Petkov
  2023-10-10  8:39       ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2023-10-06  9:32 UTC (permalink / raw)
  To: René Rebe; +Cc: linux-kernel, x86

On Fri, Oct 06, 2023 at 11:21:13AM +0200, René Rebe wrote:
> Thank you Borislav, were you able to reproduce this on a Zen 4 system you
> have access to?

I'm still working on it and I'll have something soon.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation
  2023-10-06  9:32     ` Borislav Petkov
@ 2023-10-10  8:39       ` Borislav Petkov
  2023-10-10 21:18         ` René Rebe
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2023-10-10  8:39 UTC (permalink / raw)
  To: René Rebe; +Cc: linux-kernel, x86

On Fri, Oct 06, 2023 at 11:32:44AM +0200, Borislav Petkov wrote:
> I'm still working on it and I'll have something soon.

Ok, try this below and see whether it fixes your reproducer.

Thx.

---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Sat, 7 Oct 2023 12:57:02 +0200
Subject: [PATCH] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs

Fix erratum #1485 on Zen4 parts where running with STIBP disabled can
cause an #UD exception. The performance impact of the fix is negligible.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
---
 arch/x86/include/asm/msr-index.h | 9 +++++++--
 arch/x86/kernel/cpu/amd.c        | 8 ++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1d111350197f..b37abb55e948 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -637,12 +637,17 @@
 /* AMD Last Branch Record MSRs */
 #define MSR_AMD64_LBR_SELECT			0xc000010e
 
-/* Fam 17h MSRs */
-#define MSR_F17H_IRPERF			0xc00000e9
+/* Zen4 */
+#define MSR_ZEN4_BP_CFG			0xc001102e
+#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
 
+/* Zen 2 */
 #define MSR_ZEN2_SPECTRAL_CHICKEN	0xc00110e3
 #define MSR_ZEN2_SPECTRAL_CHICKEN_BIT	BIT_ULL(1)
 
+/* Fam 17h MSRs */
+#define MSR_F17H_IRPERF			0xc00000e9
+
 /* Fam 16h MSRs */
 #define MSR_F16H_L2I_PERF_CTL		0xc0010230
 #define MSR_F16H_L2I_PERF_CTR		0xc0010231
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 03ef962a6992..ece2b5b7b0fe 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -80,6 +80,10 @@ static const int amd_div0[] =
 	AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x00, 0x0, 0x2f, 0xf),
 			   AMD_MODEL_RANGE(0x17, 0x50, 0x0, 0x5f, 0xf));
 
+static const int amd_erratum_1485[] =
+	AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x19, 0x10, 0x0, 0x1f, 0xf),
+			   AMD_MODEL_RANGE(0x19, 0x60, 0x0, 0xaf, 0xf));
+
 static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
 	int osvw_id = *erratum++;
@@ -1149,6 +1153,10 @@ static void init_amd(struct cpuinfo_x86 *c)
 		pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n");
 		setup_force_cpu_bug(X86_BUG_DIV0);
 	}
+
+	if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
+	     cpu_has_amd_erratum(c, amd_erratum_1485))
+		msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
 }
 
 #ifdef CONFIG_X86_32
-- 
2.42.0.rc0.25.ga82fb66fed25
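
For readers who want to check whether a given part falls into the model ranges listed in amd_erratum_1485[] above, here is a simplified user-space sketch. It is not the kernel's cpu_has_amd_erratum(); the struct and function names are invented for illustration, and only the family, model and stepping numbers are taken from the patch. Note that the patch additionally skips the workaround when running under a hypervisor, which this sketch does not model.

```c
#include <stdbool.h>
#include <stddef.h>

/* One (family, model/stepping range) entry; hypothetical type for this sketch. */
struct model_range {
	unsigned int family;
	unsigned int model_start, stepping_start;
	unsigned int model_end, stepping_end;
};

/* Values copied from the amd_erratum_1485[] table in the patch. */
static const struct model_range erratum_1485_ranges[] = {
	{ 0x19, 0x10, 0x0, 0x1f, 0xf },
	{ 0x19, 0x60, 0x0, 0xaf, 0xf },
};

/*
 * Return true if family/model/stepping (as shown in /proc/cpuinfo) lies in
 * one of the ranges above. Model and stepping are combined into a single
 * value so that a range can start or end at a specific stepping.
 */
bool in_erratum_1485_range(unsigned int family, unsigned int model,
			   unsigned int stepping)
{
	unsigned int ms = (model << 4) | (stepping & 0xf);
	size_t i;

	for (i = 0; i < sizeof(erratum_1485_ranges) / sizeof(erratum_1485_ranges[0]); i++) {
		const struct model_range *r = &erratum_1485_ranges[i];
		unsigned int lo = (r->model_start << 4) | r->stepping_start;
		unsigned int hi = (r->model_end << 4) | r->stepping_end;

		if (family == r->family && ms >= lo && ms <= hi)
			return true;
	}
	return false;
}
```

With the values from the original report (cpu family 25, model 97, stepping 2), in_erratum_1485_range(25, 97, 2) returns true, since 25 is 0x19 and 97 is 0x61.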

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* Re: [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation
  2023-10-10  8:39       ` Borislav Petkov
@ 2023-10-10 21:18         ` René Rebe
  2023-10-11  8:59           ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: René Rebe @ 2023-10-10 21:18 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-kernel, x86

Hi Borislav,


> On 10. Oct 2023, at 10:39, Borislav Petkov <bp@alien8.de> wrote:
> 
> On Fri, Oct 06, 2023 at 11:32:44AM +0200, Borislav Petkov wrote:
>> I'm still working on it and I'll have something soon.
> 
> Ok, try this below and see whether it fixes your reproducer.

After the first day of testing, the patch appears to have prevented
the spurious #UD exceptions from appearing again.

Tested-by: René Rebe <rene@exactcode.de>

> Thx.
> 
> ---
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Sat, 7 Oct 2023 12:57:02 +0200
> Subject: [PATCH] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs
> 
> Fix erratum #1485 on Zen4 parts where running with STIBP disabled can
> cause an #UD exception. The performance impact of the fix is negligible.
> 
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> Cc: <stable@kernel.org>
> ---
> arch/x86/include/asm/msr-index.h | 9 +++++++--
> arch/x86/kernel/cpu/amd.c        | 8 ++++++++
> 2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 1d111350197f..b37abb55e948 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -637,12 +637,17 @@
> /* AMD Last Branch Record MSRs */
> #define MSR_AMD64_LBR_SELECT 0xc000010e
> 
> -/* Fam 17h MSRs */
> -#define MSR_F17H_IRPERF 0xc00000e9
> +/* Zen4 */
> +#define MSR_ZEN4_BP_CFG 0xc001102e
> +#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
> 
> +/* Zen 2 */
> #define MSR_ZEN2_SPECTRAL_CHICKEN 0xc00110e3
> #define MSR_ZEN2_SPECTRAL_CHICKEN_BIT BIT_ULL(1)
> 
> +/* Fam 17h MSRs */
> +#define MSR_F17H_IRPERF 0xc00000e9
> +
> /* Fam 16h MSRs */
> #define MSR_F16H_L2I_PERF_CTL 0xc0010230
> #define MSR_F16H_L2I_PERF_CTR 0xc0010231
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 03ef962a6992..ece2b5b7b0fe 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -80,6 +80,10 @@ static const int amd_div0[] =
> AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x00, 0x0, 0x2f, 0xf),
>  AMD_MODEL_RANGE(0x17, 0x50, 0x0, 0x5f, 0xf));
> 
> +static const int amd_erratum_1485[] =
> + AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x19, 0x10, 0x0, 0x1f, 0xf),
> +   AMD_MODEL_RANGE(0x19, 0x60, 0x0, 0xaf, 0xf));
> +
> static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
> {
> int osvw_id = *erratum++;
> @@ -1149,6 +1153,10 @@ static void init_amd(struct cpuinfo_x86 *c)
> pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n");
> setup_force_cpu_bug(X86_BUG_DIV0);
> }
> +
> + if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
> +     cpu_has_amd_erratum(c, amd_erratum_1485))
> + msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
> }
> 
> #ifdef CONFIG_X86_32
> -- 
> 2.42.0.rc0.25.ga82fb66fed25
> 
> -- 
> Regards/Gruss,
>   Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

--
ExactCODE GmbH, Lietzenburger Str. 42, DE-10789 Berlin
http://exactcode.com | http://exactscan.com | http://ocrkit.com



* Re: [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation
  2023-10-10 21:18         ` René Rebe
@ 2023-10-11  8:59           ` Borislav Petkov
  0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2023-10-11  8:59 UTC (permalink / raw)
  To: René Rebe; +Cc: linux-kernel, x86

On Tue, Oct 10, 2023 at 11:18:57PM +0200, René Rebe wrote:
> After the first day of testing, the patch appears to have prevented
> the spurious #UD exceptions from appearing again.
> 
> Tested-by: René Rebe <rene@exactcode.de>

Thanks for reporting and testing!

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* [tip: x86/urgent] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs
  2023-10-04 15:29 [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation René Rebe
  2023-10-04 22:25 ` Borislav Petkov
@ 2023-10-11  9:23 ` tip-bot2 for Borislav Petkov (AMD)
  2023-10-11 21:28   ` Ingo Molnar
  2023-10-12 18:20 ` [tip: perf/core] x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations tip-bot2 for Borislav Petkov
  2 siblings, 1 reply; 12+ messages in thread
From: tip-bot2 for Borislav Petkov (AMD) @ 2023-10-11  9:23 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: rene, Borislav Petkov (AMD), stable, x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     f454b18e07f518bcd0c05af17a2239138bff52de
Gitweb:        https://git.kernel.org/tip/f454b18e07f518bcd0c05af17a2239138bff52de
Author:        Borislav Petkov (AMD) <bp@alien8.de>
AuthorDate:    Sat, 07 Oct 2023 12:57:02 +02:00
Committer:     Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Wed, 11 Oct 2023 11:00:11 +02:00

x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs

Fix erratum #1485 on Zen4 parts where running with STIBP disabled can
cause an #UD exception. The performance impact of the fix is negligible.

Reported-by: René Rebe <rene@exactcode.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: René Rebe <rene@exactcode.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com
---
 arch/x86/include/asm/msr-index.h |  9 +++++++--
 arch/x86/kernel/cpu/amd.c        |  8 ++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 1d11135..b37abb5 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -637,12 +637,17 @@
 /* AMD Last Branch Record MSRs */
 #define MSR_AMD64_LBR_SELECT			0xc000010e
 
-/* Fam 17h MSRs */
-#define MSR_F17H_IRPERF			0xc00000e9
+/* Zen4 */
+#define MSR_ZEN4_BP_CFG			0xc001102e
+#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
 
+/* Zen 2 */
 #define MSR_ZEN2_SPECTRAL_CHICKEN	0xc00110e3
 #define MSR_ZEN2_SPECTRAL_CHICKEN_BIT	BIT_ULL(1)
 
+/* Fam 17h MSRs */
+#define MSR_F17H_IRPERF			0xc00000e9
+
 /* Fam 16h MSRs */
 #define MSR_F16H_L2I_PERF_CTL		0xc0010230
 #define MSR_F16H_L2I_PERF_CTR		0xc0010231
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 03ef962..ece2b5b 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -80,6 +80,10 @@ static const int amd_div0[] =
 	AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x17, 0x00, 0x0, 0x2f, 0xf),
 			   AMD_MODEL_RANGE(0x17, 0x50, 0x0, 0x5f, 0xf));
 
+static const int amd_erratum_1485[] =
+	AMD_LEGACY_ERRATUM(AMD_MODEL_RANGE(0x19, 0x10, 0x0, 0x1f, 0xf),
+			   AMD_MODEL_RANGE(0x19, 0x60, 0x0, 0xaf, 0xf));
+
 static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, const int *erratum)
 {
 	int osvw_id = *erratum++;
@@ -1149,6 +1153,10 @@ static void init_amd(struct cpuinfo_x86 *c)
 		pr_notice_once("AMD Zen1 DIV0 bug detected. Disable SMT for full protection.\n");
 		setup_force_cpu_bug(X86_BUG_DIV0);
 	}
+
+	if (!cpu_has(c, X86_FEATURE_HYPERVISOR) &&
+	     cpu_has_amd_erratum(c, amd_erratum_1485))
+		msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
 }
 
 #ifdef CONFIG_X86_32
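
One way to check from user space whether the bit set by msr_set_bit() above is actually active is to read the MSR through the Linux msr driver. This is only a sketch under several assumptions: a 64-bit Linux system, the msr module loaded, and root privileges. The MSR number and bit position are taken from the patch; the helper names read_msr(), bp_cfg_fix_is_set() and bp_cfg_with_fix() are invented here.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Values taken from the patch to <asm/msr-index.h>. */
#define MSR_ZEN4_BP_CFG			0xc001102eu
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT	5

/* True if the workaround bit is set in a raw MSR value. */
bool bp_cfg_fix_is_set(uint64_t value)
{
	return (value >> MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT) & 1;
}

/* The value a read-modify-write that sets the workaround bit would produce. */
uint64_t bp_cfg_with_fix(uint64_t value)
{
	return value | (1ULL << MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT);
}

/*
 * Read one MSR on one logical CPU via /dev/cpu/<cpu>/msr, where the file
 * offset selects the MSR number. Returns 0 on success, -1 on any failure
 * (driver not loaded, no permission, CPU does not exist, ...).
 */
int read_msr(int cpu, uint32_t msr, uint64_t *value)
{
	char path[64];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, value, sizeof(*value), (off_t)msr);
	close(fd);
	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

A caller would loop over the online CPUs, call read_msr(cpu, MSR_ZEN4_BP_CFG, &v) and report any CPU where bp_cfg_fix_is_set(v) is false. Inside a guest the result is not meaningful, since the patch deliberately skips the write when X86_FEATURE_HYPERVISOR is set.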


* Re: [tip: x86/urgent] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs
  2023-10-11  9:23 ` [tip: x86/urgent] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs tip-bot2 for Borislav Petkov (AMD)
@ 2023-10-11 21:28   ` Ingo Molnar
  2023-10-12  7:40     ` Borislav Petkov
  0 siblings, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2023-10-11 21:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-tip-commits, rene, Borislav Petkov (AMD), x86, Peter Zijlstra


* tip-bot2 for Borislav Petkov (AMD) <tip-bot2@linutronix.de> wrote:

>  /* AMD Last Branch Record MSRs */
>  #define MSR_AMD64_LBR_SELECT			0xc000010e
>  
> +/* Zen4 */
> +#define MSR_ZEN4_BP_CFG			0xc001102e
> +#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
>  
> +/* Zen 2 */
>  #define MSR_ZEN2_SPECTRAL_CHICKEN	0xc00110e3
>  #define MSR_ZEN2_SPECTRAL_CHICKEN_BIT	BIT_ULL(1)
>  
> +/* Fam 17h MSRs */
> +#define MSR_F17H_IRPERF			0xc00000e9

Yeah, so these latest AMD MSR definitions in <asm/msr-index.h> are pretty 
confused, they list MSRs in the following order:

   Zen 4
   Zen 2
   Fam 19h         // resolution in tip:master
   Fam 17h

where perf/core added a Fam 19h section a couple of days ago ...

While in reality:

   Zen 2 == Fam 17h
   Zen 4 == Fam 19h

So it's confusing to list these separately and out of order.

So in resolving the conflict in perf/core I updated this section to read:

  /* Fam 19h (Zen 4) MSRs */
  #define MSR_F19H_UMC_PERF_CTL		0xc0010800
  #define MSR_F19H_UMC_PERF_CTR		0xc0010801

  #define MSR_ZEN4_BP_CFG		0xc001102e
  #define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5

  /* Fam 17h (Zen 2) MSRs */
  #define MSR_F17H_IRPERF		0xc00000e9

  #define MSR_ZEN2_SPECTRAL_CHICKEN	0xc00110e3
  #define MSR_ZEN2_SPECTRAL_CHICKEN_BIT	BIT_ULL(1)

This doesn't change the definitions themselves, only merges the comments 
and the sections, (to keep the Git conflict resolution non-evil), but 
arguably once perf/core goes upstream, we should probably unify the naming 
to follow the existing nomenclature, which is, starting at around F15H, the 
following:

   MSR_F15H_
   MSR_F16H_
   MSR_F17H_
   MSR_F19H_

Or are the MSRs named ZEN2 and ZEN4 in AMD SDMs, which we should follow?

Anyway, something to keep in mind.

Thanks,

	Ingo


* Re: [tip: x86/urgent] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs
  2023-10-11 21:28   ` Ingo Molnar
@ 2023-10-12  7:40     ` Borislav Petkov
  2023-10-12 18:12       ` [PATCH] x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 enumerations Ingo Molnar
  0 siblings, 1 reply; 12+ messages in thread
From: Borislav Petkov @ 2023-10-12  7:40 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, linux-tip-commits, rene, x86, Peter Zijlstra

On Wed, Oct 11, 2023 at 11:28:26PM +0200, Ingo Molnar wrote:
> While in reality:
> 
>    Zen 2 == Fam 17h
>    Zen 4 == Fam 19h

If only it were that easy...

family 0x17 is Zen1 and 2, family 0x19 is spread around Zen 3 and 4.

> 
> So it's confusing to list these separately and out of order.
> 
> So in resolving the conflict in perf/core I updated this section to read:
> 
>   /* Fam 19h (Zen 4) MSRs */

That's wrong.

>   #define MSR_F19H_UMC_PERF_CTL		0xc0010800
>   #define MSR_F19H_UMC_PERF_CTR		0xc0010801
> 
>   #define MSR_ZEN4_BP_CFG		0xc001102e
>   #define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
> 
>   /* Fam 17h (Zen 2) MSRs */

Ditto.

> This doesn't change the definitions themselves, only merges the comments 
> and the sections, (to keep the Git conflict resolution non-evil), but 
> arguably once perf/core goes upstream, we should probably unify the naming 
> to follow the existing nomenclature, which is, starting at around F15H, the 
> following:
> 
>    MSR_F15H_
>    MSR_F16H_
>    MSR_F17H_
>    MSR_F19H_
> 
> Or are the MSRs named ZEN2 and ZEN4 in AMD SDMs, which we should follow?

See above. The MSRs are per Zen generation while the family is per
family. Yes, it is confusing. :-\

IOW, you want to have this as the end product:

/* Zen4 */
#define MSR_ZEN4_BP_CFG                 0xc001102e
#define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5

/* Fam 19h MSRs */
#define MSR_F19H_UMC_PERF_CTL           0xc0010800
#define MSR_F19H_UMC_PERF_CTR           0xc0010801

/* Zen 2 */
#define MSR_ZEN2_SPECTRAL_CHICKEN       0xc00110e3
#define MSR_ZEN2_SPECTRAL_CHICKEN_BIT   BIT_ULL(1)

/* Fam 17h MSRs */
#define MSR_F17H_IRPERF			0xc00000e9

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette


* [PATCH] x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 enumerations
  2023-10-12  7:40     ` Borislav Petkov
@ 2023-10-12 18:12       ` Ingo Molnar
  0 siblings, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2023-10-12 18:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, linux-tip-commits, rene, x86, Peter Zijlstra


* Borislav Petkov <bp@alien8.de> wrote:

> On Wed, Oct 11, 2023 at 11:28:26PM +0200, Ingo Molnar wrote:
> > While in reality:
> > 
> >    Zen 2 == Fam 17h
> >    Zen 4 == Fam 19h
> 
> If only it were that easy...
> 
> family 0x17 is Zen1 and 2, family 0x19 is spread around Zen 3 and 4.
>
...
> See above. The MSRs are per Zen generation while the family is per
> family. Yes, it is confusing. :-\

Fun!
 
> IOW, you want to have this as the end product:
> 
> /* Zen4 */
> #define MSR_ZEN4_BP_CFG                 0xc001102e
> #define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
> 
> /* Fam 19h MSRs */
> #define MSR_F19H_UMC_PERF_CTL           0xc0010800
> #define MSR_F19H_UMC_PERF_CTR           0xc0010801
> 
> /* Zen 2 */
> #define MSR_ZEN2_SPECTRAL_CHICKEN       0xc00110e3
> #define MSR_ZEN2_SPECTRAL_CHICKEN_BIT   BIT_ULL(1)
> 
> /* Fam 17h MSRs */
> #define MSR_F17H_IRPERF			0xc00000e9

Ok, thanks - I've distilled your enumeration order into the separate
patch below - there's more commits in perf/core meanwhile, and maybe
it isn't even bad there's a bit of a spotlight on the naming
scheme here.

I've turned your above grouping & comments into a patch, created a 
changelog and added your SOB, see the perf/core commit below.
Lemme know if that's not OK to you.

Thanks,

	Ingo

=============>
From: Borislav Petkov <bp@alien8.de>
Date: Thu, 12 Oct 2023 20:01:59 +0200
Subject: [PATCH] x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations

The comments introduced in <asm/msr-index.h> in the merge conflict fixup in:

  8f4156d58713 ("Merge branch 'x86/urgent' into perf/core, to resolve conflict")

... aren't right: AMD naming schemes are more complex than implied,
family 0x17 is Zen1 and 2, family 0x19 is spread around Zen 3 and 4.

So there's indeed four separate MSR namespaces for:

  MSR_F17H_
  MSR_F19H_
  MSR_ZEN2_
  MSR_ZEN4_

... and the namespaces cannot be merged.

Fix it up. No change in functionality.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com
---
 arch/x86/include/asm/msr-index.h | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 0ad9ba8baa8a..f8b502867dd1 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -637,18 +637,20 @@
 /* AMD Last Branch Record MSRs */
 #define MSR_AMD64_LBR_SELECT			0xc000010e
 
-/* Fam 19h (Zen 4) MSRs */
-#define MSR_F19H_UMC_PERF_CTL		0xc0010800
-#define MSR_F19H_UMC_PERF_CTR		0xc0010801
-
-#define MSR_ZEN4_BP_CFG			0xc001102e
+/* Zen4 */
+#define MSR_ZEN4_BP_CFG                 0xc001102e
 #define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
 
-/* Fam 17h (Zen 2) MSRs */
-#define MSR_F17H_IRPERF			0xc00000e9
+/* Fam 19h MSRs */
+#define MSR_F19H_UMC_PERF_CTL           0xc0010800
+#define MSR_F19H_UMC_PERF_CTR           0xc0010801
 
-#define MSR_ZEN2_SPECTRAL_CHICKEN	0xc00110e3
-#define MSR_ZEN2_SPECTRAL_CHICKEN_BIT	BIT_ULL(1)
+/* Zen 2 */
+#define MSR_ZEN2_SPECTRAL_CHICKEN       0xc00110e3
+#define MSR_ZEN2_SPECTRAL_CHICKEN_BIT   BIT_ULL(1)
+
+/* Fam 17h MSRs */
+#define MSR_F17H_IRPERF			0xc00000e9
 
 /* Fam 16h MSRs */
 #define MSR_F16H_L2I_PERF_CTL		0xc0010230


* [tip: perf/core] x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations
  2023-10-04 15:29 [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation René Rebe
  2023-10-04 22:25 ` Borislav Petkov
  2023-10-11  9:23 ` [tip: x86/urgent] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs tip-bot2 for Borislav Petkov (AMD)
@ 2023-10-12 18:20 ` tip-bot2 for Borislav Petkov
  2 siblings, 0 replies; 12+ messages in thread
From: tip-bot2 for Borislav Petkov @ 2023-10-12 18:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Borislav Petkov (AMD), Ingo Molnar, Peter Zijlstra, x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     deedec0a152a3d7fa5b04ef9431aeb71802835b5
Gitweb:        https://git.kernel.org/tip/deedec0a152a3d7fa5b04ef9431aeb71802835b5
Author:        Borislav Petkov <bp@alien8.de>
AuthorDate:    Thu, 12 Oct 2023 20:01:59 +02:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 12 Oct 2023 20:10:39 +02:00

x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations

The comments introduced in <asm/msr-index.h> in the merge conflict fixup in:

  8f4156d58713 ("Merge branch 'x86/urgent' into perf/core, to resolve conflict")

... aren't right: AMD naming schemes are more complex than implied,
family 0x17 is Zen1 and 2, family 0x19 is spread around Zen 3 and 4.

So there's indeed four separate MSR namespaces for:

  MSR_F17H_
  MSR_F19H_
  MSR_ZEN2_
  MSR_ZEN4_

... and the namespaces cannot be merged.

Fix it up. No change in functionality.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/D99589F4-BC5D-430B-87B2-72C20370CF57@exactcode.com
---
 arch/x86/include/asm/msr-index.h | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 0ad9ba8..f8b5028 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -637,18 +637,20 @@
 /* AMD Last Branch Record MSRs */
 #define MSR_AMD64_LBR_SELECT			0xc000010e
 
-/* Fam 19h (Zen 4) MSRs */
-#define MSR_F19H_UMC_PERF_CTL		0xc0010800
-#define MSR_F19H_UMC_PERF_CTR		0xc0010801
-
-#define MSR_ZEN4_BP_CFG			0xc001102e
+/* Zen4 */
+#define MSR_ZEN4_BP_CFG                 0xc001102e
 #define MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT 5
 
-/* Fam 17h (Zen 2) MSRs */
-#define MSR_F17H_IRPERF			0xc00000e9
+/* Fam 19h MSRs */
+#define MSR_F19H_UMC_PERF_CTL           0xc0010800
+#define MSR_F19H_UMC_PERF_CTR           0xc0010801
 
-#define MSR_ZEN2_SPECTRAL_CHICKEN	0xc00110e3
-#define MSR_ZEN2_SPECTRAL_CHICKEN_BIT	BIT_ULL(1)
+/* Zen 2 */
+#define MSR_ZEN2_SPECTRAL_CHICKEN       0xc00110e3
+#define MSR_ZEN2_SPECTRAL_CHICKEN_BIT   BIT_ULL(1)
+
+/* Fam 17h MSRs */
+#define MSR_F17H_IRPERF			0xc00000e9
 
 /* Fam 16h MSRs */
 #define MSR_F16H_L2I_PERF_CTL		0xc0010230


Thread overview: 12+ messages
2023-10-04 15:29 [RFC] AMD Zen4 CPU bug? Spurious SMT Sibling Invalid Opcode Speculation René Rebe
2023-10-04 22:25 ` Borislav Petkov
2023-10-06  9:21   ` René Rebe
2023-10-06  9:32     ` Borislav Petkov
2023-10-10  8:39       ` Borislav Petkov
2023-10-10 21:18         ` René Rebe
2023-10-11  8:59           ` Borislav Petkov
2023-10-11  9:23 ` [tip: x86/urgent] x86/cpu: Fix AMD erratum #1485 on Zen4-based CPUs tip-bot2 for Borislav Petkov (AMD)
2023-10-11 21:28   ` Ingo Molnar
2023-10-12  7:40     ` Borislav Petkov
2023-10-12 18:12       ` [PATCH] x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 enumerations Ingo Molnar
2023-10-12 18:20 ` [tip: perf/core] x86/cpu: Fix the AMD Fam 17h, Fam 19h, Zen2 and Zen4 MSR enumerations tip-bot2 for Borislav Petkov
