* x86: AMD Zen2 ymm registers rolling back
@ 2023-02-22 6:40 Tavis Ormandy
2023-02-22 8:33 ` Borislav Petkov
2023-03-07 17:46 ` [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch Andrew Cooper
0 siblings, 2 replies; 29+ messages in thread
From: Tavis Ormandy @ 2023-02-22 6:40 UTC (permalink / raw)
To: linux-kernel; +Cc: x86
[-- Attachment #1: Type: text/plain, Size: 2234 bytes --]
Hello, I'm experiencing a serious bug on AMD Zen2 that is causing
registers to "roll back" to previous values after a context switch.
I know that sounds unbelievable - but I have a reliable reproducer!
$ grep -m1 'model name' /proc/cpuinfo
model name : AMD Ryzen Threadripper PRO 3945WX 12-Cores
The bug occurs when there is a context switch after a ucomiss
instruction. The YMM registers are "rolled back" to some previous state.
It's not clear to me how or why this is happening, or if it can happen
across a process boundary.
To reproduce:
$ nasm -felf64 -O0 zenymmasm.asm
$ ld -o zenymmasm zenymmasm.o
If you run it it should just print some nuls:
$ ./zenymmasm
$
That is the expected, correct result.
However, if you run the attached program hammer.c (it just runs
sched_yield() in a loop), and then pin this testcase to the same core as
that:
$ taskset -c 1 ./zenymmasm
SECRETSECRET
The previous register values are restored.
I think this should be impossible.
The code does this:
vmovdqu ymm0, [rel secret] <--- put SECRET into ymm0
mov rax, SYS_sched_yield
syscall
vpxor ymm0, ymm0, ymm0 <--- Here the value of ymm0 should be lost
It's not related to to VPXOR, you can use VZEROALL or whatever else.
ucomiss xmm0, dword [rel space]
mov rax, SYS_sched_yield
syscall
It's the UCOMISS r128,m32 instruction that triggers the bug, the value
of the m32 must be < 0x80000. As far as I know, this should only ever
change condition flags, but if we dump the value of ymm0:
mov rax, SYS_write
mov rdi, 1
lea rsi, [rel regstate]
mov rdx, 32
syscall
It will have reverted back to the pre-vpxor SECRET value !?!?
In the original C program, we were seeing register values randomly
restored from significantly earlier in the program execution (like, from
ld.so), sometimes values we don't recognize and can't explain.
We've reproduced on multiple Zen2 machines.
Obviously, not great if your registers are randomly time travelling :)
Thanks, Tavis.
--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger taviso@sdf.org
_\_V _( ) _( ) @taviso
[-- Attachment #2: hammer.c --]
[-- Type: text/plain, Size: 578 bytes --]
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <stdbool.h>
#include <sched.h>
#include <syscall.h>
#include <x86intrin.h>
#include <sys/random.h>
#include <err.h>
int main(int argc, char **argv)
{
cpu_set_t set;
CPU_ZERO(&set);
CPU_SET(1, &set);
if (sched_setaffinity(0, sizeof(set), &set) != 0) {
err(EXIT_FAILURE, "failed to set cpu affinity");
}
while (true) {
asm volatile ("syscall" :: "a"(SYS_sched_yield) : "rcx", "r11");
}
return 0;
}
[-- Attachment #3: zenymmasm.asm --]
[-- Type: text/plain, Size: 802 bytes --]
BITS 64
global _start
%define SYS_sched_yield 0x18
%define SYS_write 0x01
%define SYS_exit 0x3c
section .data
align 32
secret: times 4 dq 'SECRET'
align 32
regstate: dq 0,0,0,0
align 32
space: dd 1
section .text
_start:
vmovdqu ymm0, [rel secret]
mov rax, SYS_sched_yield
syscall
vpxor ymm0, ymm0, ymm0
mov rax, SYS_sched_yield
syscall
ucomiss xmm0, dword [rel space]
mov rax, SYS_sched_yield
syscall
vmovdqu [rel regstate], ymm0
mov rax, SYS_write
mov rdi, 1
lea rsi, [rel regstate]
mov rdx, 32
syscall
mov rax, SYS_exit
mov rdi, 0
syscall
int3
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-22 6:40 x86: AMD Zen2 ymm registers rolling back Tavis Ormandy
@ 2023-02-22 8:33 ` Borislav Petkov
2023-02-22 9:14 ` Borislav Petkov
2023-03-07 17:46 ` [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch Andrew Cooper
1 sibling, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2023-02-22 8:33 UTC (permalink / raw)
To: Tavis Ormandy; +Cc: linux-kernel, x86
On Tue, Feb 21, 2023 at 10:40:07PM -0800, Tavis Ormandy wrote:
> Hello, I'm experiencing a serious bug on AMD Zen2 that is causing
> registers to "roll back" to previous values after a context switch.
It probably doesn't matter all too much but which kernel are you running
this on?
Latest and greatest I hope...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-22 8:33 ` Borislav Petkov
@ 2023-02-22 9:14 ` Borislav Petkov
2023-02-22 9:38 ` Andrew Cooper
0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2023-02-22 9:14 UTC (permalink / raw)
To: Tavis Ormandy; +Cc: linux-kernel, x86
On Wed, Feb 22, 2023 at 09:33:19AM +0100, Borislav Petkov wrote:
> It probably doesn't matter all too much but which kernel are you running
> this on?
>
> Latest and greatest I hope...
Also, what does
$ grep microcode /proc/cpuinfo | uniq
say?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-22 9:14 ` Borislav Petkov
@ 2023-02-22 9:38 ` Andrew Cooper
2023-02-22 10:09 ` Borislav Petkov
0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2023-02-22 9:38 UTC (permalink / raw)
To: Borislav Petkov, Tavis Ormandy; +Cc: linux-kernel, x86
On 22/02/2023 9:14 am, Borislav Petkov wrote:
> On Wed, Feb 22, 2023 at 09:33:19AM +0100, Borislav Petkov wrote:
>> It probably doesn't matter all too much but which kernel are you running
>> this on?
>>
>> Latest and greatest I hope...
> Also, what does
>
> $ grep microcode /proc/cpuinfo | uniq
>
> say?
This sounds suspiciously like an errata which was fixed with a ucode
update last year.
Check that you are up to date with ucode. If you are, perhaps the fix
needs another look at...
~Andrew
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-22 9:38 ` Andrew Cooper
@ 2023-02-22 10:09 ` Borislav Petkov
2023-02-22 21:26 ` Tavis Ormandy
2023-02-28 18:47 ` Alexander Monakov
0 siblings, 2 replies; 29+ messages in thread
From: Borislav Petkov @ 2023-02-22 10:09 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Tavis Ormandy, linux-kernel, x86
On Wed, Feb 22, 2023 at 09:38:09AM +0000, Andrew Cooper wrote:
> This sounds suspiciously like an errata which was fixed with a ucode
> update last year.
Yes, it looks like it.
Alternatively, you can try booting with "clearcpuid=xsaves" - that
should take care of your observation too but yeah, you should rather
update your microcode.
HTH.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-22 10:09 ` Borislav Petkov
@ 2023-02-22 21:26 ` Tavis Ormandy
2023-02-22 22:17 ` Andrew Cooper
2023-02-22 22:25 ` Borislav Petkov
2023-02-28 18:47 ` Alexander Monakov
1 sibling, 2 replies; 29+ messages in thread
From: Tavis Ormandy @ 2023-02-22 21:26 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Andrew Cooper, linux-kernel, x86
On Wed, Feb 22, 2023 at 11:09:51AM +0100, Borislav Petkov wrote:
> On Wed, Feb 22, 2023 at 09:38:09AM +0000, Andrew Cooper wrote:
> > This sounds suspiciously like an errata which was fixed with a ucode
> > update last year.
>
> Yes, it looks like it.
>
> Alternatively, you can try booting with "clearcpuid=xsaves" - that
> should take care of your observation too but yeah, you should rather
> update your microcode.
>
Thanks - confirmed, it *doesn't* repro with 0x8301055, but does repro
with 0x830104d.
Annoyingly, I thought I was using the most recent microcode, but it seems
like there is some bug and debian wasn't applying it at boot.
That seems like a scary errata :-/
Tavis.
--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger taviso@sdf.org
_\_V _( ) _( ) @taviso
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-22 21:26 ` Tavis Ormandy
@ 2023-02-22 22:17 ` Andrew Cooper
2023-02-22 22:26 ` Borislav Petkov
2023-02-22 22:25 ` Borislav Petkov
1 sibling, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2023-02-22 22:17 UTC (permalink / raw)
To: Tavis Ormandy, Borislav Petkov; +Cc: linux-kernel, x86
On 22/02/2023 9:26 pm, Tavis Ormandy wrote:
> On Wed, Feb 22, 2023 at 11:09:51AM +0100, Borislav Petkov wrote:
>> On Wed, Feb 22, 2023 at 09:38:09AM +0000, Andrew Cooper wrote:
>>> This sounds suspiciously like an errata which was fixed with a ucode
>>> update last year.
>> Yes, it looks like it.
>>
>> Alternatively, you can try booting with "clearcpuid=xsaves" - that
>> should take care of your observation too but yeah, you should rather
>> update your microcode.
>>
> Thanks - confirmed, it *doesn't* repro with 0x8301055, but does repro
> with 0x830104d.
For completeness, this is erratum 1386, details of which can be found in
https://www.amd.com/system/files/TechDocs/56323-PUB_1.00.pdf amongst others.
That said, looking at the list of preconditions to tickle this erratum,
I don't see anything in your code which would cause a change in MXCSR.
Which most likely means there's a path in the kernel modifying MXCSR and
that doesn't sound like a thing that ought to be happening by default.
> Annoyingly, I thought I was using the most recent microcode, but it seems
> like there is some bug and debian wasn't applying it at boot.
>
> That seems like a scary errata :-/
Honestly, this is tame compared to some of the things which get fixed in
ucode...
~Andrew
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-22 21:26 ` Tavis Ormandy
2023-02-22 22:17 ` Andrew Cooper
@ 2023-02-22 22:25 ` Borislav Petkov
1 sibling, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2023-02-22 22:25 UTC (permalink / raw)
To: Tavis Ormandy; +Cc: Andrew Cooper, linux-kernel, x86
On Wed, Feb 22, 2023 at 01:26:37PM -0800, Tavis Ormandy wrote:
> Thanks - confirmed, it *doesn't* repro with 0x8301055, but does repro
> with 0x830104d.
Good.
> Annoyingly, I thought I was using the most recent microcode, but it seems
> like there is some bug and debian wasn't applying it at boot.
Well, actually, microcode loading is so simple - you don't absolutely
need to rely on the distros to do it for ya.
You simply get it from
git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
and copy all the bin files from amd-ucode/ into /lib/firmware/amd-ucode/
Technically, you'd need only the one for your family but the tools can
deal with multiple files in there.
And then you're all set - dracut or whatever creates your initrd will
add it. You can even add it by hand, see
Documentation/x86/microcode.rst
> That seems like a scary errata :-/
Yeah, modern x86 hardware is crazy complex. And we've had worse. :-\
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-22 22:17 ` Andrew Cooper
@ 2023-02-22 22:26 ` Borislav Petkov
0 siblings, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2023-02-22 22:26 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Tavis Ormandy, linux-kernel, x86
On Wed, Feb 22, 2023 at 10:17:35PM +0000, Andrew Cooper wrote:
> I don't see anything in your code which would cause a change in MXCSR.
> Which most likely means there's a path in the kernel modifying MXCSR and
> that doesn't sound like a thing that ought to be happening by default.
We were just talking about this internally too. We could trace it to see
what fiddles with MXCSR but meh, bigger fish to fry.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-22 10:09 ` Borislav Petkov
2023-02-22 21:26 ` Tavis Ormandy
@ 2023-02-28 18:47 ` Alexander Monakov
2023-02-28 19:20 ` Borislav Petkov
1 sibling, 1 reply; 29+ messages in thread
From: Alexander Monakov @ 2023-02-28 18:47 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Andrew Cooper, Tavis Ormandy, linux-kernel, x86
On Wed, 22 Feb 2023, Borislav Petkov wrote:
> On Wed, Feb 22, 2023 at 09:38:09AM +0000, Andrew Cooper wrote:
> > This sounds suspiciously like an errata which was fixed with a ucode
> > update last year.
>
> Yes, it looks like it.
>
> Alternatively, you can try booting with "clearcpuid=xsaves" - that
> should take care of your observation too but yeah, you should rather
> update your microcode.
Hi folks,
I can reproduce this bug on AMD Renoir SoC:
vendor_id : AuthenticAMD
cpu family : 23
model : 96
model name : AMD Ryzen 5 4600G with Radeon Graphics
stepping : 1
microcode : 0x8600104
for which there's no microcode update, the microcode_amd_fam17h.bin file
in the linux-firmware.git repo carries only the following patches:
$ ./amd_ucode_info.py microcode_amd_fam17h.bin
Microcode patches in microcode_amd_fam17h.bin:
Family=0x17 Model=0x08 Stepping=0x02: Patch=0x0800820d Length=3200 bytes
Family=0x17 Model=0x01 Stepping=0x02: Patch=0x0800126e Length=3200 bytes
Family=0x17 Model=0x31 Stepping=0x00: Patch=0x08301055 Length=3200 bytes
I've seen microcode version increase after a BIOS update, so it seems like
internally microcode patches exist for Renoir too, but it's up to hardware
vendors to pick them up as part of a BIOS update. Is there any chance of
a conventional release like for the above three CPU models?
Thanks.
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-28 18:47 ` Alexander Monakov
@ 2023-02-28 19:20 ` Borislav Petkov
2023-02-28 19:24 ` Alexander Monakov
0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2023-02-28 19:20 UTC (permalink / raw)
To: Alexander Monakov; +Cc: Andrew Cooper, Tavis Ormandy, linux-kernel, x86
On Tue, Feb 28, 2023 at 09:47:22PM +0300, Alexander Monakov wrote:
> I've seen microcode version increase after a BIOS update,
Are you saying there's no BIOS update for your board?
Are you running the latest BIOS?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-28 19:20 ` Borislav Petkov
@ 2023-02-28 19:24 ` Alexander Monakov
2023-02-28 19:25 ` Borislav Petkov
0 siblings, 1 reply; 29+ messages in thread
From: Alexander Monakov @ 2023-02-28 19:24 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Andrew Cooper, Tavis Ormandy, linux-kernel, x86
On Tue, 28 Feb 2023, Borislav Petkov wrote:
> On Tue, Feb 28, 2023 at 09:47:22PM +0300, Alexander Monakov wrote:
> > I've seen microcode version increase after a BIOS update,
>
> Are you saying there's no BIOS update for your board?
No, on the contrary, if there were no updates I would not be able to
see microcode version increase after updating.
> Are you running the latest BIOS?
Yes.
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-28 19:24 ` Alexander Monakov
@ 2023-02-28 19:25 ` Borislav Petkov
2023-02-28 19:29 ` Alexander Monakov
0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2023-02-28 19:25 UTC (permalink / raw)
To: Alexander Monakov; +Cc: Andrew Cooper, Tavis Ormandy, linux-kernel, x86
On Tue, Feb 28, 2023 at 10:24:40PM +0300, Alexander Monakov wrote:
> No, on the contrary, if there were no updates I would not be able to
> see microcode version increase after updating.
So what are you saying then?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-28 19:25 ` Borislav Petkov
@ 2023-02-28 19:29 ` Alexander Monakov
2023-02-28 19:38 ` Andrew Cooper
2023-02-28 20:56 ` Borislav Petkov
0 siblings, 2 replies; 29+ messages in thread
From: Alexander Monakov @ 2023-02-28 19:29 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Andrew Cooper, Tavis Ormandy, linux-kernel, x86
On Tue, 28 Feb 2023, Borislav Petkov wrote:
> On Tue, Feb 28, 2023 at 10:24:40PM +0300, Alexander Monakov wrote:
> > No, on the contrary, if there were no updates I would not be able to
> > see microcode version increase after updating.
>
> So what are you saying then?
That I can reproduce the bug even with the latest BIOS, i.e. the microcode
in the latest BIOS update does not have the fix, and the linux-firmware.git
microcode does not have a patch for this CPU family at all.
Hence the question how to get fixed microcode for this CPU family.
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-28 19:29 ` Alexander Monakov
@ 2023-02-28 19:38 ` Andrew Cooper
2023-02-28 21:45 ` Alexander Monakov
2023-02-28 20:56 ` Borislav Petkov
1 sibling, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2023-02-28 19:38 UTC (permalink / raw)
To: Alexander Monakov, Borislav Petkov; +Cc: Tavis Ormandy, linux-kernel, x86
On 28/02/2023 7:29 pm, Alexander Monakov wrote:
> On Tue, 28 Feb 2023, Borislav Petkov wrote:
>
>> On Tue, Feb 28, 2023 at 10:24:40PM +0300, Alexander Monakov wrote:
>>> No, on the contrary, if there were no updates I would not be able to
>>> see microcode version increase after updating.
>> So what are you saying then?
> That I can reproduce the bug even with the latest BIOS, i.e. the microcode
> in the latest BIOS update does not have the fix, and the linux-firmware.git
> microcode does not have a patch for this CPU family at all.
>
> Hence the question how to get fixed microcode for this CPU family.
I shouldn't be recommending this, but I've begged AMD many times to be
better at publishing microcode...
https://github.com/platomav/CPUMicrocodes
and you'll need
---8<---
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import sys
from pathlib import Path
from struct import pack
fout = sys.stdout
def stream_write(_):
"""Write to the output"""
return fout.buffer.write(_)
blobs = []
for ucode in sorted(sys.argv[1:]):
# cpu00xx0Fxx_
fm = int(ucode[3:11], 16)
eq = int("".join([ucode[5], ucode[6], ucode[9], ucode[10]]), 16)
sz = Path(ucode).stat().st_size
print("%08x, %s => %04x" % (fm, ucode[3:11], eq), file=sys.stderr)
blobs.append((fm, eq, sz, open(ucode, "rb").read()))
equiv = []
for x in blobs:
equiv.append(pack(b"=IIIHH", x[0], 0, 0, x[1], 0))
equiv.append(pack("=IIIHH", 0, 0, 0, 0, 0))
stream_write(pack("I", 0x00414d44)) # Magic
equiv_bin = b"".join(equiv)
stream_write(pack("II", 0, len(equiv_bin))) # Equiv table
stream_write(equiv_bin)
for blob in blobs:
stream_write(pack("II", 1, len(blob[3])))
stream_write(blob[3])
---8<---
to wrap the blob(s) from that git repo into the container format that
Linux and others know how to parse.
~Andrew
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-28 19:29 ` Alexander Monakov
2023-02-28 19:38 ` Andrew Cooper
@ 2023-02-28 20:56 ` Borislav Petkov
2023-02-28 21:16 ` Alexander Monakov
1 sibling, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2023-02-28 20:56 UTC (permalink / raw)
To: Alexander Monakov; +Cc: Andrew Cooper, Tavis Ormandy, linux-kernel, x86
On Tue, Feb 28, 2023 at 10:29:23PM +0300, Alexander Monakov wrote:
> That I can reproduce the bug even with the latest BIOS,
Can you reproduce if you boot with
clearcpuid=xsaves
?
CONFIG_X86_FEATURE_NAMES must be enabled in your .config for that to
work.
Otherwise, try
clearcpuid=323
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-28 20:56 ` Borislav Petkov
@ 2023-02-28 21:16 ` Alexander Monakov
2023-03-01 0:23 ` Andrew Cooper
0 siblings, 1 reply; 29+ messages in thread
From: Alexander Monakov @ 2023-02-28 21:16 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Andrew Cooper, Tavis Ormandy, linux-kernel, x86
On Tue, 28 Feb 2023, Borislav Petkov wrote:
> On Tue, Feb 28, 2023 at 10:29:23PM +0300, Alexander Monakov wrote:
> > That I can reproduce the bug even with the latest BIOS,
>
> Can you reproduce if you boot with
>
> clearcpuid=xsaves
>
> ?
No, with this option it is not reproducible.
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-28 19:38 ` Andrew Cooper
@ 2023-02-28 21:45 ` Alexander Monakov
0 siblings, 0 replies; 29+ messages in thread
From: Alexander Monakov @ 2023-02-28 21:45 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Borislav Petkov, Tavis Ormandy, linux-kernel, x86
On Tue, 28 Feb 2023, Andrew Cooper wrote:
> On 28/02/2023 7:29 pm, Alexander Monakov wrote:
> > On Tue, 28 Feb 2023, Borislav Petkov wrote:
> >
> > That I can reproduce the bug even with the latest BIOS, i.e. the microcode
> > in the latest BIOS update does not have the fix, and the linux-firmware.git
> > microcode does not have a patch for this CPU family at all.
> >
> > Hence the question how to get fixed microcode for this CPU family.
>
> I shouldn't be recommending this, but I've begged AMD many times to be
> better at publishing microcode...
>
> https://github.com/platomav/CPUMicrocodes
>
> and you'll need
[snip script]
Thank you, this is good to have. Unfortunately, even with microcode 0x08600109
taken from that repo the issue is reproducible (it is dated 2022-03-28, so
perhaps not really surprising).
Alexander
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-02-28 21:16 ` Alexander Monakov
@ 2023-03-01 0:23 ` Andrew Cooper
2023-03-01 8:54 ` Borislav Petkov
0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2023-03-01 0:23 UTC (permalink / raw)
To: Alexander Monakov, Borislav Petkov; +Cc: Tavis Ormandy, linux-kernel, x86
On 28/02/2023 9:16 pm, Alexander Monakov wrote:
> On Tue, 28 Feb 2023, Borislav Petkov wrote:
>> On Tue, Feb 28, 2023 at 10:29:23PM +0300, Alexander Monakov wrote:
>>> That I can reproduce the bug even with the latest BIOS,
>> Can you reproduce if you boot with
>>
>> clearcpuid=xsaves
>>
>> ?
> No, with this option it is not reproducible.
Ok.
Given that AMD do appear to have screwed up here, and the exploit does
reliably work on modern versions of Linux and up-to-date firmware, the
next course of action is to clobber XSAVES by default.
So we need a table for all Zen2 parts of ucode revisions below which we
force hide XSAVES as the erratum workaround.
That, or we skip the table and just hide XSAVES unconditionally on all
Fam17/18h CPUs... The Zen1/2 uarches have no supervisor states to
manage (AFAICT - the first supervisor states are CET in Zen3 I think),
and Linux already knows how to use XSAVEC (from virt usecases) which is
equivalent given no supervisor states.
Thoughts?
~Andrew
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-03-01 0:23 ` Andrew Cooper
@ 2023-03-01 8:54 ` Borislav Petkov
2023-03-07 16:45 ` Tavis Ormandy
0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2023-03-01 8:54 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Alexander Monakov, Tavis Ormandy, linux-kernel, x86
On Wed, Mar 01, 2023 at 12:23:01AM +0000, Andrew Cooper wrote:
> So we need a table for all Zen2 parts of ucode revisions below which we
> force hide XSAVES as the erratum workaround.
Working on it - stay tuned.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: x86: AMD Zen2 ymm registers rolling back
2023-03-01 8:54 ` Borislav Petkov
@ 2023-03-07 16:45 ` Tavis Ormandy
0 siblings, 0 replies; 29+ messages in thread
From: Tavis Ormandy @ 2023-03-07 16:45 UTC (permalink / raw)
To: Borislav Petkov; +Cc: Andrew Cooper, Alexander Monakov, linux-kernel, x86
On Wed, Mar 01, 2023 at 09:54:14AM +0100, Borislav Petkov wrote:
> On Wed, Mar 01, 2023 at 12:23:01AM +0000, Andrew Cooper wrote:
> > So we need a table for all Zen2 parts of ucode revisions below which we
> > force hide XSAVES as the erratum workaround.
>
> Working on it - stay tuned.
>
FYI, I heard from someone offlist that they can reproduce with the
latest microcode available on this cpu:
cpu family : 23
model : 17
model name : AMD Ryzen 3 2200G with Radeon Vega Graphics
stepping : 0
microcode : 0x8101016
It seems likely many zen1 systems are affected without any microcode
patch, which seems not great to me.
I guess it would be nice if AMD could provide a list of devices they're
not going to provide a patch for...
Tavis.
--
_o) $ lynx lock.cmpxchg8b.com
/\\ _o) _o) $ finger taviso@sdf.org
_\_V _( ) _( ) @taviso
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch
2023-02-22 6:40 x86: AMD Zen2 ymm registers rolling back Tavis Ormandy
2023-02-22 8:33 ` Borislav Petkov
@ 2023-03-07 17:46 ` Andrew Cooper
2023-03-07 17:50 ` Borislav Petkov
2023-03-08 16:23 ` [tip: x86/urgent] x86/CPU/AMD: Disable XSAVES on AMD family 0x17 tip-bot2 for Andrew Cooper
1 sibling, 2 replies; 29+ messages in thread
From: Andrew Cooper @ 2023-03-07 17:46 UTC (permalink / raw)
To: LKML
Cc: Andrew Cooper, Tavis Ormandy, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
Alexander Monakov
AMD Erratum 1386 is summarised as:
XSAVES Instruction May Fail to Save XMM Registers to the Provided
State Save Area
This piece of accidental chronomancy causes the %xmm registers to
occasionally reset back to an older value.
Ignore the XSAVES feature on all AMD Zen1/2 hardware. The XSAVEC
instruction (which works fine) is equivalent on affected parts.
Reported-by: Tavis Ormandy <taviso@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: x86@kernel.org
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: linux-kernel@vger.kernel.org
CC: Tavis Ormandy <taviso@gmail.com>
CC: Alexander Monakov <amonakov@ispras.ru>
Only compile tested.
This wants backporting to all stable trees that understand XSAVES, but
before 5.19(?) needs the XSAVEC support backporting too...
---
arch/x86/kernel/cpu/amd.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 380753b14cab..f3a4bb479fd5 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -890,6 +890,17 @@ static void init_amd_zn(struct cpuinfo_x86 *c)
node_reclaim_distance = 32;
#endif
+ /*
+ * Work around Erratum 1386. The XSAVES instruction malfunctions in
+ * certain circumstances on Zen1/2 uarch, and not all parts have had
+ * updated microcode at the time of writing (March 2023).
+ *
+ * Affected parts all have no supervisor XSAVE states, meaning that
+ * the XSAVEC instruction (which works fine) is equivelent.
+ */
+ if (c->x86 == 0x17)
+ clear_cpu_cap(c, X86_FEATURE_XSAVES);
+
/* Fix up CPUID bits, but only if not virtualised. */
if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) {
--
2.30.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch
2023-03-07 17:46 ` [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch Andrew Cooper
@ 2023-03-07 17:50 ` Borislav Petkov
2023-03-07 18:22 ` Andrew Cooper
2023-03-08 16:23 ` [tip: x86/urgent] x86/CPU/AMD: Disable XSAVES on AMD family 0x17 tip-bot2 for Andrew Cooper
1 sibling, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2023-03-07 17:50 UTC (permalink / raw)
To: Andrew Cooper
Cc: LKML, Tavis Ormandy, Thomas Gleixner, Ingo Molnar, Dave Hansen,
x86, H. Peter Anvin, Alexander Monakov
On Tue, Mar 07, 2023 at 05:46:43PM +0000, Andrew Cooper wrote:
> + * Work around Erratum 1386. The XSAVES instruction malfunctions in
> + * certain circumstances on Zen1/2 uarch, and not all parts have had
> + * updated microcode at the time of writing (March 2023).
Any reason for rushing this and not waiting for me to clarify affected
models first?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch
2023-03-07 17:50 ` Borislav Petkov
@ 2023-03-07 18:22 ` Andrew Cooper
2023-03-07 18:56 ` Borislav Petkov
0 siblings, 1 reply; 29+ messages in thread
From: Andrew Cooper @ 2023-03-07 18:22 UTC (permalink / raw)
To: Borislav Petkov
Cc: LKML, Tavis Ormandy, Thomas Gleixner, Ingo Molnar, Dave Hansen,
x86, H. Peter Anvin, Alexander Monakov
On 07/03/2023 5:50 pm, Borislav Petkov wrote:
> On Tue, Mar 07, 2023 at 05:46:43PM +0000, Andrew Cooper wrote:
>> + * Work around Erratum 1386. The XSAVES instruction malfunctions in
>> + * certain circumstances on Zen1/2 uarch, and not all parts have had
>> + * updated microcode at the time of writing (March 2023).
> Any reason for rushing this and not waiting for me to clarify affected
> models first?
Well yes - more and more reports of impacted systems.
Having the full list of affected models is great and all, but how is it
going to change this patch as a workaround ?
~Andrew
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch
2023-03-07 18:22 ` Andrew Cooper
@ 2023-03-07 18:56 ` Borislav Petkov
2023-03-07 20:01 ` Andrew Cooper
0 siblings, 1 reply; 29+ messages in thread
From: Borislav Petkov @ 2023-03-07 18:56 UTC (permalink / raw)
To: Andrew Cooper
Cc: LKML, Tavis Ormandy, Thomas Gleixner, Ingo Molnar, Dave Hansen,
x86, H. Peter Anvin, Alexander Monakov
On Tue, Mar 07, 2023 at 06:22:01PM +0000, Andrew Cooper wrote:
> Well yes - more and more reports of impacted systems.
>
> Having the full list of affected models is great and all, but how is it
> going to change this patch as a workaround ?
We don't have to clear the feature flag on those systems which have
a fix.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch
2023-03-07 18:56 ` Borislav Petkov
@ 2023-03-07 20:01 ` Andrew Cooper
2023-03-07 20:09 ` Borislav Petkov
2023-03-14 16:01 ` Dave Hansen
0 siblings, 2 replies; 29+ messages in thread
From: Andrew Cooper @ 2023-03-07 20:01 UTC (permalink / raw)
To: Borislav Petkov
Cc: LKML, Tavis Ormandy, Thomas Gleixner, Ingo Molnar, Dave Hansen,
x86, H. Peter Anvin, Alexander Monakov
On 07/03/2023 6:56 pm, Borislav Petkov wrote:
> On Tue, Mar 07, 2023 at 06:22:01PM +0000, Andrew Cooper wrote:
>> Well yes - more and more reports of impacted systems.
>>
>> Having the full list of affected models is great and all, but how is it
>> going to change this patch as a workaround ?
> We don't have to clear the feature flag on those systems which have
> a fix.
Sure, but why is that helpful?
XSAVES and XSAVEC are functionally identical on Zen1/2 because these
CPUs don't advertise any supervisor XSAVE states.
It is only Zen3 where XSAVES starts doing more than XSAVEC (and even
then, only after the CET series actually gets merged...)
~Andrew
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch
2023-03-07 20:01 ` Andrew Cooper
@ 2023-03-07 20:09 ` Borislav Petkov
2023-03-14 16:01 ` Dave Hansen
1 sibling, 0 replies; 29+ messages in thread
From: Borislav Petkov @ 2023-03-07 20:09 UTC (permalink / raw)
To: Andrew Cooper
Cc: LKML, Tavis Ormandy, Thomas Gleixner, Ingo Molnar, Dave Hansen,
x86, H. Peter Anvin, Alexander Monakov
On Tue, Mar 07, 2023 at 08:01:36PM +0000, Andrew Cooper wrote:
> Sure, but why is that helpful?
>
> XSAVES and XSAVEC are functionally identical on Zen1/2 because these
> CPUs don't advertise any supervisor XSAVE states.
I guess... We'll know soon enough.
> It is only Zen3 where XSAVES starts doing more than XSAVEC (and even
> then, only after the CET series actually gets merged...)
Yeah, latter should probably happen this time around. :-)
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 29+ messages in thread
* [tip: x86/urgent] x86/CPU/AMD: Disable XSAVES on AMD family 0x17
2023-03-07 17:46 ` [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch Andrew Cooper
2023-03-07 17:50 ` Borislav Petkov
@ 2023-03-08 16:23 ` tip-bot2 for Andrew Cooper
1 sibling, 0 replies; 29+ messages in thread
From: tip-bot2 for Andrew Cooper @ 2023-03-08 16:23 UTC (permalink / raw)
To: linux-tip-commits
Cc: Tavis Ormandy, Andrew Cooper, Borislav Petkov (AMD),
stable, x86, linux-kernel
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: b0563468eeac88ebc70559d52a0b66efc37e4e9d
Gitweb: https://git.kernel.org/tip/b0563468eeac88ebc70559d52a0b66efc37e4e9d
Author: Andrew Cooper <andrew.cooper3@citrix.com>
AuthorDate: Tue, 07 Mar 2023 17:46:43
Committer: Borislav Petkov (AMD) <bp@alien8.de>
CommitterDate: Wed, 08 Mar 2023 16:56:08 +01:00
x86/CPU/AMD: Disable XSAVES on AMD family 0x17
AMD Erratum 1386 is summarised as:
XSAVES Instruction May Fail to Save XMM Registers to the Provided
State Save Area
This piece of accidental chronomancy causes the %xmm registers to
occasionally reset back to an older value.
Ignore the XSAVES feature on all AMD Zen1/2 hardware. The XSAVEC
instruction (which works fine) is equivalent on affected parts.
[ bp: Typos, move it into the F17h-specific function. ]
Reported-by: Tavis Ormandy <taviso@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20230307174643.1240184-1-andrew.cooper3@citrix.com
---
arch/x86/kernel/cpu/amd.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 380753b..95cdd08 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -880,6 +880,15 @@ void init_spectral_chicken(struct cpuinfo_x86 *c)
}
}
#endif
+ /*
+ * Work around Erratum 1386. The XSAVES instruction malfunctions in
+ * certain circumstances on Zen1/2 uarch, and not all parts have had
+ * updated microcode at the time of writing (March 2023).
+ *
+ * Affected parts all have no supervisor XSAVE states, meaning that
+ * the XSAVEC instruction (which works fine) is equivalent.
+ */
+ clear_cpu_cap(c, X86_FEATURE_XSAVES);
}
static void init_amd_zn(struct cpuinfo_x86 *c)
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch
2023-03-07 20:01 ` Andrew Cooper
2023-03-07 20:09 ` Borislav Petkov
@ 2023-03-14 16:01 ` Dave Hansen
1 sibling, 0 replies; 29+ messages in thread
From: Dave Hansen @ 2023-03-14 16:01 UTC (permalink / raw)
To: Andrew Cooper, Borislav Petkov
Cc: LKML, Tavis Ormandy, Thomas Gleixner, Ingo Molnar, Dave Hansen,
x86, H. Peter Anvin, Alexander Monakov
On 3/7/23 12:01, Andrew Cooper wrote:
> On 07/03/2023 6:56 pm, Borislav Petkov wrote:
>> On Tue, Mar 07, 2023 at 06:22:01PM +0000, Andrew Cooper wrote:
>>> Well yes - more and more reports of impacted systems.
>>>
>>> Having the full list of affected models is great and all, but how is it
>>> going to change this patch as a workaround ?
>> We don't have to clear the feature flag on those systems which have
>> a fix.
> Sure, but why is that helpful?
>
> XSAVES and XSAVEC are functionally identical on Zen1/2 because these
> CPUs don't advertise any supervisor XSAVE states.
On Intel at least, XSAVES uses the modified optimization and XSAVEC does
not. I'm not sure if you include this in your definition of
"functionally identical", but it is one of those little XSAVE
architecture oddities that's bitten me a time or two.
The AMD docs don't talk about the modified optimization verbatim, but I
think this nugget is alluding to the same behavior:
> 4. XRSTOR loads an internal state value XRSTOR_INFO that can be used to further optimize a
> subsequent XSAVEOPT or XSAVES. This reflects the current privilege level and virtualization
> mode as well as the save area's base address and XCOMP_BV field.
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2023-03-14 16:01 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-22 6:40 x86: AMD Zen2 ymm registers rolling back Tavis Ormandy
2023-02-22 8:33 ` Borislav Petkov
2023-02-22 9:14 ` Borislav Petkov
2023-02-22 9:38 ` Andrew Cooper
2023-02-22 10:09 ` Borislav Petkov
2023-02-22 21:26 ` Tavis Ormandy
2023-02-22 22:17 ` Andrew Cooper
2023-02-22 22:26 ` Borislav Petkov
2023-02-22 22:25 ` Borislav Petkov
2023-02-28 18:47 ` Alexander Monakov
2023-02-28 19:20 ` Borislav Petkov
2023-02-28 19:24 ` Alexander Monakov
2023-02-28 19:25 ` Borislav Petkov
2023-02-28 19:29 ` Alexander Monakov
2023-02-28 19:38 ` Andrew Cooper
2023-02-28 21:45 ` Alexander Monakov
2023-02-28 20:56 ` Borislav Petkov
2023-02-28 21:16 ` Alexander Monakov
2023-03-01 0:23 ` Andrew Cooper
2023-03-01 8:54 ` Borislav Petkov
2023-03-07 16:45 ` Tavis Ormandy
2023-03-07 17:46 ` [PATCH] x86/amd: Work around Erratum 1386 - XSAVES malfunction on context switch Andrew Cooper
2023-03-07 17:50 ` Borislav Petkov
2023-03-07 18:22 ` Andrew Cooper
2023-03-07 18:56 ` Borislav Petkov
2023-03-07 20:01 ` Andrew Cooper
2023-03-07 20:09 ` Borislav Petkov
2023-03-14 16:01 ` Dave Hansen
2023-03-08 16:23 ` [tip: x86/urgent] x86/CPU/AMD: Disable XSAVES on AMD family 0x17 tip-bot2 for Andrew Cooper
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.