From: Paul Menzel <pmenzel@molgen.mpg.de>
To: Jiri Kosina <jikos@kernel.org>
Cc: x86@kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Thomas Lendacky <Thomas.Lendacky@amd.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Borislav Petkov <bp@alien8.de>
Subject: Re: General protection fault in `switch_mm_irqs_off()`
Date: Wed, 9 Jan 2019 14:19:24 +0100	[thread overview]
Message-ID: <e2cbef46-d054-9bd2-5d8c-0b82457738ad@molgen.mpg.de> (raw)
In-Reply-To: <cb7ba667-562b-1e4c-f16e-7c11804bc98a@molgen.mpg.de>

[-- Attachment #1: Type: text/plain, Size: 9066 bytes --]

Dear Jiri, dear Thomas, dear Borislav,


On 01/09/19 13:06, Paul Menzel wrote:

> On 01/04/19 17:42, Jiri Kosina wrote:
>>
>> [ added some CCs ]
> 
> Thank you for your reply and taking care of that. I am sorry for the
> late reply. It took a while to test this.
> 
>> On Thu, 3 Jan 2019, Paul Menzel wrote:
> 
>>> On the server board Asus KGPE-D16 with AMD Opteron 6278 processor
>>> updating the microcode update in the firmware from 0x0600062e to
>>> 0x0600063e seems to cause a general protection fault with Linux
>>> 4.14.87 and 4.20-rc7.
> 
> Just a minor correction. The previous microcode update version was
> 0x0600063d, and it looks like I am getting the same failure with
> that and Linux 4.14.87.

I was mistaken. Everything is fine with 0x0600063d.

> It boots fine, when not applying any microcode update (0x00000000).
> 
> To answer Thomas’ question, the microcode is updated in the
> firmware (coreboot). (Asus didn’t publish any updates.)
> 
>>>> 46.859: [    7.573240] microcode: CPU31: patch_level=0x0600063e
>>>> 46.859: [    7.578507] microcode: Microcode Update Driver: v2.2.
>>>> 46.860: [    7.578539] sched_clock: Marking stable (6510054745, 1068444659)->(7999876773, -421377369)
>>>> 46.860: [    7.593013] registered taskstats version 1
>>>> 46.861: [    7.598091] rtc_cmos 00:00: setting system clock to 2000-01-01 08:01:51 UTC (946713711)
>>>> 46.862: [    7.606575] ALSA device list:
>>>> 46.862: [    7.609802]   No soundcards found.
>>>> 46.865: [    7.615887] Freeing unused kernel image memory: 1564K
>>>> 46.871: [    7.627073] Write protecting the kernel read-only data: 20480k
>>>> 46.872: [    7.634366] Freeing unused kernel image memory: 2016K
>>>> 46.873: [    7.640297] Freeing unused kernel image memory: 584K
>>>> 46.874: [    7.645521] Run /init as init process
>>>> 46.877: [    7.652262] general protection fault: 0000 [#1] SMP NOPTI
>>>> 46.877: [    7.657931] CPU: 18 PID: 0 Comm: swapper/18 Not tainted 4.20.0-rc7.mx64.237 #1
>>>> 46.877: [    7.665514] Hardware name: ASUS KGPE-D16/KGPE-D16, BIOS 4.9-103-g637bef2037 01/02/2019
>>>> 46.878: [    7.673804] RIP: 0010:switch_mm_irqs_off+0xb2/0x640
>>>> 46.878: [    7.678948] Code: 48 c1 ef 09 83 e7 01 48 09 c7 65 48 8b 05 8e 34 fc 7e 48 39 c7 74 15 48 09 f8 a8 01 74 0e b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 65 48 89 3d 6c 34 fc 7e 8b 05 9a ef a7 01 85 c0 0f 8f 41 04
>>>> 46.879: [    7.698394] RSP: 0018:ffffc90006343e20 EFLAGS: 00010046
>>>> 46.879: [    7.703844] RAX: 0000000000000001 RBX: ffff88981ca0b800 RCX: 0000000000000049
>>>> 46.879: [    7.711238] RDX: 0000000000000000 RSI: ffff88981b87cf80 RDI: ffff88981ca0b800
>>>> 46.880: [    7.718665] RBP: ffffc90006343e70 R08: 00000001c81bec00 R09: 0000000000000000
>>>> 46.880: [    7.726092] R10: ffffc90006343e88 R11: 0000000000000000 R12: ffffffff82479b40
>>>> 46.880: [    7.733494] R13: 0000000000000000 R14: 0000000000000012 R15: ffff88981dd50080
>>>> 46.881: [    7.740853] FS:  0000000000000000(0000) GS:ffff88981fa80000(0000) knlGS:0000000000000000
>>>> 46.881: [    7.749318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> 46.881: [    7.755281] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000406e0
>>>> 46.881: [    7.762761] Call Trace:
>>>> 46.881: [    7.765369]  ? __schedule+0x1b9/0x7b0
>>>> 46.882: [    7.769253]  __schedule+0x1b9/0x7b0
>>>> 46.882: [    7.772930]  schedule_idle+0x1e/0x40
>>>> 46.882: [    7.776744]  do_idle+0x146/0x200
>>>> 46.882: [    7.780181]  cpu_startup_entry+0x19/0x20
>>>> 46.883: [    7.784274]  start_secondary+0x183/0x1b0
>>>> 46.883: [    7.788409]  secondary_startup_64+0xa4/0xb0
>>>> 46.883: [    7.792766] Modules linked in:
>>>> 46.883: [    7.796105] ---[ end trace a423e363fe1ecf67 ]---
>>>> 46.884: [    7.800939] RIP: 0010:switch_mm_irqs_off+0xb2/0x640
>>>> 46.884: [    7.806048] Code: 48 c1 ef 09 83 e7 01 48 09 c7 65 48 8b 05 8e 34 fc 7e 48 39 c7 74 15 48 09 f8 a8 01 74 0e b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 65 48 89 3d 6c 34 fc 7e 8b 05 9a ef a7 01 85 c0 0f 8f 41 04
>>
>> So this faults when writing PRED_CMD_IBPB to MSR_IA32_PRED_CMD, but that 
>> should be properly patched out on ucodes that don't support IBPB.
>>
>> This almost looks like the ucode you updated to would advertise IBPB 
>> availability, but then fault when it's used.
> 
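The Code: line and the register dump in the oops are consistent with that reading. As a minimal sketch in Python (not part of the original report; the helper name `faulting_wrmsr` is made up for illustration, and it only recognises this one instruction pattern), the bytes around the marked fault address can be decoded like this:

```python
# Bytes copied from the "Code:" line of the oops; <0f> marks the byte at RIP.
CODE = (
    "48 c1 ef 09 83 e7 01 48 09 c7 65 48 8b 05 8e 34 fc 7e 48 39 c7 74 15 "
    "48 09 f8 a8 01 74 0e b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 "
    "65 48 89 3d 6c 34 fc 7e 8b 05 9a ef a7 01 85 c0 0f 8f 41 04"
)


def faulting_wrmsr(code_line):
    """Return the MSR number and low 32 bits written, or None.

    Only the pattern 'mov ecx, imm32; mov eax, imm32; xor edx, edx; wrmsr'
    is recognised. This is an illustrative matcher, not a disassembler.
    """
    tokens = code_line.split()
    # Index of the byte the kernel wrapped in <...> (the faulting RIP).
    idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    before = [int(t, 16) for t in tokens[:idx]]
    at_rip = [int(tokens[idx].strip("<>"), 16)]
    at_rip += [int(t, 16) for t in tokens[idx + 1:]]

    if at_rip[:2] != [0x0F, 0x30]:          # 0f 30 is the WRMSR opcode
        return None
    tail = before[-12:]                      # the 12 bytes right before RIP
    if len(tail) < 12 or tail[0] != 0xB9 or tail[5] != 0xB8:
        return None                          # b9 = mov ecx, b8 = mov eax
    if tail[10:12] != [0x31, 0xD2]:          # 31 d2 = xor edx, edx
        return None
    return {
        "msr": int.from_bytes(bytes(tail[1:5]), "little"),
        "value": int.from_bytes(bytes(tail[6:10]), "little"),
    }


result = faulting_wrmsr(CODE)
```

For the bytes above, `result` names MSR 0x49 with value 0x1, which matches RCX, RAX and RDX in the register dump.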
> As it also happens with the previous firmware version, is it possible that
> the check is incorrect? Maybe there are not a lot of people running AMD
> Opteron servers and latest Linux or Linux stable kernels?
> 
>> I guess that booting with 'spectre_v2_user=off' makes the issue go away, 
>> right?
> 
> Indeed. That makes it boot with microcode updates applied.
> 
>     [    0.000000] Command line: BOOT_IMAGE=/boot/bzImage-4.14.87.mx64.236 crashkernel=256M root=LABEL=root ro console=ttyS0,115200n8 console=ttyS1,115200n8 console=tty0 init=/bin/systemd audit=0 spectre_v2_user=off
>     […]
>     [    3.809210] microcode: CPU0: patch_level=0x0600063e
> 
>> What happens then if you manually wrmsr 0x1 to MSR 0x49 from userspace? 
> 
> With no microcode updates applied, I get:
> 
>     $ dmesg | grep 'microcode: CPU0: patch_level'
>     [    3.817171] microcode: CPU0: patch_level=0x00000000
>     $ sudo modprobe msr
>     $ sudo ./wrmsr 0x49 0x1 # https://github.com/01org/msr-tools
>     wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
> 
> I get the same with microcode updates applied.
> 
>     $ dmesg | grep 'microcode: CPU0: patch_level'
>     [    3.809210] microcode: CPU0: patch_level=0x0600063e
>     $ sudo modprobe msr
>     $ sudo ./wrmsr 0x49 0x1
>     wrmsr: CPU 0 cannot set MSR 0x00000049 to 0x0000000000000001
> 
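For reference, the wrmsr call above boils down to an 8-byte write to /dev/cpu/N/msr at a file offset equal to the MSR number. A rough Python sketch under that assumption follows; the function name and the `path` override are only for illustration, and running it against the real device needs root and the msr module loaded:

```python
import os
import struct


def wrmsr(msr, value, cpu=0, path=None):
    """Write a 64-bit value to an MSR through the msr character device.

    `path` is a hypothetical override so the function can be exercised
    against an ordinary file instead of /dev/cpu/<cpu>/msr.
    """
    path = path or f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_WRONLY)
    try:
        # The MSR number is used as the file offset; the payload is the
        # 64-bit value in little-endian byte order.
        os.pwrite(fd, struct.pack("<Q", value), msr)
    finally:
        os.close(fd)
```

Calling `wrmsr(0x49, 0x1)` corresponds to `./wrmsr 0x49 0x1`. As far as I understand the msr driver, a write the CPU rejects surfaces as an OSError (EIO), which would be the "cannot set MSR" case shown above.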
>> Could you please post /proc/cpuinfo from such a boot as well?
> 
>> Leaving the rest of the original mail for reference.
> 
>     processor   : 0
>     vendor_id   : AuthenticAMD
>     cpu family  : 21
>     model               : 1
>     model name  : AMD Opteron(tm) Processor 6278
>     stepping    : 2
>     microcode   : 0x600063e
>     cpu MHz             : 1871.198
>     cache size  : 2048 KB
>     physical id : 0
>     siblings    : 16
>     core id             : 0
>     cpu cores   : 8
>     apicid              : 0
>     initial apicid      : 0
>     fpu         : yes
>     fpu_exception       : yes
>     cpuid level : 13
>     wp          : yes
>     flags               : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
>     bugs                : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass
>     bogomips    : 4799.84
>     TLB size    : 1536 4K pages
>     clflush size        : 64
>     cache_alignment     : 64
>     address sizes       : 48 bits physical, 48 bits virtual
>     power management: ts ttp tm 100mhzsteps hwpstate cpb
> 
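The relevant detail in that output is that `ibpb` shows up in the flags line, so the kernel believes the feature is available. A small Python sketch for checking this on other machines follows; the helper name `cpu_flags` and the shortened `SAMPLE` text are illustrative only, and it assumes the flags sit on a single line as in a real /proc/cpuinfo:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, val = line.partition(":")
        if sep and key.strip() == "flags":
            return set(val.split())
    return set()


# Shortened stand-in for the real file; on a live system one would pass
# open("/proc/cpuinfo").read() instead.
SAMPLE = (
    "processor\t: 0\n"
    "microcode\t: 0x600063e\n"
    "flags\t\t: fpu vme hw_pstate ssbd ibpb vmmcall arat\n"
    "bugs\t\t: spectre_v1 spectre_v2 spec_store_bypass\n"
)

has_ibpb = "ibpb" in cpu_flags(SAMPLE)
```

If `has_ibpb` is true while a write of 0x1 to MSR 0x49 still faults, the advertised feature and the actual behaviour disagree, which is the situation described in this thread.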
> Please find the whole output attached.
> 
>>>> 46.884: [    7.825440] RSP: 0018:ffffc90006343e20 EFLAGS: 00010046
>>>> 46.885: [    7.830855] RAX: 0000000000000001 RBX: ffff88981ca0b800 RCX: 0000000000000049
>>>> 46.885: [    7.838230] RDX: 0000000000000000 RSI: ffff88981b87cf80 RDI: ffff88981ca0b800
>>>> 46.885: [    7.845614] RBP: ffffc90006343e70 R08: 00000001c81bec00 R09: 0000000000000000
>>>> 46.886: [    7.853047] R10: ffffc90006343e88 R11: 0000000000000000 R12: ffffffff82479b40
>>>> 46.886: [    7.860427] R13: 0000000000000000 R14: 0000000000000012 R15: ffff88981dd50080
>>>> 46.886: [    7.867862] FS:  0000000000000000(0000) GS:ffff88981fa80000(0000) knlGS:0000000000000000
>>>> 46.886: [    7.876320] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> 46.887: [    7.882351] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000406e0
>>>> 46.887: [    7.889746] Kernel panic - not syncing: Attempted to kill the idle task!
>>>> 46.888: [    7.896907] Kernel Offset: disabled
>>>> 46.888: [    7.900558] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
>>>
>>> Please find the whole log, including the coreboot messages, attached. The time
>>> stamps in the beginning are from the script `readserial.py` from the SeaBIOS
>>> repository.
> 
> Please find the logs attached.
> 
> I’ll do one more test with the microcode update 0x0600063d, to verify
> that the panic also happens with that microcode version (I am pretty
> certain).

As written above, it looks like I was wrong, and 0x0600063d does not
cause the problem.


Kind regards,

Paul


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]
