FSGSBASE causing panic on 5.9-rc1

From: Tom Lendacky @ 2020-08-19 18:07 UTC
  To: Linux Kernel Mailing List, X86 ML
  Cc: Andy Lutomirski, Chang S. Bae, Thomas Gleixner, Sasha Levin,
	Borislav Petkov, Peter Zijlstra, Ingo Molnar

It looks like the FSGSBASE support is crashing my second-generation EPYC
system. I was able to bisect it to:

b745cfba44c1 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")

The panic only happens when using KVM. Kernel builds and stress runs on
bare metal appear fine, but if I start a guest (64 vCPUs in this case)
and do a kernel build inside it, I get the following:

[  120.360637] BUG: scheduling while atomic: qemu-system-x86/5485/0x00110000
[  124.041646] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: x86_pmu_handle_irq+0x163/0x170
[  124.041647] ------------[ cut here ]------------
[  124.041649] Hardware name: AMD
[  124.041649] Workqueue:  0x0 (events)
[  124.041651] Call Trace:
[  124.041651] ------------[ cut here ]------------
[  124.041652] corrupted preempt_count: kworker/22:1/1449/0x110000
[  124.051267] WARNING: CPU: 22 PID: 1449 at kernel/sched/core.c:3595 finish_task_switch+0x289/0x290
[  124.051268] Modules linked in: tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc fuse amd64_edac_mod edac_mce_amd wmi_bmof kvm_amd kvm irqbypass sg ipmi_ssif ccp k10temp acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq squashfs loop sch_fq_codel parport_pc ppdev lp parport ip_tables raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear sd_mod t10_pi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper ast drm_vram_helper drm_ttm_helper i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect ahci sysimgblt libahci fb_sys_fops libata drm e1000e i2c_piix4 wmi i2c_designware_platform i2c_designware_core pinctrl_amd i2c_core
[  124.051285] CPU: 22 PID: 1449 Comm: kworker/22:1 Tainted: G        W         5.9.0-rc1-sos-linux #1
[  124.051286] Hardware name: AMD
[  124.051286] Workqueue:  0x0 (events)
[  124.051287] RIP: 0010:finish_task_switch+0x289/0x290
[  124.051288] Code: ff 65 48 8b 04 25 c0 7b 01 00 8b 90 a8 08 00 00 48 8d b0 b0 0a 00 00 48 c7 c7 20 10 10 86 c6 05 be aa 55 01 01 e8 89 03 fd ff <0f> 0b e9 6b ff ff ff 55 48 89 e5 41 55 41 54 49 89 fc 53 48 89 f3
[  124.051288] RSP: 0018:ffffc9001afe7e10 EFLAGS: 00010082
[  124.051289] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000023
[  124.051290] RDX: 0000000000000023 RSI: ffffffff86101044 RDI: ffff88900d798bb0
[  124.051290] RBP: ffffc9001afe7e38 R08: ffff88900d798ba8 R09: 0000000000000005
[  124.051290] R10: 000000000000000f R11: ffff88900d798d54 R12: ffff88900d7aacc0
[  124.051291] R13: ffff889bd2308000 R14: 0000000000000000 R15: ffff88900d7aacc0
[  124.051291] FS:  0000000000000000(0000) GS:ffff88900d780000(0000) knlGS:0000000000000000
[  124.051292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  124.051292] CR2: 00007ff607620000 CR3: 0000001bcb0d2000 CR4: 0000000000350ee0
[  124.051293] Call Trace:
[  124.051293]  __schedule+0x348/0x810
[  124.051293]  ? dbs_work_handler+0x47/0x60
[  124.051294]  schedule+0x4a/0xb0
[  124.051294]  worker_thread+0xcf/0x3b0
[  124.051294]  ? process_one_work+0x370/0x370
[  124.051294]  kthread+0xfe/0x140
[  124.051295]  ? kthread_park+0x90/0x90
[  124.051295]  ret_from_fork+0x22/0x30
[  124.051295] ---[ end trace 7f77ee8ad05caa89 ]---
[  124.051296] Kernel Offset: disabled
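
For readers who want to interpret the 0x00110000 value reported in the
"scheduling while atomic" and "corrupted preempt_count" lines, here is a
minimal decoding sketch. It assumes the field layout from
include/linux/preempt.h as I understand it for v5.9 (8 preempt bits, 8
softirq bits, 4 hardirq bits, then the NMI bit); the helper name
decode_preempt_count is hypothetical and is not a kernel function.

```python
# Assumed preempt_count layout (low bits first), per include/linux/preempt.h
# as understood for v5.9. This is an assumption, not taken from the report.
FIELDS = (("preempt", 8), ("softirq", 8), ("hardirq", 4), ("nmi", 1))


def decode_preempt_count(value):
    """Split a raw preempt_count value into its raw per-field values."""
    decoded = {}
    shift = 0
    for name, bits in FIELDS:
        decoded[name] = (value >> shift) & ((1 << bits) - 1)
        shift += bits
    return decoded


# Value taken from the log above.
print(decode_preempt_count(0x00110000))
```

Under that assumed layout the value has the hardirq and NMI fields both
set to 1 and the preempt and softirq fields at 0. That would be
consistent with counts left over from NMI context, which fits the
x86_pmu_handle_irq frame named in the panic, though the log alone does
not prove it.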

Specifying nofsgsbase avoids the issue. This is very reproducible, so I
can easily test any fixes.
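
When testing with and without nofsgsbase, one rough way to confirm which
mode a booted kernel is in is to look for the fsgsbase flag in
/proc/cpuinfo. This sketch assumes that nofsgsbase clears the CPU feature
bit so the flag no longer appears there, which is my understanding and
not something stated in this report; has_cpu_flag is a hypothetical
helper.

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` appears as a whole word on a 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Compare whole tokens so a substring match cannot give a false hit.
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False


try:
    with open("/proc/cpuinfo") as f:
        print("fsgsbase advertised:", has_cpu_flag(f.read(), "fsgsbase"))
except OSError:
    print("/proc/cpuinfo not available on this system")
```

Running it on the host and inside the guest shows whether each side
advertises the feature.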

Thanks,
Tom
