* Yet another KPTI regression with 4.14.x series in a VM
@ 2018-01-12 18:19 Laura Abbott
  2018-01-12 18:51 ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Laura Abbott @ 2018-01-12 18:19 UTC
  To: X86 ML, Linux Kernel Mailing List; +Cc: stable

Hi,

Fedora got a bug report on 4.14.11 of a panic when booting a
Fedora guest in a CentOS 6 VM, not reproducible with nopti.
The issue is still present as of 4.14.13 as well. The only
report is a panic screenshot
https://bugzilla.redhat.com/show_bug.cgi?id=1532458

I've lost track of all the fixes that have been flying around,
is this a new issue or has a fix not yet made it to stable?

Thanks,
Laura

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-12 18:19 Yet another KPTI regression with 4.14.x series in a VM Laura Abbott
@ 2018-01-12 18:51 ` Thomas Gleixner
  2018-01-12 21:30   ` Laura Abbott
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2018-01-12 18:51 UTC
  To: Laura Abbott; +Cc: X86 ML, Linux Kernel Mailing List, stable

On Fri, 12 Jan 2018, Laura Abbott wrote:
> Fedora got a bug report on 4.14.11 of a panic when booting a
> Fedora guest in a CentOS 6 VM, not reproducible with nopti.
> The issue is still present as of 4.14.13 as well. The only
> report is a panic screenshot
> https://bugzilla.redhat.com/show_bug.cgi?id=1532458
> 
> I've lost track of all the fixes that have been flying around,
> is this a new issue or has a fix not yet made it to stable?

Hmm. Looks kinda familiar, but that has been fixed I think even before
4.4.11. Could you please ask the reporter to provide a full console log via
the VM "serial console" ?

Thanks,

	tglx

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-12 18:51 ` Thomas Gleixner
@ 2018-01-12 21:30   ` Laura Abbott
  2018-01-12 21:51     ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Laura Abbott @ 2018-01-12 21:30 UTC
  To: Thomas Gleixner; +Cc: X86 ML, Linux Kernel Mailing List, stable

On 01/12/2018 10:51 AM, Thomas Gleixner wrote:
> On Fri, 12 Jan 2018, Laura Abbott wrote:
>> Fedora got a bug report on 4.14.11 of a panic when booting a
>> Fedora guest in a CentOS 6 VM, not reproducible with nopti.
>> The issue is still present as of 4.14.13 as well. The only
>> report is a panic screenshot
>> https://bugzilla.redhat.com/show_bug.cgi?id=1532458
>>
>> I've lost track of all the fixes that have been flying around,
>> is this a new issue or has a fix not yet made it to stable?
> 
> Hmm. Looks kinda familiar, but that has been fixed I think even before
> 4.4.11. Could you please ask the reporter to provide a full console log via
> the VM "serial console" ?
> 
> Thanks,
> 
> 	tglx
> 

[    0.000000] Linux version 4.14.13-300.fc27.x86_64
(mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 7.2.1 20170915 (Red
Hat 7.2.1-2) (GCC)) #1 SMP Thu Jan 11 04:00:01 UTC 2018
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.14.13-300.fc27.x86_64
root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap
rhgb LANG=en_US.UTF-8 console=tty0 console=ttyS0
[    0.000000] Disabled fast string operations
[    0.000000] x86/fpu: x87 FPU will use FXSAVE
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009dc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007fffcfff] usable
[    0.000000] BIOS-e820: [mem 0x000000007fffd000-0x000000007fffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffbc000-0x00000000ffffffff] reserved
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] random: fast init done
[    0.000000] SMBIOS 2.4 present.
[    0.000000] DMI: Red Hat KVM, BIOS 0.5.1 01/01/2007
[    0.000000] Hypervisor detected: KVM
[    0.000000] tsc: Using PIT calibration value
[    0.000000] e820: last_pfn = 0x7fffd max_arch_pfn = 0x400000000
[    0.000000] x86/PAT: PAT MSR is 0, disabled.
[    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC
[    0.000000] found SMP MP-table at [mem 0x000fda30-0x000fda3f] mapped at
[ffffffffff200a30]
[    0.000000] RAMDISK: [mem 0x356e3000-0x36b69fff]
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000FD9E0 000014 (v00 BOCHS )
[    0.000000] ACPI: RSDT 0x000000007FFFD5D0 000034 (v01 BOCHS	BXPCRSDT
00000001 BXPC 00000001)
[    0.000000] ACPI: FACP 0x000000007FFFFE20 000074 (v01 BOCHS	BXPCFACP
00000001 BXPC 00000001)
[    0.000000] ACPI: DSDT 0x000000007FFFD910 0024A2 (v01 BXPC	BXDSDT	
00000001 INTL 20090123)
[    0.000000] ACPI: FACS 0x000000007FFFFDC0 000040
[    0.000000] ACPI: SSDT 0x000000007FFFD810 0000FF (v01 BOCHS	BXPCSSDT
00000001 BXPC 00000001)
[    0.000000] ACPI: APIC 0x000000007FFFD720 000080 (v01 BOCHS	BXPCAPIC
00000001 BXPC 00000001)
[    0.000000] ACPI: SSDT 0x000000007FFFD610 00010F (v01 BXPC	BXSSDTPC
00000001 INTL 20090123)
[    0.000000] No NUMA configuration found
[    0.000000] Faking a node at [mem 0x0000000000000000-0x000000007fffcfff]
[    0.000000] NODE_DATA(0) allocated [mem 0x7ffd2000-0x7fffcfff]
[    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[    0.000000] kvm-clock: cpu 0, msr 0:7ffc2001, primary cpu clock
[    0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles:
0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] Zone ranges:
[    0.000000]	 DMA	  [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]	 DMA32	  [mem 0x0000000001000000-0x000000007fffcfff]
[    0.000000]	 Normal   empty
[    0.000000]	 Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]	 node	0: [mem 0x0000000000001000-0x000000000009cfff]
[    0.000000]	 node	0: [mem 0x0000000000100000-0x000000007fffcfff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007fffcfff]
[    0.000000] ACPI: PM-Timer IO Port: 0xb008
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009d000-0x0009dfff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009e000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
[    0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
[    0.000000] e820: [mem 0x80000000-0xfffbbfff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on KVM
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.000000] setup_percpu: NR_CPUS:1024 nr_cpumask_bits:2 nr_cpu_ids:2
nr_node_ids:1
[    0.000000] percpu: Embedded 44 pages/cpu @ffff891f7fc00000 s139672 r8192
d32360 u1048576
[    0.000000] kvm-stealtime: cpu 0, msr 7fc16240
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 515972
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-4.14.13-300.fc27.x86_64
root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap
rhgb LANG=en_US.UTF-8 console=tty0 console=ttyS0
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 2018548K/2096740K available (12300K kernel code, 1546K
rwdata, 3728K rodata, 2108K init, 1364K bss, 78192K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] Kernel/User page tables isolation: enabled
[    0.000000] ftrace: allocating 35499 entries in 139 pages
[    0.001000] Hierarchical RCU implementation.
[    0.001000]	   RCU restricting CPUs from NR_CPUS=1024 to nr_cpu_ids=2.
[    0.001000]	   Tasks RCU enabled.
[    0.001000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.001000] NR_IRQS: 65792, nr_irqs: 440, preallocated irqs: 16
[    0.001000]	   Offload RCU callbacks from CPUs: .
[    0.001000] Console: colour VGA+ 80x25
[    0.001000] console [tty0] enabled
[    0.001000] console [ttyS0] enabled
[    0.001029] tsc: Detected 3192.954 MHz processor
[    0.003140] Calibrating delay loop (skipped) preset value.. 6385.90 BogoMIPS
(lpj=3192954)
[    0.005015] pid_max: default: 32768 minimum: 301
[    0.007065] ACPI: Core revision 20170728
[    0.012257] ACPI: 3 ACPI AML tables successfully acquired and loaded
[    0.014182] Security Framework initialized
[    0.016033] Yama: becoming mindful.
[    0.018023] SELinux:  Initializing.
[    0.027871] Dentry cache hash table entries: 262144 (order: 9, 2097152
bytes)
[    0.033417] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes)
[    0.035181] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
[    0.037117] Mountpoint-cache hash table entries: 4096 (order: 3, 32768
bytes)
[    0.040551] Disabled fast string operations
[    0.042121] mce: CPU supports 10 MCE banks
[    0.044238] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[    0.046011] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[    0.048013] Spectre V2 mitigation: Vulnerable: Minimal generic ASM retpoline
[    0.051181] Freeing SMP alternatives memory: 36K
[    0.056600] smpboot: Max logical packages: 2
[    0.062819] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.063000] smpboot: CPU0: Intel Common KVM processor (family: 0xf, model:
0x6, stepping: 0x1)
[    0.065264] Performance Events: unsupported Netburst CPU model 6 no PMU
driver, software events only.
[    0.067283] Hierarchical SRCU implementation.
[    0.071996] NMI watchdog: Perf event create on CPU 0 failed with -2
[    0.073011] NMI watchdog: Perf NMI watchdog permanently disabled
[    0.075272] smp: Bringing up secondary CPUs ...
[    0.078795] x86: Booting SMP configuration:
[    0.079021] .... node  #0, CPUs:	 #1
[    0.001000] kvm-clock: cpu 1, msr 0:7ffc2041, secondary cpu clock
[    0.001000] Disabled fast string operations
[    0.111083] kvm-stealtime: cpu 1, msr 7fd16240
[    0.117019] smp: Brought up 1 node, 2 CPUs
[    0.118023] smpboot: Total of 2 processors activated (12771.81 BogoMIPS)
[    0.125008] devtmpfs: initialized
[    0.127172] x86/mm: Memory block size: 128MB
[    0.130505] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
max_idle_ns: 1911260446275000 ns
[    0.132028] futex hash table entries: 512 (order: 3, 32768 bytes)
[    0.134409] pinctrl core: initialized pinctrl subsystem
[    0.137427] RTC time: 21:17:11, date: 01/12/18
[    0.139903] NET: Registered protocol family 16
[    0.143042] cpuidle: using governor menu
[    0.147760] ACPI: bus type PCI registered
[    0.149005] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.152346] PCI: Using configuration type 1 for base access
[    0.159292] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.163359] ACPI: Added _OSI(Module Device)
[    0.165008] ACPI: Added _OSI(Processor Device)
[    0.166017] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.168006] ACPI: Added _OSI(Processor Aggregator Device)
[    0.175536] ACPI: Interpreter enabled
[    0.177039] ACPI: (supports S0 S5)
[    0.179008] ACPI: Using IOAPIC for interrupt routing
[    0.180064] PCI: Using host bridge windows from ACPI; if necessary, use
"pci=nocrs" and report a bug
[    0.183000] ACPI: Enabled 16 GPEs in block 00 to 0F
[    0.193781] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.194025] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.195023] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.196033] acpi PNP0A03:00: fail to add MMCONFIG information, can't access
extended PCI configuration space under this bridge.
[    0.199739] acpiphp: Slot [1] registered
[    0.200107] acpiphp: Slot [2] registered
[    0.201078] acpiphp: Slot [3] registered
[    0.202073] acpiphp: Slot [4] registered
[    0.204013] acpiphp: Slot [5] registered
[    0.205076] acpiphp: Slot [6] registered
[    0.206100] acpiphp: Slot [7] registered
[    0.207071] acpiphp: Slot [8] registered
[    0.208091] acpiphp: Slot [9] registered
[    0.210017] acpiphp: Slot [10] registered
[    0.212048] acpiphp: Slot [11] registered
[    0.213108] acpiphp: Slot [12] registered
[    0.214000] acpiphp: Slot [13] registered
[    0.214088] acpiphp: Slot [14] registered
[    0.215092] acpiphp: Slot [15] registered
[    0.216079] acpiphp: Slot [16] registered
[    0.218037] acpiphp: Slot [17] registered
[    0.219080] acpiphp: Slot [18] registered
[    0.221066] acpiphp: Slot [19] registered
[    0.222077] acpiphp: Slot [20] registered
[    0.223073] acpiphp: Slot [21] registered
[    0.224076] acpiphp: Slot [22] registered
[    0.226014] acpiphp: Slot [23] registered
[    0.227086] acpiphp: Slot [24] registered
[    0.229032] acpiphp: Slot [25] registered
[    0.231030] acpiphp: Slot [26] registered
[    0.232128] acpiphp: Slot [27] registered
[    0.233076] acpiphp: Slot [28] registered
[    0.235139] acpiphp: Slot [29] registered
[    0.236089] acpiphp: Slot [30] registered
[    0.237071] acpiphp: Slot [31] registered
[    0.239000] PCI host bridge to bus 0000:00
[    0.239013] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.240028] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.242032] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff
window]
[    0.244010] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xfebfffff
window]
[    0.245011] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.253711] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io
0x01f0-0x01f7]
[    0.254015] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.255019] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io
0x0170-0x0177]
[    0.256014] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.262301] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4
ACPI
[    0.263059] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4
SMB
[    0.326959] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
[    0.328155] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
[    0.330160] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
[    0.332163] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
[    0.334046] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
[    0.339070] pci 0000:00:02.0: vgaarb: setting as boot VGA device
[    0.340000] pci 0000:00:02.0: vgaarb: VGA device added:
decodes=io+mem,owns=io+mem,locks=none
[    0.340020] pci 0000:00:02.0: vgaarb: bridge control possible
[    0.341006] vgaarb: loaded
[    0.345077] SCSI subsystem initialized
[    0.348291] ACPI: bus type USB registered
[    0.350196] usbcore: registered new interface driver usbfs
[    0.352048] usbcore: registered new interface driver hub
[    0.355022] usbcore: registered new device driver usb
[    0.358186] EDAC MC: Ver: 3.0.0
[    0.362305] PCI: Using ACPI for IRQ routing
[    0.367116] NetLabel: Initializing
[    0.368008] NetLabel:  domain hash size = 128
[    0.369010] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.371093] NetLabel:  unlabeled traffic allowed by default
[    0.374249] clocksource: Switched to clocksource kvm-clock
[    0.441047] VFS: Disk quotas dquot_6.6.0
[    0.446519] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.454161] pnp: PnP ACPI init
[    0.460204] pnp: PnP ACPI: found 5 devices
[    0.487183] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff,
max_idle_ns: 2085701024 ns
[    0.500887] NET: Registered protocol family 2
[    0.512619] TCP established hash table entries: 16384 (order: 5, 131072
bytes)
[    0.524218] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[    0.532472] TCP: Hash tables configured (established 16384 bind 16384)
[    0.540296] UDP hash table entries: 1024 (order: 3, 32768 bytes)
[    0.552314] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
[    0.561106] NET: Registered protocol family 1
[    0.567117] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    0.574096] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[    0.586198] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[    0.593935] pci 0000:00:02.0: Video device with shadowed ROM at [mem
0x000c0000-0x000dffff]
[    0.606572] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
[    0.622820] Unpacking initramfs...
[    2.335287] Freeing initrd memory: 21020K
[    2.344531] audit: initializing netlink subsys (disabled)
[    2.351390] audit: type=2000 audit(1515791833.710:1): state=initialized
audit_enabled=0 res=1
[    2.352623] Initialise system trusted keyrings
[    2.352714] Key type blacklist registered
[    2.353238] workingset: timestamp_bits=36 max_order=19 bucket_order=0
[    2.359953] zbud: loaded
[    2.910025] PANIC: double fault, error_code: 0x0
[    2.910025] CPU: 1 PID: 56 Comm: modprobe Not tainted
4.14.13-300.fc27.x86_64 #1
[    2.910025] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[    2.910025] task: ffff891f78dc3c00 task.stack: ffffa6eac0594000
[    2.910025] RIP: 0010:vprintk_default+0x5/0x30
[    2.910025] RSP: 0000:fffffe000002e000 EFLAGS: 00010046
[    2.910025] RAX: 0000000000000000 RBX: fffffe000002e118 RCX:
0000000000000001
[    2.910025] RDX: 0000000000000000 RSI: fffffe000002e018 RDI:
ffffffffbe0715a0
[    2.910025] RBP: fffffe000002e008 R08: ffffffffbe0bb565 R09:
ffffffffbe07159b
[    2.910025] R10: fffffe000002e080 R11: 0000000000000000 R12:
ffffffffbe070fdd
[    2.910025] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000
[    2.910025] FS:  0000000000000000(0000) GS:ffff891f7fd00000(0000)
knlGS:0000000000000000
[    2.910025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.910025] CR2: fffffe000002dff8 CR3: 0000000078da6000 CR4:
00000000000006e0
[    2.910025] Call Trace:
[    2.910025]	<ENTRY_TRAMPOLINE>
[    2.910025]	? vprintk_func+0x27/0x60
[    2.910025]	printk+0x52/0x6e
[    2.910025]	__die+0x6b/0xe0
[    2.910025]	die+0x2f/0x50
[    2.910025]	do_general_protection+0x149/0x160
[    2.910025]	general_protection+0x2c/0x60
[    2.910025] RIP: 0010:swapgs_restore_regs_and_return_to_usermode+0x6f/0x80
[    2.910025] RSP: 0000:fffffe000002e1c8 EFLAGS: 00000006
[    2.910025] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
0000000000000000
[    2.910025] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
0000000078da7800
[    2.910025] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[    2.910025] R10: 0000000000000000 R11: 0000000000000000 R12:
0000000000000000
[    2.910025] R13: 0000000000000000 R14: 0000000000000000 R15:
0000000000000000
[    2.910025]	</ENTRY_TRAMPOLINE>
[    2.910025] Code: eb 01 e8 ef 9c 7a 00 e8 1a 23 06 00 e8 b5 19 06 00 83 fb
ff 75 e4 8b 5d c8 e9 d2 fc ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <55> 49
89 f8 49 89 f1 31 c9 31 d2 be ff ff ff ff 48 89 e5 31 ff
[    2.910025] Kernel panic - not syncing: Machine halted.
[    2.910025] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation
range: 0xffffffff80000000-0xffffffffbfffffff)
[    2.910025] ---[ end Kernel panic - not syncing: Machine halted.
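
A few numbers in the register dump above can be cross-checked by hand. The byte marked `<55>` in the Code line is the opcode for `push %rbp`, the double-fault RSP sits exactly on a page boundary, and CR2 is eight bytes below it. The sketch below only redoes that arithmetic and subtracts the reported Kernel Offset from one address in the dump. It assumes 4 KiB pages, the helper names are made up for illustration, and every value is copied from the log, so treat it as a reading aid rather than a diagnosis.

```python
# Values copied from the panic output above; nothing here is measured.
PAGE_SIZE = 4096  # assumption: 4 KiB pages on x86_64

rsp = 0xfffffe000002e000    # RSP reported at the double fault
cr2 = 0xfffffe000002dff8    # CR2 (last page-fault address)
kernel_offset = 0x3c000000  # from the "Kernel Offset" line
rdi = 0xffffffffbe0715a0    # RDI at the double fault, inside the relocation range


def page_of(addr: int) -> int:
    """Return the page-aligned base of addr (hypothetical helper)."""
    return addr & ~(PAGE_SIZE - 1)


def unslide(addr: int) -> int:
    """Subtract the reported kernel offset (hypothetical helper)."""
    return addr - kernel_offset


# A 64-bit push stores 8 bytes at RSP - 8.
push_target = rsp - 8

print(hex(push_target), push_target == cr2)
print(rsp % PAGE_SIZE == 0, page_of(push_target) != page_of(rsp))
print(hex(unslide(rdi)))
```

With the values from this log, the push target equals CR2 and lies on the page just below RSP, which is consistent with the push itself being the access that faulted. What that implies about the root cause is not established here.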

Configs and other patches are at
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/log/?h=f27

Note that we did bring in the retpoline patches for 4.14.13 but the
report and panic were the same as with 4.14.11.

Thanks,
Laura

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-12 21:30   ` Laura Abbott
@ 2018-01-12 21:51     ` Thomas Gleixner
  2018-01-13  6:08       ` Andy Lutomirski
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2018-01-12 21:51 UTC
  To: Laura Abbott; +Cc: X86 ML, Linux Kernel Mailing List, stable, Andy Lutomirski

On Fri, 12 Jan 2018, Laura Abbott wrote:

Cc+ Andy

I'm almost crashed out by now. Andy might have an idea. I'll look again
tomorrow with brain awake.

> On 01/12/2018 10:51 AM, Thomas Gleixner wrote:
> > On Fri, 12 Jan 2018, Laura Abbott wrote:
> > > Fedora got a bug report on 4.14.11 of a panic when booting a
> > > Fedora guest in a CentOS 6 VM, not reproducible with nopti.
> > > The issue is still present as of 4.14.13 as well. The only
> > > report is a panic screenshot
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1532458
> > > 
> > > I've lost track of all the fixes that have been flying around,
> > > is this a new issue or has a fix not yet made it to stable?
> > 
> > Hmm. Looks kinda familiar, but that has been fixed I think even before
> > 4.4.11. Could you please ask the reporter to provide a full console log via
> > the VM "serial console" ?
> > 
> > Thanks,
> > 
> > 	tglx
> > 
> 
> [    0.000000] Linux version 4.14.13-300.fc27.x86_64
> (mockbuild@bkernel01.phx2.fedoraproject.org) (gcc version 7.2.1 20170915 (Red
> Hat 7.2.1-2) (GCC)) #1 SMP Thu Jan 11 04:00:01 UTC 2018
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.14.13-300.fc27.x86_64
> root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap
> rhgb LANG=en_US.UTF-8 console=tty0 console=ttyS0
> [    0.000000] Disabled fast string operations
> [    0.000000] x86/fpu: x87 FPU will use FXSAVE
> [    0.000000] e820: BIOS-provided physical RAM map:
> [    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dbff] usable
> [    0.000000] BIOS-e820: [mem 0x000000000009dc00-0x000000000009ffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007fffcfff] usable
> [    0.000000] BIOS-e820: [mem 0x000000007fffd000-0x000000007fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fffbc000-0x00000000ffffffff] reserved
> [    0.000000] NX (Execute Disable) protection: active
> [    0.000000] random: fast init done
> [    0.000000] SMBIOS 2.4 present.
> [    0.000000] DMI: Red Hat KVM, BIOS 0.5.1 01/01/2007
> [    0.000000] Hypervisor detected: KVM
> [    0.000000] tsc: Using PIT calibration value
> [    0.000000] e820: last_pfn = 0x7fffd max_arch_pfn = 0x400000000
> [    0.000000] x86/PAT: PAT MSR is 0, disabled.
> [    0.000000] x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WB  WT  UC- UC
> [    0.000000] found SMP MP-table at [mem 0x000fda30-0x000fda3f] mapped at
> [ffffffffff200a30]
> [    0.000000] RAMDISK: [mem 0x356e3000-0x36b69fff]
> [    0.000000] ACPI: Early table checksum verification disabled
> [    0.000000] ACPI: RSDP 0x00000000000FD9E0 000014 (v00 BOCHS )
> [    0.000000] ACPI: RSDT 0x000000007FFFD5D0 000034 (v01 BOCHS	BXPCRSDT
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: FACP 0x000000007FFFFE20 000074 (v01 BOCHS	BXPCFACP
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: DSDT 0x000000007FFFD910 0024A2 (v01 BXPC	BXDSDT	
> 00000001 INTL 20090123)
> [    0.000000] ACPI: FACS 0x000000007FFFFDC0 000040
> [    0.000000] ACPI: SSDT 0x000000007FFFD810 0000FF (v01 BOCHS	BXPCSSDT
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: APIC 0x000000007FFFD720 000080 (v01 BOCHS	BXPCAPIC
> 00000001 BXPC 00000001)
> [    0.000000] ACPI: SSDT 0x000000007FFFD610 00010F (v01 BXPC	BXSSDTPC
> 00000001 INTL 20090123)
> [    0.000000] No NUMA configuration found
> [    0.000000] Faking a node at [mem 0x0000000000000000-0x000000007fffcfff]
> [    0.000000] NODE_DATA(0) allocated [mem 0x7ffd2000-0x7fffcfff]
> [    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
> [    0.000000] kvm-clock: cpu 0, msr 0:7ffc2001, primary cpu clock
> [    0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles:
> 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
> [    0.000000] Zone ranges:
> [    0.000000]	 DMA	  [mem 0x0000000000001000-0x0000000000ffffff]
> [    0.000000]	 DMA32	  [mem 0x0000000001000000-0x000000007fffcfff]
> [    0.000000]	 Normal   empty
> [    0.000000]	 Device   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]	 node	0: [mem 0x0000000000001000-0x000000000009cfff]
> [    0.000000]	 node	0: [mem 0x0000000000100000-0x000000007fffcfff]
> [    0.000000] Initmem setup node 0 [mem
> 0x0000000000001000-0x000000007fffcfff]
> [    0.000000] ACPI: PM-Timer IO Port: 0xb008
> [    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
> [    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> [    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> [    0.000000] Using ACPI (MADT) for SMP configuration information
> [    0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
> [    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
> [    0.000000] PM: Registered nosave memory: [mem 0x0009d000-0x0009dfff]
> [    0.000000] PM: Registered nosave memory: [mem 0x0009e000-0x0009ffff]
> [    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff]
> [    0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff]
> [    0.000000] e820: [mem 0x80000000-0xfffbbfff] available for PCI devices
> [    0.000000] Booting paravirtualized kernel on KVM
> [    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles:
> 0xffffffff, max_idle_ns: 1910969940391419 ns
> [    0.000000] setup_percpu: NR_CPUS:1024 nr_cpumask_bits:2 nr_cpu_ids:2
> nr_node_ids:1
> [    0.000000] percpu: Embedded 44 pages/cpu @ffff891f7fc00000 s139672 r8192
> d32360 u1048576
> [    0.000000] kvm-stealtime: cpu 0, msr 7fc16240
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 515972
> [    0.000000] Policy zone: DMA32
> [    0.000000] Kernel command line:
> BOOT_IMAGE=/vmlinuz-4.14.13-300.fc27.x86_64
> root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap
> rhgb LANG=en_US.UTF-8 console=tty0 console=ttyS0
> [    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
> [    0.000000] Memory: 2018548K/2096740K available (12300K kernel code, 1546K
> rwdata, 3728K rodata, 2108K init, 1364K bss, 78192K reserved, 0K cma-reserved)
> [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> [    0.000000] Kernel/User page tables isolation: enabled
> [    0.000000] ftrace: allocating 35499 entries in 139 pages
> [    0.001000] Hierarchical RCU implementation.
> [    0.001000]	   RCU restricting CPUs from NR_CPUS=1024 to nr_cpu_ids=2.
> [    0.001000]	   Tasks RCU enabled.
> [    0.001000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
> [    0.001000] NR_IRQS: 65792, nr_irqs: 440, preallocated irqs: 16
> [    0.001000]	   Offload RCU callbacks from CPUs: .
> [    0.001000] Console: colour VGA+ 80x25
> [    0.001000] console [tty0] enabled
> [    0.001000] console [ttyS0] enabled
> [    0.001029] tsc: Detected 3192.954 MHz processor
> [    0.003140] Calibrating delay loop (skipped) preset value.. 6385.90
> BogoMIPS
> (lpj=3192954)
> [    0.005015] pid_max: default: 32768 minimum: 301
> [    0.007065] ACPI: Core revision 20170728
> [    0.012257] ACPI: 3 ACPI AML tables successfully acquired and loaded
> [    0.014182] Security Framework initialized
> [    0.016033] Yama: becoming mindful.
> [    0.018023] SELinux:  Initializing.
> [    0.027871] Dentry cache hash table entries: 262144 (order: 9, 2097152
> bytes)
> [    0.033417] Inode-cache hash table entries: 131072 (order: 8, 1048576
> bytes)
> [    0.035181] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes)
> [    0.037117] Mountpoint-cache hash table entries: 4096 (order: 3, 32768
> bytes)
> [    0.040551] Disabled fast string operations
> [    0.042121] mce: CPU supports 10 MCE banks
> [    0.044238] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> [    0.046011] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
> [    0.048013] Spectre V2 mitigation: Vulnerable: Minimal generic ASM
> retpoline
> [    0.051181] Freeing SMP alternatives memory: 36K
> [    0.056600] smpboot: Max logical packages: 2
> [    0.062819] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [    0.063000] smpboot: CPU0: Intel Common KVM processor (family: 0xf, model:
> 0x6, stepping: 0x1)
> [    0.065264] Performance Events: unsupported Netburst CPU model 6 no PMU
> driver, software events only.
> [    0.067283] Hierarchical SRCU implementation.
> [    0.071996] NMI watchdog: Perf event create on CPU 0 failed with -2
> [    0.073011] NMI watchdog: Perf NMI watchdog permanently disabled
> [    0.075272] smp: Bringing up secondary CPUs ...
> [    0.078795] x86: Booting SMP configuration:
> [    0.079021] .... node  #0, CPUs:	 #1
> [    0.001000] kvm-clock: cpu 1, msr 0:7ffc2041, secondary cpu clock
> [    0.001000] Disabled fast string operations
> [    0.111083] kvm-stealtime: cpu 1, msr 7fd16240
> [    0.117019] smp: Brought up 1 node, 2 CPUs
> [    0.118023] smpboot: Total of 2 processors activated (12771.81 BogoMIPS)
> [    0.125008] devtmpfs: initialized
> [    0.127172] x86/mm: Memory block size: 128MB
> [    0.130505] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff,
> max_idle_ns: 1911260446275000 ns
> [    0.132028] futex hash table entries: 512 (order: 3, 32768 bytes)
> [    0.134409] pinctrl core: initialized pinctrl subsystem
> [    0.137427] RTC time: 21:17:11, date: 01/12/18
> [    0.139903] NET: Registered protocol family 16
> [    0.143042] cpuidle: using governor menu
> [    0.147760] ACPI: bus type PCI registered
> [    0.149005] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
> [    0.152346] PCI: Using configuration type 1 for base access
> [    0.159292] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
> [    0.163359] ACPI: Added _OSI(Module Device)
> [    0.165008] ACPI: Added _OSI(Processor Device)
> [    0.166017] ACPI: Added _OSI(3.0 _SCP Extensions)
> [    0.168006] ACPI: Added _OSI(Processor Aggregator Device)
> [    0.175536] ACPI: Interpreter enabled
> [    0.177039] ACPI: (supports S0 S5)
> [    0.179008] ACPI: Using IOAPIC for interrupt routing
> [    0.180064] PCI: Using host bridge windows from ACPI; if necessary, use
> "pci=nocrs" and report a bug
> [    0.183000] ACPI: Enabled 16 GPEs in block 00 to 0F
> [    0.193781] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
> [    0.194025] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
> [    0.195023] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
> [    0.196033] acpi PNP0A03:00: fail to add MMCONFIG information, can't access
> extended PCI configuration space under this bridge.
> [    0.199739] acpiphp: Slot [1] registered
> [    0.200107] acpiphp: Slot [2] registered
> [    0.201078] acpiphp: Slot [3] registered
> [    0.202073] acpiphp: Slot [4] registered
> [    0.204013] acpiphp: Slot [5] registered
> [    0.205076] acpiphp: Slot [6] registered
> [    0.206100] acpiphp: Slot [7] registered
> [    0.207071] acpiphp: Slot [8] registered
> [    0.208091] acpiphp: Slot [9] registered
> [    0.210017] acpiphp: Slot [10] registered
> [    0.212048] acpiphp: Slot [11] registered
> [    0.213108] acpiphp: Slot [12] registered
> [    0.214000] acpiphp: Slot [13] registered
> [    0.214088] acpiphp: Slot [14] registered
> [    0.215092] acpiphp: Slot [15] registered
> [    0.216079] acpiphp: Slot [16] registered
> [    0.218037] acpiphp: Slot [17] registered
> [    0.219080] acpiphp: Slot [18] registered
> [    0.221066] acpiphp: Slot [19] registered
> [    0.222077] acpiphp: Slot [20] registered
> [    0.223073] acpiphp: Slot [21] registered
> [    0.224076] acpiphp: Slot [22] registered
> [    0.226014] acpiphp: Slot [23] registered
> [    0.227086] acpiphp: Slot [24] registered
> [    0.229032] acpiphp: Slot [25] registered
> [    0.231030] acpiphp: Slot [26] registered
> [    0.232128] acpiphp: Slot [27] registered
> [    0.233076] acpiphp: Slot [28] registered
> [    0.235139] acpiphp: Slot [29] registered
> [    0.236089] acpiphp: Slot [30] registered
> [    0.237071] acpiphp: Slot [31] registered
> [    0.239000] PCI host bridge to bus 0000:00
> [    0.239013] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> [    0.240028] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
> [    0.242032] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff
> window]
> [    0.244010] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xfebfffff
> window]
> [    0.245011] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    0.253711] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io
> 0x01f0-0x01f7]
> [    0.254015] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
> [    0.255019] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io
> 0x0170-0x0177]
> [    0.256014] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
> [    0.262301] pci 0000:00:01.3: quirk: [io  0xb000-0xb03f] claimed by PIIX4
> ACPI
> [    0.263059] pci 0000:00:01.3: quirk: [io  0xb100-0xb10f] claimed by PIIX4
> SMB
> [    0.326959] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11)
> [    0.328155] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11)
> [    0.330160] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11)
> [    0.332163] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11)
> [    0.334046] ACPI: PCI Interrupt Link [LNKS] (IRQs *9)
> [    0.339070] pci 0000:00:02.0: vgaarb: setting as boot VGA device
> [    0.340000] pci 0000:00:02.0: vgaarb: VGA device added:
> decodes=io+mem,owns=io+mem,locks=none
> [    0.340020] pci 0000:00:02.0: vgaarb: bridge control possible
> [    0.341006] vgaarb: loaded
> [    0.345077] SCSI subsystem initialized
> [    0.348291] ACPI: bus type USB registered
> [    0.350196] usbcore: registered new interface driver usbfs
> [    0.352048] usbcore: registered new interface driver hub
> [    0.355022] usbcore: registered new device driver usb
> [    0.358186] EDAC MC: Ver: 3.0.0
> [    0.362305] PCI: Using ACPI for IRQ routing
> [    0.367116] NetLabel: Initializing
> [    0.368008] NetLabel:  domain hash size = 128
> [    0.369010] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
> [    0.371093] NetLabel:  unlabeled traffic allowed by default
> [    0.374249] clocksource: Switched to clocksource kvm-clock
> [    0.441047] VFS: Disk quotas dquot_6.6.0
> [    0.446519] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
> [    0.454161] pnp: PnP ACPI init
> [    0.460204] pnp: PnP ACPI: found 5 devices
> [    0.487183] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff,
> max_idle_ns: 2085701024 ns
> [    0.500887] NET: Registered protocol family 2
> [    0.512619] TCP established hash table entries: 16384 (order: 5, 131072
> bytes)
> [    0.524218] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
> [    0.532472] TCP: Hash tables configured (established 16384 bind 16384)
> [    0.540296] UDP hash table entries: 1024 (order: 3, 32768 bytes)
> [    0.552314] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes)
> [    0.561106] NET: Registered protocol family 1
> [    0.567117] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
> [    0.574096] pci 0000:00:01.0: PIIX3: Enabling Passive Release
> [    0.586198] pci 0000:00:01.0: Activating ISA DMA hang workarounds
> [    0.593935] pci 0000:00:02.0: Video device with shadowed ROM at [mem
> 0x000c0000-0x000dffff]
> [    0.606572] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11
> [    0.622820] Unpacking initramfs...
> [    2.335287] Freeing initrd memory: 21020K
> [    2.344531] audit: initializing netlink subsys (disabled)
> [    2.351390] audit: type=2000 audit(1515791833.710:1): state=initialized
> audit_enabled=0 res=1
> [    2.352623] Initialise system trusted keyrings
> [    2.352714] Key type blacklist registered
> [    2.353238] workingset: timestamp_bits=36 max_order=19 bucket_order=0
> [    2.359953] zbud: loaded
> [    2.910025] PANIC: double fault, error_code: 0x0
> [    2.910025] CPU: 1 PID: 56 Comm: modprobe Not tainted
> 4.14.13-300.fc27.x86_64 #1
> [    2.910025] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
> [    2.910025] task: ffff891f78dc3c00 task.stack: ffffa6eac0594000
> [    2.910025] RIP: 0010:vprintk_default+0x5/0x30
> [    2.910025] RSP: 0000:fffffe000002e000 EFLAGS: 00010046
> [    2.910025] RAX: 0000000000000000 RBX: fffffe000002e118 RCX:
> 0000000000000001
> [    2.910025] RDX: 0000000000000000 RSI: fffffe000002e018 RDI:
> ffffffffbe0715a0
> [    2.910025] RBP: fffffe000002e008 R08: ffffffffbe0bb565 R09:
> ffffffffbe07159b
> [    2.910025] R10: fffffe000002e080 R11: 0000000000000000 R12:
> ffffffffbe070fdd
> [    2.910025] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [    2.910025] FS:  0000000000000000(0000) GS:ffff891f7fd00000(0000)
> knlGS:0000000000000000
> [    2.910025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    2.910025] CR2: fffffe000002dff8 CR3: 0000000078da6000 CR4:
> 00000000000006e0
> [    2.910025] Call Trace:
> [    2.910025]	<ENTRY_TRAMPOLINE>
> [    2.910025]	? vprintk_func+0x27/0x60
> [    2.910025]	printk+0x52/0x6e
> [    2.910025]	__die+0x6b/0xe0
> [    2.910025]	die+0x2f/0x50
> [    2.910025]	do_general_protection+0x149/0x160
> [    2.910025]	general_protection+0x2c/0x60
> [    2.910025] RIP: 0010:swapgs_restore_regs_and_return_to_usermode+0x6f/0x80
> [    2.910025] RSP: 0000:fffffe000002e1c8 EFLAGS: 00000006
> [    2.910025] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> 0000000000000000
> [    2.910025] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000078da7800
> [    2.910025] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 0000000000000000
> [    2.910025] R10: 0000000000000000 R11: 0000000000000000 R12:
> 0000000000000000
> [    2.910025] R13: 0000000000000000 R14: 0000000000000000 R15:
> 0000000000000000
> [    2.910025]	</ENTRY_TRAMPOLINE>
> [    2.910025] Code: eb 01 e8 ef 9c 7a 00 e8 1a 23 06 00 e8 b5 19 06 00 83 fb
> ff 75 e4 8b 5d c8 e9 d2 fc ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <55>
> 49
> 89 f8 49 89 f1 31 c9 31 d2 be ff ff ff ff 48 89 e5 31 ff
> [    2.910025] Kernel panic - not syncing: Machine halted.
> [    2.910025] Kernel Offset: 0x3c000000 from 0xffffffff81000000 (relocation
> range: 0xffffffff80000000-0xffffffffbfffffff)
> [    2.910025] ---[ end Kernel panic - not syncing: Machine halted.
> 
> Configs and other patches are at
> https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/log/?h=f27
> 
> Note that we did bring in the retpoline patches for 4.14.13 but the
> report and panic was the same as with 4.14.11.
> 
> Thanks,
> Laura
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-12 21:51     ` Thomas Gleixner
@ 2018-01-13  6:08       ` Andy Lutomirski
  2018-01-13  6:33         ` Willy Tarreau
  2018-01-13 12:08         ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Andy Lutomirski @ 2018-01-13  6:08 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra, Borislav Petkov
  Cc: Laura Abbott, X86 ML, Linux Kernel Mailing List, stable, Andy Lutomirski

On Fri, Jan 12, 2018 at 1:51 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 12 Jan 2018, Laura Abbott wrote:
>
> Cc+ Andy
>
> I'm almost crashed out by now. Andy might have an idea. I'll look again
> tomorrow with brain awake.
>
>> On 01/12/2018 10:51 AM, Thomas Gleixner wrote:
>> > On Fri, 12 Jan 2018, Laura Abbott wrote:
>> > > Fedora got a bug report on 4.14.11 of a panic when booting a
>> > > Fedora guest in a CentOS 6 VM, not reproducible with nopti.
>> > > The issue is still present as of 4.14.13 as well. The only
>> > > report is a panic screenshot
>> > > https://bugzilla.redhat.com/show_bug.cgi?id=1532458
>> > >
>> > > I've lost track of all the fixes that have been flying around,
>> > > is this a new issue or has a fix not yet made it to stable?
>> >
>> > Hmm. Looks kinda familiar, but that has been fixed I think even before
>> > 4.4.11. Could you please ask the reported to provide a full console log via
>> > the VM "serial console" ?
>> >
>> > Thanks,
>> >
>> >     tglx
>> >
>>

>> [    2.910025] PANIC: double fault, error_code: 0x0

Probably a stack overflow

>> [    2.910025] CPU: 1 PID: 56 Comm: modprobe Not tainted
>> 4.14.13-300.fc27.x86_64 #1
>> [    2.910025] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
>> [    2.910025] task: ffff891f78dc3c00 task.stack: ffffa6eac0594000
>> [    2.910025] RIP: 0010:vprintk_default+0x5/0x30
>> [    2.910025] RSP: 0000:fffffe000002e000 EFLAGS: 00010046
>> [    2.910025] RAX: 0000000000000000 RBX: fffffe000002e118 RCX:
>> 0000000000000001
>> [    2.910025] RDX: 0000000000000000 RSI: fffffe000002e018 RDI:
>> ffffffffbe0715a0
>> [    2.910025] RBP: fffffe000002e008 R08: ffffffffbe0bb565 R09:
>> ffffffffbe07159b
>> [    2.910025] R10: fffffe000002e080 R11: 0000000000000000 R12:
>> ffffffffbe070fdd
>> [    2.910025] R13: 0000000000000000 R14: 0000000000000000 R15:
>> 0000000000000000
>> [    2.910025] FS:  0000000000000000(0000) GS:ffff891f7fd00000(0000)
>> knlGS:0000000000000000
>> [    2.910025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [    2.910025] CR2: fffffe000002dff8 CR3: 0000000078da6000 CR4:
>> 00000000000006e0

CR2 and RSP are consistent with stack overflow, but that's not
terribly important, since...

>> [    2.910025] Call Trace:
>> [    2.910025]        <ENTRY_TRAMPOLINE>
>> [    2.910025]        ? vprintk_func+0x27/0x60
>> [    2.910025]        printk+0x52/0x6e
>> [    2.910025]        __die+0x6b/0xe0
>> [    2.910025]        die+0x2f/0x50
>> [    2.910025]        do_general_protection+0x149/0x160

This is the real problem here.

>> [    2.910025]        general_protection+0x2c/0x60
>> [    2.910025] RIP: 0010:swapgs_restore_regs_and_return_to_usermode+0x6f/0x80
>> [    2.910025] RSP: 0000:fffffe000002e1c8 EFLAGS: 00000006
>> [    2.910025] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
>> 0000000000000000
>> [    2.910025] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>> 0000000078da7800
>> [    2.910025] RBP: 0000000000000000 R08: 0000000000000000 R09:
>> 0000000000000000
>> [    2.910025] R10: 0000000000000000 R11: 0000000000000000 R12:
>> 0000000000000000
>> [    2.910025] R13: 0000000000000000 R14: 0000000000000000 R15:
>> 0000000000000000
>> [    2.910025]        </ENTRY_TRAMPOLINE>

ffffffff81a00b3e:       65 48 0f b3 3c 25 8e    btr    %rdi,%gs:0x1e168e
ffffffff81a00b45:       16 1e 00
ffffffff81a00b48:       48 89 c7                mov    %rax,%rdi
ffffffff81a00b4b:       eb 08                   jmp    0xffffffff81a00b55
ffffffff81a00b4d:       48 89 c7                mov    %rax,%rdi
ffffffff81a00b50:       48 0f ba ef 3f          bts    $0x3f,%rdi
ffffffff81a00b55:       48 81 cf 00 18 00 00    or     $0x1800,%rdi
ffffffff81a00b5c:       0f 22 df                mov    %rdi,%cr3  <-- #GP here
ffffffff81a00b5f:       58                      pop    %rax
ffffffff81a00b60:       5f                      pop    %rdi

The CPU didn't like writing 0x0000000078da7800 to %cr3.

I'm guessing that CR4.PCIDE=0.

Now this is quite a strange value to write to CR3.  The 0x800 part
means that we're using the "user" variant of the address space that
would have ASID=0 and the 0x1000 bit being set corresponds to the user
pgdir, but this is nonsense, since the kernel never uses PCID 0 for
user mode.  We always start at 1.  The only exception is if
X86_FEATURE_PCID is off.  But, if X86_FEATURE_PCID is off, then we
shouldn't be setting any PCID bits.
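
To make the decoding above concrete, here is a minimal C sketch that splits the faulting value into the pieces being discussed. The helper names are hypothetical (they are not kernel APIs), and the only inputs are the two CR3 values visible in the oops.

```c
#include <assert.h>
#include <stdint.h>

/* Low 12 bits of CR3: the PCID when CR4.PCIDE=1, otherwise flag/reserved bits. */
#define CR3_LOW12_MASK		0xfffULL
/* Bit 12 selects the user half of the 8k PTI PGD pair. */
#define PTI_USER_PGD_BIT	12
/* What the exit path ORs in, per the disassembly: or $0x1800,%rdi */
#define PTI_OR_MASK		0x1800ULL

/* CR3 reported in the oops, with the OR applied, gives the value in RDI. */
static_assert((0x78da6000ULL | PTI_OR_MASK) == 0x78da7800ULL,
	      "kernel CR3 | 0x1800 should equal the faulting value");

/* Hypothetical helper: the low 12 bits of a CR3 value. */
static inline uint64_t cr3_low12(uint64_t cr3)
{
	return cr3 & CR3_LOW12_MASK;
}

/* Hypothetical helper: 1 if the value points at the user PGD half. */
static inline int cr3_user_pgd(uint64_t cr3)
{
	return (int)((cr3 >> PTI_USER_PGD_BIT) & 1);
}

/* Hypothetical helper: undo the OR to recover the kernel-side value. */
static inline uint64_t cr3_kernel_value(uint64_t cr3)
{
	return cr3 & ~PTI_OR_MASK;
}
```

For 0x78da7800 this yields low bits 0x800 with the user PGD bit set, and stripping 0x1800 gives back the 0x78da6000 that the register dump shows in CR3.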

In fact, it looks like this code is totally bogus and has never been
correct at all.  Even in:

commit 4b1d5ae3b103eda43f9d0f85c355bb6995b03a30
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Mon Dec 4 15:07:59 2017 +0100

    x86/mm: Use/Fix PCID to optimize user/kernel switches

We have:

.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
        ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
        mov     %cr3, \scratch_reg

        ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID

...

.Lwrcr3_\@:
        /* Flip the PGD and ASID to the user version */
        orq     $(PTI_SWITCH_MASK), \scratch_reg
        mov     \scratch_reg, %cr3
.Lend_\@:

That's bogus.  PTI_SWITCH_MASK is 0x1800, which has PCID = 0x800.

This should probably use an alternative to select between 0x1000 and
0x800 depending on X86_FEATURE_PCID or just use an entirely different
label for the !PCID case.
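
As a rough C model of that selection (a sketch only, not the entry asm): it reads the suggestion as choosing between the page-table bit alone (0x1000) and the full 0x1800 mask. The function name, the macro names and the have_pcid flag are hypothetical stand-ins for the ALTERNATIVE on X86_FEATURE_PCID, and PAGE_SHIFT = 12 is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT		12			/* assumed: 4k pages */
#define USER_PGTABLE_MASK	(1ULL << PAGE_SHIFT)	/* 0x1000 */
#define USER_PCID_MASK		(1ULL << 11)		/* 0x800  */

static_assert((USER_PGTABLE_MASK | USER_PCID_MASK) == 0x1800ULL,
	      "combined mask should match PTI_SWITCH_MASK");

/*
 * Hypothetical model of the .Lwrcr3 step: always flip to the user PGD,
 * but only touch bit 11 when PCID is actually in use.
 */
static inline uint64_t user_cr3(uint64_t kernel_cr3, bool have_pcid)
{
	uint64_t mask = USER_PGTABLE_MASK;

	if (have_pcid)
		mask |= USER_PCID_MASK;

	return kernel_cr3 | mask;
}
```

With the CR3 from the oops, the !PCID case would then write 0x78da7000 instead of 0x78da7800.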

FWIW, this bit in SAVE_AND_SWITCH_TO_KERNEL_CR3

        testq   $(PTI_SWITCH_MASK), \scratch_reg
        jz      .Ldone_\@

is a bit silly, too.  It's *correct* (I think), but shouldn't that
just be bt $(PTI_SWITCH_PGTABLES_BIT), \scratch_reg, with the obvious
caveat that the headers don't actually define PTI_SWITCH_PGTABLES_BIT?

Was this code *ever* tested with nopcid?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13  6:08       ` Andy Lutomirski
@ 2018-01-13  6:33         ` Willy Tarreau
  2018-01-13 20:01           ` Andy Lutomirski
  2018-01-13 12:08         ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Willy Tarreau @ 2018-01-13  6:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Peter Zijlstra, Borislav Petkov, Laura Abbott,
	X86 ML, Linux Kernel Mailing List, stable

On Fri, Jan 12, 2018 at 10:08:20PM -0800, Andy Lutomirski wrote:
> In fact, it looks like this code is totally bogus and has never been
> correct at all.  Even in:
> 
> commit 4b1d5ae3b103eda43f9d0f85c355bb6995b03a30
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Mon Dec 4 15:07:59 2017 +0100
> 
>     x86/mm: Use/Fix PCID to optimize user/kernel switches
> 
> We have:
> 
> .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
>         ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>         mov     %cr3, \scratch_reg
> 
>         ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
> 
> ...
> 
> .Lwrcr3_\@:
>         /* Flip the PGD and ASID to the user version */
>         orq     $(PTI_SWITCH_MASK), \scratch_reg
>         mov     \scratch_reg, %cr3
> .Lend_\@:
> 
> That's bogus.  PTI_SWITCH_MASK is 0x1800, which has PCID = 0x800.
> 
> This should probably use an alternative to select between 0x1000 and
> 0x800 depending on X86_FEATURE_PCID or just use an entirely different
> label for the !PCID case.
> 
> FWIW, this bit in SAVE_AND_SWITCH_TO_KERNEL_CR3
> 
>         testq   $(PTI_SWITCH_MASK), \scratch_reg
>         jz      .Ldone_\@
> 
> is a bit silly, too.  It's *correct* (I think), but shouldn't that
> just be bt $(PTI_SWITCH_PGTABLES_BIT), \scratch_reg, with the obvious
> caveat that the headers don't actually define PTI_SWITCH_PGTABLES_BIT?

I wondered the same initially when reading this but thought there was
surely a good reason that I could not understand due to my lack of
knowledge and stopped wondering. BTW your PTI_SWITCH_PGTABLES_BIT would
in fact be PAGE_SHIFT :-)

> Was this code *ever* tested with nopcid?

At least it booted fine in qemu on my machine where pcid was initially
disabled by default, and on an Atom D510 which doesn't have PCID. I've
worked on the initial per-task PTI code with this. That doesn't mean
it's valid, just that there are situations where it works fine :-)

Willy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13  6:08       ` Andy Lutomirski
  2018-01-13  6:33         ` Willy Tarreau
@ 2018-01-13 12:08         ` Peter Zijlstra
  2018-01-13 12:30           ` David Woodhouse
  2018-01-13 12:50           ` Thomas Gleixner
  1 sibling, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2018-01-13 12:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Borislav Petkov, Laura Abbott, X86 ML,
	Linux Kernel Mailing List, stable

On Fri, Jan 12, 2018 at 10:08:20PM -0800, Andy Lutomirski wrote:
> Now this is quite a strange value to write to CR3.  The 0x800 part
> means that we're using the "user" variant of the address space that
> would have ASID=0 and the 0x1000 bit being set corresponds to the user
> pgdir, but this is nonsense, since the kernel never uses PCID 0 for
> user mode.  We always start at 1.  The only exception is if
> X86_FEATURE_PCID is off.  But, if X86_FEATURE_PCID is off, then we
> shouldn't be setting any PCID bits.

My bad, I was under the impression the lower 12 bits would be ignored
without PCID :/

> .Lwrcr3_\@:
>         /* Flip the PGD and ASID to the user version */
>         orq     $(PTI_SWITCH_MASK), \scratch_reg
>         mov     \scratch_reg, %cr3
> .Lend_\@:
> 
> That's bogus.  PTI_SWITCH_MASK is 0x1800, which has PCID = 0x800.

> This should probably use an alternative to select between 0x1000 and
> 0x800 depending on X86_FEATURE_PCID or just use an entirely different
> label for the !PCID case.

	ALTERNATIVE "orq $(PTI_SWITCH_PGTABLE_MASK), \scratch_reg",
	            "orq $(PTI_SWITCH_MASK), \scratch_reg", X86_FEATURE_PCID

Is not wanting to compile though; probably that whole alternative vs
macro thing again :/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13 12:08         ` Peter Zijlstra
@ 2018-01-13 12:30           ` David Woodhouse
  2018-01-13 13:10               ` Peter Zijlstra
  2018-01-13 12:50           ` Thomas Gleixner
  1 sibling, 1 reply; 17+ messages in thread
From: David Woodhouse @ 2018-01-13 12:30 UTC (permalink / raw)
  To: Peter Zijlstra, Andy Lutomirski
  Cc: Thomas Gleixner, Borislav Petkov, Laura Abbott, X86 ML,
	Linux Kernel Mailing List, stable

[-- Attachment #1: Type: text/plain, Size: 542 bytes --]

On Sat, 2018-01-13 at 13:08 +0100, Peter Zijlstra wrote:
> 
>         ALTERNATIVE "orq $(PTI_SWITCH_PGTABLE_MASK), \scratch_reg",
>                     "orq $(PTI_SWITCH_MASK), \scratch_reg", X86_FEATURE_PCID
> 
> Is not wanting to compile though; probably that whole alternative vs
> macro thing again :/

Welcome to my world. Try

 ALTERNATIVE __stringify(orq $(PTI_SWITCH_PGTABLE_MASK), \scratch_reg), \
             __stringify(orq $(PTI_SWITCH_MASK), \scratch_reg), \
             X86_FEATURE_PCID

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13 12:08         ` Peter Zijlstra
  2018-01-13 12:30           ` David Woodhouse
@ 2018-01-13 12:50           ` Thomas Gleixner
  2018-01-13 13:51             ` Borislav Petkov
  1 sibling, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2018-01-13 12:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Borislav Petkov, Laura Abbott, X86 ML,
	Linux Kernel Mailing List, stable

On Sat, 13 Jan 2018, Peter Zijlstra wrote:

> On Fri, Jan 12, 2018 at 10:08:20PM -0800, Andy Lutomirski wrote:
> > Now this is quite a strange value to write to CR3.  The 0x800 part
> > means that we're using the "user" variant of the address space that
> > would have ASID=0 and the 0x1000 bit being set corresponds to the user
> > pgdir, but this is nonsense, since the kernel never uses PCID 0 for
> > user mode.  We always start at 1.  The only exception is if
> > X86_FEATURE_PCID is off.  But, if X86_FEATURE_PCID is off, then we
> > shouldn't be setting any PCID bits.
> 
> My bad, I was under the impression the lower 12 bits would be ignored
> without PCID :/

 2:0 Ignored
   3 PWT
   4 PCD
11:5 Ignored

So yes, it's mostly ignored at least in theory... I'm sure I stared at that
code and the SDM more than once and convinced myself that it's not an issue
to set bit 11 unconditionally.

But I should have stared at the AMD manual which says:

Reserved Bits. Reserved fields should be cleared to 0 by software when writing CR3.
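
A conservative way to express that rule in C, as a sketch: with PCID disabled, treat everything in the low 12 bits except PWT and PCD as must-be-zero. The helper and macro names are hypothetical, and this models the cautious policy implied by the AMD wording, not a statement about what any particular CPU or hypervisor enforces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_PWT			(1ULL << 3)
#define CR3_PCD			(1ULL << 4)
#define CR3_LOW12		0xfffULL
/* Bits 2:0 and 11:5: "ignored" in one manual, "reserved" in the other. */
#define CR3_LOW12_NONFLAG	(CR3_LOW12 & ~(CR3_PWT | CR3_PCD))

static_assert(CR3_LOW12_NONFLAG == 0xfe7ULL, "bits 2:0 and 11:5");

/*
 * Hypothetical check: is it safe to write this value to CR3?
 * With PCIDE=1 the low 12 bits are the PCID, so any value is fine here.
 * With PCIDE=0 only PWT/PCD may be set in the low 12 bits.
 */
static inline bool cr3_low_bits_ok(uint64_t cr3, bool pcide)
{
	if (pcide)
		return true;

	return (cr3 & CR3_LOW12_NONFLAG) == 0;
}
```

Under this check the value from the report, 0x78da7800 with PCIDE=0, is rejected because of bit 11, while 0x78da7000 passes.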

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13 12:30           ` David Woodhouse
@ 2018-01-13 13:10               ` Peter Zijlstra
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2018-01-13 13:10 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Andy Lutomirski, Thomas Gleixner, Borislav Petkov, Laura Abbott,
	X86 ML, Linux Kernel Mailing List, stable

On Sat, Jan 13, 2018 at 12:30:11PM +0000, David Woodhouse wrote:
> On Sat, 2018-01-13 at 13:08 +0100, Peter Zijlstra wrote:
> > 
> >         ALTERNATIVE "orq $(PTI_SWITCH_PGTABLE_MASK), \scratch_reg",
> >                     "orq $(PTI_SWITCH_MASK), \scratch_reg", X86_FEATURE_PCID
> > 
> > Is not wanting to compile though; probably that whole alternative vs
> > macro thing again :/
> 
> Welcome to my world. Try
> 
>  ALTERNATIVE __stringify(orq $(PTI_SWITCH_PGTABLE_MASK), \scratch_reg), \
>              __stringify(orq $(PTI_SWITCH_MASK), \scratch_reg), \
>              X86_FEATURE_PCID

Doesn't seem to work, gets literal __stringy() crud in the .s file.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13 13:10               ` Peter Zijlstra
  (?)
@ 2018-01-13 13:39               ` David Woodhouse
  -1 siblings, 0 replies; 17+ messages in thread
From: David Woodhouse @ 2018-01-13 13:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Thomas Gleixner, Borislav Petkov, Laura Abbott,
	X86 ML, Linux Kernel Mailing List, stable

[-- Attachment #1: Type: text/plain, Size: 879 bytes --]

On Sat, 2018-01-13 at 14:10 +0100, Peter Zijlstra wrote:
> On Sat, Jan 13, 2018 at 12:30:11PM +0000, David Woodhouse wrote:
> > 
> > On Sat, 2018-01-13 at 13:08 +0100, Peter Zijlstra wrote:
> > > 
> > > 
> > >         ALTERNATIVE "orq $(PTI_SWITCH_PGTABLE_MASK), \scratch_reg",
> > >                     "orq $(PTI_SWITCH_MASK), \scratch_reg", X86_FEATURE_PCID
> > > 
> > > Is not wanting to compile though; probably that whole alternative vs
> > > macro thing again :/
> > Welcome to my world. Try
> > 
> >  ALTERNATIVE __stringify(orq $(PTI_SWITCH_PGTABLE_MASK), \scratch_reg), \
> >              __stringify(orq $(PTI_SWITCH_MASK), \scratch_reg), \
> >              X86_FEATURE_PCID
> Doesn't seem to work, gets literal __stringy() crud in the .s file.

You do have to #include <linux/stringify.h> too...

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13 12:50           ` Thomas Gleixner
@ 2018-01-13 13:51             ` Borislav Petkov
  0 siblings, 0 replies; 17+ messages in thread
From: Borislav Petkov @ 2018-01-13 13:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Andy Lutomirski, Laura Abbott, X86 ML,
	Linux Kernel Mailing List, stable

On Sat, Jan 13, 2018 at 01:50:55PM +0100, Thomas Gleixner wrote:
>  2:0 Ignored
>    3 PWT
>    4 PCD

Btw, those last two are "(implies PCD=PWT=0)" according to
http://www.sandpile.org/x86/crx.htm with PCID. I was wondering recently
what happens with those bits when PCID is enabled and CR3[11:0] is the
ASID. I couldn't find it in the SDM.

And it kinda makes sense, PCD=1b is probably only for testing some
hardware crud and not really sensible for normal production.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13 13:10               ` Peter Zijlstra
  (?)
  (?)
@ 2018-01-13 14:14               ` David Woodhouse
  -1 siblings, 0 replies; 17+ messages in thread
From: David Woodhouse @ 2018-01-13 14:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Thomas Gleixner, Borislav Petkov, Laura Abbott,
	X86 ML, Linux Kernel Mailing List, stable

[-- Attachment #1: Type: text/plain, Size: 2504 bytes --]

On Sat, 2018-01-13 at 14:10 +0100, Peter Zijlstra wrote:
> On Sat, Jan 13, 2018 at 12:30:11PM +0000, David Woodhouse wrote:
> > 
> > On Sat, 2018-01-13 at 13:08 +0100, Peter Zijlstra wrote:
> > > 
> > > 
> > >         ALTERNATIVE "orq $(PTI_SWITCH_PGTABLE_MASK), \scratch_reg",
> > >                     "orq $(PTI_SWITCH_MASK), \scratch_reg", X86_FEATURE_PCID
> > > 
> > > Is not wanting to compile though; probably that whole alternative vs
> > > macro thing again :/
> > Welcome to my world. Try
> > 
> >  ALTERNATIVE __stringify(orq $(PTI_SWITCH_PGTABLE_MASK), \scratch_reg), \
> >              __stringify(orq $(PTI_SWITCH_MASK), \scratch_reg), \
> >              X86_FEATURE_PCID
> Doesn't seem to work, gets literal __stringy() crud in the .s file.

--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -1,5 +1,6 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 #include <linux/jump_label.h>
+#include <linux/stringify.h>
 #include <asm/unwind_hints.h>
 #include <asm/cpufeatures.h>
 #include <asm/page_types.h>
@@ -222,7 +223,10 @@ For 32-bit we have the following conventions - kernel is built with
 #define THIS_CPU_user_pcid_flush_mask   \
        PER_CPU_VAR(cpu_tlbstate) + TLB_STATE_user_pcid_flush_mask
 
-.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
+.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req \
+                       pti_sw_mask=__stringify(PTI_SWITCH_MASK) \
+                       pti_sw_pgt_mask=PTI_SWITCH_PGTABLES_MASK
+
        ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
        mov     %cr3, \scratch_reg
 
@@ -247,7 +251,9 @@ For 32-bit we have the following conventions - kernel is built with
 
 .Lwrcr3_\@:
        /* Flip the PGD and ASID to the user version */
-       orq     $(PTI_SWITCH_MASK), \scratch_reg
+       ALTERNATIVE __stringify(orq $\pti_sw_pgt_mask, \scratch_reg),   \
+                   __stringify(orq $\pti_sw_mask, \scratch_reg),               \
+                   X86_FEATURE_PCID
        mov     \scratch_reg, %cr3
 .Lend_\@:
 .endm

Yeah you need to 'stringify' the first of the macro args (pti_sw_mask)
because its default value being in parens confuses the very primitive
.macro arg processing. The last arg is fine.

This shit makes my brain hurt.

[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5213 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13  6:33         ` Willy Tarreau
@ 2018-01-13 20:01           ` Andy Lutomirski
  2018-01-13 20:45             ` Thomas Gleixner
  0 siblings, 1 reply; 17+ messages in thread
From: Andy Lutomirski @ 2018-01-13 20:01 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andy Lutomirski, Thomas Gleixner, Peter Zijlstra,
	Borislav Petkov, Laura Abbott, X86 ML, Linux Kernel Mailing List,
	stable

On Fri, Jan 12, 2018 at 10:33 PM, Willy Tarreau <w@1wt.eu> wrote:
> On Fri, Jan 12, 2018 at 10:08:20PM -0800, Andy Lutomirski wrote:
>> In fact, it looks like this code is totally bogus and has never been
>> correct at all.  Even in:
>>
>> commit 4b1d5ae3b103eda43f9d0f85c355bb6995b03a30
>> Author: Peter Zijlstra <peterz@infradead.org>
>> Date:   Mon Dec 4 15:07:59 2017 +0100
>>
>>     x86/mm: Use/Fix PCID to optimize user/kernel switches
>>
>> We have:
>>
>> .macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
>>         ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>>         mov     %cr3, \scratch_reg
>>
>>         ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID
>>
>> ...
>>
>> .Lwrcr3_\@:
>>         /* Flip the PGD and ASID to the user version */
>>         orq     $(PTI_SWITCH_MASK), \scratch_reg
>>         mov     \scratch_reg, %cr3
>> .Lend_\@:
>>
>> That's bogus.  PTI_SWITCH_MASK is 0x1800, which has PCID = 0x800.
>>
>> This should probably use an alternative to select between 0x1000 and
>> 0x800 depending on X86_FEATURE_PCID or just use an entirely different
>> label for the !PCID case.
>>
>> FWIW, this bit in SAVE_AND_SWITCH_TO_KERNEL_CR3
>>
>>         testq   $(PTI_SWITCH_MASK), \scratch_reg
>>         jz      .Ldone_\@
>>
>> is a bit silly, too.  It's *correct* (I think), but shouldn't that
>> just be bt $(PTI_SWITCH_PGTABLES_BIT), \scratch_reg, with the obvious
>> caveat that the headers don't actually define PTI_SWITCH_PGTABLES_BIT?
>
> I wondered the same initially when reading this but thought there was
> surely a good reason that I could not understand due to my lack of
> knowledge and stopped wondering. BTW your PTI_SWITCH_PGTABLES_BIT would
> in fact be PAGE_SHIFT :-)

Trying to inventory this stuff scattered all over the place:

#define PTI_PGTABLE_SWITCH_BIT    PAGE_SHIFT
#define PTI_SWITCH_PGTABLES_MASK    (1<<PAGE_SHIFT)
# define X86_CR3_PTI_SWITCH_BIT    11
#define PTI_SWITCH_MASK    (PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))

Blech.  I wouldn't be terribly surprised if I missed a few as well.  How about:

PTI_USER_PGTABLE_BIT = PAGE_SHIFT
PTI_USER_PGTABLE_MASK = 1 << PTI_USER_PGTABLE_BIT
PTI_USER_PCID_BIT = 11
PTI_USER_PCID_MASK = 1 << PTI_USER_PCID_BIT
PTI_USER_PGTABLE_AND_PCID_MASK = PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK

This naming would make the apparently buggy code look fishy, as it
should.  I will give this a shot some time soon if no one beats me to
it.
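
Spelled out as compilable C, purely as a sketch of the proposed names: PAGE_SHIFT = 12 is assumed here, the 1ULL width is my choice, and cr3_is_user_pgtable() is a hypothetical helper that stands in for the single-bit test mentioned earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT			12	/* assumed: 4k pages */

#define PTI_USER_PGTABLE_BIT		PAGE_SHIFT
#define PTI_USER_PGTABLE_MASK		(1ULL << PTI_USER_PGTABLE_BIT)
#define PTI_USER_PCID_BIT		11
#define PTI_USER_PCID_MASK		(1ULL << PTI_USER_PCID_BIT)
#define PTI_USER_PGTABLE_AND_PCID_MASK	\
	(PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)

/* The values quoted in this thread. */
static_assert(PTI_USER_PGTABLE_MASK == 0x1000ULL, "page-table switch bit");
static_assert(PTI_USER_PCID_MASK == 0x800ULL, "user PCID bit");
static_assert(PTI_USER_PGTABLE_AND_PCID_MASK == 0x1800ULL, "combined mask");

/* Hypothetical helper: only the page-table bit says "user PGD". */
static inline bool cr3_is_user_pgtable(uint64_t cr3)
{
	return (cr3 & PTI_USER_PGTABLE_MASK) != 0;
}
```

With these names, an unconditional "orq $(PTI_USER_PGTABLE_AND_PCID_MASK)" on a !PCID path reads as suspicious at the call site.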

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13 20:01           ` Andy Lutomirski
@ 2018-01-13 20:45             ` Thomas Gleixner
  2018-01-13 20:52               ` Andy Lutomirski
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2018-01-13 20:45 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Willy Tarreau, Peter Zijlstra, Borislav Petkov, Laura Abbott,
	X86 ML, Linux Kernel Mailing List, stable

On Sat, 13 Jan 2018, Andy Lutomirski wrote:
> Trying to inventory this stuff scattered all over the place:
> 
> #define PTI_PGTABLE_SWITCH_BIT    PAGE_SHIFT
> #define PTI_SWITCH_PGTABLES_MASK    (1<<PAGE_SHIFT)
> # define X86_CR3_PTI_SWITCH_BIT    11
> #define PTI_SWITCH_MASK    (PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))
> 
> Blech.  I wouldn't be terribly surprised if I missed a few as well.  How about:
> 
> PTI_USER_PGTABLE_BIT = PAGE_SHIFT
> PTI_USER_PGTABLE_MASK = 1 << PTI_USER_PGTABLE_BIT
> PTI_USER_PCID_BIT = 11
> PTI_USER_PCID_MASK = 1 << PTI_USER_PCID_BIT
> PTI_USER_PGTABLE_AND_PCID_MASK = PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK
> 
> This naming would make the apparently buggy code look fishy, as it
> should.  I will give this a shot some time soon if no one beats me to
> it.

Well, the thing we tripped over is that we trusted the SDM that bit 11 is
ignored. Seems it's not, and the AMD APM says that reserved bits should be
cleared. Next time I'll surely stare into both....

So something like the below should make it clear. I've not done the
alternatives thing yet...

Thanks,

	tglx

8<-------------------
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -198,8 +198,11 @@ For 32-bit we have the following convent
  * PAGE_TABLE_ISOLATION PGDs are 8k.  Flip bit 12 to switch between the two
  * halves:
  */
-#define PTI_SWITCH_PGTABLES_MASK	(1<<PAGE_SHIFT)
-#define PTI_SWITCH_MASK		(PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))
+#define PTI_USER_PGTABLE_BIT		PAGE_SHIFT
+#define PTI_USER_PGTABLE_MASK		(1 << PTI_USER_PGTABLE_BIT)
+#define PTI_USER_PCID_BIT		X86_CR3_PTI_PCID_USER_BIT
+#define PTI_USER_PCID_MASK		(1 << PTI_USER_PCID_BIT)
+#define PTI_USER_PGTABLE_AND_PCID_MASK  (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
 
 .macro SET_NOFLUSH_BIT	reg:req
 	bts	$X86_CR3_PCID_NOFLUSH_BIT, \reg
@@ -208,7 +211,7 @@ For 32-bit we have the following convent
 .macro ADJUST_KERNEL_CR3 reg:req
 	ALTERNATIVE "", "SET_NOFLUSH_BIT \reg", X86_FEATURE_PCID
 	/* Clear PCID and "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
-	andq    $(~PTI_SWITCH_MASK), \reg
+	andq    $(~PTI_USER_PGTABLE_AND_PCID_MASK), \reg
 .endm
 
 .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
@@ -239,15 +242,18 @@ For 32-bit we have the following convent
 	/* Flush needed, clear the bit */
 	btr	\scratch_reg, THIS_CPU_user_pcid_flush_mask
 	movq	\scratch_reg2, \scratch_reg
-	jmp	.Lwrcr3_\@
+	jmp	.Lwrcr3_pcid_\@
 
 .Lnoflush_\@:
 	movq	\scratch_reg2, \scratch_reg
 	SET_NOFLUSH_BIT \scratch_reg
 
+.Lwrcr3_pcid_\@:
+	orq	$(PTI_USER_PCID_MASK), \scratch_reg
+
 .Lwrcr3_\@:
 	/* Flip the PGD and ASID to the user version */
-	orq     $(PTI_SWITCH_MASK), \scratch_reg
+	orq     $(PTI_USER_PGTABLE_MASK), \scratch_reg
 	mov	\scratch_reg, %cr3
 .Lend_\@:
 .endm
@@ -272,7 +278,7 @@ For 32-bit we have the following convent
 	 *
 	 * That indicates a kernel CR3 value, not a user CR3.
 	 */
-	testq	$(PTI_SWITCH_MASK), \scratch_reg
+	testq	$(PTI_USER_PGTABLE_MASK), \scratch_reg
 	jz	.Ldone_\@
 
 	ADJUST_KERNEL_CR3 \scratch_reg
@@ -290,7 +296,7 @@ For 32-bit we have the following convent
 	 * KERNEL pages can always resume with NOFLUSH as we do
 	 * explicit flushes.
 	 */
-	bt	$X86_CR3_PTI_SWITCH_BIT, \save_reg
+	bt	$PTI_USER_PGTABLE_BIT, \save_reg
 	jnc	.Lnoflush_\@
 
 	/*
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -40,7 +40,7 @@
 #define CR3_NOFLUSH	BIT_ULL(63)
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-# define X86_CR3_PTI_SWITCH_BIT	11
+# define X86_CR3_PTI_PCID_USER_BIT	11
 #endif
 
 #else
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -81,13 +81,13 @@ static inline u16 kern_pcid(u16 asid)
 	 * Make sure that the dynamic ASID space does not confict with the
 	 * bit we are using to switch between user and kernel ASIDs.
 	 */
-	BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_SWITCH_BIT));
+	BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_PCID_USER_BIT));
 
 	/*
 	 * The ASID being passed in here should have respected the
 	 * MAX_ASID_AVAILABLE and thus never have the switch bit set.
 	 */
-	VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_SWITCH_BIT));
+	VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_PCID_USER_BIT));
 #endif
 	/*
 	 * The dynamically-assigned ASIDs that get passed in are small
@@ -112,7 +112,7 @@ static inline u16 user_pcid(u16 asid)
 {
 	u16 ret = kern_pcid(asid);
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-	ret |= 1 << X86_CR3_PTI_SWITCH_BIT;
+	ret |= 1 << X86_CR3_PTI_PCID_USER_BIT;
 #endif
 	return ret;
 }
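
As a rough model of what the entry code change is meant to achieve, here is a C sketch of how the user CR3 value gets built. It assumes 4 KiB pages and that the !PCID path goes straight to the final page table flip without touching bit 11. user_cr3() and its parameters are hypothetical, they only mirror the macro's control flow:

```c
#include <stdbool.h>
#include <stdint.h>

#define PTI_USER_PGTABLE_BIT	12	/* assumption: PAGE_SHIFT == 12 */
#define PTI_USER_PGTABLE_MASK	(UINT64_C(1) << PTI_USER_PGTABLE_BIT)
#define PTI_USER_PCID_BIT	11
#define PTI_USER_PCID_MASK	(UINT64_C(1) << PTI_USER_PCID_BIT)
#define CR3_NOFLUSH		(UINT64_C(1) << 63)

/*
 * Hypothetical model of the user CR3 computation:
 *  - with PCID: optionally set NOFLUSH, then set the user PCID bit
 *  - always:    set bit 12 to select the user half of the 8k PGD
 * Without PCID, bit 11 stays clear, so no reserved CR3 bit is set.
 */
static inline uint64_t user_cr3(uint64_t kernel_cr3, bool have_pcid,
				bool noflush)
{
	uint64_t cr3 = kernel_cr3;

	if (have_pcid) {
		if (noflush)
			cr3 |= CR3_NOFLUSH;
		cr3 |= PTI_USER_PCID_MASK;
	}

	return cr3 | PTI_USER_PGTABLE_MASK;
}
```

For a kernel CR3 of 0x12346000 this gives 0x12347000 without PCID and 0x12347800 with PCID, which is the difference between 0x1000 and 0x800 | 0x1000 discussed earlier in the thread.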

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Yet another KPTI regression with 4.14.x series in a VM
  2018-01-13 20:45             ` Thomas Gleixner
@ 2018-01-13 20:52               ` Andy Lutomirski
  0 siblings, 0 replies; 17+ messages in thread
From: Andy Lutomirski @ 2018-01-13 20:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andy Lutomirski, Willy Tarreau, Peter Zijlstra, Borislav Petkov,
	Laura Abbott, X86 ML, Linux Kernel Mailing List, stable

On Sat, Jan 13, 2018 at 12:45 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 13 Jan 2018, Andy Lutomirski wrote:
>> Trying to inventory this stuff scattered all over the place:
>>
>> #define PTI_PGTABLE_SWITCH_BIT    PAGE_SHIFT
>> #define PTI_SWITCH_PGTABLES_MASK    (1<<PAGE_SHIFT)
>> # define X86_CR3_PTI_SWITCH_BIT    11
>> #define PTI_SWITCH_MASK    (PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))
>>
>> Blech.  I wouldn't be terribly surprised if I missed a few as well.  How about:
>>
>> PTI_USER_PGTABLE_BIT = PAGE_SHIFT
>> PTI_USER_PGTABLE_MASK = 1 << PTI_USER_PGTABLE_BIT
>> PTI_USER_PCID_BIT = 11
>> PTI_USER_PCID_MASK = 1 << PTI_USER_PCID_BIT
>> PTI_USER_PGTABLE_AND_PCID_MASK = PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK
>>
>> This naming would make the apparently buggy code look fishy, as it
>> should.  I will give this a shot some time soon if no one beats me to
>> it.
>
> Well, the thing we tripped over is that we trusted the SDM that bit 11 is
> ignored. Seems it's not, and the AMD APM says that reserved bits should be
> cleared. Next time I'll surely stare into both....
>
> So something like the below should make it clear. I've not done the
> alternatives thing yet...
>

Looks generally sane to me.

>
> 8<-------------------
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -198,8 +198,11 @@ For 32-bit we have the following convent
>   * PAGE_TABLE_ISOLATION PGDs are 8k.  Flip bit 12 to switch between the two
>   * halves:
>   */
> -#define PTI_SWITCH_PGTABLES_MASK       (1<<PAGE_SHIFT)
> -#define PTI_SWITCH_MASK                (PTI_SWITCH_PGTABLES_MASK|(1<<X86_CR3_PTI_SWITCH_BIT))
> +#define PTI_USER_PGTABLE_BIT           PAGE_SHIFT
> +#define PTI_USER_PGTABLE_MASK          (1 << PTI_USER_PGTABLE_BIT)
> +#define PTI_USER_PCID_BIT              X86_CR3_PTI_PCID_USER_BIT
> +#define PTI_USER_PCID_MASK             (1 << PTI_USER_PCID_BIT)
> +#define PTI_USER_PGTABLE_AND_PCID_MASK  (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
>
>  .macro SET_NOFLUSH_BIT reg:req
>         bts     $X86_CR3_PCID_NOFLUSH_BIT, \reg
> @@ -208,7 +211,7 @@ For 32-bit we have the following convent
>  .macro ADJUST_KERNEL_CR3 reg:req
>         ALTERNATIVE "", "SET_NOFLUSH_BIT \reg", X86_FEATURE_PCID
>         /* Clear PCID and "PAGE_TABLE_ISOLATION bit", point CR3 at kernel pagetables: */
> -       andq    $(~PTI_SWITCH_MASK), \reg
> +       andq    $(~PTI_USER_PGTABLE_AND_PCID_MASK), \reg
>  .endm
>
>  .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
> @@ -239,15 +242,18 @@ For 32-bit we have the following convent
>         /* Flush needed, clear the bit */
>         btr     \scratch_reg, THIS_CPU_user_pcid_flush_mask
>         movq    \scratch_reg2, \scratch_reg
> -       jmp     .Lwrcr3_\@
> +       jmp     .Lwrcr3_pcid_\@
>
>  .Lnoflush_\@:
>         movq    \scratch_reg2, \scratch_reg
>         SET_NOFLUSH_BIT \scratch_reg
>
> +.Lwrcr3_pcid_\@:
> +       orq     $(PTI_USER_PCID_MASK), \scratch_reg
> +
>  .Lwrcr3_\@:
>         /* Flip the PGD and ASID to the user version */
> -       orq     $(PTI_SWITCH_MASK), \scratch_reg
> +       orq     $(PTI_USER_PGTABLE_MASK), \scratch_reg
>         mov     \scratch_reg, %cr3
>  .Lend_\@:
>  .endm
> @@ -272,7 +278,7 @@ For 32-bit we have the following convent
>          *
>          * That indicates a kernel CR3 value, not a user CR3.
>          */
> -       testq   $(PTI_SWITCH_MASK), \scratch_reg
> +       testq   $(PTI_USER_PGTABLE_MASK), \scratch_reg
>         jz      .Ldone_\@
>
>         ADJUST_KERNEL_CR3 \scratch_reg
> @@ -290,7 +296,7 @@ For 32-bit we have the following convent
>          * KERNEL pages can always resume with NOFLUSH as we do
>          * explicit flushes.
>          */
> -       bt      $X86_CR3_PTI_SWITCH_BIT, \save_reg
> +       bt      $PTI_USER_PGTABLE_BIT, \save_reg
>         jnc     .Lnoflush_\@
>
>         /*
> --- a/arch/x86/include/asm/processor-flags.h
> +++ b/arch/x86/include/asm/processor-flags.h
> @@ -40,7 +40,7 @@
>  #define CR3_NOFLUSH    BIT_ULL(63)
>
>  #ifdef CONFIG_PAGE_TABLE_ISOLATION
> -# define X86_CR3_PTI_SWITCH_BIT        11
> +# define X86_CR3_PTI_PCID_USER_BIT     11
>  #endif
>
>  #else
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -81,13 +81,13 @@ static inline u16 kern_pcid(u16 asid)
>          * Make sure that the dynamic ASID space does not confict with the
>          * bit we are using to switch between user and kernel ASIDs.
>          */
> -       BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_SWITCH_BIT));
> +       BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_PCID_USER_BIT));
>
>         /*
>          * The ASID being passed in here should have respected the
>          * MAX_ASID_AVAILABLE and thus never have the switch bit set.
>          */
> -       VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_SWITCH_BIT));
> +       VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_PCID_USER_BIT));
>  #endif
>         /*
>          * The dynamically-assigned ASIDs that get passed in are small
> @@ -112,7 +112,7 @@ static inline u16 user_pcid(u16 asid)
>  {
>         u16 ret = kern_pcid(asid);
>  #ifdef CONFIG_PAGE_TABLE_ISOLATION
> -       ret |= 1 << X86_CR3_PTI_SWITCH_BIT;
> +       ret |= 1 << X86_CR3_PTI_PCID_USER_BIT;
>  #endif
>         return ret;
>  }
>
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

Thread overview: 17+ messages
2018-01-12 18:19 Yet another KPTI regression with 4.14.x series in a VM Laura Abbott
2018-01-12 18:51 ` Thomas Gleixner
2018-01-12 21:30   ` Laura Abbott
2018-01-12 21:51     ` Thomas Gleixner
2018-01-13  6:08       ` Andy Lutomirski
2018-01-13  6:33         ` Willy Tarreau
2018-01-13 20:01           ` Andy Lutomirski
2018-01-13 20:45             ` Thomas Gleixner
2018-01-13 20:52               ` Andy Lutomirski
2018-01-13 12:08         ` Peter Zijlstra
2018-01-13 12:30           ` David Woodhouse
2018-01-13 13:10             ` Peter Zijlstra
2018-01-13 13:10               ` Peter Zijlstra
2018-01-13 13:39               ` David Woodhouse
2018-01-13 14:14               ` David Woodhouse
2018-01-13 12:50           ` Thomas Gleixner
2018-01-13 13:51             ` Borislav Petkov
