* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-20 14:18 ` Wei Xu
From: Wei Xu @ 2018-06-20 14:18 UTC
To: catalin.marinas, will.deacon, suzuki.poulose, dave.martin,
mark.rutland, james.morse, marc.zyngier
Cc: linux-arm-kernel, linux-kernel, Linuxarm, Hanjun Guo, xiexiuqi,
huangdaode, Chenxin (Charles), Xiongfanggou (James),
Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi All,
We have observed that a KVM guest sometimes fails to boot because of a kernel
stack overflow if KPTI is enabled on a HiSilicon arm64 platform.
We also tested with different kernel versions and found that it only
happens if both KPTI and KVM (-enable-kvm and -cpu host) are enabled for
the guest.
The detailed results are in the table below.
+---------+----------+--------+------------+-------------------+
| host    |host KPTI | guest  | guest KPTI | kvm guest         |
| kernel  |enabled   | kernel | enabled    | booting result    |
+---------+----------+--------+------------+-------------------+
| 4.17    | Y        | 4.17   | Y          | stack overflow    |
+---------+----------+--------+------------+-------------------+
| 4.17    | Y        | 4.16   | NA         | OK                |
+---------+----------+--------+------------+-------------------+
| 4.16    | NA       | 4.17   | Y          | stack overflow    |
+---------+----------+--------+------------+-------------------+
| 4.16    | NA       | 4.16   | NA         | OK                |
+---------+----------+--------+------------+-------------------+
A simple workaround is to add this platform to "kpti_safe_list",
but that does not really resolve the issue.
Could you please share any hints on how to resolve this kind of issue?
Thanks!
Another issue we found is that "kpti_install_ng_mappings" is invoked
even when "kpti=off" has been added to the kernel command line. Is that
expected? This is because "kpti" is not an *early* param, so
"init_cpu_features" is invoked before the param is parsed.
The command we use to run the guest is:
./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
    -cpu host -enable-kvm -smp 1 -m 1024 \
    -kernel ./Image -initrd ../mini-rootfs-arm64.cpio.gz \
    -nographic -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
The log is as follows:
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun 15 21:39:52 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007d850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007d860000
[ 0.000000] GIC: PPI11 is secure or misconfigured
[ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
[ 0.000000] arch_timer: WARNING: Please fix your firmware
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000854] Console: colour dummy device 80x25
[ 0.001423] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002478] pid_max: default: 32768 minimum: 301
[ 0.002962] Security Framework initialized
[ 0.003541] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.004347] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.005058] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.005844] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.025949] ASID allocator initialised with 32768 entries
[ 0.029958] Hierarchical SRCU implementation.
[ 0.034328] Platform MSI: its domain created
[ 0.034787] PCI/MSI: /intc/its domain created
[ 0.035359] EFI services will not be available.
[ 0.037987] smp: Bringing up secondary CPUs ...
[ 0.038454] smp: Brought up 1 node, 1 CPU
[ 0.038859] SMP: Total of 1 processors activated.
[ 0.039338] CPU features: detected: GIC system register CPU interface
[ 0.039988] CPU features: detected: Privileged Access Never
[ 0.040560] CPU features: detected: User Access Override
[ 0.041093] CPU features: detected: RAS Extension Support
[ 0.042947] Insufficient stack space to handle exception!
[ 0.042949] ESR: 0x96000046 -- DABT (current EL)
[ 0.043963] FAR: 0xffff0000093a80e0
[ 0.045794] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
[ 0.052181] IRQ stack: [0xffff000008000000..0xffff000008004000]
[ 0.058572] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
[ 0.065068] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #6
[ 0.073138] Hardware name: linux,dummy-virt (DT)
[ 0.077831] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
[ 0.082661] pc : el1_sync+0x0/0xb0
[ 0.086152] lr : kpti_install_ng_mappings+0x120/0x214
[ 0.091219] sp : ffff0000093a80e0
[ 0.094589] x29: ffff0000093abce0 x28: ffff000008ea9000
[ 0.100004] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.105424] x25: ffff00000906d000 x24: ffff000009191000
[ 0.110733] x23: ffff000008ea9000 x22: 0000000041190000
[ 0.116148] x21: ffff0000091f7000 x20: 0000000000000000
[ 0.121564] x19: ffff000009190000 x18: 000000003455d99d
[ 0.126977] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.132288] x15: 000000007eff6000 x14: 000000007eff6000
[ 0.137704] x13: 00f800007fe00f11 x12: 000000007eff8000
[ 0.143013] x11: 000000007eff8000 x10: 0000000000000000
[ 0.148426] x9 : 000000007eff9000 x8 : 000000007eff9000
[ 0.153841] x7 : 0000000000000000 x6 : 00000000411f8000
[ 0.159154] x5 : 00000000411f8000 x4 : 0000000040a443d4
[ 0.164567] x3 : 00000000411f7000 x2 : 00000000411f7000
[ 0.169981] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
[ 0.175395] Kernel panic - not syncing: kernel stack overflow
[ 0.181178] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #6
[ 0.189248] Hardware name: linux,dummy-virt (DT)
[ 0.193945] Call trace:
[ 0.196470] dump_backtrace+0x0/0x180
[ 0.200201] show_stack+0x14/0x1c
[ 0.203574] dump_stack+0x90/0xb0
[ 0.206946] panic+0x138/0x2a0
[ 0.210075] __stack_chk_fail+0x0/0x18
[ 0.213922] handle_bad_stack+0x118/0x124
[ 0.218012] __bad_stack+0x88/0x8c
[ 0.221393] el1_sync+0x0/0xb0
[ 0.224520] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.232586] Mem abort info:
[ 0.235362] ESR = 0x96000006
[ 0.238488] Exception class = DABT (current EL), IL = 32 bits
[ 0.244506] SET = 0, FnV = 0
[ 0.247632] EA = 0, S1PTW = 0
[ 0.250873] Data abort info:
[ 0.253765] ISV = 0, ISS = 0x00000006
[ 0.257725] CM = 0, WnR = 0
[ 0.260735] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
Best Regards,
Wei
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-20 14:18 ` Wei Xu
@ 2018-06-20 14:42 ` Will Deacon
From: Will Deacon @ 2018-06-20 14:42 UTC
To: Wei Xu
Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
james.morse, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi Wei,
On Wed, Jun 20, 2018 at 10:18:00PM +0800, Wei Xu wrote:
> We have observed that a KVM guest sometimes fails to boot because of a kernel
> stack overflow if KPTI is enabled on a HiSilicon arm64 platform.
>
> We also tested with different kernel versions and found that it only
> happens if both KPTI and KVM (-enable-kvm and -cpu host) are enabled for
> the guest.
> The detailed results are in the table below.
>
> +---------+----------+--------+------------+-------------------+
> | host    |host KPTI | guest  | guest KPTI | kvm guest         |
> | kernel  |enabled   | kernel | enabled    | booting result    |
> +---------+----------+--------+------------+-------------------+
> | 4.17    | Y        | 4.17   | Y          | stack overflow    |
> +---------+----------+--------+------------+-------------------+
> | 4.17    | Y        | 4.16   | NA         | OK                |
> +---------+----------+--------+------------+-------------------+
> | 4.16    | NA       | 4.17   | Y          | stack overflow    |
> +---------+----------+--------+------------+-------------------+
> | 4.16    | NA       | 4.16   | NA         | OK                |
> +---------+----------+--------+------------+-------------------+
>
> A simple workaround is to add this platform to "kpti_safe_list",
> but that does not really resolve the issue.
> Could you please share any hints on how to resolve this kind of issue?
> Thanks!
>
> Another issue we found is that "kpti_install_ng_mappings" is invoked
> even when "kpti=off" has been added to the kernel command line. Is that
> expected? This is because "kpti" is not an *early* param, so
> "init_cpu_features" is invoked before the param is parsed.
That sounds like a straightforward bug, which means we should use
early_param instead of __setup. I assume that doesn't fix your crash,
though?
> The command we use to run the guest is:
>
> ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
>     -cpu host -enable-kvm -smp 1 -m 1024 \
>     -kernel ./Image -initrd ../mini-rootfs-arm64.cpio.gz \
>     -nographic -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
>
> The log is as follows:
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
> [ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun 15 21:39:52 CST 2018
^^^ This is reproducible with vanilla v4.17 and defconfig, right?
> [ 0.038859] SMP: Total of 1 processors activated.
> [ 0.039338] CPU features: detected: GIC system register CPU interface
> [ 0.039988] CPU features: detected: Privileged Access Never
> [ 0.040560] CPU features: detected: User Access Override
> [ 0.041093] CPU features: detected: RAS Extension Support
> [ 0.042947] Insufficient stack space to handle exception!
> [ 0.042949] ESR: 0x96000046 -- DABT (current EL)
> [ 0.043963] FAR: 0xffff0000093a80e0
> [ 0.045794] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
> [ 0.052181] IRQ stack: [0xffff000008000000..0xffff000008004000]
> [ 0.058572] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
> [ 0.065068] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #6
> [ 0.073138] Hardware name: linux,dummy-virt (DT)
> [ 0.077831] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
> [ 0.082661] pc : el1_sync+0x0/0xb0
> [ 0.086152] lr : kpti_install_ng_mappings+0x120/0x214
Can you use scripts/faddr2line to find out which line of code the lr is
pointing at, please? It would be interesting to know if we managed to
install the idmap.
Hmm, I wonder if this is at all related to RAS, since we've just enabled
that and if we take a fault whilst rewriting swapper then we're going to
get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
Will
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-20 14:42 ` Will Deacon
@ 2018-06-20 15:52 ` Wei Xu
From: Wei Xu @ 2018-06-20 15:52 UTC
To: Will Deacon
Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
james.morse, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi Will,
On 2018/6/20 22:42, Will Deacon wrote:
> Hi Wei,
>
> On Wed, Jun 20, 2018 at 10:18:00PM +0800, Wei Xu wrote:
>> We have observed that a KVM guest sometimes fails to boot because of a kernel
>> stack overflow if KPTI is enabled on a HiSilicon arm64 platform.
>>
>> We also tested with different kernel versions and found that it only
>> happens if both KPTI and KVM (-enable-kvm and -cpu host) are enabled for
>> the guest.
>> The detailed results are in the table below.
>>
>> +---------+----------+--------+------------+-------------------+
>> | host    |host KPTI | guest  | guest KPTI | kvm guest         |
>> | kernel  |enabled   | kernel | enabled    | booting result    |
>> +---------+----------+--------+------------+-------------------+
>> | 4.17    | Y        | 4.17   | Y          | stack overflow    |
>> +---------+----------+--------+------------+-------------------+
>> | 4.17    | Y        | 4.16   | NA         | OK                |
>> +---------+----------+--------+------------+-------------------+
>> | 4.16    | NA       | 4.17   | Y          | stack overflow    |
>> +---------+----------+--------+------------+-------------------+
>> | 4.16    | NA       | 4.16   | NA         | OK                |
>> +---------+----------+--------+------------+-------------------+
>>
>> A simple workaround is to add this platform to "kpti_safe_list",
>> but that does not really resolve the issue.
>> Could you please share any hints on how to resolve this kind of issue?
>> Thanks!
>>
>> Another issue we found is that "kpti_install_ng_mappings" is invoked
>> even when "kpti=off" has been added to the kernel command line. Is that
>> expected? This is because "kpti" is not an *early* param, so
>> "init_cpu_features" is invoked before the param is parsed.
> That sounds like a straightforward bug, which means we should use
> early_param instead of __setup. I assume that doesn't fix your crash,
> though?
Thanks for your quick response!
It does fix our crash, but it is just another workaround.
>> The command we use to run the guest is:
>>
>> ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
>>     -cpu host -enable-kvm -smp 1 -m 1024 \
>>     -kernel ./Image -initrd ../mini-rootfs-arm64.cpio.gz \
>>     -nographic -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
>>
>> The log is as follows:
>>
>> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>> [ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun 15 21:39:52 CST 2018
> ^^^ This is reproducible with vanilla v4.17 and defconfig, right?
Yes.
>
>> [ 0.038859] SMP: Total of 1 processors activated.
>> [ 0.039338] CPU features: detected: GIC system register CPU interface
>> [ 0.039988] CPU features: detected: Privileged Access Never
>> [ 0.040560] CPU features: detected: User Access Override
>> [ 0.041093] CPU features: detected: RAS Extension Support
>> [ 0.042947] Insufficient stack space to handle exception!
>> [ 0.042949] ESR: 0x96000046 -- DABT (current EL)
>> [ 0.043963] FAR: 0xffff0000093a80e0
>> [ 0.045794] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
>> [ 0.052181] IRQ stack: [0xffff000008000000..0xffff000008004000]
>> [ 0.058572] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>> [ 0.065068] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #6
>> [ 0.073138] Hardware name: linux,dummy-virt (DT)
>> [ 0.077831] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>> [ 0.082661] pc : el1_sync+0x0/0xb0
>> [ 0.086152] lr : kpti_install_ng_mappings+0x120/0x214
> Can you use scripts/faddr2line to find out which line of code the lr is
> pointing at, please? It would be interesting to know if we managed to
> install the idmap.
I have not used addr2line before, but with gdb we can get the same information:
(gdb) list *kpti_install_ng_mappings+0x120/0x214
0xffff000008091d70 is in kpti_install_ng_mappings
(/home/joyx/plinth-kernel-v200/arch/arm64/kernel/cpufeature.c:907).
902 return !has_cpuid_feature(entry, scope);
903 }
904
905 static void
906 kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
907 {
908 typedef void (kpti_remap_fn)(int, int, phys_addr_t);
909 extern kpti_remap_fn idmap_kpti_install_ng_mappings;
910 kpti_remap_fn *remap_fn;
911
> Hmm, I wonder if this is at all related to RAS, since we've just enabled
> that and if we take a fault whilst rewriting swapper then we're going to
> get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
I will try it now.
Thanks!
Best Regards,
Wei
> Will
>
> .
>
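The register dump above can be cross-checked with a little arithmetic. The following is a minimal user-space C sketch, not kernel code, with hypothetical helper names (`classify_addr`, `format_pstate`, `pstate_el`). It checks which of the reported stacks an address such as the FAR falls in, and decodes a PSTATE value into the same "nZCv DAIF +PAN -UAO" notation the arm64 dump prints. The stack ranges are copied from the log and assumed to be half-open; the bit positions (N/Z/C/V in bits 31..28, UAO in bit 23, PAN in bit 22, D/A/I/F in bits 9..6, M[3:2] = exception level) follow the Armv8-A SPSR layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A half-open address range [start, end), as printed by the overflow handler. */
struct stack_range {
	const char *name;
	uint64_t start;
	uint64_t end;
};

/* Ranges copied from the "Insufficient stack space" report in the log above. */
static const struct stack_range reported_stacks[] = {
	{ "task",     0xffff0000093a8000ULL, 0xffff0000093ac000ULL },
	{ "irq",      0xffff000008000000ULL, 0xffff000008004000ULL },
	{ "overflow", 0xffff80003efce2f0ULL, 0xffff80003efcf2f0ULL },
};

/*
 * Hypothetical helper: return the name of the reported stack containing
 * addr and store the offset from that stack's base in *offset.
 * Returns NULL if addr is in none of the ranges.
 */
static const char *classify_addr(uint64_t addr, uint64_t *offset)
{
	size_t n = sizeof(reported_stacks) / sizeof(reported_stacks[0]);

	for (size_t i = 0; i < n; i++) {
		const struct stack_range *r = &reported_stacks[i];

		if (addr >= r->start && addr < r->end) {
			*offset = addr - r->start;
			return r->name;
		}
	}
	return NULL;
}

/*
 * Hypothetical helper: format PSTATE like the arm64 register dump, e.g.
 * "nZCv DAIF +PAN -UAO" (upper case / '+' means the bit is set).
 */
static void format_pstate(uint64_t pstate, char *buf, size_t len)
{
	assert(len >= 20); /* 19 characters plus the terminating NUL */
	snprintf(buf, len, "%c%c%c%c %c%c%c%c %cPAN %cUAO",
		 ((pstate >> 31) & 1) ? 'N' : 'n',
		 ((pstate >> 30) & 1) ? 'Z' : 'z',
		 ((pstate >> 29) & 1) ? 'C' : 'c',
		 ((pstate >> 28) & 1) ? 'V' : 'v',
		 ((pstate >> 9) & 1) ? 'D' : 'd',
		 ((pstate >> 8) & 1) ? 'A' : 'a',
		 ((pstate >> 7) & 1) ? 'I' : 'i',
		 ((pstate >> 6) & 1) ? 'F' : 'f',
		 ((pstate >> 22) & 1) ? '+' : '-',
		 ((pstate >> 23) & 1) ? '+' : '-');
}

/* Hypothetical helper: exception level from M[3:2] of an AArch64 PSTATE. */
static unsigned int pstate_el(uint64_t pstate)
{
	return (unsigned int)((pstate >> 2) & 3);
}
```

Fed the values from the log, this places the FAR 0xffff0000093a80e0 at 0xe0 bytes above the base of the task stack, and decodes pstate 604003c5 as "nZCv DAIF +PAN -UAO" at EL1, matching what the kernel printed.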
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-20 15:52 ` Wei Xu
@ 2018-06-20 15:54 ` James Morse
-1 siblings, 0 replies; 79+ messages in thread
From: James Morse @ 2018-06-20 15:54 UTC (permalink / raw)
To: Wei Xu
Cc: Will Deacon, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi Wei,
On 20/06/18 16:52, Wei Xu wrote:
> On 2018/6/20 22:42, Will Deacon wrote:
>> Hmm, I wonder if this is at all related to RAS, since we've just enabled
>> that and if we take a fault whilst rewriting swapper then we're going to
>> get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
>
> I will try it now.
It's not just the Kconfig symbol; could you also revert:
f751daa4f9d3 ("arm64: Unconditionally enable IESB on exception entry/return for firmware-first")
(it reverts and builds cleanly on 4.17)
Thanks,
James
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-20 15:54 ` James Morse
@ 2018-06-20 16:25 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-20 16:25 UTC (permalink / raw)
To: James Morse
Cc: Will Deacon, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi James,
On 2018/6/20 23:54, James Morse wrote:
> Hi Wei,
>
> On 20/06/18 16:52, Wei Xu wrote:
>> On 2018/6/20 22:42, Will Deacon wrote:
>>> Hmm, I wonder if this is at all related to RAS, since we've just enabled
>>> that and if we take a fault whilst rewriting swapper then we're going to
>>> get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
>> I will try it now.
> It's not just the Kconfig symbol; could you also revert:
>
> f751daa4f9d3 ("arm64: Unconditionally enable IESB on exception entry/return for firmware-first")
>
>
> (it reverts and builds cleanly on 4.17)
Thanks for pointing this out!
I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit,
but I still hit the stack overflow intermittently.
Do you have any more hints?
Thanks!
The log is below:
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20 23:59:05 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007d850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007d860000
[ 0.000000] GIC: PPI11 is secure or misconfigured
[ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
[ 0.000000] arch_timer: WARNING: Please fix your firmware
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000843] Console: colour dummy device 80x25
[ 0.001401] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002453] pid_max: default: 32768 minimum: 301
[ 0.002941] Security Framework initialized
[ 0.003517] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.004317] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.005018] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.005791] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.025893] ASID allocator initialised with 32768 entries
[ 0.029901] Hierarchical SRCU implementation.
[ 0.034274] Platform MSI: its domain created
[ 0.034749] PCI/MSI: /intc/its domain created
[ 0.035317] EFI services will not be available.
[ 0.037930] smp: Bringing up secondary CPUs ...
[ 0.038396] smp: Brought up 1 node, 1 CPU
[ 0.038810] SMP: Total of 1 processors activated.
[ 0.039285] CPU features: detected: GIC system register CPU interface
[ 0.039930] CPU features: detected: Privileged Access Never
[ 0.040488] CPU features: detected: User Access Override
[ 0.042421] Insufficient stack space to handle exception!
[ 0.042423] ESR: 0x96000046 -- DABT (current EL)
[ 0.043730] FAR: 0xffff0000093a80e0
[ 0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
[ 0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
[ 0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
[ 0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-g2b31fe7-dirty #10
[ 0.072201] Hardware name: linux,dummy-virt (DT)
[ 0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
[ 0.081727] pc : el1_sync+0x0/0xb0
[ 0.085217] lr : kpti_install_ng_mappings+0x120/0x214
[ 0.090284] sp : ffff0000093a80e0
[ 0.093654] x29: ffff0000093abce0 x28: ffff000008ea9000
[ 0.099071] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.104488] x25: ffff00000906d000 x24: ffff000009191000
[ 0.109798] x23: ffff000008ea9000 x22: 0000000041190000
[ 0.115217] x21: ffff0000091f7000 x20: 0000000000000000
[ 0.120633] x19: ffff000009190000 x18: 000000003455d99d
[ 0.125943] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.131358] x15: 000000007eff6000 x14: 000000007eff6000
[ 0.136773] x13: 00f800007fe00f11 x12: 000000007eff8000
[ 0.142082] x11: 000000007eff8000 x10: 0000000000000000
[ 0.147501] x9 : 000000007eff9000 x8 : 000000007eff9000
[ 0.152920] x7 : 0000000000000000 x6 : 00000000411f8000
[ 0.158230] x5 : 00000000411f8000 x4 : 0000000040a443d4
[ 0.163646] x3 : 00000000411f7000 x2 : 00000000411f7000
[ 0.169061] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
[ 0.174372] Kernel panic - not syncing: kernel stack overflow
[ 0.180264] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-g2b31fe7-dirty #10
[ 0.188348] Hardware name: linux,dummy-virt (DT)
[ 0.193046] Call trace:
[ 0.195572] dump_backtrace+0x0/0x180
[ 0.199304] show_stack+0x14/0x1c
[ 0.202677] dump_stack+0x90/0xb0
[ 0.206152] panic+0x138/0x2a0
[ 0.209182] __stack_chk_fail+0x0/0x18
[ 0.213029] handle_bad_stack+0x118/0x124
[ 0.217120] __bad_stack+0x88/0x8c
[ 0.220607] el1_sync+0x0/0xb0
[ 0.223738] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.231704] Mem abort info:
[ 0.234586] ESR = 0x96000006
[ 0.237714] Exception class = DABT (current EL), IL = 32 bits
[ 0.243628] SET = 0, FnV = 0
[ 0.246758] EA = 0, S1PTW = 0
[ 0.250001] Data abort info:
[ 0.253000] ISV = 0, ISS = 0x00000006
[ 0.256859] CM = 0, WnR = 0
[ 0.259871] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.266862] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
[ 0.275659] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 0.281213] Modules linked in:
[ 0.284447] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-g2b31fe7-dirty #10
[ 0.292534] Hardware name: linux,dummy-virt (DT)
[ 0.297229] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
[ 0.302053] pc : unwind_frame+0x28/0xc8
[ 0.306022] lr : dump_backtrace+0x12c/0x180
[ 0.310245] sp : ffff80003efcf000
[ 0.313616] x29: ffff80003efcf000 x28: ffff80003da61c00
[ 0.319033] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.324348] x25: ffff00000906d000 x24: ffff0000093a80e0
[ 0.329764] x23: 0000000000000000 x22: ffff000008dbae28
[ 0.335179] x21: 0000000000000000 x20: ffff000009049000
[ 0.340488] x19: ffff80003da61c00 x18: 000000003455d99d
[ 0.345906] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.351322] x15: 000000007eff6000 x14: 3031232079747269
[ 0.356633] x13: 0000000000000000 x12: cc26f77952f87e00
[ 0.362046] x11: ffffffffffffffff x10: 0000000000000076
[ 0.367466] x9 : ffff0000085aea28 x8 : ffff80003efcec90
[ 0.372880] x7 : 0000000000000000 x6 : ffff0000091befe1
[ 0.378190] x5 : 0000000000000000 x4 : ffff0000093ac000
[ 0.383605] x3 : ffff0000093a8000 x2 : ffff0000093abce0
[ 0.389021] x1 : ffff80003efcf048 x0 : ffff80003da61c00
[ 0.394330] Process migration/0 (pid: 12, stack limit = 0x (ptrval))
[ 0.401427] Call trace:
[ 0.403852] unwind_frame+0x28/0xc8
[ 0.407455] show_stack+0x14/0x1c
[ 0.410828] dump_stack+0x90/0xb0
[ 0.414201] panic+0x138/0x2a0
[ 0.417329] __stack_chk_fail+0x0/0x18
[ 0.421177] handle_bad_stack+0x118/0x124
[ 0.425273] __bad_stack+0x88/0x8c
[ 0.428762] el1_sync+0x0/0xb0
[ 0.431891] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.439851] Mem abort info:
[ 0.442734] ESR = 0x96000006
[ 0.445861] Exception class = DABT (current EL), IL = 32 bits
[ 0.451774] SET = 0, FnV = 0
[ 0.454900] EA = 0, S1PTW = 0
[ 0.458142] Data abort info:
[ 0.461144] ISV = 0, ISS = 0x00000006
[ 0.465001] CM = 0, WnR = 0
[ 0.468013] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.474996] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
Best Regards,
Wei
>
> Thanks,
>
> James
>
> .
>
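Both aborts in this log can be decoded by hand from the ESR value and the printed page-table walk. Below is a minimal user-space C sketch with hypothetical helper names (`decode_esr`, `translation_fault_level`, `decode_desc`). It assumes the Armv8-A ESR_ELx layout (EC in ESR[31:26], IL in ESR[25], ISS in ESR[24:0]; for data aborts WnR is ISS[6] and DFSC is ISS[5:0], with DFSC 0x04 to 0x07 meaning a translation fault at level 0 to 3) and the stage-1 table-descriptor format for a 4K granule (bit 0 valid, bit 1 table, bits [47:12] next-level table address).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_DABT_LOW 0x24 /* data abort from a lower exception level */
#define ESR_EC_DABT_CUR 0x25 /* data abort without a change in exception level */

/* Fields of an ESR_ELx value for a data abort (hypothetical struct). */
struct dabt_info {
	unsigned int ec;   /* exception class, ESR[31:26] */
	bool il;           /* ESR[25]: 1 = 32-bit instruction */
	uint32_t iss;      /* instruction-specific syndrome, ESR[24:0] */
	bool wnr;          /* ISS[6]: 1 = the abort was caused by a write */
	unsigned int dfsc; /* ISS[5:0]: data fault status code */
};

static struct dabt_info decode_esr(uint32_t esr)
{
	struct dabt_info d;

	d.ec = esr >> 26;
	d.il = (esr >> 25) & 1;
	d.iss = esr & 0x1ffffff;
	d.wnr = (d.iss >> 6) & 1;
	d.dfsc = d.iss & 0x3f;
	return d;
}

/* DFSC 0b0001LL is a translation fault at level LL; return -1 otherwise. */
static int translation_fault_level(unsigned int dfsc)
{
	if ((dfsc & 0x3c) == 0x04)
		return (int)(dfsc & 0x3);
	return -1;
}

/* Minimal view of a stage-1 descriptor at levels 0-2 (4K granule). */
struct desc_info {
	bool valid;          /* bit 0 */
	bool table;          /* bit 1 set on a valid entry: points to a table */
	uint64_t next_table; /* bits [47:12] */
};

static struct desc_info decode_desc(uint64_t desc)
{
	struct desc_info d;

	d.valid = desc & 1;
	d.table = d.valid && ((desc >> 1) & 1);
	d.next_table = desc & 0x0000fffffffff000ULL;
	return d;
}
```

With the logged values, 0x96000046 (the first fault, WnR = 1, i.e. a write) and 0x96000006 (the later fault in unwind_frame, WnR = 0) both decode as a data abort at the current EL with a level-2 translation fault. That agrees with the walk printed for ffff0000093abce0, where pgd and pud are valid table descriptors and pmd is 0, i.e. invalid. That address and the first FAR, 0xffff0000093a80e0, lie in the same 2 MiB region, so with a 4K granule they share the same level-2 entry.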
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-20 16:25 ` Wei Xu
@ 2018-06-20 16:28 ` Will Deacon
-1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-20 16:28 UTC (permalink / raw)
To: Wei Xu
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
On Thu, Jun 21, 2018 at 12:25:05AM +0800, Wei Xu wrote:
> Hi James,
>
> On 2018/6/20 23:54, James Morse wrote:
> >Hi Wei,
> >
> >On 20/06/18 16:52, Wei Xu wrote:
> >>On 2018/6/20 22:42, Will Deacon wrote:
> >>>Hmm, I wonder if this is at all related to RAS, since we've just enabled
> >>>that and if we take a fault whilst rewriting swapper then we're going to
> >>>get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
> >>I will try it now.
> >It's not just the Kconfig symbol; could you also revert:
> >
> >f751daa4f9d3 ("arm64: Unconditionally enable IESB on exception entry/return for firmware-first")
> >
> >
> >(it reverts and builds cleanly on 4.17)
>
> Thanks for pointing this out!
> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit,
> but I still hit the stack overflow intermittently.
> Do you have any more hints?
[...]
> [ 0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
> [ 0.081727] pc : el1_sync+0x0/0xb0
> [ 0.085217] lr : kpti_install_ng_mappings+0x120/0x214
Please run:
$ ./scripts/faddr2line vmlinux kpti_install_ng_mappings+0x120/0x214
as the GDB output wasn't helpful (it only showed local variable
declarations?!).
Will
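One plausible explanation for the unhelpful GDB listing quoted earlier: in `list *kpti_install_ng_mappings+0x120/0x214`, the text after `*` is evaluated as a C expression, so `/` binds tighter than `+` and `0x120/0x214` truncates to 0. The location is then simply the start of the function, which would fit the reported line 907 (the opening brace). The `/0x214` suffix in the dump is the function's size, not part of the address. The sketch below is user-space C with hypothetical helper names (`parse_sym_off`, `naive_expr_addr`, `intended_addr`); it parses the `symbol+offset/size` notation and contrasts the two readings. `scripts/faddr2line`, as suggested above, accepts this notation directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* "func+0xoff/0xsize", the notation used for pc/lr in arm64 dumps. */
struct sym_off {
	char name[128];
	unsigned long offset;
	unsigned long size;
};

/* Parse "name+0xOFF/0xSIZE"; return false if s does not match. */
static bool parse_sym_off(const char *s, struct sym_off *out)
{
	const char *plus = strrchr(s, '+');
	const char *slash = strrchr(s, '/');
	char *end;
	size_t len;

	if (!plus || !slash || slash < plus)
		return false;
	len = (size_t)(plus - s);
	if (len == 0 || len >= sizeof(out->name))
		return false;
	memcpy(out->name, s, len);
	out->name[len] = '\0';

	/* Base 16 accepts an optional "0x" prefix. */
	out->offset = strtoul(plus + 1, &end, 16);
	if (end != slash)
		return false;
	out->size = strtoul(slash + 1, &end, 16);
	if (*end != '\0')
		return false;
	return out->offset < out->size;
}

/*
 * Address obtained if "sym+OFF/SIZE" is evaluated with C precedence:
 * OFF/SIZE is integer division and is 0 whenever OFF < SIZE.
 */
static uint64_t naive_expr_addr(uint64_t sym_addr, const struct sym_off *so)
{
	return sym_addr + so->offset / so->size;
}

/* The address the "+0xOFF" part actually denotes. */
static uint64_t intended_addr(uint64_t sym_addr, const struct sym_off *so)
{
	return sym_addr + so->offset;
}
```

For a symbol at address A, the naive reading yields A itself, while the intended location is A + 0x120. If GDB is used instead of faddr2line, leaving the size out, e.g. `list *(kpti_install_ng_mappings+0x120)`, should presumably land on the intended line.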
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-20 16:28 ` Will Deacon
@ 2018-06-20 16:33 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-20 16:33 UTC (permalink / raw)
To: Will Deacon
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi Will,
On 2018/6/21 0:28, Will Deacon wrote:
> On Thu, Jun 21, 2018 at 12:25:05AM +0800, Wei Xu wrote:
>> Hi James,
>>
>> On 2018/6/20 23:54, James Morse wrote:
>>> Hi Wei,
>>>
>>> On 20/06/18 16:52, Wei Xu wrote:
>>>> On 2018/6/20 22:42, Will Deacon wrote:
>>>>> Hmm, I wonder if this is at all related to RAS, since we've just enabled
>>>>> that and if we take a fault whilst rewriting swapper then we're going to
>>>>> get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
>>>> I will try it now.
>>> It's not just the Kconfig symbol, could you also revert:
>>>
>>> f751daa4f9d3 ("arm64: Unconditionally enable IESB on exception entry/return for
>>> firmware-first")
>>>
>>>
>>> (reverts and build cleanly on 4.17)
>> Thanks to point out this!
>> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
>> But I still got the stack overflow issue sometimes.
>> Do you have more hint?
> [...]
>
>> [ 0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>> [ 0.081727] pc : el1_sync+0x0/0xb0
>> [ 0.085217] lr : kpti_install_ng_mappings+0x120/0x214
> Please run:
>
> $ ./scripts/faddr2line vmlinux kpti_install_ng_mappings+0x120/0x214
Thanks for your kind guidance :)
The output is below:
joyx@Turing-Arch-b:~/plinth-kernel-v200$ ./scripts/faddr2line ../kernel-dev.build/vmlinux kpti_install_ng_mappings+0x120/0x214
kpti_install_ng_mappings+0x120/0x214:
cpu_set_reserved_ttbr0 at arch/arm64/include/asm/mmu_context.h:52
 47 /*
 48  * Set TTBR0 to empty_zero_page. No translations will be possible via TTBR0.
 49  */
 50 static inline void cpu_set_reserved_ttbr0(void)
 51 {
 52 	unsigned long ttbr = phys_to_ttbr(__pa_symbol(empty_zero_page));
 53
 54 	write_sysreg(ttbr, ttbr0_el1);
 55 	isb();
 56 }
 57
(inlined by) cpu_uninstall_idmap at arch/arm64/include/asm/mmu_context.h:123
118  */
119 static inline void cpu_uninstall_idmap(void)
120 {
121 	struct mm_struct *mm = current->active_mm;
122
123 	cpu_set_reserved_ttbr0();
124 	local_flush_tlb_all();
125 	cpu_set_default_tcr_t0sz();
126
127 	if (mm != &init_mm && !system_uses_ttbr0_pan())
128 		cpu_switch_mm(mm->pgd, mm);
(inlined by) kpti_install_ng_mappings at arch/arm64/kernel/cpufeature.c:922
917
918 	remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
919
920 	cpu_install_idmap();
921 	remap_fn(cpu, num_online_cpus(), __pa_symbol(swapper_pg_dir));
922 	cpu_uninstall_idmap();
923
924 	if (!cpu)
925 		kpti_applied = true;
926
927 	return;
Thanks!
Best Regards,
Wei
> as the GDB output wasn't helpful (it only showed local variable
> declarations?!).
>
> Will
>
> .
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-20 16:25 ` Wei Xu
@ 2018-06-21 8:38 ` James Morse
-1 siblings, 0 replies; 79+ messages in thread
From: James Morse @ 2018-06-21 8:38 UTC (permalink / raw)
To: Wei Xu, Will Deacon
Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
marc.zyngier, linux-arm-kernel, linux-kernel, Linuxarm,
Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi Will, Wei,
On 20/06/18 17:25, Wei Xu wrote:
> On 2018/6/20 23:54, James Morse wrote:
> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
> But I still got the stack overflow issue sometimes.
> Do you have more hint?
> The log is as below:
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
> [ 0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty
Could you reproduce this with v4.17? This says there are ~45,000 extra patches
and uncommitted changes. None of the hashes so far have been commits in
mainline, so we have no idea what this tree is.
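The release string being read here is assumed to follow the layout that the kernel's scripts/setlocalversion builds from git describe: the Makefile version, the number of commits past the nearest tag, "g" plus an abbreviated commit id, and "-dirty" for uncommitted changes. A minimal C sketch that splits such a string into those fields, assuming exactly that layout; struct kver and parse_kver() are hypothetical names, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields of a release string like "4.17.0-45865-g2b31fe7-dirty" (assumed layout). */
struct kver {
	char base[32];	/* Makefile version, e.g. "4.17.0" */
	long commits;	/* commits past the nearest tag, as counted by git describe */
	char hash[41];	/* abbreviated commit id, without the leading 'g' */
	bool dirty;	/* tree had uncommitted changes at build time */
};

/* Returns 0 on success, -1 if the string does not have the assumed layout. */
static int parse_kver(const char *s, struct kver *v)
{
	int n = -1;

	assert(s != NULL && v != NULL);
	memset(v, 0, sizeof(*v));

	/* "<base>-<count>-g<hex>", recording where the hash ends in n. */
	if (sscanf(s, "%31[^-]-%ld-g%40[0123456789abcdef]%n",
		   v->base, &v->commits, v->hash, &n) != 3 || n < 0)
		return -1;

	/* Anything after the hash must be nothing at all or exactly "-dirty". */
	if (s[n] == '\0')
		v->dirty = false;
	else if (strcmp(s + n, "-dirty") == 0)
		v->dirty = true;
	else
		return -1;
	return 0;
}
```

For example, parse_kver("4.17.0-45865-g2b31fe7-dirty", &v) yields base "4.17.0", commits 45865, hash "2b31fe7" and dirty set; a bare "4.17.0" is rejected because it has no describe suffix.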
> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
> 23:59:05 CST 2018
> [ 0.000000] CPU0: using LPI pending table @0x000000007d860000
> [ 0.000000] GIC: PPI11 is secure or misconfigured
> [ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level
> low
> [ 0.000000] arch_timer: WARNING: Please fix your firmware
> [ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
(No idea what these mean, but I doubt they are relevant)
> [ 0.042421] Insufficient stack space to handle exception!
> [ 0.042423] ESR: 0x96000046 -- DABT (current EL)
> [ 0.043730] FAR: 0xffff0000093a80e0
> [ 0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
This was a level 2 translation fault on a write to an address that is within
the stack...
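The ESR and FAR reading above can be checked mechanically. A minimal C sketch, assuming the ESR_ELx data-abort layout from the Arm ARM (EC in bits [31:26], WnR in bit 6, DFSC in bits [5:0], with DFSC 0b0001LL meaning a translation fault at level LL); the macro and function names are hypothetical, not the kernel's own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx layout: EC in [31:26], IL in bit 25, ISS in [24:0]. */
#define ESR_EC(esr)	(((esr) >> 26) & 0x3fu)
#define ESR_ISS(esr)	((esr) & 0x1ffffffu)

#define EC_DABT_CUR	0x25u	/* data abort taken without a change in EL */

/* Data-abort ISS fields used here. */
#define ISS_WNR(iss)	(((iss) >> 6) & 0x1u)	/* 1 = the access was a write */
#define ISS_DFSC(iss)	((iss) & 0x3fu)		/* data fault status code */

struct dabt_info {
	bool is_write;
	bool is_translation_fault;
	int level;	/* lookup level for translation faults, else -1 */
};

static struct dabt_info decode_dabt(uint32_t esr)
{
	struct dabt_info d = { 0 };
	uint32_t iss = ESR_ISS(esr);
	uint32_t dfsc = ISS_DFSC(iss);

	assert(ESR_EC(esr) == EC_DABT_CUR);
	d.is_write = ISS_WNR(iss);
	/* DFSC 0b000100..0b000111: translation fault at level 0..3. */
	d.is_translation_fault = (dfsc & 0x3cu) == 0x04u;
	d.level = d.is_translation_fault ? (int)(dfsc & 0x3u) : -1;
	return d;
}

/* Is addr inside the half-open range [lo, hi)? */
static bool addr_in_range(uint64_t addr, uint64_t lo, uint64_t hi)
{
	return addr >= lo && addr < hi;
}
```

With the values from the log, decode_dabt(0x96000046) reports a write and a level 2 translation fault, and addr_in_range(0xffff0000093a80e0, 0xffff0000093a8000, 0xffff0000093ac000) is true, so the faulting address lies inside the printed task stack.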
> [ 0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
> [ 0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
> [ 0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
> 4.17.0-45865-g2b31fe7-dirty #10
> [ 0.072201] Hardware name: linux,dummy-virt (DT)
> [ 0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
> [ 0.081727] pc : el1_sync+0x0/0xb0
... from the vectors.
> [ 0.085217] lr : kpti_install_ng_mappings+0x120/0x214
What I think is happening is: we come out of the kpti idmap with the stack
unmapped. Shortly after, we access the stack, which faults. el1_sync faults as
well when it tries to push the registers to the stack, and we keep going until
we overflow the stack.
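The recursion described above can be illustrated with a simplified model. Its assumptions are not taken from this thread: the overflow check at exception entry is assumed to test bit THREAD_SHIFT of the new stack pointer (which works only because the task stack, 0xffff0000093a8000 in the log, is aligned to twice its 16 KiB size), and each nested exception is assumed to reserve a fixed, hypothetical FRAME_SIZE of 0x150 bytes. This is a C sketch, not the kernel's entry code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THREAD_SHIFT	14			/* 16 KiB task stacks, as in the log */
#define THREAD_SIZE	(1ull << THREAD_SHIFT)
#define FRAME_SIZE	0x150ull		/* hypothetical exception frame size */

/*
 * Assumed check: with the stack aligned to 2 * THREAD_SIZE, an SP inside
 * the stack has bit THREAD_SHIFT clear and an SP just below it has it set.
 */
static bool sp_overflowed(uint64_t sp)
{
	return (sp >> THREAD_SHIFT) & 1;
}

/*
 * Every exception entry reserves a frame; its first store to the unmapped
 * stack faults and re-enters the vector. Returns how many nested faults
 * are taken before the overflow check trips.
 */
static unsigned int nested_faults_until_overflow(uint64_t base, uint64_t sp)
{
	unsigned int depth = 0;

	assert((base & (2 * THREAD_SIZE - 1)) == 0);
	assert(sp > base && sp <= base + THREAD_SIZE);

	for (;;) {
		sp -= FRAME_SIZE;		/* entry code makes room for the frame */
		if (sp_overflowed(sp))
			return depth;		/* switch to the overflow stack and report */
		depth++;			/* register save faults again: recurse */
	}
}
```

Starting from the top of the logged task stack, nested_faults_until_overflow(0xffff0000093a8000, 0xffff0000093ac000) returns 48 under these assumptions: the handler keeps faulting until its frames run off the bottom of the stack.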
I can't reproduce this with kvmtool or qemu in the model.
Thanks,
James
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-21 8:38 ` James Morse
@ 2018-06-21 9:00 ` Marc Zyngier
-1 siblings, 0 replies; 79+ messages in thread
From: Marc Zyngier @ 2018-06-21 9:00 UTC (permalink / raw)
To: James Morse, Wei Xu, Will Deacon
Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
linux-arm-kernel, linux-kernel, Linuxarm, Hanjun Guo, xiexiuqi,
huangdaode, Chenxin (Charles), Xiongfanggou (James),
Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
On 21/06/18 09:38, James Morse wrote:
>> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
>> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
>> 23:59:05 CST 2018
>
>> [ 0.000000] CPU0: using LPI pending table @0x000000007d860000
>> [ 0.000000] GIC: PPI11 is secure or misconfigured
>> [ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level
>> low
>> [ 0.000000] arch_timer: WARNING: Please fix your firmware
>> [ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
>
> (No idea what these mean, but I doubt they are relevant)
Old (and buggy) QEMU. Nothing to worry about; the kernel (and the vgic)
will do the right thing. A modern QEMU presents the guest with a fixed
DT, removing the warning altogether.
Nothing to do with the issue at hand anyway.
M.
--
Jazz is not dead. It just smells funny...
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-21 8:38 ` James Morse
@ 2018-06-21 9:18 ` Will Deacon
-1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-21 9:18 UTC (permalink / raw)
To: James Morse
Cc: Wei Xu, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
On Thu, Jun 21, 2018 at 09:38:53AM +0100, James Morse wrote:
> On 20/06/18 17:25, Wei Xu wrote:
> > [ 0.042421] Insufficient stack space to handle exception!
> > [ 0.042423] ESR: 0x96000046 -- DABT (current EL)
> > [ 0.043730] FAR: 0xffff0000093a80e0
> > [ 0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
>
> This was a level 2 translation fault on a write, to an address that is within
> the stack....
>
>
> > [ 0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
> > [ 0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
> > [ 0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
> > 4.17.0-45865-g2b31fe7-dirty #10
> > [ 0.072201] Hardware name: linux,dummy-virt (DT)
>
> > [ 0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
> > [ 0.081727] pc : el1_sync+0x0/0xb0
>
> ... from the vectors.
>
>
> > [ 0.085217] lr : kpti_install_ng_mappings+0x120/0x214
>
> What I think is happening is: we come out of the kpti idmap with the stack
> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
> well when it tries to push the registers to the stack, and we keep going until
> we overflow the stack.
>
> I can't reproduce this with kvmtool or qemu in the model.
Hmm, one thing that occurs to me is that the kpti_install_ng_mappings()
code leaves the nG bit set in table entries, where that bit is actually
IGNORED by the architecture.
Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
otherwise your kernel will take an age to boot.
Will
--->8
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..70d9e98467ca 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -272,8 +272,8 @@ ENTRY(idmap_kpti_install_ng_mappings)
 	add	end_pgdp, cur_pgdp, #(PTRS_PER_PGD * 8)
 do_pgd:	__idmap_kpti_get_pgtable_ent	pgd
 	tbnz	pgd, #1, walk_puds
-next_pgd:
 	__idmap_kpti_put_pgtable_ent_ng	pgd
+next_pgd:
 skip_pgd:
 	add	cur_pgdp, cur_pgdp, #8
 	cmp	cur_pgdp, end_pgdp
@@ -302,8 +302,8 @@ walk_puds:
 	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
 do_pud:	__idmap_kpti_get_pgtable_ent	pud
 	tbnz	pud, #1, walk_pmds
-next_pud:
 	__idmap_kpti_put_pgtable_ent_ng	pud
+next_pud:
 skip_pud:
 	add	cur_pudp, cur_pudp, 8
 	cmp	cur_pudp, end_pudp
@@ -323,8 +323,8 @@ walk_pmds:
 	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
 do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
 	tbnz	pmd, #1, walk_ptes
-next_pmd:
 	__idmap_kpti_put_pgtable_ent_ng	pmd
+next_pmd:
 skip_pmd:
 	add	cur_pmdp, cur_pmdp, #8
 	cmp	cur_pmdp, end_pmdp
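The effect of moving the next_<level> labels can be modelled for a single valid level 0-2 entry. This is a simplified C sketch, not the proc.S code: it models only the branch on bit 1 and the label placement shown in the diff, ignores whatever else the __idmap_kpti_* macros do (they are not shown in this thread), and uses hypothetical names; only the position of nG (bit 11) is taken from the architecture:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_VALID	(1ull << 0)
#define DESC_TABLE	(1ull << 1)	/* levels 0-2: 1 = table, 0 = block */
#define PTE_NG		(1ull << 11)	/* nG in block/page descriptors */

/*
 * Returns the value written back for one valid entry. label_before_put
 * models the original code, where returning from the next-level walk
 * lands on next_<level> just before the put_ng macro.
 */
static uint64_t process_entry(uint64_t desc, bool label_before_put)
{
	assert(desc & DESC_VALID);

	if (desc & DESC_TABLE) {
		/* tbnz #1: walk the next level, then branch back to next_<level>. */
		if (label_before_put)
			desc |= PTE_NG;	/* original: bit 11 set in the table entry too */
		return desc;		/* patched: table entry left untouched */
	}
	return desc | PTE_NG;		/* block entry: both versions set nG */
}
```

With label_before_put set (the original placement), a table entry comes back with bit 11 set after its next-level walk; with it cleared (the patched placement), only block entries are modified.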
^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-21 8:38 ` James Morse
@ 2018-06-21 9:20 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-21 9:20 UTC (permalink / raw)
To: James Morse, Will Deacon
Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
marc.zyngier, linux-arm-kernel, linux-kernel, Linuxarm,
Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi James,
On 2018/6/21 9:38, James Morse wrote:
> Hi Will, Wei,
>
> On 20/06/18 17:25, Wei Xu wrote:
>> On 2018/6/20 23:54, James Morse wrote:
>> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
>> But I still got the stack overflow issue sometimes.
>> Do you have more hint?
>
>> The log is as below:
>> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>> [ 0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty
>
> Could you reproduce this with v4.17? This says there are ~45,000 extra patches,
> and un-committed changes. None of the hashes so far have been commits in
> mainline, so we have no idea what this tree is.
>
I have tried v4.17; the log is below, and it can also be found in the first mail
of this thread.
[ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun 15 21:39:52 CST 2018
I will try v4.17.2 and v4.18-rc1.
>
>> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
>> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
>> 23:59:05 CST 2018
>
>> [ 0.000000] CPU0: using LPI pending table @0x000000007d860000
>> [ 0.000000] GIC: PPI11 is secure or misconfigured
>> [ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level
>> low
>> [ 0.000000] arch_timer: WARNING: Please fix your firmware
>> [ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
>
> (No idea what these mean, but I doubt they are relevant)
>
I will try with mainline QEMU 2.12.0.
Thanks!
Best Regards,
Wei
>
>> [ 0.042421] Insufficient stack space to handle exception!
>> [ 0.042423] ESR: 0x96000046 -- DABT (current EL)
>> [ 0.043730] FAR: 0xffff0000093a80e0
>> [ 0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
>
> This was a level 2 translation fault on a write, to an address that is within
> the stack....
>
>
>> [ 0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
>> [ 0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>> [ 0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>> 4.17.0-45865-g2b31fe7-dirty #10
>> [ 0.072201] Hardware name: linux,dummy-virt (DT)
>
>> [ 0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>> [ 0.081727] pc : el1_sync+0x0/0xb0
>
> ... from the vectors.
>
>
>> [ 0.085217] lr : kpti_install_ng_mappings+0x120/0x214
>
> What I think is happening is: we come out of the kpti idmap with the stack
> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
> well when it tries to push the registers to the stack, and we keep going until
> we overflow the stack.
>
> I can't reproduce this with kvmtool or qemu in the model.
>
>
> Thanks,
>
> James
>
> .
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-21 9:18 ` Will Deacon
@ 2018-06-21 10:14 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-21 10:14 UTC (permalink / raw)
To: Will Deacon, James Morse
Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
marc.zyngier, linux-arm-kernel, linux-kernel, Linuxarm,
Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi Will,
On 2018/6/21 10:18, Will Deacon wrote:
> On Thu, Jun 21, 2018 at 09:38:53AM +0100, James Morse wrote:
>> On 20/06/18 17:25, Wei Xu wrote:
>>> [ 0.042421] Insufficient stack space to handle exception!
>>> [ 0.042423] ESR: 0x96000046 -- DABT (current EL)
>>> [ 0.043730] FAR: 0xffff0000093a80e0
>>> [ 0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
>>
>> This was a level 2 translation fault on a write, to an address that is within
>> the stack....
>>
>>
>>> [ 0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
>>> [ 0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>>> [ 0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>>> 4.17.0-45865-g2b31fe7-dirty #10
>>> [ 0.072201] Hardware name: linux,dummy-virt (DT)
>>
>>> [ 0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>>> [ 0.081727] pc : el1_sync+0x0/0xb0
>>
>> ... from the vectors.
>>
>>
>>> [ 0.085217] lr : kpti_install_ng_mappings+0x120/0x214
>>
>> What I think is happening is: we come out of the kpti idmap with the stack
>> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
>> well when it tries to push the registers to the stack, and we keep going until
>> we overflow the stack.
>>
>> I can't reproduce this with kvmtool or qemu in the model.
>
> Hmm, one thing that occurs to me is that the kpti_install_ng_mappings()
> code leaves the nG bit set in table entries, which is actually IGNORED in
> the architecture.
>
> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> otherwise your kernel will take an age to boot.
Yes, amazing! This patch resolves the issue.
I have tested it 50 times and cannot reproduce the issue any more.
Could you please explain why this patch works?
Thanks!
Best Regards,
Wei
>
> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..70d9e98467ca 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -272,8 +272,8 @@ ENTRY(idmap_kpti_install_ng_mappings)
> add end_pgdp, cur_pgdp, #(PTRS_PER_PGD * 8)
> do_pgd: __idmap_kpti_get_pgtable_ent pgd
> tbnz pgd, #1, walk_puds
> -next_pgd:
> __idmap_kpti_put_pgtable_ent_ng pgd
> +next_pgd:
> skip_pgd:
> add cur_pgdp, cur_pgdp, #8
> cmp cur_pgdp, end_pgdp
> @@ -302,8 +302,8 @@ walk_puds:
> add end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
> do_pud: __idmap_kpti_get_pgtable_ent pud
> tbnz pud, #1, walk_pmds
> -next_pud:
> __idmap_kpti_put_pgtable_ent_ng pud
> +next_pud:
> skip_pud:
> add cur_pudp, cur_pudp, 8
> cmp cur_pudp, end_pudp
> @@ -323,8 +323,8 @@ walk_pmds:
> add end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
> do_pmd: __idmap_kpti_get_pgtable_ent pmd
> tbnz pmd, #1, walk_ptes
> -next_pmd:
> __idmap_kpti_put_pgtable_ent_ng pmd
> +next_pmd:
> skip_pmd:
> add cur_pmdp, cur_pmdp, #8
> cmp cur_pmdp, end_pmdp
>
> .
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-21 10:14 ` Wei Xu
@ 2018-06-21 10:54 ` Will Deacon
-1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-21 10:54 UTC (permalink / raw)
To: Wei Xu
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian
Hi Wei,
On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
> On 2018/6/21 10:18, Will Deacon wrote:
> > On Thu, Jun 21, 2018 at 09:38:53AM +0100, James Morse wrote:
> >> On 20/06/18 17:25, Wei Xu wrote:
> >>> [ 0.042421] Insufficient stack space to handle exception!
> >>> [ 0.042423] ESR: 0x96000046 -- DABT (current EL)
> >>> [ 0.043730] FAR: 0xffff0000093a80e0
> >>> [ 0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
> >>
> >> This was a level 2 translation fault on a write, to an address that is within
> >> the stack....
> >>
> >>
> >>> [ 0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
> >>> [ 0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
> >>> [ 0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-g2b31fe7-dirty #10
> >>> [ 0.072201] Hardware name: linux,dummy-virt (DT)
> >>
> >>> [ 0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
> >>> [ 0.081727] pc : el1_sync+0x0/0xb0
> >>
> >> ... from the vectors.
> >>
> >>
> >>> [ 0.085217] lr : kpti_install_ng_mappings+0x120/0x214
> >>
> >> What I think is happening is: we come out of the kpti idmap with the stack
> >> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
> >> well when it tries to push the registers to the stack, and we keep going until
> >> we overflow the stack.
> >>
> >> I can't reproduce this with kvmtool or qemu in the model.
> >
> > Hmm, one thing that occurs to me is that the kpti_install_ng_mappings()
> > code leaves the nG bit set in table entries, which is actually IGNORED in
> > the architecture.
> >
> > Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> > otherwise your kernel will take an age to boot.
>
> Yes, amazing! This patch resolved the issue.
Great...
> I have tested 50 times and can not reproduce the issue any more.
> Could you please tell more why this patch works?
You might need to ask your CPU design team ;)
Without this patch, the code in idmap_kpti_install_ng_mappings() sets
bit 11 in table descriptors so that we can keep track of which parts of
the page table we've visited. With this patch, we don't bother tracking
and potentially rewalk parts of the page table (which takes a very long
time if KASAN is enabled).
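To make that trade-off concrete, here is a minimal Python model of the two walking strategies. It is not the proc.S code. Tables are lists of integer descriptors keyed by made-up physical addresses, and the bit positions follow the thread (bit 0 valid, bit 1 table, bit 11 nG / visited mark). `walk`, `run` and the table layout are hypothetical. With `mark_tables=True` the walker writes bit 11 into table descriptors, as the unpatched code does. With `mark_tables=False` (the effect of moving the `next_*` labels) table descriptors are never written, so a shared table is descended into again, but its leaves already carry nG and are skipped.

```python
# Toy model of the walk in idmap_kpti_install_ng_mappings(): set nG (bit 11)
# on every valid leaf. Table addresses and contents below are invented.

VALID = 1 << 0                  # bit 0: descriptor is valid
TABLE = 1 << 1                  # bit 1: table descriptor (non-last levels)
NG = 1 << 11                    # nG on leaves; "visited" mark on tables
ADDR_MASK = 0x0000FFFFFFFFF000  # next-level address (4K granule, 48-bit PA)

ROOT, MID, LEAF = 0x1000, 0x2000, 0x3000
LAST_LEVEL = 2                  # three levels: 0 (root), 1 (mid), 2 (leaf)


def build_tables():
    # Both root entries point at the same MID table, so MID is shared.
    return {
        ROOT: [MID | TABLE | VALID, MID | TABLE | VALID, 0],
        MID: [LEAF | TABLE | VALID, 0x40000000 | VALID],         # table + block
        LEAF: [0x5000 | TABLE | VALID, 0x6000 | TABLE | VALID],  # two pages
    }


def walk(tables, base, level, mark_tables, stats):
    entries = tables[base]
    for i, desc in enumerate(entries):
        # Like __idmap_kpti_get_pgtable_ent: skip invalid entries and
        # entries that already have bit 11 set.
        if not desc & VALID or desc & NG:
            continue
        if level < LAST_LEVEL and desc & TABLE:
            stats["tables_walked"] += 1
            walk(tables, desc & ADDR_MASK, level + 1, mark_tables, stats)
            if mark_tables:             # unpatched code: mark table as visited
                entries[i] = desc | NG
                stats["table_writes"] += 1
        else:                           # block or page: make it non-global
            entries[i] = desc | NG
            stats["leaf_writes"] += 1


def run(mark_tables):
    tables = build_tables()
    stats = {"tables_walked": 0, "table_writes": 0, "leaf_writes": 0}
    walk(tables, ROOT, 0, mark_tables, stats)
    return tables, stats


if __name__ == "__main__":
    print("mark tables:", run(True)[1])
    print("no marking: ", run(False)[1])
```

In this model the marking walk follows 3 table descriptors and writes 3 of them, while the unmarked walk follows 4 and writes none. Both runs end with nG set on all three leaves. With many shared tables the extra descents add up, which fits the warning above about boot time with KASAN.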
The architecture documents I've looked at are clear that bit 11 is IGNORED
by the CPU, which:
"Indicates that the architecture guarantees that the bit or field is not
interpreted or modified by hardware."
Please can you double-check that your CPU is indeed ignoring bit 11 in
non-leaf (table) descriptors?
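For reference, the checks the walker makes on each descriptor (`tbz` on bit 0 for validity, `tbnz` on bit 1 for a table at levels 0 to 2) can be modelled as below. This is a Python sketch assuming a 4K granule and 48-bit physical addresses. `kind` and `next_table` are hypothetical helpers. The sample values are the `pgd`, `pud` and `pmd` entries printed in the oops later in this thread. Setting bit 11 on a table descriptor changes neither its type nor its next-level address, which is what IGNORED promises.

```python
# Sketch: the descriptor checks made by the walker at levels 0-2, assuming a
# 4K granule and 48-bit physical addresses. Bit 11 is nG in block
# descriptors but IGNORED in table descriptors, so it must not change the
# type or the next-level address decoded from a table descriptor.

VALID = 1 << 0                  # tested with `tbz ..., #0`
TABLE = 1 << 1                  # tested with `tbnz ..., #1`
BIT11 = 1 << 11
ADDR_MASK = 0x0000FFFFFFFFF000  # bits [47:12]


def kind(desc):
    """Classify a level 0-2 descriptor as invalid, table or block.

    With a 4K granule, blocks are only valid at levels 1 and 2; this sketch
    does not model that restriction.
    """
    if not desc & VALID:
        return "invalid"
    return "table" if desc & TABLE else "block"


def next_table(desc):
    """Physical address of the next-level table for a table descriptor."""
    if kind(desc) != "table":
        raise ValueError("not a table descriptor: %#x" % desc)
    return desc & ADDR_MASK


if __name__ == "__main__":
    # pgd/pud/pmd values from the oops in this thread.
    for name, desc in (("pgd", 0x411F8003), ("pud", 0x411F9003), ("pmd", 0x0)):
        print(name, kind(desc))
    print(hex(next_table(0x411F8003 | BIT11)))  # bit 11 does not move the table
```

The PMD entry covering the faulting stack address is zero, so the walk stops at level 2 with a translation fault. That matches the DFSC in both ESR values reported in this thread.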
Thanks,
Will
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-21 10:54 ` Will Deacon
@ 2018-06-22 8:33 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 8:33 UTC (permalink / raw)
To: Will Deacon
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi Will,
On 2018/6/21 11:54, Will Deacon wrote:
> Hi Wei,
>
> On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
>> On 2018/6/21 10:18, Will Deacon wrote:
>>> On Thu, Jun 21, 2018 at 09:38:53AM +0100, James Morse wrote:
>>>> On 20/06/18 17:25, Wei Xu wrote:
>>>>> [ 0.042421] Insufficient stack space to handle exception!
>>>>> [ 0.042423] ESR: 0x96000046 -- DABT (current EL)
>>>>> [ 0.043730] FAR: 0xffff0000093a80e0
>>>>> [ 0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
>>>>
>>>> This was a level 2 translation fault on a write, to an address that is within
>>>> the stack....
>>>>
>>>>
>>>>> [ 0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
>>>>> [ 0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>>>>> [ 0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-g2b31fe7-dirty #10
>>>>> [ 0.072201] Hardware name: linux,dummy-virt (DT)
>>>>
>>>>> [ 0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>>>>> [ 0.081727] pc : el1_sync+0x0/0xb0
>>>>
>>>> ... from the vectors.
>>>>
>>>>
>>>>> [ 0.085217] lr : kpti_install_ng_mappings+0x120/0x214
>>>>
>>>> What I think is happening is: we come out of the kpti idmap with the stack
>>>> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
>>>> well when it tries to push the registers to the stack, and we keep going until
>>>> we overflow the stack.
>>>>
>>>> I can't reproduce this with kvmtool or qemu in the model.
>>>
>>> Hmm, one thing that occurs to me is that the kpti_install_ng_mappings()
>>> code leaves the nG bit set in table entries, which is actually IGNORED in
>>> the architecture.
>>>
>>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
>>> otherwise your kernel will take an age to boot.
>>
>> Yes, amazing! This patch resolved the issue.
>
> Great...
>
>> I have tested 50 times and can not reproduce the issue any more.
>> Could you please tell more why this patch works?
>
> You might need to ask your CPU design team ;)
>
> Without this patch, the code in idmap_kpti_install_ng_mappings() sets
> bit 11 in table descriptors so that we can keep track of which parts of
> the page table we've visited. With this patch, we don't bother tracking
> and potentially rewalk parts of the page table (which takes a very long
> time if KASAN is enabled).
Got it. Thanks!
>
> The architecture documents I've looked at are clear that bit 11 is IGNORED
> by the CPU, which:
>
> "Indicates that the architecture guarantees that the bit or field is not
> interpreted or modified by hardware."
>
> Please can you double-check that your CPU is indeed ignoring bit 11 in
> non-leaf (table) descriptors?
By non-leaf (table) descriptors, do you mean the table descriptors
in section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
of the ARM Architecture Reference Manual ARMv8 for ARMv8-A (DDI0487C_a_armv8_arm.pdf)?
If so, our hardware does ignore it (it neither interprets nor modifies the bit).
Is there any other possible cause for this?
Thanks!
Best Regards,
Wei
>
> Thanks,
>
> Will
>
> .
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 8:33 ` Wei Xu
@ 2018-06-22 9:23 ` Will Deacon
-1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22 9:23 UTC (permalink / raw)
To: Wei Xu
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi Wei,
On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
> On 2018/6/21 11:54, Will Deacon wrote:
> > On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
> >> On 2018/6/21 10:18, Will Deacon wrote:
> >>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> >>> otherwise your kernel will take an age to boot.
> >>
> >> Yes, amazing! This patch resolved the issue.
> >
> > Great...
> >
> >> I have tested 50 times and can not reproduce the issue any more.
> >> Could you please tell more why this patch works?
> >
> > You might need to ask your CPU design team ;)
> >
> > Without this patch, the code in idmap_kpti_install_ng_mappings() sets
> > bit 11 in table descriptors so that we can keep track of which parts of
> > the page table we've visited. With this patch, we don't bother tracking
> > and potentially rewalk parts of the page table (which takes a very long
> > time if KASAN is enabled).
>
> Got it. Thanks!
>
> >
> > The architecture documents I've looked at are clear that bit 11 is IGNORED
> > by the CPU, which:
> >
> > "Indicates that the architecture guarantees that the bit or field is not
> > interpreted or modified by hardware."
> >
> > Please can you double-check that your CPU is indeed ignoring bit 11 in
> > non-leaf (table) descriptors?
>
> Do the non-leaf(table) descriptors mean the table descriptors
> of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
> in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
>
> If yes, our hardware does ignore it(not interpret or modify).
Ok, thanks for checking.
> Is there any other possible reason cause this?
Perhaps just writing back the table entries is enough to cause the issue,
although I really can't understand why that would be the case. Can you try
the diff below (without my previous change), please?
Will
--->8
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..e2a8e88f95a0 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -216,7 +216,7 @@ ENDPROC(idmap_cpu_replace_ttbr1)
.endm
.macro __idmap_kpti_put_pgtable_ent_ng, type
- orr \type, \type, #PTE_NG // Same bit for blocks and pages
+ eor \type, \type, #PTE_NG // Same bit for blocks and pages
str \type, [cur_\()\type\()p] // Update the entry and ensure it
dc civac, cur_\()\type\()p // is visible to all CPUs.
.endm
@@ -298,6 +298,7 @@ skip_pgd:
/* PUD */
walk_puds:
.if CONFIG_PGTABLE_LEVELS > 3
+ eor pgd, pgd, #PTE_NG
pte_to_phys cur_pudp, pgd
add end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
do_pud: __idmap_kpti_get_pgtable_ent pud
@@ -319,6 +320,7 @@ next_pud:
/* PMD */
walk_pmds:
.if CONFIG_PGTABLE_LEVELS > 2
+ eor pud, pud, #PTE_NG
pte_to_phys cur_pmdp, pud
add end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
do_pmd: __idmap_kpti_get_pgtable_ent pmd
@@ -339,6 +341,7 @@ next_pmd:
/* PTE */
walk_ptes:
+ eor pmd, pmd, #PTE_NG
pte_to_phys cur_ptep, pmd
add end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
do_pte: __idmap_kpti_get_pgtable_ent pte
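The diff above relies on `eor` being its own inverse. Below is a minimal Python model of what gets stored. It assumes the register holds the value just loaded, with bit 11 clear, since entries with bit 11 already set are skipped by `__idmap_kpti_get_pgtable_ent`. For a table descriptor, `walk_*` flips bit 11 before descending and the patched `__idmap_kpti_put_pgtable_ent_ng` flips it back, so the `str` writes the original value. Leaf entries only pass through the macro once, so for them `eor` behaves like the old `orr`. The function names are hypothetical.

```python
PTE_NG = 1 << 11  # bit 11, as used by the walker


def table_entry_written_back(desc):
    """Value stored for a table descriptor under the second diff."""
    reg = desc ^ PTE_NG   # added `eor` at walk_puds/walk_pmds/walk_ptes
    reg = reg ^ PTE_NG    # `eor` in __idmap_kpti_put_pgtable_ent_ng
    return reg


def leaf_entry_written_back(desc):
    """Value stored for a block or page descriptor (macro only)."""
    return desc ^ PTE_NG


def old_orr_written_back(desc):
    """Value stored by the original `orr` version of the macro."""
    return desc | PTE_NG


if __name__ == "__main__":
    pgd = 0x411F8003  # table descriptor value taken from the oops
    print(hex(table_entry_written_back(pgd)), hex(old_orr_written_back(pgd)))
```

If the problem persists with this diff, the write-back of the table entry itself, rather than the value of bit 11, becomes the more likely trigger, which is the hypothesis being tested here.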
^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 9:23 ` Will Deacon
@ 2018-06-22 10:45 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 10:45 UTC (permalink / raw)
To: Will Deacon
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi Will,
On 2018/6/22 17:23, Will Deacon wrote:
> Hi Wei,
>
> On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
>> On 2018/6/21 11:54, Will Deacon wrote:
>>> On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
>>>> On 2018/6/21 10:18, Will Deacon wrote:
>>>>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
>>>>> otherwise your kernel will take an age to boot.
>>>> Yes, amazing! This patch resolved the issue.
>>> Great...
>>>
>>>> I have tested 50 times and can not reproduce the issue any more.
>>>> Could you please tell more why this patch works?
>>> You might need to ask your CPU design team ;)
>>>
>>> Without this patch, the code in idmap_kpti_install_ng_mappings() sets
>>> bit 11 in table descriptors so that we can keep track of which parts of
>>> the page table we've visited. With this patch, we don't bother tracking
>>> and potentially rewalk parts of the page table (which takes a very long
>>> time if KASAN is enabled).
>> Got it. Thanks!
>>
>>> The architecture documents I've looked at are clear that bit 11 is IGNORED
>>> by the CPU, which:
>>>
>>> "Indicates that the architecture guarantees that the bit or field is not
>>> interpreted or modified by hardware."
>>>
>>> Please can you double-check that your CPU is indeed ignoring bit 11 in
>>> non-leaf (table) descriptors?
>> Do the non-leaf(table) descriptors mean the table descriptors
>> of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
>> in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
>>
>> If yes, our hardware does ignore it(not interpret or modify).
> Ok, thanks for checking.
>
>> Is there any other possible reason cause this?
> Perhaps just writing back the table entries is enough to cause the issue,
> although I really can't understand why that would be the case. Can you try
> the diff below (without my previous change), please?
Thanks!
But it does not resolve the issue (I applied only this patch, on top of 4.17.0).
The log is below:
estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
    -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.17-joyx \
    -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
    -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.17.0-45865-gc58dc48 (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #14 SMP PREEMPT Fri Jun 22 18:26:01 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007d850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007d860000
[ 0.000000] GIC: PPI11 is secure or misconfigured
[ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
[ 0.000000] arch_timer: WARNING: Please fix your firmware
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000002] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000844] Console: colour dummy device 80x25
[ 0.001406] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002458] pid_max: default: 32768 minimum: 301
[ 0.002944] Security Framework initialized
[ 0.003521] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.004322] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.005022] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.005797] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.025904] ASID allocator initialised with 32768 entries
[ 0.029913] Hierarchical SRCU implementation.
[ 0.034285] Platform MSI: its domain created
[ 0.034740] PCI/MSI: /intc/its domain created
[ 0.035318] EFI services will not be available.
[ 0.037943] smp: Bringing up secondary CPUs ...
[ 0.038410] smp: Brought up 1 node, 1 CPU
[ 0.038815] SMP: Total of 1 processors activated.
[ 0.039300] CPU features: detected: GIC system register CPU interface
[ 0.039946] CPU features: detected: Privileged Access Never
[ 0.040506] CPU features: detected: User Access Override
[ 0.042439] Insufficient stack space to handle exception!
[ 0.042441] ESR: 0x96000046 -- DABT (current EL)
[ 0.043752] FAR: 0xffff0000093a80e0
[ 0.044207] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
[ 0.046511] IRQ stack: [0xffff000008000000..0xffff000008004000]
[ 0.052899] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
[ 0.059396] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-gc58dc48 #14
[ 0.067018] Hardware name: linux,dummy-virt (DT)
[ 0.071710] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
[ 0.076532] pc : el1_sync+0x0/0xb0
[ 0.080028] lr : kpti_install_ng_mappings+0x120/0x214
[ 0.085197] sp : ffff0000093a80e0
[ 0.088566] x29: ffff0000093abce0 x28: ffff000008ea9000
[ 0.093979] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.099293] x25: ffff00000906d000 x24: ffff000009191000
[ 0.104706] x23: ffff000008ea9000 x22: 0000000041190000
[ 0.110015] x21: ffff0000091f7000 x20: 0000000000000000
[ 0.115428] x19: ffff000009190000 x18: 000000003455d99d
[ 0.120842] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.126255] x15: 000000007eff6000 x14: 000000007eff6000
[ 0.131566] x13: 00f800007fe00f11 x12: 000000007eff8000
[ 0.136983] x11: 000000007eff8000 x10: 0000000000000000
[ 0.142396] x9 : 000000007eff9000 x8 : 000000007eff9000
[ 0.147704] x7 : 0000000000000000 x6 : 00000000411f8000
[ 0.153116] x5 : 00000000411f8000 x4 : 0000000040a443d4
[ 0.158530] x3 : 00000000411f7000 x2 : 00000000411f7000
[ 0.163943] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
[ 0.169251] Kernel panic - not syncing: kernel stack overflow
[ 0.175140] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-gc58dc48 #14
[ 0.182732] Hardware name: linux,dummy-virt (DT)
[ 0.187424] Call trace:
[ 0.189948] dump_backtrace+0x0/0x180
[ 0.193678] show_stack+0x14/0x1c
[ 0.197051] dump_stack+0x90/0xb0
[ 0.200423] panic+0x138/0x2a0
[ 0.203549] __stack_chk_fail+0x0/0x18
[ 0.207398] handle_bad_stack+0x118/0x124
[ 0.211489] __bad_stack+0x88/0x8c
[ 0.214870] el1_sync+0x0/0xb0
[ 0.217998] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.226061] Mem abort info:
[ 0.228839] ESR = 0x96000006
[ 0.231965] Exception class = DABT (current EL), IL = 32 bits
[ 0.237980] SET = 0, FnV = 0
[ 0.241105] EA = 0, S1PTW = 0
[ 0.244346] Data abort info:
[ 0.247239] ISV = 0, ISS = 0x00000006
[ 0.251199] CM = 0, WnR = 0
[ 0.254209] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.261191] [ffff0000093abce0] pgd=00000000411f8003, pud=00000000411f9003, pmd=0000000000000000
[ 0.269982] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 0.275538] Modules linked in:
[ 0.278664] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-gc58dc48 #14
[ 0.286361] Hardware name: linux,dummy-virt (DT)
[ 0.291053] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
[ 0.295874] pc : unwind_frame+0x28/0xc8
[ 0.299836] lr : dump_backtrace+0x12c/0x180
[ 0.304055] sp : ffff80003efcf000
[ 0.307429] x29: ffff80003efcf000 x28: ffff80003da61c00
[ 0.312841] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.318255] x25: ffff00000906d000 x24: ffff0000093a80e0
[ 0.323563] x23: 0000000000000000 x22: ffff000008dbada0
[ 0.328975] x21: 0000000000000000 x20: ffff000009049000
[ 0.334388] x19: ffff80003da61c00 x18: 000000003455d99d
[ 0.339698] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.345111] x15: 000000007eff6000 x14: 3431232038346364
[ 0.350523] x13: 0000000000000000 x12: cc26f77952f87e00
[ 0.355832] x11: ffffffffffffffff x10: 0000000000000075
[ 0.361245] x9 : ffff0000085ae9e8 x8 : 78302f3078302b63
[ 0.366666] x7 : 6e79735f316c6520 x6 : ffff0000091befe1
[ 0.371976] x5 : 0000000000000000 x4 : ffff0000093ac000
[ 0.377389] x3 : ffff0000093a8000 x2 : ffff0000093abce0
[ 0.382801] x1 : ffff80003efcf048 x0 : ffff80003da61c00
[ 0.388214] Process migration/0 (pid: 12, stack limit = 0x (ptrval))
[ 0.395204] Call trace:
[ 0.397726] unwind_frame+0x28/0xc8
[ 0.401224] show_stack+0x14/0x1c
[ 0.404699] dump_stack+0x90/0xb0
[ 0.408070] panic+0x138/0x2a0
[ 0.411198] __stack_chk_fail+0x0/0x18
[ 0.414944] handle_bad_stack+0x118/0x124
[ 0.419035] __bad_stack+0x88/0x8c
[ 0.422520] el1_sync+0x0/0xb0
[ 0.425648] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.433601] Mem abort info:
[ 0.436486] ESR = 0x96000006
[ 0.439611] Exception class = DABT (current EL), IL = 32 bits
[ 0.445626] SET = 0, FnV = 0
[ 0.448754] EA = 0, S1PTW = 0
[ 0.451995] Data abort info:
[ 0.454888] ISV = 0, ISS = 0x00000006
[ 0.458849] CM = 0, WnR = 0
[ 0.461860] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.468843] [ffff0000093abce0] pgd=00000000411f8003, pud=00000000411f9003, pmd=0000000000000000
> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..e2a8e88f95a0 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -216,7 +216,7 @@ ENDPROC(idmap_cpu_replace_ttbr1)
> .endm
>
> .macro __idmap_kpti_put_pgtable_ent_ng, type
> - orr \type, \type, #PTE_NG // Same bit for blocks and pages
> + eor \type, \type, #PTE_NG // Same bit for blocks and pages
> str \type, [cur_\()\type\()p] // Update the entry and ensure it
> dc civac, cur_\()\type\()p // is visible to all CPUs.
> .endm
> @@ -298,6 +298,7 @@ skip_pgd:
> /* PUD */
> walk_puds:
> .if CONFIG_PGTABLE_LEVELS > 3
> + eor pgd, pgd, #PTE_NG
> pte_to_phys cur_pudp, pgd
> add end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
> do_pud: __idmap_kpti_get_pgtable_ent pud
> @@ -319,6 +320,7 @@ next_pud:
> /* PMD */
> walk_pmds:
> .if CONFIG_PGTABLE_LEVELS > 2
> + eor pud, pud, #PTE_NG
> pte_to_phys cur_pmdp, pud
> add end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
> do_pmd: __idmap_kpti_get_pgtable_ent pmd
> @@ -339,6 +341,7 @@ next_pmd:
>
> /* PTE */
> walk_ptes:
> + eor pmd, pmd, #PTE_NG
> pte_to_phys cur_ptep, pmd
> add end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
> do_pte: __idmap_kpti_get_pgtable_ent pte
>
> .
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 10:45 ` Wei Xu
@ 2018-06-22 11:16 ` Will Deacon
-1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22 11:16 UTC (permalink / raw)
To: Wei Xu
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi Wei,
Thanks for giving that a spin.
On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> On 2018/6/22 17:23, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
> >>On 2018/6/21 11:54, Will Deacon wrote:
> >>>On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
> >>>>On 2018/6/21 10:18, Will Deacon wrote:
> >>>>>Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> >>>>>otherwise your kernel will take an age to boot.
> >>>>Yes, amazing! This patch resolved the issue.
> >>>Great...
> >>>
> >>>>I have tested 50 times and can not reproduce the issue any more.
> >>>>Could you please tell more why this patch works?
> >>>You might need to ask your CPU design team ;)
> >>>
> >>>Without this patch, the code in idmap_kpti_install_ng_mappings() sets
> >>>bit 11 in table descriptors so that we can keep track of which parts of
> >>>the page table we've visited. With this patch, we don't bother tracking
> >>>and potentially rewalk parts of the page table (which takes a very long
> >>>time if KASAN is enabled).
> >>Got it. Thanks!
> >>
> >>>The architecture documents I've looked at are clear that bit 11 is IGNORED
> >>>by the CPU, which:
> >>>
> >>> "Indicates that the architecture guarantees that the bit or field is not
> >>> interpreted or modified by hardware."
> >>>
> >>>Please can you double-check that your CPU is indeed ignoring bit 11 in
> >>>non-leaf (table) descriptors?
> >>Do the non-leaf(table) descriptors mean the table descriptors
> >>of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
> >>in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
> >>
> >>If yes, our hardware does ignore it(not interpret or modify).
> >Ok, thanks for checking.
> >
> >>Is there any other possible reason cause this?
> >Perhaps just writing back the table entries is enough to cause the issue,
> >although I really can't understand why that would be the case. Can you try
> >the diff below (without my previous change), please?
>
> Thanks!
> But it does not resolve the issue(only apply this patch based on 4.17.0).
Thanks, that's a useful data point. It means that it still crashes even if
we write back the same table entries, so it's the fact that we're writing
them at all which causes the problem, not the value that we write.
Whilst looking at the code, we noticed a missing DMB. On the off-chance
that it helps, can you try this instead please?
Will
--->8
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..03646e6a2ef4 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -217,8 +217,9 @@ ENDPROC(idmap_cpu_replace_ttbr1)
 
 .macro __idmap_kpti_put_pgtable_ent_ng, type
 orr \type, \type, #PTE_NG // Same bit for blocks and pages
- str \type, [cur_\()\type\()p] // Update the entry and ensure it
- dc civac, cur_\()\type\()p // is visible to all CPUs.
+ str \type, [cur_\()\type\()p] // Update the entry and ensure
+ dmb sy // that it is visible to all
+ dc civac, cur_\()\type\()p // CPUs.
 .endm
 
 /*
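The following sketch illustrates Will's description of the bit 11 tracking (quoted above) and why the eor diff still writes to every table descriptor it visits. It is a hedged toy model in Python, not the kernel's code: the real walk is arm64 assembly in arch/arm64/mm/proc.S, and the three-level layout, the walk()/make_tables()/run() helpers and the load/store counters are all hypothetical. It assumes the get macro skips invalid entries and entries that already have bit 11 set, and that the put macro always stores the entry back.

```python
# Toy model of the table walk in idmap_kpti_install_ng_mappings().
# Hypothetical simplifications: descriptors are plain ints; `tables` maps a
# made-up physical address to that table's list of entries; bit 0 = valid,
# bit 1 = table (below the last level); bit 11 = nG in leaf entries and a
# software "visited" mark in table entries, where it is IGNORED by hardware.
VALID = 1 << 0
TABLE = 1 << 1
PTE_NG = 1 << 11
ADDR_MASK = ((1 << 48) - 1) & ~((1 << 12) - 1)  # next-level address, bits [47:12]


def walk(tables, phys, level, last_level, use_eor, stats):
    entries = tables[phys]
    for i, ent in enumerate(entries):
        stats["loads"] += 1
        # Assumed get-macro behaviour: skip invalid entries and entries
        # that already have bit 11 set.
        if not ent & VALID or ent & PTE_NG:
            continue
        if level < last_level and ent & TABLE:
            if use_eor:
                ent ^= PTE_NG  # eor diff: flip bit 11 in the register only
            walk(tables, ent & ADDR_MASK, level + 1, last_level, use_eor, stats)
        # Assumed put-macro behaviour: the entry is always stored back.
        # With orr, a table entry keeps bit 11 as a visited mark; with eor,
        # the second flip restores the table entry's original value.
        entries[i] = ent ^ PTE_NG if use_eor else ent | PTE_NG
        stats["stores"] += 1


def make_tables():
    # Hypothetical layout: both root entries point at the same level-1 table.
    return {
        0x1000: [0x2000 | TABLE | VALID, 0x2000 | TABLE | VALID],
        0x2000: [0x3000 | TABLE | VALID, 0x40000000 | VALID],  # table, block
        0x3000: [0x5000 | TABLE | VALID, 0x6000 | TABLE | VALID, 0],  # 2 pages, 1 invalid
    }


def run(use_eor):
    tables = make_tables()
    stats = {"loads": 0, "stores": 0}
    walk(tables, 0x1000, level=0, last_level=2, use_eor=use_eor, stats=stats)
    return tables, stats


orr_tables, orr_stats = run(use_eor=False)
eor_tables, eor_stats = run(use_eor=True)
print(orr_stats)  # {'loads': 9, 'stores': 6}
print(eor_stats)  # {'loads': 12, 'stores': 7}
print(eor_tables[0x1000] == make_tables()[0x1000])  # True: table entries unchanged
```

With orr, the second visit to the shared level-1 table finds both of its entries already marked and descends no further. With eor, the table descriptors end up with their original values, yet every visited table descriptor is still stored to, which is consistent with Will's reading that the stores themselves, not the value written, are what matters here.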
^ permalink raw reply related [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 11:16 ` Will Deacon
@ 2018-06-22 13:18 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 13:18 UTC (permalink / raw)
To: Will Deacon
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi Will,
On 2018/6/22 19:16, Will Deacon wrote:
> Hi Wei,
>
> Thanks for giving that a spin.
>
> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>> On 2018/6/22 17:23, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
>>>> On 2018/6/21 11:54, Will Deacon wrote:
>>>>> On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
>>>>>> On 2018/6/21 10:18, Will Deacon wrote:
>>>>>>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
>>>>>>> otherwise your kernel will take an age to boot.
>>>>>> Yes, amazing! This patch resolved the issue.
>>>>> Great...
>>>>>
>>>>>> I have tested 50 times and can not reproduce the issue any more.
>>>>>> Could you please tell more why this patch works?
>>>>> You might need to ask your CPU design team ;)
>>>>>
>>>>> Without this patch, the code in idmap_kpti_install_ng_mappings() sets
>>>>> bit 11 in table descriptors so that we can keep track of which parts of
>>>>> the page table we've visited. With this patch, we don't bother tracking
>>>>> and potentially rewalk parts of the page table (which takes a very long
>>>>> time if KASAN is enabled).
>>>> Got it. Thanks!
>>>>
>>>>> The architecture documents I've looked at are clear that bit 11 is IGNORED
>>>>> by the CPU, which:
>>>>>
>>>>> "Indicates that the architecture guarantees that the bit or field is not
>>>>> interpreted or modified by hardware."
>>>>>
>>>>> Please can you double-check that your CPU is indeed ignoring bit 11 in
>>>>> non-leaf (table) descriptors?
>>>> Do the non-leaf(table) descriptors mean the table descriptors
>>>> of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
>>>> in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
>>>>
>>>> If yes, our hardware does ignore it(not interpret or modify).
>>> Ok, thanks for checking.
>>>
>>>> Is there any other possible reason cause this?
>>> Perhaps just writing back the table entries is enough to cause the issue,
>>> although I really can't understand why that would be the case. Can you try
>>> the diff below (without my previous change), please?
>> Thanks!
>> But it does not resolve the issue(only apply this patch based on 4.17.0).
> Thanks, that's a useful data point. It means that it still crashes even if
> we write back the same table entries, so it's the fact that we're writing
> them at all which causes the problem, not the value that we write.
>
> Whilst looking at the code, we noticed a missing DMB. On the off-chance
> that it helps, can you try this instead please?
Thanks!
With only the patch below applied on top of 4.17.0, we still get the crash.
The log is below; it is nearly the same as before.
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #16 SMP PREEMPT Fri Jun 22 21:05:10 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007d850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007d860000
[ 0.000000] GIC: PPI11 is secure or misconfigured
[ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
[ 0.000000] arch_timer: WARNING: Please fix your firmware
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000849] Console: colour dummy device 80x25
[ 0.001427] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002485] pid_max: default: 32768 minimum: 301
[ 0.002966] Security Framework initialized
[ 0.003549] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.004353] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.005068] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.005858] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.025962] ASID allocator initialised with 32768 entries
[ 0.029972] Hierarchical SRCU implementation.
[ 0.034341] Platform MSI: its domain created
[ 0.034793] PCI/MSI: /intc/its domain created
[ 0.035360] EFI services will not be available.
[ 0.038002] smp: Bringing up secondary CPUs ...
[ 0.038472] smp: Brought up 1 node, 1 CPU
[ 0.038878] SMP: Total of 1 processors activated.
[ 0.039354] CPU features: detected: GIC system register CPU interface
[ 0.040004] CPU features: detected: Privileged Access Never
[ 0.040566] CPU features: detected: User Access Override
[ 0.042462] Insufficient stack space to handle exception!
[ 0.042464] ESR: 0x96000046 -- DABT (current EL)
[ 0.043781] FAR: 0xffff0000093a80e0
[ 0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
[ 0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
[ 0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
[ 0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
[ 0.067946] Hardware name: linux,dummy-virt (DT)
[ 0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
[ 0.077480] pc : el1_sync+0x0/0xb0
[ 0.080970] lr : kpti_install_ng_mappings+0x120/0x214
[ 0.086143] sp : ffff0000093a80e0
[ 0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
[ 0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.100241] x25: ffff00000906d000 x24: ffff000009191000
[ 0.105657] x23: ffff000008ea9000 x22: 0000000041190000
[ 0.111448] x21: ffff0000091f7000 x20: 0000000000000000
[ 0.116437] x19: ffff000009190000 x18: 000000003455d99d
[ 0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.127155] x15: 000000007eff6000 x14: 000000007eff6000
[ 0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
[ 0.137886] x11: 000000007eff8000 x10: 0000000000000000
[ 0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
[ 0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
[ 0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
[ 0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
[ 0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
[ 0.170179] Kernel panic - not syncing: kernel stack overflow
[ 0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
[ 0.184152] Hardware name: linux,dummy-virt (DT)
[ 0.188851] Call trace:
[ 0.191380] dump_backtrace+0x0/0x180
[ 0.195113] show_stack+0x14/0x1c
[ 0.198488] dump_stack+0x90/0xb0
[ 0.201862] panic+0x138/0x2a0
[ 0.204989] __stack_chk_fail+0x0/0x18
[ 0.208836] handle_bad_stack+0x118/0x124
[ 0.212927] __bad_stack+0x88/0x8c
[ 0.216414] el1_sync+0x0/0xb0
[ 0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.227507] Mem abort info:
[ 0.230390] ESR = 0x96000006
[ 0.233517] Exception class = DABT (current EL), IL = 32 bits
[ 0.239428] SET = 0, FnV = 0
[ 0.242555] EA = 0, S1PTW = 0
[ 0.245797] Data abort info:
[ 0.248795] ISV = 0, ISS = 0x00000006
[ 0.252652] CM = 0, WnR = 0
[ 0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
Best Regards,
Wei
> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..03646e6a2ef4 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -217,8 +217,9 @@ ENDPROC(idmap_cpu_replace_ttbr1)
>
> .macro __idmap_kpti_put_pgtable_ent_ng, type
> orr \type, \type, #PTE_NG // Same bit for blocks and pages
> - str \type, [cur_\()\type\()p] // Update the entry and ensure it
> - dc civac, cur_\()\type\()p // is visible to all CPUs.
> + str \type, [cur_\()\type\()p] // Update the entry and ensure
> + dmb sy // that it is visible to all
> + dc civac, cur_\()\type\()p // CPUs.
> .endm
>
> /*
>
> .
>
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 13:18 ` Wei Xu
@ 2018-06-22 13:31 ` Will Deacon
From: Will Deacon @ 2018-06-22 13:31 UTC
To: Wei Xu
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi again, Wei,
On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> On 2018/6/22 19:16, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> >>On 2018/6/22 17:23, Will Deacon wrote:
> >>>Perhaps just writing back the table entries is enough to cause the issue,
> >>>although I really can't understand why that would be the case. Can you try
> >>>the diff below (without my previous change), please?
> >>Thanks!
> >>But it does not resolve the issue(only apply this patch based on 4.17.0).
> >Thanks, that's a useful data point. It means that it still crashes even if
> >we write back the same table entries, so it's the fact that we're writing
> >them at all which causes the problem, not the value that we write.
> >
> >Whilst looking at the code, we noticed a missing DMB. On the off-chance
> >that it helps, can you try this instead please?
> Thanks!
> Only apply below patch based on 4.17.0, we still got the crash.
Oh well, it was worth a shot (and that's still a fix worth having). Please
can you provide the complete disassembly for kpti_install_ng_mappings()
(I'm referring to the C function in cpufeature.c) along with a corresponding
crash log so that we can correlate the instruction stream with the crash?
Thanks,
Will
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 13:31 ` Will Deacon
@ 2018-06-22 13:46 ` Wei Xu
From: Wei Xu @ 2018-06-22 13:46 UTC
To: Will Deacon
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi Will,
On 2018/6/22 21:31, Will Deacon wrote:
> Hi again, Wei,
>
> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>> On 2018/6/22 19:16, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>>>> On 2018/6/22 17:23, Will Deacon wrote:
>>>>> Perhaps just writing back the table entries is enough to cause the issue,
>>>>> although I really can't understand why that would be the case. Can you try
>>>>> the diff below (without my previous change), please?
>>>> Thanks!
>>>> But it does not resolve the issue(only apply this patch based on 4.17.0).
>>> Thanks, that's a useful data point. It means that it still crashes even if
>>> we write back the same table entries, so it's the fact that we're writing
>>> them at all which causes the problem, not the value that we write.
>>>
>>> Whilst looking at the code, we noticed a missing DMB. On the off-chance
>>> that it helps, can you try this instead please?
>> Thanks!
>> Only apply below patch based on 4.17.0, we still got the crash.
> Oh well, it was worth a shot (and that's still a fix worth having). Please
> can you provide the complete disassembly for kpti_install_ng_mappings()
> (I'm referring to the C function in cpufeature.c) along with a corresponding
> crash log so that we can correlate the instruction stream with the crash?
Just let me know if you need more information.
Thanks!
The disassembled code is below:
Dump of assembler code for function kpti_install_ng_mappings:
0xffff000008091d68 <+0>: stp x29, x30, [sp,#-112]!
0xffff000008091d6c <+4>: adrp x0, 0xffff000009022000 <bp_hardening_data>
0xffff000008091d70 <+8>: mov x29, sp
0xffff000008091d74 <+12>: stp x23, x24, [sp,#48]
0xffff000008091d78 <+16>: adrp x24, 0xffff000009191000 <reset_devices>
0xffff000008091d7c <+20>: add x0, x0, #0x10
0xffff000008091d80 <+24>: add x1, x24, #0x550
0xffff000008091d84 <+28>: stp x19, x20, [sp,#16]
0xffff000008091d88 <+32>: stp x21, x22, [sp,#32]
0xffff000008091d8c <+36>: stp x25, x26, [sp,#64]
0xffff000008091d90 <+40>: stp x27, x28, [sp,#80]
0xffff000008091d94 <+44>: mrs x2, tpidr_el1
0xffff000008091d98 <+48>: ldrb w1, [x1,#8]
0xffff000008091d9c <+52>: ldr w20, [x2,x0]
0xffff000008091da0 <+56>: cbnz w1, 0xffff000008091f18 <kpti_install_ng_mappings+432>
0xffff000008091da4 <+60>: adrp x27, 0xffff000008ea9000 <cpu_ops+384>
0xffff000008091da8 <+64>: adrp x19, 0xffff000009190000 <empty_zero_page>
0xffff000008091dac <+68>: add x19, x19, #0x0
0xffff000008091db0 <+72>: adrp x1, 0xffff000008a44000 <kimage_vaddr>
0xffff000008091db4 <+76>: mov x0, x19
0xffff000008091db8 <+80>: add x1, x1, #0x3d8
0xffff000008091dbc <+84>: ldr x2, [x27,#672]
0xffff000008091dc0 <+88>: sub x4, x1, x2
0xffff000008091dc4 <+92>: sub x0, x0, x2
0xffff000008091dc8 <+96>: msr ttbr0_el1, x0
0xffff000008091dcc <+100>: isb
0xffff000008091dd0 <+104>: dsb nshst
0xffff000008091dd4 <+108>: tlbi vmalle1
0xffff000008091dd8 <+112>: nop
0xffff000008091ddc <+116>: nop
0xffff000008091de0 <+120>: dsb nsh
0xffff000008091de4 <+124>: isb
0xffff000008091de8 <+128>: adrp x3, 0xffff000009056000 <armv8_event_attr_sw_incr+8>
0xffff000008091dec <+132>: ldr x0, [x3,#2248]
0xffff000008091df0 <+136>: cmp x0, #0x10
0xffff000008091df4 <+140>: b.ne 0xffff000008091f64 <kpti_install_ng_mappings+508>
0xffff000008091df8 <+144>: adrp x28, 0xffff000008ea9000 <cpu_ops+384>
0xffff000008091dfc <+148>: ldr x2, [x27,#672]
0xffff000008091e00 <+152>: adrp x1, 0xffff0000091f3000
0xffff000008091e04 <+156>: adrp x26, 0xffff0000091f7000
0xffff000008091e08 <+160>: add x1, x1, #0x0
0xffff000008091e0c <+164>: add x21, x26, #0x0
0xffff000008091e10 <+168>: ldr x0, [x28,#656]
0xffff000008091e14 <+172>: adrp x23, 0xffff000008ea9000 <cpu_ops+384>
0xffff000008091e18 <+176>: sub x1, x1, x2
0xffff000008091e1c <+180>: sub x1, x1, x0
0xffff000008091e20 <+184>: orr x0, x1, #0xffff800000000000
0xffff000008091e24 <+188>: cmp x0, x21
0xffff000008091e28 <+192>: b.eq 0xffff000008091f60 <kpti_install_ng_mappings+504>
0xffff000008091e2c <+196>: mov x22, x19
0xffff000008091e30 <+200>: str x3, [x29,#96]
0xffff000008091e34 <+204>: str x4, [x29,#104]
0xffff000008091e38 <+208>: sub x2, x22, x2
0xffff000008091e3c <+212>: msr ttbr0_el1, x2
0xffff000008091e40 <+216>: isb
0xffff000008091e44 <+220>: ldr x0, [x28,#656]
0xffff000008091e48 <+224>: and x1, x1, #0x7fffffffffff
0xffff000008091e4c <+228>: adrp x25, 0xffff00000906d000 <shmem_swaplist_mutex+16>
0xffff000008091e50 <+232>: add x0, x1, x0
0xffff000008091e54 <+236>: add x1, x25, #0x7b0
0xffff000008091e58 <+240>: bl 0xffff0000080a021c <cpu_do_switch_mm>
0xffff000008091e5c <+244>: adrp x0, 0xffff00000904a000 <__cpu_online_mask>
0xffff000008091e60 <+248>: mov w1, #0x80 // #128
0xffff000008091e64 <+252>: add x0, x0, #0x0
0xffff000008091e68 <+256>: bl 0xffff0000083e22f0 <__bitmap_weight>
0xffff000008091e6c <+260>: mov w1, w0
0xffff000008091e70 <+264>: ldr x5, [x23,#672]
0xffff000008091e74 <+268>: mov w0, w20
0xffff000008091e78 <+272>: ldr x4, [x29,#104]
0xffff000008091e7c <+276>: mov x2, x21
0xffff000008091e80 <+280>: sub x2, x2, x5
0xffff000008091e84 <+284>: blr x4
0xffff000008091e88 <+288>: ldr x1, [x23,#672]
0xffff000008091e8c <+292>: mrs x0, sp_el0
0xffff000008091e90 <+296>: sub x22, x22, x1
0xffff000008091e94 <+300>: ldr x1, [x0,#1128]
0xffff000008091e98 <+304>: msr ttbr0_el1, x22
0xffff000008091e9c <+308>: isb
0xffff000008091ea0 <+312>: dsb nshst
0xffff000008091ea4 <+316>: tlbi vmalle1
0xffff000008091ea8 <+320>: nop
0xffff000008091eac <+324>: nop
0xffff000008091eb0 <+328>: dsb nsh
0xffff000008091eb4 <+332>: isb
0xffff000008091eb8 <+336>: ldr x3, [x29,#96]
0xffff000008091ebc <+340>: ldr x0, [x3,#2248]
0xffff000008091ec0 <+344>: cmp x0, #0x10
0xffff000008091ec4 <+348>: b.ne 0xffff000008091f48 <kpti_install_ng_mappings+480>
0xffff000008091ec8 <+352>: add x25, x25, #0x7b0
0xffff000008091ecc <+356>: cmp x1, x25
0xffff000008091ed0 <+360>: b.eq 0xffff000008091f08 <kpti_install_ng_mappings+416>
0xffff000008091ed4 <+364>: ldr x2, [x1,#64]
0xffff000008091ed8 <+368>: add x26, x26, #0x0
0xffff000008091edc <+372>: cmp x2, x26
0xffff000008091ee0 <+376>: b.eq 0xffff000008091f60 <kpti_install_ng_mappings+504>
0xffff000008091ee4 <+380>: ldr x0, [x27,#672]
0xffff000008091ee8 <+384>: sub x19, x19, x0
0xffff000008091eec <+388>: msr ttbr0_el1, x19
0xffff000008091ef0 <+392>: isb
0xffff000008091ef4 <+396>: tbz x2, #47, 0xffff000008091f34 <kpti_install_ng_mappings+460>
0xffff000008091ef8 <+400>: ldr x0, [x28,#656]
0xffff000008091efc <+404>: and x2, x2, #0x7fffffffffff
0xffff000008091f00 <+408>: add x0, x2, x0
0xffff000008091f04 <+412>: bl 0xffff0000080a021c <cpu_do_switch_mm>
0xffff000008091f08 <+416>: cbnz w20, 0xffff000008091f18 <kpti_install_ng_mappings+432>
0xffff000008091f0c <+420>: add x24, x24, #0x550
0xffff000008091f10 <+424>: mov w0, #0x1 // #1
0xffff000008091f14 <+428>: strb w0, [x24,#8]
0xffff000008091f18 <+432>: ldp x19, x20, [sp,#16]
0xffff000008091f1c <+436>: ldp x21, x22, [sp,#32]
0xffff000008091f20 <+440>: ldp x23, x24, [sp,#48]
0xffff000008091f24 <+444>: ldp x25, x26, [sp,#64]
0xffff000008091f28 <+448>: ldp x27, x28, [sp,#80]
0xffff000008091f2c <+452>: ldp x29, x30, [sp],#112
0xffff000008091f30 <+456>: ret
0xffff000008091f34 <+460>: adrp x0, 0xffff000008ea9000 <cpu_ops+384>
0xffff000008091f38 <+464>: ldr x0, [x0,#672]
0xffff000008091f3c <+468>: sub x0, x2, x0
0xffff000008091f40 <+472>: bl 0xffff0000080a021c <cpu_do_switch_mm>
0xffff000008091f44 <+476>: b 0xffff000008091f08 <kpti_install_ng_mappings+416>
0xffff000008091f48 <+480>: mrs x0, tcr_el1
0xffff000008091f4c <+484>: and x0, x0, #0xffffffffffffffc0
0xffff000008091f50 <+488>: orr x0, x0, #0x10
0xffff000008091f54 <+492>: msr tcr_el1, x0
0xffff000008091f58 <+496>: isb
0xffff000008091f5c <+500>: b 0xffff000008091ec8 <kpti_install_ng_mappings+352>
0xffff000008091f60 <+504>: brk #0x800
0xffff000008091f64 <+508>: mrs x1, tcr_el1
0xffff000008091f68 <+512>: and x1, x1, #0xffffffffffffffc0
0xffff000008091f6c <+516>: orr x0, x1, x0
0xffff000008091f70 <+520>: msr tcr_el1, x0
0xffff000008091f74 <+524>: isb
0xffff000008091f78 <+528>: b 0xffff000008091df8 <kpti_install_ng_mappings+144>
End of assembler dump.
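To correlate the crash with this dump, as requested above: the crash logs in these messages report `lr : kpti_install_ng_mappings+0x120/0x214`. The arithmetic is 0xffff000008091d68 + 0x120 = 0xffff000008091e88. That address is `<+288>`, the instruction right after `blr x4` at `<+284>`, which is the only indirect call in the function. If x30 was not overwritten between that call and the exception, this suggests the exception was taken inside the routine called through x4, not in kpti_install_ng_mappings() itself. Treat that reading as an assumption.

A minimal C sketch of the lookup follows. The helper names and the hand-copied excerpt table are illustrative only; they are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Start address of kpti_install_ng_mappings() from the gdb dump, and its
 * size from the "+0x120/0x214" suffix in the crash log. */
static const uint64_t func_start = 0xffff000008091d68ULL;
static const uint32_t func_size = 0x214;

/* Hypothetical excerpt table: byte offset from func_start and the
 * instruction text, hand-copied from the dump around the reported lr. */
struct insn {
	uint32_t offset;
	const char *text;
};

static const struct insn excerpt[] = {
	{ 276, "mov x2, x21" },
	{ 280, "sub x2, x2, x5" },
	{ 284, "blr x4" },
	{ 288, "ldr x1, [x23,#672]" },
	{ 292, "mrs x0, sp_el0" },
};

/* Absolute address of func+off, i.e. what the dump prints for <+off>. */
static uint64_t offset_to_addr(uint32_t off)
{
	return func_start + off;
}

/* Instruction text at a byte offset, or NULL if it is not in the excerpt. */
static const char *insn_at(uint32_t off)
{
	for (size_t i = 0; i < sizeof(excerpt) / sizeof(excerpt[0]); i++)
		if (excerpt[i].offset == off)
			return excerpt[i].text;
	return NULL;
}

/* On arm64, BL/BLR write the address of the next instruction to x30 (lr)
 * and every A64 instruction is 4 bytes, so, assuming lr was not modified
 * afterwards, the call that set it sits 4 bytes before lr. */
static const char *call_site_for_lr(uint32_t lr_off)
{
	assert(lr_off % 4 == 0 && lr_off >= 4 && lr_off < func_size);
	return insn_at(lr_off - 4);
}
```

For example, offset_to_addr(0x120) gives 0xffff000008091e88, and call_site_for_lr(0x120) returns "blr x4".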
The corresponding crash log is below:
estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
    -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.17-joyx \
    -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
    -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #16 SMP PREEMPT Fri Jun 22 21:05:10 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007d850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007d860000
[ 0.000000] GIC: PPI11 is secure or misconfigured
[ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
[ 0.000000] arch_timer: WARNING: Please fix your firmware
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000849] Console: colour dummy device 80x25
[ 0.001427] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002485] pid_max: default: 32768 minimum: 301
[ 0.002966] Security Framework initialized
[ 0.003549] Dentry cache hash table entries: 131072 (order:
8, 1048576 bytes)
[ 0.004353] Inode-cache hash table entries: 65536 (order: 7,
524288 bytes)
[ 0.005068] Mount-cache hash table entries: 2048 (order: 2,
16384 bytes)
[ 0.005858] Mountpoint-cache hash table entries: 2048
(order: 2, 16384 bytes)
[ 0.025962] ASID allocator initialised with 32768 entries
[ 0.029972] Hierarchical SRCU implementation.
[ 0.034341] Platform MSI: its domain created
[ 0.034793] PCI/MSI: /intc/its domain created
[ 0.035360] EFI services will not be available.
[ 0.038002] smp: Bringing up secondary CPUs ...
[ 0.038472] smp: Brought up 1 node, 1 CPU
[ 0.038878] SMP: Total of 1 processors activated.
[ 0.039354] CPU features: detected: GIC system register CPU interface
[ 0.040004] CPU features: detected: Privileged Access Never
[ 0.040566] CPU features: detected: User Access Override
[ 0.042462] Insufficient stack space to handle exception!
[ 0.042464] ESR: 0x96000046 -- DABT (current EL)
[ 0.043781] FAR: 0xffff0000093a80e0
[ 0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
[ 0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
[ 0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
[ 0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
[ 0.067946] Hardware name: linux,dummy-virt (DT)
[ 0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
[ 0.077480] pc : el1_sync+0x0/0xb0
[ 0.080970] lr : kpti_install_ng_mappings+0x120/0x214
[ 0.086143] sp : ffff0000093a80e0
[ 0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
[ 0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.100241] x25: ffff00000906d000 x24: ffff000009191000
[ 0.105657] x23: ffff000008ea9000 x22: 0000000041190000
[ 0.111448] x21: ffff0000091f7000 x20: 0000000000000000
[ 0.116437] x19: ffff000009190000 x18: 000000003455d99d
[ 0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.127155] x15: 000000007eff6000 x14: 000000007eff6000
[ 0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
[ 0.137886] x11: 000000007eff8000 x10: 0000000000000000
[ 0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
[ 0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
[ 0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
[ 0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
[ 0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
[ 0.170179] Kernel panic - not syncing: kernel stack overflow
[ 0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
[ 0.184152] Hardware name: linux,dummy-virt (DT)
[ 0.188851] Call trace:
[ 0.191380] dump_backtrace+0x0/0x180
[ 0.195113] show_stack+0x14/0x1c
[ 0.198488] dump_stack+0x90/0xb0
[ 0.201862] panic+0x138/0x2a0
[ 0.204989] __stack_chk_fail+0x0/0x18
[ 0.208836] handle_bad_stack+0x118/0x124
[ 0.212927] __bad_stack+0x88/0x8c
[ 0.216414] el1_sync+0x0/0xb0
[ 0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.227507] Mem abort info:
[ 0.230390] ESR = 0x96000006
[ 0.233517] Exception class = DABT (current EL), IL = 32 bits
[ 0.239428] SET = 0, FnV = 0
[ 0.242555] EA = 0, S1PTW = 0
[ 0.245797] Data abort info:
[ 0.248795] ISV = 0, ISS = 0x00000006
[ 0.252652] CM = 0, WnR = 0
[ 0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
[ 0.271438] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 0.277098] Modules linked in:
[ 0.280227] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
[ 0.288310] Hardware name: linux,dummy-virt (DT)
[ 0.293004] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
[ 0.297931] pc : unwind_frame+0x28/0xc8
[ 0.301792] lr : dump_backtrace+0x12c/0x180
[ 0.306114] sp : ffff80003efcf000
[ 0.309483] x29: ffff80003efcf000 x28: ffff80003da61c00
[ 0.314798] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.320216] x25: ffff00000906d000 x24: ffff0000093a80e0
[ 0.325527] x23: 0000000000000000 x22: ffff000008dbada8
[ 0.330941] x21: 0000000000000000 x20: ffff000009049000
[ 0.336355] x19: ffff80003da61c00 x18: 000000003455d99d
[ 0.341770] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.347078] x15: 000000007eff6000 x14: 642d386165636439
[ 0.352491] x13: 0000000000000000 x12: cc26f77952f87e00
[ 0.357905] x11: ffffffffffffffff x10: 0000000000000075
[ 0.363214] x9 : ffff0000085ae9e8 x8 : ffff80003efcec90
[ 0.368628] x7 : 0000000000000000 x6 : ffff0000091befe1
[ 0.374053] x5 : 0000000000000000 x4 : ffff0000093ac000
[ 0.379363] x3 : ffff0000093a8000 x2 : ffff0000093abce0
[ 0.384779] x1 : ffff80003efcf048 x0 : ffff80003da61c00
[ 0.390196] Process migration/0 (pid: 12, stack limit = 0x (ptrval))
[ 0.397188] Call trace:
[ 0.399712] unwind_frame+0x28/0xc8
[ 0.403316] show_stack+0x14/0x1c
[ 0.406689] dump_stack+0x90/0xb0
[ 0.410065] panic+0x138/0x2a0
[ 0.413193] __stack_chk_fail+0x0/0x18
[ 0.416934] handle_bad_stack+0x118/0x124
[ 0.421025] __bad_stack+0x88/0x8c
[ 0.424513] el1_sync+0x0/0xb0
[ 0.427643] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.435604] Mem abort info:
[ 0.438488] ESR = 0x96000006
[ 0.441615] Exception class = DABT (current EL), IL = 32 bits
[ 0.447635] SET = 0, FnV = 0
[ 0.450759] EA = 0, S1PTW = 0
[ 0.454002] Data abort info:
[ 0.456896] ISV = 0, ISS = 0x00000006
[ 0.460863] CM = 0, WnR = 0
[ 0.463874] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.470750] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
Best Regards,
Wei
> Thanks,
>
> Will
>
> .
>
* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-22 13:46 ` Wei Xu
0 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 13:46 UTC
To: linux-arm-kernel
Hi Will,
On 2018/6/22 21:31, Will Deacon wrote:
> Hi again, Wei,
>
> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>> On 2018/6/22 19:16, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>>>> On 2018/6/22 17:23, Will Deacon wrote:
>>>>> Perhaps just writing back the table entries is enough to cause the issue,
>>>>> although I really can't understand why that would be the case. Can you try
>>>>> the diff below (without my previous change), please?
>>>> Thanks!
>>>> But it does not resolve the issue(only apply this patch based on 4.17.0).
>>> Thanks, that's a useful data point. It means that it still crashes even if
>>> we write back the same table entries, so it's the fact that we're writing
>>> them at all which causes the problem, not the value that we write.
>>>
>>> Whilst looking at the code, we noticed a missing DMB. On the off-chance
>>> that it helps, can you try this instead please?
>> Thanks!
>> Only apply below patch based on 4.17.0, we still got the crash.
> Oh well, it was worth a shot (and that's still a fix worth having). Please
> can you provide the complete disassembly for kpti_install_ng_mappings()
> (I'm referring to the C function in cpufeature.c) along with a corresponding
> crash log so that we can correlate the instruction stream with the crash?
Just let me know if you need more information.
Thanks!
The disassembled code is below:
Dump of assembler code for function kpti_install_ng_mappings:
0xffff000008091d68 <+0>: stp x29, x30, [sp,#-112]!
0xffff000008091d6c <+4>: adrp x0, 0xffff000009022000 <bp_hardening_data>
0xffff000008091d70 <+8>: mov x29, sp
0xffff000008091d74 <+12>: stp x23, x24, [sp,#48]
0xffff000008091d78 <+16>: adrp x24, 0xffff000009191000 <reset_devices>
0xffff000008091d7c <+20>: add x0, x0, #0x10
0xffff000008091d80 <+24>: add x1, x24, #0x550
0xffff000008091d84 <+28>: stp x19, x20, [sp,#16]
0xffff000008091d88 <+32>: stp x21, x22, [sp,#32]
0xffff000008091d8c <+36>: stp x25, x26, [sp,#64]
0xffff000008091d90 <+40>: stp x27, x28, [sp,#80]
0xffff000008091d94 <+44>: mrs x2, tpidr_el1
0xffff000008091d98 <+48>: ldrb w1, [x1,#8]
0xffff000008091d9c <+52>: ldr w20, [x2,x0]
0xffff000008091da0 <+56>: cbnz w1, 0xffff000008091f18 <kpti_install_ng_mappings+432>
0xffff000008091da4 <+60>: adrp x27, 0xffff000008ea9000 <cpu_ops+384>
0xffff000008091da8 <+64>: adrp x19, 0xffff000009190000 <empty_zero_page>
0xffff000008091dac <+68>: add x19, x19, #0x0
0xffff000008091db0 <+72>: adrp x1, 0xffff000008a44000 <kimage_vaddr>
0xffff000008091db4 <+76>: mov x0, x19
0xffff000008091db8 <+80>: add x1, x1, #0x3d8
0xffff000008091dbc <+84>: ldr x2, [x27,#672]
0xffff000008091dc0 <+88>: sub x4, x1, x2
0xffff000008091dc4 <+92>: sub x0, x0, x2
0xffff000008091dc8 <+96>: msr ttbr0_el1, x0
0xffff000008091dcc <+100>: isb
0xffff000008091dd0 <+104>: dsb nshst
0xffff000008091dd4 <+108>: tlbi vmalle1
0xffff000008091dd8 <+112>: nop
0xffff000008091ddc <+116>: nop
0xffff000008091de0 <+120>: dsb nsh
0xffff000008091de4 <+124>: isb
0xffff000008091de8 <+128>: adrp x3, 0xffff000009056000 <armv8_event_attr_sw_incr+8>
0xffff000008091dec <+132>: ldr x0, [x3,#2248]
0xffff000008091df0 <+136>: cmp x0, #0x10
0xffff000008091df4 <+140>: b.ne 0xffff000008091f64 <kpti_install_ng_mappings+508>
0xffff000008091df8 <+144>: adrp x28, 0xffff000008ea9000 <cpu_ops+384>
0xffff000008091dfc <+148>: ldr x2, [x27,#672]
0xffff000008091e00 <+152>: adrp x1, 0xffff0000091f3000
0xffff000008091e04 <+156>: adrp x26, 0xffff0000091f7000
0xffff000008091e08 <+160>: add x1, x1, #0x0
0xffff000008091e0c <+164>: add x21, x26, #0x0
0xffff000008091e10 <+168>: ldr x0, [x28,#656]
0xffff000008091e14 <+172>: adrp x23, 0xffff000008ea9000 <cpu_ops+384>
0xffff000008091e18 <+176>: sub x1, x1, x2
0xffff000008091e1c <+180>: sub x1, x1, x0
0xffff000008091e20 <+184>: orr x0, x1, #0xffff800000000000
0xffff000008091e24 <+188>: cmp x0, x21
0xffff000008091e28 <+192>: b.eq 0xffff000008091f60 <kpti_install_ng_mappings+504>
0xffff000008091e2c <+196>: mov x22, x19
0xffff000008091e30 <+200>: str x3, [x29,#96]
0xffff000008091e34 <+204>: str x4, [x29,#104]
0xffff000008091e38 <+208>: sub x2, x22, x2
0xffff000008091e3c <+212>: msr ttbr0_el1, x2
0xffff000008091e40 <+216>: isb
0xffff000008091e44 <+220>: ldr x0, [x28,#656]
0xffff000008091e48 <+224>: and x1, x1, #0x7fffffffffff
0xffff000008091e4c <+228>: adrp x25, 0xffff00000906d000 <shmem_swaplist_mutex+16>
0xffff000008091e50 <+232>: add x0, x1, x0
0xffff000008091e54 <+236>: add x1, x25, #0x7b0
0xffff000008091e58 <+240>: bl 0xffff0000080a021c <cpu_do_switch_mm>
0xffff000008091e5c <+244>: adrp x0, 0xffff00000904a000 <__cpu_online_mask>
0xffff000008091e60 <+248>: mov w1, #0x80 // #128
0xffff000008091e64 <+252>: add x0, x0, #0x0
0xffff000008091e68 <+256>: bl 0xffff0000083e22f0 <__bitmap_weight>
0xffff000008091e6c <+260>: mov w1, w0
0xffff000008091e70 <+264>: ldr x5, [x23,#672]
0xffff000008091e74 <+268>: mov w0, w20
0xffff000008091e78 <+272>: ldr x4, [x29,#104]
0xffff000008091e7c <+276>: mov x2, x21
0xffff000008091e80 <+280>: sub x2, x2, x5
0xffff000008091e84 <+284>: blr x4
0xffff000008091e88 <+288>: ldr x1, [x23,#672]
0xffff000008091e8c <+292>: mrs x0, sp_el0
0xffff000008091e90 <+296>: sub x22, x22, x1
0xffff000008091e94 <+300>: ldr x1, [x0,#1128]
0xffff000008091e98 <+304>: msr ttbr0_el1, x22
0xffff000008091e9c <+308>: isb
0xffff000008091ea0 <+312>: dsb nshst
0xffff000008091ea4 <+316>: tlbi vmalle1
0xffff000008091ea8 <+320>: nop
0xffff000008091eac <+324>: nop
0xffff000008091eb0 <+328>: dsb nsh
0xffff000008091eb4 <+332>: isb
0xffff000008091eb8 <+336>: ldr x3, [x29,#96]
0xffff000008091ebc <+340>: ldr x0, [x3,#2248]
0xffff000008091ec0 <+344>: cmp x0, #0x10
0xffff000008091ec4 <+348>: b.ne 0xffff000008091f48 <kpti_install_ng_mappings+480>
0xffff000008091ec8 <+352>: add x25, x25, #0x7b0
0xffff000008091ecc <+356>: cmp x1, x25
0xffff000008091ed0 <+360>: b.eq 0xffff000008091f08 <kpti_install_ng_mappings+416>
0xffff000008091ed4 <+364>: ldr x2, [x1,#64]
0xffff000008091ed8 <+368>: add x26, x26, #0x0
0xffff000008091edc <+372>: cmp x2, x26
0xffff000008091ee0 <+376>: b.eq 0xffff000008091f60 <kpti_install_ng_mappings+504>
0xffff000008091ee4 <+380>: ldr x0, [x27,#672]
0xffff000008091ee8 <+384>: sub x19, x19, x0
0xffff000008091eec <+388>: msr ttbr0_el1, x19
0xffff000008091ef0 <+392>: isb
0xffff000008091ef4 <+396>: tbz x2, #47, 0xffff000008091f34 <kpti_install_ng_mappings+460>
0xffff000008091ef8 <+400>: ldr x0, [x28,#656]
0xffff000008091efc <+404>: and x2, x2, #0x7fffffffffff
0xffff000008091f00 <+408>: add x0, x2, x0
0xffff000008091f04 <+412>: bl 0xffff0000080a021c <cpu_do_switch_mm>
0xffff000008091f08 <+416>: cbnz w20, 0xffff000008091f18 <kpti_install_ng_mappings+432>
0xffff000008091f0c <+420>: add x24, x24, #0x550
0xffff000008091f10 <+424>: mov w0, #0x1 // #1
0xffff000008091f14 <+428>: strb w0, [x24,#8]
0xffff000008091f18 <+432>: ldp x19, x20, [sp,#16]
0xffff000008091f1c <+436>: ldp x21, x22, [sp,#32]
0xffff000008091f20 <+440>: ldp x23, x24, [sp,#48]
0xffff000008091f24 <+444>: ldp x25, x26, [sp,#64]
0xffff000008091f28 <+448>: ldp x27, x28, [sp,#80]
0xffff000008091f2c <+452>: ldp x29, x30, [sp],#112
0xffff000008091f30 <+456>: ret
0xffff000008091f34 <+460>: adrp x0, 0xffff000008ea9000 <cpu_ops+384>
0xffff000008091f38 <+464>: ldr x0, [x0,#672]
0xffff000008091f3c <+468>: sub x0, x2, x0
0xffff000008091f40 <+472>: bl 0xffff0000080a021c <cpu_do_switch_mm>
0xffff000008091f44 <+476>: b 0xffff000008091f08 <kpti_install_ng_mappings+416>
0xffff000008091f48 <+480>: mrs x0, tcr_el1
0xffff000008091f4c <+484>: and x0, x0, #0xffffffffffffffc0
0xffff000008091f50 <+488>: orr x0, x0, #0x10
0xffff000008091f54 <+492>: msr tcr_el1, x0
0xffff000008091f58 <+496>: isb
0xffff000008091f5c <+500>: b 0xffff000008091ec8 <kpti_install_ng_mappings+352>
0xffff000008091f60 <+504>: brk #0x800
0xffff000008091f64 <+508>: mrs x1, tcr_el1
0xffff000008091f68 <+512>: and x1, x1, #0xffffffffffffffc0
0xffff000008091f6c <+516>: orr x0, x1, x0
0xffff000008091f70 <+520>: msr tcr_el1, x0
0xffff000008091f74 <+524>: isb
0xffff000008091f78 <+528>: b 0xffff000008091df8 <kpti_install_ng_mappings+144>
End of assembler dump.
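To line the crash log up with this dump, the "lr : kpti_install_ng_mappings+0x120/0x214" value can be turned into an address. This is a minimal sketch, not part of the original report: the helper names are hypothetical, and it assumes lr was last written by a BL/BLR, which on A64 stores the address of the instruction after the call. With the function starting at 0xffff000008091d68 (<+0> above), +0x120 is 0xffff000008091e88 (<+288>, ldr x1, [x23,#672]), and the call that set it is the blr x4 at <+284>.

```c
#include <assert.h>
#include <stdint.h>

/* Every A64 instruction is 4 bytes long. */
#define A64_INSN_SIZE 4u

/*
 * Turn a "symbol+offset" pair from a crash log into an absolute address,
 * given the symbol's start address taken from the disassembly ("<+0>").
 */
static uint64_t sym_offset_to_addr(uint64_t sym_start, uint64_t offset)
{
	/* Offsets into A64 code must be instruction aligned. */
	assert(offset % A64_INSN_SIZE == 0);
	return sym_start + offset;
}

/*
 * Assuming lr was last written by a BL/BLR, the call instruction is the
 * one immediately before the return address.
 */
static uint64_t call_site_of(uint64_t lr)
{
	assert(lr % A64_INSN_SIZE == 0);
	return lr - A64_INSN_SIZE;
}
```

This only narrows things down: the first fault happened either inside the routine reached through blr x4, or after it returned but before x30 was next overwritten (by the bl at <+412> or <+472>, or by the ldp x29, x30 at <+452>). The log alone does not say which.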
The corresponding crash log is as follows:
estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
    -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.17-joyx \
    -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
    -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #16 SMP PREEMPT Fri Jun 22 21:05:10 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007d850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007d860000
[ 0.000000] GIC: PPI11 is secure or misconfigured
[ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
[ 0.000000] arch_timer: WARNING: Please fix your firmware
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000849] Console: colour dummy device 80x25
[ 0.001427] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002485] pid_max: default: 32768 minimum: 301
[ 0.002966] Security Framework initialized
[ 0.003549] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.004353] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.005068] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.005858] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.025962] ASID allocator initialised with 32768 entries
[ 0.029972] Hierarchical SRCU implementation.
[ 0.034341] Platform MSI: its domain created
[ 0.034793] PCI/MSI: /intc/its domain created
[ 0.035360] EFI services will not be available.
[ 0.038002] smp: Bringing up secondary CPUs ...
[ 0.038472] smp: Brought up 1 node, 1 CPU
[ 0.038878] SMP: Total of 1 processors activated.
[ 0.039354] CPU features: detected: GIC system register CPU interface
[ 0.040004] CPU features: detected: Privileged Access Never
[ 0.040566] CPU features: detected: User Access Override
[ 0.042462] Insufficient stack space to handle exception!
[ 0.042464] ESR: 0x96000046 -- DABT (current EL)
[ 0.043781] FAR: 0xffff0000093a80e0
[ 0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
[ 0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
[ 0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
[ 0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
[ 0.067946] Hardware name: linux,dummy-virt (DT)
[ 0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
[ 0.077480] pc : el1_sync+0x0/0xb0
[ 0.080970] lr : kpti_install_ng_mappings+0x120/0x214
[ 0.086143] sp : ffff0000093a80e0
[ 0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
[ 0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.100241] x25: ffff00000906d000 x24: ffff000009191000
[ 0.105657] x23: ffff000008ea9000 x22: 0000000041190000
[ 0.111448] x21: ffff0000091f7000 x20: 0000000000000000
[ 0.116437] x19: ffff000009190000 x18: 000000003455d99d
[ 0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.127155] x15: 000000007eff6000 x14: 000000007eff6000
[ 0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
[ 0.137886] x11: 000000007eff8000 x10: 0000000000000000
[ 0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
[ 0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
[ 0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
[ 0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
[ 0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
[ 0.170179] Kernel panic - not syncing: kernel stack overflow
[ 0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
[ 0.184152] Hardware name: linux,dummy-virt (DT)
[ 0.188851] Call trace:
[ 0.191380] dump_backtrace+0x0/0x180
[ 0.195113] show_stack+0x14/0x1c
[ 0.198488] dump_stack+0x90/0xb0
[ 0.201862] panic+0x138/0x2a0
[ 0.204989] __stack_chk_fail+0x0/0x18
[ 0.208836] handle_bad_stack+0x118/0x124
[ 0.212927] __bad_stack+0x88/0x8c
[ 0.216414] el1_sync+0x0/0xb0
[ 0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.227507] Mem abort info:
[ 0.230390] ESR = 0x96000006
[ 0.233517] Exception class = DABT (current EL), IL = 32 bits
[ 0.239428] SET = 0, FnV = 0
[ 0.242555] EA = 0, S1PTW = 0
[ 0.245797] Data abort info:
[ 0.248795] ISV = 0, ISS = 0x00000006
[ 0.252652] CM = 0, WnR = 0
[ 0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
[ 0.271438] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 0.277098] Modules linked in:
[ 0.280227] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
[ 0.288310] Hardware name: linux,dummy-virt (DT)
[ 0.293004] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
[ 0.297931] pc : unwind_frame+0x28/0xc8
[ 0.301792] lr : dump_backtrace+0x12c/0x180
[ 0.306114] sp : ffff80003efcf000
[ 0.309483] x29: ffff80003efcf000 x28: ffff80003da61c00
[ 0.314798] x27: ffff000008ea9000 x26: ffff0000091f7000
[ 0.320216] x25: ffff00000906d000 x24: ffff0000093a80e0
[ 0.325527] x23: 0000000000000000 x22: ffff000008dbada8
[ 0.330941] x21: 0000000000000000 x20: ffff000009049000
[ 0.336355] x19: ffff80003da61c00 x18: 000000003455d99d
[ 0.341770] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.347078] x15: 000000007eff6000 x14: 642d386165636439
[ 0.352491] x13: 0000000000000000 x12: cc26f77952f87e00
[ 0.357905] x11: ffffffffffffffff x10: 0000000000000075
[ 0.363214] x9 : ffff0000085ae9e8 x8 : ffff80003efcec90
[ 0.368628] x7 : 0000000000000000 x6 : ffff0000091befe1
[ 0.374053] x5 : 0000000000000000 x4 : ffff0000093ac000
[ 0.379363] x3 : ffff0000093a8000 x2 : ffff0000093abce0
[ 0.384779] x1 : ffff80003efcf048 x0 : ffff80003da61c00
[ 0.390196] Process migration/0 (pid: 12, stack limit = 0x (ptrval))
[ 0.397188] Call trace:
[ 0.399712] unwind_frame+0x28/0xc8
[ 0.403316] show_stack+0x14/0x1c
[ 0.406689] dump_stack+0x90/0xb0
[ 0.410065] panic+0x138/0x2a0
[ 0.413193] __stack_chk_fail+0x0/0x18
[ 0.416934] handle_bad_stack+0x118/0x124
[ 0.421025] __bad_stack+0x88/0x8c
[ 0.424513] el1_sync+0x0/0xb0
[ 0.427643] Unable to handle kernel paging request at virtual address ffff0000093abce0
[ 0.435604] Mem abort info:
[ 0.438488] ESR = 0x96000006
[ 0.441615] Exception class = DABT (current EL), IL = 32 bits
[ 0.447635] SET = 0, FnV = 0
[ 0.450759] EA = 0, S1PTW = 0
[ 0.454002] Data abort info:
[ 0.456896] ISV = 0, ISS = 0x00000006
[ 0.460863] CM = 0, WnR = 0
[ 0.463874] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
[ 0.470750] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
Best Regards,
Wei
> Thanks,
>
> Will
>
> .
>
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 13:18 ` Wei Xu
@ 2018-06-22 14:28 ` Mark Rutland
-1 siblings, 0 replies; 79+ messages in thread
From: Mark Rutland @ 2018-06-22 14:28 UTC
To: Wei Xu
Cc: Will Deacon, James Morse, catalin.marinas, suzuki.poulose,
dave.martin, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> [ 0.042462] Insufficient stack space to handle exception!
> [ 0.042464] ESR: 0x96000046 -- DABT (current EL)
> [ 0.043781] FAR: 0xffff0000093a80e0
> [ 0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
Here, the FAR points somewhere in the task stack, so we're evidently
faulting on that...
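That first observation is a plain range check against the stack bounds printed by the bad-stack handler. A minimal sketch, with hypothetical helper names; it treats each printed range as half-open, [low, high), which is an assumption based on the task stack's two bounds being exactly 16 KiB apart:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A stack range as printed by the bad-stack handler, taken as [low, high). */
struct stack_range {
	uint64_t low;
	uint64_t high;
};

/* True if addr lies inside the half-open range [low, high). */
static bool addr_on_stack(const struct stack_range *s, uint64_t addr)
{
	assert(s->low < s->high);
	return addr >= s->low && addr < s->high;
}

/* Distance of addr above the lowest address of the stack (stacks grow down). */
static uint64_t bytes_above_base(const struct stack_range *s, uint64_t addr)
{
	assert(addr_on_stack(s, addr));
	return addr - s->low;
}
```

With the task stack [0xffff0000093a8000..0xffff0000093ac000], FAR 0xffff0000093a80e0 is on the stack and only 0xe0 bytes above its base, and the frame pointer x29 = 0xffff0000093abce0 is on it too; neither lies in the overflow stack [0xffff80003efce2f0..0xffff80003efcf2f0].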
> [ 0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
> [ 0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
> [ 0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
> [ 0.067946] Hardware name: linux,dummy-virt (DT)
> [ 0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
> [ 0.077480] pc : el1_sync+0x0/0xb0
> [ 0.080970] lr : kpti_install_ng_mappings+0x120/0x214
> [ 0.086143] sp : ffff0000093a80e0
> [ 0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
> [ 0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
> [ 0.100241] x25: ffff00000906d000 x24: ffff000009191000
> [ 0.105657] x23: ffff000008ea9000 x22: 0000000041190000
> [ 0.111448] x21: ffff0000091f7000 x20: 0000000000000000
> [ 0.116437] x19: ffff000009190000 x18: 000000003455d99d
> [ 0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
> [ 0.127155] x15: 000000007eff6000 x14: 000000007eff6000
> [ 0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
> [ 0.137886] x11: 000000007eff8000 x10: 0000000000000000
> [ 0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
> [ 0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
> [ 0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
> [ 0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
> [ 0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
> [ 0.170179] Kernel panic - not syncing: kernel stack overflow
> [ 0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
> [ 0.184152] Hardware name: linux,dummy-virt (DT)
> [ 0.188851] Call trace:
> [ 0.191380] dump_backtrace+0x0/0x180
> [ 0.195113] show_stack+0x14/0x1c
> [ 0.198488] dump_stack+0x90/0xb0
> [ 0.201862] panic+0x138/0x2a0
> [ 0.204989] __stack_chk_fail+0x0/0x18
> [ 0.208836] handle_bad_stack+0x118/0x124
> [ 0.212927] __bad_stack+0x88/0x8c
> [ 0.216414] el1_sync+0x0/0xb0
> [ 0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0
Likewise, here we're faulting on an address within the task stack,
presumably as part of the unwinding process...
> [ 0.227507] Mem abort info:
> [ 0.230390] ESR = 0x96000006
> [ 0.233517] Exception class = DABT (current EL), IL = 32 bits
> [ 0.239428] SET = 0, FnV = 0
> [ 0.242555] EA = 0, S1PTW = 0
> [ 0.245797] Data abort info:
> [ 0.248795] ISV = 0, ISS = 0x00000006
> [ 0.252652] CM = 0, WnR = 0
> [ 0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
> [ 0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
... and here the PMD for the task stack is all zeroes, so evidently
that's getting corrupted somehow.
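The two ESR values in the log can be decoded to back this reading up. The sketch below follows the ESR_ELx layout in the Arm Architecture Reference Manual (EC in bits [31:26], IL in bit 25, and for data aborts WnR in bit 6 and DFSC in bits [5:0]); the macro, struct and function names are made up for a standalone example and are not the kernel's own definitions in arch/arm64/include/asm/esr.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ESR_ELx layout: EC[31:26], IL[25], ISS[24:0]. */
#define ESR_EC_SHIFT	26
#define ESR_EC_MASK	0x3fu
#define ESR_IL_BIT	(1u << 25)
#define ESR_ISS_MASK	0x1ffffffu

/* Data abort exception classes: from a lower EL, and from the current EL. */
#define EC_DABT_LOW	0x24u
#define EC_DABT_CUR	0x25u

/* Data-abort ISS fields: WnR[6] and DFSC[5:0]. */
#define ISS_WNR_BIT	(1u << 6)
#define ISS_DFSC_MASK	0x3fu

struct dabt_info {
	unsigned int ec;
	bool il32;		/* IL = 1: trapped instruction was 32 bits */
	bool write;		/* WnR = 1: abort caused by a write */
	unsigned int dfsc;
	int xlat_level;		/* translation-fault lookup level, or -1 */
};

static struct dabt_info decode_dabt(uint32_t esr)
{
	struct dabt_info d;
	uint32_t iss = esr & ESR_ISS_MASK;

	d.ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
	/* WnR and DFSC only have this meaning for data aborts. */
	assert(d.ec == EC_DABT_LOW || d.ec == EC_DABT_CUR);
	d.il32 = (esr & ESR_IL_BIT) != 0;
	d.write = (iss & ISS_WNR_BIT) != 0;
	d.dfsc = iss & ISS_DFSC_MASK;
	/* DFSC 0b0001LL is a translation fault at lookup level LL. */
	d.xlat_level = ((d.dfsc & 0x3cu) == 0x04u) ? (int)(d.dfsc & 0x3u) : -1;
	return d;
}
```

Both 0x96000046 and 0x96000006 decode to EC 0x25 (data abort from the current EL) with DFSC 0x06, a translation fault at level 2; the first is a write (WnR = 1), the second a read (WnR = 0), matching the "CM = 0, WnR = 0" line in the quoted log. With 4k pages and 48-bit VAs, level 2 is the PMD level, which fits the pmd=0000000000000000 in the table walk.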
It appears that the overflow stack (which IIRC is embedded within the
kernel's data segment, as part of the image mapping) is fine.
I wonder if there's some existing weirdness in the page tables for the
vmalloc area that causes things to go wrong. Can you please:
* enable ARM64_PTDUMP_DEBUGFS
* boot with kpti=off (with Will's patch to make this work)
* as root, cat /sys/kernel/debug/kernel_page_tables
... and dump the result here?
Thanks,
Mark.
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 13:46 ` Wei Xu
@ 2018-06-22 14:43 ` Will Deacon
-1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22 14:43 UTC (permalink / raw)
To: Wei Xu
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
On Fri, Jun 22, 2018 at 09:46:53PM +0800, Wei Xu wrote:
> On 2018/6/22 21:31, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> >>On 2018/6/22 19:16, Will Deacon wrote:
> >>>On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> >>>>On 2018/6/22 17:23, Will Deacon wrote:
> >>>>>Perhaps just writing back the table entries is enough to cause the issue,
> >>>>>although I really can't understand why that would be the case. Can you try
> >>>>>the diff below (without my previous change), please?
> >>>>Thanks!
> >>>>But it does not resolve the issue (with only this patch applied on top of 4.17.0).
> >>>Thanks, that's a useful data point. It means that it still crashes even if
> >>>we write back the same table entries, so it's the fact that we're writing
> >>>them at all which causes the problem, not the value that we write.
> >>>
> >>>Whilst looking at the code, we noticed a missing DMB. On the off-chance
> >>>that it helps, can you try this instead please?
> >>Thanks!
> >>With only the patch below applied on top of 4.17.0, we still got the crash.
> >Oh well, it was worth a shot (and that's still a fix worth having). Please
> >can you provide the complete disassembly for kpti_install_ng_mappings()
> >(I'm referring to the C function in cpufeature.c) along with a corresponding
> >crash log so that we can correlate the instruction stream with the crash?
> Just let me know if you need more information.
Thanks; the disassembly and log are really helpful.
I have another patch for you to try below. Please can you let me know how
you get on, and sorry for the back-and-forth on this.
Will
--->8
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..26c5c3fabca8 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -216,9 +216,14 @@ ENDPROC(idmap_cpu_replace_ttbr1)
.endm
.macro __idmap_kpti_put_pgtable_ent_ng, type
- orr \type, \type, #PTE_NG // Same bit for blocks and pages
+ eor \type, \type, #PTE_NG // Same bit for blocks and pages
str \type, [cur_\()\type\()p] // Update the entry and ensure it
+ tbz \type, #11, 1234f
dc civac, cur_\()\type\()p // is visible to all CPUs.
+ b 1235f
+ 1234:
+ dc cvac, cur_\()\type\()p
+ 1235:
.endm
/*
@@ -298,6 +303,7 @@ skip_pgd:
/* PUD */
walk_puds:
.if CONFIG_PGTABLE_LEVELS > 3
+ eor pgd, pgd, #PTE_NG
pte_to_phys cur_pudp, pgd
add end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
do_pud: __idmap_kpti_get_pgtable_ent pud
@@ -319,6 +325,7 @@ next_pud:
/* PMD */
walk_pmds:
.if CONFIG_PGTABLE_LEVELS > 2
+ eor pud, pud, #PTE_NG
pte_to_phys cur_pmdp, pud
add end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
do_pmd: __idmap_kpti_get_pgtable_ent pmd
@@ -339,6 +346,7 @@ next_pmd:
/* PTE */
walk_ptes:
+ eor pmd, pmd, #PTE_NG
pte_to_phys cur_ptep, pmd
add end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
do_pte: __idmap_kpti_get_pgtable_ent pte
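To make the control flow of this patch easier to follow, here is a C model of it. It is a sketch, not kernel code: put_pgtable_ent_ng, visit_table_entry, visit_leaf_entry and enum cmo are hypothetical names. It assumes, as the hunk context suggests, that each walk_* path returns to a next_* label that runs the put macro on the table entry, and that the get macro has already skipped invalid entries and entries with nG set.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_NG (1ULL << 11)   /* the bit tested by "tbz \type, #11" */

/* Which cache maintenance instruction the patched macro issues. */
enum cmo { CMO_DC_CVAC, CMO_DC_CIVAC };

/* Model of the patched __idmap_kpti_put_pgtable_ent_ng. */
enum cmo put_pgtable_ent_ng(uint64_t *entryp, uint64_t reg)
{
    reg ^= PTE_NG;                /* eor  \type, \type, #PTE_NG  */
    *entryp = reg;                /* str  \type, [cur_<type>p]   */
    if (!(reg & PTE_NG))          /* tbz  \type, #11, 1234f      */
        return CMO_DC_CVAC;       /* 1234: dc cvac (clean only)  */
    return CMO_DC_CIVAC;          /* dc civac (clean+invalidate) */
}

/* One valid table entry: the walk_* path eors PTE_NG into the register
 * before descending, and the put macro on the way back toggles it again,
 * so the original value is stored and only cleaned. */
enum cmo visit_table_entry(uint64_t *entryp)
{
    uint64_t reg = *entryp;       /* loaded by the get macro */
    assert((reg & 3) == 3);       /* bits [1:0] = 0b11: table descriptor */
    assert(!(reg & PTE_NG));      /* assumed: nG entries were skipped */
    reg ^= PTE_NG;                /* e.g. "eor pgd, pgd, #PTE_NG" */
    /* pte_to_phys and the next-level walk are not modelled here. */
    return put_pgtable_ent_ng(entryp, reg);
}

/* One valid leaf (block or page) entry that does not yet have nG set. */
enum cmo visit_leaf_entry(uint64_t *entryp)
{
    return put_pgtable_ent_ng(entryp, *entryp);
}
```

Under these assumptions, leaf entries still get nG set and are cleaned and invalidated as before, while table entries are written back unchanged and only cleaned, which is consistent with Will's later remark that clean works on this CPU where clean+invalidate appears not to.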
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 14:43 ` Will Deacon
@ 2018-06-22 15:26 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 15:26 UTC (permalink / raw)
To: Will Deacon
Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi Will,
On 2018/6/22 22:43, Will Deacon wrote:
> On Fri, Jun 22, 2018 at 09:46:53PM +0800, Wei Xu wrote:
>> On 2018/6/22 21:31, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>>>> On 2018/6/22 19:16, Will Deacon wrote:
>>>>> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>>>>>> On 2018/6/22 17:23, Will Deacon wrote:
>>>>>>> Perhaps just writing back the table entries is enough to cause the issue,
>>>>>>> although I really can't understand why that would be the case. Can you try
>>>>>>> the diff below (without my previous change), please?
>>>>>> Thanks!
>>>>>> But it does not resolve the issue (with only this patch applied on top of 4.17.0).
>>>>> Thanks, that's a useful data point. It means that it still crashes even if
>>>>> we write back the same table entries, so it's the fact that we're writing
>>>>> them at all which causes the problem, not the value that we write.
>>>>>
>>>>> Whilst looking at the code, we noticed a missing DMB. On the off-chance
>>>>> that it helps, can you try this instead please?
>>>> Thanks!
>>>> With only the patch below applied on top of 4.17.0, we still got the crash.
>>> Oh well, it was worth a shot (and that's still a fix worth having). Please
>>> can you provide the complete disassembly for kpti_install_ng_mappings()
>>> (I'm referring to the C function in cpufeature.c) along with a corresponding
>>> crash log so that we can correlate the instruction stream with the crash?
>> Just let me know if you need more information.
> Thanks; the disassembly and log are really helpful.
>
> I have another patch for you to try below. Please can you let me know how
> you get on, and sorry for the back-and-forth on this.
No worries.
Great: I have tried 30 times and it works well with this patch applied
on top of 4.17.0.
Could you let me know how you used the disassembly and the log to
debug this kind of issue?
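Will does not describe his method here. One common approach (an assumption, not necessarily his) starts from the symbolised addresses in the crash log, such as "lr : kpti_install_ng_mappings+0x120/0x214", where the first hex number is the offset into the function and the second is the function's size. The hypothetical helpers below extract those values so the offset can be matched against a disassembly of the function (for example from objdump -d on vmlinux).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Split "sym+0xOFF/0xSIZE" (the kernel's symbol+offset/size format) into
 * its three parts. Returns 0 on success, -1 if the string has another shape. */
int parse_sym_offset(const char *s, char *sym, size_t symlen,
                     unsigned long *off, unsigned long *size)
{
    assert(s && sym && off && size);
    const char *plus = strrchr(s, '+');
    if (!plus || plus == s || (size_t)(plus - s) >= symlen)
        return -1;
    memcpy(sym, s, (size_t)(plus - s));
    sym[plus - s] = '\0';
    /* %lx accepts an optional 0x prefix, like strtoul(..., 16). */
    if (sscanf(plus + 1, "%lx/%lx", off, size) != 2)
        return -1;
    return 0;
}

/* arm64 instructions are 4 bytes, so if lr was set by a bl/blr, the call
 * instruction sits one instruction before the return address. */
unsigned long call_site_offset(unsigned long lr_offset)
{
    return lr_offset - 4;
}
```

For the log above this yields offset 0x120 in a 0x214-byte function, so the call that produced that lr value would be the instruction at kpti_install_ng_mappings+0x11c.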
Thanks!
Best Regards,
Wei
> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..26c5c3fabca8 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -216,9 +216,14 @@ ENDPROC(idmap_cpu_replace_ttbr1)
> .endm
>
> .macro __idmap_kpti_put_pgtable_ent_ng, type
> - orr \type, \type, #PTE_NG // Same bit for blocks and pages
> + eor \type, \type, #PTE_NG // Same bit for blocks and pages
> str \type, [cur_\()\type\()p] // Update the entry and ensure it
> + tbz \type, #11, 1234f
> dc civac, cur_\()\type\()p // is visible to all CPUs.
> + b 1235f
> + 1234:
> + dc cvac, cur_\()\type\()p
> + 1235:
> .endm
>
> /*
> @@ -298,6 +303,7 @@ skip_pgd:
> /* PUD */
> walk_puds:
> .if CONFIG_PGTABLE_LEVELS > 3
> + eor pgd, pgd, #PTE_NG
> pte_to_phys cur_pudp, pgd
> add end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
> do_pud: __idmap_kpti_get_pgtable_ent pud
> @@ -319,6 +325,7 @@ next_pud:
> /* PMD */
> walk_pmds:
> .if CONFIG_PGTABLE_LEVELS > 2
> + eor pud, pud, #PTE_NG
> pte_to_phys cur_pmdp, pud
> add end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
> do_pmd: __idmap_kpti_get_pgtable_ent pmd
> @@ -339,6 +346,7 @@ next_pmd:
>
> /* PTE */
> walk_ptes:
> + eor pmd, pmd, #PTE_NG
> pte_to_phys cur_ptep, pmd
> add end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
> do_pte: __idmap_kpti_get_pgtable_ent pte
>
> .
>
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 14:28 ` Mark Rutland
@ 2018-06-22 15:28 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 15:28 UTC (permalink / raw)
To: Mark Rutland
Cc: Will Deacon, James Morse, catalin.marinas, suzuki.poulose,
dave.martin, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi Mark,
On 2018/6/22 22:28, Mark Rutland wrote:
> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>> [ 0.042462] Insufficient stack space to handle exception!
>> [ 0.042464] ESR: 0x96000046 -- DABT (current EL)
>> [ 0.043781] FAR: 0xffff0000093a80e0
>> [ 0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
> Here, the FAR points somewhere in the task stack, so we're evidently
> faulting on that...
>
>> [ 0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
>> [ 0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>> [ 0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
>> [ 0.067946] Hardware name: linux,dummy-virt (DT)
>> [ 0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>> [ 0.077480] pc : el1_sync+0x0/0xb0
>> [ 0.080970] lr : kpti_install_ng_mappings+0x120/0x214
>> [ 0.086143] sp : ffff0000093a80e0
>> [ 0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
>> [ 0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
>> [ 0.100241] x25: ffff00000906d000 x24: ffff000009191000
>> [ 0.105657] x23: ffff000008ea9000 x22: 0000000041190000
>> [ 0.111448] x21: ffff0000091f7000 x20: 0000000000000000
>> [ 0.116437] x19: ffff000009190000 x18: 000000003455d99d
>> [ 0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
>> [ 0.127155] x15: 000000007eff6000 x14: 000000007eff6000
>> [ 0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
>> [ 0.137886] x11: 000000007eff8000 x10: 0000000000000000
>> [ 0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
>> [ 0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
>> [ 0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
>> [ 0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
>> [ 0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
>> [ 0.170179] Kernel panic - not syncing: kernel stack overflow
>> [ 0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
>> [ 0.184152] Hardware name: linux,dummy-virt (DT)
>> [ 0.188851] Call trace:
>> [ 0.191380] dump_backtrace+0x0/0x180
>> [ 0.195113] show_stack+0x14/0x1c
>> [ 0.198488] dump_stack+0x90/0xb0
>> [ 0.201862] panic+0x138/0x2a0
>> [ 0.204989] __stack_chk_fail+0x0/0x18
>> [ 0.208836] handle_bad_stack+0x118/0x124
>> [ 0.212927] __bad_stack+0x88/0x8c
>> [ 0.216414] el1_sync+0x0/0xb0
>> [ 0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0
> Likewise, here we're faulting on an address within the task stack,
> presumably as part of the unwinding process...
>
>> [ 0.227507] Mem abort info:
>> [ 0.230390] ESR = 0x96000006
>> [ 0.233517] Exception class = DABT (current EL), IL = 32 bits
>> [ 0.239428] SET = 0, FnV = 0
>> [ 0.242555] EA = 0, S1PTW = 0
>> [ 0.245797] Data abort info:
>> [ 0.248795] ISV = 0, ISS = 0x00000006
>> [ 0.252652] CM = 0, WnR = 0
>> [ 0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
>> [ 0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
> ... and here the PMD for the task stack is all zeroes, so evidently
> that's getting corrupted somehow.
>
> It appears that the overflow stack (which IIRC is embedded within the
> kernel's data segment, as part of the image mapping), is fine.
>
> I wonder if there's some existing weirdness in the page tables for the
> vmalloc area that causes things to go wrong. Can you please:
>
> * enable ARM64_PTDUMP_DEBUGFS
>
> * boot with kpti=off (with Will's patch to make this work)
>
> * as root, cat /sys/kernel/debug/kernel_page_tables
>
> ... and dump the result here?
Thanks!
Can I do this later since Will's new patch works?
Best Regards,
Wei
> Thanks,
> Mark.
>
> .
>
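Mark's first two observations in the quoted analysis are range checks on addresses from the log. The sketch below (hypothetical names; it assumes the printed stack ranges are half-open, with the upper bound one past the last byte) confirms that the FAR/sp value and the x29 value both fall inside the 16 KiB task stack and not in the overflow stack, and that the whole task stack lies within a single 2 MiB region, i.e. under one pmd entry with 4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>

/* A stack range as printed by the arm64 bad-stack report, treated here
 * (an assumption) as half-open: [lo, hi). */
struct stack_range {
    uint64_t lo, hi;
};

int on_stack(struct stack_range r, uint64_t addr)
{
    assert(r.lo < r.hi);
    return addr >= r.lo && addr < r.hi;
}

/* Index of the 2 MiB region (one pmd entry with 4 KiB pages) holding addr. */
uint64_t pmd_region(uint64_t addr)
{
    return addr >> 21;
}
```

Since the whole task stack maps through one pmd entry, a single zeroed pmd is enough to make every access to that stack fault, including the exception entry's own stack use.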
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 15:28 ` Wei Xu
@ 2018-06-22 15:41 ` Will Deacon
-1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22 15:41 UTC (permalink / raw)
To: Wei Xu
Cc: Mark Rutland, James Morse, catalin.marinas, suzuki.poulose,
dave.martin, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
On Fri, Jun 22, 2018 at 11:28:21PM +0800, Wei Xu wrote:
> On 2018/6/22 22:28, Mark Rutland wrote:
> >On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> >> [ 0.227507] Mem abort info:
> >> [ 0.230390] ESR = 0x96000006
> >> [ 0.233517] Exception class = DABT (current EL), IL = 32 bits
> >> [ 0.239428] SET = 0, FnV = 0
> >> [ 0.242555] EA = 0, S1PTW = 0
> >> [ 0.245797] Data abort info:
> >> [ 0.248795] ISV = 0, ISS = 0x00000006
> >> [ 0.252652] CM = 0, WnR = 0
> >> [ 0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
> >> [ 0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
> >... and here the PMD for the task stack is all zeroes, so evidently
> >that's getting corrupted somehow.
> >
> >It appears that the overflow stack (which IIRC is embedded within the
> >kernel's data segment, as part of the image mapping), is fine.
> >
> >I wonder if there's some existing weirdness in the page tables for the
> >vmalloc area that causes things to go wrong. Can you please:
> >
> >* enable ARM64_PTDUMP_DEBUGFS
> >
> >* boot with kpti=off (with Will's patch to make this work)
> >
> >* as root, cat /sys/kernel/debug/kernel_page_tables
> >
> >... and dump the result here?
> Thanks!
> Can I do this later since Will's new patch works?
Yes, you should probably go to bed now! Please note that my patch still
isn't the right thing for mainline, since it avoids setting PTE_NG for
tables and therefore won't solve the boot-time issue with KASAN enabled.
We also don't understand why clean+invalidate is causing the issue on your
CPU, whereas clean does not. It looks like clean+invalidate somehow results
in page table entries being zeroed.
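The puzzle is that, in the simple picture, clean and clean+invalidate should leave the same value in memory after a store. The toy model below illustrates that expectation: one page-table entry fronted by one write-back cache line. It is entirely hypothetical, models a single observer only, and says nothing about how the HiSilicon CPU or KVM actually behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one page-table entry in memory, fronted by one write-back
 * cache line. Hypothetical; it models a single observer only. */
struct toy_entry {
    uint64_t memory;   /* value held in memory */
    uint64_t line;     /* cached copy of the entry */
    bool valid;        /* line present in the cache */
    bool dirty;        /* line newer than memory */
};

void toy_store(struct toy_entry *e, uint64_t v)
{
    e->line = v;
    e->valid = true;
    e->dirty = true;
}

/* DC CVAC: clean to the point of coherency; the line stays cached. */
void toy_dc_cvac(struct toy_entry *e)
{
    if (e->valid && e->dirty) {
        e->memory = e->line;
        e->dirty = false;
    }
}

/* DC CIVAC: clean as above, then invalidate the line. */
void toy_dc_civac(struct toy_entry *e)
{
    toy_dc_cvac(e);
    e->valid = false;
    assert(!e->dirty);  /* nothing dirty may be discarded */
}

uint64_t toy_load(struct toy_entry *e)
{
    if (!e->valid) {    /* miss: refill from memory */
        e->line = e->memory;
        e->valid = true;
        e->dirty = false;
    }
    return e->line;
}
```

In this model the only difference is whether the next access hits in the cache or refills from memory; either way it reads back the stored value, so entries reading back as zero after clean+invalidate are behaviour the model cannot explain.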
Have a good weekend,
Will
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-22 15:41 ` Will Deacon
@ 2018-06-22 16:02 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 16:02 UTC (permalink / raw)
To: Will Deacon
Cc: Mark Rutland, James Morse, catalin.marinas, suzuki.poulose,
dave.martin, marc.zyngier, linux-arm-kernel, linux-kernel,
Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
Xiongfanggou (James), Liguozhu (Kenneth),
Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
libeijian, zhangbin011
Hi Will, Mark,
On 2018/6/22 23:41, Will Deacon wrote:
> On Fri, Jun 22, 2018 at 11:28:21PM +0800, Wei Xu wrote:
>> On 2018/6/22 22:28, Mark Rutland wrote:
>>> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>>>> [ 0.227507] Mem abort info:
>>>> [ 0.230390] ESR = 0x96000006
>>>> [ 0.233517] Exception class = DABT (current EL), IL = 32 bits
>>>> [ 0.239428] SET = 0, FnV = 0
>>>> [ 0.242555] EA = 0, S1PTW = 0
>>>> [ 0.245797] Data abort info:
>>>> [ 0.248795] ISV = 0, ISS = 0x00000006
>>>> [ 0.252652] CM = 0, WnR = 0
>>>> [ 0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp = (ptrval)
>>>> [ 0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
>>> ... and here the PMD for the task stack is all zeroes, so evidently
>>> that's getting corrupted somehow.
>>>
>>> It appears that the overflow stack (which IIRC is embedded within the
>>> kernel's data segment, as part of the image mapping), is fine.
>>>
>>> I wonder if there's some existing weirdness in the page tables for the
>>> vmalloc area that causes things to go wrong. Can you please:
>>>
>>> * enable ARM64_PTDUMP_DEBUGFS
>>>
>>> * boot with kpti=off (with Will's patch to make this work)
>>>
>>> * as root, cat /sys/kernel/debug/kernel_page_tables
>>>
>>> ... and dump the result here?
>> Thanks!
>> Can I do this later since Will's new patch works?
> Yes, you should probably go to bed now! Please note that my patch still
> isn't the right thing for mainline, since it avoids setting PTE_NG for
> tables and therefore won't solve the boot-time issue with KASAN enabled.
>
> We also don't understand why clean+invalidate is causing the issue on your
> CPU, whereas clean does not. It looks like clean+invalidate somehow results
> in page table entries being zeroed.
>
> Have a good weekend,
Got it. Thanks, and enjoy the FIFA World Cup :)
Below is the log with ARM64_PTDUMP_DEBUGFS enabled, on 4.17.0 with only
Will's kpti early_param patch applied.
Hope it helps.
./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
  -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.17-joyx \
  -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
  -append "kpti=off rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.17.0-45865-ga3d6816 (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #19 SMP PREEMPT Fri Jun 22 23:47:07 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: kernel page table isolation forced OFF by command line option
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: kpti=off rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007d850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007d860000
[ 0.000000] GIC: PPI11 is secure or misconfigured
[ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
[ 0.000000] arch_timer: WARNING: Please fix your firmware
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000859] Console: colour dummy device 80x25
[ 0.001459] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002537] pid_max: default: 32768 minimum: 301
[ 0.003028] Security Framework initialized
[ 0.003606] Dentry cache hash table entries: 131072 (order: 8,
1048576 bytes)
[ 0.004418] Inode-cache hash table entries: 65536 (order: 7,
524288 bytes)
[ 0.005129] Mount-cache hash table entries: 2048 (order: 2,
16384 bytes)
[ 0.005938] Mountpoint-cache hash table entries: 2048 (order: 2,
16384 bytes)
[ 0.026041] ASID allocator initialised with 32768 entries
[ 0.030055] Hierarchical SRCU implementation.
[ 0.034426] Platform MSI: its domain created
[ 0.034885] PCI/MSI: /intc/its domain created
[ 0.035457] EFI services will not be available.
[ 0.038086] smp: Bringing up secondary CPUs ...
[ 0.038557] smp: Brought up 1 node, 1 CPU
[ 0.038966] SMP: Total of 1 processors activated.
[ 0.039447] CPU features: detected: GIC system register CPU
interface
[ 0.040101] CPU features: detected: Privileged Access Never
[ 0.040667] CPU features: detected: User Access Override
[ 0.041988] CPU: All CPU(s) started at EL1
[ 0.042536] alternatives: patching kernel code
[ 0.044809] devtmpfs: initialized
[ 0.046662] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.049470] futex hash table entries: 256 (order: 3, 32768 bytes)
[ 0.055780] pinctrl core: initialized pinctrl subsystem
[ 0.061504] DMI not present or invalid.
[ 0.065230] NET: Registered protocol family 16
[ 0.069514] audit: initializing netlink subsys (disabled)
[ 0.075351] cpuidle: using governor menu
[ 0.078855] audit: type=2000 audit(0.068:1): state=initialized
audit_enabled=0 res=1
[ 0.086714] vdso: 2 pages (1 code @ (ptrval), 1 data
@ (ptrval))
[ 0.094456] hw-breakpoint: found 6 breakpoint and 4 watchpoint
registers.
[ 0.101869] DMA: preallocated 256 KiB pool for atomic allocations
[ 0.107408] Serial: AMBA PL011 UART driver
[ 0.114802] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 39,
base_baud = 0) is a PL011 rev1
[ 0.120256] console [ttyAMA0] enabled
[ 0.120256] console [ttyAMA0] enabled
[ 0.127525] bootconsole [pl11] disabled
[ 0.127525] bootconsole [pl11] disabled
[ 0.135667] irq: type mismatch, failed to map hwirq-27 for intc!
[ 0.153827] HugeTLB registered 2.00 MiB page size, pre-allocated
0 pages
[ 0.157547] cryptd: max_cpu_qlen set to 1000
[ 0.165692] ACPI: Interpreter disabled.
[ 0.166341] vgaarb: loaded
[ 0.166629] SCSI subsystem initialized
[ 0.169664] usbcore: registered new interface driver usbfs
[ 0.170139] usbcore: registered new interface driver hub
[ 0.174110] usbcore: registered new device driver usb
[ 0.179293] pps_core: LinuxPPS API ver. 1 registered
[ 0.184239] pps_core: Software ver. 5.3.6 - Copyright 2005-2007
Rodolfo Giometti <giometti@linux.it>
[ 0.193320] PTP clock support registered
[ 0.197360] EDAC MC: Ver: 3.0.0
[ 0.201468] Advanced Linux Sound Architecture Driver Initialized.
[ 0.207035] clocksource: Switched to clocksource arch_sys_counter
[ 0.212870] VFS: Disk quotas dquot_6.6.0
[ 0.216844] VFS: Dquot-cache hash table entries: 512 (order 0,
4096 bytes)
[ 0.223782] pnp: PnP ACPI: disabled
[ 0.229309] NET: Registered protocol family 2
[ 0.232711] tcp_listen_portaddr_hash hash table entries: 512
(order: 1, 8192 bytes)
[ 0.239478] TCP established hash table entries: 8192 (order: 4,
65536 bytes)
[ 0.246564] TCP bind hash table entries: 8192 (order: 5, 131072
bytes)
[ 0.253246] TCP: Hash tables configured (established 8192 bind 8192)
[ 0.259572] UDP hash table entries: 512 (order: 2, 16384 bytes)
[ 0.265610] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[ 0.272044] NET: Registered protocol family 1
[ 0.288576] RPC: Registered named UNIX socket transport module.
[ 0.289058] RPC: Registered udp transport module.
[ 0.289434] RPC: Registered tcp transport module.
[ 0.291949] RPC: Registered tcp NFSv4.1 backchannel transport
module.
[ 0.298471] Unpacking initramfs...
[ 0.835705] Freeing initrd memory: 29212K
[ 0.836273] hw perfevents: enabled with armv8_pmuv3 PMU driver,
13 counters available
[ 0.837026] kvm [1]: HYP mode not available
[ 0.838111] Initialise system trusted keyrings
[ 0.838710] workingset: timestamp_bits=44 max_order=18
bucket_order=0
[ 0.840716] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 0.846449] NFS: Registering the id_resolver key type
[ 0.846892] Key type id_resolver registered
[ 0.847453] Key type id_legacy registered
[ 0.847789] nfs4filelayout_init: NFSv4 File Layout Driver
Registering...
[ 0.848383] 9p: Installing v9fs 9p2000 file system support
[ 0.848878] pstore: using deflate compression
[ 0.849942] Key type asymmetric registered
[ 0.850303] Asymmetric key parser 'x509' registered
[ 0.850729] Block layer SCSI generic (bsg) driver version 0.4
loaded (major 245)
[ 0.851480] io scheduler noop registered
[ 0.851801] io scheduler deadline registered
[ 0.852215] io scheduler cfq registered (default)
[ 0.852595] io scheduler mq-deadline registered
[ 0.852955] io scheduler kyber registered
[ 0.855192] pl061_gpio 9030000.pl061: PL061 GPIO chip
@0x0000000009030000 registered
[ 0.857039] PCI: OF: host bridge /pcie@10000000 ranges:
[ 0.857481] PCI: OF: IO 0x3eff0000..0x3effffff -> 0x00000000
[ 0.857953] PCI: OF: MEM 0x10000000..0x3efeffff -> 0x10000000
[ 0.858435] PCI: OF: MEM 0x8000000000..0xffffffffff ->
0x8000000000
[ 0.858956] pci-host-generic 3f000000.pcie: ECAM at [mem
0x3f000000-0x3fffffff] for [bus 00-0f]
[ 0.860042] pci-host-generic 3f000000.pcie: PCI host bridge to
bus 0000:00
[ 0.860598] pci_bus 0000:00: root bus resource [bus 00-0f]
[ 0.861034] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 0.861524] pci_bus 0000:00: root bus resource [mem
0x10000000-0x3efeffff]
[ 0.862074] pci_bus 0000:00: root bus resource [mem
0x8000000000-0xffffffffff]
[ 0.863568] pci 0000:00:01.0: BAR 6: assigned [mem
0x10000000-0x1003ffff pref]
[ 0.864147] pci 0000:00:01.0: BAR 4: assigned [mem
0x8000000000-0x8000003fff 64bit pref]
[ 0.864803] pci 0000:00:01.0: BAR 1: assigned [mem
0x10040000-0x10040fff]
[ 0.865342] pci 0000:00:01.0: BAR 0: assigned [io 0x1000-0x101f]
[ 0.866470] EINJ: ACPI disabled.
[ 0.868836] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
[ 0.874100] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.875395] SuperH (H)SCI(F) driver initialized
[ 0.876757] msm_serial: driver initialized
[ 0.877328] cacheinfo: Unable to detect cache hierarchy for CPU 0
[ 0.880330] loop: module loaded
[ 0.881885] libphy: Fixed MDIO Bus: probed
[ 0.882499] tun: Universal TUN/TAP device driver, 1.6
[ 0.884820] thunder_xcv, ver 1.0
[ 0.885126] thunder_bgx, ver 1.0
[ 0.885415] nicpf, ver 1.0
[ 0.885764] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 0.886246] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 0.886927] igb: Intel(R) Gigabit Ethernet Network Driver -
version 5.4.0-k
[ 0.887687] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 0.888159] igbvf: Intel(R) Gigabit Virtual Function Network
Driver - version 2.4.0-k
[ 0.888782] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[ 0.889388] sky2: driver version 1.30
[ 0.889931] VFIO - User Level meta-driver version: 0.3
[ 0.890861] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI)
Driver
[ 0.891644] ehci-pci: EHCI PCI platform driver
[ 0.892043] ehci-platform: EHCI generic platform driver
[ 0.892515] ehci-orion: EHCI orion driver
[ 0.892880] ehci-exynos: EHCI EXYNOS driver
[ 0.893414] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 0.893914] ohci-pci: OHCI PCI platform driver
[ 0.894308] ohci-platform: OHCI generic platform driver
[ 0.894765] ohci-exynos: OHCI EXYNOS driver
[ 0.895357] usbcore: registered new interface driver usb-storage
[ 0.896739] rtc-pl031 9010000.pl031: rtc core: registered pl031
as rtc0
[ 0.897504] i2c /dev entries driver
[ 0.899576] sdhci: Secure Digital Host Controller Interface driver
[ 0.900086] sdhci: Copyright(c) Pierre Ossman
[ 0.900551] Synopsys Designware Multimedia Card Interface Driver
[ 0.901791] sdhci-pltfm: SDHCI platform and OF driver helper
[ 0.902636] ledtrig-cpu: registered to indicate activity on CPUs
[ 0.903644] usbcore: registered new interface driver usbhid
[ 0.904106] usbhid: USB HID core driver
[ 0.905520] NET: Registered protocol family 17
[ 0.905917] 9pnet: Installing 9P2000 support
[ 0.906304] Key type dns_resolver registered
[ 0.906814] registered taskstats version 1
[ 0.907542] Loading compiled-in X.509 certificates
[ 0.908155] input: gpio-keys as
/devices/platform/gpio-keys/input/input0
[ 0.909760] rtc-pl031 9010000.pl031: setting system clock to
2015-01-30 02:38:42 UTC (1422585522)
[ 0.918889] ALSA device list:
[ 0.921687] No soundcards found.
[ 0.925317] uart-pl011 9000000.pl011: no DMA platform data
[ 0.930981] Freeing unused kernel memory: 1216K
Starting rcS...
++ Mounting filesystem
ifdown: interface lo not configured
ifdown: interface eth0 not configured
++ Starting ssh daemon
[ 0.950291] random: sshd: uninitialized urandom read (32 bytes read)
ip: RTNETLINK answers: File exists
rcS Complete
Welcome to Mini Linux
GNU/Linux 4.17.0-45865-ga3d6816 aarch64
Version: 1.1.6
.--.
|o_o |
|:_/ |
// \ \
(| | )
/'\_ _/`\
\___)=(___/
udhcpc: started, v1.29.0.git
Setting IP address 0.0.0.0 on eth0
Documentation: http://open-estuary.org
E-mail: Chinafengliang@163.com
estuary:/$ udhcpc: sending discover
udhcpc: sending select for 10.0.2.15
udhcpc: lease of 10.0.2.15 obtained, lease time 86400
Setting IP address 10.0.2.15 on eth0
Deleting routers
route: SIOCDELRT: No such process
Adding router 10.0.2.2
Recreating /etc/resolv.conf
Adding DNS server 10.0.2.3
estuary:/$
estuary:/$ cat /sys/kernel/debug/kernel_page_tables
---[ Modules start ]---
---[ Modules end ]---
---[ vmalloc() Area ]---
0xffff000008000000-0xffff000008004000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000008005000-0xffff000008009000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000800a000-0xffff00000800e000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000008010000-0xffff000008020000 64K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff000008021000-0xffff000008022000 4K PTE ro NX
SHD AF UXN MEM/NORMAL
0xffff000008028000-0xffff00000802c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000008030000-0xffff000008034000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000008035000-0xffff000008036000 4K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff000008038000-0xffff00000803c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000803d000-0xffff00000803f000 8K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000008040000-0xffff000008060000 128K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff000008061000-0xffff000008065000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000008066000-0xffff000008067000 4K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff000008068000-0xffff00000806c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000008070000-0xffff000008074000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000008078000-0xffff00000807c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000008080000-0xffff000008200000 1536K PTE ro x
SHD AF CON UXN MEM/NORMAL
0xffff000008200000-0xffff000008a00000 8M PMD ro x
SHD AF BLK UXN MEM/NORMAL
0xffff000008a00000-0xffff000008a50000 320K PTE ro x
SHD AF CON UXN MEM/NORMAL
0xffff000008a50000-0xffff000008c00000 1728K PTE ro NX
SHD AF UXN MEM/NORMAL
0xffff000008c00000-0xffff000008e00000 2M PMD ro NX
SHD AF BLK UXN MEM/NORMAL
0xffff000008e00000-0xffff000008f10000 1088K PTE ro NX
SHD AF UXN MEM/NORMAL
0xffff000009040000-0xffff0000091f0000 1728K PTE RW NX
SHD AF CON UXN MEM/NORMAL
0xffff0000091f0000-0xffff0000091fa000 40K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000091fb000-0xffff0000092fb000 1M PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000092fc000-0xffff00000937c000 512K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000009380000-0xffff000009384000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000009388000-0xffff00000938c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000009390000-0xffff000009394000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000009398000-0xffff00000939c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000093a0000-0xffff0000093a4000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000093a8000-0xffff0000093ac000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000093b0000-0xffff0000093b4000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000093b8000-0xffff0000093bc000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000093c0000-0xffff0000093c4000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000093c8000-0xffff0000093cc000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000093d0000-0xffff0000093d4000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff0000093d5000-0xffff0000093dd000 32K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000009408000-0xffff00000940c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000009410000-0xffff000009414000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000946d000-0xffff00000946e000 4K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff000009475000-0xffff000009476000 4K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff00000947d000-0xffff00000947e000 4K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff000009485000-0xffff000009486000 4K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff00000948d000-0xffff00000948e000 4K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff000009495000-0xffff000009496000 4K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff000009595000-0xffff0000095d5000 256K PTE RW NX
SHD AF UXN MEM/NORMAL-NC
0xffff000009740000-0xffff000009744000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000009c60000-0xffff000009c64000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff000009c70000-0xffff000009c74000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000a000000-0xffff00000af60000 15744K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff00000af61000-0xffff00000af65000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b020000-0xffff00000b024000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b028000-0xffff00000b02c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b030000-0xffff00000b034000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b038000-0xffff00000b03c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b048000-0xffff00000b04c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b0f8000-0xffff00000b0fc000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b170000-0xffff00000b174000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b208000-0xffff00000b20c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b230000-0xffff00000b234000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b238000-0xffff00000b23c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b48d000-0xffff00000b49d000 64K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b49e000-0xffff00000b4be000 128K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b4c0000-0xffff00000b4c4000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b538000-0xffff00000b53c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000b7e8000-0xffff00000b7ec000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000c000000-0xffff00000d000000 16M PMD RW NX
SHD AF BLK UXN DEVICE/nGnRnE
0xffff00000d001000-0xffff00000d004000 12K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000d260000-0xffff00000d264000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000d760000-0xffff00000d764000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000d770000-0xffff00000d774000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000d778000-0xffff00000d77c000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000d7b0000-0xffff00000d7b4000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000d7d8000-0xffff00000d7dc000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff00000d7e0000-0xffff00000d7e4000 16K PTE RW NX
SHD AF UXN MEM/NORMAL
0xffff7dffbffd8000-0xffff7dffbffdb000 12K PTE RW NX
SHD AF UXN MEM/NORMAL
---[ vmalloc() End ]---
---[ Fixmap start ]---
0xffff7dfffe7fa000-0xffff7dfffe7fb000 4K PTE ro x
SHD AF UXN MEM/NORMAL
0xffff7dfffe7ff000-0xffff7dfffe800000 4K PTE RW NX
SHD AF UXN DEVICE/nGnRE
0xffff7dfffe800000-0xffff7dfffea00000 2M PMD ro NX
SHD AF BLK UXN MEM/NORMAL
---[ Fixmap end ]---
---[ PCI I/O start ]---
0xffff7dfffee00000-0xffff7dfffee10000 64K PTE RW NX
SHD AF UXN DEVICE/nGnRE
---[ PCI I/O end ]---
---[ vmemmap start ]---
0xffff7e0000000000-0xffff7e0001000000 16M PMD RW NX
SHD AF BLK UXN MEM/NORMAL
---[ vmemmap end ]---
---[ Linear Mapping ]---
0xffff800000000000-0xffff800000080000 512K PTE RW NX
SHD AF CON UXN MEM/NORMAL
0xffff800000080000-0xffff800000200000 1536K PTE ro NX
SHD AF UXN MEM/NORMAL
0xffff800000200000-0xffff800000e00000 12M PMD ro NX
SHD AF BLK UXN MEM/NORMAL
0xffff800000e00000-0xffff800000f10000 1088K PTE ro NX
SHD AF UXN MEM/NORMAL
0xffff800000f10000-0xffff800001000000 960K PTE RW NX
SHD AF CON UXN MEM/NORMAL
0xffff800001000000-0xffff800002000000 16M PMD RW NX
SHD AF BLK UXN MEM/NORMAL
0xffff800002000000-0xffff800040000000 992M PMD RW NX
SHD AF CON BLK UXN MEM/NORMAL
estuary:/$
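For readers cross-checking this dump by hand: each line starts with a half-open virtual address range (the printed end address is exclusive), so the size column is simply end minus start. The C sketch below parses that prefix and tests whether an address falls inside a range. The helper names are hypothetical and are not part of the kernel's ptdump code. As an example it uses ffff0000093abce0, the faulting address from the crash report quoted in the follow-up message. That report comes from a different boot, so a match here does not by itself show that both refer to the same stack.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse the "start-end" prefix of a ptdump line such as
 * "0xffff0000093a8000-0xffff0000093ac000 16K PTE RW NX".
 * The %x conversion accepts the leading 0x. */
static bool ptdump_parse_range(const char *line, uint64_t *start, uint64_t *end)
{
    return sscanf(line, "%" SCNx64 "-%" SCNx64, start, end) == 2;
}

/* Size of the range in KiB; should match the size column printed by ptdump. */
static uint64_t ptdump_size_kib(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return (end - start) / 1024;
}

/* True if va lies in [start, end) for the range at the start of the line. */
static bool ptdump_line_covers(const char *line, uint64_t va)
{
    uint64_t start, end;

    if (!ptdump_parse_range(line, &start, &end))
        return false;
    return va >= start && va < end;
}
```

With the 16K vmalloc entry 0xffff0000093a8000-0xffff0000093ac000 listed above, ptdump_line_covers() returns true for 0xffff0000093abce0, while the neighbouring entry starting at 0xffff0000093b0000 does not cover it.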
Thanks!
Best Regards,
Wei
> Will
>
> .
>
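The first line of the boot log above prints the CPU's MIDR_EL1 value in brackets, [0x480fd010]. The kpti_safe_list workaround raised at the start of the thread is, as far as we understand the 4.17 code, a list of MIDR ranges matched against this register. The sketch below decodes the architectural MIDR_EL1 fields and matches them against a hypothetical list; struct midr_match and midr_in_list() are illustrative names, not the kernel's own MIDR-matching API. As the thread itself notes, adding this CPU to such a list would only hide the problem rather than resolve it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural MIDR_EL1 layout:
 *   [31:24] Implementer  [23:20] Variant  [19:16] Architecture
 *   [15:4]  PartNum      [3:0]   Revision */
struct midr_fields {
    unsigned implementer, variant, architecture, partnum, revision;
};

static struct midr_fields decode_midr(uint32_t midr)
{
    struct midr_fields f = {
        .implementer  = (midr >> 24) & 0xff,
        .variant      = (midr >> 20) & 0xf,
        .architecture = (midr >> 16) & 0xf,
        .partnum      = (midr >> 4) & 0xfff,
        .revision     = midr & 0xf,
    };
    return f;
}

/* Hypothetical safe-list entry: match implementer and part number only,
 * ignoring variant and revision ("all versions" of a core). */
struct midr_match {
    unsigned implementer, partnum;
};

static int midr_in_list(uint32_t midr, const struct midr_match *list, size_t n)
{
    struct midr_fields f = decode_midr(midr);

    assert(n == 0 || list != NULL);
    for (size_t i = 0; i < n; i++)
        if (list[i].implementer == f.implementer && list[i].partnum == f.partnum)
            return 1;
    return 0;
}
```

For 0x480fd010 this gives implementer 0x48 (HiSilicon, ASCII 'H'), variant 0, architecture 0xf, part number 0xd01 and revision 0. An Arm Cortex-A53, by comparison, reports implementer 0x41 and part number 0xd03, so it would not match an entry for this CPU.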
* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-22 16:02 ` Wei Xu
From: Wei Xu @ 2018-06-22 16:02 UTC
To: linux-arm-kernel
Hi Will, Mark,
On 2018/6/22 23:41, Will Deacon wrote:
> On Fri, Jun 22, 2018 at 11:28:21PM +0800, Wei Xu wrote:
>> On 2018/6/22 22:28, Mark Rutland wrote:
>>> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>>>> [ 0.227507] Mem abort info:
>>>> [ 0.230390] ESR = 0x96000006
>>>> [ 0.233517] Exception class = DABT (current EL), IL = 32 bits
>>>> [ 0.239428] SET = 0, FnV = 0
>>>> [ 0.242555] EA = 0, S1PTW = 0
>>>> [ 0.245797] Data abort info:
>>>> [ 0.248795] ISV = 0, ISS = 0x00000006
>>>> [ 0.252652] CM = 0, WnR = 0
>>>> [ 0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =
>>>> (ptrval)
>>>> [ 0.262645] [ffff0000093abce0] pgd=00000000411f8803,
>>>> pud=00000000411f9803, pmd=0000000000000000
>>> ... and here the PMD for the task stack is all zeroes, so evidently
>>> that's getting corrupted somehow.
>>>
>>> It appears that the overflow stack (which IIRC is embedded within the
>>> kernel's data segment, as part of the image mapping), is fine.
>>>
>>> I wonder if there's some existing weirdness in the page tables for the
>>> vmalloc area that causes things to go wrong. Can you please:
>>>
>>> * enable ARM64_PTDUMP_DEBUGFS
>>>
>>> * boot with kpti=off (with Will's patch to make this work)
>>>
>>> * as root, cat /sys/kernel/debug/kernel_page_tables
>>>
>>> ... and dump the result here?
>> Thanks!
>> Can I do this later since Will's new patch works?
> Yes, you should probably go to bed now! Please note that my patch still
> isn't the right thing for mainline, since it avoids setting PTE_NG for
> tables and therefore won't solve the boot-time issue with KASAN enabled.
>
> We also don't understand why clean+invalidate is causing the issue on your
> CPU, whereas clean does not. It looks like clean+invalidate somehow results
> in page table entries being zeroed.
>
> Have a good weekend,
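Mark's reading of the crash report can be reproduced mechanically. The report prints the configuration: 4K pages and 48-bit VAs, i.e. four levels of 512-entry tables, each level consuming nine address bits above the 12-bit page offset. The C sketch below splits the faulting address ffff0000093abce0 into per-level indices and decodes the low bits of the quoted descriptors. It models only this one configuration, and the helper names are illustrative, not kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration, taken from the crash report:
 * 4K pages, 48-bit VAs, four levels of 512-entry (9-bit) tables. */
#define PAGE_SHIFT  12
#define LEVEL_BITS  9
#define LEVEL_MASK  ((1u << LEVEL_BITS) - 1)

struct va_indices {
    unsigned pgd, pud, pmd, pte, offset;
};

/* Split a kernel VA into per-level table indices (illustrative helper). */
static struct va_indices split_va(uint64_t va)
{
    struct va_indices ix = {
        .pgd    = (va >> (PAGE_SHIFT + 3 * LEVEL_BITS)) & LEVEL_MASK, /* VA[47:39] */
        .pud    = (va >> (PAGE_SHIFT + 2 * LEVEL_BITS)) & LEVEL_MASK, /* VA[38:30] */
        .pmd    = (va >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK,     /* VA[29:21] */
        .pte    = (va >> PAGE_SHIFT) & LEVEL_MASK,                    /* VA[20:12] */
        .offset = va & ((1u << PAGE_SHIFT) - 1),                      /* VA[11:0]  */
    };
    return ix;
}

/* Bit 0 set: valid. Bits [1:0] == 0b11 at levels 0-2: table descriptor. */
static int desc_valid(uint64_t d)    { return (d & 1) != 0; }
static int desc_is_table(uint64_t d) { return (d & 3) == 3; }

/* Next-level table address, held in bits [47:12] of a table descriptor. */
static uint64_t desc_next_table(uint64_t d)
{
    assert(desc_is_table(d));
    return d & 0x0000fffffffff000ULL;
}
```

For this address the PGD and PUD indices are both 0 and the PMD index is 73, so the all-zero pmd value in the report means entry 73 of the table at 0x411f9000 (taken from the pud descriptor) was empty when the walk reached it. Both upper-level descriptors end in 0x803: bits [1:0] = 0b11 mark a table descriptor, and 0x800 is bit 11, the bit arm64 Linux defines as PTE_NG, which Will notes his patch avoids setting for tables.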
Got it. Thanks, and enjoy the FIFA World Cup :)
Below is the log with ARM64_PTDUMP_DEBUGFS enabled.
The kernel is 4.17.0 with only Will's kpti early_param patch applied.
Hope it helps.
./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
    -cpu host -enable-kvm -smp 1 -m 1024 \
    -kernel ./Image-4.17-joyx -initrd ../mini-rootfs-arm64.cpio.gz \
    -nographic \
    -append "kpti=off rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.17.0-45865-ga3d6816
(joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease)
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #19
SMP PREEMPT Fri Jun 22 23:47:07 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 16 MiB at 0x000000007f000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem
0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem
0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from
start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 24 pages/cpu @ (ptrval)
s57984 r8192 d32128 u98304
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: kernel page table isolation forced OFF
by command line option
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total
pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: kpti=off rdinit=init
console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 968436K/1048576K available (10044K kernel
code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved,
16384K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1,
Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to
nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16,
nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Devices
@7d830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt
Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007d850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region
0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007d860000
[ 0.000000] GIC: PPI11 is secure or misconfigured
[ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3,
assuming level low
[ 0.000000] arch_timer: WARNING: Please fix your firmware
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask:
0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns,
wraps every 4398046511100ns
[ 0.000859] Console: colour dummy device 80x25
[ 0.001459] Calibrating delay loop (skipped), value calculated
using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002537] pid_max: default: 32768 minimum: 301
[ 0.003028] Security Framework initialized
[ 0.003606] Dentry cache hash table entries: 131072 (order: 8,
1048576 bytes)
[ 0.004418] Inode-cache hash table entries: 65536 (order: 7,
524288 bytes)
[ 0.005129] Mount-cache hash table entries: 2048 (order: 2,
16384 bytes)
[ 0.005938] Mountpoint-cache hash table entries: 2048 (order: 2,
16384 bytes)
[ 0.026041] ASID allocator initialised with 32768 entries
[ 0.030055] Hierarchical SRCU implementation.
[ 0.034426] Platform MSI: its domain created
[ 0.034885] PCI/MSI: /intc/its domain created
[ 0.035457] EFI services will not be available.
[ 0.038086] smp: Bringing up secondary CPUs ...
[ 0.038557] smp: Brought up 1 node, 1 CPU
[ 0.038966] SMP: Total of 1 processors activated.
[ 0.039447] CPU features: detected: GIC system register CPU
interface
[ 0.040101] CPU features: detected: Privileged Access Never
[ 0.040667] CPU features: detected: User Access Override
[ 0.041988] CPU: All CPU(s) started at EL1
[ 0.042536] alternatives: patching kernel code
[ 0.044809] devtmpfs: initialized
[ 0.046662] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.049470] futex hash table entries: 256 (order: 3, 32768 bytes)
[ 0.055780] pinctrl core: initialized pinctrl subsystem
[ 0.061504] DMI not present or invalid.
[ 0.065230] NET: Registered protocol family 16
[ 0.069514] audit: initializing netlink subsys (disabled)
[ 0.075351] cpuidle: using governor menu
[ 0.078855] audit: type=2000 audit(0.068:1): state=initialized
audit_enabled=0 res=1
[ 0.086714] vdso: 2 pages (1 code @ (ptrval), 1 data
@ (ptrval))
[ 0.094456] hw-breakpoint: found 6 breakpoint and 4 watchpoint
registers.
[ 0.101869] DMA: preallocated 256 KiB pool for atomic allocations
[ 0.107408] Serial: AMBA PL011 UART driver
[ 0.114802] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 39,
base_baud = 0) is a PL011 rev1
[ 0.120256] console [ttyAMA0] enabled
[ 0.120256] console [ttyAMA0] enabled
[ 0.127525] bootconsole [pl11] disabled
[ 0.127525] bootconsole [pl11] disabled
[ 0.135667] irq: type mismatch, failed to map hwirq-27 for intc!
[ 0.153827] HugeTLB registered 2.00 MiB page size, pre-allocated
0 pages
[ 0.157547] cryptd: max_cpu_qlen set to 1000
[ 0.165692] ACPI: Interpreter disabled.
[ 0.166341] vgaarb: loaded
[ 0.166629] SCSI subsystem initialized
[ 0.169664] usbcore: registered new interface driver usbfs
[ 0.170139] usbcore: registered new interface driver hub
[ 0.174110] usbcore: registered new device driver usb
[ 0.179293] pps_core: LinuxPPS API ver. 1 registered
[ 0.184239] pps_core: Software ver. 5.3.6 - Copyright 2005-2007
Rodolfo Giometti <giometti@linux.it>
[ 0.193320] PTP clock support registered
[ 0.197360] EDAC MC: Ver: 3.0.0
[ 0.201468] Advanced Linux Sound Architecture Driver Initialized.
[ 0.207035] clocksource: Switched to clocksource arch_sys_counter
[ 0.212870] VFS: Disk quotas dquot_6.6.0
[ 0.216844] VFS: Dquot-cache hash table entries: 512 (order 0,
4096 bytes)
[ 0.223782] pnp: PnP ACPI: disabled
[ 0.229309] NET: Registered protocol family 2
[ 0.232711] tcp_listen_portaddr_hash hash table entries: 512
(order: 1, 8192 bytes)
[ 0.239478] TCP established hash table entries: 8192 (order: 4,
65536 bytes)
[ 0.246564] TCP bind hash table entries: 8192 (order: 5, 131072
bytes)
[ 0.253246] TCP: Hash tables configured (established 8192 bind 8192)
[ 0.259572] UDP hash table entries: 512 (order: 2, 16384 bytes)
[ 0.265610] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
[ 0.272044] NET: Registered protocol family 1
[ 0.288576] RPC: Registered named UNIX socket transport module.
[ 0.289058] RPC: Registered udp transport module.
[ 0.289434] RPC: Registered tcp transport module.
[ 0.291949] RPC: Registered tcp NFSv4.1 backchannel transport
module.
[ 0.298471] Unpacking initramfs...
[ 0.835705] Freeing initrd memory: 29212K
[ 0.836273] hw perfevents: enabled with armv8_pmuv3 PMU driver,
13 counters available
[ 0.837026] kvm [1]: HYP mode not available
[ 0.838111] Initialise system trusted keyrings
[ 0.838710] workingset: timestamp_bits=44 max_order=18
bucket_order=0
[ 0.840716] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 0.846449] NFS: Registering the id_resolver key type
[ 0.846892] Key type id_resolver registered
[ 0.847453] Key type id_legacy registered
[ 0.847789] nfs4filelayout_init: NFSv4 File Layout Driver
Registering...
[ 0.848383] 9p: Installing v9fs 9p2000 file system support
[ 0.848878] pstore: using deflate compression
[ 0.849942] Key type asymmetric registered
[ 0.850303] Asymmetric key parser 'x509' registered
[ 0.850729] Block layer SCSI generic (bsg) driver version 0.4
loaded (major 245)
[ 0.851480] io scheduler noop registered
[ 0.851801] io scheduler deadline registered
[ 0.852215] io scheduler cfq registered (default)
[ 0.852595] io scheduler mq-deadline registered
[ 0.852955] io scheduler kyber registered
[ 0.855192] pl061_gpio 9030000.pl061: PL061 GPIO chip
@0x0000000009030000 registered
[ 0.857039] PCI: OF: host bridge /pcie@10000000 ranges:
[ 0.857481] PCI: OF: IO 0x3eff0000..0x3effffff -> 0x00000000
[ 0.857953] PCI: OF: MEM 0x10000000..0x3efeffff -> 0x10000000
[ 0.858435] PCI: OF: MEM 0x8000000000..0xffffffffff ->
0x8000000000
[ 0.858956] pci-host-generic 3f000000.pcie: ECAM at [mem
0x3f000000-0x3fffffff] for [bus 00-0f]
[ 0.860042] pci-host-generic 3f000000.pcie: PCI host bridge to
bus 0000:00
[ 0.860598] pci_bus 0000:00: root bus resource [bus 00-0f]
[ 0.861034] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 0.861524] pci_bus 0000:00: root bus resource [mem
0x10000000-0x3efeffff]
[ 0.862074] pci_bus 0000:00: root bus resource [mem
0x8000000000-0xffffffffff]
[ 0.863568] pci 0000:00:01.0: BAR 6: assigned [mem
0x10000000-0x1003ffff pref]
[ 0.864147] pci 0000:00:01.0: BAR 4: assigned [mem
0x8000000000-0x8000003fff 64bit pref]
[ 0.864803] pci 0000:00:01.0: BAR 1: assigned [mem
0x10040000-0x10040fff]
[ 0.865342] pci 0000:00:01.0: BAR 0: assigned [io 0x1000-0x101f]
[ 0.866470] EINJ: ACPI disabled.
[ 0.868836] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
[ 0.874100] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 0.875395] SuperH (H)SCI(F) driver initialized
[ 0.876757] msm_serial: driver initialized
[ 0.877328] cacheinfo: Unable to detect cache hierarchy for CPU 0
[ 0.880330] loop: module loaded
[ 0.881885] libphy: Fixed MDIO Bus: probed
[ 0.882499] tun: Universal TUN/TAP device driver, 1.6
[ 0.884820] thunder_xcv, ver 1.0
[ 0.885126] thunder_bgx, ver 1.0
[ 0.885415] nicpf, ver 1.0
[ 0.885764] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 0.886246] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 0.886927] igb: Intel(R) Gigabit Ethernet Network Driver -
version 5.4.0-k
[ 0.887687] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 0.888159] igbvf: Intel(R) Gigabit Virtual Function Network
Driver - version 2.4.0-k
[ 0.888782] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[ 0.889388] sky2: driver version 1.30
[ 0.889931] VFIO - User Level meta-driver version: 0.3
[ 0.890861] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI)
Driver
[ 0.891644] ehci-pci: EHCI PCI platform driver
[ 0.892043] ehci-platform: EHCI generic platform driver
[ 0.892515] ehci-orion: EHCI orion driver
[ 0.892880] ehci-exynos: EHCI EXYNOS driver
[ 0.893414] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 0.893914] ohci-pci: OHCI PCI platform driver
[ 0.894308] ohci-platform: OHCI generic platform driver
[ 0.894765] ohci-exynos: OHCI EXYNOS driver
[ 0.895357] usbcore: registered new interface driver usb-storage
[ 0.896739] rtc-pl031 9010000.pl031: rtc core: registered pl031 as rtc0
[ 0.897504] i2c /dev entries driver
[ 0.899576] sdhci: Secure Digital Host Controller Interface driver
[ 0.900086] sdhci: Copyright(c) Pierre Ossman
[ 0.900551] Synopsys Designware Multimedia Card Interface Driver
[ 0.901791] sdhci-pltfm: SDHCI platform and OF driver helper
[ 0.902636] ledtrig-cpu: registered to indicate activity on CPUs
[ 0.903644] usbcore: registered new interface driver usbhid
[ 0.904106] usbhid: USB HID core driver
[ 0.905520] NET: Registered protocol family 17
[ 0.905917] 9pnet: Installing 9P2000 support
[ 0.906304] Key type dns_resolver registered
[ 0.906814] registered taskstats version 1
[ 0.907542] Loading compiled-in X.509 certificates
[ 0.908155] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[ 0.909760] rtc-pl031 9010000.pl031: setting system clock to 2015-01-30 02:38:42 UTC (1422585522)
[ 0.918889] ALSA device list:
[ 0.921687] No soundcards found.
[ 0.925317] uart-pl011 9000000.pl011: no DMA platform data
[ 0.930981] Freeing unused kernel memory: 1216K
Starting rcS...
++ Mounting filesystem
ifdown: interface lo not configured
ifdown: interface eth0 not configured
++ Starting ssh daemon
[ 0.950291] random: sshd: uninitialized urandom read (32 bytes read)
ip: RTNETLINK answers: File exists
rcS Complete
Welcome to Mini Linux
GNU/Linux 4.17.0-45865-ga3d6816 aarch64
Version: 1.1.6
.--.
|o_o |
|:_/ |
// \ \
(| | )
/'\_ _/`\
\___)=(___/
udhcpc: started, v1.29.0.git
Setting IP address 0.0.0.0 on eth0
Documentation: http://open-estuary.org
E-mail: Chinafengliang at 163.com
estuary:/$ udhcpc: sending discover
udhcpc: sending select for 10.0.2.15
udhcpc: lease of 10.0.2.15 obtained, lease time 86400
Setting IP address 10.0.2.15 on eth0
Deleting routers
route: SIOCDELRT: No such process
Adding router 10.0.2.2
Recreating /etc/resolv.conf
Adding DNS server 10.0.2.3
estuary:/$
estuary:/$ cat /sys/kernel/debug/kernel_page_tables
---[ Modules start ]---
---[ Modules end ]---
---[ vmalloc() Area ]---
0xffff000008000000-0xffff000008004000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000008005000-0xffff000008009000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000800a000-0xffff00000800e000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000008010000-0xffff000008020000 64K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff000008021000-0xffff000008022000 4K PTE ro NX SHD AF UXN MEM/NORMAL
0xffff000008028000-0xffff00000802c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000008030000-0xffff000008034000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000008035000-0xffff000008036000 4K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff000008038000-0xffff00000803c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000803d000-0xffff00000803f000 8K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000008040000-0xffff000008060000 128K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff000008061000-0xffff000008065000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000008066000-0xffff000008067000 4K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff000008068000-0xffff00000806c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000008070000-0xffff000008074000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000008078000-0xffff00000807c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000008080000-0xffff000008200000 1536K PTE ro x SHD AF CON UXN MEM/NORMAL
0xffff000008200000-0xffff000008a00000 8M PMD ro x SHD AF BLK UXN MEM/NORMAL
0xffff000008a00000-0xffff000008a50000 320K PTE ro x SHD AF CON UXN MEM/NORMAL
0xffff000008a50000-0xffff000008c00000 1728K PTE ro NX SHD AF UXN MEM/NORMAL
0xffff000008c00000-0xffff000008e00000 2M PMD ro NX SHD AF BLK UXN MEM/NORMAL
0xffff000008e00000-0xffff000008f10000 1088K PTE ro NX SHD AF UXN MEM/NORMAL
0xffff000009040000-0xffff0000091f0000 1728K PTE RW NX SHD AF CON UXN MEM/NORMAL
0xffff0000091f0000-0xffff0000091fa000 40K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000091fb000-0xffff0000092fb000 1M PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000092fc000-0xffff00000937c000 512K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000009380000-0xffff000009384000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000009388000-0xffff00000938c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000009390000-0xffff000009394000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000009398000-0xffff00000939c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000093a0000-0xffff0000093a4000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000093a8000-0xffff0000093ac000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000093b0000-0xffff0000093b4000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000093b8000-0xffff0000093bc000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000093c0000-0xffff0000093c4000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000093c8000-0xffff0000093cc000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000093d0000-0xffff0000093d4000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff0000093d5000-0xffff0000093dd000 32K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000009408000-0xffff00000940c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000009410000-0xffff000009414000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000946d000-0xffff00000946e000 4K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff000009475000-0xffff000009476000 4K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff00000947d000-0xffff00000947e000 4K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff000009485000-0xffff000009486000 4K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff00000948d000-0xffff00000948e000 4K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff000009495000-0xffff000009496000 4K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff000009595000-0xffff0000095d5000 256K PTE RW NX SHD AF UXN MEM/NORMAL-NC
0xffff000009740000-0xffff000009744000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000009c60000-0xffff000009c64000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff000009c70000-0xffff000009c74000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000a000000-0xffff00000af60000 15744K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff00000af61000-0xffff00000af65000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b020000-0xffff00000b024000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b028000-0xffff00000b02c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b030000-0xffff00000b034000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b038000-0xffff00000b03c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b048000-0xffff00000b04c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b0f8000-0xffff00000b0fc000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b170000-0xffff00000b174000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b208000-0xffff00000b20c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b230000-0xffff00000b234000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b238000-0xffff00000b23c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b48d000-0xffff00000b49d000 64K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b49e000-0xffff00000b4be000 128K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b4c0000-0xffff00000b4c4000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b538000-0xffff00000b53c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000b7e8000-0xffff00000b7ec000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000c000000-0xffff00000d000000 16M PMD RW NX SHD AF BLK UXN DEVICE/nGnRnE
0xffff00000d001000-0xffff00000d004000 12K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000d260000-0xffff00000d264000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000d760000-0xffff00000d764000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000d770000-0xffff00000d774000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000d778000-0xffff00000d77c000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000d7b0000-0xffff00000d7b4000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000d7d8000-0xffff00000d7dc000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff00000d7e0000-0xffff00000d7e4000 16K PTE RW NX SHD AF UXN MEM/NORMAL
0xffff7dffbffd8000-0xffff7dffbffdb000 12K PTE RW NX SHD AF UXN MEM/NORMAL
---[ vmalloc() End ]---
---[ Fixmap start ]---
0xffff7dfffe7fa000-0xffff7dfffe7fb000 4K PTE ro x SHD AF UXN MEM/NORMAL
0xffff7dfffe7ff000-0xffff7dfffe800000 4K PTE RW NX SHD AF UXN DEVICE/nGnRE
0xffff7dfffe800000-0xffff7dfffea00000 2M PMD ro NX SHD AF BLK UXN MEM/NORMAL
---[ Fixmap end ]---
---[ PCI I/O start ]---
0xffff7dfffee00000-0xffff7dfffee10000 64K PTE RW NX SHD AF UXN DEVICE/nGnRE
---[ PCI I/O end ]---
---[ vmemmap start ]---
0xffff7e0000000000-0xffff7e0001000000 16M PMD RW NX SHD AF BLK UXN MEM/NORMAL
---[ vmemmap end ]---
---[ Linear Mapping ]---
0xffff800000000000-0xffff800000080000 512K PTE RW NX SHD AF CON UXN MEM/NORMAL
0xffff800000080000-0xffff800000200000 1536K PTE ro NX SHD AF UXN MEM/NORMAL
0xffff800000200000-0xffff800000e00000 12M PMD ro NX SHD AF BLK UXN MEM/NORMAL
0xffff800000e00000-0xffff800000f10000 1088K PTE ro NX SHD AF UXN MEM/NORMAL
0xffff800000f10000-0xffff800001000000 960K PTE RW NX SHD AF CON UXN MEM/NORMAL
0xffff800001000000-0xffff800002000000 16M PMD RW NX SHD AF BLK UXN MEM/NORMAL
0xffff800002000000-0xffff800040000000 992M PMD RW NX SHD AF CON BLK UXN MEM/NORMAL
estuary:/$
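When comparing page-table dumps like the one above (for example, with and without kpti=off), it can help to parse each entry and check for the NG (non-global) attribute, which kpti_install_ng_mappings is meant to apply to kernel mappings. The following is a minimal Python sketch. It assumes the arm64 ptdump layout shown above (address range, size, table level, then attribute tokens) with each entry on a single line. parse_ptdump_entry() is a hypothetical helper written for illustration.

```python
import re

# Size suffixes used by the ptdump output above.
UNIT = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# start-end, size, level, then the remaining attribute tokens.
ENTRY_RE = re.compile(
    r"^(0x[0-9a-f]+)-(0x[0-9a-f]+)\s+(\d+)([KMG])\s+(PTE|PMD|PUD|PGD)\s+(.*)$"
)


def parse_ptdump_entry(line):
    """Parse one unwrapped kernel_page_tables entry; return None otherwise."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None  # section markers such as '---[ Linear Mapping ]---'
    start, end = int(m.group(1), 16), int(m.group(2), 16)
    attrs = m.group(6).split()
    return {
        "start": start,
        "end": end,
        "size": int(m.group(3)) * UNIT[m.group(4)],
        "level": m.group(5),
        "attrs": attrs,
        # ptdump prints "NG" when the non-global bit is set.
        "global": "NG" not in attrs,
    }
```

A consistency check worth running on every entry is `size == end - start`. Once the lines are parsed, filtering for `global` entries shows which mappings are still marked global.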
Thanks!
Best Regards,
Wei
> Will
>
> .
>
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-21 9:20 ` Wei Xu
@ 2018-06-26 17:16 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-26 17:16 UTC (permalink / raw)
To: James Morse, Will Deacon
Cc: mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian
Hi All,
On 2018/6/21 17:20, Wei Xu wrote:
> Hi James,
>
> On 2018/6/21 9:38, James Morse wrote:
>> Hi Will, Wei,
>>
>> On 20/06/18 17:25, Wei Xu wrote:
>>> On 2018/6/20 23:54, James Morse wrote:
>>> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
>>> But I still got the stack overflow issue sometimes.
>>> Do you have more hint?
>>> The log is as below:
>>> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>> [ 0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty
>> Could you reproduce this with v4.17? This says there are ~45,000 extra patches,
>> and un-committed changes. None of the hashes so far have been commits in
>> mainline, so we have no idea what this tree is.
>>
> I have tried v4.17 and log is as below and also it can be found in the first mail
> of this thread.
>
> [ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty
> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun
> 15 21:39:52 CST 2018
>
> I will try v4.17.2 and v4.18-rc1.
>
>>> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
>>> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
>>> 23:59:05 CST 2018
>>> [ 0.000000] CPU0: using LPI pending table @0x000000007d860000
>>> [ 0.000000] GIC: PPI11 is secure or misconfigured
>>> [ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level
>>> low
>>> [ 0.000000] arch_timer: WARNING: Please fix your firmware
>>> [ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
>> (No idea what these mean, but I doubt they are relevant)
>>
> I will try with mainline qemu 2.12.0.
>
> Thanks!
Today I tried kernel 4.18-rc2 (defconfig, no changes on top) with qemu
2.12.0.
The guest still sometimes fails to boot, but the crash reason is different.
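Because the failure only shows up on some boots, it may help to repeat the boot many times and classify each console log automatically. The following is a minimal Python sketch. It assumes the QEMU binary, kernel image and initrd paths from the command line in the log below, and it assumes that the listed marker strings are enough to tell a failed boot from a good one. boot_once() is hypothetical. It relies on the timeout because a healthy guest drops to a shell and never exits.

```python
import subprocess

# Strings taken to indicate a failed boot. The first two appear in this
# report; the last two are generic kernel panic/overflow messages. The list
# itself is an assumption.
FAILURE_MARKERS = (
    "Unable to handle kernel",
    "Internal error: Oops",
    "Kernel panic",
    "kernel stack overflow",
)


def build_qemu_argv(qemu="./qemu-system-aarch64",
                    kernel="./Image-4.18-joyx",
                    initrd="../mini-rootfs-arm64.cpio.gz"):
    """Return the command line from this report as an argv list."""
    return [
        qemu,
        "-machine", "virt,kernel_irqchip=on,gic-version=3",
        "-cpu", "host", "-enable-kvm",
        "-smp", "1", "-m", "1024",
        "-kernel", kernel,
        "-initrd", initrd,
        "-nographic",
        "-append", "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000",
    ]


def classify_boot_log(log):
    """Return the first failure marker found in a console log, or None."""
    for marker in FAILURE_MARKERS:
        if marker in log:
            return marker
    return None


def boot_once(timeout_s=30):
    """Boot the guest once and classify whatever console output was captured."""
    try:
        proc = subprocess.run(build_qemu_argv(), stdin=subprocess.DEVNULL,
                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                              timeout=timeout_s)
        out = proc.stdout
    except subprocess.TimeoutExpired as exc:
        # On POSIX, the exception carries the output read before the timeout.
        out = exc.output or b""
    return classify_boot_log(out.decode(errors="replace"))
```

For example, `sum(boot_once() is not None for _ in range(20))` counts how many out of 20 boots hit one of the markers.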
Could you please share any hints?
Thanks!
The guest boot log is below:
===========================
estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
    -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.18-joyx \
    -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
    -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #20 SMP PREEMPT Tue Jun 26 23:43:35 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 32 MiB at 0x000000007e000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7dfe9a00-0x7dfeb1bf]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 23 pages/cpu @(____ptrval____) s56064 r8192 d29952 u94208
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 951780K/1048576K available (10172K kernel code, 1362K rwdata, 4956K rodata, 1216K init, 392K bss, 64028K reserved, 32768K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7c830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7c840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007c850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007c860000
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000828] Console: colour dummy device 80x25
[ 0.001279] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002307] pid_max: default: 32768 minimum: 301
[ 0.002925] Security Framework initialized
[ 0.003494] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.004277] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.004968] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.005628] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.031117] ASID allocator initialised with 32768 entries
[ 0.035124] Hierarchical SRCU implementation.
[ 0.039492] Platform MSI: its domain created
[ 0.039934] PCI/MSI: /intc/its domain created
[ 0.040509] EFI services will not be available.
[ 0.043153] smp: Bringing up secondary CPUs ...
[ 0.043606] smp: Brought up 1 node, 1 CPU
[ 0.044000] SMP: Total of 1 processors activated.
[ 0.044464] CPU features: detected: GIC system register CPU interface
[ 0.045112] CPU features: detected: Privileged Access Never
[ 0.045658] CPU features: detected: User Access Override
[ 0.046177] CPU features: detected: RAS Extension Support
[ 0.048119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000288
[ 0.048991] Mem abort info:
[ 0.049267] ESR = 0x96000004
[ 0.049567] Exception class = DABT (current EL), IL = 32 bits
[ 0.050146] SET = 0, FnV = 0
[ 0.050446] EA = 0, S1PTW = 0
[ 0.050754] Data abort info:
[ 0.051038] ISV = 0, ISS = 0x00000004
[ 0.051921] CM = 0, WnR = 0
[ 0.054936] [0000000000000288] user address but active_mm is swapper
[ 0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 0.067080] Modules linked in:
[ 0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.18.0-rc2-58583-g7daf201-dirty #20
[ 0.078745] Hardware name: linux,dummy-virt (DT)
[ 0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
[ 0.088258] pc : kpti_install_ng_mappings+0x154/0x214
[ 0.093319] lr : kpti_install_ng_mappings+0x120/0x214
[ 0.098483] sp : ffff0000093fbce0
[ 0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
[ 0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
[ 0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
[ 0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
[ 0.123392] x21: ffff00000923b000 x20: 0000000000000000
[ 0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
[ 0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.139513] x15: 000000007dff5000 x14: 000000007dff5000
[ 0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
[ 0.150329] x11: 000000007dff7000 x10: 0000000000000000
[ 0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
[ 0.161042] x7 : 0000000000000000 x6 : 000000004123c000
[ 0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
[ 0.171860] x3 : 0000000000000000 x2 : 000000004123b000
[ 0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0
[ 0.182577] Process migration/0 (pid: 13, stack limit = 0x(____ptrval____))
[ 0.189561] Call trace:
[ 0.192081] kpti_install_ng_mappings+0x154/0x214
[ 0.196892] Code: d503201f d503379f d5033fdf f94033a3 (f9414460)
[ 0.203029] ---[ end trace 3ca968ef0a151b33 ]---
[ 0.207722] note: migration/0[13] exited with preempt_count 1
[ 0.213610] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 0.222393] Mem abort info:
[ 0.225273] ESR = 0x86000004
[ 0.228396] Exception class = IABT (current EL), IL = 32 bits
[ 0.234405] SET = 0, FnV = 0
[ 0.237527] EA = 0, S1PTW = 0
[ 0.240769] [0000000000000000] user address but active_mm is swapper
[ 0.247149] Internal error: Oops: 86000004 [#2] PREEMPT SMP
[ 0.252797] Modules linked in:
[ 0.255922] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D 4.18.0-rc2-58583-g7daf201-dirty #20
[ 0.265549] Hardware name: linux,dummy-virt (DT)
[ 0.270235] pstate: 60400085 (nZCv daIf +PAN -UAO)
[ 0.275155] pc : (null)
[ 0.278520] lr : (null)
[ 0.281886] sp : ffff00000802bb10
[ 0.285257] x29: 0000000000000000 x28: 0000000000000080
[ 0.290664] x27: ffff000008a82000 x26: ffff000008a52134
[ 0.296073] x25: ffff000009089000 x24: ffff80003ca30570
[ 0.301381] x23: ffff000009064000 x22: ffff0000090acd88
[ 0.306789] x21: ffff80003ca30000 x20: 0000000000000000
[ 0.312196] x19: 0000000000000000 x18: 000000000000000e
[ 0.317503] x17: 0000000000000001 x16: 0000000000000019
[ 0.322910] x15: 0000000000000033 x14: 000000000000004c
[ 0.328317] x13: 0000000000000068 x12: ffff0000093fb7f8
[ 0.333725] x11: 0000000000000108 x10: 0000000000000940
[ 0.339028] x9 : ffff00000802baf0 x8 : ffff80003ca309a0
[ 0.344434] x7 : 0000000000000000 x6 : 0000000000000000
[ 0.349842] x5 : 0000000002da3744 x4 : 0000000000000080
[ 0.355250] x3 : 0000000000000008 x2 : 0000800034f69000
[ 0.360554] x1 : ffff80003ca30000 x0 : ffff80003ca627c0
[ 0.365959] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[ 0.372801] Call trace:
[ 0.375322] Code: bad PC value
[ 0.378347] ---[ end trace 3ca968ef0a151b34 ]---
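The kernel prints the decoded ESR fields above. The following sketch shows how those fields come from the raw values 0x96000004 and 0x86000004, using the architectural ESR_ELx layout: EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and the fault status code in ISS[5:0]. Only the exception classes and fault codes relevant to this log are tabulated. decode_esr() is a hypothetical helper written for illustration. Both values decode to a level-0 translation fault: a data abort for the first Oops and an instruction abort for the second.

```python
# Exception class (EC) values for the aborts seen in this log.
EC_NAMES = {
    0x20: "IABT (lower EL)",
    0x21: "IABT (current EL)",
    0x24: "DABT (lower EL)",
    0x25: "DABT (current EL)",
}

# Fault status codes 0x04-0x07: translation fault at lookup level 0-3.
FSC_NAMES = {0x04 + lvl: f"translation fault, level {lvl}" for lvl in range(4)}


def decode_esr(esr):
    """Split an ESR_ELx value into the fields the kernel prints."""
    ec = (esr >> 26) & 0x3F       # ESR_ELx[31:26]
    il = (esr >> 25) & 0x1        # ESR_ELx[25]: 1 means a 32-bit instruction
    iss = esr & 0x1FFFFFF         # ESR_ELx[24:0]
    fsc = iss & 0x3F              # DFSC/IFSC live in ISS[5:0]
    info = {
        "ec": ec,
        "class": EC_NAMES.get(ec, f"EC 0x{ec:02x}"),
        "il_bits": 32 if il else 16,
        "iss": iss,
        "fsc": FSC_NAMES.get(fsc, f"FSC 0x{fsc:02x}"),
    }
    if ec in (0x24, 0x25):        # WnR (ISS[6]) is only defined for data aborts
        info["wnr"] = (iss >> 6) & 0x1
    return info
```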
The faddr2line result is:
========================
./scripts/faddr2line ../kernel-dev.build/vmlinux kpti_install_ng_mappings+0x150/0x214
kpti_install_ng_mappings+0x150/0x214:
__cpu_set_tcr_t0sz at arch/arm64/include/asm/mmu_context.h:94
(inlined by) cpu_uninstall_idmap at arch/arm64/include/asm/mmu_context.h:125
(inlined by) kpti_install_ng_mappings at arch/arm64/kernel/cpufeature.c:921
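faddr2line places the faulting code in __cpu_set_tcr_t0sz, inlined through cpu_uninstall_idmap. In the disassembly that follows, the load at <+340> and the comparison with 0x10 at <+344> appear to be the T0SZ check. When the value differs, the read-modify-write at <+480>..<+496> clears the low six bits of TCR_EL1 and ORs in 0x10. The following is a minimal Python sketch of that field update. It assumes only the architectural layout of TCR_EL1.T0SZ (bits [5:0]), where the TTBR0 region covers 2^(64 - T0SZ) bytes. The TCR value used in the test is made up.

```python
T0SZ_MASK = 0x3F          # TCR_EL1.T0SZ occupies bits [5:0]
MASK64 = (1 << 64) - 1


def set_t0sz(tcr, t0sz):
    """Replace T0SZ, like 'and x0, x0, #~0x3f; orr x0, x0, #t0sz'."""
    return (tcr & ~T0SZ_MASK & MASK64) | (t0sz & T0SZ_MASK)


def ttbr0_va_bits(t0sz):
    """Number of VA bits translated through TTBR0 for a given T0SZ."""
    return 64 - t0sz
```

With T0SZ = 0x10, `ttbr0_va_bits(0x10)` gives 48, which corresponds to a 48-bit TTBR0 address space.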
The disassembly of kpti_install_ng_mappings is:
=============================================
Dump of assembler code for function kpti_install_ng_mappings:
0xffff000008091f7c <+0>: stp x29, x30, [sp,#-112]!
0xffff000008091f80 <+4>: adrp x0, 0xffff000009064000 <bp_hardening_data>
0xffff000008091f84 <+8>: mov x29, sp
0xffff000008091f88 <+12>: stp x23, x24, [sp,#48]
0xffff000008091f8c <+16>: adrp x24, 0xffff0000091d9000 <reset_devices>
0xffff000008091f90 <+20>: add x0, x0, #0x18
0xffff000008091f94 <+24>: add x1, x24, #0x550
0xffff000008091f98 <+28>: stp x19, x20, [sp,#16]
0xffff000008091f9c <+32>: stp x21, x22, [sp,#32]
0xffff000008091fa0 <+36>: stp x25, x26, [sp,#64]
0xffff000008091fa4 <+40>: stp x27, x28, [sp,#80]
0xffff000008091fa8 <+44>: mrs x2, tpidr_el1
0xffff000008091fac <+48>: ldrb w1, [x1,#8]
0xffff000008091fb0 <+52>: ldr w20, [x2,x0]
0xffff000008091fb4 <+56>: cbnz w1, 0xffff00000809212c <kpti_install_ng_mappings+432>
0xffff000008091fb8 <+60>: adrp x27, 0xffff000008ee5000 <sve_vq_map+32>
0xffff000008091fbc <+64>: adrp x19, 0xffff0000091d8000 <empty_zero_page>
0xffff000008091fc0 <+68>: add x19, x19, #0x0
0xffff000008091fc4 <+72>: adrp x1, 0xffff000008a5f000 <kimage_vaddr>
0xffff000008091fc8 <+76>: mov x0, x19
0xffff000008091fcc <+80>: add x1, x1, #0x3d8
0xffff000008091fd0 <+84>: ldr x2, [x27,#1176]
0xffff000008091fd4 <+88>: sub x4, x1, x2
0xffff000008091fd8 <+92>: sub x0, x0, x2
0xffff000008091fdc <+96>: msr ttbr0_el1, x0
0xffff000008091fe0 <+100>: isb
0xffff000008091fe4 <+104>: dsb nshst
0xffff000008091fe8 <+108>: tlbi vmalle1
0xffff000008091fec <+112>: nop
0xffff000008091ff0 <+116>: nop
0xffff000008091ff4 <+120>: dsb nsh
0xffff000008091ff8 <+124>: isb
0xffff000008091ffc <+128>: adrp x3, 0xffff000009096000 <early_node_cpu_hwid+1440>
0xffff000008092000 <+132>: ldr x0, [x3,#648]
0xffff000008092004 <+136>: cmp x0, #0x10
0xffff000008092008 <+140>: b.ne 0xffff000008092178 <kpti_install_ng_mappings+508>
0xffff00000809200c <+144>: adrp x28, 0xffff000008ee5000 <sve_vq_map+32>
0xffff000008092010 <+148>: ldr x2, [x27,#1176]
0xffff000008092014 <+152>: adrp x1, 0xffff000009237000
0xffff000008092018 <+156>: adrp x26, 0xffff00000923b000
0xffff00000809201c <+160>: add x1, x1, #0x0
0xffff000008092020 <+164>: add x21, x26, #0x0
0xffff000008092024 <+168>: ldr x0, [x28,#1160]
0xffff000008092028 <+172>: adrp x23, 0xffff000008ee5000 <sve_vq_map+32>
0xffff00000809202c <+176>: sub x1, x1, x2
0xffff000008092030 <+180>: sub x1, x1, x0
0xffff000008092034 <+184>: orr x0, x1, #0xffff800000000000
0xffff000008092038 <+188>: cmp x0, x21
0xffff00000809203c <+192>: b.eq 0xffff000008092174 <kpti_install_ng_mappings+504>
0xffff000008092040 <+196>: mov x22, x19
0xffff000008092044 <+200>: str x3, [x29,#96]
0xffff000008092048 <+204>: str x4, [x29,#104]
0xffff00000809204c <+208>: sub x2, x22, x2
0xffff000008092050 <+212>: msr ttbr0_el1, x2
0xffff000008092054 <+216>: isb
0xffff000008092058 <+220>: ldr x0, [x28,#1160]
0xffff00000809205c <+224>: and x1, x1, #0x7fffffffffff
0xffff000008092060 <+228>: adrp x25, 0xffff0000090ac000 <perf_cpu_clock+200>
0xffff000008092064 <+232>: add x0, x1, x0
0xffff000008092068 <+236>: add x1, x25, #0xd88
0xffff00000809206c <+240>: bl 0xffff0000080a0750 <cpu_do_switch_mm>
0xffff000008092070 <+244>: adrp x0, 0xffff000009089000 <page_wait_table+5376>
0xffff000008092074 <+248>: mov w1, #0x80 // #128
0xffff000008092078 <+252>: add x0, x0, #0xb48
0xffff00000809207c <+256>: bl 0xffff0000083e8144 <__bitmap_weight>
0xffff000008092080 <+260>: mov w1, w0
0xffff000008092084 <+264>: ldr x5, [x23,#1176]
0xffff000008092088 <+268>: mov w0, w20
0xffff00000809208c <+272>: ldr x4, [x29,#104]
0xffff000008092090 <+276>: mov x2, x21
0xffff000008092094 <+280>: sub x2, x2, x5
0xffff000008092098 <+284>: blr x4
0xffff00000809209c <+288>: ldr x1, [x23,#1176]
0xffff0000080920a0 <+292>: mrs x0, sp_el0
0xffff0000080920a4 <+296>: sub x22, x22, x1
0xffff0000080920a8 <+300>: ldr x1, [x0,#936]
0xffff0000080920ac <+304>: msr ttbr0_el1, x22
0xffff0000080920b0 <+308>: isb
0xffff0000080920b4 <+312>: dsb nshst
0xffff0000080920b8 <+316>: tlbi vmalle1
0xffff0000080920bc <+320>: nop
0xffff0000080920c0 <+324>: nop
0xffff0000080920c4 <+328>: dsb nsh
0xffff0000080920c8 <+332>: isb
0xffff0000080920cc <+336>: ldr x3, [x29,#96]
0xffff0000080920d0 <+340>: ldr x0, [x3,#648]
0xffff0000080920d4 <+344>: cmp x0, #0x10
0xffff0000080920d8 <+348>: b.ne 0xffff00000809215c <kpti_install_ng_mappings+480>
0xffff0000080920dc <+352>: add x25, x25, #0xd88
0xffff0000080920e0 <+356>: cmp x1, x25
0xffff0000080920e4 <+360>: b.eq 0xffff00000809211c <kpti_install_ng_mappings+416>
0xffff0000080920e8 <+364>: ldr x2, [x1,#64]
0xffff0000080920ec <+368>: add x26, x26, #0x0
0xffff0000080920f0 <+372>: cmp x2, x26
0xffff0000080920f4 <+376>: b.eq 0xffff000008092174 <kpti_install_ng_mappings+504>
0xffff0000080920f8 <+380>: ldr x0, [x27,#1176]
0xffff0000080920fc <+384>: sub x19, x19, x0
0xffff000008092100 <+388>: msr ttbr0_el1, x19
0xffff000008092104 <+392>: isb
0xffff000008092108 <+396>: tbz x2, #47, 0xffff000008092148 <kpti_install_ng_mappings+460>
0xffff00000809210c <+400>: ldr x0, [x28,#1160]
0xffff000008092110 <+404>: and x2, x2, #0x7fffffffffff
0xffff000008092114 <+408>: add x0, x2, x0
0xffff000008092118 <+412>: bl 0xffff0000080a0750 <cpu_do_switch_mm>
0xffff00000809211c <+416>: cbnz w20, 0xffff00000809212c <kpti_install_ng_mappings+432>
0xffff000008092120 <+420>: add x24, x24, #0x550
0xffff000008092124 <+424>: mov w0, #0x1 // #1
0xffff000008092128 <+428>: strb w0, [x24,#8]
0xffff00000809212c <+432>: ldp x19, x20, [sp,#16]
0xffff000008092130 <+436>: ldp x21, x22, [sp,#32]
0xffff000008092134 <+440>: ldp x23, x24, [sp,#48]
0xffff000008092138 <+444>: ldp x25, x26, [sp,#64]
0xffff00000809213c <+448>: ldp x27, x28, [sp,#80]
0xffff000008092140 <+452>: ldp x29, x30, [sp],#112
0xffff000008092144 <+456>: ret
0xffff000008092148 <+460>: adrp x0, 0xffff000008ee5000 <sve_vq_map+32>
0xffff00000809214c <+464>: ldr x0, [x0,#1176]
0xffff000008092150 <+468>: sub x0, x2, x0
0xffff000008092154 <+472>: bl 0xffff0000080a0750 <cpu_do_switch_mm>
0xffff000008092158 <+476>: b 0xffff00000809211c <kpti_install_ng_mappings+416>
0xffff00000809215c <+480>: mrs x0, tcr_el1
0xffff000008092160 <+484>: and x0, x0, #0xffffffffffffffc0
0xffff000008092164 <+488>: orr x0, x0, #0x10
0xffff000008092168 <+492>: msr tcr_el1, x0
0xffff00000809216c <+496>: isb
0xffff000008092170 <+500>: b 0xffff0000080920dc <kpti_install_ng_mappings+352>
0xffff000008092174 <+504>: brk #0x800
0xffff000008092178 <+508>: mrs x1, tcr_el1
0xffff00000809217c <+512>: and x1, x1, #0xffffffffffffffc0
0xffff000008092180 <+516>: orr x0, x1, x0
0xffff000008092184 <+520>: msr tcr_el1, x0
0xffff000008092188 <+524>: isb
0xffff00000809218c <+528>: b 0xffff00000809200c <kpti_install_ng_mappings+144>
End of assembler dump.
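The Oops reports `pc : kpti_install_ng_mappings+0x154/0x214`, while gdb numbers offsets in decimal. 0x154 is <+340>, which is `ldr x0, [x3,#648]`. The register dump shows x3 = 0, so that load targets 0 + 648 = 0x288, which matches the faulting virtual address. In the listing, x3 is reloaded at <+336> from the stack slot [x29,#96], where it was stored at <+200>. The faddr2line query above used +0x150, which is that <+336> instruction. The following small Python check reproduces this arithmetic. parse_symbol_offset() and effective_address() are hypothetical helpers written for illustration.

```python
import re


def parse_symbol_offset(s):
    """Split 'func+0xOFF/0xSIZE' (as printed in an Oops) into its parts."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", s)
    if m is None:
        raise ValueError(f"unexpected format: {s!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def effective_address(base, imm):
    """Address accessed by 'ldr xN, [base, #imm]' (unsigned-offset form)."""
    return (base + imm) & 0xFFFFFFFFFFFFFFFF
```

For example, `parse_symbol_offset("kpti_install_ng_mappings+0x154/0x214")` yields the gdb offset 340 within a 532-byte function, and `effective_address(0, 648)` gives 0x288.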
Best Regards,
Wei
* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-26 17:16 ` Wei Xu
0 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-26 17:16 UTC (permalink / raw)
To: linux-arm-kernel
Hi All,
On 2018/6/21 17:20, Wei Xu wrote:
> Hi James,
>
> On 2018/6/21 9:38, James Morse wrote:
>> Hi Will, Wei,
>>
>> On 20/06/18 17:25, Wei Xu wrote:
>>> On 2018/6/20 23:54, James Morse wrote:
>>> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
>>> But I still got the stack overflow issue sometimes.
>>> Do you have more hint?
>>> The log is as below:
>>> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>> [ 0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty
>> Could you reproduce this with v4.17? This says there are ~45,000 extra patches,
>> and un-committed changes. None of the hashes so far have been commits in
>> mainline, so we have no idea what this tree is.
>>
> I have tried v4.17 and log is as below and also it can be found in the first mail
> of this thread.
>
> [ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty
> (joyx at Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun
> 15 21:39:52 CST 2018
>
> I will try v4.17.2 and v4.18-rc1.
>
>>> (joyx at Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
>>> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
>>> 23:59:05 CST 2018
>>> [ 0.000000] CPU0: using LPI pending table @0x000000007d860000
>>> [ 0.000000] GIC: PPI11 is secure or misconfigured
>>> [ 0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level
>>> low
>>> [ 0.000000] arch_timer: WARNING: Please fix your firmware
>>> [ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
>> (No idea what these mean, but I doubt they are relevant)
>>
> I will try with mainline qemu 2.12.0.
>
> Thanks!
Today I tried kernel 4.18-rc2 (defconfig, no changes on top) with qemu
2.12.0.
The guest still sometimes fails to boot, but the crash reason is different.
Could you please share any hints?
Thanks!
The guest boot log is below:
===========================
estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
    -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.18-joyx \
    -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
    -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
[ 0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty
(joyx at Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease)
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #20
SMP PREEMPT Tue Jun 26 23:43:35 CST 2018
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] bootconsole [pl11] enabled
[ 0.000000] efi: Getting EFI parameters from FDT:
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 32 MiB at 0x000000007e000000
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem
0x0000000000000000-0x000000007fffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x7dfe9a00-0x7dfeb1bf]
[ 0.000000] Zone ranges:
[ 0.000000] DMA32 [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] Initmem setup node 0 [mem
0x0000000040000000-0x000000007fffffff]
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv1.0 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] psci: SMC Calling Convention v1.1
[ 0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[ 0.000000] percpu: Embedded 23 pages/cpu @(____ptrval____) s56064 r8192 d29952 u94208
[ 0.000000] Detected VIPT I-cache on CPU0
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Hardware dirty bit management
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
[ 0.000000] Memory: 951780K/1048576K available (10172K kernel code, 1362K rwdata, 4956K rodata, 1216K init, 392K bss, 64028K reserved, 32768K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Preemptible hierarchical RCU implementation.
[ 0.000000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.000000] Tasks RCU enabled.
[ 0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv3: Distributor has no Range Selector support
[ 0.000000] GICv3: no VLPI support, no direct LPI support
[ 0.000000] ITS [mem 0x08080000-0x0809ffff]
[ 0.000000] ITS at 0x0000000008080000: allocated 8192 Devices @7c830000 (indirect, esz 8, psz 64K, shr 1)
[ 0.000000] ITS at 0x0000000008080000: allocated 8192 Interrupt Collections @7c840000 (flat, esz 8, psz 64K, shr 1)
[ 0.000000] GIC: using LPI property table @0x000000007c850000
[ 0.000000] ITS: Allocated 1792 chunks for LPIs
[ 0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[ 0.000000] CPU0: using LPI pending table @0x000000007c860000
[ 0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[ 0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
[ 0.000828] Console: colour dummy device 80x25
[ 0.001279] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
[ 0.002307] pid_max: default: 32768 minimum: 301
[ 0.002925] Security Framework initialized
[ 0.003494] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.004277] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.004968] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.005628] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.031117] ASID allocator initialised with 32768 entries
[ 0.035124] Hierarchical SRCU implementation.
[ 0.039492] Platform MSI: its domain created
[ 0.039934] PCI/MSI: /intc/its domain created
[ 0.040509] EFI services will not be available.
[ 0.043153] smp: Bringing up secondary CPUs ...
[ 0.043606] smp: Brought up 1 node, 1 CPU
[ 0.044000] SMP: Total of 1 processors activated.
[ 0.044464] CPU features: detected: GIC system register CPU interface
[ 0.045112] CPU features: detected: Privileged Access Never
[ 0.045658] CPU features: detected: User Access Override
[ 0.046177] CPU features: detected: RAS Extension Support
[ 0.048119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000288
[ 0.048991] Mem abort info:
[ 0.049267] ESR = 0x96000004
[ 0.049567] Exception class = DABT (current EL), IL = 32 bits
[ 0.050146] SET = 0, FnV = 0
[ 0.050446] EA = 0, S1PTW = 0
[ 0.050754] Data abort info:
[ 0.051038] ISV = 0, ISS = 0x00000004
[ 0.051921] CM = 0, WnR = 0
[ 0.054936] [0000000000000288] user address but active_mm is swapper
[ 0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 0.067080] Modules linked in:
[ 0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.18.0-rc2-58583-g7daf201-dirty #20
[ 0.078745] Hardware name: linux,dummy-virt (DT)
[ 0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
[ 0.088258] pc : kpti_install_ng_mappings+0x154/0x214
[ 0.093319] lr : kpti_install_ng_mappings+0x120/0x214
[ 0.098483] sp : ffff0000093fbce0
[ 0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
[ 0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
[ 0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
[ 0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
[ 0.123392] x21: ffff00000923b000 x20: 0000000000000000
[ 0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
[ 0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
[ 0.139513] x15: 000000007dff5000 x14: 000000007dff5000
[ 0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
[ 0.150329] x11: 000000007dff7000 x10: 0000000000000000
[ 0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
[ 0.161042] x7 : 0000000000000000 x6 : 000000004123c000
[ 0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
[ 0.171860] x3 : 0000000000000000 x2 : 000000004123b000
[ 0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0
[ 0.182577] Process migration/0 (pid: 13, stack limit = 0x(____ptrval____))
[ 0.189561] Call trace:
[ 0.192081] kpti_install_ng_mappings+0x154/0x214
[ 0.196892] Code: d503201f d503379f d5033fdf f94033a3 (f9414460)
[ 0.203029] ---[ end trace 3ca968ef0a151b33 ]---
[ 0.207722] note: migration/0[13] exited with preempt_count 1
[ 0.213610] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 0.222393] Mem abort info:
[ 0.225273] ESR = 0x86000004
[ 0.228396] Exception class = IABT (current EL), IL = 32 bits
[ 0.234405] SET = 0, FnV = 0
[ 0.237527] EA = 0, S1PTW = 0
[ 0.240769] [0000000000000000] user address but active_mm is swapper
[ 0.247149] Internal error: Oops: 86000004 [#2] PREEMPT SMP
[ 0.252797] Modules linked in:
[ 0.255922] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D 4.18.0-rc2-58583-g7daf201-dirty #20
[ 0.265549] Hardware name: linux,dummy-virt (DT)
[ 0.270235] pstate: 60400085 (nZCv daIf +PAN -UAO)
[ 0.275155] pc : (null)
[ 0.278520] lr : (null)
[ 0.281886] sp : ffff00000802bb10
[ 0.285257] x29: 0000000000000000 x28: 0000000000000080
[ 0.290664] x27: ffff000008a82000 x26: ffff000008a52134
[ 0.296073] x25: ffff000009089000 x24: ffff80003ca30570
[ 0.301381] x23: ffff000009064000 x22: ffff0000090acd88
[ 0.306789] x21: ffff80003ca30000 x20: 0000000000000000
[ 0.312196] x19: 0000000000000000 x18: 000000000000000e
[ 0.317503] x17: 0000000000000001 x16: 0000000000000019
[ 0.322910] x15: 0000000000000033 x14: 000000000000004c
[ 0.328317] x13: 0000000000000068 x12: ffff0000093fb7f8
[ 0.333725] x11: 0000000000000108 x10: 0000000000000940
[ 0.339028] x9 : ffff00000802baf0 x8 : ffff80003ca309a0
[ 0.344434] x7 : 0000000000000000 x6 : 0000000000000000
[ 0.349842] x5 : 0000000002da3744 x4 : 0000000000000080
[ 0.355250] x3 : 0000000000000008 x2 : 0000800034f69000
[ 0.360554] x1 : ffff80003ca30000 x0 : ffff80003ca627c0
[ 0.365959] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[ 0.372801] Call trace:
[ 0.375322] Code: bad PC value
[ 0.378347] ---[ end trace 3ca968ef0a151b34 ]---
The faddr2line result is as follows:
========================
./scripts/faddr2line ../kernel-dev.build/vmlinux kpti_install_ng_mappings+0x150/0x214
kpti_install_ng_mappings+0x150/0x214:
__cpu_set_tcr_t0sz at arch/arm64/include/asm/mmu_context.h:94
(inlined by) cpu_uninstall_idmap at arch/arm64/include/asm/mmu_context.h:125
(inlined by) kpti_install_ng_mappings at arch/arm64/kernel/cpufeature.c:921
The disassembly of kpti_install_ng_mappings is as follows:
=============================================
Dump of assembler code for function kpti_install_ng_mappings:
0xffff000008091f7c <+0>: stp x29, x30, [sp,#-112]!
0xffff000008091f80 <+4>: adrp x0, 0xffff000009064000
<bp_hardening_data>
0xffff000008091f84 <+8>: mov x29, sp
0xffff000008091f88 <+12>: stp x23, x24, [sp,#48]
0xffff000008091f8c <+16>: adrp x24, 0xffff0000091d9000
<reset_devices>
0xffff000008091f90 <+20>: add x0, x0, #0x18
0xffff000008091f94 <+24>: add x1, x24, #0x550
0xffff000008091f98 <+28>: stp x19, x20, [sp,#16]
0xffff000008091f9c <+32>: stp x21, x22, [sp,#32]
0xffff000008091fa0 <+36>: stp x25, x26, [sp,#64]
0xffff000008091fa4 <+40>: stp x27, x28, [sp,#80]
0xffff000008091fa8 <+44>: mrs x2, tpidr_el1
0xffff000008091fac <+48>: ldrb w1, [x1,#8]
0xffff000008091fb0 <+52>: ldr w20, [x2,x0]
0xffff000008091fb4 <+56>: cbnz w1, 0xffff00000809212c
<kpti_install_ng_mappings+432>
0xffff000008091fb8 <+60>: adrp x27, 0xffff000008ee5000
<sve_vq_map+32>
0xffff000008091fbc <+64>: adrp x19, 0xffff0000091d8000
<empty_zero_page>
0xffff000008091fc0 <+68>: add x19, x19, #0x0
0xffff000008091fc4 <+72>: adrp x1, 0xffff000008a5f000
<kimage_vaddr>
0xffff000008091fc8 <+76>: mov x0, x19
0xffff000008091fcc <+80>: add x1, x1, #0x3d8
0xffff000008091fd0 <+84>: ldr x2, [x27,#1176]
0xffff000008091fd4 <+88>: sub x4, x1, x2
0xffff000008091fd8 <+92>: sub x0, x0, x2
0xffff000008091fdc <+96>: msr ttbr0_el1, x0
0xffff000008091fe0 <+100>: isb
0xffff000008091fe4 <+104>: dsb nshst
0xffff000008091fe8 <+108>: tlbi vmalle1
0xffff000008091fec <+112>: nop
0xffff000008091ff0 <+116>: nop
0xffff000008091ff4 <+120>: dsb nsh
0xffff000008091ff8 <+124>: isb
0xffff000008091ffc <+128>: adrp x3, 0xffff000009096000
<early_node_cpu_hwid+1440>
0xffff000008092000 <+132>: ldr x0, [x3,#648]
0xffff000008092004 <+136>: cmp x0, #0x10
0xffff000008092008 <+140>: b.ne 0xffff000008092178
<kpti_install_ng_mappings+508>
0xffff00000809200c <+144>: adrp x28, 0xffff000008ee5000
<sve_vq_map+32>
0xffff000008092010 <+148>: ldr x2, [x27,#1176]
0xffff000008092014 <+152>: adrp x1, 0xffff000009237000
0xffff000008092018 <+156>: adrp x26, 0xffff00000923b000
0xffff00000809201c <+160>: add x1, x1, #0x0
0xffff000008092020 <+164>: add x21, x26, #0x0
0xffff000008092024 <+168>: ldr x0, [x28,#1160]
0xffff000008092028 <+172>: adrp x23, 0xffff000008ee5000
<sve_vq_map+32>
0xffff00000809202c <+176>: sub x1, x1, x2
0xffff000008092030 <+180>: sub x1, x1, x0
0xffff000008092034 <+184>: orr x0, x1, #0xffff800000000000
0xffff000008092038 <+188>: cmp x0, x21
0xffff00000809203c <+192>: b.eq 0xffff000008092174
<kpti_install_ng_mappings+504>
0xffff000008092040 <+196>: mov x22, x19
0xffff000008092044 <+200>: str x3, [x29,#96]
0xffff000008092048 <+204>: str x4, [x29,#104]
0xffff00000809204c <+208>: sub x2, x22, x2
0xffff000008092050 <+212>: msr ttbr0_el1, x2
0xffff000008092054 <+216>: isb
0xffff000008092058 <+220>: ldr x0, [x28,#1160]
0xffff00000809205c <+224>: and x1, x1, #0x7fffffffffff
0xffff000008092060 <+228>: adrp x25, 0xffff0000090ac000
<perf_cpu_clock+200>
0xffff000008092064 <+232>: add x0, x1, x0
0xffff000008092068 <+236>: add x1, x25, #0xd88
0xffff00000809206c <+240>: bl 0xffff0000080a0750
<cpu_do_switch_mm>
0xffff000008092070 <+244>: adrp x0, 0xffff000009089000
<page_wait_table+5376>
0xffff000008092074 <+248>: mov w1, #0x80 // #128
0xffff000008092078 <+252>: add x0, x0, #0xb48
0xffff00000809207c <+256>: bl 0xffff0000083e8144
<__bitmap_weight>
0xffff000008092080 <+260>: mov w1, w0
0xffff000008092084 <+264>: ldr x5, [x23,#1176]
0xffff000008092088 <+268>: mov w0, w20
0xffff00000809208c <+272>: ldr x4, [x29,#104]
0xffff000008092090 <+276>: mov x2, x21
0xffff000008092094 <+280>: sub x2, x2, x5
0xffff000008092098 <+284>: blr x4
0xffff00000809209c <+288>: ldr x1, [x23,#1176]
0xffff0000080920a0 <+292>: mrs x0, sp_el0
0xffff0000080920a4 <+296>: sub x22, x22, x1
0xffff0000080920a8 <+300>: ldr x1, [x0,#936]
0xffff0000080920ac <+304>: msr ttbr0_el1, x22
0xffff0000080920b0 <+308>: isb
0xffff0000080920b4 <+312>: dsb nshst
0xffff0000080920b8 <+316>: tlbi vmalle1
0xffff0000080920bc <+320>: nop
0xffff0000080920c0 <+324>: nop
0xffff0000080920c4 <+328>: dsb nsh
0xffff0000080920c8 <+332>: isb
0xffff0000080920cc <+336>: ldr x3, [x29,#96]
0xffff0000080920d0 <+340>: ldr x0, [x3,#648]
0xffff0000080920d4 <+344>: cmp x0, #0x10
0xffff0000080920d8 <+348>: b.ne 0xffff00000809215c
<kpti_install_ng_mappings+480>
0xffff0000080920dc <+352>: add x25, x25, #0xd88
0xffff0000080920e0 <+356>: cmp x1, x25
0xffff0000080920e4 <+360>: b.eq 0xffff00000809211c
<kpti_install_ng_mappings+416>
0xffff0000080920e8 <+364>: ldr x2, [x1,#64]
0xffff0000080920ec <+368>: add x26, x26, #0x0
0xffff0000080920f0 <+372>: cmp x2, x26
0xffff0000080920f4 <+376>: b.eq 0xffff000008092174
<kpti_install_ng_mappings+504>
0xffff0000080920f8 <+380>: ldr x0, [x27,#1176]
0xffff0000080920fc <+384>: sub x19, x19, x0
0xffff000008092100 <+388>: msr ttbr0_el1, x19
0xffff000008092104 <+392>: isb
0xffff000008092108 <+396>: tbz x2, #47, 0xffff000008092148
<kpti_install_ng_mappings+460>
0xffff00000809210c <+400>: ldr x0, [x28,#1160]
0xffff000008092110 <+404>: and x2, x2, #0x7fffffffffff
0xffff000008092114 <+408>: add x0, x2, x0
0xffff000008092118 <+412>: bl 0xffff0000080a0750
<cpu_do_switch_mm>
0xffff00000809211c <+416>: cbnz w20, 0xffff00000809212c
<kpti_install_ng_mappings+432>
0xffff000008092120 <+420>: add x24, x24, #0x550
0xffff000008092124 <+424>: mov w0, #0x1 // #1
0xffff000008092128 <+428>: strb w0, [x24,#8]
0xffff00000809212c <+432>: ldp x19, x20, [sp,#16]
0xffff000008092130 <+436>: ldp x21, x22, [sp,#32]
0xffff000008092134 <+440>: ldp x23, x24, [sp,#48]
0xffff000008092138 <+444>: ldp x25, x26, [sp,#64]
0xffff00000809213c <+448>: ldp x27, x28, [sp,#80]
0xffff000008092140 <+452>: ldp x29, x30, [sp],#112
0xffff000008092144 <+456>: ret
0xffff000008092148 <+460>: adrp x0, 0xffff000008ee5000
<sve_vq_map+32>
0xffff00000809214c <+464>: ldr x0, [x0,#1176]
0xffff000008092150 <+468>: sub x0, x2, x0
0xffff000008092154 <+472>: bl 0xffff0000080a0750
<cpu_do_switch_mm>
0xffff000008092158 <+476>: b 0xffff00000809211c
<kpti_install_ng_mappings+416>
0xffff00000809215c <+480>: mrs x0, tcr_el1
0xffff000008092160 <+484>: and x0, x0, #0xffffffffffffffc0
0xffff000008092164 <+488>: orr x0, x0, #0x10
0xffff000008092168 <+492>: msr tcr_el1, x0
0xffff00000809216c <+496>: isb
0xffff000008092170 <+500>: b 0xffff0000080920dc
<kpti_install_ng_mappings+352>
0xffff000008092174 <+504>: brk #0x800
0xffff000008092178 <+508>: mrs x1, tcr_el1
0xffff00000809217c <+512>: and x1, x1, #0xffffffffffffffc0
0xffff000008092180 <+516>: orr x0, x1, x0
0xffff000008092184 <+520>: msr tcr_el1, x0
0xffff000008092188 <+524>: isb
0xffff00000809218c <+528>: b 0xffff00000809200c
<kpti_install_ng_mappings+144>
End of assembler dump.
Best Regards,
Wei
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-26 17:16 ` Wei Xu
@ 2018-06-26 17:47 ` Will Deacon
-1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-26 17:47 UTC (permalink / raw)
To: Wei Xu
Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian
Hi Wei,
On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
> Today I tried the kernel 4.18-rc2(defconfig, no change on top) with qemu
> 2.12.0.
> The guest sometimes still failed to boot. But the crash reason is different.
> Could you please share any hint?
> Thanks!
>
> The guest boot log is as below:
> ===========================
>
> estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
>     -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.18-joyx \
>     -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
>     -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
>
> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
> [ 0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty
I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !
> [ 0.048119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000288
> [ 0.048991] Mem abort info:
> [ 0.049267] ESR = 0x96000004
> [ 0.049567] Exception class = DABT (current EL), IL = 32 bits
> [ 0.050146] SET = 0, FnV = 0
> [ 0.050446] EA = 0, S1PTW = 0
> [ 0.050754] Data abort info:
> [ 0.051038] ISV = 0, ISS = 0x00000004
> [ 0.051921] CM = 0, WnR = 0
> [ 0.054936] [0000000000000288] user address but active_mm is swapper
> [ 0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> [ 0.067080] Modules linked in:
> [ 0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.18.0-rc2-58583-g7daf201-dirty #20
> [ 0.078745] Hardware name: linux,dummy-virt (DT)
> [ 0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
> [ 0.088258] pc : kpti_install_ng_mappings+0x154/0x214
> [ 0.093319] lr : kpti_install_ng_mappings+0x120/0x214
> [ 0.098483] sp : ffff0000093fbce0
> [ 0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
> [ 0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
> [ 0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
> [ 0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
> [ 0.123392] x21: ffff00000923b000 x20: 0000000000000000
> [ 0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
> [ 0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
> [ 0.139513] x15: 000000007dff5000 x14: 000000007dff5000
> [ 0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
> [ 0.150329] x11: 000000007dff7000 x10: 0000000000000000
> [ 0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
> [ 0.161042] x7 : 0000000000000000 x6 : 000000004123c000
> [ 0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
> [ 0.171860] x3 : 0000000000000000 x2 : 000000004123b000
> [ 0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0
So looking at the disassembly, we access idmap_t0sz as part of
cpu_install_idmap() and it looks like we push its page address to the
stack:
> 0xffff000008091ffc <+128>: adrp x3, 0xffff000009096000 <early_node_cpu_hwid+1440>
[...]
> 0xffff000008092044 <+200>: str x3, [x29,#96]
Then after we've come back from the asm call, we want to access idmap_t0sz
again as part of cpu_uninstall_idmap() so we pop it back off:
> 0xffff0000080920cc <+336>: ldr x3, [x29,#96]
> 0xffff0000080920d0 <+340>: ldr x0, [x3,#648]
And this access is the one that faults, because we popped off NULL.
So actually, rather than faulting on the stack access, we're managing to
load zeroes from somewhere, so it could still be indicative of page table
corruption for the stack mapping.
If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
replacing:
dc civac, cur_\()\type\()p
with:
dc ivac, cur_\()\type\()p
please? Only do this for the guest kernel, not the host. KVM will upgrade
the clean to a clean+invalidate, so it's interesting to see if this has
an effect on the behaviour.
Will
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-26 17:47 ` Will Deacon
@ 2018-06-27 8:39 ` James Morse
-1 siblings, 0 replies; 79+ messages in thread
From: James Morse @ 2018-06-27 8:39 UTC (permalink / raw)
To: Wei Xu
Cc: Will Deacon, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian
Hi Wei,
On 26/06/18 18:47, Will Deacon wrote:
> On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
>> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>> [ 0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty
>
> I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !
Some examples:
For comparison, when I boot v4.17 it looks like this:
| Linux version 4.17.0 (morse@melchizedek) (gcc version 4.9.3 20141031
| (prerelease) (Linaro GCC 2014.11)) #9886 SMP PREEMPT Thu Jun 21 10:30:55 BST
| 2018
If I apply some extra patches and make some uncommitted changes, it looks like this:
| Linux version 4.17.0-00025-ga22ca2234824-dirty (morse@melchizedek) (gcc
| version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11)) #9887 SMP PREEMPT
| Thu Jun 21 10:46:22 BST 2018
Hence we read your '4.17.0-45864-g29dcea8-dirty' line as v4.17 with extra
patches and uncommitted changes, and likewise for this v4.18-rc2 build.
I agree 7daf201 is the head commit for v4.18-rc2, but something has gone wrong
here. Could you try building from a fresh clone of Linus' tree?
(I suspect at some point you've applied a patch, and have then been merging
upstream, instead of 'fast forwarding')
Thanks,
James
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-26 17:47 ` Will Deacon
@ 2018-06-27 13:22 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-27 13:22 UTC (permalink / raw)
To: Will Deacon
Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
Liguozhu (Kenneth)
Hi Will,
On 2018/6/26 18:47, Will Deacon wrote:
> Hi Wei,
>
> On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
>> Today I tried the kernel 4.18-rc2(defconfig, no change on top) with qemu
>> 2.12.0.
>> The guest sometimes still failed to boot. But the crash reason is different.
>> Could you please share any hint?
>> Thanks!
>>
>> The guest boot log is as below:
>> ===========================
>>
>> estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
>>     -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.18-joyx \
>>     -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
>>     -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
>>
>> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>> [ 0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty
>
> I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !
Sorry, I should have pointed out that, for the build in my previous mail,
I changed the default value of CONFIG_NR_CPUS via menuconfig.
That is why the version string shows "dirty".
>
>> [ 0.048119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000288
>> [ 0.048991] Mem abort info:
>> [ 0.049267] ESR = 0x96000004
>> [ 0.049567] Exception class = DABT (current EL), IL = 32 bits
>> [ 0.050146] SET = 0, FnV = 0
>> [ 0.050446] EA = 0, S1PTW = 0
>> [ 0.050754] Data abort info:
>> [ 0.051038] ISV = 0, ISS = 0x00000004
>> [ 0.051921] CM = 0, WnR = 0
>> [ 0.054936] [0000000000000288] user address but active_mm is swapper
>> [ 0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>> [ 0.067080] Modules linked in:
>> [ 0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.18.0-rc2-58583-g7daf201-dirty #20
>> [ 0.078745] Hardware name: linux,dummy-virt (DT)
>> [ 0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
>> [ 0.088258] pc : kpti_install_ng_mappings+0x154/0x214
>> [ 0.093319] lr : kpti_install_ng_mappings+0x120/0x214
>> [ 0.098483] sp : ffff0000093fbce0
>> [ 0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
>> [ 0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
>> [ 0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
>> [ 0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
>> [ 0.123392] x21: ffff00000923b000 x20: 0000000000000000
>> [ 0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
>> [ 0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
>> [ 0.139513] x15: 000000007dff5000 x14: 000000007dff5000
>> [ 0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
>> [ 0.150329] x11: 000000007dff7000 x10: 0000000000000000
>> [ 0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
>> [ 0.161042] x7 : 0000000000000000 x6 : 000000004123c000
>> [ 0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
>> [ 0.171860] x3 : 0000000000000000 x2 : 000000004123b000
>> [ 0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0
>
> So looking at the disassembly, we access idmap_t0sz as part of
> cpu_install_idmap() and it looks like we push its page address to the
> stack:
>
>> 0xffff000008091ffc <+128>: adrp x3, 0xffff000009096000 <early_node_cpu_hwid+1440>
>
> [...]
>
>> 0xffff000008092044 <+200>: str x3, [x29,#96]
>
> Then after we've come back from the asm call, we want to access idmap_t0sz
> again as part of cpu_uninstall_idmap() so we pop it back off:
>
>> 0xffff0000080920cc <+336>: ldr x3, [x29,#96]
>> 0xffff0000080920d0 <+340>: ldr x0, [x3,#648]
>
> And this access is the one that faults, because we popped off NULL.
>
Thanks for your kind explanation!
> So actually, rather than faulting on the stack access, we're managing to
> load zeroes from somewhere, so it could still be indicative of page table
> corruption for the stack mapping.
>
> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
> replacing:
>
> dc civac, cur_\()\type\()p
>
> with:
>
> dc ivac, cur_\()\type\()p
>
> please? Only do this for the guest kernel, not the host. KVM will upgrade
> the clean to a clean+invalidate, so it's interesting to see if this has
> an effect on the behaviour.
I changed only the guest kernel, but the guest still failed to boot, and the log
is the same as in my last mail.
However, if I change it to cvac in the guest, as below, it is fairly stable:
dc cvac, cur_\()\type\()p
I have synced with our SoC team about this and hope we can find the root cause.
Do you have any further suggestions?
Thanks!
Best Regards,
Wei
>
> Will
>
> .
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-27 8:39 ` James Morse
@ 2018-06-27 13:26 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-27 13:26 UTC (permalink / raw)
To: James Morse
Cc: Will Deacon, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian
Hi James,
On 2018/6/27 9:39, James Morse wrote:
> Hi Wei,
>
> On 26/06/18 18:47, Will Deacon wrote:
>> On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
>>> [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>> [ 0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty
>>
>> I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !
>
> Some examples:
>
> For comparison, when I boot v4.17 it looks like this:
> | Linux version 4.17.0 (morse@melchizedek) (gcc version 4.9.3 20141031
> | (prerelease) (Linaro GCC 2014.11)) #9886 SMP PREEMPT Thu Jun 21 10:30:55 BST
> | 2018
>
>
> If I apply some extra patches and make some uncommitted changes, it looks like this:
> | Linux version 4.17.0-00025-ga22ca2234824-dirty (morse@melchizedek) (gcc
> | version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11)) #9887 SMP PREEMPT
> | Thu Jun 21 10:46:22 BST 2018
>
>
> Hence we read your '4.17.0-45864-g29dcea8-dirty' line as v4.17 with extra
> patches and uncommitted changes, and likewise for this v4.18-rc2.
>
> I agree 7daf201 is the head commit for v4.18-rc2, but something has gone wrong
> here. Could you try building from a fresh clone of Linus' tree?
>
> (I suspect at some point you've applied a patch, and have then been merging
> upstream, instead of 'fast forwarding')
>
Thanks for your kind guidance!
Sorry, I should have mentioned in the previous mail that I only changed the
default value of CONFIG_NR_CPUS via menuconfig.
That is why it showed as dirty.
Best Regards,
Wei
>
>
> Thanks,
>
> James
>
> .
>
^ permalink raw reply [flat|nested] 79+ messages in thread
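James's reading of the release string can be made concrete with a small C parser. parse_kver() and struct kver are hypothetical names used only for illustration; they are not kernel APIs. The sketch assumes the suffix layout that scripts/setlocalversion appends when building from a git tree that is past a tag, "-<commits>-g<abbreviated sha>", optionally followed by "-dirty" when tracked files differ from HEAD. It does not handle the tag-less "-g<sha>" form.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: the pieces of a kernel release string. */
struct kver {
    char base[64];   /* e.g. "4.17.0" or "4.18.0-rc2" */
    long commits;    /* commits past the nearest tag, 0 if no suffix */
    char sha[41];    /* abbreviated commit id, "" if no suffix */
    bool dirty;      /* "-dirty": uncommitted changes to tracked files */
};

/* True if s[0..n) is non-empty and every character satisfies pred. */
static bool all_of(const char *s, size_t n, int (*pred)(int))
{
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++)
        if (!pred((unsigned char)s[i]))
            return false;
    return true;
}

/* Split rel into base, commit count, sha and dirty flag. Only the
 * "-<count>-g<sha>[-dirty]" layout is recognised; anything else is
 * left in base unchanged. */
static bool parse_kver(const char *rel, struct kver *out)
{
    char buf[128];
    size_t len;

    assert(rel != NULL && out != NULL);
    len = strlen(rel);
    if (len >= sizeof(buf))
        return false;
    memcpy(buf, rel, len + 1);
    memset(out, 0, sizeof(*out));

    /* Optional trailing "-dirty". */
    static const char dirty[] = "-dirty";
    size_t dl = sizeof(dirty) - 1;
    if (len > dl && strcmp(buf + len - dl, dirty) == 0) {
        out->dirty = true;
        buf[len - dl] = '\0';
    }

    /* Optional "-<count>-g<sha>" suffix. */
    char *g = strrchr(buf, '-');
    if (g && g[1] == 'g' && all_of(g + 2, strlen(g + 2), isxdigit)) {
        *g = '\0';
        char *n = strrchr(buf, '-');
        if (n && all_of(n + 1, strlen(n + 1), isdigit)) {
            *n = '\0';
            out->commits = strtol(n + 1, NULL, 10);
            snprintf(out->sha, sizeof(out->sha), "%s", g + 2);
        } else {
            *g = '-';   /* not a describe suffix after all: undo */
        }
    }
    snprintf(out->base, sizeof(out->base), "%s", buf);
    return true;
}
```

For example, "4.18.0-rc2-58583-g7daf201-dirty" splits into base "4.18.0-rc2", 58583 commits, sha "7daf201" and the dirty flag set, while a clean build of an exactly tagged tree such as "4.17.0" carries no suffix at all. Since .config is not tracked by git, a menuconfig change on its own would not set the dirty flag.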
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-27 13:22 ` Wei Xu
@ 2018-06-27 13:28 ` Will Deacon
-1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-27 13:28 UTC (permalink / raw)
To: Wei Xu
Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
Liguozhu (Kenneth)
On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
> On 2018/6/26 18:47, Will Deacon wrote:
> > If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
> > replacing:
> >
> > dc civac, cur_\()\type\()p
> >
> > with:
> >
> > dc ivac, cur_\()\type\()p
> >
> > please? Only do this for the guest kernel, not the host. KVM will upgrade
> > the clean to a clean+invalidate, so it's interesting to see if this has
> > an effect on the behaviour.
>
> I only changed the guest kernel; the guest still failed to boot and the log
> is the same as in my last mail.
>
> But if I change it to cvac as below in the guest, it is fairly stable.
> dc cvac, cur_\()\type\()p
>
> I have synced with our SoC guys about this and hope we can find the reason.
> Do you have any more suggestions?
Unfortunately, not. It looks like somehow clean+invalidate is behaving
just as an invalidate, and we're corrupting the page table as a result.
Hopefully the SoC guys will figure it out.
Will
^ permalink raw reply [flat|nested] 79+ messages in thread
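Will's hypothesis, that clean+invalidate is behaving as a plain invalidate, can be illustrated with a toy model of a single write-back cache line. This is a deliberately simplified sketch: struct toy_sys and all helper names are hypothetical, a cacheable store is used to create the dirty line, and the model ignores both the MMU-off accesses of the real proc.S walk and KVM's stage 2 handling. It only shows why losing the "clean" half of DC CIVAC can drop an update such as setting the nG bit (bit 11, the bit the proc.S macros test with tbnz #11).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID (UINT64_C(1) << 0)   /* bit checked by "tbz \type, #0"   */
#define PTE_NG    (UINT64_C(1) << 11)  /* bit checked by "tbnz \type, #11" */

/* One write-back cache line in front of one word of memory. */
struct toy_sys {
    uint64_t mem;    /* copy at the point of coherency */
    uint64_t line;   /* cached copy */
    bool valid;      /* the line currently holds the address */
    bool dirty;      /* the cached copy is newer than memory */
};

/* Cacheable store: allocate the line and leave it dirty. */
static void cached_store(struct toy_sys *s, uint64_t v)
{
    s->line = v;
    s->valid = true;
    s->dirty = true;
}

/* DC CVAC: write a dirty line back to memory, keep it cached. */
static void dc_cvac(struct toy_sys *s)
{
    if (s->valid && s->dirty) {
        s->mem = s->line;
        s->dirty = false;
    }
}

/* DC IVAC: drop the line; dirty data is discarded, not written back. */
static void dc_ivac(struct toy_sys *s)
{
    s->valid = false;
    s->dirty = false;
}

/* DC CIVAC: clean, then invalidate. */
static void dc_civac(struct toy_sys *s)
{
    dc_cvac(s);
    dc_ivac(s);
}

/* Hypothetical faulty CIVAC that acts "just as an invalidate". */
static void dc_civac_as_ivac(struct toy_sys *s)
{
    dc_ivac(s);
}

/* What a later reader observes: the line if present, otherwise memory. */
static uint64_t observe(const struct toy_sys *s)
{
    return s->valid ? s->line : s->mem;
}
```

With the correct CIVAC the updated entry reaches memory; with the faulty variant the reader falls back to the stale value in memory and the nG update is gone. A clean-only CVAC also gets the update to memory, which is at least consistent with the cvac experiment being more stable.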
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-27 13:28 ` Will Deacon
@ 2018-06-27 13:32 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-27 13:32 UTC (permalink / raw)
To: Will Deacon
Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
Liguozhu (Kenneth)
Hi Will,
On 2018/6/27 14:28, Will Deacon wrote:
> On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
>> On 2018/6/26 18:47, Will Deacon wrote:
>>> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
>>> replacing:
>>>
>>> dc civac, cur_\()\type\()p
>>>
>>> with:
>>>
>>> dc ivac, cur_\()\type\()p
>>>
>>> please? Only do this for the guest kernel, not the host. KVM will upgrade
>>> the clean to a clean+invalidate, so it's interesting to see if this has
>>> an effect on the behaviour.
>>
>> I only changed the guest kernel; the guest still failed to boot and the log
>> is the same as in my last mail.
>>
>> But if I change it to cvac as below in the guest, it is fairly stable.
>> dc cvac, cur_\()\type\()p
>>
>> I have synced with our SoC guys about this and hope we can find the reason.
>> Do you have any more suggestions?
>
> Unfortunately, not. It looks like somehow clean+invalidate is behaving
> just as an invalidate, and we're corrupting the page table as a result.
>
> Hopefully the SoC guys will figure it out.
Thanks anyway!
I will post an update here if there is any news.
Best Regards,
Wei
>
> Will
>
> .
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-27 13:26 ` Wei Xu
@ 2018-06-28 8:45 ` James Morse
-1 siblings, 0 replies; 79+ messages in thread
From: James Morse @ 2018-06-28 8:45 UTC (permalink / raw)
To: Wei Xu
Cc: Will Deacon, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian
Hi Wei,
On 27/06/18 14:26, Wei Xu wrote:
> Sorry, I should have mentioned in the previous mail that I only changed the
> default value of CONFIG_NR_CPUS via menuconfig.
> That is why it showed as dirty.
(menuconfig changes don't show up like this)
More than 64 CPUs ... Is this system running more VMs than it has VMIDs? Too few
VMIDs does work with KVM; it's just going to trigger rollover frequently.
Just to check, what kernel version is the host running? Does it have commit
f0cf47d939d0 ("KVM: arm/arm64: Close VMID generation race")
(looks like that went in as a fix for v4.17-rc3)
Are you running (lots of) other VMs whenever this happens? Do they have multiple
vcpus? (I'm thinking of the scenario in that patch's description)
Is the host system otherwise idle when this happens?
(If not, can you reproduce the issue without exhausting the VMIDs?)
It may be that writing back the page-table entries with the MMU off, and
changing the cache maintenance are just changing the timing of something else.
Thanks,
James
^ permalink raw reply [flat|nested] 79+ messages in thread
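The rollover behaviour James mentions can be sketched as a generation-based allocator. This is a simplified, hypothetical model (struct vmid_alloc and its helpers are illustrative names, not KVM's code): each call stands for a VM that needs a fresh VMID, VMID 0 is never handed out (an assumption modeled on the KVM/arm code of that era), and running out of IDs simply bumps a generation counter, whereas the real code also has to force all CPUs out of their guests and flush the TLBs on rollover.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified VMID allocator state. */
struct vmid_alloc {
    unsigned int bits;     /* VMID width: 8 or 16 on arm64 */
    uint32_t next;         /* next VMID to hand out */
    uint64_t generation;   /* bumped on every rollover */
    uint64_t rollovers;    /* how many times the ID space ran out */
};

static void vmid_init(struct vmid_alloc *a, unsigned int bits)
{
    assert(bits > 0 && bits <= 16);
    a->bits = bits;
    a->next = 1;           /* VMID 0 is not handed out in this sketch */
    a->generation = 1;
    a->rollovers = 0;
}

/* Hand out a VMID for a VM whose previous VMID (if any) is stale. */
static uint32_t vmid_alloc_next(struct vmid_alloc *a)
{
    if (a->next == 0) {    /* wrapped: start a new generation */
        a->generation++;
        a->rollovers++;
        a->next = 1;
    }
    uint32_t vmid = a->next;
    a->next = (a->next + 1) & ((UINT32_C(1) << a->bits) - 1);
    return vmid;
}
```

With 8-bit VMIDs, 255 allocations fit in one generation and the 256th triggers a rollover; with 16-bit VMIDs a generation lasts 65535 allocations. A single VM with one vCPU never comes close to either limit.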
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-28 8:45 ` James Morse
@ 2018-06-28 10:20 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-28 10:20 UTC (permalink / raw)
To: James Morse
Cc: Will Deacon, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian
Hi James,
On 2018/6/28 9:45, James Morse wrote:
> Hi Wei,
>
> On 27/06/18 14:26, Wei Xu wrote:
>> Sorry, I should have mentioned in the previous mail that I only changed the
>> default value of CONFIG_NR_CPUS via menuconfig.
>> That is why it showed as dirty.
>
> (menuconfig changes don't show up like this)
Thanks!
Sorry, yes, you are right.
The version string no longer shows "dirty" after I reverted my change to proc.S.
>
>
> More than 64 CPUs ... Is this system running more VMs than it has VMIDs? Too few
> VMIDs does work with KVM; it's just going to trigger rollover frequently.
>
No, we just ran one VM.
> Just to check, what kernel version is the host running? Does it have commit
> f0cf47d939d0 ("KVM: arm/arm64: Close VMID generation race")
> (looks like that went in as a fix for v4.17-rc3)
Yes, the host is running 4.18-rc2, the same as the guest, so it includes the above commit.
>
> Are you running (lots of) other VMs whenever this happens? Do they have multiple
> vcpus? (I'm thinking of the scenario in that patch's description)
No, we only ran one VM with a single vCPU.
>
> Is the host system otherwise idle when this happens?
> (If not, can you reproduce the issue without exhausting the VMIDs?)
>
>
> It may be that writing back the page-table entries with the MMU off, and
> changing the cache maintenance are just changing the timing of something else.
>
Yes, maybe. We are now debugging this together with the SoC guys.
Thanks!
Best Regards,
Wei
>
> Thanks,
>
> James
>
> .
>
^ permalink raw reply [flat|nested] 79+ messages in thread
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-27 13:28 ` Will Deacon
@ 2018-06-28 14:50 ` Wei Xu
-1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-28 14:50 UTC (permalink / raw)
To: Will Deacon
Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
Liguozhu (Kenneth)
Hi Will,
On 2018/6/27 14:28, Will Deacon wrote:
> On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
>> On 2018/6/26 18:47, Will Deacon wrote:
>>> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
>>> replacing:
>>>
>>> dc civac, cur_\()\type\()p
>>>
>>> with:
>>>
>>> dc ivac, cur_\()\type\()p
>>>
>>> please? Only do this for the guest kernel, not the host. KVM will upgrade
>>> the clean to a clean+invalidate, so it's interesting to see if this has
>>> an effect on the behaviour.
>>
>> I only changed the guest kernel; the guest still failed to boot and the log
>> is the same as in my last mail.
>>
>> But if I change it to cvac as below in the guest, it is fairly stable.
>> dc cvac, cur_\()\type\()p
>>
>> I have synced with our SoC guys about this and hope we can find the reason.
>> Do you have any more suggestions?
>
> Unfortunately, not. It looks like somehow clean+invalidate is behaving
> just as an invalidate, and we're corrupting the page table as a result.
>
> Hopefully the SoC guys will figure it out.
After replacing the dmb with a dsb in both __idmap_kpti_get_pgtable_ent and
__idmap_kpti_put_pgtable_ent_ng, we tested 20 times and could not reproduce
the issue.
We will continue the stress testing today and will report the results tomorrow.
The dsb in __idmap_kpti_get_pgtable_ent is to make sure the dc has completed, so
that the following ldr gets the latest data.
The dsb in __idmap_kpti_put_pgtable_ent_ng is to make sure the str completes
before the dc. Although a dmb can guarantee the order of the str and dc at the
L2 cache, it cannot guarantee the order on the bus.
What do you think?
Thanks!
----
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 03646e6..bb767ea 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -209,7 +209,7 @@ ENDPROC(idmap_cpu_replace_ttbr1)
.macro __idmap_kpti_get_pgtable_ent, type
dc cvac, cur_\()\type\()p // Ensure any existing dirty
- dmb sy // lines are written back before
+ dsb sy // lines are written back before
ldr \type, [cur_\()\type\()p] // loading the entry
tbz \type, #0, skip_\()\type // Skip invalid and
tbnz \type, #11, skip_\()\type // non-global entries
@@ -218,8 +218,9 @@ ENDPROC(idmap_cpu_replace_ttbr1)
.macro __idmap_kpti_put_pgtable_ent_ng, type
orr \type, \type, #PTE_NG // Same bit for blocks and pages
str \type, [cur_\()\type\()p] // Update the entry and ensure
- dmb sy // that it is visible to all
+ dsb sy // that it is visible to all
dc civac, cur_\()\type\()p // CPUs.
Best Regards,
Wei
>
> Will
>
> .
>
^ permalink raw reply related [flat|nested] 79+ messages in thread
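The intent Wei gives for the barrier in __idmap_kpti_get_pgtable_ent (the clean must take effect before the entry is loaded) can be sketched with a toy model in which the walker's loads bypass the cache, as they do while the MMU is off. walker_sees() and the other names are hypothetical. The sketch only shows what goes wrong if the load is observed before the write-back; it says nothing about whether a DMB or a DSB is the right barrier to enforce that order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One word of memory plus a dirty cached copy of it. */
struct toy_sys {
    uint64_t mem;    /* what a non-cacheable (MMU-off) load returns */
    uint64_t line;   /* newer value still sitting in the cache */
    bool dirty;
};

/* DC CVAC: write the dirty line back to memory. */
static void dc_cvac(struct toy_sys *s)
{
    if (s->dirty) {
        s->mem = s->line;
        s->dirty = false;
    }
}

/* MMU-off load: goes straight to memory and never hits the cache. */
static uint64_t nc_load(const struct toy_sys *s)
{
    return s->mem;
}

enum order { CLEAN_THEN_LOAD, LOAD_THEN_CLEAN };

/* Value the page-table walker reads for a given order of the two operations. */
static uint64_t walker_sees(uint64_t in_mem, uint64_t in_cache, enum order o)
{
    struct toy_sys s = { .mem = in_mem, .line = in_cache, .dirty = true };
    uint64_t v;

    if (o == CLEAN_THEN_LOAD) {
        dc_cvac(&s);
        v = nc_load(&s);
    } else {
        v = nc_load(&s);   /* load overtakes the clean: stale value */
        dc_cvac(&s);
    }
    return v;
}
```

If the cache holds 0x801 (valid plus nG) while memory still holds 0x1, the walker reads 0x801 when the clean comes first and the stale 0x1 when the load overtakes it.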
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-28 14:50 ` Wei Xu
@ 2018-06-28 15:34 ` Mark Rutland
-1 siblings, 0 replies; 79+ messages in thread
From: Mark Rutland @ 2018-06-28 15:34 UTC (permalink / raw)
To: Wei Xu
Cc: Will Deacon, James Morse, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
Liguozhu (Kenneth)
On Thu, Jun 28, 2018 at 03:50:40PM +0100, Wei Xu wrote:
> Hi Will,
>
> On 2018/6/27 14:28, Will Deacon wrote:
> > On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
> >> On 2018/6/26 18:47, Will Deacon wrote:
> >>> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
> >>> replacing:
> >>>
> >>> dc civac, cur_\()\type\()p
> >>>
> >>> with:
> >>>
> >>> dc ivac, cur_\()\type\()p
> >>>
> >>> please? Only do this for the guest kernel, not the host. KVM will upgrade
> >>> the clean to a clean+invalidate, so it's interesting to see if this has
> >>> an effect on the behaviour.
> >>
> >> I only changed the guest kernel; the guest still failed to boot and the log
> >> is the same as in my last mail.
> >>
> >> But if I change it to cvac as below in the guest, it is fairly stable.
> >> dc cvac, cur_\()\type\()p
> >>
> >> I have synced with our SoC guys about this and hope we can find the reason.
> >> Do you have any more suggestions?
> >
> > Unfortunately, not. It looks like somehow clean+invalidate is behaving
> > just as an invalidate, and we're corrupting the page table as a result.
> >
> > Hopefully the SoC guys will figure it out.
>
> After replacing the dmb with a dsb in both __idmap_kpti_get_pgtable_ent and
> __idmap_kpti_put_pgtable_ent_ng, we tested 20 times and could not reproduce
> the issue.
> We will continue the stress testing today and will report the results tomorrow.
>
> The dsb in __idmap_kpti_get_pgtable_ent is to make sure the dc has completed, so
> that the following ldr gets the latest data.
>
> The dsb in __idmap_kpti_put_pgtable_ent_ng is to make sure the str completes
> before the dc. Although a dmb can guarantee the order of the str and dc at the
> L2 cache, it cannot guarantee the order on the bus.
The architecture mandates that a DMB must provide this ordering, so that
would be an erratum.
Per ARM DDI 0487C.a, page D3-2069, "Ordering and completion of data and instruction
cache instructions":
All data cache instructions, other than DC ZVA, that specify an
address:
* Can execute in any order relative to loads or stores that access any
address with the Device memory attribute, or with Normal memory with
Inner Non-cacheable attribute unless a DMB or DSB is executed
between the instructions.
Note that we rely on this ordering in head.S when creating the page
tables and setting up the boot mode. We also rely on this for the pmem
API.
Thanks,
Mark.
^ permalink raw reply [flat|nested] 79+ messages in thread
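The rule Mark quotes can be written down as a small decision function. The enum and function names are hypothetical, and the function encodes only the quoted sentence from ARM DDI 0487C.a: it covers data cache maintenance by VA other than DC ZVA, and it reports accesses the sentence does not mention as not covered rather than guessing. With the MMU off, as in the KPTI page-table walk, data accesses are treated as Device-nGnRnE, so the Device case is the relevant one here.

```c
#include <assert.h>

/* Attribute of the load/store being compared with a DC-by-VA operation
 * (any DC by VA except DC ZVA, which the quoted rule excludes). */
enum mem_attr { ATTR_DEVICE, ATTR_NORMAL_INNER_NC, ATTR_NORMAL_CACHEABLE };
enum barrier  { BARRIER_NONE, BARRIER_DMB, BARRIER_DSB };
enum verdict  { MAY_REORDER, ORDERED, NOT_COVERED };

/* Encodes: "Can execute in any order relative to loads or stores that
 * access any address with the Device memory attribute, or with Normal
 * memory with Inner Non-cacheable attribute unless a DMB or DSB is
 * executed between the instructions." */
static enum verdict dc_vs_access(enum mem_attr attr, enum barrier b)
{
    /* The quoted sentence says nothing about cacheable accesses. */
    if (attr == ATTR_NORMAL_CACHEABLE)
        return NOT_COVERED;
    /* Either barrier is sufficient; the rule does not require a DSB. */
    return b == BARRIER_NONE ? MAY_REORDER : ORDERED;
}
```

Both DMB and DSB yield ORDERED, which is Mark's point: a DMB is architecturally sufficient for the str/dc and dc/ldr pairs in proc.S, so a part that needs a DSB there would be exhibiting an erratum.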
* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-28 15:34 ` Mark Rutland
0 siblings, 0 replies; 79+ messages in thread
From: Mark Rutland @ 2018-06-28 15:34 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Jun 28, 2018 at 03:50:40PM +0100, Wei Xu wrote:
> Hi Will,
>
> On 2018/6/27 14:28, Will Deacon wrote:
> > On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
> >> On 2018/6/26 18:47, Will Deacon wrote:
> >>> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
> >>> replacing:
> >>>
> >>> dc civac, cur_\()\type\()p
> >>>
> >>> with:
> >>>
> >>> dc ivac, cur_\()\type\()p
> >>>
> >>> please? Only do this for the guest kernel, not the host. KVM will upgrade
> >>> the clean to a clean+invalidate, so it's interesting to see if this has
> >>> an effect on the behaviour.
> >>
> >> Only changed the guest kernel, the guest still failed to boot and the log
> >> is same with the last mail.
> >>
> >> But if I changed to cvac as below for the guest, it is kind of stable.
> >> dc cvac, cur_\()\type\()p
> >>
> >> I have synced with our SoC guys about this and hope we can find the reason.
> >> Do you have any more suggestion?
> >
> > Unfortunately, not. It looks like somehow clean+invalidate is behaving
> > just as an invalidate, and we're corrupting the page table as a result.
> >
> > Hopefully the SoC guys will figure it out.
>
> After replaced the dmb with dsb in both __idmap_kpti_get_pgtable_ent and
> __idmap_kpti_put_pgtable_ent_ng, we tested 20 times and we can not reproduce
> the issue.
> Today we will continue to do the stress testing and will update the result tomorrow.
>
> The dsb in __idmap_kpti_get_pgtable_ent is to make sure the dc has been done and
> the following ldr can get the latest data.
>
> The dsb in __idmap_kpti_put_pgtable_ent_ng is to make sure the str will be done
> before dc. Although dmb can guarantee the order of the str and dc on the L2 cache,
> dmb can not guarantee the order on the bus.
The architecture mandates that a DMB must provide this ordering, so that
would be an erratum.
Per ARM DDI 0487C.a, page D3-2069, "Ordering and completion of data and instruction
cache instructions":
All data cache instructions, other than DC ZVA, that specify an
address:
* Can execute in any order relative to loads or stores that access any
address with the Device memory attribute,or with Normal memory with
Inner Non-cacheable attribute unless a DMB or DSB is executed
between the instructions.
Note that we rely on this ordering in head.S when creating the page
tables and setting up the boot mode. We also rely on this for the pmem
API.
Thanks,
Mark.
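The two access patterns debated above (clean+invalidate, DMB, load; store, DMB, clean+invalidate) can be sketched in C. This is a minimal, hypothetical illustration: `get_global_entry`, `put_entry_ng` and `mark_table_ng` are made-up names modelled loosely on the `__idmap_kpti_get_pgtable_ent` / `__idmap_kpti_put_pgtable_ent_ng` macros mentioned in this thread, which are written in assembly and also walk table descriptors, which this sketch does not. It assumes GCC/Clang inline asm on arm64 Linux user space, where Normal cacheable memory is used, so it only shows where the barriers sit and not the MMU-off Device-memory case the thread is about. On other architectures the DC is skipped and the DMB is replaced by a C11 fence, so only the walk logic is exercised.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_VALID (UINT64_C(1) << 0)  /* bit 0: descriptor is valid */
#define DESC_NG    (UINT64_C(1) << 11) /* bit 11: nG, same bit for blocks and pages */

/* Clean+invalidate the cache line holding *p to the PoC (arm64 only). */
static inline void dc_civac(volatile uint64_t *p)
{
#if defined(__aarch64__)
	__asm__ volatile("dc civac, %0" : : "r"(p) : "memory");
#else
	(void)p; /* no equivalent here; the sketch only exercises the logic */
#endif
}

/* DMB SY: orders prior accesses and CMOs before later ones, without
 * waiting for them to complete. */
static inline void dmb_sy(void)
{
#if defined(__aarch64__)
	__asm__ volatile("dmb sy" : : : "memory");
#else
	atomic_thread_fence(memory_order_seq_cst); /* stand-in, not a model of DMB */
#endif
}

/* "get": clean+invalidate the line, then load the entry. Returns false for
 * invalid or already non-global entries, which the caller skips. */
static bool get_global_entry(volatile uint64_t *p, uint64_t *out)
{
	dc_civac(p);
	dmb_sy(); /* DC ordered before the load */
	uint64_t e = *p;
	if (!(e & DESC_VALID) || (e & DESC_NG))
		return false;
	*out = e;
	return true;
}

/* "put": store the entry with nG set, then clean+invalidate it to the PoC. */
static void put_entry_ng(volatile uint64_t *p, uint64_t e)
{
	assert(e & DESC_VALID); /* only valid entries are rewritten */
	*p = e | DESC_NG;
	dmb_sy(); /* store ordered before the DC */
	dc_civac(p);
}

/* Hypothetical flat walk: mark every valid, global descriptor as nG. */
static size_t mark_table_ng(volatile uint64_t *table, size_t n)
{
	size_t updated = 0;
	for (size_t i = 0; i < n; i++) {
		uint64_t e;
		if (get_global_entry(&table[i], &e)) {
			put_entry_ng(&table[i], e);
			updated++;
		}
	}
	return updated;
}
```

For a table of `{0x3, 0x0, 0x803, 0x1}`, the first and last entries are valid and global and get rewritten to `0x803` and `0x801`; the invalid entry and the entry that already has nG are skipped. The DC CIVAC can run from user space on arm64 Linux because the kernel sets SCTLR_EL1.UCI.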
* Re: RE: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
[not found] ` <etPan.5b3507f7.914aa16.1d6b@localhost>
@ 2018-06-28 16:24 ` Mark Rutland
2018-06-29 8:47 ` Marc Zyngier
1 sibling, 0 replies; 79+ messages in thread
From: Mark Rutland @ 2018-06-28 16:24 UTC (permalink / raw)
To: Wangxuefeng (E)
Cc: xuwei (O),
will.deacon, james.morse, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution),
Libeijian, Zhangxiquan, dingshuai, Guohanjun (Hanjun Guo),
Liguozhu (Kenneth)
On Thu, Jun 28, 2018 at 04:08:24PM +0000, Wangxuefeng (E) wrote:
> Hi Mark,
> Do you mean that DMB must ensure the completion of prior loads/stores
> or CMOs and make the data visible to all observers (whether device or
> cacheable)? Does DMB do more than just keep order?
Not quite -- DMB does not guarantee completion.
However, DMB must guarantee that loads/stores and CMOs are ordered on
the bus, all the way to the PoC.
Thanks,
Mark.
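The distinction Mark draws between ordering (DMB) and completion (DSB) can be made explicit by turning the barrier into a parameter. This is a minimal, hypothetical sketch: `cmo_barrier` and `read_after_clean` are made-up names, and it assumes GCC/Clang inline asm on arm64 Linux user space, with a C11 fence standing in elsewhere. Both variants return the same value; the only difference is whether the CPU waits for the DC to finish before the load, which a functional test cannot observe.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Barrier placed between a DC CIVAC and the access that depends on it. */
enum cmo_barrier {
	CMO_BARRIER_DMB, /* ordering only: what the architecture says is enough */
	CMO_BARRIER_DSB, /* ordering plus completion: the HiSilicon workaround */
};

static inline void cmo_barrier(enum cmo_barrier kind)
{
#if defined(__aarch64__)
	if (kind == CMO_BARRIER_DSB)
		/* wait until prior accesses and cache maintenance have completed */
		__asm__ volatile("dsb sy" : : : "memory");
	else
		/* order prior accesses and maintenance before later ones; no wait */
		__asm__ volatile("dmb sy" : : : "memory");
#else
	(void)kind;
	atomic_thread_fence(memory_order_seq_cst); /* portable stand-in only */
#endif
}

/* Clean+invalidate the line holding *p, then read it back. */
static uint64_t read_after_clean(volatile uint64_t *p, enum cmo_barrier kind)
{
#if defined(__aarch64__)
	__asm__ volatile("dc civac, %0" : : "r"(p) : "memory");
#endif
	cmo_barrier(kind);
	return *p;
}
```

On a conforming implementation the DMB variant is sufficient. The thread reports that only the DSB variant was stable on the HiSilicon part, which is why Mark points out that needing it would indicate an erratum.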
* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
[not found] ` <etPan.5b3507f7.914aa16.1d6b@localhost>
2018-06-28 16:24 ` Mark Rutland
@ 2018-06-29 8:47 ` Marc Zyngier
1 sibling, 0 replies; 79+ messages in thread
From: Marc Zyngier @ 2018-06-29 8:47 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 28 Jun 2018 20:18:13 +0100,
Zhangxiquan <zhangxiquan@hisilicon.com> wrote:
>
> Hi Mark,
>
> I'm inclined to agree with you that DMB is enough to order DC against loads/stores.
>
> I just want to know: has this code ever passed on any of Arm's own
> implementations, such as A75 or A72?
This code has been tested on most ARM implementations, as this is what
we used to develop it. So far, you have the only example we know of
where this sequence doesn't work as expected.
More importantly, this piece of code is written to match the ARMv8
architecture requirements, and we can only expect implementations to
follow the exact same requirements.
Thanks,
M.
--
Jazz is not dead, it just smells funny.
* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
2018-06-28 16:24 ` Mark Rutland
@ 2018-06-29 9:59 ` Mark Rutland
0 replies; 79+ messages in thread
From: Mark Rutland @ 2018-06-29 9:59 UTC (permalink / raw)
To: Zhangxiquan
Cc: Wangxuefeng (E), xuwei (O),
will.deacon, james.morse, catalin.marinas, Linuxarm, Zhangyi ac,
suzuki.poulose, marc.zyngier, Xiongfanggou (James),
linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
Turing Solution), Libeijian, dingshuai, Guohanjun (Hanjun Guo),
Liguozhu (Kenneth)
On Thu, Jun 28, 2018 at 07:24:30PM +0000, Zhangxiquan wrote:
> Do you think this ordering guarantee (between DC and loads/stores) applies
> to cacheable memory only, or also to device memory?
This also applies for device memory.
As I quoted previously, from ARM DDI 0487C.a page D3-2069:
All data cache instructions, other than DC ZVA, that specify an
address:
* Can execute in any order relative to loads or stores that access any
address with the Device memory attribute, or with Normal memory with
Inner Non-cacheable attribute unless a DMB or DSB is executed
between the instructions.
i.e. a DMB is sufficient to provide order between DC and loads/stores
which access device memory.
Thanks,
Mark.
end of thread [~2018-06-29 10:00 UTC]
Thread overview: 79+ messages
2018-06-20 14:18 KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform Wei Xu
2018-06-20 14:42 ` Will Deacon
2018-06-20 15:52 ` Wei Xu
2018-06-20 15:54 ` James Morse
2018-06-20 16:25 ` Wei Xu
2018-06-20 16:28 ` Will Deacon
2018-06-20 16:33 ` Wei Xu
2018-06-21 8:38 ` James Morse
2018-06-21 9:00 ` Marc Zyngier
2018-06-21 9:18 ` Will Deacon
2018-06-21 10:14 ` Wei Xu
2018-06-21 10:54 ` Will Deacon
2018-06-22 8:33 ` Wei Xu
2018-06-22 9:23 ` Will Deacon
2018-06-22 10:45 ` Wei Xu
2018-06-22 11:16 ` Will Deacon
2018-06-22 13:18 ` Wei Xu
2018-06-22 13:31 ` Will Deacon
2018-06-22 13:46 ` Wei Xu
2018-06-22 14:43 ` Will Deacon
2018-06-22 15:26 ` Wei Xu
2018-06-22 14:28 ` Mark Rutland
2018-06-22 15:28 ` Wei Xu
2018-06-22 15:41 ` Will Deacon
2018-06-22 16:02 ` Wei Xu
2018-06-21 9:20 ` Wei Xu
2018-06-26 17:16 ` Wei Xu
2018-06-26 17:47 ` Will Deacon
2018-06-27 8:39 ` James Morse
2018-06-27 13:26 ` Wei Xu
2018-06-28 8:45 ` James Morse
2018-06-28 10:20 ` Wei Xu
2018-06-27 13:22 ` Wei Xu
2018-06-27 13:28 ` Will Deacon
2018-06-27 13:32 ` Wei Xu
2018-06-28 14:50 ` Wei Xu
2018-06-28 15:34 ` Mark Rutland
[not found] ` <etPan.5b3507f7.914aa16.1d6b@localhost>
2018-06-28 16:24 ` RE: " Mark Rutland
2018-06-29 9:59 ` Mark Rutland
2018-06-29 8:47 ` Marc Zyngier