* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
From: Wei Xu @ 2018-06-20 14:18 UTC
  To: catalin.marinas, will.deacon, suzuki.poulose, dave.martin,
	mark.rutland, james.morse, marc.zyngier
  Cc: linux-arm-kernel, linux-kernel, Linuxarm, Hanjun Guo, xiexiuqi,
	huangdaode, Chenxin (Charles), Xiongfanggou (James),
	Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi All,

We have observed that a KVM guest sometimes fails to boot because of a
kernel stack overflow when KPTI is enabled on a HiSilicon arm64 platform.

We also tested different kernel versions and found that it only happens
when both KPTI and KVM (-enable-kvm and -cpu host) are enabled for the
guest.
The detailed results are in the table below.

+---------+-----------+--------+------------+----------------+
| host    | host KPTI | guest  | guest KPTI | KVM guest      |
| kernel  | enabled   | kernel | enabled    | boot result    |
+---------+-----------+--------+------------+----------------+
| 4.17    | Y         | 4.17   | Y          | stack overflow |
+---------+-----------+--------+------------+----------------+
| 4.17    | Y         | 4.16   | NA         | OK             |
+---------+-----------+--------+------------+----------------+
| 4.16    | NA        | 4.17   | Y          | stack overflow |
+---------+-----------+--------+------------+----------------+
| 4.16    | NA        | 4.16   | NA         | OK             |
+---------+-----------+--------+------------+----------------+

A simple workaround is to add this platform to the "kpti_safe_list",
but that does not actually resolve the issue.
Could you please share any hints on how to resolve this kind of issue?
Thanks!

Another issue we found is that "kpti_install_ng_mappings" is invoked
even when "kpti=off" has been added to the kernel command line. Is that
expected? This happens because "kpti" is not an *early* param, so
"init_cpu_features" is invoked before the param is parsed.

The command we are using to run the guest is:

     ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
         -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image \
         -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
         -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"

The log is below:

         [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
         [    0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun 15 21:39:52 CST 2018
         [    0.000000] Machine model: linux,dummy-virt
         [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
         [    0.000000] bootconsole [pl11] enabled
         [    0.000000] efi: Getting EFI parameters from FDT:
         [    0.000000] efi: UEFI not found.
         [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
         [    0.000000] NUMA: No NUMA configuration found
         [    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
         [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
         [    0.000000] Zone ranges:
         [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
         [    0.000000]   Normal   empty
         [    0.000000] Movable zone start for each node
         [    0.000000] Early memory node ranges
         [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
         [    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
         [    0.000000] psci: probing for conduit method from DT.
         [    0.000000] psci: PSCIv1.0 detected in firmware.
         [    0.000000] psci: Using standard PSCI v0.2 function IDs
         [    0.000000] psci: Trusted OS migration not required
         [    0.000000] psci: SMC Calling Convention v1.1
         [    0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
         [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) s57984 r8192 d32128 u98304
         [    0.000000] Detected VIPT I-cache on CPU0
         [    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
         [    0.000000] CPU features: detected: Hardware dirty bit management
         [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 258048
         [    0.000000] Policy zone: DMA32
         [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
         [    0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
         [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
         [    0.000000] Preemptible hierarchical RCU implementation.
         [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
         [    0.000000]     Tasks RCU enabled.
         [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
         [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
         [    0.000000] GICv3: Distributor has no Range Selector support
         [    0.000000] GICv3: no VLPI support, no direct LPI support
         [    0.000000] ITS [mem 0x08080000-0x0809ffff]
         [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
         [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
         [    0.000000] GIC: using LPI property table @0x000000007d850000
         [    0.000000] ITS: Allocated 1792 chunks for LPIs
         [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
         [    0.000000] CPU0: using LPI pending table @0x000000007d860000
         [    0.000000] GIC: PPI11 is secure or misconfigured
         [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
         [    0.000000] arch_timer: WARNING: Please fix your firmware
         [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
         [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
         [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
         [    0.000854] Console: colour dummy device 80x25
         [    0.001423] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
         [    0.002478] pid_max: default: 32768 minimum: 301
         [    0.002962] Security Framework initialized
         [    0.003541] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
         [    0.004347] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
         [    0.005058] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
         [    0.005844] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
         [    0.025949] ASID allocator initialised with 32768 entries
         [    0.029958] Hierarchical SRCU implementation.
         [    0.034328] Platform MSI: its domain created
         [    0.034787] PCI/MSI: /intc/its domain created
         [    0.035359] EFI services will not be available.
         [    0.037987] smp: Bringing up secondary CPUs ...
         [    0.038454] smp: Brought up 1 node, 1 CPU
         [    0.038859] SMP: Total of 1 processors activated.
         [    0.039338] CPU features: detected: GIC system register CPU interface
         [    0.039988] CPU features: detected: Privileged Access Never
         [    0.040560] CPU features: detected: User Access Override
         [    0.041093] CPU features: detected: RAS Extension Support
         [    0.042947] Insufficient stack space to handle exception!
         [    0.042949] ESR: 0x96000046 -- DABT (current EL)
         [    0.043963] FAR: 0xffff0000093a80e0
         [    0.045794] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
         [    0.052181] IRQ stack: [0xffff000008000000..0xffff000008004000]
         [    0.058572] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
         [    0.065068] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #6
         [    0.073138] Hardware name: linux,dummy-virt (DT)
         [    0.077831] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
         [    0.082661] pc : el1_sync+0x0/0xb0
         [    0.086152] lr : kpti_install_ng_mappings+0x120/0x214
         [    0.091219] sp : ffff0000093a80e0
         [    0.094589] x29: ffff0000093abce0 x28: ffff000008ea9000
         [    0.100004] x27: ffff000008ea9000 x26: ffff0000091f7000
         [    0.105424] x25: ffff00000906d000 x24: ffff000009191000
         [    0.110733] x23: ffff000008ea9000 x22: 0000000041190000
         [    0.116148] x21: ffff0000091f7000 x20: 0000000000000000
         [    0.121564] x19: ffff000009190000 x18: 000000003455d99d
         [    0.126977] x17: 0000000000000001 x16: 00f8000040ffff13
         [    0.132288] x15: 000000007eff6000 x14: 000000007eff6000
         [    0.137704] x13: 00f800007fe00f11 x12: 000000007eff8000
         [    0.143013] x11: 000000007eff8000 x10: 0000000000000000
         [    0.148426] x9 : 000000007eff9000 x8 : 000000007eff9000
         [    0.153841] x7 : 0000000000000000 x6 : 00000000411f8000
         [    0.159154] x5 : 00000000411f8000 x4 : 0000000040a443d4
         [    0.164567] x3 : 00000000411f7000 x2 : 00000000411f7000
         [    0.169981] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
         [    0.175395] Kernel panic - not syncing: kernel stack overflow
         [    0.181178] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #6
         [    0.189248] Hardware name: linux,dummy-virt (DT)
         [    0.193945] Call trace:
         [    0.196470]  dump_backtrace+0x0/0x180
         [    0.200201]  show_stack+0x14/0x1c
         [    0.203574]  dump_stack+0x90/0xb0
         [    0.206946]  panic+0x138/0x2a0
         [    0.210075]  __stack_chk_fail+0x0/0x18
         [    0.213922]  handle_bad_stack+0x118/0x124
         [    0.218012]  __bad_stack+0x88/0x8c
         [    0.221393]  el1_sync+0x0/0xb0
         [    0.224520] Unable to handle kernel paging request at virtual address ffff0000093abce0
         [    0.232586] Mem abort info:
         [    0.235362]   ESR = 0x96000006
         [    0.238488]   Exception class = DABT (current EL), IL = 32 bits
         [    0.244506]   SET = 0, FnV = 0
         [    0.247632]   EA = 0, S1PTW = 0
         [    0.250873] Data abort info:
         [    0.253765]   ISV = 0, ISS = 0x00000006
         [    0.257725]   CM = 0, WnR = 0
         [    0.260735] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
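
As a reading aid for the panic header in the log, here is a minimal sketch that decodes the two ESR values and checks where the faulting address sits relative to the reported stacks. It is not kernel code. The helper names decode_esr and locate are made up for illustration, only the fault codes seen in this log are named, and the field positions (EC in bits 31:26, IL in bit 25, WnR in ISS bit 6, DFSC in ISS bits 5:0) follow the Arm architecture description of ESR_ELx for data aborts.

```python
# Hypothetical helpers for reading the log; not taken from the kernel sources.
EC_NAMES = {0x24: "DABT (lower EL)", 0x25: "DABT (current EL)"}
DFSC_NAMES = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    iss = esr & 0x1FFFFFF          # ISS is bits [24:0]
    ec = (esr >> 26) & 0x3F        # exception class is bits [31:26]
    dfsc = iss & 0x3F              # data fault status code is ISS[5:0]
    return {
        "ec": EC_NAMES.get(ec, hex(ec)),
        "il": (esr >> 25) & 1,     # 1 means a 32-bit instruction
        "wnr": (iss >> 6) & 1,     # 1 means the access was a write
        "dfsc": DFSC_NAMES.get(dfsc, hex(dfsc)),
    }


def locate(addr, stacks):
    """Return (stack name, bytes above the low end) or (None, None)."""
    for name, (low, high) in stacks.items():
        if low <= addr < high:
            return name, addr - low
    return None, None


# Ranges copied from the "Task stack", "IRQ stack" and "Overflow stack" lines.
STACKS = {
    "task": (0xFFFF0000093A8000, 0xFFFF0000093AC000),
    "irq": (0xFFFF000008000000, 0xFFFF000008004000),
    "overflow": (0xFFFF80003EFCE2F0, 0xFFFF80003EFCF2F0),
}

first = decode_esr(0x96000046)     # ESR from the "Insufficient stack space" report
second = decode_esr(0x96000006)    # ESR from the later "Mem abort info" block
where, headroom = locate(0xFFFF0000093A80E0, STACKS)  # FAR, which equals sp

print(first)
print(second)
print(where, headroom)
```

Read this way, the first abort is a write that hit a level 2 translation fault with sp and FAR both 224 bytes above the low end of the task stack, and the second is a read at the x29 address with the same fault status. The sketch only restates what the log reports and does not identify the cause.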


Best Regards,
Wei


* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
From: Will Deacon @ 2018-06-20 14:42 UTC
  To: Wei Xu
  Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
	james.morse, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi Wei,

On Wed, Jun 20, 2018 at 10:18:00PM +0800, Wei Xu wrote:
> We have observed KVM guest sometimes failed to boot because of kernel stack
> overflow if KPTI is enabled on a hisilicon arm64 platform.
> 
> We also tested with different kernel version and found it is only
> happened if the KPTI and KVM(enable-kvm & cpu=host) are enabled on the
> guest.
> The detail result is as below table.
> 
> +---------+----------+--------+------------+-------------------+
>      |  host   |host KPTI | guest  | guest KPTI | kvm guest         |
>      |  kernel |enabled   | kernel | enabled    | booting result    |
> +---------+----------+--------+------------+-------------------+
>      |  4.17   |     Y    |  4.17  |     Y      |  stack overflow   |
> +---------+----------+--------+------------+-------------------+
>      |  4.17   |     Y    |  4.16  |     NA     | OK          |
> +---------+----------+--------+------------+-------------------+
>      |  4.16   |     NA   |  4.17  |     Y      |  stack overflow   |
> +---------+----------+--------+------------+-------------------+
>      |  4.16   |     NA   |  4.16  |     NA     | OK          |
> +---------+----------+--------+------------+-------------------+
> 
> A simple walk-around is adding this platform into the "kpti_safe_list".
> But it does not resolve the issue indeed.
> Could you please share any hint how to resolve this kind issue?
> Thanks!
> 
> Another issue we found is "kpti_install_ng_mappings" will be invoked
> even "kpti=off" has been added in the kernel command line. Is that expected?
> This is because "kpti" is not a *early* param that "init_cpu_features" will
> be invoked before parsing the param.

That sounds like a straightforward bug, which means we should use
early_param instead of __setup. I assume that doesn't fix your crash,
though?
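
To make the ordering problem concrete, here is a toy model of the three boot phases being discussed. It is a sketch under stated assumptions, not kernel code: the function boot, the flag forced_off and the phase layout are invented for illustration, and the model assumes KPTI is on unless the command line forces it off.

```python
# Toy model only: phase names mirror the discussion, not actual kernel code.
def boot(cmdline, kpti_is_early):
    forced_off = False   # stands in for state set by the "kpti=" handler
    events = []

    def parse_kpti():
        nonlocal forced_off
        if "kpti=off" in cmdline.split():
            forced_off = True

    # Phase 1: early params are handled before feature detection.
    if kpti_is_early:
        parse_kpti()

    # Phase 2: CPU feature detection decides whether KPTI is enabled.
    # Assumption: KPTI defaults to on when nothing has forced it off.
    kpti_enabled = not forced_off
    events.append("detect: kpti %s" % ("on" if kpti_enabled else "off"))

    # Phase 3: ordinary __setup-style params are handled afterwards,
    # which is too late to influence the decision made in phase 2.
    if not kpti_is_early:
        parse_kpti()
        events.append("late parse: forced_off=%s" % forced_off)

    return kpti_enabled, events


print(boot("console=ttyAMA0 kpti=off", kpti_is_early=False))
print(boot("console=ttyAMA0 kpti=off", kpti_is_early=True))
```

In this model the first call still enables KPTI even though kpti=off is present, because the flag is only set after detection has run. The second call honours the option. This matches the behaviour reported above only to the extent that the real boot order follows these three phases.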

> The command we are using to run the guest is as:
> 
>     ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu
> host
>     -enable-kvm -smp 1 -m 1024 -kernel ./Image -initrd
> ../mini-rootfs-arm64.cpio.gz
>     -nographic -append "rdinit=init console=ttyAMA0
> earlycon=pl011,0x9000000"
> 
> The log is as below:
> 
>         [    0.000000] Booting Linux on physical CPU 0x0000000000
> [0x480fd010]
>         [    0.000000] Linux version 4.17.0-45864-g29dcea8-dirty
> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun
> 15 21:39:52 CST 2018

^^^ This is reproducible with vanilla v4.17 and defconfig, right?

>         [    0.038859] SMP: Total of 1 processors activated.
>         [    0.039338] CPU features: detected: GIC system register CPU
> interface
>         [    0.039988] CPU features: detected: Privileged Access Never
>         [    0.040560] CPU features: detected: User Access Override
>         [    0.041093] CPU features: detected: RAS Extension Support
>         [    0.042947] Insufficient stack space to handle exception!
>         [    0.042949] ESR: 0x96000046 -- DABT (current EL)
>         [    0.043963] FAR: 0xffff0000093a80e0
>         [    0.045794] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
>         [    0.052181] IRQ stack: [0xffff000008000000..0xffff000008004000]
>         [    0.058572] Overflow stack:
> [0xffff80003efce2f0..0xffff80003efcf2f0]
>         [    0.065068] CPU: 0 PID: 12 Comm: migration/0 Not tainted
> 4.17.0-45864-g29dcea8-dirty #6
>         [    0.073138] Hardware name: linux,dummy-virt (DT)
>         [    0.077831] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>         [    0.082661] pc : el1_sync+0x0/0xb0
>         [    0.086152] lr : kpti_install_ng_mappings+0x120/0x214

Can you use scripts/faddr2line to find out which line of code the lr is
pointing at, please? It would be interesting to know if we managed to
install the idmap.
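
The lr value in the log uses the symbol+offset/size notation that scripts/faddr2line takes as its address argument. As a small illustration of what that string encodes, the sketch below splits it into its parts. The helper name parse_symbol_ref is hypothetical, and the sketch does not resolve a source line; that still needs faddr2line and the matching vmlinux.

```python
import re

# Hypothetical helper: splits "symbol+0xOFF/0xSIZE" as printed in kernel oopses.
_SYMBOL_REF = re.compile(
    r"(?P<sym>[A-Za-z_.$][\w.$]*)\+(?P<off>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
)


def parse_symbol_ref(text):
    """Return (symbol, offset, size) with offset and size as integers."""
    match = _SYMBOL_REF.fullmatch(text.strip())
    if match is None:
        raise ValueError("not in symbol+offset/size form: %r" % text)
    return match["sym"], int(match["off"], 16), int(match["size"], 16)


sym, off, size = parse_symbol_ref("kpti_install_ng_mappings+0x120/0x214")
print(sym, off, size)
```

For the lr above this gives an offset of 288 bytes into a 532-byte function, so the return address lies a little past the midpoint of kpti_install_ng_mappings.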

Hmm, I wonder if this is at all related to RAS, since we've just enabled
that and if we take a fault whilst rewriting swapper then we're going to
get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?

Will

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
From: Wei Xu @ 2018-06-20 15:52 UTC
  To: Will Deacon
  Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
	james.morse, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi Will,

On 2018/6/20 22:42, Will Deacon wrote:
> Hi Wei,
>
> On Wed, Jun 20, 2018 at 10:18:00PM +0800, Wei Xu wrote:
>> We have observed KVM guest sometimes failed to boot because of kernel stack
>> overflow if KPTI is enabled on a hisilicon arm64 platform.
>>
>> We also tested with different kernel version and found it is only
>> happened if the KPTI and KVM(enable-kvm & cpu=host) are enabled on the
>> guest.
>> The detail result is as below table.
>>
>> +---------+----------+--------+------------+-------------------+
>>       |  host   |host KPTI | guest  | guest KPTI | kvm guest         |
>>       |  kernel |enabled   | kernel | enabled    | booting result    |
>> +---------+----------+--------+------------+-------------------+
>>       |  4.17   |     Y    |  4.17  |     Y      |  stack overflow   |
>> +---------+----------+--------+------------+-------------------+
>>       |  4.17   |     Y    |  4.16  |     NA     | OK          |
>> +---------+----------+--------+------------+-------------------+
>>       |  4.16   |     NA   |  4.17  |     Y      |  stack overflow   |
>> +---------+----------+--------+------------+-------------------+
>>       |  4.16   |     NA   |  4.16  |     NA     | OK          |
>> +---------+----------+--------+------------+-------------------+
>>
>> A simple walk-around is adding this platform into the "kpti_safe_list".
>> But it does not resolve the issue indeed.
>> Could you please share any hint how to resolve this kind issue?
>> Thanks!
>>
>> Another issue we found is "kpti_install_ng_mappings" will be invoked
>> even "kpti=off" has been added in the kernel command line. Is that expected?
>> This is because "kpti" is not a *early* param that "init_cpu_features" will
>> be invoked before parsing the param.
> That sounds like a straightforward bug, which means we should use
> early_param instead of __setup. I assume that doesn't fix your crash,
> though?

Thanks for your quick response!
It does avoid our crash, but it is just another workaround.

>> The command we are using to run the guest is as:
>>
>>      ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu
>> host
>>      -enable-kvm -smp 1 -m 1024 -kernel ./Image -initrd
>> ../mini-rootfs-arm64.cpio.gz
>>      -nographic -append "rdinit=init console=ttyAMA0
>> earlycon=pl011,0x9000000"
>>
>> The log is as below:
>>
>>          [    0.000000] Booting Linux on physical CPU 0x0000000000
>> [0x480fd010]
>>          [    0.000000] Linux version 4.17.0-45864-g29dcea8-dirty
>> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
>> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun
>> 15 21:39:52 CST 2018
> ^^^ This is reproducible with vanilla v4.17 and defconfig, right?

Yes.

>
>>          [    0.038859] SMP: Total of 1 processors activated.
>>          [    0.039338] CPU features: detected: GIC system register CPU
>> interface
>>          [    0.039988] CPU features: detected: Privileged Access Never
>>          [    0.040560] CPU features: detected: User Access Override
>>          [    0.041093] CPU features: detected: RAS Extension Support
>>          [    0.042947] Insufficient stack space to handle exception!
>>          [    0.042949] ESR: 0x96000046 -- DABT (current EL)
>>          [    0.043963] FAR: 0xffff0000093a80e0
>>          [    0.045794] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
>>          [    0.052181] IRQ stack: [0xffff000008000000..0xffff000008004000]
>>          [    0.058572] Overflow stack:
>> [0xffff80003efce2f0..0xffff80003efcf2f0]
>>          [    0.065068] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>> 4.17.0-45864-g29dcea8-dirty #6
>>          [    0.073138] Hardware name: linux,dummy-virt (DT)
>>          [    0.077831] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>>          [    0.082661] pc : el1_sync+0x0/0xb0
>>          [    0.086152] lr : kpti_install_ng_mappings+0x120/0x214
> Can you use scripts/faddr2line to find out which line of code the lr is
> pointing at, please? It would be interesting to know if we managed to
> install the idmap.
I have not used addr2line before, but with gdb we can get the same information, as below:

(gdb) list *kpti_install_ng_mappings+0x120/0x214
0xffff000008091d70 is in kpti_install_ng_mappings (/home/joyx/plinth-kernel-v200/arch/arm64/kernel/cpufeature.c:907).
902             return !has_cpuid_feature(entry, scope);
903     }
904
905     static void
906     kpti_install_ng_mappings(const struct arm64_cpu_capabilities *__unused)
907     {
908             typedef void (kpti_remap_fn)(int, int, phys_addr_t);
909             extern kpti_remap_fn idmap_kpti_install_ng_mappings;
910             kpti_remap_fn *remap_fn;
911
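
A note on the gdb expression above: the "/0x214" suffix in the log is the size of the function, not part of the offset. Assuming gdb evaluates the expression with C operator precedence, 0x120/0x214 is an integer division that yields 0, which would explain why the listing shows the opening of the function (line 907) rather than the location lr points at. A small Python sketch of the arithmetic, with illustrative variable names:

```python
# Address gdb reported for the expression as it was typed.
reported = 0xffff000008091d70

# With C-style integer arithmetic, '/' binds tighter than '+'.
offset_as_typed = 0x120 // 0x214   # 288 // 532
offset_intended = 0x120            # the "+0x120" part of the lr line

# Assumption: gdb resolved <function start> + offset_as_typed.
func_start = reported - offset_as_typed
lr_address = func_start + offset_intended

print(hex(func_start))
print(hex(lr_address))
```

Under that assumption, `list *(kpti_install_ng_mappings+0x120)` would be the expression to try instead.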

> Hmm, I wonder if this is at all related to RAS, since we've just enabled
> that and if we take a fault whilst rewriting swapper then we're going to
> get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?

I will try it now.
Thanks!

Best Regards,
Wei

> Will
>
> .
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-20 15:52     ` Wei Xu
@ 2018-06-20 15:54       ` James Morse
  -1 siblings, 0 replies; 79+ messages in thread
From: James Morse @ 2018-06-20 15:54 UTC (permalink / raw)
  To: Wei Xu
  Cc: Will Deacon, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi Wei,

On 20/06/18 16:52, Wei Xu wrote:
> On 2018/6/20 22:42, Will Deacon wrote:
>> Hmm, I wonder if this is at all related to RAS, since we've just enabled
>> that and if we take a fault whilst rewriting swapper then we're going to
>> get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
> 
> I will try it now.

It's not just the Kconfig symbol, could you also revert:

f751daa4f9d3 ("arm64: Unconditionally enable IESB on exception entry/return for
firmware-first")


(it reverts and builds cleanly on 4.17)


Thanks,

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-20 15:54       ` James Morse
@ 2018-06-20 16:25         ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-20 16:25 UTC (permalink / raw)
  To: James Morse
  Cc: Will Deacon, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi James,

On 2018/6/20 23:54, James Morse wrote:
> Hi Wei,
>
> On 20/06/18 16:52, Wei Xu wrote:
>> On 2018/6/20 22:42, Will Deacon wrote:
>>> Hmm, I wonder if this is at all related to RAS, since we've just enabled
>>> that and if we take a fault whilst rewriting swapper then we're going to
>>> get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
>> I will try it now.
> It's not just the Kconfig symbol, could you also revert:
>
> f751daa4f9d3 ("arm64: Unconditionally enable IESB on exception entry/return for
> firmware-first")
>
>
> (reverts and build cleanly on 4.17)

Thanks for pointing this out!
I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit,
but I still sometimes get the stack overflow.
Do you have any further hints?
Thanks!

The log is below:
     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty 
(joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) 
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 
SMP PREEMPT Wed Jun 20 23:59:05 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 
0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 
0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from 
start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) 
s57984 r8192 d32128 u98304
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: detected: Kernel page table isolation 
(KPTI)
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total 
pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 
earlycon=pl011,0x9000000
     [    0.000000] Memory: 968436K/1048576K available (10044K kernel 
code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 
16384K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, 
Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to 
nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, 
nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices 
@7d830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt 
Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007d850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 
0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
     [    0.000000] GIC: PPI11 is secure or misconfigured
     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, 
assuming level low
     [    0.000000] arch_timer: WARNING: Please fix your firmware
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 
0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, 
wraps every 4398046511100ns
     [    0.000843] Console: colour dummy device 80x25
     [    0.001401] Calibrating delay loop (skipped), value calculated 
using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002453] pid_max: default: 32768 minimum: 301
     [    0.002941] Security Framework initialized
     [    0.003517] Dentry cache hash table entries: 131072 (order: 8, 
1048576 bytes)
     [    0.004317] Inode-cache hash table entries: 65536 (order: 7, 
524288 bytes)
     [    0.005018] Mount-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.005791] Mountpoint-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.025893] ASID allocator initialised with 32768 entries
     [    0.029901] Hierarchical SRCU implementation.
     [    0.034274] Platform MSI: its domain created
     [    0.034749] PCI/MSI: /intc/its domain created
     [    0.035317] EFI services will not be available.
     [    0.037930] smp: Bringing up secondary CPUs ...
     [    0.038396] smp: Brought up 1 node, 1 CPU
     [    0.038810] SMP: Total of 1 processors activated.
     [    0.039285] CPU features: detected: GIC system register CPU 
interface
     [    0.039930] CPU features: detected: Privileged Access Never
     [    0.040488] CPU features: detected: User Access Override
     [    0.042421] Insufficient stack space to handle exception!
     [    0.042423] ESR: 0x96000046 -- DABT (current EL)
     [    0.043730] FAR: 0xffff0000093a80e0
     [    0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
     [    0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
     [    0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
     [    0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45865-g2b31fe7-dirty #10
     [    0.072201] Hardware name: linux,dummy-virt (DT)
     [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
     [    0.081727] pc : el1_sync+0x0/0xb0
     [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214
     [    0.090284] sp : ffff0000093a80e0
     [    0.093654] x29: ffff0000093abce0 x28: ffff000008ea9000
     [    0.099071] x27: ffff000008ea9000 x26: ffff0000091f7000
     [    0.104488] x25: ffff00000906d000 x24: ffff000009191000
     [    0.109798] x23: ffff000008ea9000 x22: 0000000041190000
     [    0.115217] x21: ffff0000091f7000 x20: 0000000000000000
     [    0.120633] x19: ffff000009190000 x18: 000000003455d99d
     [    0.125943] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.131358] x15: 000000007eff6000 x14: 000000007eff6000
     [    0.136773] x13: 00f800007fe00f11 x12: 000000007eff8000
     [    0.142082] x11: 000000007eff8000 x10: 0000000000000000
     [    0.147501] x9 : 000000007eff9000 x8 : 000000007eff9000
     [    0.152920] x7 : 0000000000000000 x6 : 00000000411f8000
     [    0.158230] x5 : 00000000411f8000 x4 : 0000000040a443d4
     [    0.163646] x3 : 00000000411f7000 x2 : 00000000411f7000
     [    0.169061] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
     [    0.174372] Kernel panic - not syncing: kernel stack overflow
     [    0.180264] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45865-g2b31fe7-dirty #10
     [    0.188348] Hardware name: linux,dummy-virt (DT)
     [    0.193046] Call trace:
     [    0.195572]  dump_backtrace+0x0/0x180
     [    0.199304]  show_stack+0x14/0x1c
     [    0.202677]  dump_stack+0x90/0xb0
     [    0.206152]  panic+0x138/0x2a0
     [    0.209182]  __stack_chk_fail+0x0/0x18
     [    0.213029]  handle_bad_stack+0x118/0x124
     [    0.217120]  __bad_stack+0x88/0x8c
     [    0.220607]  el1_sync+0x0/0xb0
     [    0.223738] Unable to handle kernel paging request at virtual 
address ffff0000093abce0
     [    0.231704] Mem abort info:
     [    0.234586]   ESR = 0x96000006
     [    0.237714]   Exception class = DABT (current EL), IL = 32 bits
     [    0.243628]   SET = 0, FnV = 0
     [    0.246758]   EA = 0, S1PTW = 0
     [    0.250001] Data abort info:
     [    0.253000]   ISV = 0, ISS = 0x00000006
     [    0.256859]   CM = 0, WnR = 0
     [    0.259871] swapper pgtable: 4k pages, 48-bit VAs, pgdp 
=         (ptrval)
     [    0.266862] [ffff0000093abce0] pgd=00000000411f8803, 
pud=00000000411f9803, pmd=0000000000000000
     [    0.275659] Internal error: Oops: 96000006 [#1] PREEMPT SMP
     [    0.281213] Modules linked in:
     [    0.284447] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45865-g2b31fe7-dirty #10
     [    0.292534] Hardware name: linux,dummy-virt (DT)
     [    0.297229] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
     [    0.302053] pc : unwind_frame+0x28/0xc8
     [    0.306022] lr : dump_backtrace+0x12c/0x180
     [    0.310245] sp : ffff80003efcf000
     [    0.313616] x29: ffff80003efcf000 x28: ffff80003da61c00
     [    0.319033] x27: ffff000008ea9000 x26: ffff0000091f7000
     [    0.324348] x25: ffff00000906d000 x24: ffff0000093a80e0
     [    0.329764] x23: 0000000000000000 x22: ffff000008dbae28
     [    0.335179] x21: 0000000000000000 x20: ffff000009049000
     [    0.340488] x19: ffff80003da61c00 x18: 000000003455d99d
     [    0.345906] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.351322] x15: 000000007eff6000 x14: 3031232079747269
     [    0.356633] x13: 0000000000000000 x12: cc26f77952f87e00
     [    0.362046] x11: ffffffffffffffff x10: 0000000000000076
     [    0.367466] x9 : ffff0000085aea28 x8 : ffff80003efcec90
     [    0.372880] x7 : 0000000000000000 x6 : ffff0000091befe1
     [    0.378190] x5 : 0000000000000000 x4 : ffff0000093ac000
     [    0.383605] x3 : ffff0000093a8000 x2 : ffff0000093abce0
     [    0.389021] x1 : ffff80003efcf048 x0 : ffff80003da61c00
     [    0.394330] Process migration/0 (pid: 12, stack limit = 
0x        (ptrval))
     [    0.401427] Call trace:
     [    0.403852]  unwind_frame+0x28/0xc8
     [    0.407455]  show_stack+0x14/0x1c
     [    0.410828]  dump_stack+0x90/0xb0
     [    0.414201]  panic+0x138/0x2a0
     [    0.417329]  __stack_chk_fail+0x0/0x18
     [    0.421177]  handle_bad_stack+0x118/0x124
     [    0.425273]  __bad_stack+0x88/0x8c
     [    0.428762]  el1_sync+0x0/0xb0
     [    0.431891] Unable to handle kernel paging request at virtual 
address ffff0000093abce0
     [    0.439851] Mem abort info:
     [    0.442734]   ESR = 0x96000006
     [    0.445861]   Exception class = DABT (current EL), IL = 32 bits
     [    0.451774]   SET = 0, FnV = 0
     [    0.454900]   EA = 0, S1PTW = 0
     [    0.458142] Data abort info:
     [    0.461144]   ISV = 0, ISS = 0x00000006
     [    0.465001]   CM = 0, WnR = 0
     [    0.468013] swapper pgtable: 4k pages, 48-bit VAs, pgdp 
=         (ptrval)
     [    0.474996] [ffff0000093abce0] pgd=00000000411f8803, 
pud=00000000411f9803, pmd=0000000000000000
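
The final line of the log reports the software page-table walk for the faulting address. As a rough illustration, assuming the 4 KiB granule and 48-bit VA layout named in the "swapper pgtable" line (9 index bits per level, 12-bit page offset) and that bit 0 of a descriptor is its valid bit, the address can be split into per-level indices and the walk checked for its first invalid entry. The names here are illustrative, not kernel APIs:

```python
def split_va(va):
    """Split a 48-bit VA into table indices for a 4 KiB granule."""
    return {
        "pgd": (va >> 39) & 0x1FF,
        "pud": (va >> 30) & 0x1FF,
        "pmd": (va >> 21) & 0x1FF,
        "pte": (va >> 12) & 0x1FF,
        "offset": va & 0xFFF,
    }


def is_valid(descriptor):
    """Assumption: bit 0 of a table or page descriptor is the valid bit."""
    return bool(descriptor & 1)


# Faulting address and table entries copied from the log.
va = 0xffff0000093abce0
entries = {
    "pgd": 0x00000000411f8803,
    "pud": 0x00000000411f9803,
    "pmd": 0x0000000000000000,
}

idx = split_va(va)
first_invalid = next(
    level for level in ("pgd", "pud", "pmd") if not is_valid(entries[level])
)

print(idx)
print(first_invalid)
```

The walk stops at the pmd level, whose entry is zero, which matches the level 2 translation fault code (0x06) in ESR 0x96000006.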

Best Regards,
Wei

>
> Thanks,
>
> James
>
> .
>


^ permalink raw reply	[flat|nested] 79+ messages in thread

* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-20 16:25         ` Wei Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-20 16:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi James,

On 2018/6/20 23:54, James Morse wrote:
> Hi Wei,
>
> On 20/06/18 16:52, Wei Xu wrote:
>> On 2018/6/20 22:42, Will Deacon wrote:
>>> Hmm, I wonder if this is at all related to RAS, since we've just enabled
>>> that and if we take a fault whilst rewriting swapper then we're going to
>>> get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
>> I will try it now.
> It's not just the Kconfig symbol, could you also revert:
>
> f751daa4f9d3 ("arm64: Unconditionally enable IESB on exception entry/return for
> firmware-first")
>
>
> (reverts and build cleanly on 4.17)

Thanks for pointing this out!
I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit,
but I still sometimes get the stack overflow.
Do you have any further hints?
Thanks!

The log is below:
     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty 
(joyx at Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) 
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 
SMP PREEMPT Wed Jun 20 23:59:05 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 
0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 
0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from 
start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) 
s57984 r8192 d32128 u98304
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: detected: Kernel page table isolation 
(KPTI)
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total 
pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 
earlycon=pl011,0x9000000
     [    0.000000] Memory: 968436K/1048576K available (10044K kernel 
code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 
16384K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, 
Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to 
nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, 
nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS at 0x0000000008080000: allocated 8192 Devices 
@7d830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS at 0x0000000008080000: allocated 8192 Interrupt 
Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007d850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 
0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
     [    0.000000] GIC: PPI11 is secure or misconfigured
     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, 
assuming level low
     [    0.000000] arch_timer: WARNING: Please fix your firmware
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 
0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, 
wraps every 4398046511100ns
     [    0.000843] Console: colour dummy device 80x25
     [    0.001401] Calibrating delay loop (skipped), value calculated 
using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002453] pid_max: default: 32768 minimum: 301
     [    0.002941] Security Framework initialized
     [    0.003517] Dentry cache hash table entries: 131072 (order: 8, 
1048576 bytes)
     [    0.004317] Inode-cache hash table entries: 65536 (order: 7, 
524288 bytes)
     [    0.005018] Mount-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.005791] Mountpoint-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.025893] ASID allocator initialised with 32768 entries
     [    0.029901] Hierarchical SRCU implementation.
     [    0.034274] Platform MSI: its domain created
     [    0.034749] PCI/MSI: /intc/its domain created
     [    0.035317] EFI services will not be available.
     [    0.037930] smp: Bringing up secondary CPUs ...
     [    0.038396] smp: Brought up 1 node, 1 CPU
     [    0.038810] SMP: Total of 1 processors activated.
     [    0.039285] CPU features: detected: GIC system register CPU 
interface
     [    0.039930] CPU features: detected: Privileged Access Never
     [    0.040488] CPU features: detected: User Access Override
     [    0.042421] Insufficient stack space to handle exception!
     [    0.042423] ESR: 0x96000046 -- DABT (current EL)
     [    0.043730] FAR: 0xffff0000093a80e0
     [    0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
     [    0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
     [    0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
     [    0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45865-g2b31fe7-dirty #10
     [    0.072201] Hardware name: linux,dummy-virt (DT)
     [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
     [    0.081727] pc : el1_sync+0x0/0xb0
     [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214
     [    0.090284] sp : ffff0000093a80e0
     [    0.093654] x29: ffff0000093abce0 x28: ffff000008ea9000
     [    0.099071] x27: ffff000008ea9000 x26: ffff0000091f7000
     [    0.104488] x25: ffff00000906d000 x24: ffff000009191000
     [    0.109798] x23: ffff000008ea9000 x22: 0000000041190000
     [    0.115217] x21: ffff0000091f7000 x20: 0000000000000000
     [    0.120633] x19: ffff000009190000 x18: 000000003455d99d
     [    0.125943] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.131358] x15: 000000007eff6000 x14: 000000007eff6000
     [    0.136773] x13: 00f800007fe00f11 x12: 000000007eff8000
     [    0.142082] x11: 000000007eff8000 x10: 0000000000000000
     [    0.147501] x9 : 000000007eff9000 x8 : 000000007eff9000
     [    0.152920] x7 : 0000000000000000 x6 : 00000000411f8000
     [    0.158230] x5 : 00000000411f8000 x4 : 0000000040a443d4
     [    0.163646] x3 : 00000000411f7000 x2 : 00000000411f7000
     [    0.169061] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
     [    0.174372] Kernel panic - not syncing: kernel stack overflow
     [    0.180264] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45865-g2b31fe7-dirty #10
     [    0.188348] Hardware name: linux,dummy-virt (DT)
     [    0.193046] Call trace:
     [    0.195572]  dump_backtrace+0x0/0x180
     [    0.199304]  show_stack+0x14/0x1c
     [    0.202677]  dump_stack+0x90/0xb0
     [    0.206152]  panic+0x138/0x2a0
     [    0.209182]  __stack_chk_fail+0x0/0x18
     [    0.213029]  handle_bad_stack+0x118/0x124
     [    0.217120]  __bad_stack+0x88/0x8c
     [    0.220607]  el1_sync+0x0/0xb0
     [    0.223738] Unable to handle kernel paging request at virtual 
address ffff0000093abce0
     [    0.231704] Mem abort info:
     [    0.234586]   ESR = 0x96000006
     [    0.237714]   Exception class = DABT (current EL), IL = 32 bits
     [    0.243628]   SET = 0, FnV = 0
     [    0.246758]   EA = 0, S1PTW = 0
     [    0.250001] Data abort info:
     [    0.253000]   ISV = 0, ISS = 0x00000006
     [    0.256859]   CM = 0, WnR = 0
     [    0.259871] swapper pgtable: 4k pages, 48-bit VAs, pgdp 
=         (ptrval)
     [    0.266862] [ffff0000093abce0] pgd=00000000411f8803, 
pud=00000000411f9803, pmd=0000000000000000
     [    0.275659] Internal error: Oops: 96000006 [#1] PREEMPT SMP
     [    0.281213] Modules linked in:
     [    0.284447] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45865-g2b31fe7-dirty #10
     [    0.292534] Hardware name: linux,dummy-virt (DT)
     [    0.297229] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
     [    0.302053] pc : unwind_frame+0x28/0xc8
     [    0.306022] lr : dump_backtrace+0x12c/0x180
     [    0.310245] sp : ffff80003efcf000
     [    0.313616] x29: ffff80003efcf000 x28: ffff80003da61c00
     [    0.319033] x27: ffff000008ea9000 x26: ffff0000091f7000
     [    0.324348] x25: ffff00000906d000 x24: ffff0000093a80e0
     [    0.329764] x23: 0000000000000000 x22: ffff000008dbae28
     [    0.335179] x21: 0000000000000000 x20: ffff000009049000
     [    0.340488] x19: ffff80003da61c00 x18: 000000003455d99d
     [    0.345906] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.351322] x15: 000000007eff6000 x14: 3031232079747269
     [    0.356633] x13: 0000000000000000 x12: cc26f77952f87e00
     [    0.362046] x11: ffffffffffffffff x10: 0000000000000076
     [    0.367466] x9 : ffff0000085aea28 x8 : ffff80003efcec90
     [    0.372880] x7 : 0000000000000000 x6 : ffff0000091befe1
     [    0.378190] x5 : 0000000000000000 x4 : ffff0000093ac000
     [    0.383605] x3 : ffff0000093a8000 x2 : ffff0000093abce0
     [    0.389021] x1 : ffff80003efcf048 x0 : ffff80003da61c00
     [    0.394330] Process migration/0 (pid: 12, stack limit = 
0x        (ptrval))
     [    0.401427] Call trace:
     [    0.403852]  unwind_frame+0x28/0xc8
     [    0.407455]  show_stack+0x14/0x1c
     [    0.410828]  dump_stack+0x90/0xb0
     [    0.414201]  panic+0x138/0x2a0
     [    0.417329]  __stack_chk_fail+0x0/0x18
     [    0.421177]  handle_bad_stack+0x118/0x124
     [    0.425273]  __bad_stack+0x88/0x8c
     [    0.428762]  el1_sync+0x0/0xb0
     [    0.431891] Unable to handle kernel paging request at virtual 
address ffff0000093abce0
     [    0.439851] Mem abort info:
     [    0.442734]   ESR = 0x96000006
     [    0.445861]   Exception class = DABT (current EL), IL = 32 bits
     [    0.451774]   SET = 0, FnV = 0
     [    0.454900]   EA = 0, S1PTW = 0
     [    0.458142] Data abort info:
     [    0.461144]   ISV = 0, ISS = 0x00000006
     [    0.465001]   CM = 0, WnR = 0
     [    0.468013] swapper pgtable: 4k pages, 48-bit VAs, pgdp 
=         (ptrval)
     [    0.474996] [ffff0000093abce0] pgd=00000000411f8803, 
pud=00000000411f9803, pmd=0000000000000000

Best Regards,
Wei

>
> Thanks,
>
> James
>
> .
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-20 16:25         ` Wei Xu
@ 2018-06-20 16:28           ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-20 16:28 UTC (permalink / raw)
  To: Wei Xu
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

On Thu, Jun 21, 2018 at 12:25:05AM +0800, Wei Xu wrote:
> Hi James,
> 
> On 2018/6/20 23:54, James Morse wrote:
> >Hi Wei,
> >
> >On 20/06/18 16:52, Wei Xu wrote:
> >>On 2018/6/20 22:42, Will Deacon wrote:
> >>>Hmm, I wonder if this is at all related to RAS, since we've just enabled
> >>>that and if we take a fault whilst rewriting swapper then we're going to
> >>>get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
> >>I will try it now.
> >It's not just the Kconfig symbol, could you also revert:
> >
> >f751daa4f9d3 ("arm64: Unconditionally enable IESB on exception entry/return for
> >firmware-first")
> >
> >
> >(reverts and build cleanly on 4.17)
> 
> Thanks to point out this!
> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
> But I still got the stack overflow issue sometimes.
> Do you have more hint?

[...]

>     [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>     [    0.081727] pc : el1_sync+0x0/0xb0
>     [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214

Please run:

$ ./scripts/faddr2line vmlinux kpti_install_ng_mappings+0x120/0x214

as the GDB output wasn't helpful (it only showed local variable
declarations?!).

Will

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-20 16:28           ` Will Deacon
@ 2018-06-20 16:33             ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-20 16:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi Will,

On 2018/6/21 0:28, Will Deacon wrote:
> On Thu, Jun 21, 2018 at 12:25:05AM +0800, Wei Xu wrote:
>> Hi James,
>>
>> On 2018/6/20 23:54, James Morse wrote:
>>> Hi Wei,
>>>
>>> On 20/06/18 16:52, Wei Xu wrote:
>>>> On 2018/6/20 22:42, Will Deacon wrote:
>>>>> Hmm, I wonder if this is at all related to RAS, since we've just enabled
>>>>> that and if we take a fault whilst rewriting swapper then we're going to
>>>>> get stuck. What happens if you set CONFIG_ARM64_RAS_EXTN=n in the guest?
>>>> I will try it now.
>>> It's not just the Kconfig symbol, could you also revert:
>>>
>>> f751daa4f9d3 ("arm64: Unconditionally enable IESB on exception entry/return for
>>> firmware-first")
>>>
>>>
>>> (reverts and build cleanly on 4.17)
>> Thanks to point out this!
>> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
>> But I still got the stack overflow issue sometimes.
>> Do you have more hint?
> [...]
>
>>      [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>>      [    0.081727] pc : el1_sync+0x0/0xb0
>>      [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214
> Please run:
>
> $ ./scripts/faddr2line vmlinux kpti_install_ng_mappings+0x120/0x214

Thanks for your kind guidance :)
The output is below:

     joyx@Turing-Arch-b:~/plinth-kernel-v200$ ./scripts/faddr2line ../kernel-dev.build/vmlinux kpti_install_ng_mappings+0x120/0x214
     kpti_install_ng_mappings+0x120/0x214:
     cpu_set_reserved_ttbr0 at arch/arm64/include/asm/mmu_context.h:52
     47      /*
     48       * Set TTBR0 to empty_zero_page. No translations will be possible via TTBR0.
     49       */
     50      static inline void cpu_set_reserved_ttbr0(void)
     51      {
     52              unsigned long ttbr = phys_to_ttbr(__pa_symbol(empty_zero_page));
     53
     54              write_sysreg(ttbr, ttbr0_el1);
     55              isb();
     56      }
     57
     (inlined by) cpu_uninstall_idmap at arch/arm64/include/asm/mmu_context.h:123
     118      */
     119     static inline void cpu_uninstall_idmap(void)
     120     {
     121             struct mm_struct *mm = current->active_mm;
     122
     123             cpu_set_reserved_ttbr0();
     124             local_flush_tlb_all();
     125             cpu_set_default_tcr_t0sz();
     126
     127             if (mm != &init_mm && !system_uses_ttbr0_pan())
     128                     cpu_switch_mm(mm->pgd, mm);
     (inlined by) kpti_install_ng_mappings at arch/arm64/kernel/cpufeature.c:922
     917
     918             remap_fn = (void *)__pa_symbol(idmap_kpti_install_ng_mappings);
     919
     920             cpu_install_idmap();
     921             remap_fn(cpu, num_online_cpus(), __pa_symbol(swapper_pg_dir));
     922             cpu_uninstall_idmap();
     923
     924             if (!cpu)
     925                     kpti_applied = true;
     926
     927             return;

Thanks!

Best Regards,
Wei

> as the GDB output wasn't helpful (it only showed local variable
> declarations?!).
>
> Will
>
> .
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-20 16:25         ` Wei Xu
@ 2018-06-21  8:38           ` James Morse
  -1 siblings, 0 replies; 79+ messages in thread
From: James Morse @ 2018-06-21  8:38 UTC (permalink / raw)
  To: Wei Xu, Will Deacon
  Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
	marc.zyngier, linux-arm-kernel, linux-kernel, Linuxarm,
	Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi Will, Wei,

On 20/06/18 17:25, Wei Xu wrote:
> On 2018/6/20 23:54, James Morse wrote:
> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
> But I still got the stack overflow issue sometimes.
> Do you have more hint?

> The log is as below:
>     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>     [    0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty

Could you reproduce this with v4.17? This says there are ~45,000 extra patches,
and un-committed changes. None of the hashes so far have been commits in
mainline, so we have no idea what this tree is.
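
For reference, the version string above has the shape `<version>-<commit count>-g<abbreviated hash>[-dirty]`. The following is a minimal C sketch of splitting such a string into its fields. The struct and function names are made up for illustration, and the format is inferred from this one string rather than taken from any specification.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a local version string. */
struct local_version {
	int major, minor, patch;
	int commits_since_tag;   /* number between the version and "-g" */
	char abbrev_hash[16];    /* hex digits following the "g" prefix */
	bool dirty;              /* true if the string ends in "-dirty" */
};

/* Returns true if s looks like "4.17.0-45865-g2b31fe7[-dirty]". */
static bool parse_local_version(const char *s, struct local_version *out)
{
	int consumed = 0;
	int n = sscanf(s, "%d.%d.%d-%d-g%15[0-9a-f]%n",
		       &out->major, &out->minor, &out->patch,
		       &out->commits_since_tag, out->abbrev_hash, &consumed);

	if (n != 5)
		return false;

	/* Whatever follows the hash is either empty or the dirty marker. */
	out->dirty = strcmp(s + consumed, "-dirty") == 0;
	return true;
}
```

Applied to `4.17.0-45865-g2b31fe7-dirty`, this yields a commit count of 45865, the abbreviated hash `2b31fe7`, and the dirty flag set, which is the information the paragraph above reads out of the banner.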


> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
> 23:59:05 CST 2018

>     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
>     [    0.000000] GIC: PPI11 is secure or misconfigured
>     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level
> low
>     [    0.000000] arch_timer: WARNING: Please fix your firmware
>     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).

(No idea what these mean, but I doubt they are relevant)


>     [    0.042421] Insufficient stack space to handle exception!
>     [    0.042423] ESR: 0x96000046 -- DABT (current EL)
>     [    0.043730] FAR: 0xffff0000093a80e0
>     [    0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]

This was a level 2 translation fault on a write, to an address that is within
the stack....
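
The reading of the ESR value can be reproduced with a few shifts and masks. This is a minimal sketch, not the kernel's own decoder: the struct and function names are invented here, and only the fields relevant to a data abort are extracted (EC in bits [31:26], IL in bit 25, WnR in bit 6, DFSC in bits [5:0]).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for the ESR fields of interest for a data abort. */
struct esr_fields {
	unsigned int ec;    /* exception class, bits [31:26] */
	bool il;            /* instruction length bit, bit 25 */
	bool wnr;           /* write-not-read, bit 6 */
	unsigned int dfsc;  /* data fault status code, bits [5:0] */
};

static struct esr_fields esr_decode(uint32_t esr)
{
	struct esr_fields f;

	f.ec = (esr >> 26) & 0x3f;
	f.il = (esr >> 25) & 1;
	f.wnr = (esr >> 6) & 1;
	f.dfsc = esr & 0x3f;
	return f;
}

/*
 * DFSC values 0b0001LL encode a translation fault at level LL.
 * Returns true and stores the level if the fault is of that kind.
 */
static bool esr_translation_fault_level(const struct esr_fields *f,
					unsigned int *level)
{
	if ((f->dfsc & 0x3c) != 0x04)
		return false;
	*level = f->dfsc & 0x3;
	return true;
}
```

For 0x96000046 this gives EC 0x25 (data abort taken from the current EL), WnR set, and a translation fault at level 2. The later 0x96000006 in the same log decodes identically except that WnR is clear, i.e. a read.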


>     [    0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
>     [    0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>     [    0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
> 4.17.0-45865-g2b31fe7-dirty #10
>     [    0.072201] Hardware name: linux,dummy-virt (DT)

>     [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>     [    0.081727] pc : el1_sync+0x0/0xb0

... from the vectors.


>     [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214

What I think is happening is: we come out of the kpti idmap with the stack
unmapped. Shortly after we access the stack, which faults. el1_sync faults as
well when it tries to push the registers to the stack, and we keep going until
we overflow the stack.
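
The arithmetic behind that scenario can be checked against the register dump. The sketch below is only a model under two assumptions that are not stated in the thread: that every exception entry reserves 0x140 bytes for the saved registers, and that the first fault was taken with sp equal to the x29 value in the log (0xffff0000093abce0). The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Model of repeated exception entry on an unmapped stack: each entry
 * moves sp down by frame_size before its first store faults again.
 * Counts how many frames fit above stack_base and reports the lowest
 * sp that is still inside the stack.
 */
static unsigned int frames_until_overflow(uint64_t sp, uint64_t stack_base,
					  uint64_t frame_size,
					  uint64_t *last_sp)
{
	unsigned int frames = 0;

	if (frame_size == 0) {
		/* Avoid looping forever on a meaningless frame size. */
		*last_sp = sp;
		return 0;
	}

	/* Stop once another frame would start below the stack base. */
	while (sp >= stack_base + frame_size) {
		sp -= frame_size;
		frames++;
	}

	*last_sp = sp;
	return frames;
}
```

With the task stack base 0xffff0000093a8000 from the log and the two assumed inputs, the loop stops after 48 frames at 0xffff0000093a80e0, which matches the sp and FAR values reported with the "Insufficient stack space" message. That is consistent with the explanation above, but it does not prove it.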

I can't reproduce this with kvmtool or qemu in the model.


Thanks,

James

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-21  8:38           ` James Morse
@ 2018-06-21  9:00             ` Marc Zyngier
  -1 siblings, 0 replies; 79+ messages in thread
From: Marc Zyngier @ 2018-06-21  9:00 UTC (permalink / raw)
  To: James Morse, Wei Xu, Will Deacon
  Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
	linux-arm-kernel, linux-kernel, Linuxarm, Hanjun Guo, xiexiuqi,
	huangdaode, Chenxin (Charles), Xiongfanggou (James),
	Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

On 21/06/18 09:38, James Morse wrote:

>> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
>> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
>> 23:59:05 CST 2018
> 
>>     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
>>     [    0.000000] GIC: PPI11 is secure or misconfigured
>>     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level
>> low
>>     [    0.000000] arch_timer: WARNING: Please fix your firmware
>>     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
> 
> (No idea what these mean, but I doubt they are relevant)

Old (and buggy) QEMU. Nothing to worry about, the kernel (and the vgic)
will do the right thing. A modern QEMU presents the guest with a fixed
DT, removing the warning altogether.

Nothing to do with the issue at hand anyway.

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-21  8:38           ` James Morse
@ 2018-06-21  9:18             ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-21  9:18 UTC (permalink / raw)
  To: James Morse
  Cc: Wei Xu, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

On Thu, Jun 21, 2018 at 09:38:53AM +0100, James Morse wrote:
> On 20/06/18 17:25, Wei Xu wrote:
> >     [    0.042421] Insufficient stack space to handle exception!
> >     [    0.042423] ESR: 0x96000046 -- DABT (current EL)
> >     [    0.043730] FAR: 0xffff0000093a80e0
> >     [    0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
> 
> This was a level 2 translation fault on a write, to an address that is within
> the stack....
> 
> 
> >     [    0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
> >     [    0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
> >     [    0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
> > 4.17.0-45865-g2b31fe7-dirty #10
> >     [    0.072201] Hardware name: linux,dummy-virt (DT)
> 
> >     [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
> >     [    0.081727] pc : el1_sync+0x0/0xb0
> 
> ... from the vectors.
> 
> 
> >     [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214
> 
> What I think is happening is: we come out of the kpti idmap with the stack
> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
> well when it tries to push the registers to the stack, and we keep going until
> we overflow the stack.
> 
> I can't reproduce this with kvmtool or qemu in the model.

Hmm, one thing that occurs to me is that the kpti_install_ng_mappings()
code leaves the nG bit set in table entries, which is actually IGNORED in
the architecture.

Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
otherwise your kernel will take an age to boot.

Will

--->8

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..70d9e98467ca 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -272,8 +272,8 @@ ENTRY(idmap_kpti_install_ng_mappings)
 	add	end_pgdp, cur_pgdp, #(PTRS_PER_PGD * 8)
 do_pgd:	__idmap_kpti_get_pgtable_ent	pgd
 	tbnz	pgd, #1, walk_puds
-next_pgd:
 	__idmap_kpti_put_pgtable_ent_ng	pgd
+next_pgd:
 skip_pgd:
 	add	cur_pgdp, cur_pgdp, #8
 	cmp	cur_pgdp, end_pgdp
@@ -302,8 +302,8 @@ walk_puds:
 	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
 do_pud:	__idmap_kpti_get_pgtable_ent	pud
 	tbnz	pud, #1, walk_pmds
-next_pud:
 	__idmap_kpti_put_pgtable_ent_ng	pud
+next_pud:
 skip_pud:
 	add	cur_pudp, cur_pudp, 8
 	cmp	cur_pudp, end_pudp
@@ -323,8 +323,8 @@ walk_pmds:
 	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
 do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
 	tbnz	pmd, #1, walk_ptes
-next_pmd:
 	__idmap_kpti_put_pgtable_ent_ng	pmd
+next_pmd:
 skip_pmd:
 	add	cur_pmdp, cur_pmdp, #8
 	cmp	cur_pmdp, end_pmdp
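
The effect of moving the `next_*` labels in the diff above can be modelled in C. This is a simplified sketch, not a translation of proc.S: the function names are invented, it covers only the pgd/pud/pmd levels where bit 1 distinguishes table from block entries, and it assumes the `__idmap_kpti_get_pgtable_ent` macro (whose body is not shown in this thread) skips invalid entries and entries that already have nG set.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_VALID (UINT64_C(1) << 0)   /* descriptor is valid */
#define DESC_TABLE (UINT64_C(1) << 1)   /* levels 0-2: 1 = table, 0 = block */
#define DESC_NG    (UINT64_C(1) << 11)  /* not-global bit */

/* Assumed filter: only valid entries without nG are considered. */
static bool needs_visit(uint64_t desc)
{
	return (desc & DESC_VALID) && !(desc & DESC_NG);
}

/*
 * Before the diff: after walking the next level, control returns to
 * next_*, so table entries are rewritten with nG as well.
 */
static uint64_t rewrite_before_diff(uint64_t desc)
{
	if (!needs_visit(desc))
		return desc;
	return desc | DESC_NG;
}

/*
 * After the diff: next_* sits below the store, so only non-table
 * (block) entries get nG; table entries are left untouched.
 */
static uint64_t rewrite_after_diff(uint64_t desc)
{
	if (!needs_visit(desc))
		return desc;
	if (desc & DESC_TABLE)
		return desc;
	return desc | DESC_NG;
}
```

As an illustration, a hypothetical table descriptor 0x00000000411f8003 becomes 0x00000000411f8803 under the first function and is unchanged under the second. The pgd and pud values printed in the oops earlier in this thread end in 0x803, i.e. valid table entries with bit 11 set.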

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-21  8:38           ` James Morse
@ 2018-06-21  9:20             ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-21  9:20 UTC (permalink / raw)
  To: James Morse, Will Deacon
  Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
	marc.zyngier, linux-arm-kernel, linux-kernel, Linuxarm,
	Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi James,

On 2018/6/21 9:38, James Morse wrote:
> Hi Will, Wei,
> 
> On 20/06/18 17:25, Wei Xu wrote:
>> On 2018/6/20 23:54, James Morse wrote:
>> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
>> But I still got the stack overflow issue sometimes.
>> Do you have more hint?
> 
>> The log is as below:
>>     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>     [    0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty
> 
> Could you reproduce this with v4.17? This says there are ~45,000 extra patches,
> and un-committed changes. None of the hashes so far have been commits in
> mainline, so we have no idea what this tree is.
> 

I have tried v4.17. The log is below, and it can also be found in the first mail
of this thread.

	[ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty
	(joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
	linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun
	15 21:39:52 CST 2018

I will try v4.17.2 and v4.18-rc1.

> 
>> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
>> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
>> 23:59:05 CST 2018
> 
>>     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
>>     [    0.000000] GIC: PPI11 is secure or misconfigured
>>     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level
>> low
>>     [    0.000000] arch_timer: WARNING: Please fix your firmware
>>     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
> 
> (No idea what these mean, but I doubt they are relevant)
> 

I will try with mainline qemu 2.12.0.

Thanks!

Best Regards,
Wei

> 
>>     [    0.042421] Insufficient stack space to handle exception!
>>     [    0.042423] ESR: 0x96000046 -- DABT (current EL)
>>     [    0.043730] FAR: 0xffff0000093a80e0
>>     [    0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
> 
> This was a level 2 translation fault on a write, to an address that is within
> the stack....
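
James's reading of the ESR and FAR can be checked by hand. Below is a minimal
Python sketch, assuming the standard ARMv8-A ESR_ELx layout for data aborts
(EC in bits [31:26], WnR in bit 6, DFSC in bits [5:0]). The helper names are
made up for illustration; the three values are copied from the log above.

```python
# Values copied from the boot log quoted in this thread.
ESR = 0x96000046
FAR = 0xFFFF0000093A80E0
TASK_STACK = (0xFFFF0000093A8000, 0xFFFF0000093AC000)


def decode_data_abort(esr):
    """Split an ESR_ELx value into (EC, DFSC, WnR)."""
    ec = (esr >> 26) & 0x3F   # exception class
    dfsc = esr & 0x3F         # data fault status code
    wnr = (esr >> 6) & 1      # 1 = caused by a write, 0 = by a read
    return ec, dfsc, wnr


def describe_dfsc(dfsc):
    """Return (fault kind, translation table level) for the common DFSC codes."""
    kinds = {
        0b0000: "address size fault",
        0b0001: "translation fault",
        0b0010: "access flag fault",
        0b0011: "permission fault",
    }
    return kinds.get(dfsc >> 2, "other"), dfsc & 0x3


ec, dfsc, wnr = decode_data_abort(ESR)
kind, level = describe_dfsc(dfsc)
in_stack = TASK_STACK[0] <= FAR < TASK_STACK[1]
print(hex(ec), kind, level, "write" if wnr else "read", in_stack)
```

EC 0x25 is a data abort taken without a change in exception level, which
matches the "DABT (current EL)" text printed by the kernel.
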
> 
> 
>>     [    0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
>>     [    0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>>     [    0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>> 4.17.0-45865-g2b31fe7-dirty #10
>>     [    0.072201] Hardware name: linux,dummy-virt (DT)
> 
>>     [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>>     [    0.081727] pc : el1_sync+0x0/0xb0
> 
> ... from the vectors.
> 
> 
>>     [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214
> 
> What I think is happening is: we come out of the kpti idmap with the stack
> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
> well when it tries to push the registers to the stack, and we keep going until
> we overflow the stack.
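
The cascade James describes can be pictured with a toy model. This is only a
sketch in Python, not the kernel's entry code: it assumes each exception entry
reserves one register frame on the stack before storing to it, and FRAME_SIZE
is an assumed value that does not come from this thread.

```python
# Task stack bounds copied from the log; FRAME_SIZE is an assumption.
STACK_LOW, STACK_HIGH = 0xFFFF0000093A8000, 0xFFFF0000093AC000
FRAME_SIZE = 0x140  # hypothetical size of one saved register frame


def nested_faults(sp, stack_mapped):
    """Count exception entries until one is handled or sp leaves the stack."""
    entries = 0
    while True:
        sp -= FRAME_SIZE          # entry reserves a frame first
        entries += 1
        if sp < STACK_LOW:        # no room left: reported as a stack overflow
            return entries, sp
        if stack_mapped:          # the register store succeeds, handler runs
            return entries, sp
        # Stack unmapped: the store itself faults, so we enter again.


entries, final_sp = nested_faults(STACK_HIGH, stack_mapped=False)
print(entries, hex(final_sp))
```

With the stack mapped, the loop ends after a single entry. With it unmapped,
every entry faults again until the stack pointer drops below the stack base.
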
> 
> I can't reproduce this with kvmtool or qemu in the model.
> 
> 
> Thanks,
> 
> James
> 
> .
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-21  9:18             ` Will Deacon
@ 2018-06-21 10:14               ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-21 10:14 UTC (permalink / raw)
  To: Will Deacon, James Morse
  Cc: catalin.marinas, suzuki.poulose, dave.martin, mark.rutland,
	marc.zyngier, linux-arm-kernel, linux-kernel, Linuxarm,
	Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi Will,

On 2018/6/21 10:18, Will Deacon wrote:
> On Thu, Jun 21, 2018 at 09:38:53AM +0100, James Morse wrote:
>> On 20/06/18 17:25, Wei Xu wrote:
>>>     [    0.042421] Insufficient stack space to handle exception!
>>>     [    0.042423] ESR: 0x96000046 -- DABT (current EL)
>>>     [    0.043730] FAR: 0xffff0000093a80e0
>>>     [    0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
>>
>> This was a level 2 translation fault on a write, to an address that is within
>> the stack....
>>
>>
>>>     [    0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
>>>     [    0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>>>     [    0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>>> 4.17.0-45865-g2b31fe7-dirty #10
>>>     [    0.072201] Hardware name: linux,dummy-virt (DT)
>>
>>>     [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>>>     [    0.081727] pc : el1_sync+0x0/0xb0
>>
>> ... from the vectors.
>>
>>
>>>     [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214
>>
>> What I think is happening is: we come out of the kpti idmap with the stack
>> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
>> well when it tries to push the registers to the stack, and we keep going until
>> we overflow the stack.
>>
>> I can't reproduce this with kvmtool or qemu in the model.
> 
> Hmm, one thing that occurs to me is that the kpti_install_ng_mappings()
> code leaves the nG bit set in table entries, which is actually IGNORED in
> the architecture.
> 
> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> otherwise your kernel will take an age to boot.

Yes, amazing! This patch resolved the issue.
I have tested 50 times and cannot reproduce the issue any more.
Could you please explain why this patch works?
Thanks!

Best Regards,
Wei

> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..70d9e98467ca 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -272,8 +272,8 @@ ENTRY(idmap_kpti_install_ng_mappings)
>  	add	end_pgdp, cur_pgdp, #(PTRS_PER_PGD * 8)
>  do_pgd:	__idmap_kpti_get_pgtable_ent	pgd
>  	tbnz	pgd, #1, walk_puds
> -next_pgd:
>  	__idmap_kpti_put_pgtable_ent_ng	pgd
> +next_pgd:
>  skip_pgd:
>  	add	cur_pgdp, cur_pgdp, #8
>  	cmp	cur_pgdp, end_pgdp
> @@ -302,8 +302,8 @@ walk_puds:
>  	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
>  do_pud:	__idmap_kpti_get_pgtable_ent	pud
>  	tbnz	pud, #1, walk_pmds
> -next_pud:
>  	__idmap_kpti_put_pgtable_ent_ng	pud
> +next_pud:
>  skip_pud:
>  	add	cur_pudp, cur_pudp, 8
>  	cmp	cur_pudp, end_pudp
> @@ -323,8 +323,8 @@ walk_pmds:
>  	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
>  do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
>  	tbnz	pmd, #1, walk_ptes
> -next_pmd:
>  	__idmap_kpti_put_pgtable_ent_ng	pmd
> +next_pmd:
>  skip_pmd:
>  	add	cur_pmdp, cur_pmdp, #8
>  	cmp	cur_pmdp, end_pmdp
> 
> .
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-21 10:14               ` Wei Xu
@ 2018-06-21 10:54                 ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-21 10:54 UTC (permalink / raw)
  To: Wei Xu
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian

Hi Wei,

On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
> On 2018/6/21 10:18, Will Deacon wrote:
> > On Thu, Jun 21, 2018 at 09:38:53AM +0100, James Morse wrote:
> >> On 20/06/18 17:25, Wei Xu wrote:
> >>>     [    0.042421] Insufficient stack space to handle exception!
> >>>     [    0.042423] ESR: 0x96000046 -- DABT (current EL)
> >>>     [    0.043730] FAR: 0xffff0000093a80e0
> >>>     [    0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
> >>
> >> This was a level 2 translation fault on a write, to an address that is within
> >> the stack....
> >>
> >>
> >>>     [    0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
> >>>     [    0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
> >>>     [    0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
> >>> 4.17.0-45865-g2b31fe7-dirty #10
> >>>     [    0.072201] Hardware name: linux,dummy-virt (DT)
> >>
> >>>     [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
> >>>     [    0.081727] pc : el1_sync+0x0/0xb0
> >>
> >> ... from the vectors.
> >>
> >>
> >>>     [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214
> >>
> >> What I think is happening is: we come out of the kpti idmap with the stack
> >> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
> >> well when it tries to push the registers to the stack, and we keep going until
> >> we overflow the stack.
> >>
> >> I can't reproduce this with kvmtool or qemu in the model.
> > 
> > Hmm, one thing that occurs to me is that the kpti_install_ng_mappings()
> > code leaves the nG bit set in table entries, which is actually IGNORED in
> > the architecture.
> > 
> > Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> > otherwise your kernel will take an age to boot.
> 
> Yes, amazing! This patch resolved the issue.

Great...

> I have tested 50 times and can not reproduce the issue any more.
> Could you please tell more why this patch works?

You might need to ask your CPU design team ;)

Without this patch, the code in idmap_kpti_install_ng_mappings() sets
bit 11 in table descriptors so that we can keep track of which parts of
the page table we've visited. With this patch, we don't bother tracking
and potentially rewalk parts of the page table (which takes a very long
time if KASAN is enabled).
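
As a rough illustration of the tracking described above, here is a toy model
in Python. It is not the kernel code. It assumes the walker skips entries that
are invalid or already have bit 11 set, and that a rewalk happens when two
entries point at the same lower-level table. The three-level layout and the
table addresses are hypothetical.

```python
PTE_VALID = 1 << 0
PTE_TABLE = 1 << 1               # with PTE_VALID: table descriptor above the last level
PTE_NG = 1 << 11                 # bit 11
ADDR_MASK = 0x0000FFFFFFFFF000
LAST_LEVEL = 2                   # toy three-level layout

ROOT, MID, LEAF = 0x1000, 0x2000, 0x3000  # hypothetical table addresses


def make_tables():
    """Two root entries share MID, so MID is reached twice."""
    return {
        ROOT: [MID | 3, MID | 3, 0],
        MID: [LEAF | 3, 0],
        LEAF: [0x8000 | 3, 0x9000 | 3],
    }


def install_ng(tables, mark_tables):
    """Set nG on leaf entries; optionally mark table entries as visited."""
    visits = []

    def walk(addr, level):
        visits.append(addr)
        table = tables[addr]
        for i, ent in enumerate(table):
            if not ent & PTE_VALID or ent & PTE_NG:
                continue                      # invalid or already done
            if level < LAST_LEVEL and ent & PTE_TABLE:
                walk(ent & ADDR_MASK, level + 1)
                if mark_tables:               # behaviour without the patch
                    table[i] = ent | PTE_NG
            else:
                table[i] = ent | PTE_NG       # leaf: block or page

    walk(ROOT, 0)
    return visits


marked = make_tables()
visits_marked = install_ng(marked, mark_tables=True)
unmarked = make_tables()
visits_unmarked = install_ng(unmarked, mark_tables=False)
print(visits_marked.count(LEAF), visits_unmarked.count(LEAF))
```

In both runs the leaf entries end up identical. The only differences are how
often the lowest table is scanned and whether bit 11 is left set in the table
entries.
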

The architecture documents I've looked at are clear that bit 11 is IGNORED
by the CPU, which:

  "Indicates that the architecture guarantees that the bit or field is not
   interpreted or modified by hardware."

Please can you double-check that your CPU is indeed ignoring bit 11 in
non-leaf (table) descriptors?

Thanks,

Will

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-21 10:54                 ` Will Deacon
@ 2018-06-22  8:33                   ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22  8:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi Will,

On 2018/6/21 11:54, Will Deacon wrote:
> Hi Wei,
> 
> On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
>> On 2018/6/21 10:18, Will Deacon wrote:
>>> On Thu, Jun 21, 2018 at 09:38:53AM +0100, James Morse wrote:
>>>> On 20/06/18 17:25, Wei Xu wrote:
>>>>>     [    0.042421] Insufficient stack space to handle exception!
>>>>>     [    0.042423] ESR: 0x96000046 -- DABT (current EL)
>>>>>     [    0.043730] FAR: 0xffff0000093a80e0
>>>>>     [    0.044714] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
>>>>
>>>> This was a level 2 translation fault on a write, to an address that is within
>>>> the stack....
>>>>
>>>>
>>>>>     [    0.051113] IRQ stack: [0xffff000008000000..0xffff000008004000]
>>>>>     [    0.057610] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>>>>>     [    0.064003] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>>>>> 4.17.0-45865-g2b31fe7-dirty #10
>>>>>     [    0.072201] Hardware name: linux,dummy-virt (DT)
>>>>
>>>>>     [    0.076797] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>>>>>     [    0.081727] pc : el1_sync+0x0/0xb0
>>>>
>>>> ... from the vectors.
>>>>
>>>>
>>>>>     [    0.085217] lr : kpti_install_ng_mappings+0x120/0x214
>>>>
>>>> What I think is happening is: we come out of the kpti idmap with the stack
>>>> unmapped. Shortly after we access the stack, which faults. el1_sync faults as
>>>> well when it tries to push the registers to the stack, and we keep going until
>>>> we overflow the stack.
>>>>
>>>> I can't reproduce this with kvmtool or qemu in the model.
>>>
>>> Hmm, one thing that occurs to me is that the kpti_install_ng_mappings()
>>> code leaves the nG bit set in table entries, which is actually IGNORED in
>>> the architecture.
>>>
>>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
>>> otherwise your kernel will take an age to boot.
>>
>> Yes, amazing! This patch resolved the issue.
> 
> Great...
> 
>> I have tested 50 times and can not reproduce the issue any more.
>> Could you please tell more why this patch works?
> 
> You might need to ask your CPU design team ;)
> 
> Without this patch, the code in idmap_kpti_install_ng_mappings() sets
> bit 11 in table descriptors so that we can keep track of which parts of
> the page table we've visited. With this patch, we don't bother tracking
> and potentially rewalk parts of the page table (which takes a very long
> time if KASAN is enabled).

Got it. Thanks!

> 
> The architecture documents I've looked at are clear that bit 11 is IGNORED
> by the CPU, which:
> 
>   "Indicates that the architecture guarantees that the bit or field is not
>    interpreted or modified by hardware."
> 
> Please can you double-check that your CPU is indeed ignoring bit 11 in
> non-leaf (table) descriptors?

By non-leaf (table) descriptors, do you mean the table descriptors described
in section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
of the ARM Architecture Reference Manual ARMv8, for ARMv8-A (DDI0487C_a_armv8_arm.pdf)?

If yes, our hardware does ignore it (the bit is neither interpreted nor modified).

Is there any other possible cause for this?
Thanks!

Best Regards,
Wei

> 
> Thanks,
> 
> Will
> 
> .
> 


^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22  8:33                   ` Wei Xu
@ 2018-06-22  9:23                     ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22  9:23 UTC (permalink / raw)
  To: Wei Xu
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi Wei,

On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
> On 2018/6/21 11:54, Will Deacon wrote:
> > On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
> >> On 2018/6/21 10:18, Will Deacon wrote:
> >>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> >>> otherwise your kernel will take an age to boot.
> >>
> >> Yes, amazing! This patch resolved the issue.
> > 
> > Great...
> > 
> >> I have tested 50 times and can not reproduce the issue any more.
> >> Could you please tell more why this patch works?
> > 
> > You might need to ask your CPU design team ;)
> > 
> > Without this patch, the code in idmap_kpti_install_ng_mappings() sets
> > bit 11 in table descriptors so that we can keep track of which parts of
> > the page table we've visited. With this patch, we don't bother tracking
> > and potentially rewalk parts of the page table (which takes a very long
> > time if KASAN is enabled).
> 
> Got it. Thanks!
> 
> > 
> > The architecture documents I've looked at are clear that bit 11 is IGNORED
> > by the CPU, which:
> > 
> >   "Indicates that the architecture guarantees that the bit or field is not
> >    interpreted or modified by hardware."
> > 
> > Please can you double-check that your CPU is indeed ignoring bit 11 in
> > non-leaf (table) descriptors?
> 
> Do the non-leaf(table) descriptors mean the table descriptors
> of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
> in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
> 
> If yes, our hardware does ignore it(not interpret or modify).

Ok, thanks for checking.

> Is there any other possible reason cause this?

Perhaps just writing back the table entries is enough to cause the issue,
although I really can't understand why that would be the case. Can you try
the diff below (without my previous change), please?
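
In terms of descriptor values, the diff below is meant to behave as sketched
here in Python. The two descriptor values are hypothetical, and the sketch
assumes that entries which already have bit 11 set are skipped before they
reach the put macro, so a leaf entry is only ever toggled once.

```python
PTE_NG = 1 << 11  # bit 11, the nG bit in block and page descriptors


def put_orr(ent):
    """Original macro: set bit 11 unconditionally before the store."""
    return ent | PTE_NG


def put_eor(ent):
    """Macro from the diff: toggle bit 11 before the store."""
    return ent ^ PTE_NG


def table_writeback_original(ent):
    """Table entry as stored by the unpatched code after walking below it."""
    return put_orr(ent)


def table_writeback_eor(ent):
    """Table entry as stored with the diff: toggled twice, so unchanged."""
    in_register = ent ^ PTE_NG    # eor at walk_puds / walk_pmds / walk_ptes
    return put_eor(in_register)   # second eor inside the put macro


TABLE_ENT = 0x0000000040002003    # hypothetical table descriptor, bit 11 clear
LEAF_ENT = 0x0000000040200701     # hypothetical block descriptor, nG clear

print(hex(table_writeback_original(TABLE_ENT)))
print(hex(table_writeback_eor(TABLE_ENT)))
print(hex(put_eor(LEAF_ENT)))
```

So table entries are still written back, but with their original value, while
leaf entries still gain nG. If the failure remains, the write-back itself
rather than the value of bit 11 would be the suspect.
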

Will

--->8

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..e2a8e88f95a0 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -216,7 +216,7 @@ ENDPROC(idmap_cpu_replace_ttbr1)
 	.endm
 
 	.macro __idmap_kpti_put_pgtable_ent_ng, type
-	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
+	eor	\type, \type, #PTE_NG		// Same bit for blocks and pages
 	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
 	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
 	.endm
@@ -298,6 +298,7 @@ skip_pgd:
 	/* PUD */
 walk_puds:
 	.if CONFIG_PGTABLE_LEVELS > 3
+	eor	pgd, pgd, #PTE_NG
 	pte_to_phys	cur_pudp, pgd
 	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
 do_pud:	__idmap_kpti_get_pgtable_ent	pud
@@ -319,6 +320,7 @@ next_pud:
 	/* PMD */
 walk_pmds:
 	.if CONFIG_PGTABLE_LEVELS > 2
+	eor	pud, pud, #PTE_NG
 	pte_to_phys	cur_pmdp, pud
 	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
 do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
@@ -339,6 +341,7 @@ next_pmd:
 
 	/* PTE */
 walk_ptes:
+	eor	pmd, pmd, #PTE_NG
 	pte_to_phys	cur_ptep, pmd
 	add	end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
 do_pte:	__idmap_kpti_get_pgtable_ent	pte

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22  9:23                     ` Will Deacon
@ 2018-06-22 10:45                       ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 10:45 UTC (permalink / raw)
  To: Will Deacon
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi Will,

On 2018/6/22 17:23, Will Deacon wrote:
> Hi Wei,
>
> On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
>> On 2018/6/21 11:54, Will Deacon wrote:
>>> On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
>>>> On 2018/6/21 10:18, Will Deacon wrote:
>>>>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
>>>>> otherwise your kernel will take an age to boot.
>>>> Yes, amazing! This patch resolved the issue.
>>> Great...
>>>
>>>> I have tested 50 times and can not reproduce the issue any more.
>>>> Could you please tell more why this patch works?
>>> You might need to ask your CPU design team ;)
>>>
>>> Without this patch, the code in idmap_kpti_install_ng_mappings() sets
>>> bit 11 in table descriptors so that we can keep track of which parts of
>>> the page table we've visited. With this patch, we don't bother tracking
>>> and potentially rewalk parts of the page table (which takes a very long
>>> time if KASAN is enabled).
>> Got it. Thanks!
>>
>>> The architecture documents I've looked at are clear that bit 11 is IGNORED
>>> by the CPU, which:
>>>
>>>    "Indicates that the architecture guarantees that the bit or field is not
>>>     interpreted or modified by hardware."
>>>
>>> Please can you double-check that your CPU is indeed ignoring bit 11 in
>>> non-leaf (table) descriptors?
>> Do the non-leaf(table) descriptors mean the table descriptors
>> of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
>> in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
>>
>> If yes, our hardware does ignore it(not interpret or modify).
> Ok, thanks for checking.
>
>> Is there any other possible reason cause this?
> Perhaps just writing back the table entries is enough to cause the issue,
> although I really can't understand why that would be the case. Can you try
> the diff below (without my previous change), please?

Thanks!
But it does not resolve the issue (I applied only this patch, on top of 4.17.0).
The log is below:

     estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.17-joyx -initrd ../mini-rootfs-arm64.cpio.gz -nographic -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.17.0-45865-gc58dc48 (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #14 SMP PREEMPT Fri Jun 22 18:26:01 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) s57984 r8192 d32128 u98304
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
     [    0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007d850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
     [    0.000000] GIC: PPI11 is secure or misconfigured
     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
     [    0.000000] arch_timer: WARNING: Please fix your firmware
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000002] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
     [    0.000844] Console: colour dummy device 80x25
     [    0.001406] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002458] pid_max: default: 32768 minimum: 301
     [    0.002944] Security Framework initialized
     [    0.003521] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
     [    0.004322] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
     [    0.005022] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.005797] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.025904] ASID allocator initialised with 32768 entries
     [    0.029913] Hierarchical SRCU implementation.
     [    0.034285] Platform MSI: its domain created
     [    0.034740] PCI/MSI: /intc/its domain created
     [    0.035318] EFI services will not be available.
     [    0.037943] smp: Bringing up secondary CPUs ...
     [    0.038410] smp: Brought up 1 node, 1 CPU
     [    0.038815] SMP: Total of 1 processors activated.
     [    0.039300] CPU features: detected: GIC system register CPU interface
     [    0.039946] CPU features: detected: Privileged Access Never
     [    0.040506] CPU features: detected: User Access Override
     [    0.042439] Insufficient stack space to handle exception!
     [    0.042441] ESR: 0x96000046 -- DABT (current EL)
     [    0.043752] FAR: 0xffff0000093a80e0
     [    0.044207] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
     [    0.046511] IRQ stack: [0xffff000008000000..0xffff000008004000]
     [    0.052899] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
     [    0.059396] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-gc58dc48 #14
     [    0.067018] Hardware name: linux,dummy-virt (DT)
     [    0.071710] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
     [    0.076532] pc : el1_sync+0x0/0xb0
     [    0.080028] lr : kpti_install_ng_mappings+0x120/0x214
     [    0.085197] sp : ffff0000093a80e0
     [    0.088566] x29: ffff0000093abce0 x28: ffff000008ea9000
     [    0.093979] x27: ffff000008ea9000 x26: ffff0000091f7000
     [    0.099293] x25: ffff00000906d000 x24: ffff000009191000
     [    0.104706] x23: ffff000008ea9000 x22: 0000000041190000
     [    0.110015] x21: ffff0000091f7000 x20: 0000000000000000
     [    0.115428] x19: ffff000009190000 x18: 000000003455d99d
     [    0.120842] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.126255] x15: 000000007eff6000 x14: 000000007eff6000
     [    0.131566] x13: 00f800007fe00f11 x12: 000000007eff8000
     [    0.136983] x11: 000000007eff8000 x10: 0000000000000000
     [    0.142396] x9 : 000000007eff9000 x8 : 000000007eff9000
     [    0.147704] x7 : 0000000000000000 x6 : 00000000411f8000
     [    0.153116] x5 : 00000000411f8000 x4 : 0000000040a443d4
     [    0.158530] x3 : 00000000411f7000 x2 : 00000000411f7000
     [    0.163943] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
     [    0.169251] Kernel panic - not syncing: kernel stack overflow
     [    0.175140] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-gc58dc48 #14
     [    0.182732] Hardware name: linux,dummy-virt (DT)
     [    0.187424] Call trace:
     [    0.189948]  dump_backtrace+0x0/0x180
     [    0.193678]  show_stack+0x14/0x1c
     [    0.197051]  dump_stack+0x90/0xb0
     [    0.200423]  panic+0x138/0x2a0
     [    0.203549]  __stack_chk_fail+0x0/0x18
     [    0.207398]  handle_bad_stack+0x118/0x124
     [    0.211489]  __bad_stack+0x88/0x8c
     [    0.214870]  el1_sync+0x0/0xb0
     [    0.217998] Unable to handle kernel paging request at virtual address ffff0000093abce0
     [    0.226061] Mem abort info:
     [    0.228839]   ESR = 0x96000006
     [    0.231965]   Exception class = DABT (current EL), IL = 32 bits
     [    0.237980]   SET = 0, FnV = 0
     [    0.241105]   EA = 0, S1PTW = 0
     [    0.244346] Data abort info:
     [    0.247239]   ISV = 0, ISS = 0x00000006
     [    0.251199]   CM = 0, WnR = 0
     [    0.254209] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
     [    0.261191] [ffff0000093abce0] pgd=00000000411f8003, pud=00000000411f9003, pmd=0000000000000000
     [    0.269982] Internal error: Oops: 96000006 [#1] PREEMPT SMP
     [    0.275538] Modules linked in:
     [    0.278664] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45865-gc58dc48 #14
     [    0.286361] Hardware name: linux,dummy-virt (DT)
     [    0.291053] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
     [    0.295874] pc : unwind_frame+0x28/0xc8
     [    0.299836] lr : dump_backtrace+0x12c/0x180
     [    0.304055] sp : ffff80003efcf000
     [    0.307429] x29: ffff80003efcf000 x28: ffff80003da61c00
     [    0.312841] x27: ffff000008ea9000 x26: ffff0000091f7000
     [    0.318255] x25: ffff00000906d000 x24: ffff0000093a80e0
     [    0.323563] x23: 0000000000000000 x22: ffff000008dbada0
     [    0.328975] x21: 0000000000000000 x20: ffff000009049000
     [    0.334388] x19: ffff80003da61c00 x18: 000000003455d99d
     [    0.339698] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.345111] x15: 000000007eff6000 x14: 3431232038346364
     [    0.350523] x13: 0000000000000000 x12: cc26f77952f87e00
     [    0.355832] x11: ffffffffffffffff x10: 0000000000000075
     [    0.361245] x9 : ffff0000085ae9e8 x8 : 78302f3078302b63
     [    0.366666] x7 : 6e79735f316c6520 x6 : ffff0000091befe1
     [    0.371976] x5 : 0000000000000000 x4 : ffff0000093ac000
     [    0.377389] x3 : ffff0000093a8000 x2 : ffff0000093abce0
     [    0.382801] x1 : ffff80003efcf048 x0 : ffff80003da61c00
     [    0.388214] Process migration/0 (pid: 12, stack limit = 0x        (ptrval))
     [    0.395204] Call trace:
     [    0.397726]  unwind_frame+0x28/0xc8
     [    0.401224]  show_stack+0x14/0x1c
     [    0.404699]  dump_stack+0x90/0xb0
     [    0.408070]  panic+0x138/0x2a0
     [    0.411198]  __stack_chk_fail+0x0/0x18
     [    0.414944]  handle_bad_stack+0x118/0x124
     [    0.419035]  __bad_stack+0x88/0x8c
     [    0.422520]  el1_sync+0x0/0xb0
     [    0.425648] Unable to handle kernel paging request at virtual address ffff0000093abce0
     [    0.433601] Mem abort info:
     [    0.436486]   ESR = 0x96000006
     [    0.439611]   Exception class = DABT (current EL), IL = 32 bits
     [    0.445626]   SET = 0, FnV = 0
     [    0.448754]   EA = 0, S1PTW = 0
     [    0.451995] Data abort info:
     [    0.454888]   ISV = 0, ISS = 0x00000006
     [    0.458849]   CM = 0, WnR = 0
     [    0.461860] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
     [    0.468843] [ffff0000093abce0] pgd=00000000411f8003, pud=00000000411f9003, pmd=0000000000000000
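
For reading the log above, a small Python sketch that splits the two ESR values into fields and computes the per-level table indices of the faulting address. It assumes the ESR_EL1 layout from the Arm ARM (EC in bits [31:26], IL in bit 25, and for data aborts WnR in bit 6 and DFSC in bits [5:0]) and the "4k pages, 48-bit VAs" configuration printed in the log, i.e. 9 index bits per level. The function names decode_esr and table_indices are made up for this illustration.

```python
def decode_esr(esr):
    """Split an ESR_EL1 value into the fields relevant to a data abort."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class: bits [31:26]
        "il": (esr >> 25) & 1,         # instruction length: bit 25
        "wnr": (iss >> 6) & 1,         # write-not-read: bit 6
        "dfsc": iss & 0x3F,            # data fault status code: bits [5:0]
    }


def table_indices(va):
    """Table indices for 4K pages and 48-bit VAs (9 bits per level)."""
    return {
        "pgd": (va >> 39) & 0x1FF,
        "pud": (va >> 30) & 0x1FF,
        "pmd": (va >> 21) & 0x1FF,
        "pte": (va >> 12) & 0x1FF,
    }


first = decode_esr(0x96000046)    # ESR printed with "Insufficient stack space"
second = decode_esr(0x96000006)   # ESR printed in "Mem abort info"
idx = table_indices(0xFFFF0000093ABCE0)

print(first, second, idx)
```

Both values decode to EC 0x25 (data abort at the current EL) with DFSC 0x06, which the Arm ARM lists as a level 2 translation fault; the first has WnR = 1 (a write) and the second WnR = 0, matching the "WnR = 0" line in the log. A level 2 fault would be consistent with the pmd=0000000000000000 in the page-table dump, and the faulting address lies inside the reported task stack range.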


> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..e2a8e88f95a0 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -216,7 +216,7 @@ ENDPROC(idmap_cpu_replace_ttbr1)
>   	.endm
>   
>   	.macro __idmap_kpti_put_pgtable_ent_ng, type
> -	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
> +	eor	\type, \type, #PTE_NG		// Same bit for blocks and pages
>   	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
>   	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
>   	.endm
> @@ -298,6 +298,7 @@ skip_pgd:
>   	/* PUD */
>   walk_puds:
>   	.if CONFIG_PGTABLE_LEVELS > 3
> +	eor	pgd, pgd, #PTE_NG
>   	pte_to_phys	cur_pudp, pgd
>   	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
>   do_pud:	__idmap_kpti_get_pgtable_ent	pud
> @@ -319,6 +320,7 @@ next_pud:
>   	/* PMD */
>   walk_pmds:
>   	.if CONFIG_PGTABLE_LEVELS > 2
> +	eor	pud, pud, #PTE_NG
>   	pte_to_phys	cur_pmdp, pud
>   	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
>   do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
> @@ -339,6 +341,7 @@ next_pmd:
>   
>   	/* PTE */
>   walk_ptes:
> +	eor	pmd, pmd, #PTE_NG
>   	pte_to_phys	cur_ptep, pmd
>   	add	end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
>   do_pte:	__idmap_kpti_get_pgtable_ent	pte
>
> .
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 10:45                       ` Wei Xu
@ 2018-06-22 11:16                         ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22 11:16 UTC (permalink / raw)
  To: Wei Xu
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi Wei,

Thanks for giving that a spin.

On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> On 2018/6/22 17:23, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
> >>On 2018/6/21 11:54, Will Deacon wrote:
> >>>On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
> >>>>On 2018/6/21 10:18, Will Deacon wrote:
> >>>>>Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
> >>>>>otherwise your kernel will take an age to boot.
> >>>>Yes, amazing! This patch resolved the issue.
> >>>Great...
> >>>
> >>>>I have tested 50 times and can not reproduce the issue any more.
> >>>>Could you please tell more why this patch works?
> >>>You might need to ask your CPU design team ;)
> >>>
> >>>Without this patch, the code in idmap_kpti_install_ng_mappings() sets
> >>>bit 11 in table descriptors so that we can keep track of which parts of
> >>>the page table we've visited. With this patch, we don't bother tracking
> >>>and potentially rewalk parts of the page table (which takes a very long
> >>>time if KASAN is enabled).
> >>Got it. Thanks!
> >>
> >>>The architecture documents I've looked at are clear that bit 11 is IGNORED
> >>>by the CPU, which:
> >>>
> >>>   "Indicates that the architecture guarantees that the bit or field is not
> >>>    interpreted or modified by hardware."
> >>>
> >>>Please can you double-check that your CPU is indeed ignoring bit 11 in
> >>>non-leaf (table) descriptors?
> >>Do the non-leaf(table) descriptors mean the table descriptors
> >>of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
> >>in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
> >>
> >>If yes, our hardware does ignore it(not interpret or modify).
> >Ok, thanks for checking.
> >
> >>Is there any other possible reason cause this?
> >Perhaps just writing back the table entries is enough to cause the issue,
> >although I really can't understand why that would be the case. Can you try
> >the diff below (without my previous change), please?
> 
> Thanks!
> But it does not resolve the issue(only apply this patch based on 4.17.0).

Thanks, that's a useful data point. It means that it still crashes even if
we write back the same table entries, so it's the fact that we're writing
them at all which causes the problem, not the value that we write.

Whilst looking at the code, we noticed a missing DMB. On the off-chance
that it helps, can you try this instead please?

Will

--->8

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..03646e6a2ef4 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -217,8 +217,9 @@ ENDPROC(idmap_cpu_replace_ttbr1)
 
 	.macro __idmap_kpti_put_pgtable_ent_ng, type
 	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
-	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
-	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
+	str	\type, [cur_\()\type\()p]	// Update the entry and ensure
+	dmb	sy				// that it is visible to all
+	dc	civac, cur_\()\type\()p		// CPUs.
 	.endm
 
 /*

^ permalink raw reply related	[flat|nested] 79+ messages in thread
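
Editorial note on the experiment discussed in the message above: as far as the earlier `eor`-based diff can be read, it toggles bit 11 (`PTE_NG`) in the register copy of a table descriptor before the next level is walked, and the `eor` in `__idmap_kpti_put_pgtable_ent_ng` toggles it back before the store, so a table entry is written back with its original value. The Python sketch below models only that bit arithmetic. The function names are hypothetical, and 0x411f8003 is the pgd value printed in the crash log from that test.

```python
PTE_NG = 1 << 11  # bit 11: nG in block/page entries, IGNORED in table descriptors


def put_with_orr(desc: int) -> int:
    """Model of the unpatched put macro: OR in bit 11 before the store."""
    return desc | PTE_NG


def put_with_eor(desc: int) -> int:
    """Model of the experimental put macro: toggle bit 11 before the store."""
    return desc ^ PTE_NG


def table_entry_experimental(desc: int) -> int:
    """Table-entry path in the experimental diff: one toggle before the
    walk of the next level, a second toggle inside the put macro."""
    in_register = desc ^ PTE_NG
    return put_with_eor(in_register)


pgd = 0x411F8003
print(hex(put_with_orr(pgd)))              # 0x411f8803
print(hex(table_entry_experimental(pgd)))  # 0x411f8003
```

Under this reading, the experimental diff still performs the store and the cache maintenance on every table entry but leaves its value untouched, which is why a crash with that diff applied points at the write itself rather than at bit 11.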

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 11:16                         ` Will Deacon
@ 2018-06-22 13:18                           ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 13:18 UTC (permalink / raw)
  To: Will Deacon
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi Will,

On 2018/6/22 19:16, Will Deacon wrote:
> Hi Wei,
>
> Thanks for giving that a spin.
>
> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>> On 2018/6/22 17:23, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
>>>> On 2018/6/21 11:54, Will Deacon wrote:
>>>>> On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
>>>>>> On 2018/6/21 10:18, Will Deacon wrote:
>>>>>>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
>>>>>>> otherwise your kernel will take an age to boot.
>>>>>> Yes, amazing! This patch resolved the issue.
>>>>> Great...
>>>>>
>>>>>> I have tested 50 times and can not reproduce the issue any more.
>>>>>> Could you please tell more why this patch works?
>>>>> You might need to ask your CPU design team ;)
>>>>>
>>>>> Without this patch, the code in idmap_kpti_install_ng_mappings() sets
>>>>> bit 11 in table descriptors so that we can keep track of which parts of
>>>>> the page table we've visited. With this patch, we don't bother tracking
>>>>> and potentially rewalk parts of the page table (which takes a very long
>>>>> time if KASAN is enabled).
>>>> Got it. Thanks!
>>>>
>>>>> The architecture documents I've looked at are clear that bit 11 is IGNORED
>>>>> by the CPU, which:
>>>>>
>>>>>    "Indicates that the architecture guarantees that the bit or field is not
>>>>>     interpreted or modified by hardware."
>>>>>
>>>>> Please can you double-check that your CPU is indeed ignoring bit 11 in
>>>>> non-leaf (table) descriptors?
>>>> Do the non-leaf(table) descriptors mean the table descriptors
>>>> of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
>>>> in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
>>>>
>>>> If yes, our hardware does ignore it(not interpret or modify).
>>> Ok, thanks for checking.
>>>
>>>> Is there any other possible reason cause this?
>>> Perhaps just writing back the table entries is enough to cause the issue,
>>> although I really can't understand why that would be the case. Can you try
>>> the diff below (without my previous change), please?
>> Thanks!
>> But it does not resolve the issue(only apply this patch based on 4.17.0).
> Thanks, that's a useful data point. It means that it still crashes even if
> we write back the same table entries, so it's the fact that we're writing
> them at all which causes the problem, not the value that we write.
>
> Whilst looking at the code, we noticed a missing DMB. On the off-chance
> that it helps, can you try this instead please?
Thanks!
With only the patch below applied on top of 4.17.0, we still got the crash.
The log is below and is nearly the same as before.

     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #16 SMP PREEMPT Fri Jun 22 21:05:10 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) s57984 r8192 d32128 u98304
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
     [    0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007d850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
     [    0.000000] GIC: PPI11 is secure or misconfigured
     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
     [    0.000000] arch_timer: WARNING: Please fix your firmware
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
     [    0.000849] Console: colour dummy device 80x25
     [    0.001427] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002485] pid_max: default: 32768 minimum: 301
     [    0.002966] Security Framework initialized
     [    0.003549] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
     [    0.004353] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
     [    0.005068] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.005858] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.025962] ASID allocator initialised with 32768 entries
     [    0.029972] Hierarchical SRCU implementation.
     [    0.034341] Platform MSI: its domain created
     [    0.034793] PCI/MSI: /intc/its domain created
     [    0.035360] EFI services will not be available.
     [    0.038002] smp: Bringing up secondary CPUs ...
     [    0.038472] smp: Brought up 1 node, 1 CPU
     [    0.038878] SMP: Total of 1 processors activated.
     [    0.039354] CPU features: detected: GIC system register CPU interface
     [    0.040004] CPU features: detected: Privileged Access Never
     [    0.040566] CPU features: detected: User Access Override
     [    0.042462] Insufficient stack space to handle exception!
     [    0.042464] ESR: 0x96000046 -- DABT (current EL)
     [    0.043781] FAR: 0xffff0000093a80e0
     [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
     [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
     [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
     [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
     [    0.067946] Hardware name: linux,dummy-virt (DT)
     [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
     [    0.077480] pc : el1_sync+0x0/0xb0
     [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
     [    0.086143] sp : ffff0000093a80e0
     [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
     [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
     [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
     [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
     [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
     [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
     [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
     [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
     [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
     [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
     [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
     [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
     [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
     [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
     [    0.170179] Kernel panic - not syncing: kernel stack overflow
     [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
     [    0.184152] Hardware name: linux,dummy-virt (DT)
     [    0.188851] Call trace:
     [    0.191380]  dump_backtrace+0x0/0x180
     [    0.195113]  show_stack+0x14/0x1c
     [    0.198488]  dump_stack+0x90/0xb0
     [    0.201862]  panic+0x138/0x2a0
     [    0.204989]  __stack_chk_fail+0x0/0x18
     [    0.208836]  handle_bad_stack+0x118/0x124
     [    0.212927]  __bad_stack+0x88/0x8c
     [    0.216414]  el1_sync+0x0/0xb0
     [    0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0
     [    0.227507] Mem abort info:
     [    0.230390]   ESR = 0x96000006
     [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
     [    0.239428]   SET = 0, FnV = 0
     [    0.242555]   EA = 0, S1PTW = 0
     [    0.245797] Data abort info:
     [    0.248795]   ISV = 0, ISS = 0x00000006
     [    0.252652]   CM = 0, WnR = 0
     [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
     [    0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000

Best Regards,
Wei

> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..03646e6a2ef4 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -217,8 +217,9 @@ ENDPROC(idmap_cpu_replace_ttbr1)
>   
>   	.macro __idmap_kpti_put_pgtable_ent_ng, type
>   	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
> -	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
> -	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
> +	str	\type, [cur_\()\type\()p]	// Update the entry and ensure
> +	dmb	sy				// that it is visible to all
> +	dc	civac, cur_\()\type\()p		// CPUs.
>   	.endm
>   
>   /*
>
> .
>


^ permalink raw reply	[flat|nested] 79+ messages in thread
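
Editorial note on the log in the message above: the two ESR values it reports (0x96000046 and 0x96000006) can be decoded by hand using the ESR_ELx layout from the Arm architecture, where EC is bits [31:26], IL is bit 25 and, for data aborts, WnR is bit 6 and DFSC is bits [5:0]. The Python sketch below does only that decoding. The function name and the small lookup table are hypothetical helpers, not kernel code, and the table covers only the translation-fault codes relevant here.

```python
# DFSC values for translation faults (subset of the architectural encoding).
TRANSLATION_FAULT_LEVEL = {0x04: 0, 0x05: 1, 0x06: 2, 0x07: 3}

EC_DABT_CURRENT_EL = 0x25  # data abort taken without a change in exception level


def decode_data_abort_esr(esr: int) -> dict:
    """Split an ESR_ELx value for a data abort into the fields used here."""
    dfsc = esr & 0x3F
    return {
        "ec": (esr >> 26) & 0x3F,
        "il": (esr >> 25) & 1,
        "wnr": (esr >> 6) & 1,   # 1 = fault on a write, 0 = fault on a read
        "dfsc": dfsc,
        "translation_fault_level": TRANSLATION_FAULT_LEVEL.get(dfsc),
    }


first = decode_data_abort_esr(0x96000046)   # "Insufficient stack space" report
second = decode_data_abort_esr(0x96000006)  # later "Unable to handle" report

print(first)
print(second)
```

Both values decode to a data abort at the current exception level with a level 2 translation fault. The first is a write (WnR = 1) and the second is a read (WnR = 0), which agrees with the `WnR = 0` and `pmd=0000000000000000` lines printed for the second fault.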

* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-22 13:18                           ` Wei Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 13:18 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On 2018/6/22 19:16, Will Deacon wrote:
> Hi Wei,
>
> Thanks for giving that a spin.
>
> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>> On 2018/6/22 17:23, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 09:33:04AM +0100, Wei Xu wrote:
>>>> On 2018/6/21 11:54, Will Deacon wrote:
>>>>> On Thu, Jun 21, 2018 at 11:14:28AM +0100, Wei Xu wrote:
>>>>>> On 2018/6/21 10:18, Will Deacon wrote:
>>>>>>> Wei -- does the diff below help at all? Make sure you disable CONFIG_KASAN,
>>>>>>> otherwise your kernel will take an age to boot.
>>>>>> Yes, amazing! This patch resolved the issue.
>>>>> Great...
>>>>>
>>>>>> I have tested 50 times and can not reproduce the issue any more.
>>>>>> Could you please tell more why this patch works?
>>>>> You might need to ask your CPU design team ;)
>>>>>
>>>>> Without this patch, the code in idmap_kpti_install_ng_mappings() sets
>>>>> bit 11 in table descriptors so that we can keep track of which parts of
>>>>> the page table we've visited. With this patch, we don't bother tracking
>>>>> and potentially rewalk parts of the page table (which takes a very long
>>>>> time if KASAN is enabled).
>>>> Got it. Thanks!
>>>>
>>>>> The architecture documents I've looked at are clear that bit 11 is IGNORED
>>>>> by the CPU, which:
>>>>>
>>>>>    "Indicates that the architecture guarantees that the bit or field is not
>>>>>     interpreted or modified by hardware."
>>>>>
>>>>> Please can you double-check that your CPU is indeed ignoring bit 11 in
>>>>> non-leaf (table) descriptors?
>>>> Do the non-leaf(table) descriptors mean the table descriptors
>>>> of the section D4.3.1 "VMSAv8-64 translation table level 0, level 1, and level 2 descriptor formats"
>>>> in the ARM Architecture Reference Manual ARMv8 for ARMv8-A(DDI0487C_a_armv8_arm.pdf)?
>>>>
>>>> If yes, our hardware does ignore it(not interpret or modify).
>>> Ok, thanks for checking.
>>>
>>>> Is there any other possible reason cause this?
>>> Perhaps just writing back the table entries is enough to cause the issue,
>>> although I really can't understand why that would be the case. Can you try
>>> the diff below (without my previous change), please?
>> Thanks!
>> But it does not resolve the issue(only apply this patch based on 4.17.0).
> Thanks, that's a useful data point. It means that it still crashes even if
> we write back the same table entries, so it's the fact that we're writing
> them at all which causes the problem, not the value that we write.
>
> Whilst looking at the code, we noticed a missing DMB. On the off-chance
> that it helps, can you try this instead please?
Thanks!
Only apply below patch based on 4.17.0, we still got the crash.
The log is as below nearly same with before.

     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.17.0-45864-g29dcea8-dirty 
(joyx at Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) 
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #16 
SMP PREEMPT Fri Jun 22 21:05:10 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 
0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 
0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from 
start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) 
s57984 r8192 d32128 u98304
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: detected: Kernel page table isolation 
(KPTI)
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total 
pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 
earlycon=pl011,0x9000000
     [    0.000000] Memory: 968436K/1048576K available (10044K kernel 
code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 
16384K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, 
Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to 
nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, 
nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS at 0x0000000008080000: allocated 8192 Devices 
@7d830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS at 0x0000000008080000: allocated 8192 Interrupt 
Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007d850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 
0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
     [    0.000000] GIC: PPI11 is secure or misconfigured
     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, 
assuming level low
     [    0.000000] arch_timer: WARNING: Please fix your firmware
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 
0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, 
wraps every 4398046511100ns
     [    0.000849] Console: colour dummy device 80x25
     [    0.001427] Calibrating delay loop (skipped), value calculated 
using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002485] pid_max: default: 32768 minimum: 301
     [    0.002966] Security Framework initialized
     [    0.003549] Dentry cache hash table entries: 131072 (order: 8, 
1048576 bytes)
     [    0.004353] Inode-cache hash table entries: 65536 (order: 7, 
524288 bytes)
     [    0.005068] Mount-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.005858] Mountpoint-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.025962] ASID allocator initialised with 32768 entries
     [    0.029972] Hierarchical SRCU implementation.
     [    0.034341] Platform MSI: its domain created
     [    0.034793] PCI/MSI: /intc/its domain created
     [    0.035360] EFI services will not be available.
     [    0.038002] smp: Bringing up secondary CPUs ...
     [    0.038472] smp: Brought up 1 node, 1 CPU
     [    0.038878] SMP: Total of 1 processors activated.
     [    0.039354] CPU features: detected: GIC system register CPU 
interface
     [    0.040004] CPU features: detected: Privileged Access Never
     [    0.040566] CPU features: detected: User Access Override
     [    0.042462] Insufficient stack space to handle exception!
     [    0.042464] ESR: 0x96000046 -- DABT (current EL)
     [    0.043781] FAR: 0xffff0000093a80e0
     [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
     [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
     [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
     [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45864-g29dcea8-dirty #16
     [    0.067946] Hardware name: linux,dummy-virt (DT)
     [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
     [    0.077480] pc : el1_sync+0x0/0xb0
     [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
     [    0.086143] sp : ffff0000093a80e0
     [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
     [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
     [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
     [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
     [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
     [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
     [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
     [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
     [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
     [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
     [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
     [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
     [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
     [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
     [    0.170179] Kernel panic - not syncing: kernel stack overflow
     [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45864-g29dcea8-dirty #16
     [    0.184152] Hardware name: linux,dummy-virt (DT)
     [    0.188851] Call trace:
     [    0.191380]  dump_backtrace+0x0/0x180
     [    0.195113]  show_stack+0x14/0x1c
     [    0.198488]  dump_stack+0x90/0xb0
     [    0.201862]  panic+0x138/0x2a0
     [    0.204989]  __stack_chk_fail+0x0/0x18
     [    0.208836]  handle_bad_stack+0x118/0x124
     [    0.212927]  __bad_stack+0x88/0x8c
     [    0.216414]  el1_sync+0x0/0xb0
     [    0.219544] Unable to handle kernel paging request at virtual 
address ffff0000093abce0
     [    0.227507] Mem abort info:
     [    0.230390]   ESR = 0x96000006
     [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
     [    0.239428]   SET = 0, FnV = 0
     [    0.242555]   EA = 0, S1PTW = 0
     [    0.245797] Data abort info:
     [    0.248795]   ISV = 0, ISS = 0x00000006
     [    0.252652]   CM = 0, WnR = 0
     [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
     [    0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
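
The last log line can be checked by hand. The following is a minimal sketch in Python, assuming the usual arm64 layout for the "4k pages, 48-bit VAs" configuration reported above (four table levels, 9 index bits each, 12 offset bits). The function names are illustrative and are not kernel APIs.

```python
PAGE_SHIFT = 12        # 4 KiB pages
BITS_PER_LEVEL = 9     # 512 entries per table

def walk_indices(va):
    """Return (pgd, pud, pmd, pte, page_offset) for a 48-bit VA."""
    va &= (1 << 48) - 1                      # drop the upper sign-extension bits
    indices = []
    for n in range(3, -1, -1):               # shifts 39, 30, 21, 12
        shift = PAGE_SHIFT + BITS_PER_LEVEL * n
        indices.append((va >> shift) & 0x1FF)
    return (*indices, va & 0xFFF)

def is_valid(descriptor):
    """Bit 0 of a translation table descriptor is the valid bit."""
    return bool(descriptor & 1)

# Values copied from the log line above.
va = 0xffff0000093abce0
pgd, pud, pmd = 0x00000000411f8803, 0x00000000411f9803, 0x0

print(walk_indices(va))
print([is_valid(d) for d in (pgd, pud, pmd)])
```

With these inputs the pgd and pud entries are valid and the pmd entry is not, so the walk for ffff0000093abce0 stops at the pmd level. That agrees with the ISS value 0x06 in the abort info, which encodes a level 2 translation fault.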

Best Regards,
Wei

> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..03646e6a2ef4 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -217,8 +217,9 @@ ENDPROC(idmap_cpu_replace_ttbr1)
>   
>   	.macro __idmap_kpti_put_pgtable_ent_ng, type
>   	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
> -	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
> -	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
> +	str	\type, [cur_\()\type\()p]	// Update the entry and ensure
> +	dmb	sy				// that it is visible to all
> +	dc	civac, cur_\()\type\()p		// CPUs.
>   	.endm
>   
>   /*
>
> .
>
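
The data side of the quoted macro is a single OR with PTE_NG, which the arm64 kernel defines as bit 11 of the descriptor. The sketch below models only that step in Python; it does not model the store, the DMB or the cache maintenance. The pre-update value is hypothetical, and the helper names are made up for illustration.

```python
PTE_NG = 1 << 11   # nG bit, same position for block and page descriptors

def put_pgtable_ent_ng(entry):
    """Model of the macro's 'orr type, type, #PTE_NG' step only."""
    return entry | PTE_NG

def has_ng(entry):
    return bool(entry & PTE_NG)

# Hypothetical descriptor value before the update (not taken from the log).
before = 0x00000000411f8003
print(hex(put_pgtable_ent_ng(before)))

# pgd and pud values as printed in the crash log above.
print(has_ng(0x00000000411f8803), has_ng(0x00000000411f9803))
```

Both logged descriptors already have bit 11 set, which is at least consistent with the rewrite having reached those entries before the fault.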

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 13:18                           ` Wei Xu
@ 2018-06-22 13:31                             ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22 13:31 UTC (permalink / raw)
  To: Wei Xu
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi again, Wei,

On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> On 2018/6/22 19:16, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> >>On 2018/6/22 17:23, Will Deacon wrote:
> >>>Perhaps just writing back the table entries is enough to cause the issue,
> >>>although I really can't understand why that would be the case. Can you try
> >>>the diff below (without my previous change), please?
> >>Thanks!
> >>But it does not resolve the issue (with only this patch applied on top of 4.17.0).
> >Thanks, that's a useful data point. It means that it still crashes even if
> >we write back the same table entries, so it's the fact that we're writing
> >them at all which causes the problem, not the value that we write.
> >
> >Whilst looking at the code, we noticed a missing DMB. On the off-chance
> >that it helps, can you try this instead please?
> Thanks!
> With only the patch below applied on top of 4.17.0, we still get the crash.

Oh well, it was worth a shot (and that's still a fix worth having). Please
can you provide the complete disassembly for kpti_install_ng_mappings()
(I'm referring to the C function in cpufeature.c) along with a corresponding
crash log so that we can correlate the instruction stream with the crash?

Thanks,

Will

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 13:31                             ` Will Deacon
@ 2018-06-22 13:46                               ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 13:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi Will,

On 2018/6/22 21:31, Will Deacon wrote:
> Hi again, Wei,
>
> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>> On 2018/6/22 19:16, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>>>> On 2018/6/22 17:23, Will Deacon wrote:
>>>>> Perhaps just writing back the table entries is enough to cause the issue,
>>>>> although I really can't understand why that would be the case. Can you try
>>>>> the diff below (without my previous change), please?
>>>> Thanks!
>>>> But it does not resolve the issue (with only this patch applied on top of 4.17.0).
>>> Thanks, that's a useful data point. It means that it still crashes even if
>>> we write back the same table entries, so it's the fact that we're writing
>>> them at all which causes the problem, not the value that we write.
>>>
>>> Whilst looking at the code, we noticed a missing DMB. On the off-chance
>>> that it helps, can you try this instead please?
>> Thanks!
>> With only the patch below applied on top of 4.17.0, we still get the crash.
> Oh well, it was worth a shot (and that's still a fix worth having). Please
> can you provide the complete disassembly for kpti_install_ng_mappings()
> (I'm referring to the C function in cpufeature.c) along with a corresponding
> crash log so that we can correlate the instruction stream with the crash?
Just let me know if you need more information.
Thanks!

The disassembly is as follows:
     Dump of assembler code for function kpti_install_ng_mappings:
        0xffff000008091d68 <+0>:     stp     x29, x30, [sp,#-112]!
        0xffff000008091d6c <+4>:     adrp    x0, 0xffff000009022000 <bp_hardening_data>
        0xffff000008091d70 <+8>:     mov     x29, sp
        0xffff000008091d74 <+12>:    stp     x23, x24, [sp,#48]
        0xffff000008091d78 <+16>:    adrp    x24, 0xffff000009191000 <reset_devices>
        0xffff000008091d7c <+20>:    add     x0, x0, #0x10
        0xffff000008091d80 <+24>:    add     x1, x24, #0x550
        0xffff000008091d84 <+28>:    stp     x19, x20, [sp,#16]
        0xffff000008091d88 <+32>:    stp     x21, x22, [sp,#32]
        0xffff000008091d8c <+36>:    stp     x25, x26, [sp,#64]
        0xffff000008091d90 <+40>:    stp     x27, x28, [sp,#80]
        0xffff000008091d94 <+44>:    mrs     x2, tpidr_el1
        0xffff000008091d98 <+48>:    ldrb    w1, [x1,#8]
        0xffff000008091d9c <+52>:    ldr     w20, [x2,x0]
        0xffff000008091da0 <+56>:    cbnz    w1, 0xffff000008091f18 <kpti_install_ng_mappings+432>
        0xffff000008091da4 <+60>:    adrp    x27, 0xffff000008ea9000 <cpu_ops+384>
        0xffff000008091da8 <+64>:    adrp    x19, 0xffff000009190000 <empty_zero_page>
        0xffff000008091dac <+68>:    add     x19, x19, #0x0
        0xffff000008091db0 <+72>:    adrp    x1, 0xffff000008a44000 <kimage_vaddr>
        0xffff000008091db4 <+76>:    mov     x0, x19
        0xffff000008091db8 <+80>:    add     x1, x1, #0x3d8
        0xffff000008091dbc <+84>:    ldr     x2, [x27,#672]
        0xffff000008091dc0 <+88>:    sub     x4, x1, x2
        0xffff000008091dc4 <+92>:    sub     x0, x0, x2
        0xffff000008091dc8 <+96>:    msr     ttbr0_el1, x0
        0xffff000008091dcc <+100>:   isb
        0xffff000008091dd0 <+104>:   dsb     nshst
        0xffff000008091dd4 <+108>:   tlbi    vmalle1
        0xffff000008091dd8 <+112>:   nop
        0xffff000008091ddc <+116>:   nop
        0xffff000008091de0 <+120>:   dsb     nsh
        0xffff000008091de4 <+124>:   isb
        0xffff000008091de8 <+128>:   adrp    x3, 0xffff000009056000 <armv8_event_attr_sw_incr+8>
        0xffff000008091dec <+132>:   ldr     x0, [x3,#2248]
        0xffff000008091df0 <+136>:   cmp     x0, #0x10
        0xffff000008091df4 <+140>:   b.ne    0xffff000008091f64 <kpti_install_ng_mappings+508>
        0xffff000008091df8 <+144>:   adrp    x28, 0xffff000008ea9000 <cpu_ops+384>
        0xffff000008091dfc <+148>:   ldr     x2, [x27,#672]
        0xffff000008091e00 <+152>:   adrp    x1, 0xffff0000091f3000
        0xffff000008091e04 <+156>:   adrp    x26, 0xffff0000091f7000
        0xffff000008091e08 <+160>:   add     x1, x1, #0x0
        0xffff000008091e0c <+164>:   add     x21, x26, #0x0
        0xffff000008091e10 <+168>:   ldr     x0, [x28,#656]
        0xffff000008091e14 <+172>:   adrp    x23, 0xffff000008ea9000 <cpu_ops+384>
        0xffff000008091e18 <+176>:   sub     x1, x1, x2
        0xffff000008091e1c <+180>:   sub     x1, x1, x0
        0xffff000008091e20 <+184>:   orr     x0, x1, #0xffff800000000000
        0xffff000008091e24 <+188>:   cmp     x0, x21
        0xffff000008091e28 <+192>:   b.eq    0xffff000008091f60 <kpti_install_ng_mappings+504>
        0xffff000008091e2c <+196>:   mov     x22, x19
        0xffff000008091e30 <+200>:   str     x3, [x29,#96]
        0xffff000008091e34 <+204>:   str     x4, [x29,#104]
        0xffff000008091e38 <+208>:   sub     x2, x22, x2
        0xffff000008091e3c <+212>:   msr     ttbr0_el1, x2
        0xffff000008091e40 <+216>:   isb
        0xffff000008091e44 <+220>:   ldr     x0, [x28,#656]
        0xffff000008091e48 <+224>:   and     x1, x1, #0x7fffffffffff
        0xffff000008091e4c <+228>:   adrp    x25, 0xffff00000906d000 <shmem_swaplist_mutex+16>
        0xffff000008091e50 <+232>:   add     x0, x1, x0
        0xffff000008091e54 <+236>:   add     x1, x25, #0x7b0
        0xffff000008091e58 <+240>:   bl      0xffff0000080a021c <cpu_do_switch_mm>
        0xffff000008091e5c <+244>:   adrp    x0, 0xffff00000904a000 <__cpu_online_mask>
        0xffff000008091e60 <+248>:   mov     w1, #0x80                       // #128
        0xffff000008091e64 <+252>:   add     x0, x0, #0x0
        0xffff000008091e68 <+256>:   bl      0xffff0000083e22f0 <__bitmap_weight>
        0xffff000008091e6c <+260>:   mov     w1, w0
        0xffff000008091e70 <+264>:   ldr     x5, [x23,#672]
        0xffff000008091e74 <+268>:   mov     w0, w20
        0xffff000008091e78 <+272>:   ldr     x4, [x29,#104]
        0xffff000008091e7c <+276>:   mov     x2, x21
        0xffff000008091e80 <+280>:   sub     x2, x2, x5
        0xffff000008091e84 <+284>:   blr     x4
        0xffff000008091e88 <+288>:   ldr     x1, [x23,#672]
        0xffff000008091e8c <+292>:   mrs     x0, sp_el0
        0xffff000008091e90 <+296>:   sub     x22, x22, x1
        0xffff000008091e94 <+300>:   ldr     x1, [x0,#1128]
        0xffff000008091e98 <+304>:   msr     ttbr0_el1, x22
        0xffff000008091e9c <+308>:   isb
        0xffff000008091ea0 <+312>:   dsb     nshst
        0xffff000008091ea4 <+316>:   tlbi    vmalle1
        0xffff000008091ea8 <+320>:   nop
        0xffff000008091eac <+324>:   nop
        0xffff000008091eb0 <+328>:   dsb     nsh
        0xffff000008091eb4 <+332>:   isb
        0xffff000008091eb8 <+336>:   ldr     x3, [x29,#96]
        0xffff000008091ebc <+340>:   ldr     x0, [x3,#2248]
        0xffff000008091ec0 <+344>:   cmp     x0, #0x10
        0xffff000008091ec4 <+348>:   b.ne    0xffff000008091f48 <kpti_install_ng_mappings+480>
        0xffff000008091ec8 <+352>:   add     x25, x25, #0x7b0
        0xffff000008091ecc <+356>:   cmp     x1, x25
        0xffff000008091ed0 <+360>:   b.eq    0xffff000008091f08 <kpti_install_ng_mappings+416>
        0xffff000008091ed4 <+364>:   ldr     x2, [x1,#64]
        0xffff000008091ed8 <+368>:   add     x26, x26, #0x0
        0xffff000008091edc <+372>:   cmp     x2, x26
        0xffff000008091ee0 <+376>:   b.eq    0xffff000008091f60 <kpti_install_ng_mappings+504>
        0xffff000008091ee4 <+380>:   ldr     x0, [x27,#672]
        0xffff000008091ee8 <+384>:   sub     x19, x19, x0
        0xffff000008091eec <+388>:   msr     ttbr0_el1, x19
        0xffff000008091ef0 <+392>:   isb
        0xffff000008091ef4 <+396>:   tbz     x2, #47, 0xffff000008091f34 <kpti_install_ng_mappings+460>
        0xffff000008091ef8 <+400>:   ldr     x0, [x28,#656]
        0xffff000008091efc <+404>:   and     x2, x2, #0x7fffffffffff
        0xffff000008091f00 <+408>:   add     x0, x2, x0
        0xffff000008091f04 <+412>:   bl      0xffff0000080a021c <cpu_do_switch_mm>
        0xffff000008091f08 <+416>:   cbnz    w20, 0xffff000008091f18 <kpti_install_ng_mappings+432>
        0xffff000008091f0c <+420>:   add     x24, x24, #0x550
        0xffff000008091f10 <+424>:   mov     w0, #0x1                        // #1
        0xffff000008091f14 <+428>:   strb    w0, [x24,#8]
        0xffff000008091f18 <+432>:   ldp     x19, x20, [sp,#16]
        0xffff000008091f1c <+436>:   ldp     x21, x22, [sp,#32]
        0xffff000008091f20 <+440>:   ldp     x23, x24, [sp,#48]
        0xffff000008091f24 <+444>:   ldp     x25, x26, [sp,#64]
        0xffff000008091f28 <+448>:   ldp     x27, x28, [sp,#80]
        0xffff000008091f2c <+452>:   ldp     x29, x30, [sp],#112
        0xffff000008091f30 <+456>:   ret
        0xffff000008091f34 <+460>:   adrp    x0, 0xffff000008ea9000 <cpu_ops+384>
        0xffff000008091f38 <+464>:   ldr     x0, [x0,#672]
        0xffff000008091f3c <+468>:   sub     x0, x2, x0
        0xffff000008091f40 <+472>:   bl      0xffff0000080a021c <cpu_do_switch_mm>
        0xffff000008091f44 <+476>:   b       0xffff000008091f08 <kpti_install_ng_mappings+416>
        0xffff000008091f48 <+480>:   mrs     x0, tcr_el1
        0xffff000008091f4c <+484>:   and     x0, x0, #0xffffffffffffffc0
        0xffff000008091f50 <+488>:   orr     x0, x0, #0x10
        0xffff000008091f54 <+492>:   msr     tcr_el1, x0
        0xffff000008091f58 <+496>:   isb
        0xffff000008091f5c <+500>:   b       0xffff000008091ec8 <kpti_install_ng_mappings+352>
        0xffff000008091f60 <+504>:   brk     #0x800
        0xffff000008091f64 <+508>:   mrs     x1, tcr_el1
        0xffff000008091f68 <+512>:   and     x1, x1, #0xffffffffffffffc0
        0xffff000008091f6c <+516>:   orr     x0, x1, x0
        0xffff000008091f70 <+520>:   msr     tcr_el1, x0
        0xffff000008091f74 <+524>:   isb
        0xffff000008091f78 <+528>:   b       0xffff000008091df8 <kpti_install_ng_mappings+144>
     End of assembler dump.
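
To correlate this listing with the crash log further down in this message, the `lr` symbol offset can be added to the function start, and the logged `sp` can be compared with `x29`. The following Python sketch does only that arithmetic. All constants are copied from this message; the helper name `parse_symbol` is made up for illustration.

```python
# Function start from the dump above; the rest from the crash log below.
FUNC_START = 0xffff000008091d68
LR_SYMBOL  = "kpti_install_ng_mappings+0x120/0x214"
SP         = 0xffff0000093a80e0
X29        = 0xffff0000093abce0
TASK_STACK = (0xffff0000093a8000, 0xffff0000093ac000)

def parse_symbol(text):
    """Split 'name+0xOFF/0xSIZE' into (name, offset, size)."""
    name, rest = text.split("+")
    offset, size = (int(field, 16) for field in rest.split("/"))
    return name, offset, size

name, offset, size = parse_symbol(LR_SYMBOL)
return_addr = FUNC_START + offset   # address held in lr
call_site = return_addr - 4         # A64 instructions are 4 bytes long

frame_to_sp = X29 - SP              # distance between frame pointer and sp
headroom = SP - TASK_STACK[0]       # bytes left above the stack base

print(hex(return_addr), offset, hex(call_site), offset - 4)
print(hex(frame_to_sp), headroom)
```

The return address resolves to <+288>, so the last call made from this function was the `blr x4` at <+284>. The listing sets x29 equal to sp in the prologue and does not adjust sp again before the epilogue, yet the log shows sp 0x3c00 bytes below x29 and only 224 bytes above the base of the task stack. That suggests, but does not prove, that the stack was consumed after control left this function through `blr x4`.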


The corresponding crash log is:
     estuary:/$ ./qemu-system-aarch64 \
         -machine virt,kernel_irqchip=on,gic-version=3 \
         -cpu host -enable-kvm -smp 1 -m 1024 \
         -kernel ./Image-4.17-joyx -initrd ../mini-rootfs-arm64.cpio.gz \
         -nographic \
         -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
         [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
         [    0.000000] Linux version 4.17.0-45864-g29dcea8-dirty (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #16 SMP PREEMPT Fri Jun 22 21:05:10 CST 2018
         [    0.000000] Machine model: linux,dummy-virt
         [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
         [    0.000000] bootconsole [pl11] enabled
         [    0.000000] efi: Getting EFI parameters from FDT:
         [    0.000000] efi: UEFI not found.
         [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
         [    0.000000] NUMA: No NUMA configuration found
         [    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
         [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
         [    0.000000] Zone ranges:
         [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
         [    0.000000]   Normal   empty
         [    0.000000] Movable zone start for each node
         [    0.000000] Early memory node ranges
         [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
         [    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
         [    0.000000] psci: probing for conduit method from DT.
         [    0.000000] psci: PSCIv1.0 detected in firmware.
         [    0.000000] psci: Using standard PSCI v0.2 function IDs
         [    0.000000] psci: Trusted OS migration not required
         [    0.000000] psci: SMC Calling Convention v1.1
         [    0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
         [    0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 r8192 d32128 u98304
         [    0.000000] Detected VIPT I-cache on CPU0
         [    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
         [    0.000000] CPU features: detected: Hardware dirty bit management
         [    0.000000] Built 1 zonelists, mobility grouping on. Total pages: 258048
         [    0.000000] Policy zone: DMA32
         [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
         [    0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
         [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
         [    0.000000] Preemptible hierarchical RCU implementation.
         [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
         [    0.000000]     Tasks RCU enabled.
         [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
         [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
         [    0.000000] GICv3: Distributor has no Range Selector support
         [    0.000000] GICv3: no VLPI support, no direct LPI support
         [    0.000000] ITS [mem 0x08080000-0x0809ffff]
         [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
         [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
         [    0.000000] GIC: using LPI property table @0x000000007d850000
         [    0.000000] ITS: Allocated 1792 chunks for LPIs
         [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
         [    0.000000] CPU0: using LPI pending table @0x000000007d860000
         [    0.000000] GIC: PPI11 is secure or misconfigured
         [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
         [    0.000000] arch_timer: WARNING: Please fix your firmware
         [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
         [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
         [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
         [    0.000849] Console: colour dummy device 80x25
         [    0.001427] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
         [    0.002485] pid_max: default: 32768 minimum: 301
         [    0.002966] Security Framework initialized
         [    0.003549] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
         [    0.004353] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
         [    0.005068] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
         [    0.005858] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
         [    0.025962] ASID allocator initialised with 32768 entries
         [    0.029972] Hierarchical SRCU implementation.
         [    0.034341] Platform MSI: its domain created
         [    0.034793] PCI/MSI: /intc/its domain created
         [    0.035360] EFI services will not be available.
         [    0.038002] smp: Bringing up secondary CPUs ...
         [    0.038472] smp: Brought up 1 node, 1 CPU
         [    0.038878] SMP: Total of 1 processors activated.
         [    0.039354] CPU features: detected: GIC system register CPU interface
         [    0.040004] CPU features: detected: Privileged Access Never
         [    0.040566] CPU features: detected: User Access Override
         [    0.042462] Insufficient stack space to handle exception!
         [    0.042464] ESR: 0x96000046 -- DABT (current EL)
         [    0.043781] FAR: 0xffff0000093a80e0
         [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
         [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
         [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
         [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
         [    0.067946] Hardware name: linux,dummy-virt (DT)
         [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
         [    0.077480] pc : el1_sync+0x0/0xb0
         [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
         [    0.086143] sp : ffff0000093a80e0
         [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
         [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
         [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
         [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
         [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
         [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
         [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
         [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
         [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
         [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
         [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
         [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
         [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
         [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
         [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
         [    0.170179] Kernel panic - not syncing: kernel stack overflow
         [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
         [    0.184152] Hardware name: linux,dummy-virt (DT)
         [    0.188851] Call trace:
         [    0.191380]  dump_backtrace+0x0/0x180
         [    0.195113]  show_stack+0x14/0x1c
         [    0.198488]  dump_stack+0x90/0xb0
         [    0.201862]  panic+0x138/0x2a0
         [    0.204989]  __stack_chk_fail+0x0/0x18
         [    0.208836]  handle_bad_stack+0x118/0x124
         [    0.212927]  __bad_stack+0x88/0x8c
         [    0.216414]  el1_sync+0x0/0xb0
         [    0.219544] Unable to handle kernel paging request at virtual address ffff0000093abce0
         [    0.227507] Mem abort info:
         [    0.230390]   ESR = 0x96000006
         [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
         [    0.239428]   SET = 0, FnV = 0
         [    0.242555]   EA = 0, S1PTW = 0
         [    0.245797] Data abort info:
         [    0.248795]   ISV = 0, ISS = 0x00000006
         [    0.252652]   CM = 0, WnR = 0
         [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
         [    0.262645] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
         [    0.271438] Internal error: Oops: 96000006 [#1] PREEMPT SMP
         [    0.277098] Modules linked in:
         [    0.280227] CPU: 0 PID: 12 Comm: migration/0 Not tainted 4.17.0-45864-g29dcea8-dirty #16
         [    0.288310] Hardware name: linux,dummy-virt (DT)
         [    0.293004] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
         [    0.297931] pc : unwind_frame+0x28/0xc8
         [    0.301792] lr : dump_backtrace+0x12c/0x180
         [    0.306114] sp : ffff80003efcf000
         [    0.309483] x29: ffff80003efcf000 x28: ffff80003da61c00
         [    0.314798] x27: ffff000008ea9000 x26: ffff0000091f7000
         [    0.320216] x25: ffff00000906d000 x24: ffff0000093a80e0
         [    0.325527] x23: 0000000000000000 x22: ffff000008dbada8
         [    0.330941] x21: 0000000000000000 x20: ffff000009049000
         [    0.336355] x19: ffff80003da61c00 x18: 000000003455d99d
         [    0.341770] x17: 0000000000000001 x16: 00f8000040ffff13
         [    0.347078] x15: 000000007eff6000 x14: 642d386165636439
         [    0.352491] x13: 0000000000000000 x12: cc26f77952f87e00
         [    0.357905] x11: ffffffffffffffff x10: 0000000000000075
         [    0.363214] x9 : ffff0000085ae9e8 x8 : ffff80003efcec90
         [    0.368628] x7 : 0000000000000000 x6 : ffff0000091befe1
         [    0.374053] x5 : 0000000000000000 x4 : ffff0000093ac000
         [    0.379363] x3 : ffff0000093a8000 x2 : ffff0000093abce0
         [    0.384779] x1 : ffff80003efcf048 x0 : ffff80003da61c00
         [    0.390196] Process migration/0 (pid: 12, stack limit = 0x        (ptrval))
         [    0.397188] Call trace:
         [    0.399712]  unwind_frame+0x28/0xc8
         [    0.403316]  show_stack+0x14/0x1c
         [    0.406689]  dump_stack+0x90/0xb0
         [    0.410065]  panic+0x138/0x2a0
         [    0.413193]  __stack_chk_fail+0x0/0x18
         [    0.416934]  handle_bad_stack+0x118/0x124
         [    0.421025]  __bad_stack+0x88/0x8c
         [    0.424513]  el1_sync+0x0/0xb0
         [    0.427643] Unable to handle kernel paging request at virtual address ffff0000093abce0
         [    0.435604] Mem abort info:
         [    0.438488]   ESR = 0x96000006
         [    0.441615]   Exception class = DABT (current EL), IL = 32 bits
         [    0.447635]   SET = 0, FnV = 0
         [    0.450759]   EA = 0, S1PTW = 0
         [    0.454002] Data abort info:
         [    0.456896]   ISV = 0, ISS = 0x00000006
         [    0.460863]   CM = 0, WnR = 0
         [    0.463874] swapper pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
         [    0.470750] [ffff0000093abce0] pgd=00000000411f8803, pud=00000000411f9803, pmd=0000000000000000
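
The two ESR values in this log differ in a single bit. The sketch below decodes the fields the kernel prints for a data abort, using the ESR_EL1 layout from the Arm architecture (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with WnR in ISS bit 6 and the fault status code in ISS bits [5:0]). The function name and dictionary keys are illustrative only.

```python
def decode_esr(esr):
    """Decode the ESR_EL1 fields relevant to a data abort."""
    iss = esr & 0x1FFFFFF               # ISS: bits [24:0]
    return {
        "EC":   (esr >> 26) & 0x3F,     # exception class, bits [31:26]
        "IL":   (esr >> 25) & 1,        # 1 = 32-bit instruction
        "ISS":  iss,
        "WnR":  (iss >> 6) & 1,         # 1 = write, 0 = read
        "DFSC": iss & 0x3F,             # data fault status code
    }

# Both values are copied from the crash log above.
for esr in (0x96000046, 0x96000006):
    fields = decode_esr(esr)
    print(hex(esr), {key: hex(value) for key, value in fields.items()})
```

Both values decode to EC 0x25 (data abort taken at the current exception level) and DFSC 0x06 (level 2 translation fault). The first abort, at FAR 0xffff0000093a80e0, has WnR set and is therefore a write; the later aborts on ffff0000093abce0 are reads.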

Best Regards,
Wei

> Thanks,
>
> Will
>
> .
>



* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-22 13:46                               ` Wei Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 13:46 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On 2018/6/22 21:31, Will Deacon wrote:
> Hi again, Wei,
>
> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>> On 2018/6/22 19:16, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>>>> On 2018/6/22 17:23, Will Deacon wrote:
>>>>> Perhaps just writing back the table entries is enough to cause the issue,
>>>>> although I really can't understand why that would be the case. Can you try
>>>>> the diff below (without my previous change), please?
>>>> Thanks!
>>>> But it does not resolve the issue(only apply this patch based on 4.17.0).
>>> Thanks, that's a useful data point. It means that it still crashes even if
>>> we write back the same table entries, so it's the fact that we're writing
>>> them at all which causes the problem, not the value that we write.
>>>
>>> Whilst looking at the code, we noticed a missing DMB. On the off-chance
>>> that it helps, can you try this instead please?
>> Thanks!
>> Only apply below patch based on 4.17.0, we still got the crash.
> Oh well, it was worth a shot (and that's still a fix worth having). Please
> can you provide the complete disassembly for kpti_install_ng_mappings()
> (I'm referring to the C function in cpufeature.c) along with a corresponding
> crash log so that we can correlate the instruction stream with the crash?
Just let me know if you need more information.
Thanks!

The disassemble code is as below:
     Dump of assembler code for function kpti_install_ng_mappings:
        0xffff000008091d68 <+0>:     stp     x29, x30, [sp,#-112]!
        0xffff000008091d6c <+4>:     adrp    x0, 0xffff000009022000 
<bp_hardening_data>
        0xffff000008091d70 <+8>:     mov     x29, sp
        0xffff000008091d74 <+12>:    stp     x23, x24, [sp,#48]
        0xffff000008091d78 <+16>:    adrp    x24, 0xffff000009191000 
<reset_devices>
        0xffff000008091d7c <+20>:    add     x0, x0, #0x10
        0xffff000008091d80 <+24>:    add     x1, x24, #0x550
        0xffff000008091d84 <+28>:    stp     x19, x20, [sp,#16]
        0xffff000008091d88 <+32>:    stp     x21, x22, [sp,#32]
        0xffff000008091d8c <+36>:    stp     x25, x26, [sp,#64]
        0xffff000008091d90 <+40>:    stp     x27, x28, [sp,#80]
        0xffff000008091d94 <+44>:    mrs     x2, tpidr_el1
        0xffff000008091d98 <+48>:    ldrb    w1, [x1,#8]
        0xffff000008091d9c <+52>:    ldr     w20, [x2,x0]
        0xffff000008091da0 <+56>:    cbnz    w1, 0xffff000008091f18 
<kpti_install_ng_mappings+432>
        0xffff000008091da4 <+60>:    adrp    x27, 0xffff000008ea9000 
<cpu_ops+384>
        0xffff000008091da8 <+64>:    adrp    x19, 0xffff000009190000 
<empty_zero_page>
        0xffff000008091dac <+68>:    add     x19, x19, #0x0
        0xffff000008091db0 <+72>:    adrp    x1, 0xffff000008a44000 
<kimage_vaddr>
        0xffff000008091db4 <+76>:    mov     x0, x19
        0xffff000008091db8 <+80>:    add     x1, x1, #0x3d8
        0xffff000008091dbc <+84>:    ldr     x2, [x27,#672]
        0xffff000008091dc0 <+88>:    sub     x4, x1, x2
        0xffff000008091dc4 <+92>:    sub     x0, x0, x2
        0xffff000008091dc8 <+96>:    msr     ttbr0_el1, x0
        0xffff000008091dcc <+100>:   isb
        0xffff000008091dd0 <+104>:   dsb     nshst
        0xffff000008091dd4 <+108>:   tlbi    vmalle1
        0xffff000008091dd8 <+112>:   nop
        0xffff000008091ddc <+116>:   nop
        0xffff000008091de0 <+120>:   dsb     nsh
        0xffff000008091de4 <+124>:   isb
        0xffff000008091de8 <+128>:   adrp    x3, 0xffff000009056000 
<armv8_event_attr_sw_incr+8>
        0xffff000008091dec <+132>:   ldr     x0, [x3,#2248]
        0xffff000008091df0 <+136>:   cmp     x0, #0x10
        0xffff000008091df4 <+140>:   b.ne    0xffff000008091f64 
<kpti_install_ng_mappings+508>
        0xffff000008091df8 <+144>:   adrp    x28, 0xffff000008ea9000 
<cpu_ops+384>
        0xffff000008091dfc <+148>:   ldr     x2, [x27,#672]
        0xffff000008091e00 <+152>:   adrp    x1, 0xffff0000091f3000
        0xffff000008091e04 <+156>:   adrp    x26, 0xffff0000091f7000
        0xffff000008091e08 <+160>:   add     x1, x1, #0x0
        0xffff000008091e0c <+164>:   add     x21, x26, #0x0
        0xffff000008091e10 <+168>:   ldr     x0, [x28,#656]
        0xffff000008091e14 <+172>:   adrp    x23, 0xffff000008ea9000 
<cpu_ops+384>
        0xffff000008091e18 <+176>:   sub     x1, x1, x2
        0xffff000008091e1c <+180>:   sub     x1, x1, x0
        0xffff000008091e20 <+184>:   orr     x0, x1, #0xffff800000000000
        0xffff000008091e24 <+188>:   cmp     x0, x21
        0xffff000008091e28 <+192>:   b.eq    0xffff000008091f60 
<kpti_install_ng_mappings+504>
        0xffff000008091e2c <+196>:   mov     x22, x19
        0xffff000008091e30 <+200>:   str     x3, [x29,#96]
        0xffff000008091e34 <+204>:   str     x4, [x29,#104]
        0xffff000008091e38 <+208>:   sub     x2, x22, x2
        0xffff000008091e3c <+212>:   msr     ttbr0_el1, x2
        0xffff000008091e40 <+216>:   isb
        0xffff000008091e44 <+220>:   ldr     x0, [x28,#656]
        0xffff000008091e48 <+224>:   and     x1, x1, #0x7fffffffffff
        0xffff000008091e4c <+228>:   adrp    x25, 0xffff00000906d000 
<shmem_swaplist_mutex+16>
        0xffff000008091e50 <+232>:   add     x0, x1, x0
        0xffff000008091e54 <+236>:   add     x1, x25, #0x7b0
        0xffff000008091e58 <+240>:   bl      0xffff0000080a021c 
<cpu_do_switch_mm>
        0xffff000008091e5c <+244>:   adrp    x0, 0xffff00000904a000 
<__cpu_online_mask>
        0xffff000008091e60 <+248>:   mov     w1, #0x80                       // #128
        0xffff000008091e64 <+252>:   add     x0, x0, #0x0
        0xffff000008091e68 <+256>:   bl      0xffff0000083e22f0 
<__bitmap_weight>
        0xffff000008091e6c <+260>:   mov     w1, w0
        0xffff000008091e70 <+264>:   ldr     x5, [x23,#672]
        0xffff000008091e74 <+268>:   mov     w0, w20
        0xffff000008091e78 <+272>:   ldr     x4, [x29,#104]
        0xffff000008091e7c <+276>:   mov     x2, x21
        0xffff000008091e80 <+280>:   sub     x2, x2, x5
        0xffff000008091e84 <+284>:   blr     x4
        0xffff000008091e88 <+288>:   ldr     x1, [x23,#672]
        0xffff000008091e8c <+292>:   mrs     x0, sp_el0
        0xffff000008091e90 <+296>:   sub     x22, x22, x1
        0xffff000008091e94 <+300>:   ldr     x1, [x0,#1128]
        0xffff000008091e98 <+304>:   msr     ttbr0_el1, x22
        0xffff000008091e9c <+308>:   isb
        0xffff000008091ea0 <+312>:   dsb     nshst
        0xffff000008091ea4 <+316>:   tlbi    vmalle1
        0xffff000008091ea8 <+320>:   nop
        0xffff000008091eac <+324>:   nop
        0xffff000008091eb0 <+328>:   dsb     nsh
        0xffff000008091eb4 <+332>:   isb
        0xffff000008091eb8 <+336>:   ldr     x3, [x29,#96]
        0xffff000008091ebc <+340>:   ldr     x0, [x3,#2248]
        0xffff000008091ec0 <+344>:   cmp     x0, #0x10
        0xffff000008091ec4 <+348>:   b.ne    0xffff000008091f48 
<kpti_install_ng_mappings+480>
        0xffff000008091ec8 <+352>:   add     x25, x25, #0x7b0
        0xffff000008091ecc <+356>:   cmp     x1, x25
        0xffff000008091ed0 <+360>:   b.eq    0xffff000008091f08 
<kpti_install_ng_mappings+416>
        0xffff000008091ed4 <+364>:   ldr     x2, [x1,#64]
        0xffff000008091ed8 <+368>:   add     x26, x26, #0x0
        0xffff000008091edc <+372>:   cmp     x2, x26
        0xffff000008091ee0 <+376>:   b.eq    0xffff000008091f60 
<kpti_install_ng_mappings+504>
        0xffff000008091ee4 <+380>:   ldr     x0, [x27,#672]
        0xffff000008091ee8 <+384>:   sub     x19, x19, x0
        0xffff000008091eec <+388>:   msr     ttbr0_el1, x19
        0xffff000008091ef0 <+392>:   isb
        0xffff000008091ef4 <+396>:   tbz     x2, #47, 0xffff000008091f34 
<kpti_install_ng_mappings+460>
        0xffff000008091ef8 <+400>:   ldr     x0, [x28,#656]
        0xffff000008091efc <+404>:   and     x2, x2, #0x7fffffffffff
        0xffff000008091f00 <+408>:   add     x0, x2, x0
        0xffff000008091f04 <+412>:   bl      0xffff0000080a021c 
<cpu_do_switch_mm>
        0xffff000008091f08 <+416>:   cbnz    w20, 0xffff000008091f18 
<kpti_install_ng_mappings+432>
        0xffff000008091f0c <+420>:   add     x24, x24, #0x550
        0xffff000008091f10 <+424>:   mov     w0, #0x1                        // #1
        0xffff000008091f14 <+428>:   strb    w0, [x24,#8]
        0xffff000008091f18 <+432>:   ldp     x19, x20, [sp,#16]
        0xffff000008091f1c <+436>:   ldp     x21, x22, [sp,#32]
        0xffff000008091f20 <+440>:   ldp     x23, x24, [sp,#48]
        0xffff000008091f24 <+444>:   ldp     x25, x26, [sp,#64]
        0xffff000008091f28 <+448>:   ldp     x27, x28, [sp,#80]
        0xffff000008091f2c <+452>:   ldp     x29, x30, [sp],#112
        0xffff000008091f30 <+456>:   ret
        0xffff000008091f34 <+460>:   adrp    x0, 0xffff000008ea9000 
<cpu_ops+384>
        0xffff000008091f38 <+464>:   ldr     x0, [x0,#672]
        0xffff000008091f3c <+468>:   sub     x0, x2, x0
        0xffff000008091f40 <+472>:   bl      0xffff0000080a021c 
<cpu_do_switch_mm>
        0xffff000008091f44 <+476>:   b       0xffff000008091f08 
<kpti_install_ng_mappings+416>
        0xffff000008091f48 <+480>:   mrs     x0, tcr_el1
        0xffff000008091f4c <+484>:   and     x0, x0, #0xffffffffffffffc0
        0xffff000008091f50 <+488>:   orr     x0, x0, #0x10
        0xffff000008091f54 <+492>:   msr     tcr_el1, x0
        0xffff000008091f58 <+496>:   isb
        0xffff000008091f5c <+500>:   b       0xffff000008091ec8 
<kpti_install_ng_mappings+352>
        0xffff000008091f60 <+504>:   brk     #0x800
        0xffff000008091f64 <+508>:   mrs     x1, tcr_el1
        0xffff000008091f68 <+512>:   and     x1, x1, #0xffffffffffffffc0
        0xffff000008091f6c <+516>:   orr     x0, x1, x0
        0xffff000008091f70 <+520>:   msr     tcr_el1, x0
        0xffff000008091f74 <+524>:   isb
        0xffff000008091f78 <+528>:   b       0xffff000008091df8 
<kpti_install_ng_mappings+144>
     End of assembler dump.


The corresponding crash log is as follows:
     estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3
      -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.17-joyx -initrd
     ../mini-rootfs-arm64.cpio.gz -nographic -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
         [    0.000000] Booting Linux on physical CPU 0x0000000000 
[0x480fd010]
         [    0.000000] Linux version 4.17.0-45864-g29dcea8-dirty 
(joyx at Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) 
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #16 
SMP PREEMPT Fri Jun 22 21:05:10 CST 2018
         [    0.000000] Machine model: linux,dummy-virt
         [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 
(options '')
         [    0.000000] bootconsole [pl11] enabled
         [    0.000000] efi: Getting EFI parameters from FDT:
         [    0.000000] efi: UEFI not found.
         [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
         [    0.000000] NUMA: No NUMA configuration found
         [    0.000000] NUMA: Faking a node at [mem 
0x0000000000000000-0x000000007fffffff]
         [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
         [    0.000000] Zone ranges:
         [    0.000000]   DMA32    [mem 
0x0000000040000000-0x000000007fffffff]
         [    0.000000]   Normal   empty
         [    0.000000] Movable zone start for each node
         [    0.000000] Early memory node ranges
         [    0.000000]   node   0: [mem 
0x0000000040000000-0x000000007fffffff]
         [    0.000000] Initmem setup node 0 [mem 
0x0000000040000000-0x000000007fffffff]
         [    0.000000] psci: probing for conduit method from DT.
         [    0.000000] psci: PSCIv1.0 detected in firmware.
         [    0.000000] psci: Using standard PSCI v0.2 function IDs
         [    0.000000] psci: Trusted OS migration not required
         [    0.000000] psci: SMC Calling Convention v1.1
         [    0.000000] random: get_random_bytes called from 
start_kernel+0xa8/0x418 with crng_init=0
         [    0.000000] percpu: Embedded 24 pages/cpu @ (ptrval) s57984 
r8192 d32128 u98304
         [    0.000000] Detected VIPT I-cache on CPU0
         [    0.000000] CPU features: detected: Kernel page table 
isolation (KPTI)
         [    0.000000] CPU features: detected: Hardware dirty bit 
management
         [    0.000000] Built 1 zonelists, mobility grouping on. Total 
pages: 258048
         [    0.000000] Policy zone: DMA32
         [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 
earlycon=pl011,0x9000000
         [    0.000000] Memory: 968436K/1048576K available (10044K 
kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K 
reserved, 16384K cma-reserved)
         [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, 
CPUs=1, Nodes=1
         [    0.000000] Preemptible hierarchical RCU implementation.
         [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to 
nr_cpu_ids=1.
         [    0.000000]     Tasks RCU enabled.
         [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, 
nr_cpu_ids=1
         [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
         [    0.000000] GICv3: Distributor has no Range Selector support
         [    0.000000] GICv3: no VLPI support, no direct LPI support
         [    0.000000] ITS [mem 0x08080000-0x0809ffff]
         [    0.000000] ITS at 0x0000000008080000: allocated 8192 Devices 
@7d830000 (indirect, esz 8, psz 64K, shr 1)
         [    0.000000] ITS at 0x0000000008080000: allocated 8192 Interrupt 
Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
         [    0.000000] GIC: using LPI property table @0x000000007d850000
         [    0.000000] ITS: Allocated 1792 chunks for LPIs
         [    0.000000] GICv3: CPU0: found redistributor 0 region 
0:0x00000000080a0000
         [    0.000000] CPU0: using LPI pending table @0x000000007d860000
         [    0.000000] GIC: PPI11 is secure or misconfigured
         [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, 
assuming level low
         [    0.000000] arch_timer: WARNING: Please fix your firmware
         [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz 
(virt).
         [    0.000000] clocksource: arch_sys_counter: mask: 
0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
         [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, 
wraps every 4398046511100ns
         [    0.000849] Console: colour dummy device 80x25
         [    0.001427] Calibrating delay loop (skipped), value 
calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
         [    0.002485] pid_max: default: 32768 minimum: 301
         [    0.002966] Security Framework initialized
         [    0.003549] Dentry cache hash table entries: 131072 (order: 
8, 1048576 bytes)
         [    0.004353] Inode-cache hash table entries: 65536 (order: 7, 
524288 bytes)
         [    0.005068] Mount-cache hash table entries: 2048 (order: 2, 
16384 bytes)
         [    0.005858] Mountpoint-cache hash table entries: 2048 
(order: 2, 16384 bytes)
         [    0.025962] ASID allocator initialised with 32768 entries
         [    0.029972] Hierarchical SRCU implementation.
         [    0.034341] Platform MSI: its domain created
         [    0.034793] PCI/MSI: /intc/its domain created
         [    0.035360] EFI services will not be available.
         [    0.038002] smp: Bringing up secondary CPUs ...
         [    0.038472] smp: Brought up 1 node, 1 CPU
         [    0.038878] SMP: Total of 1 processors activated.
         [    0.039354] CPU features: detected: GIC system register CPU 
interface
         [    0.040004] CPU features: detected: Privileged Access Never
         [    0.040566] CPU features: detected: User Access Override
         [    0.042462] Insufficient stack space to handle exception!
         [    0.042464] ESR: 0x96000046 -- DABT (current EL)
         [    0.043781] FAR: 0xffff0000093a80e0
         [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
         [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
         [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
         [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45864-g29dcea8-dirty #16
         [    0.067946] Hardware name: linux,dummy-virt (DT)
         [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
         [    0.077480] pc : el1_sync+0x0/0xb0
         [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
         [    0.086143] sp : ffff0000093a80e0
         [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
         [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
         [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
         [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
         [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
         [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
         [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
         [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
         [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
         [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
         [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
         [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
         [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
         [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
         [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
         [    0.170179] Kernel panic - not syncing: kernel stack overflow
         [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45864-g29dcea8-dirty #16
         [    0.184152] Hardware name: linux,dummy-virt (DT)
         [    0.188851] Call trace:
         [    0.191380]  dump_backtrace+0x0/0x180
         [    0.195113]  show_stack+0x14/0x1c
         [    0.198488]  dump_stack+0x90/0xb0
         [    0.201862]  panic+0x138/0x2a0
         [    0.204989]  __stack_chk_fail+0x0/0x18
         [    0.208836]  handle_bad_stack+0x118/0x124
         [    0.212927]  __bad_stack+0x88/0x8c
         [    0.216414]  el1_sync+0x0/0xb0
         [    0.219544] Unable to handle kernel paging request at 
virtual address ffff0000093abce0
         [    0.227507] Mem abort info:
         [    0.230390]   ESR = 0x96000006
         [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
         [    0.239428]   SET = 0, FnV = 0
         [    0.242555]   EA = 0, S1PTW = 0
         [    0.245797] Data abort info:
         [    0.248795]   ISV = 0, ISS = 0x00000006
         [    0.252652]   CM = 0, WnR = 0
         [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp 
=         (ptrval)
         [    0.262645] [ffff0000093abce0] pgd=00000000411f8803, 
pud=00000000411f9803, pmd=0000000000000000
         [    0.271438] Internal error: Oops: 96000006 [#1] PREEMPT SMP
         [    0.277098] Modules linked in:
         [    0.280227] CPU: 0 PID: 12 Comm: migration/0 Not tainted 
4.17.0-45864-g29dcea8-dirty #16
         [    0.288310] Hardware name: linux,dummy-virt (DT)
         [    0.293004] pstate: 204003c5 (nzCv DAIF +PAN -UAO)
         [    0.297931] pc : unwind_frame+0x28/0xc8
         [    0.301792] lr : dump_backtrace+0x12c/0x180
         [    0.306114] sp : ffff80003efcf000
         [    0.309483] x29: ffff80003efcf000 x28: ffff80003da61c00
         [    0.314798] x27: ffff000008ea9000 x26: ffff0000091f7000
         [    0.320216] x25: ffff00000906d000 x24: ffff0000093a80e0
         [    0.325527] x23: 0000000000000000 x22: ffff000008dbada8
         [    0.330941] x21: 0000000000000000 x20: ffff000009049000
         [    0.336355] x19: ffff80003da61c00 x18: 000000003455d99d
         [    0.341770] x17: 0000000000000001 x16: 00f8000040ffff13
         [    0.347078] x15: 000000007eff6000 x14: 642d386165636439
         [    0.352491] x13: 0000000000000000 x12: cc26f77952f87e00
         [    0.357905] x11: ffffffffffffffff x10: 0000000000000075
         [    0.363214] x9 : ffff0000085ae9e8 x8 : ffff80003efcec90
         [    0.368628] x7 : 0000000000000000 x6 : ffff0000091befe1
         [    0.374053] x5 : 0000000000000000 x4 : ffff0000093ac000
         [    0.379363] x3 : ffff0000093a8000 x2 : ffff0000093abce0
         [    0.384779] x1 : ffff80003efcf048 x0 : ffff80003da61c00
         [    0.390196] Process migration/0 (pid: 12, stack limit = 
0x        (ptrval))
         [    0.397188] Call trace:
         [    0.399712]  unwind_frame+0x28/0xc8
         [    0.403316]  show_stack+0x14/0x1c
         [    0.406689]  dump_stack+0x90/0xb0
         [    0.410065]  panic+0x138/0x2a0
         [    0.413193]  __stack_chk_fail+0x0/0x18
         [    0.416934]  handle_bad_stack+0x118/0x124
         [    0.421025]  __bad_stack+0x88/0x8c
         [    0.424513]  el1_sync+0x0/0xb0
         [    0.427643] Unable to handle kernel paging request at 
virtual address ffff0000093abce0
         [    0.435604] Mem abort info:
         [    0.438488]   ESR = 0x96000006
         [    0.441615]   Exception class = DABT (current EL), IL = 32 bits
         [    0.447635]   SET = 0, FnV = 0
         [    0.450759]   EA = 0, S1PTW = 0
         [    0.454002] Data abort info:
         [    0.456896]   ISV = 0, ISS = 0x00000006
         [    0.460863]   CM = 0, WnR = 0
         [    0.463874] swapper pgtable: 4k pages, 48-bit VAs, pgdp 
=         (ptrval)
         [    0.470750] [ffff0000093abce0] pgd=00000000411f8803, 
pud=00000000411f9803, pmd=0000000000000000

Best Regards,
Wei

> Thanks,
>
> Will
>
> .
>

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 13:18                           ` Wei Xu
@ 2018-06-22 14:28                             ` Mark Rutland
  -1 siblings, 0 replies; 79+ messages in thread
From: Mark Rutland @ 2018-06-22 14:28 UTC (permalink / raw)
  To: Wei Xu
  Cc: Will Deacon, James Morse, catalin.marinas, suzuki.poulose,
	dave.martin, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>     [    0.042462] Insufficient stack space to handle exception!
>     [    0.042464] ESR: 0x96000046 -- DABT (current EL)
>     [    0.043781] FAR: 0xffff0000093a80e0
>     [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]

Here, the FAR points somewhere in the task stack, so we're evidently
faulting on that...

>     [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
>     [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>     [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted
> 4.17.0-45864-g29dcea8-dirty #16
>     [    0.067946] Hardware name: linux,dummy-virt (DT)
>     [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>     [    0.077480] pc : el1_sync+0x0/0xb0
>     [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
>     [    0.086143] sp : ffff0000093a80e0
>     [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
>     [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
>     [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
>     [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
>     [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
>     [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
>     [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
>     [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
>     [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
>     [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
>     [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
>     [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
>     [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
>     [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
>     [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
>     [    0.170179] Kernel panic - not syncing: kernel stack overflow
>     [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted
> 4.17.0-45864-g29dcea8-dirty #16
>     [    0.184152] Hardware name: linux,dummy-virt (DT)
>     [    0.188851] Call trace:
>     [    0.191380]  dump_backtrace+0x0/0x180
>     [    0.195113]  show_stack+0x14/0x1c
>     [    0.198488]  dump_stack+0x90/0xb0
>     [    0.201862]  panic+0x138/0x2a0
>     [    0.204989]  __stack_chk_fail+0x0/0x18
>     [    0.208836]  handle_bad_stack+0x118/0x124
>     [    0.212927]  __bad_stack+0x88/0x8c
>     [    0.216414]  el1_sync+0x0/0xb0
>     [    0.219544] Unable to handle kernel paging request at virtual address
> ffff0000093abce0

Likewise, here we're faulting on an address within the task stack,
presumably as part of the unwinding process...

>     [    0.227507] Mem abort info:
>     [    0.230390]   ESR = 0x96000006
>     [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
>     [    0.239428]   SET = 0, FnV = 0
>     [    0.242555]   EA = 0, S1PTW = 0
>     [    0.245797] Data abort info:
>     [    0.248795]   ISV = 0, ISS = 0x00000006
>     [    0.252652]   CM = 0, WnR = 0
>     [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =
> (ptrval)
>     [    0.262645] [ffff0000093abce0] pgd=00000000411f8803,
> pud=00000000411f9803, pmd=0000000000000000

... and here the PMD for the task stack is all zeroes, so evidently
that's getting corrupted somehow.
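
For reference, the two ESR values in the log can be decoded by hand, and
the result is consistent with the zero PMD: both report a level 2
translation fault, and with 4k pages and four levels of table, level 2 is
what Linux calls the PMD. The sketch below is only an illustration; the
helper name and the small fault-code table are made up for this example,
and the field positions are the architectural ESR_ELx layout (EC in bits
[31:26], IL in bit 25, WnR in ISS bit 6, DFSC in ISS bits [5:0]).

```python
# Minimal ESR_ELx decoder for data aborts (illustrative helper, not kernel code).
# Only the translation-fault status codes are named here; anything else is
# reported as "other".
DFSC_NAMES = {
    0b000100: "translation fault, level 0",
    0b000101: "translation fault, level 1",
    0b000110: "translation fault, level 2",
    0b000111: "translation fault, level 3",
}


def decode_data_abort(esr):
    """Split an ESR value into the fields relevant to a data abort."""
    iss = esr & 0x1FFFFFF          # ISS is bits [24:0]
    dfsc = iss & 0x3F              # data fault status code, ISS bits [5:0]
    return {
        "ec": (esr >> 26) & 0x3F,  # exception class, bits [31:26]
        "il": (esr >> 25) & 1,     # instruction length bit
        "wnr": (iss >> 6) & 1,     # 1 = write, 0 = read
        "dfsc": dfsc,
        "dfsc_name": DFSC_NAMES.get(dfsc, "other (not decoded in this sketch)"),
    }


# The two ESR values reported in the crash log.
for esr in (0x96000046, 0x96000006):
    print(hex(esr), decode_data_abort(esr))
```

The first value has WnR set (a write, as expected for the exception entry
code pushing registers onto the stack), while the second is a read, which
matches the "WnR = 0" line printed by the kernel.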

It appears that the overflow stack (which IIRC is embedded within the
kernel's data segment, as part of the image mapping), is fine.
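
As a quick cross-check of which stack each address falls in, here is a
small sketch using the ranges printed in the log. The function name is
hypothetical, and treating each range as half-open (low inclusive, high
exclusive) is an assumption of this example.

```python
# Stack ranges copied from the "Task stack", "IRQ stack" and "Overflow stack"
# lines of the crash log.
STACKS = {
    "task": (0xffff0000093a8000, 0xffff0000093ac000),
    "irq": (0xffff000008000000, 0xffff000008004000),
    "overflow": (0xffff80003efce2f0, 0xffff80003efcf2f0),
}


def which_stack(addr):
    """Return the name of the stack containing addr, or None."""
    for name, (low, high) in STACKS.items():
        if low <= addr < high:   # assumption: ranges are [low, high)
            return name
    return None


# FAR of the first fault, the address of the second fault, and the sp
# reported in the second register dump.
for label, addr in (
    ("FAR (first fault)", 0xffff0000093a80e0),
    ("faulting address (second fault)", 0xffff0000093abce0),
    ("sp (second register dump)", 0xffff80003efcf000),
):
    print(label, hex(addr), "->", which_stack(addr))
```

Both faulting addresses land in the task stack, whereas the sp in the
second register dump is inside the overflow stack.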

I wonder if there's some existing weirdness in the page tables for the
vmalloc area that causes things to go wrong. Can you please:

* enable ARM64_PTDUMP_DEBUGFS

* boot with kpti=off (with Will's patch to make this work)

* as root, cat /sys/kernel/debug/kernel_page_tables

... and dump the result here?
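
When reading such a dump, it may help to know which table slots and which
2 MiB region cover the faulting address. The sketch below assumes the
configuration reported in the log ("4k pages, 48-bit VAs"), i.e. four
levels with 9 bits of index each; the function names are made up for this
example.

```python
# Index arithmetic for 4 KiB pages and 48-bit VAs: each table holds 512
# eight-byte entries, so each level consumes 9 bits of the address.
def table_indices(va):
    """Return the per-level table index for a virtual address."""
    return {
        "pgd": (va >> 39) & 0x1FF,
        "pud": (va >> 30) & 0x1FF,
        "pmd": (va >> 21) & 0x1FF,
        "pte": (va >> 12) & 0x1FF,
    }


def pmd_span(va):
    """Return the [start, end) range covered by the PMD entry for va."""
    size = 1 << 21                      # one PMD entry maps 2 MiB
    start = va & ~(size - 1)
    return start, start + size


fault = 0xffff0000093abce0              # address from the second oops
print(table_indices(fault))
print([hex(x) for x in pmd_span(fault)])
```

For this address the PMD index is 73, and the entry spans
0xffff000009200000 to 0xffff000009400000, which contains the whole task
stack. A single zero PMD entry is therefore enough to make every stack
access fault.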

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 13:46                               ` Wei Xu
@ 2018-06-22 14:43                                 ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22 14:43 UTC (permalink / raw)
  To: Wei Xu
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

On Fri, Jun 22, 2018 at 09:46:53PM +0800, Wei Xu wrote:
> On 2018/6/22 21:31, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> >>On 2018/6/22 19:16, Will Deacon wrote:
> >>>On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> >>>>On 2018/6/22 17:23, Will Deacon wrote:
> >>>>>Perhaps just writing back the table entries is enough to cause the issue,
> >>>>>although I really can't understand why that would be the case. Can you try
> >>>>>the diff below (without my previous change), please?
> >>>>Thanks!
> >>>>But it does not resolve the issue(only apply this patch based on 4.17.0).
> >>>Thanks, that's a useful data point. It means that it still crashes even if
> >>>we write back the same table entries, so it's the fact that we're writing
> >>>them at all which causes the problem, not the value that we write.
> >>>
> >>>Whilst looking at the code, we noticed a missing DMB. On the off-chance
> >>>that it helps, can you try this instead please?
> >>Thanks!
> >>Only apply below patch based on 4.17.0, we still got the crash.
> >Oh well, it was worth a shot (and that's still a fix worth having). Please
> >can you provide the complete disassembly for kpti_install_ng_mappings()
> >(I'm referring to the C function in cpufeature.c) along with a corresponding
> >crash log so that we can correlate the instruction stream with the crash?
> Just let me know if you need more information.

Thanks; the disassembly and log are really helpful.

I have another patch for you to try below. Please can you let me know how
you get on, and sorry for the back-and-forth on this.

Will

--->8

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..26c5c3fabca8 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -216,9 +216,14 @@ ENDPROC(idmap_cpu_replace_ttbr1)
 	.endm
 
 	.macro __idmap_kpti_put_pgtable_ent_ng, type
-	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
+	eor	\type, \type, #PTE_NG		// Same bit for blocks and pages
 	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
+	tbz	\type, #11, 1234f
 	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
+	b	1235f
+	1234:
+	dc	cvac, cur_\()\type\()p
+	1235:
 	.endm
 
 /*
@@ -298,6 +303,7 @@ skip_pgd:
 	/* PUD */
 walk_puds:
 	.if CONFIG_PGTABLE_LEVELS > 3
+	eor	pgd, pgd, #PTE_NG
 	pte_to_phys	cur_pudp, pgd
 	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
 do_pud:	__idmap_kpti_get_pgtable_ent	pud
@@ -319,6 +325,7 @@ next_pud:
 	/* PMD */
 walk_pmds:
 	.if CONFIG_PGTABLE_LEVELS > 2
+	eor	pud, pud, #PTE_NG
 	pte_to_phys	cur_pmdp, pud
 	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
 do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
@@ -339,6 +346,7 @@ next_pmd:
 
 	/* PTE */
 walk_ptes:
+	eor	pmd, pmd, #PTE_NG
 	pte_to_phys	cur_ptep, pmd
 	add	end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
 do_pte:	__idmap_kpti_get_pgtable_ent	pte
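
The bit arithmetic in the diff above can be modelled as follows. This is
only one reading of the patch, not something stated in the message: the
extra eor at each walk_* label toggles bit 11 in the register before the
descent, so the eor in the put macro toggles it back and a table entry is
stored unchanged, while a leaf entry is toggled once and ends up with nG
set. The descriptor values and function names below are hypothetical.

```python
PTE_NG = 1 << 11  # nG bit, same position for block and page descriptors


def put_orr(entry):
    """Original macro: set nG unconditionally (applying it twice is a no-op)."""
    return entry | PTE_NG


def put_eor(entry):
    """Patched macro: toggle nG instead of setting it."""
    return entry ^ PTE_NG


def cache_op(stored):
    """Patched macro: 'tbz #11' picks the cache maintenance instruction."""
    return "dc civac" if stored & PTE_NG else "dc cvac"


# Hypothetical descriptors with nG initially clear.
table_entry = 0x00000000411f8003
leaf_entry = 0x0000000040000703

# Table entry: toggled at walk_*, toggled back by the put macro.
stored_table = put_eor(put_eor(table_entry))
# Leaf entry: toggled once by the put macro.
stored_leaf = put_eor(leaf_entry)

print(hex(stored_table), cache_op(stored_table))
print(hex(stored_leaf), cache_op(stored_leaf))
```

Under this reading, table entries are written back with their original
value and only cleaned (dc cvac), and leaf entries gain nG and are cleaned
and invalidated (dc civac) as before.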

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-22 14:43                                 ` Will Deacon
  0 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22 14:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 22, 2018 at 09:46:53PM +0800, Wei Xu wrote:
> On 2018/6/22 21:31, Will Deacon wrote:
> >On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> >>On 2018/6/22 19:16, Will Deacon wrote:
> >>>On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
> >>>>On 2018/6/22 17:23, Will Deacon wrote:
> >>>>>Perhaps just writing back the table entries is enough to cause the issue,
> >>>>>although I really can't understand why that would be the case. Can you try
> >>>>>the diff below (without my previous change), please?
> >>>>Thanks!
> >>>>But it does not resolve the issue(only apply this patch based on 4.17.0).
> >>>Thanks, that's a useful data point. It means that it still crashes even if
> >>>we write back the same table entries, so it's the fact that we're writing
> >>>them at all which causes the problem, not the value that we write.
> >>>
> >>>Whilst looking at the code, we noticed a missing DMB. On the off-chance
> >>>that it helps, can you try this instead please?
> >>Thanks!
> >>Only apply below patch based on 4.17.0, we still got the crash.
> >Oh well, it was worth a shot (and that's still a fix worth having). Please
> >can you provide the complete disassembly for kpti_install_ng_mappings()
> >(I'm referring to the C function in cpufeature.c) along with a corresponding
> >crash log so that we can correlate the instruction stream with the crash?
> Just let me know if you need more information.

Thanks; the disassembly and log are really helpful.

I have another patch for you to try below. Please can you let me know how
you get on, and sorry for the back-and-forth on this.

Will

--->8

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 5f9a73a4452c..26c5c3fabca8 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -216,9 +216,14 @@ ENDPROC(idmap_cpu_replace_ttbr1)
 	.endm
 
 	.macro __idmap_kpti_put_pgtable_ent_ng, type
-	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
+	eor	\type, \type, #PTE_NG		// Same bit for blocks and pages
 	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
+	tbz	\type, #11, 1234f
 	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
+	b	1235f
+	1234:
+	dc	cvac, cur_\()\type\()p
+	1235:
 	.endm
 
 /*
@@ -298,6 +303,7 @@ skip_pgd:
 	/* PUD */
 walk_puds:
 	.if CONFIG_PGTABLE_LEVELS > 3
+	eor	pgd, pgd, #PTE_NG
 	pte_to_phys	cur_pudp, pgd
 	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
 do_pud:	__idmap_kpti_get_pgtable_ent	pud
@@ -319,6 +325,7 @@ next_pud:
 	/* PMD */
 walk_pmds:
 	.if CONFIG_PGTABLE_LEVELS > 2
+	eor	pud, pud, #PTE_NG
 	pte_to_phys	cur_pmdp, pud
 	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
 do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
@@ -339,6 +346,7 @@ next_pmd:
 
 	/* PTE */
 walk_ptes:
+	eor	pmd, pmd, #PTE_NG
 	pte_to_phys	cur_ptep, pmd
 	add	end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
 do_pte:	__idmap_kpti_get_pgtable_ent	pte
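
For anyone following the thread, the effect of the diff above on a single
entry can be modelled in a few lines of Python. This is only an illustrative
sketch, not kernel code: the function names are made up, the example entries
are hypothetical, and it assumes each entry reaches the walker with nG clear
(the diff itself does not show that). Under that assumption a leaf entry gets
nG set and is followed by DC CIVAC, while a table entry is toggled once at
walk_* and once more in the put macro, so it is written back unchanged and
followed by DC CVAC only.

```python
PTE_NG = 1 << 11  # nG bit; the patched macro tests bit #11 with tbz


def put_pgtable_ent_ng(entry):
    """Model of the patched put macro: return (stored value, cache op)."""
    entry ^= PTE_NG  # eor \type, \type, #PTE_NG
    # tbz \type, #11, 1234f: bit clear -> clean only, bit set -> clean+invalidate
    op = "dc civac" if entry & PTE_NG else "dc cvac"
    return entry, op


def update_leaf(entry):
    """Leaf (block/page) entry: goes straight to the put macro."""
    return put_pgtable_ent_ng(entry)


def update_table(entry):
    """Table entry: the eor added at walk_puds/walk_pmds/walk_ptes runs first."""
    entry ^= PTE_NG
    return put_pgtable_ent_ng(entry)


if __name__ == "__main__":
    # Hypothetical entries with nG clear on entry (an assumption of this sketch).
    for name, fn, entry in (("leaf", update_leaf, 0x40000703),
                            ("table", update_table, 0x411F8003)):
        stored, op = fn(entry)
        print(f"{name}: {entry:#x} -> {stored:#x}, then {op}")
```

In this model the value written for a table never differs from the value
read, which matches the later remark in the thread that the patch avoids
setting PTE_NG for tables.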

^ permalink raw reply related	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 14:43                                 ` Will Deacon
@ 2018-06-22 15:26                                   ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 15:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: James Morse, catalin.marinas, suzuki.poulose, dave.martin,
	mark.rutland, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi Will,

On 2018/6/22 22:43, Will Deacon wrote:
> On Fri, Jun 22, 2018 at 09:46:53PM +0800, Wei Xu wrote:
>> On 2018/6/22 21:31, Will Deacon wrote:
>>> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>>>> On 2018/6/22 19:16, Will Deacon wrote:
>>>>> On Fri, Jun 22, 2018 at 06:45:15PM +0800, Wei Xu wrote:
>>>>>> On 2018/6/22 17:23, Will Deacon wrote:
>>>>>>> Perhaps just writing back the table entries is enough to cause the issue,
>>>>>>> although I really can't understand why that would be the case. Can you try
>>>>>>> the diff below (without my previous change), please?
>>>>>> Thanks!
>>>>>> But it does not resolve the issue(only apply this patch based on 4.17.0).
>>>>> Thanks, that's a useful data point. It means that it still crashes even if
>>>>> we write back the same table entries, so it's the fact that we're writing
>>>>> them at all which causes the problem, not the value that we write.
>>>>>
>>>>> Whilst looking at the code, we noticed a missing DMB. On the off-chance
>>>>> that it helps, can you try this instead please?
>>>> Thanks!
>>>> Only apply below patch based on 4.17.0, we still got the crash.
>>> Oh well, it was worth a shot (and that's still a fix worth having). Please
>>> can you provide the complete disassembly for kpti_install_ng_mappings()
>>> (I'm referring to the C function in cpufeature.c) along with a corresponding
>>> crash log so that we can correlate the instruction stream with the crash?
>> Just let me know if you need more information.
> Thanks; the disassembly and log are really helpful.
>
> I have another patch for you to try below. Please can you let me know how
> you get on, and sorry for the back-and-forth on this.

No worries.
Great, I have tried 30 times and it works well with this patch applied
on top of 4.17.0.
Could you also let me know how you use the disassembly and log to debug
this kind of issue?
Thanks!

Best Regards,
Wei

> Will
>
> --->8
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 5f9a73a4452c..26c5c3fabca8 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -216,9 +216,14 @@ ENDPROC(idmap_cpu_replace_ttbr1)
>   	.endm
>   
>   	.macro __idmap_kpti_put_pgtable_ent_ng, type
> -	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
> +	eor	\type, \type, #PTE_NG		// Same bit for blocks and pages
>   	str	\type, [cur_\()\type\()p]	// Update the entry and ensure it
> +	tbz	\type, #11, 1234f
>   	dc	civac, cur_\()\type\()p		// is visible to all CPUs.
> +	b	1235f
> +	1234:
> +	dc	cvac, cur_\()\type\()p
> +	1235:
>   	.endm
>   
>   /*
> @@ -298,6 +303,7 @@ skip_pgd:
>   	/* PUD */
>   walk_puds:
>   	.if CONFIG_PGTABLE_LEVELS > 3
> +	eor	pgd, pgd, #PTE_NG
>   	pte_to_phys	cur_pudp, pgd
>   	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
>   do_pud:	__idmap_kpti_get_pgtable_ent	pud
> @@ -319,6 +325,7 @@ next_pud:
>   	/* PMD */
>   walk_pmds:
>   	.if CONFIG_PGTABLE_LEVELS > 2
> +	eor	pud, pud, #PTE_NG
>   	pte_to_phys	cur_pmdp, pud
>   	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
>   do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
> @@ -339,6 +346,7 @@ next_pmd:
>   
>   	/* PTE */
>   walk_ptes:
> +	eor	pmd, pmd, #PTE_NG
>   	pte_to_phys	cur_ptep, pmd
>   	add	end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
>   do_pte:	__idmap_kpti_get_pgtable_ent	pte
>
> .
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 14:28                             ` Mark Rutland
@ 2018-06-22 15:28                               ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 15:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Will Deacon, James Morse, catalin.marinas, suzuki.poulose,
	dave.martin, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi Mark,

On 2018/6/22 22:28, Mark Rutland wrote:
> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>>      [    0.042462] Insufficient stack space to handle exception!
>>      [    0.042464] ESR: 0x96000046 -- DABT (current EL)
>>      [    0.043781] FAR: 0xffff0000093a80e0
>>      [    0.044239] Task stack: [0xffff0000093a8000..0xffff0000093ac000]
> Here, the FAR points somewhere in the task stack, so we're evidently
> faulting on that...
>
>>      [    0.046967] IRQ stack: [0xffff000008000000..0xffff000008004000]
>>      [    0.053361] Overflow stack: [0xffff80003efce2f0..0xffff80003efcf2f0]
>>      [    0.059754] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>> 4.17.0-45864-g29dcea8-dirty #16
>>      [    0.067946] Hardware name: linux,dummy-virt (DT)
>>      [    0.072644] pstate: 604003c5 (nZCv DAIF +PAN -UAO)
>>      [    0.077480] pc : el1_sync+0x0/0xb0
>>      [    0.080970] lr : kpti_install_ng_mappings+0x120/0x214
>>      [    0.086143] sp : ffff0000093a80e0
>>      [    0.089513] x29: ffff0000093abce0 x28: ffff000008ea9000
>>      [    0.094929] x27: ffff000008ea9000 x26: ffff0000091f7000
>>      [    0.100241] x25: ffff00000906d000 x24: ffff000009191000
>>      [    0.105657] x23: ffff000008ea9000 x22: 0000000041190000
>>      [    0.111448] x21: ffff0000091f7000 x20: 0000000000000000
>>      [    0.116437] x19: ffff000009190000 x18: 000000003455d99d
>>      [    0.121739] x17: 0000000000000001 x16: 00f8000040ffff13
>>      [    0.127155] x15: 000000007eff6000 x14: 000000007eff6000
>>      [    0.132576] x13: 00f800007fe00f11 x12: 000000007eff8000
>>      [    0.137886] x11: 000000007eff8000 x10: 0000000000000000
>>      [    0.143300] x9 : 000000007eff9000 x8 : 000000007eff9000
>>      [    0.148717] x7 : 0000000000000000 x6 : 00000000411f8000
>>      [    0.154028] x5 : 00000000411f8000 x4 : 0000000040a443d4
>>      [    0.159444] x3 : 00000000411f7000 x2 : 00000000411f7000
>>      [    0.164862] x1 : ffff00000906d7b0 x0 : ffff80003da61c00
>>      [    0.170179] Kernel panic - not syncing: kernel stack overflow
>>      [    0.176069] CPU: 0 PID: 12 Comm: migration/0 Not tainted
>> 4.17.0-45864-g29dcea8-dirty #16
>>      [    0.184152] Hardware name: linux,dummy-virt (DT)
>>      [    0.188851] Call trace:
>>      [    0.191380]  dump_backtrace+0x0/0x180
>>      [    0.195113]  show_stack+0x14/0x1c
>>      [    0.198488]  dump_stack+0x90/0xb0
>>      [    0.201862]  panic+0x138/0x2a0
>>      [    0.204989]  __stack_chk_fail+0x0/0x18
>>      [    0.208836]  handle_bad_stack+0x118/0x124
>>      [    0.212927]  __bad_stack+0x88/0x8c
>>      [    0.216414]  el1_sync+0x0/0xb0
>>      [    0.219544] Unable to handle kernel paging request at virtual address
>> ffff0000093abce0
> Likewise, here we're faulting on an address within the task stack,
> presumably as part of the unwinding process...
>
>>      [    0.227507] Mem abort info:
>>      [    0.230390]   ESR = 0x96000006
>>      [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
>>      [    0.239428]   SET = 0, FnV = 0
>>      [    0.242555]   EA = 0, S1PTW = 0
>>      [    0.245797] Data abort info:
>>      [    0.248795]   ISV = 0, ISS = 0x00000006
>>      [    0.252652]   CM = 0, WnR = 0
>>      [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =
>> (ptrval)
>>      [    0.262645] [ffff0000093abce0] pgd=00000000411f8803,
>> pud=00000000411f9803, pmd=0000000000000000
> ... and here the PMD for the task stack is all zeroes, so evidently
> that's getting corrupted somehow.
>
> It appears that the overflow stack (which IIRC is embedded within the
> kernel's data segment, as part of the image mapping), is fine.
>
> I wonder if there's some existing weirdness in the page tables for the
> vmalloc area that causes things to go wrong. Can you please:
>
> * enable ARM64_PTDUMP_DEBUGFS
>
> * boot with kpti=off (with Will's patch to make this work)
>
> * as root, cat /sys/kernel/debug/kernel_page_tables
>
> ... and dump the result here?
Thanks!
Can I do this later since Will's new patch works?

Best Regards,
Wei

> Thanks,
> Mark.
>
> .
>



^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 15:28                               ` Wei Xu
@ 2018-06-22 15:41                                 ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-22 15:41 UTC (permalink / raw)
  To: Wei Xu
  Cc: Mark Rutland, James Morse, catalin.marinas, suzuki.poulose,
	dave.martin, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

On Fri, Jun 22, 2018 at 11:28:21PM +0800, Wei Xu wrote:
> On 2018/6/22 22:28, Mark Rutland wrote:
> >On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
> >>     [    0.227507] Mem abort info:
> >>     [    0.230390]   ESR = 0x96000006
> >>     [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
> >>     [    0.239428]   SET = 0, FnV = 0
> >>     [    0.242555]   EA = 0, S1PTW = 0
> >>     [    0.245797] Data abort info:
> >>     [    0.248795]   ISV = 0, ISS = 0x00000006
> >>     [    0.252652]   CM = 0, WnR = 0
> >>     [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =
> >>(ptrval)
> >>     [    0.262645] [ffff0000093abce0] pgd=00000000411f8803,
> >>pud=00000000411f9803, pmd=0000000000000000
> >... and here the PMD for the task stack is all zeroes, so evidently
> >that's getting corrupted somehow.
> >
> >It appears that the overflow stack (which IIRC is embedded within the
> >kernel's data segment, as part of the image mapping), is fine.
> >
> >I wonder if there's some existing weirdness in the page tables for the
> >vmalloc area that causes things to go wrong. Can you please:
> >
> >* enable ARM64_PTDUMP_DEBUGFS
> >
> >* boot with kpti=off (with Will's patch to make this work)
> >
> >* as root, cat /sys/kernel/debug/kernel_page_tables
> >
> >... and dump the result here?
> Thanks!
> Can I do this later since Will's new patch works?

Yes, you should probably go to bed now! Please note that my patch still
isn't the right thing for mainline, since it avoids setting PTE_NG for
tables and therefore won't solve the boot-time issue with KASAN enabled.

We also don't understand why clean+invalidate is causing the issue on your
CPU, whereas clean does not. It looks like clean+invalidate somehow results
in page table entries being zeroed.

Have a good weekend,

Will
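
The abort information quoted above can be cross-checked against the
"pmd=0000000000000000" line with a short script. This is a sketch under stated
assumptions: the helper names are made up, only the ESR fields needed here are
decoded, and the index calculation assumes the 4k pages / 48-bit VA layout
reported in the log (four levels, 9 bits per level), so that a level 2
translation fault corresponds to the PMD.

```python
def decode_esr(esr):
    """Split out the ESR fields relevant to a data abort."""
    iss = esr & 0x1FFFFFF
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class; 0x25 = DABT (current EL)
        "il": (esr >> 25) & 1,      # 1 = 32-bit instruction
        "wnr": (iss >> 6) & 1,      # 1 = write, 0 = read
        "dfsc": iss & 0x3F,         # data fault status code
    }


def translation_fault_level(dfsc):
    """DFSC 0b0001LL is a translation fault at level LL; otherwise None."""
    if dfsc & 0x3C == 0x04:
        return dfsc & 0x3
    return None


def table_indices(va):
    """Per-level table indices for 4k pages with 48-bit VAs (assumed layout)."""
    shifts = (("pgd", 39), ("pud", 30), ("pmd", 21), ("pte", 12))
    return {name: (va >> shift) & 0x1FF for name, shift in shifts}


if __name__ == "__main__":
    fields = decode_esr(0x96000006)
    print(fields, "level", translation_fault_level(fields["dfsc"]))
    print({k: hex(v) for k, v in table_indices(0xFFFF0000093ABCE0).items()})
```

For ESR 0x96000006 this reports a read that took a level 2 translation fault,
which agrees with the zeroed PMD entry in the dump.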

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-22 15:41                                 ` Will Deacon
@ 2018-06-22 16:02                                   ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-22 16:02 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, James Morse, catalin.marinas, suzuki.poulose,
	dave.martin, marc.zyngier, linux-arm-kernel, linux-kernel,
	Linuxarm, Hanjun Guo, xiexiuqi, huangdaode, Chenxin (Charles),
	Xiongfanggou (James), Liguozhu (Kenneth),
	Zhangyi ac, jonathan.cameron, Shameerali Kolothum Thodi,
	John Garry, Salil Mehta, Shiju Jose, Zhuangyuzeng (Yisen),
	Wangzhou (B), kongxinwei (A), Liyuan (Larry, Turing Solution),
	libeijian, zhangbin011

Hi Will, Mark,

On 2018/6/22 23:41, Will Deacon wrote:
> On Fri, Jun 22, 2018 at 11:28:21PM +0800, Wei Xu wrote:
>> On 2018/6/22 22:28, Mark Rutland wrote:
>>> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>>>>      [    0.227507] Mem abort info:
>>>>      [    0.230390]   ESR = 0x96000006
>>>>      [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
>>>>      [    0.239428]   SET = 0, FnV = 0
>>>>      [    0.242555]   EA = 0, S1PTW = 0
>>>>      [    0.245797] Data abort info:
>>>>      [    0.248795]   ISV = 0, ISS = 0x00000006
>>>>      [    0.252652]   CM = 0, WnR = 0
>>>>      [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =
>>>> (ptrval)
>>>>      [    0.262645] [ffff0000093abce0] pgd=00000000411f8803,
>>>> pud=00000000411f9803, pmd=0000000000000000
>>> ... and here the PMD for the task stack is all zeroes, so evidently
>>> that's getting corrupted somehow.
>>>
>>> It appears that the overflow stack (which IIRC is embedded within the
>>> kernel's data segment, as part of the image mapping), is fine.
>>>
>>> I wonder if there's some existing weirdness in the page tables for the
>>> vmalloc area that causes things to go wrong. Can you please:
>>>
>>> * enable ARM64_PTDUMP_DEBUGFS
>>>
>>> * boot with kpti=off (with Will's patch to make this work)
>>>
>>> * as root, cat /sys/kernel/debug/kernel_page_tables
>>>
>>> ... and dump the result here?
>> Thanks!
>> Can I do this later since Will's new patch works?
> Yes, you should probably go to bed now! Please note that my patch still
> isn't the right thing for mainline, since it avoids setting PTE_NG for
> tables and therefore won't solve the boot-time issue with KASAN enabled.
>
> We also don't understand why clean+invalidate is causing the issue on your
> CPU, whereas clean does not. It looks like clean+invalidate somehow results
> in page table entries being zeroed.
>
> Have a good weekend,

Got it. Thanks, and enjoy the FIFA World Cup :)
Below is the log with ARM64_PTDUMP_DEBUGFS enabled.
Only Will's kpti early_param patch is applied on top of 4.17.0.
Hope it helps.

     ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
         -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.17-joyx \
         -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
         -append "kpti=off rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.17.0-45865-ga3d6816 (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #19 SMP PREEMPT Fri Jun 22 23:47:07 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) s57984 r8192 d32128 u98304
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: kernel page table isolation forced OFF by command line option
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: kpti=off rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
     [    0.000000] Memory: 968436K/1048576K available (10044K kernel code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 16384K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices @7d830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007d850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
     [    0.000000] GIC: PPI11 is secure or misconfigured
     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
     [    0.000000] arch_timer: WARNING: Please fix your firmware
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
     [    0.000859] Console: colour dummy device 80x25
     [    0.001459] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002537] pid_max: default: 32768 minimum: 301
     [    0.003028] Security Framework initialized
     [    0.003606] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
     [    0.004418] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
     [    0.005129] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.005938] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.026041] ASID allocator initialised with 32768 entries
     [    0.030055] Hierarchical SRCU implementation.
     [    0.034426] Platform MSI: its domain created
     [    0.034885] PCI/MSI: /intc/its domain created
     [    0.035457] EFI services will not be available.
     [    0.038086] smp: Bringing up secondary CPUs ...
     [    0.038557] smp: Brought up 1 node, 1 CPU
     [    0.038966] SMP: Total of 1 processors activated.
     [    0.039447] CPU features: detected: GIC system register CPU interface
     [    0.040101] CPU features: detected: Privileged Access Never
     [    0.040667] CPU features: detected: User Access Override
     [    0.041988] CPU: All CPU(s) started at EL1
     [    0.042536] alternatives: patching kernel code
     [    0.044809] devtmpfs: initialized
     [    0.046662] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
     [    0.049470] futex hash table entries: 256 (order: 3, 32768 bytes)
     [    0.055780] pinctrl core: initialized pinctrl subsystem
     [    0.061504] DMI not present or invalid.
     [    0.065230] NET: Registered protocol family 16
     [    0.069514] audit: initializing netlink subsys (disabled)
     [    0.075351] cpuidle: using governor menu
     [    0.078855] audit: type=2000 audit(0.068:1): state=initialized 
audit_enabled=0 res=1
     [    0.086714] vdso: 2 pages (1 code @         (ptrval), 1 data 
@         (ptrval))
     [    0.094456] hw-breakpoint: found 6 breakpoint and 4 watchpoint 
registers.
     [    0.101869] DMA: preallocated 256 KiB pool for atomic allocations
     [    0.107408] Serial: AMBA PL011 UART driver
     [    0.114802] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 39, 
base_baud = 0) is a PL011 rev1
     [    0.120256] console [ttyAMA0] enabled
     [    0.120256] console [ttyAMA0] enabled
     [    0.127525] bootconsole [pl11] disabled
     [    0.127525] bootconsole [pl11] disabled
     [    0.135667] irq: type mismatch, failed to map hwirq-27 for intc!
     [    0.153827] HugeTLB registered 2.00 MiB page size, pre-allocated 
0 pages
     [    0.157547] cryptd: max_cpu_qlen set to 1000
     [    0.165692] ACPI: Interpreter disabled.
     [    0.166341] vgaarb: loaded
     [    0.166629] SCSI subsystem initialized
     [    0.169664] usbcore: registered new interface driver usbfs
     [    0.170139] usbcore: registered new interface driver hub
     [    0.174110] usbcore: registered new device driver usb
     [    0.179293] pps_core: LinuxPPS API ver. 1 registered
     [    0.184239] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 
Rodolfo Giometti <giometti@linux.it>
     [    0.193320] PTP clock support registered
     [    0.197360] EDAC MC: Ver: 3.0.0
     [    0.201468] Advanced Linux Sound Architecture Driver Initialized.
     [    0.207035] clocksource: Switched to clocksource arch_sys_counter
     [    0.212870] VFS: Disk quotas dquot_6.6.0
     [    0.216844] VFS: Dquot-cache hash table entries: 512 (order 0, 
4096 bytes)
     [    0.223782] pnp: PnP ACPI: disabled
     [    0.229309] NET: Registered protocol family 2
     [    0.232711] tcp_listen_portaddr_hash hash table entries: 512 
(order: 1, 8192 bytes)
     [    0.239478] TCP established hash table entries: 8192 (order: 4, 
65536 bytes)
     [    0.246564] TCP bind hash table entries: 8192 (order: 5, 131072 
bytes)
     [    0.253246] TCP: Hash tables configured (established 8192 bind 8192)
     [    0.259572] UDP hash table entries: 512 (order: 2, 16384 bytes)
     [    0.265610] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
     [    0.272044] NET: Registered protocol family 1
     [    0.288576] RPC: Registered named UNIX socket transport module.
     [    0.289058] RPC: Registered udp transport module.
     [    0.289434] RPC: Registered tcp transport module.
     [    0.291949] RPC: Registered tcp NFSv4.1 backchannel transport 
module.
     [    0.298471] Unpacking initramfs...
     [    0.835705] Freeing initrd memory: 29212K
     [    0.836273] hw perfevents: enabled with armv8_pmuv3 PMU driver, 
13 counters available
     [    0.837026] kvm [1]: HYP mode not available
     [    0.838111] Initialise system trusted keyrings
     [    0.838710] workingset: timestamp_bits=44 max_order=18 
bucket_order=0
     [    0.840716] squashfs: version 4.0 (2009/01/31) Phillip Lougher
     [    0.846449] NFS: Registering the id_resolver key type
     [    0.846892] Key type id_resolver registered
     [    0.847453] Key type id_legacy registered
     [    0.847789] nfs4filelayout_init: NFSv4 File Layout Driver 
Registering...
     [    0.848383] 9p: Installing v9fs 9p2000 file system support
     [    0.848878] pstore: using deflate compression
     [    0.849942] Key type asymmetric registered
     [    0.850303] Asymmetric key parser 'x509' registered
     [    0.850729] Block layer SCSI generic (bsg) driver version 0.4 
loaded (major 245)
     [    0.851480] io scheduler noop registered
     [    0.851801] io scheduler deadline registered
     [    0.852215] io scheduler cfq registered (default)
     [    0.852595] io scheduler mq-deadline registered
     [    0.852955] io scheduler kyber registered
     [    0.855192] pl061_gpio 9030000.pl061: PL061 GPIO chip 
@0x0000000009030000 registered
     [    0.857039] PCI: OF: host bridge /pcie@10000000 ranges:
     [    0.857481] PCI: OF:    IO 0x3eff0000..0x3effffff -> 0x00000000
     [    0.857953] PCI: OF:   MEM 0x10000000..0x3efeffff -> 0x10000000
     [    0.858435] PCI: OF:   MEM 0x8000000000..0xffffffffff -> 
0x8000000000
     [    0.858956] pci-host-generic 3f000000.pcie: ECAM at [mem 
0x3f000000-0x3fffffff] for [bus 00-0f]
     [    0.860042] pci-host-generic 3f000000.pcie: PCI host bridge to 
bus 0000:00
     [    0.860598] pci_bus 0000:00: root bus resource [bus 00-0f]
     [    0.861034] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
     [    0.861524] pci_bus 0000:00: root bus resource [mem 
0x10000000-0x3efeffff]
     [    0.862074] pci_bus 0000:00: root bus resource [mem 
0x8000000000-0xffffffffff]
     [    0.863568] pci 0000:00:01.0: BAR 6: assigned [mem 
0x10000000-0x1003ffff pref]
     [    0.864147] pci 0000:00:01.0: BAR 4: assigned [mem 
0x8000000000-0x8000003fff 64bit pref]
     [    0.864803] pci 0000:00:01.0: BAR 1: assigned [mem 
0x10040000-0x10040fff]
     [    0.865342] pci 0000:00:01.0: BAR 0: assigned [io 0x1000-0x101f]
     [    0.866470] EINJ: ACPI disabled.
     [    0.868836] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
     [    0.874100] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
     [    0.875395] SuperH (H)SCI(F) driver initialized
     [    0.876757] msm_serial: driver initialized
     [    0.877328] cacheinfo: Unable to detect cache hierarchy for CPU 0
     [    0.880330] loop: module loaded
     [    0.881885] libphy: Fixed MDIO Bus: probed
     [    0.882499] tun: Universal TUN/TAP device driver, 1.6
     [    0.884820] thunder_xcv, ver 1.0
     [    0.885126] thunder_bgx, ver 1.0
     [    0.885415] nicpf, ver 1.0
     [    0.885764] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
     [    0.886246] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
     [    0.886927] igb: Intel(R) Gigabit Ethernet Network Driver - 
version 5.4.0-k
     [    0.887687] igb: Copyright (c) 2007-2014 Intel Corporation.
     [    0.888159] igbvf: Intel(R) Gigabit Virtual Function Network 
Driver - version 2.4.0-k
     [    0.888782] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
     [    0.889388] sky2: driver version 1.30
     [    0.889931] VFIO - User Level meta-driver version: 0.3
     [    0.890861] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) 
Driver
     [    0.891644] ehci-pci: EHCI PCI platform driver
     [    0.892043] ehci-platform: EHCI generic platform driver
     [    0.892515] ehci-orion: EHCI orion driver
     [    0.892880] ehci-exynos: EHCI EXYNOS driver
     [    0.893414] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
     [    0.893914] ohci-pci: OHCI PCI platform driver
     [    0.894308] ohci-platform: OHCI generic platform driver
     [    0.894765] ohci-exynos: OHCI EXYNOS driver
     [    0.895357] usbcore: registered new interface driver usb-storage
     [    0.896739] rtc-pl031 9010000.pl031: rtc core: registered pl031 
as rtc0
     [    0.897504] i2c /dev entries driver
     [    0.899576] sdhci: Secure Digital Host Controller Interface driver
     [    0.900086] sdhci: Copyright(c) Pierre Ossman
     [    0.900551] Synopsys Designware Multimedia Card Interface Driver
     [    0.901791] sdhci-pltfm: SDHCI platform and OF driver helper
     [    0.902636] ledtrig-cpu: registered to indicate activity on CPUs
     [    0.903644] usbcore: registered new interface driver usbhid
     [    0.904106] usbhid: USB HID core driver
     [    0.905520] NET: Registered protocol family 17
     [    0.905917] 9pnet: Installing 9P2000 support
     [    0.906304] Key type dns_resolver registered
     [    0.906814] registered taskstats version 1
     [    0.907542] Loading compiled-in X.509 certificates
     [    0.908155] input: gpio-keys as 
/devices/platform/gpio-keys/input/input0
     [    0.909760] rtc-pl031 9010000.pl031: setting system clock to 
2015-01-30 02:38:42 UTC (1422585522)
     [    0.918889] ALSA device list:
     [    0.921687]   No soundcards found.
     [    0.925317] uart-pl011 9000000.pl011: no DMA platform data
     [    0.930981] Freeing unused kernel memory: 1216K
     Starting rcS...
     ++ Mounting filesystem
     ifdown: interface lo not configured
     ifdown: interface eth0 not configured
     ++ Starting ssh daemon
     [    0.950291] random: sshd: uninitialized urandom read (32 bytes read)
     ip: RTNETLINK answers: File exists
     rcS Complete
     Welcome to Mini Linux
     GNU/Linux 4.17.0-45865-ga3d6816 aarch64
     Version: 1.1.6
             .--.
            |o_o |
            |:_/ |
           //   \ \
          (|     | )
         /'\_   _/`\
         \___)=(___/
     udhcpc: started, v1.29.0.git
     Setting IP address 0.0.0.0 on eth0
     Documentation: http://open-estuary.org
     E-mail: Chinafengliang@163.com
     estuary:/$ udhcpc: sending discover
     udhcpc: sending select for 10.0.2.15
     udhcpc: lease of 10.0.2.15 obtained, lease time 86400
     Setting IP address 10.0.2.15 on eth0
     Deleting routers
     route: SIOCDELRT: No such process
     Adding router 10.0.2.2
     Recreating /etc/resolv.conf
      Adding DNS server 10.0.2.3

     estuary:/$
     estuary:/$ cat /sys/kernel/debug/kernel_page_tables
     ---[ Modules start ]---
     ---[ Modules end ]---
     ---[ vmalloc() Area ]---
     0xffff000008000000-0xffff000008004000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008005000-0xffff000008009000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000800a000-0xffff00000800e000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008010000-0xffff000008020000          64K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000008021000-0xffff000008022000           4K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008028000-0xffff00000802c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008030000-0xffff000008034000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008035000-0xffff000008036000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000008038000-0xffff00000803c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000803d000-0xffff00000803f000           8K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008040000-0xffff000008060000         128K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000008061000-0xffff000008065000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008066000-0xffff000008067000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000008068000-0xffff00000806c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008070000-0xffff000008074000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008078000-0xffff00000807c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008080000-0xffff000008200000        1536K PTE       ro x  
SHD AF    CON     UXN MEM/NORMAL
     0xffff000008200000-0xffff000008a00000           8M PMD       ro x  
SHD AF        BLK UXN MEM/NORMAL
     0xffff000008a00000-0xffff000008a50000         320K PTE       ro x  
SHD AF    CON     UXN MEM/NORMAL
     0xffff000008a50000-0xffff000008c00000        1728K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008c00000-0xffff000008e00000           2M PMD       ro NX 
SHD AF        BLK UXN MEM/NORMAL
     0xffff000008e00000-0xffff000008f10000        1088K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009040000-0xffff0000091f0000        1728K PTE       RW NX 
SHD AF    CON     UXN MEM/NORMAL
     0xffff0000091f0000-0xffff0000091fa000          40K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000091fb000-0xffff0000092fb000           1M PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000092fc000-0xffff00000937c000         512K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009380000-0xffff000009384000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009388000-0xffff00000938c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009390000-0xffff000009394000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009398000-0xffff00000939c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093a0000-0xffff0000093a4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093a8000-0xffff0000093ac000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093b0000-0xffff0000093b4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093b8000-0xffff0000093bc000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093c0000-0xffff0000093c4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093c8000-0xffff0000093cc000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093d0000-0xffff0000093d4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093d5000-0xffff0000093dd000          32K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009408000-0xffff00000940c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009410000-0xffff000009414000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000946d000-0xffff00000946e000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000009475000-0xffff000009476000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff00000947d000-0xffff00000947e000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000009485000-0xffff000009486000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff00000948d000-0xffff00000948e000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000009495000-0xffff000009496000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000009595000-0xffff0000095d5000         256K PTE       RW NX 
SHD AF            UXN MEM/NORMAL-NC
     0xffff000009740000-0xffff000009744000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009c60000-0xffff000009c64000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009c70000-0xffff000009c74000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000a000000-0xffff00000af60000       15744K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff00000af61000-0xffff00000af65000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b020000-0xffff00000b024000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b028000-0xffff00000b02c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b030000-0xffff00000b034000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b038000-0xffff00000b03c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b048000-0xffff00000b04c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b0f8000-0xffff00000b0fc000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b170000-0xffff00000b174000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b208000-0xffff00000b20c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b230000-0xffff00000b234000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b238000-0xffff00000b23c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b48d000-0xffff00000b49d000          64K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b49e000-0xffff00000b4be000         128K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b4c0000-0xffff00000b4c4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b538000-0xffff00000b53c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b7e8000-0xffff00000b7ec000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000c000000-0xffff00000d000000          16M PMD       RW NX 
SHD AF        BLK UXN DEVICE/nGnRnE
     0xffff00000d001000-0xffff00000d004000          12K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d260000-0xffff00000d264000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d760000-0xffff00000d764000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d770000-0xffff00000d774000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d778000-0xffff00000d77c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d7b0000-0xffff00000d7b4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d7d8000-0xffff00000d7dc000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d7e0000-0xffff00000d7e4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff7dffbffd8000-0xffff7dffbffdb000          12K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     ---[ vmalloc() End ]---
     ---[ Fixmap start ]---
     0xffff7dfffe7fa000-0xffff7dfffe7fb000           4K PTE       ro x  
SHD AF            UXN MEM/NORMAL
     0xffff7dfffe7ff000-0xffff7dfffe800000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff7dfffe800000-0xffff7dfffea00000           2M PMD       ro NX 
SHD AF        BLK UXN MEM/NORMAL
     ---[ Fixmap end ]---
     ---[ PCI I/O start ]---
     0xffff7dfffee00000-0xffff7dfffee10000          64K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     ---[ PCI I/O end ]---
     ---[ vmemmap start ]---
     0xffff7e0000000000-0xffff7e0001000000          16M PMD       RW NX 
SHD AF        BLK UXN MEM/NORMAL
     ---[ vmemmap end ]---
     ---[ Linear Mapping ]---
     0xffff800000000000-0xffff800000080000         512K PTE       RW NX 
SHD AF    CON     UXN MEM/NORMAL
     0xffff800000080000-0xffff800000200000        1536K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff800000200000-0xffff800000e00000          12M PMD       ro NX 
SHD AF        BLK UXN MEM/NORMAL
     0xffff800000e00000-0xffff800000f10000        1088K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff800000f10000-0xffff800001000000         960K PTE       RW NX 
SHD AF    CON     UXN MEM/NORMAL
     0xffff800001000000-0xffff800002000000          16M PMD       RW NX 
SHD AF        BLK UXN MEM/NORMAL
     0xffff800002000000-0xffff800040000000         992M PMD       RW NX 
SHD AF    CON BLK UXN MEM/NORMAL
     estuary:/$
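
A dump like the one above is easier to check with a small script than by eye. The sketch below is only an illustration: it parses ptdump-style rows (with the mail client's line wrapping rejoined) and reports which mapping covers a given virtual address. The address used is the faulting one quoted in this thread (ffff0000093abce0); the helper names `parse_ptdump` and `find_mapping` are made up for this example. Since the dump comes from a separate boot with kpti=off, the stack placement is not guaranteed to match the failing boot.

```python
import re

# Three rows copied from the dump above, with the wrapped lines rejoined.
SAMPLE = """\
0xffff0000093a0000-0xffff0000093a4000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
0xffff0000093a8000-0xffff0000093ac000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
0xffff0000093b0000-0xffff0000093b4000          16K PTE       RW NX SHD AF            UXN MEM/NORMAL
"""

# start-end, size, level, then the attribute flags.
ROW = re.compile(
    r"^\s*0x([0-9a-f]{16})-0x([0-9a-f]{16})\s+(\d+[KMGTPE])\s+(\S+)\s+(.*)$"
)


def parse_ptdump(text):
    """Return (start, end, size, level, flags) tuples; skip marker lines."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            start = int(m.group(1), 16)
            end = int(m.group(2), 16)
            rows.append((start, end, m.group(3), m.group(4), m.group(5).split()))
    return rows


def find_mapping(rows, addr):
    """Return the row whose [start, end) range contains addr, or None."""
    for row in rows:
        if row[0] <= addr < row[1]:
            return row
    return None


rows = parse_ptdump(SAMPLE)
hit = find_mapping(rows, 0xFFFF0000093ABCE0)
if hit is not None:
    print("0x%016x-0x%016x %s %s %s" % (hit[0], hit[1], hit[2], hit[3], " ".join(hit[4])))
```

In this dump the address falls inside a 16K RW mapping made of PTEs, with unmapped gaps on either side, which is what a vmap'd task stack with guard pages would look like.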

Thanks!

Best Regards,
Wei

> Will
>
> .
>




* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-22 16:02                                   ` Wei Xu
From: Wei Xu @ 2018-06-22 16:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will, Mark,

On 2018/6/22 23:41, Will Deacon wrote:
> On Fri, Jun 22, 2018 at 11:28:21PM +0800, Wei Xu wrote:
>> On 2018/6/22 22:28, Mark Rutland wrote:
>>> On Fri, Jun 22, 2018 at 09:18:27PM +0800, Wei Xu wrote:
>>>>      [    0.227507] Mem abort info:
>>>>      [    0.230390]   ESR = 0x96000006
>>>>      [    0.233517]   Exception class = DABT (current EL), IL = 32 bits
>>>>      [    0.239428]   SET = 0, FnV = 0
>>>>      [    0.242555]   EA = 0, S1PTW = 0
>>>>      [    0.245797] Data abort info:
>>>>      [    0.248795]   ISV = 0, ISS = 0x00000006
>>>>      [    0.252652]   CM = 0, WnR = 0
>>>>      [    0.255769] swapper pgtable: 4k pages, 48-bit VAs, pgdp =
>>>> (ptrval)
>>>>      [    0.262645] [ffff0000093abce0] pgd=00000000411f8803,
>>>> pud=00000000411f9803, pmd=0000000000000000
>>> ... and here the PMD for the task stack is all zeroes, so evidently
>>> that's getting corrupted somehow.
>>>
>>> It appears that the overflow stack (which IIRC is embedded within the
>>> kernel's data segment, as part of the image mapping), is fine.
>>>
>>> I wonder if there's some existing weirdness in the page tables for the
>>> vmalloc area that causes things to go wrong. Can you please:
>>>
>>> * enable ARM64_PTDUMP_DEBUGFS
>>>
>>> * boot with kpti=off (with Will's patch to make this work)
>>>
>>> * as root, cat /sys/kernel/debug/kernel_page_tables
>>>
>>> ... and dump the result here?
>> Thanks!
>> Can I do this later since Will's new patch works?
> Yes, you should probably go to bed now! Please note that my patch still
> isn't the right thing for mainline, since it avoids setting PTE_NG for
> tables and therefore won't solve the boot-time issue with KASAN enabled.
>
> We also don't understand why clean+invalidate is causing the issue on your
> CPU, whereas clean does not. It looks like clean+invalidate somehow results
> in page table entries being zeroed.
>
> Have a good weekend,
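
For readers following the quoted abort info, the ESR value can be decoded by hand. The sketch below is an illustration only, based on the architectural ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); the function name `decode_esr` is made up for this example and only the data-abort fields relevant here are handled.

```python
def decode_esr(esr):
    """Decode the top-level ESR_ELx fields plus a few data-abort ISS bits."""
    ec = (esr >> 26) & 0x3F       # exception class
    il = (esr >> 25) & 0x1        # instruction length (1 = 32-bit)
    iss = esr & 0x1FFFFFF         # instruction specific syndrome
    out = {"ec": ec, "il": il, "iss": iss}

    # 0x24 = data abort from a lower EL, 0x25 = data abort from the current EL
    if ec in (0x24, 0x25):
        dfsc = iss & 0x3F         # data fault status code
        out["dfsc"] = dfsc
        out["wnr"] = (iss >> 6) & 0x1   # 1 = write, 0 = read
        # DFSC 0b0001LL encodes a translation fault at level LL
        if (dfsc >> 2) == 0b0001:
            out["fault"] = "translation fault, level %d" % (dfsc & 0x3)
    return out


info = decode_esr(0x96000006)
for key in ("ec", "il", "iss", "dfsc", "wnr", "fault"):
    print(key, info[key])
```

With 4K pages and 48-bit VAs, level 2 is what Linux calls the PMD, so the decoded fault level agrees with the pmd=0000000000000000 entry in the quoted walk.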

Got it. Thanks, and enjoy the FIFA World Cup :)
Below is the log with ARM64_PTDUMP_DEBUGFS enabled.
The kernel is 4.17.0 with only Will's kpti early_param patch applied.
Hope it helps.

     ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
         -cpu host -enable-kvm -smp 1 -m 1024 \
         -kernel ./Image-4.17-joyx \
         -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
         -append "kpti=off rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.17.0-45865-ga3d6816 
(joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) 
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #19 
SMP PREEMPT Fri Jun 22 23:47:07 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 16 MiB at 0x000000007f000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 
0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7efeb300-0x7efecdff]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 
0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from 
start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 24 pages/cpu @        (ptrval) 
s57984 r8192 d32128 u98304
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: kernel page table isolation forced OFF 
by command line option
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total 
pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: kpti=off rdinit=init 
console=ttyAMA0 earlycon=pl011,0x9000000
     [    0.000000] Memory: 968436K/1048576K available (10044K kernel 
code, 1328K rwdata, 4840K rodata, 1216K init, 409K bss, 63756K reserved, 
16384K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, 
Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to 
nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, 
nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS at 0x0000000008080000: allocated 8192 Devices 
@7d830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS at 0x0000000008080000: allocated 8192 Interrupt 
Collections @7d840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007d850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 
0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007d860000
     [    0.000000] GIC: PPI11 is secure or misconfigured
     [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, 
assuming level low
     [    0.000000] arch_timer: WARNING: Please fix your firmware
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 
0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, 
wraps every 4398046511100ns
     [    0.000859] Console: colour dummy device 80x25
     [    0.001459] Calibrating delay loop (skipped), value calculated 
using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002537] pid_max: default: 32768 minimum: 301
     [    0.003028] Security Framework initialized
     [    0.003606] Dentry cache hash table entries: 131072 (order: 8, 
1048576 bytes)
     [    0.004418] Inode-cache hash table entries: 65536 (order: 7, 
524288 bytes)
     [    0.005129] Mount-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.005938] Mountpoint-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.026041] ASID allocator initialised with 32768 entries
     [    0.030055] Hierarchical SRCU implementation.
     [    0.034426] Platform MSI: its domain created
     [    0.034885] PCI/MSI: /intc/its domain created
     [    0.035457] EFI services will not be available.
     [    0.038086] smp: Bringing up secondary CPUs ...
     [    0.038557] smp: Brought up 1 node, 1 CPU
     [    0.038966] SMP: Total of 1 processors activated.
     [    0.039447] CPU features: detected: GIC system register CPU 
interface
     [    0.040101] CPU features: detected: Privileged Access Never
     [    0.040667] CPU features: detected: User Access Override
     [    0.041988] CPU: All CPU(s) started at EL1
     [    0.042536] alternatives: patching kernel code
     [    0.044809] devtmpfs: initialized
     [    0.046662] clocksource: jiffies: mask: 0xffffffff max_cycles: 
0xffffffff, max_idle_ns: 7645041785100000 ns
     [    0.049470] futex hash table entries: 256 (order: 3, 32768 bytes)
     [    0.055780] pinctrl core: initialized pinctrl subsystem
     [    0.061504] DMI not present or invalid.
     [    0.065230] NET: Registered protocol family 16
     [    0.069514] audit: initializing netlink subsys (disabled)
     [    0.075351] cpuidle: using governor menu
     [    0.078855] audit: type=2000 audit(0.068:1): state=initialized 
audit_enabled=0 res=1
     [    0.086714] vdso: 2 pages (1 code @         (ptrval), 1 data 
@         (ptrval))
     [    0.094456] hw-breakpoint: found 6 breakpoint and 4 watchpoint 
registers.
     [    0.101869] DMA: preallocated 256 KiB pool for atomic allocations
     [    0.107408] Serial: AMBA PL011 UART driver
     [    0.114802] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 39, 
base_baud = 0) is a PL011 rev1
     [    0.120256] console [ttyAMA0] enabled
     [    0.120256] console [ttyAMA0] enabled
     [    0.127525] bootconsole [pl11] disabled
     [    0.127525] bootconsole [pl11] disabled
     [    0.135667] irq: type mismatch, failed to map hwirq-27 for intc!
     [    0.153827] HugeTLB registered 2.00 MiB page size, pre-allocated 
0 pages
     [    0.157547] cryptd: max_cpu_qlen set to 1000
     [    0.165692] ACPI: Interpreter disabled.
     [    0.166341] vgaarb: loaded
     [    0.166629] SCSI subsystem initialized
     [    0.169664] usbcore: registered new interface driver usbfs
     [    0.170139] usbcore: registered new interface driver hub
     [    0.174110] usbcore: registered new device driver usb
     [    0.179293] pps_core: LinuxPPS API ver. 1 registered
     [    0.184239] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 
Rodolfo Giometti <giometti@linux.it>
     [    0.193320] PTP clock support registered
     [    0.197360] EDAC MC: Ver: 3.0.0
     [    0.201468] Advanced Linux Sound Architecture Driver Initialized.
     [    0.207035] clocksource: Switched to clocksource arch_sys_counter
     [    0.212870] VFS: Disk quotas dquot_6.6.0
     [    0.216844] VFS: Dquot-cache hash table entries: 512 (order 0, 
4096 bytes)
     [    0.223782] pnp: PnP ACPI: disabled
     [    0.229309] NET: Registered protocol family 2
     [    0.232711] tcp_listen_portaddr_hash hash table entries: 512 
(order: 1, 8192 bytes)
     [    0.239478] TCP established hash table entries: 8192 (order: 4, 
65536 bytes)
     [    0.246564] TCP bind hash table entries: 8192 (order: 5, 131072 
bytes)
     [    0.253246] TCP: Hash tables configured (established 8192 bind 8192)
     [    0.259572] UDP hash table entries: 512 (order: 2, 16384 bytes)
     [    0.265610] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes)
     [    0.272044] NET: Registered protocol family 1
     [    0.288576] RPC: Registered named UNIX socket transport module.
     [    0.289058] RPC: Registered udp transport module.
     [    0.289434] RPC: Registered tcp transport module.
     [    0.291949] RPC: Registered tcp NFSv4.1 backchannel transport 
module.
     [    0.298471] Unpacking initramfs...
     [    0.835705] Freeing initrd memory: 29212K
     [    0.836273] hw perfevents: enabled with armv8_pmuv3 PMU driver, 
13 counters available
     [    0.837026] kvm [1]: HYP mode not available
     [    0.838111] Initialise system trusted keyrings
     [    0.838710] workingset: timestamp_bits=44 max_order=18 
bucket_order=0
     [    0.840716] squashfs: version 4.0 (2009/01/31) Phillip Lougher
     [    0.846449] NFS: Registering the id_resolver key type
     [    0.846892] Key type id_resolver registered
     [    0.847453] Key type id_legacy registered
     [    0.847789] nfs4filelayout_init: NFSv4 File Layout Driver 
Registering...
     [    0.848383] 9p: Installing v9fs 9p2000 file system support
     [    0.848878] pstore: using deflate compression
     [    0.849942] Key type asymmetric registered
     [    0.850303] Asymmetric key parser 'x509' registered
     [    0.850729] Block layer SCSI generic (bsg) driver version 0.4 
loaded (major 245)
     [    0.851480] io scheduler noop registered
     [    0.851801] io scheduler deadline registered
     [    0.852215] io scheduler cfq registered (default)
     [    0.852595] io scheduler mq-deadline registered
     [    0.852955] io scheduler kyber registered
     [    0.855192] pl061_gpio 9030000.pl061: PL061 GPIO chip 
@0x0000000009030000 registered
     [    0.857039] PCI: OF: host bridge /pcie at 10000000 ranges:
     [    0.857481] PCI: OF:    IO 0x3eff0000..0x3effffff -> 0x00000000
     [    0.857953] PCI: OF:   MEM 0x10000000..0x3efeffff -> 0x10000000
     [    0.858435] PCI: OF:   MEM 0x8000000000..0xffffffffff -> 
0x8000000000
     [    0.858956] pci-host-generic 3f000000.pcie: ECAM at [mem 
0x3f000000-0x3fffffff] for [bus 00-0f]
     [    0.860042] pci-host-generic 3f000000.pcie: PCI host bridge to 
bus 0000:00
     [    0.860598] pci_bus 0000:00: root bus resource [bus 00-0f]
     [    0.861034] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
     [    0.861524] pci_bus 0000:00: root bus resource [mem 
0x10000000-0x3efeffff]
     [    0.862074] pci_bus 0000:00: root bus resource [mem 
0x8000000000-0xffffffffff]
     [    0.863568] pci 0000:00:01.0: BAR 6: assigned [mem 
0x10000000-0x1003ffff pref]
     [    0.864147] pci 0000:00:01.0: BAR 4: assigned [mem 
0x8000000000-0x8000003fff 64bit pref]
     [    0.864803] pci 0000:00:01.0: BAR 1: assigned [mem 
0x10040000-0x10040fff]
     [    0.865342] pci 0000:00:01.0: BAR 0: assigned [io 0x1000-0x101f]
     [    0.866470] EINJ: ACPI disabled.
     [    0.868836] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
     [    0.874100] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
     [    0.875395] SuperH (H)SCI(F) driver initialized
     [    0.876757] msm_serial: driver initialized
     [    0.877328] cacheinfo: Unable to detect cache hierarchy for CPU 0
     [    0.880330] loop: module loaded
     [    0.881885] libphy: Fixed MDIO Bus: probed
     [    0.882499] tun: Universal TUN/TAP device driver, 1.6
     [    0.884820] thunder_xcv, ver 1.0
     [    0.885126] thunder_bgx, ver 1.0
     [    0.885415] nicpf, ver 1.0
     [    0.885764] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
     [    0.886246] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
     [    0.886927] igb: Intel(R) Gigabit Ethernet Network Driver - 
version 5.4.0-k
     [    0.887687] igb: Copyright (c) 2007-2014 Intel Corporation.
     [    0.888159] igbvf: Intel(R) Gigabit Virtual Function Network 
Driver - version 2.4.0-k
     [    0.888782] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
     [    0.889388] sky2: driver version 1.30
     [    0.889931] VFIO - User Level meta-driver version: 0.3
     [    0.890861] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) 
Driver
     [    0.891644] ehci-pci: EHCI PCI platform driver
     [    0.892043] ehci-platform: EHCI generic platform driver
     [    0.892515] ehci-orion: EHCI orion driver
     [    0.892880] ehci-exynos: EHCI EXYNOS driver
     [    0.893414] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
     [    0.893914] ohci-pci: OHCI PCI platform driver
     [    0.894308] ohci-platform: OHCI generic platform driver
     [    0.894765] ohci-exynos: OHCI EXYNOS driver
     [    0.895357] usbcore: registered new interface driver usb-storage
     [    0.896739] rtc-pl031 9010000.pl031: rtc core: registered pl031 
as rtc0
     [    0.897504] i2c /dev entries driver
     [    0.899576] sdhci: Secure Digital Host Controller Interface driver
     [    0.900086] sdhci: Copyright(c) Pierre Ossman
     [    0.900551] Synopsys Designware Multimedia Card Interface Driver
     [    0.901791] sdhci-pltfm: SDHCI platform and OF driver helper
     [    0.902636] ledtrig-cpu: registered to indicate activity on CPUs
     [    0.903644] usbcore: registered new interface driver usbhid
     [    0.904106] usbhid: USB HID core driver
     [    0.905520] NET: Registered protocol family 17
     [    0.905917] 9pnet: Installing 9P2000 support
     [    0.906304] Key type dns_resolver registered
     [    0.906814] registered taskstats version 1
     [    0.907542] Loading compiled-in X.509 certificates
     [    0.908155] input: gpio-keys as 
/devices/platform/gpio-keys/input/input0
     [    0.909760] rtc-pl031 9010000.pl031: setting system clock to 
2015-01-30 02:38:42 UTC (1422585522)
     [    0.918889] ALSA device list:
     [    0.921687]   No soundcards found.
     [    0.925317] uart-pl011 9000000.pl011: no DMA platform data
     [    0.930981] Freeing unused kernel memory: 1216K
     Starting rcS...
     ++ Mounting filesystem
     ifdown: interface lo not configured
     ifdown: interface eth0 not configured
     ++ Starting ssh daemon
     [    0.950291] random: sshd: uninitialized urandom read (32 bytes read)
     ip: RTNETLINK answers: File exists
     rcS Complete
     Welcome to Mini Linux
     GNU/Linux 4.17.0-45865-ga3d6816 aarch64
     Version: 1.1.6
             .--.
            |o_o |
            |:_/ |
           //   \ \
          (|     | )
         /'\_   _/`\
         \___)=(___/
     udhcpc: started, v1.29.0.git
     Setting IP address 0.0.0.0 on eth0
     Documentation: http://open-estuary.org
     E-mail: Chinafengliang at 163.com
     estuary:/$ udhcpc: sending discover
     udhcpc: sending select for 10.0.2.15
     udhcpc: lease of 10.0.2.15 obtained, lease time 86400
     Setting IP address 10.0.2.15 on eth0
     Deleting routers
     route: SIOCDELRT: No such process
     Adding router 10.0.2.2
     Recreating /etc/resolv.conf
      Adding DNS server 10.0.2.3

     estuary:/$
     estuary:/$ cat /sys/kernel/debug/kernel_page_tables
     ---[ Modules start ]---
     ---[ Modules end ]---
     ---[ vmalloc() Area ]---
     0xffff000008000000-0xffff000008004000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008005000-0xffff000008009000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000800a000-0xffff00000800e000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008010000-0xffff000008020000          64K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000008021000-0xffff000008022000           4K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008028000-0xffff00000802c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008030000-0xffff000008034000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008035000-0xffff000008036000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000008038000-0xffff00000803c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000803d000-0xffff00000803f000           8K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008040000-0xffff000008060000         128K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000008061000-0xffff000008065000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008066000-0xffff000008067000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000008068000-0xffff00000806c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008070000-0xffff000008074000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008078000-0xffff00000807c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008080000-0xffff000008200000        1536K PTE       ro x  
SHD AF    CON     UXN MEM/NORMAL
     0xffff000008200000-0xffff000008a00000           8M PMD       ro x  
SHD AF        BLK UXN MEM/NORMAL
     0xffff000008a00000-0xffff000008a50000         320K PTE       ro x  
SHD AF    CON     UXN MEM/NORMAL
     0xffff000008a50000-0xffff000008c00000        1728K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff000008c00000-0xffff000008e00000           2M PMD       ro NX 
SHD AF        BLK UXN MEM/NORMAL
     0xffff000008e00000-0xffff000008f10000        1088K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009040000-0xffff0000091f0000        1728K PTE       RW NX 
SHD AF    CON     UXN MEM/NORMAL
     0xffff0000091f0000-0xffff0000091fa000          40K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000091fb000-0xffff0000092fb000           1M PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000092fc000-0xffff00000937c000         512K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009380000-0xffff000009384000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009388000-0xffff00000938c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009390000-0xffff000009394000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009398000-0xffff00000939c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093a0000-0xffff0000093a4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093a8000-0xffff0000093ac000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093b0000-0xffff0000093b4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093b8000-0xffff0000093bc000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093c0000-0xffff0000093c4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093c8000-0xffff0000093cc000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093d0000-0xffff0000093d4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff0000093d5000-0xffff0000093dd000          32K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009408000-0xffff00000940c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009410000-0xffff000009414000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000946d000-0xffff00000946e000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000009475000-0xffff000009476000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff00000947d000-0xffff00000947e000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000009485000-0xffff000009486000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff00000948d000-0xffff00000948e000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000009495000-0xffff000009496000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff000009595000-0xffff0000095d5000         256K PTE       RW NX 
SHD AF            UXN MEM/NORMAL-NC
     0xffff000009740000-0xffff000009744000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009c60000-0xffff000009c64000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff000009c70000-0xffff000009c74000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000a000000-0xffff00000af60000       15744K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff00000af61000-0xffff00000af65000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b020000-0xffff00000b024000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b028000-0xffff00000b02c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b030000-0xffff00000b034000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b038000-0xffff00000b03c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b048000-0xffff00000b04c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b0f8000-0xffff00000b0fc000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b170000-0xffff00000b174000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b208000-0xffff00000b20c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b230000-0xffff00000b234000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b238000-0xffff00000b23c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b48d000-0xffff00000b49d000          64K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b49e000-0xffff00000b4be000         128K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b4c0000-0xffff00000b4c4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b538000-0xffff00000b53c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000b7e8000-0xffff00000b7ec000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000c000000-0xffff00000d000000          16M PMD       RW NX 
SHD AF        BLK UXN DEVICE/nGnRnE
     0xffff00000d001000-0xffff00000d004000          12K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d260000-0xffff00000d264000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d760000-0xffff00000d764000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d770000-0xffff00000d774000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d778000-0xffff00000d77c000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d7b0000-0xffff00000d7b4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d7d8000-0xffff00000d7dc000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff00000d7e0000-0xffff00000d7e4000          16K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     0xffff7dffbffd8000-0xffff7dffbffdb000          12K PTE       RW NX 
SHD AF            UXN MEM/NORMAL
     ---[ vmalloc() End ]---
     ---[ Fixmap start ]---
     0xffff7dfffe7fa000-0xffff7dfffe7fb000           4K PTE       ro x  
SHD AF            UXN MEM/NORMAL
     0xffff7dfffe7ff000-0xffff7dfffe800000           4K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     0xffff7dfffe800000-0xffff7dfffea00000           2M PMD       ro NX 
SHD AF        BLK UXN MEM/NORMAL
     ---[ Fixmap end ]---
     ---[ PCI I/O start ]---
     0xffff7dfffee00000-0xffff7dfffee10000          64K PTE       RW NX 
SHD AF            UXN DEVICE/nGnRE
     ---[ PCI I/O end ]---
     ---[ vmemmap start ]---
     0xffff7e0000000000-0xffff7e0001000000          16M PMD       RW NX 
SHD AF        BLK UXN MEM/NORMAL
     ---[ vmemmap end ]---
     ---[ Linear Mapping ]---
     0xffff800000000000-0xffff800000080000         512K PTE       RW NX 
SHD AF    CON     UXN MEM/NORMAL
     0xffff800000080000-0xffff800000200000        1536K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff800000200000-0xffff800000e00000          12M PMD       ro NX 
SHD AF        BLK UXN MEM/NORMAL
     0xffff800000e00000-0xffff800000f10000        1088K PTE       ro NX 
SHD AF            UXN MEM/NORMAL
     0xffff800000f10000-0xffff800001000000         960K PTE       RW NX 
SHD AF    CON     UXN MEM/NORMAL
     0xffff800001000000-0xffff800002000000          16M PMD       RW NX 
SHD AF        BLK UXN MEM/NORMAL
     0xffff800002000000-0xffff800040000000         992M PMD       RW NX 
SHD AF    CON BLK UXN MEM/NORMAL
     estuary:/$

Thanks!

Best Regards,
Wei

> Will
>
> .
>


* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-21  9:20             ` Wei Xu
@ 2018-06-26 17:16               ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-26 17:16 UTC (permalink / raw)
  To: James Morse, Will Deacon
  Cc: mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian

Hi All,

On 2018/6/21 17:20, Wei Xu wrote:
> Hi James,
>
> On 2018/6/21 9:38, James Morse wrote:
>> Hi Will, Wei,
>>
>> On 20/06/18 17:25, Wei Xu wrote:
>>> On 2018/6/20 23:54, James Morse wrote:
>>> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
>>> But I still got the stack overflow issue sometimes.
>>> Do you have more hint?
>>> The log is as below:
>>>      [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>>      [    0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty
>> Could you reproduce this with v4.17? This says there are ~45,000 extra patches,
>> and un-committed changes. None of the hashes so far have been commits in
>> mainline, so we have no idea what this tree is.
>>
> I have tried v4.17 and log is as below and also it can be found in the first mail
> of this thread.
>
> 	[ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty
> 	(joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
> 	linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun
> 	15 21:39:52 CST 2018
>
> I will try v4.17.2 and v4.18-rc1.
>
>>> (joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
>>> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
>>> 23:59:05 CST 2018
>>>      [    0.000000] CPU0: using LPI pending table @0x000000007d860000
>>>      [    0.000000] GIC: PPI11 is secure or misconfigured
>>>      [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level
>>> low
>>>      [    0.000000] arch_timer: WARNING: Please fix your firmware
>>>      [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
>> (No idea what these mean, but I doubt they are relevant)
>>
> I will try with mainline qemu 2.12.0.
>
> Thanks!

Today I tried kernel 4.18-rc2 (defconfig, no changes on top) with qemu
2.12.0.
The guest still sometimes fails to boot, but the crash is different
this time.
Could you please share any hints?
Thanks!

The guest boot log is below:
===========================

     estuary:/$ ./qemu-system-aarch64 \
         -machine virt,kernel_irqchip=on,gic-version=3 \
         -cpu host -enable-kvm -smp 1 -m 1024 \
         -kernel ./Image-4.18-joyx \
         -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
         -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"

     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty 
(joyx@Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) 
(crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #20 
SMP PREEMPT Tue Jun 26 23:43:35 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 32 MiB at 0x000000007e000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 
0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7dfe9a00-0x7dfeb1bf]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 
0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from 
start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 23 pages/cpu @(____ptrval____) 
s56064 r8192 d29952 u94208
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: detected: Kernel page table isolation 
(KPTI)
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total 
pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 
earlycon=pl011,0x9000000
     [    0.000000] Memory: 951780K/1048576K available (10172K kernel 
code, 1362K rwdata, 4956K rodata, 1216K init, 392K bss, 64028K reserved, 
32768K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, 
Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to 
nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, 
nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Devices 
@7c830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt 
Collections @7c840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007c850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 
0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007c860000
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 
0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, 
wraps every 4398046511100ns
     [    0.000828] Console: colour dummy device 80x25
     [    0.001279] Calibrating delay loop (skipped), value calculated 
using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002307] pid_max: default: 32768 minimum: 301
     [    0.002925] Security Framework initialized
     [    0.003494] Dentry cache hash table entries: 131072 (order: 8, 
1048576 bytes)
     [    0.004277] Inode-cache hash table entries: 65536 (order: 7, 
524288 bytes)
     [    0.004968] Mount-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.005628] Mountpoint-cache hash table entries: 2048 (order: 2, 
16384 bytes)
     [    0.031117] ASID allocator initialised with 32768 entries
     [    0.035124] Hierarchical SRCU implementation.
     [    0.039492] Platform MSI: its domain created
     [    0.039934] PCI/MSI: /intc/its domain created
     [    0.040509] EFI services will not be available.
     [    0.043153] smp: Bringing up secondary CPUs ...
     [    0.043606] smp: Brought up 1 node, 1 CPU
     [    0.044000] SMP: Total of 1 processors activated.
     [    0.044464] CPU features: detected: GIC system register CPU 
interface
     [    0.045112] CPU features: detected: Privileged Access Never
     [    0.045658] CPU features: detected: User Access Override
     [    0.046177] CPU features: detected: RAS Extension Support
     [    0.048119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000288
     [    0.048991] Mem abort info:
     [    0.049267]   ESR = 0x96000004
     [    0.049567]   Exception class = DABT (current EL), IL = 32 bits
     [    0.050146]   SET = 0, FnV = 0
     [    0.050446]   EA = 0, S1PTW = 0
     [    0.050754] Data abort info:
     [    0.051038]   ISV = 0, ISS = 0x00000004
     [    0.051921]   CM = 0, WnR = 0
     [    0.054936] [0000000000000288] user address but active_mm is swapper
     [    0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
     [    0.067080] Modules linked in:
     [    0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.18.0-rc2-58583-g7daf201-dirty #20
     [    0.078745] Hardware name: linux,dummy-virt (DT)
     [    0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
     [    0.088258] pc : kpti_install_ng_mappings+0x154/0x214
     [    0.093319] lr : kpti_install_ng_mappings+0x120/0x214
     [    0.098483] sp : ffff0000093fbce0
     [    0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
     [    0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
     [    0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
     [    0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
     [    0.123392] x21: ffff00000923b000 x20: 0000000000000000
     [    0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
     [    0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.139513] x15: 000000007dff5000 x14: 000000007dff5000
     [    0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
     [    0.150329] x11: 000000007dff7000 x10: 0000000000000000
     [    0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
     [    0.161042] x7 : 0000000000000000 x6 : 000000004123c000
     [    0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
     [    0.171860] x3 : 0000000000000000 x2 : 000000004123b000
     [    0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0
     [    0.182577] Process migration/0 (pid: 13, stack limit = 0x(____ptrval____))
     [    0.189561] Call trace:
     [    0.192081]  kpti_install_ng_mappings+0x154/0x214
     [    0.196892] Code: d503201f d503379f d5033fdf f94033a3 (f9414460)
     [    0.203029] ---[ end trace 3ca968ef0a151b33 ]---
     [    0.207722] note: migration/0[13] exited with preempt_count 1
     [    0.213610] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     [    0.222393] Mem abort info:
     [    0.225273]   ESR = 0x86000004
     [    0.228396]   Exception class = IABT (current EL), IL = 32 bits
     [    0.234405]   SET = 0, FnV = 0
     [    0.237527]   EA = 0, S1PTW = 0
     [    0.240769] [0000000000000000] user address but active_mm is swapper
     [    0.247149] Internal error: Oops: 86000004 [#2] PREEMPT SMP
     [    0.252797] Modules linked in:
     [    0.255922] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D           4.18.0-rc2-58583-g7daf201-dirty #20
     [    0.265549] Hardware name: linux,dummy-virt (DT)
     [    0.270235] pstate: 60400085 (nZCv daIf +PAN -UAO)
     [    0.275155] pc :           (null)
     [    0.278520] lr :           (null)
     [    0.281886] sp : ffff00000802bb10
     [    0.285257] x29: 0000000000000000 x28: 0000000000000080
     [    0.290664] x27: ffff000008a82000 x26: ffff000008a52134
     [    0.296073] x25: ffff000009089000 x24: ffff80003ca30570
     [    0.301381] x23: ffff000009064000 x22: ffff0000090acd88
     [    0.306789] x21: ffff80003ca30000 x20: 0000000000000000
     [    0.312196] x19: 0000000000000000 x18: 000000000000000e
     [    0.317503] x17: 0000000000000001 x16: 0000000000000019
     [    0.322910] x15: 0000000000000033 x14: 000000000000004c
     [    0.328317] x13: 0000000000000068 x12: ffff0000093fb7f8
     [    0.333725] x11: 0000000000000108 x10: 0000000000000940
     [    0.339028] x9 : ffff00000802baf0 x8 : ffff80003ca309a0
     [    0.344434] x7 : 0000000000000000 x6 : 0000000000000000
     [    0.349842] x5 : 0000000002da3744 x4 : 0000000000000080
     [    0.355250] x3 : 0000000000000008 x2 : 0000800034f69000
     [    0.360554] x1 : ffff80003ca30000 x0 : ffff80003ca627c0
     [    0.365959] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
     [    0.372801] Call trace:
     [    0.375322] Code: bad PC value
     [    0.378347] ---[ end trace 3ca968ef0a151b34 ]---
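
A minimal sketch, assuming the standard ESR_EL1 layout (EC in bits 31:26,
IL in bit 25, ISS in bits 24:0), of how the two ESR values in the log above
break down into the fields the kernel prints. The function name and the
lookup tables are illustrative only and cover just the values seen in this
log.

```python
# Split an arm64 ESR_EL1 value into the fields shown in the oops output.
# Only the exception classes and fault status codes that occur in this
# log are named; anything else is reported as "unknown".

EC_NAMES = {
    0x21: "IABT (current EL)",  # instruction abort taken without a change in EL
    0x25: "DABT (current EL)",  # data abort taken without a change in EL
}

FSC_NAMES = {
    0x04: "translation fault, level 0",
}


def decode_esr(esr):
    """Return the main ESR_EL1 fields as a dict (hypothetical helper)."""
    ec = (esr >> 26) & 0x3F      # exception class
    il = (esr >> 25) & 0x1       # instruction length: 1 means 32-bit
    iss = esr & 0x1FFFFFF        # instruction specific syndrome
    fsc = iss & 0x3F             # fault status code
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "unknown"),
        "il_bits": 32 if il else 16,
        "iss": iss,
        "wnr": (iss >> 6) & 0x1,  # write-not-read, meaningful for data aborts only
        "fsc": fsc,
        "fsc_name": FSC_NAMES.get(fsc, "unknown"),
    }


if __name__ == "__main__":
    for value in (0x96000004, 0x86000004):
        fields = decode_esr(value)
        print(f"ESR {value:#010x}: {fields['ec_name']}, "
              f"IL = {fields['il_bits']} bits, ISS = {fields['iss']:#010x}, "
              f"{fields['fsc_name']}")
```

Both values carry the same fault status code; they differ only in the
exception class, which agrees with the "DABT (current EL)" and
"IABT (current EL)" lines in the two oopses.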


The faddr2line result is:
========================

     ./scripts/faddr2line ../kernel-dev.build/vmlinux kpti_install_ng_mappings+0x150/0x214
     kpti_install_ng_mappings+0x150/0x214:
     __cpu_set_tcr_t0sz at arch/arm64/include/asm/mmu_context.h:94
     (inlined by) cpu_uninstall_idmap at arch/arm64/include/asm/mmu_context.h:125
     (inlined by) kpti_install_ng_mappings at arch/arm64/kernel/cpufeature.c:921
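
For reference, the faulting instruction word from the "Code:" line of the
first oops, (f9414460), can be decoded by hand. The sketch below assumes the
A64 encoding of the 64-bit LDR (immediate, unsigned offset) form; the helper
name is hypothetical. It only shows that the decoded offset, added to the x3
value from the register dump, gives the reported fault address.

```python
# Decode a 64-bit "LDR Xt, [Xn, #imm]" (unsigned offset) instruction word
# and relate it to the fault address reported in the oops.


def decode_ldr_x_uimm(word):
    """Return (rt, rn, byte_offset), or None if word is not this LDR form."""
    if (word >> 22) != 0x3E5:        # fixed bits 31:22 of the 64-bit LDR form
        return None
    imm12 = (word >> 10) & 0xFFF     # unsigned immediate, in units of 8 bytes
    rn = (word >> 5) & 0x1F          # base register
    rt = word & 0x1F                 # destination register
    return rt, rn, imm12 * 8


if __name__ == "__main__":
    faulting_word = 0xF9414460       # the word in parentheses on the "Code:" line
    x3 = 0x0                         # x3 from the register dump of the first oops
    rt, rn, offset = decode_ldr_x_uimm(faulting_word)
    print(f"ldr x{rt}, [x{rn}, #{offset}]")
    print(f"effective address: {x3 + offset:#x}")
```

The word decodes to ldr x0, [x3,#648], the instruction at <+340> (0x154) in
the disassembly, and 648 is 0x288, the faulting virtual address. The
preceding word on the "Code:" line, f94033a3, decodes the same way to
ldr x3, [x29,#96], the instruction at <+336> (0x150).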


The disassembly of kpti_install_ng_mappings is:
=============================================

     Dump of assembler code for function kpti_install_ng_mappings:
        0xffff000008091f7c <+0>:     stp     x29, x30, [sp,#-112]!
        0xffff000008091f80 <+4>:     adrp    x0, 0xffff000009064000 
<bp_hardening_data>
        0xffff000008091f84 <+8>:     mov     x29, sp
        0xffff000008091f88 <+12>:    stp     x23, x24, [sp,#48]
        0xffff000008091f8c <+16>:    adrp    x24, 0xffff0000091d9000 
<reset_devices>
        0xffff000008091f90 <+20>:    add     x0, x0, #0x18
        0xffff000008091f94 <+24>:    add     x1, x24, #0x550
        0xffff000008091f98 <+28>:    stp     x19, x20, [sp,#16]
        0xffff000008091f9c <+32>:    stp     x21, x22, [sp,#32]
        0xffff000008091fa0 <+36>:    stp     x25, x26, [sp,#64]
        0xffff000008091fa4 <+40>:    stp     x27, x28, [sp,#80]
        0xffff000008091fa8 <+44>:    mrs     x2, tpidr_el1
        0xffff000008091fac <+48>:    ldrb    w1, [x1,#8]
        0xffff000008091fb0 <+52>:    ldr     w20, [x2,x0]
        0xffff000008091fb4 <+56>:    cbnz    w1, 0xffff00000809212c 
<kpti_install_ng_mappings+432>
        0xffff000008091fb8 <+60>:    adrp    x27, 0xffff000008ee5000 
<sve_vq_map+32>
        0xffff000008091fbc <+64>:    adrp    x19, 0xffff0000091d8000 
<empty_zero_page>
        0xffff000008091fc0 <+68>:    add     x19, x19, #0x0
        0xffff000008091fc4 <+72>:    adrp    x1, 0xffff000008a5f000 
<kimage_vaddr>
        0xffff000008091fc8 <+76>:    mov     x0, x19
        0xffff000008091fcc <+80>:    add     x1, x1, #0x3d8
        0xffff000008091fd0 <+84>:    ldr     x2, [x27,#1176]
        0xffff000008091fd4 <+88>:    sub     x4, x1, x2
        0xffff000008091fd8 <+92>:    sub     x0, x0, x2
        0xffff000008091fdc <+96>:    msr     ttbr0_el1, x0
        0xffff000008091fe0 <+100>:   isb
        0xffff000008091fe4 <+104>:   dsb     nshst
        0xffff000008091fe8 <+108>:   tlbi    vmalle1
        0xffff000008091fec <+112>:   nop
        0xffff000008091ff0 <+116>:   nop
        0xffff000008091ff4 <+120>:   dsb     nsh
        0xffff000008091ff8 <+124>:   isb
        0xffff000008091ffc <+128>:   adrp    x3, 0xffff000009096000 
<early_node_cpu_hwid+1440>
        0xffff000008092000 <+132>:   ldr     x0, [x3,#648]
        0xffff000008092004 <+136>:   cmp     x0, #0x10
        0xffff000008092008 <+140>:   b.ne    0xffff000008092178 
<kpti_install_ng_mappings+508>
        0xffff00000809200c <+144>:   adrp    x28, 0xffff000008ee5000 
<sve_vq_map+32>
        0xffff000008092010 <+148>:   ldr     x2, [x27,#1176]
        0xffff000008092014 <+152>:   adrp    x1, 0xffff000009237000
        0xffff000008092018 <+156>:   adrp    x26, 0xffff00000923b000
        0xffff00000809201c <+160>:   add     x1, x1, #0x0
        0xffff000008092020 <+164>:   add     x21, x26, #0x0
        0xffff000008092024 <+168>:   ldr     x0, [x28,#1160]
        0xffff000008092028 <+172>:   adrp    x23, 0xffff000008ee5000 
<sve_vq_map+32>
        0xffff00000809202c <+176>:   sub     x1, x1, x2
        0xffff000008092030 <+180>:   sub     x1, x1, x0
        0xffff000008092034 <+184>:   orr     x0, x1, #0xffff800000000000
        0xffff000008092038 <+188>:   cmp     x0, x21
        0xffff00000809203c <+192>:   b.eq    0xffff000008092174 
<kpti_install_ng_mappings+504>
        0xffff000008092040 <+196>:   mov     x22, x19
        0xffff000008092044 <+200>:   str     x3, [x29,#96]
        0xffff000008092048 <+204>:   str     x4, [x29,#104]
        0xffff00000809204c <+208>:   sub     x2, x22, x2
        0xffff000008092050 <+212>:   msr     ttbr0_el1, x2
        0xffff000008092054 <+216>:   isb
        0xffff000008092058 <+220>:   ldr     x0, [x28,#1160]
        0xffff00000809205c <+224>:   and     x1, x1, #0x7fffffffffff
        0xffff000008092060 <+228>:   adrp    x25, 0xffff0000090ac000 <perf_cpu_clock+200>
        0xffff000008092064 <+232>:   add     x0, x1, x0
        0xffff000008092068 <+236>:   add     x1, x25, #0xd88
        0xffff00000809206c <+240>:   bl      0xffff0000080a0750 <cpu_do_switch_mm>
        0xffff000008092070 <+244>:   adrp    x0, 0xffff000009089000 <page_wait_table+5376>
        0xffff000008092074 <+248>:   mov     w1, #0x80                       // #128
        0xffff000008092078 <+252>:   add     x0, x0, #0xb48
        0xffff00000809207c <+256>:   bl      0xffff0000083e8144 <__bitmap_weight>
        0xffff000008092080 <+260>:   mov     w1, w0
        0xffff000008092084 <+264>:   ldr     x5, [x23,#1176]
        0xffff000008092088 <+268>:   mov     w0, w20
        0xffff00000809208c <+272>:   ldr     x4, [x29,#104]
        0xffff000008092090 <+276>:   mov     x2, x21
        0xffff000008092094 <+280>:   sub     x2, x2, x5
        0xffff000008092098 <+284>:   blr     x4
        0xffff00000809209c <+288>:   ldr     x1, [x23,#1176]
        0xffff0000080920a0 <+292>:   mrs     x0, sp_el0
        0xffff0000080920a4 <+296>:   sub     x22, x22, x1
        0xffff0000080920a8 <+300>:   ldr     x1, [x0,#936]
        0xffff0000080920ac <+304>:   msr     ttbr0_el1, x22
        0xffff0000080920b0 <+308>:   isb
        0xffff0000080920b4 <+312>:   dsb     nshst
        0xffff0000080920b8 <+316>:   tlbi    vmalle1
        0xffff0000080920bc <+320>:   nop
        0xffff0000080920c0 <+324>:   nop
        0xffff0000080920c4 <+328>:   dsb     nsh
        0xffff0000080920c8 <+332>:   isb
        0xffff0000080920cc <+336>:   ldr     x3, [x29,#96]
        0xffff0000080920d0 <+340>:   ldr     x0, [x3,#648]
        0xffff0000080920d4 <+344>:   cmp     x0, #0x10
        0xffff0000080920d8 <+348>:   b.ne    0xffff00000809215c <kpti_install_ng_mappings+480>
        0xffff0000080920dc <+352>:   add     x25, x25, #0xd88
        0xffff0000080920e0 <+356>:   cmp     x1, x25
        0xffff0000080920e4 <+360>:   b.eq    0xffff00000809211c <kpti_install_ng_mappings+416>
        0xffff0000080920e8 <+364>:   ldr     x2, [x1,#64]
        0xffff0000080920ec <+368>:   add     x26, x26, #0x0
        0xffff0000080920f0 <+372>:   cmp     x2, x26
        0xffff0000080920f4 <+376>:   b.eq    0xffff000008092174 <kpti_install_ng_mappings+504>
        0xffff0000080920f8 <+380>:   ldr     x0, [x27,#1176]
        0xffff0000080920fc <+384>:   sub     x19, x19, x0
        0xffff000008092100 <+388>:   msr     ttbr0_el1, x19
        0xffff000008092104 <+392>:   isb
        0xffff000008092108 <+396>:   tbz     x2, #47, 0xffff000008092148 <kpti_install_ng_mappings+460>
        0xffff00000809210c <+400>:   ldr     x0, [x28,#1160]
        0xffff000008092110 <+404>:   and     x2, x2, #0x7fffffffffff
        0xffff000008092114 <+408>:   add     x0, x2, x0
        0xffff000008092118 <+412>:   bl      0xffff0000080a0750 <cpu_do_switch_mm>
        0xffff00000809211c <+416>:   cbnz    w20, 0xffff00000809212c <kpti_install_ng_mappings+432>
        0xffff000008092120 <+420>:   add     x24, x24, #0x550
        0xffff000008092124 <+424>:   mov     w0, #0x1                        // #1
        0xffff000008092128 <+428>:   strb    w0, [x24,#8]
        0xffff00000809212c <+432>:   ldp     x19, x20, [sp,#16]
        0xffff000008092130 <+436>:   ldp     x21, x22, [sp,#32]
        0xffff000008092134 <+440>:   ldp     x23, x24, [sp,#48]
        0xffff000008092138 <+444>:   ldp     x25, x26, [sp,#64]
        0xffff00000809213c <+448>:   ldp     x27, x28, [sp,#80]
        0xffff000008092140 <+452>:   ldp     x29, x30, [sp],#112
        0xffff000008092144 <+456>:   ret
        0xffff000008092148 <+460>:   adrp    x0, 0xffff000008ee5000 <sve_vq_map+32>
        0xffff00000809214c <+464>:   ldr     x0, [x0,#1176]
        0xffff000008092150 <+468>:   sub     x0, x2, x0
        0xffff000008092154 <+472>:   bl      0xffff0000080a0750 <cpu_do_switch_mm>
        0xffff000008092158 <+476>:   b       0xffff00000809211c <kpti_install_ng_mappings+416>
        0xffff00000809215c <+480>:   mrs     x0, tcr_el1
        0xffff000008092160 <+484>:   and     x0, x0, #0xffffffffffffffc0
        0xffff000008092164 <+488>:   orr     x0, x0, #0x10
        0xffff000008092168 <+492>:   msr     tcr_el1, x0
        0xffff00000809216c <+496>:   isb
        0xffff000008092170 <+500>:   b       0xffff0000080920dc <kpti_install_ng_mappings+352>
        0xffff000008092174 <+504>:   brk     #0x800
        0xffff000008092178 <+508>:   mrs     x1, tcr_el1
        0xffff00000809217c <+512>:   and     x1, x1, #0xffffffffffffffc0
        0xffff000008092180 <+516>:   orr     x0, x1, x0
        0xffff000008092184 <+520>:   msr     tcr_el1, x0
        0xffff000008092188 <+524>:   isb
        0xffff00000809218c <+528>:   b       0xffff00000809200c <kpti_install_ng_mappings+144>
     End of assembler dump.


Best Regards,
Wei



* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-26 17:16               ` Wei Xu
  0 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-26 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi All,

On 2018/6/21 17:20, Wei Xu wrote:
> Hi James,
>
> On 2018/6/21 9:38, James Morse wrote:
>> Hi Will, Wei,
>>
>> On 20/06/18 17:25, Wei Xu wrote:
>>> On 2018/6/20 23:54, James Morse wrote:
>>> I have disabled CONFIG_ARM64_RAS_EXTN and reverted that commit.
>>> But I still got the stack overflow issue sometimes.
>>> Do you have more hint?
>>> The log is as below:
>>>      [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>>      [    0.000000] Linux version 4.17.0-45865-g2b31fe7-dirty
>> Could you reproduce this with v4.17? This says there are ~45,000 extra patches,
>> and un-committed changes. None of the hashes so far have been commits in
>> mainline, so we have no idea what this tree is.
>>
> I have tried v4.17 and log is as below and also it can be found in the first mail
> of this thread.
>
> 	[ 0.000000] Linux version 4.17.0-45864-g29dcea8-dirty
> 	(joyx at Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
> 	linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #6 SMP PREEMPT Fri Jun
> 	15 21:39:52 CST 2018
>
> I will try v4.17.2 and v4.18-rc1.
>
>>> (joyx at Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG
>>> linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #10 SMP PREEMPT Wed Jun 20
>>> 23:59:05 CST 2018
>>>      [    0.000000] CPU0: using LPI pending table @0x000000007d860000
>>>      [    0.000000] GIC: PPI11 is secure or misconfigured
>>>      [    0.000000] arch_timer: WARNING: Invalid trigger for IRQ3, assuming level low
>>>      [    0.000000] arch_timer: WARNING: Please fix your firmware
>>>      [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
>> (No idea what these mean, but I doubt they are relevant)
>>
> I will try with mainline qemu 2.12.0.
>
> Thanks!

Today I tried kernel 4.18-rc2 (defconfig, no change on top) with qemu 2.12.0.
The guest still sometimes fails to boot, but the crash reason is different.
Could you please share any hints?
Thanks!

The guest boot log is below:
============================

     estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
         -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.18-joyx \
         -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
         -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"

     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
     [    0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty (joyx at Turing-Arch-b) (gcc version 4.9.1 20140505 (prerelease) (crosstool-NG linaro-1.13.1-4.9-2014.05 - Linaro GCC 4.9-2014.05)) #20 SMP PREEMPT Tue Jun 26 23:43:35 CST 2018
     [    0.000000] Machine model: linux,dummy-virt
     [    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
     [    0.000000] bootconsole [pl11] enabled
     [    0.000000] efi: Getting EFI parameters from FDT:
     [    0.000000] efi: UEFI not found.
     [    0.000000] cma: Reserved 32 MiB at 0x000000007e000000
     [    0.000000] NUMA: No NUMA configuration found
     [    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000007fffffff]
     [    0.000000] NUMA: NODE_DATA [mem 0x7dfe9a00-0x7dfeb1bf]
     [    0.000000] Zone ranges:
     [    0.000000]   DMA32    [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000]   Normal   empty
     [    0.000000] Movable zone start for each node
     [    0.000000] Early memory node ranges
     [    0.000000]   node   0: [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000007fffffff]
     [    0.000000] psci: probing for conduit method from DT.
     [    0.000000] psci: PSCIv1.0 detected in firmware.
     [    0.000000] psci: Using standard PSCI v0.2 function IDs
     [    0.000000] psci: Trusted OS migration not required
     [    0.000000] psci: SMC Calling Convention v1.1
     [    0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
     [    0.000000] percpu: Embedded 23 pages/cpu @(____ptrval____) s56064 r8192 d29952 u94208
     [    0.000000] Detected VIPT I-cache on CPU0
     [    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
     [    0.000000] CPU features: detected: Hardware dirty bit management
     [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 258048
     [    0.000000] Policy zone: DMA32
     [    0.000000] Kernel command line: rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000
     [    0.000000] Memory: 951780K/1048576K available (10172K kernel code, 1362K rwdata, 4956K rodata, 1216K init, 392K bss, 64028K reserved, 32768K cma-reserved)
     [    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
     [    0.000000] Preemptible hierarchical RCU implementation.
     [    0.000000]     RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
     [    0.000000]     Tasks RCU enabled.
     [    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
     [    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
     [    0.000000] GICv3: Distributor has no Range Selector support
     [    0.000000] GICv3: no VLPI support, no direct LPI support
     [    0.000000] ITS [mem 0x08080000-0x0809ffff]
     [    0.000000] ITS at 0x0000000008080000: allocated 8192 Devices @7c830000 (indirect, esz 8, psz 64K, shr 1)
     [    0.000000] ITS at 0x0000000008080000: allocated 8192 Interrupt Collections @7c840000 (flat, esz 8, psz 64K, shr 1)
     [    0.000000] GIC: using LPI property table @0x000000007c850000
     [    0.000000] ITS: Allocated 1792 chunks for LPIs
     [    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
     [    0.000000] CPU0: using LPI pending table @0x000000007c860000
     [    0.000000] arch_timer: cp15 timer(s) running at 100.00MHz (virt).
     [    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
     [    0.000001] sched_clock: 56 bits at 100MHz, resolution 10ns, wraps every 4398046511100ns
     [    0.000828] Console: colour dummy device 80x25
     [    0.001279] Calibrating delay loop (skipped), value calculated using timer frequency.. 200.00 BogoMIPS (lpj=400000)
     [    0.002307] pid_max: default: 32768 minimum: 301
     [    0.002925] Security Framework initialized
     [    0.003494] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
     [    0.004277] Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
     [    0.004968] Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.005628] Mountpoint-cache hash table entries: 2048 (order: 2, 16384 bytes)
     [    0.031117] ASID allocator initialised with 32768 entries
     [    0.035124] Hierarchical SRCU implementation.
     [    0.039492] Platform MSI: its domain created
     [    0.039934] PCI/MSI: /intc/its domain created
     [    0.040509] EFI services will not be available.
     [    0.043153] smp: Bringing up secondary CPUs ...
     [    0.043606] smp: Brought up 1 node, 1 CPU
     [    0.044000] SMP: Total of 1 processors activated.
     [    0.044464] CPU features: detected: GIC system register CPU interface
     [    0.045112] CPU features: detected: Privileged Access Never
     [    0.045658] CPU features: detected: User Access Override
     [    0.046177] CPU features: detected: RAS Extension Support
     [    0.048119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000288
     [    0.048991] Mem abort info:
     [    0.049267]   ESR = 0x96000004
     [    0.049567]   Exception class = DABT (current EL), IL = 32 bits
     [    0.050146]   SET = 0, FnV = 0
     [    0.050446]   EA = 0, S1PTW = 0
     [    0.050754] Data abort info:
     [    0.051038]   ISV = 0, ISS = 0x00000004
     [    0.051921]   CM = 0, WnR = 0
     [    0.054936] [0000000000000288] user address but active_mm is swapper
     [    0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
     [    0.067080] Modules linked in:
     [    0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.18.0-rc2-58583-g7daf201-dirty #20
     [    0.078745] Hardware name: linux,dummy-virt (DT)
     [    0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
     [    0.088258] pc : kpti_install_ng_mappings+0x154/0x214
     [    0.093319] lr : kpti_install_ng_mappings+0x120/0x214
     [    0.098483] sp : ffff0000093fbce0
     [    0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
     [    0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
     [    0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
     [    0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
     [    0.123392] x21: ffff00000923b000 x20: 0000000000000000
     [    0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
     [    0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
     [    0.139513] x15: 000000007dff5000 x14: 000000007dff5000
     [    0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
     [    0.150329] x11: 000000007dff7000 x10: 0000000000000000
     [    0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
     [    0.161042] x7 : 0000000000000000 x6 : 000000004123c000
     [    0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
     [    0.171860] x3 : 0000000000000000 x2 : 000000004123b000
     [    0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0
     [    0.182577] Process migration/0 (pid: 13, stack limit = 0x(____ptrval____))
     [    0.189561] Call trace:
     [    0.192081]  kpti_install_ng_mappings+0x154/0x214
     [    0.196892] Code: d503201f d503379f d5033fdf f94033a3 (f9414460)
     [    0.203029] ---[ end trace 3ca968ef0a151b33 ]---
     [    0.207722] note: migration/0[13] exited with preempt_count 1
     [    0.213610] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
     [    0.222393] Mem abort info:
     [    0.225273]   ESR = 0x86000004
     [    0.228396]   Exception class = IABT (current EL), IL = 32 bits
     [    0.234405]   SET = 0, FnV = 0
     [    0.237527]   EA = 0, S1PTW = 0
     [    0.240769] [0000000000000000] user address but active_mm is swapper
     [    0.247149] Internal error: Oops: 86000004 [#2] PREEMPT SMP
     [    0.252797] Modules linked in:
     [    0.255922] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G D           4.18.0-rc2-58583-g7daf201-dirty #20
     [    0.265549] Hardware name: linux,dummy-virt (DT)
     [    0.270235] pstate: 60400085 (nZCv daIf +PAN -UAO)
     [    0.275155] pc :           (null)
     [    0.278520] lr :           (null)
     [    0.281886] sp : ffff00000802bb10
     [    0.285257] x29: 0000000000000000 x28: 0000000000000080
     [    0.290664] x27: ffff000008a82000 x26: ffff000008a52134
     [    0.296073] x25: ffff000009089000 x24: ffff80003ca30570
     [    0.301381] x23: ffff000009064000 x22: ffff0000090acd88
     [    0.306789] x21: ffff80003ca30000 x20: 0000000000000000
     [    0.312196] x19: 0000000000000000 x18: 000000000000000e
     [    0.317503] x17: 0000000000000001 x16: 0000000000000019
     [    0.322910] x15: 0000000000000033 x14: 000000000000004c
     [    0.328317] x13: 0000000000000068 x12: ffff0000093fb7f8
     [    0.333725] x11: 0000000000000108 x10: 0000000000000940
     [    0.339028] x9 : ffff00000802baf0 x8 : ffff80003ca309a0
     [    0.344434] x7 : 0000000000000000 x6 : 0000000000000000
     [    0.349842] x5 : 0000000002da3744 x4 : 0000000000000080
     [    0.355250] x3 : 0000000000000008 x2 : 0000800034f69000
     [    0.360554] x1 : ffff80003ca30000 x0 : ffff80003ca627c0
     [    0.365959] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
     [    0.372801] Call trace:
     [    0.375322] Code: bad PC value
     [    0.378347] ---[ end trace 3ca968ef0a151b34 ]---
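
The two ESR values in the log can be unpacked by hand. The following is a minimal Python sketch, assuming the usual ESR_ELx field layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], with the fault status code in the low six ISS bits for aborts). The `decode_esr` helper and its lookup tables are illustrative only and cover just the codes seen here:

```python
# Illustrative decoder for the two ESR values in the oops above.
# Assumed layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0];
# for instruction/data aborts the low six ISS bits are the fault status code.
EXCEPTION_CLASSES = {
    0x21: "IABT (current EL)",
    0x25: "DABT (current EL)",
}
FAULT_STATUS = {
    0x04: "translation fault, level 0",
    0x05: "translation fault, level 1",
    0x06: "translation fault, level 2",
    0x07: "translation fault, level 3",
}


def decode_esr(esr):
    """Split an ESR value into the fields printed under 'Mem abort info'."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    fsc = iss & 0x3F
    return {
        "ec": ec,
        "class": EXCEPTION_CLASSES.get(ec, "not covered by this sketch"),
        "il_bits": 32 if il else 16,
        "iss": iss,
        "fault": FAULT_STATUS.get(fsc, "not covered by this sketch"),
    }


for esr in (0x96000004, 0x86000004):
    print(hex(esr), decode_esr(esr))
```

Under that assumption both oopses are level 0 translation faults taken from the current exception level: the first on a data access, the second on an instruction fetch.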


The faddr2line result is:
=========================

     ./scripts/faddr2line ../kernel-dev.build/vmlinux kpti_install_ng_mappings+0x150/0x214
     kpti_install_ng_mappings+0x150/0x214:
     __cpu_set_tcr_t0sz at arch/arm64/include/asm/mmu_context.h:94
     (inlined by) cpu_uninstall_idmap at arch/arm64/include/asm/mmu_context.h:125
     (inlined by) kpti_install_ng_mappings at arch/arm64/kernel/cpufeature.c:921


The disassembly of kpti_install_ng_mappings is:
===============================================

     Dump of assembler code for function kpti_install_ng_mappings:
        0xffff000008091f7c <+0>:     stp     x29, x30, [sp,#-112]!
        0xffff000008091f80 <+4>:     adrp    x0, 0xffff000009064000 <bp_hardening_data>
        0xffff000008091f84 <+8>:     mov     x29, sp
        0xffff000008091f88 <+12>:    stp     x23, x24, [sp,#48]
        0xffff000008091f8c <+16>:    adrp    x24, 0xffff0000091d9000 <reset_devices>
        0xffff000008091f90 <+20>:    add     x0, x0, #0x18
        0xffff000008091f94 <+24>:    add     x1, x24, #0x550
        0xffff000008091f98 <+28>:    stp     x19, x20, [sp,#16]
        0xffff000008091f9c <+32>:    stp     x21, x22, [sp,#32]
        0xffff000008091fa0 <+36>:    stp     x25, x26, [sp,#64]
        0xffff000008091fa4 <+40>:    stp     x27, x28, [sp,#80]
        0xffff000008091fa8 <+44>:    mrs     x2, tpidr_el1
        0xffff000008091fac <+48>:    ldrb    w1, [x1,#8]
        0xffff000008091fb0 <+52>:    ldr     w20, [x2,x0]
        0xffff000008091fb4 <+56>:    cbnz    w1, 0xffff00000809212c <kpti_install_ng_mappings+432>
        0xffff000008091fb8 <+60>:    adrp    x27, 0xffff000008ee5000 <sve_vq_map+32>
        0xffff000008091fbc <+64>:    adrp    x19, 0xffff0000091d8000 <empty_zero_page>
        0xffff000008091fc0 <+68>:    add     x19, x19, #0x0
        0xffff000008091fc4 <+72>:    adrp    x1, 0xffff000008a5f000 <kimage_vaddr>
        0xffff000008091fc8 <+76>:    mov     x0, x19
        0xffff000008091fcc <+80>:    add     x1, x1, #0x3d8
        0xffff000008091fd0 <+84>:    ldr     x2, [x27,#1176]
        0xffff000008091fd4 <+88>:    sub     x4, x1, x2
        0xffff000008091fd8 <+92>:    sub     x0, x0, x2
        0xffff000008091fdc <+96>:    msr     ttbr0_el1, x0
        0xffff000008091fe0 <+100>:   isb
        0xffff000008091fe4 <+104>:   dsb     nshst
        0xffff000008091fe8 <+108>:   tlbi    vmalle1
        0xffff000008091fec <+112>:   nop
        0xffff000008091ff0 <+116>:   nop
        0xffff000008091ff4 <+120>:   dsb     nsh
        0xffff000008091ff8 <+124>:   isb
        0xffff000008091ffc <+128>:   adrp    x3, 0xffff000009096000 <early_node_cpu_hwid+1440>
        0xffff000008092000 <+132>:   ldr     x0, [x3,#648]
        0xffff000008092004 <+136>:   cmp     x0, #0x10
        0xffff000008092008 <+140>:   b.ne    0xffff000008092178 <kpti_install_ng_mappings+508>
        0xffff00000809200c <+144>:   adrp    x28, 0xffff000008ee5000 <sve_vq_map+32>
        0xffff000008092010 <+148>:   ldr     x2, [x27,#1176]
        0xffff000008092014 <+152>:   adrp    x1, 0xffff000009237000
        0xffff000008092018 <+156>:   adrp    x26, 0xffff00000923b000
        0xffff00000809201c <+160>:   add     x1, x1, #0x0
        0xffff000008092020 <+164>:   add     x21, x26, #0x0
        0xffff000008092024 <+168>:   ldr     x0, [x28,#1160]
        0xffff000008092028 <+172>:   adrp    x23, 0xffff000008ee5000 <sve_vq_map+32>
        0xffff00000809202c <+176>:   sub     x1, x1, x2
        0xffff000008092030 <+180>:   sub     x1, x1, x0
        0xffff000008092034 <+184>:   orr     x0, x1, #0xffff800000000000
        0xffff000008092038 <+188>:   cmp     x0, x21
        0xffff00000809203c <+192>:   b.eq    0xffff000008092174 <kpti_install_ng_mappings+504>
        0xffff000008092040 <+196>:   mov     x22, x19
        0xffff000008092044 <+200>:   str     x3, [x29,#96]
        0xffff000008092048 <+204>:   str     x4, [x29,#104]
        0xffff00000809204c <+208>:   sub     x2, x22, x2
        0xffff000008092050 <+212>:   msr     ttbr0_el1, x2
        0xffff000008092054 <+216>:   isb
        0xffff000008092058 <+220>:   ldr     x0, [x28,#1160]
        0xffff00000809205c <+224>:   and     x1, x1, #0x7fffffffffff
        0xffff000008092060 <+228>:   adrp    x25, 0xffff0000090ac000 <perf_cpu_clock+200>
        0xffff000008092064 <+232>:   add     x0, x1, x0
        0xffff000008092068 <+236>:   add     x1, x25, #0xd88
        0xffff00000809206c <+240>:   bl      0xffff0000080a0750 <cpu_do_switch_mm>
        0xffff000008092070 <+244>:   adrp    x0, 0xffff000009089000 <page_wait_table+5376>
        0xffff000008092074 <+248>:   mov     w1, #0x80                       // #128
        0xffff000008092078 <+252>:   add     x0, x0, #0xb48
        0xffff00000809207c <+256>:   bl      0xffff0000083e8144 <__bitmap_weight>
        0xffff000008092080 <+260>:   mov     w1, w0
        0xffff000008092084 <+264>:   ldr     x5, [x23,#1176]
        0xffff000008092088 <+268>:   mov     w0, w20
        0xffff00000809208c <+272>:   ldr     x4, [x29,#104]
        0xffff000008092090 <+276>:   mov     x2, x21
        0xffff000008092094 <+280>:   sub     x2, x2, x5
        0xffff000008092098 <+284>:   blr     x4
        0xffff00000809209c <+288>:   ldr     x1, [x23,#1176]
        0xffff0000080920a0 <+292>:   mrs     x0, sp_el0
        0xffff0000080920a4 <+296>:   sub     x22, x22, x1
        0xffff0000080920a8 <+300>:   ldr     x1, [x0,#936]
        0xffff0000080920ac <+304>:   msr     ttbr0_el1, x22
        0xffff0000080920b0 <+308>:   isb
        0xffff0000080920b4 <+312>:   dsb     nshst
        0xffff0000080920b8 <+316>:   tlbi    vmalle1
        0xffff0000080920bc <+320>:   nop
        0xffff0000080920c0 <+324>:   nop
        0xffff0000080920c4 <+328>:   dsb     nsh
        0xffff0000080920c8 <+332>:   isb
        0xffff0000080920cc <+336>:   ldr     x3, [x29,#96]
        0xffff0000080920d0 <+340>:   ldr     x0, [x3,#648]
        0xffff0000080920d4 <+344>:   cmp     x0, #0x10
        0xffff0000080920d8 <+348>:   b.ne    0xffff00000809215c <kpti_install_ng_mappings+480>
        0xffff0000080920dc <+352>:   add     x25, x25, #0xd88
        0xffff0000080920e0 <+356>:   cmp     x1, x25
        0xffff0000080920e4 <+360>:   b.eq    0xffff00000809211c <kpti_install_ng_mappings+416>
        0xffff0000080920e8 <+364>:   ldr     x2, [x1,#64]
        0xffff0000080920ec <+368>:   add     x26, x26, #0x0
        0xffff0000080920f0 <+372>:   cmp     x2, x26
        0xffff0000080920f4 <+376>:   b.eq    0xffff000008092174 <kpti_install_ng_mappings+504>
        0xffff0000080920f8 <+380>:   ldr     x0, [x27,#1176]
        0xffff0000080920fc <+384>:   sub     x19, x19, x0
        0xffff000008092100 <+388>:   msr     ttbr0_el1, x19
        0xffff000008092104 <+392>:   isb
        0xffff000008092108 <+396>:   tbz     x2, #47, 0xffff000008092148 <kpti_install_ng_mappings+460>
        0xffff00000809210c <+400>:   ldr     x0, [x28,#1160]
        0xffff000008092110 <+404>:   and     x2, x2, #0x7fffffffffff
        0xffff000008092114 <+408>:   add     x0, x2, x0
        0xffff000008092118 <+412>:   bl      0xffff0000080a0750 <cpu_do_switch_mm>
        0xffff00000809211c <+416>:   cbnz    w20, 0xffff00000809212c <kpti_install_ng_mappings+432>
        0xffff000008092120 <+420>:   add     x24, x24, #0x550
        0xffff000008092124 <+424>:   mov     w0, #0x1                        // #1
        0xffff000008092128 <+428>:   strb    w0, [x24,#8]
        0xffff00000809212c <+432>:   ldp     x19, x20, [sp,#16]
        0xffff000008092130 <+436>:   ldp     x21, x22, [sp,#32]
        0xffff000008092134 <+440>:   ldp     x23, x24, [sp,#48]
        0xffff000008092138 <+444>:   ldp     x25, x26, [sp,#64]
        0xffff00000809213c <+448>:   ldp     x27, x28, [sp,#80]
        0xffff000008092140 <+452>:   ldp     x29, x30, [sp],#112
        0xffff000008092144 <+456>:   ret
        0xffff000008092148 <+460>:   adrp    x0, 0xffff000008ee5000 <sve_vq_map+32>
        0xffff00000809214c <+464>:   ldr     x0, [x0,#1176]
        0xffff000008092150 <+468>:   sub     x0, x2, x0
        0xffff000008092154 <+472>:   bl      0xffff0000080a0750 <cpu_do_switch_mm>
        0xffff000008092158 <+476>:   b       0xffff00000809211c <kpti_install_ng_mappings+416>
        0xffff00000809215c <+480>:   mrs     x0, tcr_el1
        0xffff000008092160 <+484>:   and     x0, x0, #0xffffffffffffffc0
        0xffff000008092164 <+488>:   orr     x0, x0, #0x10
        0xffff000008092168 <+492>:   msr     tcr_el1, x0
        0xffff00000809216c <+496>:   isb
        0xffff000008092170 <+500>:   b       0xffff0000080920dc <kpti_install_ng_mappings+352>
        0xffff000008092174 <+504>:   brk     #0x800
        0xffff000008092178 <+508>:   mrs     x1, tcr_el1
        0xffff00000809217c <+512>:   and     x1, x1, #0xffffffffffffffc0
        0xffff000008092180 <+516>:   orr     x0, x1, x0
        0xffff000008092184 <+520>:   msr     tcr_el1, x0
        0xffff000008092188 <+524>:   isb
        0xffff00000809218c <+528>:   b       0xffff00000809200c <kpti_install_ng_mappings+144>
     End of assembler dump.


Best Regards,
Wei


* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-26 17:16               ` Wei Xu
@ 2018-06-26 17:47                 ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-26 17:47 UTC (permalink / raw)
  To: Wei Xu
  Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian

Hi Wei,

On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
> Today I tried the kernel 4.18-rc2(defconfig, no change on top) with qemu
> 2.12.0.
> The guest sometimes still failed to boot. But the crash reason is different.
> Could you please share any hint?
> Thanks!
> 
> The guest boot log is as below:
> ===========================
> 
>     estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
>         -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.18-joyx \
>         -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
>         -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
> 
>     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>     [    0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty

I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !
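
For readers unfamiliar with the suffix being pointed at here: a release string of this shape typically encodes the base version, the number of commits git counted since the nearest tag it could find, an abbreviated commit hash prefixed with `g`, and a `-dirty` marker for uncommitted changes. A minimal Python sketch of that reading, assuming exactly this layout (the `parse_release` helper and its regex are hypothetical, not a kernel tool):

```python
import re

# Assumed layout: <base>[-rcN]-<commits since tag>-g<abbrev hash>[-dirty]
RELEASE_RE = re.compile(
    r"^(?P<base>\d+\.\d+\.\d+(?:-rc\d+)?)"
    r"-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)"
    r"(?P<dirty>-dirty)?$"
)


def parse_release(release):
    """Split a kernel release string into its git-describe style parts."""
    match = RELEASE_RE.match(release)
    if match is None:
        raise ValueError("unexpected release string: " + release)
    return {
        "base": match.group("base"),
        "commits_on_top": int(match.group("commits")),
        "sha": match.group("sha"),
        "dirty": match.group("dirty") is not None,
    }


# Both strings are copied from the boot logs in this thread.
for release in ("4.18.0-rc2-58583-g7daf201-dirty",
                "4.17.0-45865-g2b31fe7-dirty"):
    print(release, "->", parse_release(release))
```

Read this way, the string from the log reports 58583 commits on top of the tag plus uncommitted changes, which is what makes "no change on top" look doubtful.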

>     [    0.048119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000288
>     [    0.048991] Mem abort info:
>     [    0.049267]   ESR = 0x96000004
>     [    0.049567]   Exception class = DABT (current EL), IL = 32 bits
>     [    0.050146]   SET = 0, FnV = 0
>     [    0.050446]   EA = 0, S1PTW = 0
>     [    0.050754] Data abort info:
>     [    0.051038]   ISV = 0, ISS = 0x00000004
>     [    0.051921]   CM = 0, WnR = 0
>     [    0.054936] [0000000000000288] user address but active_mm is swapper
>     [    0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>     [    0.067080] Modules linked in:
>     [    0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.18.0-rc2-58583-g7daf201-dirty #20
>     [    0.078745] Hardware name: linux,dummy-virt (DT)
>     [    0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
>     [    0.088258] pc : kpti_install_ng_mappings+0x154/0x214
>     [    0.093319] lr : kpti_install_ng_mappings+0x120/0x214
>     [    0.098483] sp : ffff0000093fbce0
>     [    0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
>     [    0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
>     [    0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
>     [    0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
>     [    0.123392] x21: ffff00000923b000 x20: 0000000000000000
>     [    0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
>     [    0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
>     [    0.139513] x15: 000000007dff5000 x14: 000000007dff5000
>     [    0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
>     [    0.150329] x11: 000000007dff7000 x10: 0000000000000000
>     [    0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
>     [    0.161042] x7 : 0000000000000000 x6 : 000000004123c000
>     [    0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
>     [    0.171860] x3 : 0000000000000000 x2 : 000000004123b000
>     [    0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0

So looking at the disassembly, we access idmap_t0sz as part of
cpu_install_idmap() and it looks like we push its page address to the
stack:

>        0xffff000008091ffc <+128>:   adrp    x3, 0xffff000009096000 <early_node_cpu_hwid+1440>

[...]

>        0xffff000008092044 <+200>:   str     x3, [x29,#96]

Then after we've come back from the asm call, we want to access idmap_t0sz
again as part of cpu_uninstall_idmap() so we pop it back off:

>        0xffff0000080920cc <+336>:   ldr     x3, [x29,#96]
>        0xffff0000080920d0 <+340>:   ldr     x0, [x3,#648]

And this access is the one that faults, because we popped off NULL.

So actually, rather than faulting on the stack access, we're managing to
load zeroes from somewhere, so it could still be indicative of page table
corruption for the stack mapping.
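
The numbers in the oops are consistent with this reading. A small Python check, using only values copied from the log and disassembly quoted above (the variable names are illustrative):

```python
# Values copied from the oops and the disassembly quoted above.
pc_offset = 0x154               # pc : kpti_install_ng_mappings+0x154/0x214
fault_address = 0x288           # "virtual address 0000000000000288"
x3 = 0x0                        # x3 in the register dump
load_offset = 648               # ldr x0, [x3,#648] at <+340>
adrp_page = 0xffff000009096000  # adrp x3, ... at <+128>

# The faulting pc is the <+340> load that follows the reload of x3.
print("pc offset (decimal):", pc_offset)

# With x3 == 0 the load hits the reported fault address.
effective_address = x3 + load_offset
print("effective address:", hex(effective_address))

# Had x3 still held the page address, the load would have targeted this:
intended_address = adrp_page + load_offset
print("intended address: ", hex(intended_address))
```

So the fault address is just the 648-byte offset applied to a zero base, rather than a corrupted pointer into the kernel image.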

If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
replacing:

	dc      civac, cur_\()\type\()p

with:

	dc      ivac, cur_\()\type\()p

please? Only do this for the guest kernel, not the host. KVM will upgrade
the invalidate to a clean+invalidate, so it's interesting to see if this has
an effect on the behaviour.

Will


* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
@ 2018-06-26 17:47                 ` Will Deacon
  0 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-26 17:47 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Wei,

On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
> Today I tried the kernel 4.18-rc2(defconfig, no change on top) with qemu
> 2.12.0.
> The guest sometimes still failed to boot. But the crash reason is different.
> Could you please share any hint?
> Thanks!
> 
> The guest boot log is as below:
> ===========================
> 
>     estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-v
>     ersion=3 -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.18-joyx
> -initrd
>     ../mini-rootfs-arm64.cpio.gz -nographic -append "rdinit=init
> console=ttyAMA0 ear
>     lycon=pl011,0x9000000"
> 
>     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>     [    0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty

I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !

>     [    0.048119] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000288
>     [    0.048991] Mem abort info:
>     [    0.049267]   ESR = 0x96000004
>     [    0.049567]   Exception class = DABT (current EL), IL = 32 bits
>     [    0.050146]   SET = 0, FnV = 0
>     [    0.050446]   EA = 0, S1PTW = 0
>     [    0.050754] Data abort info:
>     [    0.051038]   ISV = 0, ISS = 0x00000004
>     [    0.051921]   CM = 0, WnR = 0
>     [    0.054936] [0000000000000288] user address but active_mm is swapper
>     [    0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>     [    0.067080] Modules linked in:
>     [    0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted
> 4.18.0-rc2-58583-g7daf201-dirty #20
>     [    0.078745] Hardware name: linux,dummy-virt (DT)
>     [    0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
>     [    0.088258] pc : kpti_install_ng_mappings+0x154/0x214
>     [    0.093319] lr : kpti_install_ng_mappings+0x120/0x214
>     [    0.098483] sp : ffff0000093fbce0
>     [    0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
>     [    0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
>     [    0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
>     [    0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
>     [    0.123392] x21: ffff00000923b000 x20: 0000000000000000
>     [    0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
>     [    0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
>     [    0.139513] x15: 000000007dff5000 x14: 000000007dff5000
>     [    0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
>     [    0.150329] x11: 000000007dff7000 x10: 0000000000000000
>     [    0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
>     [    0.161042] x7 : 0000000000000000 x6 : 000000004123c000
>     [    0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
>     [    0.171860] x3 : 0000000000000000 x2 : 000000004123b000
>     [    0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0

So looking at the disassembly, we access idmap_t0sz as part of
cpu_install_idmap() and it looks like we push its page address to the
stack:

>        0xffff000008091ffc <+128>:   adrp    x3, 0xffff000009096000 <early_node_cpu_hwid+1440>

[...]

>        0xffff000008092044 <+200>:   str     x3, [x29,#96]

Then after we've come back from the asm call, we want to access idmap_t0sz
again as part of cpu_uninstall_idmap() so we pop it back off:

>        0xffff0000080920cc <+336>:   ldr     x3, [x29,#96]
>        0xffff0000080920d0 <+340>:   ldr     x0, [x3,#648]

And this access is the one that faults, because we popped off NULL.

So actually, rather than faulting on the stack access, we're managing to
load zeroes from somewhere, so it could still be indicative of page table
corruption for the stack mapping.

If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
replacing:

	dc      civac, cur_\()\type\()p

with:

	dc      ivac, cur_\()\type\()p

please? Only do this for the guest kernel, not the host. KVM will upgrade
the clean to a clean+invalidate, so it's interesting to see if this has
an effect on the behaviour.

Will

^ permalink raw reply	[flat|nested] 79+ messages in thread

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-26 17:47                 ` Will Deacon
@ 2018-06-27  8:39                   ` James Morse
  -1 siblings, 0 replies; 79+ messages in thread
From: James Morse @ 2018-06-27  8:39 UTC (permalink / raw)
  To: Wei Xu
  Cc: Will Deacon, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian

Hi Wei,

On 26/06/18 18:47, Will Deacon wrote:
> On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
>>     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>     [    0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty
> 
> I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !

Some examples:

For comparison, when I boot v4.17 it looks like this:
| Linux version 4.17.0 (morse@melchizedek) (gcc version 4.9.3 20141031
| (prerelease) (Linaro GCC 2014.11)) #9886 SMP PREEMPT Thu Jun 21 10:30:55 BST
| 2018


If I apply some extra patches and make some uncommitted changes, it looks like this:
| Linux version 4.17.0-00025-ga22ca2234824-dirty (morse@melchizedek) (gcc
| version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11)) #9887 SMP PREEMPT
| Thu Jun 21 10:46:22 BST 2018


Hence we read your '4.17.0-45864-g29dcea8-dirty' line as v4.17 with extra
patches and uncommitted changes, and similarly for this v4.18-rc2.
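
A small sketch of how such a release string can be picked apart, assuming the
usual layout produced by the kernel's scripts/setlocalversion: a base version,
then an optional -NNNNN-g<hash> suffix (commits since the tag that git found,
plus the abbreviated commit id) and an optional -dirty suffix. The regular
expression and the function name parse_release are illustrative only.

```python
import re

# base version, optional "-<count>-g<hash>", optional "-dirty"
RELEASE_RE = re.compile(
    r"^(?P<base>\d+\.\d+\.\d+(?:-rc\d+)?)"
    r"(?:-(?P<count>\d+)-g(?P<sha>[0-9a-f]+))?"
    r"(?P<dirty>-dirty)?$"
)


def parse_release(release):
    """Return the pieces of a kernel release string, or None if no match."""
    m = RELEASE_RE.match(release)
    if m is None:
        return None
    count = m.group("count")
    return {
        "base": m.group("base"),
        "commits_since_tag": int(count) if count is not None else None,
        "sha": m.group("sha"),
        "dirty": m.group("dirty") is not None,
    }


for s in ("4.17.0",
          "4.17.0-00025-ga22ca2234824-dirty",
          "4.18.0-rc2-58583-g7daf201-dirty"):
    print(s, "->", parse_release(s))
```

For '4.18.0-rc2-58583-g7daf201-dirty' this reports 58583 commits on top of the
tag git found, plus uncommitted changes to tracked files, which is what looks
odd for an unmodified v4.18-rc2.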

I agree 7daf201 is the head commit for v4.18-rc2, but something has gone wrong
here. Could you try building from a fresh clone of Linus' tree?

(I suspect at some point you've applied a patch, and have then been merging
upstream, instead of 'fast forwarding')



Thanks,

James

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-26 17:47                 ` Will Deacon
@ 2018-06-27 13:22                   ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-27 13:22 UTC (permalink / raw)
  To: Will Deacon
  Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
	Liguozhu (Kenneth)

Hi Will,

On 2018/6/26 18:47, Will Deacon wrote:
> Hi Wei,
> 
> On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
>> Today I tried the kernel 4.18-rc2(defconfig, no change on top) with qemu
>> 2.12.0.
>> The guest sometimes still failed to boot. But the crash reason is different.
>> Could you please share any hint?
>> Thanks!
>>
>> The guest boot log is as below:
>> ===========================
>>
>>     estuary:/$ ./qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3 \
>>         -cpu host -enable-kvm -smp 1 -m 1024 -kernel ./Image-4.18-joyx \
>>         -initrd ../mini-rootfs-arm64.cpio.gz -nographic \
>>         -append "rdinit=init console=ttyAMA0 earlycon=pl011,0x9000000"
>>
>>     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>     [    0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty
> 
> I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !

Sorry, I should have highlighted in the previous mail that I changed the
default value of CONFIG_NR_CPUS with menuconfig.
That is why it showed as dirty.

> 
>>     [    0.048119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000288
>>     [    0.048991] Mem abort info:
>>     [    0.049267]   ESR = 0x96000004
>>     [    0.049567]   Exception class = DABT (current EL), IL = 32 bits
>>     [    0.050146]   SET = 0, FnV = 0
>>     [    0.050446]   EA = 0, S1PTW = 0
>>     [    0.050754] Data abort info:
>>     [    0.051038]   ISV = 0, ISS = 0x00000004
>>     [    0.051921]   CM = 0, WnR = 0
>>     [    0.054936] [0000000000000288] user address but active_mm is swapper
>>     [    0.061427] Internal error: Oops: 96000004 [#1] PREEMPT SMP
>>     [    0.067080] Modules linked in:
>>     [    0.070206] CPU: 0 PID: 13 Comm: migration/0 Not tainted 4.18.0-rc2-58583-g7daf201-dirty #20
>>     [    0.078745] Hardware name: linux,dummy-virt (DT)
>>     [    0.083433] pstate: 60400085 (nZCv daIf +PAN -UAO)
>>     [    0.088258] pc : kpti_install_ng_mappings+0x154/0x214
>>     [    0.093319] lr : kpti_install_ng_mappings+0x120/0x214
>>     [    0.098483] sp : ffff0000093fbce0
>>     [    0.101854] x29: ffff0000093fbce0 x28: ffff000008ee5000
>>     [    0.107263] x27: ffff000008ee5000 x26: ffff00000923b000
>>     [    0.112568] x25: ffff0000090ac000 x24: ffff0000091d9000
>>     [    0.117983] x23: ffff000008ee5000 x22: 00000000411d8000
>>     [    0.123392] x21: ffff00000923b000 x20: 0000000000000000
>>     [    0.128801] x19: ffff0000091d8000 x18: 000000003455d99d
>>     [    0.134209] x17: 0000000000000001 x16: 00f8000040ffff13
>>     [    0.139513] x15: 000000007dff5000 x14: 000000007dff5000
>>     [    0.144920] x13: 00f800007fe00f11 x12: 000000007dff7000
>>     [    0.150329] x11: 000000007dff7000 x10: 0000000000000000
>>     [    0.155633] x9 : 000000007dff8000 x8 : 000000007dff8000
>>     [    0.161042] x7 : 0000000000000000 x6 : 000000004123c000
>>     [    0.166451] x5 : 000000004123c000 x4 : 0000000040a5f3d4
>>     [    0.171860] x3 : 0000000000000000 x2 : 000000004123b000
>>     [    0.177163] x1 : ffff0000090acd88 x0 : ffff80003ca627c0
> 
> So looking at the disassembly, we access idmap_t0sz as part of
> cpu_install_idmap() and it looks like we push its page address to the
> stack:
> 
>>        0xffff000008091ffc <+128>:   adrp    x3, 0xffff000009096000 <early_node_cpu_hwid+1440>
> 
> [...]
> 
>>        0xffff000008092044 <+200>:   str     x3, [x29,#96]
> 
> Then after we've come back from the asm call, we want to access idmap_t0sz
> again as part of cpu_uninstall_idmap() so we pop it back off:
> 
>>        0xffff0000080920cc <+336>:   ldr     x3, [x29,#96]
>>        0xffff0000080920d0 <+340>:   ldr     x0, [x3,#648]
> 
> And this access is the one that faults, because we popped off NULL.
> 

Thanks for your kind explanation!

> So actually, rather than faulting on the stack access, we're managing to
> load zeroes from somewhere, so it could still be indicative of page table
> corruption for the stack mapping.
> 
> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
> replacing:
> 
> 	dc      civac, cur_\()\type\()p
> 
> with:
> 
> 	dc      ivac, cur_\()\type\()p
> 
> please? Only do this for the guest kernel, not the host. KVM will upgrade
> the clean to a clean+invalidate, so it's interesting to see if this has
> an effect on the behaviour.

With only the guest kernel changed, the guest still failed to boot, and the
log is the same as in the last mail.

But if I change it to cvac in the guest, as below, it is more or less stable.
	dc      cvac, cur_\()\type\()p

I have synced with our SoC guys about this and hope we can find the reason.
Do you have any more suggestions?
Thanks!

Best Regards,
Wei

> 
> Will
> 
> .
> 


* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-27  8:39                   ` James Morse
@ 2018-06-27 13:26                     ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-27 13:26 UTC (permalink / raw)
  To: James Morse
  Cc: Will Deacon, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian

Hi James,

On 2018/6/27 9:39, James Morse wrote:
> Hi Wei,
> 
> On 26/06/18 18:47, Will Deacon wrote:
>> On Wed, Jun 27, 2018 at 01:16:44AM +0800, Wei Xu wrote:
>>>     [    0.000000] Booting Linux on physical CPU 0x0000000000 [0x480fd010]
>>>     [    0.000000] Linux version 4.18.0-rc2-58583-g7daf201-dirty
>>
>> I'm still suspicious that this is 4.18-rc2 with "no change on top" ^^^ !
> 
> Some examples:
> 
> For comparison, when I boot v4.17 it looks like this:
> | Linux version 4.17.0 (morse@melchizedek) (gcc version 4.9.3 20141031
> | (prerelease) (Linaro GCC 2014.11)) #9886 SMP PREEMPT Thu Jun 21 10:30:55 BST
> | 2018
> 
> 
> If I apply some extra patches and make some uncommitted changes, it looks like this:
> | Linux version 4.17.0-00025-ga22ca2234824-dirty (morse@melchizedek) (gcc
> | version 4.9.3 20141031 (prerelease) (Linaro GCC 2014.11)) #9887 SMP PREEMPT
> | Thu Jun 21 10:46:22 BST 2018
> 
> 
> Hence we read your '4.17.0-45864-g29dcea8-dirty' line as v4.17 with extra
> patches and uncommited changes, and similar with this v4.18-rc2.
> 
> I agree 7daf201 is the head commit for v4.18-rc2, but something has gone wrong
> here. Could you try building from a fresh clone of Linus' tree?
> 
> (I suspect at some point you've applied a patch, and have then been merging
> upstream, instead of 'fast forwarding')
> 

Thanks for your kind guidance!
Sorry, I should have highlighted in the previous mail that I only changed
the default value of CONFIG_NR_CPUS with menuconfig.
That is why it showed as dirty.

Best Regards,
Wei

> 
> 
> Thanks,
> 
> James
> 
> .
> 


* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-27 13:22                   ` Wei Xu
@ 2018-06-27 13:28                     ` Will Deacon
  -1 siblings, 0 replies; 79+ messages in thread
From: Will Deacon @ 2018-06-27 13:28 UTC (permalink / raw)
  To: Wei Xu
  Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
	Liguozhu (Kenneth)

On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
> On 2018/6/26 18:47, Will Deacon wrote:
> > If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
> > replacing:
> > 
> > 	dc      civac, cur_\()\type\()p
> > 
> > with:
> > 
> > 	dc      ivac, cur_\()\type\()p
> > 
> > please? Only do this for the guest kernel, not the host. KVM will upgrade
> > the clean to a clean+invalidate, so it's interesting to see if this has
> > an effect on the behaviour.
> 
> Only changed the guest kernel, the guest still failed to boot and the log
> is same with the last mail.
> 
> But if I changed to cvac as below for the guest, it is kind of stable.
> 	dc      cvac, cur_\()\type\()p
> 
> I have synced with our SoC guys about this and hope we can find the reason.
> Do you have any more suggestion?

Unfortunately, not. It looks like somehow clean+invalidate is behaving
just as an invalidate, and we're corrupting the page table as a result.
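
To illustrate that hypothesis only, here is a toy Python model of a single
write-back cache line in front of one memory location. It is a sketch under
strong simplifying assumptions: it does not model the KPTI code path, KVM, or
the HiSilicon hardware, and the class and method names are invented for this
example. It shows why a clean (cvac) or a real clean+invalidate (civac)
preserves a dirty line, while a bare invalidate throws the new data away.

```python
class ToyCacheLine:
    """One write-back cache line in front of one memory location."""

    def __init__(self, memory_value):
        self.memory = memory_value
        self.cached = None      # None means the line is not present
        self.dirty = False

    def write(self, value):
        # A cacheable store lands in the cache and marks the line dirty.
        self.cached = value
        self.dirty = True

    def read(self):
        # A miss refills the line from memory.
        if self.cached is None:
            self.cached = self.memory
            self.dirty = False
        return self.cached

    def clean(self):
        # Models "dc cvac": write dirty data back, keep the line.
        if self.dirty:
            self.memory = self.cached
            self.dirty = False

    def invalidate(self):
        # Models a bare invalidate: drop the line without writing it back.
        self.cached = None
        self.dirty = False

    def clean_invalidate(self):
        # Models "dc civac": write back first, then drop the line.
        self.clean()
        self.invalidate()


def run(maintenance):
    """Dirty the line, apply one maintenance op, return (memory, next read)."""
    line = ToyCacheLine(memory_value="old")
    line.write("new")
    getattr(line, maintenance)()
    return line.memory, line.read()


for op in ("clean", "clean_invalidate", "invalidate"):
    print(op, run(op))
```

In this model only the bare invalidate leaves "old" in memory, which is the
kind of lost update that would corrupt a page table if a clean+invalidate
were silently downgraded.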

Hopefully the SoC guys will figure it out.

Will

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-27 13:28                     ` Will Deacon
@ 2018-06-27 13:32                       ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-27 13:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
	Liguozhu (Kenneth)

Hi Will,

On 2018/6/27 14:28, Will Deacon wrote:
> On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
>> On 2018/6/26 18:47, Will Deacon wrote:
>>> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
>>> replacing:
>>>
>>> 	dc      civac, cur_\()\type\()p
>>>
>>> with:
>>>
>>> 	dc      ivac, cur_\()\type\()p
>>>
>>> please? Only do this for the guest kernel, not the host. KVM will upgrade
>>> the clean to a clean+invalidate, so it's interesting to see if this has
>>> an effect on the behaviour.
>>
>> Only changed the guest kernel, the guest still failed to boot and the log
>> is same with the last mail.
>>
>> But if I changed to cvac as below for the guest, it is kind of stable.
>> 	dc      cvac, cur_\()\type\()p
>>
>> I have synced with our SoC guys about this and hope we can find the reason.
>> Do you have any more suggestion?
> 
> Unfortunately, not. It looks like somehow clean+invalidate is behaving
> just as an invalidate, and we're corrupting the page table as a result.
> 
> Hopefully the SoC guys will figure it out.

Thanks anyway!
I will post an update here if there is any news.

Best Regards,
Wei

> 
> Will
> 
> .
> 


* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-27 13:26                     ` Wei Xu
@ 2018-06-28  8:45                       ` James Morse
  -1 siblings, 0 replies; 79+ messages in thread
From: James Morse @ 2018-06-28  8:45 UTC (permalink / raw)
  To: Wei Xu
  Cc: Will Deacon, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian

Hi Wei,

On 27/06/18 14:26, Wei Xu wrote:
> Sorry, I should highlight that I have only updated the default value
> of CONFIG_NR_CPUS by menuconfig in the previous mail.
> That is why it showed dirty.

(menuconfig changes don't show up like this)


More than 64 CPUs ... Is this system running more VMs than it has VMIDs? Having
too few VMIDs does work with KVM, it's just going to trigger rollover frequently.
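
As a rough picture of what "rollover" means here, the following toy Python
model hands out VMIDs from a small space and bumps a generation counter when
the space is exhausted, so that every VM has to pick up a fresh VMID on its
next entry. It is only a sketch of a generation-based allocator: it is not
KVM's code, and the names (VmidAllocator, ensure_vmid) and the default of 8
VMID bits are assumptions made for the example.

```python
class VmidAllocator:
    """Toy generation-based VMID allocator (illustrative, not KVM's code)."""

    def __init__(self, bits=8):
        self.max_vmid = (1 << bits) - 1
        self.generation = 1
        self.next_vmid = 1      # VMID 0 is kept reserved in this toy model
        self.rollovers = 0

    def ensure_vmid(self, vm):
        """Called on every VM entry; 'vm' is a plain dict in this sketch."""
        if vm.get("generation") == self.generation:
            return vm["vmid"]   # VMID is still valid, nothing to do
        if self.next_vmid > self.max_vmid:
            # Space exhausted: start a new generation, which invalidates
            # every VMID handed out so far.
            self.generation += 1
            self.next_vmid = 1
            self.rollovers += 1
        vm["vmid"] = self.next_vmid
        vm["generation"] = self.generation
        self.next_vmid += 1
        return vm["vmid"]


def simulate(num_vms, rounds, bits):
    """Enter each VM in turn for 'rounds' rounds; return the rollover count."""
    alloc = VmidAllocator(bits=bits)
    vms = [{} for _ in range(num_vms)]
    for _ in range(rounds):
        for vm in vms:
            alloc.ensure_vmid(vm)
    return alloc.rollovers


# With 2 VMID bits there are 3 usable VMIDs (1..3) in this model.
print("3 VMs:", simulate(num_vms=3, rounds=10, bits=2))
print("4 VMs:", simulate(num_vms=4, rounds=10, bits=2))
print("1 VM: ", simulate(num_vms=1, rounds=10, bits=8))
```

In this model, rollover never happens while the VMs fit in the VMID space, and
happens repeatedly as soon as there is one VM too many.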

Just to check, what kernel version is the host running? Does it have commit
f0cf47d939d0 ("KVM: arm/arm64: Close VMID generation race")
(looks like that went in as a fix for v4.17-rc3)

Are you running (lots of) other VMs whenever this happens? Do they have multiple
vcpus? (I'm thinking of the scenario in that patch's description)

Is the host system otherwise idle when this happens?
(If not, can you reproduce the issue without exhausting the VMIDs?)


It may be that writing back the page-table entries with the MMU off, and
changing the cache maintenance are just changing the timing of something else.


Thanks,

James

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-28  8:45                       ` James Morse
@ 2018-06-28 10:20                         ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-28 10:20 UTC (permalink / raw)
  To: James Morse
  Cc: Will Deacon, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian

Hi James,

On 2018/6/28 9:45, James Morse wrote:
> Hi Wei,
> 
> On 27/06/18 14:26, Wei Xu wrote:
>> Sorry, I should highlight that I have only updated the default value
>> of CONFIG_NR_CPUS by menuconfig in the previous mail.
>> That is why it showed dirty.
> 
> (menuconfig changes don't show up like this)

Thanks!
Sorry, yes, you are right.
I no longer see "dirty" after resetting proc.S.

> 
> 
> More than 64 CPUs ... Is this system running more VMs than it has VMIDs? Too-few
> VMIDs does work with KVM, its just going to trigger rollover frequently.
>

No, we just ran one VM.

> Just to check, what kernel version is the host running? Does it have commit
> f0cf47d939d0 ("KVM: arm/arm64: Close VMID generation race")
> (looks like that went in as a fix for v4.17-rc3)

Yes, the host is running 4.18-rc2, the same as the guest, and it includes the above commit.

> 
> Are you running (lots) of other VMs whenever this happens? Do they have multiple
> vcpus? (I'm thinking of the scenario in that patch's description)

No, we ran just one VM with one vCPU.

> 
> Is the host system otherwise idle when this happens?
> (If not, can you reproduce the issue without exhausting the VMIDs?)
> 
> 
> It may be that writing back the page-table entries with the MMU off, and
> changing the cache maintenance are just changing the timing of something else.
> 

Yes, maybe. We are now debugging this together with the SoC guys.
Thanks!

Best Regards,
Wei

> 
> Thanks,
> 
> James
> 
> .
> 


* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-27 13:28                     ` Will Deacon
@ 2018-06-28 14:50                       ` Wei Xu
  -1 siblings, 0 replies; 79+ messages in thread
From: Wei Xu @ 2018-06-28 14:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: James Morse, mark.rutland, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
	Liguozhu (Kenneth),
	zhangxiquan, wxf.wang, Hanjun Guo, dingshuai1

Hi Will,

On 2018/6/27 14:28, Will Deacon wrote:
> On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
>> On 2018/6/26 18:47, Will Deacon wrote:
>>> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
>>> replacing:
>>>
>>> 	dc      civac, cur_\()\type\()p
>>>
>>> with:
>>>
>>> 	dc      ivac, cur_\()\type\()p
>>>
>>> please? Only do this for the guest kernel, not the host. KVM will upgrade
>>> the clean to a clean+invalidate, so it's interesting to see if this has
>>> an effect on the behaviour.
>>
>> Only changed the guest kernel, the guest still failed to boot and the log
>> is same with the last mail.
>>
>> But if I changed to cvac as below for the guest, it is kind of stable.
>> 	dc      cvac, cur_\()\type\()p
>>
>> I have synced with our SoC guys about this and hope we can find the reason.
>> Do you have any more suggestion?
> 
> Unfortunately, not. It looks like somehow clean+invalidate is behaving
> just as an invalidate, and we're corrupting the page table as a result.
> 
> Hopefully the SoC guys will figure it out.
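
(Illustration only: a toy model of the hypothesis quoted above, not a
description of this SoC or of any real cache. The class ToyCacheLine and the
helper update_entry are hypothetical names introduced for this sketch. It
shows why a clean+invalidate that acts as a plain invalidate would lose the
updated page-table entry.)

```python
# Toy single-line cache model. Assumption: one cache line in front of
# memory, no other observers. This is not a model of real hardware.
PTE_NG = 1 << 11  # nG bit, as tested by "tbnz ..., #11" in proc.S


class ToyCacheLine:
    def __init__(self, memory_value):
        self.memory = memory_value  # value held in memory behind the cache
        self.cached = None          # value held in the cache line, if any
        self.dirty = False

    def store(self, value):
        # A store lands in the cache and marks the line dirty.
        self.cached = value
        self.dirty = True

    def load(self):
        # A load hits the cache if the line is present, else fills from memory.
        if self.cached is None:
            self.cached = self.memory
        return self.cached

    def clean(self):
        # Write a dirty line back to memory.
        if self.dirty:
            self.memory = self.cached
            self.dirty = False

    def invalidate(self):
        # Drop the line without writing it back.
        self.cached = None
        self.dirty = False

    def clean_invalidate(self):
        self.clean()
        self.invalidate()


def update_entry(line, maintenance):
    """Set nG in the entry, apply the cache maintenance, then re-read it."""
    line.store(line.load() | PTE_NG)
    maintenance(line)
    return line.load()
```

With a starting entry of 0x3, update_entry(ToyCacheLine(0x3),
ToyCacheLine.clean_invalidate) reads back the updated entry, while passing
ToyCacheLine.invalidate instead reads back the stale 0x3.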

After replacing the dmb with dsb in both __idmap_kpti_get_pgtable_ent and
__idmap_kpti_put_pgtable_ent_ng, we tested 20 times and could not reproduce
the issue.
We will continue stress testing today and will report the result tomorrow.

The dsb in __idmap_kpti_get_pgtable_ent makes sure the dc has completed, so that
the following ldr gets the latest data.

The dsb in __idmap_kpti_put_pgtable_ent_ng makes sure the str completes
before the dc. Although dmb can guarantee the order of the str and dc at the L2 cache,
it cannot guarantee their order on the bus.

What do you think?
Thanks!

----

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 03646e6..bb767ea 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -209,7 +209,7 @@ ENDPROC(idmap_cpu_replace_ttbr1)

        .macro  __idmap_kpti_get_pgtable_ent, type
        dc      cvac, cur_\()\type\()p          // Ensure any existing dirty
-       dmb     sy                              // lines are written back before
+       dsb     sy                              // lines are written back before
        ldr     \type, [cur_\()\type\()p]       // loading the entry
        tbz     \type, #0, skip_\()\type        // Skip invalid and
        tbnz    \type, #11, skip_\()\type       // non-global entries
@@ -218,8 +218,9 @@ ENDPROC(idmap_cpu_replace_ttbr1)
        .macro __idmap_kpti_put_pgtable_ent_ng, type
        orr     \type, \type, #PTE_NG           // Same bit for blocks and pages
        str     \type, [cur_\()\type\()p]       // Update the entry and ensure
-       dmb     sy                              // that it is visible to all
+       dsb     sy                              // that it is visible to all
        dc      civac, cur_\()\type\()p         // CPUs. 	
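
(Illustration only: a plain-Python restatement of the per-entry decision made
by the two macros in the diff above, leaving out all cache maintenance and
barriers. The function name mark_entry_non_global and the constant PTE_VALID
are hypothetical; the bit positions are the ones tested in the assembly.)

```python
PTE_VALID = 1 << 0   # bit 0, tested by "tbz ..., #0" (skip invalid entries)
PTE_NG = 1 << 11     # bit 11, tested by "tbnz ..., #11" (skip non-global entries)


def mark_entry_non_global(entry):
    """Return the entry as the macros would leave it, ignoring caches."""
    if not entry & PTE_VALID:
        return entry           # invalid entry: skipped, left untouched
    if entry & PTE_NG:
        return entry           # already non-global: skipped
    return entry | PTE_NG      # "orr ..., #PTE_NG": set the nG bit
```

Only valid, still-global entries are rewritten, so these are the only entries
for which the str and the following dc matter.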


Best Regards,
Wei

> 
> Will
> 
> .
> 


* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-28 14:50                       ` Wei Xu
@ 2018-06-28 15:34                         ` Mark Rutland
  -1 siblings, 0 replies; 79+ messages in thread
From: Mark Rutland @ 2018-06-28 15:34 UTC (permalink / raw)
  To: Wei Xu
  Cc: Will Deacon, James Morse, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	libeijian, zhangxiquan, wxf.wang, dingshuai1, Hanjun Guo,
	Liguozhu (Kenneth)

On Thu, Jun 28, 2018 at 03:50:40PM +0100, Wei Xu wrote:
> Hi Will,
> 
> On 2018/6/27 14:28, Will Deacon wrote:
> > On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
> >> On 2018/6/26 18:47, Will Deacon wrote:
> >>> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
> >>> replacing:
> >>>
> >>> 	dc      civac, cur_\()\type\()p
> >>>
> >>> with:
> >>>
> >>> 	dc      ivac, cur_\()\type\()p
> >>>
> >>> please? Only do this for the guest kernel, not the host. KVM will upgrade
> >>> the clean to a clean+invalidate, so it's interesting to see if this has
> >>> an effect on the behaviour.
> >>
> >> Only changed the guest kernel, the guest still failed to boot and the log
> >> is same with the last mail.
> >>
> >> But if I changed to cvac as below for the guest, it is kind of stable.
> >> 	dc      cvac, cur_\()\type\()p
> >>
> >> I have synced with our SoC guys about this and hope we can find the reason.
> >> Do you have any more suggestion?
> > 
> > Unfortunately, not. It looks like somehow clean+invalidate is behaving
> > just as an invalidate, and we're corrupting the page table as a result.
> > 
> > Hopefully the SoC guys will figure it out.
> 
> After replaced the dmb with dsb in both __idmap_kpti_get_pgtable_ent and
> __idmap_kpti_put_pgtable_ent_ng, we tested 20 times and we can not reproduce
> the issue.
> Today we will continue to do the stress testing and will update the result tomorrow.
> 
> The dsb in __idmap_kpti_get_pgtable_ent is to make sure the dc has been done and
> the following ldr can get the latest data.
> 
> The dsb in __idmap_kpti_put_pgtable_ent_ng is to make sure the str will be done
> before dc. Although dmb can guarantee the order of the str and dc on the L2 cache,
> dmb can not guarantee the order on the bus.

The architecture mandates that a DMB must provide this ordering, so that
would be an erratum.

Per ARM DDI 0487C.a, page D3-2069, "Ordering and completion of data and instruction
cache instructions":

  All data cache instructions, other than DC ZVA, that specify an
  address:

  * Can execute in any order relative to loads or stores that access any
    address with the Device memory attribute, or with Normal memory with
    Inner Non-cacheable attribute unless a DMB or DSB is executed
    between the instructions.

Note that we rely on this ordering in head.S when creating the page
tables and setting up the boot mode. We also rely on this for the pmem
API.
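
(Illustration only: a toy model of the ordering rule quoted above, for
accesses of the kind that rule describes. It models ordering only and says
nothing about completion. The function allowed_orders and the "barrier"
marker are hypothetical names, not kernel or architecture terms.)

```python
from itertools import permutations, product


def allowed_orders(program):
    """Enumerate the orders in which the operations may be observed.

    Toy rule: operations with no barrier between them may be observed in
    any order; operations never move across a "barrier" entry.
    """
    segments, current = [], []
    for op in program:
        if op == "barrier":
            segments.append(current)
            current = []
        else:
            current.append(op)
    segments.append(current)

    # Permute within each segment, keep the segments in program order.
    per_segment = [list(permutations(seg)) for seg in segments]
    return {sum(choice, ()) for choice in product(*per_segment)}
```

In this model allowed_orders(["str", "dc civac"]) contains both orders, while
allowed_orders(["str", "barrier", "dc civac"]) contains only the program
order, whether the barrier stands for a DMB or a DSB.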

Thanks,
Mark.

* Re: Reply: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
       [not found]                         ` <etPan.5b3507f7.914aa16.1d6b@localhost>
@ 2018-06-28 16:24                             ` Mark Rutland
  2018-06-29  8:47                           ` Marc Zyngier
  1 sibling, 0 replies; 79+ messages in thread
From: Mark Rutland @ 2018-06-28 16:24 UTC (permalink / raw)
  To: Wangxuefeng (E)
  Cc: xuwei (O),
	will.deacon, james.morse, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution),
	Libeijian, Zhangxiquan, dingshuai, Guohanjun (Hanjun Guo),
	Liguozhu (Kenneth)

On Thu, Jun 28, 2018 at 04:08:24PM +0000, Wangxuefeng (E) wrote:
> Hi Mark,
>      Do you mean that DMB must ensure the completion of prior loads/stores
> or CMOs, and make sure the data is visible to all observers (whether Device or
> cacheable)? So DMB does more than just keep order?

Not quite -- DMB does not guarantee completion.

However, DMB must guarantee that loads/stores and CMOs are ordered on
the bus, all the way to the PoC.

Thanks,
Mark.

* KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
       [not found]                         ` <etPan.5b3507f7.914aa16.1d6b@localhost>
  2018-06-28 16:24                             ` Mark Rutland
@ 2018-06-29  8:47                           ` Marc Zyngier
  1 sibling, 0 replies; 79+ messages in thread
From: Marc Zyngier @ 2018-06-29  8:47 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 28 Jun 2018 20:18:13 +0100,
Zhangxiquan <zhangxiquan@hisilicon.com> wrote:
> 
> Hi Mark,
> 
> I am inclined to agree with you that DMB is enough to order DC and LDST.
> 
> Just want to know: has this code ever passed on any ARM
> implementation, such as A75, A72...?

This code has been tested on most ARM implementations, as this is what
we used to develop it. So far, you have the only example we know of
where this sequence doesn't work as expected.

More importantly, this piece of code is written to match the ARMv8
architecture requirements, and we can only expect implementations to
follow the exact same requirements.

Thanks,

	M.

-- 
Jazz is not dead, it just smell funny.

* Re: KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform.
  2018-06-28 16:24                             ` Mark Rutland
@ 2018-06-29  9:59                               ` Mark Rutland
  -1 siblings, 0 replies; 79+ messages in thread
From: Mark Rutland @ 2018-06-29  9:59 UTC (permalink / raw)
  To: Zhangxiquan
  Cc: Wangxuefeng (E), xuwei (O),
	will.deacon, james.morse, catalin.marinas, Linuxarm, Zhangyi ac,
	suzuki.poulose, marc.zyngier, Xiongfanggou (James),
	linux-arm-kernel, linux-kernel, dave.martin, Liyuan (Larry,
	Turing Solution), Libeijian, dingshuai, Guohanjun (Hanjun Guo),
	Liguozhu (Kenneth)

On Thu, Jun 28, 2018 at 07:24:30PM +0000, Zhangxiquan wrote:
> Do you think this ordering guarantee (between DC and ld/st) applies to
> cacheable memory only, or does it also apply to Device memory?

This also applies for device memory.

As I quoted previously, from ARM DDI 0487C.a page D3-2069:

  All data cache instructions, other than DC ZVA, that specify an
  address:

  * Can execute in any order relative to loads or stores that access any
    address with the Device memory attribute, or with Normal memory with
    Inner Non-cacheable attribute unless a DMB or DSB is executed
    between the instructions.

i.e. a DMB is sufficient to provide order between DC and loads/stores
which access device memory.

Thanks,
Mark.

Thread overview: 79+ messages
2018-06-20 14:18 KVM guest sometimes failed to boot because of kernel stack overflow if KPTI is enabled on a hisilicon ARM64 platform Wei Xu
2018-06-20 14:18 ` Wei Xu
2018-06-20 14:42 ` Will Deacon
2018-06-20 14:42   ` Will Deacon
2018-06-20 15:52   ` Wei Xu
2018-06-20 15:52     ` Wei Xu
2018-06-20 15:54     ` James Morse
2018-06-20 15:54       ` James Morse
2018-06-20 16:25       ` Wei Xu
2018-06-20 16:25         ` Wei Xu
2018-06-20 16:28         ` Will Deacon
2018-06-20 16:28           ` Will Deacon
2018-06-20 16:33           ` Wei Xu
2018-06-20 16:33             ` Wei Xu
2018-06-21  8:38         ` James Morse
2018-06-21  8:38           ` James Morse
2018-06-21  9:00           ` Marc Zyngier
2018-06-21  9:00             ` Marc Zyngier
2018-06-21  9:18           ` Will Deacon
2018-06-21  9:18             ` Will Deacon
2018-06-21 10:14             ` Wei Xu
2018-06-21 10:14               ` Wei Xu
2018-06-21 10:54               ` Will Deacon
2018-06-21 10:54                 ` Will Deacon
2018-06-22  8:33                 ` Wei Xu
2018-06-22  8:33                   ` Wei Xu
2018-06-22  9:23                   ` Will Deacon
2018-06-22  9:23                     ` Will Deacon
2018-06-22 10:45                     ` Wei Xu
2018-06-22 10:45                       ` Wei Xu
2018-06-22 11:16                       ` Will Deacon
2018-06-22 11:16                         ` Will Deacon
2018-06-22 13:18                         ` Wei Xu
2018-06-22 13:18                           ` Wei Xu
2018-06-22 13:31                           ` Will Deacon
2018-06-22 13:31                             ` Will Deacon
2018-06-22 13:46                             ` Wei Xu
2018-06-22 13:46                               ` Wei Xu
2018-06-22 14:43                               ` Will Deacon
2018-06-22 14:43                                 ` Will Deacon
2018-06-22 15:26                                 ` Wei Xu
2018-06-22 15:26                                   ` Wei Xu
2018-06-22 14:28                           ` Mark Rutland
2018-06-22 14:28                             ` Mark Rutland
2018-06-22 15:28                             ` Wei Xu
2018-06-22 15:28                               ` Wei Xu
2018-06-22 15:41                               ` Will Deacon
2018-06-22 15:41                                 ` Will Deacon
2018-06-22 16:02                                 ` Wei Xu
2018-06-22 16:02                                   ` Wei Xu
2018-06-21  9:20           ` Wei Xu
2018-06-21  9:20             ` Wei Xu
2018-06-26 17:16             ` Wei Xu
2018-06-26 17:16               ` Wei Xu
2018-06-26 17:47               ` Will Deacon
2018-06-26 17:47                 ` Will Deacon
2018-06-27  8:39                 ` James Morse
2018-06-27  8:39                   ` James Morse
2018-06-27 13:26                   ` Wei Xu
2018-06-27 13:26                     ` Wei Xu
2018-06-28  8:45                     ` James Morse
2018-06-28  8:45                       ` James Morse
2018-06-28 10:20                       ` Wei Xu
2018-06-28 10:20                         ` Wei Xu
2018-06-27 13:22                 ` Wei Xu
2018-06-27 13:22                   ` Wei Xu
2018-06-27 13:28                   ` Will Deacon
2018-06-27 13:28                     ` Will Deacon
2018-06-27 13:32                     ` Wei Xu
2018-06-27 13:32                       ` Wei Xu
2018-06-28 14:50                     ` Wei Xu
2018-06-28 14:50                       ` Wei Xu
2018-06-28 15:34                       ` Mark Rutland
2018-06-28 15:34                         ` Mark Rutland
     [not found]                         ` <etPan.5b3507f7.914aa16.1d6b@localhost>
2018-06-28 16:24                           ` Reply: " Mark Rutland
2018-06-28 16:24                             ` Mark Rutland
2018-06-29  9:59                             ` Mark Rutland
2018-06-29  9:59                               ` Mark Rutland
2018-06-29  8:47                           ` Marc Zyngier
