All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qian Cai <cai@lca.pw>
To: Bharata B Rao <bharata@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org, aneesh.kumar@linux.ibm.com,
	npiggin@gmail.com, mpe@ellerman.id.au,
	linux-kernel@vger.kernel.org, sfr@canb.auug.org.au,
	linux-next@vger.kernel.org
Subject: Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE
Date: Thu, 16 Jul 2020 13:27:14 -0400	[thread overview]
Message-ID: <20200716172713.GA4565@lca.pw> (raw)
In-Reply-To: <20200703053608.12884-1-bharata@linux.ibm.com>

On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
> Hypervisor may choose not to enable Guest Translation Shootdown Enable
> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
> permitted to use instructions like tblie and tlbsync directly, but is
> expected to make hypervisor calls to get the TLB flushed.
> 
> This series enables the TLB flush routines in the radix code to
> off-load TLB flushing to hypervisor via the newly proposed hcall
> H_RPT_INVALIDATE. 
> 
> To easily check the availability of GTSE, it is made an MMU feature.
> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
> handle GTSE as an optionally available feature and to not assume GTSE
> when radix support is available.
> 
> The actual hcall implementation for KVM isn't included in this
> patchset and will be posted separately.
> 
> Changes in v3
> =============
> - Fixed a bug in the hcall wrapper code where we were missing setting
>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
>   a full flush for the nested case.
> - s/psize_to_h_rpti/psize_to_rpti_pgsize
> 
> v2: https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bharata@linux.ibm.com/T/#t
> 
> Bharata B Rao (2):
>   powerpc/mm: Enable radix GTSE only if supported.
>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
>     enabled
> 
> Nicholas Piggin (1):
>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
>     !GTSE

Reverting the whole series fixed random memory corruptions during boot on
POWER9 PowerNV systems below.

IBM 8335-GTH (ibm,witherspoon)
POWER9, altivec supported
262144 MB memory, 2000 GB disk space

.config:
https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config

[    9.338996][  T925] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[    9.339026][  T925] Faulting instruction address: 0x00000000
[    9.339051][  T925] Oops: Kernel access of bad area, sig: 11 [#1]
[    9.339064][  T925] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV
[    9.339098][  T925] Modules linked in: dm_mirror dm_region_hash dm_log dm_mod
[    9.339150][  T925] CPU: 92 PID: 925 Comm: (md-udevd) Not tainted 5.8.0-rc5-next-20200716 #3
[    9.339186][  T925] NIP:  0000000000000000 LR: c00000000021f2cc CTR: 0000000000000000
[    9.339210][  T925] REGS: c000201cb52d79b0 TRAP: 0400   Not tainted  (5.8.0-rc5-next-20200716)
[    9.339244][  T925] MSR:  9000000040009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24222292  XER: 00000000
[    9.339278][  T925] CFAR: c00000000021f2c8 IRQMASK: 0 
[    9.339278][  T925] GPR00: c00000000021f2cc c000201cb52d7c40 c000000005901000 c000201cb52d7ca8 
[    9.339278][  T925] GPR04: c00800000ea60038 0000000000000000 000000007fff0000 000000007fff0000 
[    9.339278][  T925] GPR08: 0000000000000000 0000000000000000 c000201cb50bd500 0000000000000003 
[    9.339278][  T925] GPR12: 0000000000000000 c000201fff694500 00007fffa4a8a940 00007fffa4a8a6c8 
[    9.339278][  T925] GPR16: 00007fffa4a8a8f8 00007fffa4a8a650 00007fffa4a8a488 0000000000000000 
[    9.339278][  T925] GPR20: 0000000000050001 00007fffa4a8a984 000000007fff0000 c00000000a4545cc 
[    9.339278][  T925] GPR24: c000000000affe28 0000000000000000 0000000000000000 0000000000000166 
[    9.339278][  T925] GPR28: c000201cb52d7ca8 c00800000ea60000 c000201cc3b72600 000000007fff0000 
[    9.339493][  T925] NIP [0000000000000000] 0x0
[    9.339516][  T925] LR [c00000000021f2cc] __seccomp_filter+0xec/0x530
bpf_dispatcher_nop_func at include/linux/bpf.h:567
(inlined by) bpf_prog_run_pin_on_cpu at include/linux/filter.h:597
(inlined by) seccomp_run_filters at kernel/seccomp.c:324
(inlined by) __seccomp_filter at kernel/seccomp.c:937
[    9.339538][  T925] Call Trace:
[    9.339548][  T925] [c000201cb52d7c40] [c00000000021f2cc] __seccomp_filter+0xec/0x530 (unreliable)
[    9.339566][  T925] [c000201cb52d7d50] [c000000000025af8] do_syscall_trace_enter+0xb8/0x470
do_seccomp at arch/powerpc/kernel/ptrace/ptrace.c:252
(inlined by) do_syscall_trace_enter at arch/powerpc/kernel/ptrace/ptrace.c:327
[    9.339600][  T925] [c000201cb52d7dc0] [c00000000002c8f8] system_call_exception+0x138/0x180
[    9.339625][  T925] [c000201cb52d7e20] [c00000000000c9e8] system_call_common+0xe8/0x214
[    9.339648][  T925] Instruction dump:
[    9.339667][  T925] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    9.339706][  T925] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    9.339748][  T925] ---[ end trace d89eb80f9a6bc141 ]---
[  OK  ] Started Journal Service.
[   10.452364][  T925] Kernel panic - not syncing: Fatal exception
[   11.876655][  T925] ---[ end Kernel panic - not syncing: Fatal exception ]---

There could also be lots of random userspace segfault like,

[   16.463545][  T771] rngd[771]: segfault (11) at 0 nip 0 lr 0 code 1 in rngd[106d60000+20000]
[   16.463620][  T771] rngd[771]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[   16.463656][  T771] rngd[771]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Occasionally, there are many soft-lockups,

[  396.920702][   C99] watchdog: BUG: soft lockup - CPU#99 stuck for 22s! [(spawn):2692]
[  396.920754][   C99] Modules linked in: kvm_hv kvm ip_tables x_tables sd_mod bnx2x tg3 ahci mdio libahci libphy firmware_class libata dm_mirror dm_region_hash dm_log dm_mod
[  396.920843][   C99] irq event stamp: 1731717220
[  396.920860][   C99] hardirqs last  enabled at (1731717219): [<c00000000004d6f4>] do_page_fault+0x324/0xd90
[  396.920889][   C99] hardirqs last disabled at (1731717220): [<c000000000015638>] arch_local_irq_restore+0x48/0xd0
[  396.920919][   C99] softirqs last  enabled at (41260): [<c0000000009abbe8>] __do_softirq+0x648/0x8c8
[  396.920948][   C99] softirqs last disabled at (41125): [<c0000000000d717c>] irq_exit+0x15c/0x1c0
[  396.920976][   C99] CPU: 99 PID: 2692 Comm: (spawn) Tainted: G             L    5.8.0-rc5-next-20200716 #3
[  396.921001][   C99] NIP:  c0000000000152b4 LR: c000000000015640 CTR: 0000000000000000
[  396.921037][   C99] REGS: c000201cbc3d7178 TRAP: 0900   Tainted: G             L     (5.8.0-rc5-next-20200716)
[  396.921074][   C99] MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28022482  XER: 20040000
[  396.921122][   C99] CFAR: 0000000000000000 IRQMASK: 3 
[  396.921122][   C99] GPR00: c000000000015640 c000201cbc3d7340 c000000005901000 c000201cbc3d7178 
[  396.921122][   C99] GPR04: c0000000057d7280 0000000000000000 000000000002000a 0000000000000003 
[  396.921122][   C99] GPR08: 0000201cc61c0000 0000000000000000 0000000000000001 c00000000593f868 
[  396.921122][   C99] GPR12: 0000000000002000 c000201fff67e700 00007fffdcda3918 0000000139eeba60 
[  396.921122][   C99] GPR16: 0000000139f30130 00007fffdcda39c8 0000000139eea708 0000000000000000 
[  396.921122][   C99] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000008 
[  396.921122][   C99] GPR24: 0000000000000e60 0000000000000900 0000000000000500 0000000000000a00 
[  396.921122][   C99] GPR28: 0000000000000f00 0000000000000002 0000000000000003 c000201ca4212400 
[  396.921468][   C99] NIP [c0000000000152b4] replay_soft_interrupts+0x74/0x3b0
replay_soft_interrupts at arch/powerpc/kernel/irq.c:216
[  396.921504][   C99] LR [c000000000015640] arch_local_irq_restore+0x50/0xd0
arch_local_irq_restore at arch/powerpc/kernel/irq.c:375
[  396.921539][   C99] Call Trace:
[  396.921560][   C99] [c000201cbc3d7340] [c000000000015640] arch_local_irq_restore+0x50/0xd0 (unreliable)
[  396.921602][   C99] [c000201cbc3d7360] [c0000000009a0c68] lock_is_held_type+0xf8/0x180
[  396.921641][   C99] [c000201cbc3d73c0] [c0000000003e8cf0] mem_cgroup_from_task+0xa0/0x130
[  396.921666][   C99] [c000201cbc3d7400] [c000000000337950] handle_mm_fault+0x140/0x1d20
[  396.921703][   C99] [c000201cbc3d7500] [c00000000004d5ac] do_page_fault+0x1dc/0xd90
[  396.921763][   C99] [c000201cbc3d7600] [c00000000000c028] handle_page_fault+0x10/0x2c
[  396.921804][   C99] --- interrupt: 300 at futex_cleanup+0x3c0/0x740
[  396.921804][   C99]     LR = futex_cleanup+0x35c/0x740
[  396.921879][   C99] [c000201cbc3d79c0] [c0000000001df2e8] futex_exec_release+0x28/0x50
[  396.921929][   C99] [c000201cbc3d79f0] [c0000000000c5e54] exec_mm_release+0x24/0x50
[  396.921968][   C99] [c000201cbc3d7a30] [c000000000421e84] begin_new_exec+0x324/0xea0
[  396.922005][   C99] [c000201cbc3d7af0] [c0000000004d8f1c] load_elf_binary+0x7fc/0x1110
[  396.922042][   C99] [c000201cbc3d7bf0] [c000000000420824] exec_binprm+0x1c4/0x7d0
[  396.922079][   C99] [c000201cbc3d7cb0] [c000000000421540] do_execveat_common+0x710/0x960
[  396.922117][   C99] [c000201cbc3d7d90] [c000000000422a44] sys_execve+0x44/0x60
[  396.922156][   C99] [c000201cbc3d7dc0] [c00000000002c8b8] system_call_exception+0xf8/0x180
[  396.922205][   C99] [c000201cbc3d7e20] [c00000000000c9e8] system_call_common+0xe8/0x214
[  396.922253][   C99] Instruction dump:
[  396.922286][   C99] 3b000e60 3b400500 3b600a00 3b800f00 f8010010 f821fe11 38610028 e92d0c70 
[  396.922316][   C99] f9210198 39200000 8aed0989 48037df9 <60000000> 39200003 f9210160 56e90738


[  248.821138][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.821170][  T676] khugepaged      D28416   682      2 0x00000800 
[  248.821212][  T676] Call Trace: 
[  248.821241][  T676] [c000001ff310f4e0] [c000000000caeb80] lru_add_drain_work+0x0/0x48 (unreliable) 
[  248.821275][  T676] [c000001ff310f6c0] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.821308][  T676] [c000001ff310f720] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.821352][  T676] [c000001ff310f7f0] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.821387][  T676] [c000001ff310f820] [c0000000009a9814] schedule_timeout+0x304/0x520 
[  248.821432][  T676] [c000001ff310f960] [c0000000009a3c84] wait_for_completion+0xc4/0x1b0 
[  248.821460][  T676] [c000001ff310f9d0] [c0000000000fd0c8] __flush_work+0x3b8/0x770 
[  248.821491][  T676] [c000001ff310faf0] [c0000000002e0ac4] lru_add_drain_all+0x3e4/0x760 
[  248.821521][  T676] [c000001ff310fbf0] [c0000000003e0f18] khugepaged+0xd8/0x1770 
[  248.821560][  T676] [c000001ff310fdb0] [c0000000001095fc] kthread+0x1bc/0x1d0 
[  248.821611][  T676] [c000001ff310fe20] [c00000000000cbc4] ret_from_kernel_thread+0x5c/0x78 
[  248.821655][  T676] INFO: task kworker/56:1:719 blocked for more than 122 seconds. 
[  248.821689][  T676]       Tainted: G             L    5.8.0-rc5-next-20200716 #3 
[  248.821729][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.821779][  T676] kworker/56:1    D27584   719      2 0x00000800 
[  248.821839][  T676] Workqueue: rcu_gp wait_rcu_exp_gp 
[  248.821862][  T676] Call Trace: 
[  248.821888][  T676] [c000001ff2b57660] [c0000000057c9fb0] rcu_state+0x4fb0/0x5100 (unreliable) 
[  248.821934][  T676] [c000001ff2b57840] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.821977][  T676] [c000001ff2b578a0] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.822021][  T676] [c000001ff2b57970] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.822066][  T676] [c000001ff2b579a0] [c0000000009a970c] schedule_timeout+0x1fc/0x520 
[  248.822110][  T676] [c000001ff2b57ae0] [c0000000001a86d0] rcu_exp_wait_wake+0x1b0/0x950 
[  248.822153][  T676] [c000001ff2b57c30] [c0000000000fb754] process_one_work+0x304/0x900 
[  248.822197][  T676] [c000001ff2b57d20] [c0000000000fbdc8] worker_thread+0x78/0x520 
[  248.822242][  T676] [c000001ff2b57db0] [c0000000001095fc] kthread+0x1bc/0x1d0 
[  248.822279][  T676] [c000001ff2b57e20] [c00000000000cbc4] ret_from_kernel_thread+0x5c/0x78 
[  248.822385][  T676] INFO: task lvm:3123 blocked for more than 122 seconds. 
[  248.822413][  T676]       Tainted: G             L    5.8.0-rc5-next-20200716 #3 
[  248.822462][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.822503][  T676] lvm             D26608  3123      1 0x00040000 
[  248.822552][  T676] Call Trace: 
[  248.822576][  T676] [c000001feee27a00] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.822620][  T676] [c000001feee27a60] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.822648][  T676] [c000001feee27b30] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.822680][  T676] [c000001feee27b60] [c0000000009a9814] schedule_timeout+0x304/0x520 
[  248.822724][  T676] [c000001feee27ca0] [c0000000009a3c84] wait_for_completion+0xc4/0x1b0 
[  248.822768][  T676] [c000001feee27d10] [c0000000004b0e88] sys_io_destroy+0x238/0x2f0 
[  248.822808][  T676] [c000001feee27dc0] [c00000000002c8b8] system_call_exception+0xf8/0x180 
[  248.822840][  T676] [c000001feee27e20] [c00000000000c9e8] system_call_common+0xe8/0x214 
[  248.822873][  T676] INFO: task lvm:3126 blocked for more than 122 seconds. 
[  248.822901][  T676]       Tainted: G             L    5.8.0-rc5-next-20200716 #3 
[  248.822938][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.822987][  T676] lvm             D26608  3126      1 0x00040000 
[  248.823017][  T676] Call Trace: 
[  248.823031][  T676] [c000001fc0b27a00] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.823075][  T676] [c000001fc0b27a60] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.823113][  T676] [c000001fc0b27b30] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.823158][  T676] [c000001fc0b27b60] [c0000000009a9814] schedule_timeout+0x304/0x520 
[  248.823199][  T676] [c000001fc0b27ca0] [c0000000009a3c84] wait_for_completion+0xc4/0x1b0 
[  248.823250][  T676] [c000001fc0b27d10] [c0000000004b0e88] sys_io_destroy+0x238/0x2f0 
[  248.823294][  T676] [c000001fc0b27dc0] [c00000000002c8b8] system_call_exception+0xf8/0x180 
[  248.823332][  T676] [c000001fc0b27e20] [c00000000000c9e8] system_call_common+0xe8/0x214 
[  248.823374][  T676] INFO: task auditd:3163 blocked for more than 122 seconds. 
[  248.823424][  T676]       Tainted: G             L    5.8.0-rc5-next-20200716 #3 
[  248.823471][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.823512][  T676] auditd          D27088  3163      1 0x00042408 
[  248.823551][  T676] Call Trace: 
[  248.823583][  T676] [c000001fc080f760] [0000000200000001] 0x200000001 (unreliable) 
[  248.823648][  T676] [c000001fc080f940] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.823689][  T676] [c000001fc080f9a0] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.823742][  T676] [c000001fc080fa70] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.823784][  T676] [c000001fc080faa0] [c0000000001a9244] synchronize_rcu_expedited+0x394/0x600 
[  248.823837][  T676] [c000001fc080fba0] [c0000000004504c4] namespace_unlock+0xf4/0x230 
[  248.823881][  T676] [c000001fc080fc00] [c000000000456dec] put_mnt_ns+0x5c/0x80 
[  248.823926][  T676] [c000001fc080fc30] [c00000000010ba6c] free_nsproxy+0x2c/0x1e0 
[  248.823966][  T676] [c000001fc080fc60] [c0000000000d5130] do_exit+0x4e0/0xee0 
[  248.823997][  T676] [c000001fc080fd60] [c0000000000d5bec] do_group_exit+0x5c/0xd0 
[  248.824019][  T676] [c000001fc080fda0] [c0000000000d5c7c] sys_exit_group+0x1c/0x20 
[  248.824060][  T676] [c000001fc080fdc0] [c00000000002c8b8] system_call_exception+0xf8/0x180 
[  248.824103][  T676] [c000001fc080fe20] [c00000000000c9e8] system_call_common+0xe8/0x214 
[  248.824192][  T676]  
[  248.824192][  T676] Showing all locks held in the system: 
[  248.824419][  T676] 1 lock held by khungtaskd/676: 
[  248.824455][  T676]  #0: c0000000057c44c0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.29+0x8/0x30 
[  248.824531][  T676] 1 lock held by khugepaged/682: 
[  248.824565][  T676]  #0: c0000000057f42c8 (lock#4){+.+.}-{3:3}, at: lru_add_drain_all+0x68/0x760 
[  248.824679][  T676] 2 locks held by kworker/56:1/719: 
[  248.824742][  T676]  #0: c00000000bcc8938 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x21c/0x900 
[  248.824857][  T676]  #1: c000001ff2b57c90 ((work_completion)(&rew.rew_work)){+.+.}-{0:0}, at: process_one_work+0x21c/0x900 
[  248.825026][  T676] 3 locks held by (spawn)/2692: 
[  248.825077][  T676] 1 lock held by auditd/3163: 
[  248.825135][  T676]  #0: c0000000057c9ee8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x254/0x600 
[  248.825296][  T676] ============================================= 
[  248.825296][  T676]  

> 
>  .../include/asm/book3s/64/tlbflush-radix.h    | 15 ++++
>  arch/powerpc/include/asm/hvcall.h             | 34 +++++++-
>  arch/powerpc/include/asm/mmu.h                |  4 +
>  arch/powerpc/include/asm/plpar_wrappers.h     | 52 ++++++++++++
>  arch/powerpc/kernel/dt_cpu_ftrs.c             |  1 +
>  arch/powerpc/kernel/prom_init.c               | 13 +--
>  arch/powerpc/mm/book3s64/radix_tlb.c          | 82 +++++++++++++++++--
>  arch/powerpc/mm/init_64.c                     |  5 +-
>  arch/powerpc/platforms/pseries/lpar.c         |  8 +-
>  9 files changed, 197 insertions(+), 17 deletions(-)
> 
> -- 
> 2.21.3
> 

WARNING: multiple messages have this Message-ID (diff)
From: Qian Cai <cai@lca.pw>
To: Bharata B Rao <bharata@linux.ibm.com>
Cc: sfr@canb.auug.org.au, aneesh.kumar@linux.ibm.com,
	linux-kernel@vger.kernel.org, npiggin@gmail.com,
	linux-next@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE
Date: Thu, 16 Jul 2020 13:27:14 -0400	[thread overview]
Message-ID: <20200716172713.GA4565@lca.pw> (raw)
In-Reply-To: <20200703053608.12884-1-bharata@linux.ibm.com>

On Fri, Jul 03, 2020 at 11:06:05AM +0530, Bharata B Rao wrote:
> Hypervisor may choose not to enable Guest Translation Shootdown Enable
> (GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
> permitted to use instructions like tblie and tlbsync directly, but is
> expected to make hypervisor calls to get the TLB flushed.
> 
> This series enables the TLB flush routines in the radix code to
> off-load TLB flushing to hypervisor via the newly proposed hcall
> H_RPT_INVALIDATE. 
> 
> To easily check the availability of GTSE, it is made an MMU feature.
> The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
> handle GTSE as an optionally available feature and to not assume GTSE
> when radix support is available.
> 
> The actual hcall implementation for KVM isn't included in this
> patchset and will be posted separately.
> 
> Changes in v3
> =============
> - Fixed a bug in the hcall wrapper code where we were missing setting
>   H_RPTI_TYPE_NESTED while retrying the failed flush request with
>   a full flush for the nested case.
> - s/psize_to_h_rpti/psize_to_rpti_pgsize
> 
> v2: https://lore.kernel.org/linuxppc-dev/20200626131000.5207-1-bharata@linux.ibm.com/T/#t
> 
> Bharata B Rao (2):
>   powerpc/mm: Enable radix GTSE only if supported.
>   powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
>     enabled
> 
> Nicholas Piggin (1):
>   powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
>     !GTSE

Reverting the whole series fixed random memory corruptions during boot on
POWER9 PowerNV systems below.

IBM 8335-GTH (ibm,witherspoon)
POWER9, altivec supported
262144 MB memory, 2000 GB disk space

.config:
https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config

[    9.338996][  T925] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[    9.339026][  T925] Faulting instruction address: 0x00000000
[    9.339051][  T925] Oops: Kernel access of bad area, sig: 11 [#1]
[    9.339064][  T925] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 NUMA PowerNV
[    9.339098][  T925] Modules linked in: dm_mirror dm_region_hash dm_log dm_mod
[    9.339150][  T925] CPU: 92 PID: 925 Comm: (md-udevd) Not tainted 5.8.0-rc5-next-20200716 #3
[    9.339186][  T925] NIP:  0000000000000000 LR: c00000000021f2cc CTR: 0000000000000000
[    9.339210][  T925] REGS: c000201cb52d79b0 TRAP: 0400   Not tainted  (5.8.0-rc5-next-20200716)
[    9.339244][  T925] MSR:  9000000040009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24222292  XER: 00000000
[    9.339278][  T925] CFAR: c00000000021f2c8 IRQMASK: 0 
[    9.339278][  T925] GPR00: c00000000021f2cc c000201cb52d7c40 c000000005901000 c000201cb52d7ca8 
[    9.339278][  T925] GPR04: c00800000ea60038 0000000000000000 000000007fff0000 000000007fff0000 
[    9.339278][  T925] GPR08: 0000000000000000 0000000000000000 c000201cb50bd500 0000000000000003 
[    9.339278][  T925] GPR12: 0000000000000000 c000201fff694500 00007fffa4a8a940 00007fffa4a8a6c8 
[    9.339278][  T925] GPR16: 00007fffa4a8a8f8 00007fffa4a8a650 00007fffa4a8a488 0000000000000000 
[    9.339278][  T925] GPR20: 0000000000050001 00007fffa4a8a984 000000007fff0000 c00000000a4545cc 
[    9.339278][  T925] GPR24: c000000000affe28 0000000000000000 0000000000000000 0000000000000166 
[    9.339278][  T925] GPR28: c000201cb52d7ca8 c00800000ea60000 c000201cc3b72600 000000007fff0000 
[    9.339493][  T925] NIP [0000000000000000] 0x0
[    9.339516][  T925] LR [c00000000021f2cc] __seccomp_filter+0xec/0x530
bpf_dispatcher_nop_func at include/linux/bpf.h:567
(inlined by) bpf_prog_run_pin_on_cpu at include/linux/filter.h:597
(inlined by) seccomp_run_filters at kernel/seccomp.c:324
(inlined by) __seccomp_filter at kernel/seccomp.c:937
[    9.339538][  T925] Call Trace:
[    9.339548][  T925] [c000201cb52d7c40] [c00000000021f2cc] __seccomp_filter+0xec/0x530 (unreliable)
[    9.339566][  T925] [c000201cb52d7d50] [c000000000025af8] do_syscall_trace_enter+0xb8/0x470
do_seccomp at arch/powerpc/kernel/ptrace/ptrace.c:252
(inlined by) do_syscall_trace_enter at arch/powerpc/kernel/ptrace/ptrace.c:327
[    9.339600][  T925] [c000201cb52d7dc0] [c00000000002c8f8] system_call_exception+0x138/0x180
[    9.339625][  T925] [c000201cb52d7e20] [c00000000000c9e8] system_call_common+0xe8/0x214
[    9.339648][  T925] Instruction dump:
[    9.339667][  T925] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    9.339706][  T925] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[    9.339748][  T925] ---[ end trace d89eb80f9a6bc141 ]---
[  OK  ] Started Journal Service.
[   10.452364][  T925] Kernel panic - not syncing: Fatal exception
[   11.876655][  T925] ---[ end Kernel panic - not syncing: Fatal exception ]---

There could also be lots of random userspace segfault like,

[   16.463545][  T771] rngd[771]: segfault (11) at 0 nip 0 lr 0 code 1 in rngd[106d60000+20000]
[   16.463620][  T771] rngd[771]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX 
[   16.463656][  T771] rngd[771]: code: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Occasionally, there are many soft-lockups,

[  396.920702][   C99] watchdog: BUG: soft lockup - CPU#99 stuck for 22s! [(spawn):2692]
[  396.920754][   C99] Modules linked in: kvm_hv kvm ip_tables x_tables sd_mod bnx2x tg3 ahci mdio libahci libphy firmware_class libata dm_mirror dm_region_hash dm_log dm_mod
[  396.920843][   C99] irq event stamp: 1731717220
[  396.920860][   C99] hardirqs last  enabled at (1731717219): [<c00000000004d6f4>] do_page_fault+0x324/0xd90
[  396.920889][   C99] hardirqs last disabled at (1731717220): [<c000000000015638>] arch_local_irq_restore+0x48/0xd0
[  396.920919][   C99] softirqs last  enabled at (41260): [<c0000000009abbe8>] __do_softirq+0x648/0x8c8
[  396.920948][   C99] softirqs last disabled at (41125): [<c0000000000d717c>] irq_exit+0x15c/0x1c0
[  396.920976][   C99] CPU: 99 PID: 2692 Comm: (spawn) Tainted: G             L    5.8.0-rc5-next-20200716 #3
[  396.921001][   C99] NIP:  c0000000000152b4 LR: c000000000015640 CTR: 0000000000000000
[  396.921037][   C99] REGS: c000201cbc3d7178 TRAP: 0900   Tainted: G             L     (5.8.0-rc5-next-20200716)
[  396.921074][   C99] MSR:  9000000000001033 <SF,HV,ME,IR,DR,RI,LE>  CR: 28022482  XER: 20040000
[  396.921122][   C99] CFAR: 0000000000000000 IRQMASK: 3 
[  396.921122][   C99] GPR00: c000000000015640 c000201cbc3d7340 c000000005901000 c000201cbc3d7178 
[  396.921122][   C99] GPR04: c0000000057d7280 0000000000000000 000000000002000a 0000000000000003 
[  396.921122][   C99] GPR08: 0000201cc61c0000 0000000000000000 0000000000000001 c00000000593f868 
[  396.921122][   C99] GPR12: 0000000000002000 c000201fff67e700 00007fffdcda3918 0000000139eeba60 
[  396.921122][   C99] GPR16: 0000000139f30130 00007fffdcda39c8 0000000139eea708 0000000000000000 
[  396.921122][   C99] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000008 
[  396.921122][   C99] GPR24: 0000000000000e60 0000000000000900 0000000000000500 0000000000000a00 
[  396.921122][   C99] GPR28: 0000000000000f00 0000000000000002 0000000000000003 c000201ca4212400 
[  396.921468][   C99] NIP [c0000000000152b4] replay_soft_interrupts+0x74/0x3b0
replay_soft_interrupts at arch/powerpc/kernel/irq.c:216
[  396.921504][   C99] LR [c000000000015640] arch_local_irq_restore+0x50/0xd0
arch_local_irq_restore at arch/powerpc/kernel/irq.c:375
[  396.921539][   C99] Call Trace:
[  396.921560][   C99] [c000201cbc3d7340] [c000000000015640] arch_local_irq_restore+0x50/0xd0 (unreliable)
[  396.921602][   C99] [c000201cbc3d7360] [c0000000009a0c68] lock_is_held_type+0xf8/0x180
[  396.921641][   C99] [c000201cbc3d73c0] [c0000000003e8cf0] mem_cgroup_from_task+0xa0/0x130
[  396.921666][   C99] [c000201cbc3d7400] [c000000000337950] handle_mm_fault+0x140/0x1d20
[  396.921703][   C99] [c000201cbc3d7500] [c00000000004d5ac] do_page_fault+0x1dc/0xd90
[  396.921763][   C99] [c000201cbc3d7600] [c00000000000c028] handle_page_fault+0x10/0x2c
[  396.921804][   C99] --- interrupt: 300 at futex_cleanup+0x3c0/0x740
[  396.921804][   C99]     LR = futex_cleanup+0x35c/0x740
[  396.921879][   C99] [c000201cbc3d79c0] [c0000000001df2e8] futex_exec_release+0x28/0x50
[  396.921929][   C99] [c000201cbc3d79f0] [c0000000000c5e54] exec_mm_release+0x24/0x50
[  396.921968][   C99] [c000201cbc3d7a30] [c000000000421e84] begin_new_exec+0x324/0xea0
[  396.922005][   C99] [c000201cbc3d7af0] [c0000000004d8f1c] load_elf_binary+0x7fc/0x1110
[  396.922042][   C99] [c000201cbc3d7bf0] [c000000000420824] exec_binprm+0x1c4/0x7d0
[  396.922079][   C99] [c000201cbc3d7cb0] [c000000000421540] do_execveat_common+0x710/0x960
[  396.922117][   C99] [c000201cbc3d7d90] [c000000000422a44] sys_execve+0x44/0x60
[  396.922156][   C99] [c000201cbc3d7dc0] [c00000000002c8b8] system_call_exception+0xf8/0x180
[  396.922205][   C99] [c000201cbc3d7e20] [c00000000000c9e8] system_call_common+0xe8/0x214
[  396.922253][   C99] Instruction dump:
[  396.922286][   C99] 3b000e60 3b400500 3b600a00 3b800f00 f8010010 f821fe11 38610028 e92d0c70 
[  396.922316][   C99] f9210198 39200000 8aed0989 48037df9 <60000000> 39200003 f9210160 56e90738


[  248.821138][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.821170][  T676] khugepaged      D28416   682      2 0x00000800 
[  248.821212][  T676] Call Trace: 
[  248.821241][  T676] [c000001ff310f4e0] [c000000000caeb80] lru_add_drain_work+0x0/0x48 (unreliable) 
[  248.821275][  T676] [c000001ff310f6c0] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.821308][  T676] [c000001ff310f720] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.821352][  T676] [c000001ff310f7f0] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.821387][  T676] [c000001ff310f820] [c0000000009a9814] schedule_timeout+0x304/0x520 
[  248.821432][  T676] [c000001ff310f960] [c0000000009a3c84] wait_for_completion+0xc4/0x1b0 
[  248.821460][  T676] [c000001ff310f9d0] [c0000000000fd0c8] __flush_work+0x3b8/0x770 
[  248.821491][  T676] [c000001ff310faf0] [c0000000002e0ac4] lru_add_drain_all+0x3e4/0x760 
[  248.821521][  T676] [c000001ff310fbf0] [c0000000003e0f18] khugepaged+0xd8/0x1770 
[  248.821560][  T676] [c000001ff310fdb0] [c0000000001095fc] kthread+0x1bc/0x1d0 
[  248.821611][  T676] [c000001ff310fe20] [c00000000000cbc4] ret_from_kernel_thread+0x5c/0x78 
[  248.821655][  T676] INFO: task kworker/56:1:719 blocked for more than 122 seconds. 
[  248.821689][  T676]       Tainted: G             L    5.8.0-rc5-next-20200716 #3 
[  248.821729][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.821779][  T676] kworker/56:1    D27584   719      2 0x00000800 
[  248.821839][  T676] Workqueue: rcu_gp wait_rcu_exp_gp 
[  248.821862][  T676] Call Trace: 
[  248.821888][  T676] [c000001ff2b57660] [c0000000057c9fb0] rcu_state+0x4fb0/0x5100 (unreliable) 
[  248.821934][  T676] [c000001ff2b57840] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.821977][  T676] [c000001ff2b578a0] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.822021][  T676] [c000001ff2b57970] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.822066][  T676] [c000001ff2b579a0] [c0000000009a970c] schedule_timeout+0x1fc/0x520 
[  248.822110][  T676] [c000001ff2b57ae0] [c0000000001a86d0] rcu_exp_wait_wake+0x1b0/0x950 
[  248.822153][  T676] [c000001ff2b57c30] [c0000000000fb754] process_one_work+0x304/0x900 
[  248.822197][  T676] [c000001ff2b57d20] [c0000000000fbdc8] worker_thread+0x78/0x520 
[  248.822242][  T676] [c000001ff2b57db0] [c0000000001095fc] kthread+0x1bc/0x1d0 
[  248.822279][  T676] [c000001ff2b57e20] [c00000000000cbc4] ret_from_kernel_thread+0x5c/0x78 
[  248.822385][  T676] INFO: task lvm:3123 blocked for more than 122 seconds. 
[  248.822413][  T676]       Tainted: G             L    5.8.0-rc5-next-20200716 #3 
[  248.822462][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.822503][  T676] lvm             D26608  3123      1 0x00040000 
[  248.822552][  T676] Call Trace: 
[  248.822576][  T676] [c000001feee27a00] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.822620][  T676] [c000001feee27a60] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.822648][  T676] [c000001feee27b30] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.822680][  T676] [c000001feee27b60] [c0000000009a9814] schedule_timeout+0x304/0x520 
[  248.822724][  T676] [c000001feee27ca0] [c0000000009a3c84] wait_for_completion+0xc4/0x1b0 
[  248.822768][  T676] [c000001feee27d10] [c0000000004b0e88] sys_io_destroy+0x238/0x2f0 
[  248.822808][  T676] [c000001feee27dc0] [c00000000002c8b8] system_call_exception+0xf8/0x180 
[  248.822840][  T676] [c000001feee27e20] [c00000000000c9e8] system_call_common+0xe8/0x214 
[  248.822873][  T676] INFO: task lvm:3126 blocked for more than 122 seconds. 
[  248.822901][  T676]       Tainted: G             L    5.8.0-rc5-next-20200716 #3 
[  248.822938][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.822987][  T676] lvm             D26608  3126      1 0x00040000 
[  248.823017][  T676] Call Trace: 
[  248.823031][  T676] [c000001fc0b27a00] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.823075][  T676] [c000001fc0b27a60] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.823113][  T676] [c000001fc0b27b30] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.823158][  T676] [c000001fc0b27b60] [c0000000009a9814] schedule_timeout+0x304/0x520 
[  248.823199][  T676] [c000001fc0b27ca0] [c0000000009a3c84] wait_for_completion+0xc4/0x1b0 
[  248.823250][  T676] [c000001fc0b27d10] [c0000000004b0e88] sys_io_destroy+0x238/0x2f0 
[  248.823294][  T676] [c000001fc0b27dc0] [c00000000002c8b8] system_call_exception+0xf8/0x180 
[  248.823332][  T676] [c000001fc0b27e20] [c00000000000c9e8] system_call_common+0xe8/0x214 
[  248.823374][  T676] INFO: task auditd:3163 blocked for more than 122 seconds. 
[  248.823424][  T676]       Tainted: G             L    5.8.0-rc5-next-20200716 #3 
[  248.823471][  T676] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
[  248.823512][  T676] auditd          D27088  3163      1 0x00042408 
[  248.823551][  T676] Call Trace: 
[  248.823583][  T676] [c000001fc080f760] [0000000200000001] 0x200000001 (unreliable) 
[  248.823648][  T676] [c000001fc080f940] [c00000000001a2d0] __switch_to+0x260/0x380 
[  248.823689][  T676] [c000001fc080f9a0] [c0000000009a18b8] __schedule+0x398/0x9f0 
[  248.823742][  T676] [c000001fc080fa70] [c0000000009a1fa8] schedule+0x98/0x160 
[  248.823784][  T676] [c000001fc080faa0] [c0000000001a9244] synchronize_rcu_expedited+0x394/0x600 
[  248.823837][  T676] [c000001fc080fba0] [c0000000004504c4] namespace_unlock+0xf4/0x230 
[  248.823881][  T676] [c000001fc080fc00] [c000000000456dec] put_mnt_ns+0x5c/0x80 
[  248.823926][  T676] [c000001fc080fc30] [c00000000010ba6c] free_nsproxy+0x2c/0x1e0 
[  248.823966][  T676] [c000001fc080fc60] [c0000000000d5130] do_exit+0x4e0/0xee0 
[  248.823997][  T676] [c000001fc080fd60] [c0000000000d5bec] do_group_exit+0x5c/0xd0 
[  248.824019][  T676] [c000001fc080fda0] [c0000000000d5c7c] sys_exit_group+0x1c/0x20 
[  248.824060][  T676] [c000001fc080fdc0] [c00000000002c8b8] system_call_exception+0xf8/0x180 
[  248.824103][  T676] [c000001fc080fe20] [c00000000000c9e8] system_call_common+0xe8/0x214 
[  248.824192][  T676]  
[  248.824192][  T676] Showing all locks held in the system: 
[  248.824419][  T676] 1 lock held by khungtaskd/676: 
[  248.824455][  T676]  #0: c0000000057c44c0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.29+0x8/0x30 
[  248.824531][  T676] 1 lock held by khugepaged/682: 
[  248.824565][  T676]  #0: c0000000057f42c8 (lock#4){+.+.}-{3:3}, at: lru_add_drain_all+0x68/0x760 
[  248.824679][  T676] 2 locks held by kworker/56:1/719: 
[  248.824742][  T676]  #0: c00000000bcc8938 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x21c/0x900 
[  248.824857][  T676]  #1: c000001ff2b57c90 ((work_completion)(&rew.rew_work)){+.+.}-{0:0}, at: process_one_work+0x21c/0x900 
[  248.825026][  T676] 3 locks held by (spawn)/2692: 
[  248.825077][  T676] 1 lock held by auditd/3163: 
[  248.825135][  T676]  #0: c0000000057c9ee8 (rcu_state.exp_mutex){+.+.}-{3:3}, at: synchronize_rcu_expedited+0x254/0x600 
[  248.825296][  T676] ============================================= 
[  248.825296][  T676]  

> 
>  .../include/asm/book3s/64/tlbflush-radix.h    | 15 ++++
>  arch/powerpc/include/asm/hvcall.h             | 34 +++++++-
>  arch/powerpc/include/asm/mmu.h                |  4 +
>  arch/powerpc/include/asm/plpar_wrappers.h     | 52 ++++++++++++
>  arch/powerpc/kernel/dt_cpu_ftrs.c             |  1 +
>  arch/powerpc/kernel/prom_init.c               | 13 +--
>  arch/powerpc/mm/book3s64/radix_tlb.c          | 82 +++++++++++++++++--
>  arch/powerpc/mm/init_64.c                     |  5 +-
>  arch/powerpc/platforms/pseries/lpar.c         |  8 +-
>  9 files changed, 197 insertions(+), 17 deletions(-)
> 
> -- 
> 2.21.3
> 

  parent reply	other threads:[~2020-07-16 17:27 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-03  5:36 [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE Bharata B Rao
2020-07-03  5:36 ` [PATCH v3 1/3] powerpc/mm: Enable radix GTSE only if supported Bharata B Rao
2020-07-06  1:49   ` Santosh Sivaraj
2020-07-06  4:17     ` Bharata B Rao
2020-07-03  5:36 ` [PATCH v3 2/3] powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if enabled Bharata B Rao
2020-07-03  5:36 ` [PATCH v3 3/3] powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when !GTSE Bharata B Rao
2020-07-16 12:55 ` [PATCH v3 0/3] Off-load TLB invalidations to host for !GTSE Michael Ellerman
2020-07-16 17:27 ` Qian Cai [this message]
2020-07-16 17:27   ` Qian Cai
2020-07-16 23:09   ` Stephen Rothwell
2020-07-16 23:09     ` Stephen Rothwell
2020-07-17  2:08   ` Nicholas Piggin
2020-07-17  2:08     ` Nicholas Piggin
2020-07-17  2:44     ` Nicholas Piggin
2020-07-17  2:44       ` Nicholas Piggin
2020-07-17  4:28       ` Bharata B Rao
2020-07-17  4:28         ` Bharata B Rao
2020-07-18 12:40         ` Michael Ellerman
2020-07-18 12:40           ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200716172713.GA4565@lca.pw \
    --to=cai@lca.pw \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=bharata@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-next@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=sfr@canb.auug.org.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.