linux-next.vger.kernel.org archive mirror
* qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
From: Naresh Kamboju @ 2023-10-26 14:41 UTC
  To: Linux-Next Mailing List, open list, Linux ARM, lkft-triage
  Cc: Arnd Bergmann, Ard Biesheuvel, Thomas Gleixner, Ingo Molnar,
	Catalin Marinas, Anders Roxell, Dan Carpenter, LTP List,
	Petr Vorel

The following kernel crash was noticed on qemu-arm64 while running the LTP
syscalls set_robust_list test case on Linux next 6.6.0-rc7-next-20231026 and
6.6.0-rc7-next-20231025.

Bad: next-20231025
Good: next-20231024

Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

Log:
----
<1>[  203.119139] Unable to handle kernel unknown 43 at virtual address 0001ffff9e2e7d78
<1>[  203.119838] Mem abort info:
<1>[  203.120064]   ESR = 0x000000009793002b
<1>[  203.121040]   EC = 0x25: DABT (current EL), IL = 32 bits
set_robust_list01    1  TPASS  :  set_robust_list: retval = -1 (expected -1), errno = 22 (expected 22)
set_robust_list01    2  TPASS  :  set_robust_list: retval = 0 (expected 0), errno = 0 (expected 0)
<1>[  203.124496]   SET = 0, FnV = 0
<1>[  203.124778]   EA = 0, S1PTW = 0
<1>[  203.125029]   FSC = 0x2b: unknown 43
<1>[  203.126470] Data abort info:
<1>[  203.126710]   Access size = 4 byte(s)
<1>[  203.126969]   SSE = 0, SRT = 19
<1>[  203.127708]   SF = 0, AR = 0
<1>[  203.128213]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[  203.128788]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[  203.130416] user pgtable: 4k pages, 52-bit VAs, pgdp=000000010606a780
<1>[  203.130817] [0001ffff9e2e7d78] pgd=0000000000000000
<0>[  203.132603] Internal error: Oops: 000000009793002b [#1] PREEMPT SMP
<4>[  203.133483] Modules linked in: btrfs blake2b_generic libcrc32c xor xor_neon raid6_pq zstd_compress crct10dif_ce sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight dm_mod ip_tables x_tables
<4>[  203.135177] CPU: 1 PID: 653 Comm: set_robust_list Not tainted 6.6.0-rc7-next-20231026 #1
<4>[  203.135642] Hardware name: linux,dummy-virt (DT)
<4>[  203.136609] pstate: 83400009 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[ 203.137028] pc : handle_futex_death (kernel/futex/core.c:661 (discriminator 6))
<4>[ 203.138844] lr : handle_futex_death (arch/arm64/include/asm/uaccess.h:46 (discriminator 1) kernel/futex/core.c:661 (discriminator 1))
<4>[  203.139132] sp : ffff8000805c3c10
<4>[  203.139356] x29: ffff8000805c3c10 x28: 0000ffffbf187740 x27: d53bd04035000220
<4>[  203.140366] x26: 0000000000000000 x25: fff00000c6195280 x24: fff00000c6195280
<4>[  203.141055] x23: 0000000000000001 x22: ffffa4e6aeef09d0 x21: 0001ffff9e2e7d78
<4>[  203.141771] x20: 0001ffff9e2e7d78 x19: 0001ffff9e2e7d78 x18: ffff8000805c3cf8
<4>[  203.142457] x17: 0000000000000000 x16: ffffa4e6aeae7078 x15: 000000000000000a
<4>[  203.143134] x14: 0000000000000000 x13: 1ffe000018258661 x12: ffff8000805c3cf8
<4>[  203.143809] x11: 0000000000000000 x10: fff00000c12c3308 x9 : ffffa4e6ad0e5748
<4>[  203.144504] x8 : ffff8000805c3c38 x7 : 0000000000000000 x6 : 0000000000000001
<4>[  203.145186] x5 : 0000000000000000 x4 : fff00000c6195280 x3 : 0000000000000000
<4>[  203.145929] x2 : 0000000000000000 x1 : 000ffffffffffffc x0 : 0001ffff9e2e7d78
<4>[  203.147032] Call trace:
<4>[ 203.147254] handle_futex_death (kernel/futex/core.c:661 (discriminator 6))
<4>[ 203.147560] exit_robust_list (kernel/futex/core.c:828)
<4>[ 203.148348] futex_exit_release (kernel/futex/core.c:1035 (discriminator 1) kernel/futex/core.c:1131 (discriminator 1))
<4>[ 203.148891] exit_mm_release (kernel/fork.c:1657)
<4>[ 203.149669] do_exit (kernel/exit.c:541 kernel/exit.c:858)
<4>[ 203.149897] do_group_exit (kernel/exit.c:1002)
<4>[ 203.150209] __arm64_sys_exit_group (kernel/exit.c:1032)
<4>[ 203.150980] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:56)
<4>[ 203.151234] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2))
<4>[ 203.151999] do_el0_svc (arch/arm64/kernel/syscall.c:156)
<4>[ 203.152231] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:679)
<4>[ 203.152936] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
<4>[ 203.153518] el0t_64_sync (arch/arm64/kernel/entry.S:595)
<0>[ 203.154424] Code: d50323bf d65f03c0 9248fa93 52800002 (b8400a73)
All code
========
   0: d50323bf autiasp
   4: d65f03c0 ret
   8: 9248fa93 and x19, x20, #0xff7fffffffffffff
   c: 52800002 mov w2, #0x0                    // #0
  10:* b8400a73 ldtr w19, [x19] <-- trapping instruction

Code starting with the faulting instruction
===========================================
   0: b8400a73 ldtr w19, [x19]
<4>[  203.155308] ---[ end trace 0000000000000000 ]---
<1>[  203.156234] Fixing recursive fault but reboot is needed!
<3>[  203.157116] BUG: using smp_processor_id() in preemptible [00000000] code: set_robust_list/653
<4>[ 203.158116] caller is debug_smp_processor_id (lib/smp_processor_id.c:61)
<4>[  203.158983] CPU: 1 PID: 653 Comm: set_robust_list Tainted: G D            6.6.0-rc7-next-20231026 #1
<4>[  203.159451] Hardware name: linux,dummy-virt (DT)
<4>[  203.159990] Call trace:
<4>[ 203.160394] dump_backtrace (arch/arm64/kernel/stacktrace.c:235)
<4>[ 203.160625] show_stack (arch/arm64/kernel/stacktrace.c:242)
<4>[ 203.160854] dump_stack_lvl (lib/dump_stack.c:107)
<4>[ 203.161869] dump_stack (lib/dump_stack.c:114)
<4>[ 203.162093] check_preemption_disabled (arch/arm64/include/asm/current.h:19 arch/arm64/include/asm/preempt.h:54 lib/smp_processor_id.c:53)
<4>[ 203.162898] debug_smp_processor_id (lib/smp_processor_id.c:61)
<4>[ 203.163176] __schedule (kernel/sched/core.c:6578 (discriminator 1))
<4>[ 203.163894] do_task_dead (kernel/sched/core.c:6705)
<4>[ 203.164143] make_task_dead (arch/arm64/include/asm/atomic_ll_sc.h:95 (discriminator 3) arch/arm64/include/asm/atomic.h:49 (discriminator 3) include/linux/atomic/atomic-arch-fallback.h:747 (discriminator 3) include/linux/atomic/atomic-instrumented.h:253 (discriminator 3) include/linux/refcount.h:193 (discriminator 3) include/linux/refcount.h:250 (discriminator 3) include/linux/refcount.h:267 (discriminator 3) kernel/exit.c:979 (discriminator 3))
<4>[ 203.164871] die (arch/arm64/kernel/traps.c:239)
<4>[ 203.165093] die_kernel_fault (arch/arm64/mm/fault.c:321)
<4>[ 203.165905] do_mem_abort (arch/arm64/mm/fault.c:850)
<4>[ 203.166149] el1_abort (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:399)
<4>[ 203.166864] el1h_64_sync_handler (arch/arm64/kernel/entry-common.c:486)
<4>[ 203.167173] el1h_64_sync (arch/arm64/kernel/entry.S:590)
<4>[ 203.167824] handle_futex_death (kernel/futex/core.c:661 (discriminator 6))
<4>[ 203.168329] exit_robust_list (kernel/futex/core.c:828)
<4>[ 203.168829] futex_exit_release (kernel/futex/core.c:1035 (discriminator 1) kernel/futex/core.c:1131 (discriminator 1))
<4>[ 203.169375] exit_mm_release (kernel/fork.c:1657)
<4>[ 203.169884] do_exit (kernel/exit.c:541 kernel/exit.c:858)
<4>[ 203.170372] do_group_exit (kernel/exit.c:1002)
<4>[ 203.170857] __arm64_sys_exit_group (kernel/exit.c:1032)
<4>[ 203.171643] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:56)
<4>[ 203.172281] el0_svc_common.constprop.0 (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:144 (discriminator 2))
<4>[ 203.172815] do_el0_svc (arch/arm64/kernel/syscall.c:156)
<4>[ 203.173284] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:679)
<4>[ 203.173769] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:697)
<4>[ 203.174052] el0t_64_sync (arch/arm64/kernel/entry.S:595)



Links:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823098/suite/log-parser-test/test/check-kernel-bug/log
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823098/suite/log-parser-test/tests/
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20231026/testrun/20823050/suite/log-parser-test/tests/

--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
From: Mark Rutland @ 2023-10-26 15:30 UTC
  To: Naresh Kamboju, Ard Biesheuvel, Catalin Marinas, Will Deacon,
	Oliver Upton
  Cc: Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> <1>[  203.124496]   SET = 0, FnV = 0
> <1>[  203.124778]   EA = 0, S1PTW = 0
> <1>[  203.125029]   FSC = 0x2b: unknown 43

It looks like this is fallout from the LPA2 enablement.

According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:

	0b101011 When FEAT_LPA2 is implemented:
		 Translation fault, level -1.
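
As a cross-check of the decoding above, here is a minimal C sketch that pulls
the relevant fields out of the ESR value from the log. It assumes the usual
ESR_ELx layout for a data abort (EC in bits [31:26], IL in bit 25, SAS in
[23:22], SRT in [20:16], WnR in bit 6, DFSC in [5:0]). The helper names are
made up for illustration and are not the kernel's own macros.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not the kernel's ESR_ELx_* macros. */

/* Exception class, bits [31:26]. */
static inline unsigned int esr_ec(uint64_t esr)
{
	return (unsigned int)((esr >> 26) & 0x3f);
}

/* Instruction length bit, bit 25 (1 means a 32-bit instruction). */
static inline unsigned int esr_il(uint64_t esr)
{
	return (unsigned int)((esr >> 25) & 0x1);
}

/* Data abort access size in bytes, derived from SAS in bits [23:22]. */
static inline unsigned int esr_dabt_access_size(uint64_t esr)
{
	return 1u << ((esr >> 22) & 0x3);
}

/* Data abort register number (SRT), bits [20:16]. */
static inline unsigned int esr_dabt_srt(uint64_t esr)
{
	return (unsigned int)((esr >> 16) & 0x1f);
}

/* Write-not-read flag, bit 6. */
static inline unsigned int esr_dabt_wnr(uint64_t esr)
{
	return (unsigned int)((esr >> 6) & 0x1);
}

/* Fault status code (DFSC for data aborts), bits [5:0]. */
static inline unsigned int esr_fsc(uint64_t esr)
{
	return (unsigned int)(esr & 0x3f);
}
```

With the value from the log, esr_ec(0x9793002b) gives 0x25 and
esr_fsc(0x9793002b) gives 0x2b (43), which lines up with the "EC = 0x25" and
"FSC = 0x2b" lines in the report.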

It's triggered here by an LDTR in a get_user() on a bogus userspace address.
The exception is expected, and it's supposed to be handled via the exception
fixups, but the LPA2 patches didn't update the fault_info table entries for all
the level -1 faults, and so those all get handled by do_bad() and don't call
fixup_exception(), causing them to be fatal.
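
To make the failure mode concrete, the following is a toy model in C of a
fault table indexed by the fault status code. It is only a sketch under stated
assumptions: every toy_* name is hypothetical, the real handlers do far more
than return a flag, and "have_fixup" merely stands in for whether an exception
fixup exists for the faulting instruction. It illustrates why an entry left on
the do_bad()-style handler turns an expected get_user() fault into a fatal one.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_outcome { TOY_FIXED_UP, TOY_FATAL };

struct toy_fault_info {
	enum toy_outcome (*fn)(int have_fixup);
	const char *name;
};

/* Stand-in for do_bad(): never consults the exception fixups. */
static enum toy_outcome toy_do_bad(int have_fixup)
{
	(void)have_fixup;
	return TOY_FATAL;
}

/* Stand-in for a translation-fault handler that can fall back to a fixup. */
static enum toy_outcome toy_do_translation_fault(int have_fixup)
{
	return have_fixup ? TOY_FIXED_UP : TOY_FATAL;
}

static struct toy_fault_info toy_fault_table[64];

/* Fill the table; wire_level_minus1 selects whether entry 0x2b is handled. */
static void toy_fault_table_init(int wire_level_minus1)
{
	size_t i;

	for (i = 0; i < 64; i++) {
		toy_fault_table[i].fn = toy_do_bad;
		toy_fault_table[i].name = "unknown";
	}
	/* DFSC 0b000100..0b000111: translation faults, levels 0 to 3. */
	for (i = 4; i <= 7; i++) {
		toy_fault_table[i].fn = toy_do_translation_fault;
		toy_fault_table[i].name = "translation fault";
	}
	if (wire_level_minus1) {
		/* DFSC 0b101011: translation fault, level -1 (FEAT_LPA2). */
		toy_fault_table[0x2b].fn = toy_do_translation_fault;
		toy_fault_table[0x2b].name = "level -1 translation fault";
	}
}

/* Dispatch on the low six bits of the ESR. */
static enum toy_outcome toy_handle_abort(uint64_t esr, int have_fixup)
{
	return toy_fault_table[esr & 0x3f].fn(have_fixup);
}
```

In this model, calling toy_fault_table_init(0) and then
toy_handle_abort(0x9793002b, 1) yields TOY_FATAL even though a fixup exists;
after toy_fault_table_init(1) the same call yields TOY_FIXED_UP.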

It should be relatively simple to update the fault_info table for the level -1
faults, but given the other issues we're seeing I think it's probably worth
dropping the LPA2 patches for the moment.

Mark.

* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
From: Ard Biesheuvel @ 2023-10-26 15:39 UTC
  To: Mark Rutland
  Cc: Naresh Kamboju, Catalin Marinas, Will Deacon, Oliver Upton,
	Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

On Thu, 26 Oct 2023 at 17:30, Mark Rutland <mark.rutland@arm.com> wrote:
> It looks like this is fallout from the LPA2 enablement.
>
> According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
> 43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:
>
>         0b101011 When FEAT_LPA2 is implemented:
>                  Translation fault, level -1.
>
> It's triggered here by an LDTR in a get_user() on a bogus userspace address.
> The exception is expected, and it's supposed to be handled via the exception
> fixups, but the LPA2 patches didn't update the fault_info table entries for all
> the level -1 faults, and so those all get handled by do_bad() and don't call
> fixup_exception(), causing them to be fatal.
>
> It should be relatively simple to update the fault_info table for the level -1
> faults, but given the other issues we're seeing I think it's probably worth
> dropping the LPA2 patches for the moment.
>

Thanks for the analysis, Mark.

I agree that this should not be difficult to fix, but given the other
CI problems and identified loose ends, I am not going to object to
dropping this partially or entirely at this point. I'm sure everybody
will be thrilled to go over those 60 patches again after I rebase them
onto v6.7-rc1 :-)

* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
From: Naresh Kamboju @ 2023-10-27 10:57 UTC
  To: Ard Biesheuvel
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Oliver Upton,
	Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

On Thu, 26 Oct 2023 at 21:09, Ard Biesheuvel <ardb@kernel.org> wrote:
> Thanks for the analysis Mark.
>
> I agree that this should not be difficult to fix, but given the other
> CI problems and identified loose ends, I am not going to object to
> dropping this partially or entirely at this point. I'm sure everybody
> will be thrilled to go over those 60 patches again after I rebase them
> onto v6.7-rc1 :-)

I am happy to test any proposed fix patch.

- Naresh

* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
From: Ard Biesheuvel @ 2023-10-28  7:42 UTC
  To: Naresh Kamboju
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Oliver Upton,
	Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

On Fri, 27 Oct 2023 at 12:57, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> I am happy to test any proposed fix patch.
>

Thanks Naresh. Patch attached.

[-- Attachment #2: 0001-Add-missing-ESR-decoding-for-level-1-translation-fau.patch --]
[-- Type: text/x-patch, Size: 2659 bytes --]

From 0d3c9d39a4541f7c5dea5175adea2af63ec1b92d Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb@kernel.org>
Date: Sat, 28 Oct 2023 09:40:29 +0200
Subject: [PATCH] Add missing ESR decoding for level -1 translation faults

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/fault.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 460d799e1296..22318d56087d 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -791,7 +791,7 @@ static const struct fault_info fault_info[] = {
 	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous external abort"	},
 	{ do_tag_check_fault,	SIGSEGV, SEGV_MTESERR,	"synchronous tag check fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 18"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 19"			},
+	{ do_sea,		SIGKILL, SI_KERNEL,	"level -1 (translation table walk)"	},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 (translation table walk)"	},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 1 (translation table walk)"	},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 2 (translation table walk)"	},
@@ -799,7 +799,7 @@ static const struct fault_info fault_info[] = {
 	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 25"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 26"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 27"			},
+	{ do_sea,		SIGKILL, SI_KERNEL,	"level -1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
@@ -811,9 +811,9 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 36"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 37"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 38"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 39"			},
+	{ do_bad,		SIGKILL, SI_KERNEL,	"level -1 address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 40"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 41"			},
+	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level -1 translation fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 42"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 43"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 44"			},
-- 
2.39.2


* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
From: Naresh Kamboju @ 2023-10-30  8:07 UTC
  To: Ard Biesheuvel
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Oliver Upton,
	Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

On Sat, 28 Oct 2023 at 13:12, Ard Biesheuvel <ardb@kernel.org> wrote:
> Thanks Naresh. Patch attached.

This patch did not solve the reported problem.
Test log links,
 - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2XTP1lXcUUscT357YaAm2G1AhpS

- Naresh


* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
  2023-10-30  8:07         ` Naresh Kamboju
@ 2023-10-30  8:14           ` Ard Biesheuvel
  2023-10-30 11:50             ` Naresh Kamboju
  2023-10-31 16:27             ` Mark Rutland
  0 siblings, 2 replies; 11+ messages in thread
From: Ard Biesheuvel @ 2023-10-30  8:14 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Oliver Upton,
	Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

[-- Attachment #1: Type: text/plain, Size: 2405 bytes --]

On Mon, 30 Oct 2023 at 09:07, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> On Sat, 28 Oct 2023 at 13:12, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Fri, 27 Oct 2023 at 12:57, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > >
> > > On Thu, 26 Oct 2023 at 21:09, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > On Thu, 26 Oct 2023 at 17:30, Mark Rutland <mark.rutland@arm.com> wrote:
> > > > >
> > > > > On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> > > > > > Following kernel crash noticed on qemu-arm64 while running LTP syscalls
> > > > > > set_robust_list test case running Linux next 6.6.0-rc7-next-20231026 ...
> > > > > It looks like this is fallout from the LPA2 enablement.
> > > > >
> > > > > According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
> > > > > 43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:
> > > > >
> > > > >         0b101011 When FEAT_LPA2 is implemented:
> > > > >                  Translation fault, level -1.
> > > > >
> > > > > It's triggered here by an LDTR in a get_user() on a bogus userspace address.
> > > > > The exception is expected, and it's supposed to be handled via the exception
> > > > > fixups, but the LPA2 patches didn't update the fault_info table entries for all
> > > > > the level -1 faults, and so those all get handled by do_bad() and don't call
> > > > > fixup_exception(), causing them to be fatal.
> > > > >
> > > > > It should be relatively simple to update the fault_info table for the level -1
> > > > > faults, but given the other issues we're seeing I think it's probably worth
> > > > > dropping the LPA2 patches for the moment.
> > > > >
> > > >
> > > > Thanks for the analysis Mark.
> > > >
> > > > I agree that this should not be difficult to fix, but given the other
> > > > CI problems and identified loose ends, I am not going to object to
> > > > dropping this partially or entirely at this point. I'm sure everybody
> > > > will be thrilled to go over those 60 patches again after I rebase them
> > > > onto v6.7-rc1 :-)
> > >
> > > I am happy to test any proposed fix patch.
> > >
> >
> > Thanks Naresh. Patch attached.
>
> This patch did not solve the reported problem.
> Test log links,
>  - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2XTP1lXcUUscT357YaAm2G1AhpS
>

Oops, sorry about that.

Fixed patch attached.

[-- Attachment #2: v2-0001-Add-missing-ESR-decoding-for-level-1-translation-.patch --]
[-- Type: text/x-patch, Size: 3600 bytes --]

From 97dea432bceadfcece84484609374c277afc2c81 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb@kernel.org>
Date: Sat, 28 Oct 2023 09:40:29 +0200
Subject: [PATCH v2] Add missing ESR decoding for level -1 translation faults

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/fault.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 2e5d1e238af9..13f192691060 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -780,18 +780,18 @@ static const struct fault_info fault_info[] = {
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 1 translation fault"	},
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 2 translation fault"	},
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 3 translation fault"	},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 8"			},
+	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 0 access flag fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 access flag fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 access flag fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 access flag fault"	},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 12"			},
+	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 0 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
 	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
 	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous external abort"	},
 	{ do_tag_check_fault,	SIGSEGV, SEGV_MTESERR,	"synchronous tag check fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 18"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 19"			},
+	{ do_sea,		SIGKILL, SI_KERNEL,	"level -1 (translation table walk)"	},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 (translation table walk)"	},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 1 (translation table walk)"	},
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 2 (translation table walk)"	},
@@ -799,7 +799,7 @@ static const struct fault_info fault_info[] = {
 	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 25"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 26"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 27"			},
+	{ do_sea,		SIGKILL, SI_KERNEL,	"level -1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
 	{ do_sea,		SIGKILL, SI_KERNEL,	"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
@@ -813,9 +813,9 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 38"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 39"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 40"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 41"			},
+	{ do_bad,		SIGKILL, SI_KERNEL,	"level -1 address size fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 42"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 43"			},
+	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level -1 translation fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 44"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 45"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 46"			},
-- 
2.42.0.820.g83a721a137-goog
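
For reference, the fault_info[] table patched above is indexed by the fault
status code held in the low six bits of the ESR. Below is a minimal standalone
sketch (not kernel code) that splits the ESR value from the original report,
0x9793002b, into its EC, IL and FSC fields. The helper and macro names are made
up for illustration; the bit positions (EC in ESR[31:26], IL in ESR[25], FSC in
ESR[5:0]) follow the Arm architecture's ESR_ELx layout.

```c
#include <stdint.h>

/* Illustrative names only; these are not the kernel's macros. */
#define DEMO_ESR_EC_SHIFT  26      /* EC lives in ESR[31:26] */
#define DEMO_ESR_EC_MASK   0x3fu
#define DEMO_ESR_IL_SHIFT  25      /* IL is ESR[25] */
#define DEMO_ESR_FSC_MASK  0x3fu   /* xFSC lives in ESR[5:0] */

/* Exception class: 0x25 is a data abort taken from the current EL. */
static unsigned int esr_ec(uint64_t esr)
{
	return (unsigned int)((esr >> DEMO_ESR_EC_SHIFT) & DEMO_ESR_EC_MASK);
}

/* Instruction length bit: 1 means a 32-bit instruction. */
static unsigned int esr_il(uint64_t esr)
{
	return (unsigned int)((esr >> DEMO_ESR_IL_SHIFT) & 1u);
}

/* Fault status code: this value is the index into a 64-entry table. */
static unsigned int esr_fsc(uint64_t esr)
{
	return (unsigned int)(esr & DEMO_ESR_FSC_MASK);
}
```

With the value from the log, esr_ec(0x9793002b) gives 0x25 and
esr_fsc(0x9793002b) gives 0x2b (decimal 43). That matches the "EC = 0x25" and
"FSC = 0x2b: unknown 43" lines in the report, and 43 is the slot that the last
hunk changes to "level -1 translation fault".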



* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
  2023-10-30  8:14           ` Ard Biesheuvel
@ 2023-10-30 11:50             ` Naresh Kamboju
  2023-10-31  7:43               ` Naresh Kamboju
  2023-10-31 16:27             ` Mark Rutland
  1 sibling, 1 reply; 11+ messages in thread
From: Naresh Kamboju @ 2023-10-30 11:50 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Oliver Upton,
	Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

[-- Attachment #1: Type: text/plain, Size: 2653 bytes --]

On Mon, 30 Oct 2023 at 13:45, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 30 Oct 2023 at 09:07, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> >
> > On Sat, 28 Oct 2023 at 13:12, Ard Biesheuvel <ardb@kernel.org> wrote:
> > >
> > > On Fri, 27 Oct 2023 at 12:57, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > > >
> > > > On Thu, 26 Oct 2023 at 21:09, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > >
> > > > > On Thu, 26 Oct 2023 at 17:30, Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > >
> > > > > > On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> > > > > > > Following kernel crash noticed on qemu-arm64 while running LTP syscalls
> > > > > > > set_robust_list test case running Linux next 6.6.0-rc7-next-20231026 ...
> > > > > > It looks like this is fallout from the LPA2 enablement.
> > > > > >
> > > > > > According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
> > > > > > 43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:
> > > > > >
> > > > > >         0b101011 When FEAT_LPA2 is implemented:
> > > > > >                  Translation fault, level -1.
> > > > > >
> > > > > > It's triggered here by an LDTR in a get_user() on a bogus userspace address.
> > > > > > The exception is expected, and it's supposed to be handled via the exception
> > > > > > fixups, but the LPA2 patches didn't update the fault_info table entries for all
> > > > > > the level -1 faults, and so those all get handled by do_bad() and don't call
> > > > > > fixup_exception(), causing them to be fatal.
> > > > > >
> > > > > > It should be relatively simple to update the fault_info table for the level -1
> > > > > > faults, but given the other issues we're seeing I think it's probably worth
> > > > > > dropping the LPA2 patches for the moment.
> > > > > >
> > > > >
> > > > > Thanks for the analysis Mark.
> > > > >
> > > > > I agree that this should not be difficult to fix, but given the other
> > > > > CI problems and identified loose ends, I am not going to object to
> > > > > dropping this partially or entirely at this point. I'm sure everybody
> > > > > will be thrilled to go over those 60 patches again after I rebase them
> > > > > onto v6.7-rc1 :-)
> > > >
> > > > I am happy to test any proposed fix patch.
> > > >
> > >
> > > Thanks Naresh. Patch attached.
> >
> > This patch did not solve the reported problem.
> > Test log links,
> >  - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2XTP1lXcUUscT357YaAm2G1AhpS
> >
>
> Oops, sorry about that.
>
> Fixed patch attached.

Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>

- Naresh

[-- Attachment #2: v2-0001-Add-missing-ESR-decoding-for-level-1-translation-.patch --]
[-- Type: application/x-patch, Size: 3600 bytes --]


* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
  2023-10-30 11:50             ` Naresh Kamboju
@ 2023-10-31  7:43               ` Naresh Kamboju
  0 siblings, 0 replies; 11+ messages in thread
From: Naresh Kamboju @ 2023-10-31  7:43 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Mark Rutland, Catalin Marinas, Will Deacon, Oliver Upton,
	Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

[-- Attachment #1: Type: text/plain, Size: 2936 bytes --]

Hi Ard,

Your V2 patch works perfectly.
Thanks for providing a fix patch.

- Naresh

On Mon, 30 Oct 2023 at 17:20, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> On Mon, 30 Oct 2023 at 13:45, Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > On Mon, 30 Oct 2023 at 09:07, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > >
> > > On Sat, 28 Oct 2023 at 13:12, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > >
> > > > On Fri, 27 Oct 2023 at 12:57, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
> > > > >
> > > > > On Thu, 26 Oct 2023 at 21:09, Ard Biesheuvel <ardb@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, 26 Oct 2023 at 17:30, Mark Rutland <mark.rutland@arm.com> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> > > > > > > > Following kernel crash noticed on qemu-arm64 while running LTP syscalls
> > > > > > > > set_robust_list test case running Linux next 6.6.0-rc7-next-20231026 ...
> > > > > > > It looks like this is fallout from the LPA2 enablement.
> > > > > > >
> > > > > > > According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
> > > > > > > 43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:
> > > > > > >
> > > > > > >         0b101011 When FEAT_LPA2 is implemented:
> > > > > > >                  Translation fault, level -1.
> > > > > > >
> > > > > > > It's triggered here by an LDTR in a get_user() on a bogus userspace address.
> > > > > > > The exception is expected, and it's supposed to be handled via the exception
> > > > > > > fixups, but the LPA2 patches didn't update the fault_info table entries for all
> > > > > > > the level -1 faults, and so those all get handled by do_bad() and don't call
> > > > > > > fixup_exception(), causing them to be fatal.
> > > > > > >
> > > > > > > It should be relatively simple to update the fault_info table for the level -1
> > > > > > > faults, but given the other issues we're seeing I think it's probably worth
> > > > > > > dropping the LPA2 patches for the moment.
> > > > > > >
> > > > > >
> > > > > > Thanks for the analysis Mark.
> > > > > >
> > > > > > I agree that this should not be difficult to fix, but given the other
> > > > > > CI problems and identified loose ends, I am not going to object to
> > > > > > dropping this partially or entirely at this point. I'm sure everybody
> > > > > > will be thrilled to go over those 60 patches again after I rebase them
> > > > > > onto v6.7-rc1 :-)
> > > > >
> > > > > I am happy to test any proposed fix patch.
> > > > >
> > > >
> > > > Thanks Naresh. Patch attached.
> > >
> > > This patch did not solve the reported problem.
> > > Test log links,
> > >  - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2XTP1lXcUUscT357YaAm2G1AhpS
> > >
> >
> > Oops, sorry about that.
> >
> > Fixed patch attached.
>
> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
>
> - Naresh

[-- Attachment #2: v2-0001-Add-missing-ESR-decoding-for-level-1-translation-.patch --]
[-- Type: application/x-patch, Size: 3600 bytes --]


* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
  2023-10-30  8:14           ` Ard Biesheuvel
  2023-10-30 11:50             ` Naresh Kamboju
@ 2023-10-31 16:27             ` Mark Rutland
  1 sibling, 0 replies; 11+ messages in thread
From: Mark Rutland @ 2023-10-31 16:27 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Naresh Kamboju, Catalin Marinas, Will Deacon, Oliver Upton,
	Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

On Mon, Oct 30, 2023 at 09:14:56AM +0100, Ard Biesheuvel wrote:
> From 97dea432bceadfcece84484609374c277afc2c81 Mon Sep 17 00:00:00 2001
> From: Ard Biesheuvel <ardb@kernel.org>
> Date: Sat, 28 Oct 2023 09:40:29 +0200
> Subject: [PATCH v2] Add missing ESR decoding for level -1 translation faults
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

As a heads-up, looking at this some more we'll also need to rework the usage
of ESR_ELx_FSC_TYPE and ESR_ELx_FSC_LEVEL, since those no longer work correctly
for the Level -1 xFSC values. ESR_ELx_FSC_TYPE is 0x3c and ESR_ELx_FSC_LEVEL is
0x3, and they work on the basis that the xFSC fault types are encoded as
xxxxyy, where the xxxx is the type and the yy is the level (0 to 3).

That didn't expand naturally to level -1. For example, Level {0,1,2,3}
translation faults get reported as 0b0001xx, where the xx encodes the level,
while Level -1 translation faults get reported as 0b101011.

That ends up affecting:

* All the is_${FOO}_fault() predicate functions, e.g. is_translation_fault(),
  is_el1_permission_fault() and is_spurious_el1_translation_fault().

* Places where we synthesize an xFSC value, e.g. set_thread_esr()

* A bunch of KVM code, due to the use of kvm_vcpu_trap_get_fault_type()

... and we probably need to remove ESR_ELx_FSC_TYPE and ESR_ELx_FSC_LEVEL
entirely to avoid the possibility of misuse.
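
A small standalone sketch of the problem described above, assuming only the two
mask values and the encodings quoted in this message (0b0001xx for level 0 to 3
translation faults, 0b101011 for level -1). The macro and function names below
are hypothetical and are not the kernel's helpers; they only illustrate why a
type/level mask split misclassifies the level -1 code and what an explicit
check could look like.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the message; names are illustrative stand-ins. */
#define DEMO_FSC_TYPE_MASK   0x3cu  /* the "xxxx" bits of xxxxyy */
#define DEMO_FSC_LEVEL_MASK  0x03u  /* the "yy" bits of xxxxyy */
#define DEMO_FSC_TYPE_TRANS  0x04u  /* 0b0001xx: translation fault, level 0..3 */
#define DEMO_FSC_TRANS_LM1   0x2bu  /* 0b101011: translation fault, level -1 */

/* Mask-based predicate: only recognises the 0b0001xx pattern. */
static bool is_translation_fault_masked(uint32_t fsc)
{
	return (fsc & DEMO_FSC_TYPE_MASK) == DEMO_FSC_TYPE_TRANS;
}

/* Explicit predicate: additionally accepts the level -1 encoding. */
static bool is_translation_fault_explicit(uint32_t fsc)
{
	return fsc == DEMO_FSC_TRANS_LM1 || is_translation_fault_masked(fsc);
}

/*
 * Level of a translation fault. Only meaningful when
 * is_translation_fault_explicit(fsc) is true.
 */
static int translation_fault_level(uint32_t fsc)
{
	if (fsc == DEMO_FSC_TRANS_LM1)
		return -1;
	return (int)(fsc & DEMO_FSC_LEVEL_MASK);
}
```

For the level -1 code, (0x2b & 0x3c) is 0x28 rather than the 0x04 translation
fault type, and (0x2b & 0x3) is 3, so the masked predicate rejects it and a
masked level read would report level 3 instead of -1.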

Mark.

> ---
>  arch/arm64/mm/fault.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 2e5d1e238af9..13f192691060 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -780,18 +780,18 @@ static const struct fault_info fault_info[] = {
>  	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 1 translation fault"	},
>  	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 2 translation fault"	},
>  	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 3 translation fault"	},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 8"			},
> +	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 0 access flag fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 access flag fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 access flag fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 access flag fault"	},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 12"			},
> +	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 0 permission fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
>  	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
>  	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous external abort"	},
>  	{ do_tag_check_fault,	SIGSEGV, SEGV_MTESERR,	"synchronous tag check fault"	},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 18"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 19"			},
> +	{ do_sea,		SIGKILL, SI_KERNEL,	"level -1 (translation table walk)"	},
>  	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 (translation table walk)"	},
>  	{ do_sea,		SIGKILL, SI_KERNEL,	"level 1 (translation table walk)"	},
>  	{ do_sea,		SIGKILL, SI_KERNEL,	"level 2 (translation table walk)"	},
> @@ -799,7 +799,7 @@ static const struct fault_info fault_info[] = {
>  	{ do_sea,		SIGBUS,  BUS_OBJERR,	"synchronous parity or ECC error" },	// Reserved when RAS is implemented
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 25"			},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 26"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 27"			},
> +	{ do_sea,		SIGKILL, SI_KERNEL,	"level -1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>  	{ do_sea,		SIGKILL, SI_KERNEL,	"level 0 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>  	{ do_sea,		SIGKILL, SI_KERNEL,	"level 1 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
>  	{ do_sea,		SIGKILL, SI_KERNEL,	"level 2 synchronous parity error (translation table walk)"	},	// Reserved when RAS is implemented
> @@ -813,9 +813,9 @@ static const struct fault_info fault_info[] = {
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 38"			},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 39"			},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 40"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 41"			},
> +	{ do_bad,		SIGKILL, SI_KERNEL,	"level -1 address size fault"	},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 42"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 43"			},
> +	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level -1 translation fault"	},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 44"			},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 45"			},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 46"			},
> -- 
> 2.42.0.820.g83a721a137-goog
> 



* Re: qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address
  2023-10-26 15:39   ` Ard Biesheuvel
  2023-10-27 10:57     ` Naresh Kamboju
@ 2023-10-31 16:32     ` Mark Rutland
  1 sibling, 0 replies; 11+ messages in thread
From: Mark Rutland @ 2023-10-31 16:32 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Naresh Kamboju, Catalin Marinas, Will Deacon, Oliver Upton,
	Linux-Next Mailing List, open list, Linux ARM, lkft-triage,
	Arnd Bergmann, Thomas Gleixner, Ingo Molnar, Anders Roxell,
	Dan Carpenter, LTP List, Petr Vorel

On Thu, Oct 26, 2023 at 05:39:11PM +0200, Ard Biesheuvel wrote:
> On Thu, 26 Oct 2023 at 17:30, Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > On Thu, Oct 26, 2023 at 08:11:26PM +0530, Naresh Kamboju wrote:
> > > Following kernel crash noticed on qemu-arm64 while running LTP syscalls
> > > set_robust_list test case running Linux next 6.6.0-rc7-next-20231026 and
> > > 6.6.0-rc7-next-20231025.
> > >
> > > BAD: next-20231025
> > > Good: next-20231024
> > >
> > > Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
> > > Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> > >
> > > Log:
> > > ----
> > > <1>[  203.119139] Unable to handle kernel unknown 43 at virtual
> > > address 0001ffff9e2e7d78
> > > <1>[  203.119838] Mem abort info:
> > > <1>[  203.120064]   ESR = 0x000000009793002b
> > > <1>[  203.121040]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > set_robust_list01    1  TPASS  :  set_robust_list: retval = -1
> > > (expected -1), errno = 22 (expected 22)
> > > set_robust_list01    2  TPASS  :  set_robust_list: retval = 0
> > > (expected 0), errno = 0 (expected 0)
> > > <1>[  203.124496]   SET = 0, FnV = 0
> > > <1>[  203.124778]   EA = 0, S1PTW = 0
> > > <1>[  203.125029]   FSC = 0x2b: unknown 43
> >
> > It looks like this is fallout from the LPA2 enablement.
> >
> > According to the latest ARM ARM (ARM DDI 0487J.a), page D19-6475, that "unknown
> > 43" (0x2b / 0b101011) is the DFSC for a level -1 translation fault:
> >
> >         0b101011 When FEAT_LPA2 is implemented:
> >                  Translation fault, level -1.
> >
> > It's triggered here by an LDTR in a get_user() on a bogus userspace address.
> > The exception is expected, and it's supposed to be handled via the exception
> > fixups, but the LPA2 patches didn't update the fault_info table entries for all
> > the level -1 faults, and so those all get handled by do_bad() and don't call
> > fixup_exception(), causing them to be fatal.
> >
> > It should be relatively simple to update the fault_info table for the level -1
> > faults, but given the other issues we're seeing I think it's probably worth
> > dropping the LPA2 patches for the moment.
> >
> 
> Thanks for the analysis Mark.
> 
> I agree that this should not be difficult to fix, but given the other
> CI problems and identified loose ends, I am not going to object to
> dropping this partially or entirely at this point. I'm sure everybody
> will be thrilled to go over those 60 patches again after I rebase them
> onto v6.7-rc1 :-)

FWIW, I'm more than happy to try; the issue has largely been finding the time.
Hopefully that'll be a bit easier after LPC!

Mark.


end of thread, other threads:[~2023-10-31 16:35 UTC | newest]

Thread overview: 11+ messages
2023-10-26 14:41 qemu-arm64: handle_futex_death - kernel/futex/core.c:661 - Unable to handle kernel unknown 43 at virtual address Naresh Kamboju
2023-10-26 15:30 ` Mark Rutland
2023-10-26 15:39   ` Ard Biesheuvel
2023-10-27 10:57     ` Naresh Kamboju
2023-10-28  7:42       ` Ard Biesheuvel
2023-10-30  8:07         ` Naresh Kamboju
2023-10-30  8:14           ` Ard Biesheuvel
2023-10-30 11:50             ` Naresh Kamboju
2023-10-31  7:43               ` Naresh Kamboju
2023-10-31 16:27             ` Mark Rutland
2023-10-31 16:32     ` Mark Rutland
