* [Possible BUG] arm64: efi: efi_runtime_fixup_exception() and efi_call_virt_check_flags() both taint the kernel
From: Alexandru Elisei @ 2022-11-07 17:27 UTC
To: ardb, catalin.marinas, will, linux-efi, linux-arm-kernel, linux-kernel

I'm going to preface this by saying that I'm extremely unfamiliar with the
EFI code.

Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
possible") skipped the call to SetVirtualAddressMap() for certain
configurations, and that started causing kernel panics on an Ampere Altra
machine due to an EFI synchronous exception.

Commit 23715a26c8d8 ("arm64: efi: Recover from synchronous exceptions
occurring in firmware") made the EFI exception non-fatal.

With a kernel built from v6.1-rc4 (which has both patches), I'm now getting
two splats on the same Altra machine (log below). Looks to me like the
second splat is caused by efi_call_virt_check_flags() using the
PSTATE.{I,F} values from when taking the exception. Shouldn't
efi_runtime_fixup_exception() fix up the exception so the error isn't
propagated along the call chain?

I'm asking this because efi_runtime_fixup_exception() has this add_taint()
call:

	pr_err(FW_BUG "Unable to handle %s in EFI runtime service\n", msg);
	add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);

and then efi_call_virt_check_flags() has this call:

	mismatch = flags ^ cur_flags;
	if (!WARN_ON_ONCE(mismatch & ARCH_EFI_IRQ_FLAGS_MASK))
		return;

	add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_NOW_UNRELIABLE);

It looks to me like LOCKDEP_STILL_OK from the first call is at odds with
LOCKDEP_NOW_UNRELIABLE from the second add_taint() call.

Here is the relevant part of the log (I can send the .config, kernel
command line and full log, or any other information that might be needed):

[ 55.479519] [Firmware Bug]: Unable to handle paging request in EFI runtime service
[ 55.487122] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60
[ 55.487128] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 55.487131] Workqueue: efi_rts_wq efi_call_rts
[ 55.487158] Call trace:
[ 55.487161] dump_backtrace.part.0+0xdc/0xf0
[ 55.487177] show_stack+0x18/0x40
[ 55.487180] dump_stack_lvl+0x68/0x84
[ 55.487190] dump_stack+0x18/0x34
[ 55.487192] efi_runtime_fixup_exception+0x74/0x88
[ 55.487199] __do_kernel_fault+0x108/0x1b0
[ 55.487204] do_page_fault+0xd0/0x400
[ 55.487207] do_translation_fault+0xac/0xc0
[ 55.487209] do_mem_abort+0x44/0x94
[ 55.487212] el1_abort+0x40/0x6c
[ 55.487214] el1h_64_sync_handler+0xd8/0xe4
[ 55.487218] el1h_64_sync+0x64/0x68
[ 55.487221] 0xb7eb7ae4
[ 55.487224] 0xb7eb8668
[ 55.487225] 0xb7eb6e08
[ 55.487227] 0xb7eb68ec
[ 55.487228] 0xb7eb3824
[ 55.487230] 0xb7eb05a8
[ 55.487231] 0xb7eb12a0
[ 55.487232] 0xb7e43504
[ 55.487234] 0xb7e43650
[ 55.487235] 0xb7e482d0
[ 55.487237] 0xb7e4907c
[ 55.487238] 0xb7e49ff4
[ 55.487239] 0xb7e40888
[ 55.487241] 0xb7cb3328
[ 55.487242] 0xb7cb0674
[ 55.487243] __efi_rt_asm_wrapper+0x54/0x70
[ 55.487246] efi_call_rts+0x28c/0x3d0
[ 55.487249] process_one_work+0x1d0/0x320
[ 55.487258] worker_thread+0x14c/0x444
[ 55.487261] kthread+0x10c/0x110
[ 55.487264] ret_from_fork+0x10/0x20
[ 55.487268] [Firmware Bug]: Synchronous exception occurred in EFI runtime service set_time()
[ 55.495735] ------------[ cut here ]------------
[ 55.495739] WARNING: CPU: 62 PID: 9 at drivers/firmware/efi/runtime-wrappers.c:111 efi_call_virt_check_flags+0x40/0xac
[ 55.495746] Modules linked in:
[ 55.495749] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60
[ 55.495751] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18
[ 55.495753] Workqueue: efi_rts_wq efi_call_rts
[ 55.495757] pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 55.495761] pc : efi_call_virt_check_flags+0x40/0xac
[ 55.495764] lr : efi_call_rts+0x29c/0x3d0
[ 55.495767] sp : ffff80000861bd40
[ 55.495768] x29: ffff80000861bd40 x28: 0000000000000000 x27: 0000000000000000
[ 55.495772] x26: ffffb251470e9e68 x25: ffff3fff89714805 x24: 0000000000000000
[ 55.495775] x23: 0000000000000000 x22: 0000000000000000 x21: 00000000000000c0
[ 55.495778] x20: ffffb25146688de0 x19: 0000000000000000 x18: ffffffffffffffff
[ 55.495780] x17: 657320656d69746e x16: 757220494645206e x15: 6920646572727563
[ 55.495784] x14: 636f206e6f697470 x13: ffff403e40540000 x12: 0000000000001c14
[ 55.495787] x11: 000000000000095c x10: ffff403e40800000 x9 : ffff403e40540000
[ 55.495790] x8 : 00000000ffff7fff x7 : ffff403e40800000 x6 : 0000000000000000
[ 55.495792] x5 : ffff083e7fe9aaa0 x4 : 0000000000000000 x3 : 0000000000000000
[ 55.495796] x2 : 0000000000000000 x1 : ffffb25146688de0 x0 : 00000000000000c0
[ 55.495799] Call trace:
[ 55.495800] efi_call_virt_check_flags+0x40/0xac
[ 55.495802] efi_call_rts+0x29c/0x3d0
[ 55.495805] process_one_work+0x1d0/0x320
[ 55.495808] worker_thread+0x14c/0x444
[ 55.495811] kthread+0x10c/0x110
[ 55.495814] ret_from_fork+0x10/0x20
[ 55.495815] ---[ end trace 0000000000000000 ]---
[ 55.495818] Disabling lock debugging due to kernel taint
[ 55.495822] efi: [Firmware Bug]: IRQ flags corrupted (0x00000000=>0x000000c0) by EFI set_time
[ 55.504434] efi: EFI Runtime Services are disabled!
[ 55.504465] rtc-efi rtc-efi.0: can't read time
[ 56.479370] efi: EFI Runtime Services are disabled!
[ 56.479394] rtc-efi rtc-efi.0: can't read time
[ 57.479574] rtc-efi rtc-efi.0: can't read time
[ 57.484030] rtc-efi rtc-efi.0: can't read time
[ 57.488474] rtc-efi rtc-efi.0: can't read time
[ 58.479692] rtc-efi rtc-efi.0: can't read time
[ 58.484139] rtc-efi rtc-efi.0: can't read time

(rtc-efi error message repeats ad nauseam)

Note: this error message from the EFI rtc driver fires over and over and
clutters dmesg, will send a different report for this as I don't think it's
necessarily related to the two functions.

Thanks,
Alex
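For readers following the log, the "IRQ flags corrupted (0x00000000=>0x000000c0)" line can be reproduced with a few lines of user-space C. This is only a sketch of the XOR-and-mask test quoted above, not the kernel code: the bit values are assumed to mirror the arm64 PSTATE D/A/I/F layout, and IRQ_FLAGS_MASK and irq_flags_corrupted() are hypothetical names standing in for ARCH_EFI_IRQ_FLAGS_MASK and the check in efi_call_virt_check_flags().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirroring the arm64 PSTATE interrupt mask bits. */
#define PSR_F_BIT 0x00000040u  /* FIQ masked */
#define PSR_I_BIT 0x00000080u  /* IRQ masked */
#define PSR_A_BIT 0x00000100u  /* SError masked */
#define PSR_D_BIT 0x00000200u  /* debug exceptions masked */

/* Hypothetical stand-in for ARCH_EFI_IRQ_FLAGS_MASK (assumed: all four bits). */
#define IRQ_FLAGS_MASK (PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT)

/*
 * Returns true when any masked interrupt flag differs between the state
 * saved before the call and the state observed after it.
 */
static bool irq_flags_corrupted(uint32_t before, uint32_t after)
{
	uint32_t mismatch = before ^ after;

	return (mismatch & IRQ_FLAGS_MASK) != 0;
}
```

With the values from the log, irq_flags_corrupted(0x00000000, 0x000000c0) is true because both the I and F bits changed, which matches the "daIF" shown in the pstate line of the second splat.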
* Re: [Possible BUG] arm64: efi: efi_runtime_fixup_exception() and efi_call_virt_check_flags() both taint the kernel 2022-11-07 17:27 [Possible BUG] arm64: efi: efi_runtime_fixup_exception() and efi_call_virt_check_flags() both taint the kernel Alexandru Elisei @ 2022-11-07 17:47 ` Ard Biesheuvel 2022-11-08 10:09 ` Alexandru Elisei 2022-11-08 12:19 ` [Possible BUG] arm64: efi: efi_runtime_fixup_exception() and efi_call_virt_check_flags() both taint the kernel #forregzbot Thorsten Leemhuis 1 sibling, 1 reply; 6+ messages in thread From: Ard Biesheuvel @ 2022-11-07 17:47 UTC (permalink / raw) To: Alexandru Elisei Cc: catalin.marinas, will, linux-efi, linux-arm-kernel, linux-kernel Hello Alexandru, Thanks a lot for the report. On Mon, 7 Nov 2022 at 18:27, Alexandru Elisei <alexandru.elisei@arm.com> wrote: > > I'm going to preface this by saying that I'm extremely unfamiliar with the > EFI code. > > Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when > possible") skipped the call to SetVirtualAddressMap() for certain > configurations, and that started causing kernel panics on an Ampere Altra > machine due to an EFI synchronous exception. > > Commit 23715a26c8d8 ("arm64: efi: Recover from synchronous exceptions > occurring in firmware") made the EFI exception non-fatal. > > With a kernel built from v6.1-rc4 (which has both patches), I'm now getting > two splats on the same Altra machine (log below). Looks to me like the > second splat is caused by efi_call_virt_check_flags() using the > PSTATE.{I,F} values from when taking the exception. Shouldn't > efi_runtime_fixup_exception() fix up the exception so the error isn't > propagated along the call chain? > No, that is not exactly how I intended this to work. The new code will essentially do a pseudo-longjmp() back to the asm wrapper if a sync exception occurs during a runtime services call, and return EFI_ABORTED to the caller. 
This return will go via the ordinary setup/teardown helpers that check whether interrupts were left in a different state by the firmware. > I'm asking this because efi_runtime_fixup_exception() has this add_taint() > call: > > pr_err(FW_BUG "Unable to handle %s in EFI runtime service\n", msg); > add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK); > So this one runs first, and sets the taint because we crashed in the firmware. There is no shared state between the firmware and the kernel, so we should be able to carry on as usual. > and then efi_call_virt_check_flags() has this call: > > mismatch = flags ^ cur_flags; > if (!WARN_ON_ONCE(mismatch & ARCH_EFI_IRQ_FLAGS_MASK)) > return; > > add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_NOW_UNRELIABLE); > > It looks to me like LOCKDEP_STILL_OK from the first call is at odds with > LOCKDEP_NOW_UNRELIABLE from the second add_taint() call. > This one runs next, and sets the taint because the sync exception occurred when the EFI runtime services was running with interrupts disabled. The latter check was added because when we started running runtime services with interrupts enabled (which wasn't the case before), we started to see issues with firmware that disabled interrupts but never re-enabled them again. So this is really a different issue. So the question is whether we can assume that we can carry on as usual when we abort a runtime service call that has disabled interrupts, while we set the LOCKDEP_NOW_UNRELIABLE flag if the runtime service simply returned that way. So my assumption here was that we cannot, and the double taint is simply the result of two different things happening at the same time. > Here is the relevant part of the log (I can send the .config, kernel > command line and full log, or any other information that might be needed): > Thanks, this is really useful. 
> [ 55.479519] [Firmware Bug]: Unable to handle paging request in EFI runtime service > [ 55.487122] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60 > [ 55.487128] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18 > [ 55.487131] Workqueue: efi_rts_wq efi_call_rts > [ 55.487158] Call trace: > [ 55.487161] dump_backtrace.part.0+0xdc/0xf0 > [ 55.487177] show_stack+0x18/0x40 > [ 55.487180] dump_stack_lvl+0x68/0x84 > [ 55.487190] dump_stack+0x18/0x34 > [ 55.487192] efi_runtime_fixup_exception+0x74/0x88 > [ 55.487199] __do_kernel_fault+0x108/0x1b0 > [ 55.487204] do_page_fault+0xd0/0x400 > [ 55.487207] do_translation_fault+0xac/0xc0 > [ 55.487209] do_mem_abort+0x44/0x94 > [ 55.487212] el1_abort+0x40/0x6c > [ 55.487214] el1h_64_sync_handler+0xd8/0xe4 > [ 55.487218] el1h_64_sync+0x64/0x68 > [ 55.487221] 0xb7eb7ae4 > [ 55.487224] 0xb7eb8668 > [ 55.487225] 0xb7eb6e08 > [ 55.487227] 0xb7eb68ec > [ 55.487228] 0xb7eb3824 > [ 55.487230] 0xb7eb05a8 > [ 55.487231] 0xb7eb12a0 > [ 55.487232] 0xb7e43504 > [ 55.487234] 0xb7e43650 > [ 55.487235] 0xb7e482d0 > [ 55.487237] 0xb7e4907c > [ 55.487238] 0xb7e49ff4 > [ 55.487239] 0xb7e40888 > [ 55.487241] 0xb7cb3328 > [ 55.487242] 0xb7cb0674 > [ 55.487243] __efi_rt_asm_wrapper+0x54/0x70 > [ 55.487246] efi_call_rts+0x28c/0x3d0 > [ 55.487249] process_one_work+0x1d0/0x320 > [ 55.487258] worker_thread+0x14c/0x444 > [ 55.487261] kthread+0x10c/0x110 > [ 55.487264] ret_from_fork+0x10/0x20 > [ 55.487268] [Firmware Bug]: Synchronous exception occurred in EFI runtime service set_time() Interestingly, this occurs on set_time() rather than get_time(), which is called first when the efi-rtc driver loads. If you blacklist the driver, do the EFI variables still work on this machine? 
This is mostly relevant for the vendor I suppose, but interesting nonetheless Ultimately, we might end up having to revert d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when possible"), but this would be rather unfortunate: that puts us in the same situation as x86, where some systems need SetVirtualAddressMap() to be called, and some crash when you call it (the Snapdragon WoA laptops) > [ 55.495735] ------------[ cut here ]------------ > [ 55.495739] WARNING: CPU: 62 PID: 9 at drivers/firmware/efi/runtime-wrappers.c:111 efi_call_virt_check_flags+0x40/0xac > [ 55.495746] Modules linked in: > [ 55.495749] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60 > [ 55.495751] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18 > [ 55.495753] Workqueue: efi_rts_wq efi_call_rts > [ 55.495757] pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > [ 55.495761] pc : efi_call_virt_check_flags+0x40/0xac > [ 55.495764] lr : efi_call_rts+0x29c/0x3d0 > [ 55.495767] sp : ffff80000861bd40 > [ 55.495768] x29: ffff80000861bd40 x28: 0000000000000000 x27: 0000000000000000 > [ 55.495772] x26: ffffb251470e9e68 x25: ffff3fff89714805 x24: 0000000000000000 > [ 55.495775] x23: 0000000000000000 x22: 0000000000000000 x21: 00000000000000c0 > [ 55.495778] x20: ffffb25146688de0 x19: 0000000000000000 x18: ffffffffffffffff > [ 55.495780] x17: 657320656d69746e x16: 757220494645206e x15: 6920646572727563 > [ 55.495784] x14: 636f206e6f697470 x13: ffff403e40540000 x12: 0000000000001c14 > [ 55.495787] x11: 000000000000095c x10: ffff403e40800000 x9 : ffff403e40540000 > [ 55.495790] x8 : 00000000ffff7fff x7 : ffff403e40800000 x6 : 0000000000000000 > [ 55.495792] x5 : ffff083e7fe9aaa0 x4 : 0000000000000000 x3 : 0000000000000000 > [ 55.495796] x2 : 0000000000000000 x1 : ffffb25146688de0 x0 : 00000000000000c0 > [ 55.495799] Call trace: > [ 55.495800] efi_call_virt_check_flags+0x40/0xac > [ 
55.495802] efi_call_rts+0x29c/0x3d0 > [ 55.495805] process_one_work+0x1d0/0x320 > [ 55.495808] worker_thread+0x14c/0x444 > [ 55.495811] kthread+0x10c/0x110 > [ 55.495814] ret_from_fork+0x10/0x20 > [ 55.495815] ---[ end trace 0000000000000000 ]--- > [ 55.495818] Disabling lock debugging due to kernel taint > [ 55.495822] efi: [Firmware Bug]: IRQ flags corrupted (0x00000000=>0x000000c0) by EFI set_time > [ 55.504434] efi: EFI Runtime Services are disabled! > [ 55.504465] rtc-efi rtc-efi.0: can't read time > [ 56.479370] efi: EFI Runtime Services are disabled! > [ 56.479394] rtc-efi rtc-efi.0: can't read time > [ 57.479574] rtc-efi rtc-efi.0: can't read time > [ 57.484030] rtc-efi rtc-efi.0: can't read time > [ 57.488474] rtc-efi rtc-efi.0: can't read time > [ 58.479692] rtc-efi rtc-efi.0: can't read time > [ 58.484139] rtc-efi rtc-efi.0: can't read time > > (rtc-efi error message repeats ad nauseum) > > Note: this error message from the EFI rtc driver fires over and over and > clutters dmesg, will send a different report for this as I don't think it's > necessarily related to the two functions. > Yes, please. That should at least have a rate limit on it, but maybe a warn_once is better here. ^ permalink raw reply [flat|nested] 6+ messages in thread
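The control flow described in this reply (a pseudo-longjmp() back to the wrapper, followed by the ordinary teardown check) can be modelled in user space. The sketch below is an analogy under stated assumptions, not the arm64 implementation: setjmp()/longjmp() stand in for the exception fixup, the "firmware" unwinds itself instead of taking a real fault, and every name in it (buggy_service, call_wrapper, call_runtime_service, the two taint flags) is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

#define STATUS_SUCCESS 0
#define STATUS_ABORTED 1  /* stands in for EFI_ABORTED */

static jmp_buf wrapper_ctx;           /* state saved by the "asm wrapper" */
static volatile unsigned irq_flags;   /* simulated interrupt mask state */
static volatile bool firmware_taint;  /* recorded by the fixup path */
static volatile bool irqflags_taint;  /* recorded by the teardown check */

/* Hypothetical buggy service: masks interrupts, then "faults". */
static void buggy_service(void)
{
	irq_flags = 0xc0;
	firmware_taint = true;    /* what the fixup handler would record */
	longjmp(wrapper_ctx, 1);  /* unwind to the wrapper, never returns */
}

/* Hypothetical well-behaved service: returns normally, flags untouched. */
static void well_behaved_service(void)
{
}

/* Plays the role of the wrapper that the fixup path jumps back to. */
static int call_wrapper(void (*service)(void))
{
	if (setjmp(wrapper_ctx) != 0)
		return STATUS_ABORTED;
	service();
	return STATUS_SUCCESS;
}

/* Setup/teardown around the call; teardown runs on both return paths. */
static int call_runtime_service(void (*service)(void))
{
	unsigned saved = irq_flags;
	int status = call_wrapper(service);

	if (irq_flags != saved) {
		irqflags_taint = true;  /* second, independent report */
		irq_flags = saved;      /* restore the caller's state */
	}
	return status;
}
```

Calling call_runtime_service(buggy_service) returns STATUS_ABORTED and leaves both taint flags set, which is the "two different things happening at the same time" outcome: the abort is recorded where the fault is handled, and the flag mismatch is recorded later by a check that does not know the call was aborted.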
* Re: [Possible BUG] arm64: efi: efi_runtime_fixup_exception() and efi_call_virt_check_flags() both taint the kernel 2022-11-07 17:47 ` Ard Biesheuvel @ 2022-11-08 10:09 ` Alexandru Elisei 2022-11-09 9:07 ` Ard Biesheuvel 0 siblings, 1 reply; 6+ messages in thread From: Alexandru Elisei @ 2022-11-08 10:09 UTC (permalink / raw) To: Ard Biesheuvel Cc: catalin.marinas, will, linux-efi, linux-arm-kernel, linux-kernel On Mon, Nov 07, 2022 at 06:47:48PM +0100, Ard Biesheuvel wrote: > Hello Alexandru, > > Thanks a lot for the report. > > On Mon, 7 Nov 2022 at 18:27, Alexandru Elisei <alexandru.elisei@arm.com> wrote: > > > > I'm going to preface this by saying that I'm extremely unfamiliar with the > > EFI code. > > > > Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when > > possible") skipped the call to SetVirtualAddressMap() for certain > > configurations, and that started causing kernel panics on an Ampere Altra > > machine due to an EFI synchronous exception. > > > > Commit 23715a26c8d8 ("arm64: efi: Recover from synchronous exceptions > > occurring in firmware") made the EFI exception non-fatal. > > > > With a kernel built from v6.1-rc4 (which has both patches), I'm now getting > > two splats on the same Altra machine (log below). Looks to me like the > > second splat is caused by efi_call_virt_check_flags() using the > > PSTATE.{I,F} values from when taking the exception. Shouldn't > > efi_runtime_fixup_exception() fix up the exception so the error isn't > > propagated along the call chain? > > > > No, that is not exactly how I intended this to work. > > The new code will essentially do a pseudo-longjmp() back to the asm > wrapper if a sync exception occurs during a runtime services call, and > return EFI_ABORTED to the caller. This return will go via the ordinary > setup/teardown helpers that check whether interrupts were left in a > different state by the firmware. 
> > > I'm asking this because efi_runtime_fixup_exception() has this add_taint() > > call: > > > > pr_err(FW_BUG "Unable to handle %s in EFI runtime service\n", msg); > > add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK); > > > > So this one runs first, and sets the taint because we crashed in the > firmware. There is no shared state between the firmware and the > kernel, so we should be able to carry on as usual. > > > and then efi_call_virt_check_flags() has this call: > > > > mismatch = flags ^ cur_flags; > > if (!WARN_ON_ONCE(mismatch & ARCH_EFI_IRQ_FLAGS_MASK)) > > return; > > > > add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_NOW_UNRELIABLE); > > > > It looks to me like LOCKDEP_STILL_OK from the first call is at odds with > > LOCKDEP_NOW_UNRELIABLE from the second add_taint() call. > > > > This one runs next, and sets the taint because the sync exception > occurred when the EFI runtime services was running with interrupts > disabled. > > The latter check was added because when we started running runtime > services with interrupts enabled (which wasn't the case before), we > started to see issues with firmware that disabled interrupts but never > re-enabled them again. So this is really a different issue. > > So the question is whether we can assume that we can carry on as usual > when we abort a runtime service call that has disabled interrupts, > while we set the LOCKDEP_NOW_UNRELIABLE flag if the runtime service > simply returned that way. efi_handle_runtime_exception() clears the EFI_RUNTIME_SERVICES in efi.flags, and the comment for the flag suggests that runtime services will not be available from that point on. If runtime services are disabled, does it still make sense to set LOCKDEP_NOW_UNRELIABLE? > > So my assumption here was that we cannot, and the double taint is > simply the result of two different things happening at the same time. 
> > > > Here is the relevant part of the log (I can send the .config, kernel > > command line and full log, or any other information that might be needed): > > > > Thanks, this is really useful. > > > [ 55.479519] [Firmware Bug]: Unable to handle paging request in EFI runtime service > > [ 55.487122] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60 > > [ 55.487128] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18 > > [ 55.487131] Workqueue: efi_rts_wq efi_call_rts > > [ 55.487158] Call trace: > > [ 55.487161] dump_backtrace.part.0+0xdc/0xf0 > > [ 55.487177] show_stack+0x18/0x40 > > [ 55.487180] dump_stack_lvl+0x68/0x84 > > [ 55.487190] dump_stack+0x18/0x34 > > [ 55.487192] efi_runtime_fixup_exception+0x74/0x88 > > [ 55.487199] __do_kernel_fault+0x108/0x1b0 > > [ 55.487204] do_page_fault+0xd0/0x400 > > [ 55.487207] do_translation_fault+0xac/0xc0 > > [ 55.487209] do_mem_abort+0x44/0x94 > > [ 55.487212] el1_abort+0x40/0x6c > > [ 55.487214] el1h_64_sync_handler+0xd8/0xe4 > > [ 55.487218] el1h_64_sync+0x64/0x68 > > [ 55.487221] 0xb7eb7ae4 > > [ 55.487224] 0xb7eb8668 > > [ 55.487225] 0xb7eb6e08 > > [ 55.487227] 0xb7eb68ec > > [ 55.487228] 0xb7eb3824 > > [ 55.487230] 0xb7eb05a8 > > [ 55.487231] 0xb7eb12a0 > > [ 55.487232] 0xb7e43504 > > [ 55.487234] 0xb7e43650 > > [ 55.487235] 0xb7e482d0 > > [ 55.487237] 0xb7e4907c > > [ 55.487238] 0xb7e49ff4 > > [ 55.487239] 0xb7e40888 > > [ 55.487241] 0xb7cb3328 > > [ 55.487242] 0xb7cb0674 > > [ 55.487243] __efi_rt_asm_wrapper+0x54/0x70 > > [ 55.487246] efi_call_rts+0x28c/0x3d0 > > [ 55.487249] process_one_work+0x1d0/0x320 > > [ 55.487258] worker_thread+0x14c/0x444 > > [ 55.487261] kthread+0x10c/0x110 > > [ 55.487264] ret_from_fork+0x10/0x20 > > [ 55.487268] [Firmware Bug]: Synchronous exception occurred in EFI runtime service set_time() > > Interestingly, this occurs on set_time() rather than get_time(), which > is called first when the 
efi-rtc driver loads.
>
> If you blacklist the driver, do the EFI variables still work on this
> machine? This is mostly relevant for the vendor I suppose, but
> interesting nonetheless

Works just fine, tried it with v6.1-rc4, no errors in dmesg,
/sys/firmware/efi/efivars is mounted and looks sane to me (efibootmgr also
seems to be working fine, didn't modify/add entries though).

>
> Ultimately, we might end up having to revert d3549a938b73 ("efi/arm64:
> libstub: avoid SetVirtualAddressMap() when possible"), but this would
> be rather unfortunate: that puts us in the same situation as x86,
> where some systems need SetVirtualAddressMap() to be called, and some
> crash when you call it (the Snapdragon WoA laptops)

Speaking as a user, I think it would be nice to revert the commit, that's
how I am running v6.1-rcX kernels on the machine, as updating the firmware
is not feasible right now. But I realize that I'm not the one maintaining
the code, so I don't have a strong opinion about it :) And it's better now
than it was at rc3, when the kernel was panicking.
Thanks, Alex > > > [ 55.495735] ------------[ cut here ]------------ > > [ 55.495739] WARNING: CPU: 62 PID: 9 at drivers/firmware/efi/runtime-wrappers.c:111 efi_call_virt_check_flags+0x40/0xac > > [ 55.495746] Modules linked in: > > [ 55.495749] CPU: 62 PID: 9 Comm: kworker/u320:0 Tainted: G I 6.1.0-rc4 #60 > > [ 55.495751] Hardware name: WIWYNN Mt.Jade Server System B81.03001.0005/Mt.Jade Motherboard, BIOS 1.08.20220218 (SCP: 1.08.20220218) 2022/02/18 > > [ 55.495753] Workqueue: efi_rts_wq efi_call_rts > > [ 55.495757] pstate: 004000c9 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > [ 55.495761] pc : efi_call_virt_check_flags+0x40/0xac > > [ 55.495764] lr : efi_call_rts+0x29c/0x3d0 > > [ 55.495767] sp : ffff80000861bd40 > > [ 55.495768] x29: ffff80000861bd40 x28: 0000000000000000 x27: 0000000000000000 > > [ 55.495772] x26: ffffb251470e9e68 x25: ffff3fff89714805 x24: 0000000000000000 > > [ 55.495775] x23: 0000000000000000 x22: 0000000000000000 x21: 00000000000000c0 > > [ 55.495778] x20: ffffb25146688de0 x19: 0000000000000000 x18: ffffffffffffffff > > [ 55.495780] x17: 657320656d69746e x16: 757220494645206e x15: 6920646572727563 > > [ 55.495784] x14: 636f206e6f697470 x13: ffff403e40540000 x12: 0000000000001c14 > > [ 55.495787] x11: 000000000000095c x10: ffff403e40800000 x9 : ffff403e40540000 > > [ 55.495790] x8 : 00000000ffff7fff x7 : ffff403e40800000 x6 : 0000000000000000 > > [ 55.495792] x5 : ffff083e7fe9aaa0 x4 : 0000000000000000 x3 : 0000000000000000 > > [ 55.495796] x2 : 0000000000000000 x1 : ffffb25146688de0 x0 : 00000000000000c0 > > [ 55.495799] Call trace: > > [ 55.495800] efi_call_virt_check_flags+0x40/0xac > > [ 55.495802] efi_call_rts+0x29c/0x3d0 > > [ 55.495805] process_one_work+0x1d0/0x320 > > [ 55.495808] worker_thread+0x14c/0x444 > > [ 55.495811] kthread+0x10c/0x110 > > [ 55.495814] ret_from_fork+0x10/0x20 > > [ 55.495815] ---[ end trace 0000000000000000 ]--- > > [ 55.495818] Disabling lock debugging due to kernel taint > > [ 55.495822] efi: 
[Firmware Bug]: IRQ flags corrupted (0x00000000=>0x000000c0) by EFI set_time > > [ 55.504434] efi: EFI Runtime Services are disabled! > > [ 55.504465] rtc-efi rtc-efi.0: can't read time > > [ 56.479370] efi: EFI Runtime Services are disabled! > > [ 56.479394] rtc-efi rtc-efi.0: can't read time > > [ 57.479574] rtc-efi rtc-efi.0: can't read time > > [ 57.484030] rtc-efi rtc-efi.0: can't read time > > [ 57.488474] rtc-efi rtc-efi.0: can't read time > > [ 58.479692] rtc-efi rtc-efi.0: can't read time > > [ 58.484139] rtc-efi rtc-efi.0: can't read time > > > > (rtc-efi error message repeats ad nauseum) > > > > Note: this error message from the EFI rtc driver fires over and over and > > clutters dmesg, will send a different report for this as I don't think it's > > necessarily related to the two functions. > > > > Yes, please. That should at least have a rate limit on it, but maybe a > warn_once is better here. ^ permalink raw reply [flat|nested] 6+ messages in thread
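The "warn once" idea mentioned at the end of the quoted reply can be illustrated with a small user-space sketch. The kernel has its own helpers for this (for example pr_warn_once() and dev_err_ratelimited()); the function below is a hypothetical stand-in that only shows the pattern and is not a proposed change to the rtc-efi driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: prints the failure message the first time it is
 * called and stays silent afterwards. Returns true if it printed.
 */
static bool report_read_failure_once(void)
{
	static bool warned;  /* persists across calls */

	if (warned)
		return false;
	warned = true;
	fprintf(stderr, "rtc-efi: can't read time\n");
	return true;
}
```

A rate limit would instead allow the message through again after some interval; the once-only variant is simpler and fits a condition that cannot recover, such as runtime services having been disabled for the rest of the boot.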
* Re: [Possible BUG] arm64: efi: efi_runtime_fixup_exception() and efi_call_virt_check_flags() both taint the kernel
From: Ard Biesheuvel @ 2022-11-09 9:07 UTC
To: Alexandru Elisei
Cc: catalin.marinas, will, linux-efi, linux-arm-kernel, linux-kernel

On Tue, 8 Nov 2022 at 11:10, Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
...
>
> Speaking as a user, I think it would be nice to revert the commit, that's
> how I am running v6.1-rcX kernels on the machine, as updating the firmware
> is not feasible right now. But I realize that I'm not the one maintaining
> the code, so I don't have a strong opinion about it :) And it's better now
> than it was at rc3, when the kernel was panicking.
>

I sent out a patch yesterday that tweaks the sync exception fixup
handler to only disable the runtime service that triggered the
exception. This means, of course, that you might hit it multiple times
if several runtime service implementations are buggy, but there are
only five or so that we actually use, so that shouldn't make a huge
difference. But it also means a) we don't trigger other code paths
that freak out when a runtime service that was available suddenly goes
away and b) the diagnostics are more useful because we will find out
which other runtime services are broken.

Could you please test that patch? And for good measure, could you try
something like

	efibootmgr -t 3

(as root) to exercise the SetVariable() path as well?
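The difference between disabling all runtime services and disabling only the one that faulted can be sketched with a bitmask. This is an illustration of the idea described above, not the patch itself: the enum, the list of five services, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical list of services; the real set and naming may differ. */
enum rt_service {
	RT_GET_TIME,
	RT_SET_TIME,
	RT_GET_VARIABLE,
	RT_GET_NEXT_VARIABLE,
	RT_SET_VARIABLE,
	RT_COUNT
};

/* One bit per service; all services start out available. */
static uint32_t rt_available = (1u << RT_COUNT) - 1;

/* Clear only the bit of the service that triggered the exception. */
static void disable_faulting_service(enum rt_service svc)
{
	rt_available &= ~(1u << svc);
}

static bool service_available(enum rt_service svc)
{
	return (rt_available & (1u << svc)) != 0;
}
```

After disable_faulting_service(RT_SET_TIME), the variable services remain callable, so a later fault in one of them would be reported separately. That is the diagnostic benefit mentioned in point b) of the mail.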
* Re: [Possible BUG] arm64: efi: efi_runtime_fixup_exception() and efi_call_virt_check_flags() both taint the kernel
From: Alexandru Elisei @ 2022-11-09 10:08 UTC
To: Ard Biesheuvel
Cc: catalin.marinas, will, linux-efi, linux-arm-kernel, linux-kernel

Hi,

On Wed, Nov 09, 2022 at 10:07:00AM +0100, Ard Biesheuvel wrote:
> On Tue, 8 Nov 2022 at 11:10, Alexandru Elisei <alexandru.elisei@arm.com> wrote:
> >
> ...
> >
> > Speaking as a user, I think it would be nice to revert the commit, that's
> > how I am running v6.1-rcX kernels on the machine, as updating the firmware
> > is not feasible right now. But I realize that I'm not the one maintaining
> > the code, so I don't have a strong opinion about it :) And it's better now
> > than it was at rc3, when the kernel was panicking.
> >
>
> I sent out a patch yesterday that tweaks the sync exception fixup
> handler to only disable the runtime service that triggered the
> exception. This means, of course, that you might hit it multiple times
> if several runtime service implementations are buggy, but there are
> only five or so that we actually use, so that shouldn't make a huge
> difference. But it also means a) we don't trigger other code paths
> that freak out when a runtime service that was available suddenly goes
> away and b) the diagnostics are more useful because we will find out
> which other runtime services are broken.
>
> Could you please test that patch? And for good measure, could you try
> something like
>
> efibootmgr -t 3
>
> (as root) to exercise the SetVariable() path as well?

Tried booting with the patch applied yesterday, no regression as far as I
could tell (still hitting the two add_taint() statements, but the patch
isn't meant to change that).

I was trying to figure out how I could test that the other runtime services
are still working correctly; your suggestion is exactly what I was looking
for, thanks. The machine is a shared machine, will test when I get access
to it (hopefully later today) and post my findings.

Thanks,
Alex
* Re: [Possible BUG] arm64: efi: efi_runtime_fixup_exception() and efi_call_virt_check_flags() both taint the kernel #forregzbot
From: Thorsten Leemhuis @ 2022-11-08 12:19 UTC
To: linux-efi, linux-arm-kernel, linux-kernel, regressions

[Note: this mail is primarily sent for documentation purposes and/or for
regzbot, my Linux kernel regression tracking bot. That's why I removed
most or all folks from the list of recipients, but left any that looked
like a mailing list. These mails usually contain '#forregzbot' in the
subject, to make them easy to spot and filter out.]

[TLDR: I'm adding this regression report to the list of tracked
regressions; all text from me you find below is based on a few template
paragraphs you might have encountered already in similar form.]

Hi, this is your Linux kernel regression tracker.

On 07.11.22 18:27, Alexandru Elisei wrote:
> I'm going to preface this by saying that I'm extremely unfamiliar with the
> EFI code.
>
> Commit d3549a938b73 ("efi/arm64: libstub: avoid SetVirtualAddressMap() when
> possible") skipped the call to SetVirtualAddressMap() for certain
> configurations, and that started causing kernel panics on an Ampere Altra
> machine due to an EFI synchronous exception.
>
> Commit 23715a26c8d8 ("arm64: efi: Recover from synchronous exceptions
> occurring in firmware") made the EFI exception non-fatal.

Thanks for the report. To be sure the issue below doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, my Linux kernel regression
tracking bot:

#regzbot ^introduced d3549a938b73
#regzbot title efi: arm64: updating the firmware is not feasible anymore
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already discussed
somewhere else? It was fixed already? You want to clarify when the
regression started to happen? Or point out I got the title or something
else totally wrong? Then just reply -- ideally while also telling regzbot
about it, as explained here:
https://linux-regtracking.leemhuis.info/tracked-regression/

Reminder for developers: When fixing the issue, add 'Link:' tags pointing
to the report (the mail this one replies to), as explained in the Linux
kernel's documentation; the webpage above explains why this is important
for tracked regressions.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.