* WARNING: can't access registers at asm_common_interrupt
@ 2020-11-06  6:04 Shinichiro Kawasaki
  2020-11-06 18:06 ` Josh Poimboeuf
  2020-11-11 17:05 ` Josh Poimboeuf
  0 siblings, 2 replies; 32+ messages in thread

From: Shinichiro Kawasaki @ 2020-11-06 6:04 UTC (permalink / raw)
To: linux-kernel; +Cc: Josh Poimboeuf, Nicholas Piggin, Damien Le Moal

Greetings,

I repeatedly observe "WARNING: can't access registers at
asm_common_interrupt+0x1e/0x40" on my kernel test system. The warning is
printed by unwind_next_frame() in arch/x86/kernel/unwind_orc.c. Syzbot has
already reported it [1], and a similar warning was reported and discussed [2],
but I believe the cause has not yet been clarified.

The warning was observed with v5.10-rc2 and older tags. I bisected and found
that commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
triggered the warning; reverting it from v5.10-rc2 made the warning disappear.
May I ask the experts on Cc to comment on how this commit could be related to
the warning?

The test conditions needed to reproduce the warning are rather specific
(blktests, dm-linear and ZNS device emulation by QEMU). If any further
analysis is suggested, I'm willing to do it on my test system.

I hope this report helps.

[1] https://lkml.org/lkml/2020/9/6/231
[2] https://lkml.org/lkml/2020/9/8/1538

--
Best Regards,
Shin'ichiro Kawasaki
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-06  6:04 WARNING: can't access registers at asm_common_interrupt Shinichiro Kawasaki
@ 2020-11-06 18:06 ` Josh Poimboeuf
  2020-11-09  9:10   ` Shinichiro Kawasaki
  2020-11-11 17:05 ` Josh Poimboeuf
  1 sibling, 1 reply; 32+ messages in thread

From: Josh Poimboeuf @ 2020-11-06 18:06 UTC (permalink / raw)
To: Shinichiro Kawasaki; +Cc: linux-kernel, Nicholas Piggin, Damien Le Moal

On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> Greetings,
>
> I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> in my kernel test system repeatedly, which is printed by unwind_next_frame() in
> "arch/x86/kernel/unwind_orc.c". Syzbot already reported that [1]. Similar
> warning was reported and discussed [2], but I suppose the cause is not yet
> clarified.
>
> The warning was observed with v5.10-rc2 and older tags. I bisected and found
> that the commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> triggered the warning. Reverting that from 5.10-rc2, the warning disappeared.
> May I ask comment by expertise on CC how this commit can relate to the warning?
>
> The test condition to reproduce the warning is rather unique (blktests,
> dm-linear and ZNS device emulation by QEMU). If any action is suggested for
> further analysis, I'm willing to take it with my test system.

Hi,

Thanks for reporting this issue.  This might be a different issue from [2].

Can you send me the arch/x86/entry/entry_64.o file from your build?

--
Josh
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-06 18:06 ` Josh Poimboeuf
@ 2020-11-09  9:10   ` Shinichiro Kawasaki
  2020-11-10  3:19     ` Josh Poimboeuf
  0 siblings, 1 reply; 32+ messages in thread

From: Shinichiro Kawasaki @ 2020-11-09 9:10 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: linux-kernel, Nicholas Piggin, Damien Le Moal

On Nov 06, 2020 / 12:06, Josh Poimboeuf wrote:
> On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> > Greetings,
> >
> > I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> > in my kernel test system repeatedly, which is printed by unwind_next_frame() in
> > "arch/x86/kernel/unwind_orc.c". Syzbot already reported that [1]. Similar
> > warning was reported and discussed [2], but I suppose the cause is not yet
> > clarified.
> >
> > The warning was observed with v5.10-rc2 and older tags. I bisected and found
> > that the commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> > triggered the warning. Reverting that from 5.10-rc2, the warning disappeared.
> > May I ask comment by expertise on CC how this commit can relate to the warning?
> >
> > The test condition to reproduce the warning is rather unique (blktests,
> > dm-linear and ZNS device emulation by QEMU). If any action is suggested for
> > further analysis, I'm willing to take it with my test system.
>
> Hi,
>
> Thanks for reporting this issue. This might be a different issue from [2].
>
> Can you send me the arch/x86/entry/entry_64.o file from your build?

Hi Josh, thank you for your response. In a separate e-mail, I have sent
entry_64.o to your address only, since I hesitated to send the ~76 KB
attachment to all of LKML. In case it does not reach you, please let me know.

--
Best Regards,
Shin'ichiro Kawasaki
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-09  9:10 ` Shinichiro Kawasaki
@ 2020-11-10  3:19   ` Josh Poimboeuf
  2020-11-10  9:19     ` Shinichiro Kawasaki
  0 siblings, 1 reply; 32+ messages in thread

From: Josh Poimboeuf @ 2020-11-10 3:19 UTC (permalink / raw)
To: Shinichiro Kawasaki; +Cc: linux-kernel, Nicholas Piggin, Damien Le Moal

On Mon, Nov 09, 2020 at 09:10:38AM +0000, Shinichiro Kawasaki wrote:
> On Nov 06, 2020 / 12:06, Josh Poimboeuf wrote:
> > On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> > > Greetings,
> > >
> > > I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> > > in my kernel test system repeatedly, which is printed by unwind_next_frame() in
> > > "arch/x86/kernel/unwind_orc.c". Syzbot already reported that [1]. Similar
> > > warning was reported and discussed [2], but I suppose the cause is not yet
> > > clarified.
> > >
> > > The warning was observed with v5.10-rc2 and older tags. I bisected and found
> > > that the commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> > > triggered the warning. Reverting that from 5.10-rc2, the warning disappeared.
> > > May I ask comment by expertise on CC how this commit can relate to the warning?
> > >
> > > The test condition to reproduce the warning is rather unique (blktests,
> > > dm-linear and ZNS device emulation by QEMU). If any action is suggested for
> > > further analysis, I'm willing to take it with my test system.
> >
> > Hi,
> >
> > Thanks for reporting this issue. This might be a different issue from [2].
> >
> > Can you send me the arch/x86/entry/entry_64.o file from your build?
>
> Hi Josh, thank you for your response. As a separated e-mail, I have sent the
> entry_64.o only to your address, since I hesitate to send around the 76kb
> attachment file to LKML. In case it does not reach to you, please let me know.

Got it, thanks.  Unfortunately I'm still confused.

Can you test with the following patch, and boot with 'unwind_debug'?
That should hopefully dump a lot of useful data along with the warning.

From: Josh Poimboeuf <jpoimboe@redhat.com>
Subject: [PATCH] x86/unwind/orc: Add 'unwind_debug' cmdline option

Sometimes the one-line ORC unwinder warnings aren't very helpful.  Add a
new 'unwind_debug' cmdline option which will dump the full stack
contents of the current task when an error condition is encountered.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
---
 .../admin-guide/kernel-parameters.txt |  6 +++
 arch/x86/kernel/unwind_orc.c          | 48 ++++++++++++++++++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..4bed92c51723 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5512,6 +5512,12 @@
 	unknown_nmi_panic
 			[X86] Cause panic on unknown NMI.
 
+	unwind_debug	[X86-64]
+			Enable unwinder debug output.  This can be
+			useful for debugging certain unwinder error
+			conditions, including corrupt stacks and
+			bad/missing unwinder metadata.
+
 	usbcore.authorized_default=
 			[USB] Default USB device authorization:
 			(default -1 = authorized except for wireless USB,

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 73f800100066..44bae03f9bfc 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -13,8 +13,13 @@
 #define orc_warn_current(args...)				\
 ({								\
-	if (state->task == current)				\
+	static bool dumped_before;				\
+	if (state->task == current) {				\
 		orc_warn(args);					\
+		if (unwind_debug && !dumped_before)		\
+			unwind_dump(state);			\
+		dumped_before = true;				\
+	}							\
 })
 
 extern int __start_orc_unwind_ip[];
@@ -23,8 +28,49 @@
 extern struct orc_entry __start_orc_unwind[];
 extern struct orc_entry __stop_orc_unwind[];
 
 static bool orc_init __ro_after_init;
+static bool unwind_debug __ro_after_init;
 static unsigned int lookup_num_blocks __ro_after_init;
 
+static int __init unwind_debug_cmdline(char *str)
+{
+	unwind_debug = true;
+
+	return 0;
+}
+early_param("unwind_debug", unwind_debug_cmdline);
+
+static void unwind_dump(struct unwind_state *state)
+{
+	static bool dumped_before;
+	unsigned long word, *sp;
+	struct stack_info stack_info = {0};
+	unsigned long visit_mask = 0;
+
+	if (dumped_before)
+		return;
+
+	dumped_before = true;
+
+	printk_deferred("unwind stack type:%d next_sp:%p mask:0x%lx graph_idx:%d\n",
+			state->stack_info.type, state->stack_info.next_sp,
+			state->stack_mask, state->graph_idx);
+
+	for (sp = __builtin_frame_address(0); sp;
+	     sp = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+		if (get_stack_info(sp, state->task, &stack_info, &visit_mask))
+			break;
+
+		for (; sp < stack_info.end; sp++) {
+
+			word = READ_ONCE_NOCHECK(*sp);
+
+			printk_deferred("%0*lx: %0*lx (%pB)\n", BITS_PER_LONG/4,
+					(unsigned long)sp, BITS_PER_LONG/4,
+					word, (void *)word);
+		}
+	}
+}
+
 static inline unsigned long orc_ip(const int *ip)
 {
 	return (unsigned long)ip + *ip;
-- 
2.25.4
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-10  3:19 ` Josh Poimboeuf
@ 2020-11-10  9:19   ` Shinichiro Kawasaki
  0 siblings, 0 replies; 32+ messages in thread

From: Shinichiro Kawasaki @ 2020-11-10 9:19 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: linux-kernel, Nicholas Piggin, Damien Le Moal

On Nov 09, 2020 / 21:19, Josh Poimboeuf wrote:
> On Mon, Nov 09, 2020 at 09:10:38AM +0000, Shinichiro Kawasaki wrote:
> > On Nov 06, 2020 / 12:06, Josh Poimboeuf wrote:
> > > On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> > > > Greetings,
> > > >
> > > > I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> > > > in my kernel test system repeatedly, which is printed by unwind_next_frame() in
> > > > "arch/x86/kernel/unwind_orc.c". Syzbot already reported that [1]. Similar
> > > > warning was reported and discussed [2], but I suppose the cause is not yet
> > > > clarified.
> > > >
> > > > The warning was observed with v5.10-rc2 and older tags. I bisected and found
> > > > that the commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> > > > triggered the warning. Reverting that from 5.10-rc2, the warning disappeared.
> > > > May I ask comment by expertise on CC how this commit can relate to the warning?
> > > >
> > > > The test condition to reproduce the warning is rather unique (blktests,
> > > > dm-linear and ZNS device emulation by QEMU). If any action is suggested for
> > > > further analysis, I'm willing to take it with my test system.
> > >
> > > Hi,
> > >
> > > Thanks for reporting this issue. This might be a different issue from [2].
> > >
> > > Can you send me the arch/x86/entry/entry_64.o file from your build?
> >
> > Hi Josh, thank you for your response. As a separated e-mail, I have sent the
> > entry_64.o only to your address, since I hesitate to send around the 76kb
> > attachment file to LKML. In case it does not reach to you, please let me know.
>
> Got it, thanks.
Unfortunately I'm still confused. > > Can you test with the following patch, and boot with 'unwind_debug'? > That should hopefully dump a lot of useful data along with the warning. Thank you for the patch. With the patch and 'unwind_debug' kernel command line, I recreated the warning. Here I quote the kernel messages printed (smpboot related messages are usual message printed for blktests test case block/008). ... [ 113.022680] smpboot: CPU 1 is now offline [ 113.030546] WARNING: can't access registers at asm_common_interrupt+0x1e/0x40 [ 113.030549] unwind stack type:0 next_sp:0000000000000000 mask:0x6 graph_idx:0 [ 113.030554] ffff8881e87097b0: 1ffff1103d0e1302 (0x1ffff1103d0e1302) [ 113.030558] ffff8881e87097b8: ffffffff9712a434 (unwind_next_frame+0x15e4/0x1fc0) [ 113.030560] ffff8881e87097c0: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40) [ 113.030562] ffff8881e87097c8: 0000000000000000 (0x0) [ 113.030563] ffff8881e87097d0: ffff8881e87098bd (0xffff8881e87098bd) [ 113.030564] ffff8881e87097d8: ffff8881e87098d8 (0xffff8881e87098d8) [ 113.030565] ffff8881e87097e0: ffff8881e87098c0 (0xffff8881e87098c0) [ 113.030570] ffff8881e87097e8: ffffffff9ac85575 (__start_orc_unwind+0x65f81/0x37a91c) [ 113.030572] ffff8881e87097f0: ffffffff9ac85575 (__start_orc_unwind+0x65f81/0x37a91c) [ 113.030573] ffff8881e87097f8: ffff8881e8709938 (0xffff8881e8709938) [ 113.030574] ffff8881e8709800: 0000000000000000 (0x0) [ 113.030597] ffff8881e8709808: ffff88810ebd31c0 (0xffff88810ebd31c0) [ 113.030598] ffff8881e8709810: 0000000041b58ab3 (0x41b58ab3) [ 113.030602] ffff8881e8709818: ffffffff99d7a850 (.LC2+0x3be/0x44d) [ 113.030604] ffff8881e8709820: ffffffff97128e50 (deref_stack_reg+0x160/0x160) [ 113.030605] ffff8881e8709828: 0000000000000000 (0x0) [ 113.030606] ffff8881e8709830: 0000000000000000 (0x0) [ 113.030610] ffff8881e8709838: ffffffff972343bf (kernel_text_address.part.0+0x1f/0xc0) [ 113.030624] ffff8881e8709840: ffffffffc046fd90 (nvme_irq+0x10/0x20 [nvme]) [ 113.030625] 
ffff8881e8709848: ffff8881e87098d0 (0xffff8881e87098d0) [ 113.030628] ffff8881e8709850: ffffffff973994c0 (create_prof_cpu_mask+0x20/0x20) [ 113.030629] ffff8881e8709858: ffff8881e8709908 (0xffff8881e8709908) [ 113.030630] ffff8881e8709860: ffff8881e8709938 (0xffff8881e8709938) [ 113.030631] ffff8881e8709868: 0000000000000000 (0x0) [ 113.030632] ffff8881e8709870: ffff88810ebd31c0 (0xffff88810ebd31c0) [ 113.030633] ffff8881e8709878: 0000000000000006 (0x6) [ 113.030636] ffff8881e8709880: ffffffff9709925c (arch_stack_walk+0x6c/0xb0) [ 113.030637] ffff8881e8709888: 0000000000000000 (0x0) [ 113.030638] ffff8881e8709890: ffff88814c428000 (0xffff88814c428000) [ 113.030640] ffff8881e8709898: ffff88814c430000 (0xffff88814c430000) [ 113.030641] ffff8881e87098a0: 0000000000000000 (0x0) [ 113.030642] ffff8881e87098a8: 0000000000000006 (0x6) [ 113.030643] ffff8881e87098b0: ffff88810ebd31c0 (0xffff88810ebd31c0) [ 113.030644] ffff8881e87098b8: 0000010000000000 (0x10000000000) [ 113.030645] ffff8881e87098c0: 0000000000000000 (0x0) [ 113.030646] ffff8881e87098c8: 0000000000000000 (0x0) [ 113.030648] ffff8881e87098d0: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40) [ 113.030649] ffff8881e87098d8: ffff88814c42fed0 (0xffff88814c42fed0) [ 113.030650] ffff8881e87098e0: 0000000000000000 (0x0) [ 113.030651] ffff8881e87098e8: ffffed103d0e1323 (0xffffed103d0e1323) [ 113.030652] ffff8881e87098f0: 0000000000000800 (0x800) [ 113.030653] ffff8881e87098f8: ffff88810b48a780 (0xffff88810b48a780) [ 113.030654] ffff8881e8709900: ffff8881e8709c48 (0xffff8881e8709c48) [ 113.030658] ffff8881e8709908: ffffffff97862595 (kmem_cache_free+0xf5/0x590) [ 113.030660] ffff8881e8709910: ffffffff973996a1 (stack_trace_save+0x81/0xa0) [ 113.030661] ffff8881e8709918: 0000000041b58ab3 (0x41b58ab3) [ 113.030663] ffff8881e8709920: ffffffff99d89ef4 (.LC4+0x20f/0x14aa1) [ 113.030665] ffff8881e8709928: ffffffff97399620 (stack_trace_consume_entry+0x160/0x160) [ 113.030666] ffff8881e8709930: ffff8881098e9800 
(0xffff8881098e9800) [ 113.030667] ffff8881e8709938: ffff8881e8709988 (0xffff8881e8709988) [ 113.030668] ffff8881e8709940: 0000000000000040 (0x40) [ 113.030669] ffff8881e8709948: 0000000000000015 (0x15) [ 113.030670] ffff8881e8709950: ffff88811858d348 (0xffff88811858d348) [ 113.030672] ffff8881e8709958: ffffffff97862595 (kmem_cache_free+0xf5/0x590) [ 113.030673] ffff8881e8709960: 0000000000000800 (0x800) [ 113.030674] ffff8881e8709968: ffff88810b48a780 (0xffff88810b48a780) [ 113.030675] ffff8881e8709970: ffff8881e8709c48 (0xffff8881e8709c48) [ 113.030676] ffff8881e8709978: ffff88811858c008 (0xffff88811858c008) [ 113.030678] ffff8881e8709980: ffffffff978649db (kasan_save_stack+0x1b/0x40) [ 113.030680] ffff8881e8709988: ffffffff978649db (kasan_save_stack+0x1b/0x40) [ 113.030681] ffff8881e8709990: ffffffff97864a1c (kasan_set_track+0x1c/0x30) [ 113.030684] ffff8881e8709998: ffffffff97866f5b (kasan_set_free_info+0x1b/0x30) [ 113.030685] ffff8881e87099a0: ffffffff97864980 (__kasan_slab_free+0x110/0x150) [ 113.030687] ffff8881e87099a8: ffffffff9785b4da (slab_free_freelist_hook+0x5a/0x170) [ 113.030689] ffff8881e87099b0: ffffffff97862595 (kmem_cache_free+0xf5/0x590) [ 113.030692] ffff8881e87099b8: ffffffff988f4da0 (dec_pending+0x1f0/0x900) [ 113.030694] ffff8881e87099c0: ffffffff988fad15 (clone_endio+0x1e5/0x940) [ 113.030697] ffff8881e87099c8: ffffffff97eac8f6 (blk_update_request+0x716/0xe20) [ 113.030699] ffff8881e87099d0: ffffffff97ed523b (blk_mq_end_request+0x4b/0x480) [ 113.030701] ffff8881e87099d8: ffffffffc046e863 (nvme_process_cq+0x563/0xa40 [nvme]) [ 113.030703] ffff8881e87099e0: ffffffffc046fd90 (nvme_irq+0x10/0x20 [nvme]) [ 113.030706] ffff8881e87099e8: ffffffff9733ef52 (__handle_irq_event_percpu+0x252/0x620) [ 113.030707] ffff8881e87099f0: ffffffff9733f51f (handle_irq_event+0xef/0x240) [ 113.030709] ffff8881e87099f8: ffffffff9734bff6 (handle_edge_irq+0x1f6/0xb60) [ 113.030712] ffff8881e8709a00: ffffffff992010b2 (asm_call_irq_on_stack+0x12/0x20) [ 113.030714] 
ffff8881e8709a08: ffffffff9918f05b (common_interrupt+0xeb/0x190) [ 113.030716] ffff8881e8709a10: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40) [ 113.030717] ffff8881e8709a18: ffffffff97395f70 (exit_to_user_mode_prepare+0xc0/0x1d0) [ 113.030719] ffff8881e8709a20: ffffffff99200c48 (asm_common_interrupt+0x8/0x40) [ 113.030721] ffff8881e8709a28: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40) [ 113.030723] ffff8881e8709a30: ffffffff991a91e0 (schedule+0xd0/0x270) [ 113.030724] ffff8881e8709a38: ffffffff97395fb7 (exit_to_user_mode_prepare+0x107/0x1d0) [ 113.030725] ffff8881e8709a40: 1ffff1103d0e134e (0x1ffff1103d0e134e) [ 113.030727] ffff8881e8709a48: ffff8881e8733dc8 (0xffff8881e8733dc8) [ 113.030728] ffff8881e8709a50: ffff8881e8733d40 (0xffff8881e8733d40) [ 113.030729] ffff8881e8709a58: 00000019d79785f8 (0x19d79785f8) [ 113.030730] ffff8881e8709a60: 0000000000000005 (0x5) [ 113.030731] ffff8881e8709a68: 0000000000000000 (0x0) [ 113.030732] ffff8881e8709a70: ffff88810b579980 (0xffff88810b579980) [ 113.030733] ffff8881e8709a78: ffff88810b579800 (0xffff88810b579800) [ 113.030736] ffff8881e8709a80: ffffffff97286cda (update_load_avg+0x1fa/0x1ad0) [ 113.030737] ffff8881e8709a88: ffff88810b57a000 (0xffff88810b57a000) [ 113.030738] ffff8881e8709a90: ffff88810b579800 (0xffff88810b579800) [ 113.030739] ffff8881e8709a98: 0000000000000400 (0x400) [ 113.030741] ffff8881e8709aa0: ffffffff9729be38 (update_cfs_group+0x118/0x290) [ 113.030743] ffff8881e8709aa8: ffffffff973146a0 (lock_downgrade+0x6a0/0x6a0) [ 113.030744] ffff8881e8709ab0: 0000000000000000 (0x0) [ 113.030745] ffff8881e8709ab8: ffff88810b579850 (0xffff88810b579850) [ 113.030746] ffff8881e8709ac0: ffff8881e8733d68 (0xffff8881e8733d68) [ 113.030747] ffff8881e8709ac8: ffff8881e8733d40 (0xffff8881e8733d40) [ 113.030748] ffff8881e8709ad0: 0000000000000001 (0x1) [ 113.030749] ffff8881e8709ad8: ffff88810b579810 (0xffff88810b579810) [ 113.030751] ffff8881e8709ae0: ffffffff972a14b2 (enqueue_entity+0x402/0x2ba0) [ 113.030752] 
ffff8881e8709ae8: ffff8881e8733d50 (0xffff8881e8733d50) [ 113.030753] ffff8881e8709af0: ffff88810b579800 (0xffff88810b579800) [ 113.030754] ffff8881e8709af8: ffff8881e8733d80 (0xffff8881e8733d80) [ 113.030755] ffff8881e8709b00: ffff888116873268 (0xffff888116873268) [ 113.030757] ffff8881e8709b08: ffff8881e8734790 (0xffff8881e8734790) [ 113.030758] ffff8881e8709b10: ffff88810ebd3268 (0xffff88810ebd3268) [ 113.030759] ffff8881e8709b18: ffff8881e8720e80 (0xffff8881e8720e80) [ 113.030760] ffff8881e8709b20: ffff88810b579800 (0xffff88810b579800) [ 113.030761] ffff8881e8709b28: ffff88810b579950 (0xffff88810b579950) [ 113.030762] ffff8881e8709b30: 0000000000000009 (0x9) [ 113.030763] ffff8881e8709b38: ffff8881e8733c80 (0xffff8881e8733c80) [ 113.030764] ffff8881e8709b40: ffff88810ebd31c0 (0xffff88810ebd31c0) [ 113.030765] ffff8881e8709b48: 0000000000000002 (0x2) [ 113.030768] ffff8881e8709b50: ffffffff97263aef (resched_curr+0x17f/0x1e0) [ 113.030769] ffff8881e8709b58: ffff8881098e9000 (0xffff8881098e9000) [ 113.030770] ffff8881e8709b60: ffff8881098e9038 (0xffff8881098e9038) [ 113.030771] ffff8881e8709b68: 000000000056e125 (0x56e125) [ 113.030772] ffff8881e8709b70: 0000000000000002 (0x2) [ 113.030773] ffff8881e8709b78: 1ffff1103d0e1374 (0x1ffff1103d0e1374) [ 113.030774] ffff8881e8709b80: ffff888116873e20 (0xffff888116873e20) [ 113.030775] ffff8881e8709b88: ffff88811858c000 (0xffff88811858c000) [ 113.030777] ffff8881e8709b90: ffffffff97864a1c (kasan_set_track+0x1c/0x30) [ 113.030778] ffff8881e8709b98: 1ffff110230b1800 (0x1ffff110230b1800) [ 113.030780] ffff8881e8709ba0: ffffffff97866f5b (kasan_set_free_info+0x1b/0x30) [ 113.030781] ffff8881e8709ba8: 0000000000000001 (0x1) [ 113.030782] ffff8881e8709bb0: ffffffff97864980 (__kasan_slab_free+0x110/0x150) [ 113.030783] ffff8881e8709bb8: ffff88810b48a780 (0xffff88810b48a780) [ 113.030785] ffff8881e8709bc0: ffff88811858c000 (0xffff88811858c000) [ 113.030786] ffff8881e8709bc8: 3b9c1e2c25148a56 (0x3b9c1e2c25148a56) [ 113.030787] 
ffff8881e8709bd0: ffff88811858c000 (0xffff88811858c000) [ 113.030788] ffff8881e8709bd8: ffffffff9785b4da (slab_free_freelist_hook+0x5a/0x170) [ 113.030789] ffff8881e8709be0: ffff8881e8709c50 (0xffff8881e8709c50) [ 113.030790] ffff8881e8709be8: ffff88819858c000 (0xffff88819858c000) [ 113.030792] ffff8881e8709bf0: ffff88810b48a780 (0xffff88810b48a780) [ 113.030793] ffff8881e8709bf8: ffffea0004616300 (0xffffea0004616300) [ 113.030794] ffff8881e8709c00: ffff88811858c000 (0xffff88811858c000) [ 113.030795] ffff8881e8709c08: 0000000000000000 (0x0) [ 113.030796] ffff8881e8709c10: ffff8881059402a0 (0xffff8881059402a0) [ 113.030797] ffff8881e8709c18: ffffffff97862595 (kmem_cache_free+0xf5/0x590) [ 113.030799] ffff8881e8709c20: ffff888114d87038 (0xffff888114d87038) [ 113.030799] ffff8881e8709c28: 0000000000000000 (0x0) [ 113.030801] ffff8881e8709c30: ffffffff988f4da0 (dec_pending+0x1f0/0x900) [ 113.030802] ffff8881e8709c38: ffff8881059403b8 (0xffff8881059403b8) [ 113.030803] ffff8881e8709c40: ffff88811858c0a8 (0xffff88811858c0a8) [ 113.030804] ffff8881e8709c48: 0000000000000000 (0x0) [ 113.030805] ffff8881e8709c50: 0000000000000000 (0x0) [ 113.030806] ffff8881e8709c58: ffff88811858c000 (0xffff88811858c000) [ 113.030807] ffff8881e8709c60: ffff88814c427ab8 (0xffff88814c427ab8) [ 113.030808] ffff8881e8709c68: ffff888105940000 (0xffff888105940000) [ 113.030809] ffff8881e8709c70: 0000000000000000 (0x0) [ 113.030810] ffff8881e8709c78: ffff88814c427ac8 (0xffff88814c427ac8) [ 113.030811] ffff8881e8709c80: ffff8881059402a0 (0xffff8881059402a0) [ 113.030813] ffff8881e8709c88: ffffffff988f4da0 (dec_pending+0x1f0/0x900) [ 113.030814] ffff8881e8709c90: ffff888105940000 (0xffff888105940000) [ 113.030815] ffff8881e8709c98: 00000000fffd21a8 (0xfffd21a8) [ 113.030816] ffff8881e8709ca0: 0000000400000000 (0x400000000) [ 113.030817] ffff8881e8709ca8: ffff88814c427ac0 (0xffff88814c427ac0) [ 113.030818] ffff8881e8709cb0: 1ffff1103d0e139c (0x1ffff1103d0e139c) [ 113.030819] ffff8881e8709cb8: 
1ffff1103d0e13a0 (0x1ffff1103d0e13a0) [ 113.030820] ffff8881e8709cc0: ffff88811858c0a8 (0xffff88811858c0a8) [ 113.030822] ffff8881e8709cc8: ffff88814c427ab8 (0xffff88814c427ab8) [ 113.030823] ffff8881e8709cd0: ffff88811858c000 (0xffff88811858c000) [ 113.030824] ffff8881e8709cd8: 0000000000000000 (0x0) [ 113.030825] ffff8881e8709ce0: ffff88811858c088 (0xffff88811858c088) [ 113.030826] ffff8881e8709ce8: ffffffff988fad15 (clone_endio+0x1e5/0x940) [ 113.030827] ffff8881e8709cf0: ffff888102a26c38 (0xffff888102a26c38) [ 113.030828] ffff8881e8709cf8: ffff88811858c0b0 (0xffff88811858c0b0) [ 113.030829] ffff8881e8709d00: 0000000041b58ab3 (0x41b58ab3) [ 113.030831] ffff8881e8709d08: ffffffff99e07e8f (.LC0+0x31f92/0x36703) [ 113.030833] ffff8881e8709d10: ffffffff988fab30 (disable_discard+0xd0/0xd0) [ 113.030834] ffff8881e8709d18: ffff88811858c0a8 (0xffff88811858c0a8) [ 113.030835] ffff8881e8709d20: ffff88811858c000 (0xffff88811858c000) [ 113.030836] ffff8881e8709d28: ffff88811858c0e0 (0xffff88811858c0e0) [ 113.030837] ffff8881e8709d30: ffff88811858c0b0 (0xffff88811858c0b0) [ 113.030838] ffff8881e8709d38: ffff88811a753300 (0xffff88811a753300) [ 113.030840] ffff8881e8709d40: dffffc0000000000 (0xdffffc0000000000) [ 113.030841] ffff8881e8709d48: ffff88811858c0a8 (0xffff88811858c0a8) [ 113.030842] ffff8881e8709d50: ffff88811a75331c (0xffff88811a75331c) [ 113.030843] ffff8881e8709d58: 0000000000001000 (0x1000) [ 113.030844] ffff8881e8709d60: 0000000000001000 (0x1000) [ 113.030845] ffff8881e8709d68: ffff88811a753300 (0xffff88811a753300) [ 113.030846] ffff8881e8709d70: ffffffff97eac8f6 (blk_update_request+0x716/0xe20) [ 113.030847] ffff8881e8709d78: ffffffff00000000 (0xffffffff00000000) [ 113.030848] ffff8881e8709d80: 0000000000000000 (0x0) [ 113.030849] ffff8881e8709d88: ffffed10234ea667 (0xffffed10234ea667) [ 113.030851] ffff8881e8709d90: ffff88811858c0d0 (0xffff88811858c0d0) [ 113.030852] ffff8881e8709d98: ffffed10234ea663 (0xffffed10234ea663) [ 113.030853] ffff8881e8709da0: 
ffff88811a753318 (0xffff88811a753318) [ 113.030854] ffff8881e8709da8: ffff88811a753338 (0xffff88811a753338) [ 113.030855] ffff8881e8709db0: ffff888111788800 (0xffff888111788800) [ 113.030856] ffff8881e8709db8: dffffc0000000000 (0xdffffc0000000000) [ 113.030857] ffff8881e8709dc0: ffff88811a753300 (0xffff88811a753300) [ 113.030858] ffff8881e8709dc8: 0000000000000000 (0x0) [ 113.030859] ffff8881e8709dd0: ffff88811a753300 (0xffff88811a753300) [ 113.030860] ffff8881e8709dd8: ffff888116c828c0 (0xffff888116c828c0) [ 113.030861] ffff8881e8709de0: ffffed1020316ea8 (0xffffed1020316ea8) [ 113.030863] ffff8881e8709de8: ffffffff97ed523b (blk_mq_end_request+0x4b/0x480) [ 113.030864] ffff8881e8709df0: dffffc0000000000 (0xdffffc0000000000) [ 113.030865] ffff8881e8709df8: dffffc0000000000 (0xdffffc0000000000) [ 113.030866] ffff8881e8709e00: ffff8881018b7480 (0xffff8881018b7480) [ 113.030867] ffff8881e8709e08: 0000000000000000 (0x0) [ 113.030868] ffff8881e8709e10: ffff88811a753300 (0xffff88811a753300) [ 113.030869] ffff8881e8709e18: ffff888116c828c0 (0xffff888116c828c0) [ 113.030871] ffff8881e8709e20: ffffffffc046e863 (nvme_process_cq+0x563/0xa40 [nvme]) [ 113.030872] ffff8881e8709e28: ffffed1020316ead (0xffffed1020316ead) [ 113.030873] ffff8881e8709e30: ffff888116c828cc (0xffff888116c828cc) [ 113.030874] ffff8881e8709e38: 0000000197314fd1 (0x197314fd1) [ 113.030875] ffff8881e8709e40: ffff8881018b756c (0xffff8881018b756c) [ 113.030876] ffff8881e8709e48: ffff8881018b7560 (0xffff8881018b7560) [ 113.030878] ffff8881e8709e50: ffff8881018b756a (0xffff8881018b756a) [ 113.030879] ffff8881e8709e58: ffff88810e3c98c0 (0xffff88810e3c98c0) [ 113.030880] ffff8881e8709e60: ffff8881018b7540 (0xffff8881018b7540) [ 113.030881] ffff8881e8709e68: ffff8881018b7568 (0xffff8881018b7568) [ 113.030883] ffff8881e8709e70: ffffffff9733f50a (handle_irq_event+0xda/0x240) [ 113.030884] ffff8881e8709e78: dffffc0000000000 (0xdffffc0000000000) [ 113.030886] ffff8881e8709e80: ffffffffc046fd80 
(nvme_suspend+0x330/0x330 [nvme]) [ 113.030887] ffff8881e8709e88: ffff8881e8709f48 (0xffff8881e8709f48) [ 113.030888] ffff8881e8709e90: ffff88810e3c9800 (0xffff88810e3c9800) [ 113.030889] ffff8881e8709e98: ffff8881152e7f00 (0xffff8881152e7f00) [ 113.030890] ffff8881e8709ea0: 000000000000001c (0x1c) [ 113.030892] ffff8881e8709ea8: ffffffffc046fd90 (nvme_irq+0x10/0x20 [nvme]) [ 113.030894] ffff8881e8709eb0: ffffffff9733ef52 (__handle_irq_event_percpu+0x252/0x620) [ 113.030895] ffff8881e8709eb8: 0000000000000002 (0x2) [ 113.030896] ffff8881e8709ec0: 0000000000000000 (0x0) [ 113.030897] ffff8881e8709ec8: ffffed1021c7930f (0xffffed1021c7930f) [ 113.030898] ffff8881e8709ed0: ffff88810ebd3eb8 (0xffff88810ebd3eb8) [ 113.030899] ffff8881e8709ed8: ffff88810ebd31c0 (0xffff88810ebd31c0) [ 113.030900] ffff8881e8709ee0: ffff88810e3c9878 (0xffff88810e3c9878) [ 113.030901] ffff8881e8709ee8: ffff88810e3c9800 (0xffff88810e3c9800) [ 113.030902] ffff8881e8709ef0: 1ffff1103d0e13e5 (0x1ffff1103d0e13e5) [ 113.030903] ffff8881e8709ef8: ffff88810e3c9800 (0xffff88810e3c9800) [ 113.030904] ffff8881e8709f00: ffff88810e3c9838 (0xffff88810e3c9838) [ 113.030906] ffff8881e8709f08: ffff88810e3c98a8 (0xffff88810e3c98a8) [ 113.030907] ffff8881e8709f10: ffff88810e3c9800 (0xffff88810e3c9800) [ 113.030908] ffff8881e8709f18: ffffffff9733f51f (handle_irq_event+0xef/0x240) [ 113.030909] ffff8881e8709f20: 0000000000000002 (0x2) [ 113.030910] ffff8881e8709f28: 0000000041b58ab3 (0x41b58ab3) [ 113.030912] ffff8881e8709f30: ffffffff99d88063 (.LC0+0x10d5/0x2d57) [ 113.030914] ffff8881e8709f38: ffffffff9733f430 (handle_irq_event_percpu+0x110/0x110) [ 113.030915] ffff8881e8709f40: 0000000000000000 (0x0) [ 113.030916] ffff8881e8709f48: 0000000000000000 (0x0) [ 113.030917] ffff8881e8709f50: ffff88810e3c98a8 (0xffff88810e3c98a8) [ 113.030918] ffff8881e8709f58: ffff88810e3c987c (0xffff88810e3c987c) [ 113.030919] ffff8881e8709f60: 0000000000000000 (0x0) [ 113.030920] ffff8881e8709f68: ffffed1021c7930e 
(0xffffed1021c7930e)
[ 113.030921] ffff8881e8709f70: ffff88810e3c9838 (0xffff88810e3c9838)
[ 113.030922] ffff8881e8709f78: ffff88810e3c987c (0xffff88810e3c987c)
[ 113.030923] ffff8881e8709f80: ffffed1021c7930f (0xffffed1021c7930f)
[ 113.030925] ffff8881e8709f88: dffffc0000000000 (0xdffffc0000000000)
[ 113.030926] ffff8881e8709f90: ffffffff9734bff6 (handle_edge_irq+0x1f6/0xb60)
[ 113.030927] ffff8881e8709f98: ffff88810e3c98a8 (0xffff88810e3c98a8)
[ 113.030928] ffff8881e8709fa0: ffff88810e3c9870 (0xffff88810e3c9870)
[ 113.030929] ffff8881e8709fa8: ffff88810e3c9840 (0xffff88810e3c9840)
[ 113.030930] ffff8881e8709fb0: ffff88810e3c9828 (0xffff88810e3c9828)
[ 113.030931] ffff8881e8709fb8: 0000000000000000 (0x0)
[ 113.030932] ffff8881e8709fc0: 0000000000000023 (0x23)
[ 113.030933] ffff8881e8709fc8: ffff88814c42fe30 (0xffff88814c42fe30)
[ 113.030934] ffff8881e8709fd0: ffff88810e3c9800 (0xffff88810e3c9800)
[ 113.030935] ffff8881e8709fd8: 0000000000000000 (0x0)
[ 113.030936] ffff8881e8709fe0: 0000000000000000 (0x0)
[ 113.030937] ffff8881e8709fe8: 0000000000000023 (0x23)
[ 113.030939] ffff8881e8709ff0: ffffffff992010b2 (asm_call_irq_on_stack+0x12/0x20)
[ 113.030940] ffff8881e8709ff8: ffff88814c42fe30 (0xffff88814c42fe30)
[ 113.030942] ffff88814c42fe30: ffff88814c42fe78 (0xffff88814c42fe78)
[ 113.030943] ffff88814c42fe38: ffffffff9918f05b (common_interrupt+0xeb/0x190)
[ 113.030944] ffff88814c42fe40: 0000000000000000 (0x0)
[ 113.030945] ffff88814c42fe48: 0000000000000000 (0x0)
[ 113.030946] ffff88814c42fe50: 0000000000000000 (0x0)
[ 113.030947] ffff88814c42fe58: 0000000000000000 (0x0)
[ 113.030948] ffff88814c42fe60: 0000000000000000 (0x0)
[ 113.030949] ffff88814c42fe68: 0000000000000000 (0x0)
[ 113.030950] ffff88814c42fe70: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40)
[ 113.030951] ffff88814c42fe78: 0000000000000000 (0x0)
[ 113.030952] ffff88814c42fe80: 0000000000000000 (0x0)
[ 113.030953] ffff88814c42fe88: 0000000000000000 (0x0)
[ 113.030954] ffff88814c42fe90: ffff88810ebd31c0 (0xffff88810ebd31c0)
[ 113.030955] ffff88814c42fe98: ffff88814c42ff58 (0xffff88814c42ff58)
[ 113.030956] ffff88814c42fea0: 0000000000000000 (0x0)
[ 113.030957] ffff88814c42fea8: 0000000000000001 (0x1)
[ 113.030958] ffff88814c42feb0: ffffed1021d7a638 (0xffffed1021d7a638)
[ 113.030959] ffff88814c42feb8: ffff88810ebd31c7 (0xffff88810ebd31c7)
[ 113.030960] ffff88814c42fec0: 0000000000000000 (0x0)
[ 113.030961] ffff88814c42fec8: 0000000000400140 (0x400140)
[ 113.030963] ffff88814c42fed0: ffffffff991a91f4 (schedule+0xe4/0x270)
[ 113.030963] ffff88814c42fed8: 0000000000000000 (0x0)
[ 113.030964] ffff88814c42fee0: 0000000000000008 (0x8)
[ 113.030965] ffff88814c42fee8: ffff88810ebd31c0 (0xffff88810ebd31c0)
[ 113.030967] ffff88814c42fef0: ffffffffffffffff (0xffffffffffffffff)
[ 113.030968] ffff88814c42fef8: ffffffff97395f70 (exit_to_user_mode_prepare+0xc0/0x1d0)
[ 113.030969] ffff88814c42ff00: 0000000000000010 (0x10)
[ 113.030970] ffff88814c42ff08: 0000000000000246 (0x246)
[ 113.030971] ffff88814c42ff10: ffff88814c42ff28 (0xffff88814c42ff28)
[ 113.030972] ffff88814c42ff18: 0000000000000018 (0x18)
[ 113.030973] ffff88814c42ff20: 0000000000000000 (0x0)
[ 113.030974] ffff88814c42ff28: 0000000000000246 (0x246)
[ 113.030975] ffff88814c42ff30: 0000000000000000 (0x0)
[ 113.030976] ffff88814c42ff38: 0000000000000000 (0x0)
[ 113.030978] ffff88814c42ff40: ffffffff99200c48 (asm_common_interrupt+0x8/0x40)
[ 113.030980] ffff88814c42ff48: ffffffff99191c95 (irqentry_exit_to_user_mode+0x5/0x40)
[ 113.030981] ffff88814c42ff50: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40)
[ 113.030982] ffff88814c42ff58: 0000000000aaa828 (0xaaa828)
[ 113.030983] ffff88814c42ff60: 0000000000001000 (0x1000)
[ 113.030984] ffff88814c42ff68: 0000000000000000 (0x0)
[ 113.030985] ffff88814c42ff70: 00007f02266d1460 (0x7f02266d1460)
[ 113.030986] ffff88814c42ff78: 0000000000aaa800 (0xaaa800)
[ 113.030987] ffff88814c42ff80: 0000000000aaa800 (0xaaa800)
[ 113.030988] ffff88814c42ff88: 0000000000000293 (0x293)
[ 113.030989] ffff88814c42ff90: 0000000000b59000 (0xb59000)
[ 113.030990] ffff88814c42ff98: 00007ffdf02c4090 (0x7ffdf02c4090)
[ 113.030991] ffff88814c42ffa0: 0000000000000000 (0x0)
[ 113.030992] ffff88814c42ffa8: 0000000000001000 (0x1000)
[ 113.030994] ffff88814c42ffb0: 00007f028a34124f (0x7f028a34124f)
[ 113.030995] ffff88814c42ffb8: 0000000000001000 (0x1000)
[ 113.030995] ffff88814c42ffc0: 0000000000abd000 (0xabd000)
[ 113.030996] ffff88814c42ffc8: 0000000000000007 (0x7)
[ 113.030998] ffff88814c42ffd0: ffffffffffffffff (0xffffffffffffffff)
[ 113.030999] ffff88814c42ffd8: 00007f028a34124f (0x7f028a34124f)
[ 113.031000] ffff88814c42ffe0: 0000000000000033 (0x33)
[ 113.031000] ffff88814c42ffe8: 0000000000000293 (0x293)
[ 113.031002] ffff88814c42fff0: 00007ffdf01f96d0 (0x7ffdf01f96d0)
[ 113.031002] ffff88814c42fff8: 000000000000002b (0x2b)
[ 115.105359] smpboot: CPU 3 is now offline
[ 115.359319] smpboot: CPU 2 is now offline
...

--
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-06 6:04 WARNING: can't access registers at asm_common_interrupt Shinichiro Kawasaki 2020-11-06 18:06 ` Josh Poimboeuf @ 2020-11-11 17:05 ` Josh Poimboeuf 2020-11-11 17:47 ` Peter Zijlstra 1 sibling, 1 reply; 32+ messages in thread From: Josh Poimboeuf @ 2020-11-11 17:05 UTC (permalink / raw) To: Shinichiro Kawasaki Cc: linux-kernel, Nicholas Piggin, Damien Le Moal, Peter Zijlstra On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote: > Greetings, > > I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40" > in my kernel test system repeatedly, which is printed by unwind_next_frame() in > "arch/x86/kernel/unwind_orc.c". Syzbot already reported that [1]. Similar > warning was reported and discussed [2], but I suppose the cause is not yet > clarified. > > The warning was observed with v5.10-rc2 and older tags. I bisected and found > that the commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3 > triggered the warning. Reverting that from 5.10-rc2, the warning disappeared. > May I ask comment by expertise on CC how this commit can relate to the warning? > > The test condition to reproduce the warning is rather unique (blktests, > dm-linear and ZNS device emulation by QEMU). If any action is suggested for > further analysis, I'm willing to take it with my test system. > > Wish this report helps. > > [1] https://lkml.org/lkml/2020/9/6/231 > [2] https://lkml.org/lkml/2020/9/8/1538 Shin'ichiro, Thanks for all the data. It looks like the ORC unwinder is getting confused by paravirt patching (with runtime-patched pushf/pop changing the stack layout). 
<user interrupt>
  exit_to_user_mode_prepare()
    exit_to_user_mode_loop()
      local_irq_disable_exit_to_user()
        local_irq_disable()
          raw_irqs_disabled()
            arch_irqs_disabled()
              arch_local_save_flags()
                pushfq
                <another interrupt>

Objtool doesn't know about the pushf/pop paravirt patch, so ORC gets
confused by the changed stack layout.

I'm thinking we either need to teach objtool how to deal with
save_fl/restore_fl patches, or we need to just get rid of those nasty
patches somehow.  Peter, any thoughts?

It looks like 044d0d6de9f5 ("lockdep: Only trace IRQ edges") is making
the problem more likely, by adding the irqs_disabled() check for every
local_irq_disable().

Also - Peter, Nicholas - is that irqs_disabled() check really necessary
in local_irq_disable()?  Presumably irqs would typically be enabled
before calling it?

--
Josh

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 17:05 ` Josh Poimboeuf @ 2020-11-11 17:47 ` Peter Zijlstra 2020-11-11 18:13 ` Josh Poimboeuf 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2020-11-11 17:47 UTC (permalink / raw) To: Josh Poimboeuf Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, andrew.cooper3, jgross On Wed, Nov 11, 2020 at 11:05:36AM -0600, Josh Poimboeuf wrote: > On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote: > > Greetings, > > > > I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40" > > in my kernel test system repeatedly, which is printed by unwind_next_frame() in > > "arch/x86/kernel/unwind_orc.c". Syzbot already reported that [1]. Similar > > warning was reported and discussed [2], but I suppose the cause is not yet > > clarified. > > > > The warning was observed with v5.10-rc2 and older tags. I bisected and found > > that the commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3 > > triggered the warning. Reverting that from 5.10-rc2, the warning disappeared. > > May I ask comment by expertise on CC how this commit can relate to the warning? > > > > The test condition to reproduce the warning is rather unique (blktests, > > dm-linear and ZNS device emulation by QEMU). If any action is suggested for > > further analysis, I'm willing to take it with my test system. > > > > Wish this report helps. > > > > [1] https://lkml.org/lkml/2020/9/6/231 > > [2] https://lkml.org/lkml/2020/9/8/1538 > > Shin'ichiro, > > Thanks for all the data. It looks like the ORC unwinder is getting > confused by paravirt patching (with runtime-patched pushf/pop changing > the stack layout). 
>
> <user interrupt>
>   exit_to_user_mode_prepare()
>     exit_to_user_mode_loop()
>       local_irq_disable_exit_to_user()
>         local_irq_disable()
>           raw_irqs_disabled()
>             arch_irqs_disabled()
>               arch_local_save_flags()
>                 pushfq
>                 <another interrupt>

This is PARAVIRT_XXL only, which is a Xen special. My preference, as
always, is to kill it... Sadly the Xen people have a different opinion.

> Objtool doesn't know about the pushf/pop paravirt patch, so ORC gets
> confused by the changed stack layout.
>
> I'm thinking we either need to teach objtool how to deal with
> save_fl/restore_fl patches, or we need to just get rid of those nasty
> patches somehow.  Peter, any thoughts?

Don't use Xen? ;-)

So with PARAVIRT_XXL the compiler will emit something like:

  "CALL *pvops.save_fl"

Which we then overwrite at runtime with "pushf; pop %[re]ax" and a few
NOPs.

Now, objtool understands alternatives, and ensures they have the same
stack layout; it has no chance in hell of understanding this, simply
because paravirt_patch.c is magic.

I don't have any immediate clever ideas, but let me ponder it a wee bit.

....

Something really disgusting we could do is recognise the indirect call
offset and emit an extra ORC entry for RIP+1. So the cases are:

  CALL *pv_ops.save_fl   -- 7 bytes IIRC
  CALL $imm;             -- 5 bytes
  PUSHF; POP %[RE]AX     -- 2 bytes

so the RIP+1 (the POP insn) will only ever exist in this case. The
indirect and direct call cases would never land on that IP.

....

> It looks like 044d0d6de9f5 ("lockdep: Only trace IRQ edges") is making
> the problem more likely, by adding the irqs_disabled() check for every
> local_irq_disable().
>
> Also - Peter, Nicholas - is that irqs_disabled() check really necessary
> in local_irq_disable()?  Presumably irqs would typically be enabled
> before calling it?

Yeah, so it's all a giant can of worms; also see:

  https://lkml.kernel.org/r/20200821084738.508092956@infradead.org

The basic idea is to only trace edges, ie. when the hardware state
actually changes.
Sadly this means doing a pushf/pop before the cli. Ideally CLI would store the old IF in CF or something like that, but alas. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 17:47 ` Peter Zijlstra @ 2020-11-11 18:13 ` Josh Poimboeuf 2020-11-11 18:46 ` Andrew Cooper 2020-11-11 20:42 ` Peter Zijlstra 0 siblings, 2 replies; 32+ messages in thread From: Josh Poimboeuf @ 2020-11-11 18:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, andrew.cooper3, jgross On Wed, Nov 11, 2020 at 06:47:36PM +0100, Peter Zijlstra wrote: > This is PARAVIRT_XXL only, which is a Xen special. My preference, as > always, is to kill it... Sadly the Xen people have a different opinion. That would be soooo nice... then we could get rid of paravirt patching altogether and replace it with static calls. > > Objtool doesn't know about the pushf/pop paravirt patch, so ORC gets > > confused by the changed stack layout. > > > > I'm thinking we either need to teach objtool how to deal with > > save_fl/restore_fl patches, or we need to just get rid of those nasty > > patches somehow. Peter, any thoughts? > > Don't use Xen? ;-) > > So with PARAVIRT_XXL the compiler will emit something like: > > "CALL *pvops.save_fl" > > Which we then overwrite at runtime with "pushf; pop %[re]ax" and a few > NOPs. > > Now, objtool understands alternatives, and ensures they have the same > stack layout, it has no chance in hell of understanding this, simply > because paravirt_patch.c is magic. > > I don't have any immediate clever ideas, but let me ponder it a wee bit. > > .... > > Something really disguisting we could do is recognise the indirect call > offset and emit an extra ORC entry for RIP+1. So the cases are: > > CALL *pv_ops.save_fl -- 7 bytes IIRC > CALL $imm; -- 5 bytes > PUSHF; POP %[RE]AX -- 2 bytes > > so the RIP+1 (the POP insn) will only ever exist in this case. The > indirect and direct call cases would never land on that IP. I had a similar idea, and a bit of deja vu - we may have talked about this before. 
At least I know we talked about doing something similar for alternatives which muck with the stack. > > It looks like 044d0d6de9f5 ("lockdep: Only trace IRQ edges") is making > > the problem more likely, by adding the irqs_disabled() check for every > > local_irq_disable(). > > > > Also - Peter, Nicholas - is that irqs_disabled() check really necessary > > in local_irq_disable()? Presumably irqs would typically be be enabled > > before calling it? > > Yeah, so it's all a giant can of worms that; also see: > > https://lkml.kernel.org/r/20200821084738.508092956@infradead.org > > The basic idea is to only trace edges, ie. when the hardware state > actually changes. Sadly this means doing a pushf/pop before the cli. > Ideally CLI would store the old IF in CF or something like that, but > alas. Right, that makes sense for save/restore, but is the disabled check really needed for local_irq_disable()? Wouldn't that always be an edge? And anyway I don't see a similar check for local_irq_enable(). -- Josh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 18:13 ` Josh Poimboeuf @ 2020-11-11 18:46 ` Andrew Cooper 2020-11-11 19:42 ` Peter Zijlstra 2020-11-11 20:42 ` Peter Zijlstra 1 sibling, 1 reply; 32+ messages in thread From: Andrew Cooper @ 2020-11-11 18:46 UTC (permalink / raw) To: Josh Poimboeuf, Peter Zijlstra Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, jgross On 11/11/2020 18:13, Josh Poimboeuf wrote: > On Wed, Nov 11, 2020 at 06:47:36PM +0100, Peter Zijlstra wrote: >> This is PARAVIRT_XXL only, which is a Xen special. My preference, as >> always, is to kill it... Sadly the Xen people have a different opinion. > That would be soooo nice... then we could get rid of paravirt patching > altogether and replace it with static calls. > >>> Objtool doesn't know about the pushf/pop paravirt patch, so ORC gets >>> confused by the changed stack layout. >>> >>> I'm thinking we either need to teach objtool how to deal with >>> save_fl/restore_fl patches, or we need to just get rid of those nasty >>> patches somehow. Peter, any thoughts? >> Don't use Xen? ;-) >> >> So with PARAVIRT_XXL the compiler will emit something like: >> >> "CALL *pvops.save_fl" >> >> Which we then overwrite at runtime with "pushf; pop %[re]ax" and a few >> NOPs. >> >> Now, objtool understands alternatives, and ensures they have the same >> stack layout, it has no chance in hell of understanding this, simply >> because paravirt_patch.c is magic. >> >> I don't have any immediate clever ideas, but let me ponder it a wee bit. Well... static_calls are a newer, and more generic, form of pvops. Most of the magic is to do with inlining small fragments, but static calls can do that now too, IIRC? >> .... >> >> Something really disguisting we could do is recognise the indirect call >> offset and emit an extra ORC entry for RIP+1. 
>> So the cases are:
>>
>>   CALL *pv_ops.save_fl   -- 7 bytes IIRC
>>   CALL $imm;             -- 5 bytes
>>   PUSHF; POP %[RE]AX     -- 2 bytes
>>
>> so the RIP+1 (the POP insn) will only ever exist in this case. The
>> indirect and direct call cases would never land on that IP.
> I had a similar idea, and a bit of deja vu - we may have talked about
> this before.  At least I know we talked about doing something similar
> for alternatives which muck with the stack.

The main complexity with pvops is that the

  CALL *pv_ops.save_fl

form needs to be usable from extremely early in the day (pre general
patching), hence the use of function pointers and some non-standard ABIs.

For performance reasons, the end result of this pvop wants to be `pushf;
pop %[re]ax` in the native case, and `call xen_pv_save_fl` in the Xen
case, but this doesn't mean that the compiled instruction needs to be a
function pointer to begin with.

Would objtool have an easier time coping if this were implemented in
terms of a static call?

~Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 18:46 ` Andrew Cooper @ 2020-11-11 19:42 ` Peter Zijlstra 2020-11-11 19:59 ` Josh Poimboeuf 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2020-11-11 19:42 UTC (permalink / raw) To: Andrew Cooper Cc: Josh Poimboeuf, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, jgross, x86 On Wed, Nov 11, 2020 at 06:46:37PM +0000, Andrew Cooper wrote: > Well... > > static_calls are a newer, and more generic, form of pvops. Most of the > magic is to do with inlining small fragments, but static calls can do > that now too, IIRC? If you're referring to this glorious hack: https://lkml.kernel.org/r/20201110101307.GO2651@hirez.programming.kicks-ass.net that only 'works' because it's a single instruction. That is, static_call can only poke single instructions. They cannot replace a call with "PUSHF; POP" / "PUSH; POPF" for example. They also cannot do NOP padding for 'short' sequences. Paravirt, like alternatives, are special in that they only happen once, before SMP bringup. > >> Something really disguisting we could do is recognise the indirect call > >> offset and emit an extra ORC entry for RIP+1. So the cases are: > >> > >> CALL *pv_ops.save_fl -- 7 bytes IIRC > >> CALL $imm; -- 5 bytes > >> PUSHF; POP %[RE]AX -- 2 bytes > >> > >> so the RIP+1 (the POP insn) will only ever exist in this case. The > >> indirect and direct call cases would never land on that IP. > > I had a similar idea, and a bit of deja vu - we may have talked about > > this before. At least I know we talked about doing something similar > > for alternatives which muck with the stack. Vague memories... luckily we managed to get alternatives to a state where they match, which is much saner. 
> The main complexity with pvops is that the
>
>   CALL *pv_ops.save_fl
>
> form needs to be usable from extremely early in the day (pre general
> patching), hence the use of function pointers and some non-standard ABIs.

The performance reasons mentioned below are a large part of the
non-standard ABI (e.g. CALLEE_SAVE).

> For performance reasons, the end result of this pvop wants to be `pushf;
> pop %[re]ax` in the native case, and `call xen_pv_save_fl` in the Xen
> case, but this doesn't mean that the compiled instruction needs to be a
> function pointer to begin with.

Not sure emitting the native code would be feasible... also
cpu_usergs_sysret64 is 6 bytes.

> Would objtool have an easier time coping if this were implemented in
> terms of a static call?

I doubt it; the big problem is that there is no visibility into the
actual alternative text. Runtime patching fragments into a static call
would have the exact same problem.

Something that _might_ maybe work is trying to morph the immediate
fragments into an alternative. That is, instead of this:

  static inline notrace unsigned long arch_local_save_flags(void)
  {
	return PVOP_CALLEE0(unsigned long, irq.save_fl);
  }

Write it something like:

  static inline notrace unsigned long arch_local_save_flags(void)
  {
	PVOP_CALL_ARGS;
	PVOP_TEST_NULL(irq.save_fl);
	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
					"PUSHF; POP _ASM_AX",
					X86_FEATURE_NATIVE)
			    : CLBR_RET_REG, ASM_CALL_CONSTRAINT
			    : paravirt_type(irq.save_fl.func),
			      paravirt_clobber(PVOP_CALLEE_CLOBBERS)
			    : "memory", "cc");
	return __eax;
  }

And then we have to teach objtool how to deal with conflicting
alternatives...

That would remove most (all, if we can figure out a form that deals with
the spinlock fragments) of paravirt_patch.c

Hmm?

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 19:42 ` Peter Zijlstra @ 2020-11-11 19:59 ` Josh Poimboeuf 2020-11-11 20:07 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Josh Poimboeuf @ 2020-11-11 19:59 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Cooper, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, jgross, x86 On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote: > > Would objtool have an easier time coping if this were implemented in > > terms of a static call? > > I doubt it, the big problem is that there is no visibility into the > actual alternative text. Runtime patching fragments into static call > would have the exact same problem. > > Something that _might_ maybe work is trying to morph the immediate > fragments into an alternative. That is, instead of this: > > static inline notrace unsigned long arch_local_save_flags(void) > { > return PVOP_CALLEE0(unsigned long, irq.save_fl); > } > > Write it something like: > > static inline notrace unsigned long arch_local_save_flags(void) > { > PVOP_CALL_ARGS; > PVOP_TEST_NULL(irq.save_fl); > asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), > "PUSHF; POP _ASM_AX", > X86_FEATURE_NATIVE) > : CLBR_RET_REG, ASM_CALL_CONSTRAINT > : paravirt_type(irq.save_fl.func), > paravirt_clobber(PVOP_CALLEE_CLOBBERS) > : "memory", "cc"); > return __eax; > } > > And then we have to teach objtool how to deal with conflicting > alternatives... > > That would remove most (all, if we can figure out a form that deals with > the spinlock fragments) of paravirt_patch.c > > Hmm? I was going to suggest something similar. Though I would try to take it further and replace paravirt_patch_default() with static calls. Either way it doesn't make objtool's job much easier. But it would be nice to consolidate runtime patching mechanisms and get rid of .parainstructions. -- Josh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 19:59 ` Josh Poimboeuf @ 2020-11-11 20:07 ` Peter Zijlstra 2020-11-11 20:15 ` Josh Poimboeuf 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2020-11-11 20:07 UTC (permalink / raw) To: Josh Poimboeuf Cc: Andrew Cooper, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, jgross, x86 On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote: > On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote: > > > Would objtool have an easier time coping if this were implemented in > > > terms of a static call? > > > > I doubt it, the big problem is that there is no visibility into the > > actual alternative text. Runtime patching fragments into static call > > would have the exact same problem. > > > > Something that _might_ maybe work is trying to morph the immediate > > fragments into an alternative. That is, instead of this: > > > > static inline notrace unsigned long arch_local_save_flags(void) > > { > > return PVOP_CALLEE0(unsigned long, irq.save_fl); > > } > > > > Write it something like: > > > > static inline notrace unsigned long arch_local_save_flags(void) > > { > > PVOP_CALL_ARGS; > > PVOP_TEST_NULL(irq.save_fl); > > asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), > > "PUSHF; POP _ASM_AX", > > X86_FEATURE_NATIVE) > > : CLBR_RET_REG, ASM_CALL_CONSTRAINT > > : paravirt_type(irq.save_fl.func), > > paravirt_clobber(PVOP_CALLEE_CLOBBERS) > > : "memory", "cc"); > > return __eax; > > } > > > > And then we have to teach objtool how to deal with conflicting > > alternatives... > > > > That would remove most (all, if we can figure out a form that deals with > > the spinlock fragments) of paravirt_patch.c > > > > Hmm? > > I was going to suggest something similar. Though I would try to take it > further and replace paravirt_patch_default() with static calls. Possible, we just need to be _really_ careful to not allow changing those static_call()s. 
So maybe we need DEFINE_STATIC_CALL_RO() which does a __ro_after_init on the whole thing. > Either way it doesn't make objtool's job much easier. But it would be > nice to consolidate runtime patching mechanisms and get rid of > .parainstructions. I think the above (combining alternative and paravirt/static_call) does make objtool's job easier, since then we at least have the actual alternative instructions available to inspect, or am I mis-understanding things? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 20:07 ` Peter Zijlstra @ 2020-11-11 20:15 ` Josh Poimboeuf 2020-11-11 20:25 ` Andrew Cooper 2020-11-11 20:35 ` Peter Zijlstra 0 siblings, 2 replies; 32+ messages in thread From: Josh Poimboeuf @ 2020-11-11 20:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Cooper, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, jgross, x86 On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote: > On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote: > > On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote: > > > > Would objtool have an easier time coping if this were implemented in > > > > terms of a static call? > > > > > > I doubt it, the big problem is that there is no visibility into the > > > actual alternative text. Runtime patching fragments into static call > > > would have the exact same problem. > > > > > > Something that _might_ maybe work is trying to morph the immediate > > > fragments into an alternative. That is, instead of this: > > > > > > static inline notrace unsigned long arch_local_save_flags(void) > > > { > > > return PVOP_CALLEE0(unsigned long, irq.save_fl); > > > } > > > > > > Write it something like: > > > > > > static inline notrace unsigned long arch_local_save_flags(void) > > > { > > > PVOP_CALL_ARGS; > > > PVOP_TEST_NULL(irq.save_fl); > > > asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), > > > "PUSHF; POP _ASM_AX", > > > X86_FEATURE_NATIVE) > > > : CLBR_RET_REG, ASM_CALL_CONSTRAINT > > > : paravirt_type(irq.save_fl.func), > > > paravirt_clobber(PVOP_CALLEE_CLOBBERS) > > > : "memory", "cc"); > > > return __eax; > > > } > > > > > > And then we have to teach objtool how to deal with conflicting > > > alternatives... > > > > > > That would remove most (all, if we can figure out a form that deals with > > > the spinlock fragments) of paravirt_patch.c > > > > > > Hmm? > > > > I was going to suggest something similar. 
Though I would try to take it > > further and replace paravirt_patch_default() with static calls. > > Possible, we just need to be _really_ careful to not allow changing > those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which > does a __ro_after_init on the whole thing. But what if you want to live migrate to another hypervisor ;-) > > Either way it doesn't make objtool's job much easier. But it would be > > nice to consolidate runtime patching mechanisms and get rid of > > .parainstructions. > > I think the above (combining alternative and paravirt/static_call) does > make objtool's job easier, since then we at least have the actual > alternative instructions available to inspect, or am I mis-understanding > things? Right, it makes objtool's job a _little_ easier, since it already knows how to read alternatives. But it still has to learn to deal with the conflicting stack layouts. -- Josh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 20:15 ` Josh Poimboeuf @ 2020-11-11 20:25 ` Andrew Cooper 2020-11-11 20:39 ` Peter Zijlstra 2020-11-13 17:34 ` Andy Lutomirski 2020-11-11 20:35 ` Peter Zijlstra 1 sibling, 2 replies; 32+ messages in thread From: Andrew Cooper @ 2020-11-11 20:25 UTC (permalink / raw) To: Josh Poimboeuf, Peter Zijlstra Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, jgross, x86 On 11/11/2020 20:15, Josh Poimboeuf wrote: > On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote: >> On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote: >>> On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote: >>>>> Would objtool have an easier time coping if this were implemented in >>>>> terms of a static call? >>>> I doubt it, the big problem is that there is no visibility into the >>>> actual alternative text. Runtime patching fragments into static call >>>> would have the exact same problem. >>>> >>>> Something that _might_ maybe work is trying to morph the immediate >>>> fragments into an alternative. That is, instead of this: >>>> >>>> static inline notrace unsigned long arch_local_save_flags(void) >>>> { >>>> return PVOP_CALLEE0(unsigned long, irq.save_fl); >>>> } >>>> >>>> Write it something like: >>>> >>>> static inline notrace unsigned long arch_local_save_flags(void) >>>> { >>>> PVOP_CALL_ARGS; >>>> PVOP_TEST_NULL(irq.save_fl); >>>> asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), >>>> "PUSHF; POP _ASM_AX", >>>> X86_FEATURE_NATIVE) >>>> : CLBR_RET_REG, ASM_CALL_CONSTRAINT >>>> : paravirt_type(irq.save_fl.func), >>>> paravirt_clobber(PVOP_CALLEE_CLOBBERS) >>>> : "memory", "cc"); >>>> return __eax; >>>> } >>>> >>>> And then we have to teach objtool how to deal with conflicting >>>> alternatives... >>>> >>>> That would remove most (all, if we can figure out a form that deals with >>>> the spinlock fragments) of paravirt_patch.c >>>> >>>> Hmm? 
>>> I was going to suggest something similar. Though I would try to take it >>> further and replace paravirt_patch_default() with static calls. >> Possible, we just need to be _really_ careful to not allow changing >> those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which >> does a __ro_after_init on the whole thing. > But what if you want to live migrate to another hypervisor ;-) The same as what happens currently. The user gets to keep all the resulting pieces ;) >>> Either way it doesn't make objtool's job much easier. But it would be >>> nice to consolidate runtime patching mechanisms and get rid of >>> .parainstructions. >> I think the above (combining alternative and paravirt/static_call) does >> make objtool's job easier, since then we at least have the actual >> alternative instructions available to inspect, or am I mis-understanding >> things? > Right, it makes objtool's job a _little_ easier, since it already knows > how to read alternatives. But it still has to learn to deal with the > conflicting stack layouts. I suppose the needed abstraction is "these blocks will start and end with the same stack layout", while allowing the internals to diverge. ~Andrew ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 20:25 ` Andrew Cooper @ 2020-11-11 20:39 ` Peter Zijlstra 0 siblings, 0 replies; 32+ messages in thread From: Peter Zijlstra @ 2020-11-11 20:39 UTC (permalink / raw) To: Andrew Cooper Cc: Josh Poimboeuf, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, jgross, x86 On Wed, Nov 11, 2020 at 08:25:36PM +0000, Andrew Cooper wrote:
> > Right, it makes objtool's job a _little_ easier, since it already knows
> > how to read alternatives. But it still has to learn to deal with the
> > conflicting stack layouts.
>
> I suppose the needed abstraction is "these blocks will start and end
> with the same stack layout", while allowing the internals to diverge.

It's a little more complicated than that due to the fact that there is
only a single ORC table. So something like:

	alt0			alt1

	0x00	push		0x00	nop
	0x01	nop		0x01	push

is impossible to correctly unwind.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-11 20:25 ` Andrew Cooper 2020-11-11 20:39 ` Peter Zijlstra @ 2020-11-13 17:34 ` Andy Lutomirski 2020-11-14 9:16 ` Jürgen Groß 2020-11-16 11:56 ` Jürgen Groß 1 sibling, 2 replies; 32+ messages in thread From: Andy Lutomirski @ 2020-11-13 17:34 UTC (permalink / raw) To: Andrew Cooper Cc: Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, Juergen Gross, X86 ML On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > On 11/11/2020 20:15, Josh Poimboeuf wrote: > > On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote: > >> On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote: > >>> On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote: > >>>>> Would objtool have an easier time coping if this were implemented in > >>>>> terms of a static call? > >>>> I doubt it, the big problem is that there is no visibility into the > >>>> actual alternative text. Runtime patching fragments into static call > >>>> would have the exact same problem. > >>>> > >>>> Something that _might_ maybe work is trying to morph the immediate > >>>> fragments into an alternative. 
That is, instead of this: > >>>> > >>>> static inline notrace unsigned long arch_local_save_flags(void) > >>>> { > >>>> return PVOP_CALLEE0(unsigned long, irq.save_fl); > >>>> } > >>>> > >>>> Write it something like: > >>>> > >>>> static inline notrace unsigned long arch_local_save_flags(void) > >>>> { > >>>> PVOP_CALL_ARGS; > >>>> PVOP_TEST_NULL(irq.save_fl); > >>>> asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), > >>>> "PUSHF; POP _ASM_AX", > >>>> X86_FEATURE_NATIVE) > >>>> : CLBR_RET_REG, ASM_CALL_CONSTRAINT > >>>> : paravirt_type(irq.save_fl.func), > >>>> paravirt_clobber(PVOP_CALLEE_CLOBBERS) > >>>> : "memory", "cc"); > >>>> return __eax; > >>>> } > >>>> > >>>> And then we have to teach objtool how to deal with conflicting > >>>> alternatives... > >>>> > >>>> That would remove most (all, if we can figure out a form that deals with > >>>> the spinlock fragments) of paravirt_patch.c > >>>> > >>>> Hmm? > >>> I was going to suggest something similar. Though I would try to take it > >>> further and replace paravirt_patch_default() with static calls. > >> Possible, we just need to be _really_ careful to not allow changing > >> those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which > >> does a __ro_after_init on the whole thing. > > But what if you want to live migrate to another hypervisor ;-) > > The same as what happens currently. The user gets to keep all the > resulting pieces ;) > > >>> Either way it doesn't make objtool's job much easier. But it would be > >>> nice to consolidate runtime patching mechanisms and get rid of > >>> .parainstructions. > >> I think the above (combining alternative and paravirt/static_call) does > >> make objtool's job easier, since then we at least have the actual > >> alternative instructions available to inspect, or am I mis-understanding > >> things? > > Right, it makes objtool's job a _little_ easier, since it already knows > > how to read alternatives. 
But it still has to learn to deal with the > > conflicting stack layouts. > > I suppose the needed abstraction is "these blocks will start and end > with the same stack layout", while allowing the internals to diverge. > How much of this stuff is actually useful anymore? I'm wondering if we can move most or all of this crud to cpu_feature_enabled(X86_FEATURE_XENPV) and its asm equivalent. The full list, annotated, appears to be: const unsigned char irq_irq_disable[1]; This is CLI or CALL, right? const unsigned char irq_irq_enable[1]; STI or CALL. const unsigned char irq_save_fl[2]; PUSHF; POP %r/eax. I *think* I read the paravirt mess correctly and this also turns into CALL. const unsigned char mmu_read_cr2[3]; const unsigned char mmu_read_cr3[3]; const unsigned char mmu_write_cr3[3]; The write CR3 is so slow that I can't imagine us caring. Reading CR3 should already be fairly optimized because it's slow on old non-PV hypervisors, too. Reading CR2 is rare and lives in asm. These also appear to just switch between MOV and CALL, anyway. const unsigned char irq_restore_fl[2]; Ugh, this one sucks. IMO it should be, for native and PV: if (flags & X86_EFLAGS_IF) { local_irq_enable(); /* or raw? */ } else { if (some debugging option) { WARN_ON_ONCE(save_fl() & X86_EFLAGS_IF); } } POPF is slooooow. const unsigned char cpu_wbinvd[2]; This is hilariously slow no matter what. static_call() or even just a plain old indirect call should be fine. const unsigned char cpu_usergs_sysret64[6]; This is in the asm and we shouldn't be doing it at all for Xen PV. IOW we should just drop this patch site entirely. I can possibly find some time to get rid of it, and maybe someone from Xen land can help. I bet that we can gain a lot of perf on Xen PV by cleaning this up, and I bet it will simplify everything. const unsigned char cpu_swapgs[3]; This is SWAPGS or nop, unless I've missed some subtlety. const unsigned char mov64[3]; This is some PTE magic, and I haven't deciphered it yet. 
So I think there is at most one of these that wants anything more complicated than a plain ALTERNATIVE. Any volunteers to make it so? Juergen, if you do all of them except USERGS_SYSRET64, I hereby volunteer to do that one. BTW, if y'all want to live migrate between Xen PV and anything else, you are nuts. ^ permalink raw reply [flat|nested] 32+ messages in thread
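The `irq_restore_fl` replacement Andy sketches above can be modelled in plain C. This is a user-space simulation for illustration only: `irqs_on`, `local_irq_enable()` and `save_fl()` are stand-ins for the real kernel primitives, and the warning path corresponds to his "some debugging option" branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define X86_EFLAGS_IF 0x200UL	/* interrupt-enable flag, bit 9 */

/* Stand-in for the CPU's interrupt-enable state. */
static bool irqs_on;

static void local_irq_enable(void)
{
	irqs_on = true;
}

static unsigned long save_fl(void)
{
	return irqs_on ? X86_EFLAGS_IF : 0;
}

/*
 * Model of the proposed arch_local_irq_restore(): only ever *enable*
 * interrupts (avoiding a slow POPF), and optionally warn when asked
 * to "restore to disabled" while interrupts are already enabled.
 */
static void local_irq_restore(unsigned long flags)
{
	if (flags & X86_EFLAGS_IF) {
		local_irq_enable();
	} else {
		/* debugging check: restoring "off" should find them off */
		if (save_fl() & X86_EFLAGS_IF)
			fprintf(stderr, "WARN: IRQs unexpectedly enabled\n");
	}
}
```

The point of the shape is that the restore path never needs to write EFLAGS wholesale; it either enables interrupts or (in a debug build) checks that they were already off.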
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-13 17:34 ` Andy Lutomirski @ 2020-11-14 9:16 ` Jürgen Groß 2020-11-14 18:10 ` Andy Lutomirski 2020-11-16 11:56 ` Jürgen Groß 1 sibling, 1 reply; 32+ messages in thread From: Jürgen Groß @ 2020-11-14 9:16 UTC (permalink / raw) To: Andy Lutomirski, Andrew Cooper Cc: Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML [-- Attachment #1.1.1: Type: text/plain, Size: 6143 bytes --] On 13.11.20 18:34, Andy Lutomirski wrote: > On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper > <andrew.cooper3@citrix.com> wrote: >> >> On 11/11/2020 20:15, Josh Poimboeuf wrote: >>> On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote: >>>> On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote: >>>>> On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote: >>>>>>> Would objtool have an easier time coping if this were implemented in >>>>>>> terms of a static call? >>>>>> I doubt it, the big problem is that there is no visibility into the >>>>>> actual alternative text. Runtime patching fragments into static call >>>>>> would have the exact same problem. >>>>>> >>>>>> Something that _might_ maybe work is trying to morph the immediate >>>>>> fragments into an alternative. 
That is, instead of this: >>>>>> >>>>>> static inline notrace unsigned long arch_local_save_flags(void) >>>>>> { >>>>>> return PVOP_CALLEE0(unsigned long, irq.save_fl); >>>>>> } >>>>>> >>>>>> Write it something like: >>>>>> >>>>>> static inline notrace unsigned long arch_local_save_flags(void) >>>>>> { >>>>>> PVOP_CALL_ARGS; >>>>>> PVOP_TEST_NULL(irq.save_fl); >>>>>> asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), >>>>>> "PUSHF; POP _ASM_AX", >>>>>> X86_FEATURE_NATIVE) >>>>>> : CLBR_RET_REG, ASM_CALL_CONSTRAINT >>>>>> : paravirt_type(irq.save_fl.func), >>>>>> paravirt_clobber(PVOP_CALLEE_CLOBBERS) >>>>>> : "memory", "cc"); >>>>>> return __eax; >>>>>> } >>>>>> >>>>>> And then we have to teach objtool how to deal with conflicting >>>>>> alternatives... >>>>>> >>>>>> That would remove most (all, if we can figure out a form that deals with >>>>>> the spinlock fragments) of paravirt_patch.c >>>>>> >>>>>> Hmm? >>>>> I was going to suggest something similar. Though I would try to take it >>>>> further and replace paravirt_patch_default() with static calls. >>>> Possible, we just need to be _really_ careful to not allow changing >>>> those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which >>>> does a __ro_after_init on the whole thing. >>> But what if you want to live migrate to another hypervisor ;-) >> >> The same as what happens currently. The user gets to keep all the >> resulting pieces ;) >> >>>>> Either way it doesn't make objtool's job much easier. But it would be >>>>> nice to consolidate runtime patching mechanisms and get rid of >>>>> .parainstructions. >>>> I think the above (combining alternative and paravirt/static_call) does >>>> make objtool's job easier, since then we at least have the actual >>>> alternative instructions available to inspect, or am I mis-understanding >>>> things? >>> Right, it makes objtool's job a _little_ easier, since it already knows >>> how to read alternatives. 
But it still has to learn to deal with the >>> conflicting stack layouts. >> >> I suppose the needed abstraction is "these blocks will start and end >> with the same stack layout", while allowing the internals to diverge. >> > > How much of this stuff is actually useful anymore? I'm wondering if > we can move most or all of this crud to > cpu_feature_enabled(X86_FEATURE_XENPV) and its asm equivalent. The > full list, annotated, appears to be: > > const unsigned char irq_irq_disable[1]; > > This is CLI or CALL, right? Yes. > > const unsigned char irq_irq_enable[1]; > > STI or CALL. Yes. > > const unsigned char irq_save_fl[2]; > > PUSHF; POP %r/eax. I *think* I read the paravirt mess correctly and > this also turns into CALL. It does. > > const unsigned char mmu_read_cr2[3]; > const unsigned char mmu_read_cr3[3]; > const unsigned char mmu_write_cr3[3]; > > The write CR3 is so slow that I can't imagine us caring. Reading CR3 > should already be fairly optimized because it's slow on old non-PV > hypervisors, too. Reading CR2 is rare and lives in asm. These also > appear to just switch between MOV and CALL, anyway. Correct. > > const unsigned char irq_restore_fl[2]; > > Ugh, this one sucks. IMO it should be, for native and PV: > > if (flags & X86_EFLAGS_IF) { > local_irq_enable(); /* or raw? */ > } else { > if (some debugging option) { > WARN_ON_ONCE(save_fl() & X86_EFLAGS_IF); > } > } Seems sensible. > > POPF is slooooow. > > const unsigned char cpu_wbinvd[2]; > > This is hilariously slow no matter what. static_call() or even just a > plain old indirect call should be fine. I'd go with the static_call(). > > const unsigned char cpu_usergs_sysret64[6]; > > This is in the asm and we shouldn't be doing it at all for Xen PV. > IOW we should just drop this patch site entirely. I can possibly find > some time to get rid of it, and maybe someone from Xen land can help. 
> I bet that we can gain a lot of perf on Xen PV by cleaning this up, > and I bet it will simplify everything. > > const unsigned char cpu_swapgs[3]; > > This is SWAPGS or nop, unless I've missed some subtlety. > > const unsigned char mov64[3]; > > This is some PTE magic, and I haven't deciphered it yet. Either a mov or a call. > > So I think there is at most one of these that wants anything more > complicated than a plain ALTERNATIVE. Any volunteers to make it so? > Juergen, if you do all of them except USERGS_SYSRET64, I hereby > volunteer to do that one. Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64 depending on X86_FEATURE_XENPV) no option? Its not as if this code would run before alternative patching. > > BTW, if y'all want to live migrate between Xen PV and anything else, > you are nuts. > That's no option. Xen PV is a guest property, not one of the hypervisor. Juergen [-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --] [-- Type: application/pgp-keys, Size: 3135 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-14 9:16 ` Jürgen Groß @ 2020-11-14 18:10 ` Andy Lutomirski 2020-11-15 6:33 ` Jürgen Groß 0 siblings, 1 reply; 32+ messages in thread From: Andy Lutomirski @ 2020-11-14 18:10 UTC (permalink / raw) To: Jürgen Groß Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML On Sat, Nov 14, 2020 at 1:16 AM Jürgen Groß <jgross@suse.com> wrote: > > On 13.11.20 18:34, Andy Lutomirski wrote: > > On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper > > <andrew.cooper3@citrix.com> wrote: > > > > So I think there is at most one of these that wants anything more > > complicated than a plain ALTERNATIVE. Any volunteers to make it so? > > Juergen, if you do all of them except USERGS_SYSRET64, I hereby > > volunteer to do that one. > > Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64 > depending on X86_FEATURE_XENPV) no option? > > Its not as if this code would run before alternative patching. ALTERNATIVE would "work" in the sense that it would function and be just about as nonsensical as the current code. Fundamentally, Xen PV's sysret feature is not a drop-in replacement for SYSRET64, and pretending that it is seems unlikely to work well. I suspect that the current code is some combination of exceedingly slow, non-functional, and incorrect in subtle ways. We should just have a separate Xen PV exit path the same way we have a separate entry path in recent kernels. *This* is what I'm volunteering to do. --Andy ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-14 18:10 ` Andy Lutomirski @ 2020-11-15 6:33 ` Jürgen Groß 2020-11-15 16:05 ` Andy Lutomirski 0 siblings, 1 reply; 32+ messages in thread From: Jürgen Groß @ 2020-11-15 6:33 UTC (permalink / raw) To: Andy Lutomirski Cc: Andrew Cooper, Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML [-- Attachment #1.1.1: Type: text/plain, Size: 1486 bytes --] On 14.11.20 19:10, Andy Lutomirski wrote: > On Sat, Nov 14, 2020 at 1:16 AM Jürgen Groß <jgross@suse.com> wrote: >> >> On 13.11.20 18:34, Andy Lutomirski wrote: >>> On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper >>> <andrew.cooper3@citrix.com> wrote: >>> >>> So I think there is at most one of these that wants anything more >>> complicated than a plain ALTERNATIVE. Any volunteers to make it so? >>> Juergen, if you do all of them except USERGS_SYSRET64, I hereby >>> volunteer to do that one. >> >> Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64 >> depending on X86_FEATURE_XENPV) no option? >> >> Its not as if this code would run before alternative patching. > > ALTERNATIVE would "work" in the sense that it would function and be > just about as nonsensical as the current code. Fundamentally, Xen > PV's sysret feature is not a drop-in replacement for SYSRET64, and > pretending that it is seems unlikely to work well. I suspect that the > current code is some combination of exceedingly slow, non-functional, > and incorrect in subtle ways. > > We should just have a separate Xen PV exit path the same way we have a > separate entry path in recent kernels. *This* is what I'm > volunteering to do. I don't think there is much work needed. Xen PV does basically a return to user mode via IRET. I think it might work just to use swapgs_restore_regs_and_return_to_usermode() unconditionally for Xen PV. 
Juergen ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-15 6:33 ` Jürgen Groß @ 2020-11-15 16:05 ` Andy Lutomirski 2020-11-15 16:13 ` Jürgen Groß 0 siblings, 1 reply; 32+ messages in thread From: Andy Lutomirski @ 2020-11-15 16:05 UTC (permalink / raw) To: Jürgen Groß Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML > On Nov 14, 2020, at 10:33 PM, Jürgen Groß <jgross@suse.com> wrote: > > On 14.11.20 19:10, Andy Lutomirski wrote: >>> On Sat, Nov 14, 2020 at 1:16 AM Jürgen Groß <jgross@suse.com> wrote: >>> >>> On 13.11.20 18:34, Andy Lutomirski wrote: >>>> On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper >>>> <andrew.cooper3@citrix.com> wrote: >>>> >>>> So I think there is at most one of these that wants anything more >>>> complicated than a plain ALTERNATIVE. Any volunteers to make it so? >>>> Juergen, if you do all of them except USERGS_SYSRET64, I hereby >>>> volunteer to do that one. >>> >>> Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64 >>> depending on X86_FEATURE_XENPV) no option? >>> >>> Its not as if this code would run before alternative patching. >> ALTERNATIVE would "work" in the sense that it would function and be >> just about as nonsensical as the current code. Fundamentally, Xen >> PV's sysret feature is not a drop-in replacement for SYSRET64, and >> pretending that it is seems unlikely to work well. I suspect that the >> current code is some combination of exceedingly slow, non-functional, >> and incorrect in subtle ways. >> We should just have a separate Xen PV exit path the same way we have a >> separate entry path in recent kernels. *This* is what I'm >> volunteering to do. > > I don't think there is much work needed. Xen PV does basically a return > to user mode via IRET. I think it might work just to use > swapgs_restore_regs_and_return_to_usermode() unconditionally for Xen PV. 
> I’m quite confident that will work, but I was hoping to get it to work quickly too :) > > Juergen ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-15 16:05 ` Andy Lutomirski @ 2020-11-15 16:13 ` Jürgen Groß 0 siblings, 0 replies; 32+ messages in thread From: Jürgen Groß @ 2020-11-15 16:13 UTC (permalink / raw) To: Andy Lutomirski Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML [-- Attachment #1.1.1: Type: text/plain, Size: 1863 bytes --] On 15.11.20 17:05, Andy Lutomirski wrote: > >> On Nov 14, 2020, at 10:33 PM, Jürgen Groß <jgross@suse.com> wrote: >> >> On 14.11.20 19:10, Andy Lutomirski wrote: >>>> On Sat, Nov 14, 2020 at 1:16 AM Jürgen Groß <jgross@suse.com> wrote: >>>> >>>> On 13.11.20 18:34, Andy Lutomirski wrote: >>>>> On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper >>>>> <andrew.cooper3@citrix.com> wrote: >>>>> >>>>> So I think there is at most one of these that wants anything more >>>>> complicated than a plain ALTERNATIVE. Any volunteers to make it so? >>>>> Juergen, if you do all of them except USERGS_SYSRET64, I hereby >>>>> volunteer to do that one. >>>> >>>> Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64 >>>> depending on X86_FEATURE_XENPV) no option? >>>> >>>> Its not as if this code would run before alternative patching. >>> ALTERNATIVE would "work" in the sense that it would function and be >>> just about as nonsensical as the current code. Fundamentally, Xen >>> PV's sysret feature is not a drop-in replacement for SYSRET64, and >>> pretending that it is seems unlikely to work well. I suspect that the >>> current code is some combination of exceedingly slow, non-functional, >>> and incorrect in subtle ways. >>> We should just have a separate Xen PV exit path the same way we have a >>> separate entry path in recent kernels. *This* is what I'm >>> volunteering to do. >> >> I don't think there is much work needed. Xen PV does basically a return >> to user mode via IRET. 
I think it might work just to use >> swapgs_restore_regs_and_return_to_usermode() unconditionally for Xen PV. >> > > I’m quite confident that will work, but I was hoping to get it to work quickly too :) The PV interface requires using the iret hypercall, because there is no sysret equivalent. Juergen ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-13 17:34 ` Andy Lutomirski 2020-11-14 9:16 ` Jürgen Groß @ 2020-11-16 11:56 ` Jürgen Groß 2020-11-16 13:04 ` Peter Zijlstra 1 sibling, 1 reply; 32+ messages in thread From: Jürgen Groß @ 2020-11-16 11:56 UTC (permalink / raw) To: Andy Lutomirski, Andrew Cooper Cc: Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML [-- Attachment #1.1.1: Type: text/plain, Size: 1727 bytes --] On 13.11.20 18:34, Andy Lutomirski wrote: > On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper > <andrew.cooper3@citrix.com> wrote: >> >> On 11/11/2020 20:15, Josh Poimboeuf wrote: >>> On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote: >>>> On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote: >>>>> On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote: >>>>>>> Would objtool have an easier time coping if this were implemented in >>>>>>> terms of a static call? >>>>>> I doubt it, the big problem is that there is no visibility into the >>>>>> actual alternative text. Runtime patching fragments into static call >>>>>> would have the exact same problem. >>>>>> >>>>>> Something that _might_ maybe work is trying to morph the immediate >>>>>> fragments into an alternative. That is, instead of this: >>>>>> >>>>>> static inline notrace unsigned long arch_local_save_flags(void) >>>>>> { >>>>>> return PVOP_CALLEE0(unsigned long, irq.save_fl); >>>>>> } >>>>>> >>>>>> Write it something like: >>>>>> >>>>>> static inline notrace unsigned long arch_local_save_flags(void) >>>>>> { >>>>>> PVOP_CALL_ARGS; >>>>>> PVOP_TEST_NULL(irq.save_fl); >>>>>> asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), >>>>>> "PUSHF; POP _ASM_AX", >>>>>> X86_FEATURE_NATIVE) I am wondering whether we really want a new feature (basically "not XENPV). 
We could use ~X86_FEATURE_XENPV and teach apply_alternatives() to understand negated features (yes, this limits the number of features to 32767, but I don't think this is a real problem for quite some time). Thoughts? Juergen ^ permalink raw reply [flat|nested] 32+ messages in thread
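Juergen's negated-feature idea could look roughly like this. This is a user-space sketch, not the real apply_alternatives(); the `ALT_NOT_FEATURE` name, the bit-15 encoding, and the feature value are all hypothetical stand-ins used only to show the inversion logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_FEATURE_XENPV  272		/* example encoding; real value lives in cpufeatures.h */
#define ALT_NOT_FEATURE    0x8000	/* hypothetical "negated" marker in bit 15 */

static bool cpu_is_xen_pv;	/* toggled by the caller for illustration */

/* Stand-in for boot_cpu_has(). */
static bool boot_cpu_has(uint16_t feature)
{
	return feature == X86_FEATURE_XENPV && cpu_is_xen_pv;
}

/*
 * Core of the negated-feature idea: apply_alternatives() patches in
 * the replacement when the (possibly inverted) feature test passes.
 * Stealing bit 15 for the inversion is what limits the feature space
 * to 32767 entries, as noted above.
 */
static bool alt_should_patch(uint16_t encoded_feature)
{
	bool negate = encoded_feature & ALT_NOT_FEATURE;
	bool has = boot_cpu_has(encoded_feature & ~ALT_NOT_FEATURE);

	return negate ? !has : has;
}
```

With this, an `ALTERNATIVE(..., X86_FEATURE_XENPV | ALT_NOT_FEATURE)` site would patch on every CPU *except* a Xen PV guest, without defining a separate "not XENPV" synthetic feature.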
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-16 11:56 ` Jürgen Groß @ 2020-11-16 13:04 ` Peter Zijlstra 2020-11-18 6:47 ` Jürgen Groß 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2020-11-16 13:04 UTC (permalink / raw) To: Jürgen Groß Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML [-- Attachment #1: Type: text/plain, Size: 6351 bytes --] On Mon, Nov 16, 2020 at 12:56:32PM +0100, Jürgen Groß wrote: > > > > > > > static inline notrace unsigned long arch_local_save_flags(void) > > > > > > > { > > > > > > > PVOP_CALL_ARGS; > > > > > > > PVOP_TEST_NULL(irq.save_fl); > > > > > > > asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), > > > > > > > "PUSHF; POP _ASM_AX", > > > > > > > X86_FEATURE_NATIVE) > > I am wondering whether we really want a new feature (basically "not > XENPV). We could use ~X86_FEATURE_XENPV and teach apply_alternatives() > to understand negated features (yes, this limits the number of features > to 32767, but I don't think this is a real problem for quite some time). > > Thoughts? I went with the simple thing for now... If we ever want/need another negative alternative I suppose we can do as you suggest... I was still poking at objtool to actually dtrt though.. 
--- diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index dad350d42ecf..cc88f358bac5 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -237,6 +237,7 @@ #define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */ #define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */ #define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */ +#define X86_FEATURE_NOT_XENPV ( 8*32+21) /* "" inverse of X86_FEATURE_XENPV */ /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */ #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index d25cc6830e89..e2a9d3e6b7ad 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -645,22 +645,56 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu); #ifdef CONFIG_PARAVIRT_XXL static inline notrace unsigned long arch_local_save_flags(void) { - return PVOP_CALLEE0(unsigned long, irq.save_fl); + PVOP_CALL_ARGS; + PVOP_TEST_NULL(irq.save_fl); + asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), + "pushf; pop %%" _ASM_AX ";", + X86_FEATURE_NOT_XENPV) + : PVOP_CALLEE_CLOBBERS, ASM_CALL_CONSTRAINT + : paravirt_type(irq.save_fl.func), + paravirt_clobber(CLBR_RET_REG) + : "memory", "cc"); + return __eax; } static inline notrace void arch_local_irq_restore(unsigned long f) { - PVOP_VCALLEE1(irq.restore_fl, f); + PVOP_CALL_ARGS; + PVOP_TEST_NULL(irq.restore_fl); + asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), + "push %%" _ASM_ARG1 "; popf;", + X86_FEATURE_NOT_XENPV) + : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT + : paravirt_type(irq.restore_fl.func), + paravirt_clobber(CLBR_RET_REG), + PVOP_CALL_ARG1(f) + : "memory", "cc"); } static inline notrace void 
arch_local_irq_disable(void) { - PVOP_VCALLEE0(irq.irq_disable); + PVOP_CALL_ARGS; + PVOP_TEST_NULL(irq.irq_disable); + asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), + "cli;", + X86_FEATURE_NOT_XENPV) + : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT + : paravirt_type(irq.irq_disable.func), + paravirt_clobber(CLBR_RET_REG) + : "memory", "cc"); } static inline notrace void arch_local_irq_enable(void) { - PVOP_VCALLEE0(irq.irq_enable); + PVOP_CALL_ARGS; + PVOP_TEST_NULL(irq.irq_enable); + asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), + "sti;", + X86_FEATURE_NOT_XENPV) + : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT + : paravirt_type(irq.irq_enable.func), + paravirt_clobber(CLBR_RET_REG) + : "memory", "cc"); } static inline notrace unsigned long arch_local_irq_save(void) diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 2400ad62f330..5bd8f5503652 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -723,6 +723,19 @@ void __init alternative_instructions(void) * patching. */ + if (!boot_cpu_has(X86_FEATURE_XENPV)) + setup_force_cpu_cap(X86_FEATURE_NOT_XENPV); + + /* + * First patch paravirt functions, such that we overwrite the indirect + * call with the direct call. + */ + apply_paravirt(__parainstructions, __parainstructions_end); + + /* + * Then patch alternatives, such that those paravirt calls that are in + * alternatives can be overwritten by their immediate fragments. 
+ */ apply_alternatives(__alt_instructions, __alt_instructions_end); #ifdef CONFIG_SMP @@ -741,8 +754,6 @@ void __init alternative_instructions(void) } #endif - apply_paravirt(__parainstructions, __parainstructions_end); - restart_nmi(); alternatives_patched = 1; } diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c index ace6e334cb39..867498db53ad 100644 --- a/arch/x86/kernel/paravirt_patch.c +++ b/arch/x86/kernel/paravirt_patch.c @@ -33,13 +33,9 @@ struct patch_xxl { }; static const struct patch_xxl patch_data_xxl = { - .irq_irq_disable = { 0xfa }, // cli - .irq_irq_enable = { 0xfb }, // sti - .irq_save_fl = { 0x9c, 0x58 }, // pushf; pop %[re]ax .mmu_read_cr2 = { 0x0f, 0x20, 0xd0 }, // mov %cr2, %[re]ax .mmu_read_cr3 = { 0x0f, 0x20, 0xd8 }, // mov %cr3, %[re]ax .mmu_write_cr3 = { 0x0f, 0x22, 0xdf }, // mov %rdi, %cr3 - .irq_restore_fl = { 0x57, 0x9d }, // push %rdi; popfq .cpu_wbinvd = { 0x0f, 0x09 }, // wbinvd .cpu_usergs_sysret64 = { 0x0f, 0x01, 0xf8, 0x48, 0x0f, 0x07 }, // swapgs; sysretq @@ -76,11 +72,6 @@ unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr, switch (type) { #ifdef CONFIG_PARAVIRT_XXL - PATCH_CASE(irq, restore_fl, xxl, insn_buff, len); - PATCH_CASE(irq, save_fl, xxl, insn_buff, len); - PATCH_CASE(irq, irq_enable, xxl, insn_buff, len); - PATCH_CASE(irq, irq_disable, xxl, insn_buff, len); - PATCH_CASE(mmu, read_cr2, xxl, insn_buff, len); PATCH_CASE(mmu, read_cr3, xxl, insn_buff, len); PATCH_CASE(mmu, write_cr3, xxl, insn_buff, len); [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply related [flat|nested] 32+ messages in thread
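The ordering constraint the patch comments describe — paravirt sites must be patched before alternatives, so the native fragment can overwrite the direct call — can be modelled abstractly. This is a toy user-space sketch with fake "instruction" strings; nothing here is real patching code.

```c
#include <assert.h>
#include <string.h>

/* A fake instruction slot, large enough for either sequence. */
static char insn[8];

/* Stage 1: paravirt patching turns the indirect call into a direct
 * call to the PV op (here represented by a placeholder string). */
static void apply_paravirt(void)
{
	memcpy(insn, "CALL_FN", 8);
}

/* Stage 2: alternatives overwrite the call with the native fragment
 * when the CPU is not running as a Xen PV guest. Running this stage
 * second is what lets the immediate fragment win. */
static void apply_alternatives(int is_xen_pv)
{
	if (!is_xen_pv)
		memcpy(insn, "CLI\0\0\0\0", 8);
}
```

On a Xen PV guest the direct call survives; everywhere else the native instruction replaces it — which is exactly why reversing the two stages (as the old code did) would clobber the alternative with the call.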
* Re: WARNING: can't access registers at asm_common_interrupt 2020-11-16 13:04 ` Peter Zijlstra @ 2020-11-18 6:47 ` Jürgen Groß 2020-11-18 8:22 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Jürgen Groß @ 2020-11-18 6:47 UTC (permalink / raw) To: Peter Zijlstra Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML [-- Attachment #1.1.1: Type: text/plain, Size: 6935 bytes --] On 16.11.20 14:04, Peter Zijlstra wrote: > On Mon, Nov 16, 2020 at 12:56:32PM +0100, Jürgen Groß wrote: >>>>>>>> static inline notrace unsigned long arch_local_save_flags(void) >>>>>>>> { >>>>>>>> PVOP_CALL_ARGS; >>>>>>>> PVOP_TEST_NULL(irq.save_fl); >>>>>>>> asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), >>>>>>>> "PUSHF; POP _ASM_AX", >>>>>>>> X86_FEATURE_NATIVE) >> >> I am wondering whether we really want a new feature (basically "not >> XENPV). We could use ~X86_FEATURE_XENPV and teach apply_alternatives() >> to understand negated features (yes, this limits the number of features >> to 32767, but I don't think this is a real problem for quite some time). >> >> Thoughts? > > I went with the simple thing for now... If we ever want/need another > negative alternative I suppose we can do as you suggest... > > I was still poking at objtool to actually dtrt though.. I'd like to include this part in my series (with a different solution for the restore_fl part, as suggested by Andy before). Peter, are you fine with me taking your patch and adding your SoB? 
Juergen > > --- > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h > index dad350d42ecf..cc88f358bac5 100644 > --- a/arch/x86/include/asm/cpufeatures.h > +++ b/arch/x86/include/asm/cpufeatures.h > @@ -237,6 +237,7 @@ > #define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */ > #define X86_FEATURE_VMW_VMMCALL ( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */ > #define X86_FEATURE_SEV_ES ( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */ > +#define X86_FEATURE_NOT_XENPV ( 8*32+21) /* "" inverse of X86_FEATURE_XENPV */ > > /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */ > #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ > diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h > index d25cc6830e89..e2a9d3e6b7ad 100644 > --- a/arch/x86/include/asm/paravirt.h > +++ b/arch/x86/include/asm/paravirt.h > @@ -645,22 +645,56 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu); > #ifdef CONFIG_PARAVIRT_XXL > static inline notrace unsigned long arch_local_save_flags(void) > { > - return PVOP_CALLEE0(unsigned long, irq.save_fl); > + PVOP_CALL_ARGS; > + PVOP_TEST_NULL(irq.save_fl); > + asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), > + "pushf; pop %%" _ASM_AX ";", > + X86_FEATURE_NOT_XENPV) > + : PVOP_CALLEE_CLOBBERS, ASM_CALL_CONSTRAINT > + : paravirt_type(irq.save_fl.func), > + paravirt_clobber(CLBR_RET_REG) > + : "memory", "cc"); > + return __eax; > } > > static inline notrace void arch_local_irq_restore(unsigned long f) > { > - PVOP_VCALLEE1(irq.restore_fl, f); > + PVOP_CALL_ARGS; > + PVOP_TEST_NULL(irq.restore_fl); > + asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), > + "push %%" _ASM_ARG1 "; popf;", > + X86_FEATURE_NOT_XENPV) > + : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT > + : paravirt_type(irq.restore_fl.func), > + 
paravirt_clobber(CLBR_RET_REG), > + PVOP_CALL_ARG1(f) > + : "memory", "cc"); > } > > static inline notrace void arch_local_irq_disable(void) > { > - PVOP_VCALLEE0(irq.irq_disable); > + PVOP_CALL_ARGS; > + PVOP_TEST_NULL(irq.irq_disable); > + asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), > + "cli;", > + X86_FEATURE_NOT_XENPV) > + : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT > + : paravirt_type(irq.irq_disable.func), > + paravirt_clobber(CLBR_RET_REG) > + : "memory", "cc"); > } > > static inline notrace void arch_local_irq_enable(void) > { > - PVOP_VCALLEE0(irq.irq_enable); > + PVOP_CALL_ARGS; > + PVOP_TEST_NULL(irq.irq_enable); > + asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL), > + "sti;", > + X86_FEATURE_NOT_XENPV) > + : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT > + : paravirt_type(irq.irq_enable.func), > + paravirt_clobber(CLBR_RET_REG) > + : "memory", "cc"); > } > > static inline notrace unsigned long arch_local_irq_save(void) > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c > index 2400ad62f330..5bd8f5503652 100644 > --- a/arch/x86/kernel/alternative.c > +++ b/arch/x86/kernel/alternative.c > @@ -723,6 +723,19 @@ void __init alternative_instructions(void) > * patching. > */ > > + if (!boot_cpu_has(X86_FEATURE_XENPV)) > + setup_force_cpu_cap(X86_FEATURE_NOT_XENPV); > + > + /* > + * First patch paravirt functions, such that we overwrite the indirect > + * call with the direct call. > + */ > + apply_paravirt(__parainstructions, __parainstructions_end); > + > + /* > + * Then patch alternatives, such that those paravirt calls that are in > + * alternatives can be overwritten by their immediate fragments. 
> + */ > apply_alternatives(__alt_instructions, __alt_instructions_end); > > #ifdef CONFIG_SMP > @@ -741,8 +754,6 @@ void __init alternative_instructions(void) > } > #endif > > - apply_paravirt(__parainstructions, __parainstructions_end); > - > restart_nmi(); > alternatives_patched = 1; > } > diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c > index ace6e334cb39..867498db53ad 100644 > --- a/arch/x86/kernel/paravirt_patch.c > +++ b/arch/x86/kernel/paravirt_patch.c > @@ -33,13 +33,9 @@ struct patch_xxl { > }; > > static const struct patch_xxl patch_data_xxl = { > - .irq_irq_disable = { 0xfa }, // cli > - .irq_irq_enable = { 0xfb }, // sti > - .irq_save_fl = { 0x9c, 0x58 }, // pushf; pop %[re]ax > .mmu_read_cr2 = { 0x0f, 0x20, 0xd0 }, // mov %cr2, %[re]ax > .mmu_read_cr3 = { 0x0f, 0x20, 0xd8 }, // mov %cr3, %[re]ax > .mmu_write_cr3 = { 0x0f, 0x22, 0xdf }, // mov %rdi, %cr3 > - .irq_restore_fl = { 0x57, 0x9d }, // push %rdi; popfq > .cpu_wbinvd = { 0x0f, 0x09 }, // wbinvd > .cpu_usergs_sysret64 = { 0x0f, 0x01, 0xf8, > 0x48, 0x0f, 0x07 }, // swapgs; sysretq > @@ -76,11 +72,6 @@ unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr, > switch (type) { > > #ifdef CONFIG_PARAVIRT_XXL > - PATCH_CASE(irq, restore_fl, xxl, insn_buff, len); > - PATCH_CASE(irq, save_fl, xxl, insn_buff, len); > - PATCH_CASE(irq, irq_enable, xxl, insn_buff, len); > - PATCH_CASE(irq, irq_disable, xxl, insn_buff, len); > - > PATCH_CASE(mmu, read_cr2, xxl, insn_buff, len); > PATCH_CASE(mmu, read_cr3, xxl, insn_buff, len); > PATCH_CASE(mmu, write_cr3, xxl, insn_buff, len); > [-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --] [-- Type: application/pgp-keys, Size: 3135 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-18  6:47             ` Jürgen Groß
@ 2020-11-18  8:22               ` Peter Zijlstra
  2020-11-19 11:51                 ` Shinichiro Kawasaki
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-18  8:22 UTC (permalink / raw)
To: Jürgen Groß
Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Shinichiro Kawasaki,
	linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML

On Wed, Nov 18, 2020 at 07:47:36AM +0100, Jürgen Groß wrote:
> On 16.11.20 14:04, Peter Zijlstra wrote:
> > On Mon, Nov 16, 2020 at 12:56:32PM +0100, Jürgen Groß wrote:
> > > > > > > > > static inline notrace unsigned long arch_local_save_flags(void)
> > > > > > > > > {
> > > > > > > > > 	PVOP_CALL_ARGS;
> > > > > > > > > 	PVOP_TEST_NULL(irq.save_fl);
> > > > > > > > > 	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> > > > > > > > > 			    "PUSHF; POP _ASM_AX",
> > > > > > > > > 			    X86_FEATURE_NATIVE)
> > >
> > > I am wondering whether we really want a new feature (basically "not
> > > XENPV). We could use ~X86_FEATURE_XENPV and teach apply_alternatives()
> > > to understand negated features (yes, this limits the number of features
> > > to 32767, but I don't think this is a real problem for quite some time).
> > >
> > > Thoughts?
> >
> > I went with the simple thing for now... If we ever want/need another
> > negative alternative I suppose we can do as you suggest...
> >
> > I was still poking at objtool to actually dtrt though..
>
> I'd like to include this part in my series (with a different solution
> for the restore_fl part, as suggested by Andy before).
>
> Peter, are you fine with me taking your patch and adding your SoB?

Sure, note that I only compile tested it, as it was my test-bed for
playing with objtool (which I still haven't managed to get back to and
finish :/).

So if it actually _works_ feel free to take it, otherwise you can have
whatever bits and pieces remain after you've butchered it to do as you
want.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-18  8:22               ` Peter Zijlstra
@ 2020-11-19 11:51                 ` Shinichiro Kawasaki
  2020-11-19 12:01                   ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Shinichiro Kawasaki @ 2020-11-19 11:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jürgen Groß, Andy Lutomirski, Andrew Cooper, Josh Poimboeuf,
	linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML

On Nov 18, 2020 / 09:22, Peter Zijlstra wrote:
> On Wed, Nov 18, 2020 at 07:47:36AM +0100, Jürgen Groß wrote:
> > On 16.11.20 14:04, Peter Zijlstra wrote:
> > > On Mon, Nov 16, 2020 at 12:56:32PM +0100, Jürgen Groß wrote:
> > > > > > > > > > static inline notrace unsigned long arch_local_save_flags(void)
> > > > > > > > > > {
> > > > > > > > > > 	PVOP_CALL_ARGS;
> > > > > > > > > > 	PVOP_TEST_NULL(irq.save_fl);
> > > > > > > > > > 	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> > > > > > > > > > 			    "PUSHF; POP _ASM_AX",
> > > > > > > > > > 			    X86_FEATURE_NATIVE)
> > > >
> > > > I am wondering whether we really want a new feature (basically "not
> > > > XENPV). We could use ~X86_FEATURE_XENPV and teach apply_alternatives()
> > > > to understand negated features (yes, this limits the number of features
> > > > to 32767, but I don't think this is a real problem for quite some time).
> > > >
> > > > Thoughts?
> > >
> > > I went with the simple thing for now... If we ever want/need another
> > > negative alternative I suppose we can do as you suggest...
> > >
> > > I was still poking at objtool to actually dtrt though..
> >
> > I'd like to include this part in my series (with a different solution
> > for the restore_fl part, as suggested by Andy before).
> >
> > Peter, are you fine with me taking your patch and adding your SoB?
>
> Sure, note that I only compile tested it, as it was my test-bed for
> playing with objtool (which I still haven't managed to get back to and
> finish :/).
>
> So if it actually _works_ feel free to take it, otherwise you can have
> whatever bits and pieces remain after you've butchered it to do as you
> want.

All, thank you for the actions.

I tried Peter's patch in my test system and the result looks good. The
warning is still observed with 5.10-rc4; with the patch on top of
5.10-rc4, the warning disappeared.

Of note is that when I built the kernel with the patch, objtool reported
many warnings, as follows:

arch/x86/events/intel/core.o: warning: objtool: .altinstr_replacement+0x0: alternative modifies stack
arch/x86/events/core.o: warning: objtool: .altinstr_replacement+0x0: alternative modifies stack

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-19 11:51                 ` Shinichiro Kawasaki
@ 2020-11-19 12:01                   ` Peter Zijlstra
  2020-11-19 12:28                     ` Jürgen Groß
  2020-11-19 12:48                     ` Shinichiro Kawasaki
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-19 12:01 UTC (permalink / raw)
To: Shinichiro Kawasaki
Cc: Jürgen Groß, Andy Lutomirski, Andrew Cooper, Josh Poimboeuf,
	linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML

On Thu, Nov 19, 2020 at 11:51:00AM +0000, Shinichiro Kawasaki wrote:
> I tried Peter's patch in my test system and result looks good. The warning is
> still observed with 5.10-rc4. With the patch on top of 5.10-rc4, the warning
> disappeared.

The patch on its own is not sufficient to fix things; there needs to be
an accompanying objtool patch to generate the correct unwind
information.

This patch only ensures objtool has enough information to be able to
dtrt, but as it stands it shouldn't even compile, it should complain
about an alternative trying to modify the stack and bail.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-19 12:01                   ` Peter Zijlstra
@ 2020-11-19 12:28                     ` Jürgen Groß
  0 siblings, 0 replies; 32+ messages in thread
From: Jürgen Groß @ 2020-11-19 12:28 UTC (permalink / raw)
To: Peter Zijlstra, Shinichiro Kawasaki
Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, linux-kernel,
	Nicholas Piggin, Damien Le Moal, X86 ML

On 19.11.20 13:01, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 11:51:00AM +0000, Shinichiro Kawasaki wrote:
>> I tried Peter's patch in my test system and result looks good. The warning is
>> still observed with 5.10-rc4. With the patch on top of 5.10-rc4, the warning
>> disappeared.
>
> The patch on its own is not sufficient to fix things; there needs to be
> an accompanying objtool patch to generate the correct unwind
> information.
>
> This patch only ensures objtool has enough information to be able to
> dtrt, but as it stands it shouldn't even compile, it should complain
> about an alternative trying to modify the stack and bail.

It complains, but doesn't bail.

I'm nearly ready with my series replacing all the custom code paravirt
patches by ALTERNATIVEs. Quite some paravirt macro cruft turned out to
be unused, so there is a rather large cleanup patch included. :-)


Juergen

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-19 12:01                   ` Peter Zijlstra
  2020-11-19 12:28                     ` Jürgen Groß
@ 2020-11-19 12:48                     ` Shinichiro Kawasaki
  1 sibling, 0 replies; 32+ messages in thread
From: Shinichiro Kawasaki @ 2020-11-19 12:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jürgen Groß, Andy Lutomirski, Andrew Cooper, Josh Poimboeuf,
	linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML

On Nov 19, 2020 / 13:01, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 11:51:00AM +0000, Shinichiro Kawasaki wrote:
> > I tried Peter's patch in my test system and result looks good. The warning is
> > still observed with 5.10-rc4. With the patch on top of 5.10-rc4, the warning
> > disappeared.
>
> The patch on its own is not sufficient to fix things; there needs to be
> an accompanying objtool patch to generate the correct unwind
> information.
>
> This patch only ensures objtool has enough information to be able to
> dtrt, but as it stands it shouldn't even compile, it should complain
> about an alternative trying to modify the stack and bail.

Thanks for the clarification. It was too early to try it out... When it
gets ready to try and test, please let me know.

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 20:15       ` Josh Poimboeuf
  2020-11-11 20:25         ` Andrew Cooper
@ 2020-11-11 20:35         ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-11 20:35 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Andrew Cooper, Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, jgross, x86

On Wed, Nov 11, 2020 at 02:15:06PM -0600, Josh Poimboeuf wrote:
> On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote:
> > Possible, we just need to be _really_ careful to not allow changing
> > those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which
> > does a __ro_after_init on the whole thing.
>
> But what if you want to live migrate to another hypervisor ;-)

Then you get to keep the pieces ;-)

> > > Either way it doesn't make objtool's job much easier. But it would be
> > > nice to consolidate runtime patching mechanisms and get rid of
> > > .parainstructions.
> >
> > I think the above (combining alternative and paravirt/static_call) does
> > make objtool's job easier, since then we at least have the actual
> > alternative instructions available to inspect, or am I mis-understanding
> > things?
>
> Right, it makes objtool's job a _little_ easier, since it already knows
> how to read alternatives. But it still has to learn to deal with the
> conflicting stack layouts.

Right, but at least now it can see the instructions. Which is a lot
better than: `add a magic +1 ORC entry when you see an indirect call to
$magic`.

Anyway, __orc_find(addr) looks for the ORC entry with the highest
IP <= @addr. So if we have:

	alt0			alt1
	0x00 CALL *foo		0x00 PUSHF
	0x07 insn		0x01 POP %rax
				0x02 .nops 5
				0x07 insn

we have ORC entries for alt1.0x00 and alt1.0x01. Then if we hit insn,
it'll find the alt1.0x01 entry, but that had better be the same as the
state at 0x00.

This means that for every alt, we have to unwind using the CFI of every
other alt and match for every instruction.

Which should be doable I think. Certainly more complicated than outright
disallowing CFI changes inside alt groups though :/

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 18:13   ` Josh Poimboeuf
  2020-11-11 18:46     ` Andrew Cooper
@ 2020-11-11 20:42     ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-11 20:42 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin, Damien Le Moal,
	andrew.cooper3, jgross

On Wed, Nov 11, 2020 at 12:13:28PM -0600, Josh Poimboeuf wrote:
> On Wed, Nov 11, 2020 at 06:47:36PM +0100, Peter Zijlstra wrote:
> > Yeah, so it's all a giant can of worms that; also see:
> >
> >   https://lkml.kernel.org/r/20200821084738.508092956@infradead.org
> >
> > The basic idea is to only trace edges, ie. when the hardware state
> > actually changes. Sadly this means doing a pushf/pop before the cli.
> > Ideally CLI would store the old IF in CF or something like that, but
> > alas.
>
> Right, that makes sense for save/restore, but is the disabled check
> really needed for local_irq_disable()? Wouldn't that always be an edge?

IIRC there is code that does local_irq_disable() even though IRQs are
already disabled. This is 'harmless'.

> And anyway I don't see a similar check for local_irq_enable().

I know there is code that does local_irq_enable() with IRQs already
enabled, I'm not exactly sure why this is different. I'll have to put
it on the todo list :/

^ permalink raw reply	[flat|nested] 32+ messages in thread
* WARNING: can't access registers at asm_common_interrupt
@ 2020-09-06 20:46 syzbot
  0 siblings, 0 replies; 32+ messages in thread
From: syzbot @ 2020-09-06 20:46 UTC (permalink / raw)
To: alexandre.chartre, bp, hpa, linux-kernel, luto, mingo, peterz,
	syzkaller-bugs, tglx, x86

Hello,

syzbot found the following issue on:

HEAD commit:    59126901 Merge tag 'perf-tools-fixes-for-v5.9-2020-09-03' ..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11221b5d900000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3c5f6ce8d5b68299
dashboard link: https://syzkaller.appspot.com/bug?extid=424f7b15245b9615f293
compiler:       gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+424f7b15245b9615f293@syzkaller.appspotmail.com

WARNING: can't access registers at asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:572

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 32+ messages in thread
end of thread, other threads:[~2020-11-19 12:49 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-06  6:04 WARNING: can't access registers at asm_common_interrupt Shinichiro Kawasaki
2020-11-06 18:06 ` Josh Poimboeuf
2020-11-09  9:10 ` Shinichiro Kawasaki
2020-11-10  3:19 ` Josh Poimboeuf
2020-11-10  9:19 ` Shinichiro Kawasaki
2020-11-11 17:05 ` Josh Poimboeuf
2020-11-11 17:47 ` Peter Zijlstra
2020-11-11 18:13 ` Josh Poimboeuf
2020-11-11 18:46 ` Andrew Cooper
2020-11-11 19:42 ` Peter Zijlstra
2020-11-11 19:59 ` Josh Poimboeuf
2020-11-11 20:07 ` Peter Zijlstra
2020-11-11 20:15 ` Josh Poimboeuf
2020-11-11 20:25 ` Andrew Cooper
2020-11-11 20:39 ` Peter Zijlstra
2020-11-13 17:34 ` Andy Lutomirski
2020-11-14  9:16 ` Jürgen Groß
2020-11-14 18:10 ` Andy Lutomirski
2020-11-15  6:33 ` Jürgen Groß
2020-11-15 16:05 ` Andy Lutomirski
2020-11-15 16:13 ` Jürgen Groß
2020-11-16 11:56 ` Jürgen Groß
2020-11-16 13:04 ` Peter Zijlstra
2020-11-18  6:47 ` Jürgen Groß
2020-11-18  8:22 ` Peter Zijlstra
2020-11-19 11:51 ` Shinichiro Kawasaki
2020-11-19 12:01 ` Peter Zijlstra
2020-11-19 12:28 ` Jürgen Groß
2020-11-19 12:48 ` Shinichiro Kawasaki
2020-11-11 20:35 ` Peter Zijlstra
2020-11-11 20:42 ` Peter Zijlstra
-- strict thread matches above, loose matches on Subject: below --
2020-09-06 20:46 syzbot