* WARNING: can't access registers at asm_common_interrupt
@ 2020-11-06  6:04 Shinichiro Kawasaki
  2020-11-06 18:06 ` Josh Poimboeuf
  2020-11-11 17:05 ` Josh Poimboeuf
  0 siblings, 2 replies; 32+ messages in thread
From: Shinichiro Kawasaki @ 2020-11-06  6:04 UTC (permalink / raw)
  To: linux-kernel; +Cc: Josh Poimboeuf, Nicholas Piggin, Damien Le Moal

Greetings,

I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
in my kernel test system repeatedly, which is printed by unwind_next_frame() in
"arch/x86/kernel/unwind_orc.c". Syzbot already reported that [1]. Similar
warning was reported and discussed [2], but I suppose the cause is not yet
clarified.

The warning was observed with v5.10-rc2 and older tags. I bisected and found
that commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
triggered the warning. After reverting it from v5.10-rc2, the warning
disappeared. May I ask the experts on CC to comment on how this commit could
relate to the warning?

The test conditions needed to reproduce the warning are rather unusual
(blktests, dm-linear and ZNS device emulation by QEMU). If any action is
suggested for further analysis, I'm willing to take it on my test system.

I hope this report helps.

[1] https://lkml.org/lkml/2020/9/6/231
[2] https://lkml.org/lkml/2020/9/8/1538

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-06  6:04 WARNING: can't access registers at asm_common_interrupt Shinichiro Kawasaki
@ 2020-11-06 18:06 ` Josh Poimboeuf
  2020-11-09  9:10   ` Shinichiro Kawasaki
  2020-11-11 17:05 ` Josh Poimboeuf
  1 sibling, 1 reply; 32+ messages in thread
From: Josh Poimboeuf @ 2020-11-06 18:06 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: linux-kernel, Nicholas Piggin, Damien Le Moal

On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> Greetings,
> 
> I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> repeatedly on my kernel test system; it is printed by unwind_next_frame() in
> "arch/x86/kernel/unwind_orc.c". Syzbot has already reported this [1]. A
> similar warning was reported and discussed [2], but I believe the cause has
> not yet been clarified.
> 
> The warning was observed with v5.10-rc2 and older tags. I bisected and found
> that commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> triggered the warning. After reverting it from v5.10-rc2, the warning
> disappeared. May I ask the experts on CC to comment on how this commit could
> relate to the warning?
> 
> The test conditions needed to reproduce the warning are rather unusual
> (blktests, dm-linear and ZNS device emulation by QEMU). If any action is
> suggested for further analysis, I'm willing to take it on my test system.

Hi,

Thanks for reporting this issue.  This might be a different issue from
[2].

Can you send me the arch/x86/entry/entry_64.o file from your build?

-- 
Josh


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-06 18:06 ` Josh Poimboeuf
@ 2020-11-09  9:10   ` Shinichiro Kawasaki
  2020-11-10  3:19     ` Josh Poimboeuf
  0 siblings, 1 reply; 32+ messages in thread
From: Shinichiro Kawasaki @ 2020-11-09  9:10 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: linux-kernel, Nicholas Piggin, Damien Le Moal

On Nov 06, 2020 / 12:06, Josh Poimboeuf wrote:
> On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> > Greetings,
> > 
> > I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> > repeatedly on my kernel test system; it is printed by unwind_next_frame() in
> > "arch/x86/kernel/unwind_orc.c". Syzbot has already reported this [1]. A
> > similar warning was reported and discussed [2], but I believe the cause has
> > not yet been clarified.
> > 
> > The warning was observed with v5.10-rc2 and older tags. I bisected and found
> > that commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> > triggered the warning. After reverting it from v5.10-rc2, the warning
> > disappeared. May I ask the experts on CC to comment on how this commit could
> > relate to the warning?
> > 
> > The test conditions needed to reproduce the warning are rather unusual
> > (blktests, dm-linear and ZNS device emulation by QEMU). If any action is
> > suggested for further analysis, I'm willing to take it on my test system.
> 
> Hi,
> 
> Thanks for reporting this issue.  This might be a different issue from
> [2].
> 
> Can you send me the arch/x86/entry/entry_64.o file from your build?

Hi Josh, thank you for your response. I have sent the entry_64.o in a separate
e-mail to your address only, since I hesitated to send the 76 kB attachment
to LKML. In case it does not reach you, please let me know.

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-09  9:10   ` Shinichiro Kawasaki
@ 2020-11-10  3:19     ` Josh Poimboeuf
  2020-11-10  9:19       ` Shinichiro Kawasaki
  0 siblings, 1 reply; 32+ messages in thread
From: Josh Poimboeuf @ 2020-11-10  3:19 UTC (permalink / raw)
  To: Shinichiro Kawasaki; +Cc: linux-kernel, Nicholas Piggin, Damien Le Moal

On Mon, Nov 09, 2020 at 09:10:38AM +0000, Shinichiro Kawasaki wrote:
> On Nov 06, 2020 / 12:06, Josh Poimboeuf wrote:
> > On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> > > Greetings,
> > > 
> > > I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> > > repeatedly on my kernel test system; it is printed by unwind_next_frame() in
> > > "arch/x86/kernel/unwind_orc.c". Syzbot has already reported this [1]. A
> > > similar warning was reported and discussed [2], but I believe the cause has
> > > not yet been clarified.
> > > 
> > > The warning was observed with v5.10-rc2 and older tags. I bisected and found
> > > that commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> > > triggered the warning. After reverting it from v5.10-rc2, the warning
> > > disappeared. May I ask the experts on CC to comment on how this commit could
> > > relate to the warning?
> > > 
> > > The test conditions needed to reproduce the warning are rather unusual
> > > (blktests, dm-linear and ZNS device emulation by QEMU). If any action is
> > > suggested for further analysis, I'm willing to take it on my test system.
> > 
> > Hi,
> > 
> > Thanks for reporting this issue.  This might be a different issue from
> > [2].
> > 
> > Can you send me the arch/x86/entry/entry_64.o file from your build?
> 
> Hi Josh, thank you for your response. I have sent the entry_64.o in a separate
> e-mail to your address only, since I hesitated to send the 76 kB attachment
> to LKML. In case it does not reach you, please let me know.

Got it, thanks.  Unfortunately I'm still confused.

Can you test with the following patch, and boot with 'unwind_debug'?
That should hopefully dump a lot of useful data along with the warning.

From: Josh Poimboeuf <jpoimboe@redhat.com>
Subject: [PATCH] x86/unwind/orc: Add 'unwind_debug' cmdline option

Sometimes the one-line ORC unwinder warnings aren't very helpful.  Add a
new 'unwind_debug' cmdline option which will dump the full stack
contents of the current task when an error condition is encountered.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
---
 .../admin-guide/kernel-parameters.txt         |  6 +++
 arch/x86/kernel/unwind_orc.c                  | 48 ++++++++++++++++++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 526d65d8573a..4bed92c51723 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5512,6 +5512,12 @@
 	unknown_nmi_panic
 			[X86] Cause panic on unknown NMI.
 
+	unwind_debug	[X86-64]
+			Enable unwinder debug output.  This can be
+			useful for debugging certain unwinder error
+			conditions, including corrupt stacks and
+			bad/missing unwinder metadata.
+
 	usbcore.authorized_default=
 			[USB] Default USB device authorization:
 			(default -1 = authorized except for wireless USB,
diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index 73f800100066..44bae03f9bfc 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -13,8 +13,13 @@
 
 #define orc_warn_current(args...)					\
 ({									\
-	if (state->task == current)					\
+	static bool dumped_before;					\
+	if (state->task == current) {					\
 		orc_warn(args);						\
+		if (unwind_debug && !dumped_before)			\
+			unwind_dump(state);				\
+		dumped_before = true;					\
+	}								\
 })
 
 extern int __start_orc_unwind_ip[];
@@ -23,8 +28,49 @@ extern struct orc_entry __start_orc_unwind[];
 extern struct orc_entry __stop_orc_unwind[];
 
 static bool orc_init __ro_after_init;
+static bool unwind_debug __ro_after_init;
 static unsigned int lookup_num_blocks __ro_after_init;
 
+static int __init unwind_debug_cmdline(char *str)
+{
+	unwind_debug = true;
+
+	return 0;
+}
+early_param("unwind_debug", unwind_debug_cmdline);
+
+static void unwind_dump(struct unwind_state *state)
+{
+	static bool dumped_before;
+	unsigned long word, *sp;
+	struct stack_info stack_info = {0};
+	unsigned long visit_mask = 0;
+
+	if (dumped_before)
+		return;
+
+	dumped_before = true;
+
+	printk_deferred("unwind stack type:%d next_sp:%p mask:0x%lx graph_idx:%d\n",
+			state->stack_info.type, state->stack_info.next_sp,
+			state->stack_mask, state->graph_idx);
+
+	for (sp = __builtin_frame_address(0); sp;
+	     sp = PTR_ALIGN(stack_info.next_sp, sizeof(long))) {
+		if (get_stack_info(sp, state->task, &stack_info, &visit_mask))
+			break;
+
+		for (; sp < stack_info.end; sp++) {
+
+			word = READ_ONCE_NOCHECK(*sp);
+
+			printk_deferred("%0*lx: %0*lx (%pB)\n", BITS_PER_LONG/4,
+					(unsigned long)sp, BITS_PER_LONG/4,
+					word, (void *)word);
+		}
+	}
+}
+
 static inline unsigned long orc_ip(const int *ip)
 {
 	return (unsigned long)ip + *ip;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-10  3:19     ` Josh Poimboeuf
@ 2020-11-10  9:19       ` Shinichiro Kawasaki
  0 siblings, 0 replies; 32+ messages in thread
From: Shinichiro Kawasaki @ 2020-11-10  9:19 UTC (permalink / raw)
  To: Josh Poimboeuf; +Cc: linux-kernel, Nicholas Piggin, Damien Le Moal

On Nov 09, 2020 / 21:19, Josh Poimboeuf wrote:
> On Mon, Nov 09, 2020 at 09:10:38AM +0000, Shinichiro Kawasaki wrote:
> > On Nov 06, 2020 / 12:06, Josh Poimboeuf wrote:
> > > On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> > > > Greetings,
> > > > 
> > > > I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> > > > repeatedly on my kernel test system; it is printed by unwind_next_frame() in
> > > > "arch/x86/kernel/unwind_orc.c". Syzbot has already reported this [1]. A
> > > > similar warning was reported and discussed [2], but I believe the cause has
> > > > not yet been clarified.
> > > > 
> > > > The warning was observed with v5.10-rc2 and older tags. I bisected and found
> > > > that commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> > > > triggered the warning. After reverting it from v5.10-rc2, the warning
> > > > disappeared. May I ask the experts on CC to comment on how this commit could
> > > > relate to the warning?
> > > > 
> > > > The test conditions needed to reproduce the warning are rather unusual
> > > > (blktests, dm-linear and ZNS device emulation by QEMU). If any action is
> > > > suggested for further analysis, I'm willing to take it on my test system.
> > > 
> > > Hi,
> > > 
> > > Thanks for reporting this issue.  This might be a different issue from
> > > [2].
> > > 
> > > Can you send me the arch/x86/entry/entry_64.o file from your build?
> > 
> > Hi Josh, thank you for your response. I have sent the entry_64.o in a separate
> > e-mail to your address only, since I hesitated to send the 76 kB attachment
> > to LKML. In case it does not reach you, please let me know.
> 
> Got it, thanks.  Unfortunately I'm still confused.
> 
> Can you test with the following patch, and boot with 'unwind_debug'?
> That should hopefully dump a lot of useful data along with the warning.

Thank you for the patch. With the patch applied and 'unwind_debug' on the
kernel command line, I recreated the warning. Here I quote the kernel messages
printed (the smpboot-related messages are the usual output for blktests test
case block/008).

...
[  113.022680] smpboot: CPU 1 is now offline
[  113.030546] WARNING: can't access registers at asm_common_interrupt+0x1e/0x40
[  113.030549] unwind stack type:0 next_sp:0000000000000000 mask:0x6 graph_idx:0
[  113.030554] ffff8881e87097b0: 1ffff1103d0e1302 (0x1ffff1103d0e1302)
[  113.030558] ffff8881e87097b8: ffffffff9712a434 (unwind_next_frame+0x15e4/0x1fc0)
[  113.030560] ffff8881e87097c0: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40)
[  113.030562] ffff8881e87097c8: 0000000000000000 (0x0)
[  113.030563] ffff8881e87097d0: ffff8881e87098bd (0xffff8881e87098bd)
[  113.030564] ffff8881e87097d8: ffff8881e87098d8 (0xffff8881e87098d8)
[  113.030565] ffff8881e87097e0: ffff8881e87098c0 (0xffff8881e87098c0)
[  113.030570] ffff8881e87097e8: ffffffff9ac85575 (__start_orc_unwind+0x65f81/0x37a91c)
[  113.030572] ffff8881e87097f0: ffffffff9ac85575 (__start_orc_unwind+0x65f81/0x37a91c)
[  113.030573] ffff8881e87097f8: ffff8881e8709938 (0xffff8881e8709938)
[  113.030574] ffff8881e8709800: 0000000000000000 (0x0)
[  113.030597] ffff8881e8709808: ffff88810ebd31c0 (0xffff88810ebd31c0)
[  113.030598] ffff8881e8709810: 0000000041b58ab3 (0x41b58ab3)
[  113.030602] ffff8881e8709818: ffffffff99d7a850 (.LC2+0x3be/0x44d)
[  113.030604] ffff8881e8709820: ffffffff97128e50 (deref_stack_reg+0x160/0x160)
[  113.030605] ffff8881e8709828: 0000000000000000 (0x0)
[  113.030606] ffff8881e8709830: 0000000000000000 (0x0)
[  113.030610] ffff8881e8709838: ffffffff972343bf (kernel_text_address.part.0+0x1f/0xc0)
[  113.030624] ffff8881e8709840: ffffffffc046fd90 (nvme_irq+0x10/0x20 [nvme])
[  113.030625] ffff8881e8709848: ffff8881e87098d0 (0xffff8881e87098d0)
[  113.030628] ffff8881e8709850: ffffffff973994c0 (create_prof_cpu_mask+0x20/0x20)
[  113.030629] ffff8881e8709858: ffff8881e8709908 (0xffff8881e8709908)
[  113.030630] ffff8881e8709860: ffff8881e8709938 (0xffff8881e8709938)
[  113.030631] ffff8881e8709868: 0000000000000000 (0x0)
[  113.030632] ffff8881e8709870: ffff88810ebd31c0 (0xffff88810ebd31c0)
[  113.030633] ffff8881e8709878: 0000000000000006 (0x6)
[  113.030636] ffff8881e8709880: ffffffff9709925c (arch_stack_walk+0x6c/0xb0)
[  113.030637] ffff8881e8709888: 0000000000000000 (0x0)
[  113.030638] ffff8881e8709890: ffff88814c428000 (0xffff88814c428000)
[  113.030640] ffff8881e8709898: ffff88814c430000 (0xffff88814c430000)
[  113.030641] ffff8881e87098a0: 0000000000000000 (0x0)
[  113.030642] ffff8881e87098a8: 0000000000000006 (0x6)
[  113.030643] ffff8881e87098b0: ffff88810ebd31c0 (0xffff88810ebd31c0)
[  113.030644] ffff8881e87098b8: 0000010000000000 (0x10000000000)
[  113.030645] ffff8881e87098c0: 0000000000000000 (0x0)
[  113.030646] ffff8881e87098c8: 0000000000000000 (0x0)
[  113.030648] ffff8881e87098d0: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40)
[  113.030649] ffff8881e87098d8: ffff88814c42fed0 (0xffff88814c42fed0)
[  113.030650] ffff8881e87098e0: 0000000000000000 (0x0)
[  113.030651] ffff8881e87098e8: ffffed103d0e1323 (0xffffed103d0e1323)
[  113.030652] ffff8881e87098f0: 0000000000000800 (0x800)
[  113.030653] ffff8881e87098f8: ffff88810b48a780 (0xffff88810b48a780)
[  113.030654] ffff8881e8709900: ffff8881e8709c48 (0xffff8881e8709c48)
[  113.030658] ffff8881e8709908: ffffffff97862595 (kmem_cache_free+0xf5/0x590)
[  113.030660] ffff8881e8709910: ffffffff973996a1 (stack_trace_save+0x81/0xa0)
[  113.030661] ffff8881e8709918: 0000000041b58ab3 (0x41b58ab3)
[  113.030663] ffff8881e8709920: ffffffff99d89ef4 (.LC4+0x20f/0x14aa1)
[  113.030665] ffff8881e8709928: ffffffff97399620 (stack_trace_consume_entry+0x160/0x160)
[  113.030666] ffff8881e8709930: ffff8881098e9800 (0xffff8881098e9800)
[  113.030667] ffff8881e8709938: ffff8881e8709988 (0xffff8881e8709988)
[  113.030668] ffff8881e8709940: 0000000000000040 (0x40)
[  113.030669] ffff8881e8709948: 0000000000000015 (0x15)
[  113.030670] ffff8881e8709950: ffff88811858d348 (0xffff88811858d348)
[  113.030672] ffff8881e8709958: ffffffff97862595 (kmem_cache_free+0xf5/0x590)
[  113.030673] ffff8881e8709960: 0000000000000800 (0x800)
[  113.030674] ffff8881e8709968: ffff88810b48a780 (0xffff88810b48a780)
[  113.030675] ffff8881e8709970: ffff8881e8709c48 (0xffff8881e8709c48)
[  113.030676] ffff8881e8709978: ffff88811858c008 (0xffff88811858c008)
[  113.030678] ffff8881e8709980: ffffffff978649db (kasan_save_stack+0x1b/0x40)
[  113.030680] ffff8881e8709988: ffffffff978649db (kasan_save_stack+0x1b/0x40)
[  113.030681] ffff8881e8709990: ffffffff97864a1c (kasan_set_track+0x1c/0x30)
[  113.030684] ffff8881e8709998: ffffffff97866f5b (kasan_set_free_info+0x1b/0x30)
[  113.030685] ffff8881e87099a0: ffffffff97864980 (__kasan_slab_free+0x110/0x150)
[  113.030687] ffff8881e87099a8: ffffffff9785b4da (slab_free_freelist_hook+0x5a/0x170)
[  113.030689] ffff8881e87099b0: ffffffff97862595 (kmem_cache_free+0xf5/0x590)
[  113.030692] ffff8881e87099b8: ffffffff988f4da0 (dec_pending+0x1f0/0x900)
[  113.030694] ffff8881e87099c0: ffffffff988fad15 (clone_endio+0x1e5/0x940)
[  113.030697] ffff8881e87099c8: ffffffff97eac8f6 (blk_update_request+0x716/0xe20)
[  113.030699] ffff8881e87099d0: ffffffff97ed523b (blk_mq_end_request+0x4b/0x480)
[  113.030701] ffff8881e87099d8: ffffffffc046e863 (nvme_process_cq+0x563/0xa40 [nvme])
[  113.030703] ffff8881e87099e0: ffffffffc046fd90 (nvme_irq+0x10/0x20 [nvme])
[  113.030706] ffff8881e87099e8: ffffffff9733ef52 (__handle_irq_event_percpu+0x252/0x620)
[  113.030707] ffff8881e87099f0: ffffffff9733f51f (handle_irq_event+0xef/0x240)
[  113.030709] ffff8881e87099f8: ffffffff9734bff6 (handle_edge_irq+0x1f6/0xb60)
[  113.030712] ffff8881e8709a00: ffffffff992010b2 (asm_call_irq_on_stack+0x12/0x20)
[  113.030714] ffff8881e8709a08: ffffffff9918f05b (common_interrupt+0xeb/0x190)
[  113.030716] ffff8881e8709a10: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40)
[  113.030717] ffff8881e8709a18: ffffffff97395f70 (exit_to_user_mode_prepare+0xc0/0x1d0)
[  113.030719] ffff8881e8709a20: ffffffff99200c48 (asm_common_interrupt+0x8/0x40)
[  113.030721] ffff8881e8709a28: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40)
[  113.030723] ffff8881e8709a30: ffffffff991a91e0 (schedule+0xd0/0x270)
[  113.030724] ffff8881e8709a38: ffffffff97395fb7 (exit_to_user_mode_prepare+0x107/0x1d0)
[  113.030725] ffff8881e8709a40: 1ffff1103d0e134e (0x1ffff1103d0e134e)
[  113.030727] ffff8881e8709a48: ffff8881e8733dc8 (0xffff8881e8733dc8)
[  113.030728] ffff8881e8709a50: ffff8881e8733d40 (0xffff8881e8733d40)
[  113.030729] ffff8881e8709a58: 00000019d79785f8 (0x19d79785f8)
[  113.030730] ffff8881e8709a60: 0000000000000005 (0x5)
[  113.030731] ffff8881e8709a68: 0000000000000000 (0x0)
[  113.030732] ffff8881e8709a70: ffff88810b579980 (0xffff88810b579980)
[  113.030733] ffff8881e8709a78: ffff88810b579800 (0xffff88810b579800)
[  113.030736] ffff8881e8709a80: ffffffff97286cda (update_load_avg+0x1fa/0x1ad0)
[  113.030737] ffff8881e8709a88: ffff88810b57a000 (0xffff88810b57a000)
[  113.030738] ffff8881e8709a90: ffff88810b579800 (0xffff88810b579800)
[  113.030739] ffff8881e8709a98: 0000000000000400 (0x400)
[  113.030741] ffff8881e8709aa0: ffffffff9729be38 (update_cfs_group+0x118/0x290)
[  113.030743] ffff8881e8709aa8: ffffffff973146a0 (lock_downgrade+0x6a0/0x6a0)
[  113.030744] ffff8881e8709ab0: 0000000000000000 (0x0)
[  113.030745] ffff8881e8709ab8: ffff88810b579850 (0xffff88810b579850)
[  113.030746] ffff8881e8709ac0: ffff8881e8733d68 (0xffff8881e8733d68)
[  113.030747] ffff8881e8709ac8: ffff8881e8733d40 (0xffff8881e8733d40)
[  113.030748] ffff8881e8709ad0: 0000000000000001 (0x1)
[  113.030749] ffff8881e8709ad8: ffff88810b579810 (0xffff88810b579810)
[  113.030751] ffff8881e8709ae0: ffffffff972a14b2 (enqueue_entity+0x402/0x2ba0)
[  113.030752] ffff8881e8709ae8: ffff8881e8733d50 (0xffff8881e8733d50)
[  113.030753] ffff8881e8709af0: ffff88810b579800 (0xffff88810b579800)
[  113.030754] ffff8881e8709af8: ffff8881e8733d80 (0xffff8881e8733d80)
[  113.030755] ffff8881e8709b00: ffff888116873268 (0xffff888116873268)
[  113.030757] ffff8881e8709b08: ffff8881e8734790 (0xffff8881e8734790)
[  113.030758] ffff8881e8709b10: ffff88810ebd3268 (0xffff88810ebd3268)
[  113.030759] ffff8881e8709b18: ffff8881e8720e80 (0xffff8881e8720e80)
[  113.030760] ffff8881e8709b20: ffff88810b579800 (0xffff88810b579800)
[  113.030761] ffff8881e8709b28: ffff88810b579950 (0xffff88810b579950)
[  113.030762] ffff8881e8709b30: 0000000000000009 (0x9)
[  113.030763] ffff8881e8709b38: ffff8881e8733c80 (0xffff8881e8733c80)
[  113.030764] ffff8881e8709b40: ffff88810ebd31c0 (0xffff88810ebd31c0)
[  113.030765] ffff8881e8709b48: 0000000000000002 (0x2)
[  113.030768] ffff8881e8709b50: ffffffff97263aef (resched_curr+0x17f/0x1e0)
[  113.030769] ffff8881e8709b58: ffff8881098e9000 (0xffff8881098e9000)
[  113.030770] ffff8881e8709b60: ffff8881098e9038 (0xffff8881098e9038)
[  113.030771] ffff8881e8709b68: 000000000056e125 (0x56e125)
[  113.030772] ffff8881e8709b70: 0000000000000002 (0x2)
[  113.030773] ffff8881e8709b78: 1ffff1103d0e1374 (0x1ffff1103d0e1374)
[  113.030774] ffff8881e8709b80: ffff888116873e20 (0xffff888116873e20)
[  113.030775] ffff8881e8709b88: ffff88811858c000 (0xffff88811858c000)
[  113.030777] ffff8881e8709b90: ffffffff97864a1c (kasan_set_track+0x1c/0x30)
[  113.030778] ffff8881e8709b98: 1ffff110230b1800 (0x1ffff110230b1800)
[  113.030780] ffff8881e8709ba0: ffffffff97866f5b (kasan_set_free_info+0x1b/0x30)
[  113.030781] ffff8881e8709ba8: 0000000000000001 (0x1)
[  113.030782] ffff8881e8709bb0: ffffffff97864980 (__kasan_slab_free+0x110/0x150)
[  113.030783] ffff8881e8709bb8: ffff88810b48a780 (0xffff88810b48a780)
[  113.030785] ffff8881e8709bc0: ffff88811858c000 (0xffff88811858c000)
[  113.030786] ffff8881e8709bc8: 3b9c1e2c25148a56 (0x3b9c1e2c25148a56)
[  113.030787] ffff8881e8709bd0: ffff88811858c000 (0xffff88811858c000)
[  113.030788] ffff8881e8709bd8: ffffffff9785b4da (slab_free_freelist_hook+0x5a/0x170)
[  113.030789] ffff8881e8709be0: ffff8881e8709c50 (0xffff8881e8709c50)
[  113.030790] ffff8881e8709be8: ffff88819858c000 (0xffff88819858c000)
[  113.030792] ffff8881e8709bf0: ffff88810b48a780 (0xffff88810b48a780)
[  113.030793] ffff8881e8709bf8: ffffea0004616300 (0xffffea0004616300)
[  113.030794] ffff8881e8709c00: ffff88811858c000 (0xffff88811858c000)
[  113.030795] ffff8881e8709c08: 0000000000000000 (0x0)
[  113.030796] ffff8881e8709c10: ffff8881059402a0 (0xffff8881059402a0)
[  113.030797] ffff8881e8709c18: ffffffff97862595 (kmem_cache_free+0xf5/0x590)
[  113.030799] ffff8881e8709c20: ffff888114d87038 (0xffff888114d87038)
[  113.030799] ffff8881e8709c28: 0000000000000000 (0x0)
[  113.030801] ffff8881e8709c30: ffffffff988f4da0 (dec_pending+0x1f0/0x900)
[  113.030802] ffff8881e8709c38: ffff8881059403b8 (0xffff8881059403b8)
[  113.030803] ffff8881e8709c40: ffff88811858c0a8 (0xffff88811858c0a8)
[  113.030804] ffff8881e8709c48: 0000000000000000 (0x0)
[  113.030805] ffff8881e8709c50: 0000000000000000 (0x0)
[  113.030806] ffff8881e8709c58: ffff88811858c000 (0xffff88811858c000)
[  113.030807] ffff8881e8709c60: ffff88814c427ab8 (0xffff88814c427ab8)
[  113.030808] ffff8881e8709c68: ffff888105940000 (0xffff888105940000)
[  113.030809] ffff8881e8709c70: 0000000000000000 (0x0)
[  113.030810] ffff8881e8709c78: ffff88814c427ac8 (0xffff88814c427ac8)
[  113.030811] ffff8881e8709c80: ffff8881059402a0 (0xffff8881059402a0)
[  113.030813] ffff8881e8709c88: ffffffff988f4da0 (dec_pending+0x1f0/0x900)
[  113.030814] ffff8881e8709c90: ffff888105940000 (0xffff888105940000)
[  113.030815] ffff8881e8709c98: 00000000fffd21a8 (0xfffd21a8)
[  113.030816] ffff8881e8709ca0: 0000000400000000 (0x400000000)
[  113.030817] ffff8881e8709ca8: ffff88814c427ac0 (0xffff88814c427ac0)
[  113.030818] ffff8881e8709cb0: 1ffff1103d0e139c (0x1ffff1103d0e139c)
[  113.030819] ffff8881e8709cb8: 1ffff1103d0e13a0 (0x1ffff1103d0e13a0)
[  113.030820] ffff8881e8709cc0: ffff88811858c0a8 (0xffff88811858c0a8)
[  113.030822] ffff8881e8709cc8: ffff88814c427ab8 (0xffff88814c427ab8)
[  113.030823] ffff8881e8709cd0: ffff88811858c000 (0xffff88811858c000)
[  113.030824] ffff8881e8709cd8: 0000000000000000 (0x0)
[  113.030825] ffff8881e8709ce0: ffff88811858c088 (0xffff88811858c088)
[  113.030826] ffff8881e8709ce8: ffffffff988fad15 (clone_endio+0x1e5/0x940)
[  113.030827] ffff8881e8709cf0: ffff888102a26c38 (0xffff888102a26c38)
[  113.030828] ffff8881e8709cf8: ffff88811858c0b0 (0xffff88811858c0b0)
[  113.030829] ffff8881e8709d00: 0000000041b58ab3 (0x41b58ab3)
[  113.030831] ffff8881e8709d08: ffffffff99e07e8f (.LC0+0x31f92/0x36703)
[  113.030833] ffff8881e8709d10: ffffffff988fab30 (disable_discard+0xd0/0xd0)
[  113.030834] ffff8881e8709d18: ffff88811858c0a8 (0xffff88811858c0a8)
[  113.030835] ffff8881e8709d20: ffff88811858c000 (0xffff88811858c000)
[  113.030836] ffff8881e8709d28: ffff88811858c0e0 (0xffff88811858c0e0)
[  113.030837] ffff8881e8709d30: ffff88811858c0b0 (0xffff88811858c0b0)
[  113.030838] ffff8881e8709d38: ffff88811a753300 (0xffff88811a753300)
[  113.030840] ffff8881e8709d40: dffffc0000000000 (0xdffffc0000000000)
[  113.030841] ffff8881e8709d48: ffff88811858c0a8 (0xffff88811858c0a8)
[  113.030842] ffff8881e8709d50: ffff88811a75331c (0xffff88811a75331c)
[  113.030843] ffff8881e8709d58: 0000000000001000 (0x1000)
[  113.030844] ffff8881e8709d60: 0000000000001000 (0x1000)
[  113.030845] ffff8881e8709d68: ffff88811a753300 (0xffff88811a753300)
[  113.030846] ffff8881e8709d70: ffffffff97eac8f6 (blk_update_request+0x716/0xe20)
[  113.030847] ffff8881e8709d78: ffffffff00000000 (0xffffffff00000000)
[  113.030848] ffff8881e8709d80: 0000000000000000 (0x0)
[  113.030849] ffff8881e8709d88: ffffed10234ea667 (0xffffed10234ea667)
[  113.030851] ffff8881e8709d90: ffff88811858c0d0 (0xffff88811858c0d0)
[  113.030852] ffff8881e8709d98: ffffed10234ea663 (0xffffed10234ea663)
[  113.030853] ffff8881e8709da0: ffff88811a753318 (0xffff88811a753318)
[  113.030854] ffff8881e8709da8: ffff88811a753338 (0xffff88811a753338)
[  113.030855] ffff8881e8709db0: ffff888111788800 (0xffff888111788800)
[  113.030856] ffff8881e8709db8: dffffc0000000000 (0xdffffc0000000000)
[  113.030857] ffff8881e8709dc0: ffff88811a753300 (0xffff88811a753300)
[  113.030858] ffff8881e8709dc8: 0000000000000000 (0x0)
[  113.030859] ffff8881e8709dd0: ffff88811a753300 (0xffff88811a753300)
[  113.030860] ffff8881e8709dd8: ffff888116c828c0 (0xffff888116c828c0)
[  113.030861] ffff8881e8709de0: ffffed1020316ea8 (0xffffed1020316ea8)
[  113.030863] ffff8881e8709de8: ffffffff97ed523b (blk_mq_end_request+0x4b/0x480)
[  113.030864] ffff8881e8709df0: dffffc0000000000 (0xdffffc0000000000)
[  113.030865] ffff8881e8709df8: dffffc0000000000 (0xdffffc0000000000)
[  113.030866] ffff8881e8709e00: ffff8881018b7480 (0xffff8881018b7480)
[  113.030867] ffff8881e8709e08: 0000000000000000 (0x0)
[  113.030868] ffff8881e8709e10: ffff88811a753300 (0xffff88811a753300)
[  113.030869] ffff8881e8709e18: ffff888116c828c0 (0xffff888116c828c0)
[  113.030871] ffff8881e8709e20: ffffffffc046e863 (nvme_process_cq+0x563/0xa40 [nvme])
[  113.030872] ffff8881e8709e28: ffffed1020316ead (0xffffed1020316ead)
[  113.030873] ffff8881e8709e30: ffff888116c828cc (0xffff888116c828cc)
[  113.030874] ffff8881e8709e38: 0000000197314fd1 (0x197314fd1)
[  113.030875] ffff8881e8709e40: ffff8881018b756c (0xffff8881018b756c)
[  113.030876] ffff8881e8709e48: ffff8881018b7560 (0xffff8881018b7560)
[  113.030878] ffff8881e8709e50: ffff8881018b756a (0xffff8881018b756a)
[  113.030879] ffff8881e8709e58: ffff88810e3c98c0 (0xffff88810e3c98c0)
[  113.030880] ffff8881e8709e60: ffff8881018b7540 (0xffff8881018b7540)
[  113.030881] ffff8881e8709e68: ffff8881018b7568 (0xffff8881018b7568)
[  113.030883] ffff8881e8709e70: ffffffff9733f50a (handle_irq_event+0xda/0x240)
[  113.030884] ffff8881e8709e78: dffffc0000000000 (0xdffffc0000000000)
[  113.030886] ffff8881e8709e80: ffffffffc046fd80 (nvme_suspend+0x330/0x330 [nvme])
[  113.030887] ffff8881e8709e88: ffff8881e8709f48 (0xffff8881e8709f48)
[  113.030888] ffff8881e8709e90: ffff88810e3c9800 (0xffff88810e3c9800)
[  113.030889] ffff8881e8709e98: ffff8881152e7f00 (0xffff8881152e7f00)
[  113.030890] ffff8881e8709ea0: 000000000000001c (0x1c)
[  113.030892] ffff8881e8709ea8: ffffffffc046fd90 (nvme_irq+0x10/0x20 [nvme])
[  113.030894] ffff8881e8709eb0: ffffffff9733ef52 (__handle_irq_event_percpu+0x252/0x620)
[  113.030895] ffff8881e8709eb8: 0000000000000002 (0x2)
[  113.030896] ffff8881e8709ec0: 0000000000000000 (0x0)
[  113.030897] ffff8881e8709ec8: ffffed1021c7930f (0xffffed1021c7930f)
[  113.030898] ffff8881e8709ed0: ffff88810ebd3eb8 (0xffff88810ebd3eb8)
[  113.030899] ffff8881e8709ed8: ffff88810ebd31c0 (0xffff88810ebd31c0)
[  113.030900] ffff8881e8709ee0: ffff88810e3c9878 (0xffff88810e3c9878)
[  113.030901] ffff8881e8709ee8: ffff88810e3c9800 (0xffff88810e3c9800)
[  113.030902] ffff8881e8709ef0: 1ffff1103d0e13e5 (0x1ffff1103d0e13e5)
[  113.030903] ffff8881e8709ef8: ffff88810e3c9800 (0xffff88810e3c9800)
[  113.030904] ffff8881e8709f00: ffff88810e3c9838 (0xffff88810e3c9838)
[  113.030906] ffff8881e8709f08: ffff88810e3c98a8 (0xffff88810e3c98a8)
[  113.030907] ffff8881e8709f10: ffff88810e3c9800 (0xffff88810e3c9800)
[  113.030908] ffff8881e8709f18: ffffffff9733f51f (handle_irq_event+0xef/0x240)
[  113.030909] ffff8881e8709f20: 0000000000000002 (0x2)
[  113.030910] ffff8881e8709f28: 0000000041b58ab3 (0x41b58ab3)
[  113.030912] ffff8881e8709f30: ffffffff99d88063 (.LC0+0x10d5/0x2d57)
[  113.030914] ffff8881e8709f38: ffffffff9733f430 (handle_irq_event_percpu+0x110/0x110)
[  113.030915] ffff8881e8709f40: 0000000000000000 (0x0)
[  113.030916] ffff8881e8709f48: 0000000000000000 (0x0)
[  113.030917] ffff8881e8709f50: ffff88810e3c98a8 (0xffff88810e3c98a8)
[  113.030918] ffff8881e8709f58: ffff88810e3c987c (0xffff88810e3c987c)
[  113.030919] ffff8881e8709f60: 0000000000000000 (0x0)
[  113.030920] ffff8881e8709f68: ffffed1021c7930e (0xffffed1021c7930e)
[  113.030921] ffff8881e8709f70: ffff88810e3c9838 (0xffff88810e3c9838)
[  113.030922] ffff8881e8709f78: ffff88810e3c987c (0xffff88810e3c987c)
[  113.030923] ffff8881e8709f80: ffffed1021c7930f (0xffffed1021c7930f)
[  113.030925] ffff8881e8709f88: dffffc0000000000 (0xdffffc0000000000)
[  113.030926] ffff8881e8709f90: ffffffff9734bff6 (handle_edge_irq+0x1f6/0xb60)
[  113.030927] ffff8881e8709f98: ffff88810e3c98a8 (0xffff88810e3c98a8)
[  113.030928] ffff8881e8709fa0: ffff88810e3c9870 (0xffff88810e3c9870)
[  113.030929] ffff8881e8709fa8: ffff88810e3c9840 (0xffff88810e3c9840)
[  113.030930] ffff8881e8709fb0: ffff88810e3c9828 (0xffff88810e3c9828)
[  113.030931] ffff8881e8709fb8: 0000000000000000 (0x0)
[  113.030932] ffff8881e8709fc0: 0000000000000023 (0x23)
[  113.030933] ffff8881e8709fc8: ffff88814c42fe30 (0xffff88814c42fe30)
[  113.030934] ffff8881e8709fd0: ffff88810e3c9800 (0xffff88810e3c9800)
[  113.030935] ffff8881e8709fd8: 0000000000000000 (0x0)
[  113.030936] ffff8881e8709fe0: 0000000000000000 (0x0)
[  113.030937] ffff8881e8709fe8: 0000000000000023 (0x23)
[  113.030939] ffff8881e8709ff0: ffffffff992010b2 (asm_call_irq_on_stack+0x12/0x20)
[  113.030940] ffff8881e8709ff8: ffff88814c42fe30 (0xffff88814c42fe30)
[  113.030942] ffff88814c42fe30: ffff88814c42fe78 (0xffff88814c42fe78)
[  113.030943] ffff88814c42fe38: ffffffff9918f05b (common_interrupt+0xeb/0x190)
[  113.030944] ffff88814c42fe40: 0000000000000000 (0x0)
[  113.030945] ffff88814c42fe48: 0000000000000000 (0x0)
[  113.030946] ffff88814c42fe50: 0000000000000000 (0x0)
[  113.030947] ffff88814c42fe58: 0000000000000000 (0x0)
[  113.030948] ffff88814c42fe60: 0000000000000000 (0x0)
[  113.030949] ffff88814c42fe68: 0000000000000000 (0x0)
[  113.030950] ffff88814c42fe70: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40)
[  113.030951] ffff88814c42fe78: 0000000000000000 (0x0)
[  113.030952] ffff88814c42fe80: 0000000000000000 (0x0)
[  113.030953] ffff88814c42fe88: 0000000000000000 (0x0)
[  113.030954] ffff88814c42fe90: ffff88810ebd31c0 (0xffff88810ebd31c0)
[  113.030955] ffff88814c42fe98: ffff88814c42ff58 (0xffff88814c42ff58)
[  113.030956] ffff88814c42fea0: 0000000000000000 (0x0)
[  113.030957] ffff88814c42fea8: 0000000000000001 (0x1)
[  113.030958] ffff88814c42feb0: ffffed1021d7a638 (0xffffed1021d7a638)
[  113.030959] ffff88814c42feb8: ffff88810ebd31c7 (0xffff88810ebd31c7)
[  113.030960] ffff88814c42fec0: 0000000000000000 (0x0)
[  113.030961] ffff88814c42fec8: 0000000000400140 (0x400140)
[  113.030963] ffff88814c42fed0: ffffffff991a91f4 (schedule+0xe4/0x270)
[  113.030963] ffff88814c42fed8: 0000000000000000 (0x0)
[  113.030964] ffff88814c42fee0: 0000000000000008 (0x8)
[  113.030965] ffff88814c42fee8: ffff88810ebd31c0 (0xffff88810ebd31c0)
[  113.030967] ffff88814c42fef0: ffffffffffffffff (0xffffffffffffffff)
[  113.030968] ffff88814c42fef8: ffffffff97395f70 (exit_to_user_mode_prepare+0xc0/0x1d0)
[  113.030969] ffff88814c42ff00: 0000000000000010 (0x10)
[  113.030970] ffff88814c42ff08: 0000000000000246 (0x246)
[  113.030971] ffff88814c42ff10: ffff88814c42ff28 (0xffff88814c42ff28)
[  113.030972] ffff88814c42ff18: 0000000000000018 (0x18)
[  113.030973] ffff88814c42ff20: 0000000000000000 (0x0)
[  113.030974] ffff88814c42ff28: 0000000000000246 (0x246)
[  113.030975] ffff88814c42ff30: 0000000000000000 (0x0)
[  113.030976] ffff88814c42ff38: 0000000000000000 (0x0)
[  113.030978] ffff88814c42ff40: ffffffff99200c48 (asm_common_interrupt+0x8/0x40)
[  113.030980] ffff88814c42ff48: ffffffff99191c95 (irqentry_exit_to_user_mode+0x5/0x40)
[  113.030981] ffff88814c42ff50: ffffffff99200c5e (asm_common_interrupt+0x1e/0x40)
[  113.030982] ffff88814c42ff58: 0000000000aaa828 (0xaaa828)
[  113.030983] ffff88814c42ff60: 0000000000001000 (0x1000)
[  113.030984] ffff88814c42ff68: 0000000000000000 (0x0)
[  113.030985] ffff88814c42ff70: 00007f02266d1460 (0x7f02266d1460)
[  113.030986] ffff88814c42ff78: 0000000000aaa800 (0xaaa800)
[  113.030987] ffff88814c42ff80: 0000000000aaa800 (0xaaa800)
[  113.030988] ffff88814c42ff88: 0000000000000293 (0x293)
[  113.030989] ffff88814c42ff90: 0000000000b59000 (0xb59000)
[  113.030990] ffff88814c42ff98: 00007ffdf02c4090 (0x7ffdf02c4090)
[  113.030991] ffff88814c42ffa0: 0000000000000000 (0x0)
[  113.030992] ffff88814c42ffa8: 0000000000001000 (0x1000)
[  113.030994] ffff88814c42ffb0: 00007f028a34124f (0x7f028a34124f)
[  113.030995] ffff88814c42ffb8: 0000000000001000 (0x1000)
[  113.030995] ffff88814c42ffc0: 0000000000abd000 (0xabd000)
[  113.030996] ffff88814c42ffc8: 0000000000000007 (0x7)
[  113.030998] ffff88814c42ffd0: ffffffffffffffff (0xffffffffffffffff)
[  113.030999] ffff88814c42ffd8: 00007f028a34124f (0x7f028a34124f)
[  113.031000] ffff88814c42ffe0: 0000000000000033 (0x33)
[  113.031000] ffff88814c42ffe8: 0000000000000293 (0x293)
[  113.031002] ffff88814c42fff0: 00007ffdf01f96d0 (0x7ffdf01f96d0)
[  113.031002] ffff88814c42fff8: 000000000000002b (0x2b)
[  115.105359] smpboot: CPU 3 is now offline
[  115.359319] smpboot: CPU 2 is now offline
...

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-06  6:04 WARNING: can't access registers at asm_common_interrupt Shinichiro Kawasaki
  2020-11-06 18:06 ` Josh Poimboeuf
@ 2020-11-11 17:05 ` Josh Poimboeuf
  2020-11-11 17:47   ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Josh Poimboeuf @ 2020-11-11 17:05 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: linux-kernel, Nicholas Piggin, Damien Le Moal, Peter Zijlstra

On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> Greetings,
> 
> I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> repeatedly on my kernel test system; it is printed by unwind_next_frame() in
> "arch/x86/kernel/unwind_orc.c". Syzbot has already reported this [1]. A
> similar warning was reported and discussed [2], but I believe the cause has
> not yet been clarified.
> 
> The warning was observed with v5.10-rc2 and older tags. I bisected and found
> that commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> triggered the warning. After reverting it from v5.10-rc2, the warning
> disappeared. May I ask the experts on CC to comment on how this commit could
> relate to the warning?
> 
> The test conditions needed to reproduce the warning are rather unusual
> (blktests, dm-linear and ZNS device emulation by QEMU). If any action is
> suggested for further analysis, I'm willing to take it on my test system.
> 
> I hope this report helps.
> 
> [1] https://lkml.org/lkml/2020/9/6/231
> [2] https://lkml.org/lkml/2020/9/8/1538

Shin'ichiro,

Thanks for all the data.  It looks like the ORC unwinder is getting
confused by paravirt patching (with runtime-patched pushf/pop changing
the stack layout).

<user interrupt>
	exit_to_user_mode_prepare()
		exit_to_user_mode_loop()
			local_irq_disable_exit_to_user()
				local_irq_disable()
					raw_irqs_disabled()
						arch_irqs_disabled()
							arch_local_save_flags()
								pushfq
								<another interrupt>

Objtool doesn't know about the pushf/pop paravirt patch, so ORC gets
confused by the changed stack layout.
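
For reference, the patched-in native replacement is tiny. A sketch of the
idea only (byte values written from memory, not taken from this build):

/*
 * On native, the "call *pv_ops.irq.save_fl" site gets rewritten in place
 * with these two bytes plus NOP padding.  An interrupt landing between
 * the PUSHF and the POP sees a stack that is 8 bytes deeper than the ORC
 * data generated for the original call site describes.
 */
static const unsigned char native_save_fl_patch[] = {
	0x9c,	/* pushfq */
	0x58,	/* pop %rax */
};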

I'm thinking we either need to teach objtool how to deal with
save_fl/restore_fl patches, or we need to just get rid of those nasty
patches somehow.  Peter, any thoughts?

It looks like 044d0d6de9f5 ("lockdep: Only trace IRQ edges") is making
the problem more likely, by adding the irqs_disabled() check for every
local_irq_disable().

Also - Peter, Nicholas - is that irqs_disabled() check really necessary
in local_irq_disable()?  Presumably irqs would typically be enabled
before calling it?

-- 
Josh


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 17:05 ` Josh Poimboeuf
@ 2020-11-11 17:47   ` Peter Zijlstra
  2020-11-11 18:13     ` Josh Poimboeuf
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-11 17:47 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, andrew.cooper3, jgross

On Wed, Nov 11, 2020 at 11:05:36AM -0600, Josh Poimboeuf wrote:
> On Fri, Nov 06, 2020 at 06:04:15AM +0000, Shinichiro Kawasaki wrote:
> > Greetings,
> > 
> > I observe "WARNING: can't access registers at asm_common_interrupt+0x1e/0x40"
> > repeatedly on my kernel test system; it is printed by unwind_next_frame() in
> > "arch/x86/kernel/unwind_orc.c". Syzbot has already reported this [1]. A
> > similar warning was reported and discussed [2], but I believe the cause has
> > not yet been clarified.
> > 
> > The warning was observed with v5.10-rc2 and older tags. I bisected and found
> > that commit 044d0d6de9f5 ("lockdep: Only trace IRQ edges") in v5.9-rc3
> > triggered the warning. After reverting it from v5.10-rc2, the warning
> > disappeared. May I ask the experts on CC to comment on how this commit could
> > relate to the warning?
> > 
> > The test conditions needed to reproduce the warning are rather unusual
> > (blktests, dm-linear and ZNS device emulation by QEMU). If any action is
> > suggested for further analysis, I'm willing to take it on my test system.
> > 
> > I hope this report helps.
> > 
> > [1] https://lkml.org/lkml/2020/9/6/231
> > [2] https://lkml.org/lkml/2020/9/8/1538
> 
> Shin'ichiro,
> 
> Thanks for all the data.  It looks like the ORC unwinder is getting
> confused by paravirt patching (with runtime-patched pushf/pop changing
> the stack layout).
> 
> <user interrupt>
> 	exit_to_user_mode_prepare()
> 		exit_to_user_mode_loop()
> 			local_irq_disable_exit_to_user()
> 				local_irq_disable()
> 					raw_irqs_disabled()
> 						arch_irqs_disabled()
> 							arch_local_save_flags()
> 								pushfq
> 								<another interrupt>

This is PARAVIRT_XXL only, which is a Xen special. My preference, as
always, is to kill it... Sadly the Xen people have a different opinion.

> Objtool doesn't know about the pushf/pop paravirt patch, so ORC gets
> confused by the changed stack layout.
> 
> I'm thinking we either need to teach objtool how to deal with
> save_fl/restore_fl patches, or we need to just get rid of those nasty
> patches somehow.  Peter, any thoughts?

Don't use Xen? ;-)

So with PARAVIRT_XXL the compiler will emit something like:

  "CALL *pvops.save_fl"

Which we then overwrite at runtime with "pushf; pop %[re]ax" and a few
NOPs.

Now, objtool understands alternatives and ensures they have the same stack
layout, but it has no chance in hell of understanding this, simply because
paravirt_patch.c is magic.

I don't have any immediate clever ideas, but let me ponder it a wee bit.

....

Something really disgusting we could do is recognise the indirect call
offset and emit an extra ORC entry for RIP+1. So the cases are:

	CALL *pv_ops.save_fl	-- 7 bytes IIRC
	CALL $imm;		-- 5 bytes
	PUSHF; POP %[RE]AX	-- 2 bytes

so the RIP+1 (the POP insn) will only ever exist in this case. The
indirect and direct call cases would never land on that IP.

....


> It looks like 044d0d6de9f5 ("lockdep: Only trace IRQ edges") is making
> the problem more likely, by adding the irqs_disabled() check for every
> local_irq_disable().
> 
> Also - Peter, Nicholas - is that irqs_disabled() check really necessary
> in local_irq_disable()?  Presumably irqs would typically be enabled
> before calling it?

Yeah, so it's all a giant can of worms; also see:

  https://lkml.kernel.org/r/20200821084738.508092956@infradead.org

The basic idea is to only trace edges, ie. when the hardware state
actually changes. Sadly this means doing a pushf/pop before the cli.
Ideally CLI would store the old IF in CF or something like that, but
alas.
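
For reference, the edge-only tracing from that commit looks roughly like
this (paraphrased, not the exact kernel source):

/*
 * Sketch of 044d0d6de9f5: only call into the tracing hooks when the
 * hardware state actually changes.  raw_irqs_disabled() is what ends up
 * as the pushf/pop (arch_local_save_flags) that an interrupt can land in
 * the middle of.
 */
#define local_irq_disable()					\
	do {							\
		bool was_disabled = raw_irqs_disabled();	\
		raw_local_irq_disable();			\
		if (!was_disabled)				\
			trace_hardirqs_off();			\
	} while (0)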

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 17:47   ` Peter Zijlstra
@ 2020-11-11 18:13     ` Josh Poimboeuf
  2020-11-11 18:46       ` Andrew Cooper
  2020-11-11 20:42       ` Peter Zijlstra
  0 siblings, 2 replies; 32+ messages in thread
From: Josh Poimboeuf @ 2020-11-11 18:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, andrew.cooper3, jgross

On Wed, Nov 11, 2020 at 06:47:36PM +0100, Peter Zijlstra wrote:
> This is PARAVIRT_XXL only, which is a Xen special. My preference, as
> always, is to kill it... Sadly the Xen people have a different opinion.

That would be soooo nice... then we could get rid of paravirt patching
altogether and replace it with static calls.

> > Objtool doesn't know about the pushf/pop paravirt patch, so ORC gets
> > confused by the changed stack layout.
> > 
> > I'm thinking we either need to teach objtool how to deal with
> > save_fl/restore_fl patches, or we need to just get rid of those nasty
> > patches somehow.  Peter, any thoughts?
> 
> Don't use Xen? ;-)
> 
> So with PARAVIRT_XXL the compiler will emit something like:
> 
>   "CALL *pvops.save_fl"
> 
> Which we then overwrite at runtime with "pushf; pop %[re]ax" and a few
> NOPs.
> 
> Now, objtool understands alternatives and ensures they have the same stack
> layout, but it has no chance in hell of understanding this, simply because
> paravirt_patch.c is magic.
> 
> I don't have any immediate clever ideas, but let me ponder it a wee bit.
> 
> ....
> 
> Something really disgusting we could do is recognise the indirect call
> offset and emit an extra ORC entry for RIP+1. So the cases are:
> 
> 	CALL *pv_ops.save_fl	-- 7 bytes IIRC
> 	CALL $imm;		-- 5 bytes
> 	PUSHF; POP %[RE]AX	-- 2 bytes
> 
> so the RIP+1 (the POP insn) will only ever exist in this case. The
> indirect and direct call cases would never land on that IP.

I had a similar idea, and a bit of deja vu - we may have talked about
this before.  At least I know we talked about doing something similar
for alternatives which muck with the stack.

> > It looks like 044d0d6de9f5 ("lockdep: Only trace IRQ edges") is making
> > the problem more likely, by adding the irqs_disabled() check for every
> > local_irq_disable().
> > 
> > Also - Peter, Nicholas - is that irqs_disabled() check really necessary
> > in local_irq_disable()?  Presumably irqs would typically be enabled
> > before calling it?
> 
> Yeah, so it's all a giant can of worms; also see:
> 
>   https://lkml.kernel.org/r/20200821084738.508092956@infradead.org
> 
> The basic idea is to only trace edges, ie. when the hardware state
> actually changes. Sadly this means doing a pushf/pop before the cli.
> Ideally CLI would store the old IF in CF or something like that, but
> alas.

Right, that makes sense for save/restore, but is the disabled check
really needed for local_irq_disable()?  Wouldn't that always be an edge?

And anyway I don't see a similar check for local_irq_enable().

-- 
Josh


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 18:13     ` Josh Poimboeuf
@ 2020-11-11 18:46       ` Andrew Cooper
  2020-11-11 19:42         ` Peter Zijlstra
  2020-11-11 20:42       ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Cooper @ 2020-11-11 18:46 UTC (permalink / raw)
  To: Josh Poimboeuf, Peter Zijlstra
  Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, jgross

On 11/11/2020 18:13, Josh Poimboeuf wrote:
> On Wed, Nov 11, 2020 at 06:47:36PM +0100, Peter Zijlstra wrote:
>> This is PARAVIRT_XXL only, which is a Xen special. My preference, as
>> always, is to kill it... Sadly the Xen people have a different opinion.
> That would be soooo nice... then we could get rid of paravirt patching
> altogether and replace it with static calls.
>
>>> Objtool doesn't know about the pushf/pop paravirt patch, so ORC gets
>>> confused by the changed stack layout.
>>>
>>> I'm thinking we either need to teach objtool how to deal with
>>> save_fl/restore_fl patches, or we need to just get rid of those nasty
>>> patches somehow.  Peter, any thoughts?
>> Don't use Xen? ;-)
>>
>> So with PARAVIRT_XXL the compiler will emit something like:
>>
>>   "CALL *pvops.save_fl"
>>
>> Which we then overwrite at runtime with "pushf; pop %[re]ax" and a few
>> NOPs.
>>
>> Now, objtool understands alternatives and ensures they have the same stack
>> layout, but it has no chance in hell of understanding this, simply because
>> paravirt_patch.c is magic.
>>
>> I don't have any immediate clever ideas, but let me ponder it a wee bit.

Well...

static_calls are a newer, and more generic, form of pvops.  Most of the
magic is to do with inlining small fragments, but static calls can do
that now too, IIRC?

>> ....
>>
>> Something really disgusting we could do is recognise the indirect call
>> offset and emit an extra ORC entry for RIP+1. So the cases are:
>>
>> 	CALL *pv_ops.save_fl	-- 7 bytes IIRC
>> 	CALL $imm;		-- 5 bytes
>> 	PUSHF; POP %[RE]AX	-- 2 bytes
>>
>> so the RIP+1 (the POP insn) will only ever exist in this case. The
>> indirect and direct call cases would never land on that IP.
> I had a similar idea, and a bit of deja vu - we may have talked about
> this before.  At least I know we talked about doing something similar
> for alternatives which muck with the stack.

The main complexity with pvops is that the

    CALL *pv_ops.save_fl

form needs to be usable from extremely early in the day (pre general
patching), hence the use of function pointers and some non-standard ABIs.
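
For reference, the native defaults those pointers start out with look
roughly like this (a paraphrased excerpt of pv_ops from
arch/x86/kernel/paravirt.c, other fields omitted):

	/*
	 * Before any patching, the indirect call simply lands on the
	 * native implementation via its CALLEE_SAVE thunk.
	 */
	.irq.save_fl		= __PV_IS_CALLEE_SAVE(native_save_fl),
	.irq.irq_disable	= __PV_IS_CALLEE_SAVE(native_irq_disable),
	.irq.irq_enable		= __PV_IS_CALLEE_SAVE(native_irq_enable),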

For performance reasons, the end result of this pvop wants to be `pushf;
pop %[re]ax` in the native case, and `call xen_pv_save_fl` in the Xen
case, but this doesn't mean that the compiled instruction needs to be a
function pointer to begin with.

Would objtool have an easier time coping if this were implemented in
terms of a static call?

~Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 18:46       ` Andrew Cooper
@ 2020-11-11 19:42         ` Peter Zijlstra
  2020-11-11 19:59           ` Josh Poimboeuf
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-11 19:42 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Josh Poimboeuf, Shinichiro Kawasaki, linux-kernel,
	Nicholas Piggin, Damien Le Moal, jgross, x86

On Wed, Nov 11, 2020 at 06:46:37PM +0000, Andrew Cooper wrote:

> Well...
> 
> static_calls are a newer, and more generic, form of pvops.  Most of the
> magic is to do with inlining small fragments, but static calls can do
> that now too, IIRC?

If you're referring to this glorious hack:

  https://lkml.kernel.org/r/20201110101307.GO2651@hirez.programming.kicks-ass.net

that only 'works' because it's a single instruction. That is,
static_call can only poke single instructions. They cannot replace a
call with "PUSHF; POP" / "PUSH; POPF" for example. They also cannot do
NOP padding for 'short' sequences.

Paravirt, like alternatives, are special in that they only happen once,
before SMP bringup.

> >> Something really disgusting we could do is recognise the indirect call
> >> offset and emit an extra ORC entry for RIP+1. So the cases are:
> >>
> >> 	CALL *pv_ops.save_fl	-- 7 bytes IIRC
> >> 	CALL $imm;		-- 5 bytes
> >> 	PUSHF; POP %[RE]AX	-- 2 bytes
> >>
> >> so the RIP+1 (the POP insn) will only ever exist in this case. The
> >> indirect and direct call cases would never land on that IP.
> > I had a similar idea, and a bit of deja vu - we may have talked about
> > this before.  At least I know we talked about doing something similar
> > for alternatives which muck with the stack.

Vague memories... luckily we managed to get alternatives to a state
where they match, which is much saner.

> The main complexity with pvops is that the
> 
>     CALL *pv_ops.save_fl
> 
> form needs to be usable from extremely early in the day (pre general
> patching), hence the use of function pointers and some non-standard ABIs.

The performance reasons mentioned below are a large part of the
non-standard ABI (e.g. CALLEE_SAVE).

> For performance reasons, the end result of this pvop wants to be `pushf;
> pop %[re]ax` in the native case, and `call xen_pv_save_fl` in the Xen
> case, but this doesn't mean that the compiled instruction needs to be a
> function pointer to begin with.

Not sure emitting the native code would be feasible... also
cpu_usergs_sysret64 is 6 bytes.

> Would objtool have an easier time coping if this were implemented in
> terms of a static call?

I doubt it, the big problem is that there is no visibility into the
actual alternative text. Runtime patching fragments into static call
would have the exact same problem.

Something that _might_ maybe work is trying to morph the immediate
fragments into an alternative. That is, instead of this:

static inline notrace unsigned long arch_local_save_flags(void)
{
	return PVOP_CALLEE0(unsigned long, irq.save_fl);
}

Write it something like:

static inline notrace unsigned long arch_local_save_flags(void)
{
	PVOP_CALL_ARGS;
	PVOP_TEST_NULL(irq.save_fl);
	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
					"PUSHF; POP _ASM_AX",
					X86_FEATURE_NATIVE)
			    : CLBR_RET_REG, ASM_CALL_CONSTRAINT
			    : paravirt_type(irq.save_fl.func),
			      paravirt_clobber(PVOP_CALLEE_CLOBBERS)
			    : "memory", "cc");
	return __eax;
}

And then we have to teach objtool how to deal with conflicting
alternatives...

That would remove most (all, if we can figure out a form that deals with
the spinlock fragments) of paravirt_patch.c

Hmm?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 19:42         ` Peter Zijlstra
@ 2020-11-11 19:59           ` Josh Poimboeuf
  2020-11-11 20:07             ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Josh Poimboeuf @ 2020-11-11 19:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Cooper, Shinichiro Kawasaki, linux-kernel,
	Nicholas Piggin, Damien Le Moal, jgross, x86

On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote:
> > Would objtool have an easier time coping if this were implemented in
> > terms of a static call?
> 
> I doubt it, the big problem is that there is no visibility into the
> actual alternative text. Runtime patching fragments into static call
> would have the exact same problem.
> 
> Something that _might_ maybe work is trying to morph the immediate
> fragments into an alternative. That is, instead of this:
> 
> static inline notrace unsigned long arch_local_save_flags(void)
> {
> 	return PVOP_CALLEE0(unsigned long, irq.save_fl);
> }
> 
> Write it something like:
> 
> static inline notrace unsigned long arch_local_save_flags(void)
> {
> 	PVOP_CALL_ARGS;
> 	PVOP_TEST_NULL(irq.save_fl);
> 	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> 					"PUSHF; POP _ASM_AX",
> 					X86_FEATURE_NATIVE)
> 			    : CLBR_RET_REG, ASM_CALL_CONSTRAINT
> 			    : paravirt_type(irq.save_fl.func),
> 			      paravirt_clobber(PVOP_CALLEE_CLOBBERS)
> 			    : "memory", "cc");
> 	return __eax;
> }
> 
> And then we have to teach objtool how to deal with conflicting
> alternatives...
> 
> That would remove most (all, if we can figure out a form that deals with
> the spinlock fragments) of paravirt_patch.c
> 
> Hmm?

I was going to suggest something similar.  Though I would try to take it
further and replace paravirt_patch_default() with static calls.

Either way it doesn't make objtool's job much easier.  But it would be
nice to consolidate runtime patching mechanisms and get rid of
.parainstructions.

-- 
Josh


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 19:59           ` Josh Poimboeuf
@ 2020-11-11 20:07             ` Peter Zijlstra
  2020-11-11 20:15               ` Josh Poimboeuf
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-11 20:07 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andrew Cooper, Shinichiro Kawasaki, linux-kernel,
	Nicholas Piggin, Damien Le Moal, jgross, x86

On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote:
> On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote:
> > > Would objtool have an easier time coping if this were implemented in
> > > terms of a static call?
> > 
> > I doubt it, the big problem is that there is no visibility into the
> > actual alternative text. Runtime patching fragments into static call
> > would have the exact same problem.
> > 
> > Something that _might_ maybe work is trying to morph the immediate
> > fragments into an alternative. That is, instead of this:
> > 
> > static inline notrace unsigned long arch_local_save_flags(void)
> > {
> > 	return PVOP_CALLEE0(unsigned long, irq.save_fl);
> > }
> > 
> > Write it something like:
> > 
> > static inline notrace unsigned long arch_local_save_flags(void)
> > {
> > 	PVOP_CALL_ARGS;
> > 	PVOP_TEST_NULL(irq.save_fl);
> > 	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> > 					"PUSHF; POP _ASM_AX",
> > 					X86_FEATURE_NATIVE)
> > 			    : CLBR_RET_REG, ASM_CALL_CONSTRAINT
> > 			    : paravirt_type(irq.save_fl.func),
> > 			      paravirt_clobber(PVOP_CALLEE_CLOBBERS)
> > 			    : "memory", "cc");
> > 	return __eax;
> > }
> > 
> > And then we have to teach objtool how to deal with conflicting
> > alternatives...
> > 
> > That would remove most (all, if we can figure out a form that deals with
> > the spinlock fragments) of paravirt_patch.c
> > 
> > Hmm?
> 
> I was going to suggest something similar.  Though I would try to take it
> further and replace paravirt_patch_default() with static calls.

Possible, we just need to be _really_ careful to not allow changing
those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which
does a __ro_after_init on the whole thing.
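
Something like this, perhaps (rough sketch only, the macro is made up here
and glosses over details of the real DEFINE_STATIC_CALL()):

/*
 * Hypothetical DEFINE_STATIC_CALL_RO(): the key goes into
 * .data..ro_after_init, so the target stays patchable during init
 * (e.g. for paravirt setup) but not afterwards.
 */
#define DEFINE_STATIC_CALL_RO(name, _func)				\
	DECLARE_STATIC_CALL(name, _func);				\
	struct static_call_key STATIC_CALL_KEY(name) __ro_after_init = { \
		.func = _func,						\
	};								\
	ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func)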

> Either way it doesn't make objtool's job much easier.  But it would be
> nice to consolidate runtime patching mechanisms and get rid of
> .parainstructions.

I think the above (combining alternative and paravirt/static_call) does
make objtool's job easier, since then we at least have the actual
alternative instructions available to inspect, or am I mis-understanding
things?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 20:07             ` Peter Zijlstra
@ 2020-11-11 20:15               ` Josh Poimboeuf
  2020-11-11 20:25                 ` Andrew Cooper
  2020-11-11 20:35                 ` Peter Zijlstra
  0 siblings, 2 replies; 32+ messages in thread
From: Josh Poimboeuf @ 2020-11-11 20:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Cooper, Shinichiro Kawasaki, linux-kernel,
	Nicholas Piggin, Damien Le Moal, jgross, x86

On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote:
> > On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote:
> > > > Would objtool have an easier time coping if this were implemented in
> > > > terms of a static call?
> > > 
> > > I doubt it, the big problem is that there is no visibility into the
> > > actual alternative text. Runtime patching fragments into static call
> > > would have the exact same problem.
> > > 
> > > Something that _might_ maybe work is trying to morph the immediate
> > > fragments into an alternative. That is, instead of this:
> > > 
> > > static inline notrace unsigned long arch_local_save_flags(void)
> > > {
> > > 	return PVOP_CALLEE0(unsigned long, irq.save_fl);
> > > }
> > > 
> > > Write it something like:
> > > 
> > > static inline notrace unsigned long arch_local_save_flags(void)
> > > {
> > > 	PVOP_CALL_ARGS;
> > > 	PVOP_TEST_NULL(irq.save_fl);
> > > 	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> > > 					"PUSHF; POP _ASM_AX",
> > > 					X86_FEATURE_NATIVE)
> > > 			    : CLBR_RET_REG, ASM_CALL_CONSTRAINT
> > > 			    : paravirt_type(irq.save_fl.func),
> > > 			      paravirt_clobber(PVOP_CALLEE_CLOBBERS)
> > > 			    : "memory", "cc");
> > > 	return __eax;
> > > }
> > > 
> > > And then we have to teach objtool how to deal with conflicting
> > > alternatives...
> > > 
> > > That would remove most (all, if we can figure out a form that deals with
> > > the spinlock fragments) of paravirt_patch.c
> > > 
> > > Hmm?
> > 
> > I was going to suggest something similar.  Though I would try to take it
> > further and replace paravirt_patch_default() with static calls.
> 
> Possible, we just need to be _really_ careful to not allow changing
> those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which
> does a __ro_after_init on the whole thing.

But what if you want to live migrate to another hypervisor ;-)

> > Either way it doesn't make objtool's job much easier.  But it would be
> > nice to consolidate runtime patching mechanisms and get rid of
> > .parainstructions.
> 
> I think the above (combining alternative and paravirt/static_call) does
> make objtool's job easier, since then we at least have the actual
> alternative instructions available to inspect, or am I mis-understanding
> things?

Right, it makes objtool's job a _little_ easier, since it already knows
how to read alternatives.  But it still has to learn to deal with the
conflicting stack layouts.

-- 
Josh


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 20:15               ` Josh Poimboeuf
@ 2020-11-11 20:25                 ` Andrew Cooper
  2020-11-11 20:39                   ` Peter Zijlstra
  2020-11-13 17:34                   ` Andy Lutomirski
  2020-11-11 20:35                 ` Peter Zijlstra
  1 sibling, 2 replies; 32+ messages in thread
From: Andrew Cooper @ 2020-11-11 20:25 UTC (permalink / raw)
  To: Josh Poimboeuf, Peter Zijlstra
  Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, jgross, x86

On 11/11/2020 20:15, Josh Poimboeuf wrote:
> On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote:
>> On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote:
>>> On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote:
>>>>> Would objtool have an easier time coping if this were implemented in
>>>>> terms of a static call?
>>>> I doubt it, the big problem is that there is no visibility into the
>>>> actual alternative text. Runtime patching fragments into static call
>>>> would have the exact same problem.
>>>>
>>>> Something that _might_ maybe work is trying to morph the immediate
>>>> fragments into an alternative. That is, instead of this:
>>>>
>>>> static inline notrace unsigned long arch_local_save_flags(void)
>>>> {
>>>> 	return PVOP_CALLEE0(unsigned long, irq.save_fl);
>>>> }
>>>>
>>>> Write it something like:
>>>>
>>>> static inline notrace unsigned long arch_local_save_flags(void)
>>>> {
>>>> 	PVOP_CALL_ARGS;
>>>> 	PVOP_TEST_NULL(irq.save_fl);
>>>> 	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
>>>> 					"PUSHF; POP _ASM_AX",
>>>> 					X86_FEATURE_NATIVE)
>>>> 			    : CLBR_RET_REG, ASM_CALL_CONSTRAINT
>>>> 			    : paravirt_type(irq.save_fl.func),
>>>> 			      paravirt_clobber(PVOP_CALLEE_CLOBBERS)
>>>> 			    : "memory", "cc");
>>>> 	return __eax;
>>>> }
>>>>
>>>> And then we have to teach objtool how to deal with conflicting
>>>> alternatives...
>>>>
>>>> That would remove most (all, if we can figure out a form that deals with
>>>> the spinlock fragments) of paravirt_patch.c
>>>>
>>>> Hmm?
>>> I was going to suggest something similar.  Though I would try to take it
>>> further and replace paravirt_patch_default() with static calls.
>> Possible, we just need to be _really_ careful to not allow changing
>> those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which
>> does a __ro_after_init on the whole thing.
> But what if you want to live migrate to another hypervisor ;-)

The same as what happens currently.  The user gets to keep all the
resulting pieces ;)

>>> Either way it doesn't make objtool's job much easier.  But it would be
>>> nice to consolidate runtime patching mechanisms and get rid of
>>> .parainstructions.
>> I think the above (combining alternative and paravirt/static_call) does
>> make objtool's job easier, since then we at least have the actual
>> alternative instructions available to inspect, or am I mis-understanding
>> things?
> Right, it makes objtool's job a _little_ easier, since it already knows
> how to read alternatives.  But it still has to learn to deal with the
> conflicting stack layouts.

I suppose the needed abstraction is "these blocks will start and end
with the same stack layout", while allowing the internals to diverge.

~Andrew

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 20:15               ` Josh Poimboeuf
  2020-11-11 20:25                 ` Andrew Cooper
@ 2020-11-11 20:35                 ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-11 20:35 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andrew Cooper, Shinichiro Kawasaki, linux-kernel,
	Nicholas Piggin, Damien Le Moal, jgross, x86

On Wed, Nov 11, 2020 at 02:15:06PM -0600, Josh Poimboeuf wrote:
> On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote:

> > Possible, we just need to be _really_ careful to not allow changing
> > those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which
> > does a __ro_after_init on the whole thing.
> 
> But what if you want to live migrate to another hypervisor ;-)

Then you get to keep the pieces ;-)

> > > Either way it doesn't make objtool's job much easier.  But it would be
> > > nice to consolidate runtime patching mechanisms and get rid of
> > > .parainstructions.
> > 
> > I think the above (combining alternative and paravirt/static_call) does
> > make objtool's job easier, since then we at least have the actual
> > alternative instructions available to inspect, or am I mis-understanding
> > things?
> 
> Right, it makes objtool's job a _little_ easier, since it already knows
> how to read alternatives.  But it still has to learn to deal with the
> conflicting stack layouts.

Right, but at least now it can see the instructions. Which is a lot
better than: `add a magic +1 ORC entry when you see an indirect call to
$magic`.

Anyway, __orc_find(addr) looks for the ORC entry with the highest IP <=
@addr. So if we have:

	alt0			alt1

	0x00 CALL *foo		0x00 PUSHF
	0x07 insn		0x01 POP %rax
				0x02 .nops 5
				0x07 insn

we have ORC entries for alt1.0x00 and alt1.0x01. Then if we hit insn,
it'll find the alt1.0x01 entry, but that had better be the same as the
state at 0x00.
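
That lookup is just "last entry with ip <= addr"; a self-contained sketch
of that part (the real __orc_find() works on a table of relative ip
offsets; the orc_lookup() name and the flat ip_table here are
simplifications):

	static struct orc_entry *orc_lookup(const unsigned long *ip_table,
					    struct orc_entry *orc_table,
					    unsigned int num, unsigned long addr)
	{
		unsigned int lo = 0, hi = num;

		/* Binary search for the first entry with ip > addr... */
		while (lo < hi) {
			unsigned int mid = lo + (hi - lo) / 2;

			if (ip_table[mid] <= addr)
				lo = mid + 1;
			else
				hi = mid;
		}

		/* ...the entry we want is the one just before it. */
		return lo ? &orc_table[lo - 1] : NULL;
	}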

This means that for every alt, we have to unwind using the CFI of every
other alt and match for every instruction. Which should be doable I
think.

Certainly more complicated than outright disallowing CFI changes inside
alt groups though :/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 20:25                 ` Andrew Cooper
@ 2020-11-11 20:39                   ` Peter Zijlstra
  2020-11-13 17:34                   ` Andy Lutomirski
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-11 20:39 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Josh Poimboeuf, Shinichiro Kawasaki, linux-kernel,
	Nicholas Piggin, Damien Le Moal, jgross, x86

On Wed, Nov 11, 2020 at 08:25:36PM +0000, Andrew Cooper wrote:

> > Right, it makes objtool's job a _little_ easier, since it already knows
> > how to read alternatives.  But it still has to learn to deal with the
> > conflicting stack layouts.
> 
> I suppose the needed abstraction is "these blocks will start and end
> with the same stack layout", while allowing the internals to diverge.

It's a little more complicated than that due to the fact that there is
only a single ORC table.

So something like:

	alt0		alt1
	0x00 push	0x00 nop
	0x01 nop	0x01 push

is impossible to correctly unwind: at offset 0x01 the single ORC entry for
that address would have to describe two different stack states (alt0 has
already done the push, alt1 has not), and the table is indexed by address
alone.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 18:13     ` Josh Poimboeuf
  2020-11-11 18:46       ` Andrew Cooper
@ 2020-11-11 20:42       ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-11 20:42 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, andrew.cooper3, jgross

On Wed, Nov 11, 2020 at 12:13:28PM -0600, Josh Poimboeuf wrote:
> On Wed, Nov 11, 2020 at 06:47:36PM +0100, Peter Zijlstra wrote:

> > Yeah, so it's all a giant can of worms that; also see:
> > 
> >   https://lkml.kernel.org/r/20200821084738.508092956@infradead.org
> > 
> > The basic idea is to only trace edges, ie. when the hardware state
> > actually changes. Sadly this means doing a pushf/pop before the cli.
> > Ideally CLI would store the old IF in CF or something like that, but
> > alas.
> 
> Right, that makes sense for save/restore, but is the disabled check
> really needed for local_irq_disable()?  Wouldn't that always be an edge?

IIRC there is code that does local_irq_disable() even though IRQs are
already disabled. This is 'harmless'.
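
For reference, the check in question has roughly this shape (paraphrased,
not a verbatim copy of include/linux/irqflags.h):

	#define local_irq_disable()					\
		do {							\
			bool was_disabled = raw_irqs_disabled();	\
			raw_local_irq_disable();			\
			if (!was_disabled)				\
				trace_hardirqs_off();			\
		} while (0)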

> And anyway I don't see a similar check for local_irq_enable().

I know there is code that does local_irq_enable() with IRQs already
enabled; I'm not exactly sure why this is different. I'll have to put it
on the todo list :/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-11 20:25                 ` Andrew Cooper
  2020-11-11 20:39                   ` Peter Zijlstra
@ 2020-11-13 17:34                   ` Andy Lutomirski
  2020-11-14  9:16                     ` Jürgen Groß
  2020-11-16 11:56                     ` Jürgen Groß
  1 sibling, 2 replies; 32+ messages in thread
From: Andy Lutomirski @ 2020-11-13 17:34 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki,
	linux-kernel, Nicholas Piggin, Damien Le Moal, Juergen Gross,
	X86 ML

On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 11/11/2020 20:15, Josh Poimboeuf wrote:
> > On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote:
> >> On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote:
> >>> On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote:
> >>>>> Would objtool have an easier time coping if this were implemented in
> >>>>> terms of a static call?
> >>>> I doubt it, the big problem is that there is no visibility into the
> >>>> actual alternative text. Runtime patching fragments into static call
> >>>> would have the exact same problem.
> >>>>
> >>>> Something that _might_ maybe work is trying to morph the immediate
> >>>> fragments into an alternative. That is, instead of this:
> >>>>
> >>>> static inline notrace unsigned long arch_local_save_flags(void)
> >>>> {
> >>>>    return PVOP_CALLEE0(unsigned long, irq.save_fl);
> >>>> }
> >>>>
> >>>> Write it something like:
> >>>>
> >>>> static inline notrace unsigned long arch_local_save_flags(void)
> >>>> {
> >>>>    PVOP_CALL_ARGS;
> >>>>    PVOP_TEST_NULL(irq.save_fl);
> >>>>    asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> >>>>                                    "PUSHF; POP _ASM_AX",
> >>>>                                    X86_FEATURE_NATIVE)
> >>>>                        : CLBR_RET_REG, ASM_CALL_CONSTRAINT
> >>>>                        : paravirt_type(irq.save_fl.func),
> >>>>                          paravirt_clobber(PVOP_CALLEE_CLOBBERS)
> >>>>                        : "memory", "cc");
> >>>>    return __eax;
> >>>> }
> >>>>
> >>>> And then we have to teach objtool how to deal with conflicting
> >>>> alternatives...
> >>>>
> >>>> That would remove most (all, if we can figure out a form that deals with
> >>>> the spinlock fragments) of paravirt_patch.c
> >>>>
> >>>> Hmm?
> >>> I was going to suggest something similar.  Though I would try to take it
> >>> further and replace paravirt_patch_default() with static calls.
> >> Possible, we just need to be _really_ careful to not allow changing
> >> those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which
> >> does a __ro_after_init on the whole thing.
> > But what if you want to live migrate to another hypervisor ;-)
>
> The same as what happens currently.  The user gets to keep all the
> resulting pieces ;)
>
> >>> Either way it doesn't make objtool's job much easier.  But it would be
> >>> nice to consolidate runtime patching mechanisms and get rid of
> >>> .parainstructions.
> >> I think the above (combining alternative and paravirt/static_call) does
> >> make objtool's job easier, since then we at least have the actual
> >> alternative instructions available to inspect, or am I mis-understanding
> >> things?
> > Right, it makes objtool's job a _little_ easier, since it already knows
> > how to read alternatives.  But it still has to learn to deal with the
> > conflicting stack layouts.
>
> I suppose the needed abstraction is "these blocks will start and end
> with the same stack layout", while allowing the internals to diverge.
>

How much of this stuff is actually useful anymore?  I'm wondering if
we can move most or all of this crud to
cpu_feature_enabled(X86_FEATURE_XENPV) and its asm equivalent.  The
full list, annotated, appears to be:

        const unsigned char     irq_irq_disable[1];

This is CLI or CALL, right?

        const unsigned char     irq_irq_enable[1];

STI or CALL.

        const unsigned char     irq_save_fl[2];

PUSHF; POP %r/eax.  I *think* I read the paravirt mess correctly and
this also turns into CALL.

        const unsigned char     mmu_read_cr2[3];
        const unsigned char     mmu_read_cr3[3];
        const unsigned char     mmu_write_cr3[3];

The write CR3 is so slow that I can't imagine us caring.  Reading CR3
should already be fairly optimized because it's slow on old non-PV
hypervisors, too.  Reading CR2 is rare and lives in asm.  These also
appear to just switch between MOV and CALL, anyway.

        const unsigned char     irq_restore_fl[2];

Ugh, this one sucks.  IMO it should be, for native and PV:

if (flags & X86_EFLAGS_IF) {
  local_irq_enable();  /* or raw? */
} else {
  if (some debugging option) {
    WARN_ON_ONCE(save_fl() & X86_EFLAGS_IF);
  }
}

POPF is slooooow.

        const unsigned char     cpu_wbinvd[2];

This is hilariously slow no matter what.  static_call() or even just a
plain old indirect call should be fine.

        const unsigned char     cpu_usergs_sysret64[6];

This is in the asm and we shouldn't be doing it at all for Xen PV.
IOW we should just drop this patch site entirely.  I can possibly find
some time to get rid of it, and maybe someone from Xen land can help.
I bet that we can gain a lot of perf on Xen PV by cleaning this up,
and I bet it will simplify everything.

        const unsigned char     cpu_swapgs[3];

This is SWAPGS or nop, unless I've missed some subtlety.

        const unsigned char     mov64[3];

This is some PTE magic, and I haven't deciphered it yet.

So I think there is at most one of these that wants anything more
complicated than a plain ALTERNATIVE.  Any volunteers to make it so?
Juergen, if you do all of them except USERGS_SYSRET64, I hereby
volunteer to do that one.

BTW, if y'all want to live migrate between Xen PV and anything else,
you are nuts.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-13 17:34                   ` Andy Lutomirski
@ 2020-11-14  9:16                     ` Jürgen Groß
  2020-11-14 18:10                       ` Andy Lutomirski
  2020-11-16 11:56                     ` Jürgen Groß
  1 sibling, 1 reply; 32+ messages in thread
From: Jürgen Groß @ 2020-11-14  9:16 UTC (permalink / raw)
  To: Andy Lutomirski, Andrew Cooper
  Cc: Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki,
	linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML


[-- Attachment #1.1.1: Type: text/plain, Size: 6143 bytes --]

On 13.11.20 18:34, Andy Lutomirski wrote:
> On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>>
>> On 11/11/2020 20:15, Josh Poimboeuf wrote:
>>> On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote:
>>>> On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote:
>>>>> On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote:
>>>>>>> Would objtool have an easier time coping if this were implemented in
>>>>>>> terms of a static call?
>>>>>> I doubt it, the big problem is that there is no visibility into the
>>>>>> actual alternative text. Runtime patching fragments into static call
>>>>>> would have the exact same problem.
>>>>>>
>>>>>> Something that _might_ maybe work is trying to morph the immediate
>>>>>> fragments into an alternative. That is, instead of this:
>>>>>>
>>>>>> static inline notrace unsigned long arch_local_save_flags(void)
>>>>>> {
>>>>>>     return PVOP_CALLEE0(unsigned long, irq.save_fl);
>>>>>> }
>>>>>>
>>>>>> Write it something like:
>>>>>>
>>>>>> static inline notrace unsigned long arch_local_save_flags(void)
>>>>>> {
>>>>>>     PVOP_CALL_ARGS;
>>>>>>     PVOP_TEST_NULL(irq.save_fl);
>>>>>>     asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
>>>>>>                                     "PUSHF; POP _ASM_AX",
>>>>>>                                     X86_FEATURE_NATIVE)
>>>>>>                         : CLBR_RET_REG, ASM_CALL_CONSTRAINT
>>>>>>                         : paravirt_type(irq.save_fl.func),
>>>>>>                           paravirt_clobber(PVOP_CALLEE_CLOBBERS)
>>>>>>                         : "memory", "cc");
>>>>>>     return __eax;
>>>>>> }
>>>>>>
>>>>>> And then we have to teach objtool how to deal with conflicting
>>>>>> alternatives...
>>>>>>
>>>>>> That would remove most (all, if we can figure out a form that deals with
>>>>>> the spinlock fragments) of paravirt_patch.c
>>>>>>
>>>>>> Hmm?
>>>>> I was going to suggest something similar.  Though I would try to take it
>>>>> further and replace paravirt_patch_default() with static calls.
>>>> Possible, we just need to be _really_ careful to not allow changing
>>>> those static_call()s. So maybe we need DEFINE_STATIC_CALL_RO() which
>>>> does a __ro_after_init on the whole thing.
>>> But what if you want to live migrate to another hypervisor ;-)
>>
>> The same as what happens currently.  The user gets to keep all the
>> resulting pieces ;)
>>
>>>>> Either way it doesn't make objtool's job much easier.  But it would be
>>>>> nice to consolidate runtime patching mechanisms and get rid of
>>>>> .parainstructions.
>>>> I think the above (combining alternative and paravirt/static_call) does
>>>> make objtool's job easier, since then we at least have the actual
>>>> alternative instructions available to inspect, or am I mis-understanding
>>>> things?
>>> Right, it makes objtool's job a _little_ easier, since it already knows
>>> how to read alternatives.  But it still has to learn to deal with the
>>> conflicting stack layouts.
>>
>> I suppose the needed abstraction is "these blocks will start and end
>> with the same stack layout", while allowing the internals to diverge.
>>
> 
> How much of this stuff is actually useful anymore?  I'm wondering if
> we can move most or all of this crud to
> cpu_feature_enabled(X86_FEATURE_XENPV) and its asm equivalent.  The
> full list, annotated, appears to be:
> 
>          const unsigned char     irq_irq_disable[1];
> 
> This is CLI or CALL, right?

Yes.

> 
>          const unsigned char     irq_irq_enable[1];
> 
> STI or CALL.

Yes.

> 
>          const unsigned char     irq_save_fl[2];
> 
> PUSHF; POP %r/eax.  I *think* I read the paravirt mess correctly and
> this also turns into CALL.

It does.

> 
>          const unsigned char     mmu_read_cr2[3];
>          const unsigned char     mmu_read_cr3[3];
>          const unsigned char     mmu_write_cr3[3];
> 
> The write CR3 is so slow that I can't imagine us caring.  Reading CR3
> should already be fairly optimized because it's slow on old non-PV
> hypervisors, too.  Reading CR2 is rare and lives in asm.  These also
> appear to just switch between MOV and CALL, anyway.

Correct.

> 
>          const unsigned char     irq_restore_fl[2];
> 
> Ugh, this one sucks.  IMO it should be, for native and PV:
> 
> if (flags & X86_EFLAGS_IF) {
>    local_irq_enable();  /* or raw? */
> } else {
>    if (some debugging option) {
>      WARN_ON_ONCE(save_fl() & X86_EFLAGS_IF);
>    }
> }

Seems sensible.

> 
> POPF is slooooow.
> 
>          const unsigned char     cpu_wbinvd[2];
> 
> This is hilariously slow no matter what.  static_call() or even just a
> plain old indirect call should be fine.

I'd go with the static_call().

> 
>          const unsigned char     cpu_usergs_sysret64[6];
> 
> This is in the asm and we shouldn't be doing it at all for Xen PV.
> IOW we should just drop this patch site entirely.  I can possibly find
> some time to get rid of it, and maybe someone from Xen land can help.
> I bet that we can gain a lot of perf on Xen PV by cleaning this up,
> and I bet it will simplify everything.
> 
>          const unsigned char     cpu_swapgs[3];
> 
> This is SWAPGS or nop, unless I've missed some subtlety.
> 
>          const unsigned char     mov64[3];
> 
> This is some PTE magic, and I haven't deciphered it yet.

Either a mov or a call.

> 
> So I think there is at most one of these that wants anything more
> complicated than a plain ALTERNATIVE.  Any volunteers to make it so?
> Juergen, if you do all of them except USERGS_SYSRET64, I hereby
> volunteer to do that one.

Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64
depending on X86_FEATURE_XENPV) not an option?

It's not as if this code would run before alternative patching.

> 
> BTW, if y'all want to live migrate between Xen PV and anything else,
> you are nuts.
> 

That's no option. Xen PV is a guest property, not one of the hypervisor.


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-14  9:16                     ` Jürgen Groß
@ 2020-11-14 18:10                       ` Andy Lutomirski
  2020-11-15  6:33                         ` Jürgen Groß
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2020-11-14 18:10 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Peter Zijlstra,
	Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, X86 ML

On Sat, Nov 14, 2020 at 1:16 AM Jürgen Groß <jgross@suse.com> wrote:
>
> On 13.11.20 18:34, Andy Lutomirski wrote:
> > On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >
> > So I think there is at most one of these that wants anything more
> > complicated than a plain ALTERNATIVE.  Any volunteers to make it so?
> > Juergen, if you do all of them except USERGS_SYSRET64, I hereby
> > volunteer to do that one.
>
> Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64
> depending on X86_FEATURE_XENPV) no option?
>
> Its not as if this code would run before alternative patching.

ALTERNATIVE would "work" in the sense that it would function and be
just about as nonsensical as the current code.  Fundamentally, Xen
PV's sysret feature is not a drop-in replacement for SYSRET64, and
pretending that it is seems unlikely to work well.  I suspect that the
current code is some combination of exceedingly slow, non-functional,
and incorrect in subtle ways.

We should just have a separate Xen PV exit path the same way we have a
separate entry path in recent kernels.  *This* is what I'm
volunteering to do.

--Andy

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-14 18:10                       ` Andy Lutomirski
@ 2020-11-15  6:33                         ` Jürgen Groß
  2020-11-15 16:05                           ` Andy Lutomirski
  0 siblings, 1 reply; 32+ messages in thread
From: Jürgen Groß @ 2020-11-15  6:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andrew Cooper, Josh Poimboeuf, Peter Zijlstra,
	Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, X86 ML


[-- Attachment #1.1.1: Type: text/plain, Size: 1486 bytes --]

On 14.11.20 19:10, Andy Lutomirski wrote:
> On Sat, Nov 14, 2020 at 1:16 AM Jürgen Groß <jgross@suse.com> wrote:
>>
>> On 13.11.20 18:34, Andy Lutomirski wrote:
>>> On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>>
>>> So I think there is at most one of these that wants anything more
>>> complicated than a plain ALTERNATIVE.  Any volunteers to make it so?
>>> Juergen, if you do all of them except USERGS_SYSRET64, I hereby
>>> volunteer to do that one.
>>
>> Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64
>> depending on X86_FEATURE_XENPV) no option?
>>
>> Its not as if this code would run before alternative patching.
> 
> ALTERNATIVE would "work" in the sense that it would function and be
> just about as nonsensical as the current code.  Fundamentally, Xen
> PV's sysret feature is not a drop-in replacement for SYSRET64, and
> pretending that it is seems unlikely to work well.  I suspect that the
> current code is some combination of exceedingly slow, non-functional,
> and incorrect in subtle ways.
> 
> We should just have a separate Xen PV exit path the same way we have a
> separate entry path in recent kernels.  *This* is what I'm
> volunteering to do.

I don't think there is much work needed. Xen PV does basically a return
to user mode via IRET. I think it might work just to use
swapgs_restore_regs_and_return_to_usermode() unconditionally for Xen PV.


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-15  6:33                         ` Jürgen Groß
@ 2020-11-15 16:05                           ` Andy Lutomirski
  2020-11-15 16:13                             ` Jürgen Groß
  0 siblings, 1 reply; 32+ messages in thread
From: Andy Lutomirski @ 2020-11-15 16:05 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Peter Zijlstra,
	Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, X86 ML


> On Nov 14, 2020, at 10:33 PM, Jürgen Groß <jgross@suse.com> wrote:
> 
> On 14.11.20 19:10, Andy Lutomirski wrote:
>>> On Sat, Nov 14, 2020 at 1:16 AM Jürgen Groß <jgross@suse.com> wrote:
>>> 
>>> On 13.11.20 18:34, Andy Lutomirski wrote:
>>>> On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper
>>>> <andrew.cooper3@citrix.com> wrote:
>>>> 
>>>> So I think there is at most one of these that wants anything more
>>>> complicated than a plain ALTERNATIVE.  Any volunteers to make it so?
>>>> Juergen, if you do all of them except USERGS_SYSRET64, I hereby
>>>> volunteer to do that one.
>>> 
>>> Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64
>>> depending on X86_FEATURE_XENPV) no option?
>>> 
>>> Its not as if this code would run before alternative patching.
>> ALTERNATIVE would "work" in the sense that it would function and be
>> just about as nonsensical as the current code.  Fundamentally, Xen
>> PV's sysret feature is not a drop-in replacement for SYSRET64, and
>> pretending that it is seems unlikely to work well.  I suspect that the
>> current code is some combination of exceedingly slow, non-functional,
>> and incorrect in subtle ways.
>> We should just have a separate Xen PV exit path the same way we have a
>> separate entry path in recent kernels.  *This* is what I'm
>> volunteering to do.
> 
> I don't think there is much work needed. Xen PV does basically a return
> to user mode via IRET. I think it might work just to use
> swapgs_restore_regs_and_return_to_usermode() unconditionally for Xen PV.
> 

I’m quite confident that will work, but I was hoping to get it to work quickly too :)

> 
> Juergen
> <OpenPGP_0xB0DE9DD628BF132F.asc>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-15 16:05                           ` Andy Lutomirski
@ 2020-11-15 16:13                             ` Jürgen Groß
  0 siblings, 0 replies; 32+ messages in thread
From: Jürgen Groß @ 2020-11-15 16:13 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, Peter Zijlstra,
	Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, X86 ML


[-- Attachment #1.1.1: Type: text/plain, Size: 1863 bytes --]

On 15.11.20 17:05, Andy Lutomirski wrote:
> 
>> On Nov 14, 2020, at 10:33 PM, Jürgen Groß <jgross@suse.com> wrote:
>>
>> On 14.11.20 19:10, Andy Lutomirski wrote:
>>>> On Sat, Nov 14, 2020 at 1:16 AM Jürgen Groß <jgross@suse.com> wrote:
>>>>
>>>> On 13.11.20 18:34, Andy Lutomirski wrote:
>>>>> On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper
>>>>> <andrew.cooper3@citrix.com> wrote:
>>>>>
>>>>> So I think there is at most one of these that wants anything more
>>>>> complicated than a plain ALTERNATIVE.  Any volunteers to make it so?
>>>>> Juergen, if you do all of them except USERGS_SYSRET64, I hereby
>>>>> volunteer to do that one.
>>>>
>>>> Why is a plain alternative (either swapgs; sysretq or a jmp xen_sysret64
>>>> depending on X86_FEATURE_XENPV) no option?
>>>>
>>>> Its not as if this code would run before alternative patching.
>>> ALTERNATIVE would "work" in the sense that it would function and be
>>> just about as nonsensical as the current code.  Fundamentally, Xen
>>> PV's sysret feature is not a drop-in replacement for SYSRET64, and
>>> pretending that it is seems unlikely to work well.  I suspect that the
>>> current code is some combination of exceedingly slow, non-functional,
>>> and incorrect in subtle ways.
>>> We should just have a separate Xen PV exit path the same way we have a
>>> separate entry path in recent kernels.  *This* is what I'm
>>> volunteering to do.
>>
>> I don't think there is much work needed. Xen PV does basically a return
>> to user mode via IRET. I think it might work just to use
>> swapgs_restore_regs_and_return_to_usermode() unconditionally for Xen PV.
>>
> 
> I’m quite confident that will work, but I was hoping to get it to work quickly too :)

The PV interface requires using the iret hypercall, because there
is no sysret equivalent.


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-13 17:34                   ` Andy Lutomirski
  2020-11-14  9:16                     ` Jürgen Groß
@ 2020-11-16 11:56                     ` Jürgen Groß
  2020-11-16 13:04                       ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Jürgen Groß @ 2020-11-16 11:56 UTC (permalink / raw)
  To: Andy Lutomirski, Andrew Cooper
  Cc: Josh Poimboeuf, Peter Zijlstra, Shinichiro Kawasaki,
	linux-kernel, Nicholas Piggin, Damien Le Moal, X86 ML


[-- Attachment #1.1.1: Type: text/plain, Size: 1727 bytes --]

On 13.11.20 18:34, Andy Lutomirski wrote:
> On Wed, Nov 11, 2020 at 12:25 PM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>>
>> On 11/11/2020 20:15, Josh Poimboeuf wrote:
>>> On Wed, Nov 11, 2020 at 09:07:30PM +0100, Peter Zijlstra wrote:
>>>> On Wed, Nov 11, 2020 at 01:59:00PM -0600, Josh Poimboeuf wrote:
>>>>> On Wed, Nov 11, 2020 at 08:42:06PM +0100, Peter Zijlstra wrote:
>>>>>>> Would objtool have an easier time coping if this were implemented in
>>>>>>> terms of a static call?
>>>>>> I doubt it, the big problem is that there is no visibility into the
>>>>>> actual alternative text. Runtime patching fragments into static call
>>>>>> would have the exact same problem.
>>>>>>
>>>>>> Something that _might_ maybe work is trying to morph the immediate
>>>>>> fragments into an alternative. That is, instead of this:
>>>>>>
>>>>>> static inline notrace unsigned long arch_local_save_flags(void)
>>>>>> {
>>>>>>     return PVOP_CALLEE0(unsigned long, irq.save_fl);
>>>>>> }
>>>>>>
>>>>>> Write it something like:
>>>>>>
>>>>>> static inline notrace unsigned long arch_local_save_flags(void)
>>>>>> {
>>>>>>     PVOP_CALL_ARGS;
>>>>>>     PVOP_TEST_NULL(irq.save_fl);
>>>>>>     asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
>>>>>>                                     "PUSHF; POP _ASM_AX",
>>>>>>                                     X86_FEATURE_NATIVE)

I am wondering whether we really want a new feature (basically "not
XENPV"). We could use ~X86_FEATURE_XENPV and teach apply_alternatives()
to understand negated features (yes, this limits the number of features
to 32767, but I don't think this is a real problem for quite some time).
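
The encoding could be as simple as stealing the top bit of the 16-bit
feature field in struct alt_instr, e.g. (hypothetical sketch, untested):

	#define ALT_FLAG_NOT		0x8000
	#define ALT_NOT(feature)	((feature) | ALT_FLAG_NOT)

	/*
	 * In apply_alternatives(), when deciding whether to patch in the
	 * replacement for one struct alt_instr:
	 */
	static bool alt_replacement_wanted(u16 cpuid_field)
	{
		u16 feature = cpuid_field & ~ALT_FLAG_NOT;

		if (cpuid_field & ALT_FLAG_NOT)
			return !boot_cpu_has(feature);

		return boot_cpu_has(feature);
	}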

Thoughts?


Juergen

[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-16 11:56                     ` Jürgen Groß
@ 2020-11-16 13:04                       ` Peter Zijlstra
  2020-11-18  6:47                         ` Jürgen Groß
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-16 13:04 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf,
	Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, X86 ML

[-- Attachment #1: Type: text/plain, Size: 6351 bytes --]

On Mon, Nov 16, 2020 at 12:56:32PM +0100, Jürgen Groß wrote:
> > > > > > > static inline notrace unsigned long arch_local_save_flags(void)
> > > > > > > {
> > > > > > >     PVOP_CALL_ARGS;
> > > > > > >     PVOP_TEST_NULL(irq.save_fl);
> > > > > > >     asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> > > > > > >                                     "PUSHF; POP _ASM_AX",
> > > > > > >                                     X86_FEATURE_NATIVE)
> 
> I am wondering whether we really want a new feature (basically "not
> XENPV). We could use ~X86_FEATURE_XENPV and teach apply_alternatives()
> to understand negated features (yes, this limits the number of features
> to 32767, but I don't think this is a real problem for quite some time).
> 
> Thoughts?

I went with the simple thing for now... If we ever want/need another
negative alternative I suppose we can do as you suggest...

I was still poking at objtool to actually dtrt though..

---
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index dad350d42ecf..cc88f358bac5 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -237,6 +237,7 @@
 #define X86_FEATURE_VMCALL		( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */
 #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
 #define X86_FEATURE_SEV_ES		( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */
+#define X86_FEATURE_NOT_XENPV		( 8*32+21) /* "" inverse of X86_FEATURE_XENPV */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index d25cc6830e89..e2a9d3e6b7ad 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -645,22 +645,56 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
 #ifdef CONFIG_PARAVIRT_XXL
 static inline notrace unsigned long arch_local_save_flags(void)
 {
-	return PVOP_CALLEE0(unsigned long, irq.save_fl);
+	PVOP_CALL_ARGS;
+	PVOP_TEST_NULL(irq.save_fl);
+	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
+					"pushf; pop %%" _ASM_AX ";",
+					X86_FEATURE_NOT_XENPV)
+			    : PVOP_CALLEE_CLOBBERS, ASM_CALL_CONSTRAINT
+			    : paravirt_type(irq.save_fl.func),
+			      paravirt_clobber(CLBR_RET_REG)
+			    : "memory", "cc");
+	return __eax;
 }
 
 static inline notrace void arch_local_irq_restore(unsigned long f)
 {
-	PVOP_VCALLEE1(irq.restore_fl, f);
+	PVOP_CALL_ARGS;
+	PVOP_TEST_NULL(irq.restore_fl);
+	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
+					"push %%" _ASM_ARG1 "; popf;",
+					X86_FEATURE_NOT_XENPV)
+			    : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT
+			    : paravirt_type(irq.restore_fl.func),
+			      paravirt_clobber(CLBR_RET_REG),
+			      PVOP_CALL_ARG1(f)
+			    : "memory", "cc");
 }
 
 static inline notrace void arch_local_irq_disable(void)
 {
-	PVOP_VCALLEE0(irq.irq_disable);
+	PVOP_CALL_ARGS;
+	PVOP_TEST_NULL(irq.irq_disable);
+	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
+					"cli;",
+					X86_FEATURE_NOT_XENPV)
+			    : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT
+			    : paravirt_type(irq.irq_disable.func),
+			      paravirt_clobber(CLBR_RET_REG)
+			    : "memory", "cc");
 }
 
 static inline notrace void arch_local_irq_enable(void)
 {
-	PVOP_VCALLEE0(irq.irq_enable);
+	PVOP_CALL_ARGS;
+	PVOP_TEST_NULL(irq.irq_enable);
+	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
+					"sti;",
+					X86_FEATURE_NOT_XENPV)
+			    : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT
+			    : paravirt_type(irq.irq_enable.func),
+			      paravirt_clobber(CLBR_RET_REG)
+			    : "memory", "cc");
 }
 
 static inline notrace unsigned long arch_local_irq_save(void)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 2400ad62f330..5bd8f5503652 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -723,6 +723,19 @@ void __init alternative_instructions(void)
 	 * patching.
 	 */
 
+	if (!boot_cpu_has(X86_FEATURE_XENPV))
+		setup_force_cpu_cap(X86_FEATURE_NOT_XENPV);
+
+	/*
+	 * First patch paravirt functions, such that we overwrite the indirect
+	 * call with the direct call.
+	 */
+	apply_paravirt(__parainstructions, __parainstructions_end);
+
+	/*
+	 * Then patch alternatives, such that those paravirt calls that are in
+	 * alternatives can be overwritten by their immediate fragments.
+	 */
 	apply_alternatives(__alt_instructions, __alt_instructions_end);
 
 #ifdef CONFIG_SMP
@@ -741,8 +754,6 @@ void __init alternative_instructions(void)
 	}
 #endif
 
-	apply_paravirt(__parainstructions, __parainstructions_end);
-
 	restart_nmi();
 	alternatives_patched = 1;
 }
diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c
index ace6e334cb39..867498db53ad 100644
--- a/arch/x86/kernel/paravirt_patch.c
+++ b/arch/x86/kernel/paravirt_patch.c
@@ -33,13 +33,9 @@ struct patch_xxl {
 };
 
 static const struct patch_xxl patch_data_xxl = {
-	.irq_irq_disable	= { 0xfa },		// cli
-	.irq_irq_enable		= { 0xfb },		// sti
-	.irq_save_fl		= { 0x9c, 0x58 },	// pushf; pop %[re]ax
 	.mmu_read_cr2		= { 0x0f, 0x20, 0xd0 },	// mov %cr2, %[re]ax
 	.mmu_read_cr3		= { 0x0f, 0x20, 0xd8 },	// mov %cr3, %[re]ax
 	.mmu_write_cr3		= { 0x0f, 0x22, 0xdf },	// mov %rdi, %cr3
-	.irq_restore_fl		= { 0x57, 0x9d },	// push %rdi; popfq
 	.cpu_wbinvd		= { 0x0f, 0x09 },	// wbinvd
 	.cpu_usergs_sysret64	= { 0x0f, 0x01, 0xf8,
 				    0x48, 0x0f, 0x07 },	// swapgs; sysretq
@@ -76,11 +72,6 @@ unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr,
 	switch (type) {
 
 #ifdef CONFIG_PARAVIRT_XXL
-	PATCH_CASE(irq, restore_fl, xxl, insn_buff, len);
-	PATCH_CASE(irq, save_fl, xxl, insn_buff, len);
-	PATCH_CASE(irq, irq_enable, xxl, insn_buff, len);
-	PATCH_CASE(irq, irq_disable, xxl, insn_buff, len);
-
 	PATCH_CASE(mmu, read_cr2, xxl, insn_buff, len);
 	PATCH_CASE(mmu, read_cr3, xxl, insn_buff, len);
 	PATCH_CASE(mmu, write_cr3, xxl, insn_buff, len);

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-16 13:04                       ` Peter Zijlstra
@ 2020-11-18  6:47                         ` Jürgen Groß
  2020-11-18  8:22                           ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Jürgen Groß @ 2020-11-18  6:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf,
	Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, X86 ML


[-- Attachment #1.1.1: Type: text/plain, Size: 6935 bytes --]

On 16.11.20 14:04, Peter Zijlstra wrote:
> On Mon, Nov 16, 2020 at 12:56:32PM +0100, Jürgen Groß wrote:
>>>>>>>> static inline notrace unsigned long arch_local_save_flags(void)
>>>>>>>> {
>>>>>>>>      PVOP_CALL_ARGS;
>>>>>>>>      PVOP_TEST_NULL(irq.save_fl);
>>>>>>>>      asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
>>>>>>>>                                      "PUSHF; POP _ASM_AX",
>>>>>>>>                                      X86_FEATURE_NATIVE)
>>
>> I am wondering whether we really want a new feature (basically "not
>> XENPV). We could use ~X86_FEATURE_XENPV and teach apply_alternatives()
>> to understand negated features (yes, this limits the number of features
>> to 32767, but I don't think this is a real problem for quite some time).
>>
>> Thoughts?
> 
> I went with the simple thing for now... If we ever want/need another
> negative alternative I suppose we can do as you suggest...
> 
> I was still poking at objtool to actually dtrt though..

I'd like to include this part in my series (with a different solution
for the restore_fl part, as suggested by Andy before).

Peter, are you fine with me taking your patch and adding your SoB?


Juergen

> 
> ---
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index dad350d42ecf..cc88f358bac5 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -237,6 +237,7 @@
>   #define X86_FEATURE_VMCALL		( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */
>   #define X86_FEATURE_VMW_VMMCALL		( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */
>   #define X86_FEATURE_SEV_ES		( 8*32+20) /* AMD Secure Encrypted Virtualization - Encrypted State */
> +#define X86_FEATURE_NOT_XENPV		( 8*32+21) /* "" inverse of X86_FEATURE_XENPV */
>   
>   /* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */
>   #define X86_FEATURE_FSGSBASE		( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index d25cc6830e89..e2a9d3e6b7ad 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -645,22 +645,56 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
>   #ifdef CONFIG_PARAVIRT_XXL
>   static inline notrace unsigned long arch_local_save_flags(void)
>   {
> -	return PVOP_CALLEE0(unsigned long, irq.save_fl);
> +	PVOP_CALL_ARGS;
> +	PVOP_TEST_NULL(irq.save_fl);
> +	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> +					"pushf; pop %%" _ASM_AX ";",
> +					X86_FEATURE_NOT_XENPV)
> +			    : PVOP_CALLEE_CLOBBERS, ASM_CALL_CONSTRAINT
> +			    : paravirt_type(irq.save_fl.func),
> +			      paravirt_clobber(CLBR_RET_REG)
> +			    : "memory", "cc");
> +	return __eax;
>   }
>   
>   static inline notrace void arch_local_irq_restore(unsigned long f)
>   {
> -	PVOP_VCALLEE1(irq.restore_fl, f);
> +	PVOP_CALL_ARGS;
> +	PVOP_TEST_NULL(irq.restore_fl);
> +	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> +					"push %%" _ASM_ARG1 "; popf;",
> +					X86_FEATURE_NOT_XENPV)
> +			    : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT
> +			    : paravirt_type(irq.restore_fl.func),
> +			      paravirt_clobber(CLBR_RET_REG),
> +			      PVOP_CALL_ARG1(f)
> +			    : "memory", "cc");
>   }
>   
>   static inline notrace void arch_local_irq_disable(void)
>   {
> -	PVOP_VCALLEE0(irq.irq_disable);
> +	PVOP_CALL_ARGS;
> +	PVOP_TEST_NULL(irq.irq_disable);
> +	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> +					"cli;",
> +					X86_FEATURE_NOT_XENPV)
> +			    : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT
> +			    : paravirt_type(irq.irq_disable.func),
> +			      paravirt_clobber(CLBR_RET_REG)
> +			    : "memory", "cc");
>   }
>   
>   static inline notrace void arch_local_irq_enable(void)
>   {
> -	PVOP_VCALLEE0(irq.irq_enable);
> +	PVOP_CALL_ARGS;
> +	PVOP_TEST_NULL(irq.irq_enable);
> +	asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> +					"sti;",
> +					X86_FEATURE_NOT_XENPV)
> +			    : PVOP_VCALLEE_CLOBBERS, ASM_CALL_CONSTRAINT
> +			    : paravirt_type(irq.irq_enable.func),
> +			      paravirt_clobber(CLBR_RET_REG)
> +			    : "memory", "cc");
>   }
>   
>   static inline notrace unsigned long arch_local_irq_save(void)
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 2400ad62f330..5bd8f5503652 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -723,6 +723,19 @@ void __init alternative_instructions(void)
>   	 * patching.
>   	 */
>   
> +	if (!boot_cpu_has(X86_FEATURE_XENPV))
> +		setup_force_cpu_cap(X86_FEATURE_NOT_XENPV);
> +
> +	/*
> +	 * First patch paravirt functions, such that we overwrite the indirect
> +	 * call with the direct call.
> +	 */
> +	apply_paravirt(__parainstructions, __parainstructions_end);
> +
> +	/*
> +	 * Then patch alternatives, such that those paravirt calls that are in
> +	 * alternatives can be overwritten by their immediate fragments.
> +	 */
>   	apply_alternatives(__alt_instructions, __alt_instructions_end);
>   
>   #ifdef CONFIG_SMP
> @@ -741,8 +754,6 @@ void __init alternative_instructions(void)
>   	}
>   #endif
>   
> -	apply_paravirt(__parainstructions, __parainstructions_end);
> -
>   	restart_nmi();
>   	alternatives_patched = 1;
>   }
> diff --git a/arch/x86/kernel/paravirt_patch.c b/arch/x86/kernel/paravirt_patch.c
> index ace6e334cb39..867498db53ad 100644
> --- a/arch/x86/kernel/paravirt_patch.c
> +++ b/arch/x86/kernel/paravirt_patch.c
> @@ -33,13 +33,9 @@ struct patch_xxl {
>   };
>   
>   static const struct patch_xxl patch_data_xxl = {
> -	.irq_irq_disable	= { 0xfa },		// cli
> -	.irq_irq_enable		= { 0xfb },		// sti
> -	.irq_save_fl		= { 0x9c, 0x58 },	// pushf; pop %[re]ax
>   	.mmu_read_cr2		= { 0x0f, 0x20, 0xd0 },	// mov %cr2, %[re]ax
>   	.mmu_read_cr3		= { 0x0f, 0x20, 0xd8 },	// mov %cr3, %[re]ax
>   	.mmu_write_cr3		= { 0x0f, 0x22, 0xdf },	// mov %rdi, %cr3
> -	.irq_restore_fl		= { 0x57, 0x9d },	// push %rdi; popfq
>   	.cpu_wbinvd		= { 0x0f, 0x09 },	// wbinvd
>   	.cpu_usergs_sysret64	= { 0x0f, 0x01, 0xf8,
>   				    0x48, 0x0f, 0x07 },	// swapgs; sysretq
> @@ -76,11 +72,6 @@ unsigned int native_patch(u8 type, void *insn_buff, unsigned long addr,
>   	switch (type) {
>   
>   #ifdef CONFIG_PARAVIRT_XXL
> -	PATCH_CASE(irq, restore_fl, xxl, insn_buff, len);
> -	PATCH_CASE(irq, save_fl, xxl, insn_buff, len);
> -	PATCH_CASE(irq, irq_enable, xxl, insn_buff, len);
> -	PATCH_CASE(irq, irq_disable, xxl, insn_buff, len);
> -
>   	PATCH_CASE(mmu, read_cr2, xxl, insn_buff, len);
>   	PATCH_CASE(mmu, read_cr3, xxl, insn_buff, len);
>   	PATCH_CASE(mmu, write_cr3, xxl, insn_buff, len);
> 


[-- Attachment #1.1.2: OpenPGP_0xB0DE9DD628BF132F.asc --]
[-- Type: application/pgp-keys, Size: 3135 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-18  6:47                         ` Jürgen Groß
@ 2020-11-18  8:22                           ` Peter Zijlstra
  2020-11-19 11:51                             ` Shinichiro Kawasaki
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-18  8:22 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf,
	Shinichiro Kawasaki, linux-kernel, Nicholas Piggin,
	Damien Le Moal, X86 ML

On Wed, Nov 18, 2020 at 07:47:36AM +0100, Jürgen Groß wrote:
> On 16.11.20 14:04, Peter Zijlstra wrote:
> > On Mon, Nov 16, 2020 at 12:56:32PM +0100, Jürgen Groß wrote:
> > > > > > > > > static inline notrace unsigned long arch_local_save_flags(void)
> > > > > > > > > {
> > > > > > > > >      PVOP_CALL_ARGS;
> > > > > > > > >      PVOP_TEST_NULL(irq.save_fl);
> > > > > > > > >      asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> > > > > > > > >                                      "PUSHF; POP _ASM_AX",
> > > > > > > > >                                      X86_FEATURE_NATIVE)
> > > 
> > > I am wondering whether we really want a new feature (basically "not
> > > XENPV). We could use ~X86_FEATURE_XENPV and teach apply_alternatives()
> > > to understand negated features (yes, this limits the number of features
> > > to 32767, but I don't think this is a real problem for quite some time).
> > > 
> > > Thoughts?
> > 
> > I went with the simple thing for now... If we ever want/need another
> > negative alternative I suppose we can do as you suggest...
> > 
> > I was still poking at objtool to actually dtrt though..
> 
> I'd like to include this part in my series (with a different solution
> for the restore_fl part, as suggested by Andy before).
> 
> Peter, are you fine with me taking your patch and adding your SoB?

Sure, note that I only compile tested it, as it was my test-bed for
playing with objtool (which I still haven't managed to get back to and
finish :/).

So if it actually _works_ feel free to take it, otherwise you can have
whatever bits and pieces remain after you've butchered it to do as you
want.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-18  8:22                           ` Peter Zijlstra
@ 2020-11-19 11:51                             ` Shinichiro Kawasaki
  2020-11-19 12:01                               ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Shinichiro Kawasaki @ 2020-11-19 11:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jürgen Groß,
	Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, linux-kernel,
	Nicholas Piggin, Damien Le Moal, X86 ML

On Nov 18, 2020 / 09:22, Peter Zijlstra wrote:
> On Wed, Nov 18, 2020 at 07:47:36AM +0100, Jürgen Groß wrote:
> > On 16.11.20 14:04, Peter Zijlstra wrote:
> > > On Mon, Nov 16, 2020 at 12:56:32PM +0100, Jürgen Groß wrote:
> > > > > > > > > > static inline notrace unsigned long arch_local_save_flags(void)
> > > > > > > > > > {
> > > > > > > > > >      PVOP_CALL_ARGS;
> > > > > > > > > >      PVOP_TEST_NULL(irq.save_fl);
> > > > > > > > > >      asm_inline volatile(ALTERNATIVE(paravirt_alt(PARAVIRT_CALL),
> > > > > > > > > >                                      "PUSHF; POP _ASM_AX",
> > > > > > > > > >                                      X86_FEATURE_NATIVE)
> > > > 
> > > > I am wondering whether we really want a new feature (basically "not
> > > > XENPV). We could use ~X86_FEATURE_XENPV and teach apply_alternatives()
> > > > to understand negated features (yes, this limits the number of features
> > > > to 32767, but I don't think this is a real problem for quite some time).
> > > > 
> > > > Thoughts?
> > > 
> > > I went with the simple thing for now... If we ever want/need another
> > > negative alternative I suppose we can do as you suggest...
> > > 
> > > I was still poking at objtool to actually dtrt though..
> > 
> > I'd like to include this part in my series (with a different solution
> > for the restore_fl part, as suggested by Andy before).
> > 
> > Peter, are you fine with me taking your patch and adding your SoB?
> 
> Sure, note that I only compile tested it, as it was my test-bed for
> playing with objtool (which I still haven't managed to get back to and
> finish :/).
> 
> So if it actually _works_ feel free to take it, otherwise you can have
> whatever bits and pieces remain after you've butchered it to do as you
> want.

All, thank you for the actions.

I tried Peter's patch in my test system and result looks good. The warning is
still observed with 5.10-rc4. With the patch on top of 5.10-rc4, the warning
disappeared.

Of note is that when I built the kernel with the patch, objtool reported many
warnings, as follows:

arch/x86/events/intel/core.o: warning: objtool: .altinstr_replacement+0x0: alternative modifies stack
arch/x86/events/core.o: warning: objtool: .altinstr_replacement+0x0: alternative modifies stack

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-19 11:51                             ` Shinichiro Kawasaki
@ 2020-11-19 12:01                               ` Peter Zijlstra
  2020-11-19 12:28                                 ` Jürgen Groß
  2020-11-19 12:48                                 ` Shinichiro Kawasaki
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2020-11-19 12:01 UTC (permalink / raw)
  To: Shinichiro Kawasaki
  Cc: Jürgen Groß,
	Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, linux-kernel,
	Nicholas Piggin, Damien Le Moal, X86 ML

On Thu, Nov 19, 2020 at 11:51:00AM +0000, Shinichiro Kawasaki wrote:
> I tried Peter's patch in my test system and result looks good. The warning is
> still observed with 5.10-rc4. With the patch on top of 5.10-rc4, the warning
> disappeared.

The patch on its own is not sufficient to fix things; there needs to be
an accompanying objtool patch to generate the correct unwind
information.

This patch only ensures objtool has enough information to be able to
dtrt, but as it stands it shouldn't even compile, it should complain
about an alternative trying to modify the stack and bail.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-19 12:01                               ` Peter Zijlstra
@ 2020-11-19 12:28                                 ` Jürgen Groß
  2020-11-19 12:48                                 ` Shinichiro Kawasaki
  1 sibling, 0 replies; 32+ messages in thread
From: Jürgen Groß @ 2020-11-19 12:28 UTC (permalink / raw)
  To: Peter Zijlstra, Shinichiro Kawasaki
  Cc: Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, linux-kernel,
	Nicholas Piggin, Damien Le Moal, X86 ML


On 19.11.20 13:01, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 11:51:00AM +0000, Shinichiro Kawasaki wrote:
>> I tried Peter's patch on my test system and the result looks good. The warning
>> is still observed with plain 5.10-rc4; with the patch applied on top of
>> 5.10-rc4, the warning disappeared.
> 
> The patch on its own is not sufficient to fix things; there needs to be
> an accompanying objtool patch to generate the correct unwind
> information.
> 
> This patch only ensures objtool has enough information to be able to
> dtrt, but as it stands it shouldn't even compile: objtool should complain
> about an alternative trying to modify the stack and bail.
> 

It complains, but doesn't bail.

I'm nearly ready with my series replacing all of the custom-code paravirt
patching with ALTERNATIVEs. Quite a bit of paravirt macro cruft turned out to
be unused, so a rather large cleanup patch is included. :-)


Juergen

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: WARNING: can't access registers at asm_common_interrupt
  2020-11-19 12:01                               ` Peter Zijlstra
  2020-11-19 12:28                                 ` Jürgen Groß
@ 2020-11-19 12:48                                 ` Shinichiro Kawasaki
  1 sibling, 0 replies; 32+ messages in thread
From: Shinichiro Kawasaki @ 2020-11-19 12:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jürgen Groß,
	Andy Lutomirski, Andrew Cooper, Josh Poimboeuf, linux-kernel,
	Nicholas Piggin, Damien Le Moal, X86 ML

On Nov 19, 2020 / 13:01, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 11:51:00AM +0000, Shinichiro Kawasaki wrote:
> > I tried Peter's patch on my test system and the result looks good. The warning
> > is still observed with plain 5.10-rc4; with the patch applied on top of
> > 5.10-rc4, the warning disappeared.
> 
> The patch on its own is not sufficient to fix things; there needs to be
> an accompanying objtool patch to generate the correct unwind
> information.
> 
> This patch only ensures objtool has enough information to be able to
> dtrt, but as it stands it shouldn't even compile: objtool should complain
> about an alternative trying to modify the stack and bail.

Thanks for the clarification. It was too early to try it out... When it is
ready to try and test, please let me know.

-- 
Best Regards,
Shin'ichiro Kawasaki

^ permalink raw reply	[flat|nested] 32+ messages in thread

* WARNING: can't access registers at asm_common_interrupt
@ 2020-09-06 20:46 syzbot
  0 siblings, 0 replies; 32+ messages in thread
From: syzbot @ 2020-09-06 20:46 UTC (permalink / raw)
  To: alexandre.chartre, bp, hpa, linux-kernel, luto, mingo, peterz,
	syzkaller-bugs, tglx, x86

Hello,

syzbot found the following issue on:

HEAD commit:    59126901 Merge tag 'perf-tools-fixes-for-v5.9-2020-09-03' ..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11221b5d900000
kernel config:  https://syzkaller.appspot.com/x/.config?x=3c5f6ce8d5b68299
dashboard link: https://syzkaller.appspot.com/bug?extid=424f7b15245b9615f293
compiler:       gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+424f7b15245b9615f293@syzkaller.appspotmail.com

WARNING: can't access registers at asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:572


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-11-19 12:49 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-06  6:04 WARNING: can't access registers at asm_common_interrupt Shinichiro Kawasaki
2020-11-06 18:06 ` Josh Poimboeuf
2020-11-09  9:10   ` Shinichiro Kawasaki
2020-11-10  3:19     ` Josh Poimboeuf
2020-11-10  9:19       ` Shinichiro Kawasaki
2020-11-11 17:05 ` Josh Poimboeuf
2020-11-11 17:47   ` Peter Zijlstra
2020-11-11 18:13     ` Josh Poimboeuf
2020-11-11 18:46       ` Andrew Cooper
2020-11-11 19:42         ` Peter Zijlstra
2020-11-11 19:59           ` Josh Poimboeuf
2020-11-11 20:07             ` Peter Zijlstra
2020-11-11 20:15               ` Josh Poimboeuf
2020-11-11 20:25                 ` Andrew Cooper
2020-11-11 20:39                   ` Peter Zijlstra
2020-11-13 17:34                   ` Andy Lutomirski
2020-11-14  9:16                     ` Jürgen Groß
2020-11-14 18:10                       ` Andy Lutomirski
2020-11-15  6:33                         ` Jürgen Groß
2020-11-15 16:05                           ` Andy Lutomirski
2020-11-15 16:13                             ` Jürgen Groß
2020-11-16 11:56                     ` Jürgen Groß
2020-11-16 13:04                       ` Peter Zijlstra
2020-11-18  6:47                         ` Jürgen Groß
2020-11-18  8:22                           ` Peter Zijlstra
2020-11-19 11:51                             ` Shinichiro Kawasaki
2020-11-19 12:01                               ` Peter Zijlstra
2020-11-19 12:28                                 ` Jürgen Groß
2020-11-19 12:48                                 ` Shinichiro Kawasaki
2020-11-11 20:35                 ` Peter Zijlstra
2020-11-11 20:42       ` Peter Zijlstra
  -- strict thread matches above, loose matches on Subject: below --
2020-09-06 20:46 syzbot
