virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
       [not found] ` <20200805132629.GA87338@elver.google.com>
@ 2020-08-05 13:42   ` peterz
       [not found]     ` <20200805135940.GA156343@elver.google.com>
  0 siblings, 1 reply; 19+ messages in thread
From: peterz @ 2020-08-05 13:42 UTC (permalink / raw)
  To: Marco Elver
  Cc: jgross, fenghua.yu, yu-cheng.yu, tony.luck, dave.hansen,
	syzkaller-bugs, linux-kernel, kasan-dev, virtualization, mingo,
	bp, hpa, tglx, syzbot, x86

On Wed, Aug 05, 2020 at 03:26:29PM +0200, Marco Elver wrote:
> Add missing noinstr to arch_local*() helpers, as they may be called from
> noinstr code.
> 
> On a KCSAN config with CONFIG_PARAVIRT=y, syzbot stumbled across corrupt

Cute, so I've been working on adding objtool support for this a little:

  https://lkml.kernel.org/r/20200803143231.GE2674@hirez.programming.kicks-ass.net

> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index 3d2afecde50c..a606f2ba2b5e 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -760,27 +760,27 @@ bool __raw_callee_save___native_vcpu_is_preempted(long cpu);
>  	((struct paravirt_callee_save) { func })
>  
>  #ifdef CONFIG_PARAVIRT_XXL
> -static inline notrace unsigned long arch_local_save_flags(void)
> +static inline noinstr unsigned long arch_local_save_flags(void)
>  {
>  	return PVOP_CALLEE0(unsigned long, irq.save_fl);
>  }
>  
> -static inline notrace void arch_local_irq_restore(unsigned long f)
> +static inline noinstr void arch_local_irq_restore(unsigned long f)
>  {
>  	PVOP_VCALLEE1(irq.restore_fl, f);
>  }
>  
> -static inline notrace void arch_local_irq_disable(void)
> +static inline noinstr void arch_local_irq_disable(void)
>  {
>  	PVOP_VCALLEE0(irq.irq_disable);
>  }
>  
> -static inline notrace void arch_local_irq_enable(void)
> +static inline noinstr void arch_local_irq_enable(void)
>  {
>  	PVOP_VCALLEE0(irq.irq_enable);
>  }
>  
> -static inline notrace unsigned long arch_local_irq_save(void)
> +static inline noinstr unsigned long arch_local_irq_save(void)
>  {
>  	unsigned long f;
>  

Shouldn't we __always_inline those? They're going to be really small.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
       [not found]     ` <20200805135940.GA156343@elver.google.com>
@ 2020-08-05 14:12       ` peterz
  2020-08-05 14:17         ` Jürgen Groß
  2020-08-05 14:17         ` peterz
  0 siblings, 2 replies; 19+ messages in thread
From: peterz @ 2020-08-05 14:12 UTC (permalink / raw)
  To: Marco Elver
  Cc: jgross, fenghua.yu, yu-cheng.yu, tony.luck, dave.hansen,
	syzkaller-bugs, linux-kernel, kasan-dev, virtualization, mingo,
	bp, hpa, tglx, syzbot, x86

On Wed, Aug 05, 2020 at 03:59:40PM +0200, Marco Elver wrote:
> On Wed, Aug 05, 2020 at 03:42PM +0200, peterz@infradead.org wrote:

> > Shouldn't we __always_inline those? They're going to be really small.
> 
> I can send a v2, and you can choose. For reference, though:
> 
> 	ffffffff86271ee0 <arch_local_save_flags>:
> 	ffffffff86271ee0:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> 	ffffffff86271ee5:       48 83 3d 43 87 e4 01    cmpq   $0x0,0x1e48743(%rip)        # ffffffff880ba630 <pv_ops+0x120>
> 	ffffffff86271eec:       00
> 	ffffffff86271eed:       74 0d                   je     ffffffff86271efc <arch_local_save_flags+0x1c>
> 	ffffffff86271eef:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> 	ffffffff86271ef4:       ff 14 25 30 a6 0b 88    callq  *0xffffffff880ba630
> 	ffffffff86271efb:       c3                      retq
> 	ffffffff86271efc:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> 	ffffffff86271f01:       0f 0b                   ud2

> 	ffffffff86271a90 <arch_local_irq_restore>:
> 	ffffffff86271a90:       53                      push   %rbx
> 	ffffffff86271a91:       48 89 fb                mov    %rdi,%rbx
> 	ffffffff86271a94:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> 	ffffffff86271a99:       48 83 3d 97 8b e4 01    cmpq   $0x0,0x1e48b97(%rip)        # ffffffff880ba638 <pv_ops+0x128>
> 	ffffffff86271aa0:       00
> 	ffffffff86271aa1:       74 11                   je     ffffffff86271ab4 <arch_local_irq_restore+0x24>
> 	ffffffff86271aa3:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> 	ffffffff86271aa8:       48 89 df                mov    %rbx,%rdi
> 	ffffffff86271aab:       ff 14 25 38 a6 0b 88    callq  *0xffffffff880ba638
> 	ffffffff86271ab2:       5b                      pop    %rbx
> 	ffffffff86271ab3:       c3                      retq
> 	ffffffff86271ab4:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> 	ffffffff86271ab9:       0f 0b                   ud2


Blergh, that's abysmall. In part I suspect because you have
CONFIG_PARAVIRT_DEBUG, let me try and untangle that PV macro maze.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-05 14:12       ` peterz
@ 2020-08-05 14:17         ` Jürgen Groß
  2020-08-05 14:17         ` peterz
  1 sibling, 0 replies; 19+ messages in thread
From: Jürgen Groß @ 2020-08-05 14:17 UTC (permalink / raw)
  To: peterz, Marco Elver
  Cc: syzbot, fenghua.yu, yu-cheng.yu, tony.luck, dave.hansen,
	syzkaller-bugs, linux-kernel, kasan-dev, virtualization, mingo,
	bp, hpa, tglx, x86

On 05.08.20 16:12, peterz@infradead.org wrote:
> On Wed, Aug 05, 2020 at 03:59:40PM +0200, Marco Elver wrote:
>> On Wed, Aug 05, 2020 at 03:42PM +0200, peterz@infradead.org wrote:
> 
>>> Shouldn't we __always_inline those? They're going to be really small.
>>
>> I can send a v2, and you can choose. For reference, though:
>>
>> 	ffffffff86271ee0 <arch_local_save_flags>:
>> 	ffffffff86271ee0:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>> 	ffffffff86271ee5:       48 83 3d 43 87 e4 01    cmpq   $0x0,0x1e48743(%rip)        # ffffffff880ba630 <pv_ops+0x120>
>> 	ffffffff86271eec:       00
>> 	ffffffff86271eed:       74 0d                   je     ffffffff86271efc <arch_local_save_flags+0x1c>
>> 	ffffffff86271eef:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>> 	ffffffff86271ef4:       ff 14 25 30 a6 0b 88    callq  *0xffffffff880ba630
>> 	ffffffff86271efb:       c3                      retq
>> 	ffffffff86271efc:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>> 	ffffffff86271f01:       0f 0b                   ud2
> 
>> 	ffffffff86271a90 <arch_local_irq_restore>:
>> 	ffffffff86271a90:       53                      push   %rbx
>> 	ffffffff86271a91:       48 89 fb                mov    %rdi,%rbx
>> 	ffffffff86271a94:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>> 	ffffffff86271a99:       48 83 3d 97 8b e4 01    cmpq   $0x0,0x1e48b97(%rip)        # ffffffff880ba638 <pv_ops+0x128>
>> 	ffffffff86271aa0:       00
>> 	ffffffff86271aa1:       74 11                   je     ffffffff86271ab4 <arch_local_irq_restore+0x24>
>> 	ffffffff86271aa3:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>> 	ffffffff86271aa8:       48 89 df                mov    %rbx,%rdi
>> 	ffffffff86271aab:       ff 14 25 38 a6 0b 88    callq  *0xffffffff880ba638
>> 	ffffffff86271ab2:       5b                      pop    %rbx
>> 	ffffffff86271ab3:       c3                      retq
>> 	ffffffff86271ab4:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>> 	ffffffff86271ab9:       0f 0b                   ud2
> 
> 
> Blergh, that's abysmall. In part I suspect because you have
> CONFIG_PARAVIRT_DEBUG, let me try and untangle that PV macro maze.
> 

Probably. I have found the following in my kernel:

fffffff81540a5f <arch_local_save_flags>:
ffffffff81540a5f:   ff 14 25 40 a4 23 82    callq  *0xffffffff8223a440
ffffffff81540a66:   c3                      retq


Juergen
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-05 14:12       ` peterz
  2020-08-05 14:17         ` Jürgen Groß
@ 2020-08-05 14:17         ` peterz
       [not found]           ` <CANpmjNN6FWZ+MsAn3Pj+WEez97diHzqF8hjONtHG15C2gSpSgw@mail.gmail.com>
  1 sibling, 1 reply; 19+ messages in thread
From: peterz @ 2020-08-05 14:17 UTC (permalink / raw)
  To: Marco Elver
  Cc: jgross, fenghua.yu, yu-cheng.yu, tony.luck, dave.hansen,
	syzkaller-bugs, linux-kernel, kasan-dev, virtualization, mingo,
	bp, hpa, tglx, syzbot, x86

On Wed, Aug 05, 2020 at 04:12:37PM +0200, peterz@infradead.org wrote:
> On Wed, Aug 05, 2020 at 03:59:40PM +0200, Marco Elver wrote:
> > On Wed, Aug 05, 2020 at 03:42PM +0200, peterz@infradead.org wrote:
> 
> > > Shouldn't we __always_inline those? They're going to be really small.
> > 
> > I can send a v2, and you can choose. For reference, though:
> > 
> > 	ffffffff86271ee0 <arch_local_save_flags>:
> > 	ffffffff86271ee0:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> > 	ffffffff86271ee5:       48 83 3d 43 87 e4 01    cmpq   $0x0,0x1e48743(%rip)        # ffffffff880ba630 <pv_ops+0x120>
> > 	ffffffff86271eec:       00
> > 	ffffffff86271eed:       74 0d                   je     ffffffff86271efc <arch_local_save_flags+0x1c>
> > 	ffffffff86271eef:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> > 	ffffffff86271ef4:       ff 14 25 30 a6 0b 88    callq  *0xffffffff880ba630
> > 	ffffffff86271efb:       c3                      retq
> > 	ffffffff86271efc:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> > 	ffffffff86271f01:       0f 0b                   ud2
> 
> > 	ffffffff86271a90 <arch_local_irq_restore>:
> > 	ffffffff86271a90:       53                      push   %rbx
> > 	ffffffff86271a91:       48 89 fb                mov    %rdi,%rbx
> > 	ffffffff86271a94:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> > 	ffffffff86271a99:       48 83 3d 97 8b e4 01    cmpq   $0x0,0x1e48b97(%rip)        # ffffffff880ba638 <pv_ops+0x128>
> > 	ffffffff86271aa0:       00
> > 	ffffffff86271aa1:       74 11                   je     ffffffff86271ab4 <arch_local_irq_restore+0x24>
> > 	ffffffff86271aa3:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> > 	ffffffff86271aa8:       48 89 df                mov    %rbx,%rdi
> > 	ffffffff86271aab:       ff 14 25 38 a6 0b 88    callq  *0xffffffff880ba638
> > 	ffffffff86271ab2:       5b                      pop    %rbx
> > 	ffffffff86271ab3:       c3                      retq
> > 	ffffffff86271ab4:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
> > 	ffffffff86271ab9:       0f 0b                   ud2
> 
> 
> Blergh, that's abysmall. In part I suspect because you have
> CONFIG_PARAVIRT_DEBUG, let me try and untangle that PV macro maze.

Yeah, look here:

0000 0000000000462149 <arch_local_save_flags>:
0000   462149:  ff 14 25 00 00 00 00    callq  *0x0
0003                    46214c: R_X86_64_32S    pv_ops+0x120
0007   462150:  c3                      retq


That's exactly what I was expecting.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
       [not found]               ` <20200806074723.GA2364872@elver.google.com>
@ 2020-08-06 11:32                 ` peterz
       [not found]                   ` <20200806131702.GA3029162@elver.google.com>
  0 siblings, 1 reply; 19+ messages in thread
From: peterz @ 2020-08-06 11:32 UTC (permalink / raw)
  To: Marco Elver
  Cc: jgross, fenghua.yu, yu-cheng.yu, Luck, Tony, Dave Hansen,
	syzkaller-bugs, LKML, kasan-dev, virtualization, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Thomas Gleixner, syzbot,
	the arch/x86 maintainers

On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
> Testing my hypothesis that raw then nested non-raw
> local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
> below. This is at least 1 case I can think of that we're bound to hit.

Aaargh!

> diff --git a/init/main.c b/init/main.c
> index 15bd0efff3df..0873319dcff4 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -1041,6 +1041,22 @@ asmlinkage __visible void __init start_kernel(void)
>  	sfi_init_late();
>  	kcsan_init();
>  
> +	/* DEBUG CODE */
> +	lockdep_assert_irqs_enabled(); /* Pass. */
> +	{
> +		unsigned long flags1;
> +		raw_local_irq_save(flags1);

This disables IRQs but doesn't trace..

> +		{
> +			unsigned long flags2;
> +			lockdep_assert_irqs_enabled(); /* Pass - expectedly blind. */

Indeed, we didn't trace the above disable, so software state is still
on.

> +			local_irq_save(flags2);

So here we save IRQ state, and unconditionally disable IRQs and trace
them disabled.

> +			lockdep_assert_irqs_disabled(); /* Pass. */
> +			local_irq_restore(flags2);

But here, we restore IRQ state to 'disabled' and explicitly trace it
disabled *again* (which is a bit daft, but whatever).

> +		}
> +		raw_local_irq_restore(flags1);

This then restores the IRQ state to enable, but no tracing.

> +	}
> +	lockdep_assert_irqs_enabled(); /* FAIL! */

And we're out of sync... :/

/me goes ponder things...

How's something like this then?

---
 include/linux/sched.h |  3 ---
 kernel/kcsan/core.c   | 62 ++++++++++++++++++++++++++++++++++++---------------
 2 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 06ec60462af0..2f5aef57e687 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1193,9 +1193,6 @@ struct task_struct {
 
 #ifdef CONFIG_KCSAN
 	struct kcsan_ctx		kcsan_ctx;
-#ifdef CONFIG_TRACE_IRQFLAGS
-	struct irqtrace_events		kcsan_save_irqtrace;
-#endif
 #endif
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c
index 9147ff6a12e5..9c4436bf0561 100644
--- a/kernel/kcsan/core.c
+++ b/kernel/kcsan/core.c
@@ -291,17 +291,50 @@ static inline unsigned int get_delay(void)
 				0);
 }
 
-void kcsan_save_irqtrace(struct task_struct *task)
+/*
+ * KCSAN hooks are everywhere, which means they're NMI like for interrupt
+ * tracing. In order to present a 'normal' as possible context to the code
+ * called by KCSAN when reporting errors we need to update the irq-tracing
+ * state.
+ *
+ * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
+ * runtime is entered for every memory access, and potentially useful
+ * information is lost if dirtied by KCSAN.
+ */
+
+struct kcsan_irq_state {
+	unsigned long		flags;
+#ifdef CONFIG_TRACE_IRQFLAGS
+	int			hardirqs;
+	struct irqtrace_events	irqtrace;
+#endif
+};
+
+void kcsan_save_irqtrace(struct kcsan_irq_state *irq_state)
 {
 #ifdef CONFIG_TRACE_IRQFLAGS
-	task->kcsan_save_irqtrace = task->irqtrace;
+	irq_state->irqtrace = task->irqtrace;
+	irq_state->hardirq = lockdep_hardirqs_enabled();
 #endif
+	if (!kcsan_interrupt_watcher) {
+		raw_local_irq_save(irq_state->flags);
+		lockdep_hardirqs_off(CALLER_ADDR0);
+	}
 }
 
-void kcsan_restore_irqtrace(struct task_struct *task)
+void kcsan_restore_irqtrace(struct kcsan_irq_state *irq_state)
 {
+	if (!kcsan_interrupt_watcher) {
+#ifdef CONFIG_TRACE_IRQFLAGS
+		if (irq_state->hardirqs) {
+			lockdep_hardirqs_on_prepare(CALLER_ADDR0);
+			lockdep_hardirqs_on(CALLER_ADDR0);
+		}
+#endif
+		raw_local_irq_restore(irq_state->flags);
+	}
 #ifdef CONFIG_TRACE_IRQFLAGS
-	task->irqtrace = task->kcsan_save_irqtrace;
+	task->irqtrace = irq_state->irqtrace;
 #endif
 }
 
@@ -350,11 +383,13 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr,
 	flags = user_access_save();
 
 	if (consumed) {
-		kcsan_save_irqtrace(current);
+		struct kcsan_irq_state irqstate;
+
+		kcsan_save_irqtrace(&irqstate);
 		kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_MAYBE,
 			     KCSAN_REPORT_CONSUMED_WATCHPOINT,
 			     watchpoint - watchpoints);
-		kcsan_restore_irqtrace(current);
+		kcsan_restore_irqtrace(&irqstate);
 	} else {
 		/*
 		 * The other thread may not print any diagnostics, as it has
@@ -387,7 +422,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
 	unsigned long access_mask;
 	enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE;
 	unsigned long ua_flags = user_access_save();
-	unsigned long irq_flags = 0;
+	struct kcsan_irq_state irqstate;
 
 	/*
 	 * Always reset kcsan_skip counter in slow-path to avoid underflow; see
@@ -412,14 +447,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
 		goto out;
 	}
 
-	/*
-	 * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's
-	 * runtime is entered for every memory access, and potentially useful
-	 * information is lost if dirtied by KCSAN.
-	 */
-	kcsan_save_irqtrace(current);
-	if (!kcsan_interrupt_watcher)
-		local_irq_save(irq_flags);
+	kcsan_save_irqtrace(&irqstate);
 
 	watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
 	if (watchpoint == NULL) {
@@ -559,9 +587,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type)
 	remove_watchpoint(watchpoint);
 	kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
 out_unlock:
-	if (!kcsan_interrupt_watcher)
-		local_irq_restore(irq_flags);
-	kcsan_restore_irqtrace(current);
+	kcsan_restore_irqtrace(&irqstate);
 out:
 	user_access_restore(ua_flags);
 }


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
       [not found]                       ` <CANpmjNO860SHpNve+vaoAOgarU1SWy8o--tUWCqNhn82OLCiew@mail.gmail.com>
@ 2020-08-07  9:24                         ` Jürgen Groß
       [not found]                           ` <20200807095032.GA3528289@elver.google.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Jürgen Groß @ 2020-08-07  9:24 UTC (permalink / raw)
  To: Marco Elver, Peter Zijlstra
  Cc: syzbot, fenghua.yu, yu-cheng.yu, Luck, Tony, Paul E. McKenney,
	Dave Hansen, syzkaller-bugs, LKML, kasan-dev, virtualization,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Thomas Gleixner,
	the arch/x86 maintainers

On 07.08.20 11:01, Marco Elver wrote:
> On Thu, 6 Aug 2020 at 18:06, Marco Elver <elver@google.com> wrote:
>> On Thu, 6 Aug 2020 at 15:17, Marco Elver <elver@google.com> wrote:
>>> On Thu, Aug 06, 2020 at 01:32PM +0200, peterz@infradead.org wrote:
>>>> On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
>>>>> Testing my hypothesis that raw then nested non-raw
>>>>> local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
>>>>> below. This is at least 1 case I can think of that we're bound to hit.
>>> ...
>>>>
>>>> /me goes ponder things...
>>>>
>>>> How's something like this then?
>>>>
>>>> ---
>>>>   include/linux/sched.h |  3 ---
>>>>   kernel/kcsan/core.c   | 62 ++++++++++++++++++++++++++++++++++++---------------
>>>>   2 files changed, 44 insertions(+), 21 deletions(-)
>>>
>>> Thank you! That approach seems to pass syzbot (also with
>>> CONFIG_PARAVIRT) and kcsan-test tests.
>>>
>>> I had to modify it some, so that report.c's use of the restore logic
>>> works and not mess up the IRQ trace printed on KCSAN reports (with
>>> CONFIG_KCSAN_VERBOSE).
>>>
>>> I still need to fully convince myself all is well now and we don't end
>>> up with more fixes. :-) If it passes further testing, I'll send it as a
>>> real patch (I want to add you as Co-developed-by, but would need your
>>> Signed-off-by for the code you pasted, I think.)
> 
> I let it run on syzbot through the night, and it's fine without
> PARAVIRT (see below). I have sent the patch (need your Signed-off-by
> as it's based on your code, thank you!):
> https://lkml.kernel.org/r/20200807090031.3506555-1-elver@google.com
> 
>> With CONFIG_PARAVIRT=y (without the notrace->noinstr patch), I still
>> get lockdep DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()), although
>> it takes longer for syzbot to hit them. But I think that's expected
>> because we can still get the recursion that I pointed out, and will
>> need that patch.
> 
> Never mind, I get these warnings even if I don't turn on KCSAN
> (CONFIG_KCSAN=n). Something else is going on with PARAVIRT=y that
> throws off IRQ state tracking. :-/

What are the settings of CONFIG_PARAVIRT_XXL and
CONFIG_PARAVIRT_SPINLOCKS in this case?


Juergen
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
       [not found]                           ` <20200807095032.GA3528289@elver.google.com>
@ 2020-08-07 10:35                             ` Jürgen Groß
       [not found]                               ` <20200807113838.GA3547125@elver.google.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Jürgen Groß @ 2020-08-07 10:35 UTC (permalink / raw)
  To: Marco Elver
  Cc: syzbot, fenghua.yu, yu-cheng.yu, Luck, Tony, Paul E. McKenney,
	Peter Zijlstra, Dave Hansen, syzkaller-bugs, LKML, kasan-dev,
	virtualization, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Thomas Gleixner, the arch/x86 maintainers

On 07.08.20 11:50, Marco Elver wrote:
> On Fri, Aug 07, 2020 at 11:24AM +0200, Jürgen Groß wrote:
>> On 07.08.20 11:01, Marco Elver wrote:
>>> On Thu, 6 Aug 2020 at 18:06, Marco Elver <elver@google.com> wrote:
>>>> On Thu, 6 Aug 2020 at 15:17, Marco Elver <elver@google.com> wrote:
>>>>> On Thu, Aug 06, 2020 at 01:32PM +0200, peterz@infradead.org wrote:
>>>>>> On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
>>>>>>> Testing my hypothesis that raw then nested non-raw
>>>>>>> local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
>>>>>>> below. This is at least 1 case I can think of that we're bound to hit.
>>>>> ...
>>>>>>
>>>>>> /me goes ponder things...
>>>>>>
>>>>>> How's something like this then?
>>>>>>
>>>>>> ---
>>>>>>    include/linux/sched.h |  3 ---
>>>>>>    kernel/kcsan/core.c   | 62 ++++++++++++++++++++++++++++++++++++---------------
>>>>>>    2 files changed, 44 insertions(+), 21 deletions(-)
>>>>>
>>>>> Thank you! That approach seems to pass syzbot (also with
>>>>> CONFIG_PARAVIRT) and kcsan-test tests.
>>>>>
>>>>> I had to modify it some, so that report.c's use of the restore logic
>>>>> works and not mess up the IRQ trace printed on KCSAN reports (with
>>>>> CONFIG_KCSAN_VERBOSE).
>>>>>
>>>>> I still need to fully convince myself all is well now and we don't end
>>>>> up with more fixes. :-) If it passes further testing, I'll send it as a
>>>>> real patch (I want to add you as Co-developed-by, but would need your
>>>>> Signed-off-by for the code you pasted, I think.)
>>>
>>> I let it run on syzbot through the night, and it's fine without
>>> PARAVIRT (see below). I have sent the patch (need your Signed-off-by
>>> as it's based on your code, thank you!):
>>> https://lkml.kernel.org/r/20200807090031.3506555-1-elver@google.com
>>>
>>>> With CONFIG_PARAVIRT=y (without the notrace->noinstr patch), I still
>>>> get lockdep DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()), although
>>>> it takes longer for syzbot to hit them. But I think that's expected
>>>> because we can still get the recursion that I pointed out, and will
>>>> need that patch.
>>>
>>> Never mind, I get these warnings even if I don't turn on KCSAN
>>> (CONFIG_KCSAN=n). Something else is going on with PARAVIRT=y that
>>> throws off IRQ state tracking. :-/
>>
>> What are the settings of CONFIG_PARAVIRT_XXL and
>> CONFIG_PARAVIRT_SPINLOCKS in this case?
> 
> I attached a config.
> 
> 	$> grep PARAVIRT .config
> 	CONFIG_PARAVIRT=y
> 	CONFIG_PARAVIRT_XXL=y
> 	# CONFIG_PARAVIRT_DEBUG is not set
> 	CONFIG_PARAVIRT_SPINLOCKS=y
> 	# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
> 	CONFIG_PARAVIRT_CLOCK=y

Anything special I need to do to reproduce the problem? Or would you be
willing to do some more rounds with different config settings?

I think CONFIG_PARAVIRT_XXL shouldn't matter, but I'm not completely
sure about that. CONFIG_PARAVIRT_SPINLOCKS would be my primary suspect.


Juergen
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
       [not found]                               ` <20200807113838.GA3547125@elver.google.com>
@ 2020-08-07 12:04                                 ` Jürgen Groß
       [not found]                                   ` <CANpmjNPau_DEYadey9OL+iFZKEaUTqnFnyFs1dU12o00mg7ofA@mail.gmail.com>
  0 siblings, 1 reply; 19+ messages in thread
From: Jürgen Groß @ 2020-08-07 12:04 UTC (permalink / raw)
  To: Marco Elver
  Cc: syzbot, fenghua.yu, yu-cheng.yu, Luck, Tony, Paul E. McKenney,
	Peter Zijlstra, Dave Hansen, syzkaller-bugs, LKML, kasan-dev,
	virtualization, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Thomas Gleixner, the arch/x86 maintainers

On 07.08.20 13:38, Marco Elver wrote:
> On Fri, Aug 07, 2020 at 12:35PM +0200, Jürgen Groß wrote:
>> On 07.08.20 11:50, Marco Elver wrote:
>>> On Fri, Aug 07, 2020 at 11:24AM +0200, Jürgen Groß wrote:
>>>> On 07.08.20 11:01, Marco Elver wrote:
>>>>> On Thu, 6 Aug 2020 at 18:06, Marco Elver <elver@google.com> wrote:
>>>>>> On Thu, 6 Aug 2020 at 15:17, Marco Elver <elver@google.com> wrote:
>>>>>>> On Thu, Aug 06, 2020 at 01:32PM +0200, peterz@infradead.org wrote:
>>>>>>>> On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:
>>>>>>>>> Testing my hypothesis that raw then nested non-raw
>>>>>>>>> local_irq_save/restore() breaks IRQ state tracking -- see the reproducer
>>>>>>>>> below. This is at least 1 case I can think of that we're bound to hit.
>>>>>>> ...
>>>>>>>>
>>>>>>>> /me goes ponder things...
>>>>>>>>
>>>>>>>> How's something like this then?
>>>>>>>>
>>>>>>>> ---
>>>>>>>>     include/linux/sched.h |  3 ---
>>>>>>>>     kernel/kcsan/core.c   | 62 ++++++++++++++++++++++++++++++++++++---------------
>>>>>>>>     2 files changed, 44 insertions(+), 21 deletions(-)
>>>>>>>
>>>>>>> Thank you! That approach seems to pass syzbot (also with
>>>>>>> CONFIG_PARAVIRT) and kcsan-test tests.
>>>>>>>
>>>>>>> I had to modify it some, so that report.c's use of the restore logic
>>>>>>> works and not mess up the IRQ trace printed on KCSAN reports (with
>>>>>>> CONFIG_KCSAN_VERBOSE).
>>>>>>>
>>>>>>> I still need to fully convince myself all is well now and we don't end
>>>>>>> up with more fixes. :-) If it passes further testing, I'll send it as a
>>>>>>> real patch (I want to add you as Co-developed-by, but would need your
>>>>>>> Signed-off-by for the code you pasted, I think.)
>>>>>
>>>>> I let it run on syzbot through the night, and it's fine without
>>>>> PARAVIRT (see below). I have sent the patch (need your Signed-off-by
>>>>> as it's based on your code, thank you!):
>>>>> https://lkml.kernel.org/r/20200807090031.3506555-1-elver@google.com
>>>>>
>>>>>> With CONFIG_PARAVIRT=y (without the notrace->noinstr patch), I still
>>>>>> get lockdep DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()), although
>>>>>> it takes longer for syzbot to hit them. But I think that's expected
>>>>>> because we can still get the recursion that I pointed out, and will
>>>>>> need that patch.
>>>>>
>>>>> Never mind, I get these warnings even if I don't turn on KCSAN
>>>>> (CONFIG_KCSAN=n). Something else is going on with PARAVIRT=y that
>>>>> throws off IRQ state tracking. :-/
>>>>
>>>> What are the settings of CONFIG_PARAVIRT_XXL and
>>>> CONFIG_PARAVIRT_SPINLOCKS in this case?
>>>
>>> I attached a config.
>>>
>>> 	$> grep PARAVIRT .config
>>> 	CONFIG_PARAVIRT=y
>>> 	CONFIG_PARAVIRT_XXL=y
>>> 	# CONFIG_PARAVIRT_DEBUG is not set
>>> 	CONFIG_PARAVIRT_SPINLOCKS=y
>>> 	# CONFIG_PARAVIRT_TIME_ACCOUNTING is not set
>>> 	CONFIG_PARAVIRT_CLOCK=y
>>
>> Anything special I need to do to reproduce the problem? Or would you be
>> willing to do some more rounds with different config settings?
> 
> I can only test it with syzkaller, but that probably doesn't help if you
> don't already have it set up. It can't seem to find a C reproducer.
> 
> I did some more rounds with different configs.
> 
>> I think CONFIG_PARAVIRT_XXL shouldn't matter, but I'm not completely
>> sure about that. CONFIG_PARAVIRT_SPINLOCKS would be my primary suspect.
> 
> Yes, PARAVIRT_XXL doesn't make a different. When disabling
> PARAVIRT_SPINLOCKS, however, the warnings go away.

Thanks for testing!

I take it you are doing the tests in a KVM guest?

If so I have a gut feeling that the use of local_irq_save() and
local_irq_restore() in kvm_wait() might be fishy. I might be completely
wrong here, though.

BTW, I think Xen's variant of pv spinlocks is fine (no playing with IRQ
on/off).

Hyper-V seems to do the same as KVM, and kicking another vcpu could be
problematic as well, as it is just using IPI.


Juergen
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
       [not found]                                       ` <CANpmjNM1jASqCFYZpteVrZCa2V2D_DbXaqvoCV_Ac2boYfDXnQ@mail.gmail.com>
@ 2020-08-11  7:04                                         ` Jürgen Groß
  0 siblings, 0 replies; 19+ messages in thread
From: Jürgen Groß @ 2020-08-11  7:04 UTC (permalink / raw)
  To: Marco Elver
  Cc: syzbot, fenghua.yu, yu-cheng.yu, Luck, Tony, Paul E. McKenney,
	Peter Zijlstra, Dave Hansen, syzkaller-bugs, LKML, kasan-dev,
	virtualization, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Thomas Gleixner, Wei Liu, the arch/x86 maintainers

On 11.08.20 09:00, Marco Elver wrote:
> On Fri, 7 Aug 2020 at 17:19, Marco Elver <elver@google.com> wrote:
>> On Fri, Aug 07, 2020 at 02:08PM +0200, Marco Elver wrote:
>>> On Fri, 7 Aug 2020 at 14:04, Jürgen Groß <jgross@suse.com> wrote:
>>>>
>>>> On 07.08.20 13:38, Marco Elver wrote:
>>>>> On Fri, Aug 07, 2020 at 12:35PM +0200, Jürgen Groß wrote:
> ...
>>>>>> I think CONFIG_PARAVIRT_XXL shouldn't matter, but I'm not completely
>>>>>> sure about that. CONFIG_PARAVIRT_SPINLOCKS would be my primary suspect.
>>>>>
>>>>> Yes, PARAVIRT_XXL doesn't make a different. When disabling
>>>>> PARAVIRT_SPINLOCKS, however, the warnings go away.
>>>>
>>>> Thanks for testing!
>>>>
>>>> I take it you are doing the tests in a KVM guest?
>>>
>>> Yes, correct.
>>>
>>>> If so I have a gut feeling that the use of local_irq_save() and
>>>> local_irq_restore() in kvm_wait() might be fishy. I might be completely
>>>> wrong here, though.
>>>
>>> Happy to help debug more, although I might need patches or pointers
>>> what to play with.
>>>
>>>> BTW, I think Xen's variant of pv spinlocks is fine (no playing with IRQ
>>>> on/off).
>>>>
>>>> Hyper-V seems to do the same as KVM, and kicking another vcpu could be
>>>> problematic as well, as it is just using IPI.
>>
>> I experimented a bit more, and the below patch seems to solve the
>> warnings. However, that was based on your pointer about kvm_wait(), and
>> I can't quite tell if it is the right solution.
>>
>> My hypothesis here is simply that kvm_wait() may be called in a place
>> where we get the same case I mentioned to Peter,
>>
>>          raw_local_irq_save(); /* or other IRQs off without tracing */
>>          ...
>>          kvm_wait() /* IRQ state tracing gets confused */
>>          ...
>>          raw_local_irq_restore();
>>
>> and therefore, using raw variants in kvm_wait() works. It's also safe
>> because it doesn't call any other libraries that would result in corrupt
>> IRQ state AFAIK.
> 
> Just to follow-up, it'd still be nice to fix this. Suggestions?
> 
> I could send the below as a patch, but can only go off my above
> hypothesis and the fact that syzbot is happier, so not entirely
> convincing.

Peter has told me via IRC he will look soon further into this.

Your finding suggests that the pv-lock implementation for Hyper-V
needs some tweaking, too. For that purpose I'm adding Wei to Cc.


Juergen

> 
> Thanks,
> -- Marco
> 
>> ------ >8 ------
>>
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 233c77d056c9..1d412d1466f0 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -797,7 +797,7 @@ static void kvm_wait(u8 *ptr, u8 val)
>>          if (in_nmi())
>>                  return;
>>
>> -       local_irq_save(flags);
>> +       raw_local_irq_save(flags);
>>
>>          if (READ_ONCE(*ptr) != val)
>>                  goto out;
>> @@ -810,10 +810,10 @@ static void kvm_wait(u8 *ptr, u8 val)
>>          if (arch_irqs_disabled_flags(flags))
>>                  halt();
>>          else
>> -               safe_halt();
>> +               raw_safe_halt();
>>
>>   out:
>> -       local_irq_restore(flags);
>> +       raw_local_irq_restore(flags);
>>   }
>>
>>   #ifdef CONFIG_X86_32
> 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
       [not found]                                     ` <20200807151903.GA1263469@elver.google.com>
       [not found]                                       ` <CANpmjNM1jASqCFYZpteVrZCa2V2D_DbXaqvoCV_Ac2boYfDXnQ@mail.gmail.com>
@ 2020-08-11  7:41                                       ` Peter Zijlstra
  2020-08-11  7:57                                         ` Jürgen Groß
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2020-08-11  7:41 UTC (permalink / raw)
  To: Marco Elver
  Cc: Jürgen Groß,
	fenghua.yu, yu-cheng.yu, Luck, Tony, Paul E. McKenney,
	Dave Hansen, syzkaller-bugs, LKML, kasan-dev, virtualization,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Thomas Gleixner,
	syzbot, the arch/x86 maintainers

On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote:

> My hypothesis here is simply that kvm_wait() may be called in a place
> where we get the same case I mentioned to Peter,
> 
> 	raw_local_irq_save(); /* or other IRQs off without tracing */
> 	...
> 	kvm_wait() /* IRQ state tracing gets confused */
> 	...
> 	raw_local_irq_restore();
> 
> and therefore, using raw variants in kvm_wait() works. It's also safe
> because it doesn't call any other libraries that would result in corrupt

Yes, this is definitely an issue.

Tracing, we also musn't call into tracing when using raw_local_irq_*().
Because then we re-intoduce this same issue all over again.

Both halt() and safe_halt() are more paravirt calls, but given we're in
a KVM paravirt call already, I suppose we can directly use native_*()
here.

Something like so then... I suppose, but then the Xen variants need TLC
too.

---
 arch/x86/include/asm/irqflags.h |  4 ++--
 arch/x86/include/asm/kvm_para.h | 18 +++++++++---------
 arch/x86/kernel/kvm.c           | 14 +++++++-------
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 02a0cf547d7b..7c614db25274 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -54,13 +54,13 @@ static __always_inline void native_irq_enable(void)
 	asm volatile("sti": : :"memory");
 }

-static inline __cpuidle void native_safe_halt(void)
+static __always_inline __cpuidle void native_safe_halt(void)
 {
 	mds_idle_clear_cpu_buffers();
 	asm volatile("sti; hlt": : :"memory");
 }

-static inline __cpuidle void native_halt(void)
+static __always_inline __cpuidle void native_halt(void)
 {
 	mds_idle_clear_cpu_buffers();
 	asm volatile("hlt": : :"memory");
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 49d3a9edb06f..90f7ea58ebb0 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -30,7 +30,7 @@ static inline bool kvm_check_and_clear_guest_paused(void)
  * noted by the particular hypercall.
  */

-static inline long kvm_hypercall0(unsigned int nr)
+static __always_inline long kvm_hypercall0(unsigned int nr)
 {
 	long ret;
 	asm volatile(KVM_HYPERCALL
@@ -40,7 +40,7 @@ static inline long kvm_hypercall0(unsigned int nr)
 	return ret;
 }

-static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
+static __always_inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
 {
 	long ret;
 	asm volatile(KVM_HYPERCALL
@@ -50,8 +50,8 @@ static inline long kvm_hypercall1(unsigned int nr, unsigned long p1)
 	return ret;
 }

-static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
-				  unsigned long p2)
+static __always_inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
+					   unsigned long p2)
 {
 	long ret;
 	asm volatile(KVM_HYPERCALL
@@ -61,8 +61,8 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1,
 	return ret;
 }

-static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
-				  unsigned long p2, unsigned long p3)
+static __always_inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
+					   unsigned long p2, unsigned long p3)
 {
 	long ret;
 	asm volatile(KVM_HYPERCALL
@@ -72,9 +72,9 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1,
 	return ret;
 }

-static inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
-				  unsigned long p2, unsigned long p3,
-				  unsigned long p4)
+static __always_inline long kvm_hypercall4(unsigned int nr, unsigned long p1,
+					   unsigned long p2, unsigned long p3,
+					   unsigned long p4)
 {
 	long ret;
 	asm volatile(KVM_HYPERCALL
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 233c77d056c9..15f8dfd8812d 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -779,7 +779,7 @@ arch_initcall(kvm_alloc_cpumask);
 #ifdef CONFIG_PARAVIRT_SPINLOCKS

 /* Kick a cpu by its apicid. Used to wake up a halted vcpu */
-static void kvm_kick_cpu(int cpu)
+static notrace kvm_kick_cpu(int cpu)
 {
 	int apicid;
 	unsigned long flags = 0;
@@ -790,14 +790,14 @@ static void kvm_kick_cpu(int cpu)

 #include <asm/qspinlock.h>

-static void kvm_wait(u8 *ptr, u8 val)
+static notrace kvm_wait(u8 *ptr, u8 val)
 {
 	unsigned long flags;

 	if (in_nmi())
 		return;

-	local_irq_save(flags);
+	raw_local_irq_save(flags);

 	if (READ_ONCE(*ptr) != val)
 		goto out;
@@ -808,16 +808,16 @@ static void kvm_wait(u8 *ptr, u8 val)
 	 * in irq spinlock slowpath and no spurious interrupt occur to save us.
 	 */
 	if (arch_irqs_disabled_flags(flags))
-		halt();
+		native_halt();
 	else
-		safe_halt();
+		native_safe_halt();

 out:
-	local_irq_restore(flags);
+	raw_local_irq_restore(flags);
 }

 #ifdef CONFIG_X86_32
-__visible bool __kvm_vcpu_is_preempted(long cpu)
+__visible notrace bool __kvm_vcpu_is_preempted(long cpu)
 {
 	struct kvm_steal_time *src = &per_cpu(steal_time, cpu);


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-11  7:41                                       ` Peter Zijlstra
@ 2020-08-11  7:57                                         ` Jürgen Groß
  2020-08-11  8:12                                           ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Jürgen Groß @ 2020-08-11  7:57 UTC (permalink / raw)
  To: Peter Zijlstra, Marco Elver
  Cc: syzbot, fenghua.yu, yu-cheng.yu, Luck, Tony, Paul E. McKenney,
	Dave Hansen, syzkaller-bugs, LKML, kasan-dev, virtualization,
	Ingo Molnar, Borislav Petkov, H. Peter Anvin, Thomas Gleixner,
	Wei Liu, the arch/x86 maintainers

On 11.08.20 09:41, Peter Zijlstra wrote:
> On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote:
> 
>> My hypothesis here is simply that kvm_wait() may be called in a place
>> where we get the same case I mentioned to Peter,
>>
>> 	raw_local_irq_save(); /* or other IRQs off without tracing */
>> 	...
>> 	kvm_wait() /* IRQ state tracing gets confused */
>> 	...
>> 	raw_local_irq_restore();
>>
>> and therefore, using raw variants in kvm_wait() works. It's also safe
>> because it doesn't call any other libraries that would result in corrupt
> 
> Yes, this is definitely an issue.
> 
> Tracing, we also musn't call into tracing when using raw_local_irq_*().
> Because then we re-intoduce this same issue all over again.
> 
> Both halt() and safe_halt() are more paravirt calls, but given we're in
> a KVM paravirt call already, I suppose we can directly use native_*()
> here.
> 
> Something like so then... I suppose, but then the Xen variants need TLC
> too.

Just to be sure I understand you correct:

You mean that xen_qlock_kick() and xen_qlock_wait() and all functions
called by those should gain the "notrace" attribute, right?

I am not sure why the kick variants need it, though. IMO those are
called only after the lock has been released, so they should be fine
without notrace.

And again: we shouldn't forget the Hyper-V variants.


Juergen
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-11  7:57                                         ` Jürgen Groß
@ 2020-08-11  8:12                                           ` Peter Zijlstra
  2020-08-11  8:18                                             ` Jürgen Groß
  2020-08-11  8:38                                             ` Jürgen Groß
  0 siblings, 2 replies; 19+ messages in thread
From: Peter Zijlstra @ 2020-08-11  8:12 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: syzbot, fenghua.yu, Marco Elver, Luck, Tony, Paul E. McKenney,
	yu-cheng.yu, Dave Hansen, syzkaller-bugs, LKML, kasan-dev,
	virtualization, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Thomas Gleixner, Wei Liu, the arch/x86 maintainers

On Tue, Aug 11, 2020 at 09:57:55AM +0200, Jürgen Groß wrote:
> On 11.08.20 09:41, Peter Zijlstra wrote:
> > On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote:
> > 
> > > My hypothesis here is simply that kvm_wait() may be called in a place
> > > where we get the same case I mentioned to Peter,
> > > 
> > > 	raw_local_irq_save(); /* or other IRQs off without tracing */
> > > 	...
> > > 	kvm_wait() /* IRQ state tracing gets confused */
> > > 	...
> > > 	raw_local_irq_restore();
> > > 
> > > and therefore, using raw variants in kvm_wait() works. It's also safe
> > > because it doesn't call any other libraries that would result in corrupt
> > 
> > Yes, this is definitely an issue.
> > 
> > Tracing, we also musn't call into tracing when using raw_local_irq_*().
> > Because then we re-intoduce this same issue all over again.
> > 
> > Both halt() and safe_halt() are more paravirt calls, but given we're in
> > a KVM paravirt call already, I suppose we can directly use native_*()
> > here.
> > 
> > Something like so then... I suppose, but then the Xen variants need TLC
> > too.
> 
> Just to be sure I understand you correct:
> 
> You mean that xen_qlock_kick() and xen_qlock_wait() and all functions
> called by those should gain the "notrace" attribute, right?
> 
> I am not sure why the kick variants need it, though. IMO those are
> called only after the lock has been released, so they should be fine
> without notrace.

The issue happens when someone uses arch_spinlock_t under
raw_local_irq_*().

> And again: we shouldn't forget the Hyper-V variants.

Bah, my grep failed :/ Also *groan*, that's calling apic->send_IPI().


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-11  8:12                                           ` Peter Zijlstra
@ 2020-08-11  8:18                                             ` Jürgen Groß
  2020-08-11  8:38                                             ` Jürgen Groß
  1 sibling, 0 replies; 19+ messages in thread
From: Jürgen Groß @ 2020-08-11  8:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: syzbot, fenghua.yu, Marco Elver, Luck, Tony, Paul E. McKenney,
	yu-cheng.yu, Dave Hansen, syzkaller-bugs, LKML, kasan-dev,
	virtualization, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Thomas Gleixner, Wei Liu, the arch/x86 maintainers

On 11.08.20 10:12, Peter Zijlstra wrote:
> On Tue, Aug 11, 2020 at 09:57:55AM +0200, Jürgen Groß wrote:
>> On 11.08.20 09:41, Peter Zijlstra wrote:
>>> On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote:
>>>
>>>> My hypothesis here is simply that kvm_wait() may be called in a place
>>>> where we get the same case I mentioned to Peter,
>>>>
>>>> 	raw_local_irq_save(); /* or other IRQs off without tracing */
>>>> 	...
>>>> 	kvm_wait() /* IRQ state tracing gets confused */
>>>> 	...
>>>> 	raw_local_irq_restore();
>>>>
>>>> and therefore, using raw variants in kvm_wait() works. It's also safe
>>>> because it doesn't call any other libraries that would result in corrupt
>>>
>>> Yes, this is definitely an issue.
>>>
>>> Tracing, we also musn't call into tracing when using raw_local_irq_*().
>>> Because then we re-intoduce this same issue all over again.
>>>
>>> Both halt() and safe_halt() are more paravirt calls, but given we're in
>>> a KVM paravirt call already, I suppose we can directly use native_*()
>>> here.
>>>
>>> Something like so then... I suppose, but then the Xen variants need TLC
>>> too.
>>
>> Just to be sure I understand you correct:
>>
>> You mean that xen_qlock_kick() and xen_qlock_wait() and all functions
>> called by those should gain the "notrace" attribute, right?
>>
>> I am not sure why the kick variants need it, though. IMO those are
>> called only after the lock has been released, so they should be fine
>> without notrace.
> 
> The issue happens when someone uses arch_spinlock_t under
> raw_local_irq_*().

Ah, okay.

> 
>> And again: we shouldn't forget the Hyper-V variants.
> 
> Bah, my grep failed :/ Also *groan*, that's calling apic->send_IPI().

I've seen that, too. :-(


Juergen
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-11  8:12                                           ` Peter Zijlstra
  2020-08-11  8:18                                             ` Jürgen Groß
@ 2020-08-11  8:38                                             ` Jürgen Groß
  2020-08-11  9:20                                               ` peterz
  1 sibling, 1 reply; 19+ messages in thread
From: Jürgen Groß @ 2020-08-11  8:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: syzbot, fenghua.yu, Marco Elver, Luck, Tony, Paul E. McKenney,
	yu-cheng.yu, Dave Hansen, syzkaller-bugs, LKML, kasan-dev,
	virtualization, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Thomas Gleixner, Wei Liu, the arch/x86 maintainers

On 11.08.20 10:12, Peter Zijlstra wrote:
> On Tue, Aug 11, 2020 at 09:57:55AM +0200, Jürgen Groß wrote:
>> On 11.08.20 09:41, Peter Zijlstra wrote:
>>> On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote:
>>>
>>>> My hypothesis here is simply that kvm_wait() may be called in a place
>>>> where we get the same case I mentioned to Peter,
>>>>
>>>> 	raw_local_irq_save(); /* or other IRQs off without tracing */
>>>> 	...
>>>> 	kvm_wait() /* IRQ state tracing gets confused */
>>>> 	...
>>>> 	raw_local_irq_restore();
>>>>
>>>> and therefore, using raw variants in kvm_wait() works. It's also safe
>>>> because it doesn't call any other libraries that would result in corrupt
>>>
>>> Yes, this is definitely an issue.
>>>
>>> Tracing, we also musn't call into tracing when using raw_local_irq_*().
>>> Because then we re-intoduce this same issue all over again.
>>>
>>> Both halt() and safe_halt() are more paravirt calls, but given we're in
>>> a KVM paravirt call already, I suppose we can directly use native_*()
>>> here.
>>>
>>> Something like so then... I suppose, but then the Xen variants need TLC
>>> too.
>>
>> Just to be sure I understand you correct:
>>
>> You mean that xen_qlock_kick() and xen_qlock_wait() and all functions
>> called by those should gain the "notrace" attribute, right?
>>
>> I am not sure why the kick variants need it, though. IMO those are
>> called only after the lock has been released, so they should be fine
>> without notrace.
> 
> The issue happens when someone uses arch_spinlock_t under
> raw_local_irq_*().
> 
>> And again: we shouldn't forget the Hyper-V variants.
> 
> Bah, my grep failed :/ Also *groan*, that's calling apic->send_IPI().

In case you don't want to do it I can send the patch for the Xen
variants.


Juergen
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-11  8:38                                             ` Jürgen Groß
@ 2020-08-11  9:20                                               ` peterz
  2020-08-11  9:46                                                 ` peterz
  0 siblings, 1 reply; 19+ messages in thread
From: peterz @ 2020-08-11  9:20 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: syzbot, fenghua.yu, Marco Elver, Luck, Tony, Paul E. McKenney,
	yu-cheng.yu, Dave Hansen, syzkaller-bugs, LKML, kasan-dev,
	virtualization, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Thomas Gleixner, Wei Liu, the arch/x86 maintainers

On Tue, Aug 11, 2020 at 10:38:50AM +0200, Jürgen Groß wrote:
> In case you don't want to do it I can send the patch for the Xen
> variants.

I might've opened a whole new can of worms here. I'm not sure we
can/want to fix the entire fallout this release :/

Let me ponder this a little, because the more I look at things, the more
problems I keep finding... bah bah bah.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-11  9:20                                               ` peterz
@ 2020-08-11  9:46                                                 ` peterz
  2020-08-11 20:17                                                   ` peterz
  0 siblings, 1 reply; 19+ messages in thread
From: peterz @ 2020-08-11  9:46 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: syzbot, fenghua.yu, Marco Elver, Luck, Tony, Paul E. McKenney,
	yu-cheng.yu, Dave Hansen, syzkaller-bugs, LKML, kasan-dev,
	virtualization, Ingo Molnar, Borislav Petkov, Steven Rostedt,
	H. Peter Anvin, Thomas Gleixner, Wei Liu,
	the arch/x86 maintainers

On Tue, Aug 11, 2020 at 11:20:54AM +0200, peterz@infradead.org wrote:
> On Tue, Aug 11, 2020 at 10:38:50AM +0200, Jürgen Groß wrote:
> > In case you don't want to do it I can send the patch for the Xen
> > variants.
> 
> I might've opened a whole new can of worms here. I'm not sure we
> can/want to fix the entire fallout this release :/
> 
> Let me ponder this a little, because the more I look at things, the more
> problems I keep finding... bah bah bah.

That is, most of these irq-tracking problem are new because commit:

  859d069ee1dd ("lockdep: Prepare for NMI IRQ state tracking")

changed irq-tracking to ignore the lockdep recursion count.

This then allows:

	lock_acquire()
	  raw_local_irq_save();
	  current->lockdep_recursion++;
	  trace_lock_acquire()
	   ... tracing ...
	     #PF under raw_local_irq_*()

	  __lock_acquire()
	    arch_spin_lock(&graph_lock)
	      pv-spinlock-wait()
	        local_irq_save() under raw_local_irq_*()

However afaict that just made a bad situation worse. There already were
issues, take for example:

  trace_clock_global()
    raw_local_irq_save();
    arch_spin_lock()
      pv-spinlock-wait
        local_irq_save()

And that has no lockdep_recursion to 'save' the say.

The tracing recursion does however avoid some of the obvious fails
there, like trace_clock calling into paravirt which then calls back into
tracing. But still, that would've caused IRQ tracking problems even with
the old code.

And in that respect, this is all the exact same problem as that other
set of patches has ( 20200807192336.405068898@infradead.org ).

Now, on the flip side, it does find actual problems, the trace_lock_*()
things were using RCU in RCU-disabled code, and here I found that
trace_clock_global() thinkg (and I suspect there's more of that).

But at this point I'm not entirelty sure how best to proceed... tracing
uses arch_spinlock_t, which means all spinlock implementations should be
notrace, but then that drops into paravirt and all hell breaks loose
because Hyper-V then calls into the APIC code etc.. etc..

At that rate we'll have the entire kernel marked notrace, and I'm fairly
sure that's not a solution either.

So let me once again see if I can't find a better solution for this all.
Clearly it needs one :/
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-11  9:46                                                 ` peterz
@ 2020-08-11 20:17                                                   ` peterz
       [not found]                                                     ` <20200812080650.GA3894595@elver.google.com>
  0 siblings, 1 reply; 19+ messages in thread
From: peterz @ 2020-08-11 20:17 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: syzbot, fenghua.yu, Marco Elver, Luck, Tony, Paul E. McKenney,
	yu-cheng.yu, Dave Hansen, syzkaller-bugs, LKML, kasan-dev,
	virtualization, Ingo Molnar, Borislav Petkov, Steven Rostedt,
	H. Peter Anvin, Thomas Gleixner, Wei Liu,
	the arch/x86 maintainers

On Tue, Aug 11, 2020 at 11:46:51AM +0200, peterz@infradead.org wrote:

> So let me once again see if I can't find a better solution for this all.
> Clearly it needs one :/

So the below boots without triggering the debug code from Marco -- it
should allow nesting local_irq_save/restore under raw_local_irq_*().

I tried unconditional counting, but there's some _reallly_ wonky /
asymmetric code that wrecks that and I've not been able to come up with
anything useful.

This one starts counting when local_irq_save() finds it didn't disable
IRQs while lockdep though it did. At that point, local_irq_restore()
will decrement and enable things again when it reaches 0.

This assumes local_irq_save()/local_irq_restore() are nested sane, which
is mostly true.

This leaves #PF, which I fixed in these other patches, but I realized it
needs fixing for all architectures :-( No bright ideas there yet.

---
 arch/x86/entry/thunk_32.S       |  5 ----
 include/linux/irqflags.h        | 45 +++++++++++++++++++-------------
 init/main.c                     | 16 ++++++++++++
 kernel/locking/lockdep.c        | 58 +++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_preemptirq.c | 33 +++++++++++++++++++++++
 5 files changed, 134 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/thunk_32.S b/arch/x86/entry/thunk_32.S
index 3a07ce3ec70b..f1f96d4d8cd6 100644
--- a/arch/x86/entry/thunk_32.S
+++ b/arch/x86/entry/thunk_32.S
@@ -29,11 +29,6 @@ SYM_CODE_START_NOALIGN(\name)
 SYM_CODE_END(\name)
 	.endm
 
-#ifdef CONFIG_TRACE_IRQFLAGS
-	THUNK trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1
-	THUNK trace_hardirqs_off_thunk,trace_hardirqs_off_caller,1
-#endif
-
 #ifdef CONFIG_PREEMPTION
 	THUNK preempt_schedule_thunk, preempt_schedule
 	THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index bd5c55755447..67e2a16d3846 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -23,12 +23,16 @@
   extern void lockdep_hardirqs_on_prepare(unsigned long ip);
   extern void lockdep_hardirqs_on(unsigned long ip);
   extern void lockdep_hardirqs_off(unsigned long ip);
+  extern void lockdep_hardirqs_save(unsigned long ip, unsigned long flags);
+  extern void lockdep_hardirqs_restore(unsigned long ip, unsigned long flags);
 #else
   static inline void lockdep_softirqs_on(unsigned long ip) { }
   static inline void lockdep_softirqs_off(unsigned long ip) { }
   static inline void lockdep_hardirqs_on_prepare(unsigned long ip) { }
   static inline void lockdep_hardirqs_on(unsigned long ip) { }
   static inline void lockdep_hardirqs_off(unsigned long ip) { }
+  static inline void lockdep_hardirqs_save(unsigned long ip, unsigned long flags) { }
+  static inline void lockdep_hardirqs_restore(unsigned long ip, unsigned long flags) { }
 #endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -49,10 +53,13 @@ struct irqtrace_events {
 DECLARE_PER_CPU(int, hardirqs_enabled);
 DECLARE_PER_CPU(int, hardirq_context);
 
-  extern void trace_hardirqs_on_prepare(void);
-  extern void trace_hardirqs_off_finish(void);
-  extern void trace_hardirqs_on(void);
-  extern void trace_hardirqs_off(void);
+extern void trace_hardirqs_on_prepare(void);
+extern void trace_hardirqs_off_finish(void);
+extern void trace_hardirqs_on(void);
+extern void trace_hardirqs_off(void);
+extern void trace_hardirqs_save(unsigned long flags);
+extern void trace_hardirqs_restore(unsigned long flags);
+
 # define lockdep_hardirq_context()	(this_cpu_read(hardirq_context))
 # define lockdep_softirq_context(p)	((p)->softirq_context)
 # define lockdep_hardirqs_enabled()	(this_cpu_read(hardirqs_enabled))
@@ -120,17 +127,19 @@ do {						\
 #else
 # define trace_hardirqs_on_prepare()		do { } while (0)
 # define trace_hardirqs_off_finish()		do { } while (0)
-# define trace_hardirqs_on()		do { } while (0)
-# define trace_hardirqs_off()		do { } while (0)
-# define lockdep_hardirq_context()	0
-# define lockdep_softirq_context(p)	0
-# define lockdep_hardirqs_enabled()	0
-# define lockdep_softirqs_enabled(p)	0
-# define lockdep_hardirq_enter()	do { } while (0)
-# define lockdep_hardirq_threaded()	do { } while (0)
-# define lockdep_hardirq_exit()		do { } while (0)
-# define lockdep_softirq_enter()	do { } while (0)
-# define lockdep_softirq_exit()		do { } while (0)
+# define trace_hardirqs_on()			do { } while (0)
+# define trace_hardirqs_off()			do { } while (0)
+# define trace_hardirqs_save(flags)		do { } while (0)
+# define trace_hardirqs_restore(flags)		do { } while (0)
+# define lockdep_hardirq_context()		0
+# define lockdep_softirq_context(p)		0
+# define lockdep_hardirqs_enabled()		0
+# define lockdep_softirqs_enabled(p)		0
+# define lockdep_hardirq_enter()		do { } while (0)
+# define lockdep_hardirq_threaded()		do { } while (0)
+# define lockdep_hardirq_exit()			do { } while (0)
+# define lockdep_softirq_enter()		do { } while (0)
+# define lockdep_softirq_exit()			do { } while (0)
 # define lockdep_hrtimer_enter(__hrtimer)	false
 # define lockdep_hrtimer_exit(__context)	do { } while (0)
 # define lockdep_posixtimer_enter()		do { } while (0)
@@ -185,18 +194,18 @@ do {						\
 	do { trace_hardirqs_on(); raw_local_irq_enable(); } while (0)
 #define local_irq_disable() \
 	do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0)
+
 #define local_irq_save(flags)				\
 	do {						\
 		raw_local_irq_save(flags);		\
-		trace_hardirqs_off();			\
+		trace_hardirqs_save(flags);		\
 	} while (0)
 
-
 #define local_irq_restore(flags)			\
 	do {						\
 		if (raw_irqs_disabled_flags(flags)) {	\
 			raw_local_irq_restore(flags);	\
-			trace_hardirqs_off();		\
+			trace_hardirqs_restore(flags);	\
 		} else {				\
 			trace_hardirqs_on();		\
 			raw_local_irq_restore(flags);	\
diff --git a/init/main.c b/init/main.c
index 15bd0efff3df..0873319dcff4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1041,6 +1041,22 @@ asmlinkage __visible void __init start_kernel(void)
 	sfi_init_late();
 	kcsan_init();
 
+	/* DEBUG CODE */
+	lockdep_assert_irqs_enabled(); /* Pass. */
+	{
+		unsigned long flags1;
+		raw_local_irq_save(flags1);
+		{
+			unsigned long flags2;
+			lockdep_assert_irqs_enabled(); /* Pass - expectedly blind. */
+			local_irq_save(flags2);
+			lockdep_assert_irqs_disabled(); /* Pass. */
+			local_irq_restore(flags2);
+		}
+		raw_local_irq_restore(flags1);
+	}
+	lockdep_assert_irqs_enabled(); /* FAIL! */
+
 	/* Do the rest non-__init'ed, we're now alive */
 	arch_call_rest_init();
 
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3617abb08e31..2c88574b817c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3763,6 +3763,30 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
 }
 EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);
 
+static DEFINE_PER_CPU(int, hardirqs_disabled);
+
+void lockdep_hardirqs_restore(unsigned long ip, unsigned long flags)
+{
+	if (unlikely(!debug_locks))
+		return;
+
+	if (in_nmi()) {
+		if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
+			return;
+	} else if (current->lockdep_recursion & LOCKDEP_RECURSION_MASK)
+		return;
+
+	if (__this_cpu_read(hardirqs_disabled) &&
+	    __this_cpu_dec_return(hardirqs_disabled) == 0) {
+
+		lockdep_hardirqs_on_prepare(ip);
+		lockdep_hardirqs_on(ip);
+	} else {
+		lockdep_hardirqs_off(ip);
+	}
+}
+EXPORT_SYMBOL_GPL(lockdep_hardirqs_restore);
+
 /*
  * Hardirqs were disabled:
  */
@@ -3805,6 +3829,40 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
 }
 EXPORT_SYMBOL_GPL(lockdep_hardirqs_off);
 
+void lockdep_hardirqs_save(unsigned long ip, unsigned long flags)
+{
+	if (unlikely(!debug_locks))
+		return;
+
+	/*
+	 * Matching lockdep_hardirqs_on(), allow NMIs in the middle of lockdep;
+	 * they will restore the software state. This ensures the software
+	 * state is consistent inside NMIs as well.
+	 */
+	if (in_nmi()) {
+		if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
+			return;
+	} else if (current->lockdep_recursion & LOCKDEP_RECURSION_MASK)
+		return;
+
+	/*
+	 * If IRQs were disabled, but IRQ tracking state said enabled we
+	 * 'missed' an update (or are nested inside raw_local_irq_*()) and
+	 * cannot rely on IRQ state to tell us when we should enable again.
+	 *
+	 * Count our way through this.
+	 */
+	if (raw_irqs_disabled_flags(flags)) {
+		if (__this_cpu_read(hardirqs_enabled)) {
+			WARN_ON_ONCE(__this_cpu_read(hardirqs_disabled) != 0);
+			__this_cpu_write(hardirqs_disabled, 1);
+		} else if (__this_cpu_read(hardirqs_disabled))
+			__this_cpu_inc(hardirqs_disabled);
+	}
+	lockdep_hardirqs_off(ip);
+}
+EXPORT_SYMBOL_GPL(lockdep_hardirqs_save);
+
 /*
  * Softirqs will be enabled:
  */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index f10073e62603..080deaa1d9c9 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -85,6 +85,36 @@ void trace_hardirqs_off(void)
 EXPORT_SYMBOL(trace_hardirqs_off);
 NOKPROBE_SYMBOL(trace_hardirqs_off);
 
+void trace_hardirqs_save(unsigned long flags)
+{
+	lockdep_hardirqs_save(CALLER_ADDR0, flags);
+
+	if (!this_cpu_read(tracing_irq_cpu)) {
+		this_cpu_write(tracing_irq_cpu, 1);
+		tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
+		if (!in_nmi())
+			trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	}
+}
+EXPORT_SYMBOL(trace_hardirqs_save);
+NOKPROBE_SYMBOL(trace_hardirqs_save);
+
+void trace_hardirqs_restore(unsigned long flags)
+{
+	if (this_cpu_read(tracing_irq_cpu)) {
+		if (!in_nmi())
+			trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+		tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
+		this_cpu_write(tracing_irq_cpu, 0);
+	}
+
+	lockdep_hardirqs_restore(CALLER_ADDR0, flags);
+}
+EXPORT_SYMBOL(trace_hardirqs_restore);
+NOKPROBE_SYMBOL(trace_hardirqs_restore);
+
+#ifdef __s390__
+
 __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
 {
 	if (this_cpu_read(tracing_irq_cpu)) {
@@ -113,6 +143,9 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
 }
 EXPORT_SYMBOL(trace_hardirqs_off_caller);
 NOKPROBE_SYMBOL(trace_hardirqs_off_caller);
+
+#endif /* __s390__ */
+
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
 #ifdef CONFIG_TRACE_PREEMPT_TOGGLE
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
       [not found]                                                     ` <20200812080650.GA3894595@elver.google.com>
@ 2020-08-12  8:18                                                       ` peterz
  2020-08-12  8:57                                                         ` peterz
  0 siblings, 1 reply; 19+ messages in thread
From: peterz @ 2020-08-12  8:18 UTC (permalink / raw)
  To: Marco Elver
  Cc: Jürgen Groß,
	fenghua.yu, yu-cheng.yu, Luck, Tony, Paul E. McKenney,
	Dave Hansen, syzkaller-bugs, LKML, kasan-dev, virtualization,
	Ingo Molnar, Borislav Petkov, Steven Rostedt, H. Peter Anvin,
	Thomas Gleixner, syzbot, Wei Liu, the arch/x86 maintainers

On Wed, Aug 12, 2020 at 10:06:50AM +0200, Marco Elver wrote:
> On Tue, Aug 11, 2020 at 10:17PM +0200, peterz@infradead.org wrote:
> > On Tue, Aug 11, 2020 at 11:46:51AM +0200, peterz@infradead.org wrote:
> > 
> > > So let me once again see if I can't find a better solution for this all.
> > > Clearly it needs one :/
> > 
> > So the below boots without triggering the debug code from Marco -- it
> > should allow nesting local_irq_save/restore under raw_local_irq_*().
> > 
> > I tried unconditional counting, but there's some _reallly_ wonky /
> > asymmetric code that wrecks that and I've not been able to come up with
> > anything useful.
> > 
> > This one starts counting when local_irq_save() finds it didn't disable
> > IRQs while lockdep though it did. At that point, local_irq_restore()
> > will decrement and enable things again when it reaches 0.
> > 
> > This assumes local_irq_save()/local_irq_restore() are nested sane, which
> > is mostly true.
> > 
> > This leaves #PF, which I fixed in these other patches, but I realized it
> > needs fixing for all architectures :-( No bright ideas there yet.
> > 
> > ---
> >  arch/x86/entry/thunk_32.S       |  5 ----
> >  include/linux/irqflags.h        | 45 +++++++++++++++++++-------------
> >  init/main.c                     | 16 ++++++++++++
> >  kernel/locking/lockdep.c        | 58 +++++++++++++++++++++++++++++++++++++++++
> >  kernel/trace/trace_preemptirq.c | 33 +++++++++++++++++++++++
> >  5 files changed, 134 insertions(+), 23 deletions(-)
> 
> Testing this again with syzkaller produced some new reports:
> 
> 	BUG: stack guard page was hit in error_entry
> 	BUG: stack guard page was hit in exc_int3
> 	PANIC: double fault in error_entry
> 	PANIC: double fault in exc_int3
> 
> Most of them have corrupted reports, but this one might be useful:
> 
> 	BUG: stack guard page was hit at 000000001fab0982 (stack is 00000000063f33dc..00000000bf04b0d8)
> 	BUG: stack guard page was hit at 00000000ca97ac69 (stack is 00000000af3e6c84..000000001597e1bf)
> 	kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
> 	CPU: 1 PID: 4709 Comm: kworker/1:1H Not tainted 5.8.0+ #5
> 	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> 	Workqueue: events_highpri snd_vmidi_output_work
> 	RIP: 0010:exc_int3+0x5/0xf0 arch/x86/kernel/traps.c:636
> 	Code: c9 85 4d 89 e8 31 c0 e8 a9 7d 68 fd e9 90 fe ff ff e8 0f 35 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 53 48 89 fb <e8> 76 0e 00 00 85 c0 74 03 5b 5d c3 f6 83 88 00 00 00 03 74 7e 48
> 	RSP: 0018:ffffc90008114000 EFLAGS: 00010083
> 	RAX: 0000000084e00e17 RBX: ffffc90008114018 RCX: ffffffff84e00e17
> 	RDX: 0000000000000000 RSI: ffffffff84e00a39 RDI: ffffc90008114018
> 	RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> 	R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> 	R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> 	FS:  0000000000000000(0000) GS:ffff88807dc80000(0000) knlGS:0000000000000000
> 	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 	CR2: ffffc90008113ff8 CR3: 000000002dae4006 CR4: 0000000000770ee0
> 	DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> 	DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 	PKRU: 00000000
> 	Call Trace:
> 	 asm_exc_int3+0x31/0x40 arch/x86/include/asm/idtentry.h:537
> 	RIP: 0010:arch_static_branch include/trace/events/preemptirq.h:40 [inline]
> 	RIP: 0010:static_key_false include/linux/jump_label.h:200 [inline]
> 	RIP: 0010:trace_irq_enable_rcuidle+0xd/0x120 include/trace/events/preemptirq.h:40
> 	Code: 24 08 48 89 df e8 43 8d ef ff 48 89 df 5b e9 4a 2e 99 03 66 2e 0f 1f 84 00 00 00 00 00 55 41 56 53 48 89 fb e8 84 1a fd ff cc <1f> 44 00 00 5b 41 5e 5d c3 65 8b 05 ab 74 c3 7e 89 c0 31 f6 48 0f
> 	RSP: 0018:ffffc900081140f8 EFLAGS: 00000093
> 	RAX: ffffffff813d9e8c RBX: ffffffff81314dd3 RCX: ffff888076ce6000
> 	RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81314dd3
> 	RBP: 0000000000000000 R08: ffffffff813da3d4 R09: 0000000000000001
> 	R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> 	R13: 0000000000000082 R14: 0000000000000000 R15: ffff888076ce6000
> 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> 	 trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> 	 trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> 	 trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> 
> 	<... repeated many many times ...>
> 
> 	 trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> 	Lost 500 message(s)!
> 	BUG: stack guard page was hit at 00000000cab483ba (stack is 00000000b1442365..00000000c26f9ad3)
> 	BUG: stack guard page was hit at 00000000318ff8d8 (stack is 00000000fd87d656..0000000058100136)
> 	---[ end trace 4157e0bb4a65941a ]---

Wheee... recursion! Let me try and see if I can make something of that.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
  2020-08-12  8:18                                                       ` peterz
@ 2020-08-12  8:57                                                         ` peterz
  0 siblings, 0 replies; 19+ messages in thread
From: peterz @ 2020-08-12  8:57 UTC (permalink / raw)
  To: Marco Elver
  Cc: Jürgen Groß,
	fenghua.yu, yu-cheng.yu, Luck, Tony, Paul E. McKenney,
	Dave Hansen, syzkaller-bugs, LKML, kasan-dev, virtualization,
	Ingo Molnar, Borislav Petkov, Steven Rostedt, H. Peter Anvin,
	Thomas Gleixner, syzbot, Wei Liu, the arch/x86 maintainers

On Wed, Aug 12, 2020 at 10:18:32AM +0200, peterz@infradead.org wrote:
> > 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > 	 trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> > 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > 	 trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> > 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > 	 trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> > 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > 
> > 	<... repeated many many times ...>
> > 
> > 	 trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> > 	 trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> > 	 rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > 	Lost 500 message(s)!
> > 	BUG: stack guard page was hit at 00000000cab483ba (stack is 00000000b1442365..00000000c26f9ad3)
> > 	BUG: stack guard page was hit at 00000000318ff8d8 (stack is 00000000fd87d656..0000000058100136)
> > 	---[ end trace 4157e0bb4a65941a ]---
> 
> Wheee... recursion! Let me try and see if I can make something of that.

All that's needed is enabling the preemptirq tracepoints. Lemme go fix.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2020-08-12  8:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <0000000000007d3b2d05ac1c303e@google.com>
     [not found] ` <20200805132629.GA87338@elver.google.com>
2020-08-05 13:42   ` [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers peterz
     [not found]     ` <20200805135940.GA156343@elver.google.com>
2020-08-05 14:12       ` peterz
2020-08-05 14:17         ` Jürgen Groß
2020-08-05 14:17         ` peterz
     [not found]           ` <CANpmjNN6FWZ+MsAn3Pj+WEez97diHzqF8hjONtHG15C2gSpSgw@mail.gmail.com>
     [not found]             ` <CANpmjNNy3XKQqgrjGPPKKvXhAoF=mae7dk8hmoS4k4oNnnB=KA@mail.gmail.com>
     [not found]               ` <20200806074723.GA2364872@elver.google.com>
2020-08-06 11:32                 ` peterz
     [not found]                   ` <20200806131702.GA3029162@elver.google.com>
     [not found]                     ` <CANpmjNNqt8YrCad4WqgCoXvH47pRXtSLpnTKhD8W8+UpoYJ+jQ@mail.gmail.com>
     [not found]                       ` <CANpmjNO860SHpNve+vaoAOgarU1SWy8o--tUWCqNhn82OLCiew@mail.gmail.com>
2020-08-07  9:24                         ` Jürgen Groß
     [not found]                           ` <20200807095032.GA3528289@elver.google.com>
2020-08-07 10:35                             ` Jürgen Groß
     [not found]                               ` <20200807113838.GA3547125@elver.google.com>
2020-08-07 12:04                                 ` Jürgen Groß
     [not found]                                   ` <CANpmjNPau_DEYadey9OL+iFZKEaUTqnFnyFs1dU12o00mg7ofA@mail.gmail.com>
     [not found]                                     ` <20200807151903.GA1263469@elver.google.com>
     [not found]                                       ` <CANpmjNM1jASqCFYZpteVrZCa2V2D_DbXaqvoCV_Ac2boYfDXnQ@mail.gmail.com>
2020-08-11  7:04                                         ` Jürgen Groß
2020-08-11  7:41                                       ` Peter Zijlstra
2020-08-11  7:57                                         ` Jürgen Groß
2020-08-11  8:12                                           ` Peter Zijlstra
2020-08-11  8:18                                             ` Jürgen Groß
2020-08-11  8:38                                             ` Jürgen Groß
2020-08-11  9:20                                               ` peterz
2020-08-11  9:46                                                 ` peterz
2020-08-11 20:17                                                   ` peterz
     [not found]                                                     ` <20200812080650.GA3894595@elver.google.com>
2020-08-12  8:18                                                       ` peterz
2020-08-12  8:57                                                         ` peterz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).