* [RFC PATCH 0/7] Early task context tracking
@ 2019-04-02 20:03 Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 1/7] x86/entry: Add support for early " Daniel Bristot de Oliveira
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-02 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, x86

Note: do not take it too seriously, it is just a proof of concept.

Some time ago, while using perf to check the automaton model, I noticed
that perf was losing events. The same was reproducible with ftrace.

See: https://www.spinics.net/lists/linux-rt-users/msg19781.html

Steve pointed to a problem in the identification of the execution
context used by the recursion control.

Currently, recursion control uses the preempt_counter to
identify the current context. The NMI/HARD/SOFT IRQ counters
are set in the preempt_counter in the irq_enter/exit functions.

In a trace, they are set like this:
-------------- %< --------------------
 0)   ==========> |
 0)               |  do_IRQ() {		/* First C function */
 0)               |    irq_enter() {
 0)               |      		/* set the IRQ context. */
 0)   1.081 us    |    }
 0)               |    handle_irq() {
 0)               |     		/* IRQ handling code */
 0) + 10.290 us   |    }
 0)               |    irq_exit() {
 0)               |      		/* unset the IRQ context. */
 0)   6.657 us    |    }
 0) + 18.995 us   |  }
 0)   <========== |
-------------- >% --------------------

As one can see, functions (and events) that take place before the
preempt_counter is set and after it is unset are identified in the
wrong context, causing the misinterpretation that a recursion is
taking place. When this happens, events are dropped.
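
For reference, the recursion control derives the context bit from the
preempt_counter with a mask check along these lines (simplified from
trace_recursive_lock(), visible in patch 2/7 below):

-------------- %< --------------------
	unsigned long pc = preempt_count();
	int bit;

	if (pc & NMI_MASK)
		bit = RB_CTX_NMI;
	else if (pc & HARDIRQ_MASK)
		bit = RB_CTX_IRQ;
	else if (pc & SOFTIRQ_OFFSET)
		bit = RB_CTX_SOFTIRQ;
	else
		bit = RB_CTX_NORMAL;	/* early IRQ/NMI code lands here */
-------------- >% --------------------

Before irq_enter() sets those bits, early IRQ code is classified as
NORMAL; if the interrupted task was itself writing an event, the
NORMAL recursion bit is already taken, so the new event is treated as
a recursion and dropped.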

To resolve this problem, the set/unset of the IRQ/NMI context needs to
be done before the execution of the first C function, and after its
return. By doing so, and using this method to identify the context in the
trace recursion protection, no more events are lost.

A possible solution is to use a per-cpu variable, set and unset in the
entry point of NMIs/IRQs, before calling the C handler. This solution
is presented in the following patches as a proof of concept, for
x86_64. However, other ideas might be better than mine... so...

Daniel Bristot de Oliveira (7):
  x86/entry: Add support for early task context tracking
  trace: Move the trace recursion context enum to trace.h and reuse it
  trace: Optimize trace_get_context_bit()
  trace/ring_buffer: Use trace_get_context_bit()
  trace: Use early task context tracking if available
  events: Create a trace_get_context_bit()
  events: Use early task context tracking if available

 arch/x86/entry/entry_64.S       |  9 ++++++
 arch/x86/include/asm/irqflags.h | 30 ++++++++++++++++++++
 arch/x86/kernel/cpu/common.c    |  4 +++
 include/linux/irqflags.h        |  4 +++
 kernel/events/internal.h        | 50 +++++++++++++++++++++++++++------
 kernel/softirq.c                |  5 +++-
 kernel/trace/ring_buffer.c      | 28 ++----------------
 kernel/trace/trace.h            | 46 ++++++++++++++++++++++--------
 8 files changed, 129 insertions(+), 47 deletions(-)

-- 
2.20.1



* [RFC PATCH 1/7] x86/entry: Add support for early task context tracking
  2019-04-02 20:03 [RFC PATCH 0/7] Early task context tracking Daniel Bristot de Oliveira
@ 2019-04-02 20:03 ` Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 2/7] trace: Move the trace recursion context enum to trace.h and reuse it Daniel Bristot de Oliveira
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-02 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, x86

Currently, the context is identified through the preempt_counter, but
it is only set after the first functions of the IRQ/NMI handler have
executed, which can cause the current context to be misidentified. For
instance, ftrace/perf might drop events in the early stage of IRQ/NMI
handlers because the preempt_counter was not yet set.

The proposed approach is to use a dedicated per-cpu variable to keep
track of the context of execution, with values set before the execution
of the first C function of the interrupt handler.

This is a PoC for x86_64.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
---
 arch/x86/entry/entry_64.S       |  9 +++++++++
 arch/x86/include/asm/irqflags.h | 30 ++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c    |  4 ++++
 include/linux/irqflags.h        |  4 ++++
 kernel/softirq.c                |  5 ++++-
 5 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb7b629..1471b544241f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -545,6 +545,7 @@ ENTRY(interrupt_entry)
 	testb	$3, CS+8(%rsp)
 	jz	1f
 
+	TASK_CONTEXT_SET_BIT context=TASK_CTX_IRQ
 	/*
 	 * IRQ from user mode.
 	 *
@@ -561,6 +562,8 @@ ENTRY(interrupt_entry)
 
 1:
 	ENTER_IRQ_STACK old_rsp=%rdi save_ret=1
+
+	TASK_CONTEXT_SET_BIT context=TASK_CTX_IRQ
 	/* We entered an interrupt context - irqs are off: */
 	TRACE_IRQS_OFF
 
@@ -586,6 +589,7 @@ ret_from_intr:
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
 
+	TASK_CONTEXT_RESET_BIT context=TASK_CTX_IRQ
 	LEAVE_IRQ_STACK
 
 	testb	$3, CS(%rsp)
@@ -780,6 +784,7 @@ ENTRY(\sym)
 	call	interrupt_entry
 	UNWIND_HINT_REGS indirect=1
 	call	\do_sym	/* rdi points to pt_regs */
+	TASK_CONTEXT_RESET_BIT context=TASK_CTX_IRQ
 	jmp	ret_from_intr
 END(\sym)
 _ASM_NOKPROBE(\sym)
@@ -1403,9 +1408,11 @@ ENTRY(nmi)
 	 * done with the NMI stack.
 	 */
 
+	TASK_CONTEXT_SET_BIT context=TASK_CTX_NMI
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
 	call	do_nmi
+	TASK_CONTEXT_RESET_BIT context=TASK_CTX_NMI
 
 	/*
 	 * Return back to user mode.  We must *not* do the normal exit
@@ -1615,10 +1622,12 @@ end_repeat_nmi:
 	call	paranoid_entry
 	UNWIND_HINT_REGS
 
+	TASK_CONTEXT_SET_BIT context=TASK_CTX_NMI
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
 	call	do_nmi
+	TASK_CONTEXT_RESET_BIT context=TASK_CTX_NMI
 
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 058e40fed167..5a12bc3ea02b 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -3,6 +3,7 @@
 #define _X86_IRQFLAGS_H_
 
 #include <asm/processor-flags.h>
+#include <asm/percpu.h>
 
 #ifndef __ASSEMBLY__
 
@@ -202,4 +203,33 @@ static inline int arch_irqs_disabled(void)
 #endif
 #endif /* __ASSEMBLY__ */
 
+#ifdef CONFIG_X86_64
+/*
 + * NOTE: I know I need to implement this for 32-bit as well.
+ * But... this is just a POC.
+ */
+#define ARCH_HAS_TASK_CONTEXT   1
+
+#define TASK_CTX_THREAD		0x0
+#define TASK_CTX_SOFTIRQ	0x1
+#define TASK_CTX_IRQ		0x2
+#define TASK_CTX_NMI		0x4
+
+#ifdef __ASSEMBLY__
+.macro TASK_CONTEXT_SET_BIT context:req
+	orb	$\context, PER_CPU_VAR(task_context)
+.endm
+
+.macro TASK_CONTEXT_RESET_BIT context:req
+	andb	$~\context, PER_CPU_VAR(task_context)
+.endm
+#else /* __ASSEMBLY__ */
+DECLARE_PER_CPU(unsigned char, task_context);
+
+static __always_inline void task_context_set(unsigned char context)
+{
+	raw_cpu_write_1(task_context, context);
+}
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_X86_64 */
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..1acbec22319b 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1531,6 +1531,8 @@ DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
 EXPORT_PER_CPU_SYMBOL(__preempt_count);
 
+DEFINE_PER_CPU(unsigned char, task_context) __visible = 0;
+
 /* May not be marked __init: used by software suspend */
 void syscall_init(void)
 {
@@ -1604,6 +1606,8 @@ EXPORT_PER_CPU_SYMBOL(current_task);
 DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
 EXPORT_PER_CPU_SYMBOL(__preempt_count);
 
+DEFINE_PER_CPU(unsigned char, task_context) __visible = 0;
+
 /*
  * On x86_32, vm86 modifies tss.sp0, so sp0 isn't a reliable way to find
  * the top of the kernel stack.  Use an extra percpu variable to track the
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 21619c92c377..1c3473bbe5d2 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -168,4 +168,8 @@ do {						\
 
 #define irqs_disabled_flags(flags) raw_irqs_disabled_flags(flags)
 
+#ifndef ARCH_HAS_TASK_CONTEXT
+#define task_context_set(context) do {} while (0)
+#endif
+
 #endif
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 10277429ed84..324de769dc07 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -410,8 +410,11 @@ void irq_exit(void)
 #endif
 	account_irq_exit_time(current);
 	preempt_count_sub(HARDIRQ_OFFSET);
-	if (!in_interrupt() && local_softirq_pending())
+	if (!in_interrupt() && local_softirq_pending()) {
+		task_context_set(TASK_CTX_SOFTIRQ);
 		invoke_softirq();
+		task_context_set(TASK_CTX_IRQ);
+	}
 
 	tick_irq_exit();
 	rcu_irq_exit();
-- 
2.20.1



* [RFC PATCH 2/7] trace: Move the trace recursion context enum to trace.h and reuse it
  2019-04-02 20:03 [RFC PATCH 0/7] Early task context tracking Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 1/7] x86/entry: Add support for early " Daniel Bristot de Oliveira
@ 2019-04-02 20:03 ` Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 3/7] trace: Optimize trace_get_context_bit() Daniel Bristot de Oliveira
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-02 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, x86

Both the trace and ring buffer code need to identify in which
context the current code is running, to control recursion.

Move the enum to trace.h, and unify its usage.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
---
 kernel/trace/ring_buffer.c | 25 +++++--------------------
 kernel/trace/trace.h       | 25 +++++++++++++++++++++----
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 41b6f96e5366..fa8cbad2ca88 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -27,6 +27,8 @@
 
 #include <asm/local.h>
 
+#include "trace.h"
+
 static void update_pages_handler(struct work_struct *work);
 
 /*
@@ -428,23 +430,6 @@ struct rb_event_info {
 	int			add_timestamp;
 };
 
-/*
- * Used for which event context the event is in.
- *  NMI     = 0
- *  IRQ     = 1
- *  SOFTIRQ = 2
- *  NORMAL  = 3
- *
- * See trace_recursive_lock() comment below for more details.
- */
-enum {
-	RB_CTX_NMI,
-	RB_CTX_IRQ,
-	RB_CTX_SOFTIRQ,
-	RB_CTX_NORMAL,
-	RB_CTX_MAX
-};
-
 /*
  * head_page == tail_page && head == tail then buffer is empty.
  */
@@ -2704,10 +2689,10 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
 	int bit;
 
 	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
-		bit = RB_CTX_NORMAL;
+		bit = TRACE_CTX_NORMAL;
 	else
-		bit = pc & NMI_MASK ? RB_CTX_NMI :
-			pc & HARDIRQ_MASK ? RB_CTX_IRQ : RB_CTX_SOFTIRQ;
+		bit = pc & NMI_MASK ? TRACE_CTX_NMI :
+			pc & HARDIRQ_MASK ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
 
 	if (unlikely(val & (1 << (bit + cpu_buffer->nest))))
 		return 1;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index d80cee49e0eb..dad2f0cd7208 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -616,20 +616,37 @@ enum {
 
 #define TRACE_CONTEXT_MASK	TRACE_LIST_MAX
 
+/*
+ * Used for which event context the event is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * See trace_recursive_lock() comment for more details.
+ */
+enum {
+	TRACE_CTX_NMI,
+	TRACE_CTX_IRQ,
+	TRACE_CTX_SOFTIRQ,
+	TRACE_CTX_NORMAL,
+	TRACE_CTX_MAX
+};
+
 static __always_inline int trace_get_context_bit(void)
 {
 	int bit;
 
 	if (in_interrupt()) {
 		if (in_nmi())
-			bit = 0;
+			bit = TRACE_CTX_NMI;
 
 		else if (in_irq())
-			bit = 1;
+			bit = TRACE_CTX_IRQ;
 		else
-			bit = 2;
+			bit = TRACE_CTX_SOFTIRQ;
 	} else
-		bit = 3;
+		bit = TRACE_CTX_NORMAL;
 
 	return bit;
 }
-- 
2.20.1



* [RFC PATCH 3/7] trace: Optimize trace_get_context_bit()
  2019-04-02 20:03 [RFC PATCH 0/7] Early task context tracking Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 1/7] x86/entry: Add support for early " Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 2/7] trace: Move the trace recursion context enum to trace.h and reuse it Daniel Bristot de Oliveira
@ 2019-04-02 20:03 ` Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 4/7] trace/ring_buffer: Use trace_get_context_bit() Daniel Bristot de Oliveira
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-02 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, x86

trace_get_context_bit() and trace_recursive_lock() use the same logic,
but the latter reads the per-cpu preempt counter only once.

Use trace_recursive_lock()'s logic in trace_get_context_bit().

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
---
 kernel/trace/trace.h | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index dad2f0cd7208..09318748fab8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -635,20 +635,13 @@ enum {
 
 static __always_inline int trace_get_context_bit(void)
 {
-	int bit;
-
-	if (in_interrupt()) {
-		if (in_nmi())
-			bit = TRACE_CTX_NMI;
+	unsigned long pc = preempt_count();
 
-		else if (in_irq())
-			bit = TRACE_CTX_IRQ;
-		else
-			bit = TRACE_CTX_SOFTIRQ;
-	} else
-		bit = TRACE_CTX_NORMAL;
-
-	return bit;
+	if (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))
+		return pc & NMI_MASK ? TRACE_CTX_NMI :
+			pc & HARDIRQ_MASK ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
+	else
+		return TRACE_CTX_NORMAL;
 }
 
 static __always_inline int trace_test_and_set_recursion(int start, int max)
-- 
2.20.1



* [RFC PATCH 4/7] trace/ring_buffer: Use trace_get_context_bit()
  2019-04-02 20:03 [RFC PATCH 0/7] Early task context tracking Daniel Bristot de Oliveira
                   ` (2 preceding siblings ...)
  2019-04-02 20:03 ` [RFC PATCH 3/7] trace: Optimize trace_get_context_bit() Daniel Bristot de Oliveira
@ 2019-04-02 20:03 ` Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 5/7] trace: Use early task context tracking if available Daniel Bristot de Oliveira
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-02 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, x86

Unify the context identification in the function trace_get_context_bit().

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
---
 kernel/trace/ring_buffer.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index fa8cbad2ca88..5bff5bcb1bf8 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2685,14 +2685,7 @@ static __always_inline int
 trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	unsigned int val = cpu_buffer->current_context;
-	unsigned long pc = preempt_count();
-	int bit;
-
-	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
-		bit = TRACE_CTX_NORMAL;
-	else
-		bit = pc & NMI_MASK ? TRACE_CTX_NMI :
-			pc & HARDIRQ_MASK ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
+	int bit = trace_get_context_bit();
 
 	if (unlikely(val & (1 << (bit + cpu_buffer->nest))))
 		return 1;
-- 
2.20.1



* [RFC PATCH 5/7] trace: Use early task context tracking if available
  2019-04-02 20:03 [RFC PATCH 0/7] Early task context tracking Daniel Bristot de Oliveira
                   ` (3 preceding siblings ...)
  2019-04-02 20:03 ` [RFC PATCH 4/7] trace/ring_buffer: Use trace_get_context_bit() Daniel Bristot de Oliveira
@ 2019-04-02 20:03 ` Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 6/7] events: Create a trace_get_context_bit() Daniel Bristot de Oliveira
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-02 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, x86

Use the early task context tracking to detect the current context,
if the arch supports it.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
---
 kernel/trace/trace.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 09318748fab8..62ba4bd0e436 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -632,7 +632,18 @@ enum {
 	TRACE_CTX_NORMAL,
 	TRACE_CTX_MAX
 };
+#ifdef ARCH_HAS_TASK_CONTEXT
+static __always_inline int trace_get_context_bit(void)
+{
+	unsigned long tc = this_cpu_read(task_context);
 
+	if (tc)
+		return tc & TASK_CTX_NMI ? TRACE_CTX_NMI :
+			tc & TASK_CTX_IRQ ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
+	else
+		return TRACE_CTX_NORMAL;
+}
+#else /* ARCH_HAS_TASK_CONTEXT */
 static __always_inline int trace_get_context_bit(void)
 {
 	unsigned long pc = preempt_count();
@@ -643,6 +654,7 @@ static __always_inline int trace_get_context_bit(void)
 	else
 		return TRACE_CTX_NORMAL;
 }
+#endif /* ARCH_HAS_TASK_CONTEXT */
 
 static __always_inline int trace_test_and_set_recursion(int start, int max)
 {
-- 
2.20.1



* [RFC PATCH 6/7] events: Create a trace_get_context_bit()
  2019-04-02 20:03 [RFC PATCH 0/7] Early task context tracking Daniel Bristot de Oliveira
                   ` (4 preceding siblings ...)
  2019-04-02 20:03 ` [RFC PATCH 5/7] trace: Use early task context tracking if available Daniel Bristot de Oliveira
@ 2019-04-02 20:03 ` Daniel Bristot de Oliveira
  2019-04-02 20:03 ` [RFC PATCH 7/7] events: Use early task context tracking if available Daniel Bristot de Oliveira
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-02 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, x86

Create a specific function to get the current task context.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
---
 kernel/events/internal.h | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 79c47076700a..241a2318bfdc 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -202,18 +202,37 @@ arch_perf_out_copy_user(void *dst, const void *src, unsigned long n)
 
 DEFINE_OUTPUT_COPY(__output_copy_user, arch_perf_out_copy_user)
 
-static inline int get_recursion_context(int *recursion)
+/*
+ * Used for which event context the event is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * See trace_recursive_lock() comment for more details.
+ */
+enum {
+	TRACE_CTX_NMI,
+	TRACE_CTX_IRQ,
+	TRACE_CTX_SOFTIRQ,
+	TRACE_CTX_NORMAL,
+	TRACE_CTX_MAX
+};
+
+static __always_inline int trace_get_context_bit(void)
 {
-	int rctx;
+	unsigned long pc = preempt_count();
 
-	if (unlikely(in_nmi()))
-		rctx = 3;
-	else if (in_irq())
-		rctx = 2;
-	else if (in_softirq())
-		rctx = 1;
+	if (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET))
+		return pc & NMI_MASK ? TRACE_CTX_NMI :
+			pc & HARDIRQ_MASK ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
 	else
-		rctx = 0;
+		return TRACE_CTX_NORMAL;
+}
+
+static inline int get_recursion_context(int *recursion)
+{
+	int rctx = trace_get_context_bit();
 
 	if (recursion[rctx])
 		return -1;
-- 
2.20.1



* [RFC PATCH 7/7] events: Use early task context tracking if available
  2019-04-02 20:03 [RFC PATCH 0/7] Early task context tracking Daniel Bristot de Oliveira
                   ` (5 preceding siblings ...)
  2019-04-02 20:03 ` [RFC PATCH 6/7] events: Create a trace_get_context_bit() Daniel Bristot de Oliveira
@ 2019-04-02 20:03 ` Daniel Bristot de Oliveira
  2019-04-04  0:01 ` [RFC PATCH 0/7] Early task context tracking Andy Lutomirski
  2019-04-04 17:40 ` Joel Fernandes
  8 siblings, 0 replies; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-02 20:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, x86

Use the early task context tracking to detect the current context,
if the arch supports it.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Clark Williams <williams@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: x86@kernel.org
---
 kernel/events/internal.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 241a2318bfdc..8e4215f8cb93 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -219,6 +219,18 @@ enum {
 	TRACE_CTX_MAX
 };
 
+#ifdef ARCH_HAS_TASK_CONTEXT
+static __always_inline int trace_get_context_bit(void)
+{
+	unsigned long tc = this_cpu_read(task_context);
+
+	if (tc)
+		return tc & TASK_CTX_NMI ? TRACE_CTX_NMI :
+			tc & TASK_CTX_IRQ ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
+	else
+		return TRACE_CTX_NORMAL;
+}
+#else /* ARCH_HAS_TASK_CONTEXT */
 static __always_inline int trace_get_context_bit(void)
 {
 	unsigned long pc = preempt_count();
@@ -229,6 +241,7 @@ static __always_inline int trace_get_context_bit(void)
 	else
 		return TRACE_CTX_NORMAL;
 }
+#endif /* ARCH_HAS_TASK_CONTEXT */
 
 static inline int get_recursion_context(int *recursion)
 {
-- 
2.20.1



* Re: [RFC PATCH 0/7] Early task context tracking
  2019-04-02 20:03 [RFC PATCH 0/7] Early task context tracking Daniel Bristot de Oliveira
                   ` (6 preceding siblings ...)
  2019-04-02 20:03 ` [RFC PATCH 7/7] events: Use early task context tracking if available Daniel Bristot de Oliveira
@ 2019-04-04  0:01 ` Andy Lutomirski
  2019-04-04  9:42   ` Peter Zijlstra
  2019-04-08 12:47   ` Daniel Bristot de Oliveira
  2019-04-04 17:40 ` Joel Fernandes
  8 siblings, 2 replies; 14+ messages in thread
From: Andy Lutomirski @ 2019-04-04  0:01 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: LKML, Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, X86 ML

> On Apr 2, 2019, at 2:03 PM, Daniel Bristot de Oliveira <bristot@redhat.com> wrote:
>
> Note: do not take it too seriously, it is just a proof of concept.
>
> Some time ago, while using perf to check the automaton model, I noticed
> that perf was losing events. The same was reproducible with ftrace.
>
> See: https://www.spinics.net/lists/linux-rt-users/msg19781.html
>
> Steve pointed to a problem in the identification of the execution
> context used by the recursion control.
>
> Currently, recursion control uses the preempt_counter to
> identify the current context. The NMI/HARD/SOFT IRQ counters
> are set in the preempt_counter in the irq_enter/exit functions.
>
> In a trace, they are set like this:
> -------------- %< --------------------
> 0)   ==========> |
> 0)               |  do_IRQ() {        /* First C function */
> 0)               |    irq_enter() {
> 0)               |              /* set the IRQ context. */
> 0)   1.081 us    |    }
> 0)               |    handle_irq() {
> 0)               |             /* IRQ handling code */
> 0) + 10.290 us   |    }
> 0)               |    irq_exit() {
> 0)               |              /* unset the IRQ context. */
> 0)   6.657 us    |    }
> 0) + 18.995 us   |  }
> 0)   <========== |
> -------------- >% --------------------
>
> As one can see, functions (and events) that take place before the
> preempt_counter is set and after it is unset are identified in the
> wrong context, causing the misinterpretation that a recursion is
> taking place. When this happens, events are dropped.
>
> To resolve this problem, the set/unset of the IRQ/NMI context needs to
> be done before the execution of the first C function, and after its
> return. By doing so, and using this method to identify the context in the
> trace recursion protection, no more events are lost.

I would much rather do the opposite: completely remove context
tracking from the asm and, instead, stick it into the C code.  We'd
need to make sure that the C code is totally immune from tracing,
kprobes, etc, but it would be a nice cleanup.  And then you could fix
this bug in C!
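
To make that concrete, a rough sketch of the direction (hypothetical
code, not an actual patch - the helper names are made up): the
accounting would be done by C code that is itself exempt from
instrumentation, as the very first and last thing around the handler
body.

-------------- %< --------------------
#include <linux/kprobes.h>
#include <linux/preempt.h>

/*
 * Hypothetical sketch: these must run before/after any traceable
 * code, so keep them notrace and not kprobe-able.
 */
static notrace void irq_ctx_enter(void)
{
	preempt_count_add(HARDIRQ_OFFSET);
}
NOKPROBE_SYMBOL(irq_ctx_enter);

static notrace void irq_ctx_exit(void)
{
	preempt_count_sub(HARDIRQ_OFFSET);
}
NOKPROBE_SYMBOL(irq_ctx_exit);
-------------- >% --------------------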


>
> A possible solution is to use a per-cpu variable, set and unset in the
> entry point of NMIs/IRQs, before calling the C handler. This solution
> is presented in the following patches as a proof of concept, for
> x86_64. However, other ideas might be better than mine... so...
>
> Daniel Bristot de Oliveira (7):
>  x86/entry: Add support for early task context tracking
>  trace: Move the trace recursion context enum to trace.h and reuse it
>  trace: Optimize trace_get_context_bit()
>  trace/ring_buffer: Use trace_get_context_bit()
>  trace: Use early task context tracking if available
>  events: Create a trace_get_context_bit()
>  events: Use early task context tracking if available
>
> arch/x86/entry/entry_64.S       |  9 ++++++
> arch/x86/include/asm/irqflags.h | 30 ++++++++++++++++++++
> arch/x86/kernel/cpu/common.c    |  4 +++
> include/linux/irqflags.h        |  4 +++
> kernel/events/internal.h        | 50 +++++++++++++++++++++++++++------
> kernel/softirq.c                |  5 +++-
> kernel/trace/ring_buffer.c      | 28 ++----------------
> kernel/trace/trace.h            | 46 ++++++++++++++++++++++--------
> 8 files changed, 129 insertions(+), 47 deletions(-)
>
> --
> 2.20.1
>


* Re: [RFC PATCH 0/7] Early task context tracking
  2019-04-04  0:01 ` [RFC PATCH 0/7] Early task context tracking Andy Lutomirski
@ 2019-04-04  9:42   ` Peter Zijlstra
  2019-04-08 12:47   ` Daniel Bristot de Oliveira
  1 sibling, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2019-04-04  9:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Bristot de Oliveira, LKML, Steven Rostedt,
	Arnaldo Carvalho de Melo, Ingo Molnar, Thomas Gleixner,
	Borislav Petkov, H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, X86 ML

On Wed, Apr 03, 2019 at 05:01:02PM -0700, Andy Lutomirski wrote:
> I would much rather do the opposite: completely remove context
> tracking from the asm and, instead, stick it into the C code.  We'd
> need to make sure that the C code is totally immune from tracing,
> kprobes, etc, but it would be a nice cleanup.  And then you could fix
> this bug in C!

Right, so you had some prelim patches toward that a few weeks ago. Esp.
for the idtentry stuff that looked fairly straightforward.



* Re: [RFC PATCH 0/7] Early task context tracking
  2019-04-02 20:03 [RFC PATCH 0/7] Early task context tracking Daniel Bristot de Oliveira
                   ` (7 preceding siblings ...)
  2019-04-04  0:01 ` [RFC PATCH 0/7] Early task context tracking Andy Lutomirski
@ 2019-04-04 17:40 ` Joel Fernandes
  2019-04-08 12:54   ` Daniel Bristot de Oliveira
  8 siblings, 1 reply; 14+ messages in thread
From: Joel Fernandes @ 2019-04-04 17:40 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Steven Rostedt, Arnaldo Carvalho de Melo,
	Ingo Molnar, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Tommaso Cucinotta, Romulo Silva de Oliveira,
	paulmck, Clark Williams, x86

On Tue, Apr 02, 2019 at 10:03:52PM +0200, Daniel Bristot de Oliveira wrote:
> Note: do not take it too seriously, it is just a proof of concept.
> 
> Some time ago, while using perf to check the automaton model, I noticed
> that perf was losing events. The same was reproducible with ftrace.
> 
> See: https://www.spinics.net/lists/linux-rt-users/msg19781.html
> 
> Steve pointed to a problem in the identification of the execution
> context used by the recursion control.
> 
> Currently, recursion control uses the preempt_counter to
> identify the current context. The NMI/HARD/SOFT IRQ counters
> are set in the preempt_counter in the irq_enter/exit functions.

Just started looking.

Thinking out loud... can we not just update the preempt_count as early
on entry and as late on exit as possible, and fix it that way? (I
haven't yet fully looked into what could break if we did that.)

I also feel the context tracking should be unified; right now we
already have two methods AFAIK - preempt_count and lockdep - and this
is yet a third. Granted, lockdep cannot be enabled in production, but
still. It would be nice to unify these tracking methods into a single
point of context tracking that works well - and even better if we can
just fix preempt_count and use that for the non-debugging use cases.

Also, I feel in_interrupt() etc. should be updated to rely on such
tracking methods if something other than preempt_count is used.
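
(For reference, those helpers are today thin wrappers around the
preempt counter - from include/linux/preempt.h, lightly abridged:

-------------- %< --------------------
#define hardirq_count()	(preempt_count() & HARDIRQ_MASK)
#define softirq_count()	(preempt_count() & SOFTIRQ_MASK)
#define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \
				 | NMI_MASK))

#define in_irq()	(hardirq_count())
#define in_softirq()	(softirq_count())
#define in_interrupt()	(irq_count())
#define in_nmi()	(preempt_count() & NMI_MASK)
-------------- >% --------------------

so any alternative tracking method would have to keep them in sync.)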

thanks,

 - Joel


> In a trace, they are set like this:
> -------------- %< --------------------
>  0)   ==========> |
>  0)               |  do_IRQ() {		/* First C function */
>  0)               |    irq_enter() {
>  0)               |      		/* set the IRQ context. */
>  0)   1.081 us    |    }
>  0)               |    handle_irq() {
>  0)               |     		/* IRQ handling code */
>  0) + 10.290 us   |    }
>  0)               |    irq_exit() {
>  0)               |      		/* unset the IRQ context. */
>  0)   6.657 us    |    }
>  0) + 18.995 us   |  }
>  0)   <========== |
> -------------- >% --------------------
> 
> As one can see, functions (and events) that take place before the
> preempt_counter is set and after it is unset are identified in the
> wrong context, causing the misinterpretation that a recursion is
> taking place. When this happens, events are dropped.
> 
> To resolve this problem, the set/unset of the IRQ/NMI context needs to
> be done before the execution of the first C function, and after its
> return. By doing so, and using this method to identify the context in the
> trace recursion protection, no more events are lost.
> 
> A possible solution is to use a per-cpu variable, set and unset in the
> entry point of NMIs/IRQs, before calling the C handler. This solution
> is presented in the following patches as a proof of concept, for
> x86_64. However, other ideas might be better than mine... so...
> 
> Daniel Bristot de Oliveira (7):
>   x86/entry: Add support for early task context tracking
>   trace: Move the trace recursion context enum to trace.h and reuse it
>   trace: Optimize trace_get_context_bit()
>   trace/ring_buffer: Use trace_get_context_bit()
>   trace: Use early task context tracking if available
>   events: Create a trace_get_context_bit()
>   events: Use early task context tracking if available
> 
>  arch/x86/entry/entry_64.S       |  9 ++++++
>  arch/x86/include/asm/irqflags.h | 30 ++++++++++++++++++++
>  arch/x86/kernel/cpu/common.c    |  4 +++
>  include/linux/irqflags.h        |  4 +++
>  kernel/events/internal.h        | 50 +++++++++++++++++++++++++++------
>  kernel/softirq.c                |  5 +++-
>  kernel/trace/ring_buffer.c      | 28 ++----------------
>  kernel/trace/trace.h            | 46 ++++++++++++++++++++++--------
>  8 files changed, 129 insertions(+), 47 deletions(-)
> 
> -- 
> 2.20.1
> 


* Re: [RFC PATCH 0/7] Early task context tracking
  2019-04-04  0:01 ` [RFC PATCH 0/7] Early task context tracking Andy Lutomirski
  2019-04-04  9:42   ` Peter Zijlstra
@ 2019-04-08 12:47   ` Daniel Bristot de Oliveira
  2019-04-08 16:08     ` Andy Lutomirski
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-08 12:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: LKML, Steven Rostedt, Arnaldo Carvalho de Melo, Ingo Molnar,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, H. Peter Anvin,
	Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, X86 ML

On 4/4/19 2:01 AM, Andy Lutomirski wrote:
>> To resolve this problem, the set/unset of the IRQ/NMI context needs to
>> be done before the execution of the first C function, and after its
>> return. By doing so, and using this method to identify the context in the
>> trace recursion protection, no more events are lost.
> I would much rather do the opposite: completely remove context
> tracking from the asm and, instead, stick it into the C code.  We'd
> need to make sure that the C code is totally immune from tracing,
> kprobes, etc, but it would be a nice cleanup.  And then you could fix
> this bug in C!
> 
> 

Humm... what we could do to have things in C is to set the variable right at the
beginning of the C handler, e.g., do_IRQ(), and right before the return.

But by doing this we would have a problem with two things:

1) irq handler itself (e.g., do_IRQ())
2) functions/tracepoints that might run before and after the handler execution
(e.g., preemptirq tracer), but still in the IRQ context.

We can work around the first case by checking, in the recursion
control, whether the function is in the __irq_entry .text section.
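
A minimal sketch of such a check, assuming the generic section markers
from <asm/sections.h> (the helper name is made up):

-------------- %< --------------------
#include <linux/types.h>
#include <asm/sections.h>	/* __irqentry_text_start/_end */

static inline bool func_in_irq_entry(unsigned long addr)
{
	return addr >= (unsigned long)__irqentry_text_start &&
	       addr <  (unsigned long)__irqentry_text_end;
}
-------------- >% --------------------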

The second case would still be a problem. For instance, the preemptirq:
tracepoints in the preemptirq tracer would be "dropped" in the case of a
misidentification of a recursion.

Thinking aloud: should we try to move the preemptirq tracers to the C part?

I will try to come up with a patch with this approach to see if it "works."

Thoughts?

-- Daniel


* Re: [RFC PATCH 0/7] Early task context tracking
  2019-04-04 17:40 ` Joel Fernandes
@ 2019-04-08 12:54   ` Daniel Bristot de Oliveira
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-04-08 12:54 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: linux-kernel, Steven Rostedt, Arnaldo Carvalho de Melo,
	Ingo Molnar, Andy Lutomirski, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, H. Peter Anvin, Jiri Olsa, Namhyung Kim,
	Alexander Shishkin, Tommaso Cucinotta, Romulo Silva de Oliveira,
	paulmck, Clark Williams, x86

On 4/4/19 7:40 PM, Joel Fernandes wrote:
>> Currently, recursion control uses the preempt_counter to
>> identify the current context. The NMI/HARD/SOFT IRQ counters
>> are set in the preempt_counter in the irq_enter/exit functions.
> Just started looking.
> 
> Thinking out loud... can we not just update the preempt_count as early
> on entry and as late on exit as possible, and fix it that way? (I
> haven't yet fully looked into what could break if we did that.)
> 
> I also feel the context tracking should be unified; right now we
> already have two methods AFAIK - preempt_count and lockdep - and this
> is yet a third. Granted, lockdep cannot be enabled in production, but
> still. It would be nice to unify these tracking methods into a single
> point of context tracking that works well - and even better if we can
> just fix preempt_count and use that for the non-debugging use cases.
> 
> Also, I feel in_interrupt() etc. should be updated to rely on such
> tracking methods if something other than preempt_count is used.

Hi Joel,

I agree with you that it is important to have a single method to identify the
context.

I did the RFC using a specific percpu variable to make things simpler,
and also because when I tried to move the set/unset of the
preempt_counter, my dev VM stopped booting. So it looked somewhat
risky to move the preempt_counter.

Still, if people believe it is better to use the preempt_counter... I am not
against...

-- Daniel


> thanks,
> 
>  - Joel
> 
> 



* Re: [RFC PATCH 0/7] Early task context tracking
  2019-04-08 12:47   ` Daniel Bristot de Oliveira
@ 2019-04-08 16:08     ` Andy Lutomirski
  0 siblings, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2019-04-08 16:08 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Andy Lutomirski, LKML, Steven Rostedt, Arnaldo Carvalho de Melo,
	Ingo Molnar, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	H. Peter Anvin, Joel Fernandes (Google),
	Jiri Olsa, Namhyung Kim, Alexander Shishkin, Tommaso Cucinotta,
	Romulo Silva de Oliveira, Clark Williams, X86 ML

On Mon, Apr 8, 2019 at 5:47 AM Daniel Bristot de Oliveira
<bristot@redhat.com> wrote:
>
> On 4/4/19 2:01 AM, Andy Lutomirski wrote:
> >> To resolve this problem, the set/unset of the IRQ/NMI context needs to
> >> be done before the execution of the first C function, and after its
> >> return. By doing so, and using this method to identify the context in the
> >> trace recursion protection, no more events are lost.
> > I would much rather do the opposite: completely remove context
> > tracking from the asm and, instead, stick it into the C code.  We'd
> > need to make sure that the C code is totally immune from tracing,
> > kprobes, etc, but it would be a nice cleanup.  And then you could fix
> > this bug in C!
> >
> >
>
> Humm... what we could do to have things in C is to set the variable right at the
> beginning of the C handler, e.g., do_IRQ(), and right before the return.
>
> But by doing this we would have a problem with two things:
>
> 1) irq handler itself (e.g., do_IRQ())
> 2) functions/tracepoints that might run before and after the handler execution
> (e.g., preemptirq tracer), but still in the IRQ context.
>
> We can work around the first case by checking, in the recursion
> control, whether the function is in the __irq_entry .text section.
>
> The second case would still be a problem. For instance, the preemptirq:
> tracepoints in the preemptirq tracer would be "dropped" in the case of a
> misidentification of a recursion.
>
> Thinking aloud: should we try to move the preemptirq tracers to the C part?


I think we should try to move as much as possible to the C part.

