All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joe Lawrence <joe.lawrence@redhat.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-arch@vger.kernel.org,
	Joel Fernandes <joel@joelfernandes.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Andy Lutomirski <luto@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>
Subject: Re: [for-next][PATCH 00/18] function_graph: Add separate depth counter to prevent trace corruption
Date: Thu, 29 Nov 2018 10:08:55 -0500	[thread overview]
Message-ID: <044db9c0-a937-252f-b9e6-e4060bca3b0d@redhat.com> (raw)
In-Reply-To: <20181128232430.425f8b12@vmware.local.home>

On 11/28/2018 11:24 PM, Steven Rostedt wrote:
> On Wed, 28 Nov 2018 22:29:36 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
>> Does this patch fix it for you?
> 
> Take 2. I realized that the reason for the interrupts being traced is
> because it is more likely to interrupt when the depth is already 0,
> setting it to 1 causing the interrupt to think it's already in a
> function  that wants to be traced (when that wasn't decided yet).
> 
> The fix uses a bit that gets set when we want to trace (and will trace
> till that bit is cleared). That bit gets cleared on the return
> function when depth is zero. But if we are tracing something in a
> interrupt that happened to interrupt the beginning of a function that
> already set the depth to 0, then we need to clear the bit at depth of 1
> not zero (and 2 if we want to trace a function that interrupted a
> softirq, that interrupted a start of normal function. and 3 if we want
> to trace an NMI function that had the very unlikely case of
> interrupting a start of a interrupt function, that interrupted the
> start of a softirq function, that interrupted a start of a normal
> context function!).
> 
> If that happens then we will continue to trace functions when we don't
> want to. To solve that, I need to record the depth (0-3) when the bit
> is set, and clear it when the return function is at that same depth
> (0-3).
> 
> -- Steve
> 
> diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
> index 3b8c0e24ab30..447bd96ee658 100644
> --- a/kernel/trace/trace.h
> +++ b/kernel/trace/trace.h
> @@ -512,12 +512,44 @@ enum {
>   * can only be modified by current, we can reuse trace_recursion.
>   */
>  	TRACE_IRQ_BIT,
> +
> +	/* Set if the function is in the set_graph_function file */
> +	TRACE_GRAPH_BIT,
> +
> +	/*
> +	 * In the very unlikely case that an interrupt came in
> +	 * at a start of graph tracing, and we want to trace
> +	 * the function in that interrupt, the depth can be greater
> +	 * than zero, because of the preempted start of a previous
> +	 * trace. In an even more unlikely case, depth could be 2
> +	 * if a softirq interrupted the start of graph tracing,
> +	 * followed by an interrupt preempting a start of graph
> +	 * tracing in the softirq, and depth can even be 3
> +	 * if an NMI came in at the start of an interrupt function
> +	 * that preempted a softirq start of a function that
> +	 * preempted normal context!!!! Luckily, it can't be
> +	 * greater than 3, so the next two bits are a mask
> +	 * of what the depth is when we set TRACE_GRAPH_BIT
> +	 */
> +
> +	TRACE_GRAPH_DEPTH_START_BIT,
> +	TRACE_GRAPH_DEPTH_END_BIT,
>  };
>  
>  #define trace_recursion_set(bit)	do { (current)->trace_recursion |= (1<<(bit)); } while (0)
>  #define trace_recursion_clear(bit)	do { (current)->trace_recursion &= ~(1<<(bit)); } while (0)
>  #define trace_recursion_test(bit)	((current)->trace_recursion & (1<<(bit)))
>  
> +#define trace_recursion_depth() \
> +	(((current)->trace_recursion >> TRACE_GRAPH_DEPTH_START_BIT) & 3)
> +#define trace_recursion_set_depth(depth) \
> +	do {								\
> +		current->trace_recursion &=				\
> +			~(3 << TRACE_GRAPH_DEPTH_START_BIT);		\
> +		current->trace_recursion |=				\
> +			((depth) & 3) << TRACE_GRAPH_DEPTH_START_BIT;	\
> +	} while (0)
> +
>  #define TRACE_CONTEXT_BITS	4
>  
>  #define TRACE_FTRACE_START	TRACE_FTRACE_BIT
> @@ -843,8 +875,9 @@ extern void __trace_graph_return(struct trace_array *tr,
>  extern struct ftrace_hash *ftrace_graph_hash;
>  extern struct ftrace_hash *ftrace_graph_notrace_hash;
>  
> -static inline int ftrace_graph_addr(unsigned long addr)
> +static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
>  {
> +	unsigned long addr = trace->func;
>  	int ret = 0;
>  
>  	preempt_disable_notrace();
> @@ -855,6 +888,14 @@ static inline int ftrace_graph_addr(unsigned long addr)
>  	}
>  
>  	if (ftrace_lookup_ip(ftrace_graph_hash, addr)) {
> +
> +		/*
> +		 * This needs to be cleared on the return functions
> +		 * when the depth is zero.
> +		 */
> +		trace_recursion_set(TRACE_GRAPH_BIT);
> +		trace_recursion_set_depth(trace->depth);
> +
>  		/*
>  		 * If no irqs are to be traced, but a set_graph_function
>  		 * is set, and called by an interrupt handler, we still
> @@ -872,6 +913,13 @@ static inline int ftrace_graph_addr(unsigned long addr)
>  	return ret;
>  }
>  
> +static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
> +{
> +	if (trace_recursion_test(TRACE_GRAPH_BIT) &&
> +	    trace->depth == trace_recursion_depth())
> +		trace_recursion_clear(TRACE_GRAPH_BIT);
> +}
> +
>  static inline int ftrace_graph_notrace_addr(unsigned long addr)
>  {
>  	int ret = 0;
> @@ -885,7 +933,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
>  	return ret;
>  }
>  #else
> -static inline int ftrace_graph_addr(unsigned long addr)
> +static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
>  {
>  	return 1;
>  }
> @@ -894,6 +942,8 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
>  {
>  	return 0;
>  }
> +static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
> +{ }
>  #endif /* CONFIG_DYNAMIC_FTRACE */
>  
>  extern unsigned int fgraph_max_depth;
> @@ -901,7 +951,8 @@ extern unsigned int fgraph_max_depth;
>  static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
>  {
>  	/* trace it when it is-nested-in or is a function enabled. */
> -	return !(trace->depth || ftrace_graph_addr(trace->func)) ||
> +	return !(trace_recursion_test(TRACE_GRAPH_BIT) ||
> +		 ftrace_graph_addr(trace)) ||
>  		(trace->depth < 0) ||
>  		(fgraph_max_depth && trace->depth >= fgraph_max_depth);
>  }
> diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
> index 2561460d7baf..086af4f5c3e8 100644
> --- a/kernel/trace/trace_functions_graph.c
> +++ b/kernel/trace/trace_functions_graph.c
> @@ -509,6 +509,8 @@ void trace_graph_return(struct ftrace_graph_ret *trace)
>  	int cpu;
>  	int pc;
>  
> +	ftrace_graph_addr_finish(trace);
> +
>  	local_irq_save(flags);
>  	cpu = raw_smp_processor_id();
>  	data = per_cpu_ptr(tr->trace_buffer.data, cpu);
> @@ -532,6 +534,8 @@ void set_graph_array(struct trace_array *tr)
>  
>  static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
>  {
> +	ftrace_graph_addr_finish(trace);
> +
>  	if (tracing_thresh &&
>  	    (trace->rettime - trace->calltime < tracing_thresh))
>  		return;
> diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
> index b7357f9f82a3..98ea6d28df15 100644
> --- a/kernel/trace/trace_irqsoff.c
> +++ b/kernel/trace/trace_irqsoff.c
> @@ -208,6 +208,8 @@ static void irqsoff_graph_return(struct ftrace_graph_ret *trace)
>  	unsigned long flags;
>  	int pc;
>  
> +	ftrace_graph_addr_finish(trace);
> +
>  	if (!func_prolog_dec(tr, &data, &flags))
>  		return;
>  
> diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
> index a86b303e6c67..7d04b9890755 100644
> --- a/kernel/trace/trace_sched_wakeup.c
> +++ b/kernel/trace/trace_sched_wakeup.c
> @@ -270,6 +270,8 @@ static void wakeup_graph_return(struct ftrace_graph_ret *trace)
>  	unsigned long flags;
>  	int pc;
>  
> +	ftrace_graph_addr_finish(trace);
> +
>  	if (!func_prolog_preempt_disable(tr, &data, &pc))
>  		return;
>  
> 

With the "take 2" patch, I only see smp_irq_work_interrupt() graph when
the graph_function is in progress..  for example:

 3)               |  cmdline_proc_show() {
 3)   ==========> |
 3)               |    smp_irq_work_interrupt() {
 3)               |      irq_enter() {
 3)               |        rcu_irq_enter() {
 3)   0.141 us    |          rcu_dynticks_curr_cpu_in_eqs();
 3)   0.589 us    |        }
 3)   0.877 us    |      }
 3)               |      __wake_up() {
 3)               |        __wake_up_common_lock() {
 3)   0.122 us    |          _raw_spin_lock_irqsave();
 3)               |          __wake_up_common() {
 3)               |            autoremove_wake_function() {
 3)               |              default_wake_function() {
 3)               |                try_to_wake_up() {
 3)   0.209 us    |                  _raw_spin_lock_irqsave();
 3)               |                  select_task_rq_fair() {
 3)               |                    select_idle_sibling() {
 3)   0.140 us    |                      available_idle_cpu();
 3)   0.408 us    |                    }
 3)   0.711 us    |                  }
 3)               |                  native_smp_send_reschedule() {
 3)   4.298 us    |                    x2apic_send_IPI();
 3)   4.574 us    |                  }
 3)   0.144 us    |                  _raw_spin_unlock_irqrestore();
 3)   6.677 us    |                }
 3)   6.915 us    |              }
 3)   7.182 us    |            }
 3)   7.620 us    |          }
 3)   0.132 us    |          _raw_spin_unlock_irqrestore();
 3)   8.401 us    |        }
 3)   9.066 us    |      }
 3)               |      irq_exit() {
 3)   0.128 us    |        idle_cpu();
 3)               |        rcu_irq_exit() {
 3)   0.122 us    |          rcu_dynticks_curr_cpu_in_eqs();
 3) + 10.598 us   |        }
 3) + 11.128 us   |      }
 3) + 22.075 us   |    }
 3)   <========== |
 3)   0.414 us    |    seq_puts();
 3)   0.128 us    |    seq_putc();
 3) + 29.040 us   |  }

-- Joe

  reply	other threads:[~2018-11-29 15:09 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-22  0:28 [for-next][PATCH 00/18] function_graph: Add separate depth counter to prevent trace corruption Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 01/18] function_graph: Create function_graph_enter() to consolidate architecture code Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 02/18] x86/function_graph: Simplify with function_graph_entry() Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 03/18] ARM: function_graph: " Steven Rostedt
2018-11-22  0:28   ` Steven Rostedt
2018-11-23  7:30   ` Sasha Levin
2018-11-22  0:28 ` [for-next][PATCH 04/18] arm64: " Steven Rostedt
2018-11-22  0:28   ` Steven Rostedt
2018-11-22  0:28   ` Steven Rostedt
2018-11-27 18:07   ` Will Deacon
2018-11-27 18:07     ` Will Deacon
2018-11-27 18:26     ` Steven Rostedt
2018-11-27 18:26       ` Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 05/18] microblaze: " Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 06/18] MIPS: " Steven Rostedt
2018-11-23  7:30   ` Sasha Levin
2018-11-23  7:30     ` Sasha Levin
2018-11-22  0:28 ` [for-next][PATCH 07/18] nds32: " Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 08/18] parisc: " Steven Rostedt
     [not found]   ` <20181123073033.2083020863@mail.kernel.org>
2018-11-23  9:06     ` Helge Deller
2018-11-23 17:12       ` Steven Rostedt
2018-11-23 18:34         ` Sasha Levin
2018-11-23 19:26           ` Steven Rostedt
2018-11-23 20:00             ` Sasha Levin
2018-11-24 18:46               ` Steven Rostedt
2018-12-03 14:54                 ` Sasha Levin
2018-11-22  0:28 ` [for-next][PATCH 09/18] powerpc/function_graph: " Steven Rostedt
2018-11-22  0:28   ` Steven Rostedt
2018-11-23  7:30   ` Sasha Levin
2018-11-22  0:28 ` [for-next][PATCH 10/18] riscv/function_graph: " Steven Rostedt
2018-11-26 16:47   ` Palmer Dabbelt
2018-11-22  0:28 ` [for-next][PATCH 11/18] s390/function_graph: " Steven Rostedt
2018-11-22  6:43   ` Martin Schwidefsky
2018-11-23 17:15     ` Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 12/18] sh/function_graph: " Steven Rostedt
2018-11-22  0:28   ` Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 13/18] sparc/function_graph: " Steven Rostedt
2018-11-22  0:28   ` Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 14/18] function_graph: Make ftrace_push_return_trace() static Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 15/18] function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack Steven Rostedt
2018-11-22 10:03   ` Peter Zijlstra
2018-11-23 14:14     ` Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 16/18] function_graph: Move return callback before update " Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 17/18] function_graph: Reverse the order of pushing the ret_stack and the callback Steven Rostedt
2018-11-22  0:28 ` [for-next][PATCH 18/18] function_graph: Have profiler use curr_ret_stack and not depth Steven Rostedt
2018-11-26  5:15 ` [for-next][PATCH 00/18] function_graph: Add separate depth counter to prevent trace corruption Masami Hiramatsu
2018-11-28 20:39 ` Joe Lawrence
2018-11-28 21:00   ` Steven Rostedt
2018-11-29  3:29     ` Steven Rostedt
2018-11-29  4:24       ` Steven Rostedt
2018-11-29 15:08         ` Joe Lawrence [this message]
2018-11-29 16:17           ` Steven Rostedt
2018-11-29 16:32             ` Joe Lawrence

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=044db9c0-a937-252f-b9e6-e4060bca3b0d@redhat.com \
    --to=joe.lawrence@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=frederic@kernel.org \
    --cc=joel@joelfernandes.org \
    --cc=jpoimboe@redhat.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.