All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joel Fernandes <joel@joelfernandes.org>
Subject: [for-next][PATCH 23/25] tracing: Document the stack trace algorithm in the comments
Date: Thu, 05 Sep 2019 11:43:21 -0400	[thread overview]
Message-ID: <20190905154343.625989887@goodmis.org> (raw)
In-Reply-To: 20190905154258.573706229@goodmis.org

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

As the max stack tracer algorithm is not that easy to understand from the
code, add comments that explain the algorithm and mentions how
ARCH_FTRACE_SHIFT_STACK_TRACER affects it.

Link: http://lkml.kernel.org/r/20190806123455.487ac02b@gandalf.local.home

Suggested-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace_stack.c | 98 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 642a850af81a..ec9a34a97129 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -53,6 +53,104 @@ static void print_max_stack(void)
 	}
 }
 
+/*
+ * The stack tracer looks for a maximum stack at each call from a function. It
+ * registers a callback from ftrace, and in that callback it examines the stack
+ * size. It determines the stack size from the variable passed in, which is the
+ * address of a local variable in the stack_trace_call() callback function.
+ * The stack size is calculated by the address of the local variable to the top
+ * of the current stack. If that size is smaller than the currently saved max
+ * stack size, nothing more is done.
+ *
+ * If the size of the stack is greater than the maximum recorded size, then the
+ * following algorithm takes place.
+ *
+ * For architectures (like x86) that store the function's return address before
+ * saving the function's local variables, the stack will look something like
+ * this:
+ *
+ *   [ top of stack ]
+ *    0: sys call entry frame
+ *   10: return addr to entry code
+ *   11: start of sys_foo frame
+ *   20: return addr to sys_foo
+ *   21: start of kernel_func_bar frame
+ *   30: return addr to kernel_func_bar
+ *   31: [ do trace stack here ]
+ *
+ * The save_stack_trace() is called returning all the functions it finds in the
+ * current stack. Which would be (from the bottom of the stack to the top):
+ *
+ *   return addr to kernel_func_bar
+ *   return addr to sys_foo
+ *   return addr to entry code
+ *
+ * Now to figure out how much each of these functions' local variable size is,
+ * a search of the stack is made to find these values. When a match is made, it
+ * is added to the stack_dump_trace[] array. The offset into the stack is saved
+ * in the stack_trace_index[] array. The above example would show:
+ *
+ *        stack_dump_trace[]        |   stack_trace_index[]
+ *        ------------------        +   -------------------
+ *  return addr to kernel_func_bar  |          30
+ *  return addr to sys_foo          |          20
+ *  return addr to entry            |          10
+ *
+ * The print_max_stack() function above, uses these values to print the size of
+ * each function's portion of the stack.
+ *
+ *  for (i = 0; i < nr_entries; i++) {
+ *     size = i == nr_entries - 1 ? stack_trace_index[i] :
+ *                    stack_trace_index[i] - stack_trace_index[i+1]
+ *     print "%d %d %d %s\n", i, stack_trace_index[i], size, stack_dump_trace[i]);
+ *  }
+ *
+ * The above shows
+ *
+ *     depth size  location
+ *     ----- ----  --------
+ *  0    30   10   kernel_func_bar
+ *  1    20   10   sys_foo
+ *  2    10   10   entry code
+ *
+ * Now for architectures that might save the return address after the functions
+ * local variables (saving the link register before calling nested functions),
+ * this will cause the stack to look a little different:
+ *
+ * [ top of stack ]
+ *  0: sys call entry frame
+ * 10: start of sys_foo_frame
+ * 19: return addr to entry code << lr saved before calling kernel_func_bar
+ * 20: start of kernel_func_bar frame
+ * 29: return addr to sys_foo_frame << lr saved before calling next function
+ * 30: [ do trace stack here ]
+ *
+ * Although the functions returned by save_stack_trace() may be the same, the
+ * placement in the stack will be different. Using the same algorithm as above
+ * would yield:
+ *
+ *        stack_dump_trace[]        |   stack_trace_index[]
+ *        ------------------        +   -------------------
+ *  return addr to kernel_func_bar  |          30
+ *  return addr to sys_foo          |          29
+ *  return addr to entry            |          19
+ *
+ * Where the mapping is off by one:
+ *
+ *   kernel_func_bar stack frame size is 29 - 19 not 30 - 29!
+ *
+ * To fix this, if the architecture sets ARCH_RET_ADDR_AFTER_LOCAL_VARS the
+ * values in stack_trace_index[] are shifted by one to and the number of
+ * stack trace entries is decremented by one.
+ *
+ *        stack_dump_trace[]        |   stack_trace_index[]
+ *        ------------------        +   -------------------
+ *  return addr to kernel_func_bar  |          29
+ *  return addr to sys_foo          |          19
+ *
+ * Although the entry function is not displayed, the first function (sys_foo)
+ * will still include the stack size of it.
+ */
 static void check_stack(unsigned long ip, unsigned long *stack)
 {
 	unsigned long this_size, flags; unsigned long *p, *top, *start;
-- 
2.20.1



  parent reply	other threads:[~2019-09-05 15:44 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-05 15:42 [for-next][PATCH 00/25] tracing: Updates for 5.4 Steven Rostedt
2019-09-05 15:42 ` [for-next][PATCH 01/25] kprobes: Allow kprobes coexist with livepatch Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 02/25] recordmcount: Remove redundant strcmp Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 03/25] recordmcount: Remove uread() Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 04/25] recordmcount: Remove unused fd from uwrite() and ulseek() Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 05/25] tracing/probe: Split trace_event related data from trace_probe Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 06/25] tracing/dynevent: Delete all matched events Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 07/25] tracing/dynevent: Pass extra arguments to match operation Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 08/25] tracing/kprobe: Add multi-probe per event support Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 09/25] tracing/uprobe: Add multi-probe per uprobe " Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 10/25] tracing/kprobe: Add per-probe delete from event Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 11/25] tracing/uprobe: " Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 12/25] tracing/probe: Add immediate parameter support Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 13/25] tracing/probe: Add immediate string " Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 14/25] selftests/ftrace: Add a testcase for kprobe multiprobe event Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 15/25] selftests/ftrace: Add syntax error test for immediates Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 16/25] selftests/ftrace: Add syntax error test for multiprobe Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 17/25] recordmcount: Rewrite error/success handling Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 18/25] recordmcount: Kernel style function signature formatting Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 19/25] recordmcount: Kernel style formatting Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 20/25] recordmcount: Remove redundant cleanup() calls Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 21/25] recordmcount: Clarify what cleanup() does Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 22/25] tracing/arm64: Have max stack tracer handle the case of return address after data Steven Rostedt
2019-09-05 15:43 ` Steven Rostedt [this message]
2019-09-05 15:43 ` [for-next][PATCH 24/25] tracing: Rename tracing_reset() to tracing_reset_cpu() Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 25/25] tracing: Add "gfp_t" support in synthetic_events Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190905154343.625989887@goodmis.org \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.