linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joel Fernandes <joel@joelfernandes.org>
Subject: [for-next][PATCH 23/25] tracing: Document the stack trace algorithm in the comments
Date: Thu, 05 Sep 2019 11:43:21 -0400	[thread overview]
Message-ID: <20190905154343.625989887@goodmis.org> (raw)
In-Reply-To: 20190905154258.573706229@goodmis.org

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

As the max stack tracer algorithm is not that easy to understand from the
code, add comments that explain the algorithm and mentions how
ARCH_FTRACE_SHIFT_STACK_TRACER affects it.

Link: http://lkml.kernel.org/r/20190806123455.487ac02b@gandalf.local.home

Suggested-by: Joel Fernandes <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace_stack.c | 98 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
index 642a850af81a..ec9a34a97129 100644
--- a/kernel/trace/trace_stack.c
+++ b/kernel/trace/trace_stack.c
@@ -53,6 +53,104 @@ static void print_max_stack(void)
 	}
 }
 
+/*
+ * The stack tracer looks for a maximum stack at each call from a function. It
+ * registers a callback from ftrace, and in that callback it examines the stack
+ * size. It determines the stack size from the variable passed in, which is the
+ * address of a local variable in the stack_trace_call() callback function.
+ * The stack size is calculated by the address of the local variable to the top
+ * of the current stack. If that size is smaller than the currently saved max
+ * stack size, nothing more is done.
+ *
+ * If the size of the stack is greater than the maximum recorded size, then the
+ * following algorithm takes place.
+ *
+ * For architectures (like x86) that store the function's return address before
+ * saving the function's local variables, the stack will look something like
+ * this:
+ *
+ *   [ top of stack ]
+ *    0: sys call entry frame
+ *   10: return addr to entry code
+ *   11: start of sys_foo frame
+ *   20: return addr to sys_foo
+ *   21: start of kernel_func_bar frame
+ *   30: return addr to kernel_func_bar
+ *   31: [ do trace stack here ]
+ *
+ * The save_stack_trace() is called returning all the functions it finds in the
+ * current stack. Which would be (from the bottom of the stack to the top):
+ *
+ *   return addr to kernel_func_bar
+ *   return addr to sys_foo
+ *   return addr to entry code
+ *
+ * Now to figure out how much each of these functions' local variable size is,
+ * a search of the stack is made to find these values. When a match is made, it
+ * is added to the stack_dump_trace[] array. The offset into the stack is saved
+ * in the stack_trace_index[] array. The above example would show:
+ *
+ *        stack_dump_trace[]        |   stack_trace_index[]
+ *        ------------------        +   -------------------
+ *  return addr to kernel_func_bar  |          30
+ *  return addr to sys_foo          |          20
+ *  return addr to entry            |          10
+ *
+ * The print_max_stack() function above, uses these values to print the size of
+ * each function's portion of the stack.
+ *
+ *  for (i = 0; i < nr_entries; i++) {
+ *     size = i == nr_entries - 1 ? stack_trace_index[i] :
+ *                    stack_trace_index[i] - stack_trace_index[i+1]
+ *     print "%d %d %d %s\n", i, stack_trace_index[i], size, stack_dump_trace[i]);
+ *  }
+ *
+ * The above shows
+ *
+ *     depth size  location
+ *     ----- ----  --------
+ *  0    30   10   kernel_func_bar
+ *  1    20   10   sys_foo
+ *  2    10   10   entry code
+ *
+ * Now for architectures that might save the return address after the functions
+ * local variables (saving the link register before calling nested functions),
+ * this will cause the stack to look a little different:
+ *
+ * [ top of stack ]
+ *  0: sys call entry frame
+ * 10: start of sys_foo_frame
+ * 19: return addr to entry code << lr saved before calling kernel_func_bar
+ * 20: start of kernel_func_bar frame
+ * 29: return addr to sys_foo_frame << lr saved before calling next function
+ * 30: [ do trace stack here ]
+ *
+ * Although the functions returned by save_stack_trace() may be the same, the
+ * placement in the stack will be different. Using the same algorithm as above
+ * would yield:
+ *
+ *        stack_dump_trace[]        |   stack_trace_index[]
+ *        ------------------        +   -------------------
+ *  return addr to kernel_func_bar  |          30
+ *  return addr to sys_foo          |          29
+ *  return addr to entry            |          19
+ *
+ * Where the mapping is off by one:
+ *
+ *   kernel_func_bar stack frame size is 29 - 19 not 30 - 29!
+ *
+ * To fix this, if the architecture sets ARCH_RET_ADDR_AFTER_LOCAL_VARS the
+ * values in stack_trace_index[] are shifted by one to and the number of
+ * stack trace entries is decremented by one.
+ *
+ *        stack_dump_trace[]        |   stack_trace_index[]
+ *        ------------------        +   -------------------
+ *  return addr to kernel_func_bar  |          29
+ *  return addr to sys_foo          |          19
+ *
+ * Although the entry function is not displayed, the first function (sys_foo)
+ * will still include the stack size of it.
+ */
 static void check_stack(unsigned long ip, unsigned long *stack)
 {
 	unsigned long this_size, flags; unsigned long *p, *top, *start;
-- 
2.20.1



  parent reply	other threads:[~2019-09-05 15:44 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-05 15:42 [for-next][PATCH 00/25] tracing: Updates for 5.4 Steven Rostedt
2019-09-05 15:42 ` [for-next][PATCH 01/25] kprobes: Allow kprobes coexist with livepatch Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 02/25] recordmcount: Remove redundant strcmp Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 03/25] recordmcount: Remove uread() Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 04/25] recordmcount: Remove unused fd from uwrite() and ulseek() Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 05/25] tracing/probe: Split trace_event related data from trace_probe Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 06/25] tracing/dynevent: Delete all matched events Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 07/25] tracing/dynevent: Pass extra arguments to match operation Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 08/25] tracing/kprobe: Add multi-probe per event support Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 09/25] tracing/uprobe: Add multi-probe per uprobe " Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 10/25] tracing/kprobe: Add per-probe delete from event Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 11/25] tracing/uprobe: " Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 12/25] tracing/probe: Add immediate parameter support Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 13/25] tracing/probe: Add immediate string " Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 14/25] selftests/ftrace: Add a testcase for kprobe multiprobe event Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 15/25] selftests/ftrace: Add syntax error test for immediates Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 16/25] selftests/ftrace: Add syntax error test for multiprobe Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 17/25] recordmcount: Rewrite error/success handling Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 18/25] recordmcount: Kernel style function signature formatting Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 19/25] recordmcount: Kernel style formatting Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 20/25] recordmcount: Remove redundant cleanup() calls Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 21/25] recordmcount: Clarify what cleanup() does Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 22/25] tracing/arm64: Have max stack tracer handle the case of return address after data Steven Rostedt
2019-09-05 15:43 ` Steven Rostedt [this message]
2019-09-05 15:43 ` [for-next][PATCH 24/25] tracing: Rename tracing_reset() to tracing_reset_cpu() Steven Rostedt
2019-09-05 15:43 ` [for-next][PATCH 25/25] tracing: Add "gfp_t" support in synthetic_events Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190905154343.625989887@goodmis.org \
    --to=rostedt@goodmis.org \
    --cc=akpm@linux-foundation.org \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).