From: Aleksa Sarai <cyphar@cyphar.com>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>,
"David S. Miller" <davem@davemloft.net>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
Brendan Gregg <bgregg@netflix.com>,
Christian Brauner <christian@brauner.io>,
Aleksa Sarai <asarai@suse.de>, Aleksa Sarai <cyphar@cyphar.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] kretprobe: produce sane stack traces
Date: Tue, 30 Oct 2018 14:19:53 +1100 [thread overview]
Message-ID: <20181030031953.5petvkbt45adewdt@yavin> (raw)
In-Reply-To: <20181030101206.2e5998ca3c75496c91ba5b09@kernel.org>
[-- Attachment #1: Type: text/plain, Size: 5142 bytes --]
On 2018-10-30, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > Historically, kretprobe has always produced unusable stack traces
> > (kretprobe_trampoline is the only entry in most cases, because of the
> > funky stack pointer overwriting). This has caused quite a few annoyances
> > when using tracing to debug problems[1] -- since return values are only
> > available with kretprobes but stack traces were only usable for kprobes,
> > users had to probe both and then manually associate them.
>
> Yes, this unfortunately still happens. I once tried to fix it by
> replacing current "kretprobe instance" with graph-tracer's per-thread
> return stack. (https://lkml.org/lkml/2017/8/21/553)
I played with graph-tracer a while ago and it didn't appear to have
associated return values? Is this hidden somewhere or did I just miss
it?
> I still believe that direction is the best solution to solve this kind
> of issues, otherwise, we have to have 2 different stack fixups for
> kretprobe and ftrace graph tracer. (I will have a talk with Steve at
> plumbers next month)
I'm definitely :+1: on removing the duplication of the stack fixups, my
first instinct was to try to refactor all of the stack_trace code so
that we didn't have multiple arch-specific "get the stack trace" paths
(and so we could generically add current_kretprobe_instance() to one
codepath). But after looking into it, I was convinced this would be more
than a little ugly to do.
> > With the advent of bpf_trace, users would have been able to do this
> > association in bpf, but this was less than ideal (because
> > bpf_get_stackid would still produce rubbish and programs that didn't
> > know better would get silly results). The main usecase for stack traces
> > (at least with bpf_trace) is for DTrace-style aggregation on stack
> > traces (both entry and exit). Therefore we cannot simply correct the
> > stack trace on exit -- we must stash away the stack trace and return the
> > entry stack trace when it is requested.
> >
> > In theory, patches like commit 76094a2cf46e ("ftrace: distinguish
> > kretprobe'd functions in trace logs") are no longer necessary *for
> > tracing* because now all kretprobe traces should produce sane stack
> > traces. However it's not clear whether removing them completely is
> > reasonable.
>
> Then, let's try to revert it :)
Sure. :P
> BTW, could you also add a test case for ftrace too?
> also, I have some comments below.
Yup, will do.
> > +#define KRETPROBE_TRACE_SIZE 1024
> > +struct kretprobe_trace {
> > + int nr_entries;
> > + unsigned long entries[KRETPROBE_TRACE_SIZE];
> > +};
>
> Hmm, do we really need all entries? It takes 8KB for each instances.
> Note that the number of instances can be big if the system core number
> is larger.
Yeah, you're right this is too large for a default.
But the problem is that we need it to be large enough for any of the
tracers to be happy -- otherwise we'd have to dynamically allocate it
and I had a feeling this would be seen as a Bad Idea™ in the kprobe
paths.
* ftrace uses PAGE_SIZE/sizeof(u64) == 512 (on x86_64).
* perf_events (and thus BPF) uses 127 as the default but can be
configured via sysctl -- and thus can be unbounded.
* show_stack(...) doesn't appear to have a limit, but I might just be
misreading the x86-specific code.
As mentioned above, the lack of consensus on a single structure for
storing stack traces also means that there is a lack of consensus on
what the largest reasonable stack is.
But maybe just doing 127 would be "reasonable"?
(Athough, dynamically allocating would allow us to just use 'struct
stack_trace' directly without needing to embed a different structure.)
> > + hlist_for_each_entry_safe(iter, next, head, hlist) {
>
> Why would you use "_safe" variant here? if you don't modify the hlist,
> you don't need to use it.
Yup, my mistake.
> > +void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
> > + struct stack_trace *trace)
> > +{
> > + int i;
> > + struct kretprobe_trace *krt = &ri->entry;
> > +
> > + for (i = trace->skip; i < krt->nr_entries; i++) {
> > + if (trace->nr_entries >= trace->max_entries)
> > + break;
> > + trace->entries[trace->nr_entries++] = krt->entries[i];
> > + }
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_save_stack_trace);
> > +
> > +void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
> > + struct perf_callchain_entry_ctx *ctx)
> > +{
> > + int i;
> > + struct kretprobe_trace *krt = &ri->entry;
> > +
> > + for (i = 0; i < krt->nr_entries; i++) {
> > + if (krt->entries[i] == ULONG_MAX)
> > + break;
> > + perf_callchain_store(ctx, (u64) krt->entries[i]);
> > + }
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_perf_callchain_kernel);
>
>
> Why do we need to export these functions?
That's a good question -- I must've just banged out the EXPORT
statements without thinking. I'll remove them in v2.
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2018-10-30 3:20 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-26 13:22 [PATCH] kretprobe: produce sane stack traces Aleksa Sarai
2018-10-30 1:12 ` Masami Hiramatsu
2018-10-30 3:19 ` Aleksa Sarai [this message]
2018-11-01 8:06 ` Masami Hiramatsu
2018-10-31 13:03 ` Steven Rostedt
2018-10-31 13:39 ` Aleksa Sarai
2018-11-01 10:13 ` Masami Hiramatsu
2018-11-01 10:49 ` Aleksa Sarai
2018-11-01 13:22 ` Steven Rostedt
2018-11-01 15:01 ` Masami Hiramatsu
2018-11-01 20:25 ` Aleksa Sarai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181030031953.5petvkbt45adewdt@yavin \
--to=cyphar@cyphar.com \
--cc=acme@kernel.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=anil.s.keshavamurthy@intel.com \
--cc=asarai@suse.de \
--cc=bgregg@netflix.com \
--cc=christian@brauner.io \
--cc=davem@davemloft.net \
--cc=jolsa@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mhiramat@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=naveen.n.rao@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).