All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aleksa Sarai <cyphar@cyphar.com>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>,
	Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>,
	"David S. Miller" <davem@davemloft.net>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Brendan Gregg <bgregg@netflix.com>,
	Christian Brauner <christian@brauner.io>,
	Aleksa Sarai <asarai@suse.de>, Aleksa Sarai <cyphar@cyphar.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] kretprobe: produce sane stack traces
Date: Tue, 30 Oct 2018 14:19:53 +1100	[thread overview]
Message-ID: <20181030031953.5petvkbt45adewdt@yavin> (raw)
In-Reply-To: <20181030101206.2e5998ca3c75496c91ba5b09@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 5142 bytes --]

On 2018-10-30, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > Historically, kretprobe has always produced unusable stack traces
> > (kretprobe_trampoline is the only entry in most cases, because of the
> > funky stack pointer overwriting). This has caused quite a few annoyances
> > when using tracing to debug problems[1] -- since return values are only
> > available with kretprobes but stack traces were only usable for kprobes,
> > users had to probe both and then manually associate them.
> 
> Yes, this unfortunately still happens. I once tried to fix it by
> replacing current "kretprobe instance" with graph-tracer's per-thread
> return stack. (https://lkml.org/lkml/2017/8/21/553)

I played with graph-tracer a while ago and it didn't appear to have
associated return values? Is this hidden somewhere or did I just miss
it?

> I still believe that direction is the best solution to solve this kind
> of issues, otherwise, we have to have 2 different stack fixups for
> kretprobe and ftrace graph tracer. (I will have a talk with Steve at
> plumbers next month)

I'm definitely :+1: on removing the duplication of the stack fixups, my
first instinct was to try to refactor all of the stack_trace code so
that we didn't have multiple arch-specific "get the stack trace" paths
(and so we could generically add current_kretprobe_instance() to one
codepath). But after looking into it, I was convinced this would be more
than a little ugly to do.

> > With the advent of bpf_trace, users would have been able to do this
> > association in bpf, but this was less than ideal (because
> > bpf_get_stackid would still produce rubbish and programs that didn't
> > know better would get silly results). The main usecase for stack traces
> > (at least with bpf_trace) is for DTrace-style aggregation on stack
> > traces (both entry and exit). Therefore we cannot simply correct the
> > stack trace on exit -- we must stash away the stack trace and return the
> > entry stack trace when it is requested.
> > 
> > In theory, patches like commit 76094a2cf46e ("ftrace: distinguish
> > kretprobe'd functions in trace logs") are no longer necessary *for
> > tracing* because now all kretprobe traces should produce sane stack
> > traces. However it's not clear whether removing them completely is
> > reasonable.
> 
> Then, let's try to revert it :)

Sure. :P

> BTW, could you also add a test case for ftrace too?
> also, I have some comments below.

Yup, will do.

> > +#define KRETPROBE_TRACE_SIZE 1024
> > +struct kretprobe_trace {
> > +	int nr_entries;
> > +	unsigned long entries[KRETPROBE_TRACE_SIZE];
> > +};
> 
> Hmm, do we really need all entries? It takes 8KB for each instances.
> Note that the number of instances can be big if the system core number
> is larger.

Yeah, you're right this is too large for a default.

But the problem is that we need it to be large enough for any of the
tracers to be happy -- otherwise we'd have to dynamically allocate it
and I had a feeling this would be seen as a Bad Idea™ in the kprobe
paths.

  * ftrace uses PAGE_SIZE/sizeof(u64) == 512 (on x86_64).
  * perf_events (and thus BPF) uses 127 as the default but can be
    configured via sysctl -- and thus can be unbounded.
  * show_stack(...) doesn't appear to have a limit, but I might just be
    misreading the x86-specific code.

As mentioned above, the lack of consensus on a single structure for
storing stack traces also means that there is a lack of consensus on
what the largest reasonable stack is.

But maybe just doing 127 would be "reasonable"?

(Athough, dynamically allocating would allow us to just use 'struct
stack_trace' directly without needing to embed a different structure.)

> > +	hlist_for_each_entry_safe(iter, next, head, hlist) {
> 
> Why would you use "_safe" variant here? if you don't modify the hlist,
> you don't need to use it.

Yup, my mistake.

> > +void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
> > +				struct stack_trace *trace)
> > +{
> > +	int i;
> > +	struct kretprobe_trace *krt = &ri->entry;
> > +
> > +	for (i = trace->skip; i < krt->nr_entries; i++) {
> > +		if (trace->nr_entries >= trace->max_entries)
> > +			break;
> > +		trace->entries[trace->nr_entries++] = krt->entries[i];
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_save_stack_trace);
> > +
> > +void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
> > +				     struct perf_callchain_entry_ctx *ctx)
> > +{
> > +	int i;
> > +	struct kretprobe_trace *krt = &ri->entry;
> > +
> > +	for (i = 0; i < krt->nr_entries; i++) {
> > +		if (krt->entries[i] == ULONG_MAX)
> > +			break;
> > +		perf_callchain_store(ctx, (u64) krt->entries[i]);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_perf_callchain_kernel);
> 
> 
> Why do we need to export these functions?

That's a good question -- I must've just banged out the EXPORT
statements without thinking. I'll remove them in v2.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2018-10-30  3:20 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-26 13:22 [PATCH] kretprobe: produce sane stack traces Aleksa Sarai
2018-10-30  1:12 ` Masami Hiramatsu
2018-10-30  3:19   ` Aleksa Sarai [this message]
2018-11-01  8:06     ` Masami Hiramatsu
2018-10-31 13:03   ` Steven Rostedt
2018-10-31 13:39     ` Aleksa Sarai
2018-11-01 10:13       ` Masami Hiramatsu
2018-11-01 10:49         ` Aleksa Sarai
2018-11-01 13:22           ` Steven Rostedt
2018-11-01 15:01           ` Masami Hiramatsu
2018-11-01 20:25             ` Aleksa Sarai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181030031953.5petvkbt45adewdt@yavin \
    --to=cyphar@cyphar.com \
    --cc=acme@kernel.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=anil.s.keshavamurthy@intel.com \
    --cc=asarai@suse.de \
    --cc=bgregg@netflix.com \
    --cc=christian@brauner.io \
    --cc=davem@davemloft.net \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=naveen.n.rao@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.