On 2018-10-30, Masami Hiramatsu wrote:
> > Historically, kretprobe has always produced unusable stack traces
> > (kretprobe_trampoline is the only entry in most cases, because of the
> > funky stack pointer overwriting). This has caused quite a few annoyances
> > when using tracing to debug problems[1] -- since return values are only
> > available with kretprobes but stack traces were only usable for kprobes,
> > users had to probe both and then manually associate them.
>
> Yes, this unfortunately still happens. I once tried to fix it by
> replacing current "kretprobe instance" with graph-tracer's per-thread
> return stack. (https://lkml.org/lkml/2017/8/21/553)

I played with graph-tracer a while ago and it didn't appear to have
associated return values? Is this hidden somewhere or did I just miss it?

> I still believe that direction is the best solution to solve this kind
> of issues, otherwise, we have to have 2 different stack fixups for
> kretprobe and ftrace graph tracer. (I will have a talk with Steve at
> plumbers next month)

I'm definitely :+1: on removing the duplication of the stack fixups -- my
first instinct was to try to refactor all of the stack_trace code so that
we didn't have multiple arch-specific "get the stack trace" paths (and so
we could generically add current_kretprobe_instance() to one codepath).
But after looking into it, I was convinced this would be more than a
little ugly to do.

> > With the advent of bpf_trace, users would have been able to do this
> > association in bpf, but this was less than ideal (because
> > bpf_get_stackid would still produce rubbish and programs that didn't
> > know better would get silly results). The main usecase for stack traces
> > (at least with bpf_trace) is for DTrace-style aggregation on stack
> > traces (both entry and exit). Therefore we cannot simply correct the
> > stack trace on exit -- we must stash away the stack trace and return the
> > entry stack trace when it is requested.
> >
> > In theory, patches like commit 76094a2cf46e ("ftrace: distinguish
> > kretprobe'd functions in trace logs") are no longer necessary *for
> > tracing* because now all kretprobe traces should produce sane stack
> > traces. However it's not clear whether removing them completely is
> > reasonable.
>
> Then, let's try to revert it :)

Sure. :P

> BTW, could you also add a test case for ftrace too?
> also, I have some comments below.

Yup, will do.

> > +#define KRETPROBE_TRACE_SIZE 1024
> > +struct kretprobe_trace {
> > +	int nr_entries;
> > +	unsigned long entries[KRETPROBE_TRACE_SIZE];
> > +};
>
> Hmm, do we really need all entries? It takes 8KB for each instances.
> Note that the number of instances can be big if the system core number
> is larger.

Yeah, you're right this is too large for a default. But the problem is
that we need it to be large enough for any of the tracers to be happy --
otherwise we'd have to dynamically allocate it, and I had a feeling this
would be seen as a Bad Idea™ in the kprobe paths.

* ftrace uses PAGE_SIZE/sizeof(u64) == 512 (on x86_64).
* perf_events (and thus BPF) uses 127 as the default but can be
  configured via sysctl -- and thus can be unbounded.
* show_stack(...) doesn't appear to have a limit, but I might just be
  misreading the x86-specific code.

As mentioned above, the lack of consensus on a single structure for
storing stack traces also means that there is a lack of consensus on what
the largest reasonable stack is. But maybe just doing 127 would be
"reasonable"?

(Although, dynamically allocating would allow us to just use 'struct
stack_trace' directly without needing to embed a different structure.)
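To make that a bit more concrete, here is a (completely untested) sketch
of the dynamic-allocation variant. The 127 depth, the variable names and
the allocation/capture sites are only illustrative -- they are not part
of the posted patch:

/* Reuse the generic structure instead of a new struct kretprobe_trace. */
struct kretprobe_instance {
	/* ... existing members ... */
	struct stack_trace entry;
};

/* In register_kretprobe(), next to the existing per-instance kmalloc(). */
inst->entry.max_entries = 127;	/* perf's default depth, say */
inst->entry.nr_entries = 0;
inst->entry.skip = 0;
inst->entry.entries = kmalloc_array(inst->entry.max_entries,
				    sizeof(*inst->entry.entries),
				    GFP_KERNEL);
if (!inst->entry.entries)
	inst->entry.max_entries = 0;	/* degrade to "no saved trace" */

/* In pre_handler_kretprobe(), before the return address is replaced. */
ri->entry.nr_entries = 0;		/* instances get recycled */
save_stack_trace_regs(regs, &ri->entry);

The consumers below (kretprobe_save_stack_trace() and friends) would then
copy out of ri->entry much like they already do.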
> > +	hlist_for_each_entry_safe(iter, next, head, hlist) {
>
> Why would you use "_safe" variant here? if you don't modify the hlist,
> you don't need to use it.

Yup, my mistake.

> > +void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
> > +				struct stack_trace *trace)
> > +{
> > +	int i;
> > +	struct kretprobe_trace *krt = &ri->entry;
> > +
> > +	for (i = trace->skip; i < krt->nr_entries; i++) {
> > +		if (trace->nr_entries >= trace->max_entries)
> > +			break;
> > +		trace->entries[trace->nr_entries++] = krt->entries[i];
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_save_stack_trace);
> > +
> > +void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
> > +				     struct perf_callchain_entry_ctx *ctx)
> > +{
> > +	int i;
> > +	struct kretprobe_trace *krt = &ri->entry;
> > +
> > +	for (i = 0; i < krt->nr_entries; i++) {
> > +		if (krt->entries[i] == ULONG_MAX)
> > +			break;
> > +		perf_callchain_store(ctx, (u64) krt->entries[i]);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(kretprobe_perf_callchain_kernel);
> >
> Why do we need to export these functions?

That's a good question -- I must've just banged out the EXPORT statements
without thinking. I'll remove them in v2.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH