* Benchmarks of kernel tracing options (ftrace, ktrace, lttng and perf)

From: David Sharp @ 2010-10-28 18:20
To: linux-kernel, Mathieu Desnoyers, Steven Rostedt, Ingo Molnar, Andrew Morton, Michael Rubin

Hi again,

I now have some benchmark results for two more tracers, lttng and perf.
Thanks to Mathieu for helping me with the lttng test.

To restate our objectives:

> Google uses kernel tracing aggressively in its data centers. We wrote
> our own kernel tracer, ktrace. However, ftrace, perf and LTTng all
> have a better feature set than ktrace, so we are abandoning that code.
>
> We see several implementations of tracing aimed at the mainline kernel
> and wanted a fair comparison of each of them to make sure they will
> not significantly impact performance. A tracing toolkit that is too
> expensive is not usable in our environment.
>
> So we are sending out some benchmark results to compare the three
> tracing subsystems we know of. In addition, we included numbers for
> ktrace, and a git tree for the source.
>
> The benchmark used is rather synthetic, and aims to measure the
> overhead of entering the kernel with a tracepoint enabled. It measures
> the time to call getuid() 100,000 times, and reports the average time
> per call. To infer the overhead of tracing, the benchmark is run with
> tracing off and with tracing on. The tracer is given a
> generously-sized ring buffer so that overflows do not occur (no reader
> is present). The "off" run essentially measures the cost of a ring
> switch, as sys_getuid() is practically a no-op syscall. Subtracting
> the "off" result from the "on" result indicates the average amount of
> time it takes to add a small item to the tracing ring buffer.
I hope this produces a fair comparison. I've detailed my steps below for
all of the tracers in case there are improvements to be made. This is
the code used for the getuid microbenchmark:

http://autotest.kernel.org/browser/trunk/client/tests/tracing_microbenchmark/src/getuid_microbench.c

The numbers below are the "on" minus "off" average results over at
least 10 runs each.

Results for the time to execute a tracepoint (includes previous results):
  ktrace:  200ns (old)
  ftrace:  224ns (old, w/ handcoded tracepoint, not syscall tracing)
  lttng:   449ns (new)
  perf:   1047ns (new)

Also interesting:
  ftrace:  587ns (old, w/ syscall tracing)
This just shows that syscall tracing is much slower than a normal
tracepoint.

The rest of this email is a lot of detail about the benchmark runs for
each tracer. I look forward to your feedback.

Thanks,
David Sharp

Here are the details of the steps for the two new results (lttng, perf)
first, and then some repeated details of the previous ftrace and ktrace
results.

LTTng
-----

I used lttng in flight-recorder mode to avoid the overhead of a reader
while tracing. The benchmark was run with these versions:
 - lttng kernel 0.233, based on v2.6.36 + the sys_getuid tracepoint patch [0]
 - ltt-modules v0.19.1 + a probe for sys_getuid [1]
 - ltt-control v0.88

I ran the benchmark like this:

  #setup:
  modprobe ltt-trace-control
  modprobe syscall-trace
  echo 1 > /sys/kernel/debug/ltt/markers/kernel/sys_getuid/enable

  #run:
  for n in $(seq 10) ; do \
    lttctl -C -w $PWD/$n \
      -o channel.all.bufsize=$((8*1024*1024)) \
      -o channel.all.overwrite=1 trace1 && \
    getuid_microbench 100000 >> results.traced && \
    lttctl -D -w $PWD/$n trace1 ; \
  done
  for n in $(seq 10) ; do \
    getuid_microbench 100000 >> results.notrace ; \
  done

Perf
----

This was run with a vanilla v2.6.36 kernel plus the sys_getuid
tracepoint patch, and perf 0.0.2. I used a large number of pages and
observed that perf reported it woke up only once to write data.
I'm assuming that it woke up at the end, so it was out of the way during
the tracing.

  for n in $(seq 10) ; do \
    perf record -e timer:sys_getuid -a -m 4096 -- \
      getuid_microbench 100000 >> results.perf ; \
  done
  for n in $(seq 10) ; do \
    getuid_microbench 100000 >> results.notrace ; \
  done

ftrace
------

The ftrace and ktrace benchmarks were run using Autotest. The commands
below approximate what the Autotest test does. The test itself is here:

http://autotest.kernel.org/browser/trunk/client/tests/tracing_microbenchmark

  #setup:
  tracedir="/sys/kernel/debug/tracing"
  echo $((8*1024)) > $tracedir/buffer_size_kb
  echo 1 > $tracedir/events/timer/sys_getuid/enable

  #run:
  for n in $(seq 10) ; do \
    echo 1 > $tracedir/tracing_enabled && \
    getuid_microbench 100000 >> results.ftrace && \
    echo 0 > $tracedir/tracing_enabled && \
    echo > $tracedir/trace ; \
  done
  for n in $(seq 10) ; do \
    getuid_microbench 100000 >> results.notrace ; \
  done

ktrace
------

> Some background on ktrace: ktrace uses the same ring buffer as ftrace,
> but has far fewer frontend features (filtering, formatted output
> files). Trace events are always packed. ktrace has an mmap interface
> to the per-cpu ring buffers instead of splice() (the merits of these
> two options have yet to be evaluated).
>
> In order to test ktrace fairly, I forward-ported it to v2.6.36-r7. I
> also made some minor changes to ftrace that don't significantly affect
> the benchmark.
>
> The tree with ktrace and the changes mentioned above is available here:
>
> git://google3-2.osuosl.org/kernel/tracing/ktrace.git ktrace-forwardport

I didn't push the ktrace tracer to the public kernel.org Autotest, but
it is quite similar to the ftrace tracer.
Once again, I'll approximate it with shell commands:

  #setup:
  tracedir="/sys/kernel/debug/ktrace"
  echo $((8*1024*1024)) > $tracedir/bufsize
  echo 1 > $tracedir/events/kernel_getuid

  #run:
  for n in $(seq 10) ; do \
    echo 1 > $tracedir/enabled && \
    getuid_microbench 100000 >> results.ktrace && \
    echo 0 > $tracedir/enabled && \
    echo 1 > $tracedir/reset ; \
  done
  for n in $(seq 10) ; do \
    getuid_microbench 100000 >> results.notrace ; \
  done

Footnotes
---------

[0] sys_getuid tracepoint:
http://google3-2.osuosl.org/?p=kernel/tracing/ktrace.git;a=commitdiff;h=872088a6e482ce0288d8fe9f02726df82173de0e

[1] sys_getuid ltt probe, diff against ltt-modules v0.19.1 (forgive
gmail's patch mangling):

diff --git a/probes/syscall-trace.c b/probes/syscall-trace.c
index 9ae419f..1409a74 100644
--- a/probes/syscall-trace.c
+++ b/probes/syscall-trace.c
@@ -49,6 +49,20 @@ notrace void probe_syscall_exit(void *_data, long ret)
 			&ret, sizeof(ret), sizeof(ret));
 }
 
+void probe_getuid(void *_data, uid_t uid);
+
+DEFINE_MARKER_TP(kernel, sys_getuid, sys_getuid,
+		 probe_getuid, "uid %d");
+
+notrace void probe_getuid(void *_data, uid_t uid)
+{
+	struct marker *marker;
+
+	marker = &GET_MARKER(kernel, sys_getuid);
+	ltt_specialized_trace(marker, marker->single.probe_private,
+			      &uid, sizeof(uid), sizeof(uid));
+}
+
 MODULE_LICENSE("GPL and additional rights");
 MODULE_AUTHOR("Mathieu Desnoyers");
 MODULE_DESCRIPTION("syscall Tracepoint Probes");
-- 
1.7.1
* Re: Benchmarks of kernel tracing options (ftrace, ktrace, lttng and perf)

From: Mathieu Desnoyers @ 2010-10-28 19:27
To: David Sharp
Cc: linux-kernel, Steven Rostedt, Ingo Molnar, Andrew Morton, Michael Rubin

* David Sharp (dhsharp@google.com) wrote:
[...]
> I hope this produces a fair comparison. I've detailed my steps below
> for all of the tracers in case there are improvements to be made.
[...]
> Results for the time to execute a tracepoint (includes previous results):
> ktrace:  200ns (old)
> ftrace:  224ns (old, w/ handcoded tracepoint, not syscall tracing)
> lttng:   449ns (new)
> perf:   1047ns (new)
>
> Also interesting:
> ftrace:  587ns (old, w/ syscall tracing)
> This just shows that syscall tracing is much slower than a normal tracepoint.

Hrm, not quite fair, actually. LTTng enables a thread flag that causes
syscall_trace to be called on syscall entry/exit by default. Please try
with the following patch to remove the extra syscall overhead.

Thanks,

Mathieu

for google benchmarks: remove syscall tracing from LTTng modules

Syscall tracing is enabled by default when lttng is tracing, which
causes the full registers to be saved and a syscall_trace call upon
syscall entry/exit. This patch is for testing only.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 ltt-tracer.c | 2 --
 1 file changed, 2 deletions(-)

Index: lttng-modules/ltt-tracer.c
===================================================================
--- lttng-modules.orig/ltt-tracer.c
+++ lttng-modules/ltt-tracer.c
@@ -839,7 +839,6 @@ int ltt_trace_alloc(const char *trace_na
 	if (list_empty(&ltt_traces.head)) {
 		mod_timer(&ltt_async_wakeup_timer,
 			  jiffies + LTT_PERCPU_TIMER_INTERVAL);
-		set_kernel_trace_flag_all_tasks();
 	}
 	list_add_rcu(&trace->list, &ltt_traces.head);
 	synchronize_trace();
@@ -917,7 +916,6 @@ static int _ltt_trace_destroy(struct ltt
 	list_del_rcu(&trace->list);
 	synchronize_trace();
 	if (list_empty(&ltt_traces.head)) {
-		clear_kernel_trace_flag_all_tasks();
 		/*
 		 * We stop the asynchronous delivery of reader wakeup, but
 		 * we must make one last check for reader wakeups pending

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: Benchmarks of kernel tracing options (ftrace, ktrace, lttng and perf)

From: Michael Rubin @ 2010-10-28 23:24
To: Mathieu Desnoyers
Cc: David Sharp, linux-kernel, Steven Rostedt, Ingo Molnar, Andrew Morton

On Thu, Oct 28, 2010 at 12:27 PM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
> * David Sharp (dhsharp@google.com) wrote:
> Hrm, not quite fair, actually. LTTng enables a thread flag that causes
> syscall_trace to be called on syscall entry/exit by default. Please
> try with the following patch to remove the extra syscall overhead.

We started this thread in an attempt to get the ball rolling on keeping
track of these metrics. They seem important for both Google and Linux
as a whole. We published all our work and benchmarks in the hope that
the owners could comment and also run the tests themselves.

Mathieu, it might make more sense for you to find the "fair" setup and
publish that than to take the extra time to go back and forth with
David. Once you have the config that best represents what users will
see, and numbers to go with it, we can run it in our environment. I
worry you will otherwise be forced to send us patch after patch, and we
won't have the resources to do a good job of responding to them. Which
is not fair to you.

mrubin
* Re: Benchmarks of kernel tracing options (ftrace, ktrace, lttng and perf)

From: Mathieu Desnoyers @ 2010-10-29 4:37
To: Michael Rubin
Cc: David Sharp, linux-kernel, Steven Rostedt, Ingo Molnar, Andrew Morton

* Michael Rubin (mrubin@google.com) wrote:
> On Thu, Oct 28, 2010 at 12:27 PM, Mathieu Desnoyers
> <mathieu.desnoyers@efficios.com> wrote:
> > Hrm, not quite fair, actually. LTTng enables a thread flag that
> > causes syscall_trace to be called on syscall entry/exit by default.
> > Please try with the following patch to remove the extra syscall
> > overhead.
>
> We started this thread in an attempt to get the ball rolling on
> keeping track of these metrics. They seem important for both Google
> and Linux as a whole. We published all our work and benchmarks in the
> hope that the owners could comment and also run the tests themselves.
>
> Mathieu, it might make more sense for you to find the "fair" setup
> and publish that than to take the extra time to go back and forth
> with David. Once you have the config that best represents what users
> will see, and numbers to go with it, we can run it in our
> environment. I worry you will otherwise be forced to send us patch
> after patch, and we won't have the resources to do a good job of
> responding to them. Which is not fair to you.

Nah, it's really just a matter of fine-tuning. I already performed
these benchmarks, have published the results, and know what to expect,
so I figured out quickly that something was wrong with your setup.
Disabling the extra syscall tracing is the only detail I see missing
for a fair comparison of LTTng with your setup.
For the "fair" setup to compare ring buffers, I would personally tend
to say that going through a system call is not how I would benchmark
the ring buffer per se, but I must admit that the benchmark you are
doing is very useful for evaluating system call entry/exit tracing. On
this particular aspect, I have always admitted that lttng is not
super-fast, but it carries the same overhead as the Linux kernel's own
syscall tracing does today. Because it is always enabled when lttng
tracing runs, however, you ended up attributing this overhead to the
lttng ring buffer inappropriately. I'm willing to help David with the
setup, and I'm sure it won't take long.

By the way, I'm currently porting lttng to the new Generic Ring Buffer
Library, which might be even slightly faster than the current lttng
code (and faster than ftrace with the ring-buffer benchmark Steven
wrote). So when I'm done porting lttng to it, it might be worthwhile to
benchmark the Generic Ring Buffer too.

Thank you,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: Benchmarks of kernel tracing options (ftrace, ktrace, lttng and perf)

From: Mathieu Desnoyers @ 2010-11-16 20:28
To: David Sharp
Cc: linux-kernel, Steven Rostedt, Ingo Molnar, Andrew Morton, Michael Rubin

* David Sharp (dhsharp@google.com) wrote:
[...]
> Results for the time to execute a tracepoint (includes previous results):
> ktrace:  200ns (old)
> ftrace:  224ns (old, w/ handcoded tracepoint, not syscall tracing)
> lttng:   449ns (new)
> perf:   1047ns (new)
>
> Also interesting:
> ftrace:  587ns (old, w/ syscall tracing)
> This just shows that syscall tracing is much slower than a normal tracepoint.

As I pointed out in my email a few weeks ago, the LTTng comparison is
simply bogus, because the "syscall tracing" thread flag is active,
which calls into syscall tracing, after saving all registers, from
entry_*.S at both syscall entry and exit.

I did benchmarks using Steven's ring_buffer_benchmark kernel module,
which calls tracing in a loop, for both ftrace and the Generic Ring
Buffer (which is derived from LTTng). The results are:

Intel Xeon 2.0GHz
  ftrace: 103 ns/entry (no reader)
  lttng:   83 ns/entry (no reader) (with the generic ring buffer library)

So, given that people took the numbers above for granted even after I
pointed out that they were bogus, and given that David seems busy with
other things, I thought I should set the record straight.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com