linux-kernel.vger.kernel.org archive mirror
From: Luigi Rizzo <lrizzo@google.com>
To: linux-kernel@vger.kernel.org, mhiramat@kernel.org,
	akpm@linux-foundation.org, gregkh@linuxfoundation.org,
	naveen.n.rao@linux.ibm.com, ardb@kernel.org, rizzo@iet.unipi.it,
	pabeni@redhat.com, giuseppe.lettieri@unipi.it, toke@redhat.com,
	hawk@kernel.org, mingo@redhat.com, acme@kernel.org,
	rostedt@goodmis.org, peterz@infradead.org
Cc: Luigi Rizzo <lrizzo@google.com>
Subject: [PATCH v2 0/2] kstats: kernel metric collector
Date: Wed, 26 Feb 2020 05:46:35 -0800
Message-ID: <20200226134637.31670-1-lrizzo@google.com>

This patchset introduces a small library to collect per-cpu samples and
accumulate distributions to be exported through debugfs.

Note that its use case and performance goals are orthogonal to those of the
perf/tracing architecture, which is why this is proposed as a standalone
component.

I have used this to trace the execution times of small sections of code
(network stack functions, drivers and XDP processing, ...), whole functions,
or notification latencies (e.g. IPI dispatch).  The fine granularity of the
aggregation helps identify outliers, multimodal distributions, caching
effects and general performance-related issues.

Samples are 64-bit values that are aggregated into per-cpu logarithmic buckets
with configurable density. Typical sample collectors use one bucket for each
power of 2, which is very coarse and corresponds to 1 significant bit per
value; in kstats one can specify the number of significant bits, up to 5,
which is useful for finer-grained measurements.

There are two ways to trace a block of code. Manual annotation has the best
runtime performance and is done as follows:

	// create metric and entry in debugfs
	struct kstats *key = kstats_new("foo", frac_bits);

	...
	// instrument the code
	u64 t0 = ktime_get_ns();        // about 20ns
	<section of code to measure>
	t0 = ktime_get_ns() - t0;       // about 20ns
	kstats_record(key, t0);         // 5ns hot cache, 300ns cold

This method has an accuracy of about 20-30ns (inherited from the clock)
and the hot/cold cache overheads shown in the comments above.

Values can be read from debugfs in an easy-to-parse format:

	# cat /sys/kernel/debug/kstats/foo
	...
	slot 55  CPU  0    count      589 avg      480 p 0.027613
	slot 55  CPU  1    count       18 avg      480 p 0.002572
	slot 55  CPU  2    count       25 avg      480 p 0.003325
	...
	slot 55  CPUS 28   count      814 avg      480 p 0.002474
	...
	slot 97  CPU  13   count     1150 avg    20130 p 0.447442
	slot 97  CPUS 28   count   152585 avg    19809 p 0.651747
	...

Writing STOP, START or RESET to the file executes the corresponding action:

	echo RESET > /sys/kernel/debug/kstats/foo

The second instrumentation mechanism uses kretprobes or tracepoints and
lets tracing be enabled or removed at runtime from the command line for
any globally visible function:

	echo trace some_function bits 3 > /sys/kernel/debug/kstats/_control

Data are exported or controlled in the same way as above. Accuracy is
worse due to the presence of kretprobe trampolines: 90ns with hot cache,
500ns with cold cache. The overhead on the traced function is 250ns hot,
1500ns cold.

Hope you find this useful.


Luigi Rizzo (2):
  kstats: kernel metric collector
  kstats: kretprobe and tracepoint support

 include/linux/kstats.h |  62 ++++
 lib/Kconfig.debug      |   7 +
 lib/Makefile           |   1 +
 lib/kstats.c           | 654 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 724 insertions(+)
 create mode 100644 include/linux/kstats.h
 create mode 100644 lib/kstats.c

-- 
2.25.0.265.gbab2e86ba0-goog


             reply	other threads:[~2020-02-26 13:46 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-26 13:46 Luigi Rizzo [this message]
2020-02-26 13:46 ` [PATCH v2 1/2] kstats: kernel metric collector Luigi Rizzo
2020-02-26 16:19   ` Peter Zijlstra
2020-02-26 19:31     ` Luigi Rizzo
2020-02-27  2:10       ` Masami Hiramatsu
