From: Jonathan Adams <jwadams@google.com>
To: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
Paolo Bonzini <pbonzini@redhat.com>,
Greg KH <gregkh@linuxfoundation.org>,
Jim Mattson <jmattson@google.com>,
David Rientjes <rientjes@google.com>,
Jonathan Adams <jwadams@google.com>
Subject: [RFC PATCH 0/7] metricfs metric file system and examples
Date: Fri, 7 Aug 2020 14:29:09 -0700
Message-ID: <20200807212916.2883031-1-jwadams@google.com>
[resending to widen the CC lists per rdunlap@infradead.org's suggestion;
original posting to lkml here: https://lkml.org/lkml/2020/8/5/1009]
To try to restart the discussion of kernel statistics started by the
statsfs patchsets (https://lkml.org/lkml/2020/5/26/332), I wanted
to share the following set of patches which are Google's 'metricfs'
implementation and some example uses. Google has been using metricfs
internally since 2012 as a way to export various statistics to our
telemetry systems (similar to OpenTelemetry), and we have over 200
statistics exported on a typical machine.
These patches have been cleaned up and modernized vs. the versions
in production; I've included notes under the fold in the patches.
They're based on v5.8-rc6.
The statistics live under debugfs, in a tree rooted at:
/sys/kernel/debug/metricfs
Each metric is a directory containing four files. For example, the
'core/metricfs: Create metricfs, standardized files under debugfs.' patch
includes a simple 'metricfs_presence' metric, whose files look like:
/sys/kernel/debug/metricfs:
  metricfs_presence/annotations
    DESCRIPTION A\ basic\ presence\ metric.
  metricfs_presence/fields
    value
    int
  metricfs_presence/values
    1
  metricfs_presence/version
    1
(The "version" file always contains '1' and is largely vestigial.)
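To make the file layout concrete, here is a minimal userspace sketch of a parser for the "fields" and "values" files. It assumes that "fields" holds one line of column names followed by one line of column types, that "values" holds one whitespace-separated row per line, and that only the 'int' and 'str' types seen in this mail occur. The function names are hypothetical and not part of the patches.

```python
# Sketch only: parse_fields/parse_values are hypothetical helper names.
# Assumes whitespace-separated columns and only the types seen in this mail.
CONVERTERS = {"int": int, "str": str}


def parse_fields(text):
    """Return [(name, type), ...] from the two-line 'fields' file."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 2 or len(lines[0]) != len(lines[1]):
        raise ValueError("expected a line of names and a matching line of types")
    return list(zip(lines[0], lines[1]))


def parse_values(text, fields):
    """Return one dict per row of the 'values' file, typed per 'fields'."""
    rows = []
    for ln in text.splitlines():
        cols = ln.split()
        if not cols:
            continue  # skip blank lines
        if len(cols) != len(fields):
            raise ValueError("row does not match field list: %r" % ln)
        # Unknown types fall back to str rather than failing.
        rows.append({name: CONVERTERS.get(typ, str)(col)
                     for (name, typ), col in zip(fields, cols)})
    return rows
```

For metricfs_presence, parse_fields("value\nint\n") gives a single ('value', 'int') column, and parse_values("1\n", ...) gives one row with value 1.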
An example of a more complicated stat is the networking stats.
For example, the tx_bytes stat looks like:
  net/dev/stats/tx_bytes/annotations
    DESCRIPTION net\ device\ transmited\ bytes\ count
    CUMULATIVE
  net/dev/stats/tx_bytes/fields
    interface value
    str int
  net/dev/stats/tx_bytes/values
    lo 4394430608
    eth0 33353183843
    eth1 16228847091
  net/dev/stats/tx_bytes/version
    1
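The "annotations" file can be read in a similar way. The sketch below assumes each line is either a bare flag (such as CUMULATIVE) or a key followed by a value in which a backslash escapes the next character, as the escaped spaces in the DESCRIPTION lines suggest. The exact escaping rules are an assumption on my part, and parse_annotations is a hypothetical name.

```python
import re


def parse_annotations(text):
    """Return {key: value} for 'KEY value' lines and {key: True} for flags.

    Assumption: a backslash escapes the character that follows it.
    """
    annotations = {}
    for ln in text.splitlines():
        ln = ln.strip()
        if not ln:
            continue
        key, _, raw = ln.partition(" ")
        if raw:
            # Drop each escaping backslash, keep the escaped character.
            annotations[key] = re.sub(r"\\(.)", r"\1", raw)
        else:
            annotations[key] = True
    return annotations
```

A reader can then use the CUMULATIVE flag to decide whether a metric should be treated as a monotonically increasing counter.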
The per-cpu statistics show up in the scheduler stat info and x86
IRQ counts. For example:
  stat/user/annotations
    DESCRIPTION time\ in\ user\ mode\ (nsec)
    CUMULATIVE
  stat/user/fields
    cpu value
    int int
  stat/user/values
    0 1183486517734
    1 1038284237228
    ...
  stat/user/version
    1
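Since stat/user is marked CUMULATIVE, a consumer would typically sample it twice and work with the difference. A minimal sketch, assuming each "values" row is "<cpu> <nanoseconds>" as shown above; the helper names are hypothetical.

```python
def parse_percpu(text):
    """Return {cpu: value} from a two-column per-cpu 'values' file."""
    result = {}
    for ln in text.splitlines():
        cols = ln.split()
        if not cols:
            continue  # skip blank lines
        if len(cols) != 2:
            raise ValueError("expected '<cpu> <value>', got %r" % ln)
        result[int(cols[0])] = int(cols[1])
    return result


def per_cpu_delta(before, after):
    """Per-CPU increase between two samples of a cumulative metric.

    CPUs missing from either sample (for example, hotplugged in between)
    are left out rather than guessed at.
    """
    return {cpu: after[cpu] - before[cpu] for cpu in after if cpu in before}
```

Dividing each delta by the wall-clock time between the two samples gives the fraction of time that CPU spent in user mode.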
The full set of example metrics I've included is:
  core/metricfs: Create metricfs, standardized files under debugfs.
    metricfs_presence
  core/metricfs: metric for kernel warnings
    warnings/values
  core/metricfs: expose scheduler stat information through metricfs
    stat/*
  net-metricfs: Export /proc/net/dev via metricfs.
    net/dev/stats/[tr]x_*
  core/metricfs: expose x86-specific irq information through metricfs
    irq_x86/*
The general approach is called out in kernel/metricfs.c:
  The kernel provides:
    - A description of the metric
    - The subsystem for the metric (NULL is ok)
    - Type information about the metric, and
    - A callback function which supplies metric values.

  Limitations:
    - "values" files are at MOST 64K. We truncate the file at that point.
    - The list of fields and types is at most 1K.
    - Metrics may have at most 2 fields.

  Best Practices:
    - Emit the most important data first! Once the 64K per-metric buffer
      is full, the emit* functions won't do anything.
    - In userspace, open(), read(), and close() the file quickly! The kernel
      allocation for the metric is alive as long as the file is open. This
      permits users to seek around the contents of the file, while
      permitting an atomic view of the data.
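On the userspace side, the "open, read, close quickly" advice might look like the sketch below. read_snapshot is a hypothetical helper, and treating a result of 64K or more as "possibly truncated" is my own heuristic based on the limit quoted above, not something the interface reports.

```python
import os

# 64K limit on "values" files, as stated in the text above.
MAX_VALUES_BYTES = 64 * 1024


def read_snapshot(path):
    """Open, drain, and close a metric file as quickly as possible.

    Returns (text, maybe_truncated). The truncation flag is a heuristic:
    it is set whenever the data reaches the 64K limit.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, MAX_VALUES_BYTES)
            if not chunk:
                break  # end of file
            chunks.append(chunk)
    finally:
        # Close before any parsing so the kernel buffer is released early.
        os.close(fd)
    data = b"".join(chunks)
    return data.decode("utf-8", errors="replace"), len(data) >= MAX_VALUES_BYTES
```

Parsing happens only after the descriptor is closed, so the per-open kernel allocation is held for no longer than the read itself.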
Note that since the callbacks are called and the data is generated at
file open() time, the data is only consistent within a given metric;
the rx_bytes stat for every network interface will be read at almost
the same time, but if you want to get rx_bytes and rx_packets, there
could be significant skew between the two file opens. (So this doesn't
entirely address Andrew Lunn's comments in
https://lkml.org/lkml/2020/5/26/490)
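A consumer that needs to correlate two metrics can at least measure how far apart the opens were. The sketch below is an illustration only: read_with_skew is a hypothetical helper, and the timestamps are taken in userspace just before each open(), so they only approximate when the kernel generated each snapshot.

```python
import time


def read_text(path):
    """Open, read, and close one metric file."""
    with open(path, "r") as f:
        return f.read()


def read_with_skew(paths):
    """Read several metric files back to back.

    Returns (contents, skew_ns): contents maps each path to its text, and
    skew_ns is the time between the first and the last open() call.
    """
    contents = {}
    stamps = []
    for path in paths:
        stamps.append(time.monotonic_ns())  # taken just before the open
        contents[path] = read_text(path)
    skew_ns = stamps[-1] - stamps[0] if stamps else 0
    return contents, skew_ns
```

If skew_ns is larger than the caller can tolerate, for instance when deriving bytes per packet from rx_bytes and rx_packets, the pair can simply be re-read.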
This also doesn't address one of the basic parts of the statsfs work:
moving the statistics out of debugfs to avoid lockdown interactions.
Google has found a lot of value in having a generic interface for adding
these kinds of statistics with reasonably low overhead (the cost of reading
them is O(number of statistics), not O(number of objects in each statistic)).
There are definitely warts in the interface, but does the basic approach
make sense to folks?
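To illustrate how a collector might enumerate the tree, here is a minimal sketch that treats any directory containing all four files as a metric. The root path is a parameter (it would be /sys/kernel/debug/metricfs on a machine with these patches applied), and find_metrics is a hypothetical name.

```python
import os

# The four files each metric directory contains, per the description above.
METRIC_FILES = {"annotations", "fields", "values", "version"}


def find_metrics(root):
    """Return the sorted relative paths of all metric directories under root."""
    metrics = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if METRIC_FILES.issubset(filenames):
            metrics.append(os.path.relpath(dirpath, root))
    return sorted(metrics)
```

A collector then performs one open per metric found, regardless of how many interfaces or CPUs each metric reports.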
Thanks,
- Jonathan
Jonathan Adams (5):
  core/metricfs: add support for percpu metricfs files
  core/metricfs: metric for kernel warnings
  core/metricfs: expose softirq information through metricfs
  core/metricfs: expose scheduler stat information through metricfs
  core/metricfs: expose x86-specific irq information through metricfs

Justin TerAvest (1):
  core/metricfs: Create metricfs, standardized files under debugfs.

Laurent Chavey (1):
  net-metricfs: Export /proc/net/dev via metricfs.

 arch/x86/kernel/irq.c      |  80 ++++
 fs/proc/stat.c             |  57 +++
 include/linux/metricfs.h   | 131 +++++++
 kernel/Makefile            |   2 +
 kernel/metricfs.c          | 775 +++++++++++++++++++++++++++++++++++++
 kernel/metricfs_examples.c | 151 ++++++++
 kernel/panic.c             | 131 +++++++
 kernel/softirq.c           |  45 +++
 lib/Kconfig.debug          |  18 +
 net/core/Makefile          |   1 +
 net/core/net_metricfs.c    | 194 ++++++++++
 11 files changed, 1585 insertions(+)
 create mode 100644 include/linux/metricfs.h
 create mode 100644 kernel/metricfs.c
 create mode 100644 kernel/metricfs_examples.c
 create mode 100644 net/core/net_metricfs.c
--
2.28.0.236.gb10cc79966-goog