linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Adams <jwadams@google.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Emanuele Giuseppe Esposito <eesposit@redhat.com>,
	linux-s390@vger.kernel.org, kvm list <kvm@vger.kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	kvm-ppc@vger.kernel.org, linux-mips@vger.kernel.org,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	linuxppc-dev@lists.ozlabs.org, Jim Mattson <jmattson@google.com>
Subject: Re: [PATCH v2 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics
Date: Thu, 14 May 2020 10:35:35 -0700	[thread overview]
Message-ID: <CA+VK+GOnVK23X+J-VVWUK6VVpkeVOvsmQAw=HAf89h_ksYM9Rg@mail.gmail.com> (raw)
In-Reply-To: <fe21094c-bdb0-b802-482e-72bc17e5232a@redhat.com>

On Mon, May 11, 2020 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> Hi Jonathan, I think the remaining sticky point is this one:

Apologies it took a couple days for me to respond; I wanted to finish
evaluating our current usage to make sure I had a full picture; I'll
summarize our state at the bottom.

> On 11/05/20 19:02, Jonathan Adams wrote:
> > I think I'd characterize this slightly differently; we have a set of
> > statistics which are essentially "in parallel":
> >
> >   - a variety of statistics, N CPUs they're available for, or
> >   - a variety of statistics, N interfaces they're available for.
> >   - a variety of statistics, N kvm object they're available for.
> >
> > Recreating a parallel hierarchy of statistics any time we add/subtract
> > a CPU or interface seems like a lot of overhead.  Perhaps a better
> > model would be some sort of "parameter enumn" (naming is hard;
> > parameter set?), so when a CPU/network interface/etc is added you'd
> > add its ID to the "CPUs" we know about, and at removal time you'd
> > take it out; it would have an associated cbarg for the value getting
> > callback.
> >
> >> Yep, the above "not create a dentry" flag would handle the case where
> >> you sum things up in the kernel because the more fine grained counters
> >> would be overwhelming.
> >
> > nodnod; or the callback could handle the sum itself.
>
> In general for statsfs we took a more explicit approach where each
> addend in a sum is a separate stats_fs_source.  In this version of the
> patches it's also a directory, but we'll take your feedback and add both
> the ability to hide directories (first) and to list values (second).
>
> So, in the cases of interfaces and KVM objects I would prefer to keep
> each addend separate.

This just feels like a lot of churn just to add a statistic or object;
in your model, every time a KVM or VCPU is created, you create the N
statistics, leading to N*M total objects.  As I was imagining it,
you'd have:

    A 'parameter enum' which maps names to object pointers and
    A set of statistics which map a statfs path to {callback, cbarg,
zero or more parameter enums}

So adding a new KVM VCPU would just be "add an object to the KVM's
VCPU parameter enum", and removing it would be the opposite, and a
couple callbacks could handle basically all of the stats.   The only
tricky part would be making sure the parameter enum value
create/destroy and the callback calls are coordinated correctly.

If you wanted stats for a particular VCPU, we could mark the overall
directory as "include subdirs for VCPU parameter", and you'd
automatically get one directory per VCPU, with the same set of stats
in it, constrained to the single VCPU.  I could also imagine having an
".agg_sum/{stata,statb,...}" to report using the aggregations you
have, or a mode to say "stats in this directory are sums over the
following VCPU parameter".

> For CPUs that however would be pretty bad.  Many subsystems might
> accumulate stats percpu for performance reason, which would then be
> exposed as the sum (usually).  So yeah, native handling of percpu values
> makes sense.  I think it should fit naturally into the same custom
> aggregation framework as hash table keys, we'll see if there's any devil
> in the details.
>
> Core kernel stats such as /proc/interrupts or /proc/stat are the
> exception here, since individual per-CPU values can be vital for
> debugging.  For those, creating a source per stat, possibly on-the-fly
> at hotplug/hot-unplug time because NR_CPUS can be huge, would still be
> my preferred way to do it.

Our metricfs has basically two modes: report all per-CPU values (for
the IPI counts etc; you pass a callback which takes a 'int cpu'
argument) or a callback that sums over CPUs and reports the full
value.  It also seems hard to have any subsystem with a per-CPU stat
having to install a hotplug callback to add/remove statistics.

In my model, a "CPU" parameter enum which is automatically kept
up-to-date is probably sufficient for the "report all per-CPU values".

Does this make sense to you?  I realize that this is a significant
change to the model y'all are starting with; I'm willing to do the
work to flesh it out.

Thanks for your time,
- Jonathan

P.S.  Here's a summary of the types of statistics we use in metricfs
in google, to give a little context:

- integer values (single value per stat, source also a single value);
a couple of these are boolean values exported as '0' or '1'.
- per-CPU integer values, reported as a <cpuid, value> table
- per-CPU integer values, summed and reported as an aggregate
- single-value values, keys related to objects:
    - many per-device (disk, network, etc) integer stats
    - some per-device string data (version strings, UUIDs, and
occasional statuses.)
- a few histograms (usually counts by duration ranges)
- the "function name" to count for the WARN statistic I mentioned.
- A single statistic with two keys (for livepatch statistics; the
value is the livepatch status as a string)

Most of the stats with keys are "complete" (every key has a value),
but there are several examples of statistics where only some of the
possible keys have values, or (e.g. for networking statistics) only
the keys visible to the reading process (e.g. in its namespaces) are
included.

  reply	other threads:[~2020-05-14 17:38 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04 11:03 [PATCH v2 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics Emanuele Giuseppe Esposito
2020-05-04 11:03 ` [PATCH v2 1/5] refcount, kref: add dec-and-test wrappers for rw_semaphores Emanuele Giuseppe Esposito
2020-05-04 11:03 ` [PATCH v2 2/5] stats_fs API: create, add and remove stats_fs sources and values Emanuele Giuseppe Esposito
2020-05-04 22:11   ` Randy Dunlap
2020-05-04 11:03 ` [PATCH v2 3/5] kunit: tests for stats_fs API Emanuele Giuseppe Esposito
2020-05-04 11:03 ` [PATCH v2 4/5] stats_fs fs: virtual fs to show stats to the end-user Emanuele Giuseppe Esposito
2020-05-04 11:03 ` [PATCH v2 5/5] kvm_main: replace debugfs with stats_fs Emanuele Giuseppe Esposito
2020-05-04 21:37 ` [PATCH v2 0/5] Statsfs: a new ram-based file sytem for Linux kernel statistics David Rientjes
2020-05-05  9:18   ` Emanuele Giuseppe Esposito
2020-05-05 16:53     ` Jim Mattson
2020-05-05 17:02       ` Paolo Bonzini
2020-05-05 17:07         ` David Rientjes
2020-05-05 17:21           ` Paolo Bonzini
2020-05-05 17:30             ` Christian Borntraeger
2020-06-04 11:59   ` Amit Kucheria
     [not found] ` <CA+VK+GN=iDhDV2ZDJbBsxrjZ3Qoyotk_L0DvsbwDVvqrpFZ8fQ@mail.gmail.com>
2020-05-08  9:44   ` Paolo Bonzini
2020-05-11  9:37     ` Emanuele Giuseppe Esposito
2020-05-11 17:02     ` Jonathan Adams
2020-05-11 17:34       ` Paolo Bonzini
2020-05-14 17:35         ` Jonathan Adams [this message]
2020-05-14 17:42           ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+VK+GOnVK23X+J-VVWUK6VVpkeVOvsmQAw=HAf89h_ksYM9Rg@mail.gmail.com' \
    --to=jwadams@google.com \
    --cc=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=e.emanuelegiuseppe@gmail.com \
    --cc=eesposit@redhat.com \
    --cc=jmattson@google.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=pbonzini@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).