From: Anton Protopopov <aspsk@isovalent.com>
To: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>,
bpf@vger.kernel.org
Cc: Anton Protopopov <aspsk@isovalent.com>
Subject: [RFC v2 PATCH bpf-next 0/4] bpf: add percpu stats for bpf_map
Date: Thu, 22 Jun 2023 09:53:26 +0000
Message-ID: <20230622095330.1023453-1-aspsk@isovalent.com>
This series adds a mechanism for maps to maintain per-cpu counters of their
elements, updated on insertions and deletions. The sum of these counters can
be read by a new kfunc from a map iterator program (a sketch of the idea
follows).
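In essence the mechanism boils down to the following minimal sketch (the
field and helper names, elem_count and bpf_map_{inc,dec}_elem_count, are
assumptions for illustration; patch 1 has the real code):

    /* Sketch only: a per-cpu element counter embedded in struct
     * bpf_map, updated with this_cpu_{inc,dec} on the map op paths.
     */
    struct bpf_map {
        /* ... existing fields ... */
        s64 __percpu *elem_count;    /* assumed field name */
    };

    static inline void bpf_map_inc_elem_count(struct bpf_map *map)
    {
        /* one per-cpu increment on the insertion path */
        this_cpu_inc(*map->elem_count);
    }

    static inline void bpf_map_dec_elem_count(struct bpf_map *map)
    {
        /* and a matching per-cpu decrement on the deletion path */
        this_cpu_dec(*map->elem_count);
    }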
The following patches are present in the series:
* Patch 1 adds a generic per-cpu counter to struct bpf_map
* Patch 2 utilizes this mechanism for hash-based maps
* Patch 3 extends the preloaded map iterator to dump the sum (see the
  iterator sketch after this list)
* Patch 4 adds a self-test for the change
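For illustration, a map iterator program could consume the new kfunc roughly
as in the sketch below (the kfunc name comes from patch 1; the program name,
section and output format are assumptions, not the actual patch 3 code):

    #include <vmlinux.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    /* declare the kfunc added in patch 1 */
    extern s64 bpf_map_sum_elements_counter(struct bpf_map *map) __ksym;

    SEC("iter/bpf_map")
    int dump_bpf_map(struct bpf_iter__bpf_map *ctx)
    {
        struct seq_file *seq = ctx->meta->seq;
        struct bpf_map *map = ctx->map;

        if (!map)
            return 0;

        /* print "<id> <name> <current element count>" per map */
        BPF_SEQ_PRINTF(seq, "%4u %-16s %10lld\n", map->id, map->name,
                       bpf_map_sum_elements_counter(map));
        return 0;
    }

    char LICENSE[] SEC("license") = "GPL";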
The reason for adding this functionality in our case (Cilium) is to get
signals about how full some heavily-used maps are and what the actual dynamic
profile of map capacity is. For LRU maps there is no other way to obtain this
information. See also [1].
This is v2 of https://lore.kernel.org/bpf/20230531110511.64612-1-aspsk@isovalent.com/T/#t,
rewritten according to comments on v1. I've turned this series into an RFC
for two reasons:
1) This patch only works on systems where this_cpu_{inc,dec} is atomic for
s64. Systems which may write an s64 non-atomically would require some locking
mechanism to prevent readers from observing torn values via the
bpf_map_sum_elements_counter() kfunc (see patch 1 and the reader sketch after
this list)
2) In comparison with v1, we're adding extra instructions per map operation
(for preallocated as well as non-preallocated maps). The only functionality
we're interested in at the moment is the number of elements present in a map,
not per-cpu statistics. This could be achieved more cheaply with the v1
version, which only adds computations for preallocated maps.
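To make reason 1) concrete, the reader side amounts to a lockless sum over
the per-cpu values, roughly like this sketch (elem_count is the same assumed
field name as above; see patch 1 for the actual kfunc):

    /* Sketch of the reader: without atomic s64 stores on the writer
     * side, this lockless sum could observe torn values.
     */
    __bpf_kfunc s64 bpf_map_sum_elements_counter(struct bpf_map *map)
    {
        s64 sum = 0;
        int cpu;

        for_each_possible_cpu(cpu)
            sum += *per_cpu_ptr(map->elem_count, cpu);
        return sum;
    }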
So, the question is: would it be acceptable to do the changes in the
following way:
* extend the preallocated hash maps to populate percpu batch counters as in v1
* add a kfunc as in v2 to get the current sum
This works because
* nobody at the moment actually requires per-cpu statistics
* this implementation can be transparently turned into per-cpu statistics if
such a need arises in practice (the only change would be to re-implement the
kfunc and, possibly, add more kfuncs to expose per-cpu stats)
* the "v1 way" is the least intrusive: it only affects preallocated maps, as
other maps already provide the required functionality (a rough sketch of such
a batch counter follows this list)
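For reference, a "percpu batch counter" in the spirit of v1 might look like
the sketch below; the names and the batching scheme (fold the per-cpu delta
into a shared atomic once it exceeds a batch size, similar to the generic
percpu_counter) are assumptions for illustration, not the actual v1 code:

    #define ELEM_COUNT_BATCH 64

    struct elem_count {
        atomic64_t total;       /* globally visible, folded-in value */
        s64 __percpu *delta;    /* per-cpu pending delta */
    };

    static void elem_count_add(struct elem_count *c, s64 diff)
    {
        s64 d;

        preempt_disable();
        d = __this_cpu_add_return(*c->delta, diff);
        if (abs(d) >= ELEM_COUNT_BATCH) {
            /* fold the local delta into the shared total */
            atomic64_add(d, &c->total);
            __this_cpu_write(*c->delta, 0);
        }
        preempt_enable();
    }

The hot path stays a plain per-cpu add, and a reader's
atomic64_read(&c->total) is then accurate to within
nr_cpus * ELEM_COUNT_BATCH elements.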
[1] https://lpc.events/event/16/contributions/1368/
v1 -> v2:
- make the counters generic part of struct bpf_map
- don't use map_info and /proc/self/fdinfo in favor of a kfunc
Anton Protopopov (4):
bpf: add percpu stats for bpf_map elements insertions/deletions
bpf: populate the per-cpu insertions/deletions counters for hashmaps
bpf: make preloaded map iterators to display map elements count
selftests/bpf: test map percpu stats
include/linux/bpf.h | 30 +
kernel/bpf/hashtab.c | 102 ++--
kernel/bpf/map_iter.c | 48 +-
kernel/bpf/preload/iterators/iterators.bpf.c | 9 +-
.../iterators/iterators.lskel-little-endian.h | 513 +++++++++---------
.../bpf/map_tests/map_percpu_stats.c | 336 ++++++++++++
.../selftests/bpf/progs/map_percpu_stats.c | 24 +
7 files changed, 766 insertions(+), 296 deletions(-)
create mode 100644 tools/testing/selftests/bpf/map_tests/map_percpu_stats.c
create mode 100644 tools/testing/selftests/bpf/progs/map_percpu_stats.c
--
2.34.1
Thread overview: 15+ messages
2023-06-22 9:53 Anton Protopopov [this message]
2023-06-22 9:53 ` [RFC v2 PATCH bpf-next 1/4] bpf: add percpu stats for bpf_map elements insertions/deletions Anton Protopopov
2023-06-22 20:11 ` Alexei Starovoitov
2023-06-23 12:47 ` Anton Protopopov
2023-06-23 10:51 ` Daniel Borkmann
2023-06-23 12:35 ` Anton Protopopov
2023-06-22 9:53 ` [RFC v2 PATCH bpf-next 2/4] bpf: populate the per-cpu insertions/deletions counters for hashmaps Anton Protopopov
2023-06-22 20:18 ` Alexei Starovoitov
2023-06-22 9:53 ` [RFC v2 PATCH bpf-next 3/4] bpf: make preloaded map iterators to display map elements count Anton Protopopov
2023-06-22 9:58 ` [RFC v2 PATCH bpf-next 4/4] selftests/bpf: test map percpu stats Anton Protopopov
2023-06-22 20:20 ` Alexei Starovoitov
2023-06-26 14:37 ` Anton Protopopov
2023-06-23 9:53 ` [RFC v2 PATCH bpf-next 0/4] bpf: add percpu stats for bpf_map Daniel Borkmann
2023-06-24 0:17 ` Alexei Starovoitov
2023-06-26 8:50 ` Daniel Borkmann