* [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection
@ 2022-05-10  0:17 Yosry Ahmed
  2022-05-10  0:17 ` [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type Yosry Ahmed
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:17 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

This patch series allows for using bpf to collect hierarchical cgroup
stats efficiently by integrating with the rstat framework. The rstat
framework provides an efficient way to collect cgroup stats and
propagate them through the cgroup hierarchy.

The last patch is a selftest that demonstrates the entire workflow.
The workflow consists of:
- bpf programs that collect per-cpu per-cgroup stats (tracing progs).
- a bpf rstat flusher that contains the logic for aggregating stats
  across cpus and across the cgroup hierarchy.
- a bpf cgroup_iter program responsible for outputting the stats to
  userspace through a file in bpffs.

The first 3 patches include the new bpf rstat flusher program type and
the needed support in rstat code and libbpf. The rstat flusher program
is a callback that the rstat framework makes to bpf when a stat flush is
ongoing, similar to the css_rstat_flush() callback that rstat makes to
cgroup controllers. Each callback is parameterized by a (cgroup, cpu)
pair that has been updated. The program contains the logic for
aggregating the stats across cpus and across the cgroup hierarchy.
These programs can be attached to any cgroup subsystem, not only the
ones that implement the css_rstat_flush() callback in the kernel. This
gives bpf programs more flexibility, and more isolation from the kernel
implementation.
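
For illustration, a minimal rstat flusher could look like the sketch
below (the program and stat names are hypothetical; the context layout
is the struct bpf_rstat_ctx introduced in patch 1, and the section name
comes from the libbpf support in patch 3):

SEC("cgroup_subsys/rstat")
int vmscan_flush(struct bpf_rstat_ctx *ctx)
{
	/* Invoked once per (cgroup, cpu) pair that has pending updates */
	__u64 cg_id = ctx->cgroup_id;
	__u64 parent_cg_id = ctx->parent_cgroup_id;	/* 0 if root */
	__s32 cpu = ctx->cpu;

	/* Fold this cpu's per-cpu counters into the cgroup's total and
	 * propagate the delta to the parent (aggregation not shown).
	 */
	return 0;
}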

The following 2 patches add necessary helpers for the stats collection
workflow. Helpers that call into cgroup_rstat_updated() and
cgroup_rstat_flush() are added to allow bpf programs collecting stats to
tell the rstat framework that a cgroup has been updated, and to allow
bpf programs outputting stats to tell the rstat framework to flush the
stats before they are displayed to the user. An additional helper,
bpf_map_lookup_percpu_elem(), is introduced to allow rstat flusher
programs to access the percpu stats of the cpu being flushed.
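
As a rough sketch of how the two sides fit together (cgrp is a
struct cgroup pointer available to the program, e.g. from a tracepoint
argument):

/* collection side: after updating a per-cpu counter for cgrp, put
 * cgrp on the rstat updated tree for the current cpu
 */
bpf_cgroup_rstat_updated(cgrp);

/* output side: fold pending per-cpu updates in cgrp's subtree into
 * the global counters before dumping them
 */
bpf_cgroup_rstat_flush(cgrp);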

The following 3 patches add the cgroup_iter program type (v2). This was
originally introduced by Hao as a part of a different series [1].
Its use case is better showcased as part of this patch series. We also
make cgroup_get_from_id() cgroup v1 friendly to allow cgroup_iter programs
to display stats for cgroup v1 as well. This small change makes the
entire workflow cgroup v1 friendly without any other dedicated changes.
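
For reference, creating and pinning a cgroup_iter link parameterized by
a cgroup id looks roughly like this from userspace (a sketch; skeleton,
program, and pin path names are hypothetical, error handling elided):

union bpf_iter_link_info linfo = {};
DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
struct bpf_link *link;

linfo.cgroup.cgroup_id = cgroup_id;	/* id of the target cgroup */
opts.link_info = &linfo;
opts.link_info_len = sizeof(linfo);

link = bpf_program__attach_iter(skel->progs.dump_vmscan, &opts);
bpf_link__pin(link, "/sys/fs/bpf/vmscan_stats");

Reading the pinned file (e.g. with cat) then invokes the iter program
for the target cgroup.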

The final patch is a selftest demonstrating the entire workflow with a
set of bpf programs that collect per-cgroup latency of memcg reclaim.

[1] https://lore.kernel.org/lkml/20220225234339.2386398-9-haoluo@google.com/


Hao Luo (2):
  cgroup: Add cgroup_put() in !CONFIG_CGROUPS case
  bpf: Introduce cgroup iter

Yosry Ahmed (7):
  bpf: introduce CGROUP_SUBSYS_RSTAT program type
  cgroup: bpf: flush bpf stats on rstat flush
  libbpf: Add support for rstat progs and links
  bpf: add bpf rstat helpers
  bpf: add bpf_map_lookup_percpu_elem() helper
  cgroup: add v1 support to cgroup_get_from_id()
  bpf: add a selftest for cgroup hierarchical stats collection

 include/linux/bpf-cgroup-subsys.h             |  35 ++
 include/linux/bpf.h                           |   4 +
 include/linux/bpf_types.h                     |   2 +
 include/linux/cgroup-defs.h                   |   4 +
 include/linux/cgroup.h                        |   5 +
 include/uapi/linux/bpf.h                      |  45 +++
 kernel/bpf/Makefile                           |   3 +-
 kernel/bpf/arraymap.c                         |  11 +-
 kernel/bpf/cgroup_iter.c                      | 148 ++++++++
 kernel/bpf/cgroup_subsys.c                    | 212 +++++++++++
 kernel/bpf/hashtab.c                          |  25 +-
 kernel/bpf/helpers.c                          |  56 +++
 kernel/bpf/syscall.c                          |   6 +
 kernel/bpf/verifier.c                         |   6 +
 kernel/cgroup/cgroup.c                        |  16 +-
 kernel/cgroup/rstat.c                         |  11 +
 scripts/bpf_doc.py                            |   2 +
 tools/include/uapi/linux/bpf.h                |  45 +++
 tools/lib/bpf/bpf.c                           |   3 +
 tools/lib/bpf/bpf.h                           |   3 +
 tools/lib/bpf/libbpf.c                        |  35 ++
 tools/lib/bpf/libbpf.h                        |   3 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../test_cgroup_hierarchical_stats.c          | 335 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   7 +
 .../selftests/bpf/progs/cgroup_vmscan.c       | 211 +++++++++++
 26 files changed, 1212 insertions(+), 22 deletions(-)
 create mode 100644 include/linux/bpf-cgroup-subsys.h
 create mode 100644 kernel/bpf/cgroup_iter.c
 create mode 100644 kernel/bpf/cgroup_subsys.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
 create mode 100644 tools/testing/selftests/bpf/progs/cgroup_vmscan.c

-- 
2.36.0.512.ge40c2bad7a-goog



* [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
@ 2022-05-10  0:17 ` Yosry Ahmed
  2022-05-10 18:07   ` Yosry Ahmed
  2022-05-10 18:44   ` Tejun Heo
  2022-05-10  0:18 ` [RFC PATCH bpf-next 2/9] cgroup: bpf: flush bpf stats on rstat flush Yosry Ahmed
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:17 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

This patch introduces a new bpf program type CGROUP_SUBSYS_RSTAT,
with new corresponding link and attach types.

The main purpose of these programs is to allow BPF programs to collect
and maintain hierarchical cgroup stats easily and efficiently by making
use of the rstat framework in the kernel.

Those programs attach to a cgroup subsystem. They typically contain logic
to aggregate per-cpu and per-cgroup stats collected by other BPF programs.

Currently, only rstat flusher programs can be attached to cgroup
subsystems, but this can be extended later if a use-case arises.

See the selftest in the final patch for a practical example.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/bpf-cgroup-subsys.h |  30 ++++++
 include/linux/bpf_types.h         |   2 +
 include/linux/cgroup-defs.h       |   4 +
 include/uapi/linux/bpf.h          |  12 +++
 kernel/bpf/Makefile               |   1 +
 kernel/bpf/cgroup_subsys.c        | 166 ++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c              |   6 ++
 kernel/cgroup/cgroup.c            |   1 +
 tools/include/uapi/linux/bpf.h    |  12 +++
 9 files changed, 234 insertions(+)
 create mode 100644 include/linux/bpf-cgroup-subsys.h
 create mode 100644 kernel/bpf/cgroup_subsys.c

diff --git a/include/linux/bpf-cgroup-subsys.h b/include/linux/bpf-cgroup-subsys.h
new file mode 100644
index 000000000000..4dcde06b5599
--- /dev/null
+++ b/include/linux/bpf-cgroup-subsys.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2022 Google LLC.
+ */
+#ifndef _BPF_CGROUP_SUBSYS_H_
+#define _BPF_CGROUP_SUBSYS_H_
+
+#include <linux/bpf.h>
+
+struct cgroup_subsys_bpf {
+	/* Head of the list of BPF rstat flushers attached to this subsystem */
+	struct list_head rstat_flushers;
+	spinlock_t flushers_lock;
+};
+
+struct bpf_subsys_rstat_flusher {
+	struct bpf_prog *prog;
+	/* List of BPF rstat flushers, anchored at subsys->bpf */
+	struct list_head list;
+};
+
+struct bpf_cgroup_subsys_link {
+	struct bpf_link link;
+	struct cgroup_subsys *ss;
+};
+
+int cgroup_subsys_bpf_link_attach(const union bpf_attr *attr,
+				  struct bpf_prog *prog);
+
+#endif  // _BPF_CGROUP_SUBSYS_H_
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 3e24ad0c4b3c..854ee958b0e4 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -56,6 +56,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl,
 	      struct bpf_sysctl, struct bpf_sysctl_kern)
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCKOPT, cg_sockopt,
 	      struct bpf_sockopt, struct bpf_sockopt_kern)
+BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT, cgroup_subsys_rstat,
+	      struct bpf_rstat_ctx, struct bpf_rstat_ctx)
 #endif
 #ifdef CONFIG_BPF_LIRC_MODE2
 BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 1bfcfb1af352..3bd6eed1fa13 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -20,6 +20,7 @@
 #include <linux/u64_stats_sync.h>
 #include <linux/workqueue.h>
 #include <linux/bpf-cgroup-defs.h>
+#include <linux/bpf-cgroup-subsys.h>
 #include <linux/psi_types.h>
 
 #ifdef CONFIG_CGROUPS
@@ -706,6 +707,9 @@ struct cgroup_subsys {
 	 * specifies the mask of subsystems that this one depends on.
 	 */
 	unsigned int depends_on;
+
+	/* used to store bpf programs.*/
+	struct cgroup_subsys_bpf bpf;
 };
 
 extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d14b10b85e51..0f4855fa85db 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -952,6 +952,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+	BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT,
 };
 
 enum bpf_attach_type {
@@ -998,6 +999,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
 	BPF_TRACE_KPROBE_MULTI,
+	BPF_CGROUP_SUBSYS_RSTAT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1013,6 +1015,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_XDP = 6,
 	BPF_LINK_TYPE_PERF_EVENT = 7,
 	BPF_LINK_TYPE_KPROBE_MULTI = 8,
+	BPF_LINK_TYPE_CGROUP_SUBSYS = 9,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -1482,6 +1485,9 @@ union bpf_attr {
 				 */
 				__u64		bpf_cookie;
 			} perf_event;
+			struct {
+				__u64		name;
+			} cgroup_subsys;
 			struct {
 				__u32		flags;
 				__u32		cnt;
@@ -6324,6 +6330,12 @@ struct bpf_cgroup_dev_ctx {
 	__u32 minor;
 };
 
+struct bpf_rstat_ctx {
+	__u64 cgroup_id;
+	__u64 parent_cgroup_id; /* 0 if root */
+	__s32 cpu;
+};
+
 struct bpf_raw_tracepoint_args {
 	__u64 args[0];
 };
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index c1a9be6a4b9f..6caf4a61e543 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -25,6 +25,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
 endif
 obj-$(CONFIG_CGROUP_BPF) += cgroup.o
+obj-$(CONFIG_CGROUP_BPF) += cgroup_subsys.o
 ifeq ($(CONFIG_INET),y)
 obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
 endif
diff --git a/kernel/bpf/cgroup_subsys.c b/kernel/bpf/cgroup_subsys.c
new file mode 100644
index 000000000000..9673ce6aa84a
--- /dev/null
+++ b/kernel/bpf/cgroup_subsys.c
@@ -0,0 +1,166 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Functions to manage eBPF programs attached to cgroup subsystems
+ *
+ * Copyright 2022 Google LLC.
+ */
+
+#include <linux/bpf-cgroup-subsys.h>
+#include <linux/filter.h>
+
+#include "../cgroup/cgroup-internal.h"
+
+
+static int cgroup_subsys_bpf_attach(struct cgroup_subsys *ss, struct bpf_prog *prog)
+{
+	struct bpf_subsys_rstat_flusher *rstat_flusher;
+
+	rstat_flusher = kmalloc(sizeof(*rstat_flusher), GFP_KERNEL);
+	if (!rstat_flusher)
+		return -ENOMEM;
+	rstat_flusher->prog = prog;
+
+	spin_lock(&ss->bpf.flushers_lock);
+	list_add(&rstat_flusher->list, &ss->bpf.rstat_flushers);
+	spin_unlock(&ss->bpf.flushers_lock);
+
+	return 0;
+}
+
+static void cgroup_subsys_bpf_detach(struct cgroup_subsys *ss, struct bpf_prog *prog)
+{
+	struct bpf_subsys_rstat_flusher *rstat_flusher = NULL;
+
+	spin_lock(&ss->bpf.flushers_lock);
+	list_for_each_entry(rstat_flusher, &ss->bpf.rstat_flushers, list)
+		if (rstat_flusher->prog == prog)
+			break;
+
+	if (rstat_flusher) {
+		list_del(&rstat_flusher->list);
+		bpf_prog_put(rstat_flusher->prog);
+		kfree(rstat_flusher);
+	}
+	spin_unlock(&ss->bpf.flushers_lock);
+}
+
+static void bpf_cgroup_subsys_link_release(struct bpf_link *link)
+{
+	struct bpf_cgroup_subsys_link *ss_link = container_of(link,
+						       struct bpf_cgroup_subsys_link,
+						       link);
+	if (ss_link->ss) {
+		cgroup_subsys_bpf_detach(ss_link->ss, ss_link->link.prog);
+		ss_link->ss = NULL;
+	}
+}
+
+static int bpf_cgroup_subsys_link_detach(struct bpf_link *link)
+{
+	bpf_cgroup_subsys_link_release(link);
+	return 0;
+}
+
+static void bpf_cgroup_subsys_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_cgroup_subsys_link *ss_link = container_of(link,
+						       struct bpf_cgroup_subsys_link,
+						       link);
+	kfree(ss_link);
+}
+
+static const struct bpf_link_ops bpf_cgroup_subsys_link_lops = {
+	.detach = bpf_cgroup_subsys_link_detach,
+	.release = bpf_cgroup_subsys_link_release,
+	.dealloc = bpf_cgroup_subsys_link_dealloc,
+};
+
+int cgroup_subsys_bpf_link_attach(const union bpf_attr *attr,
+				  struct bpf_prog *prog)
+{
+	struct bpf_link_primer link_primer;
+	struct bpf_cgroup_subsys_link *link;
+	struct cgroup_subsys *ss, *attach_ss = NULL;
+	const char __user *ss_name_user;
+	char ss_name[MAX_CGROUP_TYPE_NAMELEN] = {}; /* keep NUL-terminated */
+	int ssid, err;
+
+	if (attr->link_create.target_fd || attr->link_create.flags)
+		return -EINVAL;
+
+	ss_name_user = u64_to_user_ptr(attr->link_create.cgroup_subsys.name);
+	if (strncpy_from_user(ss_name, ss_name_user, sizeof(ss_name) - 1) < 0)
+		return -EFAULT;
+
+	for_each_subsys(ss, ssid)
+		if (!strcmp(ss_name, ss->name) ||
+		    !strcmp(ss_name, ss->legacy_name))
+			attach_ss = ss;
+
+	if (!attach_ss)
+		return -EINVAL;
+
+	link = kzalloc(sizeof(*link), GFP_USER);
+	if (!link)
+		return -ENOMEM;
+
+	bpf_link_init(&link->link, BPF_LINK_TYPE_CGROUP_SUBSYS,
+		      &bpf_cgroup_subsys_link_lops,
+		      prog);
+	link->ss = attach_ss;
+
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
+		kfree(link);
+		return err;
+	}
+
+	err = cgroup_subsys_bpf_attach(attach_ss, prog);
+	if (err) {
+		bpf_link_cleanup(&link_primer);
+		return err;
+	}
+
+	return bpf_link_settle(&link_primer);
+}
+
+static const struct bpf_func_proto *
+cgroup_subsys_rstat_func_proto(enum bpf_func_id func_id,
+			       const struct bpf_prog *prog)
+{
+	return bpf_base_func_proto(func_id);
+}
+
+static bool cgroup_subsys_rstat_is_valid_access(int off, int size,
+					   enum bpf_access_type type,
+					   const struct bpf_prog *prog,
+					   struct bpf_insn_access_aux *info)
+{
+	if (type == BPF_WRITE)
+		return false;
+
+	if (off < 0 || off + size > sizeof(struct bpf_rstat_ctx))
+		return false;
+	/* The verifier guarantees that size > 0 */
+	if (off % size != 0)
+		return false;
+
+	switch (off) {
+	case offsetof(struct bpf_rstat_ctx, cgroup_id):
+		return size == sizeof(__u64);
+	case offsetof(struct bpf_rstat_ctx, parent_cgroup_id):
+		return size == sizeof(__u64);
+	case offsetof(struct bpf_rstat_ctx, cpu):
+		return size == sizeof(__s32);
+	default:
+		return false;
+	}
+}
+
+const struct bpf_prog_ops cgroup_subsys_rstat_prog_ops = {
+};
+
+const struct bpf_verifier_ops cgroup_subsys_rstat_verifier_ops = {
+	.get_func_proto         = cgroup_subsys_rstat_func_proto,
+	.is_valid_access        = cgroup_subsys_rstat_is_valid_access,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index cdaa1152436a..48149c54d969 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3,6 +3,7 @@
  */
 #include <linux/bpf.h>
 #include <linux/bpf-cgroup.h>
+#include <linux/bpf-cgroup-subsys.h>
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
 #include <linux/bpf_verifier.h>
@@ -3194,6 +3195,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_SK_LOOKUP;
 	case BPF_XDP:
 		return BPF_PROG_TYPE_XDP;
+	case BPF_CGROUP_SUBSYS_RSTAT:
+		return BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
@@ -4341,6 +4344,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
 		else
 			ret = bpf_kprobe_multi_link_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT:
+		ret = cgroup_subsys_bpf_link_attach(attr, prog);
+		break;
 	default:
 		ret = -EINVAL;
 	}
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index adb820e98f24..7b1448013009 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5745,6 +5745,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 
 	idr_init(&ss->css_idr);
 	INIT_LIST_HEAD(&ss->cfts);
+	INIT_LIST_HEAD(&ss->bpf.rstat_flushers);
 
 	/* Create the root cgroup state for this subsystem */
 	ss->root = &cgrp_dfl_root;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d14b10b85e51..0f4855fa85db 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -952,6 +952,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_LSM,
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
+	BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT,
 };
 
 enum bpf_attach_type {
@@ -998,6 +999,7 @@ enum bpf_attach_type {
 	BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
 	BPF_PERF_EVENT,
 	BPF_TRACE_KPROBE_MULTI,
+	BPF_CGROUP_SUBSYS_RSTAT,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1013,6 +1015,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_XDP = 6,
 	BPF_LINK_TYPE_PERF_EVENT = 7,
 	BPF_LINK_TYPE_KPROBE_MULTI = 8,
+	BPF_LINK_TYPE_CGROUP_SUBSYS = 9,
 
 	MAX_BPF_LINK_TYPE,
 };
@@ -1482,6 +1485,9 @@ union bpf_attr {
 				 */
 				__u64		bpf_cookie;
 			} perf_event;
+			struct {
+				__u64		name;
+			} cgroup_subsys;
 			struct {
 				__u32		flags;
 				__u32		cnt;
@@ -6324,6 +6330,12 @@ struct bpf_cgroup_dev_ctx {
 	__u32 minor;
 };
 
+struct bpf_rstat_ctx {
+	__u64 cgroup_id;
+	__u64 parent_cgroup_id; /* 0 if root */
+	__s32 cpu;
+};
+
 struct bpf_raw_tracepoint_args {
 	__u64 args[0];
 };
-- 
2.36.0.512.ge40c2bad7a-goog



* [RFC PATCH bpf-next 2/9] cgroup: bpf: flush bpf stats on rstat flush
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
  2022-05-10  0:17 ` [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type Yosry Ahmed
@ 2022-05-10  0:18 ` Yosry Ahmed
  2022-05-10 18:45   ` Tejun Heo
  2022-05-10  0:18 ` [RFC PATCH bpf-next 3/9] libbpf: Add support for rstat progs and links Yosry Ahmed
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

When a cgroup is popped from the rstat updated tree, the subsystems' rstat
flushers are run through the css_rstat_flush() callback. Also run bpf
flushers for all subsystems that have at least one bpf rstat flusher
attached, and are enabled for this cgroup.

A list of subsystems that have attached rstat flushers is maintained to
avoid looping through all subsystems for all cpus for every cgroup that
is being popped from the updated tree. Since we introduce a lock here to
protect this list, also use it to protect the rstat_flushers lists inside
each subsystem (since they both need to be locked together anyway), and
get rid of the locks in struct cgroup_subsys_bpf.

rstat flushers are run for any enabled subsystem that has flushers
attached, even if it does not subscribe to css flushing through
css_rstat_flush(). This gives flexibility for bpf programs to collect
stats for any subsystem, regardless of the implementation changes in the
kernel.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/bpf-cgroup-subsys.h |  7 +++-
 include/linux/cgroup.h            |  2 ++
 kernel/bpf/cgroup_subsys.c        | 60 +++++++++++++++++++++++++++----
 kernel/cgroup/cgroup.c            |  5 +--
 kernel/cgroup/rstat.c             | 11 ++++++
 5 files changed, 75 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf-cgroup-subsys.h b/include/linux/bpf-cgroup-subsys.h
index 4dcde06b5599..e977b9ef5754 100644
--- a/include/linux/bpf-cgroup-subsys.h
+++ b/include/linux/bpf-cgroup-subsys.h
@@ -10,7 +10,11 @@
 struct cgroup_subsys_bpf {
 	/* Head of the list of BPF rstat flushers attached to this subsystem */
 	struct list_head rstat_flushers;
-	spinlock_t flushers_lock;
+	/*
+	 * A list that runs through subsystems that have at least one rstat
+	 * flusher.
+	 */
+	struct list_head rstat_subsys_node;
 };
 
 struct bpf_subsys_rstat_flusher {
@@ -26,5 +30,6 @@ struct bpf_cgroup_subsys_link {
 
 int cgroup_subsys_bpf_link_attach(const union bpf_attr *attr,
 				  struct bpf_prog *prog);
+void bpf_run_rstat_flushers(struct cgroup *cgrp, int cpu);
 
 #endif  // _BPF_CGROUP_SUBSYS_H_
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 0d1ada8968d7..5408c74d5c44 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -97,6 +97,8 @@ extern struct css_set init_css_set;
 
 bool css_has_online_children(struct cgroup_subsys_state *css);
 struct cgroup_subsys_state *css_from_id(int id, struct cgroup_subsys *ss);
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss);
 struct cgroup_subsys_state *cgroup_e_css(struct cgroup *cgroup,
 					 struct cgroup_subsys *ss);
 struct cgroup_subsys_state *cgroup_get_e_css(struct cgroup *cgroup,
diff --git a/kernel/bpf/cgroup_subsys.c b/kernel/bpf/cgroup_subsys.c
index 9673ce6aa84a..1d10319a34e9 100644
--- a/kernel/bpf/cgroup_subsys.c
+++ b/kernel/bpf/cgroup_subsys.c
@@ -6,10 +6,46 @@
  */
 
 #include <linux/bpf-cgroup-subsys.h>
+#include <linux/cgroup.h>
 #include <linux/filter.h>
 
 #include "../cgroup/cgroup-internal.h"
 
+/* List of subsystems that have rstat flushers attached */
+static LIST_HEAD(bpf_rstat_subsys_list);
+/* Protects the above list, and the lists of rstat flushers in each subsys */
+static DEFINE_SPINLOCK(bpf_rstat_subsys_lock);
+
+
+void bpf_run_rstat_flushers(struct cgroup *cgrp, int cpu)
+{
+	struct cgroup_subsys_bpf *ss_bpf;
+	struct cgroup *parent = cgroup_parent(cgrp);
+	struct bpf_rstat_ctx ctx = {
+		.cgroup_id = cgroup_id(cgrp),
+		.parent_cgroup_id = parent ? cgroup_id(parent) : 0,
+		.cpu = cpu,
+	};
+
+	rcu_read_lock();
+	migrate_disable();
+	spin_lock(&bpf_rstat_subsys_lock);
+	list_for_each_entry(ss_bpf, &bpf_rstat_subsys_list, rstat_subsys_node) {
+		struct bpf_subsys_rstat_flusher *rstat_flusher;
+		struct cgroup_subsys *ss = container_of(ss_bpf,
+							struct cgroup_subsys,
+							bpf);
+
+		/* Subsystem ss is not enabled for cgrp */
+		if (!cgroup_css(cgrp, ss))
+			continue;
+		list_for_each_entry(rstat_flusher, &ss_bpf->rstat_flushers, list)
+			(void) bpf_prog_run(rstat_flusher->prog, &ctx);
+	}
+	spin_unlock(&bpf_rstat_subsys_lock);
+	migrate_enable();
+	rcu_read_unlock();
+}
 
 static int cgroup_subsys_bpf_attach(struct cgroup_subsys *ss, struct bpf_prog *prog)
 {
@@ -20,28 +56,38 @@ static int cgroup_subsys_bpf_attach(struct cgroup_subsys *ss, struct bpf_prog *p
 		return -ENOMEM;
 	rstat_flusher->prog = prog;
 
-	spin_lock(&ss->bpf.flushers_lock);
+	spin_lock(&bpf_rstat_subsys_lock);
+	/* Add ss to bpf_rstat_subsys_list when we attach the first flusher */
+	if (list_empty(&ss->bpf.rstat_flushers))
+		list_add(&ss->bpf.rstat_subsys_node, &bpf_rstat_subsys_list);
 	list_add(&rstat_flusher->list, &ss->bpf.rstat_flushers);
-	spin_unlock(&ss->bpf.flushers_lock);
+	spin_unlock(&bpf_rstat_subsys_lock);
 
 	return 0;
 }
 
 static void cgroup_subsys_bpf_detach(struct cgroup_subsys *ss, struct bpf_prog *prog)
 {
-	struct bpf_subsys_rstat_flusher *rstat_flusher = NULL;
+	struct bpf_subsys_rstat_flusher *iter, *rstat_flusher = NULL;
 
-	spin_lock(&ss->bpf.flushers_lock);
-	list_for_each_entry(rstat_flusher, &ss->bpf.rstat_flushers, list)
-		if (rstat_flusher->prog == prog)
+	spin_lock(&bpf_rstat_subsys_lock);
+	list_for_each_entry(iter, &ss->bpf.rstat_flushers, list)
+		if (iter->prog == prog) {
+			rstat_flusher = iter;
 			break;
+		}
 
 	if (rstat_flusher) {
 		list_del(&rstat_flusher->list);
 		bpf_prog_put(rstat_flusher->prog);
 		kfree(rstat_flusher);
 	}
-	spin_unlock(&ss->bpf.flushers_lock);
+	/*
+	 * Remove ss from bpf_rstat_subsys_list when we detach the last flusher
+	 */
+	if (list_empty(&ss->bpf.rstat_flushers))
+		list_del(&ss->bpf.rstat_subsys_node);
+	spin_unlock(&bpf_rstat_subsys_lock);
 }
 
 static void bpf_cgroup_subsys_link_release(struct bpf_link *link)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 7b1448013009..af703cfcb9d2 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -478,8 +478,8 @@ static u16 cgroup_ss_mask(struct cgroup *cgrp)
  * keep accessing it outside the said locks.  This function may return
  * %NULL if @cgrp doesn't have @subsys_id enabled.
  */
-static struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
-					      struct cgroup_subsys *ss)
+struct cgroup_subsys_state *cgroup_css(struct cgroup *cgrp,
+				       struct cgroup_subsys *ss)
 {
 	if (CGROUP_HAS_SUBSYS_CONFIG && ss)
 		return rcu_dereference_check(cgrp->subsys[ss->id],
@@ -5746,6 +5746,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
 	idr_init(&ss->css_idr);
 	INIT_LIST_HEAD(&ss->cfts);
 	INIT_LIST_HEAD(&ss->bpf.rstat_flushers);
+	INIT_LIST_HEAD(&ss->bpf.rstat_subsys_node);
 
 	/* Create the root cgroup state for this subsystem */
 	ss->root = &cgrp_dfl_root;
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 24b5c2ab5598..af553a0ccc0d 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -2,6 +2,7 @@
 #include "cgroup-internal.h"
 
 #include <linux/sched/cputime.h>
+#include <linux/bpf-cgroup-subsys.h>
 
 static DEFINE_SPINLOCK(cgroup_rstat_lock);
 static DEFINE_PER_CPU(raw_spinlock_t, cgroup_rstat_cpu_lock);
@@ -173,6 +174,16 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp, bool may_sleep)
 			list_for_each_entry_rcu(css, &pos->rstat_css_list,
 						rstat_css_node)
 				css->ss->css_rstat_flush(css, cpu);
+			/*
+			 * We run bpf flushers in a separate loop in
+			 * bpf_run_rstat_flushers(), as the above
+			 * loop only goes through subsystems that have rstat
+			 * flushing registered in the kernel.
+			 *
+			 * This gives flexibility for BPF programs to utilize
+			 * rstat to collect stats for any subsystem.
+			 */
+			bpf_run_rstat_flushers(pos, cpu);
 			rcu_read_unlock();
 		}
 		raw_spin_unlock_irqrestore(cpu_lock, flags);
-- 
2.36.0.512.ge40c2bad7a-goog



* [RFC PATCH bpf-next 3/9] libbpf: Add support for rstat progs and links
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
  2022-05-10  0:17 ` [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type Yosry Ahmed
  2022-05-10  0:18 ` [RFC PATCH bpf-next 2/9] cgroup: bpf: flush bpf stats on rstat flush Yosry Ahmed
@ 2022-05-10  0:18 ` Yosry Ahmed
  2022-05-10  0:18 ` [RFC PATCH bpf-next 4/9] bpf: add bpf rstat helpers Yosry Ahmed
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

Add support for attaching "cgroup_subsys/rstat" programs to a subsystem
by calling bpf_program__attach_subsys(). Currently, only
CGROUP_SUBSYS_RSTAT programs can be attached to subsystems.
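
For example, attaching an rstat flusher to the memory controller would
look roughly like this (a sketch; the skeleton and program names are
hypothetical, error handling abbreviated):

struct bpf_link *link;

link = bpf_program__attach_subsys(skel->progs.vmscan_flush, "memory");
if (libbpf_get_error(link))
	return -1;	/* attach failed */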

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 tools/lib/bpf/bpf.c      |  3 +++
 tools/lib/bpf/bpf.h      |  3 +++
 tools/lib/bpf/libbpf.c   | 35 +++++++++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.h   |  3 +++
 tools/lib/bpf/libbpf.map |  1 +
 5 files changed, 45 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index cf27251adb92..abfff17cfa07 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -863,6 +863,9 @@ int bpf_link_create(int prog_fd, int target_fd,
 		if (!OPTS_ZEROED(opts, kprobe_multi))
 			return libbpf_err(-EINVAL);
 		break;
+	case BPF_CGROUP_SUBSYS_RSTAT:
+		attr.link_create.cgroup_subsys.name = ptr_to_u64(OPTS_GET(opts, cgroup_subsys.name, 0));
+		break;
 	default:
 		if (!OPTS_ZEROED(opts, flags))
 			return libbpf_err(-EINVAL);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index f4b4afb6d4ba..384767a9ffd3 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -413,6 +413,9 @@ struct bpf_link_create_opts {
 		struct {
 			__u64 bpf_cookie;
 		} perf_event;
+		struct {
+			const char *name;
+		} cgroup_subsys;
 		struct {
 			__u32 flags;
 			__u32 cnt;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 809fe209cdcc..56380953df55 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8715,6 +8715,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("cgroup/setsockopt",	CGROUP_SOCKOPT, BPF_CGROUP_SETSOCKOPT, SEC_ATTACHABLE | SEC_SLOPPY_PFX),
 	SEC_DEF("struct_ops+",		STRUCT_OPS, 0, SEC_NONE),
 	SEC_DEF("sk_lookup",		SK_LOOKUP, BPF_SK_LOOKUP, SEC_ATTACHABLE | SEC_SLOPPY_PFX),
+	SEC_DEF("cgroup_subsys/rstat",	CGROUP_SUBSYS_RSTAT, 0, SEC_NONE),
 };
 
 static size_t custom_sec_def_cnt;
@@ -10957,6 +10958,40 @@ static int attach_iter(const struct bpf_program *prog, long cookie, struct bpf_l
 	return libbpf_get_error(*link);
 }
 
+struct bpf_link *bpf_program__attach_subsys(const struct bpf_program *prog,
+					     const char *subsys_name)
+{
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, lopts,
+			    .cgroup_subsys.name = subsys_name);
+	struct bpf_link *link = NULL;
+	char errmsg[STRERR_BUFSIZE];
+	int err, prog_fd, link_fd;
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0) {
+		pr_warn("prog '%s': can't attach before loaded\n", prog->name);
+		return libbpf_err_ptr(-EINVAL);
+	}
+
+	link = calloc(1, sizeof(*link));
+	if (!link)
+		return libbpf_err_ptr(-ENOMEM);
+	link->detach = &bpf_link__detach_fd;
+
+	link_fd = bpf_link_create(prog_fd, 0, BPF_CGROUP_SUBSYS_RSTAT, &lopts);
+	if (link_fd < 0) {
+		err = -errno;
+		pr_warn("prog '%s': failed to attach: %s\n",
+			prog->name, libbpf_strerror_r(err, errmsg,
+						      sizeof(errmsg)));
+		free(link);
+		return libbpf_err_ptr(err);
+	}
+
+	link->fd = link_fd;
+	return link;
+}
+
 struct bpf_link *bpf_program__attach(const struct bpf_program *prog)
 {
 	struct bpf_link *link = NULL;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 05dde85e19a6..eddbffcd39f7 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -537,6 +537,9 @@ bpf_program__attach_xdp(const struct bpf_program *prog, int ifindex);
 LIBBPF_API struct bpf_link *
 bpf_program__attach_freplace(const struct bpf_program *prog,
 			     int target_fd, const char *attach_func_name);
+LIBBPF_API struct bpf_link *
+bpf_program__attach_subsys(const struct bpf_program *prog,
+			   const char *subsys_name);
 
 struct bpf_map;
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index dd35ee58bfaa..5583a2dbfb7c 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -447,4 +447,5 @@ LIBBPF_0.8.0 {
 		libbpf_register_prog_handler;
 		libbpf_unregister_prog_handler;
 		bpf_program__attach_kprobe_multi_opts;
+		bpf_program__attach_subsys;
 } LIBBPF_0.7.0;
-- 
2.36.0.512.ge40c2bad7a-goog



* [RFC PATCH bpf-next 4/9] bpf: add bpf rstat helpers
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
                   ` (2 preceding siblings ...)
  2022-05-10  0:18 ` [RFC PATCH bpf-next 3/9] libbpf: Add support for rstat progs and links Yosry Ahmed
@ 2022-05-10  0:18 ` Yosry Ahmed
  2022-05-10  0:18 ` [RFC PATCH bpf-next 5/9] bpf: add bpf_map_lookup_percpu_elem() helper Yosry Ahmed
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

Add bpf_cgroup_rstat_updated() and bpf_cgroup_rstat_flush() helpers to
enable bpf programs that collect and output cgroup stats to communicate
with the rstat framework: to add a cgroup to the rstat updated tree, or
to trigger an rstat flush before reading stats.

ARG_ANYTHING is used here for the struct cgroup * parameter. Would it be
better to add a task_cgroup(subsys_id) helper that returns a cgroup
pointer so that we can use a BTF argument instead?
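
As a sketch of the intended usage on the collection side (map layout
and function name are hypothetical; assumes a BTF-enabled tracing
program built against vmlinux.h):

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
	__uint(max_entries, 1024);
	__type(key, __u64);	/* cgroup id */
	__type(value, __u64);	/* per-cpu counter */
} vmscan SEC(".maps");

static int account(struct cgroup *cgrp, __u64 delta)
{
	__u64 cg_id = cgrp->kn->id;
	__u64 *cnt = bpf_map_lookup_elem(&vmscan, &cg_id);

	if (cnt)
		*cnt += delta;
	/* make sure the flusher eventually sees this update */
	bpf_cgroup_rstat_updated(cgrp);
	return 0;
}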

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/uapi/linux/bpf.h       | 18 ++++++++++++++++++
 kernel/bpf/helpers.c           | 30 ++++++++++++++++++++++++++++++
 scripts/bpf_doc.py             |  2 ++
 tools/include/uapi/linux/bpf.h | 18 ++++++++++++++++++
 4 files changed, 68 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0f4855fa85db..fce5535579d6 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5149,6 +5149,22 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * void bpf_cgroup_rstat_updated(struct cgroup *cgrp)
+ *	Description
+ *		Notify the rstat framework that bpf stats were updated for
+ *		*cgrp* on the current cpu. Directly calls cgroup_rstat_updated
+ *		with the given *cgrp* and the current cpu.
+ *	Return
+ *		0
+ *
+ * void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
+ *	Description
+ *		Collect all per-cpu stats in *cgrp*'s subtree into global
+ *		counters and propagate them upwards. Directly calls
+ *		cgroup_rstat_flush_irqsafe with the given *cgrp*.
+ *	Return
+ *		0
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5345,6 +5361,8 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(cgroup_rstat_updated),	\
+	FN(cgroup_rstat_flush),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 315053ef6a75..d124eed97ad7 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1374,6 +1374,32 @@ void bpf_timer_cancel_and_free(void *val)
 	kfree(t);
 }
 
+BPF_CALL_1(bpf_cgroup_rstat_updated, struct cgroup *, cgrp)
+{
+	cgroup_rstat_updated(cgrp, smp_processor_id());
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_cgroup_rstat_updated_proto = {
+	.func		= bpf_cgroup_rstat_updated,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_ANYTHING,
+};
+
+BPF_CALL_1(bpf_cgroup_rstat_flush, struct cgroup *, cgrp)
+{
+	cgroup_rstat_flush_irqsafe(cgrp);
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_cgroup_rstat_flush_proto = {
+	.func		= bpf_cgroup_rstat_flush,
+	.gpl_only	= false,
+	.ret_type	= RET_VOID,
+	.arg1_type	= ARG_ANYTHING,
+};
+
 const struct bpf_func_proto bpf_get_current_task_proto __weak;
 const struct bpf_func_proto bpf_get_current_task_btf_proto __weak;
 const struct bpf_func_proto bpf_probe_read_user_proto __weak;
@@ -1426,6 +1452,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_loop_proto;
 	case BPF_FUNC_strncmp:
 		return &bpf_strncmp_proto;
+	case BPF_FUNC_cgroup_rstat_updated:
+		return &bpf_cgroup_rstat_updated_proto;
+	case BPF_FUNC_cgroup_rstat_flush:
+		return &bpf_cgroup_rstat_flush_proto;
 	default:
 		break;
 	}
diff --git a/scripts/bpf_doc.py b/scripts/bpf_doc.py
index 096625242475..9e2b08557a6f 100755
--- a/scripts/bpf_doc.py
+++ b/scripts/bpf_doc.py
@@ -633,6 +633,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct cgroup',
     ]
     known_types = {
             '...',
@@ -682,6 +683,7 @@ class PrinterHelpers(Printer):
             'struct socket',
             'struct file',
             'struct bpf_timer',
+            'struct cgroup',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0f4855fa85db..fce5535579d6 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5149,6 +5149,22 @@ union bpf_attr {
  *		The **hash_algo** is returned on success,
  *		**-EOPNOTSUP** if the hash calculation failed or **-EINVAL** if
  *		invalid arguments are passed.
+ *
+ * void bpf_cgroup_rstat_updated(struct cgroup *cgrp)
+ *	Description
+ *		Notify the rstat framework that bpf stats were updated for
+ *		*cgrp* on the current cpu. Directly calls cgroup_rstat_updated
+ *		with the given *cgrp* and the current cpu.
+ *	Return
+ *		0
+ *
+ * void bpf_cgroup_rstat_flush(struct cgroup *cgrp)
+ *	Description
+ *		Collect all per-cpu stats in *cgrp*'s subtree into global
+ *		counters and propagate them upwards. Directly calls
+ *		cgroup_rstat_flush_irqsafe with the given *cgrp*.
+ *	Return
+ *		0
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5345,6 +5361,8 @@ union bpf_attr {
 	FN(copy_from_user_task),	\
 	FN(skb_set_tstamp),		\
 	FN(ima_file_hash),		\
+	FN(cgroup_rstat_updated),	\
+	FN(cgroup_rstat_flush),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.36.0.512.ge40c2bad7a-goog



* [RFC PATCH bpf-next 5/9] bpf: add bpf_map_lookup_percpu_elem() helper
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
                   ` (3 preceding siblings ...)
  2022-05-10  0:18 ` [RFC PATCH bpf-next 4/9] bpf: add bpf rstat helpers Yosry Ahmed
@ 2022-05-10  0:18 ` Yosry Ahmed
  2022-05-10  0:18 ` [RFC PATCH bpf-next 6/9] cgroup: add v1 support to cgroup_get_from_id() Yosry Ahmed
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

Add a helper for bpf programs to look up a percpu map element for a cpu
other than the current one. This is useful for rstat flusher programs,
as they are called to aggregate stats from different cpus regardless of the
current cpu.
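
An rstat flusher would typically use the new helper as in this sketch
(the percpu map layout is hypothetical):

SEC("cgroup_subsys/rstat")
int vmscan_flush(struct bpf_rstat_ctx *ctx)
{
	__u64 cg_id = ctx->cgroup_id;
	__u64 *pcpu;

	/* ctx->cpu is the cpu being flushed, which is usually not the
	 * cpu this program is running on
	 */
	pcpu = bpf_map_lookup_percpu_elem(&vmscan, &cg_id, ctx->cpu);
	if (pcpu) {
		/* fold *pcpu into the cgroup's total (not shown) */
	}
	return 0;
}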

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/bpf.h            |  2 ++
 include/uapi/linux/bpf.h       |  9 +++++++++
 kernel/bpf/arraymap.c          | 11 ++++++++---
 kernel/bpf/hashtab.c           | 25 +++++++++++--------------
 kernel/bpf/helpers.c           | 26 ++++++++++++++++++++++++++
 kernel/bpf/verifier.c          |  6 ++++++
 tools/include/uapi/linux/bpf.h |  9 +++++++++
 7 files changed, 71 insertions(+), 17 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdb5298735ce..f6fa35ffe311 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1665,6 +1665,8 @@ int map_set_for_each_callback_args(struct bpf_verifier_env *env,
 				   struct bpf_func_state *caller,
 				   struct bpf_func_state *callee);
 
+void *bpf_percpu_hash_lookup(struct bpf_map *map, void *key, int cpu);
+void *bpf_percpu_array_lookup(struct bpf_map *map, void *key, int cpu);
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fce5535579d6..015ed402c642 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1553,6 +1553,14 @@ union bpf_attr {
  * 		Map value associated to *key*, or **NULL** if no entry was
  * 		found.
  *
+ * void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, int cpu)
+ *	Description
+ *		Perform a lookup in percpu *map* for an entry associated to
+ *		*key* for the given *cpu*.
+ *	Return
+ *		Map value associated to *key* per *cpu*, or **NULL** if no entry
+ *		was found.
+ *
  * long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
  * 	Description
  * 		Add or update the value of the entry associated to *key* in
@@ -5169,6 +5177,7 @@ union bpf_attr {
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
 	FN(map_lookup_elem),		\
+	FN(map_lookup_percpu_elem),	\
 	FN(map_update_elem),		\
 	FN(map_delete_elem),		\
 	FN(probe_read),			\
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7f145aefbff8..945dae4c20eb 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -230,8 +230,7 @@ static int array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 	return insn - insn_buf;
 }
 
-/* Called from eBPF program */
-static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
+void *bpf_percpu_array_lookup(struct bpf_map *map, void *key, int cpu)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	u32 index = *(u32 *)key;
@@ -239,7 +238,13 @@ static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
 	if (unlikely(index >= array->map.max_entries))
 		return NULL;
 
-	return this_cpu_ptr(array->pptrs[index & array->index_mask]);
+	return per_cpu_ptr(array->pptrs[index & array->index_mask], cpu);
+}
+
+/* Called from eBPF program */
+static void *percpu_array_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	return bpf_percpu_array_lookup(map, key, smp_processor_id());
 }
 
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 65877967f414..c6d4699d65e8 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -2150,27 +2150,24 @@ const struct bpf_map_ops htab_lru_map_ops = {
 	.iter_seq_info = &iter_seq_info,
 };
 
-/* Called from eBPF program */
-static void *htab_percpu_map_lookup_elem(struct bpf_map *map, void *key)
+void *bpf_percpu_hash_lookup(struct bpf_map *map, void *key, int cpu)
 {
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 	struct htab_elem *l = __htab_map_lookup_elem(map, key);
 
-	if (l)
-		return this_cpu_ptr(htab_elem_get_ptr(l, map->key_size));
+	if (l) {
+		if (htab_is_lru(htab))
+			bpf_lru_node_set_ref(&l->lru_node);
+		return per_cpu_ptr(htab_elem_get_ptr(l, map->key_size), cpu);
+	}
 	else
 		return NULL;
 }
 
-static void *htab_lru_percpu_map_lookup_elem(struct bpf_map *map, void *key)
+/* Called from eBPF program */
+static void *htab_percpu_map_lookup_elem(struct bpf_map *map, void *key)
 {
-	struct htab_elem *l = __htab_map_lookup_elem(map, key);
-
-	if (l) {
-		bpf_lru_node_set_ref(&l->lru_node);
-		return this_cpu_ptr(htab_elem_get_ptr(l, map->key_size));
-	}
-
-	return NULL;
+	return bpf_percpu_hash_lookup(map, key, smp_processor_id());
 }
 
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value)
@@ -2279,7 +2276,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
 	.map_alloc = htab_map_alloc,
 	.map_free = htab_map_free,
 	.map_get_next_key = htab_map_get_next_key,
-	.map_lookup_elem = htab_lru_percpu_map_lookup_elem,
+	.map_lookup_elem = htab_percpu_map_lookup_elem,
 	.map_lookup_and_delete_elem = htab_lru_percpu_map_lookup_and_delete_elem,
 	.map_update_elem = htab_lru_percpu_map_update_elem,
 	.map_delete_elem = htab_lru_map_delete_elem,
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index d124eed97ad7..abed4e1737f6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -45,6 +45,30 @@ const struct bpf_func_proto bpf_map_lookup_elem_proto = {
 	.arg2_type	= ARG_PTR_TO_MAP_KEY,
 };
 
+BPF_CALL_3(bpf_map_lookup_percpu_elem, struct bpf_map *, map, void *, key,
+	   int, cpu)
+{
+	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_bh_held());
+	switch (map->map_type) {
+	case BPF_MAP_TYPE_PERCPU_ARRAY:
+		return (unsigned long) bpf_percpu_array_lookup(map, key, cpu);
+	case BPF_MAP_TYPE_PERCPU_HASH:
+	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
+		return (unsigned long) bpf_percpu_hash_lookup(map, key, cpu);
+	default:
+		return (unsigned long) NULL;
+	}
+}
+
+const struct bpf_func_proto bpf_map_lookup_percpu_elem_proto = {
+	.func		= bpf_map_lookup_percpu_elem,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_MAP_KEY,
+	.arg3_type	= ARG_ANYTHING,
+};
+
 BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
 	   void *, value, u64, flags)
 {
@@ -1414,6 +1438,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
 		return &bpf_map_lookup_elem_proto;
+	case BPF_FUNC_map_lookup_percpu_elem:
+		return &bpf_map_lookup_percpu_elem_proto;
 	case BPF_FUNC_map_update_elem:
 		return &bpf_map_update_elem_proto;
 	case BPF_FUNC_map_delete_elem:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d175b70067b3..2d7f7c9a970d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5879,6 +5879,12 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (map->map_type != BPF_MAP_TYPE_TASK_STORAGE)
 			goto error;
 		break;
+	case BPF_FUNC_map_lookup_percpu_elem:
+		if (map->map_type != BPF_MAP_TYPE_PERCPU_HASH &&
+		    map->map_type != BPF_MAP_TYPE_LRU_PERCPU_HASH &&
+		    map->map_type != BPF_MAP_TYPE_PERCPU_ARRAY)
+			goto error;
+		break;
 	default:
 		break;
 	}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index fce5535579d6..015ed402c642 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1553,6 +1553,14 @@ union bpf_attr {
  * 		Map value associated to *key*, or **NULL** if no entry was
  * 		found.
  *
+ * void *bpf_map_lookup_percpu_elem(struct bpf_map *map, const void *key, int cpu)
+ *	Description
+ *		Perform a lookup in percpu *map* for an entry associated to
+ *		*key* for the given *cpu*.
+ *	Return
+ *		Map value associated to *key* per *cpu*, or **NULL** if no entry
+ *		was found.
+ *
  * long bpf_map_update_elem(struct bpf_map *map, const void *key, const void *value, u64 flags)
  * 	Description
  * 		Add or update the value of the entry associated to *key* in
@@ -5169,6 +5177,7 @@ union bpf_attr {
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
 	FN(map_lookup_elem),		\
+	FN(map_lookup_percpu_elem),	\
 	FN(map_update_elem),		\
 	FN(map_delete_elem),		\
 	FN(probe_read),			\
-- 
2.36.0.512.ge40c2bad7a-goog



* [RFC PATCH bpf-next 6/9] cgroup: add v1 support to cgroup_get_from_id()
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
                   ` (4 preceding siblings ...)
  2022-05-10  0:18 ` [RFC PATCH bpf-next 5/9] bpf: add bpf_map_lookup_percpu_elem() helper Yosry Ahmed
@ 2022-05-10  0:18 ` Yosry Ahmed
  2022-05-10 18:33   ` Tejun Heo
  2022-05-10  0:18 ` [RFC PATCH bpf-next 7/9] cgroup: Add cgroup_put() in !CONFIG_CGROUPS case Yosry Ahmed
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

The current implementation of cgroup_get_from_id() only searches the
default hierarchy for the given id. Make it compatible with cgroup v1 by
looking through all the roots instead.

cgrp_dfl_root should be the first element in the list so there shouldn't
be a performance impact for cgroup v2 users (in the case of a valid id).

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 kernel/cgroup/cgroup.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index af703cfcb9d2..12700cd21973 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5970,10 +5970,16 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
  */
 struct cgroup *cgroup_get_from_id(u64 id)
 {
-	struct kernfs_node *kn;
+	struct kernfs_node *kn = NULL;
 	struct cgroup *cgrp = NULL;
+	struct cgroup_root *root;
+
+	for_each_root(root) {
+		kn = kernfs_find_and_get_node_by_id(root->kf_root, id);
+		if (kn)
+			break;
+	}
 
-	kn = kernfs_find_and_get_node_by_id(cgrp_dfl_root.kf_root, id);
 	if (!kn)
 		goto out;
 
-- 
2.36.0.512.ge40c2bad7a-goog



* [RFC PATCH bpf-next 7/9] cgroup: Add cgroup_put() in !CONFIG_CGROUPS case
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
                   ` (5 preceding siblings ...)
  2022-05-10  0:18 ` [RFC PATCH bpf-next 6/9] cgroup: add v1 support to cgroup_get_from_id() Yosry Ahmed
@ 2022-05-10  0:18 ` Yosry Ahmed
  2022-05-10 18:25   ` Hao Luo
  2022-05-10  0:18 ` [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter Yosry Ahmed
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

From: Hao Luo <haoluo@google.com>

There is already a cgroup_get_from_id() in the !CONFIG_CGROUPS case,
let's have a matching cgroup_put() in !CONFIG_CGROUPS too.

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/cgroup.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 5408c74d5c44..4f1d8febb9fd 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -759,6 +759,9 @@ static inline struct cgroup *cgroup_get_from_id(u64 id)
 {
 	return NULL;
 }
+
+static inline void cgroup_put(struct cgroup *cgrp)
+{}
 #endif /* !CONFIG_CGROUPS */
 
 #ifdef CONFIG_CGROUPS
-- 
2.36.0.512.ge40c2bad7a-goog



* [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
                   ` (6 preceding siblings ...)
  2022-05-10  0:18 ` [RFC PATCH bpf-next 7/9] cgroup: Add cgroup_put() in !CONFIG_CGROUPS case Yosry Ahmed
@ 2022-05-10  0:18 ` Yosry Ahmed
  2022-05-10 18:25   ` Hao Luo
  2022-05-10 18:54   ` Tejun Heo
  2022-05-10  0:18 ` [RFC PATCH bpf-next 9/9] selftest/bpf: add a selftest for cgroup hierarchical stats Yosry Ahmed
  2022-05-13  7:16 ` [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
  9 siblings, 2 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

From: Hao Luo <haoluo@google.com>

Introduce a new type of iter prog: cgroup. Unlike other bpf_iter types,
this iter doesn't iterate over a set of kernel objects. Instead, it is
parameterized by a cgroup id and visits only that cgroup, so a target
cgroup id needs to be specified when attaching this iter. The target
cgroup's state can then be read out via a link of this iter.
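
On the BPF side, a program for this target is effectively a one-shot
dump; a minimal sketch (program name hypothetical, stat lookup elided):

SEC("iter/cgroup")
int dump_vmscan(struct bpf_iter__cgroup *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct cgroup *cgrp = ctx->cgroup;

	if (!cgrp)
		return 0;

	BPF_SEQ_PRINTF(seq, "cgroup_id: %llu\n", cgrp->kn->id);
	return 0;
}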

Signed-off-by: Hao Luo <haoluo@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 include/linux/bpf.h            |   2 +
 include/uapi/linux/bpf.h       |   6 ++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/cgroup_iter.c       | 148 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |   6 ++
 5 files changed, 163 insertions(+), 1 deletion(-)
 create mode 100644 kernel/bpf/cgroup_iter.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f6fa35ffe311..f472f43521d2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -43,6 +43,7 @@ struct kobject;
 struct mem_cgroup;
 struct module;
 struct bpf_func_state;
+struct cgroup;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1601,6 +1602,7 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
 
 struct bpf_iter_aux_info {
 	struct bpf_map *map;
+	struct cgroup *cgroup;
 };
 
 typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 015ed402c642..096c521e34de 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -91,6 +91,9 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	struct {
+		__u64	cgroup_id;
+	} cgroup;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
@@ -5963,6 +5966,9 @@ struct bpf_link_info {
 				struct {
 					__u32 map_id;
 				} map;
+				struct {
+					__u64 cgroup_id;
+				} cgroup;
 			};
 		} iter;
 		struct  {
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 6caf4a61e543..07a715b54190 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -8,7 +8,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o cgroup_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
new file mode 100644
index 000000000000..86bdfe135d24
--- /dev/null
+++ b/kernel/bpf/cgroup_iter.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022 Google */
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <linux/cgroup.h>
+#include <linux/kernel.h>
+#include <linux/seq_file.h>
+
+struct bpf_iter__cgroup {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct cgroup *, cgroup);
+};
+
+static void *cgroup_iter_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	/* Only one session is supported. */
+	if (*pos > 0)
+		return NULL;
+
+	if (*pos == 0)
+		++*pos;
+
+	return *(struct cgroup **)seq->private;
+}
+
+static void *cgroup_iter_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	++*pos;
+	return NULL;
+}
+
+static int cgroup_iter_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpf_iter__cgroup ctx;
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	ctx.meta = &meta;
+	ctx.cgroup = v;
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, false);
+	if (prog)
+		ret = bpf_iter_run_prog(prog, &ctx);
+
+	return ret;
+}
+
+static void cgroup_iter_seq_stop(struct seq_file *seq, void *v)
+{
+}
+
+static const struct seq_operations cgroup_iter_seq_ops = {
+	.start  = cgroup_iter_seq_start,
+	.next   = cgroup_iter_seq_next,
+	.stop   = cgroup_iter_seq_stop,
+	.show   = cgroup_iter_seq_show,
+};
+
+BTF_ID_LIST_SINGLE(bpf_cgroup_btf_id, struct, cgroup)
+
+static int cgroup_iter_seq_init(void *priv_data, struct bpf_iter_aux_info *aux)
+{
+	*(struct cgroup **)priv_data = aux->cgroup;
+	return 0;
+}
+
+static const struct bpf_iter_seq_info cgroup_iter_seq_info = {
+	.seq_ops                = &cgroup_iter_seq_ops,
+	.init_seq_private       = cgroup_iter_seq_init,
+	.seq_priv_size          = sizeof(struct cgroup *),
+};
+
+static int bpf_iter_attach_cgroup(struct bpf_prog *prog,
+				  union bpf_iter_link_info *linfo,
+				  struct bpf_iter_aux_info *aux)
+{
+	struct cgroup *cgroup;
+
+	cgroup = cgroup_get_from_id(linfo->cgroup.cgroup_id);
+	if (!cgroup)
+		return -EBUSY;
+
+	aux->cgroup = cgroup;
+	return 0;
+}
+
+static void bpf_iter_detach_cgroup(struct bpf_iter_aux_info *aux)
+{
+	if (aux->cgroup)
+		cgroup_put(aux->cgroup);
+}
+
+static void bpf_iter_cgroup_show_fdinfo(const struct bpf_iter_aux_info *aux,
+					struct seq_file *seq)
+{
+	char *buf;
+
+	seq_printf(seq, "cgroup_id:\t%llu\n", cgroup_id(aux->cgroup));
+
+	buf = kmalloc(PATH_MAX, GFP_KERNEL);
+	if (!buf) {
+		seq_puts(seq, "cgroup_path:\n");
+		return;
+	}
+
+	/* If cgroup_path_ns() fails, buf will be an empty string, cgroup_path
+	 * will print nothing.
+	 *
+	 * The path is printed in the calling process's cgroup namespace.
+	 */
+	cgroup_path_ns(aux->cgroup, buf, PATH_MAX,
+		       current->nsproxy->cgroup_ns);
+	seq_printf(seq, "cgroup_path:\t%s\n", buf);
+	kfree(buf);
+}
+
+static int bpf_iter_cgroup_fill_link_info(const struct bpf_iter_aux_info *aux,
+					  struct bpf_link_info *info)
+{
+	info->iter.cgroup.cgroup_id = cgroup_id(aux->cgroup);
+	return 0;
+}
+
+DEFINE_BPF_ITER_FUNC(cgroup, struct bpf_iter_meta *meta,
+		     struct cgroup *cgroup)
+
+static struct bpf_iter_reg bpf_cgroup_reg_info = {
+	.target			= "cgroup",
+	.attach_target		= bpf_iter_attach_cgroup,
+	.detach_target		= bpf_iter_detach_cgroup,
+	.show_fdinfo		= bpf_iter_cgroup_show_fdinfo,
+	.fill_link_info		= bpf_iter_cgroup_fill_link_info,
+	.ctx_arg_info_size	= 1,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__cgroup, cgroup),
+		  PTR_TO_BTF_ID },
+	},
+	.seq_info		= &cgroup_iter_seq_info,
+};
+
+static int __init bpf_cgroup_iter_init(void)
+{
+	bpf_cgroup_reg_info.ctx_arg_info[0].btf_id = bpf_cgroup_btf_id[0];
+	return bpf_iter_reg_target(&bpf_cgroup_reg_info);
+}
+
+late_initcall(bpf_cgroup_iter_init);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 015ed402c642..096c521e34de 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -91,6 +91,9 @@ union bpf_iter_link_info {
 	struct {
 		__u32	map_fd;
 	} map;
+	struct {
+		__u64	cgroup_id;
+	} cgroup;
 };
 
 /* BPF syscall commands, see bpf(2) man-page for more details. */
@@ -5963,6 +5966,9 @@ struct bpf_link_info {
 				struct {
 					__u32 map_id;
 				} map;
+				struct {
+					__u64 cgroup_id;
+				} cgroup;
 			};
 		} iter;
 		struct  {
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC PATCH bpf-next 9/9] selftest/bpf: add a selftest for cgroup hierarchical stats
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
                   ` (7 preceding siblings ...)
  2022-05-10  0:18 ` [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter Yosry Ahmed
@ 2022-05-10  0:18 ` Yosry Ahmed
  2022-05-13  7:16 ` [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
  9 siblings, 0 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	linux-kernel, netdev, bpf, cgroups, Yosry Ahmed

Add a selftest that tests the whole workflow for collecting,
aggregating, and displaying cgroup hierarchical stats.

The test loads tracing bpf programs at the beginning and ending of
direct reclaim to measure the vmscan latency. Per-cgroup readings are
stored in percpu maps for efficiency. When a cgroup reading is updated,
bpf_cgroup_rstat_updated() is called to add the cgroup (and the current
cpu) to the rstat updated tree. When a cgroup is added to the rstat
updated tree, all its parents are added as well. rstat makes sure
cgroups are popped in a bottom up fashion.

When an rstat flush is invoked, an rstat flusher program is called for
per-cgroup per-cpu pairs on the updated tree. The program aggregates
percpu readings to a total reading, and also propagates them to the
parent. After rstat flushing is over, the program will have been invoked
for all (cgroup, cpu) pairs that have updates as well as their parents,
so the whole hierarchy will have updated (flushed) stats.

Finally, a cgroup_iter program is pinned to a file for each cgroup.
Reading this file invokes the cgroup_iter program to flush the stats and
display them to the user.
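
As a rough usage sketch (not part of the patch), reading one of the
pinned files from userspace could look like the program below; the
output format matches the BPF_SEQ_PRINTF() call in dump_vmscan(), and
the path assumes the bpffs layout this test sets up:

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[128];
		ssize_t n;
		int fd;

		fd = open("/sys/fs/bpf/vmscan/child1", O_RDONLY);
		if (fd < 0)
			return 1;

		/* read() invokes dump_vmscan(), which flushes rstat and
		 * emits e.g. "cg_id: 1234, total_vmscan_delay: 567890"
		 */
		n = read(fd, buf, sizeof(buf) - 1);
		close(fd);
		if (n < 0)
			return 1;
		buf[n] = '\0';
		fputs(buf, stdout);
		return 0;
	}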

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 .../test_cgroup_hierarchical_stats.c          | 335 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/bpf_iter.h  |   7 +
 .../selftests/bpf/progs/cgroup_vmscan.c       | 211 +++++++++++
 3 files changed, 553 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
 create mode 100644 tools/testing/selftests/bpf/progs/cgroup_vmscan.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c b/tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
new file mode 100644
index 000000000000..7c4d199967d7
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
@@ -0,0 +1,335 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * A test for cgroup hierarchical stats collection via bpf rstat flushers.
+ *
+ * Copyright 2022 Google LLC.
+ */
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/mount.h>
+#include <unistd.h>
+#include <errno.h>
+
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+#include <test_progs.h>
+
+#include "cgroup_vmscan.skel.h"
+
+#define PAGE_SIZE 4096
+#define MB(x) (x << 20)
+
+#define BPFFS_ROOT "/sys/fs/bpf/"
+#define BPFFS_VMSCAN BPFFS_ROOT"vmscan/"
+#define CGROUP_ROOT "/sys/fs/cgroup/"
+
+#define RET_IF_ERR(exp, t, f...) ({		\
+	int ___res = (exp);			\
+	if (CHECK(___res, t, f))		\
+		return ___res;			\
+})
+
+struct cgroup_path {
+	const char *name, *path;
+};
+
+#define CGROUP_PATH(p, n) {.name = #n, .path = CGROUP_ROOT#p"/"#n}
+#define CGROUP_ROOT_PATH {.name = "root", .path = CGROUP_ROOT}
+
+static struct cgroup_path cgroup_hierarchy[] = {
+	CGROUP_ROOT_PATH,
+	CGROUP_PATH(, test),
+	CGROUP_PATH(test, child1),
+	CGROUP_PATH(test, child2),
+	CGROUP_PATH(test/child1, child1_1),
+	CGROUP_PATH(test/child1, child1_2),
+	CGROUP_PATH(test/child2, child2_1),
+	CGROUP_PATH(test/child2, child2_2),
+};
+
+#define N_CGROUPS (sizeof(cgroup_hierarchy)/sizeof(struct cgroup_path))
+
+static const int non_leaf_cgroups = 4;
+static __u64 cgroup_ids[N_CGROUPS];
+
+static int duration;
+
+static __u64 cgroup_id_from_path(const char *cgroup_path)
+{
+	struct stat file_stat;
+
+	if (stat(cgroup_path, &file_stat))
+		return -1;
+	return file_stat.st_ino;
+}
+
+int write_to_file(const char *path, const char *buf, size_t size)
+{
+	int fd, len, err = 0;
+
+	fd = open(path, O_WRONLY);
+	if (fd < 0)
+		return -errno;
+	len = write(fd, buf, size);
+	if (len < 0)
+		err = -errno;
+	else if (len < size)
+		err = -1;
+	close(fd);
+	return err;
+}
+
+int read_from_file(const char *path, char *buf, size_t size)
+{
+	int fd, len;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return -errno;
+	len = read(fd, buf, size - 1);
+	if (len >= 0)
+		buf[len] = 0;
+	close(fd);
+	return len < 0 ? -errno : 0;
+}
+
+int setup_hierarchy(void)
+{
+	int i;
+	char path[128];
+
+	/* Mount bpffs, and create a directory to pin cgroup_iters in */
+	RET_IF_ERR(mount("bpf", BPFFS_ROOT, "bpf", 0, NULL), "mount",
+		   "failed to mount bpffs at %s (%s)\n", BPFFS_ROOT,
+		   strerror(errno));
+	RET_IF_ERR(mkdir(BPFFS_VMSCAN, 0755), "mkdir",
+		   "failed to mkdir %s (%s)\n", BPFFS_VMSCAN, strerror(errno));
+
+	/* Mount cgroup v2 */
+	RET_IF_ERR(mount("none", CGROUP_ROOT, "cgroup2", 0, NULL),
+		   "mount", "failed to mount cgroup2 at %s (%s)\n",
+		   CGROUP_ROOT, strerror(errno));
+
+	/* Enable memory controller in cgroup v2 root */
+	snprintf(path, 128, "%scgroup.subtree_control", CGROUP_ROOT);
+	RET_IF_ERR(write_to_file(path, "+memory", strlen("+memory")), "+memory",
+		   "+memory failed in root (%s)\n",
+		   strerror(errno));
+	/* Root cgroup id is 1 in v2 */
+	cgroup_ids[0] = 1;
+
+	for (i = 1; i < N_CGROUPS; i++) {
+		/* Create cgroup */
+		RET_IF_ERR(mkdir(cgroup_hierarchy[i].path, 0666),
+			   "mkdir", "failed to mkdir %s (%s)\n",
+			   cgroup_hierarchy[i].path, strerror(errno));
+
+		cgroup_ids[i] = cgroup_id_from_path(cgroup_hierarchy[i].path);
+
+		/* Enable the memory controller in non-leaf cgroups */
+		if (i < non_leaf_cgroups) {
+			snprintf(path, 128, "%s/cgroup.subtree_control",
+				 cgroup_hierarchy[i].path);
+			RET_IF_ERR(write_to_file(path, "+memory", strlen("+memory")),
+				   "+memory", "+memory failed in %s (%s)\n",
+				   cgroup_hierarchy[i].name, strerror(errno));
+		}
+	}
+	return 0;
+}
+
+void destroy_hierarchy(void)
+{
+	int i;
+	char path[128];
+
+	for (i = N_CGROUPS - 1; i >= 0; i--) {
+		/* Delete files in bpffs that cgroup_iters are pinned in */
+		snprintf(path, 128, "%s%s", BPFFS_VMSCAN,
+			 cgroup_hierarchy[i].name);
+		CHECK(remove(path), "remove", "failed to remove %s (%s)\n",
+		      path, strerror(errno));
+
+		if (i == 0)
+			break;
+
+		/* Delete cgroup */
+		CHECK(rmdir(cgroup_hierarchy[i].path), "rmdir",
+		      "failed to rmdir %s (%s)\n", cgroup_hierarchy[i].path,
+		      strerror(errno));
+	}
+	/* Remove created directory in bpffs */
+	CHECK(rmdir(BPFFS_VMSCAN), "rmdir", "failed to rmdir %s (%s)\n",
+	      BPFFS_VMSCAN, strerror(errno));
+	/* Unmount bpffs */
+	CHECK(umount(BPFFS_ROOT), "umount", "failed to unmount bpffs (%s)\n",
+	      strerror(errno));
+	/* Unmount cgroup v2 */
+	CHECK(umount(CGROUP_ROOT), "umount", "failed to unmount cgroup2 (%s)\n",
+	      strerror(errno));
+}
+
+void alloc_anon(size_t size)
+{
+	char *buf, *ptr;
+
+	buf = malloc(size);
+	if (!buf)
+		return;
+	for (ptr = buf; ptr < buf + size; ptr += PAGE_SIZE)
+		*ptr = 0;
+	free(buf);
+}
+
+int induce_vmscan(void)
+{
+	char cmd[128], path[128];
+	int i, pid, len;
+
+	/*
+	 * Set memory.high for test parent cgroup to 1 MB to throttle
+	 * allocations and invoke reclaim in children.
+	 */
+	snprintf(path, 128, "%s/memory.high", cgroup_hierarchy[1].path);
+	len = snprintf(cmd, 128, "%d", MB(1));
+	RET_IF_ERR(write_to_file(path, cmd, len), "memory.high",
+		   "failed to write to %s (%s)\n", path, strerror(errno));
+
+	/*
+	 * In every leaf cgroup, run a memory hog for a few seconds to induce
+	 * reclaim then kill it.
+	 */
+	for (i = non_leaf_cgroups; i < N_CGROUPS; i++) {
+		pid = fork();
+		if (pid == 0) {
+			pid = getpid();
+
+			/* Add child to leaf cgroup */
+			snprintf(path, 128, "%s/cgroup.procs",
+				 cgroup_hierarchy[i].path);
+			len = snprintf(cmd, 128, "%d", pid);
+			RET_IF_ERR(write_to_file(path, cmd, len),
+				   "cgroup.procs",
+				   "failed to add pid %d to cgroup %s (%s)\n",
+				   pid, cgroup_hierarchy[i].name,
+				   strerror(errno));
+
+			/* Allocate 2 MB */
+			alloc_anon(MB(2));
+			exit(0);
+		} else {
+			/* Wait for child to cause reclaim then kill it */
+			sleep(3);
+			kill(pid, SIGKILL);
+			waitpid(pid, NULL, 0);
+		}
+	}
+	return 0;
+}
+
+int check_vmscan_stats(void)
+{
+	char buf[128], path[128];
+	int i;
+	__u64 vmscan_readings[N_CGROUPS];
+
+	for (i = 0; i < N_CGROUPS; i++) {
+		__u64 id;
+
+		/* For every cgroup, read the file generated by cgroup_iter */
+		snprintf(path, 128, "%s%s", BPFFS_VMSCAN,
+			cgroup_hierarchy[i].name);
+		RET_IF_ERR(read_from_file(path, buf, 128), "read",
+			   "failed to read from %s (%s)\n",
+			   path, strerror(errno));
+		/* Check the output file formatting */
+		ASSERT_EQ(sscanf(buf, "cg_id: %llu, total_vmscan_delay: %llu\n",
+				 &id, &vmscan_readings[i]), 2, "output format");
+
+		/* Check that the cgroup_id is displayed correctly */
+		ASSERT_EQ(cgroup_ids[i], id, "cgroup_id");
+		/* Check that the vmscan reading is non-zero */
+		ASSERT_NEQ(vmscan_readings[i], 0, "vmscan_reading");
+	}
+
+	/* Check that child1 == child1_1 + child1_2 */
+	ASSERT_EQ(vmscan_readings[2], vmscan_readings[4] + vmscan_readings[5],
+		  "child1_vmscan");
+	/* Check that child2 == child2_1 + child2_2 */
+	ASSERT_EQ(vmscan_readings[3], vmscan_readings[6] + vmscan_readings[7],
+		  "child2_vmscan");
+	/* Check that test == child1 + child2 */
+	ASSERT_EQ(vmscan_readings[1], vmscan_readings[2] + vmscan_readings[3],
+		  "test_vmscan");
+	/* Check that root >= test */
+	ASSERT_GE(vmscan_readings[0], vmscan_readings[1], "root_vmscan");
+
+	return 0;
+}
+
+int setup_progs(struct cgroup_vmscan **skel)
+{
+	int i;
+	struct bpf_link *link;
+	struct cgroup_vmscan *obj;
+
+	obj = cgroup_vmscan__open_and_load();
+	if (!ASSERT_OK_PTR(obj, "open_and_load"))
+		return libbpf_get_error(obj);
+
+	/* Attach rstat flusher to memory subsystem */
+	link = bpf_program__attach_subsys(obj->progs.vmscan_flush, "memory");
+	if (!ASSERT_OK_PTR(link, "attach_subsys"))
+		return libbpf_get_error(link);
+
+	/* Attach cgroup_iter program that will dump the stats to cgroups */
+	for (i = 0; i < N_CGROUPS; i++) {
+		DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+		union bpf_iter_link_info linfo = {};
+		char path[128];
+
+		/* Create an iter link, parameterized by cgroup id */
+		linfo.cgroup.cgroup_id = cgroup_ids[i];
+		opts.link_info = &linfo;
+		opts.link_info_len = sizeof(linfo);
+		link = bpf_program__attach_iter(obj->progs.dump_vmscan, &opts);
+		if (!ASSERT_OK_PTR(link, "attach_iter"))
+			return libbpf_get_error(link);
+
+		/* Pin the link to a bpffs file */
+		snprintf(path, 128, "%s%s", BPFFS_VMSCAN,
+			 cgroup_hierarchy[i].name);
+		bpf_link__pin(link, path);
+	}
+
+	/* Attach tracing programs that will calculate vmscan delays */
+	link = bpf_program__attach(obj->progs.vmscan_start);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return libbpf_get_error(link);
+
+	link = bpf_program__attach(obj->progs.vmscan_end);
+	if (!ASSERT_OK_PTR(link, "attach"))
+		return libbpf_get_error(link);
+
+	*skel = obj;
+	return 0;
+}
+
+void destroy_progs(struct cgroup_vmscan *skel)
+{
+	cgroup_vmscan__destroy(skel);
+}
+
+void test_cgroup_hierarchical_stats(void)
+{
+	struct cgroup_vmscan *skel = NULL;
+
+	if (setup_hierarchy())
+		goto cleanup;
+	if (setup_progs(&skel))
+		goto cleanup;
+	if (induce_vmscan())
+		goto cleanup;
+	check_vmscan_stats();
+cleanup:
+	destroy_progs(skel);
+	destroy_hierarchy();
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter.h b/tools/testing/selftests/bpf/progs/bpf_iter.h
index 8cfaeba1ddbf..b10ad01e878a 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter.h
+++ b/tools/testing/selftests/bpf/progs/bpf_iter.h
@@ -16,6 +16,7 @@
 #define bpf_iter__bpf_map_elem bpf_iter__bpf_map_elem___not_used
 #define bpf_iter__bpf_sk_storage_map bpf_iter__bpf_sk_storage_map___not_used
 #define bpf_iter__sockmap bpf_iter__sockmap___not_used
+#define bpf_iter__cgroup bpf_iter__cgroup___not_used
 #define btf_ptr btf_ptr___not_used
 #define BTF_F_COMPACT BTF_F_COMPACT___not_used
 #define BTF_F_NONAME BTF_F_NONAME___not_used
@@ -37,6 +38,7 @@
 #undef bpf_iter__bpf_map_elem
 #undef bpf_iter__bpf_sk_storage_map
 #undef bpf_iter__sockmap
+#undef bpf_iter__cgroup
 #undef btf_ptr
 #undef BTF_F_COMPACT
 #undef BTF_F_NONAME
@@ -132,6 +134,11 @@ struct bpf_iter__sockmap {
 	struct sock *sk;
 };
 
+struct bpf_iter__cgroup {
+	struct bpf_iter_meta *meta;
+	struct cgroup *cgroup;
+} __attribute__((preserve_access_index));
+
 struct btf_ptr {
 	void *ptr;
 	__u32 type_id;
diff --git a/tools/testing/selftests/bpf/progs/cgroup_vmscan.c b/tools/testing/selftests/bpf/progs/cgroup_vmscan.c
new file mode 100644
index 000000000000..41516f8263b3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/cgroup_vmscan.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * BPF programs for collecting hierarchical per-cgroup vmscan latency stats.
+ *
+ * Copyright 2022 Google LLC.
+ */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_core_read.h>
+
+char _license[] SEC("license") = "GPL";
+
+/*
+ * Start times are stored per-task, not per-cgroup, as multiple tasks in one
+ * cgroup can perform reclaim concurrently.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, __u64);
+} vmscan_start_time SEC(".maps");
+
+struct vmscan_percpu {
+	/* Previous percpu state, to figure out if we have new updates */
+	__u64 prev;
+	/* Current percpu state */
+	__u64 state;
+};
+
+struct vmscan {
+	/* State propagated through children, pending aggregation */
+	__u64 pending;
+	/* Total state, including all cpus and all children */
+	__u64 state;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
+	__uint(max_entries, 10);
+	__type(key, __u64);
+	__type(value, struct vmscan_percpu);
+} pcpu_cgroup_vmscan_elapsed SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 10);
+	__type(key, __u64);
+	__type(value, struct vmscan);
+} cgroup_vmscan_elapsed SEC(".maps");
+
+static inline struct cgroup *task_memcg(struct task_struct *task)
+{
+	return BPF_CORE_READ(task, cgroups, subsys[memory_cgrp_id], cgroup);
+}
+
+static inline uint64_t cgroup_id(struct cgroup *cgrp)
+{
+	return BPF_CORE_READ(cgrp, kn, id);
+}
+
+static inline int create_vmscan_percpu_elem(__u64 cg_id, __u64 state)
+{
+	struct vmscan_percpu pcpu_init = {.state = state, .prev = 0};
+
+	if (bpf_map_update_elem(&pcpu_cgroup_vmscan_elapsed, &cg_id,
+				&pcpu_init, BPF_NOEXIST)) {
+		bpf_printk("failed to create pcpu entry for cgroup %llu\n"
+			   , cg_id);
+		return 1;
+	}
+	return 0;
+}
+
+static inline int create_vmscan_elem(__u64 cg_id, __u64 state, __u64 pending)
+{
+	struct vmscan init = {.state = state, .pending = pending};
+
+	if (bpf_map_update_elem(&cgroup_vmscan_elapsed, &cg_id,
+				&init, BPF_NOEXIST)) {
+		bpf_printk("failed to create entry for cgroup %llu\n"
+			   , cg_id);
+		return 1;
+	}
+	return 0;
+}
+
+SEC("raw_tp/mm_vmscan_memcg_reclaim_begin")
+int vmscan_start(struct lruvec *lruvec, struct scan_control *sc)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	__u64 *start_time_ptr;
+
+	start_time_ptr = bpf_task_storage_get(&vmscan_start_time, task, 0,
+					      BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!start_time_ptr) {
+		bpf_printk("error retrieving storage\n");
+		return 0;
+	}
+
+	*start_time_ptr = bpf_ktime_get_ns();
+	return 0;
+}
+
+SEC("raw_tp/mm_vmscan_memcg_reclaim_end")
+int vmscan_end(struct lruvec *lruvec, struct scan_control *sc)
+{
+	struct vmscan_percpu *pcpu_stat;
+	struct task_struct *current = bpf_get_current_task_btf();
+	struct cgroup *cgrp = task_memcg(current);
+	__u64 *start_time_ptr;
+	__u64 current_elapsed, cg_id;
+	__u64 end_time = bpf_ktime_get_ns();
+
+	/* cgrp may not have memory controller enabled */
+	if (!cgrp)
+		return 0;
+
+	cg_id = cgroup_id(cgrp);
+	start_time_ptr = bpf_task_storage_get(&vmscan_start_time, current, 0,
+					      BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!start_time_ptr) {
+		bpf_printk("error retrieving storage local storage\n");
+		return 0;
+	}
+
+	current_elapsed = end_time - *start_time_ptr;
+	pcpu_stat = bpf_map_lookup_elem(&pcpu_cgroup_vmscan_elapsed,
+					&cg_id);
+	if (pcpu_stat)
+		__sync_fetch_and_add(&pcpu_stat->state, current_elapsed);
+	else
+		create_vmscan_percpu_elem(cg_id, current_elapsed);
+
+	bpf_cgroup_rstat_updated(cgrp);
+	return 0;
+}
+
+SEC("cgroup_subsys/rstat")
+int vmscan_flush(struct bpf_rstat_ctx *ctx)
+{
+	struct vmscan_percpu *pcpu_stat;
+	struct vmscan *total_stat, *parent_stat;
+	__u64 *pcpu_vmscan;
+	__u64 state;
+	__u64 delta = 0;
+	__u64 cg_id = ctx->cgroup_id;
+	__u64 parent_cg_id = ctx->parent_cgroup_id;
+	__s32 cpu = ctx->cpu;
+
+	/* Add CPU changes on this level since the last flush */
+	pcpu_stat = bpf_map_lookup_percpu_elem(&pcpu_cgroup_vmscan_elapsed,
+					       &cg_id, cpu);
+	if (pcpu_stat) {
+		state = pcpu_stat->state;
+		delta += state - pcpu_stat->prev;
+		pcpu_stat->prev = state;
+	}
+
+	total_stat = bpf_map_lookup_elem(&cgroup_vmscan_elapsed, &cg_id);
+	if (!total_stat) {
+		create_vmscan_elem(cg_id, delta, 0);
+		goto update_parent;
+	}
+
+	/* Collect pending stats from subtree */
+	if (total_stat->pending) {
+		delta += total_stat->pending;
+		total_stat->pending = 0;
+	}
+
+	/* Propagate changes to this cgroup's total */
+	total_stat->state += delta;
+
+update_parent:
+	/* Skip if there are no changes to propagate, or no parent */
+	if (!delta || !parent_cg_id)
+		return 0;
+
+	/* Propagate changes to cgroup's parent */
+	parent_stat = bpf_map_lookup_elem(&cgroup_vmscan_elapsed,
+					  &parent_cg_id);
+	if (parent_stat)
+		parent_stat->pending += delta;
+	else
+		create_vmscan_elem(parent_cg_id, 0, delta);
+
+	return 0;
+}
+
+SEC("iter/cgroup")
+int dump_vmscan(struct bpf_iter__cgroup *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct cgroup *cgroup = ctx->cgroup;
+	struct vmscan *total_stat;
+	__u64 cg_id = cgroup_id(cgroup);
+
+	/* Flush the stats to make sure we get the most updated numbers */
+	bpf_cgroup_rstat_flush(cgroup);
+
+	total_stat = bpf_map_lookup_elem(&cgroup_vmscan_elapsed, &cg_id);
+	if (!total_stat) {
+		bpf_printk("error finding stats for cgroup %llu\n", cg_id);
+		return 0;
+	}
+	BPF_SEQ_PRINTF(seq, "cg_id: %llu, total_vmscan_delay: %llu\n",
+		       cg_id, total_stat->state);
+	return 0;
+}
+
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10  0:17 ` [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type Yosry Ahmed
@ 2022-05-10 18:07   ` Yosry Ahmed
  2022-05-10 19:21     ` Yosry Ahmed
  2022-05-10 18:44   ` Tejun Heo
  1 sibling, 1 reply; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10 18:07 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	Linux Kernel Mailing List, Networking, bpf, cgroups

On Mon, May 9, 2022 at 5:18 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> This patch introduces a new bpf program type CGROUP_SUBSYS_RSTAT,
> with new corresponding link and attach types.
>
> The main purpose of these programs is to allow BPF programs to collect
> and maintain hierarchical cgroup stats easily and efficiently by making
> use of the rstat framework in the kernel.
>
> Those programs attach to a cgroup subsystem. They typically contain logic
> to aggregate per-cpu and per-cgroup stats collected by other BPF programs.
>
> Currently, only rstat flusher programs can be attached to cgroup
> subsystems, but this can be extended later if a use-case arises.
>
> See the selftest in the final patch for a practical example.
>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  include/linux/bpf-cgroup-subsys.h |  30 ++++++
>  include/linux/bpf_types.h         |   2 +
>  include/linux/cgroup-defs.h       |   4 +
>  include/uapi/linux/bpf.h          |  12 +++
>  kernel/bpf/Makefile               |   1 +
>  kernel/bpf/cgroup_subsys.c        | 166 ++++++++++++++++++++++++++++++
>  kernel/bpf/syscall.c              |   6 ++
>  kernel/cgroup/cgroup.c            |   1 +
>  tools/include/uapi/linux/bpf.h    |  12 +++
>  9 files changed, 234 insertions(+)
>  create mode 100644 include/linux/bpf-cgroup-subsys.h
>  create mode 100644 kernel/bpf/cgroup_subsys.c
>
> diff --git a/include/linux/bpf-cgroup-subsys.h b/include/linux/bpf-cgroup-subsys.h
> new file mode 100644
> index 000000000000..4dcde06b5599
> --- /dev/null
> +++ b/include/linux/bpf-cgroup-subsys.h
> @@ -0,0 +1,30 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright 2022 Google LLC.
> + */
> +#ifndef _BPF_CGROUP_SUBSYS_H_
> +#define _BPF_CGROUP_SUBSYS_H_
> +
> +#include <linux/bpf.h>
> +
> +struct cgroup_subsys_bpf {
> +       /* Head of the list of BPF rstat flushers attached to this subsystem */
> +       struct list_head rstat_flushers;
> +       spinlock_t flushers_lock;
> +};
> +
> +struct bpf_subsys_rstat_flusher {
> +       struct bpf_prog *prog;
> +       /* List of BPF rstat flushers, anchored at subsys->bpf */
> +       struct list_head list;
> +};
> +
> +struct bpf_cgroup_subsys_link {
> +       struct bpf_link link;
> +       struct cgroup_subsys *ss;
> +};
> +
> +int cgroup_subsys_bpf_link_attach(const union bpf_attr *attr,
> +                                 struct bpf_prog *prog);
> +

In the next version I will make sure everything here is also defined
for when CONFIG_BPF_SYSCALL is not set, and move the structs that can
be moved into the .c file.
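
For instance, something along these lines (a rough sketch under that
assumption, not the final code):

	#ifdef CONFIG_BPF_SYSCALL
	int cgroup_subsys_bpf_link_attach(const union bpf_attr *attr,
					  struct bpf_prog *prog);
	#else /* CONFIG_BPF_SYSCALL */
	static inline int cgroup_subsys_bpf_link_attach(const union bpf_attr *attr,
							struct bpf_prog *prog)
	{
		return -EINVAL;
	}
	#endif /* CONFIG_BPF_SYSCALL */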

> +#endif  // _BPF_CGROUP_SUBSYS_H_
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index 3e24ad0c4b3c..854ee958b0e4 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -56,6 +56,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl,
>               struct bpf_sysctl, struct bpf_sysctl_kern)
>  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCKOPT, cg_sockopt,
>               struct bpf_sockopt, struct bpf_sockopt_kern)
> +BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT, cgroup_subsys_rstat,
> +             struct bpf_rstat_ctx, struct bpf_rstat_ctx)
>  #endif
>  #ifdef CONFIG_BPF_LIRC_MODE2
>  BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
> diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> index 1bfcfb1af352..3bd6eed1fa13 100644
> --- a/include/linux/cgroup-defs.h
> +++ b/include/linux/cgroup-defs.h
> @@ -20,6 +20,7 @@
>  #include <linux/u64_stats_sync.h>
>  #include <linux/workqueue.h>
>  #include <linux/bpf-cgroup-defs.h>
> +#include <linux/bpf-cgroup-subsys.h>
>  #include <linux/psi_types.h>
>
>  #ifdef CONFIG_CGROUPS
> @@ -706,6 +707,9 @@ struct cgroup_subsys {
>          * specifies the mask of subsystems that this one depends on.
>          */
>         unsigned int depends_on;
> +
> +       /* used to store bpf programs.*/
> +       struct cgroup_subsys_bpf bpf;
>  };
>
>  extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d14b10b85e51..0f4855fa85db 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -952,6 +952,7 @@ enum bpf_prog_type {
>         BPF_PROG_TYPE_LSM,
>         BPF_PROG_TYPE_SK_LOOKUP,
>         BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
> +       BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT,
>  };
>
>  enum bpf_attach_type {
> @@ -998,6 +999,7 @@ enum bpf_attach_type {
>         BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
>         BPF_PERF_EVENT,
>         BPF_TRACE_KPROBE_MULTI,
> +       BPF_CGROUP_SUBSYS_RSTAT,
>         __MAX_BPF_ATTACH_TYPE
>  };
>
> @@ -1013,6 +1015,7 @@ enum bpf_link_type {
>         BPF_LINK_TYPE_XDP = 6,
>         BPF_LINK_TYPE_PERF_EVENT = 7,
>         BPF_LINK_TYPE_KPROBE_MULTI = 8,
> +       BPF_LINK_TYPE_CGROUP_SUBSYS = 9,
>
>         MAX_BPF_LINK_TYPE,
>  };
> @@ -1482,6 +1485,9 @@ union bpf_attr {
>                                  */
>                                 __u64           bpf_cookie;
>                         } perf_event;
> +                       struct {
> +                               __u64           name;
> +                       } cgroup_subsys;
>                         struct {
>                                 __u32           flags;
>                                 __u32           cnt;
> @@ -6324,6 +6330,12 @@ struct bpf_cgroup_dev_ctx {
>         __u32 minor;
>  };
>
> +struct bpf_rstat_ctx {
> +       __u64 cgroup_id;
> +       __u64 parent_cgroup_id; /* 0 if root */
> +       __s32 cpu;
> +};
> +
>  struct bpf_raw_tracepoint_args {
>         __u64 args[0];
>  };
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index c1a9be6a4b9f..6caf4a61e543 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -25,6 +25,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
>  obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
>  endif
>  obj-$(CONFIG_CGROUP_BPF) += cgroup.o
> +obj-$(CONFIG_CGROUP_BPF) += cgroup_subsys.o

In the next version I will replace this with:
ifeq ($(CONFIG_CGROUPS),y)
obj-$(CONFIG_BPF_SYSCALL) += cgroup_subsys.o
endif

, as this program type doesn't attach to cgroups and does not depend
on CONFIG_CGROUP_BPF, only CONFIG_CGROUPS and CONFIG_BPF_SYSCALL.

>  ifeq ($(CONFIG_INET),y)
>  obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
>  endif
> diff --git a/kernel/bpf/cgroup_subsys.c b/kernel/bpf/cgroup_subsys.c
> new file mode 100644
> index 000000000000..9673ce6aa84a
> --- /dev/null
> +++ b/kernel/bpf/cgroup_subsys.c
> @@ -0,0 +1,166 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Functions to manage eBPF programs attached to cgroup subsystems
> + *
> + * Copyright 2022 Google LLC.
> + */
> +
> +#include <linux/bpf-cgroup-subsys.h>
> +#include <linux/filter.h>
> +
> +#include "../cgroup/cgroup-internal.h"
> +
> +
> +static int cgroup_subsys_bpf_attach(struct cgroup_subsys *ss, struct bpf_prog *prog)
> +{
> +       struct bpf_subsys_rstat_flusher *rstat_flusher;
> +
> +       rstat_flusher = kmalloc(sizeof(*rstat_flusher), GFP_KERNEL);
> +       if (!rstat_flusher)
> +               return -ENOMEM;
> +       rstat_flusher->prog = prog;
> +
> +       spin_lock(&ss->bpf.flushers_lock);
> +       list_add(&rstat_flusher->list, &ss->bpf.rstat_flushers);
> +       spin_unlock(&ss->bpf.flushers_lock);
> +
> +       return 0;
> +}
> +
> +static void cgroup_subsys_bpf_detach(struct cgroup_subsys *ss, struct bpf_prog *prog)
> +{
> +       struct bpf_subsys_rstat_flusher *rstat_flusher, *found = NULL;
> +
> +       spin_lock(&ss->bpf.flushers_lock);
> +       list_for_each_entry(rstat_flusher, &ss->bpf.rstat_flushers, list)
> +               if (rstat_flusher->prog == prog) {
> +                       found = rstat_flusher;
> +                       break;
> +               }
> +
> +       if (found) {
> +               list_del(&found->list);
> +               bpf_prog_put(found->prog);
> +               kfree(found);
> +       }
> +       spin_unlock(&ss->bpf.flushers_lock);
> +}
> +
> +static void bpf_cgroup_subsys_link_release(struct bpf_link *link)
> +{
> +       struct bpf_cgroup_subsys_link *ss_link = container_of(link,
> +                                                      struct bpf_cgroup_subsys_link,
> +                                                      link);
> +       if (ss_link->ss) {
> +               cgroup_subsys_bpf_detach(ss_link->ss, ss_link->link.prog);
> +               ss_link->ss = NULL;
> +       }
> +}
> +
> +static int bpf_cgroup_subsys_link_detach(struct bpf_link *link)
> +{
> +       bpf_cgroup_subsys_link_release(link);
> +       return 0;
> +}
> +
> +static void bpf_cgroup_subsys_link_dealloc(struct bpf_link *link)
> +{
> +       struct bpf_cgroup_subsys_link *ss_link = container_of(link,
> +                                                      struct bpf_cgroup_subsys_link,
> +                                                      link);
> +       kfree(ss_link);
> +}
> +
> +static const struct bpf_link_ops bpf_cgroup_subsys_link_lops = {
> +       .detach = bpf_cgroup_subsys_link_detach,
> +       .release = bpf_cgroup_subsys_link_release,
> +       .dealloc = bpf_cgroup_subsys_link_dealloc,
> +};
> +
> +int cgroup_subsys_bpf_link_attach(const union bpf_attr *attr,
> +                                 struct bpf_prog *prog)
> +{
> +       struct bpf_link_primer link_primer;
> +       struct bpf_cgroup_subsys_link *link;
> +       struct cgroup_subsys *ss, *attach_ss = NULL;
> +       const char __user *ss_name_user;
> +       char ss_name[MAX_CGROUP_TYPE_NAMELEN];
> +       int ssid, err;
> +
> +       if (attr->link_create.target_fd || attr->link_create.flags)
> +               return -EINVAL;
> +
> +       ss_name_user = u64_to_user_ptr(attr->link_create.cgroup_subsys.name);
> +       if (strncpy_from_user(ss_name, ss_name_user, sizeof(ss_name) - 1) < 0)
> +               return -EFAULT;
> +       ss_name[sizeof(ss_name) - 1] = '\0';
> +
> +       for_each_subsys(ss, ssid)
> +               if (!strcmp(ss_name, ss->name) ||
> +                   !strcmp(ss_name, ss->legacy_name))
> +                       attach_ss = ss;
> +
> +       if (!attach_ss)
> +               return -EINVAL;
> +
> +       link = kzalloc(sizeof(*link), GFP_USER);
> +       if (!link)
> +               return -ENOMEM;
> +
> +       bpf_link_init(&link->link, BPF_LINK_TYPE_CGROUP_SUBSYS,
> +                     &bpf_cgroup_subsys_link_lops,
> +                     prog);
> +       link->ss = attach_ss;
> +
> +       err = bpf_link_prime(&link->link, &link_primer);
> +       if (err) {
> +               kfree(link);
> +               return err;
> +       }
> +
> +       err = cgroup_subsys_bpf_attach(attach_ss, prog);
> +       if (err) {
> +               bpf_link_cleanup(&link_primer);
> +               return err;
> +       }
> +
> +       return bpf_link_settle(&link_primer);
> +}
> +
> +static const struct bpf_func_proto *
> +cgroup_subsys_rstat_func_proto(enum bpf_func_id func_id,
> +                              const struct bpf_prog *prog)
> +{
> +       return bpf_base_func_proto(func_id);
> +}
> +
> +static bool cgroup_subsys_rstat_is_valid_access(int off, int size,
> +                                          enum bpf_access_type type,
> +                                          const struct bpf_prog *prog,
> +                                          struct bpf_insn_access_aux *info)
> +{
> +       if (type == BPF_WRITE)
> +               return false;
> +
> +       if (off < 0 || off + size > sizeof(struct bpf_rstat_ctx))
> +               return false;
> +       /* The verifier guarantees that size > 0 */
> +       if (off % size != 0)
> +               return false;
> +
> +       switch (off) {
> +       case offsetof(struct bpf_rstat_ctx, cgroup_id):
> +               return size == sizeof(__u64);
> +       case offsetof(struct bpf_rstat_ctx, parent_cgroup_id):
> +               return size == sizeof(__u64);
> +       case offsetof(struct bpf_rstat_ctx, cpu):
> +               return size == sizeof(__s32);
> +       default:
> +               return false;
> +       }
> +}
> +
> +const struct bpf_prog_ops cgroup_subsys_rstat_prog_ops = {
> +};
> +
> +const struct bpf_verifier_ops cgroup_subsys_rstat_verifier_ops = {
> +       .get_func_proto         = cgroup_subsys_rstat_func_proto,
> +       .is_valid_access        = cgroup_subsys_rstat_is_valid_access,
> +};
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index cdaa1152436a..48149c54d969 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -3,6 +3,7 @@
>   */
>  #include <linux/bpf.h>
>  #include <linux/bpf-cgroup.h>
> +#include <linux/bpf-cgroup-subsys.h>
>  #include <linux/bpf_trace.h>
>  #include <linux/bpf_lirc.h>
>  #include <linux/bpf_verifier.h>
> @@ -3194,6 +3195,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
>                 return BPF_PROG_TYPE_SK_LOOKUP;
>         case BPF_XDP:
>                 return BPF_PROG_TYPE_XDP;
> +       case BPF_CGROUP_SUBSYS_RSTAT:
> +               return BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT;
>         default:
>                 return BPF_PROG_TYPE_UNSPEC;
>         }
> @@ -4341,6 +4344,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
>                 else
>                         ret = bpf_kprobe_multi_link_attach(attr, prog);
>                 break;
> +       case BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT:
> +               ret = cgroup_subsys_bpf_link_attach(attr, prog);
> +               break;
>         default:
>                 ret = -EINVAL;
>         }
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index adb820e98f24..7b1448013009 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -5745,6 +5745,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
>
>         idr_init(&ss->css_idr);
>         INIT_LIST_HEAD(&ss->cfts);
> +       INIT_LIST_HEAD(&ss->bpf.rstat_flushers);
>
>         /* Create the root cgroup state for this subsystem */
>         ss->root = &cgrp_dfl_root;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index d14b10b85e51..0f4855fa85db 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -952,6 +952,7 @@ enum bpf_prog_type {
>         BPF_PROG_TYPE_LSM,
>         BPF_PROG_TYPE_SK_LOOKUP,
>         BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
> +       BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT,
>  };
>
>  enum bpf_attach_type {
> @@ -998,6 +999,7 @@ enum bpf_attach_type {
>         BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
>         BPF_PERF_EVENT,
>         BPF_TRACE_KPROBE_MULTI,
> +       BPF_CGROUP_SUBSYS_RSTAT,
>         __MAX_BPF_ATTACH_TYPE
>  };
>
> @@ -1013,6 +1015,7 @@ enum bpf_link_type {
>         BPF_LINK_TYPE_XDP = 6,
>         BPF_LINK_TYPE_PERF_EVENT = 7,
>         BPF_LINK_TYPE_KPROBE_MULTI = 8,
> +       BPF_LINK_TYPE_CGROUP_SUBSYS = 9,
>
>         MAX_BPF_LINK_TYPE,
>  };
> @@ -1482,6 +1485,9 @@ union bpf_attr {
>                                  */
>                                 __u64           bpf_cookie;
>                         } perf_event;
> +                       struct {
> +                               __u64           name;
> +                       } cgroup_subsys;
>                         struct {
>                                 __u32           flags;
>                                 __u32           cnt;
> @@ -6324,6 +6330,12 @@ struct bpf_cgroup_dev_ctx {
>         __u32 minor;
>  };
>
> +struct bpf_rstat_ctx {
> +       __u64 cgroup_id;
> +       __u64 parent_cgroup_id; /* 0 if root */
> +       __s32 cpu;
> +};
> +
>  struct bpf_raw_tracepoint_args {
>         __u64 args[0];
>  };
> --
> 2.36.0.512.ge40c2bad7a-goog
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter
  2022-05-10  0:18 ` [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter Yosry Ahmed
@ 2022-05-10 18:25   ` Hao Luo
  2022-05-10 18:54   ` Tejun Heo
  1 sibling, 0 replies; 30+ messages in thread
From: Hao Luo @ 2022-05-10 18:25 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, linux-kernel, netdev, bpf, cgroups

On Mon, May 9, 2022 at 5:18 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> From: Hao Luo <haoluo@google.com>
>
> Introduce a new type of iter prog: cgroup. Unlike other bpf_iter, this
> iter doesn't iterate a set of kernel objects. Instead, it is supposed to
> be parameterized by a cgroup id and prints only that cgroup. So one
> needs to specify a target cgroup id when attaching this iter. The target
> cgroup's state can be read out via a link of this iter.
>
> Signed-off-by: Hao Luo <haoluo@google.com>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  include/linux/bpf.h            |   2 +
>  include/uapi/linux/bpf.h       |   6 ++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/cgroup_iter.c       | 148 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |   6 ++
>  5 files changed, 163 insertions(+), 1 deletion(-)
>  create mode 100644 kernel/bpf/cgroup_iter.c
>

Thanks Yosry for posting this patch! Dear reviewers, this is v2 of the
cgroup_iter change I sent previously at

https://lore.kernel.org/bpf/20220225234339.2386398-9-haoluo@google.com/

v1 -> v2:
- Getting the cgroup's reference at the time of attaching, instead of
at the time of iterating. (Yonghong) (context [1])
- Remove .init_seq_private and .fini_seq_private callbacks for
cgroup_iter. They are not needed now. (Yonghong)

[1] https://lore.kernel.org/bpf/f780fc3a-dbc2-986c-d5a0-6b0ef1c4311f@fb.com/

Hao

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH bpf-next 7/9] cgroup: Add cgroup_put() in !CONFIG_CGROUPS case
  2022-05-10  0:18 ` [RFC PATCH bpf-next 7/9] cgroup: Add cgroup_put() in !CONFIG_CGROUPS case Yosry Ahmed
@ 2022-05-10 18:25   ` Hao Luo
  0 siblings, 0 replies; 30+ messages in thread
From: Hao Luo @ 2022-05-10 18:25 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Tejun Heo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, linux-kernel, netdev, bpf, cgroups

On Mon, May 9, 2022 at 5:18 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> From: Hao Luo <haoluo@google.com>
>
> There is already a cgroup_get_from_id() in the !CONFIG_CGROUPS case,
> let's have a matching cgroup_put() in !CONFIG_CGROUPS too.
>
> Signed-off-by: Hao Luo <haoluo@google.com>
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  include/linux/cgroup.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 5408c74d5c44..4f1d8febb9fd 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
[...]
> +
> +static inline struct cgroup *cgroup_put(void)
> +{}

Sorry Yosry, the return type and parameter type are mixed up. I will
fix it and send you an updated version.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH bpf-next 6/9] cgroup: add v1 support to cgroup_get_from_id()
  2022-05-10  0:18 ` [RFC PATCH bpf-next 6/9] cgroup: add v1 support to cgroup_get_from_id() Yosry Ahmed
@ 2022-05-10 18:33   ` Tejun Heo
  2022-05-10 18:36     ` Yosry Ahmed
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2022-05-10 18:33 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, linux-kernel, netdev, bpf, cgroups

On Tue, May 10, 2022 at 12:18:04AM +0000, Yosry Ahmed wrote:
> The current implementation of cgroup_get_from_id() only searches the
> default hierarchy for the given id. Make it compatible with cgroup v1 by
> looking through all the roots instead.
> 
> cgrp_dfl_root should be the first element in the list so there shouldn't
> be a performance impact for cgroup v2 users (in the case of a valid id).
> 
> Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> ---
>  kernel/cgroup/cgroup.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index af703cfcb9d2..12700cd21973 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -5970,10 +5970,16 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
>   */
>  struct cgroup *cgroup_get_from_id(u64 id)
>  {
> -	struct kernfs_node *kn;
> +	struct kernfs_node *kn = NULL;
>  	struct cgroup *cgrp = NULL;
> +	struct cgroup_root *root;
> +
> +	for_each_root(root) {
> +		kn = kernfs_find_and_get_node_by_id(root->kf_root, id);
> +		if (kn)
> +			break;
> +	}

I can't see how this can work. You're smashing together separate namespaces
and the same IDs can exist across multiple of these hierarchies. You'd need
a bigger surgery to make this work for cgroup1 which would prolly involve
complications around 32bit ino's and file handle support too, which I'm not
likely to ack, so please give up on adding these things to cgroup1.

Nacked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH bpf-next 6/9] cgroup: add v1 support to cgroup_get_from_id()
  2022-05-10 18:33   ` Tejun Heo
@ 2022-05-10 18:36     ` Yosry Ahmed
  0 siblings, 0 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10 18:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, Linux Kernel Mailing List, Networking,
	bpf, cgroups

On Tue, May 10, 2022 at 11:34 AM Tejun Heo <tj@kernel.org> wrote:
>
> On Tue, May 10, 2022 at 12:18:04AM +0000, Yosry Ahmed wrote:
> > The current implementation of cgroup_get_from_id() only searches the
> > default hierarchy for the given id. Make it compatible with cgroup v1 by
> > looking through all the roots instead.
> >
> > cgrp_dfl_root should be the first element in the list so there shouldn't
> > be a performance impact for cgroup v2 users (in the case of a valid id).
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  kernel/cgroup/cgroup.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > index af703cfcb9d2..12700cd21973 100644
> > --- a/kernel/cgroup/cgroup.c
> > +++ b/kernel/cgroup/cgroup.c
> > @@ -5970,10 +5970,16 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen)
> >   */
> >  struct cgroup *cgroup_get_from_id(u64 id)
> >  {
> > -     struct kernfs_node *kn;
> > +     struct kernfs_node *kn = NULL;
> >       struct cgroup *cgrp = NULL;
> > +     struct cgroup_root *root;
> > +
> > +     for_each_root(root) {
> > +             kn = kernfs_find_and_get_node_by_id(root->kf_root, id);
> > +             if (kn)
> > +                     break;
> > +     }
>
> I can't see how this can work. You're smashing together separate namespaces
> and the same IDs can exist across multiple of these hierarchies. You'd need
> a bigger surgery to make this work for cgroup1 which would prolly involve
> complications around 32bit ino's and file handle support too, which I'm not
> likely to ack, so please give up on adding these things to cgroup1.
>
> Nacked-by: Tejun Heo <tj@kernel.org>
>
> Thanks.

Completely understandable. I sent this patch knowing that it likely
will not be accepted, with hopes of hearing feedback on whether this
can be done in a simple way or not. Looks like I got my answer, so
thanks for the info!

Will drop this patch in the incoming versions.

>
> --
> tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10  0:17 ` [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type Yosry Ahmed
  2022-05-10 18:07   ` Yosry Ahmed
@ 2022-05-10 18:44   ` Tejun Heo
  2022-05-10 19:34     ` Yosry Ahmed
  1 sibling, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2022-05-10 18:44 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, linux-kernel, netdev, bpf, cgroups

Hello,

On Tue, May 10, 2022 at 12:17:59AM +0000, Yosry Ahmed wrote:
> @@ -706,6 +707,9 @@ struct cgroup_subsys {
>  	 * specifies the mask of subsystems that this one depends on.
>  	 */
>  	unsigned int depends_on;
> +
> +	/* used to store bpf programs.*/
> +	struct cgroup_subsys_bpf bpf;
>  };

Care to elaborate on rationales around associating this with a specific
cgroup_subsys rather than letting it walk cgroups and access whatever csses
as needed? I don't think it's a wrong approach or anything but I can think
of plenty of things that would be interesting without being associated with
a specific subsystem - even all the cpu usage statistics are built to in the
cgroup core and given how e.g. systemd uses cgroup to organize the
applications in the system whether resource control is active or not, there
are a lot of info one can gather about those without being associated with a
specific subsystem.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH bpf-next 2/9] cgroup: bpf: flush bpf stats on rstat flush
  2022-05-10  0:18 ` [RFC PATCH bpf-next 2/9] cgroup: bpf: flush bpf stats on rstat flush Yosry Ahmed
@ 2022-05-10 18:45   ` Tejun Heo
  0 siblings, 0 replies; 30+ messages in thread
From: Tejun Heo @ 2022-05-10 18:45 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, linux-kernel, netdev, bpf, cgroups

On Tue, May 10, 2022 at 12:18:00AM +0000, Yosry Ahmed wrote:
> When a cgroup is popped from the rstat updated tree, subsystems rstat
> flushers are run through the css_rstat_flush() callback. Also run bpf
> flushers for all subsystems that have at least one bpf rstat flusher
> attached, and are enabled for this cgroup.
> 
> A list of subsystems that have attached rstat flushers is maintained to
> avoid looping through all subsystems for all cpus for every cgroup that
> is being popped from the updated tree. Since we introduce a lock here to
> protect this list, also use it to protect rstat_flushers lists inside
> each subsystem (since they both need to locked together anyway), and get
> read of the locks in struct cgroup_subsys_bpf.
> 
> rstat flushers are run for any enabled subsystem that has flushers
> attached, even if it does not subscribe to css flushing through
> css_rstat_flush(). This gives flexibility for bpf programs to collect
> stats for any subsystem, regardless of the implementation changes in the
> kernel.

Yeah, again, the fact that these things are associated with a specific
subsystem feels a bit jarring to me. Let's get that resolved first.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter
  2022-05-10  0:18 ` [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter Yosry Ahmed
  2022-05-10 18:25   ` Hao Luo
@ 2022-05-10 18:54   ` Tejun Heo
  2022-05-10 21:12     ` Hao Luo
  1 sibling, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2022-05-10 18:54 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, linux-kernel, netdev, bpf, cgroups

Hello,

On Tue, May 10, 2022 at 12:18:06AM +0000, Yosry Ahmed wrote:
> From: Hao Luo <haoluo@google.com>
> 
> Introduce a new type of iter prog: cgroup. Unlike other bpf_iter, this
> iter doesn't iterate a set of kernel objects. Instead, it is supposed to
> be parameterized by a cgroup id and prints only that cgroup. So one
> needs to specify a target cgroup id when attaching this iter. The target
> cgroup's state can be read out via a link of this iter.

Is there a reason why this can't be a proper iterator which supports
lseek64() to locate a specific cgroup?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10 18:07   ` Yosry Ahmed
@ 2022-05-10 19:21     ` Yosry Ahmed
  0 siblings, 0 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10 19:21 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	Linux Kernel Mailing List, Networking, bpf, cgroups

On Tue, May 10, 2022 at 11:07 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Mon, May 9, 2022 at 5:18 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> > This patch introduces a new bpf program type CGROUP_SUBSYS_RSTAT,
> > with new corresponding link and attach types.
> >
> > The main purpose of these programs is to allow BPF programs to collect
> > and maintain hierarchical cgroup stats easily and efficiently by making
> > use of the rstat framework in the kernel.
> >
> > Those programs attach to a cgroup subsystem. They typically contain logic
> > to aggregate per-cpu and per-cgroup stats collected by other BPF programs.
> >
> > Currently, only rstat flusher programs can be attached to cgroup
> > subsystems, but this can be extended later if a use-case arises.
> >
> > See the selftest in the final patch for a practical example.
> >
> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
> > ---
> >  include/linux/bpf-cgroup-subsys.h |  30 ++++++
> >  include/linux/bpf_types.h         |   2 +
> >  include/linux/cgroup-defs.h       |   4 +
> >  include/uapi/linux/bpf.h          |  12 +++
> >  kernel/bpf/Makefile               |   1 +
> >  kernel/bpf/cgroup_subsys.c        | 166 ++++++++++++++++++++++++++++++
> >  kernel/bpf/syscall.c              |   6 ++
> >  kernel/cgroup/cgroup.c            |   1 +
> >  tools/include/uapi/linux/bpf.h    |  12 +++
> >  9 files changed, 234 insertions(+)
> >  create mode 100644 include/linux/bpf-cgroup-subsys.h
> >  create mode 100644 kernel/bpf/cgroup_subsys.c
> >
> > diff --git a/include/linux/bpf-cgroup-subsys.h b/include/linux/bpf-cgroup-subsys.h
> > new file mode 100644
> > index 000000000000..4dcde06b5599
> > --- /dev/null
> > +++ b/include/linux/bpf-cgroup-subsys.h
> > @@ -0,0 +1,30 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright 2022 Google LLC.
> > + */
> > +#ifndef _BPF_CGROUP_SUBSYS_H_
> > +#define _BPF_CGROUP_SUBSYS_H_
> > +
> > +#include <linux/bpf.h>
> > +
> > +struct cgroup_subsys_bpf {
> > +       /* Head of the list of BPF rstat flushers attached to this subsystem */
> > +       struct list_head rstat_flushers;
> > +       spinlock_t flushers_lock;
> > +};
> > +
> > +struct bpf_subsys_rstat_flusher {
> > +       struct bpf_prog *prog;
> > +       /* List of BPF rstat flushers, anchored at subsys->bpf */
> > +       struct list_head list;
> > +};
> > +
> > +struct bpf_cgroup_subsys_link {
> > +       struct bpf_link link;
> > +       struct cgroup_subsys *ss;
> > +};
> > +
> > +int cgroup_subsys_bpf_link_attach(const union bpf_attr *attr,
> > +                                 struct bpf_prog *prog);
> > +
>
> In the next version I will make sure everything here is also defined
> for when CONFIG_BPF_SYSCALL is not set, and move the structs that can
> be moved to the cc file there.
>
> > +#endif  // _BPF_CGROUP_SUBSYS_H_
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index 3e24ad0c4b3c..854ee958b0e4 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -56,6 +56,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SYSCTL, cg_sysctl,
> >               struct bpf_sysctl, struct bpf_sysctl_kern)
> >  BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SOCKOPT, cg_sockopt,
> >               struct bpf_sockopt, struct bpf_sockopt_kern)
> > +BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT, cgroup_subsys_rstat,
> > +             struct bpf_rstat_ctx, struct bpf_rstat_ctx)
> >  #endif
> >  #ifdef CONFIG_BPF_LIRC_MODE2
> >  BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
> > diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
> > index 1bfcfb1af352..3bd6eed1fa13 100644
> > --- a/include/linux/cgroup-defs.h
> > +++ b/include/linux/cgroup-defs.h
> > @@ -20,6 +20,7 @@
> >  #include <linux/u64_stats_sync.h>
> >  #include <linux/workqueue.h>
> >  #include <linux/bpf-cgroup-defs.h>
> > +#include <linux/bpf-cgroup-subsys.h>
> >  #include <linux/psi_types.h>
> >
> >  #ifdef CONFIG_CGROUPS
> > @@ -706,6 +707,9 @@ struct cgroup_subsys {
> >          * specifies the mask of subsystems that this one depends on.
> >          */
> >         unsigned int depends_on;
> > +
> > +       /* used to store bpf programs. */
> > +       struct cgroup_subsys_bpf bpf;
> >  };
> >
> >  extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index d14b10b85e51..0f4855fa85db 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -952,6 +952,7 @@ enum bpf_prog_type {
> >         BPF_PROG_TYPE_LSM,
> >         BPF_PROG_TYPE_SK_LOOKUP,
> >         BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
> > +       BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT,
> >  };
> >
> >  enum bpf_attach_type {
> > @@ -998,6 +999,7 @@ enum bpf_attach_type {
> >         BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> >         BPF_PERF_EVENT,
> >         BPF_TRACE_KPROBE_MULTI,
> > +       BPF_CGROUP_SUBSYS_RSTAT,
> >         __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > @@ -1013,6 +1015,7 @@ enum bpf_link_type {
> >         BPF_LINK_TYPE_XDP = 6,
> >         BPF_LINK_TYPE_PERF_EVENT = 7,
> >         BPF_LINK_TYPE_KPROBE_MULTI = 8,
> > +       BPF_LINK_TYPE_CGROUP_SUBSYS = 9,
> >
> >         MAX_BPF_LINK_TYPE,
> >  };
> > @@ -1482,6 +1485,9 @@ union bpf_attr {
> >                                  */
> >                                 __u64           bpf_cookie;
> >                         } perf_event;
> > +                       struct {
> > +                               __u64           name;
> > +                       } cgroup_subsys;
> >                         struct {
> >                                 __u32           flags;
> >                                 __u32           cnt;
> > @@ -6324,6 +6330,12 @@ struct bpf_cgroup_dev_ctx {
> >         __u32 minor;
> >  };
> >
> > +struct bpf_rstat_ctx {
> > +       __u64 cgroup_id;
> > +       __u64 parent_cgroup_id; /* 0 if root */
> > +       __s32 cpu;
> > +};
> > +
> >  struct bpf_raw_tracepoint_args {
> >         __u64 args[0];
> >  };
> > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> > index c1a9be6a4b9f..6caf4a61e543 100644
> > --- a/kernel/bpf/Makefile
> > +++ b/kernel/bpf/Makefile
> > @@ -25,6 +25,7 @@ ifeq ($(CONFIG_PERF_EVENTS),y)
> >  obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
> >  endif
> >  obj-$(CONFIG_CGROUP_BPF) += cgroup.o
> > +obj-$(CONFIG_CGROUP_BPF) += cgroup_subsys.o
>
> In the next version I will replace this with:
> ifeq ($(CONFIG_CGROUP),y)
> obj-$(CONFIG_BPF_SYSCALL) += cgroup_subsys.o
> endif
>
> , as this program type doesn't attach to cgroups and does not depend
> on CONFIG_CGROUP_BPF, only CONFIG_CGROUP and CONFIG_BPF_SYSCALL.

On second thought, it might be simpler and cleaner to leave this code
under CONFIG_CGROUP_BPF.

>
> >  ifeq ($(CONFIG_INET),y)
> >  obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
> >  endif
> > diff --git a/kernel/bpf/cgroup_subsys.c b/kernel/bpf/cgroup_subsys.c
> > new file mode 100644
> > index 000000000000..9673ce6aa84a
> > --- /dev/null
> > +++ b/kernel/bpf/cgroup_subsys.c
> > @@ -0,0 +1,166 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Functions to manage eBPF programs attached to cgroup subsystems
> > + *
> > + * Copyright 2022 Google LLC.
> > + */
> > +
> > +#include <linux/bpf-cgroup-subsys.h>
> > +#include <linux/filter.h>
> > +
> > +#include "../cgroup/cgroup-internal.h"
> > +
> > +
> > +static int cgroup_subsys_bpf_attach(struct cgroup_subsys *ss, struct bpf_prog *prog)
> > +{
> > +       struct bpf_subsys_rstat_flusher *rstat_flusher;
> > +
> > +       rstat_flusher = kmalloc(sizeof(*rstat_flusher), GFP_KERNEL);
> > +       if (!rstat_flusher)
> > +               return -ENOMEM;
> > +       rstat_flusher->prog = prog;
> > +
> > +       spin_lock(&ss->bpf.flushers_lock);
> > +       list_add(&rstat_flusher->list, &ss->bpf.rstat_flushers);
> > +       spin_unlock(&ss->bpf.flushers_lock);
> > +
> > +       return 0;
> > +}
> > +
> > +static void cgroup_subsys_bpf_detach(struct cgroup_subsys *ss, struct bpf_prog *prog)
> > +{
> > +       struct bpf_subsys_rstat_flusher *rstat_flusher = NULL, *iter;
> > +
> > +       spin_lock(&ss->bpf.flushers_lock);
> > +       /* iter is never NULL after the loop; track the match explicitly */
> > +       list_for_each_entry(iter, &ss->bpf.rstat_flushers, list)
> > +               if (iter->prog == prog) {
> > +                       rstat_flusher = iter;
> > +                       break;
> > +               }
> > +
> > +       if (rstat_flusher) {
> > +               list_del(&rstat_flusher->list);
> > +               bpf_prog_put(rstat_flusher->prog);
> > +               kfree(rstat_flusher);
> > +       }
> > +       spin_unlock(&ss->bpf.flushers_lock);
> > +}
> > +
> > +static void bpf_cgroup_subsys_link_release(struct bpf_link *link)
> > +{
> > +       struct bpf_cgroup_subsys_link *ss_link = container_of(link,
> > +                                                      struct bpf_cgroup_subsys_link,
> > +                                                      link);
> > +       if (ss_link->ss) {
> > +               cgroup_subsys_bpf_detach(ss_link->ss, ss_link->link.prog);
> > +               ss_link->ss = NULL;
> > +       }
> > +}
> > +
> > +static int bpf_cgroup_subsys_link_detach(struct bpf_link *link)
> > +{
> > +       bpf_cgroup_subsys_link_release(link);
> > +       return 0;
> > +}
> > +
> > +static void bpf_cgroup_subsys_link_dealloc(struct bpf_link *link)
> > +{
> > +       struct bpf_cgroup_subsys_link *ss_link = container_of(link,
> > +                                                      struct bpf_cgroup_subsys_link,
> > +                                                      link);
> > +       kfree(ss_link);
> > +}
> > +
> > +static const struct bpf_link_ops bpf_cgroup_subsys_link_lops = {
> > +       .detach = bpf_cgroup_subsys_link_detach,
> > +       .release = bpf_cgroup_subsys_link_release,
> > +       .dealloc = bpf_cgroup_subsys_link_dealloc,
> > +};
> > +
> > +int cgroup_subsys_bpf_link_attach(const union bpf_attr *attr,
> > +                                 struct bpf_prog *prog)
> > +{
> > +       struct bpf_link_primer link_primer;
> > +       struct bpf_cgroup_subsys_link *link;
> > +       struct cgroup_subsys *ss, *attach_ss = NULL;
> > +       const char __user *ss_name_user;
> > +       char ss_name[MAX_CGROUP_TYPE_NAMELEN];
> > +       int ssid, err;
> > +
> > +       if (attr->link_create.target_fd || attr->link_create.flags)
> > +               return -EINVAL;
> > +
> > +       ss_name_user = u64_to_user_ptr(attr->link_create.cgroup_subsys.name);
> > +       if (strncpy_from_user(ss_name, ss_name_user, sizeof(ss_name) - 1) < 0)
> > +               return -EFAULT;
> > +       /* strncpy_from_user() does not NUL-terminate on truncation */
> > +       ss_name[sizeof(ss_name) - 1] = '\0';
> > +
> > +       for_each_subsys(ss, ssid)
> > +               if (!strcmp(ss_name, ss->name) ||
> > +                   !strcmp(ss_name, ss->legacy_name))
> > +                       attach_ss = ss;
> > +
> > +       if (!attach_ss)
> > +               return -EINVAL;
> > +
> > +       link = kzalloc(sizeof(*link), GFP_USER);
> > +       if (!link)
> > +               return -ENOMEM;
> > +
> > +       bpf_link_init(&link->link, BPF_LINK_TYPE_CGROUP_SUBSYS,
> > +                     &bpf_cgroup_subsys_link_lops,
> > +                     prog);
> > +       link->ss = attach_ss;
> > +
> > +       err = bpf_link_prime(&link->link, &link_primer);
> > +       if (err) {
> > +               kfree(link);
> > +               return err;
> > +       }
> > +
> > +       err = cgroup_subsys_bpf_attach(attach_ss, prog);
> > +       if (err) {
> > +               bpf_link_cleanup(&link_primer);
> > +               return err;
> > +       }
> > +
> > +       return bpf_link_settle(&link_primer);
> > +}
> > +
> > +static const struct bpf_func_proto *
> > +cgroup_subsys_rstat_func_proto(enum bpf_func_id func_id,
> > +                              const struct bpf_prog *prog)
> > +{
> > +       return bpf_base_func_proto(func_id);
> > +}
> > +
> > +static bool cgroup_subsys_rstat_is_valid_access(int off, int size,
> > +                                          enum bpf_access_type type,
> > +                                          const struct bpf_prog *prog,
> > +                                          struct bpf_insn_access_aux *info)
> > +{
> > +       if (type == BPF_WRITE)
> > +               return false;
> > +
> > +       if (off < 0 || off + size > sizeof(struct bpf_rstat_ctx))
> > +               return false;
> > +       /* The verifier guarantees that size > 0 */
> > +       if (off % size != 0)
> > +               return false;
> > +
> > +       switch (off) {
> > +       case offsetof(struct bpf_rstat_ctx, cgroup_id):
> > +               return size == sizeof(__u64);
> > +       case offsetof(struct bpf_rstat_ctx, parent_cgroup_id):
> > +               return size == sizeof(__u64);
> > +       case offsetof(struct bpf_rstat_ctx, cpu):
> > +               return size == sizeof(__s32);
> > +       default:
> > +               return false;
> > +       }
> > +}
> > +
> > +const struct bpf_prog_ops cgroup_subsys_rstat_prog_ops = {
> > +};
> > +
> > +const struct bpf_verifier_ops cgroup_subsys_rstat_verifier_ops = {
> > +       .get_func_proto         = cgroup_subsys_rstat_func_proto,
> > +       .is_valid_access        = cgroup_subsys_rstat_is_valid_access,
> > +};
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index cdaa1152436a..48149c54d969 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -3,6 +3,7 @@
> >   */
> >  #include <linux/bpf.h>
> >  #include <linux/bpf-cgroup.h>
> > +#include <linux/bpf-cgroup-subsys.h>
> >  #include <linux/bpf_trace.h>
> >  #include <linux/bpf_lirc.h>
> >  #include <linux/bpf_verifier.h>
> > @@ -3194,6 +3195,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
> >                 return BPF_PROG_TYPE_SK_LOOKUP;
> >         case BPF_XDP:
> >                 return BPF_PROG_TYPE_XDP;
> > +       case BPF_CGROUP_SUBSYS_RSTAT:
> > +               return BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT;
> >         default:
> >                 return BPF_PROG_TYPE_UNSPEC;
> >         }
> > @@ -4341,6 +4344,9 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
> >                 else
> >                         ret = bpf_kprobe_multi_link_attach(attr, prog);
> >                 break;
> > +       case BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT:
> > +               ret = cgroup_subsys_bpf_link_attach(attr, prog);
> > +               break;
> >         default:
> >                 ret = -EINVAL;
> >         }
> > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > index adb820e98f24..7b1448013009 100644
> > --- a/kernel/cgroup/cgroup.c
> > +++ b/kernel/cgroup/cgroup.c
> > @@ -5745,6 +5745,7 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
> >
> >         idr_init(&ss->css_idr);
> >         INIT_LIST_HEAD(&ss->cfts);
> > +       INIT_LIST_HEAD(&ss->bpf.rstat_flushers);
> > +       spin_lock_init(&ss->bpf.flushers_lock);
> >
> >         /* Create the root cgroup state for this subsystem */
> >         ss->root = &cgrp_dfl_root;
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index d14b10b85e51..0f4855fa85db 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -952,6 +952,7 @@ enum bpf_prog_type {
> >         BPF_PROG_TYPE_LSM,
> >         BPF_PROG_TYPE_SK_LOOKUP,
> >         BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
> > +       BPF_PROG_TYPE_CGROUP_SUBSYS_RSTAT,
> >  };
> >
> >  enum bpf_attach_type {
> > @@ -998,6 +999,7 @@ enum bpf_attach_type {
> >         BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
> >         BPF_PERF_EVENT,
> >         BPF_TRACE_KPROBE_MULTI,
> > +       BPF_CGROUP_SUBSYS_RSTAT,
> >         __MAX_BPF_ATTACH_TYPE
> >  };
> >
> > @@ -1013,6 +1015,7 @@ enum bpf_link_type {
> >         BPF_LINK_TYPE_XDP = 6,
> >         BPF_LINK_TYPE_PERF_EVENT = 7,
> >         BPF_LINK_TYPE_KPROBE_MULTI = 8,
> > +       BPF_LINK_TYPE_CGROUP_SUBSYS = 9,
> >
> >         MAX_BPF_LINK_TYPE,
> >  };
> > @@ -1482,6 +1485,9 @@ union bpf_attr {
> >                                  */
> >                                 __u64           bpf_cookie;
> >                         } perf_event;
> > +                       struct {
> > +                               __u64           name;
> > +                       } cgroup_subsys;
> >                         struct {
> >                                 __u32           flags;
> >                                 __u32           cnt;
> > @@ -6324,6 +6330,12 @@ struct bpf_cgroup_dev_ctx {
> >         __u32 minor;
> >  };
> >
> > +struct bpf_rstat_ctx {
> > +       __u64 cgroup_id;
> > +       __u64 parent_cgroup_id; /* 0 if root */
> > +       __s32 cpu;
> > +};
> > +
> >  struct bpf_raw_tracepoint_args {
> >         __u64 args[0];
> >  };
> > --
> > 2.36.0.512.ge40c2bad7a-goog
> >
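
As a closing note for reviewers: attaching from userspace boils down to
a BPF_LINK_CREATE command carrying the subsystem name. The sketch below
relies only on the UAPI added in this patch (raw syscall shown since the
libbpf wrappers only come in patch 3; error handling elided):

	/* Userspace attach sketch for the new link type */
	union bpf_attr attr = {};
	int link_fd;

	attr.link_create.prog_fd = prog_fd;	/* loaded rstat flusher */
	attr.link_create.attach_type = BPF_CGROUP_SUBSYS_RSTAT;
	/* subsystem to attach to; matched against ss->name/legacy_name */
	attr.link_create.cgroup_subsys.name = (__u64)(uintptr_t)"memory";
	link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));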


* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10 18:44   ` Tejun Heo
@ 2022-05-10 19:34     ` Yosry Ahmed
  2022-05-10 19:59       ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10 19:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, Linux Kernel Mailing List, Networking,
	bpf, cgroups

On Tue, May 10, 2022 at 11:44 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, May 10, 2022 at 12:17:59AM +0000, Yosry Ahmed wrote:
> > @@ -706,6 +707,9 @@ struct cgroup_subsys {
> >        * specifies the mask of subsystems that this one depends on.
> >        */
> >       unsigned int depends_on;
> > +
> > +     /* used to store bpf programs. */
> > +     struct cgroup_subsys_bpf bpf;
> >  };
>
> Care to elaborate on rationales around associating this with a specific
> cgroup_subsys rather than letting it walk cgroups and access whatever csses
> as needed? I don't think it's a wrong approach or anything but I can think
> of plenty of things that would be interesting without being associated with
a specific subsystem - even all the cpu usage statistics are built into
the cgroup core, and given how e.g. systemd uses cgroups to organize the
applications in the system whether resource control is active or not,
there is a lot of info one can gather about those without being
associated with a specific subsystem.

Hi Tejun,

Thanks so much for taking the time to look into this!

The rationale behind associating this work with cgroup_subsys is that
usually the stats are associated with a resource (e.g. memory, cpu,
etc). For example, if the memory controller is only enabled for a
subtree in a big hierarchy, it would be more efficient to only run BPF
rstat programs for those cgroups, not the entire hierarchy. It
provides a way to control what part of the hierarchy you want to
collect stats for. This is also semantically similar to the
css_rstat_flush() callback.

However, I do see your point about the benefits of collecting stats
that are not associated with any controller. I think there are
multiple options here, and I would love to hear what you prefer:
1. In addition to subsystems, support an "all" or "cgroup" attach
point that loads BPF rstat flush programs that will run for all
cgroups.
2. Simplify the interface so that all BPF rstat flush programs run for
all cgroups, and add the subsystem association later if a need arises.
3. Instead of attaching BPF programs to a subsystem, attach them to a
cgroup. This gives more flexibility, but also makes lifetime handling
of programs more complicated and error-prone. I can also see most use
cases (including ours) attaching programs to the root cgroup anyway.
In this case, we waste space by storing pointers to the same program
in every cgroup, and have unnecessary complexity in the code.

Let me know what you think!

>
> Thanks.
>
> --
> tejun


* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10 19:34     ` Yosry Ahmed
@ 2022-05-10 19:59       ` Tejun Heo
  2022-05-10 20:43         ` Yosry Ahmed
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2022-05-10 19:59 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, Linux Kernel Mailing List, Networking,
	bpf, cgroups

Hello,

On Tue, May 10, 2022 at 12:34:42PM -0700, Yosry Ahmed wrote:
> The rationale behind associating this work with cgroup_subsys is that
> usually the stats are associated with a resource (e.g. memory, cpu,
> etc). For example, if the memory controller is only enabled for a
> subtree in a big hierarchy, it would be more efficient to only run BPF
> rstat programs for those cgroups, not the entire hierarchy. It
> provides a way to control what part of the hierarchy you want to
> collect stats for. This is also semantically similar to the
> css_rstat_flush() callback.

Hmm... one major point of rstat is not having to worry about these things
because we iterate what's been active rather than what exists. Now, this
isn't entirely true because we share the same updated list for all sources.
This is a trade-off which makes sense because 1. the number of cgroups to
iterate each cycle is generally really low anyway 2. different controllers
often get enabled together. If the balance tilts towards "we're walking too
many due to the sharing of updated list across different sources", the
solution would be splitting the updated list so that we make the walk finer
grained.

Note that the above doesn't really affect the conceptual model. It's purely
an optimization decision. Tying these things to a cgroup_subsys does affect
the conceptual model and, in this case, the userland API for a performance
consideration which can be solved otherwise.

So, let's please keep this simple and in the (unlikely) case that the
overhead becomes an issue, solve it from rstat operation side.

Thanks.

-- 
tejun


* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10 19:59       ` Tejun Heo
@ 2022-05-10 20:43         ` Yosry Ahmed
  2022-05-10 21:01           ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10 20:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, Linux Kernel Mailing List, Networking,
	bpf, cgroups

On Tue, May 10, 2022 at 12:59 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, May 10, 2022 at 12:34:42PM -0700, Yosry Ahmed wrote:
> > The rationale behind associating this work with cgroup_subsys is that
> > usually the stats are associated with a resource (e.g. memory, cpu,
> > etc). For example, if the memory controller is only enabled for a
> > subtree in a big hierarchy, it would be more efficient to only run BPF
> > rstat programs for those cgroups, not the entire hierarchy. It
> > provides a way to control what part of the hierarchy you want to
> > collect stats for. This is also semantically similar to the
> > css_rstat_flush() callback.
>
> Hmm... one major point of rstat is not having to worry about these things
> because we iterate what's been active rather than what exists. Now, this
> isn't entirely true because we share the same updated list for all sources.
> This is a trade-off which makes sense because 1. the number of cgroups to
> iterate each cycle is generally really low anyway 2. different controllers
> often get enabled together. If the balance tilts towards "we're walking too
> many due to the sharing of updated list across different sources", the
> solution would be splitting the updated list so that we make the walk finer
> grained.
>
> Note that the above doesn't really affect the conceptual model. It's purely
> an optimization decision. Tying these things to a cgroup_subsys does affect
> the conceptual model and, in this case, the userland API for a performance
> consideration which can be solved otherwise.
>
> So, let's please keep this simple and in the (unlikely) case that the
> overhead becomes an issue, solve it from rstat operation side.
>
> Thanks.

I assume if we do this optimization, and have separate updated lists
for controllers, we will still have a "core" updated list that is not
tied to any controller. Is this correct?

If yes, then we can make the interface controller-agnostic (a global
list of BPF flushers). If we do the optimization later, we tie BPF
stats to the "core" updated list. We can even extend the userland
interface then to allow for controller-specific BPF stats if found
useful.

If not, and there will only be controller-specific updated lists, then
we might need to maintain a "core" updated list just for the sake
of BPF programs, which I don't think would be favorable.

What do you think? Either way, I will try to document our discussion
outcome in the commit message (and maybe the code), so that
if-and-when this optimization is made, we can come back to it.


>
> --
> tejun


* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10 20:43         ` Yosry Ahmed
@ 2022-05-10 21:01           ` Tejun Heo
  2022-05-10 21:55             ` Yosry Ahmed
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2022-05-10 21:01 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, Linux Kernel Mailing List, Networking,
	bpf, cgroups

Hello,

On Tue, May 10, 2022 at 01:43:46PM -0700, Yosry Ahmed wrote:
> I assume if we do this optimization, and have separate updated lists
> for controllers, we will still have a "core" updated list that is not
> tied to any controller. Is this correct?

Or we can create a dedicated updated list for the bpf progs, or even
multiple for groups of them and so on.

> If yes, then we can make the interface controller-agnostic (a global
> list of BPF flushers). If we do the optimization later, we tie BPF
> stats to the "core" updated list. We can even extend the userland
> interface then to allow for controller-specific BPF stats if found
> useful.

We'll need that anyway as cpustats are tied to the cgroups themselves rather
than the cpu controller.

> If not, and there will only be controller-specific updated lists then,
> then we might need to maintain a "core" updated list just for the sake
> of BPF programs, which I don't think would be favorable.

If needed, that's fine actually.

> What do you think? Either-way, I will try to document our discussion
> outcome in the commit message (and maybe the code), so that
> if-and-when this optimization is made, we can come back to it.

So, the main focus is keeping the userspace interface as simple as possible
and solving performance issues on the rstat side. If we need however many
updated lists to do that, that's all fine. FWIW, the experience up until now
has been consistent with the assumptions that the current implementation
makes, and I haven't seen any real world cases where the shared updated list
is problematic.

Thanks.

-- 
tejun


* Re: [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter
  2022-05-10 18:54   ` Tejun Heo
@ 2022-05-10 21:12     ` Hao Luo
  2022-05-10 22:07       ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Hao Luo @ 2022-05-10 21:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yosry Ahmed, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, linux-kernel, netdev, bpf, cgroups

Hello Tejun,

On Tue, May 10, 2022 at 11:54 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, May 10, 2022 at 12:18:06AM +0000, Yosry Ahmed wrote:
> > From: Hao Luo <haoluo@google.com>
> >
> > Introduce a new type of iter prog: cgroup. Unlike other bpf_iter, this
> > iter doesn't iterate a set of kernel objects. Instead, it is supposed to
> > be parameterized by a cgroup id and prints only that cgroup. So one
> > needs to specify a target cgroup id when attaching this iter. The target
> > cgroup's state can be read out via a link of this iter.
>
> Is there a reason why this can't be a proper iterator which supports
> lseek64() to locate a specific cgroup?
>

There are two reasons:

- Bpf_iter assumes no_llseek. I haven't looked closely at why this is
so or whether we can add support for it.

- Second, the name 'iter' in this patch is misleading. What this patch
really does is reuse the dumping functionality of bpf_iter.
'Dumper' is a better name. We want to create one file in bpffs for
each cgroup. We are essentially just iterating a set of a single
element.
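
To make the consumption model concrete, userspace reads such a dumper
like any other file. A minimal sketch (the bpffs pin path below is
hypothetical):

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		char buf[4096];
		ssize_t n;
		/* one pinned file per cgroup; the path is illustrative */
		int fd = open("/sys/fs/bpf/vmscan_stats/cg1", O_RDONLY);

		if (fd < 0)
			return 1;
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			fwrite(buf, 1, n, stdout);
		close(fd);
		return 0;
	}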

> Thanks.

>
> --
> tejun


* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10 21:01           ` Tejun Heo
@ 2022-05-10 21:55             ` Yosry Ahmed
  2022-05-10 22:09               ` Tejun Heo
  0 siblings, 1 reply; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10 21:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, Linux Kernel Mailing List, Networking,
	bpf, cgroups

On Tue, May 10, 2022 at 2:01 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, May 10, 2022 at 01:43:46PM -0700, Yosry Ahmed wrote:
> > I assume if we do this optimization, and have separate updated lists
> > for controllers, we will still have a "core" updated list that is not
> > tied to any controller. Is this correct?
>
> Or we can create a dedicated updated list for the bpf progs, or even
> multiple for groups of them and so on.
>
> > If yes, then we can make the interface controller-agnostic (a global
> > list of BPF flushers). If we do the optimization later, we tie BPF
> > stats to the "core" updated list. We can even extend the userland
> > interface then to allow for controller-specific BPF stats if found
> > useful.
>
> We'll need that anyway as cpustats are tied to the cgroups themselves rather
> than the cpu controller.
>
> > If not, and there will only be controller-specific updated lists, then
> > we might need to maintain a "core" updated list just for the sake
> > of BPF programs, which I don't think would be favorable.
>
> If needed, that's fine actually.
>
> > What do you think? Either way, I will try to document our discussion
> > outcome in the commit message (and maybe the code), so that
> > if-and-when this optimization is made, we can come back to it.
>
> So, the main focus is keeping the userspace interface as simple as possible
> and solving performance issues on the rstat side. If we need however many
> updated lists to do that, that's all fine. FWIW, the experience up until now
> has been consistent with the assumptions that the current implementation
> makes, and I haven't seen any real world cases where the shared updated list
> is problematic.
>

Thanks again for your insights and time!

That's great to hear. I am all in for making the userspace interface
simpler. I will rework this patch series so that the BPF programs just
attach to "rstat" and send a V1.
Any other concerns you have that you think I should address in V1?

> Thanks.
>
> --
> tejun


* Re: [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter
  2022-05-10 21:12     ` Hao Luo
@ 2022-05-10 22:07       ` Tejun Heo
  2022-05-10 22:49         ` Hao Luo
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2022-05-10 22:07 UTC (permalink / raw)
  To: Hao Luo
  Cc: Yosry Ahmed, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, linux-kernel, netdev, bpf, cgroups

Hello,

On Tue, May 10, 2022 at 02:12:16PM -0700, Hao Luo wrote:
> > Is there a reason why this can't be a proper iterator which supports
> > lseek64() to locate a specific cgroup?
> >
> 
> There are two reasons:
> 
> - Bpf_iter assumes no_llseek. I haven't looked closely at why this is
> so or whether we can add support for it.
> 
> - Second, the name 'iter' in this patch is misleading. What this patch
> really does is reuse the dumping functionality of bpf_iter.
> 'Dumper' is a better name. We want to create one file in bpffs for
> each cgroup. We are essentially just iterating a set of a single
> element.

I see. I'm just shooting in the dark without context but at least in
principle there's no reason why cgroups wouldn't be iterable, so it might be
something worth at least thinking about before baking in the interface.

Thanks.

-- 
tejun


* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10 21:55             ` Yosry Ahmed
@ 2022-05-10 22:09               ` Tejun Heo
  2022-05-10 22:10                 ` Yosry Ahmed
  0 siblings, 1 reply; 30+ messages in thread
From: Tejun Heo @ 2022-05-10 22:09 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, Linux Kernel Mailing List, Networking,
	bpf, cgroups

Hello,

On Tue, May 10, 2022 at 02:55:32PM -0700, Yosry Ahmed wrote:
> That's great to hear. I am all in for making the userspace interface
> simpler. I will rework this patch series so that the BPF programs just
> attach to "rstat" and send a V1.
> Any other concerns you have that you think I should address in V1?

Not that I can think of right now but my bpf side insight is really limited,
so it might be worthwhile to wait for some bpf folks to chime in?

Thanks.

-- 
tejun


* Re: [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type
  2022-05-10 22:09               ` Tejun Heo
@ 2022-05-10 22:10                 ` Yosry Ahmed
  0 siblings, 0 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-10 22:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, Linux Kernel Mailing List, Networking,
	bpf, cgroups

On Tue, May 10, 2022 at 3:09 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, May 10, 2022 at 02:55:32PM -0700, Yosry Ahmed wrote:
> > That's great to hear. I am all in for making the userspace interface
> > simpler. I will rework this patch series so that the BPF programs just
> > attach to "rstat" and send a V1.
> > Any other concerns you have that you think I should address in V1?
>
> Not that I can think of right now but my bpf side insight is really limited,
> so it might be worthwhile to wait for some bpf folks to chime in?
>

Sounds good. Will wait for feedback on the BPF side of things before I
send a V1.

> Thanks.
>
> --
> tejun


* Re: [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter
  2022-05-10 22:07       ` Tejun Heo
@ 2022-05-10 22:49         ` Hao Luo
  0 siblings, 0 replies; 30+ messages in thread
From: Hao Luo @ 2022-05-10 22:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Yosry Ahmed, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Zefan Li, Johannes Weiner, Shuah Khan,
	Roman Gushchin, Michal Hocko, Stanislav Fomichev, David Rientjes,
	Greg Thelen, Shakeel Butt, linux-kernel, netdev, bpf, cgroups

On Tue, May 10, 2022 at 3:07 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, May 10, 2022 at 02:12:16PM -0700, Hao Luo wrote:
> > > Is there a reason why this can't be a proper iterator which supports
> > > lseek64() to locate a specific cgroup?
> > >
> >
> > There are two reasons:
> >
> > - Bpf_iter assumes no_llseek. I haven't looked closely at why this is
> > so or whether we can add support for it.
> >
> > - Second, the name 'iter' in this patch is misleading. What this patch
> > really does is reuse the dumping functionality of bpf_iter.
> > 'Dumper' is a better name. We want to create one file in bpffs for
> > each cgroup. We are essentially just iterating a set of a single
> > element.
>
> I see. I'm just shooting in the dark without context but at least in
> principle there's no reason why cgroups wouldn't be iterable, so it might be
> something worth at least thinking about before baking in the interface.
>

Yep. Conceptually there should be no problem iterating cgroups in the
system. It may be better to have two independent bpf objects: bpf_iter
and bpf_dumper. In our use case, we want bpf_dumper, which just
exports data out through the fs interface.

Hao


* Re: [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection
  2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
                   ` (8 preceding siblings ...)
  2022-05-10  0:18 ` [RFC PATCH bpf-next 9/9] selftest/bpf: add a selftest for cgroup hierarchical stats Yosry Ahmed
@ 2022-05-13  7:16 ` Yosry Ahmed
  9 siblings, 0 replies; 30+ messages in thread
From: Yosry Ahmed @ 2022-05-13  7:16 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Tejun Heo, Zefan Li, Johannes Weiner,
	Shuah Khan, Roman Gushchin, Michal Hocko
  Cc: Stanislav Fomichev, David Rientjes, Greg Thelen, Shakeel Butt,
	Linux Kernel Mailing List, Networking, bpf, cgroups

I have made some significant changes to the BPF side of this. I will
send an RFC v2 soon with those changes, incorporating the feedback on
the cgroup side that I got from Tejun. Hold off on reviewing this
version.


On Mon, May 9, 2022 at 5:18 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> This patch series allows for using bpf to collect hierarchical cgroup
> stats efficiently by integrating with the rstat framework. The rstat
> framework provides an efficient way to collect cgroup stats and
> propagate them through the cgroup hierarchy.
>
> The last patch is a selftest that demonstrates the entire workflow.
> The workflow consists of:
> - bpf programs that collect per-cpu per-cgroup stats (tracing progs).
> - bpf rstat flusher that contains the logic for aggregating stats
>   across cpus and across the cgroup hierarchy.
> - bpf cgroup_iter responsible for outputting the stats to userspace
>   through reading a file in bpffs.
>
> The first 3 patches include the new bpf rstat flusher program type and
> the needed support in rstat code and libbpf. The rstat flusher program
> is a callback that the rstat framework makes to bpf when a stat flush is
> ongoing, similar to the css_rstat_flush() callback that rstat makes to
> cgroup controllers. Each callback is parameterized by a (cgroup, cpu)
> pair that has been updated. The program contains the logic for
> aggregating the stats across cpus and across the cgroup hierarchy.
> These programs can be attached to any cgroup subsystem, not only the
> ones that implement the css_rstat_flush() callback in the kernel. This
> gives bpf programs more flexibility, and more isolation from the kernel
> implementation.
>
> The following 2 patches add necessary helpers for the stats collection
> workflow. Helpers that call into cgroup_rstat_updated() and
> cgroup_rstat_flush() are added to allow bpf programs collecting stats to
> tell the rstat framework that a cgroup has been updated, and to allow
> bpf programs outputting stats to tell the rstat framework to flush the
> stats before they are displayed to the user. An additional
> bpf_map_lookup_percpu_elem is introduced to allow rstat flusher programs
> to access percpu stats of the cpu being flushed.
>
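For reference, the collection side pairs with these helpers roughly as
in the sketch below; the helper name is assumed from the kernel function
it wraps (patch 4), and the tracepoint is illustrative:

	SEC("tp_btf/mm_vmscan_memcg_reclaim_end")
	int BPF_PROG(vmscan_end)
	{
		struct task_struct *task = bpf_get_current_task_btf();
		struct cgroup *cgrp = task->cgroups->dfl_cgrp;

		/* ... record the per-cpu delta in a percpu map ... */
		bpf_cgroup_rstat_updated(cgrp);	/* assumed helper name */
		return 0;
	}
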
> The following 3 patches add the cgroup_iter program type (v2). This was
> originally introduced by Hao as a part of a different series [1].
> Their use case is better showcased as part of this patch series. We also
> make cgroup_get_from_id() cgroup v1 friendly to allow cgroup_iter programs
> to display stats for cgroup v1 as well. This small change makes the
> entire workflow cgroup v1 friendly without any other dedicated changes.
>
> The final patch is a selftest demonstrating the entire workflow with a
> set of bpf programs that collect per-cgroup latency of memcg reclaim.
>
> [1]https://lore.kernel.org/lkml/20220225234339.2386398-9-haoluo@google.com/
>
>
> Hao Luo (2):
>   cgroup: Add cgroup_put() in !CONFIG_CGROUPS case
>   bpf: Introduce cgroup iter
>
> Yosry Ahmed (7):
>   bpf: introduce CGROUP_SUBSYS_RSTAT program type
>   cgroup: bpf: flush bpf stats on rstat flush
>   libbpf: Add support for rstat progs and links
>   bpf: add bpf rstat helpers
>   bpf: add bpf_map_lookup_percpu_elem() helper
>   cgroup: add v1 support to cgroup_get_from_id()
>   bpf: add a selftest for cgroup hierarchical stats collection
>
>  include/linux/bpf-cgroup-subsys.h             |  35 ++
>  include/linux/bpf.h                           |   4 +
>  include/linux/bpf_types.h                     |   2 +
>  include/linux/cgroup-defs.h                   |   4 +
>  include/linux/cgroup.h                        |   5 +
>  include/uapi/linux/bpf.h                      |  45 +++
>  kernel/bpf/Makefile                           |   3 +-
>  kernel/bpf/arraymap.c                         |  11 +-
>  kernel/bpf/cgroup_iter.c                      | 148 ++++++++
>  kernel/bpf/cgroup_subsys.c                    | 212 +++++++++++
>  kernel/bpf/hashtab.c                          |  25 +-
>  kernel/bpf/helpers.c                          |  56 +++
>  kernel/bpf/syscall.c                          |   6 +
>  kernel/bpf/verifier.c                         |   6 +
>  kernel/cgroup/cgroup.c                        |  16 +-
>  kernel/cgroup/rstat.c                         |  11 +
>  scripts/bpf_doc.py                            |   2 +
>  tools/include/uapi/linux/bpf.h                |  45 +++
>  tools/lib/bpf/bpf.c                           |   3 +
>  tools/lib/bpf/bpf.h                           |   3 +
>  tools/lib/bpf/libbpf.c                        |  35 ++
>  tools/lib/bpf/libbpf.h                        |   3 +
>  tools/lib/bpf/libbpf.map                      |   1 +
>  .../test_cgroup_hierarchical_stats.c          | 335 ++++++++++++++++++
>  tools/testing/selftests/bpf/progs/bpf_iter.h  |   7 +
>  .../selftests/bpf/progs/cgroup_vmscan.c       | 211 +++++++++++
>  26 files changed, 1212 insertions(+), 22 deletions(-)
>  create mode 100644 include/linux/bpf-cgroup-subsys.h
>  create mode 100644 kernel/bpf/cgroup_iter.c
>  create mode 100644 kernel/bpf/cgroup_subsys.c
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
>  create mode 100644 tools/testing/selftests/bpf/progs/cgroup_vmscan.c
>
> --
> 2.36.0.512.ge40c2bad7a-goog
>


end of thread

Thread overview: 30+ messages
2022-05-10  0:17 [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
2022-05-10  0:17 ` [RFC PATCH bpf-next 1/9] bpf: introduce CGROUP_SUBSYS_RSTAT program type Yosry Ahmed
2022-05-10 18:07   ` Yosry Ahmed
2022-05-10 19:21     ` Yosry Ahmed
2022-05-10 18:44   ` Tejun Heo
2022-05-10 19:34     ` Yosry Ahmed
2022-05-10 19:59       ` Tejun Heo
2022-05-10 20:43         ` Yosry Ahmed
2022-05-10 21:01           ` Tejun Heo
2022-05-10 21:55             ` Yosry Ahmed
2022-05-10 22:09               ` Tejun Heo
2022-05-10 22:10                 ` Yosry Ahmed
2022-05-10  0:18 ` [RFC PATCH bpf-next 2/9] cgroup: bpf: flush bpf stats on rstat flush Yosry Ahmed
2022-05-10 18:45   ` Tejun Heo
2022-05-10  0:18 ` [RFC PATCH bpf-next 3/9] libbpf: Add support for rstat progs and links Yosry Ahmed
2022-05-10  0:18 ` [RFC PATCH bpf-next 4/9] bpf: add bpf rstat helpers Yosry Ahmed
2022-05-10  0:18 ` [RFC PATCH bpf-next 5/9] bpf: add bpf_map_lookup_percpu_elem() helper Yosry Ahmed
2022-05-10  0:18 ` [RFC PATCH bpf-next 6/9] cgroup: add v1 support to cgroup_get_from_id() Yosry Ahmed
2022-05-10 18:33   ` Tejun Heo
2022-05-10 18:36     ` Yosry Ahmed
2022-05-10  0:18 ` [RFC PATCH bpf-next 7/9] cgroup: Add cgroup_put() in !CONFIG_CGROUPS case Yosry Ahmed
2022-05-10 18:25   ` Hao Luo
2022-05-10  0:18 ` [RFC PATCH bpf-next 8/9] bpf: Introduce cgroup iter Yosry Ahmed
2022-05-10 18:25   ` Hao Luo
2022-05-10 18:54   ` Tejun Heo
2022-05-10 21:12     ` Hao Luo
2022-05-10 22:07       ` Tejun Heo
2022-05-10 22:49         ` Hao Luo
2022-05-10  0:18 ` [RFC PATCH bpf-next 9/9] selftest/bpf: add a selftest for cgroup hierarchical stats Yosry Ahmed
2022-05-13  7:16 ` [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection Yosry Ahmed
