* [PATCH bpf-next 00/35] bpf: switch to memcg-based memory accounting
@ 2020-07-25  0:03 Roman Gushchin
From: Roman Gushchin @ 2020-07-25  0:03 UTC
  To: bpf
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	linux-kernel, Roman Gushchin

Currently bpf uses the memlock rlimit for memory accounting.
This approach has its downsides and over time has created a significant
number of problems:

1) The limit is per-user, but because most bpf operations are performed
   as root, the limit has little value.

2) It's hard to come up with a specific maximum value, especially because
   the counter is shared with non-bpf users (e.g. mlock() users).
   Any specific value is either too low and causes spurious failures,
   or too high and therefore useless.

3) Charging is not connected to the actual memory allocation. Bpf code
   has to manually calculate the estimated cost and precharge the counter,
   and then take care of uncharging, including on all failure paths.
   This adds to the code complexity and makes it easy to leak a charge.

4) There is no simple way of getting the current value of the counter.
   We've used drgn for it, but it's far from convenient.

5) A cryptic -EPERM is returned when the limit is exceeded. Libbpf even
   had a function to "explain" this case to users.
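For illustration, the userspace side of the rlimit-based scheme usually looks
something like the sketch below: a tool raises RLIMIT_MEMLOCK before it loads
any bpf object, so that the precharge in the kernel does not fail. The helper
name raise_memlock_rlimit() is hypothetical and not taken from this series;
the fallback to the hard limit for unprivileged callers is also an assumption
made to keep the sketch usable without root.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper (name not from this series): raise the memlock
 * rlimit before loading bpf objects, as tools had to do under the
 * rlimit-based accounting.
 */
static int raise_memlock_rlimit(void)
{
	struct rlimit unlimited = {
		.rlim_cur = RLIM_INFINITY,
		.rlim_max = RLIM_INFINITY,
	};
	struct rlimit cur;

	/* Privileged callers can simply lift the limit entirely. */
	if (setrlimit(RLIMIT_MEMLOCK, &unlimited) == 0)
		return 0;

	/* Assumed fallback: raise the soft limit up to the hard limit. */
	if (getrlimit(RLIMIT_MEMLOCK, &cur))
		return -1;
	cur.rlim_cur = cur.rlim_max;
	return setrlimit(RLIMIT_MEMLOCK, &cur);
}
```

A tool would call such a helper once at startup, before its first bpf()
syscall. Boilerplate of this kind is what the userspace part of the series
removes.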

To overcome these problems, let's switch to memcg-based memory accounting
of bpf objects. With the recent addition of percpu memory accounting,
it is now possible to provide comprehensive accounting of the memory
used by bpf programs and maps.

This approach has the following advantages:
1) The limit is per-cgroup and hierarchical. It's far more flexible and allows
   better control over memory usage by different workloads.

2) The actual memory consumption is taken into account. Charging happens
   automatically at allocation time if the __GFP_ACCOUNT flag is passed.
   Uncharging is also performed automatically when the memory is released.
   So the code on the bpf side becomes simpler and safer.

3) There is a simple way to get the current value and statistics.
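As a rough sketch of point 3: with cgroup v2, the current memory usage of a
cgroup is exposed as a single integer in its memory.current file, so reading
it needs nothing more than the helper below. The function name
read_counter_file() is hypothetical, and the path in the usage note assumes
the common /sys/fs/cgroup mount point and a made-up cgroup name.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a cgroup interface file that holds a single
 * unsigned integer, such as memory.current.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_counter_file(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;
	parsed = fscanf(f, "%llu", out);
	fclose(f);
	return parsed == 1 ? 0 : -1;
}
```

For example, read_counter_file("/sys/fs/cgroup/my-workload/memory.current", &v)
would return the usage in bytes for a cgroup named my-workload, if such a
cgroup exists. The value covers all memory charged to the cgroup, not only
bpf objects.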

The patchset consists of the following parts:
1) memcg-based accounting for various bpf objects: progs and maps
2) removal of the rlimit-based accounting
3) removal of rlimit adjustments in userspace tools and tests


Roman Gushchin (35):
  bpf: memcg-based memory accounting for bpf progs
  bpf: memcg-based memory accounting for bpf maps
  bpf: refine memcg-based memory accounting for arraymap maps
  bpf: refine memcg-based memory accounting for cpumap maps
  bpf: memcg-based memory accounting for cgroup storage maps
  bpf: refine memcg-based memory accounting for devmap maps
  bpf: refine memcg-based memory accounting for hashtab maps
  bpf: memcg-based memory accounting for lpm_trie maps
  bpf: memcg-based memory accounting for bpf ringbuffer
  bpf: memcg-based memory accounting for socket storage maps
  bpf: refine memcg-based memory accounting for sockmap maps
  bpf: refine memcg-based memory accounting for xskmap maps
  bpf: eliminate rlimit-based memory accounting for arraymap maps
  bpf: eliminate rlimit-based memory accounting for bpf_struct_ops maps
  bpf: eliminate rlimit-based memory accounting for cpumap maps
  bpf: eliminate rlimit-based memory accounting for cgroup storage maps
  bpf: eliminate rlimit-based memory accounting for devmap maps
  bpf: eliminate rlimit-based memory accounting for hashtab maps
  bpf: eliminate rlimit-based memory accounting for lpm_trie maps
  bpf: eliminate rlimit-based memory accounting for queue_stack_maps
    maps
  bpf: eliminate rlimit-based memory accounting for reuseport_array maps
  bpf: eliminate rlimit-based memory accounting for bpf ringbuffer
  bpf: eliminate rlimit-based memory accounting for sock_map maps
  bpf: eliminate rlimit-based memory accounting for stackmap maps
  bpf: eliminate rlimit-based memory accounting for socket storage maps
  bpf: eliminate rlimit-based memory accounting for xskmap maps
  bpf: eliminate rlimit-based memory accounting infra for bpf maps
  bpf: eliminate rlimit-based memory accounting for bpf progs
  bpf: libbpf: cleanup RLIMIT_MEMLOCK usage
  bpf: bpftool: do not touch RLIMIT_MEMLOCK
  bpf: runqslower: don't touch RLIMIT_MEMLOCK
  bpf: selftests: delete bpf_rlimit.h
  bpf: selftests: don't touch RLIMIT_MEMLOCK
  bpf: samples: do not touch RLIMIT_MEMLOCK
  perf: don't touch RLIMIT_MEMLOCK

 include/linux/bpf.h                           |  23 ---
 kernel/bpf/arraymap.c                         |  30 +---
 kernel/bpf/bpf_struct_ops.c                   |  19 +--
 kernel/bpf/core.c                             |  20 +--
 kernel/bpf/cpumap.c                           |  20 +--
 kernel/bpf/devmap.c                           |  23 +--
 kernel/bpf/hashtab.c                          |  33 +---
 kernel/bpf/local_storage.c                    |  38 ++---
 kernel/bpf/lpm_trie.c                         |  17 +-
 kernel/bpf/queue_stack_maps.c                 |  16 +-
 kernel/bpf/reuseport_array.c                  |  12 +-
 kernel/bpf/ringbuf.c                          |  33 ++--
 kernel/bpf/stackmap.c                         |  16 +-
 kernel/bpf/syscall.c                          | 152 ++----------------
 net/core/bpf_sk_storage.c                     |  23 +--
 net/core/sock_map.c                           |  28 ++--
 net/xdp/xskmap.c                              |  13 +-
 samples/bpf/hbm.c                             |   1 -
 samples/bpf/map_perf_test_user.c              |  11 --
 samples/bpf/offwaketime_user.c                |   2 -
 samples/bpf/sockex2_user.c                    |   2 -
 samples/bpf/sockex3_user.c                    |   2 -
 samples/bpf/spintest_user.c                   |   2 -
 samples/bpf/syscall_tp_user.c                 |   2 -
 samples/bpf/task_fd_query_user.c              |   5 -
 samples/bpf/test_lru_dist.c                   |   3 -
 samples/bpf/test_map_in_map_user.c            |   9 --
 samples/bpf/test_overhead_user.c              |   2 -
 samples/bpf/trace_event_user.c                |   2 -
 samples/bpf/tracex2_user.c                    |   6 -
 samples/bpf/tracex3_user.c                    |   6 -
 samples/bpf/tracex4_user.c                    |   6 -
 samples/bpf/tracex5_user.c                    |   3 -
 samples/bpf/tracex6_user.c                    |   3 -
 samples/bpf/xdp1_user.c                       |   6 -
 samples/bpf/xdp_adjust_tail_user.c            |   6 -
 samples/bpf/xdp_monitor_user.c                |   6 -
 samples/bpf/xdp_redirect_cpu_user.c           |   6 -
 samples/bpf/xdp_redirect_map_user.c           |   6 -
 samples/bpf/xdp_redirect_user.c               |   6 -
 samples/bpf/xdp_router_ipv4_user.c            |   6 -
 samples/bpf/xdp_rxq_info_user.c               |   6 -
 samples/bpf/xdp_sample_pkts_user.c            |   6 -
 samples/bpf/xdp_tx_iptunnel_user.c            |   6 -
 samples/bpf/xdpsock_user.c                    |   7 -
 tools/bpf/bpftool/common.c                    |   7 -
 tools/bpf/bpftool/feature.c                   |   2 -
 tools/bpf/bpftool/main.h                      |   2 -
 tools/bpf/bpftool/map.c                       |   2 -
 tools/bpf/bpftool/pids.c                      |   1 -
 tools/bpf/bpftool/prog.c                      |   3 -
 tools/bpf/bpftool/struct_ops.c                |   2 -
 tools/bpf/runqslower/runqslower.c             |  16 --
 tools/lib/bpf/libbpf.c                        |  31 +---
 tools/lib/bpf/libbpf.h                        |   5 -
 tools/perf/builtin-trace.c                    |  10 --
 tools/perf/tests/builtin-test.c               |   6 -
 tools/perf/util/Build                         |   1 -
 tools/perf/util/rlimit.c                      |  29 ----
 tools/perf/util/rlimit.h                      |   6 -
 tools/testing/selftests/bpf/bench.c           |  16 --
 tools/testing/selftests/bpf/bpf_rlimit.h      |  28 ----
 .../selftests/bpf/flow_dissector_load.c       |   1 -
 .../selftests/bpf/get_cgroup_id_user.c        |   1 -
 .../bpf/prog_tests/select_reuseport.c         |   1 -
 .../selftests/bpf/prog_tests/sk_lookup.c      |   1 -
 .../selftests/bpf/progs/bpf_iter_bpf_map.c    |   5 +-
 .../selftests/bpf/progs/map_ptr_kern.c        |   5 -
 tools/testing/selftests/bpf/test_btf.c        |   1 -
 .../selftests/bpf/test_cgroup_storage.c       |   1 -
 tools/testing/selftests/bpf/test_dev_cgroup.c |   1 -
 tools/testing/selftests/bpf/test_lpm_map.c    |   1 -
 tools/testing/selftests/bpf/test_lru_map.c    |   1 -
 tools/testing/selftests/bpf/test_maps.c       |   1 -
 tools/testing/selftests/bpf/test_netcnt.c     |   1 -
 tools/testing/selftests/bpf/test_progs.c      |   1 -
 .../selftests/bpf/test_skb_cgroup_id_user.c   |   1 -
 tools/testing/selftests/bpf/test_sock.c       |   1 -
 tools/testing/selftests/bpf/test_sock_addr.c  |   1 -
 .../testing/selftests/bpf/test_sock_fields.c  |   1 -
 .../selftests/bpf/test_socket_cookie.c        |   1 -
 tools/testing/selftests/bpf/test_sockmap.c    |   1 -
 tools/testing/selftests/bpf/test_sysctl.c     |   1 -
 tools/testing/selftests/bpf/test_tag.c        |   1 -
 .../bpf/test_tcp_check_syncookie_user.c       |   1 -
 .../testing/selftests/bpf/test_tcpbpf_user.c  |   1 -
 .../selftests/bpf/test_tcpnotify_user.c       |   1 -
 tools/testing/selftests/bpf/test_verifier.c   |   1 -
 .../testing/selftests/bpf/test_verifier_log.c |   2 -
 tools/testing/selftests/bpf/xdping.c          |   6 -
 tools/testing/selftests/net/reuseport_bpf.c   |  20 ---
 91 files changed, 97 insertions(+), 782 deletions(-)
 delete mode 100644 tools/perf/util/rlimit.c
 delete mode 100644 tools/perf/util/rlimit.h
 delete mode 100644 tools/testing/selftests/bpf/bpf_rlimit.h

-- 
2.26.2


Thread overview: 37+ messages
2020-07-25  0:03 [PATCH bpf-next 00/35] bpf: switch to memcg-based memory accounting Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 01/35] bpf: memcg-based memory accounting for bpf progs Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 02/35] bpf: memcg-based memory accounting for bpf maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 03/35] bpf: refine memcg-based memory accounting for arraymap maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 04/35] bpf: refine memcg-based memory accounting for cpumap maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 05/35] bpf: memcg-based memory accounting for cgroup storage maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 06/35] bpf: refine memcg-based memory accounting for devmap maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 07/35] bpf: refine memcg-based memory accounting for hashtab maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 08/35] bpf: memcg-based memory accounting for lpm_trie maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 09/35] bpf: memcg-based memory accounting for bpf ringbuffer Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 10/35] bpf: memcg-based memory accounting for socket storage maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 11/35] bpf: refine memcg-based memory accounting for sockmap maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 12/35] bpf: refine memcg-based memory accounting for xskmap maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 13/35] bpf: eliminate rlimit-based memory accounting for arraymap maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 14/35] bpf: eliminate rlimit-based memory accounting for bpf_struct_ops maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 15/35] bpf: eliminate rlimit-based memory accounting for cpumap maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 16/35] bpf: eliminate rlimit-based memory accounting for cgroup storage maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 17/35] bpf: eliminate rlimit-based memory accounting for devmap maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 18/35] bpf: eliminate rlimit-based memory accounting for hashtab maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 19/35] bpf: eliminate rlimit-based memory accounting for lpm_trie maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 20/35] bpf: eliminate rlimit-based memory accounting for queue_stack_maps maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 21/35] bpf: eliminate rlimit-based memory accounting for reuseport_array maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 22/35] bpf: eliminate rlimit-based memory accounting for bpf ringbuffer Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 23/35] bpf: eliminate rlimit-based memory accounting for sock_map maps Roman Gushchin
2020-07-25  0:03 ` [PATCH bpf-next 24/35] bpf: eliminate rlimit-based memory accounting for stackmap maps Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 25/35] bpf: eliminate rlimit-based memory accounting for socket storage maps Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 26/35] bpf: eliminate rlimit-based memory accounting for xskmap maps Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 27/35] bpf: eliminate rlimit-based memory accounting infra for bpf maps Roman Gushchin
2020-07-25  4:10   ` kernel test robot
2020-07-25  0:04 ` [PATCH bpf-next 28/35] bpf: eliminate rlimit-based memory accounting for bpf progs Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 29/35] bpf: libbpf: cleanup RLIMIT_MEMLOCK usage Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 30/35] bpf: bpftool: do not touch RLIMIT_MEMLOCK Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 31/35] bpf: runqslower: don't " Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 32/35] bpf: selftests: delete bpf_rlimit.h Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 33/35] bpf: selftests: don't touch RLIMIT_MEMLOCK Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 34/35] bpf: samples: do not " Roman Gushchin
2020-07-25  0:04 ` [PATCH bpf-next 35/35] perf: don't " Roman Gushchin
