From: Roman Gushchin <guro@fb.com>
To: <bpf@vger.kernel.org>
Cc: <netdev@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>, <kernel-team@fb.com>,
	<linux-kernel@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>, <linux-mm@kvack.org>
Subject: Re: [PATCH bpf-next v4 00/30] bpf: switch to memcg-based memory accounting
Date: Fri, 21 Aug 2020 15:20:36 -0700
Message-ID: <20200821222036.GB2250889@carbon.dhcp.thefacebook.com>
In-Reply-To: <20200821150134.2581465-1-guro@fb.com>

On Fri, Aug 21, 2020 at 08:01:04AM -0700, Roman Gushchin wrote:
> Currently bpf is using the memlock rlimit for the memory accounting.
> This approach has its downsides and over time has created a significant
> number of problems:
> 
> 1) The limit is per-user, but because most bpf operations are performed
>    as root, the limit has little value.
> 
> 2) It's hard to come up with a specific maximum value, especially because
>    the counter is shared with non-bpf users (e.g. mlock() users).
>    Any specific value is either too low and creates false failures
>    or too high and useless.
> 
> 3) Charging is not connected to the actual memory allocation. Bpf code
>    has to manually calculate the estimated cost and precharge the counter,
>    and then take care of uncharging, including on all failure paths.
>    This adds to the code complexity and makes it easy to leak a charge.
> 
> 4) There is no simple way of getting the current value of the counter.
>    We've used drgn for it, but it's far from being convenient.
> 
> 5) A cryptic -EPERM is returned on exceeding the limit. Libbpf even had
>    a function to "explain" this case for users.
> 
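
To make problems 3) and 5) concrete, here is a minimal userspace toy model
of precharge-style accounting. It is only a sketch: the toy_* names and the
struct are hypothetical and are not the kernel's code. It shows the cost
being estimated up front, charged before the allocation, and uncharged by
hand on the failure path, with -EPERM as the only signal when the limit is
hit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy per-user counter standing in for the memlock accounting. */
struct toy_user {
	size_t locked;	/* bytes currently charged */
	size_t limit;	/* stand-in for RLIMIT_MEMLOCK */
};

/* Precharge: reserve 'cost' bytes before anything is allocated. */
int toy_charge(struct toy_user *u, size_t cost)
{
	assert(u != NULL && u->locked <= u->limit);
	if (cost > u->limit - u->locked)
		return -EPERM;	/* the errno the cover letter calls cryptic */
	u->locked += cost;
	return 0;
}

void toy_uncharge(struct toy_user *u, size_t cost)
{
	assert(u != NULL && u->locked >= cost);
	u->locked -= cost;
}

/* Hypothetical map creation: estimate, precharge, allocate, and
 * uncharge by hand if the allocation fails. */
void *toy_map_alloc(struct toy_user *u, size_t nr, size_t elem_size)
{
	size_t cost = nr * elem_size;	/* an estimate; overflow ignored here */
	void *mem;

	if (toy_charge(u, cost))
		return NULL;
	mem = calloc(nr, elem_size);
	if (!mem) {
		toy_uncharge(u, cost);	/* forgetting this leaks the charge */
		return NULL;
	}
	return mem;
}

/* The caller must pass the same sizes again so the uncharge matches. */
void toy_map_free(struct toy_user *u, void *mem, size_t nr, size_t elem_size)
{
	free(mem);
	toy_uncharge(u, nr * elem_size);
}
```

Every allocation site needs a matching charge/uncharge pair like this, which
is the bookkeeping that automatic charging at allocation time removes.
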
> In order to overcome these problems, let's switch to memcg-based
> memory accounting of bpf objects. With the recent addition of percpu
> memory accounting, it's now possible to provide comprehensive accounting
> of the memory used by bpf programs and maps.
> 
> This approach has the following advantages:
> 1) The limit is per-cgroup and hierarchical. It's way more flexible and allows
>    better control over memory usage by different workloads. Of course, it
>    requires cgroups and kernel memory accounting to be enabled and a properly
>    configured cgroup tree, but that's the default on a modern Linux system.
> 
> 2) The actual memory consumption is taken into account. Charging happens
>    automatically at allocation time if the __GFP_ACCOUNT flag is passed.
>    Uncharging is also performed automatically when the memory is released.
>    So the code on the bpf side becomes simpler and safer.
> 
> 3) There is a simple way to get the current value and statistics.
> 
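
As a sketch of advantage 3), assuming a cgroup v2 hierarchy in which each
group exposes its usage as a single number in a memory.current file, the
current value can be read with a few lines of userspace C. The helper name
read_cgroup_counter is hypothetical, and the path in the usage note is an
assumption about where the hierarchy is mounted.

```c
#include <assert.h>
#include <stdio.h>

/* Read one unsigned integer from a cgroup control file such as
 * memory.current. Returns 0 on success, -1 on any failure. */
int read_cgroup_counter(const char *path, unsigned long long *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%llu", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass something like /sys/fs/cgroup/<group>/memory.current,
with <group> replaced by the cgroup of interest.
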
> In general, if a process performs a bpf operation (e.g. creates or updates
> a map), its memory cgroup is charged. However, map updates performed from
> an interrupt context are charged to the memory cgroup that contained
> the process that created the map.
> 
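
The charging rule in the paragraph above can be written down as a small toy
model. This is an illustration only: the toy_* types and functions are
hypothetical, and the real kernel selects the target cgroup through its own
memcg machinery rather than through code like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_memcg {
	const char *name;
	size_t charged;		/* bytes charged to this cgroup */
};

struct toy_map {
	struct toy_memcg *creator_memcg;	/* remembered at map creation */
};

/* Pick the cgroup to charge: the caller's own cgroup in process
 * context, the map creator's cgroup in interrupt context. */
struct toy_memcg *toy_charge_target(struct toy_memcg *current_memcg,
				    struct toy_map *map, bool in_interrupt)
{
	assert(map != NULL && map->creator_memcg != NULL);
	if (in_interrupt)
		return map->creator_memcg;
	assert(current_memcg != NULL);
	return current_memcg;
}

/* Charge 'bytes' for a map update to whichever cgroup the rule selects. */
void toy_map_update(struct toy_memcg *current_memcg, struct toy_map *map,
		    bool in_interrupt, size_t bytes)
{
	toy_charge_target(current_memcg, map, in_interrupt)->charged += bytes;
}
```

In interrupt context there is no meaningful "current" process to bill, which
is why the map has to remember the cgroup of its creator.
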
> Providing a 1:1 replacement for the rlimit-based memory accounting is
> a non-goal of this patchset. Users and memory cgroups are completely
> orthogonal, so it's not possible even in theory.
> Memcg-based memory accounting requires a properly configured cgroup tree
> to be actually useful. However, that is how memory is managed
> on a modern Linux system.
> 
> 
> The patchset consists of the following parts:
> 1) an auxiliary patch by Johannes, which adds the ability to charge
>    a custom memory cgroup from an interrupt context
> 2) memcg-based accounting for various bpf objects: progs and maps
> 3) removal of the rlimit-based accounting
> 4) removal of rlimit adjustments in userspace samples

As a note, I've re-sent the first patch from the series as a standalone
patch to linux-mm@, because a similar change is required by another,
unrelated patchset. This should avoid further merge conflicts.

I renamed a few things in that patch, so a v5 of this patchset is expected.
Please don't merge v4. Feedback is highly appreciated though.

Thanks!
