From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: "David S. Miller" <davem@davemloft.net>,
	Andrii Nakryiko <andrii@kernel.org>, Tejun Heo <tj@kernel.org>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Delyan Kratunov <delyank@fb.com>, linux-mm <linux-mm@kvack.org>,
	bpf <bpf@vger.kernel.org>, Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH v4 bpf-next 15/15] bpf: Introduce sysctl kernel.bpf_force_dyn_alloc.
Date: Mon, 29 Aug 2022 15:27:45 -0700
Message-ID: <CAADnVQ+vcSmbE=AydXiNTRo1fYFsCA1bPg9ypjVdpYTAUrs2AQ@mail.gmail.com>
In-Reply-To: <f0e3e3ab-99b7-4d87-4b5a-b71ca7724310@iogearbox.net>

On Mon, Aug 29, 2022 at 3:02 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 8/26/22 4:44 AM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > Introduce sysctl kernel.bpf_force_dyn_alloc to force dynamic allocation in bpf
> > hash map. All selftests/bpf should pass with bpf_force_dyn_alloc 0 or 1 and all
> > bpf programs (both sleepable and not) should not see any functional difference.
> > The sysctl's observable behavior should only be improved memory usage.
> >
> > Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >   include/linux/filter.h | 2 ++
> >   kernel/bpf/core.c      | 2 ++
> >   kernel/bpf/hashtab.c   | 5 +++++
> >   kernel/bpf/syscall.c   | 9 +++++++++
> >   4 files changed, 18 insertions(+)
> >
> > diff --git a/include/linux/filter.h b/include/linux/filter.h
> > index a5f21dc3c432..eb4d4a0c0bde 100644
> > --- a/include/linux/filter.h
> > +++ b/include/linux/filter.h
> > @@ -1009,6 +1009,8 @@ bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk,
> >   }
> >   #endif
> >
> > +extern int bpf_force_dyn_alloc;
> > +
> >   #ifdef CONFIG_BPF_JIT
> >   extern int bpf_jit_enable;
> >   extern int bpf_jit_harden;
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 639437f36928..a13e78ea4b90 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -533,6 +533,8 @@ void bpf_prog_kallsyms_del_all(struct bpf_prog *fp)
> >       bpf_prog_kallsyms_del(fp);
> >   }
> >
> > +int bpf_force_dyn_alloc __read_mostly;
> > +
> >   #ifdef CONFIG_BPF_JIT
> >   /* All BPF JIT sysctl knobs here. */
> >   int bpf_jit_enable   __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_DEFAULT_ON);
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index 89f26cbddef5..f68a3400939e 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -505,6 +505,11 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
> >
> >       bpf_map_init_from_attr(&htab->map, attr);
> >
> > +     if (!lru && bpf_force_dyn_alloc) {
> > +             prealloc = false;
> > +             htab->map.map_flags |= BPF_F_NO_PREALLOC;
> > +     }
> > +
>
> The rationale is essentially for testing, right? Would be nice to avoid
> making this patch uapi. It will just confuse users with implementation
> details, imho, and then it's hard to remove it again.

Not for testing, but for production.
The plan is to roll this sysctl out gradually across the fleet and
hopefully observe memory savings without negative side effects,
but map usage patterns are wild. It will take a long time to gain
confidence that the prealloc code in htab can be completely removed.
Usage at scale might uncover all kinds of unforeseen issues.
New alloc heuristics would probably need to be developed.
If 'git rm kernel/bpf/percpu_freelist.*' ever happens
(would be great, but who knows) then this sysctl will become a nop.
This patch is trivial enough that we could keep it internal,
but everybody else with a large fleet of servers would probably
end up applying the same patch and repeating the same steps.
bpf usage among hyperscalers varies a lot.
Before 'git rm freelist' we would probably flip the default for this sysctl
to get even broader coverage.
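
For readers following along, here is a rough userspace model of what the
quoted hashtab.c hunk does to a map's allocation mode. It is only a sketch:
struct model_htab, MODEL_F_NO_PREALLOC and model_apply_force_dyn_alloc() are
hypothetical names made up for illustration, not kernel symbols, and the
force_dyn_alloc argument stands in for the proposed sysctl value. It assumes
prealloc is initially derived from the absence of BPF_F_NO_PREALLOC in the
map flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the value of BPF_F_NO_PREALLOC in the uapi header. */
#define MODEL_F_NO_PREALLOC (1U << 0)

/* Hypothetical stand-in for the state htab_map_alloc() consults. */
struct model_htab {
	uint32_t map_flags;	/* flags requested by the map creator */
	bool lru;		/* true for the LRU hash map variants */
	bool prealloc;		/* resulting allocation mode */
};

/*
 * Mirror of the quoted hunk: derive prealloc from the flags, then let a
 * non-zero force_dyn_alloc override it for every non-LRU hash map.
 */
static void model_apply_force_dyn_alloc(struct model_htab *h,
					int force_dyn_alloc)
{
	h->prealloc = !(h->map_flags & MODEL_F_NO_PREALLOC);

	if (!h->lru && force_dyn_alloc) {
		h->prealloc = false;
		h->map_flags |= MODEL_F_NO_PREALLOC;
	}
}
```

With the knob at 0 the map keeps whatever mode its creator asked for; with it
at 1 a plain hash map is switched to dynamic allocation and the flag is set,
so the flags reported for the map afterwards would reflect the override. LRU
maps are left alone in both cases.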
