bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Quentin Monnet <quentin@isovalent.com>
To: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>
Cc: Harsh Modi <harshmodi@google.com>, Paul Chaignon <paul@cilium.io>,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	Quentin Monnet <quentin@isovalent.com>
Subject: [PATCH bpf-next] libbpf: Improve probing for memcg-based memory accounting
Date: Thu,  9 Jun 2022 15:36:14 +0100	[thread overview]
Message-ID: <20220609143614.97837-1-quentin@isovalent.com> (raw)

To ensure that memory accounting will not hinder the load of BPF
objects, libbpf may raise the memlock rlimit before proceeding to some
operations. Whether this limit needs to be raised depends on the version
of the kernel: newer versions use cgroup-based (memcg) memory
accounting, and do not require any adjustment.

There is a probe in libbpf to determine whether memcg-based accounting
is supported. But this probe currently relies on the availability of a
given BPF helper, bpf_ktime_get_coarse_ns(), which landed in the same
kernel version as the memory accounting change. This works in the
generic case, but it may fail, for example, if the helper function has
been backported to an older kernel. This has been observed for Google
Cloud's Container-Optimized OS (COS), where the helper is available but
rlimit is still in use. The probe succeeds, the rlimit is not raised,
and probing features with bpftool, for example, fails.

Here we attempt to improve this probe and to effectively rely on memory
accounting. Function probe_memcg_account() in libbpf is updated to set
the rlimit to 0, then attempt to load a BPF object, and then to reset
the rlimit. If the load still succeeds, then this means we're running
with memcg-based accounting.

This probe was inspired by the similar one from the cilium/ebpf Go
library [0].

[0] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
---
 tools/lib/bpf/bpf.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 240186aac8e6..781387e6f66b 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -99,31 +99,44 @@ static inline int sys_bpf_prog_load(union bpf_attr *attr, unsigned int size, int
 
 /* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to
  * memcg-based memory accounting for BPF maps and progs. This was done in [0].
- * We use the support for bpf_ktime_get_coarse_ns() helper, which was added in
- * the same 5.11 Linux release ([1]), to detect memcg-based accounting for BPF.
+ * To do so, we lower the soft memlock rlimit to 0 and attempt to create a BPF
+ * object. If it succeeds, then memcg-based accounting for BPF is available.
  *
  *   [0] https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@fb.com/
- *   [1] d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper")
  */
 int probe_memcg_account(void)
 {
 	const size_t prog_load_attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd);
 	struct bpf_insn insns[] = {
-		BPF_EMIT_CALL(BPF_FUNC_ktime_get_coarse_ns),
 		BPF_EXIT_INSN(),
 	};
+	struct rlimit rlim_init, rlim_cur_zero = {};
 	size_t insn_cnt = ARRAY_SIZE(insns);
 	union bpf_attr attr;
 	int prog_fd;
 
-	/* attempt loading freplace trying to use custom BTF */
 	memset(&attr, 0, prog_load_attr_sz);
 	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
 	attr.insns = ptr_to_u64(insns);
 	attr.insn_cnt = insn_cnt;
 	attr.license = ptr_to_u64("GPL");
 
+	if (getrlimit(RLIMIT_MEMLOCK, &rlim_init))
+		return -1;
+
+	/* Drop the soft limit to zero. We maintain the hard limit to its
+	 * current value, because lowering it would be a permanent operation
+	 * for unprivileged users.
+	 */
+	rlim_cur_zero.rlim_max = rlim_init.rlim_max;
+	if (setrlimit(RLIMIT_MEMLOCK, &rlim_cur_zero))
+		return -1;
+
 	prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, prog_load_attr_sz);
+
+	/* reset soft rlimit as soon as possible */
+	setrlimit(RLIMIT_MEMLOCK, &rlim_init);
+
 	if (prog_fd >= 0) {
 		close(prog_fd);
 		return 1;
-- 
2.34.1


             reply	other threads:[~2022-06-09 14:37 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-09 14:36 Quentin Monnet [this message]
2022-06-09 18:13 ` [PATCH bpf-next] libbpf: Improve probing for memcg-based memory accounting sdf
2022-06-09 18:37   ` Andrii Nakryiko
2022-06-09 19:57     ` Quentin Monnet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220609143614.97837-1-quentin@isovalent.com \
    --to=quentin@isovalent.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=harshmodi@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=paul@cilium.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).