All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next v2] bpftool: Probe for memcg-based accounting before bumping rlimit
@ 2022-06-29 11:13 Quentin Monnet
  2022-06-29 14:24 ` Yafang Shao
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Quentin Monnet @ 2022-06-29 11:13 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, bpf, netdev, Quentin Monnet, Stanislav Fomichev,
	Yafang Shao

Bpftool used to bump the memlock rlimit to make sure to be able to load
BPF objects. After the kernel has switched to memcg-based memory
accounting [0] in 5.11, bpftool has relied on libbpf to probe the system
for memcg-based accounting support and for raising the rlimit if
necessary [1]. But this was later reverted, because the probe would
sometimes fail, resulting in bpftool not being able to load all required
objects [2].

Here we add a more efficient probe, in bpftool itself. We first lower
the rlimit to 0, then we attempt to load a BPF object (and finally reset
the rlimit): if the load succeeds, then memcg-based memory accounting is
supported.

This approach was earlier proposed for the probe in libbpf itself [3],
but given that the library may be used in multithreaded applications,
the probe could have undesirable consequences if one thread attempts to
lock kernel memory while memlock rlimit is at 0. Since bpftool is
single-threaded and the rlimit is process-based, this is fine to do in
bpftool itself.

This probe was inspired by the similar one from the cilium/ebpf Go
library [4].

v2:
- Simply use sizeof(attr) instead of hardcoding a size via
  offsetofend().
- Set r0 = 0 before returning in sample program.

[0] commit 97306be45fbe ("Merge branch 'switch to memcg-based memory accounting'")
[1] commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK")
[2] commit 6b4384ff1088 ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"")
[3] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/t/#u
[4] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39

Cc: Stanislav Fomichev <sdf@google.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
---
 tools/bpf/bpftool/common.c | 71 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 68 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index a0d4acd7c54a..fc8172a4969a 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -13,14 +13,17 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
-#include <linux/limits.h>
-#include <linux/magic.h>
 #include <net/if.h>
 #include <sys/mount.h>
 #include <sys/resource.h>
 #include <sys/stat.h>
 #include <sys/vfs.h>
 
+#include <linux/filter.h>
+#include <linux/limits.h>
+#include <linux/magic.h>
+#include <linux/unistd.h>
+
 #include <bpf/bpf.h>
 #include <bpf/hashmap.h>
 #include <bpf/libbpf.h> /* libbpf_num_possible_cpus */
@@ -73,11 +76,73 @@ static bool is_bpffs(char *path)
 	return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
 }
 
+/* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to
+ * memcg-based memory accounting for BPF maps and programs. This was done in
+ * commit 97306be45fbe ("Merge branch 'switch to memcg-based memory
+ * accounting'"), in Linux 5.11.
+ *
+ * Libbpf also offers to probe for memcg-based accounting vs rlimit, but does
+ * so by checking for the availability of a given BPF helper and this has
+ * failed on some kernels with backports in the past, see commit 6b4384ff1088
+ * ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"").
+ * Instead, we can probe by lowering the process-based rlimit to 0, trying to
+ * load a BPF object, and resetting the rlimit. If the load succeeds then
+ * memcg-based accounting is supported.
+ *
+ * This would be too dangerous to do in the library, because multithreaded
+ * applications might attempt to load items while the rlimit is at 0. Given
+ * that bpftool is single-threaded, this is fine to do here.
+ */
+static bool known_to_need_rlimit(void)
+{
+	struct rlimit rlim_init, rlim_cur_zero = {};
+	struct bpf_insn insns[] = {
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	size_t insn_cnt = ARRAY_SIZE(insns);
+	union bpf_attr attr;
+	int prog_fd, err;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
+	attr.insns = ptr_to_u64(insns);
+	attr.insn_cnt = insn_cnt;
+	attr.license = ptr_to_u64("GPL");
+
+	if (getrlimit(RLIMIT_MEMLOCK, &rlim_init))
+		return false;
+
+	/* Drop the soft limit to zero. We maintain the hard limit to its
+	 * current value, because lowering it would be a permanent operation
+	 * for unprivileged users.
+	 */
+	rlim_cur_zero.rlim_max = rlim_init.rlim_max;
+	if (setrlimit(RLIMIT_MEMLOCK, &rlim_cur_zero))
+		return false;
+
+	/* Do not use bpf_prog_load() from libbpf here, because it calls
+	 * bump_rlimit_memlock(), interfering with the current probe.
+	 */
+	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
+	err = errno;
+
+	/* reset soft rlimit to its initial value */
+	setrlimit(RLIMIT_MEMLOCK, &rlim_init);
+
+	if (prog_fd < 0)
+		return err == EPERM;
+
+	close(prog_fd);
+	return false;
+}
+
 void set_max_rlimit(void)
 {
 	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
 
-	setrlimit(RLIMIT_MEMLOCK, &rinf);
+	if (known_to_need_rlimit())
+		setrlimit(RLIMIT_MEMLOCK, &rinf);
 }
 
 static int
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf-next v2] bpftool: Probe for memcg-based accounting before bumping rlimit
  2022-06-29 11:13 [PATCH bpf-next v2] bpftool: Probe for memcg-based accounting before bumping rlimit Quentin Monnet
@ 2022-06-29 14:24 ` Yafang Shao
  2022-06-29 16:19 ` sdf
  2022-06-29 21:40 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: Yafang Shao @ 2022-06-29 14:24 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, bpf, netdev, Stanislav Fomichev

On Wed, Jun 29, 2022 at 7:13 PM Quentin Monnet <quentin@isovalent.com> wrote:
>
> Bpftool used to bump the memlock rlimit to make sure to be able to load
> BPF objects. After the kernel has switched to memcg-based memory
> accounting [0] in 5.11, bpftool has relied on libbpf to probe the system
> for memcg-based accounting support and for raising the rlimit if
> necessary [1]. But this was later reverted, because the probe would
> sometimes fail, resulting in bpftool not being able to load all required
> objects [2].
>
> Here we add a more efficient probe, in bpftool itself. We first lower
> the rlimit to 0, then we attempt to load a BPF object (and finally reset
> the rlimit): if the load succeeds, then memcg-based memory accounting is
> supported.
>
> This approach was earlier proposed for the probe in libbpf itself [3],
> but given that the library may be used in multithreaded applications,
> the probe could have undesirable consequences if one thread attempts to
> lock kernel memory while memlock rlimit is at 0. Since bpftool is
> single-threaded and the rlimit is process-based, this is fine to do in
> bpftool itself.
>
> This probe was inspired by the similar one from the cilium/ebpf Go
> library [4].
>
> v2:
> - Simply use sizeof(attr) instead of hardcoding a size via
>   offsetofend().
> - Set r0 = 0 before returning in sample program.
>
> [0] commit 97306be45fbe ("Merge branch 'switch to memcg-based memory accounting'")
> [1] commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK")
> [2] commit 6b4384ff1088 ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"")
> [3] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/t/#u
> [4] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
>
> Cc: Stanislav Fomichev <sdf@google.com>
> Cc: Yafang Shao <laoar.shao@gmail.com>
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

LGTM

Acked-by: Yafang Shao <laoar.shao@gmail.com>

> ---
>  tools/bpf/bpftool/common.c | 71 ++++++++++++++++++++++++++++++++++++--
>  1 file changed, 68 insertions(+), 3 deletions(-)
>
> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index a0d4acd7c54a..fc8172a4969a 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -13,14 +13,17 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <unistd.h>
> -#include <linux/limits.h>
> -#include <linux/magic.h>
>  #include <net/if.h>
>  #include <sys/mount.h>
>  #include <sys/resource.h>
>  #include <sys/stat.h>
>  #include <sys/vfs.h>
>
> +#include <linux/filter.h>
> +#include <linux/limits.h>
> +#include <linux/magic.h>
> +#include <linux/unistd.h>
> +
>  #include <bpf/bpf.h>
>  #include <bpf/hashmap.h>
>  #include <bpf/libbpf.h> /* libbpf_num_possible_cpus */
> @@ -73,11 +76,73 @@ static bool is_bpffs(char *path)
>         return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
>  }
>
> +/* Probe whether kernel switched from memlock-based (RLIMIT_MEMLOCK) to
> + * memcg-based memory accounting for BPF maps and programs. This was done in
> + * commit 97306be45fbe ("Merge branch 'switch to memcg-based memory
> + * accounting'"), in Linux 5.11.
> + *
> + * Libbpf also offers to probe for memcg-based accounting vs rlimit, but does
> + * so by checking for the availability of a given BPF helper and this has
> + * failed on some kernels with backports in the past, see commit 6b4384ff1088
> + * ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"").
> + * Instead, we can probe by lowering the process-based rlimit to 0, trying to
> + * load a BPF object, and resetting the rlimit. If the load succeeds then
> + * memcg-based accounting is supported.
> + *
> + * This would be too dangerous to do in the library, because multithreaded
> + * applications might attempt to load items while the rlimit is at 0. Given
> + * that bpftool is single-threaded, this is fine to do here.
> + */
> +static bool known_to_need_rlimit(void)
> +{
> +       struct rlimit rlim_init, rlim_cur_zero = {};
> +       struct bpf_insn insns[] = {
> +               BPF_MOV64_IMM(BPF_REG_0, 0),
> +               BPF_EXIT_INSN(),
> +       };
> +       size_t insn_cnt = ARRAY_SIZE(insns);
> +       union bpf_attr attr;
> +       int prog_fd, err;
> +
> +       memset(&attr, 0, sizeof(attr));
> +       attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
> +       attr.insns = ptr_to_u64(insns);
> +       attr.insn_cnt = insn_cnt;
> +       attr.license = ptr_to_u64("GPL");
> +
> +       if (getrlimit(RLIMIT_MEMLOCK, &rlim_init))
> +               return false;
> +
> +       /* Drop the soft limit to zero. We maintain the hard limit to its
> +        * current value, because lowering it would be a permanent operation
> +        * for unprivileged users.
> +        */
> +       rlim_cur_zero.rlim_max = rlim_init.rlim_max;
> +       if (setrlimit(RLIMIT_MEMLOCK, &rlim_cur_zero))
> +               return false;
> +
> +       /* Do not use bpf_prog_load() from libbpf here, because it calls
> +        * bump_rlimit_memlock(), interfering with the current probe.
> +        */
> +       prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
> +       err = errno;
> +
> +       /* reset soft rlimit to its initial value */
> +       setrlimit(RLIMIT_MEMLOCK, &rlim_init);
> +
> +       if (prog_fd < 0)
> +               return err == EPERM;
> +
> +       close(prog_fd);
> +       return false;
> +}
> +
>  void set_max_rlimit(void)
>  {
>         struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>
> -       setrlimit(RLIMIT_MEMLOCK, &rinf);
> +       if (known_to_need_rlimit())
> +               setrlimit(RLIMIT_MEMLOCK, &rinf);
>  }
>
>  static int
> --
> 2.34.1
>


-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf-next v2] bpftool: Probe for memcg-based accounting before bumping rlimit
  2022-06-29 11:13 [PATCH bpf-next v2] bpftool: Probe for memcg-based accounting before bumping rlimit Quentin Monnet
  2022-06-29 14:24 ` Yafang Shao
@ 2022-06-29 16:19 ` sdf
  2022-06-29 21:40 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: sdf @ 2022-06-29 16:19 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, bpf, netdev, Yafang Shao

On 06/29, Quentin Monnet wrote:
> Bpftool used to bump the memlock rlimit to make sure to be able to load
> BPF objects. After the kernel has switched to memcg-based memory
> accounting [0] in 5.11, bpftool has relied on libbpf to probe the system
> for memcg-based accounting support and for raising the rlimit if
> necessary [1]. But this was later reverted, because the probe would
> sometimes fail, resulting in bpftool not being able to load all required
> objects [2].

> Here we add a more efficient probe, in bpftool itself. We first lower
> the rlimit to 0, then we attempt to load a BPF object (and finally reset
> the rlimit): if the load succeeds, then memcg-based memory accounting is
> supported.

> This approach was earlier proposed for the probe in libbpf itself [3],
> but given that the library may be used in multithreaded applications,
> the probe could have undesirable consequences if one thread attempts to
> lock kernel memory while memlock rlimit is at 0. Since bpftool is
> single-threaded and the rlimit is process-based, this is fine to do in
> bpftool itself.

> This probe was inspired by the similar one from the cilium/ebpf Go
> library [4].

> v2:
> - Simply use sizeof(attr) instead of hardcoding a size via
>    offsetofend().
> - Set r0 = 0 before returning in sample program.

> [0] commit 97306be45fbe ("Merge branch 'switch to memcg-based memory  
> accounting'")
> [1] commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of  
> RLIMIT_MEMLOCK")
> [2] commit 6b4384ff1088 ("Revert "bpftool: Use libbpf 1.0 API mode  
> instead of RLIMIT_MEMLOCK"")
> [3]  
> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/t/#u
> [4] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39

> Cc: Stanislav Fomichev <sdf@google.com>

Reviewed-by: Stanislav Fomichev <sdf@google.com>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH bpf-next v2] bpftool: Probe for memcg-based accounting before bumping rlimit
  2022-06-29 11:13 [PATCH bpf-next v2] bpftool: Probe for memcg-based accounting before bumping rlimit Quentin Monnet
  2022-06-29 14:24 ` Yafang Shao
  2022-06-29 16:19 ` sdf
@ 2022-06-29 21:40 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-06-29 21:40 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, bpf, netdev, sdf, laoar.shao

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Wed, 29 Jun 2022 12:13:51 +0100 you wrote:
> Bpftool used to bump the memlock rlimit to make sure to be able to load
> BPF objects. After the kernel has switched to memcg-based memory
> accounting [0] in 5.11, bpftool has relied on libbpf to probe the system
> for memcg-based accounting support and for raising the rlimit if
> necessary [1]. But this was later reverted, because the probe would
> sometimes fail, resulting in bpftool not being able to load all required
> objects [2].
> 
> [...]

Here is the summary with links:
  - [bpf-next,v2] bpftool: Probe for memcg-based accounting before bumping rlimit
    https://git.kernel.org/bpf/bpf-next/c/f0cf642c56b7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-06-29 21:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-29 11:13 [PATCH bpf-next v2] bpftool: Probe for memcg-based accounting before bumping rlimit Quentin Monnet
2022-06-29 14:24 ` Yafang Shao
2022-06-29 16:19 ` sdf
2022-06-29 21:40 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.