netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/2] bpftool: Restore memlock rlimit bump
@ 2022-06-10 11:26 Quentin Monnet
  2022-06-10 11:26 ` [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK" Quentin Monnet
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Quentin Monnet @ 2022-06-10 11:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Yafang Shao, Harsh Modi, Paul Chaignon, netdev, bpf, Quentin Monnet

We recently removed the uncondtional rlimit bump from bpftool, deferring it
to libbpf to probe the system for memcg-based memory accounting and to
raise the rlimit if necessary.

This probing is based on the availability of a given BPF helper, and his
known to fail on some environments where the helper, but not the memcg
memory accounting, has been backported.

To work around this, this set restores the memlock rlimit bump in bpftool.
Please see the description of the first patch for more details.

Quentin Monnet (2):
  Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  bpftool: Do not check return value from libbpf_set_strict_mode()

 tools/bpf/bpftool/common.c     | 8 ++++++++
 tools/bpf/bpftool/feature.c    | 2 ++
 tools/bpf/bpftool/main.c       | 2 --
 tools/bpf/bpftool/main.h       | 2 ++
 tools/bpf/bpftool/map.c        | 2 ++
 tools/bpf/bpftool/pids.c       | 1 +
 tools/bpf/bpftool/prog.c       | 3 +++
 tools/bpf/bpftool/struct_ops.c | 2 ++
 8 files changed, 20 insertions(+), 2 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-10 11:26 [PATCH bpf-next 0/2] bpftool: Restore memlock rlimit bump Quentin Monnet
@ 2022-06-10 11:26 ` Quentin Monnet
  2022-06-10 16:07   ` sdf
  2022-06-10 11:26 ` [PATCH bpf-next 2/2] bpftool: Do not check return value from libbpf_set_strict_mode() Quentin Monnet
  2022-06-14 20:37 ` [PATCH bpf-next 0/2] bpftool: Restore memlock rlimit bump patchwork-bot+netdevbpf
  2 siblings, 1 reply; 20+ messages in thread
From: Quentin Monnet @ 2022-06-10 11:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Yafang Shao, Harsh Modi, Paul Chaignon, netdev, bpf, Quentin Monnet

This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.

In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
kernel has switched to memcg-based memory accounting. Thanks to the
LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
with other systems and ask libbpf to raise the limit for us if
necessary.

How do we know if memcg-based accounting is supported? There is a probe
in libbpf to check this. But this probe currently relies on the
availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
landed in the same kernel version as the memory accounting change. This
works in the generic case, but it may fail, for example, if the helper
function has been backported to an older kernel. This has been observed
for Google Cloud's Container-Optimized OS (COS), where the helper is
available but rlimit is still in use. The probe succeeds, the rlimit is
not raised, and probing features with bpftool, for example, fails.

A patch was submitted [0] to update this probe in libbpf, based on what
the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
0, attempt to load a BPF object, and reset the rlimit. But it may induce
some hard-to-debug flakiness if another process starts, or the current
application is killed, while the rlimit is reduced, and the approach was
discarded.

As a workaround to ensure that the rlimit bump does not depend on the
availability of a given helper, we restore the unconditional rlimit bump
in bpftool for now.

[0] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
[1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39

Cc: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
---
 tools/bpf/bpftool/common.c     | 8 ++++++++
 tools/bpf/bpftool/feature.c    | 2 ++
 tools/bpf/bpftool/main.c       | 6 +++---
 tools/bpf/bpftool/main.h       | 2 ++
 tools/bpf/bpftool/map.c        | 2 ++
 tools/bpf/bpftool/pids.c       | 1 +
 tools/bpf/bpftool/prog.c       | 3 +++
 tools/bpf/bpftool/struct_ops.c | 2 ++
 8 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index a45b42ee8ab0..a0d4acd7c54a 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -17,6 +17,7 @@
 #include <linux/magic.h>
 #include <net/if.h>
 #include <sys/mount.h>
+#include <sys/resource.h>
 #include <sys/stat.h>
 #include <sys/vfs.h>
 
@@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
 	return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
 }
 
+void set_max_rlimit(void)
+{
+	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
+
+	setrlimit(RLIMIT_MEMLOCK, &rinf);
+}
+
 static int
 mnt_fs(const char *target, const char *type, char *buff, size_t bufflen)
 {
diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c
index cc9e4df8c58e..bac4ef428a02 100644
--- a/tools/bpf/bpftool/feature.c
+++ b/tools/bpf/bpftool/feature.c
@@ -1167,6 +1167,8 @@ static int do_probe(int argc, char **argv)
 	__u32 ifindex = 0;
 	char *ifname;
 
+	set_max_rlimit();
+
 	while (argc) {
 		if (is_prefix(*argv, "kernel")) {
 			if (target != COMPONENT_UNSPEC) {
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 9062ef2b8767..e81227761f5d 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -507,9 +507,9 @@ int main(int argc, char **argv)
 		 * It will still be rejected if users use LIBBPF_STRICT_ALL
 		 * mode for loading generated skeleton.
 		 */
-		libbpf_set_strict_mode(LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS);
-	} else {
-		libbpf_set_strict_mode(LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK);
+		ret = libbpf_set_strict_mode(LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS);
+		if (ret)
+			p_err("failed to enable libbpf strict mode: %d", ret);
 	}
 
 	argc -= optind;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 6c311f47147e..589cb76b227a 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -96,6 +96,8 @@ int detect_common_prefix(const char *arg, ...);
 void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep);
 void usage(void) __noreturn;
 
+void set_max_rlimit(void);
+
 int mount_tracefs(const char *target);
 
 struct obj_ref {
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 800834be1bcb..38b6bc9c26c3 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -1326,6 +1326,8 @@ static int do_create(int argc, char **argv)
 		goto exit;
 	}
 
+	set_max_rlimit();
+
 	fd = bpf_map_create(map_type, map_name, key_size, value_size, max_entries, &attr);
 	if (fd < 0) {
 		p_err("map create failed: %s", strerror(errno));
diff --git a/tools/bpf/bpftool/pids.c b/tools/bpf/bpftool/pids.c
index e2d00d3cd868..bb6c969a114a 100644
--- a/tools/bpf/bpftool/pids.c
+++ b/tools/bpf/bpftool/pids.c
@@ -108,6 +108,7 @@ int build_obj_refs_table(struct hashmap **map, enum bpf_obj_type type)
 		p_err("failed to create hashmap for PID references");
 		return -1;
 	}
+	set_max_rlimit();
 
 	skel = pid_iter_bpf__open();
 	if (!skel) {
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index e71f0b2da50b..f081de398b60 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -1590,6 +1590,8 @@ static int load_with_options(int argc, char **argv, bool first_prog_only)
 		}
 	}
 
+	set_max_rlimit();
+
 	if (verifier_logs)
 		/* log_level1 + log_level2 + stats, but not stable UAPI */
 		open_opts.kernel_log_level = 1 + 2 + 4;
@@ -2287,6 +2289,7 @@ static int do_profile(int argc, char **argv)
 		}
 	}
 
+	set_max_rlimit();
 	err = profiler_bpf__load(profile_obj);
 	if (err) {
 		p_err("failed to load profile_obj");
diff --git a/tools/bpf/bpftool/struct_ops.c b/tools/bpf/bpftool/struct_ops.c
index 2535f079ed67..e08a6ff2866c 100644
--- a/tools/bpf/bpftool/struct_ops.c
+++ b/tools/bpf/bpftool/struct_ops.c
@@ -501,6 +501,8 @@ static int do_register(int argc, char **argv)
 	if (libbpf_get_error(obj))
 		return -1;
 
+	set_max_rlimit();
+
 	if (bpf_object__load(obj)) {
 		bpf_object__close(obj);
 		return -1;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH bpf-next 2/2] bpftool: Do not check return value from libbpf_set_strict_mode()
  2022-06-10 11:26 [PATCH bpf-next 0/2] bpftool: Restore memlock rlimit bump Quentin Monnet
  2022-06-10 11:26 ` [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK" Quentin Monnet
@ 2022-06-10 11:26 ` Quentin Monnet
  2022-06-14 20:37 ` [PATCH bpf-next 0/2] bpftool: Restore memlock rlimit bump patchwork-bot+netdevbpf
  2 siblings, 0 replies; 20+ messages in thread
From: Quentin Monnet @ 2022-06-10 11:26 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Yafang Shao, Harsh Modi, Paul Chaignon, netdev, bpf, Quentin Monnet

The function always returns 0, so we don't need to check whether the
return value is 0 or not.

This change was first introduced in commit a777e18f1bcd ("bpftool: Use
libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"), but later reverted to
restore the unconditional rlimit bump in bpftool. Let's re-add it.

Co-developed-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
---
 tools/bpf/bpftool/main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index e81227761f5d..451cefc2d0da 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -507,9 +507,7 @@ int main(int argc, char **argv)
 		 * It will still be rejected if users use LIBBPF_STRICT_ALL
 		 * mode for loading generated skeleton.
 		 */
-		ret = libbpf_set_strict_mode(LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS);
-		if (ret)
-			p_err("failed to enable libbpf strict mode: %d", ret);
+		libbpf_set_strict_mode(LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS);
 	}
 
 	argc -= optind;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-10 11:26 ` [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK" Quentin Monnet
@ 2022-06-10 16:07   ` sdf
  2022-06-10 16:34     ` Quentin Monnet
  0 siblings, 1 reply; 20+ messages in thread
From: sdf @ 2022-06-10 16:07 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yafang Shao, Harsh Modi, Paul Chaignon, netdev, bpf

On 06/10, Quentin Monnet wrote:
> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.

> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> kernel has switched to memcg-based memory accounting. Thanks to the
> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> with other systems and ask libbpf to raise the limit for us if
> necessary.

> How do we know if memcg-based accounting is supported? There is a probe
> in libbpf to check this. But this probe currently relies on the
> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> landed in the same kernel version as the memory accounting change. This
> works in the generic case, but it may fail, for example, if the helper
> function has been backported to an older kernel. This has been observed
> for Google Cloud's Container-Optimized OS (COS), where the helper is
> available but rlimit is still in use. The probe succeeds, the rlimit is
> not raised, and probing features with bpftool, for example, fails.

> A patch was submitted [0] to update this probe in libbpf, based on what
> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> some hard-to-debug flakiness if another process starts, or the current
> application is killed, while the rlimit is reduced, and the approach was
> discarded.

> As a workaround to ensure that the rlimit bump does not depend on the
> availability of a given helper, we restore the unconditional rlimit bump
> in bpftool for now.

> [0]  
> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39

> Cc: Yafang Shao <laoar.shao@gmail.com>
> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> ---
>   tools/bpf/bpftool/common.c     | 8 ++++++++
>   tools/bpf/bpftool/feature.c    | 2 ++
>   tools/bpf/bpftool/main.c       | 6 +++---
>   tools/bpf/bpftool/main.h       | 2 ++
>   tools/bpf/bpftool/map.c        | 2 ++
>   tools/bpf/bpftool/pids.c       | 1 +
>   tools/bpf/bpftool/prog.c       | 3 +++
>   tools/bpf/bpftool/struct_ops.c | 2 ++
>   8 files changed, 23 insertions(+), 3 deletions(-)

> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> index a45b42ee8ab0..a0d4acd7c54a 100644
> --- a/tools/bpf/bpftool/common.c
> +++ b/tools/bpf/bpftool/common.c
> @@ -17,6 +17,7 @@
>   #include <linux/magic.h>
>   #include <net/if.h>
>   #include <sys/mount.h>
> +#include <sys/resource.h>
>   #include <sys/stat.h>
>   #include <sys/vfs.h>

> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
>   	return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
>   }

> +void set_max_rlimit(void)
> +{
> +	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> +
> +	setrlimit(RLIMIT_MEMLOCK, &rinf);

Do you think it might make sense to print to stderr some warning if
we actually happen to adjust this limit?

if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
	fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
	infinity!\n");
	setrlimit(RLIMIT_MEMLOCK, &rinf);
}

?

Because while it's nice that we automatically do this, this might still
lead to surprises for some users. OTOH, not sure whether people
actually read those warnings? :-/

> +}
> +
>   static int
>   mnt_fs(const char *target, const char *type, char *buff, size_t bufflen)
>   {
> diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c
> index cc9e4df8c58e..bac4ef428a02 100644
> --- a/tools/bpf/bpftool/feature.c
> +++ b/tools/bpf/bpftool/feature.c
> @@ -1167,6 +1167,8 @@ static int do_probe(int argc, char **argv)
>   	__u32 ifindex = 0;
>   	char *ifname;

> +	set_max_rlimit();
> +
>   	while (argc) {
>   		if (is_prefix(*argv, "kernel")) {
>   			if (target != COMPONENT_UNSPEC) {
> diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
> index 9062ef2b8767..e81227761f5d 100644
> --- a/tools/bpf/bpftool/main.c
> +++ b/tools/bpf/bpftool/main.c
> @@ -507,9 +507,9 @@ int main(int argc, char **argv)
>   		 * It will still be rejected if users use LIBBPF_STRICT_ALL
>   		 * mode for loading generated skeleton.
>   		 */
> -		libbpf_set_strict_mode(LIBBPF_STRICT_ALL &  
> ~LIBBPF_STRICT_MAP_DEFINITIONS);
> -	} else {
> -		libbpf_set_strict_mode(LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK);
> +		ret = libbpf_set_strict_mode(LIBBPF_STRICT_ALL &  
> ~LIBBPF_STRICT_MAP_DEFINITIONS);
> +		if (ret)
> +			p_err("failed to enable libbpf strict mode: %d", ret);
>   	}

>   	argc -= optind;
> diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
> index 6c311f47147e..589cb76b227a 100644
> --- a/tools/bpf/bpftool/main.h
> +++ b/tools/bpf/bpftool/main.h
> @@ -96,6 +96,8 @@ int detect_common_prefix(const char *arg, ...);
>   void fprint_hex(FILE *f, void *arg, unsigned int n, const char *sep);
>   void usage(void) __noreturn;

> +void set_max_rlimit(void);
> +
>   int mount_tracefs(const char *target);

>   struct obj_ref {
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index 800834be1bcb..38b6bc9c26c3 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -1326,6 +1326,8 @@ static int do_create(int argc, char **argv)
>   		goto exit;
>   	}

> +	set_max_rlimit();
> +
>   	fd = bpf_map_create(map_type, map_name, key_size, value_size,  
> max_entries, &attr);
>   	if (fd < 0) {
>   		p_err("map create failed: %s", strerror(errno));
> diff --git a/tools/bpf/bpftool/pids.c b/tools/bpf/bpftool/pids.c
> index e2d00d3cd868..bb6c969a114a 100644
> --- a/tools/bpf/bpftool/pids.c
> +++ b/tools/bpf/bpftool/pids.c
> @@ -108,6 +108,7 @@ int build_obj_refs_table(struct hashmap **map, enum  
> bpf_obj_type type)
>   		p_err("failed to create hashmap for PID references");
>   		return -1;
>   	}
> +	set_max_rlimit();

>   	skel = pid_iter_bpf__open();
>   	if (!skel) {
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index e71f0b2da50b..f081de398b60 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -1590,6 +1590,8 @@ static int load_with_options(int argc, char **argv,  
> bool first_prog_only)
>   		}
>   	}

> +	set_max_rlimit();
> +
>   	if (verifier_logs)
>   		/* log_level1 + log_level2 + stats, but not stable UAPI */
>   		open_opts.kernel_log_level = 1 + 2 + 4;
> @@ -2287,6 +2289,7 @@ static int do_profile(int argc, char **argv)
>   		}
>   	}

> +	set_max_rlimit();
>   	err = profiler_bpf__load(profile_obj);
>   	if (err) {
>   		p_err("failed to load profile_obj");
> diff --git a/tools/bpf/bpftool/struct_ops.c  
> b/tools/bpf/bpftool/struct_ops.c
> index 2535f079ed67..e08a6ff2866c 100644
> --- a/tools/bpf/bpftool/struct_ops.c
> +++ b/tools/bpf/bpftool/struct_ops.c
> @@ -501,6 +501,8 @@ static int do_register(int argc, char **argv)
>   	if (libbpf_get_error(obj))
>   		return -1;

> +	set_max_rlimit();
> +
>   	if (bpf_object__load(obj)) {
>   		bpf_object__close(obj);
>   		return -1;
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-10 16:07   ` sdf
@ 2022-06-10 16:34     ` Quentin Monnet
  2022-06-10 16:46       ` Stanislav Fomichev
  0 siblings, 1 reply; 20+ messages in thread
From: Quentin Monnet @ 2022-06-10 16:34 UTC (permalink / raw)
  To: sdf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yafang Shao, Harsh Modi, Paul Chaignon, netdev, bpf

2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> On 06/10, Quentin Monnet wrote:
>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> 
>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
>> kernel has switched to memcg-based memory accounting. Thanks to the
>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
>> with other systems and ask libbpf to raise the limit for us if
>> necessary.
> 
>> How do we know if memcg-based accounting is supported? There is a probe
>> in libbpf to check this. But this probe currently relies on the
>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
>> landed in the same kernel version as the memory accounting change. This
>> works in the generic case, but it may fail, for example, if the helper
>> function has been backported to an older kernel. This has been observed
>> for Google Cloud's Container-Optimized OS (COS), where the helper is
>> available but rlimit is still in use. The probe succeeds, the rlimit is
>> not raised, and probing features with bpftool, for example, fails.
> 
>> A patch was submitted [0] to update this probe in libbpf, based on what
>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
>> some hard-to-debug flakiness if another process starts, or the current
>> application is killed, while the rlimit is reduced, and the approach was
>> discarded.
> 
>> As a workaround to ensure that the rlimit bump does not depend on the
>> availability of a given helper, we restore the unconditional rlimit bump
>> in bpftool for now.
> 
>> [0]
>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> 
>> Cc: Yafang Shao <laoar.shao@gmail.com>
>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
>> ---
>>   tools/bpf/bpftool/common.c     | 8 ++++++++
>>   tools/bpf/bpftool/feature.c    | 2 ++
>>   tools/bpf/bpftool/main.c       | 6 +++---
>>   tools/bpf/bpftool/main.h       | 2 ++
>>   tools/bpf/bpftool/map.c        | 2 ++
>>   tools/bpf/bpftool/pids.c       | 1 +
>>   tools/bpf/bpftool/prog.c       | 3 +++
>>   tools/bpf/bpftool/struct_ops.c | 2 ++
>>   8 files changed, 23 insertions(+), 3 deletions(-)
> 
>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
>> index a45b42ee8ab0..a0d4acd7c54a 100644
>> --- a/tools/bpf/bpftool/common.c
>> +++ b/tools/bpf/bpftool/common.c
>> @@ -17,6 +17,7 @@
>>   #include <linux/magic.h>
>>   #include <net/if.h>
>>   #include <sys/mount.h>
>> +#include <sys/resource.h>
>>   #include <sys/stat.h>
>>   #include <sys/vfs.h>
> 
>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
>>   }
> 
>> +void set_max_rlimit(void)
>> +{
>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>> +
>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> 
> Do you think it might make sense to print to stderr some warning if
> we actually happen to adjust this limit?
> 
> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
>     infinity!\n");
>     setrlimit(RLIMIT_MEMLOCK, &rinf);
> }
> 
> ?
> 
> Because while it's nice that we automatically do this, this might still
> lead to surprises for some users. OTOH, not sure whether people
> actually read those warnings? :-/

I'm not strictly opposed to a warning, but I'm not completely sure this
is desirable.

Bpftool has raised the rlimit for a long time, it changed only in April,
so I don't think it would come up as a surprise for people who have used
it for a while. I think this is also something that several other
BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
have been doing too.

For new users, I agree the warning may be helpful. But then the message
is likely to appear the very first time a user runs the command - likely
as root - and I fear this might worry people not familiar with rlimits,
who would wonder if they just broke something on their system? Maybe
with a different phrasing.

Alternatively we could document it in the relevant man pages (not that
people would see it better, but at least it would be mentioned somewhere
if people take the time to read the docs)? What do you think?

Quentin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-10 16:34     ` Quentin Monnet
@ 2022-06-10 16:46       ` Stanislav Fomichev
  2022-06-10 17:00         ` Quentin Monnet
  0 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 16:46 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yafang Shao, Harsh Modi, Paul Chaignon, netdev, bpf

On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
>
> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> > On 06/10, Quentin Monnet wrote:
> >> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> >
> >> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> >> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> >> kernel has switched to memcg-based memory accounting. Thanks to the
> >> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> >> with other systems and ask libbpf to raise the limit for us if
> >> necessary.
> >
> >> How do we know if memcg-based accounting is supported? There is a probe
> >> in libbpf to check this. But this probe currently relies on the
> >> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> >> landed in the same kernel version as the memory accounting change. This
> >> works in the generic case, but it may fail, for example, if the helper
> >> function has been backported to an older kernel. This has been observed
> >> for Google Cloud's Container-Optimized OS (COS), where the helper is
> >> available but rlimit is still in use. The probe succeeds, the rlimit is
> >> not raised, and probing features with bpftool, for example, fails.
> >
> >> A patch was submitted [0] to update this probe in libbpf, based on what
> >> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> >> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> >> some hard-to-debug flakiness if another process starts, or the current
> >> application is killed, while the rlimit is reduced, and the approach was
> >> discarded.
> >
> >> As a workaround to ensure that the rlimit bump does not depend on the
> >> availability of a given helper, we restore the unconditional rlimit bump
> >> in bpftool for now.
> >
> >> [0]
> >> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> >> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> >
> >> Cc: Yafang Shao <laoar.shao@gmail.com>
> >> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> >> ---
> >>   tools/bpf/bpftool/common.c     | 8 ++++++++
> >>   tools/bpf/bpftool/feature.c    | 2 ++
> >>   tools/bpf/bpftool/main.c       | 6 +++---
> >>   tools/bpf/bpftool/main.h       | 2 ++
> >>   tools/bpf/bpftool/map.c        | 2 ++
> >>   tools/bpf/bpftool/pids.c       | 1 +
> >>   tools/bpf/bpftool/prog.c       | 3 +++
> >>   tools/bpf/bpftool/struct_ops.c | 2 ++
> >>   8 files changed, 23 insertions(+), 3 deletions(-)
> >
> >> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> >> index a45b42ee8ab0..a0d4acd7c54a 100644
> >> --- a/tools/bpf/bpftool/common.c
> >> +++ b/tools/bpf/bpftool/common.c
> >> @@ -17,6 +17,7 @@
> >>   #include <linux/magic.h>
> >>   #include <net/if.h>
> >>   #include <sys/mount.h>
> >> +#include <sys/resource.h>
> >>   #include <sys/stat.h>
> >>   #include <sys/vfs.h>
> >
> >> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> >>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> >>   }
> >
> >> +void set_max_rlimit(void)
> >> +{
> >> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> >> +
> >> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> >
> > Do you think it might make sense to print to stderr some warning if
> > we actually happen to adjust this limit?
> >
> > if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> >     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> >     infinity!\n");
> >     setrlimit(RLIMIT_MEMLOCK, &rinf);
> > }
> >
> > ?
> >
> > Because while it's nice that we automatically do this, this might still
> > lead to surprises for some users. OTOH, not sure whether people
> > actually read those warnings? :-/
>
> I'm not strictly opposed to a warning, but I'm not completely sure this
> is desirable.
>
> Bpftool has raised the rlimit for a long time, it changed only in April,
> so I don't think it would come up as a surprise for people who have used
> it for a while. I think this is also something that several other
> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> have been doing too.

In this case ignore me and let's continue doing that :-)

Btw, eventually we'd still like to stop doing that I'd presume? Should
we at some point follow up with something like:

if (kernel_version >= 5.11) { don't touch memlock; }

?

I guess we care only about <5.11 because of the backports, but 5.11+
kernels are guaranteed to have memcg.

I'm not sure whether memlock is used out there in the distros (and
especially for root/bpf_capable), so I'm also not sure whether we
really care or not.

> For new users, I agree the warning may be helpful. But then the message
> is likely to appear the very first time a user runs the command - likely
> as root - and I fear this might worry people not familiar with rlimits,
> who would wonder if they just broke something on their system? Maybe
> with a different phrasing.
>
> Alternatively we could document it in the relevant man pages (not that
> people would see it better, but at least it would be mentioned somewhere
> if people take the time to read the docs)? What do you think?
>
> Quentin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-10 16:46       ` Stanislav Fomichev
@ 2022-06-10 17:00         ` Quentin Monnet
  2022-06-10 17:17           ` Stanislav Fomichev
  0 siblings, 1 reply; 20+ messages in thread
From: Quentin Monnet @ 2022-06-10 17:00 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yafang Shao, Harsh Modi, Paul Chaignon, netdev, bpf

2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
>>
>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
>>> On 06/10, Quentin Monnet wrote:
>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
>>>
>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
>>>> kernel has switched to memcg-based memory accounting. Thanks to the
>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
>>>> with other systems and ask libbpf to raise the limit for us if
>>>> necessary.
>>>
>>>> How do we know if memcg-based accounting is supported? There is a probe
>>>> in libbpf to check this. But this probe currently relies on the
>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
>>>> landed in the same kernel version as the memory accounting change. This
>>>> works in the generic case, but it may fail, for example, if the helper
>>>> function has been backported to an older kernel. This has been observed
>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
>>>> not raised, and probing features with bpftool, for example, fails.
>>>
>>>> A patch was submitted [0] to update this probe in libbpf, based on what
>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
>>>> some hard-to-debug flakiness if another process starts, or the current
>>>> application is killed, while the rlimit is reduced, and the approach was
>>>> discarded.
>>>
>>>> As a workaround to ensure that the rlimit bump does not depend on the
>>>> availability of a given helper, we restore the unconditional rlimit bump
>>>> in bpftool for now.
>>>
>>>> [0]
>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
>>>
>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
>>>> ---
>>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
>>>>   tools/bpf/bpftool/feature.c    | 2 ++
>>>>   tools/bpf/bpftool/main.c       | 6 +++---
>>>>   tools/bpf/bpftool/main.h       | 2 ++
>>>>   tools/bpf/bpftool/map.c        | 2 ++
>>>>   tools/bpf/bpftool/pids.c       | 1 +
>>>>   tools/bpf/bpftool/prog.c       | 3 +++
>>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
>>>>   8 files changed, 23 insertions(+), 3 deletions(-)
>>>
>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
>>>> --- a/tools/bpf/bpftool/common.c
>>>> +++ b/tools/bpf/bpftool/common.c
>>>> @@ -17,6 +17,7 @@
>>>>   #include <linux/magic.h>
>>>>   #include <net/if.h>
>>>>   #include <sys/mount.h>
>>>> +#include <sys/resource.h>
>>>>   #include <sys/stat.h>
>>>>   #include <sys/vfs.h>
>>>
>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
>>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
>>>>   }
>>>
>>>> +void set_max_rlimit(void)
>>>> +{
>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>>>> +
>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
>>>
>>> Do you think it might make sense to print to stderr some warning if
>>> we actually happen to adjust this limit?
>>>
>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
>>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
>>>     infinity!\n");
>>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
>>> }
>>>
>>> ?
>>>
>>> Because while it's nice that we automatically do this, this might still
>>> lead to surprises for some users. OTOH, not sure whether people
>>> actually read those warnings? :-/
>>
>> I'm not strictly opposed to a warning, but I'm not completely sure this
>> is desirable.
>>
>> Bpftool has raised the rlimit for a long time, it changed only in April,
>> so I don't think it would come up as a surprise for people who have used
>> it for a while. I think this is also something that several other
>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
>> have been doing too.
> 
> In this case ignore me and let's continue doing that :-)
> 
> Btw, eventually we'd still like to stop doing that I'd presume?

Agreed. I was thinking either finding a way to improve the probe in
libbpf, or waiting for some more time until 5.11 gets old, but this may
take years :/

> Should
> we at some point follow up with something like:
> 
> if (kernel_version >= 5.11) { don't touch memlock; }
> 
> ?
> 
> I guess we care only about <5.11 because of the backports, but 5.11+
> kernels are guaranteed to have memcg.

You mean from uname() and parsing the release? Yes I suppose we could do
that, can do as a follow-up.

> 
> I'm not sure whether memlock is used out there in the distros (and
> especially for root/bpf_capable), so I'm also not sure whether we
> really care or not.

Not sure either. For what it's worth, I've never seen complaints so far
from users about the rlimit being raised (from bpftool or other BPF apps).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-10 17:00         ` Quentin Monnet
@ 2022-06-10 17:17           ` Stanislav Fomichev
  2022-06-14 12:37             ` Yafang Shao
  0 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2022-06-10 17:17 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Yafang Shao, Harsh Modi, Paul Chaignon, netdev, bpf

On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
>
> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> > On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
> >>
> >> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> >>> On 06/10, Quentin Monnet wrote:
> >>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> >>>
> >>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> >>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> >>>> kernel has switched to memcg-based memory accounting. Thanks to the
> >>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> >>>> with other systems and ask libbpf to raise the limit for us if
> >>>> necessary.
> >>>
> >>>> How do we know if memcg-based accounting is supported? There is a probe
> >>>> in libbpf to check this. But this probe currently relies on the
> >>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> >>>> landed in the same kernel version as the memory accounting change. This
> >>>> works in the generic case, but it may fail, for example, if the helper
> >>>> function has been backported to an older kernel. This has been observed
> >>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
> >>>> available but rlimit is still in use. The probe succeeds, the rlimit is
> >>>> not raised, and probing features with bpftool, for example, fails.
> >>>
> >>>> A patch was submitted [0] to update this probe in libbpf, based on what
> >>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> >>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> >>>> some hard-to-debug flakiness if another process starts, or the current
> >>>> application is killed, while the rlimit is reduced, and the approach was
> >>>> discarded.
> >>>
> >>>> As a workaround to ensure that the rlimit bump does not depend on the
> >>>> availability of a given helper, we restore the unconditional rlimit bump
> >>>> in bpftool for now.
> >>>
> >>>> [0]
> >>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> >>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> >>>
> >>>> Cc: Yafang Shao <laoar.shao@gmail.com>
> >>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> >>>> ---
> >>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
> >>>>   tools/bpf/bpftool/feature.c    | 2 ++
> >>>>   tools/bpf/bpftool/main.c       | 6 +++---
> >>>>   tools/bpf/bpftool/main.h       | 2 ++
> >>>>   tools/bpf/bpftool/map.c        | 2 ++
> >>>>   tools/bpf/bpftool/pids.c       | 1 +
> >>>>   tools/bpf/bpftool/prog.c       | 3 +++
> >>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
> >>>>   8 files changed, 23 insertions(+), 3 deletions(-)
> >>>
> >>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> >>>> index a45b42ee8ab0..a0d4acd7c54a 100644
> >>>> --- a/tools/bpf/bpftool/common.c
> >>>> +++ b/tools/bpf/bpftool/common.c
> >>>> @@ -17,6 +17,7 @@
> >>>>   #include <linux/magic.h>
> >>>>   #include <net/if.h>
> >>>>   #include <sys/mount.h>
> >>>> +#include <sys/resource.h>
> >>>>   #include <sys/stat.h>
> >>>>   #include <sys/vfs.h>
> >>>
> >>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> >>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> >>>>   }
> >>>
> >>>> +void set_max_rlimit(void)
> >>>> +{
> >>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> >>>> +
> >>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> >>>
> >>> Do you think it might make sense to print to stderr some warning if
> >>> we actually happen to adjust this limit?
> >>>
> >>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> >>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> >>>     infinity!\n");
> >>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
> >>> }
> >>>
> >>> ?
> >>>
> >>> Because while it's nice that we automatically do this, this might still
> >>> lead to surprises for some users. OTOH, not sure whether people
> >>> actually read those warnings? :-/
> >>
> >> I'm not strictly opposed to a warning, but I'm not completely sure this
> >> is desirable.
> >>
> >> Bpftool has raised the rlimit for a long time, it changed only in April,
> >> so I don't think it would come up as a surprise for people who have used
> >> it for a while. I think this is also something that several other
> >> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> >> have been doing too.
> >
> > In this case ignore me and let's continue doing that :-)
> >
> > Btw, eventually we'd still like to stop doing that I'd presume?
>
> Agreed. I was thinking either finding a way to improve the probe in
> libbpf, or waiting for some more time until 5.11 gets old, but this may
> take years :/
>
> > Should
> > we at some point follow up with something like:
> >
> > if (kernel_version >= 5.11) { don't touch memlock; }
> >
> > ?
> >
> > I guess we care only about <5.11 because of the backports, but 5.11+
> > kernels are guaranteed to have memcg.
>
> You mean from uname() and parsing the release? Yes I suppose we could do
> that, can do as a follow-up.

Yeah, uname-based, I don't think we can do better? Given that probing
is problematic as well :-(
But idk, up to you.

> > I'm not sure whether memlock is used out there in the distros (and
> > especially for root/bpf_capable), so I'm also not sure whether we
> > really care or not.
>
> Not sure either. For what it's worth, I've never seen complaints so far
> from users about the rlimit being raised (from bpftool or other BPF apps).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-10 17:17           ` Stanislav Fomichev
@ 2022-06-14 12:37             ` Yafang Shao
  2022-06-14 14:20               ` Quentin Monnet
  0 siblings, 1 reply; 20+ messages in thread
From: Yafang Shao @ 2022-06-14 12:37 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Quentin Monnet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Harsh Modi, Paul Chaignon, netdev, bpf

On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
> >
> > 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> > > On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
> > >>
> > >> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> > >>> On 06/10, Quentin Monnet wrote:
> > >>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> > >>>
> > >>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> > >>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> > >>>> kernel has switched to memcg-based memory accounting. Thanks to the
> > >>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> > >>>> with other systems and ask libbpf to raise the limit for us if
> > >>>> necessary.
> > >>>
> > >>>> How do we know if memcg-based accounting is supported? There is a probe
> > >>>> in libbpf to check this. But this probe currently relies on the
> > >>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> > >>>> landed in the same kernel version as the memory accounting change. This
> > >>>> works in the generic case, but it may fail, for example, if the helper
> > >>>> function has been backported to an older kernel. This has been observed
> > >>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
> > >>>> available but rlimit is still in use. The probe succeeds, the rlimit is
> > >>>> not raised, and probing features with bpftool, for example, fails.
> > >>>
> > >>>> A patch was submitted [0] to update this probe in libbpf, based on what
> > >>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> > >>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> > >>>> some hard-to-debug flakiness if another process starts, or the current
> > >>>> application is killed, while the rlimit is reduced, and the approach was
> > >>>> discarded.
> > >>>
> > >>>> As a workaround to ensure that the rlimit bump does not depend on the
> > >>>> availability of a given helper, we restore the unconditional rlimit bump
> > >>>> in bpftool for now.
> > >>>
> > >>>> [0]
> > >>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> > >>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> > >>>
> > >>>> Cc: Yafang Shao <laoar.shao@gmail.com>
> > >>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> > >>>> ---
> > >>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
> > >>>>   tools/bpf/bpftool/feature.c    | 2 ++
> > >>>>   tools/bpf/bpftool/main.c       | 6 +++---
> > >>>>   tools/bpf/bpftool/main.h       | 2 ++
> > >>>>   tools/bpf/bpftool/map.c        | 2 ++
> > >>>>   tools/bpf/bpftool/pids.c       | 1 +
> > >>>>   tools/bpf/bpftool/prog.c       | 3 +++
> > >>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
> > >>>>   8 files changed, 23 insertions(+), 3 deletions(-)
> > >>>
> > >>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> > >>>> index a45b42ee8ab0..a0d4acd7c54a 100644
> > >>>> --- a/tools/bpf/bpftool/common.c
> > >>>> +++ b/tools/bpf/bpftool/common.c
> > >>>> @@ -17,6 +17,7 @@
> > >>>>   #include <linux/magic.h>
> > >>>>   #include <net/if.h>
> > >>>>   #include <sys/mount.h>
> > >>>> +#include <sys/resource.h>
> > >>>>   #include <sys/stat.h>
> > >>>>   #include <sys/vfs.h>
> > >>>
> > >>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> > >>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> > >>>>   }
> > >>>
> > >>>> +void set_max_rlimit(void)
> > >>>> +{
> > >>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> > >>>> +
> > >>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> > >>>
> > >>> Do you think it might make sense to print to stderr some warning if
> > >>> we actually happen to adjust this limit?
> > >>>
> > >>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> > >>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> > >>>     infinity!\n");
> > >>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
> > >>> }
> > >>>
> > >>> ?
> > >>>
> > >>> Because while it's nice that we automatically do this, this might still
> > >>> lead to surprises for some users. OTOH, not sure whether people
> > >>> actually read those warnings? :-/
> > >>
> > >> I'm not strictly opposed to a warning, but I'm not completely sure this
> > >> is desirable.
> > >>
> > >> Bpftool has raised the rlimit for a long time, it changed only in April,
> > >> so I don't think it would come up as a surprise for people who have used
> > >> it for a while. I think this is also something that several other
> > >> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> > >> have been doing too.
> > >
> > > In this case ignore me and let's continue doing that :-)
> > >
> > > Btw, eventually we'd still like to stop doing that I'd presume?
> >
> > Agreed. I was thinking either finding a way to improve the probe in
> > libbpf, or waiting for some more time until 5.11 gets old, but this may
> > take years :/
> >
> > > Should
> > > we at some point follow up with something like:
> > >
> > > if (kernel_version >= 5.11) { don't touch memlock; }
> > >
> > > ?
> > >
> > > I guess we care only about <5.11 because of the backports, but 5.11+
> > > kernels are guaranteed to have memcg.
> >
> > You mean from uname() and parsing the release? Yes I suppose we could do
> > that, can do as a follow-up.
>
> Yeah, uname-based, I don't think we can do better? Given that probing
> is problematic as well :-(
> But idk, up to you.
>

Agreed with the uname-based solution. Another possible solution is to
probe the member 'memcg' in struct bpf_map, in case someone may
backport memcg-based  memory accounting, but that will be a little
over-engineering. The uname-based solution is simple and can work.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-14 12:37             ` Yafang Shao
@ 2022-06-14 14:20               ` Quentin Monnet
  2022-06-14 20:34                 ` Daniel Borkmann
  2022-06-15 13:22                 ` Yafang Shao
  0 siblings, 2 replies; 20+ messages in thread
From: Quentin Monnet @ 2022-06-14 14:20 UTC (permalink / raw)
  To: Yafang Shao, Stanislav Fomichev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Harsh Modi,
	Paul Chaignon, netdev, bpf

2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
>>
>> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
>>>
>>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
>>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
>>>>>
>>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
>>>>>> On 06/10, Quentin Monnet wrote:
>>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
>>>>>>
>>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
>>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
>>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
>>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
>>>>>>> with other systems and ask libbpf to raise the limit for us if
>>>>>>> necessary.
>>>>>>
>>>>>>> How do we know if memcg-based accounting is supported? There is a probe
>>>>>>> in libbpf to check this. But this probe currently relies on the
>>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
>>>>>>> landed in the same kernel version as the memory accounting change. This
>>>>>>> works in the generic case, but it may fail, for example, if the helper
>>>>>>> function has been backported to an older kernel. This has been observed
>>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
>>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
>>>>>>> not raised, and probing features with bpftool, for example, fails.
>>>>>>
>>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
>>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
>>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
>>>>>>> some hard-to-debug flakiness if another process starts, or the current
>>>>>>> application is killed, while the rlimit is reduced, and the approach was
>>>>>>> discarded.
>>>>>>
>>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
>>>>>>> availability of a given helper, we restore the unconditional rlimit bump
>>>>>>> in bpftool for now.
>>>>>>
>>>>>>> [0]
>>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
>>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
>>>>>>
>>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
>>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
>>>>>>> ---
>>>>>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
>>>>>>>   tools/bpf/bpftool/feature.c    | 2 ++
>>>>>>>   tools/bpf/bpftool/main.c       | 6 +++---
>>>>>>>   tools/bpf/bpftool/main.h       | 2 ++
>>>>>>>   tools/bpf/bpftool/map.c        | 2 ++
>>>>>>>   tools/bpf/bpftool/pids.c       | 1 +
>>>>>>>   tools/bpf/bpftool/prog.c       | 3 +++
>>>>>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
>>>>>>>   8 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>
>>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
>>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
>>>>>>> --- a/tools/bpf/bpftool/common.c
>>>>>>> +++ b/tools/bpf/bpftool/common.c
>>>>>>> @@ -17,6 +17,7 @@
>>>>>>>   #include <linux/magic.h>
>>>>>>>   #include <net/if.h>
>>>>>>>   #include <sys/mount.h>
>>>>>>> +#include <sys/resource.h>
>>>>>>>   #include <sys/stat.h>
>>>>>>>   #include <sys/vfs.h>
>>>>>>
>>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
>>>>>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
>>>>>>>   }
>>>>>>
>>>>>>> +void set_max_rlimit(void)
>>>>>>> +{
>>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>>>>>>> +
>>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
>>>>>>
>>>>>> Do you think it might make sense to print to stderr some warning if
>>>>>> we actually happen to adjust this limit?
>>>>>>
>>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
>>>>>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
>>>>>>     infinity!\n");
>>>>>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
>>>>>> }
>>>>>>
>>>>>> ?
>>>>>>
>>>>>> Because while it's nice that we automatically do this, this might still
>>>>>> lead to surprises for some users. OTOH, not sure whether people
>>>>>> actually read those warnings? :-/
>>>>>
>>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
>>>>> is desirable.
>>>>>
>>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
>>>>> so I don't think it would come up as a surprise for people who have used
>>>>> it for a while. I think this is also something that several other
>>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
>>>>> have been doing too.
>>>>
>>>> In this case ignore me and let's continue doing that :-)
>>>>
>>>> Btw, eventually we'd still like to stop doing that I'd presume?
>>>
>>> Agreed. I was thinking either finding a way to improve the probe in
>>> libbpf, or waiting for some more time until 5.11 gets old, but this may
>>> take years :/
>>>
>>>> Should
>>>> we at some point follow up with something like:
>>>>
>>>> if (kernel_version >= 5.11) { don't touch memlock; }
>>>>
>>>> ?
>>>>
>>>> I guess we care only about <5.11 because of the backports, but 5.11+
>>>> kernels are guaranteed to have memcg.
>>>
>>> You mean from uname() and parsing the release? Yes I suppose we could do
>>> that, can do as a follow-up.
>>
>> Yeah, uname-based, I don't think we can do better? Given that probing
>> is problematic as well :-(
>> But idk, up to you.
>>
> 
> Agreed with the uname-based solution. Another possible solution is to
> probe the member 'memcg' in struct bpf_map, in case someone may
> backport memcg-based  memory accounting, but that will be a little
> over-engineering. The uname-based solution is simple and can work.
> 

Thanks! Yes, memcg would be more complex: the struct is not exposed to
user space, and BTF is not a hard dependency for bpftool. I'll work on
the uname-based test as a follow-up to this set.

Quentin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-14 14:20               ` Quentin Monnet
@ 2022-06-14 20:34                 ` Daniel Borkmann
  2022-06-14 21:01                   ` Stanislav Fomichev
  2022-06-15 13:22                 ` Yafang Shao
  1 sibling, 1 reply; 20+ messages in thread
From: Daniel Borkmann @ 2022-06-14 20:34 UTC (permalink / raw)
  To: Quentin Monnet, Yafang Shao, Stanislav Fomichev
  Cc: Alexei Starovoitov, Andrii Nakryiko, Harsh Modi, Paul Chaignon,
	netdev, bpf

On 6/14/22 4:20 PM, Quentin Monnet wrote:
> 2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
>> On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
>>> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
>>>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
>>>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
>>>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
>>>>>>> On 06/10, Quentin Monnet wrote:
>>>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
>>>>>>>
>>>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
>>>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
>>>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
>>>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
>>>>>>>> with other systems and ask libbpf to raise the limit for us if
>>>>>>>> necessary.
>>>>>>>
>>>>>>>> How do we know if memcg-based accounting is supported? There is a probe
>>>>>>>> in libbpf to check this. But this probe currently relies on the
>>>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
>>>>>>>> landed in the same kernel version as the memory accounting change. This
>>>>>>>> works in the generic case, but it may fail, for example, if the helper
>>>>>>>> function has been backported to an older kernel. This has been observed
>>>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
>>>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
>>>>>>>> not raised, and probing features with bpftool, for example, fails.
>>>>>>>
>>>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
>>>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
>>>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
>>>>>>>> some hard-to-debug flakiness if another process starts, or the current
>>>>>>>> application is killed, while the rlimit is reduced, and the approach was
>>>>>>>> discarded.
>>>>>>>
>>>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
>>>>>>>> availability of a given helper, we restore the unconditional rlimit bump
>>>>>>>> in bpftool for now.
>>>>>>>
>>>>>>>> [0]
>>>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
>>>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
>>>>>>>
>>>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
>>>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
>>>>>>>> ---
>>>>>>>>    tools/bpf/bpftool/common.c     | 8 ++++++++
>>>>>>>>    tools/bpf/bpftool/feature.c    | 2 ++
>>>>>>>>    tools/bpf/bpftool/main.c       | 6 +++---
>>>>>>>>    tools/bpf/bpftool/main.h       | 2 ++
>>>>>>>>    tools/bpf/bpftool/map.c        | 2 ++
>>>>>>>>    tools/bpf/bpftool/pids.c       | 1 +
>>>>>>>>    tools/bpf/bpftool/prog.c       | 3 +++
>>>>>>>>    tools/bpf/bpftool/struct_ops.c | 2 ++
>>>>>>>>    8 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
>>>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
>>>>>>>> --- a/tools/bpf/bpftool/common.c
>>>>>>>> +++ b/tools/bpf/bpftool/common.c
>>>>>>>> @@ -17,6 +17,7 @@
>>>>>>>>    #include <linux/magic.h>
>>>>>>>>    #include <net/if.h>
>>>>>>>>    #include <sys/mount.h>
>>>>>>>> +#include <sys/resource.h>
>>>>>>>>    #include <sys/stat.h>
>>>>>>>>    #include <sys/vfs.h>
>>>>>>>
>>>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
>>>>>>>>        return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
>>>>>>>>    }
>>>>>>>
>>>>>>>> +void set_max_rlimit(void)
>>>>>>>> +{
>>>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>>>>>>>> +
>>>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
>>>>>>>
>>>>>>> Do you think it might make sense to print to stderr some warning if
>>>>>>> we actually happen to adjust this limit?
>>>>>>>
>>>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
>>>>>>>      fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
>>>>>>>      infinity!\n");
>>>>>>>      setrlimit(RLIMIT_MEMLOCK, &rinf);
>>>>>>> }
>>>>>>>
>>>>>>> ?
>>>>>>>
>>>>>>> Because while it's nice that we automatically do this, this might still
>>>>>>> lead to surprises for some users. OTOH, not sure whether people
>>>>>>> actually read those warnings? :-/
>>>>>>
>>>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
>>>>>> is desirable.
>>>>>>
>>>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
>>>>>> so I don't think it would come up as a surprise for people who have used
>>>>>> it for a while. I think this is also something that several other
>>>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
>>>>>> have been doing too.
>>>>>
>>>>> In this case ignore me and let's continue doing that :-)
>>>>>
>>>>> Btw, eventually we'd still like to stop doing that I'd presume?
>>>>
>>>> Agreed. I was thinking either finding a way to improve the probe in
>>>> libbpf, or waiting for some more time until 5.11 gets old, but this may
>>>> take years :/
>>>>
>>>>> Should
>>>>> we at some point follow up with something like:
>>>>>
>>>>> if (kernel_version >= 5.11) { don't touch memlock; }
>>>>>
>>>>> ?
>>>>>
>>>>> I guess we care only about <5.11 because of the backports, but 5.11+
>>>>> kernels are guaranteed to have memcg.
>>>>
>>>> You mean from uname() and parsing the release? Yes I suppose we could do
>>>> that, can do as a follow-up.
>>>
>>> Yeah, uname-based, I don't think we can do better? Given that probing
>>> is problematic as well :-(
>>> But idk, up to you.
>>
>> Agreed with the uname-based solution. Another possible solution is to
>> probe the member 'memcg' in struct bpf_map, in case someone may
>> backport memcg-based  memory accounting, but that will be a little
>> over-engineering. The uname-based solution is simple and can work.
> 
> Thanks! Yes, memcg would be more complex: the struct is not exposed to
> user space, and BTF is not a hard dependency for bpftool. I'll work on
> the uname-based test as a follow-up to this set.

How would this work for things like RHEL? Maybe two potential workarounds ...
1) We could use a different helper for the probing and see how far we get with
it.. though not great, probably still rare enough that we would run into it
again. 2) Maybe we could create a temp memcg and check whether we get accounted
against it on prog load (e.g. despite high rlimit)?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 0/2] bpftool: Restore memlock rlimit bump
  2022-06-10 11:26 [PATCH bpf-next 0/2] bpftool: Restore memlock rlimit bump Quentin Monnet
  2022-06-10 11:26 ` [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK" Quentin Monnet
  2022-06-10 11:26 ` [PATCH bpf-next 2/2] bpftool: Do not check return value from libbpf_set_strict_mode() Quentin Monnet
@ 2022-06-14 20:37 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-06-14 20:37 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: ast, daniel, andrii, laoar.shao, harshmodi, paul, netdev, bpf

Hello:

This series was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Fri, 10 Jun 2022 12:26:46 +0100 you wrote:
> We recently removed the uncondtional rlimit bump from bpftool, deferring it
> to libbpf to probe the system for memcg-based memory accounting and to
> raise the rlimit if necessary.
> 
> This probing is based on the availability of a given BPF helper, and his
> known to fail on some environments where the helper, but not the memcg
> memory accounting, has been backported.
> 
> [...]

Here is the summary with links:
  - [bpf-next,1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
    https://git.kernel.org/bpf/bpf-next/c/6b4384ff1088
  - [bpf-next,2/2] bpftool: Do not check return value from libbpf_set_strict_mode()
    https://git.kernel.org/bpf/bpf-next/c/93270357daa9

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-14 20:34                 ` Daniel Borkmann
@ 2022-06-14 21:01                   ` Stanislav Fomichev
  0 siblings, 0 replies; 20+ messages in thread
From: Stanislav Fomichev @ 2022-06-14 21:01 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Quentin Monnet, Yafang Shao, Alexei Starovoitov, Andrii Nakryiko,
	Harsh Modi, Paul Chaignon, netdev, bpf

On Tue, Jun 14, 2022 at 1:34 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 6/14/22 4:20 PM, Quentin Monnet wrote:
> > 2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> >> On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
> >>> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
> >>>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> >>>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
> >>>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> >>>>>>> On 06/10, Quentin Monnet wrote:
> >>>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> >>>>>>>
> >>>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> >>>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> >>>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
> >>>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> >>>>>>>> with other systems and ask libbpf to raise the limit for us if
> >>>>>>>> necessary.
> >>>>>>>
> >>>>>>>> How do we know if memcg-based accounting is supported? There is a probe
> >>>>>>>> in libbpf to check this. But this probe currently relies on the
> >>>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> >>>>>>>> landed in the same kernel version as the memory accounting change. This
> >>>>>>>> works in the generic case, but it may fail, for example, if the helper
> >>>>>>>> function has been backported to an older kernel. This has been observed
> >>>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
> >>>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
> >>>>>>>> not raised, and probing features with bpftool, for example, fails.
> >>>>>>>
> >>>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
> >>>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> >>>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> >>>>>>>> some hard-to-debug flakiness if another process starts, or the current
> >>>>>>>> application is killed, while the rlimit is reduced, and the approach was
> >>>>>>>> discarded.
> >>>>>>>
> >>>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
> >>>>>>>> availability of a given helper, we restore the unconditional rlimit bump
> >>>>>>>> in bpftool for now.
> >>>>>>>
> >>>>>>>> [0]
> >>>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> >>>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> >>>>>>>
> >>>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
> >>>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> >>>>>>>> ---
> >>>>>>>>    tools/bpf/bpftool/common.c     | 8 ++++++++
> >>>>>>>>    tools/bpf/bpftool/feature.c    | 2 ++
> >>>>>>>>    tools/bpf/bpftool/main.c       | 6 +++---
> >>>>>>>>    tools/bpf/bpftool/main.h       | 2 ++
> >>>>>>>>    tools/bpf/bpftool/map.c        | 2 ++
> >>>>>>>>    tools/bpf/bpftool/pids.c       | 1 +
> >>>>>>>>    tools/bpf/bpftool/prog.c       | 3 +++
> >>>>>>>>    tools/bpf/bpftool/struct_ops.c | 2 ++
> >>>>>>>>    8 files changed, 23 insertions(+), 3 deletions(-)
> >>>>>>>
> >>>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> >>>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
> >>>>>>>> --- a/tools/bpf/bpftool/common.c
> >>>>>>>> +++ b/tools/bpf/bpftool/common.c
> >>>>>>>> @@ -17,6 +17,7 @@
> >>>>>>>>    #include <linux/magic.h>
> >>>>>>>>    #include <net/if.h>
> >>>>>>>>    #include <sys/mount.h>
> >>>>>>>> +#include <sys/resource.h>
> >>>>>>>>    #include <sys/stat.h>
> >>>>>>>>    #include <sys/vfs.h>
> >>>>>>>
> >>>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> >>>>>>>>        return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> >>>>>>>>    }
> >>>>>>>
> >>>>>>>> +void set_max_rlimit(void)
> >>>>>>>> +{
> >>>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> >>>>>>>> +
> >>>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> >>>>>>>
> >>>>>>> Do you think it might make sense to print to stderr some warning if
> >>>>>>> we actually happen to adjust this limit?
> >>>>>>>
> >>>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> >>>>>>>      fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> >>>>>>>      infinity!\n");
> >>>>>>>      setrlimit(RLIMIT_MEMLOCK, &rinf);
> >>>>>>> }
> >>>>>>>
> >>>>>>> ?
> >>>>>>>
> >>>>>>> Because while it's nice that we automatically do this, this might still
> >>>>>>> lead to surprises for some users. OTOH, not sure whether people
> >>>>>>> actually read those warnings? :-/
> >>>>>>
> >>>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
> >>>>>> is desirable.
> >>>>>>
> >>>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
> >>>>>> so I don't think it would come up as a surprise for people who have used
> >>>>>> it for a while. I think this is also something that several other
> >>>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> >>>>>> have been doing too.
> >>>>>
> >>>>> In this case ignore me and let's continue doing that :-)
> >>>>>
> >>>>> Btw, eventually we'd still like to stop doing that I'd presume?
> >>>>
> >>>> Agreed. I was thinking either finding a way to improve the probe in
> >>>> libbpf, or waiting for some more time until 5.11 gets old, but this may
> >>>> take years :/
> >>>>
> >>>>> Should
> >>>>> we at some point follow up with something like:
> >>>>>
> >>>>> if (kernel_version >= 5.11) { don't touch memlock; }
> >>>>>
> >>>>> ?
> >>>>>
> >>>>> I guess we care only about <5.11 because of the backports, but 5.11+
> >>>>> kernels are guaranteed to have memcg.
> >>>>
> >>>> You mean from uname() and parsing the release? Yes I suppose we could do
> >>>> that, can do as a follow-up.
> >>>
> >>> Yeah, uname-based, I don't think we can do better? Given that probing
> >>> is problematic as well :-(
> >>> But idk, up to you.
> >>
> >> Agreed with the uname-based solution. Another possible solution is to
> >> probe the member 'memcg' in struct bpf_map, in case someone may
> >> backport memcg-based  memory accounting, but that will be a little
> >> over-engineering. The uname-based solution is simple and can work.
> >
> > Thanks! Yes, memcg would be more complex: the struct is not exposed to
> > user space, and BTF is not a hard dependency for bpftool. I'll work on
> > the uname-based test as a follow-up to this set.
>
> How would this work for things like RHEL? Maybe two potential workarounds ...
> 1) We could use a different helper for the probing and see how far we get with
> it.. though not great, probably still rare enough that we would run into it
> again. 2) Maybe we could create a temp memcg and check whether we get accounted
> against it on prog load (e.g. despite high rlimit)?

This might be dangerous as well: we don't want to move to a different
cgroup and move back; if we are in a multithreaded application, other
threads might potentially create resources in that temp directory. And
I don't think we can fork as well since we are in a library :-(

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-14 14:20               ` Quentin Monnet
  2022-06-14 20:34                 ` Daniel Borkmann
@ 2022-06-15 13:22                 ` Yafang Shao
  2022-06-15 15:52                   ` Stanislav Fomichev
  1 sibling, 1 reply; 20+ messages in thread
From: Yafang Shao @ 2022-06-15 13:22 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Harsh Modi, Paul Chaignon, netdev, bpf

On Tue, Jun 14, 2022 at 10:20 PM Quentin Monnet <quentin@isovalent.com> wrote:
>
> 2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> > On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
> >>
> >> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
> >>>
> >>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> >>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
> >>>>>
> >>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> >>>>>> On 06/10, Quentin Monnet wrote:
> >>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> >>>>>>
> >>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> >>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> >>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
> >>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> >>>>>>> with other systems and ask libbpf to raise the limit for us if
> >>>>>>> necessary.
> >>>>>>
> >>>>>>> How do we know if memcg-based accounting is supported? There is a probe
> >>>>>>> in libbpf to check this. But this probe currently relies on the
> >>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> >>>>>>> landed in the same kernel version as the memory accounting change. This
> >>>>>>> works in the generic case, but it may fail, for example, if the helper
> >>>>>>> function has been backported to an older kernel. This has been observed
> >>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
> >>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
> >>>>>>> not raised, and probing features with bpftool, for example, fails.
> >>>>>>
> >>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
> >>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> >>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> >>>>>>> some hard-to-debug flakiness if another process starts, or the current
> >>>>>>> application is killed, while the rlimit is reduced, and the approach was
> >>>>>>> discarded.
> >>>>>>
> >>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
> >>>>>>> availability of a given helper, we restore the unconditional rlimit bump
> >>>>>>> in bpftool for now.
> >>>>>>
> >>>>>>> [0]
> >>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> >>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> >>>>>>
> >>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
> >>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> >>>>>>> ---
> >>>>>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
> >>>>>>>   tools/bpf/bpftool/feature.c    | 2 ++
> >>>>>>>   tools/bpf/bpftool/main.c       | 6 +++---
> >>>>>>>   tools/bpf/bpftool/main.h       | 2 ++
> >>>>>>>   tools/bpf/bpftool/map.c        | 2 ++
> >>>>>>>   tools/bpf/bpftool/pids.c       | 1 +
> >>>>>>>   tools/bpf/bpftool/prog.c       | 3 +++
> >>>>>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
> >>>>>>>   8 files changed, 23 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> >>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
> >>>>>>> --- a/tools/bpf/bpftool/common.c
> >>>>>>> +++ b/tools/bpf/bpftool/common.c
> >>>>>>> @@ -17,6 +17,7 @@
> >>>>>>>   #include <linux/magic.h>
> >>>>>>>   #include <net/if.h>
> >>>>>>>   #include <sys/mount.h>
> >>>>>>> +#include <sys/resource.h>
> >>>>>>>   #include <sys/stat.h>
> >>>>>>>   #include <sys/vfs.h>
> >>>>>>
> >>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> >>>>>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> >>>>>>>   }
> >>>>>>
> >>>>>>> +void set_max_rlimit(void)
> >>>>>>> +{
> >>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> >>>>>>> +
> >>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> >>>>>>
> >>>>>> Do you think it might make sense to print to stderr some warning if
> >>>>>> we actually happen to adjust this limit?
> >>>>>>
> >>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> >>>>>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> >>>>>>     infinity!\n");
> >>>>>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
> >>>>>> }
> >>>>>>
> >>>>>> ?
> >>>>>>
> >>>>>> Because while it's nice that we automatically do this, this might still
> >>>>>> lead to surprises for some users. OTOH, not sure whether people
> >>>>>> actually read those warnings? :-/
> >>>>>
> >>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
> >>>>> is desirable.
> >>>>>
> >>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
> >>>>> so I don't think it would come up as a surprise for people who have used
> >>>>> it for a while. I think this is also something that several other
> >>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> >>>>> have been doing too.
> >>>>
> >>>> In this case ignore me and let's continue doing that :-)
> >>>>
> >>>> Btw, eventually we'd still like to stop doing that I'd presume?
> >>>
> >>> Agreed. I was thinking either finding a way to improve the probe in
> >>> libbpf, or waiting for some more time until 5.11 gets old, but this may
> >>> take years :/
> >>>
> >>>> Should
> >>>> we at some point follow up with something like:
> >>>>
> >>>> if (kernel_version >= 5.11) { don't touch memlock; }
> >>>>
> >>>> ?
> >>>>
> >>>> I guess we care only about <5.11 because of the backports, but 5.11+
> >>>> kernels are guaranteed to have memcg.
> >>>
> >>> You mean from uname() and parsing the release? Yes I suppose we could do
> >>> that, can do as a follow-up.
> >>
> >> Yeah, uname-based, I don't think we can do better? Given that probing
> >> is problematic as well :-(
> >> But idk, up to you.
> >>
> >
> > Agreed with the uname-based solution. Another possible solution is to
> > probe the member 'memcg' in struct bpf_map, in case someone may
> > backport memcg-based  memory accounting, but that will be a little
> > over-engineering. The uname-based solution is simple and can work.
> >
>
> Thanks! Yes, memcg would be more complex: the struct is not exposed to
> user space, and BTF is not a hard dependency for bpftool. I'll work on
> the uname-based test as a follow-up to this set.
>

After a second thought, the uname-based test may not work, because
CONFIG_MEMCG_KMEM can be disabled.
Maybe we can probe the member 'memcg' in struct bpf_map by parsing
/sys/kernel/btf/vmlinux:
[8584] STRUCT 'bpf_map' size=256 vlen=27
        'ops' type_id=8659 bits_offset=0
        'inner_map_meta' type_id=8587 bits_offset=64
        'security' type_id=93 bits_offset=128
        'map_type' type_id=8532 bits_offset=192
        'key_size' type_id=36 bits_offset=224
        'value_size' type_id=36 bits_offset=256
        'max_entries' type_id=36 bits_offset=288
        'map_extra' type_id=38 bits_offset=320
        'map_flags' type_id=36 bits_offset=384
        'spin_lock_off' type_id=21 bits_offset=416
        'timer_off' type_id=21 bits_offset=448
        'id' type_id=36 bits_offset=480
        'numa_node' type_id=21 bits_offset=512
        'btf_key_type_id' type_id=36 bits_offset=544
        'btf_value_type_id' type_id=36 bits_offset=576
        'btf_vmlinux_value_type_id' type_id=36 bits_offset=608
        'btf' type_id=8660 bits_offset=640
        'memcg' type_id=687 bits_offset=704                       <<<< here
        'name' type_id=337 bits_offset=768
        'bypass_spec_v1' type_id=63 bits_offset=896
        'frozen' type_id=63 bits_offset=904
        'refcnt' type_id=81 bits_offset=1024
        'usercnt' type_id=81 bits_offset=1088
        'work' type_id=484 bits_offset=1152
        'freeze_mutex' type_id=443 bits_offset=1408
        'writecnt' type_id=81 bits_offset=1664
        'owner' type_id=8658 bits_offset=1728

If 'memcg' exists, it is memcg-based, otherwise it is rlimit-based.

WDYT?

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-15 13:22                 ` Yafang Shao
@ 2022-06-15 15:52                   ` Stanislav Fomichev
  2022-06-15 16:05                     ` Yafang Shao
  0 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2022-06-15 15:52 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Quentin Monnet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Harsh Modi, Paul Chaignon, netdev, bpf

On Wed, Jun 15, 2022 at 6:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Tue, Jun 14, 2022 at 10:20 PM Quentin Monnet <quentin@isovalent.com> wrote:
> >
> > 2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> > > On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >>
> > >> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
> > >>>
> > >>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> > >>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
> > >>>>>
> > >>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> > >>>>>> On 06/10, Quentin Monnet wrote:
> > >>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> > >>>>>>
> > >>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> > >>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> > >>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
> > >>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> > >>>>>>> with other systems and ask libbpf to raise the limit for us if
> > >>>>>>> necessary.
> > >>>>>>
> > >>>>>>> How do we know if memcg-based accounting is supported? There is a probe
> > >>>>>>> in libbpf to check this. But this probe currently relies on the
> > >>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> > >>>>>>> landed in the same kernel version as the memory accounting change. This
> > >>>>>>> works in the generic case, but it may fail, for example, if the helper
> > >>>>>>> function has been backported to an older kernel. This has been observed
> > >>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
> > >>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
> > >>>>>>> not raised, and probing features with bpftool, for example, fails.
> > >>>>>>
> > >>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
> > >>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> > >>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> > >>>>>>> some hard-to-debug flakiness if another process starts, or the current
> > >>>>>>> application is killed, while the rlimit is reduced, and the approach was
> > >>>>>>> discarded.
> > >>>>>>
> > >>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
> > >>>>>>> availability of a given helper, we restore the unconditional rlimit bump
> > >>>>>>> in bpftool for now.
> > >>>>>>
> > >>>>>>> [0]
> > >>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> > >>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> > >>>>>>
> > >>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
> > >>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> > >>>>>>> ---
> > >>>>>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
> > >>>>>>>   tools/bpf/bpftool/feature.c    | 2 ++
> > >>>>>>>   tools/bpf/bpftool/main.c       | 6 +++---
> > >>>>>>>   tools/bpf/bpftool/main.h       | 2 ++
> > >>>>>>>   tools/bpf/bpftool/map.c        | 2 ++
> > >>>>>>>   tools/bpf/bpftool/pids.c       | 1 +
> > >>>>>>>   tools/bpf/bpftool/prog.c       | 3 +++
> > >>>>>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
> > >>>>>>>   8 files changed, 23 insertions(+), 3 deletions(-)
> > >>>>>>
> > >>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> > >>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
> > >>>>>>> --- a/tools/bpf/bpftool/common.c
> > >>>>>>> +++ b/tools/bpf/bpftool/common.c
> > >>>>>>> @@ -17,6 +17,7 @@
> > >>>>>>>   #include <linux/magic.h>
> > >>>>>>>   #include <net/if.h>
> > >>>>>>>   #include <sys/mount.h>
> > >>>>>>> +#include <sys/resource.h>
> > >>>>>>>   #include <sys/stat.h>
> > >>>>>>>   #include <sys/vfs.h>
> > >>>>>>
> > >>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> > >>>>>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> > >>>>>>>   }
> > >>>>>>
> > >>>>>>> +void set_max_rlimit(void)
> > >>>>>>> +{
> > >>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> > >>>>>>> +
> > >>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> > >>>>>>
> > >>>>>> Do you think it might make sense to print to stderr some warning if
> > >>>>>> we actually happen to adjust this limit?
> > >>>>>>
> > >>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> > >>>>>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> > >>>>>>     infinity!\n");
> > >>>>>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
> > >>>>>> }
> > >>>>>>
> > >>>>>> ?
> > >>>>>>
> > >>>>>> Because while it's nice that we automatically do this, this might still
> > >>>>>> lead to surprises for some users. OTOH, not sure whether people
> > >>>>>> actually read those warnings? :-/
> > >>>>>
> > >>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
> > >>>>> is desirable.
> > >>>>>
> > >>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
> > >>>>> so I don't think it would come up as a surprise for people who have used
> > >>>>> it for a while. I think this is also something that several other
> > >>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> > >>>>> have been doing too.
> > >>>>
> > >>>> In this case ignore me and let's continue doing that :-)
> > >>>>
> > >>>> Btw, eventually we'd still like to stop doing that I'd presume?
> > >>>
> > >>> Agreed. I was thinking either finding a way to improve the probe in
> > >>> libbpf, or waiting for some more time until 5.11 gets old, but this may
> > >>> take years :/
> > >>>
> > >>>> Should
> > >>>> we at some point follow up with something like:
> > >>>>
> > >>>> if (kernel_version >= 5.11) { don't touch memlock; }
> > >>>>
> > >>>> ?
> > >>>>
> > >>>> I guess we care only about <5.11 because of the backports, but 5.11+
> > >>>> kernels are guaranteed to have memcg.
> > >>>
> > >>> You mean from uname() and parsing the release? Yes I suppose we could do
> > >>> that, can do as a follow-up.
> > >>
> > >> Yeah, uname-based, I don't think we can do better? Given that probing
> > >> is problematic as well :-(
> > >> But idk, up to you.
> > >>
> > >
> > > Agreed with the uname-based solution. Another possible solution is to
> > > probe the member 'memcg' in struct bpf_map, in case someone may
> > > backport memcg-based  memory accounting, but that will be a little
> > > over-engineering. The uname-based solution is simple and can work.
> > >
> >
> > Thanks! Yes, memcg would be more complex: the struct is not exposed to
> > user space, and BTF is not a hard dependency for bpftool. I'll work on
> > the uname-based test as a follow-up to this set.
> >
>
> After a second thought, the uname-based test may not work, because
> CONFIG_MEMCG_KMEM can be disabled.

Does it matter? Regardless of whether there is memcg or not, we
shouldn't touch ulimit on 5.11+
If there is no memcg, there is no bpf memory enforcement.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-15 15:52                   ` Stanislav Fomichev
@ 2022-06-15 16:05                     ` Yafang Shao
  2022-06-16 13:59                       ` Quentin Monnet
  0 siblings, 1 reply; 20+ messages in thread
From: Yafang Shao @ 2022-06-15 16:05 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Quentin Monnet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Harsh Modi, Paul Chaignon, netdev, bpf

On Wed, Jun 15, 2022 at 11:52 PM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Wed, Jun 15, 2022 at 6:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 10:20 PM Quentin Monnet <quentin@isovalent.com> wrote:
> > >
> > > 2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> > > > On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > >>
> > > >> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
> > > >>>
> > > >>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> > > >>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
> > > >>>>>
> > > >>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> > > >>>>>> On 06/10, Quentin Monnet wrote:
> > > >>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> > > >>>>>>
> > > >>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> > > >>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> > > >>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
> > > >>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> > > >>>>>>> with other systems and ask libbpf to raise the limit for us if
> > > >>>>>>> necessary.
> > > >>>>>>
> > > >>>>>>> How do we know if memcg-based accounting is supported? There is a probe
> > > >>>>>>> in libbpf to check this. But this probe currently relies on the
> > > >>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> > > >>>>>>> landed in the same kernel version as the memory accounting change. This
> > > >>>>>>> works in the generic case, but it may fail, for example, if the helper
> > > >>>>>>> function has been backported to an older kernel. This has been observed
> > > >>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
> > > >>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
> > > >>>>>>> not raised, and probing features with bpftool, for example, fails.
> > > >>>>>>
> > > >>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
> > > >>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> > > >>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> > > >>>>>>> some hard-to-debug flakiness if another process starts, or the current
> > > >>>>>>> application is killed, while the rlimit is reduced, and the approach was
> > > >>>>>>> discarded.
> > > >>>>>>
> > > >>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
> > > >>>>>>> availability of a given helper, we restore the unconditional rlimit bump
> > > >>>>>>> in bpftool for now.
> > > >>>>>>
> > > >>>>>>> [0]
> > > >>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> > > >>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> > > >>>>>>
> > > >>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
> > > >>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> > > >>>>>>> ---
> > > >>>>>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
> > > >>>>>>>   tools/bpf/bpftool/feature.c    | 2 ++
> > > >>>>>>>   tools/bpf/bpftool/main.c       | 6 +++---
> > > >>>>>>>   tools/bpf/bpftool/main.h       | 2 ++
> > > >>>>>>>   tools/bpf/bpftool/map.c        | 2 ++
> > > >>>>>>>   tools/bpf/bpftool/pids.c       | 1 +
> > > >>>>>>>   tools/bpf/bpftool/prog.c       | 3 +++
> > > >>>>>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
> > > >>>>>>>   8 files changed, 23 insertions(+), 3 deletions(-)
> > > >>>>>>
> > > >>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> > > >>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
> > > >>>>>>> --- a/tools/bpf/bpftool/common.c
> > > >>>>>>> +++ b/tools/bpf/bpftool/common.c
> > > >>>>>>> @@ -17,6 +17,7 @@
> > > >>>>>>>   #include <linux/magic.h>
> > > >>>>>>>   #include <net/if.h>
> > > >>>>>>>   #include <sys/mount.h>
> > > >>>>>>> +#include <sys/resource.h>
> > > >>>>>>>   #include <sys/stat.h>
> > > >>>>>>>   #include <sys/vfs.h>
> > > >>>>>>
> > > >>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> > > >>>>>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> > > >>>>>>>   }
> > > >>>>>>
> > > >>>>>>> +void set_max_rlimit(void)
> > > >>>>>>> +{
> > > >>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> > > >>>>>>> +
> > > >>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> > > >>>>>>
> > > >>>>>> Do you think it might make sense to print to stderr some warning if
> > > >>>>>> we actually happen to adjust this limit?
> > > >>>>>>
> > > >>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> > > >>>>>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> > > >>>>>>     infinity!\n");
> > > >>>>>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
> > > >>>>>> }
> > > >>>>>>
> > > >>>>>> ?
> > > >>>>>>
> > > >>>>>> Because while it's nice that we automatically do this, this might still
> > > >>>>>> lead to surprises for some users. OTOH, not sure whether people
> > > >>>>>> actually read those warnings? :-/
> > > >>>>>
> > > >>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
> > > >>>>> is desirable.
> > > >>>>>
> > > >>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
> > > >>>>> so I don't think it would come up as a surprise for people who have used
> > > >>>>> it for a while. I think this is also something that several other
> > > >>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> > > >>>>> have been doing too.
> > > >>>>
> > > >>>> In this case ignore me and let's continue doing that :-)
> > > >>>>
> > > >>>> Btw, eventually we'd still like to stop doing that I'd presume?
> > > >>>
> > > >>> Agreed. I was thinking either finding a way to improve the probe in
> > > >>> libbpf, or waiting for some more time until 5.11 gets old, but this may
> > > >>> take years :/
> > > >>>
> > > >>>> Should
> > > >>>> we at some point follow up with something like:
> > > >>>>
> > > >>>> if (kernel_version >= 5.11) { don't touch memlock; }
> > > >>>>
> > > >>>> ?
> > > >>>>
> > > >>>> I guess we care only about <5.11 because of the backports, but 5.11+
> > > >>>> kernels are guaranteed to have memcg.
> > > >>>
> > > >>> You mean from uname() and parsing the release? Yes I suppose we could do
> > > >>> that, can do as a follow-up.
> > > >>
> > > >> Yeah, uname-based, I don't think we can do better? Given that probing
> > > >> is problematic as well :-(
> > > >> But idk, up to you.
> > > >>
> > > >
> > > > Agreed with the uname-based solution. Another possible solution is to
> > > > probe the member 'memcg' in struct bpf_map, in case someone may
> > > > backport memcg-based  memory accounting, but that will be a little
> > > > over-engineering. The uname-based solution is simple and can work.
> > > >
> > >
> > > Thanks! Yes, memcg would be more complex: the struct is not exposed to
> > > user space, and BTF is not a hard dependency for bpftool. I'll work on
> > > the uname-based test as a follow-up to this set.
> > >
> >
> > After a second thought, the uname-based test may not work, because
> > CONFIG_MEMCG_KMEM can be disabled.
>
> Does it matter? Regardless of whether there is memcg or not, we
> shouldn't touch ulimit on 5.11+
> If there is no memcg, there is no bpf memory enforcement.

Right, rlimit-based accounting is totally removed, that is not the
same with what I thought before, while I thought it will fallback to
rlimit-based if kmemcg is disabled.

-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-15 16:05                     ` Yafang Shao
@ 2022-06-16 13:59                       ` Quentin Monnet
  2022-06-16 14:54                         ` Yafang Shao
  0 siblings, 1 reply; 20+ messages in thread
From: Quentin Monnet @ 2022-06-16 13:59 UTC (permalink / raw)
  To: Yafang Shao, Stanislav Fomichev
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Harsh Modi,
	Paul Chaignon, netdev, bpf

2022-06-16 00:05 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> On Wed, Jun 15, 2022 at 11:52 PM Stanislav Fomichev <sdf@google.com> wrote:
>>
>> On Wed, Jun 15, 2022 at 6:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>>>
>>> On Tue, Jun 14, 2022 at 10:20 PM Quentin Monnet <quentin@isovalent.com> wrote:
>>>>
>>>> 2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
>>>>> On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
>>>>>>
>>>>>> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
>>>>>>>
>>>>>>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
>>>>>>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
>>>>>>>>>
>>>>>>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
>>>>>>>>>> On 06/10, Quentin Monnet wrote:
>>>>>>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
>>>>>>>>>>
>>>>>>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
>>>>>>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
>>>>>>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
>>>>>>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
>>>>>>>>>>> with other systems and ask libbpf to raise the limit for us if
>>>>>>>>>>> necessary.
>>>>>>>>>>
>>>>>>>>>>> How do we know if memcg-based accounting is supported? There is a probe
>>>>>>>>>>> in libbpf to check this. But this probe currently relies on the
>>>>>>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
>>>>>>>>>>> landed in the same kernel version as the memory accounting change. This
>>>>>>>>>>> works in the generic case, but it may fail, for example, if the helper
>>>>>>>>>>> function has been backported to an older kernel. This has been observed
>>>>>>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
>>>>>>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
>>>>>>>>>>> not raised, and probing features with bpftool, for example, fails.
>>>>>>>>>>
>>>>>>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
>>>>>>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
>>>>>>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
>>>>>>>>>>> some hard-to-debug flakiness if another process starts, or the current
>>>>>>>>>>> application is killed, while the rlimit is reduced, and the approach was
>>>>>>>>>>> discarded.
>>>>>>>>>>
>>>>>>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
>>>>>>>>>>> availability of a given helper, we restore the unconditional rlimit bump
>>>>>>>>>>> in bpftool for now.
>>>>>>>>>>
>>>>>>>>>>> [0]
>>>>>>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
>>>>>>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
>>>>>>>>>>
>>>>>>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
>>>>>>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
>>>>>>>>>>> ---
>>>>>>>>>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
>>>>>>>>>>>   tools/bpf/bpftool/feature.c    | 2 ++
>>>>>>>>>>>   tools/bpf/bpftool/main.c       | 6 +++---
>>>>>>>>>>>   tools/bpf/bpftool/main.h       | 2 ++
>>>>>>>>>>>   tools/bpf/bpftool/map.c        | 2 ++
>>>>>>>>>>>   tools/bpf/bpftool/pids.c       | 1 +
>>>>>>>>>>>   tools/bpf/bpftool/prog.c       | 3 +++
>>>>>>>>>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
>>>>>>>>>>>   8 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
>>>>>>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
>>>>>>>>>>> --- a/tools/bpf/bpftool/common.c
>>>>>>>>>>> +++ b/tools/bpf/bpftool/common.c
>>>>>>>>>>> @@ -17,6 +17,7 @@
>>>>>>>>>>>   #include <linux/magic.h>
>>>>>>>>>>>   #include <net/if.h>
>>>>>>>>>>>   #include <sys/mount.h>
>>>>>>>>>>> +#include <sys/resource.h>
>>>>>>>>>>>   #include <sys/stat.h>
>>>>>>>>>>>   #include <sys/vfs.h>
>>>>>>>>>>
>>>>>>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
>>>>>>>>>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
>>>>>>>>>>>   }
>>>>>>>>>>
>>>>>>>>>>> +void set_max_rlimit(void)
>>>>>>>>>>> +{
>>>>>>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
>>>>>>>>>>> +
>>>>>>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
>>>>>>>>>>
>>>>>>>>>> Do you think it might make sense to print to stderr some warning if
>>>>>>>>>> we actually happen to adjust this limit?
>>>>>>>>>>
>>>>>>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
>>>>>>>>>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
>>>>>>>>>>     infinity!\n");
>>>>>>>>>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> ?
>>>>>>>>>>
>>>>>>>>>> Because while it's nice that we automatically do this, this might still
>>>>>>>>>> lead to surprises for some users. OTOH, not sure whether people
>>>>>>>>>> actually read those warnings? :-/
>>>>>>>>>
>>>>>>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
>>>>>>>>> is desirable.
>>>>>>>>>
>>>>>>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
>>>>>>>>> so I don't think it would come up as a surprise for people who have used
>>>>>>>>> it for a while. I think this is also something that several other
>>>>>>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
>>>>>>>>> have been doing too.
>>>>>>>>
>>>>>>>> In this case ignore me and let's continue doing that :-)
>>>>>>>>
>>>>>>>> Btw, eventually we'd still like to stop doing that I'd presume?
>>>>>>>
>>>>>>> Agreed. I was thinking either finding a way to improve the probe in
>>>>>>> libbpf, or waiting for some more time until 5.11 gets old, but this may
>>>>>>> take years :/
>>>>>>>
>>>>>>>> Should
>>>>>>>> we at some point follow up with something like:
>>>>>>>>
>>>>>>>> if (kernel_version >= 5.11) { don't touch memlock; }
>>>>>>>>
>>>>>>>> ?
>>>>>>>>
>>>>>>>> I guess we care only about <5.11 because of the backports, but 5.11+
>>>>>>>> kernels are guaranteed to have memcg.
>>>>>>>
>>>>>>> You mean from uname() and parsing the release? Yes I suppose we could do
>>>>>>> that, can do as a follow-up.
>>>>>>
>>>>>> Yeah, uname-based, I don't think we can do better? Given that probing
>>>>>> is problematic as well :-(
>>>>>> But idk, up to you.
>>>>>>
>>>>>
>>>>> Agreed with the uname-based solution. Another possible solution is to
>>>>> probe the member 'memcg' in struct bpf_map, in case someone may
>>>>> backport memcg-based  memory accounting, but that will be a little
>>>>> over-engineering. The uname-based solution is simple and can work.
>>>>>
>>>>
>>>> Thanks! Yes, memcg would be more complex: the struct is not exposed to
>>>> user space, and BTF is not a hard dependency for bpftool. I'll work on
>>>> the uname-based test as a follow-up to this set.
>>>>
>>>
>>> After a second thought, the uname-based test may not work, because
>>> CONFIG_MEMCG_KMEM can be disabled.
>>
>> Does it matter? Regardless of whether there is memcg or not, we
>> shouldn't touch ulimit on 5.11+
>> If there is no memcg, there is no bpf memory enforcement.
> 
> Right, rlimit-based accounting is totally removed, that is not the
> same with what I thought before, while I thought it will fallback to
> rlimit-based if kmemcg is disabled.

Agreed, and so I've got a patch ready for the uname-based probe.

But talking about this with Daniel, we were wondering if it would make
sense instead to have the probe I had initially submitted (lower the
rlimit to 0, attempt to load a program, reset rlimit - see [0]), but
only for bpftool instead of libbpf? My understanding is that the memlock
rlimit is per-process, right? So this shouldn't affect any other
process, and because bpftool is not multithreaded, nothing other than
probing would happen while the rlimit is at zero? Or is it just simpler,
if less accurate, to stick to the uname probe?

Quentin

[0]
https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-16 13:59                       ` Quentin Monnet
@ 2022-06-16 14:54                         ` Yafang Shao
  2022-06-16 18:07                           ` Stanislav Fomichev
  0 siblings, 1 reply; 20+ messages in thread
From: Yafang Shao @ 2022-06-16 14:54 UTC (permalink / raw)
  To: Quentin Monnet
  Cc: Stanislav Fomichev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Harsh Modi, Paul Chaignon, netdev, bpf

On Thu, Jun 16, 2022 at 9:59 PM Quentin Monnet <quentin@isovalent.com> wrote:
>
> 2022-06-16 00:05 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> > On Wed, Jun 15, 2022 at 11:52 PM Stanislav Fomichev <sdf@google.com> wrote:
> >>
> >> On Wed, Jun 15, 2022 at 6:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >>>
> >>> On Tue, Jun 14, 2022 at 10:20 PM Quentin Monnet <quentin@isovalent.com> wrote:
> >>>>
> >>>> 2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> >>>>> On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
> >>>>>>
> >>>>>> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
> >>>>>>>
> >>>>>>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> >>>>>>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
> >>>>>>>>>
> >>>>>>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> >>>>>>>>>> On 06/10, Quentin Monnet wrote:
> >>>>>>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> >>>>>>>>>>
> >>>>>>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> >>>>>>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> >>>>>>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
> >>>>>>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> >>>>>>>>>>> with other systems and ask libbpf to raise the limit for us if
> >>>>>>>>>>> necessary.
> >>>>>>>>>>
> >>>>>>>>>>> How do we know if memcg-based accounting is supported? There is a probe
> >>>>>>>>>>> in libbpf to check this. But this probe currently relies on the
> >>>>>>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> >>>>>>>>>>> landed in the same kernel version as the memory accounting change. This
> >>>>>>>>>>> works in the generic case, but it may fail, for example, if the helper
> >>>>>>>>>>> function has been backported to an older kernel. This has been observed
> >>>>>>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
> >>>>>>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
> >>>>>>>>>>> not raised, and probing features with bpftool, for example, fails.
> >>>>>>>>>>
> >>>>>>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
> >>>>>>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> >>>>>>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> >>>>>>>>>>> some hard-to-debug flakiness if another process starts, or the current
> >>>>>>>>>>> application is killed, while the rlimit is reduced, and the approach was
> >>>>>>>>>>> discarded.
> >>>>>>>>>>
> >>>>>>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
> >>>>>>>>>>> availability of a given helper, we restore the unconditional rlimit bump
> >>>>>>>>>>> in bpftool for now.
> >>>>>>>>>>
> >>>>>>>>>>> [0]
> >>>>>>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> >>>>>>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> >>>>>>>>>>
> >>>>>>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
> >>>>>>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
> >>>>>>>>>>>   tools/bpf/bpftool/feature.c    | 2 ++
> >>>>>>>>>>>   tools/bpf/bpftool/main.c       | 6 +++---
> >>>>>>>>>>>   tools/bpf/bpftool/main.h       | 2 ++
> >>>>>>>>>>>   tools/bpf/bpftool/map.c        | 2 ++
> >>>>>>>>>>>   tools/bpf/bpftool/pids.c       | 1 +
> >>>>>>>>>>>   tools/bpf/bpftool/prog.c       | 3 +++
> >>>>>>>>>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
> >>>>>>>>>>>   8 files changed, 23 insertions(+), 3 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> >>>>>>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
> >>>>>>>>>>> --- a/tools/bpf/bpftool/common.c
> >>>>>>>>>>> +++ b/tools/bpf/bpftool/common.c
> >>>>>>>>>>> @@ -17,6 +17,7 @@
> >>>>>>>>>>>   #include <linux/magic.h>
> >>>>>>>>>>>   #include <net/if.h>
> >>>>>>>>>>>   #include <sys/mount.h>
> >>>>>>>>>>> +#include <sys/resource.h>
> >>>>>>>>>>>   #include <sys/stat.h>
> >>>>>>>>>>>   #include <sys/vfs.h>
> >>>>>>>>>>
> >>>>>>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> >>>>>>>>>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> >>>>>>>>>>>   }
> >>>>>>>>>>
> >>>>>>>>>>> +void set_max_rlimit(void)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> >>>>>>>>>>> +
> >>>>>>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> >>>>>>>>>>
> >>>>>>>>>> Do you think it might make sense to print to stderr some warning if
> >>>>>>>>>> we actually happen to adjust this limit?
> >>>>>>>>>>
> >>>>>>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> >>>>>>>>>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> >>>>>>>>>>     infinity!\n");
> >>>>>>>>>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> ?
> >>>>>>>>>>
> >>>>>>>>>> Because while it's nice that we automatically do this, this might still
> >>>>>>>>>> lead to surprises for some users. OTOH, not sure whether people
> >>>>>>>>>> actually read those warnings? :-/
> >>>>>>>>>
> >>>>>>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
> >>>>>>>>> is desirable.
> >>>>>>>>>
> >>>>>>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
> >>>>>>>>> so I don't think it would come up as a surprise for people who have used
> >>>>>>>>> it for a while. I think this is also something that several other
> >>>>>>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> >>>>>>>>> have been doing too.
> >>>>>>>>
> >>>>>>>> In this case ignore me and let's continue doing that :-)
> >>>>>>>>
> >>>>>>>> Btw, eventually we'd still like to stop doing that I'd presume?
> >>>>>>>
> >>>>>>> Agreed. I was thinking either finding a way to improve the probe in
> >>>>>>> libbpf, or waiting for some more time until 5.11 gets old, but this may
> >>>>>>> take years :/
> >>>>>>>
> >>>>>>>> Should
> >>>>>>>> we at some point follow up with something like:
> >>>>>>>>
> >>>>>>>> if (kernel_version >= 5.11) { don't touch memlock; }
> >>>>>>>>
> >>>>>>>> ?
> >>>>>>>>
> >>>>>>>> I guess we care only about <5.11 because of the backports, but 5.11+
> >>>>>>>> kernels are guaranteed to have memcg.
> >>>>>>>
> >>>>>>> You mean from uname() and parsing the release? Yes I suppose we could do
> >>>>>>> that, can do as a follow-up.
> >>>>>>
> >>>>>> Yeah, uname-based, I don't think we can do better? Given that probing
> >>>>>> is problematic as well :-(
> >>>>>> But idk, up to you.
> >>>>>>
> >>>>>
> >>>>> Agreed with the uname-based solution. Another possible solution is to
> >>>>> probe the member 'memcg' in struct bpf_map, in case someone may
> >>>>> backport memcg-based  memory accounting, but that will be a little
> >>>>> over-engineering. The uname-based solution is simple and can work.
> >>>>>
> >>>>
> >>>> Thanks! Yes, memcg would be more complex: the struct is not exposed to
> >>>> user space, and BTF is not a hard dependency for bpftool. I'll work on
> >>>> the uname-based test as a follow-up to this set.
> >>>>
> >>>
> >>> After a second thought, the uname-based test may not work, because
> >>> CONFIG_MEMCG_KMEM can be disabled.
> >>
> >> Does it matter? Regardless of whether there is memcg or not, we
> >> shouldn't touch ulimit on 5.11+
> >> If there is no memcg, there is no bpf memory enforcement.
> >
> > Right, rlimit-based accounting is totally removed, that is not the
> > same with what I thought before, while I thought it will fallback to
> > rlimit-based if kmemcg is disabled.
>
> Agreed, and so I've got a patch ready for the uname-based probe.
>
> But talking about this with Daniel, we were wondering if it would make
> sense instead to have the probe I had initially submitted (lower the
> rlimit to 0, attempt to load a program, reset rlimit - see [0]), but
> only for bpftool instead of libbpf? My understanding is that the memlock
> rlimit is per-process, right? So this shouldn't affect any other
> process, and because bpftool is not multithreaded, nothing other than
> probing would happen while the rlimit is at zero?

Makes sense.
It is safe to do the probe within bpftool.

> Or is it just simpler,
> if less accurate, to stick to the uname probe?
>
> Quentin
>
> [0]
> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/



-- 
Regards
Yafang

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-16 14:54                         ` Yafang Shao
@ 2022-06-16 18:07                           ` Stanislav Fomichev
  2022-06-16 20:40                             ` Andrii Nakryiko
  0 siblings, 1 reply; 20+ messages in thread
From: Stanislav Fomichev @ 2022-06-16 18:07 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Quentin Monnet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Harsh Modi, Paul Chaignon, netdev, bpf

On Thu, Jun 16, 2022 at 7:54 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Thu, Jun 16, 2022 at 9:59 PM Quentin Monnet <quentin@isovalent.com> wrote:
> >
> > 2022-06-16 00:05 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> > > On Wed, Jun 15, 2022 at 11:52 PM Stanislav Fomichev <sdf@google.com> wrote:
> > >>
> > >> On Wed, Jun 15, 2022 at 6:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >>>
> > >>> On Tue, Jun 14, 2022 at 10:20 PM Quentin Monnet <quentin@isovalent.com> wrote:
> > >>>>
> > >>>> 2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> > >>>>> On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >>>>>>
> > >>>>>> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
> > >>>>>>>
> > >>>>>>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> > >>>>>>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> > >>>>>>>>>> On 06/10, Quentin Monnet wrote:
> > >>>>>>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> > >>>>>>>>>>
> > >>>>>>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> > >>>>>>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> > >>>>>>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
> > >>>>>>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> > >>>>>>>>>>> with other systems and ask libbpf to raise the limit for us if
> > >>>>>>>>>>> necessary.
> > >>>>>>>>>>
> > >>>>>>>>>>> How do we know if memcg-based accounting is supported? There is a probe
> > >>>>>>>>>>> in libbpf to check this. But this probe currently relies on the
> > >>>>>>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> > >>>>>>>>>>> landed in the same kernel version as the memory accounting change. This
> > >>>>>>>>>>> works in the generic case, but it may fail, for example, if the helper
> > >>>>>>>>>>> function has been backported to an older kernel. This has been observed
> > >>>>>>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
> > >>>>>>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
> > >>>>>>>>>>> not raised, and probing features with bpftool, for example, fails.
> > >>>>>>>>>>
> > >>>>>>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
> > >>>>>>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> > >>>>>>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> > >>>>>>>>>>> some hard-to-debug flakiness if another process starts, or the current
> > >>>>>>>>>>> application is killed, while the rlimit is reduced, and the approach was
> > >>>>>>>>>>> discarded.
> > >>>>>>>>>>
> > >>>>>>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
> > >>>>>>>>>>> availability of a given helper, we restore the unconditional rlimit bump
> > >>>>>>>>>>> in bpftool for now.
> > >>>>>>>>>>
> > >>>>>>>>>>> [0]
> > >>>>>>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> > >>>>>>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> > >>>>>>>>>>
> > >>>>>>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
> > >>>>>>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> > >>>>>>>>>>> ---
> > >>>>>>>>>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
> > >>>>>>>>>>>   tools/bpf/bpftool/feature.c    | 2 ++
> > >>>>>>>>>>>   tools/bpf/bpftool/main.c       | 6 +++---
> > >>>>>>>>>>>   tools/bpf/bpftool/main.h       | 2 ++
> > >>>>>>>>>>>   tools/bpf/bpftool/map.c        | 2 ++
> > >>>>>>>>>>>   tools/bpf/bpftool/pids.c       | 1 +
> > >>>>>>>>>>>   tools/bpf/bpftool/prog.c       | 3 +++
> > >>>>>>>>>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
> > >>>>>>>>>>>   8 files changed, 23 insertions(+), 3 deletions(-)
> > >>>>>>>>>>
> > >>>>>>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> > >>>>>>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
> > >>>>>>>>>>> --- a/tools/bpf/bpftool/common.c
> > >>>>>>>>>>> +++ b/tools/bpf/bpftool/common.c
> > >>>>>>>>>>> @@ -17,6 +17,7 @@
> > >>>>>>>>>>>   #include <linux/magic.h>
> > >>>>>>>>>>>   #include <net/if.h>
> > >>>>>>>>>>>   #include <sys/mount.h>
> > >>>>>>>>>>> +#include <sys/resource.h>
> > >>>>>>>>>>>   #include <sys/stat.h>
> > >>>>>>>>>>>   #include <sys/vfs.h>
> > >>>>>>>>>>
> > >>>>>>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> > >>>>>>>>>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> > >>>>>>>>>>>   }
> > >>>>>>>>>>
> > >>>>>>>>>>> +void set_max_rlimit(void)
> > >>>>>>>>>>> +{
> > >>>>>>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> > >>>>>>>>>>> +
> > >>>>>>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> > >>>>>>>>>>
> > >>>>>>>>>> Do you think it might make sense to print to stderr some warning if
> > >>>>>>>>>> we actually happen to adjust this limit?
> > >>>>>>>>>>
> > >>>>>>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> > >>>>>>>>>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> > >>>>>>>>>>     infinity!\n");
> > >>>>>>>>>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
> > >>>>>>>>>> }
> > >>>>>>>>>>
> > >>>>>>>>>> ?
> > >>>>>>>>>>
> > >>>>>>>>>> Because while it's nice that we automatically do this, this might still
> > >>>>>>>>>> lead to surprises for some users. OTOH, not sure whether people
> > >>>>>>>>>> actually read those warnings? :-/
> > >>>>>>>>>
> > >>>>>>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
> > >>>>>>>>> is desirable.
> > >>>>>>>>>
> > >>>>>>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
> > >>>>>>>>> so I don't think it would come up as a surprise for people who have used
> > >>>>>>>>> it for a while. I think this is also something that several other
> > >>>>>>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> > >>>>>>>>> have been doing too.
> > >>>>>>>>
> > >>>>>>>> In this case ignore me and let's continue doing that :-)
> > >>>>>>>>
> > >>>>>>>> Btw, eventually we'd still like to stop doing that I'd presume?
> > >>>>>>>
> > >>>>>>> Agreed. I was thinking either finding a way to improve the probe in
> > >>>>>>> libbpf, or waiting for some more time until 5.11 gets old, but this may
> > >>>>>>> take years :/
> > >>>>>>>
> > >>>>>>>> Should
> > >>>>>>>> we at some point follow up with something like:
> > >>>>>>>>
> > >>>>>>>> if (kernel_version >= 5.11) { don't touch memlock; }
> > >>>>>>>>
> > >>>>>>>> ?
> > >>>>>>>>
> > >>>>>>>> I guess we care only about <5.11 because of the backports, but 5.11+
> > >>>>>>>> kernels are guaranteed to have memcg.
> > >>>>>>>
> > >>>>>>> You mean from uname() and parsing the release? Yes I suppose we could do
> > >>>>>>> that, can do as a follow-up.
> > >>>>>>
> > >>>>>> Yeah, uname-based, I don't think we can do better? Given that probing
> > >>>>>> is problematic as well :-(
> > >>>>>> But idk, up to you.
> > >>>>>>
> > >>>>>
> > >>>>> Agreed with the uname-based solution. Another possible solution is to
> > >>>>> probe the member 'memcg' in struct bpf_map, in case someone may
> > >>>>> backport memcg-based  memory accounting, but that will be a little
> > >>>>> over-engineering. The uname-based solution is simple and can work.
> > >>>>>
> > >>>>
> > >>>> Thanks! Yes, memcg would be more complex: the struct is not exposed to
> > >>>> user space, and BTF is not a hard dependency for bpftool. I'll work on
> > >>>> the uname-based test as a follow-up to this set.
> > >>>>
> > >>>
> > >>> After a second thought, the uname-based test may not work, because
> > >>> CONFIG_MEMCG_KMEM can be disabled.
> > >>
> > >> Does it matter? Regardless of whether there is memcg or not, we
> > >> shouldn't touch ulimit on 5.11+
> > >> If there is no memcg, there is no bpf memory enforcement.
> > >
> > > Right, rlimit-based accounting is totally removed, that is not the
> > > same with what I thought before, while I thought it will fallback to
> > > rlimit-based if kmemcg is disabled.
> >
> > Agreed, and so I've got a patch ready for the uname-based probe.
> >
> > But talking about this with Daniel, we were wondering if it would make
> > sense instead to have the probe I had initially submitted (lower the
> > rlimit to 0, attempt to load a program, reset rlimit - see [0]), but
> > only for bpftool instead of libbpf? My understanding is that the memlock
> > rlimit is per-process, right? So this shouldn't affect any other
> > process, and because bpftool is not multithreaded, nothing other than
> > probing would happen while the rlimit is at zero?
>
> Makes sense.
> It is safe to do the probe within bpftool.

+1, seems to be safe to continue doing that in bpftool.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
  2022-06-16 18:07                           ` Stanislav Fomichev
@ 2022-06-16 20:40                             ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2022-06-16 20:40 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Yafang Shao, Quentin Monnet, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Harsh Modi, Paul Chaignon, netdev, bpf

On Thu, Jun 16, 2022 at 11:08 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Thu, Jun 16, 2022 at 7:54 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Thu, Jun 16, 2022 at 9:59 PM Quentin Monnet <quentin@isovalent.com> wrote:
> > >
> > > 2022-06-16 00:05 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> > > > On Wed, Jun 15, 2022 at 11:52 PM Stanislav Fomichev <sdf@google.com> wrote:
> > > >>
> > > >> On Wed, Jun 15, 2022 at 6:23 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > > >>>
> > > >>> On Tue, Jun 14, 2022 at 10:20 PM Quentin Monnet <quentin@isovalent.com> wrote:
> > > >>>>
> > > >>>> 2022-06-14 20:37 UTC+0800 ~ Yafang Shao <laoar.shao@gmail.com>
> > > >>>>> On Sat, Jun 11, 2022 at 1:17 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > >>>>>>
> > > >>>>>> On Fri, Jun 10, 2022 at 10:00 AM Quentin Monnet <quentin@isovalent.com> wrote:
> > > >>>>>>>
> > > >>>>>>> 2022-06-10 09:46 UTC-0700 ~ Stanislav Fomichev <sdf@google.com>
> > > >>>>>>>> On Fri, Jun 10, 2022 at 9:34 AM Quentin Monnet <quentin@isovalent.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> 2022-06-10 09:07 UTC-0700 ~ sdf@google.com
> > > >>>>>>>>>> On 06/10, Quentin Monnet wrote:
> > > >>>>>>>>>>> This reverts commit a777e18f1bcd32528ff5dfd10a6629b655b05eb8.
> > > >>>>>>>>>>
> > > >>>>>>>>>>> In commit a777e18f1bcd ("bpftool: Use libbpf 1.0 API mode instead of
> > > >>>>>>>>>>> RLIMIT_MEMLOCK"), we removed the rlimit bump in bpftool, because the
> > > >>>>>>>>>>> kernel has switched to memcg-based memory accounting. Thanks to the
> > > >>>>>>>>>>> LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK, we attempted to keep compatibility
> > > >>>>>>>>>>> with other systems and ask libbpf to raise the limit for us if
> > > >>>>>>>>>>> necessary.
> > > >>>>>>>>>>
> > > >>>>>>>>>>> How do we know if memcg-based accounting is supported? There is a probe
> > > >>>>>>>>>>> in libbpf to check this. But this probe currently relies on the
> > > >>>>>>>>>>> availability of a given BPF helper, bpf_ktime_get_coarse_ns(), which
> > > >>>>>>>>>>> landed in the same kernel version as the memory accounting change. This
> > > >>>>>>>>>>> works in the generic case, but it may fail, for example, if the helper
> > > >>>>>>>>>>> function has been backported to an older kernel. This has been observed
> > > >>>>>>>>>>> for Google Cloud's Container-Optimized OS (COS), where the helper is
> > > >>>>>>>>>>> available but rlimit is still in use. The probe succeeds, the rlimit is
> > > >>>>>>>>>>> not raised, and probing features with bpftool, for example, fails.
> > > >>>>>>>>>>
> > > >>>>>>>>>>> A patch was submitted [0] to update this probe in libbpf, based on what
> > > >>>>>>>>>>> the cilium/ebpf Go library does [1]. It would lower the soft rlimit to
> > > >>>>>>>>>>> 0, attempt to load a BPF object, and reset the rlimit. But it may induce
> > > >>>>>>>>>>> some hard-to-debug flakiness if another process starts, or the current
> > > >>>>>>>>>>> application is killed, while the rlimit is reduced, and the approach was
> > > >>>>>>>>>>> discarded.
> > > >>>>>>>>>>
> > > >>>>>>>>>>> As a workaround to ensure that the rlimit bump does not depend on the
> > > >>>>>>>>>>> availability of a given helper, we restore the unconditional rlimit bump
> > > >>>>>>>>>>> in bpftool for now.
> > > >>>>>>>>>>
> > > >>>>>>>>>>> [0]
> > > >>>>>>>>>>> https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/
> > > >>>>>>>>>>> [1] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Cc: Yafang Shao <laoar.shao@gmail.com>
> > > >>>>>>>>>>> Signed-off-by: Quentin Monnet <quentin@isovalent.com>
> > > >>>>>>>>>>> ---
> > > >>>>>>>>>>>   tools/bpf/bpftool/common.c     | 8 ++++++++
> > > >>>>>>>>>>>   tools/bpf/bpftool/feature.c    | 2 ++
> > > >>>>>>>>>>>   tools/bpf/bpftool/main.c       | 6 +++---
> > > >>>>>>>>>>>   tools/bpf/bpftool/main.h       | 2 ++
> > > >>>>>>>>>>>   tools/bpf/bpftool/map.c        | 2 ++
> > > >>>>>>>>>>>   tools/bpf/bpftool/pids.c       | 1 +
> > > >>>>>>>>>>>   tools/bpf/bpftool/prog.c       | 3 +++
> > > >>>>>>>>>>>   tools/bpf/bpftool/struct_ops.c | 2 ++
> > > >>>>>>>>>>>   8 files changed, 23 insertions(+), 3 deletions(-)
> > > >>>>>>>>>>
> > > >>>>>>>>>>> diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
> > > >>>>>>>>>>> index a45b42ee8ab0..a0d4acd7c54a 100644
> > > >>>>>>>>>>> --- a/tools/bpf/bpftool/common.c
> > > >>>>>>>>>>> +++ b/tools/bpf/bpftool/common.c
> > > >>>>>>>>>>> @@ -17,6 +17,7 @@
> > > >>>>>>>>>>>   #include <linux/magic.h>
> > > >>>>>>>>>>>   #include <net/if.h>
> > > >>>>>>>>>>>   #include <sys/mount.h>
> > > >>>>>>>>>>> +#include <sys/resource.h>
> > > >>>>>>>>>>>   #include <sys/stat.h>
> > > >>>>>>>>>>>   #include <sys/vfs.h>
> > > >>>>>>>>>>
> > > >>>>>>>>>>> @@ -72,6 +73,13 @@ static bool is_bpffs(char *path)
> > > >>>>>>>>>>>       return (unsigned long)st_fs.f_type == BPF_FS_MAGIC;
> > > >>>>>>>>>>>   }
> > > >>>>>>>>>>
> > > >>>>>>>>>>> +void set_max_rlimit(void)
> > > >>>>>>>>>>> +{
> > > >>>>>>>>>>> +    struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> > > >>>>>>>>>>> +
> > > >>>>>>>>>>> +    setrlimit(RLIMIT_MEMLOCK, &rinf);
> > > >>>>>>>>>>
> > > >>>>>>>>>> Do you think it might make sense to print to stderr some warning if
> > > >>>>>>>>>> we actually happen to adjust this limit?
> > > >>>>>>>>>>
> > > >>>>>>>>>> if (getrlimit(MEMLOCK) != RLIM_INFINITY) {
> > > >>>>>>>>>>     fprintf(stderr, "Warning: resetting MEMLOCK rlimit to
> > > >>>>>>>>>>     infinity!\n");
> > > >>>>>>>>>>     setrlimit(RLIMIT_MEMLOCK, &rinf);
> > > >>>>>>>>>> }
> > > >>>>>>>>>>
> > > >>>>>>>>>> ?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Because while it's nice that we automatically do this, this might still
> > > >>>>>>>>>> lead to surprises for some users. OTOH, not sure whether people
> > > >>>>>>>>>> actually read those warnings? :-/
> > > >>>>>>>>>
> > > >>>>>>>>> I'm not strictly opposed to a warning, but I'm not completely sure this
> > > >>>>>>>>> is desirable.
> > > >>>>>>>>>
> > > >>>>>>>>> Bpftool has raised the rlimit for a long time, it changed only in April,
> > > >>>>>>>>> so I don't think it would come up as a surprise for people who have used
> > > >>>>>>>>> it for a while. I think this is also something that several other
> > > >>>>>>>>> BPF-related applications (BCC I think?, bpftrace, Cilium come to mind)
> > > >>>>>>>>> have been doing too.
> > > >>>>>>>>
> > > >>>>>>>> In this case ignore me and let's continue doing that :-)
> > > >>>>>>>>
> > > >>>>>>>> Btw, eventually we'd still like to stop doing that I'd presume?
> > > >>>>>>>
> > > >>>>>>> Agreed. I was thinking either finding a way to improve the probe in
> > > >>>>>>> libbpf, or waiting for some more time until 5.11 gets old, but this may
> > > >>>>>>> take years :/
> > > >>>>>>>
> > > >>>>>>>> Should
> > > >>>>>>>> we at some point follow up with something like:
> > > >>>>>>>>
> > > >>>>>>>> if (kernel_version >= 5.11) { don't touch memlock; }
> > > >>>>>>>>
> > > >>>>>>>> ?
> > > >>>>>>>>
> > > >>>>>>>> I guess we care only about <5.11 because of the backports, but 5.11+
> > > >>>>>>>> kernels are guaranteed to have memcg.
> > > >>>>>>>
> > > >>>>>>> You mean from uname() and parsing the release? Yes I suppose we could do
> > > >>>>>>> that, can do as a follow-up.
> > > >>>>>>
> > > >>>>>> Yeah, uname-based, I don't think we can do better? Given that probing
> > > >>>>>> is problematic as well :-(
> > > >>>>>> But idk, up to you.
> > > >>>>>>
> > > >>>>>
> > > >>>>> Agreed with the uname-based solution. Another possible solution is to
> > > >>>>> probe the member 'memcg' in struct bpf_map, in case someone may
> > > >>>>> backport memcg-based  memory accounting, but that will be a little
> > > >>>>> over-engineering. The uname-based solution is simple and can work.
> > > >>>>>
> > > >>>>
> > > >>>> Thanks! Yes, memcg would be more complex: the struct is not exposed to
> > > >>>> user space, and BTF is not a hard dependency for bpftool. I'll work on
> > > >>>> the uname-based test as a follow-up to this set.
> > > >>>>
> > > >>>
> > > >>> After a second thought, the uname-based test may not work, because
> > > >>> CONFIG_MEMCG_KMEM can be disabled.
> > > >>
> > > >> Does it matter? Regardless of whether there is memcg or not, we
> > > >> shouldn't touch ulimit on 5.11+
> > > >> If there is no memcg, there is no bpf memory enforcement.
> > > >
> > > > Right, rlimit-based accounting is totally removed, that is not the
> > > > same with what I thought before, while I thought it will fallback to
> > > > rlimit-based if kmemcg is disabled.
> > >
> > > Agreed, and so I've got a patch ready for the uname-based probe.
> > >
> > > But talking about this with Daniel, we were wondering if it would make
> > > sense instead to have the probe I had initially submitted (lower the
> > > rlimit to 0, attempt to load a program, reset rlimit - see [0]), but
> > > only for bpftool instead of libbpf? My understanding is that the memlock
> > > rlimit is per-process, right? So this shouldn't affect any other
> > > process, and because bpftool is not multithreaded, nothing other than
> > > probing would happen while the rlimit is at zero?
> >
> > Makes sense.
> > It is safe to do the probe within bpftool.
>
> +1, seems to be safe to continue doing that in bpftool.

Agree, doing it in the *application* (which bpftool is for libbpf)
seems totally safe and fine.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-06-16 20:41 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-10 11:26 [PATCH bpf-next 0/2] bpftool: Restore memlock rlimit bump Quentin Monnet
2022-06-10 11:26 ` [PATCH bpf-next 1/2] Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK" Quentin Monnet
2022-06-10 16:07   ` sdf
2022-06-10 16:34     ` Quentin Monnet
2022-06-10 16:46       ` Stanislav Fomichev
2022-06-10 17:00         ` Quentin Monnet
2022-06-10 17:17           ` Stanislav Fomichev
2022-06-14 12:37             ` Yafang Shao
2022-06-14 14:20               ` Quentin Monnet
2022-06-14 20:34                 ` Daniel Borkmann
2022-06-14 21:01                   ` Stanislav Fomichev
2022-06-15 13:22                 ` Yafang Shao
2022-06-15 15:52                   ` Stanislav Fomichev
2022-06-15 16:05                     ` Yafang Shao
2022-06-16 13:59                       ` Quentin Monnet
2022-06-16 14:54                         ` Yafang Shao
2022-06-16 18:07                           ` Stanislav Fomichev
2022-06-16 20:40                             ` Andrii Nakryiko
2022-06-10 11:26 ` [PATCH bpf-next 2/2] bpftool: Do not check return value from libbpf_set_strict_mode() Quentin Monnet
2022-06-14 20:37 ` [PATCH bpf-next 0/2] bpftool: Restore memlock rlimit bump patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).