All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH bpf-next] bpf, bpftool: Allow probing for CONFIG_HZ from kernel config
@ 2020-05-13  7:58 Daniel Borkmann
  2020-05-13 10:42 ` Quentin Monnet
  2020-05-14 23:19 ` Andrii Nakryiko
  0 siblings, 2 replies; 6+ messages in thread
From: Daniel Borkmann @ 2020-05-13  7:58 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: quentin, bpf, netdev, Daniel Borkmann, Martin KaFai Lau

In Cilium we've recently switched to make use of bpf_jiffies64() for
parts of our tc and XDP datapath since bpf_ktime_get_ns() is more
expensive and high-precision is not needed for our timeouts we have
anyway. Our agent has a probe manager which picks up the json of
bpftool's feature probe and we also use the macro output in our C
programs e.g. to have workarounds when helpers are not available on
older kernels.

Extend the kernel config info dump to also include the kernel's
CONFIG_HZ, and rework the probe_kernel_image_config() for allowing a
macro dump such that CONFIG_HZ can be propagated to BPF C code as a
simple define if available via config. Latter allows to have _compile-
time_ resolution of jiffies <-> sec conversion in our code since all
are propagated as known constants.

Given we cannot generally assume availability of kconfig everywhere,
we also have a kernel hz probe [0] as a fallback. Potentially, bpftool
could have an integrated probe fallback as well, although to derive it,
we might need to place it under 'bpftool feature probe full' or similar
given it would slow down the probing process overall. Yet 'full' doesn't
fit either for us since we don't want to pollute the kernel log with
warning messages from bpf_probe_write_user() and bpf_trace_printk() on
agent startup; I've left it out for the time being.

  [0] https://github.com/cilium/cilium/blob/master/bpf/cilium-probe-kernel-hz.c

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
---
 tools/bpf/bpftool/feature.c | 120 ++++++++++++++++++++----------------
 1 file changed, 67 insertions(+), 53 deletions(-)

diff --git a/tools/bpf/bpftool/feature.c b/tools/bpf/bpftool/feature.c
index f54347f55ee0..1b73e63274b5 100644
--- a/tools/bpf/bpftool/feature.c
+++ b/tools/bpf/bpftool/feature.c
@@ -80,13 +80,12 @@ print_bool_feature(const char *feat_name, const char *plain_name,
 		printf("%s is %savailable\n", plain_name, res ? "" : "NOT ");
 }
 
-static void print_kernel_option(const char *name, const char *value)
+static void print_kernel_option(const char *name, const char *value,
+				const char *define_prefix)
 {
 	char *endptr;
 	int res;
 
-	/* No support for C-style ouptut */
-
 	if (json_output) {
 		if (!value) {
 			jsonw_null_field(json_wtr, name);
@@ -98,6 +97,12 @@ static void print_kernel_option(const char *name, const char *value)
 			jsonw_int_field(json_wtr, name, res);
 		else
 			jsonw_string_field(json_wtr, name, value);
+	} else if (define_prefix) {
+		if (value)
+			printf("#define %s%s %s\n", define_prefix,
+			       name, value);
+		else
+			printf("/* %s%s is not set */\n", define_prefix, name);
 	} else {
 		if (value)
 			printf("%s is set to %s\n", name, value);
@@ -315,77 +320,84 @@ static bool read_next_kernel_config_option(gzFile file, char *buf, size_t n,
 	return false;
 }
 
-static void probe_kernel_image_config(void)
+static void probe_kernel_image_config(const char *define_prefix)
 {
-	static const char * const options[] = {
+	static const struct {
+		const char * const name;
+		bool macro_dump;
+	} options[] = {
 		/* Enable BPF */
-		"CONFIG_BPF",
+		{ "CONFIG_BPF", },
 		/* Enable bpf() syscall */
-		"CONFIG_BPF_SYSCALL",
+		{ "CONFIG_BPF_SYSCALL", },
 		/* Does selected architecture support eBPF JIT compiler */
-		"CONFIG_HAVE_EBPF_JIT",
+		{ "CONFIG_HAVE_EBPF_JIT", },
 		/* Compile eBPF JIT compiler */
-		"CONFIG_BPF_JIT",
+		{ "CONFIG_BPF_JIT", },
 		/* Avoid compiling eBPF interpreter (use JIT only) */
-		"CONFIG_BPF_JIT_ALWAYS_ON",
+		{ "CONFIG_BPF_JIT_ALWAYS_ON", },
 
 		/* cgroups */
-		"CONFIG_CGROUPS",
+		{ "CONFIG_CGROUPS", },
 		/* BPF programs attached to cgroups */
-		"CONFIG_CGROUP_BPF",
+		{ "CONFIG_CGROUP_BPF", },
 		/* bpf_get_cgroup_classid() helper */
-		"CONFIG_CGROUP_NET_CLASSID",
+		{ "CONFIG_CGROUP_NET_CLASSID", },
 		/* bpf_skb_{,ancestor_}cgroup_id() helpers */
-		"CONFIG_SOCK_CGROUP_DATA",
+		{ "CONFIG_SOCK_CGROUP_DATA", },
 
 		/* Tracing: attach BPF to kprobes, tracepoints, etc. */
-		"CONFIG_BPF_EVENTS",
+		{ "CONFIG_BPF_EVENTS", },
 		/* Kprobes */
-		"CONFIG_KPROBE_EVENTS",
+		{ "CONFIG_KPROBE_EVENTS", },
 		/* Uprobes */
-		"CONFIG_UPROBE_EVENTS",
+		{ "CONFIG_UPROBE_EVENTS", },
 		/* Tracepoints */
-		"CONFIG_TRACING",
+		{ "CONFIG_TRACING", },
 		/* Syscall tracepoints */
-		"CONFIG_FTRACE_SYSCALLS",
+		{ "CONFIG_FTRACE_SYSCALLS", },
 		/* bpf_override_return() helper support for selected arch */
-		"CONFIG_FUNCTION_ERROR_INJECTION",
+		{ "CONFIG_FUNCTION_ERROR_INJECTION", },
 		/* bpf_override_return() helper */
-		"CONFIG_BPF_KPROBE_OVERRIDE",
+		{ "CONFIG_BPF_KPROBE_OVERRIDE", },
 
 		/* Network */
-		"CONFIG_NET",
+		{ "CONFIG_NET", },
 		/* AF_XDP sockets */
-		"CONFIG_XDP_SOCKETS",
+		{ "CONFIG_XDP_SOCKETS", },
 		/* BPF_PROG_TYPE_LWT_* and related helpers */
-		"CONFIG_LWTUNNEL_BPF",
+		{ "CONFIG_LWTUNNEL_BPF", },
 		/* BPF_PROG_TYPE_SCHED_ACT, TC (traffic control) actions */
-		"CONFIG_NET_ACT_BPF",
+		{ "CONFIG_NET_ACT_BPF", },
 		/* BPF_PROG_TYPE_SCHED_CLS, TC filters */
-		"CONFIG_NET_CLS_BPF",
+		{ "CONFIG_NET_CLS_BPF", },
 		/* TC clsact qdisc */
-		"CONFIG_NET_CLS_ACT",
+		{ "CONFIG_NET_CLS_ACT", },
 		/* Ingress filtering with TC */
-		"CONFIG_NET_SCH_INGRESS",
+		{ "CONFIG_NET_SCH_INGRESS", },
 		/* bpf_skb_get_xfrm_state() helper */
-		"CONFIG_XFRM",
+		{ "CONFIG_XFRM", },
 		/* bpf_get_route_realm() helper */
-		"CONFIG_IP_ROUTE_CLASSID",
+		{ "CONFIG_IP_ROUTE_CLASSID", },
 		/* BPF_PROG_TYPE_LWT_SEG6_LOCAL and related helpers */
-		"CONFIG_IPV6_SEG6_BPF",
+		{ "CONFIG_IPV6_SEG6_BPF", },
 		/* BPF_PROG_TYPE_LIRC_MODE2 and related helpers */
-		"CONFIG_BPF_LIRC_MODE2",
+		{ "CONFIG_BPF_LIRC_MODE2", },
 		/* BPF stream parser and BPF socket maps */
-		"CONFIG_BPF_STREAM_PARSER",
+		{ "CONFIG_BPF_STREAM_PARSER", },
 		/* xt_bpf module for passing BPF programs to netfilter  */
-		"CONFIG_NETFILTER_XT_MATCH_BPF",
+		{ "CONFIG_NETFILTER_XT_MATCH_BPF", },
 		/* bpfilter back-end for iptables */
-		"CONFIG_BPFILTER",
+		{ "CONFIG_BPFILTER", },
 		/* bpftilter module with "user mode helper" */
-		"CONFIG_BPFILTER_UMH",
+		{ "CONFIG_BPFILTER_UMH", },
 
 		/* test_bpf module for BPF tests */
-		"CONFIG_TEST_BPF",
+		{ "CONFIG_TEST_BPF", },
+
+		/* Misc configs useful in BPF C programs */
+		/* jiffies <-> sec conversion for bpf_jiffies64() helper */
+		{ "CONFIG_HZ", true, }
 	};
 	char *values[ARRAY_SIZE(options)] = { };
 	struct utsname utsn;
@@ -427,7 +439,8 @@ static void probe_kernel_image_config(void)
 
 	while (read_next_kernel_config_option(file, buf, sizeof(buf), &value)) {
 		for (i = 0; i < ARRAY_SIZE(options); i++) {
-			if (values[i] || strcmp(buf, options[i]))
+			if ((define_prefix && !options[i].macro_dump) ||
+			    values[i] || strcmp(buf, options[i].name))
 				continue;
 
 			values[i] = strdup(value);
@@ -439,7 +452,9 @@ static void probe_kernel_image_config(void)
 		gzclose(file);
 
 	for (i = 0; i < ARRAY_SIZE(options); i++) {
-		print_kernel_option(options[i], values[i]);
+		if (define_prefix && !options[i].macro_dump)
+			continue;
+		print_kernel_option(options[i].name, values[i], define_prefix);
 		free(values[i]);
 	}
 }
@@ -632,23 +647,22 @@ section_system_config(enum probe_component target, const char *define_prefix)
 	switch (target) {
 	case COMPONENT_KERNEL:
 	case COMPONENT_UNSPEC:
-		if (define_prefix)
-			break;
-
 		print_start_section("system_config",
 				    "Scanning system configuration...",
-				    NULL, /* define_comment never used here */
-				    NULL); /* define_prefix always NULL here */
-		if (check_procfs()) {
-			probe_unprivileged_disabled();
-			probe_jit_enable();
-			probe_jit_harden();
-			probe_jit_kallsyms();
-			probe_jit_limit();
-		} else {
-			p_info("/* procfs not mounted, skipping related probes */");
+				    "/*** Misc kernel config items ***/",
+				    define_prefix);
+		if (!define_prefix) {
+			if (check_procfs()) {
+				probe_unprivileged_disabled();
+				probe_jit_enable();
+				probe_jit_harden();
+				probe_jit_kallsyms();
+				probe_jit_limit();
+			} else {
+				p_info("/* procfs not mounted, skipping related probes */");
+			}
 		}
-		probe_kernel_image_config();
+		probe_kernel_image_config(define_prefix);
 		print_end_section();
 		break;
 	default:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf, bpftool: Allow probing for CONFIG_HZ from kernel config
  2020-05-13  7:58 [PATCH bpf-next] bpf, bpftool: Allow probing for CONFIG_HZ from kernel config Daniel Borkmann
@ 2020-05-13 10:42 ` Quentin Monnet
  2020-05-13 11:26   ` Daniel Borkmann
  2020-05-14 23:19 ` Andrii Nakryiko
  1 sibling, 1 reply; 6+ messages in thread
From: Quentin Monnet @ 2020-05-13 10:42 UTC (permalink / raw)
  To: Daniel Borkmann, alexei.starovoitov; +Cc: bpf, netdev, Martin KaFai Lau

2020-05-13 09:58 UTC+0200 ~ Daniel Borkmann <daniel@iogearbox.net>
> In Cilium we've recently switched to make use of bpf_jiffies64() for
> parts of our tc and XDP datapath since bpf_ktime_get_ns() is more
> expensive and high-precision is not needed for our timeouts we have
> anyway. Our agent has a probe manager which picks up the json of
> bpftool's feature probe and we also use the macro output in our C
> programs e.g. to have workarounds when helpers are not available on
> older kernels.
> 
> Extend the kernel config info dump to also include the kernel's
> CONFIG_HZ, and rework the probe_kernel_image_config() for allowing a
> macro dump such that CONFIG_HZ can be propagated to BPF C code as a
> simple define if available via config. Latter allows to have _compile-
> time_ resolution of jiffies <-> sec conversion in our code since all
> are propagated as known constants.
> 
> Given we cannot generally assume availability of kconfig everywhere,
> we also have a kernel hz probe [0] as a fallback. Potentially, bpftool
> could have an integrated probe fallback as well, although to derive it,
> we might need to place it under 'bpftool feature probe full' or similar
> given it would slow down the probing process overall. Yet 'full' doesn't
> fit either for us since we don't want to pollute the kernel log with
> warning messages from bpf_probe_write_user() and bpf_trace_printk() on
> agent startup; I've left it out for the time being.
> 
>   [0] https://github.com/cilium/cilium/blob/master/bpf/cilium-probe-kernel-hz.c
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Martin KaFai Lau <kafai@fb.com>

Looks good to me, thanks!

I think at the time the "bpftool feature probe" was added we didn't
settle on a particular format for dumping the CONFIG_* as part as the C
macro output, but other than that I can see no specific reason why not
to have them, so we could even list them all and avoid the macro_dump
bool. But I'm fine either way, other CONFIG_* can still be added to C
macro output at a later time if someone needs them anyway.

Regarding a fallback for the jiffies, not sure what would be best. I
agree with you for the "full" keyword, so we would need another word I
suppose. But adding new keyword for fallbacks for probing features not
directly related to BPF might be going a bit beyond bpftool's scope? I
don't know. Anyway, for the current patch:

Reviewed-by: Quentin Monnet <quentin@isovalent.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf, bpftool: Allow probing for CONFIG_HZ from kernel config
  2020-05-13 10:42 ` Quentin Monnet
@ 2020-05-13 11:26   ` Daniel Borkmann
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Borkmann @ 2020-05-13 11:26 UTC (permalink / raw)
  To: Quentin Monnet, alexei.starovoitov; +Cc: bpf, netdev, Martin KaFai Lau

On 5/13/20 12:42 PM, Quentin Monnet wrote:
> 2020-05-13 09:58 UTC+0200 ~ Daniel Borkmann <daniel@iogearbox.net>
>> In Cilium we've recently switched to make use of bpf_jiffies64() for
>> parts of our tc and XDP datapath since bpf_ktime_get_ns() is more
>> expensive and high-precision is not needed for our timeouts we have
>> anyway. Our agent has a probe manager which picks up the json of
>> bpftool's feature probe and we also use the macro output in our C
>> programs e.g. to have workarounds when helpers are not available on
>> older kernels.
>>
>> Extend the kernel config info dump to also include the kernel's
>> CONFIG_HZ, and rework the probe_kernel_image_config() for allowing a
>> macro dump such that CONFIG_HZ can be propagated to BPF C code as a
>> simple define if available via config. Latter allows to have _compile-
>> time_ resolution of jiffies <-> sec conversion in our code since all
>> are propagated as known constants.
>>
>> Given we cannot generally assume availability of kconfig everywhere,
>> we also have a kernel hz probe [0] as a fallback. Potentially, bpftool
>> could have an integrated probe fallback as well, although to derive it,
>> we might need to place it under 'bpftool feature probe full' or similar
>> given it would slow down the probing process overall. Yet 'full' doesn't
>> fit either for us since we don't want to pollute the kernel log with
>> warning messages from bpf_probe_write_user() and bpf_trace_printk() on
>> agent startup; I've left it out for the time being.
>>
>>    [0] https://github.com/cilium/cilium/blob/master/bpf/cilium-probe-kernel-hz.c
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: Martin KaFai Lau <kafai@fb.com>
> 
> Looks good to me, thanks!
> 
> I think at the time the "bpftool feature probe" was added we didn't
> settle on a particular format for dumping the CONFIG_* as part as the C
> macro output, but other than that I can see no specific reason why not
> to have them, so we could even list them all and avoid the macro_dump
> bool. But I'm fine either way, other CONFIG_* can still be added to C
> macro output at a later time if someone needs them anyway.

Right, I initially thought about listing them all, but then my thinking
was that we should really only dump those CONFIG_* as defines that actually
have a real-world use case in a BPF prog somewhere. This also helps to
better understand what is useful and why and avoids unrelated noise e.g.
in the bpf_features.h dump we have in Cilium as part of the agent bootstrap.

> Regarding a fallback for the jiffies, not sure what would be best. I
> agree with you for the "full" keyword, so we would need another word I
> suppose. But adding new keyword for fallbacks for probing features not
> directly related to BPF might be going a bit beyond bpftool's scope? I
> don't know. Anyway, for the current patch:

Agree, had the same thought.

> Reviewed-by: Quentin Monnet <quentin@isovalent.com>

Thanks!
Daniel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf, bpftool: Allow probing for CONFIG_HZ from kernel config
  2020-05-13  7:58 [PATCH bpf-next] bpf, bpftool: Allow probing for CONFIG_HZ from kernel config Daniel Borkmann
  2020-05-13 10:42 ` Quentin Monnet
@ 2020-05-14 23:19 ` Andrii Nakryiko
  2020-05-15 12:12   ` Daniel Borkmann
  1 sibling, 1 reply; 6+ messages in thread
From: Andrii Nakryiko @ 2020-05-14 23:19 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Quentin Monnet, bpf, Networking, Martin KaFai Lau

On Wed, May 13, 2020 at 1:00 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> In Cilium we've recently switched to make use of bpf_jiffies64() for
> parts of our tc and XDP datapath since bpf_ktime_get_ns() is more
> expensive and high-precision is not needed for our timeouts we have
> anyway. Our agent has a probe manager which picks up the json of
> bpftool's feature probe and we also use the macro output in our C
> programs e.g. to have workarounds when helpers are not available on
> older kernels.
>
> Extend the kernel config info dump to also include the kernel's
> CONFIG_HZ, and rework the probe_kernel_image_config() for allowing a
> macro dump such that CONFIG_HZ can be propagated to BPF C code as a
> simple define if available via config. Latter allows to have _compile-
> time_ resolution of jiffies <-> sec conversion in our code since all
> are propagated as known constants.
>
> Given we cannot generally assume availability of kconfig everywhere,
> we also have a kernel hz probe [0] as a fallback. Potentially, bpftool
> could have an integrated probe fallback as well, although to derive it,
> we might need to place it under 'bpftool feature probe full' or similar
> given it would slow down the probing process overall. Yet 'full' doesn't
> fit either for us since we don't want to pollute the kernel log with
> warning messages from bpf_probe_write_user() and bpf_trace_printk() on
> agent startup; I've left it out for the time being.
>
>   [0] https://github.com/cilium/cilium/blob/master/bpf/cilium-probe-kernel-hz.c
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Martin KaFai Lau <kafai@fb.com>
> ---

libbpf supports kconfig values, so don't have to even probe for that,
it will just appear as a constant global variable:

extern unsigned long CONFIG_HZ __kconfig;

But I assume you want this for iproute2 case, which doesn't support
this, right? We really should try to make iproute2 just use libbpf as
a loader with all the libbpf features available. I think all (at least
all major ones) features needed are already there in libbpf, iproute2
would just need to implement custom support for old-style map
definitions and maybe something else, not sure. I wonder if any of
iproute2/BPF users willing to spend some effort on this?

>  tools/bpf/bpftool/feature.c | 120 ++++++++++++++++++++----------------
>  1 file changed, 67 insertions(+), 53 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf, bpftool: Allow probing for CONFIG_HZ from kernel config
  2020-05-14 23:19 ` Andrii Nakryiko
@ 2020-05-15 12:12   ` Daniel Borkmann
  2020-05-15 15:23     ` Alexei Starovoitov
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2020-05-15 12:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Quentin Monnet, bpf, Networking, Martin KaFai Lau

On Thu, May 14, 2020 at 04:19:41PM -0700, Andrii Nakryiko wrote:
> On Wed, May 13, 2020 at 1:00 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > In Cilium we've recently switched to make use of bpf_jiffies64() for
> > parts of our tc and XDP datapath since bpf_ktime_get_ns() is more
> > expensive and high-precision is not needed for our timeouts we have
> > anyway. Our agent has a probe manager which picks up the json of
> > bpftool's feature probe and we also use the macro output in our C
> > programs e.g. to have workarounds when helpers are not available on
> > older kernels.
> >
> > Extend the kernel config info dump to also include the kernel's
> > CONFIG_HZ, and rework the probe_kernel_image_config() for allowing a
> > macro dump such that CONFIG_HZ can be propagated to BPF C code as a
> > simple define if available via config. Latter allows to have _compile-
> > time_ resolution of jiffies <-> sec conversion in our code since all
> > are propagated as known constants.
> >
> > Given we cannot generally assume availability of kconfig everywhere,
> > we also have a kernel hz probe [0] as a fallback. Potentially, bpftool
> > could have an integrated probe fallback as well, although to derive it,
> > we might need to place it under 'bpftool feature probe full' or similar
> > given it would slow down the probing process overall. Yet 'full' doesn't
> > fit either for us since we don't want to pollute the kernel log with
> > warning messages from bpf_probe_write_user() and bpf_trace_printk() on
> > agent startup; I've left it out for the time being.
> >
> >   [0] https://github.com/cilium/cilium/blob/master/bpf/cilium-probe-kernel-hz.c
> >
> > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Martin KaFai Lau <kafai@fb.com>
> > ---
> 
> libbpf supports kconfig values, so don't have to even probe for that,
> it will just appear as a constant global variable:
> 
> extern unsigned long CONFIG_HZ __kconfig;
> 
> But I assume you want this for iproute2 case, which doesn't support
> this, right? We really should try to make iproute2 just use libbpf as

It's one but not the only reason. Our golang daemon picks up the json config
from `bpftool feature -j` and based on that we decide which features our BPF
datapath will have where the daemons needs to also adapt accordingly. So for
users running older kernels we need fallback behavior in our daemon, for
example, in case of missing LPM delete or get_next_key from syscall side the
LPM map management and/or program regeneration will differ in the agent. In
case of jiffies, it's also not as trivial from control plane side, e.g.
existing deployments cannot simply switch from ktime to jiffies during upgrade
while traffic is live in the datapath given this has upgrade and downgrade
implications on timeouts. However, we can switch to using it for new deployments
via helm. In that case, the agent today probes for availability, so it needs
to know i) whether the underlying kernel supports jiffies64 helper, ii) it needs
to know the kernel hz value in order to convert timeouts for our CT's GC. This
is done based on bpftool feature -j from the agent's probe manager. Next, we
also cannot assume general availability of an existing .config from distro side,
so in that case we run the probe to determine kernel hz and emit the CONFIG_HZ
define instead, and if all breaks down we fall back to using ktime in our data
path. From the macro side, the timeouts all resolve nicely during compilation
time since everything is passed as a constant here. We have a small helper for
bpf_jiffies_to_sec() and bpf_sec_to_jiffies() that resolves it whereas `extern
unsigned long CONFIG_HZ __kconfig` hapens at load time plus relies on the fact
that config must be available, although the latter could probably be fixed via
user-callback.

> a loader with all the libbpf features available. I think all (at least
> all major ones) features needed are already there in libbpf, iproute2
> would just need to implement custom support for old-style map
> definitions and maybe something else, not sure. I wonder if any of
> iproute2/BPF users willing to spend some effort on this?

Right, my main concern are all the behavioral subtleties where things could
break on our side e.g. debugging something like [0] is not fun, but I might
eventually do it, at least it's on my list. I recently converted our LB to
be compileable and loadable from both tc as well as XDP side and I'm in the
process of optimizing the BPF side further. So I found myself annoyed enough
that `bpftool prog profile` doesn't work. ;) Lack of BTF - the iproute2 lib
does load BTF [1], but it broke with newer LLVM versions (we ship clang-10
these days). So yeah, either fixing the BTF handling for getting `bpftool
prog profile` working or investing the cycles to move iproute2 finally to
libbpf along with our entire datapath. It's still an intermediate step as
long-term we would love to handle everything native from golang via cilium/ebpf
library to avoid shelling out, but it would allow for opening usage of other
features in the meantime and latter might still be further out. One of the
things that is still not clear yet to me is the global data handling and how
to have a clean solution for both old kernels that don't have BPF global data
support and new ones that have it. Our iproute2 version uses the bpf_apply_relo_glob()
variant [2] which we discussed longer time ago at plumbers and while a hack,
it solved the use-case of avoiding to invoke the compiler for every regeneration
even on old kernels down to 4.9 [3] where BPF global data is not available of
course. Tricky, but maybe there is some low-overhead solution we could add to
libbpf that would resolve it under the hood, or worst case just inline asm ...

  [0] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=e4c4685fd6e4afd3e9b6818c96fded371f350a3f
  [1] https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=f823f36012fb5ab4ddfca6ed4ff56188730f281e
  [2] https://github.com/cilium/iproute2/blob/static-data/lib/bpf.c#L2525
  [3] https://cilium.io/blog/2019/04/24/cilium-15#BpfTemplating

> >  tools/bpf/bpftool/feature.c | 120 ++++++++++++++++++++----------------
> >  1 file changed, 67 insertions(+), 53 deletions(-)
> >
> 
> [...]

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH bpf-next] bpf, bpftool: Allow probing for CONFIG_HZ from kernel config
  2020-05-15 12:12   ` Daniel Borkmann
@ 2020-05-15 15:23     ` Alexei Starovoitov
  0 siblings, 0 replies; 6+ messages in thread
From: Alexei Starovoitov @ 2020-05-15 15:23 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andrii Nakryiko, Quentin Monnet, bpf, Networking, Martin KaFai Lau

On Fri, May 15, 2020 at 5:12 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > libbpf supports kconfig values, so don't have to even probe for that,
> > it will just appear as a constant global variable:
> >
> > extern unsigned long CONFIG_HZ __kconfig;
> >
> > But I assume you want this for iproute2 case, which doesn't support
> > this, right? We really should try to make iproute2 just use libbpf as
>
> It's one but not the only reason. Our golang daemon picks up the json config
> from `bpftool feature -j` and based on that we decide which features our BPF
> datapath will have where the daemons needs to also adapt accordingly. So for
> users running older kernels we need fallback behavior in our daemon, for
> example, in case of missing LPM delete or get_next_key from syscall side the
> LPM map management and/or program regeneration will differ in the agent. In
> case of jiffies, it's also not as trivial from control plane side, e.g.
> existing deployments cannot simply switch from ktime to jiffies during upgrade
> while traffic is live in the datapath given this has upgrade and downgrade
> implications on timeouts. However, we can switch to using it for new deployments
> via helm. In that case, the agent today probes for availability, so it needs
> to know i) whether the underlying kernel supports jiffies64 helper, ii) it needs
> to know the kernel hz value in order to convert timeouts for our CT's GC. This
> is done based on bpftool feature -j from the agent's probe manager. Next, we
> also cannot assume general availability of an existing .config from distro side,
> so in that case we run the probe to determine kernel hz and emit the CONFIG_HZ
> define instead, and if all breaks down we fall back to using ktime in our data
> path. From the macro side, the timeouts all resolve nicely during compilation
> time since everything is passed as a constant here. We have a small helper for
> bpf_jiffies_to_sec() and bpf_sec_to_jiffies() that resolves it whereas `extern
> unsigned long CONFIG_HZ __kconfig` hapens at load time plus relies on the fact
> that config must be available, although the latter could probably be fixed via
> user-callback.

fair enough.
Applied to bpf-next. Thanks

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-05-15 15:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-13  7:58 [PATCH bpf-next] bpf, bpftool: Allow probing for CONFIG_HZ from kernel config Daniel Borkmann
2020-05-13 10:42 ` Quentin Monnet
2020-05-13 11:26   ` Daniel Borkmann
2020-05-14 23:19 ` Andrii Nakryiko
2020-05-15 12:12   ` Daniel Borkmann
2020-05-15 15:23     ` Alexei Starovoitov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.