bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 0/3] Implement runqslower BCC tool with BPF CO-RE
@ 2019-12-19  7:06 Andrii Nakryiko
  2019-12-19  7:06 ` [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command Andrii Nakryiko
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Andrii Nakryiko @ 2019-12-19  7:06 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Based on recent BPF CO-RE, tp_btf, and skeleton changes, re-implement
BCC-based runqslower tool as portable pre-compiled BPF CO-RE-based tool. Make
sure it's built as part of selftests to ensure it doesn't bit rot.

As part of this patch set, also introduce new `format core` to `bpftool btf
dump` sub-command. It generates same compilable C header file with all the
types from BTF, but additionally ensures seamless use of generated header with
BPF CO-RE. Currently `format core` applies preserve_access_index attribute (if
supported by Clang) to all structs and unions, to improve user experience of
writing TRACING programs with direct kernel memory read access.

Andrii Nakryiko (3):
  bpftool: add extra CO-RE mode to btf dump command
  libbpf/tools: add runqslower tool to libbpf
  selftests/bpf: build runqslower from selftests

 .../bpf/bpftool/Documentation/bpftool-btf.rst |   7 +-
 tools/bpf/bpftool/bash-completion/bpftool     |   2 +-
 tools/bpf/bpftool/btf.c                       |  24 ++-
 tools/lib/bpf/tools/runqslower/.gitignore     |   2 +
 tools/lib/bpf/tools/runqslower/Makefile       |  60 ++++++
 .../lib/bpf/tools/runqslower/runqslower.bpf.c | 101 ++++++++++
 tools/lib/bpf/tools/runqslower/runqslower.c   | 187 ++++++++++++++++++
 tools/lib/bpf/tools/runqslower/runqslower.h   |  13 ++
 tools/testing/selftests/bpf/Makefile          |   7 +-
 9 files changed, 395 insertions(+), 8 deletions(-)
 create mode 100644 tools/lib/bpf/tools/runqslower/.gitignore
 create mode 100644 tools/lib/bpf/tools/runqslower/Makefile
 create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.bpf.c
 create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.c
 create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command
  2019-12-19  7:06 [PATCH bpf-next 0/3] Implement runqslower BCC tool with BPF CO-RE Andrii Nakryiko
@ 2019-12-19  7:06 ` Andrii Nakryiko
  2019-12-19 17:06   ` Alexei Starovoitov
  2019-12-19  7:06 ` [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf Andrii Nakryiko
  2019-12-19  7:06 ` [PATCH bpf-next 3/3] selftests/bpf: build runqslower from selftests Andrii Nakryiko
  2 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2019-12-19  7:06 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add special BPF CO-RE mode to `btf dump` sub-command, which emits C dump of
provided BTF data, but with additional provisions for use with BPF CO-RE.
The first special BPF CO-RE specific feature added is applying
preserve_access_index attribute to all structs/unions to make them
automatically relocatable. This is especially useful for tp_btf/fentry/fexit
BPF program types. They allow direct memory access, so BPF C code just uses
straightfoward a->b->c access pattern to read data from kernel. But without
kernel structs marked as CO-RE relocatable through preserve_access_index
attribute, one has to enclose all the data reads into a special
__builtin_preserve_access_index code block, like so:

__builtin_preserve_access_index(({
    x = p->pid; /* where p is struct task_struct *, for example */
}));

This is very inconvenient and obscures the logic quite a bit. By marking all
auto-generated types with preserve_access_index attribute the above code is
reduced to just clean and natural `x = p->pid;`.

Having a special `format core`, as opposed to extending existing `format c`
mode, allows us to do more improvements like above, knowing that intended use
case is for BPF CO-RE. And still have a clean and unassuming plain C mode for
any other generic use case.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 .../bpf/bpftool/Documentation/bpftool-btf.rst |  7 ++++--
 tools/bpf/bpftool/bash-completion/bpftool     |  2 +-
 tools/bpf/bpftool/btf.c                       | 24 +++++++++++++++----
 3 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-btf.rst b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
index 39615f8e145b..101a91280f3d 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-btf.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-btf.rst
@@ -24,7 +24,7 @@ BTF COMMANDS
 |	**bpftool** **btf help**
 |
 |	*BTF_SRC* := { **id** *BTF_ID* | **prog** *PROG* | **map** *MAP* [{**key** | **value** | **kv** | **all**}] | **file** *FILE* }
-|	*FORMAT* := { **raw** | **c** }
+|	*FORMAT* := { **raw** | **c** | **core** }
 |	*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
 |	*PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
 
@@ -59,7 +59,10 @@ DESCRIPTION
 
 		  **format** option can be used to override default (raw)
 		  output format. Raw (**raw**) or C-syntax (**c**) output
-		  formats are supported.
+		  formats are supported. There is also a special BPF CO-RE
+		  mode (**core**), which emits types in valid C syntax, but
+		  with additional provisions for more seamless use with BPF
+		  CO-RE.
 
 	**bpftool btf help**
 		  Print short help message.
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 754d8395e451..8308ae5f5679 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -800,7 +800,7 @@ _bpftool()
                             return 0
                             ;;
                         format)
-                            COMPREPLY=( $( compgen -W "c raw" -- "$cur" ) )
+                            COMPREPLY=( $( compgen -W "c core raw" -- "$cur" ) )
                             ;;
                         *)
                             # emit extra options
diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index e5bc97b71ceb..a57494493fec 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -361,7 +361,7 @@ static void __printf(2, 0) btf_dump_printf(void *ctx,
 }
 
 static int dump_btf_c(const struct btf *btf,
-		      __u32 *root_type_ids, int root_type_cnt)
+		      __u32 *root_type_ids, int root_type_cnt, bool core_mode)
 {
 	struct btf_dump *d;
 	int err = 0, i;
@@ -370,6 +370,13 @@ static int dump_btf_c(const struct btf *btf,
 	if (IS_ERR(d))
 		return PTR_ERR(d);
 
+	if (core_mode) {
+		printf("#if defined(__has_attribute) && __has_attribute(preserve_access_index)\n");
+		printf("#define __CLANG_BPF_CORE_SUPPORTED\n");
+		printf("#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)\n");
+		printf("#endif\n\n");
+	}
+
 	if (root_type_cnt) {
 		for (i = 0; i < root_type_cnt; i++) {
 			err = btf_dump__dump_type(d, root_type_ids[i]);
@@ -386,6 +393,12 @@ static int dump_btf_c(const struct btf *btf,
 		}
 	}
 
+	if (core_mode) {
+		printf("#ifdef __CLANG_BPF_CORE_SUPPORTED\n");
+		printf("#pragma clang attribute pop\n");
+		printf("#endif\n");
+	}
+
 done:
 	btf_dump__free(d);
 	return err;
@@ -441,10 +454,10 @@ static bool is_btf_raw(const char *file)
 
 static int do_dump(int argc, char **argv)
 {
+	bool dump_c = false, core_mode = false;
 	struct btf *btf = NULL;
 	__u32 root_type_ids[2];
 	int root_type_cnt = 0;
-	bool dump_c = false;
 	__u32 btf_id = -1;
 	const char *src;
 	int fd = -1;
@@ -544,6 +557,9 @@ static int do_dump(int argc, char **argv)
 			}
 			if (strcmp(*argv, "c") == 0) {
 				dump_c = true;
+			} else if (strcmp(*argv, "core") == 0) {
+				dump_c = true;
+				core_mode = true;
 			} else if (strcmp(*argv, "raw") == 0) {
 				dump_c = false;
 			} else {
@@ -577,7 +593,7 @@ static int do_dump(int argc, char **argv)
 			err = -ENOTSUP;
 			goto done;
 		}
-		err = dump_btf_c(btf, root_type_ids, root_type_cnt);
+		err = dump_btf_c(btf, root_type_ids, root_type_cnt, core_mode);
 	} else {
 		err = dump_btf_raw(btf, root_type_ids, root_type_cnt);
 	}
@@ -925,7 +941,7 @@ static int do_help(int argc, char **argv)
 		"       %s btf help\n"
 		"\n"
 		"       BTF_SRC := { id BTF_ID | prog PROG | map MAP [{key | value | kv | all}] | file FILE }\n"
-		"       FORMAT  := { raw | c }\n"
+		"       FORMAT  := { raw | c | core }\n"
 		"       " HELP_SPEC_MAP "\n"
 		"       " HELP_SPEC_PROGRAM "\n"
 		"       " HELP_SPEC_OPTIONS "\n"
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf
  2019-12-19  7:06 [PATCH bpf-next 0/3] Implement runqslower BCC tool with BPF CO-RE Andrii Nakryiko
  2019-12-19  7:06 ` [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command Andrii Nakryiko
@ 2019-12-19  7:06 ` Andrii Nakryiko
  2019-12-19 15:41   ` Daniel Borkmann
  2019-12-19 18:13   ` Yonghong Song
  2019-12-19  7:06 ` [PATCH bpf-next 3/3] selftests/bpf: build runqslower from selftests Andrii Nakryiko
  2 siblings, 2 replies; 15+ messages in thread
From: Andrii Nakryiko @ 2019-12-19  7:06 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches
its BCC-based counterpart 1-to-1, supporting all the same parameters and
functionality.

runqslower tool utilizes BPF skeleton, auto-generated from BPF object file,
as well as memory-mapped interface to global (read-only, in this case) data.
Its makefile also ensures auto-generation of "relocatable" vmlinux.h, which is
necessary for BTF-typed raw tracepoints with direct memory access.

  [0] https://github.com/iovisor/bcc/blob/11bf5d02c895df9646c117c713082eb192825293/tools/runqslower.py

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/tools/runqslower/.gitignore     |   2 +
 tools/lib/bpf/tools/runqslower/Makefile       |  60 ++++++
 .../lib/bpf/tools/runqslower/runqslower.bpf.c | 101 ++++++++++
 tools/lib/bpf/tools/runqslower/runqslower.c   | 187 ++++++++++++++++++
 tools/lib/bpf/tools/runqslower/runqslower.h   |  13 ++
 5 files changed, 363 insertions(+)
 create mode 100644 tools/lib/bpf/tools/runqslower/.gitignore
 create mode 100644 tools/lib/bpf/tools/runqslower/Makefile
 create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.bpf.c
 create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.c
 create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.h

diff --git a/tools/lib/bpf/tools/runqslower/.gitignore b/tools/lib/bpf/tools/runqslower/.gitignore
new file mode 100644
index 000000000000..404942cc9371
--- /dev/null
+++ b/tools/lib/bpf/tools/runqslower/.gitignore
@@ -0,0 +1,2 @@
+/.output
+/runqslower
diff --git a/tools/lib/bpf/tools/runqslower/Makefile b/tools/lib/bpf/tools/runqslower/Makefile
new file mode 100644
index 000000000000..b87b1f9fe9da
--- /dev/null
+++ b/tools/lib/bpf/tools/runqslower/Makefile
@@ -0,0 +1,60 @@
+# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+CLANG := clang
+LLC := llc
+LLVM_STRIP := llvm-strip
+BPFTOOL := bpftool
+LIBBPF_SRC := ../..
+CFLAGS := -g -Wall
+
+# Try to detect best kernel BTF source
+KERNEL_REL := $(shell uname -r)
+ifneq ("$(wildcard /sys/kenerl/btf/vmlinux)","")
+VMLINUX_BTF := /sys/kernel/btf/vmlinux
+else ifneq ("$(wildcard /boot/vmlinux-$(KERNEL_REL))","")
+VMLINUX_BTF := /boot/vmlinux-$(KERNEL_REL)
+else
+$(error "Can't detect kernel BTF, use VMLINUX_BTF to specify it explicitly")
+endif
+
+out := .output
+abs_out := $(abspath $(out))
+libbpf_src := $(abspath $(LIBBPF_SRC))
+
+.DELETE_ON_ERROR:
+
+.PHONY: all
+all: runqslower
+
+.PHONY: clean
+clean:
+	rm -rf $(out) runqslower
+
+runqslower: $(out)/runqslower.o $(out)/libbpf.a
+	$(CC) $(CFLAGS) -lelf -lz $^ -o $@
+
+$(out)/vmlinux.h: $(VMLINUX_BTF) | $(out)
+	$(BPFTOOL) btf dump file $(VMLINUX_BTF) format core > $@
+
+$(out)/libbpf.a: | $(out)
+	cd $(out) &&							      \
+	$(MAKE) -C $(libbpf_src) OUTPUT=$(abs_out)/ $(abs_out)/libbpf.a
+
+$(out)/runqslower.o: runqslower.h $(out)/runqslower.skel.h		      \
+		     $(out)/runqslower.bpf.o
+
+$(out)/runqslower.bpf.o: $(out)/vmlinux.h runqslower.h
+
+$(out)/%.skel.h: $(out)/%.bpf.o
+	$(BPFTOOL) gen skeleton $< > $@
+
+$(out)/%.bpf.o: %.bpf.c | $(out)
+	$(CLANG) -g -O2 -target bpf -I$(out) -I$(LIBBPF_SRC)		      \
+		 -c $(filter %.c,$^) -o $@ &&				      \
+	$(LLVM_STRIP) -g $@
+
+$(out)/%.o: %.c | $(out)
+	$(CC) $(CFLAGS) -I$(LIBBPF_SRC) -I$(out) -c $(filter %.c,$^) -o $@
+
+$(out):
+	mkdir -p $(out)
+
diff --git a/tools/lib/bpf/tools/runqslower/runqslower.bpf.c b/tools/lib/bpf/tools/runqslower/runqslower.bpf.c
new file mode 100644
index 000000000000..7f078e8b1365
--- /dev/null
+++ b/tools/lib/bpf/tools/runqslower/runqslower.bpf.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include "vmlinux.h"
+#include "bpf_helpers.h"
+#include "bpf_core_read.h"
+#include "runqslower.h"
+
+#define TASK_RUNNING 0
+
+#define BPF_F_INDEX_MASK		0xffffffffULL
+#define BPF_F_CURRENT_CPU		BPF_F_INDEX_MASK
+
+const volatile __u64 min_us = 0;
+const volatile pid_t targ_pid = 0;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 10240);
+	__type(key, u32);
+	__type(value, u64);
+} start SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(key_size, sizeof(u32));
+	__uint(value_size, sizeof(u32));
+} events SEC(".maps");
+
+/* record enqueue timestamp */
+__always_inline
+static int trace_enqueue(u32 tgid, u32 pid)
+{
+	u64 ts;
+
+	if (!pid || (targ_pid && targ_pid != pid))
+		return 0;
+
+	ts = bpf_ktime_get_ns();
+	bpf_map_update_elem(&start, &pid, &ts, 0);
+	return 0;
+}
+
+SEC("tp_btf/sched_wakeup")
+int handle__sched_wakeup(u64 *ctx)
+{
+	/* TP_PROTO(struct task_struct *p) */
+	struct task_struct *p = (void *)ctx[0];
+
+	return trace_enqueue(p->tgid, p->pid);
+}
+
+SEC("tp_btf/sched_wakeup_new")
+int handle__sched_wakeup_new(u64 *ctx)
+{
+	/* TP_PROTO(struct task_struct *p) */
+	struct task_struct *p = (void *)ctx[0];
+
+	return trace_enqueue(p->tgid, p->pid);
+}
+
+SEC("tp_btf/sched_switch")
+int handle__sched_switch(u64 *ctx)
+{
+	/* TP_PROTO(bool preempt, struct task_struct *prev,
+	 *	    struct task_struct *next)
+	 */
+	struct task_struct *prev = (struct task_struct *)ctx[1];
+	struct task_struct *next = (struct task_struct *)ctx[2];
+	struct event event = {};
+	u64 *tsp, delta_us;
+	long state;
+	u32 pid;
+
+	/* ivcsw: treat like an enqueue event and store timestamp */
+	if (prev->state == TASK_RUNNING)
+		trace_enqueue(prev->tgid, prev->pid);
+
+	pid = next->pid;
+
+	/* fetch timestamp and calculate delta */
+	tsp = bpf_map_lookup_elem(&start, &pid);
+	if (!tsp)
+		return 0;   /* missed enqueue */
+
+	delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
+	if (min_us && delta_us <= min_us)
+		return 0;
+
+	event.pid = pid;
+	event.delta_us = delta_us;
+	bpf_get_current_comm(&event.task, sizeof(event.task));
+
+	/* output */
+	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
+			      &event, sizeof(event));
+
+	bpf_map_delete_elem(&start, &pid);
+	return 0;
+}
+
+char LICENSE[] SEC("license") = "GPL";
diff --git a/tools/lib/bpf/tools/runqslower/runqslower.c b/tools/lib/bpf/tools/runqslower/runqslower.c
new file mode 100644
index 000000000000..996f0e2c560e
--- /dev/null
+++ b/tools/lib/bpf/tools/runqslower/runqslower.c
@@ -0,0 +1,187 @@
+// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+// Copyright (c) 2019 Facebook
+#include <argp.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/resource.h>
+#include <time.h>
+#include <libbpf.h>
+#include <bpf.h>
+#include "runqslower.h"
+#include "runqslower.skel.h"
+
+struct env {
+	pid_t pid;
+	__u64 min_us;
+	bool verbose;
+} env = {
+	.min_us = 10000,
+};
+
+const char *argp_program_version = "runqslower 0.1";
+const char *argp_program_bug_address = "<bpf@vger.kernel.org>";
+const char argp_program_doc[] =
+"runqslower    Trace long process scheduling delays.\n"
+"              For Linux, uses eBPF, BPF CO-RE, libbpf, BTF.\n"
+"\n"
+"This script traces high scheduling delays between tasks being\n"
+"ready to run and them running on CPU after that.\n"
+"\n"
+"USAGE: runqslower [-p PID] [min_us]\n"
+"\n"
+"EXAMPLES:\n"
+"    runqslower         # trace run queue latency higher than 10000 us (default)\n"
+"    runqslower 1000    # trace run queue latency higher than 1000 us\n"
+"    runqslower -p 123  # trace pid 123 only\n";
+
+static const struct argp_option opts[] = {
+	{ "pid", 'p', "PID", 0, "Process PID to trace"},
+	{ "verbose", 'v', NULL, 0, "Verbose debug output" },
+	{},
+};
+
+static error_t parse_arg(int key, char *arg, struct argp_state *state)
+{
+	static int pos_args;
+	int pid;
+	long long min_us;
+
+	switch (key) {
+	case 'v':
+		env.verbose = true;
+		break;
+	case 'p':
+		errno = 0;
+		pid = strtol(arg, NULL, 10);
+		if (errno || pid <= 0) {
+			fprintf(stderr, "Invalid PID: %s\n", arg);
+			argp_usage(state);
+		}
+		env.pid = pid;
+		break;
+	case ARGP_KEY_ARG:
+		if (pos_args++) {
+			fprintf(stderr,
+				"Unrecognized positional argument: %s\n", arg);
+			argp_usage(state);
+		}
+		errno = 0;
+		min_us = strtoll(arg, NULL, 10);
+		if (errno || min_us <= 0) {
+			fprintf(stderr, "Invalid delay (in us): %s\n", arg);
+			argp_usage(state);
+		}
+		env.min_us = min_us;
+		break;
+	default:
+		return ARGP_ERR_UNKNOWN;
+	}
+	return 0;
+}
+
+int libbpf_print_fn(enum libbpf_print_level level,
+		    const char *format, va_list args)
+{
+	if (level == LIBBPF_DEBUG && !env.verbose)
+		return 0;
+	return vfprintf(stderr, format, args);
+}
+
+static int bump_memlock_rlimit(void)
+{
+	struct rlimit rlim_new = {
+		.rlim_cur	= RLIM_INFINITY,
+		.rlim_max	= RLIM_INFINITY,
+	};
+
+	return setrlimit(RLIMIT_MEMLOCK, &rlim_new);
+}
+
+void handle_event(void *ctx, int cpu, void *data, __u32 data_sz)
+{
+	const struct event *e = data;
+	struct tm *tm;
+	char ts[32];
+	time_t t;
+
+	time(&t);
+	tm = localtime(&t);
+	strftime(ts, sizeof(ts), "%H:%M:%S", tm);
+	printf("%-8s %-16s %-6d %14llu\n", ts, e->task, e->pid, e->delta_us);
+}
+
+void handle_lost_events(void *ctx, int cpu, __u64 lost_cnt)
+{
+	printf("Lost %llu events on CPU #%d!\n", lost_cnt, cpu);
+}
+
+int main(int argc, char **argv)
+{
+	static const struct argp argp = {
+		.options = opts,
+		.parser = parse_arg,
+		.doc = argp_program_doc,
+	};
+	struct perf_buffer_opts pb_opts;
+	struct perf_buffer *pb = NULL;
+	struct runqslower_bpf *obj;
+	int err;
+
+	err = argp_parse(&argp, argc, argv, 0, NULL, NULL);
+	if (err)
+		return err;
+
+	libbpf_set_print(libbpf_print_fn);
+
+	err = bump_memlock_rlimit();
+	if (err) {
+		fprintf(stderr, "failed to increase rlimit: %d", err);
+		return 1;
+	}
+
+	obj = runqslower_bpf__open();
+	if (!obj) {
+		fprintf(stderr, "failed to open and/or load BPF object\n");
+		return 1;
+	}
+
+	/* initialize global data (filtering options) */
+	obj->rodata->targ_pid = env.pid;
+	obj->rodata->min_us = env.min_us;
+
+	err = runqslower_bpf__load(obj);
+	if (err) {
+		fprintf(stderr, "failed to load BPF object: %d\n", err);
+		goto cleanup;
+	}
+
+	err = runqslower_bpf__attach(obj);
+	if (err) {
+		fprintf(stderr, "failed to attach BPF programs\n");
+		goto cleanup;
+	}
+
+	printf("Tracing run queue latency higher than %llu us\n", env.min_us);
+	printf("%-8s %-16s %-6s %14s\n", "TIME", "COMM", "PID", "LAT(us)");
+
+	pb_opts.sample_cb = handle_event;
+	pb_opts.lost_cb = handle_lost_events;
+	pb = perf_buffer__new(bpf_map__fd(obj->maps.events), 64, &pb_opts);
+	err = libbpf_get_error(pb);
+	if (err) {
+		pb = NULL;
+		fprintf(stderr, "failed to open perf buffer: %d\n", err);
+		goto cleanup;
+	}
+
+	while ((err = perf_buffer__poll(pb, 100)) >= 0)
+		;
+	printf("Error polling perf buffer: %d\n", err);
+
+cleanup:
+	perf_buffer__free(pb);
+	runqslower_bpf__destroy(obj);
+
+	return err != 0;
+}
diff --git a/tools/lib/bpf/tools/runqslower/runqslower.h b/tools/lib/bpf/tools/runqslower/runqslower.h
new file mode 100644
index 000000000000..9db225425e5f
--- /dev/null
+++ b/tools/lib/bpf/tools/runqslower/runqslower.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+#ifndef __RUNQSLOWER_H
+#define __RUNQSLOWER_H
+
+#define TASK_COMM_LEN 16
+
+struct event {
+	char task[TASK_COMM_LEN];
+	__u64 delta_us;
+	pid_t pid;
+};
+
+#endif /* __RUNQSLOWER_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH bpf-next 3/3] selftests/bpf: build runqslower from selftests
  2019-12-19  7:06 [PATCH bpf-next 0/3] Implement runqslower BCC tool with BPF CO-RE Andrii Nakryiko
  2019-12-19  7:06 ` [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command Andrii Nakryiko
  2019-12-19  7:06 ` [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf Andrii Nakryiko
@ 2019-12-19  7:06 ` Andrii Nakryiko
  2 siblings, 0 replies; 15+ messages in thread
From: Andrii Nakryiko @ 2019-12-19  7:06 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Ensure runqslower tool from libbpf is built as part of selftests to prevent it
from bit rotting.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/testing/selftests/bpf/Makefile | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index c652bd84ef0e..1f8eb95d019c 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -75,7 +75,7 @@ TEST_GEN_PROGS_EXTENDED = test_sock_addr test_skb_cgroup_id_user \
 	flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
 	test_lirc_mode2_user xdping test_cpp
 
-TEST_CUSTOM_PROGS = urandom_read
+TEST_CUSTOM_PROGS = urandom_read runqslower
 
 # Emit succinct information message describing current building step
 # $1 - generic step name (e.g., CC, LINK, etc);
@@ -93,6 +93,7 @@ OVERRIDE_TARGETS := 1
 override define CLEAN
 	$(call msg,    CLEAN)
 	$(RM) -r $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES) $(EXTRA_CLEAN)
+	$(MAKE) -C $(BPFDIR)/tools/runqslower clean
 endef
 
 include ../lib.mk
@@ -119,6 +120,10 @@ $(OUTPUT)/test_stub.o: test_stub.c
 	$(call msg,         CC,,$@)
 	$(CC) -c $(CFLAGS) -o $@ $<
 
+.PHONY: $(OUTPUT)/runqslower
+$(OUTPUT)/runqslower: force
+	$(MAKE) -C $(BPFDIR)/tools/runqslower
+
 BPFOBJ := $(OUTPUT)/libbpf.a
 
 $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/test_stub.o $(BPFOBJ)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf
  2019-12-19  7:06 ` [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf Andrii Nakryiko
@ 2019-12-19 15:41   ` Daniel Borkmann
  2019-12-19 21:14     ` Andrii Nakryiko
  2019-12-19 18:13   ` Yonghong Song
  1 sibling, 1 reply; 15+ messages in thread
From: Daniel Borkmann @ 2019-12-19 15:41 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, netdev, ast, andrii.nakryiko, kernel-team

On Wed, Dec 18, 2019 at 11:06:57PM -0800, Andrii Nakryiko wrote:
> Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches
> its BCC-based counterpart 1-to-1, supporting all the same parameters and
> functionality.
> 
> runqslower tool utilizes BPF skeleton, auto-generated from BPF object file,
> as well as memory-mapped interface to global (read-only, in this case) data.
> Its makefile also ensures auto-generation of "relocatable" vmlinux.h, which is
> necessary for BTF-typed raw tracepoints with direct memory access.
> 
>   [0] https://github.com/iovisor/bcc/blob/11bf5d02c895df9646c117c713082eb192825293/tools/runqslower.py
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  tools/lib/bpf/tools/runqslower/.gitignore     |   2 +
>  tools/lib/bpf/tools/runqslower/Makefile       |  60 ++++++
>  .../lib/bpf/tools/runqslower/runqslower.bpf.c | 101 ++++++++++
>  tools/lib/bpf/tools/runqslower/runqslower.c   | 187 ++++++++++++++++++
>  tools/lib/bpf/tools/runqslower/runqslower.h   |  13 ++

tools/lib/bpf/tools/ is rather weird, please add to tools/bpf/ which is the
more appropriate place we have for small tools. Could also live directly in
there, e.g. tools/bpf/runqslower.{c,h,bpf.c} and then built/run from selftests,
but under libbpf directly is too odd.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command
  2019-12-19  7:06 ` [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command Andrii Nakryiko
@ 2019-12-19 17:06   ` Alexei Starovoitov
  2019-12-19 21:07     ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2019-12-19 17:06 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, netdev, ast, daniel, andrii.nakryiko, kernel-team

On Wed, Dec 18, 2019 at 11:06:56PM -0800, Andrii Nakryiko wrote:
> +	if (core_mode) {
> +		printf("#if defined(__has_attribute) && __has_attribute(preserve_access_index)\n");
> +		printf("#define __CLANG_BPF_CORE_SUPPORTED\n");
> +		printf("#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)\n");
> +		printf("#endif\n\n");

I think it's dangerous to automatically opt-out when clang is not new enough.
bpf prog will compile fine, but it will be missing co-re relocations.
How about doing something like:
  printf("#ifdef NEEDS_CO_RE\n");
  printf("#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)\n");
  printf("#endif\n\n");
and emit it always when 'format c'.
Then on the program side it will look:
#define NEEDS_CO_RE
#include "vmlinux.h"
If clang is too old there will be a compile time error which is a good thing.
Future features will have different NEEDS_ macros.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf
  2019-12-19  7:06 ` [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf Andrii Nakryiko
  2019-12-19 15:41   ` Daniel Borkmann
@ 2019-12-19 18:13   ` Yonghong Song
  2019-12-19 21:16     ` Andrii Nakryiko
  1 sibling, 1 reply; 15+ messages in thread
From: Yonghong Song @ 2019-12-19 18:13 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, Alexei Starovoitov, daniel
  Cc: andrii.nakryiko, Kernel Team



On 12/18/19 11:06 PM, Andrii Nakryiko wrote:
> Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches
> its BCC-based counterpart 1-to-1, supporting all the same parameters and
> functionality.
> 
> runqslower tool utilizes BPF skeleton, auto-generated from BPF object file,
> as well as memory-mapped interface to global (read-only, in this case) data.
> Its makefile also ensures auto-generation of "relocatable" vmlinux.h, which is
> necessary for BTF-typed raw tracepoints with direct memory access.
> 
>    [0] https://github.com/iovisor/bcc/blob/11bf5d02c895df9646c117c713082eb192825293/tools/runqslower.py
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>   tools/lib/bpf/tools/runqslower/.gitignore     |   2 +
>   tools/lib/bpf/tools/runqslower/Makefile       |  60 ++++++
>   .../lib/bpf/tools/runqslower/runqslower.bpf.c | 101 ++++++++++
>   tools/lib/bpf/tools/runqslower/runqslower.c   | 187 ++++++++++++++++++
>   tools/lib/bpf/tools/runqslower/runqslower.h   |  13 ++
>   5 files changed, 363 insertions(+)
>   create mode 100644 tools/lib/bpf/tools/runqslower/.gitignore
>   create mode 100644 tools/lib/bpf/tools/runqslower/Makefile
>   create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.bpf.c
>   create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.c
>   create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.h
> 
> diff --git a/tools/lib/bpf/tools/runqslower/.gitignore b/tools/lib/bpf/tools/runqslower/.gitignore
> new file mode 100644
> index 000000000000..404942cc9371
> --- /dev/null
> +++ b/tools/lib/bpf/tools/runqslower/.gitignore
> @@ -0,0 +1,2 @@
> +/.output
> +/runqslower
> diff --git a/tools/lib/bpf/tools/runqslower/Makefile b/tools/lib/bpf/tools/runqslower/Makefile
> new file mode 100644
> index 000000000000..b87b1f9fe9da
> --- /dev/null
> +++ b/tools/lib/bpf/tools/runqslower/Makefile
> @@ -0,0 +1,60 @@
> +# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> +CLANG := clang
> +LLC := llc
> +LLVM_STRIP := llvm-strip
> +BPFTOOL := bpftool

Maybe it is better to use in-tree bpftool? This will ensure we use the 
one shipped together with the source which should have needed functionality.

> +LIBBPF_SRC := ../..
> +CFLAGS := -g -Wall
> +
> +# Try to detect best kernel BTF source
> +KERNEL_REL := $(shell uname -r)
> +ifneq ("$(wildcard /sys/kenerl/btf/vmlinux)","")
> +VMLINUX_BTF := /sys/kernel/btf/vmlinux
> +else ifneq ("$(wildcard /boot/vmlinux-$(KERNEL_REL))","")
> +VMLINUX_BTF := /boot/vmlinux-$(KERNEL_REL)
> +else
> +$(error "Can't detect kernel BTF, use VMLINUX_BTF to specify it explicitly")
> +endif
> +
> +out := .output
> +abs_out := $(abspath $(out))
> +libbpf_src := $(abspath $(LIBBPF_SRC))
> +
> +.DELETE_ON_ERROR:
> +
> +.PHONY: all
> +all: runqslower
> +
> +.PHONY: clean
> +clean:
> +	rm -rf $(out) runqslower
> +
> +runqslower: $(out)/runqslower.o $(out)/libbpf.a
> +	$(CC) $(CFLAGS) -lelf -lz $^ -o $@
> +
> +$(out)/vmlinux.h: $(VMLINUX_BTF) | $(out)
> +	$(BPFTOOL) btf dump file $(VMLINUX_BTF) format core > $@
> +
> +$(out)/libbpf.a: | $(out)
> +	cd $(out) &&							      \
> +	$(MAKE) -C $(libbpf_src) OUTPUT=$(abs_out)/ $(abs_out)/libbpf.a
> +
> +$(out)/runqslower.o: runqslower.h $(out)/runqslower.skel.h		      \
> +		     $(out)/runqslower.bpf.o
> +
> +$(out)/runqslower.bpf.o: $(out)/vmlinux.h runqslower.h
> +
> +$(out)/%.skel.h: $(out)/%.bpf.o
> +	$(BPFTOOL) gen skeleton $< > $@
> +
> +$(out)/%.bpf.o: %.bpf.c | $(out)
> +	$(CLANG) -g -O2 -target bpf -I$(out) -I$(LIBBPF_SRC)		      \
> +		 -c $(filter %.c,$^) -o $@ &&				      \
> +	$(LLVM_STRIP) -g $@
> +
> +$(out)/%.o: %.c | $(out)
> +	$(CC) $(CFLAGS) -I$(LIBBPF_SRC) -I$(out) -c $(filter %.c,$^) -o $@
> +
> +$(out):
> +	mkdir -p $(out)
> +
[...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command
  2019-12-19 17:06   ` Alexei Starovoitov
@ 2019-12-19 21:07     ` Andrii Nakryiko
  2019-12-19 22:04       ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2019-12-19 21:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Thu, Dec 19, 2019 at 9:06 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Dec 18, 2019 at 11:06:56PM -0800, Andrii Nakryiko wrote:
> > +     if (core_mode) {
> > +             printf("#if defined(__has_attribute) && __has_attribute(preserve_access_index)\n");
> > +             printf("#define __CLANG_BPF_CORE_SUPPORTED\n");
> > +             printf("#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)\n");
> > +             printf("#endif\n\n");
>
> I think it's dangerous to automatically opt-out when clang is not new enough.
> bpf prog will compile fine, but it will be missing co-re relocations.
> How about doing something like:
>   printf("#ifdef NEEDS_CO_RE\n");
>   printf("#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)\n");
>   printf("#endif\n\n");
> and emit it always when 'format c'.
> Then on the program side it will look:
> #define NEEDS_CO_RE
> #include "vmlinux.h"
> If clang is too old there will be a compile time error which is a good thing.
> Future features will have different NEEDS_ macros.

Wouldn't it be cleaner to separate vanilla C types dump vs
CO-RE-specific one? I'd prefer to have them separate and not require
every application to specify this #define NEEDS_CO_RE macro.
Furthermore, later we probably are going to add some additional
auto-generated types, definitions, etc, so plain C types dump and
CO-RE-specific one will deviate quite a bit. So it feels cleaner to
separate them now instead of polluting `format c` with irrelevant
noise.

I can unconditionally assume preserve_access_index availability,
though, because Clang 10 release is going to have all those features
needed for BPF CO-RE. I can also add nicer compiler error, if this
feature is not detected. Ok?

BTW, the reason I added this opt-out is because if you use
bpf_core_read() and BPF_CORE_READ() macros, you don't really need
those structs marked as relocatable. But again, I think it's fine to
just assume it has to be supported by compiler.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf
  2019-12-19 15:41   ` Daniel Borkmann
@ 2019-12-19 21:14     ` Andrii Nakryiko
  2019-12-19 22:04       ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2019-12-19 21:14 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov, Kernel Team

On Thu, Dec 19, 2019 at 7:41 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On Wed, Dec 18, 2019 at 11:06:57PM -0800, Andrii Nakryiko wrote:
> > Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches
> > its BCC-based counterpart 1-to-1, supporting all the same parameters and
> > functionality.
> >
> > runqslower tool utilizes BPF skeleton, auto-generated from BPF object file,
> > as well as memory-mapped interface to global (read-only, in this case) data.
> > Its makefile also ensures auto-generation of "relocatable" vmlinux.h, which is
> > necessary for BTF-typed raw tracepoints with direct memory access.
> >
> >   [0] https://github.com/iovisor/bcc/blob/11bf5d02c895df9646c117c713082eb192825293/tools/runqslower.py
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >  tools/lib/bpf/tools/runqslower/.gitignore     |   2 +
> >  tools/lib/bpf/tools/runqslower/Makefile       |  60 ++++++
> >  .../lib/bpf/tools/runqslower/runqslower.bpf.c | 101 ++++++++++
> >  tools/lib/bpf/tools/runqslower/runqslower.c   | 187 ++++++++++++++++++
> >  tools/lib/bpf/tools/runqslower/runqslower.h   |  13 ++
>
> tools/lib/bpf/tools/ is rather weird, please add to tools/bpf/ which is the
> more appropriate place we have for small tools. Could also live directly in
> there, e.g. tools/bpf/runqslower.{c,h,bpf.c} and then built/run from selftests,
> but under libbpf directly is too odd.

runqslower is as much as a showcase of how to build a stand-alone tool
with libbpf and CO-RE, as a separate tool, which is why I put it under
libbpf directory. It's also not really BPF-specific tool, wouldn't
that make it weird to put it under tools/bpf along the bpftool? If we
added few more such tools using BPF CO-RE + libbpf, would you feel
it's still a good idea to put them under tools/bpf?

I tried to follow BCC structure, which has tools/ subdirectory with
all kinds of examples/tools built with BCC (but not necessarily used
by BCC itself). This is the same idea here.

>
> Thanks,
> Daniel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf
  2019-12-19 18:13   ` Yonghong Song
@ 2019-12-19 21:16     ` Andrii Nakryiko
  0 siblings, 0 replies; 15+ messages in thread
From: Andrii Nakryiko @ 2019-12-19 21:16 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, bpf, netdev, Alexei Starovoitov, daniel, Kernel Team

On Thu, Dec 19, 2019 at 10:13 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 12/18/19 11:06 PM, Andrii Nakryiko wrote:
> > Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches
> > its BCC-based counterpart 1-to-1, supporting all the same parameters and
> > functionality.
> >
> > runqslower tool utilizes BPF skeleton, auto-generated from BPF object file,
> > as well as memory-mapped interface to global (read-only, in this case) data.
> > Its makefile also ensures auto-generation of "relocatable" vmlinux.h, which is
> > necessary for BTF-typed raw tracepoints with direct memory access.
> >
> >    [0] https://github.com/iovisor/bcc/blob/11bf5d02c895df9646c117c713082eb192825293/tools/runqslower.py
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >   tools/lib/bpf/tools/runqslower/.gitignore     |   2 +
> >   tools/lib/bpf/tools/runqslower/Makefile       |  60 ++++++
> >   .../lib/bpf/tools/runqslower/runqslower.bpf.c | 101 ++++++++++
> >   tools/lib/bpf/tools/runqslower/runqslower.c   | 187 ++++++++++++++++++
> >   tools/lib/bpf/tools/runqslower/runqslower.h   |  13 ++
> >   5 files changed, 363 insertions(+)
> >   create mode 100644 tools/lib/bpf/tools/runqslower/.gitignore
> >   create mode 100644 tools/lib/bpf/tools/runqslower/Makefile
> >   create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.bpf.c
> >   create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.c
> >   create mode 100644 tools/lib/bpf/tools/runqslower/runqslower.h
> >
> > diff --git a/tools/lib/bpf/tools/runqslower/.gitignore b/tools/lib/bpf/tools/runqslower/.gitignore
> > new file mode 100644
> > index 000000000000..404942cc9371
> > --- /dev/null
> > +++ b/tools/lib/bpf/tools/runqslower/.gitignore
> > @@ -0,0 +1,2 @@
> > +/.output
> > +/runqslower
> > diff --git a/tools/lib/bpf/tools/runqslower/Makefile b/tools/lib/bpf/tools/runqslower/Makefile
> > new file mode 100644
> > index 000000000000..b87b1f9fe9da
> > --- /dev/null
> > +++ b/tools/lib/bpf/tools/runqslower/Makefile
> > @@ -0,0 +1,60 @@
> > +# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
> > +CLANG := clang
> > +LLC := llc
> > +LLVM_STRIP := llvm-strip
> > +BPFTOOL := bpftool
>
> Maybe it is better to use in-tree bpftool? This will ensure we use the
> one shipped together with the source which should have needed functionality.

I can do that as well, though I tried to keep it close enough to how
libbpf+CO-RE applications are going to be built in the wild, so tried
to minimize amount of in-kernel source tree assumptions. Ideally both
bpftool is installed and libbpf is available as a dynamic library in
the system. But surely I can add bpftool auto-build and use that one.

>
> > +LIBBPF_SRC := ../..
> > +CFLAGS := -g -Wall
> > +
> > +# Try to detect best kernel BTF source
> > +KERNEL_REL := $(shell uname -r)
> > +ifneq ("$(wildcard /sys/kenerl/btf/vmlinux)","")
> > +VMLINUX_BTF := /sys/kernel/btf/vmlinux
> > +else ifneq ("$(wildcard /boot/vmlinux-$(KERNEL_REL))","")
> > +VMLINUX_BTF := /boot/vmlinux-$(KERNEL_REL)
> > +else
> > +$(error "Can't detect kernel BTF, use VMLINUX_BTF to specify it explicitly")
> > +endif
> > +
> > +out := .output
> > +abs_out := $(abspath $(out))
> > +libbpf_src := $(abspath $(LIBBPF_SRC))
> > +
> > +.DELETE_ON_ERROR:
> > +
> > +.PHONY: all
> > +all: runqslower
> > +
> > +.PHONY: clean
> > +clean:
> > +     rm -rf $(out) runqslower
> > +
> > +runqslower: $(out)/runqslower.o $(out)/libbpf.a
> > +     $(CC) $(CFLAGS) -lelf -lz $^ -o $@
> > +
> > +$(out)/vmlinux.h: $(VMLINUX_BTF) | $(out)
> > +     $(BPFTOOL) btf dump file $(VMLINUX_BTF) format core > $@
> > +
> > +$(out)/libbpf.a: | $(out)
> > +     cd $(out) &&                                                          \
> > +     $(MAKE) -C $(libbpf_src) OUTPUT=$(abs_out)/ $(abs_out)/libbpf.a
> > +
> > +$(out)/runqslower.o: runqslower.h $(out)/runqslower.skel.h                 \
> > +                  $(out)/runqslower.bpf.o
> > +
> > +$(out)/runqslower.bpf.o: $(out)/vmlinux.h runqslower.h
> > +
> > +$(out)/%.skel.h: $(out)/%.bpf.o
> > +     $(BPFTOOL) gen skeleton $< > $@
> > +
> > +$(out)/%.bpf.o: %.bpf.c | $(out)
> > +     $(CLANG) -g -O2 -target bpf -I$(out) -I$(LIBBPF_SRC)                  \
> > +              -c $(filter %.c,$^) -o $@ &&                                 \
> > +     $(LLVM_STRIP) -g $@
> > +
> > +$(out)/%.o: %.c | $(out)
> > +     $(CC) $(CFLAGS) -I$(LIBBPF_SRC) -I$(out) -c $(filter %.c,$^) -o $@
> > +
> > +$(out):
> > +     mkdir -p $(out)
> > +
> [...]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command
  2019-12-19 21:07     ` Andrii Nakryiko
@ 2019-12-19 22:04       ` Alexei Starovoitov
  2019-12-20 17:40         ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2019-12-19 22:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Thu, Dec 19, 2019 at 01:07:38PM -0800, Andrii Nakryiko wrote:
> On Thu, Dec 19, 2019 at 9:06 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Dec 18, 2019 at 11:06:56PM -0800, Andrii Nakryiko wrote:
> > > +     if (core_mode) {
> > > +             printf("#if defined(__has_attribute) && __has_attribute(preserve_access_index)\n");
> > > +             printf("#define __CLANG_BPF_CORE_SUPPORTED\n");
> > > +             printf("#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)\n");
> > > +             printf("#endif\n\n");
> >
> > I think it's dangerous to automatically opt-out when clang is not new enough.
> > bpf prog will compile fine, but it will be missing co-re relocations.
> > How about doing something like:
> >   printf("#ifdef NEEDS_CO_RE\n");
> >   printf("#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)\n");
> >   printf("#endif\n\n");
> > and emit it always when 'format c'.
> > Then on the program side it will look:
> > #define NEEDS_CO_RE
> > #include "vmlinux.h"
> > If clang is too old there will be a compile time error which is a good thing.
> > Future features will have different NEEDS_ macros.
> 
> Wouldn't it be cleaner to separate vanilla C types dump vs
> CO-RE-specific one? I'd prefer to have them separate and not require
> every application to specify this #define NEEDS_CO_RE macro.
> Furthermore, later we probably are going to add some additional
> auto-generated types, definitions, etc, so plain C types dump and
> CO-RE-specific one will deviate quite a bit. So it feels cleaner to
> separate them now instead of polluting `format c` with irrelevant
> noise.

Say we do this 'format core' today then tomorrow another tweak to vmlinux.h
would need 'format core2' ? I think adding new format to bpftool for every
little feature will be annoying to users. I think the output should stay as
'format c' and that format should be extensible/customizable by bpf progs via
#define NEEDS_FEATURE_X. Then these features can grow without a need to keep
adding new cmd line args. This preserve_access_index feature makes up for less
than 1% difference in generated vmlinux.h. If some feature extension would
drastically change generated .h then it would justify new 'format'. This one is
just a small tweak. Also #define NEEDS_CO_RE is probably too broad. I think
#define CLANG_NEEDS_TO_EMIT_RELO would be more precise and less ambiguous.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf
  2019-12-19 21:14     ` Andrii Nakryiko
@ 2019-12-19 22:04       ` Alexei Starovoitov
  0 siblings, 0 replies; 15+ messages in thread
From: Alexei Starovoitov @ 2019-12-19 22:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Daniel Borkmann, Andrii Nakryiko, bpf, Networking,
	Alexei Starovoitov, Kernel Team

On Thu, Dec 19, 2019 at 01:14:46PM -0800, Andrii Nakryiko wrote:
> On Thu, Dec 19, 2019 at 7:41 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On Wed, Dec 18, 2019 at 11:06:57PM -0800, Andrii Nakryiko wrote:
> > > Convert one of BCC tools (runqslower [0]) to BPF CO-RE + libbpf. It matches
> > > its BCC-based counterpart 1-to-1, supporting all the same parameters and
> > > functionality.
> > >
> > > runqslower tool utilizes BPF skeleton, auto-generated from BPF object file,
> > > as well as memory-mapped interface to global (read-only, in this case) data.
> > > Its makefile also ensures auto-generation of "relocatable" vmlinux.h, which is
> > > necessary for BTF-typed raw tracepoints with direct memory access.
> > >
> > >   [0] https://github.com/iovisor/bcc/blob/11bf5d02c895df9646c117c713082eb192825293/tools/runqslower.py
> > >
> > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > ---
> > >  tools/lib/bpf/tools/runqslower/.gitignore     |   2 +
> > >  tools/lib/bpf/tools/runqslower/Makefile       |  60 ++++++
> > >  .../lib/bpf/tools/runqslower/runqslower.bpf.c | 101 ++++++++++
> > >  tools/lib/bpf/tools/runqslower/runqslower.c   | 187 ++++++++++++++++++
> > >  tools/lib/bpf/tools/runqslower/runqslower.h   |  13 ++
> >
> > tools/lib/bpf/tools/ is rather weird, please add to tools/bpf/ which is the
> > more appropriate place we have for small tools. Could also live directly in
> > there, e.g. tools/bpf/runqslower.{c,h,bpf.c} and then built/run from selftests,
> > but under libbpf directly is too odd.
> 
> runqslower is as much as a showcase of how to build a stand-alone tool
> with libbpf and CO-RE, as a separate tool, which is why I put it under
> libbpf directory. It's also not really BPF-specific tool, wouldn't
> that make it weird to put it under tools/bpf along the bpftool? If we
> added few more such tools using BPF CO-RE + libbpf, would you feel
> it's still a good idea to put them under tools/bpf?

I agree with Daniel tools/bpf/ seems like better location.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command
  2019-12-19 22:04       ` Alexei Starovoitov
@ 2019-12-20 17:40         ` Andrii Nakryiko
  2019-12-21  3:21           ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Andrii Nakryiko @ 2019-12-20 17:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Thu, Dec 19, 2019 at 2:04 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Dec 19, 2019 at 01:07:38PM -0800, Andrii Nakryiko wrote:
> > On Thu, Dec 19, 2019 at 9:06 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Dec 18, 2019 at 11:06:56PM -0800, Andrii Nakryiko wrote:
> > > > +     if (core_mode) {
> > > > +             printf("#if defined(__has_attribute) && __has_attribute(preserve_access_index)\n");
> > > > +             printf("#define __CLANG_BPF_CORE_SUPPORTED\n");
> > > > +             printf("#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)\n");
> > > > +             printf("#endif\n\n");
> > >
> > > I think it's dangerous to automatically opt-out when clang is not new enough.
> > > bpf prog will compile fine, but it will be missing co-re relocations.
> > > How about doing something like:
> > >   printf("#ifdef NEEDS_CO_RE\n");
> > >   printf("#pragma clang attribute push (__attribute__((preserve_access_index)), apply_to = record)\n");
> > >   printf("#endif\n\n");
> > > and emit it always when 'format c'.
> > > Then on the program side it will look:
> > > #define NEEDS_CO_RE
> > > #include "vmlinux.h"
> > > If clang is too old there will be a compile time error which is a good thing.
> > > Future features will have different NEEDS_ macros.
> >
> > Wouldn't it be cleaner to separate vanilla C types dump vs
> > CO-RE-specific one? I'd prefer to have them separate and not require
> > every application to specify this #define NEEDS_CO_RE macro.
> > Furthermore, later we probably are going to add some additional
> > auto-generated types, definitions, etc, so plain C types dump and
> > CO-RE-specific one will deviate quite a bit. So it feels cleaner to
> > separate them now instead of polluting `format c` with irrelevant
> > noise.
>
> Say we do this 'format core' today then tomorrow another tweak to vmlinux.h
> would need 'format core2' ? I think adding new format to bpftool for every

No, not at all, it will stay within `format core`. If we need to do
some parameterized tweak to BPF CO-RE-targeted vmlinux.h, then we'll
have to add this parameter (even though I'd try to avoid
parameterizing it as much as possible, of course).


> little feature will be annoying to users. I think the output should stay as
> 'format c' and that format should be extensible/customizable by bpf progs via
> #define NEEDS_FEATURE_X. Then these features can grow without a need to keep
> adding new cmd line args. This preserve_access_index feature makes up for less
> than 1% difference in generated vmlinux.h. If some feature extension would
> drastically change generated .h then it would justify new 'format'. This one is
> just a small tweak. Also #define NEEDS_CO_RE is probably too broad. I think

This one is a small line-number-wise. But the big difference between
`format c` and `format core` is that the latter assumes we are dumping
*vmlinux's BTF* for use with *BPF CO-RE from BPF side*. `format c`
doesn't make any assumptions and faithfully dumps whatever BTF
information is provided, which can be some other BPF program, or just
any userspace program, on which pahole -J was executed.

This assumption is why I think we should separate those two formats.
For `format core` we can start auto-generating extra helper types,
similarly how BCC auto-generates them for tracepoint, for example.

Technically, sure, we can always guard everything behind extra
#ifdefs, but think about dumping BTF type info for your small BPF
program, and instead of seeing clean dump of types, you see all those
crazy #ifdefs and weird #pragma clang's, extra attributes and so on.
Not a great user experience for sure.


> #define CLANG_NEEDS_TO_EMIT_RELO would be more precise and less ambiguous.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command
  2019-12-20 17:40         ` Andrii Nakryiko
@ 2019-12-21  3:21           ` Alexei Starovoitov
  2019-12-21  5:40             ` Andrii Nakryiko
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2019-12-21  3:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Fri, Dec 20, 2019 at 09:40:31AM -0800, Andrii Nakryiko wrote:
> 
> This one is a small line-number-wise. But the big difference between
> `format c` and `format core` is that the latter assumes we are dumping
> *vmlinux's BTF* for use with *BPF CO-RE from BPF side*. `format c`
> doesn't make any assumptions and faithfully dumps whatever BTF
> information is provided, which can be some other BPF program, or just
> any userspace program, on which pahole -J was executed.

When 'format c' was introduced it was specifically targeting CO-RE framework.
It is useful with BPF_CORE_READ macro and with builtin_preserve_access_index.
Then we realized that both macro and builtin(({ ... })) are quite cumbersome to
use and came with new clang attribute((preserve_access_index)) which makes
programs read like normal C without any extra gotchas. Obviously it's nice if
vmlinux.h already contains this attribute. Hence the need to either add extra
flag to bpftool to emit this attribute or just emit it by default. So
introducing new 'format core' (which is 99% the same as 'format c') and
deprecating 'format c' to 'this is just .h dump of BTF' when it was around for
few month only is absolutely no go. You need to find a way to extend 'format c'
without breaking existing users. Yes. Likely there are no such users, but that
doesn't matter. Once new api is introduced we have to stick to it. 'format c'
is such api.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command
  2019-12-21  3:21           ` Alexei Starovoitov
@ 2019-12-21  5:40             ` Andrii Nakryiko
  0 siblings, 0 replies; 15+ messages in thread
From: Andrii Nakryiko @ 2019-12-21  5:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Fri, Dec 20, 2019 at 7:21 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Dec 20, 2019 at 09:40:31AM -0800, Andrii Nakryiko wrote:
> >
> > This one is a small line-number-wise. But the big difference between
> > `format c` and `format core` is that the latter assumes we are dumping
> > *vmlinux's BTF* for use with *BPF CO-RE from BPF side*. `format c`
> > doesn't make any assumptions and faithfully dumps whatever BTF
> > information is provided, which can be some other BPF program, or just
> > any userspace program, on which pahole -J was executed.
>
> When 'format c' was introduced it was specifically targeting CO-RE framework.

No it wasn't. Here's "motivational" part of BTF-to-C dump API patch set:

  "This patch set adds BTF-to-C dumping APIs to libbpf, allowing to output
  a subset of BTF types as a compilable C type definitions. This is useful by
  itself, as raw BTF output is not easy to inspect and comprehend. But it's also
  a big part of BPF CO-RE (compile once - run everywhere) initiative aimed at
  allowing to write relocatable BPF programs, that won't require on-the-host
  kernel headers (and would be able to inspect internal kernel structures, not
  exposed through kernel headers)."

And here's `format c` patch commit message:

  "Utilize new libbpf's btf_dump API to emit BTF as a C definitions."

It was never **just** for CO-RE framework, rather as a convenient
C-syntax view of BTF types.

> It is useful with BPF_CORE_READ macro and with builtin_preserve_access_index.
> Then we realized that both macro and builtin(({ ... })) are quite cumbersome to
> use and came with new clang attribute((preserve_access_index)) which makes
> programs read like normal C without any extra gotchas. Obviously it's nice if
> vmlinux.h already contains this attribute. Hence the need to either add extra
> flag to bpftool to emit this attribute or just emit it by default. So
> introducing new 'format core' (which is 99% the same as 'format c') and
> deprecating 'format c' to 'this is just .h dump of BTF' when it was around for
> few month only is absolutely no go. You need to find a way to extend 'format c'

I found the way, that's not the point of this discussion and
absolutely **not why** I'm adding `format core`. I feel like having
plain C dump of BTF is useful by itself (at least as a diagnostics
tool for BTF, similarly to objdump/readelf for ELF). But if no one
else cares, sure, I'll just reuse `format c` as CO-RE-specific BTF
dumper.

> without breaking existing users. Yes. Likely there are no such users, but that
> doesn't matter. Once new api is introduced we have to stick to it. 'format c'
> is such api.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-12-21  5:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-19  7:06 [PATCH bpf-next 0/3] Implement runqslower BCC tool with BPF CO-RE Andrii Nakryiko
2019-12-19  7:06 ` [PATCH bpf-next 1/3] bpftool: add extra CO-RE mode to btf dump command Andrii Nakryiko
2019-12-19 17:06   ` Alexei Starovoitov
2019-12-19 21:07     ` Andrii Nakryiko
2019-12-19 22:04       ` Alexei Starovoitov
2019-12-20 17:40         ` Andrii Nakryiko
2019-12-21  3:21           ` Alexei Starovoitov
2019-12-21  5:40             ` Andrii Nakryiko
2019-12-19  7:06 ` [PATCH bpf-next 2/3] libbpf/tools: add runqslower tool to libbpf Andrii Nakryiko
2019-12-19 15:41   ` Daniel Borkmann
2019-12-19 21:14     ` Andrii Nakryiko
2019-12-19 22:04       ` Alexei Starovoitov
2019-12-19 18:13   ` Yonghong Song
2019-12-19 21:16     ` Andrii Nakryiko
2019-12-19  7:06 ` [PATCH bpf-next 3/3] selftests/bpf: build runqslower from selftests Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).