[PATCH bpf-next 0/6] bpf: Support multi-attach for freplace programs

bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH bpf-next 0/6] bpf: Support multi-attach for freplace programs
@ 2020-07-13 20:12 Toke Høiland-Jørgensen
  2020-07-13 20:12 ` [PATCH bpf-next 1/6] bpf: change logging calls from verbose() to bpf_log() and use log pointer Toke Høiland-Jørgensen
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-13 20:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf

This series adds support attaching freplace BPF programs to multiple targets.
This is needed to support incremental attachment of multiple XDP programs using
the libxdp dispatcher model.

The first two patches are refactoring patches: The first one is a trivial change
to the logging in the verifier, split out to make the subsequent refactor easier
to read. Patch 2 refactors check_attach_btf_id() so that the checks on program
and target compatibility can be reused when attaching to a secondary location.

Patch 3 contains the change that actually allows attaching freplace programs in
multiple places. Patches 4-6 are the usual tools header updates, libbpf support
and selftest.

See the individual patches for details.

---

Toke Høiland-Jørgensen (6):
      bpf: change logging calls from verbose() to bpf_log() and use log pointer
      bpf: verifier: refactor check_attach_btf_id()
      bpf: support attaching freplace programs to multiple attach points
      tools: add new members to bpf_attr.raw_tracepoint in bpf.h
      libbpf: add support for supplying target to bpf_raw_tracepoint_open()
      selftests: add test for multiple attachments of freplace program


 include/linux/bpf.h                           |  23 ++-
 include/linux/bpf_verifier.h                  |   9 +
 include/uapi/linux/bpf.h                      |   9 +-
 kernel/bpf/core.c                             |   1 -
 kernel/bpf/syscall.c                          |  96 +++++++++-
 kernel/bpf/trampoline.c                       |  36 +++-
 kernel/bpf/verifier.c                         | 169 +++++++++--------
 tools/include/uapi/linux/bpf.h                |   9 +-
 tools/lib/bpf/bpf.c                           |  33 +++-
 tools/lib/bpf/bpf.h                           |  12 ++
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/fexit_bpf2bpf.c  | 174 +++++++++++++++---
 .../bpf/progs/freplace_get_constant.c         |  15 ++
 13 files changed, 460 insertions(+), 127 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/freplace_get_constant.c


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 1/6] bpf: change logging calls from verbose() to bpf_log() and use log pointer
  2020-07-13 20:12 [PATCH bpf-next 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
@ 2020-07-13 20:12 ` Toke Høiland-Jørgensen
  2020-07-13 20:12 ` [PATCH bpf-next 2/6] bpf: verifier: refactor check_attach_btf_id() Toke Høiland-Jørgensen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-13 20:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

In preparation for moving code around, change a bunch of references to
env->log (and the verbose() logging helper) to use bpf_log() and a direct
pointer to struct bpf_verifier_log. While we're touching the function
signature, mark the 'prog' argument to bpf_check_type_match() as const.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/bpf.h   |    2 +-
 kernel/bpf/btf.c      |    6 +++---
 kernel/bpf/verifier.c |   46 +++++++++++++++++++++++-----------------------
 3 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0cd7f6884c5c..c547065387e4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1314,7 +1314,7 @@ int btf_check_func_arg_match(struct bpf_verifier_env *env, int subprog,
 			     struct bpf_reg_state *regs);
 int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog,
 			  struct bpf_reg_state *reg);
-int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog,
+int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
 			 struct btf *btf, const struct btf_type *t);
 
 struct bpf_prog *bpf_prog_by_id(u32 id);
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 4c3007f428b1..ed60fa54378d 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -4366,7 +4366,7 @@ static int btf_check_func_type_match(struct bpf_verifier_log *log,
 }
 
 /* Compare BTFs of given program with BTF of target program */
-int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog,
+int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog,
 			 struct btf *btf2, const struct btf_type *t2)
 {
 	struct btf *btf1 = prog->aux->btf;
@@ -4374,7 +4374,7 @@ int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog,
 	u32 btf_id = 0;
 
 	if (!prog->aux->func_info) {
-		bpf_log(&env->log, "Program extension requires BTF\n");
+		bpf_log(log, "Program extension requires BTF\n");
 		return -EINVAL;
 	}
 
@@ -4386,7 +4386,7 @@ int btf_check_type_match(struct bpf_verifier_env *env, struct bpf_prog *prog,
 	if (!t1 || !btf_type_is_func(t1))
 		return -EFAULT;
 
-	return btf_check_func_type_match(&env->log, btf1, t1, btf2, t2);
+	return btf_check_func_type_match(log, btf1, t1, btf2, t2);
 }
 
 /* Compare BTF of a function with given bpf_reg_state.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b608185e1ffd..0db338567cf9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10729,6 +10729,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	struct bpf_prog *prog = env->prog;
 	bool prog_extension = prog->type == BPF_PROG_TYPE_EXT;
 	struct bpf_prog *tgt_prog = prog->aux->linked_prog;
+	struct bpf_verifier_log *log = &env->log;
 	u32 btf_id = prog->aux->attach_btf_id;
 	const char prefix[] = "btf_trace_";
 	struct btf_func_model fmodel;
@@ -10750,23 +10751,23 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		return 0;
 
 	if (!btf_id) {
-		verbose(env, "Tracing programs must provide btf_id\n");
+		bpf_log(log, "Tracing programs must provide btf_id\n");
 		return -EINVAL;
 	}
 	btf = bpf_prog_get_target_btf(prog);
 	if (!btf) {
-		verbose(env,
+		bpf_log(log,
 			"FENTRY/FEXIT program can only be attached to another program annotated with BTF\n");
 		return -EINVAL;
 	}
 	t = btf_type_by_id(btf, btf_id);
 	if (!t) {
-		verbose(env, "attach_btf_id %u is invalid\n", btf_id);
+		bpf_log(log, "attach_btf_id %u is invalid\n", btf_id);
 		return -EINVAL;
 	}
 	tname = btf_name_by_offset(btf, t->name_off);
 	if (!tname) {
-		verbose(env, "attach_btf_id %u doesn't have a name\n", btf_id);
+		bpf_log(log, "attach_btf_id %u doesn't have a name\n", btf_id);
 		return -EINVAL;
 	}
 	if (tgt_prog) {
@@ -10778,18 +10779,18 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 				break;
 			}
 		if (subprog == -1) {
-			verbose(env, "Subprog %s doesn't exist\n", tname);
+			bpf_log(log, "Subprog %s doesn't exist\n", tname);
 			return -EINVAL;
 		}
 		conservative = aux->func_info_aux[subprog].unreliable;
 		if (prog_extension) {
 			if (conservative) {
-				verbose(env,
+				bpf_log(log,
 					"Cannot replace static functions\n");
 				return -EINVAL;
 			}
 			if (!prog->jit_requested) {
-				verbose(env,
+				bpf_log(log,
 					"Extension programs should be JITed\n");
 				return -EINVAL;
 			}
@@ -10797,7 +10798,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 			prog->expected_attach_type = tgt_prog->expected_attach_type;
 		}
 		if (!tgt_prog->jited) {
-			verbose(env, "Can attach to only JITed progs\n");
+			bpf_log(log, "Can attach to only JITed progs\n");
 			return -EINVAL;
 		}
 		if (tgt_prog->type == prog->type) {
@@ -10805,7 +10806,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 			 * Cannot attach program extension to another extension.
 			 * It's ok to attach fentry/fexit to extension program.
 			 */
-			verbose(env, "Cannot recursively attach\n");
+			bpf_log(log, "Cannot recursively attach\n");
 			return -EINVAL;
 		}
 		if (tgt_prog->type == BPF_PROG_TYPE_TRACING &&
@@ -10827,13 +10828,13 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 			 * reasonable stack size. Hence extending fentry is not
 			 * allowed.
 			 */
-			verbose(env, "Cannot extend fentry/fexit\n");
+			bpf_log(log, "Cannot extend fentry/fexit\n");
 			return -EINVAL;
 		}
 		key = ((u64)aux->id) << 32 | btf_id;
 	} else {
 		if (prog_extension) {
-			verbose(env, "Cannot replace kernel functions\n");
+			bpf_log(log, "Cannot replace kernel functions\n");
 			return -EINVAL;
 		}
 		key = btf_id;
@@ -10842,17 +10843,17 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	switch (prog->expected_attach_type) {
 	case BPF_TRACE_RAW_TP:
 		if (tgt_prog) {
-			verbose(env,
+			bpf_log(log,
 				"Only FENTRY/FEXIT progs are attachable to another BPF prog\n");
 			return -EINVAL;
 		}
 		if (!btf_type_is_typedef(t)) {
-			verbose(env, "attach_btf_id %u is not a typedef\n",
+			bpf_log(log, "attach_btf_id %u is not a typedef\n",
 				btf_id);
 			return -EINVAL;
 		}
 		if (strncmp(prefix, tname, sizeof(prefix) - 1)) {
-			verbose(env, "attach_btf_id %u points to wrong type name %s\n",
+			bpf_log(log, "attach_btf_id %u points to wrong type name %s\n",
 				btf_id, tname);
 			return -EINVAL;
 		}
@@ -10875,7 +10876,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		return 0;
 	case BPF_TRACE_ITER:
 		if (!btf_type_is_func(t)) {
-			verbose(env, "attach_btf_id %u is not a function\n",
+			bpf_log(log, "attach_btf_id %u is not a function\n",
 				btf_id);
 			return -EINVAL;
 		}
@@ -10886,8 +10887,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		prog->aux->attach_func_proto = t;
 		if (!bpf_iter_prog_supported(prog))
 			return -EINVAL;
-		ret = btf_distill_func_proto(&env->log, btf, t,
-					     tname, &fmodel);
+		ret = btf_distill_func_proto(log, btf, t, tname, &fmodel);
 		return ret;
 	default:
 		if (!prog_extension)
@@ -10899,18 +10899,18 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	case BPF_TRACE_FEXIT:
 		prog->aux->attach_func_name = tname;
 		if (prog->type == BPF_PROG_TYPE_LSM) {
-			ret = bpf_lsm_verify_prog(&env->log, prog);
+			ret = bpf_lsm_verify_prog(log, prog);
 			if (ret < 0)
 				return ret;
 		}
 
 		if (!btf_type_is_func(t)) {
-			verbose(env, "attach_btf_id %u is not a function\n",
+			bpf_log(log, "attach_btf_id %u is not a function\n",
 				btf_id);
 			return -EINVAL;
 		}
 		if (prog_extension &&
-		    btf_check_type_match(env, prog, btf, t))
+		    btf_check_type_match(log, prog, btf, t))
 			return -EINVAL;
 		t = btf_type_by_id(btf, t->type);
 		if (!btf_type_is_func_proto(t))
@@ -10929,7 +10929,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 			prog->aux->attach_func_proto = NULL;
 			t = NULL;
 		}
-		ret = btf_distill_func_proto(&env->log, btf, t,
+		ret = btf_distill_func_proto(log, btf, t,
 					     tname, &tr->func.model);
 		if (ret < 0)
 			goto out;
@@ -10941,7 +10941,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		} else {
 			addr = kallsyms_lookup_name(tname);
 			if (!addr) {
-				verbose(env,
+				bpf_log(log,
 					"The address of function %s cannot be found\n",
 					tname);
 				ret = -ENOENT;
@@ -10952,7 +10952,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		if (prog->expected_attach_type == BPF_MODIFY_RETURN) {
 			ret = check_attach_modify_return(prog, addr);
 			if (ret)
-				verbose(env, "%s() is not modifiable\n",
+				bpf_log(log, "%s() is not modifiable\n",
 					prog->aux->attach_func_name);
 		}
 


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 2/6] bpf: verifier: refactor check_attach_btf_id()
  2020-07-13 20:12 [PATCH bpf-next 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
  2020-07-13 20:12 ` [PATCH bpf-next 1/6] bpf: change logging calls from verbose() to bpf_log() and use log pointer Toke Høiland-Jørgensen
@ 2020-07-13 20:12 ` Toke Høiland-Jørgensen
  2020-07-13 20:12 ` [PATCH bpf-next 3/6] bpf: support attaching freplace programs to multiple attach points Toke Høiland-Jørgensen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-13 20:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

The check_attach_btf_id() function really does three things:

1. It performs a bunch of checks on the program to ensure that the
   attachment is valid.

2. It stores a bunch of state about the attachment being requested in
   the verifier environment and struct bpf_prog objects.

3. It allocates a trampoline for the attachment.

This patch splits out (1.) and (3.) into separate functions in preparation
for reusing them when the actual attachment is happening (in the
raw_tracepoint_open syscall operation), which will allow tracing programs
to have multiple (compatible) attachments.

No functional change is intended with this patch.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/bpf.h          |    9 ++
 include/linux/bpf_verifier.h |    9 ++
 kernel/bpf/trampoline.c      |   22 ++++++
 kernel/bpf/verifier.c        |  165 ++++++++++++++++++++++++------------------
 4 files changed, 133 insertions(+), 72 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c547065387e4..67310595f720 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -570,6 +570,9 @@ static __always_inline unsigned int bpf_dispatcher_nop_func(
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
 int bpf_trampoline_link_prog(struct bpf_prog *prog);
 int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
+int bpf_trampoline_get(u64 key, void *addr,
+		       struct btf_func_model *fmodel,
+		       struct bpf_trampoline **trampoline);
 void bpf_trampoline_put(struct bpf_trampoline *tr);
 #define BPF_DISPATCHER_INIT(_name) {				\
 	.mutex = __MUTEX_INITIALIZER(_name.mutex),		\
@@ -626,6 +629,12 @@ static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
 {
 	return -ENOTSUPP;
 }
+static inline int bpf_trampoline_get(u64 key, void *addr,
+				     struct btf_func_model *fmodel,
+				     struct bpf_trampoline **trampoline)
+{
+	return -EOPNOTSUPP;
+}
 static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
 #define DEFINE_BPF_DISPATCHER(name)
 #define DECLARE_BPF_DISPATCHER(name)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 53c7bd568c5d..b882185257cc 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -446,4 +446,13 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt);
 int check_ctx_reg(struct bpf_verifier_env *env,
 		  const struct bpf_reg_state *reg, int regno);
 
+int bpf_check_attach_target(struct bpf_verifier_log *log,
+			    const struct bpf_prog *prog,
+			    const struct bpf_prog *tgt_prog,
+			    u32 btf_id,
+			    struct btf_func_model *fmodel,
+			    long *tgt_addr,
+			    const char **tgt_name,
+			    const struct btf_type **tgt_type);
+
 #endif /* _LINUX_BPF_VERIFIER_H */
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 9be85aa4ec5f..fadfa330f728 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -331,6 +331,28 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
 	return err;
 }
 
+int bpf_trampoline_get(u64 key, void *addr,
+		       struct btf_func_model *fmodel,
+		       struct bpf_trampoline **trampoline)
+{
+	struct bpf_trampoline *tr;
+
+	tr = bpf_trampoline_lookup(key);
+	if (!tr)
+		return -ENOMEM;
+
+	mutex_lock(&tr->mutex);
+	if (tr->func.addr)
+		goto out;
+
+	memcpy(&tr->func.model, fmodel, sizeof(*fmodel));
+	tr->func.addr = addr;
+out:
+	mutex_unlock(&tr->mutex);
+	*trampoline = tr;
+	return 0;
+}
+
 void bpf_trampoline_put(struct bpf_trampoline *tr)
 {
 	if (!tr)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0db338567cf9..ff3c5c53982c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10724,31 +10724,23 @@ static int check_attach_modify_return(struct bpf_prog *prog, unsigned long addr)
 	return -EINVAL;
 }
 
-static int check_attach_btf_id(struct bpf_verifier_env *env)
+int bpf_check_attach_target(struct bpf_verifier_log *log,
+			    const struct bpf_prog *prog,
+			    const struct bpf_prog *tgt_prog,
+			    u32 btf_id,
+			    struct btf_func_model *fmodel,
+			    long *tgt_addr,
+			    const char **tgt_name,
+			    const struct btf_type **tgt_type)
 {
-	struct bpf_prog *prog = env->prog;
 	bool prog_extension = prog->type == BPF_PROG_TYPE_EXT;
-	struct bpf_prog *tgt_prog = prog->aux->linked_prog;
-	struct bpf_verifier_log *log = &env->log;
-	u32 btf_id = prog->aux->attach_btf_id;
 	const char prefix[] = "btf_trace_";
-	struct btf_func_model fmodel;
 	int ret = 0, subprog = -1, i;
-	struct bpf_trampoline *tr;
 	const struct btf_type *t;
 	bool conservative = true;
 	const char *tname;
 	struct btf *btf;
 	long addr;
-	u64 key;
-
-	if (prog->type == BPF_PROG_TYPE_STRUCT_OPS)
-		return check_struct_ops_btf_id(env);
-
-	if (prog->type != BPF_PROG_TYPE_TRACING &&
-	    prog->type != BPF_PROG_TYPE_LSM &&
-	    !prog_extension)
-		return 0;
 
 	if (!btf_id) {
 		bpf_log(log, "Tracing programs must provide btf_id\n");
@@ -10794,8 +10786,6 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 					"Extension programs should be JITed\n");
 				return -EINVAL;
 			}
-			env->ops = bpf_verifier_ops[tgt_prog->type];
-			prog->expected_attach_type = tgt_prog->expected_attach_type;
 		}
 		if (!tgt_prog->jited) {
 			bpf_log(log, "Can attach to only JITed progs\n");
@@ -10831,13 +10821,11 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 			bpf_log(log, "Cannot extend fentry/fexit\n");
 			return -EINVAL;
 		}
-		key = ((u64)aux->id) << 32 | btf_id;
 	} else {
 		if (prog_extension) {
 			bpf_log(log, "Cannot replace kernel functions\n");
 			return -EINVAL;
 		}
-		key = btf_id;
 	}
 
 	switch (prog->expected_attach_type) {
@@ -10867,13 +10855,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 			/* should never happen in valid vmlinux build */
 			return -EINVAL;
 
-		/* remember two read only pointers that are valid for
-		 * the life time of the kernel
-		 */
-		prog->aux->attach_func_name = tname;
-		prog->aux->attach_func_proto = t;
-		prog->aux->attach_btf_trace = true;
-		return 0;
+		break;
 	case BPF_TRACE_ITER:
 		if (!btf_type_is_func(t)) {
 			bpf_log(log, "attach_btf_id %u is not a function\n",
@@ -10883,12 +10865,10 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		t = btf_type_by_id(btf, t->type);
 		if (!btf_type_is_func_proto(t))
 			return -EINVAL;
-		prog->aux->attach_func_name = tname;
-		prog->aux->attach_func_proto = t;
-		if (!bpf_iter_prog_supported(prog))
-			return -EINVAL;
-		ret = btf_distill_func_proto(log, btf, t, tname, &fmodel);
-		return ret;
+		ret = btf_distill_func_proto(log, btf, t, tname, fmodel);
+		if (ret)
+			return ret;
+		break;
 	default:
 		if (!prog_extension)
 			return -EINVAL;
@@ -10897,13 +10877,6 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	case BPF_LSM_MAC:
 	case BPF_TRACE_FENTRY:
 	case BPF_TRACE_FEXIT:
-		prog->aux->attach_func_name = tname;
-		if (prog->type == BPF_PROG_TYPE_LSM) {
-			ret = bpf_lsm_verify_prog(log, prog);
-			if (ret < 0)
-				return ret;
-		}
-
 		if (!btf_type_is_func(t)) {
 			bpf_log(log, "attach_btf_id %u is not a function\n",
 				btf_id);
@@ -10915,24 +10888,14 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		t = btf_type_by_id(btf, t->type);
 		if (!btf_type_is_func_proto(t))
 			return -EINVAL;
-		tr = bpf_trampoline_lookup(key);
-		if (!tr)
-			return -ENOMEM;
-		/* t is either vmlinux type or another program's type */
-		prog->aux->attach_func_proto = t;
-		mutex_lock(&tr->mutex);
-		if (tr->func.addr) {
-			prog->aux->trampoline = tr;
-			goto out;
-		}
-		if (tgt_prog && conservative) {
-			prog->aux->attach_func_proto = NULL;
+
+		if (tgt_prog && conservative)
 			t = NULL;
-		}
-		ret = btf_distill_func_proto(log, btf, t,
-					     tname, &tr->func.model);
+
+		ret = btf_distill_func_proto(log, btf, t, tname, fmodel);
 		if (ret < 0)
-			goto out;
+			return ret;
+
 		if (tgt_prog) {
 			if (subprog == 0)
 				addr = (long) tgt_prog->bpf_func;
@@ -10944,27 +10907,85 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 				bpf_log(log,
 					"The address of function %s cannot be found\n",
 					tname);
-				ret = -ENOENT;
-				goto out;
+				return -ENOENT;
 			}
 		}
+		break;
+	}
 
-		if (prog->expected_attach_type == BPF_MODIFY_RETURN) {
-			ret = check_attach_modify_return(prog, addr);
-			if (ret)
-				bpf_log(log, "%s() is not modifiable\n",
-					prog->aux->attach_func_name);
-		}
+	*tgt_addr = addr;
+	if (tgt_name)
+		*tgt_name = tname;
+	if (tgt_type)
+		*tgt_type = t;
+	return 0;
+}
 
-		if (ret)
-			goto out;
-		tr->func.addr = (void *)addr;
-		prog->aux->trampoline = tr;
-out:
-		mutex_unlock(&tr->mutex);
-		if (ret)
-			bpf_trampoline_put(tr);
+static int check_attach_btf_id(struct bpf_verifier_env *env)
+{
+	struct bpf_prog *prog = env->prog;
+	struct bpf_prog *tgt_prog = prog->aux->linked_prog;
+	u32 btf_id = prog->aux->attach_btf_id;
+	struct btf_func_model fmodel;
+	const struct btf_type *t;
+	const char *tname;
+	long addr;
+	int ret;
+	u64 key;
+
+	if (prog->type == BPF_PROG_TYPE_STRUCT_OPS)
+		return check_struct_ops_btf_id(env);
+
+	if (prog->type != BPF_PROG_TYPE_TRACING &&
+	    prog->type != BPF_PROG_TYPE_LSM &&
+	    prog->type != BPF_PROG_TYPE_EXT)
+		return 0;
+
+	ret = bpf_check_attach_target(&env->log, prog, tgt_prog, btf_id,
+				      &fmodel, &addr, &tname, &t);
+	if (ret)
 		return ret;
+
+	if (tgt_prog) {
+		if (prog->type == BPF_PROG_TYPE_EXT) {
+			env->ops = bpf_verifier_ops[tgt_prog->type];
+			prog->expected_attach_type = tgt_prog->expected_attach_type;
+		}
+		key = ((u64)tgt_prog->aux->id) << 32 | btf_id;
+	} else {
+		key = btf_id;
+	}
+
+	prog->aux->attach_func_proto = t;
+	prog->aux->attach_func_name = tname;
+
+	switch (prog->expected_attach_type) {
+	case BPF_TRACE_RAW_TP:
+		/* remember two read only pointers that are valid for
+		 * the life time of the kernel
+		 */
+		prog->aux->attach_btf_trace = true;
+		return 0;
+	case BPF_TRACE_ITER:
+		if (!bpf_iter_prog_supported(prog))
+			return -EINVAL;
+		return 0;
+	case BPF_MODIFY_RETURN:
+		ret = check_attach_modify_return(prog, addr);
+		if (ret) {
+			verbose(env, "%s() is not modifiable\n",
+				prog->aux->attach_func_name);
+			return ret;
+		}
+		fallthrough;
+	default:
+		if (prog->type == BPF_PROG_TYPE_LSM) {
+			ret = bpf_lsm_verify_prog(&env->log, prog);
+			if (ret < 0)
+				return ret;
+		}
+		return bpf_trampoline_get(key, (void *)addr, &fmodel,
+					  &prog->aux->trampoline);
 	}
 }
 


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 3/6] bpf: support attaching freplace programs to multiple attach points
  2020-07-13 20:12 [PATCH bpf-next 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
  2020-07-13 20:12 ` [PATCH bpf-next 1/6] bpf: change logging calls from verbose() to bpf_log() and use log pointer Toke Høiland-Jørgensen
  2020-07-13 20:12 ` [PATCH bpf-next 2/6] bpf: verifier: refactor check_attach_btf_id() Toke Høiland-Jørgensen
@ 2020-07-13 20:12 ` Toke Høiland-Jørgensen
  2020-07-13 23:16   ` kernel test robot
  2020-07-13 20:12 ` [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h Toke Høiland-Jørgensen
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-13 20:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This enables support for attaching freplace programs to multiple attach
points. It does this by amending UAPI for bpf_raw_tracepoint_open with a
target prog fd and btf ID pair that can be used to supply the new
attachment point. The target must be compatible with the target that was
supplied at program load time.

Since the checks for the attach target compatibility will output debug
information about why the attachment was rejected to the verifier log, also
add support for supplying a log buffer to bpf_raw_tracepoint_open.

The implementation reuses the checks that were factored out of
check_attach_btf_id() in the previous patch. It also moves the BPF
trampoline out of prog->aux and into struct bpf_tracking_link where it is
managed as part of the bpf_link structure. When a new target is specified,
a reference to the target program is also kept in the bpf_link structure
which removes the need to keep references to all attach targets in the
bpf_prog structure itself.

The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, the bpf_tracing_prog_attach() function will reject new targets for
anything other than PROG_TYPE_EXT programs.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/bpf.h      |   14 ++++---
 include/uapi/linux/bpf.h |    9 +++-
 kernel/bpf/core.c        |    1 
 kernel/bpf/syscall.c     |   96 +++++++++++++++++++++++++++++++++++++++++++---
 kernel/bpf/trampoline.c  |   14 +++----
 kernel/bpf/verifier.c    |   16 +++++---
 6 files changed, 121 insertions(+), 29 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 67310595f720..3d43778a1d05 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -568,8 +568,8 @@ static __always_inline unsigned int bpf_dispatcher_nop_func(
 }
 #ifdef CONFIG_BPF_JIT
 struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
-int bpf_trampoline_link_prog(struct bpf_prog *prog);
-int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
+int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr);
+int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr);
 int bpf_trampoline_get(u64 key, void *addr,
 		       struct btf_func_model *fmodel,
 		       struct bpf_trampoline **trampoline);
@@ -621,11 +621,13 @@ static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
 {
 	return NULL;
 }
-static inline int bpf_trampoline_link_prog(struct bpf_prog *prog)
+static inline int bpf_trampoline_link_prog(struct bpf_prog *prog,
+					   struct bpf_trampoline *tr)
 {
 	return -ENOTSUPP;
 }
-static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
+static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog,
+					     struct bpf_trampoline *tr)
 {
 	return -ENOTSUPP;
 }
@@ -697,10 +699,12 @@ struct bpf_prog_aux {
 	bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
 	bool func_proto_unreliable;
 	enum bpf_tramp_prog_type trampoline_prog_type;
-	struct bpf_trampoline *trampoline;
 	struct hlist_node tramp_hlist;
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
+	/* target BPF prog types for trace programs */
+	enum bpf_prog_type tgt_prog_type;
+	enum bpf_attach_type tgt_attach_type;
 	/* function name for valid attach_btf_id */
 	const char *attach_func_name;
 	struct bpf_prog **func;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index da9bf35a26f8..662a15e4a1a1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -573,8 +573,13 @@ union bpf_attr {
 	} query;
 
 	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
-		__u64 name;
-		__u32 prog_fd;
+		__u64		name;
+		__u32		prog_fd;
+		__u32		log_level;	/* verbosity level of log */
+		__u32		log_size;	/* size of user buffer */
+		__aligned_u64	log_buf;	/* user supplied buffer */
+		__u32		tgt_prog_fd;
+		__u32		tgt_btf_id;
 	} raw_tracepoint;
 
 	struct { /* anonymous struct for BPF_BTF_LOAD */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9df4cc9a2907..ed4d7259316a 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2087,7 +2087,6 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 	if (aux->prog->has_callchain_buf)
 		put_callchain_buffers();
 #endif
-	bpf_trampoline_put(aux->trampoline);
 	for (i = 0; i < aux->func_cnt; i++)
 		bpf_jit_free(aux->func[i]);
 	if (aux->func_cnt) {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8da159936bab..eec6dc0f0a54 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4,6 +4,7 @@
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include <linux/bpf_lirc.h>
+#include <linux/bpf_verifier.h>
 #include <linux/btf.h>
 #include <linux/syscalls.h>
 #include <linux/slab.h>
@@ -2483,11 +2484,21 @@ struct bpf_link *bpf_link_get_from_fd(u32 ufd)
 struct bpf_tracing_link {
 	struct bpf_link link;
 	enum bpf_attach_type attach_type;
+	struct bpf_trampoline *trampoline;
+	struct bpf_prog *tgt_prog;
 };
 
 static void bpf_tracing_link_release(struct bpf_link *link)
 {
-	WARN_ON_ONCE(bpf_trampoline_unlink_prog(link->prog));
+	struct bpf_tracing_link *tr_link =
+		container_of(link, struct bpf_tracing_link, link);
+
+	WARN_ON_ONCE(bpf_trampoline_unlink_prog(link->prog,
+						tr_link->trampoline));
+
+	bpf_trampoline_put(tr_link->trampoline);
+	if (tr_link->tgt_prog)
+		bpf_prog_put(tr_link->tgt_prog);
 }
 
 static void bpf_tracing_link_dealloc(struct bpf_link *link)
@@ -2527,10 +2538,18 @@ static const struct bpf_link_ops bpf_tracing_link_lops = {
 	.fill_link_info = bpf_tracing_link_fill_link_info,
 };
 
-static int bpf_tracing_prog_attach(struct bpf_prog *prog)
+static int bpf_tracing_prog_attach(struct bpf_prog *prog,
+				   int tgt_prog_fd,
+				   u32 btf_id,
+				   struct bpf_verifier_log *log)
 {
 	struct bpf_link_primer link_primer;
+	struct bpf_trampoline *tr = NULL;
+	struct bpf_prog *tgt_prog = NULL;
 	struct bpf_tracing_link *link;
+	struct btf_func_model fmodel;
+	long addr;
+	u64 key;
 	int err;
 
 	switch (prog->type) {
@@ -2559,6 +2578,43 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
 		goto out_put_prog;
 	}
 
+	if (tgt_prog_fd) {
+		/* For now we only allow new targets for BPF_PROG_TYPE_EXT */
+		if (prog->type != BPF_PROG_TYPE_EXT ||
+		    !btf_id) {
+			err = -EINVAL;
+			goto out_put_prog;
+		}
+		tgt_prog = bpf_prog_get(tgt_prog_fd);
+		if (IS_ERR(tgt_prog)) {
+			err = PTR_ERR(tgt_prog);
+			tgt_prog = NULL;
+			goto out_put_prog;
+		}
+
+	} else if (btf_id) {
+		err = -EINVAL;
+		goto out_put_prog;
+	} else {
+		btf_id = prog->aux->attach_btf_id;
+		tgt_prog = prog->aux->linked_prog;
+		if (tgt_prog)
+			bpf_prog_inc(tgt_prog); /* we call bpf_prog_put() on link release */
+	}
+	err = bpf_check_attach_target(log, prog, tgt_prog, btf_id,
+				      &fmodel, &addr, NULL, NULL);
+	if (err)
+		goto out_put_prog;
+
+	if (tgt_prog)
+		key = ((u64)tgt_prog->aux->id) << 32 | btf_id;
+	else
+		key = btf_id;
+
+	err = bpf_trampoline_get(key, (void *)addr, &fmodel, &tr);
+	if (err)
+		goto out_put_prog;
+
 	link = kzalloc(sizeof(*link), GFP_USER);
 	if (!link) {
 		err = -ENOMEM;
@@ -2574,15 +2630,21 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
 		goto out_put_prog;
 	}
 
-	err = bpf_trampoline_link_prog(prog);
+	err = bpf_trampoline_link_prog(prog, tr);
 	if (err) {
 		bpf_link_cleanup(&link_primer);
 		goto out_put_prog;
 	}
+	link->trampoline = tr;
+	link->tgt_prog = tgt_prog;
 
 	return bpf_link_settle(&link_primer);
 out_put_prog:
 	bpf_prog_put(prog);
+	if (tgt_prog)
+		bpf_prog_put(tgt_prog);
+	if (tr)
+		bpf_trampoline_put(tr);
 	return err;
 }
 
@@ -2660,11 +2722,12 @@ static const struct bpf_link_ops bpf_raw_tp_link_lops = {
 	.fill_link_info = bpf_raw_tp_link_fill_link_info,
 };
 
-#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
+#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.tgt_btf_id
 
 static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 {
 	struct bpf_link_primer link_primer;
+	struct bpf_verifier_log log = {};
 	struct bpf_raw_tp_link *link;
 	struct bpf_raw_event_map *btp;
 	struct bpf_prog *prog;
@@ -2679,13 +2742,30 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 	if (IS_ERR(prog))
 		return PTR_ERR(prog);
 
+	if (attr->raw_tracepoint.log_level ||
+	    attr->raw_tracepoint.log_buf ||
+	    attr->raw_tracepoint.log_size) {
+		/* user requested verbose verifier output
+		 * and supplied buffer to store the verification trace
+		 */
+		log.level = attr->raw_tracepoint.log_level;
+		log.ubuf = (char __user *) (unsigned long) attr->raw_tracepoint.log_buf;
+		log.len_total = attr->raw_tracepoint.log_size;
+
+		/* log attributes have to be sane */
+		if (log.len_total < 128 || log.len_total > UINT_MAX >> 2 ||
+		    !log.level || !log.ubuf || log.level & ~BPF_LOG_MASK)
+			return -EINVAL;
+	}
+
 	switch (prog->type) {
 	case BPF_PROG_TYPE_TRACING:
 	case BPF_PROG_TYPE_EXT:
 	case BPF_PROG_TYPE_LSM:
 		if (attr->raw_tracepoint.name) {
-			/* The attach point for this category of programs
-			 * should be specified via btf_id during program load.
+			/* The attach point for this category of programs should
+			 * be specified via btf_id during program load, or using
+			 * tgt_btf_id.
 			 */
 			err = -EINVAL;
 			goto out_put_prog;
@@ -2695,7 +2775,9 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 			tp_name = prog->aux->attach_func_name;
 			break;
 		}
-		return bpf_tracing_prog_attach(prog);
+		return bpf_tracing_prog_attach(prog,
+					       attr->raw_tracepoint.tgt_prog_fd,
+					       attr->raw_tracepoint.tgt_btf_id, &log);
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
 	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
 		if (strncpy_from_user(buf,
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index fadfa330f728..40797405f1a0 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -256,14 +256,13 @@ static enum bpf_tramp_prog_type bpf_attach_type_to_tramp(struct bpf_prog *prog)
 	}
 }
 
-int bpf_trampoline_link_prog(struct bpf_prog *prog)
+int bpf_trampoline_link_prog(struct bpf_prog *prog,
+			     struct bpf_trampoline *tr)
 {
 	enum bpf_tramp_prog_type kind;
-	struct bpf_trampoline *tr;
 	int err = 0;
 	int cnt;
 
-	tr = prog->aux->trampoline;
 	kind = bpf_attach_type_to_tramp(prog);
 	mutex_lock(&tr->mutex);
 	if (tr->extension_prog) {
@@ -296,7 +295,7 @@ int bpf_trampoline_link_prog(struct bpf_prog *prog)
 	}
 	hlist_add_head(&prog->aux->tramp_hlist, &tr->progs_hlist[kind]);
 	tr->progs_cnt[kind]++;
-	err = bpf_trampoline_update(prog->aux->trampoline);
+	err = bpf_trampoline_update(tr);
 	if (err) {
 		hlist_del(&prog->aux->tramp_hlist);
 		tr->progs_cnt[kind]--;
@@ -307,13 +306,12 @@ int bpf_trampoline_link_prog(struct bpf_prog *prog)
 }
 
 /* bpf_trampoline_unlink_prog() should never fail. */
-int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
+int bpf_trampoline_unlink_prog(struct bpf_prog *prog,
+			       struct bpf_trampoline *tr)
 {
 	enum bpf_tramp_prog_type kind;
-	struct bpf_trampoline *tr;
 	int err;
 
-	tr = prog->aux->trampoline;
 	kind = bpf_attach_type_to_tramp(prog);
 	mutex_lock(&tr->mutex);
 	if (kind == BPF_TRAMP_REPLACE) {
@@ -325,7 +323,7 @@ int bpf_trampoline_unlink_prog(struct bpf_prog *prog)
 	}
 	hlist_del(&prog->aux->tramp_hlist);
 	tr->progs_cnt[kind]--;
-	err = bpf_trampoline_update(prog->aux->trampoline);
+	err = bpf_trampoline_update(tr);
 out:
 	mutex_unlock(&tr->mutex);
 	return err;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ff3c5c53982c..867bc14f5610 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10889,6 +10889,12 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 		if (!btf_type_is_func_proto(t))
 			return -EINVAL;
 
+		if ((prog->aux->tgt_prog_type &&
+		     prog->aux->tgt_prog_type != tgt_prog->type) ||
+		    (prog->aux->tgt_attach_type &&
+		     prog->aux->tgt_attach_type != tgt_prog->expected_attach_type))
+			return -EINVAL;
+
 		if (tgt_prog && conservative)
 			t = NULL;
 
@@ -10931,7 +10937,6 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	const char *tname;
 	long addr;
 	int ret;
-	u64 key;
 
 	if (prog->type == BPF_PROG_TYPE_STRUCT_OPS)
 		return check_struct_ops_btf_id(env);
@@ -10947,13 +10952,13 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		return ret;
 
 	if (tgt_prog) {
+		prog->aux->tgt_prog_type = tgt_prog->type;
+		prog->aux->tgt_attach_type = tgt_prog->expected_attach_type;
+
 		if (prog->type == BPF_PROG_TYPE_EXT) {
 			env->ops = bpf_verifier_ops[tgt_prog->type];
 			prog->expected_attach_type = tgt_prog->expected_attach_type;
 		}
-		key = ((u64)tgt_prog->aux->id) << 32 | btf_id;
-	} else {
-		key = btf_id;
 	}
 
 	prog->aux->attach_func_proto = t;
@@ -10984,8 +10989,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 			if (ret < 0)
 				return ret;
 		}
-		return bpf_trampoline_get(key, (void *)addr, &fmodel,
-					  &prog->aux->trampoline);
+		return 0;
 	}
 }
 


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-13 20:12 [PATCH bpf-next 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
                   ` (2 preceding siblings ...)
  2020-07-13 20:12 ` [PATCH bpf-next 3/6] bpf: support attaching freplace programs to multiple attach points Toke Høiland-Jørgensen
@ 2020-07-13 20:12 ` Toke Høiland-Jørgensen
  2020-07-14  5:21   ` BPF logging infrastructure. Was: " Andrii Nakryiko
  2020-07-13 20:12 ` [PATCH bpf-next 5/6] libbpf: add support for supplying target to bpf_raw_tracepoint_open() Toke Høiland-Jørgensen
  2020-07-13 20:12 ` [PATCH bpf-next 6/6] selftests: add test for multiple attachments of freplace program Toke Høiland-Jørgensen
  5 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-13 20:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

Sync addition of new members from main kernel tree.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/include/uapi/linux/bpf.h |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index da9bf35a26f8..662a15e4a1a1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -573,8 +573,13 @@ union bpf_attr {
 	} query;
 
 	struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
-		__u64 name;
-		__u32 prog_fd;
+		__u64		name;
+		__u32		prog_fd;
+		__u32		log_level;	/* verbosity level of log */
+		__u32		log_size;	/* size of user buffer */
+		__aligned_u64	log_buf;	/* user supplied buffer */
+		__u32		tgt_prog_fd;
+		__u32		tgt_btf_id;
 	} raw_tracepoint;
 
 	struct { /* anonymous struct for BPF_BTF_LOAD */


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 5/6] libbpf: add support for supplying target to bpf_raw_tracepoint_open()
  2020-07-13 20:12 [PATCH bpf-next 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
                   ` (3 preceding siblings ...)
  2020-07-13 20:12 ` [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h Toke Høiland-Jørgensen
@ 2020-07-13 20:12 ` Toke Høiland-Jørgensen
  2020-07-13 20:12 ` [PATCH bpf-next 6/6] selftests: add test for multiple attachments of freplace program Toke Høiland-Jørgensen
  5 siblings, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-13 20:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds support for supplying a target fd and btf ID for the
raw_tracepoint_open() BPF operation, using a new bpf_raw_tracepoint_opts
structure. This can be used for attaching freplace programs to multiple
destinations.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 tools/lib/bpf/bpf.c      |   33 ++++++++++++++++++++++++++++++++-
 tools/lib/bpf/bpf.h      |   12 ++++++++++++
 tools/lib/bpf/libbpf.map |    1 +
 3 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index a7329b671c41..6e7690d8ddfb 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -793,17 +793,48 @@ int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len)
 	return err;
 }
 
-int bpf_raw_tracepoint_open(const char *name, int prog_fd)
+int bpf_raw_tracepoint_open_opts(const char *name, int prog_fd,
+				 struct bpf_raw_tracepoint_opts *opts)
 {
 	union bpf_attr attr;
+	size_t log_buf_sz;
+	__u32 log_level;
+	char *log_buf;
+
+	if (!OPTS_VALID(opts, bpf_raw_tracepoint_opts))
+		return -EINVAL;
+
+	log_buf = OPTS_GET(opts, log_buf, NULL);
+	log_buf_sz = OPTS_GET(opts, log_buf_sz, 0);
+	log_level = OPTS_GET(opts, log_level, 0);
+
+	if (!log_buf != !log_buf_sz || log_level > (4 | 2 | 1) ||
+	    (log_level && !log_buf))
+		return -EINVAL;
 
 	memset(&attr, 0, sizeof(attr));
 	attr.raw_tracepoint.name = ptr_to_u64(name);
 	attr.raw_tracepoint.prog_fd = prog_fd;
+	attr.raw_tracepoint.tgt_prog_fd = OPTS_GET(opts, tgt_prog_fd, 0);
+	attr.raw_tracepoint.tgt_btf_id = OPTS_GET(opts, tgt_btf_id, 0);
+
+	attr.raw_tracepoint.log_level = log_level;
+	if (log_level) {
+		attr.raw_tracepoint.log_buf = ptr_to_u64(log_buf);
+		attr.raw_tracepoint.log_size = log_buf_sz;
+	} else {
+		attr.raw_tracepoint.log_buf = ptr_to_u64(NULL);
+		attr.raw_tracepoint.log_size = 0;
+	}
 
 	return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
 }
 
+int bpf_raw_tracepoint_open(const char *name, int prog_fd)
+{
+	return bpf_raw_tracepoint_open_opts(name, prog_fd, NULL);
+}
+
 int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
 		 bool do_log)
 {
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 1b6015b21ba8..b6b8ebb6ce65 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -227,7 +227,19 @@ LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len);
 LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
 			      __u32 query_flags, __u32 *attach_flags,
 			      __u32 *prog_ids, __u32 *prog_cnt);
+struct bpf_raw_tracepoint_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 log_level;
+	char *log_buf;
+	size_t log_buf_sz;
+	int tgt_prog_fd; /* target program to attach to */
+	__u32 tgt_btf_id; /* BTF ID of target function */
+};
+#define bpf_raw_tracepoint_opts__last_field tgt_btf_id
+
 LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
+LIBBPF_API int bpf_raw_tracepoint_open_opts(const char *name, int prog_fd,
+					    struct bpf_raw_tracepoint_opts *opts);
 LIBBPF_API int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf,
 			    __u32 log_buf_size, bool do_log);
 LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 6544d2cd1ed6..f726805f5ec8 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -288,4 +288,5 @@ LIBBPF_0.1.0 {
 		bpf_map__value_size;
 		bpf_program__autoload;
 		bpf_program__set_autoload;
+		bpf_raw_tracepoint_open_opts;
 } LIBBPF_0.0.9;


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH bpf-next 6/6] selftests: add test for multiple attachments of freplace program
  2020-07-13 20:12 [PATCH bpf-next 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
                   ` (4 preceding siblings ...)
  2020-07-13 20:12 ` [PATCH bpf-next 5/6] libbpf: add support for supplying target to bpf_raw_tracepoint_open() Toke Høiland-Jørgensen
@ 2020-07-13 20:12 ` Toke Høiland-Jørgensen
  5 siblings, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-13 20:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, netdev, bpf

From: Toke Høiland-Jørgensen <toke@redhat.com>

This adds a selftest for attaching an freplace program to multiple targets
simultaneously.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 .../selftests/bpf/prog_tests/fexit_bpf2bpf.c       |  174 +++++++++++++++++---
 .../selftests/bpf/progs/freplace_get_constant.c    |   15 ++
 2 files changed, 160 insertions(+), 29 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/freplace_get_constant.c

diff --git a/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c b/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c
index a895bfed55db..60cf3e44a6ef 100644
--- a/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/prog_tests/fexit_bpf2bpf.c
@@ -2,35 +2,78 @@
 /* Copyright (c) 2019 Facebook */
 #include <test_progs.h>
 #include <network_helpers.h>
+#include <bpf/btf.h>
+
+typedef int (*test_cb)(struct bpf_object *obj);
+
+static int check_data_map(struct bpf_object *obj, int prog_cnt, bool reset)
+{
+	struct bpf_map *data_map, *map;
+	const int zero = 0;
+	u64 *result = NULL;
+	__u32 duration = 0;
+	int ret = -1, i;
+
+	result = malloc((prog_cnt + 32 /* spare */) * sizeof(u64));
+	if (CHECK(!result, "alloc_memory", "failed to alloc memory"))
+		return -ENOMEM;
+
+	bpf_object__for_each_map(map, obj)
+		if (bpf_map__is_internal(map)) {
+			data_map = map;
+			break;
+		}
+	if (CHECK(!data_map, "find_data_map", "data map not found\n"))
+		goto out;
+
+	ret = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, result);
+	if (CHECK(ret, "get_result",
+		  "failed to get output data: %d\n", ret))
+		goto out;
+
+	for (i = 0; i < prog_cnt; i++) {
+		if (CHECK(result[i] != 1, "result", "fexit_bpf2bpf failed err %ld\n",
+			  result[i]))
+			goto out;
+		result[i] = 0;
+	}
+	if (reset) {
+		ret = bpf_map_update_elem(bpf_map__fd(data_map), &zero, result, 0);
+		if (CHECK(ret, "reset_result", "failed to reset result\n"))
+			goto out;
+	}
+
+	ret = 0;
+out:
+	free(result);
+	return ret;
+}
 
 static void test_fexit_bpf2bpf_common(const char *obj_file,
 				      const char *target_obj_file,
 				      int prog_cnt,
 				      const char **prog_name,
-				      bool run_prog)
+				      bool run_prog,
+				      test_cb cb)
 {
-	struct bpf_object *obj = NULL, *pkt_obj;
-	int err, pkt_fd, i;
+	struct bpf_object *obj = NULL, *tgt_obj;
+	int err, tgt_fd, i;
 	struct bpf_link **link = NULL;
 	struct bpf_program **prog = NULL;
 	__u32 duration = 0, retval;
-	struct bpf_map *data_map;
-	const int zero = 0;
-	u64 *result = NULL;
 
 	err = bpf_prog_load(target_obj_file, BPF_PROG_TYPE_UNSPEC,
-			    &pkt_obj, &pkt_fd);
+			    &tgt_obj, &tgt_fd);
 	if (CHECK(err, "tgt_prog_load", "file %s err %d errno %d\n",
 		  target_obj_file, err, errno))
 		return;
 	DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
-			    .attach_prog_fd = pkt_fd,
+			    .attach_prog_fd = tgt_fd,
 			   );
 
 	link = calloc(sizeof(struct bpf_link *), prog_cnt);
 	prog = calloc(sizeof(struct bpf_program *), prog_cnt);
-	result = malloc((prog_cnt + 32 /* spare */) * sizeof(u64));
-	if (CHECK(!link || !prog || !result, "alloc_memory",
+	if (CHECK(!link || !prog, "alloc_memory",
 		  "failed to alloc memory"))
 		goto close_prog;
 
@@ -53,39 +96,33 @@ static void test_fexit_bpf2bpf_common(const char *obj_file,
 			goto close_prog;
 	}
 
-	if (!run_prog)
-		goto close_prog;
+	if (cb) {
+		err = cb(obj);
+		if (err)
+			goto close_prog;
+	}
 
-	data_map = bpf_object__find_map_by_name(obj, "fexit_bp.bss");
-	if (CHECK(!data_map, "find_data_map", "data map not found\n"))
+	if (!run_prog)
 		goto close_prog;
 
-	err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
+	err = bpf_prog_test_run(tgt_fd, 1, &pkt_v6, sizeof(pkt_v6),
 				NULL, NULL, &retval, &duration);
 	CHECK(err || retval, "ipv6",
 	      "err %d errno %d retval %d duration %d\n",
 	      err, errno, retval, duration);
 
-	err = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, result);
-	if (CHECK(err, "get_result",
-		  "failed to get output data: %d\n", err))
+	if (check_data_map(obj, prog_cnt, false))
 		goto close_prog;
 
-	for (i = 0; i < prog_cnt; i++)
-		if (CHECK(result[i] != 1, "result", "fexit_bpf2bpf failed err %ld\n",
-			  result[i]))
-			goto close_prog;
-
 close_prog:
 	for (i = 0; i < prog_cnt; i++)
 		if (!IS_ERR_OR_NULL(link[i]))
 			bpf_link__destroy(link[i]);
 	if (!IS_ERR_OR_NULL(obj))
 		bpf_object__close(obj);
-	bpf_object__close(pkt_obj);
+	bpf_object__close(tgt_obj);
 	free(link);
 	free(prog);
-	free(result);
 }
 
 static void test_target_no_callees(void)
@@ -96,7 +133,7 @@ static void test_target_no_callees(void)
 	test_fexit_bpf2bpf_common("./fexit_bpf2bpf_simple.o",
 				  "./test_pkt_md_access.o",
 				  ARRAY_SIZE(prog_name),
-				  prog_name, true);
+				  prog_name, true, NULL);
 }
 
 static void test_target_yes_callees(void)
@@ -110,7 +147,7 @@ static void test_target_yes_callees(void)
 	test_fexit_bpf2bpf_common("./fexit_bpf2bpf.o",
 				  "./test_pkt_access.o",
 				  ARRAY_SIZE(prog_name),
-				  prog_name, true);
+				  prog_name, true, NULL);
 }
 
 static void test_func_replace(void)
@@ -127,7 +164,7 @@ static void test_func_replace(void)
 	test_fexit_bpf2bpf_common("./fexit_bpf2bpf.o",
 				  "./test_pkt_access.o",
 				  ARRAY_SIZE(prog_name),
-				  prog_name, true);
+				  prog_name, true, NULL);
 }
 
 static void test_func_replace_verify(void)
@@ -138,7 +175,85 @@ static void test_func_replace_verify(void)
 	test_fexit_bpf2bpf_common("./freplace_connect4.o",
 				  "./connect4_prog.o",
 				  ARRAY_SIZE(prog_name),
-				  prog_name, false);
+				  prog_name, false, NULL);
+}
+
+static int test_second_attach(struct bpf_object *obj)
+{
+	const char *prog_name = "freplace/get_constant";
+	const char *tgt_name = prog_name + 9; /* cut off freplace/ */
+	const char *tgt_obj_file = "./test_pkt_access.o";
+	int err = 0, tgt_fd, tgt_btf_id, link_fd = -1;
+	struct bpf_program *prog = NULL;
+	struct bpf_object *tgt_obj;
+	__u32 duration = 0, retval;
+	size_t log_buf_sz = 1024;
+	char *log_buf = NULL;
+	struct btf *btf;
+
+	prog = bpf_object__find_program_by_title(obj, prog_name);
+	if (CHECK(!prog, "find_prog", "prog %s not found\n", prog_name))
+		return -ENOENT;
+
+	log_buf = calloc(log_buf_sz, 1);
+	if (!log_buf)
+		return -ENOMEM;
+
+	err = bpf_prog_load(tgt_obj_file, BPF_PROG_TYPE_UNSPEC,
+			    &tgt_obj, &tgt_fd);
+	if (CHECK(err, "second_prog_load", "file %s err %d errno %d\n",
+		  tgt_obj_file, err, errno))
+		return err;
+
+	btf = bpf_object__btf(tgt_obj);
+	tgt_btf_id = btf__find_by_name_kind(btf, tgt_name, BTF_KIND_FUNC);
+	if (CHECK(tgt_btf_id < 0, "find_btf", "no BTF ID found for func %s\n", prog_name)) {
+		err = -ENOENT;
+		goto out;
+	}
+
+	DECLARE_LIBBPF_OPTS(bpf_raw_tracepoint_opts, opts,
+			    .tgt_prog_fd = tgt_fd,
+			    .tgt_btf_id = tgt_btf_id,
+			    .log_buf = log_buf,
+			    .log_buf_sz = log_buf_sz,
+			    .log_level = 7,
+			   );
+	link_fd = bpf_raw_tracepoint_open_opts(NULL, bpf_program__fd(prog), &opts);
+	if (CHECK(link_fd < 0, "second_link", "err %d errno %d log %s",
+		  link_fd, errno, log_buf)) {
+		err = link_fd;
+		goto out;
+	}
+
+	err = bpf_prog_test_run(tgt_fd, 1, &pkt_v6, sizeof(pkt_v6),
+				NULL, NULL, &retval, &duration);
+	if (CHECK(err || retval, "ipv6",
+		  "err %d errno %d retval %d duration %d\n",
+		  err, errno, retval, duration))
+		goto out;
+
+	err = check_data_map(obj, 1, true);
+	if (err)
+		goto out;
+
+out:
+	if (link_fd >= 0)
+		close(link_fd);
+	bpf_object__close(tgt_obj);
+	free(log_buf);
+	return err;
+}
+
+static void test_func_replace_multi(void)
+{
+	const char *prog_name[] = {
+		"freplace/get_constant",
+	};
+	test_fexit_bpf2bpf_common("./freplace_get_constant.o",
+				  "./test_pkt_access.o",
+				  ARRAY_SIZE(prog_name),
+				  prog_name, true, test_second_attach);
 }
 
 void test_fexit_bpf2bpf(void)
@@ -147,4 +262,5 @@ void test_fexit_bpf2bpf(void)
 	test_target_yes_callees();
 	test_func_replace();
 	test_func_replace_verify();
+	test_func_replace_multi();
 }
diff --git a/tools/testing/selftests/bpf/progs/freplace_get_constant.c b/tools/testing/selftests/bpf/progs/freplace_get_constant.c
new file mode 100644
index 000000000000..8f0ecf94e533
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/freplace_get_constant.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+volatile __u64 test_get_constant = 0;
+SEC("freplace/get_constant")
+int new_get_constant(long val)
+{
+	if (val != 123)
+		return 0;
+	test_get_constant = 1;
+	return test_get_constant; /* original get_constant() returns val - 122 */
+}
+char _license[] SEC("license") = "GPL";


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH bpf-next 3/6] bpf: support attaching freplace programs to multiple attach points
  2020-07-13 20:12 ` [PATCH bpf-next 3/6] bpf: support attaching freplace programs to multiple attach points Toke Høiland-Jørgensen
@ 2020-07-13 23:16   ` kernel test robot
  0 siblings, 0 replies; 26+ messages in thread
From: kernel test robot @ 2020-07-13 23:16 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Alexei Starovoitov
  Cc: kbuild-all, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh, netdev,
	bpf

[-- Attachment #1: Type: text/plain, Size: 10309 bytes --]

Hi "Toke,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]
[also build test WARNING on vhost/linux-next ipvs/master v5.8-rc5 next-20200713]
[cannot apply to bpf-next/master bpf/master net/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/bpf-Support-multi-attach-for-freplace-programs/20200714-041410
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 528ae84a34ffd40da5d3fbff740d28d6dc2c8f8a
config: nios2-allyesconfig (attached as .config)
compiler: nios2-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nios2 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/bpf/syscall.c: In function 'bpf_raw_tracepoint_open':
>> kernel/bpf/syscall.c:2831:1: warning: the frame size of 1172 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    2831 | }
         | ^

vim +2831 kernel/bpf/syscall.c

c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2726  
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2727  static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2728  {
a3b80e1078943dc Andrii Nakryiko        2020-04-28  2729  	struct bpf_link_primer link_primer;
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2730  	struct bpf_verifier_log log = {};
babf3164095b067 Andrii Nakryiko        2020-03-09  2731  	struct bpf_raw_tp_link *link;
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2732  	struct bpf_raw_event_map *btp;
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2733  	struct bpf_prog *prog;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2734  	const char *tp_name;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2735  	char buf[128];
a3b80e1078943dc Andrii Nakryiko        2020-04-28  2736  	int err;
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2737  
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2738  	if (CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN))
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2739  		return -EINVAL;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2740  
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2741  	prog = bpf_prog_get(attr->raw_tracepoint.prog_fd);
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2742  	if (IS_ERR(prog))
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2743  		return PTR_ERR(prog);
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2744  
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2745  	if (attr->raw_tracepoint.log_level ||
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2746  	    attr->raw_tracepoint.log_buf ||
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2747  	    attr->raw_tracepoint.log_size) {
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2748  		/* user requested verbose verifier output
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2749  		 * and supplied buffer to store the verification trace
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2750  		 */
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2751  		log.level = attr->raw_tracepoint.log_level;
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2752  		log.ubuf = (char __user *) (unsigned long) attr->raw_tracepoint.log_buf;
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2753  		log.len_total = attr->raw_tracepoint.log_size;
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2754  
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2755  		/* log attributes have to be sane */
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2756  		if (log.len_total < 128 || log.len_total > UINT_MAX >> 2 ||
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2757  		    !log.level || !log.ubuf || log.level & ~BPF_LOG_MASK)
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2758  			return -EINVAL;
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2759  	}
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2760  
9e4e01dfd3254c7 KP Singh               2020-03-29  2761  	switch (prog->type) {
9e4e01dfd3254c7 KP Singh               2020-03-29  2762  	case BPF_PROG_TYPE_TRACING:
9e4e01dfd3254c7 KP Singh               2020-03-29  2763  	case BPF_PROG_TYPE_EXT:
9e4e01dfd3254c7 KP Singh               2020-03-29  2764  	case BPF_PROG_TYPE_LSM:
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2765  		if (attr->raw_tracepoint.name) {
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2766  			/* The attach point for this category of programs should
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2767  			 * be specified via btf_id during program load, or using
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2768  			 * tgt_btf_id.
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2769  			 */
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2770  			err = -EINVAL;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2771  			goto out_put_prog;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2772  		}
9e4e01dfd3254c7 KP Singh               2020-03-29  2773  		if (prog->type == BPF_PROG_TYPE_TRACING &&
9e4e01dfd3254c7 KP Singh               2020-03-29  2774  		    prog->expected_attach_type == BPF_TRACE_RAW_TP) {
382072916044015 Martin KaFai Lau       2019-10-24  2775  			tp_name = prog->aux->attach_func_name;
9e4e01dfd3254c7 KP Singh               2020-03-29  2776  			break;
9e4e01dfd3254c7 KP Singh               2020-03-29  2777  		}
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2778  		return bpf_tracing_prog_attach(prog,
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2779  					       attr->raw_tracepoint.tgt_prog_fd,
95dcc8ca42c35eb Toke Høiland-Jørgensen 2020-07-13  2780  					       attr->raw_tracepoint.tgt_btf_id, &log);
9e4e01dfd3254c7 KP Singh               2020-03-29  2781  	case BPF_PROG_TYPE_RAW_TRACEPOINT:
9e4e01dfd3254c7 KP Singh               2020-03-29  2782  	case BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE:
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2783  		if (strncpy_from_user(buf,
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2784  				      u64_to_user_ptr(attr->raw_tracepoint.name),
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2785  				      sizeof(buf) - 1) < 0) {
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2786  			err = -EFAULT;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2787  			goto out_put_prog;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2788  		}
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2789  		buf[sizeof(buf) - 1] = 0;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2790  		tp_name = buf;
9e4e01dfd3254c7 KP Singh               2020-03-29  2791  		break;
9e4e01dfd3254c7 KP Singh               2020-03-29  2792  	default:
9e4e01dfd3254c7 KP Singh               2020-03-29  2793  		err = -EINVAL;
9e4e01dfd3254c7 KP Singh               2020-03-29  2794  		goto out_put_prog;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2795  	}
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2796  
a38d1107f937ca9 Matt Mullins           2018-12-12  2797  	btp = bpf_get_raw_tracepoint(tp_name);
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2798  	if (!btp) {
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2799  		err = -ENOENT;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2800  		goto out_put_prog;
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2801  	}
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2802  
babf3164095b067 Andrii Nakryiko        2020-03-09  2803  	link = kzalloc(sizeof(*link), GFP_USER);
babf3164095b067 Andrii Nakryiko        2020-03-09  2804  	if (!link) {
a38d1107f937ca9 Matt Mullins           2018-12-12  2805  		err = -ENOMEM;
a38d1107f937ca9 Matt Mullins           2018-12-12  2806  		goto out_put_btp;
a38d1107f937ca9 Matt Mullins           2018-12-12  2807  	}
f2e10bff16a0fdd Andrii Nakryiko        2020-04-28  2808  	bpf_link_init(&link->link, BPF_LINK_TYPE_RAW_TRACEPOINT,
f2e10bff16a0fdd Andrii Nakryiko        2020-04-28  2809  		      &bpf_raw_tp_link_lops, prog);
babf3164095b067 Andrii Nakryiko        2020-03-09  2810  	link->btp = btp;
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2811  
a3b80e1078943dc Andrii Nakryiko        2020-04-28  2812  	err = bpf_link_prime(&link->link, &link_primer);
a3b80e1078943dc Andrii Nakryiko        2020-04-28  2813  	if (err) {
babf3164095b067 Andrii Nakryiko        2020-03-09  2814  		kfree(link);
babf3164095b067 Andrii Nakryiko        2020-03-09  2815  		goto out_put_btp;
babf3164095b067 Andrii Nakryiko        2020-03-09  2816  	}
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2817  
babf3164095b067 Andrii Nakryiko        2020-03-09  2818  	err = bpf_probe_register(link->btp, prog);
babf3164095b067 Andrii Nakryiko        2020-03-09  2819  	if (err) {
a3b80e1078943dc Andrii Nakryiko        2020-04-28  2820  		bpf_link_cleanup(&link_primer);
babf3164095b067 Andrii Nakryiko        2020-03-09  2821  		goto out_put_btp;
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2822  	}
babf3164095b067 Andrii Nakryiko        2020-03-09  2823  
a3b80e1078943dc Andrii Nakryiko        2020-04-28  2824  	return bpf_link_settle(&link_primer);
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2825  
a38d1107f937ca9 Matt Mullins           2018-12-12  2826  out_put_btp:
a38d1107f937ca9 Matt Mullins           2018-12-12  2827  	bpf_put_raw_tracepoint(btp);
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2828  out_put_prog:
ac4414b5ca47d16 Alexei Starovoitov     2019-10-15  2829  	bpf_prog_put(prog);
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2830  	return err;
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28 @2831  }
c4f6699dfcb8558 Alexei Starovoitov     2018-03-28  2832  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 56600 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-13 20:12 ` [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h Toke Høiland-Jørgensen
@ 2020-07-14  5:21   ` Andrii Nakryiko
  2020-07-14 12:12     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-07-14  5:21 UTC (permalink / raw)
  To: bpf, Alexei Starovoitov, Daniel Borkmann
  Cc: Toke Høiland-Jørgensen, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Networking, Kernel Team

On Mon, Jul 13, 2020 at 1:13 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Sync addition of new members from main kernel tree.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
>  tools/include/uapi/linux/bpf.h |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index da9bf35a26f8..662a15e4a1a1 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -573,8 +573,13 @@ union bpf_attr {
>         } query;
>
>         struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
> -               __u64 name;
> -               __u32 prog_fd;
> +               __u64           name;
> +               __u32           prog_fd;
> +               __u32           log_level;      /* verbosity level of log */
> +               __u32           log_size;       /* size of user buffer */
> +               __aligned_u64   log_buf;        /* user supplied buffer */
> +               __u32           tgt_prog_fd;
> +               __u32           tgt_btf_id;
>         } raw_tracepoint;
>
>         struct { /* anonymous struct for BPF_BTF_LOAD */
>

I think BPF syscall would benefit from common/generalized log support
across all commands, given how powerful/complex it already is.
Sometimes it's literally impossible to understand why one gets -EINVAL
without adding printk()s in the kernel.

But it feels wrong to add log_level/log_size/log_buf fields to every
section of bpf_attr and require the user to specify it differently for
each command. So before we go and start adding per-command fields,
let's discuss how we can do this more generically. I wonder if we can
come up with a good way to do it in one common way and then gradually
support that way throughout all BPF commands.

Unfortunately it's too late to just add a few common fields to
bpf_attr in front of all other per-command fields, but here's two more
ideas just to start discussion. I hope someone can come up with
something nicer.

1. Currently bpf syscall accepts 3 input arguments (cmd, uattr, size).
We can extend it with two more optional arguments: one for pointer to
log-defining attr (for log_buf pointer, log_size, log_level, maybe
something more later) and another for the size of that log attr.
Beyond that we'd need some way to specify that the user actually meant
to provide those 2 optional args. The most straightforward way would
be to use the highest bit of cmd argument. So instead of calling bpf()
with BPF_MAP_CREATE command, you'd call it with BPF_MAP_CREATE |
BPF_LOGGED, where BPF_LOGGED = 1<<31.

2. A more "stateful" approach, would be to have an extra BPF command
to set log buffer (and level) per thread. And if such a log is set, it
would be overwritten with content on each bpf() syscall invocation
(i.e., log position would be reset to 0 on each BPF syscall).

Of course, the existing BPF_LOAD_PROG command would keep working with
existing log, but could fall back to the "common one", if
BPF_LOAD_PROG-specific one is not set.

It also doesn't seem to be all that critical to signal when the log
buffer is overflown. It's pretty easy to detect from user-space:
- either last byte would be non-zero, if we don't care about
guaranteeing zero-termination for truncated log;
- of second-to-last byte would be non-zero, if BPF syscall will always
zero-terminate the log.

Of course, if user code cares about such detection of log truncation,
it will need to set last/second-to-last bytes to zero before each
syscall.

Both approaches have their pros/cons, we can dig into those later, but
I'd like to start this discussion and see what other people think. I
also wonder if there are other syscalls with similarly advanced input
arguments (like bpf_attr) and common logging, we could learn from
those.

Thoughts?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-14  5:21   ` BPF logging infrastructure. Was: " Andrii Nakryiko
@ 2020-07-14 12:12     ` Toke Høiland-Jørgensen
  2020-07-14 19:14       ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-14 12:12 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Networking, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Jul 13, 2020 at 1:13 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Sync addition of new members from main kernel tree.
>>
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>>  tools/include/uapi/linux/bpf.h |    9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index da9bf35a26f8..662a15e4a1a1 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -573,8 +573,13 @@ union bpf_attr {
>>         } query;
>>
>>         struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
>> -               __u64 name;
>> -               __u32 prog_fd;
>> +               __u64           name;
>> +               __u32           prog_fd;
>> +               __u32           log_level;      /* verbosity level of log */
>> +               __u32           log_size;       /* size of user buffer */
>> +               __aligned_u64   log_buf;        /* user supplied buffer */
>> +               __u32           tgt_prog_fd;
>> +               __u32           tgt_btf_id;
>>         } raw_tracepoint;
>>
>>         struct { /* anonymous struct for BPF_BTF_LOAD */
>>
>
> I think BPF syscall would benefit from common/generalized log support
> across all commands, given how powerful/complex it already is.
> Sometimes it's literally impossible to understand why one gets -EINVAL
> without adding printk()s in the kernel.

Yes, I agree! This is horrible UI!

> But it feels wrong to add log_level/log_size/log_buf fields to every
> section of bpf_attr and require the user to specify it differently for
> each command. So before we go and start adding per-command fields,
> let's discuss how we can do this more generically. I wonder if we can
> come up with a good way to do it in one common way and then gradually
> support that way throughout all BPF commands.
>
> Unfortunately it's too late to just add a few common fields to
> bpf_attr in front of all other per-command fields, but here's two more
> ideas just to start discussion. I hope someone can come up with
> something nicer.
>
> 1. Currently bpf syscall accepts 3 input arguments (cmd, uattr, size).
> We can extend it with two more optional arguments: one for pointer to
> log-defining attr (for log_buf pointer, log_size, log_level, maybe
> something more later) and another for the size of that log attr.
> Beyond that we'd need some way to specify that the user actually meant
> to provide those 2 optional args. The most straightforward way would
> be to use the highest bit of cmd argument. So instead of calling bpf()
> with BPF_MAP_CREATE command, you'd call it with BPF_MAP_CREATE |
> BPF_LOGGED, where BPF_LOGGED = 1<<31.

Well, if only we'd had a 'flags' argument to the syscall... I don't
suppose we want a bpf2()? :)

I like your idea of just using the top bits of the 'cmd' field as flag
bits, but in that case we should just define this explicitly, say
'#define BPF_CMD_FLAGS_MASK 0xFFFF0000'?

And instead of adding new arguments, why not just change the meaning of
the 'attr' argument? Say we define:

struct bpf_extended_attr {
       __u32 log_level;
       __u32 log_size;
       __aligned_u64 log_buf;
       __u8 reserved[48];
       union bpf_attr attr;
};

And then define a new flag BPF_USES_EXTENDED_ATTR which will cause the
kernel to interpret the second argument of the syscall as a pointer to
that struct instead of to the bpf_attr union?

> 2. A more "stateful" approach, would be to have an extra BPF command
> to set log buffer (and level) per thread. And if such a log is set, it
> would be overwritten with content on each bpf() syscall invocation
> (i.e., log position would be reset to 0 on each BPF syscall).

I don't think adding something stateful is a good idea; that's bound to
lead to weird issues when someone messes up the state management in
userspace.

> Of course, the existing BPF_LOAD_PROG command would keep working with
> existing log, but could fall back to the "common one", if
> BPF_LOAD_PROG-specific one is not set.
>
> It also doesn't seem to be all that critical to signal when the log
> buffer is overflown. It's pretty easy to detect from user-space:
> - either last byte would be non-zero, if we don't care about
> guaranteeing zero-termination for truncated log;
> - of second-to-last byte would be non-zero, if BPF syscall will always
> zero-terminate the log.

I think if we're doing this we should think about making the contents of
the log machine-readable, so applications can figure out what's going on
without having to parse the text strings. The simplest would be to make
this new log buffer use TLV-style messaging, say we define the log
buffer output to be a series of messages like these:

struct bpf_log_msg {
       __u16 type;
       __u32 len;
       __u8 contents[]; /* of size 'len' */
} __attribute__((packed));

To begin with we could define two types:

struct bpf_log_msg_string {
       __u16 type; /* BPF_LOG_MSG_TYPE_STRING */
       __u32 len;
       char message[];
}  __attribute__((packed));

struct bpf_log_msg_end {
       __u16 type; /* BPF_LOG_MSG_TYPE_END */
       __u32 len;
       __u16 num_truncations;
}  __attribute__((packed));

The TYPE_STRING would just be a wrapper for the existing text-based
messages, but delimited so userspace can pick them apart. And the second
one would solve your 'has the log been truncated' issue above; the
kernel simply always reserves eight bytes at the end of the buffer and
ends with writing a TYPE_END message with the number of messages that
were dropped due to lack of space. That would make it trivial for
userspace to detect truncation.

We could then add new message types later for machine-consumption. Say,
and extended error code, or offsets into the BTF information, or
whatever we end up needing. But just wrapping the existing log messages
in TLVs like the ones above could be fairly straight-forwardly
implemented with the existing bpf_log() infrastructure in the kernel,
without having to settle on which machine-readable information is useful
ahead of time... And the TLV format makes it easy for userspace to just
skip message types it doesn't understand.

WDYT?

> Of course, if user code cares about such detection of log truncation,
> it will need to set last/second-to-last bytes to zero before each
> syscall.
>
> Both approaches have their pros/cons, we can dig into those later, but
> I'd like to start this discussion and see what other people think. I
> also wonder if there are other syscalls with similarly advanced input
> arguments (like bpf_attr) and common logging, we could learn from
> those.

The first one that comes to mind is netlink extacks. Of course it's not
quite comparable since netlink already has message-based semantics, but
it does do sorta-kinda the same thing from a user PoV. The TLV format is
obviously inspired by netlink (or, well, binary networking protocols in
general).

-Toke

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-14 12:12     ` Toke Høiland-Jørgensen
@ 2020-07-14 19:14       ` Andrii Nakryiko
  2020-07-14 20:47         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-07-14 19:14 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend,
	KP Singh, Networking, Kernel Team

On Tue, Jul 14, 2020 at 5:12 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Mon, Jul 13, 2020 at 1:13 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >>
> >> Sync addition of new members from main kernel tree.
> >>
> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> ---
> >>  tools/include/uapi/linux/bpf.h |    9 +++++++--
> >>  1 file changed, 7 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >> index da9bf35a26f8..662a15e4a1a1 100644
> >> --- a/tools/include/uapi/linux/bpf.h
> >> +++ b/tools/include/uapi/linux/bpf.h
> >> @@ -573,8 +573,13 @@ union bpf_attr {
> >>         } query;
> >>
> >>         struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
> >> -               __u64 name;
> >> -               __u32 prog_fd;
> >> +               __u64           name;
> >> +               __u32           prog_fd;
> >> +               __u32           log_level;      /* verbosity level of log */
> >> +               __u32           log_size;       /* size of user buffer */
> >> +               __aligned_u64   log_buf;        /* user supplied buffer */
> >> +               __u32           tgt_prog_fd;
> >> +               __u32           tgt_btf_id;
> >>         } raw_tracepoint;
> >>
> >>         struct { /* anonymous struct for BPF_BTF_LOAD */
> >>
> >
> > I think BPF syscall would benefit from common/generalized log support
> > across all commands, given how powerful/complex it already is.
> > Sometimes it's literally impossible to understand why one gets -EINVAL
> > without adding printk()s in the kernel.
>
> Yes, I agree! This is horrible UI!

UI?.. It's a perfectly fine and extensible API for all functionality
it provides, it just needs a better human-readable feedback mechanism,
which is what I'm proposing. Error codes are not working when you have
so many different situations that can result in error.

>
> > But it feels wrong to add log_level/log_size/log_buf fields to every
> > section of bpf_attr and require the user to specify it differently for
> > each command. So before we go and start adding per-command fields,
> > let's discuss how we can do this more generically. I wonder if we can
> > come up with a good way to do it in one common way and then gradually
> > support that way throughout all BPF commands.
> >
> > Unfortunately it's too late to just add a few common fields to
> > bpf_attr in front of all other per-command fields, but here's two more
> > ideas just to start discussion. I hope someone can come up with
> > something nicer.
> >
> > 1. Currently bpf syscall accepts 3 input arguments (cmd, uattr, size).
> > We can extend it with two more optional arguments: one for pointer to
> > log-defining attr (for log_buf pointer, log_size, log_level, maybe
> > something more later) and another for the size of that log attr.
> > Beyond that we'd need some way to specify that the user actually meant
> > to provide those 2 optional args. The most straightforward way would
> > be to use the highest bit of cmd argument. So instead of calling bpf()
> > with BPF_MAP_CREATE command, you'd call it with BPF_MAP_CREATE |
> > BPF_LOGGED, where BPF_LOGGED = 1<<31.
>
> Well, if only we'd had a 'flags' argument to the syscall... I don't
> suppose we want a bpf2()? :)
>
> I like your idea of just using the top bits of the 'cmd' field as flag
> bits, but in that case we should just define this explicitly, say
> '#define BPF_CMD_FLAGS_MASK 0xFFFF0000'?

sure

>
> And instead of adding new arguments, why not just change the meaning of
> the 'attr' argument? Say we define:
>
> struct bpf_extended_attr {
>        __u32 log_level;
>        __u32 log_size;
>        __aligned_u64 log_buf;
>        __u8 reserved[48];
>        union bpf_attr attr;
> };

because this is a major PITA for libraries, plus it's not very
extensible, once you run out of 48 bytes? And when you don't need
those 48 bytes, you still need to zero them out, the kernel still
needs to copy them, etc. It just feels unclean to me.

But before we argue that, is there a problem adding 2 more arguments
which are never used/read unless we have an extra bit set in cmd?
Honest question, are there any implications to user-space with such an
approach? Backwards-compatibility issues or anything?


>
> And then define a new flag BPF_USES_EXTENDED_ATTR which will cause the
> kernel to interpret the second argument of the syscall as a pointer to
> that struct instead of to the bpf_attr union?
>
> > 2. A more "stateful" approach, would be to have an extra BPF command
> > to set log buffer (and level) per thread. And if such a log is set, it
> > would be overwritten with content on each bpf() syscall invocation
> > (i.e., log position would be reset to 0 on each BPF syscall).
>
> I don't think adding something stateful is a good idea; that's bound to
> lead to weird issues when someone messes up the state management in
> userspace.

I agree, I'd prefer a stateless approach, but wanted to lay out a
stateful one for completeness as well.

>
> > Of course, the existing BPF_LOAD_PROG command would keep working with
> > existing log, but could fall back to the "common one", if
> > BPF_LOAD_PROG-specific one is not set.
> >
> > It also doesn't seem to be all that critical to signal when the log
> > buffer is overflown. It's pretty easy to detect from user-space:
> > - either last byte would be non-zero, if we don't care about
> > guaranteeing zero-termination for truncated log;
> > - of second-to-last byte would be non-zero, if BPF syscall will always
> > zero-terminate the log.
>
> I think if we're doing this we should think about making the contents of
> the log machine-readable, so applications can figure out what's going on
> without having to parse the text strings. The simplest would be to make
> this new log buffer use TLV-style messaging, say we define the log
> buffer output to be a series of messages like these:
>
> struct bpf_log_msg {
>        __u16 type;
>        __u32 len;
>        __u8 contents[]; /* of size 'len' */
> } __attribute__((packed));
>
> To begin with we could define two types:
>
> struct bpf_log_msg_string {
>        __u16 type; /* BPF_LOG_MSG_TYPE_STRING */
>        __u32 len;
>        char message[];
> }  __attribute__((packed));
>
> struct bpf_log_msg_end {
>        __u16 type; /* BPF_LOG_MSG_TYPE_END */
>        __u32 len;
>        __u16 num_truncations;
> }  __attribute__((packed));
>
> The TYPE_STRING would just be a wrapper for the existing text-based
> messages, but delimited so userspace can pick them apart. And the second
> one would solve your 'has the log been truncated' issue above; the
> kernel simply always reserves eight bytes at the end of the buffer and
> ends with writing a TYPE_END message with the number of messages that
> were dropped due to lack of space. That would make it trivial for
> userspace to detect truncation.
>

Log truncation is not an issue, we can make bpf syscall to write back
the actual size of emitted log (and optionally extra bool to mean
"truncated") into the original log_size field.

> We could then add new message types later for machine-consumption. Say,
> and extended error code, or offsets into the BTF information, or
> whatever we end up needing. But just wrapping the existing log messages
> in TLVs like the ones above could be fairly straight-forwardly
> implemented with the existing bpf_log() infrastructure in the kernel,
> without having to settle on which machine-readable information is useful
> ahead of time... And the TLV format makes it easy for userspace to just
> skip message types it doesn't understand.
>
> WDYT?

I think it's taking it a bit too far and adds more API on top of
existing API. Now all those types become a fixed API, messages become
an API, etc. Just more backwards compatibility stuff we need to
support forever, for, what I believe, very little gain.

In practice, using human-readable strings works just fine. If there is
any kind of real issue, usually it involves humans reading debug logs
and understanding what's going on.

Also adopting these packet-like messages is not as straightforward
through BPF code, as now you can't just construct a single log line
with few calls to bpf_log().

>
> > Of course, if user code cares about such detection of log truncation,
> > it will need to set last/second-to-last bytes to zero before each
> > syscall.
> >
> > Both approaches have their pros/cons, we can dig into those later, but
> > I'd like to start this discussion and see what other people think. I
> > also wonder if there are other syscalls with similarly advanced input
> > arguments (like bpf_attr) and common logging, we could learn from
> > those.
>
> The first one that comes to mind is netlink extacks. Of course it's not
> quite comparable since netlink already has message-based semantics, but
> it does do sorta-kinda the same thing from a user PoV. The TLV format is
> obviously inspired by netlink (or, well, binary networking protocols in
> general).
>

Yeah, I'm aware of extack, but as you said, TLV is more of a netlink
format, extack messages themselves are just strings. But my question
was more of how this log could be done for complicated API calls using
similar extendable attrs, like perf_event_open, clone3, openat2, etc.

> -Toke
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-14 19:14       ` Andrii Nakryiko
@ 2020-07-14 20:47         ` Toke Høiland-Jørgensen
  2020-07-14 21:58           ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-14 20:47 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend,
	KP Singh, Networking, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Jul 14, 2020 at 5:12 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Mon, Jul 13, 2020 at 1:13 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> >>
>> >> Sync addition of new members from main kernel tree.
>> >>
>> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> >> ---
>> >>  tools/include/uapi/linux/bpf.h |    9 +++++++--
>> >>  1 file changed, 7 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> >> index da9bf35a26f8..662a15e4a1a1 100644
>> >> --- a/tools/include/uapi/linux/bpf.h
>> >> +++ b/tools/include/uapi/linux/bpf.h
>> >> @@ -573,8 +573,13 @@ union bpf_attr {
>> >>         } query;
>> >>
>> >>         struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
>> >> -               __u64 name;
>> >> -               __u32 prog_fd;
>> >> +               __u64           name;
>> >> +               __u32           prog_fd;
>> >> +               __u32           log_level;      /* verbosity level of log */
>> >> +               __u32           log_size;       /* size of user buffer */
>> >> +               __aligned_u64   log_buf;        /* user supplied buffer */
>> >> +               __u32           tgt_prog_fd;
>> >> +               __u32           tgt_btf_id;
>> >>         } raw_tracepoint;
>> >>
>> >>         struct { /* anonymous struct for BPF_BTF_LOAD */
>> >>
>> >
>> > I think BPF syscall would benefit from common/generalized log support
>> > across all commands, given how powerful/complex it already is.
>> > Sometimes it's literally impossible to understand why one gets -EINVAL
>> > without adding printk()s in the kernel.
>>
>> Yes, I agree! This is horrible UI!
>
> UI?.. It's a perfectly fine and extensible API for all functionality
> it provides, it just needs a better human-readable feedback mechanism,
> which is what I'm proposing. Error codes are not working when you have
> so many different situations that can result in error.

Yes. I was agreeing with you: the lack of understandable error messages
means you have to play "guess where this EINVAL came from", which is a
terrible user experience (should have been UX instead of UI, I guess,
sorry about that). 

>> > But it feels wrong to add log_level/log_size/log_buf fields to every
>> > section of bpf_attr and require the user to specify it differently for
>> > each command. So before we go and start adding per-command fields,
>> > let's discuss how we can do this more generically. I wonder if we can
>> > come up with a good way to do it in one common way and then gradually
>> > support that way throughout all BPF commands.
>> >
>> > Unfortunately it's too late to just add a few common fields to
>> > bpf_attr in front of all other per-command fields, but here's two more
>> > ideas just to start discussion. I hope someone can come up with
>> > something nicer.
>> >
>> > 1. Currently bpf syscall accepts 3 input arguments (cmd, uattr, size).
>> > We can extend it with two more optional arguments: one for pointer to
>> > log-defining attr (for log_buf pointer, log_size, log_level, maybe
>> > something more later) and another for the size of that log attr.
>> > Beyond that we'd need some way to specify that the user actually meant
>> > to provide those 2 optional args. The most straightforward way would
>> > be to use the highest bit of cmd argument. So instead of calling bpf()
>> > with BPF_MAP_CREATE command, you'd call it with BPF_MAP_CREATE |
>> > BPF_LOGGED, where BPF_LOGGED = 1<<31.
>>
>> Well, if only we'd had a 'flags' argument to the syscall... I don't
>> suppose we want a bpf2()? :)
>>
>> I like your idea of just using the top bits of the 'cmd' field as flag
>> bits, but in that case we should just define this explicitly, say
>> '#define BPF_CMD_FLAGS_MASK 0xFFFF0000'?
>
> sure
>
>>
>> And instead of adding new arguments, why not just change the meaning of
>> the 'attr' argument? Say we define:
>>
>> struct bpf_extended_attr {
>>        __u32 log_level;
>>        __u32 log_size;
>>        __aligned_u64 log_buf;
>>        __u8 reserved[48];
>>        union bpf_attr attr;
>> };
>
> because this is a major PITA for libraries, plus it's not very
> extensible, once you run out of 48 bytes? And when you don't need
> those 48 bytes, you still need to zero them out, the kernel still
> needs to copy them, etc. It just feels unclean to me.
>
> But before we argue that, is there a problem adding 2 more arguments
> which are never used/read unless we have an extra bit set in cmd?
> Honest question, are there any implications to user-space with such an
> approach? Backwards-compatibility issues or anything?

No idea; I don't know enough about how the lower-level details of the
syscall interface works to tell either way. Is it even *possible* to add
arguments like that in a backwards-compatible way?

However, assuming it *is* possible, my larger point was that we
shouldn't add just a 'logging struct', but rather a 'common options
struct' which can be extended further as needed. And if it is *not*
possible to add new arguments to a syscall like you're proposing, my
suggestion above would be a different way to achieve basically the same
(at the cost of having to specify the maximum reserved space in advance).

>> And then define a new flag BPF_USES_EXTENDED_ATTR which will cause the
>> kernel to interpret the second argument of the syscall as a pointer to
>> that struct instead of to the bpf_attr union?
>>
>> > 2. A more "stateful" approach, would be to have an extra BPF command
>> > to set log buffer (and level) per thread. And if such a log is set, it
>> > would be overwritten with content on each bpf() syscall invocation
>> > (i.e., log position would be reset to 0 on each BPF syscall).
>>
>> I don't think adding something stateful is a good idea; that's bound to
>> lead to weird issues when someone messes up the state management in
>> userspace.
>
> I agree, I'd prefer a stateless approach, but wanted to lay out a
> stateful one for completeness as well.

OK, cool.

>>
>> > Of course, the existing BPF_LOAD_PROG command would keep working with
>> > existing log, but could fall back to the "common one", if
>> > BPF_LOAD_PROG-specific one is not set.
>> >
>> > It also doesn't seem to be all that critical to signal when the log
>> > buffer is overflown. It's pretty easy to detect from user-space:
>> > - either last byte would be non-zero, if we don't care about
>> > guaranteeing zero-termination for truncated log;
>> > - of second-to-last byte would be non-zero, if BPF syscall will always
>> > zero-terminate the log.
>>
>> I think if we're doing this we should think about making the contents of
>> the log machine-readable, so applications can figure out what's going on
>> without having to parse the text strings. The simplest would be to make
>> this new log buffer use TLV-style messaging, say we define the log
>> buffer output to be a series of messages like these:
>>
>> struct bpf_log_msg {
>>        __u16 type;
>>        __u32 len;
>>        __u8 contents[]; /* of size 'len' */
>> } __attribute__((packed));
>>
>> To begin with we could define two types:
>>
>> struct bpf_log_msg_string {
>>        __u16 type; /* BPF_LOG_MSG_TYPE_STRING */
>>        __u32 len;
>>        char message[];
>> }  __attribute__((packed));
>>
>> struct bpf_log_msg_end {
>>        __u16 type; /* BPF_LOG_MSG_TYPE_END */
>>        __u32 len;
>>        __u16 num_truncations;
>> }  __attribute__((packed));
>>
>> The TYPE_STRING would just be a wrapper for the existing text-based
>> messages, but delimited so userspace can pick them apart. And the second
>> one would solve your 'has the log been truncated' issue above; the
>> kernel simply always reserves eight bytes at the end of the buffer and
>> ends with writing a TYPE_END message with the number of messages that
>> were dropped due to lack of space. That would make it trivial for
>> userspace to detect truncation.
>>
>
> Log truncation is not an issue, we can make bpf syscall to write back
> the actual size of emitted log (and optionally extra bool to mean
> "truncated") into the original log_size field.

So what was all that you were talking about with "check the
second-to-last byte" in your previous email? I understood that to be
about detecting truncation?

>> We could then add new message types later for machine-consumption. Say,
>> and extended error code, or offsets into the BTF information, or
>> whatever we end up needing. But just wrapping the existing log messages
>> in TLVs like the ones above could be fairly straight-forwardly
>> implemented with the existing bpf_log() infrastructure in the kernel,
>> without having to settle on which machine-readable information is useful
>> ahead of time... And the TLV format makes it easy for userspace to just
>> skip message types it doesn't understand.
>>
>> WDYT?
>
> I think it's taking it a bit too far and adds more API on top of
> existing API. Now all those types become a fixed API, messages become
> an API, etc. Just more backwards compatibility stuff we need to
> support forever, for, what I believe, very little gain.
>
> In practice, using human-readable strings works just fine. If there is
> any kind of real issue, usually it involves humans reading debug logs
> and understanding what's going on.

Let me give an example of what I want to be able to do. Just today I
helped someone debug getting xdp-filter to run, and they were getting
output like this:

      libbpf: -- BEGIN DUMP LOG ---
      libbpf:
      xdpfilt_alw_all() is not a global function
      processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
      peak_states 0 mark_read 0

      libbpf: -- END LOG --

I would like to have xdp-filter catch that, and turn it into a
friendlier "your compiler is too old" message. Having to effectively
grep through the free-form log output to pick out that message feels
brittle and error prone, as opposed to just having the kernel add a
machine-readable id ("err_func_linkage_not_global" or somesuch) and
stick it in a machine-parsable TLV.

> Also adopting these packet-like messages is not as straightforward
> through BPF code, as now you can't just construct a single log line
> with few calls to bpf_log().

Why not? bpf_log() could just transparently write the four bytes of
header (TYPE_STRING, followed by strlen(msg)) into the buffer before the
string? And in the future, an enhanced version could take (say) an error
ID as another parameter and transparently add that as a separate message.

>> > Of course, if user code cares about such detection of log truncation,
>> > it will need to set last/second-to-last bytes to zero before each
>> > syscall.
>> >
>> > Both approaches have their pros/cons, we can dig into those later, but
>> > I'd like to start this discussion and see what other people think. I
>> > also wonder if there are other syscalls with similarly advanced input
>> > arguments (like bpf_attr) and common logging, we could learn from
>> > those.
>>
>> The first one that comes to mind is netlink extacks. Of course it's not
>> quite comparable since netlink already has message-based semantics, but
>> it does do sorta-kinda the same thing from a user PoV. The TLV format is
>> obviously inspired by netlink (or, well, binary networking protocols in
>> general).
>>
>
> Yeah, I'm aware of extack, but as you said, TLV is more of a netlink
> format, extack messages themselves are just strings. But my question
> was more of how this log could be done for complicated API calls using
> similar extendable attrs, like perf_event_open, clone3, openat2, etc.

Ah, right, no idea :)

-Toke


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-14 20:47         ` Toke Høiland-Jørgensen
@ 2020-07-14 21:58           ` Andrii Nakryiko
  2020-07-14 22:19             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-07-14 21:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend,
	KP Singh, Networking, Kernel Team

On Tue, Jul 14, 2020 at 1:47 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Tue, Jul 14, 2020 at 5:12 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Mon, Jul 13, 2020 at 1:13 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> From: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >>
> >> >> Sync addition of new members from main kernel tree.
> >> >>
> >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> >> >> ---
> >> >>  tools/include/uapi/linux/bpf.h |    9 +++++++--
> >> >>  1 file changed, 7 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >> >> index da9bf35a26f8..662a15e4a1a1 100644
> >> >> --- a/tools/include/uapi/linux/bpf.h
> >> >> +++ b/tools/include/uapi/linux/bpf.h
> >> >> @@ -573,8 +573,13 @@ union bpf_attr {
> >> >>         } query;
> >> >>
> >> >>         struct { /* anonymous struct used by BPF_RAW_TRACEPOINT_OPEN command */
> >> >> -               __u64 name;
> >> >> -               __u32 prog_fd;
> >> >> +               __u64           name;
> >> >> +               __u32           prog_fd;
> >> >> +               __u32           log_level;      /* verbosity level of log */
> >> >> +               __u32           log_size;       /* size of user buffer */
> >> >> +               __aligned_u64   log_buf;        /* user supplied buffer */
> >> >> +               __u32           tgt_prog_fd;
> >> >> +               __u32           tgt_btf_id;
> >> >>         } raw_tracepoint;
> >> >>
> >> >>         struct { /* anonymous struct for BPF_BTF_LOAD */
> >> >>
> >> >
> >> > I think BPF syscall would benefit from common/generalized log support
> >> > across all commands, given how powerful/complex it already is.
> >> > Sometimes it's literally impossible to understand why one gets -EINVAL
> >> > without adding printk()s in the kernel.
> >>
> >> Yes, I agree! This is horrible UI!
> >
> > UI?.. It's a perfectly fine and extensible API for all functionality
> > it provides, it just needs a better human-readable feedback mechanism,
> > which is what I'm proposing. Error codes are not working when you have
> > so many different situations that can result in error.
>
> Yes. I was agreeing with you: the lack of understandable error messages
> means you have to play "guess where this EINVAL came from", which is a
> terrible user experience (should have been UX instead of UI, I guess,
> sorry about that).

Ah, ok, the term confused me :)

>
> >> > But it feels wrong to add log_level/log_size/log_buf fields to every
> >> > section of bpf_attr and require the user to specify it differently for
> >> > each command. So before we go and start adding per-command fields,
> >> > let's discuss how we can do this more generically. I wonder if we can
> >> > come up with a good way to do it in one common way and then gradually
> >> > support that way throughout all BPF commands.
> >> >
> >> > Unfortunately it's too late to just add a few common fields to
> >> > bpf_attr in front of all other per-command fields, but here's two more
> >> > ideas just to start discussion. I hope someone can come up with
> >> > something nicer.
> >> >
> >> > 1. Currently bpf syscall accepts 3 input arguments (cmd, uattr, size).
> >> > We can extend it with two more optional arguments: one for pointer to
> >> > log-defining attr (for log_buf pointer, log_size, log_level, maybe
> >> > something more later) and another for the size of that log attr.
> >> > Beyond that we'd need some way to specify that the user actually meant
> >> > to provide those 2 optional args. The most straightforward way would
> >> > be to use the highest bit of cmd argument. So instead of calling bpf()
> >> > with BPF_MAP_CREATE command, you'd call it with BPF_MAP_CREATE |
> >> > BPF_LOGGED, where BPF_LOGGED = 1<<31.
> >>
> >> Well, if only we'd had a 'flags' argument to the syscall... I don't
> >> suppose we want a bpf2()? :)
> >>
> >> I like your idea of just using the top bits of the 'cmd' field as flag
> >> bits, but in that case we should just define this explicitly, say
> >> '#define BPF_CMD_FLAGS_MASK 0xFFFF0000'?
> >
> > sure
> >
> >>
> >> And instead of adding new arguments, why not just change the meaning of
> >> the 'attr' argument? Say we define:
> >>
> >> struct bpf_extended_attr {
> >>        __u32 log_level;
> >>        __u32 log_size;
> >>        __aligned_u64 log_buf;
> >>        __u8 reserved[48];
> >>        union bpf_attr attr;
> >> };
> >
> > because this is a major PITA for libraries, plus it's not very
> > extensible, once you run out of 48 bytes? And when you don't need
> > those 48 bytes, you still need to zero them out, the kernel still
> > needs to copy them, etc. It just feels unclean to me.
> >
> > But before we argue that, is there a problem adding 2 more arguments
> > which are never used/read unless we have an extra bit set in cmd?
> > Honest question, are there any implications to user-space with such an
> > approach? Backwards-compatibility issues or anything?
>
> No idea; I don't know enough about how the lower-level details of the
> syscall interface works to tell either way. Is it even *possible* to add
> arguments like that in a backwards-compatible way?
>
> However, assuming it *is* possible, my larger point was that we
> shouldn't add just a 'logging struct', but rather a 'common options
> struct' which can be extended further as needed. And if it is *not*
> possible to add new arguments to a syscall like you're proposing, my
> suggestion above would be a different way to achieve basically the same
> (at the cost of having to specify the maximum reserved space in advance).
>

yeah-yeah, I agree, it's less a "logging attr", more of "common attr
across all commands".

> >> And then define a new flag BPF_USES_EXTENDED_ATTR which will cause the
> >> kernel to interpret the second argument of the syscall as a pointer to
> >> that struct instead of to the bpf_attr union?
> >>
> >> > 2. A more "stateful" approach, would be to have an extra BPF command
> >> > to set log buffer (and level) per thread. And if such a log is set, it
> >> > would be overwritten with content on each bpf() syscall invocation
> >> > (i.e., log position would be reset to 0 on each BPF syscall).
> >>
> >> I don't think adding something stateful is a good idea; that's bound to
> >> lead to weird issues when someone messes up the state management in
> >> userspace.
> >
> > I agree, I'd prefer a stateless approach, but wanted to lay out a
> > stateful one for completeness as well.
>
> OK, cool.
>
> >>
> >> > Of course, the existing BPF_LOAD_PROG command would keep working with
> >> > existing log, but could fall back to the "common one", if
> >> > BPF_LOAD_PROG-specific one is not set.
> >> >
> >> > It also doesn't seem to be all that critical to signal when the log
> >> > buffer is overflown. It's pretty easy to detect from user-space:
> >> > - either last byte would be non-zero, if we don't care about
> >> > guaranteeing zero-termination for truncated log;
> >> > - of second-to-last byte would be non-zero, if BPF syscall will always
> >> > zero-terminate the log.
> >>
> >> I think if we're doing this we should think about making the contents of
> >> the log machine-readable, so applications can figure out what's going on
> >> without having to parse the text strings. The simplest would be to make
> >> this new log buffer use TLV-style messaging, say we define the log
> >> buffer output to be a series of messages like these:
> >>
> >> struct bpf_log_msg {
> >>        __u16 type;
> >>        __u32 len;
> >>        __u8 contents[]; /* of size 'len' */
> >> } __attribute__((packed));
> >>
> >> To begin with we could define two types:
> >>
> >> struct bpf_log_msg_string {
> >>        __u16 type; /* BPF_LOG_MSG_TYPE_STRING */
> >>        __u32 len;
> >>        char message[];
> >> }  __attribute__((packed));
> >>
> >> struct bpf_log_msg_end {
> >>        __u16 type; /* BPF_LOG_MSG_TYPE_END */
> >>        __u32 len;
> >>        __u16 num_truncations;
> >> }  __attribute__((packed));
> >>
> >> The TYPE_STRING would just be a wrapper for the existing text-based
> >> messages, but delimited so userspace can pick them apart. And the second
> >> one would solve your 'has the log been truncated' issue above; the
> >> kernel simply always reserves eight bytes at the end of the buffer and
> >> ends with writing a TYPE_END message with the number of messages that
> >> were dropped due to lack of space. That would make it trivial for
> >> userspace to detect truncation.
> >>
> >
> > Log truncation is not an issue, we can make bpf syscall to write back
> > the actual size of emitted log (and optionally extra bool to mean
> > "truncated") into the original log_size field.
>
> So what was all that you were talking about with "check the
> second-to-last byte" in your previous email? I understood that to be
> about detecting truncation?

Yeah, ignore my initial rambling. One can do that (detecting
truncationg) without any extra "feedback" from bpf syscall, but I
think returning filled length is probably a better approach and
doesn't hamper any other aspects.

>
> >> We could then add new message types later for machine-consumption. Say,
> >> and extended error code, or offsets into the BTF information, or
> >> whatever we end up needing. But just wrapping the existing log messages
> >> in TLVs like the ones above could be fairly straight-forwardly
> >> implemented with the existing bpf_log() infrastructure in the kernel,
> >> without having to settle on which machine-readable information is useful
> >> ahead of time... And the TLV format makes it easy for userspace to just
> >> skip message types it doesn't understand.
> >>
> >> WDYT?
> >
> > I think it's taking it a bit too far and adds more API on top of
> > existing API. Now all those types become a fixed API, messages become
> > an API, etc. Just more backwards compatibility stuff we need to
> > support forever, for, what I believe, very little gain.
> >
> > In practice, using human-readable strings works just fine. If there is
> > any kind of real issue, usually it involves humans reading debug logs
> > and understanding what's going on.
>
> Let me give an example of what I want to be able to do. Just today I
> helped someone debug getting xdp-filter to run, and they were getting
> output like this:
>
>       libbpf: -- BEGIN DUMP LOG ---
>       libbpf:
>       xdpfilt_alw_all() is not a global function
>       processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
>       peak_states 0 mark_read 0
>
>       libbpf: -- END LOG --
>
> I would like to have xdp-filter catch that, and turn it into a
> friendlier "your compiler is too old" message. Having to effectively
> grep through the free-form log output to pick out that message feels
> brittle and error prone, as opposed to just having the kernel add a
> machine-readable id ("err_func_linkage_not_global" or somesuch) and
> stick it in a machine-parsable TLV.

Yeah, I guessed as much, but I still think it's unnecessarily
restrictive (for BPF kernel developers) to adhere to such a rigorous
error format. We'll have a few hundred different failure reasons, and
with the verifier constantly evolving some of them might get
deprecated, others will be constantly added, some other might change
the meaning, etc. I don't see this realistically working long term. So
at that point, free-form text-delimited text is as good as we should
(and, realistically, would) probably go. Library, if it wants to be
extra-helpful, can still reasonably easily parse line-delimited text,
even if that's not an ideal setup (especially if error messages will
get periodically changed).

But I'd be happy to hear what others think on this matter as well, of course.

>
> > Also adopting these packet-like messages is not as straightforward
> > through BPF code, as now you can't just construct a single log line
> > with few calls to bpf_log().
>
> Why not? bpf_log() could just transparently write the four bytes of
> header (TYPE_STRING, followed by strlen(msg)) into the buffer before the
> string? And in the future, an enhanced version could take (say) an error
> ID as another parameter and transparently add that as a separate message.

I mean when you construct one error message with few printf-like
functions. We do have that in libbpf, but I haven't checked the
verifier code. Basically, assuming one bpf_log() call is a complete
"message" might not be true.

>
> >> > Of course, if user code cares about such detection of log truncation,
> >> > it will need to set last/second-to-last bytes to zero before each
> >> > syscall.
> >> >
> >> > Both approaches have their pros/cons, we can dig into those later, but
> >> > I'd like to start this discussion and see what other people think. I
> >> > also wonder if there are other syscalls with similarly advanced input
> >> > arguments (like bpf_attr) and common logging, we could learn from
> >> > those.
> >>
> >> The first one that comes to mind is netlink extacks. Of course it's not
> >> quite comparable since netlink already has message-based semantics, but
> >> it does do sorta-kinda the same thing from a user PoV. The TLV format is
> >> obviously inspired by netlink (or, well, binary networking protocols in
> >> general).
> >>
> >
> > Yeah, I'm aware of extack, but as you said, TLV is more of a netlink
> > format, extack messages themselves are just strings. But my question
> > was more of how this log could be done for complicated API calls using
> > similar extendable attrs, like perf_event_open, clone3, openat2, etc.
>
> Ah, right, no idea :)
>
> -Toke
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-14 21:58           ` Andrii Nakryiko
@ 2020-07-14 22:19             ` Toke Høiland-Jørgensen
  2020-07-14 23:11               ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-14 22:19 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend,
	KP Singh, Networking, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

>> However, assuming it *is* possible, my larger point was that we
>> shouldn't add just a 'logging struct', but rather a 'common options
>> struct' which can be extended further as needed. And if it is *not*
>> possible to add new arguments to a syscall like you're proposing, my
>> suggestion above would be a different way to achieve basically the same
>> (at the cost of having to specify the maximum reserved space in advance).
>>
>
> yeah-yeah, I agree, it's less a "logging attr", more of "common attr
> across all commands".

Right, great. I think we are broadly in agreement with where we want to
go with this, actually :)

Let's see if anyone else chimes in; otherwise I guess I can incorporate
something along these lines in the next version of this series. I'm
going on vacation at the end of this week, though, so I will most likely
not be able to carry it to completion before then; but at least I can
post something for someone else to pick up (or if no one does it can
wait until I get back).

[...]

> Yeah, ignore my initial rambling. One can do that (detecting
> truncationg) without any extra "feedback" from bpf syscall, but I
> think returning filled length is probably a better approach and
> doesn't hamper any other aspects.

OK, sure, makes sense.

[...]

>> > Also adopting these packet-like messages is not as straightforward
>> > through BPF code, as now you can't just construct a single log line
>> > with few calls to bpf_log().
>>
>> Why not? bpf_log() could just transparently write the four bytes of
>> header (TYPE_STRING, followed by strlen(msg)) into the buffer before the
>> string? And in the future, an enhanced version could take (say) an error
>> ID as another parameter and transparently add that as a separate message.
>
> I mean when you construct one error message with few printf-like
> functions. We do have that in libbpf, but I haven't checked the
> verifier code. Basically, assuming one bpf_log() call is a complete
> "message" might not be true.

Ah, I see what you mean. I guess that could be worked around with a flag
or something, but I'll concede that in that case it's less of an obvious
drop-in replacement :)

-Toke


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-14 22:19             ` Toke Høiland-Jørgensen
@ 2020-07-14 23:11               ` Alexei Starovoitov
  2020-07-15 12:56                 ` Toke Høiland-Jørgensen
  2020-07-15 19:02                 ` Andrii Nakryiko
  0 siblings, 2 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2020-07-14 23:11 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Networking, Kernel Team

On Wed, Jul 15, 2020 at 12:19:03AM +0200, Toke Høiland-Jørgensen wrote:
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> 
> >> However, assuming it *is* possible, my larger point was that we
> >> shouldn't add just a 'logging struct', but rather a 'common options
> >> struct' which can be extended further as needed. And if it is *not*
> >> possible to add new arguments to a syscall like you're proposing, my
> >> suggestion above would be a different way to achieve basically the same
> >> (at the cost of having to specify the maximum reserved space in advance).
> >>
> >
> > yeah-yeah, I agree, it's less a "logging attr", more of "common attr
> > across all commands".
> 
> Right, great. I think we are broadly in agreement with where we want to
> go with this, actually :)

I really don't like 'common attr across all commands'.
Both of you are talking as libbpf developers who occasionally need to
add printk-s to the kernel. That is not an excuse to bloat api that will be
useful to two people.

The only reason log_buf sort-of make sense in raw_tp_open is because
btf comparison is moved from prog_load into raw_tp_open.
Miscompare of (prog_fd1, btf_id1) vs (prog_fd2, btf_id2) can be easily solved
by libbpf with as nice and as human friendly message libbpf can do.

I'm not convinced yet that it's a kernel job to print it nicely. It certainly can,
but it's quite a bit different from two existing bpf commands where log_buf is used:
PROG_LOAD and BTF_LOAD. In these two cases the kernel verifies the program
and the BTF. raw_tp_open is different, since the kernel needs to compare
that function signatures (prog_fd1, btf_id1) and (prog_fd2, btf_id2) are
exactly the same. The kernel can indicate that with single specific errno and
libbpf can print human friendly function signatures via btf_dump infra for
humans to see.
So I really don't see why log_buf is such a necessity for raw_tp_open.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-14 23:11               ` Alexei Starovoitov
@ 2020-07-15 12:56                 ` Toke Høiland-Jørgensen
  2020-07-15 23:41                   ` Alexei Starovoitov
  2020-07-15 19:02                 ` Andrii Nakryiko
  1 sibling, 1 reply; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-15 12:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Networking, Kernel Team

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Wed, Jul 15, 2020 at 12:19:03AM +0200, Toke HÃƒÂ¸iland-JÃƒÂ¸rgensen wrote:
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> 
>> >> However, assuming it *is* possible, my larger point was that we
>> >> shouldn't add just a 'logging struct', but rather a 'common options
>> >> struct' which can be extended further as needed. And if it is *not*
>> >> possible to add new arguments to a syscall like you're proposing, my
>> >> suggestion above would be a different way to achieve basically the same
>> >> (at the cost of having to specify the maximum reserved space in advance).
>> >>
>> >
>> > yeah-yeah, I agree, it's less a "logging attr", more of "common attr
>> > across all commands".
>> 
>> Right, great. I think we are broadly in agreement with where we want to
>> go with this, actually :)
>
> I really don't like 'common attr across all commands'.
> Both of you are talking as libbpf developers who occasionally need to
> add printk-s to the kernel. That is not an excuse to bloat api that will be
> useful to two people.

What? No, this is about making error messages comprehensible to people
who *can't* just go around adding printks. "Guess the source of the
EINVAL" is a really bad user experience!

> The only reason log_buf sort-of make sense in raw_tp_open is because
> btf comparison is moved from prog_load into raw_tp_open.
> Miscompare of (prog_fd1, btf_id1) vs (prog_fd2, btf_id2) can be easily solved
> by libbpf with as nice and as human friendly message libbpf can do.

So userspace is supposed to replicate all the checks done by the kernel
because we can't be bothered to add proper error messages? Really?

> I'm not convinced yet that it's a kernel job to print it nicely. It certainly can,
> but it's quite a bit different from two existing bpf commands where log_buf is used:
> PROG_LOAD and BTF_LOAD. In these two cases the kernel verifies the program
> and the BTF. raw_tp_open is different, since the kernel needs to compare
> that function signatures (prog_fd1, btf_id1) and (prog_fd2, btf_id2) are
> exactly the same. The kernel can indicate that with single specific errno and
> libbpf can print human friendly function signatures via btf_dump infra for
> humans to see.
> So I really don't see why log_buf is such a necessity for raw_tp_open.

I'll drop them from raw_tp_open for this series, but I still think they
should be added globally (or something like it). Returning a
user-friendly error message should be the absolute minimum we do. Just
like extack does for netlink.

-Toke


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-14 23:11               ` Alexei Starovoitov
  2020-07-15 12:56                 ` Toke Høiland-Jørgensen
@ 2020-07-15 19:02                 ` Andrii Nakryiko
  1 sibling, 0 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2020-07-15 19:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Toke Høiland-Jørgensen, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, Networking,
	Kernel Team

On Tue, Jul 14, 2020 at 4:11 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jul 15, 2020 at 12:19:03AM +0200, Toke Høiland-Jørgensen wrote:
> > Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >
> > >> However, assuming it *is* possible, my larger point was that we
> > >> shouldn't add just a 'logging struct', but rather a 'common options
> > >> struct' which can be extended further as needed. And if it is *not*
> > >> possible to add new arguments to a syscall like you're proposing, my
> > >> suggestion above would be a different way to achieve basically the same
> > >> (at the cost of having to specify the maximum reserved space in advance).
> > >>
> > >
> > > yeah-yeah, I agree, it's less a "logging attr", more of "common attr
> > > across all commands".
> >
> > Right, great. I think we are broadly in agreement with where we want to
> > go with this, actually :)
>
> I really don't like 'common attr across all commands'.
> Both of you are talking as libbpf developers who occasionally need to
> add printk-s to the kernel.

How did you come to this conclusion?

Inability to figure out what's wrong when using BPF is at the top of
complaints from many users, together with hard to understand logs from
verifier.

> That is not an excuse to bloat api that will be
> useful to two people.

What do you mean specifically by bloat in API? I could understand how
it would bloat the API if we were to add log-related fields into every
part of bpf_attr, one for each type of commands. But that's exactly
what I advocate to not do.

But having even a slight improvement of error reporting, beyond
current -EINVAL, -E2BIG, etc, would improve experience immensely for
*all* BPF users.

>
> The only reason log_buf sort-of make sense in raw_tp_open is because
> btf comparison is moved from prog_load into raw_tp_open.
> Miscompare of (prog_fd1, btf_id1) vs (prog_fd2, btf_id2) can be easily solved
> by libbpf with as nice and as human friendly message libbpf can do.
>
> I'm not convinced yet that it's a kernel job to print it nicely. It certainly can,
> but it's quite a bit different from two existing bpf commands where log_buf is used:
> PROG_LOAD and BTF_LOAD. In these two cases the kernel verifies the program
> and the BTF. raw_tp_open is different, since the kernel needs to compare
> that function signatures (prog_fd1, btf_id1) and (prog_fd2, btf_id2) are
> exactly the same. The kernel can indicate that with single specific errno and
> libbpf can print human friendly function signatures via btf_dump infra for
> humans to see.
> So I really don't see why log_buf is such a necessity for raw_tp_open.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-15 12:56                 ` Toke Høiland-Jørgensen
@ 2020-07-15 23:41                   ` Alexei Starovoitov
  2020-07-16  1:11                     ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2020-07-15 23:41 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Networking, Kernel Team

On Wed, Jul 15, 2020 at 02:56:36PM +0200, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Wed, Jul 15, 2020 at 12:19:03AM +0200, Toke HÃƒÂ¸iland-JÃƒÂ¸rgensen wrote:
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >> 
> >> >> However, assuming it *is* possible, my larger point was that we
> >> >> shouldn't add just a 'logging struct', but rather a 'common options
> >> >> struct' which can be extended further as needed. And if it is *not*
> >> >> possible to add new arguments to a syscall like you're proposing, my
> >> >> suggestion above would be a different way to achieve basically the same
> >> >> (at the cost of having to specify the maximum reserved space in advance).
> >> >>
> >> >
> >> > yeah-yeah, I agree, it's less a "logging attr", more of "common attr
> >> > across all commands".
> >> 
> >> Right, great. I think we are broadly in agreement with where we want to
> >> go with this, actually :)
> >
> > I really don't like 'common attr across all commands'.
> > Both of you are talking as libbpf developers who occasionally need to
> > add printk-s to the kernel. That is not an excuse to bloat api that will be
> > useful to two people.
> 
> What? No, this is about making error messages comprehensible to people
> who *can't* just go around adding printks. "Guess the source of the
> EINVAL" is a really bad user experience!
> 
> > The only reason log_buf sort-of make sense in raw_tp_open is because
> > btf comparison is moved from prog_load into raw_tp_open.
> > Miscompare of (prog_fd1, btf_id1) vs (prog_fd2, btf_id2) can be easily solved
> > by libbpf with as nice and as human friendly message libbpf can do.
> 
> So userspace is supposed to replicate all the checks done by the kernel
> because we can't be bothered to add proper error messages? Really?

That's not what I said. The kernel can report unique errno for miscompare
and all nice messages can and _should be_ be printed by libbpf.

On Wed, Jul 15, 2020, Andrii Nakryiko wrote:
>
> Inability to figure out what's wrong when using BPF is at the top of
> complaints from many users, together with hard to understand logs from
> verifier.

Only the second part is true. All users are complaining about the verifier.
No one is complaing that failed prog attach is somehow lacking string message.
The users are also complaing about libbpf being too verbose.
Yet you've refused to address the verbosity where it should be reduced and
now refusing to add it where it's needed.
It's libbpf job to explain users kernel errors.

The same thing is happening with perf_event_open syscall.
Every one who's trying to code it directly complaining about the kernel. But
not a single user is complaing about perf syscall when they use libraries and
tools. Same thing with bpf syscall. libbpf is the interface. It needs to clear
and to the point. Right now it's not doing it well. elf dump is too verbose and
unnecessary whereas in other places it says nothing informative where it
could have printed human hint.

libbpf's pr_perm_msg() hint is the only one where libbpf cares about its users.
All other messages are useful to libbpf developers and not its users.

The kernel verifier messages suck as well. They need to be improved.
But this thread 'lets add strings everywhere and users will be happy' is
completely missing the mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-15 23:41                   ` Alexei Starovoitov
@ 2020-07-16  1:11                     ` Andrii Nakryiko
  2020-07-16  5:44                       ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-07-16  1:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Toke Høiland-Jørgensen, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, Networking,
	Kernel Team

On Wed, Jul 15, 2020 at 4:41 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jul 15, 2020 at 02:56:36PM +0200, Toke Høiland-Jørgensen wrote:
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >
> > > On Wed, Jul 15, 2020 at 12:19:03AM +0200, Toke HÃƒÂ¸iland-JÃƒÂ¸rgensen wrote:
> > >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> > >>
> > >> >> However, assuming it *is* possible, my larger point was that we
> > >> >> shouldn't add just a 'logging struct', but rather a 'common options
> > >> >> struct' which can be extended further as needed. And if it is *not*
> > >> >> possible to add new arguments to a syscall like you're proposing, my
> > >> >> suggestion above would be a different way to achieve basically the same
> > >> >> (at the cost of having to specify the maximum reserved space in advance).
> > >> >>
> > >> >
> > >> > yeah-yeah, I agree, it's less a "logging attr", more of "common attr
> > >> > across all commands".
> > >>
> > >> Right, great. I think we are broadly in agreement with where we want to
> > >> go with this, actually :)
> > >
> > > I really don't like 'common attr across all commands'.
> > > Both of you are talking as libbpf developers who occasionally need to
> > > add printk-s to the kernel. That is not an excuse to bloat api that will be
> > > useful to two people.
> >
> > What? No, this is about making error messages comprehensible to people
> > who *can't* just go around adding printks. "Guess the source of the
> > EINVAL" is a really bad user experience!
> >
> > > The only reason log_buf sort-of make sense in raw_tp_open is because
> > > btf comparison is moved from prog_load into raw_tp_open.
> > > Miscompare of (prog_fd1, btf_id1) vs (prog_fd2, btf_id2) can be easily solved
> > > by libbpf with as nice and as human friendly message libbpf can do.
> >
> > So userspace is supposed to replicate all the checks done by the kernel
> > because we can't be bothered to add proper error messages? Really?
>
> That's not what I said. The kernel can report unique errno for miscompare
> and all nice messages can and _should be_ be printed by libbpf.
>
>
> On Wed, Jul 15, 2020, Andrii Nakryiko wrote:
> >
> > Inability to figure out what's wrong when using BPF is at the top of
> > complaints from many users, together with hard to understand logs from
> > verifier.
>
> Only the second part is true. All users are complaining about the verifier.
> No one is complaing that failed prog attach is somehow lacking string message.

Ok, next time I'll be helping someone to figure out another -EINVAL,
I'll remember to reassure them that it's not really frustrating, not a
guess game, and not a time sink at all.

> The users are also complaing about libbpf being too verbose.

Very well might be, but apart from your complaints on that patch
adding program loading debug message, I can't remember a single case
when someone complained about that. Do you have a link for me to get
some context?

> Yet you've refused to address the verbosity where it should be reduced and

It's open-source, everyone is welcome to submit their patches. Just
because I don't think we need to remove some log messages and thus not
am creating such patches, doesn't mean it can't be done by someone
else.

> now refusing to add it where it's needed.

Can you point to or quote where I refused to add a helpful message to libbpf?

> It's libbpf job to explain users kernel errors.

To the best of its ability, yes. Unfortunately there were many times
where I, as a human, couldn't figure it out without printk'ing my way
around the kernel. If I can't do that, I can't teach libbpf to do it.
Error codes are just not granular enough to allow distinguishing a lot
of error conditions, either by humans or automatically by libbpf.

>
> The same thing is happening with perf_event_open syscall.
> Every one who's trying to code it directly complaining about the kernel. But
> not a single user is complaing about perf syscall when they use libraries and
> tools. Same thing with bpf syscall. libbpf is the interface. It needs to clear
> and to the point. Right now it's not doing it well. elf dump is too verbose and
> unnecessary whereas in other places it says nothing informative where it
> could have printed human hint.
>
> libbpf's pr_perm_msg() hint is the only one where libbpf cares about its users.
> All other messages are useful to libbpf developers and not its users.

"Couldn't load trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value."
"kernel doesn't support global data"
"can't attach BPF program w/o FD (did you load it?)"
"specified path %s is not on BPF FS"
"vmlinux BTF is not found" -- we should clearly add "you need a kernel
built with CONFIG_DEBUG_INFO_BTF=y"
"invalid relo for \'%s\' in special section 0x%x; forgot to initialize
global var?.."

And so on. Sure, we can have more and better error messages, no one is
claiming libbpf is perfect or done.

That doesn't mean, though, that the kernel itself can't do better in
terms of error reporting. But you clearly don't think it's a real
problem, so I'll let it rest, thank you.

>
> The kernel verifier messages suck as well. They need to be improved.
> But this thread 'lets add strings everywhere and users will be happy' is
> completely missing the mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-16  1:11                     ` Andrii Nakryiko
@ 2020-07-16  5:44                       ` Alexei Starovoitov
  2020-07-16 19:59                         ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2020-07-16  5:44 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, Networking,
	Kernel Team

On Wed, Jul 15, 2020 at 06:11:39PM -0700, Andrii Nakryiko wrote:
> >
> > On Wed, Jul 15, 2020, Andrii Nakryiko wrote:
> > >
> > > Inability to figure out what's wrong when using BPF is at the top of
> > > complaints from many users, together with hard to understand logs from
> > > verifier.
> >
> > Only the second part is true. All users are complaining about the verifier.
> > No one is complaing that failed prog attach is somehow lacking string message.
> 
> Ok, next time I'll be helping someone to figure out another -EINVAL,
> I'll remember to reassure them that it's not really frustrating, not a
> guess game, and not a time sink at all.

When the next time the users hit EINVAL due to _their_ usage and not
due to kernel or libbpf bug and libbpf couldn't do anything to make
the error user friendly then yes please raise it up.

> > The users are also complaing about libbpf being too verbose.
> 
> Very well might be, but apart from your complaints on that patch
> adding program loading debug message, I can't remember a single case
> when someone complained about that. Do you have a link for me to get
> some context?
> 
> > Yet you've refused to address the verbosity where it should be reduced and
> 
> It's open-source, everyone is welcome to submit their patches. Just
> because I don't think we need to remove some log messages and thus not
> am creating such patches, doesn't mean it can't be done by someone
> else.

So myself and Toke are wearing 'bpf user' hat in that context.
Both of us indicated that libbpf output is too verbose.
Your response "just send a patch" is a sure way to turn away more users.

> > now refusing to add it where it's needed.
> 
> Can you point to or quote where I refused to add a helpful message to libbpf?

see below for detailed example.

> 
> > It's libbpf job to explain users kernel errors.
> 
> To the best of its ability, yes. Unfortunately there were many times
> where I, as a human, couldn't figure it out without printk'ing my way
> around the kernel. If I can't do that, I can't teach libbpf to do it.
> Error codes are just not granular enough to allow distinguishing a lot
> of error conditions, either by humans or automatically by libbpf.

If there are such cases please bring it up. I'm sure kernel errnos
can become more unique.

> >
> > The same thing is happening with perf_event_open syscall.
> > Every one who's trying to code it directly complaining about the kernel. But
> > not a single user is complaing about perf syscall when they use libraries and
> > tools. Same thing with bpf syscall. libbpf is the interface. It needs to clear
> > and to the point. Right now it's not doing it well. elf dump is too verbose and
> > unnecessary whereas in other places it says nothing informative where it
> > could have printed human hint.
> >
> > libbpf's pr_perm_msg() hint is the only one where libbpf cares about its users.
> > All other messages are useful to libbpf developers and not its users.
> 
> "Couldn't load trivial BPF program. Make sure your kernel supports BPF
> (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
> value."
> "kernel doesn't support global data"
> "can't attach BPF program w/o FD (did you load it?)"
> "specified path %s is not on BPF FS"
> "vmlinux BTF is not found" -- we should clearly add "you need a kernel
> built with CONFIG_DEBUG_INFO_BTF=y"
> "invalid relo for \'%s\' in special section 0x%x; forgot to initialize
> global var?.."

sure. those count too.

> That doesn't mean, though, that the kernel itself can't do better in
> terms of error reporting. But you clearly don't think it's a real
> problem, so I'll let it rest, thank you.

It is a real problem and libbpf has to step up to address it.
The kernel does everything it can already.

Let's take this raw_tp_open patches as an example.
Currently raw_tp_open will EINVAL if prog_fd is incorrect, tp name
is not specified or expected_attach_type doesn't match.
If that happens it's a _libbpf_ bug. It's not a user mistake.
With Toke's patches tgt prog_fd and btf_id are added.
Both can be incorrect. If that happens it's most likely libbpf bug.
The user is writing their bpf_prog.c file with SEC("freplace/name")
libbpf could have messed up prog_fd and btf_id resolution.
The user didn't specify btf_id as a raw integer.
It was a libbpf job to convert user's string name into prog_fd
and btf_id in the first place. If libbpf messes it it shouldn't rely
on the kernel to catch such bugs. It's not a job of the kernel to
point out bugs in the libraries that suppose to be tightly
coupled with the kernel.
When libbpf is doing its job correctly tgt_prog_fd and tgt_btf_id are
valid and BTF of the extension prog can miscompare with BTF of
target prog it's trying to attach to via raw_tp_open with these two
arguments. Take a look at Toke's patches. That comparison is done via
btf_check_type_match(). The most helpful kernel message in such case
will be 'arg2 in foo() has size 4 while bar() has 2'.
That's the best the kernel can do. Yet it's very user unfriendly.
The kernel has no infra today to print BTF in a human friendly way.
But libbpf has BTF dumper. btf_dump__emit_type_decl() alone is
magnitude friendlier to users than kernel messages.
When the kernel's raw_tp_open syscall realizes that BTFs don't match
it can return single unique errno and libbpf can tell the user:
"
Function prototype of BPF prog:
   int ext_bpf_prog(struct __sk_buff *);
doesn't match function prototype of target prog:
   int tgt_prog_name(struct xdp_md *);
"
Everything is available in libbpf to print such error in human friendly way.
Whereas the kernel has no ability to do so.
If that message was coming from the kernel it would have been from
btf_check_type_match() and would be:
"arg1 in ext_bpf_prog() is not a pointer to context".
Compare that to above libbpf error message.
I think the difference is pretty drastic.
libbpf by far is a better place to print human friendly errors.

I'm arguing that raw_tp_open with additional tgt_prog_fd and tgt_btf_id
parameters is not much different from sys_kcmp syscall and from memcmp() libc
function. They compare two objects and return equal or not.
It's not a job of the kernel to say
'objects are not equal because byte 2 is different'.
Such message is not helpeful to users.
It's libbpf job to print two BTFs in human friendly way when the kernel
found them different. It's easier for human to glance over two function
prototypes and spot the difference instead of 'arg1 is not a pointer to context'.

So please make libbpf user friendly. Enough of pointing fingers at the kernel.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-16  5:44                       ` Alexei Starovoitov
@ 2020-07-16 19:59                         ` Andrii Nakryiko
  2020-07-16 20:19                           ` Toke Høiland-Jørgensen
  2020-07-17  3:09                           ` Alexei Starovoitov
  0 siblings, 2 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2020-07-16 19:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Toke Høiland-Jørgensen, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, Networking,
	Kernel Team, Blake Matheny

On Wed, Jul 15, 2020 at 10:44 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Jul 15, 2020 at 06:11:39PM -0700, Andrii Nakryiko wrote:
> > >
> > > On Wed, Jul 15, 2020, Andrii Nakryiko wrote:
> > > >
> > > > Inability to figure out what's wrong when using BPF is at the top of
> > > > complaints from many users, together with hard to understand logs from
> > > > verifier.
> > >
> > > Only the second part is true. All users are complaining about the verifier.
> > > No one is complaing that failed prog attach is somehow lacking string message.
> >
> > Ok, next time I'll be helping someone to figure out another -EINVAL,
> > I'll remember to reassure them that it's not really frustrating, not a
> > guess game, and not a time sink at all.
>
> When the next time the users hit EINVAL due to _their_ usage and not
> due to kernel or libbpf bug and libbpf couldn't do anything to make
> the error user friendly then yes please raise it up.

I know this is futile to convince you anyways, so I won't go dig all
the details, but here are few general examples. With the benefit of
hindsight in a lot of those cases libbpf can do extra checks and
guessing (though bad guess is worse than no guess), but that doesn't
scale because of the sheer amount of possible situations.

What I personally went through when I was building runqslower:

1. You write a simple BPF program, open + load. You run it, you get EPERM.
2. You go check if you ran the program with sudo, retry with sudo.
3. You still get EPERM. You bang your head against the wall for 30
minutes, you recall about RLIMIT_MEMLOCK, you bump it up.
4. If you are happy, now everything works. I don't remember now if I
had to bang my head some more until I got the initial minimal version
of runqslower successfully load BPF program.

Some popular other issues I can recall.

1. People get excited and try to use fentry/fexit, get -EINVAL. Most
probably it's because they don't have a recent enough kernel.
Otherwise (if they know libbpf behavior), they'd probably have gotten
something about not being able to find the desired BTF func in the
kernel. Even in that case, is that that they don't have BTF at all
(CONFIG_DEBUG_BTF_INFO=y)? Or it's just an old one with no FUNCs
because pahole isn't recent enough? Kernel build won't mention that
you need 1.16 for fentry/fexit, of course. And so on. This is actually
not the worst case, because I can walk them through this process.

2. When dealing with cgroups. You get EINVAL for every tiny misstep.
You want to replace the BPF program, you get EINVAL. Maybe you forgot
one of the necessary flags? Or maybe you specified incompatible flags
(BPF_F_REPLACE | BPF_F_ALLOW_MULTI)? Or the parent cgroup has a
non-overridable BPF program attached already? Or maybe prog FD is
wrong, or cgroup FD is wrong, or?...

3. Someone gets excited about libbpf-tools in BCC, decides to convert
a tool. Needs to dump the hashmap fast. There is BATCH_LOOKUP, nice!
They try, they get EINVAL. Most probably outdated kernel, but I'll
need to dig into kernel sources to see what else could go wrong.

4. Try using some of the low-level APIs from libbpf/bpf.h, like the
same cgroup program replacement. You'll get E2BIG and will scratch
your head for a while, checking how the list of attached BPF programs
can be too big, if it's the only one. Just to realize that again, your
kernel is just a touch too old and just complains about non-zero
unknown field in bpf_attr.

5. Just few days ago, one user was doing bpf_program__attach_tracepoint():

libbpf: program 'tp/sched/sched_process_exit': failed to attach to pfd
10: File exists

After a bit of back and forth turns out he had a second instance of
that program running in parallel. Good, I quickly realize that it's an
old kernel and it doesn't allow me to attach more than 1 BPF program
to the tracepoint. Case solved. But what about many other BPF users
that do not have access to someone developing BPF in the kernel?

And the list goes on. Even if it was my full-time job just to
anticipate all the misuses and try to check/guess them in libbpf, that
wouldn't work and won't scale.

And all I was asking (and not finger pointing or blaming anything or
anyone) to have a mechanism in the kernel to get a single-line
human-readable hint as to which one out of many
EINVAL/E2BIG/ENOENT/EPERM conditions was hit. No need to immediately
convert all of them, we could have gradually added that, prioritizing
common and most probable ones to hit. But alas, it's easier to blame
me for not perfecting libbpf and anticipating all that beforehand.

>
> > > The users are also complaing about libbpf being too verbose.
> >
> > Very well might be, but apart from your complaints on that patch
> > adding program loading debug message, I can't remember a single case
> > when someone complained about that. Do you have a link for me to get
> > some context?
> >
> > > Yet you've refused to address the verbosity where it should be reduced and
> >
> > It's open-source, everyone is welcome to submit their patches. Just
> > because I don't think we need to remove some log messages and thus not
> > am creating such patches, doesn't mean it can't be done by someone
> > else.
>
> So myself and Toke are wearing 'bpf user' hat in that context.
> Both of us indicated that libbpf output is too verbose.
> Your response "just send a patch" is a sure way to turn away more users.
>

I can't find any such complaint from Toke in this thread, and can't
really recall something like that from recent discussions. I'd rather
have him speak for himself.

But neither you nor Toke are "just bpf users", you are totally capable
of creating a patch addressing something that bugs you.

Upstream BPF users have never complained to me about the verbosity of
logs. Furthermore, they rarely (if ever) see those **debug** logs,
until I actually ask them to turn it on to help me help them. Yes,
debug-level libbpf logs are driven more towards me and other people
(like Toke) who know internals and are helping users with issues. I
actually try to help BPF users, not ask them for patches to libbpf,
and you can find many patches from me where I was
improving/fixing/adding to kernel and libbpf as a response to both
upstream and internal BPF users. I am far from asking for recognition
of that, but please don't make false accusations either.

> > > now refusing to add it where it's needed.
> >
> > Can you point to or quote where I refused to add a helpful message to libbpf?
>
> see below for detailed example.
>
> >
> > > It's libbpf job to explain users kernel errors.
> >
> > To the best of its ability, yes. Unfortunately there were many times
> > where I, as a human, couldn't figure it out without printk'ing my way
> > around the kernel. If I can't do that, I can't teach libbpf to do it.
> > Error codes are just not granular enough to allow distinguishing a lot
> > of error conditions, either by humans or automatically by libbpf.
>
> If there are such cases please bring it up. I'm sure kernel errnos
> can become more unique.

See above for a few typical examples. I don't keep track of all the
issues I had to debug myself or help others with.

>
> > >
> > > The same thing is happening with perf_event_open syscall.
> > > Every one who's trying to code it directly complaining about the kernel. But
> > > not a single user is complaing about perf syscall when they use libraries and
> > > tools. Same thing with bpf syscall. libbpf is the interface. It needs to clear
> > > and to the point. Right now it's not doing it well. elf dump is too verbose and
> > > unnecessary whereas in other places it says nothing informative where it
> > > could have printed human hint.
> > >
> > > libbpf's pr_perm_msg() hint is the only one where libbpf cares about its users.
> > > All other messages are useful to libbpf developers and not its users.
> >
> > "Couldn't load trivial BPF program. Make sure your kernel supports BPF
> > (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
> > value."
> > "kernel doesn't support global data"
> > "can't attach BPF program w/o FD (did you load it?)"
> > "specified path %s is not on BPF FS"
> > "vmlinux BTF is not found" -- we should clearly add "you need a kernel
> > built with CONFIG_DEBUG_INFO_BTF=y"
> > "invalid relo for \'%s\' in special section 0x%x; forgot to initialize
> > global var?.."
>
> sure. those count too.

great, thanks for acknowledging

>
> > That doesn't mean, though, that the kernel itself can't do better in
> > terms of error reporting. But you clearly don't think it's a real
> > problem, so I'll let it rest, thank you.
>
> It is a real problem and libbpf has to step up to address it.
> The kernel does everything it can already.
>
> Let's take this raw_tp_open patches as an example.
> Currently raw_tp_open will EINVAL if prog_fd is incorrect, tp name
> is not specified or expected_attach_type doesn't match.
> If that happens it's a _libbpf_ bug. It's not a user mistake.
> With Toke's patches tgt prog_fd and btf_id are added.
> Both can be incorrect. If that happens it's most likely libbpf bug.
> The user is writing their bpf_prog.c file with SEC("freplace/name")
> libbpf could have messed up prog_fd and btf_id resolution.
> The user didn't specify btf_id as a raw integer.
> It was a libbpf job to convert user's string name into prog_fd
> and btf_id in the first place. If libbpf messes it it shouldn't rely
> on the kernel to catch such bugs. It's not a job of the kernel to
> point out bugs in the libraries that suppose to be tightly
> coupled with the kernel.
> When libbpf is doing its job correctly tgt_prog_fd and tgt_btf_id are
> valid and BTF of the extension prog can miscompare with BTF of
> target prog it's trying to attach to via raw_tp_open with these two
> arguments. Take a look at Toke's patches. That comparison is done via
> btf_check_type_match(). The most helpful kernel message in such case
> will be 'arg2 in foo() has size 4 while bar() has 2'.
> That's the best the kernel can do. Yet it's very user unfriendly.
> The kernel has no infra today to print BTF in a human friendly way.
> But libbpf has BTF dumper. btf_dump__emit_type_decl() alone is
> magnitude friendlier to users than kernel messages.
> When the kernel's raw_tp_open syscall realizes that BTFs don't match
> it can return single unique errno and libbpf can tell the user:
> "
> Function prototype of BPF prog:
>    int ext_bpf_prog(struct __sk_buff *);
> doesn't match function prototype of target prog:
>    int tgt_prog_name(struct xdp_md *);
> "
> Everything is available in libbpf to print such error in human friendly way.
> Whereas the kernel has no ability to do so.
> If that message was coming from the kernel it would have been from
> btf_check_type_match() and would be:
> "arg1 in ext_bpf_prog() is not a pointer to context".
> Compare that to above libbpf error message.
> I think the difference is pretty drastic.
> libbpf by far is a better place to print human friendly errors.
>
> I'm arguing that raw_tp_open with additional tgt_prog_fd and tgt_btf_id
> parameters is not much different from sys_kcmp syscall and from memcmp() libc
> function. They compare two objects and return equal or not.
> It's not a job of the kernel to say
> 'objects are not equal because byte 2 is different'.
> Such message is not helpeful to users.
> It's libbpf job to print two BTFs in human friendly way when the kernel
> found them different. It's easier for human to glance over two function
> prototypes and spot the difference instead of 'arg1 is not a pointer to context'.

Ok, so I assume that is an example where I refused to add helpful
error reporting to libbpf?

First, thanks for taking the time to elaborate on this. I'm sure Toke
will appreciate it, though I have nothing to do with arguing for or
against this. It was Toke's scepticism, not mine, that you are
addressing. I never said we shouldn't do this.

I do agree that in some cases libbpf has the same or more information
and it's easier to pretty-print extra context, like in this case,
which we should take advantage of.

I started this topic to discuss extending bpf() syscall in general,
regardless of this particular change and feature.

>
> So please make libbpf user friendly. Enough of pointing fingers at the kernel.

It's an honor you are associating me personally and libbpf, but I
still hope we can move this forward as a community and collective
project. Of course let's make libbpf more user-friendly where possible
and feasible, but I don't think arguing about debug-level logging in
libbpf is the right place to start.

I don't know where "finger pointing" comes from, I had no intent nor I
said anything to offend either the kernel or you personally. Proposing
changes up for discussion shouldn't be considered a form of attacking
or finger pointing, I hope.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-16 19:59                         ` Andrii Nakryiko
@ 2020-07-16 20:19                           ` Toke Høiland-Jørgensen
  2020-07-17  3:09                           ` Alexei Starovoitov
  1 sibling, 0 replies; 26+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-07-16 20:19 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend,
	KP Singh, Networking, Kernel Team, Blake Matheny

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

>> So myself and Toke are wearing 'bpf user' hat in that context.
>> Both of us indicated that libbpf output is too verbose.
>> Your response "just send a patch" is a sure way to turn away more users.
>>
>
> I can't find any such complaint from Toke in this thread, and can't
> really recall something like that from recent discussions. I'd rather
> have him speak for himself.

I think what I said (not in this thread, way back during some other
discussion) was that I agreed that libbpf was being too verbose by
dumping all the sections and relocations it finds when reading an ELF
file, which causes the useful error messages to get lost. I would like
to see those messages demoted to another log level, or removed
altogether.

I won't have time to look more at this right now, but I do plan to
circle back to it: I agree with you that we need to make this more
friendly. And yes, I also think this should include finding a way to
disambiguate between different conditions leading to the same error from
the kernel. I've run into a lot of the same issues as you when
supporting people who are new to BPF - thank you for the extensive list!

-Toke

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-16 19:59                         ` Andrii Nakryiko
  2020-07-16 20:19                           ` Toke Høiland-Jørgensen
@ 2020-07-17  3:09                           ` Alexei Starovoitov
  2020-07-18  3:54                             ` Andrii Nakryiko
  1 sibling, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2020-07-17  3:09 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, Networking,
	Kernel Team, Blake Matheny

On Thu, Jul 16, 2020 at 12:59:30PM -0700, Andrii Nakryiko wrote:
> On Wed, Jul 15, 2020 at 10:44 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Jul 15, 2020 at 06:11:39PM -0700, Andrii Nakryiko wrote:
> > > >
> > > > On Wed, Jul 15, 2020, Andrii Nakryiko wrote:
> > > > >
> > > > > Inability to figure out what's wrong when using BPF is at the top of
> > > > > complaints from many users, together with hard to understand logs from
> > > > > verifier.
> > > >
> > > > Only the second part is true. All users are complaining about the verifier.
> > > > No one is complaing that failed prog attach is somehow lacking string message.
> > >
> > > Ok, next time I'll be helping someone to figure out another -EINVAL,
> > > I'll remember to reassure them that it's not really frustrating, not a
> > > guess game, and not a time sink at all.
> >
> > When the next time the users hit EINVAL due to _their_ usage and not
> > due to kernel or libbpf bug and libbpf couldn't do anything to make
> > the error user friendly then yes please raise it up.
> 
> I know this is futile to convince you anyways, so I won't go dig all
> the details, but here are few general examples. With the benefit of
> hindsight in a lot of those cases libbpf can do extra checks and
> guessing (though bad guess is worse than no guess), but that doesn't
> scale because of the sheer amount of possible situations.
> 
> What I personally went through when I was building runqslower:
> 
> 1. You write a simple BPF program, open + load. You run it, you get EPERM.
> 2. You go check if you ran the program with sudo, retry with sudo.
> 3. You still get EPERM. You bang your head against the wall for 30
> minutes, you recall about RLIMIT_MEMLOCK, you bump it up.
> 4. If you are happy, now everything works. I don't remember now if I
> had to bang my head some more until I got the initial minimal version
> of runqslower successfully load BPF program.

But we have libbpf hinting on that already?
Are you saying that hint wasn't working somehow?

> 
> Some popular other issues I can recall.
> 
> 1. People get excited and try to use fentry/fexit, get -EINVAL. Most
> probably it's because they don't have a recent enough kernel.
> Otherwise (if they know libbpf behavior), they'd probably have gotten
> something about not being able to find the desired BTF func in the
> kernel. Even in that case, is that that they don't have BTF at all
> (CONFIG_DEBUG_BTF_INFO=y)? Or it's just an old one with no FUNCs
> because pahole isn't recent enough? Kernel build won't mention that
> you need 1.16 for fentry/fexit, of course. And so on. This is actually
> not the worst case, because I can walk them through this process.

Exactly. Either old kernel or missing config in the kernel.
libbpf is above that. It could have provided a hint.
When libbpf is processing SEC("fentry/") and it fails to load with
empty verifier log it could start probing.
Only libbpf can do it. Kernel is helpless here.
Say we change the kernel errno for all unsuported prog types and maps
it would return ENOTSUPP or something.
Would it really help the situation?
Probably not. There will be old kernels and the same usability
issue as you described.

> 2. When dealing with cgroups. You get EINVAL for every tiny misstep.
> You want to replace the BPF program, you get EINVAL. Maybe you forgot
> one of the necessary flags? Or maybe you specified incompatible flags
> (BPF_F_REPLACE | BPF_F_ALLOW_MULTI)? Or the parent cgroup has a
> non-overridable BPF program attached already? Or maybe prog FD is
> wrong, or cgroup FD is wrong, or?...

A wrong FD for prog or cgroup would be a libbpf bug.
imo that's the case where kernel must not return a string even
if there is a log_buf facility.

non-overridable already attached is EPERM. It's not EINVAL.
multi-prog vs overridable is also EPERM.
Say, the kernel has log_buf for hierarchy_allows_attach().
what kind of message do you think it can print that will be user friendly?
"attach is not allowed because there is a parent cgroup
that has no override flag set"
How is it much better than EPERM?
Two differentiate between these two EPERM?
Both to me look like the same category if permission checks.
I don't know how to the kernel can print full cgroup path here.
It has 'struct cgroup *'. Then go to kernfs? then what?
cgroupfs is mounted somwhere.
To have meaningul message the user would want to see something like:
"
  /sys/fs/cgroup/my_container <- has no-override bpf prog, hence
  the kernel doesn't allow attach at:
  /sys/fs/cgroup/my_container/here
"
Only libbpf can print such message.
The user gave libbpf a string path. Attaching is relative to that.
Whereas the kernel has hard time with mounts/paths/namespaces.

May be to solve this cgroup attaching ambiguity we can add a libbpf
helper that can walk hierarchy and print state?
Either libbpf calls it automatically or user can trigger it?

> 3. Someone gets excited about libbpf-tools in BCC, decides to convert
> a tool. Needs to dump the hashmap fast. There is BATCH_LOOKUP, nice!
> They try, they get EINVAL. Most probably outdated kernel, but I'll
> need to dig into kernel sources to see what else could go wrong.

right. that is similar to 1. I don't see how kernel could have helped.
Say sys_bpf got top level log_buf for _all_ commands.
The user is passing a command that is unsupported.
The kernel has no clue whether log_buf is even there in bpf_attr.
It cannot return a string. Only errno.
If we change that errno from current EINVAL to ENOTSUPP...
it would go back to the point I made above.
Not really helping much.
imo that's another case where libbpf probing can go long way to
improve user experience.

> 
> 4. Try using some of the low-level APIs from libbpf/bpf.h, like the
> same cgroup program replacement. You'll get E2BIG and will scratch
> your head for a while, checking how the list of attached BPF programs
> can be too big, if it's the only one. Just to realize that again, your
> kernel is just a touch too old and just complains about non-zero
> unknown field in bpf_attr.

The low-level APIs of libbpf is indeed a pain.
There are also old and partially broken libbpf APIs.
I think we need to start aggressively deprecating some of them.
Especially those that cause debugging issues to users.

> 5. Just few days ago, one user was doing bpf_program__attach_tracepoint():
> 
> libbpf: program 'tp/sched/sched_process_exit': failed to attach to pfd
> 10: File exists
> 
> After a bit of back and forth turns out he had a second instance of
> that program running in parallel. Good, I quickly realize that it's an
> old kernel and it doesn't allow me to attach more than 1 BPF program
> to the tracepoint. Case solved. But what about many other BPF users
> that do not have access to someone developing BPF in the kernel?

I similarly don't see how kernel string would have helped.
It would have said the same thing: "cannot attach".

> And the list goes on. Even if it was my full-time job just to
> anticipate all the misuses and try to check/guess them in libbpf, that
> wouldn't work and won't scale.
> 
> And all I was asking (and not finger pointing or blaming anything or
> anyone) to have a mechanism in the kernel to get a single-line
> human-readable hint as to which one out of many
> EINVAL/E2BIG/ENOENT/EPERM conditions was hit. No need to immediately
> convert all of them, we could have gradually added that, prioritizing
> common and most probable ones to hit.

I'm not against adding log_buf to more bpf commands.
In many cases they are needed.
I'm against the blank statement that _all_ bpf commands need log_buf
and that is somehow will solve user debug nightmares.

> project. Of course let's make libbpf more user-friendly where possible
> and feasible,

Let's do so.
This thread jumped to early conclusion that log_buf for all bpf commands
will magically improve user experience. Both you and Toke were happy to
conclude that "horrible kernel UI/UX" is responsible for everything and
it has to be the one to fix. I don't think that's the case and I hope
working through the examples of bad user experience above made it clear.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-17  3:09                           ` Alexei Starovoitov
@ 2020-07-18  3:54                             ` Andrii Nakryiko
  2020-07-20 22:30                               ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2020-07-18  3:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Toke Høiland-Jørgensen, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, Networking,
	Kernel Team, Blake Matheny

On Thu, Jul 16, 2020 at 8:09 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jul 16, 2020 at 12:59:30PM -0700, Andrii Nakryiko wrote:
> > On Wed, Jul 15, 2020 at 10:44 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Jul 15, 2020 at 06:11:39PM -0700, Andrii Nakryiko wrote:
> > > > >
> > > > > On Wed, Jul 15, 2020, Andrii Nakryiko wrote:
> > > > > >
> > > > > > Inability to figure out what's wrong when using BPF is at the top of
> > > > > > complaints from many users, together with hard to understand logs from
> > > > > > verifier.
> > > > >
> > > > > Only the second part is true. All users are complaining about the verifier.
> > > > > No one is complaing that failed prog attach is somehow lacking string message.
> > > >
> > > > Ok, next time I'll be helping someone to figure out another -EINVAL,
> > > > I'll remember to reassure them that it's not really frustrating, not a
> > > > guess game, and not a time sink at all.
> > >
> > > When the next time the users hit EINVAL due to _their_ usage and not
> > > due to kernel or libbpf bug and libbpf couldn't do anything to make
> > > the error user friendly then yes please raise it up.
> >
> > I know this is futile to convince you anyways, so I won't go dig all
> > the details, but here are few general examples. With the benefit of
> > hindsight in a lot of those cases libbpf can do extra checks and
> > guessing (though bad guess is worse than no guess), but that doesn't
> > scale because of the sheer amount of possible situations.
> >
> > What I personally went through when I was building runqslower:
> >
> > 1. You write a simple BPF program, open + load. You run it, you get EPERM.
> > 2. You go check if you ran the program with sudo, retry with sudo.
> > 3. You still get EPERM. You bang your head against the wall for 30
> > minutes, you recall about RLIMIT_MEMLOCK, you bump it up.
> > 4. If you are happy, now everything works. I don't remember now if I
> > had to bang my head some more until I got the initial minimal version
> > of runqslower successfully load BPF program.
>
> But we have libbpf hinting on that already?
> Are you saying that hint wasn't working somehow?

At that time the message was just "failed to load primitive BPF
program", we added mentions of CONFIG_BPF_SYSCALL and RLIMIT_MEMLOCK
much later. And even current message might be misleading, because it
might be that you are not a root (or, in more modern kernels, don't
have CAP_BPF). Maybe not, if we are relying on unprivileged BPF for
primitive prog, I don't know.

>
> >
> > Some popular other issues I can recall.
> >
> > 1. People get excited and try to use fentry/fexit, get -EINVAL. Most
> > probably it's because they don't have a recent enough kernel.
> > Otherwise (if they know libbpf behavior), they'd probably have gotten
> > something about not being able to find the desired BTF func in the
> > kernel. Even in that case, is that that they don't have BTF at all
> > (CONFIG_DEBUG_BTF_INFO=y)? Or it's just an old one with no FUNCs
> > because pahole isn't recent enough? Kernel build won't mention that
> > you need 1.16 for fentry/fexit, of course. And so on. This is actually
> > not the worst case, because I can walk them through this process.
>
> Exactly. Either old kernel or missing config in the kernel.
> libbpf is above that. It could have provided a hint.
> When libbpf is processing SEC("fentry/") and it fails to load with
> empty verifier log it could start probing.

As I said, listing these examples I didn't expect to convince you,
because I knew you'd propose one or another way how libbpf can provide
more "diagnostics". In a lot of cases it can. But usually after the
fact (debugging it in a painful way). I don't believe we can
anticipate all cases, though, and even those that we can, I'm not sure
how scalable it is to do all that. Requiring anyone adding any new API
to provide a comprehensive set of probing/checks/diagnostics to
prevent possible error conditions seems to be way too harsh on
contributors.

Unfortunately, details escape me now for the case I had in mind, where
I had absolutely no clue which of many checks are failing, even after
reading kernel source code very carefully. Too bad, but I don't keep
such detailed notes.

> Only libbpf can do it. Kernel is helpless here.
> Say we change the kernel errno for all unsuported prog types and maps
> it would return ENOTSUPP or something.
> Would it really help the situation?

IMO, if the kernel just prints out "Unknown BPF command 123" or
"Unknown map type 345" that would be already a nice improvement.

> Probably not. There will be old kernels and the same usability
> issue as you described.

Of course old kernels will still have those problems, but it never
stopped us from fixing and improving the kernel moving forward so that
eventually a better version is available (almost) everywhere.

>
> > 2. When dealing with cgroups. You get EINVAL for every tiny misstep.
> > You want to replace the BPF program, you get EINVAL. Maybe you forgot
> > one of the necessary flags? Or maybe you specified incompatible flags
> > (BPF_F_REPLACE | BPF_F_ALLOW_MULTI)? Or the parent cgroup has a
> > non-overridable BPF program attached already? Or maybe prog FD is
> > wrong, or cgroup FD is wrong, or?...
>
> A wrong FD for prog or cgroup would be a libbpf bug.

cgroup fd is the direct input parameter for
bpf_program__attach_cgroup() and bpf_prog_attach() APIs. But, of
course, we can also add a variant that takes a string path to cgroup
and fetches it. Or we can double-check that provided cgroup FD is
actually cgroup FD.

> imo that's the case where kernel must not return a string even
> if there is a log_buf facility.
>
> non-overridable already attached is EPERM. It's not EINVAL.
> multi-prog vs overridable is also EPERM.

Attaching the same BPF prog is EINVAL, though. But it doesn't matter
which error code specifically, either EPERM, or EINVAL, or even E2BIG
have many possible meanings in bpf() syscall.

> Say, the kernel has log_buf for hierarchy_allows_attach().
> what kind of message do you think it can print that will be user friendly?
> "attach is not allowed because there is a parent cgroup
> that has no override flag set"
> How is it much better than EPERM?

"parent cgroup prevents attaching BPF program" would already lead into
the right direction. It's not perfect and full, but better than more
generic EPERM. But sure, EPERM has less variants than EINVAL, so I'm
personally always happier with EPERM, than with EINVAL.

> Two differentiate between these two EPERM?
> Both to me look like the same category if permission checks.
> I don't know how to the kernel can print full cgroup path here.
> It has 'struct cgroup *'. Then go to kernfs? then what?
> cgroupfs is mounted somwhere.
> To have meaningul message the user would want to see something like:
> "
>   /sys/fs/cgroup/my_container <- has no-override bpf prog, hence
>   the kernel doesn't allow attach at:
>   /sys/fs/cgroup/my_container/here
> "
> Only libbpf can print such message.
> The user gave libbpf a string path. Attaching is relative to that.
> Whereas the kernel has hard time with mounts/paths/namespaces.

Sure, I didn't even hope for cgroup path or anything like that. Maybe
cgroup ID would be nice, don't know, but even without that just saying
that it's parent cgroup that's the culprit is a big step forward, IMO.

>
> May be to solve this cgroup attaching ambiguity we can add a libbpf
> helper that can walk hierarchy and print state?
> Either libbpf calls it automatically or user can trigger it?
>
> > 3. Someone gets excited about libbpf-tools in BCC, decides to convert
> > a tool. Needs to dump the hashmap fast. There is BATCH_LOOKUP, nice!
> > They try, they get EINVAL. Most probably outdated kernel, but I'll
> > need to dig into kernel sources to see what else could go wrong.
>
> right. that is similar to 1. I don't see how kernel could have helped.
> Say sys_bpf got top level log_buf for _all_ commands.
> The user is passing a command that is unsupported.
> The kernel has no clue whether log_buf is even there in bpf_attr.
> It cannot return a string. Only errno.
> If we change that errno from current EINVAL to ENOTSUPP...
> it would go back to the point I made above.
> Not really helping much.
> imo that's another case where libbpf probing can go long way to
> improve user experience.

log_buf can't help existing kernels. Period. No one is arguing or
expecting that. But moving forward, just having that "unknown command
123" would be great.

But yeah, of course libbpf can create a probing map and try to do
BATCH_LOOKUP, to detect BATCH_LOOKUP support.

>
> >
> > 4. Try using some of the low-level APIs from libbpf/bpf.h, like the
> > same cgroup program replacement. You'll get E2BIG and will scratch
> > your head for a while, checking how the list of attached BPF programs
> > can be too big, if it's the only one. Just to realize that again, your
> > kernel is just a touch too old and just complains about non-zero
> > unknown field in bpf_attr.
>
> The low-level APIs of libbpf is indeed a pain.
> There are also old and partially broken libbpf APIs.
> I think we need to start aggressively deprecating some of them.
> Especially those that cause debugging issues to users.
>
> > 5. Just few days ago, one user was doing bpf_program__attach_tracepoint():
> >
> > libbpf: program 'tp/sched/sched_process_exit': failed to attach to pfd
> > 10: File exists
> >
> > After a bit of back and forth turns out he had a second instance of
> > that program running in parallel. Good, I quickly realize that it's an
> > old kernel and it doesn't allow me to attach more than 1 BPF program
> > to the tracepoint. Case solved. But what about many other BPF users
> > that do not have access to someone developing BPF in the kernel?
>
> I similarly don't see how kernel string would have helped.
> It would have said the same thing: "cannot attach".

This one is for perf subsystem, actually, it's its
PERF_EVENT_IOC_SET_BPF ioctl (until we add bpf_link for perf_event
attachment).

"limit of allowed BPF programs reached" would be good enough. "cannot
attach" is too generic.

>
> > And the list goes on. Even if it was my full-time job just to
> > anticipate all the misuses and try to check/guess them in libbpf, that
> > wouldn't work and won't scale.
> >
> > And all I was asking (and not finger pointing or blaming anything or
> > anyone) to have a mechanism in the kernel to get a single-line
> > human-readable hint as to which one out of many
> > EINVAL/E2BIG/ENOENT/EPERM conditions was hit. No need to immediately
> > convert all of them, we could have gradually added that, prioritizing
> > common and most probable ones to hit.
>
> I'm not against adding log_buf to more bpf commands.
> In many cases they are needed.
> I'm against the blank statement that _all_ bpf commands need log_buf
> and that is somehow will solve user debug nightmares.

There was no such statement. There was no statement that it will
"solve" debug nightmares. But it will certainly ease the pain.

My proposal was about adding the ability to emit something to log_buf
from any of the BPF commands, if that BPF command chooses to provide
extra error information. The whole point of this was to avoid adding
log_buf in command-specific ways (as Toke was doing in the patch that
I used to initiate the discussion) and do it once for entire syscall,
so that we can gradually utilize it where it makes most sense.

>
> > project. Of course let's make libbpf more user-friendly where possible
> > and feasible,
>
> Let's do so.
> This thread jumped to early conclusion that log_buf for all bpf commands
> will magically improve user experience. Both you and Toke were happy to
> conclude that "horrible kernel UI/UX" is responsible for everything and
> it has to be the one to fix.

You somehow concluded that it has to be either kernel or libbpf that
has to be improved, not both. I don't know how you came to this
conclusion. I didn't say or mean that, neither I read that from Toke's
replies. Let me walk through relevant parts verbatim.

Me:
> > > > I think BPF syscall would benefit from common/generalized log support
> > > > across all commands, given how powerful/complex it already is.
> > > > Sometimes it's literally impossible to understand why one gets -EINVAL
> > > > without adding printk()s in the kernel.

Toke:
> > > Yes, I agree! This is horrible UI!

Me:
> > UI?.. It's a perfectly fine and extensible API for all functionality
> > it provides, it just needs a better human-readable feedback mechanism,
> > which is what I'm proposing. Error codes are not working when you have
>> so many different situations that can result in error.

... Some technical details back and forth follows ...

You:
> I really don't like 'common attr across all commands'.
> Both of you are talking as libbpf developers who occasionally need to
> add printk-s to the kernel. That is not an excuse to bloat api that will be
> useful to two people.

You not liking this I get, this is fine, we all have our preferences,
we don't have to agree on everything. But then you just jumped into
conclusion about our motivation and claimed that it will be useful to
only two people (presumably me and Toke). I'm also not sure about
bloating the API, given that such API is already part of BPF_PROG_LOAD
command, but there is no point in discussing such technical details
anymore.

Toke said that it's a horrible story to debug such generic errors (and
I concur, it is), not that the kernel is horrible itself or was
written by horrible people. I don't feel offended if libbpf provides
horrible user experience in parts I've implemented or changed. I might
disagree about the qualification in some cases, but I won't get
offended about someone not liking my code, API or design.

> I don't think that's the case and I hope
> working through the examples of bad user experience above made it clear.

Not at all. All you showed is that once someone runs into some
specific API misuse and debugs it to success, then it's usually pretty
obvious how libbpf could have helped doing additional diagnostics.
With the benefit of hindsight.

I agree that if such diagnostics are reliable and the situation itself
is common and experienced by multiple users, then it might make sense
to add such checks to libbpf. But I also don't think it's always
possible to diagnose something automatically with 100% confidence. We
can give hints, but misdiagnosing the problem can just further confuse
things instead. Also quite often such problems are one-offs, which
doesn't make them any less confusing and frustrating, but custom
diagnostics for every such case has a potential of bloating libbpf
beyond recognition and I'm not convinced that we should diagnose all
of them. That's what I meant that this is not a scalable approach to
just say "fix libbpf to be more user friendly, kernel does its best
already".

This thread was about generalizing log_buf to entire bpf(), even
though it got diverted into something entirely different. I would
still be interested to know the technical implications of syscall
using extra args (like in my original proposal #1, about adding 2
extra args if some flag in the existing argument is set). Is that even
safe to do with syscalls? Even if we don't do it, it's still useful to
know for the future. I think the syscall() wrapper is inherently
vararg and accepts an arbitrary number of args, so I assumed it's
actually ok, but don't know how to prove it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-18  3:54                             ` Andrii Nakryiko
@ 2020-07-20 22:30                               ` Alexei Starovoitov
  2020-07-21  3:00                                 ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2020-07-20 22:30 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Toke Høiland-Jørgensen, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, Networking,
	Kernel Team, Blake Matheny

On Fri, Jul 17, 2020 at 08:54:45PM -0700, Andrii Nakryiko wrote:
> 
> > Only libbpf can do it. Kernel is helpless here.
> > Say we change the kernel errno for all unsuported prog types and maps
> > it would return ENOTSUPP or something.
> > Would it really help the situation?
> 
> IMO, if the kernel just prints out "Unknown BPF command 123" or
> "Unknown map type 345" that would be already a nice improvement.
...
> log_buf can't help existing kernels. Period. No one is arguing or
> expecting that. But moving forward, just having that "unknown command
> 123" would be great.
> 
> But yeah, of course libbpf can create a probing map and try to do
> BATCH_LOOKUP, to detect BATCH_LOOKUP support.

Also with BTF the kernel is self documented.
The following will print all commands that kernel supports:
bpftool btf dump file ./bld_x64/vmlinux |grep -A40 bpf_cmd
and with 'grep BPF_MAP_TYPE_' all supported maps.
For older kernels there is 'bpftool feature probe'.
Since libbpf reads vmlinux BTF anyway it could have got a knowledge
of all the features it supports based on BTF.

But let's continue this thought experiment for augmenting error
reporting with a string.
For 'Unknown BPF command 123' to work log_buf needs to passed
outside of 'union bpf_attr'. Probably as 4th argument to sys_bpf ?
and then a bit in 'int cmd' would need to be burned to indicate
that 4th arg is there. Probably size of the arg needs to be passed
either as 5th arg or as part of the 'struct bpf_log_buf' so it can
be extensible.
Using that log_buf directly in
SYSCALL_DEFINE[45](bpf, int cmd, ... struct bpf_log_buf *log_buf)
will be trivial.
But to pass it into any of first level functions (bpf_iter_create,
bpf_prog_attach, etc) they would need to gain an extra argument.
To pass it all the way into hierarchy_allows_attach() it needs to be added to:
__cgroup_bpf_attach 
cgroup_bpf_attach
cgroup_bpf_link_attach
link_create

Another case of ambiguous 'return -EINVAL' would cause another change
to a bunch of function prototypes.
So it's better to integrate it into current task_struct to avoid huge code churn.

But if we do so what stops us from generalizing log_buf reporting to
other syscalls? perf_event_open is in the same category.

> This one is for perf subsystem, actually, it's its
> PERF_EVENT_IOC_SET_BPF ioctl (until we add bpf_link for perf_event
> attachment).

clearly not only sys_bpf and sys_perf_event_open, but sys_ioctl would need
string support to be a full answer to ambiguous einval-s.

> My proposal was about adding the ability to emit something to log_buf
> from any of the BPF commands, if that BPF command chooses to provide
> extra error information. The whole point of this was to avoid adding
> log_buf in command-specific ways (as Toke was doing in the patch that
> I used to initiate the discussion) and do it once for entire syscall,
> so that we can gradually utilize it where it makes most sense.

I don't think that works due to code churn. Whether we pay that price
once or 'gradually' it doesn't make it any better.
When log_buf is added to an existing command the 'union bpf_attr' is there
in the function proto and nothing new needs to passed to a lot of functions.
So I certainly prefer Toke's approach of adding log_buf to one specific
command if it's really needed there.

The alternative is to solve it for all syscalls.

> I agree that if such diagnostics are reliable and the situation itself
> is common and experienced by multiple users, then it might make sense
> to add such checks to libbpf. 

'experienced by multiple users' is going to be a judgement call
either for libbpf or for kernel.
I'm saying let's improve libbpf user-friendlyness everywhere we can.
We can always drop these hints later. Unlike kernel messages that
might become stable api.
One thing is log_buf that the verifier is dumping. It's huge and not parsable.
Whereas a string next to return EINVAL may become an uapi.
I wouldn't be surprised if some of netlink extack strings got to this
level of stability and changing the string may cause somebody to notice
and the commit would get reverted.

> But I also don't think it's always
> possible to diagnose something automatically with 100% confidence. We
> can give hints, but misdiagnosing the problem can just further confuse
> things instead.

I think the possible confusion is fine.
The libbpf has an opportunity to try them and remove if things don't work out.
The kernel strings would be much more scrutinized and harder to change.

> Also quite often such problems are one-offs, which
> doesn't make them any less confusing and frustrating, but custom
> diagnostics for every such case has a potential of bloating libbpf
> beyond recognition

bloating libbpf with strings is imo better way to figure out how to
improve user experience than bloating kernel.

> of them. That's what I meant that this is not a scalable approach to
> just say "fix libbpf to be more user friendly, kernel does its best
> already".

Ok. I will take it back. The kernel can be improved with extra strings
here and there, but only if it's done without huge code churn.
If current task_struct approach can work and the changes would be
limited to something like:
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ac53102e244a..244df18728c2 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -196,8 +196,12 @@ static bool hierarchy_allows_attach(struct cgroup *cgrp,
                        return true;
                cnt = prog_list_length(&p->bpf.progs[type]);
                WARN_ON_ONCE(cnt > 1);
-               if (cnt == 1)
-                       return !!(flags & BPF_F_ALLOW_OVERRIDE);
+               if (cnt == 1) {
+                       ret = !!(flags & BPF_F_ALLOW_OVERRIDE);
+                       if (!ret)
+                               syscall_string("cgroup has non-overridable program in the parent");
+                       return ret;
+               }

Then such syscall_string reporting mechanism would be solid addition to the
kernel. Otherwise the cost of passing explicit log_buf everywhere is not worth
it.

I think such syscall_string can probably piggy back on "socket local storage
into generic local storage" work. Sooner or later we will have
per task_struct storage. If that's a single pointer in task_struct that will
be used by both task local storage from inside tracing bpf program and
by this syscall_string() reporting mechanism then we can land it.
May be all syscalls will become stateful. sys_bpf, sys_perf_event_open, sys_ioctl
followed by new syscall "give me back the error string".
Or may be we can do some asm magic and pass both errno and a string from
a syscall at the same time.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h
  2020-07-20 22:30                               ` Alexei Starovoitov
@ 2020-07-21  3:00                                 ` Andrii Nakryiko
  0 siblings, 0 replies; 26+ messages in thread
From: Andrii Nakryiko @ 2020-07-21  3:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Toke Høiland-Jørgensen, bpf, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, John Fastabend, KP Singh, Networking,
	Kernel Team, Blake Matheny

On Mon, Jul 20, 2020 at 3:31 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jul 17, 2020 at 08:54:45PM -0700, Andrii Nakryiko wrote:
> >
> > > Only libbpf can do it. Kernel is helpless here.
> > > Say we change the kernel errno for all unsuported prog types and maps
> > > it would return ENOTSUPP or something.
> > > Would it really help the situation?
> >
> > IMO, if the kernel just prints out "Unknown BPF command 123" or
> > "Unknown map type 345" that would be already a nice improvement.
> ...
> > log_buf can't help existing kernels. Period. No one is arguing or
> > expecting that. But moving forward, just having that "unknown command
> > 123" would be great.
> >
> > But yeah, of course libbpf can create a probing map and try to do
> > BATCH_LOOKUP, to detect BATCH_LOOKUP support.
>
> Also with BTF the kernel is self documented.
> The following will print all commands that kernel supports:
> bpftool btf dump file ./bld_x64/vmlinux |grep -A40 bpf_cmd
> and with 'grep BPF_MAP_TYPE_' all supported maps.
> For older kernels there is 'bpftool feature probe'.
> Since libbpf reads vmlinux BTF anyway it could have got a knowledge
> of all the features it supports based on BTF.
>

Yes.

> But let's continue this thought experiment for augmenting error
> reporting with a string.
> For 'Unknown BPF command 123' to work log_buf needs to passed
> outside of 'union bpf_attr'. Probably as 4th argument to sys_bpf ?
> and then a bit in 'int cmd' would need to be burned to indicate
> that 4th arg is there. Probably size of the arg needs to be passed
> either as 5th arg or as part of the 'struct bpf_log_buf' so it can
> be extensible.
> Using that log_buf directly in
> SYSCALL_DEFINE[45](bpf, int cmd, ... struct bpf_log_buf *log_buf)
> will be trivial.

Right, so far exactly what I proposed as #1 proposal.

> But to pass it into any of first level functions (bpf_iter_create,
> bpf_prog_attach, etc) they would need to gain an extra argument.
> To pass it all the way into hierarchy_allows_attach() it needs to be added to:
> __cgroup_bpf_attach
> cgroup_bpf_attach
> cgroup_bpf_link_attach
> link_create

I was hoping we can do it in such a way that we won't have to thread
through extra input arguments. Current task_struct looked like a good
candidate, given syscalls are always happening in a process context. I
was considering a per-CPU variable, but CPU migration precludes that.
I certainly was hoping that it could be implemented without huge code
churn.

>
> Another case of ambiguous 'return -EINVAL' would cause another change
> to a bunch of function prototypes.
> So it's better to integrate it into current task_struct to avoid huge code churn.

Right, exactly.

>
> But if we do so what stops us from generalizing log_buf reporting to
> other syscalls? perf_event_open is in the same category.

Nothing, of course, except for the need to convince other folks. I
thought starting with bpf() syscall and proving usefulness in practice
was a good first step. Then we could expand that to other syscalls.

>
> > This one is for perf subsystem, actually, it's its
> > PERF_EVENT_IOC_SET_BPF ioctl (until we add bpf_link for perf_event
> > attachment).
>
> clearly not only sys_bpf and sys_perf_event_open, but sys_ioctl would need
> string support to be a full answer to ambiguous einval-s.
>
> > My proposal was about adding the ability to emit something to log_buf
> > from any of the BPF commands, if that BPF command chooses to provide
> > extra error information. The whole point of this was to avoid adding
> > log_buf in command-specific ways (as Toke was doing in the patch that
> > I used to initiate the discussion) and do it once for entire syscall,
> > so that we can gradually utilize it where it makes most sense.
>
> I don't think that works due to code churn. Whether we pay that price
> once or 'gradually' it doesn't make it any better.

See above, here you are assuming explicit passing of log_buf through
all functions. If done through current there would be no code churn.

> When log_buf is added to an existing command the 'union bpf_attr' is there
> in the function proto and nothing new needs to passed to a lot of functions.

Not all the functions that do error checking have a bpf_attr pointer
anyway, so even in that case we could have a lot of code churn, if we
don't do it in a smarter way (i.e., current).

> So I certainly prefer Toke's approach of adding log_buf to one specific
> command if it's really needed there.
>
> The alternative is to solve it for all syscalls.

I do like the alternative, of course, but I'd start with just bpf()
syscall initially. But that's less of a technical choice, of course.

>
> > I agree that if such diagnostics are reliable and the situation itself
> > is common and experienced by multiple users, then it might make sense
> > to add such checks to libbpf.
>
> 'experienced by multiple users' is going to be a judgement call
> either for libbpf or for kernel.
> I'm saying let's improve libbpf user-friendlyness everywhere we can.
> We can always drop these hints later. Unlike kernel messages that
> might become stable api.
> One thing is log_buf that the verifier is dumping. It's huge and not parsable.
> Whereas a string next to return EINVAL may become an uapi.
> I wouldn't be surprised if some of netlink extack strings got to this
> level of stability and changing the string may cause somebody to notice
> and the commit would get reverted.
>
> > But I also don't think it's always
> > possible to diagnose something automatically with 100% confidence. We
> > can give hints, but misdiagnosing the problem can just further confuse
> > things instead.
>
> I think the possible confusion is fine.
> The libbpf has an opportunity to try them and remove if things don't work out.
> The kernel strings would be much more scrutinized and harder to change.
>

I see where you are coming from w.r.t. stability of those strings, but
I hope that's not going to be the case. Otherwise one might argue that
kernel WARN() messages could become a stable API.

But regardless, "try them and remove if things don't work out" isn't
exactly a user-friendly strategy, if that causes pain and confusion.
So I think we should still be very careful with what we add to libbpf.


> > Also quite often such problems are one-offs, which
> > doesn't make them any less confusing and frustrating, but custom
> > diagnostics for every such case has a potential of bloating libbpf
> > beyond recognition
>
> bloating libbpf with strings is imo better way to figure out how to
> improve user experience than bloating kernel.

By bloat I meant not strings, but the logic of determining the exact
cause of failure. Which from what we discussed can be quite elaborate
and non-trivial. Strings bloat are an easy thing to fix in the kernel.
We can conditionally compile them out if some Kconfig value is set.
E.g., for cases of embedded kernels where every kilobyte counts. It's
a matter of having the proper macros/funcs to do error reporting.

>
> > of them. That's what I meant that this is not a scalable approach to
> > just say "fix libbpf to be more user friendly, kernel does its best
> > already".
>
> Ok. I will take it back. The kernel can be improved with extra strings
> here and there, but only if it's done without huge code churn.
> If current task_struct approach can work and the changes would be
> limited to something like:
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index ac53102e244a..244df18728c2 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -196,8 +196,12 @@ static bool hierarchy_allows_attach(struct cgroup *cgrp,
>                         return true;
>                 cnt = prog_list_length(&p->bpf.progs[type]);
>                 WARN_ON_ONCE(cnt > 1);
> -               if (cnt == 1)
> -                       return !!(flags & BPF_F_ALLOW_OVERRIDE);
> +               if (cnt == 1) {
> +                       ret = !!(flags & BPF_F_ALLOW_OVERRIDE);
> +                       if (!ret)
> +                               syscall_string("cgroup has non-overridable program in the parent");
> +                       return ret;
> +               }
>
> Then such syscall_string reporting mechanism would be solid addition to the
> kernel. Otherwise the cost of passing explicit log_buf everywhere is not worth
> it.
>

Right, that's something like what I had in mind in terms of how
non-intrusive it has to be to get adopted.

> I think such syscall_string can probably piggy back on "socket local storage
> into generic local storage" work. Sooner or later we will have
> per task_struct storage. If that's a single pointer in task_struct that will
> be used by both task local storage from inside tracing bpf program and
> by this syscall_string() reporting mechanism then we can land it.

Not sure why the need to unify it with task-local storage? What's the
use case and what are the benefits of conflating syscall logging and
some BPF program's logging? It just adds problems, e.g., some BPF
programs might interleave its log output with syscall's output. I'm
not saying we shouldn't do it, but I don't see why we have to do it.

> May be all syscalls will become stateful. sys_bpf, sys_perf_event_open, sys_ioctl
> followed by new syscall "give me back the error string".
> Or may be we can do some asm magic and pass both errno and a string from
> a syscall at the same time.


So stateful logging (e.g., through extra syscall to set user log
buffer and unset it), which was proposal #2, definitely makes it easy
to do this across all syscalls uniformly without touching that syscall
API at all. The biggest downside, though, is runtimes like Golang,
where application doesn't control which OS thread it runs on. So
getting reliable and consistent log between syscalls becomes a
problem. Maybe there is some thread pinning API, though, and this
might not be a problem. Don't know.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-07-21  3:00 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-13 20:12 [PATCH bpf-next 0/6] bpf: Support multi-attach for freplace programs Toke Høiland-Jørgensen
2020-07-13 20:12 ` [PATCH bpf-next 1/6] bpf: change logging calls from verbose() to bpf_log() and use log pointer Toke Høiland-Jørgensen
2020-07-13 20:12 ` [PATCH bpf-next 2/6] bpf: verifier: refactor check_attach_btf_id() Toke Høiland-Jørgensen
2020-07-13 20:12 ` [PATCH bpf-next 3/6] bpf: support attaching freplace programs to multiple attach points Toke Høiland-Jørgensen
2020-07-13 23:16   ` kernel test robot
2020-07-13 20:12 ` [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h Toke Høiland-Jørgensen
2020-07-14  5:21   ` BPF logging infrastructure. Was: " Andrii Nakryiko
2020-07-14 12:12     ` Toke Høiland-Jørgensen
2020-07-14 19:14       ` Andrii Nakryiko
2020-07-14 20:47         ` Toke Høiland-Jørgensen
2020-07-14 21:58           ` Andrii Nakryiko
2020-07-14 22:19             ` Toke Høiland-Jørgensen
2020-07-14 23:11               ` Alexei Starovoitov
2020-07-15 12:56                 ` Toke Høiland-Jørgensen
2020-07-15 23:41                   ` Alexei Starovoitov
2020-07-16  1:11                     ` Andrii Nakryiko
2020-07-16  5:44                       ` Alexei Starovoitov
2020-07-16 19:59                         ` Andrii Nakryiko
2020-07-16 20:19                           ` Toke Høiland-Jørgensen
2020-07-17  3:09                           ` Alexei Starovoitov
2020-07-18  3:54                             ` Andrii Nakryiko
2020-07-20 22:30                               ` Alexei Starovoitov
2020-07-21  3:00                                 ` Andrii Nakryiko
2020-07-15 19:02                 ` Andrii Nakryiko
2020-07-13 20:12 ` [PATCH bpf-next 5/6] libbpf: add support for supplying target to bpf_raw_tracepoint_open() Toke Høiland-Jørgensen
2020-07-13 20:12 ` [PATCH bpf-next 6/6] selftests: add test for multiple attachments of freplace program Toke Høiland-Jørgensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).