* [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
@ 2020-03-18 13:06 Eelco Chaudron
  2020-03-18 13:06 ` [RFC PATCH bpf-next 1/3] bpf: introduce trace option to the BPF_PROG_TEST_RUN command API Eelco Chaudron
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Eelco Chaudron @ 2020-03-18 13:06 UTC (permalink / raw)
  To: bpf; +Cc: davem, netdev, ast, daniel, kafai, songliubraving, yhs, andriin

I sent out this RFC to get an idea of whether the approach suggested
here is something other people would also like to see. In addition,
this cover letter mentions some concerns and questions that need
answers before we can move to an acceptable implementation.

This patch set adds support for tracing eBPF XDP programs that get
executed using the BPF_PROG_TEST_RUN command. This is done by switching
from JIT (if enabled) to executing the program using the interpreter
and recording each executed instruction.

For now, the execution history is printed to the kernel ring buffer
using pr_info(); the final version should have enough data stored in a
user-supplied buffer to reconstruct this output. Decoding should
probably be part of bpftool, i.e. dump a similar output, plus the
ability to store all this in an ELF-like format for
dumping/analyzing/replaying at a later stage.
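
As a rough illustration only (nothing below exists in this patch set,
and the struct and field names are made up), each record in such a
user-supplied buffer could look something like this, with bpftool
reconstructing the textual dump from these records:

/* Hypothetical per-instruction trace record, for discussion only. */
struct bpf_trace_record {
	__u32		insn_executed;	/* running instruction count */
	__u32		pc;		/* instruction index in the prog */
	struct bpf_insn	insn;		/* the instruction that ran */
	__u64		regs[MAX_BPF_EXT_REG];	/* registers after it ran */
};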

This patch does not dump the XDP packet content before and after
execution; however, this data is available to the caller of the API.

The __bpf_prog_run_trace() interpreter is a copy of __bpf_prog_run()
and we probably need a smarter way to re-use the code rather than a
blind copy with some changes.

Enabling the interpreter opens up the kernel to Spectre variant 2; I
guess that's why the BPF_JIT_ALWAYS_ON option was introduced (commit
290af86629b2). Enabling it for debugging in the field does not sound
like an option (talking to people doing kernel distributions).
Any idea how to work around this? An lfence before any indirect call
would slow things down, but I guess for debugging this does not matter.
I need to research this more as I'm no expert in this area, but I think
this needs to be solved as I see it as a showstopper, so any input is
welcome.
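
To make the question more concrete: such a fence would presumably sit
in front of the trace interpreter's indirect dispatch, roughly as
sketched below. This is only a sketch; whether it is a correct or
sufficient mitigation is exactly what I need input on, and
barrier_nospec() is arch-specific (an lfence on x86).

/* Sketch only: fence the indirect goto in the trace-only interpreter. */
select_insn:
	barrier_nospec();		/* lfence on x86; arch-specific */
	goto *jumptable[insn->code];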

To allow bpf-to-bpf call support for tracing, the general interpreter
is currently enabled. See the fixup_call_args() function for why this
is needed. We might need to find a way to fix this (see the above
section on Spectre).

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>



* [RFC PATCH bpf-next 1/3] bpf: introduce trace option to the BPF_PROG_TEST_RUN command API
  2020-03-18 13:06 [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Eelco Chaudron
@ 2020-03-18 13:06 ` Eelco Chaudron
  2020-03-18 13:06 ` [RFC PATCH bpf-next 2/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Eelco Chaudron
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Eelco Chaudron @ 2020-03-18 13:06 UTC (permalink / raw)
  To: bpf; +Cc: davem, netdev, ast, daniel, kafai, songliubraving, yhs, andriin

This patch adds a flag to the existing BPF_PROG_TEST_RUN API,
which will allow tracing for XDP programs.
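
For reference, a minimal sketch of how a caller would request tracing
from user space (the selftest change in the last patch does the same;
prog_fd, pkt, pkt_len and buf are placeholders):

	union bpf_attr attr = {};

	attr.test.prog_fd      = prog_fd;
	attr.test.data_in      = ptr_to_u64(pkt);
	attr.test.data_size_in = pkt_len;
	attr.test.data_out     = ptr_to_u64(buf);
	attr.test.repeat       = 1;
	attr.test.flags        = BPF_F_TEST_ENABLE_TRACE;

	err = syscall(__NR_bpf, BPF_PROG_TEST_RUN, &attr, sizeof(attr));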

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 include/linux/filter.h         |   13 +++++++++++++
 include/uapi/linux/bpf.h       |    4 ++++
 kernel/bpf/syscall.c           |    2 +-
 net/bpf/test_run.c             |   17 +++++++++++------
 tools/include/uapi/linux/bpf.h |    4 ++++
 5 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 43b5e455d2f5..f95f9ad45ad6 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -737,6 +737,19 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 			      BPF_DISPATCHER_FUNC(bpf_dispatcher_xdp));
 }
 
+static __always_inline u32 bpf_prog_run_xdp_trace(const struct bpf_prog *prog,
+						  struct xdp_buff *xdp)
+{
+	/* Caller needs to hold rcu_read_lock() (!), otherwise program
+	 * can be released while still running, or map elements could be
+	 * freed early while still having concurrent users. XDP fastpath
+	 * already takes rcu_read_lock() when fetching the program, so
+	 * it's not necessary here anymore.
+	 */
+	return __BPF_PROG_RUN(prog, xdp,
+			      BPF_DISPATCHER_FUNC(bpf_dispatcher_xdp));
+}
+
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
 
 static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 40b2d9476268..ac5c89903550 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -375,6 +375,9 @@ enum {
  */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
+/* Flags for BPF_PROG_TEST_RUN. */
+#define BPF_F_TEST_ENABLE_TRACE (1U << 0)
+
 enum bpf_stack_build_id_status {
 	/* user space need an empty entry to identify end of a trace */
 	BPF_STACK_BUILD_ID_EMPTY = 0,
@@ -511,6 +514,7 @@ union bpf_attr {
 						 */
 		__aligned_u64	ctx_in;
 		__aligned_u64	ctx_out;
+		__u32           flags;
 	} test;
 
 	struct { /* anonymous struct used by BPF_*_GET_*_ID */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7ce0815793dd..9a6fae428976 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2680,7 +2680,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	return cgroup_bpf_prog_query(attr, uattr);
 }
 
-#define BPF_PROG_TEST_RUN_LAST_FIELD test.ctx_out
+#define BPF_PROG_TEST_RUN_LAST_FIELD test.flags
 
 static int bpf_prog_test_run(const union bpf_attr *attr,
 			     union bpf_attr __user *uattr)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 4c921f5154e0..061cad840b05 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -16,7 +16,7 @@
 #include <trace/events/bpf_test_run.h>
 
 static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
-			u32 *retval, u32 *time, bool xdp)
+			u32 *retval, u32 *time, bool xdp, bool trace)
 {
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = { NULL };
 	enum bpf_cgroup_storage_type stype;
@@ -43,10 +43,14 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 	for (i = 0; i < repeat; i++) {
 		bpf_cgroup_storage_set(storage);
 
-		if (xdp)
-			*retval = bpf_prog_run_xdp(prog, ctx);
-		else
+		if (xdp) {
+			if (trace)
+				*retval = bpf_prog_run_xdp_trace(prog, ctx);
+			else
+				*retval = bpf_prog_run_xdp(prog, ctx);
+		} else {
 			*retval = BPF_PROG_RUN(prog, ctx);
+		}
 
 		if (signal_pending(current)) {
 			ret = -EINTR;
@@ -431,7 +435,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	ret = convert___skb_to_skb(skb, ctx);
 	if (ret)
 		goto out;
-	ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false);
+	ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false, false);
 	if (ret)
 		goto out;
 	if (!is_l2) {
@@ -468,6 +472,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 {
 	u32 size = kattr->test.data_size_in;
 	u32 repeat = kattr->test.repeat;
+	bool trace = kattr->test.flags & BPF_F_TEST_ENABLE_TRACE;
 	struct netdev_rx_queue *rxqueue;
 	struct xdp_buff xdp = {};
 	u32 retval, duration;
@@ -489,7 +494,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
 	xdp.rxq = &rxqueue->xdp_rxq;
 	bpf_prog_change_xdp(NULL, prog);
-	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
+	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true, trace);
 	if (ret)
 		goto out;
 	if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 40b2d9476268..ac5c89903550 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -375,6 +375,9 @@ enum {
  */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
+/* Flags for BPF_PROG_TEST_RUN. */
+#define BPF_F_TEST_ENABLE_TRACE (1U << 0)
+
 enum bpf_stack_build_id_status {
 	/* user space need an empty entry to identify end of a trace */
 	BPF_STACK_BUILD_ID_EMPTY = 0,
@@ -511,6 +514,7 @@ union bpf_attr {
 						 */
 		__aligned_u64	ctx_in;
 		__aligned_u64	ctx_out;
+		__u32           flags;
 	} test;
 
 	struct { /* anonymous struct used by BPF_*_GET_*_ID */



* [RFC PATCH bpf-next 2/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-03-18 13:06 [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Eelco Chaudron
  2020-03-18 13:06 ` [RFC PATCH bpf-next 1/3] bpf: introduce trace option to the BPF_PROG_TEST_RUN command API Eelco Chaudron
@ 2020-03-18 13:06 ` Eelco Chaudron
  2020-03-18 13:06 ` [RFC PATCH bpf-next 3/3] selftests/bpf: call bpf_prog_test_run with trace enabled for XDP test Eelco Chaudron
  2020-03-23 22:47 ` [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Yonghong Song
  3 siblings, 0 replies; 15+ messages in thread
From: Eelco Chaudron @ 2020-03-18 13:06 UTC (permalink / raw)
  To: bpf; +Cc: davem, netdev, ast, daniel, kafai, songliubraving, yhs, andriin

Add support for tracing eBPF XDP programs that get executed using the
BPF_PROG_TEST_RUN command. This is done by switching from JIT (if
enabled) to executing the program using the interpreter and recording
each executed instruction.

For now, the execution history is printed to the kernel ring buffer
using pr_info(); the final version should have this stored in a
user-supplied buffer.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 include/linux/filter.h |   36 ++-
 kernel/bpf/core.c      |  679 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c  |   10 -
 3 files changed, 707 insertions(+), 18 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f95f9ad45ad6..3d8fdbde2d02 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -523,6 +523,9 @@ struct bpf_binary_header {
 	u8 image[] __aligned(BPF_IMAGE_ALIGNMENT);
 };
 
+typedef unsigned int (*bpf_func_t)(const void *ctx,
+				   const struct bpf_insn *insn);
+
 struct bpf_prog {
 	u16			pages;		/* Number of allocated pages */
 	u16			jited:1,	/* Is our filter JIT'ed? */
@@ -542,8 +545,7 @@ struct bpf_prog {
 	u8			tag[BPF_TAG_SIZE];
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
-	unsigned int		(*bpf_func)(const void *ctx,
-					    const struct bpf_insn *insn);
+	bpf_func_t		bpf_func;
 	/* Instructions for interpreter */
 	union {
 		struct sock_filter	insns[0];
@@ -559,6 +561,29 @@ struct sk_filter {
 
 DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key);
 
+#define __BPF_PROG_RUN_INTERPRETER_TRACE(prog, ctx, dfunc)	({	\
+	u32 ret;							\
+	bpf_func_t bpf_func = bpf_get_trace_interpreter(prog);		\
+	if (!bpf_func) {						\
+		/* This should not happen, but if it does warn once */	\
+		WARN_ON_ONCE(1);					\
+		return 0;						\
+	}								\
+	cant_migrate();							\
+	if (static_branch_unlikely(&bpf_stats_enabled_key)) {		\
+		struct bpf_prog_stats *stats;				\
+		u64 start = sched_clock();				\
+		ret = dfunc(ctx, (prog)->insnsi, bpf_func);		\
+		stats = this_cpu_ptr(prog->aux->stats);			\
+		u64_stats_update_begin(&stats->syncp);			\
+		stats->cnt++;						\
+		stats->nsecs += sched_clock() - start;			\
+		u64_stats_update_end(&stats->syncp);			\
+	} else {							\
+		ret = dfunc(ctx, (prog)->insnsi, bpf_func);		\
+	}								\
+	ret; })
+
 #define __BPF_PROG_RUN(prog, ctx, dfunc)	({			\
 	u32 ret;							\
 	cant_migrate();							\
@@ -722,6 +747,8 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
 	return res;
 }
 
+bpf_func_t bpf_get_trace_interpreter(const struct bpf_prog *fp);
+
 DECLARE_BPF_DISPATCHER(bpf_dispatcher_xdp)
 
 static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
@@ -746,8 +773,9 @@ static __always_inline u32 bpf_prog_run_xdp_trace(const struct bpf_prog *prog,
 	 * already takes rcu_read_lock() when fetching the program, so
 	 * it's not necessary here anymore.
 	 */
-	return __BPF_PROG_RUN(prog, xdp,
-			      BPF_DISPATCHER_FUNC(bpf_dispatcher_xdp));
+	return __BPF_PROG_RUN_INTERPRETER_TRACE(prog, xdp,
+						BPF_DISPATCHER_FUNC(
+							bpf_dispatcher_xdp));
 }
 
 void bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 973a20d49749..5bfc7b5c5af5 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -34,6 +34,8 @@
 #include <linux/log2.h>
 #include <asm/unaligned.h>
 
+#include "disasm.h"
+
 /* Registers */
 #define BPF_R0	regs[BPF_REG_0]
 #define BPF_R1	regs[BPF_REG_1]
@@ -1341,13 +1343,17 @@ bool bpf_opcode_in_insntable(u8 code)
 	return public_insntable[code];
 }
 
-#ifndef CONFIG_BPF_JIT_ALWAYS_ON
 u64 __weak bpf_probe_read_kernel(void *dst, u32 size, const void *unsafe_ptr)
 {
 	memset(dst, 0, size);
 	return -EFAULT;
 }
 
+/* TODO: Removed "ifndef CONFIG_BPF_JIT_ALWAYS_ON" for now from __bpf_prog_run
+ *       as we need it for fixup_call_args(), i.e. bpf 2 bpf call with the
+ *       interpreter.
+ */
+
 /**
  *	__bpf_prog_run - run eBPF program on a given context
  *	@regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers
@@ -1634,6 +1640,625 @@ static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u6
 		return 0;
 }
 
+struct bpf_trace_ctx {
+	u32                    insn_executed;
+	u32                    stack_size;
+	u64                   *stack;
+	u64                   *regs;
+	u64                    regs_stored[MAX_BPF_EXT_REG];
+	void                  *addr_stored;
+	u64                    addr_value_stored;
+	char                   scratch[80];
+	char                   scratch_addr[KSYM_NAME_LEN];
+	const struct bpf_insn *prog_insn;
+	const struct bpf_insn *prev_insn;
+};
+
+static __always_inline void __bpf_trace_dump_stack(struct bpf_trace_ctx *ctx)
+{
+	int i;
+
+	pr_info("BPFTrace: STACK(%u bytes):", ctx->stack_size);
+
+	for (i = 0; i <  (ctx->stack_size / sizeof(u64) / 4); i++) {
+		pr_info("BPFTrace:   %px: 0x%016llx %016llx %016llx %016llx",
+			&ctx->stack[i * 4],
+			ctx->stack[i * 4 + 0], ctx->stack[i * 4 + 1],
+			ctx->stack[i * 4 + 2], ctx->stack[i * 4 + 3]);
+	}
+
+	if ((ctx->stack_size / sizeof(u64) % 4) > 0) {
+		int j;
+		char line[96];
+		int bytes_used = 0;
+
+		bytes_used += snprintf(line, sizeof(line),
+				       "BPFTrace:   %px: 0x",
+				       &ctx->stack[i * 4]);
+
+		for (j = 0; j < (ctx->stack_size / sizeof(u64) % 4); j++) {
+			bytes_used += snprintf(line + bytes_used,
+					       sizeof(line) - bytes_used,
+					       "%016llx ",
+					       ctx->stack[i * 4 + j]);
+		}
+		pr_info("%s", line);
+	}
+}
+
+static __always_inline void __bpf_trace_dump_regs(struct bpf_trace_ctx *ctx)
+{
+	pr_info("BPFTrace: REGISTERS:");
+	pr_info("BPFTrace:   r00-03: 0x%016llx %016llx %016llx %016llx",
+		ctx->regs[BPF_REG_0], ctx->regs[BPF_REG_1],
+		ctx->regs[BPF_REG_2], ctx->regs[BPF_REG_3]);
+	pr_info("BPFTrace:   r04-07: 0x%016llx %016llx %016llx %016llx",
+		ctx->regs[BPF_REG_4], ctx->regs[BPF_REG_5],
+		ctx->regs[BPF_REG_6], ctx->regs[BPF_REG_7]);
+	pr_info("BPFTrace:   r08-11: 0x%016llx %016llx %016llx %016llx",
+		ctx->regs[BPF_REG_8], ctx->regs[BPF_REG_9],
+		ctx->regs[BPF_REG_10], ctx->regs[BPF_REG_AX]);
+}
+
+static __always_inline void __bpf_trace_init(struct bpf_trace_ctx *ctx,
+					     const struct bpf_insn *insn,
+					     u64 *regs,
+					     u64 *stack, u32 stack_size)
+{
+	memset(ctx, 0, sizeof(*ctx));
+	ctx->prog_insn = insn;
+	ctx->regs = regs;
+	ctx->stack = stack;
+	ctx->stack_size = stack_size;
+
+	__bpf_trace_dump_stack(ctx);
+	pr_info("BPFTrace:");
+	__bpf_trace_dump_regs(ctx);
+	pr_info("BPFTrace:");
+}
+
+static const char *__bpf_cb_call(void *private_data,
+				 const struct bpf_insn *insn)
+{
+	/* For now only return helper names, we can fix this in the
+	 * actual tool decoding the trace buffer, i.e. bpftool
+	 */
+	struct bpf_trace_ctx *ctx = private_data;
+
+	if (insn->src_reg == BPF_PSEUDO_CALL) {
+		strncpy(ctx->scratch_addr, "unknown",
+			sizeof(ctx->scratch_addr));
+
+	} else  {
+		u64 address;
+
+		if (__bpf_call_base + insn->imm)
+			address = (u64) __bpf_call_base + insn->imm;
+		else
+			address = (u64) __bpf_call_base_args + insn->imm;
+
+
+		if (!kallsyms_lookup(address, NULL, NULL, NULL,
+				     ctx->scratch_addr)) {
+
+			snprintf(ctx->scratch_addr, sizeof(ctx->scratch_addr),
+				 "%#016llx", address);
+		}
+	}
+	return ctx->scratch_addr;
+}
+
+
+static void __bpf_cb_print(void *private_data, const char *fmt, ...)
+{
+	struct bpf_trace_ctx *ctx = private_data;
+	va_list args;
+
+	va_start(args, fmt);
+	vsnprintf(ctx->scratch, sizeof(ctx->scratch), fmt, args);
+	va_end(args);
+}
+
+static __always_inline void __bpf_trace_next(struct bpf_trace_ctx *ctx,
+					      const struct bpf_insn *insn)
+{
+	int reg;
+
+	/* For tracing handle results of previous function */
+	if (ctx->prev_insn != NULL) {
+		u64 pc;
+		const struct bpf_insn *prev_insn = ctx->prev_insn;
+		const struct bpf_insn_cbs cbs = {
+			.cb_call        = __bpf_cb_call,
+			.cb_print	= __bpf_cb_print,
+			.private_data	= ctx,
+		};
+		ctx->scratch[0] = 0;
+		/*
+		 * TODO: Here we need to handle the BPF_CALL_ARGS and
+		 *       BPF_TAIL_CALL for the disassemble part, see also the
+		 *       bpf_insn_prepare_dump() call.
+		 */
+		print_bpf_insn(&cbs, prev_insn, true);
+		ctx->scratch[strlen(ctx->scratch) - 1] = 0;
+
+		/* TODO: We need a better way to calculate the pc, as the pc
+		 *       could be out of scope, i.e. external call. We could
+		 *       use  bpf_prog_insn_size() if we have the bpf_prog
+		 *       structure.
+		 */
+		pc = (u64) (prev_insn - ctx->prog_insn);
+
+		pr_info("BPFTrace: %u[%#llx]: %s",
+			ctx->insn_executed, pc,
+			ctx->scratch);
+
+		switch (BPF_CLASS(prev_insn->code)) {
+		case BPF_ALU:
+		case BPF_ALU64:
+		case BPF_LD:
+		case BPF_LDX:
+			pr_info("BPFTrace:   r%u: %#018llx -> %#018llx",
+				ctx->prev_insn->dst_reg,
+				ctx->regs_stored[prev_insn->dst_reg],
+				ctx->regs[prev_insn->dst_reg]);
+			break;
+
+		case BPF_ST:
+		case BPF_STX:
+			switch (BPF_SIZE(insn->code)) {
+			case BPF_B:
+				pr_info("BPFTrace:   MEM[%px]: %#04llx -> %#04x",
+					ctx->addr_stored,
+					ctx->addr_value_stored,
+					*(u8 *) ctx->addr_stored);
+				break;
+			case BPF_H:
+				pr_info("BPFTrace:   MEM[%px]: %#06llx -> %#06x",
+					ctx->addr_stored,
+					ctx->addr_value_stored,
+					*(u16 *) ctx->addr_stored);
+				break;
+			case BPF_W:
+				pr_info("BPFTrace:   MEM[%px]: %#010llx -> %#010x",
+					ctx->addr_stored,
+					ctx->addr_value_stored,
+					*(u32 *) ctx->addr_stored);
+				break;
+			case BPF_DW:
+				pr_info("BPFTrace:   MEM[%px]: %#018llx -> %#018llx",
+					ctx->addr_stored,
+					ctx->addr_value_stored,
+					*(u64 *) ctx->addr_stored);
+				break;
+			}
+			if (BPF_OP(prev_insn->code) == BPF_XADD)
+				pr_info("BPFTrace:   !! Above values are not retrieved atomically !!");
+
+			break;
+
+		case BPF_JMP:
+		case BPF_JMP32:
+			switch (BPF_OP(prev_insn->code)) {
+
+			default:
+				if (BPF_SRC(prev_insn->code) == BPF_X)
+					pr_warn("BPFTrace:   r%u: %#018llx <==> r%u: %#018llx",
+						prev_insn->dst_reg,
+						ctx->regs[prev_insn->dst_reg],
+						prev_insn->src_reg,
+						ctx->regs[prev_insn->src_reg]);
+				else
+					pr_warn("BPFTrace:   r%u: %#018llx <==> %#08x",
+						prev_insn->dst_reg,
+						ctx->regs[prev_insn->dst_reg],
+						prev_insn->imm);
+				fallthrough;
+
+			case BPF_JA:
+				if ((prev_insn + 1) == insn) {
+					pr_warn("BPFTrace:   branch was NOT taken");
+				} else {
+					/* TODO: See note above on pc. */
+					pr_warn("BPFTrace:   pc: %#018llx -> %#018llx",
+						(u64) (prev_insn - ctx->prog_insn),
+						(u64) (insn - ctx->prog_insn));
+					pr_warn("BPFTrace:   branch was taken");
+				}
+				break;
+
+			case BPF_EXIT:
+				break;
+
+			case BPF_CALL:
+			case BPF_CALL_ARGS:
+			case BPF_TAIL_CALL:
+				for (reg = BPF_REG_0; reg <= BPF_REG_5;
+				     reg++) {
+					if (ctx->regs_stored[reg] != ctx->regs[reg])
+						pr_info("BPFTrace:   r%u: %#018llx -> %#018llx",
+							reg,
+							ctx->regs_stored[reg],
+							ctx->regs[reg]);
+				}
+				break;
+			}
+			break;
+
+		default:
+			pr_warn("BPFTrace: No decode for results %d",
+				BPF_CLASS(prev_insn->code));
+			break;
+		}
+	}
+
+	if (insn == NULL)
+		return;
+
+	/* For tracing store information needed before it gets changed */
+	switch (BPF_CLASS(insn->code)) {
+	case BPF_ALU:
+	case BPF_ALU64:
+	case BPF_LD:
+	case BPF_LDX:
+		ctx->regs_stored[insn->dst_reg] = ctx->regs[insn->dst_reg];
+		break;
+
+	case BPF_ST:
+	case BPF_STX:
+		ctx->addr_stored = (void *) ctx->regs[insn->dst_reg] + insn->off;
+		switch (BPF_SIZE(insn->code)) {
+		case BPF_B:
+			ctx->addr_value_stored = *(u8 *) ctx->addr_stored;
+			break;
+		case BPF_H:
+			ctx->addr_value_stored = *(u16 *) ctx->addr_stored;
+			break;
+		case BPF_W:
+			ctx->addr_value_stored = *(u32 *) ctx->addr_stored;
+			break;
+		case BPF_DW:
+			ctx->addr_value_stored = *(u64 *) ctx->addr_stored;
+			break;
+		default:
+			ctx->addr_value_stored = 0;
+			break;
+		}
+		break;
+
+	case BPF_JMP32:
+		/* Nothing we need to store for this case */
+		break;
+
+	case BPF_JMP:
+		/* Only for special jump we need to store stuff */
+		if (BPF_OP(insn->code) == BPF_CALL ||
+		    BPF_OP(insn->code) == BPF_CALL_ARGS) {
+
+			for (reg = BPF_REG_0; reg <= BPF_REG_5; reg++)
+				ctx->regs_stored[reg] = ctx->regs[reg];
+
+		} else if (BPF_OP(insn->code) == BPF_TAIL_CALL) {
+
+			pr_warn("BPFTrace: TODO: No decode for results %d",
+				BPF_OP(insn->code));
+		}
+		break;
+
+	default:
+		pr_warn("BPFTrace: No capture for ORIGINAL value for %d",
+			BPF_CLASS(insn->code));
+		break;
+	}
+	ctx->insn_executed++;
+	ctx->prev_insn = insn;
+}
+
+static __always_inline void __bpf_trace_complete(struct bpf_trace_ctx *ctx,
+						 u64 ret)
+{
+	__bpf_trace_next(ctx, NULL);
+	pr_info("BPFTrace:");
+	__bpf_trace_dump_stack(ctx);
+	pr_info("BPFTrace:");
+	__bpf_trace_dump_regs(ctx);
+	pr_info("BPFTrace:");
+	pr_info("BPFTrace: RESULT:");
+	pr_info("BPFTrace:   return = %#018llx", ret);
+}
+
+/**
+ *	__bpf_prog_run_trace - run eBPF program on a given context
+ *	@regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers
+ *	@insn: is the array of eBPF instructions
+ *	@stack: is the eBPF storage stack
+ *
+ * Decode and execute eBPF instructions with tracing enabled.
+ *
+ * TODO: This is just a copy of the none trace __bpf_prog_run(), we need to
+ *       integrate the two with out adding duplicate code...
+ */
+static u64 __no_fgcse ___bpf_prog_run_trace(struct bpf_trace_ctx *trace_ctx,
+					    u64 *regs,
+					    const struct bpf_insn *insn,
+					    u64 *stack)
+{
+#define BPF_INSN_2_LBL(x, y)    [BPF_##x | BPF_##y] = &&x##_##y
+#define BPF_INSN_3_LBL(x, y, z) [BPF_##x | BPF_##y | BPF_##z] = &&x##_##y##_##z
+	static const void * const jumptable[256] __annotate_jump_table = {
+		[0 ... 255] = &&default_label,
+		/* Now overwrite non-defaults ... */
+		BPF_INSN_MAP(BPF_INSN_2_LBL, BPF_INSN_3_LBL),
+		/* Non-UAPI available opcodes. */
+		[BPF_JMP | BPF_CALL_ARGS] = &&JMP_CALL_ARGS,
+		[BPF_JMP | BPF_TAIL_CALL] = &&JMP_TAIL_CALL,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_B] = &&LDX_PROBE_MEM_B,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_H] = &&LDX_PROBE_MEM_H,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_W] = &&LDX_PROBE_MEM_W,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_DW] = &&LDX_PROBE_MEM_DW,
+	};
+#undef BPF_INSN_3_LBL
+#undef BPF_INSN_2_LBL
+	u32 tail_call_cnt = 0;
+
+#define CONT	 ({ insn++; goto select_insn; })
+#define CONT_JMP ({ insn++; goto select_insn; })
+select_insn:
+	__bpf_trace_next(trace_ctx, insn);
+	goto *jumptable[insn->code];
+
+	/* ALU */
+#define ALU(OPCODE, OP)			\
+	ALU64_##OPCODE##_X:		\
+		DST = DST OP SRC;	\
+		CONT;			\
+	ALU_##OPCODE##_X:		\
+		DST = (u32) DST OP (u32) SRC;	\
+		CONT;			\
+	ALU64_##OPCODE##_K:		\
+		DST = DST OP IMM;		\
+		CONT;			\
+	ALU_##OPCODE##_K:		\
+		DST = (u32) DST OP (u32) IMM;	\
+		CONT;
+
+	ALU(ADD,  +)
+	ALU(SUB,  -)
+	ALU(AND,  &)
+	ALU(OR,   |)
+	ALU(LSH, <<)
+	ALU(RSH, >>)
+	ALU(XOR,  ^)
+	ALU(MUL,  *)
+#undef ALU
+	ALU_NEG:
+		DST = (u32) -DST;
+		CONT;
+	ALU64_NEG:
+		DST = -DST;
+		CONT;
+	ALU_MOV_X:
+		DST = (u32) SRC;
+		CONT;
+	ALU_MOV_K:
+		DST = (u32) IMM;
+		CONT;
+	ALU64_MOV_X:
+		DST = SRC;
+		CONT;
+	ALU64_MOV_K:
+		DST = IMM;
+		CONT;
+	LD_IMM_DW:
+		DST = (u64) (u32) insn[0].imm | ((u64) (u32) insn[1].imm) << 32;
+		insn++;
+		CONT;
+	ALU_ARSH_X:
+		DST = (u64) (u32) (((s32) DST) >> SRC);
+		CONT;
+	ALU_ARSH_K:
+		DST = (u64) (u32) (((s32) DST) >> IMM);
+		CONT;
+	ALU64_ARSH_X:
+		(*(s64 *) &DST) >>= SRC;
+		CONT;
+	ALU64_ARSH_K:
+		(*(s64 *) &DST) >>= IMM;
+		CONT;
+	ALU64_MOD_X:
+		div64_u64_rem(DST, SRC, &AX);
+		DST = AX;
+		CONT;
+	ALU_MOD_X:
+		AX = (u32) DST;
+		DST = do_div(AX, (u32) SRC);
+		CONT;
+	ALU64_MOD_K:
+		div64_u64_rem(DST, IMM, &AX);
+		DST = AX;
+		CONT;
+	ALU_MOD_K:
+		AX = (u32) DST;
+		DST = do_div(AX, (u32) IMM);
+		CONT;
+	ALU64_DIV_X:
+		DST = div64_u64(DST, SRC);
+		CONT;
+	ALU_DIV_X:
+		AX = (u32) DST;
+		do_div(AX, (u32) SRC);
+		DST = (u32) AX;
+		CONT;
+	ALU64_DIV_K:
+		DST = div64_u64(DST, IMM);
+		CONT;
+	ALU_DIV_K:
+		AX = (u32) DST;
+		do_div(AX, (u32) IMM);
+		DST = (u32) AX;
+		CONT;
+	ALU_END_TO_BE:
+		switch (IMM) {
+		case 16:
+			DST = (__force u16) cpu_to_be16(DST);
+			break;
+		case 32:
+			DST = (__force u32) cpu_to_be32(DST);
+			break;
+		case 64:
+			DST = (__force u64) cpu_to_be64(DST);
+			break;
+		}
+		CONT;
+	ALU_END_TO_LE:
+		switch (IMM) {
+		case 16:
+			DST = (__force u16) cpu_to_le16(DST);
+			break;
+		case 32:
+			DST = (__force u32) cpu_to_le32(DST);
+			break;
+		case 64:
+			DST = (__force u64) cpu_to_le64(DST);
+			break;
+		}
+		CONT;
+
+	/* CALL */
+	JMP_CALL:
+		/* Function call scratches BPF_R1-BPF_R5 registers,
+		 * preserves BPF_R6-BPF_R9, and stores return value
+		 * into BPF_R0.
+		 */
+		BPF_R0 = (__bpf_call_base + insn->imm)(BPF_R1, BPF_R2, BPF_R3,
+						       BPF_R4, BPF_R5);
+		CONT;
+
+	JMP_CALL_ARGS:
+		BPF_R0 = (__bpf_call_base_args + insn->imm)(BPF_R1, BPF_R2,
+							    BPF_R3, BPF_R4,
+							    BPF_R5,
+							    insn + insn->off + 1);
+		CONT;
+
+	JMP_TAIL_CALL: {
+		struct bpf_map *map = (struct bpf_map *) (unsigned long) BPF_R2;
+		struct bpf_array *array = container_of(map, struct bpf_array, map);
+		struct bpf_prog *prog;
+		u32 index = BPF_R3;
+
+		if (unlikely(index >= array->map.max_entries))
+			goto out;
+		if (unlikely(tail_call_cnt > MAX_TAIL_CALL_CNT))
+			goto out;
+
+		tail_call_cnt++;
+
+		prog = READ_ONCE(array->ptrs[index]);
+		if (!prog)
+			goto out;
+
+		/* ARG1 at this point is guaranteed to point to CTX from
+		 * the verifier side due to the fact that the tail call is
+		 * handeled like a helper, that is, bpf_tail_call_proto,
+		 * where arg1_type is ARG_PTR_TO_CTX.
+		 */
+		insn = prog->insnsi;
+		goto select_insn;
+out:
+		CONT;
+	}
+	JMP_JA:
+		insn += insn->off;
+		CONT;
+	JMP_EXIT:
+		return BPF_R0;
+	/* JMP */
+#define COND_JMP(SIGN, OPCODE, CMP_OP)				\
+	JMP_##OPCODE##_X:					\
+		if ((SIGN##64) DST CMP_OP (SIGN##64) SRC) {	\
+			insn += insn->off;			\
+			CONT_JMP;				\
+		}						\
+		CONT;						\
+	JMP32_##OPCODE##_X:					\
+		if ((SIGN##32) DST CMP_OP (SIGN##32) SRC) {	\
+			insn += insn->off;			\
+			CONT_JMP;				\
+		}						\
+		CONT;						\
+	JMP_##OPCODE##_K:					\
+		if ((SIGN##64) DST CMP_OP (SIGN##64) IMM) {	\
+			insn += insn->off;			\
+			CONT_JMP;				\
+		}						\
+		CONT;						\
+	JMP32_##OPCODE##_K:					\
+		if ((SIGN##32) DST CMP_OP (SIGN##32) IMM) {	\
+			insn += insn->off;			\
+			CONT_JMP;				\
+		}						\
+		CONT;
+	COND_JMP(u, JEQ, ==)
+	COND_JMP(u, JNE, !=)
+	COND_JMP(u, JGT, >)
+	COND_JMP(u, JLT, <)
+	COND_JMP(u, JGE, >=)
+	COND_JMP(u, JLE, <=)
+	COND_JMP(u, JSET, &)
+	COND_JMP(s, JSGT, >)
+	COND_JMP(s, JSLT, <)
+	COND_JMP(s, JSGE, >=)
+	COND_JMP(s, JSLE, <=)
+#undef COND_JMP
+	/* STX and ST and LDX*/
+#define LDST(SIZEOP, SIZE)						\
+	STX_MEM_##SIZEOP:						\
+		*(SIZE *)(unsigned long) (DST + insn->off) = SRC;	\
+		CONT;							\
+	ST_MEM_##SIZEOP:						\
+		*(SIZE *)(unsigned long) (DST + insn->off) = IMM;	\
+		CONT;							\
+	LDX_MEM_##SIZEOP:						\
+		DST = *(SIZE *)(unsigned long) (SRC + insn->off);	\
+		CONT;
+
+	LDST(B,   u8)
+	LDST(H,  u16)
+	LDST(W,  u32)
+	LDST(DW, u64)
+#undef LDST
+#define LDX_PROBE(SIZEOP, SIZE)							\
+	LDX_PROBE_MEM_##SIZEOP:							\
+		bpf_probe_read_kernel(&DST, SIZE, (const void *)(long) (SRC + insn->off));	\
+		CONT;
+	LDX_PROBE(B,  1)
+	LDX_PROBE(H,  2)
+	LDX_PROBE(W,  4)
+	LDX_PROBE(DW, 8)
+#undef LDX_PROBE
+
+	STX_XADD_W: /* lock xadd *(u32 *)(dst_reg + off16) += src_reg */
+		atomic_add((u32) SRC, (atomic_t *)(unsigned long)
+			   (DST + insn->off));
+		CONT;
+	STX_XADD_DW: /* lock xadd *(u64 *)(dst_reg + off16) += src_reg */
+		atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
+			     (DST + insn->off));
+		CONT;
+
+	default_label:
+		/* If we ever reach this, we have a bug somewhere. Die hard here
+		 * instead of just returning 0; we could be somewhere in a subprog,
+		 * so execution could continue otherwise which we do /not/ want.
+		 *
+		 * Note, verifier whitelists all opcodes in bpf_opcode_in_insntable().
+		 */
+		pr_warn("BPF interpreter: unknown opcode %02x\n", insn->code);
+		BUG_ON(1);
+		return 0;
+}
+
 #define PROG_NAME(stack_size) __bpf_prog_run##stack_size
 #define DEFINE_BPF_PROG_RUN(stack_size) \
 static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct bpf_insn *insn) \
@@ -1646,6 +2271,27 @@ static unsigned int PROG_NAME(stack_size)(const void *ctx, const struct bpf_insn
 	return ___bpf_prog_run(regs, insn, stack); \
 }
 
+#define PROG_NAME_TRACE(stack_size) __bpf_prog_run_trace##stack_size
+#define DEFINE_BPF_PROG_RUN_TRACE(stack_size) \
+static unsigned int PROG_NAME_TRACE(stack_size)(const void *ctx, const struct bpf_insn *insn) \
+{ \
+	unsigned int ret; \
+	u64 stack[stack_size / sizeof(u64)]; \
+	u64 regs[MAX_BPF_EXT_REG]; \
+	struct bpf_trace_ctx trace_ctx; \
+\
+	memset(stack, 0, sizeof(stack)); \
+	memset(regs, 0, sizeof(regs)); \
+\
+	FP = (u64) (unsigned long) &stack[ARRAY_SIZE(stack)]; \
+	ARG1 = (u64) (unsigned long) ctx; \
+\
+	__bpf_trace_init(&trace_ctx, insn, regs, stack, stack_size); \
+	ret = ___bpf_prog_run_trace(&trace_ctx, regs, insn, stack); \
+	__bpf_trace_complete(&trace_ctx, ret); \
+	return ret; \
+}
+
 #define PROG_NAME_ARGS(stack_size) __bpf_prog_run_args##stack_size
 #define DEFINE_BPF_PROG_RUN_ARGS(stack_size) \
 static u64 PROG_NAME_ARGS(stack_size)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5, \
@@ -1670,16 +2316,20 @@ static u64 PROG_NAME_ARGS(stack_size)(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5, \
 #define EVAL5(FN, X, Y...) FN(X) EVAL4(FN, Y)
 #define EVAL6(FN, X, Y...) FN(X) EVAL5(FN, Y)
 
-EVAL6(DEFINE_BPF_PROG_RUN, 32, 64, 96, 128, 160, 192);
-EVAL6(DEFINE_BPF_PROG_RUN, 224, 256, 288, 320, 352, 384);
-EVAL4(DEFINE_BPF_PROG_RUN, 416, 448, 480, 512);
+EVAL6(DEFINE_BPF_PROG_RUN_TRACE, 32, 64, 96, 128, 160, 192);
+EVAL6(DEFINE_BPF_PROG_RUN_TRACE, 224, 256, 288, 320, 352, 384);
+EVAL4(DEFINE_BPF_PROG_RUN_TRACE, 416, 448, 480, 512);
 
 EVAL6(DEFINE_BPF_PROG_RUN_ARGS, 32, 64, 96, 128, 160, 192);
 EVAL6(DEFINE_BPF_PROG_RUN_ARGS, 224, 256, 288, 320, 352, 384);
 EVAL4(DEFINE_BPF_PROG_RUN_ARGS, 416, 448, 480, 512);
 
-#define PROG_NAME_LIST(stack_size) PROG_NAME(stack_size),
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
+EVAL6(DEFINE_BPF_PROG_RUN, 32, 64, 96, 128, 160, 192);
+EVAL6(DEFINE_BPF_PROG_RUN, 224, 256, 288, 320, 352, 384);
+EVAL4(DEFINE_BPF_PROG_RUN, 416, 448, 480, 512);
 
+#define PROG_NAME_LIST(stack_size) PROG_NAME(stack_size),
 static unsigned int (*interpreters[])(const void *ctx,
 				      const struct bpf_insn *insn) = {
 EVAL6(PROG_NAME_LIST, 32, 64, 96, 128, 160, 192)
@@ -1687,6 +2337,17 @@ EVAL6(PROG_NAME_LIST, 224, 256, 288, 320, 352, 384)
 EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
 };
 #undef PROG_NAME_LIST
+#endif
+
+#define PROG_NAME_LIST(stack_size) PROG_NAME_TRACE(stack_size),
+static unsigned int (*interpreters_trace[])(const void *ctx,
+					    const struct bpf_insn *insn) = {
+EVAL6(PROG_NAME_LIST, 32, 64, 96, 128, 160, 192)
+EVAL6(PROG_NAME_LIST, 224, 256, 288, 320, 352, 384)
+EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
+};
+#undef PROG_NAME_LIST
+
 #define PROG_NAME_LIST(stack_size) PROG_NAME_ARGS(stack_size),
 static u64 (*interpreters_args[])(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5,
 				  const struct bpf_insn *insn) = {
@@ -1696,6 +2357,12 @@ EVAL4(PROG_NAME_LIST, 416, 448, 480, 512)
 };
 #undef PROG_NAME_LIST
 
+bpf_func_t bpf_get_trace_interpreter(const struct bpf_prog *fp)
+{
+	u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1);
+	return interpreters_trace[(round_up(stack_depth, 32) / 32) - 1];
+}
+
 void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
 {
 	stack_depth = max_t(u32, stack_depth, 1);
@@ -1705,7 +2372,7 @@ void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
 	insn->code = BPF_JMP | BPF_CALL_ARGS;
 }
 
-#else
+#ifdef CONFIG_BPF_JIT_ALWAYS_ON
 static unsigned int __bpf_prog_ret0_warn(const void *ctx,
 					 const struct bpf_insn *insn)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 55d376c53f7d..d521c5ad8111 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2720,7 +2720,6 @@ static int check_max_stack_depth(struct bpf_verifier_env *env)
 	goto continue_func;
 }
 
-#ifndef CONFIG_BPF_JIT_ALWAYS_ON
 static int get_callee_stack_depth(struct bpf_verifier_env *env,
 				  const struct bpf_insn *insn, int idx)
 {
@@ -2734,7 +2733,6 @@ static int get_callee_stack_depth(struct bpf_verifier_env *env,
 	}
 	return env->subprog_info[subprog].stack_depth;
 }
-#endif
 
 int check_ctx_reg(struct bpf_verifier_env *env,
 		  const struct bpf_reg_state *reg, int regno)
@@ -9158,11 +9156,9 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 
 static int fixup_call_args(struct bpf_verifier_env *env)
 {
-#ifndef CONFIG_BPF_JIT_ALWAYS_ON
 	struct bpf_prog *prog = env->prog;
 	struct bpf_insn *insn = prog->insnsi;
 	int i, depth;
-#endif
 	int err = 0;
 
 	if (env->prog->jit_requested &&
@@ -9173,7 +9169,7 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 		if (err == -EFAULT)
 			return err;
 	}
-#ifndef CONFIG_BPF_JIT_ALWAYS_ON
+
 	for (i = 0; i < prog->len; i++, insn++) {
 		if (insn->code != (BPF_JMP | BPF_CALL) ||
 		    insn->src_reg != BPF_PSEUDO_CALL)
@@ -9183,9 +9179,7 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 			return depth;
 		bpf_patch_call_args(insn, depth);
 	}
-	err = 0;
-#endif
-	return err;
+	return 0;
 }
 
 /* fixup insn->imm field of bpf_call instructions



* [RFC PATCH bpf-next 3/3] selftests/bpf: call bpf_prog_test_run with trace enabled for XDP test
  2020-03-18 13:06 [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Eelco Chaudron
  2020-03-18 13:06 ` [RFC PATCH bpf-next 1/3] bpf: introduce trace option to the BPF_PROG_TEST_RUN command API Eelco Chaudron
  2020-03-18 13:06 ` [RFC PATCH bpf-next 2/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Eelco Chaudron
@ 2020-03-18 13:06 ` Eelco Chaudron
  2020-03-23 22:47 ` [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Yonghong Song
  3 siblings, 0 replies; 15+ messages in thread
From: Eelco Chaudron @ 2020-03-18 13:06 UTC (permalink / raw)
  To: bpf; +Cc: davem, netdev, ast, daniel, kafai, songliubraving, yhs, andriin

This adds a quick and dirty version of bpf_prog_test_run_trace() right
in the xdp.c selftest to demonstrate the new trace functionality.

This is a (too long) example output of the trace functionality:

STACK(32 bytes):
  ffffb82b4081faa8: 0x0000000000000000 0000000000000000 0000000000000000 0000000000000000

REGISTERS:
  r00-03: 0x0000000000000000 ffffb82b4081fde8 0000000000000000 0000000000000000
  r04-07: 0x0000000000000000 0000000000000000 0000000000000000 0000000000000000
  r08-11: 0x0000000000000000 0000000000000000 ffffb82b4081fac8 0000000000000000

1[0x0]: (bf) r6 = r1
  r6: 0x0000000000000000 -> 0xffffb82b4081fde8
2[0x1]: (b4) w8 = 1
  r8: 0x0000000000000000 -> 0x0000000000000001
3[0x2]: (b7) r11 = -439116112
  r11: 0x0000000000000000 -> 0xffffffffe5d39eb0
4[0x3]: (67) r11 <<= 32
  r11: 0xffffffffe5d39eb0 -> 0xe5d39eb000000000
5[0x4]: (4f) r8 |= r11
  r8: 0x0000000000000001 -> 0xe5d39eb000000001
6[0x5]: (79) r2 = *(u64 *)(r6 +8)
  r2: 0x0000000000000000 -> 0xffff8fbaec6d3b36
7[0x6]: (79) r1 = *(u64 *)(r6 +0)
  r1: 0xffffb82b4081fde8 -> 0xffff8fbaec6d3b00
8[0x7]: (bf) r3 = r1
  r3: 0x0000000000000000 -> 0xffff8fbaec6d3b00
9[0x8]: (07) r3 += 14
  r3: 0xffff8fbaec6d3b00 -> 0xffff8fbaec6d3b0e
10[0x9]: (2d) if r3 > r2 goto pc+76
  r3: 0xffff8fbaec6d3b0e <==> r2: 0xffff8fbaec6d3b36
  branch was NOT taken
...
...
300[0x152]: (a4) w2 ^= -1
  r2: 0x6ec5cf350000934d -> 0x00000000ffff6cb2
301[0x153]: (b7) r11 = -412759113
  r11: 0x6ec5cf3500000000 -> 0xffffffffe765cbb7
302[0x154]: (67) r11 <<= 32
  r11: 0xffffffffe765cbb7 -> 0xe765cbb700000000
303[0x155]: (4f) r2 |= r11
  r2: 0x00000000ffff6cb2 -> 0xe765cbb7ffff6cb2
304[0x156]: (6b) *(u16 *)(r1 +24) = r2
  MEM[ffff8fbaec6d3b04]: 0x00 -> 0xb2
305[0x157]: (71) r1 = *(u8 *)(r10 -12)
  r1: 0xffff8fbaec6d3aec -> 0x0000000000000006
306[0x158]: (b7) r11 = -1438054225
  r11: 0xe765cbb700000000 -> 0xffffffffaa4908af
307[0x159]: (67) r11 <<= 32
  r11: 0xffffffffaa4908af -> 0xaa4908af00000000
308[0x15a]: (4f) r1 |= r11
  r1: 0x0000000000000006 -> 0xaa4908af00000006
309[0x15b]: (63) *(u32 *)(r10 -4) = r1
  MEM[ffffb82b4081fac4]: 0x0000000000000000 -> 0x0000000000000006
310[0x15c]: (bf) r2 = r10
  r2: 0xe765cbb7ffff6cb2 -> 0xffffb82b4081fac8
311[0x15d]: (07) r2 += -4
  r2: 0xffffb82b4081fac8 -> 0xffffb82b4081fac4
312[0x15e]: (18) r1 = 0xffff8fbaeef8c000
  r1: 0xaa4908af00000006 -> 0xffff8fbaeef8c000
313[0x160]: (85) call percpu_array_map_lookup_elem#164336
  r0: 0x7af818e200000000 -> 0xffffd82b3fbb4048
314[0x161]: (15) if r0 == 0x0 goto pc+229
  r0: 0xffffd82b3fbb4048 <==> 0x000000
  branch was NOT taken
315[0x162]: (05) goto pc+225
  pc: 0x0000000000000162 -> 0x0000000000000244
  branch was taken
316[0x244]: (79) r1 = *(u64 *)(r0 +0)
  r1: 0xffff8fbaeef8c000 -> 0x0000000000000000
317[0x245]: (07) r1 += 1
  r1: 0x0000000000000000 -> 0x0000000000000001
318[0x246]: (7b) *(u64 *)(r0 +0) = r1
  MEM[ffffd82b3fbb4048]: 0x00 -> 0x01
319[0x247]: (b4) w8 = 3
  r8: 0x14d492f700000045 -> 0x0000000000000003
320[0x248]: (b7) r11 = -1474974291
  r11: 0xaa4908af00000000 -> 0xffffffffa815adad
321[0x249]: (67) r11 <<= 32
  r11: 0xffffffffa815adad -> 0xa815adad00000000
322[0x24a]: (4f) r8 |= r11
  r8: 0x0000000000000003 -> 0xa815adad00000003
323[0x24b]: (05) goto pc-502
  pc: 0x000000000000024b -> 0x0000000000000056
  branch was taken
324[0x56]: (bc) w0 = w8
  r0: 0xffffd82b3fbb4048 -> 0x0000000000000003
325[0x57]: (95) exit

STACK(32 bytes):
  ffffb82b4081faa8: 0x0000000000000000 0000000000000000 0000000600020000 0000000600000000

REGISTERS:
  r00-03: 0x0000000000000003 0000000000000001 ffffb82b4081fac4 9b5fd9b800000000
  r04-07: 0x03f00a9d00000000 2c071e5a00000000 7437bef800000000 ffff8fbaf4b5b6f8
  r08-11: 0xa815adad00000003 be1b210e0000934d ffffb82b4081fac8 a815adad00000000

RESULT:
  return = 0x0000000000000003


Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 tools/testing/selftests/bpf/prog_tests/xdp.c |   36 +++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp.c b/tools/testing/selftests/bpf/prog_tests/xdp.c
index dcb5ecac778e..8f7a70d39dbf 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp.c
@@ -1,6 +1,38 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 
+
+static inline int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr,
+			  unsigned int size)
+{
+	return syscall(__NR_bpf, cmd, attr, size);
+}
+
+int bpf_prog_test_run_trace(int prog_fd, int repeat, void *data, __u32 size,
+			    void *data_out, __u32 *size_out, __u32 *retval,
+			    __u32 *duration)
+{
+	union bpf_attr attr;
+	int ret;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.test.prog_fd = prog_fd;
+	attr.test.data_in = ptr_to_u64(data);
+	attr.test.data_out = ptr_to_u64(data_out);
+	attr.test.data_size_in = size;
+	attr.test.repeat = repeat;
+	attr.test.flags = BPF_F_TEST_ENABLE_TRACE;
+
+	ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
+	if (size_out)
+		*size_out = attr.test.data_size_out;
+	if (retval)
+		*retval = attr.test.retval;
+	if (duration)
+		*duration = attr.test.duration;
+	return ret;
+}
+
 void test_xdp(void)
 {
 	struct vip key4 = {.protocol = 6, .family = AF_INET};
@@ -25,7 +57,7 @@ void test_xdp(void)
 	bpf_map_update_elem(map_fd, &key4, &value4, 0);
 	bpf_map_update_elem(map_fd, &key6, &value6, 0);
 
-	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+	err = bpf_prog_test_run_trace(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
 				buf, &size, &retval, &duration);
 
 	CHECK(err || retval != XDP_TX || size != 74 ||
@@ -33,7 +65,7 @@ void test_xdp(void)
 	      "err %d errno %d retval %d size %d\n",
 	      err, errno, retval, size);
 
-	err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6),
+	err = bpf_prog_test_run_trace(prog_fd, 1, &pkt_v6, sizeof(pkt_v6),
 				buf, &size, &retval, &duration);
 	CHECK(err || retval != XDP_TX || size != 114 ||
 	      iph6->nexthdr != IPPROTO_IPV6, "ipv6",



* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-03-18 13:06 [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Eelco Chaudron
                   ` (2 preceding siblings ...)
  2020-03-18 13:06 ` [RFC PATCH bpf-next 3/3] selftests/bpf: call bpf_prog_test_run with trace enabled for XDP test Eelco Chaudron
@ 2020-03-23 22:47 ` Yonghong Song
  2020-04-16 12:45   ` Eelco Chaudron
  3 siblings, 1 reply; 15+ messages in thread
From: Yonghong Song @ 2020-03-23 22:47 UTC (permalink / raw)
  To: Eelco Chaudron, bpf
  Cc: davem, netdev, ast, daniel, kafai, songliubraving, andriin



On 3/18/20 6:06 AM, Eelco Chaudron wrote:
> I sent out this RFC to get an idea if the approach suggested here
> would be something other people would also like to see. In addition,
> this cover letter mentions some concerns and questions that need
> answers before we can move to an acceptable implementation.
> 
> This patch adds support for tracing eBPF XDP programs that get
> executed using the __BPF_PROG_RUN syscall. This is done by switching
> from JIT (if enabled) to executing the program using the interpreter
> and record each executed instruction.

Thanks for working on this! I think this is a useful feature
for doing semi single-step debugging in a safe environment. The initial
input, e.g., a packet or some other kernel context, may be captured
in a production error path. People can use this to easily
do some post analysis. This feature can also be used for
initial single-step debugging with better bpftool support.

> 
> For now, the execution history is printed to the kernel ring buffer
> using pr_info(), the final version should have enough data stored in a
> user-supplied buffer to reconstruct this output. This should probably
> be part of bpftool, i.e. dump a similar output, and the ability to
> store all this in an elf-like format for dumping/analyzing/replaying
> at a later stage.
> 
> This patch does not dump the XDP packet content before and after
> execution, however, this data is available to the caller of the API.

I would like to see the feature implemented in a way that applies
to all existing test_run program types and is extensible to future
program types.

There are different ways to send data back to user space. A user buffer
is one way, a ring buffer is another, and a seq_file could also be used.
Performance is not a concern here, so we can choose the one with the
best usability.

> 
> The __bpf_prog_run_trace() interpreter is a copy of __bpf_prog_run()
> and we probably need a smarter way to re-use the code rather than a
> blind copy with some changes.

Yes, reusing the code is a must. Using the existing interpreter
framework is the easiest route to semi single-step support.

> 
> Enabling the interpreter opens up the kernel for spectre variant 2,
> guess that's why the BPF_JIT_ALWAYS_ON option was introduced (commit
> 290af86629b2). Enabling it for debugging in the field does not sound
> like an option (talking to people doing kernel distributions).
> Any idea how to work around this (lfence before any call this will
> slow down, but I guess for debugging this does not matter)? I need to
> research this more as I'm no expert in this area. But I think this
> needs to be solved as I see this as a show stopper. So any input is
> welcome.

An lfence for the indirect call is okay here for test_run. Just need to
be careful not to introduce any performance penalty for the non-test-run
prog run.

> 
> To allow bpf_call support for tracing currently the general
> interpreter is enabled. See the fixup_call_args() function for why
> this is needed. We might need to find a way to fix this (see the above
> section on spectre).
> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> 


* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-03-23 22:47 ` [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Yonghong Song
@ 2020-04-16 12:45   ` Eelco Chaudron
  2020-04-19  7:01     ` Yonghong Song
  0 siblings, 1 reply; 15+ messages in thread
From: Eelco Chaudron @ 2020-04-16 12:45 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, davem, netdev, ast, daniel, kafai, songliubraving, andriin



On 23 Mar 2020, at 23:47, Yonghong Song wrote:

> On 3/18/20 6:06 AM, Eelco Chaudron wrote:
>> I sent out this RFC to get an idea if the approach suggested here
>> would be something other people would also like to see. In addition,
>> this cover letter mentions some concerns and questions that need
>> answers before we can move to an acceptable implementation.
>>
>> This patch adds support for tracing eBPF XDP programs that get
>> executed using the __BPF_PROG_RUN syscall. This is done by switching
>> from JIT (if enabled) to executing the program using the interpreter
>> and record each executed instruction.
>
> Thanks for working on this! I think this is a useful feature
> to do semi single step in a safe environment. The initial input,
> e.g., packet or some other kernel context, may be captured
> in production error path. People can use this to easily
> do some post analysis. This feature can also be used for
> initial single-step debugging with better bpftool support.
>
>>
>> For now, the execution history is printed to the kernel ring buffer
>> using pr_info(), the final version should have enough data stored in 
>> a
>> user-supplied buffer to reconstruct this output. This should probably
>> be part of bpftool, i.e. dump a similar output, and the ability to
>> store all this in an elf-like format for dumping/analyzing/replaying
>> at a later stage.
>>
>> This patch does not dump the XDP packet content before and after
>> execution, however, this data is available to the caller of the API.
>
> I would like to see the feature is implemented in a way to apply
> to all existing test_run program types and extensible to future
> program types.

Yes, this makes sense, but as I’m only familiar with the XDP part, I 
focused on that.

> There are different ways to send data back to user. User buffer
> is one way, ring buffer is another way, seq_file can also be used.
> Performance is not a concern here, so we can choose the one with best
> usability.

As we need a buffer the easiest way would be to supply a user buffer. I 
guess a raw perf buffer might also work, but the API might get 
complex… I’ll dig into this a bit for the next RFC.
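
Just to make the discussion concrete, the kind of extension I'm
thinking of for a user buffer would look roughly like the below
(hypothetical; none of these fields exist, and the names are made up):

	struct { /* anonymous struct used by BPF_PROG_TEST_RUN command,
		  * earlier fields omitted
		  */
		__aligned_u64	ctx_in;
		__aligned_u64	ctx_out;
		__u32		flags;
		__u32		trace_size_in;	/* in: size of trace buffer */
		__u32		trace_size_out;	/* out: bytes of trace written */
		__aligned_u64	trace_buf;	/* user-supplied trace buffer */
	} test;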

>>
>> The __bpf_prog_run_trace() interpreter is a copy of __bpf_prog_run()
>> and we probably need a smarter way to re-use the code rather than a
>> blind copy with some changes.
>
> Yes, reusing the code is a must. Using existing interpreter framework
> is the easiest for semi single step support.

Any idea how to do it cleanly? I guess I could move the interpreter code 
out of the core file and include it twice.
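
Something like the below is what I had in mind; the file and macro
names are made up, it is only meant to illustrate the "include it
twice" idea:

/* kernel/bpf/core.c -- hypothetical sketch, names made up */
#define BPF_INTERP_NAME			___bpf_prog_run
#define BPF_TRACE_HOOK(ctx, insn)	do { } while (0)
#include "interpreter.h"
#undef BPF_INTERP_NAME
#undef BPF_TRACE_HOOK

#define BPF_INTERP_NAME			___bpf_prog_run_trace
#define BPF_TRACE_HOOK(ctx, insn)	__bpf_trace_next(ctx, insn)
#include "interpreter.h"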

>> Enabling the interpreter opens up the kernel for spectre variant 2,
>> guess that's why the BPF_JIT_ALWAYS_ON option was introduced (commit
>> 290af86629b2). Enabling it for debugging in the field does not sound
>> like an option (talking to people doing kernel distributions).
>> Any idea how to work around this (lfence before any call this will
>> slow down, but I guess for debugging this does not matter)? I need to
>> research this more as I'm no expert in this area. But I think this
>> needs to be solved as I see this as a show stopper. So any input is
>> welcome.
>
> lfence for indirect call is okay here for test_run. Just need to be
> careful to no introduce any performance penalty for non-test-run
> prog run.

My idea here was to do it at compile time and only if the interpreter 
was disabled.

>>
>> To allow bpf_call support for tracing currently the general
>> interpreter is enabled. See the fixup_call_args() function for why
>> this is needed. We might need to find a way to fix this (see the 
>> above
>> section on spectre).
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>

One final question: did you (or anyone else) look at the actual code
and have some tips or things to look at?


I’ll try to do another RFC, cleaning up the duplicate interpreter
code and sending the actual trace data to userspace. I will hack some
userspace decoder together, or maybe even start integrating it into
bpftool (if not, it will be part of the follow-on RFC).



* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-04-16 12:45   ` Eelco Chaudron
@ 2020-04-19  7:01     ` Yonghong Song
  2020-04-19 22:54       ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Yonghong Song @ 2020-04-19  7:01 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: bpf, davem, netdev, ast, daniel, kafai, songliubraving, andriin



On 4/16/20 5:45 AM, Eelco Chaudron wrote:
> 
> 
> On 23 Mar 2020, at 23:47, Yonghong Song wrote:
> 
>> On 3/18/20 6:06 AM, Eelco Chaudron wrote:
>>> I sent out this RFC to get an idea if the approach suggested here
>>> would be something other people would also like to see. In addition,
>>> this cover letter mentions some concerns and questions that need
>>> answers before we can move to an acceptable implementation.
>>>
>>> This patch adds support for tracing eBPF XDP programs that get
>>> executed using the __BPF_PROG_RUN syscall. This is done by switching
>>> from JIT (if enabled) to executing the program using the interpreter
>>> and record each executed instruction.
>>
>> Thanks for working on this! I think this is a useful feature
>> to do semi single step in a safe environment. The initial input,
>> e.g., packet or some other kernel context, may be captured
>> in production error path. People can use this to easily
>> do some post analysis. This feature can also be used for
>> initial single-step debugging with better bpftool support.
>>
>>>
>>> For now, the execution history is printed to the kernel ring buffer
>>> using pr_info(), the final version should have enough data stored in a
>>> user-supplied buffer to reconstruct this output. This should probably
>>> be part of bpftool, i.e. dump a similar output, and the ability to
>>> store all this in an elf-like format for dumping/analyzing/replaying
>>> at a later stage.
>>>
>>> This patch does not dump the XDP packet content before and after
>>> execution, however, this data is available to the caller of the API.
>>
>> I would like to see the feature is implemented in a way to apply
>> to all existing test_run program types and extensible to future
>> program types.
> 
> Yes, this makes sense, but as I’m only familiar with the XDP part, I 
> focused on that.

That is okay. Let us first focus on xdp and then change it to make it
more general before landing. During implementation you want to keep
anything not related to xdp in separate functions so they can be
reused.

> 
>> There are different ways to send data back to user. User buffer
>> is one way, ring buffer is another way, seq_file can also be used.
>> Performance is not a concern here, so we can choose the one with best
>> usability.
> 
> As we need a buffer the easiest way would be to supply a user buffer. I 
> guess a raw perf buffer might also work, but the API might get complex… 
> I’ll dig into this a bit for the next RFC.

You may use xdp's perf_event_output. Not sure whether it can serve
your purpose or not. If not, I think user buffer is okay.

> 
>>>
>>> The __bpf_prog_run_trace() interpreter is a copy of __bpf_prog_run()
>>> and we probably need a smarter way to re-use the code rather than a
>>> blind copy with some changes.
>>
>> Yes, reusing the code is a must. Using existing interpreter framework
>> is the easiest for semi single step support.
> 
> Any idea how to do it cleanly? I guess I could move the interpreter code 
> out of the core file and include it twice.

I think refactoring into a different file is totally okay. You can then
pass different callback functions from test_run and from the
interpreter call.

> 
>>> Enabling the interpreter opens up the kernel for spectre variant 2,
>>> guess that's why the BPF_JIT_ALWAYS_ON option was introduced (commit
>>> 290af86629b2). Enabling it for debugging in the field does not sound
>>> like an option (talking to people doing kernel distributions).
>>> Any idea how to work around this (lfence before any call this will
>>> slow down, but I guess for debugging this does not matter)? I need to
>>> research this more as I'm no expert in this area. But I think this
>>> needs to be solved as I see this as a show stopper. So any input is
>>> welcome.
>>
>> lfence for indirect call is okay here for test_run. Just need to be
>> careful to no introduce any performance penalty for non-test-run
>> prog run.
> 
> My idea here was to do it at compile time and only if the interpreter 
> was disabled.

This should be okay. If people do not have BPF_JIT_ALWAYS_ON, that
probably means they are not worried about interpreter indirect
calls...

> 
>>>
>>> To allow bpf_call support for tracing currently the general
>>> interpreter is enabled. See the fixup_call_args() function for why
>>> this is needed. We might need to find a way to fix this (see the above
>>> section on spectre).
>>>
>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>
> 
> One final question did you (or anyone else) looked at the actual code 
> and have some tips, thinks look at?

I briefly looked at the code, but did not go through the details.
Next time I will try to look at more of the details.

> 
> 
> I’ll try to do another RFC, cleaning up the duplicate interpreter code, 
> sent the actual trace data to userspace. Will hack some userspace 
> decoder together, or maybe even start integrating it in bpftool (if not 
> it will be part of the follow on RFC).
> 


* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-04-19  7:01     ` Yonghong Song
@ 2020-04-19 22:54       ` Alexei Starovoitov
  2020-04-24 12:29         ` Eelco Chaudron
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-04-19 22:54 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Eelco Chaudron, bpf, David S. Miller, Network Development,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrii Nakryiko

On Sun, Apr 19, 2020 at 12:02 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 4/16/20 5:45 AM, Eelco Chaudron wrote:
> >
> >
> > On 23 Mar 2020, at 23:47, Yonghong Song wrote:
> >
> >> On 3/18/20 6:06 AM, Eelco Chaudron wrote:
> >>> I sent out this RFC to get an idea if the approach suggested here
> >>> would be something other people would also like to see. In addition,
> >>> this cover letter mentions some concerns and questions that need
> >>> answers before we can move to an acceptable implementation.
> >>>
> >>> This patch adds support for tracing eBPF XDP programs that get
> >>> executed using the __BPF_PROG_RUN syscall. This is done by switching
> >>> from JIT (if enabled) to executing the program using the interpreter
> >>> and record each executed instruction.

Sorry for the delay. I only noticed these patches after Yonghong replied.

I think hacking the interpreter to pr_info() after every instruction is
too much of a hack.
Not working with JITed code is imo a red flag for the approach as well.
When every insn is spamming the logs, the only use case I can see is to
feed the test program with one packet and read a thousand-line dump.
Even that is quite user unfriendly.

How about enabling kprobes in the JITed code instead?
Then if you really need to trap and print regs for every instruction
you can still do so by placing a kprobe on every JITed insn.
But in reality I think a few kprobes in the prog will be enough to
debug the program, and the XDP prog may still process millions of
packets, because your kprobe could be in an error path and the user may
want to capture only specific things when it triggers.
The kprobe bpf prog will execute in such a case and can capture the
necessary state from the xdp prog, from the packet or from the maps
that the xdp prog is using.
Some sort of bpf-gdb would be needed in user space.
Obviously people shouldn't be writing such kprobe-bpf progs that debug
other bpf progs by hand; bpf-gdb should be able to generate them
automatically.
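
To make it concrete, the kind of program bpf-gdb would generate is an
ordinary kprobe prog built with the usual libbpf boilerplate (plus the
right __TARGET_ARCH_* define). The probe target below is purely
hypothetical, since placing kprobes inside JITed bpf code is exactly
the capability being discussed, and which native register holds what
at that point is arch specific:

  #include <linux/bpf.h>
  #include <linux/ptrace.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  /* Fires only when the (hypothetical) probe point inside the JITed
   * XDP prog is hit, e.g. an error path, and dumps a bit of state. */
  SEC("kprobe/hypothetical_xdp_error_path")
  int capture_state(struct pt_regs *ctx)
  {
          bpf_printk("xdp probe hit: arg1=%lx arg2=%lx",
                     PT_REGS_PARM1(ctx), PT_REGS_PARM2(ctx));
          return 0;
  }

  char _license[] SEC("license") = "GPL";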

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-04-19 22:54       ` Alexei Starovoitov
@ 2020-04-24 12:29         ` Eelco Chaudron
  2020-04-28  4:04           ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Eelco Chaudron @ 2020-04-24 12:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yonghong Song, bpf, David S. Miller, Network Development,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrii Nakryiko



On 20 Apr 2020, at 0:54, Alexei Starovoitov wrote:

> On Sun, Apr 19, 2020 at 12:02 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 4/16/20 5:45 AM, Eelco Chaudron wrote:
>>>
>>>
>>> On 23 Mar 2020, at 23:47, Yonghong Song wrote:
>>>
>>>> On 3/18/20 6:06 AM, Eelco Chaudron wrote:
>>>>> I sent out this RFC to get an idea if the approach suggested here
>>>>> would be something other people would also like to see. In 
>>>>> addition,
>>>>> this cover letter mentions some concerns and questions that need
>>>>> answers before we can move to an acceptable implementation.
>>>>>
>>>>> This patch adds support for tracing eBPF XDP programs that get
>>>>> executed using the __BPF_PROG_RUN syscall. This is done by 
>>>>> switching
>>>>> from JIT (if enabled) to executing the program using the 
>>>>> interpreter
>>>>> and record each executed instruction.
>
> sorry for delay. I only noticed these patches after Yonghong replied.

Hi Alexei, I have to disagree with most of your comments below :)
However, I think that's due to me not clearly stating the main use case
I have in mind for this. This is not meant for a developer to
interactively debug an issue, but for an end-user/support person to get
some initial data in a live environment without affecting any live
traffic (assuming the XDP use case), as this tracing is only available
through the BPF_PROG_RUN API. From my previous experience as an
ASIC/microcode network datapath engineer, I have found that this kind
of debug output (however low level it looks, with the right tooling
that should not be an issue) solves roughly 80% of the cases I've
worked on. The remaining 20% were almost always related to "user space"
applications not populating tables and other resources correctly.

> I think hacking interpreter to pr_info after every instruction is too
> much of a hack.

I would agree if this were also used for the normal interpreter path,
but this is a separate interpreter only used for the debug path.

> Not working with JIT-ed code is imo red flag for the approach as well.

How would this be an issue? This is for the debug path only, and if the
JITed code behaves differently from the interpreter, there is a bigger
issue.

> When every insn is spamming the logs the only use case I can see
> is to feed the test program with one packet and read thousand lines 
> dump.
> Even that is quite user unfriendly.

The log was for the POC only; the idea is to dump this into a user
buffer, and with the right tooling (bpftool prog run ... {trace}?) it
can be stored in an ELF-like file together with the program and its
input/output. Then it would be easy to dump the C and eBPF program
interleaved, as bpftool does. If GDB supported eBPF, the format I
envision would be good enough to support the GDB record/replay
functionality.
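
To give an idea of the kind of record I have in mind for that buffer
(field names and sizes are purely illustrative, nothing here is a
proposed UAPI yet):

  struct bpf_trace_record {
          __u32 pc;       /* index of the executed eBPF insn       */
          __u8  opcode;   /* insn->code                            */
          __u8  dst_reg;
          __u8  src_reg;
          __u8  pad;
          __s32 imm;
          __s32 off;      /* widened from the insn's __s16         */
          __u64 regs[11]; /* R0-R10 after executing the insn       */
  };

An array of these, together with the program and its data_in/data_out,
should be all the tooling needs to reconstruct an interleaved source
and instruction dump offline.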


> How about enabling kprobe in JITed code instead?
> Then if you really need to trap and print regs for every instruction 
> you can
> still do so by placing kprobe on every JITed insn.

This would be even harder, as you need to understand the native
ASM (PPC/ARM/x86) to eBPF mapping (registers/code), whereas all you are
interested in is eBPF (to C).
This kprobe would also affect all the instances of the program running
in the system, i.e. for XDP it could be attached to all interfaces in
the system.
And for this purpose you are only interested in the results of a run
for a specific packet (in the XDP use case) using the BPF_PROG_RUN API,
so you are not affecting any live traffic.

> But in reality I think few kprobes in the prog will be enough
> to debug the program and XDP prog may still process millions of 
> packets
> because your kprobe could be in error path and the user may want to
> capture only specific things when it triggers.
> kprobe bpf prog will execute in such case and it can capture necessary
> state from xdp prog, from packet or from maps that xdp prog is using.
> Some sort of bpf-gdb would be needed in user space.
> Obviously people shouldn't be writing such kprob-bpf progs that debug
> other bpf progs by hand. bpf-gdb should be able to generate them 
> automatically.

See my opening comment. What you're describing here is more for the
case where the right developer has access to the specific system. But
that might not even be possible in some environments.


Let me know if your opinion on this idea changes after reading this, or 
what else is needed to convince you of the need ;)



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-04-24 12:29         ` Eelco Chaudron
@ 2020-04-28  4:04           ` Alexei Starovoitov
  2020-04-28 10:47             ` Eelco Chaudron
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-04-28  4:04 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: Yonghong Song, bpf, David S. Miller, Network Development,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrii Nakryiko

On Fri, Apr 24, 2020 at 02:29:56PM +0200, Eelco Chaudron wrote:
> 
> > Not working with JIT-ed code is imo red flag for the approach as well.
> 
> How would this be an issue, this is for the debug path only, and if the
> jitted code behaves differently than the interpreter there is a bigger
> issue.

They are different already. For example, tail_calls cannot mix and
match interpreter and JITed code. Similar with bpf2bpf calls.
And that difference will keep growing.
At the time of doing the bpf trampoline I considered dropping support
for the interpreter, but then figured out a relatively cheap way of
keeping it alive.
I expect the next feature to not support the interpreter.

> > When every insn is spamming the logs the only use case I can see
> > is to feed the test program with one packet and read thousand lines
> > dump.
> > Even that is quite user unfriendly.
> 
> The log was for the POC only, the idea is to dump this in a user buffer, and
> with the right tooling (bpftool prog run ... {trace}?) it can be stored in
> an ELF file together with the program, and input/output. Then it would be
> easy to dump the C and eBPF program interleaved as bpftool does. If GDB
> would support eBPF, the format I envision would be good enough to support
> the GDB record/replay functionality.

For the case you have in mind no kernel changes are necessary.
Just run the interpreter in user space.
It can be embedded in the gdb binary, for example.

Especially if you don't want to affect a production server, you
definitely don't want to run anything on that machine.
As a support person, just grab the prog, capture the traffic and debug
on your own server.

> 
> > How about enabling kprobe in JITed code instead?
> > Then if you really need to trap and print regs for every instruction you
> > can
> > still do so by placing kprobe on every JITed insn.
> 
> This would even be harder as you need to understand the ASM(PPC/ARM/x86) to
> eBPF mapping (registers/code), where all you are interested in is eBPF (to
> C).

Not really. A gdb-like tool will hide all that from users.

> This kprobe would also affect all the instances of the program running in
> the system, i.e. for XDP, it could be assigned to all interfaces in the
> system.

There are plenty of ways to solve that.
Such a kprobe in a prog can be gated by the test_run cmd only.
Or the prog's .text can be cloned into a new one and kprobed there.

> And for this purpose, you are only interested in the results of a run for a
> specific packet (in the XDP use case) using the BPF_RUN_API so you are not
> affecting any live traffic.

The only way to not affect live traffic is to provide support on
a different machine.

> > But in reality I think few kprobes in the prog will be enough
> > to debug the program and XDP prog may still process millions of packets
> > because your kprobe could be in error path and the user may want to
> > capture only specific things when it triggers.
> > kprobe bpf prog will execute in such case and it can capture necessary
> > state from xdp prog, from packet or from maps that xdp prog is using.
> > Some sort of bpf-gdb would be needed in user space.
> > Obviously people shouldn't be writing such kprob-bpf progs that debug
> > other bpf progs by hand. bpf-gdb should be able to generate them
> > automatically.
> 
> See my opening comment. What you're describing here is more when the right
> developer has access to the specific system. But this might not even be
> possible in some environments.

All I'm saying is that kprobes are the way to trace the kernel.
The same facility should be used to trace bpf progs.

> 
> Let me know if your opinion on this idea changes after reading this, or what
> else is needed to convince you of the need ;)

I'm very much against hacking the in-kernel interpreter into a
register dumping facility.
Either use kprobe+bpf for programmatic tracing or Intel PT for a pure
instruction trace.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-04-28  4:04           ` Alexei Starovoitov
@ 2020-04-28 10:47             ` Eelco Chaudron
  2020-04-28 12:19               ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 15+ messages in thread
From: Eelco Chaudron @ 2020-04-28 10:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yonghong Song, bpf, David S. Miller, Network Development,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrii Nakryiko



On 28 Apr 2020, at 6:04, Alexei Starovoitov wrote:

> On Fri, Apr 24, 2020 at 02:29:56PM +0200, Eelco Chaudron wrote:
>>
>>> Not working with JIT-ed code is imo red flag for the approach as 
>>> well.
>>
>> How would this be an issue, this is for the debug path only, and if 
>> the
>> jitted code behaves differently than the interpreter there is a 
>> bigger
>> issue.
>
> They are different already. Like tail_calls cannot mix and match 
> interpreter
> and JITed. Similar with bpf2bpf calls.
> And that difference will be growing further.
> At that time of doing bpf trampoline I considering dropping support 
> for
> interpreter, but then figured out a relatively cheap way of keeping it 
> alive.
> I expect next feature to not support interpreter.

If the goal is to phase out the interpreter then I have to agree it
does not make sense to add this facility based on it…

>>> When every insn is spamming the logs the only use case I can see
>>> is to feed the test program with one packet and read thousand lines
>>> dump.
>>> Even that is quite user unfriendly.
>>
>> The log was for the POC only, the idea is to dump this in a user 
>> buffer, and
>> with the right tooling (bpftool prog run ... {trace}?) it can be 
>> stored in
>> an ELF file together with the program, and input/output. Then it 
>> would be
>> easy to dump the C and eBPF program interleaved as bpftool does. If 
>> GDB
>> would support eBPF, the format I envision would be good enough to 
>> support
>> the GDB record/replay functionality.
>
> For the case you have in mind no kernel changes are necessary.
> Just run the interpreter in user space.
> It can be embedded in gdb binary, for example.

I do not believe a user-space approach would work, as you need support
for all helpers (and have to make sure they behave exactly like the
ones in that kernel version), and you also need all maps/memory
available.

> Especially if you don't want to affect production server you 
> definitely
> don't want to run anything on that machine.

By affecting the production server I was not hinting at performance
degradation/CPU/memory usage, but at not disturbing any of the traffic
streams by inserting another packet into the network.

> As support person just grab the prog, capture the traffic and debug
> on their own server.
>
>>
>>> How about enabling kprobe in JITed code instead?
>>> Then if you really need to trap and print regs for every instruction 
>>> you
>>> can
>>> still do so by placing kprobe on every JITed insn.
>>
>> This would even be harder as you need to understand the 
>> ASM(PPC/ARM/x86) to
>> eBPF mapping (registers/code), where all you are interested in is 
>> eBPF (to
>> C).
>
> Not really. gdb-like tool will hide all that from users.

Potentially yes if we get support for this in any gdb-like tool.

>> This kprobe would also affect all the instances of the program 
>> running in
>> the system, i.e. for XDP, it could be assigned to all interfaces in 
>> the
>> system.
>
> There are plenty of ways to solve that.
> Such kprobe in a prog can be gated by test_run cmd only.
> Or the prog .text can be cloned into new one and kprobed there.

Ack

>> And for this purpose, you are only interested in the results of a run 
>> for a
>> specific packet (in the XDP use case) using the BPF_RUN_API so you 
>> are not
>> affecting any live traffic.
>
> The only way to not affect live traffic is to provide support on
> a different machine.

See above

>>> But in reality I think few kprobes in the prog will be enough
>>> to debug the program and XDP prog may still process millions of 
>>> packets
>>> because your kprobe could be in error path and the user may want to
>>> capture only specific things when it triggers.
>>> kprobe bpf prog will execute in such case and it can capture 
>>> necessary
>>> state from xdp prog, from packet or from maps that xdp prog is 
>>> using.
>>> Some sort of bpf-gdb would be needed in user space.
>>> Obviously people shouldn't be writing such kprob-bpf progs that 
>>> debug
>>> other bpf progs by hand. bpf-gdb should be able to generate them
>>> automatically.
>>
>> See my opening comment. What you're describing here is more when the 
>> right
>> developer has access to the specific system. But this might not even 
>> be
>> possible in some environments.
>
> All I'm saying that kprobe is a way to trace kernel.
> The same facility should be used to trace bpf progs.

perf doesn’t support tracing bpf programs; do you know of any tools
that can, or do you have any examples that do this?

>>
>> Let me know if your opinion on this idea changes after reading this, 
>> or what
>> else is needed to convince you of the need ;)
>
> I'm very much against hacking in-kernel interpreter into register
> dumping facility.

If the goal is to eventually remove the interpreter and not even add
new features to it, I agree it does not make sense to continue this
way.

> Either use kprobe+bpf for programmatic tracing or intel's pt for pure
> instruction trace.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-04-28 10:47             ` Eelco Chaudron
@ 2020-04-28 12:19               ` Arnaldo Carvalho de Melo
  2020-05-01  2:44                 ` Masami Hiramatsu
  0 siblings, 1 reply; 15+ messages in thread
From: Arnaldo Carvalho de Melo @ 2020-04-28 12:19 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: Alexei Starovoitov, Yonghong Song, bpf, Masami Hiramatsu,
	David S. Miller, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Andrii Nakryiko

Em Tue, Apr 28, 2020 at 12:47:53PM +0200, Eelco Chaudron escreveu:
> On 28 Apr 2020, at 6:04, Alexei Starovoitov wrote:
> > On Fri, Apr 24, 2020 at 02:29:56PM +0200, Eelco Chaudron wrote:

> > > > But in reality I think few kprobes in the prog will be enough to
> > > > debug the program and XDP prog may still process millions of
> > > > packets because your kprobe could be in error path and the user
> > > > may want to capture only specific things when it triggers.

> > > > kprobe bpf prog will execute in such case and it can capture
> > > > necessary state from xdp prog, from packet or from maps that xdp
> > > > prog is using.

> > > > Some sort of bpf-gdb would be needed in user space.  Obviously
> > > > people shouldn't be writing such kprob-bpf progs that debug
> > > > other bpf progs by hand. bpf-gdb should be able to generate them
> > > > automatically.

> > > See my opening comment. What you're describing here is more when
> > > the right developer has access to the specific system. But this
> > > might not even be possible in some environments.

> > All I'm saying that kprobe is a way to trace kernel.
> > The same facility should be used to trace bpf progs.
 
> perf doesn’t support tracing bpf programs, do you know of any tools that
> can, or you have any examples that would do this?

I'm discussing with Yonghong and Masami what would be needed for 'perf
probe' to be able to add kprobes to BPF jitted areas in addition to
vmlinux and modules.

- Arnaldo

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-04-28 12:19               ` Arnaldo Carvalho de Melo
@ 2020-05-01  2:44                 ` Masami Hiramatsu
  2020-05-06  1:25                   ` Alexei Starovoitov
  0 siblings, 1 reply; 15+ messages in thread
From: Masami Hiramatsu @ 2020-05-01  2:44 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Eelco Chaudron, Alexei Starovoitov, Yonghong Song, bpf,
	Masami Hiramatsu, David S. Miller, Network Development,
	Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Andrii Nakryiko

On Tue, 28 Apr 2020 09:19:47 -0300
Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Em Tue, Apr 28, 2020 at 12:47:53PM +0200, Eelco Chaudron escreveu:
> > On 28 Apr 2020, at 6:04, Alexei Starovoitov wrote:
> > > On Fri, Apr 24, 2020 at 02:29:56PM +0200, Eelco Chaudron wrote:
> 
> > > > > But in reality I think few kprobes in the prog will be enough to
> > > > > debug the program and XDP prog may still process millions of
> > > > > packets because your kprobe could be in error path and the user
> > > > > may want to capture only specific things when it triggers.
> 
> > > > > kprobe bpf prog will execute in such case and it can capture
> > > > > necessary state from xdp prog, from packet or from maps that xdp
> > > > > prog is using.
> 
> > > > > Some sort of bpf-gdb would be needed in user space.  Obviously
> > > > > people shouldn't be writing such kprob-bpf progs that debug
> > > > > other bpf progs by hand. bpf-gdb should be able to generate them
> > > > > automatically.
> 
> > > > See my opening comment. What you're describing here is more when
> > > > the right developer has access to the specific system. But this
> > > > might not even be possible in some environments.
> 
> > > All I'm saying that kprobe is a way to trace kernel.
> > > The same facility should be used to trace bpf progs.
>  
> > perf doesn’t support tracing bpf programs, do you know of any tools that
> > can, or you have any examples that would do this?
> 
> I'm discussing with Yonghong and Masami what would be needed for 'perf
> probe' to be able to add kprobes to BPF jitted areas in addition to
> vmlinux and modules.

At a glance, first we need debuginfo which maps the source code to the
BPF binaries. We also need to get a map from the kernel indicating
which instructions the bpf code was jited to.
Is there any such information?

Also, I would like to know whether the target BPF (XDP) program runs in
kprobe context or not. BPF tracers sometimes use kprobes to hook the
event and then run in kprobe (INT3) context; that will need more work
to probe.
For BPF code which just runs in tracepoint context, it will be easy to
probe. (We may need to break a limitation of notrace, for which we
already have a kconfig.)

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-05-01  2:44                 ` Masami Hiramatsu
@ 2020-05-06  1:25                   ` Alexei Starovoitov
  2020-05-07  8:55                     ` Masami Hiramatsu
  0 siblings, 1 reply; 15+ messages in thread
From: Alexei Starovoitov @ 2020-05-06  1:25 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Arnaldo Carvalho de Melo, Eelco Chaudron, Yonghong Song, bpf,
	David S. Miller, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Andrii Nakryiko

On Thu, Apr 30, 2020 at 7:44 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
>
> On Tue, 28 Apr 2020 09:19:47 -0300
> Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> > Em Tue, Apr 28, 2020 at 12:47:53PM +0200, Eelco Chaudron escreveu:
> > > On 28 Apr 2020, at 6:04, Alexei Starovoitov wrote:
> > > > On Fri, Apr 24, 2020 at 02:29:56PM +0200, Eelco Chaudron wrote:
> >
> > > > > > But in reality I think few kprobes in the prog will be enough to
> > > > > > debug the program and XDP prog may still process millions of
> > > > > > packets because your kprobe could be in error path and the user
> > > > > > may want to capture only specific things when it triggers.
> >
> > > > > > kprobe bpf prog will execute in such case and it can capture
> > > > > > necessary state from xdp prog, from packet or from maps that xdp
> > > > > > prog is using.
> >
> > > > > > Some sort of bpf-gdb would be needed in user space.  Obviously
> > > > > > people shouldn't be writing such kprob-bpf progs that debug
> > > > > > other bpf progs by hand. bpf-gdb should be able to generate them
> > > > > > automatically.
> >
> > > > > See my opening comment. What you're describing here is more when
> > > > > the right developer has access to the specific system. But this
> > > > > might not even be possible in some environments.
> >
> > > > All I'm saying that kprobe is a way to trace kernel.
> > > > The same facility should be used to trace bpf progs.
> >
> > > perf doesn’t support tracing bpf programs, do you know of any tools that
> > > can, or you have any examples that would do this?
> >
> > I'm discussing with Yonghong and Masami what would be needed for 'perf
> > probe' to be able to add kprobes to BPF jitted areas in addition to
> > vmlinux and modules.
>
> At a grance, at first we need a debuginfo which maps the source code and
> BPF binaries. We also need to get a map from the kernel indicating
> which instructions the bpf code was jited to.
> Are there any such information?

it's already there. Try 'bpftool prog dump jited id N'
It will show something like this:
; data = ({typeof(errors.leaf) *leaf =
bpf_map_lookup_elem_(bpf_pseudo_fd(1, -11), &type_key); if (!leaf) {
bpf_map_update_elem_(bpf_pseudo_fd(1, -11), &type_key, &zero,
BPF_NOEXIST); leaf = bpf_map_lookup_elem_(bpf_pseudo_fd(1, -11), &t;
 81d:    movabs $0xffff8881a0679000,%rdi
; return bpf_map_lookup_elem((void *)map, key);
 827:    mov    %rbx,%rsi
 82a:    callq  0xffffffffe0f7f448
 82f:    test   %rax,%rax
 832:    je     0x0000000000000838
 834:    add    $0x40,%rax
; if (!data)
 838:    test   %rax,%rax
 83b:    je     0x0000000000000846
 83d:    mov    $0x1,%edi
; lock_xadd(data, 1);
 842:    lock add %edi,0x0(%rax)

> Also, I would like to know the target BPF (XDP) is running in kprobes
> context or not. BPF tracer sometimes use the kprobes to hook the event
> and run in the kprobe (INT3) context. That will be need more work to
> probe it.
> For the BPF code which just runs in tracepoint context, it will be easy
> to probe it. (we may need to break a limitation of notrace, which we
> already has a kconfig)

yeah. this mechanism won't be able to debug bpf progs that are
attached to kprobes via int3. But that is a rare case.
Most kprobe+bpf progs are placed on function entry, and adding int3 to
jited bpf code will work just like it does for normal kernel functions.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API
  2020-05-06  1:25                   ` Alexei Starovoitov
@ 2020-05-07  8:55                     ` Masami Hiramatsu
  0 siblings, 0 replies; 15+ messages in thread
From: Masami Hiramatsu @ 2020-05-07  8:55 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Arnaldo Carvalho de Melo, Eelco Chaudron, Yonghong Song, bpf,
	David S. Miller, Network Development, Alexei Starovoitov,
	Daniel Borkmann, Martin KaFai Lau, Song Liu, Andrii Nakryiko

On Tue, 5 May 2020 18:25:38 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Thu, Apr 30, 2020 at 7:44 PM Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >
> > On Tue, 28 Apr 2020 09:19:47 -0300
> > Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> >
> > > Em Tue, Apr 28, 2020 at 12:47:53PM +0200, Eelco Chaudron escreveu:
> > > > On 28 Apr 2020, at 6:04, Alexei Starovoitov wrote:
> > > > > On Fri, Apr 24, 2020 at 02:29:56PM +0200, Eelco Chaudron wrote:
> > >
> > > > > > > But in reality I think few kprobes in the prog will be enough to
> > > > > > > debug the program and XDP prog may still process millions of
> > > > > > > packets because your kprobe could be in error path and the user
> > > > > > > may want to capture only specific things when it triggers.
> > >
> > > > > > > kprobe bpf prog will execute in such case and it can capture
> > > > > > > necessary state from xdp prog, from packet or from maps that xdp
> > > > > > > prog is using.
> > >
> > > > > > > Some sort of bpf-gdb would be needed in user space.  Obviously
> > > > > > > people shouldn't be writing such kprob-bpf progs that debug
> > > > > > > other bpf progs by hand. bpf-gdb should be able to generate them
> > > > > > > automatically.
> > >
> > > > > > See my opening comment. What you're describing here is more when
> > > > > > the right developer has access to the specific system. But this
> > > > > > might not even be possible in some environments.
> > >
> > > > > All I'm saying that kprobe is a way to trace kernel.
> > > > > The same facility should be used to trace bpf progs.
> > >
> > > > perf doesn’t support tracing bpf programs, do you know of any tools that
> > > > can, or you have any examples that would do this?
> > >
> > > I'm discussing with Yonghong and Masami what would be needed for 'perf
> > > probe' to be able to add kprobes to BPF jitted areas in addition to
> > > vmlinux and modules.
> >
> > At a grance, at first we need a debuginfo which maps the source code and
> > BPF binaries. We also need to get a map from the kernel indicating
> > which instructions the bpf code was jited to.
> > Are there any such information?
> 
> it's already there. Try 'bpftool prog dump jited id N'
> It will show something like this:
> ; data = ({typeof(errors.leaf) *leaf =
> bpf_map_lookup_elem_(bpf_pseudo_fd(1, -11), &type_key); if (!leaf) {
> bpf_map_update_elem_(bpf_pseudo_fd(1, -11), &type_key, &zero,
> BPF_NOEXIST); leaf = bpf_map_lookup_elem_(bpf_pseudo_fd(1, -11), &t;
>  81d:    movabs $0xffff8881a0679000,%rdi
> ; return bpf_map_lookup_elem((void *)map, key);
>  827:    mov    %rbx,%rsi
>  82a:    callq  0xffffffffe0f7f448
>  82f:    test   %rax,%rax
>  832:    je     0x0000000000000838
>  834:    add    $0x40,%rax
> ; if (!data)
>  838:    test   %rax,%rax
>  83b:    je     0x0000000000000846
>  83d:    mov    $0x1,%edi
> ; lock_xadd(data, 1);
>  842:    lock add %edi,0x0(%rax)

Hm, so bpftool or libbpf has some source-address (offset) mapping
APIs; that will help me.

BTW, I would like to confirm: if the kernel has jited code, it will
not fall back to the xlated code. Is that correct?

> > Also, I would like to know the target BPF (XDP) is running in kprobes
> > context or not. BPF tracer sometimes use the kprobes to hook the event
> > and run in the kprobe (INT3) context. That will be need more work to
> > probe it.
> > For the BPF code which just runs in tracepoint context, it will be easy
> > to probe it. (we may need to break a limitation of notrace, which we
> > already has a kconfig)
> 
> yeah. this mechanism won't be able to debug bpf progs that are
> attached to kprobes via int3. But that is rare case.
> Most kprobe+bpf are for function entry and adding int3 to jited bpf code
> will work just like for normal kernel functions.

OK, anyway, I've done the nested kprobe support on x86/arm/arm64 :)
That will make it easy.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-05-07  8:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-18 13:06 [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Eelco Chaudron
2020-03-18 13:06 ` [RFC PATCH bpf-next 1/3] bpf: introduce trace option to the BPF_PROG_TEST_RUN command API Eelco Chaudron
2020-03-18 13:06 ` [RFC PATCH bpf-next 2/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Eelco Chaudron
2020-03-18 13:06 ` [RFC PATCH bpf-next 3/3] selftests/bpf: call bpf_prog_test_run with trace enabled for XDP test Eelco Chaudron
2020-03-23 22:47 ` [RFC PATCH bpf-next 0/3] bpf: add tracing for XDP programs using the BPF_PROG_TEST_RUN API Yonghong Song
2020-04-16 12:45   ` Eelco Chaudron
2020-04-19  7:01     ` Yonghong Song
2020-04-19 22:54       ` Alexei Starovoitov
2020-04-24 12:29         ` Eelco Chaudron
2020-04-28  4:04           ` Alexei Starovoitov
2020-04-28 10:47             ` Eelco Chaudron
2020-04-28 12:19               ` Arnaldo Carvalho de Melo
2020-05-01  2:44                 ` Masami Hiramatsu
2020-05-06  1:25                   ` Alexei Starovoitov
2020-05-07  8:55                     ` Masami Hiramatsu
