Netdev Archive on lore.kernel.org
* [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing
@ 2019-10-05  5:03 Alexei Starovoitov
  2019-10-05  5:03 ` [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation Alexei Starovoitov
                   ` (9 more replies)
  0 siblings, 10 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC (permalink / raw)
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

Revolutionize bpf tracing and bpf C programming.
The C language allows any pointer to be typecast to any other pointer
type, and an integer to be converted to a pointer.
Though the bpf verifier operates at the assembly level, it performs strict
type checking for a fixed number of types.
Known types are defined in 'enum bpf_reg_type'.
For example:
PTR_TO_FLOW_KEYS is a pointer to 'struct bpf_flow_keys'
PTR_TO_SOCKET is a pointer to 'struct bpf_sock',
and so on.

When it comes to bpf tracing there are no types to track.
bpf+kprobe receives 'struct pt_regs' as input.
bpf+raw_tracepoint receives raw kernel arguments as an array of u64 values.
It was up to the bpf program to interpret these integers.
A typical tracing program looks like:
int bpf_prog(struct pt_regs *ctx)
{
    struct net_device *dev;
    struct sk_buff *skb;
    int ifindex;

    /* x86-64: the first function argument is passed in %rdi */
    skb = (struct sk_buff *) ctx->di;
    bpf_probe_read(&dev, sizeof(dev), &skb->dev);
    bpf_probe_read(&ifindex, sizeof(ifindex), &dev->ifindex);
    return 0;
}
Addressing mistakes will not be caught by the C compiler or by the verifier.
The program above could have typecast ctx->si to skb and page faulted
on every bpf_probe_read().
bpf_probe_read() allows reading any address and suppresses page faults.
A typical program has hundreds of bpf_probe_read() calls to walk
kernel data structures.
Not only would such a tracing program be slow, but there was always a risk
that bpf_probe_read() would read an mmio region of memory and cause
unpredictable hw behavior.

With the introduction of Compile Once Run Everywhere (CO-RE) technology in
libbpf and LLVM, and of the BPF Type Format (BTF), the verifier is finally
ready for the next step in program verification.
Now it can use in-kernel BTF to type check bpf assembly code.

The equivalent program will look like:
struct trace_kfree_skb {
    struct sk_buff *skb;
    void *location;
};

SEC("raw_tracepoint/kfree_skb")
int trace_kfree_skb(struct trace_kfree_skb *ctx)
{
    struct sk_buff *skb = ctx->skb;
    struct net_device *dev;
    int ifindex;

    /* let clang record these field accesses as CO-RE relocations */
    __builtin_preserve_access_index(({
        dev = skb->dev;
        ifindex = dev->ifindex;
    }));
    return 0;
}

These patches teach the bpf verifier to recognize kfree_skb's first argument
as 'struct sk_buff *' because this is what the kernel C code is doing.
The bpf program cannot 'cheat' and claim that the first argument
to the kfree_skb raw_tracepoint is some other type.
The verifier will catch such a type mismatch between the bpf program's
assumption about kernel code and the actual type in the kernel.

Furthermore, the skb->dev access is type tracked as well.
The verifier can see which field of skb is being read
in bpf assembly and will match the offset to a type.
If a bpf program has the code:
struct net_device *dev = (void *)skb->len;
the C compiler will not complain and will generate bpf assembly code,
but the verifier will recognize that the integer 'len' field
is being accessed at offsetof(struct sk_buff, len) and will reject
further dereference of the 'dev' variable because it contains
an integer value instead of a pointer.
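The following user-space C sketch models this offset-to-type matching in a
very reduced form. It is not the verifier's implementation: it assumes
hypothetical stand-in structs (struct mini_skb, struct mini_net_device), a
hand-written field table in place of in-kernel BTF, and only three register
types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down stand-ins for kernel structs; the real layouts
 * would come from in-kernel BTF, not from these definitions. */
struct mini_net_device {
	int ifindex;
};

struct mini_skb {
	struct mini_net_device *dev;
	unsigned int len;
};

/* The only register types this sketch tracks. */
enum reg_kind { REG_SCALAR, REG_PTR_SKB, REG_PTR_DEV };

/* One entry per field: its offset, its size, and what a load from it yields. */
struct field_desc {
	size_t off;
	size_t size;
	enum reg_kind result;
};

static const struct field_desc skb_fields[] = {
	{ offsetof(struct mini_skb, dev), sizeof(struct mini_net_device *), REG_PTR_DEV },
	{ offsetof(struct mini_skb, len), sizeof(unsigned int), REG_SCALAR },
};

static const struct field_desc dev_fields[] = {
	{ offsetof(struct mini_net_device, ifindex), sizeof(int), REG_SCALAR },
};

/* Model "dst = *(size *)(src + off)": return 0 and store the type of the
 * loaded value in *dst, or return -1 if the load must be rejected. */
static int check_load(enum reg_kind src, size_t off, size_t size,
		      enum reg_kind *dst)
{
	const struct field_desc *fields;
	size_t n, i;

	assert(dst != NULL);
	switch (src) {
	case REG_PTR_SKB:
		fields = skb_fields;
		n = sizeof(skb_fields) / sizeof(skb_fields[0]);
		break;
	case REG_PTR_DEV:
		fields = dev_fields;
		n = sizeof(dev_fields) / sizeof(dev_fields[0]);
		break;
	default:
		return -1;	/* a scalar cannot be dereferenced */
	}

	for (i = 0; i < n; i++) {
		/* simplification: only exact field starts and sizes are accepted */
		if (fields[i].off == off && fields[i].size == size) {
			*dst = fields[i].result;
			return 0;
		}
	}
	return -1;	/* no field with this offset and size */
}
```

Loading skb->len yields REG_SCALAR, and a follow-up load through that
value returns -1, which is the sketch's counterpart of the verifier
rejecting the '(void *)skb->len' dereference described above.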

Such sophisticated type tracking allows calling networking
bpf helpers from tracing programs.
This patchset allows calling bpf_skb_event_output(), which dumps
skb data into the perf ring buffer.
It greatly improves observability.
Now users can not only see the packet length of the skb
about to be freed in the kfree_skb() kernel function, but can also
dump it to user space via the perf ring buffer using a bpf helper
that was previously available only to TC and socket filters.
See patch 10 for a full example.

The end result is safer and faster bpf tracing.
Safer - because direct calls to bpf_probe_read() are disallowed and
arbitrary addresses cannot be read.
Faster - because normal loads are used to walk kernel data structures
instead of bpf_probe_read() calls.
Note that such loads can page fault; such faults are handled by a
hidden bpf_probe_read() in the interpreter and via the exception table
if the program is JITed.

See patches for details.

Alexei Starovoitov (10):
  bpf: add typecast to raw_tracepoints to help BTF generation
  bpf: add typecast to bpf helpers to help BTF generation
  bpf: process in-kernel BTF
  libbpf: auto-detect btf_id of raw_tracepoint
  bpf: implement accurate raw_tp context access via BTF
  bpf: add support for BTF pointers to interpreter
  bpf: add support for BTF pointers to x86 JIT
  bpf: check types of arguments passed into helpers
  bpf: disallow bpf_probe_read[_str] helpers
  selftests/bpf: add kfree_skb raw_tp test

 arch/x86/net/bpf_jit_comp.c                   |  96 +++++-
 include/linux/bpf.h                           |  21 +-
 include/linux/bpf_verifier.h                  |   6 +-
 include/linux/btf.h                           |   1 +
 include/linux/extable.h                       |  10 +
 include/linux/filter.h                        |   6 +-
 include/trace/bpf_probe.h                     |   3 +-
 include/uapi/linux/bpf.h                      |   3 +-
 kernel/bpf/btf.c                              | 318 ++++++++++++++++++
 kernel/bpf/core.c                             |  39 ++-
 kernel/bpf/verifier.c                         | 125 ++++++-
 kernel/extable.c                              |   2 +
 kernel/trace/bpf_trace.c                      |  10 +-
 net/core/filter.c                             |  15 +-
 tools/include/uapi/linux/bpf.h                |   3 +-
 tools/lib/bpf/libbpf.c                        |  16 +
 tools/testing/selftests/bpf/bpf_helpers.h     |   4 +
 .../selftests/bpf/prog_tests/kfree_skb.c      |  90 +++++
 tools/testing/selftests/bpf/progs/kfree_skb.c |  76 +++++
 19 files changed, 828 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kfree_skb.c
 create mode 100644 tools/testing/selftests/bpf/progs/kfree_skb.c

-- 
2.20.0


* [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
@ 2019-10-05  5:03 ` Alexei Starovoitov
  2019-10-05 18:40   ` Andrii Nakryiko
  2019-10-06  3:58   ` John Fastabend
  2019-10-05  5:03 ` [PATCH bpf-next 02/10] bpf: add typecast to bpf helpers " Alexei Starovoitov
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC (permalink / raw)
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

When pahole converts dwarf to btf it emits only used types.
Wrap the existing __bpf_trace_##template() function into a
btf_trace_##template typedef and use it in a type cast to
make gcc emit this type into dwarf. Then pahole will convert it to btf.
The "btf_trace_" prefix will be used to identify BTF-enabled raw tracepoints.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/trace/bpf_probe.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index d6e556c0a085..ff1a879773df 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -74,11 +74,12 @@ static inline void bpf_test_probe_##call(void)				\
 {									\
 	check_trace_callback_type_##call(__bpf_trace_##template);	\
 }									\
+typedef void (*btf_trace_##template)(void *__data, proto);		\
 static struct bpf_raw_event_map	__used					\
 	__attribute__((section("__bpf_raw_tp_map")))			\
 __bpf_trace_tp_map_##call = {						\
 	.tp		= &__tracepoint_##call,				\
-	.bpf_func	= (void *)__bpf_trace_##template,		\
+	.bpf_func	= (void *)(btf_trace_##template)__bpf_trace_##template,	\
 	.num_args	= COUNT_ARGS(args),				\
 	.writable_size	= size,						\
 };
-- 
2.20.0


* [PATCH bpf-next 02/10] bpf: add typecast to bpf helpers to help BTF generation
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
  2019-10-05  5:03 ` [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation Alexei Starovoitov
@ 2019-10-05  5:03 ` " Alexei Starovoitov
  2019-10-05 18:41   ` Andrii Nakryiko
  2019-10-06  4:00   ` John Fastabend
  2019-10-05  5:03 ` [PATCH bpf-next 03/10] bpf: process in-kernel BTF Alexei Starovoitov
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC (permalink / raw)
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

When pahole converts dwarf to btf it emits only used types.
Wrap existing bpf helper functions into a typedef and use it in a
typecast to make gcc emit this type into dwarf.
Then pahole will convert it to btf.
The "btf_#name_of_helper" types will be used to figure out
the types of arguments of bpf helpers.
The generated code before and after is the same.
Only dwarf and btf are different.
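The typedef-and-cast pattern can be seen in isolation in the hand-expanded
sketch below. It is an assumed simplification of what BPF_CALL_x generates,
written out for a hypothetical two-argument helper named demo_add rather
than produced by the real macro: a typed inner function, a btf_demo_add
typedef with the same prototype, and a register-based wrapper that calls
the inner function through a cast to that typedef. Since the cast targets
the function's own type, it does not change the generated code; it only
gives the compiler a reason to describe the typedef in dwarf.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;

/* Typed implementation, named the way the macro names it. */
static inline u64 ____demo_add(u32 a, u64 b);

/* Typed prototype as a pointer typedef. Using it in the cast below is what
 * makes the compiler describe it in dwarf, so pahole can emit it as btf. */
typedef u64 (*btf_demo_add)(u32 a, u64 b);

/* Register-based entry point: every argument arrives as a u64. */
u64 demo_add(u64 r1, u64 r2);
u64 demo_add(u64 r1, u64 r2)
{
	/* a pointer to ____demo_add already has type btf_demo_add,
	 * so the cast is a no-op at runtime */
	return ((btf_demo_add)____demo_add)((u32)r1, r2);
}

static inline u64 ____demo_add(u32 a, u64 b)
{
	return a + b;
}
```

For example, demo_add(0x100000001, 1) returns 2, because the first register
value is narrowed to the u32 parameter type before the typed function runs.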

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/filter.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 2ce57645f3cd..d3d51d7aff2c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -464,10 +464,11 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
 #define BPF_CALL_x(x, name, ...)					       \
 	static __always_inline						       \
 	u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__));   \
+	typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
 	u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__));	       \
 	u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__))	       \
 	{								       \
-		return ____##name(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
+		return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
 	}								       \
 	static __always_inline						       \
 	u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))
-- 
2.20.0


* [PATCH bpf-next 03/10] bpf: process in-kernel BTF
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
  2019-10-05  5:03 ` [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation Alexei Starovoitov
  2019-10-05  5:03 ` [PATCH bpf-next 02/10] bpf: add typecast to bpf helpers " Alexei Starovoitov
@ 2019-10-05  5:03 ` Alexei Starovoitov
  2019-10-06  6:36   ` Andrii Nakryiko
  2019-10-09 20:51   ` Martin Lau
  2019-10-05  5:03 ` [PATCH bpf-next 04/10] libbpf: auto-detect btf_id of raw_tracepoint Alexei Starovoitov
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC (permalink / raw)
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

If in-kernel BTF exists, parse it and prepare 'struct btf *btf_vmlinux'
for further use by the verifier.
In-kernel BTF is trusted just like kallsyms and other build artifacts
embedded into vmlinux.
Yet run this BTF image through the BTF verifier to make sure
that it is valid and wasn't mangled during the build.
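Part of that validation is checking the BTF header before anything else in
the blob is trusted. The sketch below shows a reduced, user-space version of
that one step. It assumes the struct btf_header layout and the BTF_MAGIC
(0xeB9F) and BTF_VERSION (1) values from the UAPI header linux/btf.h,
redeclared locally so the example builds without kernel headers;
btf_hdr_sane() is a hypothetical name, and the kernel's btf_parse_hdr() and
the checks that follow it do more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BTF_MAGIC	0xeB9F
#define BTF_VERSION	1

/* Same field layout as struct btf_header in include/uapi/linux/btf.h. */
struct btf_header {
	uint16_t magic;
	uint8_t  version;
	uint8_t  flags;
	uint32_t hdr_len;
	uint32_t type_off;	/* offset of type section, after the header */
	uint32_t type_len;
	uint32_t str_off;	/* offset of string section, after the header */
	uint32_t str_len;
};

/* Hypothetical helper: return 0 if 'data' starts with a plausible BTF
 * header whose type and string sections fit inside the blob, else -1. */
static int btf_hdr_sane(const void *data, size_t size)
{
	struct btf_header hdr;
	uint64_t body;

	if (size < sizeof(hdr))
		return -1;
	memcpy(&hdr, data, sizeof(hdr));	/* avoid unaligned reads */

	if (hdr.magic != BTF_MAGIC || hdr.version != BTF_VERSION || hdr.flags)
		return -1;
	/* simplification: only headers at least as large as ours */
	if (hdr.hdr_len < sizeof(hdr) || hdr.hdr_len > size)
		return -1;

	body = size - hdr.hdr_len;
	/* 64-bit sums so that off + len cannot wrap around */
	if ((uint64_t)hdr.type_off + hdr.type_len > body ||
	    (uint64_t)hdr.str_off + hdr.str_len > body)
		return -1;
	return 0;
}
```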

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf_verifier.h |  4 ++-
 include/linux/btf.h          |  1 +
 kernel/bpf/btf.c             | 66 ++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c        | 18 ++++++++++
 4 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 26a6d58ca78c..432ba8977a0a 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -330,10 +330,12 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
 #define BPF_LOG_STATS	4
 #define BPF_LOG_LEVEL	(BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
 #define BPF_LOG_MASK	(BPF_LOG_LEVEL | BPF_LOG_STATS)
+#define BPF_LOG_KERNEL (BPF_LOG_MASK + 1)
 
 static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
 {
-	return log->level && log->ubuf && !bpf_verifier_log_full(log);
+	return (log->level && log->ubuf && !bpf_verifier_log_full(log)) ||
+		log->level == BPF_LOG_KERNEL;
 }
 
 #define BPF_MAX_SUBPROGS 256
diff --git a/include/linux/btf.h b/include/linux/btf.h
index 64cdf2a23d42..55d43bc856be 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -56,6 +56,7 @@ bool btf_type_is_void(const struct btf_type *t);
 #ifdef CONFIG_BPF_SYSCALL
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
 const char *btf_name_by_offset(const struct btf *btf, u32 offset);
+struct btf *btf_parse_vmlinux(void);
 #else
 static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
 						    u32 type_id)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 29c7c06c6bd6..848f9d4b9d7e 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -698,6 +698,9 @@ __printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
 	if (!bpf_verifier_log_needed(log))
 		return;
 
+	if (log->level == BPF_LOG_KERNEL && !fmt)
+		return;
+
 	__btf_verifier_log(log, "[%u] %s %s%s",
 			   env->log_type_id,
 			   btf_kind_str[kind],
@@ -735,6 +738,8 @@ static void btf_verifier_log_member(struct btf_verifier_env *env,
 	if (!bpf_verifier_log_needed(log))
 		return;
 
+	if (log->level == BPF_LOG_KERNEL && !fmt)
+		return;
 	/* The CHECK_META phase already did a btf dump.
 	 *
 	 * If member is logged again, it must hit an error in
@@ -777,6 +782,8 @@ static void btf_verifier_log_vsi(struct btf_verifier_env *env,
 
 	if (!bpf_verifier_log_needed(log))
 		return;
+	if (log->level == BPF_LOG_KERNEL && !fmt)
+		return;
 	if (env->phase != CHECK_META)
 		btf_verifier_log_type(env, datasec_type, NULL);
 
@@ -802,6 +809,8 @@ static void btf_verifier_log_hdr(struct btf_verifier_env *env,
 	if (!bpf_verifier_log_needed(log))
 		return;
 
+	if (log->level == BPF_LOG_KERNEL)
+		return;
 	hdr = &btf->hdr;
 	__btf_verifier_log(log, "magic: 0x%x\n", hdr->magic);
 	__btf_verifier_log(log, "version: %u\n", hdr->version);
@@ -2406,6 +2415,8 @@ static s32 btf_enum_check_meta(struct btf_verifier_env *env,
 		}
 
 
+		if (env->log.level == BPF_LOG_KERNEL)
+			continue;
 		btf_verifier_log(env, "\t%s val=%d\n",
 				 __btf_name_by_offset(btf, enums[i].name_off),
 				 enums[i].val);
@@ -3367,6 +3378,61 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
 	return ERR_PTR(err);
 }
 
+extern char __weak _binary__btf_vmlinux_bin_start[];
+extern char __weak _binary__btf_vmlinux_bin_end[];
+
+struct btf *btf_parse_vmlinux(void)
+{
+	struct btf_verifier_env *env = NULL;
+	struct bpf_verifier_log *log;
+	struct btf *btf = NULL;
+	int err;
+
+	env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
+	if (!env)
+		return ERR_PTR(-ENOMEM);
+
+	log = &env->log;
+	log->level = BPF_LOG_KERNEL;
+
+	btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
+	if (!btf) {
+		err = -ENOMEM;
+		goto errout;
+	}
+	env->btf = btf;
+
+	btf->data = _binary__btf_vmlinux_bin_start;
+	btf->data_size = _binary__btf_vmlinux_bin_end -
+		_binary__btf_vmlinux_bin_start;
+
+	err = btf_parse_hdr(env);
+	if (err)
+		goto errout;
+
+	btf->nohdr_data = btf->data + btf->hdr.hdr_len;
+
+	err = btf_parse_str_sec(env);
+	if (err)
+		goto errout;
+
+	err = btf_check_all_metas(env);
+	if (err)
+		goto errout;
+
+	btf_verifier_env_free(env);
+	refcount_set(&btf->refcnt, 1);
+	return btf;
+
+errout:
+	btf_verifier_env_free(env);
+	if (btf) {
+		kvfree(btf->types);
+		kfree(btf);
+	}
+	return ERR_PTR(err);
+}
+
 void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 		       struct seq_file *m)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ffc3e53f5300..91c4db4d1c6a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -207,6 +207,8 @@ struct bpf_call_arg_meta {
 	int func_id;
 };
 
+struct btf *btf_vmlinux;
+
 static DEFINE_MUTEX(bpf_verifier_lock);
 
 static const struct bpf_line_info *
@@ -243,6 +245,10 @@ void bpf_verifier_vlog(struct bpf_verifier_log *log, const char *fmt,
 	n = min(log->len_total - log->len_used - 1, n);
 	log->kbuf[n] = '\0';
 
+	if (log->level == BPF_LOG_KERNEL) {
+		pr_err("BPF:%s\n", log->kbuf);
+		return;
+	}
 	if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
 		log->len_used += n;
 	else
@@ -9241,6 +9247,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 	env->ops = bpf_verifier_ops[env->prog->type];
 	is_priv = capable(CAP_SYS_ADMIN);
 
+	if (is_priv && !btf_vmlinux) {
+		mutex_lock(&bpf_verifier_lock);
+		btf_vmlinux = btf_parse_vmlinux();
+		mutex_unlock(&bpf_verifier_lock);
+	}
+
 	/* grab the mutex to protect few globals used by verifier */
 	if (!is_priv)
 		mutex_lock(&bpf_verifier_lock);
@@ -9260,6 +9272,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
 			goto err_unlock;
 	}
 
+	if (IS_ERR(btf_vmlinux)) {
+		verbose(env, "in-kernel BTF is malformed\n");
+		ret = PTR_ERR(btf_vmlinux);
+		goto err_unlock;
+	}
+
 	env->strict_alignment = !!(attr->prog_flags & BPF_F_STRICT_ALIGNMENT);
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
 		env->strict_alignment = true;
-- 
2.20.0


* [PATCH bpf-next 04/10] libbpf: auto-detect btf_id of raw_tracepoint
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
                   ` (2 preceding siblings ...)
  2019-10-05  5:03 ` [PATCH bpf-next 03/10] bpf: process in-kernel BTF Alexei Starovoitov
@ 2019-10-05  5:03 ` Alexei Starovoitov
  2019-10-07 23:41   ` Andrii Nakryiko
  2019-10-05  5:03 ` [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF Alexei Starovoitov
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC (permalink / raw)
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

For raw tracepoint program types libbpf will try to find
the btf_id of the raw tracepoint in vmlinux's BTF.
It's the responsibility of the bpf program author to annotate the program
with SEC("raw_tracepoint/name") where "name" is a valid raw tracepoint.
If "name" is indeed a valid raw tracepoint then in-kernel BTF
will have a "btf_trace_##name" typedef that points to the function
prototype of that raw tracepoint. The BTF description captures the
exact arguments the kernel C code passes into the raw tracepoint.
The kernel verifier will check the types while loading the bpf program.
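Deriving the typedef name from the section name is plain string handling.
The sketch below uses a hypothetical helper, raw_tp_btf_name(), which is
not a libbpf API; it checks the "raw_tracepoint/" prefix and uses
snprintf() so that a tracepoint name too long for the buffer is reported as
an error rather than written past the end of it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RAW_TP_SEC_PREFIX	"raw_tracepoint/"
#define BTF_TRACE_PREFIX	"btf_trace_"

/* Turn "raw_tracepoint/kfree_skb" into "btf_trace_kfree_skb".
 * Returns 0 on success, -1 if the section has the wrong prefix, names no
 * tracepoint, or the result does not fit in buf. */
static int raw_tp_btf_name(const char *sec, char *buf, size_t buf_sz)
{
	size_t plen = strlen(RAW_TP_SEC_PREFIX);
	const char *tp;
	int n;

	if (strncmp(sec, RAW_TP_SEC_PREFIX, plen) != 0)
		return -1;
	tp = sec + plen;
	if (*tp == '\0')
		return -1;

	n = snprintf(buf, buf_sz, BTF_TRACE_PREFIX "%s", tp);
	if (n < 0 || (size_t)n >= buf_sz)
		return -1;	/* output was truncated */
	return 0;
}
```

The resulting string is what would then be looked up with
btf__find_by_name(), as the patch below does.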

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 tools/lib/bpf/libbpf.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e0276520171b..0e6f7b41c521 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4591,6 +4591,22 @@ int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
 			continue;
 		*prog_type = section_names[i].prog_type;
 		*expected_attach_type = section_names[i].expected_attach_type;
+		if (*prog_type == BPF_PROG_TYPE_RAW_TRACEPOINT) {
+			struct btf *btf = bpf_core_find_kernel_btf();
+			char raw_tp_btf_name[128] = "btf_trace_";
+			int ret;
+
+			if (IS_ERR(btf))
+				/* lack of kernel BTF is not a failure */
+				return 0;
+			/* prepend "btf_trace_" prefix per kernel convention */
+			strcpy(raw_tp_btf_name + sizeof("btf_trace_") - 1,
+			       name + section_names[i].len);
+			ret = btf__find_by_name(btf, raw_tp_btf_name);
+			if (ret > 0)
+				*expected_attach_type = ret;
+			btf__free(btf);
+		}
 		return 0;
 	}
 	pr_warning("failed to guess program type based on ELF section name '%s'\n", name);
-- 
2.20.0


* [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
                   ` (3 preceding siblings ...)
  2019-10-05  5:03 ` [PATCH bpf-next 04/10] libbpf: auto-detect btf_id of raw_tracepoint Alexei Starovoitov
@ 2019-10-05  5:03 ` Alexei Starovoitov
  2019-10-07 16:32   ` Alan Maguire
  2019-10-08  0:35   ` Andrii Nakryiko
  2019-10-05  5:03 ` [PATCH bpf-next 06/10] bpf: add support for BTF pointers to interpreter Alexei Starovoitov
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC (permalink / raw)
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

libbpf analyzes the bpf C program, searches the in-kernel BTF for the given
type name, and stores the btf_id it finds in expected_attach_type.
The kernel verifier expects this btf_id to point to something like:
typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
which represents the signature of raw_tracepoint "kfree_skb".

Then btf_ctx_access() matches a ctx+0 access in the bpf program with the
'skb' argument and a ctx+8 access with the 'loc' argument of the "kfree_skb"
tracepoint. In the first case it passes the btf_id of 'struct sk_buff' back
to the verifier core; in the second case the 'void *' argument is treated
as a scalar.
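The argument-index arithmetic can be isolated in a few lines. In the sketch
below, which simplifies btf_ctx_access() and is not the kernel function, a
raw tracepoint prototype is modeled as a hypothetical array of argument
kinds whose element 0 is the 'void *__data' parameter of the btf_trace_*
typedef, so a context access at offset off maps to element off / 8 + 1.
The sketch performs its own range and alignment checks, and it reduces the
kernel's "pointer to any non-void type" case to a single struct-pointer kind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced argument kinds; real BTF has many more. */
enum arg_kind { ARG_INT, ARG_VOID_PTR, ARG_STRUCT_PTR, ARG_OTHER };

/* What the verifier would learn about the value loaded from ctx. */
enum ctx_result { CTX_REJECT, CTX_SCALAR, CTX_PTR_TO_BTF_ID };

/* proto[0] models the 'void *__data' parameter that every btf_trace_*
 * typedef starts with; the tracepoint's own arguments follow it. */
static enum ctx_result ctx_access(const enum arg_kind *proto, size_t nr_proto,
				  int off)
{
	size_t nr_args;

	assert(proto != NULL && nr_proto >= 1);
	nr_args = nr_proto - 1;	/* skip 'void *__data' */

	/* each raw_tp argument occupies one u64 slot in ctx */
	if (off < 0 || off % 8 != 0 || (size_t)off >= nr_args * 8)
		return CTX_REJECT;

	switch (proto[off / 8 + 1]) {
	case ARG_INT:
		return CTX_SCALAR;
	case ARG_VOID_PTR:
		return CTX_SCALAR;	/* no further pointer walking */
	case ARG_STRUCT_PTR:
		return CTX_PTR_TO_BTF_ID;
	default:
		return CTX_REJECT;	/* only ints and pointers are allowed */
	}
}
```

Modeling btf_trace_kfree_skb as { ARG_VOID_PTR, ARG_STRUCT_PTR, ARG_VOID_PTR },
offset 0 gives a PTR_TO_BTF_ID-like result, offset 8 a scalar, and offset 16
is rejected because the tracepoint has only two arguments.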

Then the verifier tracks PTR_TO_BTF_ID like any other pointer type.
Just as PTR_TO_SOCKET points to 'struct bpf_sock' and
PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock',
PTR_TO_BTF_ID points to in-kernel structs.
If 1234 is the btf_id of 'struct sk_buff' in vmlinux's BTF,
then PTR_TO_BTF_ID#1234 points to one of the in-kernel skbs.

When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)(r1 + 32)),
btf_struct_access() checks which field of 'struct sk_buff' is
at offset 32. It checks that the size of the access matches the type
definition of the field and continues to track the dereferenced type.
If that field is a pointer to 'struct net_device', r2's type
will be PTR_TO_BTF_ID#456, where 456 is the btf_id of 'struct net_device'
in vmlinux's BTF.
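The member walk can be sketched in the same reduced style. The code below is
a simplified model of the loop in btf_struct_access(), not the kernel
function: members come from a hypothetical flat table instead of BTF, the
example layout is invented and assumes 8-byte pointers, and arrays, unions
and bitfields are left out. As in the patch, an offset that falls inside an
embedded struct restarts the search in that struct with the offset rebased,
and a load whose size differs from the field size is rejected; the sketch
additionally requires the load to begin at the start of the field.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified member kinds; arrays, unions and bitfields are left out. */
enum mkind { M_SCALAR, M_PTR_STRUCT, M_STRUCT };

struct member {
	size_t off;			/* byte offset inside the parent */
	size_t size;			/* field size in bytes */
	enum mkind kind;
	const struct member *nested;	/* members of an embedded struct */
	size_t nr_nested;
};

enum walk_result { WALK_ERR = -1, WALK_SCALAR = 0, WALK_PTR = 1 };

/* Classify a load of 'size' bytes at byte offset 'off' of a struct. */
static enum walk_result struct_access(const struct member *m, size_t nr,
				      size_t off, size_t size)
{
	size_t i;

again:
	for (i = 0; i < nr; i++) {
		if (off < m[i].off || off >= m[i].off + m[i].size)
			continue;	/* 'off' is not inside this field */
		if (m[i].kind == M_STRUCT) {
			/* descend into the embedded struct, rebasing 'off' */
			off -= m[i].off;
			nr = m[i].nr_nested;
			m = m[i].nested;
			goto again;
		}
		if (off != m[i].off || size != m[i].size)
			return WALK_ERR;	/* partial or mis-sized access */
		return m[i].kind == M_PTR_STRUCT ? WALK_PTR : WALK_SCALAR;
	}
	return WALK_ERR;	/* no field covers 'off' */
}

/* Hypothetical example layout (offsets assume 8-byte pointers):
 * struct inner { unsigned int a; struct foo *p; };
 * struct outer { struct dev *dev; struct inner in; unsigned int len; };
 */
static const struct member inner_members[] = {
	{ 0, 4, M_SCALAR, NULL, 0 },		/* a */
	{ 8, 8, M_PTR_STRUCT, NULL, 0 },	/* p */
};

static const struct member outer_members[] = {
	{ 0, 8, M_PTR_STRUCT, NULL, 0 },	/* dev */
	{ 8, 16, M_STRUCT, inner_members, 2 },	/* in */
	{ 24, 4, M_SCALAR, NULL, 0 },		/* len */
};
```

With this layout, an 8-byte load at offset 16 lands in the embedded struct,
is rebased to offset 8, and is classified as a pointer (in.p), while a
4-byte load at offset 12 hits padding and is rejected.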

Such verifier analysis prevents "cheating" in a BPF C program.
The program cannot cast an arbitrary pointer to 'struct sk_buff *'
and access it. The C compiler would allow the type cast, of course,
but the verifier will notice the type mismatch based on BPF assembly
and in-kernel BTF.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h          |  15 ++-
 include/linux/bpf_verifier.h |   2 +
 kernel/bpf/btf.c             | 179 +++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c        |  69 +++++++++++++-
 kernel/trace/bpf_trace.c     |   2 +-
 5 files changed, 262 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b9d22338606..2dc3a7c313e9 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -281,6 +281,7 @@ enum bpf_reg_type {
 	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
 	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
 	PTR_TO_XDP_SOCK,	 /* reg points to struct xdp_sock */
+	PTR_TO_BTF_ID,
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -288,7 +289,11 @@ enum bpf_reg_type {
  */
 struct bpf_insn_access_aux {
 	enum bpf_reg_type reg_type;
-	int ctx_field_size;
+	union {
+		int ctx_field_size;
+		u32 btf_id;
+	};
+	struct bpf_verifier_env *env; /* for verbose logs */
 };
 
 static inline void
@@ -747,6 +752,14 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 				     const union bpf_attr *kattr,
 				     union bpf_attr __user *uattr);
+bool btf_ctx_access(int off, int size, enum bpf_access_type type,
+		    const struct bpf_prog *prog,
+		    struct bpf_insn_access_aux *info);
+int btf_struct_access(struct bpf_verifier_env *env,
+		      const struct btf_type *t, int off, int size,
+		      enum bpf_access_type atype,
+		      u32 *next_btf_id);
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 432ba8977a0a..e21782f49c45 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -52,6 +52,8 @@ struct bpf_reg_state {
 		 */
 		struct bpf_map *map_ptr;
 
+		u32 btf_id; /* for PTR_TO_BTF_ID */
+
 		/* Max size from any of the above. */
 		unsigned long raw;
 	};
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 848f9d4b9d7e..61ff8a54ca22 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3433,6 +3433,185 @@ struct btf *btf_parse_vmlinux(void)
 	return ERR_PTR(err);
 }
 
+extern struct btf *btf_vmlinux;
+
+bool btf_ctx_access(int off, int size, enum bpf_access_type type,
+		    const struct bpf_prog *prog,
+		    struct bpf_insn_access_aux *info)
+{
+	u32 btf_id = prog->expected_attach_type;
+	const struct btf_param *args;
+	const struct btf_type *t;
+	const char prefix[] = "btf_trace_";
+	const char *tname;
+	u32 nr_args;
+
+	if (!btf_id)
+		return true;
+
+	if (IS_ERR(btf_vmlinux)) {
+		bpf_verifier_log_write(info->env, "btf_vmlinux is malformed\n");
+		return false;
+	}
+
+	t = btf_type_by_id(btf_vmlinux, btf_id);
+	if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF) {
+		bpf_verifier_log_write(info->env, "btf_id is invalid\n");
+		return false;
+	}
+
+	tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
+	if (strncmp(prefix, tname, sizeof(prefix) - 1)) {
+		bpf_verifier_log_write(info->env,
+				       "btf_id points to wrong type name %s\n",
+				       tname);
+		return false;
+	}
+	tname += sizeof(prefix) - 1;
+
+	t = btf_type_by_id(btf_vmlinux, t->type);
+	if (!btf_type_is_ptr(t))
+		return false;
+	t = btf_type_by_id(btf_vmlinux, t->type);
+	if (!btf_type_is_func_proto(t))
+		return false;
+
+	args = (const struct btf_param *)(t + 1);
+	/* skip first 'void *__data' argument in btf_trace_* */
+	nr_args = btf_type_vlen(t) - 1;
+	if (off >= nr_args * 8) {
+		bpf_verifier_log_write(info->env,
+				       "raw_tp '%s' doesn't have %d-th argument\n",
+				       tname, off / 8);
+		return false;
+	}
+
+	/* raw tp arg is off / 8, but typedef has extra 'void *', hence +1 */
+	t = btf_type_by_id(btf_vmlinux, args[off / 8 + 1].type);
+	if (btf_type_is_int(t))
+		/* accessing a scalar */
+		return true;
+	if (!btf_type_is_ptr(t)) {
+		bpf_verifier_log_write(info->env,
+				       "raw_tp '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
+				       tname, off / 8,
+				       __btf_name_by_offset(btf_vmlinux, t->name_off),
+				       btf_kind_str[BTF_INFO_KIND(t->info)]);
+		return false;
+	}
+	if (t->type == 0)
+		/* This is a pointer to void.
+		 * It is the same as scalar from the verifier safety pov.
+		 * No further pointer walking is allowed.
+		 */
+		return true;
+
+	/* this is a pointer to another type */
+	info->reg_type = PTR_TO_BTF_ID;
+	info->btf_id = t->type;
+
+	t = btf_type_by_id(btf_vmlinux, t->type);
+	bpf_verifier_log_write(info->env,
+			       "raw_tp '%s' arg%d has btf_id %d type %s '%s'\n",
+			       tname, off / 8, info->btf_id,
+			       btf_kind_str[BTF_INFO_KIND(t->info)],
+			       __btf_name_by_offset(btf_vmlinux, t->name_off));
+	return true;
+}
+
+int btf_struct_access(struct bpf_verifier_env *env,
+		      const struct btf_type *t, int off, int size,
+		      enum bpf_access_type atype,
+		      u32 *next_btf_id)
+{
+	const struct btf_member *member;
+	const struct btf_type *mtype;
+	const char *tname, *mname;
+	int i, moff = 0, msize;
+
+again:
+	tname = btf_name_by_offset(btf_vmlinux, t->name_off);
+	if (!btf_type_is_struct(t)) {
+		bpf_verifier_log_write(env, "Type '%s' is not a struct", tname);
+		return -EINVAL;
+	}
+	if (btf_type_vlen(t) < 1) {
+		bpf_verifier_log_write(env, "struct %s doesn't have fields", tname);
+		return -EINVAL;
+	}
+
+	for_each_member(i, t, member) {
+
+		/* offset of the field */
+		moff = btf_member_bit_offset(t, member);
+
+		if (off < moff / 8)
+			continue;
+
+		/* type of the field */
+		mtype = btf_type_by_id(btf_vmlinux, member->type);
+		mname = __btf_name_by_offset(btf_vmlinux, member->name_off);
+
+		/* skip typedef, volatile modifiers */
+		while (btf_type_is_modifier(mtype))
+			mtype = btf_type_by_id(btf_vmlinux, mtype->type);
+
+		if (btf_type_is_array(mtype))
+			/* array deref is not supported yet */
+			continue;
+
+		if (!btf_type_has_size(mtype) && !btf_type_is_ptr(mtype)) {
+			bpf_verifier_log_write(env,
+					       "field %s doesn't have size\n",
+					       mname);
+			return -EFAULT;
+		}
+		if (btf_type_is_ptr(mtype))
+			msize = 8;
+		else
+			msize = mtype->size;
+		if (off >= moff / 8 + msize)
+			/* rare case, must be a field of the union with smaller size,
+			 * let's try another field
+			 */
+			continue;
+		/* the 'off' we're looking for is either equal to start
+		 * of this field or inside of this struct
+		 */
+		if (btf_type_is_struct(mtype)) {
+			/* our field must be inside that union or struct */
+			t = mtype;
+
+			/* adjust offset we're looking for */
+			off -= moff / 8;
+			goto again;
+		}
+		if (msize != size) {
+			/* field access size doesn't match */
+			bpf_verifier_log_write(env,
+					       "cannot access %d bytes in struct %s field %s that has size %d\n",
+					       size, tname, mname, msize);
+			return -EACCES;
+		}
+
+		if (btf_type_is_ptr(mtype)) {
+			const struct btf_type *stype;
+
+			stype = btf_type_by_id(btf_vmlinux, mtype->type);
+			if (btf_type_is_struct(stype)) {
+				*next_btf_id = mtype->type;
+				return PTR_TO_BTF_ID;
+			}
+		}
+		/* all other fields are treated as scalars */
+		return SCALAR_VALUE;
+	}
+	bpf_verifier_log_write(env,
+			       "struct %s doesn't have field at offset %d\n",
+			       tname, off);
+	return -EINVAL;
+}
+
 void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 		       struct seq_file *m)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 91c4db4d1c6a..3c155873ffea 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -406,6 +406,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
 	[PTR_TO_TP_BUFFER]	= "tp_buffer",
 	[PTR_TO_XDP_SOCK]	= "xdp_sock",
+	[PTR_TO_BTF_ID]		= "ptr_",
 };
 
 static char slot_type_char[] = {
@@ -460,6 +461,10 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 			/* reg->off should be 0 for SCALAR_VALUE */
 			verbose(env, "%lld", reg->var_off.value + reg->off);
 		} else {
+			if (t == PTR_TO_BTF_ID)
+				verbose(env, "%s",
+					btf_name_by_offset(btf_vmlinux,
+							   btf_type_by_id(btf_vmlinux, reg->btf_id)->name_off));
 			verbose(env, "(id=%d", reg->id);
 			if (reg_type_may_be_refcounted_or_null(t))
 				verbose(env, ",ref_obj_id=%d", reg->ref_obj_id);
@@ -2337,10 +2342,12 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 
 /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
 static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
-			    enum bpf_access_type t, enum bpf_reg_type *reg_type)
+			    enum bpf_access_type t, enum bpf_reg_type *reg_type,
+			    u32 *btf_id)
 {
 	struct bpf_insn_access_aux info = {
 		.reg_type = *reg_type,
+		.env = env,
 	};
 
 	if (env->ops->is_valid_access &&
@@ -2354,7 +2361,10 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 		 */
 		*reg_type = info.reg_type;
 
-		env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
+		if (*reg_type == PTR_TO_BTF_ID)
+			*btf_id = info.btf_id;
+		else
+			env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
 		/* remember the offset of last byte accessed in ctx */
 		if (env->prog->aux->max_ctx_offset < off + size)
 			env->prog->aux->max_ctx_offset = off + size;
@@ -2745,6 +2755,53 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
 	reg->smax_value = reg->umax_value;
 }
 
+static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
+				   struct bpf_reg_state *regs,
+				   int regno, int off, int size,
+				   enum bpf_access_type atype,
+				   int value_regno)
+{
+	struct bpf_reg_state *reg = regs + regno;
+	const struct btf_type *t = btf_type_by_id(btf_vmlinux, reg->btf_id);
+	const char *tname = btf_name_by_offset(btf_vmlinux, t->name_off);
+	u32 btf_id;
+	int ret;
+
+	if (atype != BPF_READ) {
+		verbose(env, "only read is supported\n");
+		return -EACCES;
+	}
+
+	if (off < 0) {
+		verbose(env,
+			"R%d is ptr_%s negative access %d is not allowed\n",
+			regno, tname, off);
+		return -EACCES;
+	}
+	if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
+		char tn_buf[48];
+
+		tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
+		verbose(env,
+			"R%d is ptr_%s invalid variable offset: off=%d, var_off=%s\n",
+			regno, tname, off, tn_buf);
+		return -EACCES;
+	}
+
+	ret = btf_struct_access(env, t, off, size, atype, &btf_id);
+	if (ret < 0)
+		return ret;
+
+	if (ret == SCALAR_VALUE) {
+		mark_reg_unknown(env, regs, value_regno);
+		return 0;
+	}
+	mark_reg_known_zero(env, regs, value_regno);
+	regs[value_regno].type = PTR_TO_BTF_ID;
+	regs[value_regno].btf_id = btf_id;
+	return 0;
+}
+
 /* check whether memory at (regno + off) is accessible for t = (read | write)
  * if t==write, value_regno is a register which value is stored into memory
  * if t==read, value_regno is a register which will receive the value from memory
@@ -2787,6 +2844,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 
 	} else if (reg->type == PTR_TO_CTX) {
 		enum bpf_reg_type reg_type = SCALAR_VALUE;
+		u32 btf_id = 0;
 
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
@@ -2798,7 +2856,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (err < 0)
 			return err;
 
-		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type);
+		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf_id);
 		if (!err && t == BPF_READ && value_regno >= 0) {
 			/* ctx access returns either a scalar, or a
 			 * PTR_TO_PACKET[_META,_END]. In the latter
@@ -2817,6 +2875,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 				 * a sub-register.
 				 */
 				regs[value_regno].subreg_def = DEF_NOT_SUBREG;
+				if (reg_type == PTR_TO_BTF_ID)
+					regs[value_regno].btf_id = btf_id;
 			}
 			regs[value_regno].type = reg_type;
 		}
@@ -2876,6 +2936,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		err = check_tp_buffer_access(env, reg, regno, off, size);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
+	} else if (reg->type == PTR_TO_BTF_ID) {
+		err = check_ptr_to_btf_access(env, regs, regno, off, size, t,
+					      value_regno);
 	} else {
 		verbose(env, "R%d invalid mem access '%s'\n", regno,
 			reg_type_str[reg->type]);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 44bd08f2443b..6221e8c6ecc3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1074,7 +1074,7 @@ static bool raw_tp_prog_is_valid_access(int off, int size,
 		return false;
 	if (off % size != 0)
 		return false;
-	return true;
+	return btf_ctx_access(off, size, type, prog, info);
 }
 
 const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
-- 
2.20.0


* [PATCH bpf-next 06/10] bpf: add support for BTF pointers to interpreter
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
                   ` (4 preceding siblings ...)
  2019-10-05  5:03 ` [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF Alexei Starovoitov
@ 2019-10-05  5:03 ` Alexei Starovoitov
  2019-10-08  3:08   ` Andrii Nakryiko
  2019-10-05  5:03 ` [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT Alexei Starovoitov
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

A pointer to a BTF object is a pointer to a kernel object or NULL.
Memory accesses through such pointers in the interpreter have to be done
via probe_kernel_read() to avoid page faults.
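
The load semantics can be modeled in user space. This is a minimal sketch, not kernel code: sim_probe_read() is a hypothetical stand-in for probe_kernel_read() that treats a NULL source as a faulting address, and sim_ldx_probe() models one BPF_LDX | BPF_PROBE_MEM load, dst = *(size *)(src + off), zero-extended to 64 bits the way a regular BPF_LDX | BPF_MEM load is.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for probe_kernel_read(): a NULL source models an
 * address that would fault. On failure the destination is zeroed and
 * -EFAULT is returned, so the program sees 0 instead of crashing.
 */
static int sim_probe_read(void *dst, uint32_t size, const void *src)
{
	if (src == NULL) {
		memset(dst, 0, size);
		return -EFAULT;
	}
	memcpy(dst, src, size);
	return 0;
}

/* Model of BPF_LDX | BPF_PROBE_MEM | size: dst = *(size *)(base + off). */
static uint64_t sim_ldx_probe(const void *base, int16_t off, uint32_t size)
{
	/* In this model a NULL BTF pointer faults regardless of the offset. */
	const void *src = base ? (const char *)base + off : NULL;
	uint8_t b;
	uint16_t h;
	uint32_t w;
	uint64_t dw;

	/* The verifier only produces 1, 2, 4 and 8 byte loads. */
	assert(size == 1 || size == 2 || size == 4 || size == 8);

	/* Read into a correctly sized temporary so the result is zero-extended. */
	switch (size) {
	case 1:
		sim_probe_read(&b, 1, src);
		return b;
	case 2:
		sim_probe_read(&h, 2, src);
		return h;
	case 4:
		sim_probe_read(&w, 4, src);
		return w;
	default:
		sim_probe_read(&dw, 8, src);
		return dw;
	}
}
```

A faulting load therefore never traps in this model; it simply yields 0 in the destination register.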

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/filter.h |  3 +++
 kernel/bpf/core.c      | 19 +++++++++++++++++++
 kernel/bpf/verifier.c  |  8 ++++++++
 3 files changed, 30 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index d3d51d7aff2c..22ebea2e64ea 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -65,6 +65,9 @@ struct ctl_table_header;
 /* unused opcode to mark special call to bpf_tail_call() helper */
 #define BPF_TAIL_CALL	0xf0
 
+/* unused opcode to mark special load instruction. Same as BPF_ABS */
+#define BPF_PROBE_MEM	0x20
+
 /* unused opcode to mark call to interpreter with arguments */
 #define BPF_CALL_ARGS	0xe0
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 66088a9e9b9e..8a765bbd33f0 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1291,6 +1291,11 @@ bool bpf_opcode_in_insntable(u8 code)
 }
 
 #ifndef CONFIG_BPF_JIT_ALWAYS_ON
+u64 __weak bpf_probe_read(void * dst, u32 size, const void * unsafe_ptr)
+{
+	memset(dst, 0, size);
+	return -EFAULT;
+}
 /**
  *	__bpf_prog_run - run eBPF program on a given context
  *	@regs: is the array of MAX_BPF_EXT_REG eBPF pseudo-registers
@@ -1310,6 +1315,10 @@ static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u6
 		/* Non-UAPI available opcodes. */
 		[BPF_JMP | BPF_CALL_ARGS] = &&JMP_CALL_ARGS,
 		[BPF_JMP | BPF_TAIL_CALL] = &&JMP_TAIL_CALL,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_B] = &&LDX_PROBE_MEM_B,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_H] = &&LDX_PROBE_MEM_H,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_W] = &&LDX_PROBE_MEM_W,
+		[BPF_LDX | BPF_PROBE_MEM | BPF_DW] = &&LDX_PROBE_MEM_DW,
 	};
 #undef BPF_INSN_3_LBL
 #undef BPF_INSN_2_LBL
@@ -1542,6 +1551,16 @@ static u64 __no_fgcse ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u6
 	LDST(W,  u32)
 	LDST(DW, u64)
 #undef LDST
+#define LDX_PROBE(SIZEOP, SIZE)						\
+	LDX_PROBE_MEM_##SIZEOP:						\
+		bpf_probe_read(&DST, SIZE, (const void *)(long) SRC);	\
+		CONT;
+	LDX_PROBE(B,  1)
+	LDX_PROBE(H,  2)
+	LDX_PROBE(W,  4)
+	LDX_PROBE(DW, 8)
+#undef LDX_PROBE
+
 	STX_XADD_W: /* lock xadd *(u32 *)(dst_reg + off16) += src_reg */
 		atomic_add((u32) SRC, (atomic_t *)(unsigned long)
 			   (DST + insn->off));
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c155873ffea..b81f46371bb9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -7509,6 +7509,7 @@ static bool reg_type_mismatch_ok(enum bpf_reg_type type)
 	case PTR_TO_TCP_SOCK:
 	case PTR_TO_TCP_SOCK_OR_NULL:
 	case PTR_TO_XDP_SOCK:
+	case PTR_TO_BTF_ID:
 		return false;
 	default:
 		return true;
@@ -8650,6 +8651,13 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 		case PTR_TO_XDP_SOCK:
 			convert_ctx_access = bpf_xdp_sock_convert_ctx_access;
 			break;
+		case PTR_TO_BTF_ID:
+			if (type == BPF_WRITE) {
+				verbose(env, "Writes through BTF pointers are not allowed\n");
+				return -EINVAL;
+			}
+			insn->code = BPF_LDX | BPF_PROBE_MEM | BPF_SIZE((insn)->code);
+			continue;
 		default:
 			continue;
 		}
-- 
2.20.0


* [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
                   ` (5 preceding siblings ...)
  2019-10-05  5:03 ` [PATCH bpf-next 06/10] bpf: add support for BTF pointers to interpreter Alexei Starovoitov
@ 2019-10-05  5:03 ` Alexei Starovoitov
  2019-10-05  6:03   ` Eric Dumazet
  2019-10-09 17:38   ` Andrii Nakryiko
  2019-10-05  5:03 ` [PATCH bpf-next 08/10] bpf: check types of arguments passed into helpers Alexei Starovoitov
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

A pointer to a BTF object is a pointer to a kernel object or NULL.
Such pointers can only be used by BPF_LDX instructions.
The verifier changes their opcode from LDX|MEM|size
to LDX|PROBE_MEM|size to make JITing easier.
The number of entries in the extable is the number of BPF_LDX insns
that access kernel memory via a "pointer to BTF type".
Only these load instructions can fault.
Since the x86 extable is relative, it has to be allocated in the same
memory region as the JITed code.
Allocate it prior to the last pass of JITing and let the last pass populate it.
A pointer to the extable in bpf_prog_aux is necessary to make page fault
handling fast.
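
Why the extable has to sit next to the JITed code can be seen in a small user-space model. It assumes a hypothetical simplified entry, 'struct sim_ex_entry', whose fields, like the x86 ones, hold 32-bit offsets relative to the field's own address. A target is only encodable if that offset fits in a signed 32-bit value, which is the same check the JIT performs with is_simm32(). Allocating the table in the same buffer as the code keeps the offsets small; rounding the table start up to the entry alignment is a choice of this sketch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified entry: each field is an offset from its own address. */
struct sim_ex_entry {
	int32_t insn;
	int32_t handler;
};

/* Store target in *field as a self-relative offset; -1 if it is out of s32 range. */
static int sim_set_rel(int32_t *field, const void *target)
{
	int64_t delta = (int64_t)((intptr_t)target - (intptr_t)field);

	if (delta < INT32_MIN || delta > INT32_MAX)
		return -1;
	*field = (int32_t)delta;
	return 0;
}

/* Decode a self-relative offset back into an absolute address. */
static const void *sim_get_rel(const int32_t *field)
{
	return (const char *)field + *field;
}

/*
 * Allocate code and extable in one buffer, with the table right after
 * the code (rounded up to the entry alignment), so offsets stay small.
 */
static unsigned char *sim_alloc_image(size_t proglen, size_t nents,
				      struct sim_ex_entry **extable)
{
	size_t align = alignof(struct sim_ex_entry);
	size_t off = (proglen + align - 1) & ~(align - 1);
	unsigned char *image = calloc(1, off + nents * sizeof(**extable));

	if (image)
		*extable = (struct sim_ex_entry *)(image + off);
	return image;
}
```

Entries point backwards into the code, so their insn offsets are small negative numbers.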
Page fault handling is done in two steps:
1. bpf_prog_kallsyms_find() finds the BPF program that page faulted.
   It does so by walking an rb tree.
2. Then the extable of that bpf program is binary searched.
This process is similar to how faults in kernel modules are handled.
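
The two-step lookup can be sketched as follows. This is a simplified model under stated assumptions: entries hold absolute addresses instead of the self-relative x86 encoding, a linear scan over 'struct sim_prog' stands in for the rb-tree walk, and sim_search_bpf_extables() is a hypothetical counterpart of search_bpf_extables(), not its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified entry with absolute addresses (the real x86 table is relative). */
struct sim_ex {
	uintptr_t insn;   /* address of a load that may fault */
	uintptr_t fixup;  /* data for the exception handler */
};

struct sim_prog {
	uintptr_t start, end;           /* [start, end) of the JITed image */
	const struct sim_ex *extable;   /* sorted by insn */
	size_t num_exentries;
};

/* bsearch() comparator: key is a faulting address, element is an entry. */
static int sim_cmp_ex(const void *key, const void *elt)
{
	uintptr_t addr = *(const uintptr_t *)key;
	uintptr_t insn = ((const struct sim_ex *)elt)->insn;

	return (addr > insn) - (addr < insn);
}

static const struct sim_ex *
sim_search_bpf_extables(const struct sim_prog *progs, size_t n, uintptr_t addr)
{
	for (size_t i = 0; i < n; i++) {
		const struct sim_prog *p = &progs[i];

		/* Step 1: find the program whose image contains addr. */
		if (addr < p->start || addr >= p->end)
			continue;
		if (!p->num_exentries)
			return NULL;
		/* Step 2: binary search that program's sorted extable. */
		return bsearch(&addr, p->extable, p->num_exentries,
			       sizeof(*p->extable), sim_cmp_ex);
	}
	return NULL;
}
```

Only an exact match on a recorded load address yields an entry; any other fault address falls through to the normal oops path.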
The exception handler skips over the faulting x86 instruction and
initializes the destination register with zero. This mimics the exact
behavior of bpf_probe_read() (when probe_kernel_read() faults, dest is zeroed).
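
The fixup encoding can be modeled in user space. The sketch is not kernel code: 'struct sim_pt_regs' is a hypothetical, cut-down register frame, and sim_encode_fixup()/sim_handle_fault() are illustrative names. They follow the layout described in the JIT comment (low 8 bits: length of the faulting x86 instruction, upper bits: offset of the destination register in the frame) and the body of ex_handler_bpf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down register frame; the kernel's struct pt_regs has more fields. */
struct sim_pt_regs {
	unsigned long bx, r13, r14, r15, ax, cx, dx, si, di, r8, ip;
};

/* Pack insn length (low 8 bits) and dest register offset (upper bits). */
static uint32_t sim_encode_fixup(uint32_t insn_len, size_t reg_off)
{
	assert(insn_len <= 0xff);
	return insn_len | (uint32_t)(reg_off << 8);
}

/* Mirror of ex_handler_bpf(): zero the destination, then skip the load. */
static void sim_handle_fault(struct sim_pt_regs *regs, uint32_t fixup)
{
	size_t reg_off = fixup >> 8;

	*(unsigned long *)((char *)regs + reg_off) = 0;
	regs->ip += fixup & 0xff;
}
```

Execution resumes at the instruction after the load with the destination cleared, which is what the bpf program would have seen from a failed bpf_probe_read().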

JITs for other architectures can add support in a similar way.
Until then they will reject the unknown opcode and fall back to the interpreter.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 arch/x86/net/bpf_jit_comp.c | 96 +++++++++++++++++++++++++++++++++++--
 include/linux/bpf.h         |  3 ++
 include/linux/extable.h     | 10 ++++
 kernel/bpf/core.c           | 20 +++++++-
 kernel/bpf/verifier.c       |  1 +
 kernel/extable.c            |  2 +
 6 files changed, 127 insertions(+), 5 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 3ad2ba1ad855..b8549118df04 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -9,7 +9,7 @@
 #include <linux/filter.h>
 #include <linux/if_vlan.h>
 #include <linux/bpf.h>
-
+#include <asm/extable.h>
 #include <asm/set_memory.h>
 #include <asm/nospec-branch.h>
 
@@ -123,6 +123,19 @@ static const int reg2hex[] = {
 	[AUX_REG] = 3,    /* R11 temp register */
 };
 
+static const int reg2pt_regs[] = {
+	[BPF_REG_0] = offsetof(struct pt_regs, ax),
+	[BPF_REG_1] = offsetof(struct pt_regs, di),
+	[BPF_REG_2] = offsetof(struct pt_regs, si),
+	[BPF_REG_3] = offsetof(struct pt_regs, dx),
+	[BPF_REG_4] = offsetof(struct pt_regs, cx),
+	[BPF_REG_5] = offsetof(struct pt_regs, r8),
+	[BPF_REG_6] = offsetof(struct pt_regs, bx),
+	[BPF_REG_7] = offsetof(struct pt_regs, r13),
+	[BPF_REG_8] = offsetof(struct pt_regs, r14),
+	[BPF_REG_9] = offsetof(struct pt_regs, r15),
+};
+
 /*
  * is_ereg() == true if BPF register 'reg' maps to x86-64 r8..r15
  * which need extra byte of encoding.
@@ -377,6 +390,19 @@ static void emit_mov_reg(u8 **pprog, bool is64, u32 dst_reg, u32 src_reg)
 	*pprog = prog;
 }
 
+
+bool ex_handler_bpf(const struct exception_table_entry *x,
+		    struct pt_regs *regs, int trapnr,
+		    unsigned long error_code, unsigned long fault_addr)
+{
+	u32 reg = x->fixup >> 8;
+
+	/* jump over faulting load and clear dest register */
+	*(unsigned long *)((void *)regs + reg) = 0;
+	regs->ip += x->fixup & 0xff;
+	return true;
+}
+
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 		  int oldproglen, struct jit_context *ctx)
 {
@@ -384,7 +410,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 	int insn_cnt = bpf_prog->len;
 	bool seen_exit = false;
 	u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
-	int i, cnt = 0;
+	int i, cnt = 0, excnt = 0;
 	int proglen = 0;
 	u8 *prog = temp;
 
@@ -778,14 +804,17 @@ stx:			if (is_imm8(insn->off))
 
 			/* LDX: dst_reg = *(u8*)(src_reg + off) */
 		case BPF_LDX | BPF_MEM | BPF_B:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_B:
 			/* Emit 'movzx rax, byte ptr [rax + off]' */
 			EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB6);
 			goto ldx;
 		case BPF_LDX | BPF_MEM | BPF_H:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_H:
 			/* Emit 'movzx rax, word ptr [rax + off]' */
 			EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xB7);
 			goto ldx;
 		case BPF_LDX | BPF_MEM | BPF_W:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_W:
 			/* Emit 'mov eax, dword ptr [rax+0x14]' */
 			if (is_ereg(dst_reg) || is_ereg(src_reg))
 				EMIT2(add_2mod(0x40, src_reg, dst_reg), 0x8B);
@@ -793,6 +822,7 @@ stx:			if (is_imm8(insn->off))
 				EMIT1(0x8B);
 			goto ldx;
 		case BPF_LDX | BPF_MEM | BPF_DW:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
 			/* Emit 'mov rax, qword ptr [rax+0x14]' */
 			EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
 ldx:			/*
@@ -805,6 +835,48 @@ stx:			if (is_imm8(insn->off))
 			else
 				EMIT1_off32(add_2reg(0x80, src_reg, dst_reg),
 					    insn->off);
+			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
+				struct exception_table_entry *ex;
+				u8 *_insn = image + proglen;
+				s64 delta;
+
+				if (!bpf_prog->aux->extable)
+					break;
+
+				if (excnt >= bpf_prog->aux->num_exentries) {
+					pr_err("ex gen bug\n");
+					return -EFAULT;
+				}
+				ex = &bpf_prog->aux->extable[excnt++];
+
+				delta = _insn - (u8 *)&ex->insn;
+				if (!is_simm32(delta)) {
+					pr_err("extable->insn doesn't fit into 32-bit\n");
+					return -EFAULT;
+				}
+				ex->insn = delta;
+
+				delta = (u8 *)ex_handler_bpf - (u8 *)&ex->handler;
+				if (!is_simm32(delta)) {
+					pr_err("extable->handler doesn't fit into 32-bit\n");
+					return -EFAULT;
+				}
+				ex->handler = delta;
+
+				if (dst_reg > BPF_REG_9) {
+					pr_err("verifier error\n");
+					return -EFAULT;
+				}
+				/*
+				 * Compute size of x86 insn and its target dest x86 register.
+				 * ex_handler_bpf() will use lower 8 bits to adjust
+				 * pt_regs->ip to jump over this x86 instruction
+				 * and upper bits to figure out which pt_regs to zero out.
+				 * End result: x86 insn "mov rbx, qword ptr [rax+0x14]"
+				 * of 4 bytes will be ignored and rbx will be zero inited.
+				 */
+				ex->fixup = (prog - temp) | (reg2pt_regs[dst_reg] << 8);
+			}
 			break;
 
 			/* STX XADD: lock *(u32*)(dst_reg + off) += src_reg */
@@ -1058,6 +1130,11 @@ xadd:			if (is_imm8(insn->off))
 		addrs[i] = proglen;
 		prog = temp;
 	}
+
+	if (image && excnt != bpf_prog->aux->num_exentries) {
+		pr_err("extable is not populated\n");
+		return -EFAULT;
+	}
 	return proglen;
 }
 
@@ -1158,12 +1235,23 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			break;
 		}
 		if (proglen == oldproglen) {
-			header = bpf_jit_binary_alloc(proglen, &image,
-						      1, jit_fill_hole);
+			/*
+			 * The number of entries in extable is the number of BPF_LDX
+			 * insns that access kernel memory via "pointer to BTF type".
+			 * The verifier changed their opcode from LDX|MEM|size
+			 * to LDX|PROBE_MEM|size to make JITing easier.
+			 */
+			u32 extable_size = prog->aux->num_exentries *
+				sizeof(struct exception_table_entry);
+
+			/* allocate module memory for x86 insns and extable */
+			header = bpf_jit_binary_alloc(proglen + extable_size,
+						      &image, 1, jit_fill_hole);
 			if (!header) {
 				prog = orig_prog;
 				goto out_addrs;
 			}
+			prog->aux->extable = (void *) image + proglen;
 		}
 		oldproglen = proglen;
 		cond_resched();
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2dc3a7c313e9..0bd9e12150ac 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -23,6 +23,7 @@ struct sock;
 struct seq_file;
 struct btf;
 struct btf_type;
+struct exception_table_entry;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -421,6 +422,8 @@ struct bpf_prog_aux {
 	 * main prog always has linfo_idx == 0
 	 */
 	u32 linfo_idx;
+	u32 num_exentries;
+	struct exception_table_entry *extable;
 	struct bpf_prog_stats __percpu *stats;
 	union {
 		struct work_struct work;
diff --git a/include/linux/extable.h b/include/linux/extable.h
index 81ecfaa83ad3..4ab9e78f313b 100644
--- a/include/linux/extable.h
+++ b/include/linux/extable.h
@@ -33,4 +33,14 @@ search_module_extables(unsigned long addr)
 }
 #endif /*CONFIG_MODULES*/
 
+#ifdef CONFIG_BPF_JIT
+const struct exception_table_entry *search_bpf_extables(unsigned long addr);
+#else
+static inline const struct exception_table_entry *
+search_bpf_extables(unsigned long addr)
+{
+	return NULL;
+}
+#endif
+
 #endif /* _LINUX_EXTABLE_H */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8a765bbd33f0..673f5d40a93e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -30,7 +30,7 @@
 #include <linux/kallsyms.h>
 #include <linux/rcupdate.h>
 #include <linux/perf_event.h>
-
+#include <linux/extable.h>
 #include <asm/unaligned.h>
 
 /* Registers */
@@ -712,6 +712,24 @@ bool is_bpf_text_address(unsigned long addr)
 	return ret;
 }
 
+const struct exception_table_entry *search_bpf_extables(unsigned long addr)
+{
+	const struct exception_table_entry *e = NULL;
+	struct bpf_prog *prog;
+
+	rcu_read_lock();
+	prog = bpf_prog_kallsyms_find(addr);
+	if (!prog)
+		goto out;
+	if (!prog->aux->num_exentries)
+		goto out;
+
+	e = search_extable(prog->aux->extable, prog->aux->num_exentries, addr);
+out:
+	rcu_read_unlock();
+	return e;
+}
+
 int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
 		    char *sym)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b81f46371bb9..957ee442f2b4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -8657,6 +8657,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 				return -EINVAL;
 			}
 			insn->code = BPF_LDX | BPF_PROBE_MEM | BPF_SIZE((insn)->code);
+			env->prog->aux->num_exentries++;
 			continue;
 		default:
 			continue;
diff --git a/kernel/extable.c b/kernel/extable.c
index f6c9406eec7d..f6920a11e28a 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -56,6 +56,8 @@ const struct exception_table_entry *search_exception_tables(unsigned long addr)
 	e = search_kernel_exception_table(addr);
 	if (!e)
 		e = search_module_extables(addr);
+	if (!e)
+		e = search_bpf_extables(addr);
 	return e;
 }
 
-- 
2.20.0


* [PATCH bpf-next 08/10] bpf: check types of arguments passed into helpers
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
                   ` (6 preceding siblings ...)
  2019-10-05  5:03 ` [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT Alexei Starovoitov
@ 2019-10-05  5:03 ` Alexei Starovoitov
  2019-10-09 18:01   ` Andrii Nakryiko
  2019-10-05  5:03 ` [PATCH bpf-next 09/10] bpf: disallow bpf_probe_read[_str] helpers Alexei Starovoitov
  2019-10-05  5:03 ` [PATCH bpf-next 10/10] selftests/bpf: add kfree_skb raw_tp test Alexei Starovoitov
  9 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

Introduce a new helper that reuses the existing skb perf_event output
implementation, but can be called from raw_tracepoint programs
that receive 'struct sk_buff *' as a tracepoint argument or
that walk other kernel data structures to reach an skb pointer.

In order to do that, teach the verifier to resolve the true C types
of bpf helpers into in-kernel BTF ids.
The type of a kernel pointer passed by a raw tracepoint into a bpf
program will be tracked by the verifier all the way until
it's passed into a helper function.
For example:
the kfree_skb() kernel function calls trace_kfree_skb(skb, loc);
the bpf program receives that skb pointer and may eventually
pass it into the bpf_skb_output() bpf helper, which in the kernel is
implemented via the bpf_skb_event_output() kernel function.
Its first argument in the kernel is 'struct sk_buff *'.
The verifier makes sure that the types match all the way.
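
The resolve-then-compare flow can be sketched with a toy type table. Everything here is hypothetical: 'sim_btf' stands in for vmlinux BTF, the ids are made up, and "btf_bpf_other_helper" is an invented entry. The sketch only mirrors the idea used by btf_resolve_helper_id(), which looks for a typedef named "btf_" followed by the helper's symbol name, and the exact-match check that check_func_arg() applies to ARG_PTR_TO_BTF_ID.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature BTF: typedef name -> BTF id of the struct that
 * the helper's first (pointer) argument points to. Ids are made up. */
struct sim_typedef {
	const char *name;
	unsigned int arg0_struct_id;
};

static const struct sim_typedef sim_btf[] = {
	{ "btf_bpf_skb_event_output", 101 },	/* 'struct sk_buff *' */
	{ "btf_bpf_other_helper", 202 },	/* hypothetical helper */
};

/* Resolve arg0's struct id via the "btf_<symbol>" typedef; 0 if absent. */
static unsigned int sim_resolve_helper_arg0(const char *sym)
{
	char want[128];

	snprintf(want, sizeof(want), "btf_%s", sym);
	for (size_t i = 0; i < sizeof(sim_btf) / sizeof(sim_btf[0]); i++)
		if (strcmp(sim_btf[i].name, want) == 0)
			return sim_btf[i].arg0_struct_id;
	return 0;
}

/* Verifier-style check: the register's tracked type must match exactly. */
static int sim_check_btf_arg(unsigned int reg_btf_id, const char *sym)
{
	unsigned int want = sim_resolve_helper_arg0(sym);

	if (want == 0 || reg_btf_id != want)
		return -EACCES;
	return 0;
}
```

In the patch the resolved id is cached in fn->btf_id[0], so the lookup runs only once per helper.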

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf.h                       |  3 +
 include/uapi/linux/bpf.h                  |  3 +-
 kernel/bpf/btf.c                          | 73 +++++++++++++++++++++++
 kernel/bpf/verifier.c                     | 29 +++++++++
 kernel/trace/bpf_trace.c                  |  4 ++
 net/core/filter.c                         | 15 ++++-
 tools/include/uapi/linux/bpf.h            |  3 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  4 ++
 8 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0bd9e12150ac..f1690e233e51 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -212,6 +212,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_INT,		/* pointer to int */
 	ARG_PTR_TO_LONG,	/* pointer to long */
 	ARG_PTR_TO_SOCKET,	/* pointer to bpf_sock (fullsock) */
+	ARG_PTR_TO_BTF_ID,	/* pointer to in-kernel struct */
 };
 
 /* type of values returned from helper functions */
@@ -239,6 +240,7 @@ struct bpf_func_proto {
 	enum bpf_arg_type arg3_type;
 	enum bpf_arg_type arg4_type;
 	enum bpf_arg_type arg5_type;
+	u32 *btf_id; /* BTF ids of arguments */
 };
 
 /* bpf_context is intentionally undefined structure. Pointer to bpf_context is
@@ -762,6 +764,7 @@ int btf_struct_access(struct bpf_verifier_env *env,
 		      const struct btf_type *t, int off, int size,
 		      enum bpf_access_type atype,
 		      u32 *next_btf_id);
+u32 btf_resolve_helper_id(struct bpf_verifier_env *env, void *, int);
 
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 77c6be96d676..3752de7ae50e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2862,7 +2862,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(skb_output),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 61ff8a54ca22..5d516a817d1c 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3612,6 +3612,79 @@ int btf_struct_access(struct bpf_verifier_env *env,
 	return -EINVAL;
 }
 
+u32 btf_resolve_helper_id(struct bpf_verifier_env *env, void *fn, int arg)
+{
+	char fnname[KSYM_SYMBOL_LEN + 4] = "btf_";
+	const struct btf_param *args;
+	const struct btf_type *t;
+	const char *tname, *sym;
+	u32 btf_id, i;
+
+	if (IS_ERR(btf_vmlinux)) {
+		bpf_verifier_log_write(env, "btf_vmlinux is malformed\n");
+		return -EINVAL;
+	}
+
+	sym = kallsyms_lookup((long)fn, NULL, NULL, NULL, fnname + 4);
+	if (!sym) {
+		bpf_verifier_log_write(env, "kernel doesn't have kallsyms\n");
+		return -EFAULT;
+	}
+
+	for (i = 1; i <= btf_vmlinux->nr_types; i++) {
+		t = btf_type_by_id(btf_vmlinux, i);
+		if (BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF)
+			continue;
+		tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
+		if (!strcmp(tname, fnname))
+			break;
+	}
+	if (i > btf_vmlinux->nr_types) {
+		bpf_verifier_log_write(env,
+				       "helper %s type is not found\n",
+				       fnname);
+		return -ENOENT;
+	}
+
+	t = btf_type_by_id(btf_vmlinux, t->type);
+	if (!btf_type_is_ptr(t))
+		return -EFAULT;
+	t = btf_type_by_id(btf_vmlinux, t->type);
+	if (!btf_type_is_func_proto(t))
+		return -EFAULT;
+
+	args = (const struct btf_param *)(t + 1);
+	if (arg >= btf_type_vlen(t)) {
+		bpf_verifier_log_write(env,
+				       "bpf helper '%s' doesn't have %d-th argument\n",
+				       fnname, arg);
+		return -EINVAL;
+	}
+
+	t = btf_type_by_id(btf_vmlinux, args[arg].type);
+	if (!btf_type_is_ptr(t) || !t->type) {
+		/* anything but the pointer to struct is a helper config bug */
+		bpf_verifier_log_write(env,
+				       "ARG_PTR_TO_BTF is misconfigured\n");
+
+		return -EFAULT;
+	}
+	btf_id = t->type;
+
+	t = btf_type_by_id(btf_vmlinux, t->type);
+	if (!btf_type_is_struct(t)) {
+		bpf_verifier_log_write(env,
+				       "ARG_PTR_TO_BTF is not a struct\n");
+
+		return -EFAULT;
+	}
+	bpf_verifier_log_write(env,
+			       "helper '%s' arg%d has btf_id %d struct '%s'\n",
+			       fnname + 4, arg, btf_id,
+			       __btf_name_by_offset(btf_vmlinux, t->name_off));
+	return btf_id;
+}
+
 void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
 		       struct seq_file *m)
 {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 957ee442f2b4..0717aacb7801 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -205,6 +205,7 @@ struct bpf_call_arg_meta {
 	u64 msize_umax_value;
 	int ref_obj_id;
 	int func_id;
+	u32 btf_id;
 };
 
 struct btf *btf_vmlinux;
@@ -3367,6 +3368,27 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 		expected_type = PTR_TO_SOCKET;
 		if (type != expected_type)
 			goto err_type;
+	} else if (arg_type == ARG_PTR_TO_BTF_ID) {
+		expected_type = PTR_TO_BTF_ID;
+		if (type != expected_type)
+			goto err_type;
+		if (reg->btf_id != meta->btf_id) {
+			verbose(env, "Helper has type %s got %s in R%d\n",
+				btf_name_by_offset(btf_vmlinux,
+						   btf_type_by_id(btf_vmlinux,
+								  meta->btf_id)->name_off),
+				btf_name_by_offset(btf_vmlinux,
+						   btf_type_by_id(btf_vmlinux,
+								  reg->btf_id)->name_off),
+				regno);
+
+			return -EACCES;
+		}
+		if (!tnum_is_const(reg->var_off) || reg->var_off.value || reg->off) {
+			verbose(env, "R%d is a pointer to in-kernel struct with non-zero offset\n",
+				regno);
+			return -EACCES;
+		}
 	} else if (arg_type == ARG_PTR_TO_SPIN_LOCK) {
 		if (meta->func_id == BPF_FUNC_spin_lock) {
 			if (process_spin_lock(env, regno, true))
@@ -3514,6 +3536,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_MAP_TYPE_PERF_EVENT_ARRAY:
 		if (func_id != BPF_FUNC_perf_event_read &&
 		    func_id != BPF_FUNC_perf_event_output &&
+		    func_id != BPF_FUNC_skb_output &&
 		    func_id != BPF_FUNC_perf_event_read_value)
 			goto error;
 		break;
@@ -3601,6 +3624,7 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 	case BPF_FUNC_perf_event_read:
 	case BPF_FUNC_perf_event_output:
 	case BPF_FUNC_perf_event_read_value:
+	case BPF_FUNC_skb_output:
 		if (map->map_type != BPF_MAP_TYPE_PERF_EVENT_ARRAY)
 			goto error;
 		break;
@@ -4053,6 +4077,11 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
 		return err;
 	}
 
+	if (fn->arg1_type == ARG_PTR_TO_BTF_ID) {
+		if (!fn->btf_id[0])
+			fn->btf_id[0] = btf_resolve_helper_id(env, fn->func, 0);
+		meta.btf_id = fn->btf_id[0];
+	}
 	meta.func_id = func_id;
 	/* check args */
 	err = check_func_arg(env, BPF_REG_1, fn->arg1_type, &meta);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 6221e8c6ecc3..52f7e9d8c29b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -995,6 +995,8 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
 	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
 };
 
+extern const struct bpf_func_proto bpf_skb_output_proto;
+
 BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
 	   struct bpf_map *, map, u64, flags)
 {
@@ -1053,6 +1055,8 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
 		return &bpf_perf_event_output_proto_raw_tp;
+	case BPF_FUNC_skb_output:
+		return &bpf_skb_output_proto;
 	case BPF_FUNC_get_stackid:
 		return &bpf_get_stackid_proto_raw_tp;
 	case BPF_FUNC_get_stack:
diff --git a/net/core/filter.c b/net/core/filter.c
index ed6563622ce3..c48fe0971b25 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3798,7 +3798,7 @@ BPF_CALL_5(bpf_skb_event_output, struct sk_buff *, skb, struct bpf_map *, map,
 
 	if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK)))
 		return -EINVAL;
-	if (unlikely(skb_size > skb->len))
+	if (unlikely(!skb || skb_size > skb->len))
 		return -EFAULT;
 
 	return bpf_event_output(map, flags, meta, meta_size, skb, skb_size,
@@ -3816,6 +3816,19 @@ static const struct bpf_func_proto bpf_skb_event_output_proto = {
 	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
 };
 
+static u32 bpf_skb_output_btf_ids[5];
+const struct bpf_func_proto bpf_skb_output_proto = {
+	.func		= bpf_skb_event_output,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_ANYTHING,
+	.arg4_type	= ARG_PTR_TO_MEM,
+	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
+	.btf_id		= bpf_skb_output_btf_ids,
+};
+
 static unsigned short bpf_tunnel_key_af(u64 flags)
 {
 	return flags & BPF_F_TUNINFO_IPV6 ? AF_INET6 : AF_INET;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 77c6be96d676..3752de7ae50e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2862,7 +2862,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(skb_output),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 54a50699bbfd..c5e05d1a806f 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -65,6 +65,10 @@ static int (*bpf_perf_event_output)(void *ctx, void *map,
 				    unsigned long long flags, void *data,
 				    int size) =
 	(void *) BPF_FUNC_perf_event_output;
+static int (*bpf_skb_output)(void *ctx, void *map,
+			     unsigned long long flags, void *data,
+			     int size) =
+	(void *) BPF_FUNC_skb_output;
 static int (*bpf_get_stackid)(void *ctx, void *map, int flags) =
 	(void *) BPF_FUNC_get_stackid;
 static int (*bpf_probe_write_user)(void *dst, const void *src, int size) =
-- 
2.20.0


* [PATCH bpf-next 09/10] bpf: disallow bpf_probe_read[_str] helpers
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
                   ` (7 preceding siblings ...)
  2019-10-05  5:03 ` [PATCH bpf-next 08/10] bpf: check types of arguments passed into helpers Alexei Starovoitov
@ 2019-10-05  5:03 ` Alexei Starovoitov
  2019-10-09  5:29   ` Andrii Nakryiko
  2019-10-05  5:03 ` [PATCH bpf-next 10/10] selftests/bpf: add kfree_skb raw_tp test Alexei Starovoitov
  9 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

Disallow the bpf_probe_read() and bpf_probe_read_str() helpers in
raw_tracepoint bpf programs that use in-kernel BTF to track the
types of memory accesses.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 kernel/trace/bpf_trace.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 52f7e9d8c29b..7c607f79f1bb 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -700,6 +700,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_map_peek_elem:
 		return &bpf_map_peek_elem_proto;
 	case BPF_FUNC_probe_read:
+		if (prog->expected_attach_type)
+			return NULL;
 		return &bpf_probe_read_proto;
 	case BPF_FUNC_ktime_get_ns:
 		return &bpf_ktime_get_ns_proto;
@@ -728,6 +730,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_probe_read_str:
+		if (prog->expected_attach_type)
+			return NULL;
 		return &bpf_probe_read_str_proto;
 #ifdef CONFIG_CGROUPS
 	case BPF_FUNC_get_current_cgroup_id:
-- 
2.20.0


* [PATCH bpf-next 10/10] selftests/bpf: add kfree_skb raw_tp test
  2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
                   ` (8 preceding siblings ...)
  2019-10-05  5:03 ` [PATCH bpf-next 09/10] bpf: disallow bpf_probe_read[_str] helpers Alexei Starovoitov
@ 2019-10-05  5:03 ` Alexei Starovoitov
  2019-10-09  5:36   ` Andrii Nakryiko
  9 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-05  5:03 UTC (permalink / raw)
  To: davem; +Cc: daniel, x86, netdev, bpf, kernel-team

Load a basic cls_bpf program.
Load a raw_tracepoint program and attach it to the kfree_skb raw tracepoint.
Trigger cls_bpf via prog_test_run.
At the end of test_run the kernel calls kfree_skb(), which fires the
trace_kfree_skb tracepoint, which in turn runs our raw_tracepoint program.
That program takes the skb and dumps it into the perf ring buffer.
Check that user space received the correct packet.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 .../selftests/bpf/prog_tests/kfree_skb.c      | 90 +++++++++++++++++++
 tools/testing/selftests/bpf/progs/kfree_skb.c | 76 ++++++++++++++++
 2 files changed, 166 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/kfree_skb.c
 create mode 100644 tools/testing/selftests/bpf/progs/kfree_skb.c

diff --git a/tools/testing/selftests/bpf/prog_tests/kfree_skb.c b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
new file mode 100644
index 000000000000..238bc7024b36
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+static void on_sample(void *ctx, int cpu, void *data, __u32 size)
+{
+	int ifindex = *(int *)data, duration = 0;
+	struct ipv6_packet *pkt_v6 = data + 4;
+
+	if (ifindex != 1)
+		/* spurious kfree_skb not on loopback device */
+		return;
+	if (CHECK(size != 76, "check_size", "size %d != 76\n", size))
+		return;
+	if (CHECK(pkt_v6->eth.h_proto != 0xdd86, "check_eth",
+		  "h_proto %x\n", pkt_v6->eth.h_proto))
+		return;
+	if (CHECK(pkt_v6->iph.nexthdr != 6, "check_ip",
+		  "iph.nexthdr %x\n", pkt_v6->iph.nexthdr))
+		return;
+	if (CHECK(pkt_v6->tcp.doff != 5, "check_tcp",
+		  "tcp.doff %x\n", pkt_v6->tcp.doff))
+		return;
+
+	*(bool *)ctx = true;
+}
+
+void test_kfree_skb(void)
+{
+	struct bpf_prog_load_attr attr = {
+		.file = "./kfree_skb.o",
+		.log_level = 2,
+	};
+
+	struct bpf_object *obj, *obj2 = NULL;
+	struct perf_buffer_opts pb_opts = {};
+	struct perf_buffer *pb = NULL;
+	struct bpf_link *link = NULL;
+	struct bpf_map *perf_buf_map;
+	struct bpf_program *prog;
+	__u32 duration = 0, retval;
+	int err, pkt_fd, kfree_skb_fd;
+	bool passed = false;
+
+	err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS, &obj, &pkt_fd);
+	if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
+		return;
+
+	err = bpf_prog_load_xattr(&attr, &obj2, &kfree_skb_fd);
+	if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno))
+		goto close_prog;
+
+	prog = bpf_object__find_program_by_title(obj2, "raw_tracepoint/kfree_skb");
+	if (CHECK(!prog, "find_prog", "prog kfree_skb not found\n"))
+		goto close_prog;
+	link = bpf_program__attach_raw_tracepoint(prog, "kfree_skb");
+	if (CHECK(IS_ERR(link), "attach_raw_tp", "err %ld\n", PTR_ERR(link)))
+		goto close_prog;
+
+	perf_buf_map = bpf_object__find_map_by_name(obj2, "perf_buf_map");
+	if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
+		goto close_prog;
+
+	/* set up perf buffer */
+	pb_opts.sample_cb = on_sample;
+	pb_opts.ctx = &passed;
+	pb = perf_buffer__new(bpf_map__fd(perf_buf_map), 1, &pb_opts);
+	if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
+		goto close_prog;
+
+	err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
+				NULL, NULL, &retval, &duration);
+	CHECK(err || retval, "ipv6",
+	      "err %d errno %d retval %d duration %d\n",
+	      err, errno, retval, duration);
+
+	/* read perf buffer */
+	err = perf_buffer__poll(pb, 100);
+	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
+		goto close_prog;
+	/* make sure kfree_skb program was triggered
+	 * and it sent expected skb into ring buffer
+	 */
+	CHECK_FAIL(!passed);
+close_prog:
+	perf_buffer__free(pb);
+	if (!IS_ERR_OR_NULL(link))
+		bpf_link__destroy(link);
+	bpf_object__close(obj);
+	bpf_object__close(obj2);
+}
diff --git a/tools/testing/selftests/bpf/progs/kfree_skb.c b/tools/testing/selftests/bpf/progs/kfree_skb.c
new file mode 100644
index 000000000000..61f1abfc4f48
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/kfree_skb.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+struct {
+	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+	__uint(key_size, sizeof(int));
+	__uint(value_size, sizeof(int));
+} perf_buf_map SEC(".maps");
+
+#define _(P) (__builtin_preserve_access_index(P))
+
+/* define a few structs that the bpf program needs to access */
+struct callback_head {
+	struct callback_head *next;
+	void (*func)(struct callback_head *head);
+};
+struct dev_ifalias {
+	struct callback_head rcuhead;
+};
+
+struct net_device /* same as kernel's struct net_device */ {
+	int ifindex;
+	volatile struct dev_ifalias *ifalias;
+};
+
+struct sk_buff {
+	/* field names and sizes should match those in the kernel */
+        unsigned int            len,
+                                data_len;
+        __u16                   mac_len,
+                                hdr_len;
+        __u16                   queue_mapping;
+	struct net_device *dev;
+	/* order of the fields doesn't matter */
+};
+
+/* copy arguments from
+ * include/trace/events/skb.h:
+ * TRACE_EVENT(kfree_skb,
+ *         TP_PROTO(struct sk_buff *skb, void *location),
+ *
+ * into struct below:
+ */
+struct trace_kfree_skb {
+	struct sk_buff *skb;
+	void *location;
+};
+
+SEC("raw_tracepoint/kfree_skb")
+int trace_kfree_skb(struct trace_kfree_skb *ctx)
+{
+	struct sk_buff *skb = ctx->skb;
+	struct net_device *dev;
+	int ifindex;
+	struct callback_head *ptr;
+	void *func;
+	__builtin_preserve_access_index(({
+		dev = skb->dev;
+		ifindex = dev->ifindex;
+		ptr = dev->ifalias->rcuhead.next;
+		func = ptr->func;
+	}));
+
+	bpf_printk("rcuhead.next %llx func %llx\n", ptr, func);
+	bpf_printk("skb->len %d\n", _(skb->len));
+	bpf_printk("skb->queue_mapping %d\n", _(skb->queue_mapping));
+	bpf_printk("dev->ifindex %d\n", ifindex);
+
+	/* send the first 72 bytes of the packet to user space */
+	bpf_skb_output(skb, &perf_buf_map, (72ull << 32) | BPF_F_CURRENT_CPU,
+		       &ifindex, sizeof(ifindex));
+	return 0;
+}
-- 
2.20.0
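The flags argument in the selftest packs two fields into one u64. A sketch of that arithmetic, assuming the uapi values BPF_F_INDEX_MASK = 0xffffffff, BPF_F_CURRENT_CPU = BPF_F_INDEX_MASK and BPF_F_CTXLEN_MASK = 0xfffff << 32 (copied under hypothetical DEMO_ names so it builds without kernel headers). The upper bits request 72 bytes of the skb; with the 4-byte ifindex passed as meta data in front of them, the user-space callback's `size != 76` check follows.

```c
#include <assert.h>
#include <stdint.h>

/* Copies of the uapi flag values, renamed to avoid clashing with
 * the real headers. */
#define DEMO_BPF_F_INDEX_MASK  0xffffffffULL
#define DEMO_BPF_F_CURRENT_CPU DEMO_BPF_F_INDEX_MASK
#define DEMO_BPF_F_CTXLEN_MASK (0xfffffULL << 32)

/* Build the flags the way the selftest does:
 * (bytes << 32) | BPF_F_CURRENT_CPU. */
static uint64_t make_output_flags(uint32_t pkt_bytes)
{
	assert(pkt_bytes <= 0xfffff); /* must fit the 20-bit ctxlen field */
	return ((uint64_t)pkt_bytes << 32) | DEMO_BPF_F_CURRENT_CPU;
}

/* Number of skb bytes requested. */
static uint32_t flags_ctxlen(uint64_t flags)
{
	return (uint32_t)((flags & DEMO_BPF_F_CTXLEN_MASK) >> 32);
}

/* Perf buffer index (here: "current CPU"). */
static uint64_t flags_index(uint64_t flags)
{
	return flags & DEMO_BPF_F_INDEX_MASK;
}

/* The selftest reads ifindex at offset 0 and the packet at offset 4,
 * i.e. meta data first, then the requested skb bytes. */
static uint32_t expected_sample_size(uint64_t flags, uint32_t meta_size)
{
	return meta_size + flags_ctxlen(flags);
}
```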


* Re: [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT
  2019-10-05  5:03 ` [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT Alexei Starovoitov
@ 2019-10-05  6:03   ` Eric Dumazet
  2019-10-09 17:38   ` Andrii Nakryiko
  1 sibling, 0 replies; 39+ messages in thread
From: Eric Dumazet @ 2019-10-05  6:03 UTC (permalink / raw)
  To: Alexei Starovoitov, davem; +Cc: daniel, x86, netdev, bpf, kernel-team



On 10/4/19 10:03 PM, Alexei Starovoitov wrote:
> Pointer to BTF object is a pointer to kernel object or NULL.
> Such pointers can only be used by BPF_LDX instructions.
> The verifier changed their opcode from LDX|MEM|size
> to LDX|PROBE_MEM|size to make JITing easier.
> The number of entries in extable is the number of BPF_LDX insns
> that access kernel memory via "pointer to BTF type".

...

>  		}
>  		if (proglen == oldproglen) {
> -			header = bpf_jit_binary_alloc(proglen, &image,
> -						      1, jit_fill_hole);
> +			/*
> +			 * The number of entries in extable is the number of BPF_LDX
> +			 * insns that access kernel memory via "pointer to BTF type".
> +			 * The verifier changed their opcode from LDX|MEM|size
> +			 * to LDX|PROBE_MEM|size to make JITing easier.
> +			 */
> +			u32 extable_size = prog->aux->num_exentries *
> +				sizeof(struct exception_table_entry);
> +
> +			/* allocate module memory for x86 insns and extable */
> +			header = bpf_jit_binary_alloc(proglen + extable_size,
> +						      &image, 1, jit_fill_hole);
>  			if (!header) {
>  				prog = orig_prog;
>  				goto out_addrs;
>  			}
> +			prog->aux->extable = (void *) image + proglen;

You might want to align ->extable to __alignof__(struct exception_table_entry) (4 bytes currently)
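The suggestion amounts to rounding proglen up before placing the table, and counting that padding in the allocation. A sketch under assumptions: demo_extable_entry is a hypothetical mirror of x86's three-int entry (alignment 4), and align_up() plays the role of the kernel's ALIGN() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of x86's struct exception_table_entry:
 * relative offsets stored as ints, so it has int alignment. */
struct demo_extable_entry {
	int insn, fixup, handler;
};

/* Round x up to a multiple of a (a must be a power of two). */
static size_t align_up(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Where the extable starts inside the JIT image: right after the
 * x86 instructions, padded so the entries are naturally aligned. */
static size_t extable_offset(size_t proglen)
{
	return align_up(proglen, _Alignof(struct demo_extable_entry));
}

/* Total allocation: code, alignment padding, then the entries. */
static size_t image_size(size_t proglen, unsigned int num_exentries)
{
	return extable_offset(proglen) +
	       num_exentries * sizeof(struct demo_extable_entry);
}
```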


* Re: [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation
  2019-10-05  5:03 ` [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation Alexei Starovoitov
@ 2019-10-05 18:40   ` Andrii Nakryiko
  2019-10-06  3:58   ` John Fastabend
  1 sibling, 0 replies; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-05 18:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:05 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> When pahole converts dwarf to btf it emits only used types.
> Wrap existing __bpf_trace_##template() function into
> btf_trace_##template typedef and use it in type cast to
> make gcc emit this type into dwarf. Then pahole will convert it to btf.
> The "btf_trace_" prefix will be used to identify BTF enabled raw tracepoints.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  include/trace/bpf_probe.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> index d6e556c0a085..ff1a879773df 100644
> --- a/include/trace/bpf_probe.h
> +++ b/include/trace/bpf_probe.h
> @@ -74,11 +74,12 @@ static inline void bpf_test_probe_##call(void)                              \
>  {                                                                      \
>         check_trace_callback_type_##call(__bpf_trace_##template);       \
>  }                                                                      \
> +typedef void (*btf_trace_##template)(void *__data, proto);             \
>  static struct bpf_raw_event_map        __used                                  \
>         __attribute__((section("__bpf_raw_tp_map")))                    \
>  __bpf_trace_tp_map_##call = {                                          \
>         .tp             = &__tracepoint_##call,                         \
> -       .bpf_func       = (void *)__bpf_trace_##template,               \
> +       .bpf_func       = (void *)(btf_trace_##template)__bpf_trace_##template, \
>         .num_args       = COUNT_ARGS(args),                             \
>         .writable_size  = size,                                         \
>  };
> --
> 2.20.0
>
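A stand-alone illustration of the typedef-and-cast trick, with hypothetical names (demo_bpf_trace, btf_trace_demo, demo_raw_event_map). Per the commit message, referencing the typedef in a cast is what makes gcc emit the type into DWARF when building with debug info, so pahole can carry it into BTF. The stored function address does not change.

```c
#include <assert.h>
#include <stddef.h>

/* Tracepoint-style callback: private data first, then the
 * tracepoint's own arguments. */
static void demo_bpf_trace(void *data, int len, const char *name)
{
	(void)data;
	(void)len;
	(void)name;
}

/* Describes the callback's type. Used only in the cast below, which
 * is enough to keep the type in debug info. */
typedef void (*btf_trace_demo)(void *data, int len, const char *name);

struct demo_raw_event_map {
	void *bpf_func;
	int num_args;
};

/* Before: .bpf_func = (void *)demo_bpf_trace
 * After:  cast through the typedef first; same address either way. */
static struct demo_raw_event_map demo_map = {
	.bpf_func = (void *)(btf_trace_demo)demo_bpf_trace,
	.num_args = 2,
};
```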

* Re: [PATCH bpf-next 02/10] bpf: add typecast to bpf helpers to help BTF generation
  2019-10-05  5:03 ` [PATCH bpf-next 02/10] bpf: add typecast to bpf helpers " Alexei Starovoitov
@ 2019-10-05 18:41   ` Andrii Nakryiko
  2019-10-06  4:00   ` John Fastabend
  1 sibling, 0 replies; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-05 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:05 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> When pahole converts dwarf to btf it emits only used types.
> Wrap existing bpf helper functions into typedef and use it in
> typecast to make gcc emit this type into dwarf.
> Then pahole will convert it to btf.
> The "btf_#name_of_helper" types will be used to figure out
> types of arguments of bpf helpers.
> The generated code before and after is the same.
> Only dwarf and btf are different.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

It's amazing this works :)

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  include/linux/filter.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 2ce57645f3cd..d3d51d7aff2c 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -464,10 +464,11 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
>  #define BPF_CALL_x(x, name, ...)                                              \
>         static __always_inline                                                 \
>         u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__));   \
> +       typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
>         u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__));         \
>         u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__))          \
>         {                                                                      \
> -               return ____##name(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
> +               return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
>         }                                                                      \
>         static __always_inline                                                 \
>         u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))
> --
> 2.20.0
>
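A cut-down, two-argument version of the BPF_CALL_x() change, for illustration. The DEMO_BPF_CALL_2 macro and the demo_add helper are hypothetical. The inner function is named name##_typed rather than the kernel's ____##name so that it stays out of the reserved-identifier namespace. As with the real macro, the cast targets the function's own type, so the call itself should compile to the same code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Outer function takes raw u64 "registers"; the inner name##_typed
 * takes typed arguments. btf_##name records the typed signature, and
 * calling through a cast to it keeps that type in debug info. */
#define DEMO_BPF_CALL_2(name, t1, a1, t2, a2)                    \
	static inline u64 name##_typed(t1 a1, t2 a2);            \
	typedef u64 (*btf_##name)(t1 a1, t2 a2);                 \
	u64 name(u64 r1, u64 r2);                                \
	u64 name(u64 r1, u64 r2)                                 \
	{                                                        \
		return ((btf_##name)name##_typed)((t1)r1, (t2)r2); \
	}                                                        \
	static inline u64 name##_typed(t1 a1, t2 a2)

/* Example helper: the body is written against typed arguments. */
DEMO_BPF_CALL_2(demo_add, int, x, int, y)
{
	return (u64)(x + y);
}
```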

* RE: [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation
  2019-10-05  5:03 ` [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation Alexei Starovoitov
  2019-10-05 18:40   ` Andrii Nakryiko
@ 2019-10-06  3:58   ` John Fastabend
  1 sibling, 0 replies; 39+ messages in thread
From: John Fastabend @ 2019-10-06  3:58 UTC (permalink / raw)
  To: Alexei Starovoitov, davem; +Cc: daniel, x86, netdev, bpf, kernel-team

Alexei Starovoitov wrote:
> When pahole converts dwarf to btf it emits only used types.
> Wrap existing __bpf_trace_##template() function into
> btf_trace_##template typedef and use it in type cast to
> make gcc emit this type into dwarf. Then pahole will convert it to btf.
> The "btf_trace_" prefix will be used to identify BTF enabled raw tracepoints.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

FWIW I also have some cases where pahole gets padding wrong when
converting dwarf to btf on older kernels. I'll try to get some
more details and either fix them or send useful bug reports next week.
For now I work around them with some code on my side, but they can
confuse tracing programs.

Acked-by: John Fastabend <john.fastabend@gmail.com>

* RE: [PATCH bpf-next 02/10] bpf: add typecast to bpf helpers to help BTF generation
  2019-10-05  5:03 ` [PATCH bpf-next 02/10] bpf: add typecast to bpf helpers " Alexei Starovoitov
  2019-10-05 18:41   ` Andrii Nakryiko
@ 2019-10-06  4:00   ` John Fastabend
  1 sibling, 0 replies; 39+ messages in thread
From: John Fastabend @ 2019-10-06  4:00 UTC (permalink / raw)
  To: Alexei Starovoitov, davem; +Cc: daniel, x86, netdev, bpf, kernel-team

Alexei Starovoitov wrote:
> When pahole converts dwarf to btf it emits only used types.
> Wrap existing bpf helper functions into typedef and use it in
> typecast to make gcc emit this type into dwarf.
> Then pahole will convert it to btf.
> The "btf_#name_of_helper" types will be used to figure out
> types of arguments of bpf helpers.
> The generated code before and after is the same.
> Only dwarf and btf are different.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/filter.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 2ce57645f3cd..d3d51d7aff2c 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -464,10 +464,11 @@ static inline bool insn_is_zext(const struct bpf_insn *insn)
>  #define BPF_CALL_x(x, name, ...)					       \
>  	static __always_inline						       \
>  	u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__));   \
> +	typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \
>  	u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__));	       \
>  	u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__))	       \
>  	{								       \
> -		return ____##name(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
> +		return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\
>  	}								       \
>  	static __always_inline						       \
>  	u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__))
> -- 
> 2.20.0
> 

Acked-by: John Fastabend <john.fastabend@gmail.com>

* Re: [PATCH bpf-next 03/10] bpf: process in-kernel BTF
  2019-10-05  5:03 ` [PATCH bpf-next 03/10] bpf: process in-kernel BTF Alexei Starovoitov
@ 2019-10-06  6:36   ` Andrii Nakryiko
  2019-10-06 23:49     ` Alexei Starovoitov
  2019-10-09 20:51   ` Martin Lau
  1 sibling, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-06  6:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:08 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> If in-kernel BTF exists parse it and prepare 'struct btf *btf_vmlinux'
> for further use by the verifier.
> In-kernel BTF is trusted just like kallsyms and other build artifacts
> embedded into vmlinux.
> Yet run this BTF image through BTF verifier to make sure
> that it is valid and it wasn't mangled during the build.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/bpf_verifier.h |  4 ++-
>  include/linux/btf.h          |  1 +
>  kernel/bpf/btf.c             | 66 ++++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c        | 18 ++++++++++
>  4 files changed, 88 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 26a6d58ca78c..432ba8977a0a 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -330,10 +330,12 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
>  #define BPF_LOG_STATS  4
>  #define BPF_LOG_LEVEL  (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
>  #define BPF_LOG_MASK   (BPF_LOG_LEVEL | BPF_LOG_STATS)
> +#define BPF_LOG_KERNEL (BPF_LOG_MASK + 1)

It's not clear what the numbering scheme for these flags is. Are
they independent bits? Only one bit allowed at a time? Only some
subset of bits allowed?
E.g., if I specify BPF_LOG_KERNEL and BPF_LOG_STATS, will it work?

If it's bits, then specifying BPF_LOG_KERNEL as (BPF_LOG_MASK + 1)
looks weird, setting it to 8 would be more obvious and
straightforward.

>
>  static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
>  {
> -       return log->level && log->ubuf && !bpf_verifier_log_full(log);
> +       return (log->level && log->ubuf && !bpf_verifier_log_full(log)) ||
> +               log->level == BPF_LOG_KERNEL;
>  }
>
>  #define BPF_MAX_SUBPROGS 256
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 64cdf2a23d42..55d43bc856be 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -56,6 +56,7 @@ bool btf_type_is_void(const struct btf_type *t);
>  #ifdef CONFIG_BPF_SYSCALL
>  const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
>  const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> +struct btf *btf_parse_vmlinux(void);
>  #else
>  static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
>                                                     u32 type_id)
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 29c7c06c6bd6..848f9d4b9d7e 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -698,6 +698,9 @@ __printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
>         if (!bpf_verifier_log_needed(log))
>                 return;
>
> +       if (log->level == BPF_LOG_KERNEL && !fmt)
> +               return;

This "!fmt" condition is subtle and took me a bit of time to
understand. Is the intent to print only verification errors for
BPF_LOG_KERNEL mode? Maybe small comment would help?

> +
>         __btf_verifier_log(log, "[%u] %s %s%s",
>                            env->log_type_id,
>                            btf_kind_str[kind],
> @@ -735,6 +738,8 @@ static void btf_verifier_log_member(struct btf_verifier_env *env,
>         if (!bpf_verifier_log_needed(log))
>                 return;
>
> +       if (log->level == BPF_LOG_KERNEL && !fmt)
> +               return;
>         /* The CHECK_META phase already did a btf dump.
>          *
>          * If member is logged again, it must hit an error in
> @@ -777,6 +782,8 @@ static void btf_verifier_log_vsi(struct btf_verifier_env *env,
>
>         if (!bpf_verifier_log_needed(log))
>                 return;
> +       if (log->level == BPF_LOG_KERNEL && !fmt)
> +               return;
>         if (env->phase != CHECK_META)
>                 btf_verifier_log_type(env, datasec_type, NULL);
>
> @@ -802,6 +809,8 @@ static void btf_verifier_log_hdr(struct btf_verifier_env *env,
>         if (!bpf_verifier_log_needed(log))
>                 return;
>
> +       if (log->level == BPF_LOG_KERNEL)
> +               return;
>         hdr = &btf->hdr;
>         __btf_verifier_log(log, "magic: 0x%x\n", hdr->magic);
>         __btf_verifier_log(log, "version: %u\n", hdr->version);
> @@ -2406,6 +2415,8 @@ static s32 btf_enum_check_meta(struct btf_verifier_env *env,
>                 }
>
>

nit: extra empty line here, might as well get rid of it in this change?

> +               if (env->log.level == BPF_LOG_KERNEL)
> +                       continue;
>                 btf_verifier_log(env, "\t%s val=%d\n",
>                                  __btf_name_by_offset(btf, enums[i].name_off),
>                                  enums[i].val);
> @@ -3367,6 +3378,61 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
>         return ERR_PTR(err);
>  }
>
> +extern char __weak _binary__btf_vmlinux_bin_start[];
> +extern char __weak _binary__btf_vmlinux_bin_end[];
> +
> +struct btf *btf_parse_vmlinux(void)

It's a bit unfortunate to duplicate a bunch of logic of btf_parse()
here. I assume you considered extending btf_parse() with extra flag
but decided it's better to have separate vmlinux-specific version?

> +{
> +       struct btf_verifier_env *env = NULL;
> +       struct bpf_verifier_log *log;
> +       struct btf *btf = NULL;
> +       int err;
> +

[...]

>         if (!copy_to_user(log->ubuf + log->len_used, log->kbuf, n + 1))
>                 log->len_used += n;
>         else
> @@ -9241,6 +9247,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
>         env->ops = bpf_verifier_ops[env->prog->type];
>         is_priv = capable(CAP_SYS_ADMIN);
>
> +       if (is_priv && !btf_vmlinux) {

I'm missing where you check that vmlinux BTF (raw data) is
present at all. Should this have an additional `&&
_binary__btf_vmlinux_bin_start` check?

> +               mutex_lock(&bpf_verifier_lock);
> +               btf_vmlinux = btf_parse_vmlinux();

This is racy, you might end up parsing vmlinux BTF twice. Check
`!btf_vmlinux` again under lock?

> +               mutex_unlock(&bpf_verifier_lock);
> +       }
> +
>         /* grab the mutex to protect few globals used by verifier */
>         if (!is_priv)
>                 mutex_lock(&bpf_verifier_lock);
> @@ -9260,6 +9272,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
>                         goto err_unlock;
>         }
>
> +       if (IS_ERR(btf_vmlinux)) {

There is an interesting interplay between non-privileged BPF and
corrupted vmlinux. If vmlinux BTF is malformed, but system only ever
does unprivileged BPF, then we'll never parse vmlinux BTF and won't
know it's malformed. But once some privileged BPF does parse and
detect problem, all subsequent unprivileged BPFs will fail due to bad
BTF, even though they shouldn't use/rely on it. Should something be
done about this inconsistency?

> +               verbose(env, "in-kernel BTF is malformed\n");
> +               ret = PTR_ERR(btf_vmlinux);
> +               goto err_unlock;
> +       }
> +
>         env->strict_alignment = !!(attr->prog_flags & BPF_F_STRICT_ALIGNMENT);
>         if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS))
>                 env->strict_alignment = true;
> --
> 2.20.0
>

* Re: [PATCH bpf-next 03/10] bpf: process in-kernel BTF
  2019-10-06  6:36   ` Andrii Nakryiko
@ 2019-10-06 23:49     ` Alexei Starovoitov
  2019-10-07  0:20       ` Andrii Nakryiko
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-06 23:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann, x86,
	Networking, bpf, Kernel Team

On Sat, Oct 05, 2019 at 11:36:16PM -0700, Andrii Nakryiko wrote:
> On Fri, Oct 4, 2019 at 10:08 PM Alexei Starovoitov <ast@kernel.org> wrote:
> >
> > If in-kernel BTF exists parse it and prepare 'struct btf *btf_vmlinux'
> > for further use by the verifier.
> > In-kernel BTF is trusted just like kallsyms and other build artifacts
> > embedded into vmlinux.
> > Yet run this BTF image through BTF verifier to make sure
> > that it is valid and it wasn't mangled during the build.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
> >  include/linux/bpf_verifier.h |  4 ++-
> >  include/linux/btf.h          |  1 +
> >  kernel/bpf/btf.c             | 66 ++++++++++++++++++++++++++++++++++++
> >  kernel/bpf/verifier.c        | 18 ++++++++++
> >  4 files changed, 88 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 26a6d58ca78c..432ba8977a0a 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -330,10 +330,12 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
> >  #define BPF_LOG_STATS  4
> >  #define BPF_LOG_LEVEL  (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
> >  #define BPF_LOG_MASK   (BPF_LOG_LEVEL | BPF_LOG_STATS)
> > +#define BPF_LOG_KERNEL (BPF_LOG_MASK + 1)
> 
> It's not clear what the numbering scheme for these flags is. Are
> they independent bits? Only one bit allowed at a time? Only some
> subset of bits allowed?
> E.g., if I specify BPF_LOG_KERNEL and BPF_LOG_STATS, will it work?

You cannot. It's a kernel-internal flag; user space cannot pass it in.
That's why it's just +1 and will keep floating up as other flags
are added in the future.
I considered using something really large instead (like ~0),
but IMO it's cleaner to define it as max_visible_flag + 1.

> > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > index 29c7c06c6bd6..848f9d4b9d7e 100644
> > --- a/kernel/bpf/btf.c
> > +++ b/kernel/bpf/btf.c
> > @@ -698,6 +698,9 @@ __printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
> >         if (!bpf_verifier_log_needed(log))
> >                 return;
> >
> > +       if (log->level == BPF_LOG_KERNEL && !fmt)
> > +               return;
> 
> This "!fmt" condition is subtle and took me a bit of time to
> understand. Is the intent to print only verification errors for
> BPF_LOG_KERNEL mode? Maybe small comment would help?

That's how btf.c prints types: it calls btf_verifier_log_type(..., fmt=NULL).
I need to skip all of those, since they exist to debug invalid BTF
that user space passes into the kernel.
Here the same code processes trusted in-kernel BTF, so the extra messages
are unnecessary.
I will add a comment.
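The two answers above can be modeled in a few lines. A sketch, assuming BPF_LOG_LEVEL1 = 1 and BPF_LOG_LEVEL2 = 2 as in bpf_verifier.h of that time (the other values are as quoted in the patch). user_log_level_ok() is a hypothetical model of the "no bits outside BPF_LOG_MASK" validation applied to user-supplied log levels, and should_log_type() models the !fmt filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BPF_LOG_LEVEL1 1
#define BPF_LOG_LEVEL2 2
#define BPF_LOG_STATS  4
#define BPF_LOG_LEVEL  (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
#define BPF_LOG_MASK   (BPF_LOG_LEVEL | BPF_LOG_STATS)
/* kernel-only flag: the first value above every user-visible bit */
#define BPF_LOG_KERNEL (BPF_LOG_MASK + 1)

/* Hypothetical model: any bit outside BPF_LOG_MASK is rejected, so
 * user space can never ask for BPF_LOG_KERNEL. */
static bool user_log_level_ok(unsigned int level)
{
	return level != 0 && !(level & ~BPF_LOG_MASK);
}

/* Model of the btf.c filter: type dumps are requested with
 * fmt == NULL and only help debug user-supplied BTF, so in kernel
 * mode only real error messages (fmt != NULL) get through. */
static bool should_log_type(unsigned int level, const char *fmt)
{
	if (level == BPF_LOG_KERNEL && !fmt)
		return false;
	return true;
}
```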

> 
> nit: extra empty line here, might as well get rid of it in this change?

Yeah, the empty line was there before. Will remove it.

> 
> > +               if (env->log.level == BPF_LOG_KERNEL)
> > +                       continue;
> >                 btf_verifier_log(env, "\t%s val=%d\n",
> >                                  __btf_name_by_offset(btf, enums[i].name_off),
> >                                  enums[i].val);
> > @@ -3367,6 +3378,61 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
> >         return ERR_PTR(err);
> >  }
> >
> > +extern char __weak _binary__btf_vmlinux_bin_start[];
> > +extern char __weak _binary__btf_vmlinux_bin_end[];
> > +
> > +struct btf *btf_parse_vmlinux(void)
> 
> It's a bit unfortunate to duplicate a bunch of logic of btf_parse()
> here. I assume you considered extending btf_parse() with extra flag
> but decided it's better to have separate vmlinux-specific version?

Right. It looks similar, but it's 70-80% different. I actually started
with a combined version, but it didn't look good.

> >
> > +       if (is_priv && !btf_vmlinux) {
> 
> I'm missing where you check that vmlinux BTF (raw data) is
> present at all. Should this have an additional `&&
> _binary__btf_vmlinux_bin_start` check?

btf_parse_hdr() is doing it.
But now I'm thinking I should gate it with CONFIG_DEBUG_INFO_BTF.
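The __weak declarations are what make a missing blob detectable at run time: an undefined weak reference resolves to NULL instead of failing the link. A user-space sketch with hypothetical symbol names, assuming a GCC or Clang toolchain on Linux. Nothing in this program defines the symbols, so the size comes out as 0. Like the kernel code, it assumes that when the symbols do exist, they bracket one contiguous blob.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; in the kernel the linked-in BTF blob would
 * provide the equivalent start/end symbols. */
extern char __attribute__((weak)) demo_btf_bin_start[];
extern char __attribute__((weak)) demo_btf_bin_end[];

/* Size of the embedded blob, or 0 when it was not linked in. */
static size_t demo_btf_size(void)
{
	if (!demo_btf_bin_start || !demo_btf_bin_end)
		return 0;
	assert(demo_btf_bin_end >= demo_btf_bin_start);
	return (size_t)(demo_btf_bin_end - demo_btf_bin_start);
}
```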

> 
> > +               mutex_lock(&bpf_verifier_lock);
> > +               btf_vmlinux = btf_parse_vmlinux();
> 
> This is racy, you might end up parsing vmlinux BTF twice. Check
> `!btf_vmlinux` again under lock?

Right, good catch.

> >
> > +       if (IS_ERR(btf_vmlinux)) {
> 
> There is an interesting interplay between non-privileged BPF and
> corrupted vmlinux. If vmlinux BTF is malformed, but system only ever
> does unprivileged BPF, then we'll never parse vmlinux BTF and won't
> know it's malformed. But once some privileged BPF does parse and
> detect problem, all subsequent unprivileged BPFs will fail due to bad
> BTF, even though they shouldn't use/rely on it. Should something be
> done about this inconsistency?

I added the is_priv check to avoid parsing BTF for unprivileged loads,
since no unprivileged program will ever use this stuff (not until the
CPU hardware side channels are fixed).
But this inconsistency is indeed bad. I will refactor it to parse always.
Broken in-kernel BTF is a bad enough sign that gcc, pahole, or the kernel
is broken. In all of those cases the kernel shouldn't be loading any bpf.

Thanks for the review!
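Andrii's race fix (re-check under the lock) and the decision to parse for every load combine into a parse-once pattern that also caches failure. A user-space sketch assuming pthreads and C11 atomics. All demo_* names are hypothetical, and kernel code would use its own mutex and memory-ordering primitives instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct btf. */
struct demo_btf {
	int ok;
};

static struct demo_btf parsed_btf;
static struct demo_btf demo_btf_error; /* cached "malformed" marker */
static int parse_calls;                /* how often the parse ran */
static int parse_fails;                /* set to simulate broken BTF */

static _Atomic(struct demo_btf *) demo_btf_vmlinux;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Expensive parse; only ever called with demo_lock held. */
static struct demo_btf *demo_parse_vmlinux(void)
{
	parse_calls++;
	if (parse_fails)
		return &demo_btf_error;
	parsed_btf.ok = 1;
	return &parsed_btf;
}

/* Called for every load, privileged or not, so a broken blob is
 * reported consistently. */
static struct demo_btf *demo_get_vmlinux_btf(void)
{
	struct demo_btf *btf =
		atomic_load_explicit(&demo_btf_vmlinux, memory_order_acquire);

	if (!btf) {
		pthread_mutex_lock(&demo_lock);
		/* Re-check under the lock: another thread may have won. */
		btf = atomic_load_explicit(&demo_btf_vmlinux,
					   memory_order_relaxed);
		if (!btf) {
			btf = demo_parse_vmlinux();
			atomic_store_explicit(&demo_btf_vmlinux, btf,
					      memory_order_release);
		}
		pthread_mutex_unlock(&demo_lock);
	}
	return btf;
}

/* 0 if a program load may proceed, -1 if in-kernel BTF is broken. */
static int demo_check_load(void)
{
	return demo_get_vmlinux_btf() == &demo_btf_error ? -1 : 0;
}
```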


* Re: [PATCH bpf-next 03/10] bpf: process in-kernel BTF
  2019-10-06 23:49     ` Alexei Starovoitov
@ 2019-10-07  0:20       ` Andrii Nakryiko
  0 siblings, 0 replies; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-07  0:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann, x86,
	Networking, bpf, Kernel Team

On Sun, Oct 6, 2019 at 4:49 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Sat, Oct 05, 2019 at 11:36:16PM -0700, Andrii Nakryiko wrote:
> > On Fri, Oct 4, 2019 at 10:08 PM Alexei Starovoitov <ast@kernel.org> wrote:
> > >
> > > If in-kernel BTF exists parse it and prepare 'struct btf *btf_vmlinux'
> > > for further use by the verifier.
> > > In-kernel BTF is trusted just like kallsyms and other build artifacts
> > > embedded into vmlinux.
> > > Yet run this BTF image through BTF verifier to make sure
> > > that it is valid and it wasn't mangled during the build.
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> > >  include/linux/bpf_verifier.h |  4 ++-
> > >  include/linux/btf.h          |  1 +
> > >  kernel/bpf/btf.c             | 66 ++++++++++++++++++++++++++++++++++++
> > >  kernel/bpf/verifier.c        | 18 ++++++++++
> > >  4 files changed, 88 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > > index 26a6d58ca78c..432ba8977a0a 100644
> > > --- a/include/linux/bpf_verifier.h
> > > +++ b/include/linux/bpf_verifier.h
> > > @@ -330,10 +330,12 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
> > >  #define BPF_LOG_STATS  4
> > >  #define BPF_LOG_LEVEL  (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
> > >  #define BPF_LOG_MASK   (BPF_LOG_LEVEL | BPF_LOG_STATS)
> > > +#define BPF_LOG_KERNEL (BPF_LOG_MASK + 1)
> >
> > It's not clear what the numbering scheme for these flags is. Are
> > they independent bits? Only one bit allowed at a time? Only some
> > subset of bits allowed?
> > E.g., if I specify BPF_LOG_KERNEL and BPF_LOG_STATS, will it work?
>
> You cannot. It's a kernel-internal flag. User space cannot pass it in.
> That's why it's just +1 and will keep floating up when other flags
> are added in the future.
> I considered using something really large instead (like ~0),
> but it's imo cleaner to define it as max_visible_flag + 1.

Ah, I see. Maybe add a small comment, e.g., /* kernel-only flag */ or
something along those lines?
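For illustration, a minimal user-space sketch of why a flag defined as BPF_LOG_MASK + 1 cannot be smuggled in from user space, assuming the attr validation rejects any level bit outside BPF_LOG_MASK. The BPF_LOG_LEVEL1/BPF_LOG_LEVEL2 values (1 and 2) and the helper user_log_level_ok() are assumptions for this sketch, not code from the patch. Note that "+ 1" only yields a non-overlapping bit here because BPF_LOG_MASK (7) has all of its low bits set.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror the BPF_LOG_* defines quoted above; LEVEL1/LEVEL2 are
 * assumed to be 1 and 2. */
#define BPF_LOG_LEVEL1 1
#define BPF_LOG_LEVEL2 2
#define BPF_LOG_STATS  4
#define BPF_LOG_LEVEL  (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
#define BPF_LOG_MASK   (BPF_LOG_LEVEL | BPF_LOG_STATS)
#define BPF_LOG_KERNEL (BPF_LOG_MASK + 1) /* kernel-only flag, == 8 */

/* Hypothetical validator for a user-supplied log level: any bit outside
 * BPF_LOG_MASK is rejected, so BPF_LOG_KERNEL can never come from user
 * space, alone or combined with other flags. */
static bool user_log_level_ok(unsigned int level)
{
	return (level & ~(unsigned int)BPF_LOG_MASK) == 0;
}
```

With this check in place, BPF_LOG_KERNEL only has to stay above the highest user-visible bit, which is why it can simply float up as new flags are added.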

>
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index 29c7c06c6bd6..848f9d4b9d7e 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -698,6 +698,9 @@ __printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
> > >         if (!bpf_verifier_log_needed(log))
> > >                 return;
> > >
> > > +       if (log->level == BPF_LOG_KERNEL && !fmt)
> > > +               return;
> >
> > This "!fmt" condition is subtle and took me a bit of time to
> > understand. Is the intent to print only verification errors for
> > BPF_LOG_KERNEL mode? Maybe a small comment would help?
>
> It's the way btf.c prints types. It's calling btf_verifier_log_type(..fmt=NULL).
> I need to skip all of these, since they're there to debug invalid BTF
> when user space passes it into the kernel.
> Here the same code is processing in-kernel trusted BTF and extra messages
> are completely unnecessary.
> I will add a comment.
>
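A toy sketch of the skip described above, assuming a simplified logger: a NULL fmt means "dump the type itself" (useful only when debugging user-supplied BTF), and those dumps are dropped in BPF_LOG_KERNEL mode while real error messages still go out. struct toy_log and toy_log_type() are hypothetical names, not the btf.c API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define BPF_LOG_KERNEL 8 /* kernel-internal level, as in the patch */

/* Hypothetical stand-in for the verifier log: counts emitted messages. */
struct toy_log {
	unsigned int level;
	int emitted;
};

/* fmt == NULL: debug-only dump of the type, skipped for trusted
 * in-kernel BTF. fmt != NULL: an actual error message, always logged. */
static void toy_log_type(struct toy_log *log, const char *type_name,
			 const char *fmt, ...)
{
	va_list args;

	if (log->level == BPF_LOG_KERNEL && !fmt)
		return;
	printf("type %s", type_name);
	if (fmt) {
		printf(" ");
		va_start(args, fmt);
		vprintf(fmt, args);
		va_end(args);
	}
	printf("\n");
	log->emitted++;
}
```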
> >
> > nit: extra empty line here, might as well get rid of it in this change?
>
> yeah. the empty line was there before. Will remove it.
>
> >
> > > +               if (env->log.level == BPF_LOG_KERNEL)
> > > +                       continue;
> > >                 btf_verifier_log(env, "\t%s val=%d\n",
> > >                                  __btf_name_by_offset(btf, enums[i].name_off),
> > >                                  enums[i].val);
> > > @@ -3367,6 +3378,61 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
> > >         return ERR_PTR(err);
> > >  }
> > >
> > > +extern char __weak _binary__btf_vmlinux_bin_start[];
> > > +extern char __weak _binary__btf_vmlinux_bin_end[];
> > > +
> > > +struct btf *btf_parse_vmlinux(void)
> >
> > It's a bit unfortunate to duplicate a bunch of logic of btf_parse()
> > here. I assume you considered extending btf_parse() with extra flag
> > but decided it's better to have separate vmlinux-specific version?
>
> Right. It looks similar, but it's 70-80% different. I actually started
> with combined, but it didn't look good.
>
> > >
> > > +       if (is_priv && !btf_vmlinux) {
> >
> > I'm missing where you are checking that vmlinux BTF (raw data) is
> > present at all? Should this have additional `&&
> > _binary__btf_vmlinux_bin_start` check?
>
> btf_parse_hdr() is doing it.
> But now I'm thinking I should gate it with CONFIG_DEBUG_INFO_BTF.

You mean the btf_data_size check? But in that case you'll get an error
message printed even though no BTF was generated, so yeah, I guess
gating is cleaner.

>
> >
> > > +               mutex_lock(&bpf_verifier_lock);
> > > +               btf_vmlinux = btf_parse_vmlinux();
> >
> > This is racy, you might end up parsing vmlinux BTF twice. Check
> > `!btf_vmlinux` again under lock?
>
> right. good catch.
>
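A minimal user-space sketch of the re-check suggested above, using pthreads and C11 atomics as stand-ins for the kernel primitives; get_vmlinux_btf(), parse_vmlinux_toy() and struct toy_btf are hypothetical. Since this runs at program load time rather than on a hot path, doing the NULL check only under the mutex would also be correct and arguably simpler.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct btf / btf_vmlinux. */
struct toy_btf { int nr_types; };

static struct toy_btf parsed = { .nr_types = 42 };
static _Atomic(struct toy_btf *) btf_vmlinux_toy = NULL;
static int parse_count; /* only modified under verifier_lock */
static pthread_mutex_t verifier_lock = PTHREAD_MUTEX_INITIALIZER;

static struct toy_btf *parse_vmlinux_toy(void)
{
	parse_count++;
	return &parsed;
}

static struct toy_btf *get_vmlinux_btf(void)
{
	struct toy_btf *btf;

	/* Fast path: already parsed (acquire pairs with the release below). */
	btf = atomic_load_explicit(&btf_vmlinux_toy, memory_order_acquire);
	if (btf)
		return btf;

	pthread_mutex_lock(&verifier_lock);
	/* Re-check under the lock: another thread may have finished parsing
	 * between the unlocked load above and acquiring the mutex. */
	btf = atomic_load_explicit(&btf_vmlinux_toy, memory_order_relaxed);
	if (!btf) {
		btf = parse_vmlinux_toy();
		atomic_store_explicit(&btf_vmlinux_toy, btf, memory_order_release);
	}
	pthread_mutex_unlock(&verifier_lock);
	return btf;
}
```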
> > >
> > > +       if (IS_ERR(btf_vmlinux)) {
> >
> > There is an interesting interplay between non-privileged BPF and
> > corrupted vmlinux. If vmlinux BTF is malformed, but system only ever
> > does unprivileged BPF, then we'll never parse vmlinux BTF and won't
> > know it's malformed. But once some privileged BPF does parse and
> > detect problem, all subsequent unprivileged BPFs will fail due to bad
> > BTF, even though they shouldn't use/rely on it. Should something be
> > done about this inconsistency?
>
> I did is_priv check to avoid parsing btf in unpriv, since no unpriv
> progs will ever use this stuff (not until cpu hw side channels are fixed).
> But this inconsistency is indeed bad.
> Will refactor to do it always.

Sounds good.

> Broken in-kernel BTF is a bad enough sign that gcc, pahole or the kernel
> is broken. In all cases the kernel shouldn't be loading any bpf.
>
> Thanks for the review!
>

I'm intending to go over the rest today-tomorrow, so don't post v2 just yet :)

* Re: [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF
  2019-10-05  5:03 ` [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF Alexei Starovoitov
@ 2019-10-07 16:32   ` Alan Maguire
  2019-10-09  3:59     ` Alexei Starovoitov
  2019-10-08  0:35   ` Andrii Nakryiko
  1 sibling, 1 reply; 39+ messages in thread
From: Alan Maguire @ 2019-10-07 16:32 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, x86, netdev, bpf, kernel-team

On Fri, 4 Oct 2019, Alexei Starovoitov wrote:

> libbpf analyzes bpf C program, searches in-kernel BTF for given type name
> and stores it into expected_attach_type.
> The kernel verifier expects this btf_id to point to something like:
> typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
> which represents signature of raw_tracepoint "kfree_skb".
> 
> Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb'
> and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint.
> In first case it passes btf_id of 'struct sk_buff *' back to the verifier core
> and 'void *' in second case.
> 
> Then the verifier tracks PTR_TO_BTF_ID as any other pointer type.
> Like PTR_TO_SOCKET points to 'struct bpf_sock',
> PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on.
> PTR_TO_BTF_ID points to in-kernel structs.
> If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF
> then PTR_TO_BTF_ID#1234 points to one of in kernel skbs.
> 
> When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)(r1 + 32))
> the btf_struct_access() checks which field of 'struct sk_buff' is
> at offset 32. Checks that size of access matches type definition
> of the field and continues to track the dereferenced type.
> If that field is a pointer to 'struct net_device', r2's type
> will be PTR_TO_BTF_ID#456, where 456 is the btf_id of 'struct net_device'
> in vmlinux's BTF.
> 
> Such verifier analysis prevents "cheating" in BPF C program.
> The program cannot cast arbitrary pointer to 'struct sk_buff *'
> and access it. C compiler would allow type cast, of course,
> but the verifier will notice type mismatch based on BPF assembly
> and in-kernel BTF.
>

This is an incredible leap forward! One question I have relates to 
another aspect of checking. As we move from bpf_probe_read() to "direct 
struct access", should we have the verifier insist on the same sort of 
checking we have for direct packet access? Specifically I'm thinking of 
the case where a typed pointer argument might be NULL and we attempt to 
dereference it.  This might be as simple as adding 
PTR_TO_BTF_ID to the reg_type_may_be_null() check:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0717aac..6559b4d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -342,7 +342,8 @@ static bool reg_type_may_be_null(enum bpf_reg_type type)
        return type == PTR_TO_MAP_VALUE_OR_NULL ||
               type == PTR_TO_SOCKET_OR_NULL ||
               type == PTR_TO_SOCK_COMMON_OR_NULL ||
-              type == PTR_TO_TCP_SOCK_OR_NULL;
+              type == PTR_TO_TCP_SOCK_OR_NULL ||
+              type == PTR_TO_BTF_ID;
 }
 
...in order to ensure we don't dereference the pointer before checking for 
NULL.  Possibly I'm missing something that will do that NULL checking 
already?

Thanks!

Alan

> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/bpf.h          |  15 ++-
>  include/linux/bpf_verifier.h |   2 +
>  kernel/bpf/btf.c             | 179 +++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c        |  69 +++++++++++++-
>  kernel/trace/bpf_trace.c     |   2 +-
>  5 files changed, 262 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5b9d22338606..2dc3a7c313e9 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -281,6 +281,7 @@ enum bpf_reg_type {
>  	PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
>  	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
>  	PTR_TO_XDP_SOCK,	 /* reg points to struct xdp_sock */
> +	PTR_TO_BTF_ID,
>  };
>  
>  /* The information passed from prog-specific *_is_valid_access
> @@ -288,7 +289,11 @@ enum bpf_reg_type {
>   */
>  struct bpf_insn_access_aux {
>  	enum bpf_reg_type reg_type;
> -	int ctx_field_size;
> +	union {
> +		int ctx_field_size;
> +		u32 btf_id;
> +	};
> +	struct bpf_verifier_env *env; /* for verbose logs */
>  };
>  
>  static inline void
> @@ -747,6 +752,14 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>  				     const union bpf_attr *kattr,
>  				     union bpf_attr __user *uattr);
> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> +		    const struct bpf_prog *prog,
> +		    struct bpf_insn_access_aux *info);
> +int btf_struct_access(struct bpf_verifier_env *env,
> +		      const struct btf_type *t, int off, int size,
> +		      enum bpf_access_type atype,
> +		      u32 *next_btf_id);
> +
>  #else /* !CONFIG_BPF_SYSCALL */
>  static inline struct bpf_prog *bpf_prog_get(u32 ufd)
>  {
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 432ba8977a0a..e21782f49c45 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -52,6 +52,8 @@ struct bpf_reg_state {
>  		 */
>  		struct bpf_map *map_ptr;
>  
> +		u32 btf_id; /* for PTR_TO_BTF_ID */
> +
>  		/* Max size from any of the above. */
>  		unsigned long raw;
>  	};
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 848f9d4b9d7e..61ff8a54ca22 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3433,6 +3433,185 @@ struct btf *btf_parse_vmlinux(void)
>  	return ERR_PTR(err);
>  }
>  
> +extern struct btf *btf_vmlinux;
> +
> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> +		    const struct bpf_prog *prog,
> +		    struct bpf_insn_access_aux *info)
> +{
> +	u32 btf_id = prog->expected_attach_type;
> +	const struct btf_param *args;
> +	const struct btf_type *t;
> +	const char prefix[] = "btf_trace_";
> +	const char *tname;
> +	u32 nr_args;
> +
> +	if (!btf_id)
> +		return true;
> +
> +	if (IS_ERR(btf_vmlinux)) {
> +		bpf_verifier_log_write(info->env, "btf_vmlinux is malformed\n");
> +		return false;
> +	}
> +
> +	t = btf_type_by_id(btf_vmlinux, btf_id);
> +	if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF) {
> +		bpf_verifier_log_write(info->env, "btf_id is invalid\n");
> +		return false;
> +	}
> +
> +	tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
> +	if (strncmp(prefix, tname, sizeof(prefix) - 1)) {
> +		bpf_verifier_log_write(info->env,
> +				       "btf_id points to wrong type name %s\n",
> +				       tname);
> +		return false;
> +	}
> +	tname += sizeof(prefix) - 1;
> +
> +	t = btf_type_by_id(btf_vmlinux, t->type);
> +	if (!btf_type_is_ptr(t))
> +		return false;
> +	t = btf_type_by_id(btf_vmlinux, t->type);
> +	if (!btf_type_is_func_proto(t))
> +		return false;
> +
> +	args = (const struct btf_param *)(t + 1);
> +	/* skip first 'void *__data' argument in btf_trace_* */
> +	nr_args = btf_type_vlen(t) - 1;
> +	if (off >= nr_args * 8) {
> +		bpf_verifier_log_write(info->env,
> +				       "raw_tp '%s' doesn't have %d-th argument\n",
> +				       tname, off / 8);
> +		return false;
> +	}
> +
> +	/* raw tp arg is off / 8, but typedef has extra 'void *', hence +1 */
> +	t = btf_type_by_id(btf_vmlinux, args[off / 8 + 1].type);
> +	if (btf_type_is_int(t))
> +		/* accessing a scalar */
> +		return true;
> +	if (!btf_type_is_ptr(t)) {
> +		bpf_verifier_log_write(info->env,
> +				       "raw_tp '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
> +				       tname, off / 8,
> +				       __btf_name_by_offset(btf_vmlinux, t->name_off),
> +				       btf_kind_str[BTF_INFO_KIND(t->info)]);
> +		return false;
> +	}
> +	if (t->type == 0)
> +		/* This is a pointer to void.
> +		 * It is the same as scalar from the verifier safety pov.
> +		 * No further pointer walking is allowed.
> +		 */
> +		return true;
> +
> +	/* this is a pointer to another type */
> +	info->reg_type = PTR_TO_BTF_ID;
> +	info->btf_id = t->type;
> +
> +	t = btf_type_by_id(btf_vmlinux, t->type);
> +	bpf_verifier_log_write(info->env,
> +			       "raw_tp '%s' arg%d has btf_id %d type %s '%s'\n",
> +			       tname, off / 8, info->btf_id,
> +			       btf_kind_str[BTF_INFO_KIND(t->info)],
> +			       __btf_name_by_offset(btf_vmlinux, t->name_off));
> +	return true;
> +}
> +
> +int btf_struct_access(struct bpf_verifier_env *env,
> +		      const struct btf_type *t, int off, int size,
> +		      enum bpf_access_type atype,
> +		      u32 *next_btf_id)
> +{
> +	const struct btf_member *member;
> +	const struct btf_type *mtype;
> +	const char *tname, *mname;
> +	int i, moff = 0, msize;
> +
> +again:
> +	tname = btf_name_by_offset(btf_vmlinux, t->name_off);
> +	if (!btf_type_is_struct(t)) {
> +		bpf_verifier_log_write(env, "Type '%s' is not a struct", tname);
> +		return -EINVAL;
> +	}
> +	if (btf_type_vlen(t) < 1) {
> +		bpf_verifier_log_write(env, "struct %s doesn't have fields", tname);
> +		return -EINVAL;
> +	}
> +
> +	for_each_member(i, t, member) {
> +
> +		/* offset of the field */
> +		moff = btf_member_bit_offset(t, member);
> +
> +		if (off < moff / 8)
> +			continue;
> +
> +		/* type of the field */
> +		mtype = btf_type_by_id(btf_vmlinux, member->type);
> +		mname = __btf_name_by_offset(btf_vmlinux, member->name_off);
> +
> +		/* skip typedef, volatile modifiers */
> +		while (btf_type_is_modifier(mtype))
> +			mtype = btf_type_by_id(btf_vmlinux, mtype->type);
> +
> +		if (btf_type_is_array(mtype))
> +			/* array deref is not supported yet */
> +			continue;
> +
> +		if (!btf_type_has_size(mtype) && !btf_type_is_ptr(mtype)) {
> +			bpf_verifier_log_write(env,
> +					       "field %s doesn't have size\n",
> +					       mname);
> +			return -EFAULT;
> +		}
> +		if (btf_type_is_ptr(mtype))
> +			msize = 8;
> +		else
> +			msize = mtype->size;
> +		if (off >= moff / 8 + msize)
> +			/* rare case, must be a field of the union with smaller size,
> +			 * let's try another field
> +			 */
> +			continue;
> +		/* the 'off' we're looking for is either equal to start
> +		 * of this field or inside of this struct
> +		 */
> +		if (btf_type_is_struct(mtype)) {
> +			/* our field must be inside that union or struct */
> +			t = mtype;
> +
> +			/* adjust offset we're looking for */
> +			off -= moff / 8;
> +			goto again;
> +		}
> +		if (msize != size) {
> +			/* field access size doesn't match */
> +			bpf_verifier_log_write(env,
> +					       "cannot access %d bytes in struct %s field %s that has size %d\n",
> +					       size, tname, mname, msize);
> +			return -EACCES;
> +		}
> +
> +		if (btf_type_is_ptr(mtype)) {
> +			const struct btf_type *stype;
> +
> +			stype = btf_type_by_id(btf_vmlinux, mtype->type);
> +			if (btf_type_is_struct(stype)) {
> +				*next_btf_id = mtype->type;
> +				return PTR_TO_BTF_ID;
> +			}
> +		}
> +		/* all other fields are treated as scalars */
> +		return SCALAR_VALUE;
> +	}
> +	bpf_verifier_log_write(env,
> +			       "struct %s doesn't have field at offset %d\n",
> +			       tname, off);
> +	return -EINVAL;
> +}
> +
>  void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
>  		       struct seq_file *m)
>  {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 91c4db4d1c6a..3c155873ffea 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -406,6 +406,7 @@ static const char * const reg_type_str[] = {
>  	[PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
>  	[PTR_TO_TP_BUFFER]	= "tp_buffer",
>  	[PTR_TO_XDP_SOCK]	= "xdp_sock",
> +	[PTR_TO_BTF_ID]		= "ptr_",
>  };
>  
>  static char slot_type_char[] = {
> @@ -460,6 +461,10 @@ static void print_verifier_state(struct bpf_verifier_env *env,
>  			/* reg->off should be 0 for SCALAR_VALUE */
>  			verbose(env, "%lld", reg->var_off.value + reg->off);
>  		} else {
> +			if (t == PTR_TO_BTF_ID)
> +				verbose(env, "%s",
> +					btf_name_by_offset(btf_vmlinux,
> +							   btf_type_by_id(btf_vmlinux, reg->btf_id)->name_off));
>  			verbose(env, "(id=%d", reg->id);
>  			if (reg_type_may_be_refcounted_or_null(t))
>  				verbose(env, ",ref_obj_id=%d", reg->ref_obj_id);
> @@ -2337,10 +2342,12 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
>  
>  /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
>  static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
> -			    enum bpf_access_type t, enum bpf_reg_type *reg_type)
> +			    enum bpf_access_type t, enum bpf_reg_type *reg_type,
> +			    u32 *btf_id)
>  {
>  	struct bpf_insn_access_aux info = {
>  		.reg_type = *reg_type,
> +		.env = env,
>  	};
>  
>  	if (env->ops->is_valid_access &&
> @@ -2354,7 +2361,10 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
>  		 */
>  		*reg_type = info.reg_type;
>  
> -		env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
> +		if (*reg_type == PTR_TO_BTF_ID)
> +			*btf_id = info.btf_id;
> +		else
> +			env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
>  		/* remember the offset of last byte accessed in ctx */
>  		if (env->prog->aux->max_ctx_offset < off + size)
>  			env->prog->aux->max_ctx_offset = off + size;
> @@ -2745,6 +2755,53 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
>  	reg->smax_value = reg->umax_value;
>  }
>  
> +static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> +				   struct bpf_reg_state *regs,
> +				   int regno, int off, int size,
> +				   enum bpf_access_type atype,
> +				   int value_regno)
> +{
> +	struct bpf_reg_state *reg = regs + regno;
> +	const struct btf_type *t = btf_type_by_id(btf_vmlinux, reg->btf_id);
> +	const char *tname = btf_name_by_offset(btf_vmlinux, t->name_off);
> +	u32 btf_id;
> +	int ret;
> +
> +	if (atype != BPF_READ) {
> +		verbose(env, "only read is supported\n");
> +		return -EACCES;
> +	}
> +
> +	if (off < 0) {
> +		verbose(env,
> +			"R%d is ptr_%s negative access %d is not allowed\n",
> +			regno, tname, off);
> +		return -EACCES;
> +	}
> +	if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
> +		char tn_buf[48];
> +
> +		tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
> +		verbose(env,
> +			"R%d is ptr_%s invalid variable offset: off=%d, var_off=%s\n",
> +			regno, tname, off, tn_buf);
> +		return -EACCES;
> +	}
> +
> +	ret = btf_struct_access(env, t, off, size, atype, &btf_id);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (ret == SCALAR_VALUE) {
> +		mark_reg_unknown(env, regs, value_regno);
> +		return 0;
> +	}
> +	mark_reg_known_zero(env, regs, value_regno);
> +	regs[value_regno].type = PTR_TO_BTF_ID;
> +	regs[value_regno].btf_id = btf_id;
> +	return 0;
> +}
> +
>  /* check whether memory at (regno + off) is accessible for t = (read | write)
>   * if t==write, value_regno is a register which value is stored into memory
>   * if t==read, value_regno is a register which will receive the value from memory
> @@ -2787,6 +2844,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>  
>  	} else if (reg->type == PTR_TO_CTX) {
>  		enum bpf_reg_type reg_type = SCALAR_VALUE;
> +		u32 btf_id = 0;
>  
>  		if (t == BPF_WRITE && value_regno >= 0 &&
>  		    is_pointer_value(env, value_regno)) {
> @@ -2798,7 +2856,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>  		if (err < 0)
>  			return err;
>  
> -		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type);
> +		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf_id);
>  		if (!err && t == BPF_READ && value_regno >= 0) {
>  			/* ctx access returns either a scalar, or a
>  			 * PTR_TO_PACKET[_META,_END]. In the latter
> @@ -2817,6 +2875,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>  				 * a sub-register.
>  				 */
>  				regs[value_regno].subreg_def = DEF_NOT_SUBREG;
> +				if (reg_type == PTR_TO_BTF_ID)
> +					regs[value_regno].btf_id = btf_id;
>  			}
>  			regs[value_regno].type = reg_type;
>  		}
> @@ -2876,6 +2936,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>  		err = check_tp_buffer_access(env, reg, regno, off, size);
>  		if (!err && t == BPF_READ && value_regno >= 0)
>  			mark_reg_unknown(env, regs, value_regno);
> +	} else if (reg->type == PTR_TO_BTF_ID) {
> +		err = check_ptr_to_btf_access(env, regs, regno, off, size, t,
> +					      value_regno);
>  	} else {
>  		verbose(env, "R%d invalid mem access '%s'\n", regno,
>  			reg_type_str[reg->type]);
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 44bd08f2443b..6221e8c6ecc3 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1074,7 +1074,7 @@ static bool raw_tp_prog_is_valid_access(int off, int size,
>  		return false;
>  	if (off % size != 0)
>  		return false;
> -	return true;
> +	return btf_ctx_access(off, size, type, prog, info);
>  }
>  
>  const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
> -- 
> 2.20.0
> 
> 
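To make the offset walk in btf_struct_access() above easier to follow, here is a heavily simplified model of it, assuming members are described by plain C structs instead of BTF type ids. All toy_* names and the example layout are hypothetical (this is not the real struct sk_buff). Unlike the quoted code, which only compares the access size with the field size, this sketch also requires the access to start at the field's first byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of BTF members. */
enum toy_kind { TOY_SCALAR, TOY_PTR_STRUCT, TOY_STRUCT };

struct toy_struct;

struct toy_member {
	const char *name;
	unsigned int bit_off;          /* offset of the field, in bits */
	enum toy_kind kind;
	int size;                      /* bytes; 8 for pointers */
	const struct toy_struct *sub;  /* nested struct or pointee struct */
};

struct toy_struct {
	const char *name;
	int nr_members;
	const struct toy_member *members;
};

enum toy_result { TOY_ERR = -1, TOY_RES_SCALAR = 0, TOY_RES_PTR = 1 };

/* Find the field covering byte 'off', descending into nested structs,
 * and require the access to match the field exactly. On TOY_RES_PTR,
 * *next is the pointee type the verifier would track next. */
static enum toy_result toy_struct_access(const struct toy_struct *t, int off,
					 int size, const struct toy_struct **next)
{
	int i;

again:
	for (i = 0; i < t->nr_members; i++) {
		const struct toy_member *m = &t->members[i];
		int moff = m->bit_off / 8;

		if (off < moff || off >= moff + m->size)
			continue;	/* 'off' is not inside this field */
		if (m->kind == TOY_STRUCT) {
			off -= moff;	/* make 'off' relative to the nested struct */
			t = m->sub;
			goto again;
		}
		if (off != moff || size != m->size)
			return TOY_ERR;	/* partial or oversized access */
		if (m->kind == TOY_PTR_STRUCT) {
			*next = m->sub;
			return TOY_RES_PTR;
		}
		return TOY_RES_SCALAR;
	}
	return TOY_ERR;		/* no field at this offset */
}

/* Example layout (hypothetical). */
static const struct toy_member toy_netdev_members[] = {
	{ "ifindex", 0, TOY_SCALAR, 4, NULL },
};
static const struct toy_struct toy_net_device = { "net_device", 1, toy_netdev_members };

static const struct toy_member toy_pair_members[] = {
	{ "next", 0, TOY_SCALAR, 8, NULL },
	{ "prev", 64, TOY_SCALAR, 8, NULL },
};
static const struct toy_struct toy_pair = { "pair", 2, toy_pair_members };

static const struct toy_member toy_skb_members[] = {
	{ "list", 0, TOY_STRUCT, 16, &toy_pair },
	{ "dev", 128, TOY_PTR_STRUCT, 8, &toy_net_device },
	{ "len", 192, TOY_SCALAR, 4, NULL },
};
static const struct toy_struct toy_sk_buff = { "sk_buff", 3, toy_skb_members };
```

For example, an 8-byte read at offset 16 lands on 'dev' and yields a pointer to toy_net_device, while an 8-byte read at offset 24 is rejected because 'len' is only 4 bytes wide.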

* Re: [PATCH bpf-next 04/10] libbpf: auto-detect btf_id of raw_tracepoint
  2019-10-05  5:03 ` [PATCH bpf-next 04/10] libbpf: auto-detect btf_id of raw_tracepoint Alexei Starovoitov
@ 2019-10-07 23:41   ` Andrii Nakryiko
  2019-10-09  2:26     ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-07 23:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> For raw tracepoint program types libbpf will try to find
> btf_id of raw tracepoint in vmlinux's BTF.
> It's the responsibility of the bpf program author to annotate the program
> with SEC("raw_tracepoint/name") where "name" is a valid raw tracepoint.

As an aside, I've been thinking about allowing to specify "raw_tp/"
and "tp/" in section name as an "alias" for "raw_tracepoint/" and
"tracepoint/", respectively. Any objections?

> If "name" is indeed a valid raw tracepoint then in-kernel BTF
> will have "btf_trace_##name" typedef that points to function
> prototype of that raw tracepoint. BTF description captures
> the exact arguments the kernel C code passes into the raw tracepoint.
> The kernel verifier will check the types while loading the bpf program.
>
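The btf_trace_##name convention can be illustrated with a small user-space sketch: token pasting generates one function-pointer typedef per tracepoint, and the compiler rejects a probe whose signature does not match it. DEFINE_BTF_TRACE and the toy struct sk_buff are hypothetical; the real typedefs are presumably emitted by the tracepoint macros touched in patch 01/10, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in so the sketch compiles in user space. */
struct sk_buff { int len; };

/* One typedef per tracepoint: a leading 'void *' for the tracepoint's
 * private data (the kernel names it __data), then the real arguments. */
#define DEFINE_BTF_TRACE(name, ...) \
	typedef void (*btf_trace_##name)(void *data, __VA_ARGS__)

DEFINE_BTF_TRACE(kfree_skb, struct sk_buff *skb, void *loc);

static int last_len = -1;

/* The probe must match btf_trace_kfree_skb exactly; the assignment
 * below is type-checked by the compiler. */
static void probe_kfree_skb(void *data, struct sk_buff *skb, void *loc)
{
	(void)data;
	(void)loc;
	last_len = skb->len;
}

static btf_trace_kfree_skb toy_probe = probe_kfree_skb;
```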
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  tools/lib/bpf/libbpf.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index e0276520171b..0e6f7b41c521 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4591,6 +4591,22 @@ int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
>                         continue;
>                 *prog_type = section_names[i].prog_type;
>                 *expected_attach_type = section_names[i].expected_attach_type;
> +               if (*prog_type == BPF_PROG_TYPE_RAW_TRACEPOINT) {
> +                       struct btf *btf = bpf_core_find_kernel_btf();
> +                       char raw_tp_btf_name[128] = "btf_trace_";
> +                       int ret;
> +
> +                       if (IS_ERR(btf))
> +                               /* lack of kernel BTF is not a failure */
> +                               return 0;
> +                       /* append "btf_trace_" prefix per kernel convention */
> +                       strcpy(raw_tp_btf_name + sizeof("btf_trace_") - 1,
> +                              name + section_names[i].len);

buffer overflow here? use strncat() instead?
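One way to avoid the overflow is to let snprintf() bound the write and report truncation; a sketch, assuming a caller-supplied buffer such as the 128-byte raw_tp_btf_name above. build_raw_tp_btf_name() is a hypothetical helper. strncat() also works, but its size argument counts the characters to append rather than the total buffer size, which is easy to get wrong.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BTF_TRACE_PREFIX "btf_trace_"

/* Build "btf_trace_<tp_name>" into buf without overflowing it.
 * Returns 0 on success, -1 if the full name would not fit. */
static int build_raw_tp_btf_name(char *buf, size_t buf_sz, const char *tp_name)
{
	int n = snprintf(buf, buf_sz, "%s%s", BTF_TRACE_PREFIX, tp_name);

	/* snprintf returns the length it wanted to write, excluding NUL. */
	if (n < 0 || (size_t)n >= buf_sz)
		return -1;
	return 0;
}
```

A name that does not fit would then be treated like a missing btf_id rather than silently truncated.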

> +                       ret = btf__find_by_name(btf, raw_tp_btf_name);
> +                       if (ret > 0)
> +                               *expected_attach_type = ret;
> +                       btf__free(btf);
> +               }
>                 return 0;
>         }
>         pr_warning("failed to guess program type based on ELF section name '%s'\n", name);
> --
> 2.20.0
>

* Re: [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF
  2019-10-05  5:03 ` [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF Alexei Starovoitov
  2019-10-07 16:32   ` Alan Maguire
@ 2019-10-08  0:35   ` Andrii Nakryiko
  2019-10-09  3:30     ` Alexei Starovoitov
  1 sibling, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-08  0:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> libbpf analyzes bpf C program, searches in-kernel BTF for given type name
> and stores it into expected_attach_type.
> The kernel verifier expects this btf_id to point to something like:
> typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
> which represents signature of raw_tracepoint "kfree_skb".
>
> Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb'
> and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint.
> In first case it passes btf_id of 'struct sk_buff *' back to the verifier core
> and 'void *' in second case.
>
> Then the verifier tracks PTR_TO_BTF_ID as any other pointer type.
> Like PTR_TO_SOCKET points to 'struct bpf_sock',
> PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on.
> PTR_TO_BTF_ID points to in-kernel structs.
> If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF
> then PTR_TO_BTF_ID#1234 points to one of in kernel skbs.
>
> When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)(r1 + 32))
> the btf_struct_access() checks which field of 'struct sk_buff' is
> at offset 32. Checks that size of access matches type definition
> of the field and continues to track the dereferenced type.
> If that field is a pointer to 'struct net_device', r2's type
> will be PTR_TO_BTF_ID#456, where 456 is the btf_id of 'struct net_device'
> in vmlinux's BTF.
>
> Such verifier anlaysis prevents "cheating" in BPF C program.

typo: analysis

> The program cannot cast arbitrary pointer to 'struct sk_buff *'
> and access it. C compiler would allow type cast, of course,
> but the verifier will notice type mismatch based on BPF assembly
> and in-kernel BTF.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/bpf.h          |  15 ++-
>  include/linux/bpf_verifier.h |   2 +
>  kernel/bpf/btf.c             | 179 +++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c        |  69 +++++++++++++-
>  kernel/trace/bpf_trace.c     |   2 +-
>  5 files changed, 262 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5b9d22338606..2dc3a7c313e9 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -281,6 +281,7 @@ enum bpf_reg_type {
>         PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
>         PTR_TO_TP_BUFFER,        /* reg points to a writable raw tp's buffer */
>         PTR_TO_XDP_SOCK,         /* reg points to struct xdp_sock */
> +       PTR_TO_BTF_ID,

comments for consistency? ;)

>  };
>
>  /* The information passed from prog-specific *_is_valid_access
> @@ -288,7 +289,11 @@ enum bpf_reg_type {
>   */
>  struct bpf_insn_access_aux {
>         enum bpf_reg_type reg_type;
> -       int ctx_field_size;
> +       union {
> +               int ctx_field_size;
> +               u32 btf_id;
> +       };
> +       struct bpf_verifier_env *env; /* for verbose logs */
>  };
>
>  static inline void
> @@ -747,6 +752,14 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>  int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>                                      const union bpf_attr *kattr,
>                                      union bpf_attr __user *uattr);
> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> +                   const struct bpf_prog *prog,
> +                   struct bpf_insn_access_aux *info);
> +int btf_struct_access(struct bpf_verifier_env *env,
> +                     const struct btf_type *t, int off, int size,
> +                     enum bpf_access_type atype,
> +                     u32 *next_btf_id);
> +
>  #else /* !CONFIG_BPF_SYSCALL */
>  static inline struct bpf_prog *bpf_prog_get(u32 ufd)
>  {
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 432ba8977a0a..e21782f49c45 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -52,6 +52,8 @@ struct bpf_reg_state {
>                  */
>                 struct bpf_map *map_ptr;
>
> +               u32 btf_id; /* for PTR_TO_BTF_ID */
> +
>                 /* Max size from any of the above. */
>                 unsigned long raw;
>         };
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 848f9d4b9d7e..61ff8a54ca22 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -3433,6 +3433,185 @@ struct btf *btf_parse_vmlinux(void)
>         return ERR_PTR(err);
>  }
>
> +extern struct btf *btf_vmlinux;
> +
> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> +                   const struct bpf_prog *prog,
> +                   struct bpf_insn_access_aux *info)
> +{
> +       u32 btf_id = prog->expected_attach_type;
> +       const struct btf_param *args;
> +       const struct btf_type *t;
> +       const char prefix[] = "btf_trace_";
> +       const char *tname;
> +       u32 nr_args;
> +
> +       if (!btf_id)
> +               return true;
> +
> +       if (IS_ERR(btf_vmlinux)) {
> +               bpf_verifier_log_write(info->env, "btf_vmlinux is malformed\n");
> +               return false;
> +       }
> +
> +       t = btf_type_by_id(btf_vmlinux, btf_id);
> +       if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF) {
> +               bpf_verifier_log_write(info->env, "btf_id is invalid\n");
> +               return false;
> +       }
> +
> +       tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
> +       if (strncmp(prefix, tname, sizeof(prefix) - 1)) {
> +               bpf_verifier_log_write(info->env,
> +                                      "btf_id points to wrong type name %s\n",
> +                                      tname);
> +               return false;
> +       }
> +       tname += sizeof(prefix) - 1;
> +
> +       t = btf_type_by_id(btf_vmlinux, t->type);
> +       if (!btf_type_is_ptr(t))
> +               return false;
> +       t = btf_type_by_id(btf_vmlinux, t->type);
> +       if (!btf_type_is_func_proto(t))
> +               return false;

All negative cases but these two have helpful log messages, please add
two more for these.

> +
> +       args = (const struct btf_param *)(t + 1);

IMO, doing args++ (and leaving comment why) here instead of adjusting
`off/8 + 1` below is cleaner.

> +       /* skip first 'void *__data' argument in btf_trace_* */
> +       nr_args = btf_type_vlen(t) - 1;
> +       if (off >= nr_args * 8) {

Looks like you forgot to check that `off % 8 == 0`?

> +               bpf_verifier_log_write(info->env,
> +                                      "raw_tp '%s' doesn't have %d-th argument\n",
> +                                      tname, off / 8);
> +               return false;
> +       }
> +
> +       /* raw tp arg is off / 8, but typedef has extra 'void *', hence +1 */
> +       t = btf_type_by_id(btf_vmlinux, args[off / 8 + 1].type);
> +       if (btf_type_is_int(t))

this is too limiting, you need to strip const/volatile/restrict and
resolve typedefs (e.g., size_t, __u64 -- that's all typedefs).

also probably want to allow enums.

btw, atomic_t is a struct, so might want to allow up to 8 byte
struct/unions (passed by value) reads? might never happen for
tracepoint, not sure

> +               /* accessing a scalar */
> +               return true;
> +       if (!btf_type_is_ptr(t)) {

similar to above, modifiers and typedef resolution have to happen first

> +               bpf_verifier_log_write(info->env,
> +                                      "raw_tp '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
> +                                      tname, off / 8,
> +                                      __btf_name_by_offset(btf_vmlinux, t->name_off),
> +                                      btf_kind_str[BTF_INFO_KIND(t->info)]);
> +               return false;
> +       }
> +       if (t->type == 0)
> +               /* This is a pointer to void.
> +                * It is the same as scalar from the verifier safety pov.
> +                * No further pointer walking is allowed.
> +                */
> +               return true;
> +
> +       /* this is a pointer to another type */
> +       info->reg_type = PTR_TO_BTF_ID;
> +       info->btf_id = t->type;
> +
> +       t = btf_type_by_id(btf_vmlinux, t->type);
> +       bpf_verifier_log_write(info->env,
> +                              "raw_tp '%s' arg%d has btf_id %d type %s '%s'\n",
> +                              tname, off / 8, info->btf_id,
> +                              btf_kind_str[BTF_INFO_KIND(t->info)],
> +                              __btf_name_by_offset(btf_vmlinux, t->name_off));
> +       return true;
> +}
> +
> +int btf_struct_access(struct bpf_verifier_env *env,
> +                     const struct btf_type *t, int off, int size,
> +                     enum bpf_access_type atype,
> +                     u32 *next_btf_id)
> +{
> +       const struct btf_member *member;
> +       const struct btf_type *mtype;
> +       const char *tname, *mname;
> +       int i, moff = 0, msize;
> +
> +again:
> +       tname = btf_name_by_offset(btf_vmlinux, t->name_off);
> +       if (!btf_type_is_struct(t)) {

see above about typedef/modifiers resolution

> +               bpf_verifier_log_write(env, "Type '%s' is not a struct", tname);
> +               return -EINVAL;
> +       }
> +       if (btf_type_vlen(t) < 1) {
> +               bpf_verifier_log_write(env, "struct %s doesn't have fields", tname);
> +               return -EINVAL;
> +       }

kind of redundant check...

> +
> +       for_each_member(i, t, member) {
> +
> +               /* offset of the field */
> +               moff = btf_member_bit_offset(t, member);

what do you want to do with bitfields?

> +
> +               if (off < moff / 8)
> +                       continue;
> +
> +               /* type of the field */
> +               mtype = btf_type_by_id(btf_vmlinux, member->type);
> +               mname = __btf_name_by_offset(btf_vmlinux, member->name_off);

nit: you mix btf_name_by_offset and __btf_name_by_offset, any reason
to not stick to just one of them (__btf_name_by_offset is safer, so
that one, probably)?

> +
> +               /* skip typedef, volotile modifiers */

typo: volatile

nit: also, volatile is not special, so either mention
const/volatile/restrict or just "modifiers"?

> +               while (btf_type_is_modifier(mtype))
> +                       mtype = btf_type_by_id(btf_vmlinux, mtype->type);
> +
> +               if (btf_type_is_array(mtype))
> +                       /* array deref is not supported yet */
> +                       continue;
> +
> +               if (!btf_type_has_size(mtype) && !btf_type_is_ptr(mtype)) {
> +                       bpf_verifier_log_write(env,
> +                                              "field %s doesn't have size\n",
> +                                              mname);
> +                       return -EFAULT;
> +               }
> +               if (btf_type_is_ptr(mtype))
> +                       msize = 8;
> +               else
> +                       msize = mtype->size;
> +               if (off >= moff / 8 + msize)
> +                       /* rare case, must be a field of the union with smaller size,
> +                        * let's try another field
> +                        */
> +                       continue;
> +               /* the 'off' we're looking for is either equal to start
> +                * of this field or inside of this struct
> +                */
> +               if (btf_type_is_struct(mtype)) {
> +                       /* our field must be inside that union or struct */
> +                       t = mtype;
> +
> +                       /* adjust offset we're looking for */
> +                       off -= moff / 8;
> +                       goto again;
> +               }
> +               if (msize != size) {
> +                       /* field access size doesn't match */
> +                       bpf_verifier_log_write(env,
> +                                              "cannot access %d bytes in struct %s field %s that has size %d\n",
> +                                              size, tname, mname, msize);
> +                       return -EACCES;
> +               }
> +
> +               if (btf_type_is_ptr(mtype)) {
> +                       const struct btf_type *stype;
> +
> +                       stype = btf_type_by_id(btf_vmlinux, mtype->type);
> +                       if (btf_type_is_struct(stype)) {

again, resolving modifiers/typedefs? though in this case it might be
too eager?...

> +                               *next_btf_id = mtype->type;
> +                               return PTR_TO_BTF_ID;
> +                       }
> +               }
> +               /* all other fields are treated as scalars */
> +               return SCALAR_VALUE;
> +       }
> +       bpf_verifier_log_write(env,
> +                              "struct %s doesn't have field at offset %d\n",
> +                              tname, off);
> +       return -EINVAL;
> +}
> +
>  void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
>                        struct seq_file *m)
>  {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 91c4db4d1c6a..3c155873ffea 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -406,6 +406,7 @@ static const char * const reg_type_str[] = {
>         [PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
>         [PTR_TO_TP_BUFFER]      = "tp_buffer",
>         [PTR_TO_XDP_SOCK]       = "xdp_sock",
> +       [PTR_TO_BTF_ID]         = "ptr_",
>  };
>
>  static char slot_type_char[] = {
> @@ -460,6 +461,10 @@ static void print_verifier_state(struct bpf_verifier_env *env,
>                         /* reg->off should be 0 for SCALAR_VALUE */
>                         verbose(env, "%lld", reg->var_off.value + reg->off);
>                 } else {
> +                       if (t == PTR_TO_BTF_ID)
> +                               verbose(env, "%s",
> +                                       btf_name_by_offset(btf_vmlinux,
> +                                                          btf_type_by_id(btf_vmlinux, reg->btf_id)->name_off));
>                         verbose(env, "(id=%d", reg->id);
>                         if (reg_type_may_be_refcounted_or_null(t))
>                                 verbose(env, ",ref_obj_id=%d", reg->ref_obj_id);
> @@ -2337,10 +2342,12 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
>
>  /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
>  static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
> -                           enum bpf_access_type t, enum bpf_reg_type *reg_type)
> +                           enum bpf_access_type t, enum bpf_reg_type *reg_type,
> +                           u32 *btf_id)
>  {
>         struct bpf_insn_access_aux info = {
>                 .reg_type = *reg_type,
> +               .env = env,
>         };
>
>         if (env->ops->is_valid_access &&
> @@ -2354,7 +2361,10 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
>                  */
>                 *reg_type = info.reg_type;
>
> -               env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
> +               if (*reg_type == PTR_TO_BTF_ID)
> +                       *btf_id = info.btf_id;
> +               else
> +                       env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;

ctx_field_size is passed through bpf_insn_access_aux, but btf_id is
returned like this. Is there a reason to do it in two different ways?

>                 /* remember the offset of last byte accessed in ctx */
>                 if (env->prog->aux->max_ctx_offset < off + size)
>                         env->prog->aux->max_ctx_offset = off + size;
> @@ -2745,6 +2755,53 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
>         reg->smax_value = reg->umax_value;
>  }
>
> +static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> +                                  struct bpf_reg_state *regs,
> +                                  int regno, int off, int size,
> +                                  enum bpf_access_type atype,
> +                                  int value_regno)
> +{
> +       struct bpf_reg_state *reg = regs + regno;
> +       const struct btf_type *t = btf_type_by_id(btf_vmlinux, reg->btf_id);
> +       const char *tname = btf_name_by_offset(btf_vmlinux, t->name_off);
> +       u32 btf_id;
> +       int ret;
> +
> +       if (atype != BPF_READ) {
> +               verbose(env, "only read is supported\n");
> +               return -EACCES;
> +       }
> +
> +       if (off < 0) {
> +               verbose(env,
> +                       "R%d is ptr_%s negative access %d is not allowed\n",

totally nit, but for consistency's sake (following the variable offset error
below): "R%d is ptr_%s negative access: off=%d\n"?

> +                       regno, tname, off);
> +               return -EACCES;
> +       }
> +       if (!tnum_is_const(reg->var_off) || reg->var_off.value) {

why so strict about reg->var_off.value?

> +               char tn_buf[48];
> +
> +               tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
> +               verbose(env,
> +                       "R%d is ptr_%s invalid variable offset: off=%d, var_off=%s\n",
> +                       regno, tname, off, tn_buf);
> +               return -EACCES;
> +       }
> +
> +       ret = btf_struct_access(env, t, off, size, atype, &btf_id);
> +       if (ret < 0)
> +               return ret;
> +
> +       if (ret == SCALAR_VALUE) {
> +               mark_reg_unknown(env, regs, value_regno);
> +               return 0;
> +       }
> +       mark_reg_known_zero(env, regs, value_regno);
> +       regs[value_regno].type = PTR_TO_BTF_ID;
> +       regs[value_regno].btf_id = btf_id;
> +       return 0;
> +}
> +
>  /* check whether memory at (regno + off) is accessible for t = (read | write)
>   * if t==write, value_regno is a register which value is stored into memory
>   * if t==read, value_regno is a register which will receive the value from memory
> @@ -2787,6 +2844,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>
>         } else if (reg->type == PTR_TO_CTX) {
>                 enum bpf_reg_type reg_type = SCALAR_VALUE;
> +               u32 btf_id = 0;
>
>                 if (t == BPF_WRITE && value_regno >= 0 &&
>                     is_pointer_value(env, value_regno)) {
> @@ -2798,7 +2856,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>                 if (err < 0)
>                         return err;
>
> -               err = check_ctx_access(env, insn_idx, off, size, t, &reg_type);
> +               err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf_id);
>                 if (!err && t == BPF_READ && value_regno >= 0) {
>                         /* ctx access returns either a scalar, or a
>                          * PTR_TO_PACKET[_META,_END]. In the latter
> @@ -2817,6 +2875,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>                                  * a sub-register.
>                                  */
>                                 regs[value_regno].subreg_def = DEF_NOT_SUBREG;
> +                               if (reg_type == PTR_TO_BTF_ID)
> +                                       regs[value_regno].btf_id = btf_id;
>                         }
>                         regs[value_regno].type = reg_type;
>                 }
> @@ -2876,6 +2936,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>                 err = check_tp_buffer_access(env, reg, regno, off, size);
>                 if (!err && t == BPF_READ && value_regno >= 0)
>                         mark_reg_unknown(env, regs, value_regno);
> +       } else if (reg->type == PTR_TO_BTF_ID) {
> +               err = check_ptr_to_btf_access(env, regs, regno, off, size, t,
> +                                             value_regno);
>         } else {
>                 verbose(env, "R%d invalid mem access '%s'\n", regno,
>                         reg_type_str[reg->type]);
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 44bd08f2443b..6221e8c6ecc3 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1074,7 +1074,7 @@ static bool raw_tp_prog_is_valid_access(int off, int size,
>                 return false;
>         if (off % size != 0)
>                 return false;
> -       return true;
> +       return btf_ctx_access(off, size, type, prog, info);
>  }
>
>  const struct bpf_verifier_ops raw_tracepoint_verifier_ops = {
> --
> 2.20.0
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH bpf-next 06/10] bpf: add support for BTF pointers to interpreter
  2019-10-05  5:03 ` [PATCH bpf-next 06/10] bpf: add support for BTF pointers to interpreter Alexei Starovoitov
@ 2019-10-08  3:08   ` Andrii Nakryiko
  0 siblings, 0 replies; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-08  3:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:07 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> Pointer to BTF object is a pointer to kernel object or NULL.
> The memory access in the interpreter has to be done via probe_kernel_read
> to avoid page faults.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  include/linux/filter.h |  3 +++
>  kernel/bpf/core.c      | 19 +++++++++++++++++++
>  kernel/bpf/verifier.c  |  8 ++++++++
>  3 files changed, 30 insertions(+)
>

[...]

* Re: [PATCH bpf-next 04/10] libbpf: auto-detect btf_id of raw_tracepoint
  2019-10-07 23:41   ` Andrii Nakryiko
@ 2019-10-09  2:26     ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-09  2:26 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On 10/7/19 4:41 PM, Andrii Nakryiko wrote:
> On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>>
>> For raw tracepoint program types libbpf will try to find
>> btf_id of raw tracepoint in vmlinux's BTF.
>> It's a responsiblity of bpf program author to annotate the program
>> with SEC("raw_tracepoint/name") where "name" is a valid raw tracepoint.
> 
> As an aside, I've been thinking about allowing to specify "raw_tp/"
> and "tp/" in section name as an "alias" for "raw_tracepoint/" and
> "tracepoint/", respectively. Any objections?

makes sense.

>> If "name" is indeed a valid raw tracepoint then in-kernel BTF
>> will have "btf_trace_##name" typedef that points to function
>> prototype of that raw tracepoint. BTF description captures
>> exact argument the kernel C code is passing into raw tracepoint.
>> The kernel verifier will check the types while loading bpf program.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
>>   tools/lib/bpf/libbpf.c | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index e0276520171b..0e6f7b41c521 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -4591,6 +4591,22 @@ int libbpf_prog_type_by_name(const char *name, enum bpf_prog_type *prog_type,
>>                          continue;
>>                  *prog_type = section_names[i].prog_type;
>>                  *expected_attach_type = section_names[i].expected_attach_type;
>> +               if (*prog_type == BPF_PROG_TYPE_RAW_TRACEPOINT) {
>> +                       struct btf *btf = bpf_core_find_kernel_btf();
>> +                       char raw_tp_btf_name[128] = "btf_trace_";
>> +                       int ret;
>> +
>> +                       if (IS_ERR(btf))
>> +                               /* lack of kernel BTF is not a failure */
>> +                               return 0;
>> +                       /* append "btf_trace_" prefix per kernel convention */
>> +                       strcpy(raw_tp_btf_name + sizeof("btf_trace_") - 1,
>> +                              name + section_names[i].len);
> 
> buffer overflow here? use strncat() instead?

128 is the max ksym name length, and given how tracepoint names are
constructed with other prefixes, I think it cannot overflow, but that's
a good nit. Fixed it.
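strncat() was suggested above; another bounded option is snprintf(), which also reports truncation through its return value. A minimal userspace sketch, assuming a 128-byte buffer as in the patch; build_raw_tp_btf_name() and RAW_TP_NAME_MAX are hypothetical names, and this is not necessarily how the fix in libbpf was written:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: mirrors the 128-byte raw_tp_btf_name buffer in the patch. */
#define RAW_TP_NAME_MAX 128

/* Hypothetical helper: build "btf_trace_<tp_name>" into buf without
 * overflowing it. Returns 0 on success, -1 if the result would not fit. */
static int build_raw_tp_btf_name(char *buf, size_t buf_len, const char *tp_name)
{
	int n = snprintf(buf, buf_len, "btf_trace_%s", tp_name);

	/* snprintf() returns the length it wanted to write (excluding NUL);
	 * a value >= buf_len means the output was truncated. */
	if (n < 0 || (size_t)n >= buf_len)
		return -1;
	return 0;
}
```

Failing on truncation, rather than silently cutting the name, avoids searching vmlinux BTF for a shortened name that could match an unrelated typedef.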

* Re: [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF
  2019-10-08  0:35   ` Andrii Nakryiko
@ 2019-10-09  3:30     ` Alexei Starovoitov
  2019-10-09  4:01       ` Andrii Nakryiko
  0 siblings, 1 reply; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-09  3:30 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On 10/7/19 5:35 PM, Andrii Nakryiko wrote:
> On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>>
>> libbpf analyzes bpf C program, searches in-kernel BTF for given type name
>> and stores it into expected_attach_type.
>> The kernel verifier expects this btf_id to point to something like:
>> typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
>> which represents signature of raw_tracepoint "kfree_skb".
>>
>> Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb'
>> and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint.
>> In first case it passes btf_id of 'struct sk_buff *' back to the verifier core
>> and 'void *' in second case.
>>
>> Then the verifier tracks PTR_TO_BTF_ID as any other pointer type.
>> Like PTR_TO_SOCKET points to 'struct bpf_sock',
>> PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on.
>> PTR_TO_BTF_ID points to in-kernel structs.
>> If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF
>> then PTR_TO_BTF_ID#1234 points to one of in kernel skbs.
>>
>> When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)r1 + 32)
>> the btf_struct_access() checks which field of 'struct sk_buff' is
>> at offset 32. Checks that size of access matches type definition
>> of the field and continues to track the dereferenced type.
>> If that field was a pointer to 'struct net_device' the r2's type
>> will be PTR_TO_BTF_ID#456. Where 456 is btf_id of 'struct net_device'
>> in vmlinux's BTF.
>>
>> Such verifier anlaysis prevents "cheating" in BPF C program.
> 
> typo: analysis

I did run spellcheck, but couldn't interpret its input :)

> 
>> The program cannot cast arbitrary pointer to 'struct sk_buff *'
>> and access it. C compiler would allow type cast, of course,
>> but the verifier will notice type mismatch based on BPF assembly
>> and in-kernel BTF.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
>>   include/linux/bpf.h          |  15 ++-
>>   include/linux/bpf_verifier.h |   2 +
>>   kernel/bpf/btf.c             | 179 +++++++++++++++++++++++++++++++++++
>>   kernel/bpf/verifier.c        |  69 +++++++++++++-
>>   kernel/trace/bpf_trace.c     |   2 +-
>>   5 files changed, 262 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index 5b9d22338606..2dc3a7c313e9 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -281,6 +281,7 @@ enum bpf_reg_type {
>>          PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
>>          PTR_TO_TP_BUFFER,        /* reg points to a writable raw tp's buffer */
>>          PTR_TO_XDP_SOCK,         /* reg points to struct xdp_sock */
>> +       PTR_TO_BTF_ID,
> 
> comments for consistency? ;)

fixed

>>   };
>>
>>   /* The information passed from prog-specific *_is_valid_access
>> @@ -288,7 +289,11 @@ enum bpf_reg_type {
>>    */
>>   struct bpf_insn_access_aux {
>>          enum bpf_reg_type reg_type;
>> -       int ctx_field_size;
>> +       union {
>> +               int ctx_field_size;
>> +               u32 btf_id;
>> +       };
>> +       struct bpf_verifier_env *env; /* for verbose logs */
>>   };
>>
>>   static inline void
>> @@ -747,6 +752,14 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
>>   int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
>>                                       const union bpf_attr *kattr,
>>                                       union bpf_attr __user *uattr);
>> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>> +                   const struct bpf_prog *prog,
>> +                   struct bpf_insn_access_aux *info);
>> +int btf_struct_access(struct bpf_verifier_env *env,
>> +                     const struct btf_type *t, int off, int size,
>> +                     enum bpf_access_type atype,
>> +                     u32 *next_btf_id);
>> +
>>   #else /* !CONFIG_BPF_SYSCALL */
>>   static inline struct bpf_prog *bpf_prog_get(u32 ufd)
>>   {
>> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
>> index 432ba8977a0a..e21782f49c45 100644
>> --- a/include/linux/bpf_verifier.h
>> +++ b/include/linux/bpf_verifier.h
>> @@ -52,6 +52,8 @@ struct bpf_reg_state {
>>                   */
>>                  struct bpf_map *map_ptr;
>>
>> +               u32 btf_id; /* for PTR_TO_BTF_ID */
>> +
>>                  /* Max size from any of the above. */
>>                  unsigned long raw;
>>          };
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index 848f9d4b9d7e..61ff8a54ca22 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -3433,6 +3433,185 @@ struct btf *btf_parse_vmlinux(void)
>>          return ERR_PTR(err);
>>   }
>>
>> +extern struct btf *btf_vmlinux;
>> +
>> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
>> +                   const struct bpf_prog *prog,
>> +                   struct bpf_insn_access_aux *info)
>> +{
>> +       u32 btf_id = prog->expected_attach_type;
>> +       const struct btf_param *args;
>> +       const struct btf_type *t;
>> +       const char prefix[] = "btf_trace_";
>> +       const char *tname;
>> +       u32 nr_args;
>> +
>> +       if (!btf_id)
>> +               return true;
>> +
>> +       if (IS_ERR(btf_vmlinux)) {
>> +               bpf_verifier_log_write(info->env, "btf_vmlinux is malformed\n");
>> +               return false;
>> +       }
>> +
>> +       t = btf_type_by_id(btf_vmlinux, btf_id);
>> +       if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF) {
>> +               bpf_verifier_log_write(info->env, "btf_id is invalid\n");
>> +               return false;
>> +       }
>> +
>> +       tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
>> +       if (strncmp(prefix, tname, sizeof(prefix) - 1)) {
>> +               bpf_verifier_log_write(info->env,
>> +                                      "btf_id points to wrong type name %s\n",
>> +                                      tname);
>> +               return false;
>> +       }
>> +       tname += sizeof(prefix) - 1;
>> +
>> +       t = btf_type_by_id(btf_vmlinux, t->type);
>> +       if (!btf_type_is_ptr(t))
>> +               return false;
>> +       t = btf_type_by_id(btf_vmlinux, t->type);
>> +       if (!btf_type_is_func_proto(t))
>> +               return false;
> 
> All negative cases but these two have helpful log messages, please add
> two more for these.

no. not here. This is a part of typedef construction from patch 1.
It cannot be anything else. If btf_id points to typedef and
typedef has btf_trace_ prefix it has to be correct.
Above two checks are checking sanity of kernel build.
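The prefix test this reasoning leans on can be sketched in isolation as userspace C. raw_tp_name_from_typedef() is a hypothetical helper that mirrors the strncmp()/sizeof(prefix) - 1 logic from the patch; it is not a kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if tname starts with "btf_trace_", return a pointer
 * to the raw tracepoint name that follows the prefix; otherwise NULL. */
static const char *raw_tp_name_from_typedef(const char *tname)
{
	static const char prefix[] = "btf_trace_";

	/* sizeof(prefix) - 1 excludes the terminating NUL from the compare */
	if (strncmp(prefix, tname, sizeof(prefix) - 1))
		return NULL;
	return tname + sizeof(prefix) - 1;
}
```

Once a typedef passes this check, its shape (pointer to a func_proto) is fixed by how the kernel build generates it, which is why the two follow-up checks are treated as build sanity checks.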

> 
>> +
>> +       args = (const struct btf_param *)(t + 1);
> 
> IMO, doing args++ (and leaving a comment explaining why) here instead of adjusting
> `off/8 + 1` below is cleaner.

I tried your suggestion and it doesn't look any better, but why not,
since I've coded it anyway.

>> +       /* skip first 'void *__data' argument in btf_trace_* */
>> +       nr_args = btf_type_vlen(t) - 1;
>> +       if (off >= nr_args * 8) {
> 
> Looks like you forgot to check that `off % 8 == 0`?

great catch. yes. fixed
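The missing alignment check can be illustrated with a small standalone helper. This is a sketch under stated assumptions: raw_tp_param_index() is hypothetical, each raw_tp argument is assumed to occupy one 8-byte (u64) slot as in the patch, and nr_params counts the leading 'void *__data' parameter of the btf_trace_* prototype:

```c
#include <assert.h>

/* Hypothetical sketch: map a raw_tp context offset to an index into the
 * btf_trace_* prototype's parameter array, or return -1 if invalid. */
static int raw_tp_param_index(int off, unsigned int nr_params)
{
	unsigned int nr_args;

	if (nr_params < 1)
		return -1;
	nr_args = nr_params - 1;        /* skip 'void *__data' */
	if (off < 0 || off % 8 != 0)    /* must hit the start of a u64 slot */
		return -1;
	if ((unsigned int)off / 8 >= nr_args)
		return -1;              /* past the last argument */
	return off / 8 + 1;             /* +1 skips the '__data' parameter */
}
```

For kfree_skb, whose prototype is (void *, struct sk_buff *skb, void *loc), nr_params is 3: offset 0 maps to 'skb', offset 8 maps to 'loc', and offsets 4 or 16 are rejected. Whether the +1 is applied here or by advancing the args pointer once is purely a style choice.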

>> +               bpf_verifier_log_write(info->env,
>> +                                      "raw_tp '%s' doesn't have %d-th argument\n",
>> +                                      tname, off / 8);
>> +               return false;
>> +       }
>> +
>> +       /* raw tp arg is off / 8, but typedef has extra 'void *', hence +1 */
>> +       t = btf_type_by_id(btf_vmlinux, args[off / 8 + 1].type);
>> +       if (btf_type_is_int(t))
> 
> this is too limiting, you need to strip const/volatile/restrict and
> resolve typedefs (e.g., size_t, __u64 -- that's all typedefs).

right. done.

> also probably want to allow enums.

eventually yes. I prefer to walk first.

> btw, atomic_t is a struct, so might want to allow up to 8 byte
> struct/unions (passed by value) reads? might never happen for
> tracepoint, not sure

maybe in the future. walk first.

> 
>> +               /* accessing a scalar */
>> +               return true;
>> +       if (!btf_type_is_ptr(t)) {
> 
> similar to above, modifiers and typedef resolution have to happen first

done.
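What "strip modifiers and resolve typedefs" means can be sketched over a simplified, made-up type table. enum kind, struct type and resolve() below are hypothetical stand-ins, not the kernel's BTF API; the loop plays the same role as the btf_type_is_modifier() loop in btf_struct_access():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for BTF type records. */
enum kind { K_INT, K_PTR, K_STRUCT, K_TYPEDEF, K_CONST, K_VOLATILE, K_RESTRICT };

struct type {
	enum kind kind;
	unsigned int type; /* referenced id for PTR/TYPEDEF/modifiers; 0 = void */
};

static int is_modifier_or_typedef(const struct type *t)
{
	return t->kind == K_TYPEDEF || t->kind == K_CONST ||
	       t->kind == K_VOLATILE || t->kind == K_RESTRICT;
}

/* Follow typedef/const/volatile/restrict links until a "real" type is
 * reached. Returns NULL for void (id 0). Assumes the table has no cycles. */
static const struct type *resolve(const struct type *types, unsigned int id)
{
	const struct type *t = id ? &types[id] : NULL;

	while (t && is_modifier_or_typedef(t))
		t = t->type ? &types[t->type] : NULL;
	return t;
}
```

With such a helper, a `const __u64` argument resolves to an integer and is treated as a scalar, and `struct sk_buff *const` resolves to a pointer before the int/pointer kind checks run.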

>> +               bpf_verifier_log_write(info->env,
>> +                                      "raw_tp '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
>> +                                      tname, off / 8,
>> +                                      __btf_name_by_offset(btf_vmlinux, t->name_off),
>> +                                      btf_kind_str[BTF_INFO_KIND(t->info)]);
>> +               return false;
>> +       }
>> +       if (t->type == 0)
>> +               /* This is a pointer to void.
>> +                * It is the same as scalar from the verifier safety pov.
>> +                * No further pointer walking is allowed.
>> +                */
>> +               return true;
>> +
>> +       /* this is a pointer to another type */
>> +       info->reg_type = PTR_TO_BTF_ID;
>> +       info->btf_id = t->type;
>> +
>> +       t = btf_type_by_id(btf_vmlinux, t->type);
>> +       bpf_verifier_log_write(info->env,
>> +                              "raw_tp '%s' arg%d has btf_id %d type %s '%s'\n",
>> +                              tname, off / 8, info->btf_id,
>> +                              btf_kind_str[BTF_INFO_KIND(t->info)],
>> +                              __btf_name_by_offset(btf_vmlinux, t->name_off));
>> +       return true;
>> +}
>> +
>> +int btf_struct_access(struct bpf_verifier_env *env,
>> +                     const struct btf_type *t, int off, int size,
>> +                     enum bpf_access_type atype,
>> +                     u32 *next_btf_id)
>> +{
>> +       const struct btf_member *member;
>> +       const struct btf_type *mtype;
>> +       const char *tname, *mname;
>> +       int i, moff = 0, msize;
>> +
>> +again:
>> +       tname = btf_name_by_offset(btf_vmlinux, t->name_off);
>> +       if (!btf_type_is_struct(t)) {
> 
> see above about typedef/modifiers resolution

here actually skipping is not necessary.

> 
>> +               bpf_verifier_log_write(env, "Type '%s' is not a struct", tname);
>> +               return -EINVAL;
>> +       }
>> +       if (btf_type_vlen(t) < 1) {
>> +               bpf_verifier_log_write(env, "struct %s doesn't have fields", tname);
>> +               return -EINVAL;
>> +       }
> 
> kind of redundant check...

I wanted to give a helpful message, but since you asked:
there are 394 struct FOO {}; in the kernel,
and probably none of them are going to appear in bpf tracing,
so I deleted that check.

> 
>> +
>> +       for_each_member(i, t, member) {
>> +
>> +               /* offset of the field */
>> +               moff = btf_member_bit_offset(t, member);
> 
> what do you want to do with bitfields?

they're scalars.
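The offset-to-field walk in btf_struct_access(), with bitfields treated as scalars, can be approximated on a flat member list. A simplified sketch: struct member, enum access and member_access() are hypothetical; nested structs/unions, arrays and modifiers are left out, and for simplicity the sketch also requires the access to start at the member's first byte:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, flattened member descriptor. */
struct member {
	const char *name;
	unsigned int bit_off;   /* offset of the member in bits */
	unsigned int bit_size;  /* 0 for a regular member, >0 for a bitfield */
	unsigned int byte_size; /* size of the member's type in bytes */
	int is_ptr;
};

enum access { ACC_SCALAR, ACC_PTR, ACC_ERR };

/* Find the first member covering byte offset 'off' and check the access
 * size. Pointers keep being tracked; everything else, including
 * bitfields, is reported as a scalar. */
static enum access member_access(const struct member *m, int n, int off, int size)
{
	for (int i = 0; i < n; i++) {
		int moff = m[i].bit_off / 8;

		if (off < moff || off >= moff + (int)m[i].byte_size)
			continue;       /* offset not inside this member */
		if (off != moff || size != (int)m[i].byte_size)
			return ACC_ERR; /* partial or mis-sized access */
		if (m[i].bit_size)
			return ACC_SCALAR; /* bitfield: just a scalar */
		return m[i].is_ptr ? ACC_PTR : ACC_SCALAR;
	}
	return ACC_ERR;                 /* no member at this offset */
}
```

A bitfield read returns ACC_SCALAR just like a plain integer field, so the verifier side only needs to distinguish "scalar" from "pointer to another BTF type".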

>> +
>> +               if (off < moff / 8)
>> +                       continue;
>> +
>> +               /* type of the field */
>> +               mtype = btf_type_by_id(btf_vmlinux, member->type);
>> +               mname = __btf_name_by_offset(btf_vmlinux, member->name_off);
> 
> nit: you mix btf_name_by_offset and __btf_name_by_offset, any reason
> to not stick to just one of them (__btf_name_by_offset is safer, so
> that one, probably)?

I tried to use btf_name_by_offset() in verifier.c and
__btf_name_by_offset() in btf.c consistently.
Looks like I missed one spot.
Fixed.

> 
>> +
>> +               /* skip typedef, volotile modifiers */
> 
> typo: volatile
> 
> nit: also, volatile is not special, so either mention
> const/volatile/restrict or just "modifiers"?

fixed

>> +               while (btf_type_is_modifier(mtype))
>> +                       mtype = btf_type_by_id(btf_vmlinux, mtype->type);
>> +
>> +               if (btf_type_is_array(mtype))
>> +                       /* array deref is not supported yet */
>> +                       continue;
>> +
>> +               if (!btf_type_has_size(mtype) && !btf_type_is_ptr(mtype)) {
>> +                       bpf_verifier_log_write(env,
>> +                                              "field %s doesn't have size\n",
>> +                                              mname);
>> +                       return -EFAULT;
>> +               }
>> +               if (btf_type_is_ptr(mtype))
>> +                       msize = 8;
>> +               else
>> +                       msize = mtype->size;
>> +               if (off >= moff / 8 + msize)
>> +                       /* rare case, must be a field of the union with smaller size,
>> +                        * let's try another field
>> +                        */
>> +                       continue;
>> +               /* the 'off' we're looking for is either equal to start
>> +                * of this field or inside of this struct
>> +                */
>> +               if (btf_type_is_struct(mtype)) {
>> +                       /* our field must be inside that union or struct */
>> +                       t = mtype;
>> +
>> +                       /* adjust offset we're looking for */
>> +                       off -= moff / 8;
>> +                       goto again;
>> +               }
>> +               if (msize != size) {
>> +                       /* field access size doesn't match */
>> +                       bpf_verifier_log_write(env,
>> +                                              "cannot access %d bytes in struct %s field %s that has size %d\n",
>> +                                              size, tname, mname, msize);
>> +                       return -EACCES;
>> +               }
>> +
>> +               if (btf_type_is_ptr(mtype)) {
>> +                       const struct btf_type *stype;
>> +
>> +                       stype = btf_type_by_id(btf_vmlinux, mtype->type);
>> +                       if (btf_type_is_struct(stype)) {
> 
> again, resolving modifiers/typedefs? though in this case it might be
> too eager?...

done

>> +                               *next_btf_id = mtype->type;
>> +                               return PTR_TO_BTF_ID;
>> +                       }
>> +               }
>> +               /* all other fields are treated as scalars */
>> +               return SCALAR_VALUE;
>> +       }
>> +       bpf_verifier_log_write(env,
>> +                              "struct %s doesn't have field at offset %d\n",
>> +                              tname, off);
>> +       return -EINVAL;
>> +}
>> +
>>   void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
>>                         struct seq_file *m)
>>   {
>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> index 91c4db4d1c6a..3c155873ffea 100644
>> --- a/kernel/bpf/verifier.c
>> +++ b/kernel/bpf/verifier.c
>> @@ -406,6 +406,7 @@ static const char * const reg_type_str[] = {
>>          [PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
>>          [PTR_TO_TP_BUFFER]      = "tp_buffer",
>>          [PTR_TO_XDP_SOCK]       = "xdp_sock",
>> +       [PTR_TO_BTF_ID]         = "ptr_",
>>   };
>>
>>   static char slot_type_char[] = {
>> @@ -460,6 +461,10 @@ static void print_verifier_state(struct bpf_verifier_env *env,
>>                          /* reg->off should be 0 for SCALAR_VALUE */
>>                          verbose(env, "%lld", reg->var_off.value + reg->off);
>>                  } else {
>> +                       if (t == PTR_TO_BTF_ID)
>> +                               verbose(env, "%s",
>> +                                       btf_name_by_offset(btf_vmlinux,
>> +                                                          btf_type_by_id(btf_vmlinux, reg->btf_id)->name_off));
>>                          verbose(env, "(id=%d", reg->id);
>>                          if (reg_type_may_be_refcounted_or_null(t))
>>                                  verbose(env, ",ref_obj_id=%d", reg->ref_obj_id);
>> @@ -2337,10 +2342,12 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
>>
>>   /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
>>   static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
>> -                           enum bpf_access_type t, enum bpf_reg_type *reg_type)
>> +                           enum bpf_access_type t, enum bpf_reg_type *reg_type,
>> +                           u32 *btf_id)
>>   {
>>          struct bpf_insn_access_aux info = {
>>                  .reg_type = *reg_type,
>> +               .env = env,
>>          };
>>
>>          if (env->ops->is_valid_access &&
>> @@ -2354,7 +2361,10 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
>>                   */
>>                  *reg_type = info.reg_type;
>>
>> -               env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
>> +               if (*reg_type == PTR_TO_BTF_ID)
>> +                       *btf_id = info.btf_id;
>> +               else
>> +                       env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
> 
> ctx_field_size is passed through bpf_insn_access_aux, but btf_id is
> returned like this. Is there a reason to do it in two different ways?

insn_aux_data is permanent, meaning that every execution path into
this instruction has to have the same size and offset of ctx access.
I think btf-based ctx access doesn't have to be.
There is a check later to make sure that r1=ctx dereferencing btf is
permanent, but r1=btf dereferencing btf clearly is not.
I'm still not sure whether the former will stay permanent forever,
so I went with the quick hack above to reduce the amount of potential
refactoring later. I'll think about it more.
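To make the distinction concrete, here is a toy user-space model. All names are hypothetical and this is not verifier code. A per-instruction record is shared by every path that reaches the instruction, so the toy rejects a later path whose ctx field size disagrees. A btf_id kept in the register state, by contrast, can differ from path to path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-instruction record, shared by every path through the insn. */
struct toy_insn_aux {
	int seen;           /* has any path reached this instruction yet? */
	int ctx_field_size; /* size recorded by the first path */
};

/* Hypothetical register state, private to the path being explored. */
struct toy_reg {
	uint32_t btf_id;
};

/* Record a ctx access; this toy rejects a later path with a different size. */
static int toy_record_ctx_access(struct toy_insn_aux *aux, int ctx_field_size)
{
	if (!aux->seen) {
		aux->seen = 1;
		aux->ctx_field_size = ctx_field_size;
		return 0;
	}
	return aux->ctx_field_size == ctx_field_size ? 0 : -EINVAL;
}

/* A btf_id result only updates the destination register of the current path. */
static void toy_record_btf_access(struct toy_reg *dst, uint32_t btf_id)
{
	dst->btf_id = btf_id;
}
```

Keeping btf_id out of the per-instruction record means two paths can load different BTF types through the same instruction without tripping such a consistency check.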

>>                  /* remember the offset of last byte accessed in ctx */
>>                  if (env->prog->aux->max_ctx_offset < off + size)
>>                          env->prog->aux->max_ctx_offset = off + size;
>> @@ -2745,6 +2755,53 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
>>          reg->smax_value = reg->umax_value;
>>   }
>>
>> +static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
>> +                                  struct bpf_reg_state *regs,
>> +                                  int regno, int off, int size,
>> +                                  enum bpf_access_type atype,
>> +                                  int value_regno)
>> +{
>> +       struct bpf_reg_state *reg = regs + regno;
>> +       const struct btf_type *t = btf_type_by_id(btf_vmlinux, reg->btf_id);
>> +       const char *tname = btf_name_by_offset(btf_vmlinux, t->name_off);
>> +       u32 btf_id;
>> +       int ret;
>> +
>> +       if (atype != BPF_READ) {
>> +               verbose(env, "only read is supported\n");
>> +               return -EACCES;
>> +       }
>> +
>> +       if (off < 0) {
>> +               verbose(env,
>> +                       "R%d is ptr_%s negative access %d is not allowed\n",
> 
> totally nit: but for consistency sake (following variable offset error
> below): R%d is ptr_%s negative access: off=%d\n"?

fixed

>> +                       regno, tname, off);
>> +               return -EACCES;
>> +       }
>> +       if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
> 
> why so strict about reg->var_off.value?

It's the variable part of the register access.
There is no fixed offset to pass into btf_struct_access().
In other words, 'arrays of pointers to btf_id' are not supported yet.
Walk first :)
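A user-space sketch of the two offset checks in check_ptr_to_btf_access() as posted. The tnum layout (value plus a mask of unknown bits) and tnum_is_const() follow the verifier's tnum. The helper name toy_btf_access_offset() is hypothetical, and the -EACCES returned for the variable-offset case is an assumption, since that branch's body isn't shown in the quote.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same shape as the verifier's tnum: bits set in 'mask' are unknown. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

static bool tnum_is_const(struct tnum a)
{
	return a.mask == 0;
}

/*
 * Hypothetical model of the checks: a negative fixed offset is rejected,
 * and so is any variable part that is not the known constant 0, because
 * btf_struct_access() needs a single fixed offset. The error code for the
 * variable-offset case is illustrative.
 */
static int toy_btf_access_offset(int off, struct tnum var_off, int *fixed_off)
{
	if (off < 0)
		return -EACCES;
	if (!tnum_is_const(var_off) || var_off.value)
		return -EACCES;
	*fixed_off = off;
	return 0;
}
```

Only a register whose variable part is known to be exactly zero yields a fixed offset that can be handed to btf_struct_access().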



* Re: [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF
  2019-10-07 16:32   ` Alan Maguire
@ 2019-10-09  3:59     ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-09  3:59 UTC (permalink / raw)
  To: Alan Maguire, Alexei Starovoitov
  Cc: davem, daniel, x86, netdev, bpf, Kernel Team

On 10/7/19 9:32 AM, Alan Maguire wrote:
> This is an incredible leap forward! One question I have relates to
> another aspect of checking. As we move from bpf_probe_read() to "direct
> struct access", should we have the verifier insist on the same sort of
> checking we have for direct packet access? Specifically I'm thinking of
> the case where a typed pointer argument might be NULL and we attempt to
> dereference it.  This might be as simple as adding
> PTR_TO_BTF_ID to the reg_type_may_be_null() check:
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0717aac..6559b4d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -342,7 +342,8 @@ static bool reg_type_may_be_null(enum bpf_reg_type
> type)
>          return type == PTR_TO_MAP_VALUE_OR_NULL ||
>                 type == PTR_TO_SOCKET_OR_NULL ||
>                 type == PTR_TO_SOCK_COMMON_OR_NULL ||
> -              type == PTR_TO_TCP_SOCK_OR_NULL;
> +              type == PTR_TO_TCP_SOCK_OR_NULL ||
> +              type == PTR_TO_BTF_ID;
>   }
>   
> ...in order to ensure we don't dereference the pointer before checking for
> NULL.  Possibly I'm missing something that will do that NULL checking
> already?

well, it's not as simple as above ;) but the point is valid.
Yes, it's definitely possible to enforce a NULL check for every step
of btf pointer walking.

The thing is that in bpf tracing all scripts use bpf_probe_read
and walk the pointers without checking the error code.
In most cases the people who write those scripts know what they're
walking and know that the pointers will be valid.
Take execsnoop.py from bcc/tools. It's doing:
task->real_parent->tgid;
bcc magically replaces every arrow with bpf_probe_read.
I believe 'real_parent' is always valid, so the above should
return the expected data all the time.
But even if the pointer is not valid, the cost of checking it
in the program is not worth it: all accesses are probe_read-ed.
If we make the verifier force users to check every pointer for NULL,
the bpf C code will look very ugly.
And not only ugly, but slow: code size will increase, etc.
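A user-space sketch of why unchecked walking is tolerable. It shows roughly what task->real_parent->tgid turns into once every arrow becomes a probe read. mock_probe_read() is a hypothetical stand-in for bpf_probe_read(). The real helper zero-fills the destination when the read faults. User space can't recover from arbitrary bad addresses, so this mock treats only a NULL-derived source as a fault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel struct, reduced to the fields being walked. */
struct task_struct {
	struct task_struct *real_parent;
	int tgid;
};

/* Hypothetical mock: a NULL source "faults" and zero-fills dst. */
static int mock_probe_read(void *dst, size_t size, const void *src)
{
	if (src == NULL) {
		memset(dst, 0, size);
		return -EFAULT;
	}
	memcpy(dst, src, size);
	return 0;
}

/* Each arrow is a separate probe read; return codes are ignored. */
static int parent_tgid(const struct task_struct *task)
{
	struct task_struct *parent;
	int tgid;

	/* An address derived from a NULL pointer is modeled as a faulting read. */
	mock_probe_read(&parent, sizeof(parent), task ? &task->real_parent : NULL);
	mock_probe_read(&tgid, sizeof(tgid), parent ? &parent->tgid : NULL);
	return tgid;
}
```

A broken pointer anywhere in the chain yields 0 instead of a crash. That is why scripts can skip per-step NULL checks.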

Long term we're thinking of adding try/catch-like builtins.
So instead of doing
         __builtin_preserve_access_index(({
                 dev = skb->dev;
                 ifindex = dev->ifindex;
         }));
like the test from patch 10 is doing,
we will be able to write a BPF program like:
         __builtin_try(({
                 dev = skb->dev;
                 ifindex = dev->ifindex;
         }), ({
                 // handle page fault
                 bpf_printk("skb is NULL or skb->dev is NULL! sucks");
         }));
On the kernel side it will be supported through the extable mechanism.
Once we realized that the C language can be improved, we started doing so :)


* Re: [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF
  2019-10-09  3:30     ` Alexei Starovoitov
@ 2019-10-09  4:01       ` Andrii Nakryiko
  2019-10-09  5:10         ` Andrii Nakryiko
  0 siblings, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-09  4:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann, x86,
	Networking, bpf, Kernel Team

On Tue, Oct 8, 2019 at 8:31 PM Alexei Starovoitov <ast@fb.com> wrote:
>
> On 10/7/19 5:35 PM, Andrii Nakryiko wrote:
> > On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
> >>
> >> libbpf analyzes bpf C program, searches in-kernel BTF for given type name
> >> and stores it into expected_attach_type.
> >> The kernel verifier expects this btf_id to point to something like:
> >> typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
> >> which represents signature of raw_tracepoint "kfree_skb".
> >>
> >> Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb'
> >> and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint.
> >> In first case it passes btf_id of 'struct sk_buff *' back to the verifier core
> >> and 'void *' in second case.
> >>
> >> Then the verifier tracks PTR_TO_BTF_ID as any other pointer type.
> >> Like PTR_TO_SOCKET points to 'struct bpf_sock',
> >> PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on.
> >> PTR_TO_BTF_ID points to in-kernel structs.
> >> If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF
> >> then PTR_TO_BTF_ID#1234 points to one of in kernel skbs.
> >>
> >> When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)r1 + 32)
> >> the btf_struct_access() checks which field of 'struct sk_buff' is
> >> at offset 32. Checks that size of access matches type definition
> >> of the field and continues to track the dereferenced type.
> >> If that field was a pointer to 'struct net_device' the r2's type
> >> will be PTR_TO_BTF_ID#456. Where 456 is btf_id of 'struct net_device'
> >> in vmlinux's BTF.
> >>
> >> Such verifier anlaysis prevents "cheating" in BPF C program.
> >
> > typo: analysis
>
> I did run spellcheck, but couldn't interpret its input :)
>
> >
> >> The program cannot cast arbitrary pointer to 'struct sk_buff *'
> >> and access it. C compiler would allow type cast, of course,
> >> but the verifier will notice type mismatch based on BPF assembly
> >> and in-kernel BTF.
> >>
> >> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> >> ---
> >>   include/linux/bpf.h          |  15 ++-
> >>   include/linux/bpf_verifier.h |   2 +
> >>   kernel/bpf/btf.c             | 179 +++++++++++++++++++++++++++++++++++
> >>   kernel/bpf/verifier.c        |  69 +++++++++++++-
> >>   kernel/trace/bpf_trace.c     |   2 +-
> >>   5 files changed, 262 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> >> index 5b9d22338606..2dc3a7c313e9 100644
> >> --- a/include/linux/bpf.h
> >> +++ b/include/linux/bpf.h
> >> @@ -281,6 +281,7 @@ enum bpf_reg_type {
> >>          PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> >>          PTR_TO_TP_BUFFER,        /* reg points to a writable raw tp's buffer */
> >>          PTR_TO_XDP_SOCK,         /* reg points to struct xdp_sock */
> >> +       PTR_TO_BTF_ID,
> >
> > comments for consistency? ;)
>
> fixed
>
> >>   };
> >>
> >>   /* The information passed from prog-specific *_is_valid_access
> >> @@ -288,7 +289,11 @@ enum bpf_reg_type {
> >>    */
> >>   struct bpf_insn_access_aux {
> >>          enum bpf_reg_type reg_type;
> >> -       int ctx_field_size;
> >> +       union {
> >> +               int ctx_field_size;
> >> +               u32 btf_id;
> >> +       };
> >> +       struct bpf_verifier_env *env; /* for verbose logs */
> >>   };
> >>
> >>   static inline void
> >> @@ -747,6 +752,14 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> >>   int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> >>                                       const union bpf_attr *kattr,
> >>                                       union bpf_attr __user *uattr);
> >> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >> +                   const struct bpf_prog *prog,
> >> +                   struct bpf_insn_access_aux *info);
> >> +int btf_struct_access(struct bpf_verifier_env *env,
> >> +                     const struct btf_type *t, int off, int size,
> >> +                     enum bpf_access_type atype,
> >> +                     u32 *next_btf_id);
> >> +
> >>   #else /* !CONFIG_BPF_SYSCALL */
> >>   static inline struct bpf_prog *bpf_prog_get(u32 ufd)
> >>   {
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index 432ba8977a0a..e21782f49c45 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -52,6 +52,8 @@ struct bpf_reg_state {
> >>                   */
> >>                  struct bpf_map *map_ptr;
> >>
> >> +               u32 btf_id; /* for PTR_TO_BTF_ID */
> >> +
> >>                  /* Max size from any of the above. */
> >>                  unsigned long raw;
> >>          };
> >> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> >> index 848f9d4b9d7e..61ff8a54ca22 100644
> >> --- a/kernel/bpf/btf.c
> >> +++ b/kernel/bpf/btf.c
> >> @@ -3433,6 +3433,185 @@ struct btf *btf_parse_vmlinux(void)
> >>          return ERR_PTR(err);
> >>   }
> >>
> >> +extern struct btf *btf_vmlinux;
> >> +
> >> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> >> +                   const struct bpf_prog *prog,
> >> +                   struct bpf_insn_access_aux *info)
> >> +{
> >> +       u32 btf_id = prog->expected_attach_type;
> >> +       const struct btf_param *args;
> >> +       const struct btf_type *t;
> >> +       const char prefix[] = "btf_trace_";
> >> +       const char *tname;
> >> +       u32 nr_args;
> >> +
> >> +       if (!btf_id)
> >> +               return true;
> >> +
> >> +       if (IS_ERR(btf_vmlinux)) {
> >> +               bpf_verifier_log_write(info->env, "btf_vmlinux is malformed\n");
> >> +               return false;
> >> +       }
> >> +
> >> +       t = btf_type_by_id(btf_vmlinux, btf_id);
> >> +       if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF) {
> >> +               bpf_verifier_log_write(info->env, "btf_id is invalid\n");
> >> +               return false;
> >> +       }
> >> +
> >> +       tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
> >> +       if (strncmp(prefix, tname, sizeof(prefix) - 1)) {
> >> +               bpf_verifier_log_write(info->env,
> >> +                                      "btf_id points to wrong type name %s\n",
> >> +                                      tname);
> >> +               return false;
> >> +       }
> >> +       tname += sizeof(prefix) - 1;
> >> +
> >> +       t = btf_type_by_id(btf_vmlinux, t->type);
> >> +       if (!btf_type_is_ptr(t))
> >> +               return false;
> >> +       t = btf_type_by_id(btf_vmlinux, t->type);
> >> +       if (!btf_type_is_func_proto(t))
> >> +               return false;
> >
> > All negative cases but these two have helpful log messages, please add
> > two more for these.
>
> no. not here. This is a part of the typedef construction from patch 1.
> It cannot be anything else. If btf_id points to a typedef and the
> typedef has the btf_trace_ prefix, it has to be correct.
> The two checks above are sanity checks of the kernel build.

Fair enough.

>
> >
> >> +
> >> +       args = (const struct btf_param *)(t + 1);
> >
> > IMO, doing args++ (and leaving comment why) here instead of adjusting
> > `off/8 + 1` below is cleaner.
>
> I tried your suggestion and it doesn't look any better, but why not,
> since I've coded it anyway.
>
> >> +       /* skip first 'void *__data' argument in btf_trace_* */
> >> +       nr_args = btf_type_vlen(t) - 1;
> >> +       if (off >= nr_args * 8) {
> >
> > Looks like you forgot to check that `off % 8 == 0`?
>
> great catch. yes. fixed
>
> >> +               bpf_verifier_log_write(info->env,
> >> +                                      "raw_tp '%s' doesn't have %d-th argument\n",
> >> +                                      tname, off / 8);
> >> +               return false;
> >> +       }
> >> +
> >> +       /* raw tp arg is off / 8, but typedef has extra 'void *', hence +1 */
> >> +       t = btf_type_by_id(btf_vmlinux, args[off / 8 + 1].type);
> >> +       if (btf_type_is_int(t))
> >
> > this is too limiting, you need to strip const/volatile/restrict and
> > resolve typedef's (e.g., size_t, __u64 -- that's all typedefs).
>
> right. done.
>
> > also probably want to allow enums.
>
> eventually yes. I prefer to walk first.

sure, but it is just an integer (scalar), so with just `||
btf_is_enum(t)` you get that support for free.
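A toy sketch of the suggested argument classification. It strips const/volatile/restrict and typedefs such as size_t or __u64 first, then accepts ints and enums as scalars. The toy_* kinds and types are hypothetical and only loosely modeled on BTF kinds. They are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy type kinds loosely modeled on BTF kinds; not the kernel's values. */
enum toy_kind {
	TOY_INT, TOY_ENUM, TOY_PTR, TOY_STRUCT,
	TOY_TYPEDEF, TOY_CONST, TOY_VOLATILE, TOY_RESTRICT,
};

struct toy_type {
	enum toy_kind kind;
	const struct toy_type *type; /* referenced type for ptr/typedef/modifiers */
};

static bool toy_is_modifier_or_typedef(const struct toy_type *t)
{
	return t->kind == TOY_TYPEDEF || t->kind == TOY_CONST ||
	       t->kind == TOY_VOLATILE || t->kind == TOY_RESTRICT;
}

/* Follow typedefs and qualifiers down to the underlying type. */
static const struct toy_type *toy_skip_mods(const struct toy_type *t)
{
	while (t && toy_is_modifier_or_typedef(t))
		t = t->type;
	return t;
}

/* An argument is a plain scalar if it resolves to an int or an enum. */
static bool toy_arg_is_scalar(const struct toy_type *t)
{
	t = toy_skip_mods(t);
	return t && (t->kind == TOY_INT || t->kind == TOY_ENUM);
}
```

Without the modifier/typedef walk, an argument declared as `const __u64` would be rejected even though it is an integer underneath.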

>
> > btw, atomic_t is a struct, so might want to allow up to 8 byte
> > struct/unions (passed by value) reads? might never happen for
> > tracepoint, not sure
>
> maybe in the future. walk first.
>
> >
> >> +               /* accessing a scalar */
> >> +               return true;
> >> +       if (!btf_type_is_ptr(t)) {
> >
> > similar to above, modifiers and typedef resolution has to happen first
>
> done.
>
> >> +               bpf_verifier_log_write(info->env,
> >> +                                      "raw_tp '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
> >> +                                      tname, off / 8,
> >> +                                      __btf_name_by_offset(btf_vmlinux, t->name_off),
> >> +                                      btf_kind_str[BTF_INFO_KIND(t->info)]);
> >> +               return false;
> >> +       }
> >> +       if (t->type == 0)
> >> +               /* This is a pointer to void.
> >> +                * It is the same as scalar from the verifier safety pov.
> >> +                * No further pointer walking is allowed.
> >> +                */
> >> +               return true;
> >> +
> >> +       /* this is a pointer to another type */
> >> +       info->reg_type = PTR_TO_BTF_ID;
> >> +       info->btf_id = t->type;
> >> +
> >> +       t = btf_type_by_id(btf_vmlinux, t->type);
> >> +       bpf_verifier_log_write(info->env,
> >> +                              "raw_tp '%s' arg%d has btf_id %d type %s '%s'\n",
> >> +                              tname, off / 8, info->btf_id,
> >> +                              btf_kind_str[BTF_INFO_KIND(t->info)],
> >> +                              __btf_name_by_offset(btf_vmlinux, t->name_off));
> >> +       return true;
> >> +}
> >> +
> >> +int btf_struct_access(struct bpf_verifier_env *env,
> >> +                     const struct btf_type *t, int off, int size,
> >> +                     enum bpf_access_type atype,
> >> +                     u32 *next_btf_id)
> >> +{
> >> +       const struct btf_member *member;
> >> +       const struct btf_type *mtype;
> >> +       const char *tname, *mname;
> >> +       int i, moff = 0, msize;
> >> +
> >> +again:
> >> +       tname = btf_name_by_offset(btf_vmlinux, t->name_off);
> >> +       if (!btf_type_is_struct(t)) {
> >
> > see above about typedef/modifiers resolution
>
> here actually skipping is not necessary.
>
> >
> >> +               bpf_verifier_log_write(env, "Type '%s' is not a struct", tname);
> >> +               return -EINVAL;
> >> +       }
> >> +       if (btf_type_vlen(t) < 1) {
> >> +               bpf_verifier_log_write(env, "struct %s doesn't have fields", tname);
> >> +               return -EINVAL;
> >> +       }
> >
> > kind of redundant check...
>
> I wanted to give a helpful message, but since you asked:
> there are 394 empty struct FOO {}; definitions in the kernel,
> and probably none of them are going to appear in bpf tracing,
> so I deleted that check.
>
> >
> >> +
> >> +       for_each_member(i, t, member) {
> >> +
> >> +               /* offset of the field */
> >> +               moff = btf_member_bit_offset(t, member);
> >
> > what do you want to do with bitfields?
>
> they're scalars.

well, I meant that `off` is an offset in bytes, while moff is an offset
in bits, and for bitfields it might not be a multiple of 8. So after the
check below (off < moff/8), it doesn't necessarily follow that `off * 8
== moff`, and you'll be "capturing" the wrong field. So you probably need
an extra check for that?

More generally, `off` can point into the middle of some field rather
than its beginning (it's just a byte offset, so it can be arbitrary).
So there are two things here: detecting this situation, and deciding
what to do with it: reject, or assume an opaque scalar
value?
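A user-space model of the two skip conditions in btf_struct_access() as posted illustrates both points. It runs over a hypothetical flattened member list and assumes each bitfield's type is a 4-byte unsigned int, matching the patch's use of the member type's size. A byte offset inside a field matches that field, and `moff / 8` truncates a bitfield's bit offset. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened member: bit offset plus byte size of its type. */
struct toy_member {
	const char *name;
	int bit_off; /* like btf_member_bit_offset(): may not be a multiple of 8 */
	int size;    /* bytes */
};

/*
 * Return the first member whose byte range contains 'off', using the two
 * skip conditions from the patch. Integer division truncates bit_off / 8,
 * so a bitfield starting mid-byte is treated as starting at that byte.
 */
static const struct toy_member *toy_find_member(const struct toy_member *m,
						size_t n, int off)
{
	for (size_t i = 0; i < n; i++) {
		int moff = m[i].bit_off;

		if (off < moff / 8)
			continue; /* field starts after 'off' */
		if (off >= moff / 8 + m[i].size)
			continue; /* field ends at or before 'off' */
		return &m[i];
	}
	return NULL;
}
```

For `struct { int a; unsigned int lo:4, hi:4; }`, byte 4 always resolves to `lo`. `hi` starts at bit 36, and 36 / 8 == 4 even though 36 != 4 * 8. A byte offset of 2 lands in the middle of `a` and still matches it.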

>
> >> +
> >> +               if (off < moff / 8)
> >> +                       continue;

Thinking about this again, this seems like an inverted condition.
Shouldn't it be "skip fields until you find a field offset equal to or
greater than our desired offset":

if (moff < off * 8)
    continue;

> >> +
> >> +               /* type of the field */
> >> +               mtype = btf_type_by_id(btf_vmlinux, member->type);
> >> +               mname = __btf_name_by_offset(btf_vmlinux, member->name_off);
> >
> > nit: you mix btf_name_by_offset and __btf_name_by_offset, any reason
> > to not stick to just one of them (__btf_name_by_offset is safer, so
> > that one, probably)?
>
> I tried to use btf_name_by_offset() in verifier.c and
> __btf_name_by_offset() in btf.c consistently.
> Looks like I missed one spot.
> Fixed.
>
> >
> >> +
> >> +               /* skip typedef, volotile modifiers */
> >
> > typo: volatile
> >
> > nit: also, volatile is not special, so either mention
> > const/volatile/restrict or just "modifiers"?
>
> fixed
>
> >> +               while (btf_type_is_modifier(mtype))
> >> +                       mtype = btf_type_by_id(btf_vmlinux, mtype->type);
> >> +
> >> +               if (btf_type_is_array(mtype))
> >> +                       /* array deref is not supported yet */
> >> +                       continue;
> >> +
> >> +               if (!btf_type_has_size(mtype) && !btf_type_is_ptr(mtype)) {
> >> +                       bpf_verifier_log_write(env,
> >> +                                              "field %s doesn't have size\n",
> >> +                                              mname);
> >> +                       return -EFAULT;
> >> +               }
> >> +               if (btf_type_is_ptr(mtype))
> >> +                       msize = 8;
> >> +               else
> >> +                       msize = mtype->size;
> >> +               if (off >= moff / 8 + msize)
> >> +                       /* rare case, must be a field of the union with smaller size,
> >> +                        * let's try another field
> >> +                        */
> >> +                       continue;
> >> +               /* the 'off' we're looking for is either equal to start
> >> +                * of this field or inside of this struct
> >> +                */
> >> +               if (btf_type_is_struct(mtype)) {
> >> +                       /* our field must be inside that union or struct */
> >> +                       t = mtype;
> >> +
> >> +                       /* adjust offset we're looking for */
> >> +                       off -= moff / 8;
> >> +                       goto again;
> >> +               }
> >> +               if (msize != size) {
> >> +                       /* field access size doesn't match */
> >> +                       bpf_verifier_log_write(env,
> >> +                                              "cannot access %d bytes in struct %s field %s that has size %d\n",
> >> +                                              size, tname, mname, msize);
> >> +                       return -EACCES;
> >> +               }
> >> +
> >> +               if (btf_type_is_ptr(mtype)) {
> >> +                       const struct btf_type *stype;
> >> +
> >> +                       stype = btf_type_by_id(btf_vmlinux, mtype->type);
> >> +                       if (btf_type_is_struct(stype)) {
> >
> > again, resolving modifiers/typedefs? though in this case it might be
> > too eager?...
>
> done
>
> >> +                               *next_btf_id = mtype->type;
> >> +                               return PTR_TO_BTF_ID;
> >> +                       }
> >> +               }
> >> +               /* all other fields are treated as scalars */
> >> +               return SCALAR_VALUE;
> >> +       }
> >> +       bpf_verifier_log_write(env,
> >> +                              "struct %s doesn't have field at offset %d\n",
> >> +                              tname, off);
> >> +       return -EINVAL;
> >> +}
> >> +
> >>   void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
> >>                         struct seq_file *m)
> >>   {
> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> >> index 91c4db4d1c6a..3c155873ffea 100644
> >> --- a/kernel/bpf/verifier.c
> >> +++ b/kernel/bpf/verifier.c
> >> @@ -406,6 +406,7 @@ static const char * const reg_type_str[] = {
> >>          [PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
> >>          [PTR_TO_TP_BUFFER]      = "tp_buffer",
> >>          [PTR_TO_XDP_SOCK]       = "xdp_sock",
> >> +       [PTR_TO_BTF_ID]         = "ptr_",
> >>   };
> >>
> >>   static char slot_type_char[] = {
> >> @@ -460,6 +461,10 @@ static void print_verifier_state(struct bpf_verifier_env *env,
> >>                          /* reg->off should be 0 for SCALAR_VALUE */
> >>                          verbose(env, "%lld", reg->var_off.value + reg->off);
> >>                  } else {
> >> +                       if (t == PTR_TO_BTF_ID)
> >> +                               verbose(env, "%s",
> >> +                                       btf_name_by_offset(btf_vmlinux,
> >> +                                                          btf_type_by_id(btf_vmlinux, reg->btf_id)->name_off));
> >>                          verbose(env, "(id=%d", reg->id);
> >>                          if (reg_type_may_be_refcounted_or_null(t))
> >>                                  verbose(env, ",ref_obj_id=%d", reg->ref_obj_id);
> >> @@ -2337,10 +2342,12 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
> >>
> >>   /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
> >>   static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
> >> -                           enum bpf_access_type t, enum bpf_reg_type *reg_type)
> >> +                           enum bpf_access_type t, enum bpf_reg_type *reg_type,
> >> +                           u32 *btf_id)
> >>   {
> >>          struct bpf_insn_access_aux info = {
> >>                  .reg_type = *reg_type,
> >> +               .env = env,
> >>          };
> >>
> >>          if (env->ops->is_valid_access &&
> >> @@ -2354,7 +2361,10 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
> >>                   */
> >>                  *reg_type = info.reg_type;
> >>
> >> -               env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
> >> +               if (*reg_type == PTR_TO_BTF_ID)
> >> +                       *btf_id = info.btf_id;
> >> +               else
> >> +                       env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
> >
> > ctx_field_size is passed through bpf_insn_access_aux, but btf_id is
> > returned like this. Is there a reason to do it in two different ways?
>
> insn_aux_data is permanent, meaning that every execution path into
> this instruction has to have the same size and offset of ctx access.
> I think btf-based ctx access doesn't have to be.
> There is a check later to make sure that r1=ctx dereferencing btf is
> permanent, but r1=btf dereferencing btf clearly is not.
> I'm still not sure whether the former will stay permanent forever,
> so I went with the quick hack above to reduce the amount of potential
> refactoring later. I'll think about it more.

Yeah, it makes sense. I agree we shouldn't enforce the same BTF type ID
across all execution paths, if possible. I just saw you added btf_id
both to struct bpf_insn_access_aux and to reg_state, so I was wondering
what's going on.

>
> >>                  /* remember the offset of last byte accessed in ctx */
> >>                  if (env->prog->aux->max_ctx_offset < off + size)
> >>                          env->prog->aux->max_ctx_offset = off + size;
> >> @@ -2745,6 +2755,53 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
> >>          reg->smax_value = reg->umax_value;
> >>   }
> >>
> >> +static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> >> +                                  struct bpf_reg_state *regs,
> >> +                                  int regno, int off, int size,
> >> +                                  enum bpf_access_type atype,
> >> +                                  int value_regno)
> >> +{
> >> +       struct bpf_reg_state *reg = regs + regno;
> >> +       const struct btf_type *t = btf_type_by_id(btf_vmlinux, reg->btf_id);
> >> +       const char *tname = btf_name_by_offset(btf_vmlinux, t->name_off);
> >> +       u32 btf_id;
> >> +       int ret;
> >> +
> >> +       if (atype != BPF_READ) {
> >> +               verbose(env, "only read is supported\n");
> >> +               return -EACCES;
> >> +       }
> >> +
> >> +       if (off < 0) {
> >> +               verbose(env,
> >> +                       "R%d is ptr_%s negative access %d is not allowed\n",
> >
> > totally nit: but for consistency sake (following variable offset error
> > below): R%d is ptr_%s negative access: off=%d\n"?
>
> fixed
>
> >> +                       regno, tname, off);
> >> +               return -EACCES;
> >> +       }
> >> +       if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
> >
> > why so strict about reg->var_off.value?
>
> It's the variable part of the register access.
> There is no fixed offset to pass into btf_struct_access().
> In other words, 'arrays of pointers to btf_id' are not supported yet.
> Walk first :)
>

yeah, I'm fine with that


* Re: [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF
  2019-10-09  4:01       ` Andrii Nakryiko
@ 2019-10-09  5:10         ` Andrii Nakryiko
  2019-10-10  3:54           ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-09  5:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann, x86,
	Networking, bpf, Kernel Team

On Tue, Oct 8, 2019 at 9:01 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Oct 8, 2019 at 8:31 PM Alexei Starovoitov <ast@fb.com> wrote:
> >
> > On 10/7/19 5:35 PM, Andrii Nakryiko wrote:
> > > On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
> > >>
> > >> libbpf analyzes the bpf C program, searches in-kernel BTF for the given type name
> > >> and stores its btf_id into expected_attach_type.
> > >> The kernel verifier expects this btf_id to point to something like:
> > >> typedef void (*btf_trace_kfree_skb)(void *, struct sk_buff *skb, void *loc);
> > >> which represents signature of raw_tracepoint "kfree_skb".
> > >>
> > >> Then btf_ctx_access() matches ctx+0 access in bpf program with 'skb'
> > >> and 'ctx+8' access with 'loc' arguments of "kfree_skb" tracepoint.
> > >> In first case it passes btf_id of 'struct sk_buff *' back to the verifier core
> > >> and 'void *' in second case.
> > >>
> > >> Then the verifier tracks PTR_TO_BTF_ID as any other pointer type.
> > >> Like PTR_TO_SOCKET points to 'struct bpf_sock',
> > >> PTR_TO_TCP_SOCK points to 'struct bpf_tcp_sock', and so on.
> > >> PTR_TO_BTF_ID points to in-kernel structs.
> > >> If 1234 is btf_id of 'struct sk_buff' in vmlinux's BTF
> > >> then PTR_TO_BTF_ID#1234 points to one of in kernel skbs.
> > >>
> > >> When PTR_TO_BTF_ID#1234 is dereferenced (like r2 = *(u64 *)(r1 + 32))
> > >> btf_struct_access() checks which field of 'struct sk_buff' is
> > >> at offset 32, checks that the size of the access matches the type definition
> > >> of the field, and continues to track the dereferenced type.
> > >> If that field was a pointer to 'struct net_device', r2's type
> > >> will be PTR_TO_BTF_ID#456, where 456 is the btf_id of 'struct net_device'
> > >> in vmlinux's BTF.
> > >>
> > >> Such verifier anlaysis prevents "cheating" in BPF C program.
> > >
> > > typo: analysis
> >
> > I did run spellcheck, but couldn't interpret its output :)
> >
> > >
> > >> The program cannot cast arbitrary pointer to 'struct sk_buff *'
> > >> and access it. C compiler would allow type cast, of course,
> > >> but the verifier will notice type mismatch based on BPF assembly
> > >> and in-kernel BTF.
> > >>
> > >> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > >> ---
> > >>   include/linux/bpf.h          |  15 ++-
> > >>   include/linux/bpf_verifier.h |   2 +
> > >>   kernel/bpf/btf.c             | 179 +++++++++++++++++++++++++++++++++++
> > >>   kernel/bpf/verifier.c        |  69 +++++++++++++-
> > >>   kernel/trace/bpf_trace.c     |   2 +-
> > >>   5 files changed, 262 insertions(+), 5 deletions(-)
> > >>
> > >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > >> index 5b9d22338606..2dc3a7c313e9 100644
> > >> --- a/include/linux/bpf.h
> > >> +++ b/include/linux/bpf.h
> > >> @@ -281,6 +281,7 @@ enum bpf_reg_type {
> > >>          PTR_TO_TCP_SOCK_OR_NULL, /* reg points to struct tcp_sock or NULL */
> > >>          PTR_TO_TP_BUFFER,        /* reg points to a writable raw tp's buffer */
> > >>          PTR_TO_XDP_SOCK,         /* reg points to struct xdp_sock */
> > >> +       PTR_TO_BTF_ID,
> > >
> > > comments for consistency? ;)
> >
> > fixed
> >
> > >>   };
> > >>
> > >>   /* The information passed from prog-specific *_is_valid_access
> > >> @@ -288,7 +289,11 @@ enum bpf_reg_type {
> > >>    */
> > >>   struct bpf_insn_access_aux {
> > >>          enum bpf_reg_type reg_type;
> > >> -       int ctx_field_size;
> > >> +       union {
> > >> +               int ctx_field_size;
> > >> +               u32 btf_id;
> > >> +       };
> > >> +       struct bpf_verifier_env *env; /* for verbose logs */
> > >>   };
> > >>
> > >>   static inline void
> > >> @@ -747,6 +752,14 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
> > >>   int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
> > >>                                       const union bpf_attr *kattr,
> > >>                                       union bpf_attr __user *uattr);
> > >> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > >> +                   const struct bpf_prog *prog,
> > >> +                   struct bpf_insn_access_aux *info);
> > >> +int btf_struct_access(struct bpf_verifier_env *env,
> > >> +                     const struct btf_type *t, int off, int size,
> > >> +                     enum bpf_access_type atype,
> > >> +                     u32 *next_btf_id);
> > >> +
> > >>   #else /* !CONFIG_BPF_SYSCALL */
> > >>   static inline struct bpf_prog *bpf_prog_get(u32 ufd)
> > >>   {
> > >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > >> index 432ba8977a0a..e21782f49c45 100644
> > >> --- a/include/linux/bpf_verifier.h
> > >> +++ b/include/linux/bpf_verifier.h
> > >> @@ -52,6 +52,8 @@ struct bpf_reg_state {
> > >>                   */
> > >>                  struct bpf_map *map_ptr;
> > >>
> > >> +               u32 btf_id; /* for PTR_TO_BTF_ID */
> > >> +
> > >>                  /* Max size from any of the above. */
> > >>                  unsigned long raw;
> > >>          };
> > >> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > >> index 848f9d4b9d7e..61ff8a54ca22 100644
> > >> --- a/kernel/bpf/btf.c
> > >> +++ b/kernel/bpf/btf.c
> > >> @@ -3433,6 +3433,185 @@ struct btf *btf_parse_vmlinux(void)
> > >>          return ERR_PTR(err);
> > >>   }
> > >>
> > >> +extern struct btf *btf_vmlinux;
> > >> +
> > >> +bool btf_ctx_access(int off, int size, enum bpf_access_type type,
> > >> +                   const struct bpf_prog *prog,
> > >> +                   struct bpf_insn_access_aux *info)
> > >> +{
> > >> +       u32 btf_id = prog->expected_attach_type;
> > >> +       const struct btf_param *args;
> > >> +       const struct btf_type *t;
> > >> +       const char prefix[] = "btf_trace_";
> > >> +       const char *tname;
> > >> +       u32 nr_args;
> > >> +
> > >> +       if (!btf_id)
> > >> +               return true;
> > >> +
> > >> +       if (IS_ERR(btf_vmlinux)) {
> > >> +               bpf_verifier_log_write(info->env, "btf_vmlinux is malformed\n");
> > >> +               return false;
> > >> +       }
> > >> +
> > >> +       t = btf_type_by_id(btf_vmlinux, btf_id);
> > >> +       if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF) {
> > >> +               bpf_verifier_log_write(info->env, "btf_id is invalid\n");
> > >> +               return false;
> > >> +       }
> > >> +
> > >> +       tname = __btf_name_by_offset(btf_vmlinux, t->name_off);
> > >> +       if (strncmp(prefix, tname, sizeof(prefix) - 1)) {
> > >> +               bpf_verifier_log_write(info->env,
> > >> +                                      "btf_id points to wrong type name %s\n",
> > >> +                                      tname);
> > >> +               return false;
> > >> +       }
> > >> +       tname += sizeof(prefix) - 1;
> > >> +
> > >> +       t = btf_type_by_id(btf_vmlinux, t->type);
> > >> +       if (!btf_type_is_ptr(t))
> > >> +               return false;
> > >> +       t = btf_type_by_id(btf_vmlinux, t->type);
> > >> +       if (!btf_type_is_func_proto(t))
> > >> +               return false;
> > >
> > > All negative cases but these two have helpful log messages, please add
> > > two more for these.
> >
> > no. not here. This is a part of typedef construction from patch 1.
> > It cannot be anything else. If btf_id points to typedef and
> > typedef has btf_trace_ prefix it has to be correct.
> > Above two checks are checking sanity of kernel build.
>
> Fair enough.
>
> >
> > >
> > >> +
> > >> +       args = (const struct btf_param *)(t + 1);
> > >
> > > IMO, doing args++ (and leaving comment why) here instead of adjusting
> > > `off/8 + 1` below is cleaner.
> >
> > I tried your suggestion and it doesn't look any better, but why not,
> > since I've coded it anyway.
> >
> > >> +       /* skip first 'void *__data' argument in btf_trace_* */
> > >> +       nr_args = btf_type_vlen(t) - 1;
> > >> +       if (off >= nr_args * 8) {
> > >
> > > Looks like you forgot to check that `off % 8 == 0`?
> >
> > great catch. yes. fixed
> >
> > >> +               bpf_verifier_log_write(info->env,
> > >> +                                      "raw_tp '%s' doesn't have %d-th argument\n",
> > >> +                                      tname, off / 8);
> > >> +               return false;
> > >> +       }
> > >> +
> > >> +       /* raw tp arg is off / 8, but typedef has extra 'void *', hence +1 */
> > >> +       t = btf_type_by_id(btf_vmlinux, args[off / 8 + 1].type);
> > >> +       if (btf_type_is_int(t))
> > >
> > > this is too limiting, you need to strip const/volatile/restrict and
> > > resolve typedef's (e.g., size_t, __u64 -- that's all typedefs).
> >
> > right. done.
> >
> > > also probably want to allow enums.
> >
> > eventually yes. I prefer to walk first.
>
> sure, but it is just an integer (scalar) so with just `||
> btf_is_enum(t)` you get that support for free.
>
> >
> > > btw, atomic_t is a struct, so might want to allow up to 8 byte
> > > struct/unions (passed by value) reads? might never happen for
> > > tracepoint, not sure
> >
> > may be in the future. walk first.
> >
> > >
> > >> +               /* accessing a scalar */
> > >> +               return true;
> > >> +       if (!btf_type_is_ptr(t)) {
> > >
> > > similar to above, modifiers and typedef resolution has to happen first
> >
> > done.
> >
> > >> +               bpf_verifier_log_write(info->env,
> > >> +                                      "raw_tp '%s' arg%d '%s' has type %s. Only pointer access is allowed\n",
> > >> +                                      tname, off / 8,
> > >> +                                      __btf_name_by_offset(btf_vmlinux, t->name_off),
> > >> +                                      btf_kind_str[BTF_INFO_KIND(t->info)]);
> > >> +               return false;
> > >> +       }
> > >> +       if (t->type == 0)
> > >> +               /* This is a pointer to void.
> > >> +                * It is the same as scalar from the verifier safety pov.
> > >> +                * No further pointer walking is allowed.
> > >> +                */
> > >> +               return true;
> > >> +
> > >> +       /* this is a pointer to another type */
> > >> +       info->reg_type = PTR_TO_BTF_ID;
> > >> +       info->btf_id = t->type;
> > >> +
> > >> +       t = btf_type_by_id(btf_vmlinux, t->type);
> > >> +       bpf_verifier_log_write(info->env,
> > >> +                              "raw_tp '%s' arg%d has btf_id %d type %s '%s'\n",
> > >> +                              tname, off / 8, info->btf_id,
> > >> +                              btf_kind_str[BTF_INFO_KIND(t->info)],
> > >> +                              __btf_name_by_offset(btf_vmlinux, t->name_off));
> > >> +       return true;
> > >> +}
> > >> +
> > >> +int btf_struct_access(struct bpf_verifier_env *env,
> > >> +                     const struct btf_type *t, int off, int size,
> > >> +                     enum bpf_access_type atype,
> > >> +                     u32 *next_btf_id)
> > >> +{
> > >> +       const struct btf_member *member;
> > >> +       const struct btf_type *mtype;
> > >> +       const char *tname, *mname;
> > >> +       int i, moff = 0, msize;
> > >> +
> > >> +again:
> > >> +       tname = btf_name_by_offset(btf_vmlinux, t->name_off);
> > >> +       if (!btf_type_is_struct(t)) {
> > >
> > > see above about typedef/modifiers resolution
> >
> > here actually skipping is not necessary.
> >
> > >
> > >> +               bpf_verifier_log_write(env, "Type '%s' is not a struct", tname);
> > >> +               return -EINVAL;
> > >> +       }
> > >> +       if (btf_type_vlen(t) < 1) {
> > >> +               bpf_verifier_log_write(env, "struct %s doesn't have fields", tname);
> > >> +               return -EINVAL;
> > >> +       }
> > >
> > > kind of redundant check...
> >
> > I wanted to give a helpful message, but since you asked:
> > there are 394 empty 'struct FOO {};' definitions in the kernel.
> > And probably none of them are going to appear in bpf tracing,
> > so I deleted that check.
> >
> > >
> > >> +
> > >> +       for_each_member(i, t, member) {
> > >> +
> > >> +               /* offset of the field */
> > >> +               moff = btf_member_bit_offset(t, member);
> > >
> > > what do you want to do with bitfields?
> >
> > they're scalars.
>
> well, I meant that `off` is offset in bytes, while moff is offset in
> bits and for bitfield fields it might not be a multiple of 8, so after
> check below (off < moff/8) it doesn't necessarily mean that `off * 8
> == moff` and you'll be "capturing" wrong field. So you probably need
> extra check for that?
>
> More generally, also, `off` can point into the middle of some field,
> not the beginning of the field (because it's just a byte offset, so
> can be arbitrary). So there are two things there: detecting this
> situation and what to do with this, reject or assume opaque scalar
> value?
>
> >
> > >> +
> > >> +               if (off < moff / 8)
> > >> +                       continue;
>
> Thinking about this again, this seems like an inverted condition.
> Shouldn't it be "skip field until you find field offset equal or
> greater than our desired offset":
>
> if (moff < off * 8)
>     continue;

Ok, so I had to simulate this to understand what's wrong.

Let's take this struct as an example and I'll also write down moff,
but in bytes for simplicity. Let's also assume we are trying to access
off=8:

              off  moff  off < moff     off >= moff + msize
              ---  ----  ----------     -------------------
struct s {
    int a;      8     0  8 < 0  = F -->  8 >= 0 + 4 = T --> continue
    int b;      8     4  8 < 4  = F -->  8 >= 4 + 4 = T --> continue
    int c;      8     8  8 < 8  = F -->  8 >= 8 + 4 = F <-- FOUND IT!
    int d;      8    12  8 < 12 = T --> continue
    int e;      8    16  8 < 16 = T --> continue
    int f;      8    20  8 < 20 = T --> continue
};
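The walk in the table can be replayed with a small userspace sketch. It assumes byte offsets (as the table does) and a hypothetical 'struct member' descriptor instead of real BTF; find_member_as_patched() only mirrors the two skip conditions of the patch's loop.

```c
/* One struct member, offset and size in bytes as in the table above
 * (the patch itself reads bit offsets from BTF and divides by 8).
 */
struct member {
	const char *name;
	int moff;
	int msize;
};

/*
 * Replays the two skip conditions of the patch's for_each_member() loop
 * for an access at byte offset 'off'. Returns the index of the member
 * that is "found", or -1. '*visited' counts loop iterations.
 */
static int find_member_as_patched(const struct member *m, int n, int off,
				  int *visited)
{
	*visited = 0;
	for (int i = 0; i < n; i++) {
		(*visited)++;
		if (off < m[i].moff)
			continue;	/* member starts after 'off' */
		if (off >= m[i].moff + m[i].msize)
			continue;	/* member ends before 'off' */
		return i;
	}
	return -1;
}

/* struct s { int a, b, c, d, e, f; } with 4-byte ints, as in the table. */
static const struct member struct_s[] = {
	{ "a", 0, 4 }, { "b", 4, 4 }, { "c", 8, 4 },
	{ "d", 12, 4 }, { "e", 16, 4 }, { "f", 20, 4 },
};
```

With off=8 the sketch returns index 2 ('c') after three iterations. With off=10 it also returns 'c', i.e. an offset pointing into the middle of a field still matches that field.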

So it works, but:

1. The comment about the rare union case is misleading: it's actually the
expected condition for skipping a field that lies entirely before the
offset we are looking for.
2. Now assume that in the example above `int c` is actually a one-element
array. We'll skip it because arrays aren't supported yet, go to `int d`,
continue, then check e, f, and so on until we exhaust all fields,
while it should be clear at `int d` that no later field can match and
we should just stop.

So I think the overall logic would be a bit more straightforward and
efficient if expressed as:

for_each_member(i, t, member) {
    if (moff + msize <= off)
        continue; /* no overlap with member, yet, keep iterating */
    if (moff >= off + size)
        break; /* won't find anything, field is already too far */

    /* overlapping, rest of the checks */

}
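A minimal sketch of the suggested loop, again with byte offsets and a hypothetical member descriptor. It adds an is_array flag so that point 2 can be checked: arrays are skipped as in the patch, and the early break stops the walk at 'int d'.

```c
/* Member with byte offset/size plus a flag for array members, which the
 * patch skips because array dereference isn't supported yet.
 */
struct member {
	const char *name;
	int moff;
	int msize;
	int is_array;
};

/*
 * Sketch of the suggested loop for an access covering [off, off + size):
 * members that end before the access are skipped, and the walk stops at
 * the first member that starts at or after its end, relying on members
 * being in non-decreasing offset order. Returns the index of the
 * matching non-array member, or -1; '*visited' counts iterations.
 */
static int find_member_overlap(const struct member *m, int n, int off,
			       int size, int *visited)
{
	*visited = 0;
	for (int i = 0; i < n; i++) {
		(*visited)++;
		if (m[i].moff + m[i].msize <= off)
			continue;	/* no overlap with this member yet */
		if (m[i].moff >= off + size)
			break;		/* already past the access: give up */
		if (m[i].is_array)
			continue;	/* unsupported, mirrors the patch */
		return i;		/* overlapping; remaining checks go here */
	}
	return -1;
}
```

With 'c' marked as an array and an access of off=8, size=4, the sketch gives up after visiting 'd' (four iterations) instead of scanning to the end of the struct. For a union every member starts at offset 0, so the break never fires early there.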

>
> > >> +
> > >> +               /* type of the field */
> > >> +               mtype = btf_type_by_id(btf_vmlinux, member->type);
> > >> +               mname = __btf_name_by_offset(btf_vmlinux, member->name_off);
> > >
> > > nit: you mix btf_name_by_offset and __btf_name_by_offset, any reason
> > > to not stick to just one of them (__btf_name_by_offset is safer, so
> > > that one, probably)?
> >
> > I tried to use btf_name_by_offset() in verifier.c and
> > __btf_name_by_offset() in btf.c consistently.
> > Looks like I missed one spot.
> > Fixed.
> >
> > >
> > >> +
> > >> +               /* skip typedef, volotile modifiers */
> > >
> > > typo: volatile
> > >
> > > nit: also, volatile is not special, so either mention
> > > const/volatile/restrict or just "modifiers"?
> >
> > fixed
> >
> > >> +               while (btf_type_is_modifier(mtype))
> > >> +                       mtype = btf_type_by_id(btf_vmlinux, mtype->type);
> > >> +
> > >> +               if (btf_type_is_array(mtype))
> > >> +                       /* array deref is not supported yet */
> > >> +                       continue;
> > >> +
> > >> +               if (!btf_type_has_size(mtype) && !btf_type_is_ptr(mtype)) {
> > >> +                       bpf_verifier_log_write(env,
> > >> +                                              "field %s doesn't have size\n",
> > >> +                                              mname);
> > >> +                       return -EFAULT;
> > >> +               }
> > >> +               if (btf_type_is_ptr(mtype))
> > >> +                       msize = 8;
> > >> +               else
> > >> +                       msize = mtype->size;
> > >> +               if (off >= moff / 8 + msize)
> > >> +                       /* rare case, must be a field of the union with smaller size,
> > >> +                        * let's try another field
> > >> +                        */
> > >> +                       continue;
> > >> +               /* the 'off' we're looking for is either equal to start
> > >> +                * of this field or inside of this struct
> > >> +                */
> > >> +               if (btf_type_is_struct(mtype)) {
> > >> +                       /* our field must be inside that union or struct */
> > >> +                       t = mtype;
> > >> +
> > >> +                       /* adjust offset we're looking for */
> > >> +                       off -= moff / 8;
> > >> +                       goto again;
> > >> +               }
> > >> +               if (msize != size) {
> > >> +                       /* field access size doesn't match */
> > >> +                       bpf_verifier_log_write(env,
> > >> +                                              "cannot access %d bytes in struct %s field %s that has size %d\n",
> > >> +                                              size, tname, mname, msize);
> > >> +                       return -EACCES;
> > >> +               }
> > >> +
> > >> +               if (btf_type_is_ptr(mtype)) {
> > >> +                       const struct btf_type *stype;
> > >> +
> > >> +                       stype = btf_type_by_id(btf_vmlinux, mtype->type);
> > >> +                       if (btf_type_is_struct(stype)) {
> > >
> > > again, resolving modifiers/typedefs? though in this case it might be
> > > too eager?...
> >
> > done
> >
> > >> +                               *next_btf_id = mtype->type;
> > >> +                               return PTR_TO_BTF_ID;
> > >> +                       }
> > >> +               }
> > >> +               /* all other fields are treated as scalars */
> > >> +               return SCALAR_VALUE;
> > >> +       }
> > >> +       bpf_verifier_log_write(env,
> > >> +                              "struct %s doesn't have field at offset %d\n",
> > >> +                              tname, off);
> > >> +       return -EINVAL;
> > >> +}
> > >> +
> > >>   void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
> > >>                         struct seq_file *m)
> > >>   {
> > >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > >> index 91c4db4d1c6a..3c155873ffea 100644
> > >> --- a/kernel/bpf/verifier.c
> > >> +++ b/kernel/bpf/verifier.c
> > >> @@ -406,6 +406,7 @@ static const char * const reg_type_str[] = {
> > >>          [PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null",
> > >>          [PTR_TO_TP_BUFFER]      = "tp_buffer",
> > >>          [PTR_TO_XDP_SOCK]       = "xdp_sock",
> > >> +       [PTR_TO_BTF_ID]         = "ptr_",
> > >>   };
> > >>
> > >>   static char slot_type_char[] = {
> > >> @@ -460,6 +461,10 @@ static void print_verifier_state(struct bpf_verifier_env *env,
> > >>                          /* reg->off should be 0 for SCALAR_VALUE */
> > >>                          verbose(env, "%lld", reg->var_off.value + reg->off);
> > >>                  } else {
> > >> +                       if (t == PTR_TO_BTF_ID)
> > >> +                               verbose(env, "%s",
> > >> +                                       btf_name_by_offset(btf_vmlinux,
> > >> +                                                          btf_type_by_id(btf_vmlinux, reg->btf_id)->name_off));
> > >>                          verbose(env, "(id=%d", reg->id);
> > >>                          if (reg_type_may_be_refcounted_or_null(t))
> > >>                                  verbose(env, ",ref_obj_id=%d", reg->ref_obj_id);
> > >> @@ -2337,10 +2342,12 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
> > >>
> > >>   /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
> > >>   static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
> > >> -                           enum bpf_access_type t, enum bpf_reg_type *reg_type)
> > >> +                           enum bpf_access_type t, enum bpf_reg_type *reg_type,
> > >> +                           u32 *btf_id)
> > >>   {
> > >>          struct bpf_insn_access_aux info = {
> > >>                  .reg_type = *reg_type,
> > >> +               .env = env,
> > >>          };
> > >>
> > >>          if (env->ops->is_valid_access &&
> > >> @@ -2354,7 +2361,10 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
> > >>                   */
> > >>                  *reg_type = info.reg_type;
> > >>
> > >> -               env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
> > >> +               if (*reg_type == PTR_TO_BTF_ID)
> > >> +                       *btf_id = info.btf_id;
> > >> +               else
> > >> +                       env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
> > >
> > > ctx_field_size is passed through bpf_insn_access_aux, but btf_id is
> > > returned like this. Is there a reason to do it in two different ways?
> >
> > insn_aux_data is permanent, meaning that every execution path into
> > this instruction has to have the same size and offset of ctx access.
> > I think BTF-based ctx access doesn't have to be.
> > There is a check later to make sure that r1=ctx dereferencing BTF is
> > permanent, but r1=btf dereferencing BTF clearly is not.
> > I'm still not sure whether the former will stay permanent forever,
> > so I went with the quick hack above to reduce the amount of potential
> > refactoring later. I'll think about it more.
>
> Yeah, it makes sense. I agree we shouldn't enforce same BTF type ID
> through all executions paths, if possible. I just saw you added btf_id
> both to struct bpf_insn_access_aux and reg_state, so was wondering
> what's going on.
>
> >
> > >>                  /* remember the offset of last byte accessed in ctx */
> > >>                  if (env->prog->aux->max_ctx_offset < off + size)
> > >>                          env->prog->aux->max_ctx_offset = off + size;
> > >> @@ -2745,6 +2755,53 @@ static void coerce_reg_to_size(struct bpf_reg_state *reg, int size)
> > >>          reg->smax_value = reg->umax_value;
> > >>   }
> > >>
> > >> +static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
> > >> +                                  struct bpf_reg_state *regs,
> > >> +                                  int regno, int off, int size,
> > >> +                                  enum bpf_access_type atype,
> > >> +                                  int value_regno)
> > >> +{
> > >> +       struct bpf_reg_state *reg = regs + regno;
> > >> +       const struct btf_type *t = btf_type_by_id(btf_vmlinux, reg->btf_id);
> > >> +       const char *tname = btf_name_by_offset(btf_vmlinux, t->name_off);
> > >> +       u32 btf_id;
> > >> +       int ret;
> > >> +
> > >> +       if (atype != BPF_READ) {
> > >> +               verbose(env, "only read is supported\n");
> > >> +               return -EACCES;
> > >> +       }
> > >> +
> > >> +       if (off < 0) {
> > >> +               verbose(env,
> > >> +                       "R%d is ptr_%s negative access %d is not allowed\n",
> > >
> > > totally nit: but for consistency sake (following variable offset error
> > > below): R%d is ptr_%s negative access: off=%d\n"?
> >
> > fixed
> >
> > >> +                       regno, tname, off);
> > >> +               return -EACCES;
> > >> +       }
> > >> +       if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
> > >
> > > why so strict about reg->var_off.value?
> >
> > It's variable part of register access.
> > There is no fixed offset to pass into btf_struct_access().
> > In other words 'arrays of pointers to btf_id' are not supported yet.
> > Walk first :)
> >
>
> yeah, I'm fine with that
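For reference, the argument mapping and type classification discussed for btf_ctx_access() can be sketched in userspace. This is a simplified model under stated assumptions, not the kernel code: a toy type table replaces vmlinux BTF, the helper names are hypothetical, and the third 'len' argument in the example is invented purely to exercise typedef/const resolution (the real kfree_skb tracepoint only has skb and location). It includes the off % 8 check and strips typedefs and const/volatile/restrict before deciding between scalar and pointer; resolving modifiers on the pointee as well is an assumption.

```c
#include <stdbool.h>

/* Toy stand-ins for BTF kinds (the real ones are BTF_KIND_*).
 * Type id 0 is 'void', as in BTF.
 */
enum toy_kind {
	K_VOID, K_INT, K_ENUM, K_PTR, K_STRUCT,
	K_TYPEDEF, K_CONST, K_VOLATILE, K_RESTRICT,
};

struct toy_type {
	enum toy_kind kind;
	int type;	/* referenced type id for PTR/TYPEDEF/modifiers */
};

enum arg_class { ARG_INVALID, ARG_SCALAR, ARG_PTR_TO_BTF_ID };

static bool is_mod_or_typedef(enum toy_kind k)
{
	return k == K_TYPEDEF || k == K_CONST || k == K_VOLATILE ||
	       k == K_RESTRICT;
}

/* Follow typedef and const/volatile/restrict links to the underlying type. */
static int resolve(const struct toy_type *types, int id)
{
	while (is_mod_or_typedef(types[id].kind))
		id = types[id].type;
	return id;
}

/*
 * params[] are the parameter type ids of the btf_trace_* prototype,
 * including the leading 'void *__data' at params[0]. 'off' is the byte
 * offset of the ctx load; each raw_tp argument occupies 8 bytes.
 */
static enum arg_class classify_ctx_access(const struct toy_type *types,
					  const int *params, int nparams,
					  int off, int *btf_id)
{
	int nr_args = nparams - 1;	/* skip 'void *__data' */
	int t, target;

	if (off < 0 || off % 8 || off >= nr_args * 8)
		return ARG_INVALID;
	t = resolve(types, params[off / 8 + 1]);
	if (types[t].kind == K_INT || types[t].kind == K_ENUM)
		return ARG_SCALAR;
	if (types[t].kind != K_PTR)
		return ARG_INVALID;
	target = resolve(types, types[t].type);
	if (target == 0)
		return ARG_SCALAR;	/* 'void *': no further walking */
	*btf_id = target;
	return ARG_PTR_TO_BTF_ID;
}

/* Hypothetical prototype:
 * void (*)(void *__data, struct sk_buff *skb, void *location, const size_t len)
 */
static const struct toy_type kfree_types[] = {
	[0] = { K_VOID, 0 },
	[1] = { K_STRUCT, 0 },	/* struct sk_buff */
	[2] = { K_PTR, 1 },	/* struct sk_buff * */
	[3] = { K_PTR, 0 },	/* void * */
	[4] = { K_INT, 0 },	/* unsigned long */
	[5] = { K_TYPEDEF, 4 },	/* size_t */
	[6] = { K_CONST, 5 },	/* const size_t */
};
static const int kfree_params[] = { 3, 2, 3, 6 };
```

In this model ctx+0 resolves to the 'struct sk_buff' type id, ctx+8 ('void *') and ctx+16 ('const size_t') are scalars, and ctx+4 (misaligned) or ctx+24 (past the last argument) are rejected.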


* Re: [PATCH bpf-next 09/10] bpf: disallow bpf_probe_read[_str] helpers
  2019-10-05  5:03 ` [PATCH bpf-next 09/10] bpf: disallow bpf_probe_read[_str] helpers Alexei Starovoitov
@ 2019-10-09  5:29   ` Andrii Nakryiko
  2019-10-09 19:38     ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-09  5:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> Disallow bpf_probe_read() and bpf_probe_read_str() helpers in
> raw_tracepoint bpf programs that use in-kernel BTF to track
> types of memory accesses.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  kernel/trace/bpf_trace.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 52f7e9d8c29b..7c607f79f1bb 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -700,6 +700,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>         case BPF_FUNC_map_peek_elem:
>                 return &bpf_map_peek_elem_proto;
>         case BPF_FUNC_probe_read:
> +               if (prog->expected_attach_type)
> +                       return NULL;

This can unintentionally disable bpf_probe_read/bpf_probe_read_str for
non-raw_tp programs that happen to specify a non-zero
expected_attach_type, which we don't really validate for
kprobe/tp/perf_event/etc. So how about passing the program type into
tracing_func_proto() so that we can have more granular control?

>                 return &bpf_probe_read_proto;
>         case BPF_FUNC_ktime_get_ns:
>                 return &bpf_ktime_get_ns_proto;
> @@ -728,6 +730,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>         case BPF_FUNC_get_prandom_u32:
>                 return &bpf_get_prandom_u32_proto;
>         case BPF_FUNC_probe_read_str:
> +               if (prog->expected_attach_type)
> +                       return NULL;
>                 return &bpf_probe_read_str_proto;
>  #ifdef CONFIG_CGROUPS
>         case BPF_FUNC_get_current_cgroup_id:
> --
> 2.20.0
>
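A hypothetical sketch of the more granular check suggested above: the probe_read helpers are withheld only when the program really is a raw_tracepoint program carrying a btf_id, so a stray expected_attach_type on a kprobe program has no effect. The enums, structs and function names are illustrative stand-ins, not the kernel's bpf_prog_type, bpf_func_id or tracing_func_proto().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down program types and helper ids. */
enum toy_prog_type {
	TOY_PROG_KPROBE,
	TOY_PROG_TRACEPOINT,
	TOY_PROG_RAW_TRACEPOINT,
};

enum toy_func_id {
	TOY_FUNC_probe_read,
	TOY_FUNC_probe_read_str,
	TOY_FUNC_ktime_get_ns,
};

struct toy_prog {
	enum toy_prog_type type;
	uint32_t expected_attach_type;	/* btf_id for BTF-aware raw_tp */
};

struct toy_proto {
	const char *name;
};

static const struct toy_proto probe_read_proto = { "probe_read" };
static const struct toy_proto probe_read_str_proto = { "probe_read_str" };
static const struct toy_proto ktime_get_ns_proto = { "ktime_get_ns" };

/* Only raw_tp programs interpret expected_attach_type as a btf_id. */
static bool raw_tp_uses_btf(const struct toy_prog *prog)
{
	return prog->type == TOY_PROG_RAW_TRACEPOINT &&
	       prog->expected_attach_type != 0;
}

static const struct toy_proto *
toy_tracing_func_proto(enum toy_func_id func_id, const struct toy_prog *prog)
{
	switch (func_id) {
	case TOY_FUNC_probe_read:
		return raw_tp_uses_btf(prog) ? NULL : &probe_read_proto;
	case TOY_FUNC_probe_read_str:
		return raw_tp_uses_btf(prog) ? NULL : &probe_read_str_proto;
	case TOY_FUNC_ktime_get_ns:
		return &ktime_get_ns_proto;
	}
	return NULL;
}
```

With this shape, a kprobe program with a garbage expected_attach_type still gets bpf_probe_read(), while a BTF-enabled raw_tp program does not.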


* Re: [PATCH bpf-next 10/10] selftests/bpf: add kfree_skb raw_tp test
  2019-10-05  5:03 ` [PATCH bpf-next 10/10] selftests/bpf: add kfree_skb raw_tp test Alexei Starovoitov
@ 2019-10-09  5:36   ` Andrii Nakryiko
  2019-10-09 17:37     ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-09  5:36 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> Load basic cls_bpf program.
> Load raw_tracepoint program and attach to kfree_skb raw tracepoint.
> Trigger cls_bpf via prog_test_run.
> At the end of test_run the kernel will call kfree_skb,
> which will trigger the trace_kfree_skb tracepoint,
> which will call our raw_tracepoint program,
> which will take that skb and dump it into the perf ring buffer.
> Check that user space received the correct packet.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

LGTM, few minor nits below.

Acked-by: Andrii Nakryiko <andriin@fb.com>
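For anyone checking the constants in on_sample() below: eth.h_proto is stored in network byte order, so the literal 0xdd86 is the IPv6 EtherType 0x86DD as read on a little-endian host. A minimal, endian-neutral sketch of the same check, assuming a plain Ethernet header in front of the IPv6 packet (frame_is_ipv6() and ETHERTYPE_IPV6_VAL are illustrative names):

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Same value as ETH_P_IPV6 in <linux/if_ether.h>, repeated so the
 * sketch has no kernel-header dependency.
 */
#define ETHERTYPE_IPV6_VAL 0x86DD

/*
 * Compare the raw 16-bit EtherType against htons(0x86DD), which holds
 * on both little- and big-endian hosts, unlike a comparison with 0xdd86.
 */
static int frame_is_ipv6(const unsigned char *frame)
{
	uint16_t h_proto;

	/* skip destination and source MAC addresses (6 bytes each) */
	memcpy(&h_proto, frame + 12, sizeof(h_proto));
	return h_proto == htons(ETHERTYPE_IPV6_VAL);
}
```

On the %d question: size is a __u32, so "%u" is the matching printf conversion.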


>  .../selftests/bpf/prog_tests/kfree_skb.c      | 90 +++++++++++++++++++
>  tools/testing/selftests/bpf/progs/kfree_skb.c | 76 ++++++++++++++++
>  2 files changed, 166 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/kfree_skb.c
>  create mode 100644 tools/testing/selftests/bpf/progs/kfree_skb.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/kfree_skb.c b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
> new file mode 100644
> index 000000000000..238bc7024b36
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
> @@ -0,0 +1,90 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <test_progs.h>
> +
> +static void on_sample(void *ctx, int cpu, void *data, __u32 size)
> +{
> +       int ifindex = *(int *)data, duration = 0;
> +       struct ipv6_packet * pkt_v6 = data + 4;
> +
> +       if (ifindex != 1)
> +               /* spurious kfree_skb not on loopback device */
> +               return;
> +       if (CHECK(size != 76, "check_size", "size %d != 76\n", size))

compiler doesn't complain about %d and size being unsigned?

> +               return;
> +       if (CHECK(pkt_v6->eth.h_proto != 0xdd86, "check_eth",
> +                 "h_proto %x\n", pkt_v6->eth.h_proto))
> +               return;
> +       if (CHECK(pkt_v6->iph.nexthdr != 6, "check_ip",
> +                 "iph.nexthdr %x\n", pkt_v6->iph.nexthdr))
> +               return;
> +       if (CHECK(pkt_v6->tcp.doff != 5, "check_tcp",
> +                 "tcp.doff %x\n", pkt_v6->tcp.doff))
> +               return;
> +
> +       *(bool *)ctx = true;
> +}
> +
> +void test_kfree_skb(void)
> +{
> +       struct bpf_prog_load_attr attr = {
> +               .file = "./kfree_skb.o",
> +               .log_level = 2,
> +       };
> +
> +       struct bpf_object *obj, *obj2 = NULL;
> +       struct perf_buffer_opts pb_opts = {};
> +       struct perf_buffer *pb = NULL;
> +       struct bpf_link *link = NULL;
> +       struct bpf_map *perf_buf_map;
> +       struct bpf_program *prog;
> +       __u32 duration, retval;
> +       int err, pkt_fd, kfree_skb_fd;
> +       bool passed = false;
> +
> +       err = bpf_prog_load("./test_pkt_access.o", BPF_PROG_TYPE_SCHED_CLS, &obj, &pkt_fd);
> +       if (CHECK(err, "prog_load sched cls", "err %d errno %d\n", err, errno))
> +               return;
> +
> +       err = bpf_prog_load_xattr(&attr, &obj2, &kfree_skb_fd);
> +       if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno))
> +               goto close_prog;
> +
> +       prog = bpf_object__find_program_by_title(obj2, "raw_tracepoint/kfree_skb");
> +       if (CHECK(!prog, "find_prog", "prog kfree_skb not found\n"))
> +               goto close_prog;
> +       link = bpf_program__attach_raw_tracepoint(prog, "kfree_skb");
> +       if (CHECK(IS_ERR(link), "attach_raw_tp", "err %ld\n", PTR_ERR(link)))
> +               goto close_prog;
> +
> +       perf_buf_map = bpf_object__find_map_by_name(obj2, "perf_buf_map");
> +       if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
> +               goto close_prog;
> +
> +       /* set up perf buffer */
> +       pb_opts.sample_cb = on_sample;
> +       pb_opts.ctx = &passed;
> +       pb = perf_buffer__new(bpf_map__fd(perf_buf_map), 1, &pb_opts);
> +       if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
> +               goto close_prog;
> +
> +       err = bpf_prog_test_run(pkt_fd, 1, &pkt_v6, sizeof(pkt_v6),
> +                               NULL, NULL, &retval, &duration);
> +       CHECK(err || retval, "ipv6",
> +             "err %d errno %d retval %d duration %d\n",
> +             err, errno, retval, duration);
> +
> +       /* read perf buffer */
> +       err = perf_buffer__poll(pb, 100);
> +       if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
> +               goto close_prog;
> +       /* make sure kfree_skb program was triggered
> +        * and it sent expected skb into ring buffer
> +        */
> +       CHECK_FAIL(!passed);
> +close_prog:
> +       perf_buffer__free(pb);
> +       if (!IS_ERR_OR_NULL(link))
> +               bpf_link__destroy(link);
> +       bpf_object__close(obj);
> +       bpf_object__close(obj2);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/kfree_skb.c b/tools/testing/selftests/bpf/progs/kfree_skb.c
> new file mode 100644
> index 000000000000..61f1abfc4f48
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/kfree_skb.c
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2019 Facebook
> +#include <linux/bpf.h>
> +#include "bpf_helpers.h"
> +
> +char _license[] SEC("license") = "GPL";
> +struct {
> +       __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
> +       __uint(key_size, sizeof(int));
> +       __uint(value_size, sizeof(int));
> +} perf_buf_map SEC(".maps");
> +
> +#define _(P) (__builtin_preserve_access_index(P))
> +
> +/* define few struct-s that bpf program needs to access */
> +struct callback_head {
> +       struct callback_head *next;
> +       void (*func)(struct callback_head *head);
> +};
> +struct dev_ifalias {
> +       struct callback_head rcuhead;
> +};
> +
> +struct net_device /* same as kernel's struct net_device */ {
> +       int ifindex;
> +       volatile struct dev_ifalias *ifalias;
> +};
> +
> +struct sk_buff {
> +       /* field names and sizes should match to those in the kernel */
> +        unsigned int            len,
> +                                data_len;
> +        __u16                   mac_len,
> +                                hdr_len;
> +        __u16                   queue_mapping;
> +       struct net_device *dev;
> +       /* order of the fields doesn't matter */
> +};
> +
> +/* copy arguments from
> + * include/trace/events/skb.h:
> + * TRACE_EVENT(kfree_skb,
> + *         TP_PROTO(struct sk_buff *skb, void *location),
> + *
> + * into struct below:
> + */
> +struct trace_kfree_skb {
> +       struct sk_buff *skb;
> +       void *location;
> +};
> +
> +SEC("raw_tracepoint/kfree_skb")
> +int trace_kfree_skb(struct trace_kfree_skb* ctx)
> +{
> +       struct sk_buff *skb = ctx->skb;
> +       struct net_device *dev;
> +       int ifindex;
> +       struct callback_head *ptr;
> +       void *func;

nit: the style checker should have complained about the missing empty line

> +       __builtin_preserve_access_index(({
> +               dev = skb->dev;
> +               ifindex = dev->ifindex;
> +               ptr = dev->ifalias->rcuhead.next;
> +               func = ptr->func;
> +       }));
> +
> +       bpf_printk("rcuhead.next %llx func %llx\n", ptr, func);
> +       bpf_printk("skb->len %d\n", _(skb->len));
> +       bpf_printk("skb->queue_mapping %d\n", _(skb->queue_mapping));
> +       bpf_printk("dev->ifindex %d\n", ifindex);
> +
> +       /* send first 72 byte of the packet to user space */
> +       bpf_skb_output(skb, &perf_buf_map, (72ull << 32) | BPF_F_CURRENT_CPU,
> +                      &ifindex, sizeof(ifindex));
> +       return 0;
> +}
> --
> 2.20.0
>
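The flags argument passed to bpf_skb_output() in the program above packs two fields into one u64: the upper 32 bits say how many bytes of packet data to append to the sample, and the lower 32 bits select the perf event array slot (BPF_F_CURRENT_CPU meaning "this CPU"). The user-space sketch below mirrors the BPF_F_INDEX_MASK, BPF_F_CURRENT_CPU and BPF_F_CTXLEN_MASK values from include/uapi/linux/bpf.h as local macros (an assumption made so it compiles without kernel headers; the helper functions are hypothetical). It shows why on_sample() expects a 76-byte sample: 4 bytes of ifindex metadata plus the 72 packet bytes requested.

```c
#include <stdint.h>
#include <assert.h>

/* Local copies of the uapi flag values, so no kernel headers are needed. */
#define BPF_F_INDEX_MASK   0xffffffffULL
#define BPF_F_CURRENT_CPU  BPF_F_INDEX_MASK
#define BPF_F_CTXLEN_MASK  (0xfffffULL << 32)

/* Build the flags word: packet bytes to copy in the upper half,
 * perf event array index in the lower half. */
uint64_t skb_output_flags(uint32_t pkt_bytes, uint64_t index)
{
	return ((uint64_t)pkt_bytes << 32) | (index & BPF_F_INDEX_MASK);
}

/* Recover the number of packet bytes requested by the flags. */
uint32_t flags_ctx_len(uint64_t flags)
{
	return (uint32_t)((flags & BPF_F_CTXLEN_MASK) >> 32);
}

/* Payload size user space sees: the metadata passed by the program
 * (here an int ifindex) followed by the copied packet bytes. */
uint32_t expected_sample_size(uint32_t meta_size, uint64_t flags)
{
	return meta_size + flags_ctx_len(flags);
}
```

For example, skb_output_flags(72, BPF_F_CURRENT_CPU) reproduces the constant expression used in trace_kfree_skb().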

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH bpf-next 10/10] selftests/bpf: add kfree_skb raw_tp test
  2019-10-09  5:36   ` Andrii Nakryiko
@ 2019-10-09 17:37     ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-09 17:37 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On 10/8/19 10:36 PM, Andrii Nakryiko wrote:
> On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>>
>> Load a basic cls_bpf program.
>> Load a raw_tracepoint program and attach it to the kfree_skb raw tracepoint.
>> Trigger cls_bpf via prog_test_run.
>> At the end of test_run the kernel will call kfree_skb,
>> which will trigger the trace_kfree_skb tracepoint,
>> which will call our raw_tracepoint program,
>> which will take that skb and dump it into the perf ring buffer.
>> Check that user space received the correct packet.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
> 
> LGTM, few minor nits below.
> 
> Acked-by: Andrii Nakryiko <andriin@fb.com>
> 
> 
>>   .../selftests/bpf/prog_tests/kfree_skb.c      | 90 +++++++++++++++++++
>>   tools/testing/selftests/bpf/progs/kfree_skb.c | 76 ++++++++++++++++
>>   2 files changed, 166 insertions(+)
>>   create mode 100644 tools/testing/selftests/bpf/prog_tests/kfree_skb.c
>>   create mode 100644 tools/testing/selftests/bpf/progs/kfree_skb.c
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/kfree_skb.c b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
>> new file mode 100644
>> index 000000000000..238bc7024b36
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/prog_tests/kfree_skb.c
>> @@ -0,0 +1,90 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +#include <test_progs.h>
>> +
>> +static void on_sample(void *ctx, int cpu, void *data, __u32 size)
>> +{
>> +       int ifindex = *(int *)data, duration = 0;
>> +       struct ipv6_packet * pkt_v6 = data + 4;
>> +
>> +       if (ifindex != 1)
>> +               /* spurious kfree_skb not on loopback device */
>> +               return;
>> +       if (CHECK(size != 76, "check_size", "size %d != 76\n", size))
> 
> compiler doesn't complain about %d and size being unsigned?

The compiler didn't complain, but I fixed it.

>> +SEC("raw_tracepoint/kfree_skb")
>> +int trace_kfree_skb(struct trace_kfree_skb* ctx)
>> +{
>> +       struct sk_buff *skb = ctx->skb;
>> +       struct net_device *dev;
>> +       int ifindex;
>> +       struct callback_head *ptr;
>> +       void *func;
> 
> nit: the style checker should have complained about the missing empty line

Good point. Fixed the checkpatch errors.

* Re: [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT
  2019-10-05  5:03 ` [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT Alexei Starovoitov
  2019-10-05  6:03   ` Eric Dumazet
@ 2019-10-09 17:38   ` Andrii Nakryiko
  2019-10-09 17:46     ` Alexei Starovoitov
  1 sibling, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-09 17:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> A pointer to a BTF object is a pointer to a kernel object or NULL.
> Such pointers can only be used by BPF_LDX instructions.
> The verifier changes their opcode from LDX|MEM|size
> to LDX|PROBE_MEM|size to make JITing easier.
> The number of entries in the extable is the number of BPF_LDX insns
> that access kernel memory via "pointer to BTF type".
> Only these load instructions can fault.
> Since the x86 extable is relative, it has to be allocated in the same
> memory region as the JITed code.
> Allocate it prior to the last pass of JITing and let the last pass populate it.
> A pointer to the extable in bpf_prog_aux is necessary to make page fault
> handling fast.
> Page fault handling is done in two steps:
> 1. bpf_prog_kallsyms_find() finds the BPF program that page faulted.
>    It's done by walking an rb tree.
> 2. Then the extable for the given bpf program is binary searched.
> This process is similar to how page faulting is done for kernel modules.
> The exception handler skips over the faulting x86 instruction and
> initializes the destination register with zero. This mimics the exact
> behavior of bpf_probe_read (when probe_kernel_read faults, dest is zeroed).
>
> JITs for other architectures can add support in a similar way.
> Until then they will reject the unknown opcode and fall back to the interpreter.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  arch/x86/net/bpf_jit_comp.c | 96 +++++++++++++++++++++++++++++++++++--
>  include/linux/bpf.h         |  3 ++
>  include/linux/extable.h     | 10 ++++
>  kernel/bpf/core.c           | 20 +++++++-
>  kernel/bpf/verifier.c       |  1 +
>  kernel/extable.c            |  2 +
>  6 files changed, 127 insertions(+), 5 deletions(-)
>

This is surprisingly easy to follow :) Looks good overall, just one
concern about 32-bit distance between ex_handler_bpf and BPF jitted
program below. And I agree with Eric, probably need to ensure proper
alignment for exception_table_entry array.

[...]

> @@ -805,6 +835,48 @@ stx:                       if (is_imm8(insn->off))
>                         else
>                                 EMIT1_off32(add_2reg(0x80, src_reg, dst_reg),
>                                             insn->off);
> +                       if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
> +                               struct exception_table_entry *ex;
> +                               u8 *_insn = image + proglen;
> +                               s64 delta;
> +
> +                               if (!bpf_prog->aux->extable)
> +                                       break;
> +
> +                               if (excnt >= bpf_prog->aux->num_exentries) {
> +                                       pr_err("ex gen bug\n");

This should never happen, right? BUG()?

> +                                       return -EFAULT;
> +                               }
> +                               ex = &bpf_prog->aux->extable[excnt++];
> +
> +                               delta = _insn - (u8 *)&ex->insn;
> +                               if (!is_simm32(delta)) {
> +                                       pr_err("extable->insn doesn't fit into 32-bit\n");
> +                                       return -EFAULT;
> +                               }
> +                               ex->insn = delta;
> +
> +                               delta = (u8 *)ex_handler_bpf - (u8 *)&ex->handler;

How likely is it that the global ex_handler_bpf will be close enough to
the dynamically allocated piece of exception_table_entry?

> +                               if (!is_simm32(delta)) {
> +                                       pr_err("extable->handler doesn't fit into 32-bit\n");
> +                                       return -EFAULT;
> +                               }
> +                               ex->handler = delta;
> +
> +                               if (dst_reg > BPF_REG_9) {
> +                                       pr_err("verifier error\n");
> +                                       return -EFAULT;
> +                               }
> +                               /*
> +                                * Compute size of x86 insn and its target dest x86 register.
> +                                * ex_handler_bpf() will use lower 8 bits to adjust
> +                                * pt_regs->ip to jump over this x86 instruction
> +                                * and upper bits to figure out which pt_regs to zero out.
> +                                * End result: x86 insn "mov rbx, qword ptr [rax+0x14]"
> +                                * of 4 bytes will be ignored and rbx will be zero inited.
> +                                */
> +                               ex->fixup = (prog - temp) | (reg2pt_regs[dst_reg] << 8);
> +                       }
>                         break;
>
>                         /* STX XADD: lock *(u32*)(dst_reg + off) += src_reg */
> @@ -1058,6 +1130,11 @@ xadd:                    if (is_imm8(insn->off))
>                 addrs[i] = proglen;
>                 prog = temp;
>         }
> +
> +       if (image && excnt != bpf_prog->aux->num_exentries) {
> +               pr_err("extable is not populated\n");

Isn't this a plain BUG()?


> +               return -EFAULT;
> +       }
>         return proglen;
>  }
>

[...]
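The ex->fixup encoding in the hunk above (x86 instruction length in the low 8 bits, offset of the destination register inside struct pt_regs in the upper bits) can be modelled in user space as below. This is an illustrative sketch only: struct fake_regs, encode_fixup() and handle_fault() are hypothetical stand-ins for struct pt_regs and ex_handler_bpf(), written from the comment in the hunk rather than from kernel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/* Hypothetical stand-in for a few 64-bit registers of struct pt_regs. */
struct fake_regs {
	uint64_t rax;
	uint64_t rbx;
	uint64_t rdi;
	uint64_t ip;
};

/* Pack the faulting instruction length (must be < 256) and the byte
 * offset of the destination register within the register block. */
uint32_t encode_fixup(uint32_t insn_len, size_t reg_off)
{
	return insn_len | ((uint32_t)reg_off << 8);
}

/* Model of the handler: jump over the faulting load and zero the
 * destination register, mimicking bpf_probe_read() zeroing dest. */
void handle_fault(struct fake_regs *regs, uint32_t fixup)
{
	uint32_t insn_len = fixup & 0xff;
	size_t reg_off = fixup >> 8;

	regs->ip += insn_len;
	memset((char *)regs + reg_off, 0, sizeof(uint64_t));
}
```

With the example from the comment, a 4-byte "mov rbx, qword ptr [rax+0x14]" would be encoded as encode_fixup(4, offsetof(struct fake_regs, rbx)); after handle_fault() the ip has advanced by 4 and only rbx is cleared.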

* Re: [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT
  2019-10-09 17:38   ` Andrii Nakryiko
@ 2019-10-09 17:46     ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-09 17:46 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On 10/9/19 10:38 AM, Andrii Nakryiko wrote:
> On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>>
>> A pointer to a BTF object is a pointer to a kernel object or NULL.
>> Such pointers can only be used by BPF_LDX instructions.
>> The verifier changes their opcode from LDX|MEM|size
>> to LDX|PROBE_MEM|size to make JITing easier.
>> The number of entries in the extable is the number of BPF_LDX insns
>> that access kernel memory via "pointer to BTF type".
>> Only these load instructions can fault.
>> Since the x86 extable is relative, it has to be allocated in the same
>> memory region as the JITed code.
>> Allocate it prior to the last pass of JITing and let the last pass populate it.
>> A pointer to the extable in bpf_prog_aux is necessary to make page fault
>> handling fast.
>> Page fault handling is done in two steps:
>> 1. bpf_prog_kallsyms_find() finds the BPF program that page faulted.
>>     It's done by walking an rb tree.
>> 2. Then the extable for the given bpf program is binary searched.
>> This process is similar to how page faulting is done for kernel modules.
>> The exception handler skips over the faulting x86 instruction and
>> initializes the destination register with zero. This mimics the exact
>> behavior of bpf_probe_read (when probe_kernel_read faults, dest is zeroed).
>>
>> JITs for other architectures can add support in a similar way.
>> Until then they will reject the unknown opcode and fall back to the interpreter.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
>>   arch/x86/net/bpf_jit_comp.c | 96 +++++++++++++++++++++++++++++++++++--
>>   include/linux/bpf.h         |  3 ++
>>   include/linux/extable.h     | 10 ++++
>>   kernel/bpf/core.c           | 20 +++++++-
>>   kernel/bpf/verifier.c       |  1 +
>>   kernel/extable.c            |  2 +
>>   6 files changed, 127 insertions(+), 5 deletions(-)
>>
> 
> This is surprisingly easy to follow :) Looks good overall, just one
> concern about 32-bit distance between ex_handler_bpf and BPF jitted
> program below. And I agree with Eric, probably need to ensure proper
> alignment for exception_table_entry array.

already fixed.
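Because the extable is carved out of the same allocation as the JITed code, its start has to be rounded up to the alignment of struct exception_table_entry, which is the alignment concern raised above. A minimal user-space sketch of that layout computation, assuming an x86-style entry of three 32-bit relative fields; round_up_pow2(), extable_offset() and image_size() are hypothetical helpers, not the JIT's actual code:

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Assumed layout of an x86 relative extable entry: three 32-bit fields. */
struct exception_table_entry {
	int32_t insn, fixup, handler;
};

/* Round x up to a multiple of align (align must be a power of two). */
size_t round_up_pow2(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Offset of the extable inside the image: right after the code,
 * rounded up so that every entry is naturally aligned. */
size_t extable_offset(size_t proglen)
{
	return round_up_pow2(proglen, _Alignof(struct exception_table_entry));
}

/* Total bytes to allocate for proglen bytes of code plus the extable. */
size_t image_size(size_t proglen, size_t num_exentries)
{
	return extable_offset(proglen) +
	       num_exentries * sizeof(struct exception_table_entry);
}
```

For a 123-byte program with two probing loads, the extable would start at offset 124 and the allocation would need 124 + 2 * 12 = 148 bytes.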


> [...]
> 
>> @@ -805,6 +835,48 @@ stx:                       if (is_imm8(insn->off))
>>                          else
>>                                  EMIT1_off32(add_2reg(0x80, src_reg, dst_reg),
>>                                              insn->off);
>> +                       if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
>> +                               struct exception_table_entry *ex;
>> +                               u8 *_insn = image + proglen;
>> +                               s64 delta;
>> +
>> +                               if (!bpf_prog->aux->extable)
>> +                                       break;
>> +
>> +                               if (excnt >= bpf_prog->aux->num_exentries) {
>> +                                       pr_err("ex gen bug\n");
> 
> This should never happen, right? BUG()?

Absolutely not. No BUG()s in the kernel for things like this.
If the kernel can continue, it should.

>> +                                       return -EFAULT;
>> +                               }
>> +                               ex = &bpf_prog->aux->extable[excnt++];
>> +
>> +                               delta = _insn - (u8 *)&ex->insn;
>> +                               if (!is_simm32(delta)) {
>> +                                       pr_err("extable->insn doesn't fit into 32-bit\n");
>> +                                       return -EFAULT;
>> +                               }
>> +                               ex->insn = delta;
>> +
>> +                               delta = (u8 *)ex_handler_bpf - (u8 *)&ex->handler;
> 
> How likely is it that the global ex_handler_bpf will be close enough to
> the dynamically allocated piece of exception_table_entry?

99.9%, since we rely on that in other places in the JIT.
See BPF_CALL, for example.
But I'd like to keep the check below just in case,
same as in BPF_CALL.
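The relative encoding discussed here stores each extable field as a signed 32-bit distance from the field itself, so an entry only works if its target lies within roughly 2 GiB of the table; the second step of fault handling then binary-searches the per-program table by faulting address. A user-space sketch under those assumptions, with a hypothetical one-field entry and helper names (is_simm32() follows the same "fits in s32" idea as the JIT's check):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <assert.h>

struct ex_entry {               /* hypothetical: only the insn field */
	int32_t insn;           /* address of faulting insn minus &insn */
};

/* Code and extable share one allocation, as in the JIT. */
struct image {
	unsigned char code[256];
	struct ex_entry table[4];
};

bool is_simm32(int64_t v)
{
	return v == (int64_t)(int32_t)v;
}

/* Record the faulting instruction address relative to the field itself.
 * Returns false if the distance does not fit into 32 bits. */
bool ex_set_insn(struct ex_entry *ex, const unsigned char *insn)
{
	int64_t delta = (int64_t)((intptr_t)insn - (intptr_t)&ex->insn);

	if (!is_simm32(delta))
		return false;
	ex->insn = (int32_t)delta;
	return true;
}

/* Decode the absolute address back from the relative field. */
uintptr_t ex_to_insn(const struct ex_entry *ex)
{
	return (uintptr_t)&ex->insn + (uintptr_t)(intptr_t)ex->insn;
}

/* Entries are emitted in code order, so the decoded addresses are
 * sorted and a binary search finds the entry for a faulting address. */
const struct ex_entry *ex_search(const struct ex_entry *tab, size_t n,
				 uintptr_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		uintptr_t a = ex_to_insn(&tab[mid]);

		if (a == addr)
			return &tab[mid];
		if (a < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

A fault at an address with no entry yields NULL, which in the kernel would mean the fault is not a BPF probe load at all.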

* Re: [PATCH bpf-next 08/10] bpf: check types of arguments passed into helpers
  2019-10-05  5:03 ` [PATCH bpf-next 08/10] bpf: check types of arguments passed into helpers Alexei Starovoitov
@ 2019-10-09 18:01   ` Andrii Nakryiko
  2019-10-09 19:58     ` Alexei Starovoitov
  0 siblings, 1 reply; 39+ messages in thread
From: Andrii Nakryiko @ 2019-10-09 18:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>
> Introduce a new helper that reuses the existing skb perf_event output
> implementation, but can be called from raw_tracepoint programs
> that receive 'struct sk_buff *' as a tracepoint argument or
> can walk other kernel data structures to an skb pointer.
>
> In order to do that, teach the verifier to resolve the true C types
> of bpf helpers into in-kernel BTF ids.
> The type of the kernel pointer passed by the raw tracepoint into the bpf
> program will be tracked by the verifier all the way until
> it's passed into a helper function.
> For example:
> the kfree_skb() kernel function calls trace_kfree_skb(skb, loc).
> The bpf program receives that skb pointer and may eventually
> pass it into the bpf_skb_output() bpf helper, which in the kernel is
> implemented via the bpf_skb_event_output() kernel function.
> Its first argument in the kernel is 'struct sk_buff *'.
> The verifier makes sure that the types match all the way.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

No real concerns, just a few questions and nits below. Looks great otherwise!

>  include/linux/bpf.h                       |  3 +
>  include/uapi/linux/bpf.h                  |  3 +-
>  kernel/bpf/btf.c                          | 73 +++++++++++++++++++++++
>  kernel/bpf/verifier.c                     | 29 +++++++++
>  kernel/trace/bpf_trace.c                  |  4 ++
>  net/core/filter.c                         | 15 ++++-
>  tools/include/uapi/linux/bpf.h            |  3 +-
>  tools/testing/selftests/bpf/bpf_helpers.h |  4 ++
>  8 files changed, 131 insertions(+), 3 deletions(-)
>

[...]

> +       args = (const struct btf_param *)(t + 1);
> +       if (arg >= btf_type_vlen(t)) {
> +               bpf_verifier_log_write(env,
> +                                      "bpf helper '%s' doesn't have %d-th argument\n",
> +                                      fnname, arg);
> +               return -EINVAL;
> +       }
> +
> +       t = btf_type_by_id(btf_vmlinux, args[arg].type);
> +       if (!btf_type_is_ptr(t) || !t->type) {
> +               /* anything but the pointer to struct is a helper config bug */
> +               bpf_verifier_log_write(env,
> +                                      "ARG_PTR_TO_BTF is misconfigured\n");
> +
> +               return -EFAULT;
> +       }
> +       btf_id = t->type;
> +
> +       t = btf_type_by_id(btf_vmlinux, t->type);
> +       if (!btf_type_is_struct(t)) {

resolve mods/typedefs?

> +               bpf_verifier_log_write(env,
> +                                      "ARG_PTR_TO_BTF is not a struct\n");
> +
> +               return -EFAULT;
> +       }
> +       bpf_verifier_log_write(env,
> +                              "helper '%s' arg%d has btf_id %d struct '%s'\n",
> +                              fnname + 4, arg, btf_id,
> +                              __btf_name_by_offset(btf_vmlinux, t->name_off));
> +       return btf_id;
> +}
> +
>  void btf_type_seq_show(const struct btf *btf, u32 type_id, void *obj,
>                        struct seq_file *m)
>  {
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 957ee442f2b4..0717aacb7801 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -205,6 +205,7 @@ struct bpf_call_arg_meta {
>         u64 msize_umax_value;
>         int ref_obj_id;
>         int func_id;
> +       u32 btf_id;
>  };
>
>  struct btf *btf_vmlinux;
> @@ -3367,6 +3368,27 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
>                 expected_type = PTR_TO_SOCKET;
>                 if (type != expected_type)
>                         goto err_type;
> +       } else if (arg_type == ARG_PTR_TO_BTF_ID) {
> +               expected_type = PTR_TO_BTF_ID;
> +               if (type != expected_type)
> +                       goto err_type;
> +               if (reg->btf_id != meta->btf_id) {

just double-checking, both reg->btf_id and meta->btf_id will be
resolved through modifiers/typedefs all the way to the struct, right?

> +                       verbose(env, "Helper has type %s got %s in R%d\n",
> +                               btf_name_by_offset(btf_vmlinux,
> +                                                  btf_type_by_id(btf_vmlinux,
> +                                                                 meta->btf_id)->name_off),
> +                               btf_name_by_offset(btf_vmlinux,
> +                                                  btf_type_by_id(btf_vmlinux,
> +                                                                 reg->btf_id)->name_off),

This is a rather verbose but popular construct; maybe extract it into a
helper func and cut down on boilerplate? I think you had similar usage
in a few places in previous patches.

> +                               regno);
> +
> +                       return -EACCES;
> +               }

[...]

> @@ -4053,6 +4077,11 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn
>                 return err;
>         }
>
> +       if (fn->arg1_type == ARG_PTR_TO_BTF_ID) {
> +               if (!fn->btf_id[0])
> +                       fn->btf_id[0] = btf_resolve_helper_id(env, fn->func, 0);
> +               meta.btf_id = fn->btf_id[0];
> +       }

Is it just baby-stepping that we do this only for arg1? Any
complications from doing a loop over all 5 params?

>         meta.func_id = func_id;
>         /* check args */
>         err = check_func_arg(env, BPF_REG_1, fn->arg1_type, &meta);
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 6221e8c6ecc3..52f7e9d8c29b 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -995,6 +995,8 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
>         .arg5_type      = ARG_CONST_SIZE_OR_ZERO,
>  };
>

[...]

> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 54a50699bbfd..c5e05d1a806f 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -65,6 +65,10 @@ static int (*bpf_perf_event_output)(void *ctx, void *map,
>                                     unsigned long long flags, void *data,
>                                     int size) =
>         (void *) BPF_FUNC_perf_event_output;
> +static int (*bpf_skb_output)(void *ctx, void *map,
> +                            unsigned long long flags, void *data,
> +                            int size) =
> +       (void *) BPF_FUNC_skb_output;

Obsolete now, no more manual list of helpers.

>  static int (*bpf_get_stackid)(void *ctx, void *map, int flags) =
>         (void *) BPF_FUNC_get_stackid;
>  static int (*bpf_probe_write_user)(void *dst, const void *src, int size) =
> --
> 2.20.0
>

* Re: [PATCH bpf-next 09/10] bpf: disallow bpf_probe_read[_str] helpers
  2019-10-09  5:29   ` Andrii Nakryiko
@ 2019-10-09 19:38     ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-09 19:38 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On 10/8/19 10:29 PM, Andrii Nakryiko wrote:
> On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>>
>> Disallow bpf_probe_read() and bpf_probe_read_str() helpers in
>> raw_tracepoint bpf programs that use in-kernel BTF to track
>> types of memory accesses.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
>>   kernel/trace/bpf_trace.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index 52f7e9d8c29b..7c607f79f1bb 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -700,6 +700,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>          case BPF_FUNC_map_peek_elem:
>>                  return &bpf_map_peek_elem_proto;
>>          case BPF_FUNC_probe_read:
>> +               if (prog->expected_attach_type)
>> +                       return NULL;
> 
> This can unintentionally disable bpf_probe_read/bpf_probe_read_str for
> non-raw_tp programs that happened to specify non-zero
> expected_attach_type, which we don't really validate for
> kprobe/tp/perf_event/etc. So how about passing program type into
> tracing_func_proto() so that we can have more granular control?

Yeah. It sucks that we forgot to check expected_attach_type for zero
when that field was introduced for networking progs.
I'll add a new u32 to the prog_load command instead. It's cleaner too.
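The concern is that expected_attach_type was never validated for kprobe/tracepoint/perf_event programs, so keying the restriction on it could silently strip bpf_probe_read() from programs that merely carry a stray value there. A toy model of gating on the program type plus a new, explicitly validated load attribute instead; every name here (struct prog, attach_btf_id, the enums, func_proto()) is hypothetical and not the kernel's API:

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical, simplified model of a loaded program. */
enum prog_type { PROG_KPROBE, PROG_RAW_TRACEPOINT };
enum func_id { FUNC_PROBE_READ, FUNC_MAP_LOOKUP };

struct prog {
	enum prog_type type;
	uint32_t expected_attach_type; /* never validated for tracing progs */
	uint32_t attach_btf_id;        /* new field: 0 means no BTF typing */
};

struct func_proto { const char *name; };

static const struct func_proto probe_read_proto = { "probe_read" };
static const struct func_proto map_lookup_proto = { "map_lookup" };

/* Only BTF-typed raw_tracepoint programs lose bpf_probe_read(); a kprobe
 * program with garbage in expected_attach_type is unaffected. */
const struct func_proto *func_proto(enum func_id id, const struct prog *p)
{
	switch (id) {
	case FUNC_PROBE_READ:
		if (p->type == PROG_RAW_TRACEPOINT && p->attach_btf_id)
			return NULL;
		return &probe_read_proto;
	case FUNC_MAP_LOOKUP:
		return &map_lookup_proto;
	}
	return NULL;
}
```

The design point is that a field introduced together with its validation can safely change helper availability, whereas reinterpreting an old, unchecked field cannot.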

* Re: [PATCH bpf-next 08/10] bpf: check types of arguments passed into helpers
  2019-10-09 18:01   ` Andrii Nakryiko
@ 2019-10-09 19:58     ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-09 19:58 UTC (permalink / raw)
  To: Andrii Nakryiko, Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, x86, Networking, bpf, Kernel Team

On 10/9/19 11:01 AM, Andrii Nakryiko wrote:
> On Fri, Oct 4, 2019 at 10:04 PM Alexei Starovoitov <ast@kernel.org> wrote:
>>
>> Introduce a new helper that reuses the existing skb perf_event output
>> implementation, but can be called from raw_tracepoint programs
>> that receive 'struct sk_buff *' as a tracepoint argument or
>> can walk other kernel data structures to an skb pointer.
>>
>> In order to do that, teach the verifier to resolve the true C types
>> of bpf helpers into in-kernel BTF ids.
>> The type of the kernel pointer passed by the raw tracepoint into the bpf
>> program will be tracked by the verifier all the way until
>> it's passed into a helper function.
>> For example:
>> the kfree_skb() kernel function calls trace_kfree_skb(skb, loc).
>> The bpf program receives that skb pointer and may eventually
>> pass it into the bpf_skb_output() bpf helper, which in the kernel is
>> implemented via the bpf_skb_event_output() kernel function.
>> Its first argument in the kernel is 'struct sk_buff *'.
>> The verifier makes sure that the types match all the way.
>>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>> ---
> 
> No real concerns, just a few questions and nits below. Looks great otherwise!
> 
>>   include/linux/bpf.h                       |  3 +
>>   include/uapi/linux/bpf.h                  |  3 +-
>>   kernel/bpf/btf.c                          | 73 +++++++++++++++++++++++
>>   kernel/bpf/verifier.c                     | 29 +++++++++
>>   kernel/trace/bpf_trace.c                  |  4 ++
>>   net/core/filter.c                         | 15 ++++-
>>   tools/include/uapi/linux/bpf.h            |  3 +-
>>   tools/testing/selftests/bpf/bpf_helpers.h |  4 ++
>>   8 files changed, 131 insertions(+), 3 deletions(-)
>>
> 
> [...]
> 
>> +       args = (const struct btf_param *)(t + 1);
>> +       if (arg >= btf_type_vlen(t)) {
>> +               bpf_verifier_log_write(env,
>> +                                      "bpf helper '%s' doesn't have %d-th argument\n",
>> +                                      fnname, arg);
>> +               return -EINVAL;
>> +       }
>> +
>> +       t = btf_type_by_id(btf_vmlinux, args[arg].type);
>> +       if (!btf_type_is_ptr(t) || !t->type) {
>> +               /* anything but the pointer to struct is a helper config bug */
>> +               bpf_verifier_log_write(env,
>> +                                      "ARG_PTR_TO_BTF is misconfigured\n");
>> +
>> +               return -EFAULT;
>> +       }
>> +       btf_id = t->type;
>> +
>> +       t = btf_type_by_id(btf_vmlinux, t->type);
>> +       if (!btf_type_is_struct(t)) {
> 
> resolve mods/typedefs?

fixed
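The "resolve mods/typedefs?" point is that the pointee of a helper argument may be reached through typedef, const or volatile entries before the struct itself, so the check has to skip them on both sides of the pointer. The walk can be sketched over a toy type table as below. struct toy_type and its kinds are simplified assumptions loosely modelled on BTF (type id 0 is treated as void, as in BTF), not the kernel's struct btf_type; type_name() is a hypothetical version of the name-lookup helper suggested for the verbose() message.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

enum kind { K_VOID, K_INT, K_PTR, K_STRUCT, K_TYPEDEF, K_CONST, K_VOLATILE };

struct toy_type {
	enum kind kind;
	uint32_t type;          /* referenced id for PTR/TYPEDEF/CONST/VOLATILE */
	const char *name;
};

/* Follow typedef/const/volatile chains; return the first other id. */
uint32_t skip_mods_and_typedefs(const struct toy_type *tab, uint32_t id)
{
	while (tab[id].kind == K_TYPEDEF || tab[id].kind == K_CONST ||
	       tab[id].kind == K_VOLATILE)
		id = tab[id].type;
	return id;
}

/* Given the type id of a helper argument, return the id of the struct
 * it points to, or 0 if it is not a pointer to a struct. */
uint32_t resolve_ptr_to_struct(const struct toy_type *tab, uint32_t arg_id)
{
	uint32_t id = skip_mods_and_typedefs(tab, arg_id);

	if (tab[id].kind != K_PTR || !tab[id].type)
		return 0;
	id = skip_mods_and_typedefs(tab, tab[id].type);
	return tab[id].kind == K_STRUCT ? id : 0;
}

/* Name of the type behind an id, for diagnostics. */
const char *type_name(const struct toy_type *tab, uint32_t id)
{
	return tab[id].name ? tab[id].name : "(anon)";
}
```

Resolving both the helper's expected id and the register's id this way makes "const struct sk_buff *" and "skb_t *" (with skb_t a typedef of struct sk_buff) compare equal.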

>> +                       verbose(env, "Helper has type %s got %s in R%d\n",
>> +                               btf_name_by_offset(btf_vmlinux,
>> +                                                  btf_type_by_id(btf_vmlinux,
>> +                                                                 meta->btf_id)->name_off),
>> +                               btf_name_by_offset(btf_vmlinux,
>> +                                                  btf_type_by_id(btf_vmlinux,
>> +                                                                 reg->btf_id)->name_off),
> 
> This is a rather verbose but popular construct; maybe extract it into a
> helper func and cut down on boilerplate? I think you had similar usage
> in a few places in previous patches.

makes sense.

>> +       if (fn->arg1_type == ARG_PTR_TO_BTF_ID) {
>> +               if (!fn->btf_id[0])
>> +                       fn->btf_id[0] = btf_resolve_helper_id(env, fn->func, 0);
>> +               meta.btf_id = fn->btf_id[0];
>> +       }
> 
> Is it just baby-stepping that we do this only for arg1? Any
> complications from doing a loop over all 5 params?

fixed
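Generalizing the arg1-only block to every argument amounts to a loop that lazily resolves and caches one BTF id per argument slot. In the sketch below, struct proto is a simplified stand-in for struct bpf_func_proto and resolve_helper_arg() is a hypothetical stub playing the role that btf_resolve_helper_id() plays in the patch; it just returns a made-up id and counts how often it is called.

```c
#include <stdint.h>
#include <assert.h>

#define MAX_ARGS 5

enum arg_type { ARG_ANYTHING, ARG_PTR_TO_BTF_ID };

/* Simplified stand-in for struct bpf_func_proto. */
struct proto {
	enum arg_type arg_type[MAX_ARGS];
	uint32_t btf_id[MAX_ARGS];      /* 0 = not resolved yet */
};

static int resolve_calls;               /* counts resolver invocations */

/* Hypothetical resolver stub: pretend argument i points to type 100 + i. */
static uint32_t resolve_helper_arg(const struct proto *fn, int arg)
{
	(void)fn;
	resolve_calls++;
	return 100 + (uint32_t)arg;
}

/* Fill meta_btf_id[] for every BTF-typed argument, resolving each slot
 * at most once and caching the result in the proto. */
void prepare_btf_ids(struct proto *fn, uint32_t meta_btf_id[MAX_ARGS])
{
	for (int i = 0; i < MAX_ARGS; i++) {
		meta_btf_id[i] = 0;
		if (fn->arg_type[i] != ARG_PTR_TO_BTF_ID)
			continue;
		if (!fn->btf_id[i])
			fn->btf_id[i] = resolve_helper_arg(fn, i);
		meta_btf_id[i] = fn->btf_id[i];
	}
}
```

Caching in the proto means the BTF walk happens once per helper argument, not once per call site the verifier checks.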

* Re: [PATCH bpf-next 03/10] bpf: process in-kernel BTF
  2019-10-05  5:03 ` [PATCH bpf-next 03/10] bpf: process in-kernel BTF Alexei Starovoitov
  2019-10-06  6:36   ` Andrii Nakryiko
@ 2019-10-09 20:51   ` Martin Lau
  2019-10-10  3:43     ` Alexei Starovoitov
  1 sibling, 1 reply; 39+ messages in thread
From: Martin Lau @ 2019-10-09 20:51 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, daniel, x86, netdev, bpf, Kernel Team

On Fri, Oct 04, 2019 at 10:03:07PM -0700, Alexei Starovoitov wrote:
> If in-kernel BTF exists, parse it and prepare 'struct btf *btf_vmlinux'
> for further use by the verifier.
> In-kernel BTF is trusted just like kallsyms and other build artifacts
> embedded into vmlinux.
> Yet run this BTF image through the BTF verifier to make sure
> that it is valid and wasn't mangled during the build.
> 
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---
>  include/linux/bpf_verifier.h |  4 ++-
>  include/linux/btf.h          |  1 +
>  kernel/bpf/btf.c             | 66 ++++++++++++++++++++++++++++++++++++
>  kernel/bpf/verifier.c        | 18 ++++++++++
>  4 files changed, 88 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 26a6d58ca78c..432ba8977a0a 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -330,10 +330,12 @@ static inline bool bpf_verifier_log_full(const struct bpf_verifier_log *log)
>  #define BPF_LOG_STATS	4
>  #define BPF_LOG_LEVEL	(BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
>  #define BPF_LOG_MASK	(BPF_LOG_LEVEL | BPF_LOG_STATS)
> +#define BPF_LOG_KERNEL (BPF_LOG_MASK + 1)
>  
>  static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
>  {
> -	return log->level && log->ubuf && !bpf_verifier_log_full(log);
> +	return (log->level && log->ubuf && !bpf_verifier_log_full(log)) ||
> +		log->level == BPF_LOG_KERNEL;
>  }
>  
>  #define BPF_MAX_SUBPROGS 256
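The new BPF_LOG_KERNEL level is BPF_LOG_MASK + 1, i.e. the next bit above every user-visible log bit, so it cannot overlap a level user space sets, and bpf_verifier_log_needed() now reports "needed" for it even without a user buffer. A user-space model of that predicate, using the same log-level values; struct vlog is a simplified stand-in for struct bpf_verifier_log, with the bpf_verifier_log_full() check reduced to a flag:

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

#define BPF_LOG_LEVEL1  1
#define BPF_LOG_LEVEL2  2
#define BPF_LOG_STATS   4
#define BPF_LOG_LEVEL   (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
#define BPF_LOG_MASK    (BPF_LOG_LEVEL | BPF_LOG_STATS)
#define BPF_LOG_KERNEL  (BPF_LOG_MASK + 1)

/* Simplified stand-in for struct bpf_verifier_log. */
struct vlog {
	unsigned int level;
	char *ubuf;        /* user buffer, NULL if none */
	bool full;         /* stands in for bpf_verifier_log_full() */
};

/* Same shape as the patched bpf_verifier_log_needed(). */
bool vlog_needed(const struct vlog *vl)
{
	return (vl->level && vl->ubuf && !vl->full) ||
	       vl->level == BPF_LOG_KERNEL;
}
```

Because the kernel level always reports "needed", the later hunks add early returns so that parsing vmlinux BTF does not spend time formatting per-type dump lines that no one reads.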
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index 64cdf2a23d42..55d43bc856be 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -56,6 +56,7 @@ bool btf_type_is_void(const struct btf_type *t);
>  #ifdef CONFIG_BPF_SYSCALL
>  const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);
>  const char *btf_name_by_offset(const struct btf *btf, u32 offset);
> +struct btf *btf_parse_vmlinux(void);
>  #else
>  static inline const struct btf_type *btf_type_by_id(const struct btf *btf,
>  						    u32 type_id)
> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index 29c7c06c6bd6..848f9d4b9d7e 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -698,6 +698,9 @@ __printf(4, 5) static void __btf_verifier_log_type(struct btf_verifier_env *env,
>  	if (!bpf_verifier_log_needed(log))
>  		return;
>  
> +	if (log->level == BPF_LOG_KERNEL && !fmt)
> +		return;
> +
>  	__btf_verifier_log(log, "[%u] %s %s%s",
>  			   env->log_type_id,
>  			   btf_kind_str[kind],
> @@ -735,6 +738,8 @@ static void btf_verifier_log_member(struct btf_verifier_env *env,
>  	if (!bpf_verifier_log_needed(log))
>  		return;
>  
> +	if (log->level == BPF_LOG_KERNEL && !fmt)
> +		return;
>  	/* The CHECK_META phase already did a btf dump.
>  	 *
>  	 * If member is logged again, it must hit an error in
> @@ -777,6 +782,8 @@ static void btf_verifier_log_vsi(struct btf_verifier_env *env,
>  
>  	if (!bpf_verifier_log_needed(log))
>  		return;
> +	if (log->level == BPF_LOG_KERNEL && !fmt)
> +		return;
>  	if (env->phase != CHECK_META)
>  		btf_verifier_log_type(env, datasec_type, NULL);
>  
> @@ -802,6 +809,8 @@ static void btf_verifier_log_hdr(struct btf_verifier_env *env,
>  	if (!bpf_verifier_log_needed(log))
>  		return;
>  
> +	if (log->level == BPF_LOG_KERNEL)
> +		return;
>  	hdr = &btf->hdr;
>  	__btf_verifier_log(log, "magic: 0x%x\n", hdr->magic);
>  	__btf_verifier_log(log, "version: %u\n", hdr->version);
> @@ -2406,6 +2415,8 @@ static s32 btf_enum_check_meta(struct btf_verifier_env *env,
>  		}
>  
>  
> +		if (env->log.level == BPF_LOG_KERNEL)
> +			continue;
>  		btf_verifier_log(env, "\t%s val=%d\n",
>  				 __btf_name_by_offset(btf, enums[i].name_off),
>  				 enums[i].val);
> @@ -3367,6 +3378,61 @@ static struct btf *btf_parse(void __user *btf_data, u32 btf_data_size,
>  	return ERR_PTR(err);
>  }
>  
> +extern char __weak _binary__btf_vmlinux_bin_start[];
> +extern char __weak _binary__btf_vmlinux_bin_end[];
> +
> +struct btf *btf_parse_vmlinux(void)
> +{
> +	struct btf_verifier_env *env = NULL;
> +	struct bpf_verifier_log *log;
> +	struct btf *btf = NULL;
> +	int err;
> +
> +	env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN);
> +	if (!env)
> +		return ERR_PTR(-ENOMEM);
> +
> +	log = &env->log;
> +	log->level = BPF_LOG_KERNEL;
> +
> +	btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN);
> +	if (!btf) {
> +		err = -ENOMEM;
> +		goto errout;
> +	}
> +	env->btf = btf;
> +
> +	btf->data = _binary__btf_vmlinux_bin_start;
> +	btf->data_size = _binary__btf_vmlinux_bin_end -
> +		_binary__btf_vmlinux_bin_start;
> +
> +	err = btf_parse_hdr(env);
> +	if (err)
> +		goto errout;
> +
> +	btf->nohdr_data = btf->data + btf->hdr.hdr_len;
> +
> +	err = btf_parse_str_sec(env);
> +	if (err)
> +		goto errout;
> +
> +	err = btf_check_all_metas(env);
> +	if (err)
> +		goto errout;
> +
Considering btf_vmlinux is already safe, any concern in making an extra
call to btf_check_all_types()?

Having resolved_ids and resolved_sizes available will
be handy in my later patch.

> +	btf_verifier_env_free(env);
> +	refcount_set(&btf->refcnt, 1);
> +	return btf;
> +
> +errout:
> +	btf_verifier_env_free(env);
> +	if (btf) {
> +		kvfree(btf->types);
> +		kfree(btf);
> +	}
> +	return ERR_PTR(err);
> +}
> +
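
The logging changes above rest on one piece of arithmetic. BPF_LOG_KERNEL is BPF_LOG_MASK + 1, which is 8, so it shares no bits with the user-selectable levels.

The model below is a minimal userspace sketch, not kernel code. Its names are hypothetical stand-ins for the kernel ones:
- struct log_model stands in for struct bpf_verifier_log.
- log_needed() stands in for bpf_verifier_log_needed().
- log_type_needed() mirrors the `!fmt` early returns the patch adds to the BTF type/member/vsi loggers.
- The `full` field replaces bpf_verifier_log_full().

It shows that BPF_LOG_KERNEL enables logging even without a user buffer, while plain type dumps (calls with no format string) are still dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* LEVEL1/LEVEL2 values are assumed from include/linux/bpf_verifier.h of
 * that era; the hunk above only shows STATS and the derived masks. */
#define BPF_LOG_LEVEL1 1
#define BPF_LOG_LEVEL2 2
#define BPF_LOG_STATS  4
#define BPF_LOG_LEVEL  (BPF_LOG_LEVEL1 | BPF_LOG_LEVEL2)
#define BPF_LOG_MASK   (BPF_LOG_LEVEL | BPF_LOG_STATS)
#define BPF_LOG_KERNEL (BPF_LOG_MASK + 1)

/* BPF_LOG_KERNEL (8) does not overlap any user-visible log bit. */
static_assert((BPF_LOG_KERNEL & BPF_LOG_MASK) == 0, "kernel level must be disjoint");

/* Hypothetical, trimmed-down stand-in for struct bpf_verifier_log. */
struct log_model {
	unsigned int level;
	const char *ubuf;   /* user buffer; NULL when parsing in-kernel BTF */
	bool full;          /* stands in for bpf_verifier_log_full() */
};

/* Same predicate as the patched bpf_verifier_log_needed(). */
static bool log_needed(const struct log_model *log)
{
	return (log->level && log->ubuf && !log->full) ||
	       log->level == BPF_LOG_KERNEL;
}

/* Mirrors the early returns added to the BTF type/member loggers:
 * with BPF_LOG_KERNEL, calls without a format string are skipped. */
static bool log_type_needed(const struct log_model *log, const char *fmt)
{
	if (!log_needed(log))
		return false;
	if (log->level == BPF_LOG_KERNEL && !fmt)
		return false;
	return true;
}
```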

* Re: [PATCH bpf-next 03/10] bpf: process in-kernel BTF
  2019-10-09 20:51   ` Martin Lau
@ 2019-10-10  3:43     ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-10  3:43 UTC (permalink / raw)
  To: Martin Lau, Alexei Starovoitov
  Cc: davem, daniel, x86, netdev, bpf, Kernel Team

On 10/9/19 1:51 PM, Martin Lau wrote:
>> err = btf_check_all_metas(env);
>> +	if (err)
>> +		goto errout;
>> +
> Considering btf_vmlinux is already safe, any concern in making an extra
> call to btf_check_all_types()?

The only concern is the additional memory the resolved_* arrays would take:
the usual CPU vs. memory trade-off.
For this set I think an extra while loop to skip BTF modifiers
is better than an extra array.
Your trade-off could be different, and that's OK.
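
The trade-off can be made concrete. A precomputed array answers "which type does this id ultimately refer to?" in one lookup but needs an entry per type in vmlinux BTF. The alternative is to walk modifier chains on demand.

The sketch below assumes a toy type table: struct type_model, enum kind_model and skip_mods() are hypothetical, not the kernel's structures. Like kernel/bpf/btf.c, it treats typedef, volatile, const and restrict as modifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of BTF kinds; the real BTF_KIND_* numbering lives
 * in include/uapi/linux/btf.h. */
enum kind_model { K_INT, K_PTR, K_STRUCT, K_TYPEDEF, K_VOLATILE, K_CONST, K_RESTRICT };

struct type_model {
	enum kind_model kind;
	uint32_t type;   /* referenced type id for PTR, TYPEDEF and modifiers */
};

static int is_modifier(enum kind_model k)
{
	return k == K_TYPEDEF || k == K_VOLATILE || k == K_CONST || k == K_RESTRICT;
}

/* Follow modifier chains on demand instead of consulting a precomputed
 * resolved_ids[] array. Returns the id of the first non-modifier type.
 * Assumes ids are in range and the chain is acyclic, which BTF
 * validation is meant to guarantee. */
static uint32_t skip_mods(const struct type_model *types, size_t n, uint32_t id)
{
	assert(id < n);
	while (is_modifier(types[id].kind)) {
		id = types[id].type;
		assert(id < n);
	}
	return id;
}
```

For a chain volatile -> typedef -> const -> int, skip_mods() performs three iterations. A resolved_ids array would make that a single lookup, at the cost of keeping per-type arrays around for all of vmlinux BTF.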

* Re: [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF
  2019-10-09  5:10         ` Andrii Nakryiko
@ 2019-10-10  3:54           ` Alexei Starovoitov
  0 siblings, 0 replies; 39+ messages in thread
From: Alexei Starovoitov @ 2019-10-10  3:54 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann, x86,
	Networking, bpf, Kernel Team

On 10/8/19 10:10 PM, Andrii Nakryiko wrote:
> for_each_member(i, t, member) {
>      if (moff + msize <= off)
>          continue; /* no overlap with member yet, keep iterating */
>      if (moff >= off + size)
>          break; /* won't find anything, field is already too far */
> 
>      /* overlapping, rest of the checks */

Makes sense. I'll tweak the comments and checks.
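
Andrii's loop relies on members being visited in increasing offset order. Members that end at or before the access are skipped, and the first member starting at or past the end of the access stops the search.

This is a minimal sketch under stated assumptions:
- struct member_model and find_member() are hypothetical.
- Offsets are in bytes, whereas real struct btf_member stores bit offsets.
- The struct has no unions or bitfields, so members never overlap each other.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened member: byte offset and byte size, sorted by
 * offset, non-overlapping (plain struct, no unions or bitfields). */
struct member_model {
	uint32_t off;
	uint32_t size;
};

/* Return the index of the member that fully contains the access
 * [off, off + size), or -1 if the access straddles a member boundary,
 * lands in padding, or lies past the last member. */
static int find_member(const struct member_model *m, size_t n,
		       uint32_t off, uint32_t size)
{
	assert(size > 0);
	for (size_t i = 0; i < n; i++) {
		uint32_t moff = m[i].off, msize = m[i].size;

		if (moff + msize <= off)
			continue; /* member ends before the access: keep iterating */
		if (moff >= off + size)
			break;    /* sorted: no later member can overlap either */

		/* overlapping: accept only an access fully inside this member */
		if (off >= moff && off + size <= moff + msize)
			return (int)i;
		return -1;
	}
	return -1;
}
```

Consider members {0, 8}, {8, 4} and {16, 8}. An access at offset 10 of size 2 resolves to member 1. An access at offset 12 of size 4 falls in the padding and yields -1.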

Thread overview: 39+ messages
2019-10-05  5:03 [PATCH bpf-next 00/10] bpf: revolutionize bpf tracing Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 01/10] bpf: add typecast to raw_tracepoints to help BTF generation Alexei Starovoitov
2019-10-05 18:40   ` Andrii Nakryiko
2019-10-06  3:58   ` John Fastabend
2019-10-05  5:03 ` [PATCH bpf-next 02/10] bpf: add typecast to bpf helpers " Alexei Starovoitov
2019-10-05 18:41   ` Andrii Nakryiko
2019-10-06  4:00   ` John Fastabend
2019-10-05  5:03 ` [PATCH bpf-next 03/10] bpf: process in-kernel BTF Alexei Starovoitov
2019-10-06  6:36   ` Andrii Nakryiko
2019-10-06 23:49     ` Alexei Starovoitov
2019-10-07  0:20       ` Andrii Nakryiko
2019-10-09 20:51   ` Martin Lau
2019-10-10  3:43     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 04/10] libbpf: auto-detect btf_id of raw_tracepoint Alexei Starovoitov
2019-10-07 23:41   ` Andrii Nakryiko
2019-10-09  2:26     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 05/10] bpf: implement accurate raw_tp context access via BTF Alexei Starovoitov
2019-10-07 16:32   ` Alan Maguire
2019-10-09  3:59     ` Alexei Starovoitov
2019-10-08  0:35   ` Andrii Nakryiko
2019-10-09  3:30     ` Alexei Starovoitov
2019-10-09  4:01       ` Andrii Nakryiko
2019-10-09  5:10         ` Andrii Nakryiko
2019-10-10  3:54           ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 06/10] bpf: add support for BTF pointers to interpreter Alexei Starovoitov
2019-10-08  3:08   ` Andrii Nakryiko
2019-10-05  5:03 ` [PATCH bpf-next 07/10] bpf: add support for BTF pointers to x86 JIT Alexei Starovoitov
2019-10-05  6:03   ` Eric Dumazet
2019-10-09 17:38   ` Andrii Nakryiko
2019-10-09 17:46     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 08/10] bpf: check types of arguments passed into helpers Alexei Starovoitov
2019-10-09 18:01   ` Andrii Nakryiko
2019-10-09 19:58     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 09/10] bpf: disallow bpf_probe_read[_str] helpers Alexei Starovoitov
2019-10-09  5:29   ` Andrii Nakryiko
2019-10-09 19:38     ` Alexei Starovoitov
2019-10-05  5:03 ` [PATCH bpf-next 10/10] selftests/bpf: add kfree_skb raw_tp test Alexei Starovoitov
2019-10-09  5:36   ` Andrii Nakryiko
2019-10-09 17:37     ` Alexei Starovoitov
