* [PATCH v5 bpf-next 0/2] introduce bpf_find_vma
From: Song Liu @ 2021-11-05 23:23 UTC (permalink / raw)
  To: bpf, netdev
  Cc: ast, daniel, andrii, kernel-team, kpsingh, hengqi.chen, Song Liu

Changes v4 => v5:
1. Clean up and style change in 2/2. (Andrii)

Changes v3 => v4:
1. Move mmap_unlock_work to task_iter.c to fix the build for .config without
   CONFIG_PERF_EVENTS. (kernel test robot <lkp@intel.com>)

Changes v2 => v3:
1. Avoid using an x86-only function in selftests. (Yonghong)
2. Add struct file and struct vm_area_struct to btf_task_struct_ids, and
   use it in bpf_find_vma and stackmap.c. (Yonghong)
3. Fix inaccurate comments. (Yonghong)

Changes v1 => v2:
1. Share irq_work with stackmap.c. (Daniel)
2. Add tests for illegal writes to task/vma from the callback function.
   (Daniel)
3. Other small fixes.

Add the bpf_find_vma helper, which maps an address in a task to the vma
that contains it. This can be used in some profiling use cases, and it
might also be useful for LSM.
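
For context, here is a minimal sketch of how a program might use the new
helper. This is illustrative only (sampled_addr is a placeholder global
that user space would fill in); the selftest in 2/2 is the complete
example.

// SPDX-License-Identifier: GPL-2.0
/* Sketch only; see progs/find_vma.c in 2/2 for the real test. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

#define VM_EXEC		0x00000004

__u64 sampled_addr = 0;	/* placeholder: filled in by user space */
__u32 found_exec = 0;

/* Called with mmap_lock held; writes to task/vma are rejected. */
static long check_vma(struct task_struct *task, struct vm_area_struct *vma,
		      void *ctx)
{
	if (vma->vm_flags & VM_EXEC)
		found_exec = 1;
	return 0;
}

SEC("perf_event")
int handle_pe(void)
{
	struct task_struct *task = bpf_get_current_task_btf();

	/* flags must be 0; returns -ENOENT if no vma contains the address */
	bpf_find_vma(task, sampled_addr, check_vma, NULL, 0);
	return 0;
}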

Song Liu (2):
  bpf: introduce helper bpf_find_vma
  selftests/bpf: add tests for bpf_find_vma

 include/linux/bpf.h                           |   1 +
 include/uapi/linux/bpf.h                      |  20 +++
 kernel/bpf/btf.c                              |   5 +-
 kernel/bpf/mmap_unlock_work.h                 |  65 ++++++++++
 kernel/bpf/stackmap.c                         |  80 ++----------
 kernel/bpf/task_iter.c                        |  76 ++++++++++--
 kernel/bpf/verifier.c                         |  34 +++++
 kernel/trace/bpf_trace.c                      |   2 +
 tools/include/uapi/linux/bpf.h                |  20 +++
 .../selftests/bpf/prog_tests/find_vma.c       | 117 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/find_vma.c  |  69 +++++++++++
 .../selftests/bpf/progs/find_vma_fail1.c      |  29 +++++
 .../selftests/bpf/progs/find_vma_fail2.c      |  29 +++++
 13 files changed, 466 insertions(+), 81 deletions(-)
 create mode 100644 kernel/bpf/mmap_unlock_work.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/find_vma.c
 create mode 100644 tools/testing/selftests/bpf/progs/find_vma.c
 create mode 100644 tools/testing/selftests/bpf/progs/find_vma_fail1.c
 create mode 100644 tools/testing/selftests/bpf/progs/find_vma_fail2.c

--
2.30.2


* [PATCH v5 bpf-next 1/2] bpf: introduce helper bpf_find_vma
From: Song Liu @ 2021-11-05 23:23 UTC (permalink / raw)
  To: bpf, netdev
  Cc: ast, daniel, andrii, kernel-team, kpsingh, hengqi.chen, Song Liu,
	Yonghong Song

In some profiler use cases, it is necessary to map an address to the
backing file, e.g., a shared library. The bpf_find_vma helper provides a
flexible way to achieve this. bpf_find_vma maps an address of a task to
the vma (vm_area_struct) that contains the address, and feeds the vma to
a callback BPF function. The callback is needed because the vma is only
valid while mmap_sem is held; the helper invokes the callback under the
lock and releases the lock before returning.

It is necessary to lock mmap_sem for find_vma. To lock and unlock
mmap_sem safely when irqs are disabled, we use the same mechanism as
stackmap with build_id: when irqs are disabled, the unlock is postponed
to an irq_work. Refactor stackmap.c so that the irq_work is shared
between bpf_find_vma and the stackmap helpers.

Acked-by: Yonghong Song <yhs@fb.com>
Tested-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/bpf.h            |  1 +
 include/uapi/linux/bpf.h       | 20 +++++++++
 kernel/bpf/btf.c               |  5 ++-
 kernel/bpf/mmap_unlock_work.h  | 65 +++++++++++++++++++++++++++
 kernel/bpf/stackmap.c          | 80 +++-------------------------------
 kernel/bpf/task_iter.c         | 76 +++++++++++++++++++++++++++++---
 kernel/bpf/verifier.c          | 34 +++++++++++++++
 kernel/trace/bpf_trace.c       |  2 +
 tools/include/uapi/linux/bpf.h | 20 +++++++++
 9 files changed, 222 insertions(+), 81 deletions(-)
 create mode 100644 kernel/bpf/mmap_unlock_work.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2be6dfd68df99..df3410bff4b06 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2157,6 +2157,7 @@ extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
 extern const struct bpf_func_proto bpf_sk_setsockopt_proto;
 extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
 extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
+extern const struct bpf_func_proto bpf_find_vma_proto;
 
 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ba5af15e25f5c..509eee5f0393d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4938,6 +4938,25 @@ union bpf_attr {
  *		**-ENOENT** if symbol is not found.
  *
  *		**-EPERM** if caller does not have permission to obtain kernel address.
+ *
+ * long bpf_find_vma(struct task_struct *task, u64 addr, void *callback_fn, void *callback_ctx, u64 flags)
+ *	Description
+ *		Find vma of *task* that contains *addr*, call *callback_fn*
+ *		function with *task*, *vma*, and *callback_ctx*.
+ *		The *callback_fn* should be a static function and
+ *		the *callback_ctx* should be a pointer to the stack.
+ *		The *flags* is used to control certain aspects of the helper.
+ *		Currently, the *flags* must be 0.
+ *
+ *		The expected callback signature is
+ *
+ *		long (\*callback_fn)(struct task_struct \*task, struct vm_area_struct \*vma, void \*callback_ctx);
+ *
+ *	Return
+ *		0 on success.
+ *		**-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
+ *		**-EBUSY** if failed to trylock mmap_lock.
+ *		**-EINVAL** for invalid **flags**.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5120,6 +5139,7 @@ union bpf_attr {
 	FN(trace_vprintk),		\
 	FN(skc_to_unix_sock),		\
 	FN(kallsyms_lookup_name),	\
+	FN(find_vma),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index dbc3ad07e21b6..cdb0fba656006 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6342,7 +6342,10 @@ const struct bpf_func_proto bpf_btf_find_by_name_kind_proto = {
 	.arg4_type	= ARG_ANYTHING,
 };
 
-BTF_ID_LIST_GLOBAL_SINGLE(btf_task_struct_ids, struct, task_struct)
+BTF_ID_LIST_GLOBAL(btf_task_struct_ids)
+BTF_ID(struct, task_struct)
+BTF_ID(struct, file)
+BTF_ID(struct, vm_area_struct)
 
 /* BTF ID set registration API for modules */
 
diff --git a/kernel/bpf/mmap_unlock_work.h b/kernel/bpf/mmap_unlock_work.h
new file mode 100644
index 0000000000000..5d18d7d85bef9
--- /dev/null
+++ b/kernel/bpf/mmap_unlock_work.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2021 Facebook
+ */
+
+#ifndef __MMAP_UNLOCK_WORK_H__
+#define __MMAP_UNLOCK_WORK_H__
+#include <linux/irq_work.h>
+
+/* irq_work to defer mmap_read_unlock() when irqs are disabled */
+struct mmap_unlock_irq_work {
+	struct irq_work irq_work;
+	struct mm_struct *mm;
+};
+
+DECLARE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
+
+/*
+ * We cannot do mmap_read_unlock() when the irq is disabled, because of
+ * the risk of deadlocking with rq_lock. To look up the vma when irqs are
+ * disabled, we need to run mmap_read_unlock() in an irq_work. We use a
+ * percpu variable for the irq_work. If the irq_work is already used by
+ * another lookup, we fall back.
+ */
+static inline bool bpf_mmap_unlock_get_irq_work(struct mmap_unlock_irq_work **work_ptr)
+{
+	struct mmap_unlock_irq_work *work = NULL;
+	bool irq_work_busy = false;
+
+	if (irqs_disabled()) {
+		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+			work = this_cpu_ptr(&mmap_unlock_work);
+			if (irq_work_is_busy(&work->irq_work)) {
+				/* cannot queue more up_read, fallback */
+				irq_work_busy = true;
+			}
+		} else {
+			/*
+			 * PREEMPT_RT does not allow trylocking mmap_sem in
+			 * interrupt-disabled context. Force the fallback code.
+			 */
+			irq_work_busy = true;
+		}
+	}
+
+	*work_ptr = work;
+	return irq_work_busy;
+}
+
+static inline void bpf_mmap_unlock_mm(struct mmap_unlock_irq_work *work, struct mm_struct *mm)
+{
+	if (!work) {
+		mmap_read_unlock(mm);
+	} else {
+		work->mm = mm;
+
+		/* The lock will be released once we're out of interrupt
+		 * context. Tell lockdep that we've released it now so
+		 * it doesn't complain that we forgot to release it.
+		 */
+		rwsem_release(&mm->mmap_lock.dep_map, _RET_IP_);
+		irq_work_queue(&work->irq_work);
+	}
+}
+
+#endif /* __MMAP_UNLOCK_WORK_H__ */
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 6e75bbee39f0b..1de0a1b03636e 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -7,10 +7,10 @@
 #include <linux/kernel.h>
 #include <linux/stacktrace.h>
 #include <linux/perf_event.h>
-#include <linux/irq_work.h>
 #include <linux/btf_ids.h>
 #include <linux/buildid.h>
 #include "percpu_freelist.h"
+#include "mmap_unlock_work.h"
 
 #define STACK_CREATE_FLAG_MASK					\
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY |	\
@@ -31,25 +31,6 @@ struct bpf_stack_map {
 	struct stack_map_bucket *buckets[];
 };
 
-/* irq_work to run up_read() for build_id lookup in nmi context */
-struct stack_map_irq_work {
-	struct irq_work irq_work;
-	struct mm_struct *mm;
-};
-
-static void do_up_read(struct irq_work *entry)
-{
-	struct stack_map_irq_work *work;
-
-	if (WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT)))
-		return;
-
-	work = container_of(entry, struct stack_map_irq_work, irq_work);
-	mmap_read_unlock_non_owner(work->mm);
-}
-
-static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work);
-
 static inline bool stack_map_use_build_id(struct bpf_map *map)
 {
 	return (map->map_flags & BPF_F_STACK_BUILD_ID);
@@ -149,35 +130,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 					  u64 *ips, u32 trace_nr, bool user)
 {
 	int i;
+	struct mmap_unlock_irq_work *work = NULL;
+	bool irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);
 	struct vm_area_struct *vma;
-	bool irq_work_busy = false;
-	struct stack_map_irq_work *work = NULL;
-
-	if (irqs_disabled()) {
-		if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
-			work = this_cpu_ptr(&up_read_work);
-			if (irq_work_is_busy(&work->irq_work)) {
-				/* cannot queue more up_read, fallback */
-				irq_work_busy = true;
-			}
-		} else {
-			/*
-			 * PREEMPT_RT does not allow to trylock mmap sem in
-			 * interrupt disabled context. Force the fallback code.
-			 */
-			irq_work_busy = true;
-		}
-	}
 
-	/*
-	 * We cannot do up_read() when the irq is disabled, because of
-	 * risk to deadlock with rq_lock. To do build_id lookup when the
-	 * irqs are disabled, we need to run up_read() in irq_work. We use
-	 * a percpu variable to do the irq_work. If the irq_work is
-	 * already used by another lookup, we fall back to report ips.
-	 *
-	 * Same fallback is used for kernel stack (!user) on a stackmap
-	 * with build_id.
+	/* If the irq_work is in use, fall back to report ips. Same
+	 * fallback is used for kernel stack (!user) on a stackmap with
+	 * build_id.
 	 */
 	if (!user || !current || !current->mm || irq_work_busy ||
 	    !mmap_read_trylock(current->mm)) {
@@ -203,19 +162,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
 			- vma->vm_start;
 		id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
 	}
-
-	if (!work) {
-		mmap_read_unlock(current->mm);
-	} else {
-		work->mm = current->mm;
-
-		/* The lock will be released once we're out of interrupt
-		 * context. Tell lockdep that we've released it now so
-		 * it doesn't complain that we forgot to release it.
-		 */
-		rwsem_release(&current->mm->mmap_lock.dep_map, _RET_IP_);
-		irq_work_queue(&work->irq_work);
-	}
+	bpf_mmap_unlock_mm(work, current->mm);
 }
 
 static struct perf_callchain_entry *
@@ -719,16 +666,3 @@ const struct bpf_map_ops stack_trace_map_ops = {
 	.map_btf_name = "bpf_stack_map",
 	.map_btf_id = &stack_trace_map_btf_id,
 };
-
-static int __init stack_map_init(void)
-{
-	int cpu;
-	struct stack_map_irq_work *work;
-
-	for_each_possible_cpu(cpu) {
-		work = per_cpu_ptr(&up_read_work, cpu);
-		init_irq_work(&work->irq_work, do_up_read);
-	}
-	return 0;
-}
-subsys_initcall(stack_map_init);
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index b48750bfba5aa..f171479f7dd6b 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -8,6 +8,7 @@
 #include <linux/fdtable.h>
 #include <linux/filter.h>
 #include <linux/btf_ids.h>
+#include "mmap_unlock_work.h"
 
 struct bpf_iter_seq_task_common {
 	struct pid_namespace *ns;
@@ -524,10 +525,6 @@ static const struct seq_operations task_vma_seq_ops = {
 	.show	= task_vma_seq_show,
 };
 
-BTF_ID_LIST(btf_task_file_ids)
-BTF_ID(struct, file)
-BTF_ID(struct, vm_area_struct)
-
 static const struct bpf_iter_seq_info task_seq_info = {
 	.seq_ops		= &task_seq_ops,
 	.init_seq_private	= init_seq_pidns,
@@ -586,9 +583,74 @@ static struct bpf_iter_reg task_vma_reg_info = {
 	.seq_info		= &task_vma_seq_info,
 };
 
+BPF_CALL_5(bpf_find_vma, struct task_struct *, task, u64, start,
+	   bpf_callback_t, callback_fn, void *, callback_ctx, u64, flags)
+{
+	struct mmap_unlock_irq_work *work = NULL;
+	struct vm_area_struct *vma;
+	bool irq_work_busy = false;
+	struct mm_struct *mm;
+	int ret = -ENOENT;
+
+	if (flags)
+		return -EINVAL;
+
+	if (!task)
+		return -ENOENT;
+
+	mm = task->mm;
+	if (!mm)
+		return -ENOENT;
+
+	irq_work_busy = bpf_mmap_unlock_get_irq_work(&work);
+
+	if (irq_work_busy || !mmap_read_trylock(mm))
+		return -EBUSY;
+
+	vma = find_vma(mm, start);
+
+	if (vma && vma->vm_start <= start && vma->vm_end > start) {
+		callback_fn((u64)(long)task, (u64)(long)vma,
+			    (u64)(long)callback_ctx, 0, 0);
+		ret = 0;
+	}
+	bpf_mmap_unlock_mm(work, mm);
+	return ret;
+}
+
+const struct bpf_func_proto bpf_find_vma_proto = {
+	.func		= bpf_find_vma,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &btf_task_struct_ids[0],
+	.arg2_type	= ARG_ANYTHING,
+	.arg3_type	= ARG_PTR_TO_FUNC,
+	.arg4_type	= ARG_PTR_TO_STACK_OR_NULL,
+	.arg5_type	= ARG_ANYTHING,
+};
+
+DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
+
+static void do_mmap_read_unlock(struct irq_work *entry)
+{
+	struct mmap_unlock_irq_work *work;
+
+	if (WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_RT)))
+		return;
+
+	work = container_of(entry, struct mmap_unlock_irq_work, irq_work);
+	mmap_read_unlock_non_owner(work->mm);
+}
+
 static int __init task_iter_init(void)
 {
-	int ret;
+	struct mmap_unlock_irq_work *work;
+	int ret, cpu;
+
+	for_each_possible_cpu(cpu) {
+		work = per_cpu_ptr(&mmap_unlock_work, cpu);
+		init_irq_work(&work->irq_work, do_mmap_read_unlock);
+	}
 
 	task_reg_info.ctx_arg_info[0].btf_id = btf_task_struct_ids[0];
 	ret = bpf_iter_reg_target(&task_reg_info);
@@ -596,13 +658,13 @@ static int __init task_iter_init(void)
 		return ret;
 
 	task_file_reg_info.ctx_arg_info[0].btf_id = btf_task_struct_ids[0];
-	task_file_reg_info.ctx_arg_info[1].btf_id = btf_task_file_ids[0];
+	task_file_reg_info.ctx_arg_info[1].btf_id = btf_task_struct_ids[1];
 	ret =  bpf_iter_reg_target(&task_file_reg_info);
 	if (ret)
 		return ret;
 
 	task_vma_reg_info.ctx_arg_info[0].btf_id = btf_task_struct_ids[0];
-	task_vma_reg_info.ctx_arg_info[1].btf_id = btf_task_file_ids[1];
+	task_vma_reg_info.ctx_arg_info[1].btf_id = btf_task_struct_ids[2];
 	return bpf_iter_reg_target(&task_vma_reg_info);
 }
 late_initcall(task_iter_init);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f0dca726ebfde..1aafb43f61d1c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6132,6 +6132,33 @@ static int set_timer_callback_state(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int set_find_vma_callback_state(struct bpf_verifier_env *env,
+				       struct bpf_func_state *caller,
+				       struct bpf_func_state *callee,
+				       int insn_idx)
+{
+	/* bpf_find_vma(struct task_struct *task, u64 addr,
+	 *               void *callback_fn, void *callback_ctx, u64 flags)
+	 * (callback_fn)(struct task_struct *task,
+	 *               struct vm_area_struct *vma, void *callback_ctx);
+	 */
+	callee->regs[BPF_REG_1] = caller->regs[BPF_REG_1];
+
+	callee->regs[BPF_REG_2].type = PTR_TO_BTF_ID;
+	__mark_reg_known_zero(&callee->regs[BPF_REG_2]);
+	callee->regs[BPF_REG_2].btf =  btf_vmlinux;
+	callee->regs[BPF_REG_2].btf_id = btf_task_struct_ids[2];
+
+	/* pointer to stack or null */
+	callee->regs[BPF_REG_3] = caller->regs[BPF_REG_4];
+
+	/* unused */
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_4]);
+	__mark_reg_not_init(env, &callee->regs[BPF_REG_5]);
+	callee->in_callback_fn = true;
+	return 0;
+}
+
 static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx)
 {
 	struct bpf_verifier_state *state = env->cur_state;
@@ -6489,6 +6516,13 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 			return -EINVAL;
 	}
 
+	if (func_id == BPF_FUNC_find_vma) {
+		err = __check_func_call(env, insn, insn_idx_p, meta.subprogno,
+					set_find_vma_callback_state);
+		if (err < 0)
+			return -EINVAL;
+	}
+
 	if (func_id == BPF_FUNC_snprintf) {
 		err = check_bpf_snprintf_call(env, regs);
 		if (err < 0)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7396488793ff7..390176a3031ab 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1208,6 +1208,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_func_ip_proto_tracing;
 	case BPF_FUNC_get_branch_snapshot:
 		return &bpf_get_branch_snapshot_proto;
+	case BPF_FUNC_find_vma:
+		return &bpf_find_vma_proto;
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
 	default:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ba5af15e25f5c..509eee5f0393d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4938,6 +4938,25 @@ union bpf_attr {
  *		**-ENOENT** if symbol is not found.
  *
  *		**-EPERM** if caller does not have permission to obtain kernel address.
+ *
+ * long bpf_find_vma(struct task_struct *task, u64 addr, void *callback_fn, void *callback_ctx, u64 flags)
+ *	Description
+ *		Find vma of *task* that contains *addr*, call *callback_fn*
+ *		function with *task*, *vma*, and *callback_ctx*.
+ *		The *callback_fn* should be a static function and
+ *		the *callback_ctx* should be a pointer to the stack.
+ *		The *flags* is used to control certain aspects of the helper.
+ *		Currently, the *flags* must be 0.
+ *
+ *		The expected callback signature is
+ *
+ *		long (\*callback_fn)(struct task_struct \*task, struct vm_area_struct \*vma, void \*callback_ctx);
+ *
+ *	Return
+ *		0 on success.
+ *		**-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
+ *		**-EBUSY** if failed to trylock mmap_lock.
+ *		**-EINVAL** for invalid **flags**.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -5120,6 +5139,7 @@ union bpf_attr {
 	FN(trace_vprintk),		\
 	FN(skc_to_unix_sock),		\
 	FN(kallsyms_lookup_name),	\
+	FN(find_vma),			\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.30.2



* [PATCH v5 bpf-next 2/2] selftests/bpf: add tests for bpf_find_vma
From: Song Liu @ 2021-11-05 23:23 UTC (permalink / raw)
  To: bpf, netdev
  Cc: ast, daniel, andrii, kernel-team, kpsingh, hengqi.chen, Song Liu,
	Yonghong Song

Add tests for bpf_find_vma in a perf_event program and in a kprobe
program. The perf_event program is triggered from NMI context, so the
second call of bpf_find_vma() will return -EBUSY (irq_work busy). The
kprobe program, on the other hand, does not have this constraint.

Also add tests for illegal writes to task or vma from the callback
function. The verifier should reject both cases.

Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 .../selftests/bpf/prog_tests/find_vma.c       | 117 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/find_vma.c  |  69 +++++++++++
 .../selftests/bpf/progs/find_vma_fail1.c      |  29 +++++
 .../selftests/bpf/progs/find_vma_fail2.c      |  29 +++++
 4 files changed, 244 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/find_vma.c
 create mode 100644 tools/testing/selftests/bpf/progs/find_vma.c
 create mode 100644 tools/testing/selftests/bpf/progs/find_vma_fail1.c
 create mode 100644 tools/testing/selftests/bpf/progs/find_vma_fail2.c

diff --git a/tools/testing/selftests/bpf/prog_tests/find_vma.c b/tools/testing/selftests/bpf/prog_tests/find_vma.c
new file mode 100644
index 0000000000000..b74b3c0c555a8
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/find_vma.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include <test_progs.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include "find_vma.skel.h"
+#include "find_vma_fail1.skel.h"
+#include "find_vma_fail2.skel.h"
+
+static void test_and_reset_skel(struct find_vma *skel, int expected_find_zero_ret)
+{
+	ASSERT_EQ(skel->bss->found_vm_exec, 1, "found_vm_exec");
+	ASSERT_EQ(skel->data->find_addr_ret, 0, "find_addr_ret");
+	ASSERT_EQ(skel->data->find_zero_ret, expected_find_zero_ret, "find_zero_ret");
+	ASSERT_OK_PTR(strstr(skel->bss->d_iname, "test_progs"), "find_test_progs");
+
+	skel->bss->found_vm_exec = 0;
+	skel->data->find_addr_ret = -1;
+	skel->data->find_zero_ret = -1;
+	skel->bss->d_iname[0] = 0;
+}
+
+static int open_pe(void)
+{
+	struct perf_event_attr attr = {0};
+	int pfd;
+
+	/* create perf event */
+	attr.size = sizeof(attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.config = PERF_COUNT_HW_CPU_CYCLES;
+	attr.freq = 1;
+	attr.sample_freq = 4000;
+	pfd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, PERF_FLAG_FD_CLOEXEC);
+
+	return pfd >= 0 ? pfd : -errno;
+}
+
+static void test_find_vma_pe(struct find_vma *skel)
+{
+	struct bpf_link *link = NULL;
+	volatile int j = 0;
+	int pfd, i;
+
+	pfd = open_pe();
+	if (pfd < 0) {
+		if (pfd == -ENOENT || pfd == -EOPNOTSUPP) {
+			printf("%s:SKIP:no PERF_COUNT_HW_CPU_CYCLES\n", __func__);
+			test__skip();
+			goto cleanup;
+		}
+		if (!ASSERT_GE(pfd, 0, "perf_event_open"))
+			goto cleanup;
+	}
+
+	link = bpf_program__attach_perf_event(skel->progs.handle_pe, pfd);
+	if (!ASSERT_OK_PTR(link, "attach_perf_event"))
+		goto cleanup;
+
+	for (i = 0; i < 1000000; ++i)
+		++j;
+
+	test_and_reset_skel(skel, -EBUSY /* in nmi, irq_work is busy */);
+cleanup:
+	bpf_link__destroy(link);
+	close(pfd);
+}
+
+static void test_find_vma_kprobe(struct find_vma *skel)
+{
+	int err;
+
+	err = find_vma__attach(skel);
+	if (!ASSERT_OK(err, "find_vma__attach"))
+		return;
+
+	getpgid(skel->bss->target_pid);
+	test_and_reset_skel(skel, -ENOENT /* could not find vma for ptr 0 */);
+}
+
+static void test_illegal_write_vma(void)
+{
+	struct find_vma_fail1 *skel;
+
+	skel = find_vma_fail1__open_and_load();
+	if (!ASSERT_ERR_PTR(skel, "find_vma_fail1__open_and_load"))
+		find_vma_fail1__destroy(skel);
+}
+
+static void test_illegal_write_task(void)
+{
+	struct find_vma_fail2 *skel;
+
+	skel = find_vma_fail2__open_and_load();
+	if (!ASSERT_ERR_PTR(skel, "find_vma_fail2__open_and_load"))
+		find_vma_fail2__destroy(skel);
+}
+
+void serial_test_find_vma(void)
+{
+	struct find_vma *skel;
+
+	skel = find_vma__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "find_vma__open_and_load"))
+		return;
+
+	skel->bss->target_pid = getpid();
+	skel->bss->addr = (__u64)(uintptr_t)test_find_vma_pe;
+
+	test_find_vma_pe(skel);
+	usleep(100000); /* allow the irq_work to finish */
+	test_find_vma_kprobe(skel);
+
+	find_vma__destroy(skel);
+	test_illegal_write_vma();
+	test_illegal_write_task();
+}
diff --git a/tools/testing/selftests/bpf/progs/find_vma.c b/tools/testing/selftests/bpf/progs/find_vma.c
new file mode 100644
index 0000000000000..38034fb82530f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/find_vma.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct callback_ctx {
+	int dummy;
+};
+
+#define VM_EXEC		0x00000004
+#define DNAME_INLINE_LEN 32
+
+pid_t target_pid = 0;
+char d_iname[DNAME_INLINE_LEN] = {0};
+__u32 found_vm_exec = 0;
+__u64 addr = 0;
+int find_zero_ret = -1;
+int find_addr_ret = -1;
+
+static long check_vma(struct task_struct *task, struct vm_area_struct *vma,
+		      struct callback_ctx *data)
+{
+	if (vma->vm_file)
+		bpf_probe_read_kernel_str(d_iname, DNAME_INLINE_LEN - 1,
+					  vma->vm_file->f_path.dentry->d_iname);
+
+	/* check for VM_EXEC */
+	if (vma->vm_flags & VM_EXEC)
+		found_vm_exec = 1;
+
+	return 0;
+}
+
+SEC("raw_tp/sys_enter")
+int handle_getpid(void)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct callback_ctx data = {};
+
+	if (task->pid != target_pid)
+		return 0;
+
+	find_addr_ret = bpf_find_vma(task, addr, check_vma, &data, 0);
+
+	/* this should return -ENOENT */
+	find_zero_ret = bpf_find_vma(task, 0, check_vma, &data, 0);
+	return 0;
+}
+
+SEC("perf_event")
+int handle_pe(void)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct callback_ctx data = {};
+
+	if (task->pid != target_pid)
+		return 0;
+
+	find_addr_ret = bpf_find_vma(task, addr, check_vma, &data, 0);
+
+	/* In NMI, this should return -EBUSY, as the previous call is using
+	 * the irq_work.
+	 */
+	find_zero_ret = bpf_find_vma(task, 0, check_vma, &data, 0);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/find_vma_fail1.c b/tools/testing/selftests/bpf/progs/find_vma_fail1.c
new file mode 100644
index 0000000000000..b3b326b8e2d1c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/find_vma_fail1.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct callback_ctx {
+	int dummy;
+};
+
+static long write_vma(struct task_struct *task, struct vm_area_struct *vma,
+		      struct callback_ctx *data)
+{
+	/* writing to vma, which is illegal */
+	vma->vm_flags |= 0x55;
+
+	return 0;
+}
+
+SEC("raw_tp/sys_enter")
+int handle_getpid(void)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct callback_ctx data = {};
+
+	bpf_find_vma(task, 0, write_vma, &data, 0);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/find_vma_fail2.c b/tools/testing/selftests/bpf/progs/find_vma_fail2.c
new file mode 100644
index 0000000000000..9bcf3203e26b5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/find_vma_fail2.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct callback_ctx {
+	int dummy;
+};
+
+static long write_task(struct task_struct *task, struct vm_area_struct *vma,
+		       struct callback_ctx *data)
+{
+	/* writing to task, which is illegal */
+	task->mm = NULL;
+
+	return 0;
+}
+
+SEC("raw_tp/sys_enter")
+int handle_getpid(void)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	struct callback_ctx data = {};
+
+	bpf_find_vma(task, 0, write_task, &data, 0);
+	return 0;
+}
-- 
2.30.2



* Re: [PATCH v5 bpf-next 0/2] introduce bpf_find_vma
From: Alexei Starovoitov @ 2021-11-07 20:06 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, KP Singh, Hengqi Chen

On Fri, Nov 5, 2021 at 4:23 PM Song Liu <songliubraving@fb.com> wrote:
>
> Changes v4 => v5:
> 1. Clean up and style change in 2/2. (Andrii)

Applied. Thanks


* Re: [PATCH v5 bpf-next 1/2] bpf: introduce helper bpf_find_vma
From: Eric Dumazet @ 2021-11-08 18:36 UTC (permalink / raw)
  To: Song Liu, bpf, netdev
  Cc: ast, daniel, andrii, kernel-team, kpsingh, hengqi.chen, Yonghong Song



On 11/5/21 4:23 PM, Song Liu wrote:

...

> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> index dbc3ad07e21b6..cdb0fba656006 100644
> --- a/kernel/bpf/btf.c
> +++ b/kernel/bpf/btf.c
> @@ -6342,7 +6342,10 @@ const struct bpf_func_proto bpf_btf_find_by_name_kind_proto = {
>  	.arg4_type	= ARG_ANYTHING,
>  };
>  
> -BTF_ID_LIST_GLOBAL_SINGLE(btf_task_struct_ids, struct, task_struct)
> +BTF_ID_LIST_GLOBAL(btf_task_struct_ids)
> +BTF_ID(struct, task_struct)
> +BTF_ID(struct, file)
> +BTF_ID(struct, vm_area_struct)

$ nm -v vmlinux |grep -A3 btf_task_struct_ids
ffffffff82adfd9c R btf_task_struct_ids
ffffffff82adfda0 r __BTF_ID__struct__file__715
ffffffff82adfda4 r __BTF_ID__struct__vm_area_struct__716
ffffffff82adfda8 r bpf_skb_output_btf_ids

KASAN thinks btf_task_struct_ids has 4 bytes only.

BUG: KASAN: global-out-of-bounds in task_iter_init+0x212/0x2e7 kernel/bpf/task_iter.c:661
Read of size 4 at addr ffffffff90297404 by task swapper/0/1

CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 print_address_description.constprop.0.cold+0xf/0x309 mm/kasan/report.c:256
 __kasan_report mm/kasan/report.c:442 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
 task_iter_init+0x212/0x2e7 kernel/bpf/task_iter.c:661
 do_one_initcall+0x103/0x650 init/main.c:1295
 do_initcall_level init/main.c:1368 [inline]
 do_initcalls init/main.c:1384 [inline]
 do_basic_setup init/main.c:1403 [inline]
 kernel_init_freeable+0x6b1/0x73a init/main.c:1606
 kernel_init+0x1a/0x1d0 init/main.c:1497
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>

The buggy address belongs to the variable:
 btf_task_struct_ids+0x4/0x40


* Re: [PATCH v5 bpf-next 1/2] bpf: introduce helper bpf_find_vma
From: Song Liu @ 2021-11-08 21:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, ast, daniel, andrii, Kernel Team, kpsingh,
	hengqi.chen, Yonghong Song



> On Nov 8, 2021, at 10:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> 
> 
> On 11/5/21 4:23 PM, Song Liu wrote:
> 
> ...
> 
>> diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
>> index dbc3ad07e21b6..cdb0fba656006 100644
>> --- a/kernel/bpf/btf.c
>> +++ b/kernel/bpf/btf.c
>> @@ -6342,7 +6342,10 @@ const struct bpf_func_proto bpf_btf_find_by_name_kind_proto = {
>> 	.arg4_type	= ARG_ANYTHING,
>> };
>> 
>> -BTF_ID_LIST_GLOBAL_SINGLE(btf_task_struct_ids, struct, task_struct)
>> +BTF_ID_LIST_GLOBAL(btf_task_struct_ids)
>> +BTF_ID(struct, task_struct)
>> +BTF_ID(struct, file)
>> +BTF_ID(struct, vm_area_struct)
> 
> $ nm -v vmlinux |grep -A3 btf_task_struct_ids
> ffffffff82adfd9c R btf_task_struct_ids
> ffffffff82adfda0 r __BTF_ID__struct__file__715
> ffffffff82adfda4 r __BTF_ID__struct__vm_area_struct__716
> ffffffff82adfda8 r bpf_skb_output_btf_ids
> 
> KASAN thinks btf_task_struct_ids has 4 bytes only.

I have KASAN enabled, but couldn't repro this issue. I think
btf_task_struct_ids looks correct:

nm -v vmlinux | grep -A3 -B1 btf_task_struct_ids
ffffffff83cf8260 r __BTF_ID__struct__task_struct__1026
ffffffff83cf8260 R btf_task_struct_ids
ffffffff83cf8264 r __BTF_ID__struct__file__1027
ffffffff83cf8268 r __BTF_ID__struct__vm_area_struct__1028
ffffffff83cf826c r bpf_skb_output_btf_ids

Did I miss something?

Thanks,
Song

> 
> BUG: KASAN: global-out-of-bounds in task_iter_init+0x212/0x2e7 kernel/bpf/task_iter.c:661
> Read of size 4 at addr ffffffff90297404 by task swapper/0/1
> 
> CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.15.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> print_address_description.constprop.0.cold+0xf/0x309 mm/kasan/report.c:256
> __kasan_report mm/kasan/report.c:442 [inline]
> kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
> task_iter_init+0x212/0x2e7 kernel/bpf/task_iter.c:661
> do_one_initcall+0x103/0x650 init/main.c:1295
> do_initcall_level init/main.c:1368 [inline]
> do_initcalls init/main.c:1384 [inline]
> do_basic_setup init/main.c:1403 [inline]
> kernel_init_freeable+0x6b1/0x73a init/main.c:1606
> kernel_init+0x1a/0x1d0 init/main.c:1497
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> </TASK>
> 
> The buggy address belongs to the variable:
> btf_task_struct_ids+0x4/0x40



* Re: [PATCH v5 bpf-next 1/2] bpf: introduce helper bpf_find_vma
From: Eric Dumazet @ 2021-11-08 22:27 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, netdev, ast, daniel, andrii, Kernel Team, kpsingh,
	hengqi.chen, Yonghong Song



On 11/8/21 1:59 PM, Song Liu wrote:
> 
> 
>> On Nov 8, 2021, at 10:36 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> ...
>>
>> KASAN thinks btf_task_struct_ids has 4 bytes only.
> 
> I have KASAN enabled, but couldn't repro this issue. I think
> btf_task_struct_ids looks correct:
> 
> nm -v vmlinux | grep -A3 -B1 btf_task_struct_ids
> ffffffff83cf8260 r __BTF_ID__struct__task_struct__1026
> ffffffff83cf8260 R btf_task_struct_ids
> ffffffff83cf8264 r __BTF_ID__struct__file__1027
> ffffffff83cf8268 r __BTF_ID__struct__vm_area_struct__1028
> ffffffff83cf826c r bpf_skb_output_btf_ids
> 
> Did I miss something?
> 
> Thanks,
> Song
> 

I will release the syzbot bug, so that you can use its .config

Basically, we have

u32 btf_task_struct_ids[1];

xxxx = btf_task_struct_ids[2];  /* trap */





* Re: [PATCH v5 bpf-next 1/2] bpf: introduce helper bpf_find_vma
From: Eric Dumazet @ 2021-11-08 22:43 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, netdev, ast, daniel, andrii, Kernel Team, kpsingh,
	hengqi.chen, Yonghong Song



On 11/8/21 2:27 PM, Eric Dumazet wrote:
>
> ...
> 
> I will release the syzbot bug, so that you can use its .config
> 
> Basically, we have
> 
> u32 btf_task_struct_ids[1];

That is, if  

# CONFIG_DEBUG_INFO_BTF is not set

> 
> xxxx = btf_task_struct_ids[2];  /* trap */


* Re: [PATCH v5 bpf-next 1/2] bpf: introduce helper bpf_find_vma
From: Eric Dumazet @ 2021-11-08 22:56 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, netdev, ast, daniel, andrii, Kernel Team, kpsingh,
	hengqi.chen, Yonghong Song



On 11/8/21 2:43 PM, Eric Dumazet wrote:
>
> ...
>
>>
>> I will release the syzbot bug, so that you can use its .config
>>
>> Basically, we have
>>
>> u32 btf_task_struct_ids[1];
> 
> That is, if  
> 
> # CONFIG_DEBUG_INFO_BTF is not set
> 

This is how btf_sock_ids gets defined:

#ifdef CONFIG_DEBUG_INFO_BTF
BTF_ID_LIST_GLOBAL(btf_sock_ids)
#define BTF_SOCK_TYPE(name, type) BTF_ID(struct, type)
BTF_SOCK_TYPE_xxx
#undef BTF_SOCK_TYPE
#else
u32 btf_sock_ids[MAX_BTF_SOCK_TYPE];
#endif


Perhaps do the same for btf_task_struct_ids ?

Thanks.



* Re: [PATCH v5 bpf-next 1/2] bpf: introduce helper bpf_find_vma
From: Song Liu @ 2021-11-08 23:08 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: bpf, netdev, ast, daniel, andrii, Kernel Team, kpsingh,
	hengqi.chen, Yonghong Song



> On Nov 8, 2021, at 2:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> ...
> 
> This is how btf_sock_ids gets defined:
> 
> #ifdef CONFIG_DEBUG_INFO_BTF
> BTF_ID_LIST_GLOBAL(btf_sock_ids)
> #define BTF_SOCK_TYPE(name, type) BTF_ID(struct, type)
> BTF_SOCK_TYPE_xxx
> #undef BTF_SOCK_TYPE
> #else
> u32 btf_sock_ids[MAX_BTF_SOCK_TYPE];
> #endif
> 
> 
> Perhaps do the same for btf_task_struct_ids ?

Yeah, I was testing the change below, but this one looks better.

Shall I include the syzbot link in the fix?

Thanks,
Song




diff --git i/include/linux/btf_ids.h w/include/linux/btf_ids.h
index 47d9abfbdb556..4153264c1236b 100644
--- i/include/linux/btf_ids.h
+++ w/include/linux/btf_ids.h
@@ -149,7 +149,7 @@ extern struct btf_id_set name;
 #define BTF_ID_LIST(name) static u32 name[5];
 #define BTF_ID(prefix, name)
 #define BTF_ID_UNUSED
-#define BTF_ID_LIST_GLOBAL(name) u32 name[1];
+#define BTF_ID_LIST_GLOBAL(name) u32 name[3];
 #define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 name[1];
 #define BTF_ID_LIST_GLOBAL_SINGLE(name, prefix, typename) u32 name[1];
 #define BTF_SET_START(name) static struct btf_id_set name = { 0 };
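
For reference, applying that btf_sock_ids pattern to btf_task_struct_ids
might look something like the sketch below (in kernel/bpf/btf.c;
illustrative only, not necessarily the fix that ends up applied):

#ifdef CONFIG_DEBUG_INFO_BTF
BTF_ID_LIST_GLOBAL(btf_task_struct_ids)
BTF_ID(struct, task_struct)
BTF_ID(struct, file)
BTF_ID(struct, vm_area_struct)
#else
/* keep indices 0-2 valid when BTF IDs are not generated */
u32 btf_task_struct_ids[3];
#endif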




