Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v4 bpf-next 0/6] bpf: enable task local storage for tracing programs
@ 2021-02-23  1:20 Song Liu
  2021-02-23  1:20 ` [PATCH v4 bpf-next 1/6] " Song Liu
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Song Liu @ 2021-02-23  1:20 UTC (permalink / raw)
  To: bpf, netdev, linux-kernel; +Cc: ast, daniel, kernel-team, peterz, Song Liu

This set enables task local storage for non-BPF_LSM programs.

It is common for tracing BPF program to access per-task data. Currently,
these data are stored in hash tables with pid as the key. In
bcc/libbpftools [1], 9 out of 23 tools use such hash tables. However,
hash table is not ideal for many use case. Task local storage provides
better usability and performance for BPF programs. Please refer to 4/4 for
some performance comparison of task local storage vs. hash table.

Changes v3 => v4:
1. Prevent deadlock from recursive calls of bpf_task_storage_[get|delete].
   (2/6 checks potential deadlock and fails over, 4/6 adds a selftest).

Changes v2 => v3:
1. Make the selftest more robust. (Andrii)
2. Small changes with runqslower. (Andrii)
3. Shortern CC list to make it easy for vger.

Changes v1 => v2:
1. Do not allocate task local storage when the task is being freed.
2. Revise the selftest and added a new test for a task being freed.
3. Minor changes in runqslower.

Song Liu (6):
  bpf: enable task local storage for tracing programs
  bpf: prevent deadlock from recursive bpf_task_storage_[get|delete]
  selftests/bpf: add non-BPF_LSM test for task local storage
  selftests/bpf: test deadlock from recursive
    bpf_task_storage_[get|delete]
  bpf: runqslower: prefer using local vmlimux to generate vmlinux.h
  bpf: runqslower: use task local storage

 include/linux/bpf.h                           |  7 ++
 include/linux/bpf_lsm.h                       | 22 -----
 include/linux/bpf_types.h                     |  2 +-
 include/linux/sched.h                         |  5 +
 kernel/bpf/Makefile                           |  3 +-
 kernel/bpf/bpf_local_storage.c                | 28 +++---
 kernel/bpf/bpf_lsm.c                          |  4 -
 kernel/bpf/bpf_task_storage.c                 | 89 +++++++++++-------
 kernel/fork.c                                 |  5 +
 kernel/trace/bpf_trace.c                      |  4 +
 tools/bpf/runqslower/Makefile                 |  5 +-
 tools/bpf/runqslower/runqslower.bpf.c         | 33 ++++---
 .../bpf/prog_tests/task_local_storage.c       | 92 +++++++++++++++++++
 .../selftests/bpf/progs/task_local_storage.c  | 64 +++++++++++++
 .../bpf/progs/task_local_storage_exit_creds.c | 32 +++++++
 .../selftests/bpf/progs/task_ls_recursion.c   | 70 ++++++++++++++
 16 files changed, 381 insertions(+), 84 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_storage.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_ls_recursion.c

--
2.24.1

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 bpf-next 1/6] bpf: enable task local storage for tracing programs
  2021-02-23  1:20 [PATCH v4 bpf-next 0/6] bpf: enable task local storage for tracing programs Song Liu
@ 2021-02-23  1:20 ` Song Liu
  2021-02-23  3:08   ` kernel test robot
                     ` (2 more replies)
  2021-02-23  1:20 ` [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete] Song Liu
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 20+ messages in thread
From: Song Liu @ 2021-02-23  1:20 UTC (permalink / raw)
  To: bpf, netdev, linux-kernel
  Cc: ast, daniel, kernel-team, peterz, Song Liu, KP Singh

To access per-task data, BPF programs usually creates a hash table with
pid as the key. This is not ideal because:
 1. The user need to estimate the proper size of the hash table, which may
    be inaccurate;
 2. Big hash tables are slow;
 3. To clean up the data properly during task terminations, the user need
    to write extra logic.

Task local storage overcomes these issues and offers a better option for
these per-task data. Task local storage is only available to BPF_LSM. Now
enable it for tracing programs.

Unlike LSM programs, tracing programs can be called in IRQ contexts.
Helpers that access task local storage are updated to use
raw_spin_lock_irqsave() instead of raw_spin_lock_bh().

Tracing programs can attach to functions on the task free path, e.g.
exit_creds(). To avoid allocating task local storage after
bpf_task_storage_free(). bpf_task_storage_get() is updated to not allocate
new storage when the task is not refcounted (task->usage == 0).

Acked-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 include/linux/bpf.h            |  7 +++++++
 include/linux/bpf_lsm.h        | 22 ----------------------
 include/linux/bpf_types.h      |  2 +-
 include/linux/sched.h          |  5 +++++
 kernel/bpf/Makefile            |  3 +--
 kernel/bpf/bpf_local_storage.c | 28 +++++++++++++++++-----------
 kernel/bpf/bpf_lsm.c           |  4 ----
 kernel/bpf/bpf_task_storage.c  | 34 +++++++++-------------------------
 kernel/fork.c                  |  5 +++++
 kernel/trace/bpf_trace.c       |  4 ++++
 10 files changed, 49 insertions(+), 65 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index cccaef1088eae..e2cfc4809219c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1499,6 +1499,7 @@ struct bpf_prog *bpf_prog_by_id(u32 id);
 struct bpf_link *bpf_link_by_id(u32 id);
 
 const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
+void bpf_task_storage_free(struct task_struct *task);
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
@@ -1684,6 +1685,10 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 {
 	return NULL;
 }
+
+static inline void bpf_task_storage_free(struct task_struct *task)
+{
+}
 #endif /* CONFIG_BPF_SYSCALL */
 
 void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
@@ -1886,6 +1891,8 @@ extern const struct bpf_func_proto bpf_this_cpu_ptr_proto;
 extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto;
 extern const struct bpf_func_proto bpf_sock_from_file_proto;
 extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
+extern const struct bpf_func_proto bpf_task_storage_get_proto;
+extern const struct bpf_func_proto bpf_task_storage_delete_proto;
 
 const struct bpf_func_proto *bpf_tracing_func_proto(
 	enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index 0d1c33ace3987..479c101546ad1 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -38,21 +38,9 @@ static inline struct bpf_storage_blob *bpf_inode(
 	return inode->i_security + bpf_lsm_blob_sizes.lbs_inode;
 }
 
-static inline struct bpf_storage_blob *bpf_task(
-	const struct task_struct *task)
-{
-	if (unlikely(!task->security))
-		return NULL;
-
-	return task->security + bpf_lsm_blob_sizes.lbs_task;
-}
-
 extern const struct bpf_func_proto bpf_inode_storage_get_proto;
 extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
-extern const struct bpf_func_proto bpf_task_storage_get_proto;
-extern const struct bpf_func_proto bpf_task_storage_delete_proto;
 void bpf_inode_storage_free(struct inode *inode);
-void bpf_task_storage_free(struct task_struct *task);
 
 #else /* !CONFIG_BPF_LSM */
 
@@ -73,20 +61,10 @@ static inline struct bpf_storage_blob *bpf_inode(
 	return NULL;
 }
 
-static inline struct bpf_storage_blob *bpf_task(
-	const struct task_struct *task)
-{
-	return NULL;
-}
-
 static inline void bpf_inode_storage_free(struct inode *inode)
 {
 }
 
-static inline void bpf_task_storage_free(struct task_struct *task)
-{
-}
-
 #endif /* CONFIG_BPF_LSM */
 
 #endif /* _LINUX_BPF_LSM_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 99f7fd657d87a..b9edee336d804 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -109,8 +109,8 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)
 #endif
 #ifdef CONFIG_BPF_LSM
 BPF_MAP_TYPE(BPF_MAP_TYPE_INODE_STORAGE, inode_storage_map_ops)
-BPF_MAP_TYPE(BPF_MAP_TYPE_TASK_STORAGE, task_storage_map_ops)
 #endif
+BPF_MAP_TYPE(BPF_MAP_TYPE_TASK_STORAGE, task_storage_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
 #if defined(CONFIG_XDP_SOCKETS)
 BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4d568288abf9f..e5fbf8e6952ab 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -42,6 +42,7 @@ struct audit_context;
 struct backing_dev_info;
 struct bio_list;
 struct blk_plug;
+struct bpf_local_storage;
 struct capture_control;
 struct cfs_rq;
 struct fs_struct;
@@ -1348,6 +1349,10 @@ struct task_struct {
 	/* Used by LSM modules for access restriction: */
 	void				*security;
 #endif
+#ifdef CONFIG_BPF_SYSCALL
+	/* Used by BPF task local storage */
+	struct bpf_local_storage __rcu	*bpf_storage;
+#endif
 
 #ifdef CONFIG_GCC_PLUGIN_STACKLEAK
 	unsigned long			lowest_stack;
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index d1249340fd6ba..ca995fdfa45e7 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -8,9 +8,8 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
-obj-${CONFIG_BPF_LSM}	  += bpf_task_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
 obj-$(CONFIG_BPF_SYSCALL) += btf.o
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index dd5aedee99e73..9bd47ad2b26f1 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -140,17 +140,18 @@ static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
 {
 	struct bpf_local_storage *local_storage;
 	bool free_local_storage = false;
+	unsigned long flags;
 
 	if (unlikely(!selem_linked_to_storage(selem)))
 		/* selem has already been unlinked from sk */
 		return;
 
 	local_storage = rcu_dereference(selem->local_storage);
-	raw_spin_lock_bh(&local_storage->lock);
+	raw_spin_lock_irqsave(&local_storage->lock, flags);
 	if (likely(selem_linked_to_storage(selem)))
 		free_local_storage = bpf_selem_unlink_storage_nolock(
 			local_storage, selem, true);
-	raw_spin_unlock_bh(&local_storage->lock);
+	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
 
 	if (free_local_storage)
 		kfree_rcu(local_storage, rcu);
@@ -167,6 +168,7 @@ void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
 {
 	struct bpf_local_storage_map *smap;
 	struct bpf_local_storage_map_bucket *b;
+	unsigned long flags;
 
 	if (unlikely(!selem_linked_to_map(selem)))
 		/* selem has already be unlinked from smap */
@@ -174,21 +176,22 @@ void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
 
 	smap = rcu_dereference(SDATA(selem)->smap);
 	b = select_bucket(smap, selem);
-	raw_spin_lock_bh(&b->lock);
+	raw_spin_lock_irqsave(&b->lock, flags);
 	if (likely(selem_linked_to_map(selem)))
 		hlist_del_init_rcu(&selem->map_node);
-	raw_spin_unlock_bh(&b->lock);
+	raw_spin_unlock_irqrestore(&b->lock, flags);
 }
 
 void bpf_selem_link_map(struct bpf_local_storage_map *smap,
 			struct bpf_local_storage_elem *selem)
 {
 	struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
+	unsigned long flags;
 
-	raw_spin_lock_bh(&b->lock);
+	raw_spin_lock_irqsave(&b->lock, flags);
 	RCU_INIT_POINTER(SDATA(selem)->smap, smap);
 	hlist_add_head_rcu(&selem->map_node, &b->list);
-	raw_spin_unlock_bh(&b->lock);
+	raw_spin_unlock_irqrestore(&b->lock, flags);
 }
 
 void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
@@ -224,16 +227,18 @@ bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
 
 	sdata = SDATA(selem);
 	if (cacheit_lockit) {
+		unsigned long flags;
+
 		/* spinlock is needed to avoid racing with the
 		 * parallel delete.  Otherwise, publishing an already
 		 * deleted sdata to the cache will become a use-after-free
 		 * problem in the next bpf_local_storage_lookup().
 		 */
-		raw_spin_lock_bh(&local_storage->lock);
+		raw_spin_lock_irqsave(&local_storage->lock, flags);
 		if (selem_linked_to_storage(selem))
 			rcu_assign_pointer(local_storage->cache[smap->cache_idx],
 					   sdata);
-		raw_spin_unlock_bh(&local_storage->lock);
+		raw_spin_unlock_irqrestore(&local_storage->lock, flags);
 	}
 
 	return sdata;
@@ -327,6 +332,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 	struct bpf_local_storage_data *old_sdata = NULL;
 	struct bpf_local_storage_elem *selem;
 	struct bpf_local_storage *local_storage;
+	unsigned long flags;
 	int err;
 
 	/* BPF_EXIST and BPF_NOEXIST cannot be both set */
@@ -374,7 +380,7 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 		}
 	}
 
-	raw_spin_lock_bh(&local_storage->lock);
+	raw_spin_lock_irqsave(&local_storage->lock, flags);
 
 	/* Recheck local_storage->list under local_storage->lock */
 	if (unlikely(hlist_empty(&local_storage->list))) {
@@ -428,11 +434,11 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
 	}
 
 unlock:
-	raw_spin_unlock_bh(&local_storage->lock);
+	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
 	return SDATA(selem);
 
 unlock_err:
-	raw_spin_unlock_bh(&local_storage->lock);
+	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
 	return ERR_PTR(err);
 }
 
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 1622a44d1617e..9829f381b51c5 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -115,10 +115,6 @@ bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_spin_lock_proto;
 	case BPF_FUNC_spin_unlock:
 		return &bpf_spin_unlock_proto;
-	case BPF_FUNC_task_storage_get:
-		return &bpf_task_storage_get_proto;
-	case BPF_FUNC_task_storage_delete:
-		return &bpf_task_storage_delete_proto;
 	case BPF_FUNC_bprm_opts_set:
 		return &bpf_bprm_opts_set_proto;
 	case BPF_FUNC_ima_inode_hash:
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index e0da0258b732d..2034019966d44 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -15,7 +15,6 @@
 #include <linux/bpf_local_storage.h>
 #include <linux/filter.h>
 #include <uapi/linux/btf.h>
-#include <linux/bpf_lsm.h>
 #include <linux/btf_ids.h>
 #include <linux/fdtable.h>
 
@@ -24,12 +23,8 @@ DEFINE_BPF_STORAGE_CACHE(task_cache);
 static struct bpf_local_storage __rcu **task_storage_ptr(void *owner)
 {
 	struct task_struct *task = owner;
-	struct bpf_storage_blob *bsb;
 
-	bsb = bpf_task(task);
-	if (!bsb)
-		return NULL;
-	return &bsb->storage;
+	return &task->bpf_storage;
 }
 
 static struct bpf_local_storage_data *
@@ -38,13 +33,8 @@ task_storage_lookup(struct task_struct *task, struct bpf_map *map,
 {
 	struct bpf_local_storage *task_storage;
 	struct bpf_local_storage_map *smap;
-	struct bpf_storage_blob *bsb;
-
-	bsb = bpf_task(task);
-	if (!bsb)
-		return NULL;
 
-	task_storage = rcu_dereference(bsb->storage);
+	task_storage = rcu_dereference(task->bpf_storage);
 	if (!task_storage)
 		return NULL;
 
@@ -57,16 +47,12 @@ void bpf_task_storage_free(struct task_struct *task)
 	struct bpf_local_storage_elem *selem;
 	struct bpf_local_storage *local_storage;
 	bool free_task_storage = false;
-	struct bpf_storage_blob *bsb;
 	struct hlist_node *n;
-
-	bsb = bpf_task(task);
-	if (!bsb)
-		return;
+	unsigned long flags;
 
 	rcu_read_lock();
 
-	local_storage = rcu_dereference(bsb->storage);
+	local_storage = rcu_dereference(task->bpf_storage);
 	if (!local_storage) {
 		rcu_read_unlock();
 		return;
@@ -81,7 +67,7 @@ void bpf_task_storage_free(struct task_struct *task)
 	 * when unlinking elem from the local_storage->list and
 	 * the map's bucket->list.
 	 */
-	raw_spin_lock_bh(&local_storage->lock);
+	raw_spin_lock_irqsave(&local_storage->lock, flags);
 	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
 		/* Always unlink from map before unlinking from
 		 * local_storage.
@@ -90,7 +76,7 @@ void bpf_task_storage_free(struct task_struct *task)
 		free_task_storage = bpf_selem_unlink_storage_nolock(
 			local_storage, selem, false);
 	}
-	raw_spin_unlock_bh(&local_storage->lock);
+	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
 	rcu_read_unlock();
 
 	/* free_task_storage should always be true as long as
@@ -225,11 +211,9 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
 	if (sdata)
 		return (unsigned long)sdata->data;
 
-	/* This helper must only be called from places where the lifetime of the task
-	 * is guaranteed. Either by being refcounted or by being protected
-	 * by an RCU read-side critical section.
-	 */
-	if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
+	/* only allocate new storage, when the task is refcounted */
+	if (refcount_read(&task->usage) &&
+	    (flags & BPF_LOCAL_STORAGE_GET_F_CREATE)) {
 		sdata = bpf_local_storage_update(
 			task, (struct bpf_local_storage_map *)map, value,
 			BPF_NOEXIST);
diff --git a/kernel/fork.c b/kernel/fork.c
index d66cd1014211b..181604db2d65e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -96,6 +96,7 @@
 #include <linux/kasan.h>
 #include <linux/scs.h>
 #include <linux/io_uring.h>
+#include <linux/bpf.h>
 
 #include <asm/pgalloc.h>
 #include <linux/uaccess.h>
@@ -734,6 +735,7 @@ void __put_task_struct(struct task_struct *tsk)
 	cgroup_free(tsk);
 	task_numa_free(tsk, true);
 	security_task_free(tsk);
+	bpf_task_storage_free(tsk);
 	exit_creds(tsk);
 	delayacct_tsk_free(tsk);
 	put_signal_struct(tsk->signal);
@@ -2062,6 +2064,9 @@ static __latent_entropy struct task_struct *copy_process(
 	p->sequential_io	= 0;
 	p->sequential_io_avg	= 0;
 #endif
+#ifdef CONFIG_BPF_SYSCALL
+	RCU_INIT_POINTER(p->bpf_storage, NULL);
+#endif
 
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	retval = sched_fork(clone_flags, p);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b0c45d923f0f9..e9701744d8e4c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1367,6 +1367,10 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_per_cpu_ptr_proto;
 	case BPF_FUNC_this_cpu_ptr:
 		return &bpf_this_cpu_ptr_proto;
+	case BPF_FUNC_task_storage_get:
+		return &bpf_task_storage_get_proto;
+	case BPF_FUNC_task_storage_delete:
+		return &bpf_task_storage_delete_proto;
 	default:
 		return NULL;
 	}
-- 
2.24.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete]
  2021-02-23  1:20 [PATCH v4 bpf-next 0/6] bpf: enable task local storage for tracing programs Song Liu
  2021-02-23  1:20 ` [PATCH v4 bpf-next 1/6] " Song Liu
@ 2021-02-23  1:20 ` Song Liu
  2021-02-23  6:21   ` Andrii Nakryiko
  2021-02-23 11:06   ` Peter Zijlstra
  2021-02-23  1:20 ` [PATCH v4 bpf-next 3/6] selftests/bpf: add non-BPF_LSM test for task local storage Song Liu
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 20+ messages in thread
From: Song Liu @ 2021-02-23  1:20 UTC (permalink / raw)
  To: bpf, netdev, linux-kernel; +Cc: ast, daniel, kernel-team, peterz, Song Liu

BPF helpers bpf_task_storage_[get|delete] could hold two locks:
bpf_local_storage_map_bucket->lock and bpf_local_storage->lock. Calling
these helpers from fentry/fexit programs on functions in bpf_*_storage.c
may cause deadlock on either locks.

Prevent such deadlock with a per cpu counter, bpf_task_storage_busy, which
is similar to bpf_prog_active. We need this counter to be global, because
the two locks here belong to two different objects: bpf_local_storage_map
and bpf_local_storage. If we pick one of them as the owner of the counter,
it is still possible to trigger deadlock on the other lock. For example,
if bpf_local_storage_map owns the counters, it cannot prevent deadlock
on bpf_local_storage->lock when two maps are used.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 kernel/bpf/bpf_task_storage.c | 57 ++++++++++++++++++++++++++++++-----
 1 file changed, 50 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index 2034019966d44..ed7d2e02b1c19 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -20,6 +20,31 @@
 
 DEFINE_BPF_STORAGE_CACHE(task_cache);
 
+DEFINE_PER_CPU(int, bpf_task_storage_busy);
+
+static void bpf_task_storage_lock(void)
+{
+	migrate_disable();
+	__this_cpu_inc(bpf_task_storage_busy);
+}
+
+static void bpf_task_storage_unlock(void)
+{
+	__this_cpu_dec(bpf_task_storage_busy);
+	migrate_enable();
+}
+
+static bool bpf_task_storage_trylock(void)
+{
+	migrate_disable();
+	if (unlikely(__this_cpu_inc_return(bpf_task_storage_busy) != 1)) {
+		__this_cpu_dec(bpf_task_storage_busy);
+		migrate_enable();
+		return false;
+	}
+	return true;
+}
+
 static struct bpf_local_storage __rcu **task_storage_ptr(void *owner)
 {
 	struct task_struct *task = owner;
@@ -67,6 +92,7 @@ void bpf_task_storage_free(struct task_struct *task)
 	 * when unlinking elem from the local_storage->list and
 	 * the map's bucket->list.
 	 */
+	bpf_task_storage_lock();
 	raw_spin_lock_irqsave(&local_storage->lock, flags);
 	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
 		/* Always unlink from map before unlinking from
@@ -77,6 +103,7 @@ void bpf_task_storage_free(struct task_struct *task)
 			local_storage, selem, false);
 	}
 	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+	bpf_task_storage_unlock();
 	rcu_read_unlock();
 
 	/* free_task_storage should always be true as long as
@@ -109,7 +136,9 @@ static void *bpf_pid_task_storage_lookup_elem(struct bpf_map *map, void *key)
 		goto out;
 	}
 
+	bpf_task_storage_lock();
 	sdata = task_storage_lookup(task, map, true);
+	bpf_task_storage_unlock();
 	put_pid(pid);
 	return sdata ? sdata->data : NULL;
 out:
@@ -141,8 +170,10 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
 		goto out;
 	}
 
+	bpf_task_storage_lock();
 	sdata = bpf_local_storage_update(
 		task, (struct bpf_local_storage_map *)map, value, map_flags);
+	bpf_task_storage_unlock();
 
 	err = PTR_ERR_OR_ZERO(sdata);
 out:
@@ -185,7 +216,9 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
 		goto out;
 	}
 
+	bpf_task_storage_lock();
 	err = task_storage_delete(task, map);
+	bpf_task_storage_unlock();
 out:
 	put_pid(pid);
 	return err;
@@ -207,34 +240,44 @@ BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
 	if (!task || !task_storage_ptr(task))
 		return (unsigned long)NULL;
 
+	if (!bpf_task_storage_trylock())
+		return (unsigned long)NULL;
+
 	sdata = task_storage_lookup(task, map, true);
 	if (sdata)
-		return (unsigned long)sdata->data;
+		goto unlock;
 
 	/* only allocate new storage, when the task is refcounted */
 	if (refcount_read(&task->usage) &&
-	    (flags & BPF_LOCAL_STORAGE_GET_F_CREATE)) {
+	    (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
 		sdata = bpf_local_storage_update(
 			task, (struct bpf_local_storage_map *)map, value,
 			BPF_NOEXIST);
-		return IS_ERR(sdata) ? (unsigned long)NULL :
-					     (unsigned long)sdata->data;
-	}
 
-	return (unsigned long)NULL;
+unlock:
+	bpf_task_storage_unlock();
+	return IS_ERR_OR_NULL(sdata) ? (unsigned long)NULL :
+		(unsigned long)sdata->data;
 }
 
 BPF_CALL_2(bpf_task_storage_delete, struct bpf_map *, map, struct task_struct *,
 	   task)
 {
+	int ret;
+
 	if (!task)
 		return -EINVAL;
 
+	if (!bpf_task_storage_trylock())
+		return -EBUSY;
+
 	/* This helper must only be called from places where the lifetime of the task
 	 * is guaranteed. Either by being refcounted or by being protected
 	 * by an RCU read-side critical section.
 	 */
-	return task_storage_delete(task, map);
+	ret = task_storage_delete(task, map);
+	bpf_task_storage_unlock();
+	return ret;
 }
 
 static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
-- 
2.24.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 bpf-next 3/6] selftests/bpf: add non-BPF_LSM test for task local storage
  2021-02-23  1:20 [PATCH v4 bpf-next 0/6] bpf: enable task local storage for tracing programs Song Liu
  2021-02-23  1:20 ` [PATCH v4 bpf-next 1/6] " Song Liu
  2021-02-23  1:20 ` [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete] Song Liu
@ 2021-02-23  1:20 ` Song Liu
  2021-02-23  1:20 ` [PATCH v4 bpf-next 4/6] selftests/bpf: test deadlock from recursive bpf_task_storage_[get|delete] Song Liu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2021-02-23  1:20 UTC (permalink / raw)
  To: bpf, netdev, linux-kernel; +Cc: ast, daniel, kernel-team, peterz, Song Liu

Task local storage is enabled for tracing programs. Add two tests for
task local storage without CONFIG_BPF_LSM.

The first test stores a value in sys_enter and read it back in sys_exit.

The second test checks whether the kernel allows allocating task local
storage in exit_creds() (which it should not).

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 .../bpf/prog_tests/task_local_storage.c       | 69 +++++++++++++++++++
 .../selftests/bpf/progs/task_local_storage.c  | 64 +++++++++++++++++
 .../bpf/progs/task_local_storage_exit_creds.c | 32 +++++++++
 3 files changed, 165 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_storage.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c

diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
new file mode 100644
index 0000000000000..dbb7525cdd567
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#define _GNU_SOURCE         /* See feature_test_macros(7) */
+#include <unistd.h>
+#include <sys/syscall.h>   /* For SYS_xxx definitions */
+#include <sys/types.h>
+#include <test_progs.h>
+#include "task_local_storage.skel.h"
+#include "task_local_storage_exit_creds.skel.h"
+
+static void test_sys_enter_exit(void)
+{
+	struct task_local_storage *skel;
+	int err;
+
+	skel = task_local_storage__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	skel->bss->target_pid = syscall(SYS_gettid);
+
+	err = task_local_storage__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto out;
+
+	syscall(SYS_gettid);
+	syscall(SYS_gettid);
+
+	/* 3x syscalls: 1x attach and 2x gettid */
+	ASSERT_EQ(skel->bss->enter_cnt, 3, "enter_cnt");
+	ASSERT_EQ(skel->bss->exit_cnt, 3, "exit_cnt");
+	ASSERT_EQ(skel->bss->mismatch_cnt, 0, "mismatch_cnt");
+out:
+	task_local_storage__destroy(skel);
+}
+
+static void test_exit_creds(void)
+{
+	struct task_local_storage_exit_creds *skel;
+	int err;
+
+	skel = task_local_storage_exit_creds__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	err = task_local_storage_exit_creds__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto out;
+
+	/* trigger at least one exit_creds() */
+	if (CHECK_FAIL(system("ls > /dev/null")))
+		goto out;
+
+	/* sync rcu to make sure exit_creds() is called for "ls" */
+	kern_sync_rcu();
+	ASSERT_EQ(skel->bss->valid_ptr_count, 0, "valid_ptr_count");
+	ASSERT_NEQ(skel->bss->null_ptr_count, 0, "null_ptr_count");
+out:
+	task_local_storage_exit_creds__destroy(skel);
+}
+
+void test_task_local_storage(void)
+{
+	if (test__start_subtest("sys_enter_exit"))
+		test_sys_enter_exit();
+	if (test__start_subtest("exit_creds"))
+		test_exit_creds();
+}
diff --git a/tools/testing/selftests/bpf/progs/task_local_storage.c b/tools/testing/selftests/bpf/progs/task_local_storage.c
new file mode 100644
index 0000000000000..80a0a20db88d2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_local_storage.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, long);
+} enter_id SEC(".maps");
+
+#define MAGIC_VALUE 0xabcd1234
+
+pid_t target_pid = 0;
+int mismatch_cnt = 0;
+int enter_cnt = 0;
+int exit_cnt = 0;
+
+SEC("tp_btf/sys_enter")
+int BPF_PROG(on_enter, struct pt_regs *regs, long id)
+{
+	struct task_struct *task;
+	long *ptr;
+
+	task = bpf_get_current_task_btf();
+	if (task->pid != target_pid)
+		return 0;
+
+	ptr = bpf_task_storage_get(&enter_id, task, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!ptr)
+		return 0;
+
+	__sync_fetch_and_add(&enter_cnt, 1);
+	*ptr = MAGIC_VALUE + enter_cnt;
+
+	return 0;
+}
+
+SEC("tp_btf/sys_exit")
+int BPF_PROG(on_exit, struct pt_regs *regs, long id)
+{
+	struct task_struct *task;
+	long *ptr;
+
+	task = bpf_get_current_task_btf();
+	if (task->pid != target_pid)
+		return 0;
+
+	ptr = bpf_task_storage_get(&enter_id, task, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!ptr)
+		return 0;
+
+	__sync_fetch_and_add(&exit_cnt, 1);
+	if (*ptr != MAGIC_VALUE + exit_cnt)
+		__sync_fetch_and_add(&mismatch_cnt, 1);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c b/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
new file mode 100644
index 0000000000000..81758c0aef993
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, __u64);
+} task_storage SEC(".maps");
+
+int valid_ptr_count = 0;
+int null_ptr_count = 0;
+
+SEC("fentry/exit_creds")
+int BPF_PROG(trace_exit_creds, struct task_struct *task)
+{
+	__u64 *ptr;
+
+	ptr = bpf_task_storage_get(&task_storage, task, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (ptr)
+		__sync_fetch_and_add(&valid_ptr_count, 1);
+	else
+		__sync_fetch_and_add(&null_ptr_count, 1);
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 bpf-next 4/6] selftests/bpf: test deadlock from recursive bpf_task_storage_[get|delete]
  2021-02-23  1:20 [PATCH v4 bpf-next 0/6] bpf: enable task local storage for tracing programs Song Liu
                   ` (2 preceding siblings ...)
  2021-02-23  1:20 ` [PATCH v4 bpf-next 3/6] selftests/bpf: add non-BPF_LSM test for task local storage Song Liu
@ 2021-02-23  1:20 ` Song Liu
  2021-02-23  1:20 ` [PATCH v4 bpf-next 5/6] bpf: runqslower: prefer using local vmlimux to generate vmlinux.h Song Liu
  2021-02-23  1:20 ` [PATCH v4 bpf-next 6/6] bpf: runqslower: use task local storage Song Liu
  5 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2021-02-23  1:20 UTC (permalink / raw)
  To: bpf, netdev, linux-kernel; +Cc: ast, daniel, kernel-team, peterz, Song Liu

Add a test with recursive bpf_task_storage_[get|delete] from fentry
programs on bpf_local_storage_lookup and bpf_local_storage_update. Without
proper deadlock prevent mechanism, this test would cause deadlock.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 .../bpf/prog_tests/task_local_storage.c       | 23 ++++++
 .../selftests/bpf/progs/task_ls_recursion.c   | 70 +++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/task_ls_recursion.c

diff --git a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
index dbb7525cdd567..035c263aab1b1 100644
--- a/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
+++ b/tools/testing/selftests/bpf/prog_tests/task_local_storage.c
@@ -8,6 +8,7 @@
 #include <test_progs.h>
 #include "task_local_storage.skel.h"
 #include "task_local_storage_exit_creds.skel.h"
+#include "task_ls_recursion.skel.h"
 
 static void test_sys_enter_exit(void)
 {
@@ -60,10 +61,32 @@ static void test_exit_creds(void)
 	task_local_storage_exit_creds__destroy(skel);
 }
 
+static void test_recursion(void)
+{
+	struct task_ls_recursion *skel;
+	int err;
+
+	skel = task_ls_recursion__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_open_and_load"))
+		return;
+
+	err = task_ls_recursion__attach(skel);
+	if (!ASSERT_OK(err, "skel_attach"))
+		goto out;
+
+	/* trigger sys_enter, make sure it does not cause deadlock */
+	syscall(SYS_gettid);
+
+out:
+	task_ls_recursion__destroy(skel);
+}
+
 void test_task_local_storage(void)
 {
 	if (test__start_subtest("sys_enter_exit"))
 		test_sys_enter_exit();
 	if (test__start_subtest("exit_creds"))
 		test_exit_creds();
+	if (test__start_subtest("recursion"))
+		test_recursion();
 }
diff --git a/tools/testing/selftests/bpf/progs/task_ls_recursion.c b/tools/testing/selftests/bpf/progs/task_ls_recursion.c
new file mode 100644
index 0000000000000..564583dca7c85
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/task_ls_recursion.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 Facebook */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, long);
+} map_a SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, long);
+} map_b SEC(".maps");
+
+SEC("fentry/bpf_local_storage_lookup")
+int BPF_PROG(on_lookup)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+
+	bpf_task_storage_delete(&map_a, task);
+	bpf_task_storage_delete(&map_b, task);
+	return 0;
+}
+
+SEC("fentry/bpf_local_storage_update")
+int BPF_PROG(on_update)
+{
+	struct task_struct *task = bpf_get_current_task_btf();
+	long *ptr;
+
+	ptr = bpf_task_storage_get(&map_a, task, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (ptr)
+		*ptr += 1;
+
+	ptr = bpf_task_storage_get(&map_b, task, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (ptr)
+		*ptr += 1;
+
+	return 0;
+}
+
+SEC("tp_btf/sys_enter")
+int BPF_PROG(on_enter, struct pt_regs *regs, long id)
+{
+	struct task_struct *task;
+	long *ptr;
+
+	task = bpf_get_current_task_btf();
+	ptr = bpf_task_storage_get(&map_a, task, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (ptr)
+		*ptr = 200;
+
+	ptr = bpf_task_storage_get(&map_b, task, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (ptr)
+		*ptr = 100;
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 bpf-next 5/6] bpf: runqslower: prefer using local vmlimux to generate vmlinux.h
  2021-02-23  1:20 [PATCH v4 bpf-next 0/6] bpf: enable task local storage for tracing programs Song Liu
                   ` (3 preceding siblings ...)
  2021-02-23  1:20 ` [PATCH v4 bpf-next 4/6] selftests/bpf: test deadlock from recursive bpf_task_storage_[get|delete] Song Liu
@ 2021-02-23  1:20 ` Song Liu
  2021-02-23  6:26   ` Andrii Nakryiko
  2021-02-23 21:24   ` Martin KaFai Lau
  2021-02-23  1:20 ` [PATCH v4 bpf-next 6/6] bpf: runqslower: use task local storage Song Liu
  5 siblings, 2 replies; 20+ messages in thread
From: Song Liu @ 2021-02-23  1:20 UTC (permalink / raw)
  To: bpf, netdev, linux-kernel; +Cc: ast, daniel, kernel-team, peterz, Song Liu

Update the Makefile to prefer using $(O)/mvlinux, $(KBUILD_OUTPUT)/vmlinux
(for selftests) or ../../../vmlinux. These two files should have latest
definitions for vmlinux.h.

Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/bpf/runqslower/Makefile | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/runqslower/Makefile b/tools/bpf/runqslower/Makefile
index 9d9fb6209be1b..c96ba90c6f018 100644
--- a/tools/bpf/runqslower/Makefile
+++ b/tools/bpf/runqslower/Makefile
@@ -16,7 +16,10 @@ CFLAGS := -g -Wall
 
 # Try to detect best kernel BTF source
 KERNEL_REL := $(shell uname -r)
-VMLINUX_BTF_PATHS := /sys/kernel/btf/vmlinux /boot/vmlinux-$(KERNEL_REL)
+VMLINUX_BTF_PATHS := $(if $(O),$(O)/vmlinux)		\
+	$(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \
+	../../../vmlinux /sys/kernel/btf/vmlinux	\
+	/boot/vmlinux-$(KERNEL_REL)
 VMLINUX_BTF_PATH := $(or $(VMLINUX_BTF),$(firstword			       \
 					  $(wildcard $(VMLINUX_BTF_PATHS))))
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v4 bpf-next 6/6] bpf: runqslower: use task local storage
  2021-02-23  1:20 [PATCH v4 bpf-next 0/6] bpf: enable task local storage for tracing programs Song Liu
                   ` (4 preceding siblings ...)
  2021-02-23  1:20 ` [PATCH v4 bpf-next 5/6] bpf: runqslower: prefer using local vmlimux to generate vmlinux.h Song Liu
@ 2021-02-23  1:20 ` Song Liu
  2021-02-23 21:33   ` Martin KaFai Lau
  5 siblings, 1 reply; 20+ messages in thread
From: Song Liu @ 2021-02-23  1:20 UTC (permalink / raw)
  To: bpf, netdev, linux-kernel
  Cc: ast, daniel, kernel-team, peterz, Song Liu, Andrii Nakryiko

Replace hashtab with task local storage in runqslower. This improves the
performance of these BPF programs. The following table summarizes average
runtime of these programs, in nanoseconds:

                          task-local   hash-prealloc   hash-no-prealloc
handle__sched_wakeup             125             340               3124
handle__sched_wakeup_new        2812            1510               2998
handle__sched_switch             151             208                991

Note that, task local storage gives better performance than hashtab for
handle__sched_wakeup and handle__sched_switch. On the other hand, for
handle__sched_wakeup_new, task local storage is slower than hashtab with
prealloc. This is because handle__sched_wakeup_new accesses the data for
the first time, so it has to allocate the data for task local storage.
Once the initial allocation is done, subsequent accesses, as those in
handle__sched_wakeup, are much faster with task local storage. If we
disable hashtab prealloc, task local storage is much faster for all 3
functions.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
---
 tools/bpf/runqslower/runqslower.bpf.c | 33 +++++++++++++++++----------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/tools/bpf/runqslower/runqslower.bpf.c b/tools/bpf/runqslower/runqslower.bpf.c
index 1f18a409f0443..645530ca7e985 100644
--- a/tools/bpf/runqslower/runqslower.bpf.c
+++ b/tools/bpf/runqslower/runqslower.bpf.c
@@ -11,9 +11,9 @@ const volatile __u64 min_us = 0;
 const volatile pid_t targ_pid = 0;
 
 struct {
-	__uint(type, BPF_MAP_TYPE_HASH);
-	__uint(max_entries, 10240);
-	__type(key, u32);
+	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
 	__type(value, u64);
 } start SEC(".maps");
 
@@ -25,15 +25,20 @@ struct {
 
 /* record enqueue timestamp */
 __always_inline
-static int trace_enqueue(u32 tgid, u32 pid)
+static int trace_enqueue(struct task_struct *t)
 {
-	u64 ts;
+	u32 pid = t->pid;
+	u64 *ptr;
 
 	if (!pid || (targ_pid && targ_pid != pid))
 		return 0;
 
-	ts = bpf_ktime_get_ns();
-	bpf_map_update_elem(&start, &pid, &ts, 0);
+	ptr = bpf_task_storage_get(&start, t, 0,
+				   BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!ptr)
+		return 0;
+
+	*ptr = bpf_ktime_get_ns();
 	return 0;
 }
 
@@ -43,7 +48,7 @@ int handle__sched_wakeup(u64 *ctx)
 	/* TP_PROTO(struct task_struct *p) */
 	struct task_struct *p = (void *)ctx[0];
 
-	return trace_enqueue(p->tgid, p->pid);
+	return trace_enqueue(p);
 }
 
 SEC("tp_btf/sched_wakeup_new")
@@ -52,7 +57,7 @@ int handle__sched_wakeup_new(u64 *ctx)
 	/* TP_PROTO(struct task_struct *p) */
 	struct task_struct *p = (void *)ctx[0];
 
-	return trace_enqueue(p->tgid, p->pid);
+	return trace_enqueue(p);
 }
 
 SEC("tp_btf/sched_switch")
@@ -70,12 +75,16 @@ int handle__sched_switch(u64 *ctx)
 
 	/* ivcsw: treat like an enqueue event and store timestamp */
 	if (prev->state == TASK_RUNNING)
-		trace_enqueue(prev->tgid, prev->pid);
+		trace_enqueue(prev);
 
 	pid = next->pid;
 
+	/* For pid mismatch, save a bpf_task_storage_get */
+	if (!pid || (targ_pid && targ_pid != pid))
+		return 0;
+
 	/* fetch timestamp and calculate delta */
-	tsp = bpf_map_lookup_elem(&start, &pid);
+	tsp = bpf_task_storage_get(&start, next, 0, 0);
 	if (!tsp)
 		return 0;   /* missed enqueue */
 
@@ -91,7 +100,7 @@ int handle__sched_switch(u64 *ctx)
 	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
 			      &event, sizeof(event));
 
-	bpf_map_delete_elem(&start, &pid);
+	bpf_task_storage_delete(&start, next);
 	return 0;
 }
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 1/6] bpf: enable task local storage for tracing programs
  2021-02-23  1:20 ` [PATCH v4 bpf-next 1/6] " Song Liu
@ 2021-02-23  3:08   ` kernel test robot
  2021-02-23  4:04   ` kernel test robot
  2021-02-23 19:23   ` Martin KaFai Lau
  2 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2021-02-23  3:08 UTC (permalink / raw)
  To: Song Liu, bpf, netdev, linux-kernel
  Cc: kbuild-all, ast, daniel, kernel-team, peterz, Song Liu, KP Singh


[-- Attachment #1: Type: text/plain, Size: 3454 bytes --]

Hi Song,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Song-Liu/bpf-enable-task-local-storage-for-tracing-programs/20210223-092439
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: m68k-randconfig-r031-20210222 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/a1afc383ea0cda5b0221b111f208140114752782
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Song-Liu/bpf-enable-task-local-storage-for-tracing-programs/20210223-092439
        git checkout a1afc383ea0cda5b0221b111f208140114752782
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=m68k 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   m68k-linux-ld: kernel/bpf/bpf_task_storage.o: in function `task_storage_delete':
>> bpf_task_storage.c:(.text+0x28): undefined reference to `bpf_local_storage_lookup'
>> m68k-linux-ld: bpf_task_storage.c:(.text+0x3a): undefined reference to `bpf_selem_unlink'
   m68k-linux-ld: kernel/bpf/bpf_task_storage.o: in function `bpf_pid_task_storage_update_elem':
>> bpf_task_storage.c:(.text+0x120): undefined reference to `bpf_local_storage_update'
   m68k-linux-ld: kernel/bpf/bpf_task_storage.o: in function `bpf_pid_task_storage_lookup_elem':
   bpf_task_storage.c:(.text+0x1a8): undefined reference to `bpf_local_storage_lookup'
   m68k-linux-ld: kernel/bpf/bpf_task_storage.o: in function `task_storage_map_free':
>> bpf_task_storage.c:(.text+0x232): undefined reference to `bpf_local_storage_cache_idx_free'
>> m68k-linux-ld: bpf_task_storage.c:(.text+0x240): undefined reference to `bpf_local_storage_map_free'
   m68k-linux-ld: kernel/bpf/bpf_task_storage.o: in function `task_storage_map_alloc':
>> bpf_task_storage.c:(.text+0x24c): undefined reference to `bpf_local_storage_map_alloc'
>> m68k-linux-ld: bpf_task_storage.c:(.text+0x262): undefined reference to `bpf_local_storage_cache_idx_get'
   m68k-linux-ld: kernel/bpf/bpf_task_storage.o: in function `bpf_task_storage_get':
   bpf_task_storage.c:(.text+0x2b2): undefined reference to `bpf_local_storage_lookup'
>> m68k-linux-ld: bpf_task_storage.c:(.text+0x2ee): undefined reference to `bpf_local_storage_update'
   m68k-linux-ld: kernel/bpf/bpf_task_storage.o: in function `bpf_task_storage_free':
>> bpf_task_storage.c:(.text+0x328): undefined reference to `bpf_selem_unlink_map'
>> m68k-linux-ld: bpf_task_storage.c:(.text+0x32e): undefined reference to `bpf_selem_unlink_storage_nolock'
>> m68k-linux-ld: kernel/bpf/bpf_task_storage.o:(.rodata+0x74): undefined reference to `bpf_local_storage_map_alloc_check'
>> m68k-linux-ld: kernel/bpf/bpf_task_storage.o:(.rodata+0xcc): undefined reference to `bpf_local_storage_map_check_btf'
   m68k-linux-ld: drivers/macintosh/via-pmu.o: in function `pmu_set_rtc_time':
   (.text+0x187e): undefined reference to `rtc_tm_to_time64'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 38633 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 1/6] bpf: enable task local storage for tracing programs
  2021-02-23  1:20 ` [PATCH v4 bpf-next 1/6] " Song Liu
  2021-02-23  3:08   ` kernel test robot
@ 2021-02-23  4:04   ` kernel test robot
  2021-02-23 19:23   ` Martin KaFai Lau
  2 siblings, 0 replies; 20+ messages in thread
From: kernel test robot @ 2021-02-23  4:04 UTC (permalink / raw)
  To: Song Liu, bpf, netdev, linux-kernel
  Cc: kbuild-all, ast, daniel, kernel-team, peterz, Song Liu, KP Singh


[-- Attachment #1: Type: text/plain, Size: 4376 bytes --]

Hi Song,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Song-Liu/bpf-enable-task-local-storage-for-tracing-programs/20210223-092439
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: xtensa-randconfig-r034-20210222 (attached as .config)
compiler: xtensa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/a1afc383ea0cda5b0221b111f208140114752782
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Song-Liu/bpf-enable-task-local-storage-for-tracing-programs/20210223-092439
        git checkout a1afc383ea0cda5b0221b111f208140114752782
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=xtensa 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   xtensa-linux-ld: kernel/bpf/bpf_task_storage.o: in function `notsupp_get_next_key':
   bpf_task_storage.c:(.text+0x14): undefined reference to `bpf_local_storage_lookup'
>> xtensa-linux-ld: bpf_task_storage.c:(.text+0x18): undefined reference to `bpf_selem_unlink'
   xtensa-linux-ld: kernel/bpf/bpf_task_storage.o: in function `task_storage_delete':
   bpf_task_storage.c:(.text+0x32): undefined reference to `bpf_local_storage_lookup'
   xtensa-linux-ld: bpf_task_storage.c:(.text+0x3e): undefined reference to `bpf_selem_unlink'
   xtensa-linux-ld: kernel/bpf/bpf_task_storage.o: in function `bpf_pid_task_storage_delete_elem':
   bpf_task_storage.c:(.text+0x94): undefined reference to `bpf_local_storage_update'
   xtensa-linux-ld: kernel/bpf/bpf_task_storage.o: in function `bpf_pid_task_storage_update_elem':
   bpf_task_storage.c:(.text+0xce): undefined reference to `bpf_local_storage_update'
>> xtensa-linux-ld: bpf_task_storage.c:(.text+0x12c): undefined reference to `bpf_local_storage_lookup'
   xtensa-linux-ld: kernel/bpf/bpf_task_storage.o: in function `bpf_pid_task_storage_lookup_elem':
   bpf_task_storage.c:(.text+0x148): undefined reference to `bpf_local_storage_cache_idx_free'
>> xtensa-linux-ld: bpf_task_storage.c:(.text+0x14c): undefined reference to `bpf_local_storage_map_free'
>> xtensa-linux-ld: bpf_task_storage.c:(.text+0x159): undefined reference to `bpf_local_storage_cache_idx_free'
   xtensa-linux-ld: bpf_task_storage.c:(.text+0x162): undefined reference to `bpf_local_storage_map_free'
>> xtensa-linux-ld: bpf_task_storage.c:(.text+0x16c): undefined reference to `bpf_local_storage_map_alloc'
>> xtensa-linux-ld: bpf_task_storage.c:(.text+0x170): undefined reference to `bpf_local_storage_cache_idx_get'
   xtensa-linux-ld: bpf_task_storage.c:(.text+0x17a): undefined reference to `bpf_local_storage_map_alloc'
   xtensa-linux-ld: bpf_task_storage.c:(.text+0x18b): undefined reference to `bpf_local_storage_cache_idx_get'
   xtensa-linux-ld: kernel/bpf/bpf_task_storage.o: in function `task_storage_map_free':
   bpf_task_storage.c:(.text+0x1d3): undefined reference to `bpf_local_storage_lookup'
   xtensa-linux-ld: kernel/bpf/bpf_task_storage.o: in function `task_storage_map_alloc':
   bpf_task_storage.c:(.text+0x1f6): undefined reference to `bpf_local_storage_update'
   xtensa-linux-ld: bpf_task_storage.c:(.text+0x208): undefined reference to `bpf_selem_unlink_map'
>> xtensa-linux-ld: bpf_task_storage.c:(.text+0x20c): undefined reference to `bpf_selem_unlink_storage_nolock'
   xtensa-linux-ld: kernel/bpf/bpf_task_storage.o: in function `bpf_task_storage_get':
   bpf_task_storage.c:(.text+0x252): undefined reference to `bpf_selem_unlink_map'
   xtensa-linux-ld: bpf_task_storage.c:(.text+0x25e): undefined reference to `bpf_selem_unlink_storage_nolock'
>> xtensa-linux-ld: kernel/bpf/bpf_task_storage.o:(.rodata+0x78): undefined reference to `bpf_local_storage_map_alloc_check'
>> xtensa-linux-ld: kernel/bpf/bpf_task_storage.o:(.rodata+0xd0): undefined reference to `bpf_local_storage_map_check_btf'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 22009 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete]
  2021-02-23  1:20 ` [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete] Song Liu
@ 2021-02-23  6:21   ` Andrii Nakryiko
  2021-02-23  7:16     ` Song Liu
  2021-02-23 11:06   ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2021-02-23  6:21 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Networking, open list, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Ziljstra

On Mon, Feb 22, 2021 at 5:23 PM Song Liu <songliubraving@fb.com> wrote:
>
> BPF helpers bpf_task_storage_[get|delete] could hold two locks:
> bpf_local_storage_map_bucket->lock and bpf_local_storage->lock. Calling
> these helpers from fentry/fexit programs on functions in bpf_*_storage.c
> may cause deadlock on either locks.
>
> Prevent such deadlock with a per cpu counter, bpf_task_storage_busy, which
> is similar to bpf_prog_active. We need this counter to be global, because
> the two locks here belong to two different objects: bpf_local_storage_map
> and bpf_local_storage. If we pick one of them as the owner of the counter,
> it is still possible to trigger deadlock on the other lock. For example,
> if bpf_local_storage_map owns the counters, it cannot prevent deadlock
> on bpf_local_storage->lock when two maps are used.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  kernel/bpf/bpf_task_storage.c | 57 ++++++++++++++++++++++++++++++-----
>  1 file changed, 50 insertions(+), 7 deletions(-)
>

[...]

> @@ -109,7 +136,9 @@ static void *bpf_pid_task_storage_lookup_elem(struct bpf_map *map, void *key)
>                 goto out;
>         }
>
> +       bpf_task_storage_lock();
>         sdata = task_storage_lookup(task, map, true);
> +       bpf_task_storage_unlock();
>         put_pid(pid);
>         return sdata ? sdata->data : NULL;
>  out:
> @@ -141,8 +170,10 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
>                 goto out;
>         }
>
> +       bpf_task_storage_lock();
>         sdata = bpf_local_storage_update(
>                 task, (struct bpf_local_storage_map *)map, value, map_flags);

this should probably be container_of() instead of casting

> +       bpf_task_storage_unlock();
>
>         err = PTR_ERR_OR_ZERO(sdata);
>  out:
> @@ -185,7 +216,9 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
>                 goto out;
>         }
>
> +       bpf_task_storage_lock();
>         err = task_storage_delete(task, map);
> +       bpf_task_storage_unlock();
>  out:
>         put_pid(pid);
>         return err;

[...]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 5/6] bpf: runqslower: prefer using local vmlimux to generate vmlinux.h
  2021-02-23  1:20 ` [PATCH v4 bpf-next 5/6] bpf: runqslower: prefer using local vmlimux to generate vmlinux.h Song Liu
@ 2021-02-23  6:26   ` Andrii Nakryiko
  2021-02-23 21:24   ` Martin KaFai Lau
  1 sibling, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2021-02-23  6:26 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Networking, open list, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Ziljstra

On Mon, Feb 22, 2021 at 5:24 PM Song Liu <songliubraving@fb.com> wrote:
>
> Update the Makefile to prefer using $(O)/mvlinux, $(KBUILD_OUTPUT)/vmlinux
> (for selftests) or ../../../vmlinux. These two files should have latest
> definitions for vmlinux.h.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---

Acked-by: Andrii Nakryiko <andrii@kernel.org>

>  tools/bpf/runqslower/Makefile | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tools/bpf/runqslower/Makefile b/tools/bpf/runqslower/Makefile
> index 9d9fb6209be1b..c96ba90c6f018 100644
> --- a/tools/bpf/runqslower/Makefile
> +++ b/tools/bpf/runqslower/Makefile
> @@ -16,7 +16,10 @@ CFLAGS := -g -Wall
>
>  # Try to detect best kernel BTF source
>  KERNEL_REL := $(shell uname -r)
> -VMLINUX_BTF_PATHS := /sys/kernel/btf/vmlinux /boot/vmlinux-$(KERNEL_REL)
> +VMLINUX_BTF_PATHS := $(if $(O),$(O)/vmlinux)           \
> +       $(if $(KBUILD_OUTPUT),$(KBUILD_OUTPUT)/vmlinux) \
> +       ../../../vmlinux /sys/kernel/btf/vmlinux        \
> +       /boot/vmlinux-$(KERNEL_REL)
>  VMLINUX_BTF_PATH := $(or $(VMLINUX_BTF),$(firstword                           \
>                                           $(wildcard $(VMLINUX_BTF_PATHS))))
>
> --
> 2.24.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete]
  2021-02-23  6:21   ` Andrii Nakryiko
@ 2021-02-23  7:16     ` Song Liu
  2021-02-23  7:19       ` Andrii Nakryiko
  0 siblings, 1 reply; 20+ messages in thread
From: Song Liu @ 2021-02-23  7:16 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, open list, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Ziljstra



> On Feb 22, 2021, at 10:21 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> 
> On Mon, Feb 22, 2021 at 5:23 PM Song Liu <songliubraving@fb.com> wrote:
>> 
>> BPF helpers bpf_task_storage_[get|delete] could hold two locks:
>> bpf_local_storage_map_bucket->lock and bpf_local_storage->lock. Calling
>> these helpers from fentry/fexit programs on functions in bpf_*_storage.c
>> may cause deadlock on either locks.
>> 
>> Prevent such deadlock with a per cpu counter, bpf_task_storage_busy, which
>> is similar to bpf_prog_active. We need this counter to be global, because
>> the two locks here belong to two different objects: bpf_local_storage_map
>> and bpf_local_storage. If we pick one of them as the owner of the counter,
>> it is still possible to trigger deadlock on the other lock. For example,
>> if bpf_local_storage_map owns the counters, it cannot prevent deadlock
>> on bpf_local_storage->lock when two maps are used.
>> 
>> Signed-off-by: Song Liu <songliubraving@fb.com>
>> ---
>> kernel/bpf/bpf_task_storage.c | 57 ++++++++++++++++++++++++++++++-----
>> 1 file changed, 50 insertions(+), 7 deletions(-)
>> 
> 
> [...]
> 
>> @@ -109,7 +136,9 @@ static void *bpf_pid_task_storage_lookup_elem(struct bpf_map *map, void *key)
>>                goto out;
>>        }
>> 
>> +       bpf_task_storage_lock();
>>        sdata = task_storage_lookup(task, map, true);
>> +       bpf_task_storage_unlock();
>>        put_pid(pid);
>>        return sdata ? sdata->data : NULL;
>> out:
>> @@ -141,8 +170,10 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
>>                goto out;
>>        }
>> 
>> +       bpf_task_storage_lock();
>>        sdata = bpf_local_storage_update(
>>                task, (struct bpf_local_storage_map *)map, value, map_flags);
> 
> this should probably be container_of() instead of casting

bpf_task_storage.c uses casting in multiple places. How about we fix it in a 
separate patch?

Thanks,
Song

> 
>> +       bpf_task_storage_unlock();
>> 
>>        err = PTR_ERR_OR_ZERO(sdata);
>> out:
>> @@ -185,7 +216,9 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
>>                goto out;
>>        }
>> 
>> +       bpf_task_storage_lock();
>>        err = task_storage_delete(task, map);
>> +       bpf_task_storage_unlock();
>> out:
>>        put_pid(pid);
>>        return err;
> 
> [...]


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete]
  2021-02-23  7:16     ` Song Liu
@ 2021-02-23  7:19       ` Andrii Nakryiko
  2021-02-23 16:44         ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2021-02-23  7:19 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, Networking, open list, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Ziljstra

On Mon, Feb 22, 2021 at 11:16 PM Song Liu <songliubraving@fb.com> wrote:
>
>
>
> > On Feb 22, 2021, at 10:21 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Feb 22, 2021 at 5:23 PM Song Liu <songliubraving@fb.com> wrote:
> >>
> >> BPF helpers bpf_task_storage_[get|delete] could hold two locks:
> >> bpf_local_storage_map_bucket->lock and bpf_local_storage->lock. Calling
> >> these helpers from fentry/fexit programs on functions in bpf_*_storage.c
> >> may cause deadlock on either locks.
> >>
> >> Prevent such deadlock with a per cpu counter, bpf_task_storage_busy, which
> >> is similar to bpf_prog_active. We need this counter to be global, because
> >> the two locks here belong to two different objects: bpf_local_storage_map
> >> and bpf_local_storage. If we pick one of them as the owner of the counter,
> >> it is still possible to trigger deadlock on the other lock. For example,
> >> if bpf_local_storage_map owns the counters, it cannot prevent deadlock
> >> on bpf_local_storage->lock when two maps are used.
> >>
> >> Signed-off-by: Song Liu <songliubraving@fb.com>
> >> ---
> >> kernel/bpf/bpf_task_storage.c | 57 ++++++++++++++++++++++++++++++-----
> >> 1 file changed, 50 insertions(+), 7 deletions(-)
> >>
> >
> > [...]
> >
> >> @@ -109,7 +136,9 @@ static void *bpf_pid_task_storage_lookup_elem(struct bpf_map *map, void *key)
> >>                goto out;
> >>        }
> >>
> >> +       bpf_task_storage_lock();
> >>        sdata = task_storage_lookup(task, map, true);
> >> +       bpf_task_storage_unlock();
> >>        put_pid(pid);
> >>        return sdata ? sdata->data : NULL;
> >> out:
> >> @@ -141,8 +170,10 @@ static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
> >>                goto out;
> >>        }
> >>
> >> +       bpf_task_storage_lock();
> >>        sdata = bpf_local_storage_update(
> >>                task, (struct bpf_local_storage_map *)map, value, map_flags);
> >
> > this should probably be container_of() instead of casting
>
> bpf_task_storage.c uses casting in multiple places. How about we fix it in a
> separate patch?
>

Sure, let's fix all uses in a separate patch. My point is that this
makes an assumption (that struct bpf_map map field is always the very
first one) which is not enforced and not documented in struct
bpf_local_storage_map.

> Thanks,
> Song
>
> >
> >> +       bpf_task_storage_unlock();
> >>
> >>        err = PTR_ERR_OR_ZERO(sdata);
> >> out:
> >> @@ -185,7 +216,9 @@ static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
> >>                goto out;
> >>        }
> >>
> >> +       bpf_task_storage_lock();
> >>        err = task_storage_delete(task, map);
> >> +       bpf_task_storage_unlock();
> >> out:
> >>        put_pid(pid);
> >>        return err;
> >
> > [...]
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete]
  2021-02-23  1:20 ` [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete] Song Liu
  2021-02-23  6:21   ` Andrii Nakryiko
@ 2021-02-23 11:06   ` Peter Zijlstra
  2021-02-23 20:49     ` Song Liu
  1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2021-02-23 11:06 UTC (permalink / raw)
  To: Song Liu; +Cc: bpf, netdev, linux-kernel, ast, daniel, kernel-team

On Mon, Feb 22, 2021 at 05:20:10PM -0800, Song Liu wrote:
> BPF helpers bpf_task_storage_[get|delete] could hold two locks:
> bpf_local_storage_map_bucket->lock and bpf_local_storage->lock. Calling
> these helpers from fentry/fexit programs on functions in bpf_*_storage.c
> may cause deadlock on either locks.
> 
> Prevent such deadlock with a per cpu counter, bpf_task_storage_busy, which
> is similar to bpf_prog_active. We need this counter to be global, because

So bpf_prog_active is one of the biggest turds around, and now you're
making it worse ?!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete]
  2021-02-23  7:19       ` Andrii Nakryiko
@ 2021-02-23 16:44         ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2021-02-23 16:44 UTC (permalink / raw)
  To: Andrii Nakryiko, Song Liu
  Cc: bpf, Networking, open list, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Peter Ziljstra

On 2/22/21 11:19 PM, Andrii Nakryiko wrote:
>>>> +       bpf_task_storage_lock();
>>>>         sdata = bpf_local_storage_update(
>>>>                 task, (struct bpf_local_storage_map *)map, value, map_flags);
>>> this should probably be container_of() instead of casting
>> bpf_task_storage.c uses casting in multiple places. How about we fix it in a
>> separate patch?
>>
> Sure, let's fix all uses in a separate patch. My point is that this
> makes an assumption (that struct bpf_map map field is always the very
> first one) which is not enforced and not documented in struct
> bpf_local_storage_map.
> 

I'd rather document it in separate patch.
Just doing container_of here and there will lead to wrong assumption
that it can be in a different place, but it's much more involved.
Consider what happens with all map_alloc callbacks... how that pointer
is hard coded into bpf prog.. then passed back into helpers...
then the logic that can read inner fields of bpf_map from the prog...

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 1/6] bpf: enable task local storage for tracing programs
  2021-02-23  1:20 ` [PATCH v4 bpf-next 1/6] " Song Liu
  2021-02-23  3:08   ` kernel test robot
  2021-02-23  4:04   ` kernel test robot
@ 2021-02-23 19:23   ` Martin KaFai Lau
  2021-02-23 20:51     ` Song Liu
  2 siblings, 1 reply; 20+ messages in thread
From: Martin KaFai Lau @ 2021-02-23 19:23 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, netdev, linux-kernel, ast, daniel, kernel-team, peterz, KP Singh

On Mon, Feb 22, 2021 at 05:20:09PM -0800, Song Liu wrote:
[ ... ]

> diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
> index e0da0258b732d..2034019966d44 100644
> --- a/kernel/bpf/bpf_task_storage.c
> +++ b/kernel/bpf/bpf_task_storage.c
> @@ -15,7 +15,6 @@
>  #include <linux/bpf_local_storage.h>
>  #include <linux/filter.h>
>  #include <uapi/linux/btf.h>
> -#include <linux/bpf_lsm.h>
>  #include <linux/btf_ids.h>
>  #include <linux/fdtable.h>
>  
> @@ -24,12 +23,8 @@ DEFINE_BPF_STORAGE_CACHE(task_cache);
>  static struct bpf_local_storage __rcu **task_storage_ptr(void *owner)
>  {
>  	struct task_struct *task = owner;
> -	struct bpf_storage_blob *bsb;
>  
> -	bsb = bpf_task(task);
> -	if (!bsb)
> -		return NULL;
task_storage_ptr() no longer returns NULL.  All "!task_storage_ptr(task)"
checks should be removed also.  e.g. In bpf_task_storage_get
and bpf_pid_task_storage_update_elem.

> -	return &bsb->storage;
> +	return &task->bpf_storage;
>  }
>  

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete]
  2021-02-23 11:06   ` Peter Zijlstra
@ 2021-02-23 20:49     ` Song Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2021-02-23 20:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bpf, Networking, open list, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team



> On Feb 23, 2021, at 3:06 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Mon, Feb 22, 2021 at 05:20:10PM -0800, Song Liu wrote:
>> BPF helpers bpf_task_storage_[get|delete] could hold two locks:
>> bpf_local_storage_map_bucket->lock and bpf_local_storage->lock. Calling
>> these helpers from fentry/fexit programs on functions in bpf_*_storage.c
>> may cause deadlock on either locks.
>> 
>> Prevent such deadlock with a per cpu counter, bpf_task_storage_busy, which
>> is similar to bpf_prog_active. We need this counter to be global, because
> 
> So bpf_prog_active is one of the biggest turds around, and now you're
> making it worse ?!

bpf_prog_active is a distraction here. We are trying to enable task local 
storage for fentry/fext programs, which do not use bpf_prog_active.

bpf_task_storage_busy counter is introduced to protect against a specific 
pattern of deadlocks (attaching fentry/fexit on bpf_task_storage_[get|delete] 
helpers, then let the programs call these two helpers again). 

Thanks,
Song

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 1/6] bpf: enable task local storage for tracing programs
  2021-02-23 19:23   ` Martin KaFai Lau
@ 2021-02-23 20:51     ` Song Liu
  0 siblings, 0 replies; 20+ messages in thread
From: Song Liu @ 2021-02-23 20:51 UTC (permalink / raw)
  To: Martin Lau
  Cc: bpf, Networking, linux-kernel, ast, daniel, Kernel Team, peterz,
	KP Singh



> On Feb 23, 2021, at 11:23 AM, Martin Lau <kafai@fb.com> wrote:
> 
> On Mon, Feb 22, 2021 at 05:20:09PM -0800, Song Liu wrote:
> [ ... ]
> 
>> diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
>> index e0da0258b732d..2034019966d44 100644
>> --- a/kernel/bpf/bpf_task_storage.c
>> +++ b/kernel/bpf/bpf_task_storage.c
>> @@ -15,7 +15,6 @@
>> #include <linux/bpf_local_storage.h>
>> #include <linux/filter.h>
>> #include <uapi/linux/btf.h>
>> -#include <linux/bpf_lsm.h>
>> #include <linux/btf_ids.h>
>> #include <linux/fdtable.h>
>> 
>> @@ -24,12 +23,8 @@ DEFINE_BPF_STORAGE_CACHE(task_cache);
>> static struct bpf_local_storage __rcu **task_storage_ptr(void *owner)
>> {
>> 	struct task_struct *task = owner;
>> -	struct bpf_storage_blob *bsb;
>> 
>> -	bsb = bpf_task(task);
>> -	if (!bsb)
>> -		return NULL;
> task_storage_ptr() no longer returns NULL.  All "!task_storage_ptr(task)"
> checks should be removed also.  e.g. In bpf_task_storage_get
> and bpf_pid_task_storage_update_elem.

Good catch! Fixed it in v5. 

Thanks,
Song

> 
>> -	return &bsb->storage;
>> +	return &task->bpf_storage;
>> }
>> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 5/6] bpf: runqslower: prefer using local vmlimux to generate vmlinux.h
  2021-02-23  1:20 ` [PATCH v4 bpf-next 5/6] bpf: runqslower: prefer using local vmlimux to generate vmlinux.h Song Liu
  2021-02-23  6:26   ` Andrii Nakryiko
@ 2021-02-23 21:24   ` Martin KaFai Lau
  1 sibling, 0 replies; 20+ messages in thread
From: Martin KaFai Lau @ 2021-02-23 21:24 UTC (permalink / raw)
  Cc: bpf, netdev, linux-kernel, ast, daniel, kernel-team, peterz

On Mon, Feb 22, 2021 at 05:20:13PM -0800, Song Liu wrote:
> Update the Makefile to prefer using $(O)/mvlinux, $(KBUILD_OUTPUT)/vmlinux
s/mvlinux/vmlinux/

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 bpf-next 6/6] bpf: runqslower: use task local storage
  2021-02-23  1:20 ` [PATCH v4 bpf-next 6/6] bpf: runqslower: use task local storage Song Liu
@ 2021-02-23 21:33   ` Martin KaFai Lau
  0 siblings, 0 replies; 20+ messages in thread
From: Martin KaFai Lau @ 2021-02-23 21:33 UTC (permalink / raw)
  To: Song Liu
  Cc: bpf, netdev, linux-kernel, ast, daniel, kernel-team, peterz,
	Andrii Nakryiko

On Mon, Feb 22, 2021 at 05:20:14PM -0800, Song Liu wrote:
> Replace hashtab with task local storage in runqslower. This improves the
> performance of these BPF programs. The following table summarizes average
> runtime of these programs, in nanoseconds:
> 
>                           task-local   hash-prealloc   hash-no-prealloc
> handle__sched_wakeup             125             340               3124
> handle__sched_wakeup_new        2812            1510               2998
> handle__sched_switch             151             208                991
Nice!  The required code change is also minimal.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, back to index

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-23  1:20 [PATCH v4 bpf-next 0/6] bpf: enable task local storage for tracing programs Song Liu
2021-02-23  1:20 ` [PATCH v4 bpf-next 1/6] " Song Liu
2021-02-23  3:08   ` kernel test robot
2021-02-23  4:04   ` kernel test robot
2021-02-23 19:23   ` Martin KaFai Lau
2021-02-23 20:51     ` Song Liu
2021-02-23  1:20 ` [PATCH v4 bpf-next 2/6] bpf: prevent deadlock from recursive bpf_task_storage_[get|delete] Song Liu
2021-02-23  6:21   ` Andrii Nakryiko
2021-02-23  7:16     ` Song Liu
2021-02-23  7:19       ` Andrii Nakryiko
2021-02-23 16:44         ` Alexei Starovoitov
2021-02-23 11:06   ` Peter Zijlstra
2021-02-23 20:49     ` Song Liu
2021-02-23  1:20 ` [PATCH v4 bpf-next 3/6] selftests/bpf: add non-BPF_LSM test for task local storage Song Liu
2021-02-23  1:20 ` [PATCH v4 bpf-next 4/6] selftests/bpf: test deadlock from recursive bpf_task_storage_[get|delete] Song Liu
2021-02-23  1:20 ` [PATCH v4 bpf-next 5/6] bpf: runqslower: prefer using local vmlimux to generate vmlinux.h Song Liu
2021-02-23  6:26   ` Andrii Nakryiko
2021-02-23 21:24   ` Martin KaFai Lau
2021-02-23  1:20 ` [PATCH v4 bpf-next 6/6] bpf: runqslower: use task local storage Song Liu
2021-02-23 21:33   ` Martin KaFai Lau

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git