bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 bpf-next 00/10] bpf_link observability APIs
@ 2020-04-28  5:49 Andrii Nakryiko
  2020-04-28  5:49 ` [PATCH v2 bpf-next 01/10] bpf: refactor bpf_link update handling Andrii Nakryiko
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

This patch series adds various observability APIs to bpf_link:
  - each bpf_link now gets ID, similar to bpf_map and bpf_prog, by which
    user-space can iterate over all existing bpf_links and create limited FD
    from ID;
  - allows to get extra object information with bpf_link general and
    type-specific information;
  - implements `bpf link show` command which lists all active bpf_links in the
    system;
  - implements `bpf link pin` allowing to pin bpf_link by ID or from other
    pinned path.

v1->v2:
  - simplified `bpftool link show` implementation (Quentin);
  - fixed formatting of bpftool-link.rst (Quentin);
  - fixed attach type printing logic (Quentin);
rfc->v1:
  - dropped read-only bpf_links (Alexei);
  - fixed bug in bpf_link_cleanup() not removing ID;
  - fixed bpftool link pinning search logic;
  - added bash-completion and man page.

Andrii Nakryiko (10):
  bpf: refactor bpf_link update handling
  bpf: allocate ID for bpf_link
  bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  libbpf: add low-level APIs for new bpf_link commands
  selftests/bpf: test bpf_link's get_next_id, get_fd_by_id, and
    get_obj_info
  bpftool: expose attach_type-to-string array to non-cgroup code
  bpftool: add bpf_link show and pin support
  bpftool: add bpftool-link manpage
  bpftool: add link bash completions

 include/linux/bpf-cgroup.h                    |  14 -
 include/linux/bpf.h                           |  28 +-
 include/linux/bpf_types.h                     |   6 +
 include/uapi/linux/bpf.h                      |  31 ++
 kernel/bpf/btf.c                              |   2 +
 kernel/bpf/cgroup.c                           |  89 ++++-
 kernel/bpf/syscall.c                          | 363 ++++++++++++++----
 kernel/bpf/verifier.c                         |   2 +
 kernel/cgroup/cgroup.c                        |  27 --
 .../bpftool/Documentation/bpftool-link.rst    | 118 ++++++
 tools/bpf/bpftool/bash-completion/bpftool     |  39 ++
 tools/bpf/bpftool/cgroup.c                    |  48 +--
 tools/bpf/bpftool/common.c                    |   2 +
 tools/bpf/bpftool/link.c                      | 333 ++++++++++++++++
 tools/bpf/bpftool/main.c                      |   6 +-
 tools/bpf/bpftool/main.h                      |  37 ++
 tools/include/uapi/linux/bpf.h                |  31 ++
 tools/lib/bpf/bpf.c                           |  19 +-
 tools/lib/bpf/bpf.h                           |   4 +-
 tools/lib/bpf/libbpf.map                      |   6 +
 .../selftests/bpf/prog_tests/bpf_obj_id.c     | 110 +++++-
 .../testing/selftests/bpf/progs/test_obj_id.c |  14 +-
 22 files changed, 1145 insertions(+), 184 deletions(-)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-link.rst
 create mode 100644 tools/bpf/bpftool/link.c

-- 
2.24.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 01/10] bpf: refactor bpf_link update handling
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  2020-04-28  5:49 ` [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link Andrii Nakryiko
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Make bpf_link update support more generic by making it into another
bpf_link_ops methods. This allows generic syscall handling code to be agnostic
to various conditionally compiled features (e.g., the case of
CONFIG_CGROUP_BPF). This also allows to keep link type-specific code to remain
static within respective code base. Refactor existing bpf_cgroup_link code and
take advantage of this.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf-cgroup.h | 12 ------------
 include/linux/bpf.h        |  3 ++-
 kernel/bpf/cgroup.c        | 30 ++++++++++++++++++++++++++++--
 kernel/bpf/syscall.c       | 11 ++++-------
 kernel/cgroup/cgroup.c     | 27 ---------------------------
 5 files changed, 34 insertions(+), 49 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index c11b413d5b1a..d2d969669564 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -100,8 +100,6 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type);
-int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
-			 struct bpf_prog *new_prog);
 int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		       union bpf_attr __user *uattr);
 
@@ -112,8 +110,6 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
 		      u32 flags);
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 		      enum bpf_attach_type type);
-int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *old_prog,
-		       struct bpf_prog *new_prog);
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		     union bpf_attr __user *uattr);
 
@@ -354,7 +350,6 @@ int cgroup_bpf_prog_query(const union bpf_attr *attr,
 #else
 
 struct bpf_prog;
-struct bpf_link;
 struct cgroup_bpf {};
 static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 static inline void cgroup_bpf_offline(struct cgroup *cgrp) {}
@@ -378,13 +373,6 @@ static inline int cgroup_bpf_link_attach(const union bpf_attr *attr,
 	return -EINVAL;
 }
 
-static inline int cgroup_bpf_replace(struct bpf_link *link,
-				     struct bpf_prog *old_prog,
-				     struct bpf_prog *new_prog)
-{
-	return -EINVAL;
-}
-
 static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
 					union bpf_attr __user *uattr)
 {
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 10960cfabea4..81c8620cb4c4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1093,7 +1093,8 @@ struct bpf_link {
 struct bpf_link_ops {
 	void (*release)(struct bpf_link *link);
 	void (*dealloc)(struct bpf_link *link);
-
+	int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
+			   struct bpf_prog *old_prog);
 };
 
 void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index fc7c7002fd37..0b284da05d0f 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -557,8 +557,9 @@ static void replace_effective_prog(struct cgroup *cgrp,
  *
  * Must be called with cgroup_mutex held.
  */
-int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
-			 struct bpf_prog *new_prog)
+static int __cgroup_bpf_replace(struct cgroup *cgrp,
+				struct bpf_cgroup_link *link,
+				struct bpf_prog *new_prog)
 {
 	struct list_head *progs = &cgrp->bpf.progs[link->type];
 	struct bpf_prog *old_prog;
@@ -583,6 +584,30 @@ int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
 	return 0;
 }
 
+static int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *new_prog,
+			      struct bpf_prog *old_prog)
+{
+	struct bpf_cgroup_link *cg_link;
+	int ret;
+
+	cg_link = container_of(link, struct bpf_cgroup_link, link);
+
+	mutex_lock(&cgroup_mutex);
+	/* link might have been auto-released by dying cgroup, so fail */
+	if (!cg_link->cgroup) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	if (old_prog && link->prog != old_prog) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+	ret = __cgroup_bpf_replace(cg_link->cgroup, cg_link, new_prog);
+out_unlock:
+	mutex_unlock(&cgroup_mutex);
+	return ret;
+}
+
 static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
 					       struct bpf_prog *prog,
 					       struct bpf_cgroup_link *link,
@@ -811,6 +836,7 @@ static void bpf_cgroup_link_dealloc(struct bpf_link *link)
 const struct bpf_link_ops bpf_cgroup_link_lops = {
 	.release = bpf_cgroup_link_release,
 	.dealloc = bpf_cgroup_link_dealloc,
+	.update_prog = cgroup_bpf_replace,
 };
 
 int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7626b8024471..f5358e1462eb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3645,13 +3645,10 @@ static int link_update(union bpf_attr *attr)
 		goto out_put_progs;
 	}
 
-#ifdef CONFIG_CGROUP_BPF
-	if (link->ops == &bpf_cgroup_link_lops) {
-		ret = cgroup_bpf_replace(link, old_prog, new_prog);
-		goto out_put_progs;
-	}
-#endif
-	ret = -EINVAL;
+	if (link->ops->update_prog)
+		ret = link->ops->update_prog(link, new_prog, old_prog);
+	else
+		ret = EINVAL;
 
 out_put_progs:
 	if (old_prog)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 06b5ea9d899d..557a9b9d2244 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6508,33 +6508,6 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
 	return ret;
 }
 
-int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *old_prog,
-		       struct bpf_prog *new_prog)
-{
-	struct bpf_cgroup_link *cg_link;
-	int ret;
-
-	if (link->ops != &bpf_cgroup_link_lops)
-		return -EINVAL;
-
-	cg_link = container_of(link, struct bpf_cgroup_link, link);
-
-	mutex_lock(&cgroup_mutex);
-	/* link might have been auto-released by dying cgroup, so fail */
-	if (!cg_link->cgroup) {
-		ret = -EINVAL;
-		goto out_unlock;
-	}
-	if (old_prog && link->prog != old_prog) {
-		ret = -EPERM;
-		goto out_unlock;
-	}
-	ret = __cgroup_bpf_replace(cg_link->cgroup, cg_link, new_prog);
-out_unlock:
-	mutex_unlock(&cgroup_mutex);
-	return ret;
-}
-
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 		      enum bpf_attach_type type)
 {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
  2020-04-28  5:49 ` [PATCH v2 bpf-next 01/10] bpf: refactor bpf_link update handling Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  2020-04-28 17:31   ` Alexei Starovoitov
  2020-04-28  5:49 ` [PATCH v2 bpf-next 03/10] bpf: support GET_FD_BY_ID and GET_NEXT_ID " Andrii Nakryiko
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Generate ID for each bpf_link using IDR, similarly to bpf_map and bpf_prog.
bpf_link creation, initialization, attachment, and exposing to user-space
through FD and ID is a complicated multi-step process, abstract it away
through bpf_link_primer and bpf_link_prime(), bpf_link_settle(), and
bpf_link_cleanup() internal API. They guarantee that until bpf_link is
properly attached, user-space won't be able to access partially-initialized
bpf_link either from FD or ID. All this allows to simplify bpf_link attachment
and error handling code.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf.h      |  17 +++--
 include/uapi/linux/bpf.h |   1 +
 kernel/bpf/cgroup.c      |  14 ++--
 kernel/bpf/syscall.c     | 143 +++++++++++++++++++++++++++------------
 4 files changed, 118 insertions(+), 57 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 81c8620cb4c4..875d1f0af803 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1085,11 +1085,19 @@ int bpf_prog_new_fd(struct bpf_prog *prog);
 
 struct bpf_link {
 	atomic64_t refcnt;
+	u32 id;
 	const struct bpf_link_ops *ops;
 	struct bpf_prog *prog;
 	struct work_struct work;
 };
 
+struct bpf_link_primer {
+	struct bpf_link *link;
+	struct file *file;
+	int fd;
+	u32 id;
+};
+
 struct bpf_link_ops {
 	void (*release)(struct bpf_link *link);
 	void (*dealloc)(struct bpf_link *link);
@@ -1097,10 +1105,11 @@ struct bpf_link_ops {
 			   struct bpf_prog *old_prog);
 };
 
-void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
-		   struct bpf_prog *prog);
-void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
-		      int link_fd);
+void bpf_link_init(struct bpf_link *link,
+		   const struct bpf_link_ops *ops, struct bpf_prog *prog);
+int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
+int bpf_link_settle(struct bpf_link_primer *primer);
+void bpf_link_cleanup(struct bpf_link_primer *primer);
 void bpf_link_inc(struct bpf_link *link);
 void bpf_link_put(struct bpf_link *link);
 int bpf_link_new_fd(struct bpf_link *link);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4a6c47f3febe..6121aa487465 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -523,6 +523,7 @@ union bpf_attr {
 			__u32		prog_id;
 			__u32		map_id;
 			__u32		btf_id;
+			__u32		link_id;
 		};
 		__u32		next_id;
 		__u32		open_flags;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 0b284da05d0f..beb8307b16c6 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -841,10 +841,10 @@ const struct bpf_link_ops bpf_cgroup_link_lops = {
 
 int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
+	struct bpf_link_primer link_primer;
 	struct bpf_cgroup_link *link;
-	struct file *link_file;
 	struct cgroup *cgrp;
-	int err, link_fd;
+	int err;
 
 	if (attr->link_create.flags)
 		return -EINVAL;
@@ -862,22 +862,20 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	link->cgroup = cgrp;
 	link->type = attr->link_create.attach_type;
 
-	link_file = bpf_link_new_file(&link->link, &link_fd);
-	if (IS_ERR(link_file)) {
+	err  = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
 		kfree(link);
-		err = PTR_ERR(link_file);
 		goto out_put_cgroup;
 	}
 
 	err = cgroup_bpf_attach(cgrp, NULL, NULL, link, link->type,
 				BPF_F_ALLOW_MULTI);
 	if (err) {
-		bpf_link_cleanup(&link->link, link_file, link_fd);
+		bpf_link_cleanup(&link_primer);
 		goto out_put_cgroup;
 	}
 
-	fd_install(link_fd, link_file);
-	return link_fd;
+	return bpf_link_settle(&link_primer);
 
 out_put_cgroup:
 	cgroup_put(cgrp);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f5358e1462eb..659fe3589446 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -42,6 +42,8 @@ static DEFINE_IDR(prog_idr);
 static DEFINE_SPINLOCK(prog_idr_lock);
 static DEFINE_IDR(map_idr);
 static DEFINE_SPINLOCK(map_idr_lock);
+static DEFINE_IDR(link_idr);
+static DEFINE_SPINLOCK(link_idr_lock);
 
 int sysctl_unprivileged_bpf_disabled __read_mostly;
 
@@ -2181,25 +2183,40 @@ static int bpf_obj_get(const union bpf_attr *attr)
 				attr->file_flags);
 }
 
-void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
-		   struct bpf_prog *prog)
+void bpf_link_init(struct bpf_link *link,
+		   const struct bpf_link_ops *ops, struct bpf_prog *prog)
 {
 	atomic64_set(&link->refcnt, 1);
+	link->id = 0;
 	link->ops = ops;
 	link->prog = prog;
 }
 
+static void bpf_link_free_id(int id)
+{
+	unsigned long flags;
+
+	if (!id)
+		return;
+
+	spin_lock_irqsave(&link_idr_lock, flags);
+	idr_remove(&link_idr, id);
+	spin_unlock_irqrestore(&link_idr_lock, flags);
+}
+
 /* Clean up bpf_link and corresponding anon_inode file and FD. After
  * anon_inode is created, bpf_link can't be just kfree()'d due to deferred
- * anon_inode's release() call. This helper manages marking bpf_link as
- * defunct, releases anon_inode file and puts reserved FD.
+ * anon_inode's release() call. This helper marksbpf_link as
+ * defunct, releases anon_inode file and puts reserved FD. bpf_prog's refcnt
+ * is not decremented, it's the responsibility of a calling code that failed
+ * to complete bpf_link initialization.
  */
-void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
-		      int link_fd)
+void bpf_link_cleanup(struct bpf_link_primer *primer)
 {
-	link->prog = NULL;
-	fput(link_file);
-	put_unused_fd(link_fd);
+	primer->link->prog = NULL;
+	bpf_link_free_id(primer->id);
+	fput(primer->file);
+	put_unused_fd(primer->fd);
 }
 
 void bpf_link_inc(struct bpf_link *link)
@@ -2210,6 +2227,7 @@ void bpf_link_inc(struct bpf_link *link)
 /* bpf_link_free is guaranteed to be called from process context */
 static void bpf_link_free(struct bpf_link *link)
 {
+	bpf_link_free_id(link->id);
 	if (link->prog) {
 		/* detach BPF program, clean up used resources */
 		link->ops->release(link);
@@ -2275,9 +2293,11 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
 	seq_printf(m,
 		   "link_type:\t%s\n"
+		   "link_id:\t%u\n"
 		   "prog_tag:\t%s\n"
 		   "prog_id:\t%u\n",
 		   link_type,
+		   link->id,
 		   prog_tag,
 		   prog->aux->id);
 }
@@ -2292,36 +2312,74 @@ static const struct file_operations bpf_link_fops = {
 	.write		= bpf_dummy_write,
 };
 
-int bpf_link_new_fd(struct bpf_link *link)
+static int bpf_link_alloc_id(struct bpf_link *link)
 {
-	return anon_inode_getfd("bpf-link", &bpf_link_fops, link, O_CLOEXEC);
-}
+	int id;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock_bh(&link_idr_lock);
+	id = idr_alloc_cyclic(&link_idr, link, 1, INT_MAX, GFP_ATOMIC);
+	spin_unlock_bh(&link_idr_lock);
+	idr_preload_end();
 
-/* Similar to bpf_link_new_fd, create anon_inode for given bpf_link, but
- * instead of immediately installing fd in fdtable, just reserve it and
- * return. Caller then need to either install it with fd_install(fd, file) or
- * release with put_unused_fd(fd).
- * This is useful for cases when bpf_link attachment/detachment are
- * complicated and expensive operations and should be delayed until all the fd
- * reservation and anon_inode creation succeeds.
+	return id;
+}
+
+/* Prepare bpf_link to be exposed to user-space by allocating anon_inode file,
+ * reserving unused FD and allocating ID from link_idr. This is to be paired
+ * with bpf_link_settle() to install FD and ID and expose bpf_link to
+ * user-space, if bpf_link is successfully attached. If not, bpf_link and
+ * pre-allocated resources are to be freed with bpf_cleanup() call. All the
+ * transient state is passed around in struct bpf_link_primer.
+ * This is preferred way to create and initialize bpf_link, especially when
+ * there are complicated and expensive operations inbetween creating bpf_link
+ * itself and attaching it to BPF hook. By using bpf_link_prime() and
+ * bpf_link_settle() kernel code using bpf_link doesn't have to perform
+ * expensive (and potentially failing) roll back operations in a rare case
+ * that file, FD, or ID can't be allocated.
  */
-struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd)
+int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer)
 {
 	struct file *file;
-	int fd;
+	int fd, id;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0)
-		return ERR_PTR(fd);
+		return fd;
 
 	file = anon_inode_getfile("bpf_link", &bpf_link_fops, link, O_CLOEXEC);
 	if (IS_ERR(file)) {
 		put_unused_fd(fd);
-		return file;
+		return PTR_ERR(file);
 	}
 
-	*reserved_fd = fd;
-	return file;
+	id = bpf_link_alloc_id(link);
+	if (id < 0) {
+		put_unused_fd(fd);
+		fput(file);
+		return id;
+	}
+
+	primer->link = link;
+	primer->file = file;
+	primer->fd = fd;
+	primer->id = id;
+	return 0;
+}
+
+int bpf_link_settle(struct bpf_link_primer *primer)
+{
+	/* make bpf_link fetchable by ID */
+	WRITE_ONCE(primer->link->id, primer->id);
+	/* make bpf_link fetchable by FD */
+	fd_install(primer->fd, primer->file);
+	/* pass through installed FD */
+	return primer->fd;
+}
+
+int bpf_link_new_fd(struct bpf_link *link)
+{
+	return anon_inode_getfd("bpf-link", &bpf_link_fops, link, O_CLOEXEC);
 }
 
 struct bpf_link *bpf_link_get_from_fd(u32 ufd)
@@ -2367,9 +2425,9 @@ static const struct bpf_link_ops bpf_tracing_link_lops = {
 
 static int bpf_tracing_prog_attach(struct bpf_prog *prog)
 {
+	struct bpf_link_primer link_primer;
 	struct bpf_tracing_link *link;
-	struct file *link_file;
-	int link_fd, err;
+	int err;
 
 	switch (prog->type) {
 	case BPF_PROG_TYPE_TRACING:
@@ -2404,22 +2462,19 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
 	}
 	bpf_link_init(&link->link, &bpf_tracing_link_lops, prog);
 
-	link_file = bpf_link_new_file(&link->link, &link_fd);
-	if (IS_ERR(link_file)) {
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
 		kfree(link);
-		err = PTR_ERR(link_file);
 		goto out_put_prog;
 	}
 
 	err = bpf_trampoline_link_prog(prog);
 	if (err) {
-		bpf_link_cleanup(&link->link, link_file, link_fd);
+		bpf_link_cleanup(&link_primer);
 		goto out_put_prog;
 	}
 
-	fd_install(link_fd, link_file);
-	return link_fd;
-
+	return bpf_link_settle(&link_primer);
 out_put_prog:
 	bpf_prog_put(prog);
 	return err;
@@ -2447,7 +2502,7 @@ static void bpf_raw_tp_link_dealloc(struct bpf_link *link)
 	kfree(raw_tp);
 }
 
-static const struct bpf_link_ops bpf_raw_tp_lops = {
+static const struct bpf_link_ops bpf_raw_tp_link_lops = {
 	.release = bpf_raw_tp_link_release,
 	.dealloc = bpf_raw_tp_link_dealloc,
 };
@@ -2456,13 +2511,13 @@ static const struct bpf_link_ops bpf_raw_tp_lops = {
 
 static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 {
+	struct bpf_link_primer link_primer;
 	struct bpf_raw_tp_link *link;
 	struct bpf_raw_event_map *btp;
-	struct file *link_file;
 	struct bpf_prog *prog;
 	const char *tp_name;
 	char buf[128];
-	int link_fd, err;
+	int err;
 
 	if (CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN))
 		return -EINVAL;
@@ -2515,24 +2570,22 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 		err = -ENOMEM;
 		goto out_put_btp;
 	}
-	bpf_link_init(&link->link, &bpf_raw_tp_lops, prog);
+	bpf_link_init(&link->link, &bpf_raw_tp_link_lops, prog);
 	link->btp = btp;
 
-	link_file = bpf_link_new_file(&link->link, &link_fd);
-	if (IS_ERR(link_file)) {
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
 		kfree(link);
-		err = PTR_ERR(link_file);
 		goto out_put_btp;
 	}
 
 	err = bpf_probe_register(link->btp, prog);
 	if (err) {
-		bpf_link_cleanup(&link->link, link_file, link_fd);
+		bpf_link_cleanup(&link_primer);
 		goto out_put_btp;
 	}
 
-	fd_install(link_fd, link_file);
-	return link_fd;
+	return bpf_link_settle(&link_primer);
 
 out_put_btp:
 	bpf_put_raw_tracepoint(btp);
@@ -3464,7 +3517,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
 	if (file->f_op == &bpf_link_fops) {
 		struct bpf_link *link = file->private_data;
 
-		if (link->ops == &bpf_raw_tp_lops) {
+		if (link->ops == &bpf_raw_tp_link_lops) {
 			struct bpf_raw_tp_link *raw_tp =
 				container_of(link, struct bpf_raw_tp_link, link);
 			struct bpf_raw_event_map *btp = raw_tp->btp;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 03/10] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
  2020-04-28  5:49 ` [PATCH v2 bpf-next 01/10] bpf: refactor bpf_link update handling Andrii Nakryiko
  2020-04-28  5:49 ` [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  2020-04-28  5:49 ` [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD " Andrii Nakryiko
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add support to look up bpf_link by ID and iterate over all existing bpf_links
in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
done as the very last step in finalizing bpf_link, together with installing
FD. This approach allows users of bpf_link in kernel code to not worry about
races between user-space and kernel code that hasn't finished attaching and
initializing bpf_link.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/uapi/linux/bpf.h |  2 ++
 kernel/bpf/syscall.c     | 49 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6121aa487465..7e6541fceade 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -113,6 +113,8 @@ enum bpf_cmd {
 	BPF_MAP_DELETE_BATCH,
 	BPF_LINK_CREATE,
 	BPF_LINK_UPDATE,
+	BPF_LINK_GET_FD_BY_ID,
+	BPF_LINK_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 659fe3589446..f296c15b8b3e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3713,6 +3713,48 @@ static int link_update(union bpf_attr *attr)
 	return ret;
 }
 
+static int bpf_link_inc_not_zero(struct bpf_link *link)
+{
+	return atomic64_fetch_add_unless(&link->refcnt, 1, 0) ? 0 : -ENOENT;
+}
+
+#define BPF_LINK_GET_FD_BY_ID_LAST_FIELD link_id
+
+static int bpf_link_get_fd_by_id(const union bpf_attr *attr)
+{
+	struct bpf_link *link;
+	u32 id = attr->link_id;
+	int fd, err;
+
+	if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID))
+		return -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	spin_lock_bh(&link_idr_lock);
+	link = idr_find(&link_idr, id);
+	/* before link is "settled", ID is 0, pretend it doesn't exist yet */
+	if (link) {
+		if (link->id)
+			err = bpf_link_inc_not_zero(link);
+		else
+			err = -EAGAIN;
+	} else {
+		err = -ENOENT;
+	}
+	spin_unlock_bh(&link_idr_lock);
+
+	if (err)
+		return err;
+
+	fd = bpf_link_new_fd(link);
+	if (fd < 0)
+		bpf_link_put(link);
+
+	return fd;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr;
@@ -3830,6 +3872,13 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_LINK_UPDATE:
 		err = link_update(&attr);
 		break;
+	case BPF_LINK_GET_FD_BY_ID:
+		err = bpf_link_get_fd_by_id(&attr);
+		break;
+	case BPF_LINK_GET_NEXT_ID:
+		err = bpf_obj_get_next_id(&attr, uattr,
+					  &link_idr, &link_idr_lock);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2020-04-28  5:49 ` [PATCH v2 bpf-next 03/10] bpf: support GET_FD_BY_ID and GET_NEXT_ID " Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  2020-04-28  9:46   ` Toke Høiland-Jørgensen
  2020-04-28 21:55   ` kbuild test robot
  2020-04-28  5:49 ` [PATCH v2 bpf-next 05/10] libbpf: add low-level APIs for new bpf_link commands Andrii Nakryiko
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
Also enhance show_fdinfo to potentially include bpf_link type-specific
information (similarly to obj_info).

Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
UAPI. bpf_link_tracing also now will store and return bpf_attach_type.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf-cgroup.h     |   2 -
 include/linux/bpf.h            |  10 +-
 include/linux/bpf_types.h      |   6 ++
 include/uapi/linux/bpf.h       |  28 ++++++
 kernel/bpf/btf.c               |   2 +
 kernel/bpf/cgroup.c            |  45 ++++++++-
 kernel/bpf/syscall.c           | 164 +++++++++++++++++++++++++++++----
 kernel/bpf/verifier.c          |   2 +
 tools/include/uapi/linux/bpf.h |  31 +++++++
 9 files changed, 266 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index d2d969669564..ab95824a1d99 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -57,8 +57,6 @@ struct bpf_cgroup_link {
 	enum bpf_attach_type type;
 };
 
-extern const struct bpf_link_ops bpf_cgroup_link_lops;
-
 struct bpf_prog_list {
 	struct list_head node;
 	struct bpf_prog *prog;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 875d1f0af803..701c4387c711 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1026,9 +1026,11 @@ extern const struct file_operations bpf_prog_fops;
 	extern const struct bpf_verifier_ops _name ## _verifier_ops;
 #define BPF_MAP_TYPE(_id, _ops) \
 	extern const struct bpf_map_ops _ops;
+#define BPF_LINK_TYPE(_id, _name)
 #include <linux/bpf_types.h>
 #undef BPF_PROG_TYPE
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 
 extern const struct bpf_prog_ops bpf_offload_prog_ops;
 extern const struct bpf_verifier_ops tc_cls_act_analyzer_ops;
@@ -1086,6 +1088,7 @@ int bpf_prog_new_fd(struct bpf_prog *prog);
 struct bpf_link {
 	atomic64_t refcnt;
 	u32 id;
+	enum bpf_link_type type;
 	const struct bpf_link_ops *ops;
 	struct bpf_prog *prog;
 	struct work_struct work;
@@ -1103,9 +1106,14 @@ struct bpf_link_ops {
 	void (*dealloc)(struct bpf_link *link);
 	int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
 			   struct bpf_prog *old_prog);
+	void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq);
+	int (*fill_link_info)(const struct bpf_link *link,
+			      struct bpf_link_info *info,
+			      const struct bpf_link_info *uinfo,
+			      u32 info_len);
 };
 
-void bpf_link_init(struct bpf_link *link,
+void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
 		   const struct bpf_link_ops *ops, struct bpf_prog *prog);
 int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
 int bpf_link_settle(struct bpf_link_primer *primer);
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index ba0c2d56f8a3..8345cdf553b8 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -118,3 +118,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
 #if defined(CONFIG_BPF_JIT)
 BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
 #endif
+
+BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
+BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
+#ifdef CONFIG_CGROUP_BPF
+BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
+#endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7e6541fceade..0eccafae55bb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -222,6 +222,15 @@ enum bpf_attach_type {
 
 #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
 
+enum bpf_link_type {
+	BPF_LINK_TYPE_UNSPEC = 0,
+	BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
+	BPF_LINK_TYPE_TRACING = 2,
+	BPF_LINK_TYPE_CGROUP = 3,
+
+	MAX_BPF_LINK_TYPE,
+};
+
 /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
  *
  * NONE(default): No further bpf programs allowed in the subtree.
@@ -3612,6 +3621,25 @@ struct bpf_btf_info {
 	__u32 id;
 } __attribute__((aligned(8)));
 
+struct bpf_link_info {
+	__u32 type;
+	__u32 id;
+	__u32 prog_id;
+	union {
+		struct {
+			__aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
+			__u32 tp_name_len;     /* in/out: tp_name buffer len */
+		} raw_tracepoint;
+		struct {
+			__u32 attach_type;
+		} tracing;
+		struct {
+			__u64 cgroup_id;
+			__u32 attach_type;
+		} cgroup;
+	};
+} __attribute__((aligned(8)));
+
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
  * by user and intended to be used by socket (e.g. to bind to, depends on
  * attach attach type).
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index d65c6912bdaf..a2cfba89a8e1 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3482,6 +3482,7 @@ extern char __weak __stop_BTF[];
 extern struct btf *btf_vmlinux;
 
 #define BPF_MAP_TYPE(_id, _ops)
+#define BPF_LINK_TYPE(_id, _name)
 static union {
 	struct bpf_ctx_convert {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
@@ -3508,6 +3509,7 @@ static u8 bpf_ctx_convert_map[] = {
 	0, /* avoid empty array */
 };
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 
 static const struct btf_member *
 btf_get_prog_ctx_type(struct bpf_verifier_log *log, struct btf *btf,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index beb8307b16c6..316a3c436ccd 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -833,10 +833,50 @@ static void bpf_cgroup_link_dealloc(struct bpf_link *link)
 	kfree(cg_link);
 }
 
-const struct bpf_link_ops bpf_cgroup_link_lops = {
+static void bpf_cgroup_link_show_fdinfo(const struct bpf_link *link,
+					struct seq_file *seq)
+{
+	struct bpf_cgroup_link *cg_link =
+		container_of(link, struct bpf_cgroup_link, link);
+	u64 cg_id = 0;
+
+	mutex_lock(&cgroup_mutex);
+	if (cg_link->cgroup)
+		cg_id = cgroup_id(cg_link->cgroup);
+	mutex_unlock(&cgroup_mutex);
+
+	seq_printf(seq,
+		   "cgroup_id:\t%llu\n"
+		   "attach_type:\t%d\n",
+		   cg_id,
+		   cg_link->type);
+}
+
+static int bpf_cgroup_link_fill_link_info(const struct bpf_link *link,
+					  struct bpf_link_info *info,
+					  const struct bpf_link_info *uinfo,
+					  u32 info_len)
+{
+	struct bpf_cgroup_link *cg_link =
+		container_of(link, struct bpf_cgroup_link, link);
+	u64 cg_id = 0;
+
+	mutex_lock(&cgroup_mutex);
+	if (cg_link->cgroup)
+		cg_id = cgroup_id(cg_link->cgroup);
+	mutex_unlock(&cgroup_mutex);
+
+	info->cgroup.cgroup_id = cg_id;
+	info->cgroup.attach_type = cg_link->type;
+	return 0;
+}
+
+static const struct bpf_link_ops bpf_cgroup_link_lops = {
 	.release = bpf_cgroup_link_release,
 	.dealloc = bpf_cgroup_link_dealloc,
 	.update_prog = cgroup_bpf_replace,
+	.show_fdinfo = bpf_cgroup_link_show_fdinfo,
+	.fill_link_info = bpf_cgroup_link_fill_link_info,
 };
 
 int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
@@ -858,7 +898,8 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 		err = -ENOMEM;
 		goto out_put_cgroup;
 	}
-	bpf_link_init(&link->link, &bpf_cgroup_link_lops, prog);
+	bpf_link_init(&link->link, BPF_LINK_TYPE_CGROUP, &bpf_cgroup_link_lops,
+		      prog);
 	link->cgroup = cgrp;
 	link->type = attr->link_create.attach_type;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f296c15b8b3e..0c113a05aadb 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -51,9 +51,11 @@ static const struct bpf_map_ops * const bpf_map_types[] = {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
 #define BPF_MAP_TYPE(_id, _ops) \
 	[_id] = &_ops,
+#define BPF_LINK_TYPE(_id, _name)
 #include <linux/bpf_types.h>
 #undef BPF_PROG_TYPE
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 };
 
 /*
@@ -1548,9 +1550,11 @@ static const struct bpf_prog_ops * const bpf_prog_types[] = {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
 	[_id] = & _name ## _prog_ops,
 #define BPF_MAP_TYPE(_id, _ops)
+#define BPF_LINK_TYPE(_id, _name)
 #include <linux/bpf_types.h>
 #undef BPF_PROG_TYPE
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 };
 
 static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
@@ -2183,10 +2187,11 @@ static int bpf_obj_get(const union bpf_attr *attr)
 				attr->file_flags);
 }
 
-void bpf_link_init(struct bpf_link *link,
+void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
 		   const struct bpf_link_ops *ops, struct bpf_prog *prog)
 {
 	atomic64_set(&link->refcnt, 1);
+	link->type = type;
 	link->id = 0;
 	link->ops = ops;
 	link->prog = prog;
@@ -2268,27 +2273,23 @@ static int bpf_link_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-#ifdef CONFIG_PROC_FS
-static const struct bpf_link_ops bpf_raw_tp_lops;
-static const struct bpf_link_ops bpf_tracing_link_lops;
+#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
+#define BPF_MAP_TYPE(_id, _ops)
+#define BPF_LINK_TYPE(_id, _name) [_id] = #_name,
+static const char *bpf_link_type_strs[] = {
+	[BPF_LINK_TYPE_UNSPEC] = "<invalid>",
+#include <linux/bpf_types.h>
+};
+#undef BPF_PROG_TYPE
+#undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 
+#ifdef CONFIG_PROC_FS
 static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 {
 	const struct bpf_link *link = filp->private_data;
 	const struct bpf_prog *prog = link->prog;
 	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
-	const char *link_type;
-
-	if (link->ops == &bpf_raw_tp_lops)
-		link_type = "raw_tracepoint";
-	else if (link->ops == &bpf_tracing_link_lops)
-		link_type = "tracing";
-#ifdef CONFIG_CGROUP_BPF
-	else if (link->ops == &bpf_cgroup_link_lops)
-		link_type = "cgroup";
-#endif
-	else
-		link_type = "unknown";
 
 	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
 	seq_printf(m,
@@ -2296,10 +2297,12 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 		   "link_id:\t%u\n"
 		   "prog_tag:\t%s\n"
 		   "prog_id:\t%u\n",
-		   link_type,
+		   bpf_link_type_strs[link->type],
 		   link->id,
 		   prog_tag,
 		   prog->aux->id);
+	if (link->ops->show_fdinfo)
+		link->ops->show_fdinfo(link, m);
 }
 #endif
 
@@ -2403,6 +2406,7 @@ struct bpf_link *bpf_link_get_from_fd(u32 ufd)
 
 struct bpf_tracing_link {
 	struct bpf_link link;
+	enum bpf_attach_type attach_type;
 };
 
 static void bpf_tracing_link_release(struct bpf_link *link)
@@ -2418,9 +2422,35 @@ static void bpf_tracing_link_dealloc(struct bpf_link *link)
 	kfree(tr_link);
 }
 
+static void bpf_tracing_link_show_fdinfo(const struct bpf_link *link,
+					 struct seq_file *seq)
+{
+	struct bpf_tracing_link *tr_link =
+		container_of(link, struct bpf_tracing_link, link);
+
+	seq_printf(seq,
+		   "attach_type:\t%d\n",
+		   tr_link->attach_type);
+}
+
+static int bpf_tracing_link_fill_link_info(const struct bpf_link *link,
+					   struct bpf_link_info *info,
+					   const struct bpf_link_info *uinfo,
+					   u32 info_len)
+{
+	struct bpf_tracing_link *tr_link =
+		container_of(link, struct bpf_tracing_link, link);
+
+	info->tracing.attach_type = tr_link->attach_type;
+
+	return 0;
+}
+
 static const struct bpf_link_ops bpf_tracing_link_lops = {
 	.release = bpf_tracing_link_release,
 	.dealloc = bpf_tracing_link_dealloc,
+	.show_fdinfo = bpf_tracing_link_show_fdinfo,
+	.fill_link_info = bpf_tracing_link_fill_link_info,
 };
 
 static int bpf_tracing_prog_attach(struct bpf_prog *prog)
@@ -2460,7 +2490,9 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
 		err = -ENOMEM;
 		goto out_put_prog;
 	}
-	bpf_link_init(&link->link, &bpf_tracing_link_lops, prog);
+	bpf_link_init(&link->link, BPF_LINK_TYPE_TRACING,
+		      &bpf_tracing_link_lops, prog);
+	link->attach_type = prog->expected_attach_type;
 
 	err = bpf_link_prime(&link->link, &link_primer);
 	if (err) {
@@ -2502,9 +2534,66 @@ static void bpf_raw_tp_link_dealloc(struct bpf_link *link)
 	kfree(raw_tp);
 }
 
+static void bpf_raw_tp_link_show_fdinfo(const struct bpf_link *link,
+					struct seq_file *seq)
+{
+	struct bpf_raw_tp_link *raw_tp_link =
+		container_of(link, struct bpf_raw_tp_link, link);
+
+	seq_printf(seq,
+		   "tp_name:\t%s\n",
+		   raw_tp_link->btp->tp->name);
+}
+
+static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
+					  struct bpf_link_info *info,
+					  const struct bpf_link_info *uinfo,
+					  u32 info_len)
+{
+	struct bpf_raw_tp_link *raw_tp_link =
+		container_of(link, struct bpf_raw_tp_link, link);
+	u64 ubuf_ptr;
+	char __user *ubuf = u64_to_user_ptr(uinfo->raw_tracepoint.tp_name);
+	const char *tp_name = raw_tp_link->btp->tp->name;
+	size_t tp_len;
+	u32 ulen;
+
+	if (get_user(ulen, &uinfo->raw_tracepoint.tp_name_len))
+		return -EFAULT;
+	if (get_user(ubuf_ptr, &uinfo->raw_tracepoint.tp_name))
+		return -EFAULT;
+	ubuf = u64_to_user_ptr(ubuf_ptr);
+
+	if (ulen && !ubuf)
+		return -EINVAL;
+	if (!ubuf)
+		return 0;
+
+	tp_len = strlen(raw_tp_link->btp->tp->name);
+	info->raw_tracepoint.tp_name_len = tp_len + 1;
+	info->raw_tracepoint.tp_name = (u64)(unsigned long)ubuf;
+
+	if (ulen >= tp_len + 1) {
+		if (copy_to_user(ubuf, tp_name, tp_len + 1))
+			return -EFAULT;
+	} else {
+		char zero = '\0';
+
+		if (copy_to_user(ubuf, tp_name, ulen - 1))
+			return -EFAULT;
+		if (put_user(zero, ubuf + ulen - 1))
+			return -EFAULT;
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
 static const struct bpf_link_ops bpf_raw_tp_link_lops = {
 	.release = bpf_raw_tp_link_release,
 	.dealloc = bpf_raw_tp_link_dealloc,
+	.show_fdinfo = bpf_raw_tp_link_show_fdinfo,
+	.fill_link_info = bpf_raw_tp_link_fill_link_info,
 };
 
 #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
@@ -2570,7 +2659,8 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 		err = -ENOMEM;
 		goto out_put_btp;
 	}
-	bpf_link_init(&link->link, &bpf_raw_tp_link_lops, prog);
+	bpf_link_init(&link->link, BPF_LINK_TYPE_RAW_TRACEPOINT,
+		      &bpf_raw_tp_link_lops, prog);
 	link->btp = btp;
 
 	err = bpf_link_prime(&link->link, &link_primer);
@@ -3366,6 +3456,39 @@ static int bpf_btf_get_info_by_fd(struct btf *btf,
 	return btf_get_info_by_fd(btf, attr, uattr);
 }
 
+static int bpf_link_get_info_by_fd(struct bpf_link *link,
+				  const union bpf_attr *attr,
+				  union bpf_attr __user *uattr)
+{
+	struct bpf_link_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+	struct bpf_link_info info;
+	u32 info_len = attr->info.info_len;
+	int err;
+
+	err = bpf_check_uarg_tail_zero(uinfo, sizeof(info), info_len);
+	if (err)
+		return err;
+	info_len = min_t(u32, sizeof(info), info_len);
+
+	memset(&info, 0, sizeof(info));
+	info.type = link->type;
+	info.id = link->id;
+	info.prog_id = link->prog->aux->id;
+
+	if (link->ops->fill_link_info) {
+		err = link->ops->fill_link_info(link, &info, uinfo, info_len);
+		if (err)
+			return err;
+	}
+
+	if (copy_to_user(uinfo, &info, info_len) ||
+	    put_user(info_len, &uattr->info.info_len))
+		return -EFAULT;
+
+	return 0;
+}
+
+
 #define BPF_OBJ_GET_INFO_BY_FD_LAST_FIELD info.info
 
 static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
@@ -3390,6 +3513,9 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 					     uattr);
 	else if (f.file->f_op == &btf_fops)
 		err = bpf_btf_get_info_by_fd(f.file->private_data, attr, uattr);
+	else if (f.file->f_op == &bpf_link_fops)
+		err = bpf_link_get_info_by_fd(f.file->private_data,
+					      attr, uattr);
 	else
 		err = -EINVAL;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 91728e0f27eb..2b337e32aa94 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -28,9 +28,11 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
 	[_id] = & _name ## _verifier_ops,
 #define BPF_MAP_TYPE(_id, _ops)
+#define BPF_LINK_TYPE(_id, _name)
 #include <linux/bpf_types.h>
 #undef BPF_PROG_TYPE
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 };
 
 /* bpf_check() is a static code analyzer that walks eBPF program
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4a6c47f3febe..0eccafae55bb 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -113,6 +113,8 @@ enum bpf_cmd {
 	BPF_MAP_DELETE_BATCH,
 	BPF_LINK_CREATE,
 	BPF_LINK_UPDATE,
+	BPF_LINK_GET_FD_BY_ID,
+	BPF_LINK_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
@@ -220,6 +222,15 @@ enum bpf_attach_type {
 
 #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
 
+enum bpf_link_type {
+	BPF_LINK_TYPE_UNSPEC = 0,
+	BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
+	BPF_LINK_TYPE_TRACING = 2,
+	BPF_LINK_TYPE_CGROUP = 3,
+
+	MAX_BPF_LINK_TYPE,
+};
+
 /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
  *
  * NONE(default): No further bpf programs allowed in the subtree.
@@ -523,6 +534,7 @@ union bpf_attr {
 			__u32		prog_id;
 			__u32		map_id;
 			__u32		btf_id;
+			__u32		link_id;
 		};
 		__u32		next_id;
 		__u32		open_flags;
@@ -3609,6 +3621,25 @@ struct bpf_btf_info {
 	__u32 id;
 } __attribute__((aligned(8)));
 
+struct bpf_link_info {
+	__u32 type;
+	__u32 id;
+	__u32 prog_id;
+	union {
+		struct {
+			__aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
+			__u32 tp_name_len;     /* in/out: tp_name buffer len */
+		} raw_tracepoint;
+		struct {
+			__u32 attach_type;
+		} tracing;
+		struct {
+			__u64 cgroup_id;
+			__u32 attach_type;
+		} cgroup;
+	};
+} __attribute__((aligned(8)));
+
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
  * by user and intended to be used by socket (e.g. to bind to, depends on
  * attach attach type).
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 05/10] libbpf: add low-level APIs for new bpf_link commands
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2020-04-28  5:49 ` [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD " Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  2020-04-28  5:49 ` [PATCH v2 bpf-next 06/10] selftests/bpf: test bpf_link's get_next_id, get_fd_by_id, and get_obj_info Andrii Nakryiko
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add low-level API calls for bpf_link_get_next_id() and
bpf_link_get_fd_by_id().

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/bpf.c      | 19 +++++++++++++++++--
 tools/lib/bpf/bpf.h      |  4 +++-
 tools/lib/bpf/libbpf.map |  6 ++++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5cc1b0785d18..8f2f0958d446 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -721,6 +721,11 @@ int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id)
 	return bpf_obj_get_next_id(start_id, next_id, BPF_BTF_GET_NEXT_ID);
 }
 
+int bpf_link_get_next_id(__u32 start_id, __u32 *next_id)
+{
+	return bpf_obj_get_next_id(start_id, next_id, BPF_LINK_GET_NEXT_ID);
+}
+
 int bpf_prog_get_fd_by_id(__u32 id)
 {
 	union bpf_attr attr;
@@ -751,13 +756,23 @@ int bpf_btf_get_fd_by_id(__u32 id)
 	return sys_bpf(BPF_BTF_GET_FD_BY_ID, &attr, sizeof(attr));
 }
 
-int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
+int bpf_link_get_fd_by_id(__u32 id)
+{
+	union bpf_attr attr;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.link_id = id;
+
+	return sys_bpf(BPF_LINK_GET_FD_BY_ID, &attr, sizeof(attr));
+}
+
+int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len)
 {
 	union bpf_attr attr;
 	int err;
 
 	memset(&attr, 0, sizeof(attr));
-	attr.info.bpf_fd = prog_fd;
+	attr.info.bpf_fd = bpf_fd;
 	attr.info.info_len = *info_len;
 	attr.info.info = ptr_to_u64(info);
 
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 46d47afdd887..335b457b3a25 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -216,10 +216,12 @@ LIBBPF_API int bpf_prog_test_run(int prog_fd, int repeat, void *data,
 LIBBPF_API int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id);
+LIBBPF_API int bpf_link_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_prog_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_map_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
-LIBBPF_API int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
+LIBBPF_API int bpf_link_get_fd_by_id(__u32 id);
+LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len);
 LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
 			      __u32 query_flags, __u32 *attach_flags,
 			      __u32 *prog_ids, __u32 *prog_cnt);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index bb8831605b25..7cd49aa38005 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -254,3 +254,9 @@ LIBBPF_0.0.8 {
 		bpf_program__set_lsm;
 		bpf_set_link_xdp_fd_opts;
 } LIBBPF_0.0.7;
+
+LIBBPF_0.0.9 {
+	global:
+		bpf_link_get_fd_by_id;
+		bpf_link_get_next_id;
+} LIBBPF_0.0.8;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 06/10] selftests/bpf: test bpf_link's get_next_id, get_fd_by_id, and get_obj_info
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2020-04-28  5:49 ` [PATCH v2 bpf-next 05/10] libbpf: add low-level APIs for new bpf_link commands Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  2020-04-28  5:49 ` [PATCH v2 bpf-next 07/10] bpftool: expose attach_type-to-string array to non-cgroup code Andrii Nakryiko
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Extend bpf_obj_id selftest to verify bpf_link's observability APIs.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 .../selftests/bpf/prog_tests/bpf_obj_id.c     | 110 ++++++++++++++++--
 .../testing/selftests/bpf/progs/test_obj_id.c |  14 +--
 2 files changed, 104 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
index f10029821e16..7afa4160416f 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_obj_id.c
@@ -1,26 +1,30 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 
+#define nr_iters 2
+
 void test_bpf_obj_id(void)
 {
 	const __u64 array_magic_value = 0xfaceb00c;
 	const __u32 array_key = 0;
-	const int nr_iters = 2;
 	const char *file = "./test_obj_id.o";
 	const char *expected_prog_name = "test_obj_id";
 	const char *expected_map_name = "test_map_id";
 	const __u64 nsec_per_sec = 1000000000;
 
-	struct bpf_object *objs[nr_iters];
+	struct bpf_object *objs[nr_iters] = {};
+	struct bpf_link *links[nr_iters] = {};
+	struct bpf_program *prog;
 	int prog_fds[nr_iters], map_fds[nr_iters];
 	/* +1 to test for the info_len returned by kernel */
 	struct bpf_prog_info prog_infos[nr_iters + 1];
 	struct bpf_map_info map_infos[nr_iters + 1];
+	struct bpf_link_info link_infos[nr_iters + 1];
 	/* Each prog only uses one map. +1 to test nr_map_ids
 	 * returned by kernel.
 	 */
 	__u32 map_ids[nr_iters + 1];
-	char jited_insns[128], xlated_insns[128], zeros[128];
+	char jited_insns[128], xlated_insns[128], zeros[128], tp_name[128];
 	__u32 i, next_id, info_len, nr_id_found, duration = 0;
 	struct timespec real_time_ts, boot_time_ts;
 	int err = 0;
@@ -36,14 +40,15 @@ void test_bpf_obj_id(void)
 	CHECK(err >= 0 || errno != ENOENT,
 	      "get-fd-by-notexist-map-id", "err %d errno %d\n", err, errno);
 
-	for (i = 0; i < nr_iters; i++)
-		objs[i] = NULL;
+	err = bpf_link_get_fd_by_id(0);
+	CHECK(err >= 0 || errno != ENOENT,
+	      "get-fd-by-notexist-link-id", "err %d errno %d\n", err, errno);
 
 	/* Check bpf_obj_get_info_by_fd() */
 	bzero(zeros, sizeof(zeros));
 	for (i = 0; i < nr_iters; i++) {
 		now = time(NULL);
-		err = bpf_prog_load(file, BPF_PROG_TYPE_SOCKET_FILTER,
+		err = bpf_prog_load(file, BPF_PROG_TYPE_RAW_TRACEPOINT,
 				    &objs[i], &prog_fds[i]);
 		/* test_obj_id.o is a dumb prog. It should never fail
 		 * to load.
@@ -60,6 +65,17 @@ void test_bpf_obj_id(void)
 		if (CHECK_FAIL(err))
 			goto done;
 
+		prog = bpf_object__find_program_by_title(objs[i],
+							 "raw_tp/sys_enter");
+		if (CHECK_FAIL(!prog))
+			goto done;
+		links[i] = bpf_program__attach(prog);
+		err = libbpf_get_error(links[i]);
+		if (CHECK(err, "prog_attach", "prog #%d, err %d\n", i, err)) {
+			links[i] = NULL;
+			goto done;
+		}
+
 		/* Check getting map info */
 		info_len = sizeof(struct bpf_map_info) * 2;
 		bzero(&map_infos[i], info_len);
@@ -107,7 +123,7 @@ void test_bpf_obj_id(void)
 		load_time = (real_time_ts.tv_sec - boot_time_ts.tv_sec)
 			+ (prog_infos[i].load_time / nsec_per_sec);
 		if (CHECK(err ||
-			  prog_infos[i].type != BPF_PROG_TYPE_SOCKET_FILTER ||
+			  prog_infos[i].type != BPF_PROG_TYPE_RAW_TRACEPOINT ||
 			  info_len != sizeof(struct bpf_prog_info) ||
 			  (env.jit_enabled && !prog_infos[i].jited_prog_len) ||
 			  (env.jit_enabled &&
@@ -120,7 +136,11 @@ void test_bpf_obj_id(void)
 			  *(int *)(long)prog_infos[i].map_ids != map_infos[i].id ||
 			  strcmp((char *)prog_infos[i].name, expected_prog_name),
 			  "get-prog-info(fd)",
-			  "err %d errno %d i %d type %d(%d) info_len %u(%zu) jit_enabled %d jited_prog_len %u xlated_prog_len %u jited_prog %d xlated_prog %d load_time %lu(%lu) uid %u(%u) nr_map_ids %u(%u) map_id %u(%u) name %s(%s)\n",
+			  "err %d errno %d i %d type %d(%d) info_len %u(%zu) "
+			  "jit_enabled %d jited_prog_len %u xlated_prog_len %u "
+			  "jited_prog %d xlated_prog %d load_time %lu(%lu) "
+			  "uid %u(%u) nr_map_ids %u(%u) map_id %u(%u) "
+			  "name %s(%s)\n",
 			  err, errno, i,
 			  prog_infos[i].type, BPF_PROG_TYPE_SOCKET_FILTER,
 			  info_len, sizeof(struct bpf_prog_info),
@@ -135,6 +155,33 @@ void test_bpf_obj_id(void)
 			  *(int *)(long)prog_infos[i].map_ids, map_infos[i].id,
 			  prog_infos[i].name, expected_prog_name))
 			goto done;
+
+		/* Check getting link info */
+		info_len = sizeof(struct bpf_link_info) * 2;
+		bzero(&link_infos[i], info_len);
+		link_infos[i].raw_tracepoint.tp_name = (__u64)&tp_name;
+		link_infos[i].raw_tracepoint.tp_name_len = sizeof(tp_name);
+		err = bpf_obj_get_info_by_fd(bpf_link__fd(links[i]),
+					     &link_infos[i], &info_len);
+		if (CHECK(err ||
+			  link_infos[i].type != BPF_LINK_TYPE_RAW_TRACEPOINT ||
+			  link_infos[i].prog_id != prog_infos[i].id ||
+			  link_infos[i].raw_tracepoint.tp_name != (__u64)&tp_name ||
+			  strcmp((char *)link_infos[i].raw_tracepoint.tp_name,
+				 "sys_enter") ||
+			  info_len != sizeof(struct bpf_link_info),
+			  "get-link-info(fd)",
+			  "err %d errno %d info_len %u(%zu) type %d(%d) id %d "
+			  "prog_id %d (%d) tp_name %s(%s)\n",
+			  err, errno,
+			  info_len, sizeof(struct bpf_link_info),
+			  link_infos[i].type, BPF_LINK_TYPE_RAW_TRACEPOINT,
+			  link_infos[i].id,
+			  link_infos[i].prog_id, prog_infos[i].id,
+			  (char *)link_infos[i].raw_tracepoint.tp_name,
+			  "sys_enter"))
+			goto done;
+
 	}
 
 	/* Check bpf_prog_get_next_id() */
@@ -247,7 +294,52 @@ void test_bpf_obj_id(void)
 	      "nr_id_found %u(%u)\n",
 	      nr_id_found, nr_iters);
 
+	/* Check bpf_link_get_next_id() */
+	nr_id_found = 0;
+	next_id = 0;
+	while (!bpf_link_get_next_id(next_id, &next_id)) {
+		struct bpf_link_info link_info;
+		int link_fd, cmp_res;
+
+		info_len = sizeof(link_info);
+		memset(&link_info, 0, info_len);
+
+		link_fd = bpf_link_get_fd_by_id(next_id);
+		if (link_fd < 0 && errno == ENOENT)
+			/* The bpf_link is in the dead row */
+			continue;
+		if (CHECK(link_fd < 0, "get-link-fd(next_id)",
+			  "link_fd %d next_id %u errno %d\n",
+			  link_fd, next_id, errno))
+			break;
+
+		for (i = 0; i < nr_iters; i++)
+			if (link_infos[i].id == next_id)
+				break;
+
+		if (i == nr_iters)
+			continue;
+
+		nr_id_found++;
+
+		err = bpf_obj_get_info_by_fd(link_fd, &link_info, &info_len);
+		cmp_res = memcmp(&link_info, &link_infos[i],
+				offsetof(struct bpf_link_info, raw_tracepoint));
+		CHECK(err || info_len != sizeof(link_info) || cmp_res,
+		      "check get-link-info(next_id->fd)",
+		      "err %d errno %d info_len %u(%zu) memcmp %d\n",
+		      err, errno, info_len, sizeof(struct bpf_link_info),
+		      cmp_res);
+
+		close(link_fd);
+	}
+	CHECK(nr_id_found != nr_iters,
+	      "check total link id found by get_next_id",
+	      "nr_id_found %u(%u)\n", nr_id_found, nr_iters);
+
 done:
-	for (i = 0; i < nr_iters; i++)
+	for (i = 0; i < nr_iters; i++) {
+		bpf_link__destroy(links[i]);
 		bpf_object__close(objs[i]);
+	}
 }
diff --git a/tools/testing/selftests/bpf/progs/test_obj_id.c b/tools/testing/selftests/bpf/progs/test_obj_id.c
index 98b9de2fafd0..ded71b3ff6b4 100644
--- a/tools/testing/selftests/bpf/progs/test_obj_id.c
+++ b/tools/testing/selftests/bpf/progs/test_obj_id.c
@@ -3,16 +3,8 @@
  */
 #include <stddef.h>
 #include <linux/bpf.h>
-#include <linux/pkt_cls.h>
 #include <bpf/bpf_helpers.h>
 
-/* It is a dumb bpf program such that it must have no
- * issue to be loaded since testing the verifier is
- * not the focus here.
- */
-
-int _version SEC("version") = 1;
-
 struct {
 	__uint(type, BPF_MAP_TYPE_ARRAY);
 	__uint(max_entries, 1);
@@ -20,13 +12,13 @@ struct {
 	__type(value, __u64);
 } test_map_id SEC(".maps");
 
-SEC("test_obj_id_dummy")
-int test_obj_id(struct __sk_buff *skb)
+SEC("raw_tp/sys_enter")
+int test_obj_id(void *ctx)
 {
 	__u32 key = 0;
 	__u64 *value;
 
 	value = bpf_map_lookup_elem(&test_map_id, &key);
 
-	return TC_ACT_OK;
+	return 0;
 }
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 07/10] bpftool: expose attach_type-to-string array to non-cgroup code
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2020-04-28  5:49 ` [PATCH v2 bpf-next 06/10] selftests/bpf: test bpf_link's get_next_id, get_fd_by_id, and get_obj_info Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  2020-04-28  8:22   ` Quentin Monnet
  2020-04-28  5:49 ` [PATCH v2 bpf-next 08/10] bpftool: add bpf_link show and pin support Andrii Nakryiko
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Quentin Monnet

Move attach_type_strings into main.h for access in non-cgroup code.
bpf_attach_type is used for non-cgroup attach types quite widely now. So also
complete missing string translations for non-cgroup attach types.

Cc: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/bpf/bpftool/cgroup.c | 48 ++++++++++++--------------------------
 tools/bpf/bpftool/main.h   | 32 +++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 33 deletions(-)

diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
index 62c6a1d7cd18..1693c802bb20 100644
--- a/tools/bpf/bpftool/cgroup.c
+++ b/tools/bpf/bpftool/cgroup.c
@@ -31,42 +31,20 @@
 
 static unsigned int query_flags;
 
-static const char * const attach_type_strings[] = {
-	[BPF_CGROUP_INET_INGRESS] = "ingress",
-	[BPF_CGROUP_INET_EGRESS] = "egress",
-	[BPF_CGROUP_INET_SOCK_CREATE] = "sock_create",
-	[BPF_CGROUP_SOCK_OPS] = "sock_ops",
-	[BPF_CGROUP_DEVICE] = "device",
-	[BPF_CGROUP_INET4_BIND] = "bind4",
-	[BPF_CGROUP_INET6_BIND] = "bind6",
-	[BPF_CGROUP_INET4_CONNECT] = "connect4",
-	[BPF_CGROUP_INET6_CONNECT] = "connect6",
-	[BPF_CGROUP_INET4_POST_BIND] = "post_bind4",
-	[BPF_CGROUP_INET6_POST_BIND] = "post_bind6",
-	[BPF_CGROUP_UDP4_SENDMSG] = "sendmsg4",
-	[BPF_CGROUP_UDP6_SENDMSG] = "sendmsg6",
-	[BPF_CGROUP_SYSCTL] = "sysctl",
-	[BPF_CGROUP_UDP4_RECVMSG] = "recvmsg4",
-	[BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6",
-	[BPF_CGROUP_GETSOCKOPT] = "getsockopt",
-	[BPF_CGROUP_SETSOCKOPT] = "setsockopt",
-	[__MAX_BPF_ATTACH_TYPE] = NULL,
-};
-
 static enum bpf_attach_type parse_attach_type(const char *str)
 {
 	enum bpf_attach_type type;
 
 	for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) {
-		if (attach_type_strings[type] &&
-		    is_prefix(str, attach_type_strings[type]))
+		if (attach_type_name[type] &&
+		    is_prefix(str, attach_type_name[type]))
 			return type;
 	}
 
 	return __MAX_BPF_ATTACH_TYPE;
 }
 
-static int show_bpf_prog(int id, const char *attach_type_str,
+static int show_bpf_prog(int id, enum bpf_attach_type attach_type,
 			 const char *attach_flags_str,
 			 int level)
 {
@@ -86,18 +64,22 @@ static int show_bpf_prog(int id, const char *attach_type_str,
 	if (json_output) {
 		jsonw_start_object(json_wtr);
 		jsonw_uint_field(json_wtr, "id", info.id);
-		jsonw_string_field(json_wtr, "attach_type",
-				   attach_type_str);
+		if (attach_type < ARRAY_SIZE(attach_type_name))
+			jsonw_string_field(json_wtr, "attach_type",
+					   attach_type_name[attach_type]);
+		else
+			jsonw_uint_field(json_wtr, "attach_type", attach_type);
 		jsonw_string_field(json_wtr, "attach_flags",
 				   attach_flags_str);
 		jsonw_string_field(json_wtr, "name", info.name);
 		jsonw_end_object(json_wtr);
 	} else {
-		printf("%s%-8u %-15s %-15s %-15s\n", level ? "    " : "",
-		       info.id,
-		       attach_type_str,
-		       attach_flags_str,
-		       info.name);
+		printf("%s%-8u ", level ? "    " : "", info.id);
+		if (attach_type < ARRAY_SIZE(attach_type_name))
+			printf("%-15s", attach_type_name[attach_type]);
+		else
+			printf("type %-10u", attach_type);
+		printf(" %-15s %-15s\n", attach_flags_str, info.name);
 	}
 
 	close(prog_fd);
@@ -171,7 +153,7 @@ static int show_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type,
 	}
 
 	for (iter = 0; iter < prog_cnt; iter++)
-		show_bpf_prog(prog_ids[iter], attach_type_strings[type],
+		show_bpf_prog(prog_ids[iter], type,
 			      attach_flags_str, level);
 
 	return 0;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 86f14ce26fd7..99d84bd1d5b2 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -83,6 +83,38 @@ static const char * const prog_type_name[] = {
 	[BPF_PROG_TYPE_EXT]			= "ext",
 };
 
+static const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = {
+	[BPF_CGROUP_INET_INGRESS] = "ingress",
+	[BPF_CGROUP_INET_EGRESS] = "egress",
+	[BPF_CGROUP_INET_SOCK_CREATE] = "sock_create",
+	[BPF_CGROUP_SOCK_OPS] = "sock_ops",
+	[BPF_CGROUP_DEVICE] = "device",
+	[BPF_CGROUP_INET4_BIND] = "bind4",
+	[BPF_CGROUP_INET6_BIND] = "bind6",
+	[BPF_CGROUP_INET4_CONNECT] = "connect4",
+	[BPF_CGROUP_INET6_CONNECT] = "connect6",
+	[BPF_CGROUP_INET4_POST_BIND] = "post_bind4",
+	[BPF_CGROUP_INET6_POST_BIND] = "post_bind6",
+	[BPF_CGROUP_UDP4_SENDMSG] = "sendmsg4",
+	[BPF_CGROUP_UDP6_SENDMSG] = "sendmsg6",
+	[BPF_CGROUP_SYSCTL] = "sysctl",
+	[BPF_CGROUP_UDP4_RECVMSG] = "recvmsg4",
+	[BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6",
+	[BPF_CGROUP_GETSOCKOPT] = "getsockopt",
+	[BPF_CGROUP_SETSOCKOPT] = "setsockopt",
+
+	[BPF_SK_SKB_STREAM_PARSER] = "sk_skb_stream_parser",
+	[BPF_SK_SKB_STREAM_VERDICT] = "sk_skb_stream_verdict",
+	[BPF_SK_MSG_VERDICT] = "sk_msg_verdict",
+	[BPF_LIRC_MODE2] = "lirc_mode2",
+	[BPF_FLOW_DISSECTOR] = "flow_dissector",
+	[BPF_TRACE_RAW_TP] = "raw_tp",
+	[BPF_TRACE_FENTRY] = "fentry",
+	[BPF_TRACE_FEXIT] = "fexit",
+	[BPF_MODIFY_RETURN] = "mod_ret",
+	[BPF_LSM_MAC] = "lsm_mac",
+};
+
 extern const char * const map_type_name[];
 extern const size_t map_type_name_size;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 08/10] bpftool: add bpf_link show and pin support
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2020-04-28  5:49 ` [PATCH v2 bpf-next 07/10] bpftool: expose attach_type-to-string array to non-cgroup code Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  2020-04-28  8:23   ` Quentin Monnet
  2020-04-28  5:49 ` [PATCH v2 bpf-next 09/10] bpftool: add bpftool-link manpage Andrii Nakryiko
  2020-04-28  5:49 ` [PATCH v2 bpf-next 10/10] bpftool: add link bash completions Andrii Nakryiko
  9 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add `bpftool link show` and `bpftool link pin` commands.

Example plain output for `link show` (with showing pinned paths):

[vmuser@archvm bpf]$ sudo ~/local/linux/tools/bpf/bpftool/bpftool -f link
1: tracing  prog 12
        prog_type tracing  attach_type fentry
        pinned /sys/fs/bpf/my_test_link
        pinned /sys/fs/bpf/my_test_link2
2: tracing  prog 13
        prog_type tracing  attach_type fentry
3: tracing  prog 14
        prog_type tracing  attach_type fentry
4: tracing  prog 15
        prog_type tracing  attach_type fentry
5: tracing  prog 16
        prog_type tracing  attach_type fentry
6: tracing  prog 17
        prog_type tracing  attach_type fentry
7: raw_tracepoint  prog 21
        tp 'sys_enter'
8: cgroup  prog 25
        cgroup_id 584  attach_type egress
9: cgroup  prog 25
        cgroup_id 599  attach_type egress
10: cgroup  prog 25
        cgroup_id 614  attach_type egress
11: cgroup  prog 25
        cgroup_id 629  attach_type egress

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/bpf/bpftool/common.c |   2 +
 tools/bpf/bpftool/link.c   | 333 +++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/main.c   |   6 +-
 tools/bpf/bpftool/main.h   |   5 +
 4 files changed, 345 insertions(+), 1 deletion(-)
 create mode 100644 tools/bpf/bpftool/link.c

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index f2223dbdfb0a..c47bdc65de8e 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -262,6 +262,8 @@ int get_fd_type(int fd)
 		return BPF_OBJ_MAP;
 	else if (strstr(buf, "bpf-prog"))
 		return BPF_OBJ_PROG;
+	else if (strstr(buf, "bpf-link"))
+		return BPF_OBJ_LINK;
 
 	return BPF_OBJ_UNKNOWN;
 }
diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
new file mode 100644
index 000000000000..adc7dc431ed8
--- /dev/null
+++ b/tools/bpf/bpftool/link.c
@@ -0,0 +1,333 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2020 Facebook */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <bpf/bpf.h>
+
+#include "json_writer.h"
+#include "main.h"
+
+static const char * const link_type_name[] = {
+	[BPF_LINK_TYPE_UNSPEC]			= "unspec",
+	[BPF_LINK_TYPE_RAW_TRACEPOINT]		= "raw_tracepoint",
+	[BPF_LINK_TYPE_TRACING]			= "tracing",
+	[BPF_LINK_TYPE_CGROUP]			= "cgroup",
+};
+
+static int link_parse_fd(int *argc, char ***argv)
+{
+	if (is_prefix(**argv, "id")) {
+		unsigned int id;
+		char *endptr;
+
+		NEXT_ARGP();
+
+		id = strtoul(**argv, &endptr, 0);
+		if (*endptr) {
+			p_err("can't parse %s as ID", **argv);
+			return -1;
+		}
+		NEXT_ARGP();
+
+		return bpf_link_get_fd_by_id(id);
+	} else if (is_prefix(**argv, "pinned")) {
+		char *path;
+
+		NEXT_ARGP();
+
+		path = **argv;
+		NEXT_ARGP();
+
+		return open_obj_pinned_any(path, BPF_OBJ_LINK);
+	}
+
+	p_err("expected 'id' or 'pinned', got: '%s'?", **argv);
+	return -1;
+}
+
+static void
+show_link_header_json(struct bpf_link_info *info, json_writer_t *wtr)
+{
+	jsonw_uint_field(wtr, "id", info->id);
+	if (info->type < ARRAY_SIZE(link_type_name))
+		jsonw_string_field(wtr, "type", link_type_name[info->type]);
+	else
+		jsonw_uint_field(wtr, "type", info->type);
+
+	jsonw_uint_field(json_wtr, "prog_id", info->prog_id);
+}
+
+static int get_prog_info(int prog_id, struct bpf_prog_info *info)
+{
+	__u32 len = sizeof(*info);
+	int err, prog_fd;
+
+	prog_fd = bpf_prog_get_fd_by_id(prog_id);
+	if (prog_fd < 0)
+		return prog_fd;
+
+	memset(info, 0, sizeof(*info));
+	err = bpf_obj_get_info_by_fd(prog_fd, info, &len);
+	if (err)
+		p_err("can't get prog info: %s", strerror(errno));
+	close(prog_fd);
+	return err;
+}
+
+static int show_link_close_json(int fd, struct bpf_link_info *info)
+{
+	struct bpf_prog_info prog_info;
+	int err;
+
+	jsonw_start_object(json_wtr);
+
+	show_link_header_json(info, json_wtr);
+
+	switch (info->type) {
+	case BPF_LINK_TYPE_RAW_TRACEPOINT:
+		jsonw_string_field(json_wtr, "tp_name",
+				   (const char *)info->raw_tracepoint.tp_name);
+		break;
+	case BPF_LINK_TYPE_TRACING:
+		err = get_prog_info(info->prog_id, &prog_info);
+		if (err)
+			return err;
+
+		if (prog_info.type < ARRAY_SIZE(prog_type_name))
+			jsonw_string_field(json_wtr, "prog_type",
+					   prog_type_name[prog_info.type]);
+		else
+			jsonw_uint_field(json_wtr, "prog_type",
+					 prog_info.type);
+
+		if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
+			jsonw_string_field(json_wtr, "attach_type",
+			       attach_type_name[info->tracing.attach_type]);
+		else
+			jsonw_uint_field(json_wtr, "attach_type",
+					 info->tracing.attach_type);
+		break;
+	case BPF_LINK_TYPE_CGROUP:
+		jsonw_lluint_field(json_wtr, "cgroup_id",
+				   info->cgroup.cgroup_id);
+		if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
+			jsonw_string_field(json_wtr, "attach_type",
+			       attach_type_name[info->cgroup.attach_type]);
+		else
+			jsonw_uint_field(json_wtr, "attach_type",
+					 info->cgroup.attach_type);
+		break;
+	default:
+		break;
+	}
+
+	if (!hash_empty(link_table.table)) {
+		struct pinned_obj *obj;
+
+		jsonw_name(json_wtr, "pinned");
+		jsonw_start_array(json_wtr);
+		hash_for_each_possible(link_table.table, obj, hash, info->id) {
+			if (obj->id == info->id)
+				jsonw_string(json_wtr, obj->path);
+		}
+		jsonw_end_array(json_wtr);
+	}
+	jsonw_end_object(json_wtr);
+
+	return 0;
+}
+
+static void show_link_header_plain(struct bpf_link_info *info)
+{
+	printf("%u: ", info->id);
+	if (info->type < ARRAY_SIZE(link_type_name))
+		printf("%s  ", link_type_name[info->type]);
+	else
+		printf("type %u  ", info->type);
+
+	printf("prog %u  ", info->prog_id);
+}
+
+static int show_link_close_plain(int fd, struct bpf_link_info *info)
+{
+	struct bpf_prog_info prog_info;
+	int err;
+
+	show_link_header_plain(info);
+
+	switch (info->type) {
+	case BPF_LINK_TYPE_RAW_TRACEPOINT:
+		printf("\n\ttp '%s'  ",
+		       (const char *)info->raw_tracepoint.tp_name);
+		break;
+	case BPF_LINK_TYPE_TRACING:
+		err = get_prog_info(info->prog_id, &prog_info);
+		if (err)
+			return err;
+
+		if (prog_info.type < ARRAY_SIZE(prog_type_name))
+			printf("\n\tprog_type %s  ",
+			       prog_type_name[prog_info.type]);
+		else
+			printf("\n\tprog_type %u  ", prog_info.type);
+
+		if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
+			printf("attach_type %s  ",
+			       attach_type_name[info->tracing.attach_type]);
+		else
+			printf("attach_type %u  ", info->tracing.attach_type);
+		break;
+	case BPF_LINK_TYPE_CGROUP:
+		printf("\n\tcgroup_id %zu  ", (size_t)info->cgroup.cgroup_id);
+		if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
+			printf("attach_type %s  ",
+			       attach_type_name[info->cgroup.attach_type]);
+		else
+			printf("attach_type %u  ", info->cgroup.attach_type);
+		break;
+	default:
+		break;
+	}
+
+	if (!hash_empty(link_table.table)) {
+		struct pinned_obj *obj;
+
+		hash_for_each_possible(link_table.table, obj, hash, info->id) {
+			if (obj->id == info->id)
+				printf("\n\tpinned %s", obj->path);
+		}
+	}
+
+	printf("\n");
+
+	return 0;
+}
+
+static int do_show_link(int fd)
+{
+	struct bpf_link_info info;
+	__u32 len = sizeof(info);
+	char raw_tp_name[256];
+	int err;
+
+	memset(&info, 0, sizeof(info));
+again:
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	if (err) {
+		p_err("can't get link info: %s",
+		      strerror(errno));
+		close(fd);
+		return err;
+	}
+	if (info.type == BPF_LINK_TYPE_RAW_TRACEPOINT &&
+	    !info.raw_tracepoint.tp_name) {
+		info.raw_tracepoint.tp_name = (unsigned long)&raw_tp_name;
+		info.raw_tracepoint.tp_name_len = sizeof(raw_tp_name);
+		goto again;
+	}
+
+	if (json_output)
+		show_link_close_json(fd, &info);
+	else
+		show_link_close_plain(fd, &info);
+
+	close(fd);
+	return 0;
+}
+
+static int do_show(int argc, char **argv)
+{
+	__u32 id = 0;
+	int err, fd;
+
+	if (show_pinned)
+		build_pinned_obj_table(&link_table, BPF_OBJ_LINK);
+
+	if (argc == 2) {
+		fd = link_parse_fd(&argc, &argv);
+		if (fd < 0)
+			return fd;
+		return do_show_link(fd);
+	}
+
+	if (argc)
+		return BAD_ARG();
+
+	if (json_output)
+		jsonw_start_array(json_wtr);
+	while (true) {
+		err = bpf_link_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT)
+				break;
+			p_err("can't get next link: %s%s", strerror(errno),
+			      errno == EINVAL ? " -- kernel too old?" : "");
+			break;
+		}
+
+		fd = bpf_link_get_fd_by_id(id);
+		if (fd < 0) {
+			if (errno == ENOENT)
+				continue;
+			p_err("can't get link by id (%u): %s",
+			      id, strerror(errno));
+			break;
+		}
+
+		err = do_show_link(fd);
+		if (err)
+			break;
+	}
+	if (json_output)
+		jsonw_end_array(json_wtr);
+
+	return errno == ENOENT ? 0 : -1;
+}
+
+static int do_pin(int argc, char **argv)
+{
+	int err;
+
+	err = do_pin_any(argc, argv, link_parse_fd);
+	if (!err && json_output)
+		jsonw_null(json_wtr);
+	return err;
+}
+
+static int do_help(int argc, char **argv)
+{
+	if (json_output) {
+		jsonw_null(json_wtr);
+		return 0;
+	}
+
+	fprintf(stderr,
+		"Usage: %1$s %2$s { show | list }   [LINK]\n"
+		"       %1$s %2$s pin        LINK  FILE\n"
+		"       %1$s %2$s help\n"
+		"\n"
+		"       " HELP_SPEC_LINK "\n"
+		"       " HELP_SPEC_PROGRAM "\n"
+		"       " HELP_SPEC_OPTIONS "\n"
+		"",
+		bin_name, argv[-2]);
+
+	return 0;
+}
+
+static const struct cmd cmds[] = {
+	{ "show",	do_show },
+	{ "list",	do_show },
+	{ "help",	do_help },
+	{ "pin",	do_pin },
+	{ 0 }
+};
+
+int do_link(int argc, char **argv)
+{
+	return cmd_select(cmds, argc, argv, do_help);
+}
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 466c269eabdd..1413a154806e 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -30,6 +30,7 @@ bool verifier_logs;
 bool relaxed_maps;
 struct pinned_obj_table prog_table;
 struct pinned_obj_table map_table;
+struct pinned_obj_table link_table;
 
 static void __noreturn clean_and_exit(int i)
 {
@@ -58,7 +59,7 @@ static int do_help(int argc, char **argv)
 		"       %s batch file FILE\n"
 		"       %s version\n"
 		"\n"
-		"       OBJECT := { prog | map | cgroup | perf | net | feature | btf | gen | struct_ops }\n"
+		"       OBJECT := { prog | map | link | cgroup | perf | net | feature | btf | gen | struct_ops }\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
 		bin_name, bin_name, bin_name);
@@ -215,6 +216,7 @@ static const struct cmd cmds[] = {
 	{ "batch",	do_batch },
 	{ "prog",	do_prog },
 	{ "map",	do_map },
+	{ "link",	do_link },
 	{ "cgroup",	do_cgroup },
 	{ "perf",	do_perf },
 	{ "net",	do_net },
@@ -364,6 +366,7 @@ int main(int argc, char **argv)
 
 	hash_init(prog_table.table);
 	hash_init(map_table.table);
+	hash_init(link_table.table);
 
 	opterr = 0;
 	while ((opt = getopt_long(argc, argv, "Vhpjfmnd",
@@ -422,6 +425,7 @@ int main(int argc, char **argv)
 	if (show_pinned) {
 		delete_pinned_obj_table(&prog_table);
 		delete_pinned_obj_table(&map_table);
+		delete_pinned_obj_table(&link_table);
 	}
 
 	return ret;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 99d84bd1d5b2..9b1fb81a8331 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -50,6 +50,8 @@
 	"\t            {-m|--mapcompat} | {-n|--nomount} }"
 #define HELP_SPEC_MAP							\
 	"MAP := { id MAP_ID | pinned FILE | name MAP_NAME }"
+#define HELP_SPEC_LINK							\
+	"LINK := { id LINK_ID | pinned FILE }"
 
 static const char * const prog_type_name[] = {
 	[BPF_PROG_TYPE_UNSPEC]			= "unspec",
@@ -122,6 +124,7 @@ enum bpf_obj_type {
 	BPF_OBJ_UNKNOWN,
 	BPF_OBJ_PROG,
 	BPF_OBJ_MAP,
+	BPF_OBJ_LINK,
 };
 
 extern const char *bin_name;
@@ -134,6 +137,7 @@ extern bool verifier_logs;
 extern bool relaxed_maps;
 extern struct pinned_obj_table prog_table;
 extern struct pinned_obj_table map_table;
+extern struct pinned_obj_table link_table;
 
 void __printf(1, 2) p_err(const char *fmt, ...);
 void __printf(1, 2) p_info(const char *fmt, ...);
@@ -185,6 +189,7 @@ int do_pin_fd(int fd, const char *name);
 
 int do_prog(int argc, char **arg);
 int do_map(int argc, char **arg);
+int do_link(int argc, char **arg);
 int do_event_pipe(int argc, char **argv);
 int do_cgroup(int argc, char **arg);
 int do_perf(int argc, char **arg);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 09/10] bpftool: add bpftool-link manpage
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2020-04-28  5:49 ` [PATCH v2 bpf-next 08/10] bpftool: add bpf_link show and pin support Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  2020-04-28  8:23   ` Quentin Monnet
  2020-04-28  5:49 ` [PATCH v2 bpf-next 10/10] bpftool: add link bash completions Andrii Nakryiko
  9 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add bpftool-link manpage with information and examples of link-related
commands.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 .../bpftool/Documentation/bpftool-link.rst    | 118 ++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100644 tools/bpf/bpftool/Documentation/bpftool-link.rst

diff --git a/tools/bpf/bpftool/Documentation/bpftool-link.rst b/tools/bpf/bpftool/Documentation/bpftool-link.rst
new file mode 100644
index 000000000000..ee6500d6e6e4
--- /dev/null
+++ b/tools/bpf/bpftool/Documentation/bpftool-link.rst
@@ -0,0 +1,118 @@
+================
+bpftool-link
+================
+-------------------------------------------------------------------------------
+tool for inspection and simple manipulation of eBPF links
+-------------------------------------------------------------------------------
+
+:Manual section: 8
+
+SYNOPSIS
+========
+
+	**bpftool** [*OPTIONS*] **link** *COMMAND*
+
+	*OPTIONS* := { { **-j** | **--json** } [{ **-p** | **--pretty** }] | { **-f** | **--bpffs** } }
+
+	*COMMANDS* := { **show** | **list** | **pin** | **help** }
+
+LINK COMMANDS
+=============
+
+|	**bpftool** **link { show | list }** [*LINK*]
+|	**bpftool** **link pin** *LINK* *FILE*
+|	**bpftool** **link help**
+|
+|	*LINK* := { **id** *LINK_ID* | **pinned** *FILE* }
+
+
+DESCRIPTION
+===========
+	**bpftool link { show | list }** [*LINK*]
+		  Show information about active links. If *LINK* is
+		  specified show information only about given link,
+		  otherwise list all links currently active on the system.
+
+		  Output will start with link ID followed by link type and
+		  zero or more named attributes, some of which depend on type
+		  of link.
+
+	**bpftool link pin** *LINK* *FILE*
+		  Pin link *LINK* as *FILE*.
+
+		  Note: *FILE* must be located in *bpffs* mount. It must not
+		  contain a dot character ('.'), which is reserved for future
+		  extensions of *bpffs*.
+
+	**bpftool link help**
+		  Print short help message.
+
+OPTIONS
+=======
+	-h, --help
+		  Print short generic help message (similar to **bpftool help**).
+
+	-V, --version
+		  Print version number (similar to **bpftool version**).
+
+	-j, --json
+		  Generate JSON output. For commands that cannot produce JSON, this
+		  option has no effect.
+
+	-p, --pretty
+		  Generate human-readable JSON output. Implies **-j**.
+
+	-f, --bpffs
+		  When showing BPF links, show file names of pinned
+		  links.
+
+	-n, --nomount
+		  Do not automatically attempt to mount any virtual file system
+		  (such as tracefs or BPF virtual file system) when necessary.
+
+	-d, --debug
+		  Print all logs available, even debug-level information. This
+		  includes logs from libbpf.
+
+EXAMPLES
+========
+**# bpftool link show**
+
+::
+
+    10: cgroup  prog 25
+            cgroup_id 614  attach_type egress
+
+**# bpftool --json --pretty link show**
+
+::
+
+    [{
+            "type": "cgroup",
+            "prog_id": 25,
+            "cgroup_id": 614,
+            "attach_type": "egress"
+        }
+    ]
+
+|
+| **# bpftool link pin id 10 /sys/fs/bpf/link**
+| **# ls -l /sys/fs/bpf/**
+
+::
+
+    -rw------- 1 root root 0 Apr 23 21:39 link
+
+
+SEE ALSO
+========
+	**bpf**\ (2),
+	**bpf-helpers**\ (7),
+	**bpftool**\ (8),
+	**bpftool-prog\ (8),
+	**bpftool-map**\ (8),
+	**bpftool-cgroup**\ (8),
+	**bpftool-feature**\ (8),
+	**bpftool-net**\ (8),
+	**bpftool-perf**\ (8),
+	**bpftool-btf**\ (8)
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v2 bpf-next 10/10] bpftool: add link bash completions
  2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2020-04-28  5:49 ` [PATCH v2 bpf-next 09/10] bpftool: add bpftool-link manpage Andrii Nakryiko
@ 2020-04-28  5:49 ` Andrii Nakryiko
  9 siblings, 0 replies; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28  5:49 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Quentin Monnet

Extend bpftool's bash-completion script to handle new link command and its
sub-commands.

Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/bpf/bpftool/bash-completion/bpftool | 39 +++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 45ee99b159e2..c033c3329f73 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -98,6 +98,12 @@ _bpftool_get_btf_ids()
         command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
 }
 
+_bpftool_get_link_ids()
+{
+    COMPREPLY+=( $( compgen -W "$( bpftool -jp link 2>&1 | \
+        command sed -n 's/.*"id": \(.*\),$/\1/p' )" -- "$cur" ) )
+}
+
 _bpftool_get_obj_map_names()
 {
     local obj
@@ -1082,6 +1088,39 @@ _bpftool()
                     ;;
             esac
             ;;
+        link)
+            case $command in
+                show|list|pin)
+                    case $prev in
+                        id)
+                            _bpftool_get_link_ids
+                            return 0
+                            ;;
+                    esac
+                    ;;
+            esac
+
+            local LINK_TYPE='id pinned'
+            case $command in
+                show|list)
+                    [[ $prev != "$command" ]] && return 0
+                    COMPREPLY=( $( compgen -W "$LINK_TYPE" -- "$cur" ) )
+                    return 0
+                    ;;
+                pin)
+                    if [[ $prev == "$command" ]]; then
+                        COMPREPLY=( $( compgen -W "$LINK_TYPE" -- "$cur" ) )
+                    else
+                        _filedir
+                    fi
+                    return 0
+                    ;;
+                *)
+                    [[ $prev == $object ]] && \
+                        COMPREPLY=( $( compgen -W 'help pin show list' -- "$cur" ) )
+                    ;;
+            esac
+            ;;
     esac
 } &&
 complete -F _bpftool bpftool
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 07/10] bpftool: expose attach_type-to-string array to non-cgroup code
  2020-04-28  5:49 ` [PATCH v2 bpf-next 07/10] bpftool: expose attach_type-to-string array to non-cgroup code Andrii Nakryiko
@ 2020-04-28  8:22   ` Quentin Monnet
  0 siblings, 0 replies; 25+ messages in thread
From: Quentin Monnet @ 2020-04-28  8:22 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team

2020-04-27 22:49 UTC-0700 ~ Andrii Nakryiko <andriin@fb.com>
> Move attach_type_strings into main.h for access in non-cgroup code.
> bpf_attach_type is used for non-cgroup attach types quite widely now. So also
> complete missing string translations for non-cgroup attach types.
> 
> Cc: Quentin Monnet <quentin@isovalent.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Thank you Andrii


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 08/10] bpftool: add bpf_link show and pin support
  2020-04-28  5:49 ` [PATCH v2 bpf-next 08/10] bpftool: add bpf_link show and pin support Andrii Nakryiko
@ 2020-04-28  8:23   ` Quentin Monnet
  0 siblings, 0 replies; 25+ messages in thread
From: Quentin Monnet @ 2020-04-28  8:23 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team

2020-04-27 22:49 UTC-0700 ~ Andrii Nakryiko <andriin@fb.com>
> Add `bpftool link show` and `bpftool link pin` commands.
> 
> Example plain output for `link show` (with showing pinned paths):
> 
> [vmuser@archvm bpf]$ sudo ~/local/linux/tools/bpf/bpftool/bpftool -f link
> 1: tracing  prog 12
>         prog_type tracing  attach_type fentry
>         pinned /sys/fs/bpf/my_test_link
>         pinned /sys/fs/bpf/my_test_link2
> 2: tracing  prog 13
>         prog_type tracing  attach_type fentry
> 3: tracing  prog 14
>         prog_type tracing  attach_type fentry
> 4: tracing  prog 15
>         prog_type tracing  attach_type fentry
> 5: tracing  prog 16
>         prog_type tracing  attach_type fentry
> 6: tracing  prog 17
>         prog_type tracing  attach_type fentry
> 7: raw_tracepoint  prog 21
>         tp 'sys_enter'
> 8: cgroup  prog 25
>         cgroup_id 584  attach_type egress
> 9: cgroup  prog 25
>         cgroup_id 599  attach_type egress
> 10: cgroup  prog 25
>         cgroup_id 614  attach_type egress
> 11: cgroup  prog 25
>         cgroup_id 629  attach_type egress
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---

Reviewed-by: Quentin Monnet <quentin@isovalent.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 09/10] bpftool: add bpftool-link manpage
  2020-04-28  5:49 ` [PATCH v2 bpf-next 09/10] bpftool: add bpftool-link manpage Andrii Nakryiko
@ 2020-04-28  8:23   ` Quentin Monnet
  0 siblings, 0 replies; 25+ messages in thread
From: Quentin Monnet @ 2020-04-28  8:23 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team

2020-04-27 22:49 UTC-0700 ~ Andrii Nakryiko <andriin@fb.com>
> Add bpftool-link manpage with information and examples of link-related
> commands.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Reviewed-by: Quentin Monnet <quentin@isovalent.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  2020-04-28  5:49 ` [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD " Andrii Nakryiko
@ 2020-04-28  9:46   ` Toke Høiland-Jørgensen
  2020-04-28 16:27     ` Andrii Nakryiko
  2020-04-28 21:55   ` kbuild test robot
  1 sibling, 1 reply; 25+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-28  9:46 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Andrii Nakryiko <andriin@fb.com> writes:

> Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
> Also enhance show_fdinfo to potentially include bpf_link type-specific
> information (similarly to obj_info).
>
> Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
> UAPI. bpf_link_tracing also now will store and return bpf_attach_type.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  include/linux/bpf-cgroup.h     |   2 -
>  include/linux/bpf.h            |  10 +-
>  include/linux/bpf_types.h      |   6 ++
>  include/uapi/linux/bpf.h       |  28 ++++++
>  kernel/bpf/btf.c               |   2 +
>  kernel/bpf/cgroup.c            |  45 ++++++++-
>  kernel/bpf/syscall.c           | 164 +++++++++++++++++++++++++++++----
>  kernel/bpf/verifier.c          |   2 +
>  tools/include/uapi/linux/bpf.h |  31 +++++++
>  9 files changed, 266 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index d2d969669564..ab95824a1d99 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -57,8 +57,6 @@ struct bpf_cgroup_link {
>  	enum bpf_attach_type type;
>  };
>  
> -extern const struct bpf_link_ops bpf_cgroup_link_lops;
> -
>  struct bpf_prog_list {
>  	struct list_head node;
>  	struct bpf_prog *prog;
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 875d1f0af803..701c4387c711 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1026,9 +1026,11 @@ extern const struct file_operations bpf_prog_fops;
>  	extern const struct bpf_verifier_ops _name ## _verifier_ops;
>  #define BPF_MAP_TYPE(_id, _ops) \
>  	extern const struct bpf_map_ops _ops;
> +#define BPF_LINK_TYPE(_id, _name)
>  #include <linux/bpf_types.h>
>  #undef BPF_PROG_TYPE
>  #undef BPF_MAP_TYPE
> +#undef BPF_LINK_TYPE
>  
>  extern const struct bpf_prog_ops bpf_offload_prog_ops;
>  extern const struct bpf_verifier_ops tc_cls_act_analyzer_ops;
> @@ -1086,6 +1088,7 @@ int bpf_prog_new_fd(struct bpf_prog *prog);
>  struct bpf_link {
>  	atomic64_t refcnt;
>  	u32 id;
> +	enum bpf_link_type type;
>  	const struct bpf_link_ops *ops;
>  	struct bpf_prog *prog;
>  	struct work_struct work;
> @@ -1103,9 +1106,14 @@ struct bpf_link_ops {
>  	void (*dealloc)(struct bpf_link *link);
>  	int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
>  			   struct bpf_prog *old_prog);
> +	void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq);
> +	int (*fill_link_info)(const struct bpf_link *link,
> +			      struct bpf_link_info *info,
> +			      const struct bpf_link_info *uinfo,
> +			      u32 info_len);
>  };
>  
> -void bpf_link_init(struct bpf_link *link,
> +void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
>  		   const struct bpf_link_ops *ops, struct bpf_prog *prog);
>  int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
>  int bpf_link_settle(struct bpf_link_primer *primer);
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index ba0c2d56f8a3..8345cdf553b8 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -118,3 +118,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
>  #if defined(CONFIG_BPF_JIT)
>  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
>  #endif
> +
> +BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> +BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> +#ifdef CONFIG_CGROUP_BPF
> +BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
> +#endif
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 7e6541fceade..0eccafae55bb 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -222,6 +222,15 @@ enum bpf_attach_type {
>  
>  #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
>  
> +enum bpf_link_type {
> +	BPF_LINK_TYPE_UNSPEC = 0,
> +	BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
> +	BPF_LINK_TYPE_TRACING = 2,
> +	BPF_LINK_TYPE_CGROUP = 3,
> +
> +	MAX_BPF_LINK_TYPE,
> +};
> +
>  /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
>   *
>   * NONE(default): No further bpf programs allowed in the subtree.
> @@ -3612,6 +3621,25 @@ struct bpf_btf_info {
>  	__u32 id;
>  } __attribute__((aligned(8)));
>  
> +struct bpf_link_info {
> +	__u32 type;
> +	__u32 id;
> +	__u32 prog_id;
> +	union {
> +		struct {
> +			__aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
> +			__u32 tp_name_len;     /* in/out: tp_name buffer len */
> +		} raw_tracepoint;
> +		struct {
> +			__u32 attach_type;
> +		} tracing;

On the RFC I asked whether we could get the attach target here as well.
You said you'd look into it; what happened to that? :)

-Toke


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  2020-04-28  9:46   ` Toke Høiland-Jørgensen
@ 2020-04-28 16:27     ` Andrii Nakryiko
  2020-04-28 18:31       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28 16:27 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Tue, Apr 28, 2020 at 2:46 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andriin@fb.com> writes:
>
> > Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
> > Also enhance show_fdinfo to potentially include bpf_link type-specific
> > information (similarly to obj_info).
> >
> > Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
> > UAPI. bpf_link_tracing also now will store and return bpf_attach_type.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >  include/linux/bpf-cgroup.h     |   2 -
> >  include/linux/bpf.h            |  10 +-
> >  include/linux/bpf_types.h      |   6 ++
> >  include/uapi/linux/bpf.h       |  28 ++++++
> >  kernel/bpf/btf.c               |   2 +
> >  kernel/bpf/cgroup.c            |  45 ++++++++-
> >  kernel/bpf/syscall.c           | 164 +++++++++++++++++++++++++++++----
> >  kernel/bpf/verifier.c          |   2 +
> >  tools/include/uapi/linux/bpf.h |  31 +++++++
> >  9 files changed, 266 insertions(+), 24 deletions(-)
> >
> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > index d2d969669564..ab95824a1d99 100644
> > --- a/include/linux/bpf-cgroup.h
> > +++ b/include/linux/bpf-cgroup.h
> > @@ -57,8 +57,6 @@ struct bpf_cgroup_link {
> >       enum bpf_attach_type type;
> >  };
> >
> > -extern const struct bpf_link_ops bpf_cgroup_link_lops;
> > -
> >  struct bpf_prog_list {
> >       struct list_head node;
> >       struct bpf_prog *prog;
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 875d1f0af803..701c4387c711 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1026,9 +1026,11 @@ extern const struct file_operations bpf_prog_fops;
> >       extern const struct bpf_verifier_ops _name ## _verifier_ops;
> >  #define BPF_MAP_TYPE(_id, _ops) \
> >       extern const struct bpf_map_ops _ops;
> > +#define BPF_LINK_TYPE(_id, _name)
> >  #include <linux/bpf_types.h>
> >  #undef BPF_PROG_TYPE
> >  #undef BPF_MAP_TYPE
> > +#undef BPF_LINK_TYPE
> >
> >  extern const struct bpf_prog_ops bpf_offload_prog_ops;
> >  extern const struct bpf_verifier_ops tc_cls_act_analyzer_ops;
> > @@ -1086,6 +1088,7 @@ int bpf_prog_new_fd(struct bpf_prog *prog);
> >  struct bpf_link {
> >       atomic64_t refcnt;
> >       u32 id;
> > +     enum bpf_link_type type;
> >       const struct bpf_link_ops *ops;
> >       struct bpf_prog *prog;
> >       struct work_struct work;
> > @@ -1103,9 +1106,14 @@ struct bpf_link_ops {
> >       void (*dealloc)(struct bpf_link *link);
> >       int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
> >                          struct bpf_prog *old_prog);
> > +     void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq);
> > +     int (*fill_link_info)(const struct bpf_link *link,
> > +                           struct bpf_link_info *info,
> > +                           const struct bpf_link_info *uinfo,
> > +                           u32 info_len);
> >  };
> >
> > -void bpf_link_init(struct bpf_link *link,
> > +void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
> >                  const struct bpf_link_ops *ops, struct bpf_prog *prog);
> >  int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
> >  int bpf_link_settle(struct bpf_link_primer *primer);
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index ba0c2d56f8a3..8345cdf553b8 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -118,3 +118,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
> >  #if defined(CONFIG_BPF_JIT)
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
> >  #endif
> > +
> > +BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> > +BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> > +#ifdef CONFIG_CGROUP_BPF
> > +BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
> > +#endif
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 7e6541fceade..0eccafae55bb 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -222,6 +222,15 @@ enum bpf_attach_type {
> >
> >  #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
> >
> > +enum bpf_link_type {
> > +     BPF_LINK_TYPE_UNSPEC = 0,
> > +     BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
> > +     BPF_LINK_TYPE_TRACING = 2,
> > +     BPF_LINK_TYPE_CGROUP = 3,
> > +
> > +     MAX_BPF_LINK_TYPE,
> > +};
> > +
> >  /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
> >   *
> >   * NONE(default): No further bpf programs allowed in the subtree.
> > @@ -3612,6 +3621,25 @@ struct bpf_btf_info {
> >       __u32 id;
> >  } __attribute__((aligned(8)));
> >
> > +struct bpf_link_info {
> > +     __u32 type;
> > +     __u32 id;
> > +     __u32 prog_id;
> > +     union {
> > +             struct {
> > +                     __aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
> > +                     __u32 tp_name_len;     /* in/out: tp_name buffer len */
> > +             } raw_tracepoint;
> > +             struct {
> > +                     __u32 attach_type;
> > +             } tracing;
>
> On the RFC I asked whether we could get the attach target here as well.
> You said you'd look into it; what happened to that? :)
>

Right, sorry, forgot to mention this. I did take a look, but tracing
links are quite diverse, so I didn't see one clear way to structure
such "target" information, so I'd say we should push it into a
separate patch/discussion. E.g., fentry/fexit/fmod_exit are attached
to kernel functions (how do we represent that), freplace are attached
to another BPF program (this is a bit clearer how to represent, but
how do we combine that with fentry/fexit info?). LSM is also attached
to kernel function, but who knows, maybe we want slightly more
extended semantics for it. Either way, I don't see one best way to
structure this information and would want to avoid blocking on this
for this series. Also bpf_link_info is extensible, so it's not a
problem to extend it in follow up patches.

Does it make sense?

> -Toke
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link
  2020-04-28  5:49 ` [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link Andrii Nakryiko
@ 2020-04-28 17:31   ` Alexei Starovoitov
  2020-04-28 18:56     ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2020-04-28 17:31 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf, netdev, ast, daniel, andrii.nakryiko, kernel-team

On Mon, Apr 27, 2020 at 10:49:36PM -0700, Andrii Nakryiko wrote:
> +int bpf_link_settle(struct bpf_link_primer *primer)
> +{
> +	/* make bpf_link fetchable by ID */
> +	WRITE_ONCE(primer->link->id, primer->id);

what does WRITE_ONCE serve here?
bpf_link_settle can only be called at the end of attach.
If attach is slow than parallel get_fd_by_id can get an new FD
instance for link with zero id.
In such case deref of link->id will race with above assignment?
But I don't see READ_ONCE in patch 3.
It's under link_idr_lock there.
How about grabbing link_idr_lock here as well ?
otherwise it's still racy since WRITE_ONCE is not paired.

The mix of spin_lock_irqsave(&link_idr_lock)
and spin_lock_bh(&link_idr_lock) looks weird.
We do the same for map_idr because maps have complicated freeing logic,
but prog_idr is consistent.
If you see the need for irqsave variant then please use it in all cases.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  2020-04-28 16:27     ` Andrii Nakryiko
@ 2020-04-28 18:31       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 25+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-28 18:31 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Apr 28, 2020 at 2:46 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andriin@fb.com> writes:
>>
>> > Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
>> > Also enhance show_fdinfo to potentially include bpf_link type-specific
>> > information (similarly to obj_info).
>> >
>> > Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
>> > UAPI. bpf_link_tracing also now will store and return bpf_attach_type.
>> >
>> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
>> > ---
>> >  include/linux/bpf-cgroup.h     |   2 -
>> >  include/linux/bpf.h            |  10 +-
>> >  include/linux/bpf_types.h      |   6 ++
>> >  include/uapi/linux/bpf.h       |  28 ++++++
>> >  kernel/bpf/btf.c               |   2 +
>> >  kernel/bpf/cgroup.c            |  45 ++++++++-
>> >  kernel/bpf/syscall.c           | 164 +++++++++++++++++++++++++++++----
>> >  kernel/bpf/verifier.c          |   2 +
>> >  tools/include/uapi/linux/bpf.h |  31 +++++++
>> >  9 files changed, 266 insertions(+), 24 deletions(-)
>> >
>> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
>> > index d2d969669564..ab95824a1d99 100644
>> > --- a/include/linux/bpf-cgroup.h
>> > +++ b/include/linux/bpf-cgroup.h
>> > @@ -57,8 +57,6 @@ struct bpf_cgroup_link {
>> >       enum bpf_attach_type type;
>> >  };
>> >
>> > -extern const struct bpf_link_ops bpf_cgroup_link_lops;
>> > -
>> >  struct bpf_prog_list {
>> >       struct list_head node;
>> >       struct bpf_prog *prog;
>> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> > index 875d1f0af803..701c4387c711 100644
>> > --- a/include/linux/bpf.h
>> > +++ b/include/linux/bpf.h
>> > @@ -1026,9 +1026,11 @@ extern const struct file_operations bpf_prog_fops;
>> >       extern const struct bpf_verifier_ops _name ## _verifier_ops;
>> >  #define BPF_MAP_TYPE(_id, _ops) \
>> >       extern const struct bpf_map_ops _ops;
>> > +#define BPF_LINK_TYPE(_id, _name)
>> >  #include <linux/bpf_types.h>
>> >  #undef BPF_PROG_TYPE
>> >  #undef BPF_MAP_TYPE
>> > +#undef BPF_LINK_TYPE
>> >
>> >  extern const struct bpf_prog_ops bpf_offload_prog_ops;
>> >  extern const struct bpf_verifier_ops tc_cls_act_analyzer_ops;
>> > @@ -1086,6 +1088,7 @@ int bpf_prog_new_fd(struct bpf_prog *prog);
>> >  struct bpf_link {
>> >       atomic64_t refcnt;
>> >       u32 id;
>> > +     enum bpf_link_type type;
>> >       const struct bpf_link_ops *ops;
>> >       struct bpf_prog *prog;
>> >       struct work_struct work;
>> > @@ -1103,9 +1106,14 @@ struct bpf_link_ops {
>> >       void (*dealloc)(struct bpf_link *link);
>> >       int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
>> >                          struct bpf_prog *old_prog);
>> > +     void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq);
>> > +     int (*fill_link_info)(const struct bpf_link *link,
>> > +                           struct bpf_link_info *info,
>> > +                           const struct bpf_link_info *uinfo,
>> > +                           u32 info_len);
>> >  };
>> >
>> > -void bpf_link_init(struct bpf_link *link,
>> > +void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
>> >                  const struct bpf_link_ops *ops, struct bpf_prog *prog);
>> >  int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
>> >  int bpf_link_settle(struct bpf_link_primer *primer);
>> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
>> > index ba0c2d56f8a3..8345cdf553b8 100644
>> > --- a/include/linux/bpf_types.h
>> > +++ b/include/linux/bpf_types.h
>> > @@ -118,3 +118,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
>> >  #if defined(CONFIG_BPF_JIT)
>> >  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
>> >  #endif
>> > +
>> > +BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
>> > +BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
>> > +#ifdef CONFIG_CGROUP_BPF
>> > +BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
>> > +#endif
>> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> > index 7e6541fceade..0eccafae55bb 100644
>> > --- a/include/uapi/linux/bpf.h
>> > +++ b/include/uapi/linux/bpf.h
>> > @@ -222,6 +222,15 @@ enum bpf_attach_type {
>> >
>> >  #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
>> >
>> > +enum bpf_link_type {
>> > +     BPF_LINK_TYPE_UNSPEC = 0,
>> > +     BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
>> > +     BPF_LINK_TYPE_TRACING = 2,
>> > +     BPF_LINK_TYPE_CGROUP = 3,
>> > +
>> > +     MAX_BPF_LINK_TYPE,
>> > +};
>> > +
>> >  /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
>> >   *
>> >   * NONE(default): No further bpf programs allowed in the subtree.
>> > @@ -3612,6 +3621,25 @@ struct bpf_btf_info {
>> >       __u32 id;
>> >  } __attribute__((aligned(8)));
>> >
>> > +struct bpf_link_info {
>> > +     __u32 type;
>> > +     __u32 id;
>> > +     __u32 prog_id;
>> > +     union {
>> > +             struct {
>> > +                     __aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
>> > +                     __u32 tp_name_len;     /* in/out: tp_name buffer len */
>> > +             } raw_tracepoint;
>> > +             struct {
>> > +                     __u32 attach_type;
>> > +             } tracing;
>>
>> On the RFC I asked whether we could get the attach target here as well.
>> You said you'd look into it; what happened to that? :)
>>
>
> Right, sorry, forgot to mention this. I did take a look, but tracing
> links are quite diverse, so I didn't see one clear way to structure
> such "target" information, so I'd say we should push it into a
> separate patch/discussion. E.g., fentry/fexit/fmod_exit are attached
> to kernel functions (how do we represent that), freplace are attached
> to another BPF program (this is a bit clearer how to represent, but
> how do we combine that with fentry/fexit info?). LSM is also attached
> to kernel function, but who knows, maybe we want slightly more
> extended semantics for it. Either way, I don't see one best way to
> structure this information and would want to avoid blocking on this
> for this series. Also bpf_link_info is extensible, so it's not a
> problem to extend it in follow up patches.
>
> Does it make sense?

Yup, fair enough, I can live with deferring this to a later series :)

-Toke


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link
  2020-04-28 17:31   ` Alexei Starovoitov
@ 2020-04-28 18:56     ` Andrii Nakryiko
  2020-04-28 20:38       ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28 18:56 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Tue, Apr 28, 2020 at 10:31 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Apr 27, 2020 at 10:49:36PM -0700, Andrii Nakryiko wrote:
> > +int bpf_link_settle(struct bpf_link_primer *primer)
> > +{
> > +     /* make bpf_link fetchable by ID */
> > +     WRITE_ONCE(primer->link->id, primer->id);
>
> what does WRITE_ONCE serve here?

To prevent compiler reordering this write with fd_install. So that by
the time FD is exposed to user-space, link has properly set ID.

> bpf_link_settle can only be called at the end of attach.
> If attach is slow than parallel get_fd_by_id can get an new FD
> instance for link with zero id.
> In such case deref of link->id will race with above assignment?

Yes, it does race, but it can either see zero and assume bpf_link is
not ready (which is fine to do) or will see correct link ID and will
proceed to create new FD for it. By the time we do context switch back
to user-space and return link FD, ID will definitely be visible due to
context switch and associated memory barriers. If anyone is guessing
FD and trying to create FD_BY_ID before LINK_CREATE syscall returns --
then returning failure due to link ID not yet set is totally fine,
IMO.

> But I don't see READ_ONCE in patch 3.
> It's under link_idr_lock there.

It doesn't need READ_ONCE because it does read under spinlock, so
compiler can't re-order it with code outside of spinlock.

> How about grabbing link_idr_lock here as well ?
> otherwise it's still racy since WRITE_ONCE is not paired.

As indicated above, seems unnecessary? But I also don't object
strongly, I don't expect this lock for links to be a major bottleneck
or anything like that.

>
> The mix of spin_lock_irqsave(&link_idr_lock)
> and spin_lock_bh(&link_idr_lock) looks weird.
> We do the same for map_idr because maps have complicated freeing logic,
> but prog_idr is consistent.
> If you see the need for irqsave variant then please use it in all cases.

No, my bad, I don't see any need to intermix them. I'll stick to
spin_lock_bh, thanks for catching!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link
  2020-04-28 18:56     ` Andrii Nakryiko
@ 2020-04-28 20:38       ` Alexei Starovoitov
  2020-04-28 22:33         ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2020-04-28 20:38 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Tue, Apr 28, 2020 at 11:56:52AM -0700, Andrii Nakryiko wrote:
> On Tue, Apr 28, 2020 at 10:31 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Apr 27, 2020 at 10:49:36PM -0700, Andrii Nakryiko wrote:
> > > +int bpf_link_settle(struct bpf_link_primer *primer)
> > > +{
> > > +     /* make bpf_link fetchable by ID */
> > > +     WRITE_ONCE(primer->link->id, primer->id);
> >
> > what does WRITE_ONCE serve here?
> 
> To prevent compiler reordering this write with fd_install. So that by
> the time FD is exposed to user-space, link has properly set ID.

if you wanted memory barrier then it should have been barrier(),
but that wouldn't be enough, since patch 2 and 3 race to read and write
that 32-bit int.

> > bpf_link_settle can only be called at the end of attach.
> > If attach is slow than parallel get_fd_by_id can get an new FD
> > instance for link with zero id.
> > In such case deref of link->id will race with above assignment?
> 
> Yes, it does race, but it can either see zero and assume bpf_link is
> not ready (which is fine to do) or will see correct link ID and will
> proceed to create new FD for it. By the time we do context switch back
> to user-space and return link FD, ID will definitely be visible due to
> context switch and associated memory barriers. If anyone is guessing
> FD and trying to create FD_BY_ID before LINK_CREATE syscall returns --
> then returning failure due to link ID not yet set is totally fine,
> IMO.
> 
> > But I don't see READ_ONCE in patch 3.
> > It's under link_idr_lock there.
> 
> It doesn't need READ_ONCE because it does read under spinlock, so
> compiler can't re-order it with code outside of spinlock.

spin_lock in patch 3 doesn't guarantee that link->id deref in that patch
will be atomic.
So WRITE_ONCE in patch 2 into link->id is still racy with plain
read in patch 3.
Just wait and see kmsan complaining about it.

> > How about grabbing link_idr_lock here as well ?
> > otherwise it's still racy since WRITE_ONCE is not paired.
> 
> As indicated above, seems unnecessary? But I also don't object
> strongly, I don't expect this lock for links to be a major bottleneck
> or anything like that.

Either READ_ONCE has to be paired with WRITE_ONCE
(or even better smp_load_acquire with smp_store_release)
or use spin_lock.

> >
> > The mix of spin_lock_irqsave(&link_idr_lock)
> > and spin_lock_bh(&link_idr_lock) looks weird.
> > We do the same for map_idr because maps have complicated freeing logic,
> > but prog_idr is consistent.
> > If you see the need for irqsave variant then please use it in all cases.
> 
> No, my bad, I don't see any need to intermix them. I'll stick to
> spin_lock_bh, thanks for catching!

I think that should be fine.
Please double check that situation described in
commit 930651a75bf1 ("bpf: do not disable/enable BH in bpf_map_free_id()")
doesn't apply to link_idr.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  2020-04-28  5:49 ` [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD " Andrii Nakryiko
  2020-04-28  9:46   ` Toke Høiland-Jørgensen
@ 2020-04-28 21:55   ` kbuild test robot
  1 sibling, 0 replies; 25+ messages in thread
From: kbuild test robot @ 2020-04-28 21:55 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: kbuild-all, andrii.nakryiko, kernel-team, Andrii Nakryiko

[-- Attachment #1: Type: text/plain, Size: 2894 bytes --]

Hi Andrii,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]
[also build test ERROR on bpf/master cgroup/for-next net/master net-next/master v5.7-rc3 next-20200428]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Andrii-Nakryiko/bpf_link-observability-APIs/20200428-215720
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: sh-randconfig-a001-20200428 (attached as .config)
compiler: sh4-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=sh 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   sh4-linux-ld: kernel/bpf/syscall.o: in function `bpf_raw_tp_link_fill_link_info':
>> kernel/bpf/syscall.c:2570: undefined reference to `__get_user_unknown'

vim +2570 kernel/bpf/syscall.c

  2547	
  2548	static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
  2549						  struct bpf_link_info *info,
  2550						  const struct bpf_link_info *uinfo,
  2551						  u32 info_len)
  2552	{
  2553		struct bpf_raw_tp_link *raw_tp_link =
  2554			container_of(link, struct bpf_raw_tp_link, link);
  2555		u64 ubuf_ptr;
  2556		char __user *ubuf = u64_to_user_ptr(uinfo->raw_tracepoint.tp_name);
  2557		const char *tp_name = raw_tp_link->btp->tp->name;
  2558		size_t tp_len;
  2559		u32 ulen;
  2560	
  2561		if (get_user(ulen, &uinfo->raw_tracepoint.tp_name_len))
  2562			return -EFAULT;
  2563		if (get_user(ubuf_ptr, &uinfo->raw_tracepoint.tp_name))
  2564			return -EFAULT;
  2565		ubuf = u64_to_user_ptr(ubuf_ptr);
  2566	
  2567		if (ulen && !ubuf)
  2568			return -EINVAL;
  2569		if (!ubuf)
> 2570			return 0;
  2571	
  2572		tp_len = strlen(raw_tp_link->btp->tp->name);
  2573		info->raw_tracepoint.tp_name_len = tp_len + 1;
  2574		info->raw_tracepoint.tp_name = (u64)(unsigned long)ubuf;
  2575	
  2576		if (ulen >= tp_len + 1) {
  2577			if (copy_to_user(ubuf, tp_name, tp_len + 1))
  2578				return -EFAULT;
  2579		} else {
  2580			char zero = '\0';
  2581	
  2582			if (copy_to_user(ubuf, tp_name, ulen - 1))
  2583				return -EFAULT;
  2584			if (put_user(zero, ubuf + ulen - 1))
  2585				return -EFAULT;
  2586			return -ENOSPC;
  2587		}
  2588	
  2589		return 0;
  2590	}
  2591	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25376 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link
  2020-04-28 20:38       ` Alexei Starovoitov
@ 2020-04-28 22:33         ` Andrii Nakryiko
  2020-04-28 22:43           ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28 22:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Tue, Apr 28, 2020 at 1:38 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Apr 28, 2020 at 11:56:52AM -0700, Andrii Nakryiko wrote:
> > On Tue, Apr 28, 2020 at 10:31 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Apr 27, 2020 at 10:49:36PM -0700, Andrii Nakryiko wrote:
> > > > +int bpf_link_settle(struct bpf_link_primer *primer)
> > > > +{
> > > > +     /* make bpf_link fetchable by ID */
> > > > +     WRITE_ONCE(primer->link->id, primer->id);
> > >
> > > what does WRITE_ONCE serve here?
> >
> > To prevent compiler reordering this write with fd_install. So that by
> > the time FD is exposed to user-space, link has properly set ID.
>
> if you wanted memory barrier then it should have been barrier(),
> but that wouldn't be enough, since patch 2 and 3 race to read and write
> that 32-bit int.
>
> > > bpf_link_settle can only be called at the end of attach.
> > > If attach is slow than parallel get_fd_by_id can get an new FD
> > > instance for link with zero id.
> > > In such case deref of link->id will race with above assignment?
> >
> > Yes, it does race, but it can either see zero and assume bpf_link is
> > not ready (which is fine to do) or will see correct link ID and will
> > proceed to create new FD for it. By the time we do context switch back
> > to user-space and return link FD, ID will definitely be visible due to
> > context switch and associated memory barriers. If anyone is guessing
> > FD and trying to create FD_BY_ID before LINK_CREATE syscall returns --
> > then returning failure due to link ID not yet set is totally fine,
> > IMO.
> >
> > > But I don't see READ_ONCE in patch 3.
> > > It's under link_idr_lock there.
> >
> > It doesn't need READ_ONCE because it does read under spinlock, so
> > compiler can't re-order it with code outside of spinlock.
>
> spin_lock in patch 3 doesn't guarantee that link->id deref in that patch
> will be atomic.

What do you mean by "atomic" here? Are you saying that we can get torn
read on u32 on some architectures? If that was the case, neither
WRITE_ONCE/READ_ONCE nor smp_write_release/smp_load_acquire would
help. But I don't think that's the case, we have code in verifier that
does similar racy u32 write/read (it uses READ_ONCE/WRITE_ONCE) and
seems to be working fine.

> So WRITE_ONCE in patch 2 into link->id is still racy with plain
> read in patch 3.
> Just wait and see kmsan complaining about it.
>
> > > How about grabbing link_idr_lock here as well ?
> > > otherwise it's still racy since WRITE_ONCE is not paired.
> >
> > As indicated above, seems unnecessary? But I also don't object
> > strongly, I don't expect this lock for links to be a major bottleneck
> > or anything like that.
>
> Either READ_ONCE has to be paired with WRITE_ONCE
> (or even better smp_load_acquire with smp_store_release)
> or use spin_lock.

Sure, let me use smp_load_acquite/smp_store_release.

>
> > >
> > > The mix of spin_lock_irqsave(&link_idr_lock)
> > > and spin_lock_bh(&link_idr_lock) looks weird.
> > > We do the same for map_idr because maps have complicated freeing logic,
> > > but prog_idr is consistent.
> > > If you see the need for irqsave variant then please use it in all cases.
> >
> > No, my bad, I don't see any need to intermix them. I'll stick to
> > spin_lock_bh, thanks for catching!
>
> I think that should be fine.
> Please double check that situation described in
> commit 930651a75bf1 ("bpf: do not disable/enable BH in bpf_map_free_id()")
> doesn't apply to link_idr.

If I understand what was the problem for BPF maps, we were taking lock
and trying to disable softirqs while softirqs were already disabled by
caller. This doesn't seem to be the case for links, as far as I can
tell. So I'll just go with spin_lock_bh() everywhere for consistency.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link
  2020-04-28 22:33         ` Andrii Nakryiko
@ 2020-04-28 22:43           ` Alexei Starovoitov
  2020-04-28 23:25             ` Andrii Nakryiko
  0 siblings, 1 reply; 25+ messages in thread
From: Alexei Starovoitov @ 2020-04-28 22:43 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Tue, Apr 28, 2020 at 03:33:07PM -0700, Andrii Nakryiko wrote:
> On Tue, Apr 28, 2020 at 1:38 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Apr 28, 2020 at 11:56:52AM -0700, Andrii Nakryiko wrote:
> > > On Tue, Apr 28, 2020 at 10:31 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Mon, Apr 27, 2020 at 10:49:36PM -0700, Andrii Nakryiko wrote:
> > > > > +int bpf_link_settle(struct bpf_link_primer *primer)
> > > > > +{
> > > > > +     /* make bpf_link fetchable by ID */
> > > > > +     WRITE_ONCE(primer->link->id, primer->id);
> > > >
> > > > what does WRITE_ONCE serve here?
> > >
> > > To prevent compiler reordering this write with fd_install. So that by
> > > the time FD is exposed to user-space, link has properly set ID.
> >
> > if you wanted memory barrier then it should have been barrier(),
> > but that wouldn't be enough, since patch 2 and 3 race to read and write
> > that 32-bit int.
> >
> > > > bpf_link_settle can only be called at the end of attach.
> > > > If attach is slow than parallel get_fd_by_id can get an new FD
> > > > instance for link with zero id.
> > > > In such case deref of link->id will race with above assignment?
> > >
> > > Yes, it does race, but it can either see zero and assume bpf_link is
> > > not ready (which is fine to do) or will see correct link ID and will
> > > proceed to create new FD for it. By the time we do context switch back
> > > to user-space and return link FD, ID will definitely be visible due to
> > > context switch and associated memory barriers. If anyone is guessing
> > > FD and trying to create FD_BY_ID before LINK_CREATE syscall returns --
> > > then returning failure due to link ID not yet set is totally fine,
> > > IMO.
> > >
> > > > But I don't see READ_ONCE in patch 3.
> > > > It's under link_idr_lock there.
> > >
> > > It doesn't need READ_ONCE because it does read under spinlock, so
> > > compiler can't re-order it with code outside of spinlock.
> >
> > spin_lock in patch 3 doesn't guarantee that link->id deref in that patch
> > will be atomic.
> 
> What do you mean by "atomic" here? Are you saying that we can get torn
> read on u32 on some architectures? 

compiler doesn't guarantee that plain 32-bit load/store will stay 32-bit
even on 64-bit archs.

> If that was the case, neither
> WRITE_ONCE/READ_ONCE nor smp_write_release/smp_load_acquire would
> help. 

what do you mean? They will. That's the point of these macros.

> But I don't think that's the case, we have code in verifier that
> does similar racy u32 write/read (it uses READ_ONCE/WRITE_ONCE) and
> seems to be working fine.

you mean in btf_resolve_helper_id() ?
What kind of race do you see there?

> > So WRITE_ONCE in patch 2 into link->id is still racy with plain
> > read in patch 3.
> > Just wait and see kmsan complaining about it.
> >
> > > > How about grabbing link_idr_lock here as well ?
> > > > otherwise it's still racy since WRITE_ONCE is not paired.
> > >
> > > As indicated above, seems unnecessary? But I also don't object
> > > strongly, I don't expect this lock for links to be a major bottleneck
> > > or anything like that.
> >
> > Either READ_ONCE has to be paired with WRITE_ONCE
> > (or even better smp_load_acquire with smp_store_release)
> > or use spin_lock.
> 
> Sure, let me use smp_load_acquite/smp_store_release.

Since there're locks in other places I would use spin_lock_bh
to update id as well.

> 
> >
> > > >
> > > > The mix of spin_lock_irqsave(&link_idr_lock)
> > > > and spin_lock_bh(&link_idr_lock) looks weird.
> > > > We do the same for map_idr because maps have complicated freeing logic,
> > > > but prog_idr is consistent.
> > > > If you see the need for irqsave variant then please use it in all cases.
> > >
> > > No, my bad, I don't see any need to intermix them. I'll stick to
> > > spin_lock_bh, thanks for catching!
> >
> > I think that should be fine.
> > Please double check that situation described in
> > commit 930651a75bf1 ("bpf: do not disable/enable BH in bpf_map_free_id()")
> > doesn't apply to link_idr.
> 
> If I understand what was the problem for BPF maps, we were taking lock
> and trying to disable softirqs while softirqs were already disabled by
> caller. This doesn't seem to be the case for links, as far as I can
> tell. So I'll just go with spin_lock_bh() everywhere for consistency.

Sounds good.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link
  2020-04-28 22:43           ` Alexei Starovoitov
@ 2020-04-28 23:25             ` Andrii Nakryiko
  2020-04-29  0:11               ` Alexei Starovoitov
  0 siblings, 1 reply; 25+ messages in thread
From: Andrii Nakryiko @ 2020-04-28 23:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Tue, Apr 28, 2020 at 3:43 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Apr 28, 2020 at 03:33:07PM -0700, Andrii Nakryiko wrote:
> > On Tue, Apr 28, 2020 at 1:38 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Apr 28, 2020 at 11:56:52AM -0700, Andrii Nakryiko wrote:
> > > > On Tue, Apr 28, 2020 at 10:31 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Mon, Apr 27, 2020 at 10:49:36PM -0700, Andrii Nakryiko wrote:
> > > > > > +int bpf_link_settle(struct bpf_link_primer *primer)
> > > > > > +{
> > > > > > +     /* make bpf_link fetchable by ID */
> > > > > > +     WRITE_ONCE(primer->link->id, primer->id);
> > > > >
> > > > > what does WRITE_ONCE serve here?
> > > >
> > > > To prevent compiler reordering this write with fd_install. So that by
> > > > the time FD is exposed to user-space, link has properly set ID.
> > >
> > > if you wanted memory barrier then it should have been barrier(),
> > > but that wouldn't be enough, since patch 2 and 3 race to read and write
> > > that 32-bit int.
> > >
> > > > > bpf_link_settle can only be called at the end of attach.
> > > > > If attach is slow than parallel get_fd_by_id can get an new FD
> > > > > instance for link with zero id.
> > > > > In such case deref of link->id will race with above assignment?
> > > >
> > > > Yes, it does race, but it can either see zero and assume bpf_link is
> > > > not ready (which is fine to do) or will see correct link ID and will
> > > > proceed to create new FD for it. By the time we do context switch back
> > > > to user-space and return link FD, ID will definitely be visible due to
> > > > context switch and associated memory barriers. If anyone is guessing
> > > > FD and trying to create FD_BY_ID before LINK_CREATE syscall returns --
> > > > then returning failure due to link ID not yet set is totally fine,
> > > > IMO.
> > > >
> > > > > But I don't see READ_ONCE in patch 3.
> > > > > It's under link_idr_lock there.
> > > >
> > > > It doesn't need READ_ONCE because it does read under spinlock, so
> > > > compiler can't re-order it with code outside of spinlock.
> > >
> > > spin_lock in patch 3 doesn't guarantee that link->id deref in that patch
> > > will be atomic.
> >
> > What do you mean by "atomic" here? Are you saying that we can get torn
> > read on u32 on some architectures?
>
> compiler doesn't guarantee that plain 32-bit load/store will stay 32-bit
> even on 64-bit archs.
>
> > If that was the case, neither
> > WRITE_ONCE/READ_ONCE nor smp_write_release/smp_load_acquire would
> > help.
>
> what do you mean? They will. That's the point of these macros.

According to Documentation/memory-barriers.txt,
smp_load_acquire/smp_store_release are about ordering and memory
barriers, not about guaranteeing atomicity of reading value.
Especially READ_ONCE/WRITE_ONCE which are volatile read/write, not
atomic read/write. But nevertheless, I'll do lock and this will become
moot.

>
> > But I don't think that's the case, we have code in verifier that
> > does similar racy u32 write/read (it uses READ_ONCE/WRITE_ONCE) and
> > seems to be working fine.
>
> you mean in btf_resolve_helper_id() ?
> What kind of race do you see there?

Two CPUs reading/writing to same variable without lock? Value starts
at 0 (meaning "not yet ready") and eventually becoming valid and final
non-zero value. Even if they race, and one CPU reads 0 while another
CPU already set it to non-zero, it's fine. In verifier's case it will
be eventually overwritten with the same resolved btf id. In case of
bpf_link, GET_FD_BY_ID would pretend link doesn't exist yet and return
error. Seems similar enough to me.

>
> > > So WRITE_ONCE in patch 2 into link->id is still racy with plain
> > > read in patch 3.
> > > Just wait and see kmsan complaining about it.
> > >
> > > > > How about grabbing link_idr_lock here as well ?
> > > > > otherwise it's still racy since WRITE_ONCE is not paired.
> > > >
> > > > As indicated above, seems unnecessary? But I also don't object
> > > > strongly, I don't expect this lock for links to be a major bottleneck
> > > > or anything like that.
> > >
> > > Either READ_ONCE has to be paired with WRITE_ONCE
> > > (or even better smp_load_acquire with smp_store_release)
> > > or use spin_lock.
> >
> > Sure, let me use smp_load_acquite/smp_store_release.
>
> Since there're locks in other places I would use spin_lock_bh
> to update id as well.

Sure, I'll do spin_lock_bh.

>
> >
> > >
> > > > >
> > > > > The mix of spin_lock_irqsave(&link_idr_lock)
> > > > > and spin_lock_bh(&link_idr_lock) looks weird.
> > > > > We do the same for map_idr because maps have complicated freeing logic,
> > > > > but prog_idr is consistent.
> > > > > If you see the need for irqsave variant then please use it in all cases.
> > > >
> > > > No, my bad, I don't see any need to intermix them. I'll stick to
> > > > spin_lock_bh, thanks for catching!
> > >
> > > I think that should be fine.
> > > Please double check that situation described in
> > > commit 930651a75bf1 ("bpf: do not disable/enable BH in bpf_map_free_id()")
> > > doesn't apply to link_idr.
> >
> > If I understand what was the problem for BPF maps, we were taking lock
> > and trying to disable softirqs while softirqs were already disabled by
> > caller. This doesn't seem to be the case for links, as far as I can
> > tell. So I'll just go with spin_lock_bh() everywhere for consistency.
>
> Sounds good.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link
  2020-04-28 23:25             ` Andrii Nakryiko
@ 2020-04-29  0:11               ` Alexei Starovoitov
  0 siblings, 0 replies; 25+ messages in thread
From: Alexei Starovoitov @ 2020-04-29  0:11 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Tue, Apr 28, 2020 at 04:25:39PM -0700, Andrii Nakryiko wrote:
> >
> > compiler doesn't guarantee that plain 32-bit load/store will stay 32-bit
> > even on 64-bit archs.
> >
> > > If that was the case, neither
> > > WRITE_ONCE/READ_ONCE nor smp_write_release/smp_load_acquire would
> > > help.
> >
> > what do you mean? They will. That's the point of these macros.
> 
> According to Documentation/memory-barriers.txt,
> smp_load_acquire/smp_store_release are about ordering and memory
> barriers, not about guaranteeing atomicity of reading value.
> Especially READ_ONCE/WRITE_ONCE which are volatile read/write, not
> atomic read/write. But nevertheless, I'll do lock and this will become
> moot.

May be that's something for Paul to clarify in the doc?
smp_load_acquire() is READ_ONCE() + smp_mb() unoptimized in general case.
And READ_ONCE + barrier on x86.

> >
> > > But I don't think that's the case, we have code in verifier that
> > > does similar racy u32 write/read (it uses READ_ONCE/WRITE_ONCE) and
> > > seems to be working fine.
> >
> > you mean in btf_resolve_helper_id() ?
> > What kind of race do you see there?
> 
> Two CPUs reading/writing to same variable without lock? Value starts
> at 0 (meaning "not yet ready") and eventually becoming valid and final
> non-zero value. Even if they race, and one CPU reads 0 while another
> CPU already set it to non-zero, it's fine. In verifier's case it will
> be eventually overwritten with the same resolved btf id. In case of
> bpf_link, GET_FD_BY_ID would pretend link doesn't exist yet and return
> error. Seems similar enough to me.

ahh. similar in the sense that only one value is written.
it's either zero or whatever_that_id. Right.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-04-29  0:11 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-28  5:49 [PATCH v2 bpf-next 00/10] bpf_link observability APIs Andrii Nakryiko
2020-04-28  5:49 ` [PATCH v2 bpf-next 01/10] bpf: refactor bpf_link update handling Andrii Nakryiko
2020-04-28  5:49 ` [PATCH v2 bpf-next 02/10] bpf: allocate ID for bpf_link Andrii Nakryiko
2020-04-28 17:31   ` Alexei Starovoitov
2020-04-28 18:56     ` Andrii Nakryiko
2020-04-28 20:38       ` Alexei Starovoitov
2020-04-28 22:33         ` Andrii Nakryiko
2020-04-28 22:43           ` Alexei Starovoitov
2020-04-28 23:25             ` Andrii Nakryiko
2020-04-29  0:11               ` Alexei Starovoitov
2020-04-28  5:49 ` [PATCH v2 bpf-next 03/10] bpf: support GET_FD_BY_ID and GET_NEXT_ID " Andrii Nakryiko
2020-04-28  5:49 ` [PATCH v2 bpf-next 04/10] bpf: add support for BPF_OBJ_GET_INFO_BY_FD " Andrii Nakryiko
2020-04-28  9:46   ` Toke Høiland-Jørgensen
2020-04-28 16:27     ` Andrii Nakryiko
2020-04-28 18:31       ` Toke Høiland-Jørgensen
2020-04-28 21:55   ` kbuild test robot
2020-04-28  5:49 ` [PATCH v2 bpf-next 05/10] libbpf: add low-level APIs for new bpf_link commands Andrii Nakryiko
2020-04-28  5:49 ` [PATCH v2 bpf-next 06/10] selftests/bpf: test bpf_link's get_next_id, get_fd_by_id, and get_obj_info Andrii Nakryiko
2020-04-28  5:49 ` [PATCH v2 bpf-next 07/10] bpftool: expose attach_type-to-string array to non-cgroup code Andrii Nakryiko
2020-04-28  8:22   ` Quentin Monnet
2020-04-28  5:49 ` [PATCH v2 bpf-next 08/10] bpftool: add bpf_link show and pin support Andrii Nakryiko
2020-04-28  8:23   ` Quentin Monnet
2020-04-28  5:49 ` [PATCH v2 bpf-next 09/10] bpftool: add bpftool-link manpage Andrii Nakryiko
2020-04-28  8:23   ` Quentin Monnet
2020-04-28  5:49 ` [PATCH v2 bpf-next 10/10] bpftool: add link bash completions Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).