bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH bpf-next 0/8] bpf_link observability APIs
@ 2020-04-04  0:09 Andrii Nakryiko
  2020-04-04  0:09 ` [RFC PATCH bpf-next 1/8] bpf: refactor bpf_link update handling Andrii Nakryiko
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-04  0:09 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

This patch series adds various observability APIs to bpf_link:
  - each bpf_link now gets ID, similar to bpf_map and bpf_prog, by which
    user-space can iterate over all existing bpf_links and create limited FD
    from ID;
  - allows to get extra object information with bpf_link general and
    type-specific information;
  - makes LINK_UPDATE operation allowed only for writable bpf_links and allows
    to pin bpf_link as read-only file;
  - implements `bpf link show` command which lists all active bpf_links in the
    system;
  - implements `bpf link pin` allowing to pin bpf_link by ID or from other
    pinned path.

This RFC series is missing selftests and only limited amount of manual testing
was performed. But kernel implementation is hopefully in a good shape and
won't change much (unless some big issues are identified with the current
approach). It would be great to get feedback on approach and implementation,
before I invest more time in writing tests.

Andrii Nakryiko (8):
  bpf: refactor bpf_link update handling
  bpf: allow bpf_link pinning as read-only and enforce LINK_UPDATE
  bpf: allocate ID for bpf_link
  bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  libbpf: add low-level APIs for new bpf_link commands
  bpftool: expose attach_type-to-string array to non-cgroup code
  bpftool: add bpf_link show and pin support

 include/linux/bpf-cgroup.h     |  14 --
 include/linux/bpf.h            |  34 ++-
 include/linux/bpf_types.h      |   6 +
 include/uapi/linux/bpf.h       |  31 +++
 kernel/bpf/btf.c               |   2 +
 kernel/bpf/cgroup.c            |  89 +++++++-
 kernel/bpf/inode.c             |  30 ++-
 kernel/bpf/syscall.c           | 387 +++++++++++++++++++++++++------
 kernel/bpf/verifier.c          |   2 +
 kernel/cgroup/cgroup.c         |  27 ---
 tools/bpf/bpftool/cgroup.c     |  28 +--
 tools/bpf/bpftool/link.c       | 403 +++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/main.c       |   2 +
 tools/bpf/bpftool/main.h       |  37 +++
 tools/include/uapi/linux/bpf.h |  31 +++
 tools/lib/bpf/bpf.c            |  19 +-
 tools/lib/bpf/bpf.h            |   4 +-
 tools/lib/bpf/libbpf.map       |   6 +
 18 files changed, 983 insertions(+), 169 deletions(-)
 create mode 100644 tools/bpf/bpftool/link.c

-- 
2.24.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 1/8] bpf: refactor bpf_link update handling
  2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
@ 2020-04-04  0:09 ` Andrii Nakryiko
  2020-04-04  0:09 ` [RFC PATCH bpf-next 2/8] bpf: allow bpf_link pinning as read-only and enforce LINK_UPDATE Andrii Nakryiko
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-04  0:09 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Make bpf_link update support more generic by making it into another
bpf_link_ops methods. This allows generic syscall handling code to be agnostic
to various conditionally compiled features (e.g., the case of
CONFIG_CGROUP_BPF). This also allows to keep link type-specific code to remain
static within respective code base. Refactor existing bpf_cgroup_link code and
take advantage of this.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf-cgroup.h | 12 ------------
 include/linux/bpf.h        |  3 ++-
 kernel/bpf/cgroup.c        | 30 ++++++++++++++++++++++++++++--
 kernel/bpf/syscall.c       | 11 ++++-------
 kernel/cgroup/cgroup.c     | 27 ---------------------------
 5 files changed, 34 insertions(+), 49 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index c11b413d5b1a..d2d969669564 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -100,8 +100,6 @@ int __cgroup_bpf_attach(struct cgroup *cgrp,
 int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 			struct bpf_cgroup_link *link,
 			enum bpf_attach_type type);
-int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
-			 struct bpf_prog *new_prog);
 int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		       union bpf_attr __user *uattr);
 
@@ -112,8 +110,6 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
 		      u32 flags);
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 		      enum bpf_attach_type type);
-int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *old_prog,
-		       struct bpf_prog *new_prog);
 int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
 		     union bpf_attr __user *uattr);
 
@@ -354,7 +350,6 @@ int cgroup_bpf_prog_query(const union bpf_attr *attr,
 #else
 
 struct bpf_prog;
-struct bpf_link;
 struct cgroup_bpf {};
 static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; }
 static inline void cgroup_bpf_offline(struct cgroup *cgrp) {}
@@ -378,13 +373,6 @@ static inline int cgroup_bpf_link_attach(const union bpf_attr *attr,
 	return -EINVAL;
 }
 
-static inline int cgroup_bpf_replace(struct bpf_link *link,
-				     struct bpf_prog *old_prog,
-				     struct bpf_prog *new_prog)
-{
-	return -EINVAL;
-}
-
 static inline int cgroup_bpf_prog_query(const union bpf_attr *attr,
 					union bpf_attr __user *uattr)
 {
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fd2b2322412d..ea65c3165e4c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1093,7 +1093,8 @@ struct bpf_link {
 struct bpf_link_ops {
 	void (*release)(struct bpf_link *link);
 	void (*dealloc)(struct bpf_link *link);
-
+	int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
+			   struct bpf_prog *old_prog);
 };
 
 void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index cb305e71e7de..54eacc44d1e4 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -557,8 +557,9 @@ static void replace_effective_prog(struct cgroup *cgrp,
  *
  * Must be called with cgroup_mutex held.
  */
-int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
-			 struct bpf_prog *new_prog)
+static int __cgroup_bpf_replace(struct cgroup *cgrp,
+				struct bpf_cgroup_link *link,
+				struct bpf_prog *new_prog)
 {
 	struct list_head *progs = &cgrp->bpf.progs[link->type];
 	struct bpf_prog *old_prog;
@@ -583,6 +584,30 @@ int __cgroup_bpf_replace(struct cgroup *cgrp, struct bpf_cgroup_link *link,
 	return 0;
 }
 
+static int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *new_prog,
+			      struct bpf_prog *old_prog)
+{
+	struct bpf_cgroup_link *cg_link;
+	int ret;
+
+	cg_link = container_of(link, struct bpf_cgroup_link, link);
+
+	mutex_lock(&cgroup_mutex);
+	/* link might have been auto-released by dying cgroup, so fail */
+	if (!cg_link->cgroup) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	if (old_prog && link->prog != old_prog) {
+		ret = -EPERM;
+		goto out_unlock;
+	}
+	ret = __cgroup_bpf_replace(cg_link->cgroup, cg_link, new_prog);
+out_unlock:
+	mutex_unlock(&cgroup_mutex);
+	return ret;
+}
+
 static struct bpf_prog_list *find_detach_entry(struct list_head *progs,
 					       struct bpf_prog *prog,
 					       struct bpf_cgroup_link *link,
@@ -811,6 +836,7 @@ static void bpf_cgroup_link_dealloc(struct bpf_link *link)
 const struct bpf_link_ops bpf_cgroup_link_lops = {
 	.release = bpf_cgroup_link_release,
 	.dealloc = bpf_cgroup_link_dealloc,
+	.update_prog = cgroup_bpf_replace,
 };
 
 int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64783da34202..40993d8c936e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3642,13 +3642,10 @@ static int link_update(union bpf_attr *attr)
 		}
 	}
 
-#ifdef CONFIG_CGROUP_BPF
-	if (link->ops == &bpf_cgroup_link_lops) {
-		ret = cgroup_bpf_replace(link, old_prog, new_prog);
-		goto out_put_progs;
-	}
-#endif
-	ret = -EINVAL;
+	if (link->ops->update_prog)
+		ret = link->ops->update_prog(link, new_prog, old_prog);
+	else
+		ret = EINVAL;
 
 out_put_progs:
 	if (old_prog)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 915dda3f7f19..219624fba9ba 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6317,33 +6317,6 @@ int cgroup_bpf_attach(struct cgroup *cgrp,
 	return ret;
 }
 
-int cgroup_bpf_replace(struct bpf_link *link, struct bpf_prog *old_prog,
-		       struct bpf_prog *new_prog)
-{
-	struct bpf_cgroup_link *cg_link;
-	int ret;
-
-	if (link->ops != &bpf_cgroup_link_lops)
-		return -EINVAL;
-
-	cg_link = container_of(link, struct bpf_cgroup_link, link);
-
-	mutex_lock(&cgroup_mutex);
-	/* link might have been auto-released by dying cgroup, so fail */
-	if (!cg_link->cgroup) {
-		ret = -EINVAL;
-		goto out_unlock;
-	}
-	if (old_prog && link->prog != old_prog) {
-		ret = -EPERM;
-		goto out_unlock;
-	}
-	ret = __cgroup_bpf_replace(cg_link->cgroup, cg_link, new_prog);
-out_unlock:
-	mutex_unlock(&cgroup_mutex);
-	return ret;
-}
-
 int cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 		      enum bpf_attach_type type)
 {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 2/8] bpf: allow bpf_link pinning as read-only and enforce LINK_UPDATE
  2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
  2020-04-04  0:09 ` [RFC PATCH bpf-next 1/8] bpf: refactor bpf_link update handling Andrii Nakryiko
@ 2020-04-04  0:09 ` Andrii Nakryiko
  2020-04-04  0:09 ` [RFC PATCH bpf-next 3/8] bpf: allocate ID for bpf_link Andrii Nakryiko
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-04  0:09 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Make it possible to pin bpf_link as read-only to control whether LINK_UPDATE
operation is allowed or not. bpf_links provided through read-only files are
not allowed to perform LINK_UPDATE operations, which this patch starts
enforcing. bpf_map and bpf_prog are still always treated as read-write ones,
just like before.

This is a critical property for bpf_links and is going to be relied upon for
BPF_LINK_GET_FD_BY_ID operation implemented later in the series. GET_FD_BY_ID
will only return read-only links to prevent processes that do not "own"
bpf_link from updating underlying bpf_prog.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf.h  |  6 +++---
 kernel/bpf/inode.c   | 30 ++++++++++++++++++++++--------
 kernel/bpf/syscall.c | 26 +++++++++++++++++++-------
 3 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ea65c3165e4c..3474f8e34a63 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1103,11 +1103,11 @@ void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
 		      int link_fd);
 void bpf_link_inc(struct bpf_link *link);
 void bpf_link_put(struct bpf_link *link);
-int bpf_link_new_fd(struct bpf_link *link);
+int bpf_link_new_fd(struct bpf_link *link, int flags);
 struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd);
-struct bpf_link *bpf_link_get_from_fd(u32 ufd);
+struct bpf_link *bpf_link_get_from_fd(u32 ufd, fmode_t *link_mode);
 
-int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
+int bpf_obj_pin_user(u32 ufd, const char __user *pathname, int file_flags);
 int bpf_obj_get_user(const char __user *pathname, int flags);
 
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 95087d9f4ed3..3fd71c1e3c33 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -66,23 +66,25 @@ static void bpf_any_put(void *raw, enum bpf_type type)
 	}
 }
 
-static void *bpf_fd_probe_obj(u32 ufd, enum bpf_type *type)
+static void *bpf_fd_probe_obj(u32 ufd, enum bpf_type *type, fmode_t *file_mode)
 {
 	void *raw;
 
 	raw = bpf_map_get_with_uref(ufd);
 	if (!IS_ERR(raw)) {
 		*type = BPF_TYPE_MAP;
+		*file_mode = O_RDWR;
 		return raw;
 	}
 
 	raw = bpf_prog_get(ufd);
 	if (!IS_ERR(raw)) {
 		*type = BPF_TYPE_PROG;
+		*file_mode = O_RDWR;
 		return raw;
 	}
 
-	raw = bpf_link_get_from_fd(ufd);
+	raw = bpf_link_get_from_fd(ufd, file_mode);
 	if (!IS_ERR(raw)) {
 		*type = BPF_TYPE_LINK;
 		return raw;
@@ -407,7 +409,7 @@ static const struct inode_operations bpf_dir_iops = {
 };
 
 static int bpf_obj_do_pin(const char __user *pathname, void *raw,
-			  enum bpf_type type)
+			  enum bpf_type type, fmode_t file_mode)
 {
 	struct dentry *dentry;
 	struct inode *dir;
@@ -419,7 +421,7 @@ static int bpf_obj_do_pin(const char __user *pathname, void *raw,
 	if (IS_ERR(dentry))
 		return PTR_ERR(dentry);
 
-	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
+	mode = S_IFREG | (ACC_MODE(file_mode) & ~current_umask());
 
 	ret = security_path_mknod(&path, dentry, mode, 0);
 	if (ret)
@@ -449,17 +451,29 @@ static int bpf_obj_do_pin(const char __user *pathname, void *raw,
 	return ret;
 }
 
-int bpf_obj_pin_user(u32 ufd, const char __user *pathname)
+int bpf_obj_pin_user(u32 ufd, const char __user *pathname, int file_flags)
 {
 	enum bpf_type type;
+	fmode_t file_mode;
 	void *raw;
 	int ret;
 
-	raw = bpf_fd_probe_obj(ufd, &type);
+	raw = bpf_fd_probe_obj(ufd, &type, &file_mode);
 	if (IS_ERR(raw))
 		return PTR_ERR(raw);
 
-	ret = bpf_obj_do_pin(pathname, raw, type);
+	if ((type == BPF_TYPE_MAP || type == BPF_TYPE_PROG) && file_flags)
+		return -EINVAL;
+
+	/* requested pinned file mode has to be a valid subset */
+	if (!file_flags) {
+		file_flags = file_mode;
+	} else if ((file_mode & file_flags) != file_flags) {
+		bpf_any_put(raw, type);
+		return -EPERM;
+	}
+
+	ret = bpf_obj_do_pin(pathname, raw, type, file_flags);
 	if (ret != 0)
 		bpf_any_put(raw, type);
 
@@ -518,7 +532,7 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
 	else if (type == BPF_TYPE_MAP)
 		ret = bpf_map_new_fd(raw, f_flags);
 	else if (type == BPF_TYPE_LINK)
-		ret = bpf_link_new_fd(raw);
+		ret = bpf_link_new_fd(raw, f_flags);
 	else
 		return -ENOENT;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 40993d8c936e..47f323901ed9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2167,10 +2167,11 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 
 static int bpf_obj_pin(const union bpf_attr *attr)
 {
-	if (CHECK_ATTR(BPF_OBJ) || attr->file_flags != 0)
+	if (CHECK_ATTR(BPF_OBJ))
 		return -EINVAL;
 
-	return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname));
+	return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname),
+				attr->file_flags);
 }
 
 static int bpf_obj_get(const union bpf_attr *attr)
@@ -2294,9 +2295,10 @@ const struct file_operations bpf_link_fops = {
 	.write		= bpf_dummy_write,
 };
 
-int bpf_link_new_fd(struct bpf_link *link)
+int bpf_link_new_fd(struct bpf_link *link, int flags)
 {
-	return anon_inode_getfd("bpf-link", &bpf_link_fops, link, O_CLOEXEC);
+	return anon_inode_getfd("bpf-link", &bpf_link_fops, link,
+				flags | O_CLOEXEC);
 }
 
 /* Similar to bpf_link_new_fd, create anon_inode for given bpf_link, but
@@ -2316,7 +2318,8 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd)
 	if (fd < 0)
 		return ERR_PTR(fd);
 
-	file = anon_inode_getfile("bpf_link", &bpf_link_fops, link, O_CLOEXEC);
+	file = anon_inode_getfile("bpf_link", &bpf_link_fops, link,
+				  O_RDWR | O_CLOEXEC);
 	if (IS_ERR(file)) {
 		put_unused_fd(fd);
 		return file;
@@ -2326,7 +2329,7 @@ struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd)
 	return file;
 }
 
-struct bpf_link *bpf_link_get_from_fd(u32 ufd)
+struct bpf_link *bpf_link_get_from_fd(u32 ufd, fmode_t *link_mode)
 {
 	struct fd f = fdget(ufd);
 	struct bpf_link *link;
@@ -2340,6 +2343,8 @@ struct bpf_link *bpf_link_get_from_fd(u32 ufd)
 
 	link = f.file->private_data;
 	bpf_link_inc(link);
+	if (link_mode)
+		*link_mode = f.file->f_mode;
 	fdput(f);
 
 	return link;
@@ -3612,6 +3617,7 @@ static int link_update(union bpf_attr *attr)
 {
 	struct bpf_prog *old_prog = NULL, *new_prog;
 	struct bpf_link *link;
+	fmode_t link_mode;
 	u32 flags;
 	int ret;
 
@@ -3625,10 +3631,16 @@ static int link_update(union bpf_attr *attr)
 	if (flags & ~BPF_F_REPLACE)
 		return -EINVAL;
 
-	link = bpf_link_get_from_fd(attr->link_update.link_fd);
+	link = bpf_link_get_from_fd(attr->link_update.link_fd, &link_mode);
 	if (IS_ERR(link))
 		return PTR_ERR(link);
 
+	/* read-only link references are not allowed to perform LINK_UPDATE */
+	if (!(link_mode & O_WRONLY)) {
+		bpf_link_put(link);
+		return -EACCES;
+	}
+
 	new_prog = bpf_prog_get(attr->link_update.new_prog_fd);
 	if (IS_ERR(new_prog))
 		return PTR_ERR(new_prog);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 3/8] bpf: allocate ID for bpf_link
  2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
  2020-04-04  0:09 ` [RFC PATCH bpf-next 1/8] bpf: refactor bpf_link update handling Andrii Nakryiko
  2020-04-04  0:09 ` [RFC PATCH bpf-next 2/8] bpf: allow bpf_link pinning as read-only and enforce LINK_UPDATE Andrii Nakryiko
@ 2020-04-04  0:09 ` Andrii Nakryiko
  2020-04-04  0:09 ` [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID " Andrii Nakryiko
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-04  0:09 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Generate ID for each bpf_link using IDR, similarly to bpf_map and bpf_prog.
bpf_link creation, initialization, attachment, and exposing to user-space
through FD and ID is a complicated multi-step process, abstract it away
through bpf_link_primer and bpf_link_prime(), bpf_link_settle(), and
bpf_link_cleanup() internal API. They guarantee that until bpf_link is
properly attached, user-space won't be able to access partially-initialized
bpf_link either from FD or ID. All this allows to simplify bpf_link attachment
and error handling code.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf.h      |  17 +++--
 include/uapi/linux/bpf.h |   1 +
 kernel/bpf/cgroup.c      |  14 ++--
 kernel/bpf/syscall.c     | 140 +++++++++++++++++++++++++++------------
 4 files changed, 116 insertions(+), 56 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3474f8e34a63..67ce74890911 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1085,11 +1085,19 @@ int bpf_prog_new_fd(struct bpf_prog *prog);
 
 struct bpf_link {
 	atomic64_t refcnt;
+	u32 id;
 	const struct bpf_link_ops *ops;
 	struct bpf_prog *prog;
 	struct work_struct work;
 };
 
+struct bpf_link_primer {
+	struct bpf_link *link;
+	struct file *file;
+	int fd;
+	u32 id;
+};
+
 struct bpf_link_ops {
 	void (*release)(struct bpf_link *link);
 	void (*dealloc)(struct bpf_link *link);
@@ -1097,10 +1105,11 @@ struct bpf_link_ops {
 			   struct bpf_prog *old_prog);
 };
 
-void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
-		   struct bpf_prog *prog);
-void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
-		      int link_fd);
+void bpf_link_init(struct bpf_link *link,
+		   const struct bpf_link_ops *ops, struct bpf_prog *prog);
+int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
+int bpf_link_settle(struct bpf_link_primer *primer);
+void bpf_link_cleanup(struct bpf_link_primer *primer);
 void bpf_link_inc(struct bpf_link *link);
 void bpf_link_put(struct bpf_link *link);
 int bpf_link_new_fd(struct bpf_link *link, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2e29a671d67e..eccfd1dea951 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -523,6 +523,7 @@ union bpf_attr {
 			__u32		prog_id;
 			__u32		map_id;
 			__u32		btf_id;
+			__u32		link_id;
 		};
 		__u32		next_id;
 		__u32		open_flags;
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 54eacc44d1e4..ae84c5c90631 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -841,10 +841,10 @@ const struct bpf_link_ops bpf_cgroup_link_lops = {
 
 int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
+	struct bpf_link_primer link_primer;
 	struct bpf_cgroup_link *link;
-	struct file *link_file;
 	struct cgroup *cgrp;
-	int err, link_fd;
+	int err;
 
 	if (attr->link_create.flags)
 		return -EINVAL;
@@ -862,22 +862,20 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	link->cgroup = cgrp;
 	link->type = attr->link_create.attach_type;
 
-	link_file = bpf_link_new_file(&link->link, &link_fd);
-	if (IS_ERR(link_file)) {
+	err  = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
 		kfree(link);
-		err = PTR_ERR(link_file);
 		goto out_put_cgroup;
 	}
 
 	err = cgroup_bpf_attach(cgrp, NULL, NULL, link, link->type,
 				BPF_F_ALLOW_MULTI);
 	if (err) {
-		bpf_link_cleanup(&link->link, link_file, link_fd);
+		bpf_link_cleanup(&link_primer);
 		goto out_put_cgroup;
 	}
 
-	fd_install(link_fd, link_file);
-	return link_fd;
+	return bpf_link_settle(&link_primer);
 
 out_put_cgroup:
 	cgroup_put(cgrp);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 47f323901ed9..8b3a7d5814ae 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -42,6 +42,8 @@ static DEFINE_IDR(prog_idr);
 static DEFINE_SPINLOCK(prog_idr_lock);
 static DEFINE_IDR(map_idr);
 static DEFINE_SPINLOCK(map_idr_lock);
+static DEFINE_IDR(link_idr);
+static DEFINE_SPINLOCK(link_idr_lock);
 
 int sysctl_unprivileged_bpf_disabled __read_mostly;
 
@@ -2184,25 +2186,39 @@ static int bpf_obj_get(const union bpf_attr *attr)
 				attr->file_flags);
 }
 
-void bpf_link_init(struct bpf_link *link, const struct bpf_link_ops *ops,
-		   struct bpf_prog *prog)
+void bpf_link_init(struct bpf_link *link,
+		   const struct bpf_link_ops *ops, struct bpf_prog *prog)
 {
 	atomic64_set(&link->refcnt, 1);
+	link->id = 0;
 	link->ops = ops;
 	link->prog = prog;
 }
 
+static void bpf_link_free_id(struct bpf_link *link)
+{
+	unsigned long flags;
+
+	if (!link->id)
+		return;
+
+	spin_lock_irqsave(&link_idr_lock, flags);
+	idr_remove(&link_idr, link->id);
+	link->id = 0;
+	spin_unlock_irqrestore(&link_idr_lock, flags);
+}
+
 /* Clean up bpf_link and corresponding anon_inode file and FD. After
  * anon_inode is created, bpf_link can't be just kfree()'d due to deferred
  * anon_inode's release() call. This helper manages marking bpf_link as
  * defunct, releases anon_inode file and puts reserved FD.
  */
-void bpf_link_cleanup(struct bpf_link *link, struct file *link_file,
-		      int link_fd)
+void bpf_link_cleanup(struct bpf_link_primer *primer)
 {
-	link->prog = NULL;
-	fput(link_file);
-	put_unused_fd(link_fd);
+	primer->link->prog = NULL;
+	bpf_link_free_id(primer->link);
+	fput(primer->file);
+	put_unused_fd(primer->fd);
 }
 
 void bpf_link_inc(struct bpf_link *link)
@@ -2213,6 +2229,7 @@ void bpf_link_inc(struct bpf_link *link)
 /* bpf_link_free is guaranteed to be called from process context */
 static void bpf_link_free(struct bpf_link *link)
 {
+	bpf_link_free_id(link);
 	if (link->prog) {
 		/* detach BPF program, clean up used resources */
 		link->ops->release(link);
@@ -2278,9 +2295,11 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
 	seq_printf(m,
 		   "link_type:\t%s\n"
+		   "link_id:\t%u\n"
 		   "prog_tag:\t%s\n"
 		   "prog_id:\t%u\n",
 		   link_type,
+		   link->id,
 		   prog_tag,
 		   prog->aux->id);
 }
@@ -2295,38 +2314,76 @@ const struct file_operations bpf_link_fops = {
 	.write		= bpf_dummy_write,
 };
 
-int bpf_link_new_fd(struct bpf_link *link, int flags)
+static int bpf_link_alloc_id(struct bpf_link *link)
 {
-	return anon_inode_getfd("bpf-link", &bpf_link_fops, link,
-				flags | O_CLOEXEC);
-}
+	int id;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock_bh(&link_idr_lock);
+	id = idr_alloc_cyclic(&link_idr, link, 1, INT_MAX, GFP_ATOMIC);
+	spin_unlock_bh(&link_idr_lock);
+	idr_preload_end();
 
-/* Similar to bpf_link_new_fd, create anon_inode for given bpf_link, but
- * instead of immediately installing fd in fdtable, just reserve it and
- * return. Caller then need to either install it with fd_install(fd, file) or
- * release with put_unused_fd(fd).
- * This is useful for cases when bpf_link attachment/detachment are
- * complicated and expensive operations and should be delayed until all the fd
- * reservation and anon_inode creation succeeds.
+	return id;
+}
+
+/* Prepare bpf_link to be exposed to user-space by allocating anon_inode file,
+ * reserving unused FD and allocating ID from link_idr. This is to be paired
+ * with bpf_link_settle() to install FD and ID and expose bpf_link to
+ * user-space, if bpf_link is successfully attached. If not, bpf_link and
+ * pre-allocated resources are to be freed with bpf_cleanup() call. All the
+ * transient state is passed around in struct bpf_link_primer.
+ * This is preferred way to create and initialize bpf_link, especially when
+ * there are complicated and expensive operations inbetween creating bpf_link
+ * itself and attaching it to BPF hook. By using bpf_link_prime() and
+ * bpf_link_settle() kernel code using bpf_link doesn't have to perform
+ * expensive (and potentially failing) roll back operations in a rare case
+ * that file, FD, or ID can't be allocated.
  */
-struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd)
+int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer)
 {
 	struct file *file;
-	int fd;
+	int fd, id;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0)
-		return ERR_PTR(fd);
+		return fd;
 
 	file = anon_inode_getfile("bpf_link", &bpf_link_fops, link,
 				  O_RDWR | O_CLOEXEC);
 	if (IS_ERR(file)) {
 		put_unused_fd(fd);
-		return file;
+		return PTR_ERR(file);
+	}
+
+	id = bpf_link_alloc_id(link);
+	if (id < 0) {
+		put_unused_fd(fd);
+		fput(file);
+		return id;
 	}
 
-	*reserved_fd = fd;
-	return file;
+	primer->link = link;
+	primer->file = file;
+	primer->fd = fd;
+	primer->id = id;
+	return 0;
+}
+
+int bpf_link_settle(struct bpf_link_primer *primer)
+{
+	/* make bpf_link fetchable by ID */
+	WRITE_ONCE(primer->link->id, primer->id);
+	/* make bpf_link fetchable by FD */
+	fd_install(primer->fd, primer->file);
+	/* pass through installed FD */
+	return primer->fd;
+}
+
+int bpf_link_new_fd(struct bpf_link *link, int flags)
+{
+	return anon_inode_getfd("bpf-link", &bpf_link_fops, link,
+				flags | O_CLOEXEC);
 }
 
 struct bpf_link *bpf_link_get_from_fd(u32 ufd, fmode_t *link_mode)
@@ -2374,9 +2431,9 @@ static const struct bpf_link_ops bpf_tracing_link_lops = {
 
 static int bpf_tracing_prog_attach(struct bpf_prog *prog)
 {
+	struct bpf_link_primer link_primer;
 	struct bpf_tracing_link *link;
-	struct file *link_file;
-	int link_fd, err;
+	int err;
 
 	switch (prog->type) {
 	case BPF_PROG_TYPE_TRACING:
@@ -2411,22 +2468,19 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
 	}
 	bpf_link_init(&link->link, &bpf_tracing_link_lops, prog);
 
-	link_file = bpf_link_new_file(&link->link, &link_fd);
-	if (IS_ERR(link_file)) {
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
 		kfree(link);
-		err = PTR_ERR(link_file);
 		goto out_put_prog;
 	}
 
 	err = bpf_trampoline_link_prog(prog);
 	if (err) {
-		bpf_link_cleanup(&link->link, link_file, link_fd);
+		bpf_link_cleanup(&link_primer);
 		goto out_put_prog;
 	}
 
-	fd_install(link_fd, link_file);
-	return link_fd;
-
+	return bpf_link_settle(&link_primer);
 out_put_prog:
 	bpf_prog_put(prog);
 	return err;
@@ -2454,7 +2508,7 @@ static void bpf_raw_tp_link_dealloc(struct bpf_link *link)
 	kfree(raw_tp);
 }
 
-static const struct bpf_link_ops bpf_raw_tp_lops = {
+static const struct bpf_link_ops bpf_raw_tp_link_lops = {
 	.release = bpf_raw_tp_link_release,
 	.dealloc = bpf_raw_tp_link_dealloc,
 };
@@ -2463,13 +2517,13 @@ static const struct bpf_link_ops bpf_raw_tp_lops = {
 
 static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 {
+	struct bpf_link_primer link_primer;
 	struct bpf_raw_tp_link *link;
 	struct bpf_raw_event_map *btp;
-	struct file *link_file;
 	struct bpf_prog *prog;
 	const char *tp_name;
 	char buf[128];
-	int link_fd, err;
+	int err;
 
 	if (CHECK_ATTR(BPF_RAW_TRACEPOINT_OPEN))
 		return -EINVAL;
@@ -2522,24 +2576,22 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 		err = -ENOMEM;
 		goto out_put_btp;
 	}
-	bpf_link_init(&link->link, &bpf_raw_tp_lops, prog);
+	bpf_link_init(&link->link, &bpf_raw_tp_link_lops, prog);
 	link->btp = btp;
 
-	link_file = bpf_link_new_file(&link->link, &link_fd);
-	if (IS_ERR(link_file)) {
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
 		kfree(link);
-		err = PTR_ERR(link_file);
 		goto out_put_btp;
 	}
 
 	err = bpf_probe_register(link->btp, prog);
 	if (err) {
-		bpf_link_cleanup(&link->link, link_file, link_fd);
+		bpf_link_cleanup(&link_primer);
 		goto out_put_btp;
 	}
 
-	fd_install(link_fd, link_file);
-	return link_fd;
+	return bpf_link_settle(&link_primer);
 
 out_put_btp:
 	bpf_put_raw_tracepoint(btp);
@@ -3471,7 +3523,7 @@ static int bpf_task_fd_query(const union bpf_attr *attr,
 	if (file->f_op == &bpf_link_fops) {
 		struct bpf_link *link = file->private_data;
 
-		if (link->ops == &bpf_raw_tp_lops) {
+		if (link->ops == &bpf_raw_tp_link_lops) {
 			struct bpf_raw_tp_link *raw_tp =
 				container_of(link, struct bpf_raw_tp_link, link);
 			struct bpf_raw_event_map *btp = raw_tp->btp;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2020-04-04  0:09 ` [RFC PATCH bpf-next 3/8] bpf: allocate ID for bpf_link Andrii Nakryiko
@ 2020-04-04  0:09 ` Andrii Nakryiko
  2020-04-06 11:34   ` Toke Høiland-Jørgensen
  2020-04-04  0:09 ` [RFC PATCH bpf-next 5/8] bpf: add support for BPF_OBJ_GET_INFO_BY_FD " Andrii Nakryiko
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-04  0:09 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add support to look up bpf_link by ID and iterate over all existing bpf_links
in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
done as the very last step in finalizing bpf_link, together with installing
FD. This approach allows users of bpf_link in kernel code to not worry about
races between user-space and kernel code that hasn't finished attaching and
initializing bpf_link.

Further, it's critical that BPF_LINK_GET_FD_BY_ID only ever allows to create
bpf_link FD that's O_RDONLY. This is to protect processes owning bpf_link and
thus allowed to perform modifications on them (like LINK_UPDATE), from other
processes that got bpf_link ID from GET_NEXT_ID API. In the latter case, only
querying bpf_link information (implemented later in the series) will be
allowed.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/uapi/linux/bpf.h |  2 ++
 kernel/bpf/syscall.c     | 56 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index eccfd1dea951..407c086bc9e4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -113,6 +113,8 @@ enum bpf_cmd {
 	BPF_MAP_DELETE_BATCH,
 	BPF_LINK_CREATE,
 	BPF_LINK_UPDATE,
+	BPF_LINK_GET_FD_BY_ID,
+	BPF_LINK_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8b3a7d5814ae..527ec16702be 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3719,6 +3719,55 @@ static int link_update(union bpf_attr *attr)
 	return ret;
 }
 
+static int bpf_link_inc_not_zero(struct bpf_link *link)
+{
+	return atomic64_fetch_add_unless(&link->refcnt, 1, 0) ? 0 : -ENOENT;
+}
+
+#define BPF_LINK_GET_FD_BY_ID_LAST_FIELD open_flags
+
+static int bpf_link_get_fd_by_id(const union bpf_attr *attr)
+{
+	struct bpf_link *link;
+	u32 id = attr->link_id;
+	int f_flags;
+	int fd, err;
+
+	if (CHECK_ATTR(BPF_LINK_GET_FD_BY_ID) ||
+	    /* links are not allowed to be open by ID as writable */
+	    attr->open_flags & ~BPF_F_RDONLY)
+		return -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	f_flags = bpf_get_file_flag(attr->open_flags);
+	if (f_flags < 0)
+		return f_flags;
+
+	spin_lock_bh(&link_idr_lock);
+	link = idr_find(&link_idr, id);
+	/* before link is "settled", ID is 0, pretend it doesn't exist yet */
+	if (link) {
+		if (link->id)
+			err = bpf_link_inc_not_zero(link);
+		else
+			err = -EAGAIN;
+	} else {
+		err = -ENOENT;
+	}
+	spin_unlock_bh(&link_idr_lock);
+
+	if (err)
+		return err;
+
+	fd = bpf_link_new_fd(link, f_flags);
+	if (fd < 0)
+		bpf_link_put(link);
+
+	return fd;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr;
@@ -3836,6 +3885,13 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_LINK_UPDATE:
 		err = link_update(&attr);
 		break;
+	case BPF_LINK_GET_FD_BY_ID:
+		err = bpf_link_get_fd_by_id(&attr);
+		break;
+	case BPF_LINK_GET_NEXT_ID:
+		err = bpf_obj_get_next_id(&attr, uattr,
+					  &link_idr, &link_idr_lock);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 5/8] bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2020-04-04  0:09 ` [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID " Andrii Nakryiko
@ 2020-04-04  0:09 ` Andrii Nakryiko
  2020-04-06 11:34   ` Toke Høiland-Jørgensen
  2020-04-04  0:09 ` [RFC PATCH bpf-next 6/8] libbpf: add low-level APIs for new bpf_link commands Andrii Nakryiko
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-04  0:09 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
Also enhance show_fdinfo to potentially include bpf_link type-specific
information (similarly to obj_info).

Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
UAPI. bpf_link_tracing also now will store and return bpf_attach_type.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 include/linux/bpf-cgroup.h     |   2 -
 include/linux/bpf.h            |  10 +-
 include/linux/bpf_types.h      |   6 ++
 include/uapi/linux/bpf.h       |  28 ++++++
 kernel/bpf/btf.c               |   2 +
 kernel/bpf/cgroup.c            |  45 ++++++++-
 kernel/bpf/syscall.c           | 164 +++++++++++++++++++++++++++++----
 kernel/bpf/verifier.c          |   2 +
 tools/include/uapi/linux/bpf.h |  31 +++++++
 9 files changed, 266 insertions(+), 24 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index d2d969669564..ab95824a1d99 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -57,8 +57,6 @@ struct bpf_cgroup_link {
 	enum bpf_attach_type type;
 };
 
-extern const struct bpf_link_ops bpf_cgroup_link_lops;
-
 struct bpf_prog_list {
 	struct list_head node;
 	struct bpf_prog *prog;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 67ce74890911..8cf182d256d4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1026,9 +1026,11 @@ extern const struct file_operations bpf_prog_fops;
 	extern const struct bpf_verifier_ops _name ## _verifier_ops;
 #define BPF_MAP_TYPE(_id, _ops) \
 	extern const struct bpf_map_ops _ops;
+#define BPF_LINK_TYPE(_id, _name)
 #include <linux/bpf_types.h>
 #undef BPF_PROG_TYPE
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 
 extern const struct bpf_prog_ops bpf_offload_prog_ops;
 extern const struct bpf_verifier_ops tc_cls_act_analyzer_ops;
@@ -1086,6 +1088,7 @@ int bpf_prog_new_fd(struct bpf_prog *prog);
 struct bpf_link {
 	atomic64_t refcnt;
 	u32 id;
+	enum bpf_link_type type;
 	const struct bpf_link_ops *ops;
 	struct bpf_prog *prog;
 	struct work_struct work;
@@ -1103,9 +1106,14 @@ struct bpf_link_ops {
 	void (*dealloc)(struct bpf_link *link);
 	int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
 			   struct bpf_prog *old_prog);
+	void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq);
+	int (*fill_link_info)(const struct bpf_link *link,
+			      struct bpf_link_info *info,
+			      const struct bpf_link_info *uinfo,
+			      u32 info_len);
 };
 
-void bpf_link_init(struct bpf_link *link,
+void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
 		   const struct bpf_link_ops *ops, struct bpf_prog *prog);
 int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
 int bpf_link_settle(struct bpf_link_primer *primer);
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index ba0c2d56f8a3..8345cdf553b8 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -118,3 +118,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
 #if defined(CONFIG_BPF_JIT)
 BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
 #endif
+
+BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
+BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
+#ifdef CONFIG_CGROUP_BPF
+BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
+#endif
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 407c086bc9e4..d2f269082a33 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -222,6 +222,15 @@ enum bpf_attach_type {
 
 #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
 
+enum bpf_link_type {
+	BPF_LINK_TYPE_UNSPEC = 0,
+	BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
+	BPF_LINK_TYPE_TRACING = 2,
+	BPF_LINK_TYPE_CGROUP = 3,
+
+	MAX_BPF_LINK_TYPE,
+};
+
 /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
  *
  * NONE(default): No further bpf programs allowed in the subtree.
@@ -3601,6 +3610,25 @@ struct bpf_btf_info {
 	__u32 id;
 } __attribute__((aligned(8)));
 
+struct bpf_link_info {
+	__u32 type;
+	__u32 id;
+	__u32 prog_id;
+	union {
+		struct {
+			__aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
+			__u32 tp_name_len;     /* in/out: tp_name buffer len */
+		} raw_tracepoint;
+		struct {
+			__u32 attach_type;
+		} tracing;
+		struct {
+			__u64 cgroup_id;
+			__u32 attach_type;
+		} cgroup;
+	};
+} __attribute__((aligned(8)));
+
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
  * by user and intended to be used by socket (e.g. to bind to, depends on
  * attach attach type).
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index d65c6912bdaf..a2cfba89a8e1 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3482,6 +3482,7 @@ extern char __weak __stop_BTF[];
 extern struct btf *btf_vmlinux;
 
 #define BPF_MAP_TYPE(_id, _ops)
+#define BPF_LINK_TYPE(_id, _name)
 static union {
 	struct bpf_ctx_convert {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
@@ -3508,6 +3509,7 @@ static u8 bpf_ctx_convert_map[] = {
 	0, /* avoid empty array */
 };
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 
 static const struct btf_member *
 btf_get_prog_ctx_type(struct bpf_verifier_log *log, struct btf *btf,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ae84c5c90631..25ca4c937595 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -833,10 +833,50 @@ static void bpf_cgroup_link_dealloc(struct bpf_link *link)
 	kfree(cg_link);
 }
 
-const struct bpf_link_ops bpf_cgroup_link_lops = {
+static void bpf_cgroup_link_show_fdinfo(const struct bpf_link *link,
+					struct seq_file *seq)
+{
+	struct bpf_cgroup_link *cg_link =
+		container_of(link, struct bpf_cgroup_link, link);
+	u64 cg_id = 0;
+
+	mutex_lock(&cgroup_mutex);
+	if (cg_link->cgroup)
+		cg_id = cgroup_id(cg_link->cgroup);
+	mutex_unlock(&cgroup_mutex);
+
+	seq_printf(seq,
+		   "cgroup_id:\t%llu\n"
+		   "attach_type:\t%d\n",
+		   cg_id,
+		   cg_link->type);
+}
+
+static int bpf_cgroup_link_fill_link_info(const struct bpf_link *link,
+					  struct bpf_link_info *info,
+					  const struct bpf_link_info *uinfo,
+					  u32 info_len)
+{
+	struct bpf_cgroup_link *cg_link =
+		container_of(link, struct bpf_cgroup_link, link);
+	u64 cg_id = 0;
+
+	mutex_lock(&cgroup_mutex);
+	if (cg_link->cgroup)
+		cg_id = cgroup_id(cg_link->cgroup);
+	mutex_unlock(&cgroup_mutex);
+
+	info->cgroup.cgroup_id = cg_id;
+	info->cgroup.attach_type = cg_link->type;
+	return 0;
+}
+
+static const struct bpf_link_ops bpf_cgroup_link_lops = {
 	.release = bpf_cgroup_link_release,
 	.dealloc = bpf_cgroup_link_dealloc,
 	.update_prog = cgroup_bpf_replace,
+	.show_fdinfo = bpf_cgroup_link_show_fdinfo,
+	.fill_link_info = bpf_cgroup_link_fill_link_info,
 };
 
 int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
@@ -858,7 +898,8 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 		err = -ENOMEM;
 		goto out_put_cgroup;
 	}
-	bpf_link_init(&link->link, &bpf_cgroup_link_lops, prog);
+	bpf_link_init(&link->link, BPF_LINK_TYPE_CGROUP, &bpf_cgroup_link_lops,
+		      prog);
 	link->cgroup = cgrp;
 	link->type = attr->link_create.attach_type;
 
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 527ec16702be..ed94b322c757 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -51,9 +51,11 @@ static const struct bpf_map_ops * const bpf_map_types[] = {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
 #define BPF_MAP_TYPE(_id, _ops) \
 	[_id] = &_ops,
+#define BPF_LINK_TYPE(_id, _name)
 #include <linux/bpf_types.h>
 #undef BPF_PROG_TYPE
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 };
 
 /*
@@ -1550,9 +1552,11 @@ static const struct bpf_prog_ops * const bpf_prog_types[] = {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
 	[_id] = & _name ## _prog_ops,
 #define BPF_MAP_TYPE(_id, _ops)
+#define BPF_LINK_TYPE(_id, _name)
 #include <linux/bpf_types.h>
 #undef BPF_PROG_TYPE
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 };
 
 static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
@@ -2186,10 +2190,11 @@ static int bpf_obj_get(const union bpf_attr *attr)
 				attr->file_flags);
 }
 
-void bpf_link_init(struct bpf_link *link,
+void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
 		   const struct bpf_link_ops *ops, struct bpf_prog *prog)
 {
 	atomic64_set(&link->refcnt, 1);
+	link->type = type;
 	link->id = 0;
 	link->ops = ops;
 	link->prog = prog;
@@ -2270,27 +2275,23 @@ static int bpf_link_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-#ifdef CONFIG_PROC_FS
-static const struct bpf_link_ops bpf_raw_tp_lops;
-static const struct bpf_link_ops bpf_tracing_link_lops;
+#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
+#define BPF_MAP_TYPE(_id, _ops)
+#define BPF_LINK_TYPE(_id, _name) [_id] = #_name,
+static const char *bpf_link_type_strs[] = {
+	[BPF_LINK_TYPE_UNSPEC] = "<invalid>",
+#include <linux/bpf_types.h>
+};
+#undef BPF_PROG_TYPE
+#undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 
+#ifdef CONFIG_PROC_FS
 static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 {
 	const struct bpf_link *link = filp->private_data;
 	const struct bpf_prog *prog = link->prog;
 	char prog_tag[sizeof(prog->tag) * 2 + 1] = { };
-	const char *link_type;
-
-	if (link->ops == &bpf_raw_tp_lops)
-		link_type = "raw_tracepoint";
-	else if (link->ops == &bpf_tracing_link_lops)
-		link_type = "tracing";
-#ifdef CONFIG_CGROUP_BPF
-	else if (link->ops == &bpf_cgroup_link_lops)
-		link_type = "cgroup";
-#endif
-	else
-		link_type = "unknown";
 
 	bin2hex(prog_tag, prog->tag, sizeof(prog->tag));
 	seq_printf(m,
@@ -2298,10 +2299,12 @@ static void bpf_link_show_fdinfo(struct seq_file *m, struct file *filp)
 		   "link_id:\t%u\n"
 		   "prog_tag:\t%s\n"
 		   "prog_id:\t%u\n",
-		   link_type,
+		   bpf_link_type_strs[link->type],
 		   link->id,
 		   prog_tag,
 		   prog->aux->id);
+	if (link->ops->show_fdinfo)
+		link->ops->show_fdinfo(link, m);
 }
 #endif
 
@@ -2409,6 +2412,7 @@ struct bpf_link *bpf_link_get_from_fd(u32 ufd, fmode_t *link_mode)
 
 struct bpf_tracing_link {
 	struct bpf_link link;
+	enum bpf_attach_type attach_type;
 };
 
 static void bpf_tracing_link_release(struct bpf_link *link)
@@ -2424,9 +2428,35 @@ static void bpf_tracing_link_dealloc(struct bpf_link *link)
 	kfree(tr_link);
 }
 
+static void bpf_tracing_link_show_fdinfo(const struct bpf_link *link,
+					 struct seq_file *seq)
+{
+	struct bpf_tracing_link *tr_link =
+		container_of(link, struct bpf_tracing_link, link);
+
+	seq_printf(seq,
+		   "attach_type:\t%d\n",
+		   tr_link->attach_type);
+}
+
+static int bpf_tracing_link_fill_link_info(const struct bpf_link *link,
+					   struct bpf_link_info *info,
+					   const struct bpf_link_info *uinfo,
+					   u32 info_len)
+{
+	struct bpf_tracing_link *tr_link =
+		container_of(link, struct bpf_tracing_link, link);
+
+	info->tracing.attach_type = tr_link->attach_type;
+
+	return 0;
+}
+
 static const struct bpf_link_ops bpf_tracing_link_lops = {
 	.release = bpf_tracing_link_release,
 	.dealloc = bpf_tracing_link_dealloc,
+	.show_fdinfo = bpf_tracing_link_show_fdinfo,
+	.fill_link_info = bpf_tracing_link_fill_link_info,
 };
 
 static int bpf_tracing_prog_attach(struct bpf_prog *prog)
@@ -2466,7 +2496,9 @@ static int bpf_tracing_prog_attach(struct bpf_prog *prog)
 		err = -ENOMEM;
 		goto out_put_prog;
 	}
-	bpf_link_init(&link->link, &bpf_tracing_link_lops, prog);
+	bpf_link_init(&link->link, BPF_LINK_TYPE_TRACING,
+		      &bpf_tracing_link_lops, prog);
+	link->attach_type = prog->expected_attach_type;
 
 	err = bpf_link_prime(&link->link, &link_primer);
 	if (err) {
@@ -2508,9 +2540,66 @@ static void bpf_raw_tp_link_dealloc(struct bpf_link *link)
 	kfree(raw_tp);
 }
 
+static void bpf_raw_tp_link_show_fdinfo(const struct bpf_link *link,
+					struct seq_file *seq)
+{
+	struct bpf_raw_tp_link *raw_tp_link =
+		container_of(link, struct bpf_raw_tp_link, link);
+
+	seq_printf(seq,
+		   "tp_name:\t%s\n",
+		   raw_tp_link->btp->tp->name);
+}
+
+static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
+					  struct bpf_link_info *info,
+					  const struct bpf_link_info *uinfo,
+					  u32 info_len)
+{
+	struct bpf_raw_tp_link *raw_tp_link =
+		container_of(link, struct bpf_raw_tp_link, link);
+	u64 ubuf_ptr;
+	char __user *ubuf = u64_to_user_ptr(uinfo->raw_tracepoint.tp_name);
+	const char *tp_name = raw_tp_link->btp->tp->name;
+	size_t tp_len;
+	u32 ulen;
+
+	if (get_user(ulen, &uinfo->raw_tracepoint.tp_name_len))
+		return -EFAULT;
+	if (get_user(ubuf_ptr, &uinfo->raw_tracepoint.tp_name))
+		return -EFAULT;
+	ubuf = u64_to_user_ptr(ubuf_ptr);
+
+	if (ulen && !ubuf)
+		return -EINVAL;
+	if (!ubuf)
+		return 0;
+
+	tp_len = strlen(raw_tp_link->btp->tp->name);
+	info->raw_tracepoint.tp_name_len = tp_len + 1;
+	info->raw_tracepoint.tp_name = (u64)(unsigned long)ubuf;
+
+	if (ulen >= tp_len + 1) {
+		if (copy_to_user(ubuf, tp_name, tp_len + 1))
+			return -EFAULT;
+	} else {
+		char zero = '\0';
+
+		if (copy_to_user(ubuf, tp_name, ulen - 1))
+			return -EFAULT;
+		if (put_user(zero, ubuf + ulen - 1))
+			return -EFAULT;
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
 static const struct bpf_link_ops bpf_raw_tp_link_lops = {
 	.release = bpf_raw_tp_link_release,
 	.dealloc = bpf_raw_tp_link_dealloc,
+	.show_fdinfo = bpf_raw_tp_link_show_fdinfo,
+	.fill_link_info = bpf_raw_tp_link_fill_link_info,
 };
 
 #define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
@@ -2576,7 +2665,8 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 		err = -ENOMEM;
 		goto out_put_btp;
 	}
-	bpf_link_init(&link->link, &bpf_raw_tp_link_lops, prog);
+	bpf_link_init(&link->link, BPF_LINK_TYPE_RAW_TRACEPOINT,
+		      &bpf_raw_tp_link_lops, prog);
 	link->btp = btp;
 
 	err = bpf_link_prime(&link->link, &link_primer);
@@ -3372,6 +3462,39 @@ static int bpf_btf_get_info_by_fd(struct btf *btf,
 	return btf_get_info_by_fd(btf, attr, uattr);
 }
 
+static int bpf_link_get_info_by_fd(struct bpf_link *link,
+				  const union bpf_attr *attr,
+				  union bpf_attr __user *uattr)
+{
+	struct bpf_link_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+	struct bpf_link_info info;
+	u32 info_len = attr->info.info_len;
+	int err;
+
+	err = bpf_check_uarg_tail_zero(uinfo, sizeof(info), info_len);
+	if (err)
+		return err;
+	info_len = min_t(u32, sizeof(info), info_len);
+
+	memset(&info, 0, sizeof(info));
+	info.type = link->type;
+	info.id = link->id;
+	info.prog_id = link->prog->aux->id;
+
+	if (link->ops->fill_link_info) {
+		err = link->ops->fill_link_info(link, &info, uinfo, info_len);
+		if (err)
+			return err;
+	}
+
+	if (copy_to_user(uinfo, &info, info_len) ||
+	    put_user(info_len, &uattr->info.info_len))
+		return -EFAULT;
+
+	return 0;
+}
+
+
 #define BPF_OBJ_GET_INFO_BY_FD_LAST_FIELD info.info
 
 static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
@@ -3396,6 +3519,9 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 					     uattr);
 	else if (f.file->f_op == &btf_fops)
 		err = bpf_btf_get_info_by_fd(f.file->private_data, attr, uattr);
+	else if (f.file->f_op == &bpf_link_fops)
+		err = bpf_link_get_info_by_fd(f.file->private_data,
+					      attr, uattr);
 	else
 		err = -EINVAL;
 
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 04c6630cc18f..48dd5fc7c269 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -28,9 +28,11 @@ static const struct bpf_verifier_ops * const bpf_verifier_ops[] = {
 #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type) \
 	[_id] = & _name ## _verifier_ops,
 #define BPF_MAP_TYPE(_id, _ops)
+#define BPF_LINK_TYPE(_id, _name)
 #include <linux/bpf_types.h>
 #undef BPF_PROG_TYPE
 #undef BPF_MAP_TYPE
+#undef BPF_LINK_TYPE
 };
 
 /* bpf_check() is a static code analyzer that walks eBPF program
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2e29a671d67e..d2f269082a33 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -113,6 +113,8 @@ enum bpf_cmd {
 	BPF_MAP_DELETE_BATCH,
 	BPF_LINK_CREATE,
 	BPF_LINK_UPDATE,
+	BPF_LINK_GET_FD_BY_ID,
+	BPF_LINK_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
@@ -220,6 +222,15 @@ enum bpf_attach_type {
 
 #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
 
+enum bpf_link_type {
+	BPF_LINK_TYPE_UNSPEC = 0,
+	BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
+	BPF_LINK_TYPE_TRACING = 2,
+	BPF_LINK_TYPE_CGROUP = 3,
+
+	MAX_BPF_LINK_TYPE,
+};
+
 /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
  *
  * NONE(default): No further bpf programs allowed in the subtree.
@@ -523,6 +534,7 @@ union bpf_attr {
 			__u32		prog_id;
 			__u32		map_id;
 			__u32		btf_id;
+			__u32		link_id;
 		};
 		__u32		next_id;
 		__u32		open_flags;
@@ -3598,6 +3610,25 @@ struct bpf_btf_info {
 	__u32 id;
 } __attribute__((aligned(8)));
 
+struct bpf_link_info {
+	__u32 type;
+	__u32 id;
+	__u32 prog_id;
+	union {
+		struct {
+			__aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
+			__u32 tp_name_len;     /* in/out: tp_name buffer len */
+		} raw_tracepoint;
+		struct {
+			__u32 attach_type;
+		} tracing;
+		struct {
+			__u64 cgroup_id;
+			__u32 attach_type;
+		} cgroup;
+	};
+} __attribute__((aligned(8)));
+
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
  * by user and intended to be used by socket (e.g. to bind to, depends on
  * attach attach type).
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 6/8] libbpf: add low-level APIs for new bpf_link commands
  2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2020-04-04  0:09 ` [RFC PATCH bpf-next 5/8] bpf: add support for BPF_OBJ_GET_INFO_BY_FD " Andrii Nakryiko
@ 2020-04-04  0:09 ` Andrii Nakryiko
  2020-04-04  0:09 ` [RFC PATCH bpf-next 7/8] bpftool: expose attach_type-to-string array to non-cgroup code Andrii Nakryiko
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-04  0:09 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add low-level API calls for bpf_link_get_next_id() and
bpf_link_get_fd_by_id().

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/bpf.c      | 19 +++++++++++++++++--
 tools/lib/bpf/bpf.h      |  4 +++-
 tools/lib/bpf/libbpf.map |  6 ++++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5cc1b0785d18..8f2f0958d446 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -721,6 +721,11 @@ int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id)
 	return bpf_obj_get_next_id(start_id, next_id, BPF_BTF_GET_NEXT_ID);
 }
 
+int bpf_link_get_next_id(__u32 start_id, __u32 *next_id)
+{
+	return bpf_obj_get_next_id(start_id, next_id, BPF_LINK_GET_NEXT_ID);
+}
+
 int bpf_prog_get_fd_by_id(__u32 id)
 {
 	union bpf_attr attr;
@@ -751,13 +756,23 @@ int bpf_btf_get_fd_by_id(__u32 id)
 	return sys_bpf(BPF_BTF_GET_FD_BY_ID, &attr, sizeof(attr));
 }
 
-int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
+int bpf_link_get_fd_by_id(__u32 id)
+{
+	union bpf_attr attr;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.link_id = id;
+
+	return sys_bpf(BPF_LINK_GET_FD_BY_ID, &attr, sizeof(attr));
+}
+
+int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len)
 {
 	union bpf_attr attr;
 	int err;
 
 	memset(&attr, 0, sizeof(attr));
-	attr.info.bpf_fd = prog_fd;
+	attr.info.bpf_fd = bpf_fd;
 	attr.info.info_len = *info_len;
 	attr.info.info = ptr_to_u64(info);
 
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 46d47afdd887..335b457b3a25 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -216,10 +216,12 @@ LIBBPF_API int bpf_prog_test_run(int prog_fd, int repeat, void *data,
 LIBBPF_API int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_btf_get_next_id(__u32 start_id, __u32 *next_id);
+LIBBPF_API int bpf_link_get_next_id(__u32 start_id, __u32 *next_id);
 LIBBPF_API int bpf_prog_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_map_get_fd_by_id(__u32 id);
 LIBBPF_API int bpf_btf_get_fd_by_id(__u32 id);
-LIBBPF_API int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
+LIBBPF_API int bpf_link_get_fd_by_id(__u32 id);
+LIBBPF_API int bpf_obj_get_info_by_fd(int bpf_fd, void *info, __u32 *info_len);
 LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
 			      __u32 query_flags, __u32 *attach_flags,
 			      __u32 *prog_ids, __u32 *prog_cnt);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index bb8831605b25..7cd49aa38005 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -254,3 +254,9 @@ LIBBPF_0.0.8 {
 		bpf_program__set_lsm;
 		bpf_set_link_xdp_fd_opts;
 } LIBBPF_0.0.7;
+
+LIBBPF_0.0.9 {
+	global:
+		bpf_link_get_fd_by_id;
+		bpf_link_get_next_id;
+} LIBBPF_0.0.8;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 7/8] bpftool: expose attach_type-to-string array to non-cgroup code
  2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2020-04-04  0:09 ` [RFC PATCH bpf-next 6/8] libbpf: add low-level APIs for new bpf_link commands Andrii Nakryiko
@ 2020-04-04  0:09 ` Andrii Nakryiko
  2020-04-04  0:09 ` [RFC PATCH bpf-next 8/8] bpftool: add bpf_link show and pin support Andrii Nakryiko
  2020-04-05 16:26 ` [RFC PATCH bpf-next 0/8] bpf_link observability APIs David Ahern
  8 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-04  0:09 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Move attach_type_strings into main.h for access in non-cgroup code.
bpf_attach_type is used for non-cgroup attach types quite widely now. So also
complete missing string translations for non-cgroup attach types.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/bpf/bpftool/cgroup.c | 28 +++-------------------------
 tools/bpf/bpftool/main.h   | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+), 25 deletions(-)

diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
index 62c6a1d7cd18..d1fd9c9f2690 100644
--- a/tools/bpf/bpftool/cgroup.c
+++ b/tools/bpf/bpftool/cgroup.c
@@ -31,35 +31,13 @@
 
 static unsigned int query_flags;
 
-static const char * const attach_type_strings[] = {
-	[BPF_CGROUP_INET_INGRESS] = "ingress",
-	[BPF_CGROUP_INET_EGRESS] = "egress",
-	[BPF_CGROUP_INET_SOCK_CREATE] = "sock_create",
-	[BPF_CGROUP_SOCK_OPS] = "sock_ops",
-	[BPF_CGROUP_DEVICE] = "device",
-	[BPF_CGROUP_INET4_BIND] = "bind4",
-	[BPF_CGROUP_INET6_BIND] = "bind6",
-	[BPF_CGROUP_INET4_CONNECT] = "connect4",
-	[BPF_CGROUP_INET6_CONNECT] = "connect6",
-	[BPF_CGROUP_INET4_POST_BIND] = "post_bind4",
-	[BPF_CGROUP_INET6_POST_BIND] = "post_bind6",
-	[BPF_CGROUP_UDP4_SENDMSG] = "sendmsg4",
-	[BPF_CGROUP_UDP6_SENDMSG] = "sendmsg6",
-	[BPF_CGROUP_SYSCTL] = "sysctl",
-	[BPF_CGROUP_UDP4_RECVMSG] = "recvmsg4",
-	[BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6",
-	[BPF_CGROUP_GETSOCKOPT] = "getsockopt",
-	[BPF_CGROUP_SETSOCKOPT] = "setsockopt",
-	[__MAX_BPF_ATTACH_TYPE] = NULL,
-};
-
 static enum bpf_attach_type parse_attach_type(const char *str)
 {
 	enum bpf_attach_type type;
 
 	for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) {
-		if (attach_type_strings[type] &&
-		    is_prefix(str, attach_type_strings[type]))
+		if (attach_type_name[type] &&
+		    is_prefix(str, attach_type_name[type]))
 			return type;
 	}
 
@@ -171,7 +149,7 @@ static int show_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type,
 	}
 
 	for (iter = 0; iter < prog_cnt; iter++)
-		show_bpf_prog(prog_ids[iter], attach_type_strings[type],
+		show_bpf_prog(prog_ids[iter], attach_type_name[type],
 			      attach_flags_str, level);
 
 	return 0;
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 86f14ce26fd7..dd212d2af923 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -83,6 +83,38 @@ static const char * const prog_type_name[] = {
 	[BPF_PROG_TYPE_EXT]			= "ext",
 };
 
+static const char * const attach_type_name[] = {
+	[BPF_CGROUP_INET_INGRESS] = "ingress",
+	[BPF_CGROUP_INET_EGRESS] = "egress",
+	[BPF_CGROUP_INET_SOCK_CREATE] = "sock_create",
+	[BPF_CGROUP_SOCK_OPS] = "sock_ops",
+	[BPF_CGROUP_DEVICE] = "device",
+	[BPF_CGROUP_INET4_BIND] = "bind4",
+	[BPF_CGROUP_INET6_BIND] = "bind6",
+	[BPF_CGROUP_INET4_CONNECT] = "connect4",
+	[BPF_CGROUP_INET6_CONNECT] = "connect6",
+	[BPF_CGROUP_INET4_POST_BIND] = "post_bind4",
+	[BPF_CGROUP_INET6_POST_BIND] = "post_bind6",
+	[BPF_CGROUP_UDP4_SENDMSG] = "sendmsg4",
+	[BPF_CGROUP_UDP6_SENDMSG] = "sendmsg6",
+	[BPF_CGROUP_SYSCTL] = "sysctl",
+	[BPF_CGROUP_UDP4_RECVMSG] = "recvmsg4",
+	[BPF_CGROUP_UDP6_RECVMSG] = "recvmsg6",
+	[BPF_CGROUP_GETSOCKOPT] = "getsockopt",
+	[BPF_CGROUP_SETSOCKOPT] = "setsockopt",
+
+	[BPF_SK_SKB_STREAM_PARSER] = "sk_skb_stream_parser",
+	[BPF_SK_SKB_STREAM_VERDICT] = "sk_skb_stream_verdict",
+	[BPF_SK_MSG_VERDICT] = "sk_msg_verdict",
+	[BPF_LIRC_MODE2] = "lirc_mode2",
+	[BPF_FLOW_DISSECTOR] = "flow_dissector",
+	[BPF_TRACE_RAW_TP] = "raw_tp",
+	[BPF_TRACE_FENTRY] = "fentry",
+	[BPF_TRACE_FEXIT] = "fexit",
+	[BPF_MODIFY_RETURN] = "mod_ret",
+	[BPF_LSM_MAC] = "lsm_mac",
+};
+
 extern const char * const map_type_name[];
 extern const size_t map_type_name_size;
 
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 8/8] bpftool: add bpf_link show and pin support
  2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2020-04-04  0:09 ` [RFC PATCH bpf-next 7/8] bpftool: expose attach_type-to-string array to non-cgroup code Andrii Nakryiko
@ 2020-04-04  0:09 ` Andrii Nakryiko
  2020-04-08 23:44   ` David Ahern
  2020-04-05 16:26 ` [RFC PATCH bpf-next 0/8] bpf_link observability APIs David Ahern
  8 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-04  0:09 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Add bpftool link show and bpftool link pin commands.

Example plain output for `link show`:

[vmuser@archvm bpf]$ sudo ~/local/linux/tools/bpf/bpftool/bpftool link
1: tracing  prog 12
        prog_type tracing  attach_type fentry
2: tracing  prog 13
        prog_type tracing  attach_type fentry
3: tracing  prog 14
        prog_type tracing  attach_type fentry
4: tracing  prog 15
        prog_type tracing  attach_type fentry
5: tracing  prog 16
        prog_type tracing  attach_type fentry
6: tracing  prog 17
        prog_type tracing  attach_type fentry
7: raw_tracepoint  prog 21
        tp 'sys_enter'
8: cgroup  prog 25
        cgroup_id 584  attach_type egress
9: cgroup  prog 25
        cgroup_id 599  attach_type egress
10: cgroup  prog 25
        cgroup_id 614  attach_type egress
11: cgroup  prog 25
        cgroup_id 629  attach_type egress

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/bpf/bpftool/link.c | 403 +++++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/main.c |   2 +
 tools/bpf/bpftool/main.h |   5 +
 3 files changed, 410 insertions(+)
 create mode 100644 tools/bpf/bpftool/link.c

diff --git a/tools/bpf/bpftool/link.c b/tools/bpf/bpftool/link.c
new file mode 100644
index 000000000000..3f708c29e3eb
--- /dev/null
+++ b/tools/bpf/bpftool/link.c
@@ -0,0 +1,403 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2020 Facebook */
+
+#include <errno.h>
+#include <net/if.h>
+#include <stdio.h>
+#include <unistd.h>
+
+#include <bpf/bpf.h>
+
+#include "json_writer.h"
+#include "main.h"
+
+const char * const link_type_name[] = {
+	[BPF_LINK_TYPE_UNSPEC]			= "unspec",
+	[BPF_LINK_TYPE_RAW_TRACEPOINT]		= "raw_tracepoint",
+	[BPF_LINK_TYPE_TRACING]			= "tracing",
+	[BPF_LINK_TYPE_CGROUP]			= "cgroup",
+};
+
+const size_t link_type_name_size = ARRAY_SIZE(link_type_name);
+
+static int link_parse_fds(int *argc, char ***argv, int **fds)
+{
+	if (is_prefix(**argv, "id")) {
+		unsigned int id;
+		char *endptr;
+
+		NEXT_ARGP();
+
+		id = strtoul(**argv, &endptr, 0);
+		if (*endptr) {
+			p_err("can't parse %s as ID", **argv);
+			return -1;
+		}
+		NEXT_ARGP();
+
+		(*fds)[0] = bpf_link_get_fd_by_id(id);
+		if ((*fds)[0] < 0) {
+			p_err("get link by id (%u): %s", id, strerror(errno));
+			return -1;
+		}
+		return 1;
+	} else if (is_prefix(**argv, "pinned")) {
+		char *path;
+
+		NEXT_ARGP();
+
+		path = **argv;
+		NEXT_ARGP();
+
+		(*fds)[0] = open_obj_pinned_any(path, BPF_OBJ_LINK);
+		if ((*fds)[0] < 0)
+			return -1;
+		return 1;
+	}
+
+	p_err("expected 'id' or 'pinned', got: '%s'?", **argv);
+	return -1;
+}
+
+static int link_parse_fd(int *argc, char ***argv)
+{
+	int *fds = NULL;
+	int nb_fds, fd;
+
+	fds = malloc(sizeof(int));
+	if (!fds) {
+		p_err("mem alloc failed");
+		return -1;
+	}
+	nb_fds = link_parse_fds(argc, argv, &fds);
+	if (nb_fds != 1) {
+		if (nb_fds > 1) {
+			p_err("several links match this handle");
+			while (nb_fds--)
+				close(fds[nb_fds]);
+		}
+		fd = -1;
+		goto exit_free;
+	}
+
+	fd = fds[0];
+exit_free:
+	free(fds);
+	return fd;
+}
+
+static void
+show_link_header_json(struct bpf_link_info *info, json_writer_t *wtr)
+{
+	jsonw_uint_field(wtr, "id", info->id);
+	if (info->type < ARRAY_SIZE(link_type_name))
+		jsonw_string_field(wtr, "type", link_type_name[info->type]);
+	else
+		jsonw_uint_field(wtr, "type", info->type);
+
+	jsonw_uint_field(json_wtr, "prog_id", info->prog_id);
+}
+
+static int get_prog_info(int prog_id, struct bpf_prog_info *info)
+{
+	__u32 len = sizeof(*info);
+	int err, prog_fd;
+
+	prog_fd = bpf_prog_get_fd_by_id(prog_id);
+	if (prog_fd < 0)
+		return prog_fd;
+
+	memset(info, 0, sizeof(*info));
+	err = bpf_obj_get_info_by_fd(prog_fd, info, &len);
+	if (err) {
+		p_err("can't get prog info: %s", strerror(errno));
+		close(prog_fd);
+		return err;
+	}
+
+	close(prog_fd);
+	return 0;
+}
+
+static int show_link_close_json(int fd, struct bpf_link_info *info)
+{
+	struct bpf_prog_info prog_info;
+	int err;
+
+	jsonw_start_object(json_wtr);
+
+	show_link_header_json(info, json_wtr);
+
+	switch (info->type) {
+	case BPF_LINK_TYPE_RAW_TRACEPOINT:
+		jsonw_string_field(json_wtr, "tp_name",
+				   (const char *)info->raw_tracepoint.tp_name);
+		break;
+	case BPF_LINK_TYPE_TRACING:
+		err = get_prog_info(info->prog_id, &prog_info);
+		if (err)
+			return err;
+
+		if (prog_info.type < ARRAY_SIZE(prog_type_name))
+			jsonw_string_field(json_wtr, "prog_type",
+					   prog_type_name[prog_info.type]);
+		else
+			jsonw_uint_field(json_wtr, "prog_type",
+					 prog_info.type);
+
+		if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
+			jsonw_string_field(json_wtr, "attach_type",
+			       attach_type_name[info->tracing.attach_type]);
+		else
+			jsonw_uint_field(json_wtr, "attach_type",
+					 info->tracing.attach_type);
+		break;
+	case BPF_LINK_TYPE_CGROUP:
+		jsonw_lluint_field(json_wtr, "cgroup_id",
+				   info->cgroup.cgroup_id);
+		if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
+			jsonw_string_field(json_wtr, "attach_type",
+			       attach_type_name[info->cgroup.attach_type]);
+		else
+			jsonw_uint_field(json_wtr, "attach_type",
+					 info->cgroup.attach_type);
+		break;
+	default:
+		break;
+	}
+
+	if (!hash_empty(link_table.table)) {
+		struct pinned_obj *obj;
+
+		jsonw_name(json_wtr, "pinned");
+		jsonw_start_array(json_wtr);
+		hash_for_each_possible(link_table.table, obj, hash, info->id) {
+			if (obj->id == info->id)
+				jsonw_string(json_wtr, obj->path);
+		}
+		jsonw_end_array(json_wtr);
+	}
+	jsonw_end_object(json_wtr);
+
+	return 0;
+}
+
+static void show_link_header_plain(struct bpf_link_info *info)
+{
+	printf("%u: ", info->id);
+	if (info->type < ARRAY_SIZE(link_type_name))
+		printf("%s  ", link_type_name[info->type]);
+	else
+		printf("type %u  ", info->type);
+
+	printf("prog %u  ", info->prog_id);
+}
+
+static int show_link_close_plain(int fd, struct bpf_link_info *info)
+{
+	struct bpf_prog_info prog_info;
+	int err;
+
+	show_link_header_plain(info);
+
+	switch (info->type) {
+	case BPF_LINK_TYPE_RAW_TRACEPOINT:
+		printf("\n\ttp '%s'  ", (const char *)info->raw_tracepoint.tp_name);
+		break;
+	case BPF_LINK_TYPE_TRACING:
+		err = get_prog_info(info->prog_id, &prog_info);
+		if (err)
+			return err;
+
+		if (prog_info.type < ARRAY_SIZE(prog_type_name))
+			printf("\n\tprog_type %s  ",
+			       prog_type_name[prog_info.type]);
+		else
+			printf("\n\tprog_type %u  ", prog_info.type);
+
+		if (info->tracing.attach_type < ARRAY_SIZE(attach_type_name))
+			printf("attach_type %s  ",
+			       attach_type_name[info->tracing.attach_type]);
+		else
+			printf("attach_type %u  ", info->tracing.attach_type);
+		break;
+	case BPF_LINK_TYPE_CGROUP:
+		printf("\n\tcgroup_id %zu  ", (size_t)info->cgroup.cgroup_id);
+		if (info->cgroup.attach_type < ARRAY_SIZE(attach_type_name))
+			printf("attach_type %s  ",
+			       attach_type_name[info->cgroup.attach_type]);
+		else
+			printf("attach_type %u  ", info->cgroup.attach_type);
+		break;
+	default:
+		break;
+	}
+
+	if (!hash_empty(link_table.table)) {
+		struct pinned_obj *obj;
+
+		hash_for_each_possible(link_table.table, obj, hash, info->id) {
+			if (obj->id == info->id)
+				printf("\n\tpinned %s", obj->path);
+		}
+	}
+
+	printf("\n");
+
+	return 0;
+}
+
+static int do_show_link(int fd)
+{
+	struct bpf_link_info info;
+	__u32 len = sizeof(info);
+	char raw_tp_name[256];
+	int err;
+
+	memset(&info, 0, sizeof(info));
+again:
+	err = bpf_obj_get_info_by_fd(fd, &info, &len);
+	if (err) {
+		p_err("can't get link info: %s",
+		      strerror(errno));
+		close(fd);
+		return err;
+	}
+	if (info.type == BPF_LINK_TYPE_RAW_TRACEPOINT &&
+	    !info.raw_tracepoint.tp_name) {
+		info.raw_tracepoint.tp_name = (unsigned long)&raw_tp_name;
+		info.raw_tracepoint.tp_name_len = sizeof(raw_tp_name);
+		goto again;
+	}
+
+	if (json_output)
+		show_link_close_json(fd, &info);
+	else
+		show_link_close_plain(fd, &info);
+
+	close(fd);
+	return 0;
+}
+
+static int do_show_subset(int argc, char **argv)
+{
+	int *fds = NULL;
+	int nb_fds, i;
+	int err = -1;
+
+	fds = malloc(sizeof(int));
+	if (!fds) {
+		p_err("mem alloc failed");
+		return -1;
+	}
+	nb_fds = link_parse_fds(&argc, &argv, &fds);
+	if (nb_fds < 1)
+		goto exit_free;
+
+	if (json_output && nb_fds > 1)
+		jsonw_start_array(json_wtr);	/* root array */
+	for (i = 0; i < nb_fds; i++) {
+		err = do_show_link(fds[i]);
+		if (err) {
+			for (; i + 1 < nb_fds; i++)
+				close(fds[i]);
+			break;
+		}
+	}
+	if (json_output && nb_fds > 1)
+		jsonw_end_array(json_wtr);	/* root array */
+
+exit_free:
+	free(fds);
+	return err;
+}
+
+static int do_show(int argc, char **argv)
+{
+	__u32 id = 0;
+	int err;
+	int fd;
+
+	if (show_pinned)
+		build_pinned_obj_table(&link_table, BPF_OBJ_LINK);
+
+	if (argc == 2)
+		return do_show_subset(argc, argv);
+
+	if (argc)
+		return BAD_ARG();
+
+	if (json_output)
+		jsonw_start_array(json_wtr);
+	while (true) {
+		err = bpf_link_get_next_id(id, &id);
+		if (err) {
+			if (errno == ENOENT)
+				break;
+			p_err("can't get next link: %s%s", strerror(errno),
+			      errno == EINVAL ? " -- kernel too old?" : "");
+			break;
+		}
+
+		fd = bpf_link_get_fd_by_id(id);
+		if (fd < 0) {
+			if (errno == ENOENT)
+				continue;
+			p_err("can't get link by id (%u): %s",
+			      id, strerror(errno));
+			break;
+		}
+
+		err = do_show_link(fd);
+		if (err)
+			break;
+	}
+	if (json_output)
+		jsonw_end_array(json_wtr);
+
+	return errno == ENOENT ? 0 : -1;
+}
+
+static int do_pin(int argc, char **argv)
+{
+	int err;
+
+	err = do_pin_any(argc, argv, link_parse_fd);
+	if (!err && json_output)
+		jsonw_null(json_wtr);
+	return err;
+}
+
+static int do_help(int argc, char **argv)
+{
+	if (json_output) {
+		jsonw_null(json_wtr);
+		return 0;
+	}
+
+	fprintf(stderr,
+		"Usage: %1$s %2$s { show | list }   [LINK]\n"
+		"       %1$s %2$s pin        LINK  FILE\n"
+		"       %1$s %2$s help\n"
+		"\n"
+		"       " HELP_SPEC_LINK "\n"
+		"       " HELP_SPEC_PROGRAM "\n"
+		"       " HELP_SPEC_OPTIONS "\n"
+		"",
+		bin_name, argv[-2]);
+
+	return 0;
+}
+
+static const struct cmd cmds[] = {
+	{ "show",	do_show },
+	{ "list",	do_show },
+	{ "help",	do_help },
+	{ "pin",	do_pin },
+	{ 0 }
+};
+
+int do_link(int argc, char **argv)
+{
+	return cmd_select(cmds, argc, argv, do_help);
+}
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 466c269eabdd..4b2f74941625 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -30,6 +30,7 @@ bool verifier_logs;
 bool relaxed_maps;
 struct pinned_obj_table prog_table;
 struct pinned_obj_table map_table;
+struct pinned_obj_table link_table;
 
 static void __noreturn clean_and_exit(int i)
 {
@@ -215,6 +216,7 @@ static const struct cmd cmds[] = {
 	{ "batch",	do_batch },
 	{ "prog",	do_prog },
 	{ "map",	do_map },
+	{ "link",	do_link },
 	{ "cgroup",	do_cgroup },
 	{ "perf",	do_perf },
 	{ "net",	do_net },
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index dd212d2af923..ff00a91982f7 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -50,6 +50,8 @@
 	"\t            {-m|--mapcompat} | {-n|--nomount} }"
 #define HELP_SPEC_MAP							\
 	"MAP := { id MAP_ID | pinned FILE | name MAP_NAME }"
+#define HELP_SPEC_LINK							\
+	"LINK := { id LINK_ID | pinned FILE }"
 
 static const char * const prog_type_name[] = {
 	[BPF_PROG_TYPE_UNSPEC]			= "unspec",
@@ -122,6 +124,7 @@ enum bpf_obj_type {
 	BPF_OBJ_UNKNOWN,
 	BPF_OBJ_PROG,
 	BPF_OBJ_MAP,
+	BPF_OBJ_LINK,
 };
 
 extern const char *bin_name;
@@ -134,6 +137,7 @@ extern bool verifier_logs;
 extern bool relaxed_maps;
 extern struct pinned_obj_table prog_table;
 extern struct pinned_obj_table map_table;
+extern struct pinned_obj_table link_table;
 
 void __printf(1, 2) p_err(const char *fmt, ...);
 void __printf(1, 2) p_info(const char *fmt, ...);
@@ -185,6 +189,7 @@ int do_pin_fd(int fd, const char *name);
 
 int do_prog(int argc, char **arg);
 int do_map(int argc, char **arg);
+int do_link(int argc, char **arg);
 int do_event_pipe(int argc, char **argv);
 int do_cgroup(int argc, char **arg);
 int do_perf(int argc, char **arg);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 0/8] bpf_link observability APIs
  2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2020-04-04  0:09 ` [RFC PATCH bpf-next 8/8] bpftool: add bpf_link show and pin support Andrii Nakryiko
@ 2020-04-05 16:26 ` David Ahern
  2020-04-05 18:31   ` Andrii Nakryiko
  8 siblings, 1 reply; 24+ messages in thread
From: David Ahern @ 2020-04-05 16:26 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team

On 4/3/20 6:09 PM, Andrii Nakryiko wrote:
> This patch series adds various observability APIs to bpf_link:
>   - each bpf_link now gets ID, similar to bpf_map and bpf_prog, by which
>     user-space can iterate over all existing bpf_links and create limited FD
>     from ID;
>   - allows to get extra object information with bpf_link general and
>     type-specific information;
>   - makes LINK_UPDATE operation allowed only for writable bpf_links and allows
>     to pin bpf_link as read-only file;
>   - implements `bpf link show` command which lists all active bpf_links in the
>     system;
>   - implements `bpf link pin` allowing to pin bpf_link by ID or from other
>     pinned path.
> 
> This RFC series is missing selftests and only limited amount of manual testing
> was performed. But kernel implementation is hopefully in a good shape and
> won't change much (unless some big issues are identified with the current
> approach). It would be great to get feedback on approach and implementation,
> before I invest more time in writing tests.
> 

The word 'ownership' was used over and over in describing the benefits
of bpf_link meaning a process owns a program at a specific attach point.
How does this set allow me to discover the pid of the process
controlling a specific bpf_link?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 0/8] bpf_link observability APIs
  2020-04-05 16:26 ` [RFC PATCH bpf-next 0/8] bpf_link observability APIs David Ahern
@ 2020-04-05 18:31   ` Andrii Nakryiko
  0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-05 18:31 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Sun, Apr 5, 2020 at 9:26 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 4/3/20 6:09 PM, Andrii Nakryiko wrote:
> > This patch series adds various observability APIs to bpf_link:
> >   - each bpf_link now gets ID, similar to bpf_map and bpf_prog, by which
> >     user-space can iterate over all existing bpf_links and create limited FD
> >     from ID;
> >   - allows to get extra object information with bpf_link general and
> >     type-specific information;
> >   - makes LINK_UPDATE operation allowed only for writable bpf_links and allows
> >     to pin bpf_link as read-only file;
> >   - implements `bpf link show` command which lists all active bpf_links in the
> >     system;
> >   - implements `bpf link pin` allowing to pin bpf_link by ID or from other
> >     pinned path.
> >
> > This RFC series is missing selftests and only limited amount of manual testing
> > was performed. But kernel implementation is hopefully in a good shape and
> > won't change much (unless some big issues are identified with the current
> > approach). It would be great to get feedback on approach and implementation,
> > before I invest more time in writing tests.
> >
>
> The word 'ownership' was used over and over in describing the benefits
> of bpf_link meaning a process owns a program at a specific attach point.
> How does this set allow me to discover the pid of the process
> controlling a specific bpf_link?

In general, it's many processes, not a single process. Here's how:

1. Each bpf_link has a unique ID.
2. You can find desired bpf_link info (including ID) either by already
having FD and querying it with GET_OBJ_INFO_BY_FD, or by doing
GET_NEXT_ID iteration and then GET_FD_BY_ID + GET_OBJ_INFO_BY_FD.
3. Iterate all open files (either by using tools like drgn or by
iterating over procfs), check their fdinfo to see if it's bpf_link's
with given ID. This gives you which application has FD open against
given bpf_link.

Similarly one can iterate over all pinned files in bpffs and see if it
pins bpf_link (I believe bpftool can do that already and show which
objects are pinned, so it should be a minimal change to actually print
out all the pinned file paths).

Does that answer your question?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-04  0:09 ` [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID " Andrii Nakryiko
@ 2020-04-06 11:34   ` Toke Høiland-Jørgensen
  2020-04-06 19:06     ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-06 11:34 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Andrii Nakryiko <andriin@fb.com> writes:

> Add support to look up bpf_link by ID and iterate over all existing bpf_links
> in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
> that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
> done as the very last step in finalizing bpf_link, together with installing
> FD. This approach allows users of bpf_link in kernel code to not worry about
> races between user-space and kernel code that hasn't finished attaching and
> initializing bpf_link.
>
> Further, it's critical that BPF_LINK_GET_FD_BY_ID only ever allows to create
> bpf_link FD that's O_RDONLY. This is to protect processes owning bpf_link and
> thus allowed to perform modifications on them (like LINK_UPDATE), from other
> processes that got bpf_link ID from GET_NEXT_ID API. In the latter case, only
> querying bpf_link information (implemented later in the series) will be
> allowed.

I must admit I remain sceptical about this model of restricting access
without any of the regular override mechanisms (for instance, enforcing
read-only mode regardless of CAP_DAC_OVERRIDE in this series). Since you
keep saying there would be 'some' override mechanism, I think it would
be helpful if you could just include that so we can see the full
mechanism in context.

-Toke


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 5/8] bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  2020-04-04  0:09 ` [RFC PATCH bpf-next 5/8] bpf: add support for BPF_OBJ_GET_INFO_BY_FD " Andrii Nakryiko
@ 2020-04-06 11:34   ` Toke Høiland-Jørgensen
  2020-04-06 18:58     ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-06 11:34 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

Andrii Nakryiko <andriin@fb.com> writes:

> Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
> Also enhance show_fdinfo to potentially include bpf_link type-specific
> information (similarly to obj_info).
>
> Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
> UAPI. bpf_link_tracing also now will store and return bpf_attach_type.
>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
>  include/linux/bpf-cgroup.h     |   2 -
>  include/linux/bpf.h            |  10 +-
>  include/linux/bpf_types.h      |   6 ++
>  include/uapi/linux/bpf.h       |  28 ++++++
>  kernel/bpf/btf.c               |   2 +
>  kernel/bpf/cgroup.c            |  45 ++++++++-
>  kernel/bpf/syscall.c           | 164 +++++++++++++++++++++++++++++----
>  kernel/bpf/verifier.c          |   2 +
>  tools/include/uapi/linux/bpf.h |  31 +++++++
>  9 files changed, 266 insertions(+), 24 deletions(-)
>
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index d2d969669564..ab95824a1d99 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -57,8 +57,6 @@ struct bpf_cgroup_link {
>  	enum bpf_attach_type type;
>  };
>  
> -extern const struct bpf_link_ops bpf_cgroup_link_lops;
> -
>  struct bpf_prog_list {
>  	struct list_head node;
>  	struct bpf_prog *prog;
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 67ce74890911..8cf182d256d4 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1026,9 +1026,11 @@ extern const struct file_operations bpf_prog_fops;
>  	extern const struct bpf_verifier_ops _name ## _verifier_ops;
>  #define BPF_MAP_TYPE(_id, _ops) \
>  	extern const struct bpf_map_ops _ops;
> +#define BPF_LINK_TYPE(_id, _name)
>  #include <linux/bpf_types.h>
>  #undef BPF_PROG_TYPE
>  #undef BPF_MAP_TYPE
> +#undef BPF_LINK_TYPE
>  
>  extern const struct bpf_prog_ops bpf_offload_prog_ops;
>  extern const struct bpf_verifier_ops tc_cls_act_analyzer_ops;
> @@ -1086,6 +1088,7 @@ int bpf_prog_new_fd(struct bpf_prog *prog);
>  struct bpf_link {
>  	atomic64_t refcnt;
>  	u32 id;
> +	enum bpf_link_type type;
>  	const struct bpf_link_ops *ops;
>  	struct bpf_prog *prog;
>  	struct work_struct work;
> @@ -1103,9 +1106,14 @@ struct bpf_link_ops {
>  	void (*dealloc)(struct bpf_link *link);
>  	int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
>  			   struct bpf_prog *old_prog);
> +	void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq);
> +	int (*fill_link_info)(const struct bpf_link *link,
> +			      struct bpf_link_info *info,
> +			      const struct bpf_link_info *uinfo,
> +			      u32 info_len);
>  };
>  
> -void bpf_link_init(struct bpf_link *link,
> +void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
>  		   const struct bpf_link_ops *ops, struct bpf_prog *prog);
>  int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
>  int bpf_link_settle(struct bpf_link_primer *primer);
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index ba0c2d56f8a3..8345cdf553b8 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -118,3 +118,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
>  #if defined(CONFIG_BPF_JIT)
>  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
>  #endif
> +
> +BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> +BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> +#ifdef CONFIG_CGROUP_BPF
> +BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
> +#endif
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 407c086bc9e4..d2f269082a33 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -222,6 +222,15 @@ enum bpf_attach_type {
>  
>  #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
>  
> +enum bpf_link_type {
> +	BPF_LINK_TYPE_UNSPEC = 0,
> +	BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
> +	BPF_LINK_TYPE_TRACING = 2,
> +	BPF_LINK_TYPE_CGROUP = 3,
> +
> +	MAX_BPF_LINK_TYPE,
> +};
> +
>  /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
>   *
>   * NONE(default): No further bpf programs allowed in the subtree.
> @@ -3601,6 +3610,25 @@ struct bpf_btf_info {
>  	__u32 id;
>  } __attribute__((aligned(8)));
>  
> +struct bpf_link_info {
> +	__u32 type;
> +	__u32 id;
> +	__u32 prog_id;
> +	union {
> +		struct {
> +			__aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
> +			__u32 tp_name_len;     /* in/out: tp_name buffer len */
> +		} raw_tracepoint;
> +		struct {
> +			__u32 attach_type;
> +		} tracing;

Shouldn't this also include the attach target?

-Toke


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 5/8] bpf: add support for BPF_OBJ_GET_INFO_BY_FD for bpf_link
  2020-04-06 11:34   ` Toke Høiland-Jørgensen
@ 2020-04-06 18:58     ` Andrii Nakryiko
  0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-06 18:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Mon, Apr 6, 2020 at 4:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andriin@fb.com> writes:
>
> > Add ability to fetch bpf_link details through BPF_OBJ_GET_INFO_BY_FD command.
> > Also enhance show_fdinfo to potentially include bpf_link type-specific
> > information (similarly to obj_info).
> >
> > Also introduce enum bpf_link_type stored in bpf_link itself and expose it in
> > UAPI. bpf_link_tracing also now will store and return bpf_attach_type.
> >
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---
> >  include/linux/bpf-cgroup.h     |   2 -
> >  include/linux/bpf.h            |  10 +-
> >  include/linux/bpf_types.h      |   6 ++
> >  include/uapi/linux/bpf.h       |  28 ++++++
> >  kernel/bpf/btf.c               |   2 +
> >  kernel/bpf/cgroup.c            |  45 ++++++++-
> >  kernel/bpf/syscall.c           | 164 +++++++++++++++++++++++++++++----
> >  kernel/bpf/verifier.c          |   2 +
> >  tools/include/uapi/linux/bpf.h |  31 +++++++
> >  9 files changed, 266 insertions(+), 24 deletions(-)
> >
> > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > index d2d969669564..ab95824a1d99 100644
> > --- a/include/linux/bpf-cgroup.h
> > +++ b/include/linux/bpf-cgroup.h
> > @@ -57,8 +57,6 @@ struct bpf_cgroup_link {
> >       enum bpf_attach_type type;
> >  };
> >
> > -extern const struct bpf_link_ops bpf_cgroup_link_lops;
> > -
> >  struct bpf_prog_list {
> >       struct list_head node;
> >       struct bpf_prog *prog;
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 67ce74890911..8cf182d256d4 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1026,9 +1026,11 @@ extern const struct file_operations bpf_prog_fops;
> >       extern const struct bpf_verifier_ops _name ## _verifier_ops;
> >  #define BPF_MAP_TYPE(_id, _ops) \
> >       extern const struct bpf_map_ops _ops;
> > +#define BPF_LINK_TYPE(_id, _name)
> >  #include <linux/bpf_types.h>
> >  #undef BPF_PROG_TYPE
> >  #undef BPF_MAP_TYPE
> > +#undef BPF_LINK_TYPE
> >
> >  extern const struct bpf_prog_ops bpf_offload_prog_ops;
> >  extern const struct bpf_verifier_ops tc_cls_act_analyzer_ops;
> > @@ -1086,6 +1088,7 @@ int bpf_prog_new_fd(struct bpf_prog *prog);
> >  struct bpf_link {
> >       atomic64_t refcnt;
> >       u32 id;
> > +     enum bpf_link_type type;
> >       const struct bpf_link_ops *ops;
> >       struct bpf_prog *prog;
> >       struct work_struct work;
> > @@ -1103,9 +1106,14 @@ struct bpf_link_ops {
> >       void (*dealloc)(struct bpf_link *link);
> >       int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
> >                          struct bpf_prog *old_prog);
> > +     void (*show_fdinfo)(const struct bpf_link *link, struct seq_file *seq);
> > +     int (*fill_link_info)(const struct bpf_link *link,
> > +                           struct bpf_link_info *info,
> > +                           const struct bpf_link_info *uinfo,
> > +                           u32 info_len);
> >  };
> >
> > -void bpf_link_init(struct bpf_link *link,
> > +void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
> >                  const struct bpf_link_ops *ops, struct bpf_prog *prog);
> >  int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
> >  int bpf_link_settle(struct bpf_link_primer *primer);
> > diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> > index ba0c2d56f8a3..8345cdf553b8 100644
> > --- a/include/linux/bpf_types.h
> > +++ b/include/linux/bpf_types.h
> > @@ -118,3 +118,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
> >  #if defined(CONFIG_BPF_JIT)
> >  BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops)
> >  #endif
> > +
> > +BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint)
> > +BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing)
> > +#ifdef CONFIG_CGROUP_BPF
> > +BPF_LINK_TYPE(BPF_LINK_TYPE_CGROUP, cgroup)
> > +#endif
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 407c086bc9e4..d2f269082a33 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -222,6 +222,15 @@ enum bpf_attach_type {
> >
> >  #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
> >
> > +enum bpf_link_type {
> > +     BPF_LINK_TYPE_UNSPEC = 0,
> > +     BPF_LINK_TYPE_RAW_TRACEPOINT = 1,
> > +     BPF_LINK_TYPE_TRACING = 2,
> > +     BPF_LINK_TYPE_CGROUP = 3,
> > +
> > +     MAX_BPF_LINK_TYPE,
> > +};
> > +
> >  /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
> >   *
> >   * NONE(default): No further bpf programs allowed in the subtree.
> > @@ -3601,6 +3610,25 @@ struct bpf_btf_info {
> >       __u32 id;
> >  } __attribute__((aligned(8)));
> >
> > +struct bpf_link_info {
> > +     __u32 type;
> > +     __u32 id;
> > +     __u32 prog_id;
> > +     union {
> > +             struct {
> > +                     __aligned_u64 tp_name; /* in/out: tp_name buffer ptr */
> > +                     __u32 tp_name_len;     /* in/out: tp_name buffer len */
> > +             } raw_tracepoint;
> > +             struct {
> > +                     __u32 attach_type;
> > +             } tracing;
>
> Shouldn't this also include the attach target?

We can store extra stuff in bpf_link for this as well and return it,
I'll take a look what makes sense to return for bpf_tracing_link.

>
> -Toke
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-06 11:34   ` Toke Høiland-Jørgensen
@ 2020-04-06 19:06     ` Andrii Nakryiko
  2020-04-08 15:14       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-06 19:06 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Mon, Apr 6, 2020 at 4:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andriin@fb.com> writes:
>
> > Add support to look up bpf_link by ID and iterate over all existing bpf_links
> > in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
> > that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
> > done as the very last step in finalizing bpf_link, together with installing
> > FD. This approach allows users of bpf_link in kernel code to not worry about
> > races between user-space and kernel code that hasn't finished attaching and
> > initializing bpf_link.
> >
> > Further, it's critical that BPF_LINK_GET_FD_BY_ID only ever allows to create
> > bpf_link FD that's O_RDONLY. This is to protect processes owning bpf_link and
> > thus allowed to perform modifications on them (like LINK_UPDATE), from other
> > processes that got bpf_link ID from GET_NEXT_ID API. In the latter case, only
> > querying bpf_link information (implemented later in the series) will be
> > allowed.
>
> I must admit I remain sceptical about this model of restricting access
> without any of the regular override mechanisms (for instance, enforcing
> read-only mode regardless of CAP_DAC_OVERRIDE in this series). Since you
> keep saying there would be 'some' override mechanism, I think it would
> be helpful if you could just include that so we can see the full
> mechanism in context.

I wasn't aware of CAP_DAC_OVERRIDE, thanks for bringing this up.

One way to go about this is to allow creating writable bpf_link for
GET_FD_BY_ID if CAP_DAC_OVERRIDE is set. Then we can allow LINK_DETACH
operation on writable links, same as we do with LINK_UPDATE here.
LINK_DETACH will do the same as cgroup bpf_link auto-detachment on
cgroup dying: it will detach bpf_link, but will leave it alive until
last FD is closed.

We need to consider, though, if CAP_DAC_OVERRIDE is something that can
be disabled for majority of real-life applications to prevent them
from doing this. If every realistic application has/needs
CAP_DAC_OVERRIDE, then that's essentially just saying that anyone can
get writable bpf_link and do anything with it.

>
> -Toke
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-06 19:06     ` Andrii Nakryiko
@ 2020-04-08 15:14       ` Toke Høiland-Jørgensen
  2020-04-08 20:23         ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-08 15:14 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Apr 6, 2020 at 4:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andriin@fb.com> writes:
>>
>> > Add support to look up bpf_link by ID and iterate over all existing bpf_links
>> > in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
>> > that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
>> > done as the very last step in finalizing bpf_link, together with installing
>> > FD. This approach allows users of bpf_link in kernel code to not worry about
>> > races between user-space and kernel code that hasn't finished attaching and
>> > initializing bpf_link.
>> >
>> > Further, it's critical that BPF_LINK_GET_FD_BY_ID only ever allows to create
>> > bpf_link FD that's O_RDONLY. This is to protect processes owning bpf_link and
>> > thus allowed to perform modifications on them (like LINK_UPDATE), from other
>> > processes that got bpf_link ID from GET_NEXT_ID API. In the latter case, only
>> > querying bpf_link information (implemented later in the series) will be
>> > allowed.
>>
>> I must admit I remain sceptical about this model of restricting access
>> without any of the regular override mechanisms (for instance, enforcing
>> read-only mode regardless of CAP_DAC_OVERRIDE in this series). Since you
>> keep saying there would be 'some' override mechanism, I think it would
>> be helpful if you could just include that so we can see the full
>> mechanism in context.
>
> I wasn't aware of CAP_DAC_OVERRIDE, thanks for bringing this up.
>
> One way to go about this is to allow creating writable bpf_link for
> GET_FD_BY_ID if CAP_DAC_OVERRIDE is set. Then we can allow LINK_DETACH
> operation on writable links, same as we do with LINK_UPDATE here.
> LINK_DETACH will do the same as cgroup bpf_link auto-detachment on
> cgroup dying: it will detach bpf_link, but will leave it alive until
> last FD is closed.

Yup, I think this would be a reasonable way to implement the override
mechanism - it would ensure 'full root' users (like a root shell) can
remove attachments, while still preventing applications from doing so by
limiting their capabilities.

Extending on the concept of RO/RW bpf_link attachments, maybe it should
even be possible for an application to choose which mode it wants to pin
its fd in? With the same capability being able to override it of
course...

> We need to consider, though, if CAP_DAC_OVERRIDE is something that can
> be disabled for majority of real-life applications to prevent them
> from doing this. If every realistic application has/needs
> CAP_DAC_OVERRIDE, then that's essentially just saying that anyone can
> get writable bpf_link and do anything with it.

I poked around a bit, and looking at the sandboxing configurations
shipped with various daemons in their systemd unit files, it appears
that the main case where daemons are granted CAP_DAC_OVERRIDE is if they
have to be able to read /etc/shadow (which is installed as chmod 0). If
this is really the case, that would indicate it's not a widely needed
capability; but I wouldn't exactly say that I've done a comprehensive
survey, so probably a good idea for you to check your users as well :)

-Toke


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-08 15:14       ` Toke Høiland-Jørgensen
@ 2020-04-08 20:23         ` Andrii Nakryiko
  2020-04-08 21:21           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-08 20:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Wed, Apr 8, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Mon, Apr 6, 2020 at 4:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andriin@fb.com> writes:
> >>
> >> > Add support to look up bpf_link by ID and iterate over all existing bpf_links
> >> > in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
> >> > that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
> >> > done as the very last step in finalizing bpf_link, together with installing
> >> > FD. This approach allows users of bpf_link in kernel code to not worry about
> >> > races between user-space and kernel code that hasn't finished attaching and
> >> > initializing bpf_link.
> >> >
> >> > Further, it's critical that BPF_LINK_GET_FD_BY_ID only ever allows to create
> >> > bpf_link FD that's O_RDONLY. This is to protect processes owning bpf_link and
> >> > thus allowed to perform modifications on them (like LINK_UPDATE), from other
> >> > processes that got bpf_link ID from GET_NEXT_ID API. In the latter case, only
> >> > querying bpf_link information (implemented later in the series) will be
> >> > allowed.
> >>
> >> I must admit I remain sceptical about this model of restricting access
> >> without any of the regular override mechanisms (for instance, enforcing
> >> read-only mode regardless of CAP_DAC_OVERRIDE in this series). Since you
> >> keep saying there would be 'some' override mechanism, I think it would
> >> be helpful if you could just include that so we can see the full
> >> mechanism in context.
> >
> > I wasn't aware of CAP_DAC_OVERRIDE, thanks for bringing this up.
> >
> > One way to go about this is to allow creating writable bpf_link for
> > GET_FD_BY_ID if CAP_DAC_OVERRIDE is set. Then we can allow LINK_DETACH
> > operation on writable links, same as we do with LINK_UPDATE here.
> > LINK_DETACH will do the same as cgroup bpf_link auto-detachment on
> > cgroup dying: it will detach bpf_link, but will leave it alive until
> > last FD is closed.
>
> Yup, I think this would be a reasonable way to implement the override
> mechanism - it would ensure 'full root' users (like a root shell) can
> remove attachments, while still preventing applications from doing so by
> limiting their capabilities.

So I did some experiments and I think I want to keep GET_FD_BY_ID for
bpf_link to return only read-only bpf_links. After that, one can pin
bpf_link temporarily and re-open it as writable one, provided
CAP_DAC_OVERRIDE capability is present. All that works already,
because pinned bpf_link is just a file, so one can do fchmod on it and
all that will go through normal file access permission check code
path. Unfortunately, just re-opening same FD as writable (which would
be possible if fcntl(fd, F_SETFL, S_IRUSR
 S_IWUSR) was supported on Linux) without pinning is not possible.
Opening link from /proc/<pid>/fd/<link-fd> doesn't seem to work
either, because backing inode is not BPF FS inode. I'm not sure, but
maybe we can support the latter eventually. But either way, I think
given this is to be used for manual troubleshooting, going through few
extra hoops to force-detach bpf_link is actually a good thing.


>
> Extending on the concept of RO/RW bpf_link attachments, maybe it should
> even be possible for an application to choose which mode it wants to pin
> its fd in? With the same capability being able to override it of
> course...

Isn't that what patch #2 is doing?... There are few bugs in the
implementation currently, but it will work in the final version.

>
> > We need to consider, though, if CAP_DAC_OVERRIDE is something that can
> > be disabled for majority of real-life applications to prevent them
> > from doing this. If every realistic application has/needs
> > CAP_DAC_OVERRIDE, then that's essentially just saying that anyone can
> > get writable bpf_link and do anything with it.
>
> I poked around a bit, and looking at the sandboxing configurations
> shipped with various daemons in their systemd unit files, it appears
> that the main case where daemons are granted CAP_DAC_OVERRIDE is if they
> have to be able to read /etc/shadow (which is installed as chmod 0). If
> this is really the case, that would indicate it's not a widely needed
> capability; but I wouldn't exactly say that I've done a comprehensive
> survey, so probably a good idea for you to check your users as well :)

Right, it might not be possible to drop it for all applications right
away, but at least CAP_DAC_OVERRIDE is not CAP_SYS_ADMIN, which is
absolutely necessary to work with BPF.

>
> -Toke
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-08 20:23         ` Andrii Nakryiko
@ 2020-04-08 21:21           ` Toke Høiland-Jørgensen
  2020-04-09 18:49             ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-08 21:21 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Wed, Apr 8, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Mon, Apr 6, 2020 at 4:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andriin@fb.com> writes:
>> >>
>> >> > Add support to look up bpf_link by ID and iterate over all existing bpf_links
>> >> > in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
>> >> > that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
>> >> > done as the very last step in finalizing bpf_link, together with installing
>> >> > FD. This approach allows users of bpf_link in kernel code to not worry about
>> >> > races between user-space and kernel code that hasn't finished attaching and
>> >> > initializing bpf_link.
>> >> >
>> >> > Further, it's critical that BPF_LINK_GET_FD_BY_ID only ever allows to create
>> >> > bpf_link FD that's O_RDONLY. This is to protect processes owning bpf_link and
>> >> > thus allowed to perform modifications on them (like LINK_UPDATE), from other
>> >> > processes that got bpf_link ID from GET_NEXT_ID API. In the latter case, only
>> >> > querying bpf_link information (implemented later in the series) will be
>> >> > allowed.
>> >>
>> >> I must admit I remain sceptical about this model of restricting access
>> >> without any of the regular override mechanisms (for instance, enforcing
>> >> read-only mode regardless of CAP_DAC_OVERRIDE in this series). Since you
>> >> keep saying there would be 'some' override mechanism, I think it would
>> >> be helpful if you could just include that so we can see the full
>> >> mechanism in context.
>> >
>> > I wasn't aware of CAP_DAC_OVERRIDE, thanks for bringing this up.
>> >
>> > One way to go about this is to allow creating writable bpf_link for
>> > GET_FD_BY_ID if CAP_DAC_OVERRIDE is set. Then we can allow LINK_DETACH
>> > operation on writable links, same as we do with LINK_UPDATE here.
>> > LINK_DETACH will do the same as cgroup bpf_link auto-detachment on
>> > cgroup dying: it will detach bpf_link, but will leave it alive until
>> > last FD is closed.
>>
>> Yup, I think this would be a reasonable way to implement the override
>> mechanism - it would ensure 'full root' users (like a root shell) can
>> remove attachments, while still preventing applications from doing so by
>> limiting their capabilities.
>
> So I did some experiments and I think I want to keep GET_FD_BY_ID for
> bpf_link to return only read-only bpf_links.

Why, exactly? (also, see below)

> After that, one can pin bpf_link temporarily and re-open it as
> writable one, provided CAP_DAC_OVERRIDE capability is present. All
> that works already, because pinned bpf_link is just a file, so one can
> do fchmod on it and all that will go through normal file access
> permission check code path.

Ah, I did not know that was possible - I was assuming that bpffs was
doing something special to prevent that. But if not, great!

> Unfortunately, just re-opening same FD as writable (which would
> be possible if fcntl(fd, F_SETFL, S_IRUSR
>  S_IWUSR) was supported on Linux) without pinning is not possible.
> Opening link from /proc/<pid>/fd/<link-fd> doesn't seem to work
> either, because backing inode is not BPF FS inode. I'm not sure, but
> maybe we can support the latter eventually. But either way, I think
> given this is to be used for manual troubleshooting, going through few
> extra hoops to force-detach bpf_link is actually a good thing.

Hmm, I disagree that deliberately making users jump through hoops is a
good thing. Smells an awful lot like security through obscurity to me;
and we all know how well that works anyway...

>> Extending on the concept of RO/RW bpf_link attachments, maybe it should
>> even be possible for an application to choose which mode it wants to pin
>> its fd in? With the same capability being able to override it of
>> course...
>
> Isn't that what patch #2 is doing?...

Ah yes, so it is! I guess I skipped over that a bit too fast ;)

> There are few bugs in the implementation currently, but it will work
> in the final version.

Cool.

>> > We need to consider, though, if CAP_DAC_OVERRIDE is something that can
>> > be disabled for majority of real-life applications to prevent them
>> > from doing this. If every realistic application has/needs
>> > CAP_DAC_OVERRIDE, then that's essentially just saying that anyone can
>> > get writable bpf_link and do anything with it.
>>
>> I poked around a bit, and looking at the sandboxing configurations
>> shipped with various daemons in their systemd unit files, it appears
>> that the main case where daemons are granted CAP_DAC_OVERRIDE is if they
>> have to be able to read /etc/shadow (which is installed as chmod 0). If
>> this is really the case, that would indicate it's not a widely needed
>> capability; but I wouldn't exactly say that I've done a comprehensive
>> survey, so probably a good idea for you to check your users as well :)
>
> Right, it might not be possible to drop it for all applications right
> away, but at least CAP_DAC_OVERRIDE is not CAP_SYS_ADMIN, which is
> absolutely necessary to work with BPF.

Yeah, I do hope that we'll eventually get CAP_BPF...

-Toke


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 8/8] bpftool: add bpf_link show and pin support
  2020-04-04  0:09 ` [RFC PATCH bpf-next 8/8] bpftool: add bpf_link show and pin support Andrii Nakryiko
@ 2020-04-08 23:44   ` David Ahern
  2020-04-09 18:50     ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: David Ahern @ 2020-04-08 23:44 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev, ast, daniel; +Cc: andrii.nakryiko, kernel-team

On 4/3/20 6:09 PM, Andrii Nakryiko wrote:
> diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
> index 466c269eabdd..4b2f74941625 100644
> --- a/tools/bpf/bpftool/main.c
> +++ b/tools/bpf/bpftool/main.c
> @@ -30,6 +30,7 @@ bool verifier_logs;
>  bool relaxed_maps;
>  struct pinned_obj_table prog_table;
>  struct pinned_obj_table map_table;
> +struct pinned_obj_table link_table;
>  
>  static void __noreturn clean_and_exit(int i)
>  {
> @@ -215,6 +216,7 @@ static const struct cmd cmds[] = {
>  	{ "batch",	do_batch },
>  	{ "prog",	do_prog },
>  	{ "map",	do_map },
> +	{ "link",	do_link },
>  	{ "cgroup",	do_cgroup },
>  	{ "perf",	do_perf },
>  	{ "net",	do_net },

you need to add 'link' to the OBJECT list in do_help.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-08 21:21           ` Toke Høiland-Jørgensen
@ 2020-04-09 18:49             ` Andrii Nakryiko
  2020-04-14 10:32               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-09 18:49 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Wed, Apr 8, 2020 at 2:21 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Wed, Apr 8, 2020 at 8:14 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Mon, Apr 6, 2020 at 4:34 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Andrii Nakryiko <andriin@fb.com> writes:
> >> >>
> >> >> > Add support to look up bpf_link by ID and iterate over all existing bpf_links
> >> >> > in the system. GET_FD_BY_ID code handles not-yet-ready bpf_link by checking
> >> >> > that its ID hasn't been set to non-zero value yet. Setting bpf_link's ID is
> >> >> > done as the very last step in finalizing bpf_link, together with installing
> >> >> > FD. This approach allows users of bpf_link in kernel code to not worry about
> >> >> > races between user-space and kernel code that hasn't finished attaching and
> >> >> > initializing bpf_link.
> >> >> >
> >> >> > Further, it's critical that BPF_LINK_GET_FD_BY_ID only ever allows to create
> >> >> > bpf_link FD that's O_RDONLY. This is to protect processes owning bpf_link and
> >> >> > thus allowed to perform modifications on them (like LINK_UPDATE), from other
> >> >> > processes that got bpf_link ID from GET_NEXT_ID API. In the latter case, only
> >> >> > querying bpf_link information (implemented later in the series) will be
> >> >> > allowed.
> >> >>
> >> >> I must admit I remain sceptical about this model of restricting access
> >> >> without any of the regular override mechanisms (for instance, enforcing
> >> >> read-only mode regardless of CAP_DAC_OVERRIDE in this series). Since you
> >> >> keep saying there would be 'some' override mechanism, I think it would
> >> >> be helpful if you could just include that so we can see the full
> >> >> mechanism in context.
> >> >
> >> > I wasn't aware of CAP_DAC_OVERRIDE, thanks for bringing this up.
> >> >
> >> > One way to go about this is to allow creating writable bpf_link for
> >> > GET_FD_BY_ID if CAP_DAC_OVERRIDE is set. Then we can allow LINK_DETACH
> >> > operation on writable links, same as we do with LINK_UPDATE here.
> >> > LINK_DETACH will do the same as cgroup bpf_link auto-detachment on
> >> > cgroup dying: it will detach bpf_link, but will leave it alive until
> >> > last FD is closed.
> >>
> >> Yup, I think this would be a reasonable way to implement the override
> >> mechanism - it would ensure 'full root' users (like a root shell) can
> >> remove attachments, while still preventing applications from doing so by
> >> limiting their capabilities.
> >
> > So I did some experiments and I think I want to keep GET_FD_BY_ID for
> > bpf_link to return only read-only bpf_links.
>
> Why, exactly? (also, see below)

For the reasons I explained below: because you can turn read-only
bpf_link into writable one through pinning + chmod, if you have
CAP_DAC_OVERRIDE.

>
> > After that, one can pin bpf_link temporarily and re-open it as
> > writable one, provided CAP_DAC_OVERRIDE capability is present. All
> > that works already, because pinned bpf_link is just a file, so one can
> > do fchmod on it and all that will go through normal file access
> > permission check code path.
>
> Ah, I did not know that was possible - I was assuming that bpffs was
> doing something special to prevent that. But if not, great!
>
> > Unfortunately, just re-opening same FD as writable (which would
> > be possible if fcntl(fd, F_SETFL, S_IRUSR
> >  S_IWUSR) was supported on Linux) without pinning is not possible.
> > Opening link from /proc/<pid>/fd/<link-fd> doesn't seem to work
> > either, because backing inode is not BPF FS inode. I'm not sure, but
> > maybe we can support the latter eventually. But either way, I think
> > given this is to be used for manual troubleshooting, going through few
> > extra hoops to force-detach bpf_link is actually a good thing.
>
> Hmm, I disagree that deliberately making users jump through hoops is a
> good thing. Smells an awful lot like security through obscurity to me;
> and we all know how well that works anyway...

Depends on who users are? bpftool can implement this as one of
`bpftool link` sub-commands and allow human operators to force-detach
bpf_link, if necessary. I think applications shouldn't do this
(programmatically) at all, which is why I think it's actually good
that it's harder and not obvious, this will make developer think again
before implementing this, hopefully. For me it's about discouraging
bad practice.

>
> >> Extending on the concept of RO/RW bpf_link attachments, maybe it should
> >> even be possible for an application to choose which mode it wants to pin
> >> its fd in? With the same capability being able to override it of
> >> course...
> >
> > Isn't that what patch #2 is doing?...
>
> Ah yes, so it is! I guess I skipped over that a bit too fast ;)
>
> > There are few bugs in the implementation currently, but it will work
> > in the final version.
>
> Cool.
>
> >> > We need to consider, though, if CAP_DAC_OVERRIDE is something that can
> >> > be disabled for majority of real-life applications to prevent them
> >> > from doing this. If every realistic application has/needs
> >> > CAP_DAC_OVERRIDE, then that's essentially just saying that anyone can
> >> > get writable bpf_link and do anything with it.
> >>
> >> I poked around a bit, and looking at the sandboxing configurations
> >> shipped with various daemons in their systemd unit files, it appears
> >> that the main case where daemons are granted CAP_DAC_OVERRIDE is if they
> >> have to be able to read /etc/shadow (which is installed as chmod 0). If
> >> this is really the case, that would indicate it's not a widely needed
> >> capability; but I wouldn't exactly say that I've done a comprehensive
> >> survey, so probably a good idea for you to check your users as well :)
> >
> > Right, it might not be possible to drop it for all applications right
> > away, but at least CAP_DAC_OVERRIDE is not CAP_SYS_ADMIN, which is
> > absolutely necessary to work with BPF.
>
> Yeah, I do hope that we'll eventually get CAP_BPF...
>
> -Toke
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 8/8] bpftool: add bpf_link show and pin support
  2020-04-08 23:44   ` David Ahern
@ 2020-04-09 18:50     ` Andrii Nakryiko
  0 siblings, 0 replies; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-09 18:50 UTC (permalink / raw)
  To: David Ahern
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Wed, Apr 8, 2020 at 4:44 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 4/3/20 6:09 PM, Andrii Nakryiko wrote:
> > diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
> > index 466c269eabdd..4b2f74941625 100644
> > --- a/tools/bpf/bpftool/main.c
> > +++ b/tools/bpf/bpftool/main.c
> > @@ -30,6 +30,7 @@ bool verifier_logs;
> >  bool relaxed_maps;
> >  struct pinned_obj_table prog_table;
> >  struct pinned_obj_table map_table;
> > +struct pinned_obj_table link_table;
> >
> >  static void __noreturn clean_and_exit(int i)
> >  {
> > @@ -215,6 +216,7 @@ static const struct cmd cmds[] = {
> >       { "batch",      do_batch },
> >       { "prog",       do_prog },
> >       { "map",        do_map },
> > +     { "link",       do_link },
> >       { "cgroup",     do_cgroup },
> >       { "perf",       do_perf },
> >       { "net",        do_net },
>
> you need to add 'link' to the OBJECT list in do_help.

Yeah, and bash completions and a bunch of other stuff. Bpftool support
is far from being polished.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-09 18:49             ` Andrii Nakryiko
@ 2020-04-14 10:32               ` Toke Høiland-Jørgensen
  2020-04-14 18:47                 ` Andrii Nakryiko
  0 siblings, 1 reply; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-14 10:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

>> > After that, one can pin bpf_link temporarily and re-open it as
>> > writable one, provided CAP_DAC_OVERRIDE capability is present. All
>> > that works already, because pinned bpf_link is just a file, so one can
>> > do fchmod on it and all that will go through normal file access
>> > permission check code path.
>>
>> Ah, I did not know that was possible - I was assuming that bpffs was
>> doing something special to prevent that. But if not, great!
>>
>> > Unfortunately, just re-opening same FD as writable (which would
>> > be possible if fcntl(fd, F_SETFL, S_IRUSR
>> >  S_IWUSR) was supported on Linux) without pinning is not possible.
>> > Opening link from /proc/<pid>/fd/<link-fd> doesn't seem to work
>> > either, because backing inode is not BPF FS inode. I'm not sure, but
>> > maybe we can support the latter eventually. But either way, I think
>> > given this is to be used for manual troubleshooting, going through few
>> > extra hoops to force-detach bpf_link is actually a good thing.
>>
>> Hmm, I disagree that deliberately making users jump through hoops is a
>> good thing. Smells an awful lot like security through obscurity to me;
>> and we all know how well that works anyway...
>
> Depends on who users are? bpftool can implement this as one of
> `bpftool link` sub-commands and allow human operators to force-detach
> bpf_link, if necessary.

Yeah, I would expect this to be the common way this would be used: built
into tools.

> I think applications shouldn't do this (programmatically) at all,
> which is why I think it's actually good that it's harder and not
> obvious, this will make developer think again before implementing
> this, hopefully. For me it's about discouraging bad practice.

I guess I just don't share your optimism that making people jump through
hoops will actually discourage them :)

If people know what they are doing it should be enough to document it as
discouraged. And if they don't, they are perfectly capable of finding
and copy-pasting the sequence of hoop-jumps required to achieve what
they want, probably with more bugs added along the way.

So in the end I think that all you're really achieving is annoying
people who do have a legitimate reason to override the behaviour (which
includes yourself as a bpftool developer :)). That's what I meant by the
'security through obscurity' comment.

-Toke


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-14 10:32               ` Toke Høiland-Jørgensen
@ 2020-04-14 18:47                 ` Andrii Nakryiko
  2020-04-15  9:26                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 24+ messages in thread
From: Andrii Nakryiko @ 2020-04-14 18:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

On Tue, Apr 14, 2020 at 3:32 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> >> > After that, one can pin bpf_link temporarily and re-open it as
> >> > writable one, provided CAP_DAC_OVERRIDE capability is present. All
> >> > that works already, because pinned bpf_link is just a file, so one can
> >> > do fchmod on it and all that will go through normal file access
> >> > permission check code path.
> >>
> >> Ah, I did not know that was possible - I was assuming that bpffs was
> >> doing something special to prevent that. But if not, great!
> >>
> >> > Unfortunately, just re-opening same FD as writable (which would
> >> > be possible if fcntl(fd, F_SETFL, S_IRUSR
> >> >  S_IWUSR) was supported on Linux) without pinning is not possible.
> >> > Opening link from /proc/<pid>/fd/<link-fd> doesn't seem to work
> >> > either, because backing inode is not BPF FS inode. I'm not sure, but
> >> > maybe we can support the latter eventually. But either way, I think
> >> > given this is to be used for manual troubleshooting, going through few
> >> > extra hoops to force-detach bpf_link is actually a good thing.
> >>
> >> Hmm, I disagree that deliberately making users jump through hoops is a
> >> good thing. Smells an awful lot like security through obscurity to me;
> >> and we all know how well that works anyway...
> >
> > Depends on who users are? bpftool can implement this as one of
> > `bpftool link` sub-commands and allow human operators to force-detach
> > bpf_link, if necessary.
>
> Yeah, I would expect this to be the common way this would be used: built
> into tools.
>
> > I think applications shouldn't do this (programmatically) at all,
> > which is why I think it's actually good that it's harder and not
> > obvious, this will make developer think again before implementing
> > this, hopefully. For me it's about discouraging bad practice.
>
> I guess I just don't share your optimism that making people jump through
> hoops will actually discourage them :)

I understand. I just don't see why would anyone have to implement this
at all and especially would think it's a good idea to begin with?

>
> If people know what they are doing it should be enough to document it as
> discouraged. And if they don't, they are perfectly capable of finding
> and copy-pasting the sequence of hoop-jumps required to achieve what
> they want, probably with more bugs added along the way.
>
> So in the end I think that all you're really achieving is annoying
> people who do have a legitimate reason to override the behaviour (which
> includes yourself as a bpftool developer :)). That's what I meant by the
> 'security through obscurity' comment.

Can I please get a list of real examples of legitimate reasons to
override this behavior?

>
> -Toke
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID for bpf_link
  2020-04-14 18:47                 ` Andrii Nakryiko
@ 2020-04-15  9:26                   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 24+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-15  9:26 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, Networking, Alexei Starovoitov,
	Daniel Borkmann, Kernel Team

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Apr 14, 2020 at 3:32 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> >> > After that, one can pin bpf_link temporarily and re-open it as
>> >> > writable one, provided CAP_DAC_OVERRIDE capability is present. All
>> >> > that works already, because pinned bpf_link is just a file, so one can
>> >> > do fchmod on it and all that will go through normal file access
>> >> > permission check code path.
>> >>
>> >> Ah, I did not know that was possible - I was assuming that bpffs was
>> >> doing something special to prevent that. But if not, great!
>> >>
>> >> > Unfortunately, just re-opening same FD as writable (which would
>> >> > be possible if fcntl(fd, F_SETFL, S_IRUSR
>> >> >  S_IWUSR) was supported on Linux) without pinning is not possible.
>> >> > Opening link from /proc/<pid>/fd/<link-fd> doesn't seem to work
>> >> > either, because backing inode is not BPF FS inode. I'm not sure, but
>> >> > maybe we can support the latter eventually. But either way, I think
>> >> > given this is to be used for manual troubleshooting, going through few
>> >> > extra hoops to force-detach bpf_link is actually a good thing.
>> >>
>> >> Hmm, I disagree that deliberately making users jump through hoops is a
>> >> good thing. Smells an awful lot like security through obscurity to me;
>> >> and we all know how well that works anyway...
>> >
>> > Depends on who users are? bpftool can implement this as one of
>> > `bpftool link` sub-commands and allow human operators to force-detach
>> > bpf_link, if necessary.
>>
>> Yeah, I would expect this to be the common way this would be used: built
>> into tools.
>>
>> > I think applications shouldn't do this (programmatically) at all,
>> > which is why I think it's actually good that it's harder and not
>> > obvious, this will make developer think again before implementing
>> > this, hopefully. For me it's about discouraging bad practice.
>>
>> I guess I just don't share your optimism that making people jump through
>> hoops will actually discourage them :)
>
> I understand. I just don't see why would anyone have to implement this
> at all and especially would think it's a good idea to begin with?
>
>>
>> If people know what they are doing it should be enough to document it as
>> discouraged. And if they don't, they are perfectly capable of finding
>> and copy-pasting the sequence of hoop-jumps required to achieve what
>> they want, probably with more bugs added along the way.
>>
>> So in the end I think that all you're really achieving is annoying
>> people who do have a legitimate reason to override the behaviour (which
>> includes yourself as a bpftool developer :)). That's what I meant by the
>> 'security through obscurity' comment.
>
> Can I please get a list of real examples of legitimate reasons to
> override this behavior?

Primarily, I expect that this would be built into admin tools (like
bpftool as you suggested). I just don't see why such tools should be
made to do the whole pin/reopen dance (which, BTW, adds an implicit
dependency on having a mounted bpffs) when we could just add a
capability check directly in bpf_link_get_fd_by_id()?

-Toke


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2020-04-15  9:26 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-04  0:09 [RFC PATCH bpf-next 0/8] bpf_link observability APIs Andrii Nakryiko
2020-04-04  0:09 ` [RFC PATCH bpf-next 1/8] bpf: refactor bpf_link update handling Andrii Nakryiko
2020-04-04  0:09 ` [RFC PATCH bpf-next 2/8] bpf: allow bpf_link pinning as read-only and enforce LINK_UPDATE Andrii Nakryiko
2020-04-04  0:09 ` [RFC PATCH bpf-next 3/8] bpf: allocate ID for bpf_link Andrii Nakryiko
2020-04-04  0:09 ` [RFC PATCH bpf-next 4/8] bpf: support GET_FD_BY_ID and GET_NEXT_ID " Andrii Nakryiko
2020-04-06 11:34   ` Toke Høiland-Jørgensen
2020-04-06 19:06     ` Andrii Nakryiko
2020-04-08 15:14       ` Toke Høiland-Jørgensen
2020-04-08 20:23         ` Andrii Nakryiko
2020-04-08 21:21           ` Toke Høiland-Jørgensen
2020-04-09 18:49             ` Andrii Nakryiko
2020-04-14 10:32               ` Toke Høiland-Jørgensen
2020-04-14 18:47                 ` Andrii Nakryiko
2020-04-15  9:26                   ` Toke Høiland-Jørgensen
2020-04-04  0:09 ` [RFC PATCH bpf-next 5/8] bpf: add support for BPF_OBJ_GET_INFO_BY_FD " Andrii Nakryiko
2020-04-06 11:34   ` Toke Høiland-Jørgensen
2020-04-06 18:58     ` Andrii Nakryiko
2020-04-04  0:09 ` [RFC PATCH bpf-next 6/8] libbpf: add low-level APIs for new bpf_link commands Andrii Nakryiko
2020-04-04  0:09 ` [RFC PATCH bpf-next 7/8] bpftool: expose attach_type-to-string array to non-cgroup code Andrii Nakryiko
2020-04-04  0:09 ` [RFC PATCH bpf-next 8/8] bpftool: add bpf_link show and pin support Andrii Nakryiko
2020-04-08 23:44   ` David Ahern
2020-04-09 18:50     ` Andrii Nakryiko
2020-04-05 16:26 ` [RFC PATCH bpf-next 0/8] bpf_link observability APIs David Ahern
2020-04-05 18:31   ` Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).