All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
@ 2022-01-06 21:50 Hao Luo
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system Hao Luo
                   ` (9 more replies)
  0 siblings, 10 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-06 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, Hao Luo

Bpffs is a pseudo file system that persists bpf objects. Previously
bpf objects can only be pinned in bpffs, this patchset extends pinning
to allow bpf objects to be pinned (or exposed) to other file systems.

In particular, this patchset allows pinning bpf objects in kernfs. This
creates a new file entry in the kernfs file system and the created file
is able to reference the bpf object. By doing so, bpf can be used to
customize the file's operations, such as seq_show.

As a concrete usecase of this feature, this patchset introduces a
simple new program type called 'bpf_view', which can be used to format
a seq file by a kernel object's state. By pinning a bpf_view program
into a cgroup directory, userspace is able to read the cgroup's state
from file in a format defined by the bpf program.

Different from bpffs, kernfs doesn't have a callback when a kernfs node
is freed, which is problem if we allow the kernfs node to hold an extra
reference of the bpf object, because there is no chance to dec the
object's refcnt. Therefore the kernfs node created by pinning doesn't
hold reference of the bpf object. The lifetime of the kernfs node
depends on the lifetime of the bpf object. Rather than "pinning in
kernfs", it is "exposing to kernfs". We require the bpf object to be
pinned in bpffs first before it can be pinned in kernfs. When the
object is unpinned from bpffs, their kernfs nodes will be removed
automatically. This somehow treats a pinned bpf object as a persistent
"device".

We rely on fsnotify to monitor the inode events in bpffs. A new function
bpf_watch_inode() is introduced. It allows registering a callback
function at inode destruction. For the kernfs case, a callback that
removes kernfs node is registered at the destruction of bpffs inodes.
For other file systems such as sockfs, bpf_watch_inode() can monitor the
destruction of sockfs inodes and the created file entry can hold the bpf
object's reference. In this case, it is truly "pinning".

File operations other than seq_show can also be implemented using bpf.
For example, bpf may be of help for .poll and .mmap in kernfs.

Patch organization:
 - patch 1/8 and 2/8 are preparations. 1/8 implements bpf_watch_inode();
   2/8 records bpffs inode in bpf object.
 - patch 3/8 and 4/8 implement generic logic for creating bpf backed
   kernfs file.
 - patch 5/8 and 6/8 add a new program type for formatting output.
 - patch 7/8 implements cgroup seq_show operation using bpf.
 - patch 8/8 adds selftest.

Hao Luo (8):
  bpf: Support pinning in non-bpf file system.
  bpf: Record back pointer to the inode in bpffs
  bpf: Expose bpf object in kernfs
  bpf: Support removing kernfs entries
  bpf: Introduce a new program type bpf_view.
  libbpf: Support of bpf_view prog type.
  bpf: Add seq_show operation for bpf in cgroupfs
  selftests/bpf: Test exposing bpf objects in kernfs

 include/linux/bpf.h                           |   9 +-
 include/uapi/linux/bpf.h                      |   2 +
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/bpf_view.c                         | 190 ++++++++++++++
 kernel/bpf/bpf_view.h                         |  25 ++
 kernel/bpf/inode.c                            | 219 ++++++++++++++--
 kernel/bpf/inode.h                            |  54 ++++
 kernel/bpf/kernfs_node.c                      | 165 ++++++++++++
 kernel/bpf/syscall.c                          |   3 +
 kernel/bpf/verifier.c                         |   6 +
 kernel/trace/bpf_trace.c                      |  12 +-
 tools/include/uapi/linux/bpf.h                |   2 +
 tools/lib/bpf/libbpf.c                        |  21 ++
 .../selftests/bpf/prog_tests/pinning_kernfs.c | 245 ++++++++++++++++++
 .../selftests/bpf/progs/pinning_kernfs.c      |  72 +++++
 15 files changed, 995 insertions(+), 32 deletions(-)
 create mode 100644 kernel/bpf/bpf_view.c
 create mode 100644 kernel/bpf/bpf_view.h
 create mode 100644 kernel/bpf/inode.h
 create mode 100644 kernel/bpf/kernfs_node.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
 create mode 100644 tools/testing/selftests/bpf/progs/pinning_kernfs.c

-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system.
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
@ 2022-01-06 21:50 ` Hao Luo
  2022-01-07  0:04   ` kernel test robot
                     ` (2 more replies)
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 2/8] bpf: Record back pointer to the inode in bpffs Hao Luo
                   ` (8 subsequent siblings)
  9 siblings, 3 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-06 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, Hao Luo

Introduce a new API called bpf_watch_inode() to watch the destruction of
an inode and calls a registered callback function. With the help of this
new API, one can implement pinning bpf objects in a non-bpf file system
such as sockfs. The ability of pinning bpf objects in an external file
system has potential uses: for example, allow using bpf programs to
customize file behaviors, as we can see in the following patches.

Extending the pinning logic in bpf_obj_do_pin() to associate bpf objects
to inodes of another file system is relatively straightforward. The
challenge is how to notify the bpf object when the associated inode is
gone so that the object's refcnt can be decremented at that time. Bpffs
uses .free_inode() callback in super_operations to drop object's refcnt.
But not every file system implements .free_inode() and inserting bpf
notification to every target file system can be too instrusive.

Thanks to fsnotify, there is a generic callback in VFS that can be
used to notify the events of an inode. bpf_watch_inode() implements on
top of that. bpf_watch_inode() allows the caller to pass in a callback
(for example, decrementing an object's refcnt), which will be called
when the inode is about to be freed. So typically, one can implement
exposing bpf objects to other file systems in the following steps:

 1. extend bpf_obj_do_pin() to create a new entry in the target file
    system.
 2. call bpf_watch_inode() to register bpf object put operation at
    the destruction of the newly created inode.

Of course, on a system with no fsnotify support, pinning bpf object in
non-bpf file system will not be available.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 kernel/bpf/inode.c | 118 ++++++++++++++++++++++++++++++++++++++++-----
 kernel/bpf/inode.h |  33 +++++++++++++
 2 files changed, 140 insertions(+), 11 deletions(-)
 create mode 100644 kernel/bpf/inode.h

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 80da1db47c68..b4066dd986a8 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -16,18 +16,13 @@
 #include <linux/fs.h>
 #include <linux/fs_context.h>
 #include <linux/fs_parser.h>
+#include <linux/fsnotify_backend.h>
 #include <linux/kdev_t.h>
 #include <linux/filter.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include "preload/bpf_preload.h"
-
-enum bpf_type {
-	BPF_TYPE_UNSPEC	= 0,
-	BPF_TYPE_PROG,
-	BPF_TYPE_MAP,
-	BPF_TYPE_LINK,
-};
+#include "inode.h"
 
 static void *bpf_any_get(void *raw, enum bpf_type type)
 {
@@ -67,6 +62,95 @@ static void bpf_any_put(void *raw, enum bpf_type type)
 	}
 }
 
+#ifdef CONFIG_FSNOTIFY
+/* Notification mechanism based on fsnotify, used in bpf to watch the
+ * destruction of an inode. This inode could an inode in bpffs or any
+ * other file system.
+ */
+
+struct notify_mark {
+	struct fsnotify_mark fsn_mark;
+	const struct notify_ops *ops;
+	void *object;
+	enum bpf_type type;
+	void *priv;
+};
+
+struct fsnotify_group *bpf_notify_group;
+struct kmem_cache *bpf_notify_mark_cachep __read_mostly;
+
+/* Handler for any inode event. */
+int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
+		       struct inode *inode, struct inode *dir,
+		       const struct qstr *file_name, u32 cookie)
+{
+	return 0;
+}
+
+/* Handler for freeing marks. This is called when the watched inode is being
+ * freed.
+ */
+static void notify_freeing_mark(struct fsnotify_mark *mark, struct fsnotify_group *group)
+{
+	struct notify_mark *b_mark;
+
+	b_mark = container_of(mark, struct notify_mark, fsn_mark);
+
+	if (b_mark->ops && b_mark->ops->free_inode)
+		b_mark->ops->free_inode(b_mark->object, b_mark->type, b_mark->priv);
+}
+
+static void notify_free_mark(struct fsnotify_mark *mark)
+{
+	struct notify_mark *b_mark;
+
+	b_mark = container_of(mark, struct notify_mark, fsn_mark);
+
+	kmem_cache_free(bpf_notify_mark_cachep, b_mark);
+}
+
+struct fsnotify_ops bpf_notify_ops = {
+	.handle_inode_event = handle_inode_event,
+	.freeing_mark = notify_freeing_mark,
+	.free_mark = notify_free_mark,
+};
+
+static int bpf_inode_type(const struct inode *inode, enum bpf_type *type);
+
+/* Watch the destruction of an inode and calls the callbacks in the given
+ * notify_ops.
+ */
+int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops, void *priv)
+{
+	enum bpf_type type;
+	struct notify_mark *b_mark;
+	int ret;
+
+	if (IS_ERR(bpf_notify_group) || unlikely(!bpf_notify_mark_cachep))
+		return -ENOMEM;
+
+	b_mark = kmem_cache_alloc(bpf_notify_mark_cachep, GFP_KERNEL_ACCOUNT);
+	if (unlikely(!b_mark))
+		return -ENOMEM;
+
+	fsnotify_init_mark(&b_mark->fsn_mark, bpf_notify_group);
+	b_mark->ops = ops;
+	b_mark->priv = priv;
+	b_mark->object = inode->i_private;
+	bpf_inode_type(inode, &type);
+	b_mark->type = type;
+
+	ret = fsnotify_add_inode_mark(&b_mark->fsn_mark, inode,
+				      /*allow_dups=*/1);
+
+	fsnotify_put_mark(&b_mark->fsn_mark); /* match get in fsnotify_init_mark */
+
+	return ret;
+}
+#endif
+
+/* bpffs */
+
 static void *bpf_fd_probe_obj(u32 ufd, enum bpf_type *type)
 {
 	void *raw;
@@ -435,11 +519,15 @@ static int bpf_iter_link_pin_kernel(struct dentry *parent,
 	return ret;
 }
 
+static bool dentry_is_bpf_dir(struct dentry *dentry)
+{
+	return d_inode(dentry)->i_op == &bpf_dir_iops;
+}
+
 static int bpf_obj_do_pin(const char __user *pathname, void *raw,
 			  enum bpf_type type)
 {
 	struct dentry *dentry;
-	struct inode *dir;
 	struct path path;
 	umode_t mode;
 	int ret;
@@ -454,8 +542,7 @@ static int bpf_obj_do_pin(const char __user *pathname, void *raw,
 	if (ret)
 		goto out;
 
-	dir = d_inode(path.dentry);
-	if (dir->i_op != &bpf_dir_iops) {
+	if (!dentry_is_bpf_dir(path.dentry)) {
 		ret = -EPERM;
 		goto out;
 	}
@@ -821,8 +908,17 @@ static int __init bpf_init(void)
 		return ret;
 
 	ret = register_filesystem(&bpf_fs_type);
-	if (ret)
+	if (ret) {
 		sysfs_remove_mount_point(fs_kobj, "bpf");
+		return ret;
+	}
+
+#ifdef CONFIG_FSNOTIFY
+	bpf_notify_mark_cachep = KMEM_CACHE(notify_mark, 0);
+	bpf_notify_group = fsnotify_alloc_group(&bpf_notify_ops);
+	if (IS_ERR(bpf_notify_group) || !bpf_notify_mark_cachep)
+		pr_warn("Failed to initialize bpf_notify system, user can not pin objects outside bpffs.\n");
+#endif
 
 	return ret;
 }
diff --git a/kernel/bpf/inode.h b/kernel/bpf/inode.h
new file mode 100644
index 000000000000..3f53a4542028
--- /dev/null
+++ b/kernel/bpf/inode.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) */
+/* Copyright (c) 2022 Google
+ */
+#ifndef __BPF_INODE_H_
+#define __BPF_INODE_H_
+
+enum bpf_type {
+	BPF_TYPE_UNSPEC = 0,
+	BPF_TYPE_PROG,
+	BPF_TYPE_MAP,
+	BPF_TYPE_LINK,
+};
+
+struct notify_ops {
+	void (*free_inode)(void *object, enum bpf_type type, void *priv);
+};
+
+#ifdef CONFIG_FSNOTIFY
+/* Watch the destruction of an inode and calls the callbacks in the given
+ * notify_ops.
+ */
+int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops,
+		    void *priv);
+#else
+static inline
+int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops,
+		    void *priv)
+{
+	return -EPERM;
+}
+#endif  // CONFIG_FSNOTIFY
+
+#endif  // __BPF_INODE_H_
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC bpf-next v1 2/8] bpf: Record back pointer to the inode in bpffs
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system Hao Luo
@ 2022-01-06 21:50 ` Hao Luo
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 3/8] bpf: Expose bpf object in kernfs Hao Luo
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-06 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, Hao Luo

When an object is pinned in bpffs, record the bpffs inode in the object.
The previous patch introduced bpf_watch_inode(), which can also be used
to watch the bpffs inode. This capability will be used in the following
patches to expose bpf objects to file systems where the nodes in the
file system are not backed by an inode.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 include/linux/bpf.h |  5 +++-
 kernel/bpf/inode.c  | 60 ++++++++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/inode.h  |  9 +++++++
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6e947cd91152..2ec693c3d6f6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -184,7 +184,8 @@ struct bpf_map {
 	char name[BPF_OBJ_NAME_LEN];
 	bool bypass_spec_v1;
 	bool frozen; /* write-once; write-protected by freeze_mutex */
-	/* 14 bytes hole */
+	struct inode *backing_inode; /* back pointer to the inode in bpffs */
+	/* 6 bytes hole */
 
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
@@ -991,6 +992,7 @@ struct bpf_prog_aux {
 		struct work_struct work;
 		struct rcu_head	rcu;
 	};
+	struct inode *backing_inode; /* back pointer to the inode in bpffs */
 };
 
 struct bpf_array_aux {
@@ -1018,6 +1020,7 @@ struct bpf_link {
 	const struct bpf_link_ops *ops;
 	struct bpf_prog *prog;
 	struct work_struct work;
+	struct inode *backing_inode; /* back pointer to the inode in bpffs */
 };
 
 struct bpf_link_ops {
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index b4066dd986a8..9ba10912cbf8 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -226,6 +226,57 @@ static int bpf_inode_type(const struct inode *inode, enum bpf_type *type)
 	return 0;
 }
 
+/* Conditionally set an object's backing inode. */
+static void cond_set_backing_inode(void *obj, enum bpf_type type,
+				   struct inode *old, struct inode *new)
+{
+	struct inode **ptr;
+
+	if (type == BPF_TYPE_PROG) {
+		struct bpf_prog *prog = obj;
+		ptr = &prog->aux->backing_inode;
+	} else if (type == BPF_TYPE_MAP) {
+		struct bpf_map *map = obj;
+		ptr = &map->backing_inode;
+	} else if (type == BPF_TYPE_LINK) {
+		struct bpf_link *link = obj;
+		ptr = &link->backing_inode;
+	} else {
+		return;
+	}
+
+	if (*ptr == old)
+		*ptr = new;
+}
+
+struct inode *get_backing_inode(void *obj, enum bpf_type type)
+{
+	struct inode *inode = NULL;
+
+	if (type == BPF_TYPE_PROG) {
+		struct bpf_prog *prog = obj;
+		inode = prog->aux->backing_inode;
+	} else if (type == BPF_TYPE_MAP) {
+		struct bpf_map *map = obj;
+		inode = map->backing_inode;
+	} else if (type == BPF_TYPE_LINK) {
+		struct bpf_link *link = obj;
+		inode = link->backing_inode;
+	}
+
+	if (!inode)
+		return NULL;
+
+	spin_lock(&inode->i_lock);
+	if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
+		spin_unlock(&inode->i_lock);
+		return NULL;
+	}
+	__iget(inode);
+	spin_unlock(&inode->i_lock);
+	return inode;
+}
+
 static void bpf_dentry_finalize(struct dentry *dentry, struct inode *inode,
 				struct inode *dir)
 {
@@ -418,6 +469,8 @@ static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw,
 {
 	struct inode *dir = dentry->d_parent->d_inode;
 	struct inode *inode = bpf_get_inode(dir->i_sb, dir, mode);
+	enum bpf_type type;
+
 	if (IS_ERR(inode))
 		return PTR_ERR(inode);
 
@@ -425,6 +478,9 @@ static int bpf_mkobj_ops(struct dentry *dentry, umode_t mode, void *raw,
 	inode->i_fop = fops;
 	inode->i_private = raw;
 
+	if (!bpf_inode_type(inode, &type))
+		cond_set_backing_inode(raw, type, NULL, inode);
+
 	bpf_dentry_finalize(dentry, inode, dir);
 	return 0;
 }
@@ -703,8 +759,10 @@ static void bpf_free_inode(struct inode *inode)
 
 	if (S_ISLNK(inode->i_mode))
 		kfree(inode->i_link);
-	if (!bpf_inode_type(inode, &type))
+	if (!bpf_inode_type(inode, &type)) {
+		cond_set_backing_inode(inode->i_private, type, inode, NULL);
 		bpf_any_put(inode->i_private, type);
+	}
 	free_inode_nonrcu(inode);
 }
 
diff --git a/kernel/bpf/inode.h b/kernel/bpf/inode.h
index 3f53a4542028..e7fe8137be80 100644
--- a/kernel/bpf/inode.h
+++ b/kernel/bpf/inode.h
@@ -30,4 +30,13 @@ int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops,
 }
 #endif  // CONFIG_FSNOTIFY
 
+/* Get the backing inode of a bpf object. When an object is pinned in bpf
+ * file system, an inode is associated with the object. This function returns
+ * that inode.
+ *
+ * On success, the inode is returned with refcnt incremented.
+ * On failure, NULL is returned.
+ */
+struct inode *get_backing_inode(void *obj, enum bpf_type);
+
 #endif  // __BPF_INODE_H_
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC bpf-next v1 3/8] bpf: Expose bpf object in kernfs
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system Hao Luo
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 2/8] bpf: Record back pointer to the inode in bpffs Hao Luo
@ 2022-01-06 21:50 ` Hao Luo
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 4/8] bpf: Support removing kernfs entries Hao Luo
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-06 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, Hao Luo

This patch extends bpf_obj_do_pin() to allow creating a new entry in
kernfs which references a bpf object. Different from pinning objects
in bpffs, the created kernfs node does not hold an extra reference to
the object, because kernfs by itself doesn't have a notification
mechanism to put the object when the kernfs node is gone. Therefore
this patch is not "pinning" the object, but rather "exposing" the
object in kernfs. The lifetime of the created kernfs node depends on
the lifetime of the bpf object, not the other way around.

More specifically, we allow a bpf object to be exposed to kernfs only
after it becomes "persistent" by pinning in bpffs. So the lifetime of
the created kernfs node is tied to the bpffs inode. When the object
is unpinned from bpffs, the kernfs nodes exposing the bpf object will
be removed automatically. It uses the bpf_watch_inode() interface
introduced in the previous patches. Because the kernfs nodes do not
hold extra references to the object, we can remove the nodes at any
time without worrying about reference leak.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 kernel/bpf/Makefile      |  2 +-
 kernel/bpf/inode.c       | 43 +++++++++++++-------
 kernel/bpf/inode.h       | 11 ++++-
 kernel/bpf/kernfs_node.c | 87 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 126 insertions(+), 17 deletions(-)
 create mode 100644 kernel/bpf/kernfs_node.c

diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index c1a9be6a4b9f..b1abf0d94b5b 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -8,7 +8,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o kernfs_node.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 9ba10912cbf8..7e93e477b57c 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -580,6 +580,21 @@ static bool dentry_is_bpf_dir(struct dentry *dentry)
 	return d_inode(dentry)->i_op == &bpf_dir_iops;
 }
 
+static int bpf_obj_do_pin_generic(struct dentry *dentry, umode_t mode,
+				  void *obj, enum bpf_type type)
+{
+	switch (type) {
+	case BPF_TYPE_PROG:
+		return vfs_mkobj(dentry, mode, bpf_mkprog, obj);
+	case BPF_TYPE_MAP:
+		return vfs_mkobj(dentry, mode, bpf_mkmap, obj);
+	case BPF_TYPE_LINK:
+		return vfs_mkobj(dentry, mode, bpf_mklink, obj);
+	default:
+		return -EPERM;
+	}
+}
+
 static int bpf_obj_do_pin(const char __user *pathname, void *raw,
 			  enum bpf_type type)
 {
@@ -598,22 +613,20 @@ static int bpf_obj_do_pin(const char __user *pathname, void *raw,
 	if (ret)
 		goto out;
 
-	if (!dentry_is_bpf_dir(path.dentry)) {
-		ret = -EPERM;
-		goto out;
-	}
+	if (dentry_is_kernfs_dir(path.dentry)) {
+		ret = bpf_obj_do_pin_kernfs(dentry, mode, raw, type);
 
-	switch (type) {
-	case BPF_TYPE_PROG:
-		ret = vfs_mkobj(dentry, mode, bpf_mkprog, raw);
-		break;
-	case BPF_TYPE_MAP:
-		ret = vfs_mkobj(dentry, mode, bpf_mkmap, raw);
-		break;
-	case BPF_TYPE_LINK:
-		ret = vfs_mkobj(dentry, mode, bpf_mklink, raw);
-		break;
-	default:
+		/* Match bpf_fd_probe_obj(). bpf objects exposed to kernfs
+		 * do not hold an active reference. The lifetime of the
+		 * created kernfs node is tied to an inode in bpffs. So the
+		 * kernfs node gets destroyed automatically when the object
+		 * is unpinned from bpffs.
+		 */
+		if (ret == 0)
+			bpf_any_put(raw, type);
+	} else if (dentry_is_bpf_dir(path.dentry)) {
+		ret = bpf_obj_do_pin_generic(dentry, mode, raw, type);
+	} else {
 		ret = -EPERM;
 	}
 out:
diff --git a/kernel/bpf/inode.h b/kernel/bpf/inode.h
index e7fe8137be80..c12d385a3e2a 100644
--- a/kernel/bpf/inode.h
+++ b/kernel/bpf/inode.h
@@ -4,8 +4,10 @@
 #ifndef __BPF_INODE_H_
 #define __BPF_INODE_H_
 
+#include <linux/fs.h>
+
 enum bpf_type {
-	BPF_TYPE_UNSPEC = 0,
+	BPF_TYPE_UNSPEC	= 0,
 	BPF_TYPE_PROG,
 	BPF_TYPE_MAP,
 	BPF_TYPE_LINK,
@@ -39,4 +41,11 @@ int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops,
  */
 struct inode *get_backing_inode(void *obj, enum bpf_type);
 
+/* Test whether a given dentry is a kernfs entry. */
+bool dentry_is_kernfs_dir(struct dentry *dentry);
+
+/* Expose bpf object to kernfs. Requires dentry to be in kernfs. */
+int bpf_obj_do_pin_kernfs(struct dentry *dentry, umode_t mode, void *obj,
+			  enum bpf_type type);
+
 #endif  // __BPF_INODE_H_
diff --git a/kernel/bpf/kernfs_node.c b/kernel/bpf/kernfs_node.c
new file mode 100644
index 000000000000..c1c45f7b948b
--- /dev/null
+++ b/kernel/bpf/kernfs_node.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Expose eBPF objects in kernfs file system.
+ */
+
+#include <linux/fs.h>
+#include <linux/kernfs.h>
+#include "inode.h"
+
+/* file_operations for kernfs file system */
+
+/* Handler when the watched inode is freed. */
+static void kn_watch_free_inode(void *obj, enum bpf_type type, void *kn)
+{
+	kernfs_remove(kn);
+
+	/* match get in bpf_obj_do_pin_kernfs */
+	kernfs_put(kn);
+}
+
+static const struct notify_ops notify_ops = {
+	.free_inode = kn_watch_free_inode,
+};
+
+/* Kernfs file operations for bpf created files. */
+static const struct kernfs_ops bpf_generic_ops = {
+};
+
+/* Test whether a given dentry is a kernfs entry. */
+bool dentry_is_kernfs_dir(struct dentry *dentry)
+{
+	return kernfs_node_from_dentry(dentry) != NULL;
+}
+
+/* Expose bpf object to kernfs. Requires dentry to exist in kernfs. */
+int bpf_obj_do_pin_kernfs(struct dentry *dentry, umode_t mode, void *obj,
+			  enum bpf_type type)
+{
+	struct dentry *parent_dentry;
+	struct super_block *sb;
+	struct kernfs_node *parent_kn, *kn;
+	struct kernfs_root *root;
+	const struct kernfs_ops *ops;
+	struct inode *inode;
+	int ret;
+
+	sb = dentry->d_sb;
+	root = kernfs_root_from_sb(sb);
+	if (!root) /* Not a kernfs file system. */
+		return -EPERM;
+
+	parent_dentry = dentry->d_parent;
+	parent_kn = kernfs_node_from_dentry(parent_dentry);
+	if (WARN_ON(!parent_kn))
+		return -EPERM;
+
+	inode = get_backing_inode(obj, type);
+	if (!inode)
+		return -ENXIO;
+
+	ops = &bpf_generic_ops;
+	kn = __kernfs_create_file(parent_kn, dentry->d_iname, mode,
+				  GLOBAL_ROOT_UID, GLOBAL_ROOT_GID,
+				  0, ops, inode, NULL, NULL);
+	if (IS_ERR(kn)) {
+		iput(inode);
+		return PTR_ERR(kn);
+	}
+
+	/* hold an active kn by bpffs inode. */
+	kernfs_get(kn);
+
+	/* Watch the backing inode of the object in bpffs. When the backing
+	 * inode is freed, the created kernfs entry will be removed as well.
+	 */
+	ret = bpf_watch_inode(inode, &notify_ops, kn);
+	if (ret) {
+		kernfs_put(kn);
+		kernfs_remove(kn);
+		iput(inode);
+		return ret;
+	}
+
+	kernfs_activate(kn);
+	iput(inode);
+	return 0;
+}
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC bpf-next v1 4/8] bpf: Support removing kernfs entries
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
                   ` (2 preceding siblings ...)
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 3/8] bpf: Expose bpf object in kernfs Hao Luo
@ 2022-01-06 21:50 ` Hao Luo
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 5/8] bpf: Introduce a new program type bpf_view Hao Luo
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-06 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, Hao Luo

When a bpf object has been exposed in kernfs, there should be a way
to remove it. Kernfs doesn't implement unlink, therefore one can not
remove the entry in a normal way. To remove the file, we can allow
writing a special command to the new entry, which can trigger a
remove_self() for removal.

So far there are two ways to remove an entry that is created by pinning
bpf objects in kernfs:

 1. unpin the object from bpffs.
 2. write a special command to the kernfs entry.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 kernel/bpf/kernfs_node.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/kernel/bpf/kernfs_node.c b/kernel/bpf/kernfs_node.c
index c1c45f7b948b..3d331d8357db 100644
--- a/kernel/bpf/kernfs_node.c
+++ b/kernel/bpf/kernfs_node.c
@@ -9,6 +9,9 @@
 
 /* file_operations for kernfs file system */
 
+/* Command for removing a kernfs entry */
+#define REMOVE_CMD "rm"
+
 /* Handler when the watched inode is freed. */
 static void kn_watch_free_inode(void *obj, enum bpf_type type, void *kn)
 {
@@ -22,8 +25,27 @@ static const struct notify_ops notify_ops = {
 	.free_inode = kn_watch_free_inode,
 };
 
+static ssize_t bpf_generic_write(struct kernfs_open_file *of, char *buf,
+				 size_t bytes, loff_t off)
+{
+	if (sysfs_streq(buf, REMOVE_CMD)) {
+		kernfs_remove_self(of->kn);
+		return bytes;
+	}
+
+	return -EINVAL;
+}
+
+static ssize_t bpf_generic_read(struct kernfs_open_file *of, char *buf,
+				size_t bytes, loff_t off)
+{
+	return -EIO;
+}
+
 /* Kernfs file operations for bpf created files. */
 static const struct kernfs_ops bpf_generic_ops = {
+	.write          = bpf_generic_write,
+	.read           = bpf_generic_read,
 };
 
 /* Test whether a given dentry is a kernfs entry. */
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC bpf-next v1 5/8] bpf: Introduce a new program type bpf_view.
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
                   ` (3 preceding siblings ...)
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 4/8] bpf: Support removing kernfs entries Hao Luo
@ 2022-01-06 21:50 ` Hao Luo
  2022-01-07  0:35   ` Yonghong Song
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 6/8] libbpf: Support of bpf_view prog type Hao Luo
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 28+ messages in thread
From: Hao Luo @ 2022-01-06 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, Hao Luo

Introduce a new program type called "bpf_view", which can be used to
print out a kernel object's state to a seq file. So the signature of
this program consists of two parameters: a seq file and a kernel object.
Currently only 'struct cgroup' is supported.

The following patches will introduce a call site for this program type
and allow users to customize the format of printing out the state of
kernel objects to userspace.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 include/linux/bpf.h            |   4 +
 include/uapi/linux/bpf.h       |   2 +
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/bpf_view.c          | 179 +++++++++++++++++++++++++++++++++
 kernel/bpf/bpf_view.h          |  24 +++++
 kernel/bpf/syscall.c           |   3 +
 kernel/bpf/verifier.c          |   6 ++
 kernel/trace/bpf_trace.c       |  12 ++-
 tools/include/uapi/linux/bpf.h |   2 +
 9 files changed, 230 insertions(+), 4 deletions(-)
 create mode 100644 kernel/bpf/bpf_view.c
 create mode 100644 kernel/bpf/bpf_view.h

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2ec693c3d6f6..16f582dfff7e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1622,6 +1622,10 @@ void bpf_iter_map_show_fdinfo(const struct bpf_iter_aux_info *aux,
 int bpf_iter_map_fill_link_info(const struct bpf_iter_aux_info *aux,
 				struct bpf_link_info *info);
 
+bool bpf_view_prog_supported(struct bpf_prog *prog);
+int bpf_view_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
+			 struct bpf_prog *prog);
+
 int map_set_for_each_callback_args(struct bpf_verifier_env *env,
 				   struct bpf_func_state *caller,
 				   struct bpf_func_state *callee);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b0383d371b9a..efa0f21d13ba 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -982,6 +982,7 @@ enum bpf_attach_type {
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
 	BPF_TRACE_ITER,
+	BPF_TRACE_VIEW,
 	BPF_CGROUP_INET4_GETPEERNAME,
 	BPF_CGROUP_INET6_GETPEERNAME,
 	BPF_CGROUP_INET4_GETSOCKNAME,
@@ -1009,6 +1010,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
 	BPF_LINK_TYPE_PERF_EVENT = 7,
+	BPF_LINK_TYPE_VIEW = 8,
 
 	MAX_BPF_LINK_TYPE,
 };
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index b1abf0d94b5b..c662734d83c5 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -8,7 +8,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
-obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o kernfs_node.o
+obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o kernfs_node.o bpf_view.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
diff --git a/kernel/bpf/bpf_view.c b/kernel/bpf/bpf_view.c
new file mode 100644
index 000000000000..967a9240bab4
--- /dev/null
+++ b/kernel/bpf/bpf_view.c
@@ -0,0 +1,179 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <linux/cgroup.h>
+#include <linux/filter.h>
+#include "bpf_view.h"
+
+static struct list_head targets = LIST_HEAD_INIT(targets);
+
+/* bpf_view_link operations */
+
+struct bpf_view_target_info {
+	struct list_head list;
+	const char *target;
+	u32 ctx_arg_info_size;
+	struct bpf_ctx_arg_aux ctx_arg_info[BPF_VIEW_CTX_ARG_MAX];
+	u32 btf_id;
+};
+
+struct bpf_view_link {
+	struct bpf_link link;
+	struct bpf_view_target_info *tinfo;
+};
+
+static void bpf_view_link_release(struct bpf_link *link)
+{
+}
+
+static void bpf_view_link_dealloc(struct bpf_link *link)
+{
+	struct bpf_view_link *view_link =
+		container_of(link, struct bpf_view_link, link);
+	kfree(view_link);
+}
+
+static void bpf_view_link_show_fdinfo(const struct bpf_link *link,
+				      struct seq_file *seq)
+{
+	struct bpf_view_link *view_link =
+		container_of(link, struct bpf_view_link, link);
+
+	seq_printf(seq, "attach_target:\t%s\n", view_link->tinfo->target);
+}
+
+static const struct bpf_link_ops bpf_view_link_lops = {
+	.release = bpf_view_link_release,
+	.dealloc = bpf_view_link_dealloc,
+	.show_fdinfo = bpf_view_link_show_fdinfo,
+};
+
+bool bpf_link_is_view(struct bpf_link *link)
+{
+	return link->ops == &bpf_view_link_lops;
+}
+
+int bpf_view_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
+			 struct bpf_prog *prog)
+{
+	struct bpf_link_primer link_primer;
+	struct bpf_view_target_info *tinfo;
+	struct bpf_view_link *link;
+	u32 prog_btf_id;
+	bool existed = false;
+	int err;
+
+	prog_btf_id = prog->aux->attach_btf_id;
+	list_for_each_entry(tinfo, &targets, list) {
+		if (tinfo->btf_id == prog_btf_id) {
+			existed = true;
+			break;
+		}
+	}
+	if (!existed)
+		return -ENOENT;
+
+	link = kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN);
+	if (!link)
+		return -ENOMEM;
+
+	bpf_link_init(&link->link, BPF_LINK_TYPE_VIEW, &bpf_view_link_lops, prog);
+	link->tinfo = tinfo;
+	err = bpf_link_prime(&link->link, &link_primer);
+	if (err) {
+		kfree(link);
+		return err;
+	}
+
+	return bpf_link_settle(&link_primer);
+}
+
+int run_view_prog(struct bpf_prog *prog, void *ctx)
+{
+	int ret;
+
+	rcu_read_lock();
+	migrate_disable();
+	ret = bpf_prog_run(prog, ctx);
+	migrate_enable();
+	rcu_read_unlock();
+
+	return ret;
+}
+
+bool bpf_view_prog_supported(struct bpf_prog *prog)
+{
+	const char *attach_fname = prog->aux->attach_func_name;
+	const char *prefix = BPF_VIEW_FUNC_PREFIX;
+	u32 prog_btf_id = prog->aux->attach_btf_id;
+	struct bpf_view_target_info *tinfo;
+	int prefix_len = strlen(prefix);
+	bool supported = false;
+
+	if (strncmp(attach_fname, prefix, prefix_len))
+		return false;
+
+	list_for_each_entry(tinfo, &targets, list) {
+		if (tinfo->btf_id && tinfo->btf_id == prog_btf_id) {
+			supported = true;
+			break;
+		}
+		if (!strcmp(attach_fname + prefix_len, tinfo->target)) {
+			tinfo->btf_id = prog->aux->attach_btf_id;
+			supported = true;
+			break;
+		}
+	}
+	if (supported) {
+		prog->aux->ctx_arg_info_size = tinfo->ctx_arg_info_size;
+		prog->aux->ctx_arg_info = tinfo->ctx_arg_info;
+	}
+	return supported;
+}
+
+/* Generate BTF_IDs */
+BTF_ID_LIST(bpf_view_btf_ids)
+BTF_ID(struct, seq_file)
+BTF_ID(struct, cgroup)
+
+/* Index of bpf_view_btf_ids */
+enum {
+	BTF_ID_SEQ_FILE = 0,
+	BTF_ID_CGROUP,
+};
+
+static void register_bpf_view_target(struct bpf_view_target_info *target,
+				     int idx[BPF_VIEW_CTX_ARG_MAX])
+{
+	int i;
+
+	for (i = 0; i < target->ctx_arg_info_size; ++i)
+		target->ctx_arg_info[i].btf_id = bpf_view_btf_ids[idx[i]];
+
+	INIT_LIST_HEAD(&target->list);
+	list_add(&target->list, &targets);
+}
+
+DEFINE_BPF_VIEW_FUNC(cgroup, struct seq_file *seq, struct cgroup *cgroup)
+
+static struct bpf_view_target_info cgroup_view_tinfo = {
+	.target			= "cgroup",
+	.ctx_arg_info_size	= 2,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_view_cgroup_ctx, seq), PTR_TO_BTF_ID },
+		{ offsetof(struct bpf_view_cgroup_ctx, cgroup), PTR_TO_BTF_ID },
+	},
+	.btf_id			= 0,
+};
+
+static int __init bpf_view_init(void)
+{
+	int cgroup_view_idx[BPF_VIEW_CTX_ARG_MAX] = {
+		BTF_ID_SEQ_FILE, BTF_ID_CGROUP };
+
+	register_bpf_view_target(&cgroup_view_tinfo, cgroup_view_idx);
+
+	return 0;
+}
+late_initcall(bpf_view_init);
+
diff --git a/kernel/bpf/bpf_view.h b/kernel/bpf/bpf_view.h
new file mode 100644
index 000000000000..1a1110a5727f
--- /dev/null
+++ b/kernel/bpf/bpf_view.h
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#ifndef _BPF_VIEW_H_
+#define _BPF_VIEW_H_
+
+#include <linux/bpf.h>
+
+#define BPF_VIEW_FUNC_PREFIX "bpf_view_"
+#define DEFINE_BPF_VIEW_FUNC(target, args...) \
+	extern int bpf_view_ ## target(args); \
+	int __init bpf_view_ ## target(args) { return 0; }
+
+#define BPF_VIEW_CTX_ARG_MAX 2
+
+struct bpf_view_cgroup_ctx {
+	__bpf_md_ptr(struct seq_file *, seq);
+	__bpf_md_ptr(struct cgroup *, cgroup);
+};
+
+bool bpf_link_is_view(struct bpf_link *link);
+
+/* Run a bpf_view program */
+int run_view_prog(struct bpf_prog *prog, void *ctx);
+
+#endif  // _BPF_VIEW_H_
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fa4505f9b611..32ac84d3ac0b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3175,6 +3175,7 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 	case BPF_CGROUP_SETSOCKOPT:
 		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
 	case BPF_TRACE_ITER:
+	case BPF_TRACE_VIEW:
 		return BPF_PROG_TYPE_TRACING;
 	case BPF_SK_LOOKUP:
 		return BPF_PROG_TYPE_SK_LOOKUP;
@@ -4235,6 +4236,8 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
 
 	if (prog->expected_attach_type == BPF_TRACE_ITER)
 		return bpf_iter_link_attach(attr, uattr, prog);
+	else if (prog->expected_attach_type == BPF_TRACE_VIEW)
+		return bpf_view_link_attach(attr, uattr, prog);
 	else if (prog->type == BPF_PROG_TYPE_EXT)
 		return bpf_tracing_prog_attach(prog,
 					       attr->link_create.target_fd,
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index bfb45381fb3f..ce7816519c93 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9770,6 +9770,7 @@ static int check_return_code(struct bpf_verifier_env *env)
 		case BPF_MODIFY_RETURN:
 			return 0;
 		case BPF_TRACE_ITER:
+		case BPF_TRACE_VIEW:
 			break;
 		default:
 			return -ENOTSUPP;
@@ -13971,6 +13972,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 
 		break;
 	case BPF_TRACE_ITER:
+	case BPF_TRACE_VIEW:
 		if (!btf_type_is_func(t)) {
 			bpf_log(log, "attach_btf_id %u is not a function\n",
 				btf_id);
@@ -14147,6 +14149,10 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		if (!bpf_iter_prog_supported(prog))
 			return -EINVAL;
 		return 0;
+	} else if (prog->expected_attach_type == BPF_TRACE_VIEW) {
+		if (!bpf_view_prog_supported(prog))
+			return -EINVAL;
+		return 0;
 	}
 
 	if (prog->type == BPF_PROG_TYPE_LSM) {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 21aa30644219..9413b5af6e2c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1630,6 +1630,12 @@ raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	}
 }
 
+static inline bool prog_support_seq_helpers(const struct bpf_prog *prog)
+{
+	return prog->expected_attach_type == BPF_TRACE_ITER ||
+		prog->expected_attach_type == BPF_TRACE_VIEW;
+}
+
 const struct bpf_func_proto *
 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
@@ -1663,15 +1669,15 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_socket_ptr_cookie_proto;
 #endif
 	case BPF_FUNC_seq_printf:
-		return prog->expected_attach_type == BPF_TRACE_ITER ?
+		return prog_support_seq_helpers(prog) ?
 		       &bpf_seq_printf_proto :
 		       NULL;
 	case BPF_FUNC_seq_write:
-		return prog->expected_attach_type == BPF_TRACE_ITER ?
+		return prog_support_seq_helpers(prog) ?
 		       &bpf_seq_write_proto :
 		       NULL;
 	case BPF_FUNC_seq_printf_btf:
-		return prog->expected_attach_type == BPF_TRACE_ITER ?
+		return prog_support_seq_helpers(prog) ?
 		       &bpf_seq_printf_btf_proto :
 		       NULL;
 	case BPF_FUNC_d_path:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b0383d371b9a..efa0f21d13ba 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -982,6 +982,7 @@ enum bpf_attach_type {
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
 	BPF_TRACE_ITER,
+	BPF_TRACE_VIEW,
 	BPF_CGROUP_INET4_GETPEERNAME,
 	BPF_CGROUP_INET6_GETPEERNAME,
 	BPF_CGROUP_INET4_GETSOCKNAME,
@@ -1009,6 +1010,7 @@ enum bpf_link_type {
 	BPF_LINK_TYPE_NETNS = 5,
 	BPF_LINK_TYPE_XDP = 6,
 	BPF_LINK_TYPE_PERF_EVENT = 7,
+	BPF_LINK_TYPE_VIEW = 8,
 
 	MAX_BPF_LINK_TYPE,
 };
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC bpf-next v1 6/8] libbpf: Support of bpf_view prog type.
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
                   ` (4 preceding siblings ...)
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 5/8] bpf: Introduce a new program type bpf_view Hao Luo
@ 2022-01-06 21:50 ` Hao Luo
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 7/8] bpf: Add seq_show operation for bpf in cgroupfs Hao Luo
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-06 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, Hao Luo

The previous patch introdued a new program type bpf_view. This
patch adds support for bpf_view in libbpf.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 tools/lib/bpf/libbpf.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 7f10dd501a52..0d458e34d82c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8570,6 +8570,7 @@ static struct bpf_link *attach_raw_tp(const struct bpf_program *prog, long cooki
 static struct bpf_link *attach_trace(const struct bpf_program *prog, long cookie);
 static struct bpf_link *attach_lsm(const struct bpf_program *prog, long cookie);
 static struct bpf_link *attach_iter(const struct bpf_program *prog, long cookie);
+static struct bpf_link *attach_view(const struct bpf_program *prog, long cookie);
 
 static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("socket",		SOCKET_FILTER, 0, SEC_NONE | SEC_SLOPPY_PFX),
@@ -8599,6 +8600,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("lsm/",			LSM, BPF_LSM_MAC, SEC_ATTACH_BTF, attach_lsm),
 	SEC_DEF("lsm.s/",		LSM, BPF_LSM_MAC, SEC_ATTACH_BTF | SEC_SLEEPABLE, attach_lsm),
 	SEC_DEF("iter/",		TRACING, BPF_TRACE_ITER, SEC_ATTACH_BTF, attach_iter),
+	SEC_DEF("view/",		TRACING, BPF_TRACE_VIEW, SEC_ATTACH_BTF, attach_view),
 	SEC_DEF("syscall",		SYSCALL, 0, SEC_SLEEPABLE),
 	SEC_DEF("xdp_devmap/",		XDP, BPF_XDP_DEVMAP, SEC_ATTACHABLE),
 	SEC_DEF("xdp_cpumap/",		XDP, BPF_XDP_CPUMAP, SEC_ATTACHABLE),
@@ -8896,6 +8898,7 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj,
 #define BTF_TRACE_PREFIX "btf_trace_"
 #define BTF_LSM_PREFIX "bpf_lsm_"
 #define BTF_ITER_PREFIX "bpf_iter_"
+#define BTF_VIEW_PREFIX "bpf_view_"
 #define BTF_MAX_NAME_SIZE 128
 
 void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
@@ -8914,6 +8917,10 @@ void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
 		*prefix = BTF_ITER_PREFIX;
 		*kind = BTF_KIND_FUNC;
 		break;
+	case BPF_TRACE_VIEW:
+		*prefix = BTF_VIEW_PREFIX;
+		*kind = BTF_KIND_FUNC;
+		break;
 	default:
 		*prefix = "";
 		*kind = BTF_KIND_FUNC;
@@ -10575,6 +10582,20 @@ struct bpf_link *bpf_program__attach_freplace(const struct bpf_program *prog,
 	}
 }
 
+static struct bpf_link *attach_view(const struct bpf_program *prog, long cookie)
+{
+	const char *target_name;
+	const char *prefix = "view/";
+	int btf_id;
+
+	target_name = prog->sec_name + strlen(prefix);
+	btf_id = libbpf_find_vmlinux_btf_id(target_name, BPF_TRACE_VIEW);
+	if (btf_id < 0)
+		return libbpf_err_ptr(btf_id);
+
+	return bpf_program__attach_fd(prog, 0, btf_id, "view");
+}
+
 struct bpf_link *
 bpf_program__attach_iter(const struct bpf_program *prog,
 			 const struct bpf_iter_attach_opts *opts)
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC bpf-next v1 7/8] bpf: Add seq_show operation for bpf in cgroupfs
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
                   ` (5 preceding siblings ...)
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 6/8] libbpf: Support of bpf_view prog type Hao Luo
@ 2022-01-06 21:50 ` Hao Luo
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 8/8] selftests/bpf: Test exposing bpf objects in kernfs Hao Luo
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-06 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, Hao Luo

Previous patches allow exposing bpf objects to kernfs file system.
They allow creating file entries in kernfs, which can reference bpf
objects. The referred bpf objects can be used to customize the new
entry's file operations.

In particular, this patch introduces one concrete use case of this
feature. It implements the .seq_show file operation for the cgroup
file system. The seq_show handler takes the bpf object and use it to
format its output seq file. The bpf object needs to be a link to the
newly introduced "bpf_view" program type.

Signed-off-by: Hao Luo <haoluo@google.com>
---
 kernel/bpf/bpf_view.c    | 11 ++++++++
 kernel/bpf/bpf_view.h    |  1 +
 kernel/bpf/inode.c       |  4 +--
 kernel/bpf/inode.h       |  3 +++
 kernel/bpf/kernfs_node.c | 58 +++++++++++++++++++++++++++++++++++++++-
 5 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/bpf_view.c b/kernel/bpf/bpf_view.c
index 967a9240bab4..8f035c5a9b6a 100644
--- a/kernel/bpf/bpf_view.c
+++ b/kernel/bpf/bpf_view.c
@@ -166,6 +166,17 @@ static struct bpf_view_target_info cgroup_view_tinfo = {
 	.btf_id			= 0,
 };
 
+bool bpf_link_is_cgroup_view(struct bpf_link *link)
+{
+	struct bpf_view_link *view_link;
+
+	if (!bpf_link_is_view(link))
+		return false;
+
+	view_link = container_of(link, struct bpf_view_link, link);
+	return view_link->tinfo == &cgroup_view_tinfo;
+}
+
 static int __init bpf_view_init(void)
 {
 	int cgroup_view_idx[BPF_VIEW_CTX_ARG_MAX] = {
diff --git a/kernel/bpf/bpf_view.h b/kernel/bpf/bpf_view.h
index 1a1110a5727f..a02564e529cb 100644
--- a/kernel/bpf/bpf_view.h
+++ b/kernel/bpf/bpf_view.h
@@ -17,6 +17,7 @@ struct bpf_view_cgroup_ctx {
 };
 
 bool bpf_link_is_view(struct bpf_link *link);
+bool bpf_link_is_cgroup_view(struct bpf_link *link);
 
 /* Run a bpf_view program */
 int run_view_prog(struct bpf_prog *prog, void *ctx);
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 7e93e477b57c..1ae4a7b8c732 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -115,8 +115,6 @@ struct fsnotify_ops bpf_notify_ops = {
 	.free_mark = notify_free_mark,
 };
 
-static int bpf_inode_type(const struct inode *inode, enum bpf_type *type);
-
 /* Watch the destruction of an inode and calls the callbacks in the given
  * notify_ops.
  */
@@ -211,7 +209,7 @@ static struct inode *bpf_get_inode(struct super_block *sb,
 	return inode;
 }
 
-static int bpf_inode_type(const struct inode *inode, enum bpf_type *type)
+int bpf_inode_type(const struct inode *inode, enum bpf_type *type)
 {
 	*type = BPF_TYPE_UNSPEC;
 	if (inode->i_op == &bpf_prog_iops)
diff --git a/kernel/bpf/inode.h b/kernel/bpf/inode.h
index c12d385a3e2a..dea78341549b 100644
--- a/kernel/bpf/inode.h
+++ b/kernel/bpf/inode.h
@@ -17,6 +17,9 @@ struct notify_ops {
 	void (*free_inode)(void *object, enum bpf_type type, void *priv);
 };
 
+/* Get the type of bpf object from bpffs inode. */
+int bpf_inode_type(const struct inode *inode, enum bpf_type *type);
+
 #ifdef CONFIG_FSNOTIFY
 /* Watch the destruction of an inode and calls the callbacks in the given
  * notify_ops.
diff --git a/kernel/bpf/kernfs_node.c b/kernel/bpf/kernfs_node.c
index 3d331d8357db..7b58bfc1951e 100644
--- a/kernel/bpf/kernfs_node.c
+++ b/kernel/bpf/kernfs_node.c
@@ -3,15 +3,33 @@
  * Expose eBPF objects in kernfs file system.
  */
 
+#include <linux/bpf.h>
 #include <linux/fs.h>
 #include <linux/kernfs.h>
+#include <linux/btf_ids.h>
+#include <linux/magic.h>
+#include <linux/seq_file.h>
 #include "inode.h"
+#include "bpf_view.h"
 
 /* file_operations for kernfs file system */
 
 /* Command for removing a kernfs entry */
 #define REMOVE_CMD "rm"
 
+static const struct kernfs_ops bpf_generic_ops;
+static const struct kernfs_ops bpf_cgroup_ops;
+
+/* Choose the right kernfs_ops for different kernfs. */
+static const struct kernfs_ops *bpf_kernfs_ops(struct super_block *sb)
+{
+	if (sb->s_magic == CGROUP_SUPER_MAGIC ||
+	    sb->s_magic == CGROUP2_SUPER_MAGIC)
+		return &bpf_cgroup_ops;
+
+	return &bpf_generic_ops;
+}
+
 /* Handler when the watched inode is freed. */
 static void kn_watch_free_inode(void *obj, enum bpf_type type, void *kn)
 {
@@ -80,7 +98,7 @@ int bpf_obj_do_pin_kernfs(struct dentry *dentry, umode_t mode, void *obj,
 	if (!inode)
 		return -ENXIO;
 
-	ops = &bpf_generic_ops;
+	ops = bpf_kernfs_ops(sb);
 	kn = __kernfs_create_file(parent_kn, dentry->d_iname, mode,
 				  GLOBAL_ROOT_UID, GLOBAL_ROOT_GID,
 				  0, ops, inode, NULL, NULL);
@@ -107,3 +125,41 @@ int bpf_obj_do_pin_kernfs(struct dentry *dentry, umode_t mode, void *obj,
 	iput(inode);
 	return 0;
 }
+
+/* file_operations for cgroup file system */
+static int bpf_cgroup_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpf_view_cgroup_ctx ctx;
+	struct kernfs_open_file *of;
+	struct kernfs_node *kn;
+	struct cgroup *cgroup;
+	struct inode *inode;
+	struct bpf_link *link;
+	enum bpf_type type;
+
+	of = seq->private;
+	kn = of->kn;
+	cgroup = kn->parent->priv;
+
+	inode = kn->priv;
+	if (bpf_inode_type(inode, &type))
+		return -ENXIO;
+
+	if (type != BPF_TYPE_LINK)
+		return -EACCES;
+
+	link = inode->i_private;
+	if (!bpf_link_is_cgroup_view(link))
+		return -EACCES;
+
+	ctx.seq = seq;
+	ctx.cgroup = cgroup;
+
+	return run_view_prog(link->prog, &ctx);
+}
+
+static const struct kernfs_ops bpf_cgroup_ops = {
+	.seq_show	= bpf_cgroup_seq_show,
+	.write          = bpf_generic_write,
+	.read           = bpf_generic_read,
+};
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH RFC bpf-next v1 8/8] selftests/bpf: Test exposing bpf objects in kernfs
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
                   ` (6 preceding siblings ...)
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 7/8] bpf: Add seq_show operation for bpf in cgroupfs Hao Luo
@ 2022-01-06 21:50 ` Hao Luo
  2022-01-06 23:02 ` [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs sdf
  2022-01-07  0:30 ` Yonghong Song
  9 siblings, 0 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-06 21:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, Stanislav Fomichev, bpf, Hao Luo

Add selftests for exposing bpf objects in kernfs. Basically the added
test tests two functionalities:

  1. the ability to expose generic bpf objects in kernfs.
  2. the ability to expose bpf_view programs to cgroup file system and
     read from the created cgroupfs entry.

The test assumes cgroup v2 is mounted at /sys/fs/cgroup/ and bpffs is
mounted at /sys/fs/bpf/

Signed-off-by: Hao Luo <haoluo@google.com>
---
 .../selftests/bpf/prog_tests/pinning_kernfs.c | 245 ++++++++++++++++++
 .../selftests/bpf/progs/pinning_kernfs.c      |  72 +++++
 2 files changed, 317 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
 create mode 100644 tools/testing/selftests/bpf/progs/pinning_kernfs.c

diff --git a/tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c b/tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
new file mode 100644
index 000000000000..aa702d05bf25
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <fcntl.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <test_progs.h>
+#include <time.h>
+#include <unistd.h>
+#include "pinning_kernfs.skel.h"
+
+/* remove pinned object from kernfs */
+static void do_unpin(const char *kernfs_path, const char *msg)
+{
+	struct stat statbuf = {};
+	const char cmd[] = "rm";
+	int fd;
+
+	fd = open(kernfs_path, O_WRONLY);
+	if (fd < 0)
+		return;
+	ASSERT_GE(write(fd, cmd, sizeof(cmd)), 0, "fail_unpin_cgroup_entry");
+	close(fd);
+
+	ASSERT_ERR(stat(kernfs_path, &statbuf), msg);
+}
+
+static void do_pin(int fd, const char *pinpath)
+{
+	struct stat statbuf = {};
+
+	if (!ASSERT_OK(bpf_obj_pin(fd, pinpath), "bpf_obj_pin"))
+		return;
+
+	ASSERT_OK(stat(pinpath, &statbuf), "stat");
+}
+
+static void check_pinning(const char *bpffs_rootpath,
+			  const char *kernfs_rootpath)
+{
+	const char msg[] = "xxx";
+	char buf[8];
+	struct pinning_kernfs *skel;
+	struct bpf_link *link;
+	int prog_fd, map_fd, link_fd;
+	char bpffs_path[64];
+	char kernfs_path[64];
+	struct stat statbuf = {};
+	int err, fd;
+
+	skel = pinning_kernfs__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "pinning_kernfs__open_and_load"))
+		return;
+
+	snprintf(kernfs_path, 64, "%s/bpf_obj", kernfs_rootpath);
+	snprintf(bpffs_path, 64, "%s/bpf_obj", bpffs_rootpath);
+
+	prog_fd = bpf_program__fd(skel->progs.wait_show);
+
+	/* test 1:
+	 *
+	 *  - expose object in kernfs without pinning in bpffs in the first place.
+	 */
+	ASSERT_ERR(bpf_obj_pin(prog_fd, kernfs_path), "pin_kernfs_first");
+
+	/* test 2:
+	 *
+	 *  - expose bpf prog in kernfs.
+	 *  - read/write the newly creaded kernfs entry.
+	 */
+	do_pin(prog_fd, bpffs_path);
+	do_pin(prog_fd, kernfs_path);
+	fd = open(kernfs_path, O_RDWR);
+	err = read(fd, buf, sizeof(buf));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_read"))
+		goto out;
+
+	err = write(fd, msg, sizeof(msg));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_write"))
+		goto out;
+
+	close(fd);
+	do_unpin(kernfs_path, "kernfs_unpin_prog");
+	ASSERT_OK(unlink(bpffs_path), "bpffs_unlink_prog");
+
+	/* test 3:
+	 *
+	 *  - expose bpf map in kernfs.
+	 *  - read/write the newly created kernfs entry.
+	 */
+	map_fd = bpf_map__fd(skel->maps.wait_map);
+	do_pin(map_fd, bpffs_path);
+	do_pin(map_fd, kernfs_path);
+	fd = open(kernfs_path, O_RDWR);
+	err = read(fd, buf, sizeof(buf));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_read"))
+		goto out;
+
+	err = write(fd, msg, sizeof(msg));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_write"))
+		goto out;
+
+	close(fd);
+	do_unpin(kernfs_path, "kernfs_unpin_map");
+	ASSERT_OK(unlink(bpffs_path), "bpffs_unlink_map");
+
+	/* test 4:
+	 *
+	 *  - expose bpf link in kernfs.
+	 *  - read/write the newly created kernfs entry.
+	 *  - removing bpffs entry also removes kernfs entries.
+	 */
+	link = bpf_program__attach(skel->progs.wait_record);
+	link_fd = bpf_link__fd(link);
+	do_pin(link_fd, bpffs_path);
+	do_pin(link_fd, kernfs_path);
+	fd = open(kernfs_path, O_RDWR);
+	err = read(fd, buf, sizeof(buf));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_read"))
+		goto destroy_link;
+
+	err = write(fd, msg, sizeof(msg));
+	if (!ASSERT_EQ(err, -1, "unexpected_successful_write"))
+		goto destroy_link;
+
+	ASSERT_OK(unlink(bpffs_path), "bpffs_unlink_link");
+	ASSERT_ERR(stat(kernfs_path, &statbuf), "unpin_bpffs_first");
+
+	/* cleanup */
+destroy_link:
+	bpf_link__destroy(link);
+out:
+	close(fd);
+	pinning_kernfs__destroy(skel);
+}
+
+static void spin_on_cpu(int seconds)
+{
+	time_t start, now;
+
+	start = time(NULL);
+	do {
+		now = time(NULL);
+	} while (now - start < seconds);
+}
+
+static void do_work(const char *cgroup)
+{
+	int cpu = 0, pid;
+	char cmd[128];
+
+	/* make cgroup threaded */
+	snprintf(cmd, 128, "echo threaded > %s/cgroup.type", cgroup);
+	system(cmd);
+
+	pid = fork();
+	if (pid == 0) {
+		/* attach to cgroup */
+		snprintf(cmd, 128, "echo %d > %s/cgroup.procs", getpid(), cgroup);
+		system(cmd);
+
+		/* pin process to target cpu */
+		snprintf(cmd, 128, "taskset -pc %d %d", cpu, getpid());
+		system(cmd);
+
+		spin_on_cpu(3); /* spin on cpu for 3 seconds */
+		exit(0);
+	}
+
+	/* pin process to target cpu */
+	snprintf(cmd, 128, "taskset -pc %d %d", cpu, getpid());
+	system(cmd);
+
+	spin_on_cpu(3); /* spin on cpu for 3 seconds */
+	wait(NULL);
+}
+
+void read_from_file(const char *path)
+{
+	int id = 0, lat;
+	char buf[64];
+	int fd;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return;
+	ASSERT_GE(read(fd, buf, sizeof(buf)), 0, "fail_read_cgroup_entry");
+	ASSERT_EQ(sscanf(buf, "%d %d", &id, &lat), 2, "unexpected_seq_show_output");
+	close(fd);
+}
+
+static void check_cgroup_seq_show(const char *bpffs_dir,
+				  const char *cgroup_dir)
+{
+	struct pinning_kernfs *skel;
+	char bpffs_path[64];
+	char cgroup_path[64];
+	int fd;
+
+	skel = pinning_kernfs__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "pinning_kernfs__open_and_load"))
+		return;
+
+	pinning_kernfs__attach(skel);
+
+	snprintf(bpffs_path, 64, "%s/bpf_obj", bpffs_dir);
+	snprintf(cgroup_path, 64, "%s/bpf_obj", cgroup_dir);
+
+	/* generate wait events for the cgroup */
+	do_work(cgroup_dir);
+
+	/* expose wait_show prog to cgroupfs */
+	fd = bpf_link__fd(skel->links.wait_show);
+	ASSERT_OK(bpf_obj_pin(fd, bpffs_path), "pin_bpffs");
+	ASSERT_OK(bpf_obj_pin(fd, cgroup_path), "pin_cgroupfs");
+
+	/* read from cgroupfs and check results */
+	read_from_file(cgroup_path);
+
+	/* cleanup */
+	do_unpin(cgroup_path, "cgroup_unpin_seq_show");
+	ASSERT_OK(unlink(bpffs_path), "bpffs_unlink_seq_show");
+
+	pinning_kernfs__destroy(skel);
+}
+
+void test_pinning_kernfs(void)
+{
+	char kernfs_tmpl[] = "/sys/fs/cgroup/bpf_pinning_test_XXXXXX";
+	char bpffs_tmpl[] = "/sys/fs/bpf/pinning_test_XXXXXX";
+	char *kernfs_rootpath, *bpffs_rootpath;
+
+	kernfs_rootpath = mkdtemp(kernfs_tmpl);
+	bpffs_rootpath = mkdtemp(bpffs_tmpl);
+
+	/* check pinning map, prog and link in kernfs */
+	if (test__start_subtest("pinning"))
+		check_pinning(bpffs_rootpath, kernfs_rootpath);
+
+	/* check cgroup seq_show implemented using bpf */
+	if (test__start_subtest("cgroup_seq_show"))
+		check_cgroup_seq_show(bpffs_rootpath, kernfs_rootpath);
+
+	rmdir(kernfs_rootpath);
+	rmdir(bpffs_rootpath);
+}
diff --git a/tools/testing/selftests/bpf/progs/pinning_kernfs.c b/tools/testing/selftests/bpf/progs/pinning_kernfs.c
new file mode 100644
index 000000000000..ca03a9443794
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/pinning_kernfs.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2022 Google */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+struct bpf_map_def SEC("maps") wait_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(__u64),
+	.value_size = sizeof(__u64),
+	.max_entries = 65532,
+};
+
+/* task_group() from kernel/sched/sched.h */
+static struct task_group *task_group(struct task_struct *p)
+{
+	return p->sched_task_group;
+}
+
+static struct cgroup *task_cgroup(struct task_struct *p)
+{
+	struct task_group *tg;
+
+	tg = task_group(p);
+	return tg->css.cgroup;
+}
+
+/* cgroup_id() from linux/cgroup.h */
+static __u64 cgroup_id(const struct cgroup *cgroup)
+{
+	return cgroup->kn->id;
+}
+
+SEC("tp_btf/sched_stat_wait")
+int BPF_PROG(wait_record, struct task_struct *p, __u64 delta)
+{
+	struct cgroup *cgrp;
+	__u64 *wait_ns;
+	__u64 id;
+
+	cgrp = task_cgroup(p);
+	if (!cgrp)
+		return 0;
+
+	id = cgroup_id(cgrp);
+	wait_ns = bpf_map_lookup_elem(&wait_map, &id);
+
+	/* record the max wait latency seen so far */
+	if (!wait_ns)
+		bpf_map_update_elem(&wait_map, &id, &delta, BPF_NOEXIST);
+	else if (*wait_ns < delta)
+		*wait_ns = delta;
+	return 0;
+}
+
+SEC("view/cgroup")
+int BPF_PROG(wait_show, struct seq_file *seq, struct cgroup *cgroup)
+{
+	__u64 id, *value;
+
+	id = cgroup_id(cgroup);
+	value = bpf_map_lookup_elem(&wait_map, &id);
+
+	if (value)
+		BPF_SEQ_PRINTF(seq, "%llu %llu\n", id, *value);
+	else
+		BPF_SEQ_PRINTF(seq, "%llu 0\n", id);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.34.1.448.ga2b2bfdf31-goog


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
                   ` (7 preceding siblings ...)
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 8/8] selftests/bpf: Test exposing bpf objects in kernfs Hao Luo
@ 2022-01-06 23:02 ` sdf
  2022-01-07 18:59   ` Hao Luo
  2022-01-07  0:30 ` Yonghong Song
  9 siblings, 1 reply; 28+ messages in thread
From: sdf @ 2022-01-06 23:02 UTC (permalink / raw)
  To: Hao Luo
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, bpf

On 01/06, Hao Luo wrote:
> Bpffs is a pseudo file system that persists bpf objects. Previously
> bpf objects can only be pinned in bpffs, this patchset extends pinning
> to allow bpf objects to be pinned (or exposed) to other file systems.

> In particular, this patchset allows pinning bpf objects in kernfs. This
> creates a new file entry in the kernfs file system and the created file
> is able to reference the bpf object. By doing so, bpf can be used to
> customize the file's operations, such as seq_show.

> As a concrete usecase of this feature, this patchset introduces a
> simple new program type called 'bpf_view', which can be used to format
> a seq file by a kernel object's state. By pinning a bpf_view program
> into a cgroup directory, userspace is able to read the cgroup's state
> from file in a format defined by the bpf program.

> Different from bpffs, kernfs doesn't have a callback when a kernfs node
> is freed, which is problem if we allow the kernfs node to hold an extra
> reference of the bpf object, because there is no chance to dec the
> object's refcnt. Therefore the kernfs node created by pinning doesn't
> hold reference of the bpf object. The lifetime of the kernfs node
> depends on the lifetime of the bpf object. Rather than "pinning in
> kernfs", it is "exposing to kernfs". We require the bpf object to be
> pinned in bpffs first before it can be pinned in kernfs. When the
> object is unpinned from bpffs, their kernfs nodes will be removed
> automatically. This somehow treats a pinned bpf object as a persistent
> "device".

> We rely on fsnotify to monitor the inode events in bpffs. A new function
> bpf_watch_inode() is introduced. It allows registering a callback
> function at inode destruction. For the kernfs case, a callback that
> removes kernfs node is registered at the destruction of bpffs inodes.
> For other file systems such as sockfs, bpf_watch_inode() can monitor the
> destruction of sockfs inodes and the created file entry can hold the bpf
> object's reference. In this case, it is truly "pinning".

> File operations other than seq_show can also be implemented using bpf.
> For example, bpf may be of help for .poll and .mmap in kernfs.

This looks awesome!

One thing I don't understand is: why did go through the pinning
interface VS regular attach/detach? IOW, why not allow regular
sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
(which, in turn, creates the kernfs nodes). Seems like this way you can drop
the requirement on the object being pinned in the bpffs first?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system.
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system Hao Luo
@ 2022-01-07  0:04   ` kernel test robot
  2022-01-07  0:33   ` Yonghong Song
  2022-01-08  0:41     ` kernel test robot
  2 siblings, 0 replies; 28+ messages in thread
From: kernel test robot @ 2022-01-07  0:04 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1995 bytes --]

Hi Hao,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Hao-Luo/Pinning-bpf-objects-outside-bpffs/20220107-055252
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arc-allyesconfig (https://download.01.org/0day-ci/archive/20220107/202201070824.fU5x5JHr-lkp(a)intel.com/config)
compiler: arceb-elf-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/7c6dacd226b9d48b0c2bcc226bf890776267de90
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Hao-Luo/Pinning-bpf-objects-outside-bpffs/20220107-055252
        git checkout 7c6dacd226b9d48b0c2bcc226bf890776267de90
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=arc SHELL=/bin/bash kernel/bpf/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> kernel/bpf/inode.c:83:5: warning: no previous prototype for 'handle_inode_event' [-Wmissing-prototypes]
      83 | int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
         |     ^~~~~~~~~~~~~~~~~~


vim +/handle_inode_event +83 kernel/bpf/inode.c

    81	
    82	/* Handler for any inode event. */
  > 83	int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
    84			       struct inode *inode, struct inode *dir,
    85			       const struct qstr *file_name, u32 cookie)
    86	{
    87		return 0;
    88	}
    89	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
                   ` (8 preceding siblings ...)
  2022-01-06 23:02 ` [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs sdf
@ 2022-01-07  0:30 ` Yonghong Song
  2022-01-07 20:43   ` Hao Luo
  9 siblings, 1 reply; 28+ messages in thread
From: Yonghong Song @ 2022-01-07  0:30 UTC (permalink / raw)
  To: Hao Luo, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, KP Singh, Shakeel Butt, Joe Burton,
	Stanislav Fomichev, bpf



On 1/6/22 1:50 PM, Hao Luo wrote:
> Bpffs is a pseudo file system that persists bpf objects. Previously
> bpf objects can only be pinned in bpffs, this patchset extends pinning
> to allow bpf objects to be pinned (or exposed) to other file systems.
> 
> In particular, this patchset allows pinning bpf objects in kernfs. This
> creates a new file entry in the kernfs file system and the created file
> is able to reference the bpf object. By doing so, bpf can be used to
> customize the file's operations, such as seq_show.
> 
> As a concrete usecase of this feature, this patchset introduces a
> simple new program type called 'bpf_view', which can be used to format
> a seq file by a kernel object's state. By pinning a bpf_view program
> into a cgroup directory, userspace is able to read the cgroup's state
> from file in a format defined by the bpf program.
> 
> Different from bpffs, kernfs doesn't have a callback when a kernfs node
> is freed, which is problem if we allow the kernfs node to hold an extra
> reference of the bpf object, because there is no chance to dec the
> object's refcnt. Therefore the kernfs node created by pinning doesn't
> hold reference of the bpf object. The lifetime of the kernfs node
> depends on the lifetime of the bpf object. Rather than "pinning in
> kernfs", it is "exposing to kernfs". We require the bpf object to be
> pinned in bpffs first before it can be pinned in kernfs. When the
> object is unpinned from bpffs, their kernfs nodes will be removed
> automatically. This somehow treats a pinned bpf object as a persistent
> "device".

During our initial discussion for bpf_iter, we even proposed to
put cat-able files under /proc/ system. But there are some concerns
that /proc/ system holds stable APIs so people can rely on its format 
etc. Not sure kernfs here has such requirement or not.

I understand directly put files in respective target directories
(e.g., cgroup) helps. But you can also create directory hierarchy
in bpffs. This can make a bunch of cgroup-stat-dumping bpf programs
better organized.

Also regarding new program type bpf_view, I think we can reuse
bpf_iter infrastructure. E.g., we already can customize bpf_iter
for a particular map. We can certainly customize bpf_iter targeting
a particular cgroup.

> 
> We rely on fsnotify to monitor the inode events in bpffs. A new function
> bpf_watch_inode() is introduced. It allows registering a callback
> function at inode destruction. For the kernfs case, a callback that
> removes kernfs node is registered at the destruction of bpffs inodes.
> For other file systems such as sockfs, bpf_watch_inode() can monitor the
> destruction of sockfs inodes and the created file entry can hold the bpf
> object's reference. In this case, it is truly "pinning".
> 
> File operations other than seq_show can also be implemented using bpf.
> For example, bpf may be of help for .poll and .mmap in kernfs.
> 
> Patch organization:
>   - patch 1/8 and 2/8 are preparations. 1/8 implements bpf_watch_inode();
>     2/8 records bpffs inode in bpf object.
>   - patch 3/8 and 4/8 implement generic logic for creating bpf backed
>     kernfs file.
>   - patch 5/8 and 6/8 add a new program type for formatting output.
>   - patch 7/8 implements cgroup seq_show operation using bpf.
>   - patch 8/8 adds selftest.
> 
> Hao Luo (8):
>    bpf: Support pinning in non-bpf file system.
>    bpf: Record back pointer to the inode in bpffs
>    bpf: Expose bpf object in kernfs
>    bpf: Support removing kernfs entries
>    bpf: Introduce a new program type bpf_view.
>    libbpf: Support of bpf_view prog type.
>    bpf: Add seq_show operation for bpf in cgroupfs
>    selftests/bpf: Test exposing bpf objects in kernfs
> 
>   include/linux/bpf.h                           |   9 +-
>   include/uapi/linux/bpf.h                      |   2 +
>   kernel/bpf/Makefile                           |   2 +-
>   kernel/bpf/bpf_view.c                         | 190 ++++++++++++++
>   kernel/bpf/bpf_view.h                         |  25 ++
>   kernel/bpf/inode.c                            | 219 ++++++++++++++--
>   kernel/bpf/inode.h                            |  54 ++++
>   kernel/bpf/kernfs_node.c                      | 165 ++++++++++++
>   kernel/bpf/syscall.c                          |   3 +
>   kernel/bpf/verifier.c                         |   6 +
>   kernel/trace/bpf_trace.c                      |  12 +-
>   tools/include/uapi/linux/bpf.h                |   2 +
>   tools/lib/bpf/libbpf.c                        |  21 ++
>   .../selftests/bpf/prog_tests/pinning_kernfs.c | 245 ++++++++++++++++++
>   .../selftests/bpf/progs/pinning_kernfs.c      |  72 +++++
>   15 files changed, 995 insertions(+), 32 deletions(-)
>   create mode 100644 kernel/bpf/bpf_view.c
>   create mode 100644 kernel/bpf/bpf_view.h
>   create mode 100644 kernel/bpf/inode.h
>   create mode 100644 kernel/bpf/kernfs_node.c
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
>   create mode 100644 tools/testing/selftests/bpf/progs/pinning_kernfs.c
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system.
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system Hao Luo
  2022-01-07  0:04   ` kernel test robot
@ 2022-01-07  0:33   ` Yonghong Song
  2022-01-08  0:41     ` kernel test robot
  2 siblings, 0 replies; 28+ messages in thread
From: Yonghong Song @ 2022-01-07  0:33 UTC (permalink / raw)
  To: Hao Luo, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, KP Singh, Shakeel Butt, Joe Burton,
	Stanislav Fomichev, bpf



On 1/6/22 1:50 PM, Hao Luo wrote:
> Introduce a new API called bpf_watch_inode() to watch the destruction of
> an inode and calls a registered callback function. With the help of this
> new API, one can implement pinning bpf objects in a non-bpf file system
> such as sockfs. The ability of pinning bpf objects in an external file
> system has potential uses: for example, allow using bpf programs to
> customize file behaviors, as we can see in the following patches.
> 
> Extending the pinning logic in bpf_obj_do_pin() to associate bpf objects
> to inodes of another file system is relatively straightforward. The
> challenge is how to notify the bpf object when the associated inode is
> gone so that the object's refcnt can be decremented at that time. Bpffs
> uses .free_inode() callback in super_operations to drop object's refcnt.
> But not every file system implements .free_inode() and inserting bpf
> notification to every target file system can be too instrusive.
> 
> Thanks to fsnotify, there is a generic callback in VFS that can be
> used to notify the events of an inode. bpf_watch_inode() implements on
> top of that. bpf_watch_inode() allows the caller to pass in a callback
> (for example, decrementing an object's refcnt), which will be called
> when the inode is about to be freed. So typically, one can implement
> exposing bpf objects to other file systems in the following steps:
> 
>   1. extend bpf_obj_do_pin() to create a new entry in the target file
>      system.
>   2. call bpf_watch_inode() to register bpf object put operation at
>      the destruction of the newly created inode.
> 
> Of course, on a system with no fsnotify support, pinning bpf object in
> non-bpf file system will not be available.
> 
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---
>   kernel/bpf/inode.c | 118 ++++++++++++++++++++++++++++++++++++++++-----
>   kernel/bpf/inode.h |  33 +++++++++++++
>   2 files changed, 140 insertions(+), 11 deletions(-)
>   create mode 100644 kernel/bpf/inode.h
> 
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index 80da1db47c68..b4066dd986a8 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -16,18 +16,13 @@
>   #include <linux/fs.h>
>   #include <linux/fs_context.h>
>   #include <linux/fs_parser.h>
> +#include <linux/fsnotify_backend.h>
>   #include <linux/kdev_t.h>
>   #include <linux/filter.h>
>   #include <linux/bpf.h>
>   #include <linux/bpf_trace.h>
>   #include "preload/bpf_preload.h"
> -
> -enum bpf_type {
> -	BPF_TYPE_UNSPEC	= 0,
> -	BPF_TYPE_PROG,
> -	BPF_TYPE_MAP,
> -	BPF_TYPE_LINK,
> -};
> +#include "inode.h"
>   
>   static void *bpf_any_get(void *raw, enum bpf_type type)
>   {
> @@ -67,6 +62,95 @@ static void bpf_any_put(void *raw, enum bpf_type type)
>   	}
>   }
>   
> +#ifdef CONFIG_FSNOTIFY
> +/* Notification mechanism based on fsnotify, used in bpf to watch the
> + * destruction of an inode. This inode could an inode in bpffs or any
> + * other file system.
> + */
> +
> +struct notify_mark {
> +	struct fsnotify_mark fsn_mark;
> +	const struct notify_ops *ops;
> +	void *object;
> +	enum bpf_type type;
> +	void *priv;
> +};
> +
> +struct fsnotify_group *bpf_notify_group;
> +struct kmem_cache *bpf_notify_mark_cachep __read_mostly;
> +
> +/* Handler for any inode event. */
> +int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
> +		       struct inode *inode, struct inode *dir,
> +		       const struct qstr *file_name, u32 cookie)
> +{
> +	return 0;
> +}
> +
> +/* Handler for freeing marks. This is called when the watched inode is being
> + * freed.
> + */
> +static void notify_freeing_mark(struct fsnotify_mark *mark, struct fsnotify_group *group)
> +{
> +	struct notify_mark *b_mark;
> +
> +	b_mark = container_of(mark, struct notify_mark, fsn_mark);
> +
> +	if (b_mark->ops && b_mark->ops->free_inode)
> +		b_mark->ops->free_inode(b_mark->object, b_mark->type, b_mark->priv);
> +}
> +
> +static void notify_free_mark(struct fsnotify_mark *mark)
> +{
> +	struct notify_mark *b_mark;
> +
> +	b_mark = container_of(mark, struct notify_mark, fsn_mark);
> +
> +	kmem_cache_free(bpf_notify_mark_cachep, b_mark);
> +}
> +
> +struct fsnotify_ops bpf_notify_ops = {
> +	.handle_inode_event = handle_inode_event,
> +	.freeing_mark = notify_freeing_mark,
> +	.free_mark = notify_free_mark,
> +};
> +
> +static int bpf_inode_type(const struct inode *inode, enum bpf_type *type);
> +
> +/* Watch the destruction of an inode and calls the callbacks in the given
> + * notify_ops.
> + */
> +int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops, void *priv)
> +{
> +	enum bpf_type type;
> +	struct notify_mark *b_mark;
> +	int ret;
> +
> +	if (IS_ERR(bpf_notify_group) || unlikely(!bpf_notify_mark_cachep))
> +		return -ENOMEM;
> +
> +	b_mark = kmem_cache_alloc(bpf_notify_mark_cachep, GFP_KERNEL_ACCOUNT);
> +	if (unlikely(!b_mark))
> +		return -ENOMEM;
> +
> +	fsnotify_init_mark(&b_mark->fsn_mark, bpf_notify_group);
> +	b_mark->ops = ops;
> +	b_mark->priv = priv;
> +	b_mark->object = inode->i_private;
> +	bpf_inode_type(inode, &type);
> +	b_mark->type = type;
> +
> +	ret = fsnotify_add_inode_mark(&b_mark->fsn_mark, inode,
> +				      /*allow_dups=*/1);
> +
> +	fsnotify_put_mark(&b_mark->fsn_mark); /* match get in fsnotify_init_mark */
> +
> +	return ret;
> +}
> +#endif
> +
> +/* bpffs */
> +
>   static void *bpf_fd_probe_obj(u32 ufd, enum bpf_type *type)
>   {
>   	void *raw;
> @@ -435,11 +519,15 @@ static int bpf_iter_link_pin_kernel(struct dentry *parent,
>   	return ret;
>   }
>   
> +static bool dentry_is_bpf_dir(struct dentry *dentry)
> +{
> +	return d_inode(dentry)->i_op == &bpf_dir_iops;
> +}
> +
>   static int bpf_obj_do_pin(const char __user *pathname, void *raw,
>   			  enum bpf_type type)
>   {
>   	struct dentry *dentry;
> -	struct inode *dir;
>   	struct path path;
>   	umode_t mode;
>   	int ret;
> @@ -454,8 +542,7 @@ static int bpf_obj_do_pin(const char __user *pathname, void *raw,
>   	if (ret)
>   		goto out;
>   
> -	dir = d_inode(path.dentry);
> -	if (dir->i_op != &bpf_dir_iops) {
> +	if (!dentry_is_bpf_dir(path.dentry)) {
>   		ret = -EPERM;
>   		goto out;
>   	}
> @@ -821,8 +908,17 @@ static int __init bpf_init(void)
>   		return ret;
>   
>   	ret = register_filesystem(&bpf_fs_type);
> -	if (ret)
> +	if (ret) {
>   		sysfs_remove_mount_point(fs_kobj, "bpf");
> +		return ret;
> +	}
> +
> +#ifdef CONFIG_FSNOTIFY
> +	bpf_notify_mark_cachep = KMEM_CACHE(notify_mark, 0);
> +	bpf_notify_group = fsnotify_alloc_group(&bpf_notify_ops);
> +	if (IS_ERR(bpf_notify_group) || !bpf_notify_mark_cachep)
> +		pr_warn("Failed to initialize bpf_notify system, user can not pin objects outside bpffs.\n");
> +#endif
>   
>   	return ret;
>   }
> diff --git a/kernel/bpf/inode.h b/kernel/bpf/inode.h
> new file mode 100644
> index 000000000000..3f53a4542028
> --- /dev/null
> +++ b/kernel/bpf/inode.h
> @@ -0,0 +1,33 @@
> +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) */

GPL-2.0-only? This file is kernel only.

> +/* Copyright (c) 2022 Google
> + */
> +#ifndef __BPF_INODE_H_
> +#define __BPF_INODE_H_
> +
> +enum bpf_type {
> +	BPF_TYPE_UNSPEC = 0,
> +	BPF_TYPE_PROG,
> +	BPF_TYPE_MAP,
> +	BPF_TYPE_LINK,
> +};
> +
> +struct notify_ops {
> +	void (*free_inode)(void *object, enum bpf_type type, void *priv);
> +};
> +
> +#ifdef CONFIG_FSNOTIFY
> +/* Watch the destruction of an inode and calls the callbacks in the given
> + * notify_ops.
> + */
> +int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops,
> +		    void *priv);
> +#else
> +static inline
> +int bpf_watch_inode(struct inode *inode, const struct notify_ops *ops,
> +		    void *priv)
> +{
> +	return -EPERM;
> +}
> +#endif  // CONFIG_FSNOTIFY
> +
> +#endif  // __BPF_INODE_H_

For the above two. C style comment is preferred.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 5/8] bpf: Introduce a new program type bpf_view.
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 5/8] bpf: Introduce a new program type bpf_view Hao Luo
@ 2022-01-07  0:35   ` Yonghong Song
  0 siblings, 0 replies; 28+ messages in thread
From: Yonghong Song @ 2022-01-07  0:35 UTC (permalink / raw)
  To: Hao Luo, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann
  Cc: Martin KaFai Lau, Song Liu, KP Singh, Shakeel Butt, Joe Burton,
	Stanislav Fomichev, bpf



On 1/6/22 1:50 PM, Hao Luo wrote:
> Introduce a new program type called "bpf_view", which can be used to
> print out a kernel object's state to a seq file. So the signature of
> this program consists of two parameters: a seq file and a kernel object.
> Currently only 'struct cgroup' is supported.
> 
> The following patches will introduce a call site for this program type
> and allow users to customize the format of printing out the state of
> kernel objects to userspace.
> 
> Signed-off-by: Hao Luo <haoluo@google.com>
> ---
>   include/linux/bpf.h            |   4 +
>   include/uapi/linux/bpf.h       |   2 +
>   kernel/bpf/Makefile            |   2 +-
>   kernel/bpf/bpf_view.c          | 179 +++++++++++++++++++++++++++++++++
>   kernel/bpf/bpf_view.h          |  24 +++++
>   kernel/bpf/syscall.c           |   3 +
>   kernel/bpf/verifier.c          |   6 ++
>   kernel/trace/bpf_trace.c       |  12 ++-
>   tools/include/uapi/linux/bpf.h |   2 +
>   9 files changed, 230 insertions(+), 4 deletions(-)
>   create mode 100644 kernel/bpf/bpf_view.c
>   create mode 100644 kernel/bpf/bpf_view.h
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 2ec693c3d6f6..16f582dfff7e 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1622,6 +1622,10 @@ void bpf_iter_map_show_fdinfo(const struct bpf_iter_aux_info *aux,
>   int bpf_iter_map_fill_link_info(const struct bpf_iter_aux_info *aux,
>   				struct bpf_link_info *info);
>   
> +bool bpf_view_prog_supported(struct bpf_prog *prog);
> +int bpf_view_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
> +			 struct bpf_prog *prog);
> +
>   int map_set_for_each_callback_args(struct bpf_verifier_env *env,
>   				   struct bpf_func_state *caller,
>   				   struct bpf_func_state *callee);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b0383d371b9a..efa0f21d13ba 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -982,6 +982,7 @@ enum bpf_attach_type {
>   	BPF_MODIFY_RETURN,
>   	BPF_LSM_MAC,
>   	BPF_TRACE_ITER,
> +	BPF_TRACE_VIEW,

Please add the new entry to the end of enum list. Otherwise,
this will break backward compatibility.

>   	BPF_CGROUP_INET4_GETPEERNAME,
>   	BPF_CGROUP_INET6_GETPEERNAME,
>   	BPF_CGROUP_INET4_GETSOCKNAME,
> @@ -1009,6 +1010,7 @@ enum bpf_link_type {
>   	BPF_LINK_TYPE_NETNS = 5,
>   	BPF_LINK_TYPE_XDP = 6,
>   	BPF_LINK_TYPE_PERF_EVENT = 7,
> +	BPF_LINK_TYPE_VIEW = 8,
>   
>   	MAX_BPF_LINK_TYPE,
>   };
[...]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-06 23:02 ` [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs sdf
@ 2022-01-07 18:59   ` Hao Luo
  2022-01-07 19:25     ` sdf
  0 siblings, 1 reply; 28+ messages in thread
From: Hao Luo @ 2022-01-07 18:59 UTC (permalink / raw)
  To: sdf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, bpf

On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
>
> On 01/06, Hao Luo wrote:
> > Bpffs is a pseudo file system that persists bpf objects. Previously
> > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > to allow bpf objects to be pinned (or exposed) to other file systems.
>
> > In particular, this patchset allows pinning bpf objects in kernfs. This
> > creates a new file entry in the kernfs file system and the created file
> > is able to reference the bpf object. By doing so, bpf can be used to
> > customize the file's operations, such as seq_show.
>
> > As a concrete usecase of this feature, this patchset introduces a
> > simple new program type called 'bpf_view', which can be used to format
> > a seq file by a kernel object's state. By pinning a bpf_view program
> > into a cgroup directory, userspace is able to read the cgroup's state
> > from file in a format defined by the bpf program.
>
> > Different from bpffs, kernfs doesn't have a callback when a kernfs node
> > is freed, which is problem if we allow the kernfs node to hold an extra
> > reference of the bpf object, because there is no chance to dec the
> > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > hold reference of the bpf object. The lifetime of the kernfs node
> > depends on the lifetime of the bpf object. Rather than "pinning in
> > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > pinned in bpffs first before it can be pinned in kernfs. When the
> > object is unpinned from bpffs, their kernfs nodes will be removed
> > automatically. This somehow treats a pinned bpf object as a persistent
> > "device".
>
> > We rely on fsnotify to monitor the inode events in bpffs. A new function
> > bpf_watch_inode() is introduced. It allows registering a callback
> > function at inode destruction. For the kernfs case, a callback that
> > removes kernfs node is registered at the destruction of bpffs inodes.
> > For other file systems such as sockfs, bpf_watch_inode() can monitor the
> > destruction of sockfs inodes and the created file entry can hold the bpf
> > object's reference. In this case, it is truly "pinning".
>
> > File operations other than seq_show can also be implemented using bpf.
> > For example, bpf may be of help for .poll and .mmap in kernfs.
>
> This looks awesome!
>
> One thing I don't understand is: why did go through the pinning
> interface VS regular attach/detach? IOW, why not allow regular
> sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> (which, in turn, creates the kernfs nodes). Seems like this way you can drop
> the requirement on the object being pinned in the bpffs first?

Thanks Stan.

Yeah, the attach/detach approach is definitely another option. IIUC,
in comparison to pinning, does attach/detach only work for cgroups?
Pinning may be used on other file systems, sockfs, sysfs or resctrl.
But I don't know whether this generality is welcome and implementing
seq_show is the only concrete use case I can think of right now. If
people think the ability of creating files in other subsystems is not
good, I'd be happy to take a look at the attach/detach approach and
that may be the right way.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-07 18:59   ` Hao Luo
@ 2022-01-07 19:25     ` sdf
  2022-01-10 18:55       ` Hao Luo
  0 siblings, 1 reply; 28+ messages in thread
From: sdf @ 2022-01-07 19:25 UTC (permalink / raw)
  To: Hao Luo
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, bpf

On 01/07, Hao Luo wrote:
> On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
> >
> > On 01/06, Hao Luo wrote:
> > > Bpffs is a pseudo file system that persists bpf objects. Previously
> > > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > > to allow bpf objects to be pinned (or exposed) to other file systems.
> >
> > > In particular, this patchset allows pinning bpf objects in kernfs.  
> This
> > > creates a new file entry in the kernfs file system and the created  
> file
> > > is able to reference the bpf object. By doing so, bpf can be used to
> > > customize the file's operations, such as seq_show.
> >
> > > As a concrete usecase of this feature, this patchset introduces a
> > > simple new program type called 'bpf_view', which can be used to format
> > > a seq file by a kernel object's state. By pinning a bpf_view program
> > > into a cgroup directory, userspace is able to read the cgroup's state
> > > from file in a format defined by the bpf program.
> >
> > > Different from bpffs, kernfs doesn't have a callback when a kernfs  
> node
> > > is freed, which is problem if we allow the kernfs node to hold an  
> extra
> > > reference of the bpf object, because there is no chance to dec the
> > > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > > hold reference of the bpf object. The lifetime of the kernfs node
> > > depends on the lifetime of the bpf object. Rather than "pinning in
> > > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > > pinned in bpffs first before it can be pinned in kernfs. When the
> > > object is unpinned from bpffs, their kernfs nodes will be removed
> > > automatically. This somehow treats a pinned bpf object as a persistent
> > > "device".
> >
> > > We rely on fsnotify to monitor the inode events in bpffs. A new  
> function
> > > bpf_watch_inode() is introduced. It allows registering a callback
> > > function at inode destruction. For the kernfs case, a callback that
> > > removes kernfs node is registered at the destruction of bpffs inodes.
> > > For other file systems such as sockfs, bpf_watch_inode() can monitor  
> the
> > > destruction of sockfs inodes and the created file entry can hold the  
> bpf
> > > object's reference. In this case, it is truly "pinning".
> >
> > > File operations other than seq_show can also be implemented using bpf.
> > > For example, bpf may be of help for .poll and .mmap in kernfs.
> >
> > This looks awesome!
> >
> > One thing I don't understand is: why did go through the pinning
> > interface VS regular attach/detach? IOW, why not allow regular
> > sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> > (which, in turn, creates the kernfs nodes). Seems like this way you can  
> drop
> > the requirement on the object being pinned in the bpffs first?

> Thanks Stan.

> Yeah, the attach/detach approach is definitely another option. IIUC,
> in comparison to pinning, does attach/detach only work for cgroups?

attach has target_fd argument that, in theory, can be whatever. We can
add support for different fd types.

> Pinning may be used on other file systems, sockfs, sysfs or resctrl.
> But I don't know whether this generality is welcome and implementing
> seq_show is the only concrete use case I can think of right now. If
> people think the ability of creating files in other subsystems is not
> good, I'd be happy to take a look at the attach/detach approach and
> that may be the right way.

The reason I started thinking about attach/detach is because of clunky
unlink that you have to do (aka echo "rm" > file). IMO, having standard
attach/detach is a much more clear. But I might be missing some
complexity associated with non-cgroup filesystems.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-07  0:30 ` Yonghong Song
@ 2022-01-07 20:43   ` Hao Luo
  2022-01-10 17:30     ` Yonghong Song
  0 siblings, 1 reply; 28+ messages in thread
From: Hao Luo @ 2022-01-07 20:43 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, KP Singh, Shakeel Butt, Joe Burton,
	Stanislav Fomichev, bpf

On Thu, Jan 6, 2022 at 4:30 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 1/6/22 1:50 PM, Hao Luo wrote:
> > Bpffs is a pseudo file system that persists bpf objects. Previously
> > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > to allow bpf objects to be pinned (or exposed) to other file systems.
> >
> > In particular, this patchset allows pinning bpf objects in kernfs. This
> > creates a new file entry in the kernfs file system and the created file
> > is able to reference the bpf object. By doing so, bpf can be used to
> > customize the file's operations, such as seq_show.
> >
> > As a concrete usecase of this feature, this patchset introduces a
> > simple new program type called 'bpf_view', which can be used to format
> > a seq file by a kernel object's state. By pinning a bpf_view program
> > into a cgroup directory, userspace is able to read the cgroup's state
> > from file in a format defined by the bpf program.
> >
> > Different from bpffs, kernfs doesn't have a callback when a kernfs node
> > is freed, which is problem if we allow the kernfs node to hold an extra
> > reference of the bpf object, because there is no chance to dec the
> > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > hold reference of the bpf object. The lifetime of the kernfs node
> > depends on the lifetime of the bpf object. Rather than "pinning in
> > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > pinned in bpffs first before it can be pinned in kernfs. When the
> > object is unpinned from bpffs, their kernfs nodes will be removed
> > automatically. This somehow treats a pinned bpf object as a persistent
> > "device".

Thanks Yonghong for the feedback.

>
> During our initial discussion for bpf_iter, we even proposed to
> put cat-able files under /proc/ system. But there are some concerns
> that /proc/ system holds stable APIs so people can rely on its format
> etc. Not sure kernfs here has such requirement or not.
>
> I understand directly put files in respective target directories
> (e.g., cgroup) helps. But you can also create directory hierarchy
> in bpffs. This can make a bunch of cgroup-stat-dumping bpf programs
> better organized.
>

I thought about this. I think the problem is that you need to
simultaneously manage two hierarchies now. You may have
synchronization problems in bpffs to deal with. For example, what if
the target cgroup is being removed while there is an on-going 'cat' on
the bpf program. I haven't given much thought in this direction. This
patchset doesn't deal with this problem, because kernfs has already
handled these synchronizations.

> Also regarding new program type bpf_view, I think we can reuse
> bpf_iter infrastructure. E.g., we already can customize bpf_iter
> for a particular map. We can certainly customize bpf_iter targeting
> a particular cgroup.
>

Right, that's what I was thinking. During the bpf office hour when I
initially proposed the idea, Alexei suggested creating a new program
type instead of reusing bpf_iter. The reason I remember was that iter
is for iterating a set of objects. Even for dumping a particular map,
it is iterating the entries in a map. While what I wanted to achieve
here is printing for a particular cgroup, not iterating something.
Maybe, we should reuse the infrastructure of bpf_iter (attach, target
registration, etc) but having a different prog type? I do copy-pasted
many code from bpf_iter for bpf_view. I haven't put too much thought
there as I would like to get feedbacks on the idea in general in this
first version of RFC.

> >
> > We rely on fsnotify to monitor the inode events in bpffs. A new function
> > bpf_watch_inode() is introduced. It allows registering a callback
> > function at inode destruction. For the kernfs case, a callback that
> > removes kernfs node is registered at the destruction of bpffs inodes.
> > For other file systems such as sockfs, bpf_watch_inode() can monitor the
> > destruction of sockfs inodes and the created file entry can hold the bpf
> > object's reference. In this case, it is truly "pinning".
> >
> > File operations other than seq_show can also be implemented using bpf.
> > For example, bpf may be of help for .poll and .mmap in kernfs.
> >
> > Patch organization:
> >   - patch 1/8 and 2/8 are preparations. 1/8 implements bpf_watch_inode();
> >     2/8 records bpffs inode in bpf object.
> >   - patch 3/8 and 4/8 implement generic logic for creating bpf backed
> >     kernfs file.
> >   - patch 5/8 and 6/8 add a new program type for formatting output.
> >   - patch 7/8 implements cgroup seq_show operation using bpf.
> >   - patch 8/8 adds selftest.
> >
> > Hao Luo (8):
> >    bpf: Support pinning in non-bpf file system.
> >    bpf: Record back pointer to the inode in bpffs
> >    bpf: Expose bpf object in kernfs
> >    bpf: Support removing kernfs entries
> >    bpf: Introduce a new program type bpf_view.
> >    libbpf: Support of bpf_view prog type.
> >    bpf: Add seq_show operation for bpf in cgroupfs
> >    selftests/bpf: Test exposing bpf objects in kernfs
> >
> >   include/linux/bpf.h                           |   9 +-
> >   include/uapi/linux/bpf.h                      |   2 +
> >   kernel/bpf/Makefile                           |   2 +-
> >   kernel/bpf/bpf_view.c                         | 190 ++++++++++++++
> >   kernel/bpf/bpf_view.h                         |  25 ++
> >   kernel/bpf/inode.c                            | 219 ++++++++++++++--
> >   kernel/bpf/inode.h                            |  54 ++++
> >   kernel/bpf/kernfs_node.c                      | 165 ++++++++++++
> >   kernel/bpf/syscall.c                          |   3 +
> >   kernel/bpf/verifier.c                         |   6 +
> >   kernel/trace/bpf_trace.c                      |  12 +-
> >   tools/include/uapi/linux/bpf.h                |   2 +
> >   tools/lib/bpf/libbpf.c                        |  21 ++
> >   .../selftests/bpf/prog_tests/pinning_kernfs.c | 245 ++++++++++++++++++
> >   .../selftests/bpf/progs/pinning_kernfs.c      |  72 +++++
> >   15 files changed, 995 insertions(+), 32 deletions(-)
> >   create mode 100644 kernel/bpf/bpf_view.c
> >   create mode 100644 kernel/bpf/bpf_view.h
> >   create mode 100644 kernel/bpf/inode.h
> >   create mode 100644 kernel/bpf/kernfs_node.c
> >   create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
> >   create mode 100644 tools/testing/selftests/bpf/progs/pinning_kernfs.c
> >

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system.
  2022-01-06 21:50 ` [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system Hao Luo
@ 2022-01-08  0:41     ` kernel test robot
  2022-01-07  0:33   ` Yonghong Song
  2022-01-08  0:41     ` kernel test robot
  2 siblings, 0 replies; 28+ messages in thread
From: kernel test robot @ 2022-01-08  0:41 UTC (permalink / raw)
  To: Hao Luo; +Cc: llvm, kbuild-all

Hi Hao,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Hao-Luo/Pinning-bpf-objects-outside-bpffs/20220107-055252
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arm64-buildonly-randconfig-r006-20220106 (https://download.01.org/0day-ci/archive/20220108/202201080655.vV6uEYqH-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 32167bfe64a4c5dd4eb3f7a58e24f4cba76f5ac2)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/7c6dacd226b9d48b0c2bcc226bf890776267de90
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Hao-Luo/Pinning-bpf-objects-outside-bpffs/20220107-055252
        git checkout 7c6dacd226b9d48b0c2bcc226bf890776267de90
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash kernel/bpf/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> kernel/bpf/inode.c:83:5: warning: no previous prototype for function 'handle_inode_event' [-Wmissing-prototypes]
   int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
       ^
   kernel/bpf/inode.c:83:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
   ^
   static 
   1 warning generated.


vim +/handle_inode_event +83 kernel/bpf/inode.c

    81	
    82	/* Handler for any inode event. */
  > 83	int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
    84			       struct inode *inode, struct inode *dir,
    85			       const struct qstr *file_name, u32 cookie)
    86	{
    87		return 0;
    88	}
    89	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system.
@ 2022-01-08  0:41     ` kernel test robot
  0 siblings, 0 replies; 28+ messages in thread
From: kernel test robot @ 2022-01-08  0:41 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2422 bytes --]

Hi Hao,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Hao-Luo/Pinning-bpf-objects-outside-bpffs/20220107-055252
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arm64-buildonly-randconfig-r006-20220106 (https://download.01.org/0day-ci/archive/20220108/202201080655.vV6uEYqH-lkp(a)intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 32167bfe64a4c5dd4eb3f7a58e24f4cba76f5ac2)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/0day-ci/linux/commit/7c6dacd226b9d48b0c2bcc226bf890776267de90
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Hao-Luo/Pinning-bpf-objects-outside-bpffs/20220107-055252
        git checkout 7c6dacd226b9d48b0c2bcc226bf890776267de90
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash kernel/bpf/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> kernel/bpf/inode.c:83:5: warning: no previous prototype for function 'handle_inode_event' [-Wmissing-prototypes]
   int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
       ^
   kernel/bpf/inode.c:83:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
   ^
   static 
   1 warning generated.


vim +/handle_inode_event +83 kernel/bpf/inode.c

    81	
    82	/* Handler for any inode event. */
  > 83	int handle_inode_event(struct fsnotify_mark *mark, u32 mask,
    84			       struct inode *inode, struct inode *dir,
    85			       const struct qstr *file_name, u32 cookie)
    86	{
    87		return 0;
    88	}
    89	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-07 20:43   ` Hao Luo
@ 2022-01-10 17:30     ` Yonghong Song
  2022-01-10 18:56       ` Hao Luo
  0 siblings, 1 reply; 28+ messages in thread
From: Yonghong Song @ 2022-01-10 17:30 UTC (permalink / raw)
  To: Hao Luo
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, KP Singh, Shakeel Butt, Joe Burton,
	Stanislav Fomichev, bpf



On 1/7/22 12:43 PM, Hao Luo wrote:
> On Thu, Jan 6, 2022 at 4:30 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 1/6/22 1:50 PM, Hao Luo wrote:
>>> Bpffs is a pseudo file system that persists bpf objects. Previously
>>> bpf objects can only be pinned in bpffs, this patchset extends pinning
>>> to allow bpf objects to be pinned (or exposed) to other file systems.
>>>
>>> In particular, this patchset allows pinning bpf objects in kernfs. This
>>> creates a new file entry in the kernfs file system and the created file
>>> is able to reference the bpf object. By doing so, bpf can be used to
>>> customize the file's operations, such as seq_show.
>>>
>>> As a concrete usecase of this feature, this patchset introduces a
>>> simple new program type called 'bpf_view', which can be used to format
>>> a seq file by a kernel object's state. By pinning a bpf_view program
>>> into a cgroup directory, userspace is able to read the cgroup's state
>>> from file in a format defined by the bpf program.
>>>
>>> Different from bpffs, kernfs doesn't have a callback when a kernfs node
>>> is freed, which is problem if we allow the kernfs node to hold an extra
>>> reference of the bpf object, because there is no chance to dec the
>>> object's refcnt. Therefore the kernfs node created by pinning doesn't
>>> hold reference of the bpf object. The lifetime of the kernfs node
>>> depends on the lifetime of the bpf object. Rather than "pinning in
>>> kernfs", it is "exposing to kernfs". We require the bpf object to be
>>> pinned in bpffs first before it can be pinned in kernfs. When the
>>> object is unpinned from bpffs, their kernfs nodes will be removed
>>> automatically. This somehow treats a pinned bpf object as a persistent
>>> "device".
> 
> Thanks Yonghong for the feedback.
> 
>>
>> During our initial discussion for bpf_iter, we even proposed to
>> put cat-able files under /proc/ system. But there are some concerns
>> that /proc/ system holds stable APIs so people can rely on its format
>> etc. Not sure kernfs here has such requirement or not.
>>
>> I understand directly put files in respective target directories
>> (e.g., cgroup) helps. But you can also create directory hierarchy
>> in bpffs. This can make a bunch of cgroup-stat-dumping bpf programs
>> better organized.
>>
> 
> I thought about this. I think the problem is that you need to
> simultaneously manage two hierarchies now. You may have
> synchronization problems in bpffs to deal with. For example, what if
> the target cgroup is being removed while there is an on-going 'cat' on
> the bpf program. I haven't given much thought in this direction. This
> patchset doesn't deal with this problem, because kernfs has already
> handled these synchronizations.

If the file is going to pinned inside /sys/fs/cgroup, which arguably is 
indeed a better place, maybe ask cgroup maintainer's opinion?

> 
>> Also regarding new program type bpf_view, I think we can reuse
>> bpf_iter infrastructure. E.g., we already can customize bpf_iter
>> for a particular map. We can certainly customize bpf_iter targeting
>> a particular cgroup.
>>
> 
> Right, that's what I was thinking. During the bpf office hour when I
> initially proposed the idea, Alexei suggested creating a new program
> type instead of reusing bpf_iter. The reason I remember was that iter
> is for iterating a set of objects. Even for dumping a particular map,
> it is iterating the entries in a map. While what I wanted to achieve
> here is printing for a particular cgroup, not iterating something.
> Maybe, we should reuse the infrastructure of bpf_iter (attach, target
> registration, etc) but having a different prog type? I do copy-pasted
> many code from bpf_iter for bpf_view. I haven't put too much thought
> there as I would like to get feedbacks on the idea in general in this
> first version of RFC.

Sorry I am not aware of this bpf_view discussion. It is okay for me.
But it would be great if we can avoid lots of code duplication.

> 
>>>
>>> We rely on fsnotify to monitor the inode events in bpffs. A new function
>>> bpf_watch_inode() is introduced. It allows registering a callback
>>> function at inode destruction. For the kernfs case, a callback that
>>> removes kernfs node is registered at the destruction of bpffs inodes.
>>> For other file systems such as sockfs, bpf_watch_inode() can monitor the
>>> destruction of sockfs inodes and the created file entry can hold the bpf
>>> object's reference. In this case, it is truly "pinning".
>>>
>>> File operations other than seq_show can also be implemented using bpf.
>>> For example, bpf may be of help for .poll and .mmap in kernfs.
>>>
[...]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-07 19:25     ` sdf
@ 2022-01-10 18:55       ` Hao Luo
  2022-01-10 19:22         ` Stanislav Fomichev
  2022-01-11  3:33         ` Alexei Starovoitov
  0 siblings, 2 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-10 18:55 UTC (permalink / raw)
  To: sdf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, bpf

On Fri, Jan 7, 2022 at 11:25 AM <sdf@google.com> wrote:
>
> On 01/07, Hao Luo wrote:
> > On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
> > >
> > > On 01/06, Hao Luo wrote:
> > > > Bpffs is a pseudo file system that persists bpf objects. Previously
> > > > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > > > to allow bpf objects to be pinned (or exposed) to other file systems.
> > >
> > > > In particular, this patchset allows pinning bpf objects in kernfs.
> > This
> > > > creates a new file entry in the kernfs file system and the created
> > file
> > > > is able to reference the bpf object. By doing so, bpf can be used to
> > > > customize the file's operations, such as seq_show.
> > >
> > > > As a concrete usecase of this feature, this patchset introduces a
> > > > simple new program type called 'bpf_view', which can be used to format
> > > > a seq file by a kernel object's state. By pinning a bpf_view program
> > > > into a cgroup directory, userspace is able to read the cgroup's state
> > > > from file in a format defined by the bpf program.
> > >
> > > > Different from bpffs, kernfs doesn't have a callback when a kernfs
> > node
> > > > is freed, which is problem if we allow the kernfs node to hold an
> > extra
> > > > reference of the bpf object, because there is no chance to dec the
> > > > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > > > hold reference of the bpf object. The lifetime of the kernfs node
> > > > depends on the lifetime of the bpf object. Rather than "pinning in
> > > > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > > > pinned in bpffs first before it can be pinned in kernfs. When the
> > > > object is unpinned from bpffs, their kernfs nodes will be removed
> > > > automatically. This somehow treats a pinned bpf object as a persistent
> > > > "device".
> > >
> > > > We rely on fsnotify to monitor the inode events in bpffs. A new
> > function
> > > > bpf_watch_inode() is introduced. It allows registering a callback
> > > > function at inode destruction. For the kernfs case, a callback that
> > > > removes kernfs node is registered at the destruction of bpffs inodes.
> > > > For other file systems such as sockfs, bpf_watch_inode() can monitor
> > the
> > > > destruction of sockfs inodes and the created file entry can hold the
> > bpf
> > > > object's reference. In this case, it is truly "pinning".
> > >
> > > > File operations other than seq_show can also be implemented using bpf.
> > > > For example, bpf may be of help for .poll and .mmap in kernfs.
> > >
> > > This looks awesome!
> > >
> > > One thing I don't understand is: why did go through the pinning
> > > interface VS regular attach/detach? IOW, why not allow regular
> > > sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> > > (which, in turn, creates the kernfs nodes). Seems like this way you can
> > drop
> > > the requirement on the object being pinned in the bpffs first?
>
> > Thanks Stan.
>
> > Yeah, the attach/detach approach is definitely another option. IIUC,
> > in comparison to pinning, does attach/detach only work for cgroups?
>
> attach has target_fd argument that, in theory, can be whatever. We can
> add support for different fd types.
>

I see. With attach API, are we also able to specify some attributes
for the attachment? For example, a property that we may want is: let
descendent cgroups inherit their parent cgroup's programs.

> > Pinning may be used on other file systems, sockfs, sysfs or resctrl.
> > But I don't know whether this generality is welcome and implementing
> > seq_show is the only concrete use case I can think of right now. If
> > people think the ability of creating files in other subsystems is not
> > good, I'd be happy to take a look at the attach/detach approach and
> > that may be the right way.
>
> The reason I started thinking about attach/detach is because of clunky
> unlink that you have to do (aka echo "rm" > file). IMO, having standard
> attach/detach is a much more clear. But I might be missing some
> complexity associated with non-cgroup filesystems.

Oh, I see. Looks good. Let me play with it before sending the next version.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-10 17:30     ` Yonghong Song
@ 2022-01-10 18:56       ` Hao Luo
  0 siblings, 0 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-10 18:56 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, KP Singh, Shakeel Butt, Joe Burton,
	Stanislav Fomichev, bpf

On Mon, Jan 10, 2022 at 9:30 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 1/7/22 12:43 PM, Hao Luo wrote:
> > On Thu, Jan 6, 2022 at 4:30 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 1/6/22 1:50 PM, Hao Luo wrote:
> >>> Bpffs is a pseudo file system that persists bpf objects. Previously
> >>> bpf objects can only be pinned in bpffs, this patchset extends pinning
> >>> to allow bpf objects to be pinned (or exposed) to other file systems.
> >>>
> >>> In particular, this patchset allows pinning bpf objects in kernfs. This
> >>> creates a new file entry in the kernfs file system and the created file
> >>> is able to reference the bpf object. By doing so, bpf can be used to
> >>> customize the file's operations, such as seq_show.
> >>>
> >>> As a concrete usecase of this feature, this patchset introduces a
> >>> simple new program type called 'bpf_view', which can be used to format
> >>> a seq file by a kernel object's state. By pinning a bpf_view program
> >>> into a cgroup directory, userspace is able to read the cgroup's state
> >>> from file in a format defined by the bpf program.
> >>>
> >>> Different from bpffs, kernfs doesn't have a callback when a kernfs node
> >>> is freed, which is problem if we allow the kernfs node to hold an extra
> >>> reference of the bpf object, because there is no chance to dec the
> >>> object's refcnt. Therefore the kernfs node created by pinning doesn't
> >>> hold reference of the bpf object. The lifetime of the kernfs node
> >>> depends on the lifetime of the bpf object. Rather than "pinning in
> >>> kernfs", it is "exposing to kernfs". We require the bpf object to be
> >>> pinned in bpffs first before it can be pinned in kernfs. When the
> >>> object is unpinned from bpffs, their kernfs nodes will be removed
> >>> automatically. This somehow treats a pinned bpf object as a persistent
> >>> "device".
> >
> > Thanks Yonghong for the feedback.
> >
> >>
> >> During our initial discussion for bpf_iter, we even proposed to
> >> put cat-able files under /proc/ system. But there are some concerns
> >> that /proc/ system holds stable APIs so people can rely on its format
> >> etc. Not sure kernfs here has such requirement or not.
> >>
> >> I understand directly put files in respective target directories
> >> (e.g., cgroup) helps. But you can also create directory hierarchy
> >> in bpffs. This can make a bunch of cgroup-stat-dumping bpf programs
> >> better organized.
> >>
> >
> > I thought about this. I think the problem is that you need to
> > simultaneously manage two hierarchies now. You may have
> > synchronization problems in bpffs to deal with. For example, what if
> > the target cgroup is being removed while there is an on-going 'cat' on
> > the bpf program. I haven't given much thought in this direction. This
> > patchset doesn't deal with this problem, because kernfs has already
> > handled these synchronizations.
>
> If the file is going to pinned inside /sys/fs/cgroup, which arguably is
> indeed a better place, maybe ask cgroup maintainer's opinion?
>

Yes, will do.

I don't know if pinning in other file systems other than cgroup has
any use. If not, I can tailor this patchset for cgroup only. Would be
much simplified.

> >
> >> Also regarding new program type bpf_view, I think we can reuse
> >> bpf_iter infrastructure. E.g., we already can customize bpf_iter
> >> for a particular map. We can certainly customize bpf_iter targeting
> >> a particular cgroup.
> >>
> >
> > Right, that's what I was thinking. During the bpf office hour when I
> > initially proposed the idea, Alexei suggested creating a new program
> > type instead of reusing bpf_iter. The reason I remember was that iter
> > is for iterating a set of objects. Even for dumping a particular map,
> > it is iterating the entries in a map. While what I wanted to achieve
> > here is printing for a particular cgroup, not iterating something.
> > Maybe, we should reuse the infrastructure of bpf_iter (attach, target
> > registration, etc) but having a different prog type? I do copy-pasted
> > many code from bpf_iter for bpf_view. I haven't put too much thought
> > there as I would like to get feedbacks on the idea in general in this
> > first version of RFC.
>
> Sorry I am not aware of this bpf_view discussion. It is okay for me.
> But it would be great if we can avoid lots of code duplication.
>

No problem. I totally agree.


> >
> >>>
> >>> We rely on fsnotify to monitor the inode events in bpffs. A new function
> >>> bpf_watch_inode() is introduced. It allows registering a callback
> >>> function at inode destruction. For the kernfs case, a callback that
> >>> removes kernfs node is registered at the destruction of bpffs inodes.
> >>> For other file systems such as sockfs, bpf_watch_inode() can monitor the
> >>> destruction of sockfs inodes and the created file entry can hold the bpf
> >>> object's reference. In this case, it is truly "pinning".
> >>>
> >>> File operations other than seq_show can also be implemented using bpf.
> >>> For example, bpf may be of help for .poll and .mmap in kernfs.
> >>>
> [...]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-10 18:55       ` Hao Luo
@ 2022-01-10 19:22         ` Stanislav Fomichev
  2022-01-11  3:33         ` Alexei Starovoitov
  1 sibling, 0 replies; 28+ messages in thread
From: Stanislav Fomichev @ 2022-01-10 19:22 UTC (permalink / raw)
  To: Hao Luo
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, bpf

On Mon, Jan 10, 2022 at 10:56 AM Hao Luo <haoluo@google.com> wrote:
>
> On Fri, Jan 7, 2022 at 11:25 AM <sdf@google.com> wrote:
> >
> > On 01/07, Hao Luo wrote:
> > > On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
> > > >
> > > > On 01/06, Hao Luo wrote:
> > > > > Bpffs is a pseudo file system that persists bpf objects. Previously
> > > > > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > > > > to allow bpf objects to be pinned (or exposed) to other file systems.
> > > >
> > > > > In particular, this patchset allows pinning bpf objects in kernfs.
> > > This
> > > > > creates a new file entry in the kernfs file system and the created
> > > file
> > > > > is able to reference the bpf object. By doing so, bpf can be used to
> > > > > customize the file's operations, such as seq_show.
> > > >
> > > > > As a concrete usecase of this feature, this patchset introduces a
> > > > > simple new program type called 'bpf_view', which can be used to format
> > > > > a seq file by a kernel object's state. By pinning a bpf_view program
> > > > > into a cgroup directory, userspace is able to read the cgroup's state
> > > > > from file in a format defined by the bpf program.
> > > >
> > > > > Different from bpffs, kernfs doesn't have a callback when a kernfs
> > > node
> > > > > is freed, which is problem if we allow the kernfs node to hold an
> > > extra
> > > > > reference of the bpf object, because there is no chance to dec the
> > > > > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > > > > hold reference of the bpf object. The lifetime of the kernfs node
> > > > > depends on the lifetime of the bpf object. Rather than "pinning in
> > > > > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > > > > pinned in bpffs first before it can be pinned in kernfs. When the
> > > > > object is unpinned from bpffs, their kernfs nodes will be removed
> > > > > automatically. This somehow treats a pinned bpf object as a persistent
> > > > > "device".
> > > >
> > > > > We rely on fsnotify to monitor the inode events in bpffs. A new
> > > function
> > > > > bpf_watch_inode() is introduced. It allows registering a callback
> > > > > function at inode destruction. For the kernfs case, a callback that
> > > > > removes kernfs node is registered at the destruction of bpffs inodes.
> > > > > For other file systems such as sockfs, bpf_watch_inode() can monitor
> > > the
> > > > > destruction of sockfs inodes and the created file entry can hold the
> > > bpf
> > > > > object's reference. In this case, it is truly "pinning".
> > > >
> > > > > File operations other than seq_show can also be implemented using bpf.
> > > > > For example, bpf may be of help for .poll and .mmap in kernfs.
> > > >
> > > > This looks awesome!
> > > >
> > > > One thing I don't understand is: why did go through the pinning
> > > > interface VS regular attach/detach? IOW, why not allow regular
> > > > sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> > > > (which, in turn, creates the kernfs nodes). Seems like this way you can
> > > drop
> > > > the requirement on the object being pinned in the bpffs first?
> >
> > > Thanks Stan.
> >
> > > Yeah, the attach/detach approach is definitely another option. IIUC,
> > > in comparison to pinning, does attach/detach only work for cgroups?
> >
> > attach has target_fd argument that, in theory, can be whatever. We can
> > add support for different fd types.
> >
>
> I see. With attach API, are we also able to specify some attributes
> for the attachment? For example, a property that we may want is: let
> descendent cgroups inherit their parent cgroup's programs.

There are already flags like these: BPF_F_ALLOW_OVERRIDE, maybe you
can rely on them? But you can always add more bit flags if needed.

> > > Pinning may be used on other file systems, sockfs, sysfs or resctrl.
> > > But I don't know whether this generality is welcome and implementing
> > > seq_show is the only concrete use case I can think of right now. If
> > > people think the ability of creating files in other subsystems is not
> > > good, I'd be happy to take a look at the attach/detach approach and
> > > that may be the right way.
> >
> > The reason I started thinking about attach/detach is because of clunky
> > unlink that you have to do (aka echo "rm" > file). IMO, having standard
> > attach/detach is a much more clear. But I might be missing some
> > complexity associated with non-cgroup filesystems.
>
> Oh, I see. Looks good. Let me play with it before sending the next version.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-10 18:55       ` Hao Luo
  2022-01-10 19:22         ` Stanislav Fomichev
@ 2022-01-11  3:33         ` Alexei Starovoitov
  2022-01-11 17:06           ` Stanislav Fomichev
  2022-01-11 18:20           ` Hao Luo
  1 sibling, 2 replies; 28+ messages in thread
From: Alexei Starovoitov @ 2022-01-11  3:33 UTC (permalink / raw)
  To: Hao Luo
  Cc: sdf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, bpf

On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> 
> I see. With attach API, are we also able to specify some attributes
> for the attachment? For example, a property that we may want is: let
> descendent cgroups inherit their parent cgroup's programs.

Plenty of interesting ideas in this thread. Thanks for kicking it off.
Maybe we should move it to office hours?
The back and forth over email can take some time.
It sounds to me that "let descendents inherit" is a mandatory feature.
In that sense "allow attach in kernfs" is not a feature. It feels that
it's creating more problems for the design.
Creating a "catable" file inside cgroup directory that descedents inherit
with the same name is a cgroup specific feature.
Inherit or not can be a flag, but the inheritance needs to be designed
from the start.

echo "rm" is not pretty. 
fsnotify feels a bit hacky.
Maybe pinning in cgroupfs will avoid both issues?
We can have normal unlink implemented there.

The attach bpf_sys cmd as-is won't work. It needs a name at least.
That will make it look like obj_pin cmd. So probably better to make
obj_pin work when path is inside cgroupfs and use file_flags for
inherit or not.
The patch 8 gives a glimpse of how the bpf prog will look like.
Can you make it more realistic?
Do you need to walk cgroup children? Or all processes in a cgroup?
Will we need css_for_each_descendant() as a bpf helper?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-11  3:33         ` Alexei Starovoitov
@ 2022-01-11 17:06           ` Stanislav Fomichev
  2022-01-11 18:20           ` Hao Luo
  1 sibling, 0 replies; 28+ messages in thread
From: Stanislav Fomichev @ 2022-01-11 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hao Luo, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, bpf

On Mon, Jan 10, 2022 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> >
> > I see. With attach API, are we also able to specify some attributes
> > for the attachment? For example, a property that we may want is: let
> > descendent cgroups inherit their parent cgroup's programs.
>
> Plenty of interesting ideas in this thread. Thanks for kicking it off.
> Maybe we should move it to office hours?
> The back and forth over email can take some time.
> It sounds to me that "let descendents inherit" is a mandatory feature.
> In that sense "allow attach in kernfs" is not a feature. It feels that
> it's creating more problems for the design.
> Creating a "catable" file inside cgroup directory that descedents inherit
> with the same name is a cgroup specific feature.
> Inherit or not can be a flag, but the inheritance needs to be designed
> from the start.
>
> echo "rm" is not pretty.
> fsnotify feels a bit hacky.
> Maybe pinning in cgroupfs will avoid both issues?
> We can have normal unlink implemented there.

With unlink we can do chown and let regular users call unlink.
So maybe it's actually more flexible vs syscall detach, although no
idea why the users would do that.

> The attach bpf_sys cmd as-is won't work. It needs a name at least.

Can we use prog_name in attach? Or, if it's limited by
BPF_OBJ_NAME_LEN, can we extract function name from btf? Or is it too
magic to derive the path from the program name?
obj_pin+unlink is a good alternative. One thing I'm not certain is:
what happens when I call unlink on some of the inherited nodes? (i.e.,
do I need to have a flag to say "unlink this one including
children/parent"? probably not and returning error is fine?)
Agree with your summary that the inheritance story needs more thought :-)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-11  3:33         ` Alexei Starovoitov
  2022-01-11 17:06           ` Stanislav Fomichev
@ 2022-01-11 18:20           ` Hao Luo
  2022-01-12 18:55             ` Song Liu
  1 sibling, 1 reply; 28+ messages in thread
From: Hao Luo @ 2022-01-11 18:20 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: sdf, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, bpf

On Mon, Jan 10, 2022 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> >
> > I see. With attach API, are we also able to specify some attributes
> > for the attachment? For example, a property that we may want is: let
> > descendent cgroups inherit their parent cgroup's programs.
>
> Plenty of interesting ideas in this thread. Thanks for kicking it off.
> Maybe we should move it to office hours?
> The back and forth over email can take some time.

No problem. Requested a time on Thursday (1/13/22).

> It sounds to me that "let descendents inherit" is a mandatory feature.
> In that sense "allow attach in kernfs" is not a feature. It feels that
> it's creating more problems for the design.
> Creating a "catable" file inside cgroup directory that descedents inherit
> with the same name is a cgroup specific feature.
> Inherit or not can be a flag, but the inheritance needs to be designed
> from the start.
>
> echo "rm" is not pretty.
> fsnotify feels a bit hacky.
> Maybe pinning in cgroupfs will avoid both issues?
> We can have normal unlink implemented there.
>
> The attach bpf_sys cmd as-is won't work. It needs a name at least.
> That will make it look like obj_pin cmd. So probably better to make
> obj_pin work when path is inside cgroupfs and use file_flags for
> inherit or not.
> The patch 8 gives a glimpse of how the bpf prog will look like.
> Can you make it more realistic?
> Do you need to walk cgroup children? Or all processes in a cgroup?
> Will we need css_for_each_descendant() as a bpf helper?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-11 18:20           ` Hao Luo
@ 2022-01-12 18:55             ` Song Liu
  2022-01-12 19:19               ` Hao Luo
  0 siblings, 1 reply; 28+ messages in thread
From: Song Liu @ 2022-01-12 18:55 UTC (permalink / raw)
  To: Hao Luo, Tejun Heo
  Cc: Alexei Starovoitov, Stanislav Fomichev, Alexei Starovoitov,
	Andrii Nakryiko, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, KP Singh, Shakeel Butt, Joe Burton, bpf

On Tue, Jan 11, 2022 at 10:20 AM Hao Luo <haoluo@google.com> wrote:
>
> On Mon, Jan 10, 2022 at 7:33 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> > >
> > > I see. With attach API, are we also able to specify some attributes
> > > for the attachment? For example, a property that we may want is: let
> > > descendent cgroups inherit their parent cgroup's programs.
> >
> > Plenty of interesting ideas in this thread. Thanks for kicking it off.
> > Maybe we should move it to office hours?
> > The back and forth over email can take some time.
>
> No problem. Requested a time on Thursday (1/13/22).

CC Tejun, who might be interested in this discussion.

@Tejun, FYI the office hour is tomorrow 9am PST, via zoom:

  https://fb.zoom.us/j/91157637485

Thanks,
Song

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs
  2022-01-12 18:55             ` Song Liu
@ 2022-01-12 19:19               ` Hao Luo
  0 siblings, 0 replies; 28+ messages in thread
From: Hao Luo @ 2022-01-12 19:19 UTC (permalink / raw)
  To: Song Liu
  Cc: Tejun Heo, Alexei Starovoitov, Stanislav Fomichev,
	Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Shakeel Butt, Joe Burton, bpf

On Wed, Jan 12, 2022 at 10:55 AM Song Liu <song@kernel.org> wrote:
>
> On Tue, Jan 11, 2022 at 10:20 AM Hao Luo <haoluo@google.com> wrote:
> >
> > On Mon, Jan 10, 2022 at 7:33 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> > > >
> > > > I see. With attach API, are we also able to specify some attributes
> > > > for the attachment? For example, a property that we may want is: let
> > > > descendent cgroups inherit their parent cgroup's programs.
> > >
> > > Plenty of interesting ideas in this thread. Thanks for kicking it off.
> > > Maybe we should move it to office hours?
> > > The back and forth over email can take some time.
> >
> > No problem. Requested a time on Thursday (1/13/22).
>
> CC Tejun, who might be interested in this discussion.
>
> @Tejun, FYI the office hour is tomorrow 9am PST, via zoom:
>
>   https://fb.zoom.us/j/91157637485
>

Thanks Song, let me resend the whole patchset to include Tejun and cc
linux-kernel.

> Thanks,
> Song

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-01-12 19:20 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-06 21:50 [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 1/8] bpf: Support pinning in non-bpf file system Hao Luo
2022-01-07  0:04   ` kernel test robot
2022-01-07  0:33   ` Yonghong Song
2022-01-08  0:41   ` kernel test robot
2022-01-08  0:41     ` kernel test robot
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 2/8] bpf: Record back pointer to the inode in bpffs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 3/8] bpf: Expose bpf object in kernfs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 4/8] bpf: Support removing kernfs entries Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 5/8] bpf: Introduce a new program type bpf_view Hao Luo
2022-01-07  0:35   ` Yonghong Song
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 6/8] libbpf: Support of bpf_view prog type Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 7/8] bpf: Add seq_show operation for bpf in cgroupfs Hao Luo
2022-01-06 21:50 ` [PATCH RFC bpf-next v1 8/8] selftests/bpf: Test exposing bpf objects in kernfs Hao Luo
2022-01-06 23:02 ` [PATCH RFC bpf-next v1 0/8] Pinning bpf objects outside bpffs sdf
2022-01-07 18:59   ` Hao Luo
2022-01-07 19:25     ` sdf
2022-01-10 18:55       ` Hao Luo
2022-01-10 19:22         ` Stanislav Fomichev
2022-01-11  3:33         ` Alexei Starovoitov
2022-01-11 17:06           ` Stanislav Fomichev
2022-01-11 18:20           ` Hao Luo
2022-01-12 18:55             ` Song Liu
2022-01-12 19:19               ` Hao Luo
2022-01-07  0:30 ` Yonghong Song
2022-01-07 20:43   ` Hao Luo
2022-01-10 17:30     ` Yonghong Song
2022-01-10 18:56       ` Hao Luo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.