netdev.vger.kernel.org archive mirror
* [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures
@ 2020-04-15 19:27 Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 01/17] net: refactor net assignment for seq_net_private structure Yonghong Song
                   ` (17 more replies)
  0 siblings, 18 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

The v1 version is here:
  https://lore.kernel.org/bpf/20200408232520.2675265-1-yhs@fb.com/T/#m058a6817dc3ded9d2db0192ca08486b4a3f4daf0
Compared to v1, I have made the following changes:
  . use BPF_RAW_TRACEPOINT_OPEN to create an anonymous dumper
  . use BPF_OBJ_PIN with a pathname instead of a dumper name to
    create a file dumper
  . support PTR_TO_BTF_ID_OR_NULL so bpf program will be called when
    the dumping session ends. This gives bpf program an opportunity
    to print footer or accumulate and send summaries for anonymous
    dumper.
  . use BPF_OBJ_GET_INFO_BY_FD to get bpfdump target/dumper info.
Still missing:
  . bpf_seq_printf()/bpf_seq_write() related changes
  . double check seq_ops implementation for bpf_map/task/task_file.
  . libbpf/bpftool implementation
  . tests for new features
  . ...

As there are ongoing discussions regarding the kernel interface and the
steps to create file/anonymous dumpers, I think posting this work in
progress will be beneficial for that discussion.

Motivation:
  The current ways to dump kernel data structures are mostly:
    1. the /proc file system
    2. various specific tools like "ss" which require kernel support
    3. drgn
  The drawback of the first two is that whenever you want to dump more, you
  need to change the kernel. For example, Martin wants to dump socket local
  storage with "ss"; a kernel change is needed for it to work ([1]).
  This is also the direct motivation for this work.

  drgn ([2]) solves this problem nicely and no kernel change is needed.
  But since drgn is not able to verify the validity of a particular
  pointer value, it might present wrong results in rare cases.

  In this patch set, we introduce bpf based dumping. Initial kernel changes
  are still needed, but a data structure change will not require kernel
  changes any more. The bpf program itself is used to adapt to data
  structure changes. This gives certain flexibility with guaranteed
  correctness.

  Here, kernel seq_ops is used to facilitate dumping, similar to current
  /proc and many other lossless kernel dumping facilities.

User Interfaces:
  1. A new file system, bpfdump, is introduced and mounted at
     /sys/kernel/bpfdump. Different from /sys/fs/bpf, this is a
     single-instance mount, so all mounts are identical. The mount
     command can be:
        mount -t bpfdump bpfdump /sys/kernel/bpfdump
  2. Kernel bpf dumpable data structures are represented as directories
     under /sys/kernel/bpfdump, e.g.,
       /sys/kernel/bpfdump/ipv6_route/
       /sys/kernel/bpfdump/netlink/
       /sys/kernel/bpfdump/bpf_map/
       /sys/kernel/bpfdump/task/
       /sys/kernel/bpfdump/task/file/
     In this patch set, we use "target" to refer to a particular bpf
     supported data structure, for example, the targets "ipv6_route",
     "netlink", "bpf_map", "task", "task/file", which are actual
     directory hierarchies relative to /sys/kernel/bpfdump/.

     Note that nested targets are supported for sub fields of a major
     data structure. For example, the "task/file" target examines all
     open files of all tasks (task_struct->files), since reference
     counts and locks are needed to access task_struct->files safely.
  3. The bpftool command can be used to create a dumper:
       bpftool dumper pin <bpf_prog.o> <dumper_name>
     where the bpf_prog.o encodes the target information. For example, the
     following dumpers can be created:
       /sys/kernel/bpfdump/ipv6_route/{my1, my2}
       /sys/kernel/bpfdump/task/file/{f1, f2}
  4. Use "cat <dumper>" to dump the contents.
     Use "rm -f <dumper>" to delete the dumper.
  5. An anonymous dumper can be created without pinning it to a
     physical file. The fd is returned to the application, which
     can then "read" the contents.

Please see patch #15 and #16 for bpf programs and
bpf dumper output examples.

Two new helpers, bpf_seq_printf() and bpf_seq_write(), are introduced:
bpf_seq_printf() is mostly for file based dumpers and bpf_seq_write()
mostly for anonymous dumpers.

Note that certain dumpers are namespace aware. For example,
task and task/... targets only iterate through the current pid namespace,
while ipv6_route and netlink iterate through the current net namespace.

For introspection, see patch #14,
  bpftool dumper show {target|dumper}
can show all targets and their context structure type name (for writing bpf
programs), or all dumpers with their associated bpf prog_id.
For any open file descriptors (anonymous or from dumper file),
  cat /proc/<pid>/fdinfo/<fd>
will show target and its associated prog_id as well.

Although the initial motivation is Martin's sk_local_storage,
this patch set does not implement tcp6 sockets and sk_local_storage yet.
/proc/net/tcp6 involves three types of sockets: timewait,
request and full tcp6 sockets. Some kind of type casting is needed
to convert a sock_common to one of these three socket types based
on the socket state. This will be addressed in future work.

References:
  [1]: https://lore.kernel.org/bpf/20200225230427.1976129-1-kafai@fb.com
  [2]: https://github.com/osandov/drgn

Yonghong Song (17):
  net: refactor net assignment for seq_net_private structure
  bpf: create /sys/kernel/bpfdump mount file system
  bpf: provide a way for targets to register themselves
  bpf: allow loading of a dumper program
  bpf: create file or anonymous dumpers
  bpf: add PTR_TO_BTF_ID_OR_NULL support
  bpf: add netlink and ipv6_route targets
  bpf: add bpf_map target
  bpf: add task and task/file targets
  bpf: add bpf_seq_printf and bpf_seq_write helpers
  bpf: support variable length array in tracing programs
  bpf: implement query for target_proto and file dumper prog_id
  tools/libbpf: libbpf support for bpfdump
  tools/bpftool: add bpf dumper support
  tools/bpf: selftests: add dumper programs for ipv6_route and netlink
  tools/bpf: selftests: add dumper progs for bpf_map/task/task_file
  tools/bpf: selftests: add a selftest for anonymous dumper

 fs/proc/proc_net.c                            |   5 +-
 include/linux/bpf.h                           |  31 +
 include/linux/seq_file_net.h                  |   8 +
 include/uapi/linux/bpf.h                      |  35 +-
 include/uapi/linux/magic.h                    |   1 +
 kernel/bpf/Makefile                           |   1 +
 kernel/bpf/btf.c                              |  30 +-
 kernel/bpf/dump.c                             | 806 ++++++++++++++++++
 kernel/bpf/dump_task.c                        | 320 +++++++
 kernel/bpf/syscall.c                          | 146 +++-
 kernel/bpf/verifier.c                         |  33 +-
 kernel/trace/bpf_trace.c                      | 172 ++++
 net/ipv6/ip6_fib.c                            |  71 +-
 net/ipv6/route.c                              |  29 +
 net/netlink/af_netlink.c                      |  94 +-
 scripts/bpf_helpers_doc.py                    |   2 +
 tools/bpf/bpftool/dumper.c                    | 135 +++
 tools/bpf/bpftool/main.c                      |   3 +-
 tools/bpf/bpftool/main.h                      |   1 +
 tools/include/uapi/linux/bpf.h                |  35 +-
 tools/lib/bpf/bpf.c                           |   9 +-
 tools/lib/bpf/bpf.h                           |   1 +
 tools/lib/bpf/libbpf.c                        |  88 +-
 tools/lib/bpf/libbpf.h                        |   3 +
 tools/lib/bpf/libbpf.map                      |   2 +
 .../selftests/bpf/prog_tests/bpfdump_test.c   |  42 +
 .../selftests/bpf/progs/bpfdump_bpf_map.c     |  33 +
 .../selftests/bpf/progs/bpfdump_ipv6_route.c  |  71 ++
 .../selftests/bpf/progs/bpfdump_netlink.c     |  80 ++
 .../selftests/bpf/progs/bpfdump_task.c        |  29 +
 .../selftests/bpf/progs/bpfdump_task_file.c   |  30 +
 .../selftests/bpf/progs/bpfdump_test_kern.c   |  31 +
 32 files changed, 2343 insertions(+), 34 deletions(-)
 create mode 100644 kernel/bpf/dump.c
 create mode 100644 kernel/bpf/dump_task.c
 create mode 100644 tools/bpf/bpftool/dumper.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpfdump_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_bpf_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_ipv6_route.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_netlink.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_task.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_task_file.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_test_kern.c

-- 
2.24.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next v2 01/17] net: refactor net assignment for seq_net_private structure
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 02/17] bpf: create /sys/kernel/bpfdump mount file system Yonghong Song
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Refactor the assignment of "net" in the seq_net_private structure
in proc_net.c into a helper function. The helper will later
be used by bpfdump.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 fs/proc/proc_net.c           | 5 ++---
 include/linux/seq_file_net.h | 8 ++++++++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index 4888c5224442..aee07c19cf8b 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -75,9 +75,8 @@ static int seq_open_net(struct inode *inode, struct file *file)
 		put_net(net);
 		return -ENOMEM;
 	}
-#ifdef CONFIG_NET_NS
-	p->net = net;
-#endif
+
+	set_seq_net_private(p, net);
 	return 0;
 }
 
diff --git a/include/linux/seq_file_net.h b/include/linux/seq_file_net.h
index 0fdbe1ddd8d1..0ec4a18b9aca 100644
--- a/include/linux/seq_file_net.h
+++ b/include/linux/seq_file_net.h
@@ -35,4 +35,12 @@ static inline struct net *seq_file_single_net(struct seq_file *seq)
 #endif
 }
 
+static inline void set_seq_net_private(struct seq_net_private *p,
+				       struct net *net)
+{
+#ifdef CONFIG_NET_NS
+	p->net = net;
+#endif
+}
+
 #endif
-- 
2.24.1



* [RFC PATCH bpf-next v2 02/17] bpf: create /sys/kernel/bpfdump mount file system
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 01/17] net: refactor net assignment for seq_net_private structure Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 03/17] bpf: provide a way for targets to register themselves Yonghong Song
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch creates a mount point "bpfdump" under
/sys/kernel. The file system is single-instance,
i.e., all mounts will be identical.

The magic number I picked for the new file system
is 0x64756d70, "dump" in ASCII.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/uapi/linux/magic.h |  1 +
 kernel/bpf/Makefile        |  1 +
 kernel/bpf/dump.c          | 79 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)
 create mode 100644 kernel/bpf/dump.c

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index d78064007b17..4ce3d8882315 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -88,6 +88,7 @@
 #define BPF_FS_MAGIC		0xcafe4a11
 #define AAFS_MAGIC		0x5a3c69f0
 #define ZONEFS_MAGIC		0x5a4f4653
+#define DUMPFS_MAGIC		0x64756d70
 
 /* Since UDF 2.01 is ISO 13346 based... */
 #define UDF_SUPER_MAGIC		0x15013346
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index f2d7be596966..4a1376ab2bea 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
 endif
 ifeq ($(CONFIG_SYSFS),y)
 obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
+obj-$(CONFIG_BPF_SYSCALL) += dump.o
 endif
 ifeq ($(CONFIG_BPF_JIT),y)
 obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
diff --git a/kernel/bpf/dump.c b/kernel/bpf/dump.c
new file mode 100644
index 000000000000..e0c33486e0e7
--- /dev/null
+++ b/kernel/bpf/dump.c
@@ -0,0 +1,79 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 Facebook */
+
+#include <linux/init.h>
+#include <linux/magic.h>
+#include <linux/mount.h>
+#include <linux/anon_inodes.h>
+#include <linux/namei.h>
+#include <linux/fs.h>
+#include <linux/fs_context.h>
+#include <linux/fs_parser.h>
+#include <linux/filter.h>
+#include <linux/bpf.h>
+
+static void bpfdump_free_inode(struct inode *inode)
+{
+	kfree(inode->i_private);
+	free_inode_nonrcu(inode);
+}
+
+static const struct super_operations bpfdump_super_operations = {
+	.statfs		= simple_statfs,
+	.free_inode	= bpfdump_free_inode,
+};
+
+static int bpfdump_fill_super(struct super_block *sb, struct fs_context *fc)
+{
+	static const struct tree_descr files[] = { { "" } };
+	int err;
+
+	err = simple_fill_super(sb, DUMPFS_MAGIC, files);
+	if (err)
+		return err;
+
+	sb->s_op = &bpfdump_super_operations;
+	return 0;
+}
+
+static int bpfdump_get_tree(struct fs_context *fc)
+{
+	return get_tree_single(fc, bpfdump_fill_super);
+}
+
+static const struct fs_context_operations bpfdump_context_ops = {
+	.get_tree	= bpfdump_get_tree,
+};
+
+static int bpfdump_init_fs_context(struct fs_context *fc)
+{
+	fc->ops = &bpfdump_context_ops;
+	return 0;
+}
+
+static struct file_system_type fs_type = {
+	.owner			= THIS_MODULE,
+	.name			= "bpfdump",
+	.init_fs_context	= bpfdump_init_fs_context,
+	.kill_sb		= kill_litter_super,
+};
+
+static int __init bpfdump_init(void)
+{
+	int ret;
+
+	ret = sysfs_create_mount_point(kernel_kobj, "bpfdump");
+	if (ret)
+		return ret;
+
+	ret = register_filesystem(&fs_type);
+	if (ret)
+		goto remove_mount;
+
+	return 0;
+
+remove_mount:
+	sysfs_remove_mount_point(kernel_kobj, "bpfdump");
+	return ret;
+}
+core_initcall(bpfdump_init);
-- 
2.24.1



* [RFC PATCH bpf-next v2 03/17] bpf: provide a way for targets to register themselves
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 01/17] net: refactor net assignment for seq_net_private structure Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 02/17] bpf: create /sys/kernel/bpfdump mount file system Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 04/17] bpf: allow loading of a dumper program Yonghong Song
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Here, a target refers to a particular data structure
inside the kernel that we want to dump. For example, it
can be all task_structs in the current pid namespace,
or it could be all open files for all task_structs
in the current pid namespace.

Each target is identified with the following information:
   target_rel_path    <=== relative path to /sys/kernel/bpfdump
   target_proto       <=== kernel func proto used by kernel verifier
   prog_ctx_type_name <=== prog ctx type used by bpf programs
   seq_ops            <=== seq_ops for seq_file operations
   seq_priv_size      <=== seq_file private data size
   target_feature     <=== target specific feature which needs
                           handling outside seq_ops.

The target relative path is a relative directory to /sys/kernel/bpfdump/.
For example, it could be:
   task                  <=== all tasks
   task/file             <=== all open files under all tasks
   ipv6_route            <=== all ipv6_routes
   tcp6/sk_local_storage <=== all tcp6 socket local storages
   foo/bar/tar           <=== all tar's in bar in foo

The "target_feature" is mostly used for reusing existing seq_ops.
For example, for /proc/net/<> stats, the "net" namespace is often
stored in the file's private data. The target_feature enables a bpf
based dumper to set "net" properly for itself before calling the
shared seq_ops.

bpf_dump_reg_target() is implemented so targets
can register themselves. Currently, modules are not
supported, so there is no bpf_dump_unreg_target().
The main reason is that BTF is not available for modules
yet.

Since a target might call bpf_dump_reg_target() before
the bpfdump mount point is created, __bpfdump_init()
may be called from bpf_dump_reg_target() as well.

The file-based dumpers will be regular files under
the specific target directory. For example,
   task/my1      <=== dumper "my1" iterates through all tasks
   task/file/my2 <=== dumper "my2" iterates through all open files
                      under all tasks

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h |  12 +++
 kernel/bpf/dump.c   | 198 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 208 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index fd2b2322412d..84c7eb40d7bc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -31,6 +31,7 @@ struct seq_file;
 struct btf;
 struct btf_type;
 struct exception_table_entry;
+struct seq_operations;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1109,6 +1110,17 @@ struct bpf_link *bpf_link_get_from_fd(u32 ufd);
 int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
 int bpf_obj_get_user(const char __user *pathname, int flags);
 
+struct bpf_dump_reg {
+	const char *target;
+	const char *target_proto;
+	const char *prog_ctx_type_name;
+	const struct seq_operations *seq_ops;
+	u32 seq_priv_size;
+	u32 target_feature;
+};
+
+int bpf_dump_reg_target(struct bpf_dump_reg *reg_info);
+
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
diff --git a/kernel/bpf/dump.c b/kernel/bpf/dump.c
index e0c33486e0e7..e8b46f9e0ee0 100644
--- a/kernel/bpf/dump.c
+++ b/kernel/bpf/dump.c
@@ -12,6 +12,172 @@
 #include <linux/filter.h>
 #include <linux/bpf.h>
 
+struct bpfdump_target_info {
+	struct list_head list;
+	const char *target;
+	const char *target_proto;
+	struct dentry *dir_dentry;
+	const struct seq_operations *seq_ops;
+	u32 seq_priv_size;
+	u32 target_feature;
+};
+
+struct bpfdump_targets {
+	struct list_head dumpers;
+	struct mutex dumper_mutex;
+};
+
+/* registered dump targets */
+static struct bpfdump_targets dump_targets;
+
+static struct dentry *bpfdump_dentry;
+
+static struct dentry *bpfdump_add_dir(const char *name, struct dentry *parent,
+				      const struct inode_operations *i_ops,
+				      void *data);
+static int __bpfdump_init(void);
+
+/* 0: not inited, > 0: successful, < 0: previous init failed */
+static int bpfdump_inited = 0;
+
+static int dumper_unlink(struct inode *dir, struct dentry *dentry)
+{
+	kfree(d_inode(dentry)->i_private);
+	return simple_unlink(dir, dentry);
+}
+
+static const struct inode_operations bpfdump_dir_iops = {
+	.lookup		= simple_lookup,
+	.unlink		= dumper_unlink,
+};
+
+int bpf_dump_reg_target(struct bpf_dump_reg *reg_info)
+{
+	struct bpfdump_target_info *tinfo, *ptinfo;
+	struct dentry *dentry, *parent;
+	const char *target, *lastslash;
+	bool existed = false;
+	int err, parent_len;
+
+	if (!bpfdump_dentry) {
+		err = __bpfdump_init();
+		if (err)
+			return err;
+	}
+
+	tinfo = kmalloc(sizeof(*tinfo), GFP_KERNEL);
+	if (!tinfo)
+		return -ENOMEM;
+
+	target = reg_info->target;
+	tinfo->target = target;
+	tinfo->target_proto = reg_info->target_proto;
+	tinfo->seq_ops = reg_info->seq_ops;
+	tinfo->seq_priv_size = reg_info->seq_priv_size;
+	tinfo->target_feature = reg_info->target_feature;
+	INIT_LIST_HEAD(&tinfo->list);
+
+	lastslash = strrchr(target, '/');
+	parent = bpfdump_dentry;
+	if (lastslash) {
+		parent_len = (unsigned long)lastslash - (unsigned long)target;
+
+		mutex_lock(&dump_targets.dumper_mutex);
+		list_for_each_entry(ptinfo, &dump_targets.dumpers, list) {
+			if (strlen(ptinfo->target) == parent_len &&
+			    strncmp(ptinfo->target, target, parent_len) == 0) {
+				existed = true;
+				break;
+			}
+		}
+		mutex_unlock(&dump_targets.dumper_mutex);
+		if (existed == false) {
+			err = -ENOENT;
+			goto free_tinfo;
+		}
+
+		parent = ptinfo->dir_dentry;
+		target = lastslash + 1;
+	}
+	dentry = bpfdump_add_dir(target, parent, &bpfdump_dir_iops, tinfo);
+	if (IS_ERR(dentry)) {
+		err = PTR_ERR(dentry);
+		goto free_tinfo;
+	}
+
+	tinfo->dir_dentry = dentry;
+
+	mutex_lock(&dump_targets.dumper_mutex);
+	list_add(&tinfo->list, &dump_targets.dumpers);
+	mutex_unlock(&dump_targets.dumper_mutex);
+	return 0;
+
+free_tinfo:
+	kfree(tinfo);
+	return err;
+}
+
+static struct dentry *
+bpfdump_create_dentry(const char *name, umode_t mode, struct dentry *parent,
+		      void *data, const struct inode_operations *i_ops,
+		      const struct file_operations *f_ops)
+{
+	struct inode *dir, *inode;
+	struct dentry *dentry;
+	int err;
+
+	dir = d_inode(parent);
+
+	inode_lock(dir);
+	dentry = lookup_one_len(name, parent, strlen(name));
+	if (IS_ERR(dentry))
+		goto unlock;
+
+	if (d_really_is_positive(dentry)) {
+		err = -EEXIST;
+		goto dentry_put;
+	}
+
+	inode = new_inode(dir->i_sb);
+	if (!inode) {
+		err = -ENOMEM;
+		goto dentry_put;
+	}
+
+	inode->i_ino = get_next_ino();
+	inode->i_mode = mode;
+	inode->i_atime = inode->i_mtime = inode->i_ctime = current_time(inode);
+	inode->i_private = data;
+
+	if (S_ISDIR(mode)) {
+		inode->i_op = i_ops;
+		inode->i_fop = f_ops;
+		inc_nlink(inode);
+		inc_nlink(dir);
+	} else {
+		inode->i_fop = f_ops;
+	}
+
+	d_instantiate(dentry, inode);
+	inode_unlock(dir);
+	return dentry;
+
+dentry_put:
+	dput(dentry);
+	dentry = ERR_PTR(err);
+unlock:
+	inode_unlock(dir);
+	return dentry;
+}
+
+static struct dentry *
+bpfdump_add_dir(const char *name, struct dentry *parent,
+		const struct inode_operations *i_ops, void *data)
+{
+	return bpfdump_create_dentry(name, S_IFDIR | 0755, parent,
+				     data, i_ops, &simple_dir_operations);
+}
+
 static void bpfdump_free_inode(struct inode *inode)
 {
 	kfree(inode->i_private);
@@ -58,22 +224,50 @@ static struct file_system_type fs_type = {
 	.kill_sb		= kill_litter_super,
 };
 
-static int __init bpfdump_init(void)
+static int __bpfdump_init(void)
 {
+	struct vfsmount *mount = NULL;
+	int mount_count = 0;
 	int ret;
 
+	if (bpfdump_inited)
+		return bpfdump_inited < 0 ? bpfdump_inited : 0;
+
 	ret = sysfs_create_mount_point(kernel_kobj, "bpfdump");
 	if (ret)
-		return ret;
+		goto done;
 
 	ret = register_filesystem(&fs_type);
 	if (ret)
 		goto remove_mount;
 
+	/* get a reference to mount so we can populate targets
+	 * at init time.
+	 */
+	ret = simple_pin_fs(&fs_type, &mount, &mount_count);
+	if (ret)
+		goto remove_mount;
+
+	bpfdump_dentry = mount->mnt_root;
+
+	INIT_LIST_HEAD(&dump_targets.dumpers);
+	mutex_init(&dump_targets.dumper_mutex);
+
+	bpfdump_inited = 1;
 	return 0;
 
 remove_mount:
 	sysfs_remove_mount_point(kernel_kobj, "bpfdump");
+done:
+	bpfdump_inited = ret;
 	return ret;
 }
+
+static int __init bpfdump_init(void)
+{
+	if (bpfdump_dentry)
+		return 0;
+
+	return __bpfdump_init();
+}
 core_initcall(bpfdump_init);
-- 
2.24.1



* [RFC PATCH bpf-next v2 04/17] bpf: allow loading of a dumper program
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (2 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 03/17] bpf: provide a way for targets to register themselves Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 05/17] bpf: create file or anonymous dumpers Yonghong Song
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

A dumper bpf program is a tracing program with attach type
BPF_TRACE_DUMP. During bpf program load, the load attribute
   attach_prog_fd
carries the target directory fd. The program will be
verified against btf_id of the target_proto.

If the program is loaded successfully, the dump target,
represented as a relative path to /sys/kernel/bpfdump,
will be remembered in prog->aux->dump_target, which will
be used later to create dumpers.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h            |  2 ++
 include/uapi/linux/bpf.h       |  6 ++++-
 kernel/bpf/dump.c              | 42 ++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  8 ++++++-
 kernel/bpf/verifier.c          | 15 ++++++++++++
 tools/include/uapi/linux/bpf.h |  6 ++++-
 6 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 84c7eb40d7bc..068552c2d2cf 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -674,6 +674,7 @@ struct bpf_prog_aux {
 	struct bpf_map **used_maps;
 	struct bpf_prog *prog;
 	struct user_struct *user;
+	const char *dump_target;
 	u64 load_time; /* ns since boottime */
 	struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
 	char name[BPF_OBJ_NAME_LEN];
@@ -1120,6 +1121,7 @@ struct bpf_dump_reg {
 };
 
 int bpf_dump_reg_target(struct bpf_dump_reg *reg_info);
+int bpf_dump_set_target_info(u32 target_fd, struct bpf_prog *prog);
 
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2e29a671d67e..f92b919c723e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -215,6 +215,7 @@ enum bpf_attach_type {
 	BPF_TRACE_FEXIT,
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
+	BPF_TRACE_DUMP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -476,7 +477,10 @@ union bpf_attr {
 		__aligned_u64	line_info;	/* line info */
 		__u32		line_info_cnt;	/* number of bpf_line_info records */
 		__u32		attach_btf_id;	/* in-kernel BTF type id to attach to */
-		__u32		attach_prog_fd; /* 0 to attach to vmlinux */
+		union {
+			__u32		attach_prog_fd; /* 0 to attach to vmlinux */
+			__u32		attach_target_fd;
+		};
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/dump.c b/kernel/bpf/dump.c
index e8b46f9e0ee0..8c7a89800312 100644
--- a/kernel/bpf/dump.c
+++ b/kernel/bpf/dump.c
@@ -11,6 +11,9 @@
 #include <linux/fs_parser.h>
 #include <linux/filter.h>
 #include <linux/bpf.h>
+#include <linux/btf.h>
+
+extern struct btf *btf_vmlinux;
 
 struct bpfdump_target_info {
 	struct list_head list;
@@ -51,6 +54,45 @@ static const struct inode_operations bpfdump_dir_iops = {
 	.unlink		= dumper_unlink,
 };
 
+int bpf_dump_set_target_info(u32 target_fd, struct bpf_prog *prog)
+{
+	struct bpfdump_target_info *tinfo;
+	const char *target_proto;
+	struct file *target_file;
+	struct fd tfd;
+	int err = 0, btf_id;
+
+	if (!btf_vmlinux)
+		return -EINVAL;
+
+	tfd = fdget(target_fd);
+	target_file = tfd.file;
+	if (!target_file)
+		return -EBADF;
+
+	if (target_file->f_inode->i_op != &bpfdump_dir_iops) {
+		err = -EINVAL;
+		goto done;
+	}
+
+	tinfo = target_file->f_inode->i_private;
+	target_proto = tinfo->target_proto;
+	btf_id = btf_find_by_name_kind(btf_vmlinux, target_proto,
+				       BTF_KIND_FUNC);
+
+	if (btf_id < 0) {
+		err = btf_id;
+		goto done;
+	}
+
+	prog->aux->dump_target = tinfo->target;
+	prog->aux->attach_btf_id = btf_id;
+
+done:
+	fdput(tfd);
+	return err;
+}
+
 int bpf_dump_reg_target(struct bpf_dump_reg *reg_info)
 {
 	struct bpfdump_target_info *tinfo, *ptinfo;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64783da34202..1ce2f74f8efc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2060,7 +2060,12 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 
 	prog->expected_attach_type = attr->expected_attach_type;
 	prog->aux->attach_btf_id = attr->attach_btf_id;
-	if (attr->attach_prog_fd) {
+	if (type == BPF_PROG_TYPE_TRACING &&
+	    attr->expected_attach_type == BPF_TRACE_DUMP) {
+		err = bpf_dump_set_target_info(attr->attach_target_fd, prog);
+		if (err)
+			goto free_prog_nouncharge;
+	} else if (attr->attach_prog_fd) {
 		struct bpf_prog *tgt_prog;
 
 		tgt_prog = bpf_prog_get(attr->attach_prog_fd);
@@ -2145,6 +2150,7 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	err = bpf_prog_new_fd(prog);
 	if (err < 0)
 		bpf_prog_put(prog);
+
 	return err;
 
 free_used_maps:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 04c6630cc18f..f531cee24fc5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10426,6 +10426,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	struct bpf_prog *tgt_prog = prog->aux->linked_prog;
 	u32 btf_id = prog->aux->attach_btf_id;
 	const char prefix[] = "btf_trace_";
+	struct btf_func_model fmodel;
 	int ret = 0, subprog = -1, i;
 	struct bpf_trampoline *tr;
 	const struct btf_type *t;
@@ -10566,6 +10567,20 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 		prog->aux->attach_func_proto = t;
 		prog->aux->attach_btf_trace = true;
 		return 0;
+	case BPF_TRACE_DUMP:
+		if (!btf_type_is_func(t)) {
+			verbose(env, "attach_btf_id %u is not a function\n",
+				btf_id);
+			return -EINVAL;
+		}
+		t = btf_type_by_id(btf, t->type);
+		if (!btf_type_is_func_proto(t))
+			return -EINVAL;
+		prog->aux->attach_func_name = tname;
+		prog->aux->attach_func_proto = t;
+		ret = btf_distill_func_proto(&env->log, btf, t,
+					     tname, &fmodel);
+		return ret;
 	default:
 		if (!prog_extension)
 			return -EINVAL;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2e29a671d67e..f92b919c723e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -215,6 +215,7 @@ enum bpf_attach_type {
 	BPF_TRACE_FEXIT,
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
+	BPF_TRACE_DUMP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -476,7 +477,10 @@ union bpf_attr {
 		__aligned_u64	line_info;	/* line info */
 		__u32		line_info_cnt;	/* number of bpf_line_info records */
 		__u32		attach_btf_id;	/* in-kernel BTF type id to attach to */
-		__u32		attach_prog_fd; /* 0 to attach to vmlinux */
+		union {
+			__u32		attach_prog_fd; /* 0 to attach to vmlinux */
+			__u32		attach_target_fd;
+		};
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
-- 
2.24.1



* [RFC PATCH bpf-next v2 05/17] bpf: create file or anonymous dumpers
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (3 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 04/17] bpf: allow loading of a dumper program Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 06/17] bpf: add PTR_TO_BTF_ID_OR_NULL support Yonghong Song
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Given a loaded dumper bpf program, which already
knows which target it should bind to, there are
two ways to create a dumper:
  - a file based dumper under the hierarchy of
    /sys/kernel/bpfdump/, whose output users can
    "cat" to print out.
  - an anonymous dumper whose dumping output a
    user application can "read".

For a file based dumper, the BPF_OBJ_PIN syscall
interface is used. For an anonymous dumper, the
BPF_RAW_TRACEPOINT_OPEN syscall interface is used.

To make it easy for a target's seq_ops->show() to
get the bpf program, dumper creation increases the
target-provided seq_file private data size so that
the bpf program pointer is also stored in the
seq_file private data.

A session_id, which is unique for each bpfdump
open file, is available to the bpf program. This
differentiates sessions when the same program is
used by multiple open files.

A seq_num, which counts how many times
bpf_dump_get_prog() has been called, is also
available to the bpf program. Such information
can be used to, e.g., print a banner before
printing out the actual data.

Note that the seq_num does not represent the
number of unique kernel objects the bpf program
has seen, but it should be a good approximation.

A target feature BPF_DUMP_SEQ_NET_PRIVATE,
specifically useful for net based dumpers, is
implemented. It sets the net namespace to the
current process's net namespace. This avoids
changing existing net seq_ops in order to
retrieve the net namespace from the seq_file
pointer.

For open dumper files, anonymous or not, fdinfo
will show the target and prog_id associated with
that file descriptor. For the dumper file itself,
a kernel interface to retrieve the prog_ctx_type
and prog_id will be provided in one of the later
patches.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h  |   7 +
 kernel/bpf/dump.c    | 407 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/syscall.c |  20 ++-
 3 files changed, 429 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 068552c2d2cf..3cc16991c287 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1111,6 +1111,8 @@ struct bpf_link *bpf_link_get_from_fd(u32 ufd);
 int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
 int bpf_obj_get_user(const char __user *pathname, int flags);
 
+#define BPF_DUMP_SEQ_NET_PRIVATE	BIT(0)
+
 struct bpf_dump_reg {
 	const char *target;
 	const char *target_proto;
@@ -1122,6 +1124,11 @@ struct bpf_dump_reg {
 
 int bpf_dump_reg_target(struct bpf_dump_reg *reg_info);
 int bpf_dump_set_target_info(u32 target_fd, struct bpf_prog *prog);
+int bpf_fd_dump_create(u32 prog_fd, const char __user *dumper_name,
+		       bool *is_dump_prog);
+int bpf_prog_dump_create(struct bpf_prog *prog);
+struct bpf_prog *bpf_dump_get_prog(struct seq_file *seq, u32 priv_data_size,
+				   u64 *session_id, u64 *seq_num, bool is_last);
 
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
diff --git a/kernel/bpf/dump.c b/kernel/bpf/dump.c
index 8c7a89800312..f39b82430977 100644
--- a/kernel/bpf/dump.c
+++ b/kernel/bpf/dump.c
@@ -30,11 +30,51 @@ struct bpfdump_targets {
 	struct mutex dumper_mutex;
 };
 
+struct dumper_inode_info {
+	struct bpfdump_target_info *tinfo;
+	struct bpf_prog *prog;
+};
+
+struct dumper_info {
+	struct list_head list;
+	/* file to identify an anon dumper,
+	 * dentry to identify a file dumper.
+	 */
+	union {
+		struct file *file;
+		struct dentry *dentry;
+	};
+	struct bpfdump_target_info *tinfo;
+	struct bpf_prog *prog;
+};
+
+struct dumpers {
+	struct list_head dumpers;
+	struct mutex dumper_mutex;
+};
+
+struct extra_priv_data {
+	struct bpf_prog *prog;
+	u64 session_id;
+	u64 seq_num;
+	bool has_last;
+};
+
 /* registered dump targets */
 static struct bpfdump_targets dump_targets;
 
 static struct dentry *bpfdump_dentry;
 
+static struct dumpers anon_dumpers, file_dumpers;
+
+static const struct file_operations bpf_dumper_ops;
+static const struct inode_operations bpfdump_dir_iops;
+
+static atomic64_t session_id;
+
+static struct dentry *bpfdump_add_file(const char *name, struct dentry *parent,
+				       const struct file_operations *f_ops,
+				       void *data);
 static struct dentry *bpfdump_add_dir(const char *name, struct dentry *parent,
 				      const struct inode_operations *i_ops,
 				      void *data);
@@ -43,12 +83,129 @@ static int __bpfdump_init(void);
 /* 0: not inited, > 0: successful, < 0: previous init failed */
 static int bpfdump_inited = 0;
 
+static u32 get_total_priv_dsize(u32 old_size)
+{
+	return roundup(old_size, 8) + sizeof(struct extra_priv_data);
+}
+
+static void *get_extra_priv_dptr(void *old_ptr, u32 old_size)
+{
+	return old_ptr + roundup(old_size, 8);
+}
+
+#ifdef CONFIG_PROC_FS
+static void dumper_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	struct dumper_inode_info *i_info = filp->f_inode->i_private;
+
+	seq_printf(m, "target:\t%s\n"
+		      "prog_id:\t%u\n",
+		   i_info->tinfo->target,
+		   i_info->prog->aux->id);
+}
+
+static void anon_dumper_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	struct dumper_info *dinfo;
+
+	mutex_lock(&anon_dumpers.dumper_mutex);
+	list_for_each_entry(dinfo, &anon_dumpers.dumpers, list) {
+		if (dinfo->file == filp) {
+			seq_printf(m, "target:\t%s\n"
+				      "prog_id:\t%u\n",
+				   dinfo->tinfo->target,
+				   dinfo->prog->aux->id);
+			break;
+		}
+	}
+	mutex_unlock(&anon_dumpers.dumper_mutex);
+}
+
+#endif
+
+static void process_target_feature(u32 feature, void *priv_data)
+{
+	/* use the current net namespace */
+	if (feature & BPF_DUMP_SEQ_NET_PRIVATE)
+		set_seq_net_private((struct seq_net_private *)priv_data,
+				    current->nsproxy->net_ns);
+}
+
+static int dumper_open(struct inode *inode, struct file *file)
+{
+	struct dumper_inode_info *i_info = inode->i_private;
+	struct extra_priv_data *extra_data;
+	u32 old_priv_size, total_priv_size;
+	void *priv_data;
+
+	old_priv_size = i_info->tinfo->seq_priv_size;
+	total_priv_size = get_total_priv_dsize(old_priv_size);
+	priv_data = __seq_open_private(file, i_info->tinfo->seq_ops,
+				       total_priv_size);
+	if (!priv_data)
+		return -ENOMEM;
+
+	process_target_feature(i_info->tinfo->target_feature, priv_data);
+
+	extra_data = get_extra_priv_dptr(priv_data, old_priv_size);
+	extra_data->prog = i_info->prog;
+	extra_data->session_id = atomic64_add_return(1, &session_id);
+	extra_data->seq_num = 0;
+	extra_data->has_last = false;
+
+	return 0;
+}
+
+static int anon_dumper_release(struct inode *inode, struct file *file)
+{
+	struct dumper_info *dinfo;
+
+	/* release the bpf program */
+	mutex_lock(&anon_dumpers.dumper_mutex);
+	list_for_each_entry(dinfo, &anon_dumpers.dumpers, list) {
+		if (dinfo->file == file) {
+			bpf_prog_put(dinfo->prog);
+			list_del(&dinfo->list);
+			break;
+		}
+	}
+	mutex_unlock(&anon_dumpers.dumper_mutex);
+
+	return seq_release_private(inode, file);
+}
+
+static int dumper_release(struct inode *inode, struct file *file)
+{
+	return seq_release_private(inode, file);
+}
+
 static int dumper_unlink(struct inode *dir, struct dentry *dentry)
 {
-	kfree(d_inode(dentry)->i_private);
+	struct dumper_inode_info *i_info = d_inode(dentry)->i_private;
+
+	bpf_prog_put(i_info->prog);
+	kfree(i_info);
+
 	return simple_unlink(dir, dentry);
 }
 
+static const struct file_operations bpf_dumper_ops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo	= dumper_show_fdinfo,
+#endif
+	.open		= dumper_open,
+	.read		= seq_read,
+	.release	= dumper_release,
+};
+
+static const struct file_operations anon_bpf_dumper_ops = {
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo	= anon_dumper_show_fdinfo,
+#endif
+	.read		= seq_read,
+	.release	= anon_dumper_release,
+};
+
 static const struct inode_operations bpfdump_dir_iops = {
 	.lookup		= simple_lookup,
 	.unlink		= dumper_unlink,
@@ -93,6 +250,242 @@ int bpf_dump_set_target_info(u32 target_fd, struct bpf_prog *prog)
 	return err;
 }
 
+static int create_anon_dumper(struct bpfdump_target_info *tinfo,
+			      struct bpf_prog *prog)
+{
+	struct extra_priv_data *extra_data;
+	u32 old_priv_size, total_priv_size;
+	struct dumper_info *dinfo;
+	struct file *file;
+	int err, anon_fd;
+	void *priv_data;
+	struct fd fd;
+
+	anon_fd = anon_inode_getfd("bpf-dumper", &anon_bpf_dumper_ops,
+				   NULL, O_CLOEXEC);
+	if (anon_fd < 0)
+		return anon_fd;
+
+	/* setup seq_file for anon dumper */
+	fd = fdget(anon_fd);
+	file = fd.file;
+
+	dinfo = kmalloc(sizeof(*dinfo), GFP_KERNEL);
+	if (!dinfo) {
+		err = -ENOMEM;
+		goto free_fd;
+	}
+
+	old_priv_size = tinfo->seq_priv_size;
+	total_priv_size = get_total_priv_dsize(old_priv_size);
+
+	priv_data = __seq_open_private(file, tinfo->seq_ops,
+				       total_priv_size);
+	if (!priv_data) {
+		err = -ENOMEM;
+		goto free_dinfo;
+	}
+
+	dinfo->file = file;
+	dinfo->tinfo = tinfo;
+	dinfo->prog = prog;
+
+	mutex_lock(&anon_dumpers.dumper_mutex);
+	list_add(&dinfo->list, &anon_dumpers.dumpers);
+	mutex_unlock(&anon_dumpers.dumper_mutex);
+
+	process_target_feature(tinfo->target_feature, priv_data);
+
+	extra_data = get_extra_priv_dptr(priv_data, old_priv_size);
+	extra_data->session_id = atomic64_add_return(1, &session_id);
+	extra_data->prog = prog;
+	extra_data->seq_num = 0;
+	extra_data->has_last = false;
+
+	fdput(fd);
+	return anon_fd;
+
+free_dinfo:
+	kfree(dinfo);
+free_fd:
+	fdput(fd);
+	return err;
+}
+
+static int check_pathname(struct bpfdump_target_info *tinfo,
+			  const char __user *pathname)
+{
+	struct dentry *dentry;
+	struct inode *dir;
+	struct path path;
+	int err = 0;
+
+	dentry = user_path_create(AT_FDCWD, pathname, &path, 0);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	dir = dentry->d_parent->d_inode;
+	if (dir->i_op != &bpfdump_dir_iops || dir->i_private != tinfo)
+		err = -EINVAL;
+
+	done_path_create(&path, dentry);
+	return err;
+
+}
+
+static int create_dumper(struct bpfdump_target_info *tinfo,
+			 const char __user *pathname,
+			 struct bpf_prog *prog)
+{
+	struct dumper_inode_info *i_info;
+	struct dumper_info *dinfo;
+	const char *pname, *dname;
+	struct dentry *dentry;
+	int err = 0;
+
+	i_info = kmalloc(sizeof(*i_info), GFP_KERNEL);
+	if (!i_info)
+		return -ENOMEM;
+
+	i_info->tinfo = tinfo;
+	i_info->prog = prog;
+
+	dinfo = kmalloc(sizeof(*dinfo), GFP_KERNEL);
+	if (!dinfo) {
+		err = -ENOMEM;
+		goto free_i_info;
+	}
+
+	err = check_pathname(tinfo, pathname);
+	if (err)
+		goto free_dinfo;
+
+	pname = strndup_user(pathname, PATH_MAX);
+	if (!pname) {
+		err = -ENOMEM;
+		goto free_dinfo;
+	}
+
+	dname = strrchr(pname, '/');
+	if (dname)
+		dname += 1;
+	else
+		dname = pname;
+
+	dentry = bpfdump_add_file(dname, tinfo->dir_dentry,
+				  &bpf_dumper_ops, i_info);
+	kfree(pname);
+	if (IS_ERR(dentry)) {
+		err = PTR_ERR(dentry);
+		goto free_dinfo;
+	}
+
+	dinfo->dentry = dentry;
+	dinfo->tinfo = tinfo;
+	dinfo->prog = prog;
+
+	mutex_lock(&file_dumpers.dumper_mutex);
+	list_add(&dinfo->list, &file_dumpers.dumpers);
+	mutex_unlock(&file_dumpers.dumper_mutex);
+
+	return 0;
+
+free_dinfo:
+	kfree(dinfo);
+free_i_info:
+	kfree(i_info);
+	return err;
+}
+
+static struct bpfdump_target_info *find_target_info(const char *target)
+{
+	struct bpfdump_target_info *info;
+
+	mutex_lock(&dump_targets.dumper_mutex);
+	list_for_each_entry(info, &dump_targets.dumpers, list) {
+		if (strcmp(info->target, target) == 0) {
+			mutex_unlock(&dump_targets.dumper_mutex);
+			return info;
+		}
+	}
+	mutex_unlock(&dump_targets.dumper_mutex);
+
+	return NULL;
+}
+
+static int bpf_dump_create(struct bpf_prog *prog, const char *target,
+			   const char __user *pathname)
+{
+	struct bpfdump_target_info *tinfo;
+
+	tinfo = find_target_info(target);
+	if (!tinfo)
+		return -EINVAL;
+
+	if (pathname)
+		return create_dumper(tinfo, pathname, prog);
+	else
+		return create_anon_dumper(tinfo, prog);
+}
+
+int bpf_fd_dump_create(u32 prog_fd, const char __user *pathname, bool *is_dump_prog)
+{
+	struct bpf_prog *prog;
+	const char *target;
+	int err = 0;
+
+	if (is_dump_prog)
+		*is_dump_prog = false;
+
+	prog = bpf_prog_get(prog_fd);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	target = prog->aux->dump_target;
+	if (!target) {
+		err = -EINVAL;
+		goto free_prog;
+	}
+
+	if (is_dump_prog)
+		*is_dump_prog = true;
+
+	err = bpf_dump_create(prog, target, pathname);
+	if (err < 0)
+		goto free_prog;
+	goto done;
+
+free_prog:
+	bpf_prog_put(prog);
+done:
+	return err;
+}
+
+int bpf_prog_dump_create(struct bpf_prog *prog)
+{
+	return bpf_dump_create(prog, prog->aux->dump_target, (void __user *)NULL);
+}
+
+struct bpf_prog *bpf_dump_get_prog(struct seq_file *seq, u32 priv_data_size,
+	u64 *session_id, u64 *seq_num, bool is_last)
+{
+	struct extra_priv_data *extra_data;
+
+	if (seq->file->f_op != &bpf_dumper_ops &&
+	    seq->file->f_op != &anon_bpf_dumper_ops)
+		return NULL;
+
+	extra_data = get_extra_priv_dptr(seq->private, priv_data_size);
+	if (extra_data->has_last)
+		return NULL;
+
+	*session_id = extra_data->session_id;
+	*seq_num = extra_data->seq_num++;
+	extra_data->has_last = is_last;
+
+	return extra_data->prog;
+}
+
 int bpf_dump_reg_target(struct bpf_dump_reg *reg_info)
 {
 	struct bpfdump_target_info *tinfo, *ptinfo;
@@ -212,6 +605,14 @@ bpfdump_create_dentry(const char *name, umode_t mode, struct dentry *parent,
 	return dentry;
 }
 
+static struct dentry *
+bpfdump_add_file(const char *name, struct dentry *parent,
+		 const struct file_operations *f_ops, void *data)
+{
+	return bpfdump_create_dentry(name, S_IFREG | 0444, parent,
+				     data, NULL, f_ops);
+}
+
 static struct dentry *
 bpfdump_add_dir(const char *name, struct dentry *parent,
 		const struct inode_operations *i_ops, void *data)
@@ -294,6 +695,10 @@ static int __bpfdump_init(void)
 
 	INIT_LIST_HEAD(&dump_targets.dumpers);
 	mutex_init(&dump_targets.dumper_mutex);
+	INIT_LIST_HEAD(&anon_dumpers.dumpers);
+	mutex_init(&anon_dumpers.dumper_mutex);
+	INIT_LIST_HEAD(&file_dumpers.dumpers);
+	mutex_init(&file_dumpers.dumper_mutex);
 
 	bpfdump_inited = 1;
 	return 0;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1ce2f74f8efc..4a3c9fceebb8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2173,9 +2173,18 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 
 static int bpf_obj_pin(const union bpf_attr *attr)
 {
+	bool is_dump_prog = false;
+	int err;
+
 	if (CHECK_ATTR(BPF_OBJ) || attr->file_flags != 0)
 		return -EINVAL;
 
+	err = bpf_fd_dump_create(attr->bpf_fd,
+				 u64_to_user_ptr(attr->pathname),
+				 &is_dump_prog);
+	if (!err || is_dump_prog)
+		return err;
+
 	return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname));
 }
 
@@ -2490,10 +2499,13 @@ static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
 			err = -EINVAL;
 			goto out_put_prog;
 		}
-		if (prog->type == BPF_PROG_TYPE_TRACING &&
-		    prog->expected_attach_type == BPF_TRACE_RAW_TP) {
-			tp_name = prog->aux->attach_func_name;
-			break;
+		if (prog->type == BPF_PROG_TYPE_TRACING) {
+			if (prog->expected_attach_type == BPF_TRACE_RAW_TP) {
+				tp_name = prog->aux->attach_func_name;
+				break;
+			} else if (prog->expected_attach_type == BPF_TRACE_DUMP) {
+				return bpf_prog_dump_create(prog);
+			}
 		}
 		return bpf_tracing_prog_attach(prog);
 	case BPF_PROG_TYPE_RAW_TRACEPOINT:
-- 
2.24.1



* [RFC PATCH bpf-next v2 06/17] bpf: add PTR_TO_BTF_ID_OR_NULL support
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (4 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 05/17] bpf: create file or anonymous dumpers Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 07/17] bpf: add netlink and ipv6_route targets Yonghong Song
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Add bpf_reg_type PTR_TO_BTF_ID_OR_NULL support.
For a tracing/dump program, the bpf program context
definition, e.g., for the ipv6_route target, looks like
   struct bpfdump__ipv6_route {
     struct bpf_dump_meta *meta;
     struct fib6_info *rt;
   };

The kernel guarantees that meta is not NULL, but
rt may be NULL. A NULL rt indicates that the data
structure traversal is done, so the bpf program
can take proper action.

Add btf_id_or_null_non0_off to the prog->aux structure
to indicate that, for tracing programs, a context access
at a non-zero offset yields PTR_TO_BTF_ID_OR_NULL instead
of PTR_TO_BTF_ID. This bit is set for tracing/dump
programs.
---
 include/linux/bpf.h   |  2 ++
 kernel/bpf/btf.c      |  5 ++++-
 kernel/bpf/dump.c     |  1 +
 kernel/bpf/verifier.c | 18 +++++++++++++-----
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3cc16991c287..1179ca3d0230 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -320,6 +320,7 @@ enum bpf_reg_type {
 	PTR_TO_TP_BUFFER,	 /* reg points to a writable raw tp's buffer */
 	PTR_TO_XDP_SOCK,	 /* reg points to struct xdp_sock */
 	PTR_TO_BTF_ID,		 /* reg points to kernel struct */
+	PTR_TO_BTF_ID_OR_NULL,	 /* reg points to kernel struct or NULL */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -658,6 +659,7 @@ struct bpf_prog_aux {
 	bool offload_requested;
 	bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */
 	bool func_proto_unreliable;
+	bool btf_id_or_null_non0_off;
 	enum bpf_tramp_prog_type trampoline_prog_type;
 	struct bpf_trampoline *trampoline;
 	struct hlist_node tramp_hlist;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index d65c6912bdaf..2c098e6b1acc 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3788,7 +3788,10 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		return true;
 
 	/* this is a pointer to another type */
-	info->reg_type = PTR_TO_BTF_ID;
+	if (off != 0 && prog->aux->btf_id_or_null_non0_off)
+		info->reg_type = PTR_TO_BTF_ID_OR_NULL;
+	else
+		info->reg_type = PTR_TO_BTF_ID;
 
 	if (tgt_prog) {
 		ret = btf_translate_to_vmlinux(log, btf, t, tgt_prog->type, arg);
diff --git a/kernel/bpf/dump.c b/kernel/bpf/dump.c
index f39b82430977..c6d4d64aaa8e 100644
--- a/kernel/bpf/dump.c
+++ b/kernel/bpf/dump.c
@@ -244,6 +244,7 @@ int bpf_dump_set_target_info(u32 target_fd, struct bpf_prog *prog)
 
 	prog->aux->dump_target = tinfo->target;
 	prog->aux->attach_btf_id = btf_id;
+	prog->aux->btf_id_or_null_non0_off = true;
 
 done:
 	fdput(tfd);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f531cee24fc5..af711dd15e08 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -382,7 +382,8 @@ static bool reg_type_may_be_null(enum bpf_reg_type type)
 	return type == PTR_TO_MAP_VALUE_OR_NULL ||
 	       type == PTR_TO_SOCKET_OR_NULL ||
 	       type == PTR_TO_SOCK_COMMON_OR_NULL ||
-	       type == PTR_TO_TCP_SOCK_OR_NULL;
+	       type == PTR_TO_TCP_SOCK_OR_NULL ||
+	       type == PTR_TO_BTF_ID_OR_NULL;
 }
 
 static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
@@ -396,7 +397,8 @@ static bool reg_type_may_be_refcounted_or_null(enum bpf_reg_type type)
 	return type == PTR_TO_SOCKET ||
 		type == PTR_TO_SOCKET_OR_NULL ||
 		type == PTR_TO_TCP_SOCK ||
-		type == PTR_TO_TCP_SOCK_OR_NULL;
+		type == PTR_TO_TCP_SOCK_OR_NULL ||
+		type == PTR_TO_BTF_ID_OR_NULL;
 }
 
 static bool arg_type_may_be_refcounted(enum bpf_arg_type type)
@@ -448,6 +450,7 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_TP_BUFFER]	= "tp_buffer",
 	[PTR_TO_XDP_SOCK]	= "xdp_sock",
 	[PTR_TO_BTF_ID]		= "ptr_",
+	[PTR_TO_BTF_ID_OR_NULL]	= "null_or_ptr_",
 };
 
 static char slot_type_char[] = {
@@ -508,7 +511,7 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 			/* reg->off should be 0 for SCALAR_VALUE */
 			verbose(env, "%lld", reg->var_off.value + reg->off);
 		} else {
-			if (t == PTR_TO_BTF_ID)
+			if (t == PTR_TO_BTF_ID || t == PTR_TO_BTF_ID_OR_NULL)
 				verbose(env, "%s", kernel_type_name(reg->btf_id));
 			verbose(env, "(id=%d", reg->id);
 			if (reg_type_may_be_refcounted_or_null(t))
@@ -2102,6 +2105,7 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_TCP_SOCK_OR_NULL:
 	case PTR_TO_XDP_SOCK:
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID_OR_NULL:
 		return true;
 	default:
 		return false;
@@ -2603,7 +2607,7 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 		 */
 		*reg_type = info.reg_type;
 
-		if (*reg_type == PTR_TO_BTF_ID)
+		if (*reg_type == PTR_TO_BTF_ID || *reg_type == PTR_TO_BTF_ID_OR_NULL)
 			*btf_id = info.btf_id;
 		else
 			env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
@@ -3196,7 +3200,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 				 * a sub-register.
 				 */
 				regs[value_regno].subreg_def = DEF_NOT_SUBREG;
-				if (reg_type == PTR_TO_BTF_ID)
+				if (reg_type == PTR_TO_BTF_ID ||
+				    reg_type == PTR_TO_BTF_ID_OR_NULL)
 					regs[value_regno].btf_id = btf_id;
 			}
 			regs[value_regno].type = reg_type;
@@ -6521,6 +6526,8 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
 			reg->type = PTR_TO_SOCK_COMMON;
 		} else if (reg->type == PTR_TO_TCP_SOCK_OR_NULL) {
 			reg->type = PTR_TO_TCP_SOCK;
+		} else if (reg->type == PTR_TO_BTF_ID_OR_NULL) {
+			reg->type = PTR_TO_BTF_ID;
 		}
 		if (is_null) {
 			/* We don't need id and ref_obj_id from this point
@@ -8374,6 +8381,7 @@ static bool reg_type_mismatch_ok(enum bpf_reg_type type)
 	case PTR_TO_TCP_SOCK_OR_NULL:
 	case PTR_TO_XDP_SOCK:
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID_OR_NULL:
 		return false;
 	default:
 		return true;
-- 
2.24.1



* [RFC PATCH bpf-next v2 07/17] bpf: add netlink and ipv6_route targets
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (5 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 06/17] bpf: add PTR_TO_BTF_ID_OR_NULL support Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 08/17] bpf: add bpf_map target Yonghong Song
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch adds netlink and ipv6_route targets, reusing
the same seq_ops (except show()) as
/proc/net/{netlink,ipv6_route}.

Since modules are not supported for now, ipv6_route is
supported only if IPv6 is built-in, i.e., not compiled
as a module. The restriction can be lifted once modules
are properly supported for bpfdump.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h      |  8 +++-
 kernel/bpf/dump.c        | 13 ++++++
 net/ipv6/ip6_fib.c       | 71 +++++++++++++++++++++++++++++-
 net/ipv6/route.c         | 29 +++++++++++++
 net/netlink/af_netlink.c | 94 +++++++++++++++++++++++++++++++++++++++-
 5 files changed, 210 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1179ca3d0230..401e5bf921a2 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1124,6 +1124,12 @@ struct bpf_dump_reg {
 	u32 target_feature;
 };
 
+struct bpf_dump_meta {
+	struct seq_file *seq;
+	u64 session_id;
+	u64 seq_num;
+};
+
 int bpf_dump_reg_target(struct bpf_dump_reg *reg_info);
 int bpf_dump_set_target_info(u32 target_fd, struct bpf_prog *prog);
 int bpf_fd_dump_create(u32 prog_fd, const char __user *dumper_name,
@@ -1131,7 +1137,7 @@ int bpf_fd_dump_create(u32 prog_fd, const char __user *dumper_name,
 int bpf_prog_dump_create(struct bpf_prog *prog);
 struct bpf_prog *bpf_dump_get_prog(struct seq_file *seq, u32 priv_data_size,
 				   u64 *session_id, u64 *seq_num, bool is_last);
-
+int bpf_dump_run_prog(struct bpf_prog *prog, void *ctx);
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
diff --git a/kernel/bpf/dump.c b/kernel/bpf/dump.c
index c6d4d64aaa8e..789b35772a81 100644
--- a/kernel/bpf/dump.c
+++ b/kernel/bpf/dump.c
@@ -487,6 +487,19 @@ struct bpf_prog *bpf_dump_get_prog(struct seq_file *seq, u32 priv_data_size,
 	return extra_data->prog;
 }
 
+int bpf_dump_run_prog(struct bpf_prog *prog, void *ctx)
+{
+	int ret;
+
+	migrate_disable();
+	rcu_read_lock();
+	ret = BPF_PROG_RUN(prog, ctx);
+	rcu_read_unlock();
+	migrate_enable();
+
+	return ret;
+}
+
 int bpf_dump_reg_target(struct bpf_dump_reg *reg_info)
 {
 	struct bpfdump_target_info *tinfo, *ptinfo;
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 46ed56719476..f5a48511d233 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -2467,7 +2467,7 @@ void fib6_gc_cleanup(void)
 }
 
 #ifdef CONFIG_PROC_FS
-static int ipv6_route_seq_show(struct seq_file *seq, void *v)
+static int ipv6_route_native_seq_show(struct seq_file *seq, void *v)
 {
 	struct fib6_info *rt = v;
 	struct ipv6_route_iter *iter = seq->private;
@@ -2625,7 +2625,7 @@ static bool ipv6_route_iter_active(struct ipv6_route_iter *iter)
 	return w->node && !(w->state == FWS_U && w->node == w->root);
 }
 
-static void ipv6_route_seq_stop(struct seq_file *seq, void *v)
+static void ipv6_route_native_seq_stop(struct seq_file *seq, void *v)
 	__releases(RCU_BH)
 {
 	struct net *net = seq_file_net(seq);
@@ -2637,6 +2637,73 @@ static void ipv6_route_seq_stop(struct seq_file *seq, void *v)
 	rcu_read_unlock_bh();
 }
 
+#if IS_BUILTIN(CONFIG_IPV6) && defined(CONFIG_BPF_SYSCALL)
+struct bpfdump__ipv6_route {
+	struct bpf_dump_meta *meta;
+	struct fib6_info *rt;
+};
+
+static int ipv6_route_prog_seq_show(struct bpf_prog *prog, struct seq_file *seq,
+				    u64 session_id, u64 seq_num, void *v)
+{
+	struct bpfdump__ipv6_route ctx;
+	struct bpf_dump_meta meta;
+	int ret;
+
+	meta.seq = seq;
+	meta.session_id = session_id;
+	meta.seq_num = seq_num;
+	ctx.meta = &meta;
+	ctx.rt = v;
+	ret = bpf_dump_run_prog(prog, &ctx);
+	return ret == 0 ? 0 : -EINVAL;
+}
+
+static int ipv6_route_seq_show(struct seq_file *seq, void *v)
+{
+	struct ipv6_route_iter *iter = seq->private;
+	u64 session_id, seq_num;
+	struct bpf_prog *prog;
+	int ret;
+
+	prog = bpf_dump_get_prog(seq, sizeof(struct ipv6_route_iter),
+				 &session_id, &seq_num, false);
+	if (!prog)
+		return ipv6_route_native_seq_show(seq, v);
+
+	ret = ipv6_route_prog_seq_show(prog, seq, session_id, seq_num, v);
+	iter->w.leaf = NULL;
+
+	return ret;
+}
+
+static void ipv6_route_seq_stop(struct seq_file *seq, void *v)
+{
+	u64 session_id, seq_num;
+	struct bpf_prog *prog;
+
+	if (!v) {
+		prog = bpf_dump_get_prog(seq, sizeof(struct ipv6_route_iter),
+					 &session_id, &seq_num, true);
+		if (prog)
+			ipv6_route_prog_seq_show(prog, seq, session_id,
+						 seq_num, v);
+	}
+
+	ipv6_route_native_seq_stop(seq, v);
+}
+#else
+static int ipv6_route_seq_show(struct seq_file *seq, void *v)
+{
+	return ipv6_route_native_seq_show(seq, v);
+}
+
+static void ipv6_route_seq_stop(struct seq_file *seq, void *v)
+{
+	ipv6_route_native_seq_stop(seq, v);
+}
+#endif
+
 const struct seq_operations ipv6_route_seq_ops = {
 	.start	= ipv6_route_seq_start,
 	.next	= ipv6_route_seq_next,
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 310cbddaa533..ea87d3f2c363 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6390,10 +6390,31 @@ void __init ip6_route_init_special_entries(void)
   #endif
 }
 
+#if IS_BUILTIN(CONFIG_IPV6)
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_PROC_FS)
+int __init __bpfdump__ipv6_route(struct bpf_dump_meta *meta, struct fib6_info *rt)
+{
+	return 0;
+}
+#endif
+#endif
+
 int __init ip6_route_init(void)
 {
 	int ret;
 	int cpu;
+#if IS_BUILTIN(CONFIG_IPV6)
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_PROC_FS)
+	struct bpf_dump_reg reg_info = {
+		.target			= "ipv6_route",
+		.target_proto		= "__bpfdump__ipv6_route",
+		.prog_ctx_type_name	= "bpfdump__ipv6_route",
+		.seq_ops		= &ipv6_route_seq_ops,
+		.seq_priv_size		= sizeof(struct ipv6_route_iter),
+		.target_feature		= BPF_DUMP_SEQ_NET_PRIVATE,
+	};
+#endif
+#endif
 
 	ret = -ENOMEM;
 	ip6_dst_ops_template.kmem_cachep =
@@ -6452,6 +6473,14 @@ int __init ip6_route_init(void)
 	if (ret)
 		goto out_register_late_subsys;
 
+#if IS_BUILTIN(CONFIG_IPV6)
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_PROC_FS)
+	ret = bpf_dump_reg_target(&reg_info);
+	if (ret)
+		goto out_register_late_subsys;
+#endif
+#endif
+
 	for_each_possible_cpu(cpu) {
 		struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
 
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 5ded01ca8b20..fe9a10642c39 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2596,7 +2596,7 @@ static void *netlink_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	return __netlink_seq_next(seq);
 }
 
-static void netlink_seq_stop(struct seq_file *seq, void *v)
+static void netlink_native_seq_stop(struct seq_file *seq, void *v)
 {
 	struct nl_seq_iter *iter = seq->private;
 
@@ -2607,7 +2607,7 @@ static void netlink_seq_stop(struct seq_file *seq, void *v)
 }
 
 
-static int netlink_seq_show(struct seq_file *seq, void *v)
+static int netlink_native_seq_show(struct seq_file *seq, void *v)
 {
 	if (v == SEQ_START_TOKEN) {
 		seq_puts(seq,
@@ -2634,6 +2634,80 @@ static int netlink_seq_show(struct seq_file *seq, void *v)
 	return 0;
 }
 
+#ifdef CONFIG_BPF_SYSCALL
+struct bpfdump__netlink {
+	struct bpf_dump_meta *meta;
+	struct netlink_sock *sk;
+};
+
+int __init __bpfdump__netlink(struct bpf_dump_meta *meta, struct netlink_sock *sk)
+{
+	return 0;
+}
+
+static int netlink_prog_seq_show(struct bpf_prog *prog, struct seq_file *seq,
+				 u64 session_id, u64 seq_num, void *v)
+{
+	struct bpfdump__netlink ctx;
+	struct bpf_dump_meta meta;
+	int ret = 0;
+
+	meta.seq = seq;
+	meta.session_id = session_id;
+	meta.seq_num = seq_num;
+	ctx.meta = &meta;
+	ctx.sk = nlk_sk((struct sock *)v);
+	ret = bpf_dump_run_prog(prog, &ctx);
+
+	return ret == 0 ? 0 : -EINVAL;
+}
+
+static int netlink_seq_show(struct seq_file *seq, void *v)
+{
+	u64 session_id, seq_num;
+	struct bpf_prog *prog;
+
+	prog = bpf_dump_get_prog(seq, sizeof(struct nl_seq_iter),
+				 &session_id, &seq_num, false);
+	if (!prog)
+		return netlink_native_seq_show(seq, v);
+
+	if (v == SEQ_START_TOKEN)
+		return 0;
+
+	return netlink_prog_seq_show(prog, seq, session_id,
+				     seq_num - 1, v);
+}
+
+static void netlink_seq_stop(struct seq_file *seq, void *v)
+{
+	u64 session_id, seq_num;
+	struct bpf_prog *prog;
+
+	if (!v) {
+		prog = bpf_dump_get_prog(seq, sizeof(struct nl_seq_iter),
+					 &session_id, &seq_num, true);
+		if (prog) {
+			if (seq_num)
+				seq_num = seq_num - 1;
+			netlink_prog_seq_show(prog, seq, session_id,
+					      seq_num, v);
+		}
+	}
+
+	netlink_native_seq_stop(seq, v);
+}
+#else
+static int netlink_seq_show(struct seq_file *seq, void *v)
+{
+	return netlink_native_seq_show(seq, v);
+}
+static void netlink_seq_stop(struct seq_file *seq, void *v)
+{
+	netlink_native_seq_stop(seq, v);
+}
+#endif
+
 static const struct seq_operations netlink_seq_ops = {
 	.start  = netlink_seq_start,
 	.next   = netlink_seq_next,
@@ -2744,6 +2818,16 @@ static int __init netlink_proto_init(void)
 {
 	int i;
 	int err = proto_register(&netlink_proto, 0);
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_PROC_FS)
+	struct bpf_dump_reg reg_info = {
+		.target			= "netlink",
+		.target_proto		= "__bpfdump__netlink",
+	.prog_ctx_type_name	= "bpfdump__netlink",
+		.seq_ops		= &netlink_seq_ops,
+		.seq_priv_size		= sizeof(struct nl_seq_iter),
+		.target_feature		= BPF_DUMP_SEQ_NET_PRIVATE,
+	};
+#endif
 
 	if (err != 0)
 		goto out;
@@ -2764,6 +2848,12 @@ static int __init netlink_proto_init(void)
 		}
 	}
 
+#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_PROC_FS)
+	err = bpf_dump_reg_target(&reg_info);
+	if (err)
+		goto out;
+#endif
+
 	netlink_add_usersock_entry();
 
 	sock_register(&netlink_family_ops);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next v2 08/17] bpf: add bpf_map target
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (6 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 07/17] bpf: add netlink and ipv6_route targets Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 09/17] bpf: add task and task/file targets Yonghong Song
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch adds a bpf_map target, traversing all bpf_maps
through map_idr. A reference is held on the map during
show() to ensure safe and correct field accesses.
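The start()/next() callbacks in this patch resume by map id rather than by
seq_file position: each step asks the idr for the first object whose id is
greater than the last one seen, so maps deleted between read() syscalls are
simply skipped. A userspace sketch of that resume-by-id pattern (an array
stands in for map_idr; all names here are illustrative, not kernel API):

```c
#include <assert.h>
#include <stddef.h>

/* toy "idr": NULL holes stand in for maps deleted mid-iteration */
static const char *objs[] = { NULL, "map-a", NULL, "map-c", "map-d" };
#define NOBJ (sizeof(objs) / sizeof(objs[0]))

/* like idr_get_next(): first object with id >= *id, updating *id */
static const char *toy_idr_get_next(unsigned int *id)
{
	for (unsigned int i = *id; i < NOBJ; i++) {
		if (objs[i]) {
			*id = i;
			return objs[i];
		}
	}
	return NULL;
}

/* like bpf_map_seq_next(): resume from the last seen id + 1 */
static const char *toy_seq_next(unsigned int *last_id)
{
	unsigned int id = *last_id + 1;
	const char *obj = toy_idr_get_next(&id);

	if (obj)
		*last_id = id;
	return obj;
}
```

Starting from last_id = 0 this visits ids 1, 3, 4 and then terminates,
transparently stepping over the hole at id 2.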

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/syscall.c | 116 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 116 insertions(+)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4a3c9fceebb8..e6a4514435c4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3800,3 +3800,119 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 
 	return err;
 }
+
+struct bpfdump_seq_map_info {
+	struct bpf_map *map;
+	u32 id;
+};
+
+static struct bpf_map *bpf_map_seq_get_next(u32 *id)
+{
+	struct bpf_map *map;
+
+	spin_lock_bh(&map_idr_lock);
+	map = idr_get_next(&map_idr, id);
+	if (map)
+		map = __bpf_map_inc_not_zero(map, false);
+	spin_unlock_bh(&map_idr_lock);
+
+	return map;
+}
+
+static void *bpf_map_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct bpfdump_seq_map_info *info = seq->private;
+	struct bpf_map *map;
+	u32 id = info->id + 1;
+
+	map = bpf_map_seq_get_next(&id);
+	if (!map)
+		return NULL;
+
+	++*pos;
+	info->map = map;
+	info->id = id;
+	return map;
+}
+
+static void *bpf_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bpfdump_seq_map_info *info = seq->private;
+	struct bpf_map *map;
+	u32 id = info->id + 1;
+
+	++*pos;
+	map = bpf_map_seq_get_next(&id);
+	if (!map)
+		return NULL;
+
+	__bpf_map_put(info->map, true);
+	info->map = map;
+	info->id = id;
+	return map;
+}
+
+struct bpfdump__bpf_map {
+	struct bpf_dump_meta *meta;
+	struct bpf_map *map;
+};
+
+int __init __bpfdump__bpf_map(struct bpf_dump_meta *meta, struct bpf_map *map)
+{
+	return 0;
+}
+
+static int bpf_map_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpf_dump_meta meta;
+	struct bpfdump__bpf_map ctx;
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	ctx.meta = &meta;
+	ctx.map = v;
+	meta.seq = seq;
+	prog = bpf_dump_get_prog(seq, sizeof(struct bpfdump_seq_map_info),
+				 &meta.session_id, &meta.seq_num,
+				 v == (void *)0);
+	if (prog)
+		ret = bpf_dump_run_prog(prog, &ctx);
+
+	return ret == 0 ? 0 : -EINVAL;
+}
+
+static void bpf_map_seq_stop(struct seq_file *seq, void *v)
+{
+	struct bpfdump_seq_map_info *info = seq->private;
+
+	if (!v)
+		bpf_map_seq_show(seq, v);
+
+	if (info->map) {
+		__bpf_map_put(info->map, true);
+		info->map = NULL;
+	}
+}
+
+static const struct seq_operations bpf_map_seq_ops = {
+	.start	= bpf_map_seq_start,
+	.next	= bpf_map_seq_next,
+	.stop	= bpf_map_seq_stop,
+	.show	= bpf_map_seq_show,
+};
+
+static int __init bpf_map_dump_init(void)
+{
+	struct bpf_dump_reg reg_info = {
+		.target			= "bpf_map",
+		.target_proto		= "__bpfdump__bpf_map",
+		.prog_ctx_type_name	= "bpfdump__bpf_map",
+		.seq_ops		= &bpf_map_seq_ops,
+		.seq_priv_size		= sizeof(struct bpfdump_seq_map_info),
+		.target_feature		= 0,
+	};
+
+	return bpf_dump_reg_target(&reg_info);
+}
+
+late_initcall(bpf_map_dump_init);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next v2 09/17] bpf: add task and task/file targets
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (7 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 08/17] bpf: add bpf_map target Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 10/17] bpf: add bpf_seq_printf and bpf_seq_write helpers Yonghong Song
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Only tasks belonging to the pid namespace of "current"
are enumerated.

For task/file target, the bpf program will have access to
  struct task_struct *task
  u32 fd
  struct file *file
where fd/file is an open file for the task.
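The task/file iterator below walks a two-level space: for each task, every
open fd; when one task's fd table is exhausted it advances to the next task
and restarts the fd scan at 0. A userspace sketch of that nested resume
logic, with fixed arrays standing in for the pid idr and per-task fd tables
(all names illustrative, not kernel API):

```c
#include <assert.h>

/* fds[t][f] != 0 means task t has fd f open; 3 tasks, 4 fd slots each */
static const int fds[3][4] = {
	{ 1, 0, 1, 0 },	/* task 0: fds 0 and 2 */
	{ 0, 0, 0, 0 },	/* task 1: no open files */
	{ 0, 1, 0, 0 },	/* task 2: fd 1 */
};

/* advance to the first (task, fd) pair at or after (*task, *fd);
 * returns 1 on success, 0 when all tasks are exhausted */
static int next_task_file(unsigned int *task, unsigned int *fd)
{
	unsigned int t = *task, f = *fd;

	for (; t < 3; t++, f = 0) {	/* new task: restart fds at 0 */
		for (; f < 4; f++) {
			if (fds[t][f]) {
				*task = t;
				*fd = f;
				return 1;
			}
		}
	}
	return 0;
}

/* convenience: step past the current pair and find the next one */
static int advance(unsigned int *task, unsigned int *fd)
{
	(*fd)++;
	return next_task_file(task, fd);
}
```

This visits (0,0), (0,2), (2,1) in order, skipping task 1 entirely because
it has no open files, which mirrors the "go to the next task" retry in
task_file_seq_get_next().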

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/Makefile    |   2 +-
 kernel/bpf/dump_task.c | 320 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 321 insertions(+), 1 deletion(-)
 create mode 100644 kernel/bpf/dump_task.c

diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 4a1376ab2bea..7e2c73deabab 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -26,7 +26,7 @@ obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
 endif
 ifeq ($(CONFIG_SYSFS),y)
 obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
-obj-$(CONFIG_BPF_SYSCALL) += dump.o
+obj-$(CONFIG_BPF_SYSCALL) += dump.o dump_task.o
 endif
 ifeq ($(CONFIG_BPF_JIT),y)
 obj-$(CONFIG_BPF_SYSCALL) += bpf_struct_ops.o
diff --git a/kernel/bpf/dump_task.c b/kernel/bpf/dump_task.c
new file mode 100644
index 000000000000..cb0767f4d962
--- /dev/null
+++ b/kernel/bpf/dump_task.c
@@ -0,0 +1,320 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 Facebook */
+
+#include <linux/init.h>
+#include <linux/namei.h>
+#include <linux/pid_namespace.h>
+#include <linux/fs.h>
+#include <linux/fdtable.h>
+#include <linux/filter.h>
+
+struct bpfdump_seq_task_info {
+	struct pid_namespace *ns;
+	struct task_struct *task;
+	u32 id;
+};
+
+static struct task_struct *task_seq_get_next(struct pid_namespace *ns, u32 *id)
+{
+	struct task_struct *task;
+	struct pid *pid;
+
+	rcu_read_lock();
+	pid = idr_get_next(&ns->idr, id);
+	/* get_pid_task() already takes a reference on the task */
+	task = get_pid_task(pid, PIDTYPE_PID);
+	rcu_read_unlock();
+
+	return task;
+}
+
+static void *task_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct bpfdump_seq_task_info *info = seq->private;
+	struct task_struct *task;
+	u32 id = info->id + 1;
+
+	if (*pos == 0)
+		info->ns = task_active_pid_ns(current);
+
+	task = task_seq_get_next(info->ns, &id);
+	if (!task)
+		return NULL;
+
+	++*pos;
+	info->task = task;
+	info->id = id;
+
+	return task;
+}
+
+static void *task_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bpfdump_seq_task_info *info = seq->private;
+	struct task_struct *task;
+	u32 id = info->id + 1;
+
+	++*pos;
+	task = task_seq_get_next(info->ns, &id);
+	if (!task)
+		return NULL;
+
+	put_task_struct(info->task);
+	info->task = task;
+	info->id = id;
+	return task;
+}
+
+struct bpfdump__task {
+	struct bpf_dump_meta *meta;
+	struct task_struct *task;
+};
+
+int __init __bpfdump__task(struct bpf_dump_meta *meta, struct task_struct *task)
+{
+	return 0;
+}
+
+static int task_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpf_dump_meta meta;
+	struct bpfdump__task ctx;
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	prog = bpf_dump_get_prog(seq, sizeof(struct bpfdump_seq_task_info),
+				 &meta.session_id, &meta.seq_num,
+				 v == (void *)0);
+	if (prog) {
+		meta.seq = seq;
+		ctx.meta = &meta;
+		ctx.task = v;
+		ret = bpf_dump_run_prog(prog, &ctx);
+	}
+
+	return ret == 0 ? 0 : -EINVAL;
+}
+
+static void task_seq_stop(struct seq_file *seq, void *v)
+{
+	struct bpfdump_seq_task_info *info = seq->private;
+
+	if (!v)
+		task_seq_show(seq, v);
+
+	if (info->task) {
+		put_task_struct(info->task);
+		info->task = NULL;
+	}
+}
+
+static const struct seq_operations task_seq_ops = {
+	.start	= task_seq_start,
+	.next	= task_seq_next,
+	.stop	= task_seq_stop,
+	.show	= task_seq_show,
+};
+
+struct bpfdump_seq_task_file_info {
+	struct pid_namespace *ns;
+	struct task_struct *task;
+	struct files_struct *files;
+	u32 id;
+	u32 fd;
+};
+
+static struct file *task_file_seq_get_next(struct pid_namespace *ns, u32 *id,
+					   int *fd, struct task_struct **task,
+					   struct files_struct **fstruct)
+{
+	struct files_struct *files;
+	struct task_struct *tk;
+	u32 sid = *id;
+	int sfd;
+
+	/* If this function returns a non-NULL file object,
+	 * it held a reference to the files_struct and file.
+	 * Otherwise, it does not hold any reference.
+	 */
+again:
+	if (*fstruct) {
+		files = *fstruct;
+		sfd = *fd;
+	} else {
+		tk = task_seq_get_next(ns, &sid);
+		if (!tk)
+			return NULL;
+		files = get_files_struct(tk);
+		put_task_struct(tk);
+		if (!files)
+			return NULL;
+		*fstruct = files;
+		*task = tk;
+		if (sid == *id) {
+			sfd = *fd;
+		} else {
+			*id = sid;
+			sfd = 0;
+		}
+	}
+
+	rcu_read_lock();
+	for (; sfd < files_fdtable(files)->max_fds; sfd++) {
+		struct file *f;
+
+		f = fcheck_files(files, sfd);
+		if (!f)
+			continue;
+
+		*fd = sfd;
+		get_file(f);
+		rcu_read_unlock();
+		return f;
+	}
+
+	/* the current task is done, go to the next task */
+	rcu_read_unlock();
+	put_files_struct(files);
+	*fstruct = NULL;
+	sid = ++(*id);
+	goto again;
+}
+
+static void *task_file_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct bpfdump_seq_task_file_info *info = seq->private;
+	struct files_struct *files = NULL;
+	struct task_struct *task = NULL;
+	struct file *file;
+	u32 id = info->id;
+	int fd = info->fd + 1;
+
+	if (*pos == 0)
+		info->ns = task_active_pid_ns(current);
+
+	file = task_file_seq_get_next(info->ns, &id, &fd, &task, &files);
+	if (!file) {
+		info->files = NULL;
+		return NULL;
+	}
+
+	++*pos;
+	info->id = id;
+	info->fd = fd;
+	info->task = task;
+	info->files = files;
+
+	return file;
+}
+
+static void *task_file_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bpfdump_seq_task_file_info *info = seq->private;
+	struct files_struct *files = info->files;
+	struct task_struct *task = info->task;
+	int fd = info->fd + 1;
+	struct file *file;
+	u32 id = info->id;
+
+	++*pos;
+	fput((struct file *)v);
+	file = task_file_seq_get_next(info->ns, &id, &fd, &task, &files);
+	if (!file) {
+		info->files = NULL;
+		return NULL;
+	}
+
+	info->id = id;
+	info->fd = fd;
+	info->task = task;
+	info->files = files;
+
+	return file;
+}
+
+struct bpfdump__task_file {
+	struct bpf_dump_meta *meta;
+	struct task_struct *task;
+	u32 fd;
+	struct file *file;
+};
+
+int __init __bpfdump__task_file(struct bpf_dump_meta *meta,
+			      struct task_struct *task, u32 fd,
+			      struct file *file)
+{
+	return 0;
+}
+
+static int task_file_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpfdump_seq_task_file_info *info = seq->private;
+	struct bpfdump__task_file ctx;
+	struct bpf_dump_meta meta;
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	prog = bpf_dump_get_prog(seq, sizeof(struct bpfdump_seq_task_file_info),
+				 &meta.session_id, &meta.seq_num, v == (void *)0);
+	if (prog) {
+		meta.seq = seq;
+		ctx.meta = &meta;
+		ctx.task = info->task;
+		ctx.fd = info->fd;
+		ctx.file = v;
+		ret = bpf_dump_run_prog(prog, &ctx);
+	}
+
+	return ret == 0 ? 0 : -EINVAL;
+}
+
+static void task_file_seq_stop(struct seq_file *seq, void *v)
+{
+	struct bpfdump_seq_task_file_info *info = seq->private;
+
+	if (v)
+		fput((struct file *)v);
+	else
+		task_file_seq_show(seq, v);
+
+	if (info->files) {
+		put_files_struct(info->files);
+		info->files = NULL;
+	}
+}
+
+static const struct seq_operations task_file_seq_ops = {
+	.start	= task_file_seq_start,
+	.next	= task_file_seq_next,
+	.stop	= task_file_seq_stop,
+	.show	= task_file_seq_show,
+};
+
+static int __init task_dump_init(void)
+{
+	struct bpf_dump_reg task_file_reg_info = {
+		.target			= "task/file",
+		.target_proto		= "__bpfdump__task_file",
+		.prog_ctx_type_name	= "bpfdump__task_file",
+		.seq_ops		= &task_file_seq_ops,
+		.seq_priv_size		= sizeof(struct bpfdump_seq_task_file_info),
+		.target_feature		= 0,
+	};
+	struct bpf_dump_reg task_reg_info = {
+		.target			= "task",
+		.target_proto		= "__bpfdump__task",
+		.prog_ctx_type_name	= "bpfdump__task",
+		.seq_ops		= &task_seq_ops,
+		.seq_priv_size		= sizeof(struct bpfdump_seq_task_info),
+		.target_feature		= 0,
+	};
+	int ret;
+
+	ret = bpf_dump_reg_target(&task_reg_info);
+	if (ret)
+		return ret;
+
+	return bpf_dump_reg_target(&task_file_reg_info);
+}
+late_initcall(task_dump_init);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next v2 10/17] bpf: add bpf_seq_printf and bpf_seq_write helpers
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (8 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 09/17] bpf: add task and task/file targets Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 11/17] bpf: support variable length array in tracing programs Yonghong Song
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Two helpers, bpf_seq_printf and bpf_seq_write, are added for
writing data to the seq_file buffer.

bpf_seq_printf supports common format string flag/width/type
fields, enough to produce output identical to the existing
/proc files for the netlink and ipv6_route targets.

For bpf_seq_printf, a return value of 1 specifically indicates
a write failure due to buffer overflow, to differentiate it
from format string failures, which return negative values.

For seq_file show, the same object may be processed twice, and
some bpf programs might be sensitive to this. With the return
value indicating that an overflow happened, the bpf program
can react accordingly.
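The seq_file layer reacts to an overflowed show() by doubling its buffer
and calling show() again for the same object, which is why the overflow
return value matters. A userspace model of that retry loop (a sketch of the
seq_file contract, not the kernel implementation; names are illustrative):

```c
#include <assert.h>
#include <string.h>

struct toy_seq {
	char buf[256];
	size_t size;	/* current capacity */
	size_t count;	/* bytes used */
};

/* like seq_printf() + seq_has_overflowed(): 0 on success, 1 on overflow */
static int toy_seq_printf(struct toy_seq *m, const char *s)
{
	size_t len = strlen(s);

	if (m->count + len > m->size) {
		m->count = m->size;	/* mark overflow */
		return 1;
	}
	memcpy(m->buf + m->count, s, len);
	m->count += len;
	return 0;
}

/* like the seq_file read path: on overflow, grow and re-run show() */
static size_t toy_render(struct toy_seq *m, const char *item)
{
	m->size = 8;	/* deliberately small initial capacity */
	for (;;) {
		m->count = 0;
		if (!toy_seq_printf(m, item))
			return m->count;
		m->size *= 2;	/* retry the same object, bigger buffer */
	}
}
```

A long record overflows the 8-byte buffer, is retried at 16 and again at
32 bytes, and each retry re-renders the same object from scratch.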

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/uapi/linux/bpf.h       |  18 +++-
 kernel/trace/bpf_trace.c       | 172 +++++++++++++++++++++++++++++++++
 scripts/bpf_helpers_doc.py     |   2 +
 tools/include/uapi/linux/bpf.h |  18 +++-
 4 files changed, 208 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f92b919c723e..75f3657d526c 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3029,6 +3029,20 @@ union bpf_attr {
  *		* **-EOPNOTSUPP**	Unsupported operation, for example a
  *					call from outside of TC ingress.
  *		* **-ESOCKTNOSUPPORT**	Socket type not supported (reuseport).
+ *
+ * int bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size, ...)
+ * 	Description
+ * 		Use the seq_file mechanism to print out a formatted string.
+ * 	Return
+ * 		0 if successful, or
+ * 		1 if failure due to buffer overflow, or
+ * 		a negative value for format string related failures.
+ *
+ * int bpf_seq_write(struct seq_file *m, const void *data, u32 len)
+ * 	Description
+ * 		Use the seq_file mechanism to write binary data into the seq_file buffer.
+ * 	Return
+ * 		0 if successful, non-zero otherwise.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3155,7 +3169,9 @@ union bpf_attr {
 	FN(xdp_output),			\
 	FN(get_netns_cookie),		\
 	FN(get_current_ancestor_cgroup_id),	\
-	FN(sk_assign),
+	FN(sk_assign),			\
+	FN(seq_printf),			\
+	FN(seq_write),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1796747a77..e7d6ba7c9c51 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -457,6 +457,174 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
 	return &bpf_trace_printk_proto;
 }
 
+BPF_CALL_5(bpf_seq_printf, struct seq_file *, m, char *, fmt, u32, fmt_size, u64, arg1,
+	   u64, arg2)
+{
+	bool buf_used = false;
+	int i, copy_size;
+	int mod[2] = {};
+	int fmt_cnt = 0;
+	u64 unsafe_addr;
+	char buf[64];
+
+	/*
+	 * bpf_check()->check_func_arg()->check_stack_boundary()
+	 * guarantees that fmt points to bpf program stack,
+	 * fmt_size bytes of it were initialized and fmt_size > 0
+	 */
+	if (fmt[--fmt_size] != 0)
+		return -EINVAL;
+
+	/* check format string for allowed specifiers */
+	for (i = 0; i < fmt_size; i++) {
+		if ((!isprint(fmt[i]) && !isspace(fmt[i])) || !isascii(fmt[i]))
+			return -EINVAL;
+
+		if (fmt[i] != '%')
+			continue;
+
+		if (fmt_cnt >= 2)
+			return -EINVAL;
+
+		/* fmt[i] != 0 && fmt[last] == 0, so we can access fmt[i + 1] */
+		i++;
+
+		/* skip optional "[0+-][num]" width formating field */
+		while (fmt[i] == '0' || fmt[i] == '+'  || fmt[i] == '-')
+			i++;
+		if (fmt[i] >= '1' && fmt[i] <= '9') {
+			i++;
+			while (fmt[i] >= '0' && fmt[i] <= '9')
+				i++;
+		}
+
+		if (fmt[i] == 'l') {
+			mod[fmt_cnt]++;
+			i++;
+		} else if (fmt[i] == 's') {
+			mod[fmt_cnt]++;
+			fmt_cnt++;
+			/* disallow any further format extensions */
+			if (fmt[i + 1] != 0 &&
+			    !isspace(fmt[i + 1]) &&
+			    !ispunct(fmt[i + 1]))
+				return -EINVAL;
+
+			if (buf_used)
+				/* allow only one '%s'/'%p' per fmt string */
+				return -EINVAL;
+			buf_used = true;
+
+			if (fmt_cnt == 1) {
+				unsafe_addr = arg1;
+				arg1 = (long) buf;
+			} else {
+				unsafe_addr = arg2;
+				arg2 = (long) buf;
+			}
+			buf[0] = 0;
+			strncpy_from_unsafe(buf,
+					    (void *) (long) unsafe_addr,
+					    sizeof(buf));
+			continue;
+		} else if (fmt[i] == 'p') {
+			mod[fmt_cnt]++;
+			fmt_cnt++;
+			if (fmt[i + 1] == 0 ||
+			    fmt[i + 1] == 'K' ||
+			    fmt[i + 1] == 'x') {
+				/* just kernel pointers */
+				continue;
+			}
+
+			if (buf_used)
+				return -EINVAL;
+			buf_used = true;
+
+			/* only support "%pI4", "%pi4", "%pI6" and "%pi6". */
+			if (fmt[i + 1] != 'i' && fmt[i + 1] != 'I')
+				return -EINVAL;
+			if (fmt[i + 2] != '4' && fmt[i + 2] != '6')
+				return -EINVAL;
+
+			copy_size = (fmt[i + 2] == '4') ? 4 : 16;
+
+			if (fmt_cnt == 1) {
+				unsafe_addr = arg1;
+				arg1 = (long) buf;
+			} else {
+				unsafe_addr = arg2;
+				arg2 = (long) buf;
+			}
+			probe_kernel_read(buf, (void *) (long) unsafe_addr, copy_size);
+
+			i += 2;
+			continue;
+		}
+
+		if (fmt[i] == 'l') {
+			mod[fmt_cnt]++;
+			i++;
+		}
+
+		if (fmt[i] != 'i' && fmt[i] != 'd' &&
+		    fmt[i] != 'u' && fmt[i] != 'x')
+			return -EINVAL;
+		fmt_cnt++;
+	}
+
+/* Horrid workaround for getting va_list handling working with different
+ * argument type combinations generically for 32 and 64 bit archs.
+ */
+#define __BPF_SP_EMIT()	__BPF_ARG2_SP()
+#define __BPF_SP(...)							\
+	seq_printf(m, fmt, ##__VA_ARGS__)
+
+#define __BPF_ARG1_SP(...)						\
+	((mod[0] == 2 || (mod[0] == 1 && __BITS_PER_LONG == 64))	\
+	  ? __BPF_SP(arg1, ##__VA_ARGS__)				\
+	  : ((mod[0] == 1 || (mod[0] == 0 && __BITS_PER_LONG == 32))	\
+	      ? __BPF_SP((long)arg1, ##__VA_ARGS__)			\
+	      : __BPF_SP((u32)arg1, ##__VA_ARGS__)))
+
+#define __BPF_ARG2_SP(...)						\
+	((mod[1] == 2 || (mod[1] == 1 && __BITS_PER_LONG == 64))	\
+	  ? __BPF_ARG1_SP(arg2, ##__VA_ARGS__)				\
+	  : ((mod[1] == 1 || (mod[1] == 0 && __BITS_PER_LONG == 32))	\
+	      ? __BPF_ARG1_SP((long)arg2, ##__VA_ARGS__)		\
+	      : __BPF_ARG1_SP((u32)arg2, ##__VA_ARGS__)))
+
+	__BPF_SP_EMIT();
+	return seq_has_overflowed(m);
+}
+
+static int bpf_seq_printf_btf_ids[5];
+static const struct bpf_func_proto bpf_seq_printf_proto = {
+	.func		= bpf_seq_printf,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.btf_id		= bpf_seq_printf_btf_ids,
+};
+
+BPF_CALL_3(bpf_seq_write, struct seq_file *, m, const char *, data, u32, len)
+{
+	return seq_write(m, data, len);
+}
+
+static int bpf_seq_write_btf_ids[5];
+static const struct bpf_func_proto bpf_seq_write_proto = {
+	.func		= bpf_seq_write,
+	.gpl_only	= true,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg2_type	= ARG_PTR_TO_MEM,
+	.arg3_type	= ARG_CONST_SIZE,
+	.btf_id		= bpf_seq_write_btf_ids,
+};
+
 static __always_inline int
 get_map_perf_counter(struct bpf_map *map, u64 flags,
 		     u64 *value, u64 *enabled, u64 *running)
@@ -1224,6 +1392,10 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_xdp_output:
 		return &bpf_xdp_output_proto;
 #endif
+	case BPF_FUNC_seq_printf:
+		return &bpf_seq_printf_proto;
+	case BPF_FUNC_seq_write:
+		return &bpf_seq_write_proto;
 	default:
 		return raw_tp_prog_func_proto(func_id, prog);
 	}
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index f43d193aff3a..ded304c96a05 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -414,6 +414,7 @@ class PrinterHelpers(Printer):
             'struct sk_reuseport_md',
             'struct sockaddr',
             'struct tcphdr',
+            'struct seq_file',
 
             'struct __sk_buff',
             'struct sk_msg_md',
@@ -450,6 +451,7 @@ class PrinterHelpers(Printer):
             'struct sk_reuseport_md',
             'struct sockaddr',
             'struct tcphdr',
+            'struct seq_file',
     }
     mapped_types = {
             'u8': '__u8',
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f92b919c723e..75f3657d526c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3029,6 +3029,20 @@ union bpf_attr {
  *		* **-EOPNOTSUPP**	Unsupported operation, for example a
  *					call from outside of TC ingress.
  *		* **-ESOCKTNOSUPPORT**	Socket type not supported (reuseport).
+ *
+ * int bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size, ...)
+ * 	Description
+ * 		Use the seq_file mechanism to print out a formatted string.
+ * 	Return
+ * 		0 if successful, or
+ * 		1 if failure due to buffer overflow, or
+ * 		a negative value for format string related failures.
+ *
+ * int bpf_seq_write(struct seq_file *m, const void *data, u32 len)
+ * 	Description
+ * 		Use the seq_file mechanism to write binary data into the seq_file buffer.
+ * 	Return
+ * 		0 if successful, non-zero otherwise.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3155,7 +3169,9 @@ union bpf_attr {
 	FN(xdp_output),			\
 	FN(get_netns_cookie),		\
 	FN(get_current_ancestor_cgroup_id),	\
-	FN(sk_assign),
+	FN(sk_assign),			\
+	FN(seq_printf),			\
+	FN(seq_write),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next v2 11/17] bpf: support variable length array in tracing programs
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (9 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 10/17] bpf: add bpf_seq_printf and bpf_seq_write helpers Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 12/17] bpf: implement query for target_proto and file dumper prog_id Yonghong Song
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

In /proc/net/ipv6_route, we have
  struct fib6_info {
    struct fib6_table *fib6_table;
    ...
    struct fib6_nh fib6_nh[0];
  }
  struct fib6_nh {
    struct fib_nh_common nh_common;
    struct rt6_info **rt6i_pcpu;
    struct rt6_exception_bucket *rt6i_exception_bucket;
  };
  struct fib_nh_common {
    ...
    u8 nhc_gw_family;
    ...
  }

The access:
  struct fib6_nh *fib6_nh = &rt->fib6_nh;
  ... fib6_nh->nh_common.nhc_gw_family ...

This patch ensures such an access is handled properly.
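The verifier change below relaxes the bounds check: when an access lands
past the struct size and the last member is a zero-length array of structs,
the offset is folded back into the element type with
off = (off - t->size) % elem_type->size and re-checked there. A userspace
illustration of that folding, using made-up sizes (not the actual
fib6_info layout):

```c
#include <assert.h>

/* fold an offset beyond the enclosing struct into the flexible-array
 * element type; returns the offset within one array element, or -1 if
 * the access is an ordinary in-bounds member access (illustrative) */
static int fold_flex_array_off(int off, int struct_size, int elem_size)
{
	if (off < struct_size)
		return -1;	/* handled by the normal member walk */
	return (off - struct_size) % elem_size;
}
```

With a 64-byte enclosing struct and 24-byte elements, offsets 64, 70 and
94 fold to element offsets 0, 6 and 6 respectively, so the same member of
any element resolves identically.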

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/btf.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 2c098e6b1acc..dcee5ca0d501 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3840,6 +3840,31 @@ int btf_struct_access(struct bpf_verifier_log *log,
 	}
 
 	if (off + size > t->size) {
+		/* If the last element is a variable size array, we may
+		 * need to relax the rule.
+		 */
+		struct btf_array *array_elem;
+		u32 vlen = btf_type_vlen(t);
+		u32 last_member_type;
+
+		member = btf_type_member(t);
+		last_member_type = member[vlen - 1].type;
+		mtype = btf_type_by_id(btf_vmlinux, last_member_type);
+		if (!btf_type_is_array(mtype))
+			goto error;
+
+		array_elem = (struct btf_array *)(mtype + 1);
+		if (array_elem->nelems != 0)
+			goto error;
+
+		elem_type = btf_type_by_id(btf_vmlinux, array_elem->type);
+		if (!btf_type_is_struct(elem_type))
+			goto error;
+
+		off = (off - t->size) % elem_type->size;
+		return btf_struct_access(log, elem_type, off, size, atype, next_btf_id);
+
+error:
 		bpf_log(log, "access beyond struct %s at off %u size %u\n",
 			tname, off, size);
 		return -EACCES;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next v2 12/17] bpf: implement query for target_proto and file dumper prog_id
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (10 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 11/17] bpf: support variable length array in tracing programs Yonghong Song
@ 2020-04-15 19:27 ` Yonghong Song
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 13/17] tools/libbpf: libbpf support for bpfdump Yonghong Song
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Given a fd representing a bpfdump target, the user
can retrieve the target_proto name, which represents
the bpf program prototype.

Given a fd representing a file dumper, the user can
retrieve the bpf_prog id associated with that dumper.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h            |  2 +
 include/uapi/linux/bpf.h       | 11 +++++-
 kernel/bpf/dump.c              | 72 ++++++++++++++++++++++++++++++++++
 kernel/bpf/syscall.c           |  2 +-
 tools/include/uapi/linux/bpf.h | 11 +++++-
 5 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 401e5bf921a2..a1ae8509e735 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1138,6 +1138,8 @@ int bpf_prog_dump_create(struct bpf_prog *prog);
 struct bpf_prog *bpf_dump_get_prog(struct seq_file *seq, u32 priv_data_size,
 				   u64 *session_id, u64 *seq_num, bool is_last);
 int bpf_dump_run_prog(struct bpf_prog *prog, void *ctx);
+int bpf_dump_query(const union bpf_attr *attr, union bpf_attr __user *uattr);
+
 int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_hash_update(struct bpf_map *map, void *key, void *value,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 75f3657d526c..856e3f8a63b8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -533,7 +533,10 @@ union bpf_attr {
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
-		__u32		bpf_fd;
+		union {
+			__u32		bpf_fd;
+			__u32		dump_query_fd;
+		};
 		__u32		info_len;
 		__aligned_u64	info;
 	} info;
@@ -3618,6 +3621,12 @@ struct bpf_btf_info {
 	__u32 id;
 } __attribute__((aligned(8)));
 
+struct bpf_dump_info {
+	__aligned_u64 prog_ctx_type_name;
+	__u32 type_name_buf_len;
+	__u32 prog_id;
+} __attribute__((aligned(8)));
+
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
  * by user and intended to be used by socket (e.g. to bind to, depends on
  * attach attach type).
diff --git a/kernel/bpf/dump.c b/kernel/bpf/dump.c
index 789b35772a81..643591bf5aea 100644
--- a/kernel/bpf/dump.c
+++ b/kernel/bpf/dump.c
@@ -93,6 +93,78 @@ static void *get_extra_priv_dptr(void *old_ptr, u32 old_size)
 	return old_ptr + roundup(old_size, 8);
 }
 
+int bpf_dump_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
+{
+	struct bpf_dump_info __user *ubpf_dinfo;
+	struct bpfdump_target_info *tinfo;
+	struct dumper_inode_info *i_info;
+	struct bpf_dump_info bpf_dinfo;
+	const char *prog_ctx_type_name;
+	void * __user tname_buf;
+	u32 tname_len, info_len;
+	struct file *filp;
+	struct fd qfd;
+	int err = 0;
+
+	qfd = fdget(attr->info.dump_query_fd);
+	filp = qfd.file;
+	if (!filp)
+		return -EBADF;
+
+	if (filp->f_op != &bpf_dumper_ops &&
+	    filp->f_inode->i_op != &bpfdump_dir_iops) {
+		err = -EINVAL;
+		goto done;
+	}
+
+	info_len = attr->info.info_len;
+	ubpf_dinfo = u64_to_user_ptr(attr->info.info);
+	err = bpf_check_uarg_tail_zero(ubpf_dinfo, sizeof(bpf_dinfo),
+				       info_len);
+	if (err)
+		goto done;
+	info_len = min_t(u32, sizeof(bpf_dinfo), info_len);
+
+	memset(&bpf_dinfo, 0, sizeof(bpf_dinfo));
+	if (copy_from_user(&bpf_dinfo, ubpf_dinfo, info_len)) {
+		err = -EFAULT;
+		goto done;
+	}
+
+	/* copy prog_id for dumpers */
+	if (filp->f_op == &bpf_dumper_ops) {
+		i_info = filp->f_inode->i_private;
+		bpf_dinfo.prog_id = i_info->prog->aux->id;
+		tinfo = i_info->tinfo;
+	} else {
+		tinfo = filp->f_inode->i_private;
+	}
+
+	prog_ctx_type_name = tinfo->prog_ctx_type_name;
+
+	tname_len = strlen(prog_ctx_type_name) + 1;
+	if (bpf_dinfo.type_name_buf_len < tname_len) {
+		err = -ENOSPC;
+		goto done;
+	}
+
+	/* copy prog_ctx_type_name */
+	tname_buf = u64_to_user_ptr(bpf_dinfo.prog_ctx_type_name);
+	if (copy_to_user(tname_buf, prog_ctx_type_name, tname_len)) {
+		err = -EFAULT;
+		goto done;
+	}
+
+	/* copy potentially updated bpf_dinfo and info_len */
+	if (copy_to_user(ubpf_dinfo, &bpf_dinfo, info_len) ||
+	    put_user(info_len, &uattr->info.info_len))
+		err = -EFAULT;
+
+done:
+	fdput(qfd);
+	return err;
+}
+
 #ifdef CONFIG_PROC_FS
 static void dumper_show_fdinfo(struct seq_file *m, struct file *filp)
 {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e6a4514435c4..1cde78e53a17 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3358,7 +3358,7 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 	else if (f.file->f_op == &btf_fops)
 		err = bpf_btf_get_info_by_fd(f.file->private_data, attr, uattr);
 	else
-		err = -EINVAL;
+		err = bpf_dump_query(attr, uattr);
 
 	fdput(f);
 	return err;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 75f3657d526c..856e3f8a63b8 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -533,7 +533,10 @@ union bpf_attr {
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
-		__u32		bpf_fd;
+		union {
+			__u32		bpf_fd;
+			__u32		dump_query_fd;
+		};
 		__u32		info_len;
 		__aligned_u64	info;
 	} info;
@@ -3618,6 +3621,12 @@ struct bpf_btf_info {
 	__u32 id;
 } __attribute__((aligned(8)));
 
+struct bpf_dump_info {
+	__aligned_u64 prog_ctx_type_name;
+	__u32 type_name_buf_len;
+	__u32 prog_id;
+} __attribute__((aligned(8)));
+
 /* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
  * by user and intended to be used by socket (e.g. to bind to, depends on
  * attach attach type).
-- 
2.24.1



* [RFC PATCH bpf-next v2 13/17] tools/libbpf: libbpf support for bpfdump
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Add a few libbpf APIs for pinning bpfdump programs.

Also, parse the dump program section name, retrieve
the dump target path, open that path to get a fd, and
assign the fd to prog->attach_target_fd.
The implementation is a bare minimum and hacky for now.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/bpf.c      |  9 +++-
 tools/lib/bpf/bpf.h      |  1 +
 tools/lib/bpf/libbpf.c   | 88 +++++++++++++++++++++++++++++++++++++---
 tools/lib/bpf/libbpf.h   |  3 ++
 tools/lib/bpf/libbpf.map |  2 +
 5 files changed, 95 insertions(+), 8 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 5cc1b0785d18..b23f11c53109 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -238,10 +238,15 @@ int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
 	if (attr.prog_type == BPF_PROG_TYPE_STRUCT_OPS ||
 	    attr.prog_type == BPF_PROG_TYPE_LSM) {
 		attr.attach_btf_id = load_attr->attach_btf_id;
-	} else if (attr.prog_type == BPF_PROG_TYPE_TRACING ||
-		   attr.prog_type == BPF_PROG_TYPE_EXT) {
+	} else if (attr.prog_type == BPF_PROG_TYPE_EXT) {
 		attr.attach_btf_id = load_attr->attach_btf_id;
 		attr.attach_prog_fd = load_attr->attach_prog_fd;
+	} else if (attr.prog_type == BPF_PROG_TYPE_TRACING) {
+		attr.attach_btf_id = load_attr->attach_btf_id;
+		if (attr.expected_attach_type == BPF_TRACE_DUMP)
+			attr.attach_target_fd = load_attr->attach_target_fd;
+		else
+			attr.attach_prog_fd = load_attr->attach_prog_fd;
 	} else {
 		attr.prog_ifindex = load_attr->prog_ifindex;
 		attr.kern_version = load_attr->kern_version;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 46d47afdd887..7f8d740afde9 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -81,6 +81,7 @@ struct bpf_load_program_attr {
 	union {
 		__u32 kern_version;
 		__u32 attach_prog_fd;
+		__u32 attach_target_fd;
 	};
 	union {
 		__u32 prog_ifindex;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ff9174282a8c..ad7726c0c1dc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -79,6 +79,7 @@ static struct bpf_program *bpf_object__find_prog_by_idx(struct bpf_object *obj,
 							int idx);
 static const struct btf_type *
 skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id);
+static int fill_dumper_info(struct bpf_program *prog);
 
 static int __base_pr(enum libbpf_print_level level, const char *format,
 		     va_list args)
@@ -229,6 +230,7 @@ struct bpf_program {
 	enum bpf_attach_type expected_attach_type;
 	__u32 attach_btf_id;
 	__u32 attach_prog_fd;
+	__u32 attach_target_fd;
 	void *func_info;
 	__u32 func_info_rec_size;
 	__u32 func_info_cnt;
@@ -2365,8 +2367,12 @@ static inline bool libbpf_prog_needs_vmlinux_btf(struct bpf_program *prog)
 	/* BPF_PROG_TYPE_TRACING programs which do not attach to other programs
 	 * also need vmlinux BTF
 	 */
-	if (prog->type == BPF_PROG_TYPE_TRACING && !prog->attach_prog_fd)
-		return true;
+	if (prog->type == BPF_PROG_TYPE_TRACING) {
+		if (prog->expected_attach_type == BPF_TRACE_DUMP)
+			return false;
+		if (!prog->attach_prog_fd)
+			return true;
+	}
 
 	return false;
 }
@@ -4870,10 +4876,15 @@ load_program(struct bpf_program *prog, struct bpf_insn *insns, int insns_cnt,
 	if (prog->type == BPF_PROG_TYPE_STRUCT_OPS ||
 	    prog->type == BPF_PROG_TYPE_LSM) {
 		load_attr.attach_btf_id = prog->attach_btf_id;
-	} else if (prog->type == BPF_PROG_TYPE_TRACING ||
-		   prog->type == BPF_PROG_TYPE_EXT) {
+	} else if (prog->type == BPF_PROG_TYPE_EXT) {
 		load_attr.attach_prog_fd = prog->attach_prog_fd;
 		load_attr.attach_btf_id = prog->attach_btf_id;
+	} else if (prog->type == BPF_PROG_TYPE_TRACING) {
+		load_attr.attach_btf_id = prog->attach_btf_id;
+		if (load_attr.expected_attach_type == BPF_TRACE_DUMP)
+			load_attr.attach_target_fd = prog->attach_target_fd;
+		else
+			load_attr.attach_prog_fd = prog->attach_prog_fd;
 	} else {
 		load_attr.kern_version = kern_version;
 		load_attr.prog_ifindex = prog->prog_ifindex;
@@ -4958,7 +4969,7 @@ int bpf_program__load(struct bpf_program *prog, char *license, __u32 kern_ver)
 {
 	int err = 0, fd, i, btf_id;
 
-	if ((prog->type == BPF_PROG_TYPE_TRACING ||
+	if (((prog->type == BPF_PROG_TYPE_TRACING && prog->expected_attach_type != BPF_TRACE_DUMP) ||
 	     prog->type == BPF_PROG_TYPE_LSM ||
 	     prog->type == BPF_PROG_TYPE_EXT) && !prog->attach_btf_id) {
 		btf_id = libbpf_find_attach_btf_id(prog);
@@ -5319,6 +5330,7 @@ static int bpf_object__resolve_externs(struct bpf_object *obj,
 
 int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
 {
+	struct bpf_program *prog;
 	struct bpf_object *obj;
 	int err, i;
 
@@ -5335,7 +5347,17 @@ int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
 
 	obj->loaded = true;
 
-	err = bpf_object__probe_caps(obj);
+	err = 0;
+	bpf_object__for_each_program(prog, obj) {
+		if (prog->type == BPF_PROG_TYPE_TRACING &&
+		    prog->expected_attach_type == BPF_TRACE_DUMP) {
+			err = fill_dumper_info(prog);
+			if (err)
+				break;
+		}
+	}
+
+	err = err ? : bpf_object__probe_caps(obj);
 	err = err ? : bpf_object__resolve_externs(obj, obj->kconfig);
 	err = err ? : bpf_object__sanitize_and_load_btf(obj);
 	err = err ? : bpf_object__sanitize_maps(obj);
@@ -6322,6 +6344,8 @@ static const struct bpf_sec_def section_defs[] = {
 		.is_attach_btf = true,
 		.expected_attach_type = BPF_LSM_MAC,
 		.attach_fn = attach_lsm),
+	SEC_DEF("dump/", TRACING,
+		.expected_attach_type = BPF_TRACE_DUMP),
 	BPF_PROG_SEC("xdp",			BPF_PROG_TYPE_XDP),
 	BPF_PROG_SEC("perf_event",		BPF_PROG_TYPE_PERF_EVENT),
 	BPF_PROG_SEC("lwt_in",			BPF_PROG_TYPE_LWT_IN),
@@ -6401,6 +6425,58 @@ static const struct bpf_sec_def *find_sec_def(const char *sec_name)
 	return NULL;
 }
 
+static int fill_dumper_info(struct bpf_program *prog)
+{
+	const struct bpf_sec_def *sec;
+	const char *dump_target;
+	int fd;
+
+	sec = find_sec_def(bpf_program__title(prog, false));
+	if (sec) {
+		dump_target = bpf_program__title(prog, false) + sec->len;
+		fd = open(dump_target, O_RDONLY);
+		if (fd < 0)
+			return fd;
+		prog->attach_target_fd = fd;
+	}
+	return 0;
+}
+
+int bpf_dump__pin(struct bpf_program *prog, const char *dname)
+{
+	int len, prog_fd = bpf_program__fd(prog);
+	const struct bpf_sec_def *sec;
+	const char *dump_target;
+	char *name_buf;
+	int err;
+
+	if (dname[0] == '/')
+		return bpf_obj_pin(prog_fd, dname);
+
+	sec = find_sec_def(bpf_program__title(prog, false));
+	if (!sec)
+		return bpf_obj_pin(prog_fd, dname);
+
+	dump_target = bpf_program__title(prog, false) + sec->len;
+	len = strlen(dump_target) + strlen(dname) + 2;
+	name_buf = malloc(len);
+	if (!name_buf)
+		return -ENOMEM;
+
+	strcpy(name_buf, dump_target);
+	strcat(name_buf, "/");
+	strcat(name_buf, dname);
+
+	err = bpf_obj_pin(prog_fd, name_buf);
+	free(name_buf);
+	return err;
+}
+
+int bpf_dump__unpin(struct bpf_program *prog, const char *dname)
+{
+	return -EINVAL;
+}
+
 static char *libbpf_get_type_names(bool attach_type)
 {
 	int i, len = ARRAY_SIZE(section_defs) * MAX_TYPE_NAME_SIZE;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 44df1d3e7287..e0d31e93d21c 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -217,6 +217,9 @@ LIBBPF_API int bpf_program__pin(struct bpf_program *prog, const char *path);
 LIBBPF_API int bpf_program__unpin(struct bpf_program *prog, const char *path);
 LIBBPF_API void bpf_program__unload(struct bpf_program *prog);
 
+LIBBPF_API int bpf_dump__pin(struct bpf_program *prog, const char *dname);
+LIBBPF_API int bpf_dump__unpin(struct bpf_program *prog, const char *dname);
+
 struct bpf_link;
 
 LIBBPF_API struct bpf_link *bpf_link__open(const char *path);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index bb8831605b25..0beb70bfe65a 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -238,6 +238,8 @@ LIBBPF_0.0.7 {
 
 LIBBPF_0.0.8 {
 	global:
+		bpf_dump__pin;
+		bpf_dump__unpin;
 		bpf_link__fd;
 		bpf_link__open;
 		bpf_link__pin;
-- 
2.24.1



* [RFC PATCH bpf-next v2 14/17] tools/bpftool: add bpf dumper support
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Implement the "bpftool dumper" command with two subcommands:
  bpftool dumper pin <bpf_prog.o> <dumper_name>
  bpftool dumper show {target|dumper}

The "pin" subcommand creates a dumper named <dumper_name>
under the dump target (specified in <bpf_prog.o>).
The "show target" subcommand shows the bpf prog ctx type name
for each target, which is useful when writing the bpf program;
"show dumper" additionally shows the corresponding prog_id
for each dumper.

For example, with some of the later selftest dumpers pinned
in the kernel, we can inspect them as below:
  $ bpftool dumper show target
  target                  prog_ctx_type
  task                    bpfdump__task
  task/file               bpfdump__task_file
  bpf_map                 bpfdump__bpf_map
  ipv6_route              bpfdump__ipv6_route
  netlink                 bpfdump__netlink
  $ bpftool dumper show dumper
  dumper                  prog_id   prog_ctx_type
  task/my1                8         bpfdump__task
  task/file/my1           12        bpfdump__task_file
  bpf_map/my1             4         bpfdump__bpf_map
  ipv6_route/my2          16        bpfdump__ipv6_route
  netlink/my2             24        bpfdump__netlink
  netlink/my3             20        bpfdump__netlink

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/bpf/bpftool/dumper.c | 135 +++++++++++++++++++++++++++++++++++++
 tools/bpf/bpftool/main.c   |   3 +-
 tools/bpf/bpftool/main.h   |   1 +
 3 files changed, 138 insertions(+), 1 deletion(-)
 create mode 100644 tools/bpf/bpftool/dumper.c

diff --git a/tools/bpf/bpftool/dumper.c b/tools/bpf/bpftool/dumper.c
new file mode 100644
index 000000000000..46ca9d1d9a67
--- /dev/null
+++ b/tools/bpf/bpftool/dumper.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+// Copyright (C) 2020 Facebook
+// Author: Yonghong Song <yhs@fb.com>
+
+#define _GNU_SOURCE
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <ftw.h>
+
+#include <linux/err.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "main.h"
+
+static int do_pin(int argc, char **argv)
+{
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	const char *objfile, *dname;
+	int err;
+
+	if (!REQ_ARGS(2)) {
+		usage();
+		return -1;
+	}
+
+	objfile = GET_ARG();
+	dname = GET_ARG();
+
+	obj = bpf_object__open(objfile);
+	if (IS_ERR_OR_NULL(obj))
+		return -1;
+
+	err = bpf_object__load(obj);
+	if (err < 0)
+		return -1;
+
+	prog = bpf_program__next(NULL, obj);
+	return bpf_dump__pin(prog, dname);
+}
+
+static bool for_targets;
+static const char *bpfdump_root = "/sys/kernel/bpfdump";
+
+static int check_file(const char *fpath, const struct stat *sb,
+		      int typeflag, struct FTW *ftwbuf)
+{
+	char prog_ctx_cname_buf[64];
+	struct bpf_dump_info info;
+	unsigned info_len;
+	const char *name;
+	int ret, fd;
+
+	if ((for_targets && typeflag == FTW_F) ||
+	    (!for_targets && typeflag == FTW_D))
+		return 0;
+
+	if (for_targets && strcmp(fpath, bpfdump_root) == 0)
+		return 0;
+
+	fd = open(fpath, O_RDONLY);
+	if (fd < 0)
+		return fd;
+
+	info_len = sizeof(info);
+	memset(&info, 0, info_len);
+	info.prog_ctx_type_name  = ptr_to_u64(prog_ctx_cname_buf);
+	info.type_name_buf_len = sizeof(prog_ctx_cname_buf);
+	ret = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+	if (ret < 0)
+		goto done;
+
+	name = fpath + strlen(bpfdump_root) + 1;
+	if (for_targets)
+		fprintf(stdout, "%-24s%-24s\n", name, prog_ctx_cname_buf);
+	else
+		fprintf(stdout, "%-24s%-10d%-24s\n", name, info.prog_id,
+			prog_ctx_cname_buf);
+
+done:
+	close(fd);
+	return ret;
+}
+
+static int do_show(int argc, char **argv)
+{
+	int flags = FTW_PHYS;
+	int nopenfd = 16;
+	const char *spec;
+
+	if (!REQ_ARGS(1)) {
+		usage();
+		return -1;
+	}
+
+	spec = GET_ARG();
+	if (strcmp(spec, "target") == 0) {
+		for_targets = true;
+		fprintf(stdout, "target                  prog_ctx_type\n");
+	} else if (strcmp(spec, "dumper") == 0) {
+		fprintf(stdout, "dumper                  prog_id   prog_ctx_type\n");
+		for_targets = false;
+	} else {
+		return -1;
+	}
+
+	if (nftw(bpfdump_root, check_file, nopenfd, flags) == -1)
+		return -1;
+
+	return 0;
+}
+
+static int do_help(int argc, char **argv)
+{
+	return 0;
+}
+
+static const struct cmd cmds[] = {
+	{ "help",	do_help },
+	{ "show",	do_show },
+	{ "pin",	do_pin },
+	{ 0 }
+};
+
+int do_dumper(int argc, char **argv)
+{
+	return cmd_select(cmds, argc, argv, do_help);
+}
diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 466c269eabdd..8489aba6543d 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -58,7 +58,7 @@ static int do_help(int argc, char **argv)
 		"       %s batch file FILE\n"
 		"       %s version\n"
 		"\n"
-		"       OBJECT := { prog | map | cgroup | perf | net | feature | btf | gen | struct_ops }\n"
+		"       OBJECT := { prog | map | cgroup | perf | net | feature | btf | gen | struct_ops | dumper}\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
 		bin_name, bin_name, bin_name);
@@ -222,6 +222,7 @@ static const struct cmd cmds[] = {
 	{ "btf",	do_btf },
 	{ "gen",	do_gen },
 	{ "struct_ops",	do_struct_ops },
+	{ "dumper",	do_dumper },
 	{ "version",	do_version },
 	{ 0 }
 };
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 86f14ce26fd7..2c59f319bbe9 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -162,6 +162,7 @@ int do_feature(int argc, char **argv);
 int do_btf(int argc, char **argv);
 int do_gen(int argc, char **argv);
 int do_struct_ops(int argc, char **argv);
+int do_dumper(int argc, char **argv);
 
 int parse_u32_arg(int *argc, char ***argv, __u32 *val, const char *what);
 int prog_parse_fd(int *argc, char ***argv);
-- 
2.24.1



* [RFC PATCH bpf-next v2 15/17] tools/bpf: selftests: add dumper programs for ipv6_route and netlink
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch adds two bpf dumper programs, one for the netlink target
and one for the ipv6_route target. On my VM, their output is
identical to /proc/net/netlink and /proc/net/ipv6_route.

  $ cat /proc/net/netlink
  sk               Eth Pid        Groups   Rmem     Wmem     Dump  Locks    Drops    Inode
  000000002c42d58b 0   0          00000000 0        0        0     2        0        7
  00000000a4e8b5e1 0   1          00000551 0        0        0     2        0        18719
  00000000e1b1c195 4   0          00000000 0        0        0     2        0        16422
  000000007e6b29f9 6   0          00000000 0        0        0     2        0        16424
  ....
  00000000159a170d 15  1862       00000002 0        0        0     2        0        1886
  000000009aca4bc9 15  3918224839 00000002 0        0        0     2        0        19076
  00000000d0ab31d2 15  1          00000002 0        0        0     2        0        18683
  000000008398fb08 16  0          00000000 0        0        0     2        0        27
  $ cat /sys/kernel/bpfdump/netlink/my1
  sk               Eth Pid        Groups   Rmem     Wmem     Dump  Locks    Drops    Inode
  000000002c42d58b 0   0          00000000 0        0        0     2        0        7
  00000000a4e8b5e1 0   1          00000551 0        0        0     2        0        18719
  00000000e1b1c195 4   0          00000000 0        0        0     2        0        16422
  000000007e6b29f9 6   0          00000000 0        0        0     2        0        16424
  ....
  00000000159a170d 15  1862       00000002 0        0        0     2        0        1886
  000000009aca4bc9 15  3918224839 00000002 0        0        0     2        0        19076
  00000000d0ab31d2 15  1          00000002 0        0        0     2        0        18683
  000000008398fb08 16  0          00000000 0        0        0     2        0        27

  $ cat /proc/net/ipv6_route
  fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
  00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
  00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
  fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
  ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001     eth0
  00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
  $ cat /sys/kernel/bpfdump/ipv6_route/my1
  fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001     eth0
  00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo
  00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001       lo
  fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001     eth0
  ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001     eth0
  00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200       lo

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/progs/bpfdump_ipv6_route.c  | 71 ++++++++++++++++
 .../selftests/bpf/progs/bpfdump_netlink.c     | 80 +++++++++++++++++++
 2 files changed, 151 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_ipv6_route.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_netlink.c

diff --git a/tools/testing/selftests/bpf/progs/bpfdump_ipv6_route.c b/tools/testing/selftests/bpf/progs/bpfdump_ipv6_route.c
new file mode 100644
index 000000000000..ff187577e2b5
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpfdump_ipv6_route.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_endian.h>
+
+char _license[] SEC("license") = "GPL";
+
+extern bool CONFIG_IPV6_SUBTREES __kconfig __weak;
+
+#define	RTF_GATEWAY		0x0002
+#define IFNAMSIZ		16
+#define fib_nh_gw_family        nh_common.nhc_gw_family
+#define fib_nh_gw6              nh_common.nhc_gw.ipv6
+#define fib_nh_dev              nh_common.nhc_dev
+
+SEC("dump//sys/kernel/bpfdump/ipv6_route")
+int dump_ipv6_route(struct bpfdump__ipv6_route *ctx)
+{
+	static const char fmt1[] = "%pi6 %02x ";
+	static const char fmt2[] = "%pi6 ";
+	static const char fmt3[] = "00000000000000000000000000000000 ";
+	static const char fmt4[] = "%08x %08x ";
+	static const char fmt5[] = "%8s\n";
+	static const char fmt6[] = "\n";
+	static const char fmt7[] = "00000000000000000000000000000000 00 ";
+	struct seq_file *seq = ctx->meta->seq;
+	struct fib6_info *rt = ctx->rt;
+	const struct net_device *dev;
+	struct fib6_nh *fib6_nh;
+	unsigned int flags;
+	struct nexthop *nh;
+
+	if (rt == (void *)0)
+		return 0;
+
+	fib6_nh = &rt->fib6_nh[0];
+	flags = rt->fib6_flags;
+
+	/* FIXME: nexthop_is_multipath is not handled here. */
+	nh = rt->nh;
+	if (rt->nh)
+		fib6_nh = &nh->nh_info->fib6_nh;
+
+	bpf_seq_printf(seq, fmt1, sizeof(fmt1), &rt->fib6_dst.addr,
+		       rt->fib6_dst.plen);
+
+	if (CONFIG_IPV6_SUBTREES)
+		bpf_seq_printf(seq, fmt1, sizeof(fmt1), &rt->fib6_src.addr,
+			       rt->fib6_src.plen);
+	else
+		bpf_seq_printf(seq, fmt7, sizeof(fmt7));
+
+	if (fib6_nh->fib_nh_gw_family) {
+		flags |= RTF_GATEWAY;
+		bpf_seq_printf(seq, fmt2, sizeof(fmt2), &fib6_nh->fib_nh_gw6);
+	} else {
+		bpf_seq_printf(seq, fmt3, sizeof(fmt3));
+	}
+
+	dev = fib6_nh->fib_nh_dev;
+	bpf_seq_printf(seq, fmt4, sizeof(fmt4), rt->fib6_metric, rt->fib6_ref.refs.counter);
+	bpf_seq_printf(seq, fmt4, sizeof(fmt4), 0, flags);
+	if (dev)
+		bpf_seq_printf(seq, fmt5, sizeof(fmt5), dev->name);
+	else
+		bpf_seq_printf(seq, fmt6, sizeof(fmt6));
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/bpfdump_netlink.c b/tools/testing/selftests/bpf/progs/bpfdump_netlink.c
new file mode 100644
index 000000000000..8a1aec0ba7d0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpfdump_netlink.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_endian.h>
+
+char _license[] SEC("license") = "GPL";
+
+#define sk_rmem_alloc	sk_backlog.rmem_alloc
+#define sk_refcnt	__sk_common.skc_refcnt
+
+#define offsetof(TYPE, MEMBER)  ((size_t)&((TYPE *)0)->MEMBER)
+#define container_of(ptr, type, member) ({                              \
+        void *__mptr = (void *)(ptr);                                   \
+        ((type *)(__mptr - offsetof(type, member))); })
+
+static inline struct inode *SOCK_INODE(struct socket *socket)
+{
+	return &container_of(socket, struct socket_alloc, socket)->vfs_inode;
+}
+
+SEC("dump//sys/kernel/bpfdump/netlink")
+int dump_netlink(struct bpfdump__netlink *ctx)
+{
+	static const char banner[] =
+		"sk               Eth Pid        Groups   "
+		"Rmem     Wmem     Dump  Locks    Drops    Inode\n";
+	static const char fmt1[] = "%pK %-3d ";
+	static const char fmt2[] = "%-10u %08x ";
+	static const char fmt3[] = "%-8d %-8d ";
+	static const char fmt4[] = "%-5d %-8d ";
+	static const char fmt5[] = "%-8u %-8lu\n";
+	struct seq_file *seq = ctx->meta->seq;
+	struct netlink_sock *nlk = ctx->sk;
+	unsigned long group, ino;
+	struct inode *inode;
+	struct socket *sk;
+	struct sock *s;
+
+	if (nlk == (void *)0)
+		return 0;
+
+	if (ctx->meta->seq_num == 0)
+		bpf_seq_printf(seq, banner, sizeof(banner));
+
+	s = &nlk->sk;
+	bpf_seq_printf(seq, fmt1, sizeof(fmt1), s, s->sk_protocol);
+
+	if (!nlk->groups)  {
+		group = 0;
+	} else {
+		/* FIXME: temporary use bpf_probe_read here, needs
+		 * verifier support to do direct access.
+		 */
+		bpf_probe_read(&group, sizeof(group), &nlk->groups[0]);
+	}
+	bpf_seq_printf(seq, fmt2, sizeof(fmt2), nlk->portid, (u32)group);
+
+
+	bpf_seq_printf(seq, fmt3, sizeof(fmt3), s->sk_rmem_alloc.counter,
+		       s->sk_wmem_alloc.refs.counter - 1);
+	bpf_seq_printf(seq, fmt4, sizeof(fmt4), nlk->cb_running,
+		       s->sk_refcnt.refs.counter);
+
+	sk = s->sk_socket;
+	if (!sk) {
+		ino = 0;
+	} else {
+		/* FIXME: container_of inside SOCK_INODE has a forced
+		 * type conversion, and direct access cannot be used
+		 * with current verifier.
+		 */
+		inode = SOCK_INODE(sk);
+		bpf_probe_read(&ino, sizeof(ino), &inode->i_ino);
+	}
+	bpf_seq_printf(seq, fmt5, sizeof(fmt5), s->sk_drops.counter, ino);
+
+	return 0;
+}
-- 
2.24.1



* [RFC PATCH bpf-next v2 16/17] tools/bpf: selftests: add dumper progs for bpf_map/task/task_file
From: Yonghong Song @ 2020-04-15 19:27 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

The implementations are arbitrary, just to show how bpf dumper
programs can be written for the bpf_map/task/task_file targets.
They can be customized for specific needs.

For example, for bpf_map, the dumper prints out:
  $ cat /sys/kernel/bpfdump/bpf_map/my1
      id   refcnt  usercnt  locked_vm
       3        2        0         20
       6        2        0         20
       9        2        0         20
      12        2        0         20
      13        2        0         20
      16        2        0         20
      19        2        0         20
      === END ===

For task, the dumper prints out:
  $ cat /sys/kernel/bpfdump/task/my1
    tgid      gid
       1        1
       2        2
    ....
    1944     1944
    1948     1948
    1949     1949
    1953     1953
    === END ===

For task/file, the dumper prints out:
  $ cat /sys/kernel/bpfdump/task/file/my1
    tgid      gid       fd      file
       1        1        0 ffffffff95c97600
       1        1        1 ffffffff95c97600
       1        1        2 ffffffff95c97600
    ....
    1895     1895      255 ffffffff95c8fe00
    1932     1932        0 ffffffff95c8fe00
    1932     1932        1 ffffffff95c8fe00
    1932     1932        2 ffffffff95c8fe00
    1932     1932        3 ffffffff95c185c0

This prints out all open files (fd and file->f_op), so the user can
compare f_op against a particular kernel file_operations structure
to find out what kind of file it is.
For example, from /proc/kallsyms, we can find
  ffffffff95c185c0 r eventfd_fops
so we will know tgid 1932 fd 3 is an eventfd file descriptor.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/progs/bpfdump_bpf_map.c     | 33 +++++++++++++++++++
 .../selftests/bpf/progs/bpfdump_task.c        | 29 ++++++++++++++++
 .../selftests/bpf/progs/bpfdump_task_file.c   | 30 +++++++++++++++++
 3 files changed, 92 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_bpf_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_task.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_task_file.c

diff --git a/tools/testing/selftests/bpf/progs/bpfdump_bpf_map.c b/tools/testing/selftests/bpf/progs/bpfdump_bpf_map.c
new file mode 100644
index 000000000000..94e97a5358e9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpfdump_bpf_map.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_endian.h>
+
+char _license[] SEC("license") = "GPL";
+
+SEC("dump//sys/kernel/bpfdump/bpf_map")
+int dump_bpf_map(struct bpfdump__bpf_map *ctx)
+{
+	static const char banner[] = "      id   refcnt  usercnt  locked_vm\n";
+	static const char footer[] = "      === END ===\n";
+	static const char fmt1[] = "%8u %8ld ";
+	static const char fmt2[] = "%8ld %10lu\n";
+	struct seq_file *seq = ctx->meta->seq;
+	__u64 seq_num = ctx->meta->seq_num;
+	struct bpf_map *map = ctx->map;
+
+	if (map == (void *)0) {
+		bpf_seq_printf(seq, footer, sizeof(footer));
+		return 0;
+	}
+
+	if (seq_num == 0)
+		bpf_seq_printf(seq, banner, sizeof(banner));
+
+	bpf_seq_printf(seq, fmt1, sizeof(fmt1), map->id, map->refcnt.counter);
+	bpf_seq_printf(seq, fmt2, sizeof(fmt2), map->usercnt.counter,
+		       map->memory.user->locked_vm.counter);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/bpfdump_task.c b/tools/testing/selftests/bpf/progs/bpfdump_task.c
new file mode 100644
index 000000000000..70c5b14934a2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpfdump_task.c
@@ -0,0 +1,29 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_endian.h>
+
+char _license[] SEC("license") = "GPL";
+
+SEC("dump//sys/kernel/bpfdump/task")
+int dump_tasks(struct bpfdump__task *ctx)
+{
+	static char const banner[] = "    tgid      gid\n";
+	static char const footer[] = "=== END ===\n";
+	static char const fmt[] = "%8d %8d\n";
+	struct seq_file *seq = ctx->meta->seq;
+	struct task_struct *task = ctx->task;
+
+	if (task == (void *)0) {
+		bpf_seq_printf(seq, footer, sizeof(footer));
+		return 0;
+	}
+
+	if (ctx->meta->seq_num == 0)
+		bpf_seq_printf(seq, banner, sizeof(banner));
+
+	bpf_seq_printf(seq, fmt, sizeof(fmt), task->tgid, task->pid);
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/bpfdump_task_file.c b/tools/testing/selftests/bpf/progs/bpfdump_task_file.c
new file mode 100644
index 000000000000..95e3ff6f8a06
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpfdump_task_file.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_endian.h>
+
+char _license[] SEC("license") = "GPL";
+
+SEC("dump//sys/kernel/bpfdump/task/file")
+int dump_tasks(struct bpfdump__task_file *ctx)
+{
+	static char const banner[] = "    tgid      pid       fd      file\n";
+	static char const fmt1[] = "%8d %8d";
+	static char const fmt2[] = " %8d %lx\n";
+	struct seq_file *seq = ctx->meta->seq;
+	struct task_struct *task = ctx->task;
+	__u32 fd = ctx->fd;
+	struct file *file = ctx->file;
+
+	if (task == (void *)0 || file == (void *)0)
+		return 0;
+
+	if (ctx->meta->seq_num == 0)
+		bpf_seq_printf(seq, banner, sizeof(banner));
+
+	bpf_seq_printf(seq, fmt1, sizeof(fmt1), task->tgid, task->pid);
+	bpf_seq_printf(seq, fmt2, sizeof(fmt2), fd, (long)file->f_op);
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next v2 17/17] tools/bpf: selftests: add a selftest for anonymous dumper
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (15 preceding siblings ...)
  2020-04-15 19:27 ` [RFC PATCH bpf-next v2 16/17] tools/bpf: selftests: add dumper progs for bpf_map/task/task_file Yonghong Song
@ 2020-04-15 19:28 ` Yonghong Song
  2020-04-16  2:23 ` [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures David Ahern
  17 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-15 19:28 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

The selftest creates an anonymous dumper for the
/sys/kernel/bpfdump/task/ target and ensures that
user space gets the expected contents. Both the
bpf_seq_printf() and bpf_seq_write() helpers
are exercised by this selftest.

  $ test_progs -n 2
  #2 bpfdump_test:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/prog_tests/bpfdump_test.c   | 42 +++++++++++++++++++
 .../selftests/bpf/progs/bpfdump_test_kern.c   | 31 ++++++++++++++
 2 files changed, 73 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/bpfdump_test.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpfdump_test_kern.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpfdump_test.c b/tools/testing/selftests/bpf/prog_tests/bpfdump_test.c
new file mode 100644
index 000000000000..8978e04c3ca9
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/bpfdump_test.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "bpfdump_test_kern.skel.h"
+
+void test_bpfdump_test(void)
+{
+	int err, prog_fd, dumper_fd, duration = 0;
+	struct bpfdump_test_kern *skel;
+	char buf[16] = {};
+	const char *expected = "0A1B2C3D";
+
+	skel = bpfdump_test_kern__open_and_load();
+	if (CHECK(!skel, "skel_open_and_load",
+		  "skeleton open_and_load failed\n"))
+		return;
+
+	prog_fd = bpf_program__fd(skel->progs.dump_tasks);
+	dumper_fd = bpf_raw_tracepoint_open(NULL, prog_fd);
+	if (CHECK(dumper_fd < 0, "bpf_raw_tracepoint_open",
+		  "anonymous dumper creation failed\n"))
+		goto destroy_skel;
+
+	err = -EINVAL;
+	while (read(dumper_fd, buf, sizeof(buf)) > 0) {
+		if (CHECK(!err, "read", "unexpected extra read\n"))
+			goto close_fd;
+
+		err = strcmp(buf, expected) != 0;
+		if (CHECK(err, "read",
+			  "read failed: buf %s, expected %s\n", buf,
+			  expected))
+			goto close_fd;
+	}
+
+	CHECK(err, "read", "read failed: no read, expected %s\n",
+	      expected);
+
+close_fd:
+	close(dumper_fd);
+destroy_skel:
+	bpfdump_test_kern__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/bpfdump_test_kern.c b/tools/testing/selftests/bpf/progs/bpfdump_test_kern.c
new file mode 100644
index 000000000000..f6bd61a75a22
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpfdump_test_kern.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_endian.h>
+
+char _license[] SEC("license") = "GPL";
+
+int count = 0;
+
+SEC("dump//sys/kernel/bpfdump/task")
+int dump_tasks(struct bpfdump__task *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	struct task_struct *task = ctx->task;
+	static char fmt[] = "%d";
+	char c;
+
+	if (task == (void *)0)
+		return 0;
+
+	if (count < 4) {
+		bpf_seq_printf(seq, fmt, sizeof(fmt), count);
+		c = 'A' + count;
+		bpf_seq_write(seq, &c, sizeof(c));
+		count++;
+	}
+
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures
  2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
                   ` (16 preceding siblings ...)
  2020-04-15 19:28 ` [RFC PATCH bpf-next v2 17/17] tools/bpf: selftests: add a selftest for anonymous dumper Yonghong Song
@ 2020-04-16  2:23 ` David Ahern
  2020-04-16  6:41   ` Yonghong Song
  2020-04-17 10:54   ` Alan Maguire
  17 siblings, 2 replies; 24+ messages in thread
From: David Ahern @ 2020-04-16  2:23 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

On 4/15/20 1:27 PM, Yonghong Song wrote:
> 
> As there are some discussions regarding the kernel interface/steps to
> create file/anonymous dumpers, I think it will be beneficial for
> discussion with this work in progress.
> 
> Motivation:
>   The current way to dump kernel data structures mostly:
>     1. /proc system
>     2. various specific tools like "ss" which requires kernel support.
>     3. drgn
>   The drawback for the first two is that whenever you want to dump more, you
>   need to change the kernel. For example, Martin wants to dump socket local

If kernel support is needed for bpfdump of kernel data structures, you
are not really solving the kernel support problem. i.e., to dump
ipv4_route's you need to modify the relevant proc show function.


>   storage with "ss". Kernel change is needed for it to work ([1]).
>   This is also the direct motivation for this work.
> 
>   drgn ([2]) solves this problem nicely and no kernel change is needed.
>   But since drgn is not able to verify the validity of a particular pointer value,
>   it might present the wrong results in rare cases.
> 
>   In this patch set, we introduce bpf based dumping. Initial kernel changes are
>   still needed, but a data structure change will not require kernel changes
>   any more. bpf program itself is used to adapt to new data structure
>   changes. This will give certain flexibility with guaranteed correctness.
> 
>   Here, kernel seq_ops is used to facilitate dumping, similar to current
>   /proc and many other lossless kernel dumping facilities.
> 
> User Interfaces:
>   1. A new mount file system, bpfdump at /sys/kernel/bpfdump is introduced.
>      Different from /sys/fs/bpf, this is a single user mount. Mount command
>      can be:
>         mount -t bpfdump bpfdump /sys/kernel/bpfdump
>   2. Kernel bpf dumpable data structures are represented as directories
>      under /sys/kernel/bpfdump, e.g.,
>        /sys/kernel/bpfdump/ipv6_route/
>        /sys/kernel/bpfdump/netlink/

The names of bpfdump fs entries do not match actual data structure names
- e.g., there is no ipv6_route struct. On the one hand that is a good
thing since structure names can change, but that also means a mapping is
needed between the dumper filesystem entries and what you get for context.

Further, what is the expectation in terms of stable API for these fs
entries? Entries in the context can change. Data structure names can
change. Entries in the structs can change. All of that breaks the idea
of stable programs that are compiled once and run for all future
releases. When structs change, those programs will break - and
structures will change.

What does bpfdumper provide that you can not do with a tracepoint on a
relevant function and then putting a program on the tracepoint? ie., why
not just put a tracepoint in the relevant dump functions.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures
  2020-04-16  2:23 ` [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures David Ahern
@ 2020-04-16  6:41   ` Yonghong Song
  2020-04-17 15:02     ` Alan Maguire
  2020-04-17 10:54   ` Alan Maguire
  1 sibling, 1 reply; 24+ messages in thread
From: Yonghong Song @ 2020-04-16  6:41 UTC (permalink / raw)
  To: David Ahern, Andrii Nakryiko, bpf, Martin KaFai Lau, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team



On 4/15/20 7:23 PM, David Ahern wrote:
> On 4/15/20 1:27 PM, Yonghong Song wrote:
>>
>> As there are some discussions regarding the kernel interface/steps to
>> create file/anonymous dumpers, I think it will be beneficial for
>> discussion with this work in progress.
>>
>> Motivation:
>>    The current way to dump kernel data structures mostly:
>>      1. /proc system
>>      2. various specific tools like "ss" which requires kernel support.
>>      3. drgn
>>    The drawback for the first two is that whenever you want to dump more, you
>>    need to change the kernel. For example, Martin wants to dump socket local
> 
> If kernel support is needed for bpfdump of kernel data structures, you
> are not really solving the kernel support problem. i.e., to dump
> ipv4_route's you need to modify the relevant proc show function.

Yes, as mentioned two paragraphs below, a kernel change is required.
The tradeoff is that this is a one-time investment: once the kernel
change is in place, printing new fields (in most cases, except for
fields which need additional locks, etc.) no longer requires any
kernel change.

> 
> 
>>    storage with "ss". Kernel change is needed for it to work ([1]).
>>    This is also the direct motivation for this work.
>>
>>    drgn ([2]) solves this problem nicely and no kernel change is needed.
>>    But since drgn is not able to verify the validity of a particular pointer value,
>>    it might present the wrong results in rare cases.
>>
>>    In this patch set, we introduce bpf based dumping. Initial kernel changes are
>>    still needed, but a data structure change will not require kernel changes
>>    any more. bpf program itself is used to adapt to new data structure
>>    changes. This will give certain flexibility with guaranteed correctness.
>>
>>    Here, kernel seq_ops is used to facilitate dumping, similar to current
>>    /proc and many other lossless kernel dumping facilities.
>>
>> User Interfaces:
>>    1. A new mount file system, bpfdump at /sys/kernel/bpfdump is introduced.
>>       Different from /sys/fs/bpf, this is a single user mount. Mount command
>>       can be:
>>          mount -t bpfdump bpfdump /sys/kernel/bpfdump
>>    2. Kernel bpf dumpable data structures are represented as directories
>>       under /sys/kernel/bpfdump, e.g.,
>>         /sys/kernel/bpfdump/ipv6_route/
>>         /sys/kernel/bpfdump/netlink/
> 
> The names of bpfdump fs entries do not match actual data structure names
> - e.g., there is no ipv6_route struct. On the one hand that is a good
> thing since structure names can change, but that also means a mapping is
> needed between the dumper filesystem entries and what you get for context.

Yes, the later bpftool patch implements a new command to dump such
information.

   $ bpftool dumper show target
   target                  prog_ctx_type
   task                    bpfdump__task
   task/file               bpfdump__task_file
   bpf_map                 bpfdump__bpf_map
   ipv6_route              bpfdump__ipv6_route
   netlink                 bpfdump__netlink

in vmlinux.h generated by vmlinux BTF, we have

struct bpf_dump_meta {
         struct seq_file *seq;
         u64 session_id;
         u64 seq_num;
};

struct bpfdump__ipv6_route {
         struct bpf_dump_meta *meta;
         struct fib6_info *rt;
};

Here, bpfdump__ipv6_route is the bpf program context type.
Users can write their bpf programs based on this type.
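To make that concrete, a minimal sketch of such a program (not part of
this patch set; the section path follows the convention used by the
selftests in the series, and the fib6_metric/fib6_flags field choices
are only illustrative) could look like:

```c
SEC("dump//sys/kernel/bpfdump/ipv6_route")
int dump_ipv6_route(struct bpfdump__ipv6_route *ctx)
{
	/* illustrative only: which fib6_info fields to print is up to
	 * the program author
	 */
	static char const fmt[] = "metric %u flags 0x%x\n";
	struct seq_file *seq = ctx->meta->seq;
	struct fib6_info *rt = ctx->rt;

	/* rt is NULL at the end of the dumping session */
	if (rt == (void *)0)
		return 0;

	bpf_seq_printf(seq, fmt, sizeof(fmt),
		       rt->fib6_metric, rt->fib6_flags);
	return 0;
}
```

Once such a program is pinned under the target directory, its output
would be available via a plain cat of the resulting file.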

> 
> Further, what is the expectation in terms of stable API for these fs
> entries? Entries in the context can change. Data structure names can
> change. Entries in the structs can change. All of that breaks the idea
> of stable programs that are compiled once and run for all future
> releases. When structs change, those programs will break - and
> structures will change.

Yes, the API (ctx) we present to the bpf program is indeed unstable.
CO-RE should help to a certain extent, but if some fields are gone,
the bpf program will need to be rewritten for that particular kernel
version, or the kernel bpfdump infrastructure can be enhanced to
change its ctx structure to provide more information to the program
for that kernel version. In summary, I agree with you that this is
an unstable API, similar to other tracing programs,
since it accesses kernel internal data structures.

> 
> What does bpfdumper provide that you can not do with a tracepoint on a
> relevant function and then putting a program on the tracepoint? ie., why
> not just put a tracepoint in the relevant dump functions.

When I first started exploring bpfdump, a kprobe on the "show" function
was one of the options. But we quickly realized that we actually do not
want to just piggyback on the "show" function; we want to replace it
with bpf. This will be useful in the following use cases:
   1. First, a catable dumper file, similar to /proc/net/ipv6_route:
      we want /sys/kernel/bpfdump/ipv6_route/my_dumper, which you can
      cat to get the output.

      Using a kprobe while doing `cat /proc/net/ipv6_route`
      is complicated.  You probably need an application which
      runs through `cat /proc/net/ipv6_route` and discards its output,
      and at the same time gets the result from the bpf program
      (filtered by pid, since somebody else may run
      `cat /proc/net/ipv6_route` at the same time). You might use
      a perf ring buffer to send the result back to the application.

      Note that a perf ring buffer may lose records for whatever
      reason, while seq_ops are implemented not to lose records,
      using built-in retries.

      The kprobe approach above is complicated, and for each dumper
      you need a dedicated application. We would like the dumper to be
      just catable, with minimum user overhead to create it.

   2. Second, an anonymous dumper. A kprobe/tracepoint would incur
      the original seq_printf overhead per object, but the user may
      be interested in only a very small portion of the information.
      In such cases, a bpf program doing the filtering directly in
      the kernel can potentially speed things up a lot when there are
      many records to traverse.

   3. Third, for data structures which do not have catable dumpers,
      for example task: hopefully, as demonstrated in this patch set,
      the kernel implementation and writing a bpf program are not
      too hard. This especially enables people to do in-kernel
      filtering, which is the strength of bpf.
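As a sketch of the in-kernel filtering in cases 2 and 3 (hypothetical,
not part of the series: it reuses the task target from the selftests,
and target_tgid is a made-up global that user space would set in the
program's data section before attaching):

```c
int target_tgid = 0; /* set from user space before attaching */

SEC("dump//sys/kernel/bpfdump/task")
int dump_filtered_tasks(struct bpfdump__task *ctx)
{
	static char const fmt[] = "%8d %8d\n";
	struct seq_file *seq = ctx->meta->seq;
	struct task_struct *task = ctx->task;

	/* task is NULL at the end of the dumping session */
	if (task == (void *)0)
		return 0;

	/* in-kernel filtering: emit only the tasks we care about,
	 * instead of sending every record to user space
	 */
	if (target_tgid && task->tgid != target_tgid)
		return 0;

	bpf_seq_printf(seq, fmt, sizeof(fmt), task->tgid, task->pid);
	return 0;
}
```

Only the matching records ever reach the seq_file, so user space pays
for exactly the data it asked for.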


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures
  2020-04-16  2:23 ` [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures David Ahern
  2020-04-16  6:41   ` Yonghong Song
@ 2020-04-17 10:54   ` Alan Maguire
  2020-04-19  5:30     ` Yonghong Song
  1 sibling, 1 reply; 24+ messages in thread
From: Alan Maguire @ 2020-04-17 10:54 UTC (permalink / raw)
  To: David Ahern
  Cc: Yonghong Song, Andrii Nakryiko, bpf, Martin KaFai Lau, netdev,
	Alexei Starovoitov, Daniel Borkmann, kernel-team

On Wed, 15 Apr 2020, David Ahern wrote:

> On 4/15/20 1:27 PM, Yonghong Song wrote:
> > 
> > As there are some discussions regarding the kernel interface/steps to
> > create file/anonymous dumpers, I think it will be beneficial for
> > discussion with this work in progress.
> > 
> > Motivation:
> >   The current way to dump kernel data structures mostly:
> >     1. /proc system
> >     2. various specific tools like "ss" which requires kernel support.
> >     3. drgn
> >   The drawback for the first two is that whenever you want to dump more, you
> >   need to change the kernel. For example, Martin wants to dump socket local
> 
> If kernel support is needed for bpfdump of kernel data structures, you
> are not really solving the kernel support problem. i.e., to dump
> ipv4_route's you need to modify the relevant proc show function.
>

I need to dig into this patchset a bit more, but if there is
a need for in-kernel BTF-based structure dumping I've got a
work-in-progress patchset that does this by generalizing the code
that deals with seq output in the verifier. I've posted it
as an RFC in case it has anything useful to offer here:

https://lore.kernel.org/bpf/1587120160-3030-1-git-send-email-alan.maguire@oracle.com/T/#t

The idea is that by using different callback function we can achieve
seq, snprintf or other output in-kernel using the kernel BTF data. 
I created one consumer as a proof-of-concept; it's a printk pointer 
format specifier.  Since the dump format is determined in kernel
it's a bit constrained format-wise, but may be good enough for
some cases.

To give a flavour for what the printed-out data looks like,
here we use pr_info() to display a struct sk_buff *.  Note
we specify the 'N' modifier to show type field names:

  struct sk_buff *skb = alloc_skb(64, GFP_KERNEL);

  pr_info("%pTN<struct sk_buff>", skb);

...gives us:

{{{.next=00000000c7916e9c,.prev=00000000c7916e9c,{.dev=00000000c7916e9c|.dev_scratch=0}}|.rbnode={.__rb_parent_color=0,.rb_right=00000000c7916e9c,.rb_left=00000000c7916e9c}|.list={.next=00000000c7916e9c,.prev=00000000c7916e9c}},{.sk=00000000c7916e9c|.ip_defrag_offset=0},{.tstamp=0|.skb_mstamp_ns=0},.cb=['\0'],{{._skb_refdst=0,.destructor=00000000c7916e9c}|.tcp_tsorted_anchor={.next=00000000c7916e9c,.prev=00000000c7916e9c}},._nfct=0,.len=0,.data_len=0,.mac_len=0,.hdr_len=0,.queue_mapping=0,.__cloned_offset=[],.cloned=0x0,.nohdr=0x0,.fclone=0x0,.peeked=0x0,.head_frag=0x0,.pfmemalloc=0x0,.active_extensions=0,.headers_start=[],.__pkt_type_offset=[],.pkt_type=0x0,.ignore_df=0x0,.nf_trace=0x0,.ip_summed=0x0,.ooo_okay=0x0,.l4_hash=0x0,.sw_hash=0x0,.wifi_acked_valid=0x0,.wifi_acked=0x0,.no_fcs=0x0,.encapsulation=0x0,.encap_hdr_csum=0x0,.csum_valid=0x0,.__pkt_vlan_present_offset=[],.vlan_present=0x0,.csum_complete_sw=0x0,.csum_level=0x0,.csum_not_inet=0x0,.dst_pending_co

[printk output is truncated at 1024 bytes, but more
compact output can be achieved by not specifying 'N'
for type names. I may need to add a specifier to avoid
pointer obfuscation]

With a printk format specifier, trace_printk() in BPF then
inherits this dumping behaviour for free, but I think it
would also be possible to add a helper so that the type
name didn't have to be specified.  The verifier could insert
BTF ids and type data could be dumped for tracing arguments
via a flavour of bpf_perf_event_output() helper or similar.
To be clear I haven't done any of that yet in the RFC patchset,
but it seems feasible at least.

Anyway perhaps there's something useful in it which can help
towards the goal of easier dumping of data structures.

I'll spend some time over the weekend looking at the
BTF dumper patchset; apologies I haven't got very far
with it yet.

Thanks!

Alan

> 
> >   storage with "ss". Kernel change is needed for it to work ([1]).
> >   This is also the direct motivation for this work.
> > 
> >   drgn ([2]) solves this problem nicely and no kernel change is needed.
> >   But since drgn is not able to verify the validity of a particular pointer value,
> >   it might present the wrong results in rare cases.
> > 
> >   In this patch set, we introduce bpf based dumping. Initial kernel changes are
> >   still needed, but a data structure change will not require kernel changes
> >   any more. bpf program itself is used to adapt to new data structure
> >   changes. This will give certain flexibility with guaranteed correctness.
> > 
> >   Here, kernel seq_ops is used to facilitate dumping, similar to current
> >   /proc and many other lossless kernel dumping facilities.
> > 
> > User Interfaces:
> >   1. A new mount file system, bpfdump at /sys/kernel/bpfdump is introduced.
> >      Different from /sys/fs/bpf, this is a single user mount. Mount command
> >      can be:
> >         mount -t bpfdump bpfdump /sys/kernel/bpfdump
> >   2. Kernel bpf dumpable data structures are represented as directories
> >      under /sys/kernel/bpfdump, e.g.,
> >        /sys/kernel/bpfdump/ipv6_route/
> >        /sys/kernel/bpfdump/netlink/
> 
> The names of bpfdump fs entries do not match actual data structure names
> - e.g., there is no ipv6_route struct. On the one hand that is a good
> thing since structure names can change, but that also means a mapping is
> needed between the dumper filesystem entries and what you get for context.
> 
> Further, what is the expectation in terms of stable API for these fs
> entries? Entries in the context can change. Data structure names can
> change. Entries in the structs can change. All of that breaks the idea
> of stable programs that are compiled once and run for all future
> releases. When structs change, those programs will break - and
> structures will change.
> 
> What does bpfdumper provide that you can not do with a tracepoint on a
> relevant function and then putting a program on the tracepoint? ie., why
> not just put a tracepoint in the relevant dump functions.
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures
  2020-04-16  6:41   ` Yonghong Song
@ 2020-04-17 15:02     ` Alan Maguire
  2020-04-19  5:34       ` Yonghong Song
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Maguire @ 2020-04-17 15:02 UTC (permalink / raw)
  To: Yonghong Song
  Cc: David Ahern, Andrii Nakryiko, bpf, Martin KaFai Lau, netdev,
	Alexei Starovoitov, Daniel Borkmann, kernel-team

On Wed, 15 Apr 2020, Yonghong Song wrote:

> 
> 
> On 4/15/20 7:23 PM, David Ahern wrote:
> > On 4/15/20 1:27 PM, Yonghong Song wrote:
> >>
> >> As there are some discussions regarding the kernel interface/steps to
> >> create file/anonymous dumpers, I think it will be beneficial for
> >> discussion with this work in progress.
> >>
> >> Motivation:
> >>    The current way to dump kernel data structures mostly:
> >>      1. /proc system
> >>      2. various specific tools like "ss" which requires kernel support.
> >>      3. drgn
> >>    The drawback for the first two is that whenever you want to dump more,
> >>    you
> >>    need to change the kernel. For example, Martin wants to dump socket local
> > 
> > If kernel support is needed for bpfdump of kernel data structures, you
> > are not really solving the kernel support problem. i.e., to dump
> > ipv4_route's you need to modify the relevant proc show function.
> 
> Yes, as mentioned two paragraphs below. kernel change is required.
> The tradeoff is that this is a one-time investment. Once kernel change
> is in place, printing new fields (in most cases except new fields
> which need additional locks etc.) no need for kernel change any more.
>

One thing I struggled with initially when reading the cover
letter was understanding how BPF dumper programs get run.
Patch 7 deals with that, I think, and the answer seems to be to
create additional seq file infrastructure alongside the existing
one, which executes the BPF dumper programs where appropriate.
Have I got this right? I guess more lightweight methods,
such as instrumenting functions associated with an existing /proc
dumper, are a bit too messy?

Thanks!

Alan

> > 
> > 
> >>    storage with "ss". Kernel change is needed for it to work ([1]).
> >>    This is also the direct motivation for this work.
> >>
> >>    drgn ([2]) solves this problem nicely and no kernel change is needed.
> >>    But since drgn is not able to verify the validity of a particular
> >>    pointer value,
> >>    it might present the wrong results in rare cases.
> >>
> >>    In this patch set, we introduce bpf based dumping. Initial kernel
> >>    changes are
> >>    still needed, but a data structure change will not require kernel
> >>    changes
> >>    any more. bpf program itself is used to adapt to new data structure
> >>    changes. This will give certain flexibility with guaranteed correctness.
> >>
> >>    Here, kernel seq_ops is used to facilitate dumping, similar to current
> >>    /proc and many other lossless kernel dumping facilities.
> >>
> >> User Interfaces:
> >>    1. A new mount file system, bpfdump at /sys/kernel/bpfdump is
> >>    introduced.
> >>       Different from /sys/fs/bpf, this is a single user mount. Mount
> >>       command
> >>       can be:
> >>          mount -t bpfdump bpfdump /sys/kernel/bpfdump
> >>    2. Kernel bpf dumpable data structures are represented as directories
> >>       under /sys/kernel/bpfdump, e.g.,
> >>         /sys/kernel/bpfdump/ipv6_route/
> >>         /sys/kernel/bpfdump/netlink/
> > 
> > The names of bpfdump fs entries do not match actual data structure names
> > - e.g., there is no ipv6_route struct. On the one hand that is a good
> > thing since structure names can change, but that also means a mapping is
> > needed between the dumper filesystem entries and what you get for context.
> 
> Yes, the later bpftool patch implements a new command to dump such
> information.
> 
>   $ bpftool dumper show target
>   target                  prog_ctx_type
>   task                    bpfdump__task
>   task/file               bpfdump__task_file
>   bpf_map                 bpfdump__bpf_map
>   ipv6_route              bpfdump__ipv6_route
>   netlink                 bpfdump__netlink
> 
> in vmlinux.h generated by vmlinux BTF, we have
> 
> struct bpf_dump_meta {
>         struct seq_file *seq;
>         u64 session_id;
>         u64 seq_num;
> };
> 
> struct bpfdump__ipv6_route {
>         struct bpf_dump_meta *meta;
>         struct fib6_info *rt;
> };
> 
> Here, bpfdump__ipv6_route is the bpf program context type.
> User can based on this to write the bpf program.
> 
> > 
> > Further, what is the expectation in terms of stable API for these fs
> > entries? Entries in the context can change. Data structure names can
> > change. Entries in the structs can change. All of that breaks the idea
> > of stable programs that are compiled once and run for all future
> > releases. When structs change, those programs will break - and
> > structures will change.
> 
> Yes, the API (ctx) we presented to bpf program is indeed unstable.
> CO-RE should help to certain extend but if some fields are gone, e.g.,
> bpf program will need to be rewritten for that particular kernel version, or
> kernel bpfdump infrastructure can be enhanced to
> change its ctx structure to have more information to the program
> for that kernel version. In summary, I agree with you that this is
> an unstable API similar to other tracing program
> since it accesses kernel internal data structures.
> 
> > 
> > What does bpfdumper provide that you can not do with a tracepoint on a
> > relevant function and then putting a program on the tracepoint? ie., why
> > not just put a tracepoint in the relevant dump functions.
> 
> In my very beginning to explore bpfdump, kprobe to "show" function is
> one of options. But quickly we realized that we actually do not want
> to just piggyback on "show" function, but want to replace it with
> bpf. This will be useful in following different use cases:
>   1. first catable dumper file, similar to /proc/net/ipv6_route,
>      we want /sys/kernel/bpfdump/ipv6_route/my_dumper and you can cat
>      to get it.
> 
>      Using kprobe when you are doing `cat /proc/net/ipv6_route`
>      is complicated.  You probably need an application which
>      runs through `cat /proc/net/ipv6_route` and discard its output,
>      and at the same time gets the result from bpf program
>      (filtered by pid since somebody may run
>      `cat /proc/net/ipv6_route` at the same time. You may use
>      perf ring_buffer to send the result back to the application.
> 
>      note that perf ring buffer may lose records for whatever
>      reason and seq_ops are implemented not to lose records
>      by built-in retries.
> 
>      Using kprobe approach above is complicated and for each dumper
>      you need an application. We would like it to be just catable
>      with minimum user overhead to create such a dumper.
> 
>   2. second, anonymous dumper, kprobe/tracepoint will incur
>      original overhead of seq_printf per object. but user may
>      be only interested in a very small portion of information.
>      In such cases, bpf program directly doing filtering in
>      the kernel can potentially speed up a lot if there are a lot of
>      records to traverse.
> 
>   3. for data structures which do not have catable dumpers
>      for example task, hopefully, as demonstrated in this patch set,
>      kernel implementation and writing a bpf program are not
>      too hard. This especially enables people to do in-kernel
>      filtering which is the strength of the bpf.
> 
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures
  2020-04-17 10:54   ` Alan Maguire
@ 2020-04-19  5:30     ` Yonghong Song
  0 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-19  5:30 UTC (permalink / raw)
  To: Alan Maguire, David Ahern
  Cc: Andrii Nakryiko, bpf, Martin KaFai Lau, netdev,
	Alexei Starovoitov, Daniel Borkmann, kernel-team



On 4/17/20 3:54 AM, Alan Maguire wrote:
> On Wed, 15 Apr 2020, David Ahern wrote:
> 
>> On 4/15/20 1:27 PM, Yonghong Song wrote:
>>>
>>> As there are some discussions regarding to the kernel interface/steps to
>>> create file/anonymous dumpers, I think it will be beneficial for
>>> discussion with this work in progress.
>>>
>>> Motivation:
>>>    The current way to dump kernel data structures mostly:
>>>      1. /proc system
>>>      2. various specific tools like "ss" which requires kernel support.
>>>      3. drgn
>>>    The drawback for the first two is that whenever you want to dump more, you
>>>    need to change the kernel. For example, Martin wants to dump socket local
>>
>> If kernel support is needed for bpfdump of kernel data structures, you
>> are not really solving the kernel support problem. i.e., to dump
>> ipv4_route's you need to modify the relevant proc show function.
>>
> 
> I need to dig into this patchset a bit more, but if there is
> a need for in-kernel BTF-based structure dumping I've got a
> work-in-progress patchset that does this by generalizing the code
> that  deals with seq output in the verifier. I've posted it
> as an RFC in case it has anything useful to offer here:
> 
> https://lore.kernel.org/bpf/1587120160-3030-1-git-send-email-alan.maguire@oracle.com/T/#t
> 
> The idea is that by using different callback function we can achieve
> seq, snprintf or other output in-kernel using the kernel BTF data.
> I created one consumer as a proof-of-concept; it's a printk pointer
> format specifier.  Since the dump format is determined in kernel
> it's a bit constrained format-wise, but may be good enough for
> some cases.

The bpfdump work and the in-kernel btf dumping here are different.
The bpfdump BPF programs are triggered by a user syscall, e.g.,
    cat /sys/kernel/bpfdump/task/my_dumper (also calling open())
or when the user open()s an anonymous dumper.

The BPF program can "print" some data through the bpf_seq_printf()
helper. The printed data will be received by user space
through the read() syscall.

Your work is great as it makes kernel printing more readable.
There is no overlap between your work and bpfdump.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures
  2020-04-17 15:02     ` Alan Maguire
@ 2020-04-19  5:34       ` Yonghong Song
  0 siblings, 0 replies; 24+ messages in thread
From: Yonghong Song @ 2020-04-19  5:34 UTC (permalink / raw)
  To: Alan Maguire
  Cc: David Ahern, Andrii Nakryiko, bpf, Martin KaFai Lau, netdev,
	Alexei Starovoitov, Daniel Borkmann, kernel-team



On 4/17/20 8:02 AM, Alan Maguire wrote:
> On Wed, 15 Apr 2020, Yonghong Song wrote:
> 
>>
>>
>> On 4/15/20 7:23 PM, David Ahern wrote:
>>> On 4/15/20 1:27 PM, Yonghong Song wrote:
>>>>
>>>> As there are some discussions regarding the kernel interface/steps to
>>>> create file/anonymous dumpers, I think it will be beneficial to continue
>>>> the discussion with this work in progress.
>>>>
>>>> Motivation:
>>>>     The current ways to dump kernel data structures are mostly:
>>>>       1. the /proc file system
>>>>       2. various specific tools like "ss" which require kernel support.
>>>>       3. drgn
>>>>     The drawback of the first two is that whenever you want to dump more,
>>>>     you need to change the kernel. For example, Martin wants to dump socket local
>>>
>>> If kernel support is needed for bpfdump of kernel data structures, you
>>> are not really solving the kernel support problem. i.e., to dump
>>> ipv4_route's you need to modify the relevant proc show function.
>>
Yes, as mentioned two paragraphs below, a kernel change is required.
The tradeoff is that this is a one-time investment. Once the kernel
change is in place, printing new fields (in most cases, except for new
fields which need additional locks etc.) requires no further kernel
changes.
>>
> 
> One thing I struggled with initially when reading the cover
> letter was understanding how BPF dumper programs get run.
> Patch 7 deals with that I think, and the answer seems to be to
> create additional seq file infrastructure alongside the existing
> one, which executes the BPF dumper programs where appropriate.
> Have I got this right? I guess more lightweight methods

Yes. The reason is that some data structures, like bpf_map, task, or
task/file, do not have existing seq_ops infrastructure, so I created
new ones to iterate them.

> such as instrumenting functions associated with an existing /proc
> dumper are a bit too messy?

We did use the existing seq_ops from /proc/net/ipv6_route and
/proc/net/netlink as examples. In the future, we will do
/proc/net/tcp[6] and /proc/net/udp[6], which will reuse the existing
seq_ops with slight modifications.


end of thread, other threads:[~2020-04-19  5:34 UTC | newest]

Thread overview: 24+ messages
2020-04-15 19:27 [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 01/17] net: refactor net assignment for seq_net_private structure Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 02/17] bpf: create /sys/kernel/bpfdump mount file system Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 03/17] bpf: provide a way for targets to register themselves Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 04/17] bpf: allow loading of a dumper program Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 05/17] bpf: create file or anonymous dumpers Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 06/17] bpf: add PTR_TO_BTF_ID_OR_NULL support Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 07/17] bpf: add netlink and ipv6_route targets Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 08/17] bpf: add bpf_map target Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 09/17] bpf: add task and task/file targets Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 10/17] bpf: add bpf_seq_printf and bpf_seq_write helpers Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 11/17] bpf: support variable length array in tracing programs Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 12/17] bpf: implement query for target_proto and file dumper prog_id Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 13/17] tools/libbpf: libbpf support for bpfdump Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 14/17] tools/bpftool: add bpf dumper support Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 15/17] tools/bpf: selftests: add dumper programs for ipv6_route and netlink Yonghong Song
2020-04-15 19:27 ` [RFC PATCH bpf-next v2 16/17] tools/bpf: selftests: add dumper progs for bpf_map/task/task_file Yonghong Song
2020-04-15 19:28 ` [RFC PATCH bpf-next v2 17/17] tools/bpf: selftests: add a selftest for anonymous dumper Yonghong Song
2020-04-16  2:23 ` [RFC PATCH bpf-next v2 00/17] bpf: implement bpf based dumping of kernel data structures David Ahern
2020-04-16  6:41   ` Yonghong Song
2020-04-17 15:02     ` Alan Maguire
2020-04-19  5:34       ` Yonghong Song
2020-04-17 10:54   ` Alan Maguire
2020-04-19  5:30     ` Yonghong Song
