bpf.vger.kernel.org archive mirror
* [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements
@ 2020-07-13 16:17 Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 01/13] bpf: refactor bpf_iter_reg to have separate seq_info member Yonghong Song
                   ` (12 more replies)
  0 siblings, 13 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

Bpf iterator was introduced in commit ae24345da54e
("bpf: Implement an interface to register bpf_iter targets").
It iterates through a particular kernel data structure,
calling a bpf program for each traversed kernel object.

Bpf iterators have been implemented for task, task_file,
bpf_map, ipv6_route, netlink, tcp and udp so far.

For map elements, there are two ways to traverse all elements from
user space:
  1. using BPF_MAP_GET_NEXT_KEY bpf subcommand to get elements
     one by one.
  2. using BPF_MAP_LOOKUP_BATCH bpf subcommand to get a batch of
     elements.
Both approaches need to copy data from the kernel to user
space in order to inspect it.
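
For approach 1, a minimal user-space walk might look like the
sketch below (using libbpf's syscall wrappers; the __u32 key and
__u64 value types are illustrative assumptions):

  #include <stdio.h>
  #include <bpf/bpf.h>

  /* sketch: visit every element of map_fd, one syscall pair each */
  static void dump_map(int map_fd)
  {
          __u32 key, next_key;
          __u64 value;
          void *prev = NULL;  /* a NULL key returns the first key */

          while (bpf_map_get_next_key(map_fd, prev, &next_key) == 0) {
                  if (bpf_map_lookup_elem(map_fd, &next_key, &value) == 0)
                          printf("%u: %llu\n", next_key, value);
                  key = next_key;
                  prev = &key;
          }
  }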

This patch set implements a bpf iterator for map elements.
Users can run a bpf program in the kernel against each map
element and do checking, filtering, aggregation, etc. without
copying data to user space.
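
For illustration, such a program might look like the sketch
below, which aggregates entirely in the kernel (modeled on the
selftests added later in this series; the selftests' bpf_iter.h
header is assumed to provide the ctx struct, and the __u32/__u64
key/value types are assumptions):

  #include "bpf_iter.h"
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  __u64 key_sum = 0;
  __u64 val_sum = 0;

  SEC("iter/bpf_map_elem")
  int dump_bpf_map_elem(struct bpf_iter__bpf_map_elem *ctx)
  {
          __u32 *key = ctx->key;
          __u64 *val = ctx->value;

          /* key/value are PTR_TO_RDONLY_BUF_OR_NULL and must be
           * NULL-checked before being dereferenced */
          if (key == NULL || val == NULL)
                  return 0;

          key_sum += *key;
          val_sum += *val;
          return 0;
  }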

Patches #1 and #2 are refactorings. Patch #3 implements readonly
buffer support in the verifier. Patches #4 - #7 implement map
element support for hash, percpu hash, lru hash, lru percpu hash,
array, percpu array and sock local storage maps. Patches #8 - #9
add libbpf and bpftool support. Patches #10 - #13 are selftests
for the implemented map element iterators.

Yonghong Song (13):
  bpf: refactor bpf_iter_reg to have separate seq_info member
  bpf: refactor to provide aux info to bpf_iter_init_seq_priv_t
  bpf: support readonly buffer in verifier
  bpf: implement bpf iterator for map elements
  bpf: implement bpf iterator for hash maps
  bpf: implement bpf iterator for array maps
  bpf: implement bpf iterator for sock local storage map
  tools/libbpf: add support for bpf map element iterator
  tools/bpftool: add bpftool support for bpf map element iterator
  selftests/bpf: add test for bpf hash map iterators
  selftests/bpf: add test for bpf array map iterators
  selftests/bpf: add a test for bpf sk_storage_map iterator
  selftests/bpf: add a test for out of bound rdonly buf access

 fs/proc/proc_net.c                            |   2 +-
 include/linux/bpf.h                           |  43 +-
 include/linux/bpf_verifier.h                  |   2 +
 include/linux/proc_fs.h                       |   3 +-
 include/uapi/linux/bpf.h                      |   7 +
 kernel/bpf/arraymap.c                         | 140 ++++++
 kernel/bpf/bpf_iter.c                         |  89 +++-
 kernel/bpf/btf.c                              |  13 +
 kernel/bpf/hashtab.c                          | 191 ++++++++
 kernel/bpf/map_iter.c                         |  62 ++-
 kernel/bpf/task_iter.c                        |  18 +-
 kernel/bpf/verifier.c                         |  74 ++-
 net/core/bpf_sk_storage.c                     | 203 +++++++++
 net/ipv4/tcp_ipv4.c                           |  12 +-
 net/ipv4/udp.c                                |  12 +-
 net/ipv6/route.c                              |   8 +-
 net/netlink/af_netlink.c                      |   8 +-
 .../bpftool/Documentation/bpftool-iter.rst    |  16 +-
 tools/bpf/bpftool/iter.c                      |  32 +-
 tools/include/uapi/linux/bpf.h                |   7 +
 tools/lib/bpf/bpf.c                           |   1 +
 tools/lib/bpf/bpf.h                           |   3 +-
 tools/lib/bpf/libbpf.c                        |  10 +-
 tools/lib/bpf/libbpf.h                        |   3 +-
 .../selftests/bpf/prog_tests/bpf_iter.c       | 422 ++++++++++++++++++
 .../bpf/progs/bpf_iter_bpf_array_map.c        |  38 ++
 .../bpf/progs/bpf_iter_bpf_hash_map.c         | 100 +++++
 .../bpf/progs/bpf_iter_bpf_percpu_array_map.c |  48 ++
 .../bpf/progs/bpf_iter_bpf_percpu_hash_map.c  |  51 +++
 .../bpf/progs/bpf_iter_bpf_sk_storage_map.c   |  35 ++
 .../selftests/bpf/progs/bpf_iter_test_kern5.c |  36 ++
 31 files changed, 1624 insertions(+), 65 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_array_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_hash_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_test_kern5.c

-- 
2.24.1



* [PATCH bpf-next 01/13] bpf: refactor bpf_iter_reg to have separate seq_info member
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 02/13] bpf: refactor to provide aux info to bpf_iter_init_seq_priv_t Yonghong Song
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

There is no functional change in this patch.
Struct bpf_iter_reg is used to register a bpf_iter target;
it includes information for prog_load, link_create
and seq_file creation.

This patch moves the fields related to seq_file creation into
a separate structure. This will be useful for the map
element iterator, where one iterator covers different
map types and different map types may have different
seq_ops, init/fini private_data functions and
private_data sizes.

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h      | 17 ++++++++++-------
 kernel/bpf/bpf_iter.c    | 12 ++++++------
 kernel/bpf/map_iter.c    |  8 ++++++--
 kernel/bpf/task_iter.c   | 16 ++++++++++++----
 net/ipv4/tcp_ipv4.c      |  8 ++++++--
 net/ipv4/udp.c           |  8 ++++++--
 net/ipv6/route.c         |  8 ++++++--
 net/netlink/af_netlink.c |  8 ++++++--
 8 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0cd7f6884c5c..deb90ec679b5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -37,6 +37,15 @@ struct seq_operations;
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
 
+typedef int (*bpf_iter_init_seq_priv_t)(void *private_data);
+typedef void (*bpf_iter_fini_seq_priv_t)(void *private_data);
+struct bpf_iter_seq_info {
+	const struct seq_operations *seq_ops;
+	bpf_iter_init_seq_priv_t init_seq_private;
+	bpf_iter_fini_seq_priv_t fini_seq_private;
+	u32 seq_priv_size;
+};
+
 /* map is generic key/value storage optionally accesible by eBPF programs */
 struct bpf_map_ops {
 	/* funcs callable from userspace (via syscall) */
@@ -1183,18 +1192,12 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
 	extern int bpf_iter_ ## target(args);			\
 	int __init bpf_iter_ ## target(args) { return 0; }
 
-typedef int (*bpf_iter_init_seq_priv_t)(void *private_data);
-typedef void (*bpf_iter_fini_seq_priv_t)(void *private_data);
-
 #define BPF_ITER_CTX_ARG_MAX 2
 struct bpf_iter_reg {
 	const char *target;
-	const struct seq_operations *seq_ops;
-	bpf_iter_init_seq_priv_t init_seq_private;
-	bpf_iter_fini_seq_priv_t fini_seq_private;
-	u32 seq_priv_size;
 	u32 ctx_arg_info_size;
 	struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
+	const struct bpf_iter_seq_info *seq_info;
 };
 
 struct bpf_iter_meta {
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index dd612b80b9fe..5b2387d6aa1f 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -218,8 +218,8 @@ static int iter_release(struct inode *inode, struct file *file)
 	iter_priv = container_of(seq->private, struct bpf_iter_priv_data,
 				 target_private);
 
-	if (iter_priv->tinfo->reg_info->fini_seq_private)
-		iter_priv->tinfo->reg_info->fini_seq_private(seq->private);
+	if (iter_priv->tinfo->reg_info->seq_info->fini_seq_private)
+		iter_priv->tinfo->reg_info->seq_info->fini_seq_private(seq->private);
 
 	bpf_prog_put(iter_priv->prog);
 	seq->private = iter_priv;
@@ -433,16 +433,16 @@ static int prepare_seq_file(struct file *file, struct bpf_iter_link *link)
 
 	tinfo = link->tinfo;
 	total_priv_dsize = offsetof(struct bpf_iter_priv_data, target_private) +
-			   tinfo->reg_info->seq_priv_size;
-	priv_data = __seq_open_private(file, tinfo->reg_info->seq_ops,
+			   tinfo->reg_info->seq_info->seq_priv_size;
+	priv_data = __seq_open_private(file, tinfo->reg_info->seq_info->seq_ops,
 				       total_priv_dsize);
 	if (!priv_data) {
 		err = -ENOMEM;
 		goto release_prog;
 	}
 
-	if (tinfo->reg_info->init_seq_private) {
-		err = tinfo->reg_info->init_seq_private(priv_data->target_private);
+	if (tinfo->reg_info->seq_info->init_seq_private) {
+		err = tinfo->reg_info->seq_info->init_seq_private(priv_data->target_private);
 		if (err)
 			goto release_seq_file;
 	}
diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
index c69071e334bf..ae18b3a86096 100644
--- a/kernel/bpf/map_iter.c
+++ b/kernel/bpf/map_iter.c
@@ -81,17 +81,21 @@ static const struct seq_operations bpf_map_seq_ops = {
 	.show	= bpf_map_seq_show,
 };
 
-static const struct bpf_iter_reg bpf_map_reg_info = {
-	.target			= "bpf_map",
+static const struct bpf_iter_seq_info bpf_map_seq_info = {
 	.seq_ops		= &bpf_map_seq_ops,
 	.init_seq_private	= NULL,
 	.fini_seq_private	= NULL,
 	.seq_priv_size		= sizeof(struct bpf_iter_seq_map_info),
+};
+
+static const struct bpf_iter_reg bpf_map_reg_info = {
+	.target			= "bpf_map",
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__bpf_map, map),
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
+	.seq_info		= &bpf_map_seq_info,
 };
 
 static int __init bpf_map_iter_init(void)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 4dbf2b6035f8..2b384ccce907 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -312,25 +312,32 @@ static const struct seq_operations task_file_seq_ops = {
 	.show	= task_file_seq_show,
 };
 
-static const struct bpf_iter_reg task_reg_info = {
-	.target			= "task",
+static const struct bpf_iter_seq_info task_seq_info = {
 	.seq_ops		= &task_seq_ops,
 	.init_seq_private	= init_seq_pidns,
 	.fini_seq_private	= fini_seq_pidns,
 	.seq_priv_size		= sizeof(struct bpf_iter_seq_task_info),
+};
+
+static const struct bpf_iter_reg task_reg_info = {
+	.target			= "task",
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__task, task),
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
+	.seq_info		= &task_seq_info,
 };
 
-static const struct bpf_iter_reg task_file_reg_info = {
-	.target			= "task_file",
+static const struct bpf_iter_seq_info task_file_seq_info = {
 	.seq_ops		= &task_file_seq_ops,
 	.init_seq_private	= init_seq_pidns,
 	.fini_seq_private	= fini_seq_pidns,
 	.seq_priv_size		= sizeof(struct bpf_iter_seq_task_file_info),
+};
+
+static const struct bpf_iter_reg task_file_reg_info = {
+	.target			= "task_file",
 	.ctx_arg_info_size	= 2,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__task_file, task),
@@ -338,6 +345,7 @@ static const struct bpf_iter_reg task_file_reg_info = {
 		{ offsetof(struct bpf_iter__task_file, file),
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
+	.seq_info		= &task_file_seq_info,
 };
 
 static int __init task_iter_init(void)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index ea0df9fd7618..d204aaee17ea 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2942,17 +2942,21 @@ static void bpf_iter_fini_tcp(void *priv_data)
 	bpf_iter_fini_seq_net(priv_data);
 }
 
-static const struct bpf_iter_reg tcp_reg_info = {
-	.target			= "tcp",
+static const struct bpf_iter_seq_info tcp_seq_info = {
 	.seq_ops		= &bpf_iter_tcp_seq_ops,
 	.init_seq_private	= bpf_iter_init_tcp,
 	.fini_seq_private	= bpf_iter_fini_tcp,
 	.seq_priv_size		= sizeof(struct tcp_iter_state),
+};
+
+static const struct bpf_iter_reg tcp_reg_info = {
+	.target			= "tcp",
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__tcp, sk_common),
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
+	.seq_info		= &tcp_seq_info,
 };
 
 static void __init bpf_iter_register(void)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 31530129f137..9695756559e1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3177,17 +3177,21 @@ static void bpf_iter_fini_udp(void *priv_data)
 	bpf_iter_fini_seq_net(priv_data);
 }
 
-static const struct bpf_iter_reg udp_reg_info = {
-	.target			= "udp",
+static const struct bpf_iter_seq_info udp_seq_info = {
 	.seq_ops		= &bpf_iter_udp_seq_ops,
 	.init_seq_private	= bpf_iter_init_udp,
 	.fini_seq_private	= bpf_iter_fini_udp,
 	.seq_priv_size		= sizeof(struct udp_iter_state),
+};
+
+static const struct bpf_iter_reg udp_reg_info = {
+	.target			= "udp",
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__udp, udp_sk),
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
+	.seq_info		= &udp_seq_info,
 };
 
 static void __init bpf_iter_register(void)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 5852039ca9cf..529fe472749e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6420,17 +6420,21 @@ void __init ip6_route_init_special_entries(void)
 #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_PROC_FS)
 DEFINE_BPF_ITER_FUNC(ipv6_route, struct bpf_iter_meta *meta, struct fib6_info *rt)
 
-static const struct bpf_iter_reg ipv6_route_reg_info = {
-	.target			= "ipv6_route",
+static const struct bpf_iter_seq_info ipv6_route_seq_info = {
 	.seq_ops		= &ipv6_route_seq_ops,
 	.init_seq_private	= bpf_iter_init_seq_net,
 	.fini_seq_private	= bpf_iter_fini_seq_net,
 	.seq_priv_size		= sizeof(struct ipv6_route_iter),
+};
+
+static const struct bpf_iter_reg ipv6_route_reg_info = {
+	.target			= "ipv6_route",
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__ipv6_route, rt),
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
+	.seq_info		= &ipv6_route_seq_info,
 };
 
 static int __init bpf_iter_register(void)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 4f2c3b14ddbf..e7ab50e035e8 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2803,17 +2803,21 @@ static const struct rhashtable_params netlink_rhashtable_params = {
 };
 
 #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_PROC_FS)
-static const struct bpf_iter_reg netlink_reg_info = {
-	.target			= "netlink",
+static const struct bpf_iter_seq_info netlink_seq_info = {
 	.seq_ops		= &netlink_seq_ops,
 	.init_seq_private	= bpf_iter_init_seq_net,
 	.fini_seq_private	= bpf_iter_fini_seq_net,
 	.seq_priv_size		= sizeof(struct nl_seq_iter),
+};
+
+static const struct bpf_iter_reg netlink_reg_info = {
+	.target			= "netlink",
 	.ctx_arg_info_size	= 1,
 	.ctx_arg_info		= {
 		{ offsetof(struct bpf_iter__netlink, sk),
 		  PTR_TO_BTF_ID_OR_NULL },
 	},
+	.seq_info		= &netlink_seq_info,
 };
 
 static int __init bpf_iter_register(void)
-- 
2.24.1



* [PATCH bpf-next 02/13] bpf: refactor to provide aux info to bpf_iter_init_seq_priv_t
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 01/13] bpf: refactor bpf_iter_reg to have separate seq_info member Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 03/13] bpf: support readonly buffer in verifier Yonghong Song
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

This patch refactors the target bpf_iter_init_seq_priv_t callback
function to accept additional information. This will be needed
in later patches for map element targets, since the particular
map whose elements are to be traversed must be passed in.
In the future, other information may be passed to the target
as well, e.g., pid, cgroup id, etc., to customize the iterator.
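
As a sketch of the intended use (with hypothetical names; the
aux->map field itself only arrives in a later patch of this
series), a map-element target could then do:

  static int bpf_iter_init_foo(void *priv_data,
                               struct bpf_iter_aux_info *aux)
  {
          struct foo_seq_info *info = priv_data;

          /* hypothetical: remember the map supplied at
           * link/attach time for use by the seq_ops */
          info->map = aux->map;
          return 0;
  }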

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 fs/proc/proc_net.c      | 2 +-
 include/linux/bpf.h     | 7 ++++++-
 include/linux/proc_fs.h | 3 ++-
 kernel/bpf/bpf_iter.c   | 2 +-
 kernel/bpf/task_iter.c  | 2 +-
 net/ipv4/tcp_ipv4.c     | 4 ++--
 net/ipv4/udp.c          | 4 ++--
 7 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index dba63b2429f0..ed8a6306990c 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -98,7 +98,7 @@ static const struct proc_ops proc_net_seq_ops = {
 	.proc_release	= seq_release_net,
 };
 
-int bpf_iter_init_seq_net(void *priv_data)
+int bpf_iter_init_seq_net(void *priv_data, struct bpf_iter_aux_info *aux)
 {
 #ifdef CONFIG_NET_NS
 	struct seq_net_private *p = priv_data;
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index deb90ec679b5..97c6e2605978 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -33,11 +33,13 @@ struct btf;
 struct btf_type;
 struct exception_table_entry;
 struct seq_operations;
+struct bpf_iter_aux_info;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
 
-typedef int (*bpf_iter_init_seq_priv_t)(void *private_data);
+typedef int (*bpf_iter_init_seq_priv_t)(void *private_data,
+					struct bpf_iter_aux_info *aux);
 typedef void (*bpf_iter_fini_seq_priv_t)(void *private_data);
 struct bpf_iter_seq_info {
 	const struct seq_operations *seq_ops;
@@ -1192,6 +1194,9 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
 	extern int bpf_iter_ ## target(args);			\
 	int __init bpf_iter_ ## target(args) { return 0; }
 
+struct bpf_iter_aux_info {
+};
+
 #define BPF_ITER_CTX_ARG_MAX 2
 struct bpf_iter_reg {
 	const char *target;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index d1eed1b43651..2df965cd0974 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -133,7 +133,8 @@ struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mo
 						    void *data);
 extern struct pid *tgid_pidfd_to_pid(const struct file *file);
 
-extern int bpf_iter_init_seq_net(void *priv_data);
+struct bpf_iter_aux_info;
+extern int bpf_iter_init_seq_net(void *priv_data, struct bpf_iter_aux_info *aux);
 extern void bpf_iter_fini_seq_net(void *priv_data);
 
 #ifdef CONFIG_PROC_PID_ARCH_STATUS
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 5b2387d6aa1f..8fa94cb1b5a0 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -442,7 +442,7 @@ static int prepare_seq_file(struct file *file, struct bpf_iter_link *link)
 	}
 
 	if (tinfo->reg_info->seq_info->init_seq_private) {
-		err = tinfo->reg_info->seq_info->init_seq_private(priv_data->target_private);
+		err = tinfo->reg_info->seq_info->init_seq_private(priv_data->target_private, NULL);
 		if (err)
 			goto release_seq_file;
 	}
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 2b384ccce907..76b70946e4cb 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -290,7 +290,7 @@ static void task_file_seq_stop(struct seq_file *seq, void *v)
 	}
 }
 
-static int init_seq_pidns(void *priv_data)
+static int init_seq_pidns(void *priv_data, struct bpf_iter_aux_info *aux)
 {
 	struct bpf_iter_seq_task_common *common = priv_data;
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d204aaee17ea..b6f5fdfca668 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2916,7 +2916,7 @@ static struct pernet_operations __net_initdata tcp_sk_ops = {
 DEFINE_BPF_ITER_FUNC(tcp, struct bpf_iter_meta *meta,
 		     struct sock_common *sk_common, uid_t uid)
 
-static int bpf_iter_init_tcp(void *priv_data)
+static int bpf_iter_init_tcp(void *priv_data, struct bpf_iter_aux_info *aux)
 {
 	struct tcp_iter_state *st = priv_data;
 	struct tcp_seq_afinfo *afinfo;
@@ -2928,7 +2928,7 @@ static int bpf_iter_init_tcp(void *priv_data)
 
 	afinfo->family = AF_UNSPEC;
 	st->bpf_seq_afinfo = afinfo;
-	ret = bpf_iter_init_seq_net(priv_data);
+	ret = bpf_iter_init_seq_net(priv_data, aux);
 	if (ret)
 		kfree(afinfo);
 	return ret;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9695756559e1..5184a517abc1 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -3150,7 +3150,7 @@ static struct pernet_operations __net_initdata udp_sysctl_ops = {
 DEFINE_BPF_ITER_FUNC(udp, struct bpf_iter_meta *meta,
 		     struct udp_sock *udp_sk, uid_t uid, int bucket)
 
-static int bpf_iter_init_udp(void *priv_data)
+static int bpf_iter_init_udp(void *priv_data, struct bpf_iter_aux_info *aux)
 {
 	struct udp_iter_state *st = priv_data;
 	struct udp_seq_afinfo *afinfo;
@@ -3163,7 +3163,7 @@ static int bpf_iter_init_udp(void *priv_data)
 	afinfo->family = AF_UNSPEC;
 	afinfo->udp_table = &udp_table;
 	st->bpf_seq_afinfo = afinfo;
-	ret = bpf_iter_init_seq_net(priv_data);
+	ret = bpf_iter_init_seq_net(priv_data, aux);
 	if (ret)
 		kfree(afinfo);
 	return ret;
-- 
2.24.1



* [PATCH bpf-next 03/13] bpf: support readonly buffer in verifier
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 01/13] bpf: refactor bpf_iter_reg to have separate seq_info member Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 02/13] bpf: refactor to provide aux info to bpf_iter_init_seq_priv_t Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 23:25   ` Alexei Starovoitov
  2020-07-13 16:17 ` [PATCH bpf-next 04/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

Two new readonly buffer register states, PTR_TO_RDONLY_BUF
and PTR_TO_RDONLY_BUF_OR_NULL, are introduced.
These new register states will be used
by the later bpf map element iterator.

The new register states share some similarity with
PTR_TO_TP_BUFFER in that the accessed buffer
size is calculated during verification time. The accessed
buffer size will then be compared to other metrics at
attach/link_create time.

There are two differences between PTR_TO_TP_BUFFER and
PTR_TO_RDONLY_BUF[_OR_NULL].
First, PTR_TO_TP_BUFFER is for write only
and PTR_TO_RDONLY_BUF[_OR_NULL] is for read only.
Second, a rdonly_buf_seq_id is added to the
register state, since the same program
may have two PTR_TO_RDONLY_BUF[_OR_NULL] ctx arguments.
For example, for the later bpf map element iterator,
both key and value may be PTR_TO_RDONLY_BUF_OR_NULL.

Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
iterator programs, the PTR_TO_RDONLY_BUF_OR_NULL reg_type and
its rdonly_buf_seq_id can be set in
prog->aux->bpf_ctx_arg_aux, and the bpf verifier will
retrieve the values during btf_ctx_access().
The later bpf map element iterator implementation
will show how such information is assigned
during target registration time.
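
For example, against the later bpf_map_elem target, the verifier
would treat accesses as in the following sketch (names
illustrative):

  __u64 key_sum = 0;

  SEC("iter/bpf_map_elem")
  int dump(struct bpf_iter__bpf_map_elem *ctx)
  {
          __u32 *key = ctx->key; /* PTR_TO_RDONLY_BUF_OR_NULL, seq_id 0 */

          if (key == NULL)       /* NULL check yields PTR_TO_RDONLY_BUF */
                  return 0;

          /* constant-offset 4-byte read: the verifier records
           * max_rdonly_access[0] = 4; attach/link_create can later
           * fail with -EACCES if the map's key_size is smaller.
           * Writes and variable-offset accesses are rejected. */
          key_sum += *key;
          return 0;
  }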

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h          |  7 ++++
 include/linux/bpf_verifier.h |  2 +
 kernel/bpf/btf.c             | 13 +++++++
 kernel/bpf/verifier.c        | 74 +++++++++++++++++++++++++++++++-----
 4 files changed, 87 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 97c6e2605978..8f708d51733b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -352,6 +352,8 @@ enum bpf_reg_type {
 	PTR_TO_BTF_ID_OR_NULL,	 /* reg points to kernel struct or NULL */
 	PTR_TO_MEM,		 /* reg points to valid memory region */
 	PTR_TO_MEM_OR_NULL,	 /* reg points to valid memory region or NULL */
+	PTR_TO_RDONLY_BUF,	 /* reg points to a readonly buffer */
+	PTR_TO_RDONLY_BUF_OR_NULL, /* reg points to a readonly buffer or NULL */
 };
 
 /* The information passed from prog-specific *_is_valid_access
@@ -362,6 +364,7 @@ struct bpf_insn_access_aux {
 	union {
 		int ctx_field_size;
 		u32 btf_id;
+		u32 rdonly_buf_seq_id;
 	};
 	struct bpf_verifier_log *log; /* for verbose logs */
 };
@@ -678,8 +681,11 @@ struct bpf_jit_poke_descriptor {
 struct bpf_ctx_arg_aux {
 	u32 offset;
 	enum bpf_reg_type reg_type;
+	u32 rdonly_buf_seq_id;
 };
 
+#define BPF_MAX_RDONLY_BUF	2
+
 struct bpf_prog_aux {
 	atomic64_t refcnt;
 	u32 used_map_cnt;
@@ -693,6 +699,7 @@ struct bpf_prog_aux {
 	u32 attach_btf_id; /* in-kernel BTF type id to attach to */
 	u32 ctx_arg_info_size;
 	const struct bpf_ctx_arg_aux *ctx_arg_info;
+	u32 max_rdonly_access[BPF_MAX_RDONLY_BUF];
 	struct bpf_prog *linked_prog;
 	bool verifier_zext; /* Zero extensions has been inserted by verifier. */
 	bool offload_requested;
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 53c7bd568c5d..063e4ab2dd77 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -56,6 +56,8 @@ struct bpf_reg_state {
 
 		u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */
 
+		u32 rdonly_buf_seq_id; /* for PTR_TO_RDONLY_BUF */
+
 		/* Max size from any of the above. */
 		unsigned long raw;
 	};
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 4c3007f428b1..895de2b21385 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -3809,6 +3809,19 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 			btf_kind_str[BTF_INFO_KIND(t->info)]);
 		return false;
 	}
+
+	/* check for PTR_TO_RDONLY_BUF_OR_NULL */
+	for (i = 0; i < prog->aux->ctx_arg_info_size; i++) {
+		const struct bpf_ctx_arg_aux *ctx_arg_info = &prog->aux->ctx_arg_info[i];
+
+		if (ctx_arg_info->offset == off &&
+		    ctx_arg_info->reg_type == PTR_TO_RDONLY_BUF_OR_NULL) {
+			info->reg_type = ctx_arg_info->reg_type;
+			info->rdonly_buf_seq_id = ctx_arg_info->rdonly_buf_seq_id;
+			return true;
+		}
+	}
+
 	if (t->type == 0)
 		/* This is a pointer to void.
 		 * It is the same as scalar from the verifier safety pov.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index b608185e1ffd..87801afa26fc 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -410,7 +410,8 @@ static bool reg_type_may_be_null(enum bpf_reg_type type)
 	       type == PTR_TO_SOCK_COMMON_OR_NULL ||
 	       type == PTR_TO_TCP_SOCK_OR_NULL ||
 	       type == PTR_TO_BTF_ID_OR_NULL ||
-	       type == PTR_TO_MEM_OR_NULL;
+	       type == PTR_TO_MEM_OR_NULL ||
+	       type == PTR_TO_RDONLY_BUF_OR_NULL;
 }
 
 static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg)
@@ -504,6 +505,8 @@ static const char * const reg_type_str[] = {
 	[PTR_TO_BTF_ID_OR_NULL]	= "ptr_or_null_",
 	[PTR_TO_MEM]		= "mem",
 	[PTR_TO_MEM_OR_NULL]	= "mem_or_null",
+	[PTR_TO_RDONLY_BUF]	= "rdonly_buf",
+	[PTR_TO_RDONLY_BUF_OR_NULL] = "rdonly_buf_or_null",
 };
 
 static char slot_type_char[] = {
@@ -579,6 +582,9 @@ static void print_verifier_state(struct bpf_verifier_env *env,
 				verbose(env, ",ks=%d,vs=%d",
 					reg->map_ptr->key_size,
 					reg->map_ptr->value_size);
+			else if (t == PTR_TO_RDONLY_BUF ||
+				 t == PTR_TO_RDONLY_BUF_OR_NULL)
+				verbose(env, ",seq_id=%u", reg->rdonly_buf_seq_id);
 			if (tnum_is_const(reg->var_off)) {
 				/* Typically an immediate SCALAR_VALUE, but
 				 * could be a pointer whose offset is too big
@@ -2174,6 +2180,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type)
 	case PTR_TO_XDP_SOCK:
 	case PTR_TO_BTF_ID:
 	case PTR_TO_BTF_ID_OR_NULL:
+	case PTR_TO_RDONLY_BUF:
+	case PTR_TO_RDONLY_BUF_OR_NULL:
 		return true;
 	default:
 		return false;
@@ -2699,7 +2707,7 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 /* check access to 'struct bpf_context' fields.  Supports fixed offsets only */
 static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off, int size,
 			    enum bpf_access_type t, enum bpf_reg_type *reg_type,
-			    u32 *btf_id)
+			    u32 *btf_id, u32 *rdonly_buf_seq_id)
 {
 	struct bpf_insn_access_aux info = {
 		.reg_type = *reg_type,
@@ -2719,6 +2727,8 @@ static int check_ctx_access(struct bpf_verifier_env *env, int insn_idx, int off,
 
 		if (*reg_type == PTR_TO_BTF_ID || *reg_type == PTR_TO_BTF_ID_OR_NULL)
 			*btf_id = info.btf_id;
+		else if (*reg_type == PTR_TO_RDONLY_BUF_OR_NULL)
+			*rdonly_buf_seq_id = info.rdonly_buf_seq_id;
 		else
 			env->insn_aux_data[insn_idx].ctx_field_size = info.ctx_field_size;
 		/* remember the offset of last byte accessed in ctx */
@@ -3053,14 +3063,15 @@ int check_ctx_reg(struct bpf_verifier_env *env,
 	return 0;
 }
 
-static int check_tp_buffer_access(struct bpf_verifier_env *env,
-				  const struct bpf_reg_state *reg,
-				  int regno, int off, int size)
+static int __check_buffer_access(struct bpf_verifier_env *env,
+				 const char *buf_info,
+				 const struct bpf_reg_state *reg,
+				 int regno, int off, int size)
 {
 	if (off < 0) {
 		verbose(env,
-			"R%d invalid tracepoint buffer access: off=%d, size=%d",
-			regno, off, size);
+			"R%d invalid %s buffer access: off=%d, size=%d",
+			regno, buf_info, off, size);
 		return -EACCES;
 	}
 	if (!tnum_is_const(reg->var_off) || reg->var_off.value) {
@@ -3072,12 +3083,43 @@ static int check_tp_buffer_access(struct bpf_verifier_env *env,
 			regno, off, tn_buf);
 		return -EACCES;
 	}
+
+	return 0;
+}
+
+static int check_tp_buffer_access(struct bpf_verifier_env *env,
+				  const struct bpf_reg_state *reg,
+				  int regno, int off, int size)
+{
+	int err;
+
+	err = __check_buffer_access(env, "tracepoint", reg, regno, off, size);
+	if (err)
+		return err;
+
 	if (off + size > env->prog->aux->max_tp_access)
 		env->prog->aux->max_tp_access = off + size;
 
 	return 0;
 }
 
+static int check_rdonly_buf_access(struct bpf_verifier_env *env,
+				   const struct bpf_reg_state *reg,
+				   int regno, int off, int size)
+{
+	u32 seq_id = reg->rdonly_buf_seq_id;
+	int err;
+
+	err = __check_buffer_access(env, "readonly", reg, regno, off, size);
+	if (err)
+		return err;
+
+	if (off + size > env->prog->aux->max_rdonly_access[seq_id])
+		env->prog->aux->max_rdonly_access[seq_id] = off + size;
+
+	return 0;
+}
+
 /* BPF architecture zero extends alu32 ops into 64-bit registesr */
 static void zext_32_to_64(struct bpf_reg_state *reg)
 {
@@ -3327,7 +3369,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			mark_reg_unknown(env, regs, value_regno);
 	} else if (reg->type == PTR_TO_CTX) {
 		enum bpf_reg_type reg_type = SCALAR_VALUE;
-		u32 btf_id = 0;
+		u32 btf_id = 0, rdonly_buf_seq_id = 0;
 
 		if (t == BPF_WRITE && value_regno >= 0 &&
 		    is_pointer_value(env, value_regno)) {
@@ -3339,7 +3381,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 		if (err < 0)
 			return err;
 
-		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf_id);
+		err = check_ctx_access(env, insn_idx, off, size, t, &reg_type, &btf_id,
+				       &rdonly_buf_seq_id);
 		if (err)
 			verbose_linfo(env, insn_idx, "; ");
 		if (!err && t == BPF_READ && value_regno >= 0) {
@@ -3363,6 +3406,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 				if (reg_type == PTR_TO_BTF_ID ||
 				    reg_type == PTR_TO_BTF_ID_OR_NULL)
 					regs[value_regno].btf_id = btf_id;
+				else if (reg_type == PTR_TO_RDONLY_BUF_OR_NULL)
+					regs[value_regno].rdonly_buf_seq_id = rdonly_buf_seq_id;
 			}
 			regs[value_regno].type = reg_type;
 		}
@@ -3428,6 +3473,15 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 	} else if (reg->type == CONST_PTR_TO_MAP) {
 		err = check_ptr_to_map_access(env, regs, regno, off, size, t,
 					      value_regno);
+	} else if (reg->type == PTR_TO_RDONLY_BUF) {
+		if (t == BPF_WRITE) {
+			verbose(env, "R%d cannot write into %s\n",
+				regno, reg_type_str[reg->type]);
+			return -EACCES;
+		}
+		err = check_rdonly_buf_access(env, reg, regno, off, size);
+		if (!err && value_regno >= 0)
+			mark_reg_unknown(env, regs, value_regno);
 	} else {
 		verbose(env, "R%d invalid mem access '%s'\n", regno,
 			reg_type_str[reg->type]);
@@ -6803,6 +6857,8 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state,
 			reg->type = PTR_TO_BTF_ID;
 		} else if (reg->type == PTR_TO_MEM_OR_NULL) {
 			reg->type = PTR_TO_MEM;
+		} else if (reg->type == PTR_TO_RDONLY_BUF_OR_NULL) {
+			reg->type = PTR_TO_RDONLY_BUF;
 		}
 		if (is_null) {
 			/* We don't need id and ref_obj_id from this point
-- 
2.24.1



* [PATCH bpf-next 04/13] bpf: implement bpf iterator for map elements
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (2 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 03/13] bpf: support readonly buffer in verifier Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 05/13] bpf: implement bpf iterator for hash maps Yonghong Song
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

The bpf iterator for map elements is implemented.
The bpf program will receive four parameters:
  bpf_iter_meta *meta: the meta data
  bpf_map *map:        the bpf_map whose elements are traversed
  void *key:           the key of one element
  void *value:         the value of the same element

Here, the meta and map pointers are always valid, while
key and value have register type
PTR_TO_RDONLY_BUF_OR_NULL. The kernel will track
the accessed ranges of key and value during verification time.
Later, these ranges will be compared against the key/value
sizes of the actual map to ensure all accesses are within range.

A new field, iter_seq_info, is added to bpf_map_ops; it
carries map type specific information, i.e., the seq_ops,
init/fini seq_file functions and seq_file private data size.
Subsequent patches will provide actual implementations
of bpf_map_ops->iter_seq_info.

In user space, BPF_ITER_LINK_MAP_FD needs to be
specified in attr->link_create.flags, which indicates
that attr->link_create.target_fd is a map_fd.
The reason for such an explicit flag is to accommodate
possible future cases where one bpf iterator may allow
more than one customization, e.g., pid and cgroup id for
task_file.
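
A user-space sketch of the attach step in raw syscall form
(prog_fd/map_fd are assumed to be set up already; libbpf and
bpftool support follow later in this series):

  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static int attach_map_elem_iter(int prog_fd, int map_fd)
  {
          union bpf_attr attr = {};

          attr.link_create.prog_fd = prog_fd;  /* iter/bpf_map_elem prog */
          attr.link_create.target_fd = map_fd; /* treated as a map fd... */
          attr.link_create.attach_type = BPF_TRACE_ITER;
          attr.link_create.flags = BPF_ITER_LINK_MAP_FD; /* ...due to this flag */

          return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
  }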

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h            | 16 ++++++
 include/uapi/linux/bpf.h       |  7 +++
 kernel/bpf/bpf_iter.c          | 89 ++++++++++++++++++++++++++--------
 kernel/bpf/map_iter.c          | 30 +++++++++++-
 tools/include/uapi/linux/bpf.h |  7 +++
 5 files changed, 129 insertions(+), 20 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8f708d51733b..4cbeeb2c8716 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -107,6 +107,9 @@ struct bpf_map_ops {
 	/* BTF name and id of struct allocated by map_alloc */
 	const char * const map_btf_name;
 	int *map_btf_id;
+
+	/* bpf_iter info used to open a seq_file */
+	const struct bpf_iter_seq_info *iter_seq_info;
 };
 
 struct bpf_map_memory {
@@ -1202,12 +1205,18 @@ int bpf_obj_get_user(const char __user *pathname, int flags);
 	int __init bpf_iter_ ## target(args) { return 0; }
 
 struct bpf_iter_aux_info {
+	struct bpf_map *map;
 };
 
+typedef int (*bpf_iter_check_target_t)(struct bpf_prog *prog,
+				       struct bpf_iter_aux_info *aux);
+
 #define BPF_ITER_CTX_ARG_MAX 2
 struct bpf_iter_reg {
 	const char *target;
+	bpf_iter_check_target_t check_target;
 	u32 ctx_arg_info_size;
+	enum bpf_iter_link_info link_info;
 	struct bpf_ctx_arg_aux ctx_arg_info[BPF_ITER_CTX_ARG_MAX];
 	const struct bpf_iter_seq_info *seq_info;
 };
@@ -1218,6 +1227,13 @@ struct bpf_iter_meta {
 	u64 seq_num;
 };
 
+struct bpf_iter__bpf_map_elem {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct bpf_map *, map);
+	__bpf_md_ptr(void *, key);
+	__bpf_md_ptr(void *, value);
+};
+
 int bpf_iter_reg_target(const struct bpf_iter_reg *reg_info);
 void bpf_iter_unreg_target(const struct bpf_iter_reg *reg_info);
 bool bpf_iter_prog_supported(struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 548a749aebb3..550c92344b4b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -243,6 +243,13 @@ enum bpf_link_type {
 	MAX_BPF_LINK_TYPE,
 };
 
+enum bpf_iter_link_info {
+	BPF_ITER_LINK_UNSPEC = 0,
+	BPF_ITER_LINK_MAP_FD = 1,
+
+	MAX_BPF_ITER_LINK_INFO,
+};
+
 /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
  *
  * NONE(default): No further bpf programs allowed in the subtree.
diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
index 8fa94cb1b5a0..335ea06e8f69 100644
--- a/kernel/bpf/bpf_iter.c
+++ b/kernel/bpf/bpf_iter.c
@@ -14,11 +14,13 @@ struct bpf_iter_target_info {
 
 struct bpf_iter_link {
 	struct bpf_link link;
+	struct bpf_iter_aux_info aux;
 	struct bpf_iter_target_info *tinfo;
 };
 
 struct bpf_iter_priv_data {
 	struct bpf_iter_target_info *tinfo;
+	const struct bpf_iter_seq_info *seq_info;
 	struct bpf_prog *prog;
 	u64 session_id;
 	u64 seq_num;
@@ -35,7 +37,8 @@ static DEFINE_MUTEX(link_mutex);
 /* incremented on every opened seq_file */
 static atomic64_t session_id;
 
-static int prepare_seq_file(struct file *file, struct bpf_iter_link *link);
+static int prepare_seq_file(struct file *file, struct bpf_iter_link *link,
+			    const struct bpf_iter_seq_info *seq_info);
 
 static void bpf_iter_inc_seq_num(struct seq_file *seq)
 {
@@ -199,11 +202,25 @@ static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
 	return copied;
 }
 
+static const struct bpf_iter_seq_info *
+__get_seq_info(struct bpf_iter_link *link)
+{
+	const struct bpf_iter_seq_info *seq_info;
+
+	if (link->aux.map) {
+		seq_info = link->aux.map->ops->iter_seq_info;
+		if (seq_info)
+			return seq_info;
+	}
+
+	return link->tinfo->reg_info->seq_info;
+}
+
 static int iter_open(struct inode *inode, struct file *file)
 {
 	struct bpf_iter_link *link = inode->i_private;
 
-	return prepare_seq_file(file, link);
+	return prepare_seq_file(file, link, __get_seq_info(link));
 }
 
 static int iter_release(struct inode *inode, struct file *file)
@@ -218,8 +235,8 @@ static int iter_release(struct inode *inode, struct file *file)
 	iter_priv = container_of(seq->private, struct bpf_iter_priv_data,
 				 target_private);
 
-	if (iter_priv->tinfo->reg_info->seq_info->fini_seq_private)
-		iter_priv->tinfo->reg_info->seq_info->fini_seq_private(seq->private);
+	if (iter_priv->seq_info->fini_seq_private)
+		iter_priv->seq_info->fini_seq_private(seq->private);
 
 	bpf_prog_put(iter_priv->prog);
 	seq->private = iter_priv;
@@ -318,6 +335,11 @@ bool bpf_iter_prog_supported(struct bpf_prog *prog)
 
 static void bpf_iter_link_release(struct bpf_link *link)
 {
+	struct bpf_iter_link *iter_link =
+		container_of(link, struct bpf_iter_link, link);
+
+	if (iter_link->aux.map)
+		bpf_map_put_with_uref(iter_link->aux.map);
 }
 
 static void bpf_iter_link_dealloc(struct bpf_link *link)
@@ -370,14 +392,13 @@ int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 {
 	struct bpf_link_primer link_primer;
 	struct bpf_iter_target_info *tinfo;
+	struct bpf_iter_aux_info aux = {};
 	struct bpf_iter_link *link;
+	u32 prog_btf_id, target_fd;
 	bool existed = false;
-	u32 prog_btf_id;
+	struct bpf_map *map;
 	int err;
 
-	if (attr->link_create.target_fd || attr->link_create.flags)
-		return -EINVAL;
-
 	prog_btf_id = prog->aux->attach_btf_id;
 	mutex_lock(&targets_mutex);
 	list_for_each_entry(tinfo, &targets, list) {
@@ -390,6 +411,13 @@ int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	if (!existed)
 		return -ENOENT;
 
+	/* Make sure user supplied flags are target expected. */
+	target_fd = attr->link_create.target_fd;
+	if (attr->link_create.flags != tinfo->reg_info->link_info)
+		return -EINVAL;
+	if (!attr->link_create.flags && target_fd)
+		return -EINVAL;
+
 	link = kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN);
 	if (!link)
 		return -ENOMEM;
@@ -398,26 +426,48 @@ int bpf_iter_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
 	link->tinfo = tinfo;
 
 	err  = bpf_link_prime(&link->link, &link_primer);
-	if (err) {
-		kfree(link);
-		return err;
+	if (err)
+		goto free_link;
+
+	if (tinfo->reg_info->link_info == BPF_ITER_LINK_MAP_FD) {
+		map = bpf_map_get_with_uref(target_fd);
+		if (IS_ERR(map)) {
+			err = PTR_ERR(map);
+			goto free_link;
+		}
+
+		aux.map = map;
+		err = tinfo->reg_info->check_target(prog, &aux);
+		if (err) {
+			bpf_map_put_with_uref(map);
+			goto free_link;
+		}
+
+		link->aux.map = map;
 	}
 
 	return bpf_link_settle(&link_primer);
+
+free_link:
+	kfree(link);
+	return err;
 }
 
 static void init_seq_meta(struct bpf_iter_priv_data *priv_data,
 			  struct bpf_iter_target_info *tinfo,
+			  const struct bpf_iter_seq_info *seq_info,
 			  struct bpf_prog *prog)
 {
 	priv_data->tinfo = tinfo;
+	priv_data->seq_info = seq_info;
 	priv_data->prog = prog;
 	priv_data->session_id = atomic64_inc_return(&session_id);
 	priv_data->seq_num = 0;
 	priv_data->done_stop = false;
 }
 
-static int prepare_seq_file(struct file *file, struct bpf_iter_link *link)
+static int prepare_seq_file(struct file *file, struct bpf_iter_link *link,
+			    const struct bpf_iter_seq_info *seq_info)
 {
 	struct bpf_iter_priv_data *priv_data;
 	struct bpf_iter_target_info *tinfo;
@@ -433,21 +483,21 @@ static int prepare_seq_file(struct file *file, struct bpf_iter_link *link)
 
 	tinfo = link->tinfo;
 	total_priv_dsize = offsetof(struct bpf_iter_priv_data, target_private) +
-			   tinfo->reg_info->seq_info->seq_priv_size;
-	priv_data = __seq_open_private(file, tinfo->reg_info->seq_info->seq_ops,
+			   seq_info->seq_priv_size;
+	priv_data = __seq_open_private(file, seq_info->seq_ops,
 				       total_priv_dsize);
 	if (!priv_data) {
 		err = -ENOMEM;
 		goto release_prog;
 	}
 
-	if (tinfo->reg_info->seq_info->init_seq_private) {
-		err = tinfo->reg_info->seq_info->init_seq_private(priv_data->target_private, NULL);
+	if (seq_info->init_seq_private) {
+		err = seq_info->init_seq_private(priv_data->target_private, &link->aux);
 		if (err)
 			goto release_seq_file;
 	}
 
-	init_seq_meta(priv_data, tinfo, prog);
+	init_seq_meta(priv_data, tinfo, seq_info, prog);
 	seq = file->private_data;
 	seq->private = priv_data->target_private;
 
@@ -463,6 +513,7 @@ static int prepare_seq_file(struct file *file, struct bpf_iter_link *link)
 
 int bpf_iter_new_fd(struct bpf_link *link)
 {
+	struct bpf_iter_link *iter_link;
 	struct file *file;
 	unsigned int flags;
 	int err, fd;
@@ -481,8 +532,8 @@ int bpf_iter_new_fd(struct bpf_link *link)
 		goto free_fd;
 	}
 
-	err = prepare_seq_file(file,
-			       container_of(link, struct bpf_iter_link, link));
+	iter_link = container_of(link, struct bpf_iter_link, link);
+	err = prepare_seq_file(file, iter_link, __get_seq_info(iter_link));
 	if (err)
 		goto free_file;
 
diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
index ae18b3a86096..e740312a5456 100644
--- a/kernel/bpf/map_iter.c
+++ b/kernel/bpf/map_iter.c
@@ -98,9 +98,37 @@ static const struct bpf_iter_reg bpf_map_reg_info = {
 	.seq_info		= &bpf_map_seq_info,
 };
 
+static int bpf_iter_check_map(struct bpf_prog *prog,
+			      struct bpf_iter_aux_info *aux)
+{
+	return -EINVAL;
+}
+
+DEFINE_BPF_ITER_FUNC(bpf_map_elem, struct bpf_iter_meta *meta,
+		     struct bpf_map *map, void *key, void *value)
+
+static const struct bpf_iter_reg bpf_map_elem_reg_info = {
+	.target			= "bpf_map_elem",
+	.check_target		= bpf_iter_check_map,
+	.link_info		= BPF_ITER_LINK_MAP_FD,
+	.ctx_arg_info_size	= 2,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__bpf_map_elem, key),
+		  PTR_TO_RDONLY_BUF_OR_NULL, 0 },
+		{ offsetof(struct bpf_iter__bpf_map_elem, value),
+		  PTR_TO_RDONLY_BUF_OR_NULL, 1 },
+	},
+};
+
 static int __init bpf_map_iter_init(void)
 {
-	return bpf_iter_reg_target(&bpf_map_reg_info);
+	int ret;
+
+	ret = bpf_iter_reg_target(&bpf_map_reg_info);
+	if (ret)
+		return ret;
+
+	return bpf_iter_reg_target(&bpf_map_elem_reg_info);
 }
 
 late_initcall(bpf_map_iter_init);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 548a749aebb3..550c92344b4b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -243,6 +243,13 @@ enum bpf_link_type {
 	MAX_BPF_LINK_TYPE,
 };
 
+enum bpf_iter_link_info {
+	BPF_ITER_LINK_UNSPEC = 0,
+	BPF_ITER_LINK_MAP_FD = 1,
+
+	MAX_BPF_ITER_LINK_INFO,
+};
+
 /* cgroup-bpf attach flags used in BPF_PROG_ATTACH command
  *
  * NONE(default): No further bpf programs allowed in the subtree.
-- 
2.24.1



* [PATCH bpf-next 05/13] bpf: implement bpf iterator for hash maps
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (3 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 04/13] bpf: implement bpf iterator for map elements Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 06/13] bpf: implement bpf iterator for array maps Yonghong Song
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

The bpf iterators for hash, percpu hash, lru hash
and lru percpu hash maps are implemented. During link time,
bpf_iter_reg->check_target() will check the map type
and ensure that the key/value regions accessed by the program
are within the map's defined key/value size limits.

For the percpu hash and lru percpu hash maps, the bpf program
will receive values for all cpus. The map element
bpf iterator infrastructure will prepare the value
properly before passing the value pointer to the
bpf program.
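
A sketch of what the prepared value looks like from the
program's point of view (assuming a __u32 value type and at
least two possible cpus; each per-cpu copy occupies a slot of
round_up(value_size, 8) bytes):

  __u64 val_sum = 0;

  SEC("iter/bpf_map_elem")
  int sum_percpu(struct bpf_iter__bpf_map_elem *ctx)
  {
          void *v = ctx->value;

          if (v == NULL)
                  return 0;

          /* per-cpu copies are laid out back to back, so with a
           * __u32 value type cpu 1's copy starts at offset 8 */
          val_sum += *(__u32 *)v;
          val_sum += *(__u32 *)(v + 8);
          return 0;
  }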

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/hashtab.c  | 191 ++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/map_iter.c |  24 +++++-
 2 files changed, 214 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d4378d7d442b..56280b10cb99 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1612,6 +1612,193 @@ htab_lru_map_lookup_and_delete_batch(struct bpf_map *map,
 						  true, false);
 }
 
+struct bpf_iter_seq_hash_map_info {
+	struct bpf_map *map;
+	struct bpf_htab *htab;
+	void *percpu_value_buf; // non-zero means percpu hash
+	unsigned long flags;
+	u32 bucket_id;
+	u32 skip_elems;
+};
+
+static struct htab_elem *
+bpf_hash_map_seq_find_next(struct bpf_iter_seq_hash_map_info *info,
+			   struct htab_elem *prev_elem)
+{
+	const struct bpf_htab *htab = info->htab;
+	unsigned long flags = info->flags;
+	u32 skip_elems = info->skip_elems;
+	u32 bucket_id = info->bucket_id;
+	struct hlist_nulls_head *head;
+	struct hlist_nulls_node *n;
+	struct htab_elem *elem;
+	struct bucket *b;
+	u32 i, count;
+
+	if (bucket_id >= htab->n_buckets)
+		return NULL;
+
+	/* try to find next elem in the same bucket */
+	if (prev_elem) {
+		n = rcu_dereference_raw(hlist_nulls_next_rcu(&prev_elem->hash_node));
+		elem = hlist_nulls_entry_safe(n, struct htab_elem, hash_node);
+		if (elem)
+			return elem;
+
+		/* not found, unlock and go to the next bucket */
+		b = &htab->buckets[bucket_id++];
+		htab_unlock_bucket(htab, b, flags);
+		skip_elems = 0;
+	}
+
+	for (i = bucket_id; i < htab->n_buckets; i++) {
+		b = &htab->buckets[i];
+		flags = htab_lock_bucket(htab, b);
+
+		count = 0;
+		head = &b->head;
+		hlist_nulls_for_each_entry_rcu(elem, n, head, hash_node) {
+			if (count >= skip_elems) {
+				info->flags = flags;
+				info->bucket_id = i;
+				info->skip_elems = count;
+				return elem;
+			}
+			count++;
+		}
+
+		htab_unlock_bucket(htab, b, flags);
+		skip_elems = 0;
+	}
+
+	info->bucket_id = i;
+	info->skip_elems = 0;
+	return NULL;
+}
+
+static void *bpf_hash_map_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct bpf_iter_seq_hash_map_info *info = seq->private;
+	struct htab_elem *elem;
+
+	elem = bpf_hash_map_seq_find_next(info, NULL);
+	if (!elem)
+		return NULL;
+
+	if (*pos == 0)
+		++*pos;
+	return elem;
+}
+
+static void *bpf_hash_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bpf_iter_seq_hash_map_info *info = seq->private;
+
+	++*pos;
+	++info->skip_elems;
+	return bpf_hash_map_seq_find_next(info, v);
+}
+
+static int __bpf_hash_map_seq_show(struct seq_file *seq, struct htab_elem *elem)
+{
+	struct bpf_iter_seq_hash_map_info *info = seq->private;
+	u32 roundup_key_size, roundup_value_size;
+	struct bpf_iter__bpf_map_elem ctx = {};
+	struct bpf_map *map = info->map;
+	struct bpf_iter_meta meta;
+	int ret = 0, off = 0, cpu;
+	struct bpf_prog *prog;
+	void __percpu *pptr;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, elem == NULL);
+	if (prog) {
+		ctx.meta = &meta;
+		ctx.map = info->map;
+		if (elem) {
+			roundup_key_size = round_up(map->key_size, 8);
+			ctx.key = elem->key;
+			if (!info->percpu_value_buf) {
+				ctx.value = elem->key + roundup_key_size;
+			} else {
+				roundup_value_size = round_up(map->value_size, 8);
+				pptr = htab_elem_get_ptr(elem, map->key_size);
+				for_each_possible_cpu(cpu) {
+					bpf_long_memcpy(info->percpu_value_buf + off,
+							per_cpu_ptr(pptr, cpu),
+							roundup_value_size);
+					off += roundup_value_size;
+				}
+				ctx.value = info->percpu_value_buf;
+			}
+		}
+		ret = bpf_iter_run_prog(prog, &ctx);
+	}
+
+	return ret;
+}
+
+static int bpf_hash_map_seq_show(struct seq_file *seq, void *v)
+{
+	return __bpf_hash_map_seq_show(seq, v);
+}
+
+static void bpf_hash_map_seq_stop(struct seq_file *seq, void *v)
+{
+	struct bpf_iter_seq_hash_map_info *info = seq->private;
+
+	if (!v)
+		(void)__bpf_hash_map_seq_show(seq, NULL);
+	else
+		htab_unlock_bucket(info->htab,
+				   &info->htab->buckets[info->bucket_id],
+				   info->flags);
+}
+
+static int bpf_iter_init_hash_map(void *priv_data,
+				  struct bpf_iter_aux_info *aux)
+{
+	struct bpf_iter_seq_hash_map_info *seq_info = priv_data;
+	struct bpf_map *map = aux->map;
+	void *value_buf;
+	u32 buf_size;
+
+	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH) {
+		buf_size = round_up(map->value_size, 8) * num_possible_cpus();
+		value_buf = kmalloc(buf_size, GFP_USER | __GFP_NOWARN);
+		if (!value_buf)
+			return -ENOMEM;
+
+		seq_info->percpu_value_buf = value_buf;
+	}
+
+	seq_info->map = map;
+	seq_info->htab = container_of(map, struct bpf_htab, map);
+	return 0;
+}
+
+static void bpf_iter_fini_hash_map(void *priv_data)
+{
+	struct bpf_iter_seq_hash_map_info *seq_info = priv_data;
+
+	kfree(seq_info->percpu_value_buf);
+}
+
+static const struct seq_operations bpf_hash_map_seq_ops = {
+	.start	= bpf_hash_map_seq_start,
+	.next	= bpf_hash_map_seq_next,
+	.stop	= bpf_hash_map_seq_stop,
+	.show	= bpf_hash_map_seq_show,
+};
+
+static const struct bpf_iter_seq_info iter_seq_info = {
+	.seq_ops		= &bpf_hash_map_seq_ops,
+	.init_seq_private	= bpf_iter_init_hash_map,
+	.fini_seq_private	= bpf_iter_fini_hash_map,
+	.seq_priv_size		= sizeof(struct bpf_iter_seq_hash_map_info),
+};
+
 static int htab_map_btf_id;
 const struct bpf_map_ops htab_map_ops = {
 	.map_alloc_check = htab_map_alloc_check,
@@ -1626,6 +1813,7 @@ const struct bpf_map_ops htab_map_ops = {
 	BATCH_OPS(htab),
 	.map_btf_name = "bpf_htab",
 	.map_btf_id = &htab_map_btf_id,
+	.iter_seq_info = &iter_seq_info,
 };
 
 static int htab_lru_map_btf_id;
@@ -1643,6 +1831,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
 	BATCH_OPS(htab_lru),
 	.map_btf_name = "bpf_htab",
 	.map_btf_id = &htab_lru_map_btf_id,
+	.iter_seq_info = &iter_seq_info,
 };
 
 /* Called from eBPF program */
@@ -1760,6 +1949,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
 	BATCH_OPS(htab_percpu),
 	.map_btf_name = "bpf_htab",
 	.map_btf_id = &htab_percpu_map_btf_id,
+	.iter_seq_info = &iter_seq_info,
 };
 
 static int htab_lru_percpu_map_btf_id;
@@ -1775,6 +1965,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
 	BATCH_OPS(htab_lru_percpu),
 	.map_btf_name = "bpf_htab",
 	.map_btf_id = &htab_lru_percpu_map_btf_id,
+	.iter_seq_info = &iter_seq_info,
 };
 
 static int fd_htab_map_alloc_check(union bpf_attr *attr)
diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
index e740312a5456..2988244853d1 100644
--- a/kernel/bpf/map_iter.c
+++ b/kernel/bpf/map_iter.c
@@ -101,7 +101,29 @@ static const struct bpf_iter_reg bpf_map_reg_info = {
 static int bpf_iter_check_map(struct bpf_prog *prog,
 			      struct bpf_iter_aux_info *aux)
 {
-	return -EINVAL;
+	u32 key_acc_size, value_acc_size, key_size, value_size;
+	struct bpf_map *map = aux->map;
+	bool is_percpu = false;
+
+	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH)
+		is_percpu = true;
+	else if (map->map_type != BPF_MAP_TYPE_HASH &&
+		 map->map_type != BPF_MAP_TYPE_LRU_HASH)
+		return -EINVAL;
+
+	key_acc_size = prog->aux->max_rdonly_access[0];
+	value_acc_size = prog->aux->max_rdonly_access[1];
+	key_size = map->key_size;
+	if (!is_percpu)
+		value_size = map->value_size;
+	else
+		value_size = round_up(map->value_size, 8) * num_possible_cpus();
+
+	if (key_acc_size > key_size || value_acc_size > value_size)
+		return -EACCES;
+
+	return 0;
 }
 
 DEFINE_BPF_ITER_FUNC(bpf_map_elem, struct bpf_iter_meta *meta,
-- 
2.24.1



* [PATCH bpf-next 06/13] bpf: implement bpf iterator for array maps
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (4 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 05/13] bpf: implement bpf iterator for hash maps Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 18:49   ` kernel test robot
  2020-07-13 16:17 ` [PATCH bpf-next 07/13] bpf: implement bpf iterator for sock local storage map Yonghong Song
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

The bpf iterators for array and percpu array maps
are implemented. Similar to hash maps, for the percpu
array map the bpf program will receive values
from all cpus.
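
The user-space consumption flow is the same as for the hash map
iterators: create an iterator fd from the link and read() it like
any seq_file. A sketch using libbpf's bpf_iter_create(), with
error handling elided:

  #include <unistd.h>
  #include <bpf/bpf.h>

  static void read_map_iter(int link_fd)
  {
          char buf[4096];
          int iter_fd, len;

          iter_fd = bpf_iter_create(link_fd);  /* BPF_ITER_CREATE */
          if (iter_fd < 0)
                  return;

          /* each read() runs the bpf program over more elements */
          while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
                  write(1, buf, len);

          close(iter_fd);
  }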

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 kernel/bpf/arraymap.c | 140 ++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/map_iter.c |   6 +-
 2 files changed, 144 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index c66e8273fccd..e855bafb3a59 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -487,6 +487,144 @@ static int array_map_mmap(struct bpf_map *map, struct vm_area_struct *vma)
 				   vma->vm_pgoff + pgoff);
 }
 
+struct bpf_iter_seq_array_map_info {
+	struct bpf_map *map;
+	void *percpu_value_buf;
+	u32 index;
+};
+
+static void *bpf_array_map_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct bpf_iter_seq_array_map_info *info = seq->private;
+	struct bpf_map *map = info->map;
+	struct bpf_array *array;
+	u32 index;
+
+	if (info->index >= map->max_entries)
+		return NULL;
+
+	if (*pos == 0)
+		++*pos;
+	array = container_of(map, struct bpf_array, map);
+	index = info->index & array->index_mask;
+	if (info->percpu_value_buf)
+	       return array->pptrs[index];
+	return array->value + array->elem_size * index;
+}
+
+static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+	struct bpf_iter_seq_array_map_info *info = seq->private;
+	struct bpf_map *map = info->map;
+	struct bpf_array *array;
+	u32 index;
+
+	++*pos;
+	++info->index;
+	if (info->index >= map->max_entries)
+		return NULL;
+
+	array = container_of(map, struct bpf_array, map);
+	index = info->index & array->index_mask;
+	if (info->percpu_value_buf)
+	       return array->pptrs[index];
+	return array->value + array->elem_size * index;
+}
+
+static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
+{
+	struct bpf_iter_seq_array_map_info *info = seq->private;
+	struct bpf_iter__bpf_map_elem ctx = {};
+	struct bpf_map *map = info->map;
+	struct bpf_iter_meta meta;
+	struct bpf_array *array;
+	struct bpf_prog *prog;
+	int off = 0, cpu = 0;
+	void __percpu **pptr;
+	u32 size;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, v == NULL);
+	if (!prog)
+		return 0;
+
+	ctx.meta = &meta;
+	ctx.map = info->map;
+	if (v) {
+		ctx.key = &info->index;
+
+		array = container_of(map, struct bpf_array, map);
+		if (!info->percpu_value_buf) {
+			ctx.value = v;
+		} else {
+			pptr = v;
+			size = round_up(map->value_size, 8);
+			for_each_possible_cpu(cpu) {
+				bpf_long_memcpy(info->percpu_value_buf + off,
+						per_cpu_ptr(pptr, cpu),
+						size);
+				off += size;
+			}
+			ctx.value = info->percpu_value_buf;
+		}
+	}
+
+	return bpf_iter_run_prog(prog, &ctx);
+}
+
+static int bpf_array_map_seq_show(struct seq_file *seq, void *v)
+{
+	return __bpf_array_map_seq_show(seq, v);
+}
+
+static void bpf_array_map_seq_stop(struct seq_file *seq, void *v)
+{
+	if (!v)
+		(void)__bpf_array_map_seq_show(seq, NULL);
+}
+
+static int bpf_iter_init_array_map(void *priv_data,
+				   struct bpf_iter_aux_info *aux)
+{
+	struct bpf_iter_seq_array_map_info *seq_info = priv_data;
+	struct bpf_map *map = aux->map;
+	void *value_buf;
+	u32 buf_size;
+
+	if (map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY) {
+		buf_size = round_up(map->value_size, 8) * num_possible_cpus();
+		value_buf = kmalloc(buf_size, GFP_USER | __GFP_NOWARN);
+		if (!value_buf)
+			return -ENOMEM;
+
+		seq_info->percpu_value_buf = value_buf;
+	}
+
+	seq_info->map = map;
+	return 0;
+}
+
+static void bpf_iter_fini_array_map(void *priv_data)
+{
+	struct bpf_iter_seq_array_map_info *seq_info = priv_data;
+
+	kfree(seq_info->percpu_value_buf);
+}
+
+static const struct seq_operations bpf_array_map_seq_ops = {
+	.start	= bpf_array_map_seq_start,
+	.next	= bpf_array_map_seq_next,
+	.stop	= bpf_array_map_seq_stop,
+	.show	= bpf_array_map_seq_show,
+};
+
+static const struct bpf_iter_seq_info iter_seq_info = {
+	.seq_ops		= &bpf_array_map_seq_ops,
+	.init_seq_private	= bpf_iter_init_array_map,
+	.fini_seq_private	= bpf_iter_fini_array_map,
+	.seq_priv_size		= sizeof(struct bpf_iter_seq_array_map_info),
+};
+
 static int array_map_btf_id;
 const struct bpf_map_ops array_map_ops = {
 	.map_alloc_check = array_map_alloc_check,
@@ -506,6 +644,7 @@ const struct bpf_map_ops array_map_ops = {
 	.map_update_batch = generic_map_update_batch,
 	.map_btf_name = "bpf_array",
 	.map_btf_id = &array_map_btf_id,
+	.iter_seq_info = &iter_seq_info,
 };
 
 static int percpu_array_map_btf_id;
@@ -521,6 +660,7 @@ const struct bpf_map_ops percpu_array_map_ops = {
 	.map_check_btf = array_map_check_btf,
 	.map_btf_name = "bpf_array",
 	.map_btf_id = &percpu_array_map_btf_id,
+	.iter_seq_info = &iter_seq_info,
 };
 
 static int fd_array_map_alloc_check(union bpf_attr *attr)
diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c
index 2988244853d1..5729d78166e3 100644
--- a/kernel/bpf/map_iter.c
+++ b/kernel/bpf/map_iter.c
@@ -106,10 +106,12 @@ static int bpf_iter_check_map(struct bpf_prog *prog,
 	bool is_percpu = false;
 
 	if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
-	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH)
+	    map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
+	    map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
 		is_percpu = true;
 	else if (map->map_type != BPF_MAP_TYPE_HASH &&
-		 map->map_type != BPF_MAP_TYPE_LRU_HASH)
+		 map->map_type != BPF_MAP_TYPE_LRU_HASH &&
+		 map->map_type != BPF_MAP_TYPE_ARRAY)
 		return -EINVAL;
 
 	key_acc_size = prog->aux->max_rdonly_access[0];
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 07/13] bpf: implement bpf iterator for sock local storage map
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (5 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 06/13] bpf: implement bpf iterator for array maps Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 08/13] tools/libbpf: add support for bpf map element iterator Yonghong Song
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

The bpf iterator for the bpf sock local storage map
is implemented. User space interacts with a sock
local storage map using a socket fd as the key and
the storage as the value. In the kernel, passing an
fd to the bpf program does not really make sense;
in this case, the sock itself is passed to the bpf
program instead.
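
A minimal sketch of a matching bpf program, mirroring the selftest
added later in this series (ipv6_sk_count is an assumed global):

  SEC("iter/bpf_sk_storage_map")
  int dump_sk_storage(struct bpf_iter__bpf_sk_storage_map *ctx)
  {
  	struct sock *sk = ctx->sk;
  	__u32 *val = ctx->value;

  	if (sk == (void *)0 || val == (void *)0)
  		return 0;

  	if (sk->sk_family == AF_INET6)
  		ipv6_sk_count++;
  	return 0;
  }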

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 net/core/bpf_sk_storage.c | 203 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 203 insertions(+)

diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 6f921c4ddc2c..95638eca5d67 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -1217,3 +1217,206 @@ int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
 	return err;
 }
 EXPORT_SYMBOL_GPL(bpf_sk_storage_diag_put);
+
+struct bpf_iter_seq_sk_storage_map_info {
+	struct bpf_map *map;
+	unsigned int bucket_id;
+	unsigned skip_elems;
+};
+
+static struct bpf_sk_storage_elem *
+bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
+				 struct bpf_sk_storage_elem *prev_selem)
+{
+	struct bpf_sk_storage *sk_storage;
+	struct bpf_sk_storage_elem *selem;
+	u32 skip_elems = info->skip_elems;
+	struct bpf_sk_storage_map *smap;
+	u32 bucket_id = info->bucket_id;
+	u32 i, count, n_buckets;
+	struct bucket *b;
+
+	smap = (struct bpf_sk_storage_map *)info->map;
+	n_buckets = 1U << smap->bucket_log;
+	if (bucket_id >= n_buckets)
+		return NULL;
+
+	/* try to find next selem in the same bucket */
+	selem = prev_selem;
+	count = 0;
+	while (selem) {
+		selem = hlist_entry_safe(selem->map_node.next,
+					 struct bpf_sk_storage_elem, map_node);
+		if (!selem) {
+			/* not found, unlock and go to the next bucket */
+			b = &smap->buckets[bucket_id++];
+			raw_spin_unlock_bh(&b->lock);
+			skip_elems = 0;
+			break;
+		}
+		sk_storage = rcu_dereference_raw(selem->sk_storage);
+		if (sk_storage) {
+			info->skip_elems = skip_elems + count;
+			return selem;
+		}
+		count++;
+	}
+
+	for (i = bucket_id; i < (1U << smap->bucket_log); i++) {
+		b = &smap->buckets[i];
+		raw_spin_lock_bh(&b->lock);
+		count = 0;
+		hlist_for_each_entry(selem, &b->list, map_node) {
+			sk_storage = rcu_dereference_raw(selem->sk_storage);
+			if (sk_storage && count >= skip_elems) {
+				info->bucket_id = i;
+				info->skip_elems = count;
+				return selem;
+			}
+			count++;
+		}
+		raw_spin_unlock_bh(&b->lock);
+		skip_elems = 0;
+	}
+
+	info->bucket_id = i;
+	info->skip_elems = 0;
+	return NULL;
+}
+
+static void *bpf_sk_storage_map_seq_start(struct seq_file *seq, loff_t *pos)
+{
+	struct bpf_sk_storage_elem *selem;
+
+	selem = bpf_sk_storage_map_seq_find_next(seq->private, NULL);
+	if (!selem)
+		return NULL;
+
+	if (*pos == 0)
+		++*pos;
+	return selem;
+}
+
+static void *bpf_sk_storage_map_seq_next(struct seq_file *seq, void *v,
+					 loff_t *pos)
+{
+	struct bpf_iter_seq_sk_storage_map_info *info = seq->private;
+
+	++*pos;
+	++info->skip_elems;
+	return bpf_sk_storage_map_seq_find_next(seq->private, v);
+}
+
+struct bpf_iter__bpf_sk_storage_map {
+	__bpf_md_ptr(struct bpf_iter_meta *, meta);
+	__bpf_md_ptr(struct bpf_map *, map);
+	__bpf_md_ptr(struct sock *, sk);
+	__bpf_md_ptr(void *, value);
+};
+
+DEFINE_BPF_ITER_FUNC(bpf_sk_storage_map, struct bpf_iter_meta *meta,
+		     struct bpf_map *map, struct sock *sk,
+		     void *value)
+
+static int __bpf_sk_storage_map_seq_show(struct seq_file *seq,
+					 struct bpf_sk_storage_elem *selem)
+{
+	struct bpf_iter_seq_sk_storage_map_info *info = seq->private;
+	struct bpf_iter__bpf_sk_storage_map ctx = {};
+	struct bpf_sk_storage *sk_storage;
+	struct bpf_iter_meta meta;
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	meta.seq = seq;
+	prog = bpf_iter_get_info(&meta, selem == NULL);
+	if (prog) {
+		ctx.meta = &meta;
+		ctx.map = info->map;
+		if (selem) {
+			sk_storage = rcu_dereference_raw(selem->sk_storage);
+			ctx.sk = sk_storage->sk;
+			ctx.value = SDATA(selem)->data;
+		}
+		ret = bpf_iter_run_prog(prog, &ctx);
+	}
+
+	return ret;
+}
+
+static int bpf_sk_storage_map_seq_show(struct seq_file *seq, void *v)
+{
+	return __bpf_sk_storage_map_seq_show(seq, v);
+}
+
+static void bpf_sk_storage_map_seq_stop(struct seq_file *seq, void *v)
+{
+	struct bpf_iter_seq_sk_storage_map_info *info = seq->private;
+	struct bpf_sk_storage_map *smap;
+	struct bucket *b;
+
+	if (!v) {
+		(void)__bpf_sk_storage_map_seq_show(seq, v);
+	} else {
+		smap = (struct bpf_sk_storage_map *)info->map;
+		b = &smap->buckets[info->bucket_id];
+		raw_spin_unlock_bh(&b->lock);
+	}
+}
+
+static int bpf_iter_init_sk_storage_map(void *priv_data,
+					struct bpf_iter_aux_info *aux)
+{
+	struct bpf_iter_seq_sk_storage_map_info *seq_info = priv_data;
+
+	seq_info->map = aux->map;
+	return 0;
+}
+
+static int bpf_iter_check_map(struct bpf_prog *prog,
+			      struct bpf_iter_aux_info *aux)
+{
+	struct bpf_map *map = aux->map;
+
+	if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
+		return -EINVAL;
+
+	if (prog->aux->max_rdonly_access[0] > map->value_size)
+		return -EACCES;
+
+	return 0;
+}
+
+static const struct seq_operations bpf_sk_storage_map_seq_ops = {
+	.start  = bpf_sk_storage_map_seq_start,
+	.next   = bpf_sk_storage_map_seq_next,
+	.stop   = bpf_sk_storage_map_seq_stop,
+	.show   = bpf_sk_storage_map_seq_show,
+};
+
+static const struct bpf_iter_seq_info iter_seq_info = {
+	.seq_ops		= &bpf_sk_storage_map_seq_ops,
+	.init_seq_private	= bpf_iter_init_sk_storage_map,
+	.fini_seq_private	= NULL,
+	.seq_priv_size		= sizeof(struct bpf_iter_seq_sk_storage_map_info),
+};
+
+static const struct bpf_iter_reg bpf_sk_storage_map_reg_info = {
+	.target			= "bpf_sk_storage_map",
+	.check_target		= bpf_iter_check_map,
+	.link_info		= BPF_ITER_LINK_MAP_FD,
+	.ctx_arg_info_size	= 2,
+	.ctx_arg_info		= {
+		{ offsetof(struct bpf_iter__bpf_sk_storage_map, sk),
+		  PTR_TO_BTF_ID_OR_NULL },
+		{ offsetof(struct bpf_iter__bpf_sk_storage_map, value),
+		  PTR_TO_RDONLY_BUF_OR_NULL },
+	},
+	.seq_info		= &iter_seq_info,
+};
+
+static int __init bpf_sk_storage_map_iter_init(void)
+{
+	return bpf_iter_reg_target(&bpf_sk_storage_map_reg_info);
+}
+late_initcall(bpf_sk_storage_map_iter_init);
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 08/13] tools/libbpf: add support for bpf map element iterator
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (6 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 07/13] bpf: implement bpf iterator for sock local storage map Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 09/13] tools/bpftool: add bpftool " Yonghong Song
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

Add map_fd to bpf_iter_attach_opts and flags to
bpf_link_create_opts. Later on, bpftool and selftests
will be able to create a bpf map element iterator
by passing map_fd to the kernel at link creation
time.
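
A usage sketch with the new option (prog and map_fd obtained
elsewhere, e.g. via bpf_object__find_program_by_name() and
bpf_map__fd()):

  DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
  struct bpf_link *link;

  opts.map_fd = map_fd;
  link = bpf_program__attach_iter(prog, &opts);
  if (IS_ERR(link))
  	/* handle the error */;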

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/lib/bpf/bpf.c    |  1 +
 tools/lib/bpf/bpf.h    |  3 ++-
 tools/lib/bpf/libbpf.c | 10 +++++++++-
 tools/lib/bpf/libbpf.h |  3 ++-
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index a7329b671c41..e1bdf214f75f 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -598,6 +598,7 @@ int bpf_link_create(int prog_fd, int target_fd,
 	attr.link_create.prog_fd = prog_fd;
 	attr.link_create.target_fd = target_fd;
 	attr.link_create.attach_type = attach_type;
+	attr.link_create.flags = OPTS_GET(opts, flags, 0);
 
 	return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 1b6015b21ba8..329a8db5526b 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -170,8 +170,9 @@ LIBBPF_API int bpf_prog_detach2(int prog_fd, int attachable_fd,
 
 struct bpf_link_create_opts {
 	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
 };
-#define bpf_link_create_opts__last_field sz
+#define bpf_link_create_opts__last_field flags
 
 LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
 			       enum bpf_attach_type attach_type,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 25e4f77be8d7..35ed8fbe6502 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8271,13 +8271,20 @@ struct bpf_link *
 bpf_program__attach_iter(struct bpf_program *prog,
 			 const struct bpf_iter_attach_opts *opts)
 {
+	DECLARE_LIBBPF_OPTS(bpf_link_create_opts, link_create_opts);
 	char errmsg[STRERR_BUFSIZE];
 	struct bpf_link *link;
 	int prog_fd, link_fd;
+	__u32 target_fd = 0;
 
 	if (!OPTS_VALID(opts, bpf_iter_attach_opts))
 		return ERR_PTR(-EINVAL);
 
+	if (OPTS_HAS(opts, map_fd)) {
+		target_fd = opts->map_fd;
+		link_create_opts.flags = BPF_ITER_LINK_MAP_FD;
+	}
+
 	prog_fd = bpf_program__fd(prog);
 	if (prog_fd < 0) {
 		pr_warn("program '%s': can't attach before loaded\n",
@@ -8290,7 +8297,8 @@ bpf_program__attach_iter(struct bpf_program *prog,
 		return ERR_PTR(-ENOMEM);
 	link->detach = &bpf_link__detach_fd;
 
-	link_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_ITER, NULL);
+	link_fd = bpf_link_create(prog_fd, target_fd, BPF_TRACE_ITER,
+				  &link_create_opts);
 	if (link_fd < 0) {
 		link_fd = -errno;
 		free(link);
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 2335971ed0bd..9b883fa7724a 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -264,8 +264,9 @@ LIBBPF_API struct bpf_link *bpf_map__attach_struct_ops(struct bpf_map *map);
 
 struct bpf_iter_attach_opts {
 	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 map_fd;
 };
-#define bpf_iter_attach_opts__last_field sz
+#define bpf_iter_attach_opts__last_field map_fd
 
 LIBBPF_API struct bpf_link *
 bpf_program__attach_iter(struct bpf_program *prog,
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 09/13] tools/bpftool: add bpftool support for bpf map element iterator
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (7 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 08/13] tools/libbpf: add support for bpf map element iterator Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-16 16:39   ` Quentin Monnet
  2020-07-13 16:17 ` [PATCH bpf-next 10/13] selftests/bpf: add test for bpf hash map iterators Yonghong Song
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

The optional parameter "map MAP" can be added to the "bpftool iter"
command to create a bpf iterator for map elements. For example,
  bpftool iter pin ./prog.o /sys/fs/bpf/p1 map id 333

For a map element bpf iterator, the "map MAP" parameter is required;
otherwise, bpf link creation will return an error.
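
The map can also be specified by a pinned path instead of an id, e.g.,
  bpftool iter pin ./prog.o /sys/fs/bpf/p1 map pinned /sys/fs/bpf/m1
and the pinned iterator's output can then be read with
"cat /sys/fs/bpf/p1".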

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../bpftool/Documentation/bpftool-iter.rst    | 16 ++++++++--
 tools/bpf/bpftool/iter.c                      | 32 ++++++++++++++++---
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-iter.rst b/tools/bpf/bpftool/Documentation/bpftool-iter.rst
index 8dce698eab79..53ee4fb188b4 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-iter.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-iter.rst
@@ -17,14 +17,15 @@ SYNOPSIS
 ITER COMMANDS
 ===================
 
-|	**bpftool** **iter pin** *OBJ* *PATH*
+|	**bpftool** **iter pin** *OBJ* *PATH* [**map** *MAP*]
 |	**bpftool** **iter help**
 |
 |	*OBJ* := /a/file/of/bpf_iter_target.o
+|       *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
 
 DESCRIPTION
 ===========
-	**bpftool iter pin** *OBJ* *PATH*
+	**bpftool iter pin** *OBJ* *PATH* [**map** *MAP*]
 		  A bpf iterator combines a kernel iterating of
 		  particular kernel data (e.g., tasks, bpf_maps, etc.)
 		  and a bpf program called for each kernel data object
@@ -37,6 +38,10 @@ DESCRIPTION
 		  character ('.'), which is reserved for future extensions
 		  of *bpffs*.
 
+                  Map element bpf iterator requires an additional parameter
+                  *MAP* so bpf program can iterate over map elements for
+                  that map.
+
 		  User can then *cat PATH* to see the bpf iterator output.
 
 	**bpftool iter help**
@@ -64,6 +69,13 @@ EXAMPLES
    Create a file-based bpf iterator from bpf_iter_netlink.o and pin it
    to /sys/fs/bpf/my_netlink
 
+**# bpftool iter pin bpf_iter_hashmap.o /sys/fs/bpf/my_hashmap map id 20**
+
+::
+
+   Create a file-based bpf iterator from bpf_iter_hashmap.o and map with
+   id 20, and pin it to /sys/fs/bpf/my_hashmap
+
 SEE ALSO
 ========
 	**bpf**\ (2),
diff --git a/tools/bpf/bpftool/iter.c b/tools/bpf/bpftool/iter.c
index 33240fcc6319..cc1d9bdf6e9d 100644
--- a/tools/bpf/bpftool/iter.c
+++ b/tools/bpf/bpftool/iter.c
@@ -2,6 +2,7 @@
 // Copyright (C) 2020 Facebook
 
 #define _GNU_SOURCE
+#include <unistd.h>
 #include <linux/err.h>
 #include <bpf/libbpf.h>
 
@@ -9,11 +10,12 @@
 
 static int do_pin(int argc, char **argv)
 {
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, iter_opts);
 	const char *objfile, *path;
 	struct bpf_program *prog;
 	struct bpf_object *obj;
 	struct bpf_link *link;
-	int err;
+	int err = -1, map_fd = -1;
 
 	if (!REQ_ARGS(2))
 		usage();
@@ -21,10 +23,26 @@ static int do_pin(int argc, char **argv)
 	objfile = GET_ARG();
 	path = GET_ARG();
 
+	/* optional arguments */
+	if (argc) {
+		if (is_prefix(*argv, "map")) {
+			NEXT_ARG();
+
+			if (!REQ_ARGS(2)) {
+				p_err("incorrect map spec");
+				return -1;
+			}
+
+			map_fd = map_parse_fd(&argc, &argv);
+			if (map_fd < 0)
+				return -1;
+		}
+	}
+
 	obj = bpf_object__open(objfile);
 	if (IS_ERR(obj)) {
 		p_err("can't open objfile %s", objfile);
-		return -1;
+		goto close_map_fd;
 	}
 
 	err = bpf_object__load(obj);
@@ -39,7 +57,10 @@ static int do_pin(int argc, char **argv)
 		goto close_obj;
 	}
 
-	link = bpf_program__attach_iter(prog, NULL);
+	if (map_fd >= 0)
+		iter_opts.map_fd = map_fd;
+
+	link = bpf_program__attach_iter(prog, &iter_opts);
 	if (IS_ERR(link)) {
 		err = PTR_ERR(link);
 		p_err("attach_iter failed for program %s",
@@ -62,13 +83,16 @@ static int do_pin(int argc, char **argv)
 	bpf_link__destroy(link);
 close_obj:
 	bpf_object__close(obj);
+close_map_fd:
+	if (map_fd >= 0)
+		close(map_fd);
 	return err;
 }
 
 static int do_help(int argc, char **argv)
 {
 	fprintf(stderr,
-		"Usage: %1$s %2$s pin OBJ PATH\n"
+		"Usage: %1$s %2$s pin OBJ PATH [map MAP]\n"
 		"       %1$s %2$s help\n"
 		"",
 		bin_name, "iter");
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 10/13] selftests/bpf: add test for bpf hash map iterators
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (8 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 09/13] tools/bpftool: add bpftool " Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 11/13] selftests/bpf: add test for bpf array " Yonghong Song
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

Two subtests are added.
  $ ./test_progs -n 4
  ...
  #4/18 bpf_hash_map:OK
  #4/19 bpf_percpu_hash_map:OK
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/prog_tests/bpf_iter.c       | 189 ++++++++++++++++++
 .../bpf/progs/bpf_iter_bpf_hash_map.c         | 100 +++++++++
 .../bpf/progs/bpf_iter_bpf_percpu_hash_map.c  |  51 +++++
 3 files changed, 340 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_hash_map.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index fed42755416d..0433a181c6c8 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -15,6 +15,8 @@
 #include "bpf_iter_test_kern2.skel.h"
 #include "bpf_iter_test_kern3.skel.h"
 #include "bpf_iter_test_kern4.skel.h"
+#include "bpf_iter_bpf_hash_map.skel.h"
+#include "bpf_iter_bpf_percpu_hash_map.skel.h"
 
 static int duration;
 
@@ -455,6 +457,189 @@ static void test_overflow(bool test_e2big_overflow, bool ret1)
 	bpf_iter_test_kern4__destroy(skel);
 }
 
+static void test_bpf_hash_map(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	struct bpf_iter_bpf_hash_map *skel;
+	int err, i, map_fd, iter_fd;
+	struct bpf_link *link;
+	struct key_t {
+		int a;
+		int b;
+		int c;
+	} key;
+	__u32 expected_key_a = 0, expected_key_b = 0, expected_key_c = 0;
+	__u64 val, expected_val = 0;
+	char buf[64];
+	int len;
+
+	skel = bpf_iter_bpf_hash_map__open();
+	if (CHECK(!skel, "bpf_iter_bpf_hash_map__open",
+		  "skeleton open failed\n"))
+		return;
+
+	skel->bss->in_test_mode = true;
+
+	err = bpf_iter_bpf_hash_map__load(skel);
+	if (CHECK(!skel, "bpf_iter_bpf_hash_map__load",
+		  "skeleton load failed\n"))
+		goto out;
+
+	/* iterator with hashmap2 and hashmap3 should fail */
+	opts.map_fd = bpf_map__fd(skel->maps.hashmap2);
+	link = bpf_program__attach_iter(skel->progs.dump_bpf_hash_map, &opts);
+	if (CHECK(!IS_ERR(link), "attach_iter",
+		  "attach_iter for hashmap2 unexpected succeeded\n"))
+		goto out;
+
+	opts.map_fd = bpf_map__fd(skel->maps.hashmap3);
+	link = bpf_program__attach_iter(skel->progs.dump_bpf_hash_map, &opts);
+	if (CHECK(!IS_ERR(link), "attach_iter",
+		  "attach_iter for hashmap3 unexpected succeeded\n"))
+		goto out;
+
+	/* hashmap1 should be good, update map values here */
+	map_fd = bpf_map__fd(skel->maps.hashmap1);
+	for (i = 0; i < bpf_map__max_entries(skel->maps.hashmap1); i++) {
+		key.a = i + 1;
+		key.b = i + 2;
+		key.c = i + 3;
+		val = i + 4;
+		expected_key_a += key.a;
+		expected_key_b += key.b;
+		expected_key_c += key.c;
+		expected_val += val;
+
+		err = bpf_map_update_elem(map_fd, &key, &val, BPF_ANY);
+		if (CHECK(err, "map_update", "map_update failed\n"))
+			goto out;
+	}
+
+	opts.map_fd = map_fd;
+	link = bpf_program__attach_iter(skel->progs.dump_bpf_hash_map, &opts);
+	if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+		goto out;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+		goto free_link;
+
+	/* do some tests */
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
+		goto close_iter;
+
+	/* test results */
+	if (CHECK(skel->bss->key_sum_a != expected_key_a,
+		  "key_sum_a", "got %u expected %u\n",
+		  skel->bss->key_sum_a, expected_key_a))
+		goto close_iter;
+	if (CHECK(skel->bss->key_sum_b != expected_key_b,
+		  "key_sum_b", "got %u expected %u\n",
+		  skel->bss->key_sum_b, expected_key_b))
+		goto close_iter;
+	if (CHECK(skel->bss->val_sum != expected_val,
+		  "val_sum", "got %llu expected %llu\n",
+		  skel->bss->val_sum, expected_val))
+		goto close_iter;
+
+close_iter:
+	close(iter_fd);
+free_link:
+	bpf_link__destroy(link);
+out:
+	bpf_iter_bpf_hash_map__destroy(skel);
+}
+
+static void test_bpf_percpu_hash_map(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	struct bpf_iter_bpf_percpu_hash_map *skel;
+	int err, i, j, map_fd, iter_fd;
+	struct bpf_link *link;
+	struct key_t {
+		int a;
+		int b;
+		int c;
+	} key;
+	__u32 expected_key_a = 0, expected_key_b = 0, expected_key_c = 0;
+	__u32 expected_val = 0;
+	char buf[64];
+	void *val;
+	int len;
+
+	val = malloc(8 * bpf_num_possible_cpus());
+
+	skel = bpf_iter_bpf_percpu_hash_map__open();
+	if (CHECK(!skel, "bpf_iter_bpf_percpu_hash_map__open",
+		  "skeleton open failed\n"))
+		return;
+
+	skel->rodata->num_cpus = bpf_num_possible_cpus();
+
+	err = bpf_iter_bpf_percpu_hash_map__load(skel);
+	if (CHECK(!skel, "bpf_iter_bpf_percpu_hash_map__load",
+		  "skeleton load failed\n"))
+		goto out;
+
+	/* update map values here */
+	map_fd = bpf_map__fd(skel->maps.hashmap1);
+	for (i = 0; i < bpf_map__max_entries(skel->maps.hashmap1); i++) {
+		key.a = i + 1;
+		key.b = i + 2;
+		key.c = i + 3;
+		expected_key_a += key.a;
+		expected_key_b += key.b;
+		expected_key_c += key.c;
+
+		for (j = 0; j < bpf_num_possible_cpus(); j++) {
+			*(__u32 *)(val + j * 8) = i + j;
+			expected_val += i + j;
+		}
+
+		err = bpf_map_update_elem(map_fd, &key, val, BPF_ANY);
+		if (CHECK(err, "map_update", "map_update failed\n"))
+			goto out;
+	}
+
+	opts.map_fd = map_fd;
+	link = bpf_program__attach_iter(skel->progs.dump_bpf_percpu_hash_map, &opts);
+	if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+		goto out;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+		goto free_link;
+
+	/* do some tests */
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
+		goto close_iter;
+
+	/* test results */
+	if (CHECK(skel->bss->key_sum_a != expected_key_a,
+		  "key_sum_a", "got %u expected %u\n",
+		  skel->bss->key_sum_a, expected_key_a))
+		goto close_iter;
+	if (CHECK(skel->bss->key_sum_b != expected_key_b,
+		  "key_sum_b", "got %u expected %u\n",
+		  skel->bss->key_sum_b, expected_key_b))
+		goto close_iter;
+	if (CHECK(skel->bss->val_sum != expected_val,
+		  "val_sum", "got %u expected %u\n",
+		  skel->bss->val_sum, expected_val))
+		goto close_iter;
+
+close_iter:
+	close(iter_fd);
+free_link:
+	bpf_link__destroy(link);
+out:
+	bpf_iter_bpf_percpu_hash_map__destroy(skel);
+}
+
 void test_bpf_iter(void)
 {
 	if (test__start_subtest("btf_id_or_null"))
@@ -491,4 +676,8 @@ void test_bpf_iter(void)
 		test_overflow(true, false);
 	if (test__start_subtest("prog-ret-1"))
 		test_overflow(false, true);
+	if (test__start_subtest("bpf_hash_map"))
+		test_bpf_hash_map();
+	if (test__start_subtest("bpf_percpu_hash_map"))
+		test_bpf_percpu_hash_map();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c
new file mode 100644
index 000000000000..07ddbfdbcab7
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_hash_map.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct key_t {
+	int a;
+	int b;
+	int c;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 3);
+	__type(key, struct key_t);
+	__type(value, __u64);
+} hashmap1 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 3);
+	__type(key, __u64);
+	__type(value, __u64);
+} hashmap2 SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 3);
+	__type(key, struct key_t);
+	__type(value, __u32);
+} hashmap3 SEC(".maps");
+
+/* will set before prog run */
+bool in_test_mode = 0;
+
+/* will collect results during prog run */
+__u32 key_sum_a = 0, key_sum_b = 0, key_sum_c = 0;
+__u64 val_sum = 0;
+
+SEC("iter/bpf_map_elem")
+int dump_bpf_hash_map(struct bpf_iter__bpf_map_elem *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	__u32 seq_num = ctx->meta->seq_num;
+	struct bpf_map *map = ctx->map;
+	struct key_t *key = ctx->key;
+	__u64 *val = ctx->value;
+
+	if (in_test_mode) {
+		/* test mode is used by selftests to
+		 * test functionality of bpf_hash_map iter.
+		 *
+		 * the above hashmap1 will have correct size
+		 * and will be accepted, hashmap2 and hashmap3
+		 * should be rejected due to smaller key/value
+		 * size.
+		 */
+		if (key == (void *)0 || val == (void *)0)
+			return 0;
+
+		key_sum_a += key->a;
+		key_sum_b += key->b;
+		key_sum_c += key->c;
+		val_sum += *val;
+		return 0;
+	}
+
+	/* non-test mode, the map is prepared with the
+	 * below bpftool command sequence:
+	 *   bpftool map create /sys/fs/bpf/m1 type hash \
+	 *   	key 12 value 8 entries 3 name map1
+	 *   bpftool map update id 77 key 0 0 0 1 0 0 0 0 0 0 0 1 \
+	 *   	value 0 0 0 1 0 0 0 1
+	 *   bpftool map update id 77 key 0 0 0 1 0 0 0 0 0 0 0 2 \
+	 *   	value 0 0 0 1 0 0 0 2
+	 * The bpftool iter command line:
+	 *   bpftool iter pin ./bpf_iter_bpf_hash_map.o /sys/fs/bpf/p1 \
+	 *   	map id 77
+	 * The below output will be:
+	 *   map dump starts
+	 *   77: (1000000 0 2000000) (200000001000000)
+	 *   77: (1000000 0 1000000) (100000001000000)
+	 *   map dump ends
+	 */
+	if (seq_num == 0)
+		BPF_SEQ_PRINTF(seq, "map dump starts\n");
+
+	if (key == (void *)0 || val == (void *)0) {
+		BPF_SEQ_PRINTF(seq, "map dump ends\n");
+		return 0;
+	}
+
+	BPF_SEQ_PRINTF(seq, "%d: (%x %d %x) (%llx)\n", map->id,
+		       key->a, key->b, key->c, *val);
+
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_hash_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_hash_map.c
new file mode 100644
index 000000000000..6709697b79dc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_hash_map.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct key_t {
+	int a;
+	int b;
+	int c;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
+	__uint(max_entries, 3);
+	__type(key, struct key_t);
+	__type(value, __u32);
+} hashmap1 SEC(".maps");
+
+/* will set before prog run */
+volatile const __u32 num_cpus = 0;
+
+/* will collect results during prog run */
+__u32 key_sum_a = 0, key_sum_b = 0, key_sum_c = 0;
+__u32 val_sum = 0;
+
+SEC("iter/bpf_map_elem")
+int dump_bpf_percpu_hash_map(struct bpf_iter__bpf_map_elem *ctx)
+{
+	struct bpf_map *map = ctx->map;
+	struct key_t *key = ctx->key;
+	void *pptr = ctx->value;
+	__u32 step;
+	int i;
+
+	if (key == (void *)0 || pptr == (void *)0)
+		return 0;
+
+	key_sum_a += key->a;
+	key_sum_b += key->b;
+	key_sum_c += key->c;
+
+	step = 8;
+	for (i = 0; i < num_cpus; i++) {
+		val_sum += *(__u32 *)pptr;
+		pptr += step;
+	}
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 11/13] selftests/bpf: add test for bpf array map iterators
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (9 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 10/13] selftests/bpf: add test for bpf hash map iterators Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 12/13] selftests/bpf: add a test for bpf sk_storage_map iterator Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 13/13] selftests/bpf: add a test for out of bound rdonly buf access Yonghong Song
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

Two subtests are added.
  $ ./test_progs -n 4
  ...
  #4/20 bpf_array_map:OK
  #4/21 bpf_percpu_array_map:OK
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/prog_tests/bpf_iter.c       | 139 ++++++++++++++++++
 .../bpf/progs/bpf_iter_bpf_array_map.c        |  38 +++++
 .../bpf/progs/bpf_iter_bpf_percpu_array_map.c |  48 ++++++
 3 files changed, 225 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_array_map.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 0433a181c6c8..926ae5b62e3a 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -17,6 +17,8 @@
 #include "bpf_iter_test_kern4.skel.h"
 #include "bpf_iter_bpf_hash_map.skel.h"
 #include "bpf_iter_bpf_percpu_hash_map.skel.h"
+#include "bpf_iter_bpf_array_map.skel.h"
+#include "bpf_iter_bpf_percpu_array_map.skel.h"
 
 static int duration;
 
@@ -640,6 +642,139 @@ static void test_bpf_percpu_hash_map(void)
 	bpf_iter_bpf_percpu_hash_map__destroy(skel);
 }
 
+static void test_bpf_array_map(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	struct bpf_iter_bpf_array_map *skel;
+	int err, i, map_fd, iter_fd;
+	struct bpf_link *link;
+	__u32 expected_key = 0;
+	__u64 val, expected_val = 0;
+	char buf[64];
+	int len;
+
+	skel = bpf_iter_bpf_array_map__open_and_load();
+	if (CHECK(!skel, "bpf_iter_bpf_array_map__open_and_load",
+		  "skeleton open_and_load failed\n"))
+		return;
+
+	map_fd = bpf_map__fd(skel->maps.arraymap1);
+	for (i = 0; i < bpf_map__max_entries(skel->maps.arraymap1); i++) {
+		val = i + 4;
+		expected_key += i;
+		expected_val += val;
+
+		err = bpf_map_update_elem(map_fd, &i, &val, BPF_ANY);
+		if (CHECK(err, "map_update", "map_update failed\n"))
+			goto out;
+	}
+
+	opts.map_fd = map_fd;
+	link = bpf_program__attach_iter(skel->progs.dump_bpf_array_map, &opts);
+	if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+		goto out;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+		goto free_link;
+
+	/* do some tests */
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
+		goto close_iter;
+
+	/* test results */
+	if (CHECK(skel->bss->key_sum != expected_key,
+		  "key_sum", "got %u expected %u\n",
+		  skel->bss->key_sum, expected_key))
+		goto close_iter;
+	if (CHECK(skel->bss->val_sum != expected_val,
+		  "val_sum", "got %llu expected %llu\n",
+		  skel->bss->val_sum, expected_val))
+		goto close_iter;
+
+close_iter:
+	close(iter_fd);
+free_link:
+	bpf_link__destroy(link);
+out:
+	bpf_iter_bpf_array_map__destroy(skel);
+}
+
+static void test_bpf_percpu_array_map(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	struct bpf_iter_bpf_percpu_array_map *skel;
+	int err, i, j, map_fd, iter_fd;
+	struct bpf_link *link;
+	__u32 expected_key = 0, expected_val = 0;
+	char buf[64];
+	void *val;
+	int len;
+
+	val = malloc(8 * bpf_num_possible_cpus());
+
+	skel = bpf_iter_bpf_percpu_array_map__open();
+	if (CHECK(!skel, "bpf_iter_bpf_percpu_array_map__open",
+		  "skeleton open failed\n"))
+		return;
+
+	skel->rodata->num_cpus = bpf_num_possible_cpus();
+
+	err = bpf_iter_bpf_percpu_array_map__load(skel);
+	if (CHECK(!skel, "bpf_iter_bpf_percpu_array_map__load",
+		  "skeleton load failed\n"))
+		goto out;
+
+	/* update map values here */
+	map_fd = bpf_map__fd(skel->maps.arraymap1);
+	for (i = 0; i < bpf_map__max_entries(skel->maps.arraymap1); i++) {
+		expected_key += i;
+
+		for (j = 0; j < bpf_num_possible_cpus(); j++) {
+			*(__u32 *)(val + j * 8) = i + j;
+			expected_val += i + j;
+		}
+
+		err = bpf_map_update_elem(map_fd, &i, val, BPF_ANY);
+		if (CHECK(err, "map_update", "map_update failed\n"))
+			goto out;
+	}
+
+	opts.map_fd = map_fd;
+	link = bpf_program__attach_iter(skel->progs.dump_bpf_percpu_array_map, &opts);
+	if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+		goto out;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+		goto free_link;
+
+	/* do some tests */
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
+		goto close_iter;
+
+	/* test results */
+	if (CHECK(skel->bss->key_sum != expected_key,
+		  "key_sum", "got %u expected %u\n",
+		  skel->bss->key_sum, expected_key))
+		goto close_iter;
+	if (CHECK(skel->bss->val_sum != expected_val,
+		  "val_sum", "got %u expected %u\n",
+		  skel->bss->val_sum, expected_val))
+		goto close_iter;
+
+close_iter:
+	close(iter_fd);
+free_link:
+	bpf_link__destroy(link);
+out:
+	bpf_iter_bpf_percpu_array_map__destroy(skel);
+}
+
 void test_bpf_iter(void)
 {
 	if (test__start_subtest("btf_id_or_null"))
@@ -680,4 +815,8 @@ void test_bpf_iter(void)
 		test_bpf_hash_map();
 	if (test__start_subtest("bpf_percpu_hash_map"))
 		test_bpf_percpu_hash_map();
+	if (test__start_subtest("bpf_array_map"))
+		test_bpf_array_map();
+	if (test__start_subtest("bpf_percpu_array_map"))
+		test_bpf_percpu_array_map();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_array_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_array_map.c
new file mode 100644
index 000000000000..26adc4175e96
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_array_map.c
@@ -0,0 +1,38 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct key_t {
+	int a;
+	int b;
+	int c;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 3);
+	__type(key, __u32);
+	__type(value, __u64);
+} arraymap1 SEC(".maps");
+
+__u32 key_sum = 0;
+__u64 val_sum = 0;
+
+SEC("iter/bpf_map_elem")
+int dump_bpf_array_map(struct bpf_iter__bpf_map_elem *ctx)
+{
+	struct bpf_map *map = ctx->map;
+	__u32 *key = ctx->key;
+	__u64 *val = ctx->value;
+
+	if (key == (void *)0 || val == (void *)0)
+		return 0;
+
+	key_sum += *key;
+	val_sum += *val;
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c
new file mode 100644
index 000000000000..598a461844ed
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_percpu_array_map.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct key_t {
+	int a;
+	int b;
+	int c;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 3);
+	__type(key, __u32);
+	__type(value, __u32);
+} arraymap1 SEC(".maps");
+
+/* will set before prog run */
+volatile const __u32 num_cpus = 0;
+
+__u32 key_sum = 0, val_sum = 0;
+
+SEC("iter/bpf_map_elem")
+int dump_bpf_percpu_array_map(struct bpf_iter__bpf_map_elem *ctx)
+{
+	__u32 seq_num = ctx->meta->seq_num;
+	struct bpf_map *map = ctx->map;
+	__u32 *key = ctx->key;
+	void *pptr = ctx->value;
+	__u32 step;
+	int i;
+
+	if (key == (void *)0 || pptr == (void *)0)
+		return 0;
+
+	key_sum += *key;
+
+	step = 8;
+	for (i = 0; i < num_cpus; i++) {
+		val_sum += *(__u32 *)pptr;
+		pptr += step;
+	}
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 12/13] selftests/bpf: add a test for bpf sk_storage_map iterator
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (10 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 11/13] selftests/bpf: add test for bpf array " Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  2020-07-13 16:17 ` [PATCH bpf-next 13/13] selftests/bpf: add a test for out of bound rdonly buf access Yonghong Song
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

One test is added for the bpf sk_storage_map iterator.
  $ ./test_progs -n 4
  ...
  #4/22 bpf_sk_storage_map:OK
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/prog_tests/bpf_iter.c       | 72 +++++++++++++++++++
 .../bpf/progs/bpf_iter_bpf_sk_storage_map.c   | 35 +++++++++
 2 files changed, 107 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 926ae5b62e3a..ecee834a7f60 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -19,6 +19,7 @@
 #include "bpf_iter_bpf_percpu_hash_map.skel.h"
 #include "bpf_iter_bpf_array_map.skel.h"
 #include "bpf_iter_bpf_percpu_array_map.skel.h"
+#include "bpf_iter_bpf_sk_storage_map.skel.h"
 
 static int duration;
 
@@ -775,6 +776,75 @@ static void test_bpf_percpu_array_map(void)
 	bpf_iter_bpf_percpu_array_map__destroy(skel);
 }
 
+static void test_bpf_sk_storage_map(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	int err, i, len, map_fd, iter_fd, num_sockets;
+	struct bpf_iter_bpf_sk_storage_map *skel;
+	int sock_fd[3] = {-1, -1, -1};
+	struct bpf_link *link;
+	__u32 val, expected_val = 0;
+	char buf[64];
+
+	skel = bpf_iter_bpf_sk_storage_map__open_and_load();
+	if (CHECK(!skel, "bpf_iter_bpf_sk_storage_map__open_and_load",
+		  "skeleton open_and_load failed\n"))
+		return;
+
+	map_fd = bpf_map__fd(skel->maps.sk_stg_map);
+	num_sockets = ARRAY_SIZE(sock_fd);
+	for (i = 0; i < num_sockets; i++) {
+		sock_fd[i] = socket(AF_INET6, SOCK_STREAM, 0);
+		if (CHECK(sock_fd[i] < 0, "socket", "errno: %d\n", errno))
+			goto out;
+
+		val = i + 1;
+		expected_val += val;
+
+		err = bpf_map_update_elem(map_fd, &sock_fd[i], &val,
+					  BPF_NOEXIST);
+		if (CHECK(err, "map_update", "map_update failed\n"))
+			goto out;
+	}
+
+	opts.map_fd = map_fd;
+	link = bpf_program__attach_iter(skel->progs.dump_bpf_sk_storage_map, &opts);
+	if (CHECK(IS_ERR(link), "attach_iter", "attach_iter failed\n"))
+		goto out;
+
+	iter_fd = bpf_iter_create(bpf_link__fd(link));
+	if (CHECK(iter_fd < 0, "create_iter", "create_iter failed\n"))
+		goto free_link;
+
+	/* do some tests */
+	while ((len = read(iter_fd, buf, sizeof(buf))) > 0)
+		;
+	if (CHECK(len < 0, "read", "read failed: %s\n", strerror(errno)))
+		goto close_iter;
+
+	/* test results */
+	if (CHECK(skel->bss->ipv6_sk_count != num_sockets,
+		  "ipv6_sk_count", "got %u expected %u\n",
+		  skel->bss->ipv6_sk_count, num_sockets))
+		goto close_iter;
+
+	if (CHECK(skel->bss->val_sum != expected_val,
+		  "val_sum", "got %u expected %u\n",
+		  skel->bss->val_sum, expected_val))
+		goto close_iter;
+
+close_iter:
+	close(iter_fd);
+free_link:
+	bpf_link__destroy(link);
+out:
+	for (i = 0; i < num_sockets; i++) {
+		if (sock_fd[i] >= 0)
+			close(sock_fd[i]);
+	}
+	bpf_iter_bpf_sk_storage_map__destroy(skel);
+}
+
 void test_bpf_iter(void)
 {
 	if (test__start_subtest("btf_id_or_null"))
@@ -819,4 +889,6 @@ void test_bpf_iter(void)
 		test_bpf_array_map();
 	if (test__start_subtest("bpf_percpu_array_map"))
 		test_bpf_percpu_array_map();
+	if (test__start_subtest("bpf_sk_storage_map"))
+		test_bpf_sk_storage_map();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c
new file mode 100644
index 000000000000..2f8aff5ff38c
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_bpf_sk_storage_map.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "bpf_iter.h"
+#include "bpf_tracing_net.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, int);
+} sk_stg_map SEC(".maps");
+
+__u32 val_sum = 0;
+__u32 ipv6_sk_count = 0;
+
+SEC("iter/bpf_sk_storage_map")
+int dump_bpf_sk_storage_map(struct bpf_iter__bpf_sk_storage_map *ctx)
+{
+	struct sock *sk = ctx->sk;
+	struct bpf_map *map = ctx->map;
+	__u32 *val = ctx->value;
+
+	if (sk == (void *)0 || val == (void *)0)
+		return 0;
+
+	if (sk->sk_family == AF_INET6)
+		ipv6_sk_count++;
+
+	val_sum += *val;
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH bpf-next 13/13] selftests/bpf: add a test for out of bound rdonly buf access
  2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
                   ` (11 preceding siblings ...)
  2020-07-13 16:17 ` [PATCH bpf-next 12/13] selftests/bpf: add a test for bpf sk_storage_map iterator Yonghong Song
@ 2020-07-13 16:17 ` Yonghong Song
  12 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-13 16:17 UTC (permalink / raw)
  To: bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

If the bpf program contains an out-of-bound access w.r.t. a
particular map key/value size, verification will still
succeed, i.e., the program will be accepted by the verifier,
but it will be rejected at link_create time, where the
recorded maximum read-only buffer access size is compared
against the map's key/value size. A test is added here to
ensure that link creation indeed fails when such an out of
bound access happens.
  $ ./test_progs -n 4
  ...
  #4/23 rdonly-buf-out-of-bound:OK
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
---
 .../selftests/bpf/prog_tests/bpf_iter.c       | 22 ++++++++++++
 .../selftests/bpf/progs/bpf_iter_test_kern5.c | 36 +++++++++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_test_kern5.c

diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index ecee834a7f60..54a7be25c613 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -20,6 +20,7 @@
 #include "bpf_iter_bpf_array_map.skel.h"
 #include "bpf_iter_bpf_percpu_array_map.skel.h"
 #include "bpf_iter_bpf_sk_storage_map.skel.h"
+#include "bpf_iter_test_kern5.skel.h"
 
 static int duration;
 
@@ -845,6 +846,25 @@ static void test_bpf_sk_storage_map(void)
 	bpf_iter_bpf_sk_storage_map__destroy(skel);
 }
 
+static void test_rdonly_buf_out_of_bound(void)
+{
+	DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts);
+	struct bpf_iter_test_kern5 *skel;
+	struct bpf_link *link;
+
+	skel = bpf_iter_test_kern5__open_and_load();
+	if (CHECK(!skel, "bpf_iter_test_kern5__open_and_load",
+		  "skeleton open_and_load failed\n"))
+		return;
+
+	opts.map_fd = bpf_map__fd(skel->maps.hashmap1);
+	link = bpf_program__attach_iter(skel->progs.dump_bpf_hash_map, &opts);
+	if (CHECK(!IS_ERR(link), "attach_iter", "unexpected success\n"))
+		bpf_link__destroy(link);
+
+	bpf_iter_test_kern5__destroy(skel);
+}
+
 void test_bpf_iter(void)
 {
 	if (test__start_subtest("btf_id_or_null"))
@@ -891,4 +911,6 @@ void test_bpf_iter(void)
 		test_bpf_percpu_array_map();
 	if (test__start_subtest("bpf_sk_storage_map"))
 		test_bpf_sk_storage_map();
+	if (test__start_subtest("rdonly-buf-out-of-bound"))
+		test_rdonly_buf_out_of_bound();
 }
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_test_kern5.c b/tools/testing/selftests/bpf/progs/bpf_iter_test_kern5.c
new file mode 100644
index 000000000000..b6dac5afa64d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_test_kern5.c
@@ -0,0 +1,36 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2020 Facebook */
+#include "bpf_iter.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+struct key_t {
+	int a;
+	int b;
+	int c;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 3);
+	__type(key, struct key_t);
+	__type(value, __u64);
+} hashmap1 SEC(".maps");
+
+__u32 key_sum = 0;
+
+SEC("iter/bpf_map_elem")
+int dump_bpf_hash_map(struct bpf_iter__bpf_map_elem *ctx)
+{
+	struct seq_file *seq = ctx->meta->seq;
+	void *key = ctx->key;
+
+	if (key == (void *)0)
+		return 0;
+
+	/* out of bound access w.r.t. hashmap1 */
+	key_sum += *(__u32 *)(key + sizeof(struct key_t));
+	return 0;
+}
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 06/13] bpf: implement bpf iterator for array maps
  2020-07-13 16:17 ` [PATCH bpf-next 06/13] bpf: implement bpf iterator for array maps Yonghong Song
@ 2020-07-13 18:49   ` kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2020-07-13 18:49 UTC (permalink / raw)
  To: Yonghong Song, bpf, netdev
  Cc: kbuild-all, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

[-- Attachment #1: Type: text/plain, Size: 3078 bytes --]

Hi Yonghong,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Yonghong-Song/bpf-implement-bpf-iterator-for-map-elements/20200714-002048
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: arc-allyesconfig (attached as .config)
compiler: arc-elf-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   kernel/bpf/arraymap.c: In function '__bpf_array_map_seq_show':
>> kernel/bpf/arraymap.c:540:20: warning: variable 'array' set but not used [-Wunused-but-set-variable]
     540 |  struct bpf_array *array;
         |                    ^~~~~
   In file included from include/linux/perf_event.h:25,
                    from kernel/bpf/arraymap.c:11:
   At top level:
   arch/arc/include/asm/perf_event.h:126:23: warning: 'arc_pmu_cache_map' defined but not used [-Wunused-const-variable=]
     126 | static const unsigned arc_pmu_cache_map[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
         |                       ^~~~~~~~~~~~~~~~~
   arch/arc/include/asm/perf_event.h:91:27: warning: 'arc_pmu_ev_hw_map' defined but not used [-Wunused-const-variable=]
      91 | static const char * const arc_pmu_ev_hw_map[] = {
         |                           ^~~~~~~~~~~~~~~~~

vim +/array +540 kernel/bpf/arraymap.c

   533	
   534	static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
   535	{
   536		struct bpf_iter_seq_array_map_info *info = seq->private;
   537		struct bpf_iter__bpf_map_elem ctx = {};
   538		struct bpf_map *map = info->map;
   539		struct bpf_iter_meta meta;
 > 540		struct bpf_array *array;
   541		struct bpf_prog *prog;
   542		int off = 0, cpu = 0;
   543		void __percpu **pptr;
   544		u32 size;
   545	
   546		meta.seq = seq;
   547		prog = bpf_iter_get_info(&meta, v == NULL);
   548		if (!prog)
   549			return 0;
   550	
   551		ctx.meta = &meta;
   552		ctx.map = info->map;
   553		if (v) {
   554			ctx.key = &info->index;
   555	
   556			array = container_of(map, struct bpf_array, map);
   557			if (!info->percpu_value_buf) {
   558				ctx.value = v;
   559			} else {
   560				pptr = v;
   561				size = round_up(map->value_size, 8);
   562				for_each_possible_cpu(cpu) {
   563					bpf_long_memcpy(info->percpu_value_buf + off,
   564							per_cpu_ptr(pptr, cpu),
   565							size);
   566					off += size;
   567				}
   568				ctx.value = info->percpu_value_buf;
   569			}
   570		}
   571	
   572		return bpf_iter_run_prog(prog, &ctx);
   573	}
   574	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 64637 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 03/13] bpf: support readonly buffer in verifier
  2020-07-13 16:17 ` [PATCH bpf-next 03/13] bpf: support readonly buffer in verifier Yonghong Song
@ 2020-07-13 23:25   ` Alexei Starovoitov
  2020-07-15 17:34     ` Yonghong Song
  0 siblings, 1 reply; 22+ messages in thread
From: Alexei Starovoitov @ 2020-07-13 23:25 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Martin KaFai Lau

On Mon, Jul 13, 2020 at 09:17:42AM -0700, Yonghong Song wrote:
> Two new readonly buffer PTR_TO_RDONLY_BUF or
> PTR_TO_RDONLY_BUF_OR_NULL register states
> are introduced. These new register states will be used
> by later bpf map element iterator.
> 
> New register states share some similarity to
> PTR_TO_TP_BUFFER as it will calculate accessed buffer
> size during verification time. The accessed buffer
> size will be later compared to other metrics during
> later attach/link_create time.
> 
> Two differences between PTR_TO_TP_BUFFER and
> PTR_TO_RDONLY_BUF[_OR_NULL].
> PTR_TO_TP_BUFFER is for write only
> and PTR_TO_RDONLY_BUF[_OR_NULL] is for read only.
> In addition, a rdonly_buf_seq_id is also added to the
> register state since it is possible for the same program
> there could be two PTR_TO_RDONLY_BUF[_OR_NULL] ctx arguments.
> For example, for the later bpf map element iterator,
> both key and value may be PTR_TO_RDONLY_BUF_OR_NULL.
> 
> Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
> iterator programs, PTR_TO_RDONLY_BUF_OR_NULL reg_type and
> its rdonly_buf_seq_id can be set at
> prog->aux->bpf_ctx_arg_aux, and bpf verifier will
> retrieve the values during btf_ctx_access().
> Later bpf map element iterator implementation
> will show how such information will be assigned
> during target registration time.
...
>  struct bpf_ctx_arg_aux {
>  	u32 offset;
>  	enum bpf_reg_type reg_type;
> +	u32 rdonly_buf_seq_id;
>  };
>  
> +#define BPF_MAX_RDONLY_BUF	2
> +
>  struct bpf_prog_aux {
>  	atomic64_t refcnt;
>  	u32 used_map_cnt;
> @@ -693,6 +699,7 @@ struct bpf_prog_aux {
>  	u32 attach_btf_id; /* in-kernel BTF type id to attach to */
>  	u32 ctx_arg_info_size;
>  	const struct bpf_ctx_arg_aux *ctx_arg_info;
> +	u32 max_rdonly_access[BPF_MAX_RDONLY_BUF];

I think the PTR_TO_RDONLY_BUF approach is too limiting.
I think the map value should probably be writable from the beginning,
but I don't see how this RDONLY_BUF support can be naturally extended.
Also key and value can be large, so just load/store is going to be
limiting pretty quickly. People would want to use helpers to access
key/value areas. I think any existing helper that accepts ARG_PTR_TO_MEM
should be usable with data from this key/value.
PTR_TO_TP_BUFFER was a quick hack for tiny scratch area.
Here I think the verifier should be smart from the start.

The next patch populates bpf_ctx_arg_aux with hardcoded 0 and 1.
imo that's too hacky. Helper definitions shouldn't be in the business
of poking into such verifier internals.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 03/13] bpf: support readonly buffer in verifier
  2020-07-13 23:25   ` Alexei Starovoitov
@ 2020-07-15 17:34     ` Yonghong Song
  2020-07-15 17:52       ` Alexei Starovoitov
  0 siblings, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2020-07-15 17:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team,
	Martin KaFai Lau



On 7/13/20 4:25 PM, Alexei Starovoitov wrote:
> On Mon, Jul 13, 2020 at 09:17:42AM -0700, Yonghong Song wrote:
>> Two new readonly buffer PTR_TO_RDONLY_BUF or
>> PTR_TO_RDONLY_BUF_OR_NULL register states
>> are introduced. These new register states will be used
>> by later bpf map element iterator.
>>
>> New register states share some similarity to
>> PTR_TO_TP_BUFFER as it will calculate accessed buffer
>> size during verification time. The accessed buffer
>> size will be later compared to other metrics during
>> later attach/link_create time.
>>
>> Two differences between PTR_TO_TP_BUFFER and
>> PTR_TO_RDONLY_BUF[_OR_NULL].
>> PTR_TO_TP_BUFFER is for write only
>> and PTR_TO_RDONLY_BUF[_OR_NULL] is for read only.
>> In addition, a rdonly_buf_seq_id is also added to the
>> register state since it is possible for the same program
>> there could be two PTR_TO_RDONLY_BUF[_OR_NULL] ctx arguments.
>> For example, for the later bpf map element iterator,
>> both key and value may be PTR_TO_RDONLY_BUF_OR_NULL.
>>
>> Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
>> iterator programs, PTR_TO_RDONLY_BUF_OR_NULL reg_type and
>> its rdonly_buf_seq_id can be set at
>> prog->aux->bpf_ctx_arg_aux, and the bpf verifier will
>> retrieve the values during btf_ctx_access().
>> The later bpf map element iterator implementation
>> will show how such information is assigned at
>> target registration time.
> ...
>>   struct bpf_ctx_arg_aux {
>>   	u32 offset;
>>   	enum bpf_reg_type reg_type;
>> +	u32 rdonly_buf_seq_id;
>>   };
>>   
>> +#define BPF_MAX_RDONLY_BUF	2
>> +
>>   struct bpf_prog_aux {
>>   	atomic64_t refcnt;
>>   	u32 used_map_cnt;
>> @@ -693,6 +699,7 @@ struct bpf_prog_aux {
>>   	u32 attach_btf_id; /* in-kernel BTF type id to attach to */
>>   	u32 ctx_arg_info_size;
>>   	const struct bpf_ctx_arg_aux *ctx_arg_info;
>> +	u32 max_rdonly_access[BPF_MAX_RDONLY_BUF];
> 
> I think the PTR_TO_RDONLY_BUF approach is too limiting.
> I think the map value should probably be writable from the beginning,
> but I don't see how this RDONLY_BUF support can be naturally extended.

Agreed. Let me try to make the map value read/write-able.

One thing we discussed earlier is whether and how we could make
map elements deletable during iterator traversal. I will explore
this as well.
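
From the program side that could be as simple as the sketch below
(should_drop() and the variable names here are made up; whether calling
bpf_map_delete_elem() is safe in the middle of a bucket walk is
exactly what needs exploring):

	/* inside the map element iterator program, once elements
	 * are writable/deletable
	 */
	if (key && value && should_drop(key, value))
		bpf_map_delete_elem(map, key);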

> Also key and value can be large, so just load/store is going to be
> limiting pretty quickly. People would want to use helpers to access
> key/value areas. I think any existing helper that accepts ARG_PTR_TO_MEM
> should be usable with data from this key/value.

This is a useful suggestion. I actually hacked up a change to allow
   bpf_seq_write(seq, buf, buf_size)
to accept a rdonly_buf register state, so a bpf iterator can also copy
key/value to user space through the seq_file.
The bpf_seq_write() 2nd arg is ARG_PTR_TO_MEM. This actually works.

I originally planned to have this as a followup. Since you mentioned 
this, I will incorporate it in the next revision.
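
For concreteness, the program side of what I tested looks roughly like
this (the ctx struct and section names are placeholders rather than
final uapi, and the 8-byte sizes are just for illustration):

	#include "bpf_iter.h"
	#include <bpf/bpf_helpers.h>

	char _license[] SEC("license") = "GPL";

	SEC("iter/bpf_map_elem")
	int dump_elem(struct bpf_iter__bpf_map_elem *ctx)
	{
		struct seq_file *seq = ctx->meta->seq;
		void *key = ctx->key;
		void *value = ctx->value;

		if (!key || !value)
			return 0;

		/* key/value are the rdonly_buf registers; both are
		 * passed as the ARG_PTR_TO_MEM argument
		 */
		bpf_seq_write(seq, key, 8);
		bpf_seq_write(seq, value, 8);
		return 0;
	}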

> PTR_TO_TP_BUFFER was a quick hack for a tiny scratch area.
> Here I think the verifier should be smart from the start.
>
> The next patch populates bpf_ctx_arg_aux with hardcoded 0 and 1.
> imo that's too hacky. Helper definitions shouldn't be in the business
> of poking into such verifier internals.

The reason I am using 0/1 is so that later on I can easily correlate
which rdonly_buf access size corresponds to the key or the value. I guess
I can have a verifier callback that, given a ctx argument index, returns
the access size.
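
Something along these lines, i.e. a purely hypothetical shape with
made-up names:

	/* in the iter target registration: let the verifier ask where
	 * to record the verified access size for a ctx argument,
	 * instead of hardcoding seq ids in helper/target definitions
	 */
	u32 *(*get_rdonly_access_slot)(struct bpf_prog_aux *aux,
				       u32 ctx_arg_idx);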

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 03/13] bpf: support readonly buffer in verifier
  2020-07-15 17:34     ` Yonghong Song
@ 2020-07-15 17:52       ` Alexei Starovoitov
  0 siblings, 0 replies; 22+ messages in thread
From: Alexei Starovoitov @ 2020-07-15 17:52 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Network Development, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, Martin KaFai Lau

On Wed, Jul 15, 2020 at 10:48 AM Yonghong Song <yhs@fb.com> wrote:
>
> > PTR_TO_TP_BUFFER was a quick hack for a tiny scratch area.
> > Here I think the verifier should be smart from the start.
> >
> > The next patch populates bpf_ctx_arg_aux with hardcoded 0 and 1.
> > imo that's too hacky. Helper definitions shouldn't be in the business
> > of poking into such verifier internals.
>
> The reason I am using 0/1 is so that later on I can easily correlate
> which rdonly_buf access size corresponds to the key or the value. I guess
> I can have a verifier callback that, given a ctx argument index, returns
> the access size.

I see. Hardcoding key vs value in some way is necessary, of course.
Some #define for that with a clear name would be good.
I was pointing out that 0/1 were used beyond that need.
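
e.g. something like this, with names invented just to illustrate:

	/* which readonly buffer a rdonly_buf_seq_id refers to */
	#define BPF_ITER_RDONLY_BUF_KEY		0
	#define BPF_ITER_RDONLY_BUF_VALUE	1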

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 09/13] tools/bpftool: add bpftool support for bpf map element iterator
  2020-07-13 16:17 ` [PATCH bpf-next 09/13] tools/bpftool: add bpftool " Yonghong Song
@ 2020-07-16 16:39   ` Quentin Monnet
  2020-07-16 17:42     ` Yonghong Song
  0 siblings, 1 reply; 22+ messages in thread
From: Quentin Monnet @ 2020-07-16 16:39 UTC (permalink / raw)
  To: Yonghong Song, bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

2020-07-13 09:17 UTC-0700 ~ Yonghong Song <yhs@fb.com>
> The optional parameter "map MAP" can be added to "bpftool iter"
> command to create a bpf iterator for map elements. For example,
>   bpftool iter pin ./prog.o /sys/fs/bpf/p1 map id 333
> 
> For a map element bpf iterator, the "map MAP" parameter is required.
> Otherwise, bpf link creation will return an error.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
>  .../bpftool/Documentation/bpftool-iter.rst    | 16 ++++++++--
>  tools/bpf/bpftool/iter.c                      | 32 ++++++++++++++++---
>  2 files changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-iter.rst b/tools/bpf/bpftool/Documentation/bpftool-iter.rst
> index 8dce698eab79..53ee4fb188b4 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-iter.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-iter.rst
> @@ -17,14 +17,15 @@ SYNOPSIS
>  ITER COMMANDS
>  ===================
>  
> -|	**bpftool** **iter pin** *OBJ* *PATH*
> +|	**bpftool** **iter pin** *OBJ* *PATH* [**map** *MAP*]
>  |	**bpftool** **iter help**
>  |
>  |	*OBJ* := /a/file/of/bpf_iter_target.o
> +|       *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }

Please don't change the indentation style (other lines have a tab).

>  
>  DESCRIPTION
>  ===========
> -	**bpftool iter pin** *OBJ* *PATH*
> +	**bpftool iter pin** *OBJ* *PATH* [**map** *MAP*]
>  		  A bpf iterator combines a kernel iterating of
>  		  particular kernel data (e.g., tasks, bpf_maps, etc.)
>  		  and a bpf program called for each kernel data object
> @@ -37,6 +38,10 @@ DESCRIPTION
>  		  character ('.'), which is reserved for future extensions
>  		  of *bpffs*.
>  
> +                  Map element bpf iterator requires an additional parameter
> +                  *MAP* so bpf program can iterate over map elements for
> +                  that map.
> +

Same note on indentation.

Could you please also explain in a few words what the "Map element bpf
iterator" is? Reusing part of your cover letter (see below) could do,
it's just so that users not familiar with the concept can get an idea of
what it does.

---
User can have a bpf program in kernel to run with each map element,
do checking, filtering, aggregation, etc. without copying data
to user space.
---

>  		  User can then *cat PATH* to see the bpf iterator output.
>  
>  	**bpftool iter help**

[...]

> @@ -62,13 +83,16 @@ static int do_pin(int argc, char **argv)
>  	bpf_link__destroy(link);
>  close_obj:
>  	bpf_object__close(obj);
> +close_map_fd:
> +	if (map_fd >= 0)
> +		close(map_fd);
>  	return err;
>  }
>  
>  static int do_help(int argc, char **argv)
>  {
>  	fprintf(stderr,
> -		"Usage: %1$s %2$s pin OBJ PATH\n"
> +		"Usage: %1$s %2$s pin OBJ PATH [map MAP]\n"

You probably want to add HELP_SPEC_MAP (as in map.c) to tell the user
what MAP should be.
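
i.e. roughly (a sketch, assuming the existing HELP_SPEC_MAP string can
be reused as-is):

	fprintf(stderr,
		"Usage: %1$s %2$s pin OBJ PATH [map MAP]\n"
		"       %1$s %2$s help\n"
		"\n"
		"       " HELP_SPEC_MAP "\n"
		"",
		bin_name, "iter");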

Could you please also update the bash completion?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 09/13] tools/bpftool: add bpftool support for bpf map element iterator
  2020-07-16 16:39   ` Quentin Monnet
@ 2020-07-16 17:42     ` Yonghong Song
  2020-07-17 12:57       ` Quentin Monnet
  0 siblings, 1 reply; 22+ messages in thread
From: Yonghong Song @ 2020-07-16 17:42 UTC (permalink / raw)
  To: Quentin Monnet, bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau



On 7/16/20 9:39 AM, Quentin Monnet wrote:
> 2020-07-13 09:17 UTC-0700 ~ Yonghong Song <yhs@fb.com>
>> The optional parameter "map MAP" can be added to "bpftool iter"
>> command to create a bpf iterator for map elements. For example,
>>    bpftool iter pin ./prog.o /sys/fs/bpf/p1 map id 333
>>
>> For a map element bpf iterator, the "map MAP" parameter is required.
>> Otherwise, bpf link creation will return an error.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
>>   .../bpftool/Documentation/bpftool-iter.rst    | 16 ++++++++--
>>   tools/bpf/bpftool/iter.c                      | 32 ++++++++++++++++---
>>   2 files changed, 42 insertions(+), 6 deletions(-)
>>
>> diff --git a/tools/bpf/bpftool/Documentation/bpftool-iter.rst b/tools/bpf/bpftool/Documentation/bpftool-iter.rst
>> index 8dce698eab79..53ee4fb188b4 100644
>> --- a/tools/bpf/bpftool/Documentation/bpftool-iter.rst
>> +++ b/tools/bpf/bpftool/Documentation/bpftool-iter.rst
>> @@ -17,14 +17,15 @@ SYNOPSIS
>>   ITER COMMANDS
>>   ===================
>>   
>> -|	**bpftool** **iter pin** *OBJ* *PATH*
>> +|	**bpftool** **iter pin** *OBJ* *PATH* [**map** *MAP*]
>>   |	**bpftool** **iter help**
>>   |
>>   |	*OBJ* := /a/file/of/bpf_iter_target.o
>> +|       *MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
> 
> Please don't change the indentation style (other lines have a tab).

Will fix.

> 
>>   
>>   DESCRIPTION
>>   ===========
>> -	**bpftool iter pin** *OBJ* *PATH*
>> +	**bpftool iter pin** *OBJ* *PATH* [**map** *MAP*]
>>   		  A bpf iterator combines a kernel iterating of
>>   		  particular kernel data (e.g., tasks, bpf_maps, etc.)
>>   		  and a bpf program called for each kernel data object
>> @@ -37,6 +38,10 @@ DESCRIPTION
>>   		  character ('.'), which is reserved for future extensions
>>   		  of *bpffs*.
>>   
>> +                  Map element bpf iterator requires an additional parameter
>> +                  *MAP* so bpf program can iterate over map elements for
>> +                  that map.
>> +
> 
> Same note on indentation.
> 
> Could you please also explain in a few words what the "Map element bpf
> iterator" is? Reusing part of your cover letter (see below) could do,
> it's just so that users not familiar with the concept can get an idea of
> what it does.

Will do.

> 
> ---
> User can have a bpf program in kernel to run with each map element,
> do checking, filtering, aggregation, etc. without copying data
> to user space.
> ---
> 
>>   		  User can then *cat PATH* to see the bpf iterator output.
>>   
>>   	**bpftool iter help**
> 
> [...]
> 
>> @@ -62,13 +83,16 @@ static int do_pin(int argc, char **argv)
>>   	bpf_link__destroy(link);
>>   close_obj:
>>   	bpf_object__close(obj);
>> +close_map_fd:
>> +	if (map_fd >= 0)
>> +		close(map_fd);
>>   	return err;
>>   }
>>   
>>   static int do_help(int argc, char **argv)
>>   {
>>   	fprintf(stderr,
>> -		"Usage: %1$s %2$s pin OBJ PATH\n"
>> +		"Usage: %1$s %2$s pin OBJ PATH [map MAP]\n"
> 
> You probably want to add HELP_SPEC_MAP (as in map.c) to tell the user
> what MAP should be.

Good suggestion.

> 
> Could you please also update the bash completion?

This is always my hardest part! In this case it is
   bpftool iter pin <filedir> <filedir> [map MAP]

Any particular existing bpftool implementation I can imitate?

> 
> Thanks,
> Quentin
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 09/13] tools/bpftool: add bpftool support for bpf map element iterator
  2020-07-16 17:42     ` Yonghong Song
@ 2020-07-17 12:57       ` Quentin Monnet
  2020-07-17 18:52         ` Yonghong Song
  0 siblings, 1 reply; 22+ messages in thread
From: Quentin Monnet @ 2020-07-17 12:57 UTC (permalink / raw)
  To: Yonghong Song, bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau

2020-07-16 10:42 UTC-0700 ~ Yonghong Song <yhs@fb.com>
> 
> 
> On 7/16/20 9:39 AM, Quentin Monnet wrote:
>> 2020-07-13 09:17 UTC-0700 ~ Yonghong Song <yhs@fb.com>

[...]

>> Could you please also update the bash completion?
> 
> This is always my hardest part! In this case it is
>   bpftool iter pin <filedir> <filedir> [map MAP]
> 
> Any particular existing bpftool implementation I can imitate?

I would say the closest/easiest thing we have to reuse would be
the completion for the MAP part in either

	bpftool prog attach PROG ATTACH_TYPE [MAP]

or

	bpftool map pin MAP FILE

But I'll save you some time: I gave it a go and this is what
I came up with:

------
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 25b25aca1112..6640e18096a8 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -613,9 +613,26 @@ _bpftool()
             esac
             ;;
         iter)
+            local MAP_TYPE='id pinned name'
             case $command in
                 pin)
-                    _filedir
+                    case $prev in
+                        $command)
+                            _filedir
+                            ;;
+                        id)
+                            _bpftool_get_map_ids
+                            ;;
+                        name)
+                            _bpftool_get_map_names
+                            ;;
+                        pinned)
+                            _filedir
+                            ;;
+                        *)
+                            _bpftool_one_of_list $MAP_TYPE
+                            ;;
+                    esac
                     return 0
                     ;;
                 *)
------

So when completing "bpftool iter pin": if we're right after "pin",
we still complete with file names (for the object file to pin).
If we're after one of the map keywords (id|name|pinned), we complete
with map ids, map names, or file names, depending on the case.
In other cases (i.e. after the object file to pin), we offer the map
keywords (id|name|pinned).
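
In a shell session that gives, e.g. (ids and names made up):

    $ bpftool iter pin ./prog.o /sys/fs/bpf/p1 map <TAB>
    id      name    pinned
    $ bpftool iter pin ./prog.o /sys/fs/bpf/p1 map id <TAB>
    333  334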

Feel free to reuse.

Best,
Quentin

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH bpf-next 09/13] tools/bpftool: add bpftool support for bpf map element iterator
  2020-07-17 12:57       ` Quentin Monnet
@ 2020-07-17 18:52         ` Yonghong Song
  0 siblings, 0 replies; 22+ messages in thread
From: Yonghong Song @ 2020-07-17 18:52 UTC (permalink / raw)
  To: Quentin Monnet, bpf, netdev
  Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Martin KaFai Lau



On 7/17/20 5:57 AM, Quentin Monnet wrote:
> 2020-07-16 10:42 UTC-0700 ~ Yonghong Song <yhs@fb.com>
>>
>>
>> On 7/16/20 9:39 AM, Quentin Monnet wrote:
>>> 2020-07-13 09:17 UTC-0700 ~ Yonghong Song <yhs@fb.com>
> 
> [...]
> 
>>> Could you please also update the bash completion?
>>
>> This is always my hardest part! In this case it is
>>    bpftool iter pin <filedir> <filedir> [map MAP]
>>
>> Any particular existing bpftool implementation I can imitate?
> 
> I would say the closest/easiest thing we have to reuse would be
> the completion for the MAP part in either
> 
> 	bpftool prog attach PROG ATTACH_TYPE [MAP]
> 
> or
> 
> 	bpftool map pin MAP FILE
> 
> But I'll save you some time: I gave it a go and this is what
> I came up with:
> 
> ------
> diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> index 25b25aca1112..6640e18096a8 100644
> --- a/tools/bpf/bpftool/bash-completion/bpftool
> +++ b/tools/bpf/bpftool/bash-completion/bpftool
> @@ -613,9 +613,26 @@ _bpftool()
>               esac
>               ;;
>           iter)
> +            local MAP_TYPE='id pinned name'
>               case $command in
>                   pin)
> -                    _filedir
> +                    case $prev in
> +                        $command)
> +                            _filedir
> +                            ;;
> +                        id)
> +                            _bpftool_get_map_ids
> +                            ;;
> +                        name)
> +                            _bpftool_get_map_names
> +                            ;;
> +                        pinned)
> +                            _filedir
> +                            ;;
> +                        *)
> +                            _bpftool_one_of_list $MAP_TYPE
> +                            ;;
> +                    esac
>                       return 0
>                       ;;
>                   *)
> ------
> 
> So when completing "bpftool iter pin": if we're right after "pin",
> we still complete with file names (for the object file to pin).
> If we're after one of the map keywords (id|name|pinned), we complete
> with map ids, map names, or file names, depending on the case.
> In other cases (i.e. after the object file to pin), we offer the map
> keywords (id|name|pinned).
> 
> Feel free to reuse.

Thanks a lot! Will reuse it and mention your suggestion of this code
in the commit message.

> 
> Best,
> Quentin
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-07-17 18:53 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-13 16:17 [PATCH bpf-next 00/13] bpf: implement bpf iterator for map elements Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 01/13] bpf: refactor bpf_iter_reg to have separate seq_info member Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 02/13] bpf: refactor to provide aux info to bpf_iter_init_seq_priv_t Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 03/13] bpf: support readonly buffer in verifier Yonghong Song
2020-07-13 23:25   ` Alexei Starovoitov
2020-07-15 17:34     ` Yonghong Song
2020-07-15 17:52       ` Alexei Starovoitov
2020-07-13 16:17 ` [PATCH bpf-next 04/13] bpf: implement bpf iterator for map elements Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 05/13] bpf: implement bpf iterator for hash maps Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 06/13] bpf: implement bpf iterator for array maps Yonghong Song
2020-07-13 18:49   ` kernel test robot
2020-07-13 16:17 ` [PATCH bpf-next 07/13] bpf: implement bpf iterator for sock local storage map Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 08/13] tools/libbpf: add support for bpf map element iterator Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 09/13] tools/bpftool: add bpftool " Yonghong Song
2020-07-16 16:39   ` Quentin Monnet
2020-07-16 17:42     ` Yonghong Song
2020-07-17 12:57       ` Quentin Monnet
2020-07-17 18:52         ` Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 10/13] selftests/bpf: add test for bpf hash map iterators Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 11/13] selftests/bpf: add test for bpf array " Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 12/13] selftests/bpf: add a test for bpf sk_storage_map iterator Yonghong Song
2020-07-13 16:17 ` [PATCH bpf-next 13/13] selftests/bpf: add a test for out of bound rdonly buf access Yonghong Song
