* [PATCH v2 net-next 0/8] Introduce bpf ID
@ 2017-05-31 18:58 Martin KaFai Lau
  2017-05-31 18:58 ` [PATCH v2 net-next 1/8] bpf: Introduce bpf_prog ID Martin KaFai Lau
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-05-31 18:58 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch series:
1) Introduce ID for both bpf_prog and bpf_map.
2) Add bpf commands to iterate the prog IDs and map
   IDs of the system.
3) Add bpf commands to get a prog/map fd from an ID.
4) Add a bpf command to get prog/map info from a fd.
   The prog/map info returned here is a starting point
   rather than a complete list; it can be extended in
   future patches.
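
For illustration (not part of the series itself), here is a minimal
sketch of the intended userspace flow, using the tools/lib/bpf
wrappers added in patch 8.  It assumes CAP_SYS_ADMIN and the updated
uapi header:

#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <linux/bpf.h>
#include "bpf.h"	/* tools/lib/bpf wrappers added in patch 8 */

/* Illustrative sketch, not part of this series. */
static void dump_prog_ids(void)
{
	__u32 id = 0;

	/* No valid ID is 0, so 0 starts the walk from the beginning. */
	while (!bpf_prog_get_next_id(id, &id)) {
		struct bpf_prog_info info = {};
		__u32 info_len = sizeof(info);
		int fd;

		fd = bpf_prog_get_fd_by_id(id);
		if (fd < 0) {
			/* The prog can be freed between the two calls. */
			if (errno == ENOENT)
				continue;
			break;
		}

		if (!bpf_obj_get_info_by_fd(fd, &info, &info_len))
			printf("prog id %u type %u\n", info.id, info.type);
		close(fd);
	}
}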

v2:
Compiler warning fixes:
- Remove lockdep_is_held() usage.  Add comment
  to explain the lock situation instead.
- Add static for idr related variables
- Add __user to the uattr param in bpf_prog_get_info_by_fd()
  and bpf_map_get_info_by_fd().

Martin KaFai Lau (8):
  bpf: Introduce bpf_prog ID
  bpf: Introduce bpf_map ID
  bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command
  bpf: Add BPF_PROG_GET_FD_BY_ID
  bpf: Add BPF_MAP_GET_FD_BY_ID
  bpf: Add jited_len to struct bpf_prog
  bpf: Add BPF_OBJ_GET_INFO_BY_FD
  bpf: Test for bpf ID

 arch/arm64/net/bpf_jit_comp.c             |   1 +
 arch/powerpc/net/bpf_jit_comp64.c         |   1 +
 arch/s390/net/bpf_jit_comp.c              |   1 +
 arch/sparc/net/bpf_jit_comp_64.c          |   1 +
 arch/x86/net/bpf_jit_comp.c               |   1 +
 include/linux/bpf.h                       |   2 +
 include/linux/filter.h                    |   3 +-
 include/uapi/linux/bpf.h                  |  41 +++
 kernel/bpf/syscall.c                      | 433 ++++++++++++++++++++++++++++--
 tools/include/uapi/linux/bpf.h            |  41 +++
 tools/lib/bpf/bpf.c                       |  68 +++++
 tools/lib/bpf/bpf.h                       |   5 +
 tools/testing/selftests/bpf/Makefile      |   2 +-
 tools/testing/selftests/bpf/test_obj_id.c |  35 +++
 tools/testing/selftests/bpf/test_progs.c  | 191 +++++++++++++
 15 files changed, 798 insertions(+), 28 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_obj_id.c

-- 
2.9.3

* [PATCH v2 net-next 1/8] bpf: Introduce bpf_prog ID
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
@ 2017-05-31 18:58 ` Martin KaFai Lau
  2017-05-31 18:58 ` [PATCH v2 net-next 2/8] bpf: Introduce bpf_map ID Martin KaFai Lau
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-05-31 18:58 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch generates a unique ID for each BPF_PROG_LOAD-ed prog.
It is worth noting that each BPF_PROG_LOAD-ed prog gets a
different ID even if it has the same bpf instructions as
another prog.

The ID is generated by the existing idr_alloc_cyclic().
The ID range is [1, INT_MAX).  It is allocated cyclically,
so an ID is only reused after roughly 2 billion BPF_PROG_LOAD calls.

bpf_prog_alloc_id() is done after bpf_prog_select_runtime()
because the jit process may have allocated a new prog.  Hence,
we need to ensure the value of the 'prog' pointer will not change
any more before storing the prog in the prog_idr.

After bpf_prog_select_runtime(), the prog is read-only.  Hence,
the id is stored in 'struct bpf_prog_aux'.
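
As a hedged illustration of the point above (not part of this patch),
loading the same trivial instructions twice via the raw syscall yields
two progs with distinct IDs:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Illustrative sketch; assumes the uapi header from this tree. */
static int load_ret0_prog(void)
{
	/* r0 = 0; exit; -- the smallest valid program */
	struct bpf_insn insns[] = {
		{ .code = BPF_ALU64 | BPF_MOV | BPF_K,
		  .dst_reg = BPF_REG_0, .imm = 0 },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insns = (__u64)(unsigned long)insns;
	attr.insn_cnt = 2;
	attr.license = (__u64)(unsigned long)"GPL";

	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}

/* Calling load_ret0_prog() twice returns two fds that reference two
 * bpf_progs with identical instructions but different aux->id values.
 */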

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h  |  1 +
 kernel/bpf/syscall.c | 40 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 6bb38d76faf4..c2793a732edc 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -171,6 +171,7 @@ struct bpf_prog_aux {
 	atomic_t refcnt;
 	u32 used_map_cnt;
 	u32 max_ctx_offset;
+	u32 id;
 	struct latch_tree_node ksym_tnode;
 	struct list_head ksym_lnode;
 	const struct bpf_verifier_ops *ops;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 265a0d854e33..4a936b08a4b0 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -22,8 +22,11 @@
 #include <linux/filter.h>
 #include <linux/version.h>
 #include <linux/kernel.h>
+#include <linux/idr.h>
 
 DEFINE_PER_CPU(int, bpf_prog_active);
+static DEFINE_IDR(prog_idr);
+static DEFINE_SPINLOCK(prog_idr_lock);
 
 int sysctl_unprivileged_bpf_disabled __read_mostly;
 
@@ -650,6 +653,34 @@ static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
 	free_uid(user);
 }
 
+static int bpf_prog_alloc_id(struct bpf_prog *prog)
+{
+	int id;
+
+	spin_lock_bh(&prog_idr_lock);
+	id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC);
+	if (id > 0)
+		prog->aux->id = id;
+	spin_unlock_bh(&prog_idr_lock);
+
+	/* id is in [1, INT_MAX) */
+	if (WARN_ON_ONCE(!id))
+		return -ENOSPC;
+
+	return id > 0 ? 0 : id;
+}
+
+static void bpf_prog_free_id(struct bpf_prog *prog)
+{
+	/* cBPF to eBPF migrations are currently not in the idr store. */
+	if (!prog->aux->id)
+		return;
+
+	spin_lock_bh(&prog_idr_lock);
+	idr_remove(&prog_idr, prog->aux->id);
+	spin_unlock_bh(&prog_idr_lock);
+}
+
 static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 {
 	struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
@@ -663,6 +694,7 @@ void bpf_prog_put(struct bpf_prog *prog)
 {
 	if (atomic_dec_and_test(&prog->aux->refcnt)) {
 		trace_bpf_prog_put_rcu(prog);
+		bpf_prog_free_id(prog);
 		bpf_prog_kallsyms_del(prog);
 		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
 	}
@@ -855,15 +887,21 @@ static int bpf_prog_load(union bpf_attr *attr)
 	if (err < 0)
 		goto free_used_maps;
 
+	err = bpf_prog_alloc_id(prog);
+	if (err)
+		goto free_used_maps;
+
 	err = bpf_prog_new_fd(prog);
 	if (err < 0)
 		/* failed to allocate fd */
-		goto free_used_maps;
+		goto free_id;
 
 	bpf_prog_kallsyms_add(prog);
 	trace_bpf_prog_load(prog, err);
 	return err;
 
+free_id:
+	bpf_prog_free_id(prog);
 free_used_maps:
 	free_used_maps(prog->aux);
 free_prog:
-- 
2.9.3

* [PATCH v2 net-next 2/8] bpf: Introduce bpf_map ID
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
  2017-05-31 18:58 ` [PATCH v2 net-next 1/8] bpf: Introduce bpf_prog ID Martin KaFai Lau
@ 2017-05-31 18:58 ` Martin KaFai Lau
  2017-05-31 18:58 ` [PATCH v2 net-next 3/8] bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command Martin KaFai Lau
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-05-31 18:58 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch generates a unique ID for each created bpf_map.
The approach is similar to the earlier patch for the bpf_prog ID.

It is worth noting that the bpf_map's ID and the bpf_prog's ID
are in two independent ID spaces, and both have the same valid range:
[1, INT_MAX).

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h  |  1 +
 kernel/bpf/syscall.c | 34 +++++++++++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c2793a732edc..ea78d87cbc3e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -46,6 +46,7 @@ struct bpf_map {
 	u32 max_entries;
 	u32 map_flags;
 	u32 pages;
+	u32 id;
 	struct user_struct *user;
 	const struct bpf_map_ops *ops;
 	struct work_struct work;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4a936b08a4b0..20f392d64f49 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -27,6 +27,8 @@
 DEFINE_PER_CPU(int, bpf_prog_active);
 static DEFINE_IDR(prog_idr);
 static DEFINE_SPINLOCK(prog_idr_lock);
+static DEFINE_IDR(map_idr);
+static DEFINE_SPINLOCK(map_idr_lock);
 
 int sysctl_unprivileged_bpf_disabled __read_mostly;
 
@@ -117,6 +119,29 @@ static void bpf_map_uncharge_memlock(struct bpf_map *map)
 	free_uid(user);
 }
 
+static int bpf_map_alloc_id(struct bpf_map *map)
+{
+	int id;
+
+	spin_lock_bh(&map_idr_lock);
+	id = idr_alloc_cyclic(&map_idr, map, 1, INT_MAX, GFP_ATOMIC);
+	if (id > 0)
+		map->id = id;
+	spin_unlock_bh(&map_idr_lock);
+
+	if (WARN_ON_ONCE(!id))
+		return -ENOSPC;
+
+	return id > 0 ? 0 : id;
+}
+
+static void bpf_map_free_id(struct bpf_map *map)
+{
+	spin_lock_bh(&map_idr_lock);
+	idr_remove(&map_idr, map->id);
+	spin_unlock_bh(&map_idr_lock);
+}
+
 /* called from workqueue */
 static void bpf_map_free_deferred(struct work_struct *work)
 {
@@ -141,6 +166,7 @@ static void bpf_map_put_uref(struct bpf_map *map)
 void bpf_map_put(struct bpf_map *map)
 {
 	if (atomic_dec_and_test(&map->refcnt)) {
+		bpf_map_free_id(map);
 		INIT_WORK(&map->work, bpf_map_free_deferred);
 		schedule_work(&map->work);
 	}
@@ -239,14 +265,20 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map_nouncharge;
 
+	err = bpf_map_alloc_id(map);
+	if (err)
+		goto free_map;
+
 	err = bpf_map_new_fd(map);
 	if (err < 0)
 		/* failed to allocate fd */
-		goto free_map;
+		goto free_id;
 
 	trace_bpf_map_create(map, err);
 	return err;
 
+free_id:
+	bpf_map_free_id(map);
 free_map:
 	bpf_map_uncharge_memlock(map);
 free_map_nouncharge:
-- 
2.9.3

* [PATCH v2 net-next 3/8] bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
  2017-05-31 18:58 ` [PATCH v2 net-next 1/8] bpf: Introduce bpf_prog ID Martin KaFai Lau
  2017-05-31 18:58 ` [PATCH v2 net-next 2/8] bpf: Introduce bpf_map ID Martin KaFai Lau
@ 2017-05-31 18:58 ` Martin KaFai Lau
  2017-05-31 18:58 ` [PATCH v2 net-next 4/8] bpf: Add BPF_PROG_GET_FD_BY_ID Martin KaFai Lau
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-05-31 18:58 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

This patch adds BPF_PROG_GET_NEXT_ID and BPF_MAP_GET_NEXT_ID
to allow userspace to iterate over all bpf_prog IDs and bpf_map IDs.

The API aims to be consistent with the existing
BPF_MAP_GET_NEXT_KEY.

It is currently limited to CAP_SYS_ADMIN; we can
consider lifting this restriction in followup patches.
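
For illustration only (not part of this patch): the iteration
contract mirrors BPF_MAP_GET_NEXT_KEY, in that start_id is exclusive,
0 yields the smallest live ID, and ENOENT ends the walk.  A sketch
against the raw syscall, assuming the updated uapi header:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Illustrative sketch, not part of this patch. */
static void walk_prog_ids(void)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.start_id = 0;	/* 0: ask for the smallest existing ID */

	/* Each successful call writes the next ID > start_id into
	 * attr.next_id; the call fails with errno ENOENT when no
	 * larger ID exists.
	 */
	while (!syscall(__NR_bpf, BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr)))
		attr.start_id = attr.next_id;	/* resume after this ID */
}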

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/bpf.h |  7 +++++++
 kernel/bpf/syscall.c     | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 94dfa9def355..e5c88f39bdc3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -82,6 +82,8 @@ enum bpf_cmd {
 	BPF_PROG_ATTACH,
 	BPF_PROG_DETACH,
 	BPF_PROG_TEST_RUN,
+	BPF_PROG_GET_NEXT_ID,
+	BPF_MAP_GET_NEXT_ID,
 };
 
 enum bpf_map_type {
@@ -209,6 +211,11 @@ union bpf_attr {
 		__u32		repeat;
 		__u32		duration;
 	} test;
+
+	struct { /* anonymous struct used by BPF_*_GET_NEXT_ID */
+		__u32		start_id;
+		__u32		next_id;
+	};
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 20f392d64f49..68a73d453d70 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -166,6 +166,7 @@ static void bpf_map_put_uref(struct bpf_map *map)
 void bpf_map_put(struct bpf_map *map)
 {
 	if (atomic_dec_and_test(&map->refcnt)) {
+		/* bpf_map_free_id() must be called first */
 		bpf_map_free_id(map);
 		INIT_WORK(&map->work, bpf_map_free_deferred);
 		schedule_work(&map->work);
@@ -726,6 +727,7 @@ void bpf_prog_put(struct bpf_prog *prog)
 {
 	if (atomic_dec_and_test(&prog->aux->refcnt)) {
 		trace_bpf_prog_put_rcu(prog);
+		/* bpf_prog_free_id() must be called first */
 		bpf_prog_free_id(prog);
 		bpf_prog_kallsyms_del(prog);
 		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
@@ -1067,6 +1069,34 @@ static int bpf_prog_test_run(const union bpf_attr *attr,
 	return ret;
 }
 
+#define BPF_OBJ_GET_NEXT_ID_LAST_FIELD next_id
+
+static int bpf_obj_get_next_id(const union bpf_attr *attr,
+			       union bpf_attr __user *uattr,
+			       struct idr *idr,
+			       spinlock_t *lock)
+{
+	u32 next_id = attr->start_id;
+	int err = 0;
+
+	if (CHECK_ATTR(BPF_OBJ_GET_NEXT_ID) || next_id >= INT_MAX)
+		return -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	next_id++;
+	spin_lock_bh(lock);
+	if (!idr_get_next(idr, &next_id))
+		err = -ENOENT;
+	spin_unlock_bh(lock);
+
+	if (!err)
+		err = put_user(next_id, &uattr->next_id);
+
+	return err;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -1144,6 +1174,14 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_PROG_TEST_RUN:
 		err = bpf_prog_test_run(&attr, uattr);
 		break;
+	case BPF_PROG_GET_NEXT_ID:
+		err = bpf_obj_get_next_id(&attr, uattr,
+					  &prog_idr, &prog_idr_lock);
+		break;
+	case BPF_MAP_GET_NEXT_ID:
+		err = bpf_obj_get_next_id(&attr, uattr,
+					  &map_idr, &map_idr_lock);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.9.3

* [PATCH v2 net-next 4/8] bpf: Add BPF_PROG_GET_FD_BY_ID
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
                   ` (2 preceding siblings ...)
  2017-05-31 18:58 ` [PATCH v2 net-next 3/8] bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command Martin KaFai Lau
@ 2017-05-31 18:58 ` Martin KaFai Lau
  2017-05-31 18:58 ` [PATCH v2 net-next 5/8] bpf: Add BPF_MAP_GET_FD_BY_ID Martin KaFai Lau
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-05-31 18:58 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Add a BPF_PROG_GET_FD_BY_ID command to allow userspace to get a fd
from a bpf_prog's ID.

bpf_prog_inc_not_zero() is added and is called with prog_idr_lock
held.

__bpf_prog_put() is also added; it takes a 'bool do_idr_lock'
param to decide whether the prog_idr_lock should be acquired when
freeing the prog->id.

In the error path of bpf_prog_inc_not_zero(), it may have to
call __bpf_prog_put(prog, false), which does not need
to take the prog_idr_lock when freeing the prog->id.

It is currently limited to CAP_SYS_ADMIN; we can
consider lifting this restriction in followup patches.
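
A minimal sketch of the new command (illustrative only, assuming the
updated uapi header), including the error semantics described above:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Illustrative sketch, not part of this patch. */
static int prog_fd_from_id(__u32 id)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.prog_id = id;

	/* On failure errno is ENOENT (no such ID, or the prog is
	 * already being destroyed) or EPERM (no CAP_SYS_ADMIN).
	 */
	return syscall(__NR_bpf, BPF_PROG_GET_FD_BY_ID, &attr, sizeof(attr));
}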

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/bpf.h |  8 +++--
 kernel/bpf/syscall.c     | 91 ++++++++++++++++++++++++++++++++++++++++++------
 2 files changed, 87 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e5c88f39bdc3..6d4e1cc5bd18 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -84,6 +84,7 @@ enum bpf_cmd {
 	BPF_PROG_TEST_RUN,
 	BPF_PROG_GET_NEXT_ID,
 	BPF_MAP_GET_NEXT_ID,
+	BPF_PROG_GET_FD_BY_ID,
 };
 
 enum bpf_map_type {
@@ -212,8 +213,11 @@ union bpf_attr {
 		__u32		duration;
 	} test;
 
-	struct { /* anonymous struct used by BPF_*_GET_NEXT_ID */
-		__u32		start_id;
+	struct { /* anonymous struct used by BPF_*_GET_*_ID */
+		union {
+			__u32		start_id;
+			__u32		prog_id;
+		};
 		__u32		next_id;
 	};
 } __attribute__((aligned(8)));
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 68a73d453d70..4f9ee57e7140 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -703,15 +703,23 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog)
 	return id > 0 ? 0 : id;
 }
 
-static void bpf_prog_free_id(struct bpf_prog *prog)
+static void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
 {
 	/* cBPF to eBPF migrations are currently not in the idr store. */
 	if (!prog->aux->id)
 		return;
 
-	spin_lock_bh(&prog_idr_lock);
+	if (do_idr_lock)
+		spin_lock_bh(&prog_idr_lock);
+	else
+		__acquire(&prog_idr_lock);
+
 	idr_remove(&prog_idr, prog->aux->id);
-	spin_unlock_bh(&prog_idr_lock);
+
+	if (do_idr_lock)
+		spin_unlock_bh(&prog_idr_lock);
+	else
+		__release(&prog_idr_lock);
 }
 
 static void __bpf_prog_put_rcu(struct rcu_head *rcu)
@@ -723,16 +731,21 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 	bpf_prog_free(aux->prog);
 }
 
-void bpf_prog_put(struct bpf_prog *prog)
+static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 {
 	if (atomic_dec_and_test(&prog->aux->refcnt)) {
 		trace_bpf_prog_put_rcu(prog);
 		/* bpf_prog_free_id() must be called first */
-		bpf_prog_free_id(prog);
+		bpf_prog_free_id(prog, do_idr_lock);
 		bpf_prog_kallsyms_del(prog);
 		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
 	}
 }
+
+void bpf_prog_put(struct bpf_prog *prog)
+{
+	__bpf_prog_put(prog, true);
+}
 EXPORT_SYMBOL_GPL(bpf_prog_put);
 
 static int bpf_prog_release(struct inode *inode, struct file *filp)
@@ -814,6 +827,24 @@ struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog)
 }
 EXPORT_SYMBOL_GPL(bpf_prog_inc);
 
+/* prog_idr_lock should have been held */
+static struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog)
+{
+	int refold;
+
+	refold = __atomic_add_unless(&prog->aux->refcnt, 1, 0);
+
+	if (refold >= BPF_MAX_REFCNT) {
+		__bpf_prog_put(prog, false);
+		return ERR_PTR(-EBUSY);
+	}
+
+	if (!refold)
+		return ERR_PTR(-ENOENT);
+
+	return prog;
+}
+
 static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
 {
 	struct fd f = fdget(ufd);
@@ -926,16 +957,21 @@ static int bpf_prog_load(union bpf_attr *attr)
 		goto free_used_maps;
 
 	err = bpf_prog_new_fd(prog);
-	if (err < 0)
-		/* failed to allocate fd */
-		goto free_id;
+	if (err < 0) {
+		/* failed to allocate fd.
+		 * bpf_prog_put() is needed because the above
+		 * bpf_prog_alloc_id() has published the prog
+		 * to the userspace and the userspace may
+		 * have refcnt-ed it through BPF_PROG_GET_FD_BY_ID.
+		 */
+		bpf_prog_put(prog);
+		return err;
+	}
 
 	bpf_prog_kallsyms_add(prog);
 	trace_bpf_prog_load(prog, err);
 	return err;
 
-free_id:
-	bpf_prog_free_id(prog);
 free_used_maps:
 	free_used_maps(prog->aux);
 free_prog:
@@ -1097,6 +1133,38 @@ static int bpf_obj_get_next_id(const union bpf_attr *attr,
 	return err;
 }
 
+#define BPF_PROG_GET_FD_BY_ID_LAST_FIELD prog_id
+
+static int bpf_prog_get_fd_by_id(const union bpf_attr *attr)
+{
+	struct bpf_prog *prog;
+	u32 id = attr->prog_id;
+	int fd;
+
+	if (CHECK_ATTR(BPF_PROG_GET_FD_BY_ID))
+		return -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	spin_lock_bh(&prog_idr_lock);
+	prog = idr_find(&prog_idr, id);
+	if (prog)
+		prog = bpf_prog_inc_not_zero(prog);
+	else
+		prog = ERR_PTR(-ENOENT);
+	spin_unlock_bh(&prog_idr_lock);
+
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	fd = bpf_prog_new_fd(prog);
+	if (fd < 0)
+		bpf_prog_put(prog);
+
+	return fd;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -1182,6 +1250,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 		err = bpf_obj_get_next_id(&attr, uattr,
 					  &map_idr, &map_idr_lock);
 		break;
+	case BPF_PROG_GET_FD_BY_ID:
+		err = bpf_prog_get_fd_by_id(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.9.3

* [PATCH v2 net-next 5/8] bpf: Add BPF_MAP_GET_FD_BY_ID
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
                   ` (3 preceding siblings ...)
  2017-05-31 18:58 ` [PATCH v2 net-next 4/8] bpf: Add BPF_PROG_GET_FD_BY_ID Martin KaFai Lau
@ 2017-05-31 18:58 ` Martin KaFai Lau
  2017-05-31 18:59 ` [PATCH v2 net-next 6/8] bpf: Add jited_len to struct bpf_prog Martin KaFai Lau
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-05-31 18:58 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Add a BPF_MAP_GET_FD_BY_ID command to allow userspace to get a fd
from a bpf_map's ID.

bpf_map_inc_not_zero() is added and is called with map_idr_lock
held.

__bpf_map_put() is also added; it takes a 'bool do_idr_lock'
param to decide whether the map_idr_lock should be acquired when
freeing the map->id.

In the error path of bpf_map_inc_not_zero(), it may have to
call __bpf_map_put(map, false), which does not need
to take the map_idr_lock when freeing the map->id.

It is currently limited to CAP_SYS_ADMIN; we can
consider lifting this restriction in followup patches.
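
The map flavor can be used the same way.  A hedged sketch
(illustrative only; bpf_map_lookup_elem() is the existing
tools/lib/bpf helper):

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>
#include "bpf.h"	/* tools/lib/bpf: bpf_map_lookup_elem() */

/* Illustrative sketch, not part of this patch. */
static int read_map_by_id(__u32 id, __u32 *key, __u64 *value)
{
	union bpf_attr attr;
	int map_fd, err;

	memset(&attr, 0, sizeof(attr));
	attr.map_id = id;

	map_fd = syscall(__NR_bpf, BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr));
	if (map_fd < 0)
		return map_fd;

	/* The returned fd holds its own refcnt and usercnt, so the
	 * map stays alive even if its creator closes its own fd.
	 */
	err = bpf_map_lookup_elem(map_fd, key, value);
	close(map_fd);
	return err;
}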

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/uapi/linux/bpf.h |  2 +
 kernel/bpf/syscall.c     | 95 +++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 87 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6d4e1cc5bd18..cf704e8b6e65 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -85,6 +85,7 @@ enum bpf_cmd {
 	BPF_PROG_GET_NEXT_ID,
 	BPF_MAP_GET_NEXT_ID,
 	BPF_PROG_GET_FD_BY_ID,
+	BPF_MAP_GET_FD_BY_ID,
 };
 
 enum bpf_map_type {
@@ -217,6 +218,7 @@ union bpf_attr {
 		union {
 			__u32		start_id;
 			__u32		prog_id;
+			__u32		map_id;
 		};
 		__u32		next_id;
 	};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4f9ee57e7140..de8fe04a6539 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -135,11 +135,19 @@ static int bpf_map_alloc_id(struct bpf_map *map)
 	return id > 0 ? 0 : id;
 }
 
-static void bpf_map_free_id(struct bpf_map *map)
+static void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock)
 {
-	spin_lock_bh(&map_idr_lock);
+	if (do_idr_lock)
+		spin_lock_bh(&map_idr_lock);
+	else
+		__acquire(&map_idr_lock);
+
 	idr_remove(&map_idr, map->id);
-	spin_unlock_bh(&map_idr_lock);
+
+	if (do_idr_lock)
+		spin_unlock_bh(&map_idr_lock);
+	else
+		__release(&map_idr_lock);
 }
 
 /* called from workqueue */
@@ -163,16 +171,21 @@ static void bpf_map_put_uref(struct bpf_map *map)
 /* decrement map refcnt and schedule it for freeing via workqueue
  * (underlying map implementation ops->map_free() might sleep)
  */
-void bpf_map_put(struct bpf_map *map)
+static void __bpf_map_put(struct bpf_map *map, bool do_idr_lock)
 {
 	if (atomic_dec_and_test(&map->refcnt)) {
 		/* bpf_map_free_id() must be called first */
-		bpf_map_free_id(map);
+		bpf_map_free_id(map, do_idr_lock);
 		INIT_WORK(&map->work, bpf_map_free_deferred);
 		schedule_work(&map->work);
 	}
 }
 
+void bpf_map_put(struct bpf_map *map)
+{
+	__bpf_map_put(map, true);
+}
+
 void bpf_map_put_with_uref(struct bpf_map *map)
 {
 	bpf_map_put_uref(map);
@@ -271,15 +284,20 @@ static int map_create(union bpf_attr *attr)
 		goto free_map;
 
 	err = bpf_map_new_fd(map);
-	if (err < 0)
-		/* failed to allocate fd */
-		goto free_id;
+	if (err < 0) {
+		/* failed to allocate fd.
+		 * bpf_map_put() is needed because the above
+		 * bpf_map_alloc_id() has published the map
+		 * to the userspace and the userspace may
+		 * have refcnt-ed it through BPF_MAP_GET_FD_BY_ID.
+		 */
+		bpf_map_put(map);
+		return err;
+	}
 
 	trace_bpf_map_create(map, err);
 	return err;
 
-free_id:
-	bpf_map_free_id(map);
 free_map:
 	bpf_map_uncharge_memlock(map);
 free_map_nouncharge:
@@ -331,6 +349,28 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
 	return map;
 }
 
+/* map_idr_lock should have been held */
+static struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map,
+					    bool uref)
+{
+	int refold;
+
+	refold = __atomic_add_unless(&map->refcnt, 1, 0);
+
+	if (refold >= BPF_MAX_REFCNT) {
+		__bpf_map_put(map, false);
+		return ERR_PTR(-EBUSY);
+	}
+
+	if (!refold)
+		return ERR_PTR(-ENOENT);
+
+	if (uref)
+		atomic_inc(&map->usercnt);
+
+	return map;
+}
+
 int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 {
 	return -ENOTSUPP;
@@ -1165,6 +1205,38 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr)
 	return fd;
 }
 
+#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD map_id
+
+static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
+{
+	struct bpf_map *map;
+	u32 id = attr->map_id;
+	int fd;
+
+	if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID))
+		return -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	spin_lock_bh(&map_idr_lock);
+	map = idr_find(&map_idr, id);
+	if (map)
+		map = bpf_map_inc_not_zero(map, true);
+	else
+		map = ERR_PTR(-ENOENT);
+	spin_unlock_bh(&map_idr_lock);
+
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	fd = bpf_map_new_fd(map);
+	if (fd < 0)
+		bpf_map_put(map);
+
+	return fd;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -1253,6 +1325,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_PROG_GET_FD_BY_ID:
 		err = bpf_prog_get_fd_by_id(&attr);
 		break;
+	case BPF_MAP_GET_FD_BY_ID:
+		err = bpf_map_get_fd_by_id(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.9.3

* [PATCH v2 net-next 6/8] bpf: Add jited_len to struct bpf_prog
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
                   ` (4 preceding siblings ...)
  2017-05-31 18:58 ` [PATCH v2 net-next 5/8] bpf: Add BPF_MAP_GET_FD_BY_ID Martin KaFai Lau
@ 2017-05-31 18:59 ` Martin KaFai Lau
  2017-05-31 18:59 ` [PATCH v2 net-next 7/8] bpf: Add BPF_OBJ_GET_INFO_BY_FD Martin KaFai Lau
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-05-31 18:59 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Add jited_len to struct bpf_prog.  It will be
useful for the struct bpf_prog_info which will
be added in a later patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 arch/arm64/net/bpf_jit_comp.c     | 1 +
 arch/powerpc/net/bpf_jit_comp64.c | 1 +
 arch/s390/net/bpf_jit_comp.c      | 1 +
 arch/sparc/net/bpf_jit_comp_64.c  | 1 +
 arch/x86/net/bpf_jit_comp.c       | 1 +
 include/linux/filter.h            | 1 +
 6 files changed, 6 insertions(+)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 71f930501ade..6b21dccf01e8 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -900,6 +900,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	bpf_jit_binary_lock_ro(header);
 	prog->bpf_func = (void *)ctx.image;
 	prog->jited = 1;
+	prog->jited_len = image_size;
 
 out_off:
 	kfree(ctx.offset);
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index aee2bb817ac6..a3f904cd8b1e 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -1052,6 +1052,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 	fp->bpf_func = (void *)image;
 	fp->jited = 1;
+	fp->jited_len = alloclen;
 
 	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
 
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 6e97a2e3fd8d..76b7e9d5d591 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -1329,6 +1329,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	bpf_jit_binary_lock_ro(header);
 	fp->bpf_func = (void *) jit.prg_buf;
 	fp->jited = 1;
+	fp->jited_len = jit.size;
 free_addrs:
 	kfree(jit.addrs);
 out:
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 21de77419f48..beb46b14fbbc 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1555,6 +1555,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	prog->bpf_func = (void *)ctx.image;
 	prog->jited = 1;
+	prog->jited_len = image_size;
 
 out_off:
 	kfree(ctx.offset);
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index f58939393eef..9341030133ae 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1162,6 +1162,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		bpf_jit_binary_lock_ro(header);
 		prog->bpf_func = (void *)image;
 		prog->jited = 1;
+		prog->jited_len = proglen;
 	} else {
 		prog = orig_prog;
 	}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 62d948f80730..a322a41f394b 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -429,6 +429,7 @@ struct bpf_prog {
 	kmemcheck_bitfield_end(meta);
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	u32			len;		/* Number of filter blocks */
+	u32			jited_len;	/* Size of jited insns in bytes */
 	u8			tag[BPF_TAG_SIZE];
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
-- 
2.9.3

* [PATCH v2 net-next 7/8] bpf: Add BPF_OBJ_GET_INFO_BY_FD
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
                   ` (5 preceding siblings ...)
  2017-05-31 18:59 ` [PATCH v2 net-next 6/8] bpf: Add jited_len to struct bpf_prog Martin KaFai Lau
@ 2017-05-31 18:59 ` Martin KaFai Lau
  2017-05-31 18:59 ` [PATCH v2 net-next 8/8] bpf: Test for bpf ID Martin KaFai Lau
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-05-31 18:59 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

A single BPF_OBJ_GET_INFO_BY_FD cmd is used to obtain the info
for both bpf_prog and bpf_map.  The kernel can figure out whether
the fd is associated with a bpf_prog or a bpf_map.

The suggested struct bpf_prog_info and struct bpf_map_info are
not meant to be complete lists; completeness is not the goal of
this patch.  New fields can be added in future patches.

The focus of this patch is to create the interface, the
BPF_OBJ_GET_INFO_BY_FD cmd, for exposing the bpf_prog's and
bpf_map's info.

The obj's info, which will be extended (and get bigger) over time, is
separated from the bpf_attr to avoid bloating the bpf_attr.
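
To make the compatibility scheme concrete, here is an illustrative
(non-authoritative) caller sketch: userspace passes the info_len it
was compiled with, and the kernel writes back the length it actually
filled:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Illustrative sketch, not part of this patch. */
static int get_prog_info(int prog_fd, struct bpf_prog_info *info)
{
	union bpf_attr attr;
	int err;

	memset(&attr, 0, sizeof(attr));
	memset(info, 0, sizeof(*info));
	attr.info.bpf_fd = prog_fd;
	attr.info.info_len = sizeof(*info);	/* size we were built with */
	attr.info.info = (__u64)(unsigned long)info;

	err = syscall(__NR_bpf, BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));
	/* On success the kernel writes the info_len it actually filled
	 * back into attr.info.info_len; a larger user struct is only
	 * accepted if its tail bytes are zero (check_uarg_tail_zero()).
	 */
	return err;
}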

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/filter.h   |   2 -
 include/uapi/linux/bpf.h |  28 ++++++++
 kernel/bpf/syscall.c     | 163 ++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 174 insertions(+), 19 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a322a41f394b..46274f23f127 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -66,8 +66,6 @@ struct bpf_prog_aux;
 /* BPF program can access up to 512 bytes of stack space. */
 #define MAX_BPF_STACK	512
 
-#define BPF_TAG_SIZE	8
-
 /* Helper macros for filter block array initializers. */
 
 /* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cf704e8b6e65..677b570da8ec 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -86,6 +86,7 @@ enum bpf_cmd {
 	BPF_MAP_GET_NEXT_ID,
 	BPF_PROG_GET_FD_BY_ID,
 	BPF_MAP_GET_FD_BY_ID,
+	BPF_OBJ_GET_INFO_BY_FD,
 };
 
 enum bpf_map_type {
@@ -222,6 +223,12 @@ union bpf_attr {
 		};
 		__u32		next_id;
 	};
+
+	struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
+		__u32		bpf_fd;
+		__u32		info_len;
+		__aligned_u64	info;
+	} info;
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
@@ -683,4 +690,25 @@ struct xdp_md {
 	__u32 data_end;
 };
 
+#define BPF_TAG_SIZE	8
+
+struct bpf_prog_info {
+	__u32 type;
+	__u32 id;
+	__u8  tag[BPF_TAG_SIZE];
+	__u32 jited_prog_len;
+	__u32 xlated_prog_len;
+	__aligned_u64 jited_prog_insns;
+	__aligned_u64 xlated_prog_insns;
+} __attribute__((aligned(8)));
+
+struct bpf_map_info {
+	__u32 type;
+	__u32 id;
+	__u32 key_size;
+	__u32 value_size;
+	__u32 max_entries;
+	__u32 map_flags;
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index de8fe04a6539..aeca61a84ed1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1237,6 +1237,145 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	return fd;
 }
 
+static int check_uarg_tail_zero(void __user *uaddr,
+				size_t expected_size,
+				size_t actual_size)
+{
+	unsigned char __user *addr;
+	unsigned char __user *end;
+	unsigned char val;
+	int err;
+
+	if (actual_size <= expected_size)
+		return 0;
+
+	addr = uaddr + expected_size;
+	end  = uaddr + actual_size;
+
+	for (; addr < end; addr++) {
+		err = get_user(val, addr);
+		if (err)
+			return err;
+		if (val)
+			return -E2BIG;
+	}
+
+	return 0;
+}
+
+static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
+				   const union bpf_attr *attr,
+				   union bpf_attr __user *uattr)
+{
+	struct bpf_prog_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+	struct bpf_prog_info info = {};
+	u32 info_len = attr->info.info_len;
+	char __user *uinsns;
+	u32 ulen;
+	int err;
+
+	err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
+	if (err)
+		return err;
+	info_len = min_t(u32, sizeof(info), info_len);
+
+	if (copy_from_user(&info, uinfo, info_len))
+		return -EFAULT;
+
+	info.type = prog->type;
+	info.id = prog->aux->id;
+
+	memcpy(info.tag, prog->tag, sizeof(prog->tag));
+
+	if (!capable(CAP_SYS_ADMIN)) {
+		info.jited_prog_len = 0;
+		info.xlated_prog_len = 0;
+		goto done;
+	}
+
+	ulen = info.jited_prog_len;
+	info.jited_prog_len = prog->jited_len;
+	if (info.jited_prog_len && ulen) {
+		uinsns = u64_to_user_ptr(info.jited_prog_insns);
+		ulen = min_t(u32, info.jited_prog_len, ulen);
+		if (copy_to_user(uinsns, prog->bpf_func, ulen))
+			return -EFAULT;
+	}
+
+	ulen = info.xlated_prog_len;
+	info.xlated_prog_len = bpf_prog_size(prog->len);
+	if (info.xlated_prog_len && ulen) {
+		uinsns = u64_to_user_ptr(info.xlated_prog_insns);
+		ulen = min_t(u32, info.xlated_prog_len, ulen);
+		if (copy_to_user(uinsns, prog->insnsi, ulen))
+			return -EFAULT;
+	}
+
+done:
+	if (copy_to_user(uinfo, &info, info_len) ||
+	    put_user(info_len, &uattr->info.info_len))
+		return -EFAULT;
+
+	return 0;
+}
+
+static int bpf_map_get_info_by_fd(struct bpf_map *map,
+				  const union bpf_attr *attr,
+				  union bpf_attr __user *uattr)
+{
+	struct bpf_map_info __user *uinfo = u64_to_user_ptr(attr->info.info);
+	struct bpf_map_info info = {};
+	u32 info_len = attr->info.info_len;
+	int err;
+
+	err = check_uarg_tail_zero(uinfo, sizeof(info), info_len);
+	if (err)
+		return err;
+	info_len = min_t(u32, sizeof(info), info_len);
+
+	info.type = map->map_type;
+	info.id = map->id;
+	info.key_size = map->key_size;
+	info.value_size = map->value_size;
+	info.max_entries = map->max_entries;
+	info.map_flags = map->map_flags;
+
+	if (copy_to_user(uinfo, &info, info_len) ||
+	    put_user(info_len, &uattr->info.info_len))
+		return -EFAULT;
+
+	return 0;
+}
+
+#define BPF_OBJ_GET_INFO_BY_FD_LAST_FIELD info.info
+
+static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
+				  union bpf_attr __user *uattr)
+{
+	int ufd = attr->info.bpf_fd;
+	struct fd f;
+	int err;
+
+	if (CHECK_ATTR(BPF_OBJ_GET_INFO_BY_FD))
+		return -EINVAL;
+
+	f = fdget(ufd);
+	if (!f.file)
+		return -EBADFD;
+
+	if (f.file->f_op == &bpf_prog_fops)
+		err = bpf_prog_get_info_by_fd(f.file->private_data, attr,
+					      uattr);
+	else if (f.file->f_op == &bpf_map_fops)
+		err = bpf_map_get_info_by_fd(f.file->private_data, attr,
+					     uattr);
+	else
+		err = -EINVAL;
+
+	fdput(f);
+	return err;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -1256,23 +1395,10 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	 * user-space does not rely on any kernel feature
 	 * extensions we dont know about yet.
 	 */
-	if (size > sizeof(attr)) {
-		unsigned char __user *addr;
-		unsigned char __user *end;
-		unsigned char val;
-
-		addr = (void __user *)uattr + sizeof(attr);
-		end  = (void __user *)uattr + size;
-
-		for (; addr < end; addr++) {
-			err = get_user(val, addr);
-			if (err)
-				return err;
-			if (val)
-				return -E2BIG;
-		}
-		size = sizeof(attr);
-	}
+	err = check_uarg_tail_zero(uattr, sizeof(attr), size);
+	if (err)
+		return err;
+	size = min_t(u32, size, sizeof(attr));
 
 	/* copy attributes from user space, may be less than sizeof(bpf_attr) */
 	if (copy_from_user(&attr, uattr, size) != 0)
@@ -1328,6 +1454,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_MAP_GET_FD_BY_ID:
 		err = bpf_map_get_fd_by_id(&attr);
 		break;
+	case BPF_OBJ_GET_INFO_BY_FD:
+		err = bpf_obj_get_info_by_fd(&attr, uattr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
-- 
2.9.3

* [PATCH v2 net-next 8/8] bpf: Test for bpf ID
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
                   ` (6 preceding siblings ...)
  2017-05-31 18:59 ` [PATCH v2 net-next 7/8] bpf: Add BPF_OBJ_GET_INFO_BY_FD Martin KaFai Lau
@ 2017-05-31 18:59 ` Martin KaFai Lau
  2017-05-31 23:35 ` [PATCH v2 net-next 0/8] Introduce " David Miller
  2017-06-01  2:22 ` Jakub Kicinski
  9 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-05-31 18:59 UTC (permalink / raw)
  To: netdev; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team

Add tests to exercise the bpf_prog/map id generation,
bpf_(prog|map)_get_next_id(), bpf_(prog|map)_get_fd_by_id() and
bpf_obj_get_info_by_fd().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/include/uapi/linux/bpf.h            |  41 +++++++
 tools/lib/bpf/bpf.c                       |  68 +++++++++++
 tools/lib/bpf/bpf.h                       |   5 +
 tools/testing/selftests/bpf/Makefile      |   2 +-
 tools/testing/selftests/bpf/test_obj_id.c |  35 ++++++
 tools/testing/selftests/bpf/test_progs.c  | 191 ++++++++++++++++++++++++++++++
 6 files changed, 341 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_obj_id.c

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 94dfa9def355..677b570da8ec 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -82,6 +82,11 @@ enum bpf_cmd {
 	BPF_PROG_ATTACH,
 	BPF_PROG_DETACH,
 	BPF_PROG_TEST_RUN,
+	BPF_PROG_GET_NEXT_ID,
+	BPF_MAP_GET_NEXT_ID,
+	BPF_PROG_GET_FD_BY_ID,
+	BPF_MAP_GET_FD_BY_ID,
+	BPF_OBJ_GET_INFO_BY_FD,
 };
 
 enum bpf_map_type {
@@ -209,6 +214,21 @@ union bpf_attr {
 		__u32		repeat;
 		__u32		duration;
 	} test;
+
+	struct { /* anonymous struct used by BPF_*_GET_*_ID */
+		union {
+			__u32		start_id;
+			__u32		prog_id;
+			__u32		map_id;
+		};
+		__u32		next_id;
+	};
+
+	struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
+		__u32		bpf_fd;
+		__u32		info_len;
+		__aligned_u64	info;
+	} info;
 } __attribute__((aligned(8)));
 
 /* BPF helper function descriptions:
@@ -670,4 +690,25 @@ struct xdp_md {
 	__u32 data_end;
 };
 
+#define BPF_TAG_SIZE	8
+
+struct bpf_prog_info {
+	__u32 type;
+	__u32 id;
+	__u8  tag[BPF_TAG_SIZE];
+	__u32 jited_prog_len;
+	__u32 xlated_prog_len;
+	__aligned_u64 jited_prog_insns;
+	__aligned_u64 xlated_prog_insns;
+} __attribute__((aligned(8)));
+
+struct bpf_map_info {
+	__u32 type;
+	__u32 id;
+	__u32 key_size;
+	__u32 value_size;
+	__u32 max_entries;
+	__u32 map_flags;
+} __attribute__((aligned(8)));
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 6e178987af8e..7e0405e1651d 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -257,3 +257,71 @@ int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size,
 		*duration = attr.test.duration;
 	return ret;
 }
+
+int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id)
+{
+	union bpf_attr attr;
+	int err;
+
+	bzero(&attr, sizeof(attr));
+	attr.start_id = start_id;
+
+	err = sys_bpf(BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr));
+	if (!err)
+		*next_id = attr.next_id;
+
+	return err;
+}
+
+int bpf_map_get_next_id(__u32 start_id, __u32 *next_id)
+{
+	union bpf_attr attr;
+	int err;
+
+	bzero(&attr, sizeof(attr));
+	attr.start_id = start_id;
+
+	err = sys_bpf(BPF_MAP_GET_NEXT_ID, &attr, sizeof(attr));
+	if (!err)
+		*next_id = attr.next_id;
+
+	return err;
+}
+
+int bpf_prog_get_fd_by_id(__u32 id)
+{
+	union bpf_attr attr;
+
+	bzero(&attr, sizeof(attr));
+	attr.prog_id = id;
+
+	return sys_bpf(BPF_PROG_GET_FD_BY_ID, &attr, sizeof(attr));
+}
+
+int bpf_map_get_fd_by_id(__u32 id)
+{
+	union bpf_attr attr;
+
+	bzero(&attr, sizeof(attr));
+	attr.map_id = id;
+
+	return sys_bpf(BPF_MAP_GET_FD_BY_ID, &attr, sizeof(attr));
+}
+
+int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len)
+{
+	union bpf_attr attr;
+	int err;
+
+	bzero(&attr, sizeof(attr));
+	bzero(info, *info_len);
+	attr.info.bpf_fd = prog_fd;
+	attr.info.info_len = *info_len;
+	attr.info.info = ptr_to_u64(info);
+
+	err = sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr, sizeof(attr));
+	if (!err)
+		*info_len = attr.info.info_len;
+
+	return err;
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 972bd8333eb7..16de44a14b48 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -54,5 +54,10 @@ int bpf_prog_detach(int attachable_fd, enum bpf_attach_type type);
 int bpf_prog_test_run(int prog_fd, int repeat, void *data, __u32 size,
 		      void *data_out, __u32 *size_out, __u32 *retval,
 		      __u32 *duration);
+int bpf_prog_get_next_id(__u32 start_id, __u32 *next_id);
+int bpf_map_get_next_id(__u32 start_id, __u32 *next_id);
+int bpf_prog_get_fd_by_id(__u32 id);
+int bpf_map_get_fd_by_id(__u32 id);
+int bpf_obj_get_info_by_fd(int prog_fd, void *info, __u32 *info_len);
 
 #endif
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index f389b02d43a0..9f0e07ba5334 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -14,7 +14,7 @@ LDLIBS += -lcap -lelf
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
 	test_align
 
-TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o
+TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o
 
 TEST_PROGS := test_kmod.sh
 
diff --git a/tools/testing/selftests/bpf/test_obj_id.c b/tools/testing/selftests/bpf/test_obj_id.c
new file mode 100644
index 000000000000..d8723aaf827a
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_obj_id.c
@@ -0,0 +1,35 @@
+/* Copyright (c) 2017 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <stddef.h>
+#include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include "bpf_helpers.h"
+
+/* This is a trivial bpf program that must load without
+ * issue, since testing the verifier is not the focus
+ * here.
+ */
+
+int _version SEC("version") = 1;
+
+struct bpf_map_def SEC("maps") test_map_id = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64),
+	.max_entries = 1,
+};
+
+SEC("test_prog_id")
+int test_prog_id(struct __sk_buff *skb)
+{
+	__u32 key = 0;
+	__u64 *value;
+
+	value = bpf_map_lookup_elem(&test_map_id, &key);
+
+	return TC_ACT_OK;
+}
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index b59f5ed4ae40..8189bfc7e277 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -22,6 +22,8 @@ typedef __u16 __sum16;
 
 #include <sys/wait.h>
 #include <sys/resource.h>
+#include <sys/types.h>
+#include <pwd.h>
 
 #include <linux/bpf.h>
 #include <linux/err.h>
@@ -70,6 +72,7 @@ static struct {
 		pass_cnt++;						\
 		printf("%s:PASS:%s %d nsec\n", __func__, tag, duration);\
 	}								\
+	__ret;								\
 })
 
 static int bpf_prog_load(const char *file, enum bpf_prog_type type,
@@ -283,6 +286,193 @@ static void test_tcp_estats(void)
 	bpf_object__close(obj);
 }
 
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+	return (__u64) (unsigned long) ptr;
+}
+
+static void test_bpf_obj_id(void)
+{
+	const __u64 array_magic_value = 0xfaceb00c;
+	const __u32 array_key = 0;
+	const int nr_iters = 2;
+	const char *file = "./test_obj_id.o";
+
+	struct bpf_object *objs[nr_iters];
+	int prog_fds[nr_iters], map_fds[nr_iters];
+	/* +1 to test for the info_len returned by kernel */
+	struct bpf_prog_info prog_infos[nr_iters + 1];
+	struct bpf_map_info map_infos[nr_iters + 1];
+	char jited_insns[128], xlated_insns[128];
+	__u32 i, next_id, info_len, nr_id_found, duration = 0;
+	int err = 0;
+	__u64 array_value;
+
+	err = bpf_prog_get_fd_by_id(0);
+	CHECK(err >= 0 || errno != ENOENT,
+	      "get-fd-by-notexist-prog-id", "err %d errno %d\n", err, errno);
+
+	err = bpf_map_get_fd_by_id(0);
+	CHECK(err >= 0 || errno != ENOENT,
+	      "get-fd-by-notexist-map-id", "err %d errno %d\n", err, errno);
+
+	for (i = 0; i < nr_iters; i++)
+		objs[i] = NULL;
+
+	/* Check bpf_obj_get_info_by_fd() */
+	for (i = 0; i < nr_iters; i++) {
+		err = bpf_prog_load(file, BPF_PROG_TYPE_SOCKET_FILTER,
+				    &objs[i], &prog_fds[i]);
+		/* test_obj_id.o is a trivial prog.  It should
+		 * never fail to load.
+		 */
+		assert(!err);
+
+		/* Check getting prog info */
+		info_len = sizeof(struct bpf_prog_info) * 2;
+		prog_infos[i].jited_prog_insns = ptr_to_u64(jited_insns);
+		prog_infos[i].jited_prog_len = sizeof(jited_insns);
+		prog_infos[i].xlated_prog_insns = ptr_to_u64(xlated_insns);
+		prog_infos[i].xlated_prog_len = sizeof(xlated_insns);
+		err = bpf_obj_get_info_by_fd(prog_fds[i], &prog_infos[i],
+					     &info_len);
+		if (CHECK(err ||
+			  prog_infos[i].type != BPF_PROG_TYPE_SOCKET_FILTER ||
+			  info_len != sizeof(struct bpf_prog_info) ||
+			  !prog_infos[i].jited_prog_len ||
+			  !prog_infos[i].xlated_prog_len,
+			  "get-prog-info(fd)",
+			  "err %d errno %d i %d type %d(%d) info_len %u(%lu) jited_prog_len %u xlated_prog_len %u\n",
+			  err, errno, i,
+			  prog_infos[i].type, BPF_PROG_TYPE_SOCKET_FILTER,
+			  info_len, sizeof(struct bpf_prog_info),
+			  prog_infos[i].jited_prog_len,
+			  prog_infos[i].xlated_prog_len))
+			goto done;
+
+		map_fds[i] = bpf_find_map(__func__, objs[i], "test_map_id");
+		assert(map_fds[i] >= 0);
+		err = bpf_map_update_elem(map_fds[i], &array_key,
+					  &array_magic_value, 0);
+		assert(!err);
+
+		/* Check getting map info */
+		info_len = sizeof(struct bpf_map_info) * 2;
+		err = bpf_obj_get_info_by_fd(map_fds[i], &map_infos[i],
+					     &info_len);
+		if (CHECK(err ||
+			  map_infos[i].type != BPF_MAP_TYPE_ARRAY ||
+			  map_infos[i].key_size != sizeof(__u32) ||
+			  map_infos[i].value_size != sizeof(__u64) ||
+			  map_infos[i].max_entries != 1 ||
+			  map_infos[i].map_flags != 0 ||
+			  info_len != sizeof(struct bpf_map_info),
+			  "get-map-info(fd)",
+			  "err %d errno %d type %d(%d) info_len %u(%lu) key_size %u value_size %u max_entries %u map_flags %X\n",
+			  err, errno,
+			  map_infos[i].type, BPF_MAP_TYPE_ARRAY,
+			  info_len, sizeof(struct bpf_map_info),
+			  map_infos[i].key_size,
+			  map_infos[i].value_size,
+			  map_infos[i].max_entries,
+			  map_infos[i].map_flags))
+			goto done;
+	}
+
+	/* Check bpf_prog_get_next_id() */
+	nr_id_found = 0;
+	next_id = 0;
+	while (!bpf_prog_get_next_id(next_id, &next_id)) {
+		struct bpf_prog_info prog_info;
+		int prog_fd;
+
+		info_len = sizeof(prog_info);
+
+		prog_fd = bpf_prog_get_fd_by_id(next_id);
+		if (prog_fd < 0 && errno == ENOENT)
+			/* The bpf_prog was destroyed in the meantime */
+			continue;
+		if (CHECK(prog_fd < 0, "get-prog-fd(next_id)",
+			  "prog_fd %d next_id %d errno %d\n",
+			  prog_fd, next_id, errno))
+			break;
+
+		for (i = 0; i < nr_iters; i++)
+			if (prog_infos[i].id == next_id)
+				break;
+
+		if (i == nr_iters)
+			continue;
+
+		nr_id_found++;
+
+		err = bpf_obj_get_info_by_fd(prog_fd, &prog_info, &info_len);
+		CHECK(err || info_len != sizeof(struct bpf_prog_info) ||
+		      memcmp(&prog_info, &prog_infos[i], info_len),
+		      "get-prog-info(next_id->fd)",
+		      "err %d errno %d info_len %u(%lu) memcmp %d\n",
+		      err, errno, info_len, sizeof(struct bpf_prog_info),
+		      memcmp(&prog_info, &prog_infos[i], info_len));
+
+		close(prog_fd);
+	}
+	CHECK(nr_id_found != nr_iters,
+	      "check total prog id found by get_next_id",
+	      "nr_id_found %u(%u)\n",
+	      nr_id_found, nr_iters);
+
+	/* Check bpf_map_get_next_id() */
+	nr_id_found = 0;
+	next_id = 0;
+	while (!bpf_map_get_next_id(next_id, &next_id)) {
+		struct bpf_map_info map_info;
+		int map_fd;
+
+		info_len = sizeof(map_info);
+
+		map_fd = bpf_map_get_fd_by_id(next_id);
+		if (map_fd < 0 && errno == ENOENT)
+			/* The bpf_map was destroyed in the meantime */
+			continue;
+		if (CHECK(map_fd < 0, "get-map-fd(next_id)",
+			  "map_fd %d next_id %u errno %d\n",
+			  map_fd, next_id, errno))
+			break;
+
+		for (i = 0; i < nr_iters; i++)
+			if (map_infos[i].id == next_id)
+				break;
+
+		if (i == nr_iters)
+			continue;
+
+		nr_id_found++;
+
+		err = bpf_map_lookup_elem(map_fd, &array_key, &array_value);
+		assert(!err);
+
+		err = bpf_obj_get_info_by_fd(map_fd, &map_info, &info_len);
+		CHECK(err || info_len != sizeof(struct bpf_map_info) ||
+		      memcmp(&map_info, &map_infos[i], info_len) ||
+		      array_value != array_magic_value,
+		      "check get-map-info(next_id->fd)",
+		      "err %d errno %d info_len %u(%lu) memcmp %d array_value %llu(%llu)\n",
+		      err, errno, info_len, sizeof(struct bpf_map_info),
+		      memcmp(&map_info, &map_infos[i], info_len),
+		      array_value, array_magic_value);
+
+		close(map_fd);
+	}
+	CHECK(nr_id_found != nr_iters,
+	      "check total map id found by get_next_id",
+	      "nr_id_found %u(%u)\n",
+	      nr_id_found, nr_iters);
+
+done:
+	for (i = 0; i < nr_iters; i++)
+		bpf_object__close(objs[i]);
+}
+
 int main(void)
 {
 	struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
@@ -293,6 +483,7 @@ int main(void)
 	test_xdp();
 	test_l4lb();
 	test_tcp_estats();
+	test_bpf_obj_id();
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return 0;
-- 
2.9.3

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
                   ` (7 preceding siblings ...)
  2017-05-31 18:59 ` [PATCH v2 net-next 8/8] bpf: Test for bpf ID Martin KaFai Lau
@ 2017-05-31 23:35 ` David Miller
  2017-06-03  1:08   ` Martin KaFai Lau
  2017-06-01  2:22 ` Jakub Kicinski
  9 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2017-05-31 23:35 UTC (permalink / raw)
  To: kafai; +Cc: netdev, ast, daniel, kernel-team

From: Martin KaFai Lau <kafai@fb.com>
Date: Wed, 31 May 2017 11:58:54 -0700

> This patch series:
> 1) Introduce ID for both bpf_prog and bpf_map.
> 2) Add bpf commands to iterate the prog IDs and map
>    IDs of the system.
> 3) Add bpf commands to get a prog/map fd from an ID
> 4) Add bpf command to get prog/map info from a fd.
>    The prog/map info is a jump start in this patchset
>    and it is not meant to be a complete list.  They can
>    be extended in the future patches.
> 
> v2:
> Compiler warning fixes:
> - Remove lockdep_is_held() usage.  Add comment
>   to explain the lock situation instead.
> - Add static for idr related variables
> - Add __user to the uattr param in bpf_prog_get_info_by_fd()
>   and bpf_map_get_info_by_fd().

Series applied, thanks Martin.

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
                   ` (8 preceding siblings ...)
  2017-05-31 23:35 ` [PATCH v2 net-next 0/8] Introduce " David Miller
@ 2017-06-01  2:22 ` Jakub Kicinski
  2017-06-01 13:55   ` David Ahern
  9 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2017-06-01  2:22 UTC (permalink / raw)
  To: Martin KaFai Lau, Hannes Frederic Sowa
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team

On Wed, 31 May 2017 11:58:54 -0700, Martin KaFai Lau wrote:
> This patch series:
> 1) Introduce ID for both bpf_prog and bpf_map.
> 2) Add bpf commands to iterate the prog IDs and map
>    IDs of the system.
> 3) Add bpf commands to get a prog/map fd from an ID
> 4) Add bpf command to get prog/map info from a fd.
>    The prog/map info is a jump start in this patchset
>    and it is not meant to be a complete list.  They can
>    be extended in the future patches.

Hi!  IIUC there were plans at some point to store the BPF byte code in
the kernel, the way it was loaded from user space and before it
went through the verifier.  Are you still planning on implementing that?
It would be handy for offloads to have the original byte code; things
like call inlining make translation of map calls a bit challenging,
since we try to run through the verifier again...

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01  2:22 ` Jakub Kicinski
@ 2017-06-01 13:55   ` David Ahern
  2017-06-01 17:24     ` Alexei Starovoitov
  0 siblings, 1 reply; 23+ messages in thread
From: David Ahern @ 2017-06-01 13:55 UTC (permalink / raw)
  To: Jakub Kicinski, Martin KaFai Lau, Hannes Frederic Sowa
  Cc: netdev, Alexei Starovoitov, Daniel Borkmann, kernel-team

On 5/31/17 8:22 PM, Jakub Kicinski wrote:
> On Wed, 31 May 2017 11:58:54 -0700, Martin KaFai Lau wrote:
>> This patch series:
>> 1) Introduce ID for both bpf_prog and bpf_map.
>> 2) Add bpf commands to iterate the prog IDs and map
>>    IDs of the system.
>> 3) Add bpf commands to get a prog/map fd from an ID
>> 4) Add bpf command to get prog/map info from a fd.
>>    The prog/map info is a jump start in this patchset
>>    and it is not meant to be a complete list.  They can
>>    be extended in the future patches.
> 
> Hi!  IIUC there were plans at some point to store the BPF byte code in
> the kernel, the way it was loaded from the user space and before it
> went through the verifier.  Are you still planning on implementing that?
> It would be handy for offloads to have the original byte code, things
> like call inlining make translations of maps calls a bit challenging,
> since we try to run through the verifier again..
> 

Save the instructions:
https://github.com/dsahern/linux/commit/5315578db7b0886a3e8e3bef6a56e828ae2bd2c4

Let userspace read it back:
https://github.com/dsahern/linux/commit/619982064b9b582fc5d4a504feffca1730610c4e

One more in-between patch is in the branch:
https://github.com/dsahern/linux/tree/bpf/retrieve-bpf-wip

Now that the id code is in, it needs to be returned on link dumps,
route dumps, etc., and made accessible via fdinfo.  The id can be
turned into an fd and the fd read using the above patches.

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 13:55   ` David Ahern
@ 2017-06-01 17:24     ` Alexei Starovoitov
  2017-06-01 17:51       ` David Ahern
  0 siblings, 1 reply; 23+ messages in thread
From: Alexei Starovoitov @ 2017-06-01 17:24 UTC (permalink / raw)
  To: David Ahern, Jakub Kicinski, Martin KaFai Lau, Hannes Frederic Sowa
  Cc: netdev, Daniel Borkmann, kernel-team

On 6/1/17 6:55 AM, David Ahern wrote:
> On 5/31/17 8:22 PM, Jakub Kicinski wrote:
>> On Wed, 31 May 2017 11:58:54 -0700, Martin KaFai Lau wrote:
>>> This patch series:
>>> 1) Introduce ID for both bpf_prog and bpf_map.
>>> 2) Add bpf commands to iterate the prog IDs and map
>>>    IDs of the system.
>>> 3) Add bpf commands to get a prog/map fd from an ID
>>> 4) Add bpf command to get prog/map info from a fd.
>>>    The prog/map info is a jump start in this patchset
>>>    and it is not meant to be a complete list.  They can
>>>    be extended in the future patches.
>>
>> Hi!  IIUC there were plans at some point to store the BPF byte code in
>> the kernel, the way it was loaded from user space and before it went
>> through the verifier.  Are you still planning on implementing that?
>> It would be handy for offloads to have the original byte code; things
>> like call inlining make translation of map calls a bit challenging,
>> since we try to run the program through the verifier again...
>>
>
> Save the instructions:
> https://github.com/dsahern/linux/commit/5315578db7b0886a3e8e3bef6a56e828ae2bd2c4
>
> Let userspace read it back:
> https://github.com/dsahern/linux/commit/619982064b9b582fc5d4a504feffca1730610c4e
>
> One more in between in the branch:
> https://github.com/dsahern/linux/tree/bpf/retrieve-bpf-wip
>
> Now that the id code is in, it needs to be returned on link dumps,
> route dumps, etc., and made accessible via fdinfo. The id can be
> turned into an fd and the fd read using the above patches.

what problem are we trying to solve?
If we don't agree on the problem, we won't be able to agree on the solution.

It sounds to me like Jakub wants to translate lookup/update helper
calls for JITing, and that is difficult due to the inlining
optimization.  In that case, isn't the easiest solution to disable
inlining when the NFP JIT is on?

Dave's case is to be able to retrieve a cgroup_sock program used
for VRF.  Such a prog doesn't have any maps, and I don't see why
returning the real insns is not enough.
As Martin said, there are work-in-progress patches that return IDs
for programs attached to everything: the cgroup_sock hook, xdp, etc.
The user can get the prog ID, convert it to an FD and dump all insns.
Why is this not enough for VRF introspection?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 17:24     ` Alexei Starovoitov
@ 2017-06-01 17:51       ` David Ahern
  2017-06-01 18:27         ` Alexei Starovoitov
  0 siblings, 1 reply; 23+ messages in thread
From: David Ahern @ 2017-06-01 17:51 UTC (permalink / raw)
  To: Alexei Starovoitov, Jakub Kicinski, Martin KaFai Lau,
	Hannes Frederic Sowa
  Cc: netdev, Daniel Borkmann, kernel-team

On 6/1/17 11:24 AM, Alexei Starovoitov wrote:
> 
> what problem are we trying to solve?
> If we don't agree on the problem, we won't be able to agree on the
> solution.

I want the ability to retrieve the BPF instructions pushed to the
kernel. Not just for cgroups and VRF, but everywhere - cgroups, sockets,
xdp, filters, etc. I have stated this many times, including to you at
netconf in April. From that discussion and the discussion on the RFC
patch I sent in February, I thought we had an agreement that saving the
instructions was the first step.

Quentin at 6wind has expressed a similar interest for cls_bpf and
act_bpf code.

I read Jakub's email as wanting the same but for XDP.

The patches I referenced earlier save the original instructions and
enable them to be read by userspace from an fd returned by the bpf
syscall. Combine that with the ID and we have a solution.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 17:51       ` David Ahern
@ 2017-06-01 18:27         ` Alexei Starovoitov
  2017-06-01 18:33           ` David Miller
  2017-06-01 18:52           ` David Ahern
  0 siblings, 2 replies; 23+ messages in thread
From: Alexei Starovoitov @ 2017-06-01 18:27 UTC (permalink / raw)
  To: David Ahern, Jakub Kicinski, Martin KaFai Lau, Hannes Frederic Sowa
  Cc: netdev, Daniel Borkmann, kernel-team

On 6/1/17 10:51 AM, David Ahern wrote:
> On 6/1/17 11:24 AM, Alexei Starovoitov wrote:
>>
>> what problem are we trying to solve?
>> If we don't agree on the problem, we won't be able to agree on the
>> solution.
>
> I want the ability to retrieve the BPF instructions pushed to the
> kernel. Not just for cgroups and VRF, but everywhere - cgroups, sockets,
> xdp, filters, etc. I have stated this many times, including to you at
> netconf in April. From that discussion and the discussion on the RFC
> patch I sent in February, I thought we had an agreement that saving the
> instructions was the first step.
>
> Quentin at 6wind has expressed a similar interest for cls_bpf and
> act_bpf code.
>
> I read Jakub's email as wanting the same but for XDP.
>
> The patches I referenced earlier save the original instructions and
> enable them to be read by userspace from an fd returned by the bpf
> syscall. Combine that with the ID and we have a solution.

Please explain the problem instead of pushing for a 'solution'.
In this case I do not see a problem that saving the original
instructions will solve.

'I want to retrieve original instructions' is not a problem. It's a
push for a 'solution'. Explaining 'why' you want to see the original
instructions would describe the actual problem.
If you're trying to debug something, you're interested in the original
source code and in the instructions the kernel is running.
To correlate with the source code we have prog_tag (which was computed
over the original insns).
To see the kernel instructions in the interpreter and JITed assembler
we have this set of patches.
Original instructions are not useful for debugging, since they
contain map_fds which can be long gone by the time the program is
running; therefore, to solve the introspection problem we need to
look at the real kernel instructions.
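(To make the map_fd point concrete: in the loaded image a map reference
is a two-insn ldimm64 whose immediate is a file descriptor that is only
valid inside the loading process.  A sketch, with an illustrative fd
value:)

#include <linux/bpf.h>

/* src_reg = BPF_PSEUDO_MAP_FD tells BPF_PROG_LOAD that imm is an fd.
 * The verifier rewrites imm to the map pointer, so the fd value here
 * means nothing once the loading process has exited. */
static const struct bpf_insn map_ref[2] = {
	{	/* ldimm64 r1, map_fd -- first half */
		.code    = BPF_LD | BPF_DW | BPF_IMM,
		.dst_reg = BPF_REG_1,
		.src_reg = BPF_PSEUDO_MAP_FD,
		.imm     = 4,	/* illustrative fd in the loader */
	},
	{ 0 },	/* second half carries the upper 32 bits of imm */
};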

My understanding is that 6wind wants to see the original instructions
to accelerate them somehow using out-of-tree technology.
I'm not interested in helping that cause. I also suspect that
retrieving kernel instructions will not be much different from
that point of view, but at least returning kernel insns solves
the introspection problem for everyone.

Returning kernel insns also doesn't burn unnecessary memory,
whereas the original insns would sit there as dead weight.
Hence, before we add them we need to be absolutely clear that
no cheaper alternative is possible.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 18:27         ` Alexei Starovoitov
@ 2017-06-01 18:33           ` David Miller
  2017-06-01 18:57             ` Jakub Kicinski
  2017-06-01 18:52           ` David Ahern
  1 sibling, 1 reply; 23+ messages in thread
From: David Miller @ 2017-06-01 18:33 UTC (permalink / raw)
  To: ast; +Cc: dsahern, kubakici, kafai, hannes, netdev, daniel, kernel-team

From: Alexei Starovoitov <ast@fb.com>
Date: Thu, 1 Jun 2017 11:27:56 -0700

> Original instructions are not useful for debugging, since they
> contain map_fds which can be long gone by the time the program is
> running; therefore, to solve the introspection problem we need to
> look at the real kernel instructions.

The only possible legitimate use I see for the original instructions
is checkpoint/restart.

If 6wind wants this for "external acceleration" or whatever, like
Alexei I am not interested in facilitating that at all.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 18:27         ` Alexei Starovoitov
  2017-06-01 18:33           ` David Miller
@ 2017-06-01 18:52           ` David Ahern
  2017-06-01 18:56             ` David Miller
                               ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: David Ahern @ 2017-06-01 18:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Jakub Kicinski, Martin KaFai Lau,
	Hannes Frederic Sowa
  Cc: netdev, Daniel Borkmann, kernel-team

On 6/1/17 12:27 PM, Alexei Starovoitov wrote:
> 'I want to retrieve original instructions' is not a problem. It's a
> push for a 'solution'. Explaining 'why' you want to see the original
> instructions would describe the actual problem.

I have explained this.

You are creating this hyper-complex, almost completely invisible
infrastructure. You are enabling binary blobs that can bypass the
network stack and modify packets with almost no introspection into what
is happening. BPF code can come from a variety of sources -- OS vendors,
upstream repos, 3rd-party vendors (e.g., H/W vendors), and "in-house"
development. Each will swear to the end that any observed problem is not
with their code. In my experience, it falls to the OS and kernel
experts to figure out why Linux is breaking something. To do that we
need tools to look at what code is running where, tools that can
be used in production environments without disrupting the service
that the box is providing.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 18:52           ` David Ahern
@ 2017-06-01 18:56             ` David Miller
  2017-06-01 19:00             ` Jakub Kicinski
  2017-06-01 21:48             ` Alexei Starovoitov
  2 siblings, 0 replies; 23+ messages in thread
From: David Miller @ 2017-06-01 18:56 UTC (permalink / raw)
  To: dsahern; +Cc: ast, kubakici, kafai, hannes, netdev, daniel, kernel-team

From: David Ahern <dsahern@gmail.com>
Date: Thu, 1 Jun 2017 12:52:28 -0600

> On 6/1/17 12:27 PM, Alexei Starovoitov wrote:
>> 'I want to retrieve original instructions' is not a problem. It's a
>> push for a 'solution'. Explaining 'why' you want to see the original
>> instructions would describe the actual problem.
> 
> I have explained this.
> 
> You are creating this hyper-complex, almost completely invisible
> infrastructure. You are enabling binary blobs that can bypass the
> network stack and modify packets with almost no introspection into what
> is happening. BPF code can come from a variety of sources -- OS vendors,
> upstream repos, 3rd-party vendors (e.g., H/W vendors), and "in-house"
> development. Each will swear to the end that any observed problem is not
> with their code. In my experience, it falls to the OS and kernel
> experts to figure out why Linux is breaking something. To do that we
> need tools to look at what code is running where, tools that can
> be used in production environments without disrupting the service
> that the box is providing.

I'm not convinced that the verifier-processed instruction stream
cannot provide that facility properly.

In fact, because of how maps and tail calls work, you can only
figure out what is really executing by looking at the final processed
instruction stream.
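(To illustrate the tail-call point: with a prog array, which program
runs next is whatever userspace has put in the slot at that moment.  A
restricted-C sketch using the selftests-style helpers; names are
illustrative:)

#include <linux/bpf.h>
#include "bpf_helpers.h"	/* selftests' SEC()/map/helper glue */

struct bpf_map_def SEC("maps") jmp_table = {
	.type        = BPF_MAP_TYPE_PROG_ARRAY,
	.key_size    = sizeof(__u32),
	.value_size  = sizeof(__u32),
	.max_entries = 8,
};

SEC("socket")
int dispatch(struct __sk_buff *skb)
{
	/* Jumps to whatever program currently sits in slot 0; userspace
	 * can swap that slot at runtime, so the statically loaded
	 * bytecode alone never shows what really executes. */
	bpf_tail_call(skb, &jmp_table, 0);
	return 0;	/* reached only if the slot is empty */
}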

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 18:33           ` David Miller
@ 2017-06-01 18:57             ` Jakub Kicinski
  0 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2017-06-01 18:57 UTC (permalink / raw)
  To: David Miller; +Cc: ast, dsahern, kafai, hannes, netdev, daniel, kernel-team

On Thu, 01 Jun 2017 14:33:36 -0400 (EDT), David Miller wrote:
> From: Alexei Starovoitov <ast@fb.com>
> Date: Thu, 1 Jun 2017 11:27:56 -0700
> 
> > Original instructions are not useful for debugging, since they
> > contain map_fds which can be long gone by the time the program is
> > running; therefore, to solve the introspection problem we need to
> > look at the real kernel instructions.
> 
> The only possible legitimate use I see for the original instructions
> is checkpoint/restart.
> 
> If 6wind wants this for "external acceleration" or whatever, like
> Alexei I am not interested in facilitating that at all.

+1 

I will try disabling the inlining then.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 18:52           ` David Ahern
  2017-06-01 18:56             ` David Miller
@ 2017-06-01 19:00             ` Jakub Kicinski
  2017-06-01 21:48             ` Alexei Starovoitov
  2 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2017-06-01 19:00 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexei Starovoitov, Martin KaFai Lau, Hannes Frederic Sowa,
	netdev, Daniel Borkmann, kernel-team

On Thu, 1 Jun 2017 12:52:28 -0600, David Ahern wrote:
> On 6/1/17 12:27 PM, Alexei Starovoitov wrote:
> > 'I want to retrieve original instructions' is not a problem. It's a
> > push for a 'solution'. Explaining 'why' you want to see the original
> > instructions would describe the actual problem.
> 
> I have explained this.
> 
> You are creating this hyper-complex, almost completely invisible
> infrastructure. You are enabling binary blobs that can bypass the
> network stack and modify packets with almost no introspection into what
> is happening. BPF code can come from a variety of sources -- OS vendors,
> upstream repos, 3rd-party vendors (e.g., H/W vendors), and "in-house"
> development. Each will swear to the end that any observed problem is not
> with their code. In my experience, it falls to the OS and kernel
> experts to figure out why Linux is breaking something. To do that we
> need tools to look at what code is running where, tools that can
> be used in production environments without disrupting the service
> that the box is providing.

Forgive my ignorance, but is it possible to dump the code of a loaded
module out of the kernel?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 18:52           ` David Ahern
  2017-06-01 18:56             ` David Miller
  2017-06-01 19:00             ` Jakub Kicinski
@ 2017-06-01 21:48             ` Alexei Starovoitov
  2017-06-02 11:39               ` David Ahern
  2 siblings, 1 reply; 23+ messages in thread
From: Alexei Starovoitov @ 2017-06-01 21:48 UTC (permalink / raw)
  To: David Ahern, Jakub Kicinski, Martin KaFai Lau, Hannes Frederic Sowa
  Cc: netdev, Daniel Borkmann, kernel-team

On 6/1/17 11:52 AM, David Ahern wrote:
> On 6/1/17 12:27 PM, Alexei Starovoitov wrote:
>> 'I want to retrieve original instructions' is not a problem. It's a
>> push for a 'solution'. Explaining 'why' you want to see the original
>> instructions would describe the actual problem.
>
> I have explained this.
>
> You are creating this hyper-complex, almost completely invisible
> infrastructure. You are enabling binary blobs that can bypass the
> network stack and modify packets with almost no introspection into what
> is happening. BPF code can come from a variety of sources -- OS vendors,
> upstream repos, 3rd-party vendors (e.g., H/W vendors), and "in-house"
> development. Each will swear to the end that any observed problem is not
> with their code. In my experience, it falls to the OS and kernel
> experts to figure out why Linux is breaking something. To do that we
> need tools to look at what code is running where, tools that can
> be used in production environments without disrupting the service
> that the box is providing.

You saw patch 7/8, right?
I ask since I'm not following what exactly you're concerned about.
This patch set provides a way to retrieve the post-verifier,
post-blinding instruction stream, which gives users a way to know
exactly what is running. The original instruction stream is not the
one that is executed; it's merely an interface between the kernel
and user space.
Before that stage, clang/llvm did many code transformations and
optimizations on the original source code, and after it the verifier,
context rewriter, inliner, and constant blinding did transformations
on these insns as well.

If there is a bug somewhere in all these transformations, it can be
anywhere: in clang, the llvm optimizer, llvm codegen, the elf or bcc
loader, or the kernel-side transformations. Yes, there can be bugs,
but we cannot keep all these stages in the kernel. I don't mind
dumping the insn stream for debugging while doing these stages (just
like llvm's -print-before-all flag), but keeping all the intermediate
stages doesn't make sense to me. In that sense, the 'original
instruction stream' to me is just one of the intermediate stages, the
one where the C source code crossed the user->kernel boundary.
I'd rather store more information about the original C code than
this user->kernel instruction stream. That's where CTF is heading:
providing info about types, names, etc., for both progs and maps.

I don't mind storing even that 'original instruction stream' _if_
there is a solid reason. I just haven't heard one so far.
I can imagine somebody saying that there is a bug in the context
rewriter and the xlated_prog_insns are accessing the wrong field.
That's bad, but how would keeping original_prog_insns help in such a
case? And how is such a bug different from llvm generating wrong code?
If there is a bug anywhere in that transformation pipeline, I'd want
to give the original source code and the final outcome to the support
people.
In practice everything is more complex, since maps are dynamic,
tail_calls are dynamic, and the code flow changes. Debugging is not
easy, and this patch set is the first step toward better debuggability.
I'm all for it, but statements like "without original insns
it's not debuggable" are concerning, since either we don't
explain the APIs well enough or an understanding of the use case
is missing on our side.
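(For completeness, a sketch of that introspection path using the
wrappers this series adds to tools/lib/bpf and the bpf_prog_info
layout from patch 7/8; this is new uAPI, so the field names and the
wrapper signatures could still change:)

#include <linux/bpf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "bpf/bpf.h"	/* tools/lib/bpf wrappers from this series */

/* Dump the post-verifier ("xlated") opcodes of a prog, by global id. */
static int dump_xlated(__u32 prog_id)
{
	struct bpf_prog_info info = {};
	__u32 info_len = sizeof(info), insn_len, i;
	struct bpf_insn *insns;
	int fd, err;

	fd = bpf_prog_get_fd_by_id(prog_id);
	if (fd < 0)
		return fd;

	/* first call: learn how large the xlated image is */
	err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
	if (err)
		goto out;

	insn_len = info.xlated_prog_len;
	insns = calloc(1, insn_len);
	if (!insns) {
		err = -1;
		goto out;
	}

	/* second call: hand the kernel a buffer to fill */
	memset(&info, 0, sizeof(info));
	info.xlated_prog_len = insn_len;
	info.xlated_prog_insns = (__u64)(unsigned long)insns;
	info_len = sizeof(info);
	err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
	if (!err)
		for (i = 0; i < insn_len / sizeof(*insns); i++)
			printf("%3u: code 0x%02x imm %d\n",
			       i, insns[i].code, insns[i].imm);
	free(insns);
out:
	close(fd);
	return err;
}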

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-06-01 21:48             ` Alexei Starovoitov
@ 2017-06-02 11:39               ` David Ahern
  0 siblings, 0 replies; 23+ messages in thread
From: David Ahern @ 2017-06-02 11:39 UTC (permalink / raw)
  To: Alexei Starovoitov, Jakub Kicinski, Martin KaFai Lau,
	Hannes Frederic Sowa
  Cc: netdev, Daniel Borkmann, kernel-team

On 6/1/17 3:48 PM, Alexei Starovoitov wrote:
> You saw patch 7/8, right?

Apparently not close enough. Nothing in the commit message mentioned
returning instructions, so I skimmed the patch and assumed that
GET_INFO was returning just the meta-data.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2 net-next 0/8] Introduce bpf ID
  2017-05-31 23:35 ` [PATCH v2 net-next 0/8] Introduce " David Miller
@ 2017-06-03  1:08   ` Martin KaFai Lau
  0 siblings, 0 replies; 23+ messages in thread
From: Martin KaFai Lau @ 2017-06-03  1:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ast, daniel, kernel-team

On Wed, May 31, 2017 at 07:35:18PM -0400, David Miller wrote:
> From: Martin KaFai Lau <kafai@fb.com>
> Date: Wed, 31 May 2017 11:58:54 -0700
>
> > This patch series:
> > 1) Introduce ID for both bpf_prog and bpf_map.
> > 2) Add bpf commands to iterate the prog IDs and map
> >    IDs of the system.
> > 3) Add bpf commands to get a prog/map fd from an ID
> > 4) Add bpf command to get prog/map info from a fd.
> >    The prog/map info is a jump start in this patchset
> >    and it is not meant to be a complete list.  They can
> >    be extended in the future patches.
> >
> > v2:
> > Compiler warning fixes:
> > - Remove lockdep_is_held() usage.  Add comment
> >   to explain the lock situation instead.
> > - Add static for idr related variables
> > - Add __user to the uattr param in bpf_prog_get_info_by_fd()
> >   and bpf_map_get_info_by_fd().
>
> Series applied, thanks Martin.

Patch 1 may not apply cleanly due to another newly added
field in 'struct bpf_prog_aux' on May 31.

Should I respin?

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2017-06-03  1:08 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-31 18:58 [PATCH v2 net-next 0/8] Introduce bpf ID Martin KaFai Lau
2017-05-31 18:58 ` [PATCH v2 net-next 1/8] bpf: Introduce bpf_prog ID Martin KaFai Lau
2017-05-31 18:58 ` [PATCH v2 net-next 2/8] bpf: Introduce bpf_map ID Martin KaFai Lau
2017-05-31 18:58 ` [PATCH v2 net-next 3/8] bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command Martin KaFai Lau
2017-05-31 18:58 ` [PATCH v2 net-next 4/8] bpf: Add BPF_PROG_GET_FD_BY_ID Martin KaFai Lau
2017-05-31 18:58 ` [PATCH v2 net-next 5/8] bpf: Add BPF_MAP_GET_FD_BY_ID Martin KaFai Lau
2017-05-31 18:59 ` [PATCH v2 net-next 6/8] bpf: Add jited_len to struct bpf_prog Martin KaFai Lau
2017-05-31 18:59 ` [PATCH v2 net-next 7/8] bpf: Add BPF_OBJ_GET_INFO_BY_FD Martin KaFai Lau
2017-05-31 18:59 ` [PATCH v2 net-next 8/8] bpf: Test for bpf ID Martin KaFai Lau
2017-05-31 23:35 ` [PATCH v2 net-next 0/8] Introduce " David Miller
2017-06-03  1:08   ` Martin KaFai Lau
2017-06-01  2:22 ` Jakub Kicinski
2017-06-01 13:55   ` David Ahern
2017-06-01 17:24     ` Alexei Starovoitov
2017-06-01 17:51       ` David Ahern
2017-06-01 18:27         ` Alexei Starovoitov
2017-06-01 18:33           ` David Miller
2017-06-01 18:57             ` Jakub Kicinski
2017-06-01 18:52           ` David Ahern
2017-06-01 18:56             ` David Miller
2017-06-01 19:00             ` Jakub Kicinski
2017-06-01 21:48             ` Alexei Starovoitov
2017-06-02 11:39               ` David Ahern
