Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH bpf-next v10 0/4] BPF: New helper to obtain namespace data  from current task
@ 2019-09-06 15:09 Carlos Neira
  2019-09-06 15:09 ` [PATCH bpf-next v10 1/4] fs/namei.c: make available filename_lookup() for bpf helpers Carlos Neira
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Carlos Neira @ 2019-09-06 15:09 UTC (permalink / raw)
  To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf

This helper obtains the active namespace from current and returns pid, tgid,
device and namespace id as seen from that namespace, allowing to instrument
a process inside a container.
Device is read from /proc/self/ns/pid, as in the future it's possible that
different pid_ns files may belong to different devices, according
to the discussion between Eric Biederman and Yonghong in 2017 linux plumbers
conference.
Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
scripts but this helper returns the pid as seen by the root namespace which is
fine when a bcc script is not executed inside a container.
When the process of interest is inside a container, pid filtering will not work
if bpf_get_current_pid_tgid() is used. This helper addresses this limitation
returning the pid as it's seen by the current namespace where the script is
executing.

This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
used to do pid filtering even inside a container.

For example a bcc script using bpf_get_current_pid_tgid() (tools/funccount.py):

        u32 pid = bpf_get_current_pid_tgid() >> 32;
        if (pid != <pid_arg_passed_in>)
                return 0;
Could be modified to use bpf_get_current_pidns_info() as follows:

        struct bpf_pidns pidns;
        bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns));
        u32 pid = pidns.tgid;
        u32 nsid = pidns.nsid;
        if ((pid != <pid_arg_passed_in>) && (nsid != <nsid_arg_passed_in>))
                return 0;

To find out the name PID namespace id of a process, you could use this command:

$ ps -h -o pidns -p <pid_of_interest>

Or this other command:

$ ls -Li /proc/<pid_of_interest>/ns/pid

Changes from v9 :
Removed samples/bpf in favor of tools/testing/selftests/bpf
Fixed bug when bpf helper is called in an interrupt context.
Code style fixes.
Added more comments on bpf helper struct member.

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>

Carlos Neira (4):
  fs/namei.c: make available filename_lookup() for bpf helpers.
  bpf: new helper to obtain namespace data from  current task New bpf
    helper bpf_get_current_pidns_info.
  tools: Added bpf_get_current_pidns_info  helper.
  tools/testing/selftests/bpf: Add self-tests  for helper
    bpf_get_pidns_info.

 fs/internal.h                                      |   2 -
 fs/namei.c                                         |   1 -
 include/linux/bpf.h                                |   1 +
 include/linux/namei.h                              |   4 +
 include/uapi/linux/bpf.h                           |  35 ++++-
 kernel/bpf/core.c                                  |   1 +
 kernel/bpf/helpers.c                               |  86 ++++++++++++
 kernel/trace/bpf_trace.c                           |   2 +
 tools/include/uapi/linux/bpf.h                     |  35 ++++-
 tools/testing/selftests/bpf/Makefile               |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
 .../testing/selftests/bpf/progs/test_pidns_kern.c  |  52 ++++++++
 .../selftests/bpf/progs/test_pidns_nmi_kern.c      |  52 ++++++++
 tools/testing/selftests/bpf/test_pidns.c           | 146 +++++++++++++++++++++
 tools/testing/selftests/bpf/test_pidns_nmi.c       | 139 ++++++++++++++++++++
 15 files changed, 555 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_pidns.c
 create mode 100644 tools/testing/selftests/bpf/test_pidns_nmi.c

-- 
2.11.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v10 1/4] fs/namei.c: make available filename_lookup() for bpf helpers.
  2019-09-06 15:09 [PATCH bpf-next v10 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira
@ 2019-09-06 15:09 ` Carlos Neira
  2019-09-06 15:09 ` [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info Carlos Neira
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 25+ messages in thread
From: Carlos Neira @ 2019-09-06 15:09 UTC (permalink / raw)
  To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
 fs/internal.h         | 2 --
 fs/namei.c            | 1 -
 include/linux/namei.h | 4 ++++
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/internal.h b/fs/internal.h
index 315fcd8d237c..6647e15dd419 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
 /*
  * namei.c
  */
-extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
-			   struct path *path, struct path *root);
 extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
diff --git a/fs/namei.c b/fs/namei.c
index 209c51a5226c..a89fc72a4a10 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -19,7 +19,6 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
-#include <linux/fs.h>
 #include <linux/namei.h>
 #include <linux/pagemap.h>
 #include <linux/fsnotify.h>
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 9138b4471dbf..b45c8b6f7cb4 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -6,6 +6,7 @@
 #include <linux/path.h>
 #include <linux/fcntl.h>
 #include <linux/errno.h>
+#include <linux/fs.h>
 
 enum { MAX_NESTED_LINKS = 8 };
 
@@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
 
 extern void nd_jump_link(struct path *path);
 
+extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
+			   struct path *path, struct path *root);
+
 static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 {
 	((char *) name)[min(len, maxlen)] = '\0';
-- 
2.11.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from  current task New bpf helper bpf_get_current_pidns_info.
  2019-09-06 15:09 [PATCH bpf-next v10 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira
  2019-09-06 15:09 ` [PATCH bpf-next v10 1/4] fs/namei.c: make available filename_lookup() for bpf helpers Carlos Neira
@ 2019-09-06 15:09 ` Carlos Neira
  2019-09-06 15:24   ` Al Viro
  2019-09-10 22:46   ` Yonghong Song
  2019-09-06 15:09 ` [PATCH bpf-next v10 3/4] tools: Added bpf_get_current_pidns_info helper Carlos Neira
  2019-09-06 15:09 ` [PATCH bpf-next v10 4/4] tools/testing/selftests/bpf: Add self-tests for helper bpf_get_pidns_info Carlos Neira
  3 siblings, 2 replies; 25+ messages in thread
From: Carlos Neira @ 2019-09-06 15:09 UTC (permalink / raw)
  To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf

This helper(bpf_get_current_pidns_info) obtains the active namespace from
current and returns pid, tgid, device and namespace id as seen from that
namespace, allowing to instrument a process inside a container.

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
 include/linux/bpf.h      |  1 +
 include/uapi/linux/bpf.h | 35 +++++++++++++++++++-
 kernel/bpf/core.c        |  1 +
 kernel/bpf/helpers.c     | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/bpf_trace.c |  2 ++
 5 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b9d22338606..819cb1c84be0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1055,6 +1055,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
 extern const struct bpf_func_proto bpf_strtol_proto;
 extern const struct bpf_func_proto bpf_strtoul_proto;
 extern const struct bpf_func_proto bpf_tcp_sock_proto;
+extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
 
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b5889257cc33..3ec9aa1438b7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2747,6 +2747,32 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
+ *	Description
+ *		Get tgid, pid and namespace id as seen by the current namespace,
+ *		and device major/minor numbers from /proc/self/ns/pid. Such
+ *		information is stored in *pidns* of size *size*.
+ *
+ *		This helper is used when pid filtering is needed inside a
+ *		container as bpf_get_current_tgid() helper always returns the
+ *		pid id as seen by the root namespace.
+ *	Return
+ *		0 on success
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
+ *		or tgid of the current task.
+ *
+ *		**-ENOENT** if /proc/self/ns/pid does not exists.
+ *
+ *		**-ENOENT** if /proc/self/ns does not exists.
+ *
+ *		**-ENOMEM** if helper internal allocation fails.
+ *
+ *		**-EPERM** if not able to call helper.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2859,7 +2885,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(get_current_pidns_info),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3610,4 +3637,10 @@ struct bpf_sockopt {
 	__s32	retval;
 };
 
+struct bpf_pidns_info {
+	__u32 dev;	/* dev_t from /proc/self/ns/pid inode */
+	__u32 nsid;
+	__u32 tgid;
+	__u32 pid;
+};
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8191a7db2777..3159f2a0188c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
 const struct bpf_func_proto bpf_get_local_storage_proto __weak;
+const struct bpf_func_proto bpf_get_current_pidns_info __weak;
 
 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5e28718928ca..8dbe6347893c 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -11,6 +11,11 @@
 #include <linux/uidgid.h>
 #include <linux/filter.h>
 #include <linux/ctype.h>
+#include <linux/pid_namespace.h>
+#include <linux/kdev_t.h>
+#include <linux/stat.h>
+#include <linux/namei.h>
+#include <linux/version.h>
 
 #include "../../lib/kstrtox.h"
 
@@ -312,6 +317,87 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 	preempt_enable();
 }
 
+BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
+	 size)
+{
+	const char *pidns_path = "/proc/self/ns/pid";
+	struct pid_namespace *pidns = NULL;
+	struct filename *fname = NULL;
+	struct inode *inode;
+	struct path kp;
+	pid_t tgid = 0;
+	pid_t pid = 0;
+	int ret = -EINVAL;
+	int len;
+
+	if (unlikely(in_interrupt() ||
+			current->flags & (PF_KTHREAD | PF_EXITING)))
+		return -EPERM;
+
+	if (unlikely(size != sizeof(struct bpf_pidns_info)))
+		return -EINVAL;
+
+	pidns = task_active_pid_ns(current);
+	if (unlikely(!pidns))
+		return -ENOENT;
+
+	pidns_info->nsid =  pidns->ns.inum;
+	pid = task_pid_nr_ns(current, pidns);
+	if (unlikely(!pid))
+		goto clear;
+
+	tgid = task_tgid_nr_ns(current, pidns);
+	if (unlikely(!tgid))
+		goto clear;
+
+	pidns_info->tgid = (u32) tgid;
+	pidns_info->pid = (u32) pid;
+
+	fname = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
+	if (unlikely(!fname)) {
+		ret = -ENOMEM;
+		goto clear;
+	}
+	const size_t fnamesize = offsetof(struct filename, iname[1]);
+	struct filename *tmp;
+
+	tmp = kmalloc(fnamesize, GFP_ATOMIC);
+	if (unlikely(!tmp)) {
+		__putname(fname);
+		ret = -ENOMEM;
+		goto clear;
+	}
+
+	tmp->name = (char *)fname;
+	fname = tmp;
+	len = strlen(pidns_path) + 1;
+	memcpy((char *)fname->name, pidns_path, len);
+	fname->uptr = NULL;
+	fname->aname = NULL;
+	fname->refcnt = 1;
+
+	ret = filename_lookup(AT_FDCWD, fname, 0, &kp, NULL);
+	if (ret)
+		goto clear;
+
+	inode = d_backing_inode(kp.dentry);
+	pidns_info->dev = (u32)inode->i_rdev;
+
+	return 0;
+
+clear:
+	memset((void *)pidns_info, 0, (size_t) size);
+	return ret;
+}
+
+const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
+	.func		= bpf_get_current_pidns_info,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
+};
+
 #ifdef CONFIG_CGROUPS
 BPF_CALL_0(bpf_get_current_cgroup_id)
 {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1255d14576..5e1dc22765a5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 	case BPF_FUNC_send_signal:
 		return &bpf_send_signal_proto;
+	case BPF_FUNC_get_current_pidns_info:
+		return &bpf_get_current_pidns_info_proto;
 	default:
 		return NULL;
 	}
-- 
2.11.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v10 3/4] tools: Added bpf_get_current_pidns_info  helper.
  2019-09-06 15:09 [PATCH bpf-next v10 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira
  2019-09-06 15:09 ` [PATCH bpf-next v10 1/4] fs/namei.c: make available filename_lookup() for bpf helpers Carlos Neira
  2019-09-06 15:09 ` [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info Carlos Neira
@ 2019-09-06 15:09 ` Carlos Neira
  2019-09-06 15:09 ` [PATCH bpf-next v10 4/4] tools/testing/selftests/bpf: Add self-tests for helper bpf_get_pidns_info Carlos Neira
  3 siblings, 0 replies; 25+ messages in thread
From: Carlos Neira @ 2019-09-06 15:09 UTC (permalink / raw)
  To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
 tools/include/uapi/linux/bpf.h | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b5889257cc33..3ec9aa1438b7 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2747,6 +2747,32 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
+ *	Description
+ *		Get tgid, pid and namespace id as seen by the current namespace,
+ *		and device major/minor numbers from /proc/self/ns/pid. Such
+ *		information is stored in *pidns* of size *size*.
+ *
+ *		This helper is used when pid filtering is needed inside a
+ *		container as bpf_get_current_tgid() helper always returns the
+ *		pid id as seen by the root namespace.
+ *	Return
+ *		0 on success
+ *
+ *		On failure, the returned value is one of the following:
+ *
+ *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
+ *		or tgid of the current task.
+ *
+ *		**-ENOENT** if /proc/self/ns/pid does not exists.
+ *
+ *		**-ENOENT** if /proc/self/ns does not exists.
+ *
+ *		**-ENOMEM** if helper internal allocation fails.
+ *
+ *		**-EPERM** if not able to call helper.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2859,7 +2885,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(get_current_pidns_info),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3610,4 +3637,10 @@ struct bpf_sockopt {
 	__s32	retval;
 };
 
+struct bpf_pidns_info {
+	__u32 dev;	/* dev_t from /proc/self/ns/pid inode */
+	__u32 nsid;
+	__u32 tgid;
+	__u32 pid;
+};
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.11.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH bpf-next v10 4/4] tools/testing/selftests/bpf: Add self-tests  for helper bpf_get_pidns_info.
  2019-09-06 15:09 [PATCH bpf-next v10 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira
                   ` (2 preceding siblings ...)
  2019-09-06 15:09 ` [PATCH bpf-next v10 3/4] tools: Added bpf_get_current_pidns_info helper Carlos Neira
@ 2019-09-06 15:09 ` Carlos Neira
  2019-09-10 22:55   ` Yonghong Song
  3 siblings, 1 reply; 25+ messages in thread
From: Carlos Neira @ 2019-09-06 15:09 UTC (permalink / raw)
  To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf

Added 2 selftest:

bpf_get_current_pidns_info helper is called in an interrupt
context and also in a non interrupt context.

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
 tools/testing/selftests/bpf/Makefile               |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
 .../testing/selftests/bpf/progs/test_pidns_kern.c  |  52 ++++++++
 .../selftests/bpf/progs/test_pidns_nmi_kern.c      |  52 ++++++++
 tools/testing/selftests/bpf/test_pidns.c           | 146 +++++++++++++++++++++
 tools/testing/selftests/bpf/test_pidns_nmi.c       | 139 ++++++++++++++++++++
 6 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_pidns.c
 create mode 100644 tools/testing/selftests/bpf/test_pidns_nmi.c

diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 1faad0c3c3c9..8507c89141f5 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_cgroup_storage test_select_reuseport test_section_names \
 	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
 	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
-	test_sockopt_multi test_sockopt_inherit test_tcp_rtt
+	test_sockopt_multi test_sockopt_inherit test_tcp_rtt test_pidns test_pidns_nmi
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 6c4930bc6e2e..30a70ebe583a 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -313,6 +313,9 @@ static unsigned int (*bpf_set_hash)(void *ctx, __u32 hash) =
 static int (*bpf_skb_adjust_room)(void *ctx, __s32 len_diff, __u32 mode,
 				  unsigned long long flags) =
 	(void *) BPF_FUNC_skb_adjust_room;
+static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
+					 unsigned int buf_size) =
+	(void *) BPF_FUNC_get_current_pidns_info;
 
 /* Scan the ARCH passed in from ARCH env variable (see Makefile) */
 #if defined(__TARGET_ARCH_x86)
diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
new file mode 100644
index 000000000000..a4c0bde1608b
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h>
+#include <errno.h>
+#include "bpf_helpers.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, struct bpf_pidns_info);
+} nsidmap SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u32);
+} pidmap SEC(".maps");
+
+SEC("tracepoint/syscalls/sys_enter_nanosleep")
+int trace(void *ctx)
+{
+	struct bpf_pidns_info nsinfo;
+	__u32 key = 0, *expected_pid;
+	struct bpf_pidns_info  *val;
+
+	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
+		return -EINVAL;
+
+	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
+	__u64 tgid = bpf_get_current_pid_tgid();
+
+	if (!expected_pid || *expected_pid != nsinfo.pid ||
+			nsinfo.tgid != (__u32)tgid)
+		return 0;
+
+	val = bpf_map_lookup_elem(&nsidmap, &key);
+	if (val)
+		*val = nsinfo;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
diff --git a/tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
new file mode 100644
index 000000000000..7f02e4e29021
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h>
+#include <errno.h>
+#include "bpf_helpers.h"
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, struct bpf_pidns_info);
+} nsidmap SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, __u32);
+	__type(value, __u32);
+} pidmap SEC(".maps");
+
+SEC("tracepoint/net/netif_receive_skb")
+int trace(void *ctx)
+{
+	struct bpf_pidns_info nsinfo;
+	__u32 key = 0, *expected_pid;
+	struct bpf_pidns_info  *val;
+
+	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
+		return -EINVAL;
+
+	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
+	__u64 tgid = bpf_get_current_pid_tgid();
+
+	if (!expected_pid || *expected_pid != nsinfo.pid ||
+			nsinfo.tgid != (__u32)tgid)
+		return 0;
+
+	val = bpf_map_lookup_elem(&nsidmap, &key);
+	if (val)
+		*val = nsinfo;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c
new file mode 100644
index 000000000000..0c579305da53
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_pidns.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/bpf.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "cgroup_helpers.h"
+#include "bpf_rlimit.h"
+
+#define CHECK(condition, tag, format...) ({		\
+	int __ret = !!(condition);			\
+	if (__ret) {					\
+		printf("%s:FAIL:%s ", __func__, tag);	\
+		printf(format);				\
+	} else {					\
+		printf("%s:PASS:%s\n", __func__, tag);	\
+	}						\
+	__ret;						\
+})
+
+static int bpf_find_map(const char *test, struct bpf_object *obj,
+			const char *name)
+{
+	struct bpf_map *map;
+
+	map = bpf_object__find_map_by_name(obj, name);
+	if (!map)
+		return -1;
+	return bpf_map__fd(map);
+}
+
+
+int main(int argc, char **argv)
+{
+	const char *probe_name = "syscalls/sys_enter_nanosleep";
+	const char *file = "test_pidns_kern.o";
+	int err, bytes, efd, prog_fd, pmu_fd;
+	struct bpf_pidns_info knsid = {};
+	int pidmap_fd, nsidmap_fd;
+	struct perf_event_attr attr = {};
+	struct bpf_object *obj;
+	__u32 key = 0, pid;
+	int exit_code = 1;
+	struct stat st;
+	char buf[256];
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
+		goto cleanup_cgroup_env;
+
+	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
+	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  nsidmap_fd, errno))
+		goto close_prog;
+
+	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
+	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  pidmap_fd, errno))
+		goto close_prog;
+
+	pid = getpid();
+	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
+
+	snprintf(buf, sizeof(buf),
+		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
+	efd = open(buf, O_RDONLY, 0);
+	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
+		  "bytes %d errno %d\n", bytes, errno))
+		goto close_prog;
+
+	attr.config = strtol(buf, NULL, 0);
+	attr.type = PERF_TYPE_TRACEPOINT;
+	attr.sample_type = PERF_SAMPLE_RAW;
+	attr.sample_period = 1;
+	attr.wakeup_events = 1;
+
+	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
+	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
+		  errno))
+		goto close_prog;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	/* trigger some syscalls */
+	sleep(1);
+
+	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
+	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
+		goto close_pmu;
+
+	if (stat("/proc/self/ns/pid", &st))
+		goto close_pmu;
+
+	if (CHECK(knsid.nsid != (__u32) st.st_ino, "namespace id",
+		  "bpf %u user %u\n", knsid.nsid, (__u32) st.st_ino))
+		goto close_pmu;
+
+	if (CHECK(major(knsid.dev) != major(st.st_rdev), "dev major",
+		  "bpf %u user %u\n", major(knsid.dev), major(st.st_rdev)))
+		goto close_pmu;
+
+	if (CHECK(minor(knsid.dev) != minor(st.st_rdev), "dev minor",
+		  "bpf %u user %u\n", minor(knsid.dev), minor(st.st_rdev)))
+		goto close_pmu;
+
+	exit_code = 0;
+	printf("%s:PASS\n", argv[0]);
+
+close_pmu:
+	close(pmu_fd);
+close_prog:
+	bpf_object__close(obj);
+cleanup_cgroup_env:
+	return exit_code;
+}
diff --git a/tools/testing/selftests/bpf/test_pidns_nmi.c b/tools/testing/selftests/bpf/test_pidns_nmi.c
new file mode 100644
index 000000000000..bb8075bbe7d8
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_pidns_nmi.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/bpf.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "cgroup_helpers.h"
+#include "bpf_rlimit.h"
+
+#define CHECK(condition, tag, format...) ({		\
+	int __ret = !!(condition);			\
+	if (__ret) {					\
+		printf("%s:FAIL:%s ", __func__, tag);	\
+		printf(format);				\
+	} else {					\
+		printf("%s:PASS:%s\n", __func__, tag);	\
+	}						\
+	__ret;						\
+})
+
+static int bpf_find_map(const char *test, struct bpf_object *obj,
+			const char *name)
+{
+	struct bpf_map *map;
+
+	map = bpf_object__find_map_by_name(obj, name);
+	if (!map)
+		return -1;
+	return bpf_map__fd(map);
+}
+
+
+int main(int argc, char **argv)
+{
+	const char *probe_name = "net/netif_receive_skb";
+	const char *file = "test_pidns_kern.o";
+	int err, bytes, efd, prog_fd, pmu_fd;
+	struct bpf_pidns_info knsid = {};
+	int pidmap_fd, nsidmap_fd;
+	struct perf_event_attr attr = {};
+	struct bpf_object *obj;
+	__u32 key = 0, pid;
+	int exit_code = 1;
+	struct stat st;
+	char buf[256];
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
+		goto cleanup_cgroup_env;
+
+	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
+	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  nsidmap_fd, errno))
+		goto close_prog;
+
+	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
+	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  pidmap_fd, errno))
+		goto close_prog;
+
+	pid = getpid();
+	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
+
+	snprintf(buf, sizeof(buf),
+		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
+	efd = open(buf, O_RDONLY, 0);
+	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
+		  "bytes %d errno %d\n", bytes, errno))
+		goto close_prog;
+
+	attr.config = strtol(buf, NULL, 0);
+	attr.type = PERF_TYPE_TRACEPOINT;
+	attr.sample_type = PERF_SAMPLE_RAW;
+	attr.sample_period = 1;
+	attr.wakeup_events = 1;
+
+	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
+	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
+		  errno))
+		goto close_prog;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	/* trigger some syscalls */
+	sleep(1);
+
+	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
+	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
+		goto close_pmu;
+
+	if (stat("/proc/self/ns/pid", &st))
+		goto close_pmu;
+
+	if (CHECK(knsid.nsid != (__u32) st.st_ino, "namespace_id",
+				"Called in interrupt context bpf %u user %u\n",
+				knsid.nsid, (__u32) st.st_ino))
+		goto close_pmu;
+
+	exit_code = 0;
+	printf("%s:PASS\n", argv[0]);
+
+close_pmu:
+	close(pmu_fd);
+close_prog:
+	bpf_object__close(obj);
+cleanup_cgroup_env:
+	return exit_code;
+}
-- 
2.11.0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from  current task New bpf helper bpf_get_current_pidns_info.
  2019-09-06 15:09 ` [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info Carlos Neira
@ 2019-09-06 15:24   ` Al Viro
  2019-09-06 15:46     ` Al Viro
  2019-09-10 22:46   ` Yonghong Song
  1 sibling, 1 reply; 25+ messages in thread
From: Al Viro @ 2019-09-06 15:24 UTC (permalink / raw)
  To: Carlos Neira; +Cc: netdev, yhs, ebiederm, brouer, bpf

On Fri, Sep 06, 2019 at 11:09:50AM -0400, Carlos Neira wrote:

> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> +	 size)
> +{
> +	const char *pidns_path = "/proc/self/ns/pid";

> +	fname = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> +	if (unlikely(!fname)) {
> +		ret = -ENOMEM;
> +		goto clear;
> +	}
> +	const size_t fnamesize = offsetof(struct filename, iname[1]);
> +	struct filename *tmp;
> +
> +	tmp = kmalloc(fnamesize, GFP_ATOMIC);
> +	if (unlikely(!tmp)) {
> +		__putname(fname);
> +		ret = -ENOMEM;
> +		goto clear;
> +	}
> +
> +	tmp->name = (char *)fname;
> +	fname = tmp;
> +	len = strlen(pidns_path) + 1;
> +	memcpy((char *)fname->name, pidns_path, len);
> +	fname->uptr = NULL;
> +	fname->aname = NULL;
> +	fname->refcnt = 1;
> +
> +	ret = filename_lookup(AT_FDCWD, fname, 0, &kp, NULL);
> +	if (ret)
> +		goto clear;

Where do I begin?
	* getname_kernel() is there for purpose
	* so's kern_path(), damnit
> +
> +	inode = d_backing_inode(kp.dentry);
> +	pidns_info->dev = (u32)inode->i_rdev;

	* ... and this is utter bollocks - userland doesn't
have to have procfs mounted anywhere; it doesn't have to
have it mounted on /proc; it can bloody well bind a symlink
to anywhere and anythin on top of /proc/self even if its
has procfs mounted there.

	This is fundamentally wrong; nothing in the kernel
(bpf very much included) has any business assuming anything
about what's mounted where.  And while we are at it,
how deep on kernel stack can that thing be called?
Because pathname resolution can bring all kinds of interesting
crap into the game - consider e.g. NFS4 referral traversal.
And it can occur - see above about the lack of warranties
that your pathwalk will go to procfs and will remain there.

NAKed-by: Al Viro <viro@zeniv.linux.org.uk>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from  current task New bpf helper bpf_get_current_pidns_info.
  2019-09-06 15:24   ` Al Viro
@ 2019-09-06 15:46     ` Al Viro
  2019-09-06 16:00       ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2019-09-06 15:46 UTC (permalink / raw)
  To: Carlos Neira; +Cc: netdev, yhs, ebiederm, brouer, bpf

On Fri, Sep 06, 2019 at 04:24:35PM +0100, Al Viro wrote:
> > +	tmp = kmalloc(fnamesize, GFP_ATOMIC);
> > +	if (unlikely(!tmp)) {
> > +		__putname(fname);
> > +		ret = -ENOMEM;
> > +		goto clear;
> > +	}
> > +
> > +	tmp->name = (char *)fname;
> > +	fname = tmp;
> > +	len = strlen(pidns_path) + 1;
> > +	memcpy((char *)fname->name, pidns_path, len);
> > +	fname->uptr = NULL;
> > +	fname->aname = NULL;
> > +	fname->refcnt = 1;
> > +
> > +	ret = filename_lookup(AT_FDCWD, fname, 0, &kp, NULL);
> > +	if (ret)
> > +		goto clear;
> 
> Where do I begin?
> 	* getname_kernel() is there for purpose
> 	* so's kern_path(), damnit

Oh, and filename_lookup() *CAN* sleep, obviously.  So that
GFP_ATOMIC above is completely pointless.

> > +
> > +	inode = d_backing_inode(kp.dentry);
> > +	pidns_info->dev = (u32)inode->i_rdev;

Why are plaing with device number, anyway?  And why would it
be anything other than 0?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from  current task New bpf helper bpf_get_current_pidns_info.
  2019-09-06 15:46     ` Al Viro
@ 2019-09-06 16:00       ` Al Viro
  2019-09-06 23:21         ` Yonghong Song
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2019-09-06 16:00 UTC (permalink / raw)
  To: Carlos Neira; +Cc: netdev, yhs, ebiederm, brouer, bpf

On Fri, Sep 06, 2019 at 04:46:47PM +0100, Al Viro wrote:

> > Where do I begin?
> > 	* getname_kernel() is there for purpose
> > 	* so's kern_path(), damnit
> 
> Oh, and filename_lookup() *CAN* sleep, obviously.  So that
> GFP_ATOMIC above is completely pointless.
> 
> > > +
> > > +	inode = d_backing_inode(kp.dentry);
> > > +	pidns_info->dev = (u32)inode->i_rdev;

In the original variant of patchset it used to be ->i_sb->s_dev,
which is also bloody strange - you are not asking filename_lookup()
to follow symlinks, so you'd get that of whatever filesystem
/proc/self/ns resides on.

->i_rdev use makes no sense whatsoever - it's a symlink and
neither it nor its target are device nodes; ->i_rdev will be
left zero for both.

What data are you really trying to get there?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-06 16:00       ` Al Viro
@ 2019-09-06 23:21         ` Yonghong Song
  2019-09-07  0:10           ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Yonghong Song @ 2019-09-06 23:21 UTC (permalink / raw)
  To: Al Viro, Carlos Neira; +Cc: netdev, ebiederm, brouer, bpf



On 9/6/19 9:00 AM, Al Viro wrote:
> On Fri, Sep 06, 2019 at 04:46:47PM +0100, Al Viro wrote:
> 
>>> Where do I begin?
>>> 	* getname_kernel() is there for purpose
>>> 	* so's kern_path(), damnit
>>
>> Oh, and filename_lookup() *CAN* sleep, obviously.  So that
>> GFP_ATOMIC above is completely pointless.
>>
>>>> +
>>>> +	inode = d_backing_inode(kp.dentry);
>>>> +	pidns_info->dev = (u32)inode->i_rdev;
> 
> In the original variant of patchset it used to be ->i_sb->s_dev,
> which is also bloody strange - you are not asking filename_lookup()
> to follow symlinks, so you'd get that of whatever filesystem
> /proc/self/ns resides on.
> 
> ->i_rdev use makes no sense whatsoever - it's a symlink and
> neither it nor its target are device nodes; ->i_rdev will be
> left zero for both.
> 
> What data are you really trying to get there?

Let me explain a little bit background here.
The ultimate goal is for bpf program to filter over
(pid_namespace, tgid/pid inside pid_namespace)
so bpf based tools can run inside the container.

Typically, pid namespace is achieved by looking at
/proc/self/ns/pid:
-bash-4.4$ lsns
         NS TYPE   NPROCS   PID USER COMMAND
4026531835 cgroup     44  8261 yhs  /usr/lib/systemd/systemd --user
4026531836 pid        44  8261 yhs  /usr/lib/systemd/systemd --user
4026531837 user       44  8261 yhs  /usr/lib/systemd/systemd --user
4026531838 uts        44  8261 yhs  /usr/lib/systemd/systemd --user
4026531839 ipc        44  8261 yhs  /usr/lib/systemd/systemd --user
4026531840 mnt        44  8261 yhs  /usr/lib/systemd/systemd --user
4026532008 net        44  8261 yhs  /usr/lib/systemd/systemd --user
-bash-4.4$ readlink /proc/self/ns/pid
pid:[4026531836]
-bash-4.4$ stat /proc/self/ns/pid
   File: ‘/proc/self/ns/pid’ -> ‘pid:[4026531836]’
   Size: 0               Blocks: 0          IO Block: 1024   symbolic link
Device: 4h/4d   Inode: 344795989   Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (128203/     yhs)   Gid: (  100/   users)
Context: user_u:base_r:base_t
Access: 2019-09-06 16:06:09.431616380 -0700
Modify: 2019-09-06 16:06:09.431616380 -0700
Change: 2019-09-06 16:06:09.431616380 -0700
  Birth: -
-bash-4.4$

Based on a discussion with Eric Biederman back in 2019 Linux
Plumbers, Eric suggested that to uniquely identify a
namespace, device id (major/minor) number should also
be included. Although today's kernel implementation
has the same device for all namespace pseudo files,
but from uapi perspective, device id should be included.

That is the reason why we try to get device id which holds
pid namespace pseudo file.

Do you have a better suggestion on how to get
the device id for 'current' pid namespace? Or from design, we
really should not care about device id at all?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-06 23:21         ` Yonghong Song
@ 2019-09-07  0:10           ` Al Viro
  2019-09-07  6:34             ` Yonghong Song
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2019-09-07  0:10 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Carlos Neira, netdev, ebiederm, brouer, bpf

On Fri, Sep 06, 2019 at 11:21:14PM +0000, Yonghong Song wrote:

> -bash-4.4$ readlink /proc/self/ns/pid
> pid:[4026531836]
> -bash-4.4$ stat /proc/self/ns/pid
>    File: ‘/proc/self/ns/pid’ -> ‘pid:[4026531836]’
>    Size: 0               Blocks: 0          IO Block: 1024   symbolic link
> Device: 4h/4d   Inode: 344795989   Links: 1
> Access: (0777/lrwxrwxrwx)  Uid: (128203/     yhs)   Gid: (  100/   users)
> Context: user_u:base_r:base_t
> Access: 2019-09-06 16:06:09.431616380 -0700
> Modify: 2019-09-06 16:06:09.431616380 -0700
> Change: 2019-09-06 16:06:09.431616380 -0700
>   Birth: -
> -bash-4.4$
> 
> Based on a discussion with Eric Biederman back in 2019 Linux
> Plumbers, Eric suggested that to uniquely identify a
> namespace, device id (major/minor) number should also
> be included. Although today's kernel implementation
> has the same device for all namespace pseudo files,
> but from uapi perspective, device id should be included.
> 
> That is the reason why we try to get device id which holds
> pid namespace pseudo file.
> 
> Do you have a better suggestion on how to get
> the device id for 'current' pid namespace? Or from design, we
> really should not care about device id at all?

What the hell is "device id for pid namespace"?  This is the
first time I've heard about that mystery object, so it's
hard to tell where it could be found.

I can tell you what device numbers are involved in the areas
you seem to be looking in.

1) there's whatever device number that gets assigned to
(this) procfs instance.  That, ironically, _is_ per-pidns, but
that of the procfs instance, not that of your process (and
those can be different).  That's what you get in ->st_dev
when doing lstat() of anything in /proc (assuming that
procfs is mounted there, in the first place).  NOTE:
that's lstat(2), not stat(2).  stat(1) uses lstat(2),
unless given -L (in which case it's stat(2) time).  The
difference:

root@kvm1:~# stat /proc/self/ns/pid 
  File: /proc/self/ns/pid -> pid:[4026531836]
  Size: 0               Blocks: 0          IO Block: 1024   symbolic link
Device: 4h/4d   Inode: 17396       Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-09-06 19:43:11.871312319 -0400
Modify: 2019-09-06 19:43:11.871312319 -0400
Change: 2019-09-06 19:43:11.871312319 -0400
 Birth: -
root@kvm1:~# stat -L /proc/self/ns/pid 
  File: /proc/self/ns/pid
  Size: 0               Blocks: 0          IO Block: 4096   regular empty file
Device: 3h/3d   Inode: 4026531836  Links: 1
Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-09-06 19:43:15.955313293 -0400
Modify: 2019-09-06 19:43:15.955313293 -0400
Change: 2019-09-06 19:43:15.955313293 -0400
 Birth: -

The former is lstat, the latter - stat.

2) device number of the filesystem where the symlink target lives.
In this case, it's nsfs and there's only one instance on the entire
system.  _That_ would be obtained by looking at st_dev in stat(2) on
/proc/self/ns/pid (0:3 above).

3) device number *OF* the symlink.  That would be st_rdev in lstat(2).
There's none - it's a symlink, not a character or block device.  It's
always zero and always will be zero.

4) the same for the target; st_rdev in stat(2) results and again,
there's no such beast - it's neither character nor block device.

Your code is looking at (3).  Please, reread any textbook on Unix
in the section that would cover stat(2) and discussion of the
difference between st_dev and st_rdev.

I have no idea what Eric had been talking about - it's hard to
reconstruct by what you said so far.  Making nsfs per-userns,
perhaps?  But that makes no sense whatsoever, not that userns
ever had...  Cheap shots aside, I really can't guess what that's
about.  Sorry.

In any case, pathname resolution is *NOT* for the situations where
you can't block.  Even if it's procfs (and from the same pidns as
the process) mounted there, there is no promise that the target
of /proc/self has already been looked up and not evicted from
memory since then.  And in case of cache miss pathwalk will
have to call ->lookup(), which requires locking the directory
(rw_sem, shared).  You can't do that in such context.

And that doesn't even go into the possibility that process has
something very different mounted on /proc.

Again, I don't know what it is that you want to get to, but
I would strongly recommend finding a way to get to that data
that would not involve going anywhere near pathname resolution.

How would you expect the userland to work with that value,
whatever it might be?  If it's just a 32bit field that will
never be read, you might as well store there the same value
you store now (0, that is) in much cheaper and safer way ;-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-07  0:10           ` Al Viro
@ 2019-09-07  6:34             ` Yonghong Song
  2019-09-09 17:45               ` Carlos Antonio Neira Bustos
  0 siblings, 1 reply; 25+ messages in thread
From: Yonghong Song @ 2019-09-07  6:34 UTC (permalink / raw)
  To: Al Viro; +Cc: Carlos Neira, netdev, ebiederm, brouer, bpf



On 9/6/19 5:10 PM, Al Viro wrote:
> On Fri, Sep 06, 2019 at 11:21:14PM +0000, Yonghong Song wrote:
> 
>> -bash-4.4$ readlink /proc/self/ns/pid
>> pid:[4026531836]
>> -bash-4.4$ stat /proc/self/ns/pid
>>     File: ‘/proc/self/ns/pid’ -> ‘pid:[4026531836]’
>>     Size: 0               Blocks: 0          IO Block: 1024   symbolic link
>> Device: 4h/4d   Inode: 344795989   Links: 1
>> Access: (0777/lrwxrwxrwx)  Uid: (128203/     yhs)   Gid: (  100/   users)
>> Context: user_u:base_r:base_t
>> Access: 2019-09-06 16:06:09.431616380 -0700
>> Modify: 2019-09-06 16:06:09.431616380 -0700
>> Change: 2019-09-06 16:06:09.431616380 -0700
>>    Birth: -
>> -bash-4.4$
>>
>> Based on a discussion with Eric Biederman back in 2019 Linux
>> Plumbers, Eric suggested that to uniquely identify a
>> namespace, device id (major/minor) number should also
>> be included. Although today's kernel implementation
>> has the same device for all namespace pseudo files,
>> but from uapi perspective, device id should be included.
>>
>> That is the reason why we try to get device id which holds
>> pid namespace pseudo file.
>>
>> Do you have a better suggestion on how to get
>> the device id for 'current' pid namespace? Or from design, we
>> really should not care about device id at all?
> 
> What the hell is "device id for pid namespace"?  This is the
> first time I've heard about that mystery object, so it's
> hard to tell where it could be found.
> 
> I can tell you what device numbers are involved in the areas
> you seem to be looking in.
> 
> 1) there's whatever device number that gets assigned to
> (this) procfs instance.  That, ironically, _is_ per-pidns, but
> that of the procfs instance, not that of your process (and
> those can be different).  That's what you get in ->st_dev
> when doing lstat() of anything in /proc (assuming that
> procfs is mounted there, in the first place).  NOTE:
> that's lstat(2), not stat(2).  stat(1) uses lstat(2),
> unless given -L (in which case it's stat(2) time).  The
> difference:
> 
> root@kvm1:~# stat /proc/self/ns/pid
>    File: /proc/self/ns/pid -> pid:[4026531836]
>    Size: 0               Blocks: 0          IO Block: 1024   symbolic link
> Device: 4h/4d   Inode: 17396       Links: 1
> Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2019-09-06 19:43:11.871312319 -0400
> Modify: 2019-09-06 19:43:11.871312319 -0400
> Change: 2019-09-06 19:43:11.871312319 -0400
>   Birth: -
> root@kvm1:~# stat -L /proc/self/ns/pid
>    File: /proc/self/ns/pid
>    Size: 0               Blocks: 0          IO Block: 4096   regular empty file
> Device: 3h/3d   Inode: 4026531836  Links: 1
> Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2019-09-06 19:43:15.955313293 -0400
> Modify: 2019-09-06 19:43:15.955313293 -0400
> Change: 2019-09-06 19:43:15.955313293 -0400
>   Birth: -
> 
> The former is lstat, the latter - stat.
> 
> 2) device number of the filesystem where the symlink target lives.
> In this case, it's nsfs and there's only one instance on the entire
> system.  _That_ would be obtained by looking at st_dev in stat(2) on
> /proc/self/ns/pid (0:3 above).
> 
> 3) device number *OF* the symlink.  That would be st_rdev in lstat(2).
> There's none - it's a symlink, not a character or block device.  It's
> always zero and always will be zero.
> 
> 4) the same for the target; st_rdev in stat(2) results and again,
> there's no such beast - it's neither character nor block device.
> 
> Your code is looking at (3).  Please, reread any textbook on Unix
> in the section that would cover stat(2) and discussion of the
> difference between st_dev and st_rdev.
> 
> I have no idea what Eric had been talking about - it's hard to
> reconstruct by what you said so far.  Making nsfs per-userns,
> perhaps?  But that makes no sense whatsoever, not that userns
> ever had...  Cheap shots aside, I really can't guess what that's
> about.  Sorry.

Thanks for the detailed information. The device number we want
is nsfs. Indeed, currently, there is only one instance
on the entire system. But not exactly sure what is the possibility
to have more than one nsfs device in the future. Maybe per-userns
or any other criteria?

> 
> In any case, pathname resolution is *NOT* for the situations where
> you can't block.  Even if it's procfs (and from the same pidns as
> the process) mounted there, there is no promise that the target
> of /proc/self has already been looked up and not evicted from
> memory since then.  And in case of cache miss pathwalk will
> have to call ->lookup(), which requires locking the directory
> (rw_sem, shared).  You can't do that in such context.
> 
> And that doesn't even go into the possibility that process has
> something very different mounted on /proc.
> 
> Again, I don't know what it is that you want to get to, but
> I would strongly recommend finding a way to get to that data
> that would not involve going anywhere near pathname resolution.
> 
> How would you expect the userland to work with that value,
> whatever it might be?  If it's just a 32bit field that will
> never be read, you might as well store there the same value
> you store now (0, that is) in much cheaper and safer way ;-)

Suppose inside pid namespace, user can pass the device number,
say n1, (`stat -L /proc/self/ns/pid`) to bpf program (through map
or JIT). At runtime, bpf program will try to get device number,
say n2, for the 'current' process. If n1 is not the same as
n2, that means they are not in the same namespace. 'current'
is in the same pid namespace as the user iff
n1 == n2 and also pidns id is the same for 'current' and
the one with `lsns -t pid`.

Are you aware of any way to get the pidns device number
for 'current' without going through the pathname
lookup?


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-07  6:34             ` Yonghong Song
@ 2019-09-09 17:45               ` Carlos Antonio Neira Bustos
  2019-09-10 22:35                 ` Yonghong Song
  0 siblings, 1 reply; 25+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-09-09 17:45 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Al Viro, netdev, ebiederm, brouer, bpf

Thanks a lot, Al Viro and Yonghong for taking the time to review this patch and
provide technical insights needed on this one.
But how do we move this forward? 
Al Viro's review is clear that this will not work and we should strip the name 
resolution code (thanks for your detailed analysis).
As there is currently only one instance of the nsfs device on the system,  
I think we could leave out the retrieval of the pidns device number and address it
when the situation changes.
What do you think?


On Sat, Sep 07, 2019 at 06:34:39AM +0000, Yonghong Song wrote:
> 
> 
> On 9/6/19 5:10 PM, Al Viro wrote:
> > On Fri, Sep 06, 2019 at 11:21:14PM +0000, Yonghong Song wrote:
> > 
> >> -bash-4.4$ readlink /proc/self/ns/pid
> >> pid:[4026531836]
> >> -bash-4.4$ stat /proc/self/ns/pid
> >>     File: ‘/proc/self/ns/pid’ -> ‘pid:[4026531836]’
> >>     Size: 0               Blocks: 0          IO Block: 1024   symbolic link
> >> Device: 4h/4d   Inode: 344795989   Links: 1
> >> Access: (0777/lrwxrwxrwx)  Uid: (128203/     yhs)   Gid: (  100/   users)
> >> Context: user_u:base_r:base_t
> >> Access: 2019-09-06 16:06:09.431616380 -0700
> >> Modify: 2019-09-06 16:06:09.431616380 -0700
> >> Change: 2019-09-06 16:06:09.431616380 -0700
> >>    Birth: -
> >> -bash-4.4$
> >>
> >> Based on a discussion with Eric Biederman back in 2019 Linux
> >> Plumbers, Eric suggested that to uniquely identify a
> >> namespace, device id (major/minor) number should also
> >> be included. Although today's kernel implementation
> >> has the same device for all namespace pseudo files,
> >> but from uapi perspective, device id should be included.
> >>
> >> That is the reason why we try to get device id which holds
> >> pid namespace pseudo file.
> >>
> >> Do you have a better suggestion on how to get
> >> the device id for 'current' pid namespace? Or from design, we
> >> really should not care about device id at all?
> > 
> > What the hell is "device id for pid namespace"?  This is the
> > first time I've heard about that mystery object, so it's
> > hard to tell where it could be found.
> > 
> > I can tell you what device numbers are involved in the areas
> > you seem to be looking in.
> > 
> > 1) there's whatever device number that gets assigned to
> > (this) procfs instance.  That, ironically, _is_ per-pidns, but
> > that of the procfs instance, not that of your process (and
> > those can be different).  That's what you get in ->st_dev
> > when doing lstat() of anything in /proc (assuming that
> > procfs is mounted there, in the first place).  NOTE:
> > that's lstat(2), not stat(2).  stat(1) uses lstat(2),
> > unless given -L (in which case it's stat(2) time).  The
> > difference:
> > 
> > root@kvm1:~# stat /proc/self/ns/pid
> >    File: /proc/self/ns/pid -> pid:[4026531836]
> >    Size: 0               Blocks: 0          IO Block: 1024   symbolic link
> > Device: 4h/4d   Inode: 17396       Links: 1
> > Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
> > Access: 2019-09-06 19:43:11.871312319 -0400
> > Modify: 2019-09-06 19:43:11.871312319 -0400
> > Change: 2019-09-06 19:43:11.871312319 -0400
> >   Birth: -
> > root@kvm1:~# stat -L /proc/self/ns/pid
> >    File: /proc/self/ns/pid
> >    Size: 0               Blocks: 0          IO Block: 4096   regular empty file
> > Device: 3h/3d   Inode: 4026531836  Links: 1
> > Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > Access: 2019-09-06 19:43:15.955313293 -0400
> > Modify: 2019-09-06 19:43:15.955313293 -0400
> > Change: 2019-09-06 19:43:15.955313293 -0400
> >   Birth: -
> > 
> > The former is lstat, the latter - stat.
> > 
> > 2) device number of the filesystem where the symlink target lives.
> > In this case, it's nsfs and there's only one instance on the entire
> > system.  _That_ would be obtained by looking at st_dev in stat(2) on
> > /proc/self/ns/pid (0:3 above).
> > 
> > 3) device number *OF* the symlink.  That would be st_rdev in lstat(2).
> > There's none - it's a symlink, not a character or block device.  It's
> > always zero and always will be zero.
> > 
> > 4) the same for the target; st_rdev in stat(2) results and again,
> > there's no such beast - it's neither character nor block device.
> > 
> > Your code is looking at (3).  Please, reread any textbook on Unix
> > in the section that would cover stat(2) and discussion of the
> > difference between st_dev and st_rdev.
> > 
> > I have no idea what Eric had been talking about - it's hard to
> > reconstruct by what you said so far.  Making nsfs per-userns,
> > perhaps?  But that makes no sense whatsoever, not that userns
> > ever had...  Cheap shots aside, I really can't guess what that's
> > about.  Sorry.
> 
> Thanks for the detailed information. The device number we want
> is nsfs. Indeed, currently, there is only one instance
> on the entire system. But not exactly sure what is the possibility
> to have more than one nsfs device in the future. Maybe per-userns
> or any other criteria?
> 
> > 
> > In any case, pathname resolution is *NOT* for the situations where
> > you can't block.  Even if it's procfs (and from the same pidns as
> > the process) mounted there, there is no promise that the target
> > of /proc/self has already been looked up and not evicted from
> > memory since then.  And in case of cache miss pathwalk will
> > have to call ->lookup(), which requires locking the directory
> > (rw_sem, shared).  You can't do that in such context.
> > 
> > And that doesn't even go into the possibility that process has
> > something very different mounted on /proc.
> > 
> > Again, I don't know what it is that you want to get to, but
> > I would strongly recommend finding a way to get to that data
> > that would not involve going anywhere near pathname resolution.
> > 
> > How would you expect the userland to work with that value,
> > whatever it might be?  If it's just a 32bit field that will
> > never be read, you might as well store there the same value
> > you store now (0, that is) in much cheaper and safer way ;-)
> 
> Suppose inside pid namespace, user can pass the device number,
> say n1, (`stat -L /proc/self/ns/pid`) to bpf program (through map
> or JIT). At runtime, bpf program will try to get device number,
> say n2, for the 'current' process. If n1 is not the same as
> n2, that means they are not in the same namespace. 'current'
> is in the same pid namespace as the user iff
> n1 == n2 and also pidns id is the same for 'current' and
> the one with `lsns -t pid`.
> 
> Are you aware of any way to get the pidns device number
> for 'current' without going through the pathname
> lookup?
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-09 17:45               ` Carlos Antonio Neira Bustos
@ 2019-09-10 22:35                 ` Yonghong Song
  2019-09-10 23:15                   ` Al Viro
  2019-09-11  4:32                   ` Carlos Antonio Neira Bustos
  0 siblings, 2 replies; 25+ messages in thread
From: Yonghong Song @ 2019-09-10 22:35 UTC (permalink / raw)
  To: Carlos Antonio Neira Bustos; +Cc: Al Viro, netdev, ebiederm, brouer, bpf


Carlos,

Discussed with Eric today for what is the best way to get
the device number for a namespace. The following patch seems
a reasonable start although Eric would like to see
how the helper is used in order to decide whether the
interface looks right.

commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
Author: Yonghong Song <yhs@fb.com>
Date:   Mon Sep 9 21:50:51 2019 -0700

     nsfs: add an interface function ns_get_inum_dev()

     This patch added an interface function
     ns_get_inum_dev(). Given a ns_common structure,
     the function returns the inode and device
     numbers. The function will be used later
     by a newly added bpf helper.

     Signed-off-by: Yonghong Song <yhs@fb.com>

diff --git a/fs/nsfs.c b/fs/nsfs.c
index a0431642c6b5..a603c6fc3f54 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
         return ERR_PTR(-EINVAL);
  }

+/* Get the device number for the current task pidns.
+ */
+void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
+{
+       *inum = ns->inum;
+       *dev = nsfs_mnt->mnt_sb->s_dev;
+}
+
  static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry)
  {
         struct inode *inode = d_inode(dentry);
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index d31cb6215905..b8fc680cdf1a 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -81,6 +81,7 @@ extern void *ns_get_path(struct path *path, struct 
task_struct *task,
  typedef struct ns_common *ns_get_path_helper_t(void *);
  extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t 
ns_get_cb,
                             void *private_data);
+extern void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev);

  extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
                         const struct proc_ns_operations *ns_ops);

Could you put the above change and patch #1 and then have
all your other patches? In your kernel change, please use
interface function ns_get_inum_dev() to get pidns inode number
and dev number.

On 9/9/19 6:45 PM, Carlos Antonio Neira Bustos wrote:
> Thanks a lot, Al Viro and Yonghong for taking the time to review this patch and
> provide technical insights needed on this one.
> But how do we move this forward?
> Al Viro's review is clear that this will not work and we should strip the name
> resolution code (thanks for your detailed analysis).
> As there is currently only one instance of the nsfs device on the system,
> I think we could leave out the retrieval of the pidns device number and address it
> when the situation changes.
> What do you think?
> 
> 
> On Sat, Sep 07, 2019 at 06:34:39AM +0000, Yonghong Song wrote:
>>
>>
>> On 9/6/19 5:10 PM, Al Viro wrote:
>>> On Fri, Sep 06, 2019 at 11:21:14PM +0000, Yonghong Song wrote:
>>>
>>>> -bash-4.4$ readlink /proc/self/ns/pid
>>>> pid:[4026531836]
>>>> -bash-4.4$ stat /proc/self/ns/pid
>>>>      File: ‘/proc/self/ns/pid’ -> ‘pid:[4026531836]’
>>>>      Size: 0               Blocks: 0          IO Block: 1024   symbolic link
>>>> Device: 4h/4d   Inode: 344795989   Links: 1
>>>> Access: (0777/lrwxrwxrwx)  Uid: (128203/     yhs)   Gid: (  100/   users)
>>>> Context: user_u:base_r:base_t
>>>> Access: 2019-09-06 16:06:09.431616380 -0700
>>>> Modify: 2019-09-06 16:06:09.431616380 -0700
>>>> Change: 2019-09-06 16:06:09.431616380 -0700
>>>>     Birth: -
>>>> -bash-4.4$
>>>>
>>>> Based on a discussion with Eric Biederman back in 2019 Linux
>>>> Plumbers, Eric suggested that to uniquely identify a
>>>> namespace, device id (major/minor) number should also
>>>> be included. Although today's kernel implementation
>>>> has the same device for all namespace pseudo files,
>>>> but from uapi perspective, device id should be included.
>>>>
>>>> That is the reason why we try to get device id which holds
>>>> pid namespace pseudo file.
>>>>
>>>> Do you have a better suggestion on how to get
>>>> the device id for 'current' pid namespace? Or from design, we
>>>> really should not care about device id at all?
>>>
>>> What the hell is "device id for pid namespace"?  This is the
>>> first time I've heard about that mystery object, so it's
>>> hard to tell where it could be found.
>>>
>>> I can tell you what device numbers are involved in the areas
>>> you seem to be looking in.
>>>
>>> 1) there's whatever device number that gets assigned to
>>> (this) procfs instance.  That, ironically, _is_ per-pidns, but
>>> that of the procfs instance, not that of your process (and
>>> those can be different).  That's what you get in ->st_dev
>>> when doing lstat() of anything in /proc (assuming that
>>> procfs is mounted there, in the first place).  NOTE:
>>> that's lstat(2), not stat(2).  stat(1) uses lstat(2),
>>> unless given -L (in which case it's stat(2) time).  The
>>> difference:
>>>
>>> root@kvm1:~# stat /proc/self/ns/pid
>>>     File: /proc/self/ns/pid -> pid:[4026531836]
>>>     Size: 0               Blocks: 0          IO Block: 1024   symbolic link
>>> Device: 4h/4d   Inode: 17396       Links: 1
>>> Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
>>> Access: 2019-09-06 19:43:11.871312319 -0400
>>> Modify: 2019-09-06 19:43:11.871312319 -0400
>>> Change: 2019-09-06 19:43:11.871312319 -0400
>>>    Birth: -
>>> root@kvm1:~# stat -L /proc/self/ns/pid
>>>     File: /proc/self/ns/pid
>>>     Size: 0               Blocks: 0          IO Block: 4096   regular empty file
>>> Device: 3h/3d   Inode: 4026531836  Links: 1
>>> Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
>>> Access: 2019-09-06 19:43:15.955313293 -0400
>>> Modify: 2019-09-06 19:43:15.955313293 -0400
>>> Change: 2019-09-06 19:43:15.955313293 -0400
>>>    Birth: -
>>>
>>> The former is lstat, the latter - stat.
>>>
>>> 2) device number of the filesystem where the symlink target lives.
>>> In this case, it's nsfs and there's only one instance on the entire
>>> system.  _That_ would be obtained by looking at st_dev in stat(2) on
>>> /proc/self/ns/pid (0:3 above).
>>>
>>> 3) device number *OF* the symlink.  That would be st_rdev in lstat(2).
>>> There's none - it's a symlink, not a character or block device.  It's
>>> always zero and always will be zero.
>>>
>>> 4) the same for the target; st_rdev in stat(2) results and again,
>>> there's no such beast - it's neither character nor block device.
>>>
>>> Your code is looking at (3).  Please, reread any textbook on Unix
>>> in the section that would cover stat(2) and discussion of the
>>> difference between st_dev and st_rdev.
>>>
>>> I have no idea what Eric had been talking about - it's hard to
>>> reconstruct by what you said so far.  Making nsfs per-userns,
>>> perhaps?  But that makes no sense whatsoever, not that userns
>>> ever had...  Cheap shots aside, I really can't guess what that's
>>> about.  Sorry.
>>
>> Thanks for the detailed information. The device number we want
>> is nsfs. Indeed, currently, there is only one instance
>> on the entire system. But not exactly sure what is the possibility
>> to have more than one nsfs device in the future. Maybe per-userns
>> or any other criteria?
>>
>>>
>>> In any case, pathname resolution is *NOT* for the situations where
>>> you can't block.  Even if it's procfs (and from the same pidns as
>>> the process) mounted there, there is no promise that the target
>>> of /proc/self has already been looked up and not evicted from
>>> memory since then.  And in case of cache miss pathwalk will
>>> have to call ->lookup(), which requires locking the directory
>>> (rw_sem, shared).  You can't do that in such context.
>>>
>>> And that doesn't even go into the possibility that process has
>>> something very different mounted on /proc.
>>>
>>> Again, I don't know what it is that you want to get to, but
>>> I would strongly recommend finding a way to get to that data
>>> that would not involve going anywhere near pathname resolution.
>>>
>>> How would you expect the userland to work with that value,
>>> whatever it might be?  If it's just a 32bit field that will
>>> never be read, you might as well store there the same value
>>> you store now (0, that is) in much cheaper and safer way ;-)
>>
>> Suppose inside pid namespace, user can pass the device number,
>> say n1, (`stat -L /proc/self/ns/pid`) to bpf program (through map
>> or JIT). At runtime, bpf program will try to get device number,
>> say n2, for the 'current' process. If n1 is not the same as
>> n2, that means they are not in the same namespace. 'current'
>> is in the same pid namespace as the user iff
>> n1 == n2 and also pidns id is the same for 'current' and
>> the one with `lsns -t pid`.
>>
>> Are you aware of any way to get the pidns device number
>> for 'current' without going through the pathname
>> lookup?
>>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-06 15:09 ` [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info Carlos Neira
  2019-09-06 15:24   ` Al Viro
@ 2019-09-10 22:46   ` Yonghong Song
  2019-09-11  4:33     ` Carlos Antonio Neira Bustos
  1 sibling, 1 reply; 25+ messages in thread
From: Yonghong Song @ 2019-09-10 22:46 UTC (permalink / raw)
  To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf



On 9/6/19 4:09 PM, Carlos Neira wrote:
> This helper(bpf_get_current_pidns_info) obtains the active namespace from
> current and returns pid, tgid, device and namespace id as seen from that
> namespace, allowing to instrument a process inside a container.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   include/linux/bpf.h      |  1 +
>   include/uapi/linux/bpf.h | 35 +++++++++++++++++++-
>   kernel/bpf/core.c        |  1 +
>   kernel/bpf/helpers.c     | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
>   kernel/trace/bpf_trace.c |  2 ++
>   5 files changed, 124 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5b9d22338606..819cb1c84be0 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1055,6 +1055,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>   extern const struct bpf_func_proto bpf_strtol_proto;
>   extern const struct bpf_func_proto bpf_strtoul_proto;
>   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>   
>   /* Shared helpers among cBPF and eBPF. */
>   void bpf_user_rnd_init_once(void);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b5889257cc33..3ec9aa1438b7 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2747,6 +2747,32 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + *	Description
> + *		Get tgid, pid and namespace id as seen by the current namespace,
> + *		and device major/minor numbers from /proc/self/ns/pid. Such
> + *		information is stored in *pidns* of size *size*.
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper always returns the
> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		On failure, the returned value is one of the following:
> + *
> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + *		or tgid of the current task.
> + *
> + *		**-ENOENT** if /proc/self/ns/pid does not exists.
> + *
> + *		**-ENOENT** if /proc/self/ns does not exists.
> + *
> + *		**-ENOMEM** if helper internal allocation fails.

-ENOMEM can be removed.

> + *
> + *		**-EPERM** if not able to call helper.
> + *
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2859,7 +2885,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3610,4 +3637,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;	/* dev_t from /proc/self/ns/pid inode */

    /* dev_t of pid namespace pseudo file (typically /proc/seelf/ns/pid) 
after following symbolic link */

> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..3159f2a0188c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>   
>   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>   {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..8dbe6347893c 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,11 @@
>   #include <linux/uidgid.h>
>   #include <linux/filter.h>
>   #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/kdev_t.h>
> +#include <linux/stat.h>
> +#include <linux/namei.h>
> +#include <linux/version.h>
>   
>   #include "../../lib/kstrtox.h"
>   
> @@ -312,6 +317,87 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>   	preempt_enable();
>   }
>   
> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> +	 size)
> +{
> +	const char *pidns_path = "/proc/self/ns/pid";
> +	struct pid_namespace *pidns = NULL;
> +	struct filename *fname = NULL;
> +	struct inode *inode;
> +	struct path kp;
> +	pid_t tgid = 0;
> +	pid_t pid = 0;
> +	int ret = -EINVAL;
> +	int len;
> +
> +	if (unlikely(in_interrupt() ||
> +			current->flags & (PF_KTHREAD | PF_EXITING)))
> +		return -EPERM;
> +
> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> +		return -EINVAL;
> +
> +	pidns = task_active_pid_ns(current);
> +	if (unlikely(!pidns))
> +		return -ENOENT;
> +
> +	pidns_info->nsid =  pidns->ns.inum;
> +	pid = task_pid_nr_ns(current, pidns);
> +	if (unlikely(!pid))
> +		goto clear;
> +
> +	tgid = task_tgid_nr_ns(current, pidns);
> +	if (unlikely(!tgid))
> +		goto clear;
> +
> +	pidns_info->tgid = (u32) tgid;
> +	pidns_info->pid = (u32) pid;
> +
[...]
> +	fname = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> +	if (unlikely(!fname)) {
> +		ret = -ENOMEM;
> +		goto clear;
> +	}
> +	const size_t fnamesize = offsetof(struct filename, iname[1]);
> +	struct filename *tmp;
> +
> +	tmp = kmalloc(fnamesize, GFP_ATOMIC);
> +	if (unlikely(!tmp)) {
> +		__putname(fname);
> +		ret = -ENOMEM;
> +		goto clear;
> +	}
> +
> +	tmp->name = (char *)fname;
> +	fname = tmp;
> +	len = strlen(pidns_path) + 1;
> +	memcpy((char *)fname->name, pidns_path, len);
> +	fname->uptr = NULL;
> +	fname->aname = NULL;
> +	fname->refcnt = 1;
> +
> +	ret = filename_lookup(AT_FDCWD, fname, 0, &kp, NULL);
> +	if (ret)
> +		goto clear;
> +
> +	inode = d_backing_inode(kp.dentry);
> +	pidns_info->dev = (u32)inode->i_rdev;
The above can bee replaced with new nsfs interface function
ns_get_inum_dev().
> +
> +	return 0;
> +
> +clear:
> +	memset((void *)pidns_info, 0, (size_t) size);
> +	return ret;
> +}
> +
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> +	.func		= bpf_get_current_pidns_info,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_CONST_SIZE,
> +};
> +
>   #ifdef CONFIG_CGROUPS
>   BPF_CALL_0(bpf_get_current_cgroup_id)
>   {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..5e1dc22765a5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   #endif
>   	case BPF_FUNC_send_signal:
>   		return &bpf_send_signal_proto;
> +	case BPF_FUNC_get_current_pidns_info:
> +		return &bpf_get_current_pidns_info_proto;
>   	default:
>   		return NULL;
>   	}
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 4/4] tools/testing/selftests/bpf: Add self-tests for helper bpf_get_pidns_info.
  2019-09-06 15:09 ` [PATCH bpf-next v10 4/4] tools/testing/selftests/bpf: Add self-tests for helper bpf_get_pidns_info Carlos Neira
@ 2019-09-10 22:55   ` Yonghong Song
  0 siblings, 0 replies; 25+ messages in thread
From: Yonghong Song @ 2019-09-10 22:55 UTC (permalink / raw)
  To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf



On 9/6/19 4:09 PM, Carlos Neira wrote:
> Added 2 selftest:
> 
> bpf_get_current_pidns_info helper is called in an interrupt
> context and also in a non interrupt context.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   tools/testing/selftests/bpf/Makefile               |   2 +-
>   tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
>   .../testing/selftests/bpf/progs/test_pidns_kern.c  |  52 ++++++++
>   .../selftests/bpf/progs/test_pidns_nmi_kern.c      |  52 ++++++++
>   tools/testing/selftests/bpf/test_pidns.c           | 146 +++++++++++++++++++++
>   tools/testing/selftests/bpf/test_pidns_nmi.c       | 139 ++++++++++++++++++++
>   6 files changed, 393 insertions(+), 1 deletion(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
>   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
>   create mode 100644 tools/testing/selftests/bpf/test_pidns.c
>   create mode 100644 tools/testing/selftests/bpf/test_pidns_nmi.c
> 
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 1faad0c3c3c9..8507c89141f5 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
>   	test_cgroup_storage test_select_reuseport test_section_names \
>   	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
>   	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> -	test_sockopt_multi test_sockopt_inherit test_tcp_rtt
> +	test_sockopt_multi test_sockopt_inherit test_tcp_rtt test_pidns test_pidns_nmi
>   
>   BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
>   TEST_GEN_FILES = $(BPF_OBJ_FILES)
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 6c4930bc6e2e..30a70ebe583a 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -313,6 +313,9 @@ static unsigned int (*bpf_set_hash)(void *ctx, __u32 hash) =
>   static int (*bpf_skb_adjust_room)(void *ctx, __s32 len_diff, __u32 mode,
>   				  unsigned long long flags) =
>   	(void *) BPF_FUNC_skb_adjust_room;
> +static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
> +					 unsigned int buf_size) =
> +	(void *) BPF_FUNC_get_current_pidns_info;
>   
>   /* Scan the ARCH passed in from ARCH env variable (see Makefile) */
>   #if defined(__TARGET_ARCH_x86)
> diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> new file mode 100644
> index 000000000000..a4c0bde1608b
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <linux/bpf.h>
> +#include <errno.h>
> +#include "bpf_helpers.h"
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, struct bpf_pidns_info);
> +} nsidmap SEC(".maps");
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, __u32);
> +} pidmap SEC(".maps");
> +
> +SEC("tracepoint/syscalls/sys_enter_nanosleep")
> +int trace(void *ctx)
> +{
> +	struct bpf_pidns_info nsinfo;
> +	__u32 key = 0, *expected_pid;
> +	struct bpf_pidns_info  *val;
> +
> +	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
> +		return -EINVAL;
> +
> +	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
> +	__u64 tgid = bpf_get_current_pid_tgid();
> +
> +	if (!expected_pid || *expected_pid != nsinfo.pid ||
> +			nsinfo.tgid != (__u32)tgid)
> +		return 0;

The logic in the above is not right.
bpf_get_current_pidns_info() retrieved
   dev, nsid, tgid and pid.

You should create a map, populate dev/nsid you
got from the user space. And then update maybe
another map with tgid/pid and passed back to
user space for verification.

> +
> +	val = bpf_map_lookup_elem(&nsidmap, &key);
> +	if (val)
> +		*val = nsinfo;
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;
> diff --git a/tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
> new file mode 100644
> index 000000000000..7f02e4e29021
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_pidns_nmi_kern.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <linux/bpf.h>
> +#include <errno.h>
> +#include "bpf_helpers.h"
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, struct bpf_pidns_info);
> +} nsidmap SEC(".maps");
> +
> +struct {
> +	__uint(type, BPF_MAP_TYPE_ARRAY);
> +	__uint(max_entries, 1);
> +	__type(key, __u32);
> +	__type(value, __u32);
> +} pidmap SEC(".maps");
> +
> +SEC("tracepoint/net/netif_receive_skb")
> +int trace(void *ctx)
> +{
> +	struct bpf_pidns_info nsinfo;
> +	__u32 key = 0, *expected_pid;
> +	struct bpf_pidns_info  *val;
> +
> +	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
> +		return -EINVAL;
> +
> +	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
> +	__u64 tgid = bpf_get_current_pid_tgid();
> +
> +	if (!expected_pid || *expected_pid != nsinfo.pid ||
> +			nsinfo.tgid != (__u32)tgid)
> +		return 0;
> +
> +	val = bpf_map_lookup_elem(&nsidmap, &key);
> +	if (val)
> +		*val = nsinfo;
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;

With the new nsfs interface function(), the file test_pidns_nmi_kern.c
is not needed any more.

Please also remove the in_interrupt() checking in the kernel helper 
implementation.

> diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c

Please restructure and put this file under 
tools/testing/selftests/bpf/prog_tests directory.

> new file mode 100644
> index 000000000000..0c579305da53
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_pidns.c
> @@ -0,0 +1,146 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "cgroup_helpers.h"
> +#include "bpf_rlimit.h"
> +
> +#define CHECK(condition, tag, format...) ({		\
> +	int __ret = !!(condition);			\
> +	if (__ret) {					\
> +		printf("%s:FAIL:%s ", __func__, tag);	\
> +		printf(format);				\
> +	} else {					\
> +		printf("%s:PASS:%s\n", __func__, tag);	\
> +	}						\
> +	__ret;						\
> +})
> +
> +static int bpf_find_map(const char *test, struct bpf_object *obj,
> +			const char *name)
> +{
> +	struct bpf_map *map;
> +
> +	map = bpf_object__find_map_by_name(obj, name);
> +	if (!map)
> +		return -1;
> +	return bpf_map__fd(map);
> +}
> +
> +
> +int main(int argc, char **argv)
> +{
> +	const char *probe_name = "syscalls/sys_enter_nanosleep";
> +	const char *file = "test_pidns_kern.o";
> +	int err, bytes, efd, prog_fd, pmu_fd;
> +	struct bpf_pidns_info knsid = {};
> +	int pidmap_fd, nsidmap_fd;
> +	struct perf_event_attr attr = {};
> +	struct bpf_object *obj;
> +	__u32 key = 0, pid;
> +	int exit_code = 1;
> +	struct stat st;
> +	char buf[256];
> +
> +	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
> +	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
> +		goto cleanup_cgroup_env;
> +
> +	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
> +	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  nsidmap_fd, errno))
> +		goto close_prog;
> +
> +	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
> +	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  pidmap_fd, errno))
> +		goto close_prog;
> +
> +	pid = getpid();
> +	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +
> +	snprintf(buf, sizeof(buf),
> +		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
> +	efd = open(buf, O_RDONLY, 0);
> +	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
> +		goto close_prog;
> +	bytes = read(efd, buf, sizeof(buf));
> +	close(efd);
> +	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
> +		  "bytes %d errno %d\n", bytes, errno))
> +		goto close_prog;
> +
> +	attr.config = strtol(buf, NULL, 0);
> +	attr.type = PERF_TYPE_TRACEPOINT;
> +	attr.sample_type = PERF_SAMPLE_RAW;
> +	attr.sample_period = 1;
> +	attr.wakeup_events = 1;
> +
> +	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
> +	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> +		  errno))
> +		goto close_prog;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	/* trigger some syscalls */
> +	sleep(1);
> +
> +	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
> +	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
> +		goto close_pmu;
> +
> +	if (stat("/proc/self/ns/pid", &st))
> +		goto close_pmu;
> +
> +	if (CHECK(knsid.nsid != (__u32) st.st_ino, "namespace id",
> +		  "bpf %u user %u\n", knsid.nsid, (__u32) st.st_ino))
> +		goto close_pmu;
> +
> +	if (CHECK(major(knsid.dev) != major(st.st_rdev), "dev major",
> +		  "bpf %u user %u\n", major(knsid.dev), major(st.st_rdev)))
> +		goto close_pmu;
> +
> +	if (CHECK(minor(knsid.dev) != minor(st.st_rdev), "dev minor",
> +		  "bpf %u user %u\n", minor(knsid.dev), minor(st.st_rdev)))
> +		goto close_pmu;
> +
> +	exit_code = 0;
> +	printf("%s:PASS\n", argv[0]);
> +
> +close_pmu:
> +	close(pmu_fd);
> +close_prog:
> +	bpf_object__close(obj);
> +cleanup_cgroup_env:
> +	return exit_code;
> +}
> diff --git a/tools/testing/selftests/bpf/test_pidns_nmi.c b/tools/testing/selftests/bpf/test_pidns_nmi.c
> new file mode 100644
> index 000000000000..bb8075bbe7d8
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_pidns_nmi.c
> @@ -0,0 +1,139 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "cgroup_helpers.h"
> +#include "bpf_rlimit.h"
> +
> +#define CHECK(condition, tag, format...) ({		\
> +	int __ret = !!(condition);			\
> +	if (__ret) {					\
> +		printf("%s:FAIL:%s ", __func__, tag);	\
> +		printf(format);				\
> +	} else {					\
> +		printf("%s:PASS:%s\n", __func__, tag);	\
> +	}						\
> +	__ret;						\
> +})
> +
> +static int bpf_find_map(const char *test, struct bpf_object *obj,
> +			const char *name)
> +{
> +	struct bpf_map *map;
> +
> +	map = bpf_object__find_map_by_name(obj, name);
> +	if (!map)
> +		return -1;
> +	return bpf_map__fd(map);
> +}
> +
> +
> +int main(int argc, char **argv)
> +{
> +	const char *probe_name = "net/netif_receive_skb";
> +	const char *file = "test_pidns_kern.o";
> +	int err, bytes, efd, prog_fd, pmu_fd;
> +	struct bpf_pidns_info knsid = {};
> +	int pidmap_fd, nsidmap_fd;
> +	struct perf_event_attr attr = {};
> +	struct bpf_object *obj;
> +	__u32 key = 0, pid;
> +	int exit_code = 1;
> +	struct stat st;
> +	char buf[256];
> +
> +	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
> +	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
> +		goto cleanup_cgroup_env;
> +
> +	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
> +	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  nsidmap_fd, errno))
> +		goto close_prog;
> +
> +	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
> +	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  pidmap_fd, errno))
> +		goto close_prog;
> +
> +	pid = getpid();
> +	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +
> +	snprintf(buf, sizeof(buf),
> +		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
> +	efd = open(buf, O_RDONLY, 0);
> +	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
> +		goto close_prog;
> +	bytes = read(efd, buf, sizeof(buf));
> +	close(efd);
> +	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
> +		  "bytes %d errno %d\n", bytes, errno))
> +		goto close_prog;
> +
> +	attr.config = strtol(buf, NULL, 0);
> +	attr.type = PERF_TYPE_TRACEPOINT;
> +	attr.sample_type = PERF_SAMPLE_RAW;
> +	attr.sample_period = 1;
> +	attr.wakeup_events = 1;
> +
> +	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
> +	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> +		  errno))
> +		goto close_prog;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	/* trigger some syscalls */
> +	sleep(1);
> +
> +	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
> +	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
> +		goto close_pmu;
> +
> +	if (stat("/proc/self/ns/pid", &st))
> +		goto close_pmu;
> +
> +	if (CHECK(knsid.nsid != (__u32) st.st_ino, "namespace_id",
> +				"Called in interrupt context bpf %u user %u\n",
> +				knsid.nsid, (__u32) st.st_ino))
> +		goto close_pmu;
> +
> +	exit_code = 0;
> +	printf("%s:PASS\n", argv[0]);
> +
> +close_pmu:
> +	close(pmu_fd);
> +close_prog:
> +	bpf_object__close(obj);
> +cleanup_cgroup_env:
> +	return exit_code;
> +}

the file test_pidns_nmi.c is not needed any more.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-10 22:35                 ` Yonghong Song
@ 2019-09-10 23:15                   ` Al Viro
  2019-09-11  8:16                     ` ebiederm
  2019-09-11  4:32                   ` Carlos Antonio Neira Bustos
  1 sibling, 1 reply; 25+ messages in thread
From: Al Viro @ 2019-09-10 23:15 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Carlos Antonio Neira Bustos, netdev, ebiederm, brouer, bpf

On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
> 
> Carlos,
> 
> Discussed with Eric today for what is the best way to get
> the device number for a namespace. The following patch seems
> a reasonable start although Eric would like to see
> how the helper is used in order to decide whether the
> interface looks right.
> 
> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
> Author: Yonghong Song <yhs@fb.com>
> Date:   Mon Sep 9 21:50:51 2019 -0700
> 
>      nsfs: add an interface function ns_get_inum_dev()
> 
>      This patch added an interface function
>      ns_get_inum_dev(). Given a ns_common structure,
>      the function returns the inode and device
>      numbers. The function will be used later
>      by a newly added bpf helper.
> 
>      Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index a0431642c6b5..a603c6fc3f54 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>          return ERR_PTR(-EINVAL);
>   }
> 
> +/* Get the device number for the current task pidns.
> + */
> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
> +{
> +       *inum = ns->inum;
> +       *dev = nsfs_mnt->mnt_sb->s_dev;
> +}

Umm...  Where would it get the device number once we get (hell knows
what for) multiple nsfs instances?  I still don't understand what
would that be about, TBH...  Is it really per-userns?  Or something
else entirely?  Eric, could you give some context?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-10 22:35                 ` Yonghong Song
  2019-09-10 23:15                   ` Al Viro
@ 2019-09-11  4:32                   ` Carlos Antonio Neira Bustos
  2019-09-11  8:17                     ` ebiederm
  1 sibling, 1 reply; 25+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-09-11  4:32 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Al Viro, netdev, ebiederm, brouer, bpf

On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
Thanks a lot Yonghong.
I'll include this patch when submitting changes for version 11 of
this patch. 
> 
> Carlos,
> 
> Discussed with Eric today for what is the best way to get
> the device number for a namespace. The following patch seems
> a reasonable start although Eric would like to see
> how the helper is used in order to decide whether the
> interface looks right.
> 
> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
> Author: Yonghong Song <yhs@fb.com>
> Date:   Mon Sep 9 21:50:51 2019 -0700
> 
>      nsfs: add an interface function ns_get_inum_dev()
> 
>      This patch added an interface function
>      ns_get_inum_dev(). Given a ns_common structure,
>      the function returns the inode and device
>      numbers. The function will be used later
>      by a newly added bpf helper.
> 
>      Signed-off-by: Yonghong Song <yhs@fb.com>
> 
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index a0431642c6b5..a603c6fc3f54 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>          return ERR_PTR(-EINVAL);
>   }
> 
> +/* Get the device number for the current task pidns.
> + */
> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
> +{
> +       *inum = ns->inum;
> +       *dev = nsfs_mnt->mnt_sb->s_dev;
> +}
> +
>   static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry)
>   {
>          struct inode *inode = d_inode(dentry);
> diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
> index d31cb6215905..b8fc680cdf1a 100644
> --- a/include/linux/proc_ns.h
> +++ b/include/linux/proc_ns.h
> @@ -81,6 +81,7 @@ extern void *ns_get_path(struct path *path, struct 
> task_struct *task,
>   typedef struct ns_common *ns_get_path_helper_t(void *);
>   extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t 
> ns_get_cb,
>                              void *private_data);
> +extern void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev);
> 
>   extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
>                          const struct proc_ns_operations *ns_ops);
> 
> Could you put the above change and patch #1 and then have
> all your other patches? In your kernel change, please use
> interface function ns_get_inum_dev() to get pidns inode number
> and dev number.
> 
> On 9/9/19 6:45 PM, Carlos Antonio Neira Bustos wrote:
> > Thanks a lot, Al Viro and Yonghong for taking the time to review this patch and
> > provide technical insights needed on this one.
> > But how do we move this forward?
> > Al Viro's review is clear that this will not work and we should strip the name
> > resolution code (thanks for your detailed analysis).
> > As there is currently only one instance of the nsfs device on the system,
> > I think we could leave out the retrieval of the pidns device number and address it
> > when the situation changes.
> > What do you think?
> > 
> > 
> > On Sat, Sep 07, 2019 at 06:34:39AM +0000, Yonghong Song wrote:
> >>
> >>
> >> On 9/6/19 5:10 PM, Al Viro wrote:
> >>> On Fri, Sep 06, 2019 at 11:21:14PM +0000, Yonghong Song wrote:
> >>>
> >>>> -bash-4.4$ readlink /proc/self/ns/pid
> >>>> pid:[4026531836]
> >>>> -bash-4.4$ stat /proc/self/ns/pid
> >>>>      File: ‘/proc/self/ns/pid’ -> ‘pid:[4026531836]’
> >>>>      Size: 0               Blocks: 0          IO Block: 1024   symbolic link
> >>>> Device: 4h/4d   Inode: 344795989   Links: 1
> >>>> Access: (0777/lrwxrwxrwx)  Uid: (128203/     yhs)   Gid: (  100/   users)
> >>>> Context: user_u:base_r:base_t
> >>>> Access: 2019-09-06 16:06:09.431616380 -0700
> >>>> Modify: 2019-09-06 16:06:09.431616380 -0700
> >>>> Change: 2019-09-06 16:06:09.431616380 -0700
> >>>>     Birth: -
> >>>> -bash-4.4$
> >>>>
> >>>> Based on a discussion with Eric Biederman back in 2019 Linux
> >>>> Plumbers, Eric suggested that to uniquely identify a
> >>>> namespace, device id (major/minor) number should also
> >>>> be included. Although today's kernel implementation
> >>>> has the same device for all namespace pseudo files,
> >>>> but from uapi perspective, device id should be included.
> >>>>
> >>>> That is the reason why we try to get device id which holds
> >>>> pid namespace pseudo file.
> >>>>
> >>>> Do you have a better suggestion on how to get
> >>>> the device id for 'current' pid namespace? Or from design, we
> >>>> really should not care about device id at all?
> >>>
> >>> What the hell is "device id for pid namespace"?  This is the
> >>> first time I've heard about that mystery object, so it's
> >>> hard to tell where it could be found.
> >>>
> >>> I can tell you what device numbers are involved in the areas
> >>> you seem to be looking in.
> >>>
> >>> 1) there's whatever device number that gets assigned to
> >>> (this) procfs instance.  That, ironically, _is_ per-pidns, but
> >>> that of the procfs instance, not that of your process (and
> >>> those can be different).  That's what you get in ->st_dev
> >>> when doing lstat() of anything in /proc (assuming that
> >>> procfs is mounted there, in the first place).  NOTE:
> >>> that's lstat(2), not stat(2).  stat(1) uses lstat(2),
> >>> unless given -L (in which case it's stat(2) time).  The
> >>> difference:
> >>>
> >>> root@kvm1:~# stat /proc/self/ns/pid
> >>>     File: /proc/self/ns/pid -> pid:[4026531836]
> >>>     Size: 0               Blocks: 0          IO Block: 1024   symbolic link
> >>> Device: 4h/4d   Inode: 17396       Links: 1
> >>> Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
> >>> Access: 2019-09-06 19:43:11.871312319 -0400
> >>> Modify: 2019-09-06 19:43:11.871312319 -0400
> >>> Change: 2019-09-06 19:43:11.871312319 -0400
> >>>    Birth: -
> >>> root@kvm1:~# stat -L /proc/self/ns/pid
> >>>     File: /proc/self/ns/pid
> >>>     Size: 0               Blocks: 0          IO Block: 4096   regular empty file
> >>> Device: 3h/3d   Inode: 4026531836  Links: 1
> >>> Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> >>> Access: 2019-09-06 19:43:15.955313293 -0400
> >>> Modify: 2019-09-06 19:43:15.955313293 -0400
> >>> Change: 2019-09-06 19:43:15.955313293 -0400
> >>>    Birth: -
> >>>
> >>> The former is lstat, the latter - stat.
> >>>
> >>> 2) device number of the filesystem where the symlink target lives.
> >>> In this case, it's nsfs and there's only one instance on the entire
> >>> system.  _That_ would be obtained by looking at st_dev in stat(2) on
> >>> /proc/self/ns/pid (0:3 above).
> >>>
> >>> 3) device number *OF* the symlink.  That would be st_rdev in lstat(2).
> >>> There's none - it's a symlink, not a character or block device.  It's
> >>> always zero and always will be zero.
> >>>
> >>> 4) the same for the target; st_rdev in stat(2) results and again,
> >>> there's no such beast - it's neither character nor block device.
> >>>
> >>> Your code is looking at (3).  Please, reread any textbook on Unix
> >>> in the section that would cover stat(2) and discussion of the
> >>> difference between st_dev and st_rdev.
> >>>
> >>> I have no idea what Eric had been talking about - it's hard to
> >>> reconstruct by what you said so far.  Making nsfs per-userns,
> >>> perhaps?  But that makes no sense whatsoever, not that userns
> >>> ever had...  Cheap shots aside, I really can't guess what that's
> >>> about.  Sorry.
> >>
> >> Thanks for the detailed information. The device number we want
> >> is nsfs. Indeed, currently, there is only one instance
> >> on the entire system. But not exactly sure what is the possibility
> >> to have more than one nsfs device in the future. Maybe per-userns
> >> or any other criteria?
> >>
> >>>
> >>> In any case, pathname resolution is *NOT* for the situations where
> >>> you can't block.  Even if it's procfs (and from the same pidns as
> >>> the process) mounted there, there is no promise that the target
> >>> of /proc/self has already been looked up and not evicted from
> >>> memory since then.  And in case of cache miss pathwalk will
> >>> have to call ->lookup(), which requires locking the directory
> >>> (rw_sem, shared).  You can't do that in such context.
> >>>
> >>> And that doesn't even go into the possibility that process has
> >>> something very different mounted on /proc.
> >>>
> >>> Again, I don't know what it is that you want to get to, but
> >>> I would strongly recommend finding a way to get to that data
> >>> that would not involve going anywhere near pathname resolution.
> >>>
> >>> How would you expect the userland to work with that value,
> >>> whatever it might be?  If it's just a 32bit field that will
> >>> never be read, you might as well store there the same value
> >>> you store now (0, that is) in much cheaper and safer way ;-)
> >>
> >> Suppose inside pid namespace, user can pass the device number,
> >> say n1, (`stat -L /proc/self/ns/pid`) to bpf program (through map
> >> or JIT). At runtime, bpf program will try to get device number,
> >> say n2, for the 'current' process. If n1 is not the same as
> >> n2, that means they are not in the same namespace. 'current'
> >> is in the same pid namespace as the user iff
> >> n1 == n2 and also pidns id is the same for 'current' and
> >> the one with `lsns -t pid`.
> >>
> >> Are you aware of any way to get the pidns device number
> >> for 'current' without going through the pathname
> >> lookup?
> >>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-10 22:46   ` Yonghong Song
@ 2019-09-11  4:33     ` Carlos Antonio Neira Bustos
  0 siblings, 0 replies; 25+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-09-11  4:33 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, ebiederm, brouer, bpf

Thanks for reviewing the rest of the code, 
I'll include these changes in ver 11.

Bests 
On Tue, Sep 10, 2019 at 10:46:45PM +0000, Yonghong Song wrote:
> 
> 
> On 9/6/19 4:09 PM, Carlos Neira wrote:
> > This helper(bpf_get_current_pidns_info) obtains the active namespace from
> > current and returns pid, tgid, device and namespace id as seen from that
> > namespace, allowing to instrument a process inside a container.
> > 
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > ---
> >   include/linux/bpf.h      |  1 +
> >   include/uapi/linux/bpf.h | 35 +++++++++++++++++++-
> >   kernel/bpf/core.c        |  1 +
> >   kernel/bpf/helpers.c     | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   kernel/trace/bpf_trace.c |  2 ++
> >   5 files changed, 124 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 5b9d22338606..819cb1c84be0 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1055,6 +1055,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> >   extern const struct bpf_func_proto bpf_strtol_proto;
> >   extern const struct bpf_func_proto bpf_strtoul_proto;
> >   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> >   
> >   /* Shared helpers among cBPF and eBPF. */
> >   void bpf_user_rnd_init_once(void);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index b5889257cc33..3ec9aa1438b7 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2747,6 +2747,32 @@ union bpf_attr {
> >    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> >    *
> >    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> > + *	Description
> > + *		Get tgid, pid and namespace id as seen by the current namespace,
> > + *		and device major/minor numbers from /proc/self/ns/pid. Such
> > + *		information is stored in *pidns* of size *size*.
> > + *
> > + *		This helper is used when pid filtering is needed inside a
> > + *		container as bpf_get_current_tgid() helper always returns the
> > + *		pid id as seen by the root namespace.
> > + *	Return
> > + *		0 on success
> > + *
> > + *		On failure, the returned value is one of the following:
> > + *
> > + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + *		or tgid of the current task.
> > + *
> > + *		**-ENOENT** if /proc/self/ns/pid does not exists.
> > + *
> > + *		**-ENOENT** if /proc/self/ns does not exists.
> > + *
> > + *		**-ENOMEM** if helper internal allocation fails.
> 
> -ENOMEM can be removed.
> 
> > + *
> > + *		**-EPERM** if not able to call helper.
> > + *
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)		\
> >   	FN(unspec),			\
> > @@ -2859,7 +2885,8 @@ union bpf_attr {
> >   	FN(sk_storage_get),		\
> >   	FN(sk_storage_delete),		\
> >   	FN(send_signal),		\
> > -	FN(tcp_gen_syncookie),
> > +	FN(tcp_gen_syncookie),		\
> > +	FN(get_current_pidns_info),
> >   
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >    * function eBPF program intends to call
> > @@ -3610,4 +3637,10 @@ struct bpf_sockopt {
> >   	__s32	retval;
> >   };
> >   
> > +struct bpf_pidns_info {
> > +	__u32 dev;	/* dev_t from /proc/self/ns/pid inode */
> 
>     /* dev_t of pid namespace pseudo file (typically /proc/seelf/ns/pid) 
> after following symbolic link */
> 
> > +	__u32 nsid;
> > +	__u32 tgid;
> > +	__u32 pid;
> > +};
> >   #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8191a7db2777..3159f2a0188c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> >   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> >   
> >   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> >   {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5e28718928ca..8dbe6347893c 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -11,6 +11,11 @@
> >   #include <linux/uidgid.h>
> >   #include <linux/filter.h>
> >   #include <linux/ctype.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/kdev_t.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> >   
> >   #include "../../lib/kstrtox.h"
> >   
> > @@ -312,6 +317,87 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> >   	preempt_enable();
> >   }
> >   
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > +	 size)
> > +{
> > +	const char *pidns_path = "/proc/self/ns/pid";
> > +	struct pid_namespace *pidns = NULL;
> > +	struct filename *fname = NULL;
> > +	struct inode *inode;
> > +	struct path kp;
> > +	pid_t tgid = 0;
> > +	pid_t pid = 0;
> > +	int ret = -EINVAL;
> > +	int len;
> > +
> > +	if (unlikely(in_interrupt() ||
> > +			current->flags & (PF_KTHREAD | PF_EXITING)))
> > +		return -EPERM;
> > +
> > +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > +		return -EINVAL;
> > +
> > +	pidns = task_active_pid_ns(current);
> > +	if (unlikely(!pidns))
> > +		return -ENOENT;
> > +
> > +	pidns_info->nsid =  pidns->ns.inum;
> > +	pid = task_pid_nr_ns(current, pidns);
> > +	if (unlikely(!pid))
> > +		goto clear;
> > +
> > +	tgid = task_tgid_nr_ns(current, pidns);
> > +	if (unlikely(!tgid))
> > +		goto clear;
> > +
> > +	pidns_info->tgid = (u32) tgid;
> > +	pidns_info->pid = (u32) pid;
> > +
> [...]
> > +	fname = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> > +	if (unlikely(!fname)) {
> > +		ret = -ENOMEM;
> > +		goto clear;
> > +	}
> > +	const size_t fnamesize = offsetof(struct filename, iname[1]);
> > +	struct filename *tmp;
> > +
> > +	tmp = kmalloc(fnamesize, GFP_ATOMIC);
> > +	if (unlikely(!tmp)) {
> > +		__putname(fname);
> > +		ret = -ENOMEM;
> > +		goto clear;
> > +	}
> > +
> > +	tmp->name = (char *)fname;
> > +	fname = tmp;
> > +	len = strlen(pidns_path) + 1;
> > +	memcpy((char *)fname->name, pidns_path, len);
> > +	fname->uptr = NULL;
> > +	fname->aname = NULL;
> > +	fname->refcnt = 1;
> > +
> > +	ret = filename_lookup(AT_FDCWD, fname, 0, &kp, NULL);
> > +	if (ret)
> > +		goto clear;
> > +
> > +	inode = d_backing_inode(kp.dentry);
> > +	pidns_info->dev = (u32)inode->i_rdev;
> The above can bee replaced with new nsfs interface function
> ns_get_inum_dev().
> > +
> > +	return 0;
> > +
> > +clear:
> > +	memset((void *)pidns_info, 0, (size_t) size);
> > +	return ret;
> > +}
> > +
> > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> > +	.func		= bpf_get_current_pidns_info,
> > +	.gpl_only	= false,
> > +	.ret_type	= RET_INTEGER,
> > +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> > +	.arg2_type	= ARG_CONST_SIZE,
> > +};
> > +
> >   #ifdef CONFIG_CGROUPS
> >   BPF_CALL_0(bpf_get_current_cgroup_id)
> >   {
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ca1255d14576..5e1dc22765a5 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >   #endif
> >   	case BPF_FUNC_send_signal:
> >   		return &bpf_send_signal_proto;
> > +	case BPF_FUNC_get_current_pidns_info:
> > +		return &bpf_get_current_pidns_info_proto;
> >   	default:
> >   		return NULL;
> >   	}
> > 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-10 23:15                   ` Al Viro
@ 2019-09-11  8:16                     ` ebiederm
  2019-09-12  5:49                       ` Yonghong Song
  0 siblings, 1 reply; 25+ messages in thread
From: ebiederm @ 2019-09-11  8:16 UTC (permalink / raw)
  To: Al Viro
  Cc: Yonghong Song, Carlos Antonio Neira Bustos, netdev\, brouer\, bpf\

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
>> 
>> Carlos,
>> 
>> Discussed with Eric today for what is the best way to get
>> the device number for a namespace. The following patch seems
>> a reasonable start although Eric would like to see
>> how the helper is used in order to decide whether the
>> interface looks right.
>> 
>> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
>> Author: Yonghong Song <yhs@fb.com>
>> Date:   Mon Sep 9 21:50:51 2019 -0700
>> 
>>      nsfs: add an interface function ns_get_inum_dev()
>> 
>>      This patch added an interface function
>>      ns_get_inum_dev(). Given a ns_common structure,
>>      the function returns the inode and device
>>      numbers. The function will be used later
>>      by a newly added bpf helper.
>> 
>>      Signed-off-by: Yonghong Song <yhs@fb.com>
>> 
>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>> index a0431642c6b5..a603c6fc3f54 100644
>> --- a/fs/nsfs.c
>> +++ b/fs/nsfs.c
>> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>>          return ERR_PTR(-EINVAL);
>>   }
>> 
>> +/* Get the device number for the current task pidns.
>> + */
>> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
>> +{
>> +       *inum = ns->inum;
>> +       *dev = nsfs_mnt->mnt_sb->s_dev;
>> +}
>
> Umm...  Where would it get the device number once we get (hell knows
> what for) multiple nsfs instances?  I still don't understand what
> would that be about, TBH...  Is it really per-userns?  Or something
> else entirely?  Eric, could you give some context?

My goal is not to paint things into a corner, with future changes.
Right now it is possible to stat a namespace file descriptor and
get a device and inode number.  Then compare that. 

I don't want people using the inode number in nsfd as some magic
namespace id.

We have had times in the past where there was more than one superblock
and thus more than one device number.  Further if userspace ever uses
this heavily there may be times in the future where for
checkpoint/restart purposes we will want multiple nsfd's so we can
preserve the inode number accross a migration.

Realistically there will probably just some kind of hotplug notification
to userspace to say we have hotplugged your operatining system as
a migration notification.

Now the halway discussion did not quite capture everything I was trying
to say but it at least got to the right ballpark.

The helper in fs/nsfs.c should be:

bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
{
        return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
}

That way if/when there are multiple inodes identifying the same
namespace the bpf programs don't need to change.

Up farther in the stack it should be something like:

> BPF_CALL_2(bpf_current_pidns_match, dev_t *dev, ino_t *ino)
> {
>         return ns_match(&task_active_pid_ns(current)->ns, *dev, *ino);
> }
> 
> const struct bpf_func_proto bpf_current_pidns_match_proto = {
> 	.func		= bpf_current_pins_match,
> 	.gpl_only	= true,
> 	.ret_type	= RET_INTEGER
> 	.arg1_type	= ARG_PTR_TO_DEVICE_NUMBER,
> 	.arg2_type	= ARG_PTR_TO_INODE_NUMBER,
> };

That allows comparing what the bpf came up with with whatever value
userspace generated by stating the file descriptor.


That is the least bad suggestion I currently have for that
functionality.  It really would be better to not have that filter in the
bpf program itself but in the infrastructure that binds a program to a
set of tasks.

The problem with this approach is whatever device/inode you have when
the namespace they refer to exits there is the possibility that the
inode will be reused.  So your filter will eventually start matching on
the wrong thing.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-11  4:32                   ` Carlos Antonio Neira Bustos
@ 2019-09-11  8:17                     ` ebiederm
  0 siblings, 0 replies; 25+ messages in thread
From: ebiederm @ 2019-09-11  8:17 UTC (permalink / raw)
  To: Carlos Antonio Neira Bustos
  Cc: Yonghong Song, Al Viro, netdev\, brouer\, bpf\

Carlos Antonio Neira Bustos <cneirabustos@gmail.com> writes:

> On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
> Thanks a lot Yonghong.
> I'll include this patch when submitting changes for version 11 of
> this patch.

Please see my reply to Al.

Eric



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-11  8:16                     ` ebiederm
@ 2019-09-12  5:49                       ` Yonghong Song
       [not found]                         ` <CACiB22j9M2gmccnh7XqqFp8g7qKFuiOrSAVJiA2tQHLB0pmoSQ@mail.gmail.com>
  2019-09-13 16:59                         ` ebiederm
  0 siblings, 2 replies; 25+ messages in thread
From: Yonghong Song @ 2019-09-12  5:49 UTC (permalink / raw)
  To: Eric W. Biederman, Al Viro
  Cc: Carlos Antonio Neira Bustos, netdev, brouer, bpf



On 9/11/19 9:16 AM, Eric W. Biederman wrote:
> Al Viro <viro@zeniv.linux.org.uk> writes:
> 
>> On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
>>>
>>> Carlos,
>>>
>>> Discussed with Eric today for what is the best way to get
>>> the device number for a namespace. The following patch seems
>>> a reasonable start although Eric would like to see
>>> how the helper is used in order to decide whether the
>>> interface looks right.
>>>
>>> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
>>> Author: Yonghong Song <yhs@fb.com>
>>> Date:   Mon Sep 9 21:50:51 2019 -0700
>>>
>>>       nsfs: add an interface function ns_get_inum_dev()
>>>
>>>       This patch added an interface function
>>>       ns_get_inum_dev(). Given a ns_common structure,
>>>       the function returns the inode and device
>>>       numbers. The function will be used later
>>>       by a newly added bpf helper.
>>>
>>>       Signed-off-by: Yonghong Song <yhs@fb.com>
>>>
>>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>>> index a0431642c6b5..a603c6fc3f54 100644
>>> --- a/fs/nsfs.c
>>> +++ b/fs/nsfs.c
>>> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>>>           return ERR_PTR(-EINVAL);
>>>    }
>>>
>>> +/* Get the device number for the current task pidns.
>>> + */
>>> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
>>> +{
>>> +       *inum = ns->inum;
>>> +       *dev = nsfs_mnt->mnt_sb->s_dev;
>>> +}
>>
>> Umm...  Where would it get the device number once we get (hell knows
>> what for) multiple nsfs instances?  I still don't understand what
>> would that be about, TBH...  Is it really per-userns?  Or something
>> else entirely?  Eric, could you give some context?
> 
> My goal is not to paint things into a corner, with future changes.
> Right now it is possible to stat a namespace file descriptor and
> get a device and inode number.  Then compare that.
> 
> I don't want people using the inode number in nsfd as some magic
> namespace id.
> 
> We have had times in the past where there was more than one superblock
> and thus more than one device number.  Further if userspace ever uses
> this heavily there may be times in the future where for
> checkpoint/restart purposes we will want multiple nsfd's so we can
> preserve the inode number accross a migration.
> 
> Realistically there will probably just some kind of hotplug notification
> to userspace to say we have hotplugged your operatining system as
> a migration notification.
> 
> Now the halway discussion did not quite capture everything I was trying
> to say but it at least got to the right ballpark.
> 
> The helper in fs/nsfs.c should be:
> 
> bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
> {
>          return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
> }
> 
> That way if/when there are multiple inodes identifying the same
> namespace the bpf programs don't need to change.

Thanks, Eric. This is indeed better. The bpf helper should focus
on comparing dev/ino, instead of return the dev/ino to bpf program.

So overall, nsfs related change will look like:

diff --git a/fs/nsfs.c b/fs/nsfs.c
index a0431642c6b5..7e78d89c2172 100644
--- a/fs/nsfs.c
+++ b/fs/nsfs.c
@@ -245,6 +245,11 @@ struct file *proc_ns_fget(int fd)
         return ERR_PTR(-EINVAL);
  }

+bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
+{
+       return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
+}
+
  static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry)
  {
         struct inode *inode = d_inode(dentry);
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index d31cb6215905..79639807e960 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -81,6 +81,7 @@ extern void *ns_get_path(struct path *path, struct 
task_struct *task,
  typedef struct ns_common *ns_get_path_helper_t(void *);
  extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t 
ns_get_cb,
                             void *private_data);
+extern bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino);

  extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
                         const struct proc_ns_operations *ns_ops);

> 
> Up farther in the stack it should be something like:
> 
>> BPF_CALL_2(bpf_current_pidns_match, dev_t *dev, ino_t *ino)
>> {
>>          return ns_match(&task_active_pid_ns(current)->ns, *dev, *ino);
>> }
>>
>> const struct bpf_func_proto bpf_current_pidns_match_proto = {
>> 	.func		= bpf_current_pins_match,
>> 	.gpl_only	= true,
>> 	.ret_type	= RET_INTEGER
>> 	.arg1_type	= ARG_PTR_TO_DEVICE_NUMBER,
>> 	.arg2_type	= ARG_PTR_TO_INODE_NUMBER,
>> };
> 
> That allows comparing what the bpf came up with with whatever value
> userspace generated by stating the file descriptor.
> 
> 
> That is the least bad suggestion I currently have for that
> functionality.  It really would be better to not have that filter in the
> bpf program itself but in the infrastructure that binds a program to a
> set of tasks.
> 
> The problem with this approach is whatever device/inode you have when
> the namespace they refer to exits there is the possibility that the
> inode will be reused.  So your filter will eventually start matching on
> the wrong thing.

I come up with a differeent helper definition, which is much more
similar to existing bpf_get_current_pid_tgid() and helper definition
much more conforms to bpf convention.

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5e28718928ca..bc26903c80c7 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -11,6 +11,8 @@
  #include <linux/uidgid.h>
  #include <linux/filter.h>
  #include <linux/ctype.h>
+#include <linux/pid_namespace.h>
+#include <linux/proc_ns.h>

  #include "../../lib/kstrtox.h"

@@ -487,3 +489,33 @@ const struct bpf_func_proto bpf_strtoul_proto = {
         .arg4_type      = ARG_PTR_TO_LONG,
  };
  #endif
+
+BPF_CALL_2(bpf_get_ns_current_pid_tgid, u32, dev, u32, inum)
+{
+       struct task_struct *task = current;
+       struct pid_namespace *pidns;
+       pid_t pid, tgid;
+
+       if (unlikely(!task))
+               return -EINVAL;
+
+
+       pidns = task_active_pid_ns(task);
+       if (unlikely(!pidns))
+               return -ENOENT;
+
+       if (!ns_match(&pidns->ns, (dev_t)dev, inum))
+               return -EINVAL;
+
+       pid = task_pid_nr_ns(task, pidns);
+       tgid = task_tgid_nr_ns(task, pidns);
+
+       return (u64) tgid << 32 | pid;
+}
+
+const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto = {
+       .func           = bpf_get_ns_current_pid_tgid,
+       .gpl_only       = false,
+       .ret_type       = RET_INTEGER,
+       .arg1_type      = ARG_ANYTHING,
+       .arg2_type      = ARG_ANYTHING,
+};

Existing usage of bpf_get_current_pid_tgid() can be converted
to bpf_get_ns_current_pid_tgid() if ns dev/inode number
is supplied. For bpf_get_ns_current_pid_tgid(), checking
return value ( < 0 or not) is needed.

> 
> Eric
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
       [not found]                         ` <CACiB22j9M2gmccnh7XqqFp8g7qKFuiOrSAVJiA2tQHLB0pmoSQ@mail.gmail.com>
@ 2019-09-13  2:56                           ` Yonghong Song
  2019-09-13 11:58                             ` Carlos Antonio Neira Bustos
  0 siblings, 1 reply; 25+ messages in thread
From: Yonghong Song @ 2019-09-13  2:56 UTC (permalink / raw)
  To: carlos antonio neira bustos; +Cc: Eric Biederman, Al Viro, netdev, brouer, bpf



On 9/12/19 3:03 PM, carlos antonio neira bustos wrote:
> Yonghong,
> 
> I think bpf_get_ns_current_pid_tgid interface is a lot better than the 
> one proposed in my patch, how are we going to move forward? Should I 
> take these changes and refactor the selftests to use this new interface 
> and submit version 12 or as the interface changed completely is a new 
> set of patches?.

This is to solve the same problem. You can just submit as version 12.
This way, we preserves discussion history clearly.

> 
> Eric,
> Thank you very much for explaining the problem and your help to move 
> forward with this new helper.
> 
> 
> Bests

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-13  2:56                           ` Yonghong Song
@ 2019-09-13 11:58                             ` Carlos Antonio Neira Bustos
  0 siblings, 0 replies; 25+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-09-13 11:58 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Eric Biederman, Al Viro, netdev, brouer, bpf

On Fri, Sep 13, 2019 at 02:56:43AM +0000, Yonghong Song wrote:
Yonghong,

Great, I'll submit this new interface along self tests as version 12.
Thanks for your help.

Bests
> 
> 
> On 9/12/19 3:03 PM, carlos antonio neira bustos wrote:
> > Yonghong,
> > 
> > I think bpf_get_ns_current_pid_tgid interface is a lot better than the 
> > one proposed in my patch, how are we going to move forward? Should I 
> > take these changes and refactor the selftests to use this new interface 
> > and submit version 12 or as the interface changed completely is a new 
> > set of patches?.
> 
> This is to solve the same problem. You can just submit as version 12.
> This way, we preserves discussion history clearly.
> 
> > 
> > Eric,
> > Thank you very much for explaining the problem and your help to move 
> > forward with this new helper.
> > 
> > 
> > Bests

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-12  5:49                       ` Yonghong Song
       [not found]                         ` <CACiB22j9M2gmccnh7XqqFp8g7qKFuiOrSAVJiA2tQHLB0pmoSQ@mail.gmail.com>
@ 2019-09-13 16:59                         ` ebiederm
  2019-09-13 17:28                           ` Yonghong Song
  1 sibling, 1 reply; 25+ messages in thread
From: ebiederm @ 2019-09-13 16:59 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Al Viro, Carlos Antonio Neira Bustos, netdev\, brouer\, bpf\

Yonghong Song <yhs@fb.com> writes:

> On 9/11/19 9:16 AM, Eric W. Biederman wrote:
>> Al Viro <viro@zeniv.linux.org.uk> writes:
>> 
>>> On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
>>>>
>>>> Carlos,
>>>>
>>>> Discussed with Eric today for what is the best way to get
>>>> the device number for a namespace. The following patch seems
>>>> a reasonable start although Eric would like to see
>>>> how the helper is used in order to decide whether the
>>>> interface looks right.
>>>>
>>>> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
>>>> Author: Yonghong Song <yhs@fb.com>
>>>> Date:   Mon Sep 9 21:50:51 2019 -0700
>>>>
>>>>       nsfs: add an interface function ns_get_inum_dev()
>>>>
>>>>       This patch added an interface function
>>>>       ns_get_inum_dev(). Given a ns_common structure,
>>>>       the function returns the inode and device
>>>>       numbers. The function will be used later
>>>>       by a newly added bpf helper.
>>>>
>>>>       Signed-off-by: Yonghong Song <yhs@fb.com>
>>>>
>>>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>>>> index a0431642c6b5..a603c6fc3f54 100644
>>>> --- a/fs/nsfs.c
>>>> +++ b/fs/nsfs.c
>>>> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>>>>           return ERR_PTR(-EINVAL);
>>>>    }
>>>>
>>>> +/* Get the device number for the current task pidns.
>>>> + */
>>>> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
>>>> +{
>>>> +       *inum = ns->inum;
>>>> +       *dev = nsfs_mnt->mnt_sb->s_dev;
>>>> +}
>>>
>>> Umm...  Where would it get the device number once we get (hell knows
>>> what for) multiple nsfs instances?  I still don't understand what
>>> would that be about, TBH...  Is it really per-userns?  Or something
>>> else entirely?  Eric, could you give some context?
>> 
>> My goal is not to paint things into a corner, with future changes.
>> Right now it is possible to stat a namespace file descriptor and
>> get a device and inode number.  Then compare that.
>> 
>> I don't want people using the inode number in nsfd as some magic
>> namespace id.
>> 
>> We have had times in the past where there was more than one superblock
>> and thus more than one device number.  Further if userspace ever uses
>> this heavily there may be times in the future where for
>> checkpoint/restart purposes we will want multiple nsfd's so we can
>> preserve the inode number accross a migration.
>> 
>> Realistically there will probably just some kind of hotplug notification
>> to userspace to say we have hotplugged your operatining system as
>> a migration notification.
>> 
>> Now the halway discussion did not quite capture everything I was trying
>> to say but it at least got to the right ballpark.
>> 
>> The helper in fs/nsfs.c should be:
>> 
>> bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
>> {
>>          return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
>> }
>> 
>> That way if/when there are multiple inodes identifying the same
>> namespace the bpf programs don't need to change.
>
> Thanks, Eric. This is indeed better. The bpf helper should focus
> on comparing dev/ino, instead of return the dev/ino to bpf program.
>
> So overall, nsfs related change will look like:
>
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index a0431642c6b5..7e78d89c2172 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -245,6 +245,11 @@ struct file *proc_ns_fget(int fd)
>          return ERR_PTR(-EINVAL);
>   }
>
> +bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
> +{
> +       return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
> +}
> +
>   static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry)
>   {
>          struct inode *inode = d_inode(dentry);
> diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
> index d31cb6215905..79639807e960 100644
> --- a/include/linux/proc_ns.h
> +++ b/include/linux/proc_ns.h
> @@ -81,6 +81,7 @@ extern void *ns_get_path(struct path *path, struct 
> task_struct *task,
>   typedef struct ns_common *ns_get_path_helper_t(void *);
>   extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t 
> ns_get_cb,
>                              void *private_data);
> +extern bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino);
>
>   extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
>                          const struct proc_ns_operations *ns_ops);
>
>> 
>> Up farther in the stack it should be something like:
>> 
>>> BPF_CALL_2(bpf_current_pidns_match, dev_t *dev, ino_t *ino)
>>> {
>>>          return ns_match(&task_active_pid_ns(current)->ns, *dev, *ino);
>>> }
>>>
>>> const struct bpf_func_proto bpf_current_pidns_match_proto = {
>>> 	.func		= bpf_current_pins_match,
>>> 	.gpl_only	= true,
>>> 	.ret_type	= RET_INTEGER
>>> 	.arg1_type	= ARG_PTR_TO_DEVICE_NUMBER,
>>> 	.arg2_type	= ARG_PTR_TO_INODE_NUMBER,
>>> };
>> 
>> That allows comparing what the bpf came up with with whatever value
>> userspace generated by stating the file descriptor.
>> 
>> 
>> That is the least bad suggestion I currently have for that
>> functionality.  It really would be better to not have that filter in the
>> bpf program itself but in the infrastructure that binds a program to a
>> set of tasks.
>> 
>> The problem with this approach is whatever device/inode you have when
>> the namespace they refer to exits there is the possibility that the
>> inode will be reused.  So your filter will eventually start matching on
>> the wrong thing.
>
> I come up with a differeent helper definition, which is much more
> similar to existing bpf_get_current_pid_tgid() and helper definition
> much more conforms to bpf convention.

There is a problem with your bpf_get_ns_current_pid_tgid below.
The inode number is a 64bit number.  To be nice to old userspace
we try and not use 64bit inode numbers where they are not required
but in this case we should not use an interface that assumes inode
numbers are 32bit.  They just aren't.

I didn't know how to express that in the bpf proto so I did what
I could.

The alternative to this would be to simply restrict this
helper to bpf programs registered in the initial pid namespace.
At which point you could just ensure all the numbers are in
the global pid namespace.

Hmm.  Looing at the comment below I am confused.

> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..bc26903c80c7 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,8 @@
>   #include <linux/uidgid.h>
>   #include <linux/filter.h>
>   #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/proc_ns.h>
>
>   #include "../../lib/kstrtox.h"
>
> @@ -487,3 +489,33 @@ const struct bpf_func_proto bpf_strtoul_proto = {
>          .arg4_type      = ARG_PTR_TO_LONG,
>   };
>   #endif
> +
> +BPF_CALL_2(bpf_get_ns_current_pid_tgid, u32, dev, u32, inum)
> +{
> +       struct task_struct *task = current;
> +       struct pid_namespace *pidns;
> +       pid_t pid, tgid;
> +
> +       if (unlikely(!task))
> +               return -EINVAL;
> +
> +
> +       pidns = task_active_pid_ns(task);
> +       if (unlikely(!pidns))
> +               return -ENOENT;
> +
> +       if (!ns_match(&pidns->ns, (dev_t)dev, inum))
> +               return -EINVAL;
> +
> +       pid = task_pid_nr_ns(task, pidns);
> +       tgid = task_tgid_nr_ns(task, pidns);
> +
> +       return (u64) tgid << 32 | pid;
> +}
> +
> +const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto = {
> +       .func           = bpf_get_ns_current_pid_tgid,
> +       .gpl_only       = false,
> +       .ret_type       = RET_INTEGER,
> +       .arg1_type      = ARG_ANYTHING,
> +       .arg2_type      = ARG_ANYTHING,
> +};
>
> Existing usage of bpf_get_current_pid_tgid() can be converted
> to bpf_get_ns_current_pid_tgid() if ns dev/inode number
> is supplied. For bpf_get_ns_current_pid_tgid(), checking
> return value ( < 0 or not) is needed.

Ok.  I missed something.

What is the problem bpf_get_ns_current_pid_tgid trying to solve
that bpf_get_current_pid_tgid does not solve.

I would think since much of tracing ebpf is fundamentally restricted
to the global root user.  Limiting the ebpf programs to the initial
pid namespace should not be a problem.

So I don't understand why you need to specify the namespace in
the ebpf call.

Can someone give me a clue what problem is being sovled by this
new call?

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
  2019-09-13 16:59                         ` ebiederm
@ 2019-09-13 17:28                           ` Yonghong Song
  0 siblings, 0 replies; 25+ messages in thread
From: Yonghong Song @ 2019-09-13 17:28 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Al Viro, Carlos Antonio Neira Bustos, netdev, brouer, bpf



On 9/13/19 5:59 PM, Eric W. Biederman wrote:
> Yonghong Song <yhs@fb.com> writes:
> 
>> On 9/11/19 9:16 AM, Eric W. Biederman wrote:
>>> Al Viro <viro@zeniv.linux.org.uk> writes:
>>>
>>>> On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
>>>>>
>>>>> Carlos,
>>>>>
>>>>> Discussed with Eric today for what is the best way to get
>>>>> the device number for a namespace. The following patch seems
>>>>> a reasonable start although Eric would like to see
>>>>> how the helper is used in order to decide whether the
>>>>> interface looks right.
>>>>>
>>>>> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
>>>>> Author: Yonghong Song <yhs@fb.com>
>>>>> Date:   Mon Sep 9 21:50:51 2019 -0700
>>>>>
>>>>>        nsfs: add an interface function ns_get_inum_dev()
>>>>>
>>>>>        This patch added an interface function
>>>>>        ns_get_inum_dev(). Given a ns_common structure,
>>>>>        the function returns the inode and device
>>>>>        numbers. The function will be used later
>>>>>        by a newly added bpf helper.
>>>>>
>>>>>        Signed-off-by: Yonghong Song <yhs@fb.com>
>>>>>
>>>>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>>>>> index a0431642c6b5..a603c6fc3f54 100644
>>>>> --- a/fs/nsfs.c
>>>>> +++ b/fs/nsfs.c
>>>>> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>>>>>            return ERR_PTR(-EINVAL);
>>>>>     }
>>>>>
>>>>> +/* Get the device number for the current task pidns.
>>>>> + */
>>>>> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
>>>>> +{
>>>>> +       *inum = ns->inum;
>>>>> +       *dev = nsfs_mnt->mnt_sb->s_dev;
>>>>> +}
>>>>
>>>> Umm...  Where would it get the device number once we get (hell knows
>>>> what for) multiple nsfs instances?  I still don't understand what
>>>> would that be about, TBH...  Is it really per-userns?  Or something
>>>> else entirely?  Eric, could you give some context?
>>>
>>> My goal is not to paint things into a corner, with future changes.
>>> Right now it is possible to stat a namespace file descriptor and
>>> get a device and inode number.  Then compare that.
>>>
>>> I don't want people using the inode number in nsfd as some magic
>>> namespace id.
>>>
>>> We have had times in the past where there was more than one superblock
>>> and thus more than one device number.  Further if userspace ever uses
>>> this heavily there may be times in the future where for
>>> checkpoint/restart purposes we will want multiple nsfd's so we can
>>> preserve the inode number accross a migration.
>>>
>>> Realistically there will probably just some kind of hotplug notification
>>> to userspace to say we have hotplugged your operatining system as
>>> a migration notification.
>>>
>>> Now the halway discussion did not quite capture everything I was trying
>>> to say but it at least got to the right ballpark.
>>>
>>> The helper in fs/nsfs.c should be:
>>>
>>> bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
>>> {
>>>           return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
>>> }
>>>
>>> That way if/when there are multiple inodes identifying the same
>>> namespace the bpf programs don't need to change.
>>
>> Thanks, Eric. This is indeed better. The bpf helper should focus
>> on comparing dev/ino, instead of return the dev/ino to bpf program.
>>
>> So overall, nsfs related change will look like:
>>
>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>> index a0431642c6b5..7e78d89c2172 100644
>> --- a/fs/nsfs.c
>> +++ b/fs/nsfs.c
>> @@ -245,6 +245,11 @@ struct file *proc_ns_fget(int fd)
>>           return ERR_PTR(-EINVAL);
>>    }
>>
>> +bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
>> +{
>> +       return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
>> +}
>> +
>>    static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry)
>>    {
>>           struct inode *inode = d_inode(dentry);
>> diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
>> index d31cb6215905..79639807e960 100644
>> --- a/include/linux/proc_ns.h
>> +++ b/include/linux/proc_ns.h
>> @@ -81,6 +81,7 @@ extern void *ns_get_path(struct path *path, struct
>> task_struct *task,
>>    typedef struct ns_common *ns_get_path_helper_t(void *);
>>    extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t
>> ns_get_cb,
>>                               void *private_data);
>> +extern bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino);
>>
>>    extern int ns_get_name(char *buf, size_t size, struct task_struct *task,
>>                           const struct proc_ns_operations *ns_ops);
>>
>>>
>>> Up farther in the stack it should be something like:
>>>
>>>> BPF_CALL_2(bpf_current_pidns_match, dev_t *dev, ino_t *ino)
>>>> {
>>>>           return ns_match(&task_active_pid_ns(current)->ns, *dev, *ino);
>>>> }
>>>>
>>>> const struct bpf_func_proto bpf_current_pidns_match_proto = {
>>>> 	.func		= bpf_current_pins_match,
>>>> 	.gpl_only	= true,
>>>> 	.ret_type	= RET_INTEGER
>>>> 	.arg1_type	= ARG_PTR_TO_DEVICE_NUMBER,
>>>> 	.arg2_type	= ARG_PTR_TO_INODE_NUMBER,
>>>> };
>>>
>>> That allows comparing what the bpf came up with with whatever value
>>> userspace generated by stating the file descriptor.
>>>
>>>
>>> That is the least bad suggestion I currently have for that
>>> functionality.  It really would be better to not have that filter in the
>>> bpf program itself but in the infrastructure that binds a program to a
>>> set of tasks.
>>>
>>> The problem with this approach is whatever device/inode you have when
>>> the namespace they refer to exits there is the possibility that the
>>> inode will be reused.  So your filter will eventually start matching on
>>> the wrong thing.
>>
>> I come up with a differeent helper definition, which is much more
>> similar to existing bpf_get_current_pid_tgid() and helper definition
>> much more conforms to bpf convention.
> 
> There is a problem with your bpf_get_ns_current_pid_tgid below.
> The inode number is a 64bit number.  To be nice to old userspace
> we try and not use 64bit inode numbers where they are not required
> but in this case we should not use an interface that assumes inode
> numbers are 32bit.  They just aren't.
> 
> I didn't know how to express that in the bpf proto so I did what
> I could.

We can change inum to u64. Just change the prototype like
`u64, inum` should be good.

> 
> The alternative to this would be to simply restrict this
> helper to bpf programs registered in the initial pid namespace.
> At which point you could just ensure all the numbers are in
> the global pid namespace.
> 
> Hmm.  Looing at the comment below I am confused.
> 
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 5e28718928ca..bc26903c80c7 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -11,6 +11,8 @@
>>    #include <linux/uidgid.h>
>>    #include <linux/filter.h>
>>    #include <linux/ctype.h>
>> +#include <linux/pid_namespace.h>
>> +#include <linux/proc_ns.h>
>>
>>    #include "../../lib/kstrtox.h"
>>
>> @@ -487,3 +489,33 @@ const struct bpf_func_proto bpf_strtoul_proto = {
>>           .arg4_type      = ARG_PTR_TO_LONG,
>>    };
>>    #endif
>> +
>> +BPF_CALL_2(bpf_get_ns_current_pid_tgid, u32, dev, u32, inum)
>> +{
>> +       struct task_struct *task = current;
>> +       struct pid_namespace *pidns;
>> +       pid_t pid, tgid;
>> +
>> +       if (unlikely(!task))
>> +               return -EINVAL;
>> +
>> +
>> +       pidns = task_active_pid_ns(task);
>> +       if (unlikely(!pidns))
>> +               return -ENOENT;
>> +
>> +       if (!ns_match(&pidns->ns, (dev_t)dev, inum))
>> +               return -EINVAL;
>> +
>> +       pid = task_pid_nr_ns(task, pidns);
>> +       tgid = task_tgid_nr_ns(task, pidns);
>> +
>> +       return (u64) tgid << 32 | pid;
>> +}
>> +
>> +const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto = {
>> +       .func           = bpf_get_ns_current_pid_tgid,
>> +       .gpl_only       = false,
>> +       .ret_type       = RET_INTEGER,
>> +       .arg1_type      = ARG_ANYTHING,
>> +       .arg2_type      = ARG_ANYTHING,
>> +};
>>
>> Existing usage of bpf_get_current_pid_tgid() can be converted
>> to bpf_get_ns_current_pid_tgid() if ns dev/inode number
>> is supplied. For bpf_get_ns_current_pid_tgid(), checking
>> return value ( < 0 or not) is needed.
> 
> Ok.  I missed something.
> 
> What is the problem bpf_get_ns_current_pid_tgid trying to solve
> that bpf_get_current_pid_tgid does not solve.
> 
> I would think since much of tracing ebpf is fundamentally restricted
> to the global root user.  Limiting the ebpf programs to the initial
> pid namespace should not be a problem.
> 
> So I don't understand why you need to specify the namespace in
> the ebpf call.
> 
> Can someone give me a clue what problem is being sovled by this
> new call?

We want to run the bpf program inside the namespace.
There are parallel work to introduce CAP_BPF and CAP_TRACING
(https://lore.kernel.org/bpf/20190906231053.1276792-1-ast@kernel.org/T/#t)
to facilitate this.

We have users requesting to use bcc tools inside the containers.
https://github.com/iovisor/bcc/issues/1875
https://github.com/iovisor/bcc/issues/1366
https://github.com/iovisor/bcc/issues/1329
https://github.com/iovisor/bcc/issues/1532
...

Yes, this may require granting `root` privilege to containers.
This can be done outside container as well. But it is just a
big usability improvement if people can do inside the containers.

In addition, we have requests below and internal requests as well
to filter based on containers.
https://github.com/iovisor/bcc/issues/1119
The new helper permits at root that you can filter based on
a particular container (not container id, but dev/inode should
identifying one).

Hope this clarify why this helper is useful for tracing community.

> 
> Eric
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, back to index

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-06 15:09 [PATCH bpf-next v10 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira
2019-09-06 15:09 ` [PATCH bpf-next v10 1/4] fs/namei.c: make available filename_lookup() for bpf helpers Carlos Neira
2019-09-06 15:09 ` [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info Carlos Neira
2019-09-06 15:24   ` Al Viro
2019-09-06 15:46     ` Al Viro
2019-09-06 16:00       ` Al Viro
2019-09-06 23:21         ` Yonghong Song
2019-09-07  0:10           ` Al Viro
2019-09-07  6:34             ` Yonghong Song
2019-09-09 17:45               ` Carlos Antonio Neira Bustos
2019-09-10 22:35                 ` Yonghong Song
2019-09-10 23:15                   ` Al Viro
2019-09-11  8:16                     ` ebiederm
2019-09-12  5:49                       ` Yonghong Song
     [not found]                         ` <CACiB22j9M2gmccnh7XqqFp8g7qKFuiOrSAVJiA2tQHLB0pmoSQ@mail.gmail.com>
2019-09-13  2:56                           ` Yonghong Song
2019-09-13 11:58                             ` Carlos Antonio Neira Bustos
2019-09-13 16:59                         ` ebiederm
2019-09-13 17:28                           ` Yonghong Song
2019-09-11  4:32                   ` Carlos Antonio Neira Bustos
2019-09-11  8:17                     ` ebiederm
2019-09-10 22:46   ` Yonghong Song
2019-09-11  4:33     ` Carlos Antonio Neira Bustos
2019-09-06 15:09 ` [PATCH bpf-next v10 3/4] tools: Added bpf_get_current_pidns_info helper Carlos Neira
2019-09-06 15:09 ` [PATCH bpf-next v10 4/4] tools/testing/selftests/bpf: Add self-tests for helper bpf_get_pidns_info Carlos Neira
2019-09-10 22:55   ` Yonghong Song

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org netdev@archiver.kernel.org
	public-inbox-index netdev


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox