bpf.vger.kernel.org archive mirror
* [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task
@ 2019-08-13 18:47 Carlos Neira
  2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Carlos Neira @ 2019-08-13 18:47 UTC (permalink / raw)
  To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf

This helper obtains the active PID namespace of the current task and returns
the pid, tgid, device and namespace id as seen from that namespace, which makes
it possible to instrument a process inside a container.
The device is read from /proc/self/ns/pid because different pid_ns files may
belong to different devices in the future, according to the discussion between
Eric Biederman and Yonghong at the 2017 Linux Plumbers Conference.
Currently, bpf_get_current_pid_tgid() is used for pid filtering in bcc
scripts, but it returns the pid as seen by the root namespace. That is fine
when a bcc script is not executed inside a container.
When the process of interest is inside a container, pid filtering will not work
if bpf_get_current_pid_tgid() is used. This helper addresses that limitation by
returning the pid as seen by the namespace in which the script is
executing.

This helper covers the same use cases as bpf_get_current_pid_tgid(), and it can
also be used for pid filtering inside a container.

For example, a bcc script using bpf_get_current_pid_tgid() (tools/funccount.py):

        u32 pid = bpf_get_current_pid_tgid() >> 32;
        if (pid != <pid_arg_passed_in>)
                return 0;

could be modified to use bpf_get_current_pidns_info() as follows:

        struct bpf_pidns_info pidns;
        bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns_info));
        u32 pid = pidns.tgid;
        u32 nsid = pidns.nsid;
        if ((pid != <pid_arg_passed_in>) || (nsid != <nsid_arg_passed_in>))
                return 0;
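
The filter logic can also be sketched in plain user-space C, for illustration
only. struct pidns_info_sketch mirrors the fields of struct bpf_pidns_info from
patch 1/3; the function names tgid_of(), pid_of() and matches_target() are
hypothetical and not part of this series. The sketch shows how the 64-bit value
of bpf_get_current_pid_tgid() splits into tgid (upper 32 bits) and pid (lower
32 bits), and that an event is kept only when both the tgid and the namespace
id match:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the field layout of struct bpf_pidns_info in patch 1/3. */
struct pidns_info_sketch {
	uint32_t dev;
	uint32_t nsid;
	uint32_t tgid;
	uint32_t pid;
};

/* bpf_get_current_pid_tgid() packs tgid in the upper 32 bits ... */
static inline uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* ... and the kernel pid (thread id) in the lower 32 bits. */
static inline uint32_t pid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}

/* Keep an event only if BOTH the tgid and the namespace id match. */
static inline bool matches_target(const struct pidns_info_sketch *info,
				  uint32_t want_tgid, uint32_t want_nsid)
{
	assert(info != NULL);
	return info->tgid == want_tgid && info->nsid == want_nsid;
}
```

A BPF program would return 0 early whenever matches_target() is false, which
is the same as rejecting when either comparison fails.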

To find the PID namespace id of a process, you can use this command:

$ ps -h -o pidns -p <pid_of_interest>

Or this other command:

$ ls -Li /proc/<pid_of_interest>/ns/pid
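
The same value can be read programmatically with stat(), which is what the
selftest in patch 3/3 does for /proc/self/ns/pid. A minimal sketch, assuming a
Linux /proc; pidns_path() and pidns_id_from_path() are illustrative names, not
functions from this series:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build "/proc/<pid>/ns/pid". Returns the string length, or -1 if buf is
 * too small to hold the whole path.
 */
static int pidns_path(char *buf, size_t len, long pid)
{
	int n = snprintf(buf, len, "/proc/%ld/ns/pid", pid);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}

/* stat() follows the namespace link: st_ino is the namespace id and
 * st_dev is the device it lives on. Returns 0 on success, -1 on error.
 */
static int pidns_id_from_path(const char *path, unsigned long long *nsid,
			      unsigned long long *dev)
{
	struct stat st;

	if (stat(path, &st))
		return -1;
	*nsid = (unsigned long long)st.st_ino;
	*dev = (unsigned long long)st.st_dev;
	return 0;
}
```

The nsid obtained this way is the value a tool would pass in as
<nsid_arg_passed_in>.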

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>

Carlos Neira (3):
  bpf: new helper to obtain namespace data from current task
  samples/bpf: added sample code for bpf_get_current_pidns_info.
  tools/testing/selftests/bpf: Add self-tests for new helper.

 fs/internal.h                                      |   2 -
 fs/namei.c                                         |   1 -
 include/linux/bpf.h                                |   1 +
 include/linux/namei.h                              |   4 +
 include/uapi/linux/bpf.h                           |  31 ++++-
 kernel/bpf/core.c                                  |   1 +
 kernel/bpf/helpers.c                               |  64 ++++++++++
 kernel/trace/bpf_trace.c                           |   2 +
 samples/bpf/Makefile                               |   3 +
 samples/bpf/trace_ns_info_user.c                   |  35 ++++++
 samples/bpf/trace_ns_info_user_kern.c              |  44 +++++++
 tools/include/uapi/linux/bpf.h                     |  31 ++++-
 tools/testing/selftests/bpf/Makefile               |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
 .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++
 tools/testing/selftests/bpf/test_pidns.c           | 138 +++++++++++++++++++++
 16 files changed, 407 insertions(+), 6 deletions(-)
 create mode 100644 samples/bpf/trace_ns_info_user.c
 create mode 100644 samples/bpf/trace_ns_info_user_kern.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_pidns.c

-- 
2.11.0



* [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-13 18:47 [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task Carlos Neira
@ 2019-08-13 18:47 ` Carlos Neira
  2019-08-13 22:35   ` Yonghong Song
  2019-08-13 23:11   ` Yonghong Song
  2019-08-13 18:47 ` [PATCH bpf-next V9 2/3] samples/bpf: added sample code for bpf_get_current_pidns_info Carlos Neira
  2019-08-13 18:47 ` [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira
  2 siblings, 2 replies; 16+ messages in thread
From: Carlos Neira @ 2019-08-13 18:47 UTC (permalink / raw)
  To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf

From: Carlos <cneirabustos@gmail.com>

New bpf helper bpf_get_current_pidns_info.
This helper obtains the active namespace from current and returns
pid, tgid, device and namespace id as seen from that namespace,
allowing to instrument a process inside a container.

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
 fs/internal.h            |  2 --
 fs/namei.c               |  1 -
 include/linux/bpf.h      |  1 +
 include/linux/namei.h    |  4 +++
 include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
 kernel/bpf/core.c        |  1 +
 kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/bpf_trace.c |  2 ++
 8 files changed, 102 insertions(+), 4 deletions(-)
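
Note on the dev field: in this version the helper stores inode->i_sb->s_dev
unmodified, so dev would hold the in-kernel dev_t encoding (12-bit major,
20-bit minor, per include/linux/kdev_t.h), which is not the encoding of st_dev
returned by stat(2). The sketch below is a user-space illustration under that
assumption; kdev_major(), kdev_minor() and kdev_matches_st_dev() are
hypothetical names, not part of this patch:

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* In-kernel dev_t layout: major in the upper 12 bits, minor in the lower 20. */
#define KDEV_MINORBITS 20u
#define KDEV_MINORMASK ((1u << KDEV_MINORBITS) - 1u)

static inline uint32_t kdev_major(uint32_t kdev)
{
	return kdev >> KDEV_MINORBITS;
}

static inline uint32_t kdev_minor(uint32_t kdev)
{
	return kdev & KDEV_MINORMASK;
}

/* Compare a helper-reported dev against st_dev from stat(2) by decoding
 * both sides into major/minor instead of comparing the raw integers.
 */
static inline bool kdev_matches_st_dev(uint32_t kdev, dev_t st_dev)
{
	return kdev_major(kdev) == major(st_dev) &&
	       kdev_minor(kdev) == minor(st_dev);
}
```

Comparing decoded major/minor pairs avoids depending on either raw encoding.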

diff --git a/fs/internal.h b/fs/internal.h
index 315fcd8d237c..6647e15dd419 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
 /*
  * namei.c
  */
-extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
-			   struct path *path, struct path *root);
 extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
diff --git a/fs/namei.c b/fs/namei.c
index 209c51a5226c..a89fc72a4a10 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -19,7 +19,6 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
-#include <linux/fs.h>
 #include <linux/namei.h>
 #include <linux/pagemap.h>
 #include <linux/fsnotify.h>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f9a506147c8a..e4adf5e05afd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
 extern const struct bpf_func_proto bpf_strtol_proto;
 extern const struct bpf_func_proto bpf_strtoul_proto;
 extern const struct bpf_func_proto bpf_tcp_sock_proto;
+extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
 
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 9138b4471dbf..b45c8b6f7cb4 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -6,6 +6,7 @@
 #include <linux/path.h>
 #include <linux/fcntl.h>
 #include <linux/errno.h>
+#include <linux/fs.h>
 
 enum { MAX_NESTED_LINKS = 8 };
 
@@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
 
 extern void nd_jump_link(struct path *path);
 
+extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
+			   struct path *path, struct path *root);
+
 static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 {
 	((char *) name)[min(len, maxlen)] = '\0';
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4393bd4b2419..db241857ec15 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2741,6 +2741,28 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
+ *	Description
+ *		Copies into *pidns* pid, namespace id and tgid as seen by the
+ *		current namespace and also device from /proc/self/ns/pid.
+ *		*size_of_pidns* must be the size of *pidns*
+ *
+ *		This helper is used when pid filtering is needed inside a
+ *		container as bpf_get_current_tgid() helper returns always the
+ *		pid id as seen by the root namespace.
+ *	Return
+ *		0 on success
+ *
+ *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
+ *		or tgid of the current task.
+ *
+ *		**-ECHILD** if /proc/self/ns/pid does not exists.
+ *
+ *		**-ENOTDIR** if /proc/self/ns does not exists.
+ *
+ *		**-ENOMEM**  if allocation fails.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2853,7 +2875,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(get_current_pidns_info),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3604,4 +3627,10 @@ struct bpf_sockopt {
 	__s32	retval;
 };
 
+struct bpf_pidns_info {
+	__u32 dev;
+	__u32 nsid;
+	__u32 tgid;
+	__u32 pid;
+};
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8191a7db2777..3159f2a0188c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
 const struct bpf_func_proto bpf_get_local_storage_proto __weak;
+const struct bpf_func_proto bpf_get_current_pidns_info_proto __weak;
 
 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5e28718928ca..41fbf1f28a48 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -11,6 +11,12 @@
 #include <linux/uidgid.h>
 #include <linux/filter.h>
 #include <linux/ctype.h>
+#include <linux/pid_namespace.h>
+#include <linux/major.h>
+#include <linux/stat.h>
+#include <linux/namei.h>
+#include <linux/version.h>
+
 
 #include "../../lib/kstrtox.h"
 
@@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 	preempt_enable();
 }
 
+BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
+	 size)
+{
+	const char *pidns_path = "/proc/self/ns/pid";
+	struct pid_namespace *pidns = NULL;
+	struct filename *tmp = NULL;
+	struct inode *inode;
+	struct path kp;
+	pid_t tgid = 0;
+	pid_t pid = 0;
+	int ret;
+	int len;
+
+	if (unlikely(size != sizeof(struct bpf_pidns_info)))
+		return -EINVAL;
+	pidns = task_active_pid_ns(current);
+	if (unlikely(!pidns))
+		goto clear;
+	pidns_info->nsid =  pidns->ns.inum;
+	pid = task_pid_nr_ns(current, pidns);
+	if (unlikely(!pid))
+		goto clear;
+	tgid = task_tgid_nr_ns(current, pidns);
+	if (unlikely(!tgid))
+		goto clear;
+	pidns_info->tgid = (u32) tgid;
+	pidns_info->pid = (u32) pid;
+	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
+	if (unlikely(!tmp)) {
+		memset((void *)pidns_info, 0, (size_t) size);
+		return -ENOMEM;
+	}
+	len = strlen(pidns_path) + 1;
+	memcpy((char *)tmp->name, pidns_path, len);
+	tmp->uptr = NULL;
+	tmp->aname = NULL;
+	tmp->refcnt = 1;
+	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
+	if (ret) {
+		memset((void *)pidns_info, 0, (size_t) size);
+		return ret;
+	}
+	inode = d_backing_inode(kp.dentry);
+	pidns_info->dev = inode->i_sb->s_dev;
+	return 0;
+clear:
+	memset((void *)pidns_info, 0, (size_t) size);
+	return -EINVAL;
+}
+
+const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
+	.func		= bpf_get_current_pidns_info,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
+};
+
 #ifdef CONFIG_CGROUPS
 BPF_CALL_0(bpf_get_current_cgroup_id)
 {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1255d14576..5e1dc22765a5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 	case BPF_FUNC_send_signal:
 		return &bpf_send_signal_proto;
+	case BPF_FUNC_get_current_pidns_info:
+		return &bpf_get_current_pidns_info_proto;
 	default:
 		return NULL;
 	}
-- 
2.11.0
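
Note on the helper's error handling: arg1 is declared ARG_PTR_TO_UNINIT_MEM,
so a program may pass an uninitialized stack buffer, and the helper is expected
to leave that buffer either fully written or zeroed. The following user-space
model of the control flow in bpf_get_current_pidns_info() is a sketch only;
struct task_model and fill_pidns_info() are hypothetical stand-ins for the
kernel lookups, with 0 meaning "lookup failed":

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as struct bpf_pidns_info in the patch above. */
struct pidns_info_model {
	uint32_t dev;
	uint32_t nsid;
	uint32_t tgid;
	uint32_t pid;
};

/* Hypothetical stand-in for what the kernel would look up for current. */
struct task_model {
	uint32_t nsid;
	uint32_t tgid;
	uint32_t pid;
	uint32_t dev;
};

static int fill_pidns_info(struct pidns_info_model *out, uint32_t size,
			   const struct task_model *t)
{
	/* As in the patch, a size mismatch returns before touching the buffer. */
	if (size != sizeof(*out))
		return -EINVAL;
	/* Validate everything first so no partial result is ever visible. */
	if (!t || !t->pid || !t->tgid)
		goto clear;
	out->dev = t->dev;
	out->nsid = t->nsid;
	out->tgid = t->tgid;
	out->pid = t->pid;
	return 0;
clear:
	/* Single failure path: the caller never sees stale stack contents. */
	memset(out, 0, size);
	return -EINVAL;
}
```

Routing every failure through one label keeps the zeroing in a single place.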



* [PATCH bpf-next V9 2/3] samples/bpf: added sample code for bpf_get_current_pidns_info.
  2019-08-13 18:47 [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task Carlos Neira
  2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
@ 2019-08-13 18:47 ` Carlos Neira
  2019-08-13 18:47 ` [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira
  2 siblings, 0 replies; 16+ messages in thread
From: Carlos Neira @ 2019-08-13 18:47 UTC (permalink / raw)
  To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf

From: Carlos <cneirabustos@gmail.com>

Sample program that calls the new bpf helper bpf_get_current_pidns_info.

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
 samples/bpf/Makefile                  |  3 +++
 samples/bpf/trace_ns_info_user.c      | 35 ++++++++++++++++++++++++++++
 samples/bpf/trace_ns_info_user_kern.c | 44 +++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)
 create mode 100644 samples/bpf/trace_ns_info_user.c
 create mode 100644 samples/bpf/trace_ns_info_user_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 1d9be26b4edd..238453ff27d2 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -53,6 +53,7 @@ hostprogs-y += task_fd_query
 hostprogs-y += xdp_sample_pkts
 hostprogs-y += ibumad
 hostprogs-y += hbm
+hostprogs-y += trace_ns_info
 
 # Libbpf dependencies
 LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -109,6 +110,7 @@ task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
 xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)
 ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS)
 hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS)
+trace_ns_info-objs := bpf_load.o trace_ns_info_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -170,6 +172,7 @@ always += xdp_sample_pkts_kern.o
 always += ibumad_kern.o
 always += hbm_out_kern.o
 always += hbm_edt_kern.o
+always += trace_ns_info_user_kern.o
 
 KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
 KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/bpf/
diff --git a/samples/bpf/trace_ns_info_user.c b/samples/bpf/trace_ns_info_user.c
new file mode 100644
index 000000000000..e06d08db6f30
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <linux/bpf.h>
+#include <unistd.h>
+#include "bpf/libbpf.h"
+#include "bpf_load.h"
+
+/* This code was taken verbatim from tracex1_user.c; it is used
+ * to exercise the bpf_get_current_pidns_info() helper call.
+ */
+int main(int ac, char **argv)
+{
+	FILE *f;
+	char filename[256];
+
+	snprintf(filename, sizeof(filename), "%s_user_kern.o", argv[0]);
+	printf("loading %s\n", filename);
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	f = popen("taskset 1 ping  localhost", "r");
+	(void) f;
+	read_trace_pipe();
+	return 0;
+}
diff --git a/samples/bpf/trace_ns_info_user_kern.c b/samples/bpf/trace_ns_info_user_kern.c
new file mode 100644
index 000000000000..96675e02b707
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user_kern.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+typedef __u64 u64;
+typedef __u32 u32;
+
+
+/* kprobe is NOT a stable ABI
+ * kernel functions can be removed, renamed or completely change semantics.
+ * Number of arguments and their positions can change, etc.
+ * In such case this bpf+kprobe example will no longer be meaningful
+ */
+
+/* This will call bpf_get_current_pidns_info() to display pid and ns values
+ * as seen by the current namespace, on the far left you will see the pid as
+ * seen as by the root namespace.
+ */
+
+SEC("kprobe/__netif_receive_skb_core")
+int bpf_prog1(struct pt_regs *ctx)
+{
+	char fmt[] = "nsid:%u, dev: %u,  pid:%u\n";
+	struct bpf_pidns_info nsinfo;
+	int ok = 0;
+
+	ok = bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo));
+	if (ok == 0)
+		bpf_trace_printk(fmt, sizeof(fmt), (u32)nsinfo.nsid,
+				 (u32) nsinfo.dev, (u32)nsinfo.pid);
+
+	return 0;
+}
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
-- 
2.11.0



* [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper.
  2019-08-13 18:47 [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task Carlos Neira
  2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
  2019-08-13 18:47 ` [PATCH bpf-next V9 2/3] samples/bpf: added sample code for bpf_get_current_pidns_info Carlos Neira
@ 2019-08-13 18:47 ` Carlos Neira
  2019-08-13 23:19   ` Yonghong Song
  2 siblings, 1 reply; 16+ messages in thread
From: Carlos Neira @ 2019-08-13 18:47 UTC (permalink / raw)
  To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf

From: Carlos <cneirabustos@gmail.com>

Add self-tests for the new helper bpf_get_current_pidns_info.

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
 tools/include/uapi/linux/bpf.h                     |  31 ++++-
 tools/testing/selftests/bpf/Makefile               |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
 .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++
 tools/testing/selftests/bpf/test_pidns.c           | 138 +++++++++++++++++++++
 5 files changed, 223 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_pidns.c
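
Note on the tracepoint id: the selftest reads the id from
/sys/kernel/debug/tracing/events/<probe>/id with read() and hands the buffer to
strtol(). Since read() does not NUL-terminate, a caller would normally set
buf[bytes] = '\0' before parsing. The sketch below shows one way to validate
the parsed value; parse_tracepoint_id() is a hypothetical helper that assumes a
NUL-terminated input and is not part of this patch:

```c
#include <errno.h>
#include <stdlib.h>

/* Parse the text of an events/<probe>/id file, e.g. "314\n".
 * Returns the id, or -1 if the text is not a non-negative decimal number
 * optionally followed by a single trailing newline.
 */
static long parse_tracepoint_id(const char *text)
{
	char *end;
	long id;

	errno = 0;
	id = strtol(text, &end, 10);
	if (errno || end == text || id < 0)
		return -1;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;
	return id;
}
```

The result would be assigned to attr.config only when it is not -1.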

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4393bd4b2419..db241857ec15 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2741,6 +2741,28 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
+ *	Description
+ *		Copies into *pidns* pid, namespace id and tgid as seen by the
+ *		current namespace and also device from /proc/self/ns/pid.
+ *		*size_of_pidns* must be the size of *pidns*
+ *
+ *		This helper is used when pid filtering is needed inside a
+ *		container as bpf_get_current_tgid() helper returns always the
+ *		pid id as seen by the root namespace.
+ *	Return
+ *		0 on success
+ *
+ *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
+ *		or tgid of the current task.
+ *
+ *		**-ECHILD** if /proc/self/ns/pid does not exists.
+ *
+ *		**-ENOTDIR** if /proc/self/ns does not exists.
+ *
+ *		**-ENOMEM**  if allocation fails.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2853,7 +2875,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(get_current_pidns_info),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3604,4 +3627,10 @@ struct bpf_sockopt {
 	__s32	retval;
 };
 
+struct bpf_pidns_info {
+	__u32 dev;
+	__u32 nsid;
+	__u32 tgid;
+	__u32 pid;
+};
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3bd0f4a0336a..1f97b571b581 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_cgroup_storage test_select_reuseport test_section_names \
 	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
 	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
-	test_sockopt_multi test_tcp_rtt
+	test_sockopt_multi test_tcp_rtt test_pidns
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 8b503ea142f0..3fae3b9fcd2c 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
 static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
 					  int ip_len, void *tcp, int tcp_len) =
 	(void *) BPF_FUNC_tcp_gen_syncookie;
+static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
+					 unsigned int buf_size) =
+	(void *) BPF_FUNC_get_current_pidns_info;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
new file mode 100644
index 000000000000..e1d2facfa762
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h>
+#include <errno.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") nsidmap = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps") pidmap = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 1,
+};
+
+SEC("tracepoint/syscalls/sys_enter_nanosleep")
+int trace(void *ctx)
+{
+	struct bpf_pidns_info nsinfo;
+	__u32 key = 0, *expected_pid, *val;
+	char fmt[] = "ERROR nspid:%d\n";
+
+	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
+		return -EINVAL;
+
+	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
+
+
+	if (!expected_pid || *expected_pid != nsinfo.pid)
+		return 0;
+
+	val = bpf_map_lookup_elem(&nsidmap, &key);
+	if (val)
+		*val = nsinfo.nsid;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c
new file mode 100644
index 000000000000..a7254055f294
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_pidns.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/bpf.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "cgroup_helpers.h"
+#include "bpf_rlimit.h"
+
+#define CHECK(condition, tag, format...) ({		\
+	int __ret = !!(condition);			\
+	if (__ret) {					\
+		printf("%s:FAIL:%s ", __func__, tag);	\
+		printf(format);				\
+	} else {					\
+		printf("%s:PASS:%s\n", __func__, tag);	\
+	}						\
+	__ret;						\
+})
+
+static int bpf_find_map(const char *test, struct bpf_object *obj,
+			const char *name)
+{
+	struct bpf_map *map;
+
+	map = bpf_object__find_map_by_name(obj, name);
+	if (!map)
+		return -1;
+	return bpf_map__fd(map);
+}
+
+
+int main(int argc, char **argv)
+{
+	const char *probe_name = "syscalls/sys_enter_nanosleep";
+	const char *file = "test_pidns_kern.o";
+	int err, bytes, efd, prog_fd, pmu_fd;
+	int pidmap_fd, nsidmap_fd;
+	struct perf_event_attr attr = {};
+	struct bpf_object *obj;
+	__u32 knsid = 0;
+	__u32 key = 0, pid;
+	int exit_code = 1;
+	struct stat st;
+	char buf[256];
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
+		goto cleanup_cgroup_env;
+
+	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
+	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  nsidmap_fd, errno))
+		goto close_prog;
+
+	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
+	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  pidmap_fd, errno))
+		goto close_prog;
+
+	pid = getpid();
+	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
+
+	snprintf(buf, sizeof(buf),
+		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
+	efd = open(buf, O_RDONLY, 0);
+	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
+		  "bytes %d errno %d\n", bytes, errno))
+		goto close_prog;
+
+	attr.config = strtol(buf, NULL, 0);
+	attr.type = PERF_TYPE_TRACEPOINT;
+	attr.sample_type = PERF_SAMPLE_RAW;
+	attr.sample_period = 1;
+	attr.wakeup_events = 1;
+
+	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
+	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
+		  errno))
+		goto close_prog;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	/* trigger some syscalls */
+	sleep(1);
+
+	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
+	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
+		goto close_pmu;
+
+	if (stat("/proc/self/ns/pid", &st))
+		goto close_pmu;
+
+	if (CHECK(knsid != (__u32) st.st_ino, "compare_namespace_id",
+		  "kern knsid %u user unsid %u\n", knsid, (__u32) st.st_ino))
+		goto close_pmu;
+
+	exit_code = 0;
+	printf("%s:PASS\n", argv[0]);
+
+close_pmu:
+	close(pmu_fd);
+close_prog:
+	bpf_object__close(obj);
+cleanup_cgroup_env:
+	return exit_code;
+}
-- 
2.11.0



* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
@ 2019-08-13 22:35   ` Yonghong Song
  2019-08-20 15:10     ` Carlos Antonio Neira Bustos
  2019-08-13 23:11   ` Yonghong Song
  1 sibling, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2019-08-13 22:35 UTC (permalink / raw)
  To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf



On 8/13/19 11:47 AM, Carlos Neira wrote:
> From: Carlos <cneirabustos@gmail.com>
> 
> New bpf helper bpf_get_current_pidns_info.
> This helper obtains the active namespace from current and returns
> pid, tgid, device and namespace id as seen from that namespace,
> allowing to instrument a process inside a container.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   fs/internal.h            |  2 --
>   fs/namei.c               |  1 -
>   include/linux/bpf.h      |  1 +
>   include/linux/namei.h    |  4 +++
>   include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>   kernel/bpf/core.c        |  1 +
>   kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>   kernel/trace/bpf_trace.c |  2 ++
>   8 files changed, 102 insertions(+), 4 deletions(-)

I would prefer to break this into two patches to reduce
potential merge conflicts:
   patch 1: fs/internal.h, fs/namei.c, include/linux/namei.h
   patch 2: rest of changes
Patch 1 is simply a preparatory patch that makes filename_lookup
available for later use.

> 
> diff --git a/fs/internal.h b/fs/internal.h
> index 315fcd8d237c..6647e15dd419 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
>   /*
>    * namei.c
>    */
> -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> -			   struct path *path, struct path *root);
>   extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
>   extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
>   			   const char *, unsigned int, struct path *);
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..a89fc72a4a10 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -19,7 +19,6 @@
>   #include <linux/export.h>
>   #include <linux/kernel.h>
>   #include <linux/slab.h>
> -#include <linux/fs.h>
>   #include <linux/namei.h>
>   #include <linux/pagemap.h>
>   #include <linux/fsnotify.h>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f9a506147c8a..e4adf5e05afd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>   extern const struct bpf_func_proto bpf_strtol_proto;
>   extern const struct bpf_func_proto bpf_strtoul_proto;
>   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>   
>   /* Shared helpers among cBPF and eBPF. */
>   void bpf_user_rnd_init_once(void);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 9138b4471dbf..b45c8b6f7cb4 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -6,6 +6,7 @@
>   #include <linux/path.h>
>   #include <linux/fcntl.h>
>   #include <linux/errno.h>
> +#include <linux/fs.h>
>   
>   enum { MAX_NESTED_LINKS = 8 };
>   
> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>   
>   extern void nd_jump_link(struct path *path);
>   
> +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> +			   struct path *path, struct path *root);
> +
>   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>   {
>   	((char *) name)[min(len, maxlen)] = '\0';
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..db241857ec15 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,28 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)

size_of_pidns => size.

> + *	Description
> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
Copies => Copy.
Maybe something like below:
Get tgid, pid and namespace id as seen by the current namespace, and 
device major/minor numbers from device /proc/self/ns/pid. Such
information is stored in *pidns* of size *size*.

> + *		current namespace and also device from /proc/self/ns/pid.
> + *		*size_of_pidns* must be the size of *pidns*
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper returns always the

returns always => always returns.

> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + *		or tgid of the current task.
> + *
> + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> + *
> + *		**-ENOTDIR** if /proc/self/ns does not exists.

Let us remove ECHILD and ENOTDIR and replace them with ENOENT, as I
describe below.

Please *do verify* what happens when namespaces or pid_ns are not
configured.

> + *
> + *		**-ENOMEM**  if allocation fails.

helper internal allocation fails.

> + *
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2853,7 +2875,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;

Please add a comment for dev explaining how the device major and minor
numbers are derived. User space gets the device major and minor numbers
itself and needs to compare them with the corresponding major/minor
numbers returned by this helper.

> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..3159f2a0188c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>   
>   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>   {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..41fbf1f28a48 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,12 @@
>   #include <linux/uidgid.h>
>   #include <linux/filter.h>
>   #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/major.h>
> +#include <linux/stat.h>
> +#include <linux/namei.h>
> +#include <linux/version.h>
> +
>   
>   #include "../../lib/kstrtox.h"
>   
> @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>   	preempt_enable();
>   }
>   
> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> +	 size)
> +{
> +	const char *pidns_path = "/proc/self/ns/pid";
> +	struct pid_namespace *pidns = NULL;
> +	struct filename *tmp = NULL;

tmp => fname

> +	struct inode *inode;
> +	struct path kp;
> +	pid_t tgid = 0;
> +	pid_t pid = 0;
> +	int ret;
> +	int len;
> +
> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> +		return -EINVAL;

Please add an empty line here. As a general rule for readability,
add an empty line wherever control flow is interrupted, e.g., by
"return", "break" or "continue". At least this is what
I have seen most often on the bpf mailing list.

> +	pidns = task_active_pid_ns(current);
> +	if (unlikely(!pidns))
> +		goto clear;

An empty line. Also, there is nothing to clear.
I prefer an error code -ENOENT.

You can set
	ret = -EINVAL;
here

> +	pidns_info->nsid =  pidns->ns.inum;
> +	pid = task_pid_nr_ns(current, pidns);
> +	if (unlikely(!pid))
> +		goto clear;

An empty line.

> +	tgid = task_tgid_nr_ns(current, pidns);
> +	if (unlikely(!tgid))
> +		goto clear;

An empty line.

> +	pidns_info->tgid = (u32) tgid;
> +	pidns_info->pid = (u32) pid;

Different functionality, an empty line.

> +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> +	if (unlikely(!tmp)) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return -ENOMEM;

ret = -ENOMEM;
goto clear;

> +	}

An empty line.

> +	len = strlen(pidns_path) + 1;
> +	memcpy((char *)tmp->name, pidns_path, len);
> +	tmp->uptr = NULL;
> +	tmp->aname = NULL;
> +	tmp->refcnt = 1;
> +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
Add the line below to free the kmem cache memory:
	kmem_cache_free(names_cachep, fname);

We already checked task_active_pid_ns() above. If it did not
return NULL, we have a valid pid ns, so this filename_lookup()
should not fail. We can still keep the error checking, though.

> +	if (ret) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return ret;

goto clear;
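
Putting the comments on the allocation and lookup steps together, the
error path could look roughly like the user-space sketch below. It is
only an illustration: fill_pidns_info(), struct pidns_info_sketch and
the fail_step knob are hypothetical, malloc()/free() stand in for
kmem_cache_alloc()/kmem_cache_free(), and the lookup result is faked.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the UAPI struct; field order copied from the patch. */
struct pidns_info_sketch {
	uint32_t dev;
	uint32_t nsid;
	uint32_t tgid;
	uint32_t pid;
};

/*
 * Hypothetical stand-in for the helper body.
 * fail_step: 0 = no failure, 1 = allocation fails, 2 = lookup fails.
 */
static int fill_pidns_info(struct pidns_info_sketch *info, size_t size,
			   int fail_step)
{
	char *fname;
	int ret;

	assert(info != NULL);

	if (size != sizeof(*info))
		return -EINVAL;

	/* Placeholder values; the real helper reads them from current. */
	info->nsid = 1;
	info->tgid = 1;
	info->pid = 1;

	fname = (fail_step == 1) ? NULL : malloc(64);
	if (!fname) {
		ret = -ENOMEM;
		goto clear;
	}

	/* Stands in for filename_lookup(). */
	ret = (fail_step == 2) ? -ENOENT : 0;

	/* The name buffer is released on success and failure alike. */
	free(fname);

	if (ret)
		goto clear;

	info->dev = 4;

	return 0;

clear:
	/* Single exit for all failures: wipe partial results, keep ret. */
	memset(info, 0, size);
	return ret;
}
```

The point of the shape is that every failure sets ret and leaves
through the same label, so the buffer is wiped in exactly one place
and the temporary name is never leaked.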

> +	}

An empty line.

> +	inode = d_backing_inode(kp.dentry);
> +	pidns_info->dev = inode->i_sb->s_dev;
> +	return 0;

An empty line.

> +clear:
> +	memset((void *)pidns_info, 0, (size_t) size);
> +	return -EINVAL;
> +}
> +
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> +	.func		= bpf_get_current_pidns_info,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_CONST_SIZE,
> +};
> +
>   #ifdef CONFIG_CGROUPS
>   BPF_CALL_0(bpf_get_current_cgroup_id)
>   {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..5e1dc22765a5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   #endif
>   	case BPF_FUNC_send_signal:
>   		return &bpf_send_signal_proto;
> +	case BPF_FUNC_get_current_pidns_info:
> +		return &bpf_get_current_pidns_info_proto;
>   	default:
>   		return NULL;
>   	}
> 


* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
  2019-08-13 22:35   ` Yonghong Song
@ 2019-08-13 23:11   ` Yonghong Song
  2019-08-13 23:51     ` [Potential Spoof] " Yonghong Song
  2019-08-14  0:56     ` Carlos Antonio Neira Bustos
  1 sibling, 2 replies; 16+ messages in thread
From: Yonghong Song @ 2019-08-13 23:11 UTC (permalink / raw)
  To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf



On 8/13/19 11:47 AM, Carlos Neira wrote:
> From: Carlos <cneirabustos@gmail.com>
> 
> New bpf helper bpf_get_current_pidns_info.
> This helper obtains the active namespace from current and returns
> pid, tgid, device and namespace id as seen from that namespace,
> allowing to instrument a process inside a container.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   fs/internal.h            |  2 --
>   fs/namei.c               |  1 -
>   include/linux/bpf.h      |  1 +
>   include/linux/namei.h    |  4 +++
>   include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>   kernel/bpf/core.c        |  1 +
>   kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>   kernel/trace/bpf_trace.c |  2 ++
>   8 files changed, 102 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/internal.h b/fs/internal.h
> index 315fcd8d237c..6647e15dd419 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
>   /*
>    * namei.c
>    */
> -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> -			   struct path *path, struct path *root);
>   extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
>   extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
>   			   const char *, unsigned int, struct path *);
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..a89fc72a4a10 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -19,7 +19,6 @@
>   #include <linux/export.h>
>   #include <linux/kernel.h>
>   #include <linux/slab.h>
> -#include <linux/fs.h>
>   #include <linux/namei.h>
>   #include <linux/pagemap.h>
>   #include <linux/fsnotify.h>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f9a506147c8a..e4adf5e05afd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>   extern const struct bpf_func_proto bpf_strtol_proto;
>   extern const struct bpf_func_proto bpf_strtoul_proto;
>   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>   
>   /* Shared helpers among cBPF and eBPF. */
>   void bpf_user_rnd_init_once(void);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 9138b4471dbf..b45c8b6f7cb4 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -6,6 +6,7 @@
>   #include <linux/path.h>
>   #include <linux/fcntl.h>
>   #include <linux/errno.h>
> +#include <linux/fs.h>
>   
>   enum { MAX_NESTED_LINKS = 8 };
>   
> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>   
>   extern void nd_jump_link(struct path *path);
>   
> +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> +			   struct path *path, struct path *root);
> +
>   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>   {
>   	((char *) name)[min(len, maxlen)] = '\0';
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..db241857ec15 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,28 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + *	Description
> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> + *		current namespace and also device from /proc/self/ns/pid.
> + *		*size_of_pidns* must be the size of *pidns*
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper returns always the
> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + *		or tgid of the current task.
> + *
> + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> + *
> + *		**-ENOTDIR** if /proc/self/ns does not exists.
> + *
> + *		**-ENOMEM**  if allocation fails.
> + *
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2853,7 +2875,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;
> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..3159f2a0188c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>   
>   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>   {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..41fbf1f28a48 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,12 @@
>   #include <linux/uidgid.h>
>   #include <linux/filter.h>
>   #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/major.h>
> +#include <linux/stat.h>
> +#include <linux/namei.h>
> +#include <linux/version.h>
> +
>   
>   #include "../../lib/kstrtox.h"
>   
> @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>   	preempt_enable();
>   }
>   
> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> +	 size)
> +{
> +	const char *pidns_path = "/proc/self/ns/pid";
> +	struct pid_namespace *pidns = NULL;
> +	struct filename *tmp = NULL;
> +	struct inode *inode;
> +	struct path kp;
> +	pid_t tgid = 0;
> +	pid_t pid = 0;
> +	int ret;
> +	int len;

I am running your sample program and get the following kernel bug:

...
[   26.414825] BUG: sleeping function called from invalid context at /data/users/yhs/work/net-next/fs/dcache.c:843
[   26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
[   26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G        W 5.3.0-rc1+ #280
[   26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
[   26.419393] Call Trace:
[   26.419697]  <IRQ>
[   26.419960]  dump_stack+0x46/0x5b
[   26.420434]  ___might_sleep+0xe4/0x110
[   26.420894]  dput+0x2a/0x200
[   26.421265]  walk_component+0x10c/0x280
[   26.421773]  link_path_walk+0x327/0x560
[   26.422280]  ? proc_ns_dir_readdir+0x1a0/0x1a0
[   26.422848]  ? path_init+0x232/0x330
[   26.423364]  path_lookupat+0x88/0x200
[   26.423808]  ? selinux_parse_skb.constprop.69+0x124/0x430
[   26.424521]  filename_lookup+0xaf/0x190
[   26.425031]  ? simple_attr_release+0x20/0x20
[   26.425560]  bpf_get_current_pidns_info+0xfa/0x190
[   26.426168]  bpf_prog_83627154cefed596+0xe66/0x1000
[   26.426779]  trace_call_bpf+0xb5/0x160
[   26.427317]  ? __netif_receive_skb_core+0x1/0xbb0
[   26.427929]  ? __netif_receive_skb_core+0x1/0xbb0
[   26.428496]  kprobe_perf_func+0x4d/0x280
[   26.428986]  ? tracing_record_taskinfo_skip+0x1a/0x30
[   26.429584]  ? tracing_record_taskinfo+0xe/0x80
[   26.430152]  ? ttwu_do_wakeup.isra.114+0xcf/0xf0
[   26.430737]  ? __netif_receive_skb_core+0x1/0xbb0
[   26.431334]  ? __netif_receive_skb_core+0x5/0xbb0
[   26.431930]  kprobe_ftrace_handler+0x90/0xf0
[   26.432495]  ftrace_ops_assist_func+0x63/0x100
[   26.433060]  0xffffffffc03180bf
[   26.433471]  ? __netif_receive_skb_core+0x1/0xbb0
...

To avoid running in an arbitrary task context (e.g., the idle task),
which may introduce sleeping issues, the following check is
probably appropriate:

        if (in_nmi() || in_softirq())
                return -EPERM;

In any case, in nmi or softirq context the namespace and pid/tgid
we get are only accidentally associated with the context the bpf
program is running in; they may belong to a different context. So
such info is not reliable anyway.

> +
> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> +		return -EINVAL;
> +	pidns = task_active_pid_ns(current);
> +	if (unlikely(!pidns))
> +		goto clear;
> +	pidns_info->nsid =  pidns->ns.inum;
> +	pid = task_pid_nr_ns(current, pidns);
> +	if (unlikely(!pid))
> +		goto clear;
> +	tgid = task_tgid_nr_ns(current, pidns);
> +	if (unlikely(!tgid))
> +		goto clear;
> +	pidns_info->tgid = (u32) tgid;
> +	pidns_info->pid = (u32) pid;
> +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> +	if (unlikely(!tmp)) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return -ENOMEM;
> +	}
> +	len = strlen(pidns_path) + 1;
> +	memcpy((char *)tmp->name, pidns_path, len);
> +	tmp->uptr = NULL;
> +	tmp->aname = NULL;
> +	tmp->refcnt = 1;
> +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> +	if (ret) {
> +		memset((void *)pidns_info, 0, (size_t) size);
> +		return ret;
> +	}
> +	inode = d_backing_inode(kp.dentry);
> +	pidns_info->dev = inode->i_sb->s_dev;
> +	return 0;
> +clear:
> +	memset((void *)pidns_info, 0, (size_t) size);
> +	return -EINVAL;
> +}
> +
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> +	.func		= bpf_get_current_pidns_info,
> +	.gpl_only	= false,
> +	.ret_type	= RET_INTEGER,
> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> +	.arg2_type	= ARG_CONST_SIZE,
> +};
> +
>   #ifdef CONFIG_CGROUPS
>   BPF_CALL_0(bpf_get_current_cgroup_id)
>   {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..5e1dc22765a5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>   #endif
>   	case BPF_FUNC_send_signal:
>   		return &bpf_send_signal_proto;
> +	case BPF_FUNC_get_current_pidns_info:
> +		return &bpf_get_current_pidns_info_proto;
>   	default:
>   		return NULL;
>   	}
> 


* Re: [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper.
  2019-08-13 18:47 ` [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira
@ 2019-08-13 23:19   ` Yonghong Song
  0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-08-13 23:19 UTC (permalink / raw)
  To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf



On 8/13/19 11:47 AM, Carlos Neira wrote:
> From: Carlos <cneirabustos@gmail.com>
> 
> Added self-tests for new helper bpf_get_current_pidns_info.
> 
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
>   tools/include/uapi/linux/bpf.h                     |  31 ++++-
>   tools/testing/selftests/bpf/Makefile               |   2 +-
>   tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
>   .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++
>   tools/testing/selftests/bpf/test_pidns.c           | 138 +++++++++++++++++++++

Could you break this patch into two?
   patch 1: tools/include/uapi/linux/bpf.h
   patch 2: rest of changes

>   5 files changed, 223 insertions(+), 2 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
>   create mode 100644 tools/testing/selftests/bpf/test_pidns.c
> 
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4393bd4b2419..db241857ec15 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,28 @@ union bpf_attr {
>    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>    *
>    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + *	Description
> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> + *		current namespace and also device from /proc/self/ns/pid.
> + *		*size_of_pidns* must be the size of *pidns*
> + *
> + *		This helper is used when pid filtering is needed inside a
> + *		container as bpf_get_current_tgid() helper returns always the
> + *		pid id as seen by the root namespace.
> + *	Return
> + *		0 on success
> + *
> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + *		or tgid of the current task.
> + *
> + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> + *
> + *		**-ENOTDIR** if /proc/self/ns does not exists.
> + *
> + *		**-ENOMEM**  if allocation fails.
> + *
>    */
>   #define __BPF_FUNC_MAPPER(FN)		\
>   	FN(unspec),			\
> @@ -2853,7 +2875,8 @@ union bpf_attr {
>   	FN(sk_storage_get),		\
>   	FN(sk_storage_delete),		\
>   	FN(send_signal),		\
> -	FN(tcp_gen_syncookie),
> +	FN(tcp_gen_syncookie),		\
> +	FN(get_current_pidns_info),
>   
>   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>    * function eBPF program intends to call
> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
>   	__s32	retval;
>   };
>   
> +struct bpf_pidns_info {
> +	__u32 dev;
> +	__u32 nsid;
> +	__u32 tgid;
> +	__u32 pid;
> +};
>   #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 3bd0f4a0336a..1f97b571b581 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
>   	test_cgroup_storage test_select_reuseport test_section_names \
>   	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
>   	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> -	test_sockopt_multi test_tcp_rtt
> +	test_sockopt_multi test_tcp_rtt test_pidns
>   
>   BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
>   TEST_GEN_FILES = $(BPF_OBJ_FILES)
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 8b503ea142f0..3fae3b9fcd2c 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
>   static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
>   					  int ip_len, void *tcp, int tcp_len) =
>   	(void *) BPF_FUNC_tcp_gen_syncookie;
> +static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
> +					 unsigned int buf_size) =
> +	(void *) BPF_FUNC_get_current_pidns_info;
>   
>   /* llvm builtin functions that eBPF C program may use to
>    * emit BPF_LD_ABS and BPF_LD_IND instructions
> diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> new file mode 100644
> index 000000000000..e1d2facfa762
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> @@ -0,0 +1,51 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <linux/bpf.h>
> +#include <errno.h>
> +#include "bpf_helpers.h"
> +
> +struct bpf_map_def SEC("maps") nsidmap = {
> +	.type = BPF_MAP_TYPE_ARRAY,
> +	.key_size = sizeof(__u32),
> +	.value_size = sizeof(__u32),
> +	.max_entries = 1,
> +};
> +
> +struct bpf_map_def SEC("maps") pidmap = {
> +	.type = BPF_MAP_TYPE_ARRAY,
> +	.key_size = sizeof(__u32),
> +	.value_size = sizeof(__u32),
> +	.max_entries = 1,
> +};

Could you use the new map definitions? Search for
'SEC(".maps")' to find examples.

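For reference, a BTF-defined version of the two maps might look like
the sketch below. The SEC(), __uint() and __type() macros mirror the
ones in the selftests' bpf_helpers.h and are repeated here only so
that the fragment compiles on its own; MAP_UINT() is a made-up helper
that shows how the integer attributes are encoded.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <linux/types.h>

/* Local copies of the bpf_helpers.h macros, for a standalone build. */
#define SEC(name) __attribute__((section(name), used))
#define __uint(name, val) int (*name)[val]
#define __type(name, val) __typeof__(val) *name

/* Namespace id written by the bpf program, read back by user space. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} nsidmap SEC(".maps");

/* Expected pid, written by user space before the program runs. */
struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u32);
} pidmap SEC(".maps");

/*
 * Hypothetical helper: an attribute declared with __uint(name, val) is
 * a pointer to int[val], so val is the element count of the pointee.
 */
#define MAP_UINT(field) (sizeof(*(field)) / sizeof(int))

/* Key and value sizes are carried by the pointee types. */
static_assert(sizeof(*nsidmap.key) == sizeof(__u32), "4-byte key");
static_assert(sizeof(*pidmap.value) == sizeof(__u32), "4-byte value");
```

Compared with struct bpf_map_def, the key and value are described by
their types rather than by sizes, which lets the loader pick up the
map layout from BTF.
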
> +
> +SEC("tracepoint/syscalls/sys_enter_nanosleep")
> +int trace(void *ctx)
> +{
> +	struct bpf_pidns_info nsinfo;
> +	__u32 key = 0, *expected_pid, *val;
> +	char fmt[] = "ERROR nspid:%d\n";
> +
> +	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
> +		return -EINVAL;
> +
> +	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
> +
> +
> +	if (!expected_pid || *expected_pid != nsinfo.pid)
> +		return 0;
> 

I would like you to compare device major/minor, namespace id,
pid and tid. We should test everything here.

> +
> +	val = bpf_map_lookup_elem(&nsidmap, &key);
> +	if (val)
> +		*val = nsinfo.nsid;
> +
> +	return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;
> diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c
> new file mode 100644
> index 000000000000..a7254055f294
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_pidns.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "cgroup_helpers.h"
> +#include "bpf_rlimit.h"
> +
> +#define CHECK(condition, tag, format...) ({		\
> +	int __ret = !!(condition);			\
> +	if (__ret) {					\
> +		printf("%s:FAIL:%s ", __func__, tag);	\
> +		printf(format);				\
> +	} else {					\
> +		printf("%s:PASS:%s\n", __func__, tag);	\
> +	}						\
> +	__ret;						\
> +})
> +
> +static int bpf_find_map(const char *test, struct bpf_object *obj,
> +			const char *name)
> +{
> +	struct bpf_map *map;
> +
> +	map = bpf_object__find_map_by_name(obj, name);
> +	if (!map)
> +		return -1;
> +	return bpf_map__fd(map);
> +}
> +
> +
> +int main(int argc, char **argv)
> +{
> +	const char *probe_name = "syscalls/sys_enter_nanosleep";
> +	const char *file = "test_pidns_kern.o";
> +	int err, bytes, efd, prog_fd, pmu_fd;
> +	int pidmap_fd, nsidmap_fd;
> +	struct perf_event_attr attr = {};
> +	struct bpf_object *obj;
> +	__u32 knsid = 0;
> +	__u32 key = 0, pid;
> +	int exit_code = 1;
> +	struct stat st;
> +	char buf[256];
> +
> +	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
> +	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
> +		goto cleanup_cgroup_env;
> +
> +	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
> +	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  nsidmap_fd, errno))
> +		goto close_prog;
> +
> +	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
> +	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +		  pidmap_fd, errno))
> +		goto close_prog;
> +
> +	pid = getpid();
> +	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +
> +	snprintf(buf, sizeof(buf),
> +		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
> +	efd = open(buf, O_RDONLY, 0);
> +	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
> +		goto close_prog;
> +	bytes = read(efd, buf, sizeof(buf));
> +	close(efd);
> +	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
> +		  "bytes %d errno %d\n", bytes, errno))
> +		goto close_prog;

Please use the libbpf perf APIs.

It would be good if the test actually created a pid namespace and ran
the check inside it.

Do you think it is possible to use the existing test_progs
infrastructure? The current test, which does not create a pid
namespace, surely fits in. I am not sure about the variant that
creates and deletes a namespace, but I would think it should fit in
as well.

> +
> +	attr.config = strtol(buf, NULL, 0);
> +	attr.type = PERF_TYPE_TRACEPOINT;
> +	attr.sample_type = PERF_SAMPLE_RAW;
> +	attr.sample_period = 1;
> +	attr.wakeup_events = 1;
> +
> +	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
> +	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> +		  errno))
> +		goto close_prog;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
> +		  errno))
> +		goto close_pmu;
> +
> +	/* trigger some syscalls */
> +	sleep(1);
> +
> +	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
> +	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
> +		goto close_pmu;
> +
> +	if (stat("/proc/self/ns/pid", &st))
> +		goto close_pmu;
> +
> +	if (CHECK(knsid != (__u32) st.st_ino, "compare_namespace_id",
> +		  "kern knsid %u user unsid %u\n", knsid, (__u32) st.st_ino))
> +		goto close_pmu;
> +
> +	exit_code = 0;
> +	printf("%s:PASS\n", argv[0]);
> +
> +close_pmu:
> +	close(pmu_fd);
> +close_prog:
> +	bpf_object__close(obj);
> +cleanup_cgroup_env:
> +	return exit_code;
> +}
> 


* Re: [Potential Spoof] Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-13 23:11   ` Yonghong Song
@ 2019-08-13 23:51     ` Yonghong Song
  2019-08-14  0:56     ` Carlos Antonio Neira Bustos
  1 sibling, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-08-13 23:51 UTC (permalink / raw)
  To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf



On 8/13/19 4:11 PM, Yonghong Song wrote:
> 
> 
> On 8/13/19 11:47 AM, Carlos Neira wrote:
>> From: Carlos <cneirabustos@gmail.com>
>>
>> New bpf helper bpf_get_current_pidns_info.
>> This helper obtains the active namespace from current and returns
>> pid, tgid, device and namespace id as seen from that namespace,
>> allowing to instrument a process inside a container.
>>
>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
>> ---
>>    fs/internal.h            |  2 --
>>    fs/namei.c               |  1 -
>>    include/linux/bpf.h      |  1 +
>>    include/linux/namei.h    |  4 +++
>>    include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>>    kernel/bpf/core.c        |  1 +
>>    kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>>    kernel/trace/bpf_trace.c |  2 ++
>>    8 files changed, 102 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/internal.h b/fs/internal.h
>> index 315fcd8d237c..6647e15dd419 100644
>> --- a/fs/internal.h
>> +++ b/fs/internal.h
>> @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
>>    /*
>>     * namei.c
>>     */
>> -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
>> -			   struct path *path, struct path *root);
>>    extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
>>    extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
>>    			   const char *, unsigned int, struct path *);
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 209c51a5226c..a89fc72a4a10 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -19,7 +19,6 @@
>>    #include <linux/export.h>
>>    #include <linux/kernel.h>
>>    #include <linux/slab.h>
>> -#include <linux/fs.h>
>>    #include <linux/namei.h>
>>    #include <linux/pagemap.h>
>>    #include <linux/fsnotify.h>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index f9a506147c8a..e4adf5e05afd 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>>    extern const struct bpf_func_proto bpf_strtol_proto;
>>    extern const struct bpf_func_proto bpf_strtoul_proto;
>>    extern const struct bpf_func_proto bpf_tcp_sock_proto;
>> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>>    
>>    /* Shared helpers among cBPF and eBPF. */
>>    void bpf_user_rnd_init_once(void);
>> diff --git a/include/linux/namei.h b/include/linux/namei.h
>> index 9138b4471dbf..b45c8b6f7cb4 100644
>> --- a/include/linux/namei.h
>> +++ b/include/linux/namei.h
>> @@ -6,6 +6,7 @@
>>    #include <linux/path.h>
>>    #include <linux/fcntl.h>
>>    #include <linux/errno.h>
>> +#include <linux/fs.h>
>>    
>>    enum { MAX_NESTED_LINKS = 8 };
>>    
>> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>>    
>>    extern void nd_jump_link(struct path *path);
>>    
>> +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
>> +			   struct path *path, struct path *root);
>> +
>>    static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>>    {
>>    	((char *) name)[min(len, maxlen)] = '\0';
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 4393bd4b2419..db241857ec15 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -2741,6 +2741,28 @@ union bpf_attr {
>>     *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>>     *
>>     *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
>> + *
>> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
>> + *	Description
>> + *		Copies into *pidns* pid, namespace id and tgid as seen by the
>> + *		current namespace and also device from /proc/self/ns/pid.
>> + *		*size_of_pidns* must be the size of *pidns*
>> + *
>> + *		This helper is used when pid filtering is needed inside a
>> + *		container as bpf_get_current_tgid() helper returns always the
>> + *		pid id as seen by the root namespace.
>> + *	Return
>> + *		0 on success
>> + *
>> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
>> + *		or tgid of the current task.
>> + *
>> + *		**-ECHILD** if /proc/self/ns/pid does not exists.
>> + *
>> + *		**-ENOTDIR** if /proc/self/ns does not exists.
>> + *
>> + *		**-ENOMEM**  if allocation fails.
>> + *
>>     */
>>    #define __BPF_FUNC_MAPPER(FN)		\
>>    	FN(unspec),			\
>> @@ -2853,7 +2875,8 @@ union bpf_attr {
>>    	FN(sk_storage_get),		\
>>    	FN(sk_storage_delete),		\
>>    	FN(send_signal),		\
>> -	FN(tcp_gen_syncookie),
>> +	FN(tcp_gen_syncookie),		\
>> +	FN(get_current_pidns_info),
>>    
>>    /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>>     * function eBPF program intends to call
>> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
>>    	__s32	retval;
>>    };
>>    
>> +struct bpf_pidns_info {
>> +	__u32 dev;
>> +	__u32 nsid;
>> +	__u32 tgid;
>> +	__u32 pid;
>> +};
>>    #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 8191a7db2777..3159f2a0188c 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>>    const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>>    const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>>    const struct bpf_func_proto bpf_get_local_storage_proto __weak;
>> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>>    
>>    const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>>    {
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 5e28718928ca..41fbf1f28a48 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -11,6 +11,12 @@
>>    #include <linux/uidgid.h>
>>    #include <linux/filter.h>
>>    #include <linux/ctype.h>
>> +#include <linux/pid_namespace.h>
>> +#include <linux/major.h>
>> +#include <linux/stat.h>
>> +#include <linux/namei.h>
>> +#include <linux/version.h>
>> +
>>    
>>    #include "../../lib/kstrtox.h"
>>    
>> @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>>    	preempt_enable();
>>    }
>>    
>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
>> +	 size)
>> +{
>> +	const char *pidns_path = "/proc/self/ns/pid";
>> +	struct pid_namespace *pidns = NULL;
>> +	struct filename *tmp = NULL;
>> +	struct inode *inode;
>> +	struct path kp;
>> +	pid_t tgid = 0;
>> +	pid_t pid = 0;
>> +	int ret;
>> +	int len;
> 
> I am running your sample program and get the following kernel bug:
> 
> ...
> [   26.414825] BUG: sleeping function called from invalid context at /data/users/yhs/work/net-next/fs/dcache.c:843
> [   26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> [   26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G        W 5.3.0-rc1+ #280
> [   26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
> [   26.419393] Call Trace:
> [   26.419697]  <IRQ>
> [   26.419960]  dump_stack+0x46/0x5b
> [   26.420434]  ___might_sleep+0xe4/0x110
> [   26.420894]  dput+0x2a/0x200
> [   26.421265]  walk_component+0x10c/0x280
> [   26.421773]  link_path_walk+0x327/0x560
> [   26.422280]  ? proc_ns_dir_readdir+0x1a0/0x1a0
> [   26.422848]  ? path_init+0x232/0x330
> [   26.423364]  path_lookupat+0x88/0x200
> [   26.423808]  ? selinux_parse_skb.constprop.69+0x124/0x430
> [   26.424521]  filename_lookup+0xaf/0x190
> [   26.425031]  ? simple_attr_release+0x20/0x20
> [   26.425560]  bpf_get_current_pidns_info+0xfa/0x190
> [   26.426168]  bpf_prog_83627154cefed596+0xe66/0x1000
> [   26.426779]  trace_call_bpf+0xb5/0x160
> [   26.427317]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.427929]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.428496]  kprobe_perf_func+0x4d/0x280
> [   26.428986]  ? tracing_record_taskinfo_skip+0x1a/0x30
> [   26.429584]  ? tracing_record_taskinfo+0xe/0x80
> [   26.430152]  ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> [   26.430737]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.431334]  ? __netif_receive_skb_core+0x5/0xbb0
> [   26.431930]  kprobe_ftrace_handler+0x90/0xf0
> [   26.432495]  ftrace_ops_assist_func+0x63/0x100
> [   26.433060]  0xffffffffc03180bf
> [   26.433471]  ? __netif_receive_skb_core+0x1/0xbb0
> ...
> 
> To avoid running in an arbitrary task context (e.g., the idle task),
> which may introduce sleeping issues, the following is
> probably appropriate:
> 
>          if (in_nmi() || in_softirq())
>                  return -EPERM;

A better condition is (from helper bpf_probe_write_user()):
         if (unlikely(in_interrupt() ||
                      current->flags & (PF_KTHREAD | PF_EXITING)))
                 return -EPERM;
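
To make the effect of that condition concrete, here is a small user-space
model of the guard. It is only a sketch: struct ctx_model and
pidns_helper_guard() are hypothetical names, the boolean stands in for
in_interrupt(), and the PF_* values are copied from include/linux/sched.h
as I remember them, so please check them against your tree.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values as in include/linux/sched.h; copied here only for the model. */
#define PF_EXITING 0x00000004u
#define PF_KTHREAD 0x00200000u

/* Hypothetical stand-in for the state the kernel reads from `current`. */
struct ctx_model {
	bool in_interrupt;	/* models in_interrupt(): hardirq, softirq or NMI */
	unsigned int flags;	/* models current->flags */
};

/* Returns 0 when the helper may proceed, -EPERM otherwise. */
static int pidns_helper_guard(const struct ctx_model *c)
{
	if (c->in_interrupt || (c->flags & (PF_KTHREAD | PF_EXITING)))
		return -EPERM;

	return 0;
}
```

With this model, the softirq context from the trace above is rejected
before any path lookup can happen, and so are kernel threads and exiting
tasks, for which `current` says nothing useful about the traced process.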

> 
> Anyway, if in nmi or softirq, the namespace and pid/tgid
> we get may be just accidentally associated with the bpf running
> context, but it could be in a different context. So such info
> is not reliable anyway.
> 
>> +
>> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
>> +		return -EINVAL;
>> +	pidns = task_active_pid_ns(current);
>> +	if (unlikely(!pidns))
>> +		goto clear;
>> +	pidns_info->nsid =  pidns->ns.inum;
>> +	pid = task_pid_nr_ns(current, pidns);
>> +	if (unlikely(!pid))
>> +		goto clear;
>> +	tgid = task_tgid_nr_ns(current, pidns);
>> +	if (unlikely(!tgid))
>> +		goto clear;
>> +	pidns_info->tgid = (u32) tgid;
>> +	pidns_info->pid = (u32) pid;
>> +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
>> +	if (unlikely(!tmp)) {
>> +		memset((void *)pidns_info, 0, (size_t) size);
>> +		return -ENOMEM;
>> +	}
>> +	len = strlen(pidns_path) + 1;
>> +	memcpy((char *)tmp->name, pidns_path, len);
>> +	tmp->uptr = NULL;
>> +	tmp->aname = NULL;
>> +	tmp->refcnt = 1;
>> +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
>> +	if (ret) {
>> +		memset((void *)pidns_info, 0, (size_t) size);
>> +		return ret;
>> +	}
>> +	inode = d_backing_inode(kp.dentry);
>> +	pidns_info->dev = inode->i_sb->s_dev;
>> +	return 0;
>> +clear:
>> +	memset((void *)pidns_info, 0, (size_t) size);
>> +	return -EINVAL;
>> +}
>> +
>> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
>> +	.func		= bpf_get_current_pidns_info,
>> +	.gpl_only	= false,
>> +	.ret_type	= RET_INTEGER,
>> +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
>> +	.arg2_type	= ARG_CONST_SIZE,
>> +};
>> +
>>    #ifdef CONFIG_CGROUPS
>>    BPF_CALL_0(bpf_get_current_cgroup_id)
>>    {
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ca1255d14576..5e1dc22765a5 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>    #endif
>>    	case BPF_FUNC_send_signal:
>>    		return &bpf_send_signal_proto;
>> +	case BPF_FUNC_get_current_pidns_info:
>> +		return &bpf_get_current_pidns_info_proto;
>>    	default:
>>    		return NULL;
>>    	}
>>


* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-13 23:11   ` Yonghong Song
  2019-08-13 23:51     ` [Potential Spoof] " Yonghong Song
@ 2019-08-14  0:56     ` Carlos Antonio Neira Bustos
       [not found]       ` <9a2cacad-b79f-5d39-6d62-bb48cbaaac07@fb.com>
  1 sibling, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-08-14  0:56 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, ebiederm, brouer, bpf

On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
> 
> 
> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > From: Carlos <cneirabustos@gmail.com>
> > 
> > New bpf helper bpf_get_current_pidns_info.
> > This helper obtains the active namespace from current and returns
> > pid, tgid, device and namespace id as seen from that namespace,
> > allowing to instrument a process inside a container.
> > 
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > ---
> >   fs/internal.h            |  2 --
> >   fs/namei.c               |  1 -
> >   include/linux/bpf.h      |  1 +
> >   include/linux/namei.h    |  4 +++
> >   include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> >   kernel/bpf/core.c        |  1 +
> >   kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   kernel/trace/bpf_trace.c |  2 ++
> >   8 files changed, 102 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 315fcd8d237c..6647e15dd419 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
> >   /*
> >    * namei.c
> >    */
> > -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > -			   struct path *path, struct path *root);
> >   extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
> >   extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
> >   			   const char *, unsigned int, struct path *);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 209c51a5226c..a89fc72a4a10 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -19,7 +19,6 @@
> >   #include <linux/export.h>
> >   #include <linux/kernel.h>
> >   #include <linux/slab.h>
> > -#include <linux/fs.h>
> >   #include <linux/namei.h>
> >   #include <linux/pagemap.h>
> >   #include <linux/fsnotify.h>
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f9a506147c8a..e4adf5e05afd 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> >   extern const struct bpf_func_proto bpf_strtol_proto;
> >   extern const struct bpf_func_proto bpf_strtoul_proto;
> >   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> >   
> >   /* Shared helpers among cBPF and eBPF. */
> >   void bpf_user_rnd_init_once(void);
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 9138b4471dbf..b45c8b6f7cb4 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -6,6 +6,7 @@
> >   #include <linux/path.h>
> >   #include <linux/fcntl.h>
> >   #include <linux/errno.h>
> > +#include <linux/fs.h>
> >   
> >   enum { MAX_NESTED_LINKS = 8 };
> >   
> > @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
> >   
> >   extern void nd_jump_link(struct path *path);
> >   
> > +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > +			   struct path *path, struct path *root);
> > +
> >   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> >   {
> >   	((char *) name)[min(len, maxlen)] = '\0';
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4393bd4b2419..db241857ec15 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2741,6 +2741,28 @@ union bpf_attr {
> >    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> >    *
> >    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> > + *	Description
> > + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> > + *		current namespace and also device from /proc/self/ns/pid.
> > + *		*size_of_pidns* must be the size of *pidns*
> > + *
> > + *		This helper is used when pid filtering is needed inside a
> > + *		container as bpf_get_current_tgid() helper returns always the
> > + *		pid id as seen by the root namespace.
> > + *	Return
> > + *		0 on success
> > + *
> > + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + *		or tgid of the current task.
> > + *
> > + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> > + *
> > + *		**-ENOTDIR** if /proc/self/ns does not exists.
> > + *
> > + *		**-ENOMEM**  if allocation fails.
> > + *
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)		\
> >   	FN(unspec),			\
> > @@ -2853,7 +2875,8 @@ union bpf_attr {
> >   	FN(sk_storage_get),		\
> >   	FN(sk_storage_delete),		\
> >   	FN(send_signal),		\
> > -	FN(tcp_gen_syncookie),
> > +	FN(tcp_gen_syncookie),		\
> > +	FN(get_current_pidns_info),
> >   
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >    * function eBPF program intends to call
> > @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
> >   	__s32	retval;
> >   };
> >   
> > +struct bpf_pidns_info {
> > +	__u32 dev;
> > +	__u32 nsid;
> > +	__u32 tgid;
> > +	__u32 pid;
> > +};
> >   #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8191a7db2777..3159f2a0188c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> >   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> >   
> >   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> >   {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5e28718928ca..41fbf1f28a48 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -11,6 +11,12 @@
> >   #include <linux/uidgid.h>
> >   #include <linux/filter.h>
> >   #include <linux/ctype.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/major.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> > +
> >   
> >   #include "../../lib/kstrtox.h"
> >   
> > @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> >   	preempt_enable();
> >   }
> >   
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > +	 size)
> > +{
> > +	const char *pidns_path = "/proc/self/ns/pid";
> > +	struct pid_namespace *pidns = NULL;
> > +	struct filename *tmp = NULL;
> > +	struct inode *inode;
> > +	struct path kp;
> > +	pid_t tgid = 0;
> > +	pid_t pid = 0;
> > +	int ret;
> > +	int len;
> 

Thank you very much for catching this!
Could you share how to reproduce this bug?

> I am running your sample program and get the following kernel bug:
> 
> ...
> [   26.414825] BUG: sleeping function called from invalid context at /data/users/yhs/work/net-next/fs/dcache.c:843
> [   26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> [   26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G        W 5.3.0-rc1+ #280
> [   26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
> [   26.419393] Call Trace:
> [   26.419697]  <IRQ>
> [   26.419960]  dump_stack+0x46/0x5b
> [   26.420434]  ___might_sleep+0xe4/0x110
> [   26.420894]  dput+0x2a/0x200
> [   26.421265]  walk_component+0x10c/0x280
> [   26.421773]  link_path_walk+0x327/0x560
> [   26.422280]  ? proc_ns_dir_readdir+0x1a0/0x1a0
> [   26.422848]  ? path_init+0x232/0x330
> [   26.423364]  path_lookupat+0x88/0x200
> [   26.423808]  ? selinux_parse_skb.constprop.69+0x124/0x430
> [   26.424521]  filename_lookup+0xaf/0x190
> [   26.425031]  ? simple_attr_release+0x20/0x20
> [   26.425560]  bpf_get_current_pidns_info+0xfa/0x190
> [   26.426168]  bpf_prog_83627154cefed596+0xe66/0x1000
> [   26.426779]  trace_call_bpf+0xb5/0x160
> [   26.427317]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.427929]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.428496]  kprobe_perf_func+0x4d/0x280
> [   26.428986]  ? tracing_record_taskinfo_skip+0x1a/0x30
> [   26.429584]  ? tracing_record_taskinfo+0xe/0x80
> [   26.430152]  ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> [   26.430737]  ? __netif_receive_skb_core+0x1/0xbb0
> [   26.431334]  ? __netif_receive_skb_core+0x5/0xbb0
> [   26.431930]  kprobe_ftrace_handler+0x90/0xf0
> [   26.432495]  ftrace_ops_assist_func+0x63/0x100
> [   26.433060]  0xffffffffc03180bf
> [   26.433471]  ? __netif_receive_skb_core+0x1/0xbb0
> ...
> 
> To avoid running in an arbitrary task context (e.g., the idle task),
> which may introduce sleeping issues, the following is
> probably appropriate:
> 
>         if (in_nmi() || in_softirq())
>                 return -EPERM;
> 
> Anyway, if in nmi or softirq, the namespace and pid/tgid
> we get may be just accidentally associated with the bpf running
> context, but it could be in a different context. So such info
> is not reliable anyway.
> 
> > +
> > +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > +		return -EINVAL;
> > +	pidns = task_active_pid_ns(current);
> > +	if (unlikely(!pidns))
> > +		goto clear;
> > +	pidns_info->nsid =  pidns->ns.inum;
> > +	pid = task_pid_nr_ns(current, pidns);
> > +	if (unlikely(!pid))
> > +		goto clear;
> > +	tgid = task_tgid_nr_ns(current, pidns);
> > +	if (unlikely(!tgid))
> > +		goto clear;
> > +	pidns_info->tgid = (u32) tgid;
> > +	pidns_info->pid = (u32) pid;
> > +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> > +	if (unlikely(!tmp)) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return -ENOMEM;
> > +	}
> > +	len = strlen(pidns_path) + 1;
> > +	memcpy((char *)tmp->name, pidns_path, len);
> > +	tmp->uptr = NULL;
> > +	tmp->aname = NULL;
> > +	tmp->refcnt = 1;
> > +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> > +	if (ret) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return ret;
> > +	}
> > +	inode = d_backing_inode(kp.dentry);
> > +	pidns_info->dev = inode->i_sb->s_dev;
> > +	return 0;
> > +clear:
> > +	memset((void *)pidns_info, 0, (size_t) size);
> > +	return -EINVAL;
> > +}
> > +
> > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> > +	.func		= bpf_get_current_pidns_info,
> > +	.gpl_only	= false,
> > +	.ret_type	= RET_INTEGER,
> > +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> > +	.arg2_type	= ARG_CONST_SIZE,
> > +};
> > +
> >   #ifdef CONFIG_CGROUPS
> >   BPF_CALL_0(bpf_get_current_cgroup_id)
> >   {
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ca1255d14576..5e1dc22765a5 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >   #endif
> >   	case BPF_FUNC_send_signal:
> >   		return &bpf_send_signal_proto;
> > +	case BPF_FUNC_get_current_pidns_info:
> > +		return &bpf_get_current_pidns_info_proto;
> >   	default:
> >   		return NULL;
> >   	}
> > 


* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-13 22:35   ` Yonghong Song
@ 2019-08-20 15:10     ` Carlos Antonio Neira Bustos
  2019-08-20 17:29       ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-08-20 15:10 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, ebiederm, brouer, bpf

Hi Yonghong,

Thanks for taking the time to review this.


> > + *
> > + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + *		or tgid of the current task.
> > + *
> > + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> > + *
> > + *		**-ENOTDIR** if /proc/self/ns does not exists.
> 
> Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
> described below.
> 
> Please *do verify* what happens when namespaces or pid_ns are not
> configured.
>


I have tested kernel configurations without namespace support, and with
namespace support but without pid namespaces. The helper returns -EINVAL
in both cases; it should return -ENOENT instead.


> > +struct bpf_pidns_info {
> > +	__u32 dev;
> 
> Please add a comment for dev for how device major and minor number are 
> derived. User space gets device major and minor number, they need to
> compare to the corresponding major/minor numbers returned by this helper.
> 
> > +	__u32 nsid;
> > +	__u32 tgid;
> > +	__u32 pid;
> > +};
>

What do you think of this comment?

struct bpf_pidns_info {
	__u32 dev;	/* Major/minor numbers of the device behind
			 * /proc/self/ns/pid. User space obtains the major and
			 * minor numbers of the same device and compares them
			 * against the values returned by this helper.
			 */
	__u32 nsid;
	__u32 tgid;
	__u32 pid;
};
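
For the user-space side of that comparison, something along these lines
might work. It is a sketch under assumptions: kernel_dev_encode() and
ns_dev_ino() are hypothetical names, the encoding assumes the helper
reports the in-kernel dev_t layout (major << 20 | minor, as MKDEV() in
include/linux/kdev_t.h does), and stat() is used on the assumption that
the kernel-side lookup resolves the same object, which may turn out to
need lstat() instead.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* In-kernel dev_t layout: 12 bits of major above 20 bits of minor. */
static uint32_t kernel_dev_encode(uint32_t major_nr, uint32_t minor_nr)
{
	assert(major_nr < (1u << 12));
	assert(minor_nr < (1u << 20));

	return (major_nr << 20) | minor_nr;
}

/*
 * Fill *dev (kernel-style encoding) and *ino for the object at path.
 * Returns 0 on success, -1 if stat() fails.
 */
static int ns_dev_ino(const char *path, uint32_t *dev, uint64_t *ino)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;

	*dev = kernel_dev_encode(major(st.st_dev), minor(st.st_dev));
	*ino = (uint64_t)st.st_ino;

	return 0;
}
```

A tool would call ns_dev_ino("/proc/self/ns/pid", &dev, &ino) once at
startup and pass both values to the BPF program, which compares them with
the dev and nsid fields filled in by the helper.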

> 
> Please put an empty line. As a general rule for readability,
> put an empty line if control flow is interrupted, e.g., by
> "return", "break" or "continue". At least this is what
> I saw most in bpf mailing list.
>
I'll fix it in version 10.

> > +	len = strlen(pidns_path) + 1;
> > +	memcpy((char *)tmp->name, pidns_path, len);
> > +	tmp->uptr = NULL;
> > +	tmp->aname = NULL;
> > +	tmp->refcnt = 1;
> > +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> Adding below to free kmem cache memory
> 	kmem_cache_free(names_cachep, fname);
> 

I think we don't need to call kmem_cache_free(), as filename_lookup()
calls putname(), which in turn calls kmem_cache_free().
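
To illustrate the ownership rule I am relying on, here is a user-space
model. It is a sketch only: name_model, name_alloc(), name_put() and
lookup_model() are hypothetical stand-ins for struct filename, the
names_cachep allocation, putname() and filename_lookup(), and malloc/free
replace the slab cache.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct filename; not the kernel definition. */
struct name_model {
	int refcnt;
	char name[64];
};

static int live_names;	/* number of name_model objects not yet freed */

static struct name_model *name_alloc(const char *path)
{
	struct name_model *n = malloc(sizeof(*n));

	if (!n)
		return NULL;

	n->refcnt = 1;
	snprintf(n->name, sizeof(n->name), "%s", path);
	live_names++;

	return n;
}

/* Models putname(): drop one reference, free on the last one. */
static void name_put(struct name_model *n)
{
	if (--n->refcnt > 0)
		return;

	free(n);
	live_names--;
}

/* Models a lookup that consumes its name argument before returning. */
static int lookup_model(struct name_model *n)
{
	int ret = (n->name[0] == '/') ? 0 : -1;

	name_put(n);

	return ret;
}
```

Since the lookup drops the last reference itself, a caller that also
freed the name afterwards would free it twice. That is why I would rather
not add the extra kmem_cache_free().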


Thanks a lot for your help.

Bests

> In the above, we checked task_active_pid_ns().
> If not returning NULL, we have a valid pid ns. So the above
> filename_lookup should not go wrong. We can still keep
> the error checking though.
> 
> > +	if (ret) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return ret;
> 
>

I think we could get rid of this.


On Tue, Aug 13, 2019 at 10:35:42PM +0000, Yonghong Song wrote:
> 
> 
> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > From: Carlos <cneirabustos@gmail.com>
> > 
> > New bpf helper bpf_get_current_pidns_info.
> > This helper obtains the active namespace from current and returns
> > pid, tgid, device and namespace id as seen from that namespace,
> > allowing to instrument a process inside a container.
> > 
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > ---
> >   fs/internal.h            |  2 --
> >   fs/namei.c               |  1 -
> >   include/linux/bpf.h      |  1 +
> >   include/linux/namei.h    |  4 +++
> >   include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> >   kernel/bpf/core.c        |  1 +
> >   kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> >   kernel/trace/bpf_trace.c |  2 ++
> >   8 files changed, 102 insertions(+), 4 deletions(-)
> 
> I prefer to break this into two patches to reduce
> the potential merging conflicts:
>    patch 1: fs/internal.h, fs/namei.c, include/linux/namei.h
>    patch 2: rest of changes
> patch 1 is simply a preparatory patch that makes filename_lookup
> available later.
> 
> > 
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 315fcd8d237c..6647e15dd419 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
> >   /*
> >    * namei.c
> >    */
> > -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > -			   struct path *path, struct path *root);
> >   extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
> >   extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
> >   			   const char *, unsigned int, struct path *);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 209c51a5226c..a89fc72a4a10 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -19,7 +19,6 @@
> >   #include <linux/export.h>
> >   #include <linux/kernel.h>
> >   #include <linux/slab.h>
> > -#include <linux/fs.h>
> >   #include <linux/namei.h>
> >   #include <linux/pagemap.h>
> >   #include <linux/fsnotify.h>
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f9a506147c8a..e4adf5e05afd 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> >   extern const struct bpf_func_proto bpf_strtol_proto;
> >   extern const struct bpf_func_proto bpf_strtoul_proto;
> >   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> >   
> >   /* Shared helpers among cBPF and eBPF. */
> >   void bpf_user_rnd_init_once(void);
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 9138b4471dbf..b45c8b6f7cb4 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -6,6 +6,7 @@
> >   #include <linux/path.h>
> >   #include <linux/fcntl.h>
> >   #include <linux/errno.h>
> > +#include <linux/fs.h>
> >   
> >   enum { MAX_NESTED_LINKS = 8 };
> >   
> > @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
> >   
> >   extern void nd_jump_link(struct path *path);
> >   
> > +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > +			   struct path *path, struct path *root);
> > +
> >   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> >   {
> >   	((char *) name)[min(len, maxlen)] = '\0';
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4393bd4b2419..db241857ec15 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2741,6 +2741,28 @@ union bpf_attr {
> >    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> >    *
> >    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> 
> size_of_pidns => size.
> 
> > + *	Description
> > + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> Copies => Copy.
> Maybe something like below:
> Get tgid, pid and namespace id as seen by the current namespace, and 
> device major/minor numbers from device /proc/self/ns/pid. Such
> information is stored in *pidns* of size *size*.
> 
> > + *		current namespace and also device from /proc/self/ns/pid.
> > + *		*size_of_pidns* must be the size of *pidns*
> > + *
> > + *		This helper is used when pid filtering is needed inside a
> > + *		container as bpf_get_current_tgid() helper returns always the
> 
> returns always => always returns.
> 
> > + *		pid id as seen by the root namespace.
> > + *	Return
> > + *		0 on success
> > + *
> > + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + *		or tgid of the current task.
> > + *
> > + *		**-ECHILD** if /proc/self/ns/pid does not exists.
> > + *
> > + *		**-ENOTDIR** if /proc/self/ns does not exists.
> 
> Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
> described below.
> 
> Please *do verify* what happens when namespaces or pid_ns are not
> configured.
> 
> > + *
> > + *		**-ENOMEM**  if allocation fails.
> 
> helper internal allocation fails.
> 
> > + *
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)		\
> >   	FN(unspec),			\
> > @@ -2853,7 +2875,8 @@ union bpf_attr {
> >   	FN(sk_storage_get),		\
> >   	FN(sk_storage_delete),		\
> >   	FN(send_signal),		\
> > -	FN(tcp_gen_syncookie),
> > +	FN(tcp_gen_syncookie),		\
> > +	FN(get_current_pidns_info),
> >   
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >    * function eBPF program intends to call
> > @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
> >   	__s32	retval;
> >   };
> >   
> > +struct bpf_pidns_info {
> > +	__u32 dev;
> 
> Please add a comment for dev for how device major and minor number are 
> derived. User space gets device major and minor number, they need to
> compare to the corresponding major/minor numbers returned by this helper.
> 
> > +	__u32 nsid;
> > +	__u32 tgid;
> > +	__u32 pid;
> > +};
> >   #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8191a7db2777..3159f2a0188c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> >   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> >   
> >   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> >   {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5e28718928ca..41fbf1f28a48 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -11,6 +11,12 @@
> >   #include <linux/uidgid.h>
> >   #include <linux/filter.h>
> >   #include <linux/ctype.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/major.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> > +
> >   
> >   #include "../../lib/kstrtox.h"
> >   
> > @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> >   	preempt_enable();
> >   }
> >   
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > +	 size)
> > +{
> > +	const char *pidns_path = "/proc/self/ns/pid";
> > +	struct pid_namespace *pidns = NULL;
> > +	struct filename *tmp = NULL;
> 
> tmp => fname
> 
> > +	struct inode *inode;
> > +	struct path kp;
> > +	pid_t tgid = 0;
> > +	pid_t pid = 0;
> > +	int ret;
> > +	int len;
> > +
> > +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > +		return -EINVAL;
> 
> Please put an empty line. As a general rule for readability,
> put an empty line if control flow is interrupted, e.g., by
> "return", "break" or "continue". At least this is what
> I saw most in bpf mailing list.
> 
> > +	pidns = task_active_pid_ns(current);
> > +	if (unlikely(!pidns))
> > +		goto clear;
> 
> An empty line. Also, there is nothing to clear.
> I prefer an error code -ENOENT.
> 
> You can set
> 	ret = -EINVAL;
> here
> 
> > +	pidns_info->nsid =  pidns->ns.inum;
> > +	pid = task_pid_nr_ns(current, pidns);
> > +	if (unlikely(!pid))
> > +		goto clear;
> 
> An empty line.
> 
> > +	tgid = task_tgid_nr_ns(current, pidns);
> > +	if (unlikely(!tgid))
> > +		goto clear;
> 
> An empty line.
> 
> > +	pidns_info->tgid = (u32) tgid;
> > +	pidns_info->pid = (u32) pid;
> 
> Different functionality, an empty line.
> 
> > +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> > +	if (unlikely(!tmp)) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return -ENOMEM;
> 
> ret = -ENOMEM;
> goto clear;
> 
> > +	}
> 
> An empty line.
> 
> > +	len = strlen(pidns_path) + 1;
> > +	memcpy((char *)tmp->name, pidns_path, len);
> > +	tmp->uptr = NULL;
> > +	tmp->aname = NULL;
> > +	tmp->refcnt = 1;
> > +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> Adding below to free kmem cache memory
> 	kmem_cache_free(names_cachep, fname);
> 
> In the above, we checked task_active_pid_ns().
> If not returning NULL, we have a valid pid ns. So the above
> filename_lookup should not go wrong. We can still keep
> the error checking though.
> 
> > +	if (ret) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return ret;
> 
> goto clear;
> 
> > +	}
> 
> An empty line.
> 
> > +	inode = d_backing_inode(kp.dentry);
> > +	pidns_info->dev = inode->i_sb->s_dev;
> > +	return 0;
> 
> An empty line.
> 
> > +clear:
> > +	memset((void *)pidns_info, 0, (size_t) size);
> > +	return -EINVAL;
> > +}
> > +
> > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> > +	.func		= bpf_get_current_pidns_info,
> > +	.gpl_only	= false,
> > +	.ret_type	= RET_INTEGER,
> > +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> > +	.arg2_type	= ARG_CONST_SIZE,
> > +};
> > +
> >   #ifdef CONFIG_CGROUPS
> >   BPF_CALL_0(bpf_get_current_cgroup_id)
> >   {
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ca1255d14576..5e1dc22765a5 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >   #endif
> >   	case BPF_FUNC_send_signal:
> >   		return &bpf_send_signal_proto;
> > +	case BPF_FUNC_get_current_pidns_info:
> > +		return &bpf_get_current_pidns_info_proto;
> >   	default:
> >   		return NULL;
> >   	}
> > 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-20 15:10     ` Carlos Antonio Neira Bustos
@ 2019-08-20 17:29       ` Yonghong Song
  0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-08-20 17:29 UTC (permalink / raw)
  To: Carlos Antonio Neira Bustos; +Cc: netdev, ebiederm, brouer, bpf



On 8/20/19 8:10 AM, Carlos Antonio Neira Bustos wrote:
> Hi Yonghong,
> 
> Thanks for taking the time to review this.
> 
> 
>>> + *
>>> + *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
>>> + *		or tgid of the current task.
>>> + *
>>> + *		**-ECHILD** if /proc/self/ns/pid does not exist.
>>> + *
>>> + *		**-ENOTDIR** if /proc/self/ns does not exist.
>>
>> Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
>> described below.
>>
>> Please *do verify* what happens when namespaces or pid_ns are not
>> configured.
>>
> 
> 
> I have tested kernel configurations without namespace support, and with
> namespace support but without pid namespaces. The helper returns -EINVAL
> in both cases; it should now return -ENOENT.

Indeed. -ENOENT is better.

> 
> 
>>> +struct bpf_pidns_info {
>>> +	__u32 dev;
>>
>> Please add a comment for dev explaining how the device major and minor
>> numbers are derived. User space gets the device major and minor numbers and
>> needs to compare them to the corresponding major/minor numbers returned by
>> this helper.
>>
>>> +	__u32 nsid;
>>> +	__u32 tgid;
>>> +	__u32 pid;
>>> +};
>>
> 
> What do you think of this comment?
> 
> struct bpf_pidns_info {
> 	__u32 dev;	/* major/minor numbers from /proc/self/ns/pid.
> 			 * User space gets device major and minor numbers from
> 			 * the same device that need to be compared against the
> 			 * major/minor numbers returned by this helper.
> 			 */
> 	__u32 nsid;
> 	__u32 tgid;
> 	__u32 pid;
> };
> 

To be more specific, I would like a comment similar to the one below from uapi bpf.h

struct bpf_cgroup_dev_ctx {
         /* access_type encoded as (BPF_DEVCG_ACC_* << 16) | BPF_DEVCG_DEV_* */
         __u32 access_type;
         __u32 major;
         __u32 minor;
};

Something like:
	/* dev encoded as (major << 8 | (minor & 0xff)) */
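
As a rough illustration of that layout, here is a user-space sketch. The helper names are hypothetical and not part of the patch, and whether the value stored in dev really uses an 8-bit minor still has to be checked against how s_dev is encoded:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not from the patch: pack and unpack a device
 * number using the layout from the comment above, i.e. the major number
 * in the upper bits and an 8-bit minor number in the low byte.
 */
static inline uint32_t pidns_dev_encode(uint32_t major, uint32_t minor)
{
	return (major << 8) | (minor & 0xff);
}

static inline uint32_t pidns_dev_major(uint32_t dev)
{
	return dev >> 8;
}

static inline uint32_t pidns_dev_minor(uint32_t dev)
{
	return dev & 0xff;
}

/* Round-trip check; only meaningful for majors below 2^24. */
static inline void pidns_dev_check(uint32_t major, uint32_t minor)
{
	uint32_t dev = pidns_dev_encode(major, minor);

	assert(pidns_dev_major(dev) == major);
	assert(pidns_dev_minor(dev) == (minor & 0xff));
}
```

Note that with this layout a minor number above 255 does not survive the round trip, which is one reason the comment should state the encoding explicitly.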

>>
>> Please put an empty line. As a general rule for readability,
>> put an empty line if control flow is interrupted, e.g., by
>> "return", "break" or "continue". At least this is what
>> I have seen most often on the bpf mailing list.
>>
> I'll fix it in version 10.
> 
>>> +	len = strlen(pidns_path) + 1;
>>> +	memcpy((char *)tmp->name, pidns_path, len);
>>> +	tmp->uptr = NULL;
>>> +	tmp->aname = NULL;
>>> +	tmp->refcnt = 1;
>>> +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
>> Adding below to free kmem cache memory
>> 	kmem_cache_free(names_cachep, fname);
>>
> 
> I think we don't need to call kmem_cache_free as filename_lookup
> calls putname that calls kmem_cache_free.

Oh, right. Thanks for checking this.
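
The ownership rule we just agreed on (the lookup consumes the name, so the caller must not free it again) can be modelled in user space roughly as follows. This is a simplified sketch: none of the *_model names exist in the kernel, and malloc()/free() stand in for the names_cachep allocations.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Simplified user-space model; none of these names exist in the kernel. */
struct name_model {
	char *name;
	int refcnt;
};

static int names_freed;	/* counts releases, for illustration only */

static struct name_model *getname_model(const char *path)
{
	struct name_model *n = malloc(sizeof(*n));

	if (!n)
		return NULL;

	n->name = malloc(strlen(path) + 1);
	if (!n->name) {
		free(n);
		return NULL;
	}

	strcpy(n->name, path);
	n->refcnt = 1;

	return n;
}

/* Drops one reference and frees the object when the last one goes away. */
static void putname_model(struct name_model *n)
{
	assert(n->refcnt > 0);
	if (--n->refcnt > 0)
		return;

	free(n->name);
	free(n);
	names_freed++;
}

/* Consumes @n on success and on failure, so the caller must not free it. */
static int lookup_model(struct name_model *n)
{
	int ret = (n->name[0] == '/') ? 0 : -ENOENT;

	putname_model(n);

	return ret;
}
```

A caller that also freed the name after lookup_model() would release it twice, which is why the extra kmem_cache_free() is not needed.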

> 
> 
> Thanks a lot for your help.
> 
> Bests
> 
>> In the above, we checked task_active_pid_ns().
>> If not returning NULL, we have a valid pid ns. So the above
>> filename_lookup should not go wrong. We can still keep
>> the error checking though.
>>
>>> +	if (ret) {
>>> +		memset((void *)pidns_info, 0, (size_t) size);
>>> +		return ret;
>>
>>
> 
> I think we could get rid of this.
> 
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
       [not found]         ` <CACiB22jyN9=0ATWWE+x=BoWD6u+8KO+MvBfsFQmcNfkmANb2_w@mail.gmail.com>
@ 2019-08-28 20:39           ` Carlos Antonio Neira Bustos
  2019-08-28 20:53             ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-08-28 20:39 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, Eric Biederman, brouer, bpf

Yonghong, 

Thanks for the pointer. I fixed this bug, but I found another one that is now
triggered by the test program I included in tools/testing/selftests/bpf/test_pidns.
It seems that fname was not set up correctly when passing it to filename_lookup.
This is fixed now and I'm doing some more testing.
I think I'll remove the tests in samples/bpf, as they now mostly end with -EPERM,
as the fix intended.
Is it OK to remove them and just focus on finishing the selftests code?

Bests

On Wed, Aug 14, 2019 at 01:25:06AM -0400, carlos antonio neira bustos wrote:
> Thank you very much!
> 
> Bests
> 
> On Wed, Aug 14, 2019 at 00:50, Yonghong Song <yhs@fb.com> wrote:
> 
> >
> >
> > On 8/13/19 5:56 PM, Carlos Antonio Neira Bustos wrote:
> > > On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
> > >>
> > >>
> > >> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > >>> From: Carlos <cneirabustos@gmail.com>
> > >>>
> > >>> New bpf helper bpf_get_current_pidns_info.
> > >>> This helper obtains the active namespace from current and returns
> > >>> pid, tgid, device and namespace id as seen from that namespace,
> > >>> allowing to instrument a process inside a container.
> > >>>
> > >>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > >>> ---
> > >>>    fs/internal.h            |  2 --
> > >>>    fs/namei.c               |  1 -
> > >>>    include/linux/bpf.h      |  1 +
> > >>>    include/linux/namei.h    |  4 +++
> > >>>    include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> > >>>    kernel/bpf/core.c        |  1 +
> > >>>    kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> > >>>    kernel/trace/bpf_trace.c |  2 ++
> > >>>    8 files changed, 102 insertions(+), 4 deletions(-)
> > >>>
> > [...]
> > >>>
> > >>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > >>> +    size)
> > >>> +{
> > >>> +   const char *pidns_path = "/proc/self/ns/pid";
> > >>> +   struct pid_namespace *pidns = NULL;
> > >>> +   struct filename *tmp = NULL;
> > >>> +   struct inode *inode;
> > >>> +   struct path kp;
> > >>> +   pid_t tgid = 0;
> > >>> +   pid_t pid = 0;
> > >>> +   int ret;
> > >>> +   int len;
> > >>
> > >
> > > Thank you very much for catching this!
> > > Could you share how to replicate this bug?
> >
> > The config is attached. Just run trace_ns_info and you
> > can reproduce the issue.
> >
> > >
> > >> I am running your sample program and get the following kernel bug:
> > >>
> > >> ...
> > >> [   26.414825] BUG: sleeping function called from invalid context at /data/users/yhs/work/net-next/fs/dcache.c:843
> > >> [   26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> > >> [   26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G        W 5.3.0-rc1+ #280
> > >> [   26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
> > >> [   26.419393] Call Trace:
> > >> [   26.419697]  <IRQ>
> > >> [   26.419960]  dump_stack+0x46/0x5b
> > >> [   26.420434]  ___might_sleep+0xe4/0x110
> > >> [   26.420894]  dput+0x2a/0x200
> > >> [   26.421265]  walk_component+0x10c/0x280
> > >> [   26.421773]  link_path_walk+0x327/0x560
> > >> [   26.422280]  ? proc_ns_dir_readdir+0x1a0/0x1a0
> > >> [   26.422848]  ? path_init+0x232/0x330
> > >> [   26.423364]  path_lookupat+0x88/0x200
> > >> [   26.423808]  ? selinux_parse_skb.constprop.69+0x124/0x430
> > >> [   26.424521]  filename_lookup+0xaf/0x190
> > >> [   26.425031]  ? simple_attr_release+0x20/0x20
> > >> [   26.425560]  bpf_get_current_pidns_info+0xfa/0x190
> > >> [   26.426168]  bpf_prog_83627154cefed596+0xe66/0x1000
> > >> [   26.426779]  trace_call_bpf+0xb5/0x160
> > >> [   26.427317]  ? __netif_receive_skb_core+0x1/0xbb0
> > >> [   26.427929]  ? __netif_receive_skb_core+0x1/0xbb0
> > >> [   26.428496]  kprobe_perf_func+0x4d/0x280
> > >> [   26.428986]  ? tracing_record_taskinfo_skip+0x1a/0x30
> > >> [   26.429584]  ? tracing_record_taskinfo+0xe/0x80
> > >> [   26.430152]  ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> > >> [   26.430737]  ? __netif_receive_skb_core+0x1/0xbb0
> > >> [   26.431334]  ? __netif_receive_skb_core+0x5/0xbb0
> > >> [   26.431930]  kprobe_ftrace_handler+0x90/0xf0
> > >> [   26.432495]  ftrace_ops_assist_func+0x63/0x100
> > >> [   26.433060]  0xffffffffc03180bf
> > >> [   26.433471]  ? __netif_receive_skb_core+0x1/0xbb0
> > >> ...
> > >>
> > >> To avoid running in an arbitrary task (e.g., idle task)
> > >> context, which may introduce sleeping issues, the following
> > >> is probably appropriate:
> > >>
> > >>          if (in_nmi() || in_softirq())
> > >>                  return -EPERM;
> > >>
> > >> Anyway, if in nmi or softirq, the namespace and pid/tgid
> > >> we get may be just accidentally associated with the bpf running
> > >> context, but it could be in a different context. So such info
> > >> is not reliable anyway.
> > >>
> > >>> +
> > >>> +   if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > >>> +           return -EINVAL;
> > >>> +   pidns = task_active_pid_ns(current);
> > [...]
> >

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-28 20:39           ` Carlos Antonio Neira Bustos
@ 2019-08-28 20:53             ` Yonghong Song
  2019-08-28 21:03               ` Carlos Antonio Neira Bustos
  0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2019-08-28 20:53 UTC (permalink / raw)
  To: Carlos Antonio Neira Bustos; +Cc: netdev, Eric Biederman, brouer, bpf



On 8/28/19 1:39 PM, Carlos Antonio Neira Bustos wrote:
> Yonghong,
> 
> Thanks for the pointer. I fixed this bug, but I found another one that is now
> triggered by the test program I included in tools/testing/selftests/bpf/test_pidns.
> It seems that fname was not set up correctly when passing it to filename_lookup.
> This is fixed now and I'm doing some more testing.
> I think I'll remove the tests in samples/bpf, as they now mostly end with -EPERM,
> as the fix intended.
> Is it OK to remove them and just focus on finishing the selftests code?

Yes, the samples/bpf test case can be removed.
Could you create a selftest with tracepoint net/netif_receive_skb, which
also uses the proposed helper? net/netif_receive_skb will happen in
interrupt context and it should catch the issue as well if
filename_lookup still gets called in interrupt context.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-28 20:53             ` Yonghong Song
@ 2019-08-28 21:03               ` Carlos Antonio Neira Bustos
  2019-09-03 18:45                 ` Carlos Antonio Neira Bustos
  0 siblings, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-08-28 21:03 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, Eric Biederman, brouer, bpf

Thanks, I'll work on the net/netif_receive_skb selftest using this helper.
I hope to complete this work this week.

Bests.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-08-28 21:03               ` Carlos Antonio Neira Bustos
@ 2019-09-03 18:45                 ` Carlos Antonio Neira Bustos
  2019-09-03 20:36                   ` Yonghong Song
  0 siblings, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-09-03 18:45 UTC (permalink / raw)
  To: Yonghong Song; +Cc: netdev, Eric Biederman, brouer, bpf

Hi Yonghong,

> > Yes, the samples/bpf test case can be removed.
> > Could you create a selftest with tracepoint net/netif_receive_skb, which
> > also uses the proposed helper? net/netif_receive_skb will happen in
> > interrupt context and it should catch the issue as well if
> > filename_lookup still gets called in interrupt context.
>
For this scenario I just created another selftest whose only difference is
that the tracepoint is net/netif_receive_skb, so the helper fails with -EPERM.
Is that enough?

I have added this comment in include/uapi/linux/bpf.h; maybe it is too terse?

struct bpf_pidns_info {
	__u32 dev;	/* dev_t from /proc/self/ns/pid inode */
	__u32 nsid;
	__u32 tgid;
	__u32 pid;
};

I only need these questions cleared up before I submit v11 of this patch.

Bests


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
  2019-09-03 18:45                 ` Carlos Antonio Neira Bustos
@ 2019-09-03 20:36                   ` Yonghong Song
  0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-09-03 20:36 UTC (permalink / raw)
  To: Carlos Antonio Neira Bustos; +Cc: netdev, Eric Biederman, brouer, bpf



On 9/3/19 11:45 AM, Carlos Antonio Neira Bustos wrote:
> Hi Yonghong,
> 
>>> Yes, the samples/bpf test case can be removed.
>>> Could you create a selftest with tracepoint net/netif_receive_skb, which
>>> also uses the proposed helper? net/netif_receive_skb will happen in
>>> interrupt context and it should catch the issue as well if
>>> filename_lookup still gets called in interrupt context.
>>
> For this scenario I just created another selftest whose only difference is
> that the tracepoint is net/netif_receive_skb, so the helper fails with -EPERM.
> Is that enough?

This should be fine.
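
For reference, the ordering of checks that the selftest exercises can be modelled in user space roughly as below. This is only a sketch: struct ctx_model and get_pidns_info_model() are made-up names, the context flags stand in for in_nmi() and in_softirq(), the position of the guard before the size check is an assumption, and the placeholder values are not what the helper would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the field layout of struct bpf_pidns_info from the patch. */
struct pidns_info_model {
	uint32_t dev;
	uint32_t nsid;
	uint32_t tgid;
	uint32_t pid;
};

/* Hypothetical stand-in for the kernel's in_nmi()/in_softirq() state. */
struct ctx_model {
	bool in_nmi;
	bool in_softirq;
};

static int get_pidns_info_model(const struct ctx_model *ctx,
				struct pidns_info_model *info, uint32_t size)
{
	assert(ctx && info);

	/* Reject contexts where "current" is not meaningful and a path
	 * lookup could sleep; this is the case the new selftest targets.
	 */
	if (ctx->in_nmi || ctx->in_softirq)
		return -EPERM;

	if (size != sizeof(*info))
		return -EINVAL;

	/* Placeholder values; the real helper fills these from the task. */
	memset(info, 0, sizeof(*info));
	info->pid = 1;
	info->tgid = 1;

	return 0;
}
```

A test on net/netif_receive_skb would then correspond to the first branch, while the existing tests cover the other two.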

> 
> I have added this comment in include/uapi/linux/bpf.h; maybe it is too terse?
> 
> struct bpf_pidns_info {
> 	__u32 dev;	/* dev_t from /proc/self/ns/pid inode */
> 	__u32 nsid;
> 	__u32 tgid;
> 	__u32 pid;
> };

Let us keep the above for now. I may have further comments based on
your test code which uses "dev".
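
On the user-space side, I would expect the test to obtain the values to compare with stat(2) on /proc/self/ns/pid. A minimal sketch, assuming a Linux system with procfs mounted; struct pidns_id and read_pidns_id() are illustrative names, not part of the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Illustrative container for what user space can read about a pid ns. */
struct pidns_id {
	uint64_t ino;		/* namespace inode number */
	unsigned int dev_major;	/* major number of the backing device */
	unsigned int dev_minor;	/* minor number of the backing device */
};

/* Returns 0 on success or a negative errno value on failure. */
static int read_pidns_id(const char *path, struct pidns_id *out)
{
	struct stat st;

	assert(path && out);

	/* stat() follows the symlink, so st describes the namespace inode. */
	if (stat(path, &st) != 0)
		return -errno;

	out->ino = (uint64_t)st.st_ino;
	out->dev_major = major(st.st_dev);
	out->dev_minor = minor(st.st_dev);

	return 0;
}
```

How dev_major and dev_minor map onto the single __u32 dev returned by the helper is exactly what the comment in bpf.h should make clear.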

> 
> I only need these questions cleared up before I submit v11 of this patch.

Please go ahead and submit the new version.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-09-03 20:37 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
2019-08-13 18:47 [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
2019-08-13 22:35   ` Yonghong Song
2019-08-20 15:10     ` Carlos Antonio Neira Bustos
2019-08-20 17:29       ` Yonghong Song
2019-08-13 23:11   ` Yonghong Song
2019-08-13 23:51     ` [Potential Spoof] " Yonghong Song
2019-08-14  0:56     ` Carlos Antonio Neira Bustos
     [not found]       ` <9a2cacad-b79f-5d39-6d62-bb48cbaaac07@fb.com>
     [not found]         ` <CACiB22jyN9=0ATWWE+x=BoWD6u+8KO+MvBfsFQmcNfkmANb2_w@mail.gmail.com>
2019-08-28 20:39           ` Carlos Antonio Neira Bustos
2019-08-28 20:53             ` Yonghong Song
2019-08-28 21:03               ` Carlos Antonio Neira Bustos
2019-09-03 18:45                 ` Carlos Antonio Neira Bustos
2019-09-03 20:36                   ` Yonghong Song
2019-08-13 18:47 ` [PATCH bpf-next V9 2/3] samples/bpf: added sample code for bpf_get_current_pidns_info Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira
2019-08-13 23:19   ` Yonghong Song
