* [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task
@ 2019-08-13 18:47 Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Carlos Neira @ 2019-08-13 18:47 UTC (permalink / raw)
To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf
This helper obtains the active namespace from current and returns pid, tgid,
device and namespace id as seen from that namespace, allowing to instrument
a process inside a container.
Device is read from /proc/self/ns/pid, as in the future it's possible that
different pid_ns files may belong to different devices, according
to the discussion between Eric Biederman and Yonghong in 2017 linux plumbers
conference.
Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
scripts but this helper returns the pid as seen by the root namespace which is
fine when a bcc script is not executed inside a container.
When the process of interest is inside a container, pid filtering will not work
if bpf_get_current_pid_tgid() is used. This helper addresses this limitation
returning the pid as it's seen by the current namespace where the script is
executing.
This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
used to do pid filtering even inside a container.
For example a bcc script using bpf_get_current_pid_tgid() (tools/funccount.py):
u32 pid = bpf_get_current_pid_tgid() >> 32;
if (pid != <pid_arg_passed_in>)
return 0;
Could be modified to use bpf_get_current_pidns_info() as follows:
struct bpf_pidns pidns;
bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns));
u32 pid = pidns.tgid;
u32 nsid = pidns.nsid;
if ((pid != <pid_arg_passed_in>) && (nsid != <nsid_arg_passed_in>))
return 0;
To find out the name PID namespace id of a process, you could use this command:
$ ps -h -o pidns -p <pid_of_interest>
Or this other command:
$ ls -Li /proc/<pid_of_interest>/ns/pid
Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
Carlos Neira (3):
bpf: new helper to obtain namespace data from current task
samples/bpf: added sample code for bpf_get_current_pidns_info.
tools/testing/selftests/bpf: Add self-tests for new helper.
fs/internal.h | 2 -
fs/namei.c | 1 -
include/linux/bpf.h | 1 +
include/linux/namei.h | 4 +
include/uapi/linux/bpf.h | 31 ++++-
kernel/bpf/core.c | 1 +
kernel/bpf/helpers.c | 64 ++++++++++
kernel/trace/bpf_trace.c | 2 +
samples/bpf/Makefile | 3 +
samples/bpf/trace_ns_info_user.c | 35 ++++++
samples/bpf/trace_ns_info_user_kern.c | 44 +++++++
tools/include/uapi/linux/bpf.h | 31 ++++-
tools/testing/selftests/bpf/Makefile | 2 +-
tools/testing/selftests/bpf/bpf_helpers.h | 3 +
.../testing/selftests/bpf/progs/test_pidns_kern.c | 51 ++++++++
tools/testing/selftests/bpf/test_pidns.c | 138 +++++++++++++++++++++
16 files changed, 407 insertions(+), 6 deletions(-)
create mode 100644 samples/bpf/trace_ns_info_user.c
create mode 100644 samples/bpf/trace_ns_info_user_kern.c
create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
create mode 100644 tools/testing/selftests/bpf/test_pidns.c
--
2.11.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-13 18:47 [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task Carlos Neira
@ 2019-08-13 18:47 ` Carlos Neira
2019-08-13 22:35 ` Yonghong Song
2019-08-13 23:11 ` Yonghong Song
2019-08-13 18:47 ` [PATCH bpf-next V9 2/3] samples/bpf: added sample code for bpf_get_current_pidns_info Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira
2 siblings, 2 replies; 16+ messages in thread
From: Carlos Neira @ 2019-08-13 18:47 UTC (permalink / raw)
To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf
From: Carlos <cneirabustos@gmail.com>
New bpf helper bpf_get_current_pidns_info.
This helper obtains the active namespace from current and returns
pid, tgid, device and namespace id as seen from that namespace,
allowing to instrument a process inside a container.
Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
fs/internal.h | 2 --
fs/namei.c | 1 -
include/linux/bpf.h | 1 +
include/linux/namei.h | 4 +++
include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
kernel/bpf/core.c | 1 +
kernel/bpf/helpers.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
kernel/trace/bpf_trace.c | 2 ++
8 files changed, 102 insertions(+), 4 deletions(-)
diff --git a/fs/internal.h b/fs/internal.h
index 315fcd8d237c..6647e15dd419 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
/*
* namei.c
*/
-extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
- struct path *path, struct path *root);
extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
const char *, unsigned int, struct path *);
diff --git a/fs/namei.c b/fs/namei.c
index 209c51a5226c..a89fc72a4a10 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -19,7 +19,6 @@
#include <linux/export.h>
#include <linux/kernel.h>
#include <linux/slab.h>
-#include <linux/fs.h>
#include <linux/namei.h>
#include <linux/pagemap.h>
#include <linux/fsnotify.h>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f9a506147c8a..e4adf5e05afd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
extern const struct bpf_func_proto bpf_strtol_proto;
extern const struct bpf_func_proto bpf_strtoul_proto;
extern const struct bpf_func_proto bpf_tcp_sock_proto;
+extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
/* Shared helpers among cBPF and eBPF. */
void bpf_user_rnd_init_once(void);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 9138b4471dbf..b45c8b6f7cb4 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -6,6 +6,7 @@
#include <linux/path.h>
#include <linux/fcntl.h>
#include <linux/errno.h>
+#include <linux/fs.h>
enum { MAX_NESTED_LINKS = 8 };
@@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
extern void nd_jump_link(struct path *path);
+extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
+ struct path *path, struct path *root);
+
static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
((char *) name)[min(len, maxlen)] = '\0';
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4393bd4b2419..db241857ec15 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2741,6 +2741,28 @@ union bpf_attr {
* **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
*
* **-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
+ * Description
+ * Copies into *pidns* pid, namespace id and tgid as seen by the
+ * current namespace and also device from /proc/self/ns/pid.
+ * *size_of_pidns* must be the size of *pidns*
+ *
+ * This helper is used when pid filtering is needed inside a
+ * container as bpf_get_current_tgid() helper returns always the
+ * pid id as seen by the root namespace.
+ * Return
+ * 0 on success
+ *
+ * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
+ * or tgid of the current task.
+ *
+ * **-ECHILD** if /proc/self/ns/pid does not exists.
+ *
+ * **-ENOTDIR** if /proc/self/ns does not exists.
+ *
+ * **-ENOMEM** if allocation fails.
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2853,7 +2875,8 @@ union bpf_attr {
FN(sk_storage_get), \
FN(sk_storage_delete), \
FN(send_signal), \
- FN(tcp_gen_syncookie),
+ FN(tcp_gen_syncookie), \
+ FN(get_current_pidns_info),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -3604,4 +3627,10 @@ struct bpf_sockopt {
__s32 retval;
};
+struct bpf_pidns_info {
+ __u32 dev;
+ __u32 nsid;
+ __u32 tgid;
+ __u32 pid;
+};
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8191a7db2777..3159f2a0188c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
const struct bpf_func_proto bpf_get_current_comm_proto __weak;
const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
const struct bpf_func_proto bpf_get_local_storage_proto __weak;
+const struct bpf_func_proto bpf_get_current_pidns_info __weak;
const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
{
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5e28718928ca..41fbf1f28a48 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -11,6 +11,12 @@
#include <linux/uidgid.h>
#include <linux/filter.h>
#include <linux/ctype.h>
+#include <linux/pid_namespace.h>
+#include <linux/major.h>
+#include <linux/stat.h>
+#include <linux/namei.h>
+#include <linux/version.h>
+
#include "../../lib/kstrtox.h"
@@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
preempt_enable();
}
+BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
+ size)
+{
+ const char *pidns_path = "/proc/self/ns/pid";
+ struct pid_namespace *pidns = NULL;
+ struct filename *tmp = NULL;
+ struct inode *inode;
+ struct path kp;
+ pid_t tgid = 0;
+ pid_t pid = 0;
+ int ret;
+ int len;
+
+ if (unlikely(size != sizeof(struct bpf_pidns_info)))
+ return -EINVAL;
+ pidns = task_active_pid_ns(current);
+ if (unlikely(!pidns))
+ goto clear;
+ pidns_info->nsid = pidns->ns.inum;
+ pid = task_pid_nr_ns(current, pidns);
+ if (unlikely(!pid))
+ goto clear;
+ tgid = task_tgid_nr_ns(current, pidns);
+ if (unlikely(!tgid))
+ goto clear;
+ pidns_info->tgid = (u32) tgid;
+ pidns_info->pid = (u32) pid;
+ tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
+ if (unlikely(!tmp)) {
+ memset((void *)pidns_info, 0, (size_t) size);
+ return -ENOMEM;
+ }
+ len = strlen(pidns_path) + 1;
+ memcpy((char *)tmp->name, pidns_path, len);
+ tmp->uptr = NULL;
+ tmp->aname = NULL;
+ tmp->refcnt = 1;
+ ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
+ if (ret) {
+ memset((void *)pidns_info, 0, (size_t) size);
+ return ret;
+ }
+ inode = d_backing_inode(kp.dentry);
+ pidns_info->dev = inode->i_sb->s_dev;
+ return 0;
+clear:
+ memset((void *)pidns_info, 0, (size_t) size);
+ return -EINVAL;
+}
+
+const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
+ .func = bpf_get_current_pidns_info,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg2_type = ARG_CONST_SIZE,
+};
+
#ifdef CONFIG_CGROUPS
BPF_CALL_0(bpf_get_current_cgroup_id)
{
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1255d14576..5e1dc22765a5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
#endif
case BPF_FUNC_send_signal:
return &bpf_send_signal_proto;
+ case BPF_FUNC_get_current_pidns_info:
+ return &bpf_get_current_pidns_info_proto;
default:
return NULL;
}
--
2.11.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH bpf-next V9 2/3] samples/bpf: added sample code for bpf_get_current_pidns_info.
2019-08-13 18:47 [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
@ 2019-08-13 18:47 ` Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira
2 siblings, 0 replies; 16+ messages in thread
From: Carlos Neira @ 2019-08-13 18:47 UTC (permalink / raw)
To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf
From: Carlos <cneirabustos@gmail.com>
sample program to call new bpf helper bpf_get_current_pidns_info.
Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
samples/bpf/Makefile | 3 +++
samples/bpf/trace_ns_info_user.c | 35 ++++++++++++++++++++++++++++
samples/bpf/trace_ns_info_user_kern.c | 44 +++++++++++++++++++++++++++++++++++
3 files changed, 82 insertions(+)
create mode 100644 samples/bpf/trace_ns_info_user.c
create mode 100644 samples/bpf/trace_ns_info_user_kern.c
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 1d9be26b4edd..238453ff27d2 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -53,6 +53,7 @@ hostprogs-y += task_fd_query
hostprogs-y += xdp_sample_pkts
hostprogs-y += ibumad
hostprogs-y += hbm
+hostprogs-y += trace_ns_info
# Libbpf dependencies
LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -109,6 +110,7 @@ task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)
ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS)
hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS)
+trace_ns_info-objs := bpf_load.o trace_ns_info_user.o
# Tell kbuild to always build the programs
always := $(hostprogs-y)
@@ -170,6 +172,7 @@ always += xdp_sample_pkts_kern.o
always += ibumad_kern.o
always += hbm_out_kern.o
always += hbm_edt_kern.o
+always += trace_ns_info_user_kern.o
KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/bpf/
diff --git a/samples/bpf/trace_ns_info_user.c b/samples/bpf/trace_ns_info_user.c
new file mode 100644
index 000000000000..e06d08db6f30
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <linux/bpf.h>
+#include <unistd.h>
+#include "bpf/libbpf.h"
+#include "bpf_load.h"
+
+/* This code was taken verbatim from tracex1_user.c, it's used
+ * to exercize bpf_get_current_pidns_info() helper call.
+ */
+int main(int ac, char **argv)
+{
+ FILE *f;
+ char filename[256];
+
+ snprintf(filename, sizeof(filename), "%s_user_kern.o", argv[0]);
+ printf("loading %s\n", filename);
+
+ if (load_bpf_file(filename)) {
+ printf("%s", bpf_log_buf);
+ return 1;
+ }
+
+ f = popen("taskset 1 ping localhost", "r");
+ (void) f;
+ read_trace_pipe();
+ return 0;
+}
diff --git a/samples/bpf/trace_ns_info_user_kern.c b/samples/bpf/trace_ns_info_user_kern.c
new file mode 100644
index 000000000000..96675e02b707
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user_kern.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+typedef __u64 u64;
+typedef __u32 u32;
+
+
+/* kprobe is NOT a stable ABI
+ * kernel functions can be removed, renamed or completely change semantics.
+ * Number of arguments and their positions can change, etc.
+ * In such case this bpf+kprobe example will no longer be meaningful
+ */
+
+/* This will call bpf_get_current_pidns_info() to display pid and ns values
+ * as seen by the current namespace, on the far left you will see the pid as
+ * seen as by the root namespace.
+ */
+
+SEC("kprobe/__netif_receive_skb_core")
+int bpf_prog1(struct pt_regs *ctx)
+{
+ char fmt[] = "nsid:%u, dev: %u, pid:%u\n";
+ struct bpf_pidns_info nsinfo;
+ int ok = 0;
+
+ ok = bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo));
+ if (ok == 0)
+ bpf_trace_printk(fmt, sizeof(fmt), (u32)nsinfo.nsid,
+ (u32) nsinfo.dev, (u32)nsinfo.pid);
+
+ return 0;
+}
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
--
2.11.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper.
2019-08-13 18:47 [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 2/3] samples/bpf: added sample code for bpf_get_current_pidns_info Carlos Neira
@ 2019-08-13 18:47 ` Carlos Neira
2019-08-13 23:19 ` Yonghong Song
2 siblings, 1 reply; 16+ messages in thread
From: Carlos Neira @ 2019-08-13 18:47 UTC (permalink / raw)
To: netdev; +Cc: yhs, ebiederm, brouer, cneirabustos, bpf
From: Carlos <cneirabustos@gmail.com>
Added self-tests for new helper bpf_get_current_pidns_info.
Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
tools/include/uapi/linux/bpf.h | 31 ++++-
tools/testing/selftests/bpf/Makefile | 2 +-
tools/testing/selftests/bpf/bpf_helpers.h | 3 +
.../testing/selftests/bpf/progs/test_pidns_kern.c | 51 ++++++++
tools/testing/selftests/bpf/test_pidns.c | 138 +++++++++++++++++++++
5 files changed, 223 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
create mode 100644 tools/testing/selftests/bpf/test_pidns.c
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4393bd4b2419..db241857ec15 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2741,6 +2741,28 @@ union bpf_attr {
* **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
*
* **-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
+ * Description
+ * Copies into *pidns* pid, namespace id and tgid as seen by the
+ * current namespace and also device from /proc/self/ns/pid.
+ * *size_of_pidns* must be the size of *pidns*
+ *
+ * This helper is used when pid filtering is needed inside a
+ * container as bpf_get_current_tgid() helper returns always the
+ * pid id as seen by the root namespace.
+ * Return
+ * 0 on success
+ *
+ * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
+ * or tgid of the current task.
+ *
+ * **-ECHILD** if /proc/self/ns/pid does not exists.
+ *
+ * **-ENOTDIR** if /proc/self/ns does not exists.
+ *
+ * **-ENOMEM** if allocation fails.
+ *
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2853,7 +2875,8 @@ union bpf_attr {
FN(sk_storage_get), \
FN(sk_storage_delete), \
FN(send_signal), \
- FN(tcp_gen_syncookie),
+ FN(tcp_gen_syncookie), \
+ FN(get_current_pidns_info),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -3604,4 +3627,10 @@ struct bpf_sockopt {
__s32 retval;
};
+struct bpf_pidns_info {
+ __u32 dev;
+ __u32 nsid;
+ __u32 tgid;
+ __u32 pid;
+};
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3bd0f4a0336a..1f97b571b581 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
test_cgroup_storage test_select_reuseport test_section_names \
test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
- test_sockopt_multi test_tcp_rtt
+ test_sockopt_multi test_tcp_rtt test_pidns
BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
TEST_GEN_FILES = $(BPF_OBJ_FILES)
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 8b503ea142f0..3fae3b9fcd2c 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
int ip_len, void *tcp, int tcp_len) =
(void *) BPF_FUNC_tcp_gen_syncookie;
+static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
+ unsigned int buf_size) =
+ (void *) BPF_FUNC_get_current_pidns_info;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
new file mode 100644
index 000000000000..e1d2facfa762
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h>
+#include <errno.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") nsidmap = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(__u32),
+ .max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps") pidmap = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(__u32),
+ .value_size = sizeof(__u32),
+ .max_entries = 1,
+};
+
+SEC("tracepoint/syscalls/sys_enter_nanosleep")
+int trace(void *ctx)
+{
+ struct bpf_pidns_info nsinfo;
+ __u32 key = 0, *expected_pid, *val;
+ char fmt[] = "ERROR nspid:%d\n";
+
+ if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
+ return -EINVAL;
+
+ expected_pid = bpf_map_lookup_elem(&pidmap, &key);
+
+
+ if (!expected_pid || *expected_pid != nsinfo.pid)
+ return 0;
+
+ val = bpf_map_lookup_elem(&nsidmap, &key);
+ if (val)
+ *val = nsinfo.nsid;
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c
new file mode 100644
index 000000000000..a7254055f294
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_pidns.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/bpf.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "cgroup_helpers.h"
+#include "bpf_rlimit.h"
+
+#define CHECK(condition, tag, format...) ({ \
+ int __ret = !!(condition); \
+ if (__ret) { \
+ printf("%s:FAIL:%s ", __func__, tag); \
+ printf(format); \
+ } else { \
+ printf("%s:PASS:%s\n", __func__, tag); \
+ } \
+ __ret; \
+})
+
+static int bpf_find_map(const char *test, struct bpf_object *obj,
+ const char *name)
+{
+ struct bpf_map *map;
+
+ map = bpf_object__find_map_by_name(obj, name);
+ if (!map)
+ return -1;
+ return bpf_map__fd(map);
+}
+
+
+int main(int argc, char **argv)
+{
+ const char *probe_name = "syscalls/sys_enter_nanosleep";
+ const char *file = "test_pidns_kern.o";
+ int err, bytes, efd, prog_fd, pmu_fd;
+ int pidmap_fd, nsidmap_fd;
+ struct perf_event_attr attr = {};
+ struct bpf_object *obj;
+ __u32 knsid = 0;
+ __u32 key = 0, pid;
+ int exit_code = 1;
+ struct stat st;
+ char buf[256];
+
+ err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+ if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
+ goto cleanup_cgroup_env;
+
+ nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
+ if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+ nsidmap_fd, errno))
+ goto close_prog;
+
+ pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
+ if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+ pidmap_fd, errno))
+ goto close_prog;
+
+ pid = getpid();
+ bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
+
+ snprintf(buf, sizeof(buf),
+ "/sys/kernel/debug/tracing/events/%s/id", probe_name);
+ efd = open(buf, O_RDONLY, 0);
+ if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+ goto close_prog;
+ bytes = read(efd, buf, sizeof(buf));
+ close(efd);
+ if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
+ "bytes %d errno %d\n", bytes, errno))
+ goto close_prog;
+
+ attr.config = strtol(buf, NULL, 0);
+ attr.type = PERF_TYPE_TRACEPOINT;
+ attr.sample_type = PERF_SAMPLE_RAW;
+ attr.sample_period = 1;
+ attr.wakeup_events = 1;
+
+ pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
+ if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
+ errno))
+ goto close_prog;
+
+ err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+ if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
+ errno))
+ goto close_pmu;
+
+ err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+ if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
+ errno))
+ goto close_pmu;
+
+ /* trigger some syscalls */
+ sleep(1);
+
+ err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
+ if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
+ goto close_pmu;
+
+ if (stat("/proc/self/ns/pid", &st))
+ goto close_pmu;
+
+ if (CHECK(knsid != (__u32) st.st_ino, "compare_namespace_id",
+ "kern knsid %u user unsid %u\n", knsid, (__u32) st.st_ino))
+ goto close_pmu;
+
+ exit_code = 0;
+ printf("%s:PASS\n", argv[0]);
+
+close_pmu:
+ close(pmu_fd);
+close_prog:
+ bpf_object__close(obj);
+cleanup_cgroup_env:
+ return exit_code;
+}
--
2.11.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
@ 2019-08-13 22:35 ` Yonghong Song
2019-08-20 15:10 ` Carlos Antonio Neira Bustos
2019-08-13 23:11 ` Yonghong Song
1 sibling, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2019-08-13 22:35 UTC (permalink / raw)
To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf
On 8/13/19 11:47 AM, Carlos Neira wrote:
> From: Carlos <cneirabustos@gmail.com>
>
> New bpf helper bpf_get_current_pidns_info.
> This helper obtains the active namespace from current and returns
> pid, tgid, device and namespace id as seen from that namespace,
> allowing to instrument a process inside a container.
>
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
> fs/internal.h | 2 --
> fs/namei.c | 1 -
> include/linux/bpf.h | 1 +
> include/linux/namei.h | 4 +++
> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> kernel/bpf/core.c | 1 +
> kernel/bpf/helpers.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> kernel/trace/bpf_trace.c | 2 ++
> 8 files changed, 102 insertions(+), 4 deletions(-)
I prefer to break this into two patches to reduce
the potential merging conflicts:
patch 1: fs/internal.h, fs/namei.c, include/linux/namei.h
patch 2: rest of changes
patch 1 is simply a preparing patches to make filename_lookup
available later.
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 315fcd8d237c..6647e15dd419 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
> /*
> * namei.c
> */
> -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> - struct path *path, struct path *root);
> extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
> extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
> const char *, unsigned int, struct path *);
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..a89fc72a4a10 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -19,7 +19,6 @@
> #include <linux/export.h>
> #include <linux/kernel.h>
> #include <linux/slab.h>
> -#include <linux/fs.h>
> #include <linux/namei.h>
> #include <linux/pagemap.h>
> #include <linux/fsnotify.h>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f9a506147c8a..e4adf5e05afd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> extern const struct bpf_func_proto bpf_strtol_proto;
> extern const struct bpf_func_proto bpf_strtoul_proto;
> extern const struct bpf_func_proto bpf_tcp_sock_proto;
> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>
> /* Shared helpers among cBPF and eBPF. */
> void bpf_user_rnd_init_once(void);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 9138b4471dbf..b45c8b6f7cb4 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -6,6 +6,7 @@
> #include <linux/path.h>
> #include <linux/fcntl.h>
> #include <linux/errno.h>
> +#include <linux/fs.h>
>
> enum { MAX_NESTED_LINKS = 8 };
>
> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>
> extern void nd_jump_link(struct path *path);
>
> +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> + struct path *path, struct path *root);
> +
> static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> {
> ((char *) name)[min(len, maxlen)] = '\0';
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..db241857ec15 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,28 @@ union bpf_attr {
> * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> *
> * **-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
size_of_pidns => size.
> + * Description
> + * Copies into *pidns* pid, namespace id and tgid as seen by the
Copies => Copy.
Maybe something like below:
Get tgid, pid and namespace id as seen by the current namespace, and
device major/minor numbers from device /proc/self/ns/pid. Such
information is stored in *pidns* of size *size*.
> + * current namespace and also device from /proc/self/ns/pid.
> + * *size_of_pidns* must be the size of *pidns*
> + *
> + * This helper is used when pid filtering is needed inside a
> + * container as bpf_get_current_tgid() helper returns always the
returns always => always returns.
> + * pid id as seen by the root namespace.
> + * Return
> + * 0 on success
> + *
> + * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + * or tgid of the current task.
> + *
> + * **-ECHILD** if /proc/self/ns/pid does not exists.
> + *
> + * **-ENOTDIR** if /proc/self/ns does not exists.
Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
described below.
Please *do verify* what happens when namespaces or pid_ns are not
configured.
> + *
> + * **-ENOMEM** if allocation fails.
helper internal allocation fails.
> + *
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
> @@ -2853,7 +2875,8 @@ union bpf_attr {
> FN(sk_storage_get), \
> FN(sk_storage_delete), \
> FN(send_signal), \
> - FN(tcp_gen_syncookie),
> + FN(tcp_gen_syncookie), \
> + FN(get_current_pidns_info),
>
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> * function eBPF program intends to call
> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
> __s32 retval;
> };
>
> +struct bpf_pidns_info {
> + __u32 dev;
Please add a comment for dev for how device major and minor number are
derived. User space gets device major and minor number, they need to
compare to the corresponding major/minor numbers returned by this helper.
> + __u32 nsid;
> + __u32 tgid;
> + __u32 pid;
> +};
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..3159f2a0188c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>
> const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..41fbf1f28a48 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,12 @@
> #include <linux/uidgid.h>
> #include <linux/filter.h>
> #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/major.h>
> +#include <linux/stat.h>
> +#include <linux/namei.h>
> +#include <linux/version.h>
> +
>
> #include "../../lib/kstrtox.h"
>
> @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> preempt_enable();
> }
>
> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> + size)
> +{
> + const char *pidns_path = "/proc/self/ns/pid";
> + struct pid_namespace *pidns = NULL;
> + struct filename *tmp = NULL;
tmp => fname
> + struct inode *inode;
> + struct path kp;
> + pid_t tgid = 0;
> + pid_t pid = 0;
> + int ret;
> + int len;
> +
> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> + return -EINVAL;
Please put an empty line. As a general rule for readability,
put an empty line if control flow is interrupted, e.g., by
"return", "break" or "continue". At least this is what
I saw most in bpf mailing list.
> + pidns = task_active_pid_ns(current);
> + if (unlikely(!pidns))
> + goto clear;
An empty line. Also, there is nothing to clear.
I prefer an error code -ENOENT.
You can set
ret = -EINVAL;
here
> + pidns_info->nsid = pidns->ns.inum;
> + pid = task_pid_nr_ns(current, pidns);
> + if (unlikely(!pid))
> + goto clear;
An empty line.
> + tgid = task_tgid_nr_ns(current, pidns);
> + if (unlikely(!tgid))
> + goto clear;
An empty line.
> + pidns_info->tgid = (u32) tgid;
> + pidns_info->pid = (u32) pid;
Different functionality, an empty line.
> + tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> + if (unlikely(!tmp)) {
> + memset((void *)pidns_info, 0, (size_t) size);
> + return -ENOMEM;
ret = -ENOMEM;
goto clear;
> + }
An empty line.
> + len = strlen(pidns_path) + 1;
> + memcpy((char *)tmp->name, pidns_path, len);
> + tmp->uptr = NULL;
> + tmp->aname = NULL;
> + tmp->refcnt = 1;
> + ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
Adding below to free kmem cache memory
kmem_cache_free(names_cachep, fname);
In the above, we checked task_active_pid_ns().
If not returning NULL, we have a valid pid ns. So the above
filename_lookup should not go wrong. We can still keep
the error checking though.
> + if (ret) {
> + memset((void *)pidns_info, 0, (size_t) size);
> + return ret;
goto clear;
> + }
An empty line.
> + inode = d_backing_inode(kp.dentry);
> + pidns_info->dev = inode->i_sb->s_dev;
> + return 0;
An empty line.
> +clear > + memset((void *)pidns_info, 0, (size_t) size);
> + return -EINVAL;
> +}
> +
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> + .func = bpf_get_current_pidns_info,
> + .gpl_only = false,
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_PTR_TO_UNINIT_MEM,
> + .arg2_type = ARG_CONST_SIZE,
> +};
> +
> #ifdef CONFIG_CGROUPS
> BPF_CALL_0(bpf_get_current_cgroup_id)
> {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..5e1dc22765a5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> #endif
> case BPF_FUNC_send_signal:
> return &bpf_send_signal_proto;
> + case BPF_FUNC_get_current_pidns_info:
> + return &bpf_get_current_pidns_info_proto;
> default:
> return NULL;
> }
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
2019-08-13 22:35 ` Yonghong Song
@ 2019-08-13 23:11 ` Yonghong Song
2019-08-13 23:51 ` [Potential Spoof] " Yonghong Song
2019-08-14 0:56 ` Carlos Antonio Neira Bustos
1 sibling, 2 replies; 16+ messages in thread
From: Yonghong Song @ 2019-08-13 23:11 UTC (permalink / raw)
To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf
On 8/13/19 11:47 AM, Carlos Neira wrote:
> From: Carlos <cneirabustos@gmail.com>
>
> New bpf helper bpf_get_current_pidns_info.
> This helper obtains the active namespace from current and returns
> pid, tgid, device and namespace id as seen from that namespace,
> allowing to instrument a process inside a container.
>
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
> fs/internal.h | 2 --
> fs/namei.c | 1 -
> include/linux/bpf.h | 1 +
> include/linux/namei.h | 4 +++
> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> kernel/bpf/core.c | 1 +
> kernel/bpf/helpers.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> kernel/trace/bpf_trace.c | 2 ++
> 8 files changed, 102 insertions(+), 4 deletions(-)
>
> diff --git a/fs/internal.h b/fs/internal.h
> index 315fcd8d237c..6647e15dd419 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
> /*
> * namei.c
> */
> -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> - struct path *path, struct path *root);
> extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
> extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
> const char *, unsigned int, struct path *);
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..a89fc72a4a10 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -19,7 +19,6 @@
> #include <linux/export.h>
> #include <linux/kernel.h>
> #include <linux/slab.h>
> -#include <linux/fs.h>
> #include <linux/namei.h>
> #include <linux/pagemap.h>
> #include <linux/fsnotify.h>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f9a506147c8a..e4adf5e05afd 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> extern const struct bpf_func_proto bpf_strtol_proto;
> extern const struct bpf_func_proto bpf_strtoul_proto;
> extern const struct bpf_func_proto bpf_tcp_sock_proto;
> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>
> /* Shared helpers among cBPF and eBPF. */
> void bpf_user_rnd_init_once(void);
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 9138b4471dbf..b45c8b6f7cb4 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -6,6 +6,7 @@
> #include <linux/path.h>
> #include <linux/fcntl.h>
> #include <linux/errno.h>
> +#include <linux/fs.h>
>
> enum { MAX_NESTED_LINKS = 8 };
>
> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>
> extern void nd_jump_link(struct path *path);
>
> +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> + struct path *path, struct path *root);
> +
> static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> {
> ((char *) name)[min(len, maxlen)] = '\0';
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4393bd4b2419..db241857ec15 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,28 @@ union bpf_attr {
> * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> *
> * **-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + * Description
> + * Copies into *pidns* pid, namespace id and tgid as seen by the
> + * current namespace and also device from /proc/self/ns/pid.
> + * *size_of_pidns* must be the size of *pidns*
> + *
> + * This helper is used when pid filtering is needed inside a
> + * container as bpf_get_current_tgid() helper returns always the
> + * pid id as seen by the root namespace.
> + * Return
> + * 0 on success
> + *
> + * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + * or tgid of the current task.
> + *
> + * **-ECHILD** if /proc/self/ns/pid does not exists.
> + *
> + * **-ENOTDIR** if /proc/self/ns does not exists.
> + *
> + * **-ENOMEM** if allocation fails.
> + *
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
> @@ -2853,7 +2875,8 @@ union bpf_attr {
> FN(sk_storage_get), \
> FN(sk_storage_delete), \
> FN(send_signal), \
> - FN(tcp_gen_syncookie),
> + FN(tcp_gen_syncookie), \
> + FN(get_current_pidns_info),
>
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> * function eBPF program intends to call
> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
> __s32 retval;
> };
>
> +struct bpf_pidns_info {
> + __u32 dev;
> + __u32 nsid;
> + __u32 tgid;
> + __u32 pid;
> +};
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 8191a7db2777..3159f2a0188c 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>
> const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> {
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 5e28718928ca..41fbf1f28a48 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -11,6 +11,12 @@
> #include <linux/uidgid.h>
> #include <linux/filter.h>
> #include <linux/ctype.h>
> +#include <linux/pid_namespace.h>
> +#include <linux/major.h>
> +#include <linux/stat.h>
> +#include <linux/namei.h>
> +#include <linux/version.h>
> +
>
> #include "../../lib/kstrtox.h"
>
> @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> preempt_enable();
> }
>
> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> + size)
> +{
> + const char *pidns_path = "/proc/self/ns/pid";
> + struct pid_namespace *pidns = NULL;
> + struct filename *tmp = NULL;
> + struct inode *inode;
> + struct path kp;
> + pid_t tgid = 0;
> + pid_t pid = 0;
> + int ret;
> + int len;
I am running your sample program and get the following kernel bug:
...
[ 26.414825] BUG: sleeping function called from invalid context at
/data/users/yhs/work/net-next/fs
/dcache.c:843
[ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
[ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W
5.3.0-rc1+ #280
[ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.9.3-1.el7.centos 04/01/2
014
[ 26.419393] Call Trace:
[ 26.419697] <IRQ>
[ 26.419960] dump_stack+0x46/0x5b
[ 26.420434] ___might_sleep+0xe4/0x110
[ 26.420894] dput+0x2a/0x200
[ 26.421265] walk_component+0x10c/0x280
[ 26.421773] link_path_walk+0x327/0x560
[ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
[ 26.422848] ? path_init+0x232/0x330
[ 26.423364] path_lookupat+0x88/0x200
[ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
[ 26.424521] filename_lookup+0xaf/0x190
[ 26.425031] ? simple_attr_release+0x20/0x20
[ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
[ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
[ 26.426779] trace_call_bpf+0xb5/0x160
[ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
[ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
[ 26.428496] kprobe_perf_func+0x4d/0x280
[ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
[ 26.429584] ? tracing_record_taskinfo+0xe/0x80
[ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
[ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
[ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
[ 26.431930] kprobe_ftrace_handler+0x90/0xf0
[ 26.432495] ftrace_ops_assist_func+0x63/0x100
[ 26.433060] 0xffffffffc03180bf
[ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
...
To prevent we are running in arbitrary task (e.g., idle task)
context which may introduce sleeping issues, the following
probably appropriate:
if (in_nmi() || in_softirq())
return -EPERM;
Anyway, if in nmi or softirq, the namespace and pid/tgid
we get may be just accidentally associated with the bpf running
context, but it could be in a different context. So such info
is not reliable any way.
> +
> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> + return -EINVAL;
> + pidns = task_active_pid_ns(current);
> + if (unlikely(!pidns))
> + goto clear;
> + pidns_info->nsid = pidns->ns.inum;
> + pid = task_pid_nr_ns(current, pidns);
> + if (unlikely(!pid))
> + goto clear;
> + tgid = task_tgid_nr_ns(current, pidns);
> + if (unlikely(!tgid))
> + goto clear;
> + pidns_info->tgid = (u32) tgid;
> + pidns_info->pid = (u32) pid;
> + tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> + if (unlikely(!tmp)) {
> + memset((void *)pidns_info, 0, (size_t) size);
> + return -ENOMEM;
> + }
> + len = strlen(pidns_path) + 1;
> + memcpy((char *)tmp->name, pidns_path, len);
> + tmp->uptr = NULL;
> + tmp->aname = NULL;
> + tmp->refcnt = 1;
> + ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> + if (ret) {
> + memset((void *)pidns_info, 0, (size_t) size);
> + return ret;
> + }
> + inode = d_backing_inode(kp.dentry);
> + pidns_info->dev = inode->i_sb->s_dev;
> + return 0;
> +clear:
> + memset((void *)pidns_info, 0, (size_t) size);
> + return -EINVAL;
> +}
> +
> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> + .func = bpf_get_current_pidns_info,
> + .gpl_only = false,
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_PTR_TO_UNINIT_MEM,
> + .arg2_type = ARG_CONST_SIZE,
> +};
> +
> #ifdef CONFIG_CGROUPS
> BPF_CALL_0(bpf_get_current_cgroup_id)
> {
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index ca1255d14576..5e1dc22765a5 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> #endif
> case BPF_FUNC_send_signal:
> return &bpf_send_signal_proto;
> + case BPF_FUNC_get_current_pidns_info:
> + return &bpf_get_current_pidns_info_proto;
> default:
> return NULL;
> }
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper.
2019-08-13 18:47 ` [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira
@ 2019-08-13 23:19 ` Yonghong Song
0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-08-13 23:19 UTC (permalink / raw)
To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf
On 8/13/19 11:47 AM, Carlos Neira wrote:
> From: Carlos <cneirabustos@gmail.com>
>
> Added self-tests for new helper bpf_get_current_pidns_info.
>
> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> ---
> tools/include/uapi/linux/bpf.h | 31 ++++-
> tools/testing/selftests/bpf/Makefile | 2 +-
> tools/testing/selftests/bpf/bpf_helpers.h | 3 +
> .../testing/selftests/bpf/progs/test_pidns_kern.c | 51 ++++++++
> tools/testing/selftests/bpf/test_pidns.c | 138 +++++++++++++++++++++
Could you break this patch into two?
patch 1: tools/include/uapi/linux/bpf.h
patch 2: rest of changes
> 5 files changed, 223 insertions(+), 2 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
> create mode 100644 tools/testing/selftests/bpf/test_pidns.c
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4393bd4b2419..db241857ec15 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -2741,6 +2741,28 @@ union bpf_attr {
> * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> *
> * **-EPROTONOSUPPORT** IP packet version is not 4 or 6
> + *
> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> + * Description
> + * Copies into *pidns* pid, namespace id and tgid as seen by the
> + * current namespace and also device from /proc/self/ns/pid.
> + * *size_of_pidns* must be the size of *pidns*
> + *
> + * This helper is used when pid filtering is needed inside a
> + * container as bpf_get_current_tgid() helper returns always the
> + * pid id as seen by the root namespace.
> + * Return
> + * 0 on success
> + *
> + * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> + * or tgid of the current task.
> + *
> + * **-ECHILD** if /proc/self/ns/pid does not exists.
> + *
> + * **-ENOTDIR** if /proc/self/ns does not exists.
> + *
> + * **-ENOMEM** if allocation fails.
> + *
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
> @@ -2853,7 +2875,8 @@ union bpf_attr {
> FN(sk_storage_get), \
> FN(sk_storage_delete), \
> FN(send_signal), \
> - FN(tcp_gen_syncookie),
> + FN(tcp_gen_syncookie), \
> + FN(get_current_pidns_info),
>
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> * function eBPF program intends to call
> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
> __s32 retval;
> };
>
> +struct bpf_pidns_info {
> + __u32 dev;
> + __u32 nsid;
> + __u32 tgid;
> + __u32 pid;
> +};
> #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 3bd0f4a0336a..1f97b571b581 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
> test_cgroup_storage test_select_reuseport test_section_names \
> test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
> test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> - test_sockopt_multi test_tcp_rtt
> + test_sockopt_multi test_tcp_rtt test_pidns
>
> BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
> TEST_GEN_FILES = $(BPF_OBJ_FILES)
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 8b503ea142f0..3fae3b9fcd2c 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
> static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
> int ip_len, void *tcp, int tcp_len) =
> (void *) BPF_FUNC_tcp_gen_syncookie;
> +static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
> + unsigned int buf_size) =
> + (void *) BPF_FUNC_get_current_pidns_info;
>
> /* llvm builtin functions that eBPF C program may use to
> * emit BPF_LD_ABS and BPF_LD_IND instructions
> diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> new file mode 100644
> index 000000000000..e1d2facfa762
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
> @@ -0,0 +1,51 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <linux/bpf.h>
> +#include <errno.h>
> +#include "bpf_helpers.h"
> +
> +struct bpf_map_def SEC("maps") nsidmap = {
> + .type = BPF_MAP_TYPE_ARRAY,
> + .key_size = sizeof(__u32),
> + .value_size = sizeof(__u32),
> + .max_entries = 1,
> +};
> +
> +struct bpf_map_def SEC("maps") pidmap = {
> + .type = BPF_MAP_TYPE_ARRAY,
> + .key_size = sizeof(__u32),
> + .value_size = sizeof(__u32),
> + .max_entries = 1,
> +};
Could you use new map definitions. Search
"SEC(".maps")" for examples.
> +
> +SEC("tracepoint/syscalls/sys_enter_nanosleep")
> +int trace(void *ctx)
> +{
> + struct bpf_pidns_info nsinfo;
> + __u32 key = 0, *expected_pid, *val;
> + char fmt[] = "ERROR nspid:%d\n";
> +
> + if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
> + return -EINVAL;
> +
> + expected_pid = bpf_map_lookup_elem(&pidmap, &key);
> +
> +
> + if (!expected_pid || *expected_pid != nsinfo.pid)
> + return 0;
>
I would like you to compare device major/minor, namespace id,
pid and tid. We should test everything here.
+
> + val = bpf_map_lookup_elem(&nsidmap, &key);
> + if (val)
> + *val = nsinfo.nsid;
> +
> + return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;
> diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c
> new file mode 100644
> index 000000000000..a7254055f294
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_pidns.c
> @@ -0,0 +1,138 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +#include <sys/time.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "cgroup_helpers.h"
> +#include "bpf_rlimit.h"
> +
> +#define CHECK(condition, tag, format...) ({ \
> + int __ret = !!(condition); \
> + if (__ret) { \
> + printf("%s:FAIL:%s ", __func__, tag); \
> + printf(format); \
> + } else { \
> + printf("%s:PASS:%s\n", __func__, tag); \
> + } \
> + __ret; \
> +})
> +
> +static int bpf_find_map(const char *test, struct bpf_object *obj,
> + const char *name)
> +{
> + struct bpf_map *map;
> +
> + map = bpf_object__find_map_by_name(obj, name);
> + if (!map)
> + return -1;
> + return bpf_map__fd(map);
> +}
> +
> +
> +int main(int argc, char **argv)
> +{
> + const char *probe_name = "syscalls/sys_enter_nanosleep";
> + const char *file = "test_pidns_kern.o";
> + int err, bytes, efd, prog_fd, pmu_fd;
> + int pidmap_fd, nsidmap_fd;
> + struct perf_event_attr attr = {};
> + struct bpf_object *obj;
> + __u32 knsid = 0;
> + __u32 key = 0, pid;
> + int exit_code = 1;
> + struct stat st;
> + char buf[256];
> +
> + err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
> + if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
> + goto cleanup_cgroup_env;
> +
> + nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
> + if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> + nsidmap_fd, errno))
> + goto close_prog;
> +
> + pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
> + if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> + pidmap_fd, errno))
> + goto close_prog;
> +
> + pid = getpid();
> + bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +
> + snprintf(buf, sizeof(buf),
> + "/sys/kernel/debug/tracing/events/%s/id", probe_name);
> + efd = open(buf, O_RDONLY, 0);
> + if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
> + goto close_prog;
> + bytes = read(efd, buf, sizeof(buf));
> + close(efd);
> + if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
> + "bytes %d errno %d\n", bytes, errno))
> + goto close_prog;
Please use libbpf perf APIs.
It would be good if the test actually create a namespace and do the test.
Do you think it is possible to use the existing test_progs
infrastructure. The current one without creating pid namespace
surely fit in. Not sure if we add creating/deleting namespace,
I would think it should fit in as well.
> +
> + attr.config = strtol(buf, NULL, 0);
> + attr.type = PERF_TYPE_TRACEPOINT;
> + attr.sample_type = PERF_SAMPLE_RAW;
> + attr.sample_period = 1;
> + attr.wakeup_events = 1;
> +
> + pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
> + if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> + errno))
> + goto close_prog;
> +
> + err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> + if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
> + errno))
> + goto close_pmu;
> +
> + err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> + if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
> + errno))
> + goto close_pmu;
> +
> + /* trigger some syscalls */
> + sleep(1);
> +
> + err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
> + if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
> + goto close_pmu;
> +
> + if (stat("/proc/self/ns/pid", &st))
> + goto close_pmu;
> +
> + if (CHECK(knsid != (__u32) st.st_ino, "compare_namespace_id",
> + "kern knsid %u user unsid %u\n", knsid, (__u32) st.st_ino))
> + goto close_pmu;
> +
> + exit_code = 0;
> + printf("%s:PASS\n", argv[0]);
> +
> +close_pmu:
> + close(pmu_fd);
> +close_prog:
> + bpf_object__close(obj);
> +cleanup_cgroup_env:
> + return exit_code;
> +}
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Potential Spoof] Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-13 23:11 ` Yonghong Song
@ 2019-08-13 23:51 ` Yonghong Song
2019-08-14 0:56 ` Carlos Antonio Neira Bustos
1 sibling, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-08-13 23:51 UTC (permalink / raw)
To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf
On 8/13/19 4:11 PM, Yonghong Song wrote:
>
>
> On 8/13/19 11:47 AM, Carlos Neira wrote:
>> From: Carlos <cneirabustos@gmail.com>
>>
>> New bpf helper bpf_get_current_pidns_info.
>> This helper obtains the active namespace from current and returns
>> pid, tgid, device and namespace id as seen from that namespace,
>> allowing to instrument a process inside a container.
>>
>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
>> ---
>> fs/internal.h | 2 --
>> fs/namei.c | 1 -
>> include/linux/bpf.h | 1 +
>> include/linux/namei.h | 4 +++
>> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>> kernel/bpf/core.c | 1 +
>> kernel/bpf/helpers.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>> kernel/trace/bpf_trace.c | 2 ++
>> 8 files changed, 102 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/internal.h b/fs/internal.h
>> index 315fcd8d237c..6647e15dd419 100644
>> --- a/fs/internal.h
>> +++ b/fs/internal.h
>> @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
>> /*
>> * namei.c
>> */
>> -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
>> - struct path *path, struct path *root);
>> extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
>> extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
>> const char *, unsigned int, struct path *);
>> diff --git a/fs/namei.c b/fs/namei.c
>> index 209c51a5226c..a89fc72a4a10 100644
>> --- a/fs/namei.c
>> +++ b/fs/namei.c
>> @@ -19,7 +19,6 @@
>> #include <linux/export.h>
>> #include <linux/kernel.h>
>> #include <linux/slab.h>
>> -#include <linux/fs.h>
>> #include <linux/namei.h>
>> #include <linux/pagemap.h>
>> #include <linux/fsnotify.h>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index f9a506147c8a..e4adf5e05afd 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
>> extern const struct bpf_func_proto bpf_strtol_proto;
>> extern const struct bpf_func_proto bpf_strtoul_proto;
>> extern const struct bpf_func_proto bpf_tcp_sock_proto;
>> +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
>>
>> /* Shared helpers among cBPF and eBPF. */
>> void bpf_user_rnd_init_once(void);
>> diff --git a/include/linux/namei.h b/include/linux/namei.h
>> index 9138b4471dbf..b45c8b6f7cb4 100644
>> --- a/include/linux/namei.h
>> +++ b/include/linux/namei.h
>> @@ -6,6 +6,7 @@
>> #include <linux/path.h>
>> #include <linux/fcntl.h>
>> #include <linux/errno.h>
>> +#include <linux/fs.h>
>>
>> enum { MAX_NESTED_LINKS = 8 };
>>
>> @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
>>
>> extern void nd_jump_link(struct path *path);
>>
>> +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
>> + struct path *path, struct path *root);
>> +
>> static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
>> {
>> ((char *) name)[min(len, maxlen)] = '\0';
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 4393bd4b2419..db241857ec15 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -2741,6 +2741,28 @@ union bpf_attr {
>> * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
>> *
>> * **-EPROTONOSUPPORT** IP packet version is not 4 or 6
>> + *
>> + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
>> + * Description
>> + * Copies into *pidns* pid, namespace id and tgid as seen by the
>> + * current namespace and also device from /proc/self/ns/pid.
>> + * *size_of_pidns* must be the size of *pidns*
>> + *
>> + * This helper is used when pid filtering is needed inside a
>> + * container as bpf_get_current_tgid() helper returns always the
>> + * pid id as seen by the root namespace.
>> + * Return
>> + * 0 on success
>> + *
>> + * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
>> + * or tgid of the current task.
>> + *
>> + * **-ECHILD** if /proc/self/ns/pid does not exists.
>> + *
>> + * **-ENOTDIR** if /proc/self/ns does not exists.
>> + *
>> + * **-ENOMEM** if allocation fails.
>> + *
>> */
>> #define __BPF_FUNC_MAPPER(FN) \
>> FN(unspec), \
>> @@ -2853,7 +2875,8 @@ union bpf_attr {
>> FN(sk_storage_get), \
>> FN(sk_storage_delete), \
>> FN(send_signal), \
>> - FN(tcp_gen_syncookie),
>> + FN(tcp_gen_syncookie), \
>> + FN(get_current_pidns_info),
>>
>> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>> * function eBPF program intends to call
>> @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
>> __s32 retval;
>> };
>>
>> +struct bpf_pidns_info {
>> + __u32 dev;
>> + __u32 nsid;
>> + __u32 tgid;
>> + __u32 pid;
>> +};
>> #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 8191a7db2777..3159f2a0188c 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
>> const struct bpf_func_proto bpf_get_current_comm_proto __weak;
>> const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
>> const struct bpf_func_proto bpf_get_local_storage_proto __weak;
>> +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
>>
>> const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
>> {
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 5e28718928ca..41fbf1f28a48 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -11,6 +11,12 @@
>> #include <linux/uidgid.h>
>> #include <linux/filter.h>
>> #include <linux/ctype.h>
>> +#include <linux/pid_namespace.h>
>> +#include <linux/major.h>
>> +#include <linux/stat.h>
>> +#include <linux/namei.h>
>> +#include <linux/version.h>
>> +
>>
>> #include "../../lib/kstrtox.h"
>>
>> @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
>> preempt_enable();
>> }
>>
>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
>> + size)
>> +{
>> + const char *pidns_path = "/proc/self/ns/pid";
>> + struct pid_namespace *pidns = NULL;
>> + struct filename *tmp = NULL;
>> + struct inode *inode;
>> + struct path kp;
>> + pid_t tgid = 0;
>> + pid_t pid = 0;
>> + int ret;
>> + int len;
>
> I am running your sample program and get the following kernel bug:
>
> ...
> [ 26.414825] BUG: sleeping function called from invalid context at
> /data/users/yhs/work/net-next/fs
> /dcache.c:843
> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W
> 5.3.0-rc1+ #280
> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.9.3-1.el7.centos 04/01/2
> 014
> [ 26.419393] Call Trace:
> [ 26.419697] <IRQ>
> [ 26.419960] dump_stack+0x46/0x5b
> [ 26.420434] ___might_sleep+0xe4/0x110
> [ 26.420894] dput+0x2a/0x200
> [ 26.421265] walk_component+0x10c/0x280
> [ 26.421773] link_path_walk+0x327/0x560
> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
> [ 26.422848] ? path_init+0x232/0x330
> [ 26.423364] path_lookupat+0x88/0x200
> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
> [ 26.424521] filename_lookup+0xaf/0x190
> [ 26.425031] ? simple_attr_release+0x20/0x20
> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
> [ 26.426779] trace_call_bpf+0xb5/0x160
> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
> [ 26.428496] kprobe_perf_func+0x4d/0x280
> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
> [ 26.433060] 0xffffffffc03180bf
> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
> ...
>
> To prevent we are running in arbitrary task (e.g., idle task)
> context which may introduce sleeping issues, the following
> probably appropriate:
>
> if (in_nmi() || in_softirq())
> return -EPERM;
A better condition is (from helper bpf_probe_write_user()):
if (unlikely(in_interrupt() ||
current->flags & (PF_KTHREAD | PF_EXITING)))
return -EPERM;
>
> Anyway, if in nmi or softirq, the namespace and pid/tgid
> we get may be just accidentally associated with the bpf running
> context, but it could be in a different context. So such info
> is not reliable any way.
>
>> +
>> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
>> + return -EINVAL;
>> + pidns = task_active_pid_ns(current);
>> + if (unlikely(!pidns))
>> + goto clear;
>> + pidns_info->nsid = pidns->ns.inum;
>> + pid = task_pid_nr_ns(current, pidns);
>> + if (unlikely(!pid))
>> + goto clear;
>> + tgid = task_tgid_nr_ns(current, pidns);
>> + if (unlikely(!tgid))
>> + goto clear;
>> + pidns_info->tgid = (u32) tgid;
>> + pidns_info->pid = (u32) pid;
>> + tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
>> + if (unlikely(!tmp)) {
>> + memset((void *)pidns_info, 0, (size_t) size);
>> + return -ENOMEM;
>> + }
>> + len = strlen(pidns_path) + 1;
>> + memcpy((char *)tmp->name, pidns_path, len);
>> + tmp->uptr = NULL;
>> + tmp->aname = NULL;
>> + tmp->refcnt = 1;
>> + ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
>> + if (ret) {
>> + memset((void *)pidns_info, 0, (size_t) size);
>> + return ret;
>> + }
>> + inode = d_backing_inode(kp.dentry);
>> + pidns_info->dev = inode->i_sb->s_dev;
>> + return 0;
>> +clear:
>> + memset((void *)pidns_info, 0, (size_t) size);
>> + return -EINVAL;
>> +}
>> +
>> +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
>> + .func = bpf_get_current_pidns_info,
>> + .gpl_only = false,
>> + .ret_type = RET_INTEGER,
>> + .arg1_type = ARG_PTR_TO_UNINIT_MEM,
>> + .arg2_type = ARG_CONST_SIZE,
>> +};
>> +
>> #ifdef CONFIG_CGROUPS
>> BPF_CALL_0(bpf_get_current_cgroup_id)
>> {
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index ca1255d14576..5e1dc22765a5 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>> #endif
>> case BPF_FUNC_send_signal:
>> return &bpf_send_signal_proto;
>> + case BPF_FUNC_get_current_pidns_info:
>> + return &bpf_get_current_pidns_info_proto;
>> default:
>> return NULL;
>> }
>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-13 23:11 ` Yonghong Song
2019-08-13 23:51 ` [Potential Spoof] " Yonghong Song
@ 2019-08-14 0:56 ` Carlos Antonio Neira Bustos
[not found] ` <9a2cacad-b79f-5d39-6d62-bb48cbaaac07@fb.com>
1 sibling, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-08-14 0:56 UTC (permalink / raw)
To: Yonghong Song; +Cc: netdev, ebiederm, brouer, bpf
On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
>
>
> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > From: Carlos <cneirabustos@gmail.com>
> >
> > New bpf helper bpf_get_current_pidns_info.
> > This helper obtains the active namespace from current and returns
> > pid, tgid, device and namespace id as seen from that namespace,
> > allowing to instrument a process inside a container.
> >
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > ---
> > fs/internal.h | 2 --
> > fs/namei.c | 1 -
> > include/linux/bpf.h | 1 +
> > include/linux/namei.h | 4 +++
> > include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> > kernel/bpf/core.c | 1 +
> > kernel/bpf/helpers.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> > kernel/trace/bpf_trace.c | 2 ++
> > 8 files changed, 102 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 315fcd8d237c..6647e15dd419 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
> > /*
> > * namei.c
> > */
> > -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > - struct path *path, struct path *root);
> > extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
> > extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
> > const char *, unsigned int, struct path *);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 209c51a5226c..a89fc72a4a10 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -19,7 +19,6 @@
> > #include <linux/export.h>
> > #include <linux/kernel.h>
> > #include <linux/slab.h>
> > -#include <linux/fs.h>
> > #include <linux/namei.h>
> > #include <linux/pagemap.h>
> > #include <linux/fsnotify.h>
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f9a506147c8a..e4adf5e05afd 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> > extern const struct bpf_func_proto bpf_strtol_proto;
> > extern const struct bpf_func_proto bpf_strtoul_proto;
> > extern const struct bpf_func_proto bpf_tcp_sock_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> >
> > /* Shared helpers among cBPF and eBPF. */
> > void bpf_user_rnd_init_once(void);
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 9138b4471dbf..b45c8b6f7cb4 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -6,6 +6,7 @@
> > #include <linux/path.h>
> > #include <linux/fcntl.h>
> > #include <linux/errno.h>
> > +#include <linux/fs.h>
> >
> > enum { MAX_NESTED_LINKS = 8 };
> >
> > @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
> >
> > extern void nd_jump_link(struct path *path);
> >
> > +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > + struct path *path, struct path *root);
> > +
> > static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> > {
> > ((char *) name)[min(len, maxlen)] = '\0';
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4393bd4b2419..db241857ec15 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2741,6 +2741,28 @@ union bpf_attr {
> > * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> > *
> > * **-EPROTONOSUPPORT** IP packet version is not 4 or 6
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> > + * Description
> > + * Copies into *pidns* pid, namespace id and tgid as seen by the
> > + * current namespace and also device from /proc/self/ns/pid.
> > + * *size_of_pidns* must be the size of *pidns*
> > + *
> > + * This helper is used when pid filtering is needed inside a
> > + * container as bpf_get_current_tgid() helper returns always the
> > + * pid id as seen by the root namespace.
> > + * Return
> > + * 0 on success
> > + *
> > + * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + * or tgid of the current task.
> > + *
> > + * **-ECHILD** if /proc/self/ns/pid does not exists.
> > + *
> > + * **-ENOTDIR** if /proc/self/ns does not exists.
> > + *
> > + * **-ENOMEM** if allocation fails.
> > + *
> > */
> > #define __BPF_FUNC_MAPPER(FN) \
> > FN(unspec), \
> > @@ -2853,7 +2875,8 @@ union bpf_attr {
> > FN(sk_storage_get), \
> > FN(sk_storage_delete), \
> > FN(send_signal), \
> > - FN(tcp_gen_syncookie),
> > + FN(tcp_gen_syncookie), \
> > + FN(get_current_pidns_info),
> >
> > /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> > * function eBPF program intends to call
> > @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
> > __s32 retval;
> > };
> >
> > +struct bpf_pidns_info {
> > + __u32 dev;
> > + __u32 nsid;
> > + __u32 tgid;
> > + __u32 pid;
> > +};
> > #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8191a7db2777..3159f2a0188c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> > const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> > const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> > const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> >
> > const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> > {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5e28718928ca..41fbf1f28a48 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -11,6 +11,12 @@
> > #include <linux/uidgid.h>
> > #include <linux/filter.h>
> > #include <linux/ctype.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/major.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> > +
> >
> > #include "../../lib/kstrtox.h"
> >
> > @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> > preempt_enable();
> > }
> >
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > + size)
> > +{
> > + const char *pidns_path = "/proc/self/ns/pid";
> > + struct pid_namespace *pidns = NULL;
> > + struct filename *tmp = NULL;
> > + struct inode *inode;
> > + struct path kp;
> > + pid_t tgid = 0;
> > + pid_t pid = 0;
> > + int ret;
> > + int len;
>
Thank you very much for catching this!.
Could you share how to replicate this bug?.
> I am running your sample program and get the following kernel bug:
>
> ...
> [ 26.414825] BUG: sleeping function called from invalid context at
> /data/users/yhs/work/net-next/fs
> /dcache.c:843
> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W
> 5.3.0-rc1+ #280
> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.9.3-1.el7.centos 04/01/2
> 014
> [ 26.419393] Call Trace:
> [ 26.419697] <IRQ>
> [ 26.419960] dump_stack+0x46/0x5b
> [ 26.420434] ___might_sleep+0xe4/0x110
> [ 26.420894] dput+0x2a/0x200
> [ 26.421265] walk_component+0x10c/0x280
> [ 26.421773] link_path_walk+0x327/0x560
> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
> [ 26.422848] ? path_init+0x232/0x330
> [ 26.423364] path_lookupat+0x88/0x200
> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
> [ 26.424521] filename_lookup+0xaf/0x190
> [ 26.425031] ? simple_attr_release+0x20/0x20
> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
> [ 26.426779] trace_call_bpf+0xb5/0x160
> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
> [ 26.428496] kprobe_perf_func+0x4d/0x280
> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
> [ 26.433060] 0xffffffffc03180bf
> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
> ...
>
> To prevent we are running in arbitrary task (e.g., idle task)
> context which may introduce sleeping issues, the following
> probably appropriate:
>
> if (in_nmi() || in_softirq())
> return -EPERM;
>
> Anyway, if in nmi or softirq, the namespace and pid/tgid
> we get may be just accidentally associated with the bpf running
> context, but it could be in a different context. So such info
> is not reliable any way.
>
> > +
> > + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > + return -EINVAL;
> > + pidns = task_active_pid_ns(current);
> > + if (unlikely(!pidns))
> > + goto clear;
> > + pidns_info->nsid = pidns->ns.inum;
> > + pid = task_pid_nr_ns(current, pidns);
> > + if (unlikely(!pid))
> > + goto clear;
> > + tgid = task_tgid_nr_ns(current, pidns);
> > + if (unlikely(!tgid))
> > + goto clear;
> > + pidns_info->tgid = (u32) tgid;
> > + pidns_info->pid = (u32) pid;
> > + tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> > + if (unlikely(!tmp)) {
> > + memset((void *)pidns_info, 0, (size_t) size);
> > + return -ENOMEM;
> > + }
> > + len = strlen(pidns_path) + 1;
> > + memcpy((char *)tmp->name, pidns_path, len);
> > + tmp->uptr = NULL;
> > + tmp->aname = NULL;
> > + tmp->refcnt = 1;
> > + ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> > + if (ret) {
> > + memset((void *)pidns_info, 0, (size_t) size);
> > + return ret;
> > + }
> > + inode = d_backing_inode(kp.dentry);
> > + pidns_info->dev = inode->i_sb->s_dev;
> > + return 0;
> > +clear:
> > + memset((void *)pidns_info, 0, (size_t) size);
> > + return -EINVAL;
> > +}
> > +
> > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> > + .func = bpf_get_current_pidns_info,
> > + .gpl_only = false,
> > + .ret_type = RET_INTEGER,
> > + .arg1_type = ARG_PTR_TO_UNINIT_MEM,
> > + .arg2_type = ARG_CONST_SIZE,
> > +};
> > +
> > #ifdef CONFIG_CGROUPS
> > BPF_CALL_0(bpf_get_current_cgroup_id)
> > {
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ca1255d14576..5e1dc22765a5 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > #endif
> > case BPF_FUNC_send_signal:
> > return &bpf_send_signal_proto;
> > + case BPF_FUNC_get_current_pidns_info:
> > + return &bpf_get_current_pidns_info_proto;
> > default:
> > return NULL;
> > }
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-13 22:35 ` Yonghong Song
@ 2019-08-20 15:10 ` Carlos Antonio Neira Bustos
2019-08-20 17:29 ` Yonghong Song
0 siblings, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-08-20 15:10 UTC (permalink / raw)
To: Yonghong Song; +Cc: netdev, ebiederm, brouer, bpf
Hi Yonghong,
Thanks for taking the time to review this.
> > + *
> > + * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + * or tgid of the current task.
> > + *
> > + * **-ECHILD** if /proc/self/ns/pid does not exists.
> > + *
> > + * **-ENOTDIR** if /proc/self/ns does not exists.
>
> Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
> described below.
>
> Please *do verify* what happens when namespaces or pid_ns are not
> configured.
>
I have tested kernel configurations without namespace support and with
namespace support but without pid namespaces, the helper returns -EINVAL
on both cases, now it should return -ENOENT.
> > +struct bpf_pidns_info {
> > + __u32 dev;
>
> Please add a comment for dev for how device major and minor number are
> derived. User space gets device major and minor number, they need to
> compare to the corresponding major/minor numbers returned by this helper.
>
> > + __u32 nsid;
> > + __u32 tgid;
> > + __u32 pid;
> > +};
>
What do you think of this comment ?
struct bpf_pidns_info {
__u32 dev; /* major/minor numbers from /proc/self/ns/pid.
* User space gets device major and minor numbers from
* the same device that need to be compared against the
* major/minor numbers returned by this helper.
*/
__u32 nsid;
__u32 tgid;
__u32 pid;
};
>
> Please put an empty line. As a general rule for readability,
> put an empty line if control flow is interrupted, e.g., by
> "return", "break" or "continue". At least this is what
> I saw most in bpf mailing list.
>
I'll fix it in version 10.
> > + len = strlen(pidns_path) + 1;
> > + memcpy((char *)tmp->name, pidns_path, len);
> > + tmp->uptr = NULL;
> > + tmp->aname = NULL;
> > + tmp->refcnt = 1;
> > + ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> Adding below to free kmem cache memory
> kmem_cache_free(names_cachep, fname);
>
I think we don't need to call kmem_cache_free as filename_lookup
calls putname that calls kmem_cache_free.
Thanks a lot for your help.
Bests
> In the above, we checked task_active_pid_ns().
> If not returning NULL, we have a valid pid ns. So the above
> filename_lookup should not go wrong. We can still keep
> the error checking though.
>
> > + if (ret) {
> > + memset((void *)pidns_info, 0, (size_t) size);
> > + return ret;
>
>
I think we could get rid of this.
On Tue, Aug 13, 2019 at 10:35:42PM +0000, Yonghong Song wrote:
>
>
> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > From: Carlos <cneirabustos@gmail.com>
> >
> > New bpf helper bpf_get_current_pidns_info.
> > This helper obtains the active namespace from current and returns
> > pid, tgid, device and namespace id as seen from that namespace,
> > allowing to instrument a process inside a container.
> >
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > ---
> > fs/internal.h | 2 --
> > fs/namei.c | 1 -
> > include/linux/bpf.h | 1 +
> > include/linux/namei.h | 4 +++
> > include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> > kernel/bpf/core.c | 1 +
> > kernel/bpf/helpers.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
> > kernel/trace/bpf_trace.c | 2 ++
> > 8 files changed, 102 insertions(+), 4 deletions(-)
>
> I prefer to break this into two patches to reduce
> the potential merging conflicts:
> patch 1: fs/internal.h, fs/namei.c, include/linux/namei.h
> patch 2: rest of changes
> patch 1 is simply a preparing patches to make filename_lookup
> available later.
>
> >
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 315fcd8d237c..6647e15dd419 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
> > /*
> > * namei.c
> > */
> > -extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > - struct path *path, struct path *root);
> > extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
> > extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
> > const char *, unsigned int, struct path *);
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 209c51a5226c..a89fc72a4a10 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -19,7 +19,6 @@
> > #include <linux/export.h>
> > #include <linux/kernel.h>
> > #include <linux/slab.h>
> > -#include <linux/fs.h>
> > #include <linux/namei.h>
> > #include <linux/pagemap.h>
> > #include <linux/fsnotify.h>
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f9a506147c8a..e4adf5e05afd 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> > extern const struct bpf_func_proto bpf_strtol_proto;
> > extern const struct bpf_func_proto bpf_strtoul_proto;
> > extern const struct bpf_func_proto bpf_tcp_sock_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> >
> > /* Shared helpers among cBPF and eBPF. */
> > void bpf_user_rnd_init_once(void);
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 9138b4471dbf..b45c8b6f7cb4 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -6,6 +6,7 @@
> > #include <linux/path.h>
> > #include <linux/fcntl.h>
> > #include <linux/errno.h>
> > +#include <linux/fs.h>
> >
> > enum { MAX_NESTED_LINKS = 8 };
> >
> > @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
> >
> > extern void nd_jump_link(struct path *path);
> >
> > +extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
> > + struct path *path, struct path *root);
> > +
> > static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> > {
> > ((char *) name)[min(len, maxlen)] = '\0';
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4393bd4b2419..db241857ec15 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2741,6 +2741,28 @@ union bpf_attr {
> > * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> > *
> > * **-EPROTONOSUPPORT** IP packet version is not 4 or 6
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
>
> size_of_pidns => size.
>
> > + * Description
> > + * Copies into *pidns* pid, namespace id and tgid as seen by the
> Copies => Copy.
> Maybe something like below:
> Get tgid, pid and namespace id as seen by the current namespace, and
> device major/minor numbers from device /proc/self/ns/pid. Such
> information is stored in *pidns* of size *size*.
>
> > + * current namespace and also device from /proc/self/ns/pid.
> > + * *size_of_pidns* must be the size of *pidns*
> > + *
> > + * This helper is used when pid filtering is needed inside a
> > + * container as bpf_get_current_tgid() helper returns always the
>
> returns always => always returns.
>
> > + * pid id as seen by the root namespace.
> > + * Return
> > + * 0 on success
> > + *
> > + * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
> > + * or tgid of the current task.
> > + *
> > + * **-ECHILD** if /proc/self/ns/pid does not exists.
> > + *
> > + * **-ENOTDIR** if /proc/self/ns does not exists.
>
> Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
> described below.
>
> Please *do verify* what happens when namespaces or pid_ns are not
> configured.
>
> > + *
> > + * **-ENOMEM** if allocation fails.
>
> helper internal allocation fails.
>
> > + *
> > */
> > #define __BPF_FUNC_MAPPER(FN) \
> > FN(unspec), \
> > @@ -2853,7 +2875,8 @@ union bpf_attr {
> > FN(sk_storage_get), \
> > FN(sk_storage_delete), \
> > FN(send_signal), \
> > - FN(tcp_gen_syncookie),
> > + FN(tcp_gen_syncookie), \
> > + FN(get_current_pidns_info),
> >
> > /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> > * function eBPF program intends to call
> > @@ -3604,4 +3627,10 @@ struct bpf_sockopt {
> > __s32 retval;
> > };
> >
> > +struct bpf_pidns_info {
> > + __u32 dev;
>
> Please add a comment for dev for how device major and minor number are
> derived. User space gets device major and minor number, they need to
> compare to the corresponding major/minor numbers returned by this helper.
>
> > + __u32 nsid;
> > + __u32 tgid;
> > + __u32 pid;
> > +};
> > #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8191a7db2777..3159f2a0188c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> > const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> > const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> > const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> >
> > const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> > {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5e28718928ca..41fbf1f28a48 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -11,6 +11,12 @@
> > #include <linux/uidgid.h>
> > #include <linux/filter.h>
> > #include <linux/ctype.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/major.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> > +
> >
> > #include "../../lib/kstrtox.h"
> >
> > @@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> > preempt_enable();
> > }
> >
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > + size)
> > +{
> > + const char *pidns_path = "/proc/self/ns/pid";
> > + struct pid_namespace *pidns = NULL;
> > + struct filename *tmp = NULL;
>
> tmp => fname
>
> > + struct inode *inode;
> > + struct path kp;
> > + pid_t tgid = 0;
> > + pid_t pid = 0;
> > + int ret;
> > + int len;
> > +
> > + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > + return -EINVAL;
>
> Please put an empty line. As a general rule for readability,
> put an empty line if control flow is interrupted, e.g., by
> "return", "break" or "continue". At least this is what
> I saw most in bpf mailing list.
>
> > + pidns = task_active_pid_ns(current);
> > + if (unlikely(!pidns))
> > + goto clear;
>
> An empty line. Also, there is nothing to clear.
> I prefer an error code -ENOENT.
>
> You can set
> ret = -EINVAL;
> here
>
> > + pidns_info->nsid = pidns->ns.inum;
> > + pid = task_pid_nr_ns(current, pidns);
> > + if (unlikely(!pid))
> > + goto clear;
>
> An empty line.
>
> > + tgid = task_tgid_nr_ns(current, pidns);
> > + if (unlikely(!tgid))
> > + goto clear;
>
> An empty line.
>
> > + pidns_info->tgid = (u32) tgid;
> > + pidns_info->pid = (u32) pid;
>
> Different functionality, an empty line.
>
> > + tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> > + if (unlikely(!tmp)) {
> > + memset((void *)pidns_info, 0, (size_t) size);
> > + return -ENOMEM;
>
> ret = -ENOMEM;
> goto clear;
>
> > + }
>
> An empty line.
>
> > + len = strlen(pidns_path) + 1;
> > + memcpy((char *)tmp->name, pidns_path, len);
> > + tmp->uptr = NULL;
> > + tmp->aname = NULL;
> > + tmp->refcnt = 1;
> > + ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> Adding below to free kmem cache memory
> kmem_cache_free(names_cachep, fname);
>
> In the above, we checked task_active_pid_ns().
> If not returning NULL, we have a valid pid ns. So the above
> filename_lookup should not go wrong. We can still keep
> the error checking though.
>
> > + if (ret) {
> > + memset((void *)pidns_info, 0, (size_t) size);
> > + return ret;
>
> goto clear;
>
> > + }
>
> An empty line.
>
> > + inode = d_backing_inode(kp.dentry);
> > + pidns_info->dev = inode->i_sb->s_dev;
> > + return 0;
>
> An empty line.
>
> > +clear > + memset((void *)pidns_info, 0, (size_t) size);
> > + return -EINVAL;
> > +}
> > +
> > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> > + .func = bpf_get_current_pidns_info,
> > + .gpl_only = false,
> > + .ret_type = RET_INTEGER,
> > + .arg1_type = ARG_PTR_TO_UNINIT_MEM,
> > + .arg2_type = ARG_CONST_SIZE,
> > +};
> > +
> > #ifdef CONFIG_CGROUPS
> > BPF_CALL_0(bpf_get_current_cgroup_id)
> > {
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ca1255d14576..5e1dc22765a5 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> > #endif
> > case BPF_FUNC_send_signal:
> > return &bpf_send_signal_proto;
> > + case BPF_FUNC_get_current_pidns_info:
> > + return &bpf_get_current_pidns_info_proto;
> > default:
> > return NULL;
> > }
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-20 15:10 ` Carlos Antonio Neira Bustos
@ 2019-08-20 17:29 ` Yonghong Song
0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-08-20 17:29 UTC (permalink / raw)
To: Carlos Antonio Neira Bustos; +Cc: netdev, ebiederm, brouer, bpf
On 8/20/19 8:10 AM, Carlos Antonio Neira Bustos wrote:
> Hi Yonghong,
>
> Thanks for taking the time to review this.
>
>
>>> + *
>>> + * **-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
>>> + * or tgid of the current task.
>>> + *
>>> + * **-ECHILD** if /proc/self/ns/pid does not exists.
>>> + *
>>> + * **-ENOTDIR** if /proc/self/ns does not exists.
>>
>> Let us remove ECHILD and ENOTDIR and replace it with ENOENT as I
>> described below.
>>
>> Please *do verify* what happens when namespaces or pid_ns are not
>> configured.
>>
>
>
> I have tested kernel configurations without namespace support and with
> namespace support but without pid namespaces, the helper returns -EINVAL
> on both cases, now it should return -ENOENT.
Indeed. -ENOENT is better.
>
>
>>> +struct bpf_pidns_info {
>>> + __u32 dev;
>>
>> Please add a comment for dev for how device major and minor number are
>> derived. User space gets device major and minor number, they need to
>> compare to the corresponding major/minor numbers returned by this helper.
>>
>>> + __u32 nsid;
>>> + __u32 tgid;
>>> + __u32 pid;
>>> +};
>>
>
> What do you think of this comment ?
>
> struct bpf_pidns_info {
> __u32 dev; /* major/minor numbers from /proc/self/ns/pid.
> * User space gets device major and minor numbers from
> * the same device that need to be compared against the
> * major/minor numbers returned by this helper.
> */
> __u32 nsid;
> __u32 tgid;
> __u32 pid;
> };
>
To be more specific, I like a comment similar to below in uapi bpf.h
struct bpf_cgroup_dev_ctx {
/* access_type encoded as (BPF_DEVCG_ACC_* << 16) |
BPF_DEVCG_DEV_* */
__u32 access_type;
__u32 major;
__u32 minor;
};
Some like:
/* dev encoded as (major << 8 | (minor & 0xff)) */
>>
>> Please put an empty line. As a general rule for readability,
>> put an empty line if control flow is interrupted, e.g., by
>> "return", "break" or "continue". At least this is what
>> I saw most in bpf mailing list.
>>
> I'll fix it in version 10.
>
>>> + len = strlen(pidns_path) + 1;
>>> + memcpy((char *)tmp->name, pidns_path, len);
>>> + tmp->uptr = NULL;
>>> + tmp->aname = NULL;
>>> + tmp->refcnt = 1;
>>> + ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
>> Adding below to free kmem cache memory
>> kmem_cache_free(names_cachep, fname);
>>
>
> I think we don't need to call kmem_cache_free as filename_lookup
> calls putname that calls kmem_cache_free.
Oh, right. Thanks for checking this.
>
>
> Thanks a lot for your help.
>
> Bests
>
>> In the above, we checked task_active_pid_ns().
>> If not returning NULL, we have a valid pid ns. So the above
>> filename_lookup should not go wrong. We can still keep
>> the error checking though.
>>
>>> + if (ret) {
>>> + memset((void *)pidns_info, 0, (size_t) size);
>>> + return ret;
>>
>>
>
> I think we could get rid of this.
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
[not found] ` <CACiB22jyN9=0ATWWE+x=BoWD6u+8KO+MvBfsFQmcNfkmANb2_w@mail.gmail.com>
@ 2019-08-28 20:39 ` Carlos Antonio Neira Bustos
2019-08-28 20:53 ` Yonghong Song
0 siblings, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-08-28 20:39 UTC (permalink / raw)
To: Yonghong Song; +Cc: netdev, Eric Biederman, brouer, bpf
Yonghong,
Thanks for the pointer, I fixed this bug, but I found another one that's triggered
now the test program I included in tools/testing/selftests/bpf/test_pidns.
It's seemed that fname was not correctly setup when passing it to filename_lookup.
This is fixed now and I'm doing some more testing.
I think I'll remove the tests on samples/bpf as they are mostly end on -EPERM as
the fix intended.
Is ok to remove them and just focus to finish the self tests code?.
Bests
On Wed, Aug 14, 2019 at 01:25:06AM -0400, carlos antonio neira bustos wrote:
> Thank you very much!
>
> Bests
>
> El mié., 14 de ago. de 2019 00:50, Yonghong Song <yhs@fb.com> escribió:
>
> >
> >
> > On 8/13/19 5:56 PM, Carlos Antonio Neira Bustos wrote:
> > > On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
> > >>
> > >>
> > >> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > >>> From: Carlos <cneirabustos@gmail.com>
> > >>>
> > >>> New bpf helper bpf_get_current_pidns_info.
> > >>> This helper obtains the active namespace from current and returns
> > >>> pid, tgid, device and namespace id as seen from that namespace,
> > >>> allowing to instrument a process inside a container.
> > >>>
> > >>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > >>> ---
> > >>> fs/internal.h | 2 --
> > >>> fs/namei.c | 1 -
> > >>> include/linux/bpf.h | 1 +
> > >>> include/linux/namei.h | 4 +++
> > >>> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> > >>> kernel/bpf/core.c | 1 +
> > >>> kernel/bpf/helpers.c | 64
> > ++++++++++++++++++++++++++++++++++++++++++++++++
> > >>> kernel/trace/bpf_trace.c | 2 ++
> > >>> 8 files changed, 102 insertions(+), 4 deletions(-)
> > >>>
> > [...]
> > >>>
> > >>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *,
> > pidns_info, u32,
> > >>> + size)
> > >>> +{
> > >>> + const char *pidns_path = "/proc/self/ns/pid";
> > >>> + struct pid_namespace *pidns = NULL;
> > >>> + struct filename *tmp = NULL;
> > >>> + struct inode *inode;
> > >>> + struct path kp;
> > >>> + pid_t tgid = 0;
> > >>> + pid_t pid = 0;
> > >>> + int ret;
> > >>> + int len;
> > >>
> > >
> > > Thank you very much for catching this!.
> > > Could you share how to replicate this bug?.
> >
> > The config is attached. just run trace_ns_info and you
> > can reproduce the issue.
> >
> > >
> > >> I am running your sample program and get the following kernel bug:
> > >>
> > >> ...
> > >> [ 26.414825] BUG: sleeping function called from invalid context at
> > >> /data/users/yhs/work/net-next/fs
> > >> /dcache.c:843
> > >> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> > >> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W
> > >> 5.3.0-rc1+ #280
> > >> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > >> BIOS 1.9.3-1.el7.centos 04/01/2
> > >> 014
> > >> [ 26.419393] Call Trace:
> > >> [ 26.419697] <IRQ>
> > >> [ 26.419960] dump_stack+0x46/0x5b
> > >> [ 26.420434] ___might_sleep+0xe4/0x110
> > >> [ 26.420894] dput+0x2a/0x200
> > >> [ 26.421265] walk_component+0x10c/0x280
> > >> [ 26.421773] link_path_walk+0x327/0x560
> > >> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
> > >> [ 26.422848] ? path_init+0x232/0x330
> > >> [ 26.423364] path_lookupat+0x88/0x200
> > >> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
> > >> [ 26.424521] filename_lookup+0xaf/0x190
> > >> [ 26.425031] ? simple_attr_release+0x20/0x20
> > >> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
> > >> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
> > >> [ 26.426779] trace_call_bpf+0xb5/0x160
> > >> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
> > >> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
> > >> [ 26.428496] kprobe_perf_func+0x4d/0x280
> > >> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
> > >> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
> > >> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> > >> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
> > >> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
> > >> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
> > >> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
> > >> [ 26.433060] 0xffffffffc03180bf
> > >> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
> > >> ...
> > >>
> > >> To prevent we are running in arbitrary task (e.g., idle task)
> > >> context which may introduce sleeping issues, the following
> > >> probably appropriate:
> > >>
> > >> if (in_nmi() || in_softirq())
> > >> return -EPERM;
> > >>
> > >> Anyway, if in nmi or softirq, the namespace and pid/tgid
> > >> we get may be just accidentally associated with the bpf running
> > >> context, but it could be in a different context. So such info
> > >> is not reliable any way.
> > >>
> > >>> +
> > >>> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > >>> + return -EINVAL;
> > >>> + pidns = task_active_pid_ns(current);
> > [...]
> >
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-28 20:39 ` Carlos Antonio Neira Bustos
@ 2019-08-28 20:53 ` Yonghong Song
2019-08-28 21:03 ` Carlos Antonio Neira Bustos
0 siblings, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2019-08-28 20:53 UTC (permalink / raw)
To: Carlos Antonio Neira Bustos; +Cc: netdev, Eric Biederman, brouer, bpf
On 8/28/19 1:39 PM, Carlos Antonio Neira Bustos wrote:
> Yonghong,
>
> Thanks for the pointer, I fixed this bug, but I found another one that's triggered
> now the test program I included in tools/testing/selftests/bpf/test_pidns.
> It's seemed that fname was not correctly setup when passing it to filename_lookup.
> This is fixed now and I'm doing some more testing.
> I think I'll remove the tests on samples/bpf as they are mostly end on -EPERM as
> the fix intended.
> Is ok to remove them and just focus to finish the self tests code?.
Yes, the samples/bpf test case can be removed.
Could you create a selftest with tracpoint net/netif_receive_skb, which
also uses the proposed helper? net/netif_receive_skb will happen in
interrupt context and it should catch the issue as well if
filename_lookup still get called in interrupt context.
>
> Bests
>
> On Wed, Aug 14, 2019 at 01:25:06AM -0400, carlos antonio neira bustos wrote:
>> Thank you very much!
>>
>> Bests
>>
>> El mié., 14 de ago. de 2019 00:50, Yonghong Song <yhs@fb.com> escribió:
>>
>>>
>>>
>>> On 8/13/19 5:56 PM, Carlos Antonio Neira Bustos wrote:
>>>> On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
>>>>>
>>>>>
>>>>> On 8/13/19 11:47 AM, Carlos Neira wrote:
>>>>>> From: Carlos <cneirabustos@gmail.com>
>>>>>>
>>>>>> New bpf helper bpf_get_current_pidns_info.
>>>>>> This helper obtains the active namespace from current and returns
>>>>>> pid, tgid, device and namespace id as seen from that namespace,
>>>>>> allowing to instrument a process inside a container.
>>>>>>
>>>>>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
>>>>>> ---
>>>>>> fs/internal.h | 2 --
>>>>>> fs/namei.c | 1 -
>>>>>> include/linux/bpf.h | 1 +
>>>>>> include/linux/namei.h | 4 +++
>>>>>> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>>>>>> kernel/bpf/core.c | 1 +
>>>>>> kernel/bpf/helpers.c | 64
>>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> kernel/trace/bpf_trace.c | 2 ++
>>>>>> 8 files changed, 102 insertions(+), 4 deletions(-)
>>>>>>
>>> [...]
>>>>>>
>>>>>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *,
>>> pidns_info, u32,
>>>>>> + size)
>>>>>> +{
>>>>>> + const char *pidns_path = "/proc/self/ns/pid";
>>>>>> + struct pid_namespace *pidns = NULL;
>>>>>> + struct filename *tmp = NULL;
>>>>>> + struct inode *inode;
>>>>>> + struct path kp;
>>>>>> + pid_t tgid = 0;
>>>>>> + pid_t pid = 0;
>>>>>> + int ret;
>>>>>> + int len;
>>>>>
>>>>
>>>> Thank you very much for catching this!.
>>>> Could you share how to replicate this bug?.
>>>
>>> The config is attached. just run trace_ns_info and you
>>> can reproduce the issue.
>>>
>>>>
>>>>> I am running your sample program and get the following kernel bug:
>>>>>
>>>>> ...
>>>>> [ 26.414825] BUG: sleeping function called from invalid context at
>>>>> /data/users/yhs/work/net-next/fs
>>>>> /dcache.c:843
>>>>> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
>>>>> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W
>>>>> 5.3.0-rc1+ #280
>>>>> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>>>> BIOS 1.9.3-1.el7.centos 04/01/2
>>>>> 014
>>>>> [ 26.419393] Call Trace:
>>>>> [ 26.419697] <IRQ>
>>>>> [ 26.419960] dump_stack+0x46/0x5b
>>>>> [ 26.420434] ___might_sleep+0xe4/0x110
>>>>> [ 26.420894] dput+0x2a/0x200
>>>>> [ 26.421265] walk_component+0x10c/0x280
>>>>> [ 26.421773] link_path_walk+0x327/0x560
>>>>> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
>>>>> [ 26.422848] ? path_init+0x232/0x330
>>>>> [ 26.423364] path_lookupat+0x88/0x200
>>>>> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
>>>>> [ 26.424521] filename_lookup+0xaf/0x190
>>>>> [ 26.425031] ? simple_attr_release+0x20/0x20
>>>>> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
>>>>> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
>>>>> [ 26.426779] trace_call_bpf+0xb5/0x160
>>>>> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
>>>>> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
>>>>> [ 26.428496] kprobe_perf_func+0x4d/0x280
>>>>> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
>>>>> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
>>>>> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
>>>>> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
>>>>> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
>>>>> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
>>>>> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
>>>>> [ 26.433060] 0xffffffffc03180bf
>>>>> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
>>>>> ...
>>>>>
>>>>> To prevent we are running in arbitrary task (e.g., idle task)
>>>>> context which may introduce sleeping issues, the following
>>>>> probably appropriate:
>>>>>
>>>>> if (in_nmi() || in_softirq())
>>>>> return -EPERM;
>>>>>
>>>>> Anyway, if in nmi or softirq, the namespace and pid/tgid
>>>>> we get may be just accidentally associated with the bpf running
>>>>> context, but it could be in a different context. So such info
>>>>> is not reliable any way.
>>>>>
>>>>>> +
>>>>>> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
>>>>>> + return -EINVAL;
>>>>>> + pidns = task_active_pid_ns(current);
>>> [...]
>>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-28 20:53 ` Yonghong Song
@ 2019-08-28 21:03 ` Carlos Antonio Neira Bustos
2019-09-03 18:45 ` Carlos Antonio Neira Bustos
0 siblings, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-08-28 21:03 UTC (permalink / raw)
To: Yonghong Song; +Cc: netdev, Eric Biederman, brouer, bpf
Thanks, I'll work on the net/netif_receive_skb selftest using this helper.
I hope I could complete this work this week.
Bests.
On Wed, Aug 28, 2019 at 08:53:25PM +0000, Yonghong Song wrote:
>
>
> On 8/28/19 1:39 PM, Carlos Antonio Neira Bustos wrote:
> > Yonghong,
> >
> > Thanks for the pointer, I fixed this bug, but I found another one that's triggered
> > now the test program I included in tools/testing/selftests/bpf/test_pidns.
> > It's seemed that fname was not correctly setup when passing it to filename_lookup.
> > This is fixed now and I'm doing some more testing.
> > I think I'll remove the tests on samples/bpf as they are mostly end on -EPERM as
> > the fix intended.
> > Is ok to remove them and just focus to finish the self tests code?.
>
> Yes, the samples/bpf test case can be removed.
> Could you create a selftest with tracpoint net/netif_receive_skb, which
> also uses the proposed helper? net/netif_receive_skb will happen in
> interrupt context and it should catch the issue as well if
> filename_lookup still get called in interrupt context.
>
> >
> > Bests
> >
> > On Wed, Aug 14, 2019 at 01:25:06AM -0400, carlos antonio neira bustos wrote:
> >> Thank you very much!
> >>
> >> Bests
> >>
> >> El mié., 14 de ago. de 2019 00:50, Yonghong Song <yhs@fb.com> escribió:
> >>
> >>>
> >>>
> >>> On 8/13/19 5:56 PM, Carlos Antonio Neira Bustos wrote:
> >>>> On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
> >>>>>
> >>>>>
> >>>>> On 8/13/19 11:47 AM, Carlos Neira wrote:
> >>>>>> From: Carlos <cneirabustos@gmail.com>
> >>>>>>
> >>>>>> New bpf helper bpf_get_current_pidns_info.
> >>>>>> This helper obtains the active namespace from current and returns
> >>>>>> pid, tgid, device and namespace id as seen from that namespace,
> >>>>>> allowing to instrument a process inside a container.
> >>>>>>
> >>>>>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> >>>>>> ---
> >>>>>> fs/internal.h | 2 --
> >>>>>> fs/namei.c | 1 -
> >>>>>> include/linux/bpf.h | 1 +
> >>>>>> include/linux/namei.h | 4 +++
> >>>>>> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> >>>>>> kernel/bpf/core.c | 1 +
> >>>>>> kernel/bpf/helpers.c | 64
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>> kernel/trace/bpf_trace.c | 2 ++
> >>>>>> 8 files changed, 102 insertions(+), 4 deletions(-)
> >>>>>>
> >>> [...]
> >>>>>>
> >>>>>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *,
> >>> pidns_info, u32,
> >>>>>> + size)
> >>>>>> +{
> >>>>>> + const char *pidns_path = "/proc/self/ns/pid";
> >>>>>> + struct pid_namespace *pidns = NULL;
> >>>>>> + struct filename *tmp = NULL;
> >>>>>> + struct inode *inode;
> >>>>>> + struct path kp;
> >>>>>> + pid_t tgid = 0;
> >>>>>> + pid_t pid = 0;
> >>>>>> + int ret;
> >>>>>> + int len;
> >>>>>
> >>>>
> >>>> Thank you very much for catching this!.
> >>>> Could you share how to replicate this bug?.
> >>>
> >>> The config is attached. just run trace_ns_info and you
> >>> can reproduce the issue.
> >>>
> >>>>
> >>>>> I am running your sample program and get the following kernel bug:
> >>>>>
> >>>>> ...
> >>>>> [ 26.414825] BUG: sleeping function called from invalid context at
> >>>>> /data/users/yhs/work/net-next/fs
> >>>>> /dcache.c:843
> >>>>> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> >>>>> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W
> >>>>> 5.3.0-rc1+ #280
> >>>>> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> >>>>> BIOS 1.9.3-1.el7.centos 04/01/2
> >>>>> 014
> >>>>> [ 26.419393] Call Trace:
> >>>>> [ 26.419697] <IRQ>
> >>>>> [ 26.419960] dump_stack+0x46/0x5b
> >>>>> [ 26.420434] ___might_sleep+0xe4/0x110
> >>>>> [ 26.420894] dput+0x2a/0x200
> >>>>> [ 26.421265] walk_component+0x10c/0x280
> >>>>> [ 26.421773] link_path_walk+0x327/0x560
> >>>>> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
> >>>>> [ 26.422848] ? path_init+0x232/0x330
> >>>>> [ 26.423364] path_lookupat+0x88/0x200
> >>>>> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
> >>>>> [ 26.424521] filename_lookup+0xaf/0x190
> >>>>> [ 26.425031] ? simple_attr_release+0x20/0x20
> >>>>> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
> >>>>> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
> >>>>> [ 26.426779] trace_call_bpf+0xb5/0x160
> >>>>> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
> >>>>> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
> >>>>> [ 26.428496] kprobe_perf_func+0x4d/0x280
> >>>>> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
> >>>>> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
> >>>>> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> >>>>> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
> >>>>> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
> >>>>> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
> >>>>> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
> >>>>> [ 26.433060] 0xffffffffc03180bf
> >>>>> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
> >>>>> ...
> >>>>>
> >>>>> To prevent we are running in arbitrary task (e.g., idle task)
> >>>>> context which may introduce sleeping issues, the following
> >>>>> probably appropriate:
> >>>>>
> >>>>> if (in_nmi() || in_softirq())
> >>>>> return -EPERM;
> >>>>>
> >>>>> Anyway, if in nmi or softirq, the namespace and pid/tgid
> >>>>> we get may be just accidentally associated with the bpf running
> >>>>> context, but it could be in a different context. So such info
> >>>>> is not reliable any way.
> >>>>>
> >>>>>> +
> >>>>>> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> >>>>>> + return -EINVAL;
> >>>>>> + pidns = task_active_pid_ns(current);
> >>> [...]
> >>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-08-28 21:03 ` Carlos Antonio Neira Bustos
@ 2019-09-03 18:45 ` Carlos Antonio Neira Bustos
2019-09-03 20:36 ` Yonghong Song
0 siblings, 1 reply; 16+ messages in thread
From: Carlos Antonio Neira Bustos @ 2019-09-03 18:45 UTC (permalink / raw)
To: Yonghong Song; +Cc: netdev, Eric Biederman, brouer, bpf
Hi Yonghong,
> > Yes, the samples/bpf test case can be removed.
> > Could you create a selftest with tracpoint net/netif_receive_skb, which
> > also uses the proposed helper? net/netif_receive_skb will happen in
> > interrupt context and it should catch the issue as well if
> > filename_lookup still get called in interrupt context.
>
For this one scenario I just created another selftest with the only difference
that the tracepoint is /net/netif_receive_skb so this fails with -EPERM.
Is that enough?.
I have made this comment on include/uapi/linux/bpf.h, maybe is too terse?
struct bpf_pidns_info {
__u32 dev; /* dev_t from /proc/self/ns/pid inode */
__u32 nsid;
__u32 tgid;
__u32 pid;
};
I'm only missing clearing out those questions to be ready to submit v11 of this patch.
Bests
On Wed, Aug 28, 2019 at 05:03:35PM -0400, Carlos Antonio Neira Bustos wrote:
> Thanks, I'll work on the net/netif_receive_skb selftest using this helper.
> I hope I could complete this work this week.
>
> Bests.
>
> On Wed, Aug 28, 2019 at 08:53:25PM +0000, Yonghong Song wrote:
> >
> >
> > On 8/28/19 1:39 PM, Carlos Antonio Neira Bustos wrote:
> > > Yonghong,
> > >
> > > Thanks for the pointer, I fixed this bug, but I found another one that's triggered
> > > now the test program I included in tools/testing/selftests/bpf/test_pidns.
> > > It's seemed that fname was not correctly setup when passing it to filename_lookup.
> > > This is fixed now and I'm doing some more testing.
> > > I think I'll remove the tests on samples/bpf as they are mostly end on -EPERM as
> > > the fix intended.
> > > Is ok to remove them and just focus to finish the self tests code?.
> >
> > Yes, the samples/bpf test case can be removed.
> > Could you create a selftest with tracpoint net/netif_receive_skb, which
> > also uses the proposed helper? net/netif_receive_skb will happen in
> > interrupt context and it should catch the issue as well if
> > filename_lookup still get called in interrupt context.
> >
> > >
> > > Bests
> > >
> > > On Wed, Aug 14, 2019 at 01:25:06AM -0400, carlos antonio neira bustos wrote:
> > >> Thank you very much!
> > >>
> > >> Bests
> > >>
> > >> El mié., 14 de ago. de 2019 00:50, Yonghong Song <yhs@fb.com> escribió:
> > >>
> > >>>
> > >>>
> > >>> On 8/13/19 5:56 PM, Carlos Antonio Neira Bustos wrote:
> > >>>> On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
> > >>>>>
> > >>>>>
> > >>>>> On 8/13/19 11:47 AM, Carlos Neira wrote:
> > >>>>>> From: Carlos <cneirabustos@gmail.com>
> > >>>>>>
> > >>>>>> New bpf helper bpf_get_current_pidns_info.
> > >>>>>> This helper obtains the active namespace from current and returns
> > >>>>>> pid, tgid, device and namespace id as seen from that namespace,
> > >>>>>> allowing to instrument a process inside a container.
> > >>>>>>
> > >>>>>> Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > >>>>>> ---
> > >>>>>> fs/internal.h | 2 --
> > >>>>>> fs/namei.c | 1 -
> > >>>>>> include/linux/bpf.h | 1 +
> > >>>>>> include/linux/namei.h | 4 +++
> > >>>>>> include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
> > >>>>>> kernel/bpf/core.c | 1 +
> > >>>>>> kernel/bpf/helpers.c | 64
> > >>> ++++++++++++++++++++++++++++++++++++++++++++++++
> > >>>>>> kernel/trace/bpf_trace.c | 2 ++
> > >>>>>> 8 files changed, 102 insertions(+), 4 deletions(-)
> > >>>>>>
> > >>> [...]
> > >>>>>>
> > >>>>>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *,
> > >>> pidns_info, u32,
> > >>>>>> + size)
> > >>>>>> +{
> > >>>>>> + const char *pidns_path = "/proc/self/ns/pid";
> > >>>>>> + struct pid_namespace *pidns = NULL;
> > >>>>>> + struct filename *tmp = NULL;
> > >>>>>> + struct inode *inode;
> > >>>>>> + struct path kp;
> > >>>>>> + pid_t tgid = 0;
> > >>>>>> + pid_t pid = 0;
> > >>>>>> + int ret;
> > >>>>>> + int len;
> > >>>>>
> > >>>>
> > >>>> Thank you very much for catching this!.
> > >>>> Could you share how to replicate this bug?.
> > >>>
> > >>> The config is attached. just run trace_ns_info and you
> > >>> can reproduce the issue.
> > >>>
> > >>>>
> > >>>>> I am running your sample program and get the following kernel bug:
> > >>>>>
> > >>>>> ...
> > >>>>> [ 26.414825] BUG: sleeping function called from invalid context at
> > >>>>> /data/users/yhs/work/net-next/fs
> > >>>>> /dcache.c:843
> > >>>>> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
> > >>>>> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W
> > >>>>> 5.3.0-rc1+ #280
> > >>>>> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > >>>>> BIOS 1.9.3-1.el7.centos 04/01/2
> > >>>>> 014
> > >>>>> [ 26.419393] Call Trace:
> > >>>>> [ 26.419697] <IRQ>
> > >>>>> [ 26.419960] dump_stack+0x46/0x5b
> > >>>>> [ 26.420434] ___might_sleep+0xe4/0x110
> > >>>>> [ 26.420894] dput+0x2a/0x200
> > >>>>> [ 26.421265] walk_component+0x10c/0x280
> > >>>>> [ 26.421773] link_path_walk+0x327/0x560
> > >>>>> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
> > >>>>> [ 26.422848] ? path_init+0x232/0x330
> > >>>>> [ 26.423364] path_lookupat+0x88/0x200
> > >>>>> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
> > >>>>> [ 26.424521] filename_lookup+0xaf/0x190
> > >>>>> [ 26.425031] ? simple_attr_release+0x20/0x20
> > >>>>> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
> > >>>>> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
> > >>>>> [ 26.426779] trace_call_bpf+0xb5/0x160
> > >>>>> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
> > >>>>> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
> > >>>>> [ 26.428496] kprobe_perf_func+0x4d/0x280
> > >>>>> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
> > >>>>> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
> > >>>>> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
> > >>>>> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
> > >>>>> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
> > >>>>> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
> > >>>>> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
> > >>>>> [ 26.433060] 0xffffffffc03180bf
> > >>>>> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
> > >>>>> ...
> > >>>>>
> > >>>>> To prevent we are running in arbitrary task (e.g., idle task)
> > >>>>> context which may introduce sleeping issues, the following
> > >>>>> probably appropriate:
> > >>>>>
> > >>>>> if (in_nmi() || in_softirq())
> > >>>>> return -EPERM;
> > >>>>>
> > >>>>> Anyway, if in nmi or softirq, the namespace and pid/tgid
> > >>>>> we get may be just accidentally associated with the bpf running
> > >>>>> context, but it could be in a different context. So such info
> > >>>>> is not reliable any way.
> > >>>>>
> > >>>>>> +
> > >>>>>> + if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > >>>>>> + return -EINVAL;
> > >>>>>> + pidns = task_active_pid_ns(current);
> > >>> [...]
> > >>>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next V9 1/3] bpf: new helper to obtain namespace data from current task
2019-09-03 18:45 ` Carlos Antonio Neira Bustos
@ 2019-09-03 20:36 ` Yonghong Song
0 siblings, 0 replies; 16+ messages in thread
From: Yonghong Song @ 2019-09-03 20:36 UTC (permalink / raw)
To: Carlos Antonio Neira Bustos; +Cc: netdev, Eric Biederman, brouer, bpf
On 9/3/19 11:45 AM, Carlos Antonio Neira Bustos wrote:
> Hi Yonghong,
>
>>> Yes, the samples/bpf test case can be removed.
>>> Could you create a selftest with tracpoint net/netif_receive_skb, which
>>> also uses the proposed helper? net/netif_receive_skb will happen in
>>> interrupt context and it should catch the issue as well if
>>> filename_lookup still get called in interrupt context.
>>
> For this one scenario I just created another selftest with the only difference
> that the tracepoint is /net/netif_receive_skb so this fails with -EPERM.
> Is that enough?.
This should be fine.
>
> I have made this comment on include/uapi/linux/bpf.h, maybe is too terse?
>
> struct bpf_pidns_info {
> __u32 dev; /* dev_t from /proc/self/ns/pid inode */
> __u32 nsid;
> __u32 tgid;
> __u32 pid;
> };
Let us keep the above for now. I may have further comments based on
your test code which uses "dev".
>
> I'm only missing clearing out those questions to be ready to submit v11 of this patch.
Please go ahead to submit the new version.
Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2019-09-03 20:37 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-13 18:47 [PATCH bpf-next V9 0/3] BPF: New helper to obtain namespace data from current task Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 1/3] bpf: new " Carlos Neira
2019-08-13 22:35 ` Yonghong Song
2019-08-20 15:10 ` Carlos Antonio Neira Bustos
2019-08-20 17:29 ` Yonghong Song
2019-08-13 23:11 ` Yonghong Song
2019-08-13 23:51 ` [Potential Spoof] " Yonghong Song
2019-08-14 0:56 ` Carlos Antonio Neira Bustos
[not found] ` <9a2cacad-b79f-5d39-6d62-bb48cbaaac07@fb.com>
[not found] ` <CACiB22jyN9=0ATWWE+x=BoWD6u+8KO+MvBfsFQmcNfkmANb2_w@mail.gmail.com>
2019-08-28 20:39 ` Carlos Antonio Neira Bustos
2019-08-28 20:53 ` Yonghong Song
2019-08-28 21:03 ` Carlos Antonio Neira Bustos
2019-09-03 18:45 ` Carlos Antonio Neira Bustos
2019-09-03 20:36 ` Yonghong Song
2019-08-13 18:47 ` [PATCH bpf-next V9 2/3] samples/bpf: added sample code for bpf_get_current_pidns_info Carlos Neira
2019-08-13 18:47 ` [PATCH bpf-next V9 3/3] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira
2019-08-13 23:19 ` Yonghong Song
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).