* [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task @ 2019-09-24 15:20 Carlos Neira 2019-09-24 15:20 ` [PATCH bpf-next v11 1/4] fs/nsfs.c: added ns_match Carlos Neira ` (5 more replies) 0 siblings, 6 replies; 18+ messages in thread From: Carlos Neira @ 2019-09-24 15:20 UTC (permalink / raw) To: netdev; +Cc: yhs, ebiederm, brouer, bpf, cneirabustos Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's scripts but this helper returns the pid as seen by the root namespace which is fine when a bcc script is not executed inside a container. When the process of interest is inside a container, pid filtering will not work if bpf_get_current_pid_tgid() is used. This helper addresses this limitation returning the pid as it's seen by the current namespace where the script is executing. In the future different pid_ns files may belong to different devices, according to the discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference. To address that situation the helper requires inum and dev_t from /proc/self/ns/pid. This helper has the same use cases as bpf_get_current_pid_tgid() as it can be used to do pid filtering even inside a container. Signed-off-by: Carlos Neira <cneirabustos@gmail.com> Carlos Neira (4): fs/nsfs.c: added ns_match bpf: added new helper bpf_get_ns_current_pid_tgid tools: Added bpf_get_ns_current_pid_tgid helper tools/testing/selftests/bpf: Add self-tests for new helper. self tests added for new helper fs/nsfs.c | 8 + include/linux/bpf.h | 1 + include/linux/proc_ns.h | 2 + include/uapi/linux/bpf.h | 18 ++- kernel/bpf/core.c | 1 + kernel/bpf/helpers.c | 32 ++++ kernel/trace/bpf_trace.c | 2 + tools/include/uapi/linux/bpf.h | 18 ++- tools/testing/selftests/bpf/Makefile | 2 +- tools/testing/selftests/bpf/bpf_helpers.h | 3 + .../selftests/bpf/progs/test_pidns_kern.c | 71 ++++++++ tools/testing/selftests/bpf/test_pidns.c | 152 ++++++++++++++++++ 12 files changed, 307 insertions(+), 3 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c create mode 100644 tools/testing/selftests/bpf/test_pidns.c -- 2.20.1 ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH bpf-next v11 1/4] fs/nsfs.c: added ns_match 2019-09-24 15:20 [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira @ 2019-09-24 15:20 ` Carlos Neira 2019-09-24 15:20 ` [PATCH bpf-next v11 2/4] bpf: added new helper bpf_get_ns_current_pid_tgid Carlos Neira ` (4 subsequent siblings) 5 siblings, 0 replies; 18+ messages in thread From: Carlos Neira @ 2019-09-24 15:20 UTC (permalink / raw) To: netdev; +Cc: yhs, ebiederm, brouer, bpf, cneirabustos ns_match returns true if the namespace inode and dev_t matches the ones provided by the caller. Signed-off-by: Carlos Neira <cneirabustos@gmail.com> --- fs/nsfs.c | 8 ++++++++ include/linux/proc_ns.h | 2 ++ 2 files changed, 10 insertions(+) diff --git a/fs/nsfs.c b/fs/nsfs.c index a0431642c6b5..256f6295d33d 100644 --- a/fs/nsfs.c +++ b/fs/nsfs.c @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd) return ERR_PTR(-EINVAL); } +/* Returns true if current namespace matches dev/ino. + */ +bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino) +{ + return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev)); +} + + static int nsfs_show_path(struct seq_file *seq, struct dentry *dentry) { struct inode *inode = d_inode(dentry); diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h index d31cb6215905..1da9f33489f3 100644 --- a/include/linux/proc_ns.h +++ b/include/linux/proc_ns.h @@ -82,6 +82,8 @@ typedef struct ns_common *ns_get_path_helper_t(void *); extern void *ns_get_path_cb(struct path *path, ns_get_path_helper_t ns_get_cb, void *private_data); +extern bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino); + extern int ns_get_name(char *buf, size_t size, struct task_struct *task, const struct proc_ns_operations *ns_ops); extern void nsfs_init(void); -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH bpf-next v11 2/4] bpf: added new helper bpf_get_ns_current_pid_tgid 2019-09-24 15:20 [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira 2019-09-24 15:20 ` [PATCH bpf-next v11 1/4] fs/nsfs.c: added ns_match Carlos Neira @ 2019-09-24 15:20 ` Carlos Neira 2019-09-27 16:15 ` Andrii Nakryiko 2019-09-24 15:20 ` [PATCH bpf-next v11 3/4] tools: Added bpf_get_ns_current_pid_tgid helper Carlos Neira ` (3 subsequent siblings) 5 siblings, 1 reply; 18+ messages in thread From: Carlos Neira @ 2019-09-24 15:20 UTC (permalink / raw) To: netdev; +Cc: yhs, ebiederm, brouer, bpf, cneirabustos New bpf helper bpf_get_ns_current_pid_tgid, This helper will return pid and tgid from current task which namespace matches dev_t and inode number provided, this will allows us to instrument a process inside a container. Signed-off-by: Carlos Neira <cneirabustos@gmail.com> --- include/linux/bpf.h | 1 + include/uapi/linux/bpf.h | 18 +++++++++++++++++- kernel/bpf/core.c | 1 + kernel/bpf/helpers.c | 32 ++++++++++++++++++++++++++++++++ kernel/trace/bpf_trace.c | 2 ++ 5 files changed, 53 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5b9d22338606..231001475504 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1055,6 +1055,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto; extern const struct bpf_func_proto bpf_strtol_proto; extern const struct bpf_func_proto bpf_strtoul_proto; extern const struct bpf_func_proto bpf_tcp_sock_proto; +extern const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto; /* Shared helpers among cBPF and eBPF. */ void bpf_user_rnd_init_once(void); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 77c6be96d676..9272dc8fb08c 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2750,6 +2750,21 @@ union bpf_attr { * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies * * **-EPROTONOSUPPORT** IP packet version is not 4 or 6 + * + * int bpf_get_ns_current_pid_tgid(u32 dev, u64 inum) + * Return + * A 64-bit integer containing the current tgid and pid from current task + * which namespace inode and dev_t matches , and is create as such: + * *current_task*\ **->tgid << 32 \|** + * *current_task*\ **->pid**. + * + * On failure, the returned value is one of the following: + * + * **-EINVAL** if dev and inum supplied don't match dev_t and inode number + * with nsfs of current task. + * + * **-ENOENT** if /proc/self/ns does not exists. + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2862,7 +2877,8 @@ union bpf_attr { FN(sk_storage_get), \ FN(sk_storage_delete), \ FN(send_signal), \ - FN(tcp_gen_syncookie), + FN(tcp_gen_syncookie), \ + FN(get_ns_current_pid_tgid), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 66088a9e9b9e..b2fd5358f472 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2042,6 +2042,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak; const struct bpf_func_proto bpf_get_current_comm_proto __weak; const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak; const struct bpf_func_proto bpf_get_local_storage_proto __weak; +const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto __weak; const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void) { diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 5e28718928ca..81a716eae7ed 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -11,6 +11,8 @@ #include <linux/uidgid.h> #include <linux/filter.h> #include <linux/ctype.h> +#include <linux/pid_namespace.h> +#include <linux/proc_ns.h> #include "../../lib/kstrtox.h" @@ -487,3 +489,33 @@ const struct bpf_func_proto bpf_strtoul_proto = { .arg4_type = ARG_PTR_TO_LONG, }; #endif + +BPF_CALL_2(bpf_get_ns_current_pid_tgid, u32, dev, u64, inum) +{ + struct task_struct *task = current; + struct pid_namespace *pidns; + pid_t pid, tgid; + + if (unlikely(!task)) + return -EINVAL; + + pidns = task_active_pid_ns(task); + if (unlikely(!pidns)) + return -ENOENT; + + if (!ns_match(&pidns->ns, (dev_t)dev, inum)) + return -EINVAL; + + pid = task_pid_nr_ns(task, pidns); + tgid = task_tgid_nr_ns(task, pidns); + + return (u64) tgid << 32 | pid; +} + +const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto = { + .func = bpf_get_ns_current_pid_tgid, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_ANYTHING, + .arg2_type = ARG_ANYTHING, +}; diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index ca1255d14576..1d34f1013e78 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) #endif case BPF_FUNC_send_signal: return &bpf_send_signal_proto; + case BPF_FUNC_get_ns_current_pid_tgid: + return &bpf_get_ns_current_pid_tgid_proto; default: return NULL; } -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v11 2/4] bpf: added new helper bpf_get_ns_current_pid_tgid 2019-09-24 15:20 ` [PATCH bpf-next v11 2/4] bpf: added new helper bpf_get_ns_current_pid_tgid Carlos Neira @ 2019-09-27 16:15 ` Andrii Nakryiko 2019-09-27 16:59 ` Yonghong Song 0 siblings, 1 reply; 18+ messages in thread From: Andrii Nakryiko @ 2019-09-27 16:15 UTC (permalink / raw) To: Carlos Neira Cc: Networking, Yonghong Song, ebiederm, Jesper Dangaard Brouer, bpf On Thu, Sep 26, 2019 at 1:15 AM Carlos Neira <cneirabustos@gmail.com> wrote: > > New bpf helper bpf_get_ns_current_pid_tgid, > This helper will return pid and tgid from current task > which namespace matches dev_t and inode number provided, > this will allows us to instrument a process inside a container. > > Signed-off-by: Carlos Neira <cneirabustos@gmail.com> > --- > include/linux/bpf.h | 1 + > include/uapi/linux/bpf.h | 18 +++++++++++++++++- > kernel/bpf/core.c | 1 + > kernel/bpf/helpers.c | 32 ++++++++++++++++++++++++++++++++ > kernel/trace/bpf_trace.c | 2 ++ > 5 files changed, 53 insertions(+), 1 deletion(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 5b9d22338606..231001475504 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -1055,6 +1055,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto; > extern const struct bpf_func_proto bpf_strtol_proto; > extern const struct bpf_func_proto bpf_strtoul_proto; > extern const struct bpf_func_proto bpf_tcp_sock_proto; > +extern const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto; > > /* Shared helpers among cBPF and eBPF. */ > void bpf_user_rnd_init_once(void); > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 77c6be96d676..9272dc8fb08c 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -2750,6 +2750,21 @@ union bpf_attr { > * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies > * > * **-EPROTONOSUPPORT** IP packet version is not 4 or 6 > + * > + * int bpf_get_ns_current_pid_tgid(u32 dev, u64 inum) > + * Return > + * A 64-bit integer containing the current tgid and pid from current task Function signature doesn't correspond to the actual return type (int vs u64). > + * which namespace inode and dev_t matches , and is create as such: > + * *current_task*\ **->tgid << 32 \|** > + * *current_task*\ **->pid**. > + * > + * On failure, the returned value is one of the following: > + * > + * **-EINVAL** if dev and inum supplied don't match dev_t and inode number > + * with nsfs of current task. > + * > + * **-ENOENT** if /proc/self/ns does not exists. > + * > */ [...] > #include "../../lib/kstrtox.h" > > @@ -487,3 +489,33 @@ const struct bpf_func_proto bpf_strtoul_proto = { > .arg4_type = ARG_PTR_TO_LONG, > }; > #endif > + > +BPF_CALL_2(bpf_get_ns_current_pid_tgid, u32, dev, u64, inum) Just curious, is dev_t officially specified as u32 and is never supposed to grow bigger? I wonder if accepting u64 might be more future-proof API here? > +{ > + struct task_struct *task = current; > + struct pid_namespace *pidns; [...] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v11 2/4] bpf: added new helper bpf_get_ns_current_pid_tgid 2019-09-27 16:15 ` Andrii Nakryiko @ 2019-09-27 16:59 ` Yonghong Song 2019-09-27 17:24 ` Andrii Nakryiko 0 siblings, 1 reply; 18+ messages in thread From: Yonghong Song @ 2019-09-27 16:59 UTC (permalink / raw) To: Andrii Nakryiko, Carlos Neira Cc: Networking, ebiederm, Jesper Dangaard Brouer, bpf On 9/27/19 9:15 AM, Andrii Nakryiko wrote: > On Thu, Sep 26, 2019 at 1:15 AM Carlos Neira <cneirabustos@gmail.com> wrote: >> >> New bpf helper bpf_get_ns_current_pid_tgid, >> This helper will return pid and tgid from current task >> which namespace matches dev_t and inode number provided, >> this will allows us to instrument a process inside a container. >> >> Signed-off-by: Carlos Neira <cneirabustos@gmail.com> >> --- >> include/linux/bpf.h | 1 + >> include/uapi/linux/bpf.h | 18 +++++++++++++++++- >> kernel/bpf/core.c | 1 + >> kernel/bpf/helpers.c | 32 ++++++++++++++++++++++++++++++++ >> kernel/trace/bpf_trace.c | 2 ++ >> 5 files changed, 53 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h >> index 5b9d22338606..231001475504 100644 >> --- a/include/linux/bpf.h >> +++ b/include/linux/bpf.h >> @@ -1055,6 +1055,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto; >> extern const struct bpf_func_proto bpf_strtol_proto; >> extern const struct bpf_func_proto bpf_strtoul_proto; >> extern const struct bpf_func_proto bpf_tcp_sock_proto; >> +extern const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto; >> >> /* Shared helpers among cBPF and eBPF. */ >> void bpf_user_rnd_init_once(void); >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >> index 77c6be96d676..9272dc8fb08c 100644 >> --- a/include/uapi/linux/bpf.h >> +++ b/include/uapi/linux/bpf.h >> @@ -2750,6 +2750,21 @@ union bpf_attr { >> * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies >> * >> * **-EPROTONOSUPPORT** IP packet version is not 4 or 6 >> + * >> + * int bpf_get_ns_current_pid_tgid(u32 dev, u64 inum) >> + * Return >> + * A 64-bit integer containing the current tgid and pid from current task > > Function signature doesn't correspond to the actual return type (int vs u64). > >> + * which namespace inode and dev_t matches , and is create as such: >> + * *current_task*\ **->tgid << 32 \|** >> + * *current_task*\ **->pid**. >> + * >> + * On failure, the returned value is one of the following: >> + * >> + * **-EINVAL** if dev and inum supplied don't match dev_t and inode number >> + * with nsfs of current task. >> + * >> + * **-ENOENT** if /proc/self/ns does not exists. >> + * >> */ > > [...] > >> #include "../../lib/kstrtox.h" >> >> @@ -487,3 +489,33 @@ const struct bpf_func_proto bpf_strtoul_proto = { >> .arg4_type = ARG_PTR_TO_LONG, >> }; >> #endif >> + >> +BPF_CALL_2(bpf_get_ns_current_pid_tgid, u32, dev, u64, inum) > > Just curious, is dev_t officially specified as u32 and is never > supposed to grow bigger? I wonder if accepting u64 might be more > future-proof API here? This is what we have now in kernel (include/linux/types.h) typedef u32 __kernel_dev_t; typedef __kernel_dev_t dev_t; But userspace dev_t (defined at /usr/include/sys/types.h) have 8 bytes. Agree. Let us just use u64. It won't hurt and also will be fine if kernel internal dev_t becomes 64bit. > >> +{ >> + struct task_struct *task = current; >> + struct pid_namespace *pidns; > > [...] > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v11 2/4] bpf: added new helper bpf_get_ns_current_pid_tgid 2019-09-27 16:59 ` Yonghong Song @ 2019-09-27 17:24 ` Andrii Nakryiko 2019-09-28 1:42 ` Carlos Antonio Neira Bustos 0 siblings, 1 reply; 18+ messages in thread From: Andrii Nakryiko @ 2019-09-27 17:24 UTC (permalink / raw) To: Yonghong Song Cc: Carlos Neira, Networking, ebiederm, Jesper Dangaard Brouer, bpf On Fri, Sep 27, 2019 at 9:59 AM Yonghong Song <yhs@fb.com> wrote: > > > > On 9/27/19 9:15 AM, Andrii Nakryiko wrote: > > On Thu, Sep 26, 2019 at 1:15 AM Carlos Neira <cneirabustos@gmail.com> wrote: > >> > >> New bpf helper bpf_get_ns_current_pid_tgid, > >> This helper will return pid and tgid from current task > >> which namespace matches dev_t and inode number provided, > >> this will allows us to instrument a process inside a container. > >> > >> Signed-off-by: Carlos Neira <cneirabustos@gmail.com> > >> --- > >> include/linux/bpf.h | 1 + > >> include/uapi/linux/bpf.h | 18 +++++++++++++++++- > >> kernel/bpf/core.c | 1 + > >> kernel/bpf/helpers.c | 32 ++++++++++++++++++++++++++++++++ > >> kernel/trace/bpf_trace.c | 2 ++ > >> 5 files changed, 53 insertions(+), 1 deletion(-) > >> > >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h > >> index 5b9d22338606..231001475504 100644 > >> --- a/include/linux/bpf.h > >> +++ b/include/linux/bpf.h > >> @@ -1055,6 +1055,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto; > >> extern const struct bpf_func_proto bpf_strtol_proto; > >> extern const struct bpf_func_proto bpf_strtoul_proto; > >> extern const struct bpf_func_proto bpf_tcp_sock_proto; > >> +extern const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto; > >> > >> /* Shared helpers among cBPF and eBPF. */ > >> void bpf_user_rnd_init_once(void); > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >> index 77c6be96d676..9272dc8fb08c 100644 > >> --- a/include/uapi/linux/bpf.h > >> +++ b/include/uapi/linux/bpf.h > >> @@ -2750,6 +2750,21 @@ union bpf_attr { > >> * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies > >> * > >> * **-EPROTONOSUPPORT** IP packet version is not 4 or 6 > >> + * > >> + * int bpf_get_ns_current_pid_tgid(u32 dev, u64 inum) > >> + * Return > >> + * A 64-bit integer containing the current tgid and pid from current task > > > > Function signature doesn't correspond to the actual return type (int vs u64). > > > >> + * which namespace inode and dev_t matches , and is create as such: > >> + * *current_task*\ **->tgid << 32 \|** > >> + * *current_task*\ **->pid**. > >> + * > >> + * On failure, the returned value is one of the following: > >> + * > >> + * **-EINVAL** if dev and inum supplied don't match dev_t and inode number > >> + * with nsfs of current task. > >> + * > >> + * **-ENOENT** if /proc/self/ns does not exists. > >> + * > >> */ > > > > [...] > > > >> #include "../../lib/kstrtox.h" > >> > >> @@ -487,3 +489,33 @@ const struct bpf_func_proto bpf_strtoul_proto = { > >> .arg4_type = ARG_PTR_TO_LONG, > >> }; > >> #endif > >> + > >> +BPF_CALL_2(bpf_get_ns_current_pid_tgid, u32, dev, u64, inum) > > > > Just curious, is dev_t officially specified as u32 and is never > > supposed to grow bigger? I wonder if accepting u64 might be more > > future-proof API here? > > This is what we have now in kernel (include/linux/types.h) > typedef u32 __kernel_dev_t; > typedef __kernel_dev_t dev_t; > > But userspace dev_t (defined at /usr/include/sys/types.h) have > 8 bytes. > > Agree. Let us just use u64. It won't hurt and also will be fine > if kernel internal dev_t becomes 64bit. Sounds good. Let's not forget to check that conversion to dev_t doesn't loose high bits, something like: if ((u64)(dev_t)dev != dev) return -E<something>; > > > > >> +{ > >> + struct task_struct *task = current; > >> + struct pid_namespace *pidns; > > > > [...] > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v11 2/4] bpf: added new helper bpf_get_ns_current_pid_tgid 2019-09-27 17:24 ` Andrii Nakryiko @ 2019-09-28 1:42 ` Carlos Antonio Neira Bustos 0 siblings, 0 replies; 18+ messages in thread From: Carlos Antonio Neira Bustos @ 2019-09-28 1:42 UTC (permalink / raw) To: Andrii Nakryiko Cc: Yonghong Song, Networking, ebiederm, Jesper Dangaard Brouer, bpf On Fri, Sep 27, 2019 at 10:24:46AM -0700, Andrii Nakryiko wrote: > On Fri, Sep 27, 2019 at 9:59 AM Yonghong Song <yhs@fb.com> wrote: > > > > > > > > On 9/27/19 9:15 AM, Andrii Nakryiko wrote: > > > On Thu, Sep 26, 2019 at 1:15 AM Carlos Neira <cneirabustos@gmail.com> wrote: > > >> > > >> New bpf helper bpf_get_ns_current_pid_tgid, > > >> This helper will return pid and tgid from current task > > >> which namespace matches dev_t and inode number provided, > > >> this will allows us to instrument a process inside a container. > > >> > > >> Signed-off-by: Carlos Neira <cneirabustos@gmail.com> > > >> --- > > >> include/linux/bpf.h | 1 + > > >> include/uapi/linux/bpf.h | 18 +++++++++++++++++- > > >> kernel/bpf/core.c | 1 + > > >> kernel/bpf/helpers.c | 32 ++++++++++++++++++++++++++++++++ > > >> kernel/trace/bpf_trace.c | 2 ++ > > >> 5 files changed, 53 insertions(+), 1 deletion(-) > > >> > > >> diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > >> index 5b9d22338606..231001475504 100644 > > >> --- a/include/linux/bpf.h > > >> +++ b/include/linux/bpf.h > > >> @@ -1055,6 +1055,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto; > > >> extern const struct bpf_func_proto bpf_strtol_proto; > > >> extern const struct bpf_func_proto bpf_strtoul_proto; > > >> extern const struct bpf_func_proto bpf_tcp_sock_proto; > > >> +extern const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto; > > >> > > >> /* Shared helpers among cBPF and eBPF. */ > > >> void bpf_user_rnd_init_once(void); > > >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > >> index 77c6be96d676..9272dc8fb08c 100644 > > >> --- a/include/uapi/linux/bpf.h > > >> +++ b/include/uapi/linux/bpf.h > > >> @@ -2750,6 +2750,21 @@ union bpf_attr { > > >> * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies > > >> * > > >> * **-EPROTONOSUPPORT** IP packet version is not 4 or 6 > > >> + * > > >> + * int bpf_get_ns_current_pid_tgid(u32 dev, u64 inum) > > >> + * Return > > >> + * A 64-bit integer containing the current tgid and pid from current task > > > > > > Function signature doesn't correspond to the actual return type (int vs u64). > > > > > >> + * which namespace inode and dev_t matches , and is create as such: > > >> + * *current_task*\ **->tgid << 32 \|** > > >> + * *current_task*\ **->pid**. > > >> + * > > >> + * On failure, the returned value is one of the following: > > >> + * > > >> + * **-EINVAL** if dev and inum supplied don't match dev_t and inode number > > >> + * with nsfs of current task. > > >> + * > > >> + * **-ENOENT** if /proc/self/ns does not exists. > > >> + * > > >> */ > > > > > > [...] > > > > > >> #include "../../lib/kstrtox.h" > > >> > > >> @@ -487,3 +489,33 @@ const struct bpf_func_proto bpf_strtoul_proto = { > > >> .arg4_type = ARG_PTR_TO_LONG, > > >> }; > > >> #endif > > >> + > > >> +BPF_CALL_2(bpf_get_ns_current_pid_tgid, u32, dev, u64, inum) > > > > > > Just curious, is dev_t officially specified as u32 and is never > > > supposed to grow bigger? I wonder if accepting u64 might be more > > > future-proof API here? > > > > This is what we have now in kernel (include/linux/types.h) > > typedef u32 __kernel_dev_t; > > typedef __kernel_dev_t dev_t; > > > > But userspace dev_t (defined at /usr/include/sys/types.h) have > > 8 bytes. > > > > Agree. Let us just use u64. It won't hurt and also will be fine > > if kernel internal dev_t becomes 64bit. > > Sounds good. Let's not forget to check that conversion to dev_t > doesn't loose high bits, something like: > > if ((u64)(dev_t)dev != dev) > return -E<something>; > > > > > > > > >> +{ > > >> + struct task_struct *task = current; > > >> + struct pid_namespace *pidns; > > > > > > [...] > > > Thanks Yonghong and Andrii, I'll include these fixes on V12, I'll work on this over the weekend. Bests ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH bpf-next v11 3/4] tools: Added bpf_get_ns_current_pid_tgid helper 2019-09-24 15:20 [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira 2019-09-24 15:20 ` [PATCH bpf-next v11 1/4] fs/nsfs.c: added ns_match Carlos Neira 2019-09-24 15:20 ` [PATCH bpf-next v11 2/4] bpf: added new helper bpf_get_ns_current_pid_tgid Carlos Neira @ 2019-09-24 15:20 ` Carlos Neira 2019-09-25 3:27 ` Yonghong Song 2019-09-24 15:20 ` [PATCH bpf-next v11 4/4] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira ` (2 subsequent siblings) 5 siblings, 1 reply; 18+ messages in thread From: Carlos Neira @ 2019-09-24 15:20 UTC (permalink / raw) To: netdev; +Cc: yhs, ebiederm, brouer, bpf, cneirabustos Signed-off-by: Carlos Neira <cneirabustos@gmail.com> --- tools/include/uapi/linux/bpf.h | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 77c6be96d676..9272dc8fb08c 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2750,6 +2750,21 @@ union bpf_attr { * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies * * **-EPROTONOSUPPORT** IP packet version is not 4 or 6 + * + * int bpf_get_ns_current_pid_tgid(u32 dev, u64 inum) + * Return + * A 64-bit integer containing the current tgid and pid from current task + * which namespace inode and dev_t matches , and is create as such: + * *current_task*\ **->tgid << 32 \|** + * *current_task*\ **->pid**. + * + * On failure, the returned value is one of the following: + * + * **-EINVAL** if dev and inum supplied don't match dev_t and inode number + * with nsfs of current task. + * + * **-ENOENT** if /proc/self/ns does not exists. + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2862,7 +2877,8 @@ union bpf_attr { FN(sk_storage_get), \ FN(sk_storage_delete), \ FN(send_signal), \ - FN(tcp_gen_syncookie), + FN(tcp_gen_syncookie), \ + FN(get_ns_current_pid_tgid), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v11 3/4] tools: Added bpf_get_ns_current_pid_tgid helper 2019-09-24 15:20 ` [PATCH bpf-next v11 3/4] tools: Added bpf_get_ns_current_pid_tgid helper Carlos Neira @ 2019-09-25 3:27 ` Yonghong Song 0 siblings, 0 replies; 18+ messages in thread From: Yonghong Song @ 2019-09-25 3:27 UTC (permalink / raw) To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf On 9/24/19 8:20 AM, Carlos Neira wrote: > Signed-off-by: Carlos Neira <cneirabustos@gmail.com> Please do add some commit message. A couple of examples, commit 0fc2e0b84ba725c5e6ee66059936638389e67c80 Author: Alexei Starovoitov <ast@kernel.org> Date: Thu Aug 22 22:52:13 2019 -0700 tools/bpf: sync bpf.h sync bpf.h from kernel/ to tools/ Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> commit 1f8919b170318e7e13e303eedac363d44057995f Author: Peter Wu <peter@lekensteyn.nl> Date: Wed Aug 21 00:09:00 2019 +0100 bpf: sync bpf.h to tools/ Fix a 'struct pt_reg' typo and clarify when bpf_trace_printk discards lines. Affects documentation only. Signed-off-by: Peter Wu <peter@lekensteyn.nl> Signed-off-by: Alexei Starovoitov <ast@kernel.org> > --- > tools/include/uapi/linux/bpf.h | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index 77c6be96d676..9272dc8fb08c 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -2750,6 +2750,21 @@ union bpf_attr { > * **-EOPNOTSUPP** kernel configuration does not enable SYN cookies > * > * **-EPROTONOSUPPORT** IP packet version is not 4 or 6 > + * > + * int bpf_get_ns_current_pid_tgid(u32 dev, u64 inum) > + * Return > + * A 64-bit integer containing the current tgid and pid from current task > + * which namespace inode and dev_t matches , and is create as such: > + * *current_task*\ **->tgid << 32 \|** > + * *current_task*\ **->pid**. > + * > + * On failure, the returned value is one of the following: > + * > + * **-EINVAL** if dev and inum supplied don't match dev_t and inode number > + * with nsfs of current task. > + * > + * **-ENOENT** if /proc/self/ns does not exists. > + * > */ > #define __BPF_FUNC_MAPPER(FN) \ > FN(unspec), \ > @@ -2862,7 +2877,8 @@ union bpf_attr { > FN(sk_storage_get), \ > FN(sk_storage_delete), \ > FN(send_signal), \ > - FN(tcp_gen_syncookie), > + FN(tcp_gen_syncookie), \ > + FN(get_ns_current_pid_tgid), > > /* integer value in 'imm' field of BPF_CALL instruction selects which helper > * function eBPF program intends to call > ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH bpf-next v11 4/4] tools/testing/selftests/bpf: Add self-tests for new helper. 2019-09-24 15:20 [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira ` (2 preceding siblings ...) 2019-09-24 15:20 ` [PATCH bpf-next v11 3/4] tools: Added bpf_get_ns_current_pid_tgid helper Carlos Neira @ 2019-09-24 15:20 ` Carlos Neira 2019-09-25 16:07 ` Yonghong Song 2019-09-24 18:01 ` [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Daniel Borkmann 2019-09-26 0:59 ` Eric W. Biederman 5 siblings, 1 reply; 18+ messages in thread From: Carlos Neira @ 2019-09-24 15:20 UTC (permalink / raw) To: netdev; +Cc: yhs, ebiederm, brouer, bpf, cneirabustos Self tests added for new helper Signed-off-by: Carlos Neira <cneirabustos@gmail.com> --- tools/testing/selftests/bpf/Makefile | 2 +- tools/testing/selftests/bpf/bpf_helpers.h | 3 + .../selftests/bpf/progs/test_pidns_kern.c | 71 ++++++++ tools/testing/selftests/bpf/test_pidns.c | 152 ++++++++++++++++++ 4 files changed, 227 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c create mode 100644 tools/testing/selftests/bpf/test_pidns.c diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 7f3196af1ae4..d86b28aa8f44 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -28,7 +28,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test test_sock test_btf test_sockmap get_cgroup_id_user test_socket_cookie \ test_cgroup_storage test_select_reuseport test_section_names \ test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \ - test_btf_dump test_cgroup_attach xdping + test_btf_dump test_cgroup_attach xdping test_pidns BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c))) TEST_GEN_FILES = $(BPF_OBJ_FILES) diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h index 6c4930bc6e2e..03d0e15ae29f 100644 --- a/tools/testing/selftests/bpf/bpf_helpers.h +++ b/tools/testing/selftests/bpf/bpf_helpers.h @@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal; static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip, int ip_len, void *tcp, int tcp_len) = (void *) BPF_FUNC_tcp_gen_syncookie; +static int (*bpf_get_ns_current_pid_tgid)(__u32 dev, __u64 inum) = + (void *) BPF_FUNC_get_ns_current_pid_tgid; + /* llvm builtin functions that eBPF C program may use to * emit BPF_LD_ABS and BPF_LD_IND instructions diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c new file mode 100644 index 000000000000..96cb707db3ee --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c @@ -0,0 +1,71 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + */ + +#include <linux/bpf.h> +#include "bpf_helpers.h" + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, __u32); + __type(value, __u64); +} ns_inum_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, __u32); + __type(value, __u32); +} ns_dev_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, __u32); + __type(value, __u32); +} pidmap SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, __u32); + __type(value, __u64); +} ns_pid_map SEC(".maps"); + + + +SEC("tracepoint/syscalls/sys_enter_nanosleep") +int trace(void *ctx) +{ + __u32 key = 0, *expected_pid, *dev; + __u64 *val, *inum, nspid; + __u32 pid; + + dev = bpf_map_lookup_elem(&ns_dev_map, &key); + if (!dev) + return 0; + + inum = bpf_map_lookup_elem(&ns_inum_map, &key); + if (!inum) + return 0; + + nspid = bpf_get_ns_current_pid_tgid(*dev, *inum); + expected_pid = bpf_map_lookup_elem(&pidmap, &key); + + if (!expected_pid || *expected_pid != nspid) + return 0; + + val = bpf_map_lookup_elem(&ns_pid_map, &key); + if (val) + *val = nspid; + + return 0; +} + +char _license[] SEC("license") = "GPL"; +__u32 _version SEC("version") = 1; diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c new file mode 100644 index 000000000000..088f8025f2bf --- /dev/null +++ b/tools/testing/selftests/bpf/test_pidns.c @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + */ + +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <errno.h> +#include <fcntl.h> +#include <syscall.h> +#include <unistd.h> +#include <linux/perf_event.h> +#include <sys/ioctl.h> +#include <sys/time.h> +#include <sys/types.h> +#include <sys/stat.h> + +#include <linux/bpf.h> +#include <bpf/bpf.h> +#include <bpf/libbpf.h> + +#include "cgroup_helpers.h" +#include "bpf_rlimit.h" + +#define CHECK(condition, tag, format...) ({ \ + int __ret = !!(condition); \ + if (__ret) { \ + printf("%s:FAIL:%s ", __func__, tag); \ + printf(format); \ + } else { \ + printf("%s:PASS:%s\n", __func__, tag); \ + } \ + __ret; \ +}) + +static int bpf_find_map(const char *test, struct bpf_object *obj, + const char *name) +{ + struct bpf_map *map; + + map = bpf_object__find_map_by_name(obj, name); + if (!map) + return -1; + return bpf_map__fd(map); +} + + +int main(int argc, char **argv) +{ + int pidmap_fd, ns_inum_map_fd, ns_dev_map_fd, ns_pid_map_fd; + const char *probe_name = "syscalls/sys_enter_nanosleep"; + const char *file = "test_pidns_kern.o"; + int err, bytes, efd, prog_fd, pmu_fd; + struct perf_event_attr attr = {}; + struct bpf_object *obj; + __u32 nspid = 0; + __u32 key = 0, pid; + int exit_code = 1; + struct stat st; + char buf[256]; + + err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd); + if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno)) + goto cleanup_cgroup_env; + + ns_dev_map_fd = bpf_find_map(__func__, obj, "ns_dev_map"); + if (CHECK(ns_dev_map_fd < 0, "bpf_find_map", "err %d errno %d\n", + ns_dev_map_fd, errno)) + goto close_prog; + + ns_inum_map_fd = bpf_find_map(__func__, obj, "ns_inum_map"); + if (CHECK(ns_inum_map_fd < 0, "bpf_find_map", "err %d errno %d\n", + ns_inum_map_fd, errno)) + goto close_prog; + + ns_pid_map_fd = bpf_find_map(__func__, obj, "ns_pid_map"); + if (CHECK(ns_pid_map_fd < 0, "bpf_find_map", "err %d errno %d\n", + ns_pid_map_fd, errno)) + goto close_prog; + + pidmap_fd = bpf_find_map(__func__, obj, "pidmap"); + if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n", + pidmap_fd, errno)) + goto close_prog; + + pid = getpid(); + bpf_map_update_elem(pidmap_fd, &key, &pid, 0); + + if (stat("/proc/self/ns/pid", &st)) + goto close_prog; + + bpf_map_update_elem(ns_inum_map_fd, &key, &st.st_ino, 0); + bpf_map_update_elem(ns_dev_map_fd, &key, &st.st_dev, 0); + + + snprintf(buf, sizeof(buf), + "/sys/kernel/debug/tracing/events/%s/id", probe_name); + efd = open(buf, O_RDONLY, 0); + if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno)) + goto close_prog; + bytes = read(efd, buf, sizeof(buf)); + close(efd); + if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read", + "bytes %d errno %d\n", bytes, errno)) + goto close_prog; + + attr.config = strtol(buf, NULL, 0); + attr.type = PERF_TYPE_TRACEPOINT; + attr.sample_type = PERF_SAMPLE_RAW; + attr.sample_period = 1; + attr.wakeup_events = 1; + + pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0); + if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd, + errno)) + goto close_prog; + + err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0); + if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err, + errno)) + goto close_pmu; + + err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); + if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err, + errno)) + goto close_pmu; + + /* trigger some syscalls */ + sleep(1); + + err = bpf_map_lookup_elem(ns_pid_map_fd, &key, &nspid); + if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno)) + goto close_pmu; + + if (CHECK(nspid != pid, "compare nspid vs pid", + "kern nspid %u user pid %u", nspid, pid)) + goto close_pmu; + + exit_code = 0; + printf("%s:PASS\n", argv[0]); + +close_pmu: + close(pmu_fd); +close_prog: + bpf_object__close(obj); +cleanup_cgroup_env: + return exit_code; +} -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v11 4/4] tools/testing/selftests/bpf: Add self-tests for new helper. 2019-09-24 15:20 ` [PATCH bpf-next v11 4/4] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira @ 2019-09-25 16:07 ` Yonghong Song 2019-09-25 20:33 ` Carlos Antonio Neira Bustos 0 siblings, 1 reply; 18+ messages in thread From: Yonghong Song @ 2019-09-25 16:07 UTC (permalink / raw) To: Carlos Neira, netdev; +Cc: ebiederm, brouer, bpf On 9/24/19 8:20 AM, Carlos Neira wrote: > Self tests added for new helper > > Signed-off-by: Carlos Neira <cneirabustos@gmail.com> > --- > tools/testing/selftests/bpf/Makefile | 2 +- > tools/testing/selftests/bpf/bpf_helpers.h | 3 + > .../selftests/bpf/progs/test_pidns_kern.c | 71 ++++++++ > tools/testing/selftests/bpf/test_pidns.c | 152 ++++++++++++++++++ > 4 files changed, 227 insertions(+), 1 deletion(-) > create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c > create mode 100644 tools/testing/selftests/bpf/test_pidns.c > > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile > index 7f3196af1ae4..d86b28aa8f44 100644 > --- a/tools/testing/selftests/bpf/Makefile > +++ b/tools/testing/selftests/bpf/Makefile > @@ -28,7 +28,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test > test_sock test_btf test_sockmap get_cgroup_id_user test_socket_cookie \ > test_cgroup_storage test_select_reuseport test_section_names \ > test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \ > - test_btf_dump test_cgroup_attach xdping > + test_btf_dump test_cgroup_attach xdping test_pidns Could you fold test_pidns into test_progs? > > BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c))) > TEST_GEN_FILES = $(BPF_OBJ_FILES) > diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h > index 6c4930bc6e2e..03d0e15ae29f 100644 > --- a/tools/testing/selftests/bpf/bpf_helpers.h > +++ b/tools/testing/selftests/bpf/bpf_helpers.h > @@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal; > static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip, > int ip_len, void *tcp, int tcp_len) = > (void *) BPF_FUNC_tcp_gen_syncookie; > +static int (*bpf_get_ns_current_pid_tgid)(__u32 dev, __u64 inum) = > + (void *) BPF_FUNC_get_ns_current_pid_tgid; > + > > /* llvm builtin functions that eBPF C program may use to > * emit BPF_LD_ABS and BPF_LD_IND instructions > diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c > new file mode 100644 > index 000000000000..96cb707db3ee > --- /dev/null > +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c > @@ -0,0 +1,71 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com 2019 > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of version 2 of the GNU General Public > + * License as published by the Free Software Foundation. You do not need the above statement, which is covered by "SPDX-License-Identifier: GPL-2.0". > + */ > + > +#include <linux/bpf.h> > +#include "bpf_helpers.h" > + > +struct { > + __uint(type, BPF_MAP_TYPE_ARRAY); > + __uint(max_entries, 1); > + __type(key, __u32); > + __type(value, __u64); > +} ns_inum_map SEC(".maps"); > + > +struct { > + __uint(type, BPF_MAP_TYPE_ARRAY); > + __uint(max_entries, 1); > + __type(key, __u32); > + __type(value, __u32); > +} ns_dev_map SEC(".maps"); > + > +struct { > + __uint(type, BPF_MAP_TYPE_ARRAY); > + __uint(max_entries, 1); > + __type(key, __u32); > + __type(value, __u32); > +} pidmap SEC(".maps"); Can we make the value __u64 to include both pid and tid to compare both? > + > +struct { > + __uint(type, BPF_MAP_TYPE_ARRAY); > + __uint(max_entries, 1); > + __type(key, __u32); > + __type(value, __u64); > +} ns_pid_map SEC(".maps"); > + > The above four one-element maps are perfectly examples to use static global variables which are supported by the kernel. You can take a look at the patch https://patchwork.ozlabs.org/patch/1081014/ which shows how user space can modify the map values. In the future, we could have a better interface to read/update those static variable values. > + > +SEC("tracepoint/syscalls/sys_enter_nanosleep") > +int trace(void *ctx) > +{ > + __u32 key = 0, *expected_pid, *dev; expected_pid => __u64 *? > + __u64 *val, *inum, nspid; > + __u32 pid; > + > + dev = bpf_map_lookup_elem(&ns_dev_map, &key); > + if (!dev) > + return 0; > + > + inum = bpf_map_lookup_elem(&ns_inum_map, &key); > + if (!inum) > + return 0; > + > + nspid = bpf_get_ns_current_pid_tgid(*dev, *inum); > + expected_pid = bpf_map_lookup_elem(&pidmap, &key); > + > + if (!expected_pid || *expected_pid != nspid) > + return 0; > + > + val = bpf_map_lookup_elem(&ns_pid_map, &key); > + if (val) > + *val = nspid; > + > + return 0; > +} > + > +char _license[] SEC("license") = "GPL"; > +__u32 _version SEC("version") = 1; > diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c > new file mode 100644 > index 000000000000..088f8025f2bf > --- /dev/null > +++ b/tools/testing/selftests/bpf/test_pidns.c > @@ -0,0 +1,152 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com 2019 > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of version 2 of the GNU General Public > + * License as published by the Free Software Foundation. ditto. you do not need the above statement. > + */ > + > +#include <stdio.h> > +#include <stdlib.h> > +#include <string.h> > +#include <errno.h> > +#include <fcntl.h> > +#include <syscall.h> > +#include <unistd.h> > +#include <linux/perf_event.h> > +#include <sys/ioctl.h> > +#include <sys/time.h> > +#include <sys/types.h> > +#include <sys/stat.h> > + > +#include <linux/bpf.h> > +#include <bpf/bpf.h> > +#include <bpf/libbpf.h> > + > +#include "cgroup_helpers.h" > +#include "bpf_rlimit.h" > + > +#define CHECK(condition, tag, format...) ({ \ > + int __ret = !!(condition); \ > + if (__ret) { \ > + printf("%s:FAIL:%s ", __func__, tag); \ > + printf(format); \ > + } else { \ > + printf("%s:PASS:%s\n", __func__, tag); \ > + } \ > + __ret; \ > +}) > + > +static int bpf_find_map(const char *test, struct bpf_object *obj, > + const char *name) > +{ > + struct bpf_map *map; > + > + map = bpf_object__find_map_by_name(obj, name); > + if (!map) > + return -1; > + return bpf_map__fd(map); The 'test' argument is not used here. Also, we have bpf_object__find_map_fd_by_name() API, you can use it and you do not need this function. > +} > + > + > +int main(int argc, char **argv) > +{ > + int pidmap_fd, ns_inum_map_fd, ns_dev_map_fd, ns_pid_map_fd; > + const char *probe_name = "syscalls/sys_enter_nanosleep"; > + const char *file = "test_pidns_kern.o"; > + int err, bytes, efd, prog_fd, pmu_fd; > + struct perf_event_attr attr = {}; > + struct bpf_object *obj; > + __u32 nspid = 0; > + __u32 key = 0, pid; > + int exit_code = 1; to make it reverse Christmas tree style? > + struct stat st; > + char buf[256]; > + > + err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd); > + if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno)) > + goto cleanup_cgroup_env; > + > + ns_dev_map_fd = bpf_find_map(__func__, obj, "ns_dev_map"); > + if (CHECK(ns_dev_map_fd < 0, "bpf_find_map", "err %d errno %d\n", > + ns_dev_map_fd, errno)) > + goto close_prog; > + > + ns_inum_map_fd = bpf_find_map(__func__, obj, "ns_inum_map"); > + if (CHECK(ns_inum_map_fd < 0, "bpf_find_map", "err %d errno %d\n", > + ns_inum_map_fd, errno)) > + goto close_prog; > + > + ns_pid_map_fd = bpf_find_map(__func__, obj, "ns_pid_map"); > + if (CHECK(ns_pid_map_fd < 0, "bpf_find_map", "err %d errno %d\n", > + ns_pid_map_fd, errno)) > + goto close_prog; > + > + pidmap_fd = bpf_find_map(__func__, obj, "pidmap"); > + if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n", > + pidmap_fd, errno)) > + goto close_prog; > + > + pid = getpid(); > + bpf_map_update_elem(pidmap_fd, &key, &pid, 0); maybe id = (__u64) getpid() << 32 | gettid(); bpf_map_update_elem(pidmap_fd, &key, &id, 0); In your original above kernel code, you actually compare user space pid to kernel tid, which is okay for this program as it is the main thread and tid == pid, but the mechanism is wrong. > + > + if (stat("/proc/self/ns/pid", &st)) > + goto close_prog; > + > + bpf_map_update_elem(ns_inum_map_fd, &key, &st.st_ino, 0); > + bpf_map_update_elem(ns_dev_map_fd, &key, &st.st_dev, 0); > + > + > + snprintf(buf, sizeof(buf), > + "/sys/kernel/debug/tracing/events/%s/id", probe_name); > + efd = open(buf, O_RDONLY, 0); > + if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno)) > + goto close_prog; > + bytes = read(efd, buf, sizeof(buf)); > + close(efd); > + if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read", > + "bytes %d errno %d\n", bytes, errno)) > + goto close_prog; > + > + attr.config = strtol(buf, NULL, 0); > + attr.type = PERF_TYPE_TRACEPOINT; > + attr.sample_type = PERF_SAMPLE_RAW; > + attr.sample_period = 1; > + attr.wakeup_events = 1; > + > + pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0); > + if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd, > + errno)) > + goto close_prog; > + > + err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0); > + if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err, > + errno)) > + goto close_pmu; > + > + err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); > + if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err, > + errno)) > + goto close_pmu; The whole thing above related to tracepoint can be simplified. Please see prog_tests/stacktrace_map.c. > + > + /* trigger some syscalls */ > + sleep(1); > + > + err = bpf_map_lookup_elem(ns_pid_map_fd, &key, &nspid); > + if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno)) > + goto close_pmu; > + > + if (CHECK(nspid != pid, "compare nspid vs pid", > + "kern nspid %u user pid %u", nspid, pid)) > + goto close_pmu; > + > + exit_code = 0; > + printf("%s:PASS\n", argv[0]); > + > +close_pmu: > + close(pmu_fd); > +close_prog: > + bpf_object__close(obj); > +cleanup_cgroup_env: > + return exit_code; > +} > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH bpf-next v11 4/4] tools/testing/selftests/bpf: Add self-tests for new helper. 2019-09-25 16:07 ` Yonghong Song @ 2019-09-25 20:33 ` Carlos Antonio Neira Bustos 0 siblings, 0 replies; 18+ messages in thread From: Carlos Antonio Neira Bustos @ 2019-09-25 20:33 UTC (permalink / raw) To: Yonghong Song; +Cc: netdev, ebiederm, brouer, bpf On Wed, Sep 25, 2019 at 04:07:09PM +0000, Yonghong Song wrote: > > > On 9/24/19 8:20 AM, Carlos Neira wrote: > > Self tests added for new helper > > > > Signed-off-by: Carlos Neira <cneirabustos@gmail.com> > > --- > > tools/testing/selftests/bpf/Makefile | 2 +- > > tools/testing/selftests/bpf/bpf_helpers.h | 3 + > > .../selftests/bpf/progs/test_pidns_kern.c | 71 ++++++++ > > tools/testing/selftests/bpf/test_pidns.c | 152 ++++++++++++++++++ > > 4 files changed, 227 insertions(+), 1 deletion(-) > > create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c > > create mode 100644 tools/testing/selftests/bpf/test_pidns.c > > > > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile > > index 7f3196af1ae4..d86b28aa8f44 100644 > > --- a/tools/testing/selftests/bpf/Makefile > > +++ b/tools/testing/selftests/bpf/Makefile > > @@ -28,7 +28,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test > > test_sock test_btf test_sockmap get_cgroup_id_user test_socket_cookie \ > > test_cgroup_storage test_select_reuseport test_section_names \ > > test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \ > > - test_btf_dump test_cgroup_attach xdping > > + test_btf_dump test_cgroup_attach xdping test_pidns > > Could you fold test_pidns into test_progs? > > > > > BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c))) > > TEST_GEN_FILES = $(BPF_OBJ_FILES) > > diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h > > index 6c4930bc6e2e..03d0e15ae29f 100644 > > --- a/tools/testing/selftests/bpf/bpf_helpers.h > > +++ b/tools/testing/selftests/bpf/bpf_helpers.h > > @@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal; > > static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip, > > int ip_len, void *tcp, int tcp_len) = > > (void *) BPF_FUNC_tcp_gen_syncookie; > > +static int (*bpf_get_ns_current_pid_tgid)(__u32 dev, __u64 inum) = > > + (void *) BPF_FUNC_get_ns_current_pid_tgid; > > + > > > > /* llvm builtin functions that eBPF C program may use to > > * emit BPF_LD_ABS and BPF_LD_IND instructions > > diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c > > new file mode 100644 > > index 000000000000..96cb707db3ee > > --- /dev/null > > +++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c > > @@ -0,0 +1,71 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com > > 2019 > > > + * > > + * This program is free software; you can redistribute it and/or > > + * modify it under the terms of version 2 of the GNU General Public > > + * License as published by the Free Software Foundation. > > You do not need the above statement, which is covered by > "SPDX-License-Identifier: GPL-2.0". > > > + */ > > + > > +#include <linux/bpf.h> > > +#include "bpf_helpers.h" > > + > > +struct { > > + __uint(type, BPF_MAP_TYPE_ARRAY); > > + __uint(max_entries, 1); > > + __type(key, __u32); > > + __type(value, __u64); > > +} ns_inum_map SEC(".maps"); > > + > > +struct { > > + __uint(type, BPF_MAP_TYPE_ARRAY); > > + __uint(max_entries, 1); > > + __type(key, __u32); > > + __type(value, __u32); > > +} ns_dev_map SEC(".maps"); > > + > > +struct { > > + __uint(type, BPF_MAP_TYPE_ARRAY); > > + __uint(max_entries, 1); > > + __type(key, __u32); > > + __type(value, __u32); > > +} pidmap SEC(".maps"); > > Can we make the value __u64 to include > both pid and tid to compare both? > > > + > > +struct { > > + __uint(type, BPF_MAP_TYPE_ARRAY); > > + __uint(max_entries, 1); > > + __type(key, __u32); > > + __type(value, __u64); > > +} ns_pid_map SEC(".maps"); > > + > > > > The above four one-element maps are perfectly examples > to use static global variables which are supported > by the kernel. > > You can take a look at the patch > https://patchwork.ozlabs.org/patch/1081014/ > which shows how user space can modify the map > values. In the future, we could have a better > interface to read/update those static variable > values. > > > + > > +SEC("tracepoint/syscalls/sys_enter_nanosleep") > > +int trace(void *ctx) > > +{ > > + __u32 key = 0, *expected_pid, *dev; > > expected_pid => __u64 *? > > > + __u64 *val, *inum, nspid; > > + __u32 pid; > > + > > + dev = bpf_map_lookup_elem(&ns_dev_map, &key); > > + if (!dev) > > + return 0; > > + > > + inum = bpf_map_lookup_elem(&ns_inum_map, &key); > > + if (!inum) > > + return 0; > > + > > + nspid = bpf_get_ns_current_pid_tgid(*dev, *inum); > > + expected_pid = bpf_map_lookup_elem(&pidmap, &key); > > + > > + if (!expected_pid || *expected_pid != nspid) > > + return 0; > > + > > + val = bpf_map_lookup_elem(&ns_pid_map, &key); > > + if (val) > > + *val = nspid; > > + > > + return 0; > > +} > > + > > +char _license[] SEC("license") = "GPL"; > > +__u32 _version SEC("version") = 1; > > diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c > > new file mode 100644 > > index 000000000000..088f8025f2bf > > --- /dev/null > > +++ b/tools/testing/selftests/bpf/test_pidns.c > > @@ -0,0 +1,152 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com > > 2019 > > > + * > > + * This program is free software; you can redistribute it and/or > > + * modify it under the terms of version 2 of the GNU General Public > > + * License as published by the Free Software Foundation. > > ditto. you do not need the above statement. > > > + */ > > + > > +#include <stdio.h> > > +#include <stdlib.h> > > +#include <string.h> > > +#include <errno.h> > > +#include <fcntl.h> > > +#include <syscall.h> > > +#include <unistd.h> > > +#include <linux/perf_event.h> > > +#include <sys/ioctl.h> > > +#include <sys/time.h> > > +#include <sys/types.h> > > +#include <sys/stat.h> > > + > > +#include <linux/bpf.h> > > +#include <bpf/bpf.h> > > +#include <bpf/libbpf.h> > > + > > +#include "cgroup_helpers.h" > > +#include "bpf_rlimit.h" > > + > > +#define CHECK(condition, tag, format...) ({ \ > > + int __ret = !!(condition); \ > > + if (__ret) { \ > > + printf("%s:FAIL:%s ", __func__, tag); \ > > + printf(format); \ > > + } else { \ > > + printf("%s:PASS:%s\n", __func__, tag); \ > > + } \ > > + __ret; \ > > +}) > > + > > +static int bpf_find_map(const char *test, struct bpf_object *obj, > > + const char *name) > > +{ > > + struct bpf_map *map; > > + > > + map = bpf_object__find_map_by_name(obj, name); > > + if (!map) > > + return -1; > > + return bpf_map__fd(map); > > The 'test' argument is not used here. > Also, we have bpf_object__find_map_fd_by_name() API, you can > use it and you do not need this function. > > > +} > > + > > + > > +int main(int argc, char **argv) > > +{ > > + int pidmap_fd, ns_inum_map_fd, ns_dev_map_fd, ns_pid_map_fd; > > + const char *probe_name = "syscalls/sys_enter_nanosleep"; > > + const char *file = "test_pidns_kern.o"; > > + int err, bytes, efd, prog_fd, pmu_fd; > > + struct perf_event_attr attr = {}; > > + struct bpf_object *obj; > > + __u32 nspid = 0; > > + __u32 key = 0, pid; > > + int exit_code = 1; > > to make it reverse Christmas tree style? > > > + struct stat st; > > + char buf[256]; > > + > > + err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd); > > + if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno)) > > + goto cleanup_cgroup_env; > > + > > + ns_dev_map_fd = bpf_find_map(__func__, obj, "ns_dev_map"); > > + if (CHECK(ns_dev_map_fd < 0, "bpf_find_map", "err %d errno %d\n", > > + ns_dev_map_fd, errno)) > > + goto close_prog; > > + > > + ns_inum_map_fd = bpf_find_map(__func__, obj, "ns_inum_map"); > > + if (CHECK(ns_inum_map_fd < 0, "bpf_find_map", "err %d errno %d\n", > > + ns_inum_map_fd, errno)) > > + goto close_prog; > > + > > + ns_pid_map_fd = bpf_find_map(__func__, obj, "ns_pid_map"); > > + if (CHECK(ns_pid_map_fd < 0, "bpf_find_map", "err %d errno %d\n", > > + ns_pid_map_fd, errno)) > > + goto close_prog; > > + > > + pidmap_fd = bpf_find_map(__func__, obj, "pidmap"); > > + if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n", > > + pidmap_fd, errno)) > > + goto close_prog; > > + > > + pid = getpid(); > > + bpf_map_update_elem(pidmap_fd, &key, &pid, 0); > > maybe > id = (__u64) getpid() << 32 | gettid(); > bpf_map_update_elem(pidmap_fd, &key, &id, 0); > > In your original above kernel code, you actually compare user space pid > to kernel tid, which is okay for this program as it is the main thread > and tid == pid, but the mechanism is wrong. > > > + > > + if (stat("/proc/self/ns/pid", &st)) > > + goto close_prog; > > + > > + bpf_map_update_elem(ns_inum_map_fd, &key, &st.st_ino, 0); > > + bpf_map_update_elem(ns_dev_map_fd, &key, &st.st_dev, 0); > > + > > + > > + snprintf(buf, sizeof(buf), > > + "/sys/kernel/debug/tracing/events/%s/id", probe_name); > > + efd = open(buf, O_RDONLY, 0); > > + if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno)) > > + goto close_prog; > > + bytes = read(efd, buf, sizeof(buf)); > > + close(efd); > > + if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read", > > + "bytes %d errno %d\n", bytes, errno)) > > + goto close_prog; > > + > > + attr.config = strtol(buf, NULL, 0); > > + attr.type = PERF_TYPE_TRACEPOINT; > > + attr.sample_type = PERF_SAMPLE_RAW; > > + attr.sample_period = 1; > > + attr.wakeup_events = 1; > > + > > + pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0); > > + if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd, > > + errno)) > > + goto close_prog; > > + > > + err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0); > > + if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err, > > + errno)) > > + goto close_pmu; > > + > > + err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); > > + if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err, > > + errno)) > > + goto close_pmu; > > The whole thing above related to tracepoint can be simplified. Please > see prog_tests/stacktrace_map.c. > > > + > > + /* trigger some syscalls */ > > + sleep(1); > > + > > + err = bpf_map_lookup_elem(ns_pid_map_fd, &key, &nspid); > > + if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno)) > > + goto close_pmu; > > + > > + if (CHECK(nspid != pid, "compare nspid vs pid", > > + "kern nspid %u user pid %u", nspid, pid)) > > + goto close_pmu; > > + > > + exit_code = 0; > > + printf("%s:PASS\n", argv[0]); > > + > > +close_pmu: > > + close(pmu_fd); > > +close_prog: > > + bpf_object__close(obj); > > +cleanup_cgroup_env: > > + return exit_code; > > +} > > Thanks Yonghong, I'll correct these issues on v12. Bests ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task 2019-09-24 15:20 [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira ` (3 preceding siblings ...) 2019-09-24 15:20 ` [PATCH bpf-next v11 4/4] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira @ 2019-09-24 18:01 ` Daniel Borkmann 2019-09-24 18:14 ` Carlos Antonio Neira Bustos 2019-09-26 0:59 ` Eric W. Biederman 5 siblings, 1 reply; 18+ messages in thread From: Daniel Borkmann @ 2019-09-24 18:01 UTC (permalink / raw) To: Carlos Neira; +Cc: netdev, yhs, ebiederm, brouer, bpf On Tue, Sep 24, 2019 at 12:20:01PM -0300, Carlos Neira wrote: > Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's > scripts but this helper returns the pid as seen by the root namespace which is > fine when a bcc script is not executed inside a container. > When the process of interest is inside a container, pid filtering will not work > if bpf_get_current_pid_tgid() is used. > This helper addresses this limitation returning the pid as it's seen by the current > namespace where the script is executing. > > In the future different pid_ns files may belong to different devices, according to the > discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference. > To address that situation the helper requires inum and dev_t from /proc/self/ns/pid. > This helper has the same use cases as bpf_get_current_pid_tgid() as it can be > used to do pid filtering even inside a container. > > Signed-off-by: Carlos Neira <cneirabustos@gmail.com> > > Carlos Neira (4): > fs/nsfs.c: added ns_match > bpf: added new helper bpf_get_ns_current_pid_tgid > tools: Added bpf_get_ns_current_pid_tgid helper > tools/testing/selftests/bpf: Add self-tests for new helper. self tests > added for new helper bpf-next is currently closed due to merge window. Please resubmit once back open, thanks. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task 2019-09-24 18:01 ` [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Daniel Borkmann @ 2019-09-24 18:14 ` Carlos Antonio Neira Bustos 0 siblings, 0 replies; 18+ messages in thread From: Carlos Antonio Neira Bustos @ 2019-09-24 18:14 UTC (permalink / raw) To: Daniel Borkmann; +Cc: netdev, yhs, ebiederm, brouer, bpf On Tue, Sep 24, 2019 at 08:01:17PM +0200, Daniel Borkmann wrote: > On Tue, Sep 24, 2019 at 12:20:01PM -0300, Carlos Neira wrote: > > Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's > > scripts but this helper returns the pid as seen by the root namespace which is > > fine when a bcc script is not executed inside a container. > > When the process of interest is inside a container, pid filtering will not work > > if bpf_get_current_pid_tgid() is used. > > This helper addresses this limitation returning the pid as it's seen by the current > > namespace where the script is executing. > > > > In the future different pid_ns files may belong to different devices, according to the > > discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference. > > To address that situation the helper requires inum and dev_t from /proc/self/ns/pid. > > This helper has the same use cases as bpf_get_current_pid_tgid() as it can be > > used to do pid filtering even inside a container. > > > > Signed-off-by: Carlos Neira <cneirabustos@gmail.com> > > > > Carlos Neira (4): > > fs/nsfs.c: added ns_match > > bpf: added new helper bpf_get_ns_current_pid_tgid > > tools: Added bpf_get_ns_current_pid_tgid helper > > tools/testing/selftests/bpf: Add self-tests for new helper. self tests > > added for new helper > > bpf-next is currently closed due to merge window. Please resubmit once back open, thanks. Thanks, Daniel, I'll do so. Bests. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task 2019-09-24 15:20 [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira ` (4 preceding siblings ...) 2019-09-24 18:01 ` [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Daniel Borkmann @ 2019-09-26 0:59 ` Eric W. Biederman 2019-09-26 15:51 ` Yonghong Song 2019-09-26 16:16 ` John Fastabend 5 siblings, 2 replies; 18+ messages in thread From: Eric W. Biederman @ 2019-09-26 0:59 UTC (permalink / raw) To: Carlos Neira; +Cc: netdev, yhs, brouer, bpf Carlos Neira <cneirabustos@gmail.com> writes: > Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's > scripts but this helper returns the pid as seen by the root namespace which is > fine when a bcc script is not executed inside a container. > When the process of interest is inside a container, pid filtering will not work > if bpf_get_current_pid_tgid() is used. > This helper addresses this limitation returning the pid as it's seen by the current > namespace where the script is executing. > > In the future different pid_ns files may belong to different devices, according to the > discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference. > To address that situation the helper requires inum and dev_t from /proc/self/ns/pid. > This helper has the same use cases as bpf_get_current_pid_tgid() as it can be > used to do pid filtering even inside a container. I think I may have asked this before. If I am repeating old gound please excuse me. Am I correct in understanding these new helpers are designed to be used when programs running in ``conainers'' call it inside pid namespaces register bpf programs for tracing? If so would it be possible to change how the existing bpf opcodes operate when they are used in the context of a pid namespace? That later would seem to allow just moving an existing application into a pid namespace with no modifications. If we can do this with trivial cost at bpf compile time and with no userspace changes that would seem a better approach. If not can someone point me to why we can't do that? What am I missing? Eric > Signed-off-by: Carlos Neira <cneirabustos@gmail.com> > > Carlos Neira (4): > fs/nsfs.c: added ns_match > bpf: added new helper bpf_get_ns_current_pid_tgid > tools: Added bpf_get_ns_current_pid_tgid helper > tools/testing/selftests/bpf: Add self-tests for new helper. self tests > added for new helper > > fs/nsfs.c | 8 + > include/linux/bpf.h | 1 + > include/linux/proc_ns.h | 2 + > include/uapi/linux/bpf.h | 18 ++- > kernel/bpf/core.c | 1 + > kernel/bpf/helpers.c | 32 ++++ > kernel/trace/bpf_trace.c | 2 + > tools/include/uapi/linux/bpf.h | 18 ++- > tools/testing/selftests/bpf/Makefile | 2 +- > tools/testing/selftests/bpf/bpf_helpers.h | 3 + > .../selftests/bpf/progs/test_pidns_kern.c | 71 ++++++++ > tools/testing/selftests/bpf/test_pidns.c | 152 ++++++++++++++++++ > 12 files changed, 307 insertions(+), 3 deletions(-) > create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c > create mode 100644 tools/testing/selftests/bpf/test_pidns.c ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task 2019-09-26 0:59 ` Eric W. Biederman @ 2019-09-26 15:51 ` Yonghong Song 2019-09-26 16:16 ` John Fastabend 1 sibling, 0 replies; 18+ messages in thread From: Yonghong Song @ 2019-09-26 15:51 UTC (permalink / raw) To: Eric W. Biederman, Carlos Neira; +Cc: netdev, brouer, bpf On 9/25/19 5:59 PM, Eric W. Biederman wrote: > Carlos Neira <cneirabustos@gmail.com> writes: > >> Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's >> scripts but this helper returns the pid as seen by the root namespace which is >> fine when a bcc script is not executed inside a container. >> When the process of interest is inside a container, pid filtering will not work >> if bpf_get_current_pid_tgid() is used. >> This helper addresses this limitation returning the pid as it's seen by the current >> namespace where the script is executing. >> >> In the future different pid_ns files may belong to different devices, according to the >> discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference. >> To address that situation the helper requires inum and dev_t from /proc/self/ns/pid. >> This helper has the same use cases as bpf_get_current_pid_tgid() as it can be >> used to do pid filtering even inside a container. > > I think I may have asked this before. If I am repeating old gound > please excuse me. > > Am I correct in understanding these new helpers are designed to be used > when programs running in ``conainers'' call it inside pid namespaces > register bpf programs for tracing? Right. > > If so would it be possible to change how the existing bpf opcodes > operate when they are used in the context of a pid namespace? Today, typical bpf program getting pid like: uint64_t pid_tgid = bpf_get_current_pid_tgid(); pid_t pid = pid_tgid >> 32; pid_t tid = pid_tgid; /* possible filtering ... */ if (pid == <user_provided pid>) .... ... /* record pid in some places */ map_val->pid = pid; ... The bpf_get_current_pid_tgid() is a kernel helper BPF_CALL_0(bpf_get_current_pid_tgid) { struct task_struct *task = current; if (unlikely(!task)) return -EINVAL; return (u64) task->tgid << 32 | task->pid; } So the bpf_get_current_pid_tgid() gets the tgid/pid outside any pid namespaces. To make the program work inside the container, just get namespace pid/tgid not enough. You need to make sure the namespace you are tracking is the one you are in. That is what the new proposed helper to do. Do you suggest we change bpf_get_current_pid_tgid() to return namespaced tgid/pid? First, this will break user API (kernel helper is an API) and second, even if we do get pid/tgid, we still not sure whether this is for my namespace or not. Do you have something in mind to address this issue? > > That later would seem to allow just moving an existing application into > a pid namespace with no modifications. If we can do this with trivial > cost at bpf compile time and with no userspace changes that would seem > a better approach. > > If not can someone point me to why we can't do that? What am I missing? > > Eric > >> Signed-off-by: Carlos Neira <cneirabustos@gmail.com> >> >> Carlos Neira (4): >> fs/nsfs.c: added ns_match >> bpf: added new helper bpf_get_ns_current_pid_tgid >> tools: Added bpf_get_ns_current_pid_tgid helper >> tools/testing/selftests/bpf: Add self-tests for new helper. self tests >> added for new helper >> >> fs/nsfs.c | 8 + >> include/linux/bpf.h | 1 + >> include/linux/proc_ns.h | 2 + >> include/uapi/linux/bpf.h | 18 ++- >> kernel/bpf/core.c | 1 + >> kernel/bpf/helpers.c | 32 ++++ >> kernel/trace/bpf_trace.c | 2 + >> tools/include/uapi/linux/bpf.h | 18 ++- >> tools/testing/selftests/bpf/Makefile | 2 +- >> tools/testing/selftests/bpf/bpf_helpers.h | 3 + >> .../selftests/bpf/progs/test_pidns_kern.c | 71 ++++++++ >> tools/testing/selftests/bpf/test_pidns.c | 152 ++++++++++++++++++ >> 12 files changed, 307 insertions(+), 3 deletions(-) >> create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c >> create mode 100644 tools/testing/selftests/bpf/test_pidns.c ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task 2019-09-26 0:59 ` Eric W. Biederman 2019-09-26 15:51 ` Yonghong Song @ 2019-09-26 16:16 ` John Fastabend 2019-09-26 17:01 ` Yonghong Song 1 sibling, 1 reply; 18+ messages in thread From: John Fastabend @ 2019-09-26 16:16 UTC (permalink / raw) To: Eric W. Biederman, Carlos Neira; +Cc: netdev, yhs, brouer, bpf Eric W. Biederman wrote: > Carlos Neira <cneirabustos@gmail.com> writes: > > > Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's > > scripts but this helper returns the pid as seen by the root namespace which is > > fine when a bcc script is not executed inside a container. > > When the process of interest is inside a container, pid filtering will not work > > if bpf_get_current_pid_tgid() is used. > > This helper addresses this limitation returning the pid as it's seen by the current > > namespace where the script is executing. > > > > In the future different pid_ns files may belong to different devices, according to the > > discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference. > > To address that situation the helper requires inum and dev_t from /proc/self/ns/pid. > > This helper has the same use cases as bpf_get_current_pid_tgid() as it can be > > used to do pid filtering even inside a container. > > I think I may have asked this before. If I am repeating old gound > please excuse me. > > Am I correct in understanding these new helpers are designed to be used > when programs running in ``conainers'' call it inside pid namespaces > register bpf programs for tracing? > > If so would it be possible to change how the existing bpf opcodes > operate when they are used in the context of a pid namespace? > > That later would seem to allow just moving an existing application into > a pid namespace with no modifications. If we can do this with trivial > cost at bpf compile time and with no userspace changes that would seem > a better approach. > > If not can someone point me to why we can't do that? What am I missing? We have some management/observabiliity bpf programs loaded from privileged containers that end up getting triggered in multiple container context. Here we want the root namespace pid otherwise there would be collisions (same pid in multiple containers) when its used as a key and we would have difficulty finding the pid from the root namespace. I guess at load time if its an unprivileged program we could convert it to use the pid of the current namespace? Or if the application is moved into a unprivileged container? Our code is outside bcc so not sure exactly how the bcc case works. Just wanted to point out we use the root namespace pid for various things so I think it might need to be a bit smarter than just the moving an existing application into a pid namespace. .John ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task 2019-09-26 16:16 ` John Fastabend @ 2019-09-26 17:01 ` Yonghong Song 0 siblings, 0 replies; 18+ messages in thread From: Yonghong Song @ 2019-09-26 17:01 UTC (permalink / raw) To: John Fastabend, Eric W. Biederman, Carlos Neira; +Cc: netdev, brouer, bpf On 9/26/19 9:16 AM, John Fastabend wrote: > Eric W. Biederman wrote: >> Carlos Neira <cneirabustos@gmail.com> writes: >> >>> Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's >>> scripts but this helper returns the pid as seen by the root namespace which is >>> fine when a bcc script is not executed inside a container. >>> When the process of interest is inside a container, pid filtering will not work >>> if bpf_get_current_pid_tgid() is used. >>> This helper addresses this limitation returning the pid as it's seen by the current >>> namespace where the script is executing. >>> >>> In the future different pid_ns files may belong to different devices, according to the >>> discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference. >>> To address that situation the helper requires inum and dev_t from /proc/self/ns/pid. >>> This helper has the same use cases as bpf_get_current_pid_tgid() as it can be >>> used to do pid filtering even inside a container. >> >> I think I may have asked this before. If I am repeating old gound >> please excuse me. >> >> Am I correct in understanding these new helpers are designed to be used >> when programs running in ``conainers'' call it inside pid namespaces >> register bpf programs for tracing? >> >> If so would it be possible to change how the existing bpf opcodes >> operate when they are used in the context of a pid namespace? >> >> That later would seem to allow just moving an existing application into >> a pid namespace with no modifications. If we can do this with trivial >> cost at bpf compile time and with no userspace changes that would seem >> a better approach. >> >> If not can someone point me to why we can't do that? What am I missing? > > We have some management/observabiliity bpf programs loaded from privileged > containers that end up getting triggered in multiple container context. Here > we want the root namespace pid otherwise there would be collisions (same pid > in multiple containers) when its used as a key and we would have difficulty > finding the pid from the root namespace. Yes, using root namespace pid will work. I am referring to a priviledged container (current root, and future may just CAP_BPF and CAP_TRACIING) where you do not need to go to root to check root pids. Also, there are cases, we do pid namespace-scope statistics collecting, filtering based on namespace "id" is also needed. > > I guess at load time if its an unprivileged program we could convert it to > use the pid of the current namespace? This way we will need to helper to get current namespace pid. > > Or if the application is moved into a unprivileged container? Ya. A helper will be needed. > > Our code is outside bcc so not sure exactly how the bcc case works. Just > wanted to point out we use the root namespace pid for various things > so I think it might need to be a bit smarter than just the moving an > existing application into a pid namespace. As a workaround, we do this as well. The goal is to improve usability. So we do not need to go to root to find these pids. Sometimes if filtering at namespace level, we have to approximate as sometimes it is impossible to track all pids in the container. > > .John > ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2019-09-28 1:43 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-09-24 15:20 [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira 2019-09-24 15:20 ` [PATCH bpf-next v11 1/4] fs/nsfs.c: added ns_match Carlos Neira 2019-09-24 15:20 ` [PATCH bpf-next v11 2/4] bpf: added new helper bpf_get_ns_current_pid_tgid Carlos Neira 2019-09-27 16:15 ` Andrii Nakryiko 2019-09-27 16:59 ` Yonghong Song 2019-09-27 17:24 ` Andrii Nakryiko 2019-09-28 1:42 ` Carlos Antonio Neira Bustos 2019-09-24 15:20 ` [PATCH bpf-next v11 3/4] tools: Added bpf_get_ns_current_pid_tgid helper Carlos Neira 2019-09-25 3:27 ` Yonghong Song 2019-09-24 15:20 ` [PATCH bpf-next v11 4/4] tools/testing/selftests/bpf: Add self-tests for new helper Carlos Neira 2019-09-25 16:07 ` Yonghong Song 2019-09-25 20:33 ` Carlos Antonio Neira Bustos 2019-09-24 18:01 ` [PATCH V11 0/4] BPF: New helper to obtain namespace data from current task Daniel Borkmann 2019-09-24 18:14 ` Carlos Antonio Neira Bustos 2019-09-26 0:59 ` Eric W. Biederman 2019-09-26 15:51 ` Yonghong Song 2019-09-26 16:16 ` John Fastabend 2019-09-26 17:01 ` Yonghong Song
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).