On 8/13/19 5:56 PM, Carlos Antonio Neira Bustos wrote:
> On Tue, Aug 13, 2019 at 11:11:14PM +0000, Yonghong Song wrote:
>>
>>
>> On 8/13/19 11:47 AM, Carlos Neira wrote:
>>> From: Carlos
>>>
>>> New bpf helper bpf_get_current_pidns_info.
>>> This helper obtains the active namespace from current and returns
>>> pid, tgid, device and namespace id as seen from that namespace,
>>> allowing to instrument a process inside a container.
>>>
>>> Signed-off-by: Carlos Neira
>>> ---
>>>  fs/internal.h            |  2 --
>>>  fs/namei.c               |  1 -
>>>  include/linux/bpf.h      |  1 +
>>>  include/linux/namei.h    |  4 +++
>>>  include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++-
>>>  kernel/bpf/core.c        |  1 +
>>>  kernel/bpf/helpers.c     | 64 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>  kernel/trace/bpf_trace.c |  2 ++
>>>  8 files changed, 102 insertions(+), 4 deletions(-)
>>> [...]
>>>
>>> +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
>>> +	 size)
>>> +{
>>> +	const char *pidns_path = "/proc/self/ns/pid";
>>> +	struct pid_namespace *pidns = NULL;
>>> +	struct filename *tmp = NULL;
>>> +	struct inode *inode;
>>> +	struct path kp;
>>> +	pid_t tgid = 0;
>>> +	pid_t pid = 0;
>>> +	int ret;
>>> +	int len;
>>
>
> Thank you very much for catching this!.
> Could you share how to replicate this bug?.

The config is attached. just run trace_ns_info and you can reproduce the issue.

>
>> I am running your sample program and get the following kernel bug:
>>
>> ...
>> [ 26.414825] BUG: sleeping function called from invalid context at
>> /data/users/yhs/work/net-next/fs
>> /dcache.c:843
>> [ 26.416314] in_atomic(): 1, irqs_disabled(): 0, pid: 1911, name: ping
>> [ 26.417189] CPU: 0 PID: 1911 Comm: ping Tainted: G W
>> 5.3.0-rc1+ #280
>> [ 26.418182] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.9.3-1.el7.centos 04/01/2
>> 014
>> [ 26.419393] Call Trace:
>> [ 26.419697]
>> [ 26.419960] dump_stack+0x46/0x5b
>> [ 26.420434] ___might_sleep+0xe4/0x110
>> [ 26.420894] dput+0x2a/0x200
>> [ 26.421265] walk_component+0x10c/0x280
>> [ 26.421773] link_path_walk+0x327/0x560
>> [ 26.422280] ? proc_ns_dir_readdir+0x1a0/0x1a0
>> [ 26.422848] ? path_init+0x232/0x330
>> [ 26.423364] path_lookupat+0x88/0x200
>> [ 26.423808] ? selinux_parse_skb.constprop.69+0x124/0x430
>> [ 26.424521] filename_lookup+0xaf/0x190
>> [ 26.425031] ? simple_attr_release+0x20/0x20
>> [ 26.425560] bpf_get_current_pidns_info+0xfa/0x190
>> [ 26.426168] bpf_prog_83627154cefed596+0xe66/0x1000
>> [ 26.426779] trace_call_bpf+0xb5/0x160
>> [ 26.427317] ? __netif_receive_skb_core+0x1/0xbb0
>> [ 26.427929] ? __netif_receive_skb_core+0x1/0xbb0
>> [ 26.428496] kprobe_perf_func+0x4d/0x280
>> [ 26.428986] ? tracing_record_taskinfo_skip+0x1a/0x30
>> [ 26.429584] ? tracing_record_taskinfo+0xe/0x80
>> [ 26.430152] ? ttwu_do_wakeup.isra.114+0xcf/0xf0
>> [ 26.430737] ? __netif_receive_skb_core+0x1/0xbb0
>> [ 26.431334] ? __netif_receive_skb_core+0x5/0xbb0
>> [ 26.431930] kprobe_ftrace_handler+0x90/0xf0
>> [ 26.432495] ftrace_ops_assist_func+0x63/0x100
>> [ 26.433060] 0xffffffffc03180bf
>> [ 26.433471] ? __netif_receive_skb_core+0x1/0xbb0
>> ...
>>
>> To avoid running in an arbitrary task context (e.g., the idle task),
>> which may introduce sleeping issues, the following is probably
>> appropriate:
>>
>>	if (in_nmi() || in_softirq())
>>		return -EPERM;
>>
>> Anyway, in nmi or softirq context, the namespace and pid/tgid
>> we get may be only accidentally associated with the bpf running
>> context; they could belong to a different context. So such info
>> is not reliable anyway.
>>
>>> +
>>> +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
>>> +		return -EINVAL;
>>> +	pidns = task_active_pid_ns(current);
[...]
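
For illustration only, here is a userspace mock-up of the in_nmi()/in_softirq() guard suggested above. It is a sketch, not the patch: the kernel predicates are replaced by stand-ins that read a fake context variable, and `mock_ctx`, `cur_ctx`, `mock_pidns_info` (including its field layout) and `mock_get_current_pidns_info` are hypothetical names. Placing the guard ahead of the size check is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical execution contexts, used only by this mock-up. */
enum mock_ctx { CTX_TASK, CTX_SOFTIRQ, CTX_NMI };

static enum mock_ctx cur_ctx = CTX_TASK;

/* Userspace stand-ins for the kernel predicates of the same name. */
static int in_nmi(void)     { return cur_ctx == CTX_NMI; }
static int in_softirq(void) { return cur_ctx == CTX_SOFTIRQ; }

/* Reduced stand-in for struct bpf_pidns_info; the layout is assumed. */
struct mock_pidns_info {
	uint32_t dev;
	uint32_t nsid;
	uint32_t tgid;
	uint32_t pid;
};

static int mock_get_current_pidns_info(struct mock_pidns_info *info,
				       uint32_t size)
{
	assert(info != NULL);

	/*
	 * Refuse to run in NMI or softirq context: "current" may be an
	 * unrelated task there, and a later path lookup could sleep.
	 */
	if (in_nmi() || in_softirq())
		return -EPERM;

	if (size != sizeof(*info))
		return -EINVAL;

	/* Placeholder values; the real helper would query the namespace. */
	info->dev = 0;
	info->nsid = 0;
	info->tgid = 0;
	info->pid = 0;
	return 0;
}
```

With `cur_ctx` set to `CTX_TASK` and a correctly sized struct the mock returns 0; with `CTX_SOFTIRQ` or `CTX_NMI` it returns -EPERM before touching the output struct.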