Extending bpf_get_ns_current_pid_tgid()

From: Daniel Xu @ 2020-11-12 22:20 UTC
  To: bpf; +Cc: yhs, cneirabustos, ebiederm, blez

Hi,

I'm looking at the current implementation of
bpf_get_ns_current_pid_tgid() and the helper seems overly restrictive
to me. Specifically the following lines:

    if (!ns_match(&pidns->ns, (dev_t)dev, ino))
            goto clear;

Why bail if the inode # does not match? IIUC from the old discussions,
it was b/c in the future pidns files might belong to different devices.
It's not clear to me (possibly b/c I'm missing something) why the inode
has to match as well.
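
For context, a minimal sketch of where the `dev`/`ino` pair usually comes
from on the userspace side, assuming the caller stat()s a pidns file such
as /proc/self/ns/pid. The struct ns_id type and both function names are
made up for illustration and are not part of any existing API:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical holder for the two values a caller would pass as dev/ino. */
struct ns_id {
    dev_t dev;
    ino_t ino;
};

/* Fill *out from the file at path. Returns 0 on success, -1 on error. */
int get_ns_id(const char *path, struct ns_id *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    out->dev = st.st_dev;
    out->ino = st.st_ino;
    return 0;
}

/* Print the dev/ino of the caller's own pidns (needs a mounted procfs). */
int print_own_pidns_id(void)
{
    struct ns_id id;

    if (get_ns_id("/proc/self/ns/pid", &id) != 0) {
        perror("stat /proc/self/ns/pid");
        return -1;
    }
    printf("dev=%llu ino=%llu\n",
           (unsigned long long)id.dev, (unsigned long long)id.ino);
    return 0;
}
```

The helper then compares these two values against the current task's
pidns, which is the check quoted above.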

Would it be possible to instead have the helper return the pid/tgid of
the current task as viewed _from_ the `dev`/`ino` pidns? If the current
task is hidden from the `dev`/`ino` pidns, then return -ENOENT. The use
case is for bpftrace to symbolize stacks when run inside a container.
For example:

    (in-container)# bpftrace -e 'profile:hz:99 { print(ustack) }'

This currently does not work b/c bpftrace will generate a prog that gets
the root pidns pid, packs it with the stackid, and passes it up to
userspace. But b/c bpftrace is running inside the container, the root
pidns pid is meaningless there and symbolization fails.

What would be nice is if bpftrace could generate a prog that gets the
current pid as viewed from bpftrace's pidns. Then symbolization would
work.
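
To make the proposed semantics concrete, here is a toy userspace model,
not kernel code. It assumes each task carries one pid number per pidns
level, loosely following how the kernel numbers pids per namespace
level. All toy_* names are hypothetical, and the -ENOENT return for a
hidden task is the behaviour proposed above, not what any existing
function does:

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_LEVEL 4

/* A pid namespace; the root pidns is level 0, its children level 1, ... */
struct toy_pidns {
    int level;
};

/* One pid number, valid only inside the namespace it belongs to. */
struct toy_upid {
    int nr;
    const struct toy_pidns *ns;
};

/* A task's pid: one number for each namespace level it is visible in. */
struct toy_pid {
    int level; /* level of the namespace the task was created in */
    struct toy_upid numbers[TOY_MAX_LEVEL + 1];
};

/*
 * Return the pid number of `pid` as seen from `ns`, or -ENOENT if the
 * task is not visible from that namespace.
 */
int toy_pid_nr_ns(const struct toy_pid *pid, const struct toy_pidns *ns)
{
    /* A namespace nested deeper than the task cannot see it. */
    if (ns->level > pid->level)
        return -ENOENT;
    /* Same level but a different namespace: a sibling, also hidden. */
    if (pid->numbers[ns->level].ns != ns)
        return -ENOENT;
    return pid->numbers[ns->level].nr;
}
```

In this model a task created in a level-1 container pidns has two
numbers: one for the root pidns and one for the container. Looking it
up from the container's pidns gives the number that tools running in
the container can use, while a lookup from an unrelated sibling pidns
gives -ENOENT.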

Thanks,
Daniel
