linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Christian Brauner <christian@brauner.io>
Cc: jannh@google.com, khlebnikov@yandex-team.ru, luto@kernel.org,
	dhowells@redhat.com, serge@hallyn.com, ebiederm@xmission.com,
	linux-api@vger.kernel.org, linux-kernel@vger.kernel.org,
	arnd@arndb.de, keescook@chromium.org, adobriyan@gmail.com,
	tglx@linutronix.de, mtk.manpages@gmail.com, bl0pbl33p@gmail.com,
	ldv@altlinux.org, akpm@linux-foundation.org, oleg@redhat.com,
	nagarathnam.muthusamy@oracle.com, cyphar@cyphar.com,
	viro@zeniv.linux.org.uk, dancol@google.com
Subject: Re: [PATCH v1 2/4] pid: add pidctl()
Date: Tue, 26 Mar 2019 14:10:12 -0400	[thread overview]
Message-ID: <20190326181012.GA138478@google.com> (raw)
In-Reply-To: <20190326172231.daa5a53lxf6nz6jn@brauner.io>

On Tue, Mar 26, 2019 at 06:22:33PM +0100, Christian Brauner wrote:
> On Tue, Mar 26, 2019 at 01:06:01PM -0400, Joel Fernandes wrote:
> > On Tue, Mar 26, 2019 at 04:55:11PM +0100, Christian Brauner wrote:
> > > The pidctl() syscalls builds on, extends, and improves translate_pid() [4].
> > > I quote Konstantins original patchset first that has already been acked and
> > > picked up by Eric before and whose functionality is preserved in this
> > > syscall:
> > > 
> > > "Each process have different pids, one for each pid namespace it belongs.
> > >  When interaction happens within single pid-ns translation isn't required.
> > >  More complicated scenarios needs special handling.
> > > 
> > >  For example:
> > >  - reading pid-files or logs written inside container with pid namespace
> > >  - writing logs with internal pids outside container for pushing them into
> > >  - attaching with ptrace to tasks from different pid namespace
> > > 
> > >  Generally speaking, any cross pid-ns API with pids needs translation.
> > > 
> > >  Currently there are several interfaces that could be used here:
> > > 
> > >  Pid namespaces are identified by device and inode of /proc/[pid]/ns/pid.
> > > 
> > >  Pids for nested pid namespaces are shown in file /proc/[pid]/status.
> > >  In some cases pid translation could be easily done using this information.
> > >  Backward translation requires scanning all tasks and becomes really
> > >  complicated for deeper namespace nesting.
> > > 
> > >  Unix socket automatically translates pid attached to SCM_CREDENTIALS.
> > >  This requires CAP_SYS_ADMIN for sending arbitrary pids and entering
> > >  into pid namespace, this expose process and could be insecure."
> > > 
> > > The original patchset allowed two distinct operations implicitly:
> > > - discovering whether pid namespaces (pidns) have a parent-child
> > >   relationship
> > > - translating a pid from a source pidns into a target pidns
> > > 
> > > Both tasks are accomplished in the original patchset by passing a pid
> > > along. If the pid argument is passed as 1 the relationship between two pid
> > > namespaces can be discovered.
> > > The syscall will gain a lot clearer syntax and will be easier to use for
> > > userspace if the task it is asked to perform is passed through a
> > > command argument. Additionally, it allows us to remove an intrinsic race
> > > caused by using the pid argument as a way to discover the relationship
> > > between pid namespaces.
> > > This patch introduces three commands:
> > > 
> > > /* PIDCMD_QUERY_PID */
> > > PIDCMD_QUERY_PID allows to translate a pid between pid namespaces.
> > > Given a source pid namespace fd return the pid of the process in the target
> > > namespace:
> > 
> > Could we call this PIDCMD_TRANSLATE_PID please? QUERY is confusing/ambigious
> > IMO (see below).
> 
> Yes, doesn't matter to me too much what we call it.

Ok.

> > > 1. pidctl(PIDCMD_QUERY_PID, pid, source_fd, -1, 0)
> > >   - retrieve pidns identified by source_fd
> > >   - retrieve struct pid identifed by pid in pidns identified by source_fd
> > >   - retrieve callers pidns
> > >   - return pid in callers pidns
> > > 
> > > 2. pidctl(PIDCMD_QUERY_PID, pid, -1, target_fd, 0)
> > >   - retrieve callers pidns
> > >   - retrieve struct pid identifed by pid in callers pidns
> > >   - retrieve pidns identified by target_fd
> > >   - return pid in pidns identified by target_fd
> > > 
> > > 3. pidctl(PIDCMD_QUERY_PID, 1, source_fd, -1, 0)
> > >   - retrieve pidns identified by source_fd
> > >   - retrieve struct pid identifed by init task in pidns identified by source_fd
> > >   - retrieve callers pidns
> > >   - return pid of init task of pidns identified by source_fd in callers pidns
> > > 
> > > 4. pidctl(PIDCMD_QUERY_PID, pid, source_fd, target_fd, 0)
> > >   - retrieve pidns identified by source_fd
> > >   - retrieve struct pid identifed by pid in pidns identified by source_fd
> > >   - retrieve pidns identified by target_fd
> > >   - check whether struct pid can be found in pidns identified by target_fd
> > >   - return pid in pidns identified by target_fd
> > > 
> > > /* PIDCMD_QUERY_PIDNS */
> > > PIDCMD_QUERY_PIDNS allows to determine the relationship between pid
> > > namespaces.
> > > In the original version of the pachset passing pid as 1 would allow to
> > > deterimine the relationship between the pid namespaces. This is inherhently
> > > racy. If pid 1 inside a pid namespace has died it would report false
> > > negatives. For example, if pid 1 inside of the target pid namespace already
> > > died, it would report that the target pid namespace cannot be reached from
> > > the source pid namespace because it couldn't find the pid inside of the
> > > target pid namespace and thus falsely report to the user that the two pid
> > > namespaces are not related. This problem is simple to avoid. In the new
> > > version we simply walk the list of ancestors and check whether the
> > > namespace are related to each other. By doing it this way we can reliably
> > > report what the relationship between two pid namespace file descriptors
> > > looks like.
> > > 
> > > 1. pidctl(PIDCMD_QUERY_PIDNS, 0, ns_fd1, ns_fd1, 0) == 0
> > >    - pidns_of(ns_fd1) and pidns_of(ns_fd2) are unrelated to each other
> > > 
> > > 2. pidctl(PIDCMD_QUERY_PIDNS, 0, ns_fd1, ns_fd2, 0) == 1
> > >    - pidns_of(ns_fd1) == pidns_of(ns_fd2)
> > > 
> > > 3. pidctl(PIDCMD_QUERY_PIDNS, 0, ns_fd1, ns_fd2, 0) == 2
> > >    - pidns_of(ns_fd1) is ancestor of pidns_of(ns_fd2)
> > > 
> > > 4. pidctl(PIDCMD_QUERY_PIDNS, 0, ns_fd1, ns_fd2, 0) == 3
> > >    - pidns_of(ns_fd2) is ancestor of pidns_of(ns_fd1)
> > 
> > Why not call this PIDCMD_COMPARE_PIDNS, since a comparison is what you're doing.
> 
> Same.

Ok, glad we agree on it.

> > 
> > Again QUERY is ambigious here. Above you called QUERY to translate something,
> > now you're calling QUERY to mean compare something. I suggest to be explicit
> > about the operation PIDCMD_<OPERATION>_<OPERAND-TYPE>.
> > 
> > Arguably, 2 syscalls for this is cleaner:
> > pid_compare_namespaces(ns_fd1, ns_fd2);
> > pid_translate(pid, ns_fd1, nds_fd2);
> 
> I don't think we want to send out pid_compare_namespaces() as a separate
> syscall. If that's the consensus I'd rather just drop this functionality
> completely.

I mean, if pid-namespace fd comparison functionality is not needed in
userspace, why add it all. I suggest to drop it if not needed and can be
considered for addition later when needed.

Then you're just left with GET_PID and TRANSLATE_PID which we know we do need.

> > > These two commands - PIDCMD_QUERY_PID and PIDCMD_QUERY_PIDNS - cover and
> > > improve the functionality expressed implicitly in translate_pid() before.
> > > 
> > > /* PIDCMD_GET_PIDFD */
> > 
> > And this can be a third syscall:
> > pidfd_translate(pid, ns_fd1, ns_fd2).
> 
> Sigh, yes. But I honestly want other people other than us to come out
> and say "yes, please send a PR to Linus with three separate syscalls for
> very closely related functionality". There's a difference between "this
> is how we would ideally like to do it" and "this is what is common
> practice and will likely get accepted". But I'm really not opposed to it
> per se.

Ok.

> > I am actually supportive of Daniel's view that by combining too many
> > arguments into a single syscall, becomes confusing and sometimes some
> > arguments have to be forced to 0 in the single shoe-horned syscall. Like you
> 
> There's a difference between an ioctl() and say seccomp() which this is
> close to:
> int seccomp(unsigned int operation, unsigned int flags, void *args);
> The point is that the functionality is closely related not just randomly
> unrelated stuff. But as I said I'm more than willing to compromise.

Sounds great, yeah whatever makes sense.

Thanks.


  reply	other threads:[~2019-03-26 18:10 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-26 15:55 [PATCH v1 0/4] pid: add pidctl() Christian Brauner
2019-03-26 15:55 ` [PATCH v1 1/4] Make anon_inodes unconditional Christian Brauner
2019-03-26 15:55 ` [PATCH v1 2/4] pid: add pidctl() Christian Brauner
2019-03-26 16:17   ` Daniel Colascione
2019-03-26 16:23     ` Christian Brauner
2019-03-26 16:31       ` Christian Brauner
2019-03-26 16:34         ` Christian Brauner
2019-03-26 16:38           ` Daniel Colascione
2019-03-26 16:43             ` Christian Brauner
2019-03-26 16:50               ` Daniel Colascione
2019-03-26 17:05                 ` Christian Brauner
2019-03-26 16:42           ` Andy Lutomirski
2019-03-26 16:46             ` Christian Brauner
2019-03-26 17:00               ` Daniel Colascione
2019-03-26 16:33       ` Daniel Colascione
2019-03-26 17:06   ` Joel Fernandes
2019-03-26 17:08     ` Christian Brauner
2019-03-26 17:15       ` Joel Fernandes
2019-03-26 17:17         ` Christian Brauner
2019-03-26 17:18           ` Joel Fernandes
2019-03-26 17:22     ` Christian Brauner
2019-03-26 18:10       ` Joel Fernandes [this message]
2019-03-26 18:19         ` Christian Brauner
2019-03-26 18:47           ` Joel Fernandes
2019-03-26 19:19         ` Christian Brauner
2019-03-26 19:51           ` Joel Fernandes
2019-03-26 19:57             ` Christian Brauner
2019-03-26 15:55 ` [PATCH v1 3/4] signal: support pidctl() with pidfd_send_signal() Christian Brauner
2019-03-26 15:55 ` [PATCH v1 4/4] tests: add pidctl() tests Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190326181012.GA138478@google.com \
    --to=joel@joelfernandes.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=bl0pbl33p@gmail.com \
    --cc=christian@brauner.io \
    --cc=cyphar@cyphar.com \
    --cc=dancol@google.com \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=jannh@google.com \
    --cc=keescook@chromium.org \
    --cc=khlebnikov@yandex-team.ru \
    --cc=ldv@altlinux.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mtk.manpages@gmail.com \
    --cc=nagarathnam.muthusamy@oracle.com \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).