From: Christian Brauner <christian@brauner.io> To: Jann Horn <jannh@google.com> Cc: Andy Lutomirski <luto@kernel.org>, Konstantin Khlebnikov <khlebnikov@yandex-team.ru>, David Howells <dhowells@redhat.com>, "Serge E. Hallyn" <serge@hallyn.com>, "Eric W. Biederman" <ebiederm@xmission.com>, Linux API <linux-api@vger.kernel.org>, kernel list <linux-kernel@vger.kernel.org>, Arnd Bergmann <arnd@arndb.de>, Kees Cook <keescook@chromium.org>, Alexey Dobriyan <adobriyan@gmail.com>, Thomas Gleixner <tglx@linutronix.de>, Michael Kerrisk-manpages <mtk.manpages@gmail.com>, bl0pbl33p@gmail.com, "Dmitry V. Levin" <ldv@altlinux.org>, Andrew Morton <akpm@linux-foundation.org>, Oleg Nesterov <oleg@redhat.com>, Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>, cyphar@cyphar.com, Al Viro <viro@zeniv.linux.org.uk>, "Joel Fernandes (Google)" <joel@joelfernandes.org>, Daniel Colascione <dancol@google.com> Subject: Re: [PATCH 2/4] pid: add pidctl() Date: Mon, 25 Mar 2019 20:58:45 +0100 [thread overview] Message-ID: <20190325195844.lrf7kgutlyv77vmu@brauner.io> (raw) In-Reply-To: <CAG48ez24xPM4f7ty=8k5p6_gVWd-bEUSxca5Ngy5svHgarhz+w@mail.gmail.com> On Mon, Mar 25, 2019 at 07:18:42PM +0100, Jann Horn wrote: > On Mon, Mar 25, 2019 at 5:21 PM Christian Brauner <christian@brauner.io> wrote: > > The pidctl() syscalls builds on, extends, and improves translate_pid() [4]. > > I quote Konstantins original patchset first that has already been acked and > > picked up by Eric before and whose functionality is preserved in this > > syscall: > [...] > > + > > +static struct pid_namespace *get_pid_ns_by_fd(int fd) > > +{ > > + struct pid_namespace *pidns = ERR_PTR(-EINVAL); > > + > > + if (fd >= 0) { > > +#ifdef CONFIG_PID_NS > > + struct ns_common *ns; > > + struct file *file = proc_ns_fget(fd); > > + if (IS_ERR(file)) > > + return ERR_CAST(file); > > + > > + ns = get_proc_ns(file_inode(file)); > > + if (ns->ops->type == CLONE_NEWPID) > > + pidns = get_pid_ns( > > + container_of(ns, struct pid_namespace, ns)); > > This increments the refcount of the pidns... I didn't touch that code. That's taken over from the orginial patchset. Thanks for catching this. > > > + > > + fput(file); > > +#endif > > + } else { > > + pidns = task_active_pid_ns(current); > > ... but this doesn't. That's pretty subtle; could you please put a > comment on top of this function that points this out? Or even better, > change the function to always take a reference, so that the caller > doesn't have to worry about figuring this out. Always taking a reference sounds more correct. Will do. > > > + } > > + > > + return pidns; > > +} > [...] > > +SYSCALL_DEFINE5(pidctl, unsigned int, cmd, pid_t, pid, int, source, int, target, > > + unsigned int, flags) > > +{ > > + struct pid_namespace *source_ns = NULL, *target_ns = NULL; > > + struct pid *struct_pid; > > + pid_t result; > > + > > + switch (cmd) { > > + case PIDCMD_QUERY_PIDNS: > > + if (pid != 0) > > + return -EINVAL; > > + pid = 1; > > + /* fall through */ > > + case PIDCMD_QUERY_PID: > > + if (flags != 0) > > + return -EINVAL; > > + break; > > + case PIDCMD_GET_PIDFD: > > + if (flags & ~PIDCTL_CLOEXEC) > > + return -EINVAL; > > + break; > > + default: > > + return -EOPNOTSUPP; > > + } > > + > > + source_ns = get_pid_ns_by_fd(source); > > + result = PTR_ERR(source_ns); > > I very much dislike using PTR_ERR() on pointers before checking > whether they contain an error value or not. I understand that the > result of this won't actually be used, but it still seems weird to > have what is essentially a cast of a potentially valid pointer to a > potentially smaller integer here. > > Could you maybe move the PTR_ERR() into the error branch? Like so: Will do! > > if (IS_ERR(source_ns)) { > result = PTR_ERR(source_ns); > goto err_source; > } > > > + if (IS_ERR(source_ns)) > > + goto err_source; > > + > > + target_ns = get_pid_ns_by_fd(target); > > + result = PTR_ERR(target_ns); > > + if (IS_ERR(target_ns)) > > + goto err_target; > > + > > + if (cmd == PIDCMD_QUERY_PIDNS) { > > + result = pidns_related(source_ns, target_ns); > > + } else { > > + rcu_read_lock(); > > + struct_pid = find_pid_ns(pid, source_ns); > > find_pid_ns() doesn't take a reference on its return value, the return > value is only pinned into memory by the RCU read-side critical > section... > > > + result = struct_pid ? pid_nr_ns(struct_pid, target_ns) : -ESRCH; > > + rcu_read_unlock(); > > ... which ends here, making struct_pid a dangling pointer... > > > + > > + if (cmd == PIDCMD_GET_PIDFD) { > > + int cloexec = (flags & PIDCTL_CLOEXEC) ? O_CLOEXEC : 0; > > + if (result > 0) > > + result = pidfd_create_fd(struct_pid, cloexec); > > ... and then here you continue to use struct_pid. That seems bogus. I'll just take a reference to struct pid once I found it to prevent that from happening. > > > + else if (result == 0) > > + result = -ENOENT; > > You don't need to have flags for this for new syscalls, you can just > make everything O_CLOEXEC; if someone wants to preserve an fd across > execve(), they can just clear that bit with fcntl(). (I thiiink it was > Andy Lutomirski who said that before regarding another syscall? But my > memory of that is pretty foggy, might've been someone else.) If that's the way going forward this is fine with me! > > > + } > > + } > > + > > + if (target) > > + put_pid_ns(target_ns); > > +err_target: > > + if (source) > > + put_pid_ns(source_ns); > > I think you probably intended to check for "if (target >= 0)" and "if > (source >= 0)" instead of these conditions, matching the condition in > get_pid_ns_by_fd()? The current code looks as if it will leave the > refcount one too high when used with target==0 or source==0, and leave > the refcount one too low when used with target==-1 or source==-1. > Anyway, as stated above, I think it would be simpler to > unconditionally take a reference instead. Yep. > > > +err_source: > > + return result; > > +}
WARNING: multiple messages have this Message-ID (diff)
From: Christian Brauner <christian@brauner.io> To: Jann Horn <jannh@google.com> Cc: Andy Lutomirski <luto@kernel.org>, Konstantin Khlebnikov <khlebnikov@yandex-team.ru>, David Howells <dhowells@redhat.com>, "Serge E. Hallyn" <serge@hallyn.com>, "Eric W. Biederman" <ebiederm@xmission.com>, Linux API <linux-api@vger.kernel.org>, kernel list <linux-kernel@vger.kernel.org>, Arnd Bergmann <arnd@arndb.de>, Kees Cook <keescook@chromium.org>, Alexey Dobriyan <adobriyan@gmail.com>, Thomas Gleixner <tglx@linutronix.de>, Michael Kerrisk-manpages <mtk.manpages@gmail.com>, bl0pbl33p@gmail.com, "Dmitry V. Levin" <ldv@altlinux.org>, Andrew Morton <akpm@linux-foundation.org>, Oleg Nesterov <oleg@redhat.com>, Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>, cyphar@cyphar.com, Al Viro <viro@zeniv.linux.org.uk>, "Joel Fernandes (Google)" <joe> Subject: Re: [PATCH 2/4] pid: add pidctl() Date: Mon, 25 Mar 2019 20:58:45 +0100 [thread overview] Message-ID: <20190325195844.lrf7kgutlyv77vmu@brauner.io> (raw) In-Reply-To: <CAG48ez24xPM4f7ty=8k5p6_gVWd-bEUSxca5Ngy5svHgarhz+w@mail.gmail.com> On Mon, Mar 25, 2019 at 07:18:42PM +0100, Jann Horn wrote: > On Mon, Mar 25, 2019 at 5:21 PM Christian Brauner <christian@brauner.io> wrote: > > The pidctl() syscalls builds on, extends, and improves translate_pid() [4]. > > I quote Konstantins original patchset first that has already been acked and > > picked up by Eric before and whose functionality is preserved in this > > syscall: > [...] > > + > > +static struct pid_namespace *get_pid_ns_by_fd(int fd) > > +{ > > + struct pid_namespace *pidns = ERR_PTR(-EINVAL); > > + > > + if (fd >= 0) { > > +#ifdef CONFIG_PID_NS > > + struct ns_common *ns; > > + struct file *file = proc_ns_fget(fd); > > + if (IS_ERR(file)) > > + return ERR_CAST(file); > > + > > + ns = get_proc_ns(file_inode(file)); > > + if (ns->ops->type == CLONE_NEWPID) > > + pidns = get_pid_ns( > > + container_of(ns, struct pid_namespace, ns)); > > This increments the refcount of the pidns... I didn't touch that code. That's taken over from the orginial patchset. Thanks for catching this. > > > + > > + fput(file); > > +#endif > > + } else { > > + pidns = task_active_pid_ns(current); > > ... but this doesn't. That's pretty subtle; could you please put a > comment on top of this function that points this out? Or even better, > change the function to always take a reference, so that the caller > doesn't have to worry about figuring this out. Always taking a reference sounds more correct. Will do. > > > + } > > + > > + return pidns; > > +} > [...] > > +SYSCALL_DEFINE5(pidctl, unsigned int, cmd, pid_t, pid, int, source, int, target, > > + unsigned int, flags) > > +{ > > + struct pid_namespace *source_ns = NULL, *target_ns = NULL; > > + struct pid *struct_pid; > > + pid_t result; > > + > > + switch (cmd) { > > + case PIDCMD_QUERY_PIDNS: > > + if (pid != 0) > > + return -EINVAL; > > + pid = 1; > > + /* fall through */ > > + case PIDCMD_QUERY_PID: > > + if (flags != 0) > > + return -EINVAL; > > + break; > > + case PIDCMD_GET_PIDFD: > > + if (flags & ~PIDCTL_CLOEXEC) > > + return -EINVAL; > > + break; > > + default: > > + return -EOPNOTSUPP; > > + } > > + > > + source_ns = get_pid_ns_by_fd(source); > > + result = PTR_ERR(source_ns); > > I very much dislike using PTR_ERR() on pointers before checking > whether they contain an error value or not. I understand that the > result of this won't actually be used, but it still seems weird to > have what is essentially a cast of a potentially valid pointer to a > potentially smaller integer here. > > Could you maybe move the PTR_ERR() into the error branch? Like so: Will do! > > if (IS_ERR(source_ns)) { > result = PTR_ERR(source_ns); > goto err_source; > } > > > + if (IS_ERR(source_ns)) > > + goto err_source; > > + > > + target_ns = get_pid_ns_by_fd(target); > > + result = PTR_ERR(target_ns); > > + if (IS_ERR(target_ns)) > > + goto err_target; > > + > > + if (cmd == PIDCMD_QUERY_PIDNS) { > > + result = pidns_related(source_ns, target_ns); > > + } else { > > + rcu_read_lock(); > > + struct_pid = find_pid_ns(pid, source_ns); > > find_pid_ns() doesn't take a reference on its return value, the return > value is only pinned into memory by the RCU read-side critical > section... > > > + result = struct_pid ? pid_nr_ns(struct_pid, target_ns) : -ESRCH; > > + rcu_read_unlock(); > > ... which ends here, making struct_pid a dangling pointer... > > > + > > + if (cmd == PIDCMD_GET_PIDFD) { > > + int cloexec = (flags & PIDCTL_CLOEXEC) ? O_CLOEXEC : 0; > > + if (result > 0) > > + result = pidfd_create_fd(struct_pid, cloexec); > > ... and then here you continue to use struct_pid. That seems bogus. I'll just take a reference to struct pid once I found it to prevent that from happening. > > > + else if (result == 0) > > + result = -ENOENT; > > You don't need to have flags for this for new syscalls, you can just > make everything O_CLOEXEC; if someone wants to preserve an fd across > execve(), they can just clear that bit with fcntl(). (I thiiink it was > Andy Lutomirski who said that before regarding another syscall? But my > memory of that is pretty foggy, might've been someone else.) If that's the way going forward this is fine with me! > > > + } > > + } > > + > > + if (target) > > + put_pid_ns(target_ns); > > +err_target: > > + if (source) > > + put_pid_ns(source_ns); > > I think you probably intended to check for "if (target >= 0)" and "if > (source >= 0)" instead of these conditions, matching the condition in > get_pid_ns_by_fd()? The current code looks as if it will leave the > refcount one too high when used with target==0 or source==0, and leave > the refcount one too low when used with target==-1 or source==-1. > Anyway, as stated above, I think it would be simpler to > unconditionally take a reference instead. Yep. > > > +err_source: > > + return result; > > +}
next prev parent reply other threads:[~2019-03-25 19:58 UTC|newest] Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-03-25 16:20 [PATCH 0/4] pid: add pidctl() Christian Brauner 2019-03-25 16:20 ` [PATCH 1/4] Make anon_inodes unconditional Christian Brauner 2019-03-25 16:20 ` [PATCH 2/4] pid: add pidctl() Christian Brauner 2019-03-25 17:20 ` Mika Penttilä 2019-03-25 19:59 ` Christian Brauner 2019-03-25 19:59 ` Christian Brauner 2019-03-25 18:18 ` Jann Horn 2019-03-25 18:18 ` Jann Horn 2019-03-25 19:58 ` Christian Brauner [this message] 2019-03-25 19:58 ` Christian Brauner 2019-03-26 16:07 ` Joel Fernandes 2019-03-26 16:07 ` Joel Fernandes 2019-03-26 16:15 ` Christian Brauner 2019-03-26 16:15 ` Christian Brauner 2019-03-25 16:20 ` [PATCH 3/4] signal: support pidctl() with pidfd_send_signal() Christian Brauner 2019-03-25 18:28 ` Jonathan Kowalski 2019-03-25 18:28 ` Jonathan Kowalski 2019-03-25 20:05 ` Christian Brauner 2019-03-25 20:05 ` Christian Brauner 2019-03-25 18:39 ` Jann Horn 2019-03-25 18:39 ` Jann Horn 2019-03-25 19:41 ` Christian Brauner 2019-03-25 19:41 ` Christian Brauner 2019-03-25 16:20 ` [PATCH 4/4] tests: add pidctl() tests Christian Brauner 2019-03-25 16:48 ` [PATCH 0/4] pid: add pidctl() Daniel Colascione 2019-03-25 16:48 ` Daniel Colascione 2019-03-25 17:05 ` Konstantin Khlebnikov 2019-03-25 17:07 ` Daniel Colascione 2019-03-25 17:07 ` Daniel Colascione 2019-03-25 17:36 ` Joel Fernandes 2019-03-25 17:36 ` Joel Fernandes 2019-03-25 17:53 ` Daniel Colascione 2019-03-25 17:53 ` Daniel Colascione 2019-03-25 18:19 ` Jonathan Kowalski 2019-03-25 18:19 ` Jonathan Kowalski 2019-03-25 18:57 ` Daniel Colascione 2019-03-25 18:57 ` Daniel Colascione 2019-03-25 19:42 ` Jonathan Kowalski 2019-03-25 19:42 ` Jonathan Kowalski 2019-03-25 20:14 ` Daniel Colascione 2019-03-25 20:14 ` Daniel Colascione 2019-03-25 20:34 ` Jann Horn 2019-03-25 20:34 ` Jann Horn 2019-03-25 20:40 ` Jonathan Kowalski 2019-03-25 20:40 ` Jonathan Kowalski 2019-03-25 21:14 ` Jonathan Kowalski 2019-03-25 21:14 ` Jonathan Kowalski 2019-03-25 21:15 ` Jann Horn 2019-03-25 21:15 ` Jann Horn 2019-03-25 20:40 ` Christian Brauner 2019-03-25 20:40 ` Christian Brauner 2019-03-25 20:15 ` Christian Brauner 2019-03-25 20:15 ` Christian Brauner 2019-03-25 21:11 ` Joel Fernandes 2019-03-25 21:11 ` Joel Fernandes 2019-03-25 21:17 ` Daniel Colascione 2019-03-25 21:17 ` Daniel Colascione 2019-03-25 21:19 ` Jann Horn 2019-03-25 21:19 ` Jann Horn 2019-03-25 21:43 ` Joel Fernandes 2019-03-25 21:43 ` Joel Fernandes 2019-03-25 21:54 ` Jonathan Kowalski 2019-03-25 21:54 ` Jonathan Kowalski 2019-03-25 22:07 ` Daniel Colascione 2019-03-25 22:07 ` Daniel Colascione 2019-03-25 22:37 ` Jonathan Kowalski 2019-03-25 22:37 ` Jonathan Kowalski 2019-03-25 23:14 ` Daniel Colascione 2019-03-25 23:14 ` Daniel Colascione 2019-03-26 3:03 ` Joel Fernandes 2019-03-26 3:03 ` Joel Fernandes 2019-03-25 16:56 ` David Howells 2019-03-25 16:56 ` David Howells 2019-03-25 16:58 ` Daniel Colascione 2019-03-25 16:58 ` Daniel Colascione 2019-03-25 23:39 ` Andy Lutomirski 2019-03-25 23:39 ` Andy Lutomirski
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190325195844.lrf7kgutlyv77vmu@brauner.io \ --to=christian@brauner.io \ --cc=adobriyan@gmail.com \ --cc=akpm@linux-foundation.org \ --cc=arnd@arndb.de \ --cc=bl0pbl33p@gmail.com \ --cc=cyphar@cyphar.com \ --cc=dancol@google.com \ --cc=dhowells@redhat.com \ --cc=ebiederm@xmission.com \ --cc=jannh@google.com \ --cc=joel@joelfernandes.org \ --cc=keescook@chromium.org \ --cc=khlebnikov@yandex-team.ru \ --cc=ldv@altlinux.org \ --cc=linux-api@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=luto@kernel.org \ --cc=mtk.manpages@gmail.com \ --cc=nagarathnam.muthusamy@oracle.com \ --cc=oleg@redhat.com \ --cc=serge@hallyn.com \ --cc=tglx@linutronix.de \ --cc=viro@zeniv.linux.org.uk \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.