linux-api.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Djalal Harouni <tixxdz@gmail.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"kernel-hardening@lists.openwall.com"
	<kernel-hardening@lists.openwall.com>,
	LSM List <linux-security-module@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Dongsu Park <dpark@posteo.net>,
	Casey Schaufler <casey@schaufler-ca.com>,
	James Morris <james.l.morris@oracle.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Jeff Layton <jlayton@poochiereds.net>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Alexey Dobriyan <adobriyan@gmail.com>,
	Ingo Molnar <mingo@kernel.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Oleg Nesterov <oleg@redhat.com>, Michal Hocko <mhocko@suse.com>,
	Jonathan Corbet <corbet@lwn.net>
Subject: Re: [PATCH RFC v2 4/6] proc: support mounting private procfs instances inside same pid namespace
Date: Wed, 26 Apr 2017 15:13:30 -0700	[thread overview]
Message-ID: <CALCETrWOjLnfCgpAJs6tTjLXeNHjqPQa5qSZpztcnbR2kN5A2w@mail.gmail.com> (raw)
In-Reply-To: <1493123038-30590-5-git-send-email-tixxdz@gmail.com>

On Tue, Apr 25, 2017 at 5:23 AM, Djalal Harouni <tixxdz@gmail.com> wrote:
> This patch allows to have multiple private procfs instances inside the
> same pid namespace. Lot of other areas in the kernel and filesystems
> have been updated to be able to support private instances, devpts is one
> major example. The aim here is lightweight sandboxes, and to allow that we
> have to modernize procfs internals.
>
> 1) The main aim of this work is to have on embedded systems one
> supervisor for apps. Right now we have some lightweight sandbox support,
> however if we create pid namespacess we have to manages all the
> processes inside too, where our goal is to be able to run a bunch of
> apps each one inside its own mount namespace without being able to
> notice each other. We only want to use mount namespaces, and we want
> procfs to behave more like a real mount point.
>
> 2) Linux Security Modules have multiple ptrace paths inside some
> subsystems, however inside procfs, the implementation does not guarantee
> that the ptrace() check which triggers the security_ptrace_check() hook
> will always run. We have the 'hidepid' mount option that can be used to
> force the ptrace_may_access() check inside has_pid_permissions() to run.
> The problem is that 'hidepid' is per pid namespace and not attached to
> the mount point, any remount or modification of 'hidepid' will propagate
> to all other procfs mounts.
>
> This also does not allow to support Yama LSM easily in desktop and user
> sessions. Yama ptrace scope which restricts ptrace and some other
> syscalls to be allowed only on inferiors, can be updated to have a
> per-task context, where the context will be inherited during fork(),
> clone() and preserved across execve(). If we support multiple private
> procfs instances, then we may force the ptrace_may_access() on
> /proc/<pids>/ to always run inside that new procfs instances. This will
> allow to specifiy on user sessions if we should populate procfs with
> pids that the user can ptrace or not.
>
> By using Yama ptrace scope, some restricted users will only be able to see
> inferiors inside /proc, they won't even be able to see their other
> processes. Some software like Chromium, Firefox's crash handler, Wine
> and others are already using Yama to restrict which processes can be
> ptracable. With this change this will give the possibility to restrict
> /proc/<pids>/ but more importantly this will give desktop users a
> generic and usuable way to specifiy which users should see all processes
> and which users can not.
>
> Side notes:
> * This covers the lack of seccomp where it is not able to parse
> arguments, it is easy to install a seccomp filter on direct syscalls
> that operate on pids, however /proc/<pid>/ is a Linux ABI using
> filesystem syscalls. With this change LSMs should be able to analyze
> open/read/write/close...
>
> 3) This will modernize procfs and align it with all other filesystems
> and subsystems that have been updated recently to be able to work in a
> flexible way. This is the same as devpts where each mount now is a distinct
> filesystem such that ptys and their indicies allocated in one mount are
> independent from ptys and their indicies in all other mounts.
>
> We have to align procfs and modernize it to have a per mount context
> where at least the mount option do not propagate to all other mounts,
> then maybe we can continue to implement new features. One example is to
> require CAP_SYS_ADMIN in the init user namespace on some /proc/* which are
> not pids and which are are not virtualized by design, or CAP_NET_ADMIN
> inside userns on the net bits that are virtualized, etc.
> These mount options won't propagate to previous mounts, and the system
> will continue to be usable.
>
> Ths patch introduces the new 'limit_pids' mount option as it was also
> suggesed by Andy Lutomirski [1]. When this option is passed we
> automatically create a private procfs instance. This is not the default
> behaviour since we do not want to break userspace and we do not want to
> provide different devices IDs by default, please see [1] for why.

I think that calling the option to make a separate instance
"limit_pids" is extremely counterintuitive.

My strong preference would be to make proc *always* make a separate
instance (unless it's a bind mount) and to make it work.  If that
means fudging stat() output, so be it.

Failing that, let's come up with some coherent way to make this work.
"new_instance" or similar would do.  Then make limit_pid cause an
error unless new_instance is also set.

--Andy

  reply	other threads:[~2017-04-26 22:13 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-25 12:23 [PATCH RFC v2 0/6] proc: support private proc instances per pidnamespace Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 1/6] proc: add proc_fs_info struct to store proc information Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 2/6] proc: move /proc/{self|thread-self} dentries to proc_fs_info Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 3/6] proc: add helpers to set and get proc hidepid and gid mount options Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 4/6] proc: support mounting private procfs instances inside same pid namespace Djalal Harouni
2017-04-26 22:13   ` Andy Lutomirski [this message]
2017-05-02 14:29     ` Djalal Harouni
2017-05-02 16:33       ` Andy Lutomirski
     [not found]         ` <CALCETrV4SjQE_NM4=j0JgRGBjOVY4o=iu0=ruuvzSuGRUPgNbg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-03 15:18           ` Djalal Harouni
     [not found] ` <1493123038-30590-1-git-send-email-tixxdz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-04-25 12:23   ` [PATCH RFC v2 5/6] proc: instantiate only pids that we can ptrace on 'limit_pids=1' mount option Djalal Harouni
2017-04-26 22:09     ` Andy Lutomirski
     [not found]       ` <CALCETrXM7-NBnBcXbuuhDJZyUFLT7iRfcGGvaqUhDJBGkYJgcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-02 14:00         ` Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 6/6] proc: flush task dcache entries from all procfs instances Djalal Harouni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrWOjLnfCgpAJs6tTjLXeNHjqPQa5qSZpztcnbR2kN5A2w@mail.gmail.com \
    --to=luto@kernel.org \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bfields@fieldses.org \
    --cc=casey@schaufler-ca.com \
    --cc=corbet@lwn.net \
    --cc=dpark@posteo.net \
    --cc=ebiederm@xmission.com \
    --cc=james.l.morris@oracle.com \
    --cc=jlayton@poochiereds.net \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=mhocko@suse.com \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=serge@hallyn.com \
    --cc=tixxdz@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).