From: Andy Lutomirski <>
To: Djalal Harouni <>
Cc: Linux Kernel Mailing List <>,
	Andy Lutomirski <>,
	Kees Cook <>,
	Andrew Morton <>,
	Linux FS Devel <>,
	LSM List <>,
	Linux API <>,
	Dongsu Park <>,
	Casey Schaufler <>,
	James Morris <>,
	"Serge E. Hallyn" <>,
	Jeff Layton <>,
	"J. Bruce Fields" <>,
	Alexander Viro <>,
	Alexey Dobriyan <>,
	Ingo Molnar <>,
	"Eric W. Biederman" <>,
	Oleg Nesterov <>, Michal Hocko <>,
	Jonathan Corbet <>
Subject: Re: [PATCH RFC v2 4/6] proc: support mounting private procfs instances inside same pid namespace
Date: Wed, 26 Apr 2017 15:13:30 -0700
Message-ID: <>
In-Reply-To: <>

On Tue, Apr 25, 2017 at 5:23 AM, Djalal Harouni <> wrote:
> This patch allows multiple private procfs instances to exist inside the
> same pid namespace. Many other areas in the kernel and filesystems
> have been updated to be able to support private instances, devpts is one
> major example. The aim here is lightweight sandboxes, and to allow that we
> have to modernize procfs internals.
> 1) The main aim of this work is to have on embedded systems one
> supervisor for apps. Right now we have some lightweight sandbox support,
> however if we create pid namespaces we have to manage all the
> processes inside too, where our goal is to be able to run a bunch of
> apps each one inside its own mount namespace without being able to
> notice each other. We only want to use mount namespaces, and we want
> procfs to behave more like a real mount point.
> 2) Linux Security Modules have multiple ptrace paths inside some
> subsystems; however, inside procfs the implementation does not guarantee
> that the ptrace() check which triggers the security_ptrace_access_check() hook
> will always run. We have the 'hidepid' mount option that can be used to
> force the ptrace_may_access() check inside has_pid_permissions() to run.
> The problem is that 'hidepid' is per pid namespace and not attached to
> the mount point: any remount or modification of 'hidepid' will propagate
> to all other procfs mounts.
> This also makes it hard to support Yama LSM in desktop and user
> sessions. Yama ptrace scope, which restricts ptrace and some other
> syscalls to be allowed only on inferiors, can be updated to have a
> per-task context, where the context will be inherited during fork(),
> clone() and preserved across execve(). If we support multiple private
> procfs instances, then we may force the ptrace_may_access() check on
> /proc/<pids>/ to always run inside those new procfs instances. This will
> allow user sessions to specify whether procfs should be populated with
> pids that the user can ptrace or not.
> By using Yama ptrace scope, some restricted users will only be able to see
> inferiors inside /proc, they won't even be able to see their other
> processes. Some software like Chromium, Firefox's crash handler, Wine
> and others are already using Yama to restrict which processes can be
> ptraceable. This change makes it possible to restrict
> /proc/<pids>/, but more importantly it gives desktop users a
> generic and usable way to specify which users should see all processes
> and which users cannot.
> Side notes:
> * This covers a gap in seccomp, which cannot dereference pointer
> arguments: it is easy to install a seccomp filter on direct syscalls
> that operate on pids, but /proc/<pid>/ is a Linux ABI built on
> filesystem syscalls. With this change LSMs should be able to analyze
> open/read/write/close...
> 3) This will modernize procfs and align it with all other filesystems
> and subsystems that have been updated recently to be able to work in a
> flexible way. This is the same as devpts where each mount now is a distinct
> filesystem such that ptys and their indices allocated in one mount are
> independent from ptys and their indices in all other mounts.
> We have to align procfs and modernize it to have a per mount context
> where at least the mount options do not propagate to all other mounts,
> then maybe we can continue to implement new features. One example is to
> require CAP_SYS_ADMIN in the init user namespace on some /proc/* which are
> not pids and which are not virtualized by design, or CAP_NET_ADMIN
> inside userns on the net bits that are virtualized, etc.
> These mount options won't propagate to previous mounts, and the system
> will continue to be usable.
> This patch introduces the new 'limit_pids' mount option, as also
> suggested by Andy Lutomirski [1]. When this option is passed we
> automatically create a private procfs instance. This is not the default
> behaviour since we do not want to break userspace and we do not want to
> provide different device IDs by default; please see [1] for why.

I think that calling the option to make a separate instance
"limit_pids" is extremely counterintuitive.

My strong preference would be to make proc *always* make a separate
instance (unless it's a bind mount) and to make it work.  If that
means fudging stat() output, so be it.
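The device-ID concern is concrete because userspace commonly decides whether two paths live on the same filesystem instance by comparing st_dev from stat(2); find -xdev relies on this kind of check. A minimal sketch of that check follows (the helper name is hypothetical). A separate procfs superblock would normally report a different st_dev from the existing /proc, and that is the stat() output that would need fudging.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Return 1 if paths a and b resolve to the same filesystem instance
 * (equal st_dev), 0 if they differ, or -1 if either stat() fails.
 * stat() follows symlinks, so "/proc/self" resolves to /proc/<pid>. */
static int same_fs(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev;
}
```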

Failing that, let's come up with some coherent way to make this work.
"new_instance" or similar would do.  Then make limit_pids cause an
error unless new_instance is also set.
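A minimal sketch of the validation rule proposed here. It assumes the option spellings used in this thread (new_instance and limit_pids) and treats both as bare flags for brevity; the RFC itself spells the latter limit_pids=1. The struct and function names are hypothetical, and this is plain userspace C, not the in-kernel mount option parser.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of the two options discussed in this thread. */
struct proc_mount_opts {
    bool new_instance;  /* proposed: create a separate procfs instance */
    bool limit_pids;    /* proposed: only show pids the caller may ptrace */
};

/* True if the token p[0..len) equals the option name exactly. */
static bool opt_is(const char *p, size_t len, const char *name)
{
    return len == strlen(name) && strncmp(p, name, len) == 0;
}

/* Parse a comma-separated string such as "new_instance,limit_pids".
 * Returns 0 on success, or -EINVAL if limit_pids is requested without
 * new_instance. Other options (hidepid=, gid=, ...) are ignored here. */
static int parse_proc_opts(const char *data, struct proc_mount_opts *out)
{
    out->new_instance = false;
    out->limit_pids = false;

    for (const char *p = data; p != NULL && *p != '\0'; ) {
        size_t len = strcspn(p, ",");  /* length of the current token */

        if (opt_is(p, len, "new_instance"))
            out->new_instance = true;
        else if (opt_is(p, len, "limit_pids"))
            out->limit_pids = true;

        p += len;
        if (*p == ',')
            p++;                       /* skip the separator */
    }

    /* limit_pids only makes sense on a private instance. */
    if (out->limit_pids && !out->new_instance)
        return -EINVAL;
    return 0;
}
```

With this rule, "limit_pids" alone is rejected, while "new_instance,limit_pids" yields a private instance with pid filtering, so the filtering option can never silently change an existing shared mount.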



Thread overview: 13+ messages
2017-04-25 12:23 [PATCH RFC v2 0/6] proc: support private proc instances per pidnamespace Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 1/6] proc: add proc_fs_info struct to store proc information Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 2/6] proc: move /proc/{self|thread-self} dentries to proc_fs_info Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 3/6] proc: add helpers to set and get proc hidepid and gid mount options Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 4/6] proc: support mounting private procfs instances inside same pid namespace Djalal Harouni
2017-04-26 22:13   ` Andy Lutomirski [this message]
2017-05-02 14:29     ` Djalal Harouni
2017-05-02 16:33       ` Andy Lutomirski
2017-05-03 15:18           ` Djalal Harouni
2017-04-25 12:23   ` [PATCH RFC v2 5/6] proc: instantiate only pids that we can ptrace on 'limit_pids=1' mount option Djalal Harouni
2017-04-26 22:09     ` Andy Lutomirski
2017-05-02 14:00         ` Djalal Harouni
2017-04-25 12:23 ` [PATCH RFC v2 6/6] proc: flush task dcache entries from all procfs instances Djalal Harouni
