Linux-Doc Archive on lore.kernel.org
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Nick Kralevich <nnk@google.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>,
	Jeffrey Vander Stoep <jeffv@google.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Kees Cook <keescook@chromium.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Shevchenko <andy.shevchenko@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Peter Xu <peterx@redhat.com>, Mike Rapoport <rppt@linux.ibm.com>,
	Jerome Glisse <jglisse@redhat.com>, Shaohua Li <shli@fb.com>,
	linux-doc@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	Tim Murray <timmurray@google.com>,
	Minchan Kim <minchan@google.com>,
	Sandeep Patil <sspatil@google.com>,
	kernel@android.com, Daniel Colascione <dancol@dancol.org>,
	Kalesh Singh <kaleshsingh@google.com>
Subject: Re: [PATCH 2/2] Add a new sysctl knob: unprivileged_userfaultfd_user_mode_only
Date: Thu, 6 Aug 2020 01:44:01 -0400
Message-ID: <20200806004351-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAFJ0LnEZghYj=d3w8Fmko4GZAWw6Qc5rgAMmXj-8qgXtyU3bZQ@mail.gmail.com>

On Wed, Aug 05, 2020 at 05:43:02PM -0700, Nick Kralevich wrote:
> On Fri, Jul 24, 2020 at 6:40 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jul 23, 2020 at 05:13:28PM -0700, Nick Kralevich wrote:
> > > On Thu, Jul 23, 2020 at 10:30 AM Lokesh Gidra <lokeshgidra@google.com> wrote:
> > > > From the discussion so far it seems that there is a consensus that
> > > > patch 1/2 in this series should be upstreamed in any case. Is there
> > > > anything that is pending on that patch?
> > >
> > > That's my reading of this thread too.
> > >
> > > > > > Unless I'm mistaken that you can already enforce bit 1 of the second
> > > > > > parameter of the userfaultfd syscall to be set with seccomp-bpf, this
> > > > > > would be more a question to the Android userland team.
> > > > > >
> > > > > > The question would be: does it ever happen that a seccomp filter isn't
> > > > > > already applied to unprivileged software running without
> > > > > > > CAP_SYS_PTRACE capability?
> > > > >
> > > > > Yes.
> > > > >
> > > > > Android uses selinux as our primary sandboxing mechanism. We do use
> > > > > seccomp on a few processes, but we have found that it has a
> > > > > surprisingly high performance cost [1] on arm64 devices so turning it
> > > > > on system wide is not a good option.
> > > > >
> > > > > [1] https://lore.kernel.org/linux-security-module/202006011116.3F7109A@keescook/T/#m82ace19539ac595682affabdf652c0ffa5d27dad
> > >
> > > As Jeff mentioned, seccomp is used strategically on Android, but is
> > > not applied to all processes. It's too expensive and impractical when
> > > simpler implementations (such as this sysctl) can exist. It's also
> > > significantly simpler to test a sysctl value for correctness as
> > > opposed to a seccomp filter.
> >
> > Given that selinux is already used system-wide on Android, what is wrong
> > with using selinux to control userfaultfd as opposed to seccomp?
> 
> Userfaultfd file descriptors will be generally controlled by SELinux.
> You can see the patchset at
> https://lore.kernel.org/lkml/20200401213903.182112-3-dancol@google.com/
> (which is also referenced in the original commit message for this
> patchset). However, the SELinux patchset doesn't include the ability
> to control FAULT_FLAG_USER / UFFD_USER_MODE_ONLY directly.
> 
> SELinux already has the ability to control who gets CAP_SYS_PTRACE,
> which combined with this patch, is largely equivalent to direct
> UFFD_USER_MODE_ONLY checks. Additionally, with the SELinux patch
> above, movement of userfaultfd file descriptors can be mediated by
> SELinux, preventing one process from acquiring userfaultfd descriptors
> of other processes unless allowed by security policy.
> 
> It's an interesting question whether finer-grain SELinux support for
> controlling UFFD_USER_MODE_ONLY should be added. I can see some
> advantages to implementing this. However, we don't need to decide that
> now.
>
> Kernel security checks generally break down into DAC (discretionary
> access control) and MAC (mandatory access control) controls. Most
> kernel security features check via both of these mechanisms. Security
> attributes of the system should be settable without necessarily
> relying on an LSM such as SELinux. This patch follows the same basic
> model -- system wide control of a hardening feature is provided by the
> unprivileged_userfaultfd_user_mode_only sysctl (DAC), and if needed,
> SELinux support for this can also be implemented on top of the DAC
> controls.
> 
> This DAC/MAC split has been successful in several other security
> features. For example, the ability to map at page zero is controlled
> in DAC via the mmap_min_addr sysctl [1], and in SELinux via the
> mmap_zero access vector [2]. Similarly, access to the kernel ring
> buffer is controlled both in DAC via the dmesg_restrict sysctl [3] and
> in SELinux via the syslog_read [2] check. Indeed, the dmesg_restrict
> sysctl is very similar to this patch -- it introduces a capability
> check (CAP_SYSLOG there, CAP_SYS_PTRACE here) on access to a sensitive
> resource.
> 
> If we want to ensure that a security feature will be well tested and
> vetted, it's important to not limit its use to LSMs only. This ensures
> that kernel and application developers will always be able to test the
> effects of a security feature, without relying on LSMs like SELinux.
> It also ensures that all distributions can enable this security
> mitigation should it be necessary for their unique environments,
> without introducing an SELinux dependency. And this patch does not
> preclude an SELinux implementation should it be necessary.
> 
> Even if we decide to implement fine-grain SELinux controls on
> UFFD_USER_MODE_ONLY, we still need this patch. We shouldn't make this
> an either/or choice between SELinux and this patch. Both are
> necessary.
> 
> -- Nick
> 
> [1] https://wiki.debian.org/mmap_min_addr
> [2] https://selinuxproject.org/page/NB_ObjectClassesPermissions
> [3] https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

I am not sure I agree that this is similar to dmesg access.

Here is why: it is pretty easy for admins to know
whether they run something that needs to access the kernel ring buffer.
And if it's a tool developer poking at dmesg, they can tell admins "we
need these permissions".  But it seems impossible for either an admin or
a developer to know that a userfaultfd page, e.g. one used with shared
memory, is accessed from the kernel.

So I guess the question is: how does anyone not running Android
know to set this flag?

I get the feeling it's not really possible, and so for a single-user
feature like this a single API seems enough.  Given a choice between a
knob an admin is supposed to set and an SELinux policy written by
presumably knowledgeable OS vendors, I'd opt for the second option.

Hope this helps.

> >
> >
> > > > > >
> > > > > >
> > > > > > If answer is "no" the behavior of the new sysctl in patch 2/2 (in
> > > > > > subject) should be enforceable with minor changes to the BPF
> > > > > > assembly. Otherwise it'd require more changes.
> > >
> > > It would be good to understand what these changes are.
> > >
> > > > > > Why exactly is it preferable to enlarge the surface of attack of the
> > > > > > kernel and take the risk there is a real bug in userfaultfd code (not
> > > > > > just a facilitation of exploiting some other kernel bug) that leads to
> > > > > > > a privilege escalation, when you still break 99% of userfaultfd users
> > > > > > > if you set option "2"?
> > >
> > > I can see your point if you think about the feature as a whole.
> > > However, distributions (such as Android) have specialized knowledge of
> > > their security environments, and may not want to support the typical
> > > usages of userfaultfd. For such distributions, providing a mechanism
> > > to prevent userfaultfd from being useful as an exploit primitive,
> > > while still allowing the very limited use of userfaultfd for userspace
> > > faults only, is desirable. Distributions shouldn't be forced into
> > > supporting 100% of the use cases envisioned by userfaultfd when their
> > > needs may be more specialized, and this sysctl knob empowers
> > > distributions to make this choice for themselves.
> > >
> > > > > > > Is the system owner really going to run only CRIU postcopy live
> > > > > > > migration (which already runs with CAP_SYS_PTRACE) on his systems,
> > > > > > > and nothing else that could break?
> > >
> > > This is a great example of a capability which a distribution may not
> > > want to support, due to distribution specific security policies.
> > >
> > > > > >
> > > > > > > Option "2" looks to me like it has a single possible user, and
> > > > > > > incidentally this single user can already enforce model "2" by only
> > > > > > > tweaking its seccomp-bpf filters without applying 2/2. It'd be a bug
> > > > > > > if Android apps run unprotected by seccomp regardless of 2/2.
> > >
> > > Can you elaborate on what bug is present by processes being
> > > unprotected by seccomp?
> > >
> > > Seccomp cannot be universally applied on Android due to previously
> > > mentioned performance concerns. Seccomp is used in Android primarily
> > > as a tool to enforce the list of allowed syscalls, so that such
> > > syscalls can be audited before being included as part of the Android
> > > API.
> > >
> > > -- Nick
> > >
> > > --
> > > Nick Kralevich | nnk@google.com
> >
> 
> 
> -- 
> Nick Kralevich | nnk@google.com
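
For readers following the thread, the check that patch 2/2 is described
as adding can be modelled as a small truth table. This is only a sketch
of the behaviour as discussed above, not the kernel code: the function
name uffd_create_allowed and its parameters are hypothetical, the flag
value is assumed for illustration, and interaction with any other
sysctl is ignored.

```python
# Userspace model of the proposed policy, based on the thread's description:
# with the knob set, a caller without CAP_SYS_PTRACE may only create a
# userfaultfd that handles user-mode faults (UFFD_USER_MODE_ONLY).

UFFD_USER_MODE_ONLY = 1  # assumed flag value, for illustration only


def uffd_create_allowed(knob_enabled: bool, has_cap_sys_ptrace: bool, flags: int) -> bool:
    """Return True if this model would let the userfaultfd be created."""
    if not knob_enabled:
        # Knob off: the proposed sysctl imposes no extra restriction.
        return True
    if has_cap_sys_ptrace:
        # Privileged callers may still handle kernel-mode faults.
        return True
    # Unprivileged callers must opt in to user-mode-only fault handling.
    return bool(flags & UFFD_USER_MODE_ONLY)


if __name__ == "__main__":
    # Print every combination so the policy can be read as a table.
    for knob in (False, True):
        for cap in (False, True):
            for flags in (0, UFFD_USER_MODE_ONLY):
                allowed = uffd_create_allowed(knob, cap, flags)
                print(f"knob={knob} cap={cap} flags={flags} allowed={allowed}")
```

The only denied case in this model is an unprivileged caller asking for
kernel-fault handling while the knob is set, which is the case the
thread debates handling via a sysctl versus an SELinux policy.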



Thread overview: 29+ messages
2020-04-23  0:26 [PATCH 0/2] Control over userfaultfd kernel-fault handling Daniel Colascione
2020-04-23  0:26 ` [PATCH 1/2] Add UFFD_USER_MODE_ONLY Daniel Colascione
2020-07-24 14:28   ` Michael S. Tsirkin
2020-07-24 14:46     ` Lokesh Gidra
2020-07-26 10:09       ` Michael S. Tsirkin
2020-04-23  0:26 ` [PATCH 2/2] Add a new sysctl knob: unprivileged_userfaultfd_user_mode_only Daniel Colascione
2020-05-06 19:38   ` Peter Xu
2020-05-07 19:15     ` Jonathan Corbet
2020-05-20  4:06       ` Andrea Arcangeli
2020-05-08 16:52   ` Michael S. Tsirkin
2020-05-08 16:54     ` Michael S. Tsirkin
2020-05-20  4:59       ` Andrea Arcangeli
2020-05-20 18:03         ` Kees Cook
2020-05-20 19:48           ` Andrea Arcangeli
2020-05-20 19:51             ` Andrea Arcangeli
2020-05-20 20:17               ` Lokesh Gidra
2020-05-20 21:16                 ` Andrea Arcangeli
2020-07-17 12:57                   ` Jeffrey Vander Stoep
2020-07-23 17:30                     ` Lokesh Gidra
2020-07-24  0:13                       ` Nick Kralevich
2020-07-24 13:40                         ` Michael S. Tsirkin
2020-08-06  0:43                           ` Nick Kralevich
2020-08-06  5:44                             ` Michael S. Tsirkin [this message]
2020-08-17 22:11                               ` Lokesh Gidra
2020-09-04  3:34                                 ` Andrea Arcangeli
2020-09-05  0:36                                   ` Lokesh Gidra
2020-09-19 18:14                                     ` Nick Kralevich
2020-07-24 14:01 ` [PATCH 0/2] Control over userfaultfd kernel-fault handling Michael S. Tsirkin
2020-07-24 14:41   ` Lokesh Gidra
