All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jann Horn <jann@thejh.net>
To: Alexander Viro <viro@zeniv.linux.org.uk>,
	Roland McGrath <roland@hack.frob.com>,
	Oleg Nesterov <oleg@redhat.com>,
	John Johansen <john.johansen@canonical.com>,
	James Morris <james.l.morris@oracle.com>,
	"Serge E. Hallyn" <serge@hallyn.com>,
	Paul Moore <aul@paul-moore.com>,
	Stephen Smalley <sds@tycho.nsa.gov>,
	Eric Paris <eparis@parisplace.org>,
	Casey Schaufler <casey@schaufler-ca.com>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Janis Danisevskis <jdanis@google.com>,
	Seth Forshee <seth.forshee@canonical.com>,
	"Eric . Biederman" <ebiederm@xmission.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Benjamin LaHaise <bcrl@kvack.org>,
	Ben Hutchings <ben@decadent.org.uk>,
	Andy Lutomirski <luto@amacapital.net>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org,
	linux-security-module@vger.kernel.org, security@kernel.org
Subject: [PATCH v2 8/8] Documentation: add security/ptrace_checks.txt
Date: Fri, 23 Sep 2016 22:40:38 +0200	[thread overview]
Message-ID: <1474663238-22134-9-git-send-email-jann@thejh.net> (raw)
In-Reply-To: <1474663238-22134-1-git-send-email-jann@thejh.net>

This includes some things suggested by Ben Hutchings and Kees Cook.

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
---
 Documentation/security/ptrace_checks.txt | 229 +++++++++++++++++++++++++++++++
 1 file changed, 229 insertions(+)
 create mode 100644 Documentation/security/ptrace_checks.txt

diff --git a/Documentation/security/ptrace_checks.txt b/Documentation/security/ptrace_checks.txt
new file mode 100644
index 0000000..98689d5
--- /dev/null
+++ b/Documentation/security/ptrace_checks.txt
@@ -0,0 +1,229 @@
+			     ====================
+			     PTRACE ACCESS CHECKS
+			     ====================
+
+By: Jann Horn <jann@thejh.net>
+
+=====
+INTRO
+=====
+
+This file describes the rules for ptrace() and ptrace()-like access to another
+task (inspecting or modifying the virtual memory of another process, inspecting
+another task's register state, inspecting another task's open files and so on).
+
+This kind of access is particularly security-sensitive because it often allows
+the caller to take complete control over the remote process. Therefore, the
+kernel must be very careful here.
+
+There are various kernel APIs that grant debugging access to other processes.
+
+
+===========================
+ORDINARY DEBUGGING SYSCALLS
+===========================
+
+The following ordinary syscalls grant debugging-like access:
+
+ - ptrace() grants R+W access to registers and virtual memory
+ - process_vm_readv() / process_vm_writev() grant R/W access to virtual memory
+ - get_robust_list() reveals an address from the target task, which would be
+   useful e.g. for an ASLR bypass
+ - perf_event_open()
+ - (kcmp() reveals a very small amount of information)
+
+These syscalls use the caller's *real* UID/GID and his *permitted* capability
+set (for historical reasons) to determine the permitted access using
+ptrace_may_access(..., PTRACE_MODE_REALCREDS | ...). This function is
+responsible for ensuring that the caller can't steal any privileges using the
+debugging access; in other words, it has to ensure that the caller's credentials
+are (more or less) a superset of the target's credentials. Namespaces aside,
+these rules work as follows:
+
+Introspection, including cross-thread introspection, is always allowed. This
+means that it is unsafe to use any of the debugging APIs on behalf of another,
+untrusted process unless it is verified somehow that the target pid does not
+actually belong to the calling process - dropping privileges is not sufficient.
+
+root (to be more precise, a process with CAP_SYS_PTRACE effective) can always
+use these debugging APIs on any process - unless a LSM denies the access.
+
+For normal users, there are more checks:
+
+ - The real uid of the caller must be equal to ruid, euid and suid of the target
+   task, same thing for the gids. All of the uids and gids of the target (apart
+   from the fs*id) are checked here because a privileged process might stash his
+   privileged uid into any one of them. (checked in __ptrace_may_access())
+   (Note: This doesn't cover the supplementary GIDs.)
+ - The permitted capabilities of the caller must be a superset of the permitted
+   capabilities of the target. (checked in cap_ptrace_access_check())
+ - The target process must be dumpable. What this means is explained in the next
+   section.
+
+
+===========
+DUMPABILITY
+===========
+
+Originally, the dumpability rule didn't exist - as long as a process had ruid,
+euid and suid set to the uid of a user (same thing with the gids), that user was
+able to gain debugging access to the process. This was problematic: It meant
+that if a setuid binary opened some important file, then dropped all privileges,
+the user was able to debug the process and steal its file descriptor - which is
+also a form of privilege! Other issues are e.g. that when a process drops its
+privileges, it might still have secrets in memory that the user isn't meant to
+see.
+
+Therefore, a process now becomes "nondumpable" when it:
+
+ - changes its euid / egid
+ - changes its fsuid / fsgid
+ - changes its permitted capability set
+
+These checks are in commit_creds().
+
+When a process is nondumpable, you need to have the CAP_SYS_PTRACE capability to
+access it - just having the same uids is not enough anymore.
+
+The most interesting part of dumpability is what happens on execve().
+(Non-)Dumpability is not inherited; instead, it is recalculated in
+setup_new_exec() and commit_creds(): A process will be nondumpable after
+execve() if its post-setuid-bit-handling euid and ruid or egid and rgid aren't
+the same. Additionally, it will be nondumpable if execve() changes its
+credentials in a way that commit_creds() triggers on (as described above).
+
+In effect, this means that normally, when a setuid/setgid binary executes
+another program, the other program is still protected by nondumpability (and
+also by secure execution).
+
+(Additionally, a process will be nondumpable if the euid doesn't have read
+access to the executed binary, which can be the case if the binary is
+executable, but not readable; however, this restriction is irrelevant from a
+security perspective because it isn't enforced consistently, neither through
+AT_SECURE nor for bprm->unsafe. In other words, you can e.g. be attached to such
+a binary via ptrace while it is executed, or you can LD_PRELOAD a library into
+it that dumps its memory. If this was fixed, it would be important to keep
+namespacing issues in mind, in particular the interaction with
+LSM_UNSAFE_PTRACE_CAP, the privileged kind of ptrace that works across setuid
+execve.)
+
+Dumpability and the secure exec flag can be affected by the following "unsafe
+execution" rules. When a process with ruid != euid executes another program and
+an unsafe execution is detected, *ITS EUID WILL BE DROPPED TO ITS RUID* by
+cap_bprm_set_creds() (unless it has CAP_SETUID effective in the init user ns).
+Unsafe execution means one of the following:
+
+ - LSM_UNSAFE_PTRACE: An unprivileged process is currently attached via ptrace.
+ - LSM_UNSAFE_NO_NEW_PRIVS: The current thread turned on NO_NEW_PRIVS.
+ - LSM_UNSAFE_SHARE: The fs struct is shared with a non-thread.
+
+
+=====================
+FILESYSTEM DEBUG APIS
+=====================
+
+The pid / tgid entries in procfs contain various entries that allow debugging
+access to a process. Interesting entries are:
+
+ - auxv permits an ASLR bypass
+ - cwd can permit bypassing filesystem restrictions in some cases
+ - environ can leak secret tokens
+ - fd can permit bypassing filesystem restrictions or leak access to things like
+   pipes
+ - maps permits an ASLR bypass
+ - mem grants R+W access to process memory
+ - stat permits an ASLR bypass
+
+Of these, all use both a normal filesystem DAC check (where the file owner is
+the process owner for a dumpable process, root for a nondumpable process) and a
+ptrace_may_access() check; however, the DAC check may be modified, and the
+ptrace_may_access() is performed under PTRACE_FSCREDS, meaning that instead of
+the caller's ruid, rgid and permitted capabilities, the fsuid, fsgid and
+effective capabilities are used, causing the case where a daemon drops its euid
+prior to accessing a file for the user to be treated correctly for this check.
+
+NEEDS ATTENTION:
+Special-case rules that permit introspection become a big problem here. In
+particular, if a setuid binary runs with dropped privileges (ruid=euid=user and
+suid=0 or so), it is still able to open the following entries under /proc/self:
+
+ - cwd (because normal DAC rules always permit symlink access, proc_pid_get_link
+   only checks for ptrace access and proc_cwd_link doesn't check anything)
+ - fd (same as cwd for the entries inside; the directory itself has an override
+   on the permission handler that makes introspection possible even if the
+   normal DAC rules would forbid it)
+ - maps (mode 0444, so DAC doesn't restrict anything)
+ - stat (mode 0444, and anyone can read from it; however, the interesting
+     pointers are only printed if the file opener has ptrace access to the
+     target process)
+
+In particular the fd directory is dangerous: Imagine a privileged process that
+creates some important pipe using pipe() with a dropped EUID and later performs
+writes with user-specified data on a file at a user-specified path with the same
+dropped EUID. An attacker could let the root-owned process reference the pipe as
+/proc/$pid/fd/3 and let the privileged process write to it.
+
+If necessary, it might make sense to introduce some "enable/disable
+introspection" prctl that can be used by userspace to disambiguate whether an
+access is intended to be introspection.
+
+
+Note that in all cases in which the DAC check permits access but a ptrace access
+check later on (in the VFS open handler or afterwards) restricts access,
+faccessat() is broken. However, that's probably not a real problem because
+faccessat() is racy anyway.
+
+
+===============
+USER NAMESPACES
+===============
+
+When ptrace_may_access() checks for CAP_SYS_PTRACE, the capability doesn't need
+to be active in the init namespace or in the current namespace; having it in the
+namespace of the ptrace target is sufficient. There are currently no further
+restrictions on this, meaning that the behavior of a task that creates or enters
+a user namespace is somewhat unintuitive:
+
+ - If a nondumpable task enters a user namespace, it will still be
+   nondumpable - but because the owner of the namespace is capable relative to
+   it when viewing the namespace from outside, this will often cause a
+   nondumpable task to effectively become dumpable because of a setns() /
+   clone() / unshare() call.
+
+ - The root user of a namespace can ptrace any process that enters the namespace
+   (via setns()) - immediately. This means that a process that wants to enter an
+   untrusted namespace from outside needs to be *very* careful to drop any
+   privileges it might have - uids, open file descriptors, secret data that's
+   still in memory, the controlling TTY and so on. (This also means that a
+   namespace owner who is unprivileged outside the namespace can't safely enter
+   the namespace if he doesn't trust the namespace root user: He needs
+   euid==ns->owner to enter the namespace, but must not have euid==ns->owner
+   after having entered the namespace. To get around this, it is necessary to
+   introduce a third namespace between the other two.)
+
+One way to make it easier for userspace to get this right might be to bind
+dumpability to namespaces, as follows: Let a task become nondumpable on
+setns() / clone(CLONE_NEWUSER) / unshare(CLONE_NEWUSER). Record the current
+namespace on dumpable -> nondumpable transition as "dumpability namespace".
+Require the caller to be privileged in the target's dumpability namespace, not
+just the target's current namespace.
+
+=============
+FUSE AND CUSE
+=============
+
+Anyone who can open /dev/cuse (root-owned, mode 0600) can arbitrarily read and
+write to the memory of various system processes (to be more precise, any process
+that opens files in /dev when they appear or so, then calls ioctl() on them; in
+particular, acpid does this) without further checks on capabilities or so.
+
+NEEDS ATTENTION:
+FUSE is more accessible than CUSE, but also more strict. ioctl() replies are
+restricted in location and size based on the ioctl command number (size) and the
+argument (location), but the ability to interactively control the replies to VFS
+method calls is still scary. Therefore, FUSE by default rejects filesystem
+access by anyone except the filesystem owner. This isn't done using the normal
+ptrace check, though; instead, fuse_allow_current_process() is used, which only
+looks at the UIDs and GIDs and therefore is a very simplified version of the
+normal ptrace access check. In particular, this provides no protection for
+setcap binaries.
-- 
2.1.4


  parent reply	other threads:[~2016-09-23 20:40 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-23 20:40 [PATCH v2 0/8] Various fixes related to ptrace_may_access() Jann Horn
2016-09-23 20:40 ` [PATCH v2 1/8] exec: introduce cred_guard_light Jann Horn
2016-09-30 15:35   ` Oleg Nesterov
2016-09-30 18:27     ` Eric W. Biederman
2016-10-03 16:02       ` Oleg Nesterov
2016-10-30 21:12     ` Jann Horn
2016-09-23 20:40 ` [PATCH v2 2/8] exec: turn self_exec_id into self_privunit Jann Horn
2016-09-23 21:04   ` Andy Lutomirski
2016-09-23 21:33     ` Jann Horn
2016-09-30 13:20   ` Oleg Nesterov
2016-09-30 13:44     ` Oleg Nesterov
2016-09-30 18:30       ` Kees Cook
2016-09-30 18:59         ` Jann Horn
2016-09-30 19:05           ` Kees Cook
2016-10-03 16:37         ` Oleg Nesterov
2016-09-23 20:40 ` [PATCH v2 3/8] proc: use open()-time creds for ptrace checks Jann Horn
2016-09-23 20:40 ` [PATCH v2 4/8] futex: don't leak robust_list pointer Jann Horn
2016-09-30 14:52   ` Oleg Nesterov
2016-10-30 17:16     ` Jann Horn
2016-11-02 21:39       ` Jann Horn
2016-11-02 22:47         ` Jann Horn
2016-09-23 20:40 ` [PATCH v2 5/8] proc: lock properly in ptrace_may_access callers Jann Horn
2016-09-23 20:40 ` [PATCH v2 6/8] ptrace: warn on ptrace_may_access without proper locking Jann Horn
2016-09-23 20:40 ` [PATCH v2 7/8] fs/proc: fix attr access check Jann Horn
2016-09-23 20:40 ` Jann Horn [this message]
2016-10-02  3:16   ` [PATCH v2 8/8] Documentation: add security/ptrace_checks.txt Krister Johansen
2016-10-30 19:09     ` Jann Horn
2016-10-31  4:14       ` Eric W. Biederman
2016-10-31 13:39         ` Jann Horn
2016-11-03 20:43         ` Krister Johansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1474663238-22134-9-git-send-email-jann@thejh.net \
    --to=jann@thejh.net \
    --cc=akpm@linux-foundation.org \
    --cc=aul@paul-moore.com \
    --cc=bcrl@kvack.org \
    --cc=ben@decadent.org.uk \
    --cc=casey@schaufler-ca.com \
    --cc=ebiederm@xmission.com \
    --cc=eparis@parisplace.org \
    --cc=james.l.morris@oracle.com \
    --cc=jdanis@google.com \
    --cc=john.johansen@canonical.com \
    --cc=keescook@chromium.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=oleg@redhat.com \
    --cc=roland@hack.frob.com \
    --cc=sds@tycho.nsa.gov \
    --cc=security@kernel.org \
    --cc=serge@hallyn.com \
    --cc=seth.forshee@canonical.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.