linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] Various fixes related to ptrace_may_access()
@ 2016-09-18 15:05 Jann Horn
  2016-09-18 15:05 ` [PATCH 1/9] exec: introduce cred_guard_light Jann Horn
                   ` (8 more replies)
  0 siblings, 9 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

This series has a bunch of loosely-related fixes for minor security bugs. Since
the bugs are minor and the patches aren't trivial, I'm sending it publicly.

The reason I'm bundling these patches up as a series instead of sending patches
one by one is that e.g. patch 2 creates some common infrastructure that multiple
other patches depend on.

For specific information about what the purpose of this series is, please see
the individual commits - but the general theme is:

 - get rid of races that can leak things like userspace addresses during setuid
   execve()
 - get rid of procfs files that cause unexpected behavior when passed around
 - add warnings to keep developers from creating more issues like this
 - document access checks

I hope I split the patches up sufficiently?

(To people on the security ML: This is a reworked version of my old ptrace fixes
series.)

Jann Horn (9):
  exec: introduce cred_guard_light
  exec: turn self_exec_id into self_privunit_id
  proc: use open()-time creds for ptrace checks
  futex: don't leak robust_list pointer
  proc: lock properly in ptrace_may_access callers
  ptrace: warn on ptrace_may_access without proper locking
  ptrace: forbid ptrace checks against current_cred() from VFS context
  fs/proc: fix attr access check
  Documentation: add security/ptrace_checks.txt

 Documentation/security/ptrace_checks.txt | 229 +++++++++++++++++++++++++++++++
 fs/aio.c                                 |   2 +
 fs/exec.c                                |  36 ++++-
 fs/proc/array.c                          |  10 +-
 fs/proc/base.c                           | 204 ++++++++++++++++++++-------
 fs/proc/internal.h                       |  14 ++
 fs/proc/namespaces.c                     |  14 ++
 fs/read_write.c                          |  12 ++
 fs/splice.c                              |  12 +-
 include/linux/binfmts.h                  |   1 +
 include/linux/init_task.h                |   1 +
 include/linux/lsm_hooks.h                |   3 +-
 include/linux/ptrace.h                   |   5 +
 include/linux/sched.h                    |  18 ++-
 include/linux/security.h                 |  10 +-
 kernel/fork.c                            |   6 +-
 kernel/futex.c                           |  31 +++--
 kernel/futex_compat.c                    |  31 +++--
 kernel/ptrace.c                          |  60 ++++++--
 kernel/signal.c                          |   2 +-
 security/apparmor/include/ipc.h          |   2 +-
 security/apparmor/ipc.c                  |   4 +-
 security/apparmor/lsm.c                  |  14 +-
 security/commoncap.c                     |   8 +-
 security/security.c                      |   5 +-
 security/selinux/hooks.c                 |   4 +-
 security/smack/smack_lsm.c               |  18 ++-
 security/yama/yama_lsm.c                 |   9 +-
 28 files changed, 646 insertions(+), 119 deletions(-)
 create mode 100644 Documentation/security/ptrace_checks.txt

-- 
2.1.4


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/9] exec: introduce cred_guard_light
  2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
@ 2016-09-18 15:05 ` Jann Horn
  2016-09-18 15:05 ` [PATCH 2/9] exec: turn self_exec_id into self_privunit_id Jann Horn
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

This is a new per-threadgroup lock that can often be taken instead of
cred_guard_mutex and has less deadlock potential. I'm doing this because
Oleg Nesterov mentioned the potential for deadlocks, in particular if a
debugged task is stuck in execve, trying to get rid of a ptrace-stopped
thread, and the debugger attempts to inspect procfs files of the debugged
task.

The binfmt handlers (in particular for elf_fdpic and flat) might still
call VFS read and mmap operations on the binary with the lock held, but
not open operations (as is the case with cred_guard_mutex).

An rwlock would be more appropriate here, but apparently those don't
have _killable variants of the locking functions?

This is a preparation patch for using proper locking in more places.

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jann Horn <jann@thejh.net>
---
 fs/exec.c                 | 15 ++++++++++++++-
 include/linux/init_task.h |  1 +
 include/linux/sched.h     | 10 ++++++++++
 kernel/fork.c             |  1 +
 kernel/ptrace.c           | 10 ++++++++++
 5 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/fs/exec.c b/fs/exec.c
index 6fcfb3f..84430ee 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1238,6 +1238,10 @@ int flush_old_exec(struct linux_binprm * bprm)
 	if (retval)
 		goto out;
 
+	retval = mutex_lock_killable(&current->signal->cred_guard_light);
+	if (retval)
+		goto out;
+
 	/*
 	 * Must be called _before_ exec_mmap() as bprm->mm is
 	 * not visibile until then. This also enables the update
@@ -1251,7 +1255,7 @@ int flush_old_exec(struct linux_binprm * bprm)
 	acct_arg_size(bprm, 0);
 	retval = exec_mmap(bprm->mm);
 	if (retval)
-		goto out;
+		goto out_unlock;
 
 	bprm->mm = NULL;		/* We're using it now */
 
@@ -1263,6 +1267,8 @@ int flush_old_exec(struct linux_binprm * bprm)
 
 	return 0;
 
+out_unlock:
+	mutex_unlock(&current->signal->cred_guard_light);
 out:
 	return retval;
 }
@@ -1386,6 +1392,7 @@ void install_exec_creds(struct linux_binprm *bprm)
 	 * credentials; any time after this it may be unlocked.
 	 */
 	security_bprm_committed_creds(bprm);
+	mutex_unlock(&current->signal->cred_guard_light);
 	mutex_unlock(&current->signal->cred_guard_mutex);
 }
 EXPORT_SYMBOL(install_exec_creds);
@@ -1753,6 +1760,12 @@ static int do_execveat_common(int fd, struct filename *filename,
 	return retval;
 
 out:
+	if (!bprm->mm && bprm->cred) {
+		/* failure after flush_old_exec(), but before
+		 * install_exec_creds()
+		 */
+		mutex_unlock(&current->signal->cred_guard_light);
+	}
 	if (bprm->mm) {
 		acct_arg_size(bprm, 0);
 		mmput(bprm->mm);
diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f8834f8..cd9faa0 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -58,6 +58,7 @@ extern struct fs_struct init_fs;
 	INIT_PREV_CPUTIME(sig)						\
 	.cred_guard_mutex =						\
 		 __MUTEX_INITIALIZER(sig.cred_guard_mutex),		\
+	.cred_guard_light = __MUTEX_INITIALIZER(sig.cred_guard_light)	\
 }
 
 extern struct nsproxy init_nsproxy;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 62c68e5..2a1df2f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -808,6 +808,16 @@ struct signal_struct {
 	struct mutex cred_guard_mutex;	/* guard against foreign influences on
 					 * credential calculations
 					 * (notably. ptrace) */
+	/*
+	 * Lightweight version of cred_guard_mutex; used to prevent race
+	 * conditions where a user can gain information about the post-execve
+	 * state of a task to which access should only be granted pre-execve.
+	 * Hold this mutex while performing remote task inspection associated
+	 * with a security check.
+	 * This mutex MUST NOT be used in cases where anything changes about
+	 * the security properties of a running execve().
+	 */
+	struct mutex cred_guard_light;
 };
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index beb3172..2d46f3a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1215,6 +1215,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
 				   current->signal->is_child_subreaper;
 
 	mutex_init(&sig->cred_guard_mutex);
+	mutex_init(&sig->cred_guard_light);
 
 	return 0;
 }
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 1d3b766..b5120ec 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -283,6 +283,16 @@ ok:
 	return security_ptrace_access_check(task, mode);
 }
 
+/*
+ * NOTE: When you call this function, you need to ensure that the target task
+ * can't acquire (via setuid execve) credentials between the ptrace access
+ * check and the privileged access. The recommended way to do this is to hold
+ * one of task->signal->{cred_guard_mutex,cred_guard_light} while calling this
+ * function and performing the requested access.
+ *
+ * This function may only be used if access is requested in the name of
+ * current_cred().
+ */
 bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 {
 	int err;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/9] exec: turn self_exec_id into self_privunit_id
  2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
  2016-09-18 15:05 ` [PATCH 1/9] exec: introduce cred_guard_light Jann Horn
@ 2016-09-18 15:05 ` Jann Horn
  2016-09-18 18:13   ` Ben Hutchings
  2016-09-18 15:05 ` [PATCH 3/9] proc: use open()-time creds for ptrace checks Jann Horn
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

This ensures that self_privunit_id ("privilege unit ID") is only shared by
processes that share the mm_struct and the signal_struct; not just
spatially, but also temporally. In other words, if you do execve() or
clone() without CLONE_THREAD, you get a new privunit_id that has never been
used before.

One reason for doing this is that it prevents an attacker from sending an
arbitrary signal to a parent process after performing 2^32-1 execve()
calls.

The second reason for this is that it permits using the self_exec_id in
a later patch to check during a ptrace access whether subject and object
are temporally and spatially equal for privilege checking purposes.

This patch was grabbed from grsecurity and modified. Credit for the
original patch goes to Brad Spengler <spender@grsecurity.net>.

Signed-off-by: Jann Horn <jann@thejh.net>
---
 fs/exec.c               | 21 ++++++++++++++++++++-
 include/linux/binfmts.h |  1 +
 include/linux/sched.h   |  4 ++--
 kernel/fork.c           |  5 +++--
 kernel/signal.c         |  2 +-
 5 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 84430ee..1a15cb0 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1281,6 +1281,25 @@ void would_dump(struct linux_binprm *bprm, struct file *file)
 }
 EXPORT_SYMBOL(would_dump);
 
+static DEFINE_PER_CPU(u64, exec_counter);
+static int __init init_exec_counters(void)
+{
+	unsigned int cpu;
+
+	for_each_possible_cpu(cpu) {
+		per_cpu(exec_counter, cpu) = (u64)cpu;
+	}
+
+	return 0;
+}
+early_initcall(init_exec_counters);
+
+void increment_privunit_counter(void)
+{
+	BUILD_BUG_ON(NR_CPUS > (1 << 16));
+	current->self_privunit_id = this_cpu_add_return(exec_counter, NR_CPUS);
+}
+
 void setup_new_exec(struct linux_binprm * bprm)
 {
 	arch_pick_mmap_layout(current->mm);
@@ -1314,7 +1333,7 @@ void setup_new_exec(struct linux_binprm * bprm)
 
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
-	current->self_exec_id++;
+	increment_privunit_counter();
 	flush_signal_handlers(current, 0);
 	do_close_on_exec(current->files);
 }
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index 1303b57..9570bd0 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -100,6 +100,7 @@ extern int prepare_binprm(struct linux_binprm *);
 extern int __must_check remove_arg_zero(struct linux_binprm *);
 extern int search_binary_handler(struct linux_binprm *);
 extern int flush_old_exec(struct linux_binprm * bprm);
+extern void increment_privunit_counter(void);
 extern void setup_new_exec(struct linux_binprm * bprm);
 extern void would_dump(struct linux_binprm *, struct file *);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2a1df2f..e4bf894 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1688,8 +1688,8 @@ struct task_struct {
 	struct seccomp seccomp;
 
 /* Thread group tracking */
-   	u32 parent_exec_id;
-   	u32 self_exec_id;
+	u64 parent_privunit_id;
+	u64 self_privunit_id;
 /* Protection of (de-)allocation: mm, files, fs, tty, keyrings, mems_allowed,
  * mempolicy */
 	spinlock_t alloc_lock;
diff --git a/kernel/fork.c b/kernel/fork.c
index 2d46f3a..537c117 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1567,6 +1567,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 			p->exit_signal = (clone_flags & CSIGNAL);
 		p->group_leader = p;
 		p->tgid = p->pid;
+		increment_privunit_counter();
 	}
 
 	p->nr_dirtied = 0;
@@ -1597,10 +1598,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	/* CLONE_PARENT re-uses the old parent */
 	if (clone_flags & (CLONE_PARENT|CLONE_THREAD)) {
 		p->real_parent = current->real_parent;
-		p->parent_exec_id = current->parent_exec_id;
+		p->parent_privunit_id = current->parent_privunit_id;
 	} else {
 		p->real_parent = current;
-		p->parent_exec_id = current->self_exec_id;
+		p->parent_privunit_id = current->self_privunit_id;
 	}
 
 	spin_lock(&current->sighand->siglock);
diff --git a/kernel/signal.c b/kernel/signal.c
index af21afc..e4e3e1b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1590,7 +1590,7 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 		 * This is only possible if parent == real_parent.
 		 * Check if it has changed security domain.
 		 */
-		if (tsk->parent_exec_id != tsk->parent->self_exec_id)
+		if (tsk->parent_privunit_id != tsk->parent->self_privunit_id)
 			sig = SIGCHLD;
 	}
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/9] proc: use open()-time creds for ptrace checks
  2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
  2016-09-18 15:05 ` [PATCH 1/9] exec: introduce cred_guard_light Jann Horn
  2016-09-18 15:05 ` [PATCH 2/9] exec: turn self_exec_id into self_privunit_id Jann Horn
@ 2016-09-18 15:05 ` Jann Horn
  2016-09-19 13:01   ` Stephen Smalley
  2016-09-18 15:05 ` [PATCH 4/9] futex: don't leak robust_list pointer Jann Horn
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

This adds a new ptrace_may_access_noncurrent() method that
uses the supplied credentials instead of those of the
current task. (However, the current task may still be
inspected for auditing purposes, e.g. by the Smack LSM.)

procfs used the caller's creds for a few ptrace_may_access()
checks at read() time, which made a confused deputy attack
by passing an FD to a procfs file to a setuid program
possible. Therefore, the following was a local userspace
ASLR bypass:

rm -f /tmp/foobar
procmail <(echo 'DEFAULT=/tmp/foobar') </proc/1/stat
(
  echo 'obase=16'
  cut -d' ' -f26-30,45-51 </tmp/foobar | tr ' ' '\n'
) | bc
rm /tmp/foobar

procmail is installed setuid root on Debian and read()s
data from stdin before dropping privs, so the
ptrace_may_access() check in the VFS read handler of
/proc/1/stat passes. Procmail then dumps the read data
to a user-accessible file (/tmp/foobar here).

Signed-off-by: Jann Horn <jann@thejh.net>
---
 fs/proc/array.c                 |  3 +-
 fs/proc/base.c                  | 63 ++++++++++++++++++++++++++++++++++-------
 fs/proc/internal.h              | 14 +++++++++
 include/linux/lsm_hooks.h       |  3 +-
 include/linux/ptrace.h          |  5 ++++
 include/linux/security.h        | 10 ++++---
 kernel/ptrace.c                 | 40 +++++++++++++++++++-------
 security/apparmor/include/ipc.h |  2 +-
 security/apparmor/ipc.c         |  4 +--
 security/apparmor/lsm.c         | 14 +++++++--
 security/commoncap.c            |  8 +++---
 security/security.c             |  5 ++--
 security/selinux/hooks.c        |  4 +--
 security/smack/smack_lsm.c      | 18 ++++++++----
 security/yama/yama_lsm.c        |  9 ++++--
 15 files changed, 150 insertions(+), 52 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 88c7de1..3349742 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -413,7 +413,8 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 
 	state = *get_task_state(task);
 	vsize = eip = esp = 0;
-	permitted = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS | PTRACE_MODE_NOAUDIT);
+	permitted = proc_ptrace_may_access_seq(task,
+			PTRACE_MODE_READ_FSCREDS | PTRACE_MODE_NOAUDIT, m);
 	mm = get_task_mm(task);
 	if (mm) {
 		vsize = task_vsize(mm);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index ac0df4d..771199b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -430,8 +430,9 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 
 	wchan = get_wchan(task);
 
-	if (wchan && ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)
-			&& !lookup_symbol_name(wchan, symname))
+	if (wchan &&
+	    proc_ptrace_may_access_seq(task, PTRACE_MODE_READ_FSCREDS, m) &&
+	    !lookup_symbol_name(wchan, symname))
 		seq_printf(m, "%s", symname);
 	else
 		seq_putc(m, '0');
@@ -440,12 +441,12 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 }
 #endif /* CONFIG_KALLSYMS */
 
-static int lock_trace(struct task_struct *task)
+static int lock_trace(struct seq_file *m, struct task_struct *task)
 {
 	int err = mutex_lock_killable(&task->signal->cred_guard_mutex);
 	if (err)
 		return err;
-	if (!ptrace_may_access(task, PTRACE_MODE_ATTACH_FSCREDS)) {
+	if (!proc_ptrace_may_access_seq(task, PTRACE_MODE_ATTACH_FSCREDS, m)) {
 		mutex_unlock(&task->signal->cred_guard_mutex);
 		return -EPERM;
 	}
@@ -478,7 +479,7 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
 	trace.entries		= entries;
 	trace.skip		= 0;
 
-	err = lock_trace(task);
+	err = lock_trace(m, task);
 	if (!err) {
 		save_stack_trace_tsk(task, &trace);
 
@@ -660,7 +661,7 @@ static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long args[6], sp, pc;
 	int res;
 
-	res = lock_trace(task);
+	res = lock_trace(m, task);
 	if (res)
 		return res;
 
@@ -770,7 +771,7 @@ static const struct inode_operations proc_def_inode_operations = {
 
 static int proc_single_show(struct seq_file *m, void *v)
 {
-	struct inode *inode = m->private;
+	struct inode *inode = m->file->f_inode;
 	struct pid_namespace *ns;
 	struct pid *pid;
 	struct task_struct *task;
@@ -790,14 +791,35 @@ static int proc_single_show(struct seq_file *m, void *v)
 
 static int proc_single_open(struct inode *inode, struct file *filp)
 {
-	return single_open(filp, proc_single_show, inode);
+	int ret;
+	u64 *privunit_id = kmalloc(sizeof(u64), GFP_KERNEL);
+
+	if (privunit_id == NULL)
+		return -ENOMEM;
+	*privunit_id = current->self_privunit_id;
+
+	ret = single_open(filp, proc_single_show, privunit_id);
+
+	if (ret)
+		kfree(privunit_id);
+	return ret;
+}
+
+static int proc_single_release(struct inode *inode, struct file *filp)
+{
+	struct seq_file *seq = filp->private_data;
+	u64 *privunit_id = seq->private;
+
+	kfree(privunit_id);
+
+	return single_release(inode, filp);
 }
 
 static const struct file_operations proc_single_file_operations = {
 	.open		= proc_single_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= single_release,
+	.release	= proc_single_release,
 };
 
 
@@ -2157,10 +2179,29 @@ out:
 	return ret;
 }
 
+int proc_open_get_privunit_id(struct inode *inode, struct file *filp)
+{
+	u64 *privunit_id = kmalloc(sizeof(u64), GFP_KERNEL);
+
+	if (privunit_id == NULL)
+		return -ENOMEM;
+	*privunit_id = current->self_privunit_id;
+	filp->private_data = privunit_id;
+	return 0;
+}
+
+int proc_release_put_privunit_id(struct inode *inode, struct file *filp)
+{
+	kfree((u64 *)filp->private_data);
+	return 0;
+}
+
 static const struct file_operations proc_map_files_operations = {
 	.read		= generic_read_dir,
 	.iterate_shared	= proc_map_files_readdir,
 	.llseek		= generic_file_llseek,
+	.open		= proc_open_get_privunit_id,
+	.release	= proc_release_put_privunit_id,
 };
 
 #ifdef CONFIG_CHECKPOINT_RESTORE
@@ -2614,7 +2655,7 @@ static int do_io_accounting(struct task_struct *task, struct seq_file *m, int wh
 	if (result)
 		return result;
 
-	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
+	if (!proc_ptrace_may_access_seq(task, PTRACE_MODE_READ_FSCREDS, m)) {
 		result = -EACCES;
 		goto out_unlock;
 	}
@@ -2798,7 +2839,7 @@ static const struct file_operations proc_setgroups_operations = {
 static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
 				struct pid *pid, struct task_struct *task)
 {
-	int err = lock_trace(task);
+	int err = lock_trace(m, task);
 	if (!err) {
 		seq_printf(m, "%08x\n", task->personality);
 		unlock_trace(task);
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 7931c55..9498879 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -14,6 +14,9 @@
 #include <linux/spinlock.h>
 #include <linux/atomic.h>
 #include <linux/binfmts.h>
+#include <linux/ptrace.h>
+#include <linux/pid.h>
+#include <linux/seq_file.h>
 
 struct ctl_table_header;
 struct mempolicy;
@@ -134,6 +137,15 @@ out:
 	return ~0U;
 }
 
+static inline bool proc_ptrace_may_access_seq(struct task_struct *task,
+			unsigned int mode, struct seq_file *m)
+{
+	const u64 *privunit_id = m->private;
+
+	return ptrace_may_access_noncurrent(task, mode, m->file->f_cred,
+					    privunit_id);
+}
+
 /*
  * Offset of the first process in the /proc root directory..
  */
@@ -168,6 +180,8 @@ extern int pid_delete_dentry(const struct dentry *);
 extern int proc_pid_readdir(struct file *, struct dir_context *);
 extern struct dentry *proc_pid_lookup(struct inode *, struct dentry *, unsigned int);
 extern loff_t mem_lseek(struct file *, loff_t, int);
+extern int proc_open_get_privunit_id(struct inode *, struct file *);
+extern int proc_release_put_privunit_id(struct inode *, struct file *);
 
 /* Lookups */
 typedef int instantiate_t(struct inode *, struct dentry *,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 101bf19..f33ef61 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1147,6 +1147,7 @@
  *	attributes would be changed by the execve.
  *	@child contains the task_struct structure for the target process.
  *	@mode contains the PTRACE_MODE flags indicating the form of access.
+ *	@cred contains the caller's credentials
  *	Return 0 if permission is granted.
  * @ptrace_traceme:
  *	Check that the @parent process has sufficient permission to trace the
@@ -1315,7 +1316,7 @@ union security_list_options {
 					struct file *file);
 
 	int (*ptrace_access_check)(struct task_struct *child,
-					unsigned int mode);
+			unsigned int mode, const struct cred *cred);
 	int (*ptrace_traceme)(struct task_struct *parent);
 	int (*capget)(struct task_struct *target, kernel_cap_t *effective,
 			kernel_cap_t *inheritable, kernel_cap_t *permitted);
diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h
index 504c98a..d48d953 100644
--- a/include/linux/ptrace.h
+++ b/include/linux/ptrace.h
@@ -66,6 +66,9 @@ extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead);
 #define PTRACE_MODE_ATTACH_FSCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_FSCREDS)
 #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS)
 
+/* ignore relationship between current and target task */
+#define PTRACE_MODE_NON_CURRENT 0x20
+
 /**
  * ptrace_may_access - check whether the caller is permitted to access
  * a target task.
@@ -81,6 +84,8 @@ extern void exit_ptrace(struct task_struct *tracer, struct list_head *dead);
  * process_vm_writev or ptrace (and should use the real credentials).
  */
 extern bool ptrace_may_access(struct task_struct *task, unsigned int mode);
+bool ptrace_may_access_noncurrent(struct task_struct *task, unsigned int mode,
+		const struct cred *cred, const u64 *privunit_id);
 
 static inline int ptrace_reparented(struct task_struct *child)
 {
diff --git a/include/linux/security.h b/include/linux/security.h
index 7831cd5..e2cca1d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -72,7 +72,8 @@ struct timezone;
 extern int cap_capable(const struct cred *cred, struct user_namespace *ns,
 		       int cap, int audit);
 extern int cap_settime(const struct timespec64 *ts, const struct timezone *tz);
-extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
+extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode,
+				   const struct cred *cred);
 extern int cap_ptrace_traceme(struct task_struct *parent);
 extern int cap_capget(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted);
 extern int cap_capset(struct cred *new, const struct cred *old,
@@ -191,7 +192,8 @@ int security_binder_transfer_binder(struct task_struct *from,
 				    struct task_struct *to);
 int security_binder_transfer_file(struct task_struct *from,
 				  struct task_struct *to, struct file *file);
-int security_ptrace_access_check(struct task_struct *child, unsigned int mode);
+int security_ptrace_access_check(struct task_struct *child, unsigned int mode,
+				 const struct cred *cred);
 int security_ptrace_traceme(struct task_struct *parent);
 int security_capget(struct task_struct *target,
 		    kernel_cap_t *effective,
@@ -414,9 +416,9 @@ static inline int security_binder_transfer_file(struct task_struct *from,
 }
 
 static inline int security_ptrace_access_check(struct task_struct *child,
-					     unsigned int mode)
+			unsigned int mode, const struct cred *cred)
 {
-	return cap_ptrace_access_check(child, mode);
+	return cap_ptrace_access_check(child, mode, cred);
 }
 
 static inline int security_ptrace_traceme(struct task_struct *parent)
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index b5120ec..dc7594a 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -206,18 +206,20 @@ static int ptrace_check_attach(struct task_struct *child, bool ignore_state)
 	return ret;
 }
 
-static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode)
+static int ptrace_has_cap(const struct cred *cred, struct user_namespace *ns,
+			  unsigned int mode)
 {
 	if (mode & PTRACE_MODE_NOAUDIT)
-		return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE);
+		return security_capable_noaudit(cred, ns, CAP_SYS_PTRACE) == 0;
 	else
-		return has_ns_capability(current, ns, CAP_SYS_PTRACE);
+		return security_capable(cred, ns, CAP_SYS_PTRACE) == 0;
 }
 
 /* Returns 0 on success, -errno on denial. */
-static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
+static int __ptrace_may_access(struct task_struct *task, unsigned int mode,
+			       const struct cred *cred, const u64 *privunit_id)
 {
-	const struct cred *cred = current_cred(), *tcred;
+	const struct cred *tcred;
 	int dumpable = 0;
 	kuid_t caller_uid;
 	kgid_t caller_gid;
@@ -237,8 +239,9 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 	 */
 
 	/* Don't let security modules deny introspection */
-	if (same_thread_group(task, current))
+	if (*privunit_id == task->self_privunit_id)
 		return 0;
+
 	rcu_read_lock();
 	if (mode & PTRACE_MODE_FSCREDS) {
 		caller_uid = cred->fsuid;
@@ -263,7 +266,7 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode)
 	    gid_eq(caller_gid, tcred->sgid) &&
 	    gid_eq(caller_gid, tcred->gid))
 		goto ok;
-	if (ptrace_has_cap(tcred->user_ns, mode))
+	if (ptrace_has_cap(cred, tcred->user_ns, mode))
 		goto ok;
 	rcu_read_unlock();
 	return -EPERM;
@@ -274,13 +277,13 @@ ok:
 		dumpable = get_dumpable(task->mm);
 	rcu_read_lock();
 	if (dumpable != SUID_DUMP_USER &&
-	    !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
+	    !ptrace_has_cap(cred, __task_cred(task)->user_ns, mode)) {
 		rcu_read_unlock();
 		return -EPERM;
 	}
 	rcu_read_unlock();
 
-	return security_ptrace_access_check(task, mode);
+	return security_ptrace_access_check(task, mode, cred);
 }
 
 /*
@@ -297,7 +300,21 @@ bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 {
 	int err;
 	task_lock(task);
-	err = __ptrace_may_access(task, mode);
+	err = __ptrace_may_access(task, mode, current_cred(),
+				  &current->self_privunit_id);
+	task_unlock(task);
+	return !err;
+}
+
+bool ptrace_may_access_noncurrent(struct task_struct *task, unsigned int mode,
+				  const struct cred *cred,
+				  const u64 *privunit_id)
+{
+	int err;
+
+	task_lock(task);
+	err = __ptrace_may_access(task, mode | PTRACE_MODE_NON_CURRENT, cred,
+				  privunit_id);
 	task_unlock(task);
 	return !err;
 }
@@ -338,7 +355,8 @@ static int ptrace_attach(struct task_struct *task, long request,
 		goto out;
 
 	task_lock(task);
-	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
+	retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS,
+	    current_cred(), &current->self_privunit_id);
 	task_unlock(task);
 	if (retval)
 		goto unlock_creds;
diff --git a/security/apparmor/include/ipc.h b/security/apparmor/include/ipc.h
index 288ca76..eb685c1 100644
--- a/security/apparmor/include/ipc.h
+++ b/security/apparmor/include/ipc.h
@@ -22,7 +22,7 @@ struct aa_profile;
 int aa_may_ptrace(struct aa_profile *tracer, struct aa_profile *tracee,
 		  unsigned int mode);
 
-int aa_ptrace(struct task_struct *tracer, struct task_struct *tracee,
+int aa_ptrace(struct aa_profile *tracer_p, struct task_struct *tracee,
 	      unsigned int mode);
 
 #endif /* __AA_IPC_H */
diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
index 777ac1c..c60b374 100644
--- a/security/apparmor/ipc.c
+++ b/security/apparmor/ipc.c
@@ -82,7 +82,7 @@ int aa_may_ptrace(struct aa_profile *tracer, struct aa_profile *tracee,
  *
  * Returns: %0 else error code if permission denied or error
  */
-int aa_ptrace(struct task_struct *tracer, struct task_struct *tracee,
+int aa_ptrace(struct aa_profile *tracer_p, struct task_struct *tracee,
 	      unsigned int mode)
 {
 	/*
@@ -94,7 +94,6 @@ int aa_ptrace(struct task_struct *tracer, struct task_struct *tracee,
 	 *       - tracer profile has CAP_SYS_PTRACE
 	 */
 
-	struct aa_profile *tracer_p = aa_get_task_profile(tracer);
 	int error = 0;
 
 	if (!unconfined(tracer_p)) {
@@ -105,7 +104,6 @@ int aa_ptrace(struct task_struct *tracer, struct task_struct *tracee,
 
 		aa_put_profile(tracee_p);
 	}
-	aa_put_profile(tracer_p);
 
 	return error;
 }
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 41b8cb1..dd94df2 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -94,14 +94,22 @@ static void apparmor_cred_transfer(struct cred *new, const struct cred *old)
 }
 
 static int apparmor_ptrace_access_check(struct task_struct *child,
-					unsigned int mode)
+				 unsigned int mode, const struct cred *cred)
 {
-	return aa_ptrace(current, child, mode);
+	struct aa_profile *tracer_p = aa_get_profile(aa_cred_profile(cred));
+	int res = aa_ptrace(tracer_p, child, mode);
+
+	aa_put_profile(tracer_p);
+	return res;
 }
 
 static int apparmor_ptrace_traceme(struct task_struct *parent)
 {
-	return aa_ptrace(parent, current, PTRACE_MODE_ATTACH);
+	struct aa_profile *tracer_p = aa_get_task_profile(parent);
+	int res = aa_ptrace(tracer_p, current, PTRACE_MODE_ATTACH);
+
+	aa_put_profile(tracer_p);
+	return res;
 }
 
 /* Derived from security/commoncap.c:cap_capget */
diff --git a/security/commoncap.c b/security/commoncap.c
index 14540bd..fe0a73a 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -133,14 +133,14 @@ int cap_settime(const struct timespec64 *ts, const struct timezone *tz)
  * Determine whether a process may access another, returning 0 if permission
  * granted, -ve if denied.
  */
-int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
+int cap_ptrace_access_check(struct task_struct *child, unsigned int mode,
+			    const struct cred *cred)
 {
 	int ret = 0;
-	const struct cred *cred, *child_cred;
+	const struct cred *child_cred;
 	const kernel_cap_t *caller_caps;
 
 	rcu_read_lock();
-	cred = current_cred();
 	child_cred = __task_cred(child);
 	if (mode & PTRACE_MODE_FSCREDS)
 		caller_caps = &cred->cap_effective;
@@ -149,7 +149,7 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
 	if (cred->user_ns == child_cred->user_ns &&
 	    cap_issubset(child_cred->cap_permitted, *caller_caps))
 		goto out;
-	if (ns_capable(child_cred->user_ns, CAP_SYS_PTRACE))
+	if (security_capable(cred, child_cred->user_ns, CAP_SYS_PTRACE) == 0)
 		goto out;
 	ret = -EPERM;
 out:
diff --git a/security/security.c b/security/security.c
index 4838e7f..70c021a 100644
--- a/security/security.c
+++ b/security/security.c
@@ -154,9 +154,10 @@ int security_binder_transfer_file(struct task_struct *from,
 	return call_int_hook(binder_transfer_file, 0, from, to, file);
 }
 
-int security_ptrace_access_check(struct task_struct *child, unsigned int mode)
+int security_ptrace_access_check(struct task_struct *child, unsigned int mode,
+				 const struct cred *cred)
 {
-	return call_int_hook(ptrace_access_check, 0, child, mode);
+	return call_int_hook(ptrace_access_check, 0, child, mode, cred);
 }
 
 int security_ptrace_traceme(struct task_struct *parent)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 13185a6..f9a0be7 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2133,10 +2133,10 @@ static int selinux_binder_transfer_file(struct task_struct *from,
 }
 
 static int selinux_ptrace_access_check(struct task_struct *child,
-				     unsigned int mode)
+				     unsigned int mode, const struct cred *cred)
 {
 	if (mode & PTRACE_MODE_READ) {
-		u32 sid = current_sid();
+		u32 sid = cred_sid(cred);
 		u32 csid = task_sid(child);
 		return avc_has_perm(sid, csid, SECCLASS_FILE, FILE__READ, NULL);
 	}
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 87a9741..7616742 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -408,7 +408,8 @@ static inline unsigned int smk_ptrace_mode(unsigned int mode)
 
 /**
  * smk_ptrace_rule_check - helper for ptrace access
- * @tracer: tracer process
+ * @tracer: tracer process - only used for auditing if @tracer_cred is set
+ * @tracer_cred: creds for checks, may be different from @tracer's creds
  * @tracee_known: label entry of the process that's about to be traced
  * @mode: ptrace attachment mode (PTRACE_MODE_*)
  * @func: name of the function that called us, used for audit
@@ -416,6 +417,7 @@ static inline unsigned int smk_ptrace_mode(unsigned int mode)
  * Returns 0 on access granted, -error on error
  */
 static int smk_ptrace_rule_check(struct task_struct *tracer,
+				 const struct cred *tracer_cred,
 				 struct smack_known *tracee_known,
 				 unsigned int mode, const char *func)
 {
@@ -431,7 +433,9 @@ static int smk_ptrace_rule_check(struct task_struct *tracer,
 	}
 
 	rcu_read_lock();
-	tsp = __task_cred(tracer)->security;
+	if (tracer_cred == NULL)
+		tracer_cred = __task_cred(tracer);
+	tsp = tracer_cred->security;
 	tracer_known = smk_of_task(tsp);
 
 	if ((mode & PTRACE_MODE_ATTACH) &&
@@ -476,13 +480,14 @@ static int smk_ptrace_rule_check(struct task_struct *tracer,
  *
  * Do the capability checks.
  */
-static int smack_ptrace_access_check(struct task_struct *ctp, unsigned int mode)
+static int smack_ptrace_access_check(struct task_struct *ctp, unsigned int mode,
+				     const struct cred *cred)
 {
 	struct smack_known *skp;
 
 	skp = smk_of_task_struct(ctp);
 
-	return smk_ptrace_rule_check(current, skp, mode, __func__);
+	return smk_ptrace_rule_check(current, cred, skp, mode, __func__);
 }
 
 /**
@@ -500,7 +505,8 @@ static int smack_ptrace_traceme(struct task_struct *ptp)
 
 	skp = smk_of_task(current_security());
 
-	rc = smk_ptrace_rule_check(ptp, skp, PTRACE_MODE_ATTACH, __func__);
+	rc = smk_ptrace_rule_check(ptp, NULL, skp, PTRACE_MODE_ATTACH,
+				   __func__);
 	return rc;
 }
 
@@ -941,7 +947,7 @@ static int smack_bprm_set_creds(struct linux_binprm *bprm)
 		rcu_read_lock();
 		tracer = ptrace_parent(current);
 		if (likely(tracer != NULL))
-			rc = smk_ptrace_rule_check(tracer,
+			rc = smk_ptrace_rule_check(tracer, NULL,
 						   isp->smk_task,
 						   PTRACE_MODE_ATTACH,
 						   __func__);
diff --git a/security/yama/yama_lsm.c b/security/yama/yama_lsm.c
index 0309f21..32a2962 100644
--- a/security/yama/yama_lsm.c
+++ b/security/yama/yama_lsm.c
@@ -347,7 +347,7 @@ static int ptracer_exception_found(struct task_struct *tracer,
  * Returns 0 if following the ptrace is allowed, -ve on error.
  */
 static int yama_ptrace_access_check(struct task_struct *child,
-				    unsigned int mode)
+				    unsigned int mode, const struct cred *cred)
 {
 	int rc = 0;
 
@@ -359,7 +359,9 @@ static int yama_ptrace_access_check(struct task_struct *child,
 			break;
 		case YAMA_SCOPE_RELATIONAL:
 			rcu_read_lock();
-			if (!task_is_descendant(current, child) &&
+			/* fail open for some procfs-based ATTACH accesses! */
+			if ((mode & PTRACE_MODE_NON_CURRENT) == 0 &&
+			    !task_is_descendant(current, child) &&
 			    !ptracer_exception_found(current, child) &&
 			    !ns_capable(__task_cred(child)->user_ns, CAP_SYS_PTRACE))
 				rc = -EPERM;
@@ -367,7 +369,8 @@ static int yama_ptrace_access_check(struct task_struct *child,
 			break;
 		case YAMA_SCOPE_CAPABILITY:
 			rcu_read_lock();
-			if (!ns_capable(__task_cred(child)->user_ns, CAP_SYS_PTRACE))
+			if (security_capable(cred, __task_cred(child)->user_ns,
+					     CAP_SYS_PTRACE) != 0)
 				rc = -EPERM;
 			rcu_read_unlock();
 			break;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/9] futex: don't leak robust_list pointer
  2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
                   ` (2 preceding siblings ...)
  2016-09-18 15:05 ` [PATCH 3/9] proc: use open()-time creds for ptrace checks Jann Horn
@ 2016-09-18 15:05 ` Jann Horn
  2016-09-18 18:28   ` Ben Hutchings
  2016-09-18 15:05 ` [PATCH 5/9] proc: lock properly in ptrace_may_access callers Jann Horn
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

This prevents an attacker from determining the robust_list or
compat_robust_list userspace pointer of a process created by executing
a setuid binary. Such an attack could be performed by racing
get_robust_list() with a setuid execution. The impact of this issue is that
an attacker could theoretically bypass ASLR when attacking setuid binaries.

Signed-off-by: Jann Horn <jann@thejh.net>
---
 kernel/futex.c        | 31 +++++++++++++++++++++----------
 kernel/futex_compat.c | 31 +++++++++++++++++++++----------
 2 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 46cb3a3..002f056 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3007,31 +3007,42 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
 	if (!futex_cmpxchg_enabled)
 		return -ENOSYS;
 
-	rcu_read_lock();
-
-	ret = -ESRCH;
-	if (!pid)
+	if (!pid) {
 		p = current;
-	else {
+		get_task_struct(p);
+	} else {
+		rcu_read_lock();
 		p = find_task_by_vpid(pid);
-		if (!p)
-			goto err_unlock;
+		/* pin the task to permit dropping the RCU read lock before
+		 * acquiring the mutex
+		 */
+		get_task_struct(p);
+		rcu_read_unlock();
 	}
+	if (!p)
+		return -ESRCH;
+
+	ret = mutex_lock_killable(&p->signal->cred_guard_light);
+	if (ret)
+		goto err_put;
 
 	ret = -EPERM;
 	if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
 		goto err_unlock;
 
 	head = p->robust_list;
-	rcu_read_unlock();
+
+	mutex_unlock(&p->signal->cred_guard_light);
+	put_task_struct(p);
 
 	if (put_user(sizeof(*head), len_ptr))
 		return -EFAULT;
 	return put_user(head, head_ptr);
 
 err_unlock:
-	rcu_read_unlock();
-
+	mutex_unlock(&current->signal->cred_guard_light);
+err_put:
+	put_task_struct(p);
 	return ret;
 }
 
diff --git a/kernel/futex_compat.c b/kernel/futex_compat.c
index 4ae3232..241b4a9 100644
--- a/kernel/futex_compat.c
+++ b/kernel/futex_compat.c
@@ -143,31 +143,42 @@ COMPAT_SYSCALL_DEFINE3(get_robust_list, int, pid,
 	if (!futex_cmpxchg_enabled)
 		return -ENOSYS;
 
-	rcu_read_lock();
-
-	ret = -ESRCH;
-	if (!pid)
+	if (!pid) {
 		p = current;
-	else {
+		get_task_struct(p);
+	} else {
+		rcu_read_lock();
 		p = find_task_by_vpid(pid);
-		if (!p)
-			goto err_unlock;
+		/* pin the task to permit dropping the RCU read lock before
+		 * acquiring the mutex
+		 */
+		get_task_struct(p);
+		rcu_read_unlock();
 	}
+	if (!p)
+		return -ESRCH;
+
+	ret = mutex_lock_killable(&p->signal->cred_guard_light);
+	if (ret)
+		goto err_put;
 
 	ret = -EPERM;
 	if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
 		goto err_unlock;
 
 	head = p->compat_robust_list;
-	rcu_read_unlock();
+
+	mutex_unlock(&p->signal->cred_guard_light);
+	put_task_struct(p);
 
 	if (put_user(sizeof(*head), len_ptr))
 		return -EFAULT;
 	return put_user(ptr_to_compat(head), head_ptr);
 
 err_unlock:
-	rcu_read_unlock();
-
+	mutex_unlock(&current->signal->cred_guard_light);
+err_put:
+	put_task_struct(p);
 	return ret;
 }
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/9] proc: lock properly in ptrace_may_access callers
  2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
                   ` (3 preceding siblings ...)
  2016-09-18 15:05 ` [PATCH 4/9] futex: don't leak robust_list pointer Jann Horn
@ 2016-09-18 15:05 ` Jann Horn
  2016-09-18 19:15   ` Jann Horn
  2016-09-18 15:05 ` [PATCH 6/9] ptrace: warn on ptrace_may_access without proper locking Jann Horn
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

Use the new cred_guard_light to prevent information leaks
through races in procfs.

Signed-off-by: Jann Horn <jann@thejh.net>
---
 fs/proc/array.c      |   7 ++++
 fs/proc/base.c       | 114 ++++++++++++++++++++++++++++++---------------------
 fs/proc/namespaces.c |  14 +++++++
 3 files changed, 89 insertions(+), 46 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 3349742..c28f254 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -410,9 +410,15 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long rsslim = 0;
 	char tcomm[sizeof(task->comm)];
 	unsigned long flags;
+	int err;
 
 	state = *get_task_state(task);
 	vsize = eip = esp = 0;
+
+	err = mutex_lock_killable(&task->signal->cred_guard_light);
+	if (err)
+		return err;
+
 	permitted = proc_ptrace_may_access_seq(task,
 			PTRACE_MODE_READ_FSCREDS | PTRACE_MODE_NOAUDIT, m);
 	mm = get_task_mm(task);
@@ -568,6 +574,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	seq_putc(m, '\n');
 	if (mm)
 		mmput(mm);
+	mutex_unlock(&task->signal->cred_guard_light);
 	return 0;
 }
 
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 771199b..a9d271b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -135,6 +135,25 @@ struct pid_entry {
 		NULL, &proc_single_file_operations,	\
 		{ .proc_show = show } )
 
+static int lock_trace(struct seq_file *m, struct task_struct *task,
+		      unsigned int mode)
+{
+	int err = mutex_lock_killable(&task->signal->cred_guard_light);
+
+	if (err)
+		return err;
+	if (!proc_ptrace_may_access_seq(task, mode | PTRACE_MODE_FSCREDS, m)) {
+		mutex_unlock(&task->signal->cred_guard_light);
+		return -EPERM;
+	}
+	return 0;
+}
+
+static void unlock_trace(struct task_struct *task)
+{
+	mutex_unlock(&task->signal->cred_guard_light);
+}
+
 /*
  * Count the number of hardlinks for the pid_entry table, excluding the .
  * and .. links.
@@ -428,36 +447,20 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long wchan;
 	char symname[KSYM_NAME_LEN];
 
-	wchan = get_wchan(task);
-
-	if (wchan &&
-	    proc_ptrace_may_access_seq(task, PTRACE_MODE_READ_FSCREDS, m) &&
-	    !lookup_symbol_name(wchan, symname))
-		seq_printf(m, "%s", symname);
-	else
-		seq_putc(m, '0');
+	if (lock_trace(m, task, PTRACE_MODE_READ) == 0) {
+		wchan = get_wchan(task);
+		unlock_trace(task);
+		if (wchan && !lookup_symbol_name(wchan, symname)) {
+			seq_printf(m, "%s", symname);
+			return 0;
+		}
+	}
 
+	seq_putc(m, '0');
 	return 0;
 }
 #endif /* CONFIG_KALLSYMS */
 
-static int lock_trace(struct seq_file *m, struct task_struct *task)
-{
-	int err = mutex_lock_killable(&task->signal->cred_guard_mutex);
-	if (err)
-		return err;
-	if (!proc_ptrace_may_access_seq(task, PTRACE_MODE_ATTACH_FSCREDS, m)) {
-		mutex_unlock(&task->signal->cred_guard_mutex);
-		return -EPERM;
-	}
-	return 0;
-}
-
-static void unlock_trace(struct task_struct *task)
-{
-	mutex_unlock(&task->signal->cred_guard_mutex);
-}
-
 #ifdef CONFIG_STACKTRACE
 
 #define MAX_STACK_TRACE_DEPTH	64
@@ -479,7 +482,7 @@ static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns,
 	trace.entries		= entries;
 	trace.skip		= 0;
 
-	err = lock_trace(m, task);
+	err = lock_trace(m, task, PTRACE_MODE_ATTACH);
 	if (!err) {
 		save_stack_trace_tsk(task, &trace);
 
@@ -661,7 +664,7 @@ static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns,
 	unsigned long args[6], sp, pc;
 	int res;
 
-	res = lock_trace(m, task);
+	res = lock_trace(m, task, PTRACE_MODE_ATTACH);
 	if (res)
 		return res;
 
@@ -685,23 +688,38 @@ static int proc_pid_syscall(struct seq_file *m, struct pid_namespace *ns,
 /*                       Here the fs part begins                        */
 /************************************************************************/
 
-/* permission checks */
-static int proc_fd_access_allowed(struct inode *inode)
+/* permission checks.
+ * If this returns 1, you'll have to unlock_trace(*taskp) afterwards.
+ */
+static int proc_fd_access_allowed_lock(struct inode *inode,
+					 struct task_struct **taskp)
 {
 	struct task_struct *task;
 	int allowed = 0;
-	/* Allow access to a task's file descriptors if it is us or we
-	 * may use ptrace attach to the process and find out that
-	 * information.
-	 */
+
 	task = get_proc_task(inode);
-	if (task) {
-		allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
+	if (!task)
+		return 0;
+	if (mutex_lock_killable(&task->signal->cred_guard_light))
+		goto out_put;
+	allowed = ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
+	if (!allowed)
+		mutex_unlock(&task->signal->cred_guard_light);
+out_put:
+	if (allowed)
+		*taskp = task;
+	else
 		put_task_struct(task);
-	}
+
 	return allowed;
 }
 
+static void proc_fd_access_allowed_unlock(struct task_struct *task)
+{
+	mutex_unlock(&task->signal->cred_guard_light);
+	put_task_struct(task);
+}
+
 int proc_setattr(struct dentry *dentry, struct iattr *attr)
 {
 	int error;
@@ -722,6 +740,7 @@ int proc_setattr(struct dentry *dentry, struct iattr *attr)
 /*
  * May current process learn task's sched/cmdline info (for hide_pid_min=1)
  * or euid/egid (for hide_pid_min=2)?
+ * NOTE: When you call this, you must hold cred_guard_mutex or cred_guard_light.
  */
 static bool has_pid_permissions(struct pid_namespace *pid,
 				 struct task_struct *task,
@@ -1600,15 +1619,17 @@ static const char *proc_pid_get_link(struct dentry *dentry,
 {
 	struct path path;
 	int error = -EACCES;
+	struct task_struct *task;
 
 	if (!dentry)
 		return ERR_PTR(-ECHILD);
 
 	/* Are we allowed to snoop on the tasks file descriptors? */
-	if (!proc_fd_access_allowed(inode))
+	if (!proc_fd_access_allowed_lock(inode, &task))
 		goto out;
 
 	error = PROC_I(inode)->op.proc_get_link(dentry, &path);
+	proc_fd_access_allowed_unlock(task);
 	if (error)
 		goto out;
 
@@ -1647,12 +1668,14 @@ static int proc_pid_readlink(struct dentry * dentry, char __user * buffer, int b
 	int error = -EACCES;
 	struct inode *inode = d_inode(dentry);
 	struct path path;
+	struct task_struct *task;
 
 	/* Are we allowed to snoop on the tasks file descriptors? */
-	if (!proc_fd_access_allowed(inode))
+	if (!proc_fd_access_allowed_lock(inode, &task))
 		goto out;
 
 	error = PROC_I(inode)->op.proc_get_link(dentry, &path);
+	proc_fd_access_allowed_unlock(task);
 	if (error)
 		goto out;
 
@@ -2047,17 +2070,15 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
 	if (!task)
 		goto out;
 
-	result = -EACCES;
-	if (!ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS))
+	mm = mm_access(task, PTRACE_MODE_READ_FSCREDS);
+	if (IS_ERR(mm))
+		result = PTR_ERR(mm);
+	if (IS_ERR_OR_NULL(mm))
 		goto out_put_task;
 
 	result = -ENOENT;
 	if (dname_to_vma_addr(dentry, &vm_start, &vm_end))
-		goto out_put_task;
-
-	mm = get_task_mm(task);
-	if (!mm)
-		goto out_put_task;
+		goto out_put_mm;
 
 	down_read(&mm->mmap_sem);
 	vma = find_exact_vma(mm, vm_start, vm_end);
@@ -2070,6 +2091,7 @@ static struct dentry *proc_map_files_lookup(struct inode *dir,
 
 out_no_vma:
 	up_read(&mm->mmap_sem);
+out_put_mm:
 	mmput(mm);
 out_put_task:
 	put_task_struct(task);
@@ -2839,7 +2861,7 @@ static const struct file_operations proc_setgroups_operations = {
 static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
 				struct pid *pid, struct task_struct *task)
 {
-	int err = lock_trace(m, task);
+	int err = lock_trace(m, task, PTRACE_MODE_ATTACH);
 	if (!err) {
 		seq_printf(m, "%08x\n", task->personality);
 		unlock_trace(task);
diff --git a/fs/proc/namespaces.c b/fs/proc/namespaces.c
index 51b8b0a..e1246e8 100644
--- a/fs/proc/namespaces.c
+++ b/fs/proc/namespaces.c
@@ -49,11 +49,18 @@ static const char *proc_ns_get_link(struct dentry *dentry,
 	if (!task)
 		return error;
 
+	error = ERR_PTR(mutex_lock_killable(&task->signal->cred_guard_light));
+	if (error)
+		goto out_put_task;
+
+	error = ERR_PTR(-EACCES);
 	if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
 		error = ns_get_path(&ns_path, task, ns_ops);
 		if (!error)
 			nd_jump_link(&ns_path);
 	}
+	mutex_unlock(&task->signal->cred_guard_light);
+out_put_task:
 	put_task_struct(task);
 	return error;
 }
@@ -70,11 +77,18 @@ static int proc_ns_readlink(struct dentry *dentry, char __user *buffer, int bufl
 	if (!task)
 		return res;
 
+	res = mutex_lock_killable(&task->signal->cred_guard_light);
+	if (res)
+		goto out_put_task;
+
+	res = -EACCES;
 	if (ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS)) {
 		res = ns_get_name(name, sizeof(name), task, ns_ops);
 		if (res >= 0)
 			res = readlink_copy(buffer, buflen, name);
 	}
+	mutex_unlock(&task->signal->cred_guard_light);
+out_put_task:
 	put_task_struct(task);
 	return res;
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 6/9] ptrace: warn on ptrace_may_access without proper locking
  2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
                   ` (4 preceding siblings ...)
  2016-09-18 15:05 ` [PATCH 5/9] proc: lock properly in ptrace_may_access callers Jann Horn
@ 2016-09-18 15:05 ` Jann Horn
  2016-09-18 15:05 ` [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context Jann Horn
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

If neither of those locks is taken during a ptrace_may_access() call, that
is a strong indicator for a security bug where post-execve process
properties can be leaked by racing with execve().

Signed-off-by: Jann Horn <jann@thejh.net>
---
 kernel/ptrace.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index dc7594a..b311ca5 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -224,6 +224,9 @@ static int __ptrace_may_access(struct task_struct *task, unsigned int mode,
 	kuid_t caller_uid;
 	kgid_t caller_gid;
 
+	WARN_ON(!mutex_is_locked(&task->signal->cred_guard_mutex) &&
+		!mutex_is_locked(&task->signal->cred_guard_light));
+
 	if (!(mode & PTRACE_MODE_FSCREDS) == !(mode & PTRACE_MODE_REALCREDS)) {
 		WARN(1, "denying ptrace access check without PTRACE_MODE_*CREDS\n");
 		return -EPERM;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context
  2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
                   ` (5 preceding siblings ...)
  2016-09-18 15:05 ` [PATCH 6/9] ptrace: warn on ptrace_may_access without proper locking Jann Horn
@ 2016-09-18 15:05 ` Jann Horn
  2016-09-18 18:38   ` Ben Hutchings
                     ` (2 more replies)
  2016-09-18 15:05 ` [PATCH 8/9] fs/proc: fix attr access check Jann Horn
  2016-09-18 15:05 ` [PATCH 9/9] Documentation: add security/ptrace_checks.txt Jann Horn
  8 siblings, 3 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

This ensures that VFS implementations don't call ptrace_may_access() from
VFS read or write handlers. In order for file descriptor passing to have
its intended security properties, VFS read/write handlers must not do any
kind of privilege checking.

Signed-off-by: Jann Horn <jann@thejh.net>
---
 fs/aio.c              |  2 ++
 fs/read_write.c       | 12 ++++++++++++
 fs/splice.c           | 12 ++++++++++--
 include/linux/sched.h |  4 ++++
 kernel/ptrace.c       |  7 +++++++
 5 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index fb8e45b..c941885 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1454,9 +1454,11 @@ rw_common:
 
 		if (rw == WRITE)
 			file_start_write(file);
+		current->in_unprivileged_vfs++;
 
 		ret = iter_op(req, &iter);
 
+		current->in_unprivileged_vfs--;
 		if (rw == WRITE)
 			file_end_write(file);
 		kfree(iovec);
diff --git a/fs/read_write.c b/fs/read_write.c
index 66215a7..deddb93 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -472,7 +472,9 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 	if (!ret) {
 		if (count > MAX_RW_COUNT)
 			count =  MAX_RW_COUNT;
+		current->in_unprivileged_vfs++;
 		ret = __vfs_read(file, buf, count, pos);
+		current->in_unprivileged_vfs--;
 		if (ret > 0) {
 			fsnotify_access(file);
 			add_rchar(current, ret);
@@ -557,7 +559,9 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
 		if (count > MAX_RW_COUNT)
 			count =  MAX_RW_COUNT;
 		file_start_write(file);
+		current->in_unprivileged_vfs++;
 		ret = __vfs_write(file, buf, count, pos);
+		current->in_unprivileged_vfs--;
 		if (ret > 0) {
 			fsnotify_modify(file);
 			add_wchar(current, ret);
@@ -838,12 +842,14 @@ static ssize_t do_readv_writev(int type, struct file *file,
 		iter_fn = file->f_op->write_iter;
 		file_start_write(file);
 	}
+	current->in_unprivileged_vfs++;
 
 	if (iter_fn)
 		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
 	else
 		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
 
+	current->in_unprivileged_vfs--;
 	if (type != READ)
 		file_end_write(file);
 
@@ -1063,12 +1069,14 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
 		iter_fn = file->f_op->write_iter;
 		file_start_write(file);
 	}
+	current->in_unprivileged_vfs++;
 
 	if (iter_fn)
 		ret = do_iter_readv_writev(file, &iter, pos, iter_fn, flags);
 	else
 		ret = do_loop_readv_writev(file, &iter, pos, fn, flags);
 
+	current->in_unprivileged_vfs--;
 	if (type != READ)
 		file_end_write(file);
 
@@ -1369,7 +1377,9 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
 		fl = SPLICE_F_NONBLOCK;
 #endif
 	file_start_write(out.file);
+	current->in_unprivileged_vfs++;
 	retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
+	current->in_unprivileged_vfs--;
 	file_end_write(out.file);
 
 	if (retval > 0) {
@@ -1512,6 +1522,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 	ret = mnt_want_write_file(file_out);
 	if (ret)
 		return ret;
+	current->in_unprivileged_vfs++;
 
 	ret = -EOPNOTSUPP;
 	if (file_out->f_op->copy_file_range)
@@ -1521,6 +1532,7 @@ ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
 		ret = do_splice_direct(file_in, &pos_in, file_out, &pos_out,
 				len > MAX_RW_COUNT ? MAX_RW_COUNT : len, 0);
 
+	current->in_unprivileged_vfs--;
 	if (ret > 0) {
 		fsnotify_access(file_in);
 		add_rchar(current, ret);
diff --git a/fs/splice.c b/fs/splice.c
index dd9bf7e..5fcad56 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1134,7 +1134,7 @@ static long do_splice_to(struct file *in, loff_t *ppos,
 {
 	ssize_t (*splice_read)(struct file *, loff_t *,
 			       struct pipe_inode_info *, size_t, unsigned int);
-	int ret;
+	long ret;
 
 	if (unlikely(!(in->f_mode & FMODE_READ)))
 		return -EBADF;
@@ -1151,7 +1151,11 @@ static long do_splice_to(struct file *in, loff_t *ppos,
 	else
 		splice_read = default_file_splice_read;
 
-	return splice_read(in, ppos, pipe, len, flags);
+	current->in_unprivileged_vfs++;
+	ret = splice_read(in, ppos, pipe, len, flags);
+	current->in_unprivileged_vfs--;
+
+	return ret;
 }
 
 /**
@@ -1334,7 +1338,9 @@ long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
 	if (unlikely(ret < 0))
 		return ret;
 
+	current->in_unprivileged_vfs++;
 	ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
+	current->in_unprivileged_vfs--;
 	if (ret > 0)
 		*ppos = sd.pos;
 
@@ -1401,7 +1407,9 @@ static long do_splice(struct file *in, loff_t __user *off_in,
 			return ret;
 
 		file_start_write(out);
+		current->in_unprivileged_vfs++;
 		ret = do_splice_from(ipipe, out, &offset, len, flags);
+		current->in_unprivileged_vfs--;
 		file_end_write(out);
 
 		if (!off_out)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e4bf894..6094720 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1473,6 +1473,10 @@ struct task_struct {
 	atomic_t usage;
 	unsigned int flags;	/* per process flags, defined below */
 	unsigned int ptrace;
+	/* depth of VFS read/write; non-zero values let certain privilege checks
+	 * fail with a warning
+	 */
+	unsigned int in_unprivileged_vfs;
 
 #ifdef CONFIG_SMP
 	struct llist_node wake_entry;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index b311ca5..d839919 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -302,6 +302,13 @@ ok:
 bool ptrace_may_access(struct task_struct *task, unsigned int mode)
 {
 	int err;
+
+	/* If you have to check for ptrace access from a VFS method, use
+	 * ptrace_may_access_noncurrent() instead.
+	 */
+	if (WARN_ON(current->in_unprivileged_vfs != 0))
+		return false;
+
 	task_lock(task);
 	err = __ptrace_may_access(task, mode, current_cred(),
 				  &current->self_privunit_id);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 8/9] fs/proc: fix attr access check
  2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
                   ` (6 preceding siblings ...)
  2016-09-18 15:05 ` [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context Jann Horn
@ 2016-09-18 15:05 ` Jann Horn
  2016-09-18 15:05 ` [PATCH 9/9] Documentation: add security/ptrace_checks.txt Jann Horn
  8 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

Make sure files in /proc/$pid/attr/ can only be written by the task that
opened them.

This prevents an attacking process from changing the security context of
another process that it can force to write attacker-controlled data into an
attacker-supplied file descriptor. I'm not sure what the impact of this is.

Signed-off-by: Jann Horn <jann@thejh.net>
---
 fs/proc/base.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index a9d271b..56a6cdc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2484,6 +2484,18 @@ out:
 }
 
 #ifdef CONFIG_SECURITY
+static int proc_pid_attr_open(struct inode *inode, struct file *file)
+{
+	u64 *opener_privunit_id;
+
+	opener_privunit_id = kmalloc(sizeof(u64), GFP_KERNEL);
+	if (opener_privunit_id == NULL)
+		return -ENOMEM;
+	*opener_privunit_id = current->self_privunit_id;
+	file->private_data = opener_privunit_id;
+	return 0;
+}
+
 static ssize_t proc_pid_attr_read(struct file * file, char __user * buf,
 				  size_t count, loff_t *ppos)
 {
@@ -2512,6 +2524,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
 	void *page;
 	ssize_t length;
 	struct task_struct *task = get_proc_task(inode);
+	u64 *opener_privunit_id = file->private_data;
 
 	length = -ESRCH;
 	if (!task)
@@ -2535,9 +2548,29 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
 	if (length < 0)
 		goto out_free;
 
+	/*
+	 * Ensure that a process can't be tricked into writing into its own attr
+	 * files without intending to do so.
+	 *
+	 * SELinux has a rule that prevents anyone other than `task` from
+	 * writing, but if the fd stays open across execve, or is sent across a
+	 * unix domain socket or whatever, that is bypassable.
+	 * Same thing in AppArmor and in Smack.
+	 *
+	 * To prevent this, compare the opener's exec_id with the target's to
+	 * ensure that they're in the same task group and no exec happened in
+	 * the meantime.
+	 *
+	 * Why is this a file and not a prctl or whatever. :/
+	 */
+	length = -EACCES;
+	if (*opener_privunit_id != task->self_privunit_id)
+		goto out_unlock;
+
 	length = security_setprocattr(task,
 				      (char*)file->f_path.dentry->d_name.name,
 				      page, count);
+out_unlock:
 	mutex_unlock(&task->signal->cred_guard_mutex);
 out_free:
 	kfree(page);
@@ -2547,10 +2580,20 @@ out_no_task:
 	return length;
 }
 
+static int proc_pid_attr_release(struct inode *inode, struct file *file)
+{
+	u64 *opener_privunit_id = file->private_data;
+
+	kfree(opener_privunit_id);
+	return 0;
+}
+
 static const struct file_operations proc_pid_attr_operations = {
+	.open		= proc_pid_attr_open,
 	.read		= proc_pid_attr_read,
 	.write		= proc_pid_attr_write,
 	.llseek		= generic_file_llseek,
+	.release	= proc_pid_attr_release,
 };
 
 static const struct pid_entry attr_dir_stuff[] = {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 9/9] Documentation: add security/ptrace_checks.txt
  2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
                   ` (7 preceding siblings ...)
  2016-09-18 15:05 ` [PATCH 8/9] fs/proc: fix attr access check Jann Horn
@ 2016-09-18 15:05 ` Jann Horn
  8 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 15:05 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

This includes some things suggested by Ben Hutchings and Kees Cook.

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
---
 Documentation/security/ptrace_checks.txt | 229 +++++++++++++++++++++++++++++++
 1 file changed, 229 insertions(+)
 create mode 100644 Documentation/security/ptrace_checks.txt

diff --git a/Documentation/security/ptrace_checks.txt b/Documentation/security/ptrace_checks.txt
new file mode 100644
index 0000000..98689d5
--- /dev/null
+++ b/Documentation/security/ptrace_checks.txt
@@ -0,0 +1,229 @@
+			     ====================
+			     PTRACE ACCESS CHECKS
+			     ====================
+
+By: Jann Horn <jann@thejh.net>
+
+=====
+INTRO
+=====
+
+This file describes the rules for ptrace() and ptrace()-like access to another
+task (inspecting or modifying the virtual memory of another process, inspecting
+another task's register state, inspecting another task's open files and so on).
+
+This kind of access is particularly security-sensitive because it often allows
+the caller to take complete control over the remote process. Therefore, the
+kernel must be very careful here.
+
+There are various kernel APIs that grant debugging access to other processes.
+
+
+===========================
+ORDINARY DEBUGGING SYSCALLS
+===========================
+
+The following ordinary syscalls grant debugging-like access:
+
+ - ptrace() grants R+W access to registers and virtual memory
+ - process_vm_readv() / process_vm_writev() grant R/W access to virtual memory
+ - get_robust_list() reveals an address from the target task, which would be
+   useful e.g. for an ASLR bypass
+ - perf_event_open()
+ - (kcmp() reveals a very small amount of information)
+
+These syscalls use the caller's *real* UID/GID and his *permitted* capability
+set (for historical reasons) to determine the permitted access using
+ptrace_may_access(..., PTRACE_MODE_REALCREDS | ...). This function is
+responsible for ensuring that the caller can't steal any privileges using the
+debugging access; in other words, it has to ensure that the caller's credentials
+are (more or less) a superset of the target's credentials. Namespaces aside,
+these rules work as follows:
+
+Introspection, including cross-thread introspection, is always allowed. This
+means that it is unsafe to use any of the debugging APIs on behalf of another,
+untrusted process unless it is verified somehow that the target pid does not
+actually belong to the calling process - dropping privileges is not sufficient.
+
+root (to be more precise, a process with CAP_SYS_PTRACE effective) can always
+use these debugging APIs on any process - unless a LSM denies the access.
+
+For normal users, there are more checks:
+
+ - The real uid of the caller must be equal to ruid, euid and suid of the target
+   task, same thing for the gids. All of the uids and gids of the target (apart
+   from the fs*id) are checked here because a privileged process might stash his
+   privileged uid into any one of them. (checked in __ptrace_may_access())
+   (Note: This doesn't cover the supplementary GIDs.)
+ - The permitted capabilities of the caller must be a superset of the permitted
+   capabilities of the target. (checked in cap_ptrace_access_check())
+ - The target process must be dumpable. What this means is explained in the next
+   section.
+
+
+===========
+DUMPABILITY
+===========
+
+Originally, the dumpability rule didn't exist - as long as a process had ruid,
+euid and suid set to the uid of a user (same thing with the gids), that user was
+able to gain debugging access to the process. This was problematic: It meant
+that if a setuid binary opened some important file, then dropped all privileges,
+the user was able to debug the process and steal its file descriptor - which is
+also a form of privilege! Other issues are e.g. that when a process drops its
+privileges, it might still have secrets in memory that the user isn't meant to
+see.
+
+Therefore, a process now becomes "nondumpable" when it:
+
+ - changes its euid / egid
+ - changes its fsuid / fsgid
+ - changes its permitted capability set
+
+These checks are in commit_creds().
+
+When a process is nondumpable, you need to have the CAP_SYS_PTRACE capability to
+access it - just having the same uids is not enough anymore.
+
+The most interesting part of dumpability is what happens on execve().
+(Non-)Dumpability is not inherited; instead, it is recalculated in
+setup_new_exec() and commit_creds(): A process will be nondumpable after
+execve() if its post-setuid-bit-handling euid and ruid or egid and rgid aren't
+the same. Additionally, it will be nondumpable if execve() changes its
+credentials in a way that commit_creds() triggers on (as described above).
+
+In effect, this means that normally, when a setuid/setgid binary executes
+another program, the other program is still protected by nondumpability (and
+also by secure execution).
+
+(Additionally, a process will be nondumpable if the euid doesn't have read
+access to the executed binary, which can be the case if the binary is
+executable, but not readable; however, this restriction is irrelevant from a
+security perspective because it isn't enforced consistently, neither through
+AT_SECURE nor for bprm->unsafe. In other words, you can e.g. be attached to such
+a binary via ptrace while it is executed, or you can LD_PRELOAD a library into
+it that dumps its memory. If this was fixed, it would be important to keep
+namespacing issues in mind, in particular the interaction with
+LSM_UNSAFE_PTRACE_CAP, the privileged kind of ptrace that works across setuid
+execve.)
+
+Dumpability and the secure exec flag can be affected by the following "unsafe
+execution" rules. When a process with ruid != euid executes another program and
+an unsafe execution is detected, *ITS EUID WILL BE DROPPED TO ITS RUID* by
+cap_bprm_set_creds() (unless it has CAP_SETUID effective in the init user ns).
+Unsafe execution means one of the following:
+
+ - LSM_UNSAFE_PTRACE: An unprivileged process is currently attached via ptrace.
+ - LSM_UNSAFE_NO_NEW_PRIVS: The current thread turned on NO_NEW_PRIVS.
+ - LSM_UNSAFE_SHARE: The fs struct is shared with a non-thread.
+
+
+=====================
+FILESYSTEM DEBUG APIS
+=====================
+
+The pid / tgid entries in procfs contain various entries that allow debugging
+access to a process. Interesting entries are:
+
+ - auxv permits an ASLR bypass
+ - cwd can permit bypassing filesystem restrictions in some cases
+ - environ can leak secret tokens
+ - fd can permit bypassing filesystem restrictions or leak access to things like
+   pipes
+ - maps permits an ASLR bypass
+ - mem grants R+W access to process memory
+ - stat permits an ASLR bypass
+
+Of these, all use both a normal filesystem DAC check (where the file owner is
+the process owner for a dumpable process, root for a nondumpable process) and a
+ptrace_may_access() check; however, the DAC check may be modified, and the
+ptrace_may_access() is performed under PTRACE_FSCREDS, meaning that instead of
+the caller's ruid, rgid and permitted capabilities, the fsuid, fsgid and
+effective capabilities are used, causing the case where a daemon drops its euid
+prior to accessing a file for the user to be treated correctly for this check.
+
+NEEDS ATTENTION:
+Special-case rules that permit introspection become a big problem here. In
+particular, if a setuid binary runs with dropped privileges (ruid=euid=user and
+suid=0 or so), it is still able to open the following entries under /proc/self:
+
+ - cwd (because normal DAC rules always permit symlink access, proc_pid_get_link
+   only checks for ptrace access and proc_cwd_link doesn't check anything)
+ - fd (same as cwd for the entries inside; the directory itself has an override
+   on the permission handler that makes introspection possible even if the
+   normal DAC rules would forbid it)
+ - maps (mode 0444, so DAC doesn't restrict anything)
+ - stat (mode 0444, and anyone can read from it; however, the interesting
+     pointers are only printed if the file opener has ptrace access to the
+     target process)
+
+In particular the fd directory is dangerous: Imagine a privileged process that
+creates some important pipe using pipe() with a dropped EUID and later performs
+writes with user-specified data on a file at a user-specified path with the same
+dropped EUID. An attacker could let the root-owned process reference the pipe as
+/proc/$pid/fd/3 and let the privileged process write to it.
+
+If necessary, it might make sense to introduce some "enable/disable
+introspection" prctl that can be used by userspace to disambiguate whether an
+access is intended to be introspection.
+
+
+Note that in all cases in which the DAC check permits access but a ptrace access
+check later on (in the VFS open handler or afterwards) restricts access,
+faccessat() is broken. However, that's probably not a real problem because
+faccessat() is racy anyway.
+
+
+===============
+USER NAMESPACES
+===============
+
+When ptrace_may_access() checks for CAP_SYS_PTRACE, the capability doesn't need
+to be active in the init namespace or in the current namespace; having it in the
+namespace of the ptrace target is sufficient. There are currently no further
+restrictions on this, meaning that the behavior of a task that creates or enters
+a user namespace is somewhat unintuitive:
+
+ - If a nondumpable task enters a user namespace, it will still be
+   nondumpable - but because the owner of the namespace is capable relative to
+   it when viewing the namespace from outside, this will often cause a
+   nondumpable task to effectively become dumpable because of a setns() /
+   clone() / unshare() call.
+
+ - The root user of a namespace can ptrace any process that enters the namespace
+   (via setns()) - immediately. This means that a process that wants to enter an
+   untrusted namespace from outside needs to be *very* careful to drop any
+   privileges it might have - uids, open file descriptors, secret data that's
+   still in memory, the controlling TTY and so on. (This also means that a
+   namespace owner who is unprivileged outside the namespace can't safely enter
+   the namespace if he doesn't trust the namespace root user: He needs
+   euid==ns->owner to enter the namespace, but must not have euid==ns->owner
+   after having entered the namespace. To get around this, it is necessary to
+   introduce a third namespace between the other two.)
+
+One way to make it easier for userspace to get this right might be to bind
+dumpability to namespaces, as follows: Let a task become nondumpable on
+setns() / clone(CLONE_NEWUSER) / unshare(CLONE_NEWUSER). Record the current
+namespace on dumpable -> nondumpable transition as "dumpability namespace".
+Require the caller to be privileged in the target's dumpability namespace, not
+just the target's current namespace.
+
+=============
+FUSE AND CUSE
+=============
+
+Anyone who can open /dev/cuse (root-owned, mode 0600) can arbitrarily read and
+write to the memory of various system processes (to be more precise, any process
+that opens files in /dev when they appear or so, then calls ioctl() on them; in
+particular, acpid does this) without further checks on capabilities or so.
+
+NEEDS ATTENTION:
+FUSE is more accessible than CUSE, but also more strict. ioctl() replies are
+restricted in location and size based on the ioctl command number (size) and the
+argument (location), but the ability to interactively control the replies to VFS
+method calls is still scary. Therefore, FUSE by default rejects filesystem
+access by anyone except the filesystem owner. This isn't done using the normal
+ptrace check, though; instead, fuse_allow_current_process() is used, which only
+looks at the UIDs and GIDs and therefore is a very simplified version of the
+normal ptrace access check. In particular, this provides no protection for
+setcap binaries.
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/9] exec: turn self_exec_id into self_privunit_id
  2016-09-18 15:05 ` [PATCH 2/9] exec: turn self_exec_id into self_privunit_id Jann Horn
@ 2016-09-18 18:13   ` Ben Hutchings
  2016-09-18 18:31     ` Jann Horn
  0 siblings, 1 reply; 28+ messages in thread
From: Ben Hutchings @ 2016-09-18 18:13 UTC (permalink / raw)
  To: Jann Horn, Alexander Viro, Roland McGrath, Oleg Nesterov,
	John Johansen, James Morris, Serge E. Hallyn, Paul Moore,
	Stephen Smalley, Eric Paris, Casey Schaufler, Kees Cook,
	Andrew Morton, Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 847 bytes --]

On Sun, 2016-09-18 at 17:05 +0200, Jann Horn wrote:
> This ensures that self_privunit_id ("privilege unit ID") is only shared by
> processes that share the mm_struct and the signal_struct; not just
> spatially, but also temporally. In other words, if you do execve() or
> clone() without CLONE_THREAD, you get a new privunit_id that has never been
> used before.
[...]
> +void increment_privunit_counter(void)
> +{
> +	BUILD_BUG_ON(NR_CPUS > (1 << 16));
> +	current->self_privunit_id = this_cpu_add_return(exec_counter, NR_CPUS);
> +}
[...]

This will wrap incorrectly if NR_CPUS is not a power of 2 (which is
unusual but allowed).

Ben.
 
-- 
Ben Hutchings
Klipstein's 4th Law of Prototyping and Production:
                                    A fail-safe circuit will destroy
others.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/9] futex: don't leak robust_list pointer
  2016-09-18 15:05 ` [PATCH 4/9] futex: don't leak robust_list pointer Jann Horn
@ 2016-09-18 18:28   ` Ben Hutchings
  2016-09-18 18:33     ` Jann Horn
  0 siblings, 1 reply; 28+ messages in thread
From: Ben Hutchings @ 2016-09-18 18:28 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise, linux-fsdevel,
	linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 1565 bytes --]

On Sun, Sep 18, 2016 at 05:05:12PM +0200, Jann Horn wrote:
> This prevents an attacker from determining the robust_list or
> compat_robust_list userspace pointer of a process created by executing
> a setuid binary. Such an attack could be performed by racing
> get_robust_list() with a setuid execution. The impact of this issue is that
> an attacker could theoretically bypass ASLR when attacking setuid binaries.
> 
> Signed-off-by: Jann Horn <jann@thejh.net>
> ---
>  kernel/futex.c        | 31 +++++++++++++++++++++----------
>  kernel/futex_compat.c | 31 +++++++++++++++++++++----------
>  2 files changed, 42 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 46cb3a3..002f056 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -3007,31 +3007,42 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
>  	if (!futex_cmpxchg_enabled)
>  		return -ENOSYS;
>  
> -	rcu_read_lock();
> -
> -	ret = -ESRCH;
> -	if (!pid)
> +	if (!pid) {
>  		p = current;
> -	else {
> +		get_task_struct(p);
> +	} else {
> +		rcu_read_lock();
>  		p = find_task_by_vpid(pid);
> -		if (!p)
> -			goto err_unlock;
> +		/* pin the task to permit dropping the RCU read lock before
> +		 * acquiring the mutex
> +		 */
> +		get_task_struct(p);

get_task_struct() requires a non-null pointer so you can't move the
null check below it.

Ben.

> +		rcu_read_unlock();
>  	}
> +	if (!p)
> +		return -ESRCH;
[...]

-- 
Ben Hutchings
Reality is just a crutch for people who can't handle science fiction.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/9] exec: turn self_exec_id into self_privunit_id
  2016-09-18 18:13   ` Ben Hutchings
@ 2016-09-18 18:31     ` Jann Horn
  2016-09-18 18:45       ` Ben Hutchings
  0 siblings, 1 reply; 28+ messages in thread
From: Jann Horn @ 2016-09-18 18:31 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise, linux-fsdevel,
	linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 1558 bytes --]

On Sun, Sep 18, 2016 at 07:13:27PM +0100, Ben Hutchings wrote:
> On Sun, 2016-09-18 at 17:05 +0200, Jann Horn wrote:
> > This ensures that self_privunit_id ("privilege unit ID") is only shared by
> > processes that share the mm_struct and the signal_struct; not just
> > spatially, but also temporally. In other words, if you do execve() or
> > clone() without CLONE_THREAD, you get a new privunit_id that has never been
> > used before.
> [...]
> > +void increment_privunit_counter(void)
> > +{
> > +	BUILD_BUG_ON(NR_CPUS > (1 << 16));
> > +	current->self_privunit_id = this_cpu_add_return(exec_counter, NR_CPUS);
> > +}
> [...]
> 
> This will wrap incorrectly if NR_CPUS is not a power of 2 (which is
> unusual but allowed).

If this wraps, hell breaks loose permission-wise - processes that have
no relationship whatsoever with each other will suddenly be able to ptrace
each other.

The idea is that it never wraps. It wraps after (2^64)/NR_CPUS execs or
forks on one CPU core. NR_CPUS is bounded to <=2^16, so in the worst case,
it wraps after 2^48 execs or forks.

On my system with 3.7GHz per core, 2^16 minimal sequential non-thread clone()
calls need 1 second system time (and 2 seconds wall clock time, but let's
disregard that), so 2^48 non-thread clone() calls should need over 100 years.

But I guess both the kernel and machines get faster - if you think the margin
might not be future-proof enough (or if you think I measured wrong and it's
actually much faster), I guess I could bump this to a 128bit number.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/9] futex: don't leak robust_list pointer
  2016-09-18 18:28   ` Ben Hutchings
@ 2016-09-18 18:33     ` Jann Horn
  0 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 18:33 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise, linux-fsdevel,
	linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 1578 bytes --]

On Sun, Sep 18, 2016 at 07:28:46PM +0100, Ben Hutchings wrote:
> On Sun, Sep 18, 2016 at 05:05:12PM +0200, Jann Horn wrote:
> > This prevents an attacker from determining the robust_list or
> > compat_robust_list userspace pointer of a process created by executing
> > a setuid binary. Such an attack could be performed by racing
> > get_robust_list() with a setuid execution. The impact of this issue is that
> > an attacker could theoretically bypass ASLR when attacking setuid binaries.
> > 
> > Signed-off-by: Jann Horn <jann@thejh.net>
> > ---
> >  kernel/futex.c        | 31 +++++++++++++++++++++----------
> >  kernel/futex_compat.c | 31 +++++++++++++++++++++----------
> >  2 files changed, 42 insertions(+), 20 deletions(-)
> > 
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index 46cb3a3..002f056 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -3007,31 +3007,42 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
> >  	if (!futex_cmpxchg_enabled)
> >  		return -ENOSYS;
> >  
> > -	rcu_read_lock();
> > -
> > -	ret = -ESRCH;
> > -	if (!pid)
> > +	if (!pid) {
> >  		p = current;
> > -	else {
> > +		get_task_struct(p);
> > +	} else {
> > +		rcu_read_lock();
> >  		p = find_task_by_vpid(pid);
> > -		if (!p)
> > -			goto err_unlock;
> > +		/* pin the task to permit dropping the RCU read lock before
> > +		 * acquiring the mutex
> > +		 */
> > +		get_task_struct(p);
> 
> get_task_struct() requires a non-null pointer so you can't move the
> null check below it.

Oh, right, thanks. Will fix that in v2.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context
  2016-09-18 15:05 ` [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context Jann Horn
@ 2016-09-18 18:38   ` Ben Hutchings
  2016-09-18 18:40     ` Jann Horn
  2016-09-18 19:57   ` Andy Lutomirski
  2016-09-18 20:18   ` Linus Torvalds
  2 siblings, 1 reply; 28+ messages in thread
From: Ben Hutchings @ 2016-09-18 18:38 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise, linux-fsdevel,
	linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 1020 bytes --]

On Sun, Sep 18, 2016 at 05:05:15PM +0200, Jann Horn wrote:
> This ensures that VFS implementations don't call ptrace_may_access() from
> VFS read or write handlers. In order for file descriptor passing to have
> its intended security properties, VFS read/write handlers must not do any
> kind of privilege checking.
[...]
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -302,6 +302,13 @@ ok:
>  bool ptrace_may_access(struct task_struct *task, unsigned int mode)
>  {
>  	int err;
> +
> +	/* If you have to check for ptrace access from a VFS method, use
> +	 * ptrace_may_access_noncurrent() instead.
> +	 */
> +	if (WARN_ON(current->in_unprivileged_vfs != 0))

Shouldn't this be WARN_ON_ONCE(), so that any such bug can't e used
to spam the log?

Ben.

> +		return false;
> +
>  	task_lock(task);
>  	err = __ptrace_may_access(task, mode, current_cred(),
>  				  &current->self_privunit_id);

-- 
Ben Hutchings
Reality is just a crutch for people who can't handle science fiction.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context
  2016-09-18 18:38   ` Ben Hutchings
@ 2016-09-18 18:40     ` Jann Horn
  0 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 18:40 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise, linux-fsdevel,
	linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 918 bytes --]

On Sun, Sep 18, 2016 at 07:38:48PM +0100, Ben Hutchings wrote:
> On Sun, Sep 18, 2016 at 05:05:15PM +0200, Jann Horn wrote:
> > This ensures that VFS implementations don't call ptrace_may_access() from
> > VFS read or write handlers. In order for file descriptor passing to have
> > its intended security properties, VFS read/write handlers must not do any
> > kind of privilege checking.
> [...]
> > --- a/kernel/ptrace.c
> > +++ b/kernel/ptrace.c
> > @@ -302,6 +302,13 @@ ok:
> >  bool ptrace_may_access(struct task_struct *task, unsigned int mode)
> >  {
> >  	int err;
> > +
> > +	/* If you have to check for ptrace access from a VFS method, use
> > +	 * ptrace_may_access_noncurrent() instead.
> > +	 */
> > +	if (WARN_ON(current->in_unprivileged_vfs != 0))
> 
> Shouldn't this be WARN_ON_ONCE(), so that any such bug can't e used
> to spam the log?

Hm, makes sense. I'll change it in v2.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/9] exec: turn self_exec_id into self_privunit_id
  2016-09-18 18:31     ` Jann Horn
@ 2016-09-18 18:45       ` Ben Hutchings
  2016-09-18 19:08         ` Jann Horn
  2016-09-18 19:57         ` Andy Lutomirski
  0 siblings, 2 replies; 28+ messages in thread
From: Ben Hutchings @ 2016-09-18 18:45 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise, linux-fsdevel,
	linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 2120 bytes --]

On Sun, Sep 18, 2016 at 08:31:37PM +0200, Jann Horn wrote:
> On Sun, Sep 18, 2016 at 07:13:27PM +0100, Ben Hutchings wrote:
> > On Sun, 2016-09-18 at 17:05 +0200, Jann Horn wrote:
> > > This ensures that self_privunit_id ("privilege unit ID") is only shared by
> > > processes that share the mm_struct and the signal_struct; not just
> > > spatially, but also temporally. In other words, if you do execve() or
> > > clone() without CLONE_THREAD, you get a new privunit_id that has never been
> > > used before.
> > [...]
> > > +void increment_privunit_counter(void)
> > > +{
> > > +	BUILD_BUG_ON(NR_CPUS > (1 << 16));
> > > +	current->self_privunit_id = this_cpu_add_return(exec_counter, NR_CPUS);
> > > +}
> > [...]
> > 
> > This will wrap incorrectly if NR_CPUS is not a power of 2 (which is
> > unusual but allowed).
> 
> If this wraps, hell breaks loose permission-wise - processes that have
> no relationship whatsoever with each other will suddenly be able to ptrace
> each other.
> 
> The idea is that it never wraps.

That's what I suspected, but wasn't sure.  In that case you can
initialise each counter to U64_MAX/NR_CPUS*cpu and increment by
1 each time, which might be more efficient on some architectures.

> It wraps after (2^64)/NR_CPUS execs or
> forks on one CPU core. NR_CPUS is bounded to <=2^16, so in the worst case,
> it wraps after 2^48 execs or forks.
> 
> On my system with 3.7GHz per core, 2^16 minimal sequential non-thread clone()
> calls need 1 second system time (and 2 seconds wall clock time, but let's
> disregard that), so 2^48 non-thread clone() calls should need over 100 years.
> 
> But I guess both the kernel and machines get faster - if you think the margin
> might not be future-proof enough (or if you think I measured wrong and it's
> actually much faster), I guess I could bump this to a 128bit number.

Sequential execution speed isn't likely to get significantly faster so
with those current numbers this seems to be quite safe.

Ben.

-- 
Ben Hutchings
Reality is just a crutch for people who can't handle science fiction.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/9] exec: turn self_exec_id into self_privunit_id
  2016-09-18 18:45       ` Ben Hutchings
@ 2016-09-18 19:08         ` Jann Horn
  2016-09-18 19:57         ` Andy Lutomirski
  1 sibling, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 19:08 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise, linux-fsdevel,
	linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 1398 bytes --]

On Sun, Sep 18, 2016 at 07:45:07PM +0100, Ben Hutchings wrote:
> On Sun, Sep 18, 2016 at 08:31:37PM +0200, Jann Horn wrote:
> > On Sun, Sep 18, 2016 at 07:13:27PM +0100, Ben Hutchings wrote:
> > > On Sun, 2016-09-18 at 17:05 +0200, Jann Horn wrote:
> > > > This ensures that self_privunit_id ("privilege unit ID") is only shared by
> > > > processes that share the mm_struct and the signal_struct; not just
> > > > spatially, but also temporally. In other words, if you do execve() or
> > > > clone() without CLONE_THREAD, you get a new privunit_id that has never been
> > > > used before.
> > > [...]
> > > > +void increment_privunit_counter(void)
> > > > +{
> > > > +	BUILD_BUG_ON(NR_CPUS > (1 << 16));
> > > > +	current->self_privunit_id = this_cpu_add_return(exec_counter, NR_CPUS);
> > > > +}
> > > [...]
> > > 
> > > This will wrap incorrectly if NR_CPUS is not a power of 2 (which is
> > > unusual but allowed).
> > 
> > If this wraps, hell breaks loose permission-wise - processes that have
> > no relationship whatsoever with each other will suddenly be able to ptrace
> > each other.
> > 
> > The idea is that it never wraps.
> 
> That's what I suspected, but wasn't sure.  In that case you can
> initialise each counter to U64_MAX/NR_CPUS*cpu and increment by
> 1 each time, which might be more efficient on some architectures.

Makes sense. Will do that!

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 5/9] proc: lock properly in ptrace_may_access callers
  2016-09-18 15:05 ` [PATCH 5/9] proc: lock properly in ptrace_may_access callers Jann Horn
@ 2016-09-18 19:15   ` Jann Horn
  0 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 19:15 UTC (permalink / raw)
  To: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 258 bytes --]

On Sun, Sep 18, 2016 at 05:05:13PM +0200, Jann Horn wrote:
> Use the new cred_guard_light to prevent information leaks
> through races in procfs.

The 0day test robot pointed out that I didn't take the mutex in
proc_map_files_readdir(). I'll add that in v2.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/9] exec: turn self_exec_id into self_privunit_id
  2016-09-18 18:45       ` Ben Hutchings
  2016-09-18 19:08         ` Jann Horn
@ 2016-09-18 19:57         ` Andy Lutomirski
  2016-09-19 15:31           ` Jann Horn
  1 sibling, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-09-18 19:57 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Thomas Gleixner, Stephen Smalley, Andrew Morton, security,
	James Morris, Janis Danisevskis, Casey Schaufler, Roland McGrath,
	Kees Cook, Alexander Viro, LSM List, Serge E. Hallyn, Jann Horn,
	Eric . Biederman, Paul Moore, Linux FS Devel, Oleg Nesterov,
	Benjamin LaHaise, Eric Paris, Seth Forshee, John Johansen

On Sep 18, 2016 8:45 AM, "Ben Hutchings" <ben@decadent.org.uk> wrote:
>
> On Sun, Sep 18, 2016 at 08:31:37PM +0200, Jann Horn wrote:
> > On Sun, Sep 18, 2016 at 07:13:27PM +0100, Ben Hutchings wrote:
> > > On Sun, 2016-09-18 at 17:05 +0200, Jann Horn wrote:
> > > > This ensures that self_privunit_id ("privilege unit ID") is only shared by
> > > > processes that share the mm_struct and the signal_struct; not just
> > > > spatially, but also temporally. In other words, if you do execve() or
> > > > clone() without CLONE_THREAD, you get a new privunit_id that has never been
> > > > used before.
> > > [...]
> > > > +void increment_privunit_counter(void)
> > > > +{
> > > > + BUILD_BUG_ON(NR_CPUS > (1 << 16));
> > > > + current->self_privunit_id = this_cpu_add_return(exec_counter, NR_CPUS);
> > > > +}
> > > [...]
> > >
> > > This will wrap incorrectly if NR_CPUS is not a power of 2 (which is
> > > unusual but allowed).
> >
> > If this wraps, hell breaks loose permission-wise - processes that have
> > no relationship whatsoever with each other will suddenly be able to ptrace
> > each other.
> >
> > The idea is that it never wraps.
>
> That's what I suspected, but wasn't sure.  In that case you can
> initialise each counter to U64_MAX/NR_CPUS*cpu and increment by
> 1 each time, which might be more efficient on some architectures.
>
> > It wraps after (2^64)/NR_CPUS execs or
> > forks on one CPU core. NR_CPUS is bounded to <=2^16, so in the worst case,
> > it wraps after 2^48 execs or forks.
> >
> > On my system with 3.7GHz per core, 2^16 minimal sequential non-thread clone()
> > calls need 1 second system time (and 2 seconds wall clock time, but let's
> > disregard that), so 2^48 non-thread clone() calls should need over 100 years.
> >
> > But I guess both the kernel and machines get faster - if you think the margin
> > might not be future-proof enough (or if you think I measured wrong and it's
> > actually much faster), I guess I could bump this to a 128bit number.
>
> Sequential execution speed isn't likely to get significantly faster so
> with those current numbers this seems to be quite safe.
>

But how big can NR_CPUs get before this gets uncomfortable?

We could do:

struct luid {
  u64 count:
  unsigned cpu;
};

(LUID = locally unique ID).

IIRC my draft PCID code does something similar to uniquely identify
mms.  If I accidentally reused a PCID without a flush, everything
would explode.

--Andy

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context
  2016-09-18 15:05 ` [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context Jann Horn
  2016-09-18 18:38   ` Ben Hutchings
@ 2016-09-18 19:57   ` Andy Lutomirski
  2016-09-18 20:38     ` Jann Horn
  2016-09-18 20:18   ` Linus Torvalds
  2 siblings, 1 reply; 28+ messages in thread
From: Andy Lutomirski @ 2016-09-18 19:57 UTC (permalink / raw)
  To: Jann Horn
  Cc: Thomas Gleixner, Stephen Smalley, Andrew Morton, security,
	James Morris, Janis Danisevskis, Casey Schaufler, Kees Cook,
	Roland McGrath, Alexander Viro, LSM List, Serge E. Hallyn,
	Eric . Biederman, Paul Moore, Linux FS Devel, Oleg Nesterov,
	Benjamin LaHaise, Eric Paris, Seth Forshee, John Johansen

On Sep 18, 2016 5:05 AM, "Jann Horn" <jann@thejh.net> wrote:
>
> This ensures that VFS implementations don't call ptrace_may_access() from
> VFS read or write handlers. In order for file descriptor passing to have
> its intended security properties, VFS read/write handlers must not do any
> kind of privilege checking.
>

Ooh, nifty!  Can you warn about capable() too?

Warning about all access to current->cred could be fun.  I expect we
have zillions of these bugs.  Think keys, netlink, proc, etc.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context
  2016-09-18 15:05 ` [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context Jann Horn
  2016-09-18 18:38   ` Ben Hutchings
  2016-09-18 19:57   ` Andy Lutomirski
@ 2016-09-18 20:18   ` Linus Torvalds
  2016-09-18 20:52     ` Jann Horn
  2 siblings, 1 reply; 28+ messages in thread
From: Linus Torvalds @ 2016-09-18 20:18 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise, linux-fsdevel, LSM List,
	security

On Sun, Sep 18, 2016 at 8:05 AM, Jann Horn <jann@thejh.net> wrote:
> This ensures that VFS implementations don't call ptrace_may_access() from
> VFS read or write handlers. In order for file descriptor passing to have
> its intended security properties, VFS read/write handlers must not do any
> kind of privilege checking.

Quite frankly, this smells like it should be a static check, not some
kind of runtime one. Or if runtime, it should be abstracted out so
that you can do an occasional "let's run a checking pass" rather than
enable it unconditionally and universally.

It's just too specialized. Soon you'll want to do other random context
checking, and we can't just keep adding those kinds of ad-hoc things
without it becoming a maintenance nightmare. I can well imagine
somebody ending up writing some stupid patch to take that
"in_unprivileged_vfs" thing into account for some semantics, and then
we're *really* screwed. So there are many reasons to make sure this is
*not* something that people actually expect to always be there.

               Linus

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context
  2016-09-18 19:57   ` Andy Lutomirski
@ 2016-09-18 20:38     ` Jann Horn
  0 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 20:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Thomas Gleixner, Stephen Smalley, Andrew Morton, security,
	James Morris, Janis Danisevskis, Casey Schaufler, Kees Cook,
	Roland McGrath, Alexander Viro, LSM List, Serge E. Hallyn,
	Eric . Biederman, Paul Moore, Linux FS Devel, Oleg Nesterov,
	Benjamin LaHaise, Eric Paris, Seth Forshee, John Johansen

[-- Attachment #1: Type: text/plain, Size: 1367 bytes --]

On Sun, Sep 18, 2016 at 12:57:57PM -0700, Andy Lutomirski wrote:
> On Sep 18, 2016 5:05 AM, "Jann Horn" <jann@thejh.net> wrote:
> >
> > This ensures that VFS implementations don't call ptrace_may_access() from
> > VFS read or write handlers. In order for file descriptor passing to have
> > its intended security properties, VFS read/write handlers must not do any
> > kind of privilege checking.
> >
> 
> Ooh, nifty!  Can you warn about capable() too?
> 
> Warning about all access to current->cred could be fun.  I expect we
> have zillions of these bugs.  Think keys, netlink, proc, etc.

True, that would probably spam dmesg quite a bit. %pK under printk() is
pretty nonsensical (heh, maybe we even have code doing capable() checks
in IRQ context via printk()), then there are all those broken capable()
checks for root-only sysctls and so on.

For capable(), I've been thinking about adopting Linus' old suggestion of
simply overriding the creds on VFS read/write entry. If we make
current->cred non-refcounted and task-private (no RCU-safe pointer
assignment), it's going to basically be two extra pointer reads and writes
per read/write call - even for such a hot code path, that should be fine
IMO. (And if we don't want to do this for all read/write calls, we could
at least do it for sysctls, that would already help quite a bit.)

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context
  2016-09-18 20:18   ` Linus Torvalds
@ 2016-09-18 20:52     ` Jann Horn
  0 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-18 20:52 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Stephen Smalley,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise, linux-fsdevel, LSM List,
	security

[-- Attachment #1: Type: text/plain, Size: 1430 bytes --]

On Sun, Sep 18, 2016 at 01:18:48PM -0700, Linus Torvalds wrote:
> On Sun, Sep 18, 2016 at 8:05 AM, Jann Horn <jann@thejh.net> wrote:
> > This ensures that VFS implementations don't call ptrace_may_access() from
> > VFS read or write handlers. In order for file descriptor passing to have
> > its intended security properties, VFS read/write handlers must not do any
> > kind of privilege checking.
> 
> Quite frankly, this smells like it should be a static check, not some
> kind of runtime one. Or if runtime, it should be abstracted out so
> that you can do an occasional "let's run a checking pass" rather than
> enable it unconditionally and universally.

Hm, fair point.

I guess this could be implemented in eBPF or systemtap?

Then for now, I guess I'll remove this patch from the series - and maybe
I'll think about writing some external checker with eBPF kprobes or so.


> It's just too specialized. Soon you'll want to do other random context
> checking, and we can't just keep adding those kinds of ad-hoc things
> without it becoming a maintenance nightmare. I can well imagine
> somebody ending up writing some stupid patch to take that
> "in_unprivileged_vfs" thing into account for some semantics, and then
> we're *really* screwed. So there are many reasons to make sure this is
> *not* something that people actually expect to always be there.

Oh, yuck. Yes, those are good points.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/9] proc: use open()-time creds for ptrace checks
  2016-09-18 15:05 ` [PATCH 3/9] proc: use open()-time creds for ptrace checks Jann Horn
@ 2016-09-19 13:01   ` Stephen Smalley
  2016-09-19 14:32     ` Jann Horn
  0 siblings, 1 reply; 28+ messages in thread
From: Stephen Smalley @ 2016-09-19 13:01 UTC (permalink / raw)
  To: Jann Horn, Alexander Viro, Roland McGrath, Oleg Nesterov,
	John Johansen, James Morris, Serge E. Hallyn, Paul Moore,
	Eric Paris, Casey Schaufler, Kees Cook, Andrew Morton,
	Janis Danisevskis, Seth Forshee, Eric . Biederman,
	Thomas Gleixner, Benjamin LaHaise
  Cc: linux-fsdevel, linux-security-module, security

On 09/18/2016 11:05 AM, Jann Horn wrote:
> This adds a new ptrace_may_access_noncurrent() method that
> uses the supplied credentials instead of those of the
> current task. (However, the current task may still be
> inspected for auditing purposes, e.g. by the Smack LSM.)
> 
> procfs used the caller's creds for a few ptrace_may_access()
> checks at read() time, which made a confused deputy attack
> by passing an FD to a procfs file to a setuid program
> possible. Therefore, the following was a local userspace
> ASLR bypass:
> 
> rm -f /tmp/foobar
> procmail <(echo 'DEFAULT=/tmp/foobar') </proc/1/stat
> (
>   echo 'obase=16'
>   cut -d' ' -f26-30,45-51 </tmp/foobar | tr ' ' '\n'
> ) | bc
> rm /tmp/foobar
> 
> procmail is installed setuid root on Debian and read()s
> data from stdin before dropping privs, so the
> ptrace_may_access() check in the VFS read handler of
> /proc/1/stat passes. Procmail then dumps the read data
> to a user-accessible file (/tmp/foobar here).
> 
> Signed-off-by: Jann Horn <jann@thejh.net>
> ---
>  fs/proc/array.c                 |  3 +-
>  fs/proc/base.c                  | 63 ++++++++++++++++++++++++++++++++++-------
>  fs/proc/internal.h              | 14 +++++++++
>  include/linux/lsm_hooks.h       |  3 +-
>  include/linux/ptrace.h          |  5 ++++
>  include/linux/security.h        | 10 ++++---
>  kernel/ptrace.c                 | 40 +++++++++++++++++++-------
>  security/apparmor/include/ipc.h |  2 +-
>  security/apparmor/ipc.c         |  4 +--
>  security/apparmor/lsm.c         | 14 +++++++--
>  security/commoncap.c            |  8 +++---
>  security/security.c             |  5 ++--
>  security/selinux/hooks.c        |  4 +--
>  security/smack/smack_lsm.c      | 18 ++++++++----
>  security/yama/yama_lsm.c        |  9 ++++--
>  15 files changed, 150 insertions(+), 52 deletions(-)
> 

> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 13185a6..f9a0be7 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -2133,10 +2133,10 @@ static int selinux_binder_transfer_file(struct task_struct *from,
>  }
>  
>  static int selinux_ptrace_access_check(struct task_struct *child,
> -				     unsigned int mode)
> +				     unsigned int mode, const struct cred *cred)
>  {
>  	if (mode & PTRACE_MODE_READ) {
> -		u32 sid = current_sid();
> +		u32 sid = cred_sid(cred);
>  		u32 csid = task_sid(child);
>  		return avc_has_perm(sid, csid, SECCLASS_FILE, FILE__READ, NULL);
>  	}

For consistency, don't you also need to change the next line of code to
use cred_has_perm() rather than current_has_perm()?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/9] proc: use open()-time creds for ptrace checks
  2016-09-19 13:01   ` Stephen Smalley
@ 2016-09-19 14:32     ` Jann Horn
  2016-09-19 14:45       ` Stephen Smalley
  0 siblings, 1 reply; 28+ messages in thread
From: Jann Horn @ 2016-09-19 14:32 UTC (permalink / raw)
  To: Stephen Smalley
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Eric Paris,
	Casey Schaufler, Kees Cook, Andrew Morton, Janis Danisevskis,
	Seth Forshee, Eric . Biederman, Thomas Gleixner,
	Benjamin LaHaise, linux-fsdevel, linux-security-module, security

[-- Attachment #1: Type: text/plain, Size: 2054 bytes --]

On Mon, Sep 19, 2016 at 09:01:19AM -0400, Stephen Smalley wrote:
> On 09/18/2016 11:05 AM, Jann Horn wrote:
> > This adds a new ptrace_may_access_noncurrent() method that
> > uses the supplied credentials instead of those of the
> > current task. (However, the current task may still be
> > inspected for auditing purposes, e.g. by the Smack LSM.)
> > 
> > procfs used the caller's creds for a few ptrace_may_access()
> > checks at read() time, which made a confused deputy attack
> > by passing an FD to a procfs file to a setuid program
> > possible. Therefore, the following was a local userspace
> > ASLR bypass:
[...]
> > diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> > index 13185a6..f9a0be7 100644
> > --- a/security/selinux/hooks.c
> > +++ b/security/selinux/hooks.c
> > @@ -2133,10 +2133,10 @@ static int selinux_binder_transfer_file(struct task_struct *from,
> >  }
> >  
> >  static int selinux_ptrace_access_check(struct task_struct *child,
> > -				     unsigned int mode)
> > +				     unsigned int mode, const struct cred *cred)
> >  {
> >  	if (mode & PTRACE_MODE_READ) {
> > -		u32 sid = current_sid();
> > +		u32 sid = cred_sid(cred);
> >  		u32 csid = task_sid(child);
> >  		return avc_has_perm(sid, csid, SECCLASS_FILE, FILE__READ, NULL);
> >  	}
> 
> For consistency, don't you also need to change the next line of code to
> use cred_has_perm() rather than current_has_perm()?

Oh, right.

How about something like this?

  u32 sid = cred_sid(cred);
  u32 csid = task_sid(child);
  if (mode & PTRACE_MODE_READ)
    return avc_has_perm(sid, csid, SECCLASS_FILE, FILE__READ, NULL);
  else
    return avc_has_perm(sid, csid, SECCLASS_PROCESS, PROCESS__PTRACE, NULL);

The current code looks as if the PTRACE_MODE_READ and PTRACE_MODE_ATTACH cases
behave very differently, which they actually don't. And to be able to use
cred_has_perm() here, it would be necessary to explicitly grab and release
an RCU read lock or so (to prevent the child creds from going away).

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/9] proc: use open()-time creds for ptrace checks
  2016-09-19 14:32     ` Jann Horn
@ 2016-09-19 14:45       ` Stephen Smalley
  0 siblings, 0 replies; 28+ messages in thread
From: Stephen Smalley @ 2016-09-19 14:45 UTC (permalink / raw)
  To: Jann Horn
  Cc: Alexander Viro, Roland McGrath, Oleg Nesterov, John Johansen,
	James Morris, Serge E. Hallyn, Paul Moore, Eric Paris,
	Casey Schaufler, Kees Cook, Andrew Morton, Janis Danisevskis,
	Seth Forshee, Eric . Biederman, Thomas Gleixner,
	Benjamin LaHaise, linux-fsdevel, linux-security-module, security

On 09/19/2016 10:32 AM, Jann Horn wrote:
> On Mon, Sep 19, 2016 at 09:01:19AM -0400, Stephen Smalley wrote:
>> On 09/18/2016 11:05 AM, Jann Horn wrote:
>>> This adds a new ptrace_may_access_noncurrent() method that
>>> uses the supplied credentials instead of those of the
>>> current task. (However, the current task may still be
>>> inspected for auditing purposes, e.g. by the Smack LSM.)
>>>
>>> procfs used the caller's creds for a few ptrace_may_access()
>>> checks at read() time, which made a confused deputy attack
>>> by passing an FD to a procfs file to a setuid program
>>> possible. Therefore, the following was a local userspace
>>> ASLR bypass:
> [...]
>>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>>> index 13185a6..f9a0be7 100644
>>> --- a/security/selinux/hooks.c
>>> +++ b/security/selinux/hooks.c
>>> @@ -2133,10 +2133,10 @@ static int selinux_binder_transfer_file(struct task_struct *from,
>>>  }
>>>  
>>>  static int selinux_ptrace_access_check(struct task_struct *child,
>>> -				     unsigned int mode)
>>> +				     unsigned int mode, const struct cred *cred)
>>>  {
>>>  	if (mode & PTRACE_MODE_READ) {
>>> -		u32 sid = current_sid();
>>> +		u32 sid = cred_sid(cred);
>>>  		u32 csid = task_sid(child);
>>>  		return avc_has_perm(sid, csid, SECCLASS_FILE, FILE__READ, NULL);
>>>  	}
>>
>> For consistency, don't you also need to change the next line of code to
>> use cred_has_perm() rather than current_has_perm()?
> 
> Oh, right.
> 
> How about something like this?
> 
>   u32 sid = cred_sid(cred);
>   u32 csid = task_sid(child);
>   if (mode & PTRACE_MODE_READ)
>     return avc_has_perm(sid, csid, SECCLASS_FILE, FILE__READ, NULL);
>   else
>     return avc_has_perm(sid, csid, SECCLASS_PROCESS, PROCESS__PTRACE, NULL);
> 
> The current code looks as if the PTRACE_MODE_READ and PTRACE_MODE_ATTACH cases
> behave very differently, which they actually don't. And to be able to use
> cred_has_perm() here, it would be necessary to explicitly grab and release
> an RCU read lock or so (to prevent the child creds from going away).
> 

Sure, that's fine.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/9] exec: turn self_exec_id into self_privunit_id
  2016-09-18 19:57         ` Andy Lutomirski
@ 2016-09-19 15:31           ` Jann Horn
  0 siblings, 0 replies; 28+ messages in thread
From: Jann Horn @ 2016-09-19 15:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ben Hutchings, Thomas Gleixner, Stephen Smalley, Andrew Morton,
	security, James Morris, Janis Danisevskis, Casey Schaufler,
	Roland McGrath, Kees Cook, Alexander Viro, LSM List,
	Serge E. Hallyn, Eric . Biederman, Paul Moore, Linux FS Devel,
	Oleg Nesterov, Benjamin LaHaise, Eric Paris, Seth Forshee,
	John Johansen

[-- Attachment #1: Type: text/plain, Size: 3054 bytes --]

On Sun, Sep 18, 2016 at 12:57:46PM -0700, Andy Lutomirski wrote:
> On Sep 18, 2016 8:45 AM, "Ben Hutchings" <ben@decadent.org.uk> wrote:
> >
> > On Sun, Sep 18, 2016 at 08:31:37PM +0200, Jann Horn wrote:
> > > On Sun, Sep 18, 2016 at 07:13:27PM +0100, Ben Hutchings wrote:
> > > > On Sun, 2016-09-18 at 17:05 +0200, Jann Horn wrote:
> > > > > This ensures that self_privunit_id ("privilege unit ID") is only shared by
> > > > > processes that share the mm_struct and the signal_struct; not just
> > > > > spatially, but also temporally. In other words, if you do execve() or
> > > > > clone() without CLONE_THREAD, you get a new privunit_id that has never been
> > > > > used before.
> > > > [...]
> > > > > +void increment_privunit_counter(void)
> > > > > +{
> > > > > + BUILD_BUG_ON(NR_CPUS > (1 << 16));
> > > > > + current->self_privunit_id = this_cpu_add_return(exec_counter, NR_CPUS);
> > > > > +}
> > > > [...]
> > > >
> > > > This will wrap incorrectly if NR_CPUS is not a power of 2 (which is
> > > > unusual but allowed).
> > >
> > > If this wraps, hell breaks loose permission-wise - processes that have
> > > no relationship whatsoever with each other will suddenly be able to ptrace
> > > each other.
> > >
> > > The idea is that it never wraps.
> >
> > That's what I suspected, but wasn't sure.  In that case you can
> > initialise each counter to U64_MAX/NR_CPUS*cpu and increment by
> > 1 each time, which might be more efficient on some architectures.
> >
> > > It wraps after (2^64)/NR_CPUS execs or
> > > forks on one CPU core. NR_CPUS is bounded to <=2^16, so in the worst case,
> > > it wraps after 2^48 execs or forks.
> > >
> > > On my system with 3.7GHz per core, 2^16 minimal sequential non-thread clone()
> > > calls need 1 second system time (and 2 seconds wall clock time, but let's
> > > disregard that), so 2^48 non-thread clone() calls should need over 100 years.
> > >
> > > But I guess both the kernel and machines get faster - if you think the margin
> > > might not be future-proof enough (or if you think I measured wrong and it's
> > > actually much faster), I guess I could bump this to a 128bit number.
> >
> > Sequential execution speed isn't likely to get significantly faster so
> > with those current numbers this seems to be quite safe.
> >
> 
> But how big can NR_CPUs get before this gets uncomfortable?
> 
> We could do:
> 
> struct luid {
>   u64 count:
>   unsigned cpu;
> };
> 
> (LUID = locally unique ID).
> 
> IIRC my draft PCID code does something similar to uniquely identify
> mms.  If I accidentally reused a PCID without a flush, everything
> would explode.

So I guess for generating a new LUID, I'd have to do something like this?

  struct luid new_luid;
  preempt_disable();
  raw_cpu_add(luid_counters, 1);
  new_luid.count = raw_cpu_read(luid_counters, 1);
  new_luid.cpu = smp_processor_id();
  preempt_enable();

Disabling preemption should be sufficient as long as nobody generates LUIDs
from IRQ context, right?

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2016-09-19 15:31 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-18 15:05 [PATCH 0/9] Various fixes related to ptrace_may_access() Jann Horn
2016-09-18 15:05 ` [PATCH 1/9] exec: introduce cred_guard_light Jann Horn
2016-09-18 15:05 ` [PATCH 2/9] exec: turn self_exec_id into self_privunit_id Jann Horn
2016-09-18 18:13   ` Ben Hutchings
2016-09-18 18:31     ` Jann Horn
2016-09-18 18:45       ` Ben Hutchings
2016-09-18 19:08         ` Jann Horn
2016-09-18 19:57         ` Andy Lutomirski
2016-09-19 15:31           ` Jann Horn
2016-09-18 15:05 ` [PATCH 3/9] proc: use open()-time creds for ptrace checks Jann Horn
2016-09-19 13:01   ` Stephen Smalley
2016-09-19 14:32     ` Jann Horn
2016-09-19 14:45       ` Stephen Smalley
2016-09-18 15:05 ` [PATCH 4/9] futex: don't leak robust_list pointer Jann Horn
2016-09-18 18:28   ` Ben Hutchings
2016-09-18 18:33     ` Jann Horn
2016-09-18 15:05 ` [PATCH 5/9] proc: lock properly in ptrace_may_access callers Jann Horn
2016-09-18 19:15   ` Jann Horn
2016-09-18 15:05 ` [PATCH 6/9] ptrace: warn on ptrace_may_access without proper locking Jann Horn
2016-09-18 15:05 ` [PATCH 7/9] ptrace: forbid ptrace checks against current_cred() from VFS context Jann Horn
2016-09-18 18:38   ` Ben Hutchings
2016-09-18 18:40     ` Jann Horn
2016-09-18 19:57   ` Andy Lutomirski
2016-09-18 20:38     ` Jann Horn
2016-09-18 20:18   ` Linus Torvalds
2016-09-18 20:52     ` Jann Horn
2016-09-18 15:05 ` [PATCH 8/9] fs/proc: fix attr access check Jann Horn
2016-09-18 15:05 ` [PATCH 9/9] Documentation: add security/ptrace_checks.txt Jann Horn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).