On Wed, Nov 02, 2016 at 10:39:32PM +0100, Jann Horn wrote:
> On Sun, Oct 30, 2016 at 06:16:50PM +0100, Jann Horn wrote:
> > On Fri, Sep 30, 2016 at 04:52:57PM +0200, Oleg Nesterov wrote:
> > > On 09/23, Jann Horn wrote:
> > > >
> > > > This prevents an attacker from determining the robust_list or
> > > > compat_robust_list userspace pointer of a process created by executing
> > > > a setuid binary. Such an attack could be performed by racing
> > > > get_robust_list() with a setuid execution. The impact of this issue is that
> > > > an attacker could theoretically bypass ASLR when attacking setuid binaries.
> > >
> > > Well. I am not sure this actually needs a fix, but I won't argue.
> > >
> > > I can't really understand what this patch actually fixes,
> > >
> > > > @@ -3007,31 +3007,43 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
> > > >  	if (!futex_cmpxchg_enabled)
> > > >  		return -ENOSYS;
> > > >
> > > > -	rcu_read_lock();
> > > > -
> > > > -	ret = -ESRCH;
> > > > -	if (!pid)
> > > > +	if (!pid) {
> > > >  		p = current;
> > > > -	else {
> > > > +		get_task_struct(p);
> > > > +	} else {
> > > > +		rcu_read_lock();
> > > >  		p = find_task_by_vpid(pid);
> > > > +		/* pin the task to permit dropping the RCU read lock before
> > > > +		 * acquiring the mutex
> > > > +		 */
> > > > +		if (p)
> > > > +			get_task_struct(p);
> > > > +		rcu_read_unlock();
> > > >  		if (!p)
> > > > -			goto err_unlock;
> > > > +			return -ESRCH;
> > > >  	}
> > > >
> > > > +	ret = mutex_lock_killable(&p->signal->cred_guard_light);
> > > > +	if (ret)
> > > > +		goto err_put;
> > > > +
> > > >  	ret = -EPERM;
> > > >  	if (!ptrace_may_access(p, PTRACE_MODE_READ_REALCREDS))
> > > >  		goto err_unlock;
> > > >
> > > >  	head = p->robust_list;
> > > > -	rcu_read_unlock();
> > >
> > > OK, suppose it races with setuid exec, and mutex_lock_killable() +
> > > ptrace_may_access() comes after flush_old_exec() but before
> > > install_exec_creds(), in this case ptrace_may_access() can wrongly
> > > succeed.
> >
> > I take cred_guard_light in flush_old_exec() and release it in
> > install_exec_creds(), so that shouldn't work, I think.
> >
> >
> > > In theory, it is possible that the execing thread can complete exec,
> > > return to user-mode and call sys_set_robust_list() before we read
> > > head = p->robust_list. Yes, this is unlikely, but unless I am totally
> > > confused the race you are trying to fix is equally unlikely?
> > >
> > > perhaps we can make a much simpler change to prevent this, see below.
> > > We can rely on fact that both ptrace_may_access() and exec_mmap()
> > > takes the same task_lock(). Sure, this can "leak" robust_list too,
> > > a set-uid binary can exec and/or lower its credentials after we
> > > read p->robust_list, but personally I think we do not care.
> > >
> > > Or I missed something else?
> >
> > No - I think your patch would work, too, apart from the potential
> > leak you mentioned.
>
> Changing my opinion:
>
> This does not just affect setuid binaries. It also affects daemons like
> cron and atd that execute processes with dropped privileges.
>
> This is how atd runs jobs (strace output, with irrelevant stuff removed):
>
> [...]
> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa81b1099d0) = 14915
> Process 14915 attached
> [...]
> [pid 14915] set_robust_list(0x7fa81b1099e0, 24) = 0
> [...]
> [pid 14915] setregid(0, 1) = 0
> [pid 14915] setreuid(0, 1) = 0
> [pid 14915] close(0) = 0
> [pid 14915] close(1) = 0
> [pid 14915] close(2) = 0
> [pid 14915] clone(Process 14916 attached
> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa81b1099d0) = 14916
> [pid 14916] set_robust_list(0x7fa81b1099e0, 24) = 0
> [pid 14915] wait4(14916,
> [pid 14916] lseek(6, 0, SEEK_SET) = 0
> [pid 14916] dup2(6, 0) = 0
> [pid 14916] dup2(5, 1) = 1
> [pid 14916] dup2(5, 2) = 2
> [pid 14916] close(6) = 0
> [pid 14916] close(5) = 0
> [pid 14916] setreuid(1, 0) = 0
> [pid 14916] setregid(1, 0) = 0
> [...]
> [pid 14916] setgroups(13, [1000, [...]]) = 0
> [pid 14916] setgid(1000) = 0
> [pid 14916] setuid(1000) = 0
> [pid 14916] chdir("/") = 0
> [pid 14916] execve("/bin/sh", ["sh"], [/* 0 vars */]) = 0
> [...]
>
> Basically, you can see that the pointer 0x7fa81b1099e0, which reveals
> information about the address space layout, is the robust list of pid 14916
> when it calls execve(), and after that execve() call, pid 14916 will be
> ptraceable for the user (modulo LSMs).
>
> So I think that my patch is a bit safer. Yes, there aren't many local
> daemons whose address space layout you can discover this way, but it's still
> not great.

I think my previous message wasn't very clear about what I think the issue
is.

Basically, here, it would be plausible for uid 1000 to be able to determine
the pre-execve() robust_list pointer of pid 14916 by racing
get_robust_list() during the execve(). That itself isn't a big issue
because the memory mappings of pid 14916 are thrown away during the
execve(), but what is potentially interesting to an attacker is that before
the execve(), pid 14916 shared its address space layout with its parents,
including the atd daemon. So if an attacker has a vulnerability in atd but
needs an address leak in order to exploit it, this would be such a leak.
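
For reference, here is a minimal userspace sketch of the value at stake,
assuming Linux with the usual uapi headers. glibc has no wrapper for
get_robust_list(2), so the helper names query_robust_list() and
print_robust_list() are made up for illustration and go through syscall(2).
This only shows what the syscall hands back when the ptrace access check
passes; it does not attempt the race itself.

```c
#define _GNU_SOURCE
#include <linux/futex.h>	/* struct robust_list_head */
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libc function): raw get_robust_list(2).
 * pid == 0 means the calling thread. Returns 0 on success, -1 with
 * errno set on failure (e.g. EPERM if the ptrace access check fails,
 * ESRCH if no such task exists).
 */
long query_robust_list(pid_t pid, struct robust_list_head **head,
		       size_t *len)
{
	return syscall(SYS_get_robust_list, pid, head, len);
}

/*
 * Hypothetical helper: print the robust_list head pointer of a task.
 * The pointer is a userspace address inside the target's address
 * space, which is why exposing it can defeat ASLR.
 */
int print_robust_list(pid_t pid)
{
	struct robust_list_head *head = NULL;
	size_t len = 0;

	if (query_robust_list(pid, &head, &len) != 0) {
		perror("get_robust_list");
		return -1;
	}
	printf("pid %d: robust_list head=%p len=%zu\n",
	       (int)pid, (void *)head, len);
	return 0;
}
```

Calling print_robust_list(0) reports the caller's own list head. Passing
another task's pid is subject to the ptrace_may_access() check that the
patch above is concerned with.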