On Wed, Nov 02, 2016 at 07:18:06PM +0100, Oleg Nesterov wrote:
> On 10/30, Jann Horn wrote:
> >
> > This is a new per-threadgroup lock that can often be taken instead of
> > cred_guard_mutex and has less deadlock potential. I'm doing this because
> > Oleg Nesterov mentioned the potential for deadlocks, in particular if a
> > debugged task is stuck in execve, trying to get rid of a ptrace-stopped
> > thread, and the debugger attempts to inspect procfs files of the debugged
> > task.
> 
> Yes, but let me repeat that we need to fix this anyway. So I don't really
> understand why should we add yet another mutex.

execve() only takes the new mutex immediately after de_thread(), so this
problem shouldn't occur there. Basically, I think that I'm not making the
problem worse with my patches this way.

I believe that it should be possible to convert most existing users of the
cred_guard_mutex to the new cred_guard_light - exceptions to that that I
see are:

 - PTRACE_ATTACH
 - SECCOMP_FILTER_FLAG_TSYNC (sets NO_NEW_PRIVS on remote task)

Beyond that, conceptually, the new cred_guard_light could also be turned
into a read-write mutex to prevent deadlocks between its users (where
execve would take it for writing and everyone else would take it for
reading), but afaik the kernel doesn't have an implementation of
read-write mutexes yet?

cred_guard_light would mean that you could theoretically still create
deadlocks, but afaics only if you do things like trying to read
/proc/$pid/mem in the FUSE read handler for the file that is currently
being executed - and in that case, I think it's okay to have a killable
deadlock.

Do you think that, if (apart from execve) only PTRACE_ATTACH and
SECCOMP_FILTER_FLAG_TSYNC remain as users of cred_guard_mutex and
everything else used my new cred_guard_light, that would be sufficient to
fix the races you are concerned about?

It seems to me like SECCOMP_FILTER_FLAG_TSYNC doesn't really have
deadlocking issues. PTRACE_ATTACH isn't that clear to me; if a debugger
tries to attach to a newly spawned thread while another ptraced thread is
dying because of de_thread() in a third thread, that might still cause
the debugger to deadlock, right?

The problem with PTRACE_ATTACH is basically that bprm->unsafe is used
in the bprm_set_creds LSM hook, so it needs to have been calculated when
that hook is executed. (Also in the bprm_secureexec hook, but that
one happens after install_exec_creds(), so that's unproblematic.)
security_bprm_set_creds() is called in prepare_binprm(), which is
executed very early in do_execveat_common(), at a point where failures
should still be graceful (return an error code instead of killing the
whole process), and therefore other threads can still run and debuggers
can still attach.

The LSM hooks that execute at that point e.g. inspect and modify
bprm->cred, and they can still cleanly prohibit execution. E.g. SELinux
does this - it can cancel execution with errors like -EPERM and -EACCES.

AFAICS the hard case is:

 - Multithreaded process with tasks A and B is running.
 - Task C attaches to B via ptrace.
 - Task A calls execve(), takes the mutex, reaches de_thread(), kills
   task B.
 - Task C tries to attach to A, tries to take the mutex again,
   deadlock.

I'm not sure whether it'd be possible to get rid of the deadlock
for PTRACE_ATTACH without ABI changes, and I would be surprised if it
was doable without nontrivial additional logic.