On Wed, Nov 02, 2016 at 07:18:06PM +0100, Oleg Nesterov wrote:
> On 10/30, Jann Horn wrote:
> >
> > This is a new per-threadgroup lock that can often be taken instead of
> > cred_guard_mutex and has less deadlock potential. I'm doing this because
> > Oleg Nesterov mentioned the potential for deadlocks, in particular if a
> > debugged task is stuck in execve, trying to get rid of a ptrace-stopped
> > thread, and the debugger attempts to inspect procfs files of the debugged
> > task.
>
> Yes, but let me repeat that we need to fix this anyway. So I don't really
> understand why should we add yet another mutex.

execve() only takes the new mutex immediately after de_thread(), so this
problem shouldn't occur there. Basically, I think that I'm not making the
problem worse with my patches this way.

I believe that it should be possible to convert most existing users of the
cred_guard_mutex to the new cred_guard_light - exceptions to that that I
see are:

 - PTRACE_ATTACH
 - SECCOMP_FILTER_FLAG_TSYNC (sets NO_NEW_PRIVS on remote task)

Beyond that, conceptually, the new cred_guard_light could also be turned
into a read-write mutex to prevent deadlocks between its users (where
execve would take it for writing and everyone else would take it for
reading), but afaik the kernel doesn't have an implementation of read-write
mutexes yet?

cred_guard_light would mean that you could theoretically still create
deadlocks, but afaics only if you do things like trying to read
/proc/$pid/mem in the FUSE read handler for the file that is currently
being executed - and in that case, I think it's okay to have a killable
deadlock.

Do you think that, if (apart from execve) only PTRACE_ATTACH and
SECCOMP_FILTER_FLAG_TSYNC remain as users of cred_guard_mutex and
everything else used my new cred_guard_light, that would be sufficient to
fix the races you are concerned about?

It seems to me like SECCOMP_FILTER_FLAG_TSYNC doesn't really have
deadlocking issues.
PTRACE_ATTACH isn't that clear to me; if a debugger tries to attach to a
newly spawned thread while another ptraced thread is dying because of
de_thread() in a third thread, that might still cause the debugger to
deadlock, right?

The problem with PTRACE_ATTACH is basically that bprm->unsafe is used in
the bprm_set_creds LSM hook, so it needs to have been calculated when that
hook is executed. (Also in the bprm_secureexec hook, but that one happens
after install_exec_creds(), so that's unproblematic.)
security_bprm_set_creds() is called in prepare_binprm(), which is executed
very early in do_execveat_common(), at a point where failures should still
be graceful (return an error code instead of killing the whole process),
and therefore other threads can still run and debuggers can still attach.

The LSM hooks that execute at that point e.g. inspect and modify
bprm->cred, and they can still cleanly prohibit execution. E.g. SELinux
does this - it can cancel execution with errors like -EPERM and -EACCES.

AFAICS the hard case is:

 - Multithreaded process with tasks A and B is running.
 - Task C attaches to B via ptrace.
 - Task A calls execve(), takes the mutex, reaches de_thread(), kills
   task B.
 - Task C tries to attach to A, tries to take the mutex again, deadlock.

I'm not sure whether it'd be possible to get rid of the deadlock for
PTRACE_ATTACH without ABI changes, and I would be surprised if it was
doable without nontrivial additional logic.
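
To make the hard case above a bit more concrete, here is a rough userspace
model of it as a wait-for graph. This is only a sketch, not kernel code:
the names (wait_graph, hard_case_deadlocks, ...) are made up for
illustration, and the B -> C edge encodes my assumption that a traced
thread can't fully go away until its tracer deals with it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three tasks in the scenario. */
enum task { TASK_A, TASK_B, TASK_C, NR_TASKS };

#define WAITS_FOR_NOBODY (-1)

/*
 * waits_for[t] is the task that t is blocked on, or WAITS_FOR_NOBODY.
 * In this simplified model a task blocks on at most one other task.
 */
struct wait_graph {
	int waits_for[NR_TASKS];
};

static void wait_graph_init(struct wait_graph *g)
{
	for (size_t i = 0; i < NR_TASKS; i++)
		g->waits_for[i] = WAITS_FOR_NOBODY;
}

static void wait_graph_block(struct wait_graph *g, enum task waiter,
			     enum task target)
{
	assert(waiter != target);
	g->waits_for[waiter] = target;
}

/* Follow the wait-for chain from start; true if it leads back to start. */
static bool wait_graph_deadlocked(const struct wait_graph *g, enum task start)
{
	int cur = g->waits_for[start];

	for (int steps = 0; steps < NR_TASKS && cur != WAITS_FOR_NOBODY; steps++) {
		if (cur == (int)start)
			return true;
		cur = g->waits_for[cur];
	}
	return false;
}

/*
 * Build the hard case. If c_attaches_to_a is false, C never tries to
 * take the mutex, so the chain A -> B -> C ends at C.
 */
static bool hard_case_deadlocks(bool c_attaches_to_a)
{
	struct wait_graph g;

	wait_graph_init(&g);
	/* A holds the mutex and waits in de_thread() for B to go away. */
	wait_graph_block(&g, TASK_A, TASK_B);
	/* Assumption: B is traced by C and can't go away until C acts. */
	wait_graph_block(&g, TASK_B, TASK_C);
	if (c_attaches_to_a) {
		/* C's PTRACE_ATTACH on A blocks on the mutex held by A. */
		wait_graph_block(&g, TASK_C, TASK_A);
	}
	return wait_graph_deadlocked(&g, TASK_A);
}
```

In this model, hard_case_deadlocks(true) finds the cycle A -> B -> C -> A,
while hard_case_deadlocks(false) does not, which matches the idea that the
deadlock only closes once the debugger itself blocks on the mutex.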