Hi Aleksa, On 3/1/20 4:13 PM, Aleksa Sarai wrote: > On 2020-03-01, Bernd Edlinger wrote: >> This fixes a deadlock in the tracer when tracing a multi-threaded >> application that calls execve while more than one thread are running. >> >> I observed that when running strace on the gcc test suite, it always >> blocks after a while, when expect calls execve, because other threads >> have to be terminated. They send ptrace events, but the strace is no >> longer able to respond, since it is blocked in vm_access. >> >> The deadlock is always happening when strace needs to access the >> tracees process mmap, while another thread in the tracee starts to >> execve a child process, but that cannot continue until the >> PTRACE_EVENT_EXIT is handled and the WIFEXITED event is received: >> >> strace D 0 30614 30584 0x00000000 >> Call Trace: >> __schedule+0x3ce/0x6e0 >> schedule+0x5c/0xd0 >> schedule_preempt_disabled+0x15/0x20 >> __mutex_lock.isra.13+0x1ec/0x520 >> __mutex_lock_killable_slowpath+0x13/0x20 >> mutex_lock_killable+0x28/0x30 >> mm_access+0x27/0xa0 >> process_vm_rw_core.isra.3+0xff/0x550 >> process_vm_rw+0xdd/0xf0 >> __x64_sys_process_vm_readv+0x31/0x40 >> do_syscall_64+0x64/0x220 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> >> expect D 0 31933 30876 0x80004003 >> Call Trace: >> __schedule+0x3ce/0x6e0 >> schedule+0x5c/0xd0 >> flush_old_exec+0xc4/0x770 >> load_elf_binary+0x35a/0x16c0 >> search_binary_handler+0x97/0x1d0 >> __do_execve_file.isra.40+0x5d4/0x8a0 >> __x64_sys_execve+0x49/0x60 >> do_syscall_64+0x64/0x220 >> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> >> The proposed solution is to have a second mutex that is >> used in mm_access, so it is allowed to continue while the >> dying threads are not yet terminated. >> >> I also took the opportunity to improve the documentation >> of prepare_creds, which is obviously out of sync. >> >> Signed-off-by: Bernd Edlinger > > I can't comment on the validity of the patch, but I also found and > reported this issue in 2016[1] and the discussion quickly veered into > the problem being more complicated (and uglier) than it seems at first > glance. > > You should probably also Cc stable, given this has been a long-standing > issue and your patch doesn't look (too) invasive. > I am fully aware that this patch won't fix the case then PTRACE_ACCESS is racing with de_thread. But I don't see a problem with allowing vm access based on the current credentials as they are still the same until de_thread is done with it's job. And in a practical way this fixes 99% of the real problem here, as it only happens since strace is currently tracing something and needs access to the parameters in the tracee's vm space. Of course you could fork the strace process to do any PTRACE_ACCESS when necessary, and, well, maybe that would fix the remaining problem here... However before I considered changing the kernel for this I tried to fix this within strace. First I tried to wait in the signal handler. See attached strace-patch-1.diff, but that did not work, BUT I think it is possible that your patch you proposed previously would actually make it work. I tried then another approach, using a worker thread to wait for the childs, but it did only work when I remove PTRACE_O_TRACEEXIT from the ptrace options, because the ptrace(PTRACE_SYSCALL, pid, 0L, 0L) does not work in the worker thread, rv = -1, errno = 3 there, and unfortunately the main thread is blocked and unable to do the ptrace call, that makes the thread continue. So I consider that second patch really ugly, and wouldn't propose something like that seriously. @@ -69,7 +71,7 @@ cflag_t cflag = CFLAG_NONE; unsigned int followfork; unsigned int ptrace_setoptions = PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC - | PTRACE_O_TRACEEXIT; + ;//| PTRACE_O_TRACEEXIT; unsigned int xflag; bool debug_flag; bool Tflag; so it only works because of this line, without that it is not able to make the thread continue after the PTRACE_EVENT_EXIT. Thanks Bernd. > [1]: https://lore.kernel.org/lkml/20160921152946.GA24210@dhcp22.suse.cz/ >