strace lockup when tracing exec in go

* strace lockup when tracing exec in go
@ 2016-09-21 15:29 Michal Hocko
  2016-09-22  4:15 ` Mike Galbraith
  2016-09-23  9:50 ` Oleg Nesterov
  0 siblings, 2 replies; 19+ messages in thread
From: Michal Hocko @ 2016-09-21 15:29 UTC (permalink / raw)
  To: LKML; +Cc: strace-devel, Oleg Nesterov, Aleksa Sarai

[I am CCing strace mailing list because even if this turns out to be a
 kernel bug strace seems to be doing something unexpected - more on that
 below]

Hi,
Aleksa has reported the following lockup when stracing the following go
program

% cat exec.go
package main

import (
    "os"
    "syscall"
)

func main() {
    syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ())
}
$ go version
go version go1.6.3 linux/amd64
$ go build -o exec exec.go

$ strace -f ./exec
[snip]
[pid 10349] select(0, NULL, NULL, NULL, {0, 100} <unfinished ...>
[pid 10346] <... select resumed> )      = 0 (Timeout)
[pid 10346] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] <unfinished ...>
[pid 10346] <... select resumed> )      = 0 (Timeout)
[pid 10349] <... select resumed> )      = 0 (Timeout)

execve will never finish unless the strace process get killed with
SIGKILL. The following was debugged on 3.12 kernel and the current
kernel seems to not trigger the issue as easily but I believe the same
problem is there as well.

The further investigation shown that the tracer (strace) is stuck
waiting for cred_guard_mutex
[<0000000000000000>] mm_access+0x22/0xa0
[<0000000000000000>] process_vm_rw_core.isra.1+0x112/0x6c0
[<0000000000000000>] process_vm_rw+0xab/0x110
[<0000000000000000>] SyS_process_vm_readv+0x15/0x20
[<0000000000000000>] system_call_fastpath+0x16/0x1b
[<0000000000000000>] 0x7f186f031c3a
[<0000000000000000>] 0xffffffffffffffff

while the traced process (3 threads) are trying to perform the exec.
That means that 2 threads are dead (in fact zombies) waiting in their
final schedule.
Call Trace:
 [<ffffffff81057c17>] do_exit+0x6f7/0xa70
 [<ffffffff81058009>] do_group_exit+0x39/0xa0
 [<ffffffff810670d8>] get_signal_to_deliver+0x1e8/0x5c0
 [<ffffffff81002342>] do_signal+0x42/0x670
 [<ffffffff810029e8>] do_notify_resume+0x78/0xc0
 [<ffffffff8151f4c2>] int_signal+0x12/0x17
 [<00007f3a33f3ffb9>] 0x7f3a33f3ffb8

and one is
Call Trace:
 [<ffffffff811a80bf>] flush_old_exec+0xdf/0x890
 [<ffffffff811f4197>] load_elf_binary+0x307/0xda0
 [<ffffffff811a7b8e>] search_binary_handler+0xae/0x1f0
 [<ffffffff811a917e>] do_execve_common.isra.26+0x64e/0x810
 [<ffffffff811a95c1>] SyS_execve+0x31/0x50
 [<ffffffff8151f7b9>] stub_execve+0x69/0xa0
 [<00007f3a33f16527>] 0x7f3a33f16526

waiting for notify_count to drop down to 0
        while (sig->notify_count) {
                __set_current_state(TASK_KILLABLE);
                spin_unlock_irq(lock);
                schedule(); <<<<<<<<<
                if (unlikely(__fatal_signal_pending(tsk)))
                        goto killed;
                spin_lock_irq(lock);
        }

this however doesn't happen because both threads which are dead
are zombies waiting to be reaped by the parent and to call their
release_task->__exit_signal. The tracer is blocked on the lock held by
exec (in prepare_bprm_creds). This is the case in the current kernel as
well AFAICS so the same should be possible as well. So is this a bug or
something somewhere else makes sure that this will not happen in newer
kernels?

Now, I've tried to fix this by

diff --git a/kernel/fork.c b/kernel/fork.c
index 5a57b9bab85c..d5b7c3aea187 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -837,7 +837,7 @@ struct mm_struct *mm_access(struct task_struct *task, unsigned int mode)
 	struct mm_struct *mm;
 	int err;
 
-	err =  mutex_lock_killable(&task->signal->cred_guard_mutex);
+	err =  mutex_lock_interruptible(&task->signal->cred_guard_mutex);
 	if (err)
 		return ERR_PTR(err);
 
but it didn't help and I really couldn't see why until I sprinkled
printks all over the exit path and to learn that the parent (strace)
doesn't get notified about SIGCHLD when those threads are dying. Why is that?
exit_notify
  do_notify_parent -> return false
    __group_send_sig_info
      send_signal
        __send_signal
          prepare_signal -> fails because the signal is ignored

this is more than unexpected (at least to me) because I would expect
that the strace wouldn't ignore SIGCHLD! Is this a bug in strace?

Btw. in a successful (not hung) case the tracer is in do_wait
when do_notify_parent->__wake_up_parent does the right thing.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 19+ messages in thread