Re: strace lockup when tracing exec in go

From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Michal Hocko <mhocko@kernel.org>, LKML <linux-kernel@vger.kernel.org>
Cc: strace-devel@lists.sourceforge.net,
	Oleg Nesterov <oleg@redhat.com>, Aleksa Sarai <asarai@suse.com>
Subject: Re: strace lockup when tracing exec in go
Date: Thu, 22 Sep 2016 06:15:02 +0200	[thread overview]
Message-ID: <1474517702.4851.21.camel@gmail.com> (raw)
In-Reply-To: <20160921152946.GA24210@dhcp22.suse.cz>

On Wed, 2016-09-21 at 17:29 +0200, Michal Hocko wrote:
> [I am CCing strace mailing list because even if this turns out to be a
>  kernel bug strace seems to be doing something unexpected - more on that
>  below]
> 
> Hi,
> Aleksa has reported the following lockup when stracing the following go
> program
> 
> % cat exec.go
> package main
> 
> import (
>     "os"
>     "syscall"
> )
> 
> func main() {
>     syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ())
> }
> $ go version
> go version go1.6.3 linux/amd64
> $ go build -o exec exec.go
> 
> $ strace -f ./exec
> [snip]
> [pid 10349] select(0, NULL, NULL, NULL, {0, 100} 
> [pid 10346] <... select resumed> )      = 0 (Timeout)
> [pid 10346] select(0, NULL, NULL, NULL, {0, 20} 
> [pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] 
> [pid 10346] <... select resumed> )      = 0 (Timeout)
> [pid 10349] <... select resumed> )      = 0 (Timeout)
> 
> execve will never finish unless the strace process get killed with
> SIGKILL. The following was debugged on 3.12 kernel and the current
> kernel seems to not trigger the issue as easily but I believe the same
> problem is there as well.
> 
> The further investigation shown that the tracer (strace) is stuck
> waiting for cred_guard_mutex
> [<0000000000000000>] mm_access+0x22/0xa0
> [<0000000000000000>] process_vm_rw_core.isra.1+0x112/0x6c0
> [<0000000000000000>] process_vm_rw+0xab/0x110
> [<0000000000000000>] SyS_process_vm_readv+0x15/0x20
> [<0000000000000000>] system_call_fastpath+0x16/0x1b
> [<0000000000000000>] 0x7f186f031c3a
> [<0000000000000000>] 0xffffffffffffffff
> 
> while the traced process (3 threads) are trying to perform the exec.
> That means that 2 threads are dead (in fact zombies) waiting in their
> final schedule.
> Call Trace:
>  [] do_exit+0x6f7/0xa70
>  [] do_group_exit+0x39/0xa0
>  [] get_signal_to_deliver+0x1e8/0x5c0
>  [] do_signal+0x42/0x670
>  [] do_notify_resume+0x78/0xc0
>  [] int_signal+0x12/0x17
>  [<00007f3a33f3ffb9>] 0x7f3a33f3ffb8
> 
> and one is
> Call Trace:
>  [] flush_old_exec+0xdf/0x890
>  [] load_elf_binary+0x307/0xda0
>  [] search_binary_handler+0xae/0x1f0
>  [] do_execve_common.isra.26+0x64e/0x810
>  [] SyS_execve+0x31/0x50
>  [] stub_execve+0x69/0xa0
>  [<00007f3a33f16527>] 0x7f3a33f16526
> 
> waiting for notify_count to drop down to 0
>         while (sig->notify_count) {
>                 __set_current_state(TASK_KILLABLE);
>                 spin_unlock_irq(lock);
>                 schedule(); <<<<<<<<<
>                 if (unlikely(__fatal_signal_pending(tsk)))
>                         goto killed;
>                 spin_lock_irq(lock);
>         }
> 
> this however doesn't happen because both threads which are dead
> are zombies waiting to be reaped by the parent and to call their
> release_task->__exit_signal. The tracer is blocked on the lock held by
> exec (in prepare_bprm_creds). This is the case in the current kernel as
> well AFAICS so the same should be possible as well. So is this a bug or
> something somewhere else makes sure that this will not happen in newer
> kernels?

master.today...

crash> bt 5138
PID: 5138   TASK: ffff8803fb4e4c80  CPU: 6   COMMAND: "strace"
 #0 [ffff88038e323b68] __schedule at ffffffff81624660
 #1 [ffff88038e323bb0] schedule at ffffffff81624b35
 #2 [ffff88038e323bc8] schedule_preempt_disabled at ffffffff81624e0e
 #3 [ffff88038e323bd8] __mutex_lock_killable_slowpath at ffffffff81625f9d
 #4 [ffff88038e323c30] mutex_lock_killable at ffffffff8162605a
 #5 [ffff88038e323c48] mm_access at ffffffff8105bbf7
 #6 [ffff88038e323c70] process_vm_rw_core at ffffffff811823d3
 #7 [ffff88038e323dc0] process_vm_rw at ffffffff8118287d
 #8 [ffff88038e323f38] sys_process_vm_readv at ffffffff811829b9
 #9 [ffff88038e323f50] entry_SYSCALL_64_fastpath at ffffffff81628736
    RIP: 00007faee6b2ac3a  RSP: 00007ffdc95b62b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000019e7030  RCX: 00007faee6b2ac3a
    RDX: 0000000000000001  RSI: 00007ffdc95b62d0  RDI: 0000000000001418
    RBP: 0000000000001414   R8: 0000000000000001   R9: 0000000000000000
    R10: 00007ffdc95b62e0  R11: 0000000000000246  R12: 00007faee73e66c0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000136  CS: 0033  SS: 002b
crash> bt 5140
PID: 5140   TASK: ffff8803e703b300  CPU: 4   COMMAND: "exec"
 #0 [ffff88038e0b3ce8] __schedule at ffffffff81624660
 #1 [ffff88038e0b3d30] schedule at ffffffff81624b35
 #2 [ffff88038e0b3d48] flush_old_exec at ffffffff811b1f26
 #3 [ffff88038e0b3d88] load_elf_binary at ffffffff811fd70f
 #4 [ffff88038e0b3e60] search_binary_handler at ffffffff811b13ac
 #5 [ffff88038e0b3e98] do_execveat_common at ffffffff811b2f8f
 #6 [ffff88038e0b3f00] sys_execve at ffffffff811b33ba
 #7 [ffff88038e0b3f28] do_syscall_64 at ffffffff8100194e
    RIP: 000000000045c1ef  RSP: 000000c820043e00  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000000c82000e6c0  RCX: 000000000045c1ef
    RDX: 000000c820076380  RSI: 000000c82000e6c0  RDI: 000000c82000a400
    RBP: 000000c820076380   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 000000000000006b
    R13: 00000000004c4b40  R14: 0000000000003fff  R15: 0000000000000039
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b
crash> bt 5141
PID: 5141   TASK: ffff8803b0221980  CPU: 0   COMMAND: "exec"
 #0 [ffff88038e33fc60] __schedule at ffffffff81624660
 #1 [ffff88038e33fca8] schedule at ffffffff81624b35
 #2 [ffff88038e33fcc0] do_exit at ffffffff81061292
 #3 [ffff88038e33fd28] do_group_exit at ffffffff8106247f
 #4 [ffff88038e33fd58] get_signal at ffffffff8106c7fc
 #5 [ffff88038e33fdf0] do_signal at ffffffff81018468
 #6 [ffff88038e33fef8] exit_to_usermode_loop at ffffffff81059820
 #7 [ffff88038e33ff28] do_syscall_64 at ffffffff81001a35
    RIP: 00000000004538d3  RSP: 000000c820041f38  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 000000c820041f70  RCX: 00000000004538d3
    RDX: 0000000000000000  RSI: 000000c820041f70  RDI: 0000000000000000
    RBP: ffffffffffffffff   R8: 000000c820032000   R9: 000000c820000480
    R10: 0000000000000000  R11: 0000000000000246  R12: 000000000042aa30
    R13: 00000000004c4bc4  R14: 0000000000000000  R15: 0000000000000008
    ORIG_RAX: 0000000000000083  CS: 0033  SS: 002b
crash>