linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Michal Hocko <mhocko@kernel.org>, LKML <linux-kernel@vger.kernel.org>
Cc: strace-devel@lists.sourceforge.net,
	Oleg Nesterov <oleg@redhat.com>, Aleksa Sarai <asarai@suse.com>
Subject: Re: strace lockup when tracing exec in go
Date: Thu, 22 Sep 2016 06:15:02 +0200	[thread overview]
Message-ID: <1474517702.4851.21.camel@gmail.com> (raw)
In-Reply-To: <20160921152946.GA24210@dhcp22.suse.cz>

On Wed, 2016-09-21 at 17:29 +0200, Michal Hocko wrote:
> [I am CCing strace mailing list because even if this turns out to be a
>  kernel bug strace seems to be doing something unexpected - more on that
>  below]
> 
> Hi,
> Aleksa has reported the following lockup when stracing the following go
> program
> 
> % cat exec.go
> package main
> 
> import (
>     "os"
>     "syscall"
> )
> 
> func main() {
>     syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ())
> }
> $ go version
> go version go1.6.3 linux/amd64
> $ go build -o exec exec.go
> 
> $ strace -f ./exec
> [snip]
> [pid 10349] select(0, NULL, NULL, NULL, {0, 100} 
> [pid 10346] <... select resumed> )      = 0 (Timeout)
> [pid 10346] select(0, NULL, NULL, NULL, {0, 20} 
> [pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] 
> [pid 10346] <... select resumed> )      = 0 (Timeout)
> [pid 10349] <... select resumed> )      = 0 (Timeout)
> 
> execve will never finish unless the strace process get killed with
> SIGKILL. The following was debugged on 3.12 kernel and the current
> kernel seems to not trigger the issue as easily but I believe the same
> problem is there as well.
> 
> The further investigation shown that the tracer (strace) is stuck
> waiting for cred_guard_mutex
> [<0000000000000000>] mm_access+0x22/0xa0
> [<0000000000000000>] process_vm_rw_core.isra.1+0x112/0x6c0
> [<0000000000000000>] process_vm_rw+0xab/0x110
> [<0000000000000000>] SyS_process_vm_readv+0x15/0x20
> [<0000000000000000>] system_call_fastpath+0x16/0x1b
> [<0000000000000000>] 0x7f186f031c3a
> [<0000000000000000>] 0xffffffffffffffff
> 
> while the traced process (3 threads) are trying to perform the exec.
> That means that 2 threads are dead (in fact zombies) waiting in their
> final schedule.
> Call Trace:
>  [] do_exit+0x6f7/0xa70
>  [] do_group_exit+0x39/0xa0
>  [] get_signal_to_deliver+0x1e8/0x5c0
>  [] do_signal+0x42/0x670
>  [] do_notify_resume+0x78/0xc0
>  [] int_signal+0x12/0x17
>  [<00007f3a33f3ffb9>] 0x7f3a33f3ffb8
> 
> and one is
> Call Trace:
>  [] flush_old_exec+0xdf/0x890
>  [] load_elf_binary+0x307/0xda0
>  [] search_binary_handler+0xae/0x1f0
>  [] do_execve_common.isra.26+0x64e/0x810
>  [] SyS_execve+0x31/0x50
>  [] stub_execve+0x69/0xa0
>  [<00007f3a33f16527>] 0x7f3a33f16526
> 
> waiting for notify_count to drop down to 0
>         while (sig->notify_count) {
>                 __set_current_state(TASK_KILLABLE);
>                 spin_unlock_irq(lock);
>                 schedule(); <<<<<<<<<
>                 if (unlikely(__fatal_signal_pending(tsk)))
>                         goto killed;
>                 spin_lock_irq(lock);
>         }
> 
> this however doesn't happen because both threads which are dead
> are zombies waiting to be reaped by the parent and to call their
> release_task->__exit_signal. The tracer is blocked on the lock held by
> exec (in prepare_bprm_creds). This is the case in the current kernel as
> well AFAICS so the same should be possible as well. So is this a bug or
> something somewhere else makes sure that this will not happen in newer
> kernels?

master.today...

crash> bt 5138
PID: 5138   TASK: ffff8803fb4e4c80  CPU: 6   COMMAND: "strace"
 #0 [ffff88038e323b68] __schedule at ffffffff81624660
 #1 [ffff88038e323bb0] schedule at ffffffff81624b35
 #2 [ffff88038e323bc8] schedule_preempt_disabled at ffffffff81624e0e
 #3 [ffff88038e323bd8] __mutex_lock_killable_slowpath at ffffffff81625f9d
 #4 [ffff88038e323c30] mutex_lock_killable at ffffffff8162605a
 #5 [ffff88038e323c48] mm_access at ffffffff8105bbf7
 #6 [ffff88038e323c70] process_vm_rw_core at ffffffff811823d3
 #7 [ffff88038e323dc0] process_vm_rw at ffffffff8118287d
 #8 [ffff88038e323f38] sys_process_vm_readv at ffffffff811829b9
 #9 [ffff88038e323f50] entry_SYSCALL_64_fastpath at ffffffff81628736
    RIP: 00007faee6b2ac3a  RSP: 00007ffdc95b62b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000019e7030  RCX: 00007faee6b2ac3a
    RDX: 0000000000000001  RSI: 00007ffdc95b62d0  RDI: 0000000000001418
    RBP: 0000000000001414   R8: 0000000000000001   R9: 0000000000000000
    R10: 00007ffdc95b62e0  R11: 0000000000000246  R12: 00007faee73e66c0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000136  CS: 0033  SS: 002b
crash> bt 5140
PID: 5140   TASK: ffff8803e703b300  CPU: 4   COMMAND: "exec"
 #0 [ffff88038e0b3ce8] __schedule at ffffffff81624660
 #1 [ffff88038e0b3d30] schedule at ffffffff81624b35
 #2 [ffff88038e0b3d48] flush_old_exec at ffffffff811b1f26
 #3 [ffff88038e0b3d88] load_elf_binary at ffffffff811fd70f
 #4 [ffff88038e0b3e60] search_binary_handler at ffffffff811b13ac
 #5 [ffff88038e0b3e98] do_execveat_common at ffffffff811b2f8f
 #6 [ffff88038e0b3f00] sys_execve at ffffffff811b33ba
 #7 [ffff88038e0b3f28] do_syscall_64 at ffffffff8100194e
    RIP: 000000000045c1ef  RSP: 000000c820043e00  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000000c82000e6c0  RCX: 000000000045c1ef
    RDX: 000000c820076380  RSI: 000000c82000e6c0  RDI: 000000c82000a400
    RBP: 000000c820076380   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 000000000000006b
    R13: 00000000004c4b40  R14: 0000000000003fff  R15: 0000000000000039
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b
crash> bt 5141
PID: 5141   TASK: ffff8803b0221980  CPU: 0   COMMAND: "exec"
 #0 [ffff88038e33fc60] __schedule at ffffffff81624660
 #1 [ffff88038e33fca8] schedule at ffffffff81624b35
 #2 [ffff88038e33fcc0] do_exit at ffffffff81061292
 #3 [ffff88038e33fd28] do_group_exit at ffffffff8106247f
 #4 [ffff88038e33fd58] get_signal at ffffffff8106c7fc
 #5 [ffff88038e33fdf0] do_signal at ffffffff81018468
 #6 [ffff88038e33fef8] exit_to_usermode_loop at ffffffff81059820
 #7 [ffff88038e33ff28] do_syscall_64 at ffffffff81001a35
    RIP: 00000000004538d3  RSP: 000000c820041f38  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 000000c820041f70  RCX: 00000000004538d3
    RDX: 0000000000000000  RSI: 000000c820041f70  RDI: 0000000000000000
    RBP: ffffffffffffffff   R8: 000000c820032000   R9: 000000c820000480
    R10: 0000000000000000  R11: 0000000000000246  R12: 000000000042aa30
    R13: 00000000004c4bc4  R14: 0000000000000000  R15: 0000000000000008
    ORIG_RAX: 0000000000000083  CS: 0033  SS: 002b
crash>

  reply	other threads:[~2016-09-22  4:15 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-21 15:29 strace lockup when tracing exec in go Michal Hocko
2016-09-22  4:15 ` Mike Galbraith [this message]
2016-09-22  8:01   ` Michal Hocko
2016-09-22  8:20     ` Aleksa Sarai
2016-09-22  8:36     ` Michal Hocko
2016-09-22  9:40       ` Mike Galbraith
2016-09-22  9:53         ` Michal Hocko
2016-09-22 10:09           ` Mike Galbraith
2016-09-22 11:09             ` Michal Hocko
2016-09-22 13:53               ` Michal Hocko
2016-09-23  1:17                 ` Aleksa Sarai
2016-09-23 10:21                 ` Oleg Nesterov
2016-09-23 11:18                   ` Michal Hocko
2016-09-23 13:21                     ` Oleg Nesterov
2016-09-23 13:40                       ` Michal Hocko
2016-09-23 14:07                         ` Oleg Nesterov
2016-09-23 14:19                           ` Michal Hocko
2016-09-23  9:50 ` Oleg Nesterov
2016-09-23 11:23   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1474517702.4851.21.camel@gmail.com \
    --to=umgwanakikbuti@gmail.com \
    --cc=asarai@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=strace-devel@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).