All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Galbraith <umgwanakikbuti@gmail.com>
To: Michal Hocko <mhocko@kernel.org>, LKML <linux-kernel@vger.kernel.org>
Cc: strace-devel@lists.sourceforge.net,
	Oleg Nesterov <oleg@redhat.com>, Aleksa Sarai <asarai@suse.com>
Subject: Re: strace lockup when tracing exec in go
Date: Thu, 22 Sep 2016 06:15:02 +0200	[thread overview]
Message-ID: <1474517702.4851.21.camel@gmail.com> (raw)
In-Reply-To: <20160921152946.GA24210@dhcp22.suse.cz>

On Wed, 2016-09-21 at 17:29 +0200, Michal Hocko wrote:
> [I am CCing strace mailing list because even if this turns out to be a
>  kernel bug strace seems to be doing something unexpected - more on that
>  below]
> 
> Hi,
> Aleksa has reported the following lockup when stracing the following go
> program
> 
> % cat exec.go
> package main
> 
> import (
>     "os"
>     "syscall"
> )
> 
> func main() {
>     syscall.Exec("/bin/echo", []string{"/bin/echo", "Hello"}, os.Environ())
> }
> $ go version
> go version go1.6.3 linux/amd64
> $ go build -o exec exec.go
> 
> $ strace -f ./exec
> [snip]
> [pid 10349] select(0, NULL, NULL, NULL, {0, 100} 
> [pid 10346] <... select resumed> )      = 0 (Timeout)
> [pid 10346] select(0, NULL, NULL, NULL, {0, 20} 
> [pid 10345] execve("/bin/echo", ["/bin/echo", "Hello"], [/* 95 vars */] 
> [pid 10346] <... select resumed> )      = 0 (Timeout)
> [pid 10349] <... select resumed> )      = 0 (Timeout)
> 
> execve will never finish unless the strace process get killed with
> SIGKILL. The following was debugged on 3.12 kernel and the current
> kernel seems to not trigger the issue as easily but I believe the same
> problem is there as well.
> 
> The further investigation shown that the tracer (strace) is stuck
> waiting for cred_guard_mutex
> [<0000000000000000>] mm_access+0x22/0xa0
> [<0000000000000000>] process_vm_rw_core.isra.1+0x112/0x6c0
> [<0000000000000000>] process_vm_rw+0xab/0x110
> [<0000000000000000>] SyS_process_vm_readv+0x15/0x20
> [<0000000000000000>] system_call_fastpath+0x16/0x1b
> [<0000000000000000>] 0x7f186f031c3a
> [<0000000000000000>] 0xffffffffffffffff
> 
> while the traced process (3 threads) are trying to perform the exec.
> That means that 2 threads are dead (in fact zombies) waiting in their
> final schedule.
> Call Trace:
>  [] do_exit+0x6f7/0xa70
>  [] do_group_exit+0x39/0xa0
>  [] get_signal_to_deliver+0x1e8/0x5c0
>  [] do_signal+0x42/0x670
>  [] do_notify_resume+0x78/0xc0
>  [] int_signal+0x12/0x17
>  [<00007f3a33f3ffb9>] 0x7f3a33f3ffb8
> 
> and one is
> Call Trace:
>  [] flush_old_exec+0xdf/0x890
>  [] load_elf_binary+0x307/0xda0
>  [] search_binary_handler+0xae/0x1f0
>  [] do_execve_common.isra.26+0x64e/0x810
>  [] SyS_execve+0x31/0x50
>  [] stub_execve+0x69/0xa0
>  [<00007f3a33f16527>] 0x7f3a33f16526
> 
> waiting for notify_count to drop down to 0
>         while (sig->notify_count) {
>                 __set_current_state(TASK_KILLABLE);
>                 spin_unlock_irq(lock);
>                 schedule(); <<<<<<<<<
>                 if (unlikely(__fatal_signal_pending(tsk)))
>                         goto killed;
>                 spin_lock_irq(lock);
>         }
> 
> this however doesn't happen because both threads which are dead
> are zombies waiting to be reaped by the parent and to call their
> release_task->__exit_signal. The tracer is blocked on the lock held by
> exec (in prepare_bprm_creds). This is the case in the current kernel as
> well AFAICS so the same should be possible as well. So is this a bug or
> something somewhere else makes sure that this will not happen in newer
> kernels?

master.today...

crash> bt 5138
PID: 5138   TASK: ffff8803fb4e4c80  CPU: 6   COMMAND: "strace"
 #0 [ffff88038e323b68] __schedule at ffffffff81624660
 #1 [ffff88038e323bb0] schedule at ffffffff81624b35
 #2 [ffff88038e323bc8] schedule_preempt_disabled at ffffffff81624e0e
 #3 [ffff88038e323bd8] __mutex_lock_killable_slowpath at ffffffff81625f9d
 #4 [ffff88038e323c30] mutex_lock_killable at ffffffff8162605a
 #5 [ffff88038e323c48] mm_access at ffffffff8105bbf7
 #6 [ffff88038e323c70] process_vm_rw_core at ffffffff811823d3
 #7 [ffff88038e323dc0] process_vm_rw at ffffffff8118287d
 #8 [ffff88038e323f38] sys_process_vm_readv at ffffffff811829b9
 #9 [ffff88038e323f50] entry_SYSCALL_64_fastpath at ffffffff81628736
    RIP: 00007faee6b2ac3a  RSP: 00007ffdc95b62b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00000000019e7030  RCX: 00007faee6b2ac3a
    RDX: 0000000000000001  RSI: 00007ffdc95b62d0  RDI: 0000000000001418
    RBP: 0000000000001414   R8: 0000000000000001   R9: 0000000000000000
    R10: 00007ffdc95b62e0  R11: 0000000000000246  R12: 00007faee73e66c0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000136  CS: 0033  SS: 002b
crash> bt 5140
PID: 5140   TASK: ffff8803e703b300  CPU: 4   COMMAND: "exec"
 #0 [ffff88038e0b3ce8] __schedule at ffffffff81624660
 #1 [ffff88038e0b3d30] schedule at ffffffff81624b35
 #2 [ffff88038e0b3d48] flush_old_exec at ffffffff811b1f26
 #3 [ffff88038e0b3d88] load_elf_binary at ffffffff811fd70f
 #4 [ffff88038e0b3e60] search_binary_handler at ffffffff811b13ac
 #5 [ffff88038e0b3e98] do_execveat_common at ffffffff811b2f8f
 #6 [ffff88038e0b3f00] sys_execve at ffffffff811b33ba
 #7 [ffff88038e0b3f28] do_syscall_64 at ffffffff8100194e
    RIP: 000000000045c1ef  RSP: 000000c820043e00  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 000000c82000e6c0  RCX: 000000000045c1ef
    RDX: 000000c820076380  RSI: 000000c82000e6c0  RDI: 000000c82000a400
    RBP: 000000c820076380   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 000000000000006b
    R13: 00000000004c4b40  R14: 0000000000003fff  R15: 0000000000000039
    ORIG_RAX: 000000000000003b  CS: 0033  SS: 002b
crash> bt 5141
PID: 5141   TASK: ffff8803b0221980  CPU: 0   COMMAND: "exec"
 #0 [ffff88038e33fc60] __schedule at ffffffff81624660
 #1 [ffff88038e33fca8] schedule at ffffffff81624b35
 #2 [ffff88038e33fcc0] do_exit at ffffffff81061292
 #3 [ffff88038e33fd28] do_group_exit at ffffffff8106247f
 #4 [ffff88038e33fd58] get_signal at ffffffff8106c7fc
 #5 [ffff88038e33fdf0] do_signal at ffffffff81018468
 #6 [ffff88038e33fef8] exit_to_usermode_loop at ffffffff81059820
 #7 [ffff88038e33ff28] do_syscall_64 at ffffffff81001a35
    RIP: 00000000004538d3  RSP: 000000c820041f38  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: 000000c820041f70  RCX: 00000000004538d3
    RDX: 0000000000000000  RSI: 000000c820041f70  RDI: 0000000000000000
    RBP: ffffffffffffffff   R8: 000000c820032000   R9: 000000c820000480
    R10: 0000000000000000  R11: 0000000000000246  R12: 000000000042aa30
    R13: 00000000004c4bc4  R14: 0000000000000000  R15: 0000000000000008
    ORIG_RAX: 0000000000000083  CS: 0033  SS: 002b
crash>

  reply	other threads:[~2016-09-22  4:15 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-21 15:29 strace lockup when tracing exec in go Michal Hocko
2016-09-22  4:15 ` Mike Galbraith [this message]
2016-09-22  8:01   ` Michal Hocko
2016-09-22  8:20     ` Aleksa Sarai
2016-09-22  8:36     ` Michal Hocko
2016-09-22  9:40       ` Mike Galbraith
2016-09-22  9:53         ` Michal Hocko
2016-09-22 10:09           ` Mike Galbraith
2016-09-22 11:09             ` Michal Hocko
2016-09-22 13:53               ` Michal Hocko
2016-09-23  1:17                 ` Aleksa Sarai
2016-09-23 10:21                 ` Oleg Nesterov
2016-09-23 11:18                   ` Michal Hocko
2016-09-23 13:21                     ` Oleg Nesterov
2016-09-23 13:40                       ` Michal Hocko
2016-09-23 14:07                         ` Oleg Nesterov
2016-09-23 14:19                           ` Michal Hocko
2016-09-23  9:50 ` Oleg Nesterov
2016-09-23 11:23   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1474517702.4851.21.camel@gmail.com \
    --to=umgwanakikbuti@gmail.com \
    --cc=asarai@suse.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=oleg@redhat.com \
    --cc=strace-devel@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.