From: Yang Shi <yang.shi@linux.alibaba.com>
To: Alexey Dobriyan <adobriyan@gmail.com>
Cc: akpm@linux-foundation.org, mingo@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
Date: Fri, 23 Feb 2018 10:28:37 -0800
Message-ID: <bba96569-cf4c-dea5-2446-7ac53e3d8a9a@linux.alibaba.com>
In-Reply-To: <20180221195720.GA639@avx2>



On 2/21/18 11:57 AM, Alexey Dobriyan wrote:
> On Tue, Feb 20, 2018 at 03:38:24PM -0800, Yang Shi wrote:
>>
>> On 2/20/18 2:38 PM, Alexey Dobriyan wrote:
>>> On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote:
>>>> When running vm-scalability with large memory (> 300GB), the below hung
>>>> task issue happens occasionally.
>>>>
>>>> INFO: task ps:14018 blocked for more than 120 seconds.
>>>>          Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
>>>>    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>    ps              D    0 14018      1 0x00000004
>>>>     ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
>>>>     ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
>>>>     00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
>>>>    Call Trace:
>>>>     [<ffffffff817154d0>] ? __schedule+0x250/0x730
>>>>     [<ffffffff817159e6>] schedule+0x36/0x80
>>>>     [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
>>>>     [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
>>>>     [<ffffffff81717db0>] down_read+0x20/0x40
>>>>     [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
>>>>     [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
>>>>     [<ffffffff81241d87>] __vfs_read+0x37/0x150
>>>>     [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
>>>>     [<ffffffff81242266>] vfs_read+0x96/0x130
>>>>     [<ffffffff812437b5>] SyS_read+0x55/0xc0
>>>>     [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
>>>>
>>>> When manipulating a large mapping, the process may hold the mmap_sem for
>>>> long time, so reading /proc/<pid>/cmdline may be blocked in
>>>> uninterruptible state for long time.
>>>>
>>>> down_read_trylock() sounds too aggressive, and we already have killable
>>>> version APIs for semaphore, here use down_read_killable() to improve the
>>>> responsiveness.
>>>> -	down_read(&mm->mmap_sem);
>>>> +	rv = down_read_killable(&mm->mmap_sem);
>>>> +	if (rv)
>>>> +		goto out_mmput;
>>>>    	arg_start = mm->arg_start;
>>>>    	arg_end = mm->arg_end;
>>>>    	env_start = mm->env_start;
>>> Fix is incomplete
>> Yes, it is. Since I just ran into the above splat, so I just did the
>> minimum change.
>>
>>> 1) /proc/*/cmdline only wants to read 4 values atomically,
>>>      those 4 values are basically random values and aren't
>>>      related to VM part at all (after C/R went in, they are
>>>      settable to arbitrary values)
>> Sorry, I don't get your point here. Could you please elaborate?
> I hoped there is some random spinlock those 4 values could be moved to
> but no.
>
>>> 2) access_remote_vm() et al will do the same ->mmap_sem, and
>> Yes, it does. But, __access_remote_vm() is called by access_process_vm()
>> too, which is used by much more places, i.e. ptrace, so I was not sure
>> if it is preferred to convert to killable version. So, I leave it untouched.
> Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read
> and your scalability tests should trigger next backtrace right away.
>
>>> 3) /proc/*/environ and get_cmdline() do the same.
>> They look suitable to use killable version.

I had a further look at get_cmdline(). It does not look necessary to
convert it to the killable version, since it is called mainly by
audit_exit(), which runs when a process or syscall exits, so it may be
preferable to wait until the mmap_sem is released. Using
down_read_killable() there therefore seems pointless.

Thanks,
Yang

>>
>> BTW, I just realized the code should go to out_free_page instead of
>> out_mmput. Will rectify in newer version once we decide the extra places
>> to use killable version.
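
The label fix mentioned above (out_free_page instead of out_mmput) can
be illustrated with a small user-space model. This is only a sketch
under stated assumptions, not the kernel code: struct model_mm,
model_down_read_killable(), model_up_read(), model_cmdline_read() and
the pages_in_use counter are hypothetical stand-ins for the mm
reference, down_read_killable(), up_read(), the read path and the
scratch page. The one property taken from the real API is that
down_read_killable() returns 0 on success and -EINTR when a fatal
signal interrupts the wait.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct model_mm {
	int users;        /* models the reference taken by get_task_mm() */
	bool read_locked; /* models mmap_sem being held for read */
};

static int pages_in_use; /* live scratch pages, used to expose leaks */

/* Models down_read_killable(): 0 on success, -EINTR on a fatal signal. */
static int model_down_read_killable(struct model_mm *mm, bool fatal_signal)
{
	if (fatal_signal)
		return -EINTR;
	mm->read_locked = true;
	return 0;
}

/* Models up_read(); the lock must be held when this is called. */
static void model_up_read(struct model_mm *mm)
{
	assert(mm->read_locked);
	mm->read_locked = false;
}

/* Models the read path: take mm reference, allocate page, lock, unwind. */
static int model_cmdline_read(struct model_mm *mm, bool fatal_signal)
{
	char *page;
	int rv;

	mm->users++;                 /* stands in for get_task_mm() */

	page = malloc(4096);         /* stands in for the scratch page */
	if (!page) {
		rv = -ENOMEM;
		goto out_mmput;      /* nothing to free yet */
	}
	pages_in_use++;

	rv = model_down_read_killable(mm, fatal_signal);
	if (rv)
		goto out_free_page;  /* page exists, so it must be freed */

	/* ... the four arg/env values would be snapshotted here ... */
	model_up_read(mm);

out_free_page:
	free(page);
	pages_in_use--;
out_mmput:
	mm->users--;                 /* stands in for mmput() */
	return rv;
}
```

Calling model_cmdline_read(&mm, true) models a reader that is killed
while waiting: it returns -EINTR, and both pages_in_use and mm.users
drop back to their starting values. Jumping to out_mmput on that path
instead would skip free() and leave pages_in_use at 1, which is the
leak the label change avoids.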
