* [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
@ 2018-02-20 19:49 Yang Shi
2018-02-20 22:38 ` Alexey Dobriyan
2018-02-20 23:57 ` Yang Shi
0 siblings, 2 replies; 11+ messages in thread
From: Yang Shi @ 2018-02-20 19:49 UTC (permalink / raw)
To: akpm, mingo, adobriyan; +Cc: yang.shi, linux-kernel
When running vm-scalability with large memory (> 300GB), the below hung
task issue happens occasionally.
INFO: task ps:14018 blocked for more than 120 seconds.
Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ps D 0 14018 1 0x00000004
ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
Call Trace:
[<ffffffff817154d0>] ? __schedule+0x250/0x730
[<ffffffff817159e6>] schedule+0x36/0x80
[<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
[<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
[<ffffffff81717db0>] down_read+0x20/0x40
[<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
[<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
[<ffffffff81241d87>] __vfs_read+0x37/0x150
[<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
[<ffffffff81242266>] vfs_read+0x96/0x130
[<ffffffff812437b5>] SyS_read+0x55/0xc0
[<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
When manipulating a large mapping, the process may hold the mmap_sem for
long time, so reading /proc/<pid>/cmdline may be blocked in
uninterruptible state for long time.
down_read_trylock() sounds too aggressive, and we already have killable
version APIs for semaphore, here use down_read_killable() to improve the
responsiveness.
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
---
fs/proc/base.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9298324..44b6f8f 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -242,7 +242,9 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
goto out_mmput;
}
- down_read(&mm->mmap_sem);
+ rv = down_read_killable(&mm->mmap_sem);
+ if (rv)
+ goto out_mmput;
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-20 19:49 [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() Yang Shi
@ 2018-02-20 22:38 ` Alexey Dobriyan
2018-02-20 23:38 ` Yang Shi
2018-02-20 23:57 ` Yang Shi
1 sibling, 1 reply; 11+ messages in thread
From: Alexey Dobriyan @ 2018-02-20 22:38 UTC (permalink / raw)
To: Yang Shi; +Cc: akpm, mingo, linux-kernel
On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote:
> When running vm-scalability with large memory (> 300GB), the below hung
> task issue happens occasionally.
>
> INFO: task ps:14018 blocked for more than 120 seconds.
> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ps D 0 14018 1 0x00000004
> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
> Call Trace:
> [<ffffffff817154d0>] ? __schedule+0x250/0x730
> [<ffffffff817159e6>] schedule+0x36/0x80
> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
> [<ffffffff81717db0>] down_read+0x20/0x40
> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
> [<ffffffff81241d87>] __vfs_read+0x37/0x150
> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
> [<ffffffff81242266>] vfs_read+0x96/0x130
> [<ffffffff812437b5>] SyS_read+0x55/0xc0
> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
>
> When manipulating a large mapping, the process may hold the mmap_sem for
> long time, so reading /proc/<pid>/cmdline may be blocked in
> uninterruptible state for long time.
>
> down_read_trylock() sounds too aggressive, and we already have killable
> version APIs for semaphore, here use down_read_killable() to improve the
> responsiveness.
> - down_read(&mm->mmap_sem);
> + rv = down_read_killable(&mm->mmap_sem);
> + if (rv)
> + goto out_mmput;
> arg_start = mm->arg_start;
> arg_end = mm->arg_end;
> env_start = mm->env_start;
Fix is incomplete
1) /proc/*/cmdline only wants to read 4 values atomically,
those 4 values are basically random values and aren't
related to VM part at all (after C/R went in, they are
settable to arbitrary values)
2) access_remote_vm() et al will do the same ->mmap_sem, and
3) /proc/*/environ and get_cmdline() do the same.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-20 22:38 ` Alexey Dobriyan
@ 2018-02-20 23:38 ` Yang Shi
2018-02-21 19:57 ` Alexey Dobriyan
0 siblings, 1 reply; 11+ messages in thread
From: Yang Shi @ 2018-02-20 23:38 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel
On 2/20/18 2:38 PM, Alexey Dobriyan wrote:
> On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote:
>> When running vm-scalability with large memory (> 300GB), the below hung
>> task issue happens occasionally.
>>
>> INFO: task ps:14018 blocked for more than 120 seconds.
>> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> ps D 0 14018 1 0x00000004
>> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
>> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
>> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
>> Call Trace:
>> [<ffffffff817154d0>] ? __schedule+0x250/0x730
>> [<ffffffff817159e6>] schedule+0x36/0x80
>> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
>> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
>> [<ffffffff81717db0>] down_read+0x20/0x40
>> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
>> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
>> [<ffffffff81241d87>] __vfs_read+0x37/0x150
>> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
>> [<ffffffff81242266>] vfs_read+0x96/0x130
>> [<ffffffff812437b5>] SyS_read+0x55/0xc0
>> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
>>
>> When manipulating a large mapping, the process may hold the mmap_sem for
>> long time, so reading /proc/<pid>/cmdline may be blocked in
>> uninterruptible state for long time.
>>
>> down_read_trylock() sounds too aggressive, and we already have killable
>> version APIs for semaphore, here use down_read_killable() to improve the
>> responsiveness.
>> - down_read(&mm->mmap_sem);
>> + rv = down_read_killable(&mm->mmap_sem);
>> + if (rv)
>> + goto out_mmput;
>> arg_start = mm->arg_start;
>> arg_end = mm->arg_end;
>> env_start = mm->env_start;
> Fix is incomplete
Yes, it is. Since I just ran into the above splat, so I just did the
minimum change.
> 1) /proc/*/cmdline only wants to read 4 values atomically,
> those 4 values are basically random values and aren't
> related to VM part at all (after C/R went in, they are
> settable to arbitrary values)
Sorry, I don't get your point here. Could you please elaborate?
> 2) access_remote_vm() et al will do the same ->mmap_sem, and
Yes, it does. But, __access_remote_vm() is called by access_process_vm()
too, which is used by much more places, i.e. ptrace, so I was not sure
if it is preferred to convert to killable version. So, I leave it untouched.
> 3) /proc/*/environ and get_cmdline() do the same.
They look suitable to use killable version.
BTW, I just realized the code should go to out_free_page instead of
out_mmput. Will rectify in newer version once we decide the extra places
to use killable version.
Thanks,
Yang
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-20 19:49 [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() Yang Shi
2018-02-20 22:38 ` Alexey Dobriyan
@ 2018-02-20 23:57 ` Yang Shi
1 sibling, 0 replies; 11+ messages in thread
From: Yang Shi @ 2018-02-20 23:57 UTC (permalink / raw)
To: akpm, mingo, adobriyan; +Cc: linux-kernel
On 2/20/18 11:49 AM, Yang Shi wrote:
> When running vm-scalability with large memory (> 300GB), the below hung
> task issue happens occasionally.
>
> INFO: task ps:14018 blocked for more than 120 seconds.
> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ps D 0 14018 1 0x00000004
> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
> Call Trace:
> [<ffffffff817154d0>] ? __schedule+0x250/0x730
> [<ffffffff817159e6>] schedule+0x36/0x80
> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
> [<ffffffff81717db0>] down_read+0x20/0x40
> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
> [<ffffffff81241d87>] __vfs_read+0x37/0x150
> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
> [<ffffffff81242266>] vfs_read+0x96/0x130
> [<ffffffff812437b5>] SyS_read+0x55/0xc0
> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
>
> When manipulating a large mapping, the process may hold the mmap_sem for
> long time, so reading /proc/<pid>/cmdline may be blocked in
> uninterruptible state for long time.
>
> down_read_trylock() sounds too aggressive, and we already have killable
> version APIs for semaphore, here use down_read_killable() to improve the
> responsiveness.
>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> ---
> fs/proc/base.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 9298324..44b6f8f 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -242,7 +242,9 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
> goto out_mmput;
> }
>
> - down_read(&mm->mmap_sem);
> + rv = down_read_killable(&mm->mmap_sem);
> + if (rv)
> + goto out_mmput;
> arg_start = mm->arg_start;
> arg_end = mm->arg_end;
> env_start = mm->env_start;
Should goto out_free_page, amended patch as below.
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9298324..057e6bd 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -242,7 +242,9 @@ static ssize_t proc_pid_cmdline_read(struct file
*file, char __user *buf,
goto out_mmput;
}
- down_read(&mm->mmap_sem);
+ rv = down_read_killable(&mm->mmap_sem);
+ if (rv)
+ goto out_free_page;
arg_start = mm->arg_start;
arg_end = mm->arg_end;
env_start = mm->env_start;
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-20 23:38 ` Yang Shi
@ 2018-02-21 19:57 ` Alexey Dobriyan
2018-02-21 23:13 ` Yang Shi
2018-02-23 18:28 ` Yang Shi
0 siblings, 2 replies; 11+ messages in thread
From: Alexey Dobriyan @ 2018-02-21 19:57 UTC (permalink / raw)
To: Yang Shi; +Cc: akpm, mingo, linux-kernel
On Tue, Feb 20, 2018 at 03:38:24PM -0800, Yang Shi wrote:
>
>
> On 2/20/18 2:38 PM, Alexey Dobriyan wrote:
> > On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote:
> >> When running vm-scalability with large memory (> 300GB), the below hung
> >> task issue happens occasionally.
> >>
> >> INFO: task ps:14018 blocked for more than 120 seconds.
> >> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
> >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> ps D 0 14018 1 0x00000004
> >> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
> >> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
> >> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
> >> Call Trace:
> >> [<ffffffff817154d0>] ? __schedule+0x250/0x730
> >> [<ffffffff817159e6>] schedule+0x36/0x80
> >> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
> >> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
> >> [<ffffffff81717db0>] down_read+0x20/0x40
> >> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
> >> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
> >> [<ffffffff81241d87>] __vfs_read+0x37/0x150
> >> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
> >> [<ffffffff81242266>] vfs_read+0x96/0x130
> >> [<ffffffff812437b5>] SyS_read+0x55/0xc0
> >> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
> >>
> >> When manipulating a large mapping, the process may hold the mmap_sem for
> >> long time, so reading /proc/<pid>/cmdline may be blocked in
> >> uninterruptible state for long time.
> >>
> >> down_read_trylock() sounds too aggressive, and we already have killable
> >> version APIs for semaphore, here use down_read_killable() to improve the
> >> responsiveness.
> >> - down_read(&mm->mmap_sem);
> >> + rv = down_read_killable(&mm->mmap_sem);
> >> + if (rv)
> >> + goto out_mmput;
> >> arg_start = mm->arg_start;
> >> arg_end = mm->arg_end;
> >> env_start = mm->env_start;
> > Fix is incomplete
>
> Yes, it is. Since I just ran into the above splat, so I just did the
> minimum change.
>
> > 1) /proc/*/cmdline only wants to read 4 values atomically,
> > those 4 values are basically random values and aren't
> > related to VM part at all (after C/R went in, they are
> > settable to arbitrary values)
>
> Sorry, I don't get your point here. Could you please elaborate?
I hoped there is some random spinlock those 4 values could be moved to
but no.
> > 2) access_remote_vm() et al will do the same ->mmap_sem, and
>
> Yes, it does. But, __access_remote_vm() is called by access_process_vm()
> too, which is used by much more places, i.e. ptrace, so I was not sure
> if it is preferred to convert to killable version. So, I leave it untouched.
Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read
and your scalability tests should trigger next backtrace right away.
> > 3) /proc/*/environ and get_cmdline() do the same.
>
> They look suitable to use killable version.
>
> BTW, I just realized the code should go to out_free_page instead of
> out_mmput. Will rectify in newer version once we decide the extra places
> to use killable version.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-21 19:57 ` Alexey Dobriyan
@ 2018-02-21 23:13 ` Yang Shi
2018-02-23 19:33 ` Alexey Dobriyan
2018-02-23 18:28 ` Yang Shi
1 sibling, 1 reply; 11+ messages in thread
From: Yang Shi @ 2018-02-21 23:13 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel
On 2/21/18 11:57 AM, Alexey Dobriyan wrote:
> On Tue, Feb 20, 2018 at 03:38:24PM -0800, Yang Shi wrote:
>>
>> On 2/20/18 2:38 PM, Alexey Dobriyan wrote:
>>> On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote:
>>>> When running vm-scalability with large memory (> 300GB), the below hung
>>>> task issue happens occasionally.
>>>>
>>>> INFO: task ps:14018 blocked for more than 120 seconds.
>>>> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> ps D 0 14018 1 0x00000004
>>>> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
>>>> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
>>>> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
>>>> Call Trace:
>>>> [<ffffffff817154d0>] ? __schedule+0x250/0x730
>>>> [<ffffffff817159e6>] schedule+0x36/0x80
>>>> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
>>>> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
>>>> [<ffffffff81717db0>] down_read+0x20/0x40
>>>> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
>>>> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
>>>> [<ffffffff81241d87>] __vfs_read+0x37/0x150
>>>> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
>>>> [<ffffffff81242266>] vfs_read+0x96/0x130
>>>> [<ffffffff812437b5>] SyS_read+0x55/0xc0
>>>> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
>>>>
>>>> When manipulating a large mapping, the process may hold the mmap_sem for
>>>> long time, so reading /proc/<pid>/cmdline may be blocked in
>>>> uninterruptible state for long time.
>>>>
>>>> down_read_trylock() sounds too aggressive, and we already have killable
>>>> version APIs for semaphore, here use down_read_killable() to improve the
>>>> responsiveness.
>>>> - down_read(&mm->mmap_sem);
>>>> + rv = down_read_killable(&mm->mmap_sem);
>>>> + if (rv)
>>>> + goto out_mmput;
>>>> arg_start = mm->arg_start;
>>>> arg_end = mm->arg_end;
>>>> env_start = mm->env_start;
>>> Fix is incomplete
>> Yes, it is. Since I just ran into the above splat, so I just did the
>> minimum change.
>>
>>> 1) /proc/*/cmdline only wants to read 4 values atomically,
>>> those 4 values are basically random values and aren't
>>> related to VM part at all (after C/R went in, they are
>>> settable to arbitrary values)
>> Sorry, I don't get your point here. Could you please elaborate?
> I hoped there is some random spinlock those 4 values could be moved to
> but no.
Those 4 values might be changed by prctl_set_mm concurrently if
mm->mmap_sem is not acquired.
>
>>> 2) access_remote_vm() et al will do the same ->mmap_sem, and
>> Yes, it does. But, __access_remote_vm() is called by access_process_vm()
>> too, which is used by much more places, i.e. ptrace, so I was not sure
>> if it is preferred to convert to killable version. So, I leave it untouched.
> Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read
> and your scalability tests should trigger next backtrace right away.
Yes, however, I didn't run into it if mmap_sem is acquired earlier.
How about defining a killable version, like
__access_remote_vm_killable() which use down_read_killable(), then the
killable version can be used by proc/*/cmdline? There might be other
users in the future.
Thanks,
Yang
>
>>> 3) /proc/*/environ and get_cmdline() do the same.
>> They look suitable to use killable version.
>>
>> BTW, I just realized the code should go to out_free_page instead of
>> out_mmput. Will rectify in newer version once we decide the extra places
>> to use killable version.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-21 19:57 ` Alexey Dobriyan
2018-02-21 23:13 ` Yang Shi
@ 2018-02-23 18:28 ` Yang Shi
1 sibling, 0 replies; 11+ messages in thread
From: Yang Shi @ 2018-02-23 18:28 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel
On 2/21/18 11:57 AM, Alexey Dobriyan wrote:
> On Tue, Feb 20, 2018 at 03:38:24PM -0800, Yang Shi wrote:
>>
>> On 2/20/18 2:38 PM, Alexey Dobriyan wrote:
>>> On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote:
>>>> When running vm-scalability with large memory (> 300GB), the below hung
>>>> task issue happens occasionally.
>>>>
>>>> INFO: task ps:14018 blocked for more than 120 seconds.
>>>> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> ps D 0 14018 1 0x00000004
>>>> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
>>>> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
>>>> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
>>>> Call Trace:
>>>> [<ffffffff817154d0>] ? __schedule+0x250/0x730
>>>> [<ffffffff817159e6>] schedule+0x36/0x80
>>>> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
>>>> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
>>>> [<ffffffff81717db0>] down_read+0x20/0x40
>>>> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
>>>> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
>>>> [<ffffffff81241d87>] __vfs_read+0x37/0x150
>>>> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
>>>> [<ffffffff81242266>] vfs_read+0x96/0x130
>>>> [<ffffffff812437b5>] SyS_read+0x55/0xc0
>>>> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
>>>>
>>>> When manipulating a large mapping, the process may hold the mmap_sem for
>>>> long time, so reading /proc/<pid>/cmdline may be blocked in
>>>> uninterruptible state for long time.
>>>>
>>>> down_read_trylock() sounds too aggressive, and we already have killable
>>>> version APIs for semaphore, here use down_read_killable() to improve the
>>>> responsiveness.
>>>> - down_read(&mm->mmap_sem);
>>>> + rv = down_read_killable(&mm->mmap_sem);
>>>> + if (rv)
>>>> + goto out_mmput;
>>>> arg_start = mm->arg_start;
>>>> arg_end = mm->arg_end;
>>>> env_start = mm->env_start;
>>> Fix is incomplete
>> Yes, it is. Since I just ran into the above splat, so I just did the
>> minimum change.
>>
>>> 1) /proc/*/cmdline only wants to read 4 values atomically,
>>> those 4 values are basically random values and aren't
>>> related to VM part at all (after C/R went in, they are
>>> settable to arbitrary values)
>> Sorry, I don't get your point here. Could you please elaborate?
> I hoped there is some random spinlock those 4 values could be moved to
> but no.
>
>>> 2) access_remote_vm() et al will do the same ->mmap_sem, and
>> Yes, it does. But, __access_remote_vm() is called by access_process_vm()
>> too, which is used by much more places, i.e. ptrace, so I was not sure
>> if it is preferred to convert to killable version. So, I leave it untouched.
> Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read
> and your scalability tests should trigger next backtrace right away.
>
>>> 3) /proc/*/environ and get_cmdline() do the same.
>> They look suitable to use killable version.
Had a look at get_cmdline() further. It looks it is not necessary to
convert to killable since it is called mainly by audit_exit(), which is
called when process or syscall exit, so it might be preferred to wait
until the mmap_sem is released. So, it sounds pointless to use
down_read_killable().
Thanks,
Yang
>>
>> BTW, I just realized the code should go to out_free_page instead of
>> out_mmput. Will rectify in newer version once we decide the extra places
>> to use killable version.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-21 23:13 ` Yang Shi
@ 2018-02-23 19:33 ` Alexey Dobriyan
2018-02-23 19:42 ` Yang Shi
0 siblings, 1 reply; 11+ messages in thread
From: Alexey Dobriyan @ 2018-02-23 19:33 UTC (permalink / raw)
To: Yang Shi; +Cc: akpm, mingo, linux-kernel
On Wed, Feb 21, 2018 at 03:13:10PM -0800, Yang Shi wrote:
> >>> 2) access_remote_vm() et al will do the same ->mmap_sem, and
> >> Yes, it does. But, __access_remote_vm() is called by access_process_vm()
> >> too, which is used by much more places, i.e. ptrace, so I was not sure
> >> if it is preferred to convert to killable version. So, I leave it untouched.
> > Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read
> > and your scalability tests should trigger next backtrace right away.
>
> Yes, however, I didn't run into it if mmap_sem is acquired earlier.
>
> How about defining a killable version, like
> __access_remote_vm_killable() which use down_read_killable(), then the
> killable version can be used by proc/*/cmdline? There might be other
> users in the future.
It would be a disaster as interfaces multiply.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-23 19:33 ` Alexey Dobriyan
@ 2018-02-23 19:42 ` Yang Shi
2018-02-23 19:45 ` Alexey Dobriyan
0 siblings, 1 reply; 11+ messages in thread
From: Yang Shi @ 2018-02-23 19:42 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel
On 2/23/18 11:33 AM, Alexey Dobriyan wrote:
> On Wed, Feb 21, 2018 at 03:13:10PM -0800, Yang Shi wrote:
>
>>>>> 2) access_remote_vm() et al will do the same ->mmap_sem, and
>>>> Yes, it does. But, __access_remote_vm() is called by access_process_vm()
>>>> too, which is used by much more places, i.e. ptrace, so I was not sure
>>>> if it is preferred to convert to killable version. So, I leave it untouched.
>>> Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read
>>> and your scalability tests should trigger next backtrace right away.
>> Yes, however, I didn't run into it if mmap_sem is acquired earlier.
>>
>> How about defining a killable version, like
>> __access_remote_vm_killable() which use down_read_killable(), then the
>> killable version can be used by proc/*/cmdline? There might be other
>> users in the future.
> It would be a disaster as interfaces multiply.
Might be not that bad. Just need add:
__access_remote_vm_killable()
access_remote_vm_killable()
Then access_process_vm() keep using __access_remote_vm(). And, use
access_remote_vm_killable() in fs/proc/base. And, it looks the
access_process_vm() calls in get_cmdline() can be changed to
access_remote_vm() (non killable version).
Yang
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-23 19:42 ` Yang Shi
@ 2018-02-23 19:45 ` Alexey Dobriyan
2018-02-23 20:08 ` Yang Shi
0 siblings, 1 reply; 11+ messages in thread
From: Alexey Dobriyan @ 2018-02-23 19:45 UTC (permalink / raw)
To: Yang Shi; +Cc: akpm, mingo, linux-kernel
On Fri, Feb 23, 2018 at 11:42:34AM -0800, Yang Shi wrote:
>
>
> On 2/23/18 11:33 AM, Alexey Dobriyan wrote:
> > On Wed, Feb 21, 2018 at 03:13:10PM -0800, Yang Shi wrote:
> >
> >>>>> 2) access_remote_vm() et al will do the same ->mmap_sem, and
> >>>> Yes, it does. But, __access_remote_vm() is called by access_process_vm()
> >>>> too, which is used by much more places, i.e. ptrace, so I was not sure
> >>>> if it is preferred to convert to killable version. So, I leave it untouched.
> >>> Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read
> >>> and your scalability tests should trigger next backtrace right away.
> >> Yes, however, I didn't run into it if mmap_sem is acquired earlier.
> >>
> >> How about defining a killable version, like
> >> __access_remote_vm_killable() which use down_read_killable(), then the
> >> killable version can be used by proc/*/cmdline? There might be other
> >> users in the future.
> > It would be a disaster as interfaces multiply.
>
> Might be not that bad.
Maybe.
But you need to explain why there is no backtrace several lines later:
access_remote_vm
__access_remote_vm
down_read(&mm->mmap_sem)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read()
2018-02-23 19:45 ` Alexey Dobriyan
@ 2018-02-23 20:08 ` Yang Shi
0 siblings, 0 replies; 11+ messages in thread
From: Yang Shi @ 2018-02-23 20:08 UTC (permalink / raw)
To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel
On 2/23/18 11:45 AM, Alexey Dobriyan wrote:
> On Fri, Feb 23, 2018 at 11:42:34AM -0800, Yang Shi wrote:
>>
>> On 2/23/18 11:33 AM, Alexey Dobriyan wrote:
>>> On Wed, Feb 21, 2018 at 03:13:10PM -0800, Yang Shi wrote:
>>>
>>>>>>> 2) access_remote_vm() et al will do the same ->mmap_sem, and
>>>>>> Yes, it does. But, __access_remote_vm() is called by access_process_vm()
>>>>>> too, which is used by much more places, i.e. ptrace, so I was not sure
>>>>>> if it is preferred to convert to killable version. So, I leave it untouched.
>>>>> Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read
>>>>> and your scalability tests should trigger next backtrace right away.
>>>> Yes, however, I didn't run into it if mmap_sem is acquired earlier.
>>>>
>>>> How about defining a killable version, like
>>>> __access_remote_vm_killable() which use down_read_killable(), then the
>>>> killable version can be used by proc/*/cmdline? There might be other
>>>> users in the future.
>>> It would be a disaster as interfaces multiply.
>> Might be not that bad.
> Maybe.
>
> But you need to explain why there is no backtrace several lines later:
>
> access_remote_vm
> __access_remote_vm
> down_read(&mm->mmap_sem)
I think it might be because:
CPU A CPU B
read /proc/*/cmdline
get_mm
acquire mmap_sem
munmap(300G) try to acquire mmap_sem --> go to sleep
release mmap_sem
got mmap_sem
release mmap_sem
access_remote_vm
put_mm
The munmap might happen right before access_remote_vm(), but I just
didn't run into it for the time being. It may be hit on another machine
or with some changes to the test cases.
BTW, even the hung I met happened occassionally, not very often. So, the
access_remote_vm() hit sounds less often. But, I agree it is still
possible in theory.
Regards,
Yang
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-02-23 20:08 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-20 19:49 [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() Yang Shi
2018-02-20 22:38 ` Alexey Dobriyan
2018-02-20 23:38 ` Yang Shi
2018-02-21 19:57 ` Alexey Dobriyan
2018-02-21 23:13 ` Yang Shi
2018-02-23 19:33 ` Alexey Dobriyan
2018-02-23 19:42 ` Yang Shi
2018-02-23 19:45 ` Alexey Dobriyan
2018-02-23 20:08 ` Yang Shi
2018-02-23 18:28 ` Yang Shi
2018-02-20 23:57 ` Yang Shi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.