* [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() @ 2018-02-20 19:49 Yang Shi 2018-02-20 22:38 ` Alexey Dobriyan 2018-02-20 23:57 ` Yang Shi 0 siblings, 2 replies; 11+ messages in thread From: Yang Shi @ 2018-02-20 19:49 UTC (permalink / raw) To: akpm, mingo, adobriyan; +Cc: yang.shi, linux-kernel When running vm-scalability with large memory (> 300GB), the below hung task issue happens occasionally. INFO: task ps:14018 blocked for more than 120 seconds. Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ps D 0 14018 1 0x00000004 ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0 ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000 Call Trace: [<ffffffff817154d0>] ? __schedule+0x250/0x730 [<ffffffff817159e6>] schedule+0x36/0x80 [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150 [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30 [<ffffffff81717db0>] down_read+0x20/0x40 [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0 [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100 [<ffffffff81241d87>] __vfs_read+0x37/0x150 [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0 [<ffffffff81242266>] vfs_read+0x96/0x130 [<ffffffff812437b5>] SyS_read+0x55/0xc0 [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5 When manipulating a large mapping, the process may hold the mmap_sem for long time, so reading /proc/<pid>/cmdline may be blocked in uninterruptible state for long time. down_read_trylock() sounds too aggressive, and we already have killable version APIs for semaphore, here use down_read_killable() to improve the responsiveness. Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> --- fs/proc/base.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/proc/base.c b/fs/proc/base.c index 9298324..44b6f8f 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -242,7 +242,9 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf, goto out_mmput; } - down_read(&mm->mmap_sem); + rv = down_read_killable(&mm->mmap_sem); + if (rv) + goto out_mmput; arg_start = mm->arg_start; arg_end = mm->arg_end; env_start = mm->env_start; -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-20 19:49 [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() Yang Shi @ 2018-02-20 22:38 ` Alexey Dobriyan 2018-02-20 23:38 ` Yang Shi 2018-02-20 23:57 ` Yang Shi 1 sibling, 1 reply; 11+ messages in thread From: Alexey Dobriyan @ 2018-02-20 22:38 UTC (permalink / raw) To: Yang Shi; +Cc: akpm, mingo, linux-kernel On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote: > When running vm-scalability with large memory (> 300GB), the below hung > task issue happens occasionally. > > INFO: task ps:14018 blocked for more than 120 seconds. > Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > ps D 0 14018 1 0x00000004 > ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0 > ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040 > 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000 > Call Trace: > [<ffffffff817154d0>] ? __schedule+0x250/0x730 > [<ffffffff817159e6>] schedule+0x36/0x80 > [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150 > [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30 > [<ffffffff81717db0>] down_read+0x20/0x40 > [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0 > [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100 > [<ffffffff81241d87>] __vfs_read+0x37/0x150 > [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0 > [<ffffffff81242266>] vfs_read+0x96/0x130 > [<ffffffff812437b5>] SyS_read+0x55/0xc0 > [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5 > > When manipulating a large mapping, the process may hold the mmap_sem for > long time, so reading /proc/<pid>/cmdline may be blocked in > uninterruptible state for long time. > > down_read_trylock() sounds too aggressive, and we already have killable > version APIs for semaphore, here use down_read_killable() to improve the > responsiveness. > - down_read(&mm->mmap_sem); > + rv = down_read_killable(&mm->mmap_sem); > + if (rv) > + goto out_mmput; > arg_start = mm->arg_start; > arg_end = mm->arg_end; > env_start = mm->env_start; Fix is incomplete 1) /proc/*/cmdline only wants to read 4 values atomically, those 4 values are basically random values and aren't related to VM part at all (after C/R went in, they are settable to arbitrary values) 2) access_remote_vm() et al will do the same ->mmap_sem, and 3) /proc/*/environ and get_cmdline() do the same. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-20 22:38 ` Alexey Dobriyan @ 2018-02-20 23:38 ` Yang Shi 2018-02-21 19:57 ` Alexey Dobriyan 0 siblings, 1 reply; 11+ messages in thread From: Yang Shi @ 2018-02-20 23:38 UTC (permalink / raw) To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel On 2/20/18 2:38 PM, Alexey Dobriyan wrote: > On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote: >> When running vm-scalability with large memory (> 300GB), the below hung >> task issue happens occasionally. >> >> INFO: task ps:14018 blocked for more than 120 seconds. >> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1 >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >> ps D 0 14018 1 0x00000004 >> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0 >> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040 >> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000 >> Call Trace: >> [<ffffffff817154d0>] ? __schedule+0x250/0x730 >> [<ffffffff817159e6>] schedule+0x36/0x80 >> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150 >> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30 >> [<ffffffff81717db0>] down_read+0x20/0x40 >> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0 >> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100 >> [<ffffffff81241d87>] __vfs_read+0x37/0x150 >> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0 >> [<ffffffff81242266>] vfs_read+0x96/0x130 >> [<ffffffff812437b5>] SyS_read+0x55/0xc0 >> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5 >> >> When manipulating a large mapping, the process may hold the mmap_sem for >> long time, so reading /proc/<pid>/cmdline may be blocked in >> uninterruptible state for long time. >> >> down_read_trylock() sounds too aggressive, and we already have killable >> version APIs for semaphore, here use down_read_killable() to improve the >> responsiveness. >> - down_read(&mm->mmap_sem); >> + rv = down_read_killable(&mm->mmap_sem); >> + if (rv) >> + goto out_mmput; >> arg_start = mm->arg_start; >> arg_end = mm->arg_end; >> env_start = mm->env_start; > Fix is incomplete Yes, it is. Since I just ran into the above splat, so I just did the minimum change. > 1) /proc/*/cmdline only wants to read 4 values atomically, > those 4 values are basically random values and aren't > related to VM part at all (after C/R went in, they are > settable to arbitrary values) Sorry, I don't get your point here. Could you please elaborate? > 2) access_remote_vm() et al will do the same ->mmap_sem, and Yes, it does. But, __access_remote_vm() is called by access_process_vm() too, which is used by much more places, i.e. ptrace, so I was not sure if it is preferred to convert to killable version. So, I leave it untouched. > 3) /proc/*/environ and get_cmdline() do the same. They look suitable to use killable version. BTW, I just realized the code should go to out_free_page instead of out_mmput. Will rectify in newer version once we decide the extra places to use killable version. Thanks, Yang ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-20 23:38 ` Yang Shi @ 2018-02-21 19:57 ` Alexey Dobriyan 2018-02-21 23:13 ` Yang Shi 2018-02-23 18:28 ` Yang Shi 0 siblings, 2 replies; 11+ messages in thread From: Alexey Dobriyan @ 2018-02-21 19:57 UTC (permalink / raw) To: Yang Shi; +Cc: akpm, mingo, linux-kernel On Tue, Feb 20, 2018 at 03:38:24PM -0800, Yang Shi wrote: > > > On 2/20/18 2:38 PM, Alexey Dobriyan wrote: > > On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote: > >> When running vm-scalability with large memory (> 300GB), the below hung > >> task issue happens occasionally. > >> > >> INFO: task ps:14018 blocked for more than 120 seconds. > >> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1 > >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> ps D 0 14018 1 0x00000004 > >> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0 > >> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040 > >> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000 > >> Call Trace: > >> [<ffffffff817154d0>] ? __schedule+0x250/0x730 > >> [<ffffffff817159e6>] schedule+0x36/0x80 > >> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150 > >> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30 > >> [<ffffffff81717db0>] down_read+0x20/0x40 > >> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0 > >> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100 > >> [<ffffffff81241d87>] __vfs_read+0x37/0x150 > >> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0 > >> [<ffffffff81242266>] vfs_read+0x96/0x130 > >> [<ffffffff812437b5>] SyS_read+0x55/0xc0 > >> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5 > >> > >> When manipulating a large mapping, the process may hold the mmap_sem for > >> long time, so reading /proc/<pid>/cmdline may be blocked in > >> uninterruptible state for long time. > >> > >> down_read_trylock() sounds too aggressive, and we already have killable > >> version APIs for semaphore, here use down_read_killable() to improve the > >> responsiveness. > >> - down_read(&mm->mmap_sem); > >> + rv = down_read_killable(&mm->mmap_sem); > >> + if (rv) > >> + goto out_mmput; > >> arg_start = mm->arg_start; > >> arg_end = mm->arg_end; > >> env_start = mm->env_start; > > Fix is incomplete > > Yes, it is. Since I just ran into the above splat, so I just did the > minimum change. > > > 1) /proc/*/cmdline only wants to read 4 values atomically, > > those 4 values are basically random values and aren't > > related to VM part at all (after C/R went in, they are > > settable to arbitrary values) > > Sorry, I don't get your point here. Could you please elaborate? I hoped there is some random spinlock those 4 values could be moved to but no. > > 2) access_remote_vm() et al will do the same ->mmap_sem, and > > Yes, it does. But, __access_remote_vm() is called by access_process_vm() > too, which is used by much more places, i.e. ptrace, so I was not sure > if it is preferred to convert to killable version. So, I leave it untouched. Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read and your scalability tests should trigger next backtrace right away. > > 3) /proc/*/environ and get_cmdline() do the same. > > They look suitable to use killable version. > > BTW, I just realized the code should go to out_free_page instead of > out_mmput. Will rectify in newer version once we decide the extra places > to use killable version. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-21 19:57 ` Alexey Dobriyan @ 2018-02-21 23:13 ` Yang Shi 2018-02-23 19:33 ` Alexey Dobriyan 2018-02-23 18:28 ` Yang Shi 1 sibling, 1 reply; 11+ messages in thread From: Yang Shi @ 2018-02-21 23:13 UTC (permalink / raw) To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel On 2/21/18 11:57 AM, Alexey Dobriyan wrote: > On Tue, Feb 20, 2018 at 03:38:24PM -0800, Yang Shi wrote: >> >> On 2/20/18 2:38 PM, Alexey Dobriyan wrote: >>> On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote: >>>> When running vm-scalability with large memory (> 300GB), the below hung >>>> task issue happens occasionally. >>>> >>>> INFO: task ps:14018 blocked for more than 120 seconds. >>>> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1 >>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>>> ps D 0 14018 1 0x00000004 >>>> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0 >>>> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040 >>>> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000 >>>> Call Trace: >>>> [<ffffffff817154d0>] ? __schedule+0x250/0x730 >>>> [<ffffffff817159e6>] schedule+0x36/0x80 >>>> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150 >>>> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30 >>>> [<ffffffff81717db0>] down_read+0x20/0x40 >>>> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0 >>>> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100 >>>> [<ffffffff81241d87>] __vfs_read+0x37/0x150 >>>> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0 >>>> [<ffffffff81242266>] vfs_read+0x96/0x130 >>>> [<ffffffff812437b5>] SyS_read+0x55/0xc0 >>>> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5 >>>> >>>> When manipulating a large mapping, the process may hold the mmap_sem for >>>> long time, so reading /proc/<pid>/cmdline may be blocked in >>>> uninterruptible state for long time. >>>> >>>> down_read_trylock() sounds too aggressive, and we already have killable >>>> version APIs for semaphore, here use down_read_killable() to improve the >>>> responsiveness. >>>> - down_read(&mm->mmap_sem); >>>> + rv = down_read_killable(&mm->mmap_sem); >>>> + if (rv) >>>> + goto out_mmput; >>>> arg_start = mm->arg_start; >>>> arg_end = mm->arg_end; >>>> env_start = mm->env_start; >>> Fix is incomplete >> Yes, it is. Since I just ran into the above splat, so I just did the >> minimum change. >> >>> 1) /proc/*/cmdline only wants to read 4 values atomically, >>> those 4 values are basically random values and aren't >>> related to VM part at all (after C/R went in, they are >>> settable to arbitrary values) >> Sorry, I don't get your point here. Could you please elaborate? > I hoped there is some random spinlock those 4 values could be moved to > but no. Those 4 values might be changed by prctl_set_mm concurrently if mm->mmap_sem is not acquired. > >>> 2) access_remote_vm() et al will do the same ->mmap_sem, and >> Yes, it does. But, __access_remote_vm() is called by access_process_vm() >> too, which is used by much more places, i.e. ptrace, so I was not sure >> if it is preferred to convert to killable version. So, I leave it untouched. > Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read > and your scalability tests should trigger next backtrace right away. Yes, however, I didn't run into it if mmap_sem is acquired earlier. How about defining a killable version, like __access_remote_vm_killable() which use down_read_killable(), then the killable version can be used by proc/*/cmdline? There might be other users in the future. Thanks, Yang > >>> 3) /proc/*/environ and get_cmdline() do the same. >> They look suitable to use killable version. >> >> BTW, I just realized the code should go to out_free_page instead of >> out_mmput. Will rectify in newer version once we decide the extra places >> to use killable version. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-21 23:13 ` Yang Shi @ 2018-02-23 19:33 ` Alexey Dobriyan 2018-02-23 19:42 ` Yang Shi 0 siblings, 1 reply; 11+ messages in thread From: Alexey Dobriyan @ 2018-02-23 19:33 UTC (permalink / raw) To: Yang Shi; +Cc: akpm, mingo, linux-kernel On Wed, Feb 21, 2018 at 03:13:10PM -0800, Yang Shi wrote: > >>> 2) access_remote_vm() et al will do the same ->mmap_sem, and > >> Yes, it does. But, __access_remote_vm() is called by access_process_vm() > >> too, which is used by much more places, i.e. ptrace, so I was not sure > >> if it is preferred to convert to killable version. So, I leave it untouched. > > Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read > > and your scalability tests should trigger next backtrace right away. > > Yes, however, I didn't run into it if mmap_sem is acquired earlier. > > How about defining a killable version, like > __access_remote_vm_killable() which use down_read_killable(), then the > killable version can be used by proc/*/cmdline? There might be other > users in the future. It would be a disaster as interfaces multiply. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-23 19:33 ` Alexey Dobriyan @ 2018-02-23 19:42 ` Yang Shi 2018-02-23 19:45 ` Alexey Dobriyan 0 siblings, 1 reply; 11+ messages in thread From: Yang Shi @ 2018-02-23 19:42 UTC (permalink / raw) To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel On 2/23/18 11:33 AM, Alexey Dobriyan wrote: > On Wed, Feb 21, 2018 at 03:13:10PM -0800, Yang Shi wrote: > >>>>> 2) access_remote_vm() et al will do the same ->mmap_sem, and >>>> Yes, it does. But, __access_remote_vm() is called by access_process_vm() >>>> too, which is used by much more places, i.e. ptrace, so I was not sure >>>> if it is preferred to convert to killable version. So, I leave it untouched. >>> Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read >>> and your scalability tests should trigger next backtrace right away. >> Yes, however, I didn't run into it if mmap_sem is acquired earlier. >> >> How about defining a killable version, like >> __access_remote_vm_killable() which use down_read_killable(), then the >> killable version can be used by proc/*/cmdline? There might be other >> users in the future. > It would be a disaster as interfaces multiply. Might be not that bad. Just need add: __access_remote_vm_killable() access_remote_vm_killable() Then access_process_vm() keep using __access_remote_vm(). And, use access_remote_vm_killable() in fs/proc/base. And, it looks the access_process_vm() calls in get_cmdline() can be changed to access_remote_vm() (non killable version). Yang ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-23 19:42 ` Yang Shi @ 2018-02-23 19:45 ` Alexey Dobriyan 2018-02-23 20:08 ` Yang Shi 0 siblings, 1 reply; 11+ messages in thread From: Alexey Dobriyan @ 2018-02-23 19:45 UTC (permalink / raw) To: Yang Shi; +Cc: akpm, mingo, linux-kernel On Fri, Feb 23, 2018 at 11:42:34AM -0800, Yang Shi wrote: > > > On 2/23/18 11:33 AM, Alexey Dobriyan wrote: > > On Wed, Feb 21, 2018 at 03:13:10PM -0800, Yang Shi wrote: > > > >>>>> 2) access_remote_vm() et al will do the same ->mmap_sem, and > >>>> Yes, it does. But, __access_remote_vm() is called by access_process_vm() > >>>> too, which is used by much more places, i.e. ptrace, so I was not sure > >>>> if it is preferred to convert to killable version. So, I leave it untouched. > >>> Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read > >>> and your scalability tests should trigger next backtrace right away. > >> Yes, however, I didn't run into it if mmap_sem is acquired earlier. > >> > >> How about defining a killable version, like > >> __access_remote_vm_killable() which use down_read_killable(), then the > >> killable version can be used by proc/*/cmdline? There might be other > >> users in the future. > > It would be a disaster as interfaces multiply. > > Might be not that bad. Maybe. But you need to explain why there is no backtrace several lines later: access_remote_vm __access_remote_vm down_read(&mm->mmap_sem) ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-23 19:45 ` Alexey Dobriyan @ 2018-02-23 20:08 ` Yang Shi 0 siblings, 0 replies; 11+ messages in thread From: Yang Shi @ 2018-02-23 20:08 UTC (permalink / raw) To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel On 2/23/18 11:45 AM, Alexey Dobriyan wrote: > On Fri, Feb 23, 2018 at 11:42:34AM -0800, Yang Shi wrote: >> >> On 2/23/18 11:33 AM, Alexey Dobriyan wrote: >>> On Wed, Feb 21, 2018 at 03:13:10PM -0800, Yang Shi wrote: >>> >>>>>>> 2) access_remote_vm() et al will do the same ->mmap_sem, and >>>>>> Yes, it does. But, __access_remote_vm() is called by access_process_vm() >>>>>> too, which is used by much more places, i.e. ptrace, so I was not sure >>>>>> if it is preferred to convert to killable version. So, I leave it untouched. >>>>> Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read >>>>> and your scalability tests should trigger next backtrace right away. >>>> Yes, however, I didn't run into it if mmap_sem is acquired earlier. >>>> >>>> How about defining a killable version, like >>>> __access_remote_vm_killable() which use down_read_killable(), then the >>>> killable version can be used by proc/*/cmdline? There might be other >>>> users in the future. >>> It would be a disaster as interfaces multiply. >> Might be not that bad. > Maybe. > > But you need to explain why there is no backtrace several lines later: > > access_remote_vm > __access_remote_vm > down_read(&mm->mmap_sem) I think it might be because: CPU A CPU B read /proc/*/cmdline get_mm acquire mmap_sem munmap(300G) try to acquire mmap_sem --> go to sleep release mmap_sem got mmap_sem release mmap_sem access_remote_vm put_mm The munmap might happen right before access_remote_vm(), but I just didn't run into it for the time being. It may be hit on another machine or with some changes to the test cases. BTW, even the hung I met happened occassionally, not very often. So, the access_remote_vm() hit sounds less often. But, I agree it is still possible in theory. Regards, Yang ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-21 19:57 ` Alexey Dobriyan 2018-02-21 23:13 ` Yang Shi @ 2018-02-23 18:28 ` Yang Shi 1 sibling, 0 replies; 11+ messages in thread From: Yang Shi @ 2018-02-23 18:28 UTC (permalink / raw) To: Alexey Dobriyan; +Cc: akpm, mingo, linux-kernel On 2/21/18 11:57 AM, Alexey Dobriyan wrote: > On Tue, Feb 20, 2018 at 03:38:24PM -0800, Yang Shi wrote: >> >> On 2/20/18 2:38 PM, Alexey Dobriyan wrote: >>> On Wed, Feb 21, 2018 at 03:49:29AM +0800, Yang Shi wrote: >>>> When running vm-scalability with large memory (> 300GB), the below hung >>>> task issue happens occasionally. >>>> >>>> INFO: task ps:14018 blocked for more than 120 seconds. >>>> Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1 >>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. >>>> ps D 0 14018 1 0x00000004 >>>> ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0 >>>> ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040 >>>> 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000 >>>> Call Trace: >>>> [<ffffffff817154d0>] ? __schedule+0x250/0x730 >>>> [<ffffffff817159e6>] schedule+0x36/0x80 >>>> [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150 >>>> [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30 >>>> [<ffffffff81717db0>] down_read+0x20/0x40 >>>> [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0 >>>> [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100 >>>> [<ffffffff81241d87>] __vfs_read+0x37/0x150 >>>> [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0 >>>> [<ffffffff81242266>] vfs_read+0x96/0x130 >>>> [<ffffffff812437b5>] SyS_read+0x55/0xc0 >>>> [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5 >>>> >>>> When manipulating a large mapping, the process may hold the mmap_sem for >>>> long time, so reading /proc/<pid>/cmdline may be blocked in >>>> uninterruptible state for long time. >>>> >>>> down_read_trylock() sounds too aggressive, and we already have killable >>>> version APIs for semaphore, here use down_read_killable() to improve the >>>> responsiveness. >>>> - down_read(&mm->mmap_sem); >>>> + rv = down_read_killable(&mm->mmap_sem); >>>> + if (rv) >>>> + goto out_mmput; >>>> arg_start = mm->arg_start; >>>> arg_end = mm->arg_end; >>>> env_start = mm->env_start; >>> Fix is incomplete >> Yes, it is. Since I just ran into the above splat, so I just did the >> minimum change. >> >>> 1) /proc/*/cmdline only wants to read 4 values atomically, >>> those 4 values are basically random values and aren't >>> related to VM part at all (after C/R went in, they are >>> settable to arbitrary values) >> Sorry, I don't get your point here. Could you please elaborate? > I hoped there is some random spinlock those 4 values could be moved to > but no. > >>> 2) access_remote_vm() et al will do the same ->mmap_sem, and >> Yes, it does. But, __access_remote_vm() is called by access_process_vm() >> too, which is used by much more places, i.e. ptrace, so I was not sure >> if it is preferred to convert to killable version. So, I leave it untouched. > Yeah, but ->mmap_sem is taken 3 times per /proc/*/cmdline read > and your scalability tests should trigger next backtrace right away. > >>> 3) /proc/*/environ and get_cmdline() do the same. >> They look suitable to use killable version. Had a look at get_cmdline() further. It looks it is not necessary to convert to killable since it is called mainly by audit_exit(), which is called when process or syscall exit, so it might be preferred to wait until the mmap_sem is released. So, it sounds pointless to use down_read_killable(). Thanks, Yang >> >> BTW, I just realized the code should go to out_free_page instead of >> out_mmput. Will rectify in newer version once we decide the extra places >> to use killable version. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() 2018-02-20 19:49 [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() Yang Shi 2018-02-20 22:38 ` Alexey Dobriyan @ 2018-02-20 23:57 ` Yang Shi 1 sibling, 0 replies; 11+ messages in thread From: Yang Shi @ 2018-02-20 23:57 UTC (permalink / raw) To: akpm, mingo, adobriyan; +Cc: linux-kernel On 2/20/18 11:49 AM, Yang Shi wrote: > When running vm-scalability with large memory (> 300GB), the below hung > task issue happens occasionally. > > INFO: task ps:14018 blocked for more than 120 seconds. > Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1 > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > ps D 0 14018 1 0x00000004 > ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0 > ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040 > 00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000 > Call Trace: > [<ffffffff817154d0>] ? __schedule+0x250/0x730 > [<ffffffff817159e6>] schedule+0x36/0x80 > [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150 > [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30 > [<ffffffff81717db0>] down_read+0x20/0x40 > [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0 > [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100 > [<ffffffff81241d87>] __vfs_read+0x37/0x150 > [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0 > [<ffffffff81242266>] vfs_read+0x96/0x130 > [<ffffffff812437b5>] SyS_read+0x55/0xc0 > [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5 > > When manipulating a large mapping, the process may hold the mmap_sem for > long time, so reading /proc/<pid>/cmdline may be blocked in > uninterruptible state for long time. > > down_read_trylock() sounds too aggressive, and we already have killable > version APIs for semaphore, here use down_read_killable() to improve the > responsiveness. > > Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> > Cc: Alexey Dobriyan <adobriyan@gmail.com> > --- > fs/proc/base.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/fs/proc/base.c b/fs/proc/base.c > index 9298324..44b6f8f 100644 > --- a/fs/proc/base.c > +++ b/fs/proc/base.c > @@ -242,7 +242,9 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf, > goto out_mmput; > } > > - down_read(&mm->mmap_sem); > + rv = down_read_killable(&mm->mmap_sem); > + if (rv) > + goto out_mmput; > arg_start = mm->arg_start; > arg_end = mm->arg_end; > env_start = mm->env_start; Should goto out_free_page, amended patch as below. diff --git a/fs/proc/base.c b/fs/proc/base.c index 9298324..057e6bd 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -242,7 +242,9 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf, goto out_mmput; } - down_read(&mm->mmap_sem); + rv = down_read_killable(&mm->mmap_sem); + if (rv) + goto out_free_page; arg_start = mm->arg_start; arg_end = mm->arg_end; env_start = mm->env_start; ^ permalink raw reply related [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-02-23 20:08 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-02-20 19:49 [PATCH] fs: proc: use down_read_killable in proc_pid_cmdline_read() Yang Shi 2018-02-20 22:38 ` Alexey Dobriyan 2018-02-20 23:38 ` Yang Shi 2018-02-21 19:57 ` Alexey Dobriyan 2018-02-21 23:13 ` Yang Shi 2018-02-23 19:33 ` Alexey Dobriyan 2018-02-23 19:42 ` Yang Shi 2018-02-23 19:45 ` Alexey Dobriyan 2018-02-23 20:08 ` Yang Shi 2018-02-23 18:28 ` Yang Shi 2018-02-20 23:57 ` Yang Shi
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.