linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
@ 2018-03-26 18:20 Yang Shi
  2018-03-26 18:37 ` Matthew Wilcox
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Yang Shi @ 2018-03-26 18:20 UTC (permalink / raw)
  To: adobriyan, mhocko, willy, mguzik, gorcunov, akpm
  Cc: yang.shi, linux-mm, linux-kernel

mmap_sem is on the hot path of kernel, and it very contended, but it is
abused too. It is used to protect arg_start|end and evn_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
sense since those proc files just expect to read 4 values atomically and
not related to VM, they could be set to arbitrary values by C/R.

And, the mmap_sem contention may cause unexpected issue like below:

INFO: task ps:14018 blocked for more than 120 seconds.
       Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
 ps              D    0 14018      1 0x00000004
  ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
  ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
  00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
 Call Trace:
  [<ffffffff817154d0>] ? __schedule+0x250/0x730
  [<ffffffff817159e6>] schedule+0x36/0x80
  [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
  [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
  [<ffffffff81717db0>] down_read+0x20/0x40
  [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
  [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
  [<ffffffff81241d87>] __vfs_read+0x37/0x150
  [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
  [<ffffffff81242266>] vfs_read+0x96/0x130
  [<ffffffff812437b5>] SyS_read+0x55/0xc0
  [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5

Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
for them to mitigate the abuse of mmap_sem.

So, introduce a new spinlock in mm_struct to protect the concurrent access
to arg_start|end and env_start|end.

And, commit ddf1d398e517e660207e2c807f76a90df543a217 ("prctl: take mmap
sem for writing to protect against others") changed down_read to
down_write to avoid write race condition in prctl_set_mm(). Since we
already have dedicated lock to protect them, it is safe to change back
to down_read.

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
---
v1 --> v2:
* Use spinlock instead of rwlock per Mattew's suggestion
* Replace down_write to down_read in prctl_set_mm (see commit log for details)

 fs/proc/base.c           |  8 ++++----
 include/linux/mm_types.h |  2 ++
 kernel/fork.c            |  1 +
 kernel/sys.c             | 14 ++++++++++----
 mm/init-mm.c             |  1 +
 5 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 9298324..e0282b6 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -242,12 +242,12 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
 		goto out_mmput;
 	}
 
-	down_read(&mm->mmap_sem);
+	spin_lock(&mm->arg_lock);
 	arg_start = mm->arg_start;
 	arg_end = mm->arg_end;
 	env_start = mm->env_start;
 	env_end = mm->env_end;
-	up_read(&mm->mmap_sem);
+	spin_unlock(&mm->arg_lock);
 
 	BUG_ON(arg_start > arg_end);
 	BUG_ON(env_start > env_end);
@@ -929,10 +929,10 @@ static ssize_t environ_read(struct file *file, char __user *buf,
 	if (!mmget_not_zero(mm))
 		goto free;
 
-	down_read(&mm->mmap_sem);
+	spin_lock(&mm->arg_lock);
 	env_start = mm->env_start;
 	env_end = mm->env_end;
-	up_read(&mm->mmap_sem);
+	spin_unlock(&mm->arg_lock);
 
 	while (count > 0) {
 		size_t this_len, max_len;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fd1af6b..3be4588 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -413,6 +413,8 @@ struct mm_struct {
 	unsigned long def_flags;
 	unsigned long start_code, end_code, start_data, end_data;
 	unsigned long start_brk, brk, start_stack;
+
+	spinlock_t arg_lock; /* protect concurrent access to arg_* and env_* */
 	unsigned long arg_start, arg_end, env_start, env_end;
 
 	unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
diff --git a/kernel/fork.c b/kernel/fork.c
index e5d9d40..6540ae7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -898,6 +898,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm->pinned_vm = 0;
 	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
 	spin_lock_init(&mm->page_table_lock);
+	spin_lock_init(&mm->arg_lock);
 	mm_init_cpumask(mm);
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
diff --git a/kernel/sys.c b/kernel/sys.c
index f2289de..17bddd2 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 			return error;
 	}
 
-	down_write(&mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 
 	/*
 	 * We don't validate if these members are pointing to
@@ -1980,10 +1980,13 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	mm->start_brk	= prctl_map.start_brk;
 	mm->brk		= prctl_map.brk;
 	mm->start_stack	= prctl_map.start_stack;
+
+	spin_lock(&mm->arg_lock);
 	mm->arg_start	= prctl_map.arg_start;
 	mm->arg_end	= prctl_map.arg_end;
 	mm->env_start	= prctl_map.env_start;
 	mm->env_end	= prctl_map.env_end;
+	spin_unlock(&mm->arg_lock);
 
 	/*
 	 * Note this update of @saved_auxv is lockless thus
@@ -1996,7 +1999,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
 	if (prctl_map.auxv_size)
 		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
 
-	up_write(&mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	return 0;
 }
 #endif /* CONFIG_CHECKPOINT_RESTORE */
@@ -2063,7 +2066,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
 
 	error = -EINVAL;
 
-	down_write(&mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, addr);
 
 	prctl_map.start_code	= mm->start_code;
@@ -2149,14 +2152,17 @@ static int prctl_set_mm(int opt, unsigned long addr,
 	mm->start_brk	= prctl_map.start_brk;
 	mm->brk		= prctl_map.brk;
 	mm->start_stack	= prctl_map.start_stack;
+
+	spin_lock(&mm->arg_lock);
 	mm->arg_start	= prctl_map.arg_start;
 	mm->arg_end	= prctl_map.arg_end;
 	mm->env_start	= prctl_map.env_start;
 	mm->env_end	= prctl_map.env_end;
+	spin_unlock(&mm->arg_lock);
 
 	error = 0;
 out:
-	up_write(&mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	return error;
 }
 
diff --git a/mm/init-mm.c b/mm/init-mm.c
index f94d5d1..66cce4c 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -23,6 +23,7 @@ struct mm_struct init_mm = {
 	.mmap_sem	= __RWSEM_INITIALIZER(init_mm.mmap_sem),
 	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
+	.arg_lock	= __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.user_ns	= &init_user_ns,
 	INIT_MM_CONTEXT(init_mm)
 };
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 18:20 [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct Yang Shi
@ 2018-03-26 18:37 ` Matthew Wilcox
  2018-03-26 19:21   ` Cyrill Gorcunov
  2018-03-26 19:42 ` Mateusz Guzik
  2018-03-27  6:29 ` Michal Hocko
  2 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2018-03-26 18:37 UTC (permalink / raw)
  To: Yang Shi
  Cc: adobriyan, mhocko, mguzik, gorcunov, akpm, linux-mm, linux-kernel

On Tue, Mar 27, 2018 at 02:20:39AM +0800, Yang Shi wrote:
> +++ b/kernel/sys.c
> @@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>  			return error;
>  	}
>  
> -	down_write(&mm->mmap_sem);
> +	down_read(&mm->mmap_sem);
>  
>  	/*
>  	 * We don't validate if these members are pointing to
> @@ -1980,10 +1980,13 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>  	mm->start_brk	= prctl_map.start_brk;
>  	mm->brk		= prctl_map.brk;
>  	mm->start_stack	= prctl_map.start_stack;
> +
> +	spin_lock(&mm->arg_lock);
>  	mm->arg_start	= prctl_map.arg_start;
>  	mm->arg_end	= prctl_map.arg_end;
>  	mm->env_start	= prctl_map.env_start;
>  	mm->env_end	= prctl_map.env_end;
> +	spin_unlock(&mm->arg_lock);
>  
>  	/*
>  	 * Note this update of @saved_auxv is lockless thus

I see the argument for the change to a write lock was because of a BUG
validating arg_start and arg_end, but more generally, we are updating these
values, so a write-lock is probably a good idea, and this is a very rare
operation to do, so we don't care about making this more parallel.  I would
not make this change (but if other more knowledgable people in this area
disagree with me, I will withdraw my objection to this part).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 18:37 ` Matthew Wilcox
@ 2018-03-26 19:21   ` Cyrill Gorcunov
  2018-03-26 21:10     ` Tetsuo Handa
  2018-03-26 21:59     ` Yang Shi
  0 siblings, 2 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2018-03-26 19:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yang Shi, adobriyan, mhocko, mguzik, akpm, linux-mm, linux-kernel

On Mon, Mar 26, 2018 at 11:37:25AM -0700, Matthew Wilcox wrote:
> On Tue, Mar 27, 2018 at 02:20:39AM +0800, Yang Shi wrote:
> > +++ b/kernel/sys.c
> > @@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
> >  			return error;
> >  	}
> >  
> > -	down_write(&mm->mmap_sem);
> > +	down_read(&mm->mmap_sem);
> >  
> >  	/*
> >  	 * We don't validate if these members are pointing to
> > @@ -1980,10 +1980,13 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
> >  	mm->start_brk	= prctl_map.start_brk;
> >  	mm->brk		= prctl_map.brk;
> >  	mm->start_stack	= prctl_map.start_stack;
> > +
> > +	spin_lock(&mm->arg_lock);
> >  	mm->arg_start	= prctl_map.arg_start;
> >  	mm->arg_end	= prctl_map.arg_end;
> >  	mm->env_start	= prctl_map.env_start;
> >  	mm->env_end	= prctl_map.env_end;
> > +	spin_unlock(&mm->arg_lock);
> >  
> >  	/*
> >  	 * Note this update of @saved_auxv is lockless thus
> 
> I see the argument for the change to a write lock was because of a BUG
> validating arg_start and arg_end, but more generally, we are updating these
> values, so a write-lock is probably a good idea, and this is a very rare
> operation to do, so we don't care about making this more parallel.  I would
> not make this change (but if other more knowledgable people in this area
> disagree with me, I will withdraw my objection to this part).

Say we've two syscalls running prctl_set_mm_map in parallel, and imagine
one have @start_brk = 20 @brk = 10 and second caller has @start_brk = 30
and @brk = 20. Since now the call is guarded by _read_ the both calls
unlocked and due to OO engine it may happen then when both finish
we have @start_brk = 30 and @brk = 10. In turn "write" semaphore
has been take to have consistent data on exit, either you have [20;10]
or [30;20] assigned not something mixed.

That said I think using read-lock here would be a bug.

	Cyrill

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 18:20 [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct Yang Shi
  2018-03-26 18:37 ` Matthew Wilcox
@ 2018-03-26 19:42 ` Mateusz Guzik
  2018-03-26 21:10   ` Yang Shi
  2018-03-27  6:29 ` Michal Hocko
  2 siblings, 1 reply; 22+ messages in thread
From: Mateusz Guzik @ 2018-03-26 19:42 UTC (permalink / raw)
  To: Yang Shi; +Cc: adobriyan, mhocko, willy, gorcunov, akpm, linux-mm, linux-kernel

On Tue, Mar 27, 2018 at 02:20:39AM +0800, Yang Shi wrote:
> mmap_sem is on the hot path of kernel, and it very contended, but it is
> abused too. It is used to protect arg_start|end and evn_start|end when
> reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
> sense since those proc files just expect to read 4 values atomically and
> not related to VM, they could be set to arbitrary values by C/R.
> 

They are not arbitrary - there is basic validation performed when
setting them.

> And, the mmap_sem contention may cause unexpected issue like below:
> 
> INFO: task ps:14018 blocked for more than 120 seconds.
>        Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
>  ps              D    0 14018      1 0x00000004
>   ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
>   ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
>   00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
>  Call Trace:
>   [<ffffffff817154d0>] ? __schedule+0x250/0x730
>   [<ffffffff817159e6>] schedule+0x36/0x80
>   [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
>   [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
>   [<ffffffff81717db0>] down_read+0x20/0x40
>   [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
>   [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
>   [<ffffffff81241d87>] __vfs_read+0x37/0x150
>   [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
>   [<ffffffff81242266>] vfs_read+0x96/0x130
>   [<ffffffff812437b5>] SyS_read+0x55/0xc0
>   [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
> 
> Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
> for them to mitigate the abuse of mmap_sem.
> 

While switching to arg spinlock here will relieve mmap_sem to an extent,
it wont help with the problem you are seeing here.

proc_pid_cmdline_read -> access_process_vm -> __access_remote_vm and you
got yet another down_read(&mm->mmap_sem);.

i.e. the issue you ran into is well known and predates my change.

The problem does not stem from contention either, but blocking for a
long time while holding the lock - the most common example is dealing
with dead nfs mount vs mmaped areas.

I don't have good ideas how to fix the problem. The least bad I came up
with was to trylock with a timeout - after a failure either return an
error or resort to returning p_comm. ps/top could be modified to
fallback to snatching the name from /status.

Since the lock owner is now being stored in the semaphore, perhaps the
above routine can happily spin until it grabs the lock or the owner is
detected to have gone into uninterruptible sleep and react accordingly. 

I don't know whether it is feasible to somehow avoid the mmap lock
altogether.

If it has to be there no matter what the code can be refactored to grab
it once and relock only if copyout would fault. This would in particular
reduce the number of times it is taken to begin with and still provide
the current synchronisation against prctl. But the fundamental problem
will remain.

That said, refactoring above will have the same effect as your patch and
will avoid growing mm_struct.

That's my $0,03. MM overlords have to comment on what to do with this.

> So, introduce a new spinlock in mm_struct to protect the concurrent access
> to arg_start|end and env_start|end.
> 
> And, commit ddf1d398e517e660207e2c807f76a90df543a217 ("prctl: take mmap
> sem for writing to protect against others") changed down_read to
> down_write to avoid write race condition in prctl_set_mm(). Since we
> already have dedicated lock to protect them, it is safe to change back
> to down_read.
> 
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Mateusz Guzik <mguzik@redhat.com>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> ---
> v1 --> v2:
> * Use spinlock instead of rwlock per Mattew's suggestion
> * Replace down_write to down_read in prctl_set_mm (see commit log for details)
> 
>  fs/proc/base.c           |  8 ++++----
>  include/linux/mm_types.h |  2 ++
>  kernel/fork.c            |  1 +
>  kernel/sys.c             | 14 ++++++++++----
>  mm/init-mm.c             |  1 +
>  5 files changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 9298324..e0282b6 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -242,12 +242,12 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
>  		goto out_mmput;
>  	}
>  
> -	down_read(&mm->mmap_sem);
> +	spin_lock(&mm->arg_lock);
>  	arg_start = mm->arg_start;
>  	arg_end = mm->arg_end;
>  	env_start = mm->env_start;
>  	env_end = mm->env_end;
> -	up_read(&mm->mmap_sem);
> +	spin_unlock(&mm->arg_lock);
>  
>  	BUG_ON(arg_start > arg_end);
>  	BUG_ON(env_start > env_end);
> @@ -929,10 +929,10 @@ static ssize_t environ_read(struct file *file, char __user *buf,
>  	if (!mmget_not_zero(mm))
>  		goto free;
>  
> -	down_read(&mm->mmap_sem);
> +	spin_lock(&mm->arg_lock);
>  	env_start = mm->env_start;
>  	env_end = mm->env_end;
> -	up_read(&mm->mmap_sem);
> +	spin_unlock(&mm->arg_lock);
>  
>  	while (count > 0) {
>  		size_t this_len, max_len;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index fd1af6b..3be4588 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -413,6 +413,8 @@ struct mm_struct {
>  	unsigned long def_flags;
>  	unsigned long start_code, end_code, start_data, end_data;
>  	unsigned long start_brk, brk, start_stack;
> +
> +	spinlock_t arg_lock; /* protect concurrent access to arg_* and env_* */
>  	unsigned long arg_start, arg_end, env_start, env_end;
>  
>  	unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index e5d9d40..6540ae7 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -898,6 +898,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>  	mm->pinned_vm = 0;
>  	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
>  	spin_lock_init(&mm->page_table_lock);
> +	spin_lock_init(&mm->arg_lock);
>  	mm_init_cpumask(mm);
>  	mm_init_aio(mm);
>  	mm_init_owner(mm, p);
> diff --git a/kernel/sys.c b/kernel/sys.c
> index f2289de..17bddd2 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>  			return error;
>  	}
>  
> -	down_write(&mm->mmap_sem);
> +	down_read(&mm->mmap_sem);
>  
>  	/*
>  	 * We don't validate if these members are pointing to
> @@ -1980,10 +1980,13 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>  	mm->start_brk	= prctl_map.start_brk;
>  	mm->brk		= prctl_map.brk;
>  	mm->start_stack	= prctl_map.start_stack;
> +
> +	spin_lock(&mm->arg_lock);
>  	mm->arg_start	= prctl_map.arg_start;
>  	mm->arg_end	= prctl_map.arg_end;
>  	mm->env_start	= prctl_map.env_start;
>  	mm->env_end	= prctl_map.env_end;
> +	spin_unlock(&mm->arg_lock);
>  
>  	/*
>  	 * Note this update of @saved_auxv is lockless thus
> @@ -1996,7 +1999,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>  	if (prctl_map.auxv_size)
>  		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
>  
> -	up_write(&mm->mmap_sem);
> +	up_read(&mm->mmap_sem);
>  	return 0;
>  }
>  #endif /* CONFIG_CHECKPOINT_RESTORE */
> @@ -2063,7 +2066,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
>  
>  	error = -EINVAL;
>  
> -	down_write(&mm->mmap_sem);
> +	down_read(&mm->mmap_sem);
>  	vma = find_vma(mm, addr);
>  
>  	prctl_map.start_code	= mm->start_code;
> @@ -2149,14 +2152,17 @@ static int prctl_set_mm(int opt, unsigned long addr,
>  	mm->start_brk	= prctl_map.start_brk;
>  	mm->brk		= prctl_map.brk;
>  	mm->start_stack	= prctl_map.start_stack;
> +
> +	spin_lock(&mm->arg_lock);
>  	mm->arg_start	= prctl_map.arg_start;
>  	mm->arg_end	= prctl_map.arg_end;
>  	mm->env_start	= prctl_map.env_start;
>  	mm->env_end	= prctl_map.env_end;
> +	spin_unlock(&mm->arg_lock);
>  
>  	error = 0;
>  out:
> -	up_write(&mm->mmap_sem);
> +	up_read(&mm->mmap_sem);
>  	return error;
>  }
>  
> diff --git a/mm/init-mm.c b/mm/init-mm.c
> index f94d5d1..66cce4c 100644
> --- a/mm/init-mm.c
> +++ b/mm/init-mm.c
> @@ -23,6 +23,7 @@ struct mm_struct init_mm = {
>  	.mmap_sem	= __RWSEM_INITIALIZER(init_mm.mmap_sem),
>  	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
>  	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
> +	.arg_lock	= __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
>  	.user_ns	= &init_user_ns,
>  	INIT_MM_CONTEXT(init_mm)
>  };
> -- 
> 1.8.3.1
> 

-- 
Mateusz Guzik

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 19:21   ` Cyrill Gorcunov
@ 2018-03-26 21:10     ` Tetsuo Handa
  2018-03-26 21:20       ` Yang Shi
  2018-03-26 21:29       ` Cyrill Gorcunov
  2018-03-26 21:59     ` Yang Shi
  1 sibling, 2 replies; 22+ messages in thread
From: Tetsuo Handa @ 2018-03-26 21:10 UTC (permalink / raw)
  To: Cyrill Gorcunov, Matthew Wilcox
  Cc: Yang Shi, adobriyan, mhocko, mguzik, akpm, linux-mm, linux-kernel

On 2018/03/27 4:21, Cyrill Gorcunov wrote:
> That said I think using read-lock here would be a bug.

If I understand correctly, the caller can't set both fields atomically, for
prctl() does not receive both fields at one call.

  prctl(PR_SET_MM, PR_SET_MM_ARG_START xor PR_SET_MM_ARG_END xor PR_SET_MM_ENV_START xor PR_SET_MM_ENV_END, new value, 0, 0);

Then, I wonder whether reading arg_start|end and env_start|end atomically makes
sense. Just retry reading if arg_start > env_end or env_start > env_end is fine?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 19:42 ` Mateusz Guzik
@ 2018-03-26 21:10   ` Yang Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Yang Shi @ 2018-03-26 21:10 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: adobriyan, mhocko, willy, gorcunov, akpm, linux-mm, linux-kernel



On 3/26/18 3:42 PM, Mateusz Guzik wrote:
> On Tue, Mar 27, 2018 at 02:20:39AM +0800, Yang Shi wrote:
>> mmap_sem is on the hot path of kernel, and it very contended, but it is
>> abused too. It is used to protect arg_start|end and evn_start|end when
>> reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
>> sense since those proc files just expect to read 4 values atomically and
>> not related to VM, they could be set to arbitrary values by C/R.
>>
> They are not arbitrary - there is basic validation performed when
> setting them.
>
>> And, the mmap_sem contention may cause unexpected issue like below:
>>
>> INFO: task ps:14018 blocked for more than 120 seconds.
>>         Tainted: G            E 4.9.79-009.ali3000.alios7.x86_64 #1
>>   "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> message.
>>   ps              D    0 14018      1 0x00000004
>>    ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
>>    ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
>>    00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
>>   Call Trace:
>>    [<ffffffff817154d0>] ? __schedule+0x250/0x730
>>    [<ffffffff817159e6>] schedule+0x36/0x80
>>    [<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
>>    [<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
>>    [<ffffffff81717db0>] down_read+0x20/0x40
>>    [<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
>>    [<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
>>    [<ffffffff81241d87>] __vfs_read+0x37/0x150
>>    [<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
>>    [<ffffffff81242266>] vfs_read+0x96/0x130
>>    [<ffffffff812437b5>] SyS_read+0x55/0xc0
>>    [<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5
>>
>> Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
>> for them to mitigate the abuse of mmap_sem.
>>
> While switching to arg spinlock here will relieve mmap_sem to an extent,
> it wont help with the problem you are seeing here.
>
> proc_pid_cmdline_read -> access_process_vm -> __access_remote_vm and you
> got yet another down_read(&mm->mmap_sem);.
>
> i.e. the issue you ran into is well known and predates my change.
>
> The problem does not stem from contention either, but blocking for a
> long time while holding the lock - the most common example is dealing
> with dead nfs mount vs mmaped areas.
>
> I don't have good ideas how to fix the problem. The least bad I came up
> with was to trylock with a timeout - after a failure either return an
> error or resort to returning p_comm. ps/top could be modified to
> fallback to snatching the name from /status.
>
> Since the lock owner is now being stored in the semaphore, perhaps the
> above routine can happily spin until it grabs the lock or the owner is
> detected to have gone into uninterruptible sleep and react accordingly.
>
> I don't know whether it is feasible to somehow avoid the mmap lock
> altogether.
>
> If it has to be there no matter what the code can be refactored to grab
> it once and relock only if copyout would fault. This would in particular
> reduce the number of times it is taken to begin with and still provide
> the current synchronisation against prctl. But the fundamental problem
> will remain.
>
> That said, refactoring above will have the same effect as your patch and
> will avoid growing mm_struct.
>
> That's my $0,03. MM overlords have to comment on what to do with this.

Thanks for the comment. Yes, the spin lock absolutely can't solve all 
the mmap_sem scalability issue. Actually, I already proposed a 
preliminary RFC to try to mitigate the mmap_sem issue. It is still under 
review.

Other than that, we also found mmap_sem is abused somewhere, so this 
patch is proposed to reduce the abuse to mmap_sem.

Yang

>
>> So, introduce a new spinlock in mm_struct to protect the concurrent access
>> to arg_start|end and env_start|end.
>>
>> And, commit ddf1d398e517e660207e2c807f76a90df543a217 ("prctl: take mmap
>> sem for writing to protect against others") changed down_read to
>> down_write to avoid write race condition in prctl_set_mm(). Since we
>> already have dedicated lock to protect them, it is safe to change back
>> to down_read.
>>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> Cc: Alexey Dobriyan <adobriyan@gmail.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Mateusz Guzik <mguzik@redhat.com>
>> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
>> ---
>> v1 --> v2:
>> * Use spinlock instead of rwlock per Mattew's suggestion
>> * Replace down_write to down_read in prctl_set_mm (see commit log for details)
>>
>>   fs/proc/base.c           |  8 ++++----
>>   include/linux/mm_types.h |  2 ++
>>   kernel/fork.c            |  1 +
>>   kernel/sys.c             | 14 ++++++++++----
>>   mm/init-mm.c             |  1 +
>>   5 files changed, 18 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/proc/base.c b/fs/proc/base.c
>> index 9298324..e0282b6 100644
>> --- a/fs/proc/base.c
>> +++ b/fs/proc/base.c
>> @@ -242,12 +242,12 @@ static ssize_t proc_pid_cmdline_read(struct file *file, char __user *buf,
>>   		goto out_mmput;
>>   	}
>>   
>> -	down_read(&mm->mmap_sem);
>> +	spin_lock(&mm->arg_lock);
>>   	arg_start = mm->arg_start;
>>   	arg_end = mm->arg_end;
>>   	env_start = mm->env_start;
>>   	env_end = mm->env_end;
>> -	up_read(&mm->mmap_sem);
>> +	spin_unlock(&mm->arg_lock);
>>   
>>   	BUG_ON(arg_start > arg_end);
>>   	BUG_ON(env_start > env_end);
>> @@ -929,10 +929,10 @@ static ssize_t environ_read(struct file *file, char __user *buf,
>>   	if (!mmget_not_zero(mm))
>>   		goto free;
>>   
>> -	down_read(&mm->mmap_sem);
>> +	spin_lock(&mm->arg_lock);
>>   	env_start = mm->env_start;
>>   	env_end = mm->env_end;
>> -	up_read(&mm->mmap_sem);
>> +	spin_unlock(&mm->arg_lock);
>>   
>>   	while (count > 0) {
>>   		size_t this_len, max_len;
>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
>> index fd1af6b..3be4588 100644
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -413,6 +413,8 @@ struct mm_struct {
>>   	unsigned long def_flags;
>>   	unsigned long start_code, end_code, start_data, end_data;
>>   	unsigned long start_brk, brk, start_stack;
>> +
>> +	spinlock_t arg_lock; /* protect concurrent access to arg_* and env_* */
>>   	unsigned long arg_start, arg_end, env_start, env_end;
>>   
>>   	unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index e5d9d40..6540ae7 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -898,6 +898,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>>   	mm->pinned_vm = 0;
>>   	memset(&mm->rss_stat, 0, sizeof(mm->rss_stat));
>>   	spin_lock_init(&mm->page_table_lock);
>> +	spin_lock_init(&mm->arg_lock);
>>   	mm_init_cpumask(mm);
>>   	mm_init_aio(mm);
>>   	mm_init_owner(mm, p);
>> diff --git a/kernel/sys.c b/kernel/sys.c
>> index f2289de..17bddd2 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>>   			return error;
>>   	}
>>   
>> -	down_write(&mm->mmap_sem);
>> +	down_read(&mm->mmap_sem);
>>   
>>   	/*
>>   	 * We don't validate if these members are pointing to
>> @@ -1980,10 +1980,13 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>>   	mm->start_brk	= prctl_map.start_brk;
>>   	mm->brk		= prctl_map.brk;
>>   	mm->start_stack	= prctl_map.start_stack;
>> +
>> +	spin_lock(&mm->arg_lock);
>>   	mm->arg_start	= prctl_map.arg_start;
>>   	mm->arg_end	= prctl_map.arg_end;
>>   	mm->env_start	= prctl_map.env_start;
>>   	mm->env_end	= prctl_map.env_end;
>> +	spin_unlock(&mm->arg_lock);
>>   
>>   	/*
>>   	 * Note this update of @saved_auxv is lockless thus
>> @@ -1996,7 +1999,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>>   	if (prctl_map.auxv_size)
>>   		memcpy(mm->saved_auxv, user_auxv, sizeof(user_auxv));
>>   
>> -	up_write(&mm->mmap_sem);
>> +	up_read(&mm->mmap_sem);
>>   	return 0;
>>   }
>>   #endif /* CONFIG_CHECKPOINT_RESTORE */
>> @@ -2063,7 +2066,7 @@ static int prctl_set_mm(int opt, unsigned long addr,
>>   
>>   	error = -EINVAL;
>>   
>> -	down_write(&mm->mmap_sem);
>> +	down_read(&mm->mmap_sem);
>>   	vma = find_vma(mm, addr);
>>   
>>   	prctl_map.start_code	= mm->start_code;
>> @@ -2149,14 +2152,17 @@ static int prctl_set_mm(int opt, unsigned long addr,
>>   	mm->start_brk	= prctl_map.start_brk;
>>   	mm->brk		= prctl_map.brk;
>>   	mm->start_stack	= prctl_map.start_stack;
>> +
>> +	spin_lock(&mm->arg_lock);
>>   	mm->arg_start	= prctl_map.arg_start;
>>   	mm->arg_end	= prctl_map.arg_end;
>>   	mm->env_start	= prctl_map.env_start;
>>   	mm->env_end	= prctl_map.env_end;
>> +	spin_unlock(&mm->arg_lock);
>>   
>>   	error = 0;
>>   out:
>> -	up_write(&mm->mmap_sem);
>> +	up_read(&mm->mmap_sem);
>>   	return error;
>>   }
>>   
>> diff --git a/mm/init-mm.c b/mm/init-mm.c
>> index f94d5d1..66cce4c 100644
>> --- a/mm/init-mm.c
>> +++ b/mm/init-mm.c
>> @@ -23,6 +23,7 @@ struct mm_struct init_mm = {
>>   	.mmap_sem	= __RWSEM_INITIALIZER(init_mm.mmap_sem),
>>   	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
>>   	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
>> +	.arg_lock	= __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
>>   	.user_ns	= &init_user_ns,
>>   	INIT_MM_CONTEXT(init_mm)
>>   };
>> -- 
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 21:10     ` Tetsuo Handa
@ 2018-03-26 21:20       ` Yang Shi
  2018-03-26 21:29       ` Cyrill Gorcunov
  1 sibling, 0 replies; 22+ messages in thread
From: Yang Shi @ 2018-03-26 21:20 UTC (permalink / raw)
  To: Tetsuo Handa, Cyrill Gorcunov, Matthew Wilcox
  Cc: adobriyan, mhocko, mguzik, akpm, linux-mm, linux-kernel



On 3/26/18 5:10 PM, Tetsuo Handa wrote:
> On 2018/03/27 4:21, Cyrill Gorcunov wrote:
>> That said I think using read-lock here would be a bug.
> If I understand correctly, the caller can't set both fields atomically, for
> prctl() does not receive both fields at one call.
>
>    prctl(PR_SET_MM, PR_SET_MM_ARG_START xor PR_SET_MM_ARG_END xor PR_SET_MM_ENV_START xor PR_SET_MM_ENV_END, new value, 0, 0);
>
> Then, I wonder whether reading arg_start|end and env_start|end atomically makes
> sense. Just retry reading if arg_start > env_end or env_start > env_end is fine?

It might trap into dead loop if those are set to wrong values, right?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 21:10     ` Tetsuo Handa
  2018-03-26 21:20       ` Yang Shi
@ 2018-03-26 21:29       ` Cyrill Gorcunov
  2018-03-26 22:00         ` Tetsuo Handa
  1 sibling, 1 reply; 22+ messages in thread
From: Cyrill Gorcunov @ 2018-03-26 21:29 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Matthew Wilcox, Yang Shi, adobriyan, mhocko, mguzik, akpm,
	linux-mm, linux-kernel

On Tue, Mar 27, 2018 at 06:10:09AM +0900, Tetsuo Handa wrote:
> On 2018/03/27 4:21, Cyrill Gorcunov wrote:
> > That said I think using read-lock here would be a bug.
> 
> If I understand correctly, the caller can't set both fields atomically, for
> prctl() does not receive both fields at one call.
> 
>   prctl(PR_SET_MM, PR_SET_MM_ARG_START xor PR_SET_MM_ARG_END xor PR_SET_MM_ENV_START xor PR_SET_MM_ENV_END, new value, 0, 0);
> 

True, but the key moment is that two/three/four system calls can
run simultaneously. And while previously they are ordered by "write",
with read lock they are completely unordered and this is really
worries me. To be fair I would prefer to drop this old per-field
interface completely. This per-field interface was rather an ugly
solution from my side.

> Then, I wonder whether reading arg_start|end and env_start|end atomically makes
> sense. Just retry reading if arg_start > env_end or env_start > env_end is fine?

Tetsuo, let me re-read this code tomorrow, maybe I miss something obvious.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 19:21   ` Cyrill Gorcunov
  2018-03-26 21:10     ` Tetsuo Handa
@ 2018-03-26 21:59     ` Yang Shi
  2018-03-27  7:32       ` Cyrill Gorcunov
  1 sibling, 1 reply; 22+ messages in thread
From: Yang Shi @ 2018-03-26 21:59 UTC (permalink / raw)
  To: Cyrill Gorcunov, Matthew Wilcox
  Cc: adobriyan, mhocko, mguzik, akpm, linux-mm, linux-kernel



On 3/26/18 3:21 PM, Cyrill Gorcunov wrote:
> On Mon, Mar 26, 2018 at 11:37:25AM -0700, Matthew Wilcox wrote:
>> On Tue, Mar 27, 2018 at 02:20:39AM +0800, Yang Shi wrote:
>>> +++ b/kernel/sys.c
>>> @@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>>>   			return error;
>>>   	}
>>>   
>>> -	down_write(&mm->mmap_sem);
>>> +	down_read(&mm->mmap_sem);
>>>   
>>>   	/*
>>>   	 * We don't validate if these members are pointing to
>>> @@ -1980,10 +1980,13 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>>>   	mm->start_brk	= prctl_map.start_brk;
>>>   	mm->brk		= prctl_map.brk;
>>>   	mm->start_stack	= prctl_map.start_stack;
>>> +
>>> +	spin_lock(&mm->arg_lock);
>>>   	mm->arg_start	= prctl_map.arg_start;
>>>   	mm->arg_end	= prctl_map.arg_end;
>>>   	mm->env_start	= prctl_map.env_start;
>>>   	mm->env_end	= prctl_map.env_end;
>>> +	spin_unlock(&mm->arg_lock);
>>>   
>>>   	/*
>>>   	 * Note this update of @saved_auxv is lockless thus
>> I see the argument for the change to a write lock was because of a BUG
>> validating arg_start and arg_end, but more generally, we are updating these
>> values, so a write-lock is probably a good idea, and this is a very rare
>> operation to do, so we don't care about making this more parallel.  I would
>> not make this change (but if other more knowledgable people in this area
>> disagree with me, I will withdraw my objection to this part).
> Say we've two syscalls running prctl_set_mm_map in parallel, and imagine
> one have @start_brk = 20 @brk = 10 and second caller has @start_brk = 30
> and @brk = 20. Since now the call is guarded by _read_ the both calls
> unlocked and due to OO engine it may happen then when both finish
> we have @start_brk = 30 and @brk = 10. In turn "write" semaphore
> has been take to have consistent data on exit, either you have [20;10]
> or [30;20] assigned not something mixed.
>
> That said I think using read-lock here would be a bug.

Yes it sounds so. However, it was down_read before 
ddf1d398e517e660207e2c807f76a90df543a217 ("prctl: take mmap sem for 
writing to protect against others"). And, that commit is for fixing the 
concurrent writing to arg_* and env_*. I just checked that commit, but 
omitted the brk part. The potential issue mentioned by you should exist 
before that commit, but might be just not discovered or very rare to hit.

I will change it back to down_write.

Thanks,
Yang

>
> 	Cyrill

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 21:29       ` Cyrill Gorcunov
@ 2018-03-26 22:00         ` Tetsuo Handa
  2018-03-26 22:12           ` Yang Shi
  2018-03-27  7:37           ` Cyrill Gorcunov
  0 siblings, 2 replies; 22+ messages in thread
From: Tetsuo Handa @ 2018-03-26 22:00 UTC (permalink / raw)
  To: gorcunov
  Cc: willy, yang.shi, adobriyan, mhocko, mguzik, akpm, linux-mm, linux-kernel

Cyrill Gorcunov wrote:
> On Tue, Mar 27, 2018 at 06:10:09AM +0900, Tetsuo Handa wrote:
> > On 2018/03/27 4:21, Cyrill Gorcunov wrote:
> > > That said I think using read-lock here would be a bug.
> > 
> > If I understand correctly, the caller can't set both fields atomically, for
> > prctl() does not receive both fields at one call.
> > 
> >   prctl(PR_SET_MM, PR_SET_MM_ARG_START xor PR_SET_MM_ARG_END xor PR_SET_MM_ENV_START xor PR_SET_MM_ENV_END, new value, 0, 0);
> > 
> 
> True, but the key moment is that two/three/four system calls can
> run simultaneously. And while previously they are ordered by "write",
> with read lock they are completely unordered and this is really
> worries me.

Yes, we need exclusive lock when updating these fields.

>             To be fair I would prefer to drop this old per-field
> interface completely. This per-field interface was rather an ugly
> solution from my side.

But this is userspace visible API and thus we cannot change.

> 
> > Then, I wonder whether reading arg_start|end and env_start|end atomically makes
> > sense. Just retry reading if arg_start > env_end or env_start > env_end is fine?
> 
> Tetsuo, let me re-read this code tomorrow, maybe I miss something obvious.
> 

You are not missing my point. What I thought is

+retry:
-	down_read(&mm->mmap_sem);
 	arg_start = mm->arg_start;
 	arg_end = mm->arg_end;
 	env_start = mm->env_start;
 	env_end = mm->env_end;
-	up_read(&mm->mmap_sem);
 
-	BUG_ON(arg_start > arg_end);
-	BUG_ON(env_start > env_end);
+	if (unlikely(arg_start > arg_end || env_start > env_end)) {
+		cond_resched();
+		goto retry;
+	}

for reading these fields.

By the way, /proc/pid/ readers are serving as a canary who tells something
mm_mmap related problem is happening. On the other hand, it is sad that
such canary cannot be terminated by signal due to use of unkillable waits.
I wish we can use killable waits.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 22:00         ` Tetsuo Handa
@ 2018-03-26 22:12           ` Yang Shi
  2018-03-27  7:38             ` Cyrill Gorcunov
  2018-03-27  7:37           ` Cyrill Gorcunov
  1 sibling, 1 reply; 22+ messages in thread
From: Yang Shi @ 2018-03-26 22:12 UTC (permalink / raw)
  To: Tetsuo Handa, gorcunov
  Cc: willy, adobriyan, mhocko, mguzik, akpm, linux-mm, linux-kernel



On 3/26/18 6:00 PM, Tetsuo Handa wrote:
> Cyrill Gorcunov wrote:
>> On Tue, Mar 27, 2018 at 06:10:09AM +0900, Tetsuo Handa wrote:
>>> On 2018/03/27 4:21, Cyrill Gorcunov wrote:
>>>> That said I think using read-lock here would be a bug.
>>> If I understand correctly, the caller can't set both fields atomically, for
>>> prctl() does not receive both fields at one call.
>>>
>>>    prctl(PR_SET_MM, PR_SET_MM_ARG_START xor PR_SET_MM_ARG_END xor PR_SET_MM_ENV_START xor PR_SET_MM_ENV_END, new value, 0, 0);
>>>
>> True, but the key moment is that two/three/four system calls can
>> run simultaneously. And while previously they are ordered by "write",
>> with read lock they are completely unordered and this is really
>> worries me.
> Yes, we need exclusive lock when updating these fields.
>
>>              To be fair I would prefer to drop this old per-field
>> interface completely. This per-field interface was rather an ugly
>> solution from my side.
> But this is userspace visible API and thus we cannot change.
>
>>> Then, I wonder whether reading arg_start|end and env_start|end atomically makes
>>> sense. Just retry reading if arg_start > env_end or env_start > env_end is fine?
>> Tetsuo, let me re-read this code tomorrow, maybe I miss something obvious.
>>
> You are not missing my point. What I thought is
>
> +retry:
> -	down_read(&mm->mmap_sem);
>   	arg_start = mm->arg_start;
>   	arg_end = mm->arg_end;
>   	env_start = mm->env_start;
>   	env_end = mm->env_end;
> -	up_read(&mm->mmap_sem);
>   
> -	BUG_ON(arg_start > arg_end);
> -	BUG_ON(env_start > env_end);
> +	if (unlikely(arg_start > arg_end || env_start > env_end)) {
> +		cond_resched();
> +		goto retry;

Can't it trap into dead loop if the condition is always false?

> +	}
>
> for reading these fields.
>
> By the way, /proc/pid/ readers are serving as a canary who tells something
> mm_mmap related problem is happening. On the other hand, it is sad that
> such canary cannot be terminated by signal due to use of unkillable waits.
> I wish we can use killable waits.

I already proposed patches (https://lkml.org/lkml/2018/2/26/1197) to do 
this a few weeks ago. In the review, akpm suggested mitigate the 
mmap_sem contention instead of using killable version workaround. Then 
the preliminary unmaping by section patches 
(https://lkml.org/lkml/2018/3/20/786) were proposed. In the discussion, 
we decided to eliminate the mmap_sem abuse, this is where the patch came 
from.

Yang

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 18:20 [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct Yang Shi
  2018-03-26 18:37 ` Matthew Wilcox
  2018-03-26 19:42 ` Mateusz Guzik
@ 2018-03-27  6:29 ` Michal Hocko
  2018-03-27 14:31   ` Mateusz Guzik
                     ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Michal Hocko @ 2018-03-27  6:29 UTC (permalink / raw)
  To: Yang Shi; +Cc: adobriyan, willy, mguzik, gorcunov, akpm, linux-mm, linux-kernel

On Tue 27-03-18 02:20:39, Yang Shi wrote:
[...]
The patch looks reasonable to me. Maybe it would be better to be more
explicit about the purpose of the patch. As others noticed, this alone
wouldn't solve the mmap_sem contention issues. I _think_ that if you
were more explicit about the mmap_sem abuse it would trigger less
questions.

I have just one more question. Now that you are touching this area,
would you be willing to remove the following ugliness?

> diff --git a/kernel/sys.c b/kernel/sys.c
> index f2289de..17bddd2 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>  			return error;
>  	}
>  
> -	down_write(&mm->mmap_sem);
> +	down_read(&mm->mmap_sem);

Why do we need to hold mmap_sem here and call find_vma, when only
PR_SET_MM_ENV_END: is consuming it? I guess we can replace it wit the
new lock and take the mmap_sem only for PR_SET_MM_ENV_END.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 21:59     ` Yang Shi
@ 2018-03-27  7:32       ` Cyrill Gorcunov
  2018-03-27 18:44         ` Yang Shi
  0 siblings, 1 reply; 22+ messages in thread
From: Cyrill Gorcunov @ 2018-03-27  7:32 UTC (permalink / raw)
  To: Yang Shi
  Cc: Matthew Wilcox, adobriyan, mhocko, mguzik, akpm, linux-mm, linux-kernel

On Mon, Mar 26, 2018 at 05:59:49PM -0400, Yang Shi wrote:
> > Say we've two syscalls running prctl_set_mm_map in parallel, and imagine
> > one have @start_brk = 20 @brk = 10 and second caller has @start_brk = 30
> > and @brk = 20. Since now the call is guarded by _read_ the both calls
> > unlocked and due to OO engine it may happen then when both finish
> > we have @start_brk = 30 and @brk = 10. In turn "write" semaphore
> > has been take to have consistent data on exit, either you have [20;10]
> > or [30;20] assigned not something mixed.
> > 
> > That said I think using read-lock here would be a bug.
> 
> Yes it sounds so. However, it was down_read before
> ddf1d398e517e660207e2c807f76a90df543a217 ("prctl: take mmap sem for writing
> to protect against others"). And, that commit is for fixing the concurrent
> writing to arg_* and env_*. I just checked that commit, but omitted the brk
> part. The potential issue mentioned by you should exist before that commit,
> but might be just not discovered or very rare to hit.
> 
> I will change it back to down_write.

down_read before was a bug ;) And it was not discovered earlier simply
because not that many users of this interface exist, namely only criu
as far as I know by now.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 22:00         ` Tetsuo Handa
  2018-03-26 22:12           ` Yang Shi
@ 2018-03-27  7:37           ` Cyrill Gorcunov
  1 sibling, 0 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2018-03-27  7:37 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: willy, yang.shi, adobriyan, mhocko, mguzik, akpm, linux-mm, linux-kernel

On Tue, Mar 27, 2018 at 07:00:56AM +0900, Tetsuo Handa wrote:
> 
> >             To be fair I would prefer to drop this old per-field
> > interface completely. This per-field interface was rather an ugly
> > solution from my side.
> 
> But this is userspace visible API and thus we cannot change.

Hi! We could deplrecate this API call for a couple of releases
and then if nobody complain we could rip it off completely.
There should not be many users I think, didn't heard that
someone except criu used it ever.

> > > Then, I wonder whether reading arg_start|end and env_start|end atomically makes
> > > sense. Just retry reading if arg_start > env_end or env_start > env_end is fine?
> > 
> > Tetsuo, let me re-read this code tomorrow, maybe I miss something obvious.
> > 
> 
> You are not missing my point. What I thought is
> 
> +retry:
> -	down_read(&mm->mmap_sem);
>  	arg_start = mm->arg_start;
>  	arg_end = mm->arg_end;
>  	env_start = mm->env_start;
>  	env_end = mm->env_end;
> -	up_read(&mm->mmap_sem);
>  
> -	BUG_ON(arg_start > arg_end);
> -	BUG_ON(env_start > env_end);
> +	if (unlikely(arg_start > arg_end || env_start > env_end)) {
> +		cond_resched();
> +		goto retry;
> +	}
> 
> for reading these fields.

I fear such contentional cycles are acceptable if only they
are guaranteed to finish eventually. Which doesn't look so
in the code above.

	Cyrill

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-26 22:12           ` Yang Shi
@ 2018-03-27  7:38             ` Cyrill Gorcunov
  0 siblings, 0 replies; 22+ messages in thread
From: Cyrill Gorcunov @ 2018-03-27  7:38 UTC (permalink / raw)
  To: Yang Shi
  Cc: Tetsuo Handa, willy, adobriyan, mhocko, mguzik, akpm, linux-mm,
	linux-kernel

On Mon, Mar 26, 2018 at 06:12:55PM -0400, Yang Shi wrote:
> > +	if (unlikely(arg_start > arg_end || env_start > env_end)) {
> > +		cond_resched();
> > +		goto retry;
> 
> Can't it trap into dead loop if the condition is always false?

Yes, unfortunately it can.

> > +	}
> > 
> > for reading these fields.
> > 
> > By the way, /proc/pid/ readers are serving as a canary who tells something
> > mm_mmap related problem is happening. On the other hand, it is sad that
> > such canary cannot be terminated by signal due to use of unkillable waits.
> > I wish we can use killable waits.
> 
> I already proposed patches (https://lkml.org/lkml/2018/2/26/1197) to do this
> a few weeks ago. In the review, akpm suggested mitigate the mmap_sem
> contention instead of using killable version workaround. Then the
> preliminary unmaping by section patches
> (https://lkml.org/lkml/2018/3/20/786) were proposed. In the discussion, we
> decided to eliminate the mmap_sem abuse, this is where the patch came from.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-27  6:29 ` Michal Hocko
@ 2018-03-27 14:31   ` Mateusz Guzik
  2018-03-27 14:43     ` Michal Hocko
  2018-03-27 18:38   ` Yang Shi
  2018-04-02  1:58   ` Yang Shi
  2 siblings, 1 reply; 22+ messages in thread
From: Mateusz Guzik @ 2018-03-27 14:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Yang Shi, adobriyan, willy, gorcunov, akpm, linux-mm, linux-kernel

On Tue, Mar 27, 2018 at 08:29:39AM +0200, Michal Hocko wrote:
> On Tue 27-03-18 02:20:39, Yang Shi wrote:
> [...]
> The patch looks reasonable to me. Maybe it would be better to be more
> explicit about the purpose of the patch. As others noticed, this alone
> wouldn't solve the mmap_sem contention issues. I _think_ that if you
> were more explicit about the mmap_sem abuse it would trigger less
> questions.
> 

>From what I gather even with other fixes the kernel will still end up
grabbing the semaphore. In this case I don't see what's the upside of
adding the spinlock for args. The downside is growth of mm_struct.

i.e. the code can be refactored to just hold the lock and relock only if
necessary (unable to copy to user without faulting)

-- 
Mateusz Guzik

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-27 14:31   ` Mateusz Guzik
@ 2018-03-27 14:43     ` Michal Hocko
  0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-03-27 14:43 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Yang Shi, adobriyan, willy, gorcunov, akpm, linux-mm, linux-kernel

On Tue 27-03-18 16:31:23, Mateusz Guzik wrote:
> On Tue, Mar 27, 2018 at 08:29:39AM +0200, Michal Hocko wrote:
> > On Tue 27-03-18 02:20:39, Yang Shi wrote:
> > [...]
> > The patch looks reasonable to me. Maybe it would be better to be more
> > explicit about the purpose of the patch. As others noticed, this alone
> > wouldn't solve the mmap_sem contention issues. I _think_ that if you
> > were more explicit about the mmap_sem abuse it would trigger less
> > questions.
> > 
> 
> >From what I gather even with other fixes the kernel will still end up
> grabbing the semaphore. In this case I don't see what's the upside of
> adding the spinlock for args. The downside is growth of mm_struct.

Because accessing the specific address in the address space can be later
changed to use a more fine-grained locking. There are people
experimenting with range locking. These mmap_sem abusers, on the other
hand, will require the full range lock without a good reason. So it is
really worth it to remove them and replace by a more fine grained
locking.

If the mm_struct grow is a real concern (I haven't checked that) then we
can use a set of hashed locks or something else.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-27  6:29 ` Michal Hocko
  2018-03-27 14:31   ` Mateusz Guzik
@ 2018-03-27 18:38   ` Yang Shi
  2018-03-27 18:52     ` Cyrill Gorcunov
  2018-04-02  1:58   ` Yang Shi
  2 siblings, 1 reply; 22+ messages in thread
From: Yang Shi @ 2018-03-27 18:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: adobriyan, willy, mguzik, gorcunov, akpm, linux-mm, linux-kernel



On 3/27/18 2:29 AM, Michal Hocko wrote:
> On Tue 27-03-18 02:20:39, Yang Shi wrote:
> [...]
> The patch looks reasonable to me. Maybe it would be better to be more
> explicit about the purpose of the patch. As others noticed, this alone
> wouldn't solve the mmap_sem contention issues. I _think_ that if you
> were more explicit about the mmap_sem abuse it would trigger less
> questions.

Yes, sure.

>
> I have just one more question. Now that you are touching this area,
> would you be willing to remove the following ugliness?
>
>> diff --git a/kernel/sys.c b/kernel/sys.c
>> index f2289de..17bddd2 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>>   			return error;
>>   	}
>>   
>> -	down_write(&mm->mmap_sem);
>> +	down_read(&mm->mmap_sem);
> Why do we need to hold mmap_sem here and call find_vma, when only
> PR_SET_MM_ENV_END: is consuming it? I guess we can replace it wit the
> new lock and take the mmap_sem only for PR_SET_MM_ENV_END.

Actually, I didn't think of why. It looks prctl_set_mm() checks if vma 
does exist when it tries to set stack_start, argv_* and env_*, btw not 
only env_end.

Cyrill may be able to give us some hint since C/R is the main user of 
this API.

Yang

>
> Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-27  7:32       ` Cyrill Gorcunov
@ 2018-03-27 18:44         ` Yang Shi
  0 siblings, 0 replies; 22+ messages in thread
From: Yang Shi @ 2018-03-27 18:44 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Matthew Wilcox, adobriyan, mhocko, mguzik, akpm, linux-mm, linux-kernel



On 3/27/18 3:32 AM, Cyrill Gorcunov wrote:
> On Mon, Mar 26, 2018 at 05:59:49PM -0400, Yang Shi wrote:
>>> Say we've two syscalls running prctl_set_mm_map in parallel, and imagine
>>> one have @start_brk = 20 @brk = 10 and second caller has @start_brk = 30
>>> and @brk = 20. Since now the call is guarded by _read_ the both calls
>>> unlocked and due to OO engine it may happen then when both finish
>>> we have @start_brk = 30 and @brk = 10. In turn "write" semaphore
>>> has been take to have consistent data on exit, either you have [20;10]
>>> or [30;20] assigned not something mixed.
>>>
>>> That said I think using read-lock here would be a bug.
>> Yes it sounds so. However, it was down_read before
>> ddf1d398e517e660207e2c807f76a90df543a217 ("prctl: take mmap sem for writing
>> to protect against others"). And, that commit is for fixing the concurrent
>> writing to arg_* and env_*. I just checked that commit, but omitted the brk
>> part. The potential issue mentioned by you should exist before that commit,
>> but might be just not discovered or very rare to hit.
>>
>> I will change it back to down_write.
> down_read before was a bug ;) And it was not discovered earlier simply
> because not that many users of this interface exist, namely only criu
> as far as I know by now.

Thanks for confirming this. I assumed so :-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-27 18:38   ` Yang Shi
@ 2018-03-27 18:52     ` Cyrill Gorcunov
  2018-03-28 13:10       ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: Cyrill Gorcunov @ 2018-03-27 18:52 UTC (permalink / raw)
  To: Yang Shi
  Cc: Michal Hocko, adobriyan, willy, mguzik, akpm, linux-mm, linux-kernel

On Tue, Mar 27, 2018 at 02:38:11PM -0400, Yang Shi wrote:
> > Why do we need to hold mmap_sem here and call find_vma, when only
> > PR_SET_MM_ENV_END: is consuming it? I guess we can replace it wit the
> > new lock and take the mmap_sem only for PR_SET_MM_ENV_END.
> 
> Actually, I didn't think of why. It looks prctl_set_mm() checks if vma does
> exist when it tries to set stack_start, argv_* and env_*, btw not only
> env_end.
> 
> Cyrill may be able to give us some hint since C/R is the main user of this
> API.

First and most important it makes code smaller. This prctl call is really
rarely used. Of course we can optimize it, but as I said I would prefer
to simply deprecate this old interface (and I gonne to do so once time
permit).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-27 18:52     ` Cyrill Gorcunov
@ 2018-03-28 13:10       ` Michal Hocko
  0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2018-03-28 13:10 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Yang Shi, adobriyan, willy, mguzik, akpm, linux-mm, linux-kernel

On Tue 27-03-18 21:52:17, Cyrill Gorcunov wrote:
> On Tue, Mar 27, 2018 at 02:38:11PM -0400, Yang Shi wrote:
> > > Why do we need to hold mmap_sem here and call find_vma, when only
> > > PR_SET_MM_ENV_END: is consuming it? I guess we can replace it wit the
> > > new lock and take the mmap_sem only for PR_SET_MM_ENV_END.
> > 
> > Actually, I didn't think of why. It looks prctl_set_mm() checks if vma does
> > exist when it tries to set stack_start, argv_* and env_*, btw not only
> > env_end.
> > 
> > Cyrill may be able to give us some hint since C/R is the main user of this
> > API.
> 
> First and most important it makes code smaller. This prctl call is really
> rarely used. Of course we can optimize it, but as I said I would prefer
> to simply deprecate this old interface (and I gonne to do so once time
> permit).

Ohh, it would be really great if we can remove this thingy altogether. I
cannot say it has a wee bit of my sympathy.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct
  2018-03-27  6:29 ` Michal Hocko
  2018-03-27 14:31   ` Mateusz Guzik
  2018-03-27 18:38   ` Yang Shi
@ 2018-04-02  1:58   ` Yang Shi
  2 siblings, 0 replies; 22+ messages in thread
From: Yang Shi @ 2018-04-02  1:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: adobriyan, willy, mguzik, gorcunov, akpm, linux-mm, linux-kernel



On 3/26/18 11:29 PM, Michal Hocko wrote:
> On Tue 27-03-18 02:20:39, Yang Shi wrote:
> [...]
> The patch looks reasonable to me. Maybe it would be better to be more
> explicit about the purpose of the patch. As others noticed, this alone
> wouldn't solve the mmap_sem contention issues. I _think_ that if you
> were more explicit about the mmap_sem abuse it would trigger less
> questions.
>
> I have just one more question. Now that you are touching this area,
> would you be willing to remove the following ugliness?
>
>> diff --git a/kernel/sys.c b/kernel/sys.c
>> index f2289de..17bddd2 100644
>> --- a/kernel/sys.c
>> +++ b/kernel/sys.c
>> @@ -1959,7 +1959,7 @@ static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long data
>>   			return error;
>>   	}
>>   
>> -	down_write(&mm->mmap_sem);
>> +	down_read(&mm->mmap_sem);
> Why do we need to hold mmap_sem here and call find_vma, when only
> PR_SET_MM_ENV_END: is consuming it? I guess we can replace it wit the
> new lock and take the mmap_sem only for PR_SET_MM_ENV_END.

Sorry for taking a little bit longer to get back since I was traveling. 
I think all the stuff can be protected by the new arg_lock except 
mm->brk since arg_lock can't prevent from concurrent writing from sys_brk().

We may use arg_lock to protect everything else other than mm->brk. The 
complexity sounds acceptable.

Of course, as Cyrill mentioned, he prefers to deprecating prctl_set_mm 
since C/R is the only user of it. We may wait until he is done?

Thanks,
Yang

>
> Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-04-02  1:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-26 18:20 [v2 PATCH] mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct Yang Shi
2018-03-26 18:37 ` Matthew Wilcox
2018-03-26 19:21   ` Cyrill Gorcunov
2018-03-26 21:10     ` Tetsuo Handa
2018-03-26 21:20       ` Yang Shi
2018-03-26 21:29       ` Cyrill Gorcunov
2018-03-26 22:00         ` Tetsuo Handa
2018-03-26 22:12           ` Yang Shi
2018-03-27  7:38             ` Cyrill Gorcunov
2018-03-27  7:37           ` Cyrill Gorcunov
2018-03-26 21:59     ` Yang Shi
2018-03-27  7:32       ` Cyrill Gorcunov
2018-03-27 18:44         ` Yang Shi
2018-03-26 19:42 ` Mateusz Guzik
2018-03-26 21:10   ` Yang Shi
2018-03-27  6:29 ` Michal Hocko
2018-03-27 14:31   ` Mateusz Guzik
2018-03-27 14:43     ` Michal Hocko
2018-03-27 18:38   ` Yang Shi
2018-03-27 18:52     ` Cyrill Gorcunov
2018-03-28 13:10       ` Michal Hocko
2018-04-02  1:58   ` Yang Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).