linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2 -mm] mm: mremap: downgrade mmap_sem to read when shrinking
@ 2018-09-26  0:46 Yang Shi
  2018-09-26  0:46 ` [PATCH 2/2 -mm] mm: brk: " Yang Shi
  2018-09-26  8:16 ` [PATCH 1/2 -mm] mm: mremap: " Michal Hocko
  0 siblings, 2 replies; 4+ messages in thread
From: Yang Shi @ 2018-09-26  0:46 UTC
  To: mhocko, kirill, willy, ldufour, vbabka, akpm
  Cc: yang.shi, linux-mm, linux-kernel

Besides munmap, mremap can also be used to shrink a memory mapping.
Use __do_munmap() for the shrink so that mmap_sem can be downgraded to
read.

The MREMAP_FIXED and MREMAP_MAYMOVE cases are harder to convert to this
optimization because they need to manipulate vmas after do_munmap();
downgrading mmap_sem there may open a race window.

A simple mapping shrink is the low-hanging fruit, and together with
munmap it may cover most unmap cases.

Cc: Michal Hocko <mhocko@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 include/linux/mm.h |  2 ++
 mm/mmap.c          |  4 ++--
 mm/mremap.c        | 16 ++++++++++++----
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a61ebe8..3028028 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2286,6 +2286,8 @@ extern unsigned long do_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot, unsigned long flags,
 	vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
 	struct list_head *uf);
+extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
+		       struct list_head *uf, bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
 		     struct list_head *uf);
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 847a17d..017bcfa 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2687,8 +2687,8 @@ int split_vma(struct mm_struct *mm, struct vm_area_struct *vma,
  * work.  This now handles partial unmappings.
  * Jeremy Fitzhardinge <jeremy@goop.org>
  */
-static int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
-		       struct list_head *uf, bool downgrade)
+int __do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
+		struct list_head *uf, bool downgrade)
 {
 	unsigned long end;
 	struct vm_area_struct *vma, *prev, *last;
diff --git a/mm/mremap.c b/mm/mremap.c
index 5c2e185..e608e92 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -525,6 +525,7 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta)
 	unsigned long ret = -EINVAL;
 	unsigned long charged = 0;
 	bool locked = false;
+	bool downgrade = false;
 	struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX;
 	LIST_HEAD(uf_unmap_early);
 	LIST_HEAD(uf_unmap);
@@ -561,12 +562,16 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta)
 	/*
 	 * Always allow a shrinking remap: that just unmaps
 	 * the unnecessary pages..
-	 * do_munmap does all the needed commit accounting
+	 * __do_munmap does all the needed commit accounting
 	 */
 	if (old_len >= new_len) {
-		ret = do_munmap(mm, addr+new_len, old_len - new_len, &uf_unmap);
-		if (ret && old_len != new_len)
+		ret = __do_munmap(mm, addr+new_len, old_len - new_len,
+				  &uf_unmap, true);
+		if (ret < 0 && old_len != new_len)
 			goto out;
+		/* Returning 1 indicates mmap_sem is downgraded to read. */
+		else if (ret == 1)
+			downgrade = true;
 		ret = addr;
 		goto out;
 	}
@@ -631,7 +636,10 @@ static int vma_expandable(struct vm_area_struct *vma, unsigned long delta)
 		vm_unacct_memory(charged);
 		locked = 0;
 	}
-	up_write(&current->mm->mmap_sem);
+	if (downgrade)
+		up_read(&current->mm->mmap_sem);
+	else
+		up_write(&current->mm->mmap_sem);
 	if (locked && new_len > old_len)
 		mm_populate(new_addr + old_len, new_len - old_len);
 	userfaultfd_unmap_complete(mm, &uf_unmap_early);
-- 
1.8.3.1
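
One detail in the mremap hunk above may deserve a second look. In the
context lines `ret` is declared as `unsigned long`, while __do_munmap()
returns an int, so the test `ret < 0` can never be true and a negative
errno would fall through to `ret = addr`. The user-space sketch below
reproduces only that comparison. fake_do_munmap() and both helpers are
hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for __do_munmap(): fails with a negative errno. */
static int fake_do_munmap(void)
{
	return -ENOMEM;
}

/* Mirrors the hunk: the int result is stored in an unsigned long. */
static bool error_seen_unsigned(void)
{
	unsigned long ret = fake_do_munmap();

	/* An unsigned value is never negative, so this is always false. */
	return ret < 0;
}

/* Keeps the result in an int, so the sign survives the comparison. */
static bool error_seen_signed(void)
{
	int retval = fake_do_munmap();

	return retval < 0;
}
```

Holding the result in a separate int, as error_seen_signed() does, is
one possible way to keep the error path reachable. The same pattern
appears with `retval` in the brk patch of this series.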



* [PATCH 2/2 -mm] mm: brk: downgrade mmap_sem to read when shrinking
  2018-09-26  0:46 [PATCH 1/2 -mm] mm: mremap: downgrade mmap_sem to read when shrinking Yang Shi
@ 2018-09-26  0:46 ` Yang Shi
  2018-09-26  8:17   ` Michal Hocko
  2018-09-26  8:16 ` [PATCH 1/2 -mm] mm: mremap: " Michal Hocko
  1 sibling, 1 reply; 4+ messages in thread
From: Yang Shi @ 2018-09-26  0:46 UTC
  To: mhocko, kirill, willy, ldufour, vbabka, akpm
  Cc: yang.shi, linux-mm, linux-kernel

brk can also be used to shrink a memory mapping. Use __do_munmap() for
the shrink so that mmap_sem can be downgraded to read.

Cc: Michal Hocko <mhocko@kernel.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 mm/mmap.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 017bcfa..3da14a1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -193,9 +193,11 @@ static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long
 	unsigned long retval;
 	unsigned long newbrk, oldbrk;
 	struct mm_struct *mm = current->mm;
+	unsigned long origbrk = mm->brk;
 	struct vm_area_struct *next;
 	unsigned long min_brk;
 	bool populate;
+	bool downgrade = false;
 	LIST_HEAD(uf);
 
 	if (down_write_killable(&mm->mmap_sem))
@@ -229,14 +231,26 @@ static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long
 
 	newbrk = PAGE_ALIGN(brk);
 	oldbrk = PAGE_ALIGN(mm->brk);
-	if (oldbrk == newbrk)
-		goto set_brk;
+	if (oldbrk == newbrk) {
+		mm->brk = brk;
+		goto success;
+	}
 
 	/* Always allow shrinking brk. */
 	if (brk <= mm->brk) {
-		if (!do_munmap(mm, newbrk, oldbrk-newbrk, &uf))
-			goto set_brk;
-		goto out;
+		/*
+		 * mm->brk must be protected by write mmap_sem, so update it
+		 * before downgrading mmap_sem. When __do_munmap() fails,
+		 * mm->brk will be restored from origbrk.
+		 */
+		mm->brk = brk;
+		retval = __do_munmap(mm, newbrk, oldbrk-newbrk, &uf, true);
+		if (retval < 0) {
+			mm->brk = origbrk;
+			goto out;
+		} else if (retval == 1)
+			downgrade = true;
+		goto success;
 	}
 
 	/* Check against existing mmap mappings. */
@@ -247,18 +261,21 @@ static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long
 	/* Ok, looks good - let it rip. */
 	if (do_brk_flags(oldbrk, newbrk-oldbrk, 0, &uf) < 0)
 		goto out;
-
-set_brk:
 	mm->brk = brk;
+
+success:
 	populate = newbrk > oldbrk && (mm->def_flags & VM_LOCKED) != 0;
-	up_write(&mm->mmap_sem);
+	if (downgrade)
+		up_read(&mm->mmap_sem);
+	else
+		up_write(&mm->mmap_sem);
 	userfaultfd_unmap_complete(mm, &uf);
 	if (populate)
 		mm_populate(oldbrk, newbrk - oldbrk);
 	return brk;
 
 out:
-	retval = mm->brk;
+	retval = origbrk;
 	up_write(&mm->mmap_sem);
 	return retval;
 }
-- 
1.8.3.1
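
The two shrink paths touched by this series can be exercised from user
space. The sketch below is a minimal Linux-only test, assuming that
mremap() with flags == 0 and a smaller new length takes the in-place
shrink branch, and that a raw brk syscall with a lower address takes the
"always allow shrinking brk" branch. The helper names are made up for
illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Map `pages` anonymous pages, then shrink the mapping in place to one
 * page. Returns 0 if mremap() handed back the original address.
 */
static int shrink_mapping(size_t pages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t old_len = pages * page;
	void *addr, *res;
	int same;

	addr = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED)
		return -1;

	/* flags == 0: neither MREMAP_MAYMOVE nor MREMAP_FIXED. */
	res = mremap(addr, old_len, page, 0);
	if (res == MAP_FAILED) {
		munmap(addr, old_len);
		return -1;
	}
	same = (res == addr);
	munmap(res, page);
	return same ? 0 : -1;
}

/*
 * Raise the program break by `pages` pages, then lower it again.
 * The raw syscall returns the resulting break, so success means the
 * returned value equals the requested one.
 */
static int grow_then_shrink_brk(size_t pages)
{
	unsigned long len = pages * (unsigned long)sysconf(_SC_PAGESIZE);
	unsigned long start = (unsigned long)syscall(SYS_brk, 0UL);
	unsigned long grown = (unsigned long)syscall(SYS_brk, start + len);

	if (grown != start + len)
		return -1;
	/* Lowering the break asks the kernel to unmap the tail. */
	return (unsigned long)syscall(SYS_brk, start) == start ? 0 : -1;
}
```

The raw syscall is used instead of sbrk() because some C libraries
refuse non-zero sbrk() increments. The break is restored before
returning so that the allocator's view of the heap is unchanged.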



* Re: [PATCH 1/2 -mm] mm: mremap: downgrade mmap_sem to read when shrinking
  2018-09-26  0:46 [PATCH 1/2 -mm] mm: mremap: downgrade mmap_sem to read when shrinking Yang Shi
  2018-09-26  0:46 ` [PATCH 2/2 -mm] mm: brk: " Yang Shi
@ 2018-09-26  8:16 ` Michal Hocko
  1 sibling, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2018-09-26  8:16 UTC
  To: Yang Shi; +Cc: kirill, willy, ldufour, vbabka, akpm, linux-mm, linux-kernel

On Wed 26-09-18 08:46:55, Yang Shi wrote:
> Besides munmap, mremap can also be used to shrink a memory mapping.
> Use __do_munmap() for the shrink so that mmap_sem can be downgraded to
> read.

Please be more explicit about _why_ this is OK. It wouldn't hurt to call
out the benefits explicitly as well. You write about a downgrade to the
shared lock, which hints at what this is about, but we can be less
cryptic ;)

> The MREMAP_FIXED and MREMAP_MAYMOVE cases are harder to convert to this
> optimization because they need to manipulate vmas after do_munmap();
> downgrading mmap_sem there may open a race window.
> 
> A simple mapping shrink is the low-hanging fruit, and together with
> munmap it may cover most unmap cases.

-- 
Michal Hocko
SUSE Labs
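
The downgrade that the changelog refers to can be pictured with a toy,
single-threaded model. It assumes the return convention visible in the
patch (1 means the callee downgraded the lock, 0 means it is still held
for write, negative means error) and, as an assumption about the munmap
work this series builds on, that vmas are detached under the write lock
and pages are released under the read lock. Every toy_* name is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a reader/writer semaphore; not the kernel's rwsem. */
static struct {
	bool writer;	/* held exclusively */
	int readers;	/* number of shared holders */
} toy_mmap_sem;

static void toy_down_write(void)
{
	assert(!toy_mmap_sem.writer && toy_mmap_sem.readers == 0);
	toy_mmap_sem.writer = true;
}

/* Turn the exclusive hold into a shared one without unlocking. */
static void toy_downgrade_write(void)
{
	assert(toy_mmap_sem.writer);
	toy_mmap_sem.writer = false;
	toy_mmap_sem.readers = 1;
}

static void toy_up_write(void)
{
	assert(toy_mmap_sem.writer);
	toy_mmap_sem.writer = false;
}

static void toy_up_read(void)
{
	assert(toy_mmap_sem.readers > 0);
	toy_mmap_sem.readers--;
}

/* Stand-in for __do_munmap(..., downgrade); `err` injects a failure. */
static int toy_unmap(bool downgrade, int err)
{
	if (err)
		return err;	/* lock is still held for write */
	/* ... vmas would be detached here, under the write lock ... */
	if (!downgrade)
		return 0;
	toy_downgrade_write();
	/* ... pages would be released here, under the read lock ... */
	return 1;
}

/* Caller pattern from the patches: unlock in the mode the callee left. */
static int toy_shrink(int err)
{
	bool downgraded = false;
	int ret;

	toy_down_write();
	ret = toy_unmap(true, err);
	if (ret == 1)
		downgraded = true;
	if (downgraded)
		toy_up_read();
	else
		toy_up_write();
	return ret < 0 ? ret : 0;
}
```

In this model the expensive part runs while other readers could already
take the semaphore, which is presumably the benefit the changelog should
spell out. The flag exists only so the caller releases the lock in the
mode it is actually held.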


* Re: [PATCH 2/2 -mm] mm: brk: downgrade mmap_sem to read when shrinking
  2018-09-26  0:46 ` [PATCH 2/2 -mm] mm: brk: " Yang Shi
@ 2018-09-26  8:17   ` Michal Hocko
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2018-09-26  8:17 UTC
  To: Yang Shi; +Cc: kirill, willy, ldufour, vbabka, akpm, linux-mm, linux-kernel

On Wed 26-09-18 08:46:56, Yang Shi wrote:
> brk can also be used to shrink a memory mapping. Use __do_munmap() for
> the shrink so that mmap_sem can be downgraded to read.

Same comment about the changelog as for the previous patch.
-- 
Michal Hocko
SUSE Labs
