* [PATCH 0/18] change mmap_sem taken for write killable v2
From: Michal Hocko @ 2016-04-26 12:56 UTC
  To: linux-mm, Andrew Morton
  Cc: LKML, Alexander Viro, Alex Deucher, Alex Thorlton,
	Andrea Arcangeli, Andy Lutomirski, Benjamin LaHaise,
	Christian König, Daniel Vetter, Dave Hansen, David Airlie,
	Davidlohr Bueso, David Rientjes, H. Peter Anvin, Hugh Dickins,
	Ingo Molnar, Jeff Moyer, Johannes Weiner,
	Kirill A. Shutemov, Konstantin Khlebnikov, Mel Gorman,
	Michal Hocko, Oleg Nesterov, Peter Zijlstra, Petr Cermak,
	Thomas Gleixner, Vlastimil Babka

Hi,
The previous version of the series was posted here [0]. There have been
no large changes since then. I have rebased the series on top of the
current linux-next (next-20160426), added a few clarifications based on
the review feedback, and collected the acks/reviewed-by tags.

This is a follow-up work for the oom_reaper [1]. As the async OOM
killing depends on taking mmap_sem for read, we would really appreciate
it if a holder for write didn't stand in the way. This patchset changes
many of the down_write calls to be killable, to help those cases when
the writer is blocked waiting for readers to release the lock, and so
help __oom_reap_task to process the oom victim.

Most of the patches are really trivial because the lock is held from
shallow syscall paths where we can return EINTR trivially and allow the
current task to die (note that EINTR will never get to userspace as the
task has a fatal signal pending). Others seem to be easy as well because
the callers already handle fatal errors, bail out and return to
userspace, which should be sufficient to handle the failure gracefully.
I am not familiar with all those code paths so a deeper review is really
appreciated.
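
To illustrate, the conversion pattern used throughout the series looks
roughly like this (a minimal sketch with a made-up do_example() helper,
not a literal quote from any of the patches):

SYSCALL_DEFINE1(example, unsigned long, arg)
{
	int ret;

	/* back off with -EINTR if a fatal signal arrives while we wait */
	if (down_write_killable(&current->mm->mmap_sem))
		return -EINTR;

	ret = do_example(current->mm, arg);	/* hypothetical helper */
	up_write(&current->mm->mmap_sem);
	return ret;
}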

As this work touches several areas which are not directly connected, I
have tried to keep the CC list as small as possible, and people who I
believed would be familiar with an area are CCed only on the specific
patches (all should have received the cover letter though).

This patchset is based on linux-next and depends on
down_write_killable for rw_semaphores, which has been merged into the
tip locking/rwsem branch and is part of this next tree. I guess it
would be easiest to route these patches via mmotm because of the
dependency on the tip tree, but if the respective maintainers prefer
another way I have no objections.

I haven't covered all the down_write(mm->mmap_sem) instances here:

$ git grep "down_write(.*\<mmap_sem\>)" next/master | wc -l
98
$ git grep "down_write(.*\<mmap_sem\>)" | wc -l
62

I have tried to cover those which should be relatively easy to review in
this series because this alone should be a nice improvement. Other places
can be changed on top.

Any feedback is highly appreciated.

---
[0] http://lkml.kernel.org/r/1456752417-9626-1-git-send-email-mhocko@kernel.org
[1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org
[2] http://lkml.kernel.org/r/1456750705-7141-1-git-send-email-mhocko@kernel.org

* [PATCH 01/18] mm: Make mmap_sem for write waits killable for mm syscalls
From: Michal Hocko @ 2016-04-26 12:56 UTC
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Mel Gorman, Kirill A. Shutemov,
	Konstantin Khlebnikov, Hugh Dickins, Andrea Arcangeli,
	David Rientjes, Dave Hansen, Johannes Weiner, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

This is the first step in making mmap_sem write waiters killable. It
focuses on the trivial ones, which take the lock early after entering
the syscall and do not change any state before that.

Therefore it is very easy to change them to use down_write_killable
and immediately return with -EINTR. This will allow the waiter to
pass away without blocking the mmap_sem, which might be required to
make forward progress. E.g. the oom reaper will need the lock for
reading to dismantle the OOM victim's address space.
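
(For reference, the reaper side takes the lock only for reading and
backs off when it cannot get it; a minimal sketch of that pattern, not
the actual __oom_reap_task code:)

	if (!down_read_trylock(&mm->mmap_sem)) {
		/* a writer is blocking us; retry the reap later */
		return false;
	}
	/* ... unmap the victim's private anonymous memory ... */
	up_read(&mm->mmap_sem);
	return true;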

The only tricky function in this patch is vm_mmap_pgoff, which has many
call sites via vm_mmap. To reduce the risk, keep vm_mmap with the
original non-killable semantics for now.

vm_munmap callers do not bother checking the return value, so open code
it into the munmap syscall path for now for simplicity.

Cc: Mel Gorman <mgorman@suse.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/internal.h |  5 +++--
 mm/madvise.c  |  8 +++++---
 mm/mlock.c    | 16 ++++++++++------
 mm/mmap.c     | 27 +++++++++++++++++++++++----
 mm/mprotect.c |  3 ++-
 mm/mremap.c   |  3 ++-
 mm/nommu.c    |  2 +-
 mm/util.c     | 12 +++++++++---
 8 files changed, 55 insertions(+), 21 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index f69851ddf98d..bdc754e90c53 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -451,9 +451,10 @@ extern u64 hwpoison_filter_flags_value;
 extern u64 hwpoison_filter_memcg;
 extern u32 hwpoison_filter_enable;
 
-extern unsigned long vm_mmap_pgoff(struct file *, unsigned long,
+extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
         unsigned long, unsigned long,
-        unsigned long, unsigned long);
+        unsigned long, unsigned long,
+        bool);
 
 extern void set_pageblock_order(void);
 unsigned long reclaim_clean_pages_from_list(struct zone *zone,
diff --git a/mm/madvise.c b/mm/madvise.c
index 07427d3fcead..93fb63e88b5e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -707,10 +707,12 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 		return error;
 
 	write = madvise_need_mmap_write(behavior);
-	if (write)
-		down_write(&current->mm->mmap_sem);
-	else
+	if (write) {
+		if (down_write_killable(&current->mm->mmap_sem))
+			return -EINTR;
+	} else {
 		down_read(&current->mm->mmap_sem);
+	}
 
 	/*
 	 * If the interval [start,end) covers some unmapped address
diff --git a/mm/mlock.c b/mm/mlock.c
index 96f001041928..ef8dc9f395c4 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -617,7 +617,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
 	return error;
 }
 
-static int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
+static __must_check int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
 {
 	unsigned long locked;
 	unsigned long lock_limit;
@@ -635,7 +635,8 @@ static int do_mlock(unsigned long start, size_t len, vm_flags_t flags)
 	lock_limit >>= PAGE_SHIFT;
 	locked = len >> PAGE_SHIFT;
 
-	down_write(&current->mm->mmap_sem);
+	if (down_write_killable(&current->mm->mmap_sem))
+		return -EINTR;
 
 	locked += current->mm->locked_vm;
 
@@ -678,7 +679,8 @@ SYSCALL_DEFINE2(munlock, unsigned long, start, size_t, len)
 	len = PAGE_ALIGN(len + (offset_in_page(start)));
 	start &= PAGE_MASK;
 
-	down_write(&current->mm->mmap_sem);
+	if (down_write_killable(&current->mm->mmap_sem))
+		return -EINTR;
 	ret = apply_vma_lock_flags(start, len, 0);
 	up_write(&current->mm->mmap_sem);
 
@@ -748,9 +750,10 @@ SYSCALL_DEFINE1(mlockall, int, flags)
 	lock_limit = rlimit(RLIMIT_MEMLOCK);
 	lock_limit >>= PAGE_SHIFT;
 
-	ret = -ENOMEM;
-	down_write(&current->mm->mmap_sem);
+	if (down_write_killable(&current->mm->mmap_sem))
+		return -EINTR;
 
+	ret = -ENOMEM;
 	if (!(flags & MCL_CURRENT) || (current->mm->total_vm <= lock_limit) ||
 	    capable(CAP_IPC_LOCK))
 		ret = apply_mlockall_flags(flags);
@@ -765,7 +768,8 @@ SYSCALL_DEFINE0(munlockall)
 {
 	int ret;
 
-	down_write(&current->mm->mmap_sem);
+	if (down_write_killable(&current->mm->mmap_sem))
+		return -EINTR;
 	ret = apply_mlockall_flags(0);
 	up_write(&current->mm->mmap_sem);
 	return ret;
diff --git a/mm/mmap.c b/mm/mmap.c
index fba246b8f1a5..a11cdb6d2566 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -178,7 +178,8 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
 	unsigned long min_brk;
 	bool populate;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 
 #ifdef CONFIG_COMPAT_BRK
 	/*
@@ -1332,7 +1333,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff, true);
 out_fput:
 	if (file)
 		fput(file);
@@ -2493,6 +2494,10 @@ int vm_munmap(unsigned long start, size_t len)
 	int ret;
 	struct mm_struct *mm = current->mm;
 
+	/*
+	 * XXX convert to down_write_killable as soon as all users are able
+	 * to handle the error.
+	 */
 	down_write(&mm->mmap_sem);
 	ret = do_munmap(mm, start, len);
 	up_write(&mm->mmap_sem);
@@ -2502,8 +2507,15 @@ EXPORT_SYMBOL(vm_munmap);
 
 SYSCALL_DEFINE2(munmap, unsigned long, addr, size_t, len)
 {
+	int ret;
+	struct mm_struct *mm = current->mm;
+
 	profile_munmap(addr);
-	return vm_munmap(addr, len);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+	ret = do_munmap(mm, addr, len);
+	up_write(&mm->mmap_sem);
+	return ret;
 }
 
 
@@ -2535,7 +2547,9 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
 	if (pgoff + (size >> PAGE_SHIFT) < pgoff)
 		return ret;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
 	vma = find_vma(mm, start);
 
 	if (!vma || !(vma->vm_flags & VM_SHARED))
@@ -2700,6 +2714,11 @@ unsigned long vm_brk(unsigned long addr, unsigned long len)
 	unsigned long ret;
 	bool populate;
 
+	/*
+	 * XXX not all users are checking the return value, convert
+	 * to down_write_killable after they are able to cope with
+	 * error
+	 */
 	down_write(&mm->mmap_sem);
 	ret = do_brk(addr, len);
 	populate = ((mm->def_flags & VM_LOCKED) != 0);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b650c5412f58..5019a1ef2848 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -379,7 +379,8 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 
 	reqprot = prot;
 
-	down_write(&current->mm->mmap_sem);
+	if (down_write_killable(&current->mm->mmap_sem))
+		return -EINTR;
 
 	vma = find_vma(current->mm, start);
 	error = -ENOMEM;
diff --git a/mm/mremap.c b/mm/mremap.c
index 9dc499977924..1f157adfdaf9 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -503,7 +503,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
 	if (!new_len)
 		return ret;
 
-	down_write(&current->mm->mmap_sem);
+	if (down_write_killable(&current->mm->mmap_sem))
+		return -EINTR;
 
 	if (flags & MREMAP_FIXED) {
 		ret = mremap_to(addr, old_len, new_addr, new_len,
diff --git a/mm/nommu.c b/mm/nommu.c
index c8bd59a03c71..b74512746aae 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1446,7 +1446,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
+	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff, true);
 
 	if (file)
 		fput(file);
diff --git a/mm/util.c b/mm/util.c
index 8a1b3a1fb595..03b237746850 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -289,7 +289,7 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
 
 unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,
-	unsigned long flag, unsigned long pgoff)
+	unsigned long flag, unsigned long pgoff, bool killable)
 {
 	unsigned long ret;
 	struct mm_struct *mm = current->mm;
@@ -297,7 +297,12 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 
 	ret = security_mmap_file(file, prot, flag);
 	if (!ret) {
-		down_write(&mm->mmap_sem);
+		if (killable) {
+			if (down_write_killable(&mm->mmap_sem))
+				return -EINTR;
+		} else {
+			down_write(&mm->mmap_sem);
+		}
 		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
 				    &populate);
 		up_write(&mm->mmap_sem);
@@ -307,6 +312,7 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	return ret;
 }
 
+/* XXX are all callers checking an error */
 unsigned long vm_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,
 	unsigned long flag, unsigned long offset)
@@ -316,7 +322,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 	if (unlikely(offset_in_page(offset)))
 		return -EINVAL;
 
-	return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
+	return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT, false);
 }
 EXPORT_SYMBOL(vm_mmap);
 
-- 
2.8.0.rc3

* [PATCH 02/18] mm: make vm_mmap killable
From: Michal Hocko @ 2016-04-26 12:56 UTC
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Kirill A. Shutemov, Mel Gorman,
	Oleg Nesterov, Andrea Arcangeli, Al Viro, Johannes Weiner,
	Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

All the callers of vm_mmap seem to check for failure already and bail
out in one way or another on error, which means that we can change it
to use the killable version of vm_mmap_pgoff and return -EINTR if the
current task gets killed while waiting for mmap_sem. This also means
that vm_mmap_pgoff can be killable by default and we can drop the
additional parameter.

This will help in OOM conditions when the oom victim might be stuck
waiting for the mmap_sem for write, which in turn can block the
oom_reaper, which relies on the mmap_sem for read to make forward
progress and reclaim the address space of the victim.

Please note that load_elf_binary ignores the vm_mmap error for the
current->personality & MMAP_PAGE_ZERO case, but that shouldn't be a
problem because the address is not used anywhere and we never return to
userspace if we got killed.
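
(The call site in question looks roughly like this, paraphrased from
load_elf_binary in fs/binfmt_elf.c; details may differ by version:)

	if (current->personality & MMAP_PAGE_ZERO) {
		/*
		 * SVr4 maps page 0 as read-only and some applications
		 * depend on that; the return value is knowingly ignored
		 */
		error = vm_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
				MAP_FIXED | MAP_PRIVATE, 0);
	}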

Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/mm.h |  2 +-
 mm/internal.h      |  3 +--
 mm/mmap.c          |  2 +-
 mm/nommu.c         |  2 +-
 mm/util.c          | 13 ++++---------
 5 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ed06956a8a12..1085e025852a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2005,7 +2005,7 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {}
 /* These take the mm semaphore themselves */
 extern unsigned long vm_brk(unsigned long, unsigned long);
 extern int vm_munmap(unsigned long, size_t);
-extern unsigned long vm_mmap(struct file *, unsigned long,
+extern unsigned long __must_check vm_mmap(struct file *, unsigned long,
         unsigned long, unsigned long,
         unsigned long, unsigned long);
 
diff --git a/mm/internal.h b/mm/internal.h
index bdc754e90c53..dc2af5b7b85f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -453,8 +453,7 @@ extern u32 hwpoison_filter_enable;
 
 extern unsigned long  __must_check vm_mmap_pgoff(struct file *, unsigned long,
         unsigned long, unsigned long,
-        unsigned long, unsigned long,
-        bool);
+        unsigned long, unsigned long);
 
 extern void set_pageblock_order(void);
 unsigned long reclaim_clean_pages_from_list(struct zone *zone,
diff --git a/mm/mmap.c b/mm/mmap.c
index a11cdb6d2566..1d229487dab1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1333,7 +1333,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff, true);
+	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
 out_fput:
 	if (file)
 		fput(file);
diff --git a/mm/nommu.c b/mm/nommu.c
index b74512746aae..c8bd59a03c71 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1446,7 +1446,7 @@ SYSCALL_DEFINE6(mmap_pgoff, unsigned long, addr, unsigned long, len,
 
 	flags &= ~(MAP_EXECUTABLE | MAP_DENYWRITE);
 
-	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff, true);
+	retval = vm_mmap_pgoff(file, addr, len, prot, flags, pgoff);
 
 	if (file)
 		fput(file);
diff --git a/mm/util.c b/mm/util.c
index 03b237746850..917e0e3d0f8e 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -289,7 +289,7 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
 
 unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,
-	unsigned long flag, unsigned long pgoff, bool killable)
+	unsigned long flag, unsigned long pgoff)
 {
 	unsigned long ret;
 	struct mm_struct *mm = current->mm;
@@ -297,12 +297,8 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 
 	ret = security_mmap_file(file, prot, flag);
 	if (!ret) {
-		if (killable) {
-			if (down_write_killable(&mm->mmap_sem))
-				return -EINTR;
-		} else {
-			down_write(&mm->mmap_sem);
-		}
+		if (down_write_killable(&mm->mmap_sem))
+			return -EINTR;
 		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
 				    &populate);
 		up_write(&mm->mmap_sem);
@@ -312,7 +308,6 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 	return ret;
 }
 
-/* XXX are all callers checking an error */
 unsigned long vm_mmap(struct file *file, unsigned long addr,
 	unsigned long len, unsigned long prot,
 	unsigned long flag, unsigned long offset)
@@ -322,7 +317,7 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
 	if (unlikely(offset_in_page(offset)))
 		return -EINVAL;
 
-	return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT, false);
+	return vm_mmap_pgoff(file, addr, len, prot, flag, offset >> PAGE_SHIFT);
 }
 EXPORT_SYMBOL(vm_mmap);
 
-- 
2.8.0.rc3

* [PATCH 03/18] mm: make vm_munmap killable
From: Michal Hocko @ 2016-04-26 12:56 UTC
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Oleg Nesterov, Kirill A. Shutemov,
	Konstantin Khlebnikov, Andrea Arcangeli, Alexander Viro

From: Michal Hocko <mhocko@suse.com>

Almost all current users of vm_munmap ignore the return value and so
they do not handle potential errors. This means that some VMAs might
stay behind. This patch doesn't try to solve those potential problems.
Quite the contrary, it adds a new failure mode by using
down_write_killable in vm_munmap. This should be safer than the other
failure modes, though, because the process is guaranteed to die as soon
as it leaves the kernel and exit_mmap will clean up the whole address
space.

This will help in OOM conditions when the oom victim might be stuck
waiting for the mmap_sem for write, which in turn can block the
oom_reaper, which relies on the mmap_sem for read to make forward
progress and reclaim the address space of the victim.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/mmap.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 1d229487dab1..032605bda665 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2494,11 +2494,9 @@ int vm_munmap(unsigned long start, size_t len)
 	int ret;
 	struct mm_struct *mm = current->mm;
 
-	/*
-	 * XXX convert to down_write_killable as soon as all users are able
-	 * to handle the error.
-	 */
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
 	ret = do_munmap(mm, start, len);
 	up_write(&mm->mmap_sem);
 	return ret;
-- 
2.8.0.rc3

* [PATCH 04/18] mm, aout: handle vm_brk failures
From: Michal Hocko @ 2016-04-26 12:56 UTC
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Alexander Viro, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

vm_brk is allowed to fail but load_aout_binary simply ignores the error
and happily continues. I haven't noticed any problem from that in real
life, but later patches will make the failure more likely because
vm_brk will become killable (more precisely, waiting for mmap_sem for
write will become killable), so we should be more careful now.

The error handling should be quite straightforward because there are
calls to vm_mmap which check the error properly already. The only
notable exception is set_brk, which is called after the beyond_if
label. But nothing indicates that we cannot move it above set_binfmt,
as the two do not depend on each other, and fail before we do
set_binfmt and alter the reference counting.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/ia32/ia32_aout.c | 22 +++++++++++++++-------
 fs/binfmt_aout.c          | 11 ++++++++---
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index ae6aad1d24f7..f5e737ff0022 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -116,13 +116,13 @@ static struct linux_binfmt aout_format = {
 	.min_coredump	= PAGE_SIZE
 };
 
-static void set_brk(unsigned long start, unsigned long end)
+static unsigned long set_brk(unsigned long start, unsigned long end)
 {
 	start = PAGE_ALIGN(start);
 	end = PAGE_ALIGN(end);
 	if (end <= start)
-		return;
-	vm_brk(start, end - start);
+		return start;
+	return vm_brk(start, end - start);
 }
 
 #ifdef CONFIG_COREDUMP
@@ -349,7 +349,10 @@ static int load_aout_binary(struct linux_binprm *bprm)
 #endif
 
 		if (!bprm->file->f_op->mmap || (fd_offset & ~PAGE_MASK) != 0) {
-			vm_brk(N_TXTADDR(ex), ex.a_text+ex.a_data);
+			error = vm_brk(N_TXTADDR(ex), ex.a_text+ex.a_data);
+			if (IS_ERR_VALUE(error))
+				return error;
+
 			read_code(bprm->file, N_TXTADDR(ex), fd_offset,
 					ex.a_text+ex.a_data);
 			goto beyond_if;
@@ -372,10 +375,13 @@ static int load_aout_binary(struct linux_binprm *bprm)
 		if (error != N_DATADDR(ex))
 			return error;
 	}
+
 beyond_if:
-	set_binfmt(&aout_format);
+	error = set_brk(current->mm->start_brk, current->mm->brk);
+	if (IS_ERR_VALUE(error))
+		return error;
 
-	set_brk(current->mm->start_brk, current->mm->brk);
+	set_binfmt(&aout_format);
 
 	current->mm->start_stack =
 		(unsigned long)create_aout_tables((char __user *)bprm->p, bprm);
@@ -434,7 +440,9 @@ static int load_aout_library(struct file *file)
 			error_time = jiffies;
 		}
 #endif
-		vm_brk(start_addr, ex.a_text + ex.a_data + ex.a_bss);
+		retval = vm_brk(start_addr, ex.a_text + ex.a_data + ex.a_bss);
+		if (IS_ERR_VALUE(retval))
+			goto out;
 
 		read_code(file, start_addr, N_TXTOFF(ex),
 			  ex.a_text + ex.a_data);
diff --git a/fs/binfmt_aout.c b/fs/binfmt_aout.c
index 4c556680fa74..2fab9f130e51 100644
--- a/fs/binfmt_aout.c
+++ b/fs/binfmt_aout.c
@@ -297,7 +297,10 @@ static int load_aout_binary(struct linux_binprm * bprm)
 		}
 
 		if (!bprm->file->f_op->mmap||((fd_offset & ~PAGE_MASK) != 0)) {
-			vm_brk(N_TXTADDR(ex), ex.a_text+ex.a_data);
+			error = vm_brk(N_TXTADDR(ex), ex.a_text+ex.a_data);
+			if (IS_ERR_VALUE(error))
+				return error;
+
 			read_code(bprm->file, N_TXTADDR(ex), fd_offset,
 				  ex.a_text + ex.a_data);
 			goto beyond_if;
@@ -378,8 +381,10 @@ static int load_aout_library(struct file *file)
 			       "N_TXTOFF is not page aligned. Please convert library: %pD\n",
 			       file);
 		}
-		vm_brk(start_addr, ex.a_text + ex.a_data + ex.a_bss);
-		
+		retval = vm_brk(start_addr, ex.a_text + ex.a_data + ex.a_bss);
+		if (IS_ERR_VALUE(retval))
+			goto out;
+
 		read_code(file, start_addr, N_TXTOFF(ex),
 			  ex.a_text + ex.a_data);
 		retval = 0;
-- 
2.8.0.rc3

* [PATCH 05/18] mm, elf: handle vm_brk error
From: Michal Hocko @ 2016-04-26 12:56 UTC
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Alexander Viro, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

load_elf_library doesn't handle vm_brk failures, although nothing
really indicates it cannot do so, because the function is allowed to
fail due to vm_mmap failures already. This might not be a problem now,
but a later patch will make vm_brk killable (more precisely, waiting
for mmap_sem for write will become killable), and so the failure will
be more probable.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/binfmt_elf.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 81381cc0dd17..37455ee5aeec 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1176,8 +1176,11 @@ static int load_elf_library(struct file *file)
 	len = ELF_PAGESTART(eppnt->p_filesz + eppnt->p_vaddr +
 			    ELF_MIN_ALIGN - 1);
 	bss = eppnt->p_memsz + eppnt->p_vaddr;
-	if (bss > len)
-		vm_brk(len, bss - len);
+	if (bss > len) {
+		error = vm_brk(len, bss - len);
+		if (BAD_ADDR(error))
+			goto out_free_ph;
+	}
 	error = 0;
 
 out_free_ph:
-- 
2.8.0.rc3

* [PATCH 06/18] mm: make vm_brk killable
From: Michal Hocko @ 2016-04-26 12:56 UTC
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Kirill A. Shutemov, Oleg Nesterov,
	Andrea Arcangeli, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

Now that all the callers handle vm_brk failures we can change it to
wait for mmap_sem in a killable fashion, to help the oom_reaper not get
blocked just because vm_brk is stuck behind mmap_sem readers.

Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/mm.h | 2 +-
 mm/mmap.c          | 9 +++------
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1085e025852a..7b52750caf9e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2003,7 +2003,7 @@ static inline void mm_populate(unsigned long addr, unsigned long len) {}
 #endif
 
 /* These take the mm semaphore themselves */
-extern unsigned long vm_brk(unsigned long, unsigned long);
+extern unsigned long __must_check vm_brk(unsigned long, unsigned long);
 extern int vm_munmap(unsigned long, size_t);
 extern unsigned long __must_check vm_mmap(struct file *, unsigned long,
         unsigned long, unsigned long,
diff --git a/mm/mmap.c b/mm/mmap.c
index 032605bda665..62cb02310494 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2712,12 +2712,9 @@ unsigned long vm_brk(unsigned long addr, unsigned long len)
 	unsigned long ret;
 	bool populate;
 
-	/*
-	 * XXX not all users are checking the return value, convert
-	 * to down_write_killable after they are able to cope with
-	 * error
-	 */
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
 	ret = do_brk(addr, len);
 	populate = ((mm->def_flags & VM_LOCKED) != 0);
 	up_write(&mm->mmap_sem);
-- 
2.8.0.rc3

* [PATCH 07/18] mm, proc: make clear_refs killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Petr Cermak, Oleg Nesterov, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

CLEAR_REFS_MM_HIWATER_RSS and CLEAR_REFS_SOFT_DIRTY rely on mmap_sem
for write. If the waiting task gets killed by the oom killer while
operating on its own mm, it would block the oom_reaper from
asynchronous address space reclaim and reduce the chances of a timely
OOM resolution. Wait for the lock in the killable mode and return with
EINTR if the task gets killed while waiting. This also expedites the
return to userspace and do_exit even if the mm is remote.
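
As a side effect, a userspace writer of /proc/<pid>/clear_refs should
be prepared for the write to fail with EINTR. An illustrative sketch,
not part of the patch (resetting the peak RSS, as described in the
comment in the diff below):

	#include <errno.h>
	#include <fcntl.h>
	#include <unistd.h>

	static int reset_peak_rss(void)
	{
		int fd = open("/proc/self/clear_refs", O_WRONLY);
		int ret = 0;

		if (fd < 0)
			return -1;
		if (write(fd, "5", 1) < 0 && errno == EINTR)
			ret = -1;	/* killed while waiting for mmap_sem */
		close(fd);
		return ret;
	}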

Cc: Petr Cermak <petrcermak@chromium.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/proc/task_mmu.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 541583510cfb..4648c7f63ae2 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1027,11 +1027,15 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 		};
 
 		if (type == CLEAR_REFS_MM_HIWATER_RSS) {
+			if (down_write_killable(&mm->mmap_sem)) {
+				count = -EINTR;
+				goto out_mm;
+			}
+
 			/*
 			 * Writing 5 to /proc/pid/clear_refs resets the peak
 			 * resident set size to this mm's current rss value.
 			 */
-			down_write(&mm->mmap_sem);
 			reset_mm_hiwater_rss(mm);
 			up_write(&mm->mmap_sem);
 			goto out_mm;
@@ -1043,7 +1047,10 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
 				if (!(vma->vm_flags & VM_SOFTDIRTY))
 					continue;
 				up_read(&mm->mmap_sem);
-				down_write(&mm->mmap_sem);
+				if (down_write_killable(&mm->mmap_sem)) {
+					count = -EINTR;
+					goto out_mm;
+				}
 				for (vma = mm->mmap; vma; vma = vma->vm_next) {
 					vma->vm_flags &= ~VM_SOFTDIRTY;
 					vma_set_page_prot(vma);
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 08/18] mm, fork: make dup_mmap wait for mmap_sem for write killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Ingo Molnar, Peter Zijlstra, Oleg Nesterov,
	Konstantin Khlebnikov, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

dup_mmap needs to lock the current mm's mmap_sem for write. If the
waiting task gets killed by the oom killer, it would block the
oom_reaper from asynchronous address space reclaim and reduce the
chances of a timely OOM resolution. Wait for the lock in the killable
mode and return with EINTR if the task gets killed while waiting.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 kernel/fork.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 4bb0a7a0fbe0..bb29839a7e1b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -413,7 +413,10 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 	unsigned long charge;
 
 	uprobe_start_dup_mmap();
-	down_write(&oldmm->mmap_sem);
+	if (down_write_killable(&oldmm->mmap_sem)) {
+		retval = -EINTR;
+		goto fail_uprobe_end;
+	}
 	flush_cache_dup_mm(oldmm);
 	uprobe_dup_mmap(oldmm, mm);
 	/*
@@ -525,6 +528,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm)
 	up_write(&mm->mmap_sem);
 	flush_tlb_mm(oldmm);
 	up_write(&oldmm->mmap_sem);
+fail_uprobe_end:
 	uprobe_end_dup_mmap();
 	return retval;
 fail_nomem_anon_vma_fork:
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 09/18] ipc, shm: make shmem attach/detach wait for mmap_sem killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Hugh Dickins, Davidlohr Bueso, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

shmat and shmdt rely on mmap_sem for write. If the waiting task gets
killed by the oom killer, it would block the oom_reaper from
asynchronous address space reclaim and reduce the chances of a timely
OOM resolution. Wait for the lock in the killable mode and return with
EINTR if the task gets killed while waiting.
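
An illustrative userspace sketch, not part of the patch: shmat() can
now fail with EINTR when the calling task has been killed while
waiting for mmap_sem, and no retry is useful in that case.

	#include <errno.h>
	#include <sys/shm.h>

	static void *attach_segment(int shmid)
	{
		void *p = shmat(shmid, NULL, 0);

		if (p == (void *)-1 && errno == EINTR)
			return NULL;	/* fatal signal pending; give up */
		return p;
	}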

Cc: Hugh Dickins <hughd@google.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 ipc/shm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index 331fc1b0b3c7..13282510bc0d 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1200,7 +1200,11 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
 	if (err)
 		goto out_fput;
 
-	down_write(&current->mm->mmap_sem);
+	if (down_write_killable(&current->mm->mmap_sem)) {
+		err = -EINTR;
+		goto out_fput;
+	}
+
 	if (addr && !(shmflg & SHM_REMAP)) {
 		err = -EINVAL;
 		if (addr + size < addr)
@@ -1271,7 +1275,8 @@ SYSCALL_DEFINE1(shmdt, char __user *, shmaddr)
 	if (addr & ~PAGE_MASK)
 		return retval;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 
 	/*
 	 * This function tries to be smart and unmap shm segments that
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 10/18] vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
  2016-04-26 12:56 ` Michal Hocko
  (?)
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, linux-arch, Andy Lutomirski, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

Most architectures rely on mmap_sem for write in their
arch_setup_additional_pages. If the waiting task gets killed by the oom
killer, it would block the oom_reaper from asynchronous address space
reclaim and reduce the chances of a timely OOM resolution. Wait for the
lock in the killable mode and return with EINTR if the task gets killed
while waiting.

Cc: linux-arch@vger.kernel.org
Acked-by: Andy Lutomirski <luto@amacapital.net> # for the x86 vdso
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/arm/kernel/process.c          | 3 ++-
 arch/arm64/kernel/vdso.c           | 6 ++++--
 arch/hexagon/kernel/vdso.c         | 3 ++-
 arch/mips/kernel/vdso.c            | 3 ++-
 arch/powerpc/kernel/vdso.c         | 3 ++-
 arch/s390/kernel/vdso.c            | 3 ++-
 arch/sh/kernel/vsyscall/vsyscall.c | 4 +++-
 arch/x86/entry/vdso/vma.c          | 3 ++-
 arch/x86/um/vdso/vma.c             | 3 ++-
 9 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index a647d6642f3e..4a803c5a1ff7 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -420,7 +420,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	npages = 1; /* for sigpage */
 	npages += vdso_total_pages;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 	hint = sigpage_addr(mm, npages);
 	addr = get_unmapped_area(NULL, hint, npages << PAGE_SHIFT, 0, 0);
 	if (IS_ERR_VALUE(addr)) {
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 64fc030be0f2..9fefb005812a 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -95,7 +95,8 @@ int aarch32_setup_vectors_page(struct linux_binprm *bprm, int uses_interp)
 	};
 	void *ret;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 	current->mm->context.vdso = (void *)addr;
 
 	/* Map vectors page at the high address. */
@@ -163,7 +164,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 	/* Be sure to map the data page */
 	vdso_mapping_len = vdso_text_len + PAGE_SIZE;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 	vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
 		ret = ERR_PTR(vdso_base);
diff --git a/arch/hexagon/kernel/vdso.c b/arch/hexagon/kernel/vdso.c
index 0bf5a87e4d0a..3ea968415539 100644
--- a/arch/hexagon/kernel/vdso.c
+++ b/arch/hexagon/kernel/vdso.c
@@ -65,7 +65,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	unsigned long vdso_base;
 	struct mm_struct *mm = current->mm;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 
 	/* Try to get it loaded right near ld.so/glibc. */
 	vdso_base = STACK_TOP;
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index 975e99759bab..54e1663ce639 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -104,7 +104,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	struct resource gic_res;
 	int ret;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 
 	/*
 	 * Determine total area size. This includes the VDSO data itself, the
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index def1b8b5e6c1..6767605ea8da 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -195,7 +195,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	 * and end up putting it elsewhere.
 	 * Add enough to the size so that the result can be aligned.
 	 */
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 	vdso_base = get_unmapped_area(NULL, vdso_base,
 				      (vdso_pages << PAGE_SHIFT) +
 				      ((VDSO_ALIGNMENT - 1) & PAGE_MASK),
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index 94495cac8be3..5904abf6b1ae 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -216,7 +216,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	 * it at vdso_base which is the "natural" base for it, but we might
 	 * fail and end up putting it elsewhere.
 	 */
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 	vdso_base = get_unmapped_area(NULL, 0, vdso_pages << PAGE_SHIFT, 0, 0);
 	if (IS_ERR_VALUE(vdso_base)) {
 		rc = vdso_base;
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index ea2aa1393b87..cc0cc5b4ff18 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -64,7 +64,9 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	unsigned long addr;
 	int ret;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
 	addr = get_unmapped_area(NULL, 0, PAGE_SIZE, 0, 0);
 	if (IS_ERR_VALUE(addr)) {
 		ret = addr;
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index b3cf81333a54..ab220ac9b3b9 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -163,7 +163,8 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 		addr = 0;
 	}
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 
 	addr = get_unmapped_area(NULL, addr,
 				 image->size - image->sym_vvar_start, 0, 0);
diff --git a/arch/x86/um/vdso/vma.c b/arch/x86/um/vdso/vma.c
index 237c6831e095..6be22f991b59 100644
--- a/arch/x86/um/vdso/vma.c
+++ b/arch/x86/um/vdso/vma.c
@@ -61,7 +61,8 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	if (!vdso_enabled)
 		return 0;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
 
 	err = install_special_mapping(mm, um_vdso_addr, PAGE_SIZE,
 		VM_READ|VM_EXEC|
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 11/18] coredump: make coredump_wait wait for mmap_sem for write killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Oleg Nesterov, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

coredump_wait currently waits for mmap_sem for write, which can
prevent the oom_reaper from reclaiming the oom victim's address space
asynchronously because that requires mmap_sem for read. This might
happen if the oom victim is multi-threaded and some thread(s) hold
mmap_sem for read (e.g. in a page fault) and are stuck in the page
allocator while other thread(s) have already reached coredump_wait.

This patch simply uses down_write_killable and bails out with EINTR if
the wait for the lock is interrupted by a fatal signal. do_coredump
will return right away and do_group_exit will take care of zapping the
whole thread group.
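
Schematically, the caller side already copes with a negative return; a
simplified excerpt for illustration (names as in fs/coredump.c of this
era, quoted from memory):

	retval = coredump_wait(siginfo->si_signo, &core_state);
	if (retval < 0)
		goto fail_creds;	/* now also covers -EINTR */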

Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/coredump.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/coredump.c b/fs/coredump.c
index 47c32c3bfa1d..f2cef927789b 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -413,7 +413,9 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
 	core_state->dumper.task = tsk;
 	core_state->dumper.next = NULL;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
 	if (!mm->core_state)
 		core_waiters = zap_threads(tsk, mm, core_state, exit_code);
 	up_write(&mm->mmap_sem);
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 12/18] aio: make aio_setup_ring killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Benamin LaHaise, Alexander Viro, Jeff Moyer,
	Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

aio_setup_ring waits for mmap_sem in the writable mode. If the waiting
task gets killed by the oom killer, it would block the oom_reaper from
asynchronous address space reclaim and reduce the chances of a timely
OOM resolution. Wait for the lock in the killable mode and return with
EINTR if the task gets killed while waiting. This also expedites the
return to userspace and do_exit.
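
From userspace this surfaces as io_setup() failing; an illustrative
sketch, not part of the patch (using the libaio wrapper, which returns
a negative errno):

	#include <libaio.h>

	static int setup_aio_ctx(io_context_t *ctx)
	{
		int ret = io_setup(128, ctx);

		if (ret < 0)		/* now also covers -EINTR */
			return ret;
		return 0;
	}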

Cc: Benamin LaHaise <bcrl@kvack.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/aio.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/aio.c b/fs/aio.c
index 155f84253f33..be771046d77c 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -496,7 +496,12 @@ static int aio_setup_ring(struct kioctx *ctx)
 	ctx->mmap_size = nr_pages * PAGE_SIZE;
 	pr_debug("attempting mmap of %lu bytes\n", ctx->mmap_size);
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem)) {
+		ctx->mmap_size = 0;
+		aio_free_ring(ctx);
+		return -EINTR;
+	}
+
 	ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size,
 				       PROT_READ | PROT_WRITE,
 				       MAP_SHARED, 0, &unused);
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 13/18] exec: make exec path waiting for mmap_sem killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Alexander Viro, Oleg Nesterov, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

setup_arg_pages requires mmap_sem for write. If the waiting task gets
killed by the oom killer, it would block the oom_reaper from
asynchronous address space reclaim and reduce the chances of a timely
OOM resolution. Wait for the lock in the killable mode and return with
EINTR if the task gets killed while waiting. All the callers already
handle the error path and the fatal signal doesn't need any additional
treatment.

The same applies to __bprm_mm_init.
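
A simplified excerpt of the existing caller-side handling, for
illustration (as in load_elf_binary; exact names quoted from memory):

	retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
				 executable_stack);
	if (retval < 0)
		goto out_free_dentry;	/* now also covers -EINTR */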

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/exec.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 2f44590d88a9..d7a6ff09bb7a 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -267,7 +267,10 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 	if (!vma)
 		return -ENOMEM;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem)) {
+		err = -EINTR;
+		goto err_free;
+	}
 	vma->vm_mm = mm;
 
 	/*
@@ -294,6 +297,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 	return 0;
 err:
 	up_write(&mm->mmap_sem);
+err_free:
 	bprm->vma = NULL;
 	kmem_cache_free(vm_area_cachep, vma);
 	return err;
@@ -700,7 +704,9 @@ int setup_arg_pages(struct linux_binprm *bprm,
 		bprm->loader -= stack_shift;
 	bprm->exec -= stack_shift;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
 	vm_flags = VM_STACK_FLAGS;
 
 	/*
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 14/18] prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Alex Thorlton, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

PR_SET_THP_DISABLE requires mmap_sem for write. If the waiting task
gets killed by the oom killer, it would block the oom_reaper from
asynchronous address space reclaim and reduce the chances of a timely
OOM resolution. Wait for the lock in the killable mode and return with
EINTR if the task gets killed while waiting.
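
An illustrative userspace sketch, not part of the patch:

	#include <errno.h>
	#include <sys/prctl.h>

	static int disable_thp(void)
	{
		if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) < 0)
			return -errno;	/* -EINTR: killed while waiting */
		return 0;
	}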

Cc: Alex Thorlton <athorlton@sgi.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 kernel/sys.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index cf8ba545c7d3..89d5be418157 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2246,7 +2246,8 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_SET_THP_DISABLE:
 		if (arg3 || arg4 || arg5)
 			return -EINVAL;
-		down_write(&me->mm->mmap_sem);
+		if (down_write_killable(&me->mm->mmap_sem))
+			return -EINTR;
 		if (arg2)
 			me->mm->def_flags |= VM_NOHUGEPAGE;
 		else
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 15/18] uprobes: wait for mmap_sem for write killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Oleg Nesterov, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

xol_add_vma needs mmap_sem for write. If the waiting task gets killed
by the oom killer, it would block the oom_reaper from asynchronous
address space reclaim and reduce the chances of a timely OOM
resolution. Wait for the lock in the killable mode and return with
EINTR if the task gets killed while waiting.

Do not warn in dup_xol_work if __create_xol_area fails due to a
pending fatal signal, because such a warning is usually read as a sign
of a kernel issue.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 kernel/events/uprobes.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 7edc95edfaee..7bed7f63336d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1130,7 +1130,9 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
 	struct vm_area_struct *vma;
 	int ret;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
 	if (mm->uprobes_state.xol_area) {
 		ret = -EALREADY;
 		goto fail;
@@ -1469,7 +1471,8 @@ static void dup_xol_work(struct callback_head *work)
 	if (current->flags & PF_EXITING)
 		return;
 
-	if (!__create_xol_area(current->utask->dup_xol_addr))
+	if (!__create_xol_area(current->utask->dup_xol_addr) &&
+			!fatal_signal_pending(current))
 		uprobe_warn(current, "dup xol area");
 }
 
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 16/18] drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Daniel Vetter, David Airlie, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

i915_gem_mmap_ioctl relies on mmap_sem for write. If the waiting task
gets killed by the oom killer, it would block the oom_reaper from
asynchronous address space reclaim and reduce the chances of a timely
OOM resolution. Wait for the lock in the killable mode and return with
EINTR if the task gets killed while waiting.

Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: David Airlie <airlied@linux.ie>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 761e28febddc..b99c761846ce 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1721,7 +1721,10 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
 		struct mm_struct *mm = current->mm;
 		struct vm_area_struct *vma;
 
-		down_write(&mm->mmap_sem);
+		if (down_write_killable(&mm->mmap_sem)) {
+			drm_gem_object_unreference_unlocked(obj);
+			return -EINTR;
+		}
 		vma = find_vma(mm, addr);
 		if (vma)
 			vma->vm_page_prot =
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 17/18] drm/radeon: make radeon_mn_get wait for mmap_sem killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, Alex Deucher, Christian König,
	David Airlie, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

radeon_mn_get, which is called from the ioctl path, relies on mmap_sem
for write. If the waiting task gets killed by the oom killer, it would
block the oom_reaper from asynchronous address space reclaim and reduce
the chances of a timely OOM resolution. Wait for the lock in the
killable mode and return with EINTR if the task gets killed while
waiting.
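
The caller is expected to propagate the encoded error; a minimal
sketch for illustration (hypothetical caller of radeon_mn_get):

	struct radeon_mn *rmn = radeon_mn_get(rdev);

	if (IS_ERR(rmn))
		return PTR_ERR(rmn);	/* -EINTR when killed while waiting */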

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/gpu/drm/radeon/radeon_mn.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
index eef006c48584..896f2cf51e4e 100644
--- a/drivers/gpu/drm/radeon/radeon_mn.c
+++ b/drivers/gpu/drm/radeon/radeon_mn.c
@@ -186,7 +186,9 @@ static struct radeon_mn *radeon_mn_get(struct radeon_device *rdev)
 	struct radeon_mn *rmn;
 	int r;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return ERR_PTR(-EINTR);
+
 	mutex_lock(&rdev->mn_lock);
 
 	hash_for_each_possible(rdev->mn_hash, rmn, node, (unsigned long)mm)
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread
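
Because radeon_mn_get() returns a pointer, the killable failure is
encoded as ERR_PTR(-EINTR) and has to be decoded by the caller with
IS_ERR()/PTR_ERR(). A hedged sketch of such a caller follows; the
function name is illustrative, not taken from the driver.

/*
 * Illustrative caller; example_register_notifier() is hypothetical.
 * ERR_PTR()/PTR_ERR() let the -EINTR propagate through a
 * pointer-returning API without an extra out-parameter.
 */
static int example_register_notifier(struct radeon_device *rdev)
{
	struct radeon_mn *rmn = radeon_mn_get(rdev);

	if (IS_ERR(rmn))
		return PTR_ERR(rmn);	/* -EINTR when killed while waiting */

	/* ... hook the buffer object up to rmn ... */
	return 0;
}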

* [PATCH 18/18] drm/amdgpu: make amdgpu_mn_get wait for mmap_sem killable
  2016-04-26 12:56 ` Michal Hocko
@ 2016-04-26 12:56   ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-04-26 12:56 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: LKML, Michal Hocko, David Airlie, Alex Deucher,
	Christian König, Vlastimil Babka

From: Michal Hocko <mhocko@suse.com>

amdgpu_mn_get, which is called on the ioctl path, relies on mmap_sem
for write. If the waiting task gets killed by the oom killer, it would
block oom_reaper from asynchronous address space reclaim and reduce the
chances of timely OOM resolution. Wait for the lock in killable mode
and return with EINTR if the task got killed while waiting.

Cc: David Airlie <airlied@linux.ie>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 9f4a45cd2aab..cf90686a50d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -232,7 +232,10 @@ static struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
 	int r;
 
 	mutex_lock(&adev->mn_lock);
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem)) {
+		mutex_unlock(&adev->mn_lock);
+		return -EINTR;
+	}
 
 	hash_for_each_possible(adev->mn_hash, rmn, node, (unsigned long)mm)
 		if (rmn->mm == mm)
-- 
2.8.0.rc3

^ permalink raw reply related	[flat|nested] 56+ messages in thread
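
Note that, unlike the radeon variant, amdgpu_mn_get() already holds
adev->mn_lock when it blocks on mmap_sem, so the failure path has to
drop that mutex before returning. The same unwind discipline applies to
any killable acquire that sits behind other locks; schematically (the
helper and parameter names are hypothetical):

/*
 * Schematic only; two_lock_example() is hypothetical. Every lock taken
 * before the killable acquire must be released on the -EINTR path, in
 * reverse order, or the error return leaks a held lock.
 */
static int two_lock_example(struct mutex *outer, struct rw_semaphore *sem)
{
	mutex_lock(outer);
	if (down_write_killable(sem)) {
		mutex_unlock(outer);	/* unwind the outer lock first */
		return -EINTR;
	}

	/* ... critical section under both locks ... */

	up_write(sem);
	mutex_unlock(outer);
	return 0;
}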

* Re: [PATCH 14/18] prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
  2016-04-26 12:56   ` Michal Hocko
@ 2016-04-26 15:18     ` Alex Thorlton
  -1 siblings, 0 replies; 56+ messages in thread
From: Alex Thorlton @ 2016-04-26 15:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, LKML, Michal Hocko, Alex Thorlton,
	Vlastimil Babka

On Tue, Apr 26, 2016 at 02:56:21PM +0200, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> PR_SET_THP_DISABLE requires mmap_sem for write. If the waiting
> task gets killed by the oom killer, it would block oom_reaper from
> asynchronous address space reclaim and reduce the chances of timely OOM
> resolution. Wait for the lock in killable mode and return with EINTR
> if the task got killed while waiting.
> 
> Cc: Alex Thorlton <athorlton@sgi.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Looks good to me - I wrote that bit of code so I think this can get an:

Acked-by: Alex Thorlton <athorlton@sgi.com>

Thanks for Ccing me!

- Alex

^ permalink raw reply	[flat|nested] 56+ messages in thread
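
The patch body is not reproduced in this part of the thread; for readers
following along, the change being acked plausibly has the following
shape inside prctl()'s switch in kernel/sys.c. This is a reconstruction
from the commit message, not a quotation of the patch.

	/*
	 * Reconstructed sketch of the PR_SET_THP_DISABLE case after the
	 * change, based on the commit message above; not quoted from the
	 * patch itself.
	 */
	case PR_SET_THP_DISABLE:
		if (arg3 || arg4 || arg5)
			return -EINVAL;
		if (down_write_killable(&me->mm->mmap_sem))
			return -EINTR;	/* killed while waiting for the lock */
		if (arg2)
			me->mm->def_flags |= VM_NOHUGEPAGE;
		else
			me->mm->def_flags &= ~VM_NOHUGEPAGE;
		up_write(&me->mm->mmap_sem);
		break;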

* Re: [PATCH 13/18] exec: make exec path waiting for mmap_sem killable
  2016-02-29 13:26   ` Michal Hocko
  (?)
@ 2016-03-11 12:51     ` Vlastimil Babka
  -1 siblings, 0 replies; 56+ messages in thread
From: Vlastimil Babka @ 2016-03-11 12:51 UTC (permalink / raw)
  To: Michal Hocko, LKML
  Cc: Andrew Morton, linux-mm, Alex Deucher, Alex Thorlton,
	Andrea Arcangeli, Andy Lutomirski, Benjamin LaHaise,
	Christian König, Daniel Vetter, Dave Hansen, David Airlie,
	Davidlohr Bueso, David Rientjes, H . Peter Anvin, Hugh Dickins,
	Ingo Molnar, Johannes Weiner, Kirill A . Shutemov,
	Konstantin Khlebnikov, linux-arch, Mel Gorman, Oleg Nesterov,
	Peter Zijlstra, Petr Cermak, Thomas Gleixner, Michal Hocko,
	Alexander Viro

On 02/29/2016 02:26 PM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> setup_arg_pages requires mmap_sem for write. If the waiting task
> gets killed by the oom killer, it would block oom_reaper from
> asynchronous address space reclaim and reduce the chances of timely
> OOM resolution. Wait for the lock in killable mode and return with
> EINTR if the task got killed while waiting. All the callers are already
> handling the error path, and the fatal signal doesn't need any
> additional treatment.
>
> The same applies to __bprm_mm_init.
>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Signed-off-by: Michal Hocko <mhocko@suse.com>


Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 13/18] exec: make exec path waiting for mmap_sem killable
  2016-02-29 17:47       ` Michal Hocko
  (?)
@ 2016-02-29 18:10         ` Oleg Nesterov
  -1 siblings, 0 replies; 56+ messages in thread
From: Oleg Nesterov @ 2016-02-29 18:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Andrew Morton, linux-mm, Alex Deucher, Alex Thorlton,
	Andrea Arcangeli, Andy Lutomirski, Benjamin LaHaise,
	Christian König, Daniel Vetter, Dave Hansen, David Airlie,
	Davidlohr Bueso, David Rientjes, H . Peter Anvin, Hugh Dickins,
	Ingo Molnar, Johannes Weiner, Kirill A . Shutemov,
	Konstantin Khlebnikov, linux-arch, Mel Gorman, Peter Zijlstra,
	Petr Cermak, Thomas Gleixner, Alexander Viro

On 02/29, Michal Hocko wrote:
>
> On Mon 29-02-16 18:23:34, Oleg Nesterov wrote:
> > On 02/29, Michal Hocko wrote:
> > >
> > > @@ -267,7 +267,10 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
> > >  	if (!vma)
> > >  		return -ENOMEM;
> > >
> > > -	down_write(&mm->mmap_sem);
> > > +	if (down_write_killable(&mm->mmap_sem)) {
> > > +		err = -EINTR;
> > > +		goto err_free;
> > > +	}
> > >  	vma->vm_mm = mm;
> >
> > I won't argue, but this looks unnecessary. Nobody else can see this new mm,
> > down_write() can't block.
> >
> > In fact I think we can just remove down_write/up_write here. Except perhaps
> > there is lockdep_assert_held() somewhere in these paths.
>
> This is what I had initially but then I've noticed that mm_alloc() does
> mm_init(current)->init_new_context(current)

yes, and init_new_context() is arch dependent...

> code doesn't seem much harder to follow and the callers are already
> handling all error paths, so I guess it would be better to simply move
> on with this.

Yes, agreed, please forget.

Oleg.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 13/18] exec: make exec path waiting for mmap_sem killable
  2016-02-29 17:23     ` Oleg Nesterov
  (?)
@ 2016-02-29 17:47       ` Michal Hocko
  -1 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-02-29 17:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: LKML, Andrew Morton, linux-mm, Alex Deucher, Alex Thorlton,
	Andrea Arcangeli, Andy Lutomirski, Benjamin LaHaise,
	Christian König, Daniel Vetter, Dave Hansen, David Airlie,
	Davidlohr Bueso, David Rientjes, H . Peter Anvin, Hugh Dickins,
	Ingo Molnar, Johannes Weiner, Kirill A . Shutemov,
	Konstantin Khlebnikov, linux-arch, Mel Gorman, Peter Zijlstra,
	Petr Cermak, Thomas Gleixner, Alexander Viro

On Mon 29-02-16 18:23:34, Oleg Nesterov wrote:
> On 02/29, Michal Hocko wrote:
> >
> > @@ -267,7 +267,10 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
> >  	if (!vma)
> >  		return -ENOMEM;
> >  
> > -	down_write(&mm->mmap_sem);
> > +	if (down_write_killable(&mm->mmap_sem)) {
> > +		err = -EINTR;
> > +		goto err_free;
> > +	}
> >  	vma->vm_mm = mm;
> 
> I won't argue, but this looks unnecessary. Nobody else can see this new mm,
> down_write() can't block.
> 
> In fact I think we can just remove down_write/up_write here. Except perhaps
> there is lockdep_assert_held() somewhere in these paths.

This is what I had initially, but then I noticed that mm_alloc() does
mm_init(current)->init_new_context(current), so the outside world can
see this mm AFAICS. Now I guess this shouldn't matter in real life, but
the code doesn't seem much harder to follow and the callers are already
handling all error paths, so I guess it would be better to simply move
on with this. Or am I misunderstanding the code or missing something?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 56+ messages in thread
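
The chain referred to above, simplified for orientation (a condensed
rendering of kernel/fork.c with allocation details elided, not an exact
quotation):

/*
 * Simplified sketch of mm_alloc() as discussed above. The point is
 * that mm_init() ends in init_new_context(), which is arch-specific
 * and may make the new mm visible outside the current task before
 * exec ever takes mmap_sem.
 */
struct mm_struct *mm_alloc(void)
{
	struct mm_struct *mm = allocate_mm();

	if (!mm)
		return NULL;

	memset(mm, 0, sizeof(*mm));
	return mm_init(mm, current);	/* -> init_new_context(current, mm) */
}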

* Re: [PATCH 13/18] exec: make exec path waiting for mmap_sem killable
  2016-02-29 13:26   ` Michal Hocko
  (?)
@ 2016-02-29 17:23     ` Oleg Nesterov
  -1 siblings, 0 replies; 56+ messages in thread
From: Oleg Nesterov @ 2016-02-29 17:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Andrew Morton, linux-mm, Alex Deucher, Alex Thorlton,
	Andrea Arcangeli, Andy Lutomirski, Benjamin LaHaise,
	Christian König, Daniel Vetter, Dave Hansen, David Airlie,
	Davidlohr Bueso, David Rientjes, H . Peter Anvin, Hugh Dickins,
	Ingo Molnar, Johannes Weiner, Kirill A . Shutemov,
	Konstantin Khlebnikov, linux-arch, Mel Gorman, Peter Zijlstra,
	Petr Cermak, Thomas Gleixner, Michal Hocko, Alexander Viro

On 02/29, Michal Hocko wrote:
>
> @@ -267,7 +267,10 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
>  	if (!vma)
>  		return -ENOMEM;
>  
> -	down_write(&mm->mmap_sem);
> +	if (down_write_killable(&mm->mmap_sem)) {
> +		err = -EINTR;
> +		goto err_free;
> +	}
>  	vma->vm_mm = mm;

I won't argue, but this looks unnecessary. Nobody else can see this new mm,
down_write() can't block.

In fact I think we can just remove down_write/up_write here. Except perhaps
there is lockdep_assert_held() somewhere in these paths.

Oleg.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH 13/18] exec: make exec path waiting for mmap_sem killable
  2016-02-29 13:26 [PATCH 0/18] change mmap_sem taken for write killable Michal Hocko
  2016-02-29 13:26   ` Michal Hocko
@ 2016-02-29 13:26   ` Michal Hocko
  0 siblings, 0 replies; 56+ messages in thread
From: Michal Hocko @ 2016-02-29 13:26 UTC (permalink / raw)
  To: LKML
  Cc: Andrew Morton, linux-mm, Alex Deucher, Alex Thorlton,
	Andrea Arcangeli, Andy Lutomirski, Benjamin LaHaise,
	Christian König, Daniel Vetter, Dave Hansen, David Airlie,
	Davidlohr Bueso, David Rientjes, H . Peter Anvin, Hugh Dickins,
	Ingo Molnar, Johannes Weiner, Kirill A . Shutemov,
	Konstantin Khlebnikov, linux-arch, Mel Gorman, Oleg Nesterov,
	Peter Zijlstra, Petr Cermak, Thomas Gleixner, Michal Hocko,
	Alexander Viro

From: Michal Hocko <mhocko@suse.com>

setup_arg_pages requires mmap_sem for write. If the waiting task
gets killed by the oom killer, it would block oom_reaper from
asynchronous address space reclaim and reduce the chances of timely
OOM resolution. Wait for the lock in killable mode and return with
EINTR if the task got killed while waiting. All the callers are already
handling the error path, and the fatal signal doesn't need any
additional treatment.

The same applies to __bprm_mm_init.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/exec.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index c4010b8207a1..29f2f22ae067 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -267,7 +267,10 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 	if (!vma)
 		return -ENOMEM;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem)) {
+		err = -EINTR;
+		goto err_free;
+	}
 	vma->vm_mm = mm;
 
 	/*
@@ -294,6 +297,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
 	return 0;
 err:
 	up_write(&mm->mmap_sem);
+err_free:
 	bprm->vma = NULL;
 	kmem_cache_free(vm_area_cachep, vma);
 	return err;
@@ -700,7 +704,9 @@ int setup_arg_pages(struct linux_binprm *bprm,
 		bprm->loader -= stack_shift;
 	bprm->exec -= stack_shift;
 
-	down_write(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem))
+		return -EINTR;
+
 	vm_flags = VM_STACK_FLAGS;
 
 	/*
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread
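
Since setup_arg_pages() already returned negative errno codes, a binfmt
loader consuming the new -EINTR needs no extra handling; the error takes
the same path as any other failure. Schematically (the loader and label
are hypothetical, loosely patterned on fs/binfmt_elf.c):

/*
 * Illustrative caller; example_load_binary() and the out_free label
 * are hypothetical. -EINTR from a fatal signal flows through the same
 * error path as any other setup_arg_pages() failure.
 */
static int example_load_binary(struct linux_binprm *bprm)
{
	int retval;

	retval = setup_arg_pages(bprm, STACK_TOP, EXSTACK_DEFAULT);
	if (retval < 0)
		goto out_free;

	/* ... map the binary and set up the entry point ... */
	return 0;

out_free:
	/* ... release loader-private resources ... */
	return retval;
}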

end of thread

Thread overview: 56+ messages
2016-04-26 12:56 [PATCH 0/18] change mmap_sem taken for write killable v2 Michal Hocko
2016-04-26 12:56 ` Michal Hocko
2016-04-26 12:56 ` [PATCH 01/18] mm: Make mmap_sem for write waits killable for mm syscalls Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 02/18] mm: make vm_mmap killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 03/18] mm: make vm_munmap killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 04/18] mm, aout: handle vm_brk failures Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 05/18] mm, elf: handle vm_brk error Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 06/18] mm: make vm_brk killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 07/18] mm, proc: make clear_refs killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 08/18] mm, fork: make dup_mmap wait for mmap_sem for write killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 09/18] ipc, shm: make shmem attach/detach wait for mmap_sem killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 10/18] vdso: make arch_setup_additional_pages wait for mmap_sem for write killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 11/18] coredump: make coredump_wait " Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 12/18] aio: make aio_setup_ring killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 13/18] exec: make exec path waiting for mmap_sem killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 14/18] prctl: make PR_SET_THP_DISABLE wait " Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 15:18   ` Alex Thorlton
2016-04-26 15:18     ` Alex Thorlton
2016-04-26 12:56 ` [PATCH 15/18] uprobes: wait for mmap_sem for write killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 16/18] drm/i915: make i915_gem_mmap_ioctl wait for mmap_sem killable Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 17/18] drm/radeon: make radeon_mn_get " Michal Hocko
2016-04-26 12:56   ` Michal Hocko
2016-04-26 12:56 ` [PATCH 18/18] drm/amdgpu: make amdgpu_mn_get " Michal Hocko
2016-04-26 12:56   ` Michal Hocko
  -- strict thread matches above, loose matches on Subject: below --
2016-02-29 13:26 [PATCH 0/18] change mmap_sem taken for write killable Michal Hocko
2016-02-29 13:26 ` [PATCH 13/18] exec: make exec path waiting for mmap_sem killable Michal Hocko
2016-02-29 13:26   ` Michal Hocko
2016-02-29 13:26   ` Michal Hocko
2016-02-29 17:23   ` Oleg Nesterov
2016-02-29 17:23     ` Oleg Nesterov
2016-02-29 17:23     ` Oleg Nesterov
2016-02-29 17:47     ` Michal Hocko
2016-02-29 17:47       ` Michal Hocko
2016-02-29 17:47       ` Michal Hocko
2016-02-29 18:10       ` Oleg Nesterov
2016-02-29 18:10         ` Oleg Nesterov
2016-02-29 18:10         ` Oleg Nesterov
2016-03-11 12:51   ` Vlastimil Babka
2016-03-11 12:51     ` Vlastimil Babka
2016-03-11 12:51     ` Vlastimil Babka
