* Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
@ 2017-12-06  2:43 David Rientjes
  2017-12-06  2:58 ` David Rientjes
                   ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: David Rientjes @ 2017-12-06  2:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Tetsuo Handa, Andrea Arcangeli, linux-kernel, linux-mm

Hi,

I'd like to understand the synchronization between the oom_reaper's 
unmap_page_range() and exit_mmap().  The latter does not hold 
mm->mmap_sem: it's supposed to be the last thread operating on the mm 
before it is destroyed.
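
For reference, the relevant flow, condensed from mm/mmap.c (note there is 
no down_read()/down_write() of mm->mmap_sem anywhere around unmap_vmas()):

	/* exit_mmap(), condensed */
	mmu_notifier_release(mm);
	/* munlock VM_LOCKED vmas, arch_exit_mmap(), ... */
	tlb_gather_mmu(&tlb, mm, 0, -1);
	unmap_vmas(&tlb, vma, 0, -1);
	set_bit(MMF_OOM_SKIP, &mm->flags);
	if (unlikely(tsk_is_oom_victim(current))) {
		/* synchronize with an oom reaper holding mmap_sem for read */
		down_write(&mm->mmap_sem);
		up_write(&mm->mmap_sem);
	}
	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);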

If unmap_page_range() races with unmap_vmas(), we trivially call 
page_remove_rmap() twice on the same page:

BUG: Bad page map in process oom_reaper  pte:6353826300000000 pmd:00000000
addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping:          (null) index:7f50cab1d
file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
CPU: 2 PID: 1001 Comm: oom_reaper
Call Trace:
 [<ffffffffa4bd967d>] dump_stack+0x4d/0x70
 [<ffffffffa4a03558>] unmap_page_range+0x1068/0x1130
 [<ffffffffa4a2e07f>] __oom_reap_task_mm+0xd5/0x16b
 [<ffffffffa4a2e226>] oom_reaper+0xff/0x14c
 [<ffffffffa48d6ad1>] kthread+0xc1/0xe0

And there are more examples of badness from an unmap_page_range() racing 
with unmap_vmas().  In this case, MMF_OOM_SKIP is doing two things: (1) 
avoiding additional oom kills until unmap_vmas() returns and (2) preventing 
the oom_reaper from working on the mm after unmap_vmas().  In (2), there's nothing 
preventing the oom reaper from calling unmap_page_range() in parallel with 
the final thread doing unmap_vmas() -- we no longer do mmget() to prevent 
exit_mmap() from being called.
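
For context, the refcounting involved here (a recap in comment form, not 
from this thread):

	/*
	 * mmget()/mmput()   pin mm->mm_users: __mmput()/exit_mmap() cannot
	 *                   run while such a reference is held.
	 * mmgrab()/mmdrop() pin mm->mm_count: the mm_struct itself stays
	 *                   allocated, but exit_mmap() can still tear the
	 *                   address space down underneath the holder.
	 * The oom reaper holds only an mmgrab()-style reference via
	 * signal->oom_mm, so nothing pins the address space itself.
	 */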

I don't think that we can allow the oom reaper's unmap_page_range() to 
race with unmap_vmas().  If we can, what allows this if we don't either 
increment mm->mm_users in the oom reaper or hold mm->mmap_sem for write in 
exit_mmap()?

One way to solve the issue is to have two mm flags: one to indicate the mm 
is entering unmap_vmas(): set the flag, do down_write(&mm->mmap_sem); 
up_write(&mm->mmap_sem), then unmap_vmas().  The oom reaper needs this 
flag clear, not MMF_OOM_SKIP, while holding down_read(&mm->mmap_sem) to be 
allowed to call unmap_page_range().  The oom killer will still defer 
selecting this victim for MMF_OOM_SKIP after unmap_vmas() returns.

The result of that change would be that we do not oom reap from any mm 
entering unmap_vmas(): we let unmap_vmas() do the work itself and avoid 
racing with it.
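
In rough code terms, the handshake would look like this (a sketch only; 
MMF_REAPING is a placeholder name):

	/* exit_mmap(), before starting the teardown */
	set_bit(MMF_REAPING, &mm->flags);
	down_write(&mm->mmap_sem);	/* wait out a reaper already in flight */
	up_write(&mm->mmap_sem);
	unmap_vmas(&tlb, vma, 0, -1);

	/* oom reaper, while holding down_read(&mm->mmap_sem) */
	if (test_bit(MMF_REAPING, &mm->flags)) {
		up_read(&mm->mmap_sem);
		goto unlock_oom;	/* leave the work to unmap_vmas() */
	}
	unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, NULL);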


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-06  2:43 Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap David Rientjes
@ 2017-12-06  2:58 ` David Rientjes
       [not found]   ` <201712060328.vB63SrDK069830@www262.sakura.ne.jp>
  2017-12-06  8:31 ` Michal Hocko
  2017-12-07 11:35 ` Michal Hocko
  2 siblings, 1 reply; 19+ messages in thread
From: David Rientjes @ 2017-12-06  2:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Tetsuo Handa, Andrea Arcangeli, linux-kernel, linux-mm

On Tue, 5 Dec 2017, David Rientjes wrote:

> One way to solve the issue is to have two mm flags: one to indicate the mm 
> is entering unmap_vmas(): set the flag, do down_write(&mm->mmap_sem); 
> up_write(&mm->mmap_sem), then unmap_vmas().  The oom reaper needs this 
> flag clear, not MMF_OOM_SKIP, while holding down_read(&mm->mmap_sem) to be 
> allowed to call unmap_page_range().  The oom killer will still defer 
> selecting this victim for MMF_OOM_SKIP after unmap_vmas() returns.
> 
> The result of that change would be that we do not oom reap from any mm 
> entering unmap_vmas(): we let unmap_vmas() do the work itself and avoid 
> racing with it.
> 

I think we need something like the following?

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -70,6 +70,7 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
 #define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
 #define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
+#define MMF_REAPING		25	/* mm is undergoing reaping */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3014,16 +3014,11 @@ void exit_mmap(struct mm_struct *mm)
 
 	lru_add_drain();
 	flush_cache_mm(mm);
-	tlb_gather_mmu(&tlb, mm, 0, -1);
-	/* update_hiwater_rss(mm) here? but nobody should be looking */
-	/* Use -1 here to ensure all VMAs in the mm are unmapped */
-	unmap_vmas(&tlb, vma, 0, -1);
-
-	set_bit(MMF_OOM_SKIP, &mm->flags);
+	set_bit(MMF_REAPING, &mm->flags);
 	if (unlikely(tsk_is_oom_victim(current))) {
 		/*
 		 * Wait for oom_reap_task() to stop working on this
-		 * mm. Because MMF_OOM_SKIP is already set before
+		 * mm. Because MMF_REAPING is already set before
 		 * calling down_read(), oom_reap_task() will not run
 		 * on this "mm" post up_write().
 		 *
@@ -3036,6 +3031,11 @@ void exit_mmap(struct mm_struct *mm)
 		down_write(&mm->mmap_sem);
 		up_write(&mm->mmap_sem);
 	}
+	tlb_gather_mmu(&tlb, mm, 0, -1);
+	/* update_hiwater_rss(mm) here? but nobody should be looking */
+	/* Use -1 here to ensure all VMAs in the mm are unmapped */
+	unmap_vmas(&tlb, vma, 0, -1);
+	set_bit(MMF_OOM_SKIP, &mm->flags);
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -529,12 +529,12 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	}
 
 	/*
-	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
-	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
+	 * MMF_REAPING is set by exit_mmap when the OOM reaper can't
+	 * work on the mm anymore. The check for MMF_REAPING must run
 	 * under mmap_sem for reading because it serializes against the
 	 * down_write();up_write() cycle in exit_mmap().
 	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+	if (test_bit(MMF_REAPING, &mm->flags)) {
 		up_read(&mm->mmap_sem);
 		trace_skip_task_reaping(tsk->pid);
 		goto unlock_oom;


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
       [not found]   ` <201712060328.vB63SrDK069830@www262.sakura.ne.jp>
@ 2017-12-06  7:48     ` David Rientjes
  2017-12-06  9:00       ` Michal Hocko
  2017-12-06 11:37       ` Tetsuo Handa
  2017-12-06  8:50     ` Michal Hocko
  1 sibling, 2 replies; 19+ messages in thread
From: David Rientjes @ 2017-12-06  7:48 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Andrew Morton, Michal Hocko, Andrea Arcangeli, linux-kernel, linux-mm

On Wed, 6 Dec 2017, Tetsuo Handa wrote:

> > > One way to solve the issue is to have two mm flags: one to indicate the mm 
> > > is entering unmap_vmas(): set the flag, do down_write(&mm->mmap_sem); 
> > > up_write(&mm->mmap_sem), then unmap_vmas().  The oom reaper needs this 
> > > flag clear, not MMF_OOM_SKIP, while holding down_read(&mm->mmap_sem) to be 
> > > allowed to call unmap_page_range().  The oom killer will still defer 
> > > selecting this victim for MMF_OOM_SKIP after unmap_vmas() returns.
> > > 
> > > The result of that change would be that we do not oom reap from any mm 
> > > entering unmap_vmas(): we let unmap_vmas() do the work itself and avoid 
> > > racing with it.
> > > 
> > 
> > I think we need something like the following?
> 
> This patch does not work. __oom_reap_task_mm() can find MMF_REAPING and
> return true, causing oom_reap_task() to set MMF_OOM_SKIP before exit_mmap()
> calls down_write().
> 

Ah, you're talking about oom_reap_task() setting MMF_OOM_SKIP prematurely 
and allowing for additional oom kills?  I see your point, but I was mainly 
focused on preventing the panic as the first order of business.  We could 
certainly fix oom_reap_task() to not set MMF_OOM_SKIP itself and rather 
leave that to exit_mmap() when it finds MMF_REAPING, if your concern matches 
my understanding.

> Also, I don't know what exit_mmap() is doing but I think that there is a
> possibility that the OOM reaper tries to reclaim mlocked pages as soon as
> exit_mmap() cleared VM_LOCKED flag by calling munlock_vma_pages_all().
> 
> 	if (mm->locked_vm) {
> 		vma = mm->mmap;
> 		while (vma) {
> 			if (vma->vm_flags & VM_LOCKED)
> 				munlock_vma_pages_all(vma);
> 			vma = vma->vm_next;
> 		}
> 	}
> 

Yes, that looks possible as well, although the problem I have reported can 
happen with or without mlock.  Did you find this by code inspection or 
have you experienced runtime problems with it?

I think this argues to do MMF_REAPING-style behavior at the beginning of 
exit_mmap() and avoid reaping altogether once we have reached that 
point.  There are no more users of the mm and we are in the process of 
tearing it down, so I'm not sure that the oom reaper should be in the 
business of trying to interfere with that.  Or are there actual bug 
reports where an oom victim gets wedged while in exit_mmap() prior to 
releasing its memory?

I know this conflicts with your patches in -mm to remove the oom mutex, 
but I think we should make sure we can prevent crashes before cleaning it 
up.

(I also noticed that the mm_has_notifiers() check is missing a 
trace_skip_task_reaping(tsk->pid))
---
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -70,6 +70,7 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
 #define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
 #define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
+#define MMF_REAPING		25	/* mm is undergoing reaping */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2996,6 +2996,23 @@ void exit_mmap(struct mm_struct *mm)
 
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
+	set_bit(MMF_REAPING, &mm->flags);
+	if (unlikely(tsk_is_oom_victim(current))) {
+		/*
+		 * Wait for oom_reap_task() to stop working on this
+		 * mm. Because MMF_REAPING is already set before
+		 * calling down_read(), oom_reap_task() will not run
+		 * on this "mm" post up_write().
+		 *
+		 * tsk_is_oom_victim() cannot be set from under us
+		 * either because current->mm is already set to NULL
+		 * under task_lock before calling mmput and oom_mm is
+		 * set not NULL by the OOM killer only if current->mm
+		 * is found not NULL while holding the task_lock.
+		 */
+		down_write(&mm->mmap_sem);
+		up_write(&mm->mmap_sem);
+	}
 
 	if (mm->locked_vm) {
 		vma = mm->mmap;
@@ -3018,26 +3035,9 @@ void exit_mmap(struct mm_struct *mm)
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
-
-	set_bit(MMF_OOM_SKIP, &mm->flags);
-	if (unlikely(tsk_is_oom_victim(current))) {
-		/*
-		 * Wait for oom_reap_task() to stop working on this
-		 * mm. Because MMF_OOM_SKIP is already set before
-		 * calling down_read(), oom_reap_task() will not run
-		 * on this "mm" post up_write().
-		 *
-		 * tsk_is_oom_victim() cannot be set from under us
-		 * either because current->mm is already set to NULL
-		 * under task_lock before calling mmput and oom_mm is
-		 * set not NULL by the OOM killer only if current->mm
-		 * is found not NULL while holding the task_lock.
-		 */
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
-	}
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
+	set_bit(MMF_OOM_SKIP, &mm->flags);
 
 	/*
 	 * Walk the list again, actually closing and freeing it,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -485,30 +485,29 @@ static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
 static struct task_struct *oom_reaper_list;
 static DEFINE_SPINLOCK(oom_reaper_lock);
 
-static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
+/*
+ * Returns 0 on success or
+ * 	-EAGAIN: mm->mmap_sem is contended
+ *	-EINVAL: mm is ineligible
+ */
+static int __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
-	bool ret = true;
+	int ret = 0;
 
 	/*
 	 * We have to make sure to not race with the victim exit path
-	 * and cause premature new oom victim selection:
-	 * __oom_reap_task_mm		exit_mm
-	 *   mmget_not_zero
-	 *				  mmput
-	 *				    atomic_dec_and_test
-	 *				  exit_oom_victim
-	 *				[...]
-	 *				out_of_memory
-	 *				  select_bad_process
-	 *				    # no TIF_MEMDIE task selects new victim
-	 *  unmap_page_range # frees some memory
+	 * and cause premature new oom victim selection: if MMF_REAPING is
+	 * not set under the protection of mmap_sem, allow reaping because
+	 * exit_mmap() has not been entered, which is serialized with a
+	 * down_write();up_write() cycle.  Otherwise, leave reaping to
+	 * exit_mmap(), which will set MMF_OOM_SKIP itself.
 	 */
 	mutex_lock(&oom_lock);
 
 	if (!down_read_trylock(&mm->mmap_sem)) {
-		ret = false;
+		ret = -EAGAIN;
 		trace_skip_task_reaping(tsk->pid);
 		goto unlock_oom;
 	}
@@ -529,13 +528,14 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	}
 
 	/*
-	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
-	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
+	 * MMF_REAPING is set by exit_mmap when the OOM reaper can't
+	 * work on the mm anymore. The check for MMF_REAPING must run
 	 * under mmap_sem for reading because it serializes against the
 	 * down_write();up_write() cycle in exit_mmap().
 	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+	if (test_bit(MMF_REAPING, &mm->flags)) {
 		up_read(&mm->mmap_sem);
+		ret = -EINVAL;
 		trace_skip_task_reaping(tsk->pid);
 		goto unlock_oom;
 	}
@@ -589,15 +589,18 @@ static void oom_reap_task(struct task_struct *tsk)
 {
 	int attempts = 0;
 	struct mm_struct *mm = tsk->signal->oom_mm;
+	int ret;
 
 	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
+	while (attempts++ < MAX_OOM_REAP_RETRIES) {
+		ret = __oom_reap_task_mm(tsk, mm);
+		if (ret != -EAGAIN)
+			break;
 		schedule_timeout_idle(HZ/10);
-
+	}
 	if (attempts <= MAX_OOM_REAP_RETRIES)
 		goto done;
 
-
 	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
 		task_pid_nr(tsk), tsk->comm);
 	debug_show_all_locks();
@@ -609,7 +612,8 @@ static void oom_reap_task(struct task_struct *tsk)
 	 * Hide this mm from OOM killer because it has been either reaped or
 	 * somebody can't call up_write(mmap_sem).
 	 */
-	set_bit(MMF_OOM_SKIP, &mm->flags);
+	if (ret != -EINVAL)
+		set_bit(MMF_OOM_SKIP, &mm->flags);
 
 	/* Drop a reference taken by wake_oom_reaper */
 	put_task_struct(tsk);


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-06  2:43 Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap David Rientjes
  2017-12-06  2:58 ` David Rientjes
@ 2017-12-06  8:31 ` Michal Hocko
  2017-12-07 11:35 ` Michal Hocko
  2 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2017-12-06  8:31 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Tetsuo Handa, Andrea Arcangeli, linux-kernel, linux-mm

On Tue 05-12-17 18:43:48, David Rientjes wrote:
> Hi,
> 
> I'd like to understand the synchronization between the oom_reaper's 
> unmap_page_range() and exit_mmap().  The latter does not hold 
> mm->mmap_sem: it's supposed to be the last thread operating on the mm 
> before it is destroyed.
> 
> If unmap_page_range() races with unmap_vmas(), we trivially call 
> page_remove_rmap() twice on the same page:

Well, the oom reaper is basically MADV_DONTNEED, and that allows
parallel teardown (it takes only mmap_sem for read). The exit path
doesn't take the mmap_sem during unmap_vmas, but that shouldn't make any
difference because both paths would take it for read anyway. The
essential synchronization between the oom reaper and exit_mmap is
exit_mmap:	
	set_bit(MMF_OOM_SKIP, &mm->flags);
	if (unlikely(tsk_is_oom_victim(current))) {
		/*
		 * Wait for oom_reap_task() to stop working on this
		 * mm. Because MMF_OOM_SKIP is already set before
		 * calling down_read(), oom_reap_task() will not run
		 * on this "mm" post up_write().
		 *
		 * tsk_is_oom_victim() cannot be set from under us
		 * either because current->mm is already set to NULL
		 * under task_lock before calling mmput and oom_mm is
		 * set not NULL by the OOM killer only if current->mm
		 * is found not NULL while holding the task_lock.
		 */
		down_write(&mm->mmap_sem);
		up_write(&mm->mmap_sem);
	}

oom_reaper
	if (!down_read_trylock(&mm->mmap_sem)) {

	/*
	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
	 * under mmap_sem for reading because it serializes against the
	 * down_write();up_write() cycle in exit_mmap().
	 */
	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {

which makes sure that the reaper doesn't race with free_pgtables.

> BUG: Bad page map in process oom_reaper  pte:6353826300000000 pmd:00000000

Hmm, this is really strange. Is this a pte without a pmd, or is the
output just incomplete?

> addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping:          (null) index:7f50cab1d
> file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
> CPU: 2 PID: 1001 Comm: oom_reaper
> Call Trace:
>  [<ffffffffa4bd967d>] dump_stack+0x4d/0x70
>  [<ffffffffa4a03558>] unmap_page_range+0x1068/0x1130

could you use addr2line to get the exact spot where this triggered
please?
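
(e.g., assuming an unstripped vmlinux, something like

	$ ./scripts/faddr2line vmlinux unmap_page_range+0x1068/0x1130

should do)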

>  [<ffffffffa4a2e07f>] __oom_reap_task_mm+0xd5/0x16b
>  [<ffffffffa4a2e226>] oom_reaper+0xff/0x14c
>  [<ffffffffa48d6ad1>] kthread+0xc1/0xe0
> 
> And there are more examples of badness from an unmap_page_range() racing 
> with unmap_vmas().  In this case, MMF_OOM_SKIP is doing two things: (1) 
> avoiding additional oom kills until unmap_vmas() returns and (2) preventing 
> the oom_reaper from working on the mm after unmap_vmas().  In (2), there's nothing 
> preventing the oom reaper from calling unmap_page_range() in parallel with 
> the final thread doing unmap_vmas() -- we no longer do mmget() to prevent 
> exit_mmap() from being called.

Yes, and that is intentional behavior. There shouldn't be any reason to
mutually exclude the two, because this should be equivalent to calling
MADV_DONTNEED in parallel.
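
For illustration (not from this thread), the userspace equivalent is
perfectly legal today; both threads end up zapping the same ptes under
mmap_sem held for read, serialized only by the pte lock:

/* build with: cc -pthread dontneed.c */
#include <pthread.h>
#include <sys/mman.h>

#define LEN (128UL << 20)
static char *region;

static void *zap(void *arg)
{
	/* concurrent MADV_DONTNEED over the same range */
	madvise(region, LEN, MADV_DONTNEED);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	unsigned long i;

	region = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED)
		return 1;
	for (i = 0; i < LEN; i += 4096)	/* fault the pages in */
		region[i] = 1;
	pthread_create(&t1, NULL, zap, NULL);
	pthread_create(&t2, NULL, zap, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}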

I will get to the rest of your email later because the above is the
essential assumption 212925802454 ("mm: oom: let oom_reap_task and
exit_mmap run concurrently") builds on. If it is not correct then we
have a bigger problem.
-- 
Michal Hocko
SUSE Labs


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
       [not found]   ` <201712060328.vB63SrDK069830@www262.sakura.ne.jp>
  2017-12-06  7:48     ` David Rientjes
@ 2017-12-06  8:50     ` Michal Hocko
  1 sibling, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2017-12-06  8:50 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: David Rientjes, Andrew Morton, Andrea Arcangeli, linux-kernel, linux-mm

On Wed 06-12-17 12:28:53, Tetsuo Handa wrote:
> David Rientjes wrote:
> > On Tue, 5 Dec 2017, David Rientjes wrote:
> > 
> > > One way to solve the issue is to have two mm flags: one to indicate the mm 
> > > is entering unmap_vmas(): set the flag, do down_write(&mm->mmap_sem); 
> > > up_write(&mm->mmap_sem), then unmap_vmas().  The oom reaper needs this 
> > > flag clear, not MMF_OOM_SKIP, while holding down_read(&mm->mmap_sem) to be 
> > > allowed to call unmap_page_range().  The oom killer will still defer 
> > > selecting this victim for MMF_OOM_SKIP after unmap_vmas() returns.
> > > 
> > > The result of that change would be that we do not oom reap from any mm 
> > > entering unmap_vmas(): we let unmap_vmas() do the work itself and avoid 
> > > racing with it.
> > > 
> > 
> > I think we need something like the following?
> 
> This patch does not work. __oom_reap_task_mm() can find MMF_REAPING and
> return true, causing oom_reap_task() to set MMF_OOM_SKIP before exit_mmap()
> calls down_write().
> 
> Also, I don't know what exit_mmap() is doing but I think that there is a
> possibility that the OOM reaper tries to reclaim mlocked pages as soon as
> exit_mmap() cleared VM_LOCKED flag by calling munlock_vma_pages_all().
> 
> 	if (mm->locked_vm) {
> 		vma = mm->mmap;
> 		while (vma) {
> 			if (vma->vm_flags & VM_LOCKED)
> 				munlock_vma_pages_all(vma);
> 			vma = vma->vm_next;
> 		}
> 	}

I do not really see why this would matter. munlock_vma_pages_all() is
mostly about accounting and clearing the per-page state. It relies on
follow_page(), which crawls the page tables, and unmap_page_range() clears
ptes under the same lock that is taken when resolving a locked page.

I still have to think about all the consequences when we are effectively
reaping VM_LOCKED vmas - I suspect we can do some misaccounting, but I do
not yet see how this could lead to crashes. Maybe we can move the
VM_LOCKED clearing _after_ the munlock business is done, but this is
really hard to tell before I re-read the mlock code more thoroughly.
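
For reference, the reaper skips locked vmas only as long as VM_LOCKED is
still set; paraphrased sketch (details may differ from the actual code):

	/* __oom_reap_task_mm(), sketch */
	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		if (vma->vm_flags & (VM_LOCKED | VM_HUGETLB | VM_PFNMAP))
			continue;	/* the skip is void once VM_LOCKED is gone */
		/* ... unmap_page_range() on the vma ... */
	}

so once munlock_vma_pages_all() has cleared the flag but is not yet done
with the pages, the vma would no longer be skipped.
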
-- 
Michal Hocko
SUSE Labs


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-06  7:48     ` David Rientjes
@ 2017-12-06  9:00       ` Michal Hocko
       [not found]         ` <201712070720.vB77KlBQ009754@www262.sakura.ne.jp>
  2017-12-06 11:37       ` Tetsuo Handa
  1 sibling, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2017-12-06  9:00 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tetsuo Handa, Andrew Morton, Andrea Arcangeli, linux-kernel, linux-mm

On Tue 05-12-17 23:48:21, David Rientjes wrote:
[...]
> I think this argues to do MMF_REAPING-style behavior at the beginning of 
> exit_mmap() and avoid reaping altogether once we have reached that 
> point.  There are no more users of the mm and we are in the process of 
> tearing it down, so I'm not sure that the oom reaper should be in the 
> business of trying to interfere with that.  Or are there actual bug 
> reports where an oom victim gets wedged while in exit_mmap() prior to 
> releasing its memory?

Something like that seems to work, indeed. But we should better understand
what is going on here before adding new oom-reaper-specific kludges. So
let's focus on getting more information from your crashes first.
-- 
Michal Hocko
SUSE Labs


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-06  7:48     ` David Rientjes
  2017-12-06  9:00       ` Michal Hocko
@ 2017-12-06 11:37       ` Tetsuo Handa
  1 sibling, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2017-12-06 11:37 UTC (permalink / raw)
  To: rientjes; +Cc: akpm, mhocko, aarcange, linux-kernel, linux-mm

David Rientjes wrote:
> On Wed, 6 Dec 2017, Tetsuo Handa wrote:
> > Also, I don't know what exit_mmap() is doing but I think that there is a
> > possibility that the OOM reaper tries to reclaim mlocked pages as soon as
> > exit_mmap() cleared VM_LOCKED flag by calling munlock_vma_pages_all().
> > 
> > 	if (mm->locked_vm) {
> > 		vma = mm->mmap;
> > 		while (vma) {
> > 			if (vma->vm_flags & VM_LOCKED)
> > 				munlock_vma_pages_all(vma);
> > 			vma = vma->vm_next;
> > 		}
> > 	}
> > 
> 
> Yes, that looks possible as well, although the problem I have reported can 
> happen with or without mlock.  Did you find this by code inspection or 
> have you experienced runtime problems with it?

By code inspection.

> 
> I think this argues to do MMF_REAPING-style behavior at the beginning of 
> exit_mmap() and avoid reaping altogether once we have reached that 
> point.  There are no more users of the mm and we are in the process of 
> tearing it down, so I'm not sure that the oom reaper should be in the 
> business of trying to interfere with that.  Or are there actual bug 
> reports where an oom victim gets wedged while in exit_mmap() prior to 
> releasing its memory?

If our assumption is that the OOM reaper can reclaim the majority of an OOM
victim's memory via the victim's ->signal->oom_mm, what would be wrong with
simply reverting 212925802454672e ("mm: oom: let oom_reap_task and
exit_mmap run concurrently") and replacing mmgrab()/mmdrop_async() with
mmget()/mmput_async(), so that the OOM reaper no longer needs to worry
about tricky __mmput() behavior (as shown below)?

----------
 kernel/fork.c |   17 ++++++++++++-----
 mm/mmap.c     |   17 -----------------
 mm/oom_kill.c |   21 +++++++--------------
 3 files changed, 19 insertions(+), 36 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 432eadf..018a857 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -394,12 +394,18 @@ static inline void free_signal_struct(struct signal_struct *sig)
 {
 	taskstats_tgid_free(sig);
 	sched_autogroup_exit(sig);
-	/*
-	 * __mmdrop is not safe to call from softirq context on x86 due to
-	 * pgd_dtor so postpone it to the async context
-	 */
-	if (sig->oom_mm)
+	if (sig->oom_mm) {
+#ifdef CONFIG_MMU
+		mmput_async(sig->oom_mm);
+#else
+		/*
+		 * There might be architectures where calling __mmdrop() from
+		 * softirq context is not safe. Thus, postpone it to the async
+		 * context.
+		 */
 		mmdrop_async(sig->oom_mm);
+#endif
+	}
 	kmem_cache_free(signal_cachep, sig);
 }
 
@@ -931,6 +937,7 @@ static inline void __mmput(struct mm_struct *mm)
 	}
 	if (mm->binfmt)
 		module_put(mm->binfmt->module);
+	set_bit(MMF_OOM_SKIP, &mm->flags);
 	mmdrop(mm);
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index a4d5468..fafaf06 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3019,23 +3019,6 @@ void exit_mmap(struct mm_struct *mm)
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
 
-	set_bit(MMF_OOM_SKIP, &mm->flags);
-	if (unlikely(tsk_is_oom_victim(current))) {
-		/*
-		 * Wait for oom_reap_task() to stop working on this
-		 * mm. Because MMF_OOM_SKIP is already set before
-		 * calling down_read(), oom_reap_task() will not run
-		 * on this "mm" post up_write().
-		 *
-		 * tsk_is_oom_victim() cannot be set from under us
-		 * either because current->mm is already set to NULL
-		 * under task_lock before calling mmput and oom_mm is
-		 * set not NULL by the OOM killer only if current->mm
-		 * is found not NULL while holding the task_lock.
-		 */
-		down_write(&mm->mmap_sem);
-		up_write(&mm->mmap_sem);
-	}
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
 	tlb_finish_mmu(&tlb, 0, -1);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c957be3..eb2a005 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -528,18 +528,6 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 		goto unlock_oom;
 	}
 
-	/*
-	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
-	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
-	 * under mmap_sem for reading because it serializes against the
-	 * down_write();up_write() cycle in exit_mmap().
-	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
-		up_read(&mm->mmap_sem);
-		trace_skip_task_reaping(tsk->pid);
-		goto unlock_oom;
-	}
-
 	trace_start_task_reaping(tsk->pid);
 
 	/*
@@ -683,8 +671,13 @@ static void mark_oom_victim(struct task_struct *tsk)
 		return;
 
 	/* oom_mm is bound to the signal struct life time. */
-	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
-		mmgrab(tsk->signal->oom_mm);
+	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
+#ifdef CONFIG_MMU
+		mmget(mm);
+#else
+		mmgrab(mm);
+#endif
+	}
 
 	/*
 	 * Make sure that the task is woken up from uninterruptible sleep
----------

If holding on to the address space while many victims' mms are waiting for
the OOM reaper to reclaim them is considered dangerous, I think we can try
direct OOM reaping (like the untested patch shown below) in order to reclaim
on a first-blocked, first-reaped basis (and also to serve as a mitigation for
the race caused by removing oom_lock from the OOM reaper).

----------
 include/linux/mm_types.h |   3 ++
 include/linux/sched.h    |   3 --
 mm/oom_kill.c            | 132 ++++++++++++++---------------------------------
 3 files changed, 43 insertions(+), 95 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cfd0ac4..068119b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -501,6 +501,9 @@ struct mm_struct {
 	atomic_long_t hugetlb_usage;
 #endif
 	struct work_struct async_put_work;
+#ifdef CONFIG_MMU
+	unsigned long oom_reap_started;
+#endif
 
 #if IS_ENABLED(CONFIG_HMM)
 	/* HMM needs to track a few things per mm */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a2709d2..d63b599 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1081,9 +1081,6 @@ struct task_struct {
 	unsigned long			task_state_change;
 #endif
 	int				pagefault_disabled;
-#ifdef CONFIG_MMU
-	struct task_struct		*oom_reaper_list;
-#endif
 #ifdef CONFIG_VMAP_STACK
 	struct vm_struct		*stack_vm_area;
 #endif
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3b0d0fe..a9f8bae 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -309,9 +309,14 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
 	return CONSTRAINT_NONE;
 }
 
+#ifdef CONFIG_MMU
+static void oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm);
+#endif
+
 static int oom_evaluate_task(struct task_struct *task, void *arg)
 {
 	struct oom_control *oc = arg;
+	struct mm_struct *mm;
 	unsigned long points;
 
 	if (oom_unkillable_task(task, NULL, oc->nodemask))
@@ -324,7 +329,8 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	 * any memory is quite low.
 	 */
 	if (!is_sysrq_oom(oc) && tsk_is_oom_victim(task)) {
-		if (test_bit(MMF_OOM_SKIP, &task->signal->oom_mm->flags))
+		mm = task->signal->oom_mm;
+		if (test_bit(MMF_OOM_SKIP, &mm->flags))
 			goto next;
 		goto abort;
 	}
@@ -357,6 +363,15 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	if (oc->chosen)
 		put_task_struct(oc->chosen);
 	oc->chosen = (void *)-1UL;
+#ifdef CONFIG_MMU
+	get_task_struct(task);
+	if (!is_memcg_oom(oc))
+		rcu_read_unlock();
+	oom_reap_task_mm(task, mm);
+	put_task_struct(task);
+	if (!is_memcg_oom(oc))
+		rcu_read_lock();
+#endif
 	return 1;
 }
 
@@ -474,23 +489,14 @@ bool process_shares_mm(struct task_struct *p, struct mm_struct *mm)
 	return false;
 }
 
-
 #ifdef CONFIG_MMU
-/*
- * OOM Reaper kernel thread which tries to reap the memory used by the OOM
- * victim (if that is possible) to help the OOM killer to move on.
- */
-static struct task_struct *oom_reaper_th;
-static DECLARE_WAIT_QUEUE_HEAD(oom_reaper_wait);
-static struct task_struct *oom_reaper_list;
-static DEFINE_SPINLOCK(oom_reaper_lock);
-
 static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
 	bool ret = true;
 
+	trace_wake_reaper(tsk->pid);
 	if (!down_read_trylock(&mm->mmap_sem)) {
 		ret = false;
 		trace_skip_task_reaping(tsk->pid);
@@ -507,8 +513,8 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	 * mmu_notifier_invalidate_range_{start,end} around unmap_page_range
 	 */
 	if (mm_has_notifiers(mm)) {
+		ret = false;
 		up_read(&mm->mmap_sem);
-		schedule_timeout_idle(HZ);
 		goto unlock_oom;
 	}
 
@@ -567,82 +573,22 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	return ret;
 }
 
-#define MAX_OOM_REAP_RETRIES 10
-static void oom_reap_task(struct task_struct *tsk)
-{
-	int attempts = 0;
-	struct mm_struct *mm = tsk->signal->oom_mm;
-
-	/* Retry the down_read_trylock(mmap_sem) a few times */
-	while (attempts++ < MAX_OOM_REAP_RETRIES && !__oom_reap_task_mm(tsk, mm))
-		schedule_timeout_idle(HZ/10);
-
-	if (attempts <= MAX_OOM_REAP_RETRIES)
-		goto done;
-
-
-	pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
-		task_pid_nr(tsk), tsk->comm);
-	debug_show_all_locks();
-
-done:
-	tsk->oom_reaper_list = NULL;
-
-	/*
-	 * Hide this mm from OOM killer because it has been either reaped or
-	 * somebody can't call up_write(mmap_sem).
-	 */
-	set_bit(MMF_OOM_SKIP, &mm->flags);
-
-	/* Drop a reference taken by wake_oom_reaper */
-	put_task_struct(tsk);
-}
-
-static int oom_reaper(void *unused)
+static void oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
-	while (true) {
-		struct task_struct *tsk = NULL;
-
-		wait_event_freezable(oom_reaper_wait, oom_reaper_list != NULL);
-		spin_lock(&oom_reaper_lock);
-		if (oom_reaper_list != NULL) {
-			tsk = oom_reaper_list;
-			oom_reaper_list = tsk->oom_reaper_list;
+	if (__oom_reap_task_mm(tsk, mm))
+		/* Hide this mm from OOM killer because we reaped it. */
+		set_bit(MMF_OOM_SKIP, &mm->flags);
+	else if (!mm->oom_reap_started)
+		mm->oom_reap_started = jiffies;
+	else if (time_after(jiffies, mm->oom_reap_started + HZ)) {
+		if (!mm_has_notifiers(mm)) {
+			pr_info("oom_reaper: unable to reap pid:%d (%s)\n",
+				task_pid_nr(tsk), tsk->comm);
+			debug_show_all_locks();
 		}
-		spin_unlock(&oom_reaper_lock);
-
-		if (tsk)
-			oom_reap_task(tsk);
+		/* Hide this mm from OOM killer because we can't reap. */
+		set_bit(MMF_OOM_SKIP, &mm->flags);
 	}
-
-	return 0;
-}
-
-static void wake_oom_reaper(struct task_struct *tsk)
-{
-	/* tsk is already queued? */
-	if (tsk == oom_reaper_list || tsk->oom_reaper_list)
-		return;
-
-	get_task_struct(tsk);
-
-	spin_lock(&oom_reaper_lock);
-	tsk->oom_reaper_list = oom_reaper_list;
-	oom_reaper_list = tsk;
-	spin_unlock(&oom_reaper_lock);
-	trace_wake_reaper(tsk->pid);
-	wake_up(&oom_reaper_wait);
-}
-
-static int __init oom_init(void)
-{
-	oom_reaper_th = kthread_run(oom_reaper, NULL, "oom_reaper");
-	return 0;
-}
-subsys_initcall(oom_init)
-#else
-static inline void wake_oom_reaper(struct task_struct *tsk)
-{
 }
 #endif /* CONFIG_MMU */
 
@@ -825,7 +771,6 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	unsigned int victim_points = 0;
 	static DEFINE_RATELIMIT_STATE(oom_rs, DEFAULT_RATELIMIT_INTERVAL,
 					      DEFAULT_RATELIMIT_BURST);
-	bool can_oom_reap = true;
 
 	/*
 	 * If the task is already exiting, don't alarm the sysadmin or kill
@@ -835,7 +780,6 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	task_lock(p);
 	if (task_will_free_mem(p)) {
 		mark_oom_victim(p);
-		wake_oom_reaper(p);
 		task_unlock(p);
 		put_task_struct(p);
 		return;
@@ -924,7 +868,6 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 		if (same_thread_group(p, victim))
 			continue;
 		if (is_global_init(p)) {
-			can_oom_reap = false;
 			set_bit(MMF_OOM_SKIP, &mm->flags);
 			pr_info("oom killer %d (%s) has mm pinned by %d (%s)\n",
 					task_pid_nr(victim), victim->comm,
@@ -941,9 +884,15 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	}
 	rcu_read_unlock();
 
-	if (can_oom_reap)
-		wake_oom_reaper(victim);
-
+#ifdef CONFIG_MMU
+	/*
+	 * sysctl_oom_kill_allocating_task case could get stuck because
+	 * select_bad_process() which calls oom_reap_task_mm() is not called.
+	 */
+	if (sysctl_oom_kill_allocating_task &&
+	    !test_bit(MMF_OOM_SKIP, &mm->flags))
+		oom_reap_task_mm(victim, mm);
+#endif
 	mmdrop(mm);
 	put_task_struct(victim);
 }
@@ -1019,7 +968,6 @@ bool out_of_memory(struct oom_control *oc)
 	 */
 	if (task_will_free_mem(current)) {
 		mark_oom_victim(current);
-		wake_oom_reaper(current);
 		return true;
 	}
 
----------


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
       [not found]         ` <201712070720.vB77KlBQ009754@www262.sakura.ne.jp>
@ 2017-12-07  8:28           ` Michal Hocko
  2017-12-07  8:44             ` Michal Hocko
                               ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Michal Hocko @ 2017-12-07  8:28 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: David Rientjes, Andrew Morton, Andrea Arcangeli, linux-kernel, linux-mm

On Thu 07-12-17 16:20:47, Tetsuo Handa wrote:
[...]
> int main(int argc, char *argv[])
> {
> 	int i;
> 	char *stack;
> 	if (fork() || fork() || setsid() == EOF || pipe(pipe_fd))
> 		_exit(0);
> 	stack = mmap(NULL, STACKSIZE * NUMTHREADS, PROT_WRITE | PROT_READ,
> 		     MAP_ANONYMOUS | MAP_PRIVATE, EOF, 0);
> 	for (i = 0; i < NUMTHREADS; i++)
> 		if (clone(memory_eater, stack + (i + 1) * STACKSIZE,
> 			  /*CLONE_THREAD | CLONE_SIGHAND | */CLONE_VM | CLONE_FS |
> 			  CLONE_FILES, NULL) == -1)
> 			break;

Hmm, so you are creating a separate process (from the signal point of
view), and I suspect it is one of those that holds the last reference to
the mm_struct, which is released here with tsk_is_oom_victim() == false

[...]
> [  113.273394] Freed by task 1377:
> [  113.276211]  kasan_slab_free+0x71/0xc0
> [  113.279093]  kmem_cache_free+0xaf/0x1e0
> [  113.281974]  remove_vma+0x9d/0xb0
> [  113.284734]  exit_mmap+0x179/0x250
> [  113.287651]  mmput+0x7d/0x1b0
> [  113.290456]  do_exit+0x408/0x1290
> [  113.293268]  do_group_exit+0x84/0x140
> [  113.296109]  get_signal+0x291/0x9b0
> [  113.298915]  do_signal+0x8e/0xa70
> [  113.301637]  exit_to_usermode_loop+0x71/0xb0
> [  113.304632]  do_syscall_64+0x343/0x390
> [  113.307349]  return_from_SYSCALL_64+0x0/0x75

[...]

> What we overlooked is the fact that "it is not always the process which
> got ->signal->oom_mm set; it is any thread whose mmput() call ends up
> invoking the __mmput() path".  Therefore, the patch below fixes the oops
> in my case.  If some unrelated kernel thread was holding an mm_users ref,
> it is possible that we miss the down_write()/up_write() synchronization.

Very well spotted! It could be any task in fact (e.g. somebody reading
a /proc/<pid> file, which requires the mm_struct).

oom_reaper		oom_victim		task
						mmget_not_zero
			exit_mmap
			  mmput
__oom_reap_task_mm				mmput
  						  __mmput
						    exit_mmap
						      remove_vma
  unmap_page_range
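
A trivial way for an unrelated task to take such a transient reference
(illustration only, not from the report):

/* Each read() of /proc/<pid>/maps takes mm_users via mmget_not_zero();
 * if the victim exits meanwhile, the reader's mmput() can be the final
 * one, and __mmput()/exit_mmap() then runs in the reader's context,
 * where tsk_is_oom_victim(current) is false.
 */
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	char path[64], buf[4096];
	FILE *f;

	if (argc < 2)
		return 1;
	snprintf(path, sizeof(path), "/proc/%s/maps", argv[1]);
	f = fopen(path, "r");
	while (f && fgets(buf, sizeof(buf), f))
		usleep(100 * 1000);	/* read slowly across the oom kill */
	return 0;
}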

So we need a more robust test for the oom victim. Your suggestion is
basically what I came up with originally [1], which was deemed
ineffective because it took the mmap_sem even for regular paths, and
Kirill was afraid this would add unnecessary cycles to the exit path,
which is quite hot.

So I guess we have to do something else instead. We have to store the
oom flag to the mm struct as well. Something like the patch below.

[1] http://lkml.kernel.org/r/20170724072332.31903-1-mhocko@kernel.org
---
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 27cd36b762b5..b7668b5d3e14 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -77,6 +77,11 @@ static inline bool tsk_is_oom_victim(struct task_struct * tsk)
 	return tsk->signal->oom_mm;
 }
 
+static inline bool mm_is_oom_victim(struct mm_struct *mm)
+{
+	return test_bit(MMF_OOM_VICTIM, &mm->flags);
+}
+
 /*
  * Checks whether a page fault on the given mm is still reliable.
  * This is no longer true if the oom reaper started to reap the
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 9c8847395b5e..da673ca66e7a 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -68,8 +68,9 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
 #define MMF_OOM_SKIP		21	/* mm is of no interest for the OOM killer */
 #define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
-#define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
-#define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
+#define MMF_OOM_VICTIM		23	/* mm is the oom victim */
+#define MMF_HUGE_ZERO_PAGE	24      /* mm has ever used the global huge zero page */
+#define MMF_DISABLE_THP		25	/* disable THP for all VMAs */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
diff --git a/mm/mmap.c b/mm/mmap.c
index 476e810cf100..d00a06248ef1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3005,7 +3005,7 @@ void exit_mmap(struct mm_struct *mm)
 	unmap_vmas(&tlb, vma, 0, -1);
 
 	set_bit(MMF_OOM_SKIP, &mm->flags);
-	if (unlikely(tsk_is_oom_victim(current))) {
+	if (unlikely(mm_is_oom_victim(mm))) {
 		/*
 		 * Wait for oom_reap_task() to stop working on this
 		 * mm. Because MMF_OOM_SKIP is already set before
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3b0d0fed8480..e4d290b6804b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -666,8 +666,10 @@ static void mark_oom_victim(struct task_struct *tsk)
 		return;
 
 	/* oom_mm is bound to the signal struct life time. */
-	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
+	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
 		mmgrab(tsk->signal->oom_mm);
+		set_bit(MMF_OOM_VICTIM, &mm->flags);
+	}
 
 	/*
 	 * Make sure that the task is woken up from uninterruptible sleep
-- 
Michal Hocko
SUSE Labs


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-07  8:28           ` Michal Hocko
@ 2017-12-07  8:44             ` Michal Hocko
  2017-12-07 11:00             ` Tetsuo Handa
  2017-12-07 21:22             ` David Rientjes
  2 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2017-12-07  8:44 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: David Rientjes, Andrew Morton, Andrea Arcangeli, linux-kernel, linux-mm

On Thu 07-12-17 09:28:01, Michal Hocko wrote:
[...]
> oom_reaper		oom_victim		task
> 						mmget_not_zero
> 			exit_mmap

bleh, this should have been do_exit

> 			  mmput
> __oom_reap_task_mm				mmput
>   						  __mmput
> 						    exit_mmap
> 						      remove_vma
>   unmap_page_range
-- 
Michal Hocko
SUSE Labs


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-07  8:28           ` Michal Hocko
  2017-12-07  8:44             ` Michal Hocko
@ 2017-12-07 11:00             ` Tetsuo Handa
  2017-12-07 21:22             ` David Rientjes
  2 siblings, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2017-12-07 11:00 UTC (permalink / raw)
  To: mhocko; +Cc: rientjes, akpm, aarcange, linux-kernel, linux-mm

Michal Hocko wrote:
> Hmm, so you are creating a separate process (from the signal point of
> view) and I suspect it is one of those that holds the last reference to
> the mm_struct which is released here and it has tsk_oom_victim = F

Right.

> So we need a more robust test for the oom victim. Your suggestion is
> basically what I came up with originally [1] and which was deemed
> ineffective because we took the mmap_sem even for regular paths and
> Kirill was afraid this adds some unnecessary cycles to the exit path
> which is quite hot.
> 
> So I guess we have to do something else instead. We have to store the
> oom flag to the mm struct as well. Something like the patch below.

Yes, adding a new flag for this purpose will work.

Also, setting the MMF_UNSTABLE flag after sending SIGKILL and before
victim->mm becomes NULL, and testing MMF_UNSTABLE at exit_mm(), should work.

But I prefer the simple revert + mmget()/mmput_async() approach at
http://lkml.kernel.org/r/201712062037.DAF90168.SVFQOJFMOOtHLF@I-love.SAKURA.ne.jp , for
my approach not only saves lines but also fixes an unexpected change for nommu at
http://lkml.kernel.org/r/201711091949.BDB73475.OSHFOMQtLFOFVJ@I-love.SAKURA.ne.jp .
Also, if we replace asynchronous OOM reaping by the OOM reaper kernel thread with
synchronous OOM reaping by the OOM killer, we can close the MMF_OOM_SKIP race
window, because it is then guaranteed that __oom_reap_task_mm() is called before
__mmput() is called.


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-06  2:43 Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap David Rientjes
  2017-12-06  2:58 ` David Rientjes
  2017-12-06  8:31 ` Michal Hocko
@ 2017-12-07 11:35 ` Michal Hocko
  2017-12-07 15:44   ` Tetsuo Handa
  2 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2017-12-07 11:35 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Tetsuo Handa, Andrea Arcangeli, linux-kernel, linux-mm

David, could you test with this patch please?
---
From c4a2a58d3856bc1e528f7c1622a59cef58033cb3 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 7 Dec 2017 09:39:03 +0100
Subject: [PATCH] mm, oom_reaper: fix memory corruption

David Rientjes has reported the following memory corruption while the
oom reaper tries to unmap the victims address space
BUG: Bad page map in process oom_reaper  pte:6353826300000000 pmd:00000000
addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping:          (null) index:7f50cab1d
file:          (null) fault:          (null) mmap:          (null) readpage:          (null)
CPU: 2 PID: 1001 Comm: oom_reaper
Call Trace:
 [<ffffffffa4bd967d>] dump_stack+0x4d/0x70
 [<ffffffffa4a03558>] unmap_page_range+0x1068/0x1130
 [<ffffffffa4a2e07f>] __oom_reap_task_mm+0xd5/0x16b
 [<ffffffffa4a2e226>] oom_reaper+0xff/0x14c
 [<ffffffffa48d6ad1>] kthread+0xc1/0xe0

Tetsuo Handa has noticed that the synchronization inside exit_mmap is
insufficient. We only synchronize with the oom reaper if tsk_is_oom_victim
which is not true if the final __mmput is called from a different context
than the oom victim exit path. This can trivially happen from context of
any task which has grabbed mm reference (e.g. to read /proc/<pid>/ file
which requires mm etc.). The race would look like this

oom_reaper		oom_victim		task
						mmget_not_zero
			do_exit
			  mmput
__oom_reap_task_mm				mmput
  						  __mmput
						    exit_mmap
						      remove_vma
  unmap_page_range

Fix this issue by providing a new mm_is_oom_victim() helper which operates
on the mm struct rather than a task. Any context which operates on a remote
mm struct should use this helper in place of tsk_is_oom_victim. The flag is
set in mark_oom_victim and never cleared so it is stable in the exit_mmap
path.

Reported-by: David Rientjes <rientjes@google.com>
Debugged-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Cc: stable # 4.14
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/oom.h            | 9 +++++++++
 include/linux/sched/coredump.h | 5 +++--
 mm/mmap.c                      | 2 +-
 mm/oom_kill.c                  | 4 +++-
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 27cd36b762b5..c4139e79771d 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -77,6 +77,15 @@ static inline bool tsk_is_oom_victim(struct task_struct * tsk)
 	return tsk->signal->oom_mm;
 }
 
+/*
+ * Use this helper if tsk->mm != mm and the victim mm needs a special
+ * handling. This is guaranteed to stay true after once set.
+ */
+static inline bool mm_is_oom_victim(struct mm_struct *mm)
+{
+	return test_bit(MMF_OOM_VICTIM, &mm->flags);
+}
+
 /*
  * Checks whether a page fault on the given mm is still reliable.
  * This is no longer true if the oom reaper started to reap the
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 9c8847395b5e..da673ca66e7a 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -68,8 +68,9 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
 #define MMF_OOM_SKIP		21	/* mm is of no interest for the OOM killer */
 #define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
-#define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
-#define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
+#define MMF_OOM_VICTIM		23	/* mm is the oom victim */
+#define MMF_HUGE_ZERO_PAGE	24      /* mm has ever used the global huge zero page */
+#define MMF_DISABLE_THP		25	/* disable THP for all VMAs */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
diff --git a/mm/mmap.c b/mm/mmap.c
index 476e810cf100..d00a06248ef1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3005,7 +3005,7 @@ void exit_mmap(struct mm_struct *mm)
 	unmap_vmas(&tlb, vma, 0, -1);
 
 	set_bit(MMF_OOM_SKIP, &mm->flags);
-	if (unlikely(tsk_is_oom_victim(current))) {
+	if (unlikely(mm_is_oom_victim(mm))) {
 		/*
 		 * Wait for oom_reap_task() to stop working on this
 		 * mm. Because MMF_OOM_SKIP is already set before
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3b0d0fed8480..e4d290b6804b 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -666,8 +666,10 @@ static void mark_oom_victim(struct task_struct *tsk)
 		return;
 
 	/* oom_mm is bound to the signal struct life time. */
-	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
+	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
 		mmgrab(tsk->signal->oom_mm);
+		set_bit(MMF_OOM_VICTIM, &mm->flags);
+	}
 
 	/*
 	 * Make sure that the task is woken up from uninterruptible sleep
-- 
2.15.0

-- 
Michal Hocko
SUSE Labs


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-07 11:35 ` Michal Hocko
@ 2017-12-07 15:44   ` Tetsuo Handa
  2017-12-07 16:30     ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Tetsuo Handa @ 2017-12-07 15:44 UTC (permalink / raw)
  To: mhocko, rientjes; +Cc: akpm, aarcange, linux-kernel, linux-mm

Michal Hocko wrote:
> David, could you test with this patch please?

Even if this patch solves David's case, you need to update

	 * tsk_is_oom_victim() cannot be set from under us
	 * either because current->mm is already set to NULL
	 * under task_lock before calling mmput and oom_mm is
	 * set not NULL by the OOM killer only if current->mm
	 * is found not NULL while holding the task_lock.

part as well, for it is the explanation of why the
tsk_is_oom_victim() test was expected to work.

Also, do we need to do

  set_bit(MMF_OOM_SKIP, &mm->flags);

if mm_is_oom_victim(mm) == false?

That exit_mmap() is called means that nobody can reach this mm
except via ->signal->oom_mm, and mm_is_oom_victim(mm) == false
means that this mm cannot be reached via ->signal->oom_mm either.

Then, I think we do not need to set MMF_OOM_SKIP on this mm
at exit_mmap() if mm_is_oom_victim(mm) == false.


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-07 15:44   ` Tetsuo Handa
@ 2017-12-07 16:30     ` Michal Hocko
  2017-12-07 21:55       ` David Rientjes
  2017-12-08 10:11       ` Tetsuo Handa
  0 siblings, 2 replies; 19+ messages in thread
From: Michal Hocko @ 2017-12-07 16:30 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: rientjes, akpm, aarcange, linux-kernel, linux-mm

On Fri 08-12-17 00:44:11, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > David, could you test with this patch please?
> 
> Even if this patch solved David's case, you need to update
> 
> 	 * tsk_is_oom_victim() cannot be set from under us
> 	 * either because current->mm is already set to NULL
> 	 * under task_lock before calling mmput and oom_mm is
> 	 * set not NULL by the OOM killer only if current->mm
> 	 * is found not NULL while holding the task_lock.
> 
> part as well, for it is the explanation of why
> tsk_is_oom_victim() test was expected to work.

Yes, the same applies for mm_is_oom_victim. I will fixup s@tsk_@mm_@
here.

> Also, do we need to do
> 
>   set_bit(MMF_OOM_SKIP, &mm->flags);
> 
> if mm_is_oom_victim(mm) == false?

I do not think we really need to set MMF_OOM_SKIP if we are not going to
synchronize.

> exit_mmap() is called means that nobody can reach this mm
> except ->signal->oom_mm, and mm_is_oom_victim(mm) == false
> means that this mm cannot be reached by ->signal->oom_mm .
>
> Then, I think we do not need to set MMF_OOM_SKIP on this mm
> at exit_mmap() if mm_is_oom_victim(mm) == false.

Yes. I will fold the following in if this turns out to really address
David's issue. But I suspect this will be the case, considering the NULL
pmd in the report, which would suggest racing with free_pgtables()...

Thanks for the review!

---
diff --git a/mm/mmap.c b/mm/mmap.c
index d00a06248ef1..e63b7a576670 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3004,7 +3004,6 @@ void exit_mmap(struct mm_struct *mm)
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
 
-	set_bit(MMF_OOM_SKIP, &mm->flags);
 	if (unlikely(mm_is_oom_victim(mm))) {
 		/*
 		 * Wait for oom_reap_task() to stop working on this
@@ -3012,12 +3011,13 @@ void exit_mmap(struct mm_struct *mm)
 		 * calling down_read(), oom_reap_task() will not run
 		 * on this "mm" post up_write().
 		 *
-		 * tsk_is_oom_victim() cannot be set from under us
+		 * mm_is_oom_victim() cannot be set from under us
 		 * either because current->mm is already set to NULL
 		 * under task_lock before calling mmput and oom_mm is
 		 * set not NULL by the OOM killer only if current->mm
 		 * is found not NULL while holding the task_lock.
 		 */
+		set_bit(MMF_OOM_SKIP, &mm->flags);
 		down_write(&mm->mmap_sem);
 		up_write(&mm->mmap_sem);
 	}
-- 
Michal Hocko
SUSE Labs


* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-07  8:28           ` Michal Hocko
  2017-12-07  8:44             ` Michal Hocko
  2017-12-07 11:00             ` Tetsuo Handa
@ 2017-12-07 21:22             ` David Rientjes
  2017-12-08  7:50               ` Michal Hocko
  2 siblings, 1 reply; 19+ messages in thread
From: David Rientjes @ 2017-12-07 21:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, Andrew Morton, Andrea Arcangeli, linux-kernel, linux-mm

On Thu, 7 Dec 2017, Michal Hocko wrote:

> Very well spotted! It could be any task in fact (e.g. somebody reading
> from /proc/<pid> file which requires mm_struct).
> 
> oom_reaper		oom_victim		task
> 						mmget_not_zero
> 			exit_mmap
> 			  mmput
> __oom_reap_task_mm				mmput
>   						  __mmput
> 						    exit_mmap
> 						      remove_vma
>   unmap_page_range
> 
> So we need a more robust test for the oom victim. Your suggestion is
> basically what I came up with originally [1], which was deemed
> ineffective because it took the mmap_sem even for regular paths, and
> Kirill was afraid this would add unnecessary cycles to the exit path,
> which is quite hot.
> 

Yes, I can confirm that in all crashes we have analyzed so far, 
MMF_OOM_SKIP is actually set at the time the oom_reaper causes BUGs with 
various stack traces, all originating from unmap_page_range(), which is 
certainly not supposed to happen.

> So I guess we have to do something else instead. We have to store the
> oom flag to the mm struct as well. Something like the patch below.
> 
> [1] http://lkml.kernel.org/r/20170724072332.31903-1-mhocko@kernel.org
> ---
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index 27cd36b762b5..b7668b5d3e14 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -77,6 +77,11 @@ static inline bool tsk_is_oom_victim(struct task_struct * tsk)
>  	return tsk->signal->oom_mm;
>  }
>  
> +static inline bool mm_is_oom_victim(struct mm_struct *mm)
> +{
> +	return test_bit(MMF_OOM_VICTIM, &mm->flags);
> +}
> +
>  /*
>   * Checks whether a page fault on the given mm is still reliable.
>   * This is no longer true if the oom reaper started to reap the
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 9c8847395b5e..da673ca66e7a 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -68,8 +68,9 @@ static inline int get_dumpable(struct mm_struct *mm)
>  #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
>  #define MMF_OOM_SKIP		21	/* mm is of no interest for the OOM killer */
>  #define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
> -#define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
> -#define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
> +#define MMF_OOM_VICTIM		23	/* mm is the oom victim */
> +#define MMF_HUGE_ZERO_PAGE	24      /* mm has ever used the global huge zero page */
> +#define MMF_DISABLE_THP		25	/* disable THP for all VMAs */
>  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
>  
>  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\

Could we not adjust the existing bit values, but simply add a new one
for MMF_OOM_VICTIM?  We have automated tools that look at specific bits
in mm->flags, and it would be nice to keep them consistent between
kernel versions.  Not absolutely required, but nice to have.
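
As a concrete, hypothetical illustration of why stable bit numbers
matter: tooling that decodes mm->flags from a crash dump has the bit
numbers baked in, so renumbering silently breaks it across kernel
versions.  The names below are made up for the example:

	/* Bit numbers must match the kernel that produced the dump. */
	#define DUMP_MMF_OOM_SKIP	21
	#define DUMP_MMF_UNSTABLE	22

	static bool mm_flag_set(unsigned long mm_flags, int bit)
	{
		return (mm_flags >> bit) & 1UL;
	}

A dump from a kernel that renumbered the bits would be misdecoded
silently rather than failing loudly.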

> diff --git a/mm/mmap.c b/mm/mmap.c
> index 476e810cf100..d00a06248ef1 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3005,7 +3005,7 @@ void exit_mmap(struct mm_struct *mm)
>  	unmap_vmas(&tlb, vma, 0, -1);
>  
>  	set_bit(MMF_OOM_SKIP, &mm->flags);
> -	if (unlikely(tsk_is_oom_victim(current))) {
> +	if (unlikely(mm_is_oom_victim(mm))) {
>  		/*
>  		 * Wait for oom_reap_task() to stop working on this
>  		 * mm. Because MMF_OOM_SKIP is already set before
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 3b0d0fed8480..e4d290b6804b 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -666,8 +666,10 @@ static void mark_oom_victim(struct task_struct *tsk)
>  		return;
>  
>  	/* oom_mm is bound to the signal struct life time. */
> -	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
> +	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
>  		mmgrab(tsk->signal->oom_mm);
> +		set_bit(MMF_OOM_VICTIM, &mm->flags);
> +	}
>  
>  	/*
>  	 * Make sure that the task is woken up from uninterruptible sleep

Looks good, I see the other email with the same functional change plus a 
follow-up based on a suggestion by Tetsuo.  I'll test it alongside a 
change to not adjust existing MMF_* bit numbers.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-07 16:30     ` Michal Hocko
@ 2017-12-07 21:55       ` David Rientjes
  2017-12-08  9:26         ` David Rientjes
  2017-12-08 10:11       ` Tetsuo Handa
  1 sibling, 1 reply; 19+ messages in thread
From: David Rientjes @ 2017-12-07 21:55 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, akpm, aarcange, linux-kernel, linux-mm

On Thu, 7 Dec 2017, Michal Hocko wrote:

> yes. I will fold the following in if this turns out to really address
> David's issue. But I suspect this will be the case considering the NULL
> pmd in the report which would suggest racing with free_pgtable...
> 

I'm backporting and testing the following patch against Linus's tree.  To 
clarify an earlier point, we don't actually have any change from upstream 
code that allows for free_pgtables() before the 
set_bit(MMF_OOM_SKIP);down_write();up_write() cycle.
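
Schematically, the handshake this cycle provides (the reaper side is
simplified from __oom_reap_task_mm(); this is a sketch of the ordering,
not the exact code):

  exit_mmap()                          oom_reaper
  -----------                          ----------
  unmap_vmas(&tlb, vma, 0, -1);
  set_bit(MMF_OOM_SKIP, &mm->flags);   down_read_trylock(&mm->mmap_sem);
  down_write(&mm->mmap_sem);           if (test_bit(MMF_OOM_SKIP, ...)) {
  up_write(&mm->mmap_sem);                     up_read(&mm->mmap_sem);
  free_pgtables(...);                          return;    /* bail out */
                                       }
                                       unmap_page_range(...);
                                       up_read(&mm->mmap_sem);

Once exit_mmap() is past the down_write()/up_write() cycle, any reaper
already inside its read section has dropped mmap_sem, and any reaper
that takes mmap_sem afterwards sees MMF_OOM_SKIP and bails out, so
nothing can race with free_pgtables().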

diff --git a/include/linux/oom.h b/include/linux/oom.h
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -66,6 +66,15 @@ static inline bool tsk_is_oom_victim(struct task_struct * tsk)
 	return tsk->signal->oom_mm;
 }
 
+/*
+ * Use this helper if tsk->mm != mm and the victim mm needs special
+ * handling. This is guaranteed to stay true once set.
+ */
+static inline bool mm_is_oom_victim(struct mm_struct *mm)
+{
+	return test_bit(MMF_OOM_VICTIM, &mm->flags);
+}
+
 /*
  * Checks whether a page fault on the given mm is still reliable.
  * This is no longer true if the oom reaper started to reap the
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -71,6 +71,7 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
 #define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
+#define MMF_OOM_VICTIM		25	/* mm is the oom victim */
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
 				 MMF_DISABLE_THP_MASK)
diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3019,20 +3019,20 @@ void exit_mmap(struct mm_struct *mm)
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	unmap_vmas(&tlb, vma, 0, -1);
 
-	set_bit(MMF_OOM_SKIP, &mm->flags);
-	if (unlikely(tsk_is_oom_victim(current))) {
+	if (unlikely(mm_is_oom_victim(mm))) {
 		/*
 		 * Wait for oom_reap_task() to stop working on this
 		 * mm. Because MMF_OOM_SKIP is already set before
 		 * calling down_read(), oom_reap_task() will not run
 		 * on this "mm" post up_write().
 		 *
-		 * tsk_is_oom_victim() cannot be set from under us
+		 * mm_is_oom_victim() cannot be set from under us
 		 * either because current->mm is already set to NULL
 		 * under task_lock before calling mmput and oom_mm is
 		 * set not NULL by the OOM killer only if current->mm
 		 * is found not NULL while holding the task_lock.
 		 */
+		set_bit(MMF_OOM_SKIP, &mm->flags);
 		down_write(&mm->mmap_sem);
 		up_write(&mm->mmap_sem);
 	}
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -683,8 +683,10 @@ static void mark_oom_victim(struct task_struct *tsk)
 		return;
 
 	/* oom_mm is bound to the signal struct life time. */
-	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
+	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
 		mmgrab(tsk->signal->oom_mm);
+		set_bit(MMF_OOM_VICTIM, &mm->flags);
+	}
 
 	/*
 	 * Make sure that the task is woken up from uninterruptible sleep

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-07 21:22             ` David Rientjes
@ 2017-12-08  7:50               ` Michal Hocko
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2017-12-08  7:50 UTC (permalink / raw)
  To: David Rientjes
  Cc: Tetsuo Handa, Andrew Morton, Andrea Arcangeli, linux-kernel, linux-mm

On Thu 07-12-17 13:22:30, David Rientjes wrote:
[...]
> > diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> > index 9c8847395b5e..da673ca66e7a 100644
> > --- a/include/linux/sched/coredump.h
> > +++ b/include/linux/sched/coredump.h
> > @@ -68,8 +68,9 @@ static inline int get_dumpable(struct mm_struct *mm)
> >  #define MMF_RECALC_UPROBES	20	/* MMF_HAS_UPROBES can be wrong */
> >  #define MMF_OOM_SKIP		21	/* mm is of no interest for the OOM killer */
> >  #define MMF_UNSTABLE		22	/* mm is unstable for copy_from_user */
> > -#define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
> > -#define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
> > +#define MMF_OOM_VICTIM		23	/* mm is the oom victim */
> > +#define MMF_HUGE_ZERO_PAGE	24      /* mm has ever used the global huge zero page */
> > +#define MMF_DISABLE_THP		25	/* disable THP for all VMAs */
> >  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
> >  
> >  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> 
> Could we not adjust the existing bit values, but simply add a new one
> for MMF_OOM_VICTIM?  We have automated tools that look at specific bits
> in mm->flags, and it would be nice to keep them consistent between
> kernel versions.  Not absolutely required, but nice to have.

I just wanted to have those semantically related bits closer
together. But I do not insist on this.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-07 21:55       ` David Rientjes
@ 2017-12-08  9:26         ` David Rientjes
  2017-12-08 11:27           ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: David Rientjes @ 2017-12-08  9:26 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, akpm, aarcange, linux-kernel, linux-mm

On Thu, 7 Dec 2017, David Rientjes wrote:

> I'm backporting and testing the following patch against Linus's tree.  To 
> clarify an earlier point, we don't actually have any change from upstream 
> code that allows for free_pgtables() before the 
> set_bit(MMF_OOM_SKIP);down_write();up_write() cycle.
> 
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -66,6 +66,15 @@ static inline bool tsk_is_oom_victim(struct task_struct * tsk)
>  	return tsk->signal->oom_mm;
>  }
>  
> +/*
> + * Use this helper if tsk->mm != mm and the victim mm needs special
> + * handling. This is guaranteed to stay true once set.
> + */
> +static inline bool mm_is_oom_victim(struct mm_struct *mm)
> +{
> +	return test_bit(MMF_OOM_VICTIM, &mm->flags);
> +}
> +
>  /*
>   * Checks whether a page fault on the given mm is still reliable.
>   * This is no longer true if the oom reaper started to reap the
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -71,6 +71,7 @@ static inline int get_dumpable(struct mm_struct *mm)
>  #define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
>  #define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
>  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
> +#define MMF_OOM_VICTIM		25	/* mm is the oom victim */
>  
>  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
>  				 MMF_DISABLE_THP_MASK)
> diff --git a/mm/mmap.c b/mm/mmap.c
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3019,20 +3019,20 @@ void exit_mmap(struct mm_struct *mm)
>  	/* Use -1 here to ensure all VMAs in the mm are unmapped */
>  	unmap_vmas(&tlb, vma, 0, -1);
>  
> -	set_bit(MMF_OOM_SKIP, &mm->flags);
> -	if (unlikely(tsk_is_oom_victim(current))) {
> +	if (unlikely(mm_is_oom_victim(mm))) {
>  		/*
>  		 * Wait for oom_reap_task() to stop working on this
>  		 * mm. Because MMF_OOM_SKIP is already set before
>  		 * calling down_read(), oom_reap_task() will not run
>  		 * on this "mm" post up_write().
>  		 *
> -		 * tsk_is_oom_victim() cannot be set from under us
> +		 * mm_is_oom_victim() cannot be set from under us
>  		 * either because current->mm is already set to NULL
>  		 * under task_lock before calling mmput and oom_mm is
>  		 * set not NULL by the OOM killer only if current->mm
>  		 * is found not NULL while holding the task_lock.
>  		 */
> +		set_bit(MMF_OOM_SKIP, &mm->flags);
>  		down_write(&mm->mmap_sem);
>  		up_write(&mm->mmap_sem);
>  	}
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -683,8 +683,10 @@ static void mark_oom_victim(struct task_struct *tsk)
>  		return;
>  
>  	/* oom_mm is bound to the signal struct life time. */
> -	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
> +	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
>  		mmgrab(tsk->signal->oom_mm);
> +		set_bit(MMF_OOM_VICTIM, &mm->flags);
> +	}
>  
>  	/*
>  	 * Make sure that the task is woken up from uninterruptible sleep
> 

This passes all the functional testing that I have, and I can create a
synthetic testcase that triggers at least the case of MMF_OOM_VICTIM
being set while the oom_reaper is still working on an mm, the scenario
this patch guards against, so feel free to add an

	Acked-by: David Rientjes <rientjes@google.com>

with a variant of your previous changelogs.  Thanks!

I think it would be appropriate to cc stable for 4.14 and add a

Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")

if nobody disagrees, which I think you may have already done on a previous 
iteration.

We can still discuss whether there are any VM_LOCKED subtleties in this
thread, but I have no evidence that they are responsible for any issues.
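
For the record, a rough sketch of the kind of synthetic testcase,
purely illustrative rather than the actual one: a memory hog that
becomes the oom victim while a helper (not shown) keeps its mm pinned,
e.g. by repeatedly reading the hog's /proc/<pid>/maps, so that the
final __mmput() runs from the reader:

	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	int main(void)
	{
		size_t chunk = 1UL << 20;

		for (;;) {
			char *p = malloc(chunk);
			if (!p)
				break;
			/* Fault the pages in so memory is really consumed. */
			memset(p, 1, chunk);
		}
		pause();
		return 0;
	}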

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-07 16:30     ` Michal Hocko
  2017-12-07 21:55       ` David Rientjes
@ 2017-12-08 10:11       ` Tetsuo Handa
  1 sibling, 0 replies; 19+ messages in thread
From: Tetsuo Handa @ 2017-12-08 10:11 UTC (permalink / raw)
  To: mhocko; +Cc: rientjes, akpm, aarcange, linux-kernel, linux-mm

Michal Hocko wrote:
> On Fri 08-12-17 00:44:11, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > David, could you test with this patch please?
> > 
> > Even if this patch solved David's case, you need to update
> > 
> > 	 * tsk_is_oom_victim() cannot be set from under us
> > 	 * either because current->mm is already set to NULL
> > 	 * under task_lock before calling mmput and oom_mm is
> > 	 * set not NULL by the OOM killer only if current->mm
> > 	 * is found not NULL while holding the task_lock.
> > 
> > part as well, for it is the explanation of why the
> > tsk_is_oom_victim() test was expected to work.
> 
> Yes, the same applies for mm_is_oom_victim. I will fixup s@tsk_@mm_@
> here.
> 

If you try to "s@tsk_@mm_@", I suggest doing

----------
 mm/mmap.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index 68fdd14..b2cb4e5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3025,12 +3025,6 @@ void exit_mmap(struct mm_struct *mm)
 		 * mm. Because MMF_OOM_SKIP is already set before
 		 * calling down_read(), oom_reap_task() will not run
 		 * on this "mm" post up_write().
-		 *
-		 * mm_is_oom_victim() cannot be set from under us
-		 * either because current->mm is already set to NULL
-		 * under task_lock before calling mmput and oom_mm is
-		 * set not NULL by the OOM killer only if current->mm
-		 * is found not NULL while holding the task_lock.
 		 */
 		set_bit(MMF_OOM_SKIP, &mm->flags);
 		down_write(&mm->mmap_sem);
----------

because current->mm and current->signal->oom_mm are irrelevant to the
reason for setting MMF_OOM_SKIP here. What matters here is that this
mm's ->mmap must not be accessed by the OOM reaper afterwards. (And I
think we can set mm->mmap = NULL instead of setting MMF_OOM_SKIP. Then,
we can defer setting MMF_OOM_SKIP until either "the OOM reaper gave up
waiting for __mmput()" or "__mmput() became ready to call mmdrop()",
whichever occurs first.)

I think we can try the changes shown below for:

 (1) Allowing __oom_reap_task_mm() to run without oom_lock held.
     (Because starting OOM reaping without waiting for oom_lock can
     mitigate unexpected stalls due to schedule_timeout_killable(1) with
     oom_lock held. Though the root cause of the stall is that we can't
     guarantee that enough CPU time is given to the thread holding
     oom_lock.)

     Also, we could offload __oom_reap_task_mm() to some WQ_MEM_RECLAIM
     workqueue in order to allow reclaiming memory in parallel when many
     victims are waiting for OOM reaping.

 (2) Guaranteeing that the last-second allocation at
     __alloc_pages_may_oom() is attempted only after confirming that
     there is no victim's mm without MMF_OOM_SKIP set. (Because moving
     the last-second allocation to after select_bad_process() was
     rejected. Though doing the last-second allocation attempt after
     select_bad_process() would allow us to eliminate the oom_lock taken
     in the changes shown below...)

 (3) Guaranteeing that MMF_OOM_SKIP is set at __mmput(). (Because moving
     it out of __mmput() was an unexpected change for CONFIG_MMU=n.)

----------
 kernel/fork.c |  6 ++++++
 mm/mmap.c     |  9 ++-------
 mm/oom_kill.c | 49 ++++++++++---------------------------------------
 3 files changed, 18 insertions(+), 46 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 432eadf..dd1d69e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -931,6 +931,12 @@ static inline void __mmput(struct mm_struct *mm)
 	}
 	if (mm->binfmt)
 		module_put(mm->binfmt->module);
+	if (unlikely(mm_is_oom_victim(mm))) {
+		/* Tell select_bad_process() to start selecting next mm. */
+		mutex_lock(&oom_lock);
+		set_bit(MMF_OOM_SKIP, &mm->flags);
+		mutex_unlock(&oom_lock);
+	}
 	mmdrop(mm);
 }
 
diff --git a/mm/mmap.c b/mm/mmap.c
index b2cb4e5..641b5c1 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3020,14 +3020,9 @@ void exit_mmap(struct mm_struct *mm)
 	unmap_vmas(&tlb, vma, 0, -1);
 
 	if (unlikely(mm_is_oom_victim(mm))) {
-		/*
-		 * Wait for oom_reap_task() to stop working on this
-		 * mm. Because MMF_OOM_SKIP is already set before
-		 * calling down_read(), oom_reap_task() will not run
-		 * on this "mm" post up_write().
-		 */
-		set_bit(MMF_OOM_SKIP, &mm->flags);
+		/* Tell oom_reap_task() to stop working on this mm. */
 		down_write(&mm->mmap_sem);
+		mm->mmap = NULL;
 		up_write(&mm->mmap_sem);
 	}
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 29f8555..ec5303d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -489,28 +489,10 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 {
 	struct mmu_gather tlb;
 	struct vm_area_struct *vma;
-	bool ret = true;
-
-	/*
-	 * We have to make sure to not race with the victim exit path
-	 * and cause premature new oom victim selection:
-	 * __oom_reap_task_mm		exit_mm
-	 *   mmget_not_zero
-	 *				  mmput
-	 *				    atomic_dec_and_test
-	 *				  exit_oom_victim
-	 *				[...]
-	 *				out_of_memory
-	 *				  select_bad_process
-	 *				    # no TIF_MEMDIE task selects new victim
-	 *  unmap_page_range # frees some memory
-	 */
-	mutex_lock(&oom_lock);
 
 	if (!down_read_trylock(&mm->mmap_sem)) {
-		ret = false;
 		trace_skip_task_reaping(tsk->pid);
-		goto unlock_oom;
+		return false;
 	}
 
 	/*
@@ -525,19 +507,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	if (mm_has_notifiers(mm)) {
 		up_read(&mm->mmap_sem);
 		schedule_timeout_idle(HZ);
-		goto unlock_oom;
-	}
-
-	/*
-	 * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
-	 * work on the mm anymore. The check for MMF_OOM_SKIP must run
-	 * under mmap_sem for reading because it serializes against the
-	 * down_write();up_write() cycle in exit_mmap().
-	 */
-	if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
-		up_read(&mm->mmap_sem);
-		trace_skip_task_reaping(tsk->pid);
-		goto unlock_oom;
+		return true;
 	}
 
 	trace_start_task_reaping(tsk->pid);
@@ -550,6 +520,10 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	 */
 	set_bit(MMF_UNSTABLE, &mm->flags);
 
+	/*
+	 * exit_mmap() sets mm->mmap to NULL with mm->mmap_sem held for
+	 * write when the OOM reaper must stop working on this mm.
+	 */
 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
 		if (!can_madv_dontneed_vma(vma))
 			continue;
@@ -579,9 +553,7 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm)
 	up_read(&mm->mmap_sem);
 
 	trace_finish_task_reaping(tsk->pid);
-unlock_oom:
-	mutex_unlock(&oom_lock);
-	return ret;
+	return true;
 }
 
 #define MAX_OOM_REAP_RETRIES 10
@@ -605,11 +577,10 @@ static void oom_reap_task(struct task_struct *tsk)
 done:
 	tsk->oom_reaper_list = NULL;
 
-	/*
-	 * Hide this mm from OOM killer because it has been either reaped or
-	 * somebody can't call up_write(mmap_sem).
-	 */
+	/* Tell select_bad_process() to start selecting next mm. */
+	mutex_lock(&oom_lock);
 	set_bit(MMF_OOM_SKIP, &mm->flags);
+	mutex_unlock(&oom_lock);
 
 	/* Drop a reference taken by wake_oom_reaper */
 	put_task_struct(tsk);
----------

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap
  2017-12-08  9:26         ` David Rientjes
@ 2017-12-08 11:27           ` Michal Hocko
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2017-12-08 11:27 UTC (permalink / raw)
  To: David Rientjes; +Cc: Tetsuo Handa, akpm, aarcange, linux-kernel, linux-mm

On Fri 08-12-17 01:26:46, David Rientjes wrote:
> On Thu, 7 Dec 2017, David Rientjes wrote:
> 
> > I'm backporting and testing the following patch against Linus's tree.  To 
> > clarify an earlier point, we don't actually have any change from upstream 
> > code that allows for free_pgtables() before the 
> > set_bit(MMF_OOM_SKIP);down_write();up_write() cycle.
> > 
> > diff --git a/include/linux/oom.h b/include/linux/oom.h
> > --- a/include/linux/oom.h
> > +++ b/include/linux/oom.h
> > @@ -66,6 +66,15 @@ static inline bool tsk_is_oom_victim(struct task_struct * tsk)
> >  	return tsk->signal->oom_mm;
> >  }
> >  
> > +/*
> > + * Use this helper if tsk->mm != mm and the victim mm needs special
> > + * handling. This is guaranteed to stay true once set.
> > + */
> > +static inline bool mm_is_oom_victim(struct mm_struct *mm)
> > +{
> > +	return test_bit(MMF_OOM_VICTIM, &mm->flags);
> > +}
> > +
> >  /*
> >   * Checks whether a page fault on the given mm is still reliable.
> >   * This is no longer true if the oom reaper started to reap the
> > diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> > --- a/include/linux/sched/coredump.h
> > +++ b/include/linux/sched/coredump.h
> > @@ -71,6 +71,7 @@ static inline int get_dumpable(struct mm_struct *mm)
> >  #define MMF_HUGE_ZERO_PAGE	23      /* mm has ever used the global huge zero page */
> >  #define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
> >  #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
> > +#define MMF_OOM_VICTIM		25	/* mm is the oom victim */
> >  
> >  #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> >  				 MMF_DISABLE_THP_MASK)
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -3019,20 +3019,20 @@ void exit_mmap(struct mm_struct *mm)
> >  	/* Use -1 here to ensure all VMAs in the mm are unmapped */
> >  	unmap_vmas(&tlb, vma, 0, -1);
> >  
> > -	set_bit(MMF_OOM_SKIP, &mm->flags);
> > -	if (unlikely(tsk_is_oom_victim(current))) {
> > +	if (unlikely(mm_is_oom_victim(mm))) {
> >  		/*
> >  		 * Wait for oom_reap_task() to stop working on this
> >  		 * mm. Because MMF_OOM_SKIP is already set before
> >  		 * calling down_read(), oom_reap_task() will not run
> >  		 * on this "mm" post up_write().
> >  		 *
> > -		 * tsk_is_oom_victim() cannot be set from under us
> > +		 * mm_is_oom_victim() cannot be set from under us
> >  		 * either because current->mm is already set to NULL
> >  		 * under task_lock before calling mmput and oom_mm is
> >  		 * set not NULL by the OOM killer only if current->mm
> >  		 * is found not NULL while holding the task_lock.
> >  		 */
> > +		set_bit(MMF_OOM_SKIP, &mm->flags);
> >  		down_write(&mm->mmap_sem);
> >  		up_write(&mm->mmap_sem);
> >  	}
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -683,8 +683,10 @@ static void mark_oom_victim(struct task_struct *tsk)
> >  		return;
> >  
> >  	/* oom_mm is bound to the signal struct life time. */
> > -	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm))
> > +	if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) {
> >  		mmgrab(tsk->signal->oom_mm);
> > +		set_bit(MMF_OOM_VICTIM, &mm->flags);
> > +	}
> >  
> >  	/*
> >  	 * Make sure that the task is woken up from uninterruptible sleep
> > 
> 
> This passes all the functional testing that I have, and I can create a
> synthetic testcase that triggers at least the case of MMF_OOM_VICTIM
> being set while the oom_reaper is still working on an mm, the scenario
> this patch guards against, so feel free to add an
> 
> 	Acked-by: David Rientjes <rientjes@google.com>
> 
> with a variant of your previous changelogs.  Thanks!
> 
> I think it would be appropriate to cc stable for 4.14 and add a
> 
> Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
> 
> if nobody disagrees, which I think you may have already done on a previous 
> iteration.

Thanks for your testing! I will repost the patch later today.

> We can still discuss whether there are any VM_LOCKED subtleties in this
> thread, but I have no evidence that they are responsible for any issues.

Yes, this is worth a separate discussion.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

Thread overview: 19+ messages
2017-12-06  2:43 Multiple oom_reaper BUGs: unmap_page_range racing with exit_mmap David Rientjes
2017-12-06  2:58 ` David Rientjes
     [not found]   ` <201712060328.vB63SrDK069830@www262.sakura.ne.jp>
2017-12-06  7:48     ` David Rientjes
2017-12-06  9:00       ` Michal Hocko
     [not found]         ` <201712070720.vB77KlBQ009754@www262.sakura.ne.jp>
2017-12-07  8:28           ` Michal Hocko
2017-12-07  8:44             ` Michal Hocko
2017-12-07 11:00             ` Tetsuo Handa
2017-12-07 21:22             ` David Rientjes
2017-12-08  7:50               ` Michal Hocko
2017-12-06 11:37       ` Tetsuo Handa
2017-12-06  8:50     ` Michal Hocko
2017-12-06  8:31 ` Michal Hocko
2017-12-07 11:35 ` Michal Hocko
2017-12-07 15:44   ` Tetsuo Handa
2017-12-07 16:30     ` Michal Hocko
2017-12-07 21:55       ` David Rientjes
2017-12-08  9:26         ` David Rientjes
2017-12-08 11:27           ` Michal Hocko
2017-12-08 10:11       ` Tetsuo Handa
