From: David Rientjes <rientjes@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Huang Ying <ying.huang@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Dave Chinner <david@fromorbit.com>, Michal Hocko <mhocko@suse.cz>,
	"Theodore Ts'o" <tytso@mit.edu>
Subject: Re: [patch 03/12] mm: oom_kill: switch test-and-clear of known TIF_MEMDIE to clear
Date: Thu, 26 Mar 2015 12:50:20 -0700 (PDT)
Message-ID: <alpine.DEB.2.10.1503261231440.9410@chino.kir.corp.google.com>
In-Reply-To: <20150326110532.GB18560@cmpxchg.org>

On Thu, 26 Mar 2015, Johannes Weiner wrote:

> > > exit_oom_victim() already knows that TIF_MEMDIE is set, and nobody
> > > else can clear it concurrently.  Use clear_thread_flag() directly.
> > > 
> > > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > 
> > For the oom killer, that's true because of task_lock(): we always only set 
> > TIF_MEMDIE when there is a valid p->mm and it's cleared in the exit path 
> > after the unlock, acting as a barrier, when p->mm is set to NULL so it's 
> > no longer a valid victim.  So that part is fine.
> > 
> > The problem is the android low memory killer that does 
> > mark_tsk_oom_victim() without the protection of task_lock(), it's just rcu 
> > protected so the reference to the task itself is guaranteed to still be 
> > valid.
> 
> But this is about *setting* it without a lock.  My point was that once
> TIF_MEMDIE is actually set, the task owns it and nobody else can clear
> it for them, so it's safe to test and clear non-atomically from the
> task's own context.  Am I missing something?
> 

Yes, I'm thinking about the following which already exists before your 
patch:

	tskA			tskB
	----			----
	lowmem_scan()
	-> tskB->mm != NULL
	-> selected = tskB
				exit_mm()
				exit_oom_victim()
				-> TIF_MEMDIE not set, return	
	mark_oom_victim(tskB)
	-> set TIF_MEMDIE

And now if tskB fails to exit then the oom killer is going to stall 
forever, because we don't check for p->mm != NULL when testing eligible 
processes for TIF_MEMDIE.
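
To make the interleaving above concrete, here is a minimal userspace sketch 
of it.  This is not kernel code: struct model_task and every model_*() 
helper are hypothetical stand-ins that I'm assuming for illustration, with 
an atomic_flag spinlock playing the role of task_lock() and a bool playing 
the role of TIF_MEMDIE.  It only shows why rechecking ->mm under the lock 
closes the window.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only the fields the race touches. */
struct model_task {
	atomic_flag lock;	/* plays the role of task_lock()/task_unlock() */
	void *mm;		/* non-NULL while the address space is attached */
	bool memdie;		/* plays the role of TIF_MEMDIE */
};

static void model_task_init(struct model_task *t, void *mm)
{
	atomic_flag_clear(&t->lock);
	t->mm = mm;
	t->memdie = false;
}

static void model_task_lock(struct model_task *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;	/* spin until the holder unlocks */
}

static void model_task_unlock(struct model_task *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

/* Exit path: detach the mm under the lock, then drop the flag if it is set. */
static void model_exit_mm(struct model_task *t)
{
	assert(t->mm != NULL);	/* the model detaches an mm only once */
	model_task_lock(t);
	t->mm = NULL;
	model_task_unlock(t);
	if (t->memdie)		/* models the exit_oom_victim() step */
		t->memdie = false;
}

/* Unlocked marking: set the flag without rechecking ->mm. */
static void model_mark_unchecked(struct model_task *t)
{
	t->memdie = true;
}

/* Locked marking: recheck ->mm under the lock before setting the flag. */
static bool model_mark_checked(struct model_task *t)
{
	bool marked = false;

	model_task_lock(t);
	if (t->mm) {
		t->memdie = true;
		marked = true;
	}
	model_task_unlock(t);
	return marked;
}

/* The state that stalls the oom killer: flag set, mm already gone. */
static bool model_stale_victim(const struct model_task *t)
{
	return t->memdie && t->mm == NULL;
}
```

Calling model_exit_mm() and then model_mark_unchecked() on the same task 
replays the diagram and leaves model_stale_victim() true.  With 
model_mark_checked() in the second step the mark is refused and the flag 
stays clear.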

So there's nothing wrong with your patch, I'm just digesting all of this 
new mark_oom_victim() stuff.

Acked-by: David Rientjes <rientjes@google.com>

I think the lmk should be doing this, in addition:


android, lmk: avoid setting TIF_MEMDIE if process has already exited

TIF_MEMDIE should not be set on a process if it does not have a valid 
->mm, and this is protected by task_lock().

If TIF_MEMDIE gets set after the mm has detached, and the process fails to 
exit, then the oom killer will defer forever waiting for it to exit.

Make sure that the mm is still valid before setting TIF_MEMDIE by way of 
mark_tsk_oom_victim().

Signed-off-by: David Rientjes <rientjes@google.com>
---
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -156,20 +156,27 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
 			     p->pid, p->comm, oom_score_adj, tasksize);
 	}
 	if (selected) {
-		lowmem_print(1, "send sigkill to %d (%s), adj %hd, size %d\n",
-			     selected->pid, selected->comm,
-			     selected_oom_score_adj, selected_tasksize);
-		lowmem_deathpending_timeout = jiffies + HZ;
+		task_lock(selected);
+		if (!selected->mm) {
+			/* Already exited, cannot do mark_tsk_oom_victim() */
+			task_unlock(selected);
+			goto out;
+		}
 		/*
 		 * FIXME: lowmemorykiller shouldn't abuse global OOM killer
 		 * infrastructure. There is no real reason why the selected
 		 * task should have access to the memory reserves.
 		 */
 		mark_tsk_oom_victim(selected);
+		task_unlock(selected);
+		lowmem_print(1, "send sigkill to %d (%s), adj %hd, size %d\n",
+			     selected->pid, selected->comm,
+			     selected_oom_score_adj, selected_tasksize);
+		lowmem_deathpending_timeout = jiffies + HZ;
 		send_sig(SIGKILL, selected, 0);
 		rem += selected_tasksize;
 	}
-
+out:
 	lowmem_print(4, "lowmem_scan %lu, %x, return %lu\n",
 		     sc->nr_to_scan, sc->gfp_mask, rem);
 	rcu_read_unlock();


