linux-pm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Tejun Heo <tj@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, linux-pm@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	"\\\"Rafael J. Wysocki\\\"" <rjw@rjwysocki.net>,
	David Rientjes <rientjes@google.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Cong Wang <xiyou.wangcong@gmail.com>
Subject: Re: [RFC 2/2] OOM, PM: make OOM detection in the freezer path raceless
Date: Thu, 4 Dec 2014 17:56:01 +0100	[thread overview]
Message-ID: <20141204165601.GD25001@dhcp22.suse.cz> (raw)
In-Reply-To: <20141204144454.GB15219@htj.dyndns.org>

On Thu 04-12-14 09:44:54, Tejun Heo wrote:
> On Thu, Dec 04, 2014 at 03:16:23PM +0100, Michal Hocko wrote:
> > > A delta but shouldn't it be pr_cont()?
> > 
> > kernel/power/process.c doesn't use pr_* so I've stayed with what the
> > rest of the file is using. I can add a patch which transforms all of
> > them.
> 
> The console output becomes wrong when printk() is used on
> continuation.  So, yeah, it'd be great to fix it.
> 
> > > > +extern bool oom_killer_disabled;
> > > 
> > > Ugh... don't we wanna put this in a header file?
> > 
> > Who else would need the declaration? This is not something random code
> > should look at.
> 
> Let's say, somebody changes the type to ulong for whatever reason
> later and forgets to update this declaration.  What happens then on a
> big endian machine?

OK, see your point. Although this is unlikely...
 
> Jesus, this is basic C programming.  You don't sprinkle external
> declarations which the compiler can't verify against the actual
> definitions.  There's absolutely no compelling reason to do that here.
> Why would you take out compiler verification for no reason?
> 
> > > > +void mark_tsk_oom_victim(struct task_struct *tsk)
> > > >  {
> > > > -	return atomic_read(&oom_kills);
> > > > +	BUG_ON(oom_killer_disabled);
> > > 
> > > WARN_ON_ONCE() is prolly a better option here?
> > 
> > Well, something fishy is going on when oom_killer_disabled is set and we
> > mark new OOM victim. This is a clear bug. Why would be warning and a
> > allow the follow up breakage?
> 
> Because the system is more likely to be able to go on and we don't BUG
> when we can WARN as a general rule.  Working systems is almost always
> better than a dead system even for debugging.
> 
> > > > +	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
> > > 
> > > Can a task actually be selected as an OOM victim multiple times?
> > 
> > AFAICS nothing prevents from global OOM and memcg OOM killers racing.
> 
> Maybe it'd be a good idea to note that in the comment?

ok

> > > > -void note_oom_kill(void)
> > > > +void unmark_tsk_oom_victim(struct task_struct *tsk)
> > > >  {
> > > > -	atomic_inc(&oom_kills);
> > > > +	int count;
> > > > +
> > > > +	if (!test_and_clear_tsk_thread_flag(tsk, TIF_MEMDIE))
> > > > +		return;
> > > 
> > > Maybe test this inline in exit_mm()?  e.g.
> > > 
> > > 	if (test_thread_flag(TIF_MEMDIE))
> > > 		unmark_tsk_oom_victim(current);
> > 
> > Why do you think testing TIF_MEMDIE in exit_mm is better? I would like
> > to reduce the usage of the flag as much as possible.
> 
> Because it's adding a function call/return to hot path for everybody.
> It sure is a miniscule cost but we're adding that for no good reason.

ok. 
 
> > > So, each complete() increments the done count and wait decs.  The
> > > above code works iff the complete()'s and wait()'s are always balanced
> > > which usually isn't true in this type of wait code.  Either use
> > > reinit_completion() / complete_all() combos or wait_event().
> > 
> > Hmm, I thought that only a single instance of freeze_kernel_threads
> > (which calls oom_killer_disable) can run at a time. But I am currently
> > not sure that all paths are called under lock_system_sleep.
> > I am not familiar with reinit_completion API. Is the following correct?
> 
> Hmmm... wouldn't wait_event() easier to read in this case?

OK, it looks easier. I thought it would require some additional
synchronization between wake up and wait but everything necessary seems
to be done in wait_event already so we cannot miss a wake up AFAICS:

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1d55ab12792f..032be9d2a239 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -408,7 +408,7 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
  * Number of OOM victims in flight
  */
 static atomic_t oom_victims = ATOMIC_INIT(0);
-static DECLARE_COMPLETION(oom_victims_wait);
+static DECLARE_WAIT_QUEUE_HEAD(oom_victims_wait);
 
 bool oom_killer_disabled __read_mostly;
 static DECLARE_RWSEM(oom_sem);
@@ -435,7 +435,7 @@ void unmark_tsk_oom_victim(void)
 	 * is nobody who cares.
 	 */
 	if (!atomic_dec_return(&oom_victims) && oom_killer_disabled)
-		complete_all(&oom_victims_wait);
+		wake_up_all(&oom_victims_wait);
 	up_read(&oom_sem);
 }
 
@@ -464,16 +464,11 @@ bool oom_killer_disable(void)
 		return false;
 	}
 
-	/* unmark_tsk_oom_victim is calling complete_all */
-	if (!oom_killer_disable)
-		reinit_completion(&oom_victims_wait);
-
 	oom_killer_disabled = true;
-	count = atomic_read(&oom_victims);
 	up_write(&oom_sem);
 
 	if (count)
-		wait_for_completion(&oom_victims_wait);
+		wait_event(oom_victims_wait, atomic_read(&oom_victims));
 
 	return true;
 }

> ...
> > > Maybe 0 / -errno is better choice as return values?
> > 
> > I do not have problem to change this if you feel strong about it but
> > true/false sounds easier to me and it allows the caller to decide what to
> > report. If there were multiple reasons to fail then sure but that is not
> > the case.
> 
> It's not a big deal but except for functions which have clear boolean
> behavior - functions which try/attempt something or query or decide

this is basically try_lock which might fail due to whatever internal
reasons.

> certain things - randomly thrown in bool returns tend to become
> confusing especially because its bool fail value is the opposite of
> 0/-errno fail value.  So, "this function only fails with one reason"
> is usually a bad and arbitrary reason for choosing bool return which
> causes confusion on callsites and headaches when the function develops
> more reasons to fail.
> 
> ...
> > > > @@ -712,12 +770,16 @@ void pagefault_out_of_memory(void)
> > > >  {
> > > >  	struct zonelist *zonelist;
> > > >  
> > > > +	down_read(&oom_sem);
> > > >  	if (mem_cgroup_oom_synchronize(true))
> > > > -		return;
> > > > +		goto unlock;
> > > >  
> > > >  	zonelist = node_zonelist(first_memory_node, GFP_KERNEL);
> > > >  	if (oom_zonelist_trylock(zonelist, GFP_KERNEL)) {
> > > > -		out_of_memory(NULL, 0, 0, NULL, false);
> > > > +		if (!oom_killer_disabled)
> > > > +			__out_of_memory(NULL, 0, 0, NULL, false);
> > > >  		oom_zonelist_unlock(zonelist, GFP_KERNEL);
> > > 
> > > Is this a condition which can happen and we can deal with? With
> > > userland fully frozen, there shouldn't be page faults which lead to
> > > memory allocation, right?
> > 
> > Except for racing OOM victims which were missed by try_to_freeze_tasks
> > because they didn't get cpu slice to wake up from the freezer. The task
> > would die on the way out from the page fault exception. I have updated
> > the changelog to be more verbose about this.
> 
> That's something very not obvious.  Let's please add a comment
> explaining that.

@@ -778,6 +795,15 @@ void pagefault_out_of_memory(void)
 	if (oom_zonelist_trylock(zonelist, GFP_KERNEL)) {
 		if (!oom_killer_disabled)
 			__out_of_memory(NULL, 0, 0, NULL, false);
+		else
+			/*
+			 * There shouldn't be any user tasks runable while the
+			 * OOM killer is disabled so the current task has to
+			 * be a racing OOM victim for which oom_killer_disable()
+			 * is waiting for.
+			 */
+			WARN_ON(test_thread_flag(TIF_MEMDIE));
+
 		oom_zonelist_unlock(zonelist, GFP_KERNEL);
 	}
 unlock:

> 
> > > (it only makes sense while the whole system is in quiescent state)
> > > and at least trigger WARN_ON_ONCE() if the above code path gets
> > > triggered while oom killer is disabled?
> > 
> > I can add a WARN_ON(!test_thread_flag(tsk, TIF_MEMDIE)).
> 
> Yeah, that makes sense to me.
> 
> Thanks.
> 
> -- 
> tejun

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2014-12-04 16:56 UTC|newest]

Thread overview: 93+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-21  7:27 [PATCH 0/4 -v2] OOM vs. freezer interaction fixes Michal Hocko
2014-10-21  7:27 ` [PATCH 1/4] freezer: Do not freeze tasks killed by OOM killer Michal Hocko
2014-10-21 12:04   ` Rafael J. Wysocki
2014-10-21  7:27 ` [PATCH 2/4] freezer: remove obsolete comments in __thaw_task() Michal Hocko
2014-10-21 12:04   ` Rafael J. Wysocki
2014-10-21  7:27 ` [PATCH 3/4] OOM, PM: OOM killed task shouldn't escape PM suspend Michal Hocko
2014-10-21 12:09   ` Rafael J. Wysocki
2014-10-21 13:14     ` Michal Hocko
2014-10-21 13:42       ` Rafael J. Wysocki
2014-10-21 14:11         ` Michal Hocko
2014-10-21 14:41           ` Rafael J. Wysocki
2014-10-21 14:29             ` Michal Hocko
2014-10-22 14:39               ` Rafael J. Wysocki
2014-10-22 14:22                 ` Michal Hocko
2014-10-22 21:18                   ` Rafael J. Wysocki
2014-10-26 18:49               ` Pavel Machek
2014-11-04 19:27               ` Tejun Heo
2014-11-05 12:46                 ` Michal Hocko
2014-11-05 13:02                   ` Tejun Heo
2014-11-05 13:31                     ` Michal Hocko
2014-11-05 13:42                       ` Michal Hocko
2014-11-05 14:14                         ` Michal Hocko
2014-11-05 15:45                           ` Michal Hocko
2014-11-05 15:44                         ` Tejun Heo
2014-11-05 16:01                           ` Michal Hocko
2014-11-05 16:29                             ` Tejun Heo
2014-11-05 16:39                               ` Michal Hocko
2014-11-05 16:54                                 ` Tejun Heo
2014-11-05 17:01                                   ` Tejun Heo
2014-11-06 13:05                                     ` Michal Hocko
2014-11-06 15:09                                       ` Tejun Heo
2014-11-06 16:01                                         ` Michal Hocko
2014-11-06 16:12                                           ` Tejun Heo
2014-11-06 16:31                                             ` Michal Hocko
2014-11-06 16:33                                               ` Tejun Heo
2014-11-06 16:58                                                 ` Michal Hocko
2014-11-05 17:46                                   ` Michal Hocko
2014-11-05 17:55                                     ` Tejun Heo
2014-11-06 12:49                                       ` Michal Hocko
2014-11-06 15:01                                         ` Tejun Heo
2014-11-06 16:02                                           ` Michal Hocko
2014-11-06 16:28                                             ` Tejun Heo
2014-11-10 16:30                                               ` Michal Hocko
2014-11-12 18:58                                                 ` [RFC 0/4] OOM vs PM freezer fixes Michal Hocko
2014-11-12 18:58                                                   ` [RFC 1/4] OOM, PM: Do not miss OOM killed frozen tasks Michal Hocko
2014-11-14 17:55                                                     ` Tejun Heo
2014-11-12 18:58                                                   ` [RFC 2/4] OOM, PM: make OOM detection in the freezer path raceless Michal Hocko
2014-11-12 18:58                                                   ` [RFC 3/4] OOM, PM: handle pm freezer as an OOM victim correctly Michal Hocko
2014-11-12 18:58                                                   ` [RFC 4/4] OOM: thaw the OOM victim if it is frozen Michal Hocko
2014-11-14 20:14                                                   ` [RFC 0/4] OOM vs PM freezer fixes Tejun Heo
2014-11-18 21:08                                                     ` Michal Hocko
2014-11-18 21:10                                                       ` [RFC 1/2] oom: add helper for setting and clearing TIF_MEMDIE Michal Hocko
2014-11-18 21:10                                                         ` [RFC 2/2] OOM, PM: make OOM detection in the freezer path raceless Michal Hocko
2014-11-27  0:47                                                           ` Rafael J. Wysocki
2014-12-02 22:08                                                           ` Tejun Heo
2014-12-04 14:16                                                             ` Michal Hocko
2014-12-04 14:44                                                               ` Tejun Heo
2014-12-04 16:56                                                                 ` Michal Hocko [this message]
2014-12-04 17:18                                                                   ` Michal Hocko
2014-12-05 16:41                                                 ` [PATCH 0/4] OOM vs PM freezer fixes Michal Hocko
2014-12-05 16:41                                                   ` [PATCH -v2 1/5] oom: add helpers for setting and clearing TIF_MEMDIE Michal Hocko
2014-12-06 12:56                                                     ` Tejun Heo
2014-12-07 10:13                                                       ` Michal Hocko
2015-01-07 17:57                                                     ` Tejun Heo
2015-01-07 18:23                                                       ` Michal Hocko
2014-12-05 16:41                                                   ` [PATCH -v2 2/5] OOM: thaw the OOM victim if it is frozen Michal Hocko
2014-12-06 13:06                                                     ` Tejun Heo
2014-12-07 10:24                                                       ` Michal Hocko
2014-12-07 10:45                                                         ` Michal Hocko
2014-12-07 13:59                                                           ` Tejun Heo
2014-12-07 18:55                                                             ` Michal Hocko
2014-12-05 16:41                                                   ` [PATCH -v2 3/5] PM: convert printk to pr_* equivalent Michal Hocko
2014-12-05 22:40                                                     ` Rafael J. Wysocki
2014-12-07 10:26                                                       ` Michal Hocko
2014-12-06 13:08                                                     ` Tejun Heo
2014-12-05 16:41                                                   ` [PATCH -v2 4/5] sysrq: " Michal Hocko
2014-12-06 13:09                                                     ` Tejun Heo
2014-12-05 16:41                                                   ` [PATCH -v2 5/5] OOM, PM: make OOM detection in the freezer path raceless Michal Hocko
2014-12-06 13:11                                                     ` Tejun Heo
2014-12-07 10:11                                                       ` Michal Hocko
2015-01-07 18:41                                                     ` Tejun Heo
2015-01-07 18:48                                                       ` Michal Hocko
2015-01-08 11:51                                                     ` Michal Hocko
2014-12-07 10:09                                                   ` [PATCH 0/4] OOM vs PM freezer fixes Michal Hocko
2014-12-07 13:55                                                     ` Tejun Heo
2014-12-07 19:00                                                       ` Michal Hocko
2014-12-18 16:27                                                         ` Michal Hocko
2014-11-05 14:55                   ` [PATCH 3/4] OOM, PM: OOM killed task shouldn't escape PM suspend Michal Hocko
2014-10-26 18:40   ` Pavel Machek
2014-10-21  7:27 ` [PATCH 4/4] PM: convert do_each_thread to for_each_process_thread Michal Hocko
2014-10-21 12:10   ` Rafael J. Wysocki
2014-10-21 13:19     ` Michal Hocko
2014-10-21 13:43       ` Rafael J. Wysocki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141204165601.GD25001@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=rientjes@google.com \
    --cc=rjw@rjwysocki.net \
    --cc=tj@kernel.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).