linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] OOM vs. freezer interaction fixes
@ 2014-10-08 14:07 Michal Hocko
  2014-10-08 14:07 ` [PATCH 1/3] freezer: check OOM kill while being frozen Michal Hocko
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Michal Hocko @ 2014-10-08 14:07 UTC (permalink / raw)
  To: Andrew Morton, \"Rafael J. Wysocki\"
  Cc: Cong Wang, David Rientjes, Tejun Heo, LKML, linux-mm

Hi Andrew, Rafael,

this has been originally discussed here [1] but didn't lead anywhere AFAICS
so I would like to resurrect them.

The first and third patch are regression fixes and they are a stable
material IMO. The second patch is a simple cleanup.

The 1st patch is fixing a regression introduced in 3.3 since when OOM
killer is not able to kill any frozen task and live lock as a result.
The fix gets us back to the 3.2. As it turned out during the discussion [2]
this was still not 100% sufficient and that's why we need the 3rd patch.

I was thinking about the proper 1st vs. 3rd patch ordering because
the 1st patch basically opens a race window fixed by the later patch.
Original patch from Cong Wang has covered this by cgroup_freezing(current)
check in should_thaw_current(). But this approach still suffers from OOM
vs. PM freezer interaction (OOM killer would still live lock waiting for a
PM frozen task this time).

So I think the most straight forward way is to address only OOM vs.
frozen task interaction in the first patch, mark it for stable 3.3+ and
leave the race to a separate follow up patch which is applicable to
stable 3.2+ (before a3201227f803 made it inefficient).

Switching 1st and 3rd patches would make some sense as well but then
it might end up even more confusing because we would be fixing a
non-existent issue in upstream first...

---
[1] http://marc.info/?l=linux-kernel&m=140986986423092
[2] http://marc.info/?l=linux-kernel&m=141074263721166


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] freezer: check OOM kill while being frozen
  2014-10-08 14:07 [PATCH 0/3] OOM vs. freezer interaction fixes Michal Hocko
@ 2014-10-08 14:07 ` Michal Hocko
  2014-10-08 14:07 ` [PATCH 2/3] freezer: remove obsolete comments in __thaw_task() Michal Hocko
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2014-10-08 14:07 UTC (permalink / raw)
  To: Andrew Morton, \"Rafael J. Wysocki\"
  Cc: Cong Wang, David Rientjes, Tejun Heo, LKML, linux-mm

From: Cong Wang <xiyou.wangcong@gmail.com>

Since f660daac474c6f (oom: thaw threads if oom killed thread is frozen
before deferring) OOM killer relies on being able to thaw a frozen task
to handle OOM situation but a3201227f803 (freezer: make freezing() test
freeze conditions in effect instead of TIF_FREEZE) has reorganized the
code and stopped clearing freeze flag in __thaw_task. This means that
the target task only wakes up and goes into the fridge again because the
freezing condition hasn't changed for it. This reintroduces the bug
fixed by f660daac474c6f.

Fix the issue by checking for TIF_MEMDIE thread flag and get away from
the fridge if it is set. oom_scan_process_thread doesn't have to check
for the frozen task anymore because do_send_sig_info will wake up the
thread and TIF_MEMDIE is already set by that time.

[mhocko@suse.cz: rewrote the changelog]
Fixes: a3201227f803 (freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE)
Cc: stable@vger.kernel.org # 3.3+
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 kernel/freezer.c | 20 +++++++++++++++++---
 mm/oom_kill.c    |  2 --
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index aa6a8aadb911..77ad6794b610 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -45,13 +45,28 @@ bool freezing_slow_path(struct task_struct *p)
 	if (pm_nosig_freezing || cgroup_freezing(p))
 		return true;
 
-	if (pm_freezing && !(p->flags & PF_KTHREAD))
+	if (!(p->flags & PF_KTHREAD))
 		return true;
 
 	return false;
 }
 EXPORT_SYMBOL(freezing_slow_path);
 
+static bool should_thaw_current(bool check_kthr_stop)
+{
+	if (!freezing(current))
+		return true;
+
+	if (check_kthr_stop && kthread_should_stop())
+		return true;
+
+	/* It might not be safe to check TIF_MEMDIE for pm freeze. */
+	if (cgroup_freezing(current) && test_thread_flag(TIF_MEMDIE))
+		return true;
+
+	return false;
+}
+
 /* Refrigerator is place where frozen processes are stored :-). */
 bool __refrigerator(bool check_kthr_stop)
 {
@@ -67,8 +82,7 @@ bool __refrigerator(bool check_kthr_stop)
 
 		spin_lock_irq(&freezer_lock);
 		current->flags |= PF_FROZEN;
-		if (!freezing(current) ||
-		    (check_kthr_stop && kthread_should_stop()))
+		if (should_thaw_current(check_kthr_stop))
 			current->flags &= ~PF_FROZEN;
 		spin_unlock_irq(&freezer_lock);
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index bbf405a3a18f..8975b983a82c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -266,8 +266,6 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
 	 * Don't allow any other task to have access to the reserves.
 	 */
 	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (unlikely(frozen(task)))
-			__thaw_task(task);
 		if (!force_kill)
 			return OOM_SCAN_ABORT;
 	}
-- 
2.1.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] freezer: remove obsolete comments in __thaw_task()
  2014-10-08 14:07 [PATCH 0/3] OOM vs. freezer interaction fixes Michal Hocko
  2014-10-08 14:07 ` [PATCH 1/3] freezer: check OOM kill while being frozen Michal Hocko
@ 2014-10-08 14:07 ` Michal Hocko
  2014-10-08 14:07 ` [PATCH 3/3] OOM, PM: OOM killed task cannot escape PM suspend Michal Hocko
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2014-10-08 14:07 UTC (permalink / raw)
  To: Andrew Morton, \"Rafael J. Wysocki\"
  Cc: Cong Wang, David Rientjes, Tejun Heo, LKML, linux-mm

From: Cong Wang <xiyou.wangcong@gmail.com>

__thaw_task() no longer clears frozen flag since commit a3201227f803
(freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE).

Cc: David Rientjes <rientjes@google.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 kernel/freezer.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/kernel/freezer.c b/kernel/freezer.c
index 77ad6794b610..190d667f4e12 100644
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -161,12 +161,6 @@ void __thaw_task(struct task_struct *p)
 {
 	unsigned long flags;
 
-	/*
-	 * Clear freezing and kick @p if FROZEN.  Clearing is guaranteed to
-	 * be visible to @p as waking up implies wmb.  Waking up inside
-	 * freezer_lock also prevents wakeups from leaking outside
-	 * refrigerator.
-	 */
 	spin_lock_irqsave(&freezer_lock, flags);
 	if (frozen(p))
 		wake_up_process(p);
-- 
2.1.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] OOM, PM: OOM killed task cannot escape PM suspend
  2014-10-08 14:07 [PATCH 0/3] OOM vs. freezer interaction fixes Michal Hocko
  2014-10-08 14:07 ` [PATCH 1/3] freezer: check OOM kill while being frozen Michal Hocko
  2014-10-08 14:07 ` [PATCH 2/3] freezer: remove obsolete comments in __thaw_task() Michal Hocko
@ 2014-10-08 14:07 ` Michal Hocko
  2014-10-08 22:11 ` [PATCH 0/3] OOM vs. freezer interaction fixes Rafael J. Wysocki
  2014-10-09  4:42 ` Cong Wang
  4 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2014-10-08 14:07 UTC (permalink / raw)
  To: Andrew Morton, \"Rafael J. Wysocki\"
  Cc: Cong Wang, David Rientjes, Tejun Heo, LKML, linux-mm

PM freezer relies on having all tasks frozen by the time devices are
getting frozen so that no task will touch them while they are getting
frozen. But OOM killer is allowed to kill an already frozen task in
order to handle OOM situtation. In order to protect from late wake ups
OOM killer is disabled after all tasks are frozen. This, however, still
keeps a window open when a killed task didn't manage to die by the time
freeze_processes finishes.

Fix this race by checking all tasks after OOM killer has been disabled.
To prevent from useless check also introduce and check oom_kills count
which gets incremented when a task is killed by OOM killer. All the
tasks have to be checked only if the counter changes.

Fixes: f660daac474c6f (oom: thaw threads if oom killed thread is frozen before deferring)
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # 3.2+
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 include/linux/oom.h    |  2 ++
 kernel/power/process.c | 31 ++++++++++++++++++++++++++++++-
 mm/oom_kill.c          | 14 ++++++++++++++
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 647395a1a550..8927b6e443b5 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -50,6 +50,8 @@ static inline bool oom_task_origin(const struct task_struct *p)
 extern unsigned long oom_badness(struct task_struct *p,
 		struct mem_cgroup *memcg, const nodemask_t *nodemask,
 		unsigned long totalpages);
+
+extern int oom_kills_count(void);
 extern void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 			     unsigned int points, unsigned long totalpages,
 			     struct mem_cgroup *memcg, nodemask_t *nodemask,
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 4ee194eb524b..6ccc2e10724d 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -118,6 +118,7 @@ static int try_to_freeze_tasks(bool user_only)
 int freeze_processes(void)
 {
 	int error;
+	int oom_kills_saved;
 
 	error = __usermodehelper_disable(UMH_FREEZING);
 	if (error)
@@ -131,12 +132,40 @@ int freeze_processes(void)
 
 	printk("Freezing user space processes ... ");
 	pm_freezing = true;
+	oom_kills_saved = oom_kills_count();
 	error = try_to_freeze_tasks(true);
 	if (!error) {
-		printk("done.");
 		__usermodehelper_set_disable_depth(UMH_DISABLED);
 		oom_killer_disable();
+
+		/*
+		 * There was a OOM kill while we were freezing tasks
+		 * and the killed task might be still on the way out
+		 * so we have to double check for race.
+		 */
+		if (oom_kills_count() != oom_kills_saved) {
+			struct task_struct *g, *p;
+
+			read_lock(&tasklist_lock);
+			do_each_thread(g, p) {
+				if (p == current || freezer_should_skip(p) ||
+				    frozen(p))
+					continue;
+				error = -EBUSY;
+				break;
+			} while_each_thread(g, p);
+			read_unlock(&tasklist_lock);
+
+			if (error) {
+				__usermodehelper_set_disable_depth(UMH_ENABLED);
+				oom_killer_enable();
+				printk("OOM in progress. ");
+				goto done;
+			}
+		}
+		printk("done.");
 	}
+done:
 	printk("\n");
 	BUG_ON(in_atomic());
 
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8975b983a82c..ca96b01f4d7e 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -402,6 +402,18 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 		dump_tasks(memcg, nodemask);
 }
 
+/*
+ * Number of OOM killer invocations (including memcg OOM killer).
+ * Primarily used by PM freezer to check for potential races with
+ * OOM killed frozen task.
+ */
+static atomic_t oom_kills = ATOMIC_INIT(0);
+
+int oom_kills_count(void)
+{
+	return atomic_read(&oom_kills);
+}
+
 #define K(x) ((x) << (PAGE_SHIFT-10))
 /*
  * Must be called while holding a reference to p, which will be released upon
@@ -504,11 +516,13 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 			pr_err("Kill process %d (%s) sharing same memory\n",
 				task_pid_nr(p), p->comm);
 			task_unlock(p);
+			atomic_inc(&oom_kills);
 			do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
 		}
 	rcu_read_unlock();
 
 	set_tsk_thread_flag(victim, TIF_MEMDIE);
+	atomic_inc(&oom_kills);
 	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
 	put_task_struct(victim);
 }
-- 
2.1.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] OOM vs. freezer interaction fixes
  2014-10-08 14:07 [PATCH 0/3] OOM vs. freezer interaction fixes Michal Hocko
                   ` (2 preceding siblings ...)
  2014-10-08 14:07 ` [PATCH 3/3] OOM, PM: OOM killed task cannot escape PM suspend Michal Hocko
@ 2014-10-08 22:11 ` Rafael J. Wysocki
  2014-10-13 15:14   ` Michal Hocko
  2014-10-09  4:42 ` Cong Wang
  4 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2014-10-08 22:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Cong Wang, David Rientjes, Tejun Heo, LKML,
	linux-mm, Linux PM list

On Wednesday, October 08, 2014 04:07:43 PM Michal Hocko wrote:
> Hi Andrew, Rafael,
> 
> this has been originally discussed here [1] but didn't lead anywhere AFAICS
> so I would like to resurrect them.

OK

So any chance to CC linux-pm too next time?  There are people on that list
who may be interested as well and are not in the CC directly either.

> The first and third patch are regression fixes and they are a stable
> material IMO. The second patch is a simple cleanup.
> 
> The 1st patch is fixing a regression introduced in 3.3 since when OOM
> killer is not able to kill any frozen task and live lock as a result.
> The fix gets us back to the 3.2. As it turned out during the discussion [2]
> this was still not 100% sufficient and that's why we need the 3rd patch.
> 
> I was thinking about the proper 1st vs. 3rd patch ordering because
> the 1st patch basically opens a race window fixed by the later patch.
> Original patch from Cong Wang has covered this by cgroup_freezing(current)
> check in should_thaw_current(). But this approach still suffers from OOM
> vs. PM freezer interaction (OOM killer would still live lock waiting for a
> PM frozen task this time).
> 
> So I think the most straight forward way is to address only OOM vs.
> frozen task interaction in the first patch, mark it for stable 3.3+ and
> leave the race to a separate follow up patch which is applicable to
> stable 3.2+ (before a3201227f803 made it inefficient).
> 
> Switching 1st and 3rd patches would make some sense as well but then
> it might end up even more confusing because we would be fixing a
> non-existent issue in upstream first...
> 
> ---
> [1] http://marc.info/?l=linux-kernel&m=140986986423092
> [2] http://marc.info/?l=linux-kernel&m=141074263721166
> 

I'm fine with the approach in general, but I need to stare at patch 3
for a little bit longer before I ACK it.  Which may not happen really
soon as I'll be rather busy on Thu/Fri and then I'll be traveling to
the LPC/LCEU next week.

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] OOM vs. freezer interaction fixes
  2014-10-08 14:07 [PATCH 0/3] OOM vs. freezer interaction fixes Michal Hocko
                   ` (3 preceding siblings ...)
  2014-10-08 22:11 ` [PATCH 0/3] OOM vs. freezer interaction fixes Rafael J. Wysocki
@ 2014-10-09  4:42 ` Cong Wang
  4 siblings, 0 replies; 7+ messages in thread
From: Cong Wang @ 2014-10-09  4:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, \Rafael J. Wysocki\,
	David Rientjes, Tejun Heo, LKML, linux-mm

Hi, Michal

On Wed, Oct 8, 2014 at 7:07 AM, Michal Hocko <mhocko@suse.cz> wrote:
> Hi Andrew, Rafael,
>
> this has been originally discussed here [1] but didn't lead anywhere AFAICS
> so I would like to resurrect them.
>

Thanks a lot for taking them for me! I was busy with some networking
stuffs and also actually waiting for Rafael's response to your patch.


> The first and third patch are regression fixes and they are a stable
> material IMO. The second patch is a simple cleanup.
>
> The 1st patch is fixing a regression introduced in 3.3 since when OOM
> killer is not able to kill any frozen task and live lock as a result.
> The fix gets us back to the 3.2. As it turned out during the discussion [2]
> this was still not 100% sufficient and that's why we need the 3rd patch.
>
> I was thinking about the proper 1st vs. 3rd patch ordering because
> the 1st patch basically opens a race window fixed by the later patch.
> Original patch from Cong Wang has covered this by cgroup_freezing(current)
> check in should_thaw_current(). But this approach still suffers from OOM
> vs. PM freezer interaction (OOM killer would still live lock waiting for a
> PM frozen task this time).


It should be very rare OOM happens during PM frozen.

>
> So I think the most straight forward way is to address only OOM vs.
> frozen task interaction in the first patch, mark it for stable 3.3+ and
> leave the race to a separate follow up patch which is applicable to
> stable 3.2+ (before a3201227f803 made it inefficient).
>
> Switching 1st and 3rd patches would make some sense as well but then
> it might end up even more confusing because we would be fixing a
> non-existent issue in upstream first...
>

Agreed. Up to you, I have no strong opinions here. :)


Again, thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/3] OOM vs. freezer interaction fixes
  2014-10-08 22:11 ` [PATCH 0/3] OOM vs. freezer interaction fixes Rafael J. Wysocki
@ 2014-10-13 15:14   ` Michal Hocko
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2014-10-13 15:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andrew Morton, Cong Wang, David Rientjes, Tejun Heo, LKML,
	linux-mm, Linux PM list

On Thu 09-10-14 00:11:33, Rafael J. Wysocki wrote:
> On Wednesday, October 08, 2014 04:07:43 PM Michal Hocko wrote:
> > Hi Andrew, Rafael,
> > 
> > this has been originally discussed here [1] but didn't lead anywhere AFAICS
> > so I would like to resurrect them.
> 
> OK
> 
> So any chance to CC linux-pm too next time?  There are people on that list
> who may be interested as well and are not in the CC directly either.

Sure, sorry about that! I've simply used the same CC list as the
previous post without realizing PM list was missing.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-10-13 15:14 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-08 14:07 [PATCH 0/3] OOM vs. freezer interaction fixes Michal Hocko
2014-10-08 14:07 ` [PATCH 1/3] freezer: check OOM kill while being frozen Michal Hocko
2014-10-08 14:07 ` [PATCH 2/3] freezer: remove obsolete comments in __thaw_task() Michal Hocko
2014-10-08 14:07 ` [PATCH 3/3] OOM, PM: OOM killed task cannot escape PM suspend Michal Hocko
2014-10-08 22:11 ` [PATCH 0/3] OOM vs. freezer interaction fixes Rafael J. Wysocki
2014-10-13 15:14   ` Michal Hocko
2014-10-09  4:42 ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).