From: "azurIt" <azurit@pobox.sk>
To: "Johannes Weiner" <hannes@cmpxchg.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Michal Hocko" <mhocko@suse.cz>,
	"David Rientjes" <rientjes@google.com>,
	"KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>,
	"KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 0/7] improve memcg oom killer robustness v2
Date: Mon, 09 Sep 2013 15:10:10 +0200	[thread overview]
Message-ID: <20130909151010.3A3CBC6A@pobox.sk> (raw)
In-Reply-To: <20130905115430.GB856@cmpxchg.org>

>Hi azur,
>
>On Wed, Sep 04, 2013 at 10:18:52AM +0200, azurIt wrote:
>> > CC: "Andrew Morton" <akpm@linux-foundation.org>, "Michal Hocko" <mhocko@suse.cz>, "David Rientjes" <rientjes@google.com>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>, "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>, linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
>> >Hello azur,
>> >
>> >On Mon, Sep 02, 2013 at 12:38:02PM +0200, azurIt wrote:
>> >> >>Hi azur,
>> >> >>
>> >> >>here is the x86-only rollup of the series for 3.2.
>> >> >>
>> >> >>Thanks!
>> >> >>Johannes
>> >> >>---
>> >> >
>> >> >
>> >> >Johannes,
>> >> >
>> >> >unfortunately, one problem arises: I again have a cgroup which cannot be deleted :( It belongs to a user who had very high memory usage and was hitting his limit very often. Do you need any info I can gather now?
>> >
>> >Did the OOM killer go off in this group?
>> >
>> >Was there a warning in the syslog ("Fixing unhandled memcg OOM
>> >context")?
>> 
>> 
>> 
>> Ok, I see this message several times in my syslog; one of them is also for this unremovable cgroup (but maybe all of the others are unremovable too, should I try?). An example of the log is here (I don't know exactly where it starts and ends, so here is the full kernel log):
>> http://watchdog.sk/lkml/oom_syslog.gz
>There is an unfinished OOM invocation here:
>
>  Aug 22 13:15:21 server01 kernel: [1251422.715112] Fixing unhandled memcg OOM context set up from:
>  Aug 22 13:15:21 server01 kernel: [1251422.715191]  [<ffffffff811105c2>] T.1154+0x622/0x8f0
>  Aug 22 13:15:21 server01 kernel: [1251422.715274]  [<ffffffff8111153e>] mem_cgroup_cache_charge+0xbe/0xe0
>  Aug 22 13:15:21 server01 kernel: [1251422.715357]  [<ffffffff810cf31c>] add_to_page_cache_locked+0x4c/0x140
>  Aug 22 13:15:21 server01 kernel: [1251422.715443]  [<ffffffff810cf432>] add_to_page_cache_lru+0x22/0x50
>  Aug 22 13:15:21 server01 kernel: [1251422.715526]  [<ffffffff810cfdd3>] find_or_create_page+0x73/0xb0
>  Aug 22 13:15:21 server01 kernel: [1251422.715608]  [<ffffffff811493ba>] __getblk+0xea/0x2c0
>  Aug 22 13:15:21 server01 kernel: [1251422.715692]  [<ffffffff8114ca73>] __bread+0x13/0xc0
>  Aug 22 13:15:21 server01 kernel: [1251422.715774]  [<ffffffff81196968>] ext3_get_branch+0x98/0x140
>  Aug 22 13:15:21 server01 kernel: [1251422.715859]  [<ffffffff81197557>] ext3_get_blocks_handle+0xd7/0xdc0
>  Aug 22 13:15:21 server01 kernel: [1251422.715942]  [<ffffffff81198304>] ext3_get_block+0xc4/0x120
>  Aug 22 13:15:21 server01 kernel: [1251422.716023]  [<ffffffff81155c3a>] do_mpage_readpage+0x38a/0x690
>  Aug 22 13:15:21 server01 kernel: [1251422.716107]  [<ffffffff81155f8f>] mpage_readpage+0x4f/0x70
>  Aug 22 13:15:21 server01 kernel: [1251422.716188]  [<ffffffff811973a8>] ext3_readpage+0x28/0x60
>  Aug 22 13:15:21 server01 kernel: [1251422.716268]  [<ffffffff810cfa48>] filemap_fault+0x308/0x560
>  Aug 22 13:15:21 server01 kernel: [1251422.716350]  [<ffffffff810ef898>] __do_fault+0x78/0x5a0
>  Aug 22 13:15:21 server01 kernel: [1251422.716433]  [<ffffffff810f2ab4>] handle_pte_fault+0x84/0x940
>
>__getblk() has this weird loop where it tries to instantiate the page,
>frees memory on failure, then retries.  If the memcg goes OOM, the OOM
>path might be entered multiple times and each time leak the memcg
>reference of the respective previous OOM invocation.
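>
>For reference, the slow path looks roughly like this (a simplified
>sketch of 3.2's __getblk_slow() in fs/buffer.c, with the size checks
>and error handling trimmed; bdev/block/size are __getblk()'s
>arguments):
>
>	for (;;) {
>		struct buffer_head *bh;
>
>		bh = __find_get_block(bdev, block, size);
>		if (bh)
>			return bh;
>		/*
>		 * grow_buffers() ends up in find_or_create_page(),
>		 * whose charge can enter the memcg OOM path.  The
>		 * failure is not propagated; we just free memory
>		 * and retry, so the next pass can set up a second
>		 * OOM context on top of the first and leak its
>		 * memcg reference.
>		 */
>		if (!grow_buffers(bdev, block, size))
>			free_more_memory();
>	}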
>
>There are a few more find_or_create_page() sites that do not propagate an
>error and it's incredibly hard to find out whether they are even taken
>during a page fault.  It's not practical to annotate them all with
>memcg OOM toggles, so let's just catch all OOM contexts at the end of
>handle_mm_fault() and clear them if !VM_FAULT_OOM instead of treating
>this like an error.
>
>azur, here is a patch on top of your modified 3.2.  Note that Michal
>might be onto something and we are looking at multiple issues here,
>but the log excerpt above suggests this fix is required either way.




Johannes, is this still up to date? Thank you.

azur






>---
>From: Johannes Weiner <hannes@cmpxchg.org>
>Subject: [patch] mm: memcg: handle non-error OOM situations more gracefully
>
>Many places that can trigger a memcg OOM situation return gracefully
>and don't propagate VM_FAULT_OOM up the fault stack.
>
>It's not practical to annotate all of them to disable the memcg OOM
>killer.  Instead, just clean up any set OOM state without warning in
>case the fault is not returning VM_FAULT_OOM.
>
>Also fail charges immediately when the current task already is in an
>OOM context.  Otherwise, the previous context gets overwritten and the
>memcg reference is leaked.
>
>Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>---
> include/linux/memcontrol.h | 40 ++++++----------------------------------
> include/linux/sched.h      |  3 ---
> mm/filemap.c               | 11 +----------
> mm/memcontrol.c            | 15 ++++++++-------
> mm/memory.c                |  8 ++------
> mm/oom_kill.c              |  2 +-
> 6 files changed, 18 insertions(+), 61 deletions(-)
>
>diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>index b113c0f..7c43903 100644
>--- a/include/linux/memcontrol.h
>+++ b/include/linux/memcontrol.h
>@@ -120,39 +120,16 @@ mem_cgroup_get_reclaim_stat_from_page(struct page *page);
> extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
> 					struct task_struct *p);
> 
>-/**
>- * mem_cgroup_toggle_oom - toggle the memcg OOM killer for the current task
>- * @new: true to enable, false to disable
>- *
>- * Toggle whether a failed memcg charge should invoke the OOM killer
>- * or just return -ENOMEM.  Returns the previous toggle state.
>- *
>- * NOTE: Any path that enables the OOM killer before charging must
>- *       call mem_cgroup_oom_synchronize() afterward to finalize the
>- *       OOM handling and clean up.
>- */
>-static inline bool mem_cgroup_toggle_oom(bool new)
>-{
>-	bool old;
>-
>-	old = current->memcg_oom.may_oom;
>-	current->memcg_oom.may_oom = new;
>-
>-	return old;
>-}
>-
> static inline void mem_cgroup_enable_oom(void)
> {
>-	bool old = mem_cgroup_toggle_oom(true);
>-
>-	WARN_ON(old == true);
>+	WARN_ON(current->memcg_oom.may_oom);
>+	current->memcg_oom.may_oom = true;
> }
> 
> static inline void mem_cgroup_disable_oom(void)
> {
>-	bool old = mem_cgroup_toggle_oom(false);
>-
>-	WARN_ON(old == false);
>+	WARN_ON(!current->memcg_oom.may_oom);
>+	current->memcg_oom.may_oom = false;
> }
> 
> static inline bool task_in_memcg_oom(struct task_struct *p)
>@@ -160,7 +137,7 @@ static inline bool task_in_memcg_oom(struct task_struct *p)
> 	return p->memcg_oom.in_memcg_oom;
> }
> 
>-bool mem_cgroup_oom_synchronize(void);
>+bool mem_cgroup_oom_synchronize(bool wait);
> 
> #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> extern int do_swap_account;
>@@ -375,11 +352,6 @@ mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
> {
> }
> 
>-static inline bool mem_cgroup_toggle_oom(bool new)
>-{
>-	return false;
>-}
>-
> static inline void mem_cgroup_enable_oom(void)
> {
> }
>@@ -393,7 +365,7 @@ static inline bool task_in_memcg_oom(struct task_struct *p)
> 	return false;
> }
> 
>-static inline bool mem_cgroup_oom_synchronize(void)
>+static inline bool mem_cgroup_oom_synchronize(bool wait)
> {
> 	return false;
> }
>diff --git a/include/linux/sched.h b/include/linux/sched.h
>index 3f2562c..70a62fd 100644
>--- a/include/linux/sched.h
>+++ b/include/linux/sched.h
>@@ -91,7 +91,6 @@ struct sched_param {
> #include <linux/latencytop.h>
> #include <linux/cred.h>
> #include <linux/llist.h>
>-#include <linux/stacktrace.h>
> 
> #include <asm/processor.h>
> 
>@@ -1573,8 +1572,6 @@ struct task_struct {
> 		unsigned int may_oom:1;
> 		unsigned int in_memcg_oom:1;
> 		unsigned int oom_locked:1;
>-		struct stack_trace trace;
>-		unsigned long trace_entries[16];
> 		int wakeups;
> 		struct mem_cgroup *wait_on_memcg;
> 	} memcg_oom;
>diff --git a/mm/filemap.c b/mm/filemap.c
>index 030774a..5f0a3c9 100644
>--- a/mm/filemap.c
>+++ b/mm/filemap.c
>@@ -1661,7 +1661,6 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> 	struct inode *inode = mapping->host;
> 	pgoff_t offset = vmf->pgoff;
> 	struct page *page;
>-	bool memcg_oom;
> 	pgoff_t size;
> 	int ret = 0;
> 
>@@ -1670,11 +1669,7 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> 		return VM_FAULT_SIGBUS;
> 
> 	/*
>-	 * Do we have something in the page cache already?  Either
>-	 * way, try readahead, but disable the memcg OOM killer for it
>-	 * as readahead is optional and no errors are propagated up
>-	 * the fault stack.  The OOM killer is enabled while trying to
>-	 * instantiate the faulting page individually below.
>+	 * Do we have something in the page cache already?
> 	 */
> 	page = find_get_page(mapping, offset);
> 	if (likely(page)) {
>@@ -1682,14 +1677,10 @@ int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> 		 * We found the page, so try async readahead before
> 		 * waiting for the lock.
> 		 */
>-		memcg_oom = mem_cgroup_toggle_oom(false);
> 		do_async_mmap_readahead(vma, ra, file, page, offset);
>-		mem_cgroup_toggle_oom(memcg_oom);
> 	} else {
> 		/* No page in the page cache at all */
>-		memcg_oom = mem_cgroup_toggle_oom(false);
> 		do_sync_mmap_readahead(vma, ra, file, offset);
>-		mem_cgroup_toggle_oom(memcg_oom);
> 		count_vm_event(PGMAJFAULT);
> 		mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
> 		ret = VM_FAULT_MAJOR;
>diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>index 83acd11..ebd07f3 100644
>--- a/mm/memcontrol.c
>+++ b/mm/memcontrol.c
>@@ -1874,12 +1874,6 @@ static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask)
> 
> 	current->memcg_oom.in_memcg_oom = 1;
> 
>-	current->memcg_oom.trace.nr_entries = 0;
>-	current->memcg_oom.trace.max_entries = 16;
>-	current->memcg_oom.trace.entries = current->memcg_oom.trace_entries;
>-	current->memcg_oom.trace.skip = 1;
>-	save_stack_trace(&current->memcg_oom.trace);
>-
> 	/*
> 	 * As with any blocking lock, a contender needs to start
> 	 * listening for wakeups before attempting the trylock,
>@@ -1935,6 +1929,7 @@ static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask)
> 
> /**
>  * mem_cgroup_oom_synchronize - complete memcg OOM handling
>+ * @wait: wait for OOM handler or just clear the OOM state
>  *
>  * This has to be called at the end of a page fault if the the memcg
>  * OOM handler was enabled and the fault is returning %VM_FAULT_OOM.
>@@ -1950,7 +1945,7 @@ static void mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask)
>  * Returns %true if an ongoing memcg OOM situation was detected and
>  * finalized, %false otherwise.
>  */
>-bool mem_cgroup_oom_synchronize(void)
>+bool mem_cgroup_oom_synchronize(bool wait)
> {
> 	struct oom_wait_info owait;
> 	struct mem_cgroup *memcg;
>@@ -1969,6 +1964,9 @@ bool mem_cgroup_oom_synchronize(void)
> 	if (!memcg)
> 		goto out;
> 
>+	if (!wait)
>+		goto out_memcg;
>+
> 	if (test_thread_flag(TIF_MEMDIE) || fatal_signal_pending(current))
> 		goto out_memcg;
> 
>@@ -2369,6 +2367,9 @@ static int __mem_cgroup_try_charge(struct mm_struct *mm,
> 	struct mem_cgroup *memcg = NULL;
> 	int ret;
> 
>+	if (unlikely(current->memcg_oom.in_memcg_oom))
>+		goto nomem;
>+
> 	/*
> 	 * Unlike gloval-vm's OOM-kill, we're not in memory shortage
> 	 * in system level. So, allow to go ahead dying process in addition to
>diff --git a/mm/memory.c b/mm/memory.c
>index cdbe41b..cdad471 100644
>--- a/mm/memory.c
>+++ b/mm/memory.c
>@@ -57,7 +57,6 @@
> #include <linux/swapops.h>
> #include <linux/elf.h>
> #include <linux/gfp.h>
>-#include <linux/stacktrace.h>
> 
> #include <asm/io.h>
> #include <asm/pgalloc.h>
>@@ -3521,11 +3520,8 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> 	if (flags & FAULT_FLAG_USER)
> 		mem_cgroup_disable_oom();
> 
>-	if (WARN_ON(task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM))) {
>-		printk("Fixing unhandled memcg OOM context set up from:\n");
>-		print_stack_trace(&current->memcg_oom.trace, 0);
>-		mem_cgroup_oom_synchronize();
>-	}
>+	if (task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM))
>+		mem_cgroup_oom_synchronize(false);
> 
> 	return ret;
> }
>diff --git a/mm/oom_kill.c b/mm/oom_kill.c
>index aa60863..3bf664c 100644
>--- a/mm/oom_kill.c
>+++ b/mm/oom_kill.c
>@@ -785,7 +785,7 @@ out:
>  */
> void pagefault_out_of_memory(void)
> {
>-	if (mem_cgroup_oom_synchronize())
>+	if (mem_cgroup_oom_synchronize(true))
> 		return;
> 	if (try_set_system_oom()) {
> 		out_of_memory(NULL, 0, 0, NULL);
>-- 
>1.8.4
>
>

Thread overview: 227+ messages
2013-08-03 16:59 [patch 0/7] improve memcg oom killer robustness v2 Johannes Weiner
2013-08-03 16:59 ` [patch 1/7] arch: mm: remove obsolete init OOM protection Johannes Weiner
2013-08-06  6:34   ` Vineet Gupta
2013-08-03 16:59 ` [patch 2/7] arch: mm: do not invoke OOM killer on kernel fault OOM Johannes Weiner
2013-08-03 16:59 ` [patch 3/7] arch: mm: pass userspace fault flag to generic fault handler Johannes Weiner
2013-08-05 22:06   ` Andrew Morton
2013-08-05 22:25     ` Johannes Weiner
2013-08-03 16:59 ` [patch 4/7] x86: finish user fault error path with fatal signal Johannes Weiner
2013-08-03 16:59 ` [patch 5/7] mm: memcg: enable memcg OOM killer only for user faults Johannes Weiner
2013-08-05  9:18   ` Michal Hocko
2013-08-03 16:59 ` [patch 6/7] mm: memcg: rework and document OOM waiting and wakeup Johannes Weiner
2013-08-03 17:00 ` [patch 7/7] mm: memcg: do not trap chargers with full callstack on OOM Johannes Weiner
2013-08-05  9:54   ` Michal Hocko
2013-08-05 20:56     ` Johannes Weiner
2013-08-03 17:08 ` [patch 0/7] improve memcg oom killer robustness v2 Johannes Weiner
2013-08-09  9:06   ` azurIt
2013-08-30 19:58   ` azurIt
2013-09-02 10:38     ` azurIt
2013-09-03 20:48       ` Johannes Weiner
2013-09-04  7:53         ` azurIt
2013-09-04  8:18         ` azurIt
2013-09-05 11:54           ` Johannes Weiner
2013-09-05 12:43             ` Michal Hocko
2013-09-05 16:18               ` Johannes Weiner
2013-09-09 12:36                 ` Michal Hocko
2013-09-09 12:56                   ` Michal Hocko
2013-09-12 12:59                     ` Johannes Weiner
2013-09-16 14:03                       ` Michal Hocko
2013-09-05 13:24             ` Michal Hocko
2013-09-09 13:10             ` azurIt [this message]
2013-09-09 17:28               ` Johannes Weiner
2013-09-09 19:59                 ` azurIt
2013-09-09 20:12                   ` Johannes Weiner
2013-09-09 20:18                     ` azurIt
2013-09-09 21:08                     ` azurIt
2013-09-10 18:13                     ` azurIt
2013-09-10 18:37                       ` Johannes Weiner
2013-09-10 19:32                         ` azurIt
2013-09-10 20:12                           ` Johannes Weiner
2013-09-10 21:08                             ` azurIt
2013-09-10 21:18                               ` Johannes Weiner
2013-09-10 21:32                                 ` azurIt
2013-09-10 22:03                                   ` Johannes Weiner
2013-09-11 12:33                                     ` azurIt
2013-09-11 18:03                                       ` Johannes Weiner
2013-09-11 18:54                                         ` azurIt
2013-09-11 19:11                                           ` Johannes Weiner
2013-09-11 19:41                                             ` azurIt
2013-09-11 20:04                                               ` Johannes Weiner
2013-09-14 10:48                                                 ` azurIt
2013-09-16 13:40                                                   ` Michal Hocko
2013-09-16 14:01                                                     ` azurIt
2013-09-16 14:06                                                       ` Michal Hocko
2013-09-16 14:13                                                         ` azurIt
2013-09-16 14:57                                                           ` Michal Hocko
2013-09-16 15:05                                                             ` azurIt
2013-09-16 15:17                                                               ` Johannes Weiner
2013-09-16 15:24                                                                 ` azurIt
2013-09-16 15:25                                                               ` Michal Hocko
2013-09-16 15:40                                                                 ` azurIt
2013-09-16 20:52                                                                 ` azurIt
2013-09-17  0:02                                                                   ` Johannes Weiner
2013-09-17 11:15                                                                     ` azurIt
2013-09-17 14:10                                                                       ` Michal Hocko
2013-09-18 14:03                                                                         ` azurIt
2013-09-18 14:24                                                                           ` Michal Hocko
2013-09-18 14:33                                                                             ` azurIt
2013-09-18 14:42                                                                               ` Michal Hocko
2013-09-18 18:02                                                                                 ` azurIt
2013-09-18 18:36                                                                                   ` Michal Hocko
2013-09-18 18:04                                                                           ` Johannes Weiner
2013-09-18 18:19                                                                             ` Johannes Weiner
2013-09-18 19:55                                                                               ` Johannes Weiner
2013-09-18 20:52                                                                                 ` azurIt
2013-09-25  7:26                                                                                 ` azurIt
2013-09-26 16:54                                                                                 ` azurIt
2013-09-26 19:27                                                                                   ` Johannes Weiner
2013-09-27  2:04                                                                                     ` azurIt
2013-10-07 11:01                                                                                     ` azurIt
2013-10-07 19:23                                                                                       ` Johannes Weiner
2013-10-09 18:44                                                                                         ` azurIt
2013-10-10  0:14                                                                                           ` Johannes Weiner
2013-10-10 22:59                                                                                             ` azurIt
2013-09-17 11:20                                                                     ` azurIt
2013-09-16 10:22                                                 ` azurIt
2013-09-04  9:45         ` azurIt
2013-09-04 11:57           ` Michal Hocko
2013-09-04 12:10             ` azurIt
2013-09-04 12:26               ` Michal Hocko
2013-09-04 12:39                 ` azurIt
2013-09-05  9:14                 ` azurIt
2013-09-05  9:53                   ` Michal Hocko
2013-09-05 10:17                     ` azurIt
2013-09-05 11:17                       ` Michal Hocko
2013-09-05 11:47                         ` azurIt
2013-09-05 12:03                           ` Michal Hocko
2013-09-05 12:33                             ` azurIt
2013-09-05 12:45                               ` Michal Hocko
2013-09-05 13:00                                 ` azurIt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the message as an mbox file, import it into your mail client,
  and reply-to-all from there.

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130909151010.3A3CBC6A@pobox.sk \
    --to=azurit@pobox.sk \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=rientjes@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, you can reply that way as well.

Be sure your reply has a Subject: header at the top and a blank
line before the message body.