From: "azurIt" <azurit@pobox.sk>
To: "Johannes Weiner" <hannes@cmpxchg.org>
Cc: "Michal Hocko" <mhocko@suse.cz>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"David Rientjes" <rientjes@google.com>,
	"KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>,
	"KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 0/7] improve memcg oom killer robustness v2
Date: Thu, 26 Sep 2013 18:54:59 +0200
Message-ID: <20130926185459.E5D2987F@pobox.sk>
In-Reply-To: <20130918195504.GF856@cmpxchg.org>

>On Wed, Sep 18, 2013 at 02:19:46PM -0400, Johannes Weiner wrote:
>> On Wed, Sep 18, 2013 at 02:04:55PM -0400, Johannes Weiner wrote:
>> > On Wed, Sep 18, 2013 at 04:03:04PM +0200, azurIt wrote:
>> > > >On Tue 17-09-13 13:15:35, azurIt wrote:
>> > > >[...]
>> > > >> Is something unusual on this stack?
>> > > >> 
>> > > >> 
>> > > >> [<ffffffff810d1a5e>] dump_header+0x7e/0x1e0
>> > > >> [<ffffffff810d195f>] ? find_lock_task_mm+0x2f/0x70
>> > > >> [<ffffffff810d1f25>] oom_kill_process+0x85/0x2a0
>> > > >> [<ffffffff810d24a8>] mem_cgroup_out_of_memory+0xa8/0xf0
>> > > >> [<ffffffff8110fb76>] mem_cgroup_oom_synchronize+0x2e6/0x310
>> > > >> [<ffffffff8110efc0>] ? mem_cgroup_uncharge_page+0x40/0x40
>> > > >> [<ffffffff810d2703>] pagefault_out_of_memory+0x13/0x130
>> > > >> [<ffffffff81026f6e>] mm_fault_error+0x9e/0x150
>> > > >> [<ffffffff81027424>] do_page_fault+0x404/0x490
>> > > >> [<ffffffff810f952c>] ? do_mmap_pgoff+0x3dc/0x430
>> > > >> [<ffffffff815cb87f>] page_fault+0x1f/0x30
>> > > >
>> > > >This is a regular memcg OOM killer, which dumps messages about what it is
>> > > >going to do. So no, nothing unusual, unless it stayed like that forever,
>> > > >which would mean that oom_kill_process is in an endless loop. But a
>> > > >single stack doesn't tell us much.
>> > > >
>> > > >Just a note. When you see something hogging a CPU and you are not sure
>> > > >whether it might be in an endless loop inside the kernel, it makes sense
>> > > >to take several snapshots of the stack trace and see if it changes. If
>> > > >not, and the process is not sleeping (there is no schedule on the trace),
>> > > >then it might be looping somewhere waiting for Godot. If it is sleeping
>> > > >then it is slightly harder because you would have to identify what it is
>> > > >waiting for, which requires knowing the deeper context.
>> > > >-- 
>> > > >Michal Hocko
>> > > >SUSE Labs
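
The snapshot comparison Michal describes can be sketched roughly as follows. This is only an illustration, assuming a kernel that exposes /proc/<pid>/stack (readable by root); the helper names read_stack, symbols, classify and sample are hypothetical and not taken from any tool mentioned in this thread. A "changing" result does not rule out a loop, it only means the frames differ between samples.

```python
import time
from pathlib import Path


def read_stack(pid):
    """Return the kernel stack of a task as a list of lines (usually needs root)."""
    return Path(f"/proc/{pid}/stack").read_text().splitlines()


def symbols(stack_lines):
    """Reduce '[<addr>] func+0x1/0x2' lines to bare function names."""
    names = []
    for line in stack_lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        # The last field is the symbol; a '?' marker, if present, sits before it.
        names.append(parts[-1].split("+")[0])
    return names


def classify(snapshots):
    """Apply the heuristic to several symbol lists taken from the same task."""
    if len(snapshots) < 2:
        return "inconclusive"  # a single stack doesn't tell us much
    if any(snap != snapshots[0] for snap in snapshots[1:]):
        return "changing"
    # Identical traces: a schedule frame suggests the task is sleeping.
    if any("schedule" in name for name in snapshots[0]):
        return "sleeping"
    return "possibly looping"


def sample(pid, count=5, interval=1.0):
    """Take `count` snapshots `interval` seconds apart and classify them."""
    snaps = []
    for _ in range(count):
        snaps.append(symbols(read_stack(pid)))
        time.sleep(interval)
    return classify(snaps)
```

Calling sample(pid) as root on the suspicious task returns one of the four labels; for a sleeping task the next step is still to work out what it is waiting for.
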
>> > > 
>> > > 
>> > > 
>> > > I was finally able to get a stack of the problematic process :) I saved it twice from the same process, as Michal suggested (I wasn't able to take more). Here it is:
>> > > 
>> > > First (doesn't look very helpful):
>> > > [<ffffffffffffffff>] 0xffffffffffffffff
>> > > 
>> > > 
>> > > Second:
>> > > [<ffffffff810e17d1>] shrink_zone+0x481/0x650
>> > > [<ffffffff810e2ade>] do_try_to_free_pages+0xde/0x550
>> > > [<ffffffff810e310b>] try_to_free_pages+0x9b/0x120
>> > > [<ffffffff81148ccd>] free_more_memory+0x5d/0x60
>> > > [<ffffffff8114931d>] __getblk+0x14d/0x2c0
>> > > [<ffffffff8114c973>] __bread+0x13/0xc0
>> > > [<ffffffff811968a8>] ext3_get_branch+0x98/0x140
>> > > [<ffffffff81197497>] ext3_get_blocks_handle+0xd7/0xdc0
>> > > [<ffffffff81198244>] ext3_get_block+0xc4/0x120
>> > > [<ffffffff81155b8a>] do_mpage_readpage+0x38a/0x690
>> > > [<ffffffff81155ffb>] mpage_readpages+0xfb/0x160
>> > > [<ffffffff811972bd>] ext3_readpages+0x1d/0x20
>> > > [<ffffffff810d9345>] __do_page_cache_readahead+0x1c5/0x270
>> > > [<ffffffff810d9411>] ra_submit+0x21/0x30
>> > > [<ffffffff810cfb90>] filemap_fault+0x380/0x4f0
>> > > [<ffffffff810ef908>] __do_fault+0x78/0x5a0
>> > > [<ffffffff810f2b24>] handle_pte_fault+0x84/0x940
>> > > [<ffffffff810f354a>] handle_mm_fault+0x16a/0x320
>> > > [<ffffffff8102715b>] do_page_fault+0x13b/0x490
>> > > [<ffffffff815cb87f>] page_fault+0x1f/0x30
>> > > [<ffffffffffffffff>] 0xffffffffffffffff
>> > 
>> > Ah, crap.  I'm sorry.  You even showed us this exact trace before in
>> > another context, but I did not fully realize what __getblk() is doing.
>> > 
>> > My subsequent patches made a charge attempt return -ENOMEM without
>> > reclaim if the memcg is under OOM.  And so the reason you have these
>> > reclaim livelocks is because __getblk never fails on -ENOMEM.  When
>> > the allocation returns -ENOMEM, it invokes GLOBAL DIRECT RECLAIM and
>> > tries again in an endless loop.  The memcg code would previously just
>> > loop inside the charge, reclaiming and killing, until the allocation
>> > succeeded.  But the new code relies on the fault stack being unwound
>> > to complete the OOM kill.  And since the stack is not unwound with
>> > __getblk() looping around the allocation there is no more memcg
>> > reclaim AND no memcg OOM kill, thus no chance of exiting.
>> > 
>> > That code is weird but really old, so it may take a while to evaluate
>> > all the callers as to whether this can be changed.
>> > 
>> > In the meantime, I would just allow __getblk to bypass the memcg limit
>> > when it still can't charge after reclaim.  Does the below get your
>> > machine back on track?
>> 
>> Scratch that.  The idea is reasonable but the implementation is not
>> fully cooked yet.  I'll send you an update.
>
>Here is an update.  Full replacement on top of 3.2 since we tried a
>dead end and it would be more painful to revert individual changes.
>
>The first bug you had was the same task entering OOM repeatedly and
>leaking the memcg reference, thus creating undeletable memcgs.  My
>fixup added a condition that if the task already set up an OOM context
>in that fault, another charge attempt would immediately return -ENOMEM
>without even trying reclaim anymore.  This dropped __getblk() into an
>endless loop of waking the flushers and performing global reclaim and
>memcg returning -ENOMEM regardless of free memory.
>
>The update now basically only changes this -ENOMEM to bypass, so that
>the memory is not accounted and the limit ignored.  OOM killed tasks
>are granted the same right, so that they can exit quickly and release
>memory.  Likewise, we want a task that hit the OOM condition also to
>finish the fault quickly so that it can invoke the OOM killer.
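
To make the difference between the two behaviours concrete, here is a toy model. It is not the patch discussed in this thread and not kernel code; Memcg, charge, getblk_like and the "enomem"/"bypass" policy names are all hypothetical, and the retry cap exists only so the simulation terminates. It assumes, as described above, that the caller retries on every failure and that global reclaim does not lower the memcg's own usage.

```python
import errno


class Memcg:
    """Toy memory cgroup: just a limit and a usage counter, in pages."""

    def __init__(self, limit, usage):
        self.limit = limit
        self.usage = usage


def charge(memcg, pages, in_oom_context, policy):
    """Return 0 on success or -ENOMEM.

    policy "enomem": a task that already set up an OOM context fails at once.
    policy "bypass": the same task succeeds, but the pages are not accounted.
    """
    if memcg.usage + pages <= memcg.limit:
        memcg.usage += pages
        return 0
    if in_oom_context and policy == "bypass":
        return 0  # limit ignored, usage left untouched
    return -errno.ENOMEM


def getblk_like(memcg, policy, max_retries=1000):
    """Caller that never gives up on -ENOMEM.

    Returns the number of failed attempts before success, or None when the
    retry cap is hit (the real loop would keep spinning).
    """
    for attempt in range(max_retries):
        if charge(memcg, 1, True, policy) == 0:
            return attempt
        # A real caller would wake the flushers and run global reclaim here;
        # in this model that frees nothing inside the memcg.
    return None
```

With a memcg already at its limit, getblk_like(Memcg(10, 10), "enomem") exhausts the cap, while the "bypass" policy lets the first attempt through so the fault can finish and the OOM killer can be invoked.
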
>
>Does the following work for you, azur?


Johannes,

bad news everyone! :(

Unfortunately, two different problems appeared today:

1.) This looks like my very original problem - stuck processes inside one cgroup. I took stacks from all of them over time, but the server was very slow so I had to kill them soon:
http://watchdog.sk/lkmlmemcg-bug-9.tar.gz

2.) This was just like my last problem, where a few processes were doing huge I/O. As the server was almost inoperable, I barely managed to kill them, so no more info here, sorry.

azur


2013-09-18 18:04                                                                           ` Johannes Weiner
2013-09-18 18:04                                                                             ` Johannes Weiner
2013-09-18 18:19                                                                             ` Johannes Weiner
2013-09-18 18:19                                                                               ` Johannes Weiner
2013-09-18 19:55                                                                               ` Johannes Weiner
2013-09-18 19:55                                                                                 ` Johannes Weiner
2013-09-18 19:55                                                                                 ` Johannes Weiner
2013-09-18 20:52                                                                                 ` azurIt
2013-09-18 20:52                                                                                   ` azurIt
2013-09-18 20:52                                                                                   ` azurIt
2013-09-25  7:26                                                                                 ` azurIt
2013-09-25  7:26                                                                                   ` azurIt
2013-09-25  7:26                                                                                   ` azurIt
2013-09-26 16:54                                                                                 ` azurIt [this message]
2013-09-26 16:54                                                                                   ` azurIt
2013-09-26 16:54                                                                                   ` azurIt
2013-09-26 19:27                                                                                   ` Johannes Weiner
2013-09-26 19:27                                                                                     ` Johannes Weiner
2013-09-27  2:04                                                                                     ` azurIt
2013-09-27  2:04                                                                                       ` azurIt
2013-09-27  2:04                                                                                       ` azurIt
2013-09-27  2:04                                                                                       ` azurIt
2013-10-07 11:01                                                                                     ` azurIt
2013-10-07 11:01                                                                                       ` azurIt
2013-10-07 11:01                                                                                       ` azurIt
2013-10-07 11:01                                                                                       ` azurIt
2013-10-07 19:23                                                                                       ` Johannes Weiner
2013-10-07 19:23                                                                                         ` Johannes Weiner
2013-10-09 18:44                                                                                         ` azurIt
2013-10-09 18:44                                                                                           ` azurIt
2013-10-09 18:44                                                                                           ` azurIt
2013-10-10  0:14                                                                                           ` Johannes Weiner
2013-10-10  0:14                                                                                             ` Johannes Weiner
2013-10-10  0:14                                                                                             ` Johannes Weiner
2013-10-10 22:59                                                                                             ` azurIt
2013-10-10 22:59                                                                                               ` azurIt
2013-10-10 22:59                                                                                               ` azurIt
2013-09-17 11:20                                                                     ` azurIt
2013-09-17 11:20                                                                       ` azurIt
2013-09-16 10:22                                                 ` azurIt
2013-09-16 10:22                                                   ` azurIt
2013-09-04  9:45         ` azurIt
2013-09-04  9:45           ` azurIt
2013-09-04 11:57           ` Michal Hocko
2013-09-04 11:57             ` Michal Hocko
2013-09-04 12:10             ` azurIt
2013-09-04 12:10               ` azurIt
2013-09-04 12:10               ` azurIt
2013-09-04 12:26               ` Michal Hocko
2013-09-04 12:26                 ` Michal Hocko
2013-09-04 12:26                 ` Michal Hocko
2013-09-04 12:39                 ` azurIt
2013-09-04 12:39                   ` azurIt
2013-09-05  9:14                 ` azurIt
2013-09-05  9:14                   ` azurIt
2013-09-05  9:53                   ` Michal Hocko
2013-09-05  9:53                     ` Michal Hocko
2013-09-05 10:17                     ` azurIt
2013-09-05 10:17                       ` azurIt
2013-09-05 11:17                       ` Michal Hocko
2013-09-05 11:17                         ` Michal Hocko
2013-09-05 11:17                         ` Michal Hocko
2013-09-05 11:47                         ` azurIt
2013-09-05 11:47                           ` azurIt
2013-09-05 12:03                           ` Michal Hocko
2013-09-05 12:03                             ` Michal Hocko
2013-09-05 12:33                             ` azurIt
2013-09-05 12:33                               ` azurIt
2013-09-05 12:33                               ` azurIt
2013-09-05 12:45                               ` Michal Hocko
2013-09-05 12:45                                 ` Michal Hocko
2013-09-05 13:00                                 ` azurIt
2013-09-05 13:00                                   ` azurIt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130926185459.E5D2987F@pobox.sk \
    --to=azurit@pobox.sk \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=rientjes@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.