All of lore.kernel.org
 help / color / mirror / Atom feed
From: "azurIt" <azurit@pobox.sk>
To: "Johannes Weiner" <hannes@cmpxchg.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Michal Hocko" <mhocko@suse.cz>,
	"David Rientjes" <rientjes@google.com>,
	"KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>,
	"KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 0/7] improve memcg oom killer robustness v2
Date: Tue, 10 Sep 2013 23:08:53 +0200	[thread overview]
Message-ID: <20130910230853.FEEC19B5@pobox.sk> (raw)
In-Reply-To: <20130910201222.GA25972@cmpxchg.org>

>On Tue, Sep 10, 2013 at 09:32:53PM +0200, azurIt wrote:
>> >On Tue, Sep 10, 2013 at 08:13:59PM +0200, azurIt wrote:
>> >> >On Mon, Sep 09, 2013 at 09:59:17PM +0200, azurIt wrote:
>> >> >> >On Mon, Sep 09, 2013 at 03:10:10PM +0200, azurIt wrote:
>> >> >> >> >Hi azur,
>> >> >> >> >
>> >> >> >> >On Wed, Sep 04, 2013 at 10:18:52AM +0200, azurIt wrote:
>> >> >> >> >> > CC: "Andrew Morton" <akpm@linux-foundation.org>, "Michal Hocko" <mhocko@suse.cz>, "David Rientjes" <rientjes@google.com>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>, "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>, linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
>> >> >> >> >> >Hello azur,
>> >> >> >> >> >
>> >> >> >> >> >On Mon, Sep 02, 2013 at 12:38:02PM +0200, azurIt wrote:
>> >> >> >> >> >> >>Hi azur,
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>here is the x86-only rollup of the series for 3.2.
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>Thanks!
>> >> >> >> >> >> >>Johannes
>> >> >> >> >> >> >>---
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> >Johannes,
>> >> >> >> >> >> >
>> >> >> >> >> >> >unfortunately, one problem arises: I have (again) cgroup which cannot be deleted :( it's a user who had very high memory usage and was reaching his limit very often. Do you need any info which i can gather now?
>> >> >> >> >> >
>> >> >> >> >> >Did the OOM killer go off in this group?
>> >> >> >> >> >
>> >> >> >> >> >Was there a warning in the syslog ("Fixing unhandled memcg OOM
>> >> >> >> >> >context")?
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> Ok, i see this message several times in my syslog logs, one of them is also for this unremovable cgroup (but maybe all of them cannot be removed, should i try?). Example of the log is here (don't know where exactly it starts and ends so here is the full kernel log):
>> >> >> >> >> http://watchdog.sk/lkml/oom_syslog.gz
>> >> >> >> >There is an unfinished OOM invocation here:
>> >> >> >> >
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715112] Fixing unhandled memcg OOM context set up from:
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715191]  [<ffffffff811105c2>] T.1154+0x622/0x8f0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715274]  [<ffffffff8111153e>] mem_cgroup_cache_charge+0xbe/0xe0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715357]  [<ffffffff810cf31c>] add_to_page_cache_locked+0x4c/0x140
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715443]  [<ffffffff810cf432>] add_to_page_cache_lru+0x22/0x50
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715526]  [<ffffffff810cfdd3>] find_or_create_page+0x73/0xb0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715608]  [<ffffffff811493ba>] __getblk+0xea/0x2c0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715692]  [<ffffffff8114ca73>] __bread+0x13/0xc0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715774]  [<ffffffff81196968>] ext3_get_branch+0x98/0x140
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715859]  [<ffffffff81197557>] ext3_get_blocks_handle+0xd7/0xdc0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715942]  [<ffffffff81198304>] ext3_get_block+0xc4/0x120
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716023]  [<ffffffff81155c3a>] do_mpage_readpage+0x38a/0x690
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716107]  [<ffffffff81155f8f>] mpage_readpage+0x4f/0x70
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716188]  [<ffffffff811973a8>] ext3_readpage+0x28/0x60
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716268]  [<ffffffff810cfa48>] filemap_fault+0x308/0x560
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716350]  [<ffffffff810ef898>] __do_fault+0x78/0x5a0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716433]  [<ffffffff810f2ab4>] handle_pte_fault+0x84/0x940
>> >> >> >> >
>> >> >> >> >__getblk() has this weird loop where it tries to instantiate the page,
>> >> >> >> >frees memory on failure, then retries.  If the memcg goes OOM, the OOM
>> >> >> >> >path might be entered multiple times and each time leak the memcg
>> >> >> >> >reference of the respective previous OOM invocation.
>> >> >> >> >
>> >> >> >> >There are a few more find_or_create() sites that do not propagate an
>> >> >> >> >error and it's incredibly hard to find out whether they are even taken
>> >> >> >> >during a page fault.  It's not practical to annotate them all with
>> >> >> >> >memcg OOM toggles, so let's just catch all OOM contexts at the end of
>> >> >> >> >handle_mm_fault() and clear them if !VM_FAULT_OOM instead of treating
>> >> >> >> >this like an error.
>> >> >> >> >
>> >> >> >> >azur, here is a patch on top of your modified 3.2.  Note that Michal
>> >> >> >> >might be onto something and we are looking at multiple issues here,
>> >> >> >> >but the log excert above suggests this fix is required either way.
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> Johannes, is this still up to date? Thank you.
>> >> >> >
>> >> >> >No, please use the following on top of 3.2 (i.e. full replacement, not
>> >> >> >incremental to what you have):
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >> Unfortunately it didn't compile:
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >>   LD      vmlinux.o
>> >> >>   MODPOST vmlinux.o
>> >> >> WARNING: modpost: Found 4924 section mismatch(es).
>> >> >> To see full details build your kernel with:
>> >> >> 'make CONFIG_DEBUG_SECTION_MISMATCH=y'
>> >> >>   GEN     .version
>> >> >>   CHK     include/generated/compile.h
>> >> >>   UPD     include/generated/compile.h
>> >> >>   CC      init/version.o
>> >> >>   LD      init/built-in.o
>> >> >>   LD      .tmp_vmlinux1
>> >> >> arch/x86/built-in.o: In function `do_page_fault':
>> >> >> (.text+0x26a77): undefined reference to `handle_mm_fault'
>> >> >> mm/built-in.o: In function `fixup_user_fault':
>> >> >> (.text+0x224d3): undefined reference to `handle_mm_fault'
>> >> >> mm/built-in.o: In function `__get_user_pages':
>> >> >> (.text+0x24a0f): undefined reference to `handle_mm_fault'
>> >> >> make: *** [.tmp_vmlinux1] Error 1
>> >> >
>> >> >Oops, sorry about that.  Must be configuration dependent because it
>> >> >works for me (and handle_mm_fault is obviously defined).
>> >> >
>> >> >Do you have warnings earlier in the compilation?  You can use make -s
>> >> >to filter out everything but warnings.
>> >> >
>> >> >Or send me your configuration so I can try to reproduce it here.
>> >> >
>> >> >Thanks!
>> >> 
>> >> 
>> >> Johannes,
>> >> 
>> >> the server went down early in the morning, the symptoms were similar as before - huge I/O. Can't tell what exactly happened since I wasn't able to login even on the console. But I have some info:
>> >>  - applications were able to write to HDD so it wasn't deadlocked as before
>> >>  - here is how it looked on graphs: http://watchdog.sk/lkml/graphs.jpg
>> >>  - server wasn't responding from 6:36, it was down between 6:54 and 7:02 (i had to hard reboot it), I was awoken at 6:36 by really creepy sound from my phone ;)
>> >>  - my 'load check' script successfully killed apache at 6:41 but it didn't help as you can see
>> >>  - i have one screen with info from atop from time 6:44, looks like i/o was done by init (??!): http://watchdog.sk/lkml/atop.jpg (ignore swap warning, i have no swap)
>> >>  - also other type of logs are available
>> >>  - nothing like this happened before
>> >
>> >That IO from init looks really screwy, I have no idea what's going on
>> >on that machine, but it looks like there is more than just a memcg
>> >problem...  Any chance your thirdparty security patches are concealing
>> >kernel daemon activity behind the init process and the IO is actually
>> >coming from a kernel thread like the flushers or kswapd?
>> 
>> 
>> 
>> 
>> I really cannot tell but I never ever saw this before and i'm using all of my patches for several years. Here are all patches which i'm using right now (+ your patch):
>> http://watchdog.sk/lkml/patches3
>> 
>> 
>> 
>> 
>> >Are there OOM kill messages in the syslog?
>> 
>> 
>> 
>> Here is full kernel log between 6:00 and 7:59:
>> http://watchdog.sk/lkml/kern6.log
>
>Wow, your apaches are like the hydra.  Whenever one is OOM killed,
>more show up!



Yeah, it's supposed to do this ;)



>> >> What do you think? I'm now running kernel with your previous patch, not with the newest one.
>> >
>> >Which one exactly?  Can you attach the diff?
>> 
>> 
>> 
>> I meant, the problem above occured on kernel with your latest patch:
>> http://watchdog.sk/lkml/7-2-memcg-fix.patch
>
>The above log has the following callstack:
>
>Sep 10 07:59:43 server01 kernel: [ 3846.337628]  [<ffffffff810d19fe>] dump_header+0x7e/0x1e0
>Sep 10 07:59:43 server01 kernel: [ 3846.337707]  [<ffffffff810d18ff>] ? find_lock_task_mm+0x2f/0x70
>Sep 10 07:59:43 server01 kernel: [ 3846.337790]  [<ffffffff810d18ff>] ? find_lock_task_mm+0x2f/0x70
>Sep 10 07:59:43 server01 kernel: [ 3846.337874]  [<ffffffff81094bb0>] ? __css_put+0x50/0x90
>Sep 10 07:59:43 server01 kernel: [ 3846.337952]  [<ffffffff810d1ec5>] oom_kill_process+0x85/0x2a0
>Sep 10 07:59:43 server01 kernel: [ 3846.338037]  [<ffffffff810d2448>] mem_cgroup_out_of_memory+0xa8/0xf0
>Sep 10 07:59:43 server01 kernel: [ 3846.338120]  [<ffffffff81110858>] T.1154+0x8b8/0x8f0
>Sep 10 07:59:43 server01 kernel: [ 3846.338201]  [<ffffffff81110fa6>] mem_cgroup_charge_common+0x56/0xa0
>Sep 10 07:59:43 server01 kernel: [ 3846.338283]  [<ffffffff81111035>] mem_cgroup_newpage_charge+0x45/0x50
>Sep 10 07:59:43 server01 kernel: [ 3846.338364]  [<ffffffff810f3039>] handle_pte_fault+0x609/0x940
>Sep 10 07:59:43 server01 kernel: [ 3846.338451]  [<ffffffff8102ab1f>] ? pte_alloc_one+0x3f/0x50
>Sep 10 07:59:43 server01 kernel: [ 3846.338532]  [<ffffffff8107e455>] ? sched_clock_local+0x25/0x90
>Sep 10 07:59:43 server01 kernel: [ 3846.338617]  [<ffffffff810f34d7>] handle_mm_fault+0x167/0x340
>Sep 10 07:59:43 server01 kernel: [ 3846.338699]  [<ffffffff8102714b>] do_page_fault+0x13b/0x490
>Sep 10 07:59:43 server01 kernel: [ 3846.338781]  [<ffffffff810f8848>] ? do_brk+0x208/0x3a0
>Sep 10 07:59:43 server01 kernel: [ 3846.338865]  [<ffffffff812dba22>] ? gr_learn_resource+0x42/0x1e0
>Sep 10 07:59:43 server01 kernel: [ 3846.338951]  [<ffffffff815cb7bf>] page_fault+0x1f/0x30
>
>The charge code seems to be directly invoking the OOM killer, which is
>not possible with 7-2-memcg-fix.  Are you sure this is the right patch
>for this log?  This _looks_ more like what 7-1-memcg-fix was doing,
>with a direct kill in the charge context and a fixup later on.




I, luckyly, still have the kernel source from which that kernel was build. I tried to re-apply the 7-2-memcg-fix.patch:

# patch -p1 --dry-run < 7-2-memcg-fix.patch 
patching file arch/x86/mm/fault.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
4 out of 4 hunks ignored -- saving rejects to file arch/x86/mm/fault.c.rej
patching file include/linux/memcontrol.h
Hunk #1 succeeded at 141 with fuzz 2 (offset 21 lines).
Hunk #2 succeeded at 391 with fuzz 1 (offset 39 lines).
patching file include/linux/mm.h
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file include/linux/mm.h.rej
patching file include/linux/sched.h
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file include/linux/sched.h.rej
patching file mm/memcontrol.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
10 out of 10 hunks ignored -- saving rejects to file mm/memcontrol.c.rej
patching file mm/memory.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file mm/memory.c.rej
patching file mm/oom_kill.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file mm/oom_kill.c.rej


Can you tell from this if the source has the right patch?



>It's somewhat eerie that you have to manually apply these patches
>because of grsec because I have no idea of knowing what the end result
>is, especially since you had compile errors in this area before.  Is
>grsec making changes to memcg code or why are these patches not
>applying cleanly?




The problem was in mm/memory.c (first hunk) because grsec added this:

        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;
        pte_t *pte;

+#ifdef CONFIG_PAX_SEGMEXEC
+        struct vm_area_struct *vma_m;
+#endif  

        if (unlikely(is_vm_hugetlb_page(vma)))



I'm not using PAX anyway so it shouldn't be used. This was the only rejection but there were lots of fuzz too - I wasn't considering it as a problem, should I?




>> but after i had to reboot the server i booted the kernel with your previous patch:
>> http://watchdog.sk/lkml/7-1-memcg-fix.patch
>
>This one still has the known memcg leak.



I know but it's the best I have which don't take down the server (yet).


azur

WARNING: multiple messages have this Message-ID (diff)
From: "azurIt" <azurit-Rm0zKEqwvD4@public.gmane.org>
To: "Johannes Weiner" <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: "Andrew Morton"
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	"Michal Hocko" <mhocko-AlSwsSmVLrQ@public.gmane.org>,
	"David Rientjes"
	<rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	"KAMEZAWA Hiroyuki"
	<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>,
	"KOSAKI Motohiro"
	<kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [patch 0/7] improve memcg oom killer robustness v2
Date: Tue, 10 Sep 2013 23:08:53 +0200	[thread overview]
Message-ID: <20130910230853.FEEC19B5@pobox.sk> (raw)
In-Reply-To: <20130910201222.GA25972-druUgvl0LCNAfugRpC6u6w@public.gmane.org>

>On Tue, Sep 10, 2013 at 09:32:53PM +0200, azurIt wrote:
>> >On Tue, Sep 10, 2013 at 08:13:59PM +0200, azurIt wrote:
>> >> >On Mon, Sep 09, 2013 at 09:59:17PM +0200, azurIt wrote:
>> >> >> >On Mon, Sep 09, 2013 at 03:10:10PM +0200, azurIt wrote:
>> >> >> >> >Hi azur,
>> >> >> >> >
>> >> >> >> >On Wed, Sep 04, 2013 at 10:18:52AM +0200, azurIt wrote:
>> >> >> >> >> > CC: "Andrew Morton" <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, "Michal Hocko" <mhocko-AlSwsSmVLrQ@public.gmane.org>, "David Rientjes" <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>, "KOSAKI Motohiro" <kosaki.motohiro-+CUm20s59erQFUHtdCDX3A@public.gmane.org>, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-arch-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> >> >> >> >> >Hello azur,
>> >> >> >> >> >
>> >> >> >> >> >On Mon, Sep 02, 2013 at 12:38:02PM +0200, azurIt wrote:
>> >> >> >> >> >> >>Hi azur,
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>here is the x86-only rollup of the series for 3.2.
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>Thanks!
>> >> >> >> >> >> >>Johannes
>> >> >> >> >> >> >>---
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> >Johannes,
>> >> >> >> >> >> >
>> >> >> >> >> >> >unfortunately, one problem arises: I have (again) cgroup which cannot be deleted :( it's a user who had very high memory usage and was reaching his limit very often. Do you need any info which i can gather now?
>> >> >> >> >> >
>> >> >> >> >> >Did the OOM killer go off in this group?
>> >> >> >> >> >
>> >> >> >> >> >Was there a warning in the syslog ("Fixing unhandled memcg OOM
>> >> >> >> >> >context")?
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> Ok, i see this message several times in my syslog logs, one of them is also for this unremovable cgroup (but maybe all of them cannot be removed, should i try?). Example of the log is here (don't know where exactly it starts and ends so here is the full kernel log):
>> >> >> >> >> http://watchdog.sk/lkml/oom_syslog.gz
>> >> >> >> >There is an unfinished OOM invocation here:
>> >> >> >> >
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715112] Fixing unhandled memcg OOM context set up from:
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715191]  [<ffffffff811105c2>] T.1154+0x622/0x8f0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715274]  [<ffffffff8111153e>] mem_cgroup_cache_charge+0xbe/0xe0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715357]  [<ffffffff810cf31c>] add_to_page_cache_locked+0x4c/0x140
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715443]  [<ffffffff810cf432>] add_to_page_cache_lru+0x22/0x50
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715526]  [<ffffffff810cfdd3>] find_or_create_page+0x73/0xb0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715608]  [<ffffffff811493ba>] __getblk+0xea/0x2c0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715692]  [<ffffffff8114ca73>] __bread+0x13/0xc0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715774]  [<ffffffff81196968>] ext3_get_branch+0x98/0x140
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715859]  [<ffffffff81197557>] ext3_get_blocks_handle+0xd7/0xdc0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715942]  [<ffffffff81198304>] ext3_get_block+0xc4/0x120
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716023]  [<ffffffff81155c3a>] do_mpage_readpage+0x38a/0x690
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716107]  [<ffffffff81155f8f>] mpage_readpage+0x4f/0x70
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716188]  [<ffffffff811973a8>] ext3_readpage+0x28/0x60
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716268]  [<ffffffff810cfa48>] filemap_fault+0x308/0x560
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716350]  [<ffffffff810ef898>] __do_fault+0x78/0x5a0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716433]  [<ffffffff810f2ab4>] handle_pte_fault+0x84/0x940
>> >> >> >> >
>> >> >> >> >__getblk() has this weird loop where it tries to instantiate the page,
>> >> >> >> >frees memory on failure, then retries.  If the memcg goes OOM, the OOM
>> >> >> >> >path might be entered multiple times and each time leak the memcg
>> >> >> >> >reference of the respective previous OOM invocation.
>> >> >> >> >
>> >> >> >> >There are a few more find_or_create() sites that do not propagate an
>> >> >> >> >error and it's incredibly hard to find out whether they are even taken
>> >> >> >> >during a page fault.  It's not practical to annotate them all with
>> >> >> >> >memcg OOM toggles, so let's just catch all OOM contexts at the end of
>> >> >> >> >handle_mm_fault() and clear them if !VM_FAULT_OOM instead of treating
>> >> >> >> >this like an error.
>> >> >> >> >
>> >> >> >> >azur, here is a patch on top of your modified 3.2.  Note that Michal
>> >> >> >> >might be onto something and we are looking at multiple issues here,
>> >> >> >> >but the log excert above suggests this fix is required either way.
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> Johannes, is this still up to date? Thank you.
>> >> >> >
>> >> >> >No, please use the following on top of 3.2 (i.e. full replacement, not
>> >> >> >incremental to what you have):
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >> Unfortunately it didn't compile:
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >>   LD      vmlinux.o
>> >> >>   MODPOST vmlinux.o
>> >> >> WARNING: modpost: Found 4924 section mismatch(es).
>> >> >> To see full details build your kernel with:
>> >> >> 'make CONFIG_DEBUG_SECTION_MISMATCH=y'
>> >> >>   GEN     .version
>> >> >>   CHK     include/generated/compile.h
>> >> >>   UPD     include/generated/compile.h
>> >> >>   CC      init/version.o
>> >> >>   LD      init/built-in.o
>> >> >>   LD      .tmp_vmlinux1
>> >> >> arch/x86/built-in.o: In function `do_page_fault':
>> >> >> (.text+0x26a77): undefined reference to `handle_mm_fault'
>> >> >> mm/built-in.o: In function `fixup_user_fault':
>> >> >> (.text+0x224d3): undefined reference to `handle_mm_fault'
>> >> >> mm/built-in.o: In function `__get_user_pages':
>> >> >> (.text+0x24a0f): undefined reference to `handle_mm_fault'
>> >> >> make: *** [.tmp_vmlinux1] Error 1
>> >> >
>> >> >Oops, sorry about that.  Must be configuration dependent because it
>> >> >works for me (and handle_mm_fault is obviously defined).
>> >> >
>> >> >Do you have warnings earlier in the compilation?  You can use make -s
>> >> >to filter out everything but warnings.
>> >> >
>> >> >Or send me your configuration so I can try to reproduce it here.
>> >> >
>> >> >Thanks!
>> >> 
>> >> 
>> >> Johannes,
>> >> 
>> >> the server went down early in the morning, the symptoms were similar as before - huge I/O. Can't tell what exactly happened since I wasn't able to login even on the console. But I have some info:
>> >>  - applications were able to write to HDD so it wasn't deadlocked as before
>> >>  - here is how it looked on graphs: http://watchdog.sk/lkml/graphs.jpg
>> >>  - server wasn't responding from 6:36, it was down between 6:54 and 7:02 (i had to hard reboot it), I was awoken at 6:36 by really creepy sound from my phone ;)
>> >>  - my 'load check' script successfully killed apache at 6:41 but it didn't help as you can see
>> >>  - i have one screen with info from atop from time 6:44, looks like i/o was done by init (??!): http://watchdog.sk/lkml/atop.jpg (ignore swap warning, i have no swap)
>> >>  - also other type of logs are available
>> >>  - nothing like this happened before
>> >
>> >That IO from init looks really screwy, I have no idea what's going on
>> >on that machine, but it looks like there is more than just a memcg
>> >problem...  Any chance your thirdparty security patches are concealing
>> >kernel daemon activity behind the init process and the IO is actually
>> >coming from a kernel thread like the flushers or kswapd?
>> 
>> 
>> 
>> 
>> I really cannot tell but I never ever saw this before and i'm using all of my patches for several years. Here are all patches which i'm using right now (+ your patch):
>> http://watchdog.sk/lkml/patches3
>> 
>> 
>> 
>> 
>> >Are there OOM kill messages in the syslog?
>> 
>> 
>> 
>> Here is full kernel log between 6:00 and 7:59:
>> http://watchdog.sk/lkml/kern6.log
>
>Wow, your apaches are like the hydra.  Whenever one is OOM killed,
>more show up!



Yeah, it's supposed to do this ;)



>> >> What do you think? I'm now running kernel with your previous patch, not with the newest one.
>> >
>> >Which one exactly?  Can you attach the diff?
>> 
>> 
>> 
>> I meant, the problem above occured on kernel with your latest patch:
>> http://watchdog.sk/lkml/7-2-memcg-fix.patch
>
>The above log has the following callstack:
>
>Sep 10 07:59:43 server01 kernel: [ 3846.337628]  [<ffffffff810d19fe>] dump_header+0x7e/0x1e0
>Sep 10 07:59:43 server01 kernel: [ 3846.337707]  [<ffffffff810d18ff>] ? find_lock_task_mm+0x2f/0x70
>Sep 10 07:59:43 server01 kernel: [ 3846.337790]  [<ffffffff810d18ff>] ? find_lock_task_mm+0x2f/0x70
>Sep 10 07:59:43 server01 kernel: [ 3846.337874]  [<ffffffff81094bb0>] ? __css_put+0x50/0x90
>Sep 10 07:59:43 server01 kernel: [ 3846.337952]  [<ffffffff810d1ec5>] oom_kill_process+0x85/0x2a0
>Sep 10 07:59:43 server01 kernel: [ 3846.338037]  [<ffffffff810d2448>] mem_cgroup_out_of_memory+0xa8/0xf0
>Sep 10 07:59:43 server01 kernel: [ 3846.338120]  [<ffffffff81110858>] T.1154+0x8b8/0x8f0
>Sep 10 07:59:43 server01 kernel: [ 3846.338201]  [<ffffffff81110fa6>] mem_cgroup_charge_common+0x56/0xa0
>Sep 10 07:59:43 server01 kernel: [ 3846.338283]  [<ffffffff81111035>] mem_cgroup_newpage_charge+0x45/0x50
>Sep 10 07:59:43 server01 kernel: [ 3846.338364]  [<ffffffff810f3039>] handle_pte_fault+0x609/0x940
>Sep 10 07:59:43 server01 kernel: [ 3846.338451]  [<ffffffff8102ab1f>] ? pte_alloc_one+0x3f/0x50
>Sep 10 07:59:43 server01 kernel: [ 3846.338532]  [<ffffffff8107e455>] ? sched_clock_local+0x25/0x90
>Sep 10 07:59:43 server01 kernel: [ 3846.338617]  [<ffffffff810f34d7>] handle_mm_fault+0x167/0x340
>Sep 10 07:59:43 server01 kernel: [ 3846.338699]  [<ffffffff8102714b>] do_page_fault+0x13b/0x490
>Sep 10 07:59:43 server01 kernel: [ 3846.338781]  [<ffffffff810f8848>] ? do_brk+0x208/0x3a0
>Sep 10 07:59:43 server01 kernel: [ 3846.338865]  [<ffffffff812dba22>] ? gr_learn_resource+0x42/0x1e0
>Sep 10 07:59:43 server01 kernel: [ 3846.338951]  [<ffffffff815cb7bf>] page_fault+0x1f/0x30
>
>The charge code seems to be directly invoking the OOM killer, which is
>not possible with 7-2-memcg-fix.  Are you sure this is the right patch
>for this log?  This _looks_ more like what 7-1-memcg-fix was doing,
>with a direct kill in the charge context and a fixup later on.




I, luckyly, still have the kernel source from which that kernel was build. I tried to re-apply the 7-2-memcg-fix.patch:

# patch -p1 --dry-run < 7-2-memcg-fix.patch 
patching file arch/x86/mm/fault.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
4 out of 4 hunks ignored -- saving rejects to file arch/x86/mm/fault.c.rej
patching file include/linux/memcontrol.h
Hunk #1 succeeded at 141 with fuzz 2 (offset 21 lines).
Hunk #2 succeeded at 391 with fuzz 1 (offset 39 lines).
patching file include/linux/mm.h
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file include/linux/mm.h.rej
patching file include/linux/sched.h
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file include/linux/sched.h.rej
patching file mm/memcontrol.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
10 out of 10 hunks ignored -- saving rejects to file mm/memcontrol.c.rej
patching file mm/memory.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file mm/memory.c.rej
patching file mm/oom_kill.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file mm/oom_kill.c.rej


Can you tell from this if the source has the right patch?



>It's somewhat eerie that you have to manually apply these patches
>because of grsec because I have no idea of knowing what the end result
>is, especially since you had compile errors in this area before.  Is
>grsec making changes to memcg code or why are these patches not
>applying cleanly?




The problem was in mm/memory.c (first hunk) because grsec added this:

        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;
        pte_t *pte;

+#ifdef CONFIG_PAX_SEGMEXEC
+        struct vm_area_struct *vma_m;
+#endif  

        if (unlikely(is_vm_hugetlb_page(vma)))



I'm not using PAX anyway so it shouldn't be used. This was the only rejection but there were lots of fuzz too - I wasn't considering it as a problem, should I?




>> but after i had to reboot the server i booted the kernel with your previous patch:
>> http://watchdog.sk/lkml/7-1-memcg-fix.patch
>
>This one still has the known memcg leak.



I know but it's the best I have which don't take down the server (yet).


azur

WARNING: multiple messages have this Message-ID (diff)
From: "azurIt" <azurit@pobox.sk>
To: "Johannes Weiner" <hannes@cmpxchg.org>
Cc: "Andrew Morton" <akpm@linux-foundation.org>,
	"Michal Hocko" <mhocko@suse.cz>,
	"David Rientjes" <rientjes@google.com>,
	"KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>,
	"KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [patch 0/7] improve memcg oom killer robustness v2
Date: Tue, 10 Sep 2013 23:08:53 +0200	[thread overview]
Message-ID: <20130910230853.FEEC19B5@pobox.sk> (raw)
In-Reply-To: <20130910201222.GA25972@cmpxchg.org>

>On Tue, Sep 10, 2013 at 09:32:53PM +0200, azurIt wrote:
>> >On Tue, Sep 10, 2013 at 08:13:59PM +0200, azurIt wrote:
>> >> >On Mon, Sep 09, 2013 at 09:59:17PM +0200, azurIt wrote:
>> >> >> >On Mon, Sep 09, 2013 at 03:10:10PM +0200, azurIt wrote:
>> >> >> >> >Hi azur,
>> >> >> >> >
>> >> >> >> >On Wed, Sep 04, 2013 at 10:18:52AM +0200, azurIt wrote:
>> >> >> >> >> > CC: "Andrew Morton" <akpm@linux-foundation.org>, "Michal Hocko" <mhocko@suse.cz>, "David Rientjes" <rientjes@google.com>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@jp.fujitsu.com>, "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>, linux-mm@kvack.org, cgroups@vger.kernel.org, x86@kernel.org, linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org
>> >> >> >> >> >Hello azur,
>> >> >> >> >> >
>> >> >> >> >> >On Mon, Sep 02, 2013 at 12:38:02PM +0200, azurIt wrote:
>> >> >> >> >> >> >>Hi azur,
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>here is the x86-only rollup of the series for 3.2.
>> >> >> >> >> >> >>
>> >> >> >> >> >> >>Thanks!
>> >> >> >> >> >> >>Johannes
>> >> >> >> >> >> >>---
>> >> >> >> >> >> >
>> >> >> >> >> >> >
>> >> >> >> >> >> >Johannes,
>> >> >> >> >> >> >
>> >> >> >> >> >> >unfortunately, one problem arises: I have (again) cgroup which cannot be deleted :( it's a user who had very high memory usage and was reaching his limit very often. Do you need any info which i can gather now?
>> >> >> >> >> >
>> >> >> >> >> >Did the OOM killer go off in this group?
>> >> >> >> >> >
>> >> >> >> >> >Was there a warning in the syslog ("Fixing unhandled memcg OOM
>> >> >> >> >> >context")?
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> 
>> >> >> >> >> Ok, i see this message several times in my syslog logs, one of them is also for this unremovable cgroup (but maybe all of them cannot be removed, should i try?). Example of the log is here (don't know where exactly it starts and ends so here is the full kernel log):
>> >> >> >> >> http://watchdog.sk/lkml/oom_syslog.gz
>> >> >> >> >There is an unfinished OOM invocation here:
>> >> >> >> >
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715112] Fixing unhandled memcg OOM context set up from:
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715191]  [<ffffffff811105c2>] T.1154+0x622/0x8f0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715274]  [<ffffffff8111153e>] mem_cgroup_cache_charge+0xbe/0xe0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715357]  [<ffffffff810cf31c>] add_to_page_cache_locked+0x4c/0x140
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715443]  [<ffffffff810cf432>] add_to_page_cache_lru+0x22/0x50
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715526]  [<ffffffff810cfdd3>] find_or_create_page+0x73/0xb0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715608]  [<ffffffff811493ba>] __getblk+0xea/0x2c0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715692]  [<ffffffff8114ca73>] __bread+0x13/0xc0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715774]  [<ffffffff81196968>] ext3_get_branch+0x98/0x140
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715859]  [<ffffffff81197557>] ext3_get_blocks_handle+0xd7/0xdc0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.715942]  [<ffffffff81198304>] ext3_get_block+0xc4/0x120
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716023]  [<ffffffff81155c3a>] do_mpage_readpage+0x38a/0x690
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716107]  [<ffffffff81155f8f>] mpage_readpage+0x4f/0x70
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716188]  [<ffffffff811973a8>] ext3_readpage+0x28/0x60
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716268]  [<ffffffff810cfa48>] filemap_fault+0x308/0x560
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716350]  [<ffffffff810ef898>] __do_fault+0x78/0x5a0
>> >> >> >> >  Aug 22 13:15:21 server01 kernel: [1251422.716433]  [<ffffffff810f2ab4>] handle_pte_fault+0x84/0x940
>> >> >> >> >
>> >> >> >> >__getblk() has this weird loop where it tries to instantiate the page,
>> >> >> >> >frees memory on failure, then retries.  If the memcg goes OOM, the OOM
>> >> >> >> >path might be entered multiple times and each time leak the memcg
>> >> >> >> >reference of the respective previous OOM invocation.
>> >> >> >> >
>> >> >> >> >There are a few more find_or_create() sites that do not propagate an
>> >> >> >> >error and it's incredibly hard to find out whether they are even taken
>> >> >> >> >during a page fault.  It's not practical to annotate them all with
>> >> >> >> >memcg OOM toggles, so let's just catch all OOM contexts at the end of
>> >> >> >> >handle_mm_fault() and clear them if !VM_FAULT_OOM instead of treating
>> >> >> >> >this like an error.
>> >> >> >> >
>> >> >> >> >azur, here is a patch on top of your modified 3.2.  Note that Michal
>> >> >> >> >might be onto something and we are looking at multiple issues here,
>> >> >> >> >but the log excert above suggests this fix is required either way.
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> 
>> >> >> >> Johannes, is this still up to date? Thank you.
>> >> >> >
>> >> >> >No, please use the following on top of 3.2 (i.e. full replacement, not
>> >> >> >incremental to what you have):
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >> Unfortunately it didn't compile:
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >> 
>> >> >>   LD      vmlinux.o
>> >> >>   MODPOST vmlinux.o
>> >> >> WARNING: modpost: Found 4924 section mismatch(es).
>> >> >> To see full details build your kernel with:
>> >> >> 'make CONFIG_DEBUG_SECTION_MISMATCH=y'
>> >> >>   GEN     .version
>> >> >>   CHK     include/generated/compile.h
>> >> >>   UPD     include/generated/compile.h
>> >> >>   CC      init/version.o
>> >> >>   LD      init/built-in.o
>> >> >>   LD      .tmp_vmlinux1
>> >> >> arch/x86/built-in.o: In function `do_page_fault':
>> >> >> (.text+0x26a77): undefined reference to `handle_mm_fault'
>> >> >> mm/built-in.o: In function `fixup_user_fault':
>> >> >> (.text+0x224d3): undefined reference to `handle_mm_fault'
>> >> >> mm/built-in.o: In function `__get_user_pages':
>> >> >> (.text+0x24a0f): undefined reference to `handle_mm_fault'
>> >> >> make: *** [.tmp_vmlinux1] Error 1
>> >> >
>> >> >Oops, sorry about that.  Must be configuration dependent because it
>> >> >works for me (and handle_mm_fault is obviously defined).
>> >> >
>> >> >Do you have warnings earlier in the compilation?  You can use make -s
>> >> >to filter out everything but warnings.
>> >> >
>> >> >Or send me your configuration so I can try to reproduce it here.
>> >> >
>> >> >Thanks!
>> >> 
>> >> 
>> >> Johannes,
>> >> 
>> >> the server went down early in the morning, the symptoms were similar as before - huge I/O. Can't tell what exactly happened since I wasn't able to login even on the console. But I have some info:
>> >>  - applications were able to write to HDD so it wasn't deadlocked as before
>> >>  - here is how it looked on graphs: http://watchdog.sk/lkml/graphs.jpg
>> >>  - server wasn't responding from 6:36, it was down between 6:54 and 7:02 (i had to hard reboot it), I was awoken at 6:36 by really creepy sound from my phone ;)
>> >>  - my 'load check' script successfully killed apache at 6:41 but it didn't help as you can see
>> >>  - i have one screen with info from atop from time 6:44, looks like i/o was done by init (??!): http://watchdog.sk/lkml/atop.jpg (ignore swap warning, i have no swap)
>> >>  - also other type of logs are available
>> >>  - nothing like this happened before
>> >
>> >That IO from init looks really screwy, I have no idea what's going on
>> >on that machine, but it looks like there is more than just a memcg
>> >problem...  Any chance your thirdparty security patches are concealing
>> >kernel daemon activity behind the init process and the IO is actually
>> >coming from a kernel thread like the flushers or kswapd?
>> 
>> 
>> 
>> 
>> I really cannot tell but I never ever saw this before and i'm using all of my patches for several years. Here are all patches which i'm using right now (+ your patch):
>> http://watchdog.sk/lkml/patches3
>> 
>> 
>> 
>> 
>> >Are there OOM kill messages in the syslog?
>> 
>> 
>> 
>> Here is full kernel log between 6:00 and 7:59:
>> http://watchdog.sk/lkml/kern6.log
>
>Wow, your apaches are like the hydra.  Whenever one is OOM killed,
>more show up!



Yeah, it's supposed to do this ;)



>> >> What do you think? I'm now running kernel with your previous patch, not with the newest one.
>> >
>> >Which one exactly?  Can you attach the diff?
>> 
>> 
>> 
>> I meant, the problem above occured on kernel with your latest patch:
>> http://watchdog.sk/lkml/7-2-memcg-fix.patch
>
>The above log has the following callstack:
>
>Sep 10 07:59:43 server01 kernel: [ 3846.337628]  [<ffffffff810d19fe>] dump_header+0x7e/0x1e0
>Sep 10 07:59:43 server01 kernel: [ 3846.337707]  [<ffffffff810d18ff>] ? find_lock_task_mm+0x2f/0x70
>Sep 10 07:59:43 server01 kernel: [ 3846.337790]  [<ffffffff810d18ff>] ? find_lock_task_mm+0x2f/0x70
>Sep 10 07:59:43 server01 kernel: [ 3846.337874]  [<ffffffff81094bb0>] ? __css_put+0x50/0x90
>Sep 10 07:59:43 server01 kernel: [ 3846.337952]  [<ffffffff810d1ec5>] oom_kill_process+0x85/0x2a0
>Sep 10 07:59:43 server01 kernel: [ 3846.338037]  [<ffffffff810d2448>] mem_cgroup_out_of_memory+0xa8/0xf0
>Sep 10 07:59:43 server01 kernel: [ 3846.338120]  [<ffffffff81110858>] T.1154+0x8b8/0x8f0
>Sep 10 07:59:43 server01 kernel: [ 3846.338201]  [<ffffffff81110fa6>] mem_cgroup_charge_common+0x56/0xa0
>Sep 10 07:59:43 server01 kernel: [ 3846.338283]  [<ffffffff81111035>] mem_cgroup_newpage_charge+0x45/0x50
>Sep 10 07:59:43 server01 kernel: [ 3846.338364]  [<ffffffff810f3039>] handle_pte_fault+0x609/0x940
>Sep 10 07:59:43 server01 kernel: [ 3846.338451]  [<ffffffff8102ab1f>] ? pte_alloc_one+0x3f/0x50
>Sep 10 07:59:43 server01 kernel: [ 3846.338532]  [<ffffffff8107e455>] ? sched_clock_local+0x25/0x90
>Sep 10 07:59:43 server01 kernel: [ 3846.338617]  [<ffffffff810f34d7>] handle_mm_fault+0x167/0x340
>Sep 10 07:59:43 server01 kernel: [ 3846.338699]  [<ffffffff8102714b>] do_page_fault+0x13b/0x490
>Sep 10 07:59:43 server01 kernel: [ 3846.338781]  [<ffffffff810f8848>] ? do_brk+0x208/0x3a0
>Sep 10 07:59:43 server01 kernel: [ 3846.338865]  [<ffffffff812dba22>] ? gr_learn_resource+0x42/0x1e0
>Sep 10 07:59:43 server01 kernel: [ 3846.338951]  [<ffffffff815cb7bf>] page_fault+0x1f/0x30
>
>The charge code seems to be directly invoking the OOM killer, which is
>not possible with 7-2-memcg-fix.  Are you sure this is the right patch
>for this log?  This _looks_ more like what 7-1-memcg-fix was doing,
>with a direct kill in the charge context and a fixup later on.




I, luckyly, still have the kernel source from which that kernel was build. I tried to re-apply the 7-2-memcg-fix.patch:

# patch -p1 --dry-run < 7-2-memcg-fix.patch 
patching file arch/x86/mm/fault.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
4 out of 4 hunks ignored -- saving rejects to file arch/x86/mm/fault.c.rej
patching file include/linux/memcontrol.h
Hunk #1 succeeded at 141 with fuzz 2 (offset 21 lines).
Hunk #2 succeeded at 391 with fuzz 1 (offset 39 lines).
patching file include/linux/mm.h
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file include/linux/mm.h.rej
patching file include/linux/sched.h
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file include/linux/sched.h.rej
patching file mm/memcontrol.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
10 out of 10 hunks ignored -- saving rejects to file mm/memcontrol.c.rej
patching file mm/memory.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
2 out of 2 hunks ignored -- saving rejects to file mm/memory.c.rej
patching file mm/oom_kill.c
Reversed (or previously applied) patch detected!  Assume -R? [n] 
Apply anyway? [n] 
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file mm/oom_kill.c.rej


Can you tell from this if the source has the right patch?



>It's somewhat eerie that you have to manually apply these patches
>because of grsec because I have no idea of knowing what the end result
>is, especially since you had compile errors in this area before.  Is
>grsec making changes to memcg code or why are these patches not
>applying cleanly?




The problem was in mm/memory.c (first hunk) because grsec added this:

        pgd_t *pgd;
        pud_t *pud;
        pmd_t *pmd;
        pte_t *pte;

+#ifdef CONFIG_PAX_SEGMEXEC
+        struct vm_area_struct *vma_m;
+#endif  

        if (unlikely(is_vm_hugetlb_page(vma)))



I'm not using PAX anyway so it shouldn't be used. This was the only rejection but there were lots of fuzz too - I wasn't considering it as a problem, should I?




>> but after i had to reboot the server i booted the kernel with your previous patch:
>> http://watchdog.sk/lkml/7-1-memcg-fix.patch
>
>This one still has the known memcg leak.



I know but it's the best I have which don't take down the server (yet).


azur

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-09-10 21:08 UTC|newest]

Thread overview: 227+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-03 16:59 [patch 0/7] improve memcg oom killer robustness v2 Johannes Weiner
2013-08-03 16:59 ` Johannes Weiner
2013-08-03 16:59 ` [patch 1/7] arch: mm: remove obsolete init OOM protection Johannes Weiner
2013-08-03 16:59   ` Johannes Weiner
2013-08-06  6:34   ` Vineet Gupta
2013-08-06  6:34     ` Vineet Gupta
2013-08-06  6:34     ` Vineet Gupta
2013-08-03 16:59 ` [patch 2/7] arch: mm: do not invoke OOM killer on kernel fault OOM Johannes Weiner
2013-08-03 16:59   ` Johannes Weiner
2013-08-03 16:59 ` [patch 3/7] arch: mm: pass userspace fault flag to generic fault handler Johannes Weiner
2013-08-03 16:59   ` Johannes Weiner
2013-08-05 22:06   ` Andrew Morton
2013-08-05 22:06     ` Andrew Morton
2013-08-05 22:25     ` Johannes Weiner
2013-08-05 22:25       ` Johannes Weiner
2013-08-03 16:59 ` [patch 4/7] x86: finish user fault error path with fatal signal Johannes Weiner
2013-08-03 16:59   ` Johannes Weiner
2013-08-03 16:59 ` [patch 5/7] mm: memcg: enable memcg OOM killer only for user faults Johannes Weiner
2013-08-03 16:59   ` Johannes Weiner
2013-08-05  9:18   ` Michal Hocko
2013-08-05  9:18     ` Michal Hocko
2013-08-03 16:59 ` [patch 6/7] mm: memcg: rework and document OOM waiting and wakeup Johannes Weiner
2013-08-03 16:59   ` Johannes Weiner
2013-08-03 17:00 ` [patch 7/7] mm: memcg: do not trap chargers with full callstack on OOM Johannes Weiner
2013-08-03 17:00   ` Johannes Weiner
2013-08-05  9:54   ` Michal Hocko
2013-08-05  9:54     ` Michal Hocko
2013-08-05  9:54     ` Michal Hocko
2013-08-05 20:56     ` Johannes Weiner
2013-08-05 20:56       ` Johannes Weiner
2013-08-03 17:08 ` [patch 0/7] improve memcg oom killer robustness v2 Johannes Weiner
2013-08-03 17:08   ` Johannes Weiner
2013-08-09  9:06   ` azurIt
2013-08-09  9:06     ` azurIt
2013-08-09  9:06     ` azurIt
2013-08-30 19:58   ` azurIt
2013-08-30 19:58     ` azurIt
2013-09-02 10:38     ` azurIt
2013-09-02 10:38       ` azurIt
2013-09-03 20:48       ` Johannes Weiner
2013-09-03 20:48         ` Johannes Weiner
2013-09-04  7:53         ` azurIt
2013-09-04  7:53           ` azurIt
2013-09-04  7:53           ` azurIt
2013-09-04  7:53           ` azurIt
2013-09-04  8:18         ` azurIt
2013-09-04  8:18           ` azurIt
2013-09-05 11:54           ` Johannes Weiner
2013-09-05 11:54             ` Johannes Weiner
2013-09-05 12:43             ` Michal Hocko
2013-09-05 12:43               ` Michal Hocko
2013-09-05 16:18               ` Johannes Weiner
2013-09-05 16:18                 ` Johannes Weiner
2013-09-09 12:36                 ` Michal Hocko
2013-09-09 12:36                   ` Michal Hocko
2013-09-09 12:56                   ` Michal Hocko
2013-09-09 12:56                     ` Michal Hocko
2013-09-12 12:59                     ` Johannes Weiner
2013-09-12 12:59                       ` Johannes Weiner
2013-09-16 14:03                       ` Michal Hocko
2013-09-16 14:03                         ` Michal Hocko
2013-09-16 14:03                         ` Michal Hocko
2013-09-05 13:24             ` Michal Hocko
2013-09-05 13:24               ` Michal Hocko
2013-09-09 13:10             ` azurIt
2013-09-09 13:10               ` azurIt
2013-09-09 17:28               ` Johannes Weiner
2013-09-09 17:28                 ` Johannes Weiner
2013-09-09 19:59                 ` azurIt
2013-09-09 19:59                   ` azurIt
2013-09-09 20:12                   ` Johannes Weiner
2013-09-09 20:12                     ` Johannes Weiner
2013-09-09 20:18                     ` azurIt
2013-09-09 20:18                       ` azurIt
2013-09-09 21:08                     ` azurIt
2013-09-09 21:08                       ` azurIt
2013-09-10 18:13                     ` azurIt
2013-09-10 18:13                       ` azurIt
2013-09-10 18:37                       ` Johannes Weiner
2013-09-10 18:37                         ` Johannes Weiner
2013-09-10 19:32                         ` azurIt
2013-09-10 19:32                           ` azurIt
2013-09-10 20:12                           ` Johannes Weiner
2013-09-10 20:12                             ` Johannes Weiner
2013-09-10 21:08                             ` azurIt [this message]
2013-09-10 21:08                               ` azurIt
2013-09-10 21:08                               ` azurIt
2013-09-10 21:18                               ` Johannes Weiner
2013-09-10 21:18                                 ` Johannes Weiner
2013-09-10 21:32                                 ` azurIt
2013-09-10 21:32                                   ` azurIt
2013-09-10 22:03                                   ` Johannes Weiner
2013-09-10 22:03                                     ` Johannes Weiner
2013-09-11 12:33                                     ` azurIt
2013-09-11 12:33                                       ` azurIt
2013-09-11 18:03                                       ` Johannes Weiner
2013-09-11 18:03                                         ` Johannes Weiner
2013-09-11 18:03                                         ` Johannes Weiner
2013-09-11 18:54                                         ` azurIt
2013-09-11 18:54                                           ` azurIt
2013-09-11 19:11                                           ` Johannes Weiner
2013-09-11 19:11                                             ` Johannes Weiner
2013-09-11 19:41                                             ` azurIt
2013-09-11 19:41                                               ` azurIt
2013-09-11 20:04                                               ` Johannes Weiner
2013-09-11 20:04                                                 ` Johannes Weiner
2013-09-14 10:48                                                 ` azurIt
2013-09-14 10:48                                                   ` azurIt
2013-09-16 13:40                                                   ` Michal Hocko
2013-09-16 13:40                                                     ` Michal Hocko
2013-09-16 14:01                                                     ` azurIt
2013-09-16 14:01                                                       ` azurIt
2013-09-16 14:06                                                       ` Michal Hocko
2013-09-16 14:06                                                         ` Michal Hocko
2013-09-16 14:13                                                         ` azurIt
2013-09-16 14:13                                                           ` azurIt
2013-09-16 14:13                                                           ` azurIt
2013-09-16 14:57                                                           ` Michal Hocko
2013-09-16 14:57                                                             ` Michal Hocko
2013-09-16 15:05                                                             ` azurIt
2013-09-16 15:05                                                               ` azurIt
2013-09-16 15:17                                                               ` Johannes Weiner
2013-09-16 15:17                                                                 ` Johannes Weiner
2013-09-16 15:17                                                                 ` Johannes Weiner
2013-09-16 15:24                                                                 ` azurIt
2013-09-16 15:24                                                                   ` azurIt
2013-09-16 15:25                                                               ` Michal Hocko
2013-09-16 15:25                                                                 ` Michal Hocko
2013-09-16 15:40                                                                 ` azurIt
2013-09-16 15:40                                                                   ` azurIt
2013-09-16 20:52                                                                 ` azurIt
2013-09-16 20:52                                                                   ` azurIt
2013-09-17  0:02                                                                   ` Johannes Weiner
2013-09-17  0:02                                                                     ` Johannes Weiner
2013-09-17 11:15                                                                     ` azurIt
2013-09-17 11:15                                                                       ` azurIt
2013-09-17 11:15                                                                       ` azurIt
2013-09-17 14:10                                                                       ` Michal Hocko
2013-09-17 14:10                                                                         ` Michal Hocko
2013-09-18 14:03                                                                         ` azurIt
2013-09-18 14:03                                                                           ` azurIt
2013-09-18 14:03                                                                           ` azurIt
2013-09-18 14:24                                                                           ` Michal Hocko
2013-09-18 14:24                                                                             ` Michal Hocko
2013-09-18 14:33                                                                             ` azurIt
2013-09-18 14:33                                                                               ` azurIt
2013-09-18 14:42                                                                               ` Michal Hocko
2013-09-18 14:42                                                                                 ` Michal Hocko
2013-09-18 14:42                                                                                 ` Michal Hocko
2013-09-18 18:02                                                                                 ` azurIt
2013-09-18 18:02                                                                                   ` azurIt
2013-09-18 18:36                                                                                   ` Michal Hocko
2013-09-18 18:36                                                                                     ` Michal Hocko
2013-09-18 18:36                                                                                     ` Michal Hocko
2013-09-18 18:04                                                                           ` Johannes Weiner
2013-09-18 18:04                                                                             ` Johannes Weiner
2013-09-18 18:19                                                                             ` Johannes Weiner
2013-09-18 18:19                                                                               ` Johannes Weiner
2013-09-18 19:55                                                                               ` Johannes Weiner
2013-09-18 19:55                                                                                 ` Johannes Weiner
2013-09-18 19:55                                                                                 ` Johannes Weiner
2013-09-18 20:52                                                                                 ` azurIt
2013-09-18 20:52                                                                                   ` azurIt
2013-09-18 20:52                                                                                   ` azurIt
2013-09-25  7:26                                                                                 ` azurIt
2013-09-25  7:26                                                                                   ` azurIt
2013-09-25  7:26                                                                                   ` azurIt
2013-09-26 16:54                                                                                 ` azurIt
2013-09-26 16:54                                                                                   ` azurIt
2013-09-26 16:54                                                                                   ` azurIt
2013-09-26 19:27                                                                                   ` Johannes Weiner
2013-09-26 19:27                                                                                     ` Johannes Weiner
2013-09-27  2:04                                                                                     ` azurIt
2013-09-27  2:04                                                                                       ` azurIt
2013-09-27  2:04                                                                                       ` azurIt
2013-09-27  2:04                                                                                       ` azurIt
2013-10-07 11:01                                                                                     ` azurIt
2013-10-07 11:01                                                                                       ` azurIt
2013-10-07 11:01                                                                                       ` azurIt
2013-10-07 11:01                                                                                       ` azurIt
2013-10-07 19:23                                                                                       ` Johannes Weiner
2013-10-07 19:23                                                                                         ` Johannes Weiner
2013-10-09 18:44                                                                                         ` azurIt
2013-10-09 18:44                                                                                           ` azurIt
2013-10-09 18:44                                                                                           ` azurIt
2013-10-10  0:14                                                                                           ` Johannes Weiner
2013-10-10  0:14                                                                                             ` Johannes Weiner
2013-10-10  0:14                                                                                             ` Johannes Weiner
2013-10-10 22:59                                                                                             ` azurIt
2013-10-10 22:59                                                                                               ` azurIt
2013-10-10 22:59                                                                                               ` azurIt
2013-09-17 11:20                                                                     ` azurIt
2013-09-17 11:20                                                                       ` azurIt
2013-09-16 10:22                                                 ` azurIt
2013-09-16 10:22                                                   ` azurIt
2013-09-04  9:45         ` azurIt
2013-09-04  9:45           ` azurIt
2013-09-04 11:57           ` Michal Hocko
2013-09-04 11:57             ` Michal Hocko
2013-09-04 12:10             ` azurIt
2013-09-04 12:10               ` azurIt
2013-09-04 12:10               ` azurIt
2013-09-04 12:26               ` Michal Hocko
2013-09-04 12:26                 ` Michal Hocko
2013-09-04 12:26                 ` Michal Hocko
2013-09-04 12:39                 ` azurIt
2013-09-04 12:39                   ` azurIt
2013-09-05  9:14                 ` azurIt
2013-09-05  9:14                   ` azurIt
2013-09-05  9:53                   ` Michal Hocko
2013-09-05  9:53                     ` Michal Hocko
2013-09-05 10:17                     ` azurIt
2013-09-05 10:17                       ` azurIt
2013-09-05 11:17                       ` Michal Hocko
2013-09-05 11:17                         ` Michal Hocko
2013-09-05 11:17                         ` Michal Hocko
2013-09-05 11:47                         ` azurIt
2013-09-05 11:47                           ` azurIt
2013-09-05 12:03                           ` Michal Hocko
2013-09-05 12:03                             ` Michal Hocko
2013-09-05 12:33                             ` azurIt
2013-09-05 12:33                               ` azurIt
2013-09-05 12:33                               ` azurIt
2013-09-05 12:45                               ` Michal Hocko
2013-09-05 12:45                                 ` Michal Hocko
2013-09-05 13:00                                 ` azurIt
2013-09-05 13:00                                   ` azurIt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130910230853.FEEC19B5@pobox.sk \
    --to=azurit@pobox.sk \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=rientjes@google.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.