From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf0-f45.google.com ([209.85.215.45]:33753 "EHLO mail-lf0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752814AbcIUIE2 (ORCPT ); Wed, 21 Sep 2016 04:04:28 -0400 Received: by mail-lf0-f45.google.com with SMTP id b71so6917591lfg.0 for ; Wed, 21 Sep 2016 01:04:27 -0700 (PDT) Date: Wed, 21 Sep 2016 10:04:25 +0200 From: Michal Hocko Subject: Re: Excessive xfs_inode allocations trigger OOM killer Message-ID: <20160921080425.GC10300@dhcp22.suse.cz> References: <87a8f2pd2d.fsf@mid.deneb.enyo.de> <20160920203039.GI340@dastard> <87mvj2mgsg.fsf@mid.deneb.enyo.de> <20160920214612.GJ340@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160920214612.GJ340@dastard> Sender: linux-xfs-owner@vger.kernel.org List-ID: List-Id: xfs To: Dave Chinner Cc: Florian Weimer , xfs@oss.sgi.com, linux-xfs@vger.kernel.org, linux-mm@vkack.org On Wed 21-09-16 07:46:12, Dave Chinner wrote: > [cc Michal, linux-mm@kvack.org] > > On Tue, Sep 20, 2016 at 10:56:31PM +0200, Florian Weimer wrote: [...] > > [51669.515086] make invoked oom-killer: gfp_mask=0x27000c0(GFP_KERNEL_ACCOUNT|__GFP_NOTRACK), order=2, oom_score_adj=0 > > [51669.515092] CPU: 1 PID: 1202 Comm: make Tainted: G I 4.7.1fw #1 > > [51669.515093] Hardware name: System manufacturer System Product Name/P6X58D-E, BIOS 0701 05/10/2011 > > [51669.515095] 0000000000000000 ffffffff812a7d39 0000000000000000 0000000000000000 > > [51669.515098] ffffffff8114e4da ffff880018707d98 0000000000000000 000000000066ca81 > > [51669.515100] ffffffff8170e88d ffffffff810fe69e ffff88033fc38728 0000000200000006 > > [51669.515102] Call Trace: > > [51669.515108] [] ? dump_stack+0x46/0x5d > > [51669.515113] [] ? dump_header.isra.12+0x51/0x176 > > [51669.515116] [] ? oom_kill_process+0x32e/0x420 > > [51669.515119] [] ? page_alloc_cpu_notify+0x40/0x40 > > [51669.515120] [] ? find_lock_task_mm+0x2c/0x70 > > [51669.515122] [] ? out_of_memory+0x28d/0x2d0 > > [51669.515125] [] ? __alloc_pages_nodemask+0xb97/0xc90 > > [51669.515128] [] ? copy_process.part.54+0xec/0x17a0 > > [51669.515131] [] ? handle_mm_fault+0xaa8/0x1900 > > [51669.515133] [] ? _do_fork+0xd4/0x320 > > [51669.515137] [] ? __set_current_blocked+0x2c/0x40 > > [51669.515140] [] ? do_syscall_64+0x3e/0x80 > > [51669.515144] [] ? entry_SYSCALL64_slow_path+0x25/0x25 > ..... > > [51669.515194] DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB > > [51669.515202] DMA32: 45619*4kB (UME) 73*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 183060kB > > [51669.515209] Normal: 39979*4kB (UE) 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 159916kB > ..... > > Alright, that's what I suspected. high order allocation for a new > kernel stack and memory is so fragmented that a contiguous > allocation fails. Really, this is a memory reclaim issue, not an XFS > issue. There is lots of reclaimable memory available, but memory > reclaim is: > > a) not trying hard enough to reclaim reclaimable memory; and > b) not waiting for memory compaction to rebuild contiguous > memory regions for high order allocations. > > Instead, it is declaring OOM and kicking the killer to free memory > held busy userspace. Yes this was the case with 4.7 kernel. There is a workaround sitting in the linus tree 6b4e3181d7bd ("mm, oom: prevent premature OOM killer invocation for high order request") which should get to stable eventually. More approapriate fix is currently in the linux-next. Testing the same workload with linux-next would be very helpful. Thanks! -- Michal Hocko SUSE Labs