From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1750782AbdAWKtC (ORCPT ); Mon, 23 Jan 2017 05:49:02 -0500
Received: from outbound-smtp10.blacknight.com ([46.22.139.15]:58712
	"EHLO outbound-smtp10.blacknight.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1750703AbdAWKtB (ORCPT );
	Mon, 23 Jan 2017 05:49:01 -0500
Date: Mon, 23 Jan 2017 10:48:58 +0000
From: Mel Gorman
To: Trevor Cordes
Cc: Michal Hocko, linux-kernel@vger.kernel.org, Joonsoo Kim,
	Minchan Kim, Rik van Riel, Srikar Dronamraju
Subject: Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)
Message-ID: <20170123104858.gpjy25y2ogju3gkg@techsingularity.net>
References: <20170116110934.7zopy3ecg2lfadkd@techsingularity.net>
	<20170117135228.GN19699@dhcp22.suse.cz>
	<20170117142114.r7abr3x2bbik47sd@techsingularity.net>
	<20170117145450.GQ19699@dhcp22.suse.cz>
	<20170119034850.0b7d504c@pog.tecnopolis.ca>
	<20170119113757.GP30786@dhcp22.suse.cz>
	<20170120003544.7e6e34d1@pog.tecnopolis.ca>
	<20170120110232.y7xd4b7wtwqslgnw@techsingularity.net>
	<20170120155553.gjv2x5eycvdudnil@techsingularity.net>
	<20170122184559.0b5c0fd8@pog.tecnopolis.ca>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <20170122184559.0b5c0fd8@pog.tecnopolis.ca>
User-Agent: Mutt/1.6.2 (2016-07-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Jan 22, 2017 at 06:45:59PM -0600, Trevor Cordes wrote:
> On 2017-01-20 Mel Gorman wrote:
> > >
> > > Thanks for the OOM report. I was expecting it to be a particular
> > > shape and my expectations were not matched so it took time to
> > > consider it further. Can you try the cumulative patch below? It
> > > combines three patches that
> > >
> > > 1. Allow slab shrinking even if the LRU patches are unreclaimable in
> > >    direct reclaim
> > > 2. Shrinks slab based once based on the contents of all memcgs
> > >    instead of shrinking one at a time
> > > 3. Tries to shrink slabs if the lowmem usage is too high
> > >
> > > Unfortunately it's only boot tested on x86-64 as I didn't get the
> > > chance to setup an i386 test bed.
> > >
> >
> > There was one major flaw in that patch. This version fixes it and
> > addresses other minor issues. It may still be too aggressive shrinking
> > slab but worth trying out. Thanks.
>
> I ran with your patch below and it oom'd on the first night. It was
> weird, it didn't hang the system, and my rebooter script started a
> reboot but the system never got more than half down before it just sat
> there in a weird state where a local console user could still login but
> not much was working. So the patches don't seem to solve the problem.
>
> For the above compile I applied your patches to 4.10.0-rc4+, I hope
> that's ok.
>

It would be strongly preferred to run them on top of Michal's other
fixes. The main reason is that this OOM differs from the earlier ones:
it OOM killed from GFP_NOFS|__GFP_NOFAIL context. That meant the slab
shrinking could not happen from direct reclaim, so the balancing from my
patches would not occur. Michal's other patches affect how kswapd
behaves, which is why they matter here.

Unfortunately, even that will be race prone for GFP_NOFS callers as
they'll effectively be racing to see if kswapd or another direct
reclaimer can reclaim before the OOM conditions are hit. It is by
design, but it's apparent that a __GFP_NOFAIL request can trigger OOM
relatively easily as it's not necessarily throttled or waiting on kswapd
to complete any work.

I'll keep thinking about it.

-- 
Mel Gorman
SUSE Labs
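
To illustrate the GFP_NOFS point above, here is a toy userspace model of
why direct reclaim from such a context frees no filesystem-backed slab.
It is a sketch only and not kernel code: the flag bit values and the
function name are made up for illustration, and it assumes the shrinker
simply declines to run when the caller's mask lacks the FS bit.

```python
from enum import IntFlag


class Gfp(IntFlag):
    # Hypothetical bit values, chosen for illustration only.
    IO = 0x1
    FS = 0x2
    NOFAIL = 0x4


# Simplified stand-ins for the kernel's composite masks (assumption).
GFP_KERNEL = Gfp.IO | Gfp.FS
GFP_NOFS = Gfp.IO


def slab_reclaimed_by_direct_reclaim(gfp_mask, freeable):
    """Model a filesystem-backed slab shrinker called from direct reclaim.

    If the caller may not re-enter filesystem code (no FS bit), the
    shrinker does nothing, so no slab objects are freed.
    """
    if not gfp_mask & Gfp.FS:
        return 0
    return freeable


if __name__ == "__main__":
    for name, mask in (("GFP_KERNEL", GFP_KERNEL),
                       ("GFP_NOFS|__GFP_NOFAIL", GFP_NOFS | Gfp.NOFAIL)):
        print(name, slab_reclaimed_by_direct_reclaim(mask, 100))
```

In this model only a context with the FS bit set (kswapd, or a
GFP_KERNEL direct reclaimer) makes progress on slab, which is the race
described above for GFP_NOFS callers.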