From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753129AbcCGPm3 (ORCPT );
	Mon, 7 Mar 2016 10:42:29 -0500
Received: from mx2.suse.de ([195.135.220.15]:44611 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751146AbcCGPmW (ORCPT );
	Mon, 7 Mar 2016 10:42:22 -0500
Subject: Re: [PATCH] mm: limit direct reclaim for higher order allocations
To: Rik van Riel , linux-kernel@vger.kernel.org
References: <20160224163850.3d7eb56c@annuminas.surriel.com>
Cc: hannes@cmpxchg.org, akpm@linux-foundation.org, mgorman@suse.de
From: Vlastimil Babka 
Message-ID: <56DDA15B.70006@suse.cz>
Date: Mon, 7 Mar 2016 16:42:19 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <20160224163850.3d7eb56c@annuminas.surriel.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/24/2016 10:38 PM, Rik van Riel wrote:
> For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER,
> the kernel will do direct reclaim if compaction failed for any
> reason. This worked fine when Linux systems had 128MB RAM, but
> on my 24GB system I frequently see higher order allocations
> free up over 3GB of memory, pushing all kinds of things into
> swap, and slowing down applications.
>
> It would be much better to limit the amount of reclaim done,
> rather than cause excessive pageout activity.
>
> When enough memory is free to do compaction for the highest order
> allocation possible, bail out of the direct page reclaim code.
>
> On smaller systems, this may be enough to obtain contiguous
> free memory areas to satisfy small allocations, continuing our
> strategy of relying on luck occasionally. On larger systems,
> relying on luck like that has not been working for years.
>
> Signed-off-by: Rik van Riel

So the main point of this patch is the change from "continue" to
"return true", right? That will prevent looking at other zones, but I
guess it's not the reason why, without this patch, reclaim frees 3 of
your 24GB?

What I suspect more is should_continue_reclaim(), where it wants to
reclaim (2UL << sc->order) pages regardless of watermarks or compaction
status (a rough sketch of the check I mean is appended after the quoted
diff below). But that one is called from shrink_zone(), and
shrink_zones() should not call shrink_zone() if compaction is ready,
even before this patch. Perhaps if multiple processes manage to enter
shrink_zone() simultaneously, they could over-reclaim due to that?

> ---
>  mm/vmscan.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fc62546096f9..8dd15d514761 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2584,20 +2584,17 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>  				continue;	/* Let kswapd poll it */
>
>  			/*
> -			 * If we already have plenty of memory free for
> -			 * compaction in this zone, don't free any more.
> -			 * Even though compaction is invoked for any
> -			 * non-zero order, only frequent costly order
> -			 * reclamation is disruptive enough to become a
> -			 * noticeable problem, like transparent huge
> -			 * page allocations.
> +			 * For higher order allocations, free enough memory
> +			 * to be able to do compaction for the largest possible
> +			 * allocation. On smaller systems, this may be enough
> +			 * that smaller allocations can skip compaction, if
> +			 * enough adjacent pages get freed.
>  			 */
> -			if (IS_ENABLED(CONFIG_COMPACTION) &&
> -			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> +			if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
>  			    zonelist_zone_idx(z) <= requested_highidx &&
> -			    compaction_ready(zone, sc->order)) {
> +			    compaction_ready(zone, MAX_ORDER)) {
>  				sc->compaction_ready = true;
> -				continue;
> +				return true;
>  			}
>
>  			/*
>
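For reference, the check I mean in should_continue_reclaim() looks
roughly like this; it's my own trimmed-down paraphrase of mm/vmscan.c
around v4.5, so the exact lines and identifiers may differ a bit from
the tree:

	/*
	 * Inside should_continue_reclaim(), called from shrink_zone().
	 * Paraphrased, not copied verbatim from the current source.
	 */
	pages_for_compaction = (2UL << sc->order);
	inactive_lru_pages = zone_page_state(zone, NR_INACTIVE_FILE);
	if (get_nr_swap_pages() > 0)
		inactive_lru_pages += zone_page_state(zone, NR_INACTIVE_ANON);

	/*
	 * Keep reclaiming until 2UL << sc->order pages have been freed,
	 * as long as the inactive lists can still supply them, without
	 * consulting watermarks or compaction readiness.
	 */
	if (sc->nr_reclaimed < pages_for_compaction &&
			inactive_lru_pages > pages_for_compaction)
		return true;

If several processes sit in shrink_zone() at the same time, each of
them applies that threshold independently, which is how I could imagine
the over-reclaim adding up.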