Date: Fri, 10 Mar 2017 11:20:10 +0100
From: Michal Hocko
To: Johannes Weiner
Cc: Rik van Riel, Andrew Morton, Mel Gorman, Vlastimil Babka,
	Tetsuo Handa, linux-mm@kvack.org, LKML
Subject: Re: [PATCH] mm, vmscan: do not loop on too_many_isolated for ever
Message-ID: <20170310102010.GD3753@dhcp22.suse.cz>
References: <20170307133057.26182-1-mhocko@kernel.org>
	<1488916356.6405.4.camel@redhat.com>
	<20170309180540.GA8678@cmpxchg.org>
In-Reply-To: <20170309180540.GA8678@cmpxchg.org>

On Thu 09-03-17 13:05:40, Johannes Weiner wrote:
> On Tue, Mar 07, 2017 at 02:52:36PM -0500, Rik van Riel wrote:
> > It only does this to some extent.  If reclaim made
> > no progress, for example due to immediately bailing
> > out because the number of already isolated pages is
> > too high (due to many parallel reclaimers), the code
> > could hit the "no_progress_loops > MAX_RECLAIM_RETRIES"
> > test without ever looking at the number of reclaimable
> > pages.
> 
> Hm, there is no early return there, actually. We bump the loop counter
> every time it happens, but then *do* look at the reclaimable pages.
> 
> > Could that create problems if we have many concurrent
> > reclaimers?
> 
> With increased concurrency, the likelihood of OOM will go up if we
> remove the unlimited wait for isolated pages, that much is true.
> 
> I'm not sure that's a bad thing, however, because we want the OOM
> killer to be predictable and timely. So a reasonable wait time in
> between 0 and forever before an allocating thread gives up under
> extreme concurrency makes sense to me.
> 
> > It may be OK, I just do not understand all the implications.
> > 
> > I like the general direction your patch takes the code in,
> > but I would like to understand it better...
> 
> I feel the same way. The throttling logic doesn't seem to be very well
> thought out at the moment, making it hard to reason about what happens
> in certain scenarios.
> 
> In that sense, this patch isn't really an overall improvement to the
> way things work. It patches a hole that seems to be exploitable only
> from an artificial OOM torture test, at the risk of regressing high
> concurrency workloads that may or may not be artificial.
> 
> Unless I'm mistaken, there doesn't seem to be a whole lot of urgency
> behind this patch. Can we think about a general model to deal with
> allocation concurrency?

I am definitely not against that. There is no reason to rush the patch
in. My main point with this patch was to remove unbound loops from
inside the reclaim path and push any throttling up the call chain to
the page allocator, because I believe the throttling is easier to
reason about at that level. Direct reclaim should be as simple as
possible, without too many side effects; otherwise we end up with
highly unpredictable behavior. This was a first step in that direction,
and my testing so far has not shown any regressions.
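
For reference, the unbound loop in question sits in shrink_inactive_list();
in the mm/vmscan.c of that time it looks roughly like this (paraphrased,
not an exact quote of the tree):

	while (unlikely(too_many_isolated(pgdat, file, sc))) {
		/* wait and retry with no bound other than a fatal signal */
		congestion_wait(BLK_RW_ASYNC, HZ/10);

		/* We are about to die and free our memory. Return now. */
		if (fatal_signal_pending(current))
			return SWAP_CLUSTER_MAX;
	}

Every direct reclaimer that finds too many pages already isolated can wait
here indefinitely, which is the looping the patch is concerned with.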
> Unlimited parallel direct reclaim is kinda
> bonkers in the first place. How about checking for excessive isolation
> counts from the page allocator and putting allocations on a waitqueue?

I would be interested in details here.

-- 
Michal Hocko
SUSE Labs
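
A minimal sketch of the page-allocator-side throttling suggested above
might look like the following; the names (isolated_wait, isolation_too_high,
throttle_on_isolation) are made up for illustration and only assume the
existing per-node NR_ISOLATED_*/NR_INACTIVE_* counters:

	#include <linux/wait.h>
	#include <linux/mmzone.h>
	#include <linux/vmstat.h>

	/* Hypothetical: park allocators on a waitqueue instead of
	 * busy-waiting inside shrink_inactive_list(). */
	static DECLARE_WAIT_QUEUE_HEAD(isolated_wait);

	static bool isolation_too_high(pg_data_t *pgdat)
	{
		unsigned long isolated, inactive;

		isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
			   node_page_state(pgdat, NR_ISOLATED_ANON);
		inactive = node_page_state(pgdat, NR_INACTIVE_FILE) +
			   node_page_state(pgdat, NR_INACTIVE_ANON);

		return isolated > inactive;
	}

	/* Called from the allocator slow path before direct reclaim;
	 * returns -ERESTARTSYS if a fatal signal arrived while waiting. */
	static int throttle_on_isolation(pg_data_t *pgdat)
	{
		return wait_event_killable(isolated_wait,
					   !isolation_too_high(pgdat));
	}

	/* ... plus a matching wake_up(&isolated_wait) wherever the
	 * NR_ISOLATED_* counters are decremented, e.g. after putback. */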