Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free

From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Marc MERLIN <marc@merlins.org>, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>, Tejun Heo <tj@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
Date: Wed, 23 Nov 2016 10:18:26 +0100
Message-ID: <5d506912-d2a1-379b-d384-0a48ec5ab707@suse.cz>
In-Reply-To: <20161123063410.GB2864@dhcp22.suse.cz>

On 11/23/2016 07:34 AM, Michal Hocko wrote:
> On Tue 22-11-16 11:38:47, Linus Torvalds wrote:
>> On Tue, Nov 22, 2016 at 8:14 AM, Vlastimil Babka <vbabka@suse.cz> wrote:
>>>
>>> Thanks a lot for the testing. So what do we do now about 4.8? (4.7 is
>>> already EOL AFAICS).
>>>
>>> - send the patch [1] as 4.8-only stable.
>>
>> I think that's the right thing to do. It's pretty small, and the
>> argument that it changes the oom logic too much is pretty bogus, I
>> think. The oom logic in 4.8 is simply broken. Let's get it fixed.
>> Changing it is the point.
>
> The point I've tried to make is that it is not should_reclaim_retry
> which is broken. It's an overly optimistic reliance on the compaction
> to do its work which led to all those issues. My previous fix
> 31e49bfda184 ("mm, oom: protect !costly allocations some more for
> !CONFIG_COMPACTION") tried to cope with that by checking the order-0
> watermark which has proven to help most users. Now it didn't cover
> everybody obviously. Rather than fiddling with fine tuning of these
> heuristics I think it would be safer to simply admit that high order
> OOM detection doesn't work in 4.8 kernel and so do not declare the OOM
> killer for those requests at all. The risk of such a change is not big
> because there usually are order-0 requests happening all the time so if
> we are really OOM we would trigger the OOM eventually.
>
> So I am proposing this for 4.8 stable tree instead
> ---
> commit b2ccdcb731b666aa28f86483656c39c5e53828c7
> Author: Michal Hocko <mhocko@suse.com>
> Date:   Wed Nov 23 07:26:30 2016 +0100
>
>     mm, oom: stop pre-mature high-order OOM killer invocations
>
>     31e49bfda184 ("mm, oom: protect !costly allocations some more for
>     !CONFIG_COMPACTION") was an attempt to reduce chances of pre-mature OOM
>     killer invocation for high order requests. It seemed to work for most
>     users just fine but it is far from bullet proof and obviously not
>     sufficient for Marc who has reported pre-mature OOM killer invocations
>     with 4.8 based kernels. 4.9 with all the compaction improvements seems
>     to be behaving much better but that would be too intrusive to backport
>     to 4.8 stable kernels. Instead this patch simply never declares OOM for
>     !costly high order requests. We rely on order-0 requests to do that in
>     case we are really out of memory. Order-0 requests are much more common
>     and so a risk of a livelock without any way forward is highly unlikely.
>
>     Reported-by: Marc MERLIN <marc@merlins.org>
>     Signed-off-by: Michal Hocko <mhocko@suse.com>

This should effectively restore the 4.6 logic, so I'm fine with it for 
stable, if it passes testing.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index a2214c64ed3c..7401e996009a 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3161,6 +3161,16 @@ should_compact_retry(struct alloc_context *ac, unsigned int order, int alloc_fla
>  	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
>  		return false;
>
> +#ifdef CONFIG_COMPACTION
> +	/*
> +	 * This is a gross workaround to compensate for a lack of reliable compaction
> +	 * operation. We cannot simply go OOM with the current state of the compaction
> +	 * code because this can lead to premature OOM declaration.
> +	 */
> +	if (order <= PAGE_ALLOC_COSTLY_ORDER)
> +		return true;
> +#endif
> +
>  	/*
>  	 * There are setups with compaction disabled which would prefer to loop
>  	 * inside the allocator rather than hit the oom killer prematurely.
>
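
For readers skimming the hunk, here is a minimal user-space sketch of the
decision it appears to encode when CONFIG_COMPACTION is set. It is not the
kernel function: sketch_should_compact_retry() is a hypothetical name, the
real function's other arguments are dropped, and PAGE_ALLOC_COSTLY_ORDER is
assumed to be 3, the usual kernel value, which the quoted hunk does not
show.

```c
#include <stdbool.h>

/* Assumption: the usual kernel value; not shown in the quoted hunk. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical stand-in for the patched check. Only the order-based
 * decision from the quoted hunk is modelled here.
 */
static bool sketch_should_compact_retry(unsigned int order)
{
	/* Order-0 and costly orders are not retried by this check. */
	if (!order || order > PAGE_ALLOC_COSTLY_ORDER)
		return false;

	/*
	 * Workaround from the patch: a !costly high-order request always
	 * asks for another retry, so it never declares OOM on its own.
	 */
	return true;
}
```

Under these assumptions orders 1 through 3 always ask for a retry, while
order 0 and orders above 3 return false from this check. A false return
only means this check does not request another loop; the remaining retry
logic (e.g. should_reclaim_retry) still decides what happens next.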