Date: Fri, 10 Jun 2016 18:08:05 +0200
Subject: Re: [BUG] Page allocation failures with newest kernels
From: Marcin Wojtas
To: Mel Gorman
Cc: Will Deacon, Yehuda Yitschak, Robin Murphy, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 Lior Amsalem, Thomas Petazzoni, Catalin Marinas, Arnd Bergmann,
 Grzegorz Jaszczyk, Nadav Haklai, Tomasz Nowicki, Gregory Clément
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Mel,

Thanks for posting the patch. I tested it on Linux v4.4.8. Although
"mode:0x2284020" shows that __GFP_ATOMIC is no longer stripped, the
issue remains: http://pastebin.com/DmezUJSc

Best regards,
Marcin

2016-06-09 20:13 GMT+02:00 Marcin Wojtas :
> Hi Mel,
>
> My last email got cut in half.
>
> 2016-06-08 12:09 GMT+02:00 Mel Gorman :
>> On Tue, Jun 07, 2016 at 07:36:57PM +0200, Marcin Wojtas wrote:
>>> Hi Mel,
>>>
>>> 2016-06-03 14:36 GMT+02:00 Mel Gorman :
>>> > On Fri, Jun 03, 2016 at 01:57:06PM +0200, Marcin Wojtas wrote:
>>> >> >> For the record: the newest kernel on which I was able to reproduce
>>> >> >> the dumps was v4.6: http://pastebin.com/ekDdACn5.
>>> >> >> I've just checked v4.7-rc1, which comprises a lot of changes in mm
>>> >> >> (mainly yours), and I'm wondering whether there may be a spot fix or
>>> >> >> rather a series of improvements. I'm looking forward to your opinion
>>> >> >> and would be grateful for any advice.
>>> >> >>
>>> >> >
>>> >> > I don't believe we want to reintroduce the reserve to cope with CMA. One
>>> >> > option would be to widen the gap between the low and min watermarks by
>>> >> > the size of the CMA region. The effect would be to wake kswapd earlier,
>>> >> > which matters considering that the failing allocation was GFP_ATOMIC.
>>> >>
>>> >> Of course my intention is not to reintroduce anything that's gone
>>> >> forever, but just to find a way to overcome the current issues. Do you
>>> >> mean increasing the CMA size?
>>> >
>>> > No. There is a gap between the low and min watermarks. At the low point,
>>> > kswapd is woken up, and at the min point allocation requests either enter
>>> > direct reclaim or fail if they are atomic. What I'm suggesting is that
>>> > you adjust the low watermark and add the size of the CMA area to it so
>>> > that kswapd is woken earlier. The watermarks are calculated in
>>> > __setup_per_zone_wmarks().
>>> >
>>>
>>> I printed the settings of all zones; their watermarks are configured in
>>> __setup_per_zone_wmarks(). There are three zones, DMA, Normal and
>>> Movable, and only the first one has non-zero watermarks. Increasing the
>>> DMA min watermark didn't help. I also played with increasing
>>
>> Patch?
>>
>
> I played with increasing min_free_kbytes from ~2600 to 16000. It
> shifted the watermark levels computed in __setup_per_zone_wmarks(), but
> only for zone DMA; Normal and Movable remained at 0. There was no
> progress in avoiding the page allocation failures. The gap between
> 'free' and 'free_cma' was huge, so I don't think CMA itself is the root
> cause.
>
>> Did you establish why GFP_ATOMIC (assuming that's the failing site) had
>> not specified __GFP_ATOMIC at the time of the allocation failure?
>>
>
> Yes. It happens in new_slab(), in the following line:
>     return allocate_slab(s, flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
> I added "| GFP_ATOMIC" there and got the same dumps, only with one more
> bit set in gfp_mask, so I don't think that is the issue.
>
> The latest patches in v4.7-rc1 seem to boost page allocation
> performance enough to avoid the problems observed between v4.2 and v4.6.
> Hence, until we rebase from v4.4 to a future LTS newer than v4.7, we
> decided as a workaround to return to using MIGRATE_RESERVE and to add a
> fix for early_page_nid_uninitialised(). Operation now seems stable on
> all our SoCs during the tests.
>
> Best regards,
> Marcin
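Mel's suggestion of widening the low/min gap by the CMA size, and the 'free' vs 'free_cma' observation, can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: struct zone_wmarks, setup_wmarks(), usable_free(), below_wmark() and the 4 KiB PAGE_KB are names invented here. It assumes the v4.4-style spacing of low and high above min (1/4 and 1/2 of the zone's min_free_kbytes share, as we understand that version), and it ignores lowmem_reserve and the reduced thresholds the kernel grants to atomic and __GFP_HIGH requests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-zone watermarks, in pages. */
struct zone_wmarks {
	unsigned long min;
	unsigned long low;
	unsigned long high;
};

/* Assumption: 4 KiB pages. */
#define PAGE_KB 4UL

/*
 * Loosely modelled on __setup_per_zone_wmarks() in v4.4: the zone receives a
 * share of min_free_kbytes proportional to its size, and low/high sit 1/4 and
 * 1/2 of that share above min. cma_pages is added to low (the suggestion in
 * this thread); adding it to high too is our own assumption, so high >= low.
 */
static struct zone_wmarks setup_wmarks(unsigned long min_free_kbytes,
				       unsigned long zone_pages,
				       unsigned long lowmem_pages,
				       unsigned long cma_pages)
{
	unsigned long pages_min = min_free_kbytes / PAGE_KB;
	unsigned long long tmp;
	struct zone_wmarks w;

	assert(lowmem_pages > 0 && zone_pages <= lowmem_pages);
	tmp = (unsigned long long)pages_min * zone_pages / lowmem_pages;

	w.min = (unsigned long)tmp;
	w.low = w.min + (unsigned long)(tmp / 4) + cma_pages;
	w.high = w.min + (unsigned long)(tmp / 2) + cma_pages;
	return w;
}

/*
 * Free CMA pages cannot satisfy a non-movable allocation such as a
 * GFP_ATOMIC slab page, so they are subtracted from the free count.
 */
static unsigned long usable_free(unsigned long free, unsigned long free_cma)
{
	return free > free_cma ? free - free_cma : 0;
}

/* True when the usable free pages are at or below the given watermark. */
static bool below_wmark(unsigned long usable, unsigned long mark)
{
	return usable <= mark;
}
```

For example, with min_free_kbytes=2600 and a single zone, min/low/high come out at 650/812/975 pages. A zone with 20000 free pages, 19000 of them free CMA pages, has only 1000 usable pages: above the unmodified low mark, so kswapd would not be woken, but far below a low mark widened by a hypothetical 16384-page CMA region.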