From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752695AbcFCQfV (ORCPT ); Fri, 3 Jun 2016 12:35:21 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:46043 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752469AbcFCQfU (ORCPT ); Fri, 3 Jun 2016 12:35:20 -0400 Date: Fri, 3 Jun 2016 09:35:18 -0700 From: Andrew Morton To: Geert Uytterhoeven Cc: Mel Gorman , Vlastimil Babka , Linux Kernel Mailing List , Linux MM , linux-m68k Subject: Re: BUG: scheduling while atomic: cron/668/0x10c9a0c0 Message-Id: <20160603093518.09699af30ba0555847511487@linux-foundation.org> In-Reply-To: References: <20160530155644.GP2527@techsingularity.net> <574E05B8.3060009@suse.cz> <20160601091921.GT2527@techsingularity.net> <574EB274.4030408@suse.cz> <20160602103936.GU2527@techsingularity.net> <0eb1f112-65d4-f2e5-911e-697b21324b9f@suse.cz> <20160602121936.GV2527@techsingularity.net> <20160602114341.e3b974640fc3f8cbcb54898b@linux-foundation.org> <20160603084142.GY2527@techsingularity.net> X-Mailer: Sylpheed 3.4.2 (GTK+ 2.24.28; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 3 Jun 2016 11:00:30 +0200 Geert Uytterhoeven wrote: > In the mean time my tests completed successfully with both patches applied. Can we please identify "both patches" with specificity? I have the below one. From: Mel Gorman Subject: mm, page_alloc: recalculate the preferred zoneref if the context can ignore memory policies The optimistic fast path may use cpuset_current_mems_allowed instead of of a NULL nodemask supplied by the caller for cpuset allocations. The preferred zone is calculated on this basis for statistic purposes and as a starting point in the zonelist iterator. However, if the context can ignore memory policies due to being atomic or being able to ignore watermarks then the starting point in the zonelist iterator is no longer correct. This patch resets the zonelist iterator in the allocator slowpath if the context can ignore memory policies. This will alter the zone used for statistics but only after it is known that it makes sense for that context. Resetting it before entering the slowpath would potentially allow an ALLOC_CPUSET allocation to be accounted for against the wrong zone. Note that while nodemask is not explicitly set to the original nodemask, it would only have been overwritten if cpuset_enabled() and it was reset before the slowpath was entered. Link: http://lkml.kernel.org/r/20160602103936.GU2527@techsingularity.net Fixes: c33d6c06f60f710 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Signed-off-by: Mel Gorman Reported-by: Geert Uytterhoeven Acked-by: Vlastimil Babka Signed-off-by: Andrew Morton --- mm/page_alloc.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff -puN mm/page_alloc.c~mm-page_alloc-recalculate-the-preferred-zoneref-if-the-context-can-ignore-memory-policies mm/page_alloc.c --- a/mm/page_alloc.c~mm-page_alloc-recalculate-the-preferred-zoneref-if-the-context-can-ignore-memory-policies +++ a/mm/page_alloc.c @@ -3604,6 +3604,17 @@ retry: */ alloc_flags = gfp_to_alloc_flags(gfp_mask); + /* + * Reset the zonelist iterators if memory policies can be ignored. + * These allocations are high priority and system rather than user + * orientated. + */ + if ((alloc_flags & ALLOC_NO_WATERMARKS) || !(alloc_flags & ALLOC_CPUSET)) { + ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); + ac->preferred_zoneref = first_zones_zonelist(ac->zonelist, + ac->high_zoneidx, ac->nodemask); + } + /* This is the last chance, in general, before the goto nopage. */ page = get_page_from_freelist(gfp_mask, order, alloc_flags & ~ALLOC_NO_WATERMARKS, ac); @@ -3612,12 +3623,6 @@ retry: /* Allocate without watermarks if the context allows */ if (alloc_flags & ALLOC_NO_WATERMARKS) { - /* - * Ignore mempolicies if ALLOC_NO_WATERMARKS on the grounds - * the allocation is high priority and these type of - * allocations are system rather than user orientated - */ - ac->zonelist = node_zonelist(numa_node_id(), gfp_mask); page = get_page_from_freelist(gfp_mask, order, ALLOC_NO_WATERMARKS, ac); if (page) @@ -3816,7 +3821,11 @@ retry_cpuset: /* Dirty zone balancing only done in the fast path */ ac.spread_dirty_pages = (gfp_mask & __GFP_WRITE); - /* The preferred zone is used for statistics later */ + /* + * The preferred zone is used for statistics but crucially it is + * also used as the starting point for the zonelist iterator. It + * may get reset for allocations that ignore memory policies. + */ ac.preferred_zoneref = first_zones_zonelist(ac.zonelist, ac.high_zoneidx, ac.nodemask); if (!ac.preferred_zoneref) { _