Date: Fri, 26 Jun 2020 14:39:05 -0700
From: Ben Widawsky
To: Michal Hocko
Cc: linux-mm, Andi Kleen, Andrew Morton, Christoph Lameter, Dan Williams,
	Dave Hansen, David Hildenbrand, David Rientjes, Jason Gunthorpe,
	Johannes Weiner, Jonathan Corbet, Kuppuswamy Sathyanarayanan,
	Lee Schermerhorn, Li Xinhai, Mel Gorman, Mike Kravetz, Mina Almasry,
	Tejun Heo, Vlastimil Babka, linux-api@vger.kernel.org
Subject: Re: [PATCH 00/18] multiple preferred nodes
Message-ID: <20200626213905.dpu2rgevazmisvhj@intel.com>
References: <20200619162425.1052382-1-ben.widawsky@intel.com>
	<20200622070957.GB31426@dhcp22.suse.cz>
	<20200623112048.GR31426@dhcp22.suse.cz>
	<20200623161211.qjup5km5eiisy5wy@intel.com>
	<20200624075216.GC1320@dhcp22.suse.cz>
In-Reply-To: <20200624075216.GC1320@dhcp22.suse.cz>

On 20-06-24 09:52:16, Michal Hocko wrote:
> On Tue 23-06-20 09:12:11, Ben Widawsky wrote:
> > On 20-06-23 13:20:48, Michal Hocko wrote:
> [...]
> > > It would also be great to provide a high-level semantic description
> > > here. I have very quickly glanced through the patches and they are
> > > not really trivial to follow with so many incremental steps, so the
> > > higher-level intention is easily lost.
> > >
> > > Do I get it right that the default semantic is essentially
> > > 	- allocate a page from the given nodemask (with
> > > 	  __GFP_RETRY_MAYFAIL semantics)
> > > 	- fall back to a NUMA-unrestricted allocation with the default
> > > 	  NUMA policy on failure
> > >
> > > Or are there any usecases that modify how hard to keep the preference
> > > over the fallback?
> >
> > tl;dr is: yes, and no usecases.
>
> OK, then I am wondering why the change has to be so involved. Except for
> the syscall plumbing, the only real change to the allocator path would be
> something like
>
> static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
> {
> 	/* Lower zones don't get a nodemask applied for MPOL_BIND */
> 	if (unlikely(policy->mode == MPOL_BIND ||
> 		     policy->mode == MPOL_PREFERRED_MANY) &&
> 	    apply_policy_zone(policy, gfp_zone(gfp)) &&
> 	    cpuset_nodemask_valid_mems_allowed(&policy->v.nodes))
> 		return &policy->v.nodes;
>
> 	return NULL;
> }
>
> alloc_pages_current:
>
> 	if (pol->mode == MPOL_INTERLEAVE)
> 		page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
> 	else {
> 		gfp_t gfp_attempt = gfp;
>
> 		/*
> 		 * Make sure the first allocation attempt will try hard
> 		 * but eventually fail without OOM killer or other
> 		 * disruption before falling back to the full nodemask.
> 		 */
> 		if (pol->mode == MPOL_PREFERRED_MANY)
> 			gfp_attempt |= __GFP_RETRY_MAYFAIL;
>
> 		page = __alloc_pages_nodemask(gfp_attempt, order,
> 				policy_node(gfp, pol, numa_node_id()),
> 				policy_nodemask(gfp, pol));
> 		if (!page && pol->mode == MPOL_PREFERRED_MANY)
> 			page = __alloc_pages_nodemask(gfp, order,
> 					numa_node_id(), NULL);
> 	}
>
> 	return page;
>
> Similar (well, slightly more hairy) in alloc_pages_vma.
>
> Or do I miss something that really requires a more involved approach, like
> building custom zonelists and other larger changes to the allocator?

Hi Michal,

I'm mostly done implementing this change. It looks good, and so far I think
it's functionally equivalent. One thing, though: above you use NULL for the
fallback. That actually should not be NULL, because of the logic in
policy_node() to restrict zones and obey cpusets.
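Concretely, what I have now is shaped roughly like this (a sketch only, not
the actual patch: the helper name is made up, and the fallback nodemask is
exactly the part I'd like confirmed):

static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
					       struct mempolicy *pol)
{
	struct page *page;

	/*
	 * First attempt: try hard within the preferred nodemask, but fail
	 * rather than OOM-kill so that we get the chance to fall back.
	 */
	page = __alloc_pages_nodemask(gfp | __GFP_RETRY_MAYFAIL, order,
				      policy_node(gfp, pol, numa_node_id()),
				      policy_nodemask(gfp, pol));
	if (page)
		return page;

	/*
	 * Fallback: drop the preference, but keep going through
	 * policy_node() and pass the cpuset-allowed mask rather than a
	 * bare numa_node_id()/NULL pair, so zone restriction and cpusets
	 * still apply.
	 */
	return __alloc_pages_nodemask(gfp, order,
				      policy_node(gfp, pol, numa_node_id()),
				      &cpuset_current_mems_allowed);
}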
I've implemented it as such, but I was hoping someone with a deeper
understanding and more experience could confirm that this is the correct
thing to do.

Thanks.