From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>, Jonathan Corbet, Mike Kravetz,
	Andrew Morton
Cc: Ben Widawsky <ben.widawsky@intel.com>, Dave Hansen, Michal Hocko,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 12/12] mm/mempolicy: Advertise new MPOL_PREFERRED_MANY
Date: Fri, 30 Oct 2020 12:02:38 -0700
Message-Id: <20201030190238.306764-13-ben.widawsky@intel.com>
X-Mailer: git-send-email 2.29.2
In-Reply-To: <20201030190238.306764-1-ben.widawsky@intel.com>
References: <20201030190238.306764-1-ben.widawsky@intel.com>
Add MPOL_PREFERRED_MANY to the existing set of mempolicy modes. The new
mode is documented in the in-tree admin-guide by this patch. Eventually,
the man pages for mbind(2), get_mempolicy(2), set_mempolicy(2) and
numactl(8) will also have text about this mode; those shall contain the
canonical reference.

NUMA systems continue to become more prevalent. New technologies like
PMEM make finer-grained control over memory access patterns
increasingly desirable. MPOL_PREFERRED_MANY allows userspace to specify
a set of nodes that will be tried first when performing allocations. If
those allocations fail, all remaining nodes will be tried. It's a
straightforward API that addresses many of the anticipated needs of
system administrators wanting to optimize workloads on such machines.
The mode works either per VMA or per thread.

Generally speaking, this is similar to the way MPOL_BIND works, except
the user will only get a SIGSEGV if all nodes in the system are unable
to satisfy the allocation request.

Link: https://lore.kernel.org/r/20200630212517.308045-13-ben.widawsky@intel.com
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
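A minimal userspace sketch of how the new mode is expected to be used,
for reviewers who want to try it. This is illustrative only: it assumes
the set_mempolicy(2) wrapper from libnuma's <numaif.h>, and it
hard-codes the MPOL_PREFERRED_MANY value to match the uapi enum change
below, since installed system headers will not define it yet.

#include <stdio.h>
#include <stdlib.h>
#include <numaif.h>             /* set_mempolicy() wrapper; assumes libnuma */

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5   /* illustrative; mirrors the new uapi enum */
#endif

int main(void)
{
        /* Prefer nodes 0 and 2; other nodes are tried only if these fail. */
        unsigned long nodemask = (1UL << 0) | (1UL << 2);

        if (set_mempolicy(MPOL_PREFERRED_MANY, &nodemask,
                          sizeof(nodemask) * 8) != 0) {
                perror("set_mempolicy(MPOL_PREFERRED_MANY)");
                return EXIT_FAILURE;
        }

        /* Subsequent allocations by this thread try nodes {0,2} first. */
        void *buf = malloc(1UL << 20);
        free(buf);
        return EXIT_SUCCESS;
}

Unlike MPOL_BIND, exhausting nodes 0 and 2 here causes the kernel to
fall back to the remaining nodes rather than failing the allocation.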
 .../admin-guide/mm/numa_memory_policy.rst | 16 ++++++++++++----
 include/uapi/linux/mempolicy.h            |  6 +++---
 mm/hugetlb.c                              |  4 ++--
 mm/mempolicy.c                            | 14 ++++++--------
 4 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 1ad020c459b8..b69963a37fc8 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -245,6 +245,14 @@ MPOL_INTERLEAVED
 	address range or file.  During system boot up, the temporary
 	interleaved system default policy works in this mode.
 
+MPOL_PREFERRED_MANY
+	This mode specifies that the allocation should be attempted from the
+	nodemask specified in the policy. If that allocation fails, the kernel
+	will search other nodes, in order of increasing distance from the first
+	set bit in the nodemask based on information provided by the platform
+	firmware. It is similar to MPOL_PREFERRED with the main exception that
+	it is an error to have an empty nodemask.
+
 NUMA memory policy supports the following optional mode flags:
 
 MPOL_F_STATIC_NODES
@@ -253,10 +261,10 @@ MPOL_F_STATIC_NODES
 	nodes changes after the memory policy has been defined.
 
 	Without this flag, any time a mempolicy is rebound because of a
-	change in the set of allowed nodes, the node (Preferred) or
-	nodemask (Bind, Interleave) is remapped to the new set of
-	allowed nodes.  This may result in nodes being used that were
-	previously undesired.
+	change in the set of allowed nodes, the preferred nodemask (Preferred
+	Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
+	remapped to the new set of allowed nodes.  This may result in nodes
+	being used that were previously undesired.
 
 	With this flag, if the user-specified nodes overlap with the
 	nodes allowed by the task's cpuset, then the memory policy is
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 3354774af61e..ad3eee651d4e 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -16,13 +16,13 @@
  */
 
 /* Policies */
-enum {
-	MPOL_DEFAULT,
+enum { MPOL_DEFAULT,
 	MPOL_PREFERRED,
 	MPOL_BIND,
 	MPOL_INTERLEAVE,
 	MPOL_LOCAL,
-	MPOL_MAX,	/* always last member of enum */
+	MPOL_PREFERRED_MANY,
+	MPOL_MAX,	/* always last member of enum */
 };
 
 /* Flags for set_mempolicy */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d9acc25ed3b5..9539d0429706 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1115,7 +1115,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	if (mpol->mode != MPOL_BIND && nodemask) {	/* AKA MPOL_PREFERRED_MANY */
+	if (mpol->mode == MPOL_PREFERRED_MANY) {
 		page = dequeue_huge_page_nodemask(h, gfp_mask | __GFP_RETRY_MAYFAIL,
 						  nid, nodemask);
 		if (!page)
@@ -1984,7 +1984,7 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	nodemask_t *nodemask;
 
 	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
-	if (mpol->mode != MPOL_BIND && nodemask) {	/* AKA MPOL_PREFERRED_MANY */
+	if (mpol->mode == MPOL_PREFERRED_MANY) {
 		page = alloc_surplus_huge_page(h, gfp_mask | __GFP_RETRY_MAYFAIL,
 					       nid, nodemask);
 		if (!page)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index aab9ef698aa8..038c0432ec32 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -108,8 +108,6 @@
 
 #include "internal.h"
 
-#define MPOL_PREFERRED_MANY MPOL_MAX
-
 /* Internal flags */
 #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0)	/* Skip checks for continuous vmas */
 #define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1)		/* Invert check for nodemask */
@@ -180,7 +178,7 @@ struct mempolicy *get_task_policy(struct task_struct *p)
 static const struct mempolicy_operations {
 	int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
 	void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes);
-} mpol_ops[MPOL_MAX + 1];
+} mpol_ops[MPOL_MAX];
 
 static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
 {
@@ -385,8 +383,8 @@ static void mpol_rebind_preferred_common(struct mempolicy *pol,
 }
 
 /* MPOL_PREFERRED_MANY allows multiple nodes to be set in 'nodes' */
-static void __maybe_unused mpol_rebind_preferred_many(struct mempolicy *pol,
-						      const nodemask_t *nodes)
+static void mpol_rebind_preferred_many(struct mempolicy *pol,
+				       const nodemask_t *nodes)
 {
 	mpol_rebind_preferred_common(pol, nodes, nodes);
 }
@@ -448,7 +446,7 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
 	mmap_write_unlock(mm);
 }
 
-static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = {
+static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 	[MPOL_DEFAULT] = {
 		.rebind = mpol_rebind_default,
 	},
@@ -466,8 +464,8 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = {
 	},
 	/* [MPOL_LOCAL] - see mpol_new() */
 	[MPOL_PREFERRED_MANY] = {
-		.create = NULL,
-		.rebind = NULL,
+		.create = mpol_new_preferred_many,
+		.rebind = mpol_rebind_preferred_many,
 	},
 };
 
-- 
2.29.2