From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: dave.hansen@intel.com, luto@kernel.org, peterz@infradead.org,
	linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org,
	linux-hardening@vger.kernel.org, kernel-hardening@lists.openwall.com
Cc: ira.weiny@intel.com, rppt@kernel.org, dan.j.williams@intel.com,
	linux-kernel@vger.kernel.org,
	Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: [PATCH RFC 3/9] x86/mm/cpa: Add grouped page allocations
Date: Tue, 4 May 2021 17:30:26 -0700
Message-Id: <20210505003032.489164-4-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20210505003032.489164-1-rick.p.edgecombe@intel.com>
References: <20210505003032.489164-1-rick.p.edgecombe@intel.com>
MIME-Version: 1.0
For x86, setting memory permissions on the direct map results in
fracturing large pages. Direct map fracturing can be reduced by locating
pages that will have their permissions set close together.

Create a simple page cache that allocates pages from huge page size
blocks. Don't guarantee that a page will come from a huge page grouping;
instead, fall back to non-grouped pages to fulfill the allocation if
needed. Also, register a shrinker so that the system can ask for the
pages back if needed. Since this is only needed when there is a direct
map, compile it out on highmem systems.

Free pages in the cache are tracked in per-node lists inside a list_lru.
NUMA_NO_NODE requests are serviced by checking each per-node list in a
round-robin fashion. If pages are requested for a certain node but the
cache is empty for that node, a whole additional huge-page-sized block
is allocated.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
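(Note for reviewers, not part of the diff: a minimal usage sketch of the
API this patch adds. The example_* names and the GFP_KERNEL choice are
hypothetical, only to show the intended calling pattern; the real users
come later in the series.)

	static struct grouped_page_cache example_page_pool;

	static int __init example_pool_init(void)
	{
		/* Cache hands out GFP_KERNEL pages, replenished in huge page sized blocks */
		return init_grouped_page_cache(&example_page_pool, GFP_KERNEL);
	}

	static struct page *example_alloc_page(int node)
	{
		/* Prefer a page grouped with other cached pages; falls back to order-0 */
		return get_grouped_page(node, &example_page_pool);
	}

	static void example_free_page(struct page *page)
	{
		/* Return the page to the cache's per-node LRU instead of the allocator */
		free_grouped_page(&example_page_pool, page);
	}
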
 arch/x86/include/asm/set_memory.h |  14 +++
 arch/x86/mm/pat/set_memory.c      | 153 ++++++++++++++++++++++++++++++
 2 files changed, 167 insertions(+)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 4352f08bfbb5..b63f09cc282a 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -4,6 +4,9 @@
 
 #include <asm/page.h>
 #include <asm-generic/set_memory.h>
+#include <linux/list_lru.h>
+#include <linux/shrinker.h>
+#include <linux/types.h>
 
 /*
  * The set_memory_* API can be used to change various attributes of a virtual
@@ -135,4 +138,15 @@ static inline int clear_mce_nospec(unsigned long pfn)
  */
 #endif
 
+struct grouped_page_cache {
+	struct shrinker shrinker;
+	struct list_lru lru;
+	gfp_t gfp;
+	atomic_t nid_round_robin;
+};
+
+int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp);
+struct page *get_grouped_page(int node, struct grouped_page_cache *gpc);
+void free_grouped_page(struct grouped_page_cache *gpc, struct page *page);
+
 #endif /* _ASM_X86_SET_MEMORY_H */
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 16f878c26667..6877ef66793b 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2306,6 +2306,159 @@ int __init kernel_unmap_pages_in_pgd(pgd_t *pgd, unsigned long address,
 	return retval;
 }
 
+#ifndef CONFIG_HIGHMEM
+static struct page *__alloc_page_order(int node, gfp_t gfp_mask, int order)
+{
+	if (node == NUMA_NO_NODE)
+		return alloc_pages(gfp_mask, order);
+
+	return alloc_pages_node(node, gfp_mask, order);
+}
+
+static struct grouped_page_cache *__get_gpc_from_sc(struct shrinker *shrinker)
+{
+	return container_of(shrinker, struct grouped_page_cache, shrinker);
+}
+
+static unsigned long grouped_shrink_count(struct shrinker *shrinker,
+					  struct shrink_control *sc)
+{
+	struct grouped_page_cache *gpc = __get_gpc_from_sc(shrinker);
+	unsigned long page_cnt = list_lru_shrink_count(&gpc->lru, sc);
+
+	return page_cnt ? page_cnt : SHRINK_EMPTY;
+}
+
+static enum lru_status grouped_isolate(struct list_head *item,
+				       struct list_lru_one *list,
+				       spinlock_t *lock, void *cb_arg)
+{
+	struct list_head *dispose = cb_arg;
+
+	list_lru_isolate_move(list, item, dispose);
+
+	return LRU_REMOVED;
+}
+
+static void __dispose_pages(struct grouped_page_cache *gpc, struct list_head *head)
+{
+	struct list_head *cur, *next;
+
+	list_for_each_safe(cur, next, head) {
+		struct page *page = list_entry(cur, struct page, lru);
+
+		list_del(cur);
+
+		__free_pages(page, 0);
+	}
+}
+
+static unsigned long grouped_shrink_scan(struct shrinker *shrinker,
+					 struct shrink_control *sc)
+{
+	struct grouped_page_cache *gpc = __get_gpc_from_sc(shrinker);
+	unsigned long isolated;
+	LIST_HEAD(freeable);
+
+	if (!(sc->gfp_mask & gpc->gfp))
+		return SHRINK_STOP;
+
+	isolated = list_lru_shrink_walk(&gpc->lru, sc, grouped_isolate,
+					&freeable);
+	__dispose_pages(gpc, &freeable);
+
+	/* Every item walked gets isolated */
+	sc->nr_scanned += isolated;
+
+	return isolated;
+}
+
+static struct page *__remove_first_page(struct grouped_page_cache *gpc, int node)
+{
+	unsigned int start_nid, i;
+	struct list_head *head;
+
+	if (node != NUMA_NO_NODE) {
+		head = list_lru_get_mru(&gpc->lru, node);
+		if (head)
+			return list_entry(head, struct page, lru);
+		return NULL;
+	}
+
+	/* If NUMA_NO_NODE, search the nodes in round robin for a page */
+	start_nid = (unsigned int)atomic_fetch_inc(&gpc->nid_round_robin) % nr_node_ids;
+	for (i = 0; i < nr_node_ids; i++) {
+		int cur_nid = (start_nid + i) % nr_node_ids;
+
+		head = list_lru_get_mru(&gpc->lru, cur_nid);
+		if (head)
+			return list_entry(head, struct page, lru);
+	}
+
+	return NULL;
+}
+
+/* Get and add some new pages to the cache to be used by VM_GROUP_PAGES */
+static struct page *__replenish_grouped_pages(struct grouped_page_cache *gpc, int node)
+{
+	const unsigned int hpage_cnt = HPAGE_SIZE >> PAGE_SHIFT;
+	struct page *page;
+	int i;
+
+	page = __alloc_page_order(node, gpc->gfp, HUGETLB_PAGE_ORDER);
+	if (!page)
+		return __alloc_page_order(node, gpc->gfp, 0);
+
+	split_page(page, HUGETLB_PAGE_ORDER);
+
+	for (i = 1; i < hpage_cnt; i++)
+		free_grouped_page(gpc, &page[i]);
+
+	return &page[0];
+}
+
+int init_grouped_page_cache(struct grouped_page_cache *gpc, gfp_t gfp)
+{
+	int err;
+
+	memset(gpc, 0, sizeof(struct grouped_page_cache));
+	gpc->gfp = gfp;
+
+	err = list_lru_init(&gpc->lru);
+	if (err)
+		goto out;
+
+	gpc->shrinker.count_objects = grouped_shrink_count;
+	gpc->shrinker.scan_objects = grouped_shrink_scan;
+	gpc->shrinker.seeks = DEFAULT_SEEKS;
+	gpc->shrinker.flags = SHRINKER_NUMA_AWARE;
+
+	err = register_shrinker(&gpc->shrinker);
+	if (err)
+		list_lru_destroy(&gpc->lru);
+
+out:
+	return err;
+}
+
+struct page *get_grouped_page(int node, struct grouped_page_cache *gpc)
+{
+	struct page *page;
+
+	page = __remove_first_page(gpc, node);
+
+	if (page)
+		return page;
+
+	return __replenish_grouped_pages(gpc, node);
+}
+
+void free_grouped_page(struct grouped_page_cache *gpc, struct page *page)
+{
+	INIT_LIST_HEAD(&page->lru);
+	list_lru_add_node(&gpc->lru, &page->lru, page_to_nid(page));
+}
+#endif /* !CONFIG_HIGHMEM */
 /*
  * The testcases use internal knowledge of the implementation that shouldn't
  * be exposed to the rest of the kernel. Include these directly here.
-- 
2.30.2