From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-19.3 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, INCLUDES_PATCH,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4148CC4320A for ; Mon, 23 Aug 2021 17:47:18 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0DCA861361 for ; Mon, 23 Aug 2021 17:47:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 0DCA861361 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=xen.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.xenproject.org Received: from list by lists.xenproject.org with outflank-mailman.170553.311374 (Exim 4.92) (envelope-from ) id 1mIE2O-0000tw-Ea; Mon, 23 Aug 2021 17:47:08 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 170553.311374; Mon, 23 Aug 2021 17:47:08 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1mIE2O-0000tp-8l; Mon, 23 Aug 2021 17:47:08 +0000 Received: by outflank-mailman (input) for mailman id 170553; Mon, 23 Aug 2021 17:47:06 +0000 Received: from mail.xenproject.org ([104.130.215.37]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1mIE2M-0000te-So for xen-devel@lists.xenproject.org; Mon, 23 Aug 2021 17:47:06 +0000 Received: from xenbits.xenproject.org ([104.239.192.120]) by mail.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1mIE2L-0008OV-VR; Mon, 23 Aug 2021 17:47:05 +0000 Received: from [54.239.6.188] (helo=a483e7b01a66.ant.amazon.com) by xenbits.xenproject.org with esmtpsa (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1mIE2L-0000Ge-L5; Mon, 23 Aug 2021 17:47:05 +0000 X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=xen.org; s=20200302mail; h=Content-Transfer-Encoding:Content-Type:In-Reply-To: MIME-Version:Date:Message-ID:From:References:Cc:To:Subject; bh=a2qmqJrGSnaiyThI2dszzxxbGjk28THCh159ETZOwZU=; b=ARd9g3NvtNaIH71ZBCi701isN9 xv2P441kH8FKb+8BAQE8Hufab+swgjSpgAv/aS2Qi6t+RNMOOkvYiRXmpCiCUGhV4/lw7Acb+Kefv W0Y4Yis5kZn10Dyjx2MB/r5msVwtfKRfFUUAGoehPgyPT+vxXHKQMchHbTb2G2MZbIbs=; Subject: Re: [XEN RFC PATCH 08/40] xen/x86: Move NUMA memory node map functions to common To: Wei Chen , xen-devel@lists.xenproject.org, sstabellini@kernel.org, jbeulich@suse.com Cc: Bertrand.Marquis@arm.com References: <20210811102423.28908-1-wei.chen@arm.com> <20210811102423.28908-9-wei.chen@arm.com> From: Julien Grall Message-ID: Date: Mon, 23 Aug 2021 18:47:03 +0100 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.13.0 MIME-Version: 1.0 In-Reply-To: <20210811102423.28908-9-wei.chen@arm.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 7bit Hi Wei, On 11/08/2021 11:23, Wei Chen wrote: > In the later patches we will add NUMA support to Arm. Arm > NUMA support will follow current memory node map management > as x86. So this part of code can be common, in this case, > we move this part of code from arch/x86 to common. I would add "No functional changes intended" to make clear this patch is only moving code. > > Signed-off-by: Wei Chen > --- > xen/arch/x86/numa.c | 114 -------------------------------- > xen/common/Makefile | 1 + > xen/common/numa.c | 131 +++++++++++++++++++++++++++++++++++++ > xen/include/asm-x86/numa.h | 29 -------- > xen/include/xen/numa.h | 35 ++++++++++ > 5 files changed, 167 insertions(+), 143 deletions(-) > create mode 100644 xen/common/numa.c > > diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c > index d23f4f7919..a6211be121 100644 > --- a/xen/arch/x86/numa.c > +++ b/xen/arch/x86/numa.c > @@ -29,14 +29,6 @@ custom_param("numa", numa_setup); > /* from proto.h */ > #define round_up(x,y) ((((x)+(y))-1) & (~((y)-1))) > > -struct node_data node_data[MAX_NUMNODES]; > - > -/* Mapping from pdx to node id */ > -int memnode_shift; > -static typeof(*memnodemap) _memnodemap[64]; > -unsigned long memnodemapsize; > -u8 *memnodemap; > - > nodeid_t cpu_to_node[NR_CPUS] __read_mostly = { > [0 ... NR_CPUS-1] = NUMA_NO_NODE > }; > @@ -58,112 +50,6 @@ int srat_disabled(void) > return numa_off || acpi_numa < 0; > } > > -/* > - * Given a shift value, try to populate memnodemap[] > - * Returns : > - * 1 if OK > - * 0 if memnodmap[] too small (of shift too small) > - * -1 if node overlap or lost ram (shift too big) > - */ > -static int __init populate_memnodemap(const struct node *nodes, > - int numnodes, int shift, nodeid_t *nodeids) > -{ > - unsigned long spdx, epdx; > - int i, res = -1; > - > - memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap)); > - for ( i = 0; i < numnodes; i++ ) > - { > - spdx = paddr_to_pdx(nodes[i].start); > - epdx = paddr_to_pdx(nodes[i].end - 1) + 1; > - if ( spdx >= epdx ) > - continue; > - if ( (epdx >> shift) >= memnodemapsize ) > - return 0; > - do { > - if ( memnodemap[spdx >> shift] != NUMA_NO_NODE ) > - return -1; > - > - if ( !nodeids ) > - memnodemap[spdx >> shift] = i; > - else > - memnodemap[spdx >> shift] = nodeids[i]; > - > - spdx += (1UL << shift); > - } while ( spdx < epdx ); > - res = 1; > - } > - > - return res; > -} > - > -static int __init allocate_cachealigned_memnodemap(void) > -{ > - unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap)); > - unsigned long mfn = mfn_x(alloc_boot_pages(size, 1)); > - > - memnodemap = mfn_to_virt(mfn); > - mfn <<= PAGE_SHIFT; > - size <<= PAGE_SHIFT; > - printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n", > - mfn, mfn + size); > - memnodemapsize = size / sizeof(*memnodemap); > - > - return 0; > -} > - > -/* > - * The LSB of all start and end addresses in the node map is the value of the > - * maximum possible shift. > - */ > -static int __init extract_lsb_from_nodes(const struct node *nodes, > - int numnodes) > -{ > - int i, nodes_used = 0; > - unsigned long spdx, epdx; > - unsigned long bitfield = 0, memtop = 0; > - > - for ( i = 0; i < numnodes; i++ ) > - { > - spdx = paddr_to_pdx(nodes[i].start); > - epdx = paddr_to_pdx(nodes[i].end - 1) + 1; > - if ( spdx >= epdx ) > - continue; > - bitfield |= spdx; > - nodes_used++; > - if ( epdx > memtop ) > - memtop = epdx; > - } > - if ( nodes_used <= 1 ) > - i = BITS_PER_LONG - 1; > - else > - i = find_first_bit(&bitfield, sizeof(unsigned long)*8); > - memnodemapsize = (memtop >> i) + 1; > - return i; > -} > - > -int __init compute_hash_shift(struct node *nodes, int numnodes, > - nodeid_t *nodeids) > -{ > - int shift; > - > - shift = extract_lsb_from_nodes(nodes, numnodes); > - if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) ) > - memnodemap = _memnodemap; > - else if ( allocate_cachealigned_memnodemap() ) > - return -1; > - printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n", shift); > - > - if ( populate_memnodemap(nodes, numnodes, shift, nodeids) != 1 ) > - { > - printk(KERN_INFO "Your memory is not aligned you need to " > - "rebuild your hypervisor with a bigger NODEMAPSIZE " > - "shift=%d\n", shift); > - return -1; > - } > - > - return shift; > -} > /* initialize NODE_DATA given nodeid and start/end */ > void __init setup_node_bootmem(nodeid_t nodeid, u64 start, u64 end) > { > diff --git a/xen/common/Makefile b/xen/common/Makefile > index 54de70d422..f8f667e90a 100644 > --- a/xen/common/Makefile > +++ b/xen/common/Makefile > @@ -54,6 +54,7 @@ obj-y += wait.o > obj-bin-y += warning.init.o > obj-$(CONFIG_XENOPROF) += xenoprof.o > obj-y += xmalloc_tlsf.o > +obj-$(CONFIG_NUMA) += numa.o AFAICT, the Makefile is listing the file in alphabetical order. So please add numa.o in the correct position. > > obj-bin-$(CONFIG_X86) += $(foreach n,decompress bunzip2 unxz unlzma lzo unlzo unlz4 unzstd earlycpio,$(n).init.o) > > diff --git a/xen/common/numa.c b/xen/common/numa.c > new file mode 100644 > index 0000000000..e65b6a6676 > --- /dev/null > +++ b/xen/common/numa.c > @@ -0,0 +1,131 @@ > +/* > + * Generic VM initialization for x86-64 NUMA setups. > + * Copyright 2002,2003 Andi Kleen, SuSE Labs. > + * Adapted for Xen: Ryan Harper > + */ > + > +#include > +#include > +#include > +#include You don't seem to use any helpers./types directly defined by at least this header... > +#include > +#include > +#include ... this one and ... > +#include ... this one. Can you check the list of headers and introduce the minimum? If the dependency is required by another headers, then I think that dependency should be moved in the header requiring it. > +#include > +#include Please sort the includes in alphabetical order. > + > +struct node_data node_data[MAX_NUMNODES]; > + > +/* Mapping from pdx to node id */ > +int memnode_shift; > +typeof(*memnodemap) _memnodemap[64]; > +unsigned long memnodemapsize; > +u8 *memnodemap; > + > +/* > + * Given a shift value, try to populate memnodemap[] > + * Returns : > + * 1 if OK > + * 0 if memnodmap[] too small (of shift too small) > + * -1 if node overlap or lost ram (shift too big) > + */ > +static int __init populate_memnodemap(const struct node *nodes, > + int numnodes, int shift, nodeid_t *nodeids) > +{ > + unsigned long spdx, epdx; > + int i, res = -1; > + > + memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap)); > + for ( i = 0; i < numnodes; i++ ) > + { > + spdx = paddr_to_pdx(nodes[i].start); > + epdx = paddr_to_pdx(nodes[i].end - 1) + 1; > + if ( spdx >= epdx ) > + continue; > + if ( (epdx >> shift) >= memnodemapsize ) > + return 0; > + do { > + if ( memnodemap[spdx >> shift] != NUMA_NO_NODE ) > + return -1; > + > + if ( !nodeids ) > + memnodemap[spdx >> shift] = i; > + else > + memnodemap[spdx >> shift] = nodeids[i]; > + > + spdx += (1UL << shift); > + } while ( spdx < epdx ); > + res = 1; > + } > + > + return res; > +} > + > +static int __init allocate_cachealigned_memnodemap(void) > +{ > + unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap)); > + unsigned long mfn = mfn_x(alloc_boot_pages(size, 1)); > + > + memnodemap = mfn_to_virt(mfn); > + mfn <<= PAGE_SHIFT; > + size <<= PAGE_SHIFT; > + printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n", > + mfn, mfn + size); > + memnodemapsize = size / sizeof(*memnodemap); > + > + return 0; > +} > + > +/* > + * The LSB of all start and end addresses in the node map is the value of the > + * maximum possible shift. > + */ > +static int __init extract_lsb_from_nodes(const struct node *nodes, > + int numnodes) > +{ > + int i, nodes_used = 0; > + unsigned long spdx, epdx; > + unsigned long bitfield = 0, memtop = 0; > + > + for ( i = 0; i < numnodes; i++ ) > + { > + spdx = paddr_to_pdx(nodes[i].start); > + epdx = paddr_to_pdx(nodes[i].end - 1) + 1; > + if ( spdx >= epdx ) > + continue; > + bitfield |= spdx; > + nodes_used++; > + if ( epdx > memtop ) > + memtop = epdx; > + } > + if ( nodes_used <= 1 ) > + i = BITS_PER_LONG - 1; > + else > + i = find_first_bit(&bitfield, sizeof(unsigned long)*8); > + memnodemapsize = (memtop >> i) + 1; > + return i; > +} > + > +int __init compute_hash_shift(struct node *nodes, int numnodes, > + nodeid_t *nodeids) > +{ > + int shift; > + > + shift = extract_lsb_from_nodes(nodes, numnodes); > + if ( memnodemapsize <= ARRAY_SIZE(_memnodemap) ) > + memnodemap = _memnodemap; > + else if ( allocate_cachealigned_memnodemap() ) > + return -1; > + printk(KERN_DEBUG "NUMA: Using %d for the hash shift.\n", shift); > + > + if ( populate_memnodemap(nodes, numnodes, shift, nodeids) != 1 ) > + { > + printk(KERN_INFO "Your memory is not aligned you need to " > + "rebuild your hypervisor with a bigger NODEMAPSIZE " > + "shift=%d\n", shift); > + return -1; > + } > + > + return shift; > +} > diff --git a/xen/include/asm-x86/numa.h b/xen/include/asm-x86/numa.h > index bada2c0bb9..abe5617d01 100644 > --- a/xen/include/asm-x86/numa.h > +++ b/xen/include/asm-x86/numa.h > @@ -26,7 +26,6 @@ extern int compute_hash_shift(struct node *nodes, int numnodes, > extern nodeid_t pxm_to_node(unsigned int pxm); > > #define ZONE_ALIGN (1UL << (MAX_ORDER+PAGE_SHIFT)) > -#define VIRTUAL_BUG_ON(x) > > extern void numa_add_cpu(int cpu); > extern void numa_init_array(void); > @@ -47,34 +46,6 @@ static inline void clear_node_cpumask(int cpu) > cpumask_clear_cpu(cpu, &node_to_cpumask[cpu_to_node(cpu)]); > } > > -/* Simple perfect hash to map pdx to node numbers */ > -extern int memnode_shift; > -extern unsigned long memnodemapsize; > -extern u8 *memnodemap; > - > -struct node_data { > - unsigned long node_start_pfn; > - unsigned long node_spanned_pages; > -}; > - > -extern struct node_data node_data[]; > - > -static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr) > -{ > - nodeid_t nid; > - VIRTUAL_BUG_ON((paddr_to_pdx(addr) >> memnode_shift) >= memnodemapsize); > - nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift]; > - VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]); > - return nid; > -} > - > -#define NODE_DATA(nid) (&(node_data[nid])) > - > -#define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn) > -#define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages) > -#define node_end_pfn(nid) (NODE_DATA(nid)->node_start_pfn + \ > - NODE_DATA(nid)->node_spanned_pages) > - > extern int valid_numa_range(u64 start, u64 end, nodeid_t node); > > void srat_parse_regions(u64 addr); > diff --git a/xen/include/xen/numa.h b/xen/include/xen/numa.h > index 7aef1a88dc..39e8a4e00a 100644 > --- a/xen/include/xen/numa.h > +++ b/xen/include/xen/numa.h > @@ -18,4 +18,39 @@ > (((d)->vcpu != NULL && (d)->vcpu[0] != NULL) \ > ? vcpu_to_node((d)->vcpu[0]) : NUMA_NO_NODE) > > +/* The following content can be used when NUMA feature is enabled */ > +#if defined(CONFIG_NUMA) Please use #ifdef CONFIG_NUMA > + > +/* Simple perfect hash to map pdx to node numbers */ > +extern int memnode_shift; > +extern unsigned long memnodemapsize; > +extern u8 *memnodemap; > +extern typeof(*memnodemap) _memnodemap[64]; AFAICT, this will be turned static against in a follow-up patch. Can this be avoided? > + > +struct node_data { > + unsigned long node_start_pfn; > + unsigned long node_spanned_pages; > +}; > + > +extern struct node_data node_data[]; > +#define VIRTUAL_BUG_ON(x) > + > +static inline __attribute__((pure)) nodeid_t phys_to_nid(paddr_t addr) > +{ > + nodeid_t nid; > + VIRTUAL_BUG_ON((paddr_to_pdx(addr) >> memnode_shift) >= memnodemapsize); > + nid = memnodemap[paddr_to_pdx(addr) >> memnode_shift]; > + VIRTUAL_BUG_ON(nid >= MAX_NUMNODES || !node_data[nid]); > + return nid; > +} > + > +#define NODE_DATA(nid) (&(node_data[nid])) > + > +#define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn) > +#define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages) > +#define node_end_pfn(nid) (NODE_DATA(nid)->node_start_pfn + \ > + NODE_DATA(nid)->node_spanned_pages) > + > +#endif /* CONFIG_NUMA */ > + > #endif /* _XEN_NUMA_H */ > Cheers, -- Julien Grall