Subject: Re: [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse device tree memory node
From: Julien Grall <julien@xen.org>
To: Wei Chen, Stefano Stabellini
Cc: xen-devel@lists.xenproject.org, Bertrand Marquis, Jan Beulich
References: <20210811102423.28908-1-wei.chen@arm.com> <20210811102423.28908-24-wei.chen@arm.com>
Message-ID: <5c9a196a-8ac3-1173-f402-01d99ccc23e4@xen.org>
Date: Sat, 28 Aug 2021 11:33:35 +0100

Hi Wei,

On 28/08/2021 04:56, Wei Chen wrote:
>> -----Original Message-----
>> From: Stefano Stabellini
>> Sent: 28 August 2021 9:06
>> To: Wei Chen
>> Cc: xen-devel@lists.xenproject.org; sstabellini@kernel.org; julien@xen.org;
>> jbeulich@suse.com; Bertrand Marquis
>> Subject: Re:
>> [XEN RFC PATCH 23/40] xen/arm: introduce a helper to parse
>> device tree memory node
>>
>> On Wed, 11 Aug 2021, Wei Chen wrote:
>>> Memory blocks' NUMA ID information is stored in device tree's
>>> memory nodes as "numa-node-id". We need a new helper to parse
>>> and verify this ID from memory nodes.
>>>
>>> In order to support memory affinity in later use, the valid
>>> memory ranges and NUMA ID will be saved to tables.
>>>
>>> Signed-off-by: Wei Chen
>>> ---
>>>  xen/arch/arm/numa_device_tree.c | 130 ++++++++++++++++++++++++++++++++
>>>  1 file changed, 130 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/numa_device_tree.c b/xen/arch/arm/numa_device_tree.c
>>> index 37cc56acf3..bbe081dcd1 100644
>>> --- a/xen/arch/arm/numa_device_tree.c
>>> +++ b/xen/arch/arm/numa_device_tree.c
>>> @@ -20,11 +20,13 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>>  #include
>>>  #include
>>>
>>>  s8 device_tree_numa = 0;
>>>  static nodemask_t processor_nodes_parsed __initdata;
>>> +static nodemask_t memory_nodes_parsed __initdata;
>>>
>>>  static int srat_disabled(void)
>>>  {
>>> @@ -55,6 +57,79 @@ static int __init dtb_numa_processor_affinity_init(nodeid_t node)
>>>      return 0;
>>>  }
>>>
>>> +/* Callback for parsing of the memory regions affinity */
>>> +static int __init dtb_numa_memory_affinity_init(nodeid_t node,
>>> +                                                paddr_t start, paddr_t size)
>>> +{
>>> +    struct node *nd;
>>> +    paddr_t end;
>>> +    int i;
>>> +
>>> +    if ( srat_disabled() )
>>> +        return -EINVAL;
>>> +
>>> +    end = start + size;
>>> +    if ( num_node_memblks >= NR_NODE_MEMBLKS )
>>> +    {
>>> +        dprintk(XENLOG_WARNING,
>>> +                "Too many numa entry, try bigger NR_NODE_MEMBLKS \n");
>>> +        bad_srat();
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    /* It is fine to add this area to the nodes data it will be used later */
>>> +    i = conflicting_memblks(start, end);
>>> +    /* No conflicting memory block, we can save it for later usage */;
>>> +    if ( i < 0 )
>>> +        goto save_memblk;
>>> +
>>> +    if ( memblk_nodeid[i] ==
>>> +         node ) {
>>> +        /*
>>> +         * Overlaps with other memblk in the same node, warning here.
>>> +         * This memblk will be merged with conflicted memblk later.
>>> +         */
>>> +        printk(XENLOG_WARNING
>>> +               "DT: NUMA NODE %u (%"PRIx64"-%"PRIx64") overlaps with itself (%"PRIx64"-%"PRIx64")\n",
>>> +               node, start, end,
>>> +               node_memblk_range[i].start, node_memblk_range[i].end);
>>> +    } else {
>>> +        /*
>>> +         * Conflict with memblk in other node, this is an error.
>>> +         * The NUMA information is invalid, NUMA will be turn off.
>>> +         */
>>> +        printk(XENLOG_ERR
>>> +               "DT: NUMA NODE %u (%"PRIx64"-%"PRIx64") overlaps with NODE %u (%"PRIx64"-%"PRIx64")\n",
>>> +               node, start, end, memblk_nodeid[i],
>>> +               node_memblk_range[i].start, node_memblk_range[i].end);
>>> +        bad_srat();
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +save_memblk:
>>> +    nd = &nodes[node];
>>> +    if ( !node_test_and_set(node, memory_nodes_parsed) ) {
>>> +        nd->start = start;
>>> +        nd->end = end;
>>> +    } else {
>>> +        if ( start < nd->start )
>>> +            nd->start = start;
>>> +        if ( nd->end < end )
>>> +            nd->end = end;
>>> +    }
>>> +
>>> +    printk(XENLOG_INFO "DT: NUMA node %u %"PRIx64"-%"PRIx64"\n",
>>> +           node, start, end);
>>> +
>>> +    node_memblk_range[num_node_memblks].start = start;
>>> +    node_memblk_range[num_node_memblks].end = end;
>>> +    memblk_nodeid[num_node_memblks] = node;
>>> +    num_node_memblks++;
>>
>> Is it possible to have non-contiguous ranges of memory for a single NUMA
>> node?
>>
>> Looking at the DT bindings and Linux implementation, it seems possible.
>> Here, it seems that node_memblk_range/memblk_nodeid could handle it,
>> but nodes couldn't.
>
> Yes, you're right. I copied this code for x86 ACPI NUMA. Does ACPI allow
> non-contiguous ranges of memory for a single NUMA node too?

I couldn't find any restriction for ACPI. Although, I only briefly looked
at the spec.

> If yes, I think
> this will affect x86 ACPI NUMA too.
> In next version, we plan to merge
> dtb_numa_memory_affinity_init and acpi_numa_memory_affinity_init into a
> neutral function. So we can fix them at the same time.
>
> If not, maybe we have to keep the diversity for dtb and ACPI here.

I am not entirely sure what you mean. Are you saying that if ACPI doesn't
allow non-contiguous ranges of memory, then we should keep the
implementations separate?

If so, then I disagree with that. It is fine to have code that supports
more than what a firmware table supports. The main benefit is less code
and therefore less long-term maintenance (with the current solution we
would need to check both the ACPI and DT implementations if there is a
bug in one).

Cheers,

-- 
Julien Grall