From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D954AC4320A for ; Wed, 18 Aug 2021 07:44:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BEF8A6108B for ; Wed, 18 Aug 2021 07:44:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S239219AbhHRHo7 (ORCPT ); Wed, 18 Aug 2021 03:44:59 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:14048 "EHLO mx0b-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238224AbhHRHo6 (ORCPT ); Wed, 18 Aug 2021 03:44:58 -0400 Received: from pps.filterd (m0127361.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.43/8.16.0.43) with SMTP id 17I7YhM5010135; Wed, 18 Aug 2021 03:43:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : mime-version : content-transfer-encoding; s=pp1; bh=5tv1k+BmeJCZwNrIRyo0XiclwIqAlvcm7xi6o8P4F+I=; b=cewXEcKVqGwZSR/MDSH3HonA5yyiExeMBYYKWlZH27FehWaU9qmD7IquwG1n8BpZgSF+ i6Ka13NT4/DcIGtj6C5L2kjUWlGFZvUAjycsWzi2dCrP183OsbMA04NDhr8NTh4/7oME v9ksHF+7/i2FpCxzpq9iNff5BI3/cbO7KAr/N1vdRvMsrb2jStTe11WfllkvM+bTPlpG BSdK3kXiyG+V/kfIX7or67fRX551GjnPNeCu2tXejK0i69Wg7g9DVJA+3VdUkL1CZe+/ /CwTO1/hBLp6vEGbxHTqLrOgO5AWelWNRpmunu45brOHDszrnprR3LTOzdqfX567jTG+ hg== Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0a-001b2d01.pphosted.com with ESMTP id 3agcdxcj03-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 18 Aug 2021 03:43:51 -0400 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.16.1.2/8.16.1.2) with SMTP id 17I7gANr001928; Wed, 18 Aug 2021 07:43:49 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma03ams.nl.ibm.com with ESMTP id 3ae5f8e4nx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 18 Aug 2021 07:43:49 +0000 Received: from d06av23.portsmouth.uk.ibm.com (d06av23.portsmouth.uk.ibm.com [9.149.105.59]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 17I7hjXS59572614 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 18 Aug 2021 07:43:45 GMT Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 70DE8A4059; Wed, 18 Aug 2021 07:43:45 +0000 (GMT) Received: from d06av23.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9C5FDA404D; Wed, 18 Aug 2021 07:43:41 +0000 (GMT) Received: from saptagiri.in.ibm.com (unknown [9.199.45.191]) by d06av23.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 18 Aug 2021 07:43:41 +0000 (GMT) From: Srikar Dronamraju To: Ingo Molnar , Peter Zijlstra Cc: LKML , Mel Gorman , Rik van Riel , Srikar Dronamraju , Thomas Gleixner , Valentin Schneider , Vincent Guittot , Dietmar Eggemann , linuxppc-dev@lists.ozlabs.org, Nathan Lynch , Michael Ellerman , Gautham R Shenoy , Geetika Moolchandani , Laurent Dufour Subject: [PATCH] sched/topology: Skip updating masks for non-online nodes Date: Wed, 18 Aug 2021 13:13:33 +0530 Message-Id: <20210818074333.48645-1-srikar@linux.vnet.ibm.com> X-Mailer: git-send-email 2.31.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: B45xSOVfbXkLjOI1_Hz7fBMUQqbql5OZ X-Proofpoint-GUID: B45xSOVfbXkLjOI1_Hz7fBMUQqbql5OZ X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.391,18.0.790 definitions=2021-08-18_02:2021-08-17,2021-08-18 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 bulkscore=0 priorityscore=1501 mlxscore=0 malwarescore=0 clxscore=1015 spamscore=0 lowpriorityscore=0 mlxlogscore=999 phishscore=0 suspectscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2107140000 definitions=main-2108180047 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Valentin Schneider The scheduler currently expects NUMA node distances to be stable from init onwards, and as a consequence builds the related data structures once-and-for-all at init (see sched_init_numa()). Unfortunately, on some architectures node distance is unreliable for offline nodes and may very well change upon onlining. Skip over offline nodes during sched_init_numa(). Track nodes that have been onlined at least once, and trigger a build of a node's NUMA masks when it is first onlined post-init. Cc: LKML Cc: linuxppc-dev@lists.ozlabs.org Cc: Nathan Lynch Cc: Michael Ellerman Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Valentin Schneider Cc: Gautham R Shenoy Cc: Dietmar Eggemann Cc: Mel Gorman Cc: Vincent Guittot Cc: Rik van Riel Cc: Geetika Moolchandani Cc: Laurent Dufour Reported-by: Geetika Moolchandani Signed-off-by: Srikar Dronamraju Signed-off-by: Valentin Schneider --- Changelog: [Fixed warning: no previous prototype for '__sched_domains_numa_masks_set'] kernel/sched/topology.c | 65 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 65 insertions(+) diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index b77ad49dc14f..4e8698e62f07 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -1482,6 +1482,8 @@ int sched_max_numa_distance; static int *sched_domains_numa_distance; static struct cpumask ***sched_domains_numa_masks; int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; + +static unsigned long __read_mostly *sched_numa_onlined_nodes; #endif /* @@ -1833,6 +1835,16 @@ void sched_init_numa(void) sched_domains_numa_masks[i][j] = mask; for_each_node(k) { + /* + * Distance information can be unreliable for + * offline nodes, defer building the node + * masks to its bringup. + * This relies on all unique distance values + * still being visible at init time. + */ + if (!node_online(j)) + continue; + if (sched_debug() && (node_distance(j, k) != node_distance(k, j))) sched_numa_warn("Node-distance not symmetric"); @@ -1886,6 +1898,53 @@ void sched_init_numa(void) sched_max_numa_distance = sched_domains_numa_distance[nr_levels - 1]; init_numa_topology_type(); + + sched_numa_onlined_nodes = bitmap_alloc(nr_node_ids, GFP_KERNEL); + if (!sched_numa_onlined_nodes) + return; + + bitmap_zero(sched_numa_onlined_nodes, nr_node_ids); + for_each_online_node(i) + bitmap_set(sched_numa_onlined_nodes, i, 1); +} + +static void __sched_domains_numa_masks_set(unsigned int node) +{ + int i, j; + + /* + * NUMA masks are not built for offline nodes in sched_init_numa(). + * Thus, when a CPU of a never-onlined-before node gets plugged in, + * adding that new CPU to the right NUMA masks is not sufficient: the + * masks of that CPU's node must also be updated. + */ + if (test_bit(node, sched_numa_onlined_nodes)) + return; + + bitmap_set(sched_numa_onlined_nodes, node, 1); + + for (i = 0; i < sched_domains_numa_levels; i++) { + for (j = 0; j < nr_node_ids; j++) { + if (!node_online(j) || node == j) + continue; + + if (node_distance(j, node) > sched_domains_numa_distance[i]) + continue; + + /* Add remote nodes in our masks */ + cpumask_or(sched_domains_numa_masks[i][node], + sched_domains_numa_masks[i][node], + sched_domains_numa_masks[0][j]); + } + } + + /* + * A new node has been brought up, potentially changing the topology + * classification. + * + * Note that this is racy vs any use of sched_numa_topology_type :/ + */ + init_numa_topology_type(); } void sched_domains_numa_masks_set(unsigned int cpu) @@ -1893,8 +1952,14 @@ void sched_domains_numa_masks_set(unsigned int cpu) int node = cpu_to_node(cpu); int i, j; + __sched_domains_numa_masks_set(node); + for (i = 0; i < sched_domains_numa_levels; i++) { for (j = 0; j < nr_node_ids; j++) { + if (!node_online(j)) + continue; + + /* Set ourselves in the remote node's masks */ if (node_distance(j, node) <= sched_domains_numa_distance[i]) cpumask_set_cpu(cpu, sched_domains_numa_masks[i][j]); } -- 2.18.2