Subject: Re: [PATCH v3 03/10] sched/topology: Provide cfs_overload_cpus bitmap
To: Valentin Schneider, mingo@redhat.com, peterz@infradead.org
Cc: subhra.mazumdar@oracle.com, dhaval.giani@oracle.com,
 daniel.m.jordan@oracle.com, pavel.tatashin@microsoft.com,
 matt@codeblueprint.co.uk, umgwanakikbuti@gmail.com, riel@redhat.com,
 jbacik@fb.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
 quentin.perret@arm.com, linux-kernel@vger.kernel.org
References: <1541767840-93588-1-git-send-email-steven.sistare@oracle.com>
 <1541767840-93588-4-git-send-email-steven.sistare@oracle.com>
 <0857925d-a24e-90ea-e28c-90d69b2f66dd@oracle.com>
 <7d9b6789-af17-bcab-e52d-7e05483e10ea@arm.com>
From: Steven Sistare
Organization: Oracle Corporation
Date: Mon, 26 Nov 2018 14:06:15 -0500
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1
MIME-Version: 1.0
In-Reply-To:
 <7d9b6789-af17-bcab-e52d-7e05483e10ea@arm.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/20/2018 7:42 AM, Valentin Schneider wrote:
> On 19/11/2018 17:33, Steven Sistare wrote:
> [...]
>>>
>>> Thinking about misfit stealing, we can't use the sd_llc_shared's because
>>> on big.LITTLE misfit migrations happen across LLC domains.
>>>
>>> I was thinking of adding a misfit sparsemask to the root_domain, but
>>> then I thought we could do the same thing for cfs_overload_cpus.
>>>
>>> By doing so we'd have a single source of information for overloaded CPUs,
>>> and we could filter that down during idle balance - you mentioned earlier
>>> wanting to try stealing at each SD level. This would also let you get
>>> rid of [PATCH 02].
>>>
>>> The main part of try_steal() could then be written down as something like
>>> this:
>>>
>>> ----->8-----
>>>
>>> for_each_domain(this_cpu, sd) {
>>>     span = sched_domain_span(sd)
>>>
>>>     for_each_sparse_wrap(src_cpu, overload_cpus) {
>>>         if (cpumask_test_cpu(src_cpu, span) &&
>>>             steal_from(dts_rq, dst_rf, &locked, src_cpu)) {
>>>             stolen = 1;
>>>             goto out;
>>>         }
>>>     }
>>> }
>>>
>>> ------8<-----
>>>
>>> We could limit the stealing to stop at the highest SD_SHARE_PKG_RESOURCES
>>> domain for now so there would be no behavioural change - but we'd
>>> factorize the #ifdef SCHED_SMT bit. Furthermore, the door would be open
>>> to further stealing.
>>>
>>> What do you think?
>>
>> That is not efficient for a multi-level search because at each domain level we
>> would (re) iterate over overloaded candidates that do not belong in that level.
>
>
> Mmm I was thinking we could abuse the wrap() and start at
> (fls(prev_span) + 1), but we're not guaranteed to have contiguous spans -
> the Arm Juno for instance has [0, 3, 4], [1, 2] as MC-level domains, so
> that goes down the drain.
>
> Another thing that has been trotting in my head would be some helper to
> create a cpumask from a sparsemask (some sort of sparsemask_span()),
> which would let us use the standard mask operators:
>
> ----->8-----
> struct cpumask *overload_span = sparsemask_span(overload_cpus)
>
> for_each_domain(this_cpu, sd)
>     for_each_cpu_and(src_cpu, overload_span, sched_domain_span(sd))
>
> -----8>-----
>
> The cpumask could be part of the sparsemask struct to save us the
> allocation, and only updated when calling sparsemask_span().

I thought of providing something like this along with other sparsemask
utility functions, but I decided to be minimalist, and let others add more
functions if/when they become needed.
this_cpu_cpumask_var_ptr(select_idle_mask) is a temporary that could be
used as the destination of the conversion.

Also, conversion adds cost, particularly on larger systems. When comparing
a cpumask and a sparsemask, it is more efficient to iterate over the smaller
set, and test for membership in the larger, such as in try_steal:

    for_each_cpu(src_cpu, cpu_smt_mask(dst_cpu)) {
            if (sparsemask_test_elem(src_cpu, overload_cpus)

>> To extend stealing across LLC, I would like to keep the per-LLC sparsemask,
>> but add to each SD a list of sparsemask pointers. The list nodes would be
>> private, but the sparsemask structs would be shared. Each list would include
>> the masks that overlap the SD's members. The list would be a singleton at the
>> core and LLC levels (same as the socket level for most processors), and would
>> have multiple elements at the NUMA level.
>
> I see. As for misfit, creating asym_cpucapacity siblings of the sd_llc_*()
> functions seems a bit much - there'd be a lot of redundancy for basically
> just a single shared sparsemask, which is why I was rambling about moving
> things to root_domain.
>
> Having different locations where sparsemasks are stored is a bit of a
> pain which I'd like to avoid, but if it can't be unified I suppose we'll
> have to live with it.

I don't follow. A per-LLC sparsemask representing misfits can be allocated
with one more line in sd_llc_alloc, and you can steal across LLC using the
list I briefly described above.

- Steve
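
To make the per-domain list idea a little more concrete, here is a rough
user-space sketch. It is not kernel code and not what the patch series
implements: struct llc_mask, struct mask_node, struct domain,
mark_overloaded(), find_steal_candidate() and demo() are all made-up names
for illustration, and a plain 64-bit bitmap stands in for the sparsemask.
It only shows the shape of the structure: list nodes are private to a
domain, the masks they point to are shared, and the list is a singleton at
the LLC level but has one element per LLC at the NUMA level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 64	/* assumption: everything fits in one 64-bit word */

/* Hypothetical stand-in for one shared per-LLC overload mask. */
struct llc_mask {
	uint64_t overloaded;
};

/* List node private to one domain; the mask it points to is shared. */
struct mask_node {
	struct llc_mask *mask;
	struct mask_node *next;
};

/* Hypothetical stand-in for a sched domain: CPU span plus overlapping masks. */
struct domain {
	uint64_t span;
	struct mask_node *masks;
};

void mark_overloaded(struct llc_mask *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->overloaded |= UINT64_C(1) << cpu;
}

/* Return the lowest overloaded CPU in sd other than dst_cpu, or -1 if none. */
int find_steal_candidate(const struct domain *sd, int dst_cpu)
{
	assert(dst_cpu >= 0 && dst_cpu < NR_CPUS);

	/* Walk only the masks that overlap this domain. */
	for (const struct mask_node *n = sd->masks; n; n = n->next) {
		uint64_t cand = n->mask->overloaded & sd->span;

		cand &= ~(UINT64_C(1) << dst_cpu);	/* never steal from self */
		for (int cpu = 0; cpu < NR_CPUS; cpu++)
			if (cand & (UINT64_C(1) << cpu))
				return cpu;
	}
	return -1;
}

/*
 * Two LLCs (CPUs 0-3 and 4-7), CPU 5 overloaded, CPU 0 looking for work.
 * numa_level == 0 searches the LLC domain, otherwise the NUMA domain.
 */
int demo(int numa_level)
{
	struct llc_mask llc0 = { 0 }, llc1 = { 0 };
	struct mask_node numa_n1 = { &llc1, NULL };
	struct mask_node numa_n0 = { &llc0, &numa_n1 };	/* two elements */
	struct mask_node llc_n0 = { &llc0, NULL };	/* singleton */
	struct domain llc_sd = { 0x0f, &llc_n0 };
	struct domain numa_sd = { 0xff, &numa_n0 };

	mark_overloaded(&llc1, 5);
	return find_steal_candidate(numa_level ? &numa_sd : &llc_sd, 0);
}
```

In this toy setup the LLC-level search finds nothing because the only
overloaded CPU lives in the other LLC, while the NUMA-level search reaches
it through the second list element. A per-LLC misfit mask could hang off
the same kind of list.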