Subject: Re: [PATCH 2/2] xen: credit2: limit the max number of CPUs in a runqueue
To: Dario Faggioli
References: <158818022727.24327.14309662489731832234.stgit@Palanthas> <158818179558.24327.11334680191217289878.stgit@Palanthas>
From: Jan Beulich
Message-ID: <3db33b8a-ba97-f302-a325-e989ff0e7084@suse.com>
Date: Thu, 30 Apr 2020 08:45:48 +0200
In-Reply-To: <158818179558.24327.11334680191217289878.stgit@Palanthas>
Cc: Juergen Gross, xen-devel@lists.xenproject.org, George Dunlap, Andrew Cooper

On 29.04.2020 19:36, Dario Faggioli wrote:
> @@ -852,14 +862,61 @@ cpu_runqueue_match(const struct csched2_runqueue_data *rqd, unsigned int cpu)
>             (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu));
>  }
>  
> +/* Additional checks, to avoid separating siblings in different runqueues. */
> +static bool
> +cpu_runqueue_smt_match(const struct csched2_runqueue_data *rqd, unsigned int cpu)
> +{
> +    unsigned int nr_sibl = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
> +    unsigned int rcpu, nr_smts = 0;
> +
> +    /*
> +     * If we put the CPU in this runqueue, we must be sure that there will
> +     * be enough room for accepting its hyperthread sibling(s) here as well.
> +     */
> +    cpumask_clear(cpumask_scratch_cpu(cpu));
> +    for_each_cpu ( rcpu, &rqd->active )
> +    {
> +        ASSERT(rcpu != cpu);
> +        if ( !cpumask_test_cpu(rcpu, cpumask_scratch_cpu(cpu)) )
> +        {
> +            /*
> +             * For each CPU already in the runqueue, account for it and for
> +             * its sibling(s), independently from whether such sibling(s) are
> +             * in the runqueue already or not.
> +             *
> +             * Of course, if there are sibling CPUs in the runqueue already,
> +             * only count them once.
> +             */
> +            cpumask_or(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                       per_cpu(cpu_sibling_mask, rcpu));
> +            nr_smts += nr_sibl;

This being common code, is it appropriate to assume all CPUs have the
same number of siblings? Even beyond that, iirc the sibling mask
represents the online or parked siblings, but not offline ones. For
the purpose here, don't you rather care about the full set?

What about HT vs AMD Fam15's CUs? Do you want both to be treated the
same here?

Also, could you outline the intentions of this logic in the
description, so that the goal can be matched with what gets done?
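To illustrate the first concern with (purely hypothetical) asymmetric
numbers: assume opt_max_cpus_runqueue = 8, the joining CPU has three
hyperthread siblings (i.e. nr_sibl = 4), and the runqueue already
holds two CPUs without any siblings. Then

    loop, 1st CPU found:        nr_smts = 0 + 4 = 4
    loop, 2nd CPU found:        nr_smts = 4 + 4 = 8
    final nr_smts += nr_sibl:   nr_smts = 8 + 4 = 12 > 8

and the runqueue would be refused, despite only 2 + 4 = 6 threads ever
being able to end up in it.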
> +        }
> +    }
> +    /*
> +     * We know that neither the CPU, nor any of its sibling are here,
> +     * or we wouldn't even have entered the function.
> +     */
> +    ASSERT(!cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                               per_cpu(cpu_sibling_mask, cpu)));
> +
> +    /* Try adding CPU and its sibling(s) to the count and check... */
> +    nr_smts += nr_sibl;
> +
> +    if ( nr_smts <= opt_max_cpus_runqueue )
> +        return true;
> +
> +    return false;

Fold these into

    return nr_smts <= opt_max_cpus_runqueue;

?

> @@ -873,11 +930,44 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          if ( !rqi_unused && rqd->id > rqi )
>              rqi_unused = true;
>  
> -        if ( cpu_runqueue_match(rqd, cpu) )
> +        /*
> +         * Check whether the CPU should (according to the topology) and also
> +         * can (if we there aren't too many already) go in this runqueue.

Nit: Stray "we"?

> +         */
> +        if ( rqd->refcnt < opt_max_cpus_runqueue &&
> +             cpu_runqueue_match(rqd, cpu) )
>          {
> -            rqd_valid = true;
> -            break;
> +            cpumask_t *siblings = per_cpu(cpu_sibling_mask, cpu);
> +
> +            dprintk(XENLOG_DEBUG, "CPU %d matches runq %d, cpus={%*pbl} (max %d)\n",
> +                    cpu, rqd->id, CPUMASK_PR(&rqd->active),
> +                    opt_max_cpus_runqueue);
> +
> +            /*
> +             * If we're using core (or socket!) scheduling, or we don't have
> +             * hyperthreading, no need to do any further checking.
> +             *
> +             * If no (to both), but our sibling is already in this runqueue,
> +             * then it's also ok for the CPU to stay in this runqueue..

Nit: Stray 2nd full stop?

> +             * Otherwise, do some more checks, to better account for SMT.
> +             */
> +            if ( opt_sched_granularity != SCHED_GRAN_cpu ||
> +                 cpumask_weight(siblings) <= 1 ||
> +                 cpumask_intersects(&rqd->active, siblings) )
> +            {
> +                dprintk(XENLOG_DEBUG, "runq %d selected\n", rqd->id);
> +                rqd_valid = rqd;
> +                break;
> +            }
> +            else if ( cpu_runqueue_smt_match(rqd, cpu) )
> +            {
> +                dprintk(XENLOG_DEBUG, "considering runq %d...\n", rqd->id);
> +                rqd_valid = rqd;
> +            }
>          }
> +	else

Hard tab slipped in.

> @@ -900,6 +990,12 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
>          rqd->pick_bias = cpu;
>          rqd->id = rqi;
>      }
> +    else
> +        rqd = rqd_valid;
> +
> +    printk(XENLOG_INFO "CPU %d (sibling={%*pbl}) will go to runqueue %d with {%*pbl}\n",
> +           cpu, CPUMASK_PR(per_cpu(cpu_sibling_mask, cpu)), rqd->id,
> +           CPUMASK_PR(&rqd->active));

Iirc there's one per-CPU printk() already. On large systems this isn't
very nice, so I'd like to ask that their total number at least not
grow further. Ideally there would be a less verbose summary after all
CPUs have been brought up at boot, with per-CPU info logged only
during CPU hot-online.
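Just to sketch the kind of thing I have in mind (a rough illustration
only; field names as I recall them from the credit2 code, and assuming
a struct csched2_private *prv is in scope):

    /* One line per runqueue, once all boot-time CPUs are up. */
    unsigned int rqi;

    for_each_cpu ( rqi, &prv->active_queues )
        printk(XENLOG_INFO "Credit2: runqueue %u: {%*pbl}\n",
               rqi, CPUMASK_PR(&prv->rqd[rqi].active));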
Jan