Date: Mon, 12 Oct 2020 17:30:32 +0100
From: Ionela Voinescu
To: Lukasz Luba
Subject: Re: [PATCH v2 2/2] [RFC] CPUFreq: Add support for cpu-perf-dependencies
Message-ID: <20201012163032.GA30838@arm.com>
Hi Lukasz,

On Monday 12 Oct 2020 at 14:48:20 (+0100), Lukasz Luba wrote:
> On 10/12/20 11:59 AM, Ionela Voinescu wrote:
> > On Monday 12 Oct 2020 at 11:22:57 (+0100), Lukasz Luba wrote:
> > [..]
> > > > > I thought about it and looked for other platforms' DT to see if we
> > > > > can reuse existing opp information. Unfortunately I don't think it
> > > > > is optimal. The reason being that cpus having the same opp table
> > > > > does not necessarily mean that they share a clock wire. It just
> > > > > tells us that they have the same capabilities (literally just tells
> > > > > us they have the same V/f op points). Unless I am missing something?
> > > > >
> > > > > When comparing with ACPI/_PSD it becomes more intuitive that there
> > > > > is no equivalent way to reveal "perf-dependencies" in DT.
> > > >
> > > > You should be able to, by examining the clock tree. But perhaps SCMI
> > > > abstracts all that and just presents virtual clocks without parent
> > > > clocks available to determine what clocks are shared? Fix SCMI if
> > > > that's the case.
> > >
> > > True, the SCMI clock does not support discovery of the clock tree:
> > > (from 4.6.1 Clock management protocol background)
> > > 'The protocol does not cover discovery of the clock tree, which must be
> > > described through firmware tables instead.' [1]
> > >
> > > In this situation, would it make sense, instead of this binding from
> > > patch 1/2, to create a binding for an internal firmware/scmi node?
> > >
> > > Something like:
> > >
> > > firmware {
> > > 	scmi {
> > > 	...
> > > 		scmi-perf-dep {
> > > 			compatible = "arm,scmi-perf-dependencies";
> > > 			cpu-perf-dep0 {
> > > 				cpu-perf-affinity = <&CPU0>, <&CPU1>;
> > > 			};
> > > 			cpu-perf-dep1 {
> > > 				cpu-perf-affinity = <&CPU3>, <&CPU4>;
> > > 			};
> > > 			cpu-perf-dep2 {
> > > 				cpu-perf-affinity = <&CPU7>;
> > > 			};
> > > 		};
> > > 	};
> > > };
> > >
> > > The code which is going to parse the binding would be inside the
> > > scmi perf protocol code and used via an API by scmi-cpufreq.c.
> > >
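(For the record, parsing such a node could look roughly like the sketch
below. This is only a sketch against your proposed, not-yet-standardised
names - "scmi-perf-dep" and "cpu-perf-affinity" - the helper name is made
up and the caller is assumed to provide the array of cpumasks:)

#include <linux/of.h>
#include <linux/cpumask.h>

/* Hypothetical parser for the proposed binding above; not existing code. */
static int scmi_parse_perf_deps(struct device_node *scmi_np,
				cpumask_var_t *masks, int max_deps)
{
	struct device_node *deps, *child, *cpu_np;
	int ndeps = 0;

	deps = of_get_child_by_name(scmi_np, "scmi-perf-dep");
	if (!deps)
		return 0;	/* no dependency information provided */

	for_each_child_of_node(deps, child) {
		int i = 0, cpu;

		if (ndeps >= max_deps) {
			of_node_put(child);
			break;
		}

		/* each phandle in cpu-perf-affinity points at a cpu node */
		while ((cpu_np = of_parse_phandle(child, "cpu-perf-affinity",
						  i++))) {
			cpu = of_cpu_node_to_id(cpu_np);
			if (cpu >= 0)
				cpumask_set_cpu(cpu, masks[ndeps]);
			of_node_put(cpu_np);
		}
		ndeps++;
	}

	of_node_put(deps);
	return ndeps;	/* number of dependency domains found */
}

Each cpu-perf-depN child would then yield one cpumask per dependency
domain, which scmi-cpufreq.c could query via the perf protocol ops.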
> >
> > While SCMI cpufreq would be able to benefit from the functionality that
> > Nicola is trying to introduce, it's not the only driver, and more
> > importantly, it's not *going* to be the only driver benefiting from
> > this.
> >
> > Currently there is also qcom-cpufreq-hw.c and the future
> > mediatek-cpufreq-hw.c that is currently under review [1]. They both do
> > their frequency setting by interacting with HW/FW, and could either take
> > or update their OPP tables from there. Therefore, if the platform
> > required it, they could also expose different controls for frequency
> > setting and could benefit from additional information about clock
> > domains (either through opp-shared or the new entries in Nicola's
> > patch), without driver changes.
> >
> > Another point to be made is that I strongly believe this is going to be
> > the norm in the future. Directly setting PLLs and regulator voltages
> > has been proven unsafe and insecure.
> >
> > Therefore, I see this as support for a generic cpufreq feature (a
> > hardware coordination type), rather than support for a specific driver.
> >
> > [1] https://lkml.org/lkml/2020/9/10/11
> >
> > >
> > > Now regarding the 'dependent_cpus' mask.
> > >
> > > We could avoid adding a new field 'dependent_cpus' in the policy
> > > struct, but I am not sure about one bit - the Frequency Invariant
> > > Engine (which is also not fixed by just adding a new cpumask).
> >        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >        Let's take it step by step..
> > >
> > > We have 3 subsystems to fix:
> > > 1. EAS - EM has an API function which takes a custom cpumask, so no issue,
> >    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >    keep in mind that EAS is using the max aggregation method
> >    that schedutil is using. So if we are to describe the
> >    functionality correctly, it needs both a cpumask describing
> >    the frequency domains and an aggregation method.
>
> EAS does not use schedutil max aggregation, it calculates max_util
> internally.
> But isn't it the same logical mechanism that schedutil uses?
> The compute_energy() loops through the CPUs in the domain and
> takes the utilization from them via schedutil_cpu_util(cpu_rq(cpu)).
> It figures out max_util and then em_cpu_energy() maps it to the next
  ^^^^^^^^^^^^^^^^^^^^^^^

Same for schedutil: sugov_next_freq_shared() calls sugov_get_util()
which then calls schedutil_cpu_util().

If your point is that one is applying the max function in
compute_energy() while the other is doing it in sugov_next_freq_shared(),
I'll reinforce my argument that they are logically doing the same *type*
of aggregation. EAS relies on it and schedutil was purposely modified for
this purpose:

938e5e4b0d15 sched/cpufreq: Prepare schedutil for Energy Aware Scheduling
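(To make "same *type* of aggregation" concrete: both logically reduce the
per-CPU utilization of a frequency domain to a single value with max().
A simplified illustration only, not the actual kernel code -
cpu_util_estimate() is a made-up stand-in for schedutil_cpu_util():)

/* Illustration: the aggregation pattern shared by EAS (compute_energy())
 * and schedutil (sugov_next_freq_shared()), reduced to its core.
 */
static unsigned long domain_max_util(const struct cpumask *domain_cpus)
{
	unsigned long max_util = 0;
	int cpu;

	for_each_cpu(cpu, domain_cpus)
		max_util = max(max_util, cpu_util_estimate(cpu));

	/* one utilization value -> one frequency for the whole domain */
	return max_util;
}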
> frequency for the cluster. It just needs proper utilization from
> CPUs, which is taken from the run-queues, which is a sum of the
> utilization of the tasks on them. This leads to the problem of how we
> account the utilization of a task. This is the place where the FIE is
> involved. EAS assumes the utilization is calculated properly.

This is separate. Above we were discussing the aggregation method and
what CPUs this is applied on. I'll continue on FIE below.

> >
> > > fix would be to use it via the scmi-cpufreq.c
> >
> > > 2. IPA (for calculating the power of a cluster; not the whole of
> > > thermal needs this knowledge about 'dependent cpus') - this can be
> > > fixed internally
> >
> > > 3. Frequency Invariant Engine (FIE) - currently it relies on schedutil
> > > filtering and providing the max freq of all cpus in the cluster to the
> > > FIE; this info is then propagated to all 'related_cpus', which will
> > > have this freq (we know, because there are no other freq requests);
> > > Issues:
> > > 3.1. Schedutil is not going to check all cpus in the cluster to take
> > > the max freq, which is then passed into the cpufreq driver and FIE
> > > 3.2. FIE would have to (or maybe we would drop it) have a logic similar
> > > to what schedutil does (max freq search and set, then filter the next
> > > freq requests from other cpus in the next period, e.g. 10ms)
> > > 3.3. Schedutil is going to invoke a freq change for each cpu
> > > independently and the current code just calls arch_set_freq_scale() -
> > > adding just 'dependent_cpus' won't help
> >
> > I don't believe these are issues. As we need changes for EAS and IPA,
> > we'd need changes for FIE. We don't need more than the cpumask that
> > shows frequency domains, as we already have the aggregation method that
> > schedutil uses to propagate the max frequency in a domain across CPUs.
>
> Schedutil is going to work in !policy_is_shared() mode, which leads to
> sugov_update_single() being the 'main' function. We won't have the
> schedutil goodness which handles the related_cpus use case.
>

Agreed! I did not mean that I'd rely on schedutil to do the aggregation
and hand me the answer. But my suggestion is to use the same logical
method - maximum - for cases where counters are not present.

> Then in software FIE would you just change the call from:
> arch_set_freq_scale(policy->related_cpus,...)
> to:
> arch_set_freq_scale(policy->dependent_cpus,...)
> ?
>
> This code would be called from any CPU (without filtering) and it
> would loop through the cpumask updating freq_scale, which is wrong IMO.
> You need some 'logic', which is not currently in there.
>

Definitely! But that's because the FIE changes above are incomplete.
That's why whoever does these changes should go beyond
s/related_cpus/dependent_cpus. We don't need more information from DT in
addition to this dependent_cpus mask, but that does not mean the end
solution for making use of it will be a simple
"s/related_cpus/dependent_cpus".

> Leaving 'related_cpus' would also be wrong (because the real CPU
> frequency is different, so we would account task utilization wrongly).
>
> >
> > This would be the default method if cycle counters are not present. It
> > might not reflect the frequency the cores actually get from HW, but for
> > that cycle counters should be used.
>
> IMHO configurations with per-cpu freq requests, where there are
> 'dependent' CPUs and no HW counters to use for task utilization
> accounting, should be blocked. Then we don't need 'dependent_cpus' in
> software FIE, and that's one less item on your requirements list for the
> new cpumask.
>

I'd go for a default.. better to have something than to remove it
altogether, but we'll see.

I'll stop this here as I think we're getting distracted a bit from the
main purpose of this RFC. I don't believe FIE brings an additional
requirement. "Software" FIE will need fixing/optimizing/bypassing (we'll
agree later on the implementation) but it does not need anything else
from DT/ACPI.

Thank you,
Ionela.

> > > 3.4 What would be the real frequency of these cpus and what would be
> > > set in FIE?
> > > 3.5 FIE is going to filter requests from other dependent cpus too soon?
> > >
> > > IMHO the FIE needs more bits than just a new cpumask.
> > > Maybe we should consider moving the FIE arch_set_freq_scale() call
> > > into the cpufreq driver, which will know better how to
> > > aggregate/filter requests and then call the FIE update?
> >
> > I'm quite strongly against this :). As described before, this is not a
> > feature that a single driver needs, and even if it was, the aggregation
> > method for FIE is not a driver policy.
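(Sketching that direction - and only sketching it: 'dependent_cpus' does
not exist yet, locking/filtering is hand-waved away, and
arch_set_freq_scale() is assumed to have its current arch_topology
signature - the aggregation would live in generic FIE code, e.g.:)

#include <linux/cpufreq.h>
#include <linux/percpu.h>

/* Hand-wavy sketch, not a real patch: remember the last frequency
 * requested by each CPU and apply the max of the requests across the
 * (hypothetical) dependent_cpus mask to the whole domain.
 * arch_set_freq_scale() comes from the arch's asm/topology.h.
 */
static DEFINE_PER_CPU(unsigned long, last_freq_req);

static void fie_update_scale(struct cpufreq_policy *policy,
			     unsigned int cpu, unsigned long freq)
{
	unsigned long domain_freq = 0;
	int i;

	WRITE_ONCE(per_cpu(last_freq_req, cpu), freq);

	/* the domain effectively runs at the max of the per-CPU requests */
	for_each_cpu(i, policy->dependent_cpus)	/* hypothetical mask */
		domain_freq = max(domain_freq,
				  READ_ONCE(per_cpu(last_freq_req, i)));

	arch_set_freq_scale(policy->dependent_cpus, domain_freq,
			    policy->cpuinfo.max_freq);
}

The point being that this aggregation belongs with FIE itself, not with
any particular driver.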
>
> The software version of FIE has issues in this case; schedutil or EAS
> won't help (different code path).
>
> Regards,
> Lukasz