Subject: Re: [PATCH v2 2/2] [RFC] CPUFreq: Add support for cpu-perf-dependencies
From: Lukasz Luba
To: Ionela Voinescu
Cc: Rob Herring, Nicola Mazzucato, Viresh Kumar, sudeep.holla@arm.com,
 morten.rasmussen@arm.com, chris.redpath@arm.com, daniel.lezcano@linaro.org,
 vireshk@kernel.org, rjw@rjwysocki.net, devicetree@vger.kernel.org,
 linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org
Date: Mon, 12 Oct 2020 14:48:20 +0100
Message-ID: <500510b9-58f3-90b3-8c95-0ac481d468b5@arm.com>
In-Reply-To: <20201012105945.GA9219@arm.com>

On 10/12/20 11:59 AM, Ionela Voinescu wrote:
> On Monday 12 Oct 2020 at 11:22:57 (+0100), Lukasz Luba wrote:
> [..]
>>>> I thought about it and looked at other platforms' DT to see if we
>>>> can reuse existing opp information. Unfortunately I don't think that
>>>> is optimal. The reason is that cpus having the same opp table does
>>>> not necessarily mean that they share a clock wire; it just tells us
>>>> that they have the same capabilities (literally just the same V/f
>>>> operating points). Unless I am missing something?
>>>>
>>>> When comparing with ACPI/_PSD it becomes more intuitive that there
>>>> is no equivalent way to reveal "perf-dependencies" in DT.
>>>
>>> You should be able to, by examining the clock tree. But perhaps SCMI
>>> abstracts all that and just presents virtual clocks, without parent
>>> clocks available to determine which clocks are shared? Fix SCMI if
>>> that's the case.
>>
>> True, the SCMI clock protocol does not support discovery of the clock
>> tree (from 4.6.1 Clock management protocol background):
>> 'The protocol does not cover discovery of the clock tree, which must
>> be described through firmware tables instead.' [1]
>>
>> In this situation, would it make sense, instead of the binding from
>> patch 1/2, to create a binding under the internal firmware/scmi node?
>>
>> Something like:
>>
>> firmware {
>> 	scmi {
>> 		...
>> 		scmi-perf-dep {
>> 			compatible = "arm,scmi-perf-dependencies";
>> 			cpu-perf-dep0 {
>> 				cpu-perf-affinity = <&CPU0>, <&CPU1>;
>> 			};
>> 			cpu-perf-dep1 {
>> 				cpu-perf-affinity = <&CPU3>, <&CPU4>;
>> 			};
>> 			cpu-perf-dep2 {
>> 				cpu-perf-affinity = <&CPU7>;
>> 			};
>> 		};
>> 	};
>> };
>>
>> The code which is going to parse the binding would live in the scmi
>> perf protocol code and be used via an API by scmi-cpufreq.c.
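
To make that a bit more concrete, here is a minimal sketch of how the
perf protocol code could walk such a node and build the dependency
cpumasks. Keep in mind the compatible string, the node names and the
'cpu-perf-affinity' property exist only in the proposal above; this is
not an accepted binding, just an illustration:

#include <linux/cpumask.h>
#include <linux/of.h>

static int scmi_parse_perf_deps(struct device_node *scmi_np,
				struct cpumask *masks, int nr_masks)
{
	struct device_node *deps, *dep, *cpu_np;
	int i = 0, j, n, cpu;

	deps = of_get_child_by_name(scmi_np, "scmi-perf-dep");
	if (!deps)
		return -ENODEV;

	/* one child node per dependency domain */
	for_each_child_of_node(deps, dep) {
		if (i >= nr_masks)
			break;

		n = of_count_phandle_with_args(dep, "cpu-perf-affinity", NULL);
		for (j = 0; j < n; j++) {
			cpu_np = of_parse_phandle(dep, "cpu-perf-affinity", j);
			if (!cpu_np)
				continue;

			/* map the cpu DT node to a logical cpu id */
			cpu = of_cpu_node_to_id(cpu_np);
			if (cpu >= 0)
				cpumask_set_cpu(cpu, &masks[i]);
			of_node_put(cpu_np);
		}
		i++;
	}
	of_node_put(deps);

	/* number of dependency domains found */
	return i;
}

scmi-cpufreq.c could then ask the protocol layer, through such an API,
for the mask covering a given cpu when it creates a policy.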
> While SCMI cpufreq would be able to benefit from the functionality
> that Nicola is trying to introduce, it's not the only driver, and more
> importantly, it's not *going* to be the only driver benefiting from
> this.
>
> Currently there is also qcom-cpufreq-hw.c, and the future
> mediatek-cpufreq-hw.c is currently under review [1]. They both do
> their frequency setting by interacting with HW/FW, and could either
> take or update their OPP tables from there. Therefore, if the platform
> required it, they could also expose different controls for frequency
> setting and could benefit from additional information about clock
> domains (either through opp-shared or the new entries in Nicola's
> patch), without driver changes.
>
> Another point to be made is that I strongly believe this is going to
> be the norm in the future. Directly setting PLLs and regulator
> voltages has been proven unsafe and insecure.
>
> Therefore, I see this as support for a generic cpufreq feature (a
> hardware coordination type), rather than support for a specific
> driver.
>
> [1] https://lkml.org/lkml/2020/9/10/11
>
>>
>> Now regarding the 'dependent_cpus' mask.
>>
>> We could avoid adding a new field 'dependent_cpus' in the policy
>> struct, but I am not sure about one bit - the Frequency Invariance
>> Engine (which is also not fixed by just adding a new cpumask).
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  Let's take it step by step..
>>
>> We have 3 subsystems to fix:
>> 1. EAS - EM has an API function which takes a custom cpumask, so no
>>    issue,
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>  keep in mind that EAS is using the max aggregation method
>  that schedutil is using. So if we are to describe the
>  functionality correctly, it needs both a cpumask describing
>  the frequency domains and an aggregation method.

EAS does not use schedutil max aggregation; it calculates max_util
internally (see the simplified sketch further below).

The compute_energy() function loops through the CPUs in the domain and
takes their utilization via schedutil_cpu_util(). It figures out
max_util and then em_cpu_energy() maps it to the next frequency for the
cluster. It just needs proper utilization figures for the CPUs, which
are taken from the run-queues as the sum of the utilization of the
tasks on them. That leads to the problem of how we account the
utilization of a task, which is the place where the FIE is involved.
EAS assumes the utilization is calculated properly.

>
>> fix would be to use it via the scmi-cpufreq.c
>
>> 2. IPA (for calculating the power of a cluster; it is not the whole
>>    of thermal that needs this knowledge about 'dependent cpus') -
>>    this can be fixed internally
>
>> 3. Frequency Invariance Engine (FIE) - currently it relies on
>>    schedutil filtering and providing the max freq of all cpus in the
>>    cluster to the FIE; this info is then populated to all
>>    'related_cpus', which will run at this freq (we know, because
>>    there are no other freq requests);
>>    Issues:
>>    3.1. Schedutil is not going to check all cpus in the cluster to
>>         take the max freq, which is then passed into the cpufreq
>>         driver and the FIE
>>    3.2. FIE would have to have (or maybe we would drop it) logic
>>         similar to what schedutil does (max freq search and set, then
>>         filter the next freq requests from other cpus in the next
>>         period, e.g. 10ms)
>>    3.3. Schedutil is going to invoke the freq change for each cpu
>>         independently, and the current code just calls
>>         arch_set_freq_scale() - adding just 'dependent_cpus' won't
>>         help
>
> I don't believe these are issues. As we need changes for EAS and IPA,
> we'd need changes for FIE. We don't need more than the cpumask that
> shows frequency domains, as we already have the aggregation method
> that schedutil uses to propagate the max frequency in a domain across
> CPUs.

Schedutil is going to work in !policy_is_shared() mode, which leads to
sugov_update_single() being the 'main' function. We won't get the
schedutil goodness that handles the related_cpus use case.
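
Coming back to the EAS point above, here is a simplified sketch of what
compute_energy() does (condensed from memory of kernel/sched/fair.c in
recent kernels, details trimmed, so treat it as an illustration rather
than the exact code). The max aggregation happens right here, inside
EAS, not in schedutil:

static long compute_energy(struct task_struct *p, int dst_cpu,
			   struct perf_domain *pd)
{
	struct cpumask *pd_mask = perf_domain_span(pd);
	unsigned long cpu_cap = arch_scale_cpu_capacity(cpumask_first(pd_mask));
	unsigned long max_util = 0, sum_util = 0;
	int cpu;

	for_each_cpu_and(cpu, pd_mask, cpu_online_mask) {
		unsigned long util_cfs = cpu_util_next(cpu, p, dst_cpu);
		struct task_struct *tsk = cpu == dst_cpu ? p : NULL;
		unsigned long cpu_util;

		/* energy contribution of this CPU */
		sum_util += schedutil_cpu_util(cpu, util_cfs, cpu_cap,
					       ENERGY_UTIL, NULL);

		/* frequency-selection view of this CPU's utilization */
		cpu_util = schedutil_cpu_util(cpu, util_cfs, cpu_cap,
					      FREQUENCY_UTIL, tsk);
		max_util = max(max_util, cpu_util);
	}

	/* max_util picks the OPP for the whole performance domain */
	return em_cpu_energy(pd->em_pd, max_util, sum_util);
}

So EAS only needs the per-CPU utilization (from the run-queues) to be
accurate - which is exactly what depends on the FIE.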
Then in the software FIE, would you just change the call from:

	arch_set_freq_scale(policy->related_cpus, ...);

to:

	arch_set_freq_scale(policy->dependent_cpus, ...);

?

This code would be called from any CPU (without filtering) and it would
loop through the cpumask updating freq_scale, which is wrong IMO. You
need some 'logic', which is currently not there. Leaving 'related_cpus'
would also be wrong (because the real CPU frequency is different, so we
would account task utilization wrongly).
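
To sketch the kind of 'logic' I mean - hypothetical code, nothing like
this exists today and the names are made up - something per dependency
domain would have to mirror the max-request tracking that schedutil
does for a shared policy:

#include <linux/cpufreq.h>
#include <linux/kernel.h>
#include <linux/percpu.h>

/* last frequency requested by each CPU (hypothetical) */
static DEFINE_PER_CPU(unsigned long, scmi_last_req_freq);

/*
 * Called on the CPU that issued the frequency request. The domain
 * actually runs at the max of the per-CPU requests, so that is what
 * the scale factor must track for every dependent CPU.
 */
static void scmi_update_dependent_freq_scale(struct cpufreq_policy *policy,
					     const struct cpumask *dep_cpus,
					     unsigned long freq)
{
	unsigned long max_freq = 0;
	int cpu;

	this_cpu_write(scmi_last_req_freq, freq);

	for_each_cpu(cpu, dep_cpus)
		max_freq = max(max_freq, per_cpu(scmi_last_req_freq, cpu));

	/* all dependent CPUs run at the domain maximum */
	arch_set_freq_scale(dep_cpus, max_freq, policy->cpuinfo.max_freq);
}

Whether something like this belongs in each driver or in common code is
exactly the open question here.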
>
> This would be the default method if cycle counters are not present. It
> might not reflect the frequency the cores actually get from HW, but
> for that cycle counters should be used.

IMHO configurations with per-cpu freq requests, where there are
'dependent' CPUs and no HW counters to use for task utilization
accounting, should be blocked. Then we don't need 'dependent_cpus' in
the software FIE, and that is one item fewer on your list of
requirements for the new cpumask.

>
>> 3.4 What would be the real frequency of these cpus, and what would be
>>     set in the FIE?
>> 3.5 Is the FIE going to filter out requests from other dependent cpus
>>     that arrive too soon?
>>
>> IMHO the FIE needs more bits than just a new cpumask.
>> Maybe we should consider moving the FIE arch_set_freq_scale() call
>> into the cpufreq driver, which will know better how to
>> aggregate/filter requests and then call the FIE update?
>
> I'm quite strongly against this :). As described before, this is not a
> feature that a single driver needs, and even if it was, the
> aggregation method for FIE is not a driver policy.

The software version of the FIE has issues in this case; schedutil or
EAS won't help (different code path).

Regards,
Lukasz